AI Platform Engineer at Denvr
Denvr is a vertically integrated AI Platform Services company with headquarters in Calgary, Canada. We provide foundational compute infrastructure and services to support the broader AI ecosystem and its end users. The platform includes cloud-native solutions for training, inference, high-performance computing, data processing, scalable storage, and a suite of software toolsets that accelerate the development, deployment, and integration of AI applications.
These capabilities are accessible via the public Denvr AI Cloud or through Private AI Platform Services , which offer fully dedicated, sovereign environments with enhanced security. Private deployments incorporate advanced data centers, optimized compute architectures, high-throughput storage fabrics, and tightly integrated platform operations software—engineered to meet the demands of large-scale, mission‑critical AI workloads.
We design proprietary, ultra‑efficient, modular AI Data Centers built for hyper‑scale AI deployments.
At Denvr, we’re driven to create exceptional customer experiences that empower AI innovators, entrepreneurs, and enterprises to achieve real results. Our mission is to help unlock breakthroughs that transform industries, enhance creativity, and shape a better future.
Why Join Us
Joining Denvr means being part of a world‑class team in the fast‑moving field of AI and high‑performance computing. We value curiosity, collaboration, and continuous learning. Our people are proactive problem solvers who take pride in delivering great results, thrive in open and transparent environments, and enjoy learning by doing.
If you’re forward‑thinking, motivated by innovation, and excited to help drive a rapidly growing business, Denvr is the place to make an impact—and grow your career alongside exceptional teammates.
About the Role
As an AI Platform Engineer, you will play a pivotal role in ensuring the success of our AI Platform design, implementation, and operation. The AI Platform team designs and manages AI compute architectures and software supporting state‑of‑the‑art NVIDIA Blackwell / Hopper / Ampere clusters with InfiniBand and RoCE GPU fabrics. The team additionally tests and manages alternate silicon solutions and network and data fabrics to support Denvr AI Cloud customers. You will collaborate closely with cross‑functional teams, namely Product, Data Center Operations, and Customer Support, to deliver exceptional customer products and experiences.
What You’ll Do
Technical Responsibilities:
- AI & HPC Infrastructure: Architect and optimize high-performance AI Platform solutions for AI training and inferencing, leveraging NVIDIA systems (H200 / A100 / GH200) and distributed training optimizations (NCCL, RDMA / InfiniBand).
- Cloud & Virtualization: Administer RKE Kubernetes clusters, including custom operator development (KOPF), CNIs (Kube-OVN), and KubeVirt, alongside managing traditional virtualization (VMware ESXi / vCenter) and bare‑metal provisioning (Metal3, Ironic).
- Linux & Systems Engineering: Perform advanced OS management (Ubuntu), including kernel parameter optimization and hardware‑level troubleshooting on Supermicro / Dell platforms.
- Advanced Networking: Manage high‑throughput network fabrics using BGP EVPN, SONiC, and leaf / spine topologies, while maintaining network security via firewalls, VPNs, internet gateways, and granular policy management.
- Storage & Data Platforms: Deploy and maintain scalable, high-performance storage fabrics for data‑intensive workloads using technologies such as WEKA, Ceph (Rook), Qumulo, and Dell PowerStore.
- Systems Integration: Design and build critical backend APIs and microservices using Python (FastAPI, asyncio, Pydantic) or Golang, including the development of Kubernetes Operators and integration with relational / NoSQL databases.
- Automation & IaC: Drive infrastructure consistency and repeatability through Terraform, CloudFormation, and Ansible, integrated within robust CI / CD pipelines.
- Operational Excellence: Adhere to change / release management, incident / problem management, and documentation standards; participate in cross‑team architectural reviews; and provide post‑sales L3 support and customer‑facing technical engagement for both public cloud and private platform deployments.
Customer Interaction:
- Work cross‑functionally with vendors, engineering, and platform operations to define requirements, document processes, and continuously improve platform reliability and performance.
- Support business development and customer success teams by providing clear technical guidance, translating complex concepts, and aligning solutions to customer requirements.
- Meet directly with customers to design and review complex platform integrations, custom architectures, and workload‑optimized AI solutions.
- Collaborate with vendors to evaluate and validate new GPU and ASIC hardware, firmware, and system architectures, providing feedback for integration and improvement.
- Provide L3 engineering support for advanced troubleshooting, root‑cause analysis, and performance evaluation across compute, storage, networking, and AI systems.
Professional Development:
- Stay up to date with industry trends; attend workshops, seminars, and conferences.
- Pursue relevant certifications and continuous learning in cloud, AI / ML infrastructure, networking, storage, and security domains.
- Engage in internal knowledge sharing through documentation, demos, tech talks, and mentorship of peers.
Who You Are
Independent Self‑Starter:
- Able to operate in a fast‑moving, evolving business with strong prioritization toward customer and business value.
- Comfortable using AI‑assisted development tools to accelerate delivery.
- Collaborates effectively within a broader platform engineering group while owning high‑quality execution of assigned work.
Education and Experience:
- Post‑secondary education in Computer Science, Engineering, Information Technology, or a related technical discipline.
- 3+ years of experience with AI / ML solutions engineering, cloud infrastructure, or a related field (preferred).
- Background in software development, system design, or technical consulting is highly valued.
Soft Skills:
- Excellent written and verbal communication, with the ability to simplify and explain complex technical concepts.
- Strong customer empathy and discovery skills to uncover real needs and guide solution direction.
- Confident presenter who can engage both technical and non‑technical audiences.
- Highly organized, able to manage multiple priorities, and comfortable shifting focus as business needs evolve.
- Creative problem‑solver with a structured approach to diagnosing issues and designing solutions.
- Strong sense of ownership, accountability, and alignment with company vision and direction.
Industry Knowledge:
- Familiarity with AI industry trends, cloud and data center infrastructure, and secure, reliable operations at scale.
- Understanding of AI / ML workflows (training, multi‑GPU / multi‑node scaling, inferencing) and distributed storage fundamentals.
- General awareness of the competitive landscape and emerging technologies in AI infrastructure and cloud services.
Teamwork and Collaboration:
- Effective collaborator across cross‑functional teams including Sales, Marketing, Product, and Engineering.
- Comfortable working in customer‑facing technical roles where clarity, empathy, and responsiveness are critical.
Analytical Skills:
- Strong analytical mindset for evaluating complex systems and diagnosing issues across compute, storage, and networking.
- Ability to design, articulate, and innovate technical solutions aligned with customer and business requirements.
If you are passionate about technology, system design, and innovative technical solutions, and want to be part of a remote‑first, forward‑thinking company, we would love to hear from you. Click on the link to apply!
Seniority level: Mid‑Senior level
Employment type: Full‑time
Job function: Engineering and Information Technology