Software Development Engineer – Software DevOps & Continuous Integration Team
Advanced Micro Devices, Inc • CALGARY, Alberta, Canada
29 days ago
Job type
  • Full-time
Job description

WHAT YOU DO AT AMD CHANGES EVERYTHING

At AMD, our mission is to build great products that accelerate next-generation computing experiences, from AI and data centers to PCs, gaming and embedded systems. Grounded in a culture of innovation and collaboration, we believe real progress comes from bold ideas, human ingenuity and a shared passion to create something extraordinary. When you join AMD, you’ll discover the real differentiator is our culture. We push the limits of innovation to solve the world’s most important challenges, striving for execution excellence while being direct, humble, collaborative, and inclusive of diverse perspectives. Join us as we shape the future of AI and beyond. Together, we advance your career.

THE ROLE:

The AI/ML Frameworks team is hiring an MTS Software Development Engineer to build and maintain scalable DevOps infrastructure that accelerates AMD’s AI software development. You will design and own CI/CD pipelines, manage Kubernetes-based GPU environments, and automate systems using Python, Go, and Ansible. The role involves creating and maintaining production-grade automation and tooling that enables fast, reliable software delivery across teams.

THE PERSON:

The ideal candidate is a skilled DevOps/infrastructure engineer with strong programming abilities. They write clean, maintainable code in Python or Go, and can navigate ML framework source code (PyTorch, TensorFlow, ROCm) to debug issues, optimize build processes, or contribute fixes. They have solid knowledge of build systems and toolchains: understanding how CMake, Bazel, and compiler toolchains work is critical for effective issue triaging and root cause analysis. They are proficient in Kubernetes, CI/CD tools, and infrastructure automation frameworks such as Ansible. Familiarity with C++ is valuable for navigating lower-level framework components. This person thrives in collaborative, fast-paced environments, can drive technical execution with minimal oversight, and is passionate about knowledge sharing and upleveling their team.

KEY RESPONSIBILITIES:

  • Build System Expertise & Issue Triaging: Develop deep expertise in build tools and flows (CMake, Bazel, Make, compiler toolchains). Triage complex build failures by understanding the full build pipeline, from source to binary. Identify root causes across infrastructure, toolchain, and code-level issues.
  • Team Training & Knowledge Sharing: Train and mentor team members on build systems, CI/CD workflows, and debugging techniques. Create documentation, runbooks, and training sessions to ensure the team can effectively triage issues independently. Foster a culture of continuous learning around build infrastructure.
  • ML Framework Integration & Code Contribution: Understand the architecture and codebase of ML frameworks (PyTorch, TensorFlow, ROCm stack). Review, debug, and contribute code changes as needed to resolve build issues, improve CI reliability, or support new features.
  • Tooling & Automation Development: Design and develop internal tools, automation scripts, and services primarily in Python and Go. Write well-tested, production-grade code to solve infrastructure and workflow challenges.
  • CI/CD Pipeline Development: Design, implement, and manage efficient continuous integration and delivery pipelines using Buildkite, GitHub Actions, and Jenkins to enable rapid and reliable software deployment for ML workloads.
  • Kubernetes Infrastructure Management: Deploy and maintain robust Kubernetes-based environments across both on-premises and cloud platforms to support scalable service orchestration.
  • Infrastructure Automation: Automate provisioning, configuration, and management of infrastructure using Ansible, Python, and Bash to improve system consistency and reduce manual intervention.
  • Service Deployment with Helm: Administer application and service deployment in Kubernetes using Helm charts for consistent and repeatable release processes.
  • GPU Server Support: Configure, manage, and maintain GPU-based compute environments, including lifecycle automation and hardware-level test integration for ML training and inference workloads.
  • Database and Observability Integration: Interact with MySQL databases to support dynamic data updates and integrate data sources into Grafana dashboards for monitoring and insights.
  • Cross-Functional Collaboration: Work closely with ML framework developers, SREs, and project stakeholders to ensure system-level alignment and high-impact delivery.
  • Quality Assurance Enablement: Integrate automated testing frameworks into CI pipelines to ensure code quality, stability, and performance across development cycles.

PREFERRED EXPERIENCE:

  • Build Systems & Toolchains: Strong understanding of CMake, Bazel, Make, and compiler toolchains (GCC, Clang, LLVM). Ability to debug complex build failures, understand dependency resolution, and optimize build performance.
  • Programming Languages: Strong proficiency in Python and Go for building tools, services, and automation. The ability to read and modify C++ code is a plus for working with ML framework internals and build configurations.
  • ML Framework Familiarity: Understanding of ML framework architecture (PyTorch, TensorFlow, JAX, or similar). Ability to navigate large codebases, understand their build systems, and contribute fixes or improvements.
  • Mentorship & Training: Experience documenting complex systems and training team members. Ability to break down technical concepts and create effective learning materials.
  • DevOps Tools & Automation: Proficient with Buildkite, GitHub Actions, Jenkins, Ansible, and scripting for streamlining DevOps workflows.
  • Containerization & Orchestration: Strong experience with Docker, Kubernetes, and Helm for deploying and managing scalable, containerized applications.
  • Infrastructure as Code (IaC): Hands-on experience automating infrastructure provisioning and configuration to ensure reproducibility and scalability across environments.
  • GPU-Based Compute Environments: Familiarity with GPU server lifecycle management, ROCm/CUDA toolchains, and integration of GPU resources into CI test workflows for performance-critical ML applications.
  • Monitoring & Observability: Experience using tools like Checkmk, Prometheus, and Grafana to monitor infrastructure health and application performance.
  • Version Control & Collaboration: Advanced knowledge of Git-based version control, including branching strategies and CI/CD integration for collaborative development.
  • Linux & System Administration: Solid background in Linux environments, including shell scripting and system-level troubleshooting across distributed systems.
  • Agile & Cross-Disciplinary Collaboration: Comfort working in Agile teams and partnering with software, infrastructure, and product teams to drive consistent delivery and innovation.

ACADEMIC CREDENTIALS:

Bachelor's or Master's degree in Computer Science, Software Engineering, or related technical discipline.

Benefits offered are described: AMD benefits at a glance.

AMD does not accept unsolicited resumes from headhunters, recruitment agencies, or fee-based recruitment services. AMD and its subsidiaries are equal opportunity, inclusive employers and will consider all applicants without regard to age, ancestry, color, marital status, medical condition, mental or physical disability, national origin, race, religion, political and/or third-party affiliation, sex, pregnancy, sexual orientation, gender identity, military or veteran status, or any other characteristic protected by law. We encourage applications from all qualified candidates and will accommodate applicants’ needs under the respective laws throughout all stages of the recruitment and selection process.

AMD may use Artificial Intelligence to help screen, assess or select applicants for this position. AMD’s “Responsible AI Policy” is available here.

This posting is for an existing vacancy.
