Talent.com
Software Development Engineer – Software Dev Ops & Continuous Integration Team
Advanced Micro Devices, Inc • MARKHAM, Ontario, Canada
21 days ago
Contract type
  • Full-time
Job description

WHAT YOU DO AT AMD CHANGES EVERYTHING

At AMD, our mission is to build great products that accelerate next-generation computing experiences, from AI and data centers to PCs, gaming and embedded systems. Grounded in a culture of innovation and collaboration, we believe real progress comes from bold ideas, human ingenuity and a shared passion to create something extraordinary. When you join AMD, you’ll discover the real differentiator is our culture. We push the limits of innovation to solve the world’s most important challenges, striving for execution excellence while being direct, humble, collaborative, and inclusive of diverse perspectives. Join us as we shape the future of AI and beyond. Together, we advance your career.

THE ROLE:

The AI/ML Frameworks team is hiring an MTS Software Development Engineer to build and maintain scalable DevOps infrastructure that accelerates AMD’s AI software development. You will design and own CI/CD pipelines, manage Kubernetes-based GPU environments, and automate systems using Python, Go, and Ansible. The role involves creating and maintaining production-grade automation and tooling that enables fast, reliable software delivery across teams.

THE PERSON:

The ideal candidate is a skilled DevOps/infrastructure engineer with strong programming abilities. They write clean, maintainable code in Python or Go, and can navigate ML framework source code (PyTorch, TensorFlow, ROCm) to debug issues, optimize build processes, or contribute fixes. They have solid knowledge of build systems and toolchains: understanding how CMake, Bazel, and compiler toolchains work is critical for effective issue triaging and root cause analysis. They are proficient in Kubernetes, CI/CD tools, and infrastructure automation frameworks such as Ansible. Familiarity with C++ is valuable for navigating lower-level framework components. This person thrives in collaborative, fast-paced environments, can drive technical execution with minimal oversight, and is passionate about knowledge sharing and upleveling their team.

KEY RESPONSIBILITIES:

  • Build System Expertise & Issue Triaging: Develop deep expertise in build tools and flows (CMake, Bazel, Make, compiler toolchains). Triage complex build failures by understanding the full build pipeline, from source to binary. Identify root causes across infrastructure, toolchain, and code-level issues.
  • Team Training & Knowledge Sharing: Train and mentor team members on build systems, CI/CD workflows, and debugging techniques. Create documentation, runbooks, and training sessions to ensure the team can effectively triage issues independently. Foster a culture of continuous learning around build infrastructure.
  • ML Framework Integration & Code Contribution: Understand the architecture and codebase of ML frameworks (PyTorch, TensorFlow, ROCm stack). Review, debug, and contribute code changes as needed to resolve build issues, improve CI reliability, or support new features.
  • Tooling & Automation Development: Design and develop internal tools, automation scripts, and services primarily in Python and Go. Write well-tested, production-grade code to solve infrastructure and workflow challenges.
  • CI/CD Pipeline Development: Design, implement, and manage efficient continuous integration and delivery pipelines using Buildkite, GitHub Actions, and Jenkins to enable rapid and reliable software deployment for ML workloads.
  • Kubernetes Infrastructure Management: Deploy and maintain robust Kubernetes-based environments across both on-premises and cloud platforms to support scalable service orchestration.
  • Infrastructure Automation: Automate provisioning, configuration, and management of infrastructure using Ansible, Python, and Bash to improve system consistency and reduce manual intervention.
  • Service Deployment with Helm: Administer application and service deployment in Kubernetes using Helm charts for consistent and repeatable release processes.
  • GPU Server Support: Configure, manage, and maintain GPU-based compute environments, including lifecycle automation and hardware-level test integration for ML training and inference workloads.
  • Database and Observability Integration: Interact with MySQL databases to support dynamic data updates and integrate data sources into Grafana dashboards for monitoring and insights.
  • Cross-Functional Collaboration: Work closely with ML framework developers, SREs, and project stakeholders to ensure system-level alignment and high-impact delivery.
  • Quality Assurance Enablement: Integrate automated testing frameworks into CI pipelines to ensure code quality, stability, and performance across development cycles.

PREFERRED EXPERIENCE:

  • Build Systems & Toolchains: Strong understanding of CMake, Bazel, Make, and compiler toolchains (GCC, Clang, LLVM). Ability to debug complex build failures, understand dependency resolution, and optimize build performance.
  • Programming Languages: Strong proficiency in Python and Go for building tools, services, and automation. The ability to read and modify C++ code is a plus for working with ML framework internals and build configurations.
  • ML Framework Familiarity: Understanding of ML framework architecture (PyTorch, TensorFlow, JAX, or similar). Ability to navigate large codebases, understand their build systems, and contribute fixes or improvements.
  • Mentorship & Training: Experience documenting complex systems and training team members. Ability to break down technical concepts and create effective learning materials.
  • DevOps Tools & Automation: Proficient with Buildkite, GitHub Actions, Jenkins, Ansible, and scripting for streamlining DevOps workflows.
  • Containerization & Orchestration: Strong experience with Docker, Kubernetes, and Helm for deploying and managing scalable, containerized applications.
  • Infrastructure as Code (IaC): Hands-on experience automating infrastructure provisioning and configuration to ensure reproducibility and scalability across environments.
  • GPU-Based Compute Environments: Familiarity with GPU server lifecycle management, ROCm/CUDA toolchains, and integration of GPU resources into CI test workflows for performance-critical ML applications.
  • Monitoring & Observability: Experience using tools like Checkmk, Prometheus, and Grafana to monitor infrastructure health and application performance.
  • Version Control & Collaboration: Advanced knowledge of Git-based version control, including branching strategies and CI/CD integration for collaborative development.
  • Linux & System Administration: Solid background in Linux environments, including shell scripting and system-level troubleshooting across distributed systems.
  • Agile & Cross-Disciplinary Collaboration: Comfort working in Agile teams and partnering with software, infrastructure, and product teams to drive consistent delivery and innovation.

ACADEMIC CREDENTIALS:

Bachelor's or Master's degree in Computer Science, Software Engineering, or related technical discipline.

Benefits offered are described: AMD benefits at a glance.

AMD does not accept unsolicited resumes from headhunters, recruitment agencies, or fee-based recruitment services. AMD and its subsidiaries are equal opportunity, inclusive employers and will consider all applicants without regard to age, ancestry, color, marital status, medical condition, mental or physical disability, national origin, race, religion, political and/or third-party affiliation, sex, pregnancy, sexual orientation, gender identity, military or veteran status, or any other characteristic protected by law. We encourage applications from all qualified candidates and will accommodate applicants’ needs under the respective laws throughout all stages of the recruitment and selection process.

AMD may use Artificial Intelligence to help screen, assess or select applicants for this position. AMD’s “Responsible AI Policy” is available here. This posting is for an existing vacancy.

