Talent.com
Sr. Systems Design Engineer - Data Center GPU
No longer accepting applications
AMD • Markham, York Region, CA
30+ days ago
Job type
  • Full-time
Job description

At AMD, our mission is to build great products that accelerate next-generation computing experiences—from AI and data centers, to PCs, gaming and embedded systems. Grounded in a culture of innovation and collaboration, we believe real progress comes from bold ideas, human ingenuity and a shared passion to create something extraordinary. When you join AMD, you’ll discover the real differentiator is our culture.

THE ROLE

We are looking for a dynamic, energetic Senior Systems Design Engineer to join our growing Data Center GPU team. In this role, you will work closely with the automation, infrastructure, and validation teams to ensure scalability and reliability. You will also document processes and best practices and provide training for internal teams.

THE PERSON

As a Systems Design Engineer, you will drive balanced, scalable, and automated solutions. In this high‑visibility position, your software systems engineering expertise will be essential to product development, definition, and root‑cause resolution. You will have strong problem‑solving and debugging skills, excellent communication and collaboration abilities, and the ability to work in fast‑paced, cross‑functional environments.

KEY RESPONSIBILITIES

Containerization & Image Management

  • Design, build, and maintain Docker images optimized for ML/AI workloads.
  • Implement multi‑stage builds, image hardening, and vulnerability scanning.
  • Manage Docker registries (e.g., Harbor) and enforce retention policies for large‑scale deployments.

Automation & Orchestration

  • Develop and maintain Python‑based automation scripts for Conductor workflows.
  • Implement CI/CD pipelines for automated container builds and workload deployment.
  • Integrate orchestration frameworks (Conductor, Kubernetes, Slurm) for multi‑node workload execution.

ML/AI Workload Enablement

  • Enable training and inference workloads using frameworks such as PyTorch, TensorFlow, and vLLM.
  • Optimize distributed training and inference across multi‑node clusters using MPI and RDMA.
  • Collaborate with app experts to benchmark and tune performance for AI/HPC workloads.

Infrastructure & Performance

  • Integrate the ROCm stack and GPU resource management into containerized environments.
  • Troubleshoot latency, networking, and storage bottlenecks for at‑scale workloads.
  • Implement monitoring and logging for containerized ML workloads.

PREFERRED EXPERIENCE

  • Strong proficiency in Python and automation frameworks.
  • Hands‑on experience with Docker and container orchestration (Kubernetes, Podman).
  • Familiarity with CI/CD tools (Jenkins, GitHub Actions) and infrastructure‑as‑code (Terraform, Ansible).
  • Knowledge of ML frameworks (PyTorch, TensorFlow) and GPU acceleration (ROCm, CUDA).
  • Understanding of networking concepts (RDMA, MPI) for distributed workloads.
  • Prior experience enabling ML/AI workloads in production or HPC environments.
  • Exposure to orchestration platforms such as Conductor or similar workflow engines.

ACADEMIC CREDENTIALS

  • Bachelor's or Master's degree in electrical or computer engineering, with a minimum of 5‑7 years of relevant experience.

LOCATION

Markham, ON

BENEFITS

Benefits offered are described: AMD benefits at a glance.

BASE PAY RANGE

$116,000.00/yr – $174,000.00/yr

LEGAL STATEMENTS

AMD does not accept unsolicited resumes from headhunters, recruitment agencies, or fee‑based recruitment services. AMD and its subsidiaries are equal‑opportunity, inclusive employers and will consider all applicants without regard to age, ancestry, color, marital status, medical condition, mental or physical disability, national origin, race, religion, political and/or third‑party affiliation, sex, pregnancy, sexual orientation, gender identity, military or veteran status, or any other characteristic protected by law. We encourage applications from all qualified candidates and will accommodate applicants’ needs under the respective laws throughout all stages of the recruitment and selection process.

Seniority Level: Mid‑Senior level

Employment Type: Full‑time

Job Function: Semiconductor Manufacturing
