Talent.com
Infrastructure Engineer – GPU Test Automation Farm
No longer accepting applications

Advanced Micro Devices, Inc • MARKHAM, Ontario, Canada
4 days ago
Job type
  • Full-time
Job description

WHAT YOU DO AT AMD CHANGES EVERYTHING

At AMD, our mission is to build great products that accelerate next-generation computing experiences, from AI and data centers to PCs, gaming and embedded systems. Grounded in a culture of innovation and collaboration, we believe real progress comes from bold ideas, human ingenuity and a shared passion to create something extraordinary. When you join AMD, you’ll discover the real differentiator is our culture. We push the limits of innovation to solve the world’s most important challenges, striving for execution excellence while being direct, humble, collaborative, and inclusive of diverse perspectives. Join us as we shape the future of AI and beyond. Together, we advance your career.

THE ROLE:

AMD is looking for a highly skilled and experienced systems deployment architect to design, plan, and lead the deployment of a large-scale GPU test automation farm in a datacenter-style environment. This individual will translate AMD’s test and validation vision into a robust, modular, and scalable infrastructure capable of supporting continuous integration and validation for next-generation products.

THE PERSON:

The ideal candidate combines deep technical expertise in infrastructure design with hands-on experience building large compute farms and automation systems, and has a strong understanding of datacenter operational constraints. The candidate can demonstrate strong architectural judgment, operational discipline, and a practical understanding of the technologies that enable scalable infrastructure.

KEY RESPONSIBILITIES:

  • Architect and design a distributed, large-scale GPU test automation farm optimized for performance, scalability, and reliability.
  • Lead the deployment and operation of infrastructure in datacenter-like environments, ensuring compliance with standards for power, cooling, networking, and management systems.
  • Define and enforce best practices for system configuration, monitoring, and fault tolerance to ensure high availability and performance.
  • Collaborate with cross-functional teams (QA, IT, software, datacenter ops, and engineering) to deliver seamless test workflows and system integration.
  • Evaluate and implement technologies that improve deployment efficiency, system observability, and scalability (containerization, virtualization, orchestration, MaaS, etc.).
  • Mentor engineers in infrastructure design principles and contribute to the overall architectural vision of AMD’s GPU validation environment.

PREFERRED EXPERIENCE:

  • Proven expertise in GPU or HPC cluster environments, including system provisioning, scheduling, and performance tuning.
  • Expert background in Windows and Linux administration, including automation tools and scripting.
  • Experience with automation frameworks (Ansible, Terraform, etc.) and CI/CD pipelines for infrastructure deployment.
  • Hands-on experience with MaaS (Metal-as-a-Service) platforms for large-scale bare-metal provisioning.
  • Knowledge of Network Boot (PXE, iPXE, UEFI) configurations and automation.
  • Experience building or integrating inventory health management systems, including real-time monitoring of servers, network devices, and supporting services.
  • Skilled in space allocation and racking strategies in datacenter or lab environments.
  • Deep understanding of power planning for dense compute environments.
  • Experience with network design and topology optimization for high-throughput data paths.

ACADEMIC CREDENTIALS:

  • Bachelor’s or Master’s degree in Computer Science, Computer Engineering, Electrical Engineering, or equivalent

LOCATION:

Markham, Ontario, Canada

#LI-PA1 #LI-HYBRID

Benefits offered are described in AMD benefits at a glance.

AMD does not accept unsolicited resumes from headhunters, recruitment agencies, or fee-based recruitment services. AMD and its subsidiaries are equal opportunity, inclusive employers and will consider all applicants without regard to age, ancestry, color, marital status, medical condition, mental or physical disability, national origin, race, religion, political and/or third-party affiliation, sex, pregnancy, sexual orientation, gender identity, military or veteran status, or any other characteristic protected by law. We encourage applications from all qualified candidates and will accommodate applicants’ needs under the respective laws throughout all stages of the recruitment and selection process.

AMD may use Artificial Intelligence to help screen, assess or select applicants for this position. AMD’s “Responsible AI Policy” is available here. This posting is for an existing vacancy.



Similar jobs
Sr. Electrical Engineer, Test Automation
Tesla • Richmond Hill
Full-time
Last updated: 30+ days ago • Promoted
Hybrid Technical Test Automation Lead
Hard Rock Games • Toronto
Full-time
A leading gaming company in Toronto seeks a Technical Test Automation Lead to enhance testing and release confidence across engineering. This role involves automating processes, defining testing sta...
Last updated: 12 days ago • Promoted
Infrastructure Architect – GPU Test Automation Farm
Advanced Micro Devices • Markham
Full-time
WHAT YOU DO AT AMD CHANGES EVERYTHING. At AMD, our mission is to build great products that accelerate next-generation computing experiences, from AI and data centers to PCs, gaming and embedded syst...
Last updated: 30+ days ago • Promoted
Walmart Jobs in Keswick Now Hiring
GREAT PAY $17-42 p/h • Keswick, Canada
Full-time
Ready to tackle a fun and rewarding career? There are Walmart job openings in your area. Apply today and find the job that you have been looking for!
Last updated: 1 day ago • Promoted
Full-Stack Infra Engineer: Diagnostics & Automation
AMD • Markham
Full-time
A leading technology company in Markham, ON, is seeking an innovative software engineer to develop and enhance test automation infrastructure for its state-of-the-art products. The ideal candidate h...
Last updated: 23 days ago • Promoted
Power Integrity Engineer: RTL, PPA & SoC Innovation
Arm • Toronto
Full-time
A leading global technology company is looking for a Power Analysis Engineer in Toronto, Ontario. This role is crucial in optimizing performance per watt for innovative solutions. Candidates should h...
Last updated: 30+ days ago • Promoted
RAN QA Engineer II | LTE/5G Testing & Automation
Intello Technologies Inc. • Toronto
Full-time
A leading technology company is seeking a Design Specialist II in Toronto to support testing LTE/5G hardware and software. Ideal candidates will have 3-5+ years of relevant experience and solid wire...
Last updated: 30+ days ago • Promoted
Trigonometry Private Tutoring Jobs Georgina
Superprof • Georgina, Canada
Full-time +1
Superprof is Canada's #1 tutoring platform, and we're actively recruiting passionate tutors! Whether you're a student, a professional, or simply someone who loves teaching, join the largest communi...
Last updated: 30+ days ago • Promoted
Quality Engineer II, Test Automation Leader
TD • Toronto
Full-time
A leading financial institution in Toronto is seeking a Test Automation Lead to develop and implement effective test strategies and automation frameworks. This role requires 3-5 years of relevant ex...
Last updated: 25 days ago • Promoted
S0i3/PMM Emulation Engineer
TekWissen ® • Markham
Full-time
This range is provided by TekWissen ®. Your actual pay will be based on your skills and experience; talk with your recruiter to learn more. TekWissen is a global wor...
Last updated: 30+ days ago • Promoted
Railway Systems Testing & Commissioning Engineer
ALSTOM Gruppe • Toronto
Full-time
A leading transportation technology firm in Toronto seeks a Systems Testing & Commissioning Engineer to oversee integration and validation activities for railway projects. The role requires a minimu...
Last updated: 30+ days ago • Promoted
Geometry Private Tutoring Jobs Georgina
Superprof • Georgina, Canada
Full-time +1
Superprof is Canada's #1 tutoring platform, and we're actively recruiting passionate tutors! Whether you're a student, a professional, or simply someone who loves teaching, join the largest communi...
Last updated: 30+ days ago • Promoted
Senior Test Automation Engineer - Cloud & AWS
Align Technology, Inc. • Toronto
Full-time
A global leader in innovative health technologies is seeking a Sr. Software Developer in Test to enhance the cloud services platform for their product. You will work closely with clinical experts and...
Last updated: 30+ days ago • Promoted
Earn money testing apps - Remote
Almedia • Georgina, Ontario, Canada
Remote
Full-time
Get paid for testing apps, games and surveys. Almedia runs a dynamic platform where users earn money online by completing tasks, playing games, and filling out surveys. Since our launch 5 years ago, ...
Last updated: 30+ days ago • Promoted
Power Analysis Engineer – Hybrid PPA & Silicon
Arm Limited • Toronto
Full-time
A global leader in chip design seeks an experienced Power Analysis Engineer in Toronto, Canada. This role involves utilizing innovative techniques to enhance power efficiency in next generation solu...
Last updated: 30+ days ago • Promoted
Senior GPU Driver Engineer (Linux) – Growth & Innovation
Huawei Canada • Markham
Full-time +1
A leading technology company in Canada is seeking a Senior Developer - GPU Driver for a 12-month contract. The ideal candidate will develop and maintain GPU drivers on Linux platforms while collabor...
Last updated: 30+ days ago • Promoted
Donate your Eggs - Be an Egg Donor – Help Create Families in Canada with IndianEggDonors
Surrogacy4All • Georgina
Full-time +1
Are you a kind-hearted woman who wants to help others experience the joy of parenthood? Health Canada’s Assisted Human Reproduction Act. Reimbursement of allowable, approved expenses. A safe, support...
Last updated: 20 days ago • Promoted
Lead Power Architect - GPU Platform (Hybrid)
Advanced Micro Devices, Inc. • Markham
Full-time
A leading semiconductor company is seeking a Lead Principal Power Architect to influence the next generation of GPU platform power delivery. The role requires defining and developing power architect...
Last updated: 9 days ago • Promoted