Talent.com
Thermal Attainment Engineer - Data Center GPU
Advanced Micro Devices • Toronto, Canada
No longer accepting applications
2 days ago
Job type
  • Full-time
Job description
WHAT YOU DO AT AMD CHANGES EVERYTHING

At AMD, our mission is to build great products that accelerate next-generation computing experiences—from AI and data centers, to PCs, gaming and embedded systems. Grounded in a culture of innovation and collaboration, we believe real progress comes from bold ideas, human ingenuity and a shared passion to create something extraordinary. When you join AMD, you’ll discover the real differentiator is our culture. We push the limits of innovation to solve the world’s most important challenges—striving for execution excellence, while being direct, humble, collaborative, and inclusive of diverse perspectives. Join us as we shape the future of AI and beyond.

Together, we advance your career.

THE ROLE

The successful candidate will assume responsibility for silicon- and system-level electrical validation, maximizing performance/watt through power management feature tuning and optimization, power model correlation, and prototyping activities for AMD Datacenter GPU products. You will interact with key System and Power Management Architects, Firmware teams, Power Modelling Leads, Board Design & Validation Engineers, Performance Engineers, and Customer Engineering teams, as well as the Product Definition team, to achieve the desired product goals.

THE PERSON

The ideal person is passionate about technology and building great products. You have strong power fundamentals, an understanding of power management features and hands-on experience optimizing them, silicon validation experience, and a good understanding of PID controllers, PDN, physical design, semiconductor process, thermal/power interactions, and power/performance optimization, along with a background in power models. You have excellent communication, critical problem-solving, and data analysis and visualization skills, and you can multitask and work with cross-domain, cross-functional teams to build state-of-the-art HPC & AI products. You must be a self-starter and a strong team collaborator, able to independently drive tasks to completion while working with cross-functional teams. You are hands-on, willing to interact directly with the hardware and use scopes and probes to gather detailed electrical information. You have automation, scripting, data processing and analysis skills. You have the leadership and mentorship skills to lead, train and groom college grads and junior engineers on the team.

KEY RESPONSIBILITIES

Execute Power Attainment test plans in post‑silicon phases in support of Data Center GPU product roadmap optimizing for power, perf/watt and performance.

Configure and set up ML/AI Datacenter GPU systems for data collection, experiments and test plan execution

Use lab equipment such as oscilloscopes, high-speed probes, function generators and data acquisition equipment to gather the electrical characterization data required for power and performance optimization.

Actively participate in analysis of collected post-silicon performance and power data to ensure the integrity of results, summarize results and conclusions, and drive productization of features

Analyze and debug interactions between various power management features

Analyze data from workload or execution output datalogs using Excel or JMP analysis tools, either manually or with purpose-built automation

Execute ROI analysis of power management features and provide feedback to power management architecture team.

Support prototyping experiments and development of new GPU features that impact performance and power

Electrically stress the system, validate the limits of ASIC and system/board components and optimize settings for stability and performance.

Troubleshoot system‑level issues that may occur in test environments and platforms

Proactively drive continuous improvement for post‑silicon power attainment activities

Participate in developing the automation environment: write scripts that automate workloads and extend execution capabilities in Linux, Python and other supporting software tools

Work in a fast‑paced resource constrained environment to build top of the line HPC & AI GPU products

Provide Technical leadership for electrical validation and power optimization in datacenter platforms.

Take part in team building; develop and mentor junior engineers into the technical leads of the future

Drive process efficiencies, automation and AI for debug and analysis.

Provide weekly readouts to executives on progress, blockers and next steps.

Work with Rack & Cluster teams to develop and execute E2E electrical validation test plans and build electrically robust, reliable, stable and performant systems.

Debug customer issues: collaborate with L1 and L2 support and with customers to design DOEs that isolate the problem, and provide a fix.

PREFERRED EXPERIENCE

7 years of hands-on experience as an engineer in the semiconductor industry.

Demonstrated ability to execute and deliver multiple projects in a timely fashion.

Prioritizing work items in a fast‑paced environment and escalating as necessary.

Excellent grasp of computer organization/architecture, GPU architecture and power management

Knowledge of power-limited performance methodologies and control theory

Extensive experience in platform optimization. Solid knowledge of Computer I/O.

Experience with tools for power and performance analysis

Strong programming skills, scripting experience in Python preferred

Familiarity with HPC/AI applications and benchmarks would be a big plus.

Proficiency in the Linux command-line environment and shell scripting is desirable

Deep knowledge of power management techniques such as deep sleep, clock gating, P-states, etc.

Experience with container technologies (e.g., Docker)

Strong analytical and problem-solving skills with keen attention to detail

Experience in data analysis, summarization, and presentation

Excellent presentation and communication skills

Experience using and debugging lab tools such as oscilloscopes, DAQs and power measurement equipment

Experience working in Windows and Linux environments

Experience working in data center environments, knowledge of boards, systems, racks, clusters and building large electrically stable systems.

ACADEMIC CREDENTIALS

Bachelor's degree in Computer Engineering, Electrical Engineering, or Computer Science. MS preferred.

LOCATION

Markham, ON

#LI-HYBRID

Benefits offered are described: AMD benefits at a glance.

AMD does not accept unsolicited resumes from headhunters, recruitment agencies, or fee‑based recruitment services. AMD and its subsidiaries are equal opportunity, inclusive employers and will consider all applicants without regard to age, ancestry, color, marital status, medical condition, mental or physical disability, national origin, race, religion, political and/or third‑party affiliation, sex, pregnancy, sexual orientation, gender identity, military or veteran status, or any other characteristic protected by law. We encourage applications from all qualified candidates and will accommodate applicants’ needs under the respective laws throughout all stages of the recruitment and selection process.

AMD may use Artificial Intelligence to help screen, assess or select applicants for this position. AMD’s “Responsible AI Policy” is available here.

This posting is for an existing vacancy.
