Talent.com
System Level Debug Engineer- Data Center GPU

Advanced Micro Devices, Inc • MARKHAM, Ontario, Canada
21 days ago
Job type
  • Full-time
Job description

WHAT YOU DO AT AMD CHANGES EVERYTHING

At AMD, our mission is to build great products that accelerate next-generation computing experiences, from AI and data centers to PCs, gaming and embedded systems. Grounded in a culture of innovation and collaboration, we believe real progress comes from bold ideas, human ingenuity and a shared passion to create something extraordinary. When you join AMD, you’ll discover the real differentiator is our culture. We push the limits of innovation to solve the world’s most important challenges, striving for execution excellence while being direct, humble, collaborative, and inclusive of diverse perspectives. Join us as we shape the future of AI and beyond. Together, we advance your career.

THE TEAM:

AMD's Data Center GPU organization is transforming the industry with our AI-based graphics processors. Our primary objective is to design exceptional products that drive the evolution of computing experiences, serving as the cornerstone for enterprise data centers, artificial intelligence (AI), HPC and embedded systems. If this resonates with you, come and join our Data Center GPU organization, where we are building amazing AI-powered products with amazing people.

THE ROLE:

AMD is looking for a systems engineer to provide thought leadership and subject matter expertise to our growing team. As a key contributor, you will have a strong technical background to contribute to all aspects of the software development process. We have competitive benefit packages and an award-winning culture. Join us!

The Datacenter Graphics and Accelerated Computing (DCGPU) organization is looking for an experienced system level debug engineer. The individual will be part of a team that has to bring up and validate the platform being used and ensure it is fully validated, including electrical, power, networking and SoC. The individual will be required to lead and document the plan for validating the system itself, as well as document any unique steps needed to enable it. The individual will need to be able to drive any issues encountered to root-cause closure and communicate with the different functional and IP layers for resolution.

THE PERSON:

You are a highly motivated, hands-on leader with a strong development background, a problem-solving mentality, excellent communication skills, and the ability to prioritize tasks, along with a willingness to learn and adapt. You have excellent teamwork skills and are capable of leading a highly technical team. Experience in debugging complex HW / FW issues is a must; you understand the flow of a GPU through the different layers of a system and can validate the items connecting to the GPU SoC (PCIe, VRs, RMs, retimers, HBM, internal networking). Communication is essential in working with the different owners of the functional code stack, as is the ability to drive issues via phone calls, chat messages and e-mails. Hands-on experience with hardware in a data center environment is required.

KEY RESPONSIBILITIES:
  • Act as debug / triage engineer, with an understanding of industry tools for root-causing complex issues
  • Understand GPU / system level HW and SW flow
  • Probe parts of a board, check electrical and power currents, and validate a system
  • Provide leadership for driving issues to root cause
  • Communicate / document flows and methods of bring-up, boot-up, system initialization and debug
  • Lead technical presentations demonstrating a good understanding of application, data, infrastructure, architecture expertise and application systems design
  • Collaborate with application and infrastructure architects and be responsible for defining, designing and delivering the technical architectures, patterns, technical quality, risks, fitness for purpose and operability of technical architecture solutions
  • Be a leader and mentor to the operation team; be hands-on and lead by example
  • Troubleshoot and solve technical issues hands-on; own the problem and drive for resolution
  • Proactively support a team culture that fosters knowledge sharing, excellence, and collaboration

PREFERRED EXPERIENCE:
  • Experience in SoC and / or system debug of complex issues
  • Develop / document debug capabilities on a given SoC and system
  • Collaborate with internal teams on root-causing issues and finding optimum resolutions
  • Hands-on experience using industry debug tools and scopes, as well as examining board level power
  • Proven experience with C / C++
  • Demonstrable experience in facilitating Agile, Scrum or Kanban
  • Skilled in scripting languages such as Perl, Ruby, and Shell script
  • Proficient with revision control (Git, SVN and CVS)
  • Good balance of hardware, architecture, and software expertise
  • Proven ability to drive resolution of critical problems within a lab or data center
  • Relationships with external customers / partners, with the ability to help resolve problems in their data centers
  • Relationships with external customers / partners, with the ability to work through manufacturing issues / failures
  • Relationships with external customers / partners, with the ability to define requirements for manufacturing validation

ACADEMIC CREDENTIALS:
  • Bachelor's or Master's degree in electrical or computer engineering

LOCATION: Markham, ON

#LI-HYBRID #LI-SL2

Benefits offered are described: AMD benefits at a glance.

AMD does not accept unsolicited resumes from headhunters, recruitment agencies, or fee-based recruitment services. AMD and its subsidiaries are equal opportunity, inclusive employers and will consider all applicants without regard to age, ancestry, color, marital status, medical condition, mental or physical disability, national origin, race, religion, political and/or third-party affiliation, sex, pregnancy, sexual orientation, gender identity, military or veteran status, or any other characteristic protected by law. We encourage applications from all qualified candidates and will accommodate applicants’ needs under the respective laws throughout all stages of the recruitment and selection process.

AMD may use Artificial Intelligence to help screen, assess or select applicants for this position. AMD’s “Responsible AI Policy” is available here. This posting is for an existing vacancy.

Similar jobs
Data Center Infrastructure Engineer

Astra North Infoteck Inc. • Markham, ON, CA
Full-time
Ready to work on rotation [shifts on need basis]. Educational Requirements / Qualifications: High school pass out – Minimum. Should know basic computer Hardware asset knowledge. Has worked in data c...
Last updated: 13 days ago
CBTC System Design Engineer – Intermediate (Hybrid)

Hitachi Vantara Corporation • Toronto
Full-time
A leading technology solutions company is seeking a System Design Specialist - Intermediate in Toronto, Ontario. This role involves system design and modeling for communication-based train control s...
Last updated: 3 days ago • Promoted
Senior Data Centre Infra Lead: Uptime & Virtualization

Rubicon Path • Toronto
Full-time
A technology solutions provider is seeking a Senior Infrastructure Integration Specialist in Toronto, Canada. The role supports Data Centre operations by ensuring service commitments are met and inv...
Last updated: 30+ days ago • Promoted
AI Data Center DevOps Architect — Hybrid, Scalable

Tenstorrent • Toronto
Full-time
A leading AI technology firm in Toronto is seeking a DevOps Architect to define the architecture for AI cluster control planes, focusing on scalability and security in multi-megawatt data centers. T...
Last updated: 1 day ago • Promoted
Revenue Systems Engineer: GTM Data & Integrations Lead

ApprovalMax • Toronto
Full-time
A leading financial technology firm in Toronto is seeking a Revenue Systems Engineer to join their Revenue Operations team. This hybrid position requires expertise in HubSpot CRM, API integrations u...
Last updated: 3 days ago • Promoted
Data Center Infrastructure Lead: Technical Expert

Yondr Group • Toronto
Full-time
A dynamic tech services provider in Toronto is seeking a Technical SME to oversee electrical, controls, and mechanical assets in critical data center environments. The role involves ensuring 100% up...
Last updated: 3 days ago • Promoted
Lead Developer, Design System

Citco GSGS • Toronto
Full-time
Citco is a global leader in fund services, corporate governance and related asset services with staff across 80 offices worldwide. With more than $1 trillion in assets under administration, we deliv...
Last updated: 4 days ago • Promoted
Senior Data Center Infrastructure Engineer

NEXTOLOGIES • Markham
Full-time
Physical infrastructure, networking, power, cooling, and video transport). Act as senior escalation point for data center infrastructure incidents, physical network failures, and complex on-site/col...
Last updated: 30+ days ago • Promoted
Data Center Services Lead

Capgemini • Toronto
Full-time
Job Description - Data Center Services Lead (054357). A global leader in consulting, technology services and digital transformation, Capgemini is at the forefront of innovation to address the entire...
Last updated: 30+ days ago • Promoted
Senior System Engineer - Aversan Inc.

Aversan Inc. • Toronto, ON, CA
Full-time
Aversan delivers leading-edge and reliable safety-critical electronics and software systems to the aerospace, defence, and space industries. We are currently seeking a qualified Senior Control Syste...
Last updated: 1 day ago • Promoted
CCaaS Solutions Engineer (AI-Driven Contact Center)

Scotiabank • Toronto
Full-time
A leading financial institution in Canada seeks a highly skilled Solution Engineer to design and implement innovative Contact Centre as a Service (CCaaS) solutions. Responsibilities include collabor...
Last updated: 30+ days ago • Promoted
Applied Data Center Design Engineer: Rack & Power

Cerebras Systems • Toronto
Full-time
A leading AI technology company in Toronto seeks an Applied Data Center Design Engineer to translate high-level designs into operational blueprints for data center architecture. The ideal candidate ...
Last updated: 30+ days ago • Promoted
Senior 5G/LTE Wireless Systems Engineer

W3Global • Markham
Full-time
A leading telecommunications company is seeking a Senior System Engineer in York Region, Markham. The ideal candidate should have extensive expertise in 5GNR and LTE technologies, with hands-on expe...
Last updated: 3 days ago • Promoted
AWS Engineer- Team Lead

Bevertec • Toronto
Full-time
Cloud and Software Engineering Team Lead. Infrastructure, Cloud and Software Engineering. AWS infra with relevant services. IAM, EC2, EKS, Lambda, API Gateway, S3, RDS, KMS, VPC, Cognito, Config, SCP,...
Last updated: 3 days ago • Promoted
Technical Sales Lead, Thermal Solutions (Data Center)

Vertiv • Toronto
Full-time
A global leader in critical infrastructure technologies is seeking a Technical Sales Representative in Toronto, Ontario. The role involves managing sales activities with key accounts in the data cen...
Last updated: 3 days ago • Promoted
Senior Site Reliability Engineer - AI-Driven Cloud Ops

RBC • Toronto
Full-time
A leading financial institution in Toronto is seeking a Lead Site Reliability Engineer to manage cloud environments and lead a team of SREs. This role involves setting up automation, ensuring applic...
Last updated: 3 days ago • Promoted
Senior Network Engineer - Data Center & Cloud Focus

Swagher • Toronto
Full-time
A leading financial institution in Toronto seeks a Senior Associate Network Engineer. You will support complex network solutions and work with cutting-edge cloud technologies. Ideal candidates have o...
Last updated: 30+ days ago • Promoted
System Engineer - Signaling

Alstom • Toronto
Full-time +1
At Alstom, we understand transport networks and what moves people. From high-speed trains, metros, monorails, and trams, to turnkey systems, services, infrastructure, signalling and digital mobility...
Last updated: 3 days ago • Promoted