Talent.com
Infrastructure Engineer: GPU Fleet (HPC) - Alpha Compute

Alpha Compute • Oshawa, ON, CA
13 hours ago
Job type
  • Full-time
Job description

Infrastructure Engineer: GPU Fleet (HPC)


About the Company

Alpha Compute Corp. (NASDAQ: ALP), formerly AlphaTON Capital Corp. (NASDAQ: ATON), is a technology leader in AI GPU-as-a-service (GPUaaS) and AI Confidential Compute. Alpha Compute builds and operates businesses at the intersection of confidential compute, artificial intelligence, and digital assets. The Company’s GPU assets deliver privacy-preserving computation to partners and applications including Telegram, Animoca Brands, GAMEE, and Midnight Network.


About the Role

Alpha Compute is scaling the next generation of AI infrastructure. We are seeking a Lead GPU Infrastructure Engineer to architect and own the lifecycle of our high-density GPU fleet (H200, B200, and B300). You will not be inheriting legacy systems; you will be building the software-defined systems that deliver enterprise-grade availability for massive production AI training workloads.

Visit https://www.alphacompute.ai/


Core Responsibilities

  • Fleet Architecture & Lifecycle: Own the end-to-end health of our H200, B200, and B300 nodes. You are responsible for the “Day 0” to “Day N” lifecycle—from firmware validation and bare-metal provisioning to decommissioning.
  • Thermal & Power Management: Lead the operational oversight of high-density liquid-cooled environments. Monitor CDU (Coolant Distribution Unit) health and secondary loop telemetry alongside GPU thermals for racks drawing 120 kW or more.
  • Auto-Remediation & Observability: Architect a telemetry stack using Prometheus, Grafana, and NVIDIA DCGM that doesn’t just alert you to issues, but actively triggers automated remediation (e.g., automated node draining, reboots, and health validation) for common hardware regressions.
  • NetBox Integration: Own the migration of our inventory to NetBox DCIM. Build the API integrations that make NetBox the undisputed, authoritative source of truth for asset tracking, IPAM, and cabling for our compliance audits.
  • Vendor & Operator Authority: Serve as the primary technical interface for third-party facility operators and MSPs. Set the bar for SLA/KPI compliance, lead technical post-mortems, and manage escalations for cluster-level outages.
  • Commercial Support: Serve as the technical authority on enterprise deal cycles, supporting the Sales team with capacity planning, infrastructure deep-dives, and technical reviews for top-tier clients.
  • On-Call Leadership: Participate in a 24/7 on-call rotation. This role carries primary accountability for fleet availability and incident response.


Technical Requirements

  • HPC & GPU Pedigree: Extensive experience managing large-scale HPC environments or production GPU fleets at a hyperscaler, neocloud, or top-tier research facility.
  • Hopper & Blackwell Mastery: Deep, hands-on experience with H200, B200, or B300 systems. You must intimately understand the unique power, thermal, and networking demands of Blackwell-class hardware.
  • Fabric & Interconnects: Expert knowledge of 400G/800G InfiniBand (ConnectX-7 NDR / ConnectX-8 XDR), NVLink, and NVSwitch architectures.
  • Engineering Mindset: Strong Linux internals and proven proficiency in building bulletproof infrastructure automation using Python or Go.
  • Observability: Deep experience deploying and scaling DCGM-based telemetry and SNMP-based environmental monitoring.


Strong Plus

  • Liquid Cooling Experience: Hands-on experience with direct-to-chip liquid cooling (DLC) systems, coolant chemistry management, or immersion cooling.
  • NVIDIA Mission Control: Familiarity with NVIDIA Mission Control for Blackwell-class cluster management.
  • Confidential Compute: Expertise in Intel TDX or NVIDIA RIM attestation flows.
  • Early-Stage Growth: Prior experience as an initial infrastructure hire responsible for building standards from the ground up.

Type: Full-time

Location: Remote, North America (Core working hours must overlap with EST/PST business hours)

Alpha Compute Corp. is an equal opportunity employer. We celebrate diversity and are committed to creating an inclusive environment for all employees.


