About the Company
Our client is a well-funded, venture-backed startup developing next-generation GPU technology. Now in its growth stage, the company is building a world-class engineering team to design high-performance, scalable GPU architectures from the ground up.
This is a rare opportunity to join at a foundational stage and directly shape the direction of cutting-edge silicon.
About the Role
We are seeking an experienced GPU Architect/Designer with strong expertise in shader core architecture and SIMT (Single Instruction, Multiple Threads) execution models. The role involves defining next-generation GPU microarchitecture, optimizing throughput and efficiency, and driving scalable, high-performance parallel compute solutions. The ideal candidate has deep knowledge of GPU shader pipelines, thread scheduling, memory hierarchy, and parallel execution models, along with experience translating architectural concepts into high-quality RTL implementations.
Responsibilities
Architecture & Microarchitecture
- Define and evolve GPU shader core architecture, including SIMT execution units and pipeline design.
- Design warp/wavefront scheduling, thread dispatch, and execution models.
- Architect SIMT execution pipelines, including ALU pipelines, vector units, and control flow units.
- Define thread divergence handling, reconvergence strategies, and branch control mechanisms.
- Develop scalable shader architectures supporting high thread-level parallelism.
- Collaborate on ISA definitions related to shader and compute workloads.
- Analyze shader workloads and identify performance bottlenecks.
- Optimize GPU execution efficiency across diverse workloads, including compute shaders, AI/ML kernels, and other high-performance parallel applications.
- Drive performance-per-watt and area efficiency improvements.
Memory & Interconnect
- Define GPU memory subsystem interactions, including register files, shared/local memory, the L1/L2 cache hierarchy, and memory coalescing mechanisms.
- Optimize memory access scheduling and bandwidth utilization.
- Collaborate on interconnect and memory fabric architecture.
RTL & Design
- Translate architectural specifications into microarchitecture definitions.
- Implement shader pipeline logic in SystemVerilog.
Verification & Validation
- Define architectural test plans and validation strategies.
- Develop directed tests, constrained-random tests, and performance validation frameworks.
- Analyze simulation and silicon results to drive design improvements.
Qualifications
- Education: Bachelor's, Master's, or PhD in Computer Engineering, Electrical Engineering, or Computer Science.
- Experience: 10+ years in GPU, CPU, or parallel processor architecture.
Required Skills
Strong experience with:
- SIMT / SIMD architectures
- Shader core design
- Thread scheduling
- Pipeline microarchitecture
- Memory hierarchy design
Proficiency in:
- SystemVerilog or Verilog
- Microarchitecture specification development
- Performance modeling tools
- RTL-level debugging
Deep understanding of:
- Parallel computing models
- GPU execution models
- Pipeline hazard handling
- Synchronization primitives
Pay Range and Compensation Package
$175,000 to $250,000 USD, plus meaningful equity
Equal Opportunity Statement
We are committed to diversity and inclusivity in our hiring practices.