About the Role
As part of Workday’s Data platform team, you will be responsible for building, enhancing, and extending our Spark and Trino-based large-scale distributed data processing platform in the cloud. You will work alongside a top-notch team to architect and build features representing our:
High-velocity hybrid transactional/analytical engine
Low-latency interactive engine
Large-scale cube builder engine
In this role, you will be a key driver in ensuring the platform is hardened for production at massive scale. This includes designing fault-tolerant architectures, building robust self-healing mechanisms, and implementing comprehensive telemetry to make the internal state of our distributed query engines completely transparent.
Key Responsibilities Include:
Developing data processing algorithms and techniques that work on large datasets, facilitating an interactive querying experience over large volumes of data.
Resiliency: Designing and implementing strategies for high availability, fault isolation, and graceful degradation of Spark and Trino clusters.
Observability: Building end-to-end tracing, deep metrics, and advanced alerting frameworks to proactively identify performance bottlenecks and system anomalies.
Scalability: Optimizing resource allocation, cluster auto-scaling, and multi-tenant isolation to support highly elastic and unpredictable workloads.
About You
Basic Qualifications
- 8+ years of software development experience in Java and/or Scala in a Linux/Unix environment.
- Experience operating large-scale distributed systems natively within public cloud environments (AWS or GCP).
- 3+ years in database internals, query processing, or distributed systems.
Other Qualifications
You have a BS in Computer Science or a related field with 5 years of experience, or an MS/PhD in Computer Science or a relevant area with 3 years of experience.
A strong grasp of SQL and distributed data processing engines (e.g., Apache Spark).
Hands-on experience with Trino (formerly PrestoSQL) or Presto for executing fast, distributed SQL queries across large, heterogeneous data sources.
Proven experience architecting and tuning distributed systems for high availability, fault tolerance, and massive horizontal scale.
Experience implementing observability and telemetry frameworks (e.g., Prometheus, Grafana, OpenTelemetry, JMX metrics) to monitor complex distributed workloads.
Industry experience building and delivering high-performance data processing engines.
Familiarity with AI coding tools such as Cursor.
Experience leading or mentoring a team of engineers.
Workday Pay Transparency Statement
The annualized base salary ranges for the primary location and any additional locations are listed below. Workday pay ranges vary based on work location. As a part of the total compensation package, this role may be eligible for the Workday Bonus Plan or a role-specific commission/bonus, as well as annual refresh stock grants. Recruiters can share more detail during the hiring process. Each candidate’s compensation offer will be based on multiple factors including, but not limited to, geography, experience, skills, job duties, and business need, among other things. For more information regarding Workday’s comprehensive benefits, please .
Primary Location: CAN.BC.Vancouver
Primary CAN Base Pay Range: $151,200 - $226,800 CAD
Additional CAN Location(s) Base Pay Range: $151,200 - $226,800 CAD