About the Role
Architect Distributed Systems: Lead the design and implementation of high-throughput microservices and APIs (Python/Go) that serve as the backbone for Workday’s ML ecosystem.
Engineer the Platform: Build and optimize a unified ML development experience using Kubeflow, Kubernetes (EKS/GKE), and specialized compute orchestration (CPUs/GPUs).
Scale Cloud Infrastructure: Own the end-to-end lifecycle of cloud-based services, utilizing Infrastructure as Code (Terraform) to build resilient, self-healing environments.
Drive Engineering Excellence: Lead architecture reviews, code reviews, and technology evaluations to ensure our systems meet 99.99% reliability standards.
Support Agentic AI: Design the architectural patterns and observability frameworks required to support emerging Agentic AI systems and LLM-based applications.
Collaborate as a Technical Lead: Partner with data scientists, ML engineers, and architects to translate complex data needs into elegant, maintainable software solutions.
Innovate & Prototype: Research and drive adoption of new infrastructure tools with a focus on reliability, security, and enterprise-grade scale.
About You
Basic Qualifications for Principal Software Engineer
6 or more years of validated industry experience.
Bachelor’s and/or Master’s degree in Computer Science or Computer Engineering.
Strong software engineering experience with designing and building scalable, distributed systems.
Deep understanding of cloud computing, cloud infrastructure, and distributed systems; experience with AWS and GCP.
Experience developing microservices, APIs, robust cloud services, and large-scale web applications, and managing CI/CD workflows.
Proficiency with Python, Go, and infrastructure-as-code tools like Terraform.
Experience running and maintaining Kubernetes clusters in production.
Experience ensuring the security and compliance of cloud platforms, including implementing best practices for encryption, data protection, and access control.
Other Qualifications
Experience with large-scale ML data pipelines and data lakes.
Ability to think across layers of the ML stack, from infrastructure to model deployment.
Experience developing monitoring and alerting systems for ML infrastructure.
Understanding of agentic AI concepts; experience with LangChain and LangSmith is preferred.
Proven leadership or mentoring experience.
Workday Pay Transparency Statement
The annualized base salary ranges for the primary location and any additional locations are listed below. Workday pay ranges vary based on work location. As a part of the total compensation package, this role may be eligible for the Workday Bonus Plan or a role-specific commission/bonus, as well as annual refresh stock grants. Recruiters can share more detail during the hiring process. Each candidate’s compensation offer will be based on multiple factors including, but not limited to, geography, experience, skills, job duties, and business need, among other things. For more information regarding Workday’s comprehensive benefits, please .
Primary Location: CAN.ON.Toronto
Primary CAN Base Pay Range: $168,000 - $252,000 CAD
Additional CAN Location(s) Base Pay Range: $168,000 - $252,000 CAD