About the Role
Architect Distributed Systems: Lead the design and implementation of high-throughput microservices and APIs (Python/Go) that serve as the backbone for Workday’s ML ecosystem.
Engineer the Platform: Build and optimize a unified ML development experience using Kubeflow, Kubernetes (EKS/GKE), and specialized compute orchestration (CPUs/GPUs).
Scale Cloud Infrastructure: Own the end-to-end lifecycle of cloud-based services, utilizing Infrastructure as Code (Terraform) to build resilient, self-healing environments.
Drive Engineering Excellence: Lead architecture reviews, code reviews, and technology evaluations to ensure our systems meet 99.99% reliability standards.
Support Agentic AI: Design the architectural patterns and observability frameworks required to support emerging Agentic AI systems and LLM-based applications.
Collaborate as a Technical Lead: Partner with data scientists, ML engineers, and architects to translate complex data needs into elegant, maintainable software solutions.
Innovate & Prototype: Research and drive adoption of new infrastructure tools with a focus on reliability, security, and enterprise-grade scale.
Lead architecture reviews, code reviews, and technology evaluations.
Own and develop cloud-based services end-to-end, including infrastructure as code.
Design and build software solutions for efficient organization, storage, and retrieval of data to enable substantial scale.
Build an MLOps platform primarily using Kubeflow, Kubernetes, and other ML ecosystem tools for a unified ML development experience.
Apply cloud engineering and security best practices to build robust, scalable infrastructure for ML capabilities.
Work with cross-functional teams to deliver scalable, secure, and reliable solutions.
Engage with data scientists, ML engineers, PMs, and architects to elaborate requirements and drive technical solutions.
Build systems and dashboards to monitor service health and performance.
Research, evaluate, prototype, and drive adoption of new platform tools and technologies with reliability and scale in mind.
Understand and support the implementation of agentic AI systems; familiarity with LangChain and LangSmith is preferred.
About You
Basic Qualifications for Principal Software Engineer
6 or more years of proven industry experience.
Bachelor’s and/or Master’s degree in Computer Science or Computer Engineering.
Strong software engineering experience with designing and building scalable, distributed systems.
Deep understanding of cloud computing, cloud infrastructure, and distributed systems; experience with AWS and GCP.
Experience developing microservices, APIs, and large-scale web applications.
Proficiency with Python, Go, and infrastructure-as-code tools like Terraform.
Experience running and maintaining Kubernetes clusters in production.
Experience implementing and managing CI/CD workflows that automate testing, integration, and delivery of software components.
Experience designing, implementing, and maintaining robust cloud services for deployment, monitoring, and scaling, primarily with Kubernetes.
Ability to troubleshoot and resolve performance bottlenecks, system outages, and operational issues in collaboration with other engineering teams.
Experience ensuring the security and compliance of cloud platforms, including best practices for encryption, data protection, and access control.
Commitment to staying current with industry trends and emerging technologies and recommending improvements to engineering practices.
Other Qualifications
Experience with large-scale ML data pipelines and data lakes.
Ability to think across layers of the ML stack, from infrastructure to model deployment.
Experience developing monitoring and alerting systems for ML infrastructure.
Understanding of agentic AI concepts; experience with LangChain and LangSmith is preferred.
Proven leadership or mentoring experience.
Workday Pay Transparency Statement
The annualized base salary ranges for the primary location and any additional locations are listed below. Workday pay ranges vary based on work location. As a part of the total compensation package, this role may be eligible for the Workday Bonus Plan or a role-specific commission/bonus, as well as annual refresh stock grants. Recruiters can share more detail during the hiring process. Each candidate’s compensation offer will be based on multiple factors including, but not limited to, geography, experience, skills, job duties, and business need, among other things. For more information regarding Workday’s comprehensive benefits, please .
Primary Location: CAN.ON.Toronto
Primary CAN Base Pay Range: $168,000 - $252,000 CAD
Additional CAN Location(s) Base Pay Range: $168,000 - $252,000 CAD