About the Role
We are looking for a Machine Learning Engineer to help design and build our Agent Platform—the core infrastructure that enables teams to develop, deploy, orchestrate, and operate AI agents in production.
This role is focused on building the systems and tooling required to host and scale agent-based applications powered by LLMs. You will work across the platform stack to create reusable capabilities for agent execution, workflow orchestration, observability, evaluation, reliability, and developer experience.
You’ll partner closely with applied AI, product, and infrastructure teams to define how agents are built and operated across the organization. This is an ideal role for someone who enjoys solving hard engineering problems in a fast-evolving technical space and wants to shape the foundation for the next generation of AI applications.
Responsibilities:
Design and build the core platform capabilities required to develop, host, and operate AI agents at scale.
Develop infrastructure and services for agent execution, orchestration, state management, and runtime reliability.
Build reusable abstractions, frameworks, and workflows in Python to support agent development patterns across teams.
Design and implement systems for tool use, memory, retrieval, workflow coordination, and human-in-the-loop interactions.
Build and maintain services deployed on Kubernetes, with a focus on scalability, resiliency, and operational excellence.
Develop capabilities for evaluation, tracing, observability, debugging, and performance monitoring of agent behavior in production.
Improve platform performance across latency, throughput, fault tolerance, and cost efficiency.
Create internal APIs, SDKs, and developer tooling that make it easier for engineering teams to build on the platform.
Partner with cross-functional teams to productionize new agent use cases and establish common platform patterns and best practices.
Contribute to technical architecture and help define the roadmap for agent infrastructure and platform evolution.
About You
Basic Qualifications (MLE III):
3+ years experience as part of a data science or machine learning software development team, or relevant work in a PhD or equivalent program.
5+ years experience in Python and experience building reliable, maintainable production services.
3+ years experience with distributed systems, APIs, asynchronous workflows, and service-oriented architecture.
3+ years experience designing systems with a focus on scalability, reliability, observability, and maintainability.
Basic Qualifications (Sr. MLE):
6+ years of software engineering experience, including experience building and operating production-grade backend, ML, or platform systems.
8+ years experience in Python and experience building reliable, maintainable production services.
5+ years experience with distributed systems, APIs, asynchronous workflows, and service-oriented architecture.
5+ years experience designing systems with a focus on scalability, reliability, observability, and maintainability.
Preferred Qualifications:
Experience building or supporting agent platforms, AI infrastructure, or internal developer platforms.
Experience building and deploying machine learning or LLM-powered applications in production.
Familiarity with LLM application patterns, including:
Tool calling
Retrieval-augmented generation (RAG)
Memory and context management
Multi-step workflows and orchestration
Human-in-the-loop systems
Experience designing and implementing evaluation frameworks for LLM or agent quality.
Familiarity with vector databases, model serving, prompt/version management, and experimentation tooling.
Solid knowledge of data science principles and their application in NLP.
Experience running services in Kubernetes-based environments.
Ability to work across ambiguity, make strong technical tradeoffs, and drive projects from concept to production.
Strong communication and collaboration skills, with the ability to partner effectively across engineering, product, and AI teams.
Workday Pay Transparency Statement
Workday pay ranges vary based on work location. As a part of the total compensation package, this role may be eligible for the Workday Bonus Plan or a role-specific commission/bonus, as well as annual refresh stock grants. Recruiters can share more detail during the hiring process. Each candidate’s compensation offer will be based on multiple factors including, but not limited to, geography, experience, skills, job duties, and business need, among other things. For more information regarding Workday’s comprehensive benefits, please .
Primary Location: CAN.ON.Toronto
Primary Location Base Pay Range: $156,000 CAD - $234,000 CAD
Primary CAN Base Pay Range: $156,000 - $234,000 CAD