Job description
Experience Level: Level 3 (senior): 5-7 years
Location: Montreal (Day 1 onboarding onsite; in-office presence 3x per week)
This is an AI Platform Engineering role at Director level. It is part of the job family responsible for providing specialist GenAI expertise that drives decision-making and business insights during GenAI development, and for uplifting platform features by introducing cutting-edge technology capabilities, enhancements, and innovative solutions.
What you’ll do in the role:
Design and build a firmwide AI development and evaluation platform with a strong focus on enterprise-scale GenAI benchmarking, assurance, and governance.
Develop self-service tooling, SDKs, and APIs to enable teams to build, evaluate, and deploy GenAI applications efficiently and safely.
Build reusable, scalable platform components for GenAI and agentic systems, including orchestration, evaluation pipelines, and model lifecycle workflows.
Lead the implementation of container-native GenAI workloads on Kubernetes / OpenShift using GitOps-driven deployment patterns.
Integrate and operate GenAI ecosystem components including LLMs, vector databases, embeddings, and agent frameworks.
Drive key architecture, product, and design decisions across security, authentication, observability, scalability, and reliability.
Establish platform best practices for GenAI evaluations, agentic systems, ModelOps / LLMOps, and production operations.
Collaborate closely with engineers, data scientists, security, and product teams to accelerate safe enterprise adoption of GenAI.
What you’ll bring to the role:
6+ years of strong hands-on software engineering experience, preferably in Python (FastAPI, Flask), building large-scale, cloud-native platforms.
Deep experience designing and operating Kubernetes / OpenShift workloads using Helm, Kustomize, container registries, and GitOps practices.
Hands-on experience building GenAI and LLM-based applications, including agentic orchestration, embeddings, evaluation workflows, and fine-tuning.
Strong understanding of microservices, RESTful API design, asynchronous and concurrent programming, and performance-oriented systems.
Solid foundation in data engineering principles including SQL/NoSQL stores, Kafka, Redis, vector databases, and state management at scale.
Proficiency in DevOps, CI/CD, observability (OpenTelemetry, Prometheus, Grafana), and SRE-inspired operational practices.
Strong working knowledge of security-first design, OAuth2, secure coding practices, and enterprise-grade platform controls.
Experience with agent-based frameworks or orchestration systems.
Exposure to LLMOps / ModelOps / evaluation platforms.
Experience working in enterprise-scale platforms or internal developer platforms.
EEO Employer
Minorities / Females / Disabled / Veterans / Gender Identity / Sexual Orientation