Job description
About the Role
As a Lead Inference Platform Engineer, you will:
Optimize LLMs and ML models for high-performance inference using techniques such as quantization, pruning, distillation, and hardware-specific tuning
Deploy and scale inference workloads on GPUs across AWS, Azure, GCP, and internal Kubernetes clusters, ensuring predictable performance during peak traffic, especially business hours
Implement routing and failover strategies for OpenAI/Anthropic/Vertex AI traffic
Integrate models into production-grade APIs supporting TR products and enterprise workflows
Develop highly optimized environments and eliminate performance bottlenecks to reduce latency
Collaborate with Platform Engineering teams (Landing Zones, Network, Storage, Compute, AI) to ensure inference workloads align with TR’s cloud native patterns (AWS, Azure, GCP, OCI)
Build and optimize containerized inference pipelines using Kubernetes for large‑scale distributed workloads
Ensure compliance with TR’s AI standards for deployment, monitoring, governance, and drift detection
Profile inference performance, identify GPU/CPU bottlenecks, and optimize compute utilization across heterogeneous hardware
Implement observability and health monitoring for inference pipelines, ensuring reliability of enterprise AI services.
Collaborate with platform teams to enhance capacity forecasting for AI workloads
Work with Product, Data Science, Architecture, and Enterprise AI teams to onboard new research models into production
Collaborate closely with AI engineers to invent new quantization techniques, improve numerical precision, and explore non‑standard architectures.
Partner with Cloud Engineers (Azure, AWS, GCP) to develop guardrails and automation that support inference workloads.
Support the scale out of AI infrastructure during critical releases and global product rollouts.
About You
You are a potential fit for the role, Lead Inference Platform Engineer, if your background includes:
Required Skills & Qualifications
Strong understanding of ML/LLM fundamentals and inference optimization techniques.
Hands‑on experience with GPU programming (CUDA preferred), inference runtimes (TensorRT, ONNX Runtime), and deep learning frameworks (PyTorch/TensorFlow)
Proficiency in Python and at least one systems language (C++ strongly preferred for performance critical inference paths)
Experience deploying AI workloads to AWS/GCP/Azure and Kubernetes
Familiarity with vector search systems (OpenSearch vectors) and retrieval augmented generation pipelines
Knowledge of distributed systems, microservices, CI/CD, and cloud native architecture
Experience with neural network architectures such as CNNs, transformers, and diffusion models, and their performance characteristics
Understanding of GPUs, multithreading, and/or other accelerators with vectorized instructions
Specialized experience in one or more of the following machine learning/deep learning domains: model compression, hardware-aware model optimization, hardware accelerator architecture, GPU/ASIC architecture, machine learning compilers, high-performance computing, performance optimization, numerics, and SW/HW co-design.
Preferred Qualifications
3+ years production experience deploying ML/LLM models at scale
Experience in managing GPU fleets or inference clusters across public cloud and container platforms.
Experience supporting enterprise grade AI workloads in regulated or compliance heavy environments.
What’s in it For You?
Hybrid Work Model: We’ve adopted a flexible hybrid working environment (2-3 days a week in the office depending on the role) for our office-based roles while delivering a seamless experience that is digitally and physically connected.
Flexibility & Work-Life Balance: Flex My Way is a set of supportive workplace policies designed to help manage personal and professional responsibilities, whether caring for family, giving back to the community, or finding time to refresh and reset. This builds upon our flexible work arrangements, including work from anywhere for up to 8 weeks per year, empowering employees to achieve a better work-life balance.
Career Development and Growth: By fostering a culture of continuous learning and skill development, we prepare our talent to tackle tomorrow’s challenges and deliver real-world solutions. Our Grow My Way programming and skills-first approach ensures you have the tools and knowledge to grow, lead, and thrive in an AI-enabled future.
Industry Competitive Benefits: We offer comprehensive benefit plans that include flexible vacation, two company-wide Mental Health Days off, access to the Headspace app, retirement savings, tuition reimbursement, employee incentive programs, and resources for mental, physical, and financial wellbeing.
Culture: Globally recognized, award-winning reputation for inclusion and belonging, flexibility, work-life balance, and more. We live by our values: Obsess over our Customers, Compete to Win, Challenge (Y)our Thinking, Act Fast / Learn Fast, and Stronger Together.
Social Impact: Make an impact in your community with our Social Impact Institute. We offer employees two paid volunteer days off annually and opportunities to get involved with pro-bono consulting projects and Environmental, Social, and Governance (ESG) initiatives.
Making a Real-World Impact: We are one of the few companies globally that helps its customers pursue justice, truth, and transparency. Together, with the professionals and institutions we serve, we help uphold the rule of law, turn the wheels of commerce, catch bad actors, report the facts, and provide trusted, unbiased information to people all over the world.
Thomson Reuters complies with local laws that require upfront disclosure of the expected pay range for a position. The base compensation range varies across locations. For Ontario, Canada, the base compensation range for this role is $140,000 CAD - $175,000 CAD. Base pay is positioned within the range based on several factors including an individual’s knowledge, skills and experience with consideration given to internal equity. Base pay is one part of a comprehensive Total Reward program which also includes flexible and supportive benefits and other wellbeing programs. This role may also be eligible for an Annual Bonus based on a combination of enterprise and individual performance.
As a global business, we rely on the unique backgrounds, perspectives, and experiences of all employees to deliver on our business goals. To ensure we can do that, we seek talented, qualified employees in all our operations around the world regardless of race, color, sex/gender, including pregnancy, gender identity and expression, national origin, religion, sexual orientation, disability, age, marital status, citizen status, veteran status, or any other protected classification under applicable law. Thomson Reuters is proud to be an Equal Employment Opportunity Employer providing a drug-free workplace.
We also make reasonable accommodations for qualified individuals with disabilities and for sincerely held religious beliefs in accordance with applicable law. More information on requesting an accommodation here.