Site Reliability Engineering Lead

Citi • Mississauga, Peel Region, Canada
9 days ago
Job type
  • Full-time
Job description

We are seeking an experienced and motivated team member to support our AI and DevOps Platform Support team in North America. This role is responsible for contributing to the stability, reliability, and performance of our critical AI and DevOps platforms. The team supports a wide range of services, including multiple AI applications, developer tools, and CI/CD pipeline technologies used across the organization. The ideal candidate will help lead a team of SRE and Support engineers, facilitate incident and problem resolution, and collaborate with engineering and development teams to enhance platform services and supportability. The role includes short‑term planning and coordination of actions and resources within the team.

Responsibilities

  • Demonstrate a strong understanding of how application support contributes to the overall technology function and organizational objectives.
  • Assist with vendor relationship management, including coordination with offshore managed services.
  • Support efforts to improve service levels for end users by enhancing operational efficiencies and strengthening incident management, problem management, and knowledge‑sharing practices.
  • Partner with development teams to guide improvements in application stability and supportability.
  • Contribute to frameworks for managing capacity, throughput, and latency.
  • Assist in defining and implementing application onboarding guidelines and standards.
  • Support team members by fostering a collaborative environment and encouraging skill development.
  • Participate in cost‑reduction efforts through Root Cause Analysis reviews, knowledge management, performance tuning, and user training.
  • Participate in business review meetings to help align technology tools and strategies with business requirements.
  • Ensure adherence to support processes and tool standards, and assist in enhancing processes to promote consistency and quality across the support program.
  • Perform other duties and functions as assigned.
  • Support platform leadership in defining the platform roadmap and partnering with engineering teams and business stakeholders.
  • Assist in executing resilience activities such as wargaming scenarios, chaos engineering tests, and disaster recovery drills.
  • Contribute to automation initiatives aimed at reducing manual toil and improving platform efficiency.
  • Support the enterprise‑wide observability strategy, including monitoring, logging, tracing, and alerting.
  • Maintain hands‑on familiarity with platform architecture and services as needed for operational support.
  • Assist in overseeing the operational health of production platforms (including OpenShift, ECS, CI/CD), ensuring SLAs are supported and incident processes are followed.
  • Help implement and operate effective monitoring and observability strategies to support proactive issue detection and system health assessments.

Qualifications

  • 6+ years of relevant experience in a hands‑on technical or support leadership role.
  • Experience contributing to architecture discussions and ensuring solutions align with enterprise standards and long‑term maintainability.
  • Experience working with senior stakeholders or technology partners.
  • Demonstrated experience supporting IT service improvements or platform stability initiatives.
  • Strong communication and presentation skills, with the ability to convey technical concepts clearly.
  • Experience supporting or contributing to technical roadmaps or operational workstreams.
  • Experience participating in resilience‑related activities such as incident simulations, disaster recovery exercises, or stability testing.
  • Ability to collaborate with cross‑functional support teams and technology groups.
  • Strong organizational and workload‑planning skills.
  • Consistently demonstrates clear and concise written and verbal communication skills.
  • Ability to communicate appropriately with relevant stakeholders.
  • Working knowledge of Generative AI concepts preferred.
  • Experience with CI/CD and configuration management tools preferred.
  • Experience with Red Hat OpenShift or similar Kubernetes technologies preferred.
  • Experience working with databases such as Postgres, Oracle, MongoDB, or Redis preferred.
  • Experience writing or maintaining code in Java, Python, Go, or similar languages preferred.
  • Hands‑on experience with modern observability and monitoring tools (e.g., Prometheus, Grafana, Splunk, ELK) preferred.

Education

  • Bachelor’s/University degree required; Master’s degree preferred.

Citi is an equal opportunity employer, and qualified candidates will receive consideration without regard to their race, color, religion, sex, sexual orientation, gender identity, national origin, disability, status as a protected veteran, or any other characteristic protected by law.

If you are a person with a disability and need a reasonable accommodation to use our search tools and/or apply for a career opportunity, review Accessibility at Citi.

View Citi’s EEO Policy Statement and the Know Your Rights poster.
