Site Reliability Engineer

HCLTech • Toronto, Canada
25 days ago
Job type
  • Full-time
Job description

Join our SRE squad supporting ~1000 AWS-hosted services for BMO. You’ll own operational reliability, rapid triage, and proactive maintenance across production and non-prod, partnering closely with Cloud Engineering, SOC, and application teams.

Key Responsibilities

Deliver 24×7 monitoring, incident response, and problem management; drive MTTA / MTTR reduction and SLO / SLI adherence.

Perform preventive health checks; analyze ticket trends to implement continual service improvements and automation to reduce toil.

Execute blameless postmortems and high-quality RCA; maintain SOPs / runbooks and reliability dashboards.

Configure / tune observability (Dynatrace, CloudWatch, ELK); enable self-healing workflows and workload optimizations.

Support change / service requests within agreed SLAs; collaborate during transitions and onboard new AWS services.

Core Skills & Tools

AWS:

Lambda, ECS / Fargate / EC2, API Gateway, SNS / SQS, Kinesis, RDS; IAM / KMS foundations.

Observability & ITSM:

Dynatrace, CloudWatch, ELK; ServiceNow for incidents / changes; SLI / SLO dashboards.

Reliability Practices:

Error budgets, capacity / performance benchmarking, automation / runbook execution, FinOps awareness.

Qualifications

5+ years SRE / DevOps or L2 operations for cloud-native stacks; strong AWS production experience.

Proven incident / change / problem management in 24×7 environments; adept at RCA and postmortems.

Hands‑on with observability tooling and operational automation; excellent collaboration and documentation skills.

Shift Coverage & Locations

Follow-the-sun model with overlapping handoffs across Canada / India to ensure continuous support.

Success is measured by uptime, MTTR / MTTD, change failure rate, error‑budget consumption, SLO adherence, RCA quality, and CSI throughput.
