Site Reliability Engineer

HCLTech • Toronto, Canada

24 days ago

Job type

Full-time

Job description

Join our SRE squad supporting ~1000 AWS-hosted services for BMO. You’ll own operational reliability, rapid triage, and proactive maintenance across production and non-prod, partnering closely with Cloud Engineering, SOC, and application teams.

Key Responsibilities

Deliver 24×7 monitoring, incident response, and problem management; drive MTTA / MTTR reduction and SLO / SLI adherence.

Perform preventive health checks; analyze ticket trends to implement continual service improvements and automation to reduce toil.

Execute blameless postmortems and high-quality RCA; maintain SOPs / runbooks and reliability dashboards.

Configure / tune observability (Dynatrace, CloudWatch, ELK); enable self-healing workflows and workload optimizations.

Support change / service requests within agreed SLAs; collaborate during transitions and onboard new AWS services.

Core Skills & Tools

AWS:

Lambda, ECS / Fargate / EC2, API Gateway, SNS / SQS, Kinesis, RDS; IAM / KMS foundations.

Observability & ITSM:

Dynatrace, CloudWatch, ELK; ServiceNow for incidents / changes; SLI / SLO dashboards.

Reliability Practices:

Error budgets, capacity / performance benchmarking, automation / runbook execution, FinOps awareness.

Qualifications

5+ years in SRE / DevOps or L2 operations for cloud-native stacks; strong AWS production experience.

Proven incident / change / problem management in 24×7 environments; adept at RCA and postmortems.

Hands‑on with observability tooling and operational automation; excellent collaboration and documentation skills.

Shift Coverage & Locations

Follow-the-sun model with overlapping handoffs across Canada / India to ensure continuous support. Success is measured by uptime, MTTR / MTTD, change failure rate, error‑budget consumption, SLO adherence, RCA quality, and continual service improvement (CSI) throughput.
