Talent.com
Site Reliability Engineer
No longer accepting applications
Tecsys Inc. • Toronto, Canada
30+ days ago
Job type
  • Permanent
Job description

Having recognized the advantages of remote work, including its benefits for employee morale and productivity and the positive impact of reduced commuting on employee wellbeing and the environment, we are proud to be a digital-first company. The technologies and programs in which we have invested provide a fantastic foundation to this end. Our digital-first work environment, together with our conveniently located offices and collaborative workspaces, provides our team with the freedom and flexibility to work in the way that makes our employees most productive.

About us

Tecsys is a fast-growing innovator offering supply chain solutions to industry-leading businesses, from healthcare systems, hospitals, and pharmacy businesses to distributors, retailers, and 3PLs. We work with industry leaders to transform their supply chains through technology. If you thrive on tackling interesting challenges with continuous learning opportunities, then Tecsys could be a good fit for you!

About the Role

We are looking for a Site Reliability Engineer to join our Network and Security Operations Center (NOC), a team at the heart of platform reliability for mission-critical SaaS environments. You will help maintain, optimize, and ensure the reliability and performance of the systems that power our cloud infrastructure across AWS and Kubernetes, with a strong focus on automation, observability, and continuous improvement. This role blends reliability engineering with incident command, giving you real ownership over uptime, performance, and innovation. You will be part of a highly skilled team that values creative problem-solving, operational excellence, and continuous improvement through automation and resilience engineering.

Your responsibilities

Collaborate with other Engineering teams to support services before they go live through activities such as system design consulting, developing software platforms and frameworks, capacity planning and launch reviews.

Innovate relentlessly: Identify pain points, propose creative solutions, and drive initiatives that simplify, scale, and strengthen the platform.

Maintain services once they are live by measuring and monitoring availability, latency and overall system health.

Own observability: Enhance and expand monitoring and alerting using Datadog; define SLOs/SLIs and create actionable dashboards that drive reliability outcomes.

Drive automation: Develop and improve internal tooling, IaC frameworks, and pipelines (Terraform, GitLab CI/CD) to reduce manual intervention and enable self-healing systems.

Scale systems sustainably through automation and evolve systems by pushing for changes that improve reliability and velocity.

Be on‑call.

Practice sustainable incident response and blameless postmortems. Lead post‑incident reviews (RCAs) and identify long‑term fixes that improve stability, reliability, and developer experience.

Implement monitoring, logging, alerting, and SLA reporting.

Create and maintain technical documentation.

Implement, maintain and mature SRE best practices.

Lead incidents: Act as Incident Commander during incidents; coordinate cross‑team response, manage communications, and ensure rapid service restoration.

Provide support for our planning and deployment teams to enable stability, predictability, and scale in our continued growth.

Collaborate with members of the Platform Engineering team to implement and support far‑reaching strategic efforts, provide constructive feedback, and foster a collaborative environment.

Work cross‑functionally with internal teams and vendors to manage our growth around the globe, with a strong focus on maintaining a high level of performance, availability, and reliability for our users.

Your qualifications

5+ years in Site Reliability, Cloud, or DevOps Engineering, ideally in SaaS or large‑scale production environments.

Experience designing and deploying large-scale systems, multi‑vendor platforms, and globally distributed infrastructure.

Proven experience managing cloud infrastructure in AWS (multi‑account, VPC, EC2, EKS) and Kubernetes at scale.

Strong hands‑on experience with IaC and automation (Terraform, Ansible, or similar).

Familiarity with CI/CD pipelines and release automation (GitLab preferred, Jenkins acceptable).

Deep understanding of monitoring and observability using Datadog (or equivalent), including metric design, log pipelines, alerting, and dashboards.

Experience with incident management, on‑call participation, escalation, and structured postmortems.

Scripting skills in Python, Bash, Java, or equivalent for automation and diagnostics.

Curiosity, ownership, and a bias for action; you see a problem, you solve it, and you share the lessons learned.

Experience with FedRAMP (Federal Risk and Authorization Management Program) compliance is a strong asset.

Basic knowledge of Java‑ or .NET‑based development is required.

Strong English communication skills, both written and spoken, are essential for effective correspondence with customers, business partners and colleagues beyond the province of Quebec.

Additional requirements:

Escalation on‑call rotation

Occasional travel (quarterly offsites, conferences – less than 10%)

At Tecsys, we are committed to fostering a diverse and inclusive workplace where all employees feel valued, respected, and empowered. We believe that diversity drives innovation and strengthens our ability to deliver exceptional solutions. We welcome and encourage applicants from all backgrounds, experiences, and perspectives to join our team.

Tecsys is an equal opportunity employer. Accommodation is available for applicants selected for an interview.

NB: To apply for this position, you must be a Canadian citizen or a permanent resident of Canada, or hold a valid Canadian work permit.

Similar jobs
Site Reliability Engineer
Staples • Richmond Hill
Full-time
The Site Reliability Engineer (SRE) is responsible for ensuring the reliability, availability, and operational excellence of Staples Canada’s digital platforms. This role supports production systems...
Last updated: 4 days ago
SRE Dynatrace Specialist: Observability & Reliability
BeachHead • Toronto
Full-time
A leading recruitment agency is seeking a Site Reliability Engineer (Dynatrace Specialist) to enhance system reliability and performance for critical applications. In this Toronto-based role, you wi...
Last updated: 3 days ago
Remote Senior Property Engineer — Wildfire Expert
Allianz Commercial • Toronto C6A, ON, Canada
Remote
Full-time
A leading global insurance provider is seeking a Senior Property Engineer – Wildfire Expert to support clients in risk evaluation and management. This remote role requires approximately 30% travel f...
Last updated: 17 days ago
Site Reliability Engineer
freelance.ca • Toronto, Canada
Full-time
If you are fine with the JD below, please share your updated resume ASAP. Site Reliability Engineer. Location: Toronto (onsite). Duration: 6 months. Experience required: 10 years. Job Description: Job Title: SRE Tec...
Last updated: 30+ days ago
Site Lead - Program Development and Evaluation (MLSE)
Toronto Community Housing • Toronto C6A, ON, Canada
Part-time
As a Site Lead on the Program Development and Evaluation team, you will lead and inspire a team to deliver a high‑quality program for TCHC youth aged 6‑12. You will supervise program staff, have res...
Last updated: 10 days ago
Site Reliability Engineer (SRE)
Accelerate Her Future® • Toronto C6A, ON, Canada
Full-time +1
Tangerine is Canada’s leading direct bank. We offer flexible and accessible banking options, innovative products, and award-winning Client service. The reason why Tangerine employees come to work eac...
Last updated: 11 days ago
Site Reliability Engineer, Inference Infrastructure
The Rundown AI, Inc. • Toronto
Full-time
Our mission is to scale intelligence to serve humanity. We’re training and deploying frontier models for developers and enterprises who are building AI systems to power magical experiences like cont...
Last updated: 4 days ago
Senior / Staff Site Reliability Engineer
Circle • Toronto
Full-time
Circle (NYSE: CRCL) is one of the world’s leading internet financial platform companies, building the foundation of a more open, global economy through digital assets, payment applications, and pro...
Last updated: 4 days ago
Cloud & Reliability Leader — Kubernetes & SRE
Nulogy • Toronto, Ontario, Canada
Full-time
A leading manufacturing technology company in Toronto seeks a Head of Infrastructure to define and execute the strategic vision for cloud operations. The role involves ensuring high availability of ...
Last updated: 30+ days ago
SRE & Production Reliability Engineer — Hybrid
Tangerine Bank • Toronto C6A, ON, Canada
Full-time
A leading digital bank in Toronto is seeking a qualified SRE & Production Support professional to enhance their technology solutions. You will manage team workflows, ensure timely resolution of prod...
Last updated: 30+ days ago
Site Reliability Engineer
Dexian • Toronto
Full-time
Working location: Toronto, ON [Hybrid 2 days a week in office]. The DevOps and Automation is looking for a Site Reliability Engineer with strong expertise in Dynatrace to ensure the reliability, per...
Last updated: 4 days ago
DevOps Engineer - Richmond Hill
VBeyond Corporation • Richmond Hill, ON, CA
Full-time
We are seeking a DevOps Engineer. The role focuses on infrastructure setup, deployment automation, performance, security, and operational stability throughout the migration and post-launch phases. Su...
Last updated: 6 days ago
Site Reliability Engineer (SRE)
Tangerine • Toronto C6A, ON, Canada
Full-time +1
As Canada’s leading digital bank, Tangerine technology is at the heart of everything we do. We have redefined what digital banking is and we continue to evolve on what it can be, using technology to...
Last updated: 30+ days ago
Site Operations Lead - Client-Facing & Excellence
Ricoh Americas Holdings • Toronto C6A, ON, Canada
Full-time +1
A leading technology company in Toronto is seeking a Site Manager for a 12–18 month contract to oversee daily operations and staff. The role involves managing profitability, ensuring quality standar...
Last updated: 9 days ago
Engineering Sr. Site Reliability Engineer Palo Alto, California
getjerry.com • Toronto
Full-time
Join a pre-IPO startup with capital, traction and runway ($240M funded | 60X revenue growth in 5 years | $2T market size). Work closely with brilliant leaders and teammates from companies like McKin...
Last updated: 4 days ago
Site Reliability Developer 1
Vena • Toronto
Full-time
This is a flexible position and has the option of working in our Toronto office full time, hybrid throughout the week or working entirely remotely. Please note that this role includes participating ...
Last updated: 2 days ago
Site Reliability Professional (DB2 LUW)
IBM • Markham
Full-time
At IBM Software, we transform client challenges into solutions. Building the world’s leading AI-powered, cloud-native products that shape the future of business and society. Our legacy of innovation ...
Last updated: 4 days ago
Senior / Staff Distributed Systems Engineer
GuruLink • Toronto C6A, ON, Canada
Full-time
We are an AI research and systems company building the infrastructure for a new kind of intelligence: one that is structured, efficient, and deeply integrated with data. Our systems operate at exaby...
Last updated: 8 days ago