Talent.com

Reliability engineer Jobs in Toronto, ON

Last updated: 22 hours ago

Site Reliability Engineer

Denvr • Toronto, ON, CA
Full-time
Site Reliability Engineer - Platform Infrastructure Team (100% Remote - Canada). Denvr is a vertically integrated AI Platform Services company headquartered in Calgary, Canada. We provide foundationa...
Last updated: 12 days ago

Platform Reliability Engineer

J&M Group • Toronto, Canada
Full-time
Infrastructure as Code (IaC): Terraform, ARM templates, CloudFormation. Scripting Languages: Python, PowerShell, B...
Last updated: 24 days ago

Site Reliability Engineer

Scotiabank Global Site • Toronto, Ontario, Canada
Full-time
Join a purpose driven winning team committed to results in an inclusive and high-performing culture. As a Site Reliability Engineer (SRE) you will join the Digital Engineering Operations team respon...
Last updated: 30+ days ago

Site Reliability Engineer

freelance.ca • Toronto, Canada
Full-time
If you are fine with the JD below, please share your updated resume ASAP. Site Reliability Engineer. Location: Toronto (onsite). Duration: 6 months. Experience required: 10 years. Job description: Job title: SRE. Tec...
Last updated: 30+ days ago

Site Reliability Engineer

Dayforce US, Inc. • Toronto, Canada
Full-time
Dayforce is a global human capital management (HCM) company headquartered in Toronto, Ontario, and Minneapolis, Minnesota, with operations across North America, Europe, Middle East, Africa (EMEA), ...
Last updated: 22 hours ago

Site Reliability Engineer

LayerZero Labs • Toronto, Canada
Full-time
Founded in 2021, LayerZero’s vision is to create a community of cross-chain developers, building dApps that are no longer constrained by individual blockchain capabilities. With LayerZero's simple, ...
Last updated: 24 days ago

Site Reliability Engineer

Verto Health • Toronto, ON, CA
Full-time
About Verto Health: At Verto Health, we’re transforming how healthcare organizations connect and collaborate through delivery of digital twin & AI-enabled journeys for population health. Ou...
Last updated: 29 days ago

Reliability Engineer

Brevitas • Toronto, ON, CA
Full-time
Resolute Workforce Solutions (Staff augmentation subsidiary of Brevitas Consulting Inc. Our expertise is in Commissioning & Qualification, Validation, Quality Systems, Regulatory Affairs, Engineerin...
Last updated: 12 days ago

Reliability Engineer

Mondelez España Galletas Production SLU • Toronto, Canada
Full-time
Job Description: Are You Ready to Make It Happen at Mondelēz International? Join our Mission to Lead the Future of Snacking. You will lead the production operations at the plant, delivering key perfo...
Last updated: 22 hours ago

Reliability Engineer

Transarete • Toronto, ON, CA
Full-time
Resolute Workforce Solutions | Full time. Resolute Workforce Solutions (Staff augmentation subsidiary of Brevitas Consulting Inc. Our expertise is in Commissioning & Qualification, Validation, Qualit...
Last updated: 30+ days ago

Reliability Engineer

Mondelēz International • Toronto, Canada
Full-time
Reliability Engineer – Mondelēz International. Join our mission to lead the future of snacking. As a Reliability Engineer, you will drive operational excellence in manufacturing and deliver key perfo...
Last updated: 30+ days ago

Site Reliability Engineer

Tecsys Inc. • Toronto, ON, CA
Permanent
Having recognized the advantages of remote work, including employee morale, productivity, reduced commuting on employee wellbeing and t...
Last updated: 30+ days ago

Site Reliability Engineer

iManage • Toronto, ON, CA
Full-time
SRE is part of a global organization that leverages the latest technology to communicate with our colleagues across the globe. We organize ourselves into distributed teams. SRE teams are anchored ...
Last updated: 30+ days ago

Site Reliability Engineer

Bank of Montreal • Toronto, Ontario, Canada
Full-time +1
Providing expertise on Enterprise Monitoring platform and related solutions. Review design support and develop internal tools for BMO business users providing automation options with full self-servi...
Last updated: 30+ days ago

Site Reliability Engineer

Moneris • Toronto, Canada
Full-time
Your Moneris Career - The Opportunity. We are looking for a Site Reliability Engineer (SRE) to join our dynamic team. As an SRE, you will help ensure the reliability, performance, and scalability of ...
Last updated: 24 days ago

Site Reliability Engineer

Arbitrum • Toronto, Canada
Full-time
Founded in 2021, LayerZero’s vision is to create a community of cross-chain developers, building dApps that are no longer constrained by individual blockchain capabilities. With LayerZero's simple, ...
Last updated: 24 days ago

Systems Reliability Engineer

Scotiabank • Toronto, Canada
Full-time
Requisition ID: 239640. Join a purpose driven winning team, committed to results, in an inclusive and high-p...
Last updated: 30+ days ago

Site Reliability Engineer

CB Canada • Toronto, Ontario, Canada
Full-time
On behalf of our client in the Banking Sector, PROCOM is looking for a Site Reliability Engineer. Site Reliability Engineer – Job Description. Experience with automating (provisioning, configuration ...
Last updated: 30+ days ago

Site Reliability Engineer

Kyndryl • Toronto, ON, CA
Full-time +1
Join to apply for the Site Reliability Engineer role at Kyndryl. Direct message the job poster from Kyndryl. Recruitment & Strategic Staffing @Kyndryl | Partnering with IT Consultants in Financial Se...
Last updated: 22 days ago

Site Reliability Engineer

Denvr • Toronto, ON, CA
Last updated: 12 days ago
Job type: Full-time

Job description

Site Reliability Engineer - Platform Infrastructure Team (100% Remote - Canada)

Denvr is a vertically integrated AI Platform Services company headquartered in Calgary, Canada. We provide foundational compute infrastructure and services to support the broader AI ecosystem and its end users. The platform includes cloud‑native solutions for training, inference, high‑performance computing, data processing, scalable storage, and a suite of software toolsets that accelerate the development, deployment, and integration of AI applications.

These capabilities are accessible via the public Denvr AI Cloud or through Private AI Platform Services, which offer fully dedicated, sovereign environments with enhanced security. Private deployments incorporate advanced data centers, optimized compute architectures, high‑throughput storage fabrics, and tightly integrated platform operations software—engineered to meet the demands of large‑scale, mission‑critical AI workloads.

Why Join Us

Joining Denvr means being part of a world‑class team in the fast‑moving field of AI and high‑performance computing. We value curiosity, collaboration, and continuous learning. Our people are proactive problem solvers who take pride in delivering great results, thrive in open and transparent environments, and enjoy learning by doing.

About the Role

We are seeking a Site Reliability Engineer (SRE) with experience spanning cloud and data center environments to drive infrastructure reliability, observability, and scalability. In this role, you will design and operate resilient, high‑performance systems that enable cutting‑edge data solutions.

What You’ll Do

  • Observability & Monitoring: Design, implement, and maintain observability systems with Grafana, Prometheus, VictoriaMetrics, and PromQL to monitor system health and performance.
  • Industry Best Practices: Explore opportunities to improve the overall observability of HPC environments using industry best practices.
  • Incident Management & Troubleshooting: Participate in on‑call rotations, rapidly diagnose and resolve incidents, and perform postmortem reviews to drive continuous improvements.
  • DevOps & CI/CD: Automate DevOps pipelines using GitHub Actions (or similar tools).

Who You Are

  • Experience: 3‑5 years in a Site Reliability Engineering (SRE) or DevOps role.
  • Infrastructure as Code (IaC): Familiarity with tools such as Terraform, Helm, Ansible, and Python for automated infrastructure provisioning.
  • Security Best Practices: Knowledge of security practices and compliance standards for enterprise environments.
  • HPC Knowledge: Familiarity with high‑performance computing, specifically in administering GPU‑related workloads.
  • Kubernetes Proficiency: Strong experience managing Kubernetes clusters in production environments.
  • Observability Tools: Expertise with observability platforms (Grafana, Prometheus, PromQL) for tracking and analyzing system metrics.
  • Networking: Solid understanding of networking fundamentals (TCP/IP, DNS, load balancing, VPNs).
  • AWS Cloud / Hybrid Cloud: Hands‑on experience developing and deploying production‑grade applications on AWS within a hybrid cloud architecture.
  • Linux Systems: Proficiency in Linux administration, shell scripting, and performance tuning.
  • Programming Experience: Strong software development skills (e.g., Bash, Python, Golang) to automate infrastructure and operational tasks.

If you are passionate about technology and want to be part of a remote‑first, forward‑thinking company, Denvr would love to hear from you and learn more about your skills and capabilities. Click on the link to apply!