Talent.com
Site Reliability Engineer

Denvr • Toronto, Canada
11 days ago
Job type
  • Full-time
Job description

Site Reliability Engineer - Platform Infrastructure Team (100% Remote - Canada)

Denvr is a vertically integrated AI Platform Services company headquartered in Calgary, Canada. We provide foundational compute infrastructure and services to support the broader AI ecosystem and its end users. The platform includes cloud‑native solutions for training, inference, high‑performance computing, data processing, scalable storage, and a suite of software toolsets that accelerate the development, deployment, and integration of AI applications.

These capabilities are accessible via the public Denvr AI Cloud or through Private AI Platform Services, which offer fully dedicated, sovereign environments with enhanced security. Private deployments incorporate advanced data centers, optimized compute architectures, high‑throughput storage fabrics, and tightly integrated platform operations software—engineered to meet the demands of large‑scale, mission‑critical AI workloads.

Why Join Us

Joining Denvr means being part of a world‑class team in the fast‑moving field of AI and high‑performance computing. We value curiosity, collaboration, and continuous learning. Our people are proactive problem solvers who take pride in delivering great results, thrive in open and transparent environments, and enjoy learning by doing.

About the Role

We are seeking a Site Reliability Engineer (SRE) with experience spanning cloud and data center environments to drive infrastructure reliability, observability, and scalability. In this role, you will design and operate resilient, high‑performance systems that enable cutting‑edge data solutions.

What You’ll Do

Observability & Monitoring:

Design, implement, and maintain observability systems with Grafana, Prometheus, VictoriaMetrics, and PromQL to monitor system health and performance.

Industry Best Practices:

Explore opportunities to improve the overall observability of HPC environments using industry best practices.

Incident Management & Troubleshooting:

Participate in on‑call rotations, rapidly diagnose and resolve incidents, and perform postmortem reviews to drive continuous improvements.

DevOps & CI/CD:

Automate DevOps pipelines using GitHub Actions (or similar tools).

Who You Are

Experience:

3‑5 years in a Site Reliability Engineering (SRE) or DevOps role.

Infrastructure as Code (IaC):

Familiarity with tools such as Terraform, Helm, and Ansible, plus Python, for automated infrastructure provisioning.

Security Best Practices:

Knowledge of security practices and compliance standards for enterprise environments.

HPC Knowledge:

Familiarity with high‑performance computing, specifically administering GPU workloads.

Kubernetes Proficiency:

Strong experience managing Kubernetes clusters in production environments.

Observability Tools:

Expertise with observability platforms (Grafana, Prometheus, PromQL) for tracking and analyzing system metrics.

Networking:

Solid understanding of networking fundamentals (TCP/IP, DNS, load balancing, VPNs).

AWS Cloud / Hybrid Cloud:

Hands‑on experience developing and deploying production‑grade applications on AWS within a hybrid cloud architecture.

Linux Systems:

Proficiency in Linux administration, shell scripting, and performance tuning.

Programming Experience:

Strong software development skills (e.g., Bash, Python, Golang) to automate infrastructure and operational tasks.

If you are passionate about technology and want to be part of a remote‑first, forward‑thinking company, Denvr would love to hear from you and learn more about your skills and capabilities. Click on the link to apply!

