Talent.com
Site Reliability Engineer
Denvr • Toronto, ON, CA
14 days ago
Contract type
  • Full-time
Job description

Site Reliability Engineer - Platform Infrastructure Team (100% Remote - Canada)

Denvr is a vertically integrated AI Platform Services company headquartered in Calgary, Canada. We provide foundational compute infrastructure and services to support the broader AI ecosystem and its end users. The platform includes cloud‑native solutions for training, inference, high‑performance computing, data processing, scalable storage, and a suite of software toolsets that accelerate the development, deployment, and integration of AI applications.

These capabilities are accessible via the public Denvr AI Cloud or through Private AI Platform Services, which offer fully dedicated, sovereign environments with enhanced security. Private deployments incorporate advanced data centers, optimized compute architectures, high‑throughput storage fabrics, and tightly integrated platform operations software—engineered to meet the demands of large‑scale, mission‑critical AI workloads.

Why Join Us

Joining Denvr means being part of a world‑class team in the fast‑moving field of AI and high‑performance computing. We value curiosity, collaboration, and continuous learning. Our people are proactive problem solvers who take pride in delivering great results, thrive in open and transparent environments, and enjoy learning by doing.

About the Role

We are seeking a Site Reliability Engineer (SRE) with experience spanning cloud and data center environments to drive infrastructure reliability, observability, and scalability. In this role, you will design and operate resilient, high‑performance systems that enable cutting‑edge data solutions.

What You’ll Do

  • Observability & Monitoring: Design, implement, and maintain observability systems with Grafana, Prometheus, VictoriaMetrics, and PromQL to monitor system health and performance.
  • Industry Best Practices: Identify opportunities to improve the overall observability of HPC environments using industry best practices.
  • Incident Management & Troubleshooting: Participate in on‑call rotations, rapidly diagnose and resolve incidents, and perform postmortem reviews to drive continuous improvement.
  • DevOps & CI/CD: Automate DevOps pipelines using GitHub Actions (or similar tools).

Who You Are

  • Experience: 3‑5 years in a Site Reliability Engineering (SRE) or DevOps role.
  • Infrastructure as Code (IaC): Familiarity with tools such as Terraform, Helm, Ansible, and Python for automated infrastructure provisioning.
  • Security Best Practices: Knowledge of security practices and compliance standards for enterprise environments.
  • HPC Knowledge: Familiarity with high‑performance computing, specifically administering GPU‑related workloads.
  • Kubernetes Proficiency: Strong experience managing Kubernetes clusters in production environments.
  • Observability Tools: Expertise with observability platforms (Grafana, Prometheus, PromQL) for tracking and analyzing system metrics.
  • Networking: Solid understanding of networking fundamentals (TCP/IP, DNS, load balancing, VPNs).
  • AWS Cloud / Hybrid Cloud: Hands‑on experience developing and deploying production‑grade applications on AWS within a hybrid cloud architecture.
  • Linux Systems: Proficiency in Linux administration, shell scripting, and performance tuning.
  • Programming Experience: Strong software development skills (e.g., Bash, Python, Golang) to automate infrastructure and operational tasks.

If you are passionate about technology and want to be part of a remote‑first, forward‑thinking company, Denvr would love to hear from you and learn more about your skills and capabilities. Click on the link to apply!
