Join us as an Intermediate Site Reliability Engineer helping build reliable, scalable cloud infrastructure. You’ll work alongside senior engineers to own projects, deepen platform skills, and support teams operating large distributed systems.
You’ll focus on one of three streams: Kubernetes, Observability, or Developer Experience.
What you'll be doing
Improve infrastructure reliability, scale, and security across cloud-native systems.
Deliver features and upgrades through infrastructure-as-code.
Collaborate with product teams on debugging, migrations, and operational readiness.
Support incident response, capacity planning, and performance improvements.
Automate repeatable workflows to reduce operational load across engineering.
Stream Focus Areas
Kubernetes Platform
You’ll help operate and evolve shared Kubernetes platforms used by many product teams.
Typical work:
Maintain and upgrade clusters, networking, ArgoCD, and IaC patterns.
Build or extend reusable infra modules (XRDs, Helm, Terraform) to standardize onboarding.
Partner with teams to plan and execute migrations safely.
Handle inbound maintenance, patching, and legacy stack stability work.
Observability Platform
You’ll help deliver a modern telemetry platform powering metrics, logs, and traces for engineering teams.
Typical work:
Build and operate OTEL-based telemetry pipelines across environments.
Support migrations to VictoriaMetrics and maintain data accuracy during transitions.
Improve SLOs, alerting strategies, and reliability of observability systems.
Contribute to IaC automation for observability deployments.
Ideal tools: OTEL, Prometheus, VictoriaMetrics, VM Alert, Grafana, Terraform, GitHub Actions.
Developer Experience / CI/CD
You’ll help maintain and strengthen the CI/CD ecosystem powering builds, tests, and deployments.
Typical work:
Maintain pipelines, update dependencies, and improve the reliability of GitHub Actions.
Migrate workloads away from legacy tooling to a new Tailscale/OIDC-based platform.
Triage support requests, follow runbooks, and assist product teams during migrations.
Reduce operational load by standardizing patterns and supporting migrations.
Ideal tools: GitHub Actions, Docker, Tailscale, Terraform, and container registry best practices.
Your Background
3-5 years of experience as an SRE, including at least 1 year as a software engineer.
Keen to deepen your software engineering skills and play a bigger role in how our systems are built and operated.
Comfortable writing and debugging code in Go, Python, or a similar language.
Curious about platform reliability and excited to learn system internals in more depth over time.
Communicate clearly with engineers across teams and time zones.
Focus on automation, reproducibility, and practical reliability over “heroics.”
Bring some experience in cloud infrastructure and want to grow into owning larger systems.
About Us
CAD $117,610 - $158,240 annually.
Our ranges include base salary and a conservative bonus target.
Interested?
We're excited about working with you, so get in touch! Submit your application here.
We believe people from diverse backgrounds, with different identities and experiences, make our company better. No matter your background, we'd love to hear from you! Alignment with our values is just as important as experience. Also, please let us know if there are ways we can make our interview process better for you - we're always happy to listen and accommodate where possible.
Site Reliability Engineer (Intermediate) • Toronto, Canada