Site Reliability Engineer

Tecsys Inc. • Montreal, QC, CA
30+ days ago
Job type
  • Full-time
  • Permanent
Job description

Having recognized the advantages of remote work, including improved employee morale and productivity and the benefits of reduced commuting for employee wellbeing and the environment, we are proud to be a digital-first company. The technologies and programs in which we have invested provide a strong foundation to this end. Our digital-first work environment, together with our conveniently located offices and collaborative workspaces, gives our team the freedom and flexibility to work in the way that makes them most productive.

About us

Tecsys is a fast-growing innovator offering supply chain solutions to organizations ranging from industry-leading healthcare systems, hospitals, and pharmacy businesses to distributors, retailers, and 3PLs. We work with industry leaders to transform their supply chains through technology. If you thrive on tackling interesting challenges with continuous learning opportunities, then Tecsys could be a good fit for you!

About the Role

We are looking for a Site Reliability Engineer to join our Network and Security Operations Center (NOC), a team at the heart of platform reliability for mission-critical SaaS environments. You will help maintain, optimize, and ensure the reliability and performance of the systems that power our cloud infrastructure across AWS and Kubernetes, with a strong focus on automation, observability, and continuous improvement. This role blends reliability engineering with incident command, giving you real ownership over uptime, performance, and innovation. You will be part of a highly skilled team that values creative problem-solving, operational excellence, and continuous improvement through automation and resilience engineering.

Responsibilities

  • Collaborate with engineering teams to support services from design through launch, including system design consulting, capacity planning, and launch reviews
  • Maintain service reliability post-deployment by monitoring availability, latency, and overall system health
  • Identify pain points and drive continuous improvements to enhance scalability, simplicity, and platform resilience
  • Own observability by developing and improving monitoring, alerting, dashboards, and defining SLOs/SLIs (Datadog)
  • Build and enhance automation, internal tooling, and IaC frameworks (Terraform, CI/CD) to enable scalable and self-healing systems
  • Scale systems sustainably through automation and reliability-focused improvements
  • Leverage AI tools (e.g., Amazon Kiro) to accelerate execution while validating outputs
  • Lead incident response, act as Incident Commander when required, and drive blameless postmortems with long-term fixes
  • Implement and maintain logging, monitoring, alerting, and SLA reporting practices
  • Create and maintain technical documentation and contribute to SRE best practices
  • Partner with platform engineering, deployment teams, and cross-functional stakeholders to support growth and system stability
  • Collaborate with internal teams and vendors globally to ensure high performance, availability, and reliability across environments

Requirements

Qualifications

  • Strong relevant experience in Site Reliability, Cloud, or DevOps Engineering in SaaS or large-scale production environments
  • Strong experience with AWS (multi-account, VPC, EC2, EKS) and Kubernetes at scale
  • Hands-on expertise with Infrastructure as Code and automation tools (Terraform, Ansible, or similar)
  • Experience with CI/CD pipelines and release automation (GitLab preferred, Jenkins acceptable)
  • Proficiency in monitoring and observability tools (Datadog or equivalent), including metrics, logging, alerting, and dashboards
  • Experience designing, deploying, and operating large-scale, distributed systems and multi-vendor platforms
  • Solid incident management experience, including on-call rotations, escalations, and postmortems
  • Strong scripting skills in Python, Bash, Java, or similar for automation and diagnostics
  • Familiarity with AI-assisted engineering tools (e.g., Amazon Kiro) and ability to validate outputs effectively
  • Basic knowledge of Java or .NET-based development environments
  • Proactive mindset with strong ownership, problem-solving, and knowledge-sharing habits
  • Willingness to participate in on-call rotations and occasional travel to the office (less than 10%)

We understand that experience comes in many forms and that careers are not always linear. If you don't meet every requirement in this posting, we still encourage you to apply.

At Tecsys, we are committed to fostering a diverse and inclusive workplace where all employees feel valued, respected, and empowered. We believe that diversity drives innovation and strengthens our ability to deliver exceptional solutions. We welcome and encourage applicants from all backgrounds, experiences, and perspectives to join our team.

Tecsys is an equal opportunity employer. Accommodation is available for applicants selected for an interview.

NB: To apply for this position, you must be a Canadian citizen or a permanent resident of Canada, or hold a valid Canadian work permit.

***

A Note on Our Hiring Process: We do not use AI to automatically screen or reject candidates. However, we do use specific screening questions to prioritize the most relevant applications for human review.

At Tecsys, we welcome the thoughtful use of AI tools to help you prepare your application, for example, to improve clarity, organize your resume, or practice interview responses. However, we ask that all information you provide reflects your real experience, and that any assessments or written submissions represent your own work and thinking.

During interviews, we expect candidates to engage without the use of AI tools, scripts, or real-time assistance. Authentic, direct conversation helps us get to know how you think, collaborate, and communicate. AI can support your preparation, but it shouldn’t speak or act on your behalf. We genuinely want to meet you.
