Lead Site Reliability Engineer - Remote

Purple Drive
Ottawa West, Ontario, Canada
$74.9K-$142.1K a year (estimated)
Remote
Full-time

1. SRE Implementations : Look for candidates who have experience implementing SRE principles, including the establishment of Service Level Indicators (SLIs), Service Level Objectives (SLOs), and Error Budgets to ensure system reliability and availability.

2. Observability : Search for keywords related to observability, including familiarity with concepts such as full-stack observability and distributed tracing.

3. Tool Proficiency : Datadog, CloudWatch, Synthetic Monitoring tools

4. Building SRE Culture : Evaluate candidates based on their ability to develop SRE frameworks within organizations, such as creating SRE charters and fostering a culture of reliability and accountability across teams.

5. Automation : Look for candidates with extensive experience in automation, including the automation of repetitive tasks, infrastructure provisioning, and deployment processes, to streamline operations and enhance efficiency.

6. Chaos Engineering : Consider candidates who have experience in Chaos Engineering practices and related tools, demonstrating their ability to proactively identify system weaknesses and improve resilience through controlled experiments.

Job Details :

Lead and mentor a team of SREs to ensure operational excellence and maximize the reliability and availability of client systems.

Minimum 10 years of work experience in DevOps / SRE, including leadership roles.

Architect and design highly scalable and available infrastructure solutions, integrating best practices in reliability engineering and automation.

Collaborate with cross-functional teams (DevOps, Development, IT) to implement SRE principles throughout the software development life cycle.

Establish and manage Service Level Objectives (SLOs) and Service Level Indicators (SLIs) for critical services, monitoring and maintaining performance against defined targets.

Implement and enhance observability, alerting, and incident response processes to proactively address issues and minimize downtime.

Drive continuous improvement initiatives, identifying bottlenecks and optimizing the infrastructure and application stack.

Develop and maintain documentation related to system architecture, configuration, and procedures.

Stay current with industry trends, recommending and adopting new tools and practices to enhance system reliability.

Qualifications :

Strong background in designing and implementing highly available and scalable infrastructure.

Proficiency in scripting and automation using Python or Shell.

Experience with container orchestration platforms, serverless architectures, CI / CD pipelines, and IaC implementations (Ansible and Terraform).

Experience with Observability tools (preferred : Datadog, CloudWatch).

In-depth knowledge of cloud computing platforms (preferred : AWS).

Solid understanding of SRE / DevOps principles and practices.

Excellent problem-solving skills with the ability to troubleshoot complex issues in production environments.

Strong communication and leadership skills, fostering effective collaboration with cross-functional teams.

Relevant certifications in SRE, DevOps, or Cloud are a plus.

16 days ago