Senior Site Reliability Engineer

Practice Better
Canada
$100K-$130K a year (estimated)
Full-time

About the Position :

Job Title : Senior Site Reliability Engineer

Location : The candidate must be located in Canada or the USA. Our office is in Toronto, ON, Canada, but the role is remote / hybrid / flexible.

Reports to : VP, Technology

Position Overview :

We are on a mission to build an industry-leading product on a strong foundation built by a world-class engineering, product, and design team! We seek an experienced and dedicated Senior Site Reliability Engineer (SRE) to join our growing Engineering, Product, Design, and Growth team.

As a Senior SRE, you will be responsible for designing, implementing, and maintaining scalable and reliable infrastructure and systems, ensuring the highest levels of availability and performance for our platform.

The ideal candidate should have a deep understanding of cloud technologies and automation and a proven track record of building and managing complex distributed systems.

You will also collaborate with the broader Engineering, Product, Design, and Growth team to understand scalability challenges, development lifecycle bottlenecks, and pain points, make informed decisions about our technology, and deliver frictionless solutions for continuous development and release.

If you are a results-oriented Senior SRE who takes pride in your work, is obsessed with reliability, performance, security, and quality, thrives in a fast-paced, collaborative environment, and enjoys driving and owning your work, we want to hear from you!

We’re counting on you to :

  • Infrastructure Design and Automation : Design, build, and maintain scalable and reliable infrastructure using automation tools and best practices, focusing on infrastructure as code (IaC) principles
  • CI / CD : Build and improve robust CI / CD pipelines for engineers to release with minimal friction and high confidence
  • System Reliability : Monitor, analyze, and optimize system performance, availability, reliability, and security, proactively identifying and mitigating potential issues before they impact users
  • Incident Response and Resolution : Lead incident response efforts and training, ensuring timely resolution of incidents and conducting post-mortem analysis to identify root causes and prevent recurrence
  • Operational Efficiency : Lead best practices and patterns for operating our systems by defining and enforcing rigorous SLAs for availability, performance, and security
  • Capacity Planning : Collaborate with cross-functional teams to forecast capacity requirements and scale infrastructure to meet growing demand, optimizing resource utilization and cost efficiency
  • Security and Compliance : Implement security best practices and compliance standards, ensuring the integrity and confidentiality of data and systems
  • Continuous Improvement : Drive a culture of continuous improvement, implementing new technologies and processes to enhance system reliability, scalability, and efficiency

Requirements :

  • Bachelor’s degree in Computer Science or a related technical field
  • 5-8+ years of experience in site reliability engineering or a similar role, with a strong focus on building and maintaining highly available, scalable, and secure systems
  • Proficiency in cloud platforms, in particular AWS, with hands-on experience in infrastructure provisioning and management tools like Terraform
  • Expertise in containerization and orchestration technologies such as Docker, Kubernetes, or similar
  • Strong scripting and programming skills
  • Experience with monitoring and observability tools such as Datadog, Prometheus, Grafana, ELK stack, or similar

Who you are :

  • Proactive and Curious - Excellent problem-solving skills and a proactive approach to troubleshooting and debugging complex distributed systems.
  • Continuous Learner - A growth mindset and a strong commitment to learning and development.
  • Accountable and Autonomous - Self-motivated to identify and independently solve problems, taking solutions with some level of ambiguity from conception to release.
  • Team Player - Strong communication and collaboration skills, with the ability to work effectively in a cross-functional team environment.

Our Tech Stack :

  • .NET, AngularJS / jQuery, Angular, and TypeScript
  • Mobile - Cordova, Java, Objective-C
  • MongoDB, S3, SQS, Lambda - AWS
  • CoffeeScript, Puma, Ruby on Rails, Postgres - Heroku
  • Bitbucket, GitHub, Trello, Jira, Slack

Our Perks and Benefits :

  • Unlimited Vacation : We believe you can be highly productive and still have plenty of time for life outside of work.
  • Generous health benefits plan : Coverage starts from Day 1 and includes vision & dental.
  • Choose your device : Are you Team Windows or Apple? You shouldn’t have to compromise, especially if you work more efficiently on a specific operating system. When you join us, you get to pick!

  • Home Office Allowance : $500 / year to ensure your home office is set up for optimal comfort and productivity.
  • Health & Wellness Allowance : $750 / year to support your health & wellness-related goals and hobbies.
  • Learning & Development Allowance : $1000 / year to explore a new skill, attend a conference, read some new books, etc.
  • Fully Remote : Work from the comfort of your own home with the choice to access our downtown Toronto office for a change of scenery.
  • Events & Free Lunches : We prioritize weekly team bonding and monthly company-wide social events with a lunch stipend.

We pride ourselves on maintaining a culture where everyone feels engaged, inspired, and excited to come to work every day.
