Senior Site Reliability Engineer
About the Position :
Job Title : Senior Site Reliability Engineer
Location : The candidate must be located in Canada or the USA. Our office is in Toronto, ON, Canada, but the role is remote / hybrid / flexible.
Reports to : VP, Technology
Position Overview :
We are on a mission to build an industry-leading product on a strong foundation built by a world-class engineering, product, and design team! We seek an experienced and dedicated Senior Site Reliability Engineer (SRE) to join our growing Engineering, Product, Design, and Growth team.
As a Senior SRE, you will be responsible for designing, implementing, and maintaining scalable and reliable infrastructure and systems, ensuring the highest levels of availability and performance for our platform.
The ideal candidate should have a deep understanding of cloud technologies and automation and a proven track record of building and managing complex distributed systems.
You will also collaborate with the broader Engineering, Product, Design, and Growth team to understand scalability challenges, development lifecycle bottlenecks, and pain points, make informed decisions about our technology, and deliver frictionless solutions for continuous development and release.
If you are a results-oriented Senior SRE who takes pride in your work, is obsessed with reliability, performance, security, and quality, thrives in a fast-paced, collaborative environment, and enjoys driving and owning your work - we want to hear from you!
We’re counting on you to :
- Infrastructure Design and Automation : Design, build, and maintain scalable and reliable infrastructure using automation tools and best practices, focusing on infrastructure as code (IaC) principles
- CI / CD : Build and improve robust CI / CD pipelines for engineers to release with minimal friction and high confidence
- System Reliability : Monitor, analyze, and optimize system performance, availability, reliability, and security, proactively identifying and mitigating potential issues before they impact users
- Incident Response and Resolution : Lead incident response efforts and training, ensuring timely resolution of incidents and conducting post-mortem analysis to identify root causes and prevent recurrence
- Operational Efficiency : Lead best practices and patterns for operating our systems by defining and enforcing rigorous SLAs for availability, performance, and security
- Capacity Planning : Collaborate with cross-functional teams to forecast capacity requirements and scale infrastructure to meet growing demand, optimizing resource utilization and cost efficiency
- Security and Compliance : Implement security best practices and compliance standards, ensuring the integrity and confidentiality of data and systems
- Continuous Improvement : Drive a culture of continuous improvement, implementing new technologies and processes to enhance system reliability, scalability, and efficiency
Requirements :
- Bachelor’s degree in Computer Science or a related technical field
- You have 5-8+ years of experience in site reliability engineering or a similar role, with a strong focus on building and maintaining highly available, scalable, and secure systems
- Proficiency in cloud platforms, in particular AWS, with hands-on experience in infrastructure provisioning and management tools like Terraform
- Expertise in containerization and orchestration technologies such as Docker, Kubernetes, or similar
- Strong scripting and programming skills
- Experience with monitoring and observability tools such as Datadog, Prometheus, Grafana, ELK stack, or similar
Who you are :
- Proactive and Curious - Excellent problem-solving skills and a proactive approach to troubleshooting and debugging complex distributed systems.
- Continuous Learner - Possesses a growth mindset and a strong commitment to learning and development.
- Accountable and Autonomous - Self-motivated to identify and independently solve a problem - taking solutions with some level of ambiguity from conception to release.
- Team Player - Strong communication and collaboration skills, with the ability to work effectively in a cross-functional team environment.
Our Tech Stack :
- .NET, AngularJS / jQuery, Angular, and TypeScript
- Mobile : Cordova, Java, Objective-C
- MongoDB, S3, SQS, Lambda - AWS
- CoffeeScript, Puma, Ruby on Rails, Postgres - Heroku
- Bitbucket, GitHub, Trello, Jira, Slack
Our Perks and Benefits :
- Unlimited Vacation : We believe you can be highly productive and still have plenty of time for life outside of work.
- Generous health benefits plan : Coverage starts from Day 1 and includes vision & dental.
- Choose your device : Are you Team Windows or Apple? You shouldn’t have to compromise, especially if you work more efficiently on a specific operating system. When you join us, you get to pick!
- Home Office Allowance : $500 / year to ensure your home office is set up for optimal comfort and productivity.
- Health & Wellness Allowance : $750 / year to support your health & wellness-related goals and hobbies.
- Learning & Development Allowance : $1000 / year to explore a new skill, attend a conference, read some new books, etc.
- Fully Remote : Work from the comfort of your own home with the choice to access our downtown Toronto office for a change of scenery.
- Events & Free Lunches : We prioritize weekly team bonding and monthly company-wide social events with a lunch stipend.
We pride ourselves on maintaining a culture where everyone feels engaged, inspired, and excited to come to work every day.