Staff Site Reliability Engineer

Lightspeed

Toronto, Ontario, Canada

$81.1K-$197.2K a year (estimated)

Full-time

Hi there! Thanks for stopping by!

Are you actively looking for a new opportunity? Or just checking the market? Well, you might just be in the right place!

We’re looking for a Staff Site Reliability Engineer to join our NuORDER by Lightspeed team in North America. NuORDER by Lightspeed builds software solutions that help merchants grow the size and profitability of their businesses.

You'll join a team responsible for supporting the group in cross-cutting concerns, such as cloud infrastructure, reliability and incident management, data warehousing and analytics, cost transparency and efficiency, and much more.

You will also support our growing development teams with the infrastructure and tools they need to continue scaling. You will build and support multi-region infrastructure and networks, and help run our products in a reliable, efficient and secure manner by implementing, advising on and advocating for well-known DevOps principles.

What you’ll be doing:

Work closely with development teams to empower them with the necessary tools and practices for monitoring software health in production, defining and measuring reliability metrics (SLI, SLO), and managing error budgets.
Design, build and maintain robust infrastructure on GCP, leveraging cloud-native technologies such as GKE, Cloud SQL and BigQuery.
Develop and manage CI/CD pipelines for efficient deployment and release using a number of technologies (GitLab, GitHub, Helm, Terraform, etc.).
Drive the incident management process and conduct post-mortem analyses to prevent future outages.
Mentor junior SREs and developers, providing guidance on best practices in cloud architecture, data management, and software development.
Conduct system performance benchmarks and implement enhancements to improve system reliability and throughput.
Collaborate with cross-functional teams to identify, design, and implement internal process improvements in a cost-efficient manner.
Design and build robust, scalable, and highly available systems.
Build platform solutions and apply software engineering principles to improve the reliability of our software and accelerate software delivery.
Manage infrastructure change through infrastructure as code (IaC).
Be part of our on-call rotation.
Stay current with industry trends and emerging technologies, advocating for the adoption of new technologies and practices that improve product quality and team efficiency.

What you need to bring:

Bachelor’s degree in Computer Science or Engineering, or an equivalent level of real-world experience.
8-10 years of experience across site reliability engineering, systems administration, and/or software engineering.
Strong expertise in container orchestration platforms, specifically Kubernetes.
Strong understanding of both relational (e.g., PostgreSQL, MySQL) and NoSQL databases (e.g., MongoDB, Cassandra, Redis).
Deep understanding of network protocols and IP networking, as well as experience with network troubleshooting.
Proficiency in programming languages such as Java, Python or Go.
Proven track record of managing large-scale infrastructure in cloud environments, such as Google Cloud, AWS or Azure.
Experience with monitoring tools (e.g., Prometheus, Grafana, Datadog) and logging solutions (e.g., ELK stack).
Strong understanding of security best practices.
Exceptional problem-solving skills and the ability to work under pressure to troubleshoot and resolve complex issues.
Excellent communication skills to effectively collaborate with cross-functional teams.
Strong leadership skills, capable of leading projects and influencing engineering decisions across the organization.

We know that people are more than what’s on their CV. If you’re unsure that you have the right profile for the role... hit the ‘Apply’ button and give it a try!

What’s in it for you?

Come live the Lightspeed experience...

Ability to do your job in a truly flexible environment;
Genuine career opportunities in a company that’s creating new jobs every day;
Work in a team big enough for growth but lean enough to make a real impact.

and enjoy a range of benefits that’ll keep you happy, healthy and (not) hungry:

Lightspeed share scheme (we are all owners)
Lightspeed RSU program (we are all owners)
Unlimited paid time off policy
Flexible working policy
Health insurance
Health and wellness benefits
Paid leave assistance for new parents
LinkedIn Learning
Volunteer day

2 days ago
