Site Reliability Engineer

SAP SE
Montreal, QC, Canada
$100K-$120K a year (estimated)
Full-time

We help the world run better

At SAP, we enable you to bring out your best. Our company culture is focused on collaboration and a shared passion to help the world run better.

We focus every day on building the foundation for tomorrow and creating a workplace that embraces differences, values flexibility, and is aligned to our purpose-driven and future-focused work.

We offer a highly collaborative, caring team environment with a strong focus on learning and development, recognition for your individual contributions, and a variety of benefit options for you to choose from.

Montreal location.

PURPOSE AND OBJECTIVES

The Reliability Engineering organization provides a multitude of products and services related to operations and continuity of business delivery.

The Site Reliability Engineering teams make the SAP Business Technology Platform run better by applying SRE principles to provide 24x7 deep technical coverage for Incident Management (outages and other incidents with major customer impact).

We share a Live Site First culture and care for the business continuity of our customers running mission-critical applications in the Cloud.

We are looking for an engineer to join an already established SRE team for the SAP Business Technology Platform.

EXPECTATIONS AND TASKS

As a Site Reliability Engineer, you will have the opportunity to operate and support business-critical Cloud services. As part of your daily job, you will proactively monitor the service behavior and identify areas for improvement.

You will participate in the development of tools for monitoring and troubleshooting cloud services built on the latest open source and SAP technologies, following SRE principles.

Responsibilities:

  • Act as the technical expert during Live Site incidents (downtime of supported services in scope), and investigate and solve incidents at a deep technical level.
  • Drive root cause analysis and follow-up improvements to prevent issues from recurring.
  • Perform in-depth troubleshooting and log analysis to identify and solve complex issues in accordance with internal and external SLAs.
  • Build software-based solutions to address improvements in service reliability and stability.
  • Enhance infrastructure and platform monitoring by gathering system metrics (4 Golden Signals) and implementing tools for recovery.
  • Integrate and collaborate closely with development teams and work with them on outputs from Postmortems and product improvements.
  • Learn new technologies and keep up to date with the latest development increments.
  • Create and maintain technical documentation.
  • Define, advocate, and apply SRE best practices.
  • Participate in the on-call rotation (follow-the-sun approach) to react to major incidents. On-call duty comes with a special compensation package.

If you are interested in software engineering based on cutting-edge technology, you will find an inspiring and professional environment for your learning and growth.

You will work in close collaboration with the development teams that build the services for which we share responsibility.

We emphasize teamwork and a trust-based working model. Collaboration with other teams in an international environment will be a regular part of your work.

EDUCATION AND QUALIFICATIONS / SKILLS AND COMPETENCIES

Required Skills and Competencies:

  • Experience with Kubernetes and a good understanding of container technologies.
  • Understanding of modern cloud architectures (experience with cloud platforms such as AWS, Azure, or GCP is a plus).
  • Scripting skills and CI/CD experience (Concourse, GitHub Actions, and Argo CD are a plus), with enthusiasm for automation.
  • Ability to work efficiently in emergency situations and to quickly analyze and solve problems in a global team setup.
  • Excellent team player, passionate about their work, self-motivated and driven.
  • Excellent communication skills: precise and fact-based.
  • Fluency in English, basic French.

Preferred Additional Skills and Competencies:

  • Coding experience with Python, Bash, or Go.
  • CKA / CKAD / CKS certifications.
  • Experience with Unix / Linux operating systems.
  • Experience with modern monitoring, logging, and alerting tools (Grafana, Prometheus, Kibana, Loki, Splunk On-Call, Dynatrace).
  • Security best practices for application development and operations in a public Cloud Environment.
  • Contribution to open-source projects.

Bring out your best

SAP innovations help more than four hundred thousand customers worldwide work together more efficiently and use business insight more effectively.

Originally known for leadership in enterprise resource planning (ERP) software, SAP has evolved to become a market leader in end-to-end business application software and related services for database, analytics, intelligent technologies, and experience management.

As a cloud company with two hundred million users and more than one hundred thousand employees worldwide, we are purpose-driven and future-focused, with a highly collaborative team ethic and commitment to personal development.

We win with inclusion

SAP’s culture of inclusion, focus on health and well-being, and flexible working models help ensure that everyone regardless of background feels included and can run at their best.

SAP believes we are made stronger by the unique capabilities and qualities that each person brings to our company, and we invest in our employees to inspire confidence and help everyone realize their full potential.

EOE AA M/F/Vet/Disability:

Qualified applicants will receive consideration for employment without regard to their age, race, religion, national origin, ethnicity, gender (including pregnancy, childbirth, etc.), sexual orientation, gender identity or expression, protected veteran status, or disability.

7 days ago