
Site Reliability Engineer (SRE)

Alltech Consulting Services
Montreal, QC, Canada
$60-$70 / hour (estimated)
Full-time

Job Description

Level 4

The Application Infrastructure (AI) department is seeking a Site Reliability Engineer (SRE) to help drive the reliability engineering, operations, and customer support services for the Company’s ServiceNow SaaS implementation.

Reporting to a Site Reliability Engineering & Operations Lead, this role requires delivering a range of SRE practices within a global community of other SREs.

This means teaming up with colleagues to deliver reliable, resilient systems without wasteful operational effort. SRE practices include:

  • Task optimization and automation
  • Prioritizing technical debt
  • Observability and monitoring dashboards
  • Capacity management
  • Incident response
  • Problem elimination

This position specializes in ServiceNow Software as a Service, which provides a suite of IT service management capabilities and is integrated with many products, such as chatbot technology and on-call escalation incident management, as well as a range of on-premises infrastructure (including SQL databases, APIs, and web infrastructure).

Despite the focus on value-add development and process delivery, this is also a production-side operational role requiring participation in an on-call rotation from time to time.

Successful candidates for SRE roles in Application Infrastructure have so far come from a variety of backgrounds; they could be:

  • A developer today looking to evolve site reliability as a practice
  • An infrastructure specialist with an interest in reliability and resilience principles
  • A strong system administrator who enjoys troubleshooting and has some task automation experience

Prior experience in the financial services industry is not required, and we welcome candidates from all industries and backgrounds to apply.

Responsibilities include:

  • Delivery of improvements that will maximize the availability and performance of supported systems through optimized and automated operational tasks, collaborating on the development of operational tools, ongoing problem management, and architecture reviews with colleagues.
  • Troubleshooting ServiceNow issues, and occasionally some on-premises capabilities in a Linux environment, collaborating with others to determine the root cause of issues, and agreeing on lasting improvements that can be made.
  • Exploring and delivering observability including metrics, logging, tracing, and alerting that can define and measure the target reliability of a product.
  • Being dependable and responsive during agreed hours, such as when part of the on-call rotation with the rest of the global team (with a time-off-in-lieu system).
  • A commitment to understanding the Firm’s ServiceNow instances and related dependencies, and contributing to their documentation.
  • Identification and prioritization of technical debt that can impact client satisfaction or operational efficiency.
  • Giving feedback on policies and procedures related to the delivery of SRE and operational practices, with a view to continually making the Firm safer and more efficient.

Skills required:

  • The ideal candidate would have at least one of the following:
  • ServiceNow administration or development experience
  • Software development skills in one or more programming languages, e.g. Python
  • Proficient oral and written communication skills
  • Establishing warm, effective relationships with colleagues to collaborate on successful delivery
  • A dependable team worker with a demonstrated commitment to client service
  • Ability to respond appropriately during occasional technical emergencies, such as outages.

Skills desired:

ServiceNow administration or development experience, although this can be acquired by the successful candidate through on-the-job training and learning.


Posted 7 days ago