Specialist Site Reliability Engineer

Global Talent Alliance, Canada • Montreal (administrative region), QC, CA

5 days ago

Job type

Full-time

Job description

About the job Specialist Site Reliability Engineer

(#11072)

The Specialist Site Reliability Engineer (SRE) performs Reliability, Availability and Maintainability (RAM) analysis and engineering in support of the I&T solutions. The overall mandate is to ensure that these solutions are highly robust, reliable, and available. This involves system and product analysis, modeling, and requirements assessment during the development phase, as well as analysis of field RAM data to determine solution RAM KPIs and to drive corrective action programs. With the advent of cloud computing, there is also a need for a RAM specialist who is well versed in cloud-based technologies and in solution architectures for the cloud.

Separate specializations may exist for hardware and software RAM. The technologies used are primarily distributed digital control systems, communication networks, Global Navigation Satellite Systems (GNSS), embedded and virtualized computing, and cloud-based solutions.

Main Responsibilities

Solution RAM Assessments

Review and approve solution requirements for RAM
Determine non-functional requirements and targets for RAM performance
Perform analysis and modeling to predict RAM behaviour
Adhere to the I&T Development Process

Solution RAM Field Performance

Assign requirements to solutions and products to ensure they support the ability to measure RAM Key Performance Indicators (KPIs)

Use the field performance measurement to identify key contributors and drive corrective action plans when necessary

Review vendor specifications, test results, analysis artifacts

Participate in failure review board for selected vendors

Review corrective action plans from the vendors

Drive to completion the vendor corrective action plans

Requirements

Experience

Minimum 5 to 10 years of overall work experience

Minimum 5 years of experience in RAM engineering for complex systems, or 7 years of experience in product development for high-reliability / high-availability or safety-critical systems, with accountability for product field performance

Skills / Knowledge

Knowledge of hardware and/or software design and development practices and processes, with a focus on high-reliability and high-availability applications

Knowledge of RAM analysis techniques such as failure rate prediction, Reliability Block Diagrams (RBD), Markov models, Monte Carlo methods, Failure Modes and Effects Analysis (FMEA), and Fault Tree Analysis (FTA)

Analysis of reliability and failure field data, statistical estimation, Root Cause Analysis (RCA)

Critical thinking and judgement

Ability to assimilate new information quickly and apply to the assignment

Ability to deliver with autonomy

Organizing work to support multiple projects in parallel

Knowledge and/or experience in the following areas

Multi-Cloud / Multi-Zone-Based designs with High Availability (HA)

Compute Infrastructure: Google Compute Engine (GCE) (servers, databases, firewalls, load balancers, networking and storage)

Services for Google Cloud Platform (GCP)

Databases including NoSQL Databases, Big Data technologies (Oracle, SQL Server, Postgres, Spark, Hadoop, Cloud databases)

Application development concepts and technologies (CI/CD, Java, Python)

Education / Certification / Designation

Bachelor's degree in Electrical Engineering, Mechanical Engineering, Computer Science, Computer Engineering, or an equivalent degree and experience

Assets

Knowledge of product design and standards for the rail industry

Knowledge of rail industry or other transportation industry operations

Working Conditions

This role may require occasional business travel within North America in accordance with company policy
