Talent.com
Specialist Site Reliability Engineer
Global Talent Alliance, Canada • Montreal (administrative region), QC, CA
5 days ago
Job type
  • Full-time
Job description

About the job Specialist Site Reliability Engineer

(#11072)

The role of the Specialist Site Reliability Engineer (SRE) is to perform Reliability, Availability and Maintainability (RAM) analysis and engineering in support of the I&T solutions. The overall mandate is to ensure that these solutions are highly robust, reliable, and available. This involves system and product analysis, modeling, and requirements assessment during the development phase, as well as analysis of field RAM data to determine solution RAM KPIs and to drive corrective action programs. With the advent of cloud computing, there is also a need for a RAM specialist who is well versed in cloud-based technologies and cloud solution architectures.

Separate specializations may exist for hardware and software RAM. The technologies used are primarily distributed digital control systems, communication networks, Global Navigation Satellite Systems (GNSS), embedded and virtualized computing, and cloud-based solutions.

Main Responsibilities

Solution RAM Assessments

  • Review and approve solution requirements for RAM
  • Determine non-functional requirements and targets for RAM performance
  • Perform analysis and modeling to predict RAM behaviour
  • Adhere to the I&T Development Process

Solution RAM Field Performance

  • Assign requirements to solutions and products to ensure they support the ability to measure RAM Key Performance Indicators (KPIs)
  • Use the field performance measurement to identify key contributors and drive corrective action plans when necessary
  • Review vendor specifications, test results, analysis artifacts
  • Participate in failure review board for selected vendors
  • Review corrective action plans from the vendors
  • Drive to completion the vendor corrective action plans
Requirements

Experience

  • Minimum 5-10 years overall work experience
  • Minimum 5 years' experience in RAM engineering for complex systems, or 7 years' experience in product development for high-reliability / high-availability or safety-critical systems, with accountability for product field performance
Skills / Knowledge

  • Knowledge of hardware and / or software design and development practices and processes with focus on high reliability and high availability applications
  • Knowledge of RAM analysis techniques such as failure rate prediction, Reliability Block Diagrams (RBD), Markov models, Monte Carlo methods, Failure Modes Effects Analysis (FMEA), Fault Tree Analysis (FTA)
  • Analysis of reliability and failure field data, statistical estimation, Root Cause Analysis (RCA)
  • Critical thinking and judgement
  • Ability to assimilate new information quickly and apply to the assignment
  • Ability to deliver with autonomy
  • Organizing work to support multiple projects in parallel
Knowledge and / or experience in the following areas

  • Multi-Cloud / Multi-Zone-Based designs with High Availability (HA)
  • Compute Infrastructure : Google Compute Engine (GCE) (servers, databases, firewalls, load balancers, networking and storage)
  • Services for Google Cloud Platform (GCP)
  • Databases including NoSQL Databases, Big Data technologies (Oracle, SQL Server, Postgres, Spark, Hadoop, Cloud databases)
  • Application development concepts and technologies (CI / CD, Java, Python)
Education / Certification / Designation

  • Bachelor's degree in Electrical Engineering, Mechanical Engineering, Computer Science, Computer Engineering, or an equivalent degree and experience
Assets

  • Knowledge of product design and standards for the rail industry
  • Knowledge of rail industry or other transportation industry operations
Working Conditions

  • This role may require occasional business travel within North America in accordance with company policy
