Talent.com
Specialist Site Reliability Engineer
Global Talent Alliance, Canada • Montreal (administrative region), QC, CA
6 days ago
Job type
  • Full-time
Job description

About the job: Specialist Site Reliability Engineer

(#11072)

The role of the Specialist Site Reliability Engineer (SRE) is to carry out RAM (reliability, availability and maintainability) analysis and engineering in support of the I&T solutions. The overall mandate is to ensure that these solutions are highly robust, reliable and available. This involves system and product analysis, modeling and requirements assessment during the development phase, as well as the analysis of field RAM data to determine solution RAM KPIs and to drive corrective action programs. With the advent of cloud computing, there is also a need for a RAM specialist who is well versed in cloud-based technologies and in solution architectures for the cloud.

Separate specializations may exist for hardware and software RAM. The technologies involved are primarily distributed digital control systems, communication networks, Global Navigation Satellite Systems (GNSS), embedded and virtualized computing, and cloud-based solutions.

Main Responsibilities

Solution RAM Assessments

  • Review and approve solution requirements for RAM
  • Determine non-functional requirements and targets for RAM performance
  • Perform analysis and modeling to predict RAM behaviour
  • Adhere to the I&T Development Process

Solution RAM Field Performance

  • Allocate requirements to solutions and products to ensure that RAM Key Performance Indicators (KPIs) can be measured
  • Use the field performance measurement to identify key contributors and drive corrective action plans when necessary
  • Review vendor specifications, test results and analysis artifacts
  • Participate in failure review boards for selected vendors
  • Review vendor corrective action plans
  • Drive vendor corrective action plans to completion
Requirements

Experience

  • Minimum 5-10 years of overall work experience
  • Minimum 5 years of experience in RAM engineering for complex systems, or 7 years of experience in product development for high-reliability / high-availability or safety-critical systems, with accountability for product field performance

Skills / Knowledge

  • Knowledge of hardware and / or software design and development practices and processes, with a focus on high-reliability and high-availability applications

  • Knowledge of RAM analysis techniques such as failure rate prediction, Reliability Block Diagrams (RBD), Markov models, Monte Carlo methods, Failure Modes and Effects Analysis (FMEA) and Fault Tree Analysis (FTA)
  • Analysis of reliability and failure field data, statistical estimation, Root Cause Analysis (RCA)
  • Critical thinking and judgement
  • Ability to assimilate new information quickly and apply it to the assignment
  • Ability to deliver with autonomy
  • Ability to organize work to support multiple projects in parallel
Knowledge and / or experience in the following areas:

  • Multi-Cloud / Multi-Zone-Based designs with High Availability (HA)
  • Compute infrastructure: Google Compute Engine (GCE) (servers, databases, firewalls, load balancers, networking and storage)
  • Google Cloud Platform (GCP) services
  • Databases, including NoSQL and cloud databases, and Big Data technologies (e.g., Oracle, SQL Server, Postgres, Spark, Hadoop)
  • Application development concepts and technologies (CI / CD, Java, Python)
Education / Certification / Designation

  • Bachelor's degree in Electrical Engineering, Mechanical Engineering, Computer Science or Computer Engineering, or an equivalent degree and experience
Assets

  • Knowledge of product design and standards for the rail industry
  • Knowledge of rail industry or other transportation industry operations
Working Conditions

This role may require occasional business travel within North America, in accordance with company policy.


    Similar jobs
    Site Reliability Engineer

    TMC Canada • Montreal
    Full-time
    The Application Infrastructure (AI) department is seeking a Site Reliability Engineer (SRE) to help drive the reliability engineering, operations and customer support services for Morgan Stanley's ...
    Last updated: 2 days ago • Promoted
    Reliability Specialists

    Laurentide Controls ltd • Montreal
    Full-time
    Come join the largest supplier of automation and reliability solutions in our region. Industry thrive in Eastern Canada. Several positions for contract assignments are available, offering flexible wo...
    Last updated: 13 days ago • Promoted
    Reliability Engineering Lead (contract)

    Capgemini • Montreal
    Full-time
    Mirabel facility in Qu...
    Last updated: 30+ days ago • Promoted
    Site Reliability Engineer

    High Tech Genesis • Montreal, QC, CA
    Full-time
    At HTG, you’ll push boundaries with the latest tech and collaborate with a team that loves what they do. Be part of a design services company that is amongst the companies that lead the world in tec...
    Last updated: 30+ days ago
    Site Reliability Engineer

    ApTask • Montreal
    Full-time
    Looking for an intermediate between 2 to 5 years' experience. The Application Infrastructure (AI) department is seeking a Site Reliability Engineer (SRE) to...
    Last updated: 30+ days ago • Promoted
    Site Reliability Engineer

    AKUR8 • Montreal
    Full-time
    Akur8 is a fast-growing Insurtech scale‑up that transforms insurance pricing and reserving with transparent machine learning. Our SaaS platform injects speed, performance and reliability into insure...
    Last updated: 30+ days ago • Promoted
    Site Reliability Engineering Specialist (Hybrid)

    Morgan Stanley • Montreal
    Full-time
    We're seeking someone to join our Data Protection Fleet as a Site Reliability Engineering (SRE) Spe...
    Last updated: 14 days ago • Promoted
    Specialist Site Reliability Engineer

    Global Talent Alliance, Canada • Montreal
    Full-time
    The role of the Specialist Site Reliability Engineer (SRE) is to execute RAM analysis and engineering in support of the I&T solutions. The overall ...
    Last updated: 7 days ago • Promoted
    Site Reliability Engineer

    Compunnel, Inc. • Montreal
    Full-time
    Client’s Application Infrastructure (AI) division is seeking a Site Reliability Engineer (SRE) to join the Client Development Environment team. This role is focused on driving reliability, operation...
    Last updated: 30+ days ago • Promoted
    Site Reliability Engineer (SRE)

    Open Systems Technologies • Montreal
    Full-time
    The Application Infrastructure (AI) department is seeking a Site Reliability Engineer (SRE) to help drive the reliability engineering, operations and customer support services for Morgan Stanley's ...
    Last updated: 2 days ago • Promoted
    NetSuite Systems Engineer

    MADE BY GATHER • Montreal, QC, Canada
    Full-time
    Founded in 2003 by entrepreneur Shae Hong, Made by Gather makes super-premium design and innovation accessible to the world through our kitchenware and lifestyle brands. We’ve spent 20+ years develo...
    Last updated: 23 days ago • Promoted
    Site Reliability Engineering Specialist (Hybrid)

    PowerToFly • Montreal
    Full-time
    We're seeking someone to join our Data Protection Fleet as a Site Reliability Engineering (SRE) Specialist in Cyber to help drive performance, reliability, enhanced observability and efficiency for...
    Last updated: 14 days ago • Promoted
    Senior Site Reliability Expert (Retail)

    Lightspeed • Montreal
    Full-time
    Are you actively seeking a new opportunity, or simply exploring the market? Either way, you might have just found the right place! We’re looking for a Senior SRE to join our Lightspeed Retail group...
    Last updated: 2 days ago • Promoted
    Site Reliability Engineer w / Python (Onsite Hybrid)

    NTT DATA North America • Montreal
    Full-time
    Site Reliability Engineer / ServiceNow SaaS (Onsite Hybrid). NTT DATA is seeking a Site Reliability Engineer to join our Montreal, Quebec, Canada team. The position is onsite‑hybrid, requiring office a...
    Last updated: 30+ days ago • Promoted
    Senior Site Reliability Engineer

    Targeted Talent • Montreal, QC, Canada
    Permanent
    We are looking for an experienced Senior Site Reliability Engineer. Our client is a global enterprise company with a product that you've likely used. Experience with coding / software development, ...
    Last updated: 30+ days ago • Promoted
    Site Reliability Engineer / Platform Operations Engineer

    Targeted Talent • Montreal, QC, Canada
    Permanent
    We are looking for an experienced Site Reliability Engineer or Platform Operations Engineer for our client. This is a permanent position that is remote to start with later relocation to. Our client i...
    Last updated: 30+ days ago • Promoted
    Senior Site Reliability Engineer (SRE)

    Intelcom | Dragonfly • Montreal
    Full-time
    Incident Management: Detect and respond to issues, ensuring rapid recovery to minimize downtime. Curren...
    Last updated: 30+ days ago • Promoted
    Site Reliability Engineer w / Python (Onsite Hybrid)

    NTT DATA, Inc. • Montreal
    Full-time
    NTT DATA strives to hire exceptional, innovative and passionate individuals who want to grow with us. If you want to be part of an inclusive, adapt...
    Last updated: 30+ days ago • Promoted