Talent.com
Principal Site Reliability Engineering Specialist (SRE)
CGI • Vancouver, Canada
16 days ago
Job type
  • Full-time
Job description

Position Description:

Location: Edmonton

Open to other locations within proximity to a CGI Office

Hybrid work model

We are hiring a Senior Site Reliability Engineer (SRE) with a strong foundation in building and operating reliable, scalable, and resilient cloud platforms. You bring a reliability and performance engineering mindset to everything you do, balancing operational stability with modernization and automation.

In this role, you will apply core SRE practices, including SLIs/SLOs, observability, incident management, and operational automation, while temporarily supporting a regional support strategy engagement focused on assessing and strengthening large-scale operational environments. You will work closely with platform, operations, and architecture teams to evaluate current-state practices, identify reliability and support gaps, and contribute to the definition of future-state operating models and implementation roadmaps.

Beyond this engagement, the role is designed for ongoing, hands-on SRE delivery, where you will lead and implement monitoring, reliability engineering, automation, and tooling across cloud and hybrid environments. You will collaborate with cross-functional teams to design, build, and continuously improve platform reliability, engineering standards, and operational excellence practices for mission-critical services.

This position places you in a client-facing, high-impact environment, where your technical depth, operational judgment, and ability to translate reliability principles into practical outcomes will directly influence service stability, modernization efforts, and future cloud initiatives. If you are a proven SRE who thrives in complex environments and values both hands-on engineering and operational leadership, this role offers the opportunity to make a meaningful and lasting impact.

Your future duties and responsibilities:

Who are you?

You are a senior Site Reliability Engineer who thrives on solving complex reliability and operational challenges at scale. You are curious, collaborative, and continuously focused on improving how platforms, infrastructure, and services are operated and supported. Your strength lies in applying sound engineering judgment to real-world operational problems, balancing reliability, performance, and maintainability. You are equally comfortable working hands-on with tools and systems and stepping back to assess how operational practices, support models, and workflows impact service reliability. You can engage confidently in technical discussions with engineers while also communicating clearly with operational leaders and stakeholders to explain risks, trade-offs, and improvement opportunities.

With a mindset grounded in continuous improvement and learning, you champion modernization, automation, and pragmatic reliability practices. You are trusted for your ability to identify root causes rather than symptoms, to raise concerns early, and to translate reliability principles into practical, actionable outcomes. Your peers value your technical depth and calm leadership in complex environments, and teams rely on you to elevate operational maturity and execution quality. At CGI, we recognize strong SRE practitioners and provide the environment and support for them to grow, contribute, and make a meaningful impact across engagements.

Responsibilities

  • Develop, operate, and evolve monitoring, logging, and alerting capabilities across cloud and hybrid environments, while temporarily contributing SRE expertise to assess and rationalize existing operational monitoring practices as part of a regional support strategy initiative.
  • Define, implement, and continuously improve SLIs, SLOs, and SLAs for platform and service reliability, applying these principles during the engagement to evaluate current-state service outcomes and inform future-state reliability targets.
  • Lead and participate in incident response, problem investigation, and root cause analysis, leveraging hands-on SRE experience to identify systemic reliability issues and recurring operational failure patterns observed across regional support operations.
  • Design and automate reliability and operational processes, including integration with CI/CD pipelines and operational workflows, while contributing insights into where automation and tooling can reduce manual effort and improve support consistency across regions.
  • Collaborate closely with DevOps, platform engineering, architecture, and application teams, providing SRE leadership during this engagement and transitioning seamlessly to tool- and platform-heavy delivery roles on future projects.
  • Analyze and document current operational workflows, support models, and escalation paths, translating frontline operational insights into actionable reliability and service improvement recommendations.
  • Contribute to the definition of future-state operating models and implementation roadmaps by applying SRE and operational excellence principles to improve reliability, supportability, and scalability.
  • Provide regular status updates and risk assessments, highlighting operational risks, dependencies, and reliability impacts to support informed decision-making.

Required qualifications to be successful in this role:

  • 5+ years of experience in Site Reliability Engineering, platform engineering, or infrastructure operations, with demonstrated ability to apply reliability principles across both delivery and operational contexts.
  • Strong proficiency with observability and monitoring platforms such as Grafana, Prometheus, ELK, New Relic, or equivalent, with the ability to assess, design, and improve monitoring strategies in complex environments.
  • Hands-on experience operating cloud platforms (Azure, AWS, and/or GCP), including production support, reliability engineering, and operational troubleshooting.
  • Strong automation and scripting skills using tools such as Python, Bash, Ansible, or equivalent, with a mindset focused on reducing toil and improving operational efficiency.
  • Excellent communication skills in English (French considered an asset), with the ability to clearly articulate technical concepts to both technical and non-technical stakeholders.
  • Proven track record of improving system reliability, availability, and operational stability, including measurable reductions in incident frequency or impact.
  • Experience analyzing and documenting operational workflows, support models, and escalation paths within IT or platform operations environments.
  • Ability to facilitate technical and operational workshops with engineers, operations teams, and service stakeholders to validate findings and align on improvements.
  • Working knowledge of ITSM/ITIL practices (Incident, Problem, Change), particularly as they relate to reliability, supportability, and operational maturity.
  • Experience working in regulated, enterprise, or public-sector environments where documentation quality, security classification, and auditability are required.

CGI is providing a reasonable estimate of the pay range for this role. The determination of this range includes factors such as skill set level, geographic market, experience and training, and licenses and certifications. Compensation decisions depend on the facts and circumstances of each case. A reasonable estimate of the current range is $90,–$,. This role is a future opportunity.

    Use of the term ‘engineering’ in this job posting refers to the technical sense related to Information Technology (IT) and does not imply that the individual practices engineering or possesses the requisite license as prescribed by the applicable provincial or territorial engineering regulator. We are seeking individuals with expertise in IT engineering-related functions, but licensure from an engineering regulator is not a prerequisite for this position. Engineering is a regulated profession in Canada which is restricted in terms of use of titles and designation.

    Skills:

  • Finance&Ops Apps Solution Arch