Senior Site Reliability Engineer

Royal Bank of Canada
MISSISSAUGA, Canada
$160K-$180K / year (estimated)
Full time
We're sorry. The job posting you are looking for is no longer available.

Job Description

What is the opportunity?

RBC Insurance Technology is seeking to hire a Senior Site Reliability Engineer for its Insurance Technology Platform Support team.

The Insurance Technology Platform Support Team is a specialized unit dedicated to ensuring the optimal performance, availability, and resilience of IT applications used in the insurance line of business.

With a unique blend of technical expertise and industry-specific knowledge, this team plays a critical role in ensuring the seamless operations of digital services that cater to both the business's internal and external stakeholders.

As a Senior Site Reliability Engineer, you will bring the engineering mindset of bold ambition, curiosity and outcome focus to ensuring the performance and reliability of our systems.

This role calls for a dynamic individual who excels in a collaborative environment, interacting with cross-functional teams to establish best practices for observability, monitoring, logging, alerting, and automation.

This role will be responsible for the development, implementation, and support of Site Reliability Engineering (SRE) solutions for applications supported by RBC Insurance Technology.

You'll leverage your proficiency in Elasticsearch, Ansible, GitHub Actions, Moogsoft, PagerDuty, Dynatrace and scripting languages to build and maintain robust automation and SRE tooling.

What will you do?

Set the vision for the SRE product base (monitoring, alerting, machine-learning anomaly detection, self-healing, reliability testing).

Lead cross-functional collaborations to define and implement best practices for monitoring, logging, and incident response, driving a proactive stance on system health.

Implement and manage automation processes with Ansible and GitHub Actions to streamline operational tasks.

Develop and maintain custom tooling and automation scripts in languages like Bash, Python, and PowerShell to enhance operational efficiency and system reliability.

Work closely with development teams to understand code changes and their impact on the production environment, ensuring that new releases meet our reliability standards.

Actively contribute to the definition and tracking of SLIs, SLOs, and other critical metrics, refining our alerting and monitoring strategies accordingly.

Document and maintain comprehensive runbooks, facilitating quick resolution of incidents and reducing mean time to recovery (MTTR).

Create and refine custom tooling and automation scripts using languages such as Bash, Python, and PowerShell, supporting the infrastructure's scalability and reliability needs.

Guide the technical direction for future deployments, advocating for reliability and performance improvements based on industry trends and company objectives.

Mentor team members in building out robust monitoring and alerting strategies based on well-defined SLIs and SLOs.

Act as the portfolio Subject Matter Expert (SME): understand and document the common components, core functionality, and infrastructure of supported applications.

Lead incident management and problem management for in-scope applications, and own the fulfillment of root cause analysis (RCA) action items.

Drive transformation by continuously looking for ways to automate existing processes.

Debug production issues across services and levels of the stack and provide primary operational support.

Perform a production support role, including off-hours support as part of an on-call rotation.

Must-have:

4+ years of SRE or Systems Engineering experience with a proven record in technical leadership.

Bachelor’s degree in Computer Science, Engineering, or a related field, or equivalent experience.

Expertise in infrastructure-as-code and configuration management, particularly Ansible.

Advanced scripting capabilities in Bash, Python, PowerShell, or other similar languages.

In-depth knowledge of tools such as Elasticsearch, Ansible, GitHub, OpenShift, Kubernetes, Dynatrace, Kafka, and their role in system reliability.

Knowledge of creating, maintaining, and alerting on SLIs, SLOs, and other reliability metrics.

Nice-to-have:

Insurance industry experience

In-depth, hands-on experience with a variety of SRE tools (Azure Automation, Catchpoint, Prometheus, Splunk, Grafana).

Familiarity with containerization technologies such as Docker.

Hands-on experience with DevOps CI/CD tools, e.g. Jenkins, Artifactory, and Vault.

Soft Skills:

Excellent communication skills to foster collaboration across departments.

A resilient problem-solving approach, capable of leading the charge during high-stress incidents.

Strategic thinking and analytical prowess, with a focus on delivering reliable and performant systems.

Organizational skills to manage multiple priorities in a fast-paced environment.

RBC is committed to supporting flexible work arrangements when and where available. Details to be discussed with Hiring Manager.

What’s in it for you?

We thrive on the challenge to be our best, progressive thinking to keep growing, and working together to deliver trusted advice to help our clients thrive and communities prosper.

We care about each other, reaching our potential, making a difference to our communities, and achieving success that is mutual.

A comprehensive Total Rewards Program including bonuses and flexible benefits, competitive compensation, commissions, and stock where applicable

Leaders who support your development through coaching and managing opportunities

Ability to make a difference and lasting impact

Work in a dynamic, collaborative, progressive, and high-performing team

A world-class training program in financial services

Flexible work / life balance options

Opportunities to do challenging work

Job Skills

Agile Methodology, Application Infrastructure, Group Problem Solving, IT Automation, IT Monitoring, Operations Support, Production Support, Software Development Life Cycle (SDLC), Software Engineering, Software Product Technical Knowledge, System Applications, Systems Software

Additional Job Details

Address: MEADOWVALE BUSINESS PARK, 6880 FINANCIAL DR, MISSISSAUGA

City: MISSISSAUGA

Country: Canada

Work hours / week: 37.5

Employment Type: Full time

Platform: Technology and Operations

Job Type: Regular

Pay Type: Salaried

Posted Date: 2024-05-03

Application Deadline: 2024-05-17

Inclusion and Equal Opportunity Employment

At RBC, we embrace diversity and inclusion for innovation and growth. We are committed to building inclusive teams and an equitable workplace for our employees to bring their true selves to work.

We are taking actions to tackle issues of inequity and systemic bias to support our diverse talent, clients and communities.

We also strive to provide an accessible candidate experience for our prospective employees with different abilities. Please let us know if you need any accommodations during the recruitment process.

Join our Talent Community

Stay in the know about great career opportunities at RBC. Sign up to get customized information on our latest jobs, career tips, and recruitment events that matter to you.

Expand your limits and create a new future together at RBC. Find out how we use our passion and drive to enhance the well-being of our clients and communities at jobs.rbc.com.
