Senior Site Reliability Engineer

TEEMA
Vancouver, BC, CA
$100K-$175K / year (estimated)
Full-time

MUST LIVE IN CANADA NEAR AN AIRPORT

Looking for a technical lead with 10+ years of DevOps / SRE experience

MUST HAVE - 5+ years of permanent residence or citizenship (cannot have lived outside Canada during the last 5 years)

Monitoring and logging services are a must: experience with 2 or 3 of the listed tools, plus orchestration.

Must live close to a city, with the ability to travel to Vancouver up to 4 times a year.

First step - technical interview

Team Size - 2 team members already onboarded plus manager

Work is very meaningful - province-wide for public safety - big project rollout.

Pensioned position - municipality pension plan is better. Stable and room to grow.

Our client is seeking a highly skilled and motivated Site Reliability Engineer (SRE) to join their dynamic and innovative team.

As an SRE, you will play a critical role in maintaining and enhancing the reliability, availability, and performance of their systems.

Your expertise in both software engineering and systems administration will be key in building and automating scalable infrastructure solutions.

In this role, you will be responsible for improving the reliability and performance of production applications and infrastructure, with a focus on automation, system design, and improvements to system resilience.

We are seeking a technical expert who understands the criticality of our systems and who is able to manage risk and support the development of more resilient and reliable technological capabilities.

What you will be doing:

Collaborate with cross-functional teams to design, deploy, and maintain reliable and scalable services

Implement best practices for monitoring, logging, and alerting to ensure rapid detection and resolution of issues

Troubleshoot and resolve incidents related to the infrastructure, applications, and network to minimize downtime and improve system reliability

Participate in capacity planning and performance optimization efforts to handle increasing user demands and traffic growth

Develop and maintain automation tools for configuration management, deployment, and continuous integration / continuous deployment (CI/CD) pipelines

Conduct thorough post-incident reviews and work towards preventing similar incidents in the future

Perform regular security assessments and ensure compliance with industry standards and regulations

Stay up-to-date with the latest technologies and industry trends to propose innovative solutions and improvements

What you must have:

Completion of a degree or diploma program in computer science or a related discipline plus 5 years of related experience, or an equivalent combination of training and experience

ITIL Foundation v3 or later accreditation preferred

Solid experience (5+ years) running services in a large-scale enterprise environment

Experience in one of the leading cloud platforms such as AWS, Azure or Google Cloud

Experience with distributed monitoring and logging solutions (such as Prometheus, Thanos, Splunk, Elasticsearch, Grafana, Dynatrace, New Relic, Honeycomb)

Experience with containers and container orchestration (such as Docker, Podman, Kubernetes)

Experience with DevOps platforms (such as GitLab, GitHub, Azure DevOps, TeamCity, Octopus)

Knowledge of application performance monitoring (such as Dynatrace, New Relic, AppDynamics)

Knowledge of scaling, capacity planning, and disaster recovery

Knowledge of chaos engineering

Ability to design, author, and release code in any language (Go, Python, Ruby or Java would be a plus)

Posted 16 days ago