Talent.com
Staff Site Reliability Engineer (Staff SRE)
Walt Disney Animation Studios • Vancouver, Canada
Posted more than 30 days ago
Employment type
  • Full-time
Job description

Job Summary:

Walt Disney Animation Studios’ world-class filmmakers, artists, and technical collaborators create the magic of animation. Bring your unique talents, passion and ideas to our team and prepare to play in a creative, artist-friendly environment.

We are seeking a Staff SRE with expertise in Linux systems administration who also has experience with software development (e.g., Python, Go, Java, Node), CI pipeline tools (e.g., Jenkins), Git source management, cloud hosting (AWS, GCP, and Azure), container computing (e.g., Docker, OCI), and web technologies. The ideal candidate will enjoy the diversity and challenges of working at various levels in the foundational deployment stack, from defining configuration management to developing CI/CD infrastructure and processes.

This role resides within the Platform and Infrastructure team at Walt Disney Animation Studios (WDAS), where we build the tools and manage the infrastructure that artists use daily to create our celebrated animated content. The SRE team within Platform Engineering is focused on optimizing service deployments and improving the availability, latency, performance, efficiency, and observability of systems at WDAS. All projects share the pursuit of simple and performant solutions to complex problems, using Agile and DevOps methodologies as part of high-energy, proficient teams.

Critical to success in this role is an aptitude for working collaboratively with a technical team. You will help to develop and drive requirements and strategies while also supporting services and core services infrastructure.

Our studio thrives on a wide variety of technical backgrounds and experiences, so we encourage applicants to apply even if they have experiences not specified below. Bring your unique talents, passion and ideas to our team, and be a part of Disney’s creative legacy!

Responsibilities

As Staff SRE, you will translate ideas into tangible products that shape experiences by focusing on a systematic approach to automation, resiliency, efficiency, stability, security, performance, capacity management, and documentation. You will serve as a subject matter expert in multiple areas and be looked to by your fellow team members as a 'go-to' individual; you have a clear understanding of SRE principles and best practices and can thoroughly explain them to a given audience. To be successful in this role you will continuously uphold and improve all the relevant reliability aspects of our services, with an increased focus on SLIs and SLOs, while raising the reliability of a variety of large-scale user-facing and internal services. As Staff SRE, you will maintain a strong understanding of stakeholder workflows and requirements, and translate the targeted solutions into an end-to-end architectural design.

You will work with engineering, creative and production teams in an extremely collaborative and high-energy environment to brainstorm, architect, gather requirements, troubleshoot, and provide stellar customer support. You are passionate about constantly learning and applying technology to solve complex problems, and you are a highly motivated, optimistic, proactive, creative thought leader and project manager.

Additional Responsibilities Include:

Support a wide range of on-premises and cloud deployments using infrastructure-as-code, self-healing, and security automation patterns, and help others adopt the infrastructure-as-code paradigm.

Deploy and manage a wide array of on-premises and cloud deployments

Develop useful telemetry, alerts, and response to reduce Mean Time To Repair (MTTR).

Collaborate and provide technical excellence within and across teams.

Consult on best practices and develop tools to enable smooth adoption of good service reliability practices and methods.

Identify areas of improvement in reliability, efficiency, and operations.

Build tools to help your SRE team quickly pinpoint, isolate and resolve issues related to infrastructure, platform services and applications.

Continuously refine monitoring processes, configurations, and thresholds.

Practice and promote sustainable incident response and blameless postmortems

Develop runbooks and tools to streamline processes and shorten problem resolution time.

Write code that improves scalability, performance, maintainability, and security.

Add, tune and maintain alert configurations and documentation as needed.

Develop and improve CI/CD processes to improve release cadence and success.

Use Chaos Engineering principles and methodologies to test what you build under real-world conditions.

Mentor SREs, Sysadmins, and Systems Engineers in technical and non-technical SRE responsibilities.

Required Education

BS in Computer Science, Computer Engineering, Electrical Engineering or related field

Key Qualifications:

7+ years of experience in SRE, DevOps, technical operations, systems engineering, software engineering or related discipline

Proficient, collaborative, and experienced in building reliable, scalable, enterprise systems

Excellent communication skills, both verbal and written

Passionate and curious about ways to leverage technology while continually learning

Skilled in the use of containers and container orchestration systems in enterprise production environments (e.g., Docker, Kubernetes, Rancher, AWS ECS and EKS)

Experience with configuration management and infrastructure as code (e.g., Terraform, Helm, CloudFormation, Ansible, and Puppet)

Comfortable in one or more of the following languages (Python, Java, Scala, Go, Rust, Ruby, or similar)

Skilled in Cloud/PaaS/SaaS environments (e.g., AWS, Azure, Google Cloud Compute)

Hands-on experience using source control (Git, GitHub) and feature branching strategies

Experience with continuous integration tools (e.g., Jenkins, GitLab CI/CD, AWS CodeBuild, CodeDeploy, Spinnaker)

Knowledge of best practices and IT operations in an always-up, always-available service

Possess expertise in scalable testing, automation, continuous integration frameworks and best practices

Experience in SDLC, distributed systems, networking, hardware, logistics and operations or capacity planning

UNIX/Linux administration, troubleshooting, performance tuning, and security

Experience with DevOps methodologies and/or SRE

Experience with monitoring and observability tooling such as Datadog, Prometheus, and Grafana

Experience with automating infrastructure, deployment and testing using tools like CloudFormation, Ansible or Terraform

Experience with Service Level Objectives and Error Budgets

Understanding of the principles and methodologies behind Chaos Engineering

Bonus Qualifications:

Expertise in web server administration

The Walt Disney Company is an Equal Opportunity Employer.

The hiring range for this position in British Columbia, Canada is C$124,200 to C$166,700 per year. The base pay actually offered will take into account internal equity and may also vary depending on the candidate’s geographic region, job-related knowledge, skills, and experience, among other factors. A full range of medical, financial, and/or other variable pay or benefits may be offered, depending on the level and position offered.
