Job Description
We are looking for an experienced Site Reliability Engineer or Platform Operations Engineer for our client. This is a permanent position that starts remote, with later relocation to Calgary or Winnipeg. Our client is a global enterprise company with a product that you've likely used.
You Will:
- Own development projects, providing technical guidance and delivering against the Platform & Service Operations Engineering roadmap.
- Design and implement wargames to test our operational response and identify areas of weakness in our platforms.
- Act as the technical and management escalation point for Service Operations Centre (SOC) engineers and during major incidents.
- Troubleshoot, reproduce and mitigate issues in our production environments.
- Mentor other team members.
- Operate global AWS platforms at scale.
You Have:
- Evidence of strong troubleshooting, problem-solving and investigative skills
- Experience of AWS or other cloud providers
- Experience developing in Java
- Major incident management or experience operating production platforms at scale
- Experience working with distributed web applications
- Experience automating operational tasks / processes using other languages
- Understanding of relational and/or NoSQL data structures
- Experience mentoring / influencing peers
- Experience identifying improvements, highlighting risks vs benefits, and translating them into technical requirements
Bonus:
- Worked with Ansible, Terraform, Python
- Experience working with Serverless / Containers
- Experience of ELK and/or Graphite / Prometheus / Grafana
- Used tracing tools in production before
- Experience in Chaos Engineering / Failure Injection Testing
- Experience of working in an Agile environment
- Experience working in a similar site reliability role
This role offers great perks and a competitive salary. Please apply to the job posting if it matches your career path!