Our Vancouver client is seeking a Senior Site Reliability Engineer to develop robust observability solutions using Dynatrace and to automate key monitoring processes through Terraform and PowerShell.
12-month contract, Vancouver. 2 days per month in office, plus additional days as needed for meetings.
Must Have:
- Extensive and recent experience as a Site Reliability Engineer (SRE) / Azure / DevOps engineer with a focus on Dynatrace and Observability practices within Cloud (Azure, AWS)
- Strong proficiency in Dynatrace monitoring solutions, including configuration, customization, and optimization.
- Hands-on experience with Observability tools and practices such as distributed tracing, logging, metrics collection, and anomaly detection.
- PowerShell experience
- Experience with automation tools (Ansible, Terraform, Kubernetes) and Infrastructure as Code (IaC) principles
- Solid understanding of cloud platforms (AWS, Azure, GCP) and containerization technologies (Docker)
- Excellent problem-solving skills, analytical thinking, and the ability to troubleshoot complex technical issues.
- Strong communication and collaboration skills, with the ability to work effectively in cross-functional teams and drive initiatives to completion.
- Bachelor's degree in Computer Science, Engineering, or a related field.
Nice to Have:
- Relevant certifications (Dynatrace, AWS, Azure, Kubernetes, etc.)
- Master's Degree
Responsibilities:
The Senior Site Reliability Engineer will be responsible for developing robust observability solutions using Dynatrace and automating key monitoring processes through Terraform and PowerShell.
The role involves designing and implementing custom solutions that provide comprehensive, end-to-end monitoring of applications built on Azure services, ensuring optimized performance, reliability, and scalability.
This is a hands-on role, requiring strong expertise in scripting and automation to streamline infrastructure operations.
Additionally, the Sr. SRE will collaborate with cross-functional teams to drive improvements in automation, infrastructure as code, and operational efficiency.
- Serve as the subject matter expert (SME) for Dynatrace, responsible for configuring, optimizing, and managing Dynatrace monitoring solutions.
- Design and implement monitoring strategies using Dynatrace to ensure comprehensive visibility into system performance, availability, and reliability.
- Collaborate with our Engineering & Platform teams to ensure our services, platforms, and infrastructure are emitting the right metrics.
- Lead the rollout and adoption of Observability practices, tools, and frameworks across teams and projects.
- Collaborate with Incident Management teams to resolve critical incidents, conduct post-incident reviews, and implement preventive measures.
- Proactively identify and mitigate potential issues, bottlenecks, and performance degradation to ensure system reliability and uptime.
- Drive automation initiatives using tools like Ansible, Terraform, or Kubernetes to streamline deployment, configuration, and management of infrastructure.