Talent.com

Site Reliability Engineer

CB Canada
Calgary, Alberta, Canada
30+ days ago
Salary
CA$140,000.00–CA$160,000.00 yearly
Job description

On behalf of our client, Procom is seeking a Site Reliability Engineer for a full-time, permanent position that can be fully remote anywhere in Canada.

Site Reliability Engineer - Job Details

  • We are looking for a self-driven Site Reliability Engineer (SRE) who likes taking engineering-based approaches to solving supportability problems, with a history of engineering excellence and experience in supporting cloud services. You will be responsible for operating and optimizing supportability improvements in a data-driven manner, working closely with Software Engineers to design and deliver experiences that adhere to service best practices, are highly available, reliable, and scalable, provide a great user experience, and meet our compliance policies and requirements.
  • You’ll be focused on driving continuous improvements across the lifecycle of our services with automation in mind. You’ll also demonstrate a history of managing multiple priorities, deep technical and online services skills, a focus on using metrics and data, and a strong supportability-first mindset.

Site Reliability Engineer - Main Responsibilities

  • Collaborating closely with several engineering teams to build and enhance tooling and automation that resolve customer issues faster and, where possible, prevent them altogether.
  • Partnering with external platform teams that build the support tooling, and extending that tooling to meet any special requirements.
  • Designing and implementing changes to service telemetry so that automation can consume it, where that telemetry is not already available.
  • Enhancing the customer-facing experience through proactive alerting based on utilization, trends, resource health, etc.
  • Analyzing data and providing operational insights into customer experience to Design and Product teams, so that features are designed with supportability in mind.
  • Identifying and fostering opportunities to improve existing planning, processes, and automation.

Site Reliability Engineer - Mandatory Skills

  • Bachelor’s degree in Computer Science, Engineering, or related technical field.
  • 5+ years of SRE or SWE experience running large-scale online / hybrid services in cloud environments (Azure), applying site reliability principles and/or demonstrating sensitivity to operational concerns. Automation-related experience is valued.
  • Experience with any of C# / Java / Python as a primary language.
  • Fluency in one or more automation languages, such as PowerShell or Python.
  • A deep understanding of, and familiarity with, observability and MELT (Metrics, Events, Logs, and Traces) design and implementation patterns for large-scale distributed services is specifically desired.
  • Experience in hypothesis-driven development and test-driven / behavior-driven development is desirable.
  • Familiarity with Agile / Scrum / Lean methodologies.
  • Strong problem-solving, troubleshooting, and analytical skills.
  • Ability to deal with the ambiguity of a fast-paced, changing environment, and willingness to change things to make them better.
  • Intellectual curiosity and high EQ (emotional intelligence) will serve the successful candidate well.
  • Great communicator with the ability to analyze and clearly articulate complex issues.
  • Ability to influence the product architecture and roadmap so that customer-experienced supportability is always a key consideration as the product evolves.

Site Reliability Engineer - Assignment Location

  • Fully Remote, across Canada

Site Reliability Engineer - Assignment Length

  • Permanent