Site Reliability Engineer

Compunnel Software Group
Montreal, QC, United States

Role Summary :

We are seeking a Site Reliability Engineer (SRE) to support and enhance the reliability engineering, operations, and customer support for our ServiceNow SaaS platform. This is a hybrid role combining automation, process improvement, and production support with a strong emphasis on building and maintaining reliable and scalable systems. As part of a global SRE community, you'll collaborate with diverse teams and stakeholders to optimize system performance, resolve incidents, and drive service excellence.

The ideal candidate brings a blend of development skills, a problem-solving mindset, and a passion for operational excellence. Whether you come from a development, infrastructure, or systems administration background, if you're eager to apply SRE principles and deliver measurable improvements, we encourage you to apply.

Key Responsibilities :

  • Drive improvements in availability, performance, and scalability for the ServiceNow SaaS platform by optimizing and automating operational tasks.
  • Collaborate with global SRE colleagues to develop observability tools (metrics, logging, tracing, dashboards) that monitor and define product reliability.
  • Engage in incident response and resolution, primarily for ServiceNow and occasionally for Linux-based on-premises infrastructure.
  • Participate in a global on-call rotation, ensuring timely response and remediation during incidents (time off in lieu is offered).
  • Contribute to knowledge documentation and ongoing efforts to understand and map dependencies in ServiceNow and associated systems.
  • Identify, prioritize, and address technical debt that hinders performance, reliability, or client satisfaction.
  • Collaborate in architecture reviews, process delivery improvements, and operational tooling development to support SRE goals.
  • Provide constructive feedback on policies and operational processes to continuously improve service delivery and team effectiveness.

Required Skills & Qualifications :

  • Minimum 7 years of relevant experience in software development, system administration, or infrastructure operations.
  • Strong proficiency in at least one programming / scripting language (e.g., Python).
  • Excellent troubleshooting skills across ServiceNow and Linux-based systems.
  • Strong interpersonal and communication skills; capable of building positive, productive relationships across teams.
  • Proven dependability in handling time-sensitive or high-impact technical incidents.
  • Commitment to continuous learning and improvement of reliability, efficiency, and customer satisfaction.

Preferred Skills :

  • ServiceNow administration or development experience (training available if not already acquired).
  • Familiarity with SRE principles such as task automation, technical debt reduction, capacity management, and monitoring.
  • Experience in a production support or DevOps / SRE role in an enterprise-scale environment.
  • Exposure to IT service management (ITSM), SaaS platforms, and enterprise toolchains.

Education : Bachelor's Degree
