Role Summary :
We are seeking a Site Reliability Engineer (SRE) to support and enhance the reliability engineering, operations, and customer support for our ServiceNow SaaS platform. This is a hybrid role combining automation, process improvement, and production support with a strong emphasis on building and maintaining reliable and scalable systems. As part of a global SRE community, you'll collaborate with diverse teams and stakeholders to optimize system performance, resolve incidents, and drive service excellence.
The ideal candidate brings a blend of development skills, a problem-solving mindset, and a passion for operational excellence. Whether you come from a development, infrastructure, or systems administration background, if you're eager to apply SRE principles and deliver measurable improvements, we encourage you to apply.
Key Responsibilities :
- Drive improvements in availability, performance, and scalability for the ServiceNow SaaS platform by optimizing and automating operational tasks.
- Collaborate with global SRE colleagues to develop observability tools (metrics, logging, tracing, dashboards) that monitor and define product reliability.
- Engage in incident response and resolution, primarily for ServiceNow and occasionally for Linux-based on-premises infrastructure.
- Participate in a global on-call rotation, ensuring timely response and remediation during incidents (time off in lieu is offered).
- Contribute to knowledge documentation and ongoing efforts to understand and map dependencies in ServiceNow and associated systems.
- Identify, prioritize, and address technical debt that hinders performance, reliability, or client satisfaction.
- Collaborate in architecture reviews, process delivery improvements, and operational tooling development to support SRE goals.
- Provide constructive feedback on policies and operational processes to continuously improve service delivery and team effectiveness.
Required Skills & Qualifications :
- Minimum 7 years of relevant experience in software development, system administration, or infrastructure operations.
- Strong proficiency in at least one programming / scripting language (e.g., Python).
- Excellent troubleshooting skills across ServiceNow and Linux-based systems.
- Strong interpersonal and communication skills; capable of building positive, productive relationships across teams.
- Proven dependability in handling time-sensitive or high-impact technical incidents.
- Commitment to continuous learning and improvement of reliability, efficiency, and customer satisfaction.
Preferred Skills :
- ServiceNow administration or development experience (training available if not already acquired).
- Familiarity with SRE principles such as task automation, technical debt reduction, capacity management, and monitoring.
- Experience in a production support or DevOps / SRE role in an enterprise-scale environment.
- Exposure to IT service management (ITSM), SaaS platforms, and enterprise toolchains.
Education : Bachelor's Degree