Overview
OpenText is a global leader in information management, where innovation, creativity, and collaboration are the key components of our corporate culture. As a member of our team, you will have the opportunity to partner with the most highly regarded companies in the world, tackle complex issues, and contribute to projects that shape the future of digital transformation.
Senior Site Reliability Administrator (Intermediate Senior level)
OpenText is hiring a Senior Site Reliability Administrator (Intermediate Senior level) to build reliable, scalable solutions in a cloud DevOps environment. This role focuses on availability, performance, and stability of OpenText services, and automation to reduce manual work.
AI-First. Future-Driven. Human-Centered. At OpenText, AI is at the heart of everything we do—powering innovation, transforming work, and empowering digital knowledge workers. We're hiring talent that AI can't replace to help us shape the future of information management. Join us.
Responsibilities
- Use technical knowledge, creativity, and company practices to drive down incidents through proactive monitoring and alerting.
- Respond to incidents in accordance with Service Level Agreements (SLAs).
- Provide continuous feedback to development teams on system stability, defect analysis and system enhancements.
- Develop runbooks and patterns to sustain applications in a production environment.
- Participate in technical discussions and drive transition to sustain activities with development teams.
- Gather input from IT business and development partners to develop new capabilities for displaying, monitoring, and alerting on KPIs by tracking business transactions in real time.
- Partner with application owners to develop creative and effective solutions to mitigate risk and remediate audit issues, providing quality and timely responses.
- Take ownership of the incident resolution process, participating in RCAs and SWAT investigations.
- Plan for validation and verification of changes deployed by infrastructure and development teams.
- Provide day-to-day real-time advanced technical support and troubleshooting for issues reported by users/customers.
- Provide guidance in resolving performance-related issues and designing solutions for technical issues faced by applications.
- Establish and maintain good relationships with team members, product development, product management, customer service, client management, and cross-functional teams.
- Participate in training and information sharing activities; act as a backup for other team members when necessary.
- Work rotating shifts as needed and take part in the on-call rotation required for 24/7 support.
What You Need To Succeed
- Ability to understand and maintain scripts and scripted tooling
- Deep understanding of Linux systems
- Hands-on experience with cloud infrastructure (Google, AWS, or Azure)
- Experience with PaaS technologies such as Cloud Foundry, Kubernetes, and BOSH
- Good understanding and operational experience with container technologies like Docker, rkt, Mesos
- Good understanding and experience with microservices and RESTful architecture
- Experience with CI/CD tools and practices to set up automated pipelines as needed (e.g., GitOps, Ansible, Rundeck, or Argo CD)
- Strong working knowledge of aPaaS or application operations best practices
- Operational understanding of message brokers (e.g., Apache Kafka, RabbitMQ)
- Operational understanding of search technologies (e.g., Solr, Elasticsearch)
- Experience in supporting middleware technologies such as Apache, Tomcat, Spring
- Experience with at least one scripting language (e.g., shell, Python, JavaScript)
- Experience with installing and configuring Apache and Tomcat
- Experience in supporting Java applications built with frameworks such as Spring, Struts, Spark
- Experience and knowledge of RDBMS and NoSQL databases (e.g., Oracle, Postgres, MariaDB, Cassandra)
- Deep expertise in monitoring distributed systems and correlating environment conditions and metrics to application events
- Experience with APM tools (New Relic, Dynatrace, AppDynamics)
- Experience with monitoring tools such as Zabbix or check_mk
- Knowledge of centralized logging (e.g., Graylog, Kibana)
- Strong understanding of ITIL principles; certification is a plus
- Passion for diagnosing and troubleshooting user-facing service incidents and outages
- Knowledge of API gateways such as Apigee, and of OAuth 2.0
- Ability to diagnose and resolve problems in high-throughput web applications and network services
- Proven problem-solving and analytical abilities
- Excellent organizational and time management skills; ability to handle multiple tasks concurrently
- Ability to lead, drive and implement scalable, complex solutions
- Strong understanding of security best practices
- Ability to work independently and collaboratively
More About Our Team
OpenText Site Cloud Application Engineering is a rapidly growing group within the organization. We are building our teams, tools, and systems as part of OpenText's mission to deliver the best Cloud services in the world. We enable OpenText to go fast by providing real-time feedback on production systems, working with product families and platform developers to maintain and improve services and performance, and living the company values with a strong customer focus. We are a data-driven team using data collection, enrichment, analytics, and visualizations to learn about our complex systems. We also value sharing learning experiences with development teams. Our proactive approach fosters collaboration, innovation, and personal growth, enriching OpenText's vibrant workplace.
If you need assistance and/or a reasonable accommodation due to a disability during the application or recruiting process, please contact hr@opentext.com.