The Private Cloud SRE L3 position sits within the Client Enterprise Computing organization. This role supports cloud and container-based infrastructure in a high-availability, globally distributed environment. As a member of the global L3 team, you will provide advanced technical support, participate in on-call rotations, and collaborate with engineering teams on performance, testing, and automation.
Key Responsibilities:
- Provide L3 support for the Client's private cloud infrastructure and participate in the on-call rotation.
- Collaborate with internal engineering teams to test and validate software releases, upgrades, and infrastructure changes.
- Drive process improvements including automation, scripting, documentation, and incident management.
- Assist in the development of capacity planning, performance monitoring, and alerting solutions.
- Coordinate closely with L2 teams and global L3 peers to ensure consistent support across regions.
- Champion operational excellence through robust change, incident, and problem management practices.
Required Qualifications:
- 5–7 years of relevant experience in systems or infrastructure roles.
- 3–5 years of hands-on experience with Linux systems in enterprise environments.
- Strong understanding of server infrastructure, virtualization, and cloud computing architectures.
- Proven experience with Kubernetes and Docker in a production setting.
- Solid grasp of internet and networking protocols (TCP/IP, HTTP/S) and security protocols (SSL/TLS, Kerberos).
- Strong scripting skills (Python preferred) for automation and tooling.
- Experience with Agile development and DevOps/SRE methodologies.
- Excellent communication skills and the ability to work effectively with diverse teams and stakeholders.
Preferred / Nice-to-Have Skills:
- Experience with cloud-native monitoring tools (e.g., Prometheus, Grafana, ELK stack).
- Hands-on experience in enterprise-scale hosting environments.
- Familiarity with high-availability system design and disaster recovery strategies.
- Knowledge of monitoring architecture, including deployment of agents, custom dashboards, and alerting logic.
- Prior work experience in regulated environments (e.g., financial services) is a plus.
Soft Skills:
- Strong problem-solving and incident management capabilities.
- Ability to manage multiple high-pressure issues simultaneously.
- Highly organized with attention to detail and a proactive attitude toward continuous improvement.