Client’s Application Infrastructure (AI) division is seeking a Site Reliability Engineer (SRE) to join the Client Development Environment team. This role is focused on driving reliability, operational efficiency, and support for core development lifecycle tools used by over 17,000 developers across the firm. The ideal candidate will play a critical role in scaling and maintaining high-performing systems, ensuring system resilience, and working closely with developers to maximize productivity while minimizing manual operational effort.
Job Responsibilities :
- Gain and maintain full-stack knowledge of Morgan Stanley’s development environment
- Ensure maximum availability and performance of systems through architecture reviews, problem management, and plant optimization
- Automate plant management tasks and develop tools to reduce operational effort and support costs
- Identify and address technical debt that impacts developer productivity or system reliability
- Collaborate with other SREs across Application Infrastructure to implement shared solutions
- Troubleshoot complex issues across the full development stack
- Enhance Ops team product knowledge to reduce issue escalation rates
- Consult with internal developer clients to help troubleshoot and optimize use of Client tooling
- Experiment with emerging technologies, tools, and techniques to improve operations
- Participate in a global on-call rotation with compensatory time-off
- Champion operational responsiveness and a strong culture of reliability and automation
Required Skills :
- Programming / scripting experience for task automation (Python preferred)
- Hands-on experience with observability tools like Prometheus and Grafana
- Experience with version control (Bitbucket, GitHub), issue tracking (Jira), CI tools (Jenkins, GitHub Actions, Azure DevOps)
- Familiarity with automated testing and deployment pipelines
- Strong interpersonal and communication skills
- Proven collaboration capabilities within technical stakeholder groups
Preferred Skills :
- Familiarity with SRE principles such as SLOs, error budgets, toil reduction, and blameless postmortems
- Experience with containerization technologies such as Docker and orchestration tools like Kubernetes
- Prior exposure to large-scale development environments or developer tooling platforms
Certifications :
[Not Specified – Relevant certifications in Linux, Python, Kubernetes, or SRE practices are a plus]
Education :
Bachelor’s degree in Computer Science, Engineering, or a related field (preferred)