The Site Reliability Engineer will play a critical part in ensuring the reliability, supportability, scalability, and performance of our .NET stack applications built with MVC, Angular, and Web API.
Partner with developers and product operations teams to understand application requirements and translate them into operational practices.
Design, implement, and maintain infrastructure automation tools using Infrastructure as Code (IaC) methodologies.
Monitor application health and performance metrics, proactively identifying and resolving potential issues.
Implement incident response procedures to ensure timely resolution of outages and service disruptions.
Establish and improve best practices for product solution design, architecture, and development.
Participate in peer and team code reviews, and develop comprehensive coding standards and guidelines to ensure consistency, maintainability, and quality in software development. Clear protocols for code formatting, naming conventions, error handling, testing, and documentation improve code readability, reduce defects, and facilitate knowledge sharing among team members.
Collaborate with engineers to develop and implement disaster recovery plans.
Continuously improve monitoring and alerting processes to ensure efficient problem identification and resolution.
Stay up to date on the latest advancements in .NET infrastructure and SRE best practices.
Qualifications
Bachelor's degree required
At least 3 years of experience in a related technical role (e.g., Systems Administrator, Network Engineer) required
Experience with configuration management tools like Ansible, Puppet, or Chef preferred
Azure experience required
Familiarity with monitoring and alerting tools (.NET performance counters, Azure Application Insights, Prometheus, Grafana) preferred
Ability to manage and coordinate multiple projects in a fast-paced, highly professional environment
Coding proficiency is not required, but a strong understanding of the .NET ecosystem and a desire to work in infrastructure and automation are essential for success
Strong understanding of system administration principles, including operating systems (Windows Server preferred) and networking concepts
Ability to work independently and as part of a team