Talent.com
Site Reliability Engineer
ALLTECH CONSULTING SVC INC • Quebec, Capitale-Nationale, CA
Posted more than 30 days ago
Contract type
  • Full-time

Job Description:

Technology / Role / Department at our Company:

Enterprise Technology & Services (ETS) delivers shared technology services for the Firm supporting all business applications and end users. ETS provides capabilities for all stages of the Firm’s software development lifecycle, enabling productive coding, functional and integration testing, application releases, and ongoing monitoring and support for over 3,000 production applications. ETS also delivers all workplace technologies (desktop, mobile, voice, video, productivity, intranet / internet) in integrated configurations that boost the personal productivity of our employees.

Application and end-user services are delivered on a scalable, secure, and reliable infrastructure composed of seamlessly integrated datacenter, network, compute, cloud, storage, and database services. Application Infrastructure (AI) strives to maximize business application developers’ productivity by centrally providing the core development lifecycle tools, reusable software libraries, and middleware, thus minimizing duplicated effort across silos. We also focus on the path into production and provide tooling to monitor systems, applications, hosts, logs, and infrastructure inventory.

Our goal is to provide infrastructure that is broadly reusable, scalable, reliable and highly performant to meet the demanding needs of our applications.

The Company’s Development Environment department (MSDE) is seeking a Site Reliability Engineer to drive reliability engineering, operational support, and customer consultation services for key products. MSDE is part of the Application Infrastructure organization and is responsible for shaping the SDLC within the Company by implementing the tools, systems, and processes used by 17,000+ developers for software development and deployment.

Reporting to the SRE Lead for MSDE’s engineered products, the person in this role is expected to grow SRE capabilities so that reliable systems are delivered efficiently, and to understand MSDE’s products thoroughly in order to maximize developer productivity across the Firm.

This is a production-side, operational role requiring participation in an on-call rotation and strong influencing skills among technical stakeholders. Much of the daily operational work can be delegated to the team’s ops staff.

The successful candidate may be a Python developer aiming to evolve into reliability engineering or a strong operational lead with Python experience. Prior experience in finance is not required; candidates from software or other industries are welcome.

Job Responsibilities:

  • Building and maintaining comprehensive knowledge of the Company’s development environment
  • Maximizing system availability and performance through automation, problem management, and architecture reviews
  • Reducing support costs via operational issue elimination, automation, operational tool development, and client self-service
  • Identifying and prioritizing technical debt impacting productivity, reliability, or support efficiency
  • Collaborating with other SREs to share solutions
  • Troubleshooting complex environment issues
  • Enhancing Ops team knowledge and support capabilities to reduce escalations
  • Consulting with development teams to improve productivity and troubleshoot issues
  • Experimenting with new tools and techniques
  • Sharing on-call responsibilities within the global team

Required Qualifications / Skills:

  • Strong Linux troubleshooting skills
  • Automation experience in any language, preferably Python
  • Experience with monitoring / observability tools like Prometheus and Grafana
  • Familiarity with version control, issue tracking, CI / CD, automated testing, and deployment automation tools
  • Excellent communication and collaboration skills

Desired Skills:

  • Knowledge of SRE practices like SLOs, error budgets, blameless postmortems, toil reduction
  • Experience with Docker / Kubernetes