Systems Reliability Engineer
12-Month Contract
Location: Montreal
Experience: Intermediate (2 to 5 years)
Top 3 Must-Haves:
1. Strong experience with Python and/or shell scripting
2. Strong experience with databases (DB2 knowledge is a plus)
3. Strong communication skills. The consultant will work with business users on a day-to-day basis.
Top 2 Nice-to-Haves:
1. Good knowledge of Grafana and Prometheus
2. Good experience with debugging
Reliability & Production Engineering
Resiliency Engineering is a production-oriented discipline focused on improving service availability, latency, scalability, performance, and efficiency for technology products in
Job Profile
Systems Reliability Engineering (SRE) is a discipline focused on improving system service availability, observability, scalability, performance, and resilience across
We are growing SRE capabilities within our Reliability & Production Engineering (RPE) organization as part of the transformation of
Responsibilities:
You are interested in distributed systems and in working with highly scalable and reliable services.
You like to work in a fast-moving environment and aren't afraid to change things to make them better.
You enjoy new technological challenges and solving hard problems.
You believe a team working well together is smarter than the single smartest person on that team.
You have grit, drive, and a deep sense of ownership.
Working closely with engineering / development teams to design, build, and maintain systems.
Troubleshooting issues across the entire technology stack: hardware, software, application, and network.
Identifying and driving opportunities to improve automation for our platforms; scoping and creating automation for the deployment, management, and visibility of our services.
Proactively identifying and addressing systems reliability risks.
Working alongside existing global and regional team members on a follow-the-sun basis.
Representing the RPE organization in design reviews and operational readiness exercises for new and existing services.
Qualifications - Skill Set
Demonstrated ability to troubleshoot problems and debug to identify root cause.
Hands-on experience with enterprise tools such as AppDynamics, Grafana, Splunk, and Dynatrace.
Experience with Ansible, GitHub, or other automation, configuration, or release management tools.
Automation experience using scripting languages such as Python, Bash, or Perl is particularly valued. Experience with one higher-level language is desired.
Awareness of, and ability to reason about, modern software and systems architectures, including load balancing, databases, queueing, caching, distributed systems failure modes, microservices, and cloud.
Practical experience running large-scale systems is an advantage.
Ability to contribute to system design and architecture, backed by strong database knowledge.
Qualifications / Criteria
Background in Computer Science, Engineering, or a similar field.
Company Profile