Senior Production Support and SRE Engineer

Scotiabank, Toronto, ON, CA
27 days ago
Job description

Requisition ID: 211963

Tangerine is Canada’s leading direct bank. We offer flexible and accessible banking options, innovative products, and award-winning Client service. The reason why Tangerine employees come to work each day is to help Canadians live better lives. We focus on making a difference in our communities, and that includes our own internal community. It’s important to us that our employees feel empowered and enthusiastic about belonging to our Orange culture.

As Canada’s leading digital bank, Tangerine technology is at the heart of everything we do. We have redefined what digital banking is, and we continue to evolve what it can be, using technology to create innovative, forward-thinking banking solutions with our clients’ needs in mind. We are made up of high-performing, curious, energetic, and collaborative individuals who thrive in our high-trust agile environment to deliver best-in-class solutions for our customers. We believe in giving people hands-on challenges and the responsibilities that come with them, allowing them to grow, evolve and create opportunities to build their career. Are you ready to make the change and become part of an established disruptor with the backing of a highly engaged team? If so, come join us and help redefine the Canadian banking landscape!

We are looking for a Senior SRE & Production Support Specialist to join Tangerine’s Production Support and SRE team.

Are you enthusiastic about managing and supporting complex, exceptionally reliable and scalable enterprise systems, improving automation and ensuring the resiliency of technology? Do you get your energy from delivering technology solutions in collaboration with a team? We are currently seeking an experienced Senior Production Support & Site Reliability Engineer who can provide technical expertise to resolve application and infrastructure technology issues on medium to complex projects in compliance with service standards, policies, and procedures. In addition, the role monitors and analyzes supported services and deployment methodologies to identify opportunities for improvement and recommend solutions using SRE methodologies. The role devises new methods and procedures using strong analytic and inductive thinking. Specifically, we are searching for someone who brings fresh ideas, demonstrates a unique and informed viewpoint, and enjoys collaborating with a cross-functional team to investigate and help resolve recurring and major issues and improve the performance of our supported applications.

Is this role right for you? In this role, you will:

You will be joining Tangerine’s SRE & Production Support team.

You will be responsible for maintaining the production applications and day-to-day operational activities, managing escalations, and modifying established procedures / approaches to suit specific situations, including 24x7 support and coordination of recovery efforts.

Ensure all production issues are resolved within SLAs, user requests are completed satisfactorily, and all customer requests are responded to in a timely manner.

You will be responsible for providing investigation and second-level support on client issues, technical issues, system / website outages, and questions from all internal and external applications by maintaining, prioritizing, and directing them to the respective Tangerine technology groups and vendors.

You will run the production environment by monitoring availability and taking a holistic view of system health.

You will improve the reliability, quality, and time-to-market of our suite of software solutions.

Measure and optimize system performance to push our capabilities forward, get ahead of customer needs, and innovate to improve continually.

Participate in defining SLIs, SLOs and SLAs for Enterprise Systems.

Gather and analyze metrics from both applications and infrastructure to assist in performance tuning and fault finding.

Partner with development teams to address outstanding tickets and implement permanent fixes.

Create sustainable systems and services through automation and process improvements.

Balance feature development speed and reliability with well-defined service level objectives.

Monitor the health of multiple applications and discover optimization opportunities in a large, complex, continuously growing hybrid environment.

Lead on-call problem escalation and outage recovery efforts, covering not only code fixes in the presentation and integration layers but also infrastructure-level investigation and support where necessary.

Lead post-incident technical retrospectives to identify and implement remediation actions.

You will perform troubleshooting, deploy systems, or execute maintenance tasks as necessary to meet the specified SLOs.

Production support & SRE capabilities - Investigate and define operational issues and prioritize based on severity, risk and/or strategic business needs. Manage issue logs and Contact Centre requests. Design and implement solutions to prevent recurrence with the end goal of ensuring client satisfaction.

Partnership - Work as a regular liaison with business partners, technology partners, the senior management team, and internal and external clients. Ensure internal clients are well informed of the team's mandate and nature, including relevant policy and legislation changes. Promote and support the concepts, products, and services of the Channel Support area.

Project delivery - Function as subject matter expert in the assessment of impacts for planned system changes and projects, ensuring compliance with relevant organization standards (Business Continuity, Security, Compliance, and Privacy); develop and maintain productive relationships with Technology, QA, Project team and others. Research, evaluate and support the development and implementation of new and/or revised policies, procedures, and standards. Investigate, research, and provide recommendations on issues and system outages.

Do you have the skills that will enable you to succeed in this role? We'd love to work with you if you have:

Initiative, autonomy, and a team-player mindset in a fast-paced environment.

Good understanding of networking concepts: TCP/IP, DNS, HTTP, TLS, and the OSI model.

Good understanding of multi-tier applications and microservices (Docker, Kubernetes, etc.).

Experience instrumenting and monitoring cloud-hosted software stacks (preferably GCP).

Working knowledge of one or more programming languages or runtimes (Java, Node.js, Python, etc.).

Basic knowledge of one or more scripting or infrastructure automation tools (Ansible, Terraform, Bash, etc.).

2-4 years of experience in developing and/or supporting complex, large-scale customer-facing platforms.

Strong working experience with incident management and setting up monitoring alerts.

Proficient understanding of version control tools, such as Git and Bitbucket.

Knowledge of building a highly automated production monitoring and support model, with hands-on experience integrating Splunk, Ansible, Dynatrace, Sumo Logic, ServiceNow, PagerDuty, or equivalents.

Proven ability to translate ideas into technical and business realities and map technology to business problems.

Experience with private/public cloud services and platforms.

Superior verbal and written communication skills with the ability to influence decision-making with stakeholders.

A proactive approach to spotting problems, areas for improvement, and performance bottlenecks.

Exceptional written and verbal communication skills.

Excellent problem-solving skills.

Flexible approach to work and the ability to adapt to change.

Prior production support or SRE experience.

Proficiency with the MS Office suite.

Experience working with scalable containerized systems in the public cloud (GCP etc.).

Experience with Docker (or other container runtimes) and Kubernetes.

Experience in building public and internal REST APIs.

Experience with CI/CD tools such as Jenkins.

Experience with the fundamental front-end stack: HTML, CSS, and JavaScript.

Experience working with database technologies such as SQL Server and Oracle.

Experience with the Atlassian tools (JIRA, Confluence).

What's in it for you?

Diversity, Equity, Inclusion & Allyship - We strive to create an inclusive culture where every employee is empowered to reach their fullest potential, respected for who they are, and embraced through bias-free practices and inclusive values across Scotiabank. We embrace diversity and provide opportunities for all employees to learn, grow & participate through our various Employee Resource Groups (ERGs) that span diverse gender identities, ethnicity, race, age, ability & veterans.

Accessibility and Workplace Accommodations - We value the unique skills and experiences each individual brings to the Bank and are committed to creating and maintaining an inclusive and accessible environment for everyone. Scotiabank continues to locate, remove and prevent barriers so that we can build a diverse and inclusive environment while meeting accessibility requirements.

Upskilling through online courses, cross-functional development opportunities, and tuition assistance.

Competitive Rewards program including bonus, flexible vacation, personal and sick days, and benefits that start on day one.

Community Engagement - No matter where you choose to work from, we offer opportunities for community engagement & belonging with our various programs such as hackathons, contests, cooking with friends, Humans of Digital and much more!

Working location condition: Hybrid

Location(s): Canada: Ontario: Toronto

At Tangerine we value the unique skills and experiences each individual brings to the team, and are committed to creating and maintaining an inclusive and accessible environment. If you require accommodation during the recruitment and selection process, please let our Recruitment team know.