Site Reliability Engineer

ALLTECH CONSULTING SVC INC • Toronto, Canada
26 days ago
Job type
  • Full-time
Job description

Technology / Role / Department at our Company :

Enterprise Technology & Services (ETS) delivers shared technology services for the Firm supporting all business applications and end users. ETS provides capabilities for all stages of the Firm’s software development lifecycle, enabling productive coding, functional and integration testing, application releases, and ongoing monitoring and support for over 3,000 production applications. ETS also delivers all workplace technologies (desktop, mobile, voice, video, productivity, intranet / internet) in integrated configurations that boost the personal productivity of our employees.

Application and end user services are delivered on a scalable, secure, and reliable infrastructure composed of seamlessly integrated datacenter, network, compute, cloud, storage, and database services. Application Infrastructure (AI) strives to maximize business application developers’ productivity by centrally providing the core development lifecycle tools, core reusable software libraries, and middleware, thus minimizing duplicated effort across silos. We also focus on the lifecycle into production and provide tooling to monitor systems, applications, hosts, logs, and infrastructure inventory.

Our goal is to provide infrastructure that is broadly reusable, scalable, reliable and highly performant to meet the demanding needs of our applications.

Job Responsibilities :

The Company’s Development Environment department is seeking a Site Reliability Engineer to drive reliability engineering, operational support, and customer consultation services for key products. MSDE is part of the Application Infrastructure organization and is responsible for shaping the SDLC within the Company by implementing the tools, systems, and processes used by 17,000+ developers for software development and deployment.

Reporting to the SRE Lead for MSDE’s Engineered products, this role requires growing SRE capabilities to deliver reliable systems efficiently and understanding MSDE’s products thoroughly to maximize developer productivity across the Firm.

This is a production-side, operational role requiring participation in an on-call rotation and strong influencing skills among technical stakeholders. Much of the daily operational work can be delegated to the team’s ops staff.

The successful candidate may be a Python developer aiming to evolve into reliability engineering or a strong operational lead with Python experience. Prior experience in finance is not required; candidates from software or other industries are welcome.

Job Responsibilities :

  • Building and maintaining comprehensive knowledge of the Company’s development environment
  • Maximizing system availability and performance through automation, problem management, and architecture reviews
  • Reducing support costs via operational issue elimination, automation, operational tool development, and client self-service
  • Identifying and prioritizing technical debt impacting productivity, reliability, or support efficiency
  • Collaborating with other SREs to share solutions
  • Troubleshooting complex environment issues
  • Enhancing Ops team knowledge and support capabilities to reduce escalations
  • Consulting with development teams to improve productivity and troubleshoot issues
  • Experimenting with new tools and techniques
  • Sharing on-call responsibilities within the global team

Required Qualifications / Skills :

  • Strong Linux troubleshooting skills
  • Automation experience in any language, preferably Python
  • Experience with monitoring / observability tools like Prometheus and Grafana
  • Familiarity with version control, issue tracking, CI / CD, automated testing, and deployment automation tools
  • Excellent communication and collaboration skills

Desired Skills :

  • Knowledge of SRE practices like SLOs, error budgets, blameless postmortems, toil reduction
  • Experience with Docker / Kubernetes