Senior Site Reliability / Infrastructure Platform Engineer
Nextologies Limited • Markham, ON, Canada
3 days ago
Job type
  • Full-time
Job description

(Virtualization, distributed systems, Linux performance, and service reliability)

Responsibilities

  • Act as senior escalation point for service outages, platform failures, and complex distributed systems incidents.
  • Own the architecture, deployment, and reliability of virtualization platforms, storage clusters, and service infrastructure.
  • Design and maintain higher-level service architecture, including load balancing, clustering strategies, dependency management, and failure domain modeling.
  • Build, operate, and scale virtualization and storage environments (compute clusters, hypervisors, and software-defined storage).
  • Design, deploy, and maintain distributed database platforms including SQL clusters and in-memory data stores.
  • Perform deep Linux systems engineering including kernel, scheduler, memory, IO, and network-stack optimization.
  • Develop and maintain infrastructure automation, CI pipelines, and Git-driven operational workflows.
  • Build and operate backup, snapshot, replication, and disaster-recovery systems. Design recovery procedures and regularly validate restoration paths.
  • Perform capacity planning, performance modeling, and saturation analysis across compute, memory, storage, and network layers.
  • Utilize observability platforms to detect early signals of service degradation and latent reliability risks.
  • Collaborate with application, network, and data center engineering teams to deliver end-to-end resilient platforms.
  • Produce architecture documents, runbooks, failure analyses, and low-level operational design documentation.
  • Lead incident response, root-cause analysis, and reliability improvement initiatives.

Qualifications

  • Strong background as a senior Linux systems engineer, SRE, or infrastructure platform engineer.
  • Proven experience designing and operating large virtualization and storage clusters.
  • Hands-on experience with distributed databases (Galera / Patroni / MySQL / Postgres clusters, Redis / KeyDB, etc.).
  • Strong understanding of service architecture, clustering models, load balancers, and high-availability patterns.
  • Deep Linux expertise including CPU / NUMA tuning, memory management, disk IO pipelines, and network optimization.
  • Experience building and maintaining CI / CD pipelines and Git-based infrastructure workflows.
  • Demonstrated ownership of backup, disaster recovery, and service continuity systems.
  • Strong troubleshooting skills across OS, platform, and application interaction layers.
  • Ability to translate business and service requirements into resilient technical architectures.
  • Strong documentation, communication, and cross-team collaboration skills.
  • Ability to operate effectively during outages, incident response, and recovery scenarios.

Nice to Have

  • Experience with Ceph, ZFS, NVMe-oF, or large-scale software-defined storage platforms.
  • Experience with high-performance or low-latency Linux environments.
  • Familiarity with container platforms or hybrid virtualization / container environments.
  • Experience supporting high-bandwidth media, streaming, or real-time service platforms.
  • Exposure to infrastructure-focused AI tooling or automation frameworks.

Please be advised

A technical assessment covering Linux systems engineering, distributed systems concepts, and platform reliability may be conducted prior to interview progression.

Company Description

Nextologies has the world's largest broadcast video delivery network specializing in award-winning, broadcast-grade video connectivity for broadcasters and content owners across the globe with instant access to over 65,000 linear TV channels downlinked from 90+ globally-placed satellites.

In addition, Nextologies is a leader in signal acquisition and delivery providing fiber, IP and custom end-to-end solutions for IPTV and OTT platforms and video-centric applications across all platforms.

Learn more at .

10TX by Nextologies is a leading signal transmission company trusted by professional sports leagues, broadcasters, content producers, and entertainment companies to deliver live events and pay-per-view programming worldwide.
