Senior Site Reliability / Infrastructure Platform Engineer
Nextologies Limited • Markham, ON, Canada
2 days ago
Job type
  • Full-time
Job description

Senior Site Reliability / Infrastructure Platform Engineer

(Virtualization, distributed systems, Linux performance, and service reliability)

Responsibilities

  • Act as the senior escalation point for service outages, platform failures, and complex distributed systems incidents.
  • Own the architecture, deployment, and reliability of virtualization platforms, storage clusters, and service infrastructure.
  • Design and maintain higher-level service architecture, including load balancing, clustering strategies, dependency management, and failure-domain modeling.
  • Build, operate, and scale virtualization and storage environments (compute clusters, hypervisors, and software-defined storage).
  • Design, deploy, and maintain distributed database platforms, including SQL clusters and in-memory data stores.
  • Perform deep Linux systems engineering, including kernel, scheduler, memory, IO, and network-stack optimization.
  • Develop and maintain infrastructure automation, CI pipelines, and Git-driven operational workflows.
  • Build and operate backup, snapshot, replication, and disaster-recovery systems. Design recovery procedures and regularly validate restoration paths.
  • Perform capacity planning, performance modeling, and saturation analysis across compute, memory, storage, and network layers.
  • Utilize observability platforms to detect early signals of service degradation and latent reliability risks.
  • Collaborate with application, network, and data center engineering teams to deliver end-to-end resilient platforms.
  • Produce architecture documents, runbooks, failure analyses, and low-level operational design documentation.
  • Lead incident response, root-cause analysis, and reliability improvement initiatives.

Qualifications

  • Strong background as a senior Linux systems engineer, SRE, or infrastructure platform engineer.
  • Proven experience designing and operating large virtualization and storage clusters.
  • Hands-on experience with distributed databases (Galera / Patroni / MySQL / Postgres clusters, Redis / KeyDB, etc.).
  • Strong understanding of service architecture, clustering models, load balancers, and high-availability patterns.
  • Deep Linux expertise, including CPU / NUMA tuning, memory management, disk IO pipelines, and network optimization.
  • Experience building and maintaining CI / CD pipelines and Git-based infrastructure workflows.
  • Demonstrated ownership of backup, disaster recovery, and service continuity systems.
  • Strong troubleshooting skills across OS, platform, and application interaction layers.
  • Ability to translate business and service requirements into resilient technical architectures.
  • Strong documentation, communication, and cross-team collaboration skills.
  • Ability to operate effectively during outages, incident response, and recovery scenarios.

Nice to Have

  • Experience with Ceph, ZFS, NVMe-oF, or large-scale software-defined storage platforms.
  • Experience with high-performance or low-latency Linux environments.
  • Familiarity with container platforms or hybrid virtualization / container environments.
  • Experience supporting high-bandwidth media, streaming, or real-time service platforms.
  • Exposure to infrastructure-focused AI tooling or automation frameworks.

Please be advised that a technical assessment covering Linux systems engineering, distributed systems concepts, and platform reliability may be conducted before candidates progress to interviews.

Company Description

Nextologies has the world's largest broadcast video delivery network, specializing in award-winning, broadcast-grade video connectivity for broadcasters and content owners around the globe, with instant access to over 65,000 linear TV channels downlinked from more than 90 globally placed satellites.

In addition, Nextologies is a leader in signal acquisition and delivery, providing fiber, IP, and custom end-to-end solutions for IPTV and OTT platforms and for video-centric applications.

Learn more at .

10TX by Nextologies is a leading signal transmission company trusted by professional sports leagues, broadcasters, content producers, and entertainment companies to deliver live events and pay-per-view programming worldwide.

