Senior Site Reliability / Infrastructure Platform Engineer

Nextologies Limited • Markham, ON, Canada

2 days ago

Job type

Full-time

Job description

(Virtualization, distributed systems, Linux performance, and service reliability)

Responsibilities

Act as senior escalation point for service outages, platform failures, and complex distributed systems incidents.
Own the architecture, deployment, and reliability of virtualization platforms, storage clusters, and service infrastructure.
Design and maintain higher-level service architecture, including load balancing, clustering strategies, dependency management, and failure domain modeling.
Build, operate, and scale virtualization and storage environments (compute clusters, hypervisors, and software-defined storage).
Design, deploy, and maintain distributed database platforms including SQL clusters and in-memory data stores.
Perform deep Linux systems engineering including kernel, scheduler, memory, IO, and network-stack optimization.
Develop and maintain infrastructure automation, CI pipelines, and Git-driven operational workflows.
Build and operate backup, snapshot, replication, and disaster-recovery systems. Design recovery procedures and regularly validate restoration paths.
Perform capacity planning, performance modeling, and saturation analysis across compute, memory, storage, and network layers.
Utilize observability platforms to detect early signals of service degradation and latent reliability risks.
Collaborate with application, network, and data center engineering teams to deliver end-to-end resilient platforms.
Produce architecture documents, runbooks, failure analyses, and low-level operational design documentation.
Lead incident response, root-cause analysis, and reliability improvement initiatives.

Qualifications

Strong background as a senior Linux systems engineer, SRE, or infrastructure platform engineer.

Proven experience designing and operating large virtualization and storage clusters.

Hands-on experience with distributed databases (Galera / Patroni / MySQL / Postgres clusters, Redis / KeyDB, etc.).

Strong understanding of service architecture, clustering models, load balancers, and high-availability patterns.

Deep Linux expertise including CPU / NUMA tuning, memory management, disk IO pipelines, and network optimization.

Experience building and maintaining CI / CD pipelines and Git-based infrastructure workflows.

Demonstrated ownership of backup, disaster recovery, and service continuity systems.

Strong troubleshooting skills across OS, platform, and application interaction layers.

Ability to translate business and service requirements into resilient technical architectures.

Strong documentation, communication, and cross-team collaboration skills.

Ability to operate effectively during outages, incident response, and recovery scenarios.

Nice to Have

Experience with Ceph, ZFS, NVMe-oF, or large-scale software-defined storage platforms.

Experience with high-performance or low-latency Linux environments.

Familiarity with container platforms or hybrid virtualization / container environments.

Experience supporting high-bandwidth media, streaming, or real-time service platforms.

Exposure to infrastructure-focused AI tooling or automation frameworks.

Please be advised

A technical assessment covering Linux systems engineering, distributed systems concepts, and platform reliability may be conducted prior to interview progression.

Company Description

Nextologies has the world's largest broadcast video delivery network, specializing in award-winning, broadcast-grade video connectivity for broadcasters and content owners across the globe, with instant access to over 65,000 linear TV channels downlinked from 90+ globally placed satellites.

In addition, Nextologies is a leader in signal acquisition and delivery, providing fiber, IP, and custom end-to-end solutions for IPTV and OTT platforms and video-centric applications across all platforms.

Learn more at .

10TX by Nextologies is a leading signal transmission company trusted by professional sports leagues, broadcasters, content producers, and entertainment companies to deliver live events and pay-per-view programming worldwide.
