
Site Reliability Engineer – GenAI Platform

Astra North Infoteck Inc. • Montreal & Mirabel, QC, Canada

Posted 22 hours ago

Contract type

Full-time

Job Description

Experience: 8+ years of experience as a Site Reliability Engineer or in a similar role, including hands-on experience supporting IaaS platforms, with networking and systems engineering knowledge.

Roles and Responsibilities:

Operate, monitor, and maintain the infrastructure supporting GenAI applications (training, inference, feature store, data ingestion, model serving)
Design and build automation for core platform capabilities, reducing manual toil
Develop and maintain infrastructure-as-code (IaC) for provisioning and managing compute, storage, network, GPU clusters, Kubernetes / container orchestration, etc.
Establish, monitor, and enforce SLOs/SLIs/SLAs, error budgets, alerting, and dashboards
Lead incident response, root cause analysis (RCA), postmortems, and systemic remediation
Perform capacity planning, scaling strategies, workload scheduling, and resource forecasting
Optimize cost vs. performance tradeoffs in large-scale compute environments
Harden systems for security, compliance, auditability, and data governance
Collaborate across teams (cloud engineers, data engineers, infrastructure, security) to ensure safe deployment, rollout, rollback, and integration of new systems
Define disaster recovery (DR) strategies, backup/restore practices, and fault-tolerance mechanisms
Maintain runbooks, operational playbooks, documentation, and training materials
Participate in on-call rotations and respond to production incidents 24/7 as needed
Continuously evaluate and integrate new tools, frameworks, or technologies to enhance platform reliability

Skills:

Production experience in SRE / Infrastructure / ops for large-scale systems
Strong programming/scripting skills (Python, Go, Java, or equivalent)
Deep experience with containerization (Docker) and orchestration (Kubernetes, etc.)
Infrastructure-as-code (Terraform, Helm, CloudFormation, Ansible, etc.)
Familiarity with GPU / AI compute clusters, high-performance data storage, and distributed architectures
Experience with monitoring / observability / logging / alerting tools (Prometheus, Grafana, ELK / EFK, Datadog, etc.)
Networking & systems engineering knowledge (TCP/IP, DNS, routing, load balancing, distributed storage)
Solid experience in capacity planning, performance tuning, scaling, and incident response
Demonstrated ability to lead RCAs, deploy fixes, and drive reliability improvements
Experience in regulated environments (financial services, compliance, audit, security) is a strong plus
Excellent communication, documentation, and cross-team collaboration skills
Proven track record of reducing operational toil via automation

Requirements
Android and iOS

Similar jobs

Senior Flight Systems Engineer

Cessna Aircraft Company • Mirabel

Full-time

An aerospace company is looking for a Systems Specialist to manage the integration of modifications and assess system safety. The ideal candidate has at least 10 years of experience in...

Last updated: 26 days ago • Sponsored

M365/Gen AI Engineer - Mirabel

APEX-TEK PLACEMENT CONSULTANTS PRIVATE LIMITED • Mirabel, QC, Canada

Full-time

Job description & Roles and responsibilities. The M365/GenAI Engineer designs, builds, and supports secure integrations, connectors, and operational controls across Microsoft 365 and enterprise LLM ...

Last updated: 1 day ago • Sponsored

M365/Gen AI Engineer

APEX-TEK PLACEMENT CONSULTANTS PRIVATE LIMITED • Mirabel, QC, Canada

Full-time

Last updated: 1 day ago • Sponsored

Solutions Engineer - Mirabel

Meld • Mirabel, QC, Canada

Full-time

Meld is a fast-growing startup looking to add developer support for customers who use our API-driven platform for managing their crypto-related integrations. We're focused on helping money move on c...

Last updated: 7 days ago • Sponsored

Avionics Systems Architect – Next‑Gen Aircraft

Airbus Atlantique Canada Inc. • Mirabel

Full-time

A major aeronautics company based in Mirabel, Canada, is looking for an Avionics System Architecture Specialist to join its team. The ideal candidate will have at least 5 years of experien...

Last updated: 4 days ago • Sponsored

Azure DevOps Engineer - Saint-Jérôme

LTIMindtree • Saint-Jérôme, QC, Canada

Full-time

LTIMindtree is an equal opportunity employer that is committed to diversity in the workplace. Our employment decisions are made without regard to race, color, creed, religion, sex (including pregnan...

Last updated: more than 30 days ago • Sponsored

Siteminder IAM Expert

Software International • Mirabel

Full-time

Software International (SI) supplies technical talent to a variety of Fortune 100/500/1000 and other companies in Canada/US. We are currently hiring for a Siteminder IAM Expert for our Fortune 500 c...

Last updated: 26 days ago • Sponsored

M365/Gen AI Engineer - Saint-Jérôme

APEX-TEK PLACEMENT CONSULTANTS PRIVATE LIMITED • Saint-Jérôme, QC, Canada

Full-time

Last updated: 1 day ago • Sponsored

Snowflake Cortex expert - Mirabel

Amaris Consulting • Mirabel, QC, Canada

Full-time

Snowflake Cortex & Snowpark Specialist. AI-driven solutions within the Snowflake Data Cloud. You will work closely with Data Engineering, Architecture, and Business teams to build scalable pipelines,...

Last updated: 1 day ago • Sponsored

Senior Full-Stack Engineer – Green Tech Impact & Growth

EffectiV HVAC Inc. • Blainville

Full-time

A rapidly growing technology firm in Quebec seeks a Senior Full Stack Analyst/Programmer to develop and maintain software systems. You will be responsible for both legacy and modern applications usi...

Last updated: 8 days ago • Sponsored

Chef d’équipe outillage, spécialiste en conception / Tooling Group Lead, Design Specialist

Raytheon Technologies • Mirabel

Full-time

CA-QC-MIRABEL-M01 ~ 11155 Julien-Audette ~ M01 BLDG. Tooling Group Lead, Design Specialist. About Pratt & Whitney Canada. Pratt & Whitney Canada (P&WC) is a world leader in the in...

Last updated: 5 days ago • Sponsored

Solutions Engineer

Meld • Mirabel, QC, Canada

Full-time

Last updated: 7 days ago • Sponsored

Repair Design Engineer: Shape Critical Maintenance

Expleo Group • Mirabel

Full-time

An engineering company in Mirabel, QC, is looking for a Repair Design Engineer to provide technical solutions, handle customer questions, and improve the processes for r...

Last updated: 26 days ago • Sponsored

Propulsion Systems Specialist — Hybrid Work & Growth

Airbus • Mirabel

Full-time

An aeronautics company based in Mirabel (Quebec) is looking for a Propulsion Systems Specialist to join its Engineering team. You will work on developing solutions...

Last updated: 6 days ago • Sponsored

Sr Systems Engineer - Spacecraft Flight Dynamics

MDA • Sainte-Anne-de-Bellevue

Full-time

For those who dream of advancing our space in the Universe and on Earth, we will take you there. MDA is an international space mission partner and pioneer in robotics & space operations, satellite s...

Last updated: 26 days ago • Sponsored

Senior Full Stack Engineer - Saint-Jérôme

Luxoft • Saint-Jérôme, QC, Canada

Full-time

Luxoft is looking for a Full-stack Developer who would be working with our Customer - one of the world's largest investment management companies. Based in Southern California, our client manages clo...

Last updated: 1 day ago • Sponsored

Senior Full Stack Engineer

Luxoft • Saint-Jérôme, QC, Canada

Full-time

Last updated: 1 day ago • Sponsored

Snowflake Cortex expert - Saint-Jérôme

Amaris Consulting • Saint-Jérôme, QC, Canada

Full-time

Last updated: 1 day ago • Sponsored