Talent.com
SRE for Gen AI App Infrastructure and Operations
Astra North Infoteck Inc. • Laval, QC
No longer accepting applications
30+ days ago
Job type
  • Full-time
Job description

AI Infra Ops and SRE Engineer

On-site work required 3 days a week.

Skills:

  • Production experience in SRE, infrastructure, or operations for large-scale systems
  • Strong programming/scripting skills (Python, Go, Java, or equivalent)
  • Deep experience with containerization (Docker) and orchestration (Kubernetes, etc.)
  • Infrastructure-as-code (Terraform, Helm, CloudFormation, Ansible, etc.)
  • Familiarity with GPU / AI compute clusters, high-performance data storage, and distributed architectures
  • Experience with monitoring, observability, logging, and alerting tools (Prometheus, Grafana, ELK / EFK, Datadog, etc.)
  • Networking and systems engineering knowledge (TCP/IP, DNS, routing, load balancing, distributed storage)
  • Solid experience in capacity planning, performance tuning, scaling, and incident response
  • Demonstrated ability to lead RCAs, deploy fixes, and drive reliability improvements
  • Experience in regulated environments (financial services, compliance, audit, security) is a strong plus
  • Excellent communication, documentation, and cross-team collaboration skills
  • Proven track record of reducing operational toil through automation

Experience: 8+ years as a Site Reliability Engineer or in a similar role, with hands-on experience supporting IaaS platforms and solid networking and systems engineering knowledge.

Roles and Responsibilities:

  • Operate, monitor, and maintain the infrastructure supporting GenAI applications (training, inference, feature store, data ingestion, model serving)
  • Design and build automation for core platform capabilities to reduce manual toil
  • Develop and maintain infrastructure-as-code (IaC) for provisioning and managing compute, storage, network, GPU clusters, Kubernetes / container orchestration, etc.
  • Establish, monitor, and enforce SLOs / SLIs / SLAs, error budgets, alerting, and dashboards
  • Lead incident response, root cause analysis (RCA), postmortems, and systemic remediation
  • Perform capacity planning and resource forecasting; define scaling strategies and workload scheduling
  • Optimize cost vs. performance tradeoffs in large-scale compute environments
  • Harden systems for security, compliance, auditability, and data governance
  • Collaborate across teams (cloud engineers, data engineers, infrastructure, security) to ensure safe deployment, rollout, rollback, and integration of new systems
  • Define disaster recovery (DR) strategies, backup / restore practices, and fault-tolerance mechanisms
  • Maintain runbooks, operational playbooks, documentation, and training materials
  • Participate in on-call rotations and respond to production incidents 24/7 as needed
  • Continuously evaluate and integrate new tools, frameworks, or technologies to enhance platform reliability

    Similar jobs
    Senior Generative AI Engineer

    Alexa Translations • Montreal, QC, Canada
    Full-time
    Alexa Translations provides translation services in the legal, financial, and securities sectors by leveraging proprietary AI. Unmatched in speed and quality, our machine translation engine is best-i...
    Last updated: 30+ days ago
    Cloud Engineer

    Taiga Motors • Montreal, QC, Canada
    Full-time
    Taiga Motors, a fast-growing technology company and manufacturer of off-road electric vehicles, is looking for an engineer. In this role, you will be responsible for the...
    Last updated: 30+ days ago
    AI Engineer - Randstad Digital Americas

    Randstad Digital Americas • Saint-Esprit, QC, Canada
    Full-time
    We are seeking a Senior AI Engineer to lead the design and development of advanced generative, recommender, and predictive models that drive significant business value. In this role, you will ensure...
    Last updated: 5 hours ago
    Azure Cloud Architect - Cloud Management (Remote)

    Centre de services scolaire Marguerite-Bourgeoys - CSSMB • Montreal (administrative region), QC, Canada
    Full-time
    An educational organization is looking for a specialized Analyst to modernize its IT infrastructure. You will be responsible for the migration to cloud solutions on Microsoft Azure, leading...
    Last updated: 6 days ago
    Senior DevOps Specialist

    Experlogix • Terrebonne, QC, Canada
    Full-time
    As a Senior DevOps at Experlogix, you will play a crucial role in ensuring the reliability, scalability, and efficiency of our SaaS platforms. You will work closely with our development and operatio...
    Last updated: 6 hours ago
    Trigonometry Private Tutoring Jobs L'epiphanie

    Superprof • L'epiphanie, Canada
    Full-time
    Superprof is Canada's #1 tutoring platform, and we're actively recruiting passionate tutors! Whether you're a student, a professional, or simply someone who loves teaching, join the largest communi...
    Last updated: 30+ days ago
    DevOps / SRE Engineer (Remote)

    Rivalry • Montreal, QC, Canada
    Remote
    Full-time
    Rivalry is a startup uniquely positioned to disrupt the dated online gambling space. The founders and staff come from the gaming and esports scene and are now working their way into the betting worl...
    Last updated: 30+ days ago
    Senior Developer / DevOps (AWS)

    Targeted Talent • Montreal, QC, Canada
    Full-time
    This role is with a company that is a leader in the video streaming industry. This role is great for someone located in Canada looking for a remote role. You will be working in PST working hours. Desi...
    Last updated: 30+ days ago
    Sr. Infrastructure Engineer with Kubernetes - Saint-Esprit

    Confidential • Saint-Esprit, QC, Canada
    Full-time
    The role seeks a highly experienced Infrastructure Specialist to spearhead the design, deployment, and operational excellence of a modern cloud-native infrastructure. The ideal candidate must posses...
    Last updated: 5 days ago
    Infrastructure Solution Architect

    Explorance • Montreal, QC, Canada
    Full-time
    Join a Montreal-based company that helps organizations around the world create a personalized journey of impact and fulfillment for their employees. It offers...
    Last updated: 30+ days ago
    Development Manager, Cloud Infrastructure

    LARGIER CONSEILS • Montréal, QC, Canada
    Full-time
    As a premier Canadian healthcare software leader, our client is at the forefront of modernizing medical systems through business intelligence and high-precision data analytics. Recognized repeatedly...
    Last updated: 15 hours ago
    Remedial Teacher - des Explorateurs, Saint-Louis, and de la Récolte Elementary Schools

    Centre de services scolaire des Samares • Rawdon, QC, Canada
    Full-time
    Full-time replacement from January 5 to April 29, 2026. Come help us create an environment where everyone can learn and... The Centre de services scolaire des Samares is looking for a...
    Last updated: 10 days ago
    Senior Solution Engineer - AI Knowledge

    Coveo • Montreal, QC, Canada
    Full-time
    Shape smarter customer experiences. Have you ever wanted to help enterprises understand what cutting-edge AI can truly unlock for their customer experience? As our Senior Solution Engineer, you'...
    Last updated: 7 days ago
    Senior DevOps Engineer

    Anyon Systems Inc. • Dorval, QC, Canada
    Full-time
    Anyon Systems builds the world's most advanced superconducting quantum computers from the ground up. Our team spans quantum engineering, hardware design, cryogenics, embedded systems, and high-perfor...
    Last updated: 1 day ago
    AI / HPC Sr. Field Solutions Architect

    CDW • Montreal, QC, Canada
    Full-time
    At CDW, we make it happen, together. Trust, connection, and commitment are at the heart of how we work together to deliver for our customers. It’s why we’r...
    Last updated: 21 days ago
    AI Systems Developer

    dcbel Inc • Montreal, QC, Canada
    Full-time
    Our flagship product, the dcbel Home Energy Station, is a small wall mounted device that gives everyone ownership over their energy supply by using solar power to charge their EV and home, unlockin...
    Last updated: 30+ days ago
    Senior Software & AI Engineer - Cloud & Security (Hybrid)

    Export Development Canada • Ahuntsic North, ca
    Full-time
    A financial crown corporation in Canada is seeking a Software Engineer or Senior Software & AI Engineer to enhance their Digital Delivery team. The role involves working on cloud-based solutions, de...
    Last updated: 1 day ago
    Cloud and AI Specialist - Public Sector

    Microsoft Canada • Montréal, CA
    Full-time
    Overview: Are you passionate about cloud transformation in the public sector? Join our dynamic team as a Cloud & AI Specialist, where you will play a key role in the...
    Last updated: 23 hours ago