Talent.com
SRE for Gen AI App Infrastructure and Operations
Applications are no longer being accepted

Astra North Infoteck Inc. • Laval, Qc
More than 30 days ago
Contract type
  • Full-time
Job description

"AI Infra Ops and SRE engineer

On-site attendance at the office is required 3 days a week

Skills:

  • Production experience in SRE / Infrastructure / ops for large-scale systems
  • Strong programming / scripting skills (Python, Go, Java, or equivalent)
  • Deep experience with containerization (Docker), orchestration (Kubernetes, etc.)
  • Infrastructure-as-code (Terraform, Helm, CloudFormation, Ansible, etc.)
  • Familiarity with GPU / AI compute clusters, high-performance data storage, and distributed architectures
  • Experience with monitoring / observability / logging / alerting tools (Prometheus, Grafana, ELK / EFK, Datadog, etc.)
  • Networking & systems engineering knowledge (TCP / IP, DNS, routing, load balancing, distributed storage)
  • Solid experience in capacity planning, performance tuning, scaling, and incident response
  • Demonstrated ability to lead RCAs, deploy fixes, and drive reliability improvements
  • Experience in regulated environments (financial services, compliance, audit, security) is a strong plus
  • Excellent communication, documentation, and cross-team collaboration skills
  • Proven track record of reducing operational toil via automation

Experience: 8+ years as a Site Reliability Engineer or in a similar role, with hands-on experience supporting IaaS platforms and with networking and systems engineering knowledge.

Roles and Responsibilities:

  • Operate, monitor, and maintain the infrastructure supporting GenAI applications (training, inference, feature store, data ingestion, model serving)
  • Design and build automation for core platform capabilities, reducing manual toil
  • Develop and maintain infrastructure-as-code (IaC) for provisioning and managing compute, storage, network, GPU clusters, Kubernetes / container orchestration, etc.
  • Establish, monitor, and enforce SLOs / SLIs / SLAs, error budgets, alerting, and dashboards
  • Lead incident response, root cause analysis (RCA), postmortems, and systemic remediation
  • Perform capacity planning, scaling strategies, workload scheduling, and resource forecasting
  • Optimize cost vs. performance tradeoffs in large-scale compute environments
  • Harden systems for security, compliance, auditability, and data governance
  • Collaborate across teams (cloud engineers, data engineers, infrastructure, security) to ensure safe deployment, rollout, rollback, and integration of new systems
  • Define disaster recovery (DR) strategies, backup / restore practices, and fault tolerance mechanisms
  • Maintain runbooks, operational playbooks, documentation, and training materials
  • Participate in on-call rotations and respond to production incidents 24 / 7 as needed
  • Continuously evaluate and integrate new tools, frameworks, or technologies to enhance platform reliability"