Talent.com
Site Reliability Engineer
High Tech Genesis Inc. • Montreal, Montreal (administrative region), CA
30+ days ago
Job type
  • Full-time
Job description

WE'RE HIRING!

At HTG, you’ll push boundaries with the latest tech and collaborate with a team that loves what they do. Join a design services company that ranks among the world leaders in technology and innovation.

Your next chapter starts here.

Responsibilities:

  • Act as the main technical escalation point for first-level operations analysts across cloud, network, and connected device environments.
  • Lead advanced troubleshooting, service restoration, and fault isolation for critical incidents, collaborating with engineering teams when required.
  • Own and manage problem records by conducting detailed root cause analyses, documenting preventive actions, and tracking issue resolution through completion.
  • Prepare and distribute clear and timely communication for customer-facing incident updates and internal post-incident summaries.
  • Identify manual and repetitive operational work and replace it with automated solutions through scripts, scheduled jobs, or self-healing workflows.
  • Define operational data requirements and contribute to refining AI and automation models used in incident management.
  • Establish and maintain performance metrics and service objectives; improve monitoring and reliability through better instrumentation and observability.
  • Implement safeguards and resilience mechanisms within operational systems, while promoting a culture of continuous learning and blameless retrospectives.
  • Maintain and enhance monitoring tools, alerting systems, dashboards, and operational documentation supporting 24/7 availability.
  • Tune monitoring thresholds and notifications to reduce noise and ensure only meaningful alerts are surfaced for action.
  • Ensure complete visibility across systems through metrics, logs, and traces for effective diagnostics and performance tracking.
  • Participate in operational readiness reviews and evaluate risk, rollback plans, and change impact before scheduled deployments.
  • Coordinate deployments and maintenance windows, performing verification steps before and after updates.
  • Track and improve deployment reliability and change success rates through post-release reviews and metrics.
  • Manage and operate cloud resources including compute, storage, networking, and identity, following least-privilege and compliance principles.
  • Support observability, access control, and governance standards within the cloud environment, including cost visibility and tagging policies.
  • Oversee integrations with hybrid infrastructure, including connectivity, certificates, and internal networking components.
  • Develop, maintain, and continuously improve operational documentation such as standard procedures, runbooks, and escalation workflows.
  • Ensure the accuracy, version control, and completeness of all operational knowledge materials.
  • Utilize ticketing and workflow systems for managing incidents, problems, and changes, while maintaining visibility into service performance.
  • Collaborate with engineering and DevOps teams to incorporate operational needs into design and deployment processes.
  • Provide training and mentorship to junior analysts, improving first-contact resolution rates and technical skill depth.
  • Communicate effectively with internal teams and external partners regarding incidents, maintenance updates, and service improvements.
  • Uphold security best practices in daily operations, including patch management, credential hygiene, and access reviews.
  • Work with compliance and security teams to address vulnerabilities, audits, and control assessments.
  • Participate in a shared on-call rotation and scheduled maintenance periods, ensuring smooth handovers and consistent shift documentation.
  • The on-call rotation will initially involve 3 to 4 team members, progressing toward full 24/7 coverage as the team expands.

Required Qualifications:

  • At least 3 years of experience in network operations, site reliability, or cloud platform support roles managing production systems.
  • Strong understanding of networking, VPNs, firewalls, load balancers, DNS, and certificate management.
  • Hands‑on experience with cloud services including compute, storage, networking, and identity management.
  • Practical experience with both Linux and Windows systems administration.
  • Proficiency in one or more scripting languages such as Python, PowerShell, or Bash, and ability to create dependable automation workflows.
  • Familiarity with monitoring, alerting, and telemetry systems, including the design of meaningful service‑level indicators.
  • Working knowledge of service management platforms and workflow automation tools.
  • Proven ability to write accurate operational documentation, including procedures and troubleshooting guides.
  • Strong communication skills for both technical and customer‑facing interactions.

Preferred Qualifications:

  • Experience with Infrastructure‑as‑Code tools (e.g., Terraform, Bicep) and CI/CD systems.
  • Knowledge of IoT or distributed device management at scale.
  • Understanding of system reliability concepts such as graceful degradation and autoscaling.
  • Exposure to industrial or energy systems involving telemetry, control, or gateway operations.
  • Relevant certifications such as Azure Administrator, Azure Network Engineer, ITIL, or CCNA (or equivalents).

High Tech Genesis Inc. is an Equal Opportunity Employer. Diversity and inclusion are at the core of our values.

Please advise High Tech Genesis of any accommodation measures you may require.

Please be advised:

  • Applicants must have the legal right to work in Canada.
  • Kindly submit your resume in MS Word format upon application for this position.