Talent.com
Site Reliability Engineer
High Tech Genesis Inc. • Montreal, Montreal (administrative region), CA
30+ days ago
Job type
  • Full-time
Job description

WE'RE HIRING!

At HTG, you’ll push boundaries with the latest tech and collaborate with a team that loves what they do. Be part of a design services company that ranks among the world's leaders in technology and innovation.

Your next chapter starts here.

Responsibilities :

  • Act as the main technical escalation point for first-level operations analysts across cloud, network, and connected device environments.
  • Lead advanced troubleshooting, service restoration, and fault isolation for critical incidents, collaborating with engineering teams when required.
  • Own and manage problem records by conducting detailed root cause analyses, documenting preventive actions, and tracking issue resolution through to completion.
  • Prepare and distribute clear, timely customer-facing incident updates and internal post-incident summaries.
  • Identify manual and repetitive operational work and replace it with automated solutions through scripts, scheduled jobs, or self-healing workflows.
  • Define operational data requirements and contribute to refining AI and automation models used in incident management.
  • Establish and maintain performance metrics and service objectives; improve monitoring and reliability through better instrumentation and observability.
  • Implement safeguards and resilience mechanisms within operational systems, while promoting a culture of continuous learning and blameless retrospectives.
  • Maintain and enhance monitoring tools, alerting systems, dashboards, and operational documentation supporting 24/7 availability.
  • Tune monitoring thresholds and notifications to reduce noise and ensure only meaningful alerts are surfaced for action.
  • Ensure complete visibility across systems through metrics, logs, and traces for effective diagnostics and performance tracking.
  • Participate in operational readiness reviews and evaluate risk, rollback plans, and change impact before scheduled deployments.
  • Coordinate deployments and maintenance windows, performing verification steps before and after updates.
  • Track and improve deployment reliability and change success rates through post-release reviews and metrics.
  • Manage and operate cloud resources including compute, storage, networking, and identity, following least-privilege and compliance principles.
  • Support observability, access control, and governance standards within the cloud environment, including cost visibility and tagging policies.
  • Oversee integrations with hybrid infrastructure, including connectivity, certificates, and internal networking components.
  • Develop, maintain, and continuously improve operational documentation such as standard procedures, runbooks, and escalation workflows.
  • Ensure the accuracy, version control, and completeness of all operational knowledge materials.
  • Utilize ticketing and workflow systems for managing incidents, problems, and changes, while maintaining visibility into service performance.
  • Collaborate with engineering and DevOps teams to incorporate operational needs into design and deployment processes.
  • Provide training and mentorship to junior analysts, improving first-contact resolution rates and technical skill depth.
  • Communicate effectively with internal teams and external partners regarding incidents, maintenance updates, and service improvements.
  • Uphold security best practices in daily operations, including patch management, credential hygiene, and access reviews.
  • Work with compliance and security teams to address vulnerabilities, audits, and control assessments.
  • Participate in a shared on-call rotation and scheduled maintenance periods, ensuring smooth handovers and consistent shift documentation.
  • The on-call rotation will initially involve 3 to 4 team members, progressing toward full 24/7 coverage as the team expands.

Required Qualifications :

  • At least 3 years of experience in network operations, site reliability, or cloud platform support roles managing production systems.
  • Strong understanding of networking, VPNs, firewalls, load balancers, DNS, and certificate management.
  • Hands‑on experience with cloud services including compute, storage, networking, and identity management.
  • Practical experience with both Linux and Windows systems administration.
  • Proficiency in one or more scripting languages such as Python, PowerShell, or Bash, and ability to create dependable automation workflows.
  • Familiarity with monitoring, alerting, and telemetry systems, including the design of meaningful service‑level indicators.
  • Working knowledge of service management platforms and workflow automation tools.
  • Proven ability to write accurate operational documentation, including procedures and troubleshooting guides.
  • Strong communication skills for both technical and customer‑facing interactions.

Preferred Qualifications :

  • Experience with Infrastructure‑as‑Code tools (e.g., Terraform, Bicep) and CI/CD systems.
  • Knowledge of IoT or distributed device management at scale.
  • Understanding of system reliability concepts such as graceful degradation and autoscaling.
  • Exposure to industrial or energy systems involving telemetry, control, or gateway operations.
  • Relevant certifications such as Azure Administrator, Azure Network Engineer, ITIL, or CCNA (or equivalents).

High Tech Genesis Inc. is an Equal Opportunity Employer. Diversity and inclusion are at the core of our values.

Please advise High Tech Genesis of any accommodation measures you may require.

Please be advised :

  • Applicants must have the legal right to work in Canada.
  • Kindly submit your resume in MS Word format upon application for this position.