Infrastructure Reliability Engineer

IKO • Mississauga, Ontario, Canada
11 days ago
Salary
CA$110,000.00 yearly
Job type
  • Full-time
Job description

IKO Industries Ltd. is a market leader in the manufacturing of roofing and building materials. IKO is a Canadian-owned and operated business with production facilities worldwide and has many years of unparalleled success in the roofing materials industry. Quality, integrity, and trustworthiness are the values that underlie this success, and we have built this company by hiring people who hold these values. People like you!

IT Infrastructure Reliability Engineer

Department: Infrastructure & Operations
Reports To: Global Director Infrastructure & Operations
Employment Type: Full-Time
Location: On-Site
Compensation: $110,000 to $129,000

Position Summary

The IT Infrastructure Reliability Engineer plays a critical role in ensuring the availability, performance, and resilience of enterprise technology systems across a complex, globally distributed environment. Reporting to the Global Director of Infrastructure and Operations, this individual will serve as a subject matter expert in observability, monitoring, alerting, and application performance while actively contributing to governance and architectural decisions through membership on the Architecture Review Board.

Key Responsibilities

Monitoring, Observability & Alerting

  • Design, implement, and maintain comprehensive monitoring solutions across on-premises, cloud, and hybrid infrastructure environments.
  • Develop observability frameworks leveraging metrics, logs, and distributed tracing to provide end-to-end visibility into system health and performance.
  • Define and manage alerting thresholds, escalation policies, and on-call runbooks to enable rapid incident detection and response.
  • Continuously evaluate and improve monitoring tooling (e.g., SolarWinds, Prometheus, Grafana, Splunk, Dynatrace) to align with organizational needs.
  • Establish SLOs, SLIs, and error budgets to measure and communicate reliability targets to business and technical stakeholders.

Application Performance Monitoring (APM)

  • Lead the deployment and optimization of APM tools to monitor application response times, throughput, error rates, and resource utilization.
  • Collaborate with development teams to instrument applications where applicable and integrate performance monitoring into development pipelines.
  • Conduct proactive performance analysis to identify bottlenecks, regressions, and optimization opportunities before they impact end users.
  • Develop dashboards and reports that surface actionable insights for engineering, operations, and leadership teams.
  • Participate in post-incident reviews to identify root causes and drive improvements to application reliability and observability.

Change Management Coordination

  • Serve as a technical liaison in the Change Advisory Board (CAB) process, evaluating infrastructure and platform changes for reliability risk.
  • Evaluate and improve change management standards, including pre-change testing, rollback planning, and post-change validation procedures.
  • Coordinate scheduled maintenance windows and communicate impact assessments to stakeholders and service owners.
  • Maintain change records and audit trails in the ITSM platform (ServiceNow) to support compliance and reporting.
  • Champion a culture of disciplined, risk-aware change practices across the I&O team.

Architecture Review Board (ARB) Membership

  • Participate as a standing member of the Architecture Review Board, providing reliability, observability, and operational readiness input on proposed solutions.
  • Review and assess new infrastructure designs, cloud services, and technology platforms for alignment with reliability engineering standards.
  • Contribute to the development and maintenance of architecture principles, infrastructure reference architectures, and technology standards.
  • Work cross-functionally with Enterprise Architects, Security, and Development teams to ensure new capabilities are designed for operability and resilience.
  • Document ARB decisions and provide post-implementation feedback loops to inform future architectural guidance.

Additional Responsibilities

  • Develop and maintain infrastructure-as-code (IaC) for monitoring configurations, ensuring consistency and version control.
  • Support capacity planning efforts by analyzing trends in resource consumption and forecasting future infrastructure requirements.
  • Mentor junior engineers in reliability engineering principles, tooling, and best practices.
  • Contribute to the development of disaster recovery and business continuity plans, including regular DR testing.
  • Maintain up-to-date documentation for all monitoring, alerting, and operational runbooks.

Qualifications

Required

  • 5 years of experience in IT infrastructure, site reliability engineering (SRE), or a related operations role.
  • Demonstrated expertise in monitoring and observability platforms (e.g., Datadog, Prometheus, Grafana, Dynatrace, New Relic, or Splunk).
  • Solid understanding of APM concepts and hands-on experience instrumenting applications in enterprise environments.
  • Experience with ITSM and change management processes (ITIL certification preferred).
  • Proficiency with cloud platforms (AWS, Azure, GCP, OCI) and hybrid infrastructure architectures.
  • Familiarity with containerization and orchestration technologies (Docker, Kubernetes).
  • Experience with scripting or automation languages (Python, PowerShell) and infrastructure-as-code tools (Ansible, Terraform).
  • Strong communication skills with the ability to convey complex technical information to both technical and non-technical audiences.

Preferred

  • Experience in a formal Site Reliability Engineering (SRE) function with ownership of SLOs and error budgets.
  • Background in enterprise architecture governance or participation in architecture review processes.
  • Certifications such as AWS Solutions Architect, Google Professional Cloud Architect, ITIL v4, or CKA/CKAD.
  • Familiarity with observability frameworks such as OpenTelemetry.
  • Experience in regulated industries with compliance-driven change controls.

Core Competencies

Technical Excellence

  • Deep infrastructure expertise
  • Systems-level thinking
  • Automation-first mindset
  • Security and compliance awareness

Collaboration & Influence

  • Cross-functional partnership
  • Stakeholder communication
  • Architecture governance participation
  • Mentorship and knowledge sharing

Operational Mindset

  • Reliability and availability focus
  • Incident ownership
  • Continuous improvement
  • Risk-aware change management

Working Conditions

This role may require participation in an on-call rotation and availability outside standard business hours for critical incidents. Occasional travel may be required to support multi-site operations.

Benefits of Employment: IKO recognizes that its success is due to the strength of its employees. A primary goal of IKO is to promote individual employees' sense of accomplishment and contribution so that employees enjoy their association with IKO. The Company invests in its employees so that they are the most knowledgeable in the industry and undertakes great efforts to nurture loyalty to and teamwork at IKO. We are pleased to offer competitive compensation, health care, a progressive and challenging workplace, and a commitment to teamwork and integrity.

Diversity and Equal Opportunity Employment: IKO Industries Ltd. is an equal opportunity employer. We are committed to diversity and inclusion and are pleased to consider all qualified applicants for employment without consideration to race, religion, creed, color, national origin, age, gender, sexual orientation, marital status, veteran status, or disability. IKO Industries Ltd. encourages and welcomes applications from people with disabilities. Accommodations are available on request for candidates taking part in all aspects of the selection process.


Required Experience:

IC


Experience: years
Vacancy: 1
Salary: 110,000 - 129,000 yearly