Talent.com
No longer accepting applications
Infrastructure Reliability Engineer

IKO North America • Winnipeg, Canada
4 days ago
Job type
  • Full-time
Job description
IT Infrastructure Reliability Engineer

Department: Infrastructure & Operations

Reports To: Global Director, Infrastructure & Operations

Employment Type: Full-Time

Location: On-Site

Compensation: $110,000–$129,000

Position Summary

The IT Infrastructure Reliability Engineer plays a critical role in ensuring the availability, performance, and resilience of enterprise technology systems across a complex, globally distributed environment. Reporting to the Global Director of Infrastructure and Operations, this individual will serve as a subject‑matter expert in observability, monitoring, alerting, and application performance, while actively contributing to governance and architectural decisions through membership on the Architecture Review Board.

Key Responsibilities

Monitoring, Observability & Alerting

Design, implement, and maintain comprehensive monitoring solutions across on‑premises, cloud, and hybrid infrastructure environments.

Develop observability frameworks leveraging metrics, logs, and distributed tracing to provide end‑to‑end visibility into system health and performance.

Define and manage alerting thresholds, escalation policies, and on‑call runbooks to enable rapid incident detection and response.

Continuously evaluate and improve monitoring tooling (SolarWinds, Prometheus, Grafana, Splunk, Dynatrace) to align with organizational needs.

Establish SLOs, SLIs, and error budgets to measure and communicate reliability targets to business and technical stakeholders.

Application Performance Monitoring (APM)

Lead the deployment and optimization of APM tools to monitor application response times, throughput, error rates, and resource utilization.

Collaborate with development teams to instrument applications and integrate performance monitoring into development pipelines.

Conduct proactive performance analysis to identify bottlenecks, regressions, and optimization opportunities before they impact end users.

Develop dashboards and reports that surface actionable insights for engineering, operations, and leadership teams.

Participate in post‑incident reviews to identify root causes and drive improvements to application reliability and observability.

Change Management Coordination

Serve as a technical liaison in the Change Advisory Board (CAB) process, evaluating infrastructure and platform changes for reliability risk.

Evaluate and improve change management standards, including pre‑change testing, rollback planning, and post‑change validation procedures.

Coordinate scheduled maintenance windows and communicate impact assessments to stakeholders and service owners.

Maintain change records and audit trails in the ITSM platform (ServiceNow) to support compliance and reporting.

Champion a culture of disciplined, risk‑aware change practices across the I&O team.

Architecture Review Board (ARB) Membership

Participate as a standing member of the Architecture Review Board, providing reliability, observability, and operational readiness input on proposed solutions.

Review and assess new infrastructure designs, cloud services, and technology platforms for alignment with reliability engineering standards.

Contribute to the development and maintenance of architecture principles, infrastructure reference architectures, and technology standards.

Work cross‑functionally with Enterprise Architects, Security, and Development teams to ensure new capabilities are designed for operability and resilience.

Document ARB decisions and provide post‑implementation feedback loops to inform future architectural guidance.

Additional Responsibilities

Develop and maintain infrastructure‑as‑code (IaC) for monitoring configurations, ensuring consistency and version control.

Support capacity planning efforts by analyzing trends in resource consumption and forecasting future infrastructure requirements.

Mentor junior engineers in reliability engineering principles, tooling, and best practices.

Contribute to the development of disaster recovery and business continuity plans, including regular DR testing.

Maintain up‑to‑date documentation for all monitoring, alerting, and operational runbooks.

Qualifications

Required

5+ years of experience in IT infrastructure, site reliability engineering (SRE), or a related operations role.

Demonstrated expertise in monitoring and observability platforms (Datadog, Prometheus, Grafana, Dynatrace, New Relic, or Splunk).

Solid understanding of APM concepts and hands‑on experience instrumenting applications in enterprise environments.

Experience with ITSM and change management processes (ITIL certification preferred).

Proficiency with cloud platforms (AWS, Azure, GCP, OCI) and hybrid infrastructure architectures.

Familiarity with containerization and orchestration technologies (Docker, Kubernetes).

Experience with scripting or automation languages (Python, PowerShell) and infrastructure‑as‑code tools (Ansible, Terraform).

Strong communication skills with the ability to convey complex technical information to both technical and non‑technical audiences.

Preferred

Experience in a formal Site Reliability Engineering (SRE) function with ownership of SLOs and error budgets.

Background in enterprise architecture governance or participation in architecture review processes.

Certifications such as AWS Solutions Architect, Google Professional Cloud Architect, ITIL v4, or CKA/CKAD.

Familiarity with observability frameworks such as OpenTelemetry.

Experience in regulated industries with compliance‑driven change controls.

Core Competencies

Technical Excellence

Deep infrastructure expertise

Systems‑level thinking

Automation‑first mindset

Security and compliance awareness

Collaboration & Influence

Cross‑functional partnership

Stakeholder communication

Architecture governance participation

Mentorship and knowledge sharing

Operational Mindset

Reliability and availability focus

Incident ownership

Continuous improvement

Risk‑aware change management

Working Conditions

This role may require participation in an on‑call rotation and availability outside standard business hours for critical incidents. Occasional travel may be required to support multi‑site operations.

Benefits of Employment

We are pleased to offer competitive compensation, health care, a progressive and challenging workplace, and a commitment to teamwork and integrity.

Diversity and Equal Opportunity Employment

IKO Industries Ltd. is an equal opportunity employer. We are committed to diversity and inclusion and are pleased to consider all qualified applicants for employment without regard to race, religion, creed, color, national origin, age, gender, sexual orientation, marital status, veteran status, or disability. IKO Industries Ltd. encourages and welcomes applications from people with disabilities. Accommodations are available on request for candidates taking part in all aspects of the selection process.
