Talent.com

Java developer Jobs in Mirabel, QC

Last updated: 17 hours ago
Site Reliability Engineer – GenAI Platform

Astra North Infoteck Inc.
Montreal & Mirabel, QC, CA
Full-time
Experience: 8+ years of experience as a Site Reliability Engineer or in a similar role, with hands-on experience in supporting IaaS platforms with networking and system engineering knowledge. Opera...
Last updated: 19 days ago
  • New!
Full stack Developer

Artech LLC
Montreal & Mirabel, ON
Full-time
Title: Full stack Developer. Location: Montreal - Hybrid 3 days. We are seeking a highly skilled Full Stack Java Developer to join our dynamic team. The successful candidate will have a strong backg...
Last updated: 17 hours ago
Entry-Level Research Assistant (Remote)

FocusGroupPanel
Saint-Joseph-du-Lac, Quebec, Canada
Remote
Part-time
Work From Home, Entry Level Data Entry Clerk As A Research Participant. We are looking for people who want to work remotely from home. You'll need an Internet connection and a mobile device or comput...
Last updated: 30+ days ago
Site Reliability Engineer – GenAI Platform

Astra North Infoteck Inc.
Montreal & Mirabel, QC, CA
19 days ago
Job type
  • Full-time
Job description
  • Experience: 8+ years of experience as a Site Reliability Engineer or in a similar role, with hands-on experience in supporting IaaS platforms with networking and system engineering knowledge.

  • Roles and Responsibilities:

    • Operate, monitor, and maintain the infrastructure supporting GenAI applications (training, inference, feature store, data ingestion, model serving)

    • Design and build automation for core platform capabilities, reducing manual toil

    • Develop and maintain infrastructure-as-code (IaC) for provisioning and managing compute, storage, network, GPU clusters, Kubernetes / container orchestration, etc.

    • Establish, monitor, and enforce SLOs/SLIs/SLAs, error budgets, alerting, and dashboards

    • Lead incident response, root cause analysis (RCA), postmortems, and systemic remediation

    • Perform capacity planning, scaling strategies, workload scheduling, and resource forecasting

    • Optimize cost vs. performance tradeoffs in large-scale compute environments

    • Harden systems for security, compliance, auditability, and data governance

    • Collaborate across teams (cloud engineers, data engineers, infrastructure, security) to ensure safe deployment, rollout, rollback, and integration of new systems

    • Define disaster recovery (DR) strategies, backup/restore practices, and fault tolerance mechanisms

    • Maintain runbooks, operational playbooks, documentation, and training materials

    • Participate in on-call rotations and respond to production incidents 24/7 as needed

    • Continuously evaluate and integrate new tools, frameworks, or technologies to enhance platform reliability

  • Skills:

    • Production experience in SRE / Infrastructure / ops for large-scale systems

    • Strong programming/scripting skills (Python, Go, Java, or equivalent)

    • Deep experience with containerization (Docker), orchestration (Kubernetes, etc.)

    • Infrastructure-as-code (Terraform, Helm, CloudFormation, Ansible, etc.)

    • Familiarity with GPU / AI compute clusters, high-performance data storage, and distributed architectures

    • Experience with monitoring / observability / logging / alerting tools (Prometheus, Grafana, ELK / EFK, Datadog, etc.)

    • Networking & systems engineering knowledge (TCP/IP, DNS, routing, load balancing, distributed storage)

    • Solid experience in capacity planning, performance tuning, scaling, and incident response

    • Demonstrated ability to lead RCAs, deploy fixes, and drive reliability improvements

    • Experience in regulated environments (financial services, compliance, audit, security) is a strong plus

    • Excellent communication, documentation, and cross-team collaboration skills

    • Proven track record of reducing operational toil via automation