Job description
Intuitive AI is dedicated to addressing the global waste crisis by creating innovative solutions that make waste management more effective and impactful. Our mission is to inspire changes in behavior and foster sustainability through advanced technology and user-friendly systems. With a focus on environmental stewardship, we aim to empower individuals and organizations to make responsible waste disposal choices. At Intuitive AI, we believe in making a lasting impact and creating a cleaner planet for future generations.
The Role
We're hiring a Fleet Reliability Engineer to help keep our device fleet healthy, observable, and improving. You'll work directly alongside our senior engineers to maintain, monitor, and scale the thousands of Oscar Sort and Pixel units deployed worldwide: catching issues before they happen, automating the boring stuff, and shipping the systems that let a small team operate a fleet far bigger than it has any right to.
You'll have a senior engineer in your corner from day one, and real ownership from week one. The work is hands‑on, the feedback loop is fast, and what you build will be running on devices in real customer sites within days.
What You'll Work On
Diagnosing and resolving issues on Linux‑based edge devices in the field: logs, crashes, network drops, hardware quirks
Building and maintaining our fleet management, monitoring, and alerting systems
Owning OTA updates and release rollouts: staged deployments, rollbacks, canary fleets
Writing scripts and tooling to automate diagnostics, remediation, and routine maintenance
Improving system reliability: driving down recurring issue classes, reducing manual support load, raising fleet uptime
Partnering with Customer Success when client‑facing issues need engineering depth
What We're Looking For
Required
2–4 years working with Linux systems in a production or fleet context
Strong command line fluency and systems‑level debugging chops
Solid scripting in Python and Bash
Networking fundamentals: SSH, TCP/IP, firewalls, VPNs, basic troubleshooting
Experience with monitoring or observability tooling (Prometheus, Grafana, Datadog, Loki, or similar)
Hands‑on experience with embedded Linux, IoT, or edge devices; this is the core of the role, not a side interest
Experience with OTA / fleet update systems (Mender, RAUC, balena, AWS IoT Greengrass, or similar)
Comfortable with Docker and containerized workflows
A genuine problem‑solving instinct: you don't stop at the symptom
Nice to Have
Cloud infrastructure (AWS preferred)
CI/CD pipelines and infrastructure‑as‑code
Hands‑on hardware experience (replacing components, debugging physical units remotely)
Experience operating fleets at thousands‑of‑devices scale
Who You Are
Curious, hands‑on, and allergic to "we've always done it that way"
Comfortable owning a problem end‑to‑end, from the log line to the fix to the postmortem
Detail‑oriented: you trust the data, not the vibe
Energized by real‑world AI systems and devices people actually touch
Genuinely interested in why something broke, not just getting it back up
Why Join
Real fleet, real customers, real impact: every change you make ships to devices in the field
Direct mentorship from senior engineers, with no layers between you and the systems
High ownership in a small team where your work is visible end‑to‑end