Senior DevOps Engineer, ML Infrastructure at Serve Robotics
Base pay range
$155,000.00 / yr – $195,000.00 / yr (United States)
Remote available to qualified candidates in Canada: $130k - $160k CAD
Overview
At Serve Robotics, we’re reimagining how things move in cities. Our personable sidewalk robot is our vision for the future. It’s designed to take deliveries away from congested streets, make deliveries available to more people, and benefit local businesses. The Serve fleet has been delighting merchants, customers, and pedestrians along the way in Los Angeles, Miami, Dallas, Atlanta and Chicago while doing commercial deliveries. We’re looking for talented individuals who will grow robotic deliveries from surprising novelty to efficient ubiquity.
Responsibilities
- Deploy and maintain our ML training orchestration system that operates across multiple platforms.
- Manage cloud and on-premise environments for large-scale distributed data processing and ML training/inference systems.
- Automate deployment pipelines, monitoring, and alerting for ML and data services.
- Collaborate closely with data scientists, ML engineers, and autonomy teams to streamline experimentation and model deployment.
- Maintain and improve CI/CD systems to support rapid development and testing.
- Implement best practices for system security, reliability, and observability.
- Optimize infrastructure costs and ensure efficient resource utilization.
- Support internal developer productivity through tooling, documentation, and hands-on assistance.
Qualifications
- Bachelor’s or Master’s degree in Computer Science, Engineering, or equivalent experience.
- 5+ years of experience as a DevOps, SRE, or Infrastructure Engineer, preferably supporting ML or data-intensive systems.
- Strong experience with cloud platforms (AWS, GCP, or Azure) and container orchestration (Kubernetes, Docker).
- Proficiency in infrastructure-as-code tools such as Terraform or Helm.
- Solid understanding of CI/CD systems (GitLab CI, Jenkins, Argo CD, etc.).
- Experience with Python and SQL.
- Experience with cloud security, IAM (Identity and Access Management), and access control.
- Experience analyzing and optimizing hardware performance.
- Experience with GPU cluster management.
What Makes You Stand Out
- Experience managing large-scale distributed data processing systems.
- Experience analyzing and optimizing ML training workloads.
- Background in observability stacks (Prometheus, Grafana, ELK, OpenTelemetry).
- Contributions to open-source DevOps or ML infrastructure projects.
Seniority Level
Mid‑Senior level
Employment Type
Full‑time
Job Function
Engineering and Information Technology
Industries
Technology, Information and Internet