Senior Platform Engineer - Cloud & Infrastructure (f/m/d) - Full-time
ZenML
Software Engineering, Other Engineering
Posted on Jan 8, 2026
Senior Platform Engineer - Cloud & Infrastructure (f/m/d)
Overview: Architect the Infrastructure of MLOps
We are looking for a heavy-hitting infrastructure engineer who lives in Kubernetes but understands the reality of enterprise deployments. ZenML is an open-source MLOps framework, and as we scale our ZenML Cloud and Enterprise offerings, we need someone to own the "plumbing" that makes ML pipelines run anywhere.
This is a unique hybrid role. You won't just be maintaining internal clusters; you will be building core product features (like our new workload manager and scheduler) AND helping our most advanced customers architect their MLOps stacks.
Key Responsibilities
Build "Infra-Heavy" Product Features: You will design and implement core features in ZenML Pro, such as native schedulers and the workload manager that triggers pipelines across hybrid clouds.
Own the ZenML Pro (SaaS) Infrastructure: You will keep our managed control plane resilient, scalable, and secure using modern SRE practices (Grafana, Prometheus, alerting).
Enterprise Architecture & PoCs: You will be the "Special Forces" engineer we send in when a major enterprise customer needs to deploy ZenML on a complex, air-gapped, or custom Kubernetes setup. You will unblock them and feed those learnings back into the product.
Developer Experience: You will abstract the complexity of Kubernetes away from the data scientists who use our tool.
Tech You'll Work With
The Core: Kubernetes (Deep knowledge required - CKA level), Docker, Terraform, Helm.
The Code: Python (for ZenML) and likely Go (for controllers/operators).
The Clouds: AWS (EKS), GCP (GKE), Azure (AKS).
The Stack: PostgreSQL, SQLModel, FastAPI.
What We're Looking For
The K8s Native: You don't just use Kubernetes; you understand its internals. You’ve written Helm charts from scratch, debugged failed ingress controllers, and wrestled with VPC peering.
Infrastructure as Code (IaC) Master: You hate clicking buttons in the AWS console. If it isn't in Terraform, it doesn't exist.
Code + Ops: You are not just a SysAdmin. You can write production-quality code (Python or Go) to build features, not just scripts.
Customer Empathy: You are comfortable jumping on a call with a customer’s DevOps team to debug a deployment. You can explain complex infra concepts to Data Scientists without overwhelming them.
Problem Solver: You enjoy the detective work of figuring out why a pod is crashing in a customer's obscure private cloud environment.
Why This Role Matters
High Impact: You will be the bridge between our code and the real world. You will build the engine that executes ML pipelines for thousands of users.
Technical Breadth: You will see every type of cloud architecture imaginable. You will become a master of "running things anywhere."
Open Source: You will be contributing to a major open-source project and building your reputation in the MLOps community.
What We Bring to the Table
🌍 An inspiring, international team: We’re a tight-knit group of motivated people of 7 different nationalities, speaking 20+ languages - and we are just as diverse in our interests. Whether you're into gaming, music, writing, meditation, yoga, sailing, mountain hikes, or motorcycles, you’ll find your people here.
🎉 Genuine connection & lots of fun: We take our work seriously, but ourselves not too seriously. Laughter, memes, and spontaneous coffee chats are part of the daily deal. Check out our team website if you don’t believe us.
Join us and you can look forward to BBQs, sailing trips, gaming nights, and more!
🌴 Annual company offsite: Once a year, we bring the whole team together for a few days of deep connection, collaboration, and good vibes - somewhere beautiful.
🏡 Office in the heart of Munich: Our home base is in Schellingstraße, right in the middle of everything. Stop by for great coffee on our sunny balcony or some after-work drinks on our rooftop terrace.
🗓️ Flexible hours & trust-based work: We have core hours (9am to 6pm), but life happens - and we trust you to manage your time in a way that works for you.
🧑🏼‍💻 Remote-friendly culture: Around half of our team is remote, working from places like Spain, Morocco, the Netherlands, and the US. Whether you're in Munich or elsewhere, you're equally part of the team.
🏦 Competitive compensation: Wondering what the salary for this role is? Just ask us! We always cover it on the first call, because we genuinely want to match the salary to your experience. We don't advertise a figure because we have a degree of flexibility and would never want salary to be the reason someone doesn't apply. What matters most to us is finding the right fit!
About ZenML
ZenML isn't just another MLOps tool - we're building the next-generation platform that makes production ML accessible to every organization. Our open-source framework has a growing community of users who rely on it to create reproducible, production-ready ML pipelines.
Photo: All-Hands Summer 2023
How to Apply
Send us examples of your work (GitHub, portfolio, or previous projects) along with your resume to careers@zenml.io. We're particularly interested in hearing about your experience with cloud infrastructure, customer-facing technical roles, and any MLOps-related projects.
We'll review your application within 48 hours and get back to you with the next steps. If you’re curious about our process, you can find more information on our website.