Applied Engineer (Tribal Knowledge)
Pavo AI
London, UK
Design and own the agent system that compiles an organization's tribal knowledge
About Pavo
Pavo is building Enterprise Superintelligence: compounding systems that take ownership of business outcomes and work with humans to deliver them.
We believe that while foundation models are necessary, they are not sufficient. The hard problem is systems intelligence: end-to-end architectures that understand a company's code, data, and decisions, and improve themselves through experience.
We are assembling a small, senior team of researchers and engineers obsessed with systems-first intelligence. Our current team consists of PhDs and ML engineers from top applied ML and coding agent companies, with a heritage of shipping systems at Spotify, ShareChat, and Sourcegraph scale.
Our team has built impressive momentum with a small group of highly capable engineers and researchers.
The Opportunity
As an Applied Engineer at Pavo, you will design and own the agent system that compiles an organization's tribal knowledge — the body of knowledge that nobody has written down — from its own evidence: source code, structured data, internal documents, and conversations. You'll work hands-on at the level of agent architecture, pipeline design, and the production surface that turns a generative system into one customers can depend on.
You'll partner closely with the Applied Scientist on instrumentation, but the system itself is yours. It is the most consequential thing we build, and the path from idea to production is measured in days.
This is a senior, individual-contributor role. Everyone on the team joins as a Member of Technical Staff — with the scope, autonomy, and end-to-end ownership that title implies.
What You'll Build
You'll own the system end-to-end, across its hardest surfaces:
- →Hierarchical Agent Architecture: Design how a main agent coordinates sub-agents, tools, and skills — deciding where deterministic scaffolding beats prompting and where prompting is the right tool. This decomposition is the single biggest determinant of whether a generative knowledge system is dependable.
- →The Synthesis Pipeline: Own the multi-stage system that turns heterogeneous private evidence into a verifiable knowledge artifact — with the checkpoints, retries, fallbacks, and structured outputs that production reliability requires.
- →Deployment & Rollout: Build scheduled regeneration, quality and reliability gates, blue/green rollout, and customer-facing version control of compiled knowledge artifacts. Treat each release with the seriousness it deserves.
- →Observability & Debuggability: Structured traces of agent runs, replay/debug tooling, cost and latency budgets, and regression detection on releases. A multi-stage agent run must be debuggable by someone who didn't initiate it.
- →Reliability as a First-Class Property: Treat run-to-run variance as a defect class equal to incorrectness, and build the engineering practice that reduces it.
- →Eval & CI Infrastructure: Reproducible, automated evaluation that feeds signal back into the deploy loop — complementary to the experimental harness on the science side.
What We Are Looking For
We are looking for an engineer who has shipped dependable production systems and is hungry to do it at the edge of what current agentic frameworks can do.
Core Qualifications
- →Senior Track Record: 8+ years building and operating production systems, including a stretch where you owned a major system or platform area end-to-end and were the engineer others came to for the hard problems.
- →Production Engineering Chops: You ship Python systems that other engineers want to extend; you write tests that catch real bugs; you have opinions about structure, dependency boundaries, and what belongs in a library versus an application.
- →Hands-On with an Agentic Framework: Depth with at least one — Anthropic tool use, OpenAI Agents SDK, Pydantic AI, LangGraph, AutoGen, or equivalent — including its sharp edges, not just its happy path. We have no preference for which; we strongly prefer that you've been bitten by one.
- →Strong Opinions on Agent Architecture: When to use a single-context agent versus decompose into sub-agents. Where deterministic code beats prompting, and how to tell which before the third refactor. What a tool is, what a skill is, and where the line is.
- →Deployment Pipelines for ML/LLM Systems: Experience designing and operating rollout, gating, observability, on-call posture, and regression detection.
- →Comfort with Reliability Work: The unglamorous work that turns a flaky agent into a dependable one: async Python, structured outputs, retries with sensible idempotency, and the long tail of "why did this run produce a different answer?"
- →Pragmatism About Prompts: A hard-won sense of what's prompt-tunable and what isn't. You've earned this opinion at least once.
Preferred Qualifications
- →Built or maintained an LLM evaluation harness in production.
- →Familiarity with retrieval / IR systems and the engineering of large-context pipelines.
- →Distributed-systems background — workflow engines (Temporal / Airflow / Prefect), queues, and observability stacks (OpenTelemetry, Datadog, Honeycomb).
- →Open-source contributions to agentic frameworks, eval tooling, or workflow orchestration.
Why Join Us
- →Architecture-Defining Work: The system you design becomes the substrate for everything we ship downstream.
- →Short Loop: Work directly with the Applied Scientist on instrumentation and with the founders on platform direction. Going from idea to production takes days.
- →Real Ownership: Genuine ownership in a small, technically deep team. Your name will be on the system.
- →Foundational Space: The private knowledge layer will reshape how AI agents operate inside organizations; the engineering problems sit at the edge of what current agentic frameworks can do well.
Pavo is an equal opportunity employer. We celebrate diversity and are committed to creating an inclusive environment for all employees.