
Build what the world will run on next

Discover roles across the Crane portfolio

ML Engineer - Full-Time

pyannoteAI

Software Engineering, Data Science
Posted on Feb 3, 2026


Role: Machine Learning Engineer (Rust / Python / Voice AI)

Location: Paris
Job type: Full-time
Work setup: 2-3 days remote per week
Start: ASAP

Job offer

About pyannoteAI

pyannoteAI is pioneering Speaker Intelligence AI, transforming how AI processes and understands spoken language. Our speaker diarization technology distinguishes speakers with unmatched precision, regardless of the spoken language, making AI understand not just what is said, but who said it and when.
Founded by voice AI experts with 10+ years in the industry (ex-CNRS research scientists), we've built the 9th most downloaded open-source model on HuggingFace with 52 million monthly downloads and over 140,000 users worldwide. After raising €8M from leading international VCs (Crane Venture Partners, Serena, and angels from HuggingFace and OpenAI), we're now scaling our enterprise platform.
From meeting transcription and call center analytics to video dubbing and voice agents, pyannoteAI powers the next generation of voice-enabled applications across industries that depend on understanding who speaks and when.

Your role

As a Machine Learning Engineer at pyannoteAI, you'll bridge cutting-edge research and production systems, transforming state-of-the-art speaker diarization models into scalable, real-time voice processing infrastructure. Working directly with our research scientists and within the Tech team, you'll write production code in Python and Rust, optimize for low-latency inference, and build the ML infrastructure that powers the leading diarization model in the VoiceAI space.
You'll:
Design, implement, and deploy ML models, particularly in the Audio/Voice AI domain (e.g., speaker diarization, speech separation, speech recognition).
Develop products/services in Rust/Python to support model training and inference in both streaming and batch pipelines.
Work with ML frameworks such as PyTorch and ONNX.
Build and maintain containerized environments (using Docker) for model training/inference, testing, and CI/CD pipelines.
Implement CI/CD workflows (model build/test/deploy), monitor model performance in production, and troubleshoot inference/pipeline issues.
Optimize inference performance (latency, throughput, resource usage), especially for real-time voice systems.
Collaborate with researchers, software engineers, DevOps/MLOps teams, and product teams to integrate ML models into production services.
Maintain documentation of model architectures, data pipelines, deployment workflows, and tests, and serve as a bridge between research prototypes and production.
Keep up with the latest developments in Voice AI and model optimization (e.g., quantization, ONNX Runtime, edge/embedded inference) and propose improvements.
What makes this role unique: You'll work at the intersection of groundbreaking AI research and real-world production, collaborating directly with ex-CNRS voice AI pioneers. Unlike typical ML roles, you'll leverage both Python for ML workflows and Rust for performance-critical systems—rare exposure to the full stack from training to deployment. With 140K+ developers using our technology, your optimizations will have immediate impact on one of the fastest-growing voice AI platforms globally.
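For a concrete (and deliberately simplified) picture of the Python side of this workflow, here is a minimal sketch of exporting a small PyTorch audio model to ONNX and running it with ONNX Runtime. TinyDiarizationHead, its shapes, and the file name are illustrative assumptions, not pyannoteAI's actual models or pipeline.

```python
# Hypothetical sketch: export a tiny PyTorch audio model to ONNX and run it with
# ONNX Runtime. TinyDiarizationHead is an illustrative stand-in, not the real
# pyannoteAI architecture.
import numpy as np
import torch
import torch.nn as nn
import onnxruntime as ort


class TinyDiarizationHead(nn.Module):
    """Toy model: waveform (batch, samples) -> per-frame speaker activity logits."""

    def __init__(self, n_speakers: int = 3):
        super().__init__()
        # Strided conv acts as a crude frame extractor (hop of 320 samples ~ 20 ms at 16 kHz).
        self.frontend = nn.Conv1d(1, 64, kernel_size=400, stride=320)
        self.head = nn.Linear(64, n_speakers)

    def forward(self, waveform: torch.Tensor) -> torch.Tensor:
        feats = self.frontend(waveform.unsqueeze(1))   # (batch, 64, frames)
        return self.head(feats.transpose(1, 2))        # (batch, frames, n_speakers)


model = TinyDiarizationHead().eval()
dummy = torch.randn(1, 16000)  # one second of 16 kHz audio

# Export with dynamic batch/time axes so clips of any length can be processed.
torch.onnx.export(
    model, dummy, "diarization_head.onnx",
    input_names=["waveform"], output_names=["logits"],
    dynamic_axes={"waveform": {0: "batch", 1: "samples"},
                  "logits": {0: "batch", 1: "frames"}},
    opset_version=17,
)

# Run the exported graph with ONNX Runtime (CPU provider here for portability).
session = ort.InferenceSession("diarization_head.onnx", providers=["CPUExecutionProvider"])
logits = session.run(None, {"waveform": np.random.randn(1, 48000).astype(np.float32)})[0]
print(logits.shape)  # (1, frames, 3)
```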

What we’re looking for

Must-haves:

Strong programming skills in Python, especially for ML/deep learning (data preprocessing, model architecture, training, evaluation, inference). (Essential)
AND/OR strong programming skills in Rust (systems/low-level aspects, safe concurrency, performance, embedding inference engines or services in Rust). (Essential)
Solid experience with a deep-learning framework such as PyTorch (preferred): building/training models from scratch, fine-tuning, evaluation, etc.
Experience converting or deploying models using ONNX (exporting models, optimizations, running inference via ONNX runtime or similar).
Experience building containerized workflows using Docker (creating images, defining Dockerfiles, managing container deployments).
Experience with CI/CD pipelines (building model build/test/deploy workflows, versioning, model deployment automation).
Experience with deploying models to production (inference services, API endpoints, batch pipelines, streaming if applicable), monitoring/model-observability.
Experience in Audio/Voice AI domain: e.g., speaker diarization, speaker embedding, speech recognition (ASR), voice synthesis (TTS), speaker recognition, feature extraction.
Strong software engineering practices: version control (Git), code reviews, automated testing (unit/integration tests for ML pipelines), logging/monitoring, clean architecture.
Good understanding of ML fundamentals: statistics, probability, linear algebra, model evaluation metrics, overfitting/underfitting, bias/variance, generalization, etc.
Good communication & collaboration skills — able to translate ML engineering trade-offs to product/business teams.

Nice-to-haves:

Experience with other deployment/serving frameworks: Kubernetes, serverless functions, edge/embedded inference (e.g., on device).
Experience optimizing for latency/throughput in voice/real-time systems (e.g., low-latency voice pipelines, speaker diarization, streaming inference).
Experience with cloud infrastructure (AWS, GCP, Azure) and ML/AI infrastructure.
Experience with other languages like C++ (for embedding or interfacing with low-level voice/audio libraries).
Familiarity with audio/signal-processing concepts: feature extraction (MFCCs, spectrograms), filter banks, audio pipelines, noise robustness, real-time streaming audio.
Familiarity with model compression/quantization/pruning, or hardware acceleration (GPU/TPU, on-device inference); see the short sketch after this list.
Publications or open-source contributions in Voice AI/ML, or experience with research→production workflows.
Familiarity with container orchestration (Kubernetes), observability/monitoring (Prometheus, Grafana), A/B testing of models.
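
As a small illustration of the quantization point above, here is a minimal sketch of post-training dynamic quantization with ONNX Runtime's tooling, assuming an FP32 ONNX model (such as the one exported earlier) already exists; the file names are placeholders.

```python
# Hypothetical sketch: post-training dynamic quantization of an exported ONNX model
# using ONNX Runtime's quantization utilities. File names are placeholders.
from onnxruntime.quantization import QuantType, quantize_dynamic

quantize_dynamic(
    model_input="diarization_head.onnx",        # FP32 model exported from PyTorch
    model_output="diarization_head.int8.onnx",  # same graph with INT8 weights
    weight_type=QuantType.QInt8,
)
# The quantized model would then be benchmarked against the FP32 one for latency,
# memory footprint, and diarization accuracy before being promoted to production.
```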

Minimum Qualifications / Experience Levels

Master’s degree in Computer Science, Engineering, Applied Mathematics, or related field (or equivalent experience).
Minimum 3 years of relevant experience working with machine learning/AI, and at least 2 years deploying ML into production, especially in Voice/Audio systems.
Demonstrated projects where Python + Rust have been used in a production or near-production setting (or a strong systems background with Rust + ML in Python).
Proven track record of shipping ML models into production, maintaining them, monitoring performance, and iterating.
Prior experience working in containerized environments and automating CI/CD for ML pipelines.

What you’ll get

Benefits:
Competitive compensation package with attractive salary and BSPCE (French ESOP)
Premium Alan health insurance
Full transportation reimbursement
5 weeks of paid vacation + 10 RTT days
Work Environment:
Brand-new, premium offices at La Maison in central Paris (Motier Ventures' hub) - an inspiring space designed for fast-growing startups.
Hybrid flexibility - Work remotely up to 3 days per week while staying connected to the team.
Top-tier equipment and infrastructure - Everything you need to do your best work.
Growth & Impact:
Work with world-class AI researchers - Collaborate directly with ex-CNRS scientists who pioneered speaker diarization, with access to Jean Zay supercomputer for training state-of-the-art models.
Solve cutting-edge technical challenges - Optimize real-time voice processing at scale, tackle sub-100ms latency requirements, and work on problems few teams globally are solving.
Rare dual-stack mastery - Build expertise in both Python ML ecosystem and Rust systems programming—highly sought-after skills in AI infrastructure.
See immediate global impact - Your code serves 140K+ developers and powers voice applications used by millions across transcription, call centers, dubbing, and voice agents.
Shape ML infrastructure from scratch - Define training pipelines, deployment workflows, and production architecture as one of our first ML engineers.
Bridge research and production - Transform breakthrough research into scalable systems, directly improving model accuracy, latency, and cost-efficiency.
Growth opportunities - Evolve into senior technical leadership, specialize in ML infrastructure, or transition into research based on your interests and performance.

⭐️ Hiring process

We've designed a comprehensive process to ensure mutual fit for this strategic role. Here's what to expect:
Screening call (30-45 min) - Get to know each other with our Chief of Staff and explore product philosophy alignment
Take-home case study (5-7 days) - A project-based assignment designed to test your skills and your ability to think through a problem thoroughly
Product case study presentation (60 min) - Present your strategic thinking to our CTO and a senior engineer from our team
Founders conversation (45-60 min) - Meet our CEO and CSO co-founders to align on vision, discuss the voice AI landscape, and explore what excites you about the space
Timeline: Typically 2.5-3 weeks from application to offer. We'll keep you informed at every stage and respond within 2-3 days after each step.
Equal Opportunity Employer
pyannoteAI is committed to creating a diverse and inclusive workplace. We are an equal opportunity employer and welcome applications from all qualified candidates regardless of gender, gender identity or expression, sexual orientation, race, ethnicity, national origin, age, disability, religion, or any other characteristic protected by law.
All employment decisions at pyannoteAI are based on business needs, job requirements, and individual qualifications. We believe that diverse perspectives strengthen our team and drive innovation in voice AI technology.