Research Scientist - Full-Time

pyannoteAI

Posted on Feb 18, 2026

Role: Research Scientist (VoiceAI)

Location: Toulouse, Paris, Remote
Job type: Full-time
Work setup: On-Site, Hybrid, Remote
Start: ASAP

Job offer

About pyannoteAI

pyannoteAI is pioneering Speaker Intelligence AI, transforming how AI processes and understands spoken language. Our speaker diarization technology distinguishes speakers with unmatched precision, regardless of the spoken language, making AI understand not just what is said, but who said it and when.
Founded by voice AI experts with 10+ years in the industry (ex-CNRS research scientists), we've built the 9th most downloaded open-source model on HuggingFace with 52 million monthly downloads and over 140,000 users worldwide. After raising €8M from leading international VCs (Crane Venture Partners, Serena, and angels from HuggingFace and OpenAI), we're now scaling our enterprise platform.
From meeting transcription and call center analytics to video dubbing and voice agents, pyannoteAI powers the next generation of voice-enabled applications across industries that depend on understanding who speaks and when.

Your role

As a Research Scientist at pyannoteAI, you'll push the boundaries of multi-talker conversational speech processing, conducting both applied and moonshot research that gets published at top-tier conferences and deployed to production. Working alongside the creators of pyannote.audio, you'll train state-of-the-art models at scale and collaborate with engineering to bring breakthrough research to 140K+ developers and large enterprise customers worldwide.
You'll:
Conduct cutting-edge research - Explore batch and streaming approaches to multi-talker speech processing including speaker diarization, speaker separation, multi-talker transcription, speaker identification, and speaker profiling
Train models at scale - Design, implement, and train neural network architectures using PyTorch
Publish at top-tier venues - Contribute original research to leading speech and machine learning conferences
Bridge research and production - Collaborate with our tech team to optimize and deploy models that serve millions of API calls
Contribute to our codebase - Develop and maintain both our proprietary and open-source libraries
What makes this role unique: Unlike typical research roles, you'll see your work both published AND deployed to production. With access to the Jean Zay supercomputer and mentorship from researchers who have pioneered speaker diarization for over a decade, you'll have the resources to do your best work with immediate real-world impact.

What we’re looking for

Must-haves:

PhD in speech processing, audio understanding, or machine learning - Deep theoretical foundation in the core domains relevant to our work
Track record of publication at top-tier conferences - You've published at venues like Interspeech, ICASSP, NeurIPS, ICML, or similar speech/ML conferences
Experience training state-of-the-art models in PyTorch - You've designed and trained neural network architectures that achieve competitive or SOTA results
Strong software engineering fundamentals - Advanced Python, Git, PyTorch, and Jupyter notebooks with clean, reproducible code practices
Collaborative development experience - Comfortable with Git workflows, pull requests, code reviews, and working in a team environment
Fluent in English - Able to write papers, present research, and communicate with the team professionally

Nice-to-haves:

Experience with pyannote.audio - You've used our toolkit in past research projects and understand its architecture
Model optimization expertise - Experience with distillation, quantization, pruning, or other inference optimization techniques
HPC and cluster computing - Familiarity with distributed training and large-scale computing environments, ideally Slurm job scheduling
Rust programming - Ability to contribute to performance-critical inference systems
Fluent in French - As we are based in France, it’s always a plus

What you’ll get

Benefits:
Competitive research compensation package with attractive BSPCE (French ESOP)
Premium Alan health insurance
Full transportation reimbursement
5 weeks of paid vacation + 10 RTT days
Conference travel budget
Work Environment:
Brand-new, premium offices in Toulouse or at La Maison in central Paris (Motier Ventures' hub) - inspiring spaces designed for fast-growing startups
Full remote flexibility - Work from anywhere, with the option to join us in our Toulouse or Paris offices if you prefer in-person collaboration
Top-tier equipment - Latest MacBooks, GPU access for rapid prototyping and testing, and everything you need to do your best work
Access to the Jean Zay supercomputer - Train models on one of Europe's most powerful AI supercomputers (managed by GENCI)
Research & Impact:
Work with pioneering researchers - Collaborate directly with ex-CNRS scientists who created pyannote.audio and are recognized globally as leaders in speaker diarization
Publish and deploy - Unlike pure research roles, see your work both published at top venues AND deployed to production serving 140K+ developers
Immediate real-world impact - Your research improvements directly accelerate model development and unlock new capabilities used by millions of end users
Cutting-edge challenges - Work on diverse voice AI problems spanning multiple languages, real-time streaming, and production-scale datasets
Research autonomy - Freedom to pursue both applied research aligned with product needs and moonshot explorations that could redefine the field
Mentorship and growth - Learn from researchers who have shaped the speaker diarization field for over a decade, with potential to grow into senior research leadership

⭐️ Hiring process

We've designed a comprehensive process to ensure mutual fit for this strategic role. Here's what to expect:
Screening call (30-45 min) - Get to know each other with our Chief of Staff and explore product philosophy alignment
Take-home research study (2-3 days) - Complete a research study designed to test your skills and your ability to think a research assignment through
Research presentation (60 min) - Present the results of the research study (30 min) and a past research project of your choice (30 min) to our CSO and a senior researcher on our team
Founders conversation (45-60 min) - Meet our CEO and CTO co-founders to align on vision, discuss the voice AI landscape, and explore what excites you about the space
Timeline: Typically 2.5-3 weeks from application to offer. We'll keep you informed at every stage and respond within 2-3 days after each step.
Equal Opportunity Employer
pyannoteAI is committed to creating a diverse and inclusive workplace. We are an equal opportunity employer and welcome applications from all qualified candidates regardless of gender, gender identity or expression, sexual orientation, race, ethnicity, national origin, age, disability, religion, or any other characteristic protected by law.
All employment decisions at pyannoteAI are based on business needs, job requirements, and individual qualifications. We believe that diverse perspectives strengthen our team and drive innovation in voice AI technology.