Distil Large V3.5

distil-whisper/distil-large-v3.5

published Dec 2024 · updated Apr 2026

Distil Large V3.5 is a distilled automatic speech recognition model that transcribes English audio with high efficiency, offering ~1.5x faster inference than Whisper Large V3 Turbo while maintaining competitive word error rates.

status

coming soon

license

mit

specs

Task	Automatic Speech Recognition (ASR)
Architecture	Distil-Whisper (knowledge-distilled Transformer)
Parameters	756M
Training Data	98k hours of diverse public audio

about this model

Distil-Large-v3.5 is a Distil-Whisper automatic speech recognition (ASR) model that distills OpenAI’s Whisper-Large-v3 into a smaller, faster variant while preserving accuracy. It is trained on over 98,000 hours of diverse public data using a patient teacher, an extended training schedule, and SpecAugment data augmentation, which improves robustness over earlier Distil-Whisper models.

Key strengths

Speed: Approximately 1.5× faster than Whisper-Large-v3-Turbo on long-form transcription, measured by average RTFx (49.34 vs 33.81).
Accuracy: Out-of-distribution (OOD) Word Error Rate (WER) of 7.08% on short-form and 11.39% on long-form tasks.
Speculative decoding: Can serve as a draft model for Whisper-Large-v3, achieving ~2× faster inference while producing identical outputs.
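
The "identical outputs" property of speculative decoding follows from how draft tokens are verified. The toy sketch below illustrates the greedy variant with two stand-in functions, not real speech models: `toy_target` and `toy_draft` are hypothetical, and the loop is an illustration of the idea rather than the implementation used by any particular library. A real system gains speed by scoring all drafted positions in one batched forward pass of the large model; here they are checked one by one for clarity.

```python
from typing import Callable, List

# A "model" here is just a greedy next-token function over integer tokens.
NextToken = Callable[[List[int]], int]


def greedy_decode(model: NextToken, prompt: List[int], max_new: int) -> List[int]:
    """Plain autoregressive greedy decoding, used as the reference."""
    out = list(prompt)
    for _ in range(max_new):
        out.append(model(out))
    return out


def speculative_decode(target: NextToken, draft: NextToken,
                       prompt: List[int], max_new: int, k: int = 4) -> List[int]:
    """Greedy speculative decoding: the draft proposes, the target verifies."""
    out = list(prompt)
    limit = len(prompt) + max_new
    while len(out) < limit:
        # 1) The small draft model proposes up to k tokens.
        proposal: List[int] = []
        for _ in range(min(k, limit - len(out))):
            proposal.append(draft(out + proposal))
        # 2) The target checks each proposed position in order.
        accepted: List[int] = []
        for tok in proposal:
            expected = target(out + accepted)
            accepted.append(expected)  # always keep the target's own choice
            if expected != tok:
                break  # first disagreement: discard the rest of the draft
        out.extend(accepted)
    return out


# Hypothetical toy models: the draft agrees with the target most of the time.
def toy_target(ctx: List[int]) -> int:
    return (sum(ctx) * 31 + len(ctx)) % 50


def toy_draft(ctx: List[int]) -> int:
    guess = toy_target(ctx)
    return guess if len(ctx) % 3 else (guess + 1) % 50


reference = greedy_decode(toy_target, [1, 2, 3], max_new=20)
speculative = speculative_decode(toy_target, toy_draft, [1, 2, 3], max_new=20)
print(reference == speculative)
```

Because every token that reaches the output is the one the target would have picked itself, the draft model only affects how many tokens are accepted per round, never what is produced.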

Performance comparisons

Short-form OOD WER (lower is better):

Model	Params (M)	OOD WER
large-v3-turbo	809	7.30
distil-large-v3	756	7.53
distil-large-v3.5	756	7.08
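
For readers unfamiliar with the metric, WER is the word-level edit distance between a reference transcript and the model's output, divided by the number of reference words. The following is a minimal sketch of that definition. It assumes whitespace tokenization and applies no text normalization (casing, punctuation), which published benchmarks typically do before scoring, so it will not reproduce the numbers in the table.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate in percent: (substitutions + deletions + insertions) / reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Levenshtein distance over words, keeping only one row of the DP table.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,                           # deletion of a reference word
                cur[j - 1] + 1,                        # insertion of an extra word
                prev[j - 1] + (ref_word != hyp_word),  # substitution, or free match
            ))
        prev = cur
    return 100.0 * prev[-1] / len(ref)


# One substitution ("sat" -> "sit") and one deletion ("the") over 6 reference words.
print(round(wer("the cat sat on the mat", "the cat sit on mat"), 2))
```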

Long-form OOD WER and average RTFx (higher RTFx is faster):

Model	OOD WER	Avg RTFx
large-v3-turbo	10.25	33.81
distil-large-v3	11.6	48.64
distil-large-v3.5	11.39	49.34
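
RTFx is the inverse real-time factor: seconds of audio transcribed per second of compute. As a quick check on the headline speed claim, the sketch below recomputes the relative speedup from the average RTFx column above. The helper name `seconds_to_transcribe` is illustrative, and the per-hour estimates assume the average RTFx carries over to your hardware and audio, which it may not.

```python
# Average RTFx values copied from the long-form table above.
AVG_RTFX = {
    "large-v3-turbo": 33.81,
    "distil-large-v3": 48.64,
    "distil-large-v3.5": 49.34,
}


def seconds_to_transcribe(audio_seconds: float, rtfx: float) -> float:
    """RTFx = audio duration / processing time, so processing time = duration / RTFx."""
    return audio_seconds / rtfx


speedup = AVG_RTFX["distil-large-v3.5"] / AVG_RTFX["large-v3-turbo"]
print(f"speedup vs large-v3-turbo: {speedup:.2f}x")

for name, value in AVG_RTFX.items():
    print(f"{name}: {seconds_to_transcribe(3600, value):.1f} s per hour of audio")
```

The ratio comes out at about 1.46, which is where the rounded "~1.5x" figure comes from.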

The model is a collaborative effort by Bofeng Huang, Eustache Le Bihan, Steven Zheng, and Vaibhav Srivastav.

best for

·Short-form transcription of audio clips under 30 seconds
·Long-form transcription using sequential or chunked decoding
·Speculative decoding as a draft model paired with Whisper Large V3
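
Whisper-family models process audio in windows of up to 30 seconds, so chunked long-form decoding starts by cutting the waveform into overlapping windows that are transcribed independently and then stitched together. The sketch below shows only the windowing step. The function name `chunk_bounds` and the 5-second overlap are illustrative assumptions, not values taken from this model's documentation, and the stitching of overlapping transcripts is not shown.

```python
from typing import List, Tuple


def chunk_bounds(n_samples: int, sample_rate: int = 16_000,
                 chunk_s: float = 30.0, overlap_s: float = 5.0) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices of overlapping windows covering the audio."""
    chunk = int(chunk_s * sample_rate)
    step = chunk - int(overlap_s * sample_rate)
    if step <= 0:
        raise ValueError("overlap must be shorter than the chunk length")
    if n_samples <= 0:
        return []

    bounds = []
    start = 0
    while True:
        end = min(start + chunk, n_samples)
        bounds.append((start, end))
        if end == n_samples:  # the last window reaches the end of the audio
            break
        start += step
    return bounds


# 70 seconds of 16 kHz audio -> three windows, each overlapping its neighbor by 5 s.
print(chunk_bounds(70 * 16_000))
```

Clips shorter than one window yield a single chunk, which corresponds to the short-form case.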

FAQ

What is the main benefit of Distil Large V3.5 compared to Whisper Large V3 Turbo?

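It is roughly 1.5x faster on long-form transcription (average RTFx 49.34 vs 33.81) and slightly more accurate on short-form audio (OOD WER 7.08 vs 7.30), but slightly less accurate on long-form audio (OOD WER 11.39 vs 10.25).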

What input format does the model accept?

Audio files or waveforms sampled at 16 kHz, processed via the Whisper feature extractor.
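
A simple pre-flight check can catch sample-rate mismatches before audio reaches the feature extractor. The sketch below uses only the Python standard library and assumes mono 16-bit PCM WAV input. The function name `load_wav_16k` is hypothetical, and the sketch rejects other rates rather than resampling them.

```python
import io
import struct
import wave
from typing import List

TARGET_RATE = 16_000  # sample rate expected by the model, per the answer above


def load_wav_16k(data: bytes) -> List[float]:
    """Decode a mono 16-bit PCM WAV sampled at 16 kHz into floats in [-1.0, 1.0)."""
    with wave.open(io.BytesIO(data), "rb") as wf:
        if wf.getframerate() != TARGET_RATE:
            raise ValueError(f"expected {TARGET_RATE} Hz, got {wf.getframerate()} Hz")
        if wf.getnchannels() != 1 or wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles mono 16-bit PCM")
        frames = wf.readframes(wf.getnframes())
    # Little-endian signed 16-bit integers, scaled to floating point.
    samples = struct.unpack(f"<{len(frames) // 2}h", frames)
    return [s / 32768.0 for s in samples]


def make_test_wav(rate: int) -> bytes:
    """Build a four-sample WAV in memory so the example runs without a file."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(rate)
        wf.writeframes(struct.pack("<4h", 0, 16384, -32768, 32767))
    return buf.getvalue()


print(load_wav_16k(make_test_wav(16_000))[:3])
```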

How do I use this model via the gigarouter API?

Send HTTP requests to the OpenAI-compatible endpoint with your API key and audio data.
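
Since the hosted endpoint is not yet live, the following is only a sketch of what a request could look like, assuming the API follows the OpenAI audio transcription convention (a multipart POST to `/audio/transcriptions` with `file` and `model` fields). The base URL is a placeholder, and the model identifier is assumed to match the ID shown at the top of this page; check the provider's documentation for the real values. The code builds the request without sending it.

```python
import urllib.request
import uuid

BASE_URL = "https://api.example.invalid/v1"    # placeholder, not a real endpoint
MODEL_ID = "distil-whisper/distil-large-v3.5"  # assumed to match the listing ID


def build_transcription_request(audio: bytes, filename: str, api_key: str,
                                base_url: str = BASE_URL,
                                model: str = MODEL_ID) -> urllib.request.Request:
    """Assemble an OpenAI-style multipart transcription request (not sent here)."""
    boundary = uuid.uuid4().hex
    body = b"".join([
        # Text field: which model should transcribe the audio.
        (f"--{boundary}\r\n"
         'Content-Disposition: form-data; name="model"\r\n\r\n'
         f"{model}\r\n").encode(),
        # File field: the raw audio bytes.
        (f"--{boundary}\r\n"
         f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
         "Content-Type: application/octet-stream\r\n\r\n").encode(),
        audio,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    return urllib.request.Request(
        f"{base_url}/audio/transcriptions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
        method="POST",
    )


req = build_transcription_request(b"RIFF....", "clip.wav", api_key="YOUR_API_KEY")
print(req.get_method(), req.full_url)
```

Once a real base URL is available, the request can be sent with `urllib.request.urlopen(req)`.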

Can this model be used for speculative decoding?

Yes, it works as a draft model for Whisper Large V3, achieving ~2x faster inference while preserving identical outputs.

What hardware is recommended?

A CUDA-capable GPU is recommended for best performance. CPU inference is possible but slower.

not yet live

We're benchmarking and onboarding Distil Large V3.5 as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.
