
Distil Large V3.5

distil-whisper/distil-large-v3.5

published Dec 2024 · updated Apr 2026

Distil Large V3.5 is a distilled automatic speech recognition model that transcribes English audio with high efficiency, offering ~1.5x faster inference than Whisper Large V3 Turbo while maintaining competitive word error rates.

status: coming soon
API providers: 0
downloads / mo: 3K
license: MIT

specs

Task: Automatic Speech Recognition (ASR)
Architecture: Distil-Whisper (knowledge-distilled Transformer)
Parameters: 756M
Training Data: 98k hours of diverse public audio

about this model

Distil-Whisper Distil-Large-v3.5 is an automatic speech recognition (ASR) model that distills OpenAI’s Whisper-Large-v3 into a smaller, faster variant while preserving accuracy. It is trained on over 98,000 hours of diverse public data using a patient teacher, extended training schedule, and SpecAugment data augmentation, resulting in improved robustness over earlier Distil-Whisper models.

Key strengths

  • Speed: Approximately 1.5× faster than Whisper-Large-v3-Turbo in real-time factor (RTFx) on long-form transcription.
  • Accuracy: Out-of-distribution Word Error Rate (WER) of 7.08 on short-form and 11.39 on long-form tasks.
  • Speculative decoding: Can serve as a draft model for Whisper-Large-v3, achieving ~2× faster inference while producing identical outputs.
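For readers unfamiliar with the accuracy metric, here is a minimal sketch of how a word error rate can be computed: the word-level edit distance between a reference transcript and the model output, divided by the number of reference words. The function name and sample sentences are illustrative, and the lowercasing step is a simplistic stand-in for the text normalization that published benchmarks typically apply. The figures quoted above are presumably percentages.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # substitution, or a match when cost is 0
            ))
        prev = curr
    return prev[-1] / len(ref)


# Illustrative sentences: one substitution ("sat" -> "sit") and one deletion ("the").
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
print(f"WER: {100 * wer:.2f}%")  # WER: 33.33%
```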

Performance comparisons

Short-form OOD WER (lower is better):

Model               Params (M)   OOD WER
large-v3-turbo      809          7.30
distil-large-v3     756          7.53
distil-large-v3.5   756          7.08

Long-form OOD WER and average RTFx (higher RTFx is faster):

Model               OOD WER   Avg RTFx
large-v3-turbo      10.25     33.81
distil-large-v3     11.6      48.64
distil-large-v3.5   11.39     49.34
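To connect the RTFx column to the "approximately 1.5x faster" claim, the short sketch below works through the arithmetic. RTFx is the inverse real-time factor: seconds of audio transcribed per second of compute. The one-hour workload is a hypothetical example, and it assumes the benchmark averages carry over to your audio and hardware, which they may not.

```python
def rtfx(audio_seconds: float, processing_seconds: float) -> float:
    """Inverse real-time factor: seconds of audio handled per second of compute."""
    if processing_seconds <= 0:
        raise ValueError("processing_seconds must be positive")
    return audio_seconds / processing_seconds


# Average long-form RTFx values quoted in the table above.
turbo_rtfx = 33.81
distil_v35_rtfx = 49.34

speedup = distil_v35_rtfx / turbo_rtfx
print(f"speedup over large-v3-turbo: {speedup:.2f}x")  # 1.46x

# Hypothetical workload: one hour of audio at each average RTFx.
hour = 3600.0
print(f"large-v3-turbo:    {hour / turbo_rtfx:.0f} s")
print(f"distil-large-v3.5: {hour / distil_v35_rtfx:.0f} s")
```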

The model is a collaborative effort by Bofeng Huang, Eustache Le Bihan, Steven Zheng, and Vaibhav Srivastav.


FAQ

What is the main benefit of Distil Large V3.5 compared to Whisper Large V3 Turbo?

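It is about 1.5x faster on long-form transcription (average RTFx of 49.34 vs. 33.81) and slightly more accurate on short-form audio (OOD WER of 7.08 vs. 7.30), but slightly less accurate on long-form audio (OOD WER of 11.39 vs. 10.25).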

What input format does the model accept?

Audio files or waveforms sampled at 16 kHz, processed via the Whisper feature extractor.
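As a rough illustration, the sketch below uses Python's standard `wave` module to check whether a WAV file already matches the 16 kHz rate before it is passed on. The helper names are hypothetical, the silent clips are demo input only, and actual resampling is left to whichever audio library you already use.

```python
import io
import wave

TARGET_SAMPLE_RATE = 16_000  # the rate stated in the answer above


def describe_wav(data: bytes) -> dict:
    """Read sample rate, channel count, and duration from WAV bytes."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        rate = wav.getframerate()
        return {
            "sample_rate": rate,
            "channels": wav.getnchannels(),
            "seconds": wav.getnframes() / rate,
            "needs_resampling": rate != TARGET_SAMPLE_RATE,
        }


def silent_wav(sample_rate: int, seconds: float) -> bytes:
    """Build an in-memory mono 16-bit PCM WAV of silence (demo input only)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * int(sample_rate * seconds))
    return buffer.getvalue()


print(describe_wav(silent_wav(16_000, 1.0)))  # already at the target rate
print(describe_wav(silent_wav(44_100, 0.5)))  # would need resampling first
```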

How do I use this model via the gigarouter API?

The hosted API is not yet live. Once it launches, you will send HTTP requests to the OpenAI-compatible endpoint with your API key and audio data.
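A minimal sketch of what such a request might look like, assuming the endpoint mirrors OpenAI's `POST /v1/audio/transcriptions` route with multipart `model` and `file` fields. The base URL below is a placeholder, not a real gigarouter address, and the model ID is assumed to match the identifier shown on this page. The code only builds the request and does not send it.

```python
import urllib.request
import uuid

# Placeholder values: the real base URL and model ID are assumptions until
# the hosted API is documented.
BASE_URL = "https://api.example.com/v1"
MODEL_ID = "distil-whisper/distil-large-v3.5"


def build_transcription_request(
    audio: bytes, filename: str, api_key: str
) -> urllib.request.Request:
    """Build (but do not send) a multipart POST to /audio/transcriptions."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="model"\r\n\r\n'
        f"{MODEL_ID}\r\n"
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/audio/transcriptions",
        data=head + audio + tail,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


# Dummy bytes stand in for real audio; "YOUR_API_KEY" is a placeholder.
request = build_transcription_request(b"\x00\x01", "sample.wav", "YOUR_API_KEY")
print(request.get_method(), request.full_url)

# Sending would then look like this:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```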

Can this model be used for speculative decoding?

Yes, it works as a draft model for Whisper Large V3, achieving ~2x faster inference while preserving identical outputs.
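To illustrate why the outputs stay identical, here is a toy sketch of greedy speculative decoding. Both "models" are hypothetical stand-in functions over a fixed sentence, not real ASR models: the draft proposes a few tokens, the target checks them, and only tokens the target would have chosen itself are kept. In a real system the target checks all drafted positions in one forward pass, which is where the speedup comes from. In Hugging Face Transformers, this style of assisted generation is exposed through the `assistant_model` argument of `generate`.

```python
from typing import Callable, List, Tuple

EOS = "<eos>"
TARGET_TEXT = "the quick brown fox jumps over the lazy dog".split() + [EOS]

NextToken = Callable[[List[str]], str]  # greedy "model": prefix -> next token


def target_model(prefix: List[str]) -> str:
    """Stand-in for the large model: always emits the reference sentence."""
    return TARGET_TEXT[min(len(prefix), len(TARGET_TEXT) - 1)]


def draft_model(prefix: List[str]) -> str:
    """Stand-in for the small model: agrees with the target except on one word."""
    token = TARGET_TEXT[min(len(prefix), len(TARGET_TEXT) - 1)]
    return "red" if token == "brown" else token


def greedy_decode(model: NextToken, max_tokens: int = 32) -> List[str]:
    out: List[str] = []
    while len(out) < max_tokens and (not out or out[-1] != EOS):
        out.append(model(out))
    return out


def speculative_decode(
    target: NextToken, draft: NextToken, k: int = 4, max_tokens: int = 32
) -> Tuple[List[str], int]:
    out: List[str] = []
    target_passes = 0
    while len(out) < max_tokens and (not out or out[-1] != EOS):
        # 1. The draft proposes k tokens autoregressively.
        proposal: List[str] = []
        for _ in range(k):
            proposal.append(draft(out + proposal))
        # 2. The target verifies them (counted as one pass, as in a batched check).
        target_passes += 1
        accepted: List[str] = []
        for token in proposal:
            expected = target(out + accepted)
            accepted.append(expected)  # always keep the target's own choice
            if expected != token or expected == EOS:
                break  # first disagreement (or end of text) ends this round
        else:
            # Every draft token matched: the same pass yields one bonus token.
            accepted.append(target(out + accepted))
        out.extend(accepted)
    return out[:max_tokens], target_passes


baseline = greedy_decode(target_model)
spec, passes = speculative_decode(target_model, draft_model)
print(spec == baseline, passes, len(baseline))  # True 3 10
```

In this toy run the target is consulted in 3 rounds instead of 10 single-token steps, and the result matches plain greedy decoding token for token.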

What hardware is recommended?

A CUDA-capable GPU is recommended for best performance. CPU inference also works, but it is slower.

not yet live

We're benchmarking and onboarding Distil Large V3.5 as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.
