Parakeet RNNT 0.6B
nvidia/parakeet-rnnt-0.6b
published Dec 2023 · updated Jun 2026
Parakeet RNNT 0.6B is an ASR model that transcribes English speech into lower-case text.
specs
| Task | Automatic Speech Recognition (ASR) |
| Architecture | FastConformer-Transducer (RNNT) |
| Parameters | 0.6B |
| License | CC-BY-4.0 |
about this model
Parakeet-RNNT-0.6B is an automatic speech recognition (ASR) model that transcribes English speech into lower-case text. Developed jointly by NVIDIA NeMo and Suno.ai, it is a FastConformer Transducer model with approximately 600 million parameters. The FastConformer architecture is an optimized version of the Conformer model that uses 8x depthwise-separable convolutional downsampling, achieving 2.8x faster inference than the original Conformer while supporting scaling to billion-parameter models.
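To give a feel for what 8x downsampling means in practice, the sketch below estimates how many encoder frames a clip produces. The 8x factor comes from the description above; the 10 ms feature hop is an assumption (a common ASR front-end setting, not stated here), and `encoder_frames` is a hypothetical helper. Exact counts in a real model also depend on convolution padding, so treat the result as approximate.

```python
FEATURE_HOP_MS = 10        # assumption: 10 ms hop between acoustic feature frames
DOWNSAMPLING_FACTOR = 8    # from the model description: 8x convolutional downsampling


def encoder_frames(duration_seconds):
    """Approximate number of encoder frames for a clip of the given length."""
    duration_ms = int(duration_seconds * 1000)
    feature_frames = duration_ms // FEATURE_HOP_MS
    # Ceiling division: a partial group of feature frames still yields one encoder frame.
    return -(-feature_frames // DOWNSAMPLING_FACTOR)


print(encoder_frames(10))         # 125
print(encoder_frames(11 * 3600))  # 495000
```

Under these assumptions each encoder frame covers roughly 80 ms of audio, which is why the encoder sees far fewer time steps than the raw feature sequence.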
Key Capabilities
The model accepts 16 kHz mono-channel audio as input and outputs transcribed text as a string. It uses a SentencePiece Unigram tokenizer with a vocabulary size of 1024. The model was trained on 64,000 hours of English speech, comprising a private 40,000-hour subset and 24,000 hours from public datasets including LibriSpeech, Fisher Corpus, Switchboard-1, WSJ, VCTK, VoxPopuli, Europarl-ASR, MLS English, Mozilla Common Voice, and People's Speech.
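Because the model expects 16 kHz mono audio, it can help to check a file before sending it for transcription. The following is a minimal sketch using only Python's standard `wave` module; `wav_input_problems` and `make_silent_wav` are hypothetical helper names, and the check covers PCM WAV files only.

```python
import io
import wave

EXPECTED_RATE_HZ = 16_000
EXPECTED_CHANNELS = 1


def wav_input_problems(source):
    """Return reasons the WAV is not 16 kHz mono; an empty list means it matches.

    `source` may be a file path string or a binary file-like object.
    """
    problems = []
    with wave.open(source, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
    if rate != EXPECTED_RATE_HZ:
        problems.append(f"sample rate is {rate} Hz, expected {EXPECTED_RATE_HZ} Hz")
    if channels != EXPECTED_CHANNELS:
        problems.append(f"{channels} channels found, expected {EXPECTED_CHANNELS}")
    return problems


def make_silent_wav(rate_hz, channels, seconds=1):
    """Build an in-memory 16-bit PCM WAV of silence, for demonstration only."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav.setframerate(rate_hz)
        wav.writeframes(b"\x00\x00" * rate_hz * channels * seconds)
    buffer.seek(0)
    return buffer


print(wav_input_problems(make_silent_wav(16_000, 1)))  # []
print(wav_input_problems(make_silent_wav(44_100, 2)))
```

Files that fail the check would need to be resampled or downmixed with an audio tool before transcription.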
Benchmark Performance
Word Error Rate (WER) with greedy decoding on standard benchmarks:
| Dataset | WER (%) |
|---|---|
| LS test-clean | 1.63 |
| SPGI Speech | 3.06 |
| TEDLIUM-v3 | 3.47 |
| Vox Populi | 3.86 |
| Common Voice | 8.07 |
| Giga Speech | 10.07 |
| Earnings-22 | 14.78 |
| AMI | 17.55 |
These are greedy decoding results without an external language model. Separately, the model can transcribe long-form audio of up to 11 hours by using limited-context attention, which is applied after training through fine-tuning with a global token.
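For readers who want to compute a comparable metric on their own data, WER is the word-level edit distance (substitutions + deletions + insertions) divided by the number of reference words. The sketch below is a simplified illustration, not the evaluation pipeline behind the table: `normalize` and `word_error_rate` are hypothetical helpers, and the normalization (lower-casing and stripping ASCII punctuation so references match the model's lower-case output) is an assumption that is cruder than typical benchmark normalizers.

```python
import string

_PUNCTUATION_TABLE = str.maketrans("", "", string.punctuation)


def normalize(text):
    """Lower-case, drop ASCII punctuation, and split into words."""
    return text.lower().translate(_PUNCTUATION_TABLE).split()


def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = normalize(reference)
    hyp = normalize(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the distance between the reference prefix so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# One dropped word out of six reference words.
print(word_error_rate("The cat sat on the mat.", "the cat sat on mat"))
```

Multiplying the result by 100 gives a percentage in the same units as the table.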
Licensing
This model is released under the CC-BY-4.0 license.
best for
- Transcribing English meeting recordings
- Voice command transcription
- Automated captioning of English audio
FAQ
Parakeet RNNT 0.6B transcribes English speech into lower-case text for general ASR tasks.
It accepts 16 kHz mono-channel WAV files.
To use it, send audio to the gigarouter OpenAI-compatible endpoint with an API key and receive the transcription in response.
It is licensed under CC-BY-4.0.
It achieves a WER of 1.63% on LibriSpeech test-clean and 14.78% on Earnings-22 with greedy decoding.
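As a rough illustration of the OpenAI-compatible usage mentioned above, the sketch below builds (but does not send) a transcription request using only the Python standard library. It assumes the endpoint mirrors the OpenAI audio transcription shape: a multipart POST to `/audio/transcriptions` with `model` and `file` fields and a bearer token. `BASE_URL` is a placeholder, `MODEL_ID` is assumed to match the identifier shown on this page, and `build_transcription_request` is a hypothetical helper; none of these are confirmed here.

```python
import urllib.request
import uuid

# Placeholder values: the real endpoint URL and hosted model id are not given on this page.
BASE_URL = "https://api.example.invalid/v1"
MODEL_ID = "nvidia/parakeet-rnnt-0.6b"


def build_transcription_request(wav_bytes, api_key, filename="audio.wav"):
    """Build a multipart/form-data POST in the OpenAI audio-transcription shape."""
    boundary = uuid.uuid4().hex
    body = b"".join([
        # Text field naming the model.
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="model"\r\n\r\n'
        f"{MODEL_ID}\r\n".encode(),
        # File field carrying the raw WAV bytes.
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f"Content-Type: audio/wav\r\n\r\n".encode(),
        wav_bytes,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    return urllib.request.Request(
        f"{BASE_URL}/audio/transcriptions",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


request = build_transcription_request(b"RIFF...", api_key="YOUR_API_KEY")
print(request.get_method(), request.full_url)
```

With a real base URL and key, the request could be sent with `urllib.request.urlopen(request)`; if the service follows the OpenAI response format, the transcript would be in the `text` field of the returned JSON.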
Parakeet RNNT 0.6B is being benchmarked and onboarded as a hosted, OpenAI-compatible API.