Parakeet CTC 0.6B

nvidia/parakeet-ctc-0.6b

published Dec 2023 · updated Sep 2025

Parakeet CTC 0.6B is an automatic speech recognition model that uses the FastConformer-CTC architecture to transcribe English speech into lowercase text.

status

coming soon

API providers

downloads / mo

15.3K

license

cc-by-4.0

specs

Task	Automatic Speech Recognition (ASR)
Architecture	FastConformer CTC (8x downsampling)
Parameters	0.6B (600 million)
License	CC-BY-4.0

about this model

Parakeet CTC 0.6B is an automatic speech recognition (ASR) model that transcribes English speech into lowercase text. It is based on the FastConformer CTC architecture, has approximately 600 million parameters, and was jointly developed by the NVIDIA NeMo and Suno.ai teams.
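For local experiments, the checkpoint can be loaded through the NVIDIA NeMo toolkit. The following is a minimal sketch, assuming `nemo_toolkit[asr]` is installed; the helper name `transcribe_with_nemo` and the file name `sample.wav` are placeholders, and the return type of `transcribe` differs between NeMo versions (plain strings in older releases, hypothesis objects in newer ones).

```python
def transcribe_with_nemo(wav_paths):
    """Load nvidia/parakeet-ctc-0.6b via NeMo and transcribe the given files.

    wav_paths: list of paths to 16 kHz mono WAV files.
    """
    # Imported lazily so this snippet can be loaded where NeMo is not installed.
    import nemo.collections.asr as nemo_asr

    # The checkpoint is downloaded on first use and cached afterwards.
    asr_model = nemo_asr.models.EncDecCTCModelBPE.from_pretrained(
        model_name="nvidia/parakeet-ctc-0.6b"
    )
    return asr_model.transcribe(wav_paths)


# Example call (placeholder path, not a file shipped with the model):
# print(transcribe_with_nemo(["sample.wav"]))
```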

Architecture and Training

FastConformer is an optimized variant of the Conformer model. It achieves 2.8x faster inference than the original Conformer and scales to billion-parameter models without core architectural changes. The model was trained with CTC loss on 64,000 hours of English speech: 40,000 hours of private data and 24,000 hours from public datasets, including LibriSpeech, Fisher Corpus, Switchboard, WSJ, VCTK, VoxPopuli, Europarl-ASR, Multilingual LibriSpeech (2,000-hour subset), Mozilla Common Voice (v7.0), and People’s Speech (12,000-hour subset).

When global attention is replaced with limited context attention after training, the model can transcribe long-form speech of up to 11 hours in duration. For further details, refer to the Fast Conformer paper (ASRU 2023, arXiv:2305.05084).
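A rough calculation shows why limited context attention matters for long recordings: global attention compares every encoder frame with every other frame, so its cost grows quadratically with duration. The sketch below assumes a 10 ms feature hop before the 8x downsampling and a hypothetical local window of 128 frames on each side; neither value is stated on this page, and both are for illustration only.

```python
from typing import Optional

HOP_MS = 10          # assumed feature hop in milliseconds (not stated on this page)
DOWNSAMPLING = 8     # from the spec table: 8x downsampling
LOCAL_WINDOW = 128   # hypothetical number of context frames on each side


def encoder_frames(duration_seconds: int) -> int:
    """Number of encoder output frames for audio of the given length."""
    return (duration_seconds * 1000) // (HOP_MS * DOWNSAMPLING)


def attention_pairs(frames: int, window: Optional[int] = None) -> int:
    """Upper bound on query-key pairs per attention head and layer.

    window=None means global attention (every frame attends to every frame);
    otherwise each frame attends to at most 2 * window + 1 neighbours.
    """
    if window is None:
        return frames * frames
    return frames * min(frames, 2 * window + 1)


frames_11h = encoder_frames(11 * 3600)
print(frames_11h)
print(attention_pairs(frames_11h))
print(attention_pairs(frames_11h, LOCAL_WINDOW))
```

Under these assumptions, 11 hours of audio yields 495,000 encoder frames, which means roughly 245 billion query-key pairs for global attention versus about 127 million with the local window.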

Performance Benchmarks

Word Error Rate (WER, in %; lower is better) with greedy decoding (no external language model) on standard benchmarks:

AMI	Earnings-22	GigaSpeech	LS test-clean	SPGISpeech	TEDLIUM-v3	VoxPopuli	Common Voice
16.30	14.14	10.35	1.87	3.76	4.11	3.78	7.00
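WER is the word-level edit distance between the transcript and the reference, divided by the number of reference words; the table reports it as a percentage. A minimal sketch of the computation follows. The function name is illustrative, and benchmark pipelines usually normalize text before scoring, a step omitted here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER as a fraction: (substitutions + deletions + insertions) / reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from hypothesis (deletion)
                curr[j - 1] + 1,     # extra hypothesis word (insertion)
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# One dropped word out of six reference words.
print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```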

Benchmarks are from the Hugging Face ASR Leaderboard. The model uses a SentencePiece Unigram tokenizer with vocabulary size 1024.
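Greedy CTC decoding, as used for the figures above, picks the most likely token at each frame, merges consecutive repeats, and then drops the blank symbol. The sketch below shows this on a toy three-symbol vocabulary; the blank index and the scores are illustrative assumptions rather than values taken from the model, and mapping token IDs back to text with the SentencePiece tokenizer is left out.

```python
from typing import List, Sequence


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]], blank_id: int) -> List[int]:
    """Collapse per-frame scores into a token ID sequence using greedy CTC rules."""
    # Step 1: best token per frame (argmax over the vocabulary axis).
    best_path = [max(range(len(scores)), key=scores.__getitem__) for scores in frame_scores]

    # Step 2: merge consecutive repeats, then remove blanks. A blank between two
    # identical tokens keeps them as two separate tokens.
    decoded: List[int] = []
    previous = None
    for token in best_path:
        if token != previous and token != blank_id:
            decoded.append(token)
        previous = token
    return decoded


# Toy example: IDs 0 and 1 are tokens, ID 2 is the (assumed) blank.
toy_scores = [
    [0.9, 0.05, 0.05],  # token 0
    [0.8, 0.1, 0.1],    # token 0 again (merged with the previous frame)
    [0.1, 0.1, 0.8],    # blank
    [0.7, 0.2, 0.1],    # token 0 (kept, because a blank separates it)
    [0.1, 0.8, 0.1],    # token 1
    [0.2, 0.7, 0.1],    # token 1 again (merged)
    [0.1, 0.1, 0.8],    # blank
]
print(ctc_greedy_decode(toy_scores, blank_id=2))  # prints [0, 0, 1]
```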

License: CC-BY-4.0.

best for

·Transcribing recorded meetings and earnings calls
·Real-time speech-to-text for voice commands and dictation
·Large-scale batch transcription of audio libraries

FAQ

What is the best use case for Parakeet CTC 0.6B?

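It is suited to transcribing English speech across domains such as meetings, lectures, and voice assistants. Accuracy varies by domain: in the benchmarks above, WER ranges from 1.87 on LibriSpeech test-clean to 16.30 on AMI meeting recordings.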

How does it compare in size and speed?

It has 0.6B parameters, and the FastConformer architecture provides 2.8x faster inference than the original Conformer.

What are the license terms?

It is licensed under CC-BY-4.0, allowing use with attribution.

What input/output format does it accept?

It accepts 16000 Hz mono-channel WAV audio and outputs transcribed lowercase English text.
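A quick pre-flight check can catch files that do not match this input format. The sketch below uses Python's standard `wave` module; the helper name `is_16k_mono_wav` is hypothetical, and files that fail the check would need to be resampled or downmixed with a separate tool.

```python
import wave

EXPECTED_RATE = 16000     # Hz, as stated in the FAQ answer above
EXPECTED_CHANNELS = 1     # mono


def is_16k_mono_wav(path: str) -> bool:
    """Return True if the file is a readable PCM WAV with a 16 kHz rate and one channel."""
    try:
        with wave.open(path, "rb") as wav_file:
            return (
                wav_file.getframerate() == EXPECTED_RATE
                and wav_file.getnchannels() == EXPECTED_CHANNELS
            )
    except (wave.Error, EOFError):
        # The file exists but is not a PCM WAV that the wave module can parse.
        return False
```

A missing file still raises `FileNotFoundError`, so path mistakes are not silently reported as a format mismatch.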

How can I call this model via the gigarouter API?

Use the OpenAI-compatible endpoint with your API key; refer to gigarouter documentation for exact details.

not yet live

We're benchmarking and onboarding Parakeet CTC 0.6B as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.

related speech-to-text models

speaker-diarization-3.1

wav2vec2-large-xlsr-53-japanese

6.1M dl/mo

wav2vec2-large-xlsr-53-polish

4.7M dl/mo

wav2vec2-large-xlsr-53-dutch

4.1M dl/mo