Parakeet CTC 1.1B

nvidia/parakeet-ctc-1.1b

published Dec 2023 · updated Sep 2025

Parakeet CTC 1.1B is an automatic speech recognition model that transcribes English speech into lower-case text using a FastConformer-CTC architecture with 1.1 billion parameters.

status

coming soon

downloads / mo

781.7K

license

cc-by-4.0

specs

Task	Automatic Speech Recognition (ASR)
Architecture	FastConformer-CTC
Parameters	1.1B
License	CC-BY-4.0

about this model

Parakeet CTC 1.1B is an automatic speech recognition (ASR) model that transcribes English speech into lower-case text. It is an XXL version of the FastConformer CTC architecture with approximately 1.1 billion parameters, jointly developed by NVIDIA NeMo and Suno.ai.

The model is built on the Fast Conformer architecture, which is 2.8x faster than the original Conformer and scales to billion-parameter models. It is trained with CTC loss and uses a SentencePiece Unigram tokenizer with a vocabulary size of 1024. Long-form speech of up to 11 hours can be transcribed by switching, after training, to limited context attention with a global token. The Fast Conformer paper was accepted at ASRU 2023.
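Because the model is trained with CTC and the benchmark numbers on this card use greedy decoding, it may help to see what greedy CTC decoding does with the per-frame outputs. The sketch below is a generic illustration, not NeMo's implementation: it takes the most likely token per frame, merges consecutive repeats, and then drops the blank symbol. The token ids are hypothetical, and the blank id of 0 is only for the toy example (in the real model the blank is typically an extra output class beyond the 1024 SentencePiece tokens, and its index depends on the toolkit).

```python
from itertools import groupby


def argmax_path(log_probs: list[list[float]]) -> list[int]:
    """Pick the highest-scoring token id for each frame (greedy, no language model)."""
    return [max(range(len(row)), key=row.__getitem__) for row in log_probs]


def ctc_greedy_decode(frame_ids: list[int], blank_id: int) -> list[int]:
    """Collapse a per-frame token path into output token ids.

    Step 1: merge runs of identical consecutive ids.
    Step 2: remove the blank id, which CTC uses to separate repeated tokens.
    """
    collapsed = [token for token, _ in groupby(frame_ids)]
    return [token for token in collapsed if token != blank_id]


if __name__ == "__main__":
    # Hypothetical frame-level path; 0 stands in for the blank.
    path = [0, 5, 5, 0, 0, 7, 7, 7, 0, 7, 3, 3, 0]
    print(ctc_greedy_decode(path, blank_id=0))  # the blank between the 7s keeps both
```

Note that the two 7s survive because a blank separates them; without that blank they would be merged into one token. The resulting ids would then be detokenized by the SentencePiece model into lower-case text.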

Training Data

The model was trained on 64,000 hours of English speech, comprising 40,000 hours of private data and 24,000 hours from public datasets including Librispeech, Fisher Corpus, Switchboard-1, WSJ, VCTK, VoxPopuli, Europarl-ASR, Multilingual Librispeech (MLS EN), Mozilla Common Voice (v7.0), and People's Speech.

Performance

Word error rate (WER, %; lower is better) with greedy decoding (no external language model) on standard benchmarks:

Benchmark	WER (%)
AMI	15.62
Earnings-22	13.69
GigaSpeech	10.27
LibriSpeech test-clean	1.83
SPGISpeech	3.54
TED-LIUM v3	4.20
VoxPopuli	3.54
Common Voice	6.53

Additional benchmark results are available on the Hugging Face Open ASR Leaderboard.
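For readers reproducing numbers like those in the table, WER is the word-level edit distance (substitutions + deletions + insertions) divided by the number of words in the reference. The sketch below is a plain dynamic-programming version; it only lowercases and splits on whitespace, whereas official benchmark scoring usually applies further text normalization (punctuation, numbers, spelling variants) that is not reproduced here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    # The model emits lower-case text, so compare case-insensitively.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the first i-1 reference words
    # and the first j hypothesis words (one DP row at a time).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref_word
                curr[j - 1] + 1,     # insertion of hyp_word
                prev[j - 1] + cost,  # substitution, or match when cost == 0
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    ref = "the cat sat on the mat"
    print(f"{100 * word_error_rate(ref, 'the cat sat on mat'):.2f}%")  # one deletion out of six words
```

Multiplying by 100 gives the percentage form used in the table above.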

Key Capabilities

·Accepts 16 kHz mono-channel audio input
·Supports transcription of long-form audio up to 11 hours
·Fast Conformer architecture is 2.8x faster than the original Conformer
·Licensed under CC-BY-4.0
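To see why long-form transcription needs limited context attention, it helps to count encoder frames. Assuming Fast Conformer's 8x subsampling of 10 ms feature frames (80 ms per encoder frame, a detail of the architecture not stated on this card), 11 hours is 495,000 frames, so full self-attention would need roughly 2.45e11 query/key pairs per head. The sketch below builds a boolean mask for the general idea named above: each frame attends only to a local window of neighbors, while one global token attends to, and is attended by, every position. The window sizes and the placement of the global token at index 0 are hypothetical choices for illustration, not the model's actual configuration.

```python
def encoder_frames(duration_s: float, frame_ms: float = 80.0) -> int:
    """Number of encoder frames, assuming one frame per `frame_ms` of audio."""
    return round(duration_s * 1000 / frame_ms)


def limited_context_mask(num_frames: int, left: int, right: int) -> list[list[bool]]:
    """Attention mask for one global token (index 0) followed by `num_frames` frames.

    mask[q][k] is True when query position q may attend to key position k.
    Frame t (stored at index t + 1) sees frames t - left .. t + right plus the
    global token; the global token sees every position.
    """
    size = num_frames + 1  # +1 for the global token
    mask = [[False] * size for _ in range(size)]
    for q in range(size):
        mask[q][0] = True  # every position can read the global token
        mask[0][q] = True  # the global token can read every position
    for t in range(num_frames):
        lo = max(0, t - left)
        hi = min(num_frames - 1, t + right)
        for s in range(lo, hi + 1):
            mask[t + 1][s + 1] = True
    return mask


if __name__ == "__main__":
    frames = encoder_frames(11 * 3600)
    print(frames, "frames;", f"{frames ** 2:.3e}", "pairs with full attention")
    # Tiny example: 6 frames, window of 1 frame on each side.
    for row in limited_context_mask(6, left=1, right=1):
        print("".join("x" if allowed else "." for allowed in row))
```

With a fixed window, the number of allowed pairs grows linearly with the number of frames instead of quadratically, while the global token still gives every frame an indirect path to the whole recording.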

best for

·Transcribing English speech from audio files or streams
·Processing long-form audio up to 11 hours
·High-accuracy transcription in production ASR pipelines

FAQ

What input format does the model require?

It accepts 16 kHz mono-channel WAV audio as input.
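A quick way to catch format problems before sending audio to the model is to inspect the WAV header. The sketch below uses only Python's standard `wave` module and assumes uncompressed PCM WAV input; it checks the sample rate and channel count but does not resample or downmix (that step would need an audio library and is out of scope here). The helper `make_silent_wav` is hypothetical and exists only to produce test input in memory.

```python
import io
import wave

EXPECTED_RATE_HZ = 16_000
EXPECTED_CHANNELS = 1


def check_wav_format(source) -> tuple[int, int]:
    """Return (sample_rate, channels), raising ValueError unless the WAV is 16 kHz mono.

    `source` may be a file path or a binary file-like object, as accepted by wave.open.
    """
    with wave.open(source, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
    if rate != EXPECTED_RATE_HZ or channels != EXPECTED_CHANNELS:
        raise ValueError(
            f"expected {EXPECTED_RATE_HZ} Hz mono, got {rate} Hz with {channels} "
            "channel(s); resample and downmix before transcription"
        )
    return rate, channels


def make_silent_wav(rate: int, channels: int, seconds: float = 0.1) -> io.BytesIO:
    """Hypothetical helper: write a short silent 16-bit PCM WAV into memory."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 16-bit samples
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * channels * int(rate * seconds))
    buf.seek(0)  # rewind so the buffer can be read back
    return buf


if __name__ == "__main__":
    print(check_wav_format(make_silent_wav(16_000, 1)))
```

Failing early on a 44.1 kHz stereo file is cheaper than discovering degraded transcripts later.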

What output does the model produce?

It outputs transcribed speech as a lowercase English string.

How does this model compare in speed to the original Conformer?

The FastConformer architecture is 2.8x faster than the original Conformer.

What is the license for this model?

It is licensed under CC-BY-4.0.

How can I call this model via the gigarouter API?

The model is not yet live on gigarouter. Once onboarded, it will be served through the gigarouter OpenAI-compatible endpoint: authenticate with your API key, send audio, and receive the transcription.

not yet live

We're benchmarking and onboarding Parakeet CTC 1.1B as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.

related speech-to-text models


speaker-diarization-3.1

wav2vec2-large-xlsr-53-japanese

6.1M dl/mo

wav2vec2-large-xlsr-53-polish

4.7M dl/mo

wav2vec2-large-xlsr-53-dutch

4.1M dl/mo