
Hosted speech-to-text models

30 models · 0 live as APIs · benchmarked & compared

Speech-to-text models convert spoken audio into written text, enabling applications such as real-time captioning, meeting transcription, voice-controlled interfaces, and automated subtitling. Speaker diarization models, such as pyannote/speaker-diarization-3.1, extend this by identifying who spoke when, which is critical for multi-speaker recordings like conference calls or interviews.
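To make "who spoke when" concrete, here is a minimal sketch of one common way to combine diarization output with a transcript: label each transcript segment with the speaker whose turn overlaps it the most. The `Turn` and `Segment` types, the `assign_speakers` helper, and the sample timestamps are illustrative assumptions, not the output format of any model listed on this page.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    """A diarization turn: one speaker talking from start to end (seconds)."""
    speaker: str
    start: float
    end: float


@dataclass
class Segment:
    """A transcript segment with timestamps (seconds)."""
    text: str
    start: float
    end: float


def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(segments, turns, unknown="UNKNOWN"):
    """Label each segment with the speaker whose turn overlaps it the most."""
    labelled = []
    for seg in segments:
        best_speaker, best_overlap = unknown, 0.0
        for turn in turns:
            shared = overlap(seg.start, seg.end, turn.start, turn.end)
            if shared > best_overlap:
                best_speaker, best_overlap = turn.speaker, shared
        labelled.append((best_speaker, seg.text))
    return labelled


# Hypothetical sample data: two speaker turns and three transcript segments.
turns = [Turn("SPEAKER_00", 0.0, 4.0), Turn("SPEAKER_01", 4.0, 9.0)]
segments = [
    Segment("Shall we start?", 0.5, 2.0),
    Segment("Yes, go ahead.", 3.8, 5.5),
    Segment("(background noise)", 12.0, 13.0),
]
result = assign_speakers(segments, turns)
```

The second segment straddles both turns and goes to the speaker with the longer overlap; the third overlaps no turn and keeps the `UNKNOWN` label. Real pipelines often refine this at word level, but the interval-overlap idea is the same.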

In production, these models are typically deployed in pipelines that include voice activity detection, language identification, and post-processing for punctuation and formatting. The choice among models involves a trade-off between transcription accuracy, latency, and computational cost. For example, openai/whisper-base offers a fast, compact option, while larger variants or specialized models like jonatasgrosman/wav2vec2-large-xlsr-53-japanese are tuned for specific languages or higher accuracy at the expense of speed and memory.
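The pipeline stages described above can be sketched as plain functions chained together. This is a toy illustration under stated assumptions: the voice activity detector is a simple energy threshold (not pyannote/voice-activity-detection), and `identify_language` and `transcribe` are hypothetical callables standing in for whichever models are deployed.

```python
import math


def frame_energies(samples, frame_size):
    """Root-mean-square energy of each non-overlapping frame."""
    energies = []
    for i in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[i:i + frame_size]
        energies.append(math.sqrt(sum(x * x for x in frame) / frame_size))
    return energies


def detect_speech(samples, frame_size=4, threshold=0.1):
    """Energy-threshold VAD: return (start, end) sample ranges judged as speech."""
    regions, start = [], None
    for idx, energy in enumerate(frame_energies(samples, frame_size)):
        if energy >= threshold and start is None:
            start = idx * frame_size          # speech begins at this frame
        elif energy < threshold and start is not None:
            regions.append((start, idx * frame_size))  # speech ended
            start = None
    if start is not None:                     # audio ended during speech
        regions.append((start, (len(samples) // frame_size) * frame_size))
    return regions


def tidy(text):
    """Post-processing: collapse whitespace, capitalise, add final punctuation."""
    text = " ".join(text.split())
    if not text:
        return text
    text = text[0].upper() + text[1:]
    if text[-1] not in ".?!":
        text += "."
    return text


def run_pipeline(samples, identify_language, transcribe,
                 frame_size=4, threshold=0.1):
    """VAD, then language ID and transcription per region, then tidy-up."""
    outputs = []
    for start, end in detect_speech(samples, frame_size, threshold):
        chunk = samples[start:end]
        language = identify_language(chunk)
        outputs.append(tidy(transcribe(chunk, language)))
    return outputs


# Stand-in models (assumptions for illustration only).
def fake_language_id(chunk):
    return "en"


def fake_transcribe(chunk, language):
    return "hello   world"


# Silence, a burst of speech, silence, another burst.
samples = [0.0] * 4 + [0.5, -0.5, 0.4, -0.4] + [0.0] * 4 + [0.3] * 4
regions = detect_speech(samples)
transcripts = run_pipeline(samples, fake_language_id, fake_transcribe)
```

Keeping each stage behind a plain function boundary makes it straightforward to swap a small, fast model for a larger or language-specific one without touching the rest of the pipeline.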

This page lists 30 speech-to-text models, none of which is live yet (all are being onboarded), including pyannote/speaker-diarization-3.1, argmaxinc/whisperkit-coreml, openai/whisper-base, and several wav2vec2 variants. Calling a


| model | params | downloads/mo | price | status |
|---|---|---|---|---|
| pyannote/speaker-diarization-3.1 | - | 8.2M | at launch | coming soon |
| argmaxinc/whisperkit-coreml | - | 8M | at launch | coming soon |
| openai/whisper-base | 72.6M | 6.4M | at launch | coming soon |
| jonatasgrosman/wav2vec2-large-xlsr-53-japanese | - | 6.1M | at launch | coming soon |
| jonatasgrosman/wav2vec2-large-xlsr-53-polish | - | 4.7M | at launch | coming soon |
| jonatasgrosman/wav2vec2-large-xlsr-53-dutch | - | 4.1M | at launch | coming soon |
| indonesian-nlp/wav2vec2-indonesian-javanese-sundanese | - | 4.1M | at launch | coming soon |
| pyannote/speaker-diarization-community-1 | - | 4M | at launch | coming soon |
| jonatasgrosman/wav2vec2-large-xlsr-53-arabic | - | 3.5M | at launch | coming soon |
| jonatasgrosman/wav2vec2-large-xlsr-53-hungarian | - | 3.4M | at launch | coming soon |
| openai/whisper-small | 241.7M | 3.3M | at launch | coming soon |
| MahmoudAshraf/mms-300m-1130-forced-aligner | 315.5M | 3.2M | at launch | coming soon |
| jonatasgrosman/wav2vec2-large-xlsr-53-portuguese | - | 3.2M | at launch | coming soon |
| jonatasgrosman/wav2vec2-large-xlsr-53-russian | - | 2.9M | at launch | coming soon |
| gigant/romanian-wav2vec2 | 315.5M | 2.8M | at launch | coming soon |
| anuragshas/wav2vec2-large-xlsr-53-telugu | - | 2.8M | at launch | coming soon |
| jonatasgrosman/wav2vec2-large-xlsr-53-persian | - | 2.5M | at launch | coming soon |
| KBLab/wav2vec2-large-voxrex-swedish | 315.5M | 2.5M | at launch | coming soon |
| kingabzpro/wav2vec2-large-xls-r-300m-Urdu | 315.5M | 2.3M | at launch | coming soon |
| theainerd/Wav2Vec2-large-xlsr-hindi | 315.5M | 2.1M | at launch | coming soon |
| pyannote/voice-activity-detection | - | 2M | at launch | coming soon |
| mistralai/Voxtral-Mini-4B-Realtime-2602 | 4429.7M | 2M | at launch | coming soon |
| imvladikon/wav2vec2-xls-r-300m-hebrew | 315.5M | 1.8M | at launch | coming soon |
| mesolitica/wav2vec2-xls-r-300m-mixed | - | 1.8M | at launch | coming soon |
| airesearch/wav2vec2-large-xlsr-53-th | - | 1.7M | at launch | coming soon |
| openai/whisper-tiny | 37.8M | 1.6M | at launch | coming soon |
| jonatasgrosman/wav2vec2-large-xlsr-53-chinese-zh-cn | - | 1.5M | at launch | coming soon |
| mlx-community/parakeet-tdt-0.6b-v2 | - | 1.5M | at launch | coming soon |
| arijitx/wav2vec2-xls-r-300m-bengali | - | 1.4M | at launch | coming soon |
| Systran/faster-whisper-base | - | 1.4M | at launch | coming soon |