wav2vec2-large-xlsr-53-telugu
anuragshas/wav2vec2-large-xlsr-53-telugu
A popular open speech-to-text model, with 2.8M downloads a month. gigarouter is benchmarking it and onboarding it as a hosted, OpenAI-compatible API.
About this model
anuragshas/wav2vec2-large-xlsr-53-telugu is an automatic speech recognition (ASR) model fine-tuned for Telugu. It builds on Facebook’s XLSR-53, a wav2vec 2.0 model pretrained on speech from 53 languages, and was fine-tuned on 70% of the OpenSLR SLR66 Telugu dataset. The model takes 16 kHz mono audio as input and outputs the transcribed text.
Key strengths
- Designed specifically for Telugu ASR, leveraging cross-lingual pretraining from 53 languages.
- Trained and evaluated on a standardized open dataset (OpenSLR SLR66), enabling reproducible comparison.
- Usable without a separate language model: text is produced by greedy (argmax) decoding of the CTC head's output.
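To make the greedy CTC decoding mentioned above concrete, here is a minimal, dependency-free sketch. It is not the model's own code: the toy vocabulary, the blank index of 0, and the function name `ctc_greedy_decode` are assumptions chosen for illustration. In practice the model's processor performs this step on the real per-frame logits.

```python
from itertools import groupby


def ctc_greedy_decode(frame_scores, vocab, blank_id=0):
    """Greedy CTC decoding over a list of per-frame score vectors.

    frame_scores: list of lists, one score per vocabulary entry per frame.
    vocab: list mapping token id -> character (hypothetical toy vocabulary).
    blank_id: id of the CTC blank token (assumed to be 0 in this sketch).
    """
    # Step 1: pick the highest-scoring token id in every frame.
    best_ids = [max(range(len(frame)), key=frame.__getitem__) for frame in frame_scores]
    # Step 2: collapse runs of the same id into a single id.
    collapsed = [token_id for token_id, _ in groupby(best_ids)]
    # Step 3: drop blanks and map the remaining ids to characters.
    return "".join(vocab[i] for i in collapsed if i != blank_id)


# Toy example: index 0 is the blank, followed by two characters.
toy_vocab = ["_", "a", "b"]
toy_scores = [
    [0.1, 0.8, 0.1],    # a
    [0.1, 0.7, 0.2],    # a (repeat, collapsed)
    [0.9, 0.05, 0.05],  # blank (separates the two "a" tokens)
    [0.2, 0.7, 0.1],    # a
    [0.1, 0.2, 0.7],    # b
    [0.1, 0.1, 0.8],    # b (repeat, collapsed)
]
decoded = ctc_greedy_decode(toy_scores, toy_vocab)
```

The blank frame between the two "a" frames is what lets CTC emit a doubled character; without it the repeats would collapse into one.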
Best for
- Transcribing Telugu speech in applications such as media captioning, voice commands, and conversational analytics.
- Scenarios where a dedicated Telugu model is preferred over general multilingual alternatives.
Benchmark result
The model achieves a Word Error Rate (WER) of 44.98% on the test split of OpenSLR SLR66 Telugu.
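For readers unfamiliar with the metric, WER is the word-level edit distance (substitutions, deletions, insertions) between the reference and the hypothesis, divided by the number of reference words. The sketch below computes it for a single utterance pair. The function name `word_error_rate` is illustrative, and this is not necessarily the exact tool used to produce the figure above. Corpus-level scores are usually computed by summing errors and reference words over all utterances.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref_words)


# One substitution (b -> x) and one deletion (d) over four reference words.
example_wer = word_error_rate("a b c d", "a x c")
```

A WER of 44.98% therefore means that, on average, roughly 45 word-level edits are needed per 100 reference words.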
Additional details
- Fine-tuned on Telugu only; not intended for other languages.
- Input audio should be 16 kHz mono; resample audio recorded at any other rate before inference, since the model was trained on 16 kHz input.
- Text normalization (removal of punctuation, English characters, etc.) was applied during evaluation; similar preprocessing is recommended for production use.
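The text normalization described above could look roughly like the sketch below. The exact character set removed during the original evaluation is not given here, so the rules in this example (strip Latin letters, strip anything Unicode classifies as punctuation, collapse whitespace) are an assumption, and `normalize_transcript` is a hypothetical helper name.

```python
import re
import unicodedata

_LATIN_LETTERS = re.compile(r"[A-Za-z]+")
_WHITESPACE = re.compile(r"\s+")


def normalize_transcript(text):
    """Remove English letters and punctuation, then tidy whitespace."""
    # Replace runs of ASCII Latin letters with a space.
    text = _LATIN_LETTERS.sub(" ", text)
    # Replace characters in any Unicode punctuation category (P*) with a space.
    # Telugu letters and vowel signs are in L*/M* categories and are kept;
    # digits and symbols such as "$" are also kept by this sketch.
    text = "".join(
        " " if unicodedata.category(ch).startswith("P") else ch
        for ch in text
    )
    # Collapse repeated whitespace and trim the ends.
    return _WHITESPACE.sub(" ", text).strip()


cleaned = normalize_transcript("నమస్కారం, hello world!")
```

Applying the same function to both reference and hypothesis text before scoring keeps the comparison consistent.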
We're benchmarking and onboarding wav2vec2-large-xlsr-53-telugu as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.