
wav2vec2-large-xlsr-53-telugu

anuragshas/wav2vec2-large-xlsr-53-telugu

A popular open speech-to-text model, with 2.8M downloads a month. gigarouter is benchmarking it and will host it as an OpenAI-compatible API.

status: coming soon
API providers: none yet
downloads / mo: 2.8M
license: apache-2.0

about this model

anuragshas/wav2vec2-large-xlsr-53-telugu is an automatic speech recognition (ASR) model fine-tuned for Telugu. It is based on Facebook’s XLSR-53 multilingual speech representation model and further trained on 70% of the OpenSLR SLR66 Telugu dataset. The model accepts 16 kHz mono audio input and produces transcribed text.
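A quick way to catch format mismatches before transcription is to inspect the WAV header. The sketch below uses only Python's standard `wave` module. The function name `check_wav_format` is hypothetical. It reads the header of PCM WAV files and does not resample anything.

```python
import wave

# Expected input format stated on this page: 16 kHz, mono.
EXPECTED_RATE = 16_000
EXPECTED_CHANNELS = 1


def check_wav_format(source) -> list[str]:
    """Return a list of format problems for a PCM WAV file (empty list if OK).

    `source` may be a file path or a binary file-like object, as accepted by wave.open.
    """
    problems = []
    with wave.open(source, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
    if rate != EXPECTED_RATE:
        problems.append(f"sample rate is {rate} Hz, expected {EXPECTED_RATE} Hz")
    if channels != EXPECTED_CHANNELS:
        problems.append(f"{channels} channels, expected mono")
    return problems
```

An empty result means the file already matches what the model expects. Anything else should be converted (resampled to 16 kHz, downmixed to mono) before inference.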

Key strengths

Designed specifically for Telugu ASR, leveraging cross-lingual pretraining from 53 languages.
Trained and evaluated on a standardized open dataset (OpenSLR SLR66), enabling reproducible comparison.
Usable directly, without a separate language model: transcripts come from greedy decoding of the CTC output head.
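Greedy CTC decoding can be sketched in a few lines. Take the top-scoring token for each audio frame, merge consecutive repeats, then drop the blank token. This is a minimal illustration and not the model's own tokenizer code. The toy vocabulary, the blank index 0 and the `|` word delimiter are assumptions for the example (Hugging Face wav2vec2 CTC tokenizers use `|` as the default word delimiter).

```python
from typing import Sequence


def ctc_greedy_decode(
    frame_scores: Sequence[Sequence[float]],
    vocab: Sequence[str],
    blank_id: int = 0,          # assumed index of the CTC blank/pad token
    word_delimiter: str = "|",  # assumed word-boundary token
) -> str:
    """Best-path (greedy) CTC decoding over per-frame token scores."""
    # 1. Argmax: pick the highest-scoring token id for every frame.
    best_ids = [max(range(len(scores)), key=scores.__getitem__) for scores in frame_scores]

    tokens = []
    previous = None
    for token_id in best_ids:
        # 2. Merge repeats: emit only when the id differs from the previous frame.
        # 3. Drop blanks: a blank separates genuine repeats but is never emitted.
        if token_id != previous and token_id != blank_id:
            tokens.append(vocab[token_id])
        previous = token_id

    # Turn word-delimiter tokens into spaces.
    return "".join(tokens).replace(word_delimiter, " ").strip()
```

With the toy vocabulary `["<pad>", "a", "b", "|"]`, frames whose best ids are `1, 1, 0, 1, 3, 2, 2` decode to `"aa b"`. The blank between the two runs of `a` is what keeps the genuine double letter.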

Best for

Transcribing Telugu speech in applications such as media captioning, voice commands, and conversational analytics.
Scenarios where a dedicated Telugu model is preferred over general multilingual alternatives.

Benchmark result

The model achieves a Word Error Rate (WER) of 44.98% on the test split of OpenSLR SLR66 Telugu.
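WER counts the word substitutions (S), deletions (D) and insertions (I) needed to turn the model's transcript into the reference, divided by the number of words in the reference (N): WER = (S + D + I) / N. A WER of 44.98% therefore corresponds to roughly 45 word-level errors per 100 reference words. The sketch below computes it with a word-level edit distance. It is a standard formulation, not the exact script behind the reported number.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the first i-1 reference words and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            deletion = prev[j] + 1          # reference word missing from hypothesis
            insertion = curr[j - 1] + 1     # extra word in hypothesis
            substitution = prev[j - 1] + (ref_word != hyp_word)
            curr[j] = min(deletion, insertion, substitution)
        prev = curr
    return prev[-1] / len(ref)
```

For example, `word_error_rate("a b c d", "a x c")` is 0.5 (one substitution plus one deletion over four reference words). Because insertions count too, WER can exceed 100%.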

Additional details

Fine-tuned on Telugu only; not intended for other languages.
Input audio must be sampled at 16 kHz; resample recordings at other rates before inference.
Text normalization (removal of punctuation, English characters, etc.) was applied during evaluation; similar preprocessing is recommended for production use.
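A minimal normalization pass along these lines might look like the following. The exact character set removed during the original evaluation is not listed here, so the punctuation class below is an assumption. The function name `normalize_transcript` is hypothetical.

```python
import re

# Assumed character classes; the set used in the original evaluation may differ.
PUNCTUATION = re.compile(r"[,?.!;:%_/'\"“”‘’\-]")
LATIN_LETTERS = re.compile(r"[A-Za-z]")  # "English characters"
WHITESPACE = re.compile(r"\s+")


def normalize_transcript(text: str) -> str:
    """Strip punctuation and Latin letters, then collapse runs of whitespace."""
    text = PUNCTUATION.sub("", text)
    text = LATIN_LETTERS.sub("", text)
    return WHITESPACE.sub(" ", text).strip()
```

Applying the same function to both the reference and the model output before scoring keeps WER figures comparable with evaluation-style numbers.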

not yet live

We're benchmarking and onboarding wav2vec2-large-xlsr-53-telugu as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.