wav2vec2-large-xls-r-300m-Urdu
kingabzpro/wav2vec2-large-xls-r-300m-Urdu
A popular open speech-to-text model, with 2.3M downloads a month. gigarouter benchmarks and hosts it as an OpenAI-compatible API.
About this model
kingabzpro/wav2vec2-large-xls-r-300m-Urdu is an automatic speech recognition (ASR) model fine-tuned from Facebook’s XLS-R 300M for Urdu. It transcribes 16 kHz mono audio and includes an optional 5-gram KenLM decoder for improved accuracy.
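Since the model expects 16 kHz mono input, it can help to check a recording's format before transcription. The following is a minimal sketch using only Python's standard `wave` module. The helper names `describe_wav` and `is_model_ready` are illustrative and not part of the model or any library, and the check assumes uncompressed PCM WAV files.

```python
import wave

TARGET_RATE_HZ = 16_000  # sample rate stated in the model description
TARGET_CHANNELS = 1      # mono

def describe_wav(path):
    """Return (sample_rate_hz, channels) read from a PCM WAV header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate(), wav_file.getnchannels()

def is_model_ready(path):
    """True if the file is already 16 kHz mono (hypothetical helper).

    A False result means the audio would need resampling and/or
    downmixing before it matches the format described above.
    """
    rate, channels = describe_wav(path)
    return rate == TARGET_RATE_HZ and channels == TARGET_CHANNELS
```

Files that fail the check are not unusable. They would need to be converted to 16 kHz mono with an audio tool of your choice first.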
Key strengths
- Best reported result on the Urdu Common Voice 8.0 test set: 39.89% WER / 16.70% CER with KenLM decoding (reproducible Kaggle notebook).
- The KenLM decoder reduces WER from 56.07% (greedy CTC) to 39.89% on the full test set, a drop of 16.18 percentage points (about 29% relative), and CER from 23.70% to 16.70%.
Benchmark results (Common Voice 8.0, Urdu test split)
| Decoder | Test WER | Test CER |
|---|---|---|
| Greedy CTC | 56.07% | 23.70% |
| 5-gram KenLM language model | 39.89% | 16.70% |
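For readers unfamiliar with the metrics in the table, the sketch below shows how word error rate (WER) and character error rate (CER) are commonly defined: the edit distance between reference and hypothesis, divided by the reference length. It is an illustration, not the evaluation script behind the reported numbers. It assumes whitespace tokenization for words and counts spaces as characters for CER, and it scores a single utterance, whereas corpus-level figures are usually aggregated over all utterances. Text normalization choices can also shift the result.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences."""
    previous = list(range(len(hyp) + 1))
    for i, ref_item in enumerate(ref, start=1):
        current = [i]
        for j, hyp_item in enumerate(hyp, start=1):
            cost = 0 if ref_item == hyp_item else 1
            current.append(min(
                previous[j] + 1,         # deletion (reference item missing)
                current[j - 1] + 1,      # insertion (extra hypothesis item)
                previous[j - 1] + cost,  # match or substitution
            ))
        previous = current
    return previous[-1]

def wer(reference, hypothesis):
    """Word error rate for one utterance (whitespace tokenization assumed)."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)

def cer(reference, hypothesis):
    """Character error rate for one utterance (spaces counted as characters)."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)
```

For example, `wer("a b c d", "a x c")` gives 0.5: one substitution and one deletion against four reference words.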
Best for
Urdu speech transcription and prototyping. Accuracy varies with recording quality, accent, background noise, and domain-specific vocabulary. Review transcripts before use in production or user-facing workflows.
Hosted API
gigarouter is benchmarking and onboarding wav2vec2-large-xls-r-300m-Urdu as a hosted, OpenAI-compatible API, so no local setup or dependency management will be required once it is available.
Sign in for free credit to be ready when it lands, or tell us you want it and we'll prioritize it.
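Because the API is described as OpenAI-compatible, a request would presumably follow the shape of OpenAI's audio transcription endpoint: a multipart POST to `/audio/transcriptions` with `model` and `file` fields. The sketch below only builds such a request with Python's standard library and does not send it. The base URL, API key, and model identifier are placeholders. The real values for gigarouter are not given on this page, and `build_transcription_request` is an illustrative helper, not part of any SDK.

```python
import urllib.request
import uuid

def build_transcription_request(base_url, api_key, model, wav_bytes,
                                filename="clip.wav"):
    """Build (but do not send) an OpenAI-style transcription request.

    base_url, api_key and model are assumptions supplied by the caller.
    """
    boundary = uuid.uuid4().hex
    model_part = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="model"\r\n\r\n'
        f"{model}\r\n"
    ).encode("utf-8")
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: audio/wav\r\n\r\n"
    ).encode("utf-8")
    closing = f"\r\n--{boundary}--\r\n".encode("utf-8")
    body = model_part + file_header + wav_bytes + closing
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/audio/transcriptions",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

Passing the returned object to `urllib.request.urlopen` would send it. By OpenAI convention the JSON response carries the transcript in a `text` field, though whether this service mirrors that exactly is an assumption until its own documentation confirms it.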