wav2vec2-large-xls-r-300m-Urdu

kingabzpro/wav2vec2-large-xls-r-300m-Urdu

A popular open speech-to-text model with 2.3M downloads a month. gigarouter is benchmarking it ahead of hosting it as an OpenAI-compatible API.

Status: coming soon
API providers: 0
Downloads / month: 2.3M
License: apache-2.0

about this model

kingabzpro/wav2vec2-large-xls-r-300m-Urdu is an automatic speech recognition (ASR) model fine-tuned from Facebook’s XLS-R 300M for Urdu. It transcribes 16 kHz mono audio and includes an optional 5-gram KenLM decoder for improved accuracy.
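Without the KenLM decoder, a CTC model like this one is decoded greedily. The model outputs one score vector per audio frame. Greedy decoding takes the best token in each frame, merges consecutive repeats, and drops the blank token. The sketch below shows that idea. It assumes a wav2vec2-style vocabulary in which index 0 is the blank/pad token and "|" marks word boundaries. The function name and toy vocabulary are hypothetical and are not the model's actual processor API. KenLM decoding instead typically runs a beam search scored by the n-gram language model; that part is not shown.

```python
from typing import List, Sequence


def greedy_ctc_decode(
    frame_scores: Sequence[Sequence[float]],
    vocab: Sequence[str],
    blank_id: int = 0,          # assumption: blank/pad token at index 0
    word_delimiter: str = "|",  # assumption: wav2vec2-style word boundary token
) -> str:
    """Greedy (best-path) CTC decoding of per-frame token scores."""
    # 1. Pick the highest-scoring token index for every audio frame.
    best_ids = [max(range(len(scores)), key=scores.__getitem__) for scores in frame_scores]

    # 2. Collapse consecutive repeats, then drop the blank token.
    tokens: List[str] = []
    previous = None
    for idx in best_ids:
        if idx != previous and idx != blank_id:
            tokens.append(vocab[idx])
        previous = idx

    # 3. Turn word delimiters into spaces and normalize whitespace.
    return " ".join("".join(tokens).replace(word_delimiter, " ").split())


if __name__ == "__main__":
    # Hypothetical toy vocabulary and one-hot frame scores, for illustration only.
    vocab = ["<pad>", "|", "a", "b", "c"]
    ids = [2, 2, 0, 2, 3, 1, 4, 4]
    frames = [[1.0 if j == i else 0.0 for j in range(len(vocab))] for i in ids]
    print(greedy_ctc_decode(frames, vocab))  # aab c
```

The blank between the two "a" frames is what lets the decoder emit a genuine double letter. Without a blank between them, repeated frames collapse into a single token.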

Key strengths

  • Reported result on the Urdu Common Voice 8.0 test set: 39.89% WER and 16.70% CER with KenLM decoding, reproducible from a Kaggle notebook.
  • On the full test set, KenLM decoding lowers WER from 56.07% (greedy CTC) to 39.89% and CER from 23.70% to 16.70%.
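WER and CER are edit-distance ratios, (S + D + I) / N. Here S, D and I count substitutions, deletions and insertions, and N is the number of reference words (or characters, for CER). The sketch below is minimal and has a few assumptions. It aggregates at corpus level, so total edits are divided by total reference units. It splits words on whitespace and applies no text normalization. The benchmark's exact normalization (for example, punctuation removal) is not stated here, so this sketch may not reproduce the reported numbers exactly. The function names are hypothetical.

```python
from typing import Sequence, Tuple, TypeVar

T = TypeVar("T")


def edit_distance(ref: Sequence[T], hyp: Sequence[T]) -> int:
    """Levenshtein distance: minimum substitutions + deletions + insertions."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete r from the reference
                curr[j - 1] + 1,         # insert h from the hypothesis
                prev[j - 1] + (r != h),  # substitute (cost 1) or match (cost 0)
            )
        prev = curr
    return prev[-1]


def corpus_wer_cer(references: Sequence[str], hypotheses: Sequence[str]) -> Tuple[float, float]:
    """Corpus-level WER and CER: total edits divided by total reference units."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    word_errors = word_total = char_errors = char_total = 0
    for ref, hyp in zip(references, hypotheses):
        ref_words = ref.split()
        word_errors += edit_distance(ref_words, hyp.split())
        word_total += len(ref_words)
        char_errors += edit_distance(ref, hyp)  # assumption: spaces count as characters
        char_total += len(ref)
    if word_total == 0 or char_total == 0:
        raise ValueError("references must not be empty")
    return word_errors / word_total, char_errors / char_total
```

For example, take the references "the cat sat" and "a b c" against the hypotheses "the cat sit" and "a c". That is 2 word errors over 6 reference words (WER 1/3) and 3 character errors over 16 reference characters (CER 3/16).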

Benchmark results (Common Voice 8.0, Urdu test split)

Decoder                          Test WER   Test CER
Greedy CTC                       56.07%     23.70%
5-gram KenLM language model      39.89%     16.70%
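The absolute gaps in this table are 16.18 WER points and 7.00 CER points. Relative to the greedy baseline, each is a reduction of roughly 29%, computed directly from the reported figures:

```latex
\begin{aligned}
\Delta_{\mathrm{rel}}\,\mathrm{WER} &= \frac{56.07 - 39.89}{56.07} = \frac{16.18}{56.07} \approx 0.289 \\
\Delta_{\mathrm{rel}}\,\mathrm{CER} &= \frac{23.70 - 16.70}{23.70} = \frac{7.00}{23.70} \approx 0.295
\end{aligned}
```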

Best for

Urdu speech transcription and prototyping. Accuracy varies with recording quality, accent, background noise, and domain-specific vocabulary. Review transcripts before use in production or user-facing workflows.

Hosted API

Once live, this model will be available on gigarouter as an OpenAI-compatible API, with no local setup or dependency management required.

not yet live

We're benchmarking and onboarding wav2vec2-large-xls-r-300m-Urdu as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.