wav2vec2-large-xlsr-53-arabic
jonatasgrosman/wav2vec2-large-xlsr-53-arabic
A popular open speech-to-text model with 3.5M downloads a month. Gigarouter benchmarks and hosts it as an OpenAI-compatible API.
About This Model
jonatasgrosman/wav2vec2-large-xlsr-53-arabic is an automatic speech recognition (ASR) model for Arabic that transcribes spoken audio into text. It is a fine-tuned version of Facebook’s wav2vec2-large-xlsr-53, trained on the train and validation splits of Common Voice 6.1 and the Arabic Speech Corpus. Input audio must be sampled at 16 kHz.
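Because the model expects 16 kHz input, it can help to check a file's sampling rate before sending it for transcription. The sketch below is a minimal illustration that uses only Python's standard `wave` module; the helper names (`wav_sample_rate`, `is_model_ready`, `make_silent_wav`) are hypothetical and not part of the model or any hosted API. It only inspects the WAV header and does not resample.

```python
import io
import wave

REQUIRED_RATE_HZ = 16_000  # sampling rate the model description asks for


def wav_sample_rate(wav_bytes: bytes) -> int:
    """Return the sampling rate (Hz) recorded in a WAV file's header."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        return wav.getframerate()


def is_model_ready(wav_bytes: bytes) -> bool:
    """True when the audio is already sampled at 16 kHz."""
    return wav_sample_rate(wav_bytes) == REQUIRED_RATE_HZ


def make_silent_wav(rate_hz: int, seconds: float = 0.1) -> bytes:
    """Build a short mono 16-bit silent WAV in memory (demo input only)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav.setframerate(rate_hz)
        wav.writeframes(b"\x00\x00" * int(rate_hz * seconds))
    return buffer.getvalue()


if __name__ == "__main__":
    for rate in (16_000, 44_100):
        print(rate, is_model_ready(make_silent_wav(rate)))
```

Audio at any other rate would need resampling first, for example with a library call such as `librosa.load(path, sr=16000)`.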
Key Strengths
This model achieves the lowest Word Error Rate (WER) and Character Error Rate (CER), where lower is better, of the seven publicly available Arabic ASR models compared on the Common Voice Arabic test set. The evaluation was run on 2021-05-14; results are reported below.
| Model | WER | CER |
|---|---|---|
| jonatasgrosman/wav2vec2-large-xlsr-53-arabic | 39.59% | 18.18% |
| bakrianoo/sinai-voice-ar-stt | 45.30% | 21.84% |
| othrif/wav2vec2-large-xlsr-arabic | 45.93% | 20.51% |
| kmfoda/wav2vec2-large-xlsr-arabic | 54.14% | 26.07% |
| mohammed/wav2vec2-large-xlsr-arabic | 56.11% | 26.79% |
| anas/wav2vec2-large-xlsr-arabic | 62.02% | 27.09% |
| elgeish/wav2vec2-large-xlsr-53-arabic | 100.00% | 100.56% |
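For readers unfamiliar with the two metrics, the sketch below shows how WER and CER are commonly defined: the edit distance between reference and hypothesis, divided by the reference length in words or characters. It is an illustrative per-utterance implementation, not the script that produced the table, and conventions such as text normalization and whether spaces count as characters are assumptions here. Because insertions add to the numerator but not the denominator, a rate can exceed 100%, which is how a figure such as the 100.56% CER in the last row is arithmetically possible.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))  # distance from an empty reference
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance to an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference token
                curr[j - 1] + 1,     # insertion of a hypothesis token
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edits divided by reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character edits divided by reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


if __name__ == "__main__":
    print(wer("the cat sat down", "the cat sat"))  # one deleted word of four
    print(cer("ab", "abcde"))                      # three insertions, two reference chars
```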
Best For
This model suits Arabic speech-to-text pipelines that need the best accuracy among the open models compared above. It handles Modern Standard Arabic and the dialects represented in Common Voice and the Arabic Speech Corpus. Gigarouter hosts it as a managed OpenAI-compatible API, so no manual model loading or infrastructure setup is needed.
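As a rough illustration of what calling an OpenAI-compatible transcription endpoint could look like, the sketch below builds (but does not send) a multipart request using only the standard library. The base URL `https://api.example.com/v1` and the API key are placeholders, and it is an assumption that the host exposes the OpenAI-style `/audio/transcriptions` route with `model` and `file` form fields; check the provider's documentation for the real values.

```python
import urllib.request
import uuid

MODEL_ID = "jonatasgrosman/wav2vec2-large-xlsr-53-arabic"


def build_transcription_request(base_url, api_key, model, wav_bytes,
                                filename="audio.wav"):
    """Build a POST request in the OpenAI audio-transcription style.

    base_url and api_key are placeholders supplied by the caller;
    nothing is sent over the network here.
    """
    boundary = uuid.uuid4().hex
    crlf = b"\r\n"
    delimiter = b"--" + boundary.encode("ascii")
    body = b"".join([
        # Form field naming the model to run.
        delimiter + crlf,
        b'Content-Disposition: form-data; name="model"' + crlf + crlf,
        model.encode("utf-8") + crlf,
        # Form field carrying the audio file itself.
        delimiter + crlf,
        ('Content-Disposition: form-data; name="file"; '
         f'filename="{filename}"').encode("utf-8") + crlf,
        b"Content-Type: audio/wav" + crlf + crlf,
        wav_bytes + crlf,
        # Closing delimiter ends the multipart body.
        delimiter + b"--" + crlf,
    ])
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/audio/transcriptions",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


if __name__ == "__main__":
    request = build_transcription_request(
        "https://api.example.com/v1",  # hypothetical base URL
        "YOUR_API_KEY",                # placeholder credential
        MODEL_ID,
        b"RIFF....",                   # stand-in for real 16 kHz WAV bytes
    )
    print(request.get_method(), request.full_url)
```

Sending the request would be a matter of passing it to `urllib.request.urlopen`; OpenAI-style servers typically answer with JSON containing a `text` field, though that too should be confirmed against the host's documentation.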
Citation
Jonatas Grosman. Fine-tuned XLSR-53 large model for speech recognition in Arabic. 2021. Available at https://huggingface.co/jonatasgrosman/wav2vec2-large-xlsr-53-arabic.