wav2vec2-large-xlsr-53-persian
jonatasgrosman/wav2vec2-large-xlsr-53-persian
A popular open speech-to-text model, with 2.5M downloads a month. gigarouter benchmarks and hosts it as an OpenAI-compatible API.
About this model
jonatasgrosman/wav2vec2-large-xlsr-53-persian is an automatic speech recognition (ASR) model fine-tuned from Facebook’s wav2vec2-large-xlsr-53 on Persian speech. It is intended for transcribing Persian-language audio sampled at 16 kHz, and it reaches a Word Error Rate (WER) of 30.12% and a Character Error Rate (CER) of 7.37% on the Common Voice 6.1 Persian test set.
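Because the model expects 16 kHz input, it can help to check a file's sample rate before sending it for transcription. The following is a minimal sketch using only Python's standard `wave` module. The helper name `is_16khz_wav` and the constant are illustrative and not part of the model or any library, and the check only covers uncompressed WAV files.

```python
import wave

# The sample rate the model description states for input audio.
EXPECTED_SAMPLE_RATE = 16_000  # Hz


def is_16khz_wav(path: str) -> bool:
    """Return True if the WAV file at `path` is sampled at 16 kHz."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate() == EXPECTED_SAMPLE_RATE
```

Audio recorded at another rate (44.1 kHz is common) would need to be resampled to 16 kHz first with a tool of your choice.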
Key strengths
- Fine-tuned exclusively on Persian data (train and validation splits of Common Voice 6.1).
- Requires no external language model for inference; can be used directly for transcription.
- Delivers a Character Error Rate (CER) of 7.37%, indicating strong character-level accuracy.
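Transcribing without an external language model typically means greedy CTC decoding: take the most likely token for each audio frame, merge consecutive repeats, then drop the blank token. The sketch below illustrates that idea on a toy vocabulary. The token IDs, the `|` word delimiter, and the function name `greedy_ctc_decode` are assumptions made for illustration, not the model's actual vocabulary or any library's implementation.

```python
from itertools import groupby


def greedy_ctc_decode(frame_ids, id_to_token, blank_id=0, word_delimiter="|"):
    """Collapse per-frame token IDs into text using greedy CTC rules."""
    # Step 1: merge runs of identical consecutive IDs into one.
    collapsed = [token_id for token_id, _ in groupby(frame_ids)]
    # Step 2: drop the blank token, which only separates repeated characters.
    tokens = [id_to_token[i] for i in collapsed if i != blank_id]
    # Step 3: join characters and turn word delimiters into spaces.
    return "".join(tokens).replace(word_delimiter, " ").strip()


# Toy vocabulary (hypothetical IDs): 0 is the blank, 1 is the word delimiter.
vocab = {0: "<pad>", 1: "|", 2: "س", 3: "ل", 4: "ا", 5: "م"}
frames = [2, 2, 0, 3, 4, 4, 0, 5, 1, 1]
print(greedy_ctc_decode(frames, vocab))  # prints: سلام
```

In a real pipeline the per-frame IDs would come from an argmax over the model's output logits. A language model, when used, would rescore candidate sequences instead of taking the single best token per frame.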
Benchmark results
The following table reports Word Error Rate (WER) and Character Error Rate (CER) on the Persian test set of Common Voice, evaluated on 2021-04-22.
| Model | WER | CER |
|---|---|---|
| jonatasgrosman/wav2vec2-large-xlsr-53-persian | 30.12% | 7.37% |
| m3hrdadfi/wav2vec2-large-xlsr-persian-v2 | 33.85% | 8.79% |
| m3hrdadfi/wav2vec2-large-xlsr-persian | 34.37% | 8.98% |
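For reference, WER and CER are both edit-distance ratios: the number of substitutions, deletions, and insertions needed to turn the hypothesis into the reference, divided by the reference length, counted in words for WER and in characters for CER. A minimal sketch follows. It is not the evaluation script behind the table, it assumes a non-empty reference, it counts spaces as characters for CER, and it omits text normalization, so it should not be expected to reproduce the reported numbers exactly.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists)."""
    prev = list(range(len(hyp) + 1))
    for i, ref_item in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_item in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,                             # deletion
                curr[j - 1] + 1,                         # insertion
                prev[j - 1] + (ref_item != hyp_item),    # substitution or match
            ))
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edit distance over reference word count."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: character-level edit distance over reference length."""
    return edit_distance(reference, hypothesis) / len(reference)
```

For example, `wer("the cat sat", "the cat sit")` is one wrong word out of three, about 0.33, while `cer` on the same pair is much lower because only one character differs. This is why a model can show a CER far below its WER, as in the table above.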
Best for
Persian speech-to-text applications where low character error and direct transcription (without a language model) are priorities. As a managed, OpenAI-compatible API on gigarouter, the model requires no installation or local dependencies.
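As a rough sketch of what an OpenAI-compatible transcription call could look like, the snippet below builds (but does not send) a multipart POST request using only the Python standard library. It assumes the endpoint follows OpenAI's audio transcription convention, a POST to `/audio/transcriptions` with `file` and `model` form fields. Whether gigarouter exposes exactly this path is an assumption, and the base URL, API key, and the function name `build_transcription_request` are hypothetical placeholders.

```python
import urllib.request
import uuid

MODEL_ID = "jonatasgrosman/wav2vec2-large-xlsr-53-persian"


def build_transcription_request(base_url, api_key, audio_bytes,
                                filename="audio.wav", model=MODEL_ID):
    """Build a multipart/form-data POST in the OpenAI transcription style."""
    boundary = uuid.uuid4().hex
    model_part = (
        f'--{boundary}\r\n'
        'Content-Disposition: form-data; name="model"\r\n\r\n'
        f'{model}\r\n'
    ).encode("utf-8")
    file_part = (
        f'--{boundary}\r\n'
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        'Content-Type: audio/wav\r\n\r\n'
    ).encode("utf-8") + audio_bytes + b"\r\n"
    closing = f"--{boundary}--\r\n".encode("utf-8")

    return urllib.request.Request(
        url=base_url.rstrip("/") + "/audio/transcriptions",  # assumed path
        data=model_part + file_part + closing,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

The returned request could then be passed to `urllib.request.urlopen`. In OpenAI's own API the JSON response carries the transcript in a `text` field, though the exact response shape here should be confirmed against the provider's documentation.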
We're benchmarking and onboarding wav2vec2-large-xlsr-53-persian as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.