
wav2vec2-large-xlsr-53-persian

jonatasgrosman/wav2vec2-large-xlsr-53-persian

A popular open speech-to-text model, with 2.5M downloads a month. gigarouter is benchmarking it and plans to host it as an OpenAI-compatible API.

status

coming soon

API providers

downloads / mo

2.5M

license

apache-2.0

about this model

jonatasgrosman/wav2vec2-large-xlsr-53-persian is an automatic speech recognition (ASR) model fine-tuned from Facebook's wav2vec2-large-xlsr-53 on Persian speech. It expects Persian-language audio sampled at 16 kHz and reaches a Word Error Rate (WER) of 30.12% and a Character Error Rate (CER) of 7.37% on the Common Voice 6.1 Persian test set.
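Because the model expects 16 kHz input, it can help to check a recording's sample rate before transcribing it. The following is a minimal sketch using only Python's standard `wave` module; it assumes uncompressed PCM WAV files, and the helper names `wav_sample_rate` and `needs_resampling` are illustrative, not part of any model or gigarouter API.

```python
import wave

TARGET_SAMPLE_RATE = 16_000  # the rate this page says the model expects


def wav_sample_rate(path: str) -> int:
    """Return the sample rate (Hz) recorded in a PCM WAV file's header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def needs_resampling(path: str, target_rate: int = TARGET_SAMPLE_RATE) -> bool:
    """Return True when the file's sample rate differs from the target rate."""
    return wav_sample_rate(path) != target_rate
```

A file for which `needs_resampling` returns True would need to be resampled to 16 kHz with an audio tool of your choice before being sent for transcription.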

Key strengths

Fine-tuned exclusively on Persian data (train and validation splits of Common Voice 6.1).
Requires no external language model for inference; can be used directly for transcription.
Delivers a Character Error Rate (CER) of 7.37% on the Common Voice Persian test set, indicating strong character-level accuracy.
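To illustrate what transcription without an external language model can look like: wav2vec2 models fine-tuned for ASR are typically decoded with greedy CTC decoding, which takes the most likely token per audio frame, collapses repeats, and drops the blank token. The sketch below assumes that scheme applies here; the toy vocabulary, the blank index, and the function name `greedy_ctc_decode` are illustrative assumptions, not taken from the model's actual tokenizer.

```python
from itertools import groupby
from typing import Sequence

BLANK_ID = 0  # assumed index of the CTC blank token in this toy vocabulary


def greedy_ctc_decode(
    frame_ids: Sequence[int],
    vocab: Sequence[str],
    blank_id: int = BLANK_ID,
) -> str:
    """Collapse consecutive repeated frame predictions, then drop blanks."""
    collapsed = [token_id for token_id, _ in groupby(frame_ids)]
    return "".join(vocab[i] for i in collapsed if i != blank_id)


# Toy example: per-frame argmax ids for the word "سلام" (hello).
toy_vocab = ["<blank>", "س", "ل", "ا", "م"]
toy_frames = [1, 1, 0, 2, 2, 0, 3, 3, 0, 4]
print(greedy_ctc_decode(toy_frames, toy_vocab))  # سلام
```

The blank token is what lets a genuinely doubled character survive: the frames `[1, 0, 1]` decode to two characters, while `[1, 1]` decodes to one.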

Benchmark results

The following table reports Word Error Rate (WER) and Character Error Rate (CER) on the Persian test set of Common Voice, evaluated on 2021-04-22. Lower is better for both metrics.

Model	WER	CER
jonatasgrosman/wav2vec2-large-xlsr-53-persian	30.12%	7.37%
m3hrdadfi/wav2vec2-large-xlsr-persian-v2	33.85%	8.79%
m3hrdadfi/wav2vec2-large-xlsr-persian	34.37%	8.98%
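For readers unfamiliar with the two metrics, both are edit distances normalized by the reference length: WER counts word-level substitutions, insertions, and deletions, and CER does the same over characters. The sketch below is a plain-Python illustration of those definitions, not the evaluation script used to produce the table; it scores a single sentence pair and counts spaces as characters in CER, both of which are simplifying assumptions.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: minimum substitutions, insertions, and deletions."""
    prev = list(range(len(hyp) + 1))
    for i, ref_item in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_item in enumerate(hyp, start=1):
            cost = 0 if ref_item == hyp_item else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate for one sentence pair."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate for one sentence pair (spaces count as characters)."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)
```

A single wrong character makes the whole word wrong for WER but adds only one error to CER, which is why a model's CER is usually much lower than its WER.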

Best for

Persian speech-to-text applications where low character error and direct transcription (without a language model) are priorities. Once live, the model will be available on gigarouter as a managed, OpenAI-compatible API, with no installation or local dependencies required.
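As a rough idea of what calling an OpenAI-compatible transcription API involves, the sketch below builds (but does not send) a multipart request using only the standard library. The `/audio/transcriptions` path and the `model` and `file` form fields follow OpenAI's transcription endpoint convention and are assumed to carry over; the base URL, API key, and the use of the Hugging Face id as the model name are hypothetical placeholders, since this page does not specify them.

```python
import urllib.request
import uuid

# Hypothetical placeholders: the real base URL, key, and model name are not given here.
BASE_URL = "https://api.example.invalid/v1"
API_KEY = "YOUR_API_KEY"
MODEL_ID = "jonatasgrosman/wav2vec2-large-xlsr-53-persian"


def build_transcription_request(
    audio: bytes,
    filename: str,
    base_url: str = BASE_URL,
    api_key: str = API_KEY,
    model: str = MODEL_ID,
) -> urllib.request.Request:
    """Build a multipart/form-data POST carrying the model name and audio bytes."""
    boundary = uuid.uuid4().hex
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        b'Content-Disposition: form-data; name="model"\r\n\r\n',
        model.encode() + b"\r\n",
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode(),
        b"Content-Type: application/octet-stream\r\n\r\n",
        audio + b"\r\n",
        f"--{boundary}--\r\n".encode(),
    ])
    return urllib.request.Request(
        url=f"{base_url}/audio/transcriptions",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

Once real connection details are available, passing the returned request to `urllib.request.urlopen` would send it; an OpenAI-compatible client library pointed at the same base URL should work equally well.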

not yet live

We're benchmarking and onboarding wav2vec2-large-xlsr-53-persian as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.