
wav2vec2-large-xlsr-53-japanese

jonatasgrosman/wav2vec2-large-xlsr-53-japanese

A popular open speech-to-text model for Japanese, with 6.1M downloads a month. gigarouter is benchmarking it and plans to host it as an OpenAI-compatible API.

status: coming soon

API providers

downloads / mo: 6.1M

license: apache-2.0

about this model

jonatasgrosman/wav2vec2-large-xlsr-53-japanese is an automatic speech recognition (ASR) model fine-tuned from Facebook's Wav2Vec2-Large-XLSR-53 for Japanese speech transcription. Once it is live on gigarouter, it will be available as a managed, OpenAI-compatible API, so no local inference setup is needed.
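For readers unfamiliar with what "OpenAI-compatible" implies for speech-to-text, the sketch below builds (but does not send) a request in the shape of OpenAI's audio transcription endpoint: a multipart POST to `/audio/transcriptions` with `model` and `file` fields. The base URL is a placeholder, since the model is not live and the real endpoint is not stated here; `build_transcription_request` is a hypothetical helper name.

```python
import urllib.request
import uuid

# ASSUMPTION: placeholder base URL; the real gigarouter endpoint is not given in this page.
BASE_URL = "https://api.gigarouter.example/v1"
MODEL_ID = "jonatasgrosman/wav2vec2-large-xlsr-53-japanese"


def build_transcription_request(audio_bytes, filename, api_key,
                                base_url=BASE_URL, model=MODEL_ID):
    """Build a multipart/form-data POST shaped like OpenAI's transcription call."""
    boundary = uuid.uuid4().hex
    body = b"".join([
        # Text field naming the model to run.
        f'--{boundary}\r\nContent-Disposition: form-data; name="model"\r\n\r\n'
        f'{model}\r\n'.encode(),
        # File field carrying the raw audio bytes.
        f'--{boundary}\r\nContent-Disposition: form-data; name="file"; '
        f'filename="{filename}"\r\n'
        f'Content-Type: application/octet-stream\r\n\r\n'.encode(),
        audio_bytes,
        f'\r\n--{boundary}--\r\n'.encode(),
    ])
    return urllib.request.Request(
        url=f"{base_url}/audio/transcriptions",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


# Build a request from in-memory bytes; nothing is sent over the network.
req = build_transcription_request(b"RIFF....", "clip.wav", "YOUR_API_KEY")
print(req.get_method(), req.full_url)
```

Against a live OpenAI-compatible server, passing `req` to `urllib.request.urlopen` would normally return a JSON body whose `text` field holds the transcript; whether gigarouter matches that behaviour exactly is an assumption until the model launches.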

Key Strengths

Designed for Japanese ASR with support for mixed scripts (kanji, hiragana, katakana).
Fine-tuned on the train and validation splits of Common Voice 6.1, plus the CSS10 and JSUT datasets. Input audio must be sampled at 16 kHz.
Reaches a 20.16% character error rate (CER) on the Common Voice Japanese test set as a purely acoustic model (no external language model).
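Because the model expects 16 kHz input, audio recorded at other rates (44.1 kHz or 48 kHz are common) has to be resampled first. The sketch below shows the idea with plain linear interpolation over a list of mono samples. It is illustrative only: it applies no anti-aliasing filter, and `resample_linear` is a hypothetical helper, not part of any library. A dedicated audio library is the better choice for real use.

```python
TARGET_RATE = 16_000  # sample rate the model expects


def resample_linear(samples, src_rate, dst_rate=TARGET_RATE):
    """Resample a mono signal by linear interpolation (no anti-alias filter)."""
    if src_rate <= 0 or dst_rate <= 0:
        raise ValueError("sample rates must be positive")
    if src_rate == dst_rate or len(samples) < 2:
        return list(samples)

    n_out = round(len(samples) * dst_rate / src_rate)
    step = src_rate / dst_rate  # source samples advanced per output sample
    last = len(samples) - 1
    out = []
    for i in range(n_out):
        pos = i * step
        lo = min(int(pos), last)   # nearest source sample at or before pos
        hi = min(lo + 1, last)     # next source sample, clamped at the end
        frac = pos - lo
        out.append(samples[lo] * (1.0 - frac) + samples[hi] * frac)
    return out


# One second of silence at 48 kHz becomes one second at 16 kHz.
one_second_48k = [0.0] * 48_000
print(len(resample_linear(one_second_48k, 48_000)))  # 16000
```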

Benchmark Results

Results on the Common Voice Japanese test set (2021-05-10), with two other Japanese wav2vec2 models for comparison:

Model	WER	CER
jonatasgrosman/wav2vec2-large-xlsr-53-japanese	81.80%	20.16%
vumichien/wav2vec2-large-xlsr-japanese	1108.86%	23.40%
qqhann/w2v_hf_jsut_xlsr53	1012.18%	70.77%

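Note: WER can exceed 100% because inserted words count as errors, so a hypothesis with many more tokens than the reference can accumulate more errors than the reference has words. Japanese is written without spaces between words, which makes WER highly sensitive to how each model's output is tokenized (e.g., blank token handling); this likely explains the very high WER values for the comparison models. CER is the more informative metric here. The table reflects the original benchmark.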
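To make the WER and CER behaviour concrete, here is a minimal sketch that computes both from Levenshtein edit distance. It assumes WER splits on whitespace and CER ignores spaces; the original evaluation script may normalize text differently, and the example sentences are made up for illustration.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (each edit costs 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,             # deletion
                cur[j - 1] + 1,          # insertion
                prev[j - 1] + (r != h),  # substitution, or match at cost 0
            ))
        prev = cur
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate over whitespace-delimited tokens."""
    ref = reference.split()
    if not ref:
        raise ValueError("reference must contain at least one token")
    return edit_distance(ref, hypothesis.split()) / len(ref)


def cer(reference, hypothesis):
    """Character error rate; spaces are dropped (an assumption of this sketch)."""
    ref = list(reference.replace(" ", ""))
    if not ref:
        raise ValueError("reference must contain at least one character")
    return edit_distance(ref, list(hypothesis.replace(" ", ""))) / len(ref)


reference = "今日は良い天気です"         # unsegmented: one whitespace token
spaced = "今日 は 良い 天気 です"        # same characters, five tokens
one_wrong = "今日は悪い天気です"         # one character substituted

print(wer(reference, spaced), cer(reference, spaced))        # 5.0 0.0
print(wer(reference, one_wrong), cer(reference, one_wrong))
```

In the first case every character is correct, yet WER is 500% purely because the hypothesis is segmented differently from the reference. In the second, a single wrong character out of nine gives a WER of 100% while CER stays near 11%.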

Best Use Cases

Japanese speech-to-text for applications where quality is judged at the character level (CER of about 20% on the Common Voice test set).
Scenarios requiring low-latency inference via a hosted API without managing GPU infrastructure.

not yet live

We're benchmarking and onboarding wav2vec2-large-xlsr-53-japanese as a hosted, OpenAI-compatible API.