wav2vec2-large-xlsr-53-japanese
jonatasgrosman/wav2vec2-large-xlsr-53-japanese
A popular open speech-to-text model, with 6.1M downloads a month. gigarouter benchmarks and hosts it as an OpenAI-compatible API.
About This Model
jonatasgrosman/wav2vec2-large-xlsr-53-japanese is an automatic speech recognition (ASR) model fine-tuned from Facebook's Wav2Vec2-Large-XLSR-53 for Japanese speech transcription. It is hosted on gigarouter as a managed, OpenAI-compatible API, eliminating the need for local inference setup.
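As a rough illustration of what "OpenAI-compatible" could mean for a speech-to-text model, the sketch below builds (but does not send) a multipart request in the style of the OpenAI audio transcription endpoint, using only the Python standard library. The base URL is a placeholder, and the `/audio/transcriptions` path, the `model` and `file` form fields, and the bearer-token header are assumptions carried over from the OpenAI API rather than details confirmed for this service.

```python
import urllib.request
import uuid

# Hypothetical placeholder: the real base URL comes from the provider.
BASE_URL = "https://api.gigarouter.example/v1"
MODEL_ID = "jonatasgrosman/wav2vec2-large-xlsr-53-japanese"


def build_transcription_request(
    audio_bytes: bytes,
    filename: str,
    api_key: str,
    base_url: str = BASE_URL,
) -> urllib.request.Request:
    """Build, without sending, a multipart POST in the OpenAI transcription style."""
    boundary = uuid.uuid4().hex

    # Form field carrying the model identifier.
    model_part = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="model"\r\n\r\n'
        f"{MODEL_ID}\r\n"
    ).encode("utf-8")

    # Header of the form field carrying the audio file (assumed to be WAV).
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: audio/wav\r\n\r\n"
    ).encode("utf-8")

    closing = f"\r\n--{boundary}--\r\n".encode("utf-8")
    body = model_part + file_header + audio_bytes + closing

    return urllib.request.Request(
        url=f"{base_url}/audio/transcriptions",  # assumed OpenAI-style path
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

Once an endpoint is available, the request could be sent with `urllib.request.urlopen(request)`. The shape of the response is not described here, so any parsing code would need to be checked against the provider's documentation.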
Key Strengths
- Designed for Japanese ASR with support for mixed scripts (kanji, hiragana, katakana).
- Trained on Common Voice 6.1 (train/validation), CSS10, and JSUT datasets; requires 16 kHz sampled input.
- Provides competitive character-level accuracy for a purely acoustic model (no external language model).
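Because the model expects 16 kHz input, it can help to check a recording's sampling rate before submitting it. The following is a minimal sketch using Python's standard `wave` module; it only handles uncompressed WAV files, and the helper names `wav_sample_rate` and `needs_resampling` are illustrative, not part of any library.

```python
import wave

TARGET_RATE_HZ = 16_000  # sampling rate the model is stated to require


def wav_sample_rate(path: str) -> int:
    """Return the sampling rate (Hz) recorded in a WAV file's header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def needs_resampling(path: str, target_rate_hz: int = TARGET_RATE_HZ) -> bool:
    """Return True if the file's sampling rate differs from the target rate."""
    return wav_sample_rate(path) != target_rate_hz
```

A file for which `needs_resampling` returns `True` would have to be converted to 16 kHz with an audio tool or library of your choice before transcription.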
Benchmark Results
Results on the Common Voice Japanese test set (2021-05-10), for this model and two comparison models:
| Model | WER | CER |
|---|---|---|
| jonatasgrosman/wav2vec2-large-xlsr-53-japanese | 81.80% | 20.16% |
| vumichien/wav2vec2-large-xlsr-japanese | 1108.86% | 23.40% |
| qqhann/w2v_hf_jsut_xlsr53 | 1012.18% | 70.77% |
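Note: WER can exceed 100% because inserted words count as errors while the denominator is the number of words in the reference. The very high WER values for the comparison models likely result from evaluation script differences (e.g., blank token handling); the table reports the original benchmark figures unchanged. Because Japanese is written without spaces between words, CER is generally the more informative metric here.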
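To make the two metrics concrete, here is a minimal sketch of WER and CER as edit distance divided by reference length. It is an illustration only, not the evaluation script behind the table, and it assumes whitespace tokenization for words; the function names are illustrative.

```python
def edit_distance(reference: list[str], hypothesis: list[str]) -> int:
    """Levenshtein distance: minimum substitutions, deletions, and insertions."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_token in enumerate(reference, start=1):
        current = [i]
        for j, hyp_token in enumerate(hypothesis, start=1):
            cost = 0 if ref_token == hyp_token else 1
            current.append(min(
                previous[j] + 1,         # deletion of a reference token
                current[j - 1] + 1,      # insertion of a hypothesis token
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def error_rate(reference: list[str], hypothesis: list[str]) -> float:
    """Edit distance over reference length, as a fraction (may exceed 1.0)."""
    if not reference:
        raise ValueError("reference must contain at least one token")
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate, with words defined by whitespace (an assumption)."""
    return error_rate(reference.split(), hypothesis.split())


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate, ignoring spaces."""
    return error_rate(
        list(reference.replace(" ", "")),
        list(hypothesis.replace(" ", "")),
    )


reference = "今日は晴れです"       # written without spaces: one "word"
hypothesis = "今日 は 晴れ です"   # same characters, split into four tokens
print(wer(reference, hypothesis))  # 4.0, i.e. 400%
print(cer(reference, hypothesis))  # 0.0
```

In this toy case the characters are all correct, yet the single reference "word" is replaced by one token plus three insertions, giving a WER of 400%. This is one way a WER above 100% can arise for unsegmented Japanese text; it is not a confirmed explanation of the figures in the table.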
Best Use Cases
- Japanese speech-to-text for applications where character-level accuracy is the relevant metric and a CER of about 20% is acceptable.
- Scenarios requiring low-latency inference via a hosted API without managing GPU infrastructure.
We're benchmarking and onboarding wav2vec2-large-xlsr-53-japanese as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.