wav2vec2-large-xlsr-53-japanese

jonatasgrosman/wav2vec2-large-xlsr-53-japanese

A popular open speech-to-text model for Japanese, with 6.1M downloads a month. gigarouter is benchmarking it and plans to host it as an OpenAI-compatible API.

status: coming soon
API providers: 0
downloads / mo: 6.1M
license: apache-2.0

about this model

jonatasgrosman/wav2vec2-large-xlsr-53-japanese is an automatic speech recognition (ASR) model fine-tuned from Facebook's Wav2Vec2-Large-XLSR-53 for Japanese speech transcription. gigarouter is onboarding it as a managed, OpenAI-compatible API, which would remove the need for local inference setup once it is live.

Key Strengths

  • Designed for Japanese ASR with support for mixed scripts (kanji, hiragana, katakana).
  • Fine-tuned on Common Voice 6.1 (train and validation splits), CSS10, and JSUT; expects speech input sampled at 16 kHz.
  • Reaches a 20.16% character error rate on the Common Voice Japanese test set as a purely acoustic model (no external language model).
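The 16 kHz input requirement means audio recorded at other rates (44.1 kHz or 48 kHz are common) needs resampling before transcription. The following is a minimal standard-library sketch, not the model author's preprocessing: `resample_linear` is a hypothetical helper, and plain linear interpolation applies no anti-aliasing filter, so a dedicated resampler from an audio library is likely the better choice in practice.

```python
import math

TARGET_RATE = 16_000  # sampling rate the model expects, per the list above


def resample_linear(samples, src_rate, dst_rate=TARGET_RATE):
    """Resample a mono signal by linear interpolation.

    Illustrative only: no low-pass filter is applied, so downsampling
    content above dst_rate / 2 will alias.
    """
    if src_rate <= 0 or dst_rate <= 0:
        raise ValueError("sample rates must be positive")
    if src_rate == dst_rate or len(samples) < 2:
        return [float(s) for s in samples]

    n_out = int(round(len(samples) * dst_rate / src_rate))
    step = src_rate / dst_rate  # source samples advanced per output sample
    out = []
    for i in range(n_out):
        pos = i * step
        lo = int(pos)
        if lo >= len(samples) - 1:
            # Past the last interpolable pair: hold the final sample.
            out.append(float(samples[-1]))
            continue
        frac = pos - lo
        out.append(samples[lo] * (1.0 - frac) + samples[lo + 1] * frac)
    return out


# One second of a 440 Hz tone recorded at 48 kHz, brought down to 16 kHz.
tone_48k = [math.sin(2 * math.pi * 440 * n / 48_000) for n in range(48_000)]
tone_16k = resample_linear(tone_48k, src_rate=48_000)
print(len(tone_16k))  # 16000
```

One second of audio should come out as 16,000 samples regardless of the source rate, which makes the output length a cheap sanity check before sending audio to any 16 kHz model.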

Benchmark Results

Results on the Common Voice Japanese test set (2021-05-10):

Model | WER | CER
jonatasgrosman/wav2vec2-large-xlsr-53-japanese | 81.80% | 20.16%
vumichien/wav2vec2-large-xlsr-japanese | 1108.86% | 23.40%
qqhann/w2v_hf_jsut_xlsr53 | 1012.18% | 70.77%

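Note: WER counts inserted words as errors, so it can exceed 100%. Japanese is written without spaces between words, which makes a whitespace-based WER highly sensitive to how each model segments its output. The very high WER values for the comparison models likely reflect such evaluation differences (e.g., blank token handling) rather than recognition quality alone, so CER is the more informative metric here. The table reproduces the original benchmark figures.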
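To make the two metrics concrete, here is a small sketch of how WER and CER are commonly computed from Levenshtein edit distance. It is not the evaluation script behind the table: the function names are illustrative, and stripping spaces before computing CER is an assumption of this sketch. The example shows how a perfect character-level transcript can still score a WER far above 100% when the hypothesis is split into words and the reference is not.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences
    (substitutions, insertions, and deletions each cost 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate over whitespace-separated tokens."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate, ignoring spaces (an assumption of this sketch)."""
    ref_chars = list(reference.replace(" ", ""))
    hyp_chars = list(hypothesis.replace(" ", ""))
    return edit_distance(ref_chars, hyp_chars) / len(ref_chars)


reference = "今日は天気がいいです"        # unsegmented: one whitespace "word"
hypothesis = "今日 は 天気 が いい です"  # same characters, split into six tokens

print(f"WER: {wer(reference, hypothesis):.0%}")  # WER: 600%
print(f"CER: {cer(reference, hypothesis):.0%}")  # CER: 0%
```

Because the denominator is the number of reference tokens, a single unsegmented reference "word" lets every extra hypothesis token count as a full error, which is why WER is unbounded above while the characters themselves match exactly.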

Best Use Cases

  • Japanese speech-to-text for applications that can work with a character error rate of roughly 20%.
  • Scenarios that call for a hosted API instead of self-managed GPU infrastructure, once the model is live.

not yet live

We're benchmarking and onboarding wav2vec2-large-xlsr-53-japanese as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.
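Since the model is not yet live, the following is only a sketch of what a request could look like, assuming the hosted API follows the OpenAI audio transcription convention (a multipart POST to `/audio/transcriptions` with `model` and `file` fields). The base URL is a placeholder, not a real gigarouter endpoint, and `build_transcription_request` is a hypothetical helper; the request is built but deliberately not sent.

```python
import urllib.request
import uuid

# Placeholder base URL: the real endpoint is not published on this page.
BASE_URL = "https://api.example.invalid/v1"
MODEL_ID = "jonatasgrosman/wav2vec2-large-xlsr-53-japanese"


def build_transcription_request(audio_bytes, filename, api_key,
                                base_url=BASE_URL, model=MODEL_ID):
    """Build (but do not send) a multipart/form-data transcription request.

    Assumes an OpenAI-style endpoint that accepts `model` and `file` fields.
    """
    boundary = uuid.uuid4().hex
    delimiter = b"--" + boundary.encode("ascii")
    lines = [
        delimiter,
        b'Content-Disposition: form-data; name="model"',
        b"",
        model.encode("utf-8"),
        delimiter,
        ('Content-Disposition: form-data; name="file"; '
         f'filename="{filename}"').encode("utf-8"),
        b"Content-Type: audio/wav",
        b"",
        audio_bytes,
        delimiter + b"--",  # closing boundary
        b"",
    ]
    body = b"\r\n".join(lines)
    return urllib.request.Request(
        url=f"{base_url}/audio/transcriptions",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


# Stand-in bytes; a real call would read a 16 kHz WAV file from disk.
request = build_transcription_request(b"RIFF....WAVE", "sample.wav", "YOUR_API_KEY")
print(request.get_method(), request.full_url)
# Sending would be urllib.request.urlopen(request), once a real endpoint exists.
```

Keeping request construction separate from sending makes it easy to swap in the real base URL and key later, or to replace this helper with an OpenAI-compatible client library pointed at that URL.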