
wav2vec2-large-voxrex-swedish

KBLab/wav2vec2-large-voxrex-swedish

A popular open Swedish speech-to-text model with 2.5M downloads a month. gigarouter is benchmarking it and plans to host it as an OpenAI-compatible API.

status: coming soon
API providers: 0
downloads / mo: 2.5M
license: cc0-1.0

about this model

KBLab/wav2vec2-large-voxrex-swedish is an automatic speech recognition (ASR) model fine-tuned from KBLab's Swedish VoxRex large model on a combination of Swedish radio broadcasts, NST, and Common Voice data. It is intended for transcribing Swedish speech; measured error rates are given under Performance.

Performance

Without a language model, the model achieves a word error rate (WER) of 2.5% on the NST + Common Voice test set, which comprises 2% of the total sentences. On the Common Voice test set alone, WER is 8.49% without a language model and 7.37% with a 4-gram language model.
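For readers unfamiliar with the metric, here is a minimal sketch of how a word error rate can be computed: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and the model output, divided by the number of reference words. The function name `word_error_rate` and the Swedish example sentences are illustrative assumptions. This is not the evaluation script behind the figures above, which may normalize text or aggregate across sentences differently.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative helper (hypothetical name), not the model authors' scorer.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1        # reference word missing from hypothesis
            insertion = curr[j - 1] + 1   # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# One substituted word out of three reference words.
wer = word_error_rate("jag heter anna", "jag heter hanna")
print(f"WER: {wer:.2%}")
```

Over a whole test set, WER is usually reported as total errors divided by total reference words, not as the mean of per-sentence scores.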

[Figure: bar chart comparing model performance with and without additional Common Voice fine-tuning]
[Figure: line graph of WER during training, by number of updates]

Training Details

The model was fine-tuned for 120,000 updates on the combined NST and Common Voice dataset, followed by an additional 20,000 updates on Common Voice only. This second phase improves performance on Common Voice but slightly reduces it on the mixed NST+Common Voice test set. A full description is available in the accompanying paper (Malmsten et al., arXiv:2205.03026).

Input speech must be sampled at 16 kHz. Once live, the model will be available as a managed API on gigarouter, so no local installation is needed: you call the OpenAI-compatible endpoint.
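Because the 16 kHz requirement is easy to miss, it can help to check a recording's sample rate before transcribing it. The sketch below uses only Python's standard `wave` module and assumes uncompressed PCM WAV input. The helper names (`wav_sample_rate`, `is_16khz`, `make_silent_wav`) are hypothetical and not part of any gigarouter or KBLab API.

```python
import io
import wave

REQUIRED_RATE_HZ = 16_000  # sample rate the model expects, per the text above


def wav_sample_rate(wav_bytes: bytes) -> int:
    """Return the sample rate (Hz) stored in a PCM WAV file's header."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav_file:
        return wav_file.getframerate()


def is_16khz(wav_bytes: bytes) -> bool:
    """True if the WAV data is sampled at the required 16 kHz."""
    return wav_sample_rate(wav_bytes) == REQUIRED_RATE_HZ


def make_silent_wav(rate_hz: int, n_frames: int = 1600) -> bytes:
    """Build a short mono 16-bit silent WAV in memory (for demonstration only)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav_file.setframerate(rate_hz)
        wav_file.writeframes(b"\x00\x00" * n_frames)
    return buffer.getvalue()


for rate in (16_000, 44_100):
    status = "ok" if is_16khz(make_silent_wav(rate)) else "needs resampling"
    print(f"{rate} Hz: {status}")
```

Audio at any other rate would need to be resampled to 16 kHz first with an audio tool or library of your choice; this check only reads the header and does not modify the audio.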

not yet live

We're benchmarking and onboarding wav2vec2-large-voxrex-swedish as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.