romanian-wav2vec2
gigant/romanian-wav2vec2
A popular open speech-to-text model, with 2.8M downloads a month. gigarouter benchmarks and hosts it as an OpenAI-compatible API.
about this model
gigant/romanian-wav2vec2 is an automatic speech recognition (ASR) model for Romanian. It is fine-tuned from facebook/wav2vec2-xls-r-300m on the Romanian subset of Common Voice 8.0, with additional training data from the Romanian Speech Synthesis dataset.
The model ranked first for Romanian speech recognition in the Hugging Face Robust Speech Challenge (Speech Bench and Challenge Leaderboard). It ships with a 5-gram language model, built with pyctcdecode and kenlm and trained on the Romanian Corpora Parliament dataset.
Key Benchmarks
On the Common Voice 8.0 Romanian test split (without the 5-gram LM optimization):
- Word Error Rate (WER): 0.1174 (11.74%)
- Character Error Rate (CER): 0.0294 (2.94%)
- Evaluation loss: 0.1553
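For readers who want to reproduce figures like these on their own transcripts, here is a minimal sketch of how WER and CER are conventionally computed: the edit distance between reference and hypothesis, divided by the reference length in words or characters. The function names and the sample sentences are illustrative, not taken from the model's evaluation script, and counting spaces as characters in CER is an assumption that varies between toolkits.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion from the reference
                curr[j - 1] + 1,     # insertion into the reference
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edits divided by reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edits divided by reference length.

    Assumption: spaces count as characters.
    """
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Illustrative pair: the hypothesis drops one diacritic ("și" -> "si").
reference = "ana are mere și pere"
hypothesis = "ana are mere si pere"
sample_wer = wer(reference, hypothesis)  # 1 wrong word out of 5
sample_cer = cer(reference, hypothesis)  # 1 wrong character out of 20
```

A single missing diacritic counts as a whole-word error in WER but only one character in CER, which is one reason the reported CER is much lower than the WER.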
Intended Use
Best suited for Romanian speech recognition from audio sampled at 16 kHz. Output is lowercased without punctuation. The model is hosted as an OpenAI-compatible API on gigarouter, requiring no local installation or dependency management.
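Because the model expects 16 kHz input, it can help to check a file's sample rate before sending it for transcription. The sketch below uses only the Python standard library and works for uncompressed WAV files. The helper names are hypothetical, and the silent test files exist only to make the example self-contained.

```python
import os
import tempfile
import wave

EXPECTED_RATE = 16_000  # Hz, the sample rate named in the intended-use note


def wav_sample_rate(path):
    """Read the sample rate from a WAV file header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def needs_resampling(path, expected=EXPECTED_RATE):
    """Return True when the file is not already at the expected sample rate."""
    return wav_sample_rate(path) != expected


def write_silence(path, rate, seconds=1):
    """Write a mono 16-bit silent WAV file (demo helper only)."""
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav_file.setframerate(rate)
        wav_file.writeframes(b"\x00\x00" * rate * seconds)


with tempfile.TemporaryDirectory() as tmp:
    ok_path = os.path.join(tmp, "speech_16k.wav")
    bad_path = os.path.join(tmp, "speech_44k.wav")
    write_silence(ok_path, 16_000)
    write_silence(bad_path, 44_100)
    ok_needs_resampling = needs_resampling(ok_path)
    bad_needs_resampling = needs_resampling(bad_path)
```

Files that fail the check should be resampled with an audio tool of your choice before transcription. This sketch only detects the mismatch.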
Training Details
| Hyperparameter | Value |
|---|---|
| Learning rate | 0.003 |
| Batch size (train/eval) | 16 / 8 |
| Gradient accumulation steps | 3 |
| Optimizer | Adam (betas=(0.9, 0.999), epsilon=1e-8) |
| LR scheduler | Linear, warmup 500 steps |
| Epochs | 50 |
| Mixed precision | Native AMP |
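Two quantities follow from the table: the effective batch size per optimizer step and the shape of the learning-rate curve. The sketch below works them out under stated assumptions. It assumes a single training device, which the table does not specify, and it reads "Linear" in the usual sense of a linear warmup to the peak rate followed by a linear decay to zero. `total_steps` is a hypothetical value chosen for illustration; the source does not give the total number of optimizer steps.

```python
LEARNING_RATE = 0.003   # peak learning rate from the table
TRAIN_BATCH_SIZE = 16   # per-step training batch size from the table
GRAD_ACCUM_STEPS = 3    # gradient accumulation steps from the table
WARMUP_STEPS = 500      # warmup steps from the table


def effective_batch_size(per_step=TRAIN_BATCH_SIZE,
                         accum=GRAD_ACCUM_STEPS,
                         num_devices=1):
    """Examples contributing to one optimizer update.

    Assumption: num_devices=1, since the table does not state the device count.
    """
    return per_step * accum * num_devices


def linear_warmup_lr(step, total_steps,
                     peak=LEARNING_RATE, warmup=WARMUP_STEPS):
    """Linear warmup to `peak`, then linear decay to zero at `total_steps`."""
    if step < warmup:
        return peak * step / warmup
    remaining = max(0, total_steps - step)
    return peak * remaining / max(1, total_steps - warmup)


total_steps = 10_000  # hypothetical, for illustration only
batch = effective_batch_size()                      # 16 * 3 * 1
lr_mid_warmup = linear_warmup_lr(250, total_steps)  # halfway through warmup
lr_peak = linear_warmup_lr(500, total_steps)        # end of warmup
lr_end = linear_warmup_lr(total_steps, total_steps)
```

With gradient accumulation, each optimizer update sees 48 examples even though only 16 are held in memory at once.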
The final logged training loss was 0.0376, at epoch 49.69 of 50. The acoustic model ends in a CTC head, and the bundled 5-gram language model is applied at decoding time to improve accuracy.
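To illustrate what a CTC head produces, the sketch below applies the basic CTC collapse rule to a sequence of per-frame labels: merge consecutive repeats, then drop the blank symbol. This is plain greedy decoding, not the 5-gram language model decoding the model ships with, which searches over candidate transcripts with pyctcdecode. The blank symbol `_` and the frame sequence are made up for the example.

```python
from itertools import groupby

BLANK = "_"  # hypothetical blank symbol; real vocabularies use their own token


def ctc_greedy_collapse(frame_labels, blank=BLANK):
    """Collapse per-frame CTC labels into text.

    Step 1: merge runs of identical consecutive labels.
    Step 2: remove blank labels.
    """
    merged = [label for label, _run in groupby(frame_labels)]
    return "".join(label for label in merged if label != blank)


# One label per audio frame, as a best-label-per-frame pass would produce.
frames = list("__bb_uu_n__ăă__")
decoded = ctc_greedy_collapse(frames)

# A blank between two identical labels keeps both, so double letters survive.
double_letter = ctc_greedy_collapse(list("a_a"))
single_letter = ctc_greedy_collapse(list("aa"))
```

Greedy collapse picks the best label per frame independently. A language model decoder instead scores whole word sequences, which is how it can correct acoustically plausible but unlikely spellings.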
We're benchmarking and onboarding romanian-wav2vec2 as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.