romanian-wav2vec2

gigant/romanian-wav2vec2

A popular open speech-to-text model, with 2.8M downloads a month. gigarouter is benchmarking and onboarding it as a hosted, OpenAI-compatible API.

status

coming soon

API providers

downloads / mo

2.8M

license

apache-2.0

about this model

gigant/romanian-wav2vec2 is an automatic speech recognition (ASR) model for Romanian, fine-tuned from facebook/wav2vec2-xls-r-300m on the Common Voice 8.0 Romanian dataset with additional data from Romanian Speech Synthesis.

The model ranked first for Romanian speech recognition in the Hugging Face Robust Speech Challenge (Speech Bench and Challenge Leaderboard). It ships with a 5-gram language model, built with pyctcdecode and kenlm and trained on the Romanian Corpora Parliament dataset.

Key Benchmarks

On the Common Voice 8.0 Romanian test split, decoding without the 5-gram language model:

Word Error Rate (WER): 0.1174 (11.74%)
Character Error Rate (CER): 0.0294 (2.94%)
Loss: 0.1553
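
For readers unfamiliar with these metrics, here is a minimal sketch of how WER and CER are commonly computed: the edit distance between reference and hypothesis, divided by the reference length. The function names and the example sentences are illustrative, not taken from the model's evaluation code. The sketch scores a single utterance and assumes a non-empty reference; reported benchmark figures are normally aggregated over a whole test set.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edits divided by reference word count."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edits (spaces included,
    an assumption of this sketch) divided by reference length."""
    return edit_distance(reference, hypothesis) / len(reference)


# Illustrative sentences: one wrong word out of four.
print(wer("acesta este un test", "acesta este un text"))  # 0.25
```

A single substituted character costs a full word in WER but only one character in CER, which is why CER is typically much lower than WER for the same output.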

Intended Use

Best suited for Romanian speech recognition from audio sampled at 16 kHz. Output is lowercase, with no punctuation. Once live on gigarouter, the model will be served as an OpenAI-compatible API, with no local installation or dependency management required.
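
Since the model expects 16 kHz audio, it can help to check a file's sample rate before sending it anywhere. The following is a minimal sketch using only Python's standard `wave` module; it handles uncompressed WAV files only, and the helper names are illustrative.

```python
import wave

EXPECTED_RATE_HZ = 16_000  # sample rate stated in the model description


def wav_sample_rate(path):
    """Return the sample rate (Hz) recorded in a WAV file's header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def is_16khz(path):
    """True if the WAV file is already sampled at 16 kHz."""
    return wav_sample_rate(path) == EXPECTED_RATE_HZ
```

Files at another rate (44.1 kHz recordings are common) would need resampling first, which is outside the scope of this sketch.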

Training Details

Hyperparameter	Value
Learning rate	0.003
Batch size (train/eval)	16 / 8
Gradient accumulation steps	3
Optimizer	Adam (betas=0.9,0.999; epsilon=1e-8)
LR scheduler	Linear, warmup 500 steps
Epochs	50
Mixed precision	Native AMP
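
To make the table concrete, here is a rough sketch of what two of these settings imply: the number of samples per optimizer update, and the shape of a linear schedule with 500 warmup steps. It assumes the batch size of 16 is per device and that training ran on a single device, neither of which is stated above. `total_steps` is a hypothetical parameter, since the total number of optimizer steps is not given.

```python
PEAK_LR = 0.003        # learning rate from the table
WARMUP_STEPS = 500     # warmup steps from the table
TRAIN_BATCH_SIZE = 16  # train batch size from the table
GRAD_ACCUM_STEPS = 3   # gradient accumulation steps from the table

# Samples per optimizer update, assuming a single device (not stated in the table).
effective_batch_size = TRAIN_BATCH_SIZE * GRAD_ACCUM_STEPS


def linear_lr(step, total_steps, peak_lr=PEAK_LR, warmup_steps=WARMUP_STEPS):
    """Linear warmup from 0 to peak_lr, then linear decay to 0 at total_steps."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


print(effective_batch_size)  # 48
```

Under these assumptions the learning rate climbs to 0.003 over the first 500 updates and then falls linearly to zero by the final update.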

Final training loss reached 0.0376 after 49.69 epochs. The model uses a CTC head with an added 5-gram language model decoder for improved accuracy.
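
As background on the CTC head, here is a minimal sketch of greedy CTC decoding, the language-model-free baseline: take the most likely label per audio frame, collapse consecutive repeats, then drop blank labels. The toy vocabulary and blank id below are hypothetical and do not reflect the model's actual tokenizer. A language model decoder replaces this greedy step with a beam search that also scores candidate word sequences.

```python
def ctc_greedy_decode(frame_ids, id_to_char, blank_id=0):
    """Collapse repeated frame labels, then drop blanks."""
    chars = []
    previous = None
    for token_id in frame_ids:
        # Emit a character only when the label changes and is not the blank.
        if token_id != previous and token_id != blank_id:
            chars.append(id_to_char[token_id])
        previous = token_id
    return "".join(chars)


# Hypothetical toy vocabulary: id 0 is the CTC blank.
vocab = {1: "d", 2: "a"}
frames = [1, 1, 0, 2, 2, 0, 0, 1, 0, 2]
print(ctc_greedy_decode(frames, vocab))  # dada
```

The blank label is what lets CTC output genuine double letters: `[2, 0, 2]` decodes to "aa", while `[2, 2]` collapses to a single "a".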

not yet live

We're benchmarking and onboarding romanian-wav2vec2 as a hosted, OpenAI-compatible API.