wav2vec2-large-xlsr-53-portuguese
jonatasgrosman/wav2vec2-large-xlsr-53-portuguese
A popular open speech-to-text model with 3.2M downloads a month. gigarouter benchmarks and hosts it as an OpenAI-compatible API.
About This Model
The model is a fine-tuned version of facebook/wav2vec2-large-xlsr-53 for automatic speech recognition (ASR) in Portuguese. It was fine-tuned on the train and validation splits of Common Voice 6.1 and requires audio sampled at 16 kHz. Gigarouter hosts this model as an OpenAI-compatible API, eliminating the need for local deployment.
This specialist model is best suited for transcribing spoken Portuguese into text. It builds on XLSR-53, a wav2vec 2.0 large model pre-trained on unlabeled speech in 53 languages, which was then fine-tuned specifically for Portuguese. Accuracy varies with the input: in the samples below, short, clearly phrased sentences are often transcribed exactly, while longer or less common sentences can come out partly garbled.
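Because the API is described as OpenAI-compatible, a transcription call should take the same shape as a request to OpenAI's `/v1/audio/transcriptions` endpoint: a multipart form with a `model` field and a `file` field, answered with JSON that carries the transcript in `text`. The sketch below builds that request with the Python standard library only. It rests on assumptions: `BASE_URL` is a hypothetical placeholder (this page does not give gigarouter's endpoint), the model identifier is assumed to be the Hugging Face repo name, and since the page also says the model is still being onboarded, the hosted endpoint may not be live yet.

```python
import json
import os
import urllib.request
import uuid

# Hypothetical placeholders: this page does not state gigarouter's base URL,
# and the model id is assumed to match the Hugging Face repo name.
BASE_URL = "https://api.example.com/v1"
MODEL_ID = "jonatasgrosman/wav2vec2-large-xlsr-53-portuguese"


def build_transcription_request(audio: bytes, filename: str, api_key: str) -> urllib.request.Request:
    """Build a multipart/form-data POST shaped like OpenAI's /audio/transcriptions call."""
    boundary = uuid.uuid4().hex
    # Text field: which model should run the transcription.
    model_part = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="model"\r\n\r\n'
        f"{MODEL_ID}\r\n"
    ).encode()
    # File field header; assumes a WAV upload (adjust Content-Type for other containers).
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: audio/wav\r\n\r\n"
    ).encode()
    # CRLF ends the file data, then the closing boundary marker.
    closing = f"\r\n--{boundary}--\r\n".encode()
    body = model_part + file_header + audio + closing
    return urllib.request.Request(
        f"{BASE_URL}/audio/transcriptions",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


def transcribe(path: str, api_key: str) -> str:
    """Upload one audio file and return the "text" field of the JSON reply."""
    with open(path, "rb") as f:
        request = build_transcription_request(f.read(), os.path.basename(path), api_key)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["text"]
```

Usage would be `transcribe("amostra.wav", api_key)`. Keeping request construction separate from sending makes the multipart encoding checkable offline. The official `openai` Python client, constructed with `OpenAI(base_url=..., api_key=...)` and called via `client.audio.transcriptions.create(model=..., file=...)`, should work equally well against any OpenAI-compatible server.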
Sample Transcriptions
The following table shows example predictions from the Common Voice test set, comparing reference sentences with model outputs:
| Reference | Prediction |
|---|---|
| NEM O RADAR NEM OS OUTROS INSTRUMENTOS DETECTARAM O BOMBARDEIRO STEALTH. | NEMHUM VADAN OS OLTWES INSTRUMENTOS DE TTÉÃN UM BOMBERDEIRO OSTER |
| PEDIR DINHEIRO EMPRESTADO ÀS PESSOAS DA ALDEIA | E DIR ENGINHEIRO EMPRESTAR AS PESSOAS DA ALDEIA |
| OITO | OITO |
| TRANCÁ-LOS | TRANCAUVOS |
| REALIZAR UMA INVESTIGAÇÃO PARA RESOLVER O PROBLEMA | REALIZAR UMA INVESTIGAÇÃO PARA RESOLVER O PROBLEMA |
| O YOUTUBE AINDA É A MELHOR PLATAFORMA DE VÍDEOS. | YOUTUBE AINDA É A MELHOR PLATAFOMA DE VÍDEOS |
| MENINA E MENINO BEIJANDO NAS SOMBRAS | MENINA E MENINO BEIJANDO NAS SOMBRAS |
| EU SOU O SENHOR | EU SOU O SENHOR |
| DUAS MULHERES QUE SENTAM-SE PARA BAIXO LENDO JORNAIS. | DUAS MIERES QUE SENTAM-SE PARA BAICLANE JODNÓI |
| EU ORIGINALMENTE ESPERAVA | EU ORIGINALMENTE ESPERAVA |
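A table like this is usually summarized as word error rate (WER): the word-level edit distance (substitutions, deletions, insertions) between reference and prediction, divided by the number of reference words. The following is a minimal sketch that assumes a normalization of our own choosing (uppercase, punctuation stripped, accents and hyphens kept). Published Common Voice scores may normalize differently, and ten samples are far too few to stand in for a benchmark figure.

```python
import re


def normalize(text: str) -> list[str]:
    """Uppercase, drop punctuation (keeping accented letters and hyphens), split into words."""
    return re.sub(r"[^\w\s-]", "", text.upper()).split()


def word_errors(reference: list[str], hypothesis: list[str]) -> int:
    """Levenshtein distance over words, using a single rolling DP row."""
    prev = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        curr = [i] + [0] * len(hypothesis)
        for j, hyp_word in enumerate(hypothesis, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """WER for one utterance; the reference must contain at least one word."""
    ref, hyp = normalize(reference), normalize(hypothesis)
    return word_errors(ref, hyp) / len(ref)


def corpus_wer(pairs: list[tuple[str, str]]) -> float:
    """Total errors over total reference words (not the mean of per-row WERs)."""
    errors = sum(word_errors(normalize(r), normalize(h)) for r, h in pairs)
    words = sum(len(normalize(r)) for r, _ in pairs)
    return errors / words


# Three rows copied from the table above.
pairs = [
    ("OITO", "OITO"),
    ("O YOUTUBE AINDA É A MELHOR PLATAFORMA DE VÍDEOS.",
     "YOUTUBE AINDA É A MELHOR PLATAFOMA DE VÍDEOS"),
    ("PEDIR DINHEIRO EMPRESTADO ÀS PESSOAS DA ALDEIA",
     "E DIR ENGINHEIRO EMPRESTAR AS PESSOAS DA ALDEIA"),
]
for ref, hyp in pairs:
    print(f"{wer(ref, hyp):.3f}  {ref}")  # → 0.000, 0.222, 0.714
print(f"corpus WER: {corpus_wer(pairs):.3f}")  # → corpus WER: 0.412
```

The punctuation stripping matters: without it, the reference `VÍDEOS.` and the prediction `VÍDEOS` would count as a substitution even though the model got the word right.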
Usage Notes
When using this model via the gigarouter API, make sure submitted audio is sampled at 16 kHz. The model expects single-channel (mono), 16-bit PCM audio. It performs best on clean, reasonably paced Portuguese speech; accuracy may drop on very noisy recordings or overlapping speakers.
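Before uploading, a quick pre-flight check against the format requirements above can catch the most common mistakes. This sketch uses only Python's standard `wave` module and reports anything that is not 16 kHz, mono, 16-bit. `check_wav` and `make_wav` are hypothetical helpers written for illustration; `make_wav` exists only to generate test input. Note that `wave` reads uncompressed PCM WAV only, so MP3 or other compressed files raise `wave.Error` and need converting first.

```python
import io
import wave


def check_wav(source) -> list[str]:
    """Return the format problems found in a WAV file (path or binary file object).

    An empty list means the audio is 16 kHz, mono, 16-bit PCM.
    """
    problems = []
    with wave.open(source, "rb") as wav:
        if wav.getframerate() != 16000:
            problems.append(f"sample rate is {wav.getframerate()} Hz, expected 16000 Hz")
        if wav.getnchannels() != 1:
            problems.append(f"{wav.getnchannels()} channels, expected 1 (mono)")
        if wav.getsampwidth() != 2:
            problems.append(f"{8 * wav.getsampwidth()}-bit samples, expected 16-bit")
    return problems


def make_wav(rate: int, channels: int, seconds: float = 0.1) -> io.BytesIO:
    """Hypothetical helper: write silent 16-bit PCM audio into an in-memory WAV."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * channels * int(rate * seconds))
    buf.seek(0)  # rewind so the buffer can be read back
    return buf


print(check_wav(make_wav(16000, 1)))  # → []
print(check_wav(make_wav(44100, 2)))  # CD-style stereo: rate and channel problems
```

Any file that fails the check should be resampled to 16 kHz and downmixed to mono before upload; which tool does that is up to you.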
We're benchmarking and onboarding wav2vec2-large-xlsr-53-portuguese as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.