
wav2vec2-large-xlsr-53-portuguese

jonatasgrosman/wav2vec2-large-xlsr-53-portuguese

A popular open speech-to-text model, with 3.2M downloads a month. gigarouter is benchmarking it and plans to host it as an OpenAI-compatible API.

status: coming soon
API providers: 0
downloads / mo: 3.2M
license: apache-2.0

about this model

The model is a fine-tuned version of facebook/wav2vec2-large-xlsr-53 for automatic speech recognition (ASR) in Portuguese. It was fine-tuned on the train and validation splits of Common Voice 6.1 and requires audio sampled at 16 kHz. Once the model is live, gigarouter will serve it as an OpenAI-compatible API, so no local deployment is needed.

This specialist model is best suited for transcribing spoken Portuguese into text. It builds on the large XLSR-53 checkpoint, which was pre-trained on speech in 53 languages and then fine-tuned specifically for Portuguese. As the sample transcriptions below show, accuracy varies with the input: short, clear utterances are often transcribed exactly, while harder sentences can come out heavily garbled.

Sample Transcriptions

The following table shows example predictions from the Common Voice test set, comparing reference sentences with model outputs:

| Reference | Prediction |
| --- | --- |
| NEM O RADAR NEM OS OUTROS INSTRUMENTOS DETECTARAM O BOMBARDEIRO STEALTH. | NEMHUM VADAN OS OLTWES INSTRUMENTOS DE TTÉÃN UM BOMBERDEIRO OSTER |
| PEDIR DINHEIRO EMPRESTADO ÀS PESSOAS DA ALDEIA | E DIR ENGINHEIRO EMPRESTAR AS PESSOAS DA ALDEIA |
| OITO | OITO |
| TRANCÁ-LOS | TRANCAUVOS |
| REALIZAR UMA INVESTIGAÇÃO PARA RESOLVER O PROBLEMA | REALIZAR UMA INVESTIGAÇÃO PARA RESOLVER O PROBLEMA |
| O YOUTUBE AINDA É A MELHOR PLATAFORMA DE VÍDEOS. | YOUTUBE AINDA É A MELHOR PLATAFOMA DE VÍDEOS |
| MENINA E MENINO BEIJANDO NAS SOMBRAS | MENINA E MENINO BEIJANDO NAS SOMBRAS |
| EU SOU O SENHOR | EU SOU O SENHOR |
| DUAS MULHERES QUE SENTAM-SE PARA BAIXO LENDO JORNAIS. | DUAS MIERES QUE SENTAM-SE PARA BAICLANE JODNÓI |
| EU ORIGINALMENTE ESPERAVA | EU ORIGINALMENTE ESPERAVA |
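
One common way to quantify the gap between a reference and a prediction is word error rate (WER): the word-level edit distance divided by the number of reference words. The sketch below is only an illustration applied to three rows from the table; the function names and the punctuation stripping are assumptions made here, not the procedure used to benchmark the model.

```python
import re


def words(text):
    # Assumed normalization: uppercase and drop sentence punctuation,
    # keeping hyphens so that "TRANCÁ-LOS" stays one word.
    return re.sub(r"[.,!?;:]", "", text.upper()).split()


def word_error_rate(reference, prediction):
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = words(reference), words(prediction)
    if not ref:
        raise ValueError("reference must contain at least one word")
    prev = list(range(len(hyp) + 1))  # distances against an empty reference
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from prediction
            insertion = curr[j - 1] + 1  # extra word in prediction
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


samples = [
    ("EU SOU O SENHOR", "EU SOU O SENHOR"),
    ("O YOUTUBE AINDA É A MELHOR PLATAFORMA DE VÍDEOS.",
     "YOUTUBE AINDA É A MELHOR PLATAFOMA DE VÍDEOS"),
    ("TRANCÁ-LOS", "TRANCAUVOS"),
]
for reference, prediction in samples:
    print(f"{word_error_rate(reference, prediction):.2f}  {reference}")
```

Under this normalization the exact match scores 0, the YouTube sentence scores 2/9 (one dropped word and one misspelled word out of nine), and the single-word miss scores 1.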

Usage Notes

When using this model via the gigarouter API, ensure that submitted audio files are sampled at 16 kHz. The model expects single-channel, 16-bit PCM audio. It performs best on clean, reasonably paced Portuguese speech and may show reduced accuracy on very noisy or overlapping audio.
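
The format requirements above can be checked locally before uploading. The following is a minimal sketch using Python's standard `wave` module; the function name `check_wav` and the wording of the messages are illustrative choices, and it only handles uncompressed WAV files.

```python
import wave

EXPECTED_RATE_HZ = 16_000        # 16 kHz sampling rate
EXPECTED_CHANNELS = 1            # single channel (mono)
EXPECTED_SAMPLE_WIDTH_BYTES = 2  # 16-bit PCM


def check_wav(path_or_file):
    """Return a list of format problems; an empty list means the file fits."""
    problems = []
    with wave.open(path_or_file, "rb") as wav:
        if wav.getframerate() != EXPECTED_RATE_HZ:
            problems.append(
                f"sample rate is {wav.getframerate()} Hz, expected {EXPECTED_RATE_HZ} Hz"
            )
        if wav.getnchannels() != EXPECTED_CHANNELS:
            problems.append(
                f"{wav.getnchannels()} channels found, expected {EXPECTED_CHANNELS}"
            )
        if wav.getsampwidth() != EXPECTED_SAMPLE_WIDTH_BYTES:
            problems.append(
                f"sample width is {8 * wav.getsampwidth()} bits, expected 16 bits"
            )
    return problems
```

Calling `check_wav("clip.wav")` (a hypothetical file name) returns an empty list when the file is 16 kHz, mono, 16-bit PCM, and otherwise lists what would need to be converted first.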

not yet live

We're benchmarking and onboarding wav2vec2-large-xlsr-53-portuguese as a hosted, OpenAI-compatible API.
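
Because the hosted API is not live yet, the following is only a sketch of what a request to an OpenAI-compatible transcription endpoint could look like. It builds, but does not send, a multipart `POST` to `/audio/transcriptions`, the path used by the OpenAI audio transcription API. The base URL, the API key, the model identifier string, and the function name `build_transcription_request` are all placeholders assumed for illustration; the real values may differ once the model is available.

```python
import urllib.request
import uuid


def build_transcription_request(base_url, api_key, model, wav_bytes,
                                filename="audio.wav"):
    """Build (without sending) a multipart transcription request."""
    boundary = uuid.uuid4().hex
    body = b"".join([
        # Form field naming the model to use.
        f"--{boundary}\r\n".encode(),
        b'Content-Disposition: form-data; name="model"\r\n\r\n',
        model.encode() + b"\r\n",
        # Form field carrying the audio file itself.
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode(),
        b"Content-Type: audio/wav\r\n\r\n",
        wav_bytes + b"\r\n",
        f"--{boundary}--\r\n".encode(),
    ])
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/audio/transcriptions",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


# Placeholder values only: none of these are confirmed gigarouter settings.
request = build_transcription_request(
    base_url="https://api.example.invalid/v1",
    api_key="YOUR_API_KEY",
    model="jonatasgrosman/wav2vec2-large-xlsr-53-portuguese",
    wav_bytes=b"",  # replace with the bytes of a 16 kHz mono WAV file
)
print(request.get_method(), request.full_url)
```

Once real values are known, passing the request to `urllib.request.urlopen` would send it; an OpenAI-compatible server typically answers with a JSON object containing the transcribed text.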