
wav2vec2-large-xlsr-53-portuguese

jonatasgrosman/wav2vec2-large-xlsr-53-portuguese

A popular open speech-to-text model with 3.2M downloads a month. gigarouter is benchmarking it and plans to host it as an OpenAI-compatible API.

status: coming soon
API providers:
downloads / mo: 3.2M
license: apache-2.0

about this model

The model is a version of facebook/wav2vec2-large-xlsr-53 fine-tuned for automatic speech recognition (ASR) in Portuguese, using the train and validation splits of Common Voice 6.1. It expects audio sampled at 16 kHz. Once onboarding is complete, gigarouter will host it as an OpenAI-compatible API, so you will not need to deploy it locally.
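If the hosted endpoint follows the OpenAI API shape, a transcription call should be a multipart POST to `/audio/transcriptions` carrying `model` and `file` form fields. The sketch below builds such a request with only the Python standard library and does not send it. `BASE_URL` and `MODEL_ID` are hypothetical placeholders, since gigarouter has not published either, and the sketch assumes gigarouter implements the transcription endpoint.

```python
import urllib.request
import uuid

# Hypothetical values: gigarouter has not published an endpoint or model ID yet.
BASE_URL = "https://api.gigarouter.example/v1"  # placeholder host (.example is reserved)
MODEL_ID = "jonatasgrosman/wav2vec2-large-xlsr-53-portuguese"  # assumed model ID


def build_transcription_request(
    audio: bytes,
    filename: str,
    api_key: str,
    base_url: str = BASE_URL,
    model: str = MODEL_ID,
) -> urllib.request.Request:
    """Build (without sending) a multipart POST to an OpenAI-style
    /audio/transcriptions endpoint with `model` and `file` form fields."""
    boundary = uuid.uuid4().hex
    # Plain form field naming the model.
    model_part = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="model"\r\n\r\n'
        f"{model}\r\n"
    ).encode()
    # File form field header; the raw WAV bytes follow it.
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: audio/wav\r\n\r\n"
    ).encode()
    closing = f"\r\n--{boundary}--\r\n".encode()
    body = model_part + file_header + audio + closing
    return urllib.request.Request(
        f"{base_url}/audio/transcriptions",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


if __name__ == "__main__":
    # Placeholder bytes stand in for a real 16 kHz mono WAV file.
    req = build_transcription_request(b"RIFF....", "sample.wav", "YOUR_API_KEY")
    print(req.get_method(), req.full_url)
```

Once an endpoint exists, passing the request to `urllib.request.urlopen` would send it; an OpenAI-compatible server normally answers with JSON containing a `text` field. With the official `openai` Python package, the equivalent call would be `client.audio.transcriptions.create(model=..., file=...)` on a client created with the gigarouter base URL, again assuming the endpoint is implemented.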

This specialist model is best suited for transcribing spoken Portuguese into text. It builds on XLSR-53, a large wav2vec 2.0 model pre-trained with self-supervision on unlabeled speech in 53 languages, and adapts it to Portuguese by fine-tuning on labeled data. Accuracy varies with the input: in the sample transcriptions, several sentences come out exactly right, while others contain misrecognized or misspelled words.
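Outputs such as TRANCAUVOS in the samples are assembled character by character. The model card's own example loads this checkpoint in Hugging Face transformers as a `Wav2Vec2ForCTC` model, which emits one score vector per audio frame over a character vocabulary; greedy CTC decoding then takes the best character per frame, merges consecutive repeats, and drops the blank token. The sketch below shows only that decoding step in plain Python. The seven-symbol `VOCAB` is hypothetical (the real vocabulary comes from the model's processor), and treating `<pad>` as the blank and `|` as the word separator is assumed to match the `Wav2Vec2CTCTokenizer` defaults.

```python
from typing import List, Sequence

# Hypothetical toy vocabulary, NOT the model's real vocab.json.
# Index 0 ("<pad>") acts as the CTC blank; "|" separates words.
VOCAB = ["<pad>", "|", "E", "I", "O", "T", "U"]
BLANK_ID = 0
WORD_DELIMITER = "|"


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]]) -> str:
    """Greedy CTC: argmax per frame, merge consecutive repeats, drop blanks."""
    best = [max(range(len(row)), key=row.__getitem__) for row in frame_scores]
    chars: List[str] = []
    prev = None
    for idx in best:
        # A label is emitted only when it differs from the previous frame
        # and is not the blank.
        if idx != prev and idx != BLANK_ID:
            chars.append(VOCAB[idx])
        prev = idx
    # Split on the word delimiter and drop empty pieces to get spaced words.
    words = [w for w in "".join(chars).split(WORD_DELIMITER) if w]
    return " ".join(words)


def one_hot(i: int) -> List[float]:
    """Demo helper: a frame whose best label is VOCAB[i]."""
    return [1.0 if j == i else 0.0 for j in range(len(VOCAB))]


if __name__ == "__main__":
    frames = [one_hot(i) for i in [4, 4, 0, 3, 5, 5, 0, 4]]
    print(ctc_greedy_decode(frames))  # prints OITO
```

The blank is what lets CTC spell double letters: frames labelled `[O, O]` collapse to a single `O`, while `[O, <pad>, O]` decode to `OO`.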

Sample Transcriptions

The following table shows example predictions from the Common Voice test set, comparing reference sentences with model outputs:

Reference	Prediction
NEM O RADAR NEM OS OUTROS INSTRUMENTOS DETECTARAM O BOMBARDEIRO STEALTH.	NEMHUM VADAN OS OLTWES INSTRUMENTOS DE TTÉÃN UM BOMBERDEIRO OSTER
PEDIR DINHEIRO EMPRESTADO ÀS PESSOAS DA ALDEIA	E DIR ENGINHEIRO EMPRESTAR AS PESSOAS DA ALDEIA
OITO	OITO
TRANCÁ-LOS	TRANCAUVOS
REALIZAR UMA INVESTIGAÇÃO PARA RESOLVER O PROBLEMA	REALIZAR UMA INVESTIGAÇÃO PARA RESOLVER O PROBLEMA
O YOUTUBE AINDA É A MELHOR PLATAFORMA DE VÍDEOS.	YOUTUBE AINDA É A MELHOR PLATAFOMA DE VÍDEOS
MENINA E MENINO BEIJANDO NAS SOMBRAS	MENINA E MENINO BEIJANDO NAS SOMBRAS
EU SOU O SENHOR	EU SOU O SENHOR
DUAS MULHERES QUE SENTAM-SE PARA BAIXO LENDO JORNAIS.	DUAS MIERES QUE SENTAM-SE PARA BAICLANE JODNÓI
EU ORIGINALMENTE ESPERAVA	EU ORIGINALMENTE ESPERAVA
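Reference/prediction pairs like these are usually summarized with word error rate (WER): the minimum number of word substitutions, deletions, and insertions that turn the prediction into the reference, divided by the number of reference words. The sketch below computes it with a word-level Levenshtein distance. The normalization (uppercasing and stripping punctuation other than hyphens and apostrophes) is an assumption, and any figure it produces for these rows is illustrative, not a reported benchmark result for the model.

```python
import re
from typing import Iterable, List, Tuple


def normalize(text: str) -> List[str]:
    """Uppercase, drop punctuation other than hyphens/apostrophes, split into words."""
    return re.sub(r"[^\w\s'-]", "", text.upper()).split()


def word_edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Levenshtein distance over words, keeping one DP row at a time."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of a reference word
                curr[j - 1] + 1,         # insertion of a predicted word
                prev[j - 1] + (r != h),  # substitution, free when words match
            )
        prev = curr
    return prev[-1]


def wer(reference: str, prediction: str) -> float:
    """Sentence-level WER; the reference must contain at least one word."""
    ref, hyp = normalize(reference), normalize(prediction)
    return word_edit_distance(ref, hyp) / len(ref)


def corpus_wer(pairs: Iterable[Tuple[str, str]]) -> float:
    """Total edits divided by total reference words across all pairs."""
    edits = words = 0
    for reference, prediction in pairs:
        ref, hyp = normalize(reference), normalize(prediction)
        edits += word_edit_distance(ref, hyp)
        words += len(ref)
    return edits / words


if __name__ == "__main__":
    ref = "O YOUTUBE AINDA É A MELHOR PLATAFORMA DE VÍDEOS."
    hyp = "YOUTUBE AINDA É A MELHOR PLATAFOMA DE VÍDEOS"
    # 1 deletion (O) + 1 substitution (PLATAFOMA) over 9 reference words.
    print(f"{wer(ref, hyp):.3f}")  # prints 0.222
```

Corpus WER sums edits before dividing, so long sentences weigh more than short ones such as OITO; averaging per-sentence WER would give a different number.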

Usage Notes

When using this model through the gigarouter API, submit single-channel (mono), 16-bit PCM audio sampled at 16 kHz. The model performs best on clean, reasonably paced Portuguese speech; accuracy may drop on very noisy audio or overlapping speakers.
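A pre-upload check for this format can be written with Python's standard-library `wave` module, as in the sketch below. It assumes the audio is a WAV file; whether the API accepts other containers or resamples on its own is not stated here. The expected values come from the usage notes, while `wav_format_problems` and `silent_wav` are hypothetical helper names.

```python
import io
import wave
from typing import BinaryIO, List, Union

# Format from the usage notes: 16 kHz, mono, 16-bit PCM.
EXPECTED_RATE_HZ = 16_000
EXPECTED_CHANNELS = 1
EXPECTED_SAMPLE_WIDTH_BYTES = 2  # 16-bit samples


def wav_format_problems(source: Union[str, BinaryIO]) -> List[str]:
    """Return human-readable mismatches; an empty list means the file matches."""
    try:
        with wave.open(source, "rb") as wav:
            rate = wav.getframerate()
            channels = wav.getnchannels()
            width = wav.getsampwidth()
    except (wave.Error, EOFError) as exc:
        # wave only parses PCM WAV data; other containers or encodings fail here.
        return [f"not a readable PCM WAV file: {exc}"]
    problems = []
    if rate != EXPECTED_RATE_HZ:
        problems.append(f"sample rate is {rate} Hz, expected {EXPECTED_RATE_HZ} Hz")
    if channels != EXPECTED_CHANNELS:
        problems.append(f"{channels} channels, expected mono")
    if width != EXPECTED_SAMPLE_WIDTH_BYTES:
        problems.append(f"{8 * width}-bit samples, expected 16-bit")
    return problems


def silent_wav(rate: int, channels: int, width: int, frames: int = 160) -> io.BytesIO:
    """Demo helper: an in-memory WAV of silence in the given format."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as out:
        out.setnchannels(channels)
        out.setsampwidth(width)
        out.setframerate(rate)
        out.writeframes(b"\x00" * frames * channels * width)
    buf.seek(0)
    return buf


if __name__ == "__main__":
    print(wav_format_problems(silent_wav(16_000, 1, 2)))  # prints []
    print(wav_format_problems(silent_wav(44_100, 2, 2)))
```

A file that fails the check can be converted beforehand with any audio tool that resamples to 16 kHz mono 16-bit PCM.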

not yet live

We're benchmarking and onboarding wav2vec2-large-xlsr-53-portuguese as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.