w2v-bert-2.0
facebook/w2v-bert-2.0
A popular open embeddings model, with 3.7M downloads a month. gigarouter benchmarks and hosts it as an OpenAI-compatible API.
About This Model
W2v-BERT 2.0 is a Conformer-based speech encoder (600M parameters) that serves as the core of Meta’s Seamless communication models. It was pre-trained on 4.5 million hours of unlabeled audio covering more than 143 languages. The model outputs audio embeddings and must be fine-tuned before it can be used for downstream tasks such as automatic speech recognition (ASR) or audio classification.
Key Strengths
- Large-scale multilingual pre-training (143+ languages) enables strong cross-lingual representation learning.
- Conformer architecture combines convolution and self-attention for efficient sequence modeling.
- Proven in Seamless models for speech-to-speech translation and other audio tasks.
Best For
Developers building custom speech recognition, speaker identification, language identification, or audio classification systems that benefit from a robust, pre-trained encoder. The model is particularly suited for multilingual or low-resource language scenarios due to its broad language coverage.
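For utterance-level tasks such as speaker or language identification, a common approach is to pool the encoder's frame-level outputs into one vector per clip and then train a classifier head or compare clips directly. The following is a minimal sketch of masked mean pooling and cosine similarity in plain Python. The function names and the toy 3-dimensional frames are illustrative assumptions, not part of the model or of any hosted API; real encoder outputs are much wider.

```python
import math
from typing import List, Sequence


def mean_pool(frames: Sequence[Sequence[float]], mask: Sequence[int]) -> List[float]:
    """Average only the frames whose mask entry is 1, so padding is ignored."""
    if len(frames) != len(mask):
        raise ValueError("frames and mask must have the same length")
    kept = [frame for frame, keep in zip(frames, mask) if keep]
    if not kept:
        raise ValueError("mask selects no frames")
    dim = len(kept[0])
    return [sum(frame[i] for frame in kept) / len(kept) for i in range(dim)]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy stand-ins for frame-level embeddings (hypothetical values).
# The last frame of clip_a is padding and is masked out.
clip_a = [[1.0, 0.0, 2.0], [3.0, 0.0, 2.0], [9.0, 9.0, 9.0]]
mask_a = [1, 1, 0]
clip_b = [[2.0, 0.0, 2.0]]
mask_b = [1]

vec_a = mean_pool(clip_a, mask_a)
vec_b = mean_pool(clip_b, mask_b)
similarity = cosine_similarity(vec_a, vec_b)
```

Mean pooling is only one option; attention pooling or a learned head on top of the frames may work better after fine-tuning, depending on the task.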
Model Specifications
| Model Name | Parameters | Checkpoint |
|---|---|---|
| W2v-BERT 2.0 | 600M | checkpoint |
This model is hosted by gigarouter as a managed, OpenAI-compatible API. No installation or local setup is required; simply call the endpoint to generate embeddings.
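This page does not document the request format for audio input, so the sketch below covers only the response side. It assumes the endpoint returns the same JSON shape as OpenAI's embeddings API (a `data` list whose items carry `index` and `embedding`) and reports failures under an `error` key. The helper name `extract_embeddings` and the sample payload, including its `model` string, are hypothetical and hand-written, not real model output.

```python
import json
from typing import List


def extract_embeddings(response_text: str) -> List[List[float]]:
    """Return embedding vectors ordered by their `index` field.

    Assumes an OpenAI-style embeddings response body; this is an
    assumption about the hosted API, not documented behavior.
    """
    payload = json.loads(response_text)
    if "error" in payload:
        raise RuntimeError(f"API error: {payload['error']}")
    items = sorted(payload["data"], key=lambda item: item["index"])
    return [[float(x) for x in item["embedding"]] for item in items]


# Hand-written sample in the assumed response shape (hypothetical values).
sample_response = json.dumps({
    "object": "list",
    "model": "facebook/w2v-bert-2.0",
    "data": [
        {"object": "embedding", "index": 1, "embedding": [0.5, 0.25]},
        {"object": "embedding", "index": 0, "embedding": [1, 0]},
    ],
})

vectors = extract_embeddings(sample_response)
```

Sorting by `index` keeps each vector aligned with the input it came from, even if the items arrive out of order.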
We're benchmarking and onboarding w2v-bert-2.0 as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.