w2v-bert-2.0

facebook/w2v-bert-2.0

A popular open speech embeddings model with 3.7M downloads a month. gigarouter is benchmarking it and plans to host it as an OpenAI-compatible API.

est. price: ~$0.008 / 1M tokens (estimated, set at launch)

downloads / mo: 3.7M

license: MIT
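
To put the estimated price in context, here is a minimal sketch of the arithmetic. It assumes billing is linear in tokens at the estimated rate; the page does not say how audio is counted in tokens, and the function name `estimate_cost_usd` is illustrative only.

```python
from decimal import Decimal

# Estimated price from this page; the final price is set at launch.
EST_PRICE_PER_MILLION_TOKENS = Decimal("0.008")


def estimate_cost_usd(
    tokens: int,
    price_per_million: Decimal = EST_PRICE_PER_MILLION_TOKENS,
) -> Decimal:
    """Return the estimated cost in USD for a number of billed tokens."""
    if tokens < 0:
        raise ValueError("tokens must be non-negative")
    # Decimal avoids binary floating-point rounding on small prices.
    return price_per_million * Decimal(tokens) / Decimal(1_000_000)


print(estimate_cost_usd(250_000_000))
```

At the estimated rate, 250M tokens would come to about $2.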

about this model

W2v-BERT 2.0 is a Conformer-based speech encoder with 600M parameters that serves as the core of Meta’s Seamless communication models. It is pre-trained on 4.5 million hours of unlabeled audio covering more than 143 languages. The model produces audio embeddings, and it must be fine-tuned before it can be used for downstream tasks such as automatic speech recognition (ASR) or audio classification.
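
As a rough illustration of what "audio embeddings" means here, the sketch below runs the pre-trained encoder locally with the Hugging Face `transformers` classes `AutoFeatureExtractor` and `Wav2Vec2BertModel`, then averages the per-frame outputs into one vector per clip. It assumes `torch` and a recent `transformers` release are installed and that the input is a mono waveform sampled at 16 kHz. The names `embed_clip` and `mean_pool` are illustrative, and mean pooling is one common choice for a clip-level vector, not something this page specifies.

```python
from typing import List, Sequence


def mean_pool(frames: Sequence[Sequence[float]]) -> List[float]:
    """Average per-frame vectors into a single clip-level vector."""
    if not frames:
        raise ValueError("need at least one frame")
    dim = len(frames[0])
    return [sum(frame[i] for frame in frames) / len(frames) for i in range(dim)]


def embed_clip(waveform: Sequence[float], sampling_rate: int = 16_000) -> List[float]:
    """Encode one clip with the pre-trained model and mean-pool the frames.

    Heavy dependencies are imported lazily so mean_pool works without them.
    """
    import torch
    from transformers import AutoFeatureExtractor, Wav2Vec2BertModel

    model_id = "facebook/w2v-bert-2.0"
    extractor = AutoFeatureExtractor.from_pretrained(model_id)
    model = Wav2Vec2BertModel.from_pretrained(model_id)
    model.eval()

    # The extractor converts the raw waveform into model input features.
    inputs = extractor(waveform, sampling_rate=sampling_rate, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]  # (frames, hidden_size)
    return mean_pool(hidden.tolist())
```

Calling `embed_clip` downloads the checkpoint on first use. The resulting vector comes from the pre-trained encoder only, so it is a starting point for fine-tuning rather than a task-ready output.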

Key Strengths

Large-scale multilingual pre-training (143+ languages) enables strong cross-lingual representation learning.
Conformer architecture combines convolution and self-attention for efficient sequence modeling.
Proven in Seamless models for speech-to-speech translation and other audio tasks.

Best For

Developers building custom speech recognition, speaker identification, language identification, or audio classification systems that benefit from a robust, pre-trained encoder. The model is particularly suited for multilingual or low-resource language scenarios due to its broad language coverage.

Model Specifications

Model Name	#params	Checkpoint
W2v-BERT 2.0	600M	checkpoint

Once live, gigarouter will host this model as a managed, OpenAI-compatible API. No installation or local setup will be required; you call the endpoint to generate embeddings.
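
OpenAI-compatible embeddings APIs conventionally take a JSON body with `model` and `input` fields, sent as a POST with a bearer token. The sketch below builds such a request without sending it. The base URL, the model id string, and the way audio is passed in `input` are all assumptions; this page does not specify any of them.

```python
import json
import urllib.request

# Hypothetical values: no base URL or API model id has been published.
BASE_URL = "https://api.example.invalid/v1"
MODEL_ID = "facebook/w2v-bert-2.0"


def build_embeddings_request(audio_input: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style POST /embeddings request."""
    # The encoding expected for audio in "input" is an assumption here.
    body = json.dumps({"model": MODEL_ID, "input": audio_input}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/embeddings",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_embeddings_request("<audio reference or encoded clip>", "YOUR_API_KEY")
print(request.get_method(), request.full_url)
```

Keeping request construction separate from sending makes it easy to swap in the real base URL and input format once they are documented.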

not yet live

We're benchmarking and onboarding w2v-bert-2.0 as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.