Contriever

facebook/contriever

published Mar 2022 · updated Jan 2022

Contriever is an unsupervised dense information retrieval model trained with contrastive learning for zero-shot passage and document search.

status: coming soon
API providers: 0
downloads / mo: 7.3M

specs

Task: Dense Retrieval
Architecture: BERT-based transformer encoder
Training Data: CC-net and English Wikipedia (unsupervised)

about this model

facebook/contriever is an unsupervised dense information retrieval model trained with contrastive learning, as described in the paper "Towards Unsupervised Dense Information Retrieval with Contrastive Learning" (arXiv:2112.09118). It produces sentence embeddings that can be compared via dot product for retrieval tasks, without requiring any supervised training data.
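To make "trained with contrastive learning" concrete, here is a minimal, library-free sketch of an InfoNCE-style contrastive loss for one query, one positive passage, and a set of negatives, scored by dot product. The function names and the temperature value of 0.05 are illustrative assumptions, not settings confirmed by this page.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def info_nce_loss(query, positive, negatives, temperature=0.05):
    """InfoNCE-style loss: -log softmax score of the positive among all candidates.

    temperature=0.05 is an assumed default for illustration only.
    """
    logits = [dot(query, positive) / temperature]
    logits += [dot(query, neg) / temperature for neg in negatives]
    # Numerically stable log-sum-exp over the positive and all negatives.
    peak = max(logits)
    log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_denominator - logits[0]


# The loss is small when the positive scores highest, large when a negative does.
aligned = info_nce_loss([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0], [-1.0, 0.0]], temperature=1.0)
misaligned = info_nce_loss([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0], [-1.0, 0.0]], temperature=1.0)
```

Minimizing this loss pulls a query toward its positive passage and pushes it away from the negatives, which is what allows training without labeled relevance data.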

Key strengths

The unsupervised Contriever is competitive with BM25 on the BEIR benchmark: on Recall@100, it outperforms BM25 on 11 out of 15 datasets. Fine-tuning on MS MARCO (contriever-msmarco) improves retrieval recall substantially; on NaturalQuestions, for example, R@5 rises from 47.8 to 65.7. A multilingual version, mcontriever, is pre-trained on 29 languages using CC-net data and supports cross-lingual retrieval across different scripts.

Benchmark results

Performance on NaturalQuestions (R@k):

Model               R@5   R@20  R@100
Contriever          47.8  67.8  82.1
Contriever-msmarco  65.7  79.6  88.0

Performance on TriviaQA (R@k):

Model               R@5   R@20  R@100
Contriever          59.4  67.8  83.2
Contriever-msmarco  71.3  80.4  85.7
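For readers unfamiliar with R@k, the sketch below shows one common way it is computed for open-domain QA: the percentage of questions for which at least one of the top-k retrieved passages contains an answer. This definition and the `recall_at_k` helper are assumptions for illustration; the page does not spell out the exact evaluation protocol.

```python
def recall_at_k(ranked_hits, k):
    """Percentage of questions with at least one answer-bearing passage in the top k.

    ranked_hits: one list per question, in rank order, where True means the
    passage at that rank contains an answer (hypothetical input format).
    """
    if not ranked_hits:
        raise ValueError("need at least one question")
    answered = sum(1 for flags in ranked_hits if any(flags[:k]))
    return 100.0 * answered / len(ranked_hits)


# Four toy questions with three retrieved passages each.
toy_hits = [
    [False, True, False],
    [False, False, True],
    [False, False, False],
    [True, False, False],
]
recall_1 = recall_at_k(toy_hits, 1)  # 25.0
recall_3 = recall_at_k(toy_hits, 3)  # 75.0
```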

BEIR evaluation additionally uses nDCG@10 across datasets including MS MARCO, TREC-Covid, NFCorpus, and others. Pre-computed Wikipedia passage embeddings for both Contriever and Contriever-msmarco are available for download.
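As a reference for the nDCG@10 metric mentioned above, here is a small sketch using the linear-gain form of DCG. It assumes the input list holds graded relevance labels for every judged document of a query, in ranked order; the function names are illustrative, and real evaluations normally rely on an established evaluation toolkit rather than hand-rolled code.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the first k relevance labels (linear gain)."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k=10):
    """DCG of the given ranking divided by the DCG of the ideal (sorted) ranking."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


perfect = ndcg_at_k([3, 2, 1, 0])   # already ideally ordered
swapped = ndcg_at_k([0, 1])         # the relevant document is ranked second
```

A perfectly ordered list scores 1.0, and pushing relevant documents lower in the ranking reduces the score logarithmically.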

Model variants hosted by Gigarouter

Four pre-trained variants exist: the unsupervised contriever, contriever-msmarco (fine-tuned on MS MARCO), mcontriever (multilingual, 29 languages), and mcontriever-msmarco. Once onboarded, they are intended to be served through the gigarouter API as an OpenAI-compatible endpoint, so no local installation or pooling logic is required.

FAQ

What task is Contriever designed for?

Contriever performs dense retrieval – it maps queries and passages to dense vectors and retrieves the most relevant passages by dot-product similarity.
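The retrieval step described here can be sketched in a few lines. The vectors below are toy stand-ins for real embeddings, and `rank_passages` is a hypothetical helper; a production system would typically use a vector index rather than a full scan.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def rank_passages(query_vec, passage_vecs, top_k=3):
    """Return (passage_index, score) pairs for the top_k highest dot products."""
    scored = [(dot(query_vec, vec), idx) for idx, vec in enumerate(passage_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(idx, score) for score, idx in scored[:top_k]]


# Toy 2-dimensional embeddings standing in for encoder outputs.
query = [1.0, 0.5]
passages = [[0.9, 0.1], [0.0, 1.0], [1.0, 1.0]]
ranking = rank_passages(query, passages, top_k=2)
best_index = ranking[0][0]  # passage 2 has the largest dot product with the query
```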

How does Contriever compare to BM25?

On the BEIR benchmark, which evaluates retrieval in a zero-shot setting, unsupervised Contriever outperforms BM25 on 11 out of 15 datasets for Recall@100.

What input format does the model expect?

The model accepts text strings (queries or passages). Tokenize them with the Hugging Face tokenizer using padding and truncation, run the encoder, then mean-pool the token embeddings over non-padding positions to obtain sentence embeddings.
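The pooling step is the part most often gotten wrong, because padding tokens must be excluded from the average. Below is a library-free sketch of masked mean pooling for a single sequence; the token vectors are toy values standing in for encoder outputs, and in practice the same logic would be applied to batched tensors.

```python
def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose attention-mask entry is 1 (non-padding)."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    kept = 0
    for vec, keep in zip(token_embeddings, attention_mask):
        if keep:
            kept += 1
            for j, value in enumerate(vec):
                total[j] += value
    if kept == 0:
        raise ValueError("attention mask selects no tokens")
    return [component / kept for component in total]


# Two real tokens followed by one padding token that must be ignored.
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
sentence_embedding = mean_pool(tokens, mask)  # [2.0, 3.0]
```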

How can I call Contriever via the gigarouter API?

Once the model is live, use the OpenAI-compatible endpoint with your gigarouter API key. Send a request containing the model name and the input text; the API returns the embeddings.
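A rough sketch of what such a call could look like, following the general shape of OpenAI-style embeddings requests. The base URL below is a hypothetical placeholder, the model identifier is assumed to match the name shown on this page, and the response layout is assumed to follow the usual `data[i].embedding` convention; none of these are confirmed here, and the model is not yet live. The sketch only builds the request and parses a response, it does not send anything.

```python
import json
import urllib.request

# Hypothetical placeholder; substitute the real base URL from the API docs.
BASE_URL = "https://api.gigarouter.example/v1"


def build_embedding_request(texts, api_key, model="facebook/contriever"):
    """Build (but do not send) an OpenAI-style embeddings request."""
    payload = {"model": model, "input": texts}
    return urllib.request.Request(
        url=f"{BASE_URL}/embeddings",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def extract_embeddings(response_body):
    """Pull vectors out of an OpenAI-style response, ordered by input index."""
    items = sorted(response_body["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


request = build_embedding_request(["what is dense retrieval?"], api_key="YOUR_API_KEY")
```

Sending the request would be a matter of passing it to `urllib.request.urlopen` and decoding the JSON body before calling `extract_embeddings`.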

Are there fine-tuned versions available?

Yes, Contriever-msmarco is fine-tuned on MS MARCO for better retrieval on that domain. Multilingual mContriever and mContriever-msmarco are also available.
