crossencoder-camembert-base-mmarcoFR

antoinelouis/crossencoder-camembert-base-mmarcoFR

A popular open reranker model with 185K downloads a month. gigarouter is benchmarking it and plans to host it as an OpenAI-compatible API.

est. price

~$0.008 / 1k docs · estimated, set at launch

API providers

downloads / mo

185K

license

MIT

about this model

This cross-encoder is designed for French-language reranking. It encodes a query and a passage together, so attention runs across both texts, and outputs a relevance score between 0 and 1. In a retrieval pipeline, it reorders the candidate passages returned by a first-stage retriever (such as BM25 or a dense bi-encoder) so that the most relevant results come first.
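The second-stage step described above can be sketched roughly as follows. This is an illustration only, not gigarouter's or the model author's implementation: `rerank` and `toy_score` are hypothetical names, and the word-overlap scorer merely stands in for the cross-encoder so that the snippet runs without downloading a model.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# A scorer takes (query, passage) pairs and returns one relevance score per pair.
ScoreFn = Callable[[Sequence[Tuple[str, str]]], Sequence[float]]


def rerank(query: str, passages: Sequence[str], score_fn: ScoreFn,
           top_k: Optional[int] = None) -> List[Tuple[str, float]]:
    """Reorder first-stage candidates by descending relevance score."""
    pairs = [(query, passage) for passage in passages]
    scores = score_fn(pairs)
    ranked = sorted(zip(passages, scores), key=lambda item: item[1], reverse=True)
    return ranked if top_k is None else ranked[:top_k]


def _words(text: str) -> set:
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    return {w.strip(".,;:!?") for w in text.lower().split()}


def toy_score(pairs: Sequence[Tuple[str, str]]) -> List[float]:
    """Stand-in scorer (assumption): share of query words found in the passage."""
    scores = []
    for query, passage in pairs:
        q_words = _words(query)
        scores.append(len(q_words & _words(passage)) / max(len(q_words), 1))
    return scores


# Candidates as a first-stage retriever might return them (made-up examples).
candidates = [
    "Le camembert est un fromage normand.",
    "La capitale de l'Italie est Rome.",
    "Paris est la capitale de la France.",
]
reranked = rerank("capitale de la France", candidates, toy_score, top_k=2)
for passage, score in reranked:
    print(f"{score:.2f}  {passage}")
```

To score with the actual model, one option is the sentence-transformers library: `CrossEncoder("antoinelouis/crossencoder-camembert-base-mmarcoFR").predict` accepts a list of (query, passage) pairs and could be passed as `score_fn`, assuming the library is installed and the weights can be downloaded.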

Key Strengths

Trained on mMARCO fr (8.8M passages, 539K training queries) with hard negatives mined from 12 different dense retrievers, producing 2.6M training triplets with a balanced positive-to-negative ratio of 1:1.
Initialized from CamemBERT-base and fine-tuned using a binary cross-entropy loss (monoBERT approach) on an NVIDIA H100 GPU for 20k steps (batch size 128, learning rate 2e-5).
Maximum sequence length of 256 tokens for concatenated query-passage pairs.
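To make the training objective above more concrete, the sketch below computes a binary cross-entropy loss in which each triplet contributes one positive pair (label 1) and one hard-negative pair (label 0), which is where the 1:1 ratio comes from. The logits are made-up numbers, and `bce_from_logits` and `batch_loss` are hypothetical helpers, not the author's training code.

```python
import math
from typing import List, Tuple


def sigmoid(x: float) -> float:
    """Map a raw logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def bce_from_logits(logit: float, label: int) -> float:
    """Binary cross-entropy for a single (query, passage) pair."""
    p = sigmoid(logit)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


def batch_loss(triplet_logits: List[Tuple[float, float]]) -> float:
    """Mean BCE over pairs; each item is (positive_logit, negative_logit)."""
    losses = []
    for pos_logit, neg_logit in triplet_logits:
        losses.append(bce_from_logits(pos_logit, 1))  # relevant passage
        losses.append(bce_from_logits(neg_logit, 0))  # hard negative
    return sum(losses) / len(losses)


# A model that separates positives from negatives gets a low loss...
good = batch_loss([(4.0, -4.0), (3.0, -2.0)])
# ...while one that ranks the negatives higher is penalized heavily.
bad = batch_loss([(-4.0, 4.0), (-3.0, 2.0)])
print(f"good={good:.4f}  bad={bad:.4f}")
```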

Best For

French semantic search pipelines that need precise second-stage reranking. The model is a good fit when a first-stage retriever alone does not rank results well enough, whether the collection is general-purpose or domain-specific French text.

Benchmark Performance

The model is evaluated on the development set of mMARCO-fr (6,980 queries, each with a candidate set of 1,000 passages containing the positives and ColBERTv2 hard negatives). Reported metrics are Mean Reciprocal Rank (MRR) and Recall at various cut-offs (R@k). For a comparison against other French neural retrievers, see the DécouvrIR leaderboard.
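For readers unfamiliar with these metrics, here is a minimal sketch of how MRR and R@k are commonly computed. The query and passage IDs are invented toy data, the function names are hypothetical, and the default cut-off of 10 for MRR is an assumption rather than something stated on this page.

```python
from typing import Dict, List, Set


def mrr_at_k(rankings: Dict[str, List[str]],
             positives: Dict[str, Set[str]], k: int = 10) -> float:
    """Mean over queries of 1/rank of the first relevant passage in the top k."""
    total = 0.0
    for qid, ranked_ids in rankings.items():
        for rank, pid in enumerate(ranked_ids[:k], start=1):
            if pid in positives[qid]:
                total += 1.0 / rank
                break  # only the first relevant hit counts
    return total / len(rankings)


def recall_at_k(rankings: Dict[str, List[str]],
                positives: Dict[str, Set[str]], k: int) -> float:
    """Mean over queries of the share of relevant passages found in the top k."""
    total = 0.0
    for qid, ranked_ids in rankings.items():
        relevant = positives[qid]
        total += len(relevant & set(ranked_ids[:k])) / len(relevant)
    return total / len(rankings)


# Toy reranker output: passage IDs ordered best-first for each query.
rankings = {"q1": ["p3", "p1", "p7"], "q2": ["p2", "p9", "p4"]}
positives = {"q1": {"p1"}, "q2": {"p4", "p8"}}

mrr = mrr_at_k(rankings, positives, k=10)
r_at_2 = recall_at_k(rankings, positives, k=2)
r_at_3 = recall_at_k(rankings, positives, k=3)
print(f"MRR@10={mrr:.3f}  R@2={r_at_2:.2f}  R@3={r_at_3:.2f}")
```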

not yet live

We're benchmarking and onboarding crossencoder-camembert-base-mmarcoFR as a hosted, OpenAI-compatible API.