crossencoder-camembert-base-mmarcoFR
antoinelouis/crossencoder-camembert-base-mmarcoFR
A popular open reranker model, with 185K downloads a month. gigarouter benchmarks and hosts it as an OpenAI-compatible API.
About This Model
This cross-encoder model is designed for French-language reranking. It takes a query-passage pair, encodes the two texts jointly so that attention runs across both, and outputs a relevance score between 0 and 1. In a retrieval pipeline, it reorders an initial set of candidate passages (produced by a first-stage retriever such as BM25 or a dense bi-encoder) so that the most relevant results come first.
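The reranking step described above can be sketched as follows. This is a minimal illustration, not the hosted API: `rerank` and `overlap_score` are hypothetical names, and the toy word-overlap scorer only stands in for the model so that the snippet runs without downloading weights. The commented-out lines at the end assume the `sentence-transformers` library and its `CrossEncoder` class, whose `predict` method accepts a list of (query, passage) pairs.

```python
from typing import Callable, List, Sequence, Tuple

# A scorer maps a batch of (query, passage) pairs to one relevance score per pair.
ScoreFn = Callable[[Sequence[Tuple[str, str]]], Sequence[float]]


def rerank(query: str, passages: Sequence[str], score_fn: ScoreFn) -> List[Tuple[str, float]]:
    """Score every (query, passage) pair and sort passages by descending score."""
    pairs = [(query, p) for p in passages]
    scores = score_fn(pairs)
    ranked = sorted(zip(passages, scores), key=lambda item: item[1], reverse=True)
    return [(p, float(s)) for p, s in ranked]


def overlap_score(pairs: Sequence[Tuple[str, str]]) -> List[float]:
    """Toy stand-in scorer (NOT the model): fraction of query words found in the passage."""
    scores = []
    for query, passage in pairs:
        q_words = set(query.lower().split())
        p_words = set(passage.lower().split())
        scores.append(len(q_words & p_words) / max(len(q_words), 1))
    return scores


# Candidates as they might come back from a first-stage retriever.
candidates = [
    "Le camembert est un fromage de Normandie.",
    "Paris est la capitale de la France.",
    "La tour Eiffel se trouve à Paris.",
]
results = rerank("capitale de la France", candidates, overlap_score)
for passage, score in results:
    print(f"{score:.2f}  {passage}")

# With the real model (assumes `pip install sentence-transformers` and network access):
# from sentence_transformers import CrossEncoder
# model = CrossEncoder("antoinelouis/crossencoder-camembert-base-mmarcoFR")
# results = rerank("capitale de la France", candidates, model.predict)
```

Passing the scorer in as a function keeps the sorting logic independent of how scores are produced, so the same `rerank` helper would work with a local model or a remote scoring service.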
Key Strengths
- Trained on mMARCO-fr (8.8M passages, 539K training queries) with hard negatives mined from 12 different dense retrievers, producing 2.6M training triplets with a balanced positive-to-negative ratio of 1:1.
- Initialized from CamemBERT-base and fine-tuned using a binary cross-entropy loss (monoBERT approach) on an NVIDIA H100 GPU for 20k steps (batch size 128, learning rate 2e-5).
- Maximum sequence length of 256 tokens for concatenated query-passage pairs.
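To make the training setup in the bullets above concrete, the sketch below shows one way (query, positive, negative) triplets can be unrolled into labeled pairs with a 1:1 positive-to-negative ratio and scored with binary cross-entropy, in the spirit of the monoBERT approach. It is a pure-Python illustration, not the author's training code: `triplets_to_pairs` and `bce_loss` are hypothetical helper names, and the logits are made-up numbers rather than model outputs.

```python
import math
from typing import List, Sequence, Tuple


def triplets_to_pairs(
    triplets: Sequence[Tuple[str, str, str]]
) -> List[Tuple[str, str, float]]:
    """Unroll each (query, positive, negative) triplet into two labeled pairs.

    Every triplet yields one relevant pair (label 1.0) and one irrelevant pair
    (label 0.0), so the positive-to-negative ratio is 1:1 by construction.
    """
    pairs = []
    for query, positive, negative in triplets:
        pairs.append((query, positive, 1.0))
        pairs.append((query, negative, 0.0))
    return pairs


def bce_loss(logit: float, label: float) -> float:
    """Binary cross-entropy on a raw logit, written in a numerically stable form.

    Equivalent to -[y * log(sigmoid(z)) + (1 - y) * log(1 - sigmoid(z))].
    """
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))


triplets = [
    ("capitale de la France", "Paris est la capitale.", "Le camembert est un fromage."),
]
pairs = triplets_to_pairs(triplets)

# Made-up logits standing in for the model's raw outputs on the two pairs.
example_logits = [2.0, -1.5]
losses = [bce_loss(z, label) for z, (_, _, label) in zip(example_logits, pairs)]
mean_loss = sum(losses) / len(losses)
print(f"mean BCE loss over {len(pairs)} pairs: {mean_loss:.4f}")
```

At inference time, applying a sigmoid to the same logit gives the relevance score between 0 and 1.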
Best For
French semantic search pipelines that require precise second-stage reranking. The model is best suited to cases where the first-stage retriever's ranking is not accurate enough on its own, whether the collection is general-purpose or domain-specific.
Benchmark Performance
The model is evaluated on the mMARCO-fr development set: 6,980 queries, each paired with a candidate set of 1,000 passages that contains the positive passages alongside ColBERTv2 hard negatives. Reported metrics are Mean Reciprocal Rank (MRR) and Recall at various cut-offs (R@k). For a comparison against other French neural retrievers, see the DécouvrIR leaderboard.
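For readers unfamiliar with these metrics, here is a minimal sketch of how MRR and R@k can be computed from reranked passage lists. The function names, the toy passage IDs, and the cut-off of 10 for MRR are assumptions made for illustration; this is not the evaluation script used for the reported results.

```python
from typing import Dict, Sequence, Set


def reciprocal_rank(ranked_ids: Sequence[str], relevant: Set[str], k: int = 10) -> float:
    """Return 1/rank of the first relevant passage within the top k, else 0."""
    for rank, pid in enumerate(ranked_ids[:k], start=1):
        if pid in relevant:
            return 1.0 / rank
    return 0.0


def recall_at_k(ranked_ids: Sequence[str], relevant: Set[str], k: int) -> float:
    """Return the fraction of relevant passages that appear in the top k."""
    if not relevant:
        return 0.0
    hits = len(set(ranked_ids[:k]) & relevant)
    return hits / len(relevant)


def mean_over_queries(values: Sequence[float]) -> float:
    """Average a per-query metric over all queries."""
    return sum(values) / len(values) if values else 0.0


# Toy data: reranked passage IDs per query, and the relevant IDs per query.
runs: Dict[str, Sequence[str]] = {
    "q1": ["p3", "p1", "p7"],
    "q2": ["p9", "p4", "p2"],
}
qrels: Dict[str, Set[str]] = {
    "q1": {"p1"},
    "q2": {"p2", "p5"},
}

mrr_at_10 = mean_over_queries([reciprocal_rank(runs[q], qrels[q], k=10) for q in runs])
recall_at_2 = mean_over_queries([recall_at_k(runs[q], qrels[q], k=2) for q in runs])
print(f"MRR@10 = {mrr_at_10:.4f}, R@2 = {recall_at_2:.4f}")
```

MRR rewards placing the first relevant passage as high as possible, while R@k measures how many of the relevant passages survive into the top k.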