ms-marco-MiniLM-L6-v2
cross-encoder/ms-marco-MiniLM-L6-v2
A widely used default cross-encoder reranker for RAG: tiny, fast, and one of the most-downloaded models on the Hugging Face Hub. We host it as a production API, so you don't need to stand up a GPU to call it.
About this model
This cross-encoder model, trained on the MS MARCO Passage Ranking task, reranks passages for information retrieval. Given a query and a set of candidate passages (e.g., from a first-stage retriever such as Elasticsearch), it scores each query-passage pair and sorts the passages by relevance. It is best suited to retrieval-augmented generation (RAG) pipelines that need a fast, accurate reranker to improve the top results.
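The reranking step described above amounts to scoring every (query, passage) pair and sorting by score. The sketch below shows that control flow in Python under stated assumptions: `overlap_score` is a hypothetical stand-in scorer (simple token overlap) so the example runs offline without downloading the model. In a real pipeline you would replace it with the cross-encoder's scores, for example `CrossEncoder("cross-encoder/ms-marco-MiniLM-L6-v2").predict(pairs)` from the sentence-transformers library. The `rerank` helper and its `top_k` parameter are illustrative names, not part of any API.

```python
from typing import Callable, List, Sequence, Tuple


def overlap_score(query: str, passage: str) -> float:
    """Hypothetical stand-in scorer: fraction of query tokens found in the passage.

    A real cross-encoder reads query and passage jointly and returns a learned
    relevance score; this toy function only mimics that interface.
    """
    q_tokens = set(query.lower().split())
    p_tokens = set(passage.lower().replace(".", "").split())
    if not q_tokens:
        return 0.0
    return len(q_tokens & p_tokens) / len(q_tokens)


def rerank(
    query: str,
    passages: Sequence[str],
    score_fn: Callable[[str, str], float] = overlap_score,
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Score each (query, passage) pair, sort by descending score, keep top_k."""
    scored = [(p, score_fn(query, p)) for p in passages]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    # Candidates as they might arrive from a first-stage retriever (e.g. BM25).
    candidates = [
        "Bananas are yellow.",
        "Paris is the capital of France.",
        "France borders Spain.",
    ]
    for passage, score in rerank("capital of France", candidates):
        print(f"{score:.2f}  {passage}")
```

In practice a cross-encoder scores all pairs in one batched call so the GPU can process them together; the per-pair `score_fn` here just keeps the sketch minimal.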
Key strengths are high throughput (about 1,800 docs/sec on a V100 GPU) and strong benchmark performance on both TREC Deep Learning 2019 and MS MARCO Passage Reranking. The following table compares this model (MiniLM-L6-v2) with other version 2 cross-encoders in the same family.
| Model name | NDCG@10 (TREC DL 19) |
|---|---|
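To translate the quoted throughput into a per-query budget: at roughly 1,800 docs/sec, reranking the top 100 candidates from a first-stage retriever takes about 100 / 1800 ≈ 0.056 s of model time. The helper below does that arithmetic. It assumes throughput scales linearly with the number of documents and ignores network round-trips, tokenization, and batching effects, so treat the result as a rough estimate of model time, not end-to-end latency. The function name `rerank_seconds` is hypothetical.

```python
def rerank_seconds(num_docs: int, docs_per_sec: float = 1800.0) -> float:
    """Estimate model time to score num_docs query-passage pairs.

    Assumption: linear scaling at the quoted V100 throughput; excludes
    network, tokenization, and queueing overhead.
    """
    if docs_per_sec <= 0:
        raise ValueError("docs_per_sec must be positive")
    return num_docs / docs_per_sec


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(f"{n:>5} docs -> {rerank_seconds(n) * 1000:.1f} ms")
```

For 100 candidates this gives about 55.6 ms, which is why rerankers are typically applied only to a short candidate list rather than the whole corpus.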
```shell
# Rerank documents by relevance; billed per document.
curl https://gigarouter.ai/v1/rerank \
  -H "Authorization: Bearer $GR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "cross-encoder/ms-marco-MiniLM-L6-v2", "query": "capital of France",
       "documents": ["Paris is the capital of France.", "Bananas are yellow."]}'
```
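The same request can be issued from Python using only the standard library. This sketch mirrors the curl example: it reuses its endpoint, model ID, and `GR_KEY` environment variable, and assumes the API accepts a JSON body (hence the explicit `Content-Type` header). The response format is not documented here, so the sketch returns the decoded JSON without assuming any field names. `build_rerank_request` and `rerank_remote` are hypothetical helper names.

```python
import json
import os
import urllib.request
from typing import Any, Sequence

RERANK_URL = "https://gigarouter.ai/v1/rerank"
MODEL_ID = "cross-encoder/ms-marco-MiniLM-L6-v2"


def build_rerank_request(
    query: str, documents: Sequence[str], api_key: str
) -> urllib.request.Request:
    """Build (but do not send) a POST request matching the curl example."""
    payload = {"model": MODEL_ID, "query": query, "documents": list(documents)}
    return urllib.request.Request(
        RERANK_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            # Assumption: the endpoint expects a JSON request body.
            "Content-Type": "application/json",
        },
        method="POST",
    )


def rerank_remote(query: str, documents: Sequence[str]) -> Any:
    """Send the request and return the decoded JSON body (schema not assumed)."""
    req = build_rerank_request(query, documents, os.environ["GR_KEY"])
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)
```

With `GR_KEY` set, calling `rerank_remote("capital of France", ["Paris is the capital of France.", "Bananas are yellow."])` sends the same request as the curl command. Keeping request construction separate from sending makes the payload easy to inspect without network access.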