ms-marco-MiniLM-L6-v2

cross-encoder/ms-marco-MiniLM-L6-v2

The default cross-encoder reranker for RAG: tiny, fast, and one of the most-downloaded models on the Hub. We host it as a production API, so you don't need to stand up a GPU to call it.

price: $0.008 / 1k docs
downloads / mo: 81.5M
throughput: 2.7K docs/s
license: apache-2.0
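To budget a workload, the listed price and throughput can be turned into a rough estimate. A minimal sketch, assuming billing is strictly linear in the number of documents (as per-document pricing suggests) and that the 2.7K docs/s figure holds as sustained throughput for your traffic; it ignores network latency, batching and concurrency. `estimate_rerank` is a hypothetical helper, not part of the API.

```shell
# Hypothetical helper: rough cost and time for a reranking workload,
# using the listed figures ($0.008 per 1,000 docs, 2.7K docs/s).
# $1 = number of queries, $2 = candidate documents reranked per query
estimate_rerank() {
  awk -v q="$1" -v k="$2" 'BEGIN {
    docs = q * k
    printf "documents: %d\n", docs
    printf "cost: $%.2f\n", docs / 1000 * 0.008
    printf "time at 2.7K docs/s: %.0f s\n", docs / 2700
  }'
}

# Example: 10,000 queries, top-100 candidates each
# estimate_rerank 10000 100
# documents: 1000000
# cost: $8.00
# time at 2.7K docs/s: 370 s
```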

about this model

This cross-encoder was trained on the MS MARCO Passage Ranking task and is used to rerank results for information retrieval. Given a query and a set of candidate passages (e.g., from a first-stage retriever such as Elasticsearch), it scores each query-passage pair and sorts the passages by relevance. It is best suited to retrieval-augmented generation (RAG) pipelines that need a fast, accurate reranker to improve the top results.
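Once every query-passage pair has a score, the reranking step itself is just a descending sort plus a top-k cut. A minimal sketch, assuming you already have one `score<TAB>passage` line per candidate (the API's response format isn't shown on this page, so this starts from scores rather than from a response). `rerank_top_k` is a hypothetical helper, and the scores in the usage line are made up; only their order matters.

```shell
# Hypothetical helper: read "score<TAB>passage" lines on stdin and print
# the passages of the k highest-scoring lines, best first.
# $1 = k (number of passages to keep)
rerank_top_k() {
  # -t TAB: split fields on tabs
  # -k1,1gr: sort on field 1 only, as a general numeric value
  #          (handles negatives and exponents), in reverse (descending) order
  # cut -f2-: drop the score column (cut's default delimiter is TAB)
  sort -t "$(printf '\t')" -k1,1gr | head -n "$1" | cut -f2-
}

# Usage with made-up scores:
# printf '%s\t%s\n' -11.2 'Bananas are yellow.' \
#   8.6 'Paris is the capital of France.' 2.1 'France is in Europe.' |
#   rerank_top_k 2
# Paris is the capital of France.
# France is in Europe.
```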

Its key strengths are high throughput (1,800 docs/sec on a V100 GPU) and strong benchmark performance on both TREC Deep Learning 2019 and MS MARCO passage reranking. The following table compares this model (MiniLM-L6-v2) with the other version 2 cross-encoders in the same family.

Model-Name	NDCG@10 (TREC DL 19)

call it

# rerank documents by relevance; billed per document
curl https://gigarouter.ai/v1/rerank \
  -H "Authorization: Bearer $GR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"cross-encoder/ms-marco-MiniLM-L6-v2","query":"capital of France",
       "documents":["Paris is the capital of France.","Bananas are yellow."]}'
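Writing the JSON body by hand breaks as soon as a document contains a quote or a backslash. A sketch that builds the same request body with jq, assuming jq 1.5 or later is installed and that documents arrive one per line on stdin (documents with embedded newlines need a different input format). `build_rerank_body` and `candidates.txt` are hypothetical names used for illustration.

```shell
# Hypothetical helper: build the /v1/rerank JSON body with jq so that
# quotes, backslashes and other special characters are escaped correctly.
# $1 = query; stdin = candidate documents, one per line.
build_rerank_body() {
  # -R: read raw text lines, -n: start from null so `inputs` yields
  # every line, -c: compact single-line output
  jq -Rnc \
    --arg model "cross-encoder/ms-marco-MiniLM-L6-v2" \
    --arg query "$1" \
    '{model: $model, query: $query, documents: [inputs]}'
}

# Usage (candidates.txt is a hypothetical file of retrieved passages):
# build_rerank_body "capital of France" < candidates.txt |
#   curl https://gigarouter.ai/v1/rerank \
#     -H "Authorization: Bearer $GR_KEY" \
#     -H "Content-Type: application/json" \
#     --data-binary @-
```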
