ms-marco-MiniLM-L4-v2
cross-encoder/ms-marco-MiniLM-L4-v2
A popular open reranker model, with 4.8M downloads a month. Gigarouter benchmarks and hosts it as an OpenAI-compatible API.
about this model
Model Overview
This is a cross-encoder model fine-tuned on the MS MARCO Passage Ranking task for reranking in information retrieval. Given a query and a set of candidate passages (e.g., retrieved with Elasticsearch), the model scores each query-passage pair; sorting the passages by descending score then yields the reranked list.
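The score-then-sort flow described above can be sketched in Python as follows. This is a minimal illustration, not Gigarouter's or the model authors' implementation: `rerank` and `toy_overlap_scorer` are hypothetical names, and the toy scorer is only a word-overlap stand-in so the example runs without downloading anything. `load_cross_encoder_scorer` assumes the `sentence-transformers` package is installed and that the model ID shown on this page resolves on the Hugging Face Hub.

```python
from typing import Callable, List, Sequence, Tuple

# A scorer maps (query, passage) pairs to one relevance score per pair.
Scorer = Callable[[Sequence[Tuple[str, str]]], List[float]]


def rerank(
    query: str, passages: Sequence[str], score_pairs: Scorer
) -> List[Tuple[str, float]]:
    """Score every (query, passage) pair, then sort by descending score."""
    pairs = [(query, passage) for passage in passages]
    scores = score_pairs(pairs)
    return sorted(zip(passages, scores), key=lambda item: item[1], reverse=True)


def toy_overlap_scorer(pairs: Sequence[Tuple[str, str]]) -> List[float]:
    """Hypothetical stand-in: counts query words that also occur in the passage."""
    scores = []
    for query, passage in pairs:
        query_words = set(query.lower().split())
        passage_words = set(passage.lower().split())
        scores.append(float(len(query_words & passage_words)))
    return scores


def load_cross_encoder_scorer(
    model_name: str = "cross-encoder/ms-marco-MiniLM-L4-v2",
) -> Scorer:
    """Assumption: sentence-transformers is installed and the model ID resolves."""
    from sentence_transformers import CrossEncoder  # imported lazily on purpose

    model = CrossEncoder(model_name)
    # predict() takes a list of (query, passage) pairs and returns one score each.
    return lambda pairs: [float(s) for s in model.predict(list(pairs))]
```

For example, `rerank("capital of france", passages, toy_overlap_scorer)` orders the passages by word overlap; passing `load_cross_encoder_scorer()` instead would order them by the cross-encoder's relevance scores.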
Key Strengths
- Optimized for the reranking stage in a retrieve-and-rerank pipeline
- Compact MiniLM-L4 architecture balances speed and accuracy
- Directly outputs relevance scores for efficient ranking
Benchmark Performance
| Metric | Value |
|---|---|
| NDCG@10 (TREC DL 2019) | 73.04 |
| MRR@10 (MS MARCO Dev) | 37.70 |
| Docs / Sec (V100 GPU) | 2,500 |
Among the version 2 models, this variant sits at a middle point on the speed-quality trade-off: it delivers higher throughput than the larger MiniLM-L6 and MiniLM-L12 variants while keeping ranking quality competitive. It also outperforms most version 1 and third-party models of comparable size.
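For readers unfamiliar with the two ranking metrics in the table, the sketch below computes them for a single query. It assumes binary relevance flags for MRR@10, a linear-gain, log2-discount form of DCG for NDCG@10 (other variants exist, and the exact formulation behind the table is not stated here), and that the gain list covers every judged passage for the query. Benchmark figures like those above are averages over many queries and appear to be reported on a 0-100 scale; the function names are illustrative.

```python
import math
from typing import Sequence


def mrr_at_k(relevant_flags: Sequence[bool], k: int = 10) -> float:
    """Reciprocal rank of the first relevant passage in the top k, else 0."""
    for rank, is_relevant in enumerate(relevant_flags[:k], start=1):
        if is_relevant:
            return 1.0 / rank
    return 0.0


def dcg_at_k(gains: Sequence[float], k: int = 10) -> float:
    """Discounted cumulative gain: each gain is divided by log2(rank + 1)."""
    return sum(
        gain / math.log2(rank + 1)
        for rank, gain in enumerate(gains[:k], start=1)
    )


def ndcg_at_k(gains: Sequence[float], k: int = 10) -> float:
    """DCG of the given ranking divided by the DCG of the ideal ordering.

    Assumption: `gains` holds the graded relevance of every judged passage
    for the query, listed in the order the reranker returned them.
    """
    ideal = dcg_at_k(sorted(gains, reverse=True), k)
    return dcg_at_k(gains, k) / ideal if ideal > 0 else 0.0
```

A ranking that places the only relevant passage third gives `mrr_at_k([False, False, True])` equal to one third, and a ranking already sorted by gain gives an NDCG of 1.0.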
Best For
- Production reranking pipelines where latency and throughput matter
- Scoring query-passage pairs for search or question answering
- Applications needing a lightweight yet effective cross-encoder
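To relate the throughput figure to a latency budget, a back-of-the-envelope estimate can help. The sketch below simply divides the number of candidates by the 2,500 docs/sec V100 figure from the benchmark table; `estimated_rerank_ms` is a hypothetical helper, and real latency will also depend on batch size, passage length, hardware, and network overhead, none of which are modeled here.

```python
def estimated_rerank_ms(num_candidates: int, docs_per_sec: float = 2500.0) -> float:
    """Rough scoring time in milliseconds for one query's candidate list.

    The default throughput is the V100 figure quoted in the benchmark table;
    treat the result as an order-of-magnitude guide, not a guarantee.
    """
    if docs_per_sec <= 0:
        raise ValueError("docs_per_sec must be positive")
    return num_candidates * 1000.0 / docs_per_sec
```

Under these assumptions, reranking 100 candidates takes on the order of 40 ms, and 1,000 candidates roughly 400 ms, which is one reason retrieve-and-rerank pipelines usually cap the candidate set before the cross-encoder stage.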
Hosted on Gigarouter
Access this model through a managed, OpenAI-compatible API, with no infrastructure to run and no model loading required.
We're benchmarking and onboarding ms-marco-MiniLM-L4-v2 as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.