gigarouter

ms-marco-MiniLM-L4-v2

cross-encoder/ms-marco-MiniLM-L4-v2

A popular open reranker model, with 4.8M downloads a month. gigarouter is benchmarking it and will host it as an OpenAI-compatible API.

est. price: ~$0.008 / 1k docs (estimated, set at launch)
API providers: 0
downloads / mo: 4.8M
license: apache-2.0
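
As a rough budgeting aid, the estimated price above can be turned into a per-workload figure. This is only a sketch: the rate is the page's pre-launch estimate, the helper name `estimate_cost` is hypothetical, and it assumes billing is strictly per scored document.

```python
# Pre-launch estimate from this page ("~$0.008 / 1k docs"); the final price may differ.
EST_PRICE_PER_1K_DOCS = 0.008  # USD

def estimate_cost(num_queries: int, docs_per_query: int,
                  price_per_1k: float = EST_PRICE_PER_1K_DOCS) -> float:
    """Estimated USD cost of reranking docs_per_query candidates for each query.

    Assumes (not confirmed by the page) that every scored document is billed once.
    """
    total_docs = num_queries * docs_per_query
    return total_docs / 1000 * price_per_1k

# 10,000 queries with 100 candidates each is 1,000,000 scored documents.
cost = estimate_cost(10_000, 100)
print(f"estimated cost: ${cost:.2f}")
```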

about this model

Model Overview

This is a cross-encoder model fine-tuned on the MS Marco Passage Ranking task for information retrieval reranking. Given a query and a set of candidate passages (e.g., retrieved via Elasticsearch), the model scores each query-passage pair; the passages are then sorted in decreasing order of score, most relevant first.
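
The score-then-sort step can be sketched as follows. To keep the example runnable without downloading the model, it uses a toy token-overlap scorer as a stand-in; `rerank` and `toy_overlap_scorer` are hypothetical names, not part of any library. With the sentence-transformers package installed, the model's own `CrossEncoder.predict` method, which takes a list of (query, passage) pairs and returns one score per pair, should be usable as the `score_pairs` argument instead.

```python
import re
from typing import Callable, List, Optional, Sequence, Tuple

# A scorer takes (query, passage) pairs and returns one relevance score per pair.
PairScorer = Callable[[Sequence[Tuple[str, str]]], Sequence[float]]

def rerank(query: str, passages: Sequence[str], score_pairs: PairScorer,
           top_k: Optional[int] = None) -> List[Tuple[str, float]]:
    """Score every (query, passage) pair and return passages sorted best-first."""
    pairs = [(query, passage) for passage in passages]
    scores = score_pairs(pairs)
    ranked = sorted(zip(passages, scores), key=lambda item: item[1], reverse=True)
    return ranked if top_k is None else ranked[:top_k]

def toy_overlap_scorer(pairs: Sequence[Tuple[str, str]]) -> List[float]:
    """Stand-in scorer (NOT the model): fraction of query tokens found in the passage."""
    scores = []
    for query, passage in pairs:
        q_tokens = set(re.findall(r"\w+", query.lower()))
        p_tokens = set(re.findall(r"\w+", passage.lower()))
        scores.append(len(q_tokens & p_tokens) / len(q_tokens) if q_tokens else 0.0)
    return scores

passages = [
    "Berlin is the capital of Germany.",
    "Paris is the capital of France.",
    "The Eiffel Tower is in Paris.",
]
ranked = rerank("capital of France", passages, toy_overlap_scorer)
for passage, score in ranked:
    print(f"{score:.2f}  {passage}")
```

Passing the scorer as a function keeps the sorting logic independent of where the scores come from, whether a local model or a hosted API.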

Key Strengths

  • Optimized for the reranking stage in a retrieve-and-rerank pipeline
  • Compact MiniLM-L4 architecture balances speed and accuracy
  • Directly outputs relevance scores for efficient ranking

Benchmark Performance

Metric                     Score
NDCG@10 (TREC DL 2019)     73.04
MRR@10 (MS Marco Dev)      37.70
Docs / Sec (V100 GPU)      2,500
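
For readers unfamiliar with the two ranking metrics, here is a minimal sketch of how they are commonly computed. It assumes the linear-gain form of DCG (an exponential-gain variant also exists) and assumes each ranked list contains every judged document for its query; the function names are illustrative. The table appears to report both metrics multiplied by 100, so 73.04 would correspond to 0.7304 here.

```python
import math
from typing import Sequence

def mrr_at_k(ranked_labels: Sequence[Sequence[int]], k: int = 10) -> float:
    """Mean reciprocal rank of the first relevant result within the top k.

    Each inner sequence holds relevance labels (0 = not relevant) in ranked order.
    """
    total = 0.0
    for labels in ranked_labels:
        for rank, rel in enumerate(labels[:k], start=1):
            if rel > 0:
                total += 1.0 / rank
                break  # only the first relevant hit counts
    return total / len(ranked_labels)

def dcg_at_k(gains: Sequence[float], k: int) -> float:
    """Discounted cumulative gain with linear gain and a log2 position discount."""
    return sum(g / math.log2(i + 1) for i, g in enumerate(gains[:k], start=1))

def ndcg_at_k(ranked_gains: Sequence[float], k: int = 10) -> float:
    """DCG of the given ranking divided by the DCG of the ideal ordering."""
    ideal = dcg_at_k(sorted(ranked_gains, reverse=True), k)
    return dcg_at_k(ranked_gains, k) / ideal if ideal > 0 else 0.0

# Three queries: first relevant hit at rank 2, at rank 1, and no hit at all.
mrr = mrr_at_k([[0, 1, 0], [1, 0, 0], [0, 0, 0]])
# Graded relevance for one query; the best document was ranked second.
ndcg = ndcg_at_k([2, 3, 0, 1])
print(f"MRR@10 = {mrr:.4f}, NDCG@10 = {ndcg:.4f}")
```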

Among the version 2 models, this variant trades a small amount of ranking quality for speed: it has higher throughput than the larger MiniLM-L6 and MiniLM-L12 models while remaining competitive in ranking quality. It outperforms most version 1 and third-party models of comparable size.
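
To relate the throughput figure to a latency budget, the arithmetic can be sketched as below. This is a back-of-the-envelope estimate only: it treats the 2,500 docs/sec V100 benchmark number as a steady rate and ignores batching effects, tokenization, passage length, and network overhead, all of which would change real-world latency. The helper name `rerank_seconds` is hypothetical.

```python
# Benchmark throughput from the table above (V100 GPU); other hardware will differ.
DOCS_PER_SEC_V100 = 2500

def rerank_seconds(num_candidates: int, docs_per_sec: float = DOCS_PER_SEC_V100) -> float:
    """Approximate scoring time for one query, assuming a constant docs/sec rate."""
    return num_candidates / docs_per_sec

for n in (10, 100, 1000):
    print(f"{n:>5} candidates: ~{rerank_seconds(n) * 1000:.0f} ms")
```

Under these assumptions, reranking the top 100 candidates per query costs roughly 40 ms of model time, which is why the candidate count is usually the main knob for controlling reranking latency.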

Best For

  • Production reranking pipelines where latency and throughput matter
  • Scoring query-passage pairs for search or question answering
  • Applications needing a lightweight yet effective cross-encoder

Hosted on Gigarouter

Once live, this model will be accessible via a managed, OpenAI-compatible API, with no infrastructure or model loading required.

not yet live

We're benchmarking and onboarding ms-marco-MiniLM-L4-v2 as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.