ms-marco-MiniLM-L4-v2

cross-encoder/ms-marco-MiniLM-L4-v2

A popular open reranker model, with 4.8M downloads a month. Gigarouter is benchmarking it and plans to host it as an OpenAI-compatible API.

est. price: ~$0.008 / 1k docs (estimated, set at launch)

downloads / mo: 4.8M
license: apache-2.0

about this model

Model Overview

This is a cross-encoder model fine-tuned on the MS MARCO Passage Ranking task for information retrieval reranking. Given a query and a set of candidate passages (e.g., retrieved with Elasticsearch), the model scores each query-passage pair, and the passages are then sorted in decreasing order of score.
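
The score-then-sort step can be sketched in Python. This is a minimal sketch, assuming the model is loaded locally through the `sentence-transformers` library and its `CrossEncoder` class. The helper names `rerank` and `load_scorer` are hypothetical. The scorer is passed in as a callable so the sorting logic can be tried without downloading any weights.

```python
from typing import Callable, List, Sequence, Tuple

# A scorer maps (query, passage) pairs to one relevance score per pair.
Scorer = Callable[[List[Tuple[str, str]]], Sequence[float]]


def rerank(query: str, passages: List[str], score_pairs: Scorer) -> List[Tuple[str, float]]:
    """Score every (query, passage) pair and return passages sorted by decreasing score."""
    if not passages:
        return []
    pairs = [(query, passage) for passage in passages]
    scores = score_pairs(pairs)
    # sorted() is stable, so passages with equal scores keep their input order.
    ranked = sorted(zip(passages, scores), key=lambda item: item[1], reverse=True)
    return [(passage, float(score)) for passage, score in ranked]


def load_scorer(model_name: str = "cross-encoder/ms-marco-MiniLM-L4-v2") -> Scorer:
    """Assumption: sentence-transformers is installed and the weights can be downloaded."""
    from sentence_transformers import CrossEncoder  # imported lazily on purpose

    model = CrossEncoder(model_name)
    return model.predict  # returns one score per (query, passage) pair
```

With the library installed, `rerank(query, candidates, load_scorer())` would return the candidates ordered by model score. The raw scores are only meaningful relative to each other for the same query.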

Key Strengths

Optimized for the reranking stage in a retrieve-and-rerank pipeline
Compact MiniLM-L4 architecture balances speed and accuracy
Directly outputs relevance scores for efficient ranking

Benchmark Performance

Metric	Score
NDCG@10 (TREC DL 2019)	73.04
MRR@10 (MS Marco Dev)	37.70
Docs / Sec (V100 GPU)	2,500

Among the version 2 models, this variant offers higher throughput than the larger MiniLM-L6 and MiniLM-L12 variants while keeping ranking quality competitive. It outperforms most version 1 and third-party models of comparable size.
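
For readers unfamiliar with the two ranking metrics in the table, here is a minimal sketch of how NDCG@10 and MRR@10 are commonly computed for a single query. The function names are hypothetical. The linear gain is one common choice and an assumption here, since this page does not state which gain variant the reported scores use. Reported figures are averages over all evaluation queries, shown on a 0-100 scale.

```python
import math
from typing import List


def dcg_at_k(relevances: List[float], k: int = 10) -> float:
    """Discounted cumulative gain over the top-k results (linear gain, log2 discount)."""
    return sum(rel / math.log2(rank + 1) for rank, rel in enumerate(relevances[:k], start=1))


def ndcg_at_k(relevances: List[float], k: int = 10) -> float:
    """DCG of the given ranking divided by the DCG of the ideal ranking.

    Simplification: the ideal ranking is built only from the labels of the
    returned list, not from every judged document for the query.
    """
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


def mrr_at_k(is_relevant: List[bool], k: int = 10) -> float:
    """Reciprocal rank of the first relevant result within the top k, else 0."""
    for rank, hit in enumerate(is_relevant[:k], start=1):
        if hit:
            return 1.0 / rank
    return 0.0
```

Both functions take labels in the order the reranker returned the passages, so a better ordering yields a higher value for the same set of labels.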

Best For

Production reranking pipelines where latency and throughput matter
Scoring query-passage pairs for search or question answering
Applications needing a lightweight yet effective cross-encoder
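
To make the latency and cost trade-off concrete, the figures quoted on this page (about 2,500 docs/sec on a V100 and an estimated ~$0.008 per 1k docs) can be turned into a back-of-the-envelope estimate. This is a rough sketch. The function names and the 100-candidate example are illustrative assumptions, real latency depends on batching, passage length, and hardware, and the price is an estimate that is set at launch.

```python
DOCS_PER_SEC_V100 = 2500        # throughput from the benchmark table
EST_PRICE_PER_1K_DOCS = 0.008   # USD, estimated; final price is set at launch


def rerank_seconds(num_candidates: int, docs_per_sec: float = DOCS_PER_SEC_V100) -> float:
    """Approximate model time to score one query's candidates at a given throughput."""
    return num_candidates / docs_per_sec


def rerank_cost_usd(num_candidates: int, price_per_1k: float = EST_PRICE_PER_1K_DOCS) -> float:
    """Approximate cost of scoring one query's candidates at a per-1k-docs price."""
    return num_candidates / 1000 * price_per_1k


if __name__ == "__main__":
    n = 100  # hypothetical number of first-stage candidates per query
    print(f"{rerank_seconds(n) * 1000:.0f} ms, ${rerank_cost_usd(n):.4f} per query")
```

Under these assumptions, reranking 100 candidates takes roughly 40 ms of model time and costs roughly $0.0008 per query.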

Hosted on Gigarouter

Once live, this model will be accessible through a managed, OpenAI-compatible API, with no infrastructure or model loading required.

not yet live

We're benchmarking and onboarding ms-marco-MiniLM-L4-v2 as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.