Hosted reranker models
38 models · 4 live as APIs · benchmarked & compared
Reranker models are specialized neural networks that take a query and a set of candidate documents and output relevance scores, reordering the list with the most relevant results first. They solve the problem of improving accuracy after an initial, cheap retrieval step—common in search engines, retrieval-augmented generation (RAG) pipelines, and enterprise question-answering. For example, a legal research platform might first retrieve 100 possibly relevant cases by keyword, then use a reranker to surface the top 5 that best match the user’s intent.
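The two-stage flow in the legal-research example (cheap keyword retrieval, then reranking) can be sketched in a few lines of Python. This is a minimal illustration, not how any listed model works internally. `keyword_retrieve`, `toy_score`, `rerank`, and `search` are hypothetical names. `toy_score` is a crude token-overlap placeholder standing in for a real reranker model call.

```python
from typing import Callable, List, Tuple


def keyword_retrieve(query: str, corpus: List[str], k: int) -> List[str]:
    """Cheap first stage: rank documents by how many query terms they contain."""
    terms = set(query.lower().split())
    scored = [(len(terms & set(doc.lower().split())), doc) for doc in corpus]
    # Drop documents with no shared terms; sort is stable, so ties keep corpus order.
    hits = [item for item in scored if item[0] > 0]
    hits.sort(key=lambda item: item[0], reverse=True)
    return [doc for _, doc in hits[:k]]


def toy_score(query: str, doc: str) -> float:
    """Placeholder for a reranker model: Jaccard overlap of the two token sets."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q | d) if q | d else 0.0


def rerank(query: str, candidates: List[str],
           score_fn: Callable[[str, str], float], top_n: int) -> List[Tuple[str, float]]:
    """Second stage: score each (query, document) pair and reorder by score."""
    scored = [(doc, score_fn(query, doc)) for doc in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


def search(query: str, corpus: List[str], retrieve_k: int = 100, top_n: int = 5):
    """Retrieve up to retrieve_k candidates cheaply, then rerank down to top_n."""
    candidates = keyword_retrieve(query, corpus, retrieve_k)
    return rerank(query, candidates, toy_score, top_n)


corpus = [
    "breach of contract damages in commercial leases",
    "contract law overview",
    "damages for breach of contract in employment disputes and commercial leases",
    "criminal sentencing guidelines",
]
for doc, score in search("breach of contract damages", corpus, retrieve_k=3, top_n=2):
    print(f"{score:.3f}  {doc}")
```

In this toy run, the keyword stage ties the first and third documents because each contains all four query terms. The second-stage scorer breaks the tie in favor of the shorter, more focused document. A real reranker would make this distinction using learned relevance rather than token overlap.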
In production, rerankers typically sit after a fast bi-encoder or keyword-based index. The retriever passes a limited number of candidates (e.g., 50–200) to the reranker, which re-scores them. This two-stage design gets recall from the cheap first stage and confines the slower, more accurate scoring to a small candidate set, which keeps latency bounded. When choosing between models, the primary trade-off is accuracy versus speed and memory. Smaller models such as cross-encoder/ms-marco-MiniLM-L2-v2 (15.6M parameters) offer higher throughput and lower latency. Larger ones such as cross-encoder/ms-marco-MiniLM-L12-v2 (33.4M) and the multilingual jinaai/jina-reranker-v2-base-multilingual, or, at the far end, Qwen/Qwen3-Reranker-4B (about 4B), tend to provide stronger relevance signals at the cost of slower inference. On GigaRouter, 4 reranker models are live now; the remaining 34 of the 38 listed are being onboarded.
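The memory side of the trade-off can be estimated from the parameter counts in the comparison table. The sketch makes several assumptions: it counts weight storage only (parameters × bytes per parameter), uses decimal gigabytes, and ignores activations and runtime overhead. The function name is hypothetical.

```python
# Approximate storage per parameter for common weight precisions (bytes).
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2, "int8": 1}


def weight_memory_gb(params_millions: float, precision: str = "fp16") -> float:
    """Memory for the model weights alone, in decimal GB (1 GB = 1e9 bytes).

    Excludes activations, runtime buffers, and framework overhead, so real
    usage during inference is higher.
    """
    return params_millions * 1e6 * BYTES_PER_PARAM[precision] / 1e9


# Parameter counts (millions) taken from the comparison table on this page.
PARAMS_M = {
    "cross-encoder/ms-marco-MiniLM-L2-v2": 15.6,
    "cross-encoder/ms-marco-MiniLM-L12-v2": 33.4,
    "Qwen/Qwen3-Reranker-4B": 4021.8,
}

for name, params in PARAMS_M.items():
    print(f"{name}: ~{weight_memory_gb(params):.3f} GB of fp16 weights")
```

By this arithmetic, Qwen3-Reranker-4B needs roughly 8 GB just to hold its fp16 weights. MiniLM-L2 needs about 31 MB, which is one reason the small cross-encoders are much cheaper to run.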
At most production call volumes, a hosted API removes the operational burden of managing GPU infrastructure, scaling, and model updates, so teams can add reranking without dedicating engineering resources to self-hosting.
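This page does not document GigaRouter's request format, so the following is only a sketch of what a hosted rerank call could look like, using the Python standard library. Everything API-specific here is hypothetical: the placeholder URL `https://api.example.com/v1/rerank`, the bearer-token header, and the JSON field names (`model`, `query`, `documents`, `top_n`, `results`, `index`, `relevance_score`). These names are loosely modeled on common rerank APIs, and the helper function names are also illustrative.

```python
import json
import urllib.request
from typing import Dict, List

# HYPOTHETICAL endpoint: a placeholder, not a documented GigaRouter URL.
API_URL = "https://api.example.com/v1/rerank"


def build_rerank_request(api_key: str, model: str, query: str,
                         documents: List[str], top_n: int) -> urllib.request.Request:
    """Build a POST request; the JSON field names are assumptions."""
    payload = {"model": model, "query": query, "documents": documents, "top_n": top_n}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
        method="POST",
    )


def parse_rerank_response(body: bytes, documents: List[str]) -> List[Dict]:
    """Map an assumed {"results": [{"index": i, "relevance_score": s}, ...]} body
    back to the original documents, highest score first."""
    results = json.loads(body)["results"]
    ranked = sorted(results, key=lambda r: r["relevance_score"], reverse=True)
    return [{"document": documents[r["index"]], "score": r["relevance_score"]} for r in ranked]


def rerank_remote(api_key: str, model: str, query: str, documents: List[str],
                  top_n: int = 5, timeout: float = 10.0) -> List[Dict]:
    """Send the request and parse the reply (needs a real endpoint and key)."""
    req = build_rerank_request(api_key, model, query, documents, top_n)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return parse_rerank_response(resp.read(), documents)
```

Returning an index per result, rather than echoing the document text, keeps responses small. It also lets the caller map scores back to its own document IDs.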
| model | params | downloads/mo | price | status |
|---|---|---|---|---|
| cross-encoder/ms-marco-MiniLM-L6-v2 | - | 81.5M | $0.008 / 1k docs | live |
| jinaai/jina-reranker-v2-base-multilingual | - | 1.8M | $0.008 / 1k docs | live |
| Qwen/Qwen3-Reranker-0.6B | - | - | $0.008 / 1k docs | live |
| BAAI/bge-reranker-base | - | - | $0.008 / 1k docs | live |
| cross-encoder/ms-marco-MiniLM-L4-v2 | 19.2M | 4.8M | ~$0.008 / 1k docs | coming soon |
| Alibaba-NLP/gte-reranker-modernbert-base | 149.6M | 2.7M | ~$0.008 / 1k docs | coming soon |
| cross-encoder/ms-marco-MiniLM-L12-v2 | 33.4M | 2.3M | ~$0.008 / 1k docs | coming soon |
| Qwen/Qwen3-Reranker-4B | 4021.8M | 1.8M | ~$0.008 / 1k docs | coming soon |
| cross-encoder/mmarco-mMiniLMv2-L12-H384-v1 | 117.6M | 1.6M | ~$0.008 / 1k docs | coming soon |
| cross-encoder/ms-marco-MiniLM-L2-v2 | 15.6M | 1.2M | ~$0.008 / 1k docs | coming soon |
| Qwen/Qwen3-Reranker-8B | 8188.5M | 1M | ~$0.008 / 1k docs | coming soon |
| jinaai/jina-reranker-v3 | 596.8M | 949.9K | ~$0.008 / 1k docs | coming soon |
| mixedbread-ai/mxbai-rerank-xsmall-v1 | 70.8M | 551K | ~$0.008 / 1k docs | coming soon |
| Qwen/Qwen3-VL-Reranker-8B | 8767.1M | 431K | ~$0.008 / 1k docs | coming soon |
| hotchpotch/japanese-reranker-cross-encoder-small-v1 | 117.6M | 334.2K | ~$0.008 / 1k docs | coming soon |
| Qwen/Qwen3-VL-Reranker-2B | 2127.5M | 300.3K | ~$0.008 / 1k docs | coming soon |
| cross-encoder/stsb-roberta-large | 355.4M | 286.8K | ~$0.008 / 1k docs | coming soon |
| cross-encoder/ms-marco-TinyBERT-L2-v2 | 4.4M | 283.3K | ~$0.008 / 1k docs | coming soon |
| cl-nagoya/ruri-v3-reranker-310m | 315.2M | 274K | ~$0.008 / 1k docs | coming soon |
| mixedbread-ai/mxbai-rerank-base-v1 | 184.4M | 273.7K | ~$0.008 / 1k docs | coming soon |
| tomaarsen/Qwen3-Reranker-0.6B-seq-cls | 595.8M | 262.5K | ~$0.008 / 1k docs | coming soon |
| nvidia/llama-nemotron-rerank-1b-v2 | 1235.8M | 231K | ~$0.008 / 1k docs | coming soon |
| Alibaba-NLP/gte-multilingual-reranker-base | 306M | 221.9K | ~$0.008 / 1k docs | coming soon |
| antoinelouis/crossencoder-camembert-base-mmarcoFR | 110.6M | 185K | ~$0.008 / 1k docs | coming soon |
| cross-encoder/stsb-roberta-base | 124.6M | 182.5K | ~$0.008 / 1k docs | coming soon |
| nvidia/llama-nemotron-rerank-vl-1b-v2 | 1678.3M | 99.7K | ~$0.008 / 1k docs | coming soon |
| mixedbread-ai/mxbai-rerank-base-v2 | 494M | 97.7K | ~$0.008 / 1k docs | coming soon |
| cross-encoder/stsb-distilroberta-base | 82.1M | 95.4K | ~$0.008 / 1k docs | coming soon |
| Xenova/bge-reranker-base | - | 84.3K | at launch | coming soon |
| mixedbread-ai/mxbai-rerank-large-v1 | - | 66.1K | at launch | coming soon |
| hotchpotch/japanese-reranker-cross-encoder-xsmall-v1 | 107M | 55.4K | ~$0.008 / 1k docs | coming soon |
| mixedbread-ai/mxbai-rerank-large-v2 | 1543.7M | 54.3K | ~$0.008 / 1k docs | coming soon |
| ContextualAI/ctxl-rerank-v2-instruct-multilingual-1b | 1327M | 54.1K | ~$0.008 / 1k docs | coming soon |
| hotchpotch/japanese-reranker-xsmall-v2 | 36.8M | 50.8K | ~$0.008 / 1k docs | coming soon |
| jinaai/jina-reranker-v1-turbo-en | 37.8M | 49.2K | ~$0.008 / 1k docs | coming soon |
| zeroentropy/zerank-2-reranker | 4022.5M | 34.6K | ~$0.008 / 1k docs | coming soon |
| ggml-org/Qwen3-Reranker-0.6B-Q8_0-GGUF | - | 32.7K | at launch | coming soon |
| cross-encoder/qnli-electra-base | 109.5M | 29.6K | ~$0.008 / 1k docs | coming soon |
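Here is a rough cost estimate at the listed rate. It assumes that every candidate document sent with a query is billed once at $0.008 per 1,000 documents. Prices marked `~` are estimates, and "at launch" rows have no price yet. The function name is hypothetical. `Decimal` avoids binary floating-point rounding in currency arithmetic.

```python
from decimal import Decimal

# Listed rate; rows marked "~" in the table are estimates and may change.
PRICE_PER_1K_DOCS = Decimal("0.008")


def monthly_rerank_cost(queries_per_month: int, candidates_per_query: int,
                        price_per_1k: Decimal = PRICE_PER_1K_DOCS) -> Decimal:
    """Assumes each candidate document sent with a query is billed once."""
    docs = Decimal(queries_per_month) * Decimal(candidates_per_query)
    return docs / Decimal(1000) * price_per_1k


for candidates in (50, 100, 200):
    cost = monthly_rerank_cost(100_000, candidates)
    print(f"100k queries x {candidates} candidates: ${cost:.2f}")
```

Under these assumptions, 100,000 queries a month with 100 candidates each is 10 million documents, or $80. Cost scales linearly with the candidate count, so the 50 to 200 range mentioned above spans $40 to $160 for the same query volume.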