llama-nemotron-rerank-1b-v2

nvidia/llama-nemotron-rerank-1b-v2

A popular open reranker model, with 231K downloads a month. gigarouter is benchmarking it and plans to host it as an OpenAI-compatible API.

est. price

~$0.008 / 1k docs (estimated; final price set at launch)

downloads / mo

231K

license

other

about this model

The Llama Nemotron Reranking 1B model is a cross-encoder fine-tuned from Meta's Llama-3.2-1B using contrastive learning. It accepts a query and a set of candidate documents and outputs a raw logit score representing the relevance of each document to the query. The model is designed to reorder initial results from an embedding or sparse retrieval system, improving overall retrieval accuracy.
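
A minimal sketch of how raw logit scores like these might be consumed: sort the candidates by score and, optionally, pass each logit through a sigmoid to get a 0-to-1 value for display. The scores below are made-up placeholders rather than model output, `rank_by_logit` is a hypothetical helper name, and the sigmoid step is a common convention, not something the model card prescribes.

```python
import math
from typing import List, Tuple


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def rank_by_logit(docs: List[str], logits: List[float]) -> List[Tuple[str, float, float]]:
    """Return (document, logit, sigmoid(logit)) tuples, highest logit first."""
    if len(docs) != len(logits):
        raise ValueError("docs and logits must have the same length")
    ranked = sorted(zip(docs, logits), key=lambda pair: pair[1], reverse=True)
    return [(doc, logit, sigmoid(logit)) for doc, logit in ranked]


docs = [
    "Bananas are rich in potassium.",
    "Hold the power button for ten seconds to reset the router.",
    "Routers forward packets between networks.",
]
logits = [-2.1, 4.7, 0.3]  # placeholder scores, NOT produced by the model

ranked = rank_by_logit(docs, logits)
for doc, logit, prob in ranked:
    print(f"{logit:+.2f}  {prob:.3f}  {doc}")
```

Only the ordering matters for reranking. Because the sigmoid is monotonic, it changes how the scores read but not how the documents rank.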

Key Strengths

Supports documents up to 8,192 tokens.
Multilingual and cross-lingual: evaluated on 26 languages including English, Arabic, Bengali, Chinese, Czech, Danish, Dutch, Finnish, French, German, Hebrew, Hindi, Hungarian, Indonesian, Italian, Japanese, Korean, Norwegian, Persian, Polish, Portuguese, Russian, Spanish, Swedish, Thai, and Turkish.
Commercial-ready under the NVIDIA Open Model License.

Intended Use

Reranking candidate passages in information retrieval pipelines, particularly for multilingual or long-context question-answering tasks. Typically deployed after an initial retrieval step to refine the ranking of top candidates.
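
The retrieve-then-rerank flow described above can be sketched as follows. This is an illustration under stated assumptions: `first_stage` is a toy word-overlap retriever standing in for an embedding or sparse system, and `placeholder_score` stands in for the cross-encoder so the example runs offline. Neither is part of the model or of any gigarouter API.

```python
from typing import Callable, List, Tuple


def first_stage(query: str, corpus: List[str], k: int) -> List[str]:
    """Toy retriever: keep the k documents sharing the most words with the query."""
    q_terms = set(query.lower().split())
    scored = [
        (len(q_terms & set(doc.lower().split())), idx, doc)
        for idx, doc in enumerate(corpus)
    ]
    # Highest overlap first; ties keep corpus order.
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [doc for _, _, doc in scored[:k]]


def rerank(
    query: str,
    candidates: List[str],
    score_fn: Callable[[str, str], float],
    top_n: int,
) -> List[Tuple[str, float]]:
    """Score each (query, document) pair and return the top_n by descending score."""
    scored = [(doc, score_fn(query, doc)) for doc in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


def placeholder_score(query: str, doc: str) -> float:
    """Stand-in for the reranker (hypothetical): fraction of query words found in doc."""
    q_terms = set(query.lower().split())
    d_terms = set(doc.lower().split())
    return len(q_terms & d_terms) / len(q_terms) if q_terms else 0.0


corpus = [
    "Reset the router by holding the power button",
    "The router firmware update fixes the reset bug",
    "Bananas are rich in potassium",
    "How to reset a forgotten router password",
]
query = "reset router password"

candidates = first_stage(query, corpus, k=3)   # cheap, broad recall
top = rerank(query, candidates, placeholder_score, top_n=2)  # costlier, precise
for doc, score in top:
    print(f"{score:.2f}  {doc}")
```

In a real pipeline, `score_fn` would call the reranker on each query-document pair. Keeping `k` small matters because a cross-encoder processes every pair separately.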

Performance

When paired with the llama-nemotron-embed-1b-v2 embedding model, the pipeline achieves high accuracy on the BEIR+TechQA benchmarks. The reranker is 3.5× smaller than the nv-rerankqa-mistral-4b-v3 model. The reranker was trained on

not yet live

We're benchmarking and onboarding llama-nemotron-rerank-1b-v2 as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.