llama-nemotron-rerank-1b-v2
nvidia/llama-nemotron-rerank-1b-v2
A popular open reranker model, with 231K downloads a month. gigarouter benchmarks and hosts it as an OpenAI-compatible API.
About This Model
The Llama Nemotron Reranking 1B model is a cross-encoder fine-tuned from Meta's Llama-3.2-1B using contrastive learning. It accepts a query and a set of candidate documents and outputs a raw logit score representing the relevance of each document to the query. The model is designed to reorder initial results from an embedding or sparse retrieval system, improving overall retrieval accuracy.
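As a rough illustration of how the raw logit scores might be consumed downstream, the sketch below orders candidates by logit and also maps each logit through a sigmoid to get a value in (0, 1). The `rank_by_logit` helper and the logit values are hypothetical stand-ins for real model output, not part of the model's API.

```python
import math


def sigmoid(x: float) -> float:
    """Map a raw logit to (0, 1); branches to stay stable for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def rank_by_logit(documents, logits):
    """Return (document, logit, sigmoid score) tuples, most relevant first."""
    if len(documents) != len(logits):
        raise ValueError("need exactly one logit per document")
    scored = [(doc, logit, sigmoid(logit)) for doc, logit in zip(documents, logits)]
    # Sort on the raw logit; sigmoid is monotonic, so the order is the same.
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical logits, standing in for what the reranker would return.
docs = ["doc A", "doc B", "doc C"]
example_logits = [-1.2, 3.4, 0.0]

ranked = rank_by_logit(docs, example_logits)
for doc, logit, score in ranked:
    print(f"{doc}: logit={logit:+.2f} score={score:.3f}")
```

Because the sigmoid does not change the ordering, it only matters when a bounded score is wanted, for example to apply a relevance threshold.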
Key Strengths
- Supports documents up to 8,192 tokens.
- Multilingual and cross-lingual: evaluated on 26 languages including English, Arabic, Bengali, Chinese, Czech, Danish, Dutch, Finnish, French, German, Hebrew, Hindi, Hungarian, Indonesian, Italian, Japanese, Korean, Norwegian, Persian, Polish, Portuguese, Russian, Spanish, Swedish, Thai, and Turkish.
- Commercial-ready under the NVIDIA Open Model License.
Intended Use
Reranking candidate passages in information retrieval pipelines, particularly for multilingual or long-context question-answering tasks. Typically deployed after an initial retrieval step to refine the ranking of top candidates.
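The retrieve-then-rerank flow described above can be sketched as follows. This is a minimal outline under stated assumptions: `retrieve_then_rerank`, `keyword_retriever`, and `overlap_scorer` are hypothetical names, and the toy word-overlap scorer merely stands in for a call to the reranker.

```python
from typing import Callable, List, Sequence, Tuple


def retrieve_then_rerank(
    query: str,
    corpus: Sequence[str],
    first_stage: Callable[[str, Sequence[str], int], List[str]],
    score_pairs: Callable[[str, Sequence[str]], List[float]],
    retrieve_k: int = 100,
    final_k: int = 5,
) -> List[Tuple[str, float]]:
    """Fetch up to retrieve_k candidates cheaply, then reorder them by score."""
    candidates = first_stage(query, corpus, retrieve_k)
    if not candidates:
        return []
    # In a real pipeline this call would return the reranker's logits.
    scores = score_pairs(query, candidates)
    ranked = sorted(zip(candidates, scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:final_k]


# Toy stand-ins (assumptions for illustration only).
def keyword_retriever(query, corpus, k):
    terms = set(query.lower().split())
    hits = [doc for doc in corpus if terms & set(doc.lower().split())]
    return hits[:k]


def overlap_scorer(query, docs):
    terms = set(query.lower().split())
    return [float(len(terms & set(doc.lower().split()))) for doc in docs]


corpus = [
    "reset the router password",
    "router firmware update steps",
    "how to bake bread",
    "reset password on the router admin page",
]
top = retrieve_then_rerank(
    "reset router password", corpus, keyword_retriever, overlap_scorer, final_k=2
)
for doc, score in top:
    print(f"{score:.1f}  {doc}")
```

Keeping `retrieve_k` much larger than `final_k` gives the reranker room to promote relevant candidates that the first stage ranked low, at the cost of scoring more query-document pairs.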
Performance
When combined with the llama-nemotron-embed-1b-v2 embedding model, the pipeline achieves high accuracy on the BEIR and TechQA benchmarks. The reranking model is 3.5× smaller than the nv-rerankqa-mistral-4b-v3 model. The reranker was trained on
We're benchmarking and onboarding llama-nemotron-rerank-1b-v2 as a hosted, OpenAI-compatible API.