nomic-embed-text-v1
nomic-ai/nomic-embed-text-v1
A popular open embeddings model, with 4.2M downloads a month. gigarouter benchmarks and hosts it as an OpenAI-compatible API.
Model Overview
nomic-embed-text-v1 is a text embedding model with a context length of 8192 tokens. It is designed for a variety of embedding tasks including retrieval-augmented generation (RAG), clustering, and classification. The model requires a task instruction prefix (e.g., search_document, search_query, clustering, classification) to guide the embedding behavior.
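As a rough illustration of the prefix requirement, the sketch below prepends a task prefix to each input and builds a request body in the shape of an OpenAI-style embeddings call. It assumes the `task: text` separator format and assumes the hosted API accepts `nomic-ai/nomic-embed-text-v1` as the model name; neither is confirmed on this page, and the helper names `add_task_prefix` and `build_embeddings_request` are hypothetical.

```python
import json

# Task names taken from the model overview above.
TASK_PREFIXES = ("search_document", "search_query", "clustering", "classification")


def add_task_prefix(task: str, text: str) -> str:
    """Prepend a task instruction prefix, assuming the 'task: text' format."""
    if task not in TASK_PREFIXES:
        raise ValueError(f"unknown task prefix: {task!r}")
    return f"{task}: {text}"


def build_embeddings_request(task: str, texts: list[str]) -> str:
    """Build a JSON body shaped like an OpenAI-style embeddings request."""
    payload = {
        # Assumption: the host exposes the model under this identifier.
        "model": "nomic-ai/nomic-embed-text-v1",
        "input": [add_task_prefix(task, t) for t in texts],
    }
    return json.dumps(payload)


# Documents and queries get different prefixes in a retrieval setup.
docs_body = build_embeddings_request(
    "search_document", ["Paris is the capital of France."]
)
query_body = build_embeddings_request(
    "search_query", ["What is the capital of France?"]
)
print(docs_body)
print(query_body)
```

In a RAG pipeline, the stored passages would typically be embedded with `search_document` and the user question with `search_query`, so the two sides of the comparison are embedded under matching instructions.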
Performance Benchmarks
| Model | SeqLen | MTEB | LoCo | Jina Long Context |
|---|---|---|---|---|
| nomic-embed-text-v1 | 8192 | 62.39 | 85.53 | 54.16 |
| jina-embeddings-v2-base-en | 8192 | 60.39 | 85.45 | 51.90 |
| text-embedding-3-small | 8191 | 62.26 | 82.40 | 58.20 |
| text-embedding-ada-002 | 8191 | 60.99 | 52.7 | 55.25 |
The model scores higher than OpenAI text-embedding-ada-002 and text-embedding-3-small on the MTEB and LoCo benchmarks, although both OpenAI models score higher on Jina Long Context. Unlike them, it ships with open weights, training code, and data.
Key Strength
Long-context support (8192 tokens) with strong performance on both short- and long-context tasks. The model is reproducible, with a fully open training pipeline and data.
Training Visualization
Training uses a multi-stage pipeline: unsupervised contrastive learning on weakly related text pairs (e.g., QA forums, reviews), followed by supervised fine-tuning on high-quality labeled datasets with hard-example mining.
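To make the contrastive stage more concrete, here is a minimal sketch of an InfoNCE-style loss with in-batch negatives, a common objective for contrastive learning on text pairs. This page does not state which loss the model was trained with, so the loss choice, the function name `info_nce_loss`, and the temperature value are all assumptions made for illustration.

```python
import math


def normalize(v):
    """Scale a vector to unit length (assumes a non-zero vector)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def info_nce_loss(queries, documents, temperature=0.05):
    """Mean InfoNCE loss where documents[i] is the positive for queries[i]
    and every other document in the batch serves as a negative.
    The temperature is an illustrative value, not a documented setting."""
    q = [normalize(v) for v in queries]
    d = [normalize(v) for v in documents]
    total = 0.0
    for i, qi in enumerate(q):
        # Cosine similarity of this query to every document, scaled.
        logits = [dot(qi, dj) / temperature for dj in d]
        # Numerically stable log-sum-exp over the batch.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        # Cross-entropy with the matching document as the target.
        total += log_denominator - logits[i]
    return total / len(q)


# Toy 2-D "embeddings": correctly paired vs. deliberately mismatched pairs.
aligned = info_nce_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.1], [0.1, 1.0]])
shuffled = info_nce_loss([[1.0, 0.0], [0.0, 1.0]], [[0.1, 1.0], [1.0, 0.1]])
print(aligned < shuffled)
```

The loss is low when each query is closest to its own paired text and high otherwise. Hard-example mining, mentioned for the supervised stage, would add negatives that are deliberately similar to the query, which this sketch does not implement.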
We're benchmarking and onboarding nomic-embed-text-v1 as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.
