nomic-embed-text-v1

nomic-ai/nomic-embed-text-v1

A popular open embeddings model, with 4.2M downloads a month. gigarouter is benchmarking it and plans to host it as an OpenAI-compatible API.

est. price: ~$0.008 / 1M tokens (estimated, set at launch)

API providers

downloads / mo: 4.2M

license: apache-2.0

about this model

Model Overview

nomic-embed-text-v1 is a text embedding model with a context length of 8192 tokens. It is designed for a variety of embedding tasks including retrieval-augmented generation (RAG), clustering, and classification. The model requires a task instruction prefix (e.g., search_document, search_query, clustering, classification) to guide the embedding behavior.
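As a minimal sketch of the prefix requirement, the helper below prepends one of the four task names listed above to a piece of text before it is embedded. The helper name `with_task_prefix` is hypothetical, and the `"<task>: <text>"` separator (a colon and a single space) is an assumption that should be checked against the model card.

```python
# Task prefixes named in the overview above.
TASK_PREFIXES = ("search_document", "search_query", "clustering", "classification")


def with_task_prefix(text: str, task: str) -> str:
    """Return `text` with its task instruction prefix prepended."""
    if task not in TASK_PREFIXES:
        raise ValueError(f"unknown task {task!r}; expected one of {TASK_PREFIXES}")
    # Assumption: prefix and text are joined with a colon and a single space.
    return f"{task}: {text.strip()}"


# Documents and queries get different prefixes in a retrieval setup.
docs = [with_task_prefix(d, "search_document") for d in ["Paris is the capital of France."]]
query = with_task_prefix("What is the capital of France?", "search_query")
```

In a RAG pipeline, the indexed passages would typically carry the search_document prefix and the user question the search_query prefix, so the two sides are embedded with matching instructions.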

Performance Benchmarks

Model	SeqLen	MTEB	LoCo	Jina Long Context
nomic-embed-text-v1	8192	62.39	85.53	54.16
jina-embeddings-v2-base-en	8192	60.39	85.45	51.90
text-embedding-3-small	8191	62.26	82.40	58.20
text-embedding-ada-002	8191	60.99	52.7	55.25

nomic-embed-text-v1 scores higher than OpenAI's text-embedding-ada-002 and text-embedding-3-small on MTEB and LoCo, although both OpenAI models score higher on Jina Long Context. Unlike them, it comes with open weights, training code, and data.

Key Strength

Long-context support (8192 tokens) with strong performance on both short- and long-context tasks. The model is reproducible, with a fully open training pipeline and data.

Training

Training uses a multi-stage pipeline: unsupervised contrastive learning on weakly related text pairs (e.g., from QA forums and reviews), followed by supervised fine-tuning on high-quality labeled datasets with hard-example mining.
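To make the contrastive stage more concrete, here is a small illustrative sketch of an InfoNCE-style loss with in-batch negatives, a common choice for this kind of training. This page does not state which loss the model actually uses, so the loss form, the function names, and the temperature value are all assumptions.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def info_nce_loss(queries, documents, temperature=0.05):
    """Mean InfoNCE loss where queries[i] is paired with documents[i].

    Every other document in the batch acts as a negative for queries[i].
    The temperature is an illustrative value, not one taken from the source.
    """
    total = 0.0
    for i, q in enumerate(queries):
        logits = [cosine(q, d) / temperature for d in documents]
        # Numerically stable log-sum-exp over the whole batch.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denominator - logits[i]
    return total / len(queries)


# Correctly paired embeddings give a low loss; swapped pairs give a high one.
aligned = info_nce_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
shuffled = info_nce_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

The loss pulls each text toward its paired text and pushes it away from the rest of the batch. Hard-example mining in the supervised stage would, under the same assumption, add deliberately difficult negatives to the documents list.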

not yet live

We're benchmarking and onboarding nomic-embed-text-v1 as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.
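For readers planning ahead, the sketch below builds (without sending) the kind of request an OpenAI-compatible embeddings endpoint usually accepts: a POST to an `/embeddings` path with `model` and `input` fields. The base URL, API key, and model identifier are hypothetical placeholders, since the hosted endpoint is not live and its real values are not given here.

```python
import json
import urllib.request

# Hypothetical placeholders: the hosted endpoint is not live, so neither value is real.
BASE_URL = "https://api.example.com/v1"
API_KEY = "YOUR_API_KEY"


def build_embeddings_request(texts, model="nomic-ai/nomic-embed-text-v1"):
    """Build (but do not send) a POST to an OpenAI-style /embeddings endpoint.

    The model identifier is an assumption; the served name may differ at launch.
    """
    body = json.dumps({"model": model, "input": list(texts)}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/embeddings",
        data=body,
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Inputs carry the task instruction prefix the model expects.
req = build_embeddings_request(["search_query: What is the capital of France?"])
# With a real endpoint, urllib.request.urlopen(req) would send the request; an
# OpenAI-style response carries one vector per input under data[i].embedding.
```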