
nomic-embed-text-v1

nomic-ai/nomic-embed-text-v1

A popular open embeddings model with 4.2M downloads a month. gigarouter is benchmarking it and will host it as an OpenAI-compatible API.

est. price: ~$0.008 / 1M tokens (estimated, set at launch)
API providers: 0
downloads / mo: 4.2M
license: apache-2.0

about this model

Model Overview

nomic-embed-text-v1 is a text embedding model with a context length of 8192 tokens. It is designed for a variety of embedding tasks including retrieval-augmented generation (RAG), clustering, and classification. The model requires a task instruction prefix (e.g., search_document, search_query, clustering, classification) to guide the embedding behavior.
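As a minimal sketch of the prefix requirement, the helper below prepends a task prefix to each input and builds an OpenAI-style embeddings request body. The "<task>: <text>" joining format and the model identifier in the payload are assumptions, not confirmed by this page; the function names are hypothetical, and nothing is sent over the network.

```python
import json

# Task prefixes named in the model overview.
TASK_PREFIXES = ("search_document", "search_query", "clustering", "classification")


def with_task_prefix(task: str, text: str) -> str:
    """Prepend a task instruction prefix to one input string.

    Assumption: the prefix is joined to the text as "<task>: <text>".
    """
    if task not in TASK_PREFIXES:
        raise ValueError(f"unknown task prefix: {task!r}")
    return f"{task}: {text}"


def build_embeddings_request(task: str, texts: list[str]) -> str:
    """Build an OpenAI-style embeddings request body as JSON (not sent anywhere)."""
    payload = {
        "model": "nomic-ai/nomic-embed-text-v1",  # assumed model identifier
        "input": [with_task_prefix(task, t) for t in texts],
    }
    return json.dumps(payload)
```

For retrieval, documents would typically be embedded with search_document and queries with search_query, so that both sides of the search use the matching prefix.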

Performance Benchmarks

Model                        SeqLen   MTEB    LoCo    Jina Long Context
nomic-embed-text-v1          8192     62.39   85.53   54.16
jina-embeddings-v2-base-en   8192     60.39   85.45   51.90
text-embedding-3-small       8191     62.26   82.40   58.20
text-embedding-ada-002       8191     60.99   52.7    55.25

The model outperforms OpenAI's text-embedding-ada-002 and text-embedding-3-small on the MTEB and LoCo benchmarks, though it trails both on Jina Long Context. Unlike those models, it ships with open weights, training code, and data.
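The comparison can be checked mechanically against the benchmark table. The sketch below copies the scores from the table and lists the benchmarks on which one model beats every named rival; the helper name leads_on is hypothetical.

```python
# Scores copied from the benchmark table on this page.
SCORES = {
    "nomic-embed-text-v1": {"MTEB": 62.39, "LoCo": 85.53, "Jina Long Context": 54.16},
    "jina-embeddings-v2-base-en": {"MTEB": 60.39, "LoCo": 85.45, "Jina Long Context": 51.90},
    "text-embedding-3-small": {"MTEB": 62.26, "LoCo": 82.40, "Jina Long Context": 58.20},
    "text-embedding-ada-002": {"MTEB": 60.99, "LoCo": 52.7, "Jina Long Context": 55.25},
}


def leads_on(model: str, rivals: list[str]) -> list[str]:
    """Return the benchmarks where `model` scores strictly higher than every rival."""
    return [
        bench
        for bench, score in SCORES[model].items()
        if all(score > SCORES[rival][bench] for rival in rivals)
    ]


openai_models = ["text-embedding-3-small", "text-embedding-ada-002"]
print(leads_on("nomic-embed-text-v1", openai_models))  # ['MTEB', 'LoCo']
```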

Key Strength

Long-context support (8192 tokens) with strong performance on both short- and long-context tasks. The model is reproducible, with a fully open training pipeline and data.

Training Visualization

[Figure: training data sample visualization]

Training uses a multi-stage pipeline: unsupervised contrastive learning on weakly related text pairs (e.g., QA forums, reviews), followed by supervised fine-tuning on high-quality labeled datasets with hard-example mining.
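To illustrate what a contrastive objective over text pairs can look like, here is a minimal sketch of an in-batch InfoNCE-style loss in plain Python. This page does not state the exact loss, similarity function, or temperature used in training, so all three are assumptions chosen for illustration; the function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce_loss(queries, documents, temperature=0.05):
    """Mean in-batch contrastive loss.

    queries[i] is paired with documents[i]; every other document in the
    batch acts as a negative. The temperature value is an assumption.
    """
    total = 0.0
    for i, query in enumerate(queries):
        logits = [cosine(query, doc) / temperature for doc in documents]
        # Log-sum-exp with max subtraction for numerical stability.
        top = max(logits)
        log_z = top + math.log(sum(math.exp(x - top) for x in logits))
        # Cross-entropy with the paired document as the correct class.
        total += log_z - logits[i]
    return total / len(queries)
```

The loss is near zero when each query is most similar to its own paired document and grows when a query is closer to another document in the batch. Hard-example mining, as mentioned for the supervised stage, would add deliberately difficult negatives to the document list.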

not yet live

We're benchmarking and onboarding nomic-embed-text-v1 as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.