
Hosted embedding models

25 models · 2 live as APIs · benchmarked & compared

Embedding models convert text into dense vectors that capture semantic meaning, so that texts with similar meaning map to nearby points and can be compared or retrieved by vector similarity. Common use cases include semantic search, where queries are matched to documents by vector similarity; retrieval-augmented generation (RAG), where relevant context is fetched before prompting a language model; and clustering or classification of text by thematic proximity.

In production, embeddings are typically precomputed for a corpus and stored in a vector database. At query time, the input is embedded with the same model and a nearest-neighbor search returns the most relevant items. Choosing a model means trading off size, quality, and speed. Larger models such as Qwen/Qwen3-Embedding-4B (about 4.0B parameters) or jinaai/jina-embeddings-v3 (572.3M) often deliver higher accuracy on nuanced tasks, but require more compute and memory. Smaller models such as Xenova/all-MiniLM-L6-v2 or ibm-granite/granite-embedding-small-english-r2 (47.7M) are faster and cheaper to run, which makes them suitable for latency-sensitive or high-volume applications.
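
The query-time step described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular vector database works: the three-dimensional vectors are hand-written stand-ins for real model output, the document IDs and the `top_k` helper are hypothetical, and a production system would normally use an approximate nearest-neighbor index rather than the brute-force scan shown here.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def top_k(query_vec, index, k=2):
    """Brute-force nearest-neighbor search: score every stored vector."""
    scored = [
        (doc_id, cosine_similarity(query_vec, vec))
        for doc_id, vec in index.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Precomputed "corpus" embeddings. Toy values (assumption): a real model
# returns vectors with hundreds or thousands of dimensions.
INDEX = {
    "doc_cats": [0.9, 0.1, 0.0],
    "doc_dogs": [0.8, 0.3, 0.1],
    "doc_tax": [0.0, 0.2, 0.9],
}

# The query must be embedded by the same model as the corpus; here it is
# another hand-written toy vector.
query = [1.0, 0.0, 0.0]
results = top_k(query, INDEX, k=2)

for doc_id, score in results:
    print(f"{doc_id}: {score:.3f}")
```

With these toy vectors, the two pet-related documents rank above the tax document because their vectors point in nearly the same direction as the query.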

At most call volumes, calling a hosted API is simpler than self-hosting, because it removes the infrastructure, scaling, and maintenance work of running multiple model variants yourself.
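
To judge a given call volume, a rough cost estimate helps. The sketch below assumes the $0.008 per 1M tokens listed for the live models in the comparison table, and assumes billing is linear in token count with no minimums or tiers; the function name and the example corpus size are hypothetical.

```python
# Listed price for the live models in the comparison table (USD per 1M tokens).
PRICE_PER_MILLION_TOKENS_USD = 0.008


def embedding_cost_usd(num_documents, avg_tokens_per_document,
                       price_per_million=PRICE_PER_MILLION_TOKENS_USD):
    """Estimate the cost of embedding a corpus once.

    Assumes cost scales linearly with the total number of tokens.
    """
    total_tokens = num_documents * avg_tokens_per_document
    return total_tokens / 1_000_000 * price_per_million


# Hypothetical corpus: 1,000,000 documents averaging 500 tokens each,
# which is 500M tokens in total.
cost = embedding_cost_usd(1_000_000, 500)
print(f"Estimated one-time embedding cost: ${cost:.2f}")
```

For this hypothetical corpus the estimate comes to about $4. Query-time embedding adds to that in proportion to query volume and length.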

compare

model	params	downloads/mo	price	status
Qwen/Qwen3-Embedding-0.6B	-	-	$0.008 / 1M tokens	live
BAAI/bge-small-en-v1.5	-	-	$0.008 / 1M tokens	live
nomic-ai/nomic-embed-text-v1.5	136.7M	16.9M	~$0.008 / 1M tokens	coming soon
nomic-ai/nomic-embed-text-v1	136.7M	4.2M	~$0.008 / 1M tokens	coming soon
facebook/w2v-bert-2.0	580.5M	3.7M	~$0.008 / 1M tokens	coming soon
Xenova/all-MiniLM-L6-v2	-	2.8M	at launch	coming soon
jinaai/jina-embeddings-v3	572.3M	2.7M	~$0.008 / 1M tokens	coming soon
Qwen/Qwen3-Embedding-4B	4021.8M	2.6M	~$0.008 / 1M tokens	coming soon
ibm-granite/granite-embedding-small-english-r2	47.7M	2.2M	~$0.008 / 1M tokens	coming soon
Xenova/bge-base-en-v1.5	-	1.8M	at launch	coming soon
microsoft/wavlm-large	-	1.4M	at launch	coming soon
Qdrant/all-MiniLM-L6-v2-onnx	-	1.3M	at launch	coming soon
jinaai/jina-embeddings-v2-small-en	32.7M	1.3M	~$0.008 / 1M tokens	coming soon
Alibaba-NLP/gte-multilingual-base	305.4M	1.2M	~$0.008 / 1M tokens	coming soon
Alibaba-NLP/gte-large-en-v1.5	434.1M	1.1M	~$0.008 / 1M tokens	coming soon
Qwen/Qwen3-VL-Embedding-8B	8144.8M	1.1M	~$0.008 / 1M tokens	coming soon
jinaai/jina-embeddings-v5-text-nano	211.8M	1.1M	~$0.008 / 1M tokens	coming soon
Salesforce/SFR-Embedding-2_R	7110.7M	1M	~$0.008 / 1M tokens	coming soon
nomic-ai/nomic-embed-text-v2-moe	475.3M	854.7K	~$0.008 / 1M tokens	coming soon
indobenchmark/indobert-base-p1	-	826.3K	at launch	coming soon
Alibaba-NLP/gte-Qwen2-1.5B-instruct	1776.2M	772.7K	~$0.008 / 1M tokens	coming soon
microsoft/wavlm-base-plus	-	771.5K	at launch	coming soon
Qdrant/bm25	-	769.3K	at launch	coming soon
nvidia/llama-nemotron-embed-1b-v2	1235.8M	658.5K	~$0.008 / 1M tokens	coming soon
boboliu/Qwen3-Embedding-4B-W4A16-G128	4050.2M	549.1K	~$0.008 / 1M tokens	coming soon
