Hosted embeddings models
25 models · 2 live as APIs · benchmarked & compared
Embedding models convert text into dense vector representations that capture semantic meaning, so that machines can compare pieces of text and retrieve relevant information. Common use cases include semantic search, where queries are matched to documents by vector similarity; retrieval-augmented generation (RAG), where relevant context is fetched before prompting a language model; and clustering or classification of text by topical similarity.
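As a minimal sketch of comparing texts by vector similarity, the Python below computes cosine similarity and ranks a few documents against a query. The three-dimensional vectors and document names are hypothetical toy values standing in for real model output, which typically has hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


# Hypothetical toy "embeddings"; a real model would produce these vectors.
query = [0.9, 0.1, 0.0]
docs = {
    "refund policy": [0.8, 0.2, 0.1],
    "shipping times": [0.1, 0.9, 0.2],
    "holiday hours": [0.0, 0.2, 0.9],
}

# Rank documents by similarity to the query, most similar first.
ranked = sorted(docs, key=lambda name: cosine_similarity(query, docs[name]), reverse=True)
print(ranked)  # ['refund policy', 'shipping times', 'holiday hours']
```

Cosine similarity ignores vector length and compares only direction, which is why it is a common default for comparing embeddings.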
In production, embeddings are typically precomputed for a corpus and stored in a vector database. At query time, the input is embedded and a nearest-neighbor search returns the most relevant items. The choice of model involves a trade-off between size, quality, and speed. Larger models like Qwen/Qwen3-Embedding-4B or jinaai/jina-embeddings-v3 often deliver higher accuracy on nuanced tasks, but require more compute and memory. Smaller models such as Xenova/all-MiniLM-L6-v2 or ibm-granite/granite-embedding-small-english-r2 are faster and cheaper to run, making them suitable for latency-sensitive or high-volume applications.
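The precompute-then-search flow described above can be sketched with an exact, brute-force index. `BruteForceIndex` is a hypothetical helper standing in for a vector database, and the toy vectors stand in for embeddings a model would produce. Vectors are normalized once when stored, so at query time a plain dot product equals cosine similarity.

```python
import heapq
import math


def normalize(v: list[float]) -> list[float]:
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


class BruteForceIndex:
    """Hypothetical exact nearest-neighbor index over precomputed embeddings.

    Stands in for a vector database; production systems usually use
    approximate indexes (e.g. HNSW) instead of scanning every vector.
    """

    def __init__(self) -> None:
        self._ids: list[str] = []
        self._vectors: list[list[float]] = []

    def add(self, doc_id: str, embedding: list[float]) -> None:
        # Offline step: store each corpus embedding once, already normalized.
        self._ids.append(doc_id)
        self._vectors.append(normalize(embedding))

    def search(self, query_embedding: list[float], k: int = 3) -> list[tuple[str, float]]:
        # Query-time step: score every stored vector and keep the k best.
        q = normalize(query_embedding)
        scores = (
            (doc_id, sum(a * b for a, b in zip(q, vec)))
            for doc_id, vec in zip(self._ids, self._vectors)
        )
        return heapq.nlargest(k, scores, key=lambda pair: pair[1])


# Usage with hypothetical toy vectors.
index = BruteForceIndex()
index.add("doc-a", [0.8, 0.2, 0.1])
index.add("doc-b", [0.1, 0.9, 0.2])
index.add("doc-c", [0.0, 0.2, 0.9])

top = index.search([0.9, 0.1, 0.0], k=2)
print([doc_id for doc_id, _ in top])  # ['doc-a', 'doc-b']
```

The linear scan costs one dot product per stored vector per query, which is fine for small corpora but is the step a real vector database speeds up.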
For most call volumes, a hosted API is the more practical choice than self-hosting: it removes the infrastructure, scaling, and maintenance overhead of serving models yourself, which grows when you run several model variants.
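To put call volume in concrete terms, here is a rough cost estimate for a hosted API at the $0.008 per 1M tokens price listed in the table below for the two live models. The workload figures are hypothetical, and the self-hosting side of the comparison (hardware, utilization, operations) is not modeled.

```python
# Listed price for the live models; treat as an assumption if pricing changes.
PRICE_PER_MILLION_TOKENS_USD = 0.008


def monthly_embedding_cost(tokens_per_month: int,
                           price_per_million: float = PRICE_PER_MILLION_TOKENS_USD) -> float:
    """Hosted API cost in USD for a given monthly token volume."""
    return tokens_per_month / 1_000_000 * price_per_million


# Hypothetical month: embed 2 million chunks of ~500 tokens, plus 10M query tokens.
corpus_tokens = 2_000_000 * 500
query_tokens = 10_000_000
print(f"${monthly_embedding_cost(corpus_tokens + query_tokens):.2f}")  # $8.08
```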
| model | params | downloads/mo | price | status |
|---|---|---|---|---|
| Qwen/Qwen3-Embedding-0.6B | - | - | $0.008 / 1M tokens | live |
| BAAI/bge-small-en-v1.5 | - | - | $0.008 / 1M tokens | live |
| nomic-ai/nomic-embed-text-v1.5 | 136.7M | 16.9M | ~$0.008 / 1M tokens | coming soon |
| nomic-ai/nomic-embed-text-v1 | 136.7M | 4.2M | ~$0.008 / 1M tokens | coming soon |
| facebook/w2v-bert-2.0 | 580.5M | 3.7M | ~$0.008 / 1M tokens | coming soon |
| Xenova/all-MiniLM-L6-v2 | - | 2.8M | at launch | coming soon |
| jinaai/jina-embeddings-v3 | 572.3M | 2.7M | ~$0.008 / 1M tokens | coming soon |
| Qwen/Qwen3-Embedding-4B | 4021.8M | 2.6M | ~$0.008 / 1M tokens | coming soon |
| ibm-granite/granite-embedding-small-english-r2 | 47.7M | 2.2M | ~$0.008 / 1M tokens | coming soon |
| Xenova/bge-base-en-v1.5 | - | 1.8M | at launch | coming soon |
| microsoft/wavlm-large | - | 1.4M | at launch | coming soon |
| Qdrant/all-MiniLM-L6-v2-onnx | - | 1.3M | at launch | coming soon |
| jinaai/jina-embeddings-v2-small-en | 32.7M | 1.3M | ~$0.008 / 1M tokens | coming soon |
| Alibaba-NLP/gte-multilingual-base | 305.4M | 1.2M | ~$0.008 / 1M tokens | coming soon |
| Alibaba-NLP/gte-large-en-v1.5 | 434.1M | 1.1M | ~$0.008 / 1M tokens | coming soon |
| Qwen/Qwen3-VL-Embedding-8B | 8144.8M | 1.1M | ~$0.008 / 1M tokens | coming soon |
| jinaai/jina-embeddings-v5-text-nano | 211.8M | 1.1M | ~$0.008 / 1M tokens | coming soon |
| Salesforce/SFR-Embedding-2_R | 7110.7M | 1M | ~$0.008 / 1M tokens | coming soon |
| nomic-ai/nomic-embed-text-v2-moe | 475.3M | 854.7K | ~$0.008 / 1M tokens | coming soon |
| indobenchmark/indobert-base-p1 | - | 826.3K | at launch | coming soon |
| Alibaba-NLP/gte-Qwen2-1.5B-instruct | 1776.2M | 772.7K | ~$0.008 / 1M tokens | coming soon |
| microsoft/wavlm-base-plus | - | 771.5K | at launch | coming soon |
| Qdrant/bm25 | - | 769.3K | at launch | coming soon |
| nvidia/llama-nemotron-embed-1b-v2 | 1235.8M | 658.5K | ~$0.008 / 1M tokens | coming soon |
| boboliu/Qwen3-Embedding-4B-W4A16-G128 | 4050.2M | 549.1K | ~$0.008 / 1M tokens | coming soon |
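The parameter counts in the table give a rough sense of the memory side of the size trade-off. The sketch below estimates the memory needed to hold model weights alone at a given precision, assuming 4 bytes per parameter for fp32, 2 for bf16, and 1 for int8; activations, batching, and runtime overhead come on top of this figure.

```python
# Bytes per parameter at common precisions (weights only).
BYTES_PER_PARAM = {"fp32": 4.0, "bf16": 2.0, "int8": 1.0}

# Parameter counts in millions, taken from the table above.
PARAMS_MILLIONS = {
    "Qwen/Qwen3-Embedding-4B": 4021.8,
    "jinaai/jina-embeddings-v3": 572.3,
    "ibm-granite/granite-embedding-small-english-r2": 47.7,
}


def weight_memory_gb(params_millions: float, dtype: str = "bf16") -> float:
    """Approximate weight memory in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params_millions * 1e6 * BYTES_PER_PARAM[dtype] / 1e9


for name, params in PARAMS_MILLIONS.items():
    print(f"{name}: {weight_memory_gb(params):.2f} GB in bf16")
```

At bf16 this puts Qwen/Qwen3-Embedding-4B at roughly 8 GB of weights, versus about 0.1 GB for ibm-granite/granite-embedding-small-english-r2.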