gigarouter

Hosted embeddings models

25 models · 2 live as APIs · benchmarked & compared

Embeddings models convert text into dense vector representations that capture semantic meaning, enabling machines to compare and retrieve relevant information. Common use cases include semantic search, where queries are matched to documents by vector similarity; retrieval-augmented generation (RAG), where relevant context is fetched before prompting a language model; and clustering or classification of text based on thematic proximity.
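Semantic search ranks documents by how similar their vectors are to the query vector, and cosine similarity is one common measure. The sketch below is a minimal, dependency-free illustration that assumes toy 3-dimensional vectors in place of real model output. The vector values and the example texts in the comments are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


# HYPOTHETICAL toy vectors standing in for real embeddings.
query = [0.9, 0.1, 0.0]      # e.g. "how do I reset my password"
doc_close = [0.8, 0.2, 0.1]  # e.g. "steps to recover account access"
doc_far = [0.0, 0.1, 0.9]    # e.g. "quarterly revenue report"

# The semantically related pair should score much higher than the unrelated one.
print(cosine_similarity(query, doc_close))
print(cosine_similarity(query, doc_far))
```

Clustering and classification rely on the same measure: texts whose vectors have high cosine similarity are treated as thematically close. Check each model's card for its recommended similarity metric.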

In production, embeddings are typically precomputed for a corpus and stored in a vector database. At query time, the input is embedded with the same model, and a nearest-neighbor search returns the most relevant items. Choosing a model means trading off size, quality, and speed. Larger models such as Qwen/Qwen3-Embedding-4B (about 4.0B parameters) or jinaai/jina-embeddings-v3 (about 572M) often deliver higher accuracy on nuanced tasks but require more compute and memory. Smaller models such as Xenova/all-MiniLM-L6-v2 or ibm-granite/granite-embedding-small-english-r2 (about 48M parameters) are faster and cheaper to run, which makes them suitable for latency-sensitive or high-volume applications.
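The precompute-then-query flow can be sketched as an exact, brute-force index in plain Python. This is an illustration under stated assumptions, not how any particular vector database works. The class name `BruteForceIndex`, the document IDs, and the vectors are hypothetical, and the call to the embedding model is left to the caller. Vectors are normalized once when they are added, so each query needs only dot products.

```python
import heapq
import math


def normalize(v: list[float]) -> list[float]:
    """Scale v to unit length so that a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


class BruteForceIndex:
    """HYPOTHETICAL exact nearest-neighbor index, a stand-in for a vector database."""

    def __init__(self) -> None:
        self._ids: list[str] = []
        self._vectors: list[list[float]] = []

    def add(self, doc_id: str, embedding: list[float]) -> None:
        # Offline step: store each precomputed corpus embedding once, normalized.
        self._ids.append(doc_id)
        self._vectors.append(normalize(embedding))

    def search(self, query_embedding: list[float], k: int = 3) -> list[tuple[str, float]]:
        # Query-time step: the caller embeds the query with the same model;
        # here every stored vector is scored against it.
        q = normalize(query_embedding)
        scored = (
            (doc_id, sum(a * b for a, b in zip(q, vec)))
            for doc_id, vec in zip(self._ids, self._vectors)
        )
        # Keep the k highest cosine similarities, best first.
        return heapq.nlargest(k, scored, key=lambda pair: pair[1])


# Usage with HYPOTHETICAL toy embeddings.
index = BruteForceIndex()
index.add("password-help", [0.8, 0.2, 0.1])
index.add("revenue-report", [0.0, 0.1, 0.9])
index.add("login-faq", [0.7, 0.3, 0.0])
results = index.search([0.9, 0.1, 0.0], k=2)
print(results)
```

A linear scan costs O(N·d) per query for N stored vectors of dimension d. For this reason, vector databases typically use approximate indexes such as HNSW on large corpora.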

For most call volumes, calling a hosted API is more economical than self-hosting. It eliminates the infrastructure, scaling, and maintenance overhead of serving models yourself, and that overhead multiplies when you run several model variants.
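Whether hosting beats self-hosting at a given volume comes down to simple arithmetic. The sketch below assumes the $0.008 / 1M tokens rate listed in the table (an estimate for rows marked "~") and a fixed monthly self-hosting budget that you supply. The $500 figure is a placeholder, not a measured cost. `Decimal` keeps the currency math exact.

```python
from decimal import Decimal

# Per-token rate listed in the table on this page; "~" rows are estimates.
PRICE_PER_MILLION_TOKENS = Decimal("0.008")  # USD
TOKENS_PER_PRICE_UNIT = Decimal(1_000_000)


def hosted_cost(tokens: int, price_per_million: Decimal = PRICE_PER_MILLION_TOKENS) -> Decimal:
    """USD cost of embedding `tokens` tokens at a per-1M-token rate."""
    return Decimal(tokens) / TOKENS_PER_PRICE_UNIT * price_per_million


def break_even_tokens(
    self_host_monthly_usd: Decimal,
    price_per_million: Decimal = PRICE_PER_MILLION_TOKENS,
) -> Decimal:
    """Monthly token volume at which hosted spend equals a fixed self-hosting budget."""
    return self_host_monthly_usd / price_per_million * TOKENS_PER_PRICE_UNIT


# One embedding pass over a 10M-token corpus.
corpus_cost = hosted_cost(10_000_000)

# HYPOTHETICAL self-hosting budget (hardware, operations, on-call); substitute your own.
threshold = break_even_tokens(Decimal(500))

print(f"10M tokens: ${corpus_cost}")
print(f"break-even: {threshold:,.0f} tokens/month")
```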


| model | params | downloads/month | price | status |
|---|---|---|---|---|
| Qwen/Qwen3-Embedding-0.6B | - | - | $0.008 / 1M tokens | live |
| BAAI/bge-small-en-v1.5 | - | - | $0.008 / 1M tokens | live |
| nomic-ai/nomic-embed-text-v1.5 | 136.7M | 16.9M | ~$0.008 / 1M tokens | coming soon |
| nomic-ai/nomic-embed-text-v1 | 136.7M | 4.2M | ~$0.008 / 1M tokens | coming soon |
| facebook/w2v-bert-2.0 | 580.5M | 3.7M | ~$0.008 / 1M tokens | coming soon |
| Xenova/all-MiniLM-L6-v2 | - | 2.8M | at launch | coming soon |
| jinaai/jina-embeddings-v3 | 572.3M | 2.7M | ~$0.008 / 1M tokens | coming soon |
| Qwen/Qwen3-Embedding-4B | 4021.8M | 2.6M | ~$0.008 / 1M tokens | coming soon |
| ibm-granite/granite-embedding-small-english-r2 | 47.7M | 2.2M | ~$0.008 / 1M tokens | coming soon |
| Xenova/bge-base-en-v1.5 | - | 1.8M | at launch | coming soon |
| microsoft/wavlm-large | - | 1.4M | at launch | coming soon |
| Qdrant/all-MiniLM-L6-v2-onnx | - | 1.3M | at launch | coming soon |
| jinaai/jina-embeddings-v2-small-en | 32.7M | 1.3M | ~$0.008 / 1M tokens | coming soon |
| Alibaba-NLP/gte-multilingual-base | 305.4M | 1.2M | ~$0.008 / 1M tokens | coming soon |
| Alibaba-NLP/gte-large-en-v1.5 | 434.1M | 1.1M | ~$0.008 / 1M tokens | coming soon |
| Qwen/Qwen3-VL-Embedding-8B | 8144.8M | 1.1M | ~$0.008 / 1M tokens | coming soon |
| jinaai/jina-embeddings-v5-text-nano | 211.8M | 1.1M | ~$0.008 / 1M tokens | coming soon |
| Salesforce/SFR-Embedding-2_R | 7110.7M | 1M | ~$0.008 / 1M tokens | coming soon |
| nomic-ai/nomic-embed-text-v2-moe | 475.3M | 854.7K | ~$0.008 / 1M tokens | coming soon |
| indobenchmark/indobert-base-p1 | - | 826.3K | at launch | coming soon |
| Alibaba-NLP/gte-Qwen2-1.5B-instruct | 1776.2M | 772.7K | ~$0.008 / 1M tokens | coming soon |
| microsoft/wavlm-base-plus | - | 771.5K | at launch | coming soon |
| Qdrant/bm25 | - | 769.3K | at launch | coming soon |
| nvidia/llama-nemotron-embed-1b-v2 | 1235.8M | 658.5K | ~$0.008 / 1M tokens | coming soon |
| boboliu/Qwen3-Embedding-4B-W4A16-G128 | 4050.2M | 549.1K | ~$0.008 / 1M tokens | coming soon |
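The params column also gives a rough lower bound on serving memory: parameters multiplied by bytes per parameter. The sketch below assumes 16-bit weights for the full-precision models and 4-bit weights for the W4A16 build (4-bit weights, 16-bit activations). These deployment precisions are assumptions, not listed facts. The estimate ignores activations, quantization scales, and runtime overhead, so actual usage will be higher.

```python
# Approximate bytes per stored weight; ignores quantization scales and zero-points.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(params_millions: float, dtype: str) -> float:
    """Approximate memory (GB, 1e9 bytes) needed just to hold the model weights."""
    return params_millions * 1e6 * BYTES_PER_PARAM[dtype] / 1e9


# Parameter counts from the params column; the dtype choices are ASSUMPTIONS.
models = {
    "ibm-granite/granite-embedding-small-english-r2": (47.7, "fp16"),
    "jinaai/jina-embeddings-v3": (572.3, "fp16"),
    "Qwen/Qwen3-Embedding-4B": (4021.8, "fp16"),
    "boboliu/Qwen3-Embedding-4B-W4A16-G128": (4050.2, "int4"),  # W4 = 4-bit weights
}

for name, (params_m, dtype) in models.items():
    print(f"{name}: ~{weight_memory_gb(params_m, dtype):.2f} GB of weights ({dtype})")
```

This arithmetic explains why a 4-bit build of a 4B model can fit in roughly a quarter of the weight memory of its 16-bit original.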