Qwen3-Embedding-0.6B

Qwen/Qwen3-Embedding-0.6B

A hosted embedding model: call it over an OpenAI-compatible API, with no GPU of your own to run.

price: $0.008 / 1M tokens
throughput: 581 embeds/s

about this model

Qwen3-Embedding-0.6B is a dense text embedding model hosted by gigarouter as a managed, OpenAI-compatible API. It is part of the Qwen3 Embedding series, built on the Qwen3 foundational models and designed for text embedding and ranking tasks.

Capabilities

  • Multilingual: Supports over 100 languages, including programming languages, enabling robust multilingual, cross-lingual, and code retrieval.
  • Long context: Handles sequences up to 32K tokens.
  • Flexible embedding dimensions: User-defined output dimensions from 32 to 1024 (MRL support).
  • Instruction-aware: Both the embedding and reranking models accept task-specific instructions, which typically improve performance by 1% to 5% compared with using no instruction.
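
To illustrate the instruction-aware capability listed above, here is a minimal sketch of how a query might be prefixed with a task instruction before it is embedded. The `Instruct: ...\nQuery:...` layout is an assumption based on the pattern shown in the upstream Qwen3 Embedding model card; check it against that card before relying on it. `format_query` is a hypothetical helper name, and the task text is only an example. Under the same assumption, documents are embedded as-is, without an instruction.

```python
def format_query(task_description: str, query: str) -> str:
    """Prefix a query with a one-line task instruction.

    The exact template is an assumption taken from the upstream
    Qwen3 Embedding model card, not something this page specifies.
    """
    return f"Instruct: {task_description}\nQuery:{query}"


# Example task description (illustrative only).
task = "Given a web search query, retrieve relevant passages that answer the query"

queries = [format_query(task, "What is the capital of China?")]
documents = ["The capital of China is Beijing."]  # no instruction on the document side

# Both lists can then be passed as `input` to the embeddings endpoint.
print(queries[0])
```

Only the query side changes, so existing document embeddings do not have to be recomputed when the task instruction is adjusted.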

Performance

The larger 8B variant of the series ranks No.1 on the MTEB multilingual leaderboard (score 70.58 as of June 5, 2025). The 0.6B model delivers efficient, production-ready embeddings suitable for retrieval, classification, clustering, and bitext mining across diverse languages and domains.
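
For the retrieval use case mentioned above, a common approach is to rank documents by cosine similarity between the query embedding and each document embedding. The sketch below uses toy 2-dimensional vectors in place of real model output. `cosine` and `rank_documents` are hypothetical helper names, not part of any gigarouter or OpenAI API.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_documents(query_vec: Sequence[float],
                   doc_vecs: List[Sequence[float]]) -> List[int]:
    """Return document indices ordered from most to least similar."""
    scored = [(cosine(query_vec, d), i) for i, d in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [i for _, i in scored]


# Toy vectors standing in for real embeddings.
query = [1.0, 0.0]
docs = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
print(rank_documents(query, docs))  # → [1, 2, 0]
```

With real embeddings the same functions apply unchanged; for large collections a vector index would usually replace the linear scan.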

[Figure: Qwen3 Embedding architecture diagram]

Model Specifications

Model                 Size  Layers  Context Length  Embedding Dimension  MRL Support  Instruction Aware
Qwen3-Embedding-0.6B  0.6B  28      32K             1024                 Yes          Yes
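
MRL (Matryoshka Representation Learning) support generally means the leading components of the full vector can be used as a shorter embedding. Below is a minimal sketch of client-side truncation, assuming a full-length vector has already been fetched. `truncate_embedding` is a hypothetical helper, the 32 lower bound comes from the dimension range stated on this page, and whether the hosted endpoint can return shorter vectors directly is not stated here.

```python
import math
from typing import List, Sequence


def truncate_embedding(vec: Sequence[float], dim: int, min_dim: int = 32) -> List[float]:
    """Keep the first `dim` components and rescale to unit length.

    Re-normalizing matters because a prefix of a unit vector is
    generally no longer unit length, which would skew dot-product scores.
    """
    if not min_dim <= dim <= len(vec):
        raise ValueError(f"dim must be between {min_dim} and {len(vec)}")
    head = list(vec[:dim])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head  # all-zero prefix: nothing to rescale
    return [x / norm for x in head]


# Toy 64-component vector standing in for a real 1024-dimension embedding.
full = [3.0, 4.0] + [1.0] * 62
short = truncate_embedding(full, 32)
print(len(short))
```

Smaller dimensions reduce storage and speed up similarity search, usually at some cost in retrieval quality, so the chosen size is worth validating on your own data.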

Larger embedding variants (4B and 8B) and corresponding reranker models are also available in the Qwen3 Embedding series.

For detailed benchmarks, refer to the Qwen3 Embedding blog and GitHub repository.

call it
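# OpenAI client - just change base_url
import os
from openai import OpenAI
# GIGAROUTER_API_KEY is an assumed variable name; load the key from wherever you store it
KEY = os.environ["GIGAROUTER_API_KEY"]
client = OpenAI(base_url="https://gigarouter.ai/v1", api_key=KEY)
v = client.embeddings.create(model="Qwen/Qwen3-Embedding-0.6B", input=["hello world"])
# print the first four components of the first (and only) embedding
print(v.data[0].embedding[:4])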