jina-embeddings-v3
jinaai/jina-embeddings-v3
A popular open embeddings model, with 2.7M downloads a month. gigarouter benchmarks and hosts it as an OpenAI-compatible API.
About this model
jina-embeddings-v3 is a multilingual, multi-task text embedding model designed for a variety of NLP applications. Based on the Jina-XLM-RoBERTa architecture, it supports input sequences of up to 8192 tokens via Rotary Position Embeddings (RoPE) and uses five task-specific LoRA adapters to produce embeddings optimized for different use cases.
Key Features
- Extended sequence length: Up to 8192 tokens with RoPE.
- Task-specific embeddings: Five LoRA adapters for retrieval.query, retrieval.passage, separation, classification, and text-matching.
- Matryoshka embeddings: Flexible output dimensions from 32 to 1024, allowing truncation to fit your application.
- Multilingual support: Tuned for 30 languages: Arabic, Bengali, Chinese, Danish, Dutch, English, Finnish, French, Georgian, German, Greek, Hindi, Indonesian, Italian, Japanese, Korean, Latvian, Norwegian, Polish, Portuguese, Romanian, Russian, Slovak, Spanish, Swedish, Thai, Turkish, Ukrainian, Urdu, and Vietnamese.
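The Matryoshka feature above means a shorter embedding is obtained by keeping only the leading components of the full vector. Here is a minimal sketch of that truncation in plain Python. The helper name `truncate_embedding` is hypothetical, and re-normalizing to unit length afterwards is an assumption (a common choice when vectors are compared by cosine similarity), not something this page specifies.

```python
import math

# Output dimension range stated in the feature list above.
MIN_DIM, MAX_DIM = 32, 1024


def truncate_embedding(vector, dim):
    """Keep the first `dim` components of an embedding and rescale to unit length.

    Hypothetical helper for illustration; not part of any official client.
    """
    if not MIN_DIM <= dim <= MAX_DIM:
        raise ValueError(f"dim must be between {MIN_DIM} and {MAX_DIM}")
    if dim > len(vector):
        raise ValueError("dim exceeds the length of the input vector")

    head = list(vector[:dim])
    norm = math.sqrt(sum(x * x for x in head))
    # Assumption: rescale so that cosine similarity reduces to a dot product.
    if norm == 0.0:
        return head
    return [x / norm for x in head]


# Stand-in for a full 1024-dimensional embedding returned by the model.
full = [0.5] * 1024
small = truncate_embedding(full, 256)
print(len(small))
```

Smaller dimensions reduce storage and search cost, usually at some loss in accuracy, so the right size depends on the application.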
This model is best suited for retrieval (asymmetric and symmetric), clustering, re-ranking, classification, and semantic textual similarity. It is available as a hosted, OpenAI-compatible API on gigarouter, eliminating the need for local installation or GPU management.
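As a rough sketch of what calling an OpenAI-compatible embeddings endpoint looks like, the snippet below builds a `POST /embeddings` request with the Python standard library and parses an OpenAI-style response. The base URL `https://api.example.com/v1` is a placeholder, not a real gigarouter address, and using the model identifier `jinaai/jina-embeddings-v3` as the `model` value is an assumption. The function names are hypothetical.

```python
import json
import urllib.request

BASE_URL = "https://api.example.com/v1"  # placeholder, NOT a real endpoint
MODEL_ID = "jinaai/jina-embeddings-v3"   # assumed to match the hosted model name


def build_embeddings_request(texts, api_key, base_url=BASE_URL, model=MODEL_ID):
    """Build an OpenAI-style embeddings request without sending it."""
    body = json.dumps({"model": model, "input": list(texts)}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/embeddings",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def parse_embeddings_response(payload):
    """Return embeddings in input order from an OpenAI-style response dict."""
    items = sorted(payload["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


def embed(texts, api_key):
    """Send the request and return one vector per input text."""
    request = build_embeddings_request(texts, api_key)
    with urllib.request.urlopen(request) as response:
        return parse_embeddings_response(json.load(response))
```

With a real base URL and API key, `embed(["first text", "second text"], api_key)` would return one vector per input. Whether the hosted API accepts extra parameters such as a task or output dimension is not stated here, so the sketch sends only `model` and `input`.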
For details on model architecture and training, refer to the paper.
We're benchmarking and onboarding jina-embeddings-v3 as a hosted, OpenAI-compatible API.