all-MiniLM-L6-v2-onnx
Qdrant/all-MiniLM-L6-v2-onnx
A popular open embedding model, with 1.3M downloads a month. gigarouter is benchmarking it for hosting as an OpenAI-compatible API.
About this model
This ONNX port of sentence-transformers/all-MiniLM-L6-v2 is a lightweight embedding model that maps sentences and short paragraphs to 384-dimensional dense vectors for semantic search, clustering, similarity scoring, and embedding-based classification. gigarouter is onboarding it as a managed, OpenAI-compatible API, so no local installation will be required.
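A minimal sketch of an embeddings call, assuming gigarouter follows the standard OpenAI `/v1/embeddings` request and response shapes (`model` and `input` in, a `data` list of `{index, embedding}` objects out). The base URL and model identifier below are hypothetical placeholders; this page does not state them. The sketch only builds the request and parses a response, using the Python standard library.

```python
import json
import urllib.request

# Hypothetical placeholders: the real gigarouter base URL and model ID are not
# given on this page; substitute the values from your account.
BASE_URL = "https://api.gigarouter.example/v1"
MODEL_ID = "Qdrant/all-MiniLM-L6-v2-onnx"


def build_embeddings_request(texts, api_key, base_url=BASE_URL, model=MODEL_ID):
    """Build (but do not send) a POST to an OpenAI-style /embeddings endpoint."""
    body = json.dumps({"model": model, "input": list(texts)}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/embeddings",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def parse_embeddings_response(payload):
    """Return embedding vectors in input order from an OpenAI-style response dict."""
    items = sorted(payload["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]
```

To send it, pass the request to `urllib.request.urlopen` and feed `json.load(response)` to `parse_embeddings_response`. If the service returns the original model's output unchanged, each vector should have 384 dimensions.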
Key strengths
- Compact and fast: the MiniLM-L6 architecture balances speed and quality, making it suitable for high‑throughput embedding pipelines.
- ONNX format lets it run on ONNX Runtime and other ONNX-compatible runtimes, typically giving efficient inference without a PyTorch dependency.
- Designed for semantic textual similarity, clustering, and retrieval tasks.
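The original sentence-transformers model turns per-token outputs into one sentence vector by mean pooling over non-padding tokens and then L2-normalizing it; cosine similarity between two such vectors is the similarity score. Below is a dependency-free sketch of those three steps on made-up toy numbers, not real model outputs. Whether the hosted API already returns normalized vectors is an assumption to verify.

```python
import math


def mean_pool(token_embeddings, attention_mask):
    """Average token vectors, skipping padding positions (mask value 0)."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                summed[i] += value
    if count == 0:
        raise ValueError("attention_mask selects no tokens")
    return [value / count for value in summed]


def l2_normalize(vector):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(value * value for value in vector))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [value / norm for value in vector]


def cosine_similarity(a, b):
    """Cosine similarity of two non-zero vectors of equal dimension."""
    return sum(x * y for x, y in zip(l2_normalize(a), l2_normalize(b)))
```

Normalizing once at indexing time means later comparisons reduce to plain dot products, which is why many vector databases store unit-length embeddings.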
What it is best for
- Sentence and paragraph‑level embedding for semantic search.
- Zero‑shot text classification using embedding‑based approaches.
- Applications where low latency and modest resource usage are critical.
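For semantic search and zero-shot classification, the usual pattern is to embed everything once and compare vectors. The sketch below ranks documents by cosine similarity to a query and assigns a label by picking the closest label-description embedding (for example, an embedding of "This text is about finance."). The vectors are toy 3-dimensional stand-ins for real 384-dimensional embeddings, and `top_k` and `zero_shot_label` are illustrative helper names, not part of any gigarouter API.

```python
import math


def _unit(vector):
    # Assumes a non-zero vector, which real embeddings always are.
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector]


def cosine(a, b):
    return sum(x * y for x, y in zip(_unit(a), _unit(b)))


def top_k(query_vec, doc_vecs, k=3):
    """Return (doc_index, score) pairs for the k documents most similar to the query."""
    scored = [(i, cosine(query_vec, d)) for i, d in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def zero_shot_label(text_vec, label_vecs):
    """Return the label whose description embedding is closest to the text embedding."""
    return max(label_vecs, key=lambda label: cosine(text_vec, label_vecs[label]))
```

A linear scan like this is fine for small collections; larger corpora usually move the same cosine comparison into an approximate nearest-neighbor index.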
Performance
Because it is a conversion of the original all-MiniLM-L6-v2, the model should score close to the original, up to small numerical differences from the ONNX export. The original model achieves competitive scores on the STS Benchmark (e.g., a Spearman correlation of approximately 80–82, scaled by 100, on the STS test set) and is widely used in production for general-purpose embedding tasks.
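STS Benchmark scores like the one above are Spearman rank correlations between the model's cosine similarities for sentence pairs and human-annotated similarity scores, usually reported multiplied by 100. A small sketch of the metric, using average ranks for ties (inputs must not be constant, or the denominator is zero):

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(predicted, gold):
    """Spearman's rho: the Pearson correlation of the rank-transformed scores."""
    return pearson(average_ranks(predicted), average_ranks(gold))
```

For example, `100 * spearman(cosine_scores, gold_scores)` gives a number on the same scale as the figure quoted for the original model; the sketch itself contains no benchmark data.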
We're benchmarking and onboarding all-MiniLM-L6-v2-onnx as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.