all-MiniLM-L6-v2
Xenova/all-MiniLM-L6-v2
A popular open embeddings model, with 2.8M downloads a month. gigarouter benchmarks and hosts it as an OpenAI-compatible API.
About This Model
This sentence embedding model converts sentences and short paragraphs into fixed-size 384-dimensional vectors. It is the ONNX export of sentence-transformers/all-MiniLM-L6-v2, optimized for inference in web and edge environments via the Transformers.js library.
Key Strengths
- Produces 384-dimensional dense embeddings from text input using mean pooling and normalization.
- Compact 6-layer MiniLM architecture balances speed and accuracy, making it suitable for latency-sensitive applications.
- ONNX format enables efficient execution in JavaScript runtimes and CPU/GPU-accelerated environments.
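To make the "mean pooling and normalization" step concrete, here is a minimal sketch in plain TypeScript. It assumes the per-token vectors and the attention mask are already available as arrays; the function names `meanPool` and `l2Normalize` are illustrative and are not part of Transformers.js or of this model's API.

```typescript
// Illustrative helpers only: these names are hypothetical, not a library API.

// Average the token vectors, skipping padded positions (mask value 0).
function meanPool(tokenEmbeddings: number[][], attentionMask: number[]): number[] {
  if (tokenEmbeddings.length === 0 || tokenEmbeddings.length !== attentionMask.length) {
    throw new Error("tokenEmbeddings and attentionMask must be non-empty and the same length");
  }
  const dim = tokenEmbeddings[0].length;
  const sum: number[] = new Array<number>(dim).fill(0);
  let count = 0;
  tokenEmbeddings.forEach((vec, i) => {
    if (attentionMask[i] === 0) return; // ignore padding tokens
    count += 1;
    for (let d = 0; d < dim; d++) sum[d] += vec[d];
  });
  const denom = Math.max(count, 1); // avoid division by zero if every token is masked
  return sum.map((v) => v / denom);
}

// Scale a vector to unit length so that a dot product equals cosine similarity.
function l2Normalize(vec: number[]): number[] {
  const norm = Math.sqrt(vec.reduce((acc, v) => acc + v * v, 0));
  return norm === 0 ? vec.slice() : vec.map((v) => v / norm);
}

// Toy example with 2-dimensional vectors; the real model emits 384 dimensions.
const pooled = meanPool([[1, 2], [3, 4], [100, 100]], [1, 1, 0]);
const sentenceEmbedding = l2Normalize(pooled);
console.log(pooled, sentenceEmbedding);
```

Because the output is unit length, downstream similarity comparisons reduce to a plain dot product.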
Best For
- Semantic textual similarity, clustering, and information retrieval where fixed-size embeddings are required.
- Building search or recommendation systems that compare sentence-level meaning.
- Deployments requiring a lightweight, fast embedding model with minimal computational overhead.
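The search and similarity use cases above usually come down to ranking candidate embeddings against a query embedding. The following is a rough sketch under the assumption that embeddings have already been computed; `cosineSimilarity` and `rankBySimilarity` are hypothetical helper names, and the short vectors stand in for real 384-dimensional outputs.

```typescript
// Illustrative helpers only: these names are hypothetical, not a library API.

// Cosine similarity between two equal-length vectors; returns 0 for a zero vector.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vectors must have the same length");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Score every document against the query and sort from most to least similar.
function rankBySimilarity(query: number[], docs: number[][]): { index: number; score: number }[] {
  return docs
    .map((doc, index) => ({ index, score: cosineSimilarity(query, doc) }))
    .sort((x, y) => y.score - x.score);
}

// Toy 2-dimensional vectors standing in for sentence embeddings.
const ranking = rankBySimilarity([1, 0], [[0, 1], [1, 0], [1, 1]]);
console.log(ranking.map((r) => r.index));
```

For large collections, a vector index would typically replace this linear scan; the scan is shown here only because it is the simplest correct baseline.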
Benchmark Results
This model card does not include specific benchmark numbers. The underlying all-MiniLM-L6-v2 is known for strong performance relative to its size on sentence embedding benchmarks such as STS Benchmark and SentEval, but only the ONNX export is provided here, so users should evaluate it on their own datasets.
We're benchmarking and onboarding all-MiniLM-L6-v2 as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.