marqo-fashionSigLIP
Marqo/marqo-fashionSigLIP
A popular open zero-shot image model, with 642.8K downloads a month. gigarouter benchmarks and hosts it as an OpenAI-compatible API.
About this model
Marqo-FashionSigLIP is a multimodal embedding model for fashion search and retrieval. It provides up to a 57% improvement in mean reciprocal rank (MRR) and recall over FashionCLIP. The model leverages Generalized Contrastive Learning (GCL), allowing it to be trained on text descriptions, categories, style, colors, materials, keywords, and fine details to deliver highly relevant results for fashion products. It is fine-tuned from ViT-B-16-SigLIP (webli).
This model is best suited for zero-shot image retrieval and classification tasks in the fashion domain, including text-to-image, category-to-product, and sub-category-to-product matching. A newer version, marqo-fashion-SigLip-2, is available with a further 78% improvement in MRR and recall.
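Zero-shot retrieval with an embedding model of this kind usually comes down to ranking image embeddings by their cosine similarity to a text embedding. The sketch below illustrates only that ranking step. It uses tiny made-up vectors in place of real model outputs, and the names `cosine`, `rank_images`, `query`, and `catalog` are illustrative assumptions, not part of any Marqo or gigarouter API.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_images(text_embedding, image_embeddings):
    """Return image ids sorted from most to least similar to the query."""
    scored = [(cosine(text_embedding, emb), image_id)
              for image_id, emb in image_embeddings.items()]
    scored.sort(reverse=True)
    return [image_id for _, image_id in scored]

# Toy 3-dimensional embeddings (hypothetical; real ones would come from the model).
query = [0.9, 0.1, 0.0]
catalog = {
    "red_dress": [0.8, 0.2, 0.1],
    "blue_jeans": [0.1, 0.9, 0.2],
    "leather_boots": [0.0, 0.2, 0.9],
}

ranking = rank_images(query, catalog)
print(ranking)
```

With real embeddings the same ranking step applies unchanged; only the vectors (and their dimensionality) differ.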
Benchmark Results
The tables below report evaluation results averaged over public multimodal fashion datasets (Atlas, DeepFashion In-shop, DeepFashion Multimodal, Fashion200k, KAGL, Polyvore). Each table heading states how many of the six datasets it covers.
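For readers unfamiliar with the retrieval metrics reported in this section, here is a minimal sketch of Recall@k and mean reciprocal rank (MRR). It assumes exactly one relevant item per query, which may not match the evaluation protocol actually used for these benchmarks; the helper names and the toy `queries` data are hypothetical.

```python
def recall_at_k(ranked_ids, relevant_id, k):
    """1.0 if the relevant item appears in the top k results, else 0.0."""
    return 1.0 if relevant_id in ranked_ids[:k] else 0.0

def reciprocal_rank(ranked_ids, relevant_id):
    """1 / rank of the relevant item (ranks start at 1), or 0.0 if absent."""
    for rank, item_id in enumerate(ranked_ids, start=1):
        if item_id == relevant_id:
            return 1.0 / rank
    return 0.0

def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)

# Hypothetical results: (ranked result ids, the single relevant id) per query.
queries = [
    (["a", "b", "c"], "a"),  # relevant item at rank 1
    (["d", "e", "f"], "e"),  # relevant item at rank 2
    (["g", "h", "i"], "z"),  # relevant item not retrieved
]

recall_at_1 = mean([recall_at_k(r, rel, 1) for r, rel in queries])
recall_at_2 = mean([recall_at_k(r, rel, 2) for r, rel in queries])
mrr = mean([reciprocal_rank(r, rel) for r, rel in queries])
print(recall_at_1, recall_at_2, mrr)
```

Each metric is averaged over queries, so a higher Recall@1 means more queries return the correct item first, while MRR also gives partial credit for near misses.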
Text-To-Image (Averaged across 6 datasets)
| Model | AvgRecall | Recall@1 | Recall@10 | MRR |
|---|---|---|---|---|
| Marqo-FashionSigLIP | 0.231 | 0.121 | 0.340 | 0.239 |
| FashionCLIP2.0 | 0.163 | 0.077 | 0.249 | 0.165 |
| OpenFashionCLIP | 0.132 | 0.060 | 0.204 | 0.135 |
| ViT-B-16-laion2b_s34b_b88k | 0.174 | 0.088 | 0.261 | 0.180 |
| ViT-B-16-SigLIP-webli | 0.212 | 0.111 | 0.314 | 0.214 |
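
The relative gains can be recomputed directly from the table above. The sketch below does this for Recall@1 and MRR against FashionCLIP2.0. It also checks an inferred relationship: AvgRecall in this table appears to equal the mean of Recall@1 and Recall@10 after rounding. That relationship is read off the numbers and is not stated by the source. The helper `relative_improvement` is a hypothetical name.

```python
# Text-to-image rows copied from the table: (AvgRecall, Recall@1, Recall@10, MRR).
rows = {
    "Marqo-FashionSigLIP": (0.231, 0.121, 0.340, 0.239),
    "FashionCLIP2.0": (0.163, 0.077, 0.249, 0.165),
}

def relative_improvement(new, baseline):
    """Relative gain of `new` over `baseline`, as a percentage."""
    return (new - baseline) / baseline * 100.0

ours = rows["Marqo-FashionSigLIP"]
base = rows["FashionCLIP2.0"]

recall1_gain = relative_improvement(ours[1], base[1])
mrr_gain = relative_improvement(ours[3], base[3])

# Inferred, not documented: AvgRecall looks like mean(Recall@1, Recall@10).
avg_recall_check = (ours[1] + ours[2]) / 2

print(f"Recall@1 gain over FashionCLIP2.0: {recall1_gain:.1f}%")
print(f"MRR gain over FashionCLIP2.0: {mrr_gain:.1f}%")
print(f"mean(Recall@1, Recall@10) = {avg_recall_check:.4f}, reported AvgRecall = {ours[0]}")
```

The Recall@1 gain comes out at roughly 57%, which is consistent with the "up to 57%" figure quoted in the model description; the MRR gain on these averaged numbers is smaller, at roughly 45%.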
Category-To-Product (Averaged across 5 datasets)
| Model | AvgP | P@1 | P@10 | MRR |
|---|---|---|---|---|
| Marqo-FashionSigLIP | 0.737 | 0.758 | 0.716 | 0.812 |
| FashionCLIP2.0 | 0.684 | 0.681 | 0.686 | 0.741 |
| OpenFashionCLIP | 0.646 | 0.653 | 0.639 | 0.720 |
| ViT-B-16-laion2b_s34b_b88k | 0.662 | 0.673 | 0.652 | 0.743 |
| ViT-B-16-SigLIP-webli | 0.688 | 0.690 | 0.685 | 0.751 |
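
Category-to-product retrieval has many relevant products per query, which is why this table reports precision (P@k) rather than recall. A minimal sketch of P@k follows, assuming a product counts as relevant when its category label equals the query category. The labels and the function name `precision_at_k` are hypothetical, and the benchmark's exact relevance rule may differ.

```python
def precision_at_k(retrieved_labels, query_label, k):
    """Fraction of the top-k retrieved items whose label matches the query."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for label in retrieved_labels[:k] if label == query_label)
    return hits / k

# Hypothetical category labels of the products returned for the query "dress",
# listed in ranked order.
retrieved = ["dress", "dress", "skirt", "dress", "top"]

p_at_1 = precision_at_k(retrieved, "dress", 1)
p_at_5 = precision_at_k(retrieved, "dress", 5)
print(p_at_1, p_at_5)
```

Dividing by `k` rather than by the number of results returned means a short result list is penalized, which is the usual convention for P@k.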
Sub-Category-To-Product (Averaged across 4 datasets)
| Model | AvgP | P@1 | P@10 | MRR |
|---|---|---|---|---|
| Marqo-FashionSigLIP | 0.725 | 0.767 | 0.683 | 0.811 |
| FashionCLIP2.0 | 0.657 | 0.676 | 0.638 | 0.733 |
| OpenFashionCLIP | 0.598 | 0.619 | 0.578 | 0.689 |
| Vi |
We're benchmarking and onboarding marqo-fashionSigLIP as a hosted, OpenAI-compatible API.