mask2former-swin-base-ade-semantic
facebook/mask2former-swin-base-ade-semantic
A popular open image segmentation model, with 14.7K downloads a month. gigarouter benchmarks and hosts it as an OpenAI-compatible API.
About this model
Mask2Former (Swin-Base backbone) is a unified image segmentation model; this checkpoint is trained on ADE20K for semantic segmentation. The architecture handles instance, semantic, and panoptic segmentation with a single paradigm: it predicts a set of masks and a class label for each mask. The weights here target semantic segmentation, but because the architecture is task-agnostic, the same design can be retrained or fine-tuned for panoptic and instance segmentation.
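To make the "set of masks and their corresponding labels" idea concrete, here is a minimal pure-Python sketch of one way such a set can be collapsed into a per-pixel semantic map: each query's class probabilities (with the trailing "no object" slot dropped) are weighted by its mask probability at every pixel, and the highest-scoring class wins. The function names and the toy inputs are hypothetical and chosen for illustration; a real pipeline would do the same thing with tensors rather than nested lists.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def semantic_map(class_logits, mask_logits):
    """Combine per-query predictions into a per-pixel class map.

    class_logits: [Q][C + 1] scores, last column is the "no object" class.
    mask_logits:  [Q][H][W] mask scores, one mask per query.
    Returns an [H][W] grid of class indices in range(C).
    """
    num_classes = len(class_logits[0]) - 1
    # Class probabilities per query, dropping the "no object" slot.
    probs = [softmax(row)[:num_classes] for row in class_logits]
    height, width = len(mask_logits[0]), len(mask_logits[0][0])
    result = []
    for y in range(height):
        row = []
        for x in range(width):
            # Score of class c at this pixel: sum over queries of
            # P(class c | query) * P(pixel belongs to the query's mask).
            scores = [
                sum(probs[q][c] * sigmoid(mask_logits[q][y][x])
                    for q in range(len(probs)))
                for c in range(num_classes)
            ]
            row.append(max(range(num_classes), key=scores.__getitem__))
        result.append(row)
    return result


# Toy example: two queries, two real classes, a 1x2 image.
toy_class_logits = [[5.0, 0.0, 0.0], [0.0, 5.0, 0.0]]
toy_mask_logits = [[[4.0, -4.0]], [[-4.0, 4.0]]]
print(semantic_map(toy_class_logits, toy_mask_logits))  # [[0, 1]]
```

In the toy example, the first query claims the left pixel for class 0 and the second claims the right pixel for class 1, so the map assigns one class to each pixel.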
Key strengths
- Multi-scale deformable attention Transformer pixel decoder for improved feature extraction.
- Masked attention in the Transformer decoder that boosts performance without additional computation.
- Loss calculation on subsampled points rather than full masks, improving training efficiency.
- Outperforms the previous state-of-the-art MaskFormer in both accuracy and efficiency.
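The masked-attention idea from the list above can be sketched for a single query as follows. This is a simplified illustration and not the model's actual implementation: attention logits for positions outside the previous layer's predicted mask are set to negative infinity before the softmax, so the query only aggregates features from its current foreground estimate. The function name, the 0.5 threshold default, the scaled dot product, and the fall-back to full attention when the mask is empty are assumptions made for this sketch.

```python
import math


def masked_attention(query, keys, values, prev_mask_probs, threshold=0.5):
    """Single-query attention restricted to a predicted foreground region.

    query:           [D] query vector.
    keys, values:    [N][D] per-position key and value vectors.
    prev_mask_probs: [N] mask probabilities from the previous decoder layer.
    Returns (output, weights): the attended [D] vector and [N] weights.
    """
    allowed = [p >= threshold for p in prev_mask_probs]
    if not any(allowed):
        # Assumed fall-back: an empty mask would make the softmax undefined,
        # so attend to every position instead.
        allowed = [True] * len(keys)

    scale = 1.0 / math.sqrt(len(query))
    logits = [
        scale * sum(q * k for q, k in zip(query, key)) if ok else -math.inf
        for key, ok in zip(keys, allowed)
    ]

    # Softmax; masked positions contribute exp(-inf) = 0.
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    weights = [e / total for e in exps]

    dim = len(values[0])
    output = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return output, weights


# Toy example: the third position lies outside the previous mask.
out, w = masked_attention(
    query=[1.0, 0.0],
    keys=[[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]],
    values=[[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]],
    prev_mask_probs=[0.9, 0.8, 0.1],
)
print(w)    # the third weight is exactly 0.0
print(out)
```

The mask only changes which logits survive the softmax, so the cost is the same as ordinary cross-attention, which is consistent with the "without additional computation" claim.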
Best for
Developers who need a high-performance semantic segmentation model built on an architecture that can also be retrained for instance or panoptic segmentation. The Swin-Base variant offers a strong balance of accuracy and computational cost, making it suitable for production deployments via an API.
We're benchmarking and onboarding mask2former-swin-base-ade-semantic as a hosted, OpenAI-compatible API.