
mask2former-swin-base-ade-semantic

facebook/mask2former-swin-base-ade-semantic

An open image segmentation model with 14.7K downloads a month. gigarouter is benchmarking it and plans to host it as an OpenAI-compatible API.

est. price: ~$0.047 / 1k images (estimated, set at launch)
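As a rough budgeting aid, the estimate can be turned into a per-job cost. This is a minimal sketch that assumes billing scales linearly with image count, which this page does not confirm; the function name `estimated_cost` is hypothetical, and the price is the page's estimate rather than a final figure.

```python
# USD per 1,000 images: the page's estimate, not a final price (set at launch).
EST_PRICE_PER_1K_IMAGES = 0.047


def estimated_cost(num_images, price_per_1k=EST_PRICE_PER_1K_IMAGES):
    """Return the estimated USD cost of segmenting num_images images.

    Assumes simple linear per-image billing (an assumption, not a published rule).
    """
    return num_images / 1000 * price_per_1k


# Example: a batch of 250,000 images at the estimated rate.
print(f"${estimated_cost(250_000):.2f}")  # $11.75
```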

API providers

downloads / mo: 14.7K

license: other

about this model

Mask2Former with a Swin-Base backbone is a unified image segmentation model; this checkpoint was trained on ADE20K for semantic segmentation. The architecture handles instance, semantic, and panoptic segmentation with a single paradigm: it predicts a set of masks and a class label for each mask. The weights here are trained for semantic segmentation. Because the architecture is universal, the same design also covers panoptic and instance segmentation, though those tasks would generally call for a checkpoint trained or fine-tuned on them.
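To make the "set of masks plus labels" idea concrete, here is a minimal pure-Python sketch of how such predictions are commonly merged into one semantic label map: each mask's per-pixel probability is weighted by its class probabilities, the scores are summed over all predicted masks, and each pixel takes the highest-scoring class. The function name `semantic_map`, the toy inputs, and the convention that the last class entry means "no object" are assumptions for illustration; this is not the hosted API or the reference implementation.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def semantic_map(class_logits, mask_logits):
    """Combine Q mask predictions into one H x W grid of class indices.

    class_logits: Q lists of length C + 1 (last entry assumed to be "no object").
    mask_logits:  Q masks, each an H x W grid of logits.
    """
    num_queries = len(mask_logits)
    num_classes = len(class_logits[0]) - 1
    height, width = len(mask_logits[0]), len(mask_logits[0][0])
    # Normalize over all C + 1 entries, then drop the "no object" probability.
    class_probs = [softmax(row)[:num_classes] for row in class_logits]
    labels = []
    for y in range(height):
        row = []
        for x in range(width):
            # Score of class c at this pixel: sum over queries of p(class) * p(mask).
            scores = [
                sum(class_probs[q][c] * sigmoid(mask_logits[q][y][x])
                    for q in range(num_queries))
                for c in range(num_classes)
            ]
            row.append(max(range(num_classes), key=scores.__getitem__))
        labels.append(row)
    return labels


# Toy example: 2 predicted masks, 2 classes, a 1 x 2 image.
toy_class_logits = [[5.0, 0.0, 0.0], [0.0, 5.0, 0.0]]  # mask 0 -> class 0, mask 1 -> class 1
toy_mask_logits = [[[4.0, -4.0]], [[-4.0, 4.0]]]       # mask 0 covers left pixel, mask 1 right
print(semantic_map(toy_class_logits, toy_mask_logits))  # [[0, 1]]
```

Summing over masks before taking the argmax lets several predicted masks of the same class reinforce each other, which is why no explicit mask-to-pixel assignment step is needed for semantic output.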

Key strengths

Multi-scale deformable attention Transformer pixel decoder for improved feature extraction.
Masked attention in the Transformer decoder that boosts performance without additional computation.
Loss calculation on subsampled points rather than full masks, improving training efficiency.
Outperforms the previous state-of-the-art MaskFormer in both accuracy and efficiency.
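The masked attention point in the list above can be illustrated with a small sketch: each query attends only to pixels that the previous decoder layer predicted as foreground for that query, so the attention has the same shape as ordinary cross-attention and only gains a mask. The function name `masked_attention_weights` and the single-query, flat-pixel layout are simplifications assumed here, not the model's actual code.

```python
import math


def masked_attention_weights(scores, prev_mask_logits):
    """Softmax over pixels, restricted to the previously predicted foreground.

    scores:           raw query-key attention scores, one per pixel.
    prev_mask_logits: the previous decoder layer's mask logits for this query.
    """
    # A pixel counts as foreground when sigmoid(logit) > 0.5, i.e. logit > 0.
    keep = [logit > 0 for logit in prev_mask_logits]
    if not any(keep):
        # Empty mask: fall back to attending over every pixel.
        keep = [True] * len(scores)
    # Setting a score to -inf before the softmax gives that pixel zero weight.
    masked = [s if k else -math.inf for s, k in zip(scores, keep)]
    peak = max(masked)
    exps = [math.exp(s - peak) for s in masked]
    total = sum(exps)
    return [e / total for e in exps]


# Only the first two pixels were foreground in the previous layer's mask,
# so the high score on the third pixel is ignored.
print(masked_attention_weights([1.0, 1.0, 3.0, 0.0], [2.0, 0.5, -1.0, -3.0]))
# [0.5, 0.5, 0.0, 0.0]
```

The fallback for an empty mask keeps the softmax well defined when a query has no predicted foreground yet.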

Best for

Developers who need a high-performance semantic segmentation model built on an architecture that can also be trained for instance or panoptic segmentation. The Swin-Base variant balances accuracy against computational cost, which suits production deployments behind an API.

not yet live

We're benchmarking and onboarding mask2former-swin-base-ade-semantic as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.