BiomedCLIP-PubMedBERT_256-vit_base_patch16_224

microsoft/BiomedCLIP-PubMedBERT_256-vit_base_patch16_224

An open zero-shot image classification model for biomedical images, with 565.8K downloads a month. gigarouter is benchmarking it and plans to host it as an OpenAI-compatible API.

status: coming soon
API providers: (not listed)
downloads / mo: 565.8K
license: MIT

about this model

Overview

BiomedCLIP-PubMedBERT_256-vit_base_patch16_224 is a biomedical vision-language foundation model pretrained on PMC-15M, a dataset of 15 million figure-caption pairs extracted from PubMed Central research articles. It uses PubMedBERT as the text encoder and a Vision Transformer as the image encoder, with domain-specific adaptations for biomedical data.

Capabilities

The model supports zero-shot image classification, cross-modal retrieval, and visual question answering. It is designed for biomedical image types including microscopy, radiography, histology, and chart images.

Performance

In the authors' evaluation (Zhang et al., 2024), BiomedCLIP achieves state-of-the-art results on a wide range of standard biomedical vision-language benchmarks, substantially outperforming prior approaches.

Example Zero-Shot Classification Labels

adenocarcinoma histopathology
brain MRI
covid line chart
squamous cell carcinoma histopathology
immunohistochemistry histopathology
bone X-ray
chest X-ray
pie chart
hematoxylin and eosin histopathology
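
To make the role of these labels concrete, here is a minimal sketch of the CLIP-style scoring step that zero-shot classification relies on: each label is turned into a text prompt, the prompt and image embeddings are compared by cosine similarity, and a softmax turns the scaled similarities into probabilities. Everything specific here is an assumption for illustration: the prompt template, the logit scale of 100, and the toy 3-dimensional vectors, which stand in for the embeddings the real text and image encoders would produce. It is not the model's actual inference code.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def zero_shot_probs(image_emb, label_embs, logit_scale=100.0):
    """Softmax over scaled cosine similarities between one image and each label.

    logit_scale is an assumed temperature; the real model learns its own value.
    """
    img = l2_normalize(image_emb)
    logits = []
    for emb in label_embs:
        txt = l2_normalize(emb)
        logits.append(logit_scale * sum(a * b for a, b in zip(img, txt)))
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Assumed prompt template; the labels come from the list above.
TEMPLATE = "this is a photo of "
labels = ["chest X-ray", "brain MRI", "pie chart"]
prompts = [TEMPLATE + label for label in labels]

# Hypothetical toy embeddings. In practice the text encoder would embed
# `prompts` and the image encoder would embed the input image.
label_embs = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0], [0.0, 0.1, 0.9]]
image_emb = [0.8, 0.2, 0.1]

probs = zero_shot_probs(image_emb, label_embs)
best = max(range(len(labels)), key=lambda i: probs[i])
print(labels[best], round(probs[best], 4))
```

Because the label set is just a list of strings, changing the candidate classes only means changing the prompts; no retraining is involved.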

Limitations

The model was developed using English-language corpora and is English-only.

Reference

Zhang et al., "A Multimodal Biomedical Foundation Model Trained from Fifteen Million Image–Text Pairs," NEJM AI, 2024.

not yet live

We're benchmarking and onboarding BiomedCLIP-PubMedBERT_256-vit_base_patch16_224 as a hosted, OpenAI-compatible API.