BiomedCLIP-PubMedBERT_256-vit_base_patch16_224
microsoft/BiomedCLIP-PubMedBERT_256-vit_base_patch16_224
A popular open zero-shot image classification model with 565.8K downloads a month. gigarouter benchmarks and hosts it as an OpenAI-compatible API.
About this model
Overview
BiomedCLIP-PubMedBERT_256-vit_base_patch16_224 is a biomedical vision-language foundation model pretrained on PMC-15M, a dataset of 15 million figure-caption pairs extracted from PubMed Central research articles. It pairs PubMedBERT as the text encoder with a Vision Transformer as the image encoder (ViT-Base with 16×16 patches and 224×224 input, per the model name), and both include domain-specific adaptations for biomedical data.
Capabilities
The model supports zero-shot image classification, cross-modal retrieval, and visual question answering. It is designed for biomedical image types including microscopy, radiography, histology, and chart images.
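Cross-modal retrieval with a CLIP-style model usually comes down to comparing an image embedding against a set of text embeddings and sorting by similarity. The sketch below illustrates only that ranking step. The function names (`cosine`, `rank_captions`) and the 3-dimensional vectors are hypothetical stand-ins for real encoder outputs, not part of the model's API.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_captions(image_embedding, caption_embeddings):
    """Return (caption, score) pairs sorted from most to least similar."""
    scored = [
        (caption, cosine(image_embedding, embedding))
        for caption, embedding in caption_embeddings.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors (assumption): real embeddings would come from the
# image and text encoders and have many more dimensions.
image_embedding = [0.9, 0.1, 0.0]
caption_embeddings = {
    "chest X-ray": [1.0, 0.0, 0.0],
    "brain MRI": [0.0, 1.0, 0.0],
    "pie chart": [0.0, 0.0, 1.0],
}

ranking = rank_captions(image_embedding, caption_embeddings)
for caption, score in ranking:
    print(f"{caption}: {score:.3f}")
```

The same ranking works in the other direction (text query against a collection of image embeddings) by swapping which side is the query.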
Performance
The authors report that BiomedCLIP sets a new state of the art on a wide range of standard biomedical vision-language benchmarks and substantially outperforms prior vision-language pretraining approaches. See the Reference section for the source of these results.
Example Zero-Shot Classification Labels
- adenocarcinoma histopathology
- brain MRI
- covid line chart
- squamous cell carcinoma histopathology
- immunohistochemistry histopathology
- bone X-ray
- chest X-ray
- pie chart
- hematoxylin and eosin histopathology
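Labels like these are typically turned into short text prompts, embedded, and compared against the image embedding, with a softmax over the scaled similarities giving one probability per label. The following is a minimal sketch of that scoring step only. The prompt template, the `logit_scale` value of 100, the helper names, and the 4-dimensional toy embeddings are all assumptions made for illustration. In practice the embeddings and the scale come from the model itself.

```python
import math

LABELS = [
    "adenocarcinoma histopathology",
    "brain MRI",
    "chest X-ray",
    "pie chart",
]
TEMPLATE = "this is a photo of "  # assumed prompt template


def build_prompts(labels, template=TEMPLATE):
    """Prefix each label with the template to form a text prompt."""
    return [template + label for label in labels]


def normalize(vector):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def zero_shot_probs(image_embedding, text_embeddings, logit_scale=100.0):
    """Softmax over scaled cosine similarities (logit_scale is an assumption)."""
    image = normalize(image_embedding)
    logits = [
        logit_scale * sum(a * b for a, b in zip(image, normalize(text)))
        for text in text_embeddings
    ]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


prompts = build_prompts(LABELS)

# Toy embeddings (assumption): one vector per prompt, plus one image vector.
text_embeddings = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]
image_embedding = [0.1, 0.2, 0.9, 0.0]

probs = zero_shot_probs(image_embedding, text_embeddings)
best = LABELS[probs.index(max(probs))]
print(best)
```

Because the label set is just a list of strings, it can be changed at inference time without retraining, which is what makes the classification zero-shot.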
Limitations
The model was trained on English-language corpora, so it supports English text only.
Reference
Zhang et al., "A Multimodal Biomedical Foundation Model Trained from Fifteen Million Image–Text Pairs," NEJM AI, 2024.