BiomedCLIP-PubMedBERT_256-vit_base_patch16_224

microsoft/BiomedCLIP-PubMedBERT_256-vit_base_patch16_224

A popular open zero-shot image classification model, with 565.8K downloads a month. gigarouter is benchmarking it and plans to host it as an OpenAI-compatible API.

status: coming soon
API providers: 0
downloads / mo: 565.8K
license: MIT

about this model

Overview

BiomedCLIP-PubMedBERT_256-vit_base_patch16_224 is a biomedical vision-language foundation model pretrained on PMC-15M, a dataset of 15 million figure-caption pairs extracted from PubMed Central research articles. It uses PubMedBERT as the text encoder and a Vision Transformer as the image encoder, with domain-specific adaptations for biomedical data.
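The model name itself hints at a few architecture settings. As a rough sketch, assuming the usual naming convention (a 256-token text context for PubMedBERT, and a ViT-Base image encoder with 16×16-pixel patches on 224×224 inputs), the number of image patches the encoder sees can be worked out as below. The `parse_model_name` helper is hypothetical and exists only for illustration.

```python
import re

MODEL_NAME = "BiomedCLIP-PubMedBERT_256-vit_base_patch16_224"


def parse_model_name(name: str) -> dict:
    """Extract the numeric settings from the model name (assumed convention)."""
    match = re.search(r"PubMedBERT_(\d+)-vit_(\w+?)_patch(\d+)_(\d+)$", name)
    if match is None:
        raise ValueError(f"unrecognized model name: {name}")
    context_length, vit_size, patch_size, image_size = match.groups()
    patch_size = int(patch_size)
    image_size = int(image_size)
    patches_per_side = image_size // patch_size
    return {
        "context_length": int(context_length),  # assumed: max text tokens
        "vit_size": vit_size,                    # e.g. "base"
        "patch_size": patch_size,                # pixels per patch side
        "image_size": image_size,                # input resolution in pixels
        "num_patches": patches_per_side ** 2,    # patches per image
    }


config = parse_model_name(MODEL_NAME)
print(config)
```

Under these assumptions, each 224×224 image is split into a 14×14 grid of patches before being fed to the Vision Transformer.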

Capabilities

The model supports zero-shot image classification, cross-modal retrieval, and visual question answering. It is designed for biomedical image types including microscopy, radiography, histology, and chart images.

Performance

According to the paper cited below, BiomedCLIP sets a new state of the art on a wide range of standard biomedical vision-language benchmarks, spanning retrieval, classification, and visual question answering, and substantially outperforms prior approaches.

[Figure: BiomedCLIP benchmark performance chart]

Example Zero-Shot Classification Labels

  • adenocarcinoma histopathology
  • brain MRI
  • covid line chart
  • squamous cell carcinoma histopathology
  • immunohistochemistry histopathology
  • bone X-ray
  • chest X-ray
  • pie chart
  • hematoxylin and eosin histopathology
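Labels like these act as candidate classes: each one is turned into a text prompt, embedded by the text encoder, and compared against the image embedding. The sketch below shows only that scoring step in plain Python, assuming the embeddings have already been produced by the model's two encoders. The prompt prefix, the `logit_scale` value, and the toy 3-dimensional vectors are illustrative assumptions, not values taken from this page.

```python
import math

LABELS = ["brain MRI", "chest X-ray", "pie chart"]
TEMPLATE = "this is a photo of "  # assumed prompt prefix


def normalize(vec):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def zero_shot_probs(image_emb, text_embs, logit_scale=100.0):
    """Softmax over scaled cosine similarities between one image and each label."""
    img = normalize(image_emb)
    logits = []
    for text_emb in text_embs:
        txt = normalize(text_emb)
        cosine = sum(a * b for a, b in zip(img, txt))
        logits.append(logit_scale * cosine)
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


# Prompts that would be passed to the text encoder, one per label.
prompts = [TEMPLATE + label for label in LABELS]

# Toy stand-ins for encoder outputs (real embeddings are much larger).
text_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
image_emb = [0.1, 0.9, 0.2]

probs = zero_shot_probs(image_emb, text_embs)
best = max(range(len(LABELS)), key=lambda i: probs[i])
print(prompts[best], round(probs[best], 3))
```

Because no classifier is trained, changing the label list only requires re-embedding the new prompts.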

Limitations

The model was developed using English-language corpora and is English-only.

Reference

Zhang et al., "A Multimodal Biomedical Foundation Model Trained from Fifteen Million Image–Text Pairs," NEJM AI, 2024.

not yet live

We're benchmarking and onboarding BiomedCLIP-PubMedBERT_256-vit_base_patch16_224 as a hosted, OpenAI-compatible API.