Falcon OCR

tiiuae/Falcon-OCR

published Feb 2026 · updated Jul 2026

Falcon OCR is a 300M parameter early-fusion vision-language model that performs document OCR, extracting plain text, LaTeX formulas, or HTML tables from images.

est. price

~$0.094

/ 1k images · estimated, set at launch

API providers

downloads / mo

5.1K

license

apache-2.0

about this model

Falcon-OCR is a 300M-parameter early-fusion vision-language model for document OCR that extracts text, LaTeX formulas, or HTML tables from images. Unlike modular encoder-decoder pipelines, it uses a single Transformer with a hybrid attention mask: image tokens attend to one another bidirectionally, while text tokens are decoded causally, conditioned on the image. The task is selected through the prompt (e.g., category="table"). An optional two‑stage pipeline adds layout detection (PP‑DocLayoutV3) for dense multi‑column documents.
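The hybrid attention mask described above can be illustrated with a small sketch. This is not the model's actual implementation. It assumes that image tokens come first in the sequence and that image tokens never attend to text tokens; the function name `hybrid_attention_mask` is hypothetical.

```python
def hybrid_attention_mask(num_image_tokens, num_text_tokens):
    """Build a boolean mask where mask[q][k] is True if query q may attend to key k.

    ASSUMPTION (not confirmed by the model card): the sequence is laid out as
    [image tokens..., text tokens...] and image tokens do not see text tokens.
    """
    n = num_image_tokens + num_text_tokens
    mask = [[False] * n for _ in range(n)]
    for q in range(n):
        for k in range(n):
            if q < num_image_tokens:
                # Image query: bidirectional attention over all image tokens.
                mask[q][k] = k < num_image_tokens
            else:
                # Text query: all image tokens plus text up to and including itself.
                mask[q][k] = k <= q
    return mask


# Tiny example: 2 image tokens followed by 3 text tokens.
for row in hybrid_attention_mask(2, 3):
    print("".join("1" if allowed else "0" for allowed in row))
```

In the printed grid, the top-left block of ones is the bidirectional image-to-image region, and the lower rows form the causal triangle for text tokens, each of which also sees every image token.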

Benchmark results

Benchmark	Score
olmOCR (average accuracy)	80.3%
OmniDocBench (Overall↑)	88.64

On olmOCR, Falcon-OCR is especially strong on multi‑column documents (87.1%) and tables (90.3%). On OmniDocBench it achieves an Overall score of 88.64 (edit distance 0.055, CDM 86.8%, TEDS 84.6%).
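As a sanity check, the three OmniDocBench component scores can be recombined into the Overall figure. The sketch below assumes the equal-weight definition used by recent OmniDocBench releases, in which the text edit distance is converted to a 0-100 score and averaged with formula CDM and table TEDS. That formula is an assumption here, not something this page states, and `omnidocbench_overall` is a hypothetical helper name.

```python
def omnidocbench_overall(text_edit_distance, formula_cdm, table_teds):
    """Combine component scores into one overall score on a 0-100 scale.

    ASSUMPTION: overall = ((1 - edit_distance) * 100 + CDM + TEDS) / 3.
    """
    text_score = (1.0 - text_edit_distance) * 100.0
    return (text_score + formula_cdm + table_teds) / 3.0


# Component values as reported above.
overall = omnidocbench_overall(0.055, 86.8, 84.6)
print(f"{overall:.2f}")
```

Under that assumption the result is about 88.63, in line with the reported 88.64; the small gap is consistent with the component values having been rounded before publication.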

At 0.3B parameters, the model is roughly 3× smaller than comparable OCR VLMs, translating into higher throughput. On a single A100‑80GB with vLLM, the full layout+OCR pipeline processes 5,825 tok/s and 2.9 img/s.
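The two throughput figures can be related with simple arithmetic. The sketch below derives the implied average number of tokens per image and the time to process 1,000 images from the reported rates. It assumes both rates were measured on the same run and remain steady, which this page does not confirm; the helper names are illustrative.

```python
TOKENS_PER_SECOND = 5825   # reported: full layout+OCR pipeline, A100-80GB, vLLM
IMAGES_PER_SECOND = 2.9    # reported for the same setup


def tokens_per_image(tok_per_s, img_per_s):
    # Average tokens handled per image, assuming both rates describe the same run.
    return tok_per_s / img_per_s


def seconds_per_thousand_images(img_per_s):
    # Wall-clock time to process 1,000 images at a steady rate.
    return 1000 / img_per_s


def images_per_hour(img_per_s):
    # Steady-state hourly capacity of a single GPU.
    return img_per_s * 3600


print(round(tokens_per_image(TOKENS_PER_SECOND, IMAGES_PER_SECOND)))
print(round(seconds_per_thousand_images(IMAGES_PER_SECOND)))
print(round(images_per_hour(IMAGES_PER_SECOND)))
```

This works out to roughly 2,000 tokens per image, a little under six minutes per 1,000 images, and about 10,400 images per GPU-hour.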

Output formats

Plain text: general document text
LaTeX: formulas and mathematical expressions
HTML: tables
