Falcon OCR
tiiuae/Falcon-OCR
published Feb 2026 · updated Jul 2026
Falcon OCR is a 300M parameter early-fusion vision-language model that performs document OCR, extracting plain text, LaTeX formulas, or HTML tables from images.
About this model
Falcon-OCR is a 300M-parameter early-fusion vision-language model for document OCR that extracts text, LaTeX formulas, or HTML tables from images. Unlike modular encoder-decoder pipelines, it uses a single Transformer with a hybrid attention mask: image tokens attend to one another bidirectionally, while text tokens are decoded causally, conditioned on the image. The task is selected through the prompt (e.g., category="table"). An optional two‑stage pipeline adds layout detection (PP‑DocLayoutV3) for dense multi‑column documents.
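The hybrid attention mask can be illustrated with a minimal sketch. It assumes the image tokens come first in the sequence and that image tokens do not attend to text tokens; neither detail is stated in the description above, and the function name is hypothetical rather than part of the model's code.

```python
def hybrid_attention_mask(num_image_tokens: int, num_text_tokens: int) -> list[list[bool]]:
    """Return mask[q][k], True when query position q may attend to key position k.

    Assumption: the sequence is laid out as [image tokens..., text tokens...].
    """
    n = num_image_tokens + num_text_tokens
    mask = [[False] * n for _ in range(n)]
    for q in range(n):
        for k in range(n):
            if q < num_image_tokens:
                # Image query: sees every image token (bidirectional), no text.
                mask[q][k] = k < num_image_tokens
            else:
                # Text query: sees all image tokens plus text up to itself (causal).
                mask[q][k] = k <= q
    return mask


# Two image tokens followed by three text tokens.
for row in hybrid_attention_mask(2, 3):
    print("".join("1" if allowed else "0" for allowed in row))
# 11000
# 11000
# 11100
# 11110
# 11111
```

The top-left block is fully dense (bidirectional image attention), while the text rows form the usual lower-triangular causal pattern on top of full visibility of the image.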
Benchmark results
| Benchmark | Score |
|---|---|
| olmOCR (average accuracy) | 80.3% |
| OmniDocBench (Overall↑) | 88.64 |
On olmOCR, Falcon-OCR is especially strong on multi‑column documents (87.1%) and tables (90.3%). On OmniDocBench it achieves an Overall score of 88.64 (text edit distance 0.055, formula CDM 86.8%, table TEDS 84.6%).
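As a sanity check on how the three OmniDocBench components relate to the Overall figure, here is a short sketch. It assumes the Overall score is the plain mean of (1 − edit distance) × 100, CDM, and TEDS, which is how recent OmniDocBench versions describe it; that formula is an assumption here, not something this page states.

```python
def overall_score(edit_distance: float, cdm: float, teds: float) -> float:
    """Mean of three components on a 0-100 scale (assumed aggregation)."""
    text_score = (1.0 - edit_distance) * 100.0  # lower edit distance is better
    return (text_score + cdm + teds) / 3.0


# Component values quoted above.
print(round(overall_score(0.055, 86.8, 84.6), 2))  # 88.63
```

The result differs from the reported 88.64 only in the last digit, which is consistent with the components having been rounded before publication.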
At 0.3B parameters, the model is roughly 3× smaller than comparable OCR VLMs, translating into higher throughput. On a single A100‑80GB with vLLM, the full layout+OCR pipeline processes 5,825 tok/s and 2.9 img/s.
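To put the throughput figures in more practical terms, the arithmetic below derives tokens per image and images per hour from the two quoted rates. It assumes both rates describe the same sustained run on the single A100‑80GB; the helper names are illustrative only.

```python
def tokens_per_image(tokens_per_second: float, images_per_second: float) -> float:
    """Average number of tokens generated per image at the given rates."""
    return tokens_per_second / images_per_second


def images_per_hour(images_per_second: float) -> float:
    """Images processed in one hour of sustained throughput."""
    return images_per_second * 3600


TOK_S = 5825   # quoted pipeline throughput, tokens per second
IMG_S = 2.9    # quoted pipeline throughput, images per second

print(f"{tokens_per_image(TOK_S, IMG_S):.0f} tokens per image")  # 2009 tokens per image
print(f"{images_per_hour(IMG_S):.0f} images per hour")           # 10440 images per hour
```

Under these assumptions, a 1,000-page batch would take roughly 1000 / 2.9 ≈ 345 seconds on one GPU.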
Output formats
- Plain text: general document text
- LaTeX: formulas and mathematical expressions
- HTML: tables
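The three output formats pair naturally with the prompt-based task switching and the optional layout stage described earlier. The sketch below shows one way a two-stage pipeline might route detected regions to a category and collect the results. Everything in it is hypothetical: only the "table" category name appears on this page, the "text" and "formula" names are assumed, and the `ocr` callable stands in for a real model call whose interface is not documented here.

```python
from dataclasses import dataclass
from typing import Callable

# Category -> expected output format. Only "table" is named in the model
# description; "text" and "formula" are assumed category names.
OUTPUT_FORMAT = {"text": "plain text", "formula": "LaTeX", "table": "HTML"}


@dataclass
class Region:
    category: str                     # one of OUTPUT_FORMAT's keys
    bbox: tuple[int, int, int, int]   # (x0, y0, x1, y1) in pixels


def run_pipeline(regions: list[Region], ocr: Callable[[Region], str]) -> list[tuple[str, str]]:
    """Run OCR per region, keeping the order supplied by the layout detector."""
    results = []
    for region in regions:
        if region.category not in OUTPUT_FORMAT:
            raise ValueError(f"unknown category: {region.category!r}")
        results.append((OUTPUT_FORMAT[region.category], ocr(region)))
    return results


# Stand-in for the model call: just echoes the requested category.
def fake_ocr(region: Region) -> str:
    return f"<output for category={region.category!r}>"


page = [Region("text", (0, 0, 400, 120)), Region("table", (0, 130, 400, 300))]
for fmt, content in run_pipeline(page, fake_ocr):
    print(fmt, "->", content)
```

Passing the OCR step in as a callable keeps the routing logic testable without a GPU and leaves the actual prompt format to whatever the model's own documentation specifies.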