Surya OCR 2
datalab-to/surya-ocr-2
published May 2026 · updated May 2026
Surya OCR 2 is a VLM model that performs OCR, layout analysis, and table recognition on documents.
specs
| Task | OCR, Layout Analysis, Table Recognition |
| Architecture | VLM (Vision Language Model) |
| Parameters | 650M |
| License | Code: Apache 2.0, Weights: Modified AI Pubs Open Rail-M (free for research, personal use, and startups under $5M) |
about this model
Surya is a 650M parameter vision-language model (VLM) for document OCR, text detection, layout analysis, and table recognition, hosted by Gigarouter as a managed API.
Key benchmarks and capabilities
Surya scores 83.3% on olmOCR-bench, ranking 4th overall and first among models under 3B parameters. It achieves 87.2% on an internal 91-language multilingual benchmark.
- Speed: 5 pages/second on an RTX 5090.
- Layout analysis: classifies blocks (text, table, image, header, etc.) and provides reading order.
- Table recognition: extracts rows and columns from tables.
- Multilingual support: 91 languages covered in internal evaluation.
Output formats
Per-block OCR returns HTML (tables as <table>, math in <math>), bounding polygons, confidence scores, and raw labels. Layout output includes canonical labels and 0-indexed reading order.
Visual examples
The model handles a wide range of documents, from newspapers to handwritten notes to corporate reports, as shown below.
| Task | Example output |
|---|---|
| Detection | ![]() |
| OCR | ![]() |
| Layout | ![]() |
| Table recognition | ![]() |
Additional images from the model card illustrate detection, OCR, layout, reading order, and table recognition on newspaper, textbook, tax form, handwritten notes, and corporate documents. The model is optimized for accurate, structured document understanding with fast throughput on modern GPUs.
best for
- ·Extracting text and layout from scanned documents and PDFs
- ·Recognizing tables and reading order in complex multi-column layouts
- ·Multilingual OCR supporting 90+ languages
FAQ
Surya OCR 2 is best for document OCR with layout analysis, table recognition, and reading order extraction, especially for multilingual documents.
Surya OCR 2 scores 83.3% on olmOCR-bench, ranking 4th overall and top under 3B parameters.
The code is Apache 2.0. The model weights use a modified AI Pubs Open Rail-M license, free for research, personal use, and startups under $5M funding/revenue. Broader commercial use requires a license from Datalab.
Use the gigarouter OpenAI-compatible endpoint with an API key. Send an image or PDF as input and receive JSON output with text, layout, and table data.
It supports images (JPEG, PNG, etc.) and PDF files. Output is a structured JSON with per-block text, bounding boxes, layout labels, and confidence scores.
We're benchmarking and onboarding Surya OCR 2 as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.



