Chandra OCR 2

datalab-to/chandra-ocr-2

published Mar 2026 · updated Jun 2026

Chandra OCR 2 is a vlm model that converts images and PDFs into structured markdown, HTML, and JSON while preserving layout information.

est. price

~$1.341

/ 1k images · estimated, set at launch

API providers

downloads / mo

1.3M

license

openrail

specs

Task	Optical Character Recognition (OCR) and Document Understanding
Architecture	Vision-Language Model (VLM)
License	Code: Apache 2.0; Model weights: Modified OpenRAIL-M (free for research, personal use, and startups under $2M funding/revenue; cannot be used competitively with Datalab API)

about this model

Chandra OCR 2 is a vision-language model (VLM) that converts images and PDFs into structured Markdown, HTML, or JSON while preserving document layout. It is hosted on Gigarouter as a managed, OpenAI-compatible API.

Key Capabilities

The model extracts text, tables, math, handwriting, checkboxes, and images with captions from documents. It supports 90+ languages and outputs structured formats with detailed layout information.

Benchmark Performance

Chandra 2 achieves an 85.8% overall score on the olmOCR benchmark, with strong results across document types:

ArXiv: 86.9%
Old Scans Math: 89.1%
Tables: 92.1%
Multi column: 82.1%
Long tiny text: 93.7%

On the 43-language multilingual benchmark, Chandra 2 averages 77.8%, a 12% improvement over Chandra 1 (69.4%). In the full 90-language evaluation, Chandra 2 averages 72.7% ± 1.2% compared to Gemini 2.5 Flash at 60.8% ± 1.3%. Languages with Chandra 2 scores above 90% include English (96.6%), German (94.8%), Italian (94.6%), French (93.7%), Swedish (93.3%), Danish (91.1%), Indonesian (91.6%), Polish (91.5%), Ukrainian (91.0%), Norwegian (90.5%), Breton (90.0%), Croatian (90.1%), and Serbian (90.3%).

Throughput

On a single NVIDIA H100 80GB GPU with vLLM and 96 concurrent sequences, Chandra 2 processes 1.44 pages per second with an average latency of 60 seconds and a P95 latency of 156 seconds. Real-world usage is estimated at 2 pages per second.

Output Formats

The model outputs Markdown, HTML, or JSON with detailed layout information, including image and diagram extraction with captions and structured data.

Example output showing document conversion to structured format

best for

·Converting scanned documents and PDFs to structured markdown or HTML
·Extracting text from complex layouts with tables, math, and multi-column formats
·Handwritten form and document transcription with layout preservation
·Multilingual OCR across 90+ languages

FAQ

What output formats does Chandra OCR 2 support?

It outputs markdown, HTML, and JSON with detailed layout information.

What languages does Chandra OCR 2 support?

It supports 90+ languages, with a 77.8% average score on a 43-language multilingual benchmark.

What is the license for Chandra OCR 2?

The code is Apache 2.0. The model weights use a modified OpenRAIL-M license: free for research, personal use, and startups under $2M funding/revenue. It cannot be used competitively with the Datalab API.

How do I call Chandra OCR 2 via the gigarouter API?

Use the gigarouter OpenAI-compatible endpoint with your API key. Send an image or PDF as input and specify the desired output format (markdown, HTML, or JSON).

What is the throughput of Chandra OCR 2?

Benchmarked with vLLM on a single NVIDIA H100 80GB GPU, it achieves 1.44 pages/sec with 96 concurrent sequences, with an average latency of 60 seconds and a 0% failure rate.

not yet live

We're benchmarking and onboarding Chandra OCR 2 as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.

related vision-language models

compare all →

Qwen2.5-VL-7B-Instruct

9.8M dl/mo

Qwen3.6-35B-A3B-FP8

6.2M dl/mo

Qwen2.5-VL-3B-Instruct

5.3M dl/mo

gemma-4-26B-A4B-it-AWQ-4bit