MinerU2.5 2509 1.2B

opendatalab/MinerU2.5-2509-1.2B

published Sep 2025 · updated Apr 2026

A popular open vision-language model, with 21.2K downloads a month. gigarouter benchmarks and hosts it as an OpenAI-compatible API.

est. price

~$0.235

/ 1k images · estimated, set at launch

API providers

downloads / mo

21.2K

license

agpl-3.0

about this model

MinerU2.5-2509-1.2B is a 1.2B-parameter vision-language model for document parsing, hosted by gigarouter as a managed API. It achieves state-of-the-art accuracy with high computational efficiency through a coarse-to-fine, two-stage parsing strategy. The model first performs efficient global layout analysis on downsampled images to identify structural elements, then conducts fine-grained content recognition on native-resolution crops for text, formulas, and tables. A large-scale, diverse data engine supports both pretraining and fine-tuning, enabling robust performance across diverse document types.

Key Improvements

Comprehensive and Granular Layout Analysis: Preserves non-body elements (headers, footers, page numbers) and uses a refined labeling schema for clearer representation of lists, references, and code blocks.
Breakthroughs in Formula Parsing: Delivers high-quality parsing of complex, lengthy mathematical formulae and accurately recognizes mixed-language (Chinese‑English) equations.
Enhanced Robustness in Table Parsing: Handles challenging cases such as rotated tables, borderless tables, and tables with partial borders.

Benchmark Performance

On the olmOCR-bench benchmark, MinerU2.5-2509-1.2B achieves the following scores:

Benchmark	Score
Overall	75.2
Arxiv Math	76.6
Old Scans Math	54.6
Table Tests	84.9

It ranks 14th overall, 15th for Arxiv Math, and 15th for Old Scans Math in the olmOCR-bench leaderboard.

MinerU2.5 architecture diagram showing two-stage parsing pipeline

Example of MinerU2.5 document parsing output highlighting layout, formulas, and tables

not yet live

We're benchmarking and onboarding MinerU2.5 2509 1.2B as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.

related vision-language models

compare all →

Qwen2.5-VL-7B-Instruct

9.8M dl/mo

Qwen3.6-35B-A3B-FP8

6.2M dl/mo

Qwen2.5-VL-3B-Instruct

5.3M dl/mo

gemma-4-26B-A4B-it-AWQ-4bit