skip to content
gigarouter gigarouter
models / vision-language · coming soon

MinerU2.5 2509 1.2B

opendatalab/MinerU2.5-2509-1.2B

published Sep 2025 · updated Apr 2026

A popular open vision-language model, with 21.2K downloads a month. gigarouter benchmarks and hosts it as an OpenAI-compatible API.

est. price
~$0.235
/ 1k images · estimated, set at launch
API providers
0
downloads / mo
21.2K
license
agpl-3.0

about this model

MinerU2.5-2509-1.2B is a 1.2B-parameter vision-language model for document parsing, hosted by gigarouter as a managed API. It achieves state-of-the-art accuracy with high computational efficiency through a coarse-to-fine, two-stage parsing strategy. The model first performs efficient global layout analysis on downsampled images to identify structural elements, then conducts fine-grained content recognition on native-resolution crops for text, formulas, and tables. A large-scale, diverse data engine supports both pretraining and fine-tuning, enabling robust performance across diverse document types.

Key Improvements

  • Comprehensive and Granular Layout Analysis: Preserves non-body elements (headers, footers, page numbers) and uses a refined labeling schema for clearer representation of lists, references, and code blocks.
  • Breakthroughs in Formula Parsing: Delivers high-quality parsing of complex, lengthy mathematical formulae and accurately recognizes mixed-language (Chinese‑English) equations.
  • Enhanced Robustness in Table Parsing: Handles challenging cases such as rotated tables, borderless tables, and tables with partial borders.

Benchmark Performance

On the olmOCR-bench benchmark, MinerU2.5-2509-1.2B achieves the following scores:

Benchmark Score
Overall 75.2
Arxiv Math 76.6
Old Scans Math 54.6
Table Tests 84.9

It ranks 14th overall, 15th for Arxiv Math, and 15th for Old Scans Math in the olmOCR-bench leaderboard.

MinerU2.5 architecture diagram showing two-stage parsing pipeline Example of MinerU2.5 document parsing output highlighting layout, formulas, and tables
not yet live

We're benchmarking and onboarding MinerU2.5 2509 1.2B as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.

related vision-language models

compare all →