skip to content
gigarouter gigarouter
models / specialist model · coming soon

Infinity Parser 7B

infly/Infinity-Parser-7B

published Oct 2025 · updated Feb 2026

Infinity Parser 7B is a vision-language model that parses scanned documents into structured text using reinforcement learning with layout-aware rewards.

status
coming soon
API providers
0
downloads / mo
4.2K

specs

TaskScanned document parsing (OCR, table and formula extraction, reading order detection)
ArchitectureVision-Language Model (VLM) based on Qwen2.5-VL-7B
Parameters7B
LicenseApache 2.0

about this model

Infinity-Parser-7B is a vision-language model for end-to-end scanned document parsing, trained with reinforcement learning to preserve layout and content fidelity.

The model uses the LayoutRL framework, optimizing a composite reward of normalized edit distance, paragraph count accuracy, and reading order preservation. It was trained on Infinity-Doc-400K, a dataset of 400K scanned documents with rich layout variations and structural annotations.

On benchmarks including OmniDocBench, olmOCR-Bench, PubTabNet, and FinTabNet, Infinity-Parser-7B achieves state-of-the-art performance across diverse document types, languages (English and Chinese), and structural complexities. It substantially outperforms both specialized document parsing pipelines and general-purpose vision-language models while retaining near-equivalent general multimodal understanding (base model: Qwen2.5-VL-7B, evaluated via LMMS-Eval).

Architecture

The training framework combines reinforcement fine-tuning with edit distance, layout, and order-based rewards.

Architecture diagram of Infinity-Parser training framework showing reward components

Benchmark Results

Performance on olmOCR-bench Performance on OmniDocBench Table recognition performance on PubTabNet and FinTabNet General multimodal capability evaluation compared to Qwen2.5-VL-7B

Visual Comparison

Example comparisons showing Infinity-Parser output vs. other methods

Limitations

  • Does not provide layout or bounding box information, limiting support for downstream structured reconstruction.
  • Lacks perception and structured extraction for charts and figures.

The model was accepted as a Findings paper at ACL 2026. Licensed under Apache 2.0. For further details, see the arXiv paper and GitHub repository.

best for

FAQ

What is the main task of Infinity Parser 7B?

It is a vision-language model for end-to-end scanned document parsing, including OCR, table and formula extraction, and reading order detection.

What base model is Infinity Parser 7B built on?

It is built on Qwen2.5-VL-7B.

What is the license for Infinity Parser 7B?

It is licensed under Apache 2.0.

Does Infinity Parser 7B output bounding boxes or layout information?

No, the current model does not provide layout or bounding box information.

How can I call Infinity Parser 7B via the API?

Use the gigarouter OpenAI-compatible endpoint with an API key.

not yet live

We're benchmarking and onboarding Infinity Parser 7B as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.

related specialist model models

compare all →