skip to content
gigarouter gigarouter
models / vision-language · coming soon

Unlimited-OCR

sahilchachra/Unlimited-OCR-GGUF

published Jun 2026 · updated Jul 2026

Unlimited-OCR is a 3B vision-language model for one-shot, long-horizon document OCR and parsing, supporting multilingual text recognition with bounding boxes.

status
coming soon
API providers
0
downloads / mo
43.7K
license
mit

specs

TaskOCR / document parsing
ArchitectureDeepSeek-OCR (DeepEncoder vision + DeepSeek-V2 MoE text decoder)
Parameters3B
LicenseMIT
TaskOCR / document parsing
ArchitectureDeepSeek-OCR (DeepEncoder vision + DeepSeek-V2 MoE text decoder)
Parameters3B
LicenseMIT

about this model

Unlimited-OCR is a 3-billion-parameter vision-language model specialized in one-shot, long-horizon document parsing and multilingual OCR. Built on the DeepSeek-OCR architecture, it combines a DeepEncoder vision tower comprising SAM-ViT-B and CLIP-L/14 with a DeepSeek-V2 Mixture-of-Experts text decoder. The model processes images at 1024×1024 resolution and outputs structured text, Markdown, or bounding-box-grounded extractions.

Key Capabilities and Strengths

  • One-shot parsing of dense, multi-page documents without sliding windows.
  • Layout-aware Markdown conversion that preserves tables, headings, and reading order.
  • Grounding mode that interleaves recognized text with spatial bounding boxes (<|det|>...</det>).
  • Support for plain text OCR, figure/chart parsing, and referring expression comprehension.

Benchmark Performance

On the ParseBench dataset, Unlimited-OCR achieves the following scores:

MetricScoreRank
Mean46.1713
Text Content86.819
Layout71.526
Table70.2112

Technical Details

The model is licensed under MIT. It processes input images at 1024×1024 with a 16× downsample factor. The vision encoder runs at FP16 precision, while the text decoder can be quantized. Through gigarouter, Unlimited-OCR is available as an OpenAI-compatible API, eliminating the need for local inference setup.

best for

FAQ

What is Unlimited-OCR best for?

It is best for one-shot document parsing, converting images to Markdown, extracting text with bounding boxes, and locating specific text in documents.

What quantizations are available?

K-quants (BF16, Q8_0, Q6_K, Q5_K_M, Q5_K_S, Q4_K_M, Q4_K_S, Q3_K_M) and i-quants (IQ4_XS, IQ4_NL, IQ3_M, IQ3_XXS, IQ2_M) are available. The recommended default is Q4_K_M.

How do I call Unlimited-OCR via the gigarouter API?

Use the gigarouter OpenAI-compatible endpoint with your API key. Send a chat completion request with an image URL (base64 data URL) and the appropriate prompt (e.g., "<|grounding|>Convert the document to markdown.").

What license does Unlimited-OCR use?

It is released under the MIT license (inherited from the base model).

What input and output formats are supported?

Input: an image (supported formats like PNG/JPEG) plus a text instruction. Output: Markdown text with optional bounding boxes in tokens like <|det|>...</|det|> when using the <|grounding|> prefix.

What is Unlimited-OCR best for?

It is best for one-shot document parsing, converting images to Markdown, extracting text with bounding boxes, and locating specific text in documents.

What quantizations are available?

K-quants (BF16, Q8_0, Q6_K, Q5_K_M, Q5_K_S, Q4_K_M, Q4_K_S, Q3_K_M) and i-quants (IQ4_XS, IQ4_NL, IQ3_M, IQ3_XXS, IQ2_M) are available. The recommended default is Q4_K_M.

How do I call Unlimited-OCR via the gigarouter API?

Use the gigarouter OpenAI-compatible endpoint with your API key. Send a chat completion request with an image URL (base64 data URL) and the appropriate prompt (e.g., "<|grounding|>Convert the document to markdown.").

What license does Unlimited-OCR use?

It is released under the MIT license (inherited from the base model).

What input and output formats are supported?

Input: an image (supported formats like PNG/JPEG) plus a text instruction. Output: Markdown text with optional bounding boxes in tokens like <|det|>...</det|> when using the <|grounding|> prefix.

not yet live

We're benchmarking and onboarding Unlimited-OCR as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.

related vision-language models

compare all →