Unlimited-OCR
sahilchachra/Unlimited-OCR-GGUF
published Jun 2026 · updated Jul 2026
Unlimited-OCR is a 3B vision-language model for one-shot, long-horizon document OCR and parsing, supporting multilingual text recognition with bounding boxes.
specs
| Task | OCR / document parsing |
| Architecture | DeepSeek-OCR (DeepEncoder vision + DeepSeek-V2 MoE text decoder) |
| Parameters | 3B |
| License | MIT |
| Task | OCR / document parsing |
| Architecture | DeepSeek-OCR (DeepEncoder vision + DeepSeek-V2 MoE text decoder) |
| Parameters | 3B |
| License | MIT |
about this model
Unlimited-OCR is a 3-billion-parameter vision-language model specialized in one-shot, long-horizon document parsing and multilingual OCR. Built on the DeepSeek-OCR architecture, it combines a DeepEncoder vision tower comprising SAM-ViT-B and CLIP-L/14 with a DeepSeek-V2 Mixture-of-Experts text decoder. The model processes images at 1024×1024 resolution and outputs structured text, Markdown, or bounding-box-grounded extractions.
Key Capabilities and Strengths
- One-shot parsing of dense, multi-page documents without sliding windows.
- Layout-aware Markdown conversion that preserves tables, headings, and reading order.
- Grounding mode that interleaves recognized text with spatial bounding boxes (
<|det|>...</det>). - Support for plain text OCR, figure/chart parsing, and referring expression comprehension.
Benchmark Performance
On the ParseBench dataset, Unlimited-OCR achieves the following scores:
| Metric | Score | Rank |
|---|---|---|
| Mean | 46.17 | 13 |
| Text Content | 86.81 | 9 |
| Layout | 71.52 | 6 |
| Table | 70.21 | 12 |
Technical Details
The model is licensed under MIT. It processes input images at 1024×1024 with a 16× downsample factor. The vision encoder runs at FP16 precision, while the text decoder can be quantized. Through gigarouter, Unlimited-OCR is available as an OpenAI-compatible API, eliminating the need for local inference setup.
best for
- ·Convert scanned documents to Markdown with layout and bounding boxes
- ·Extract plain text from receipts or invoices
- ·Locate specific text fields (e.g., invoice number) with coordinates
FAQ
It is best for one-shot document parsing, converting images to Markdown, extracting text with bounding boxes, and locating specific text in documents.
K-quants (BF16, Q8_0, Q6_K, Q5_K_M, Q5_K_S, Q4_K_M, Q4_K_S, Q3_K_M) and i-quants (IQ4_XS, IQ4_NL, IQ3_M, IQ3_XXS, IQ2_M) are available. The recommended default is Q4_K_M.
Use the gigarouter OpenAI-compatible endpoint with your API key. Send a chat completion request with an image URL (base64 data URL) and the appropriate prompt (e.g., "<|grounding|>Convert the document to markdown.").
It is released under the MIT license (inherited from the base model).
Input: an image (supported formats like PNG/JPEG) plus a text instruction. Output: Markdown text with optional bounding boxes in tokens like <|det|>...</|det|> when using the <|grounding|> prefix.
It is best for one-shot document parsing, converting images to Markdown, extracting text with bounding boxes, and locating specific text in documents.
K-quants (BF16, Q8_0, Q6_K, Q5_K_M, Q5_K_S, Q4_K_M, Q4_K_S, Q3_K_M) and i-quants (IQ4_XS, IQ4_NL, IQ3_M, IQ3_XXS, IQ2_M) are available. The recommended default is Q4_K_M.
Use the gigarouter OpenAI-compatible endpoint with your API key. Send a chat completion request with an image URL (base64 data URL) and the appropriate prompt (e.g., "<|grounding|>Convert the document to markdown.").
It is released under the MIT license (inherited from the base model).
Input: an image (supported formats like PNG/JPEG) plus a text instruction. Output: Markdown text with optional bounding boxes in tokens like <|det|>...</det|> when using the <|grounding|> prefix.
We're benchmarking and onboarding Unlimited-OCR as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.