skip to content
gigarouter gigarouter
models / vision-language · coming soon

DeepSeek-OCR-2

deepseek-ai/DeepSeek-OCR-2

A popular open vision-language model, with 3.3M downloads a month. gigarouter benchmarks and hosts it as an OpenAI-compatible API.

est. price
~$0.626
/ 1k images · estimated, set at launch
API providers
0
downloads / mo
3.3M
license
apache-2.0

about this model

DeepSeek-OCR 2 is a vision-language model (VLM) specialized for optical character recognition (OCR) and document understanding. It processes images and natural language prompts to extract text, with a primary focus on converting documents to structured markdown or performing free-form OCR. The model introduces a "Visual Causal Flow" architecture designed to achieve more human-like visual encoding.

Capabilities

  • Document OCR with layout-aware markdown output via the <image>\n<|grounding|>Convert the document to markdown. prompt.
  • Free OCR without layout information using the <image>\nFree OCR. prompt.
  • Dynamic resolution support: default configuration uses (0–6)×768×768 tiles plus one 1024×1024 tile, yielding (0–6)×144 + 256 visual tokens.

Best For

Developers needing high-quality, prompt-controlled OCR from images of documents, tables, or forms, especially when retaining spatial layout in markdown output is required.

DeepSeek-OCR 2 visual representation

For further details, refer to the arXiv paper and the GitHub repository.

not yet live

We're benchmarking and onboarding DeepSeek-OCR-2 as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.