
Hosted image-to-text models

36 models · 0 live as APIs · benchmarked & compared

Image-to-text models convert images into text or structured output. They cover three broad jobs: optical character recognition (OCR) extracts printed or handwritten text from scanned documents and photos; image captioning generates descriptive text for accessibility, content moderation, or metadata; and specialist models handle document layout analysis, orientation detection, or domain-specific inputs such as manga panels. For example, microsoft/trocr-small-handwritten transcribes handwritten text, PaddlePaddle/PP-OCRv5_server_det locates text regions in an image, and PaddlePaddle/PP-OCRv5_server_rec reads the text inside those regions. Salesforce/blip-image-captioning-base produces natural-language captions, and numind/NuExtract3 extracts structured data from document images.

In production, these models are typically chained into pipelines. A common pattern is document processing: first detect text regions, then recognize the characters in each region, and finally parse the output into actionable fields. Some systems add preprocessing before extraction: orientation correction (PaddlePaddle/PP-LCNet_x1_0_doc_ori) and unwarping of curved or skewed pages (PaddlePaddle/UVDoc). Choosing a model is a trade-off between size, quality, and speed. Smaller models offer lower latency and lower compute cost but may lose accuracy on noisy or complex inputs. Larger models deliver higher-quality results at the expense of throughput. Domain-specific models, such as kha-white/manga-ocr-base, can outperform general-purpose OCR on their target data.
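The detect, recognize, parse pattern can be sketched as a small pipeline of pluggable stages. This is a minimal sketch, not any listed model's API: `Pipeline`, `parse_key_value`, `fake_detect`, and `fake_recognize` are hypothetical names, and the two stub stages return canned values in place of real detection and recognition models.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1) in pixels


@dataclass
class Pipeline:
    detect: Callable[[bytes], List[Box]]          # image -> text regions
    recognize: Callable[[bytes, Box], str]        # image + region -> text
    parse: Callable[[List[str]], Dict[str, str]]  # text lines -> fields

    def run(self, image: bytes) -> Dict[str, str]:
        boxes = self.detect(image)
        # Read regions top-to-bottom, then left-to-right.
        boxes = sorted(boxes, key=lambda b: (b[1], b[0]))
        lines = [self.recognize(image, box) for box in boxes]
        return self.parse(lines)


def parse_key_value(lines: List[str]) -> Dict[str, str]:
    """Turn 'Key: value' lines into a dict; lines without a colon are skipped."""
    fields: Dict[str, str] = {}
    for line in lines:
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip().lower()] = value.strip()
    return fields


# Stub stages (ASSUMPTION: stand-ins for real detection/recognition models).
def fake_detect(image: bytes) -> List[Box]:
    # Deliberately returned out of reading order.
    return [(10, 40, 200, 60), (10, 10, 200, 30)]


def fake_recognize(image: bytes, box: Box) -> str:
    canned = {10: "Invoice: 0042", 40: "Total: 19.99"}
    return canned[box[1]]  # look up by the region's top edge


pipeline = Pipeline(fake_detect, fake_recognize, parse_key_value)
result = pipeline.run(b"")
print(result)
```

Keeping each stage behind a plain callable makes it straightforward to swap a small model for a larger one, or to insert an orientation or unwarping step, without touching the rest of the chain.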

For most call volumes, a hosted API removes the operational burden of provisioning GPUs, managing infrastructure, and handling scaling, while still offering pay-as-you-go pricing and consistent performance.
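One way to reason about where hosted pricing stops being the cheaper option is a break-even estimate. The sketch below is illustrative only: the per-1k price is one of the approximate prices listed on this page, while the GPU hourly rate is a hypothetical figure chosen for the arithmetic, and the calculation ignores engineering time and whether a single GPU could actually serve that volume.

```python
HOURS_PER_MONTH = 730  # average number of hours in a month


def hosted_cost(images: float, price_per_1k: float) -> float:
    """Pay-as-you-go cost in dollars for a given number of images."""
    return images / 1000 * price_per_1k


def break_even_images(price_per_1k: float, gpu_hourly: float,
                      hours: float = HOURS_PER_MONTH) -> float:
    """Monthly volume at which hosted spend equals a fixed self-hosted GPU bill."""
    return gpu_hourly * hours / price_per_1k * 1000


listed_price = 0.094  # ~$ per 1k images, as listed for several models on this page
gpu_hourly = 1.00     # ASSUMPTION: hypothetical GPU rental rate in $/hour

volume = break_even_images(listed_price, gpu_hourly)
print(f"break-even at about {volume / 1e6:.1f}M images per month")
```

Under these assumed numbers the break-even sits at roughly 7.8 million images per month; below that volume the pay-as-you-go bill is smaller than the fixed GPU bill.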

compare

| model | params | downloads/mo | price | status |
| --- | --- | --- | --- | --- |
| Salesforce/blip-image-captioning-base | - | 1.9M | at launch | coming soon |
| Salesforce/blip-image-captioning-large | 469.7M | 752.9K | ~$0.094 / 1k images | coming soon |
| PaddlePaddle/PP-OCRv5_server_det | - | 587.3K | at launch | coming soon |
| numind/NuExtract3 | 4539.3M | 520.7K | ~$1.341 / 1k images | coming soon |
| PaddlePaddle/UVDoc | - | 512.8K | at launch | coming soon |
| microsoft/trocr-small-handwritten | - | 448.6K | at launch | coming soon |
| PaddlePaddle/PP-LCNet_x1_0_doc_ori | - | 445.3K | at launch | coming soon |
| kha-white/manga-ocr-base | - | 389.4K | at launch | coming soon |
| ibm-granite/granite-vision-3.3-2b | 2975.4M | 343.3K | ~$0.626 / 1k images | coming soon |
| PaddlePaddle/PP-LCNet_x1_0_textline_ori | - | 274.6K | at launch | coming soon |
| microsoft/trocr-base-printed | 333.3M | 251.5K | ~$0.094 / 1k images | coming soon |
| lightonai/LightOnOCR-1B-1025 | 1161.2M | 199.9K | ~$0.235 / 1k images | coming soon |
| PaddlePaddle/PP-OCRv5_server_rec | - | 189.4K | at launch | coming soon |
| microsoft/trocr-large-handwritten | - | 182.4K | at launch | coming soon |
| microsoft/kosmos-2-patch14-224 | 1664.5M | 166.7K | ~$0.626 / 1k images | coming soon |
| naver-clova-ix/donut-base | - | 166K | at launch | coming soon |
| microsoft/trocr-base-stage1 | 384.3M | 149K | ~$0.094 / 1k images | coming soon |
| facebook/nougat-base | 348.7M | 145.4K | ~$0.094 / 1k images | coming soon |
| microsoft/trocr-large-printed | 608.1M | 133K | ~$0.235 / 1k images | coming soon |
| PaddlePaddle/PP-OCRv5_mobile_det | - | 129.4K | at launch | coming soon |
| microsoft/trocr-base-handwritten | 333.3M | 124K | ~$0.094 / 1k images | coming soon |
| alibaba-damo/mgp-str-base | 148M | 110.8K | ~$0.047 / 1k images | coming soon |
| PaddlePaddle/PP-OCRv6_medium_det | - | 89K | at launch | coming soon |
| PaddlePaddle/PP-OCRv6_medium_rec | - | 79.9K | at launch | coming soon |
| PaddlePaddle/PP-OCRv5_mobile_rec | - | 74.5K | at launch | coming soon |
| rtr46/meiki.txt.recognition.v0 | - | 65.6K | at launch | coming soon |
| nlpconnect/vit-gpt2-image-captioning | - | 64.4K | at launch | coming soon |
| PaddlePaddle/latin_PP-OCRv5_mobile_rec | - | 37.5K | at launch | coming soon |
| microsoft/trocr-small-printed | 61.4M | 36.3K | ~$0.047 / 1k images | coming soon |
| facebook/nougat-small | 247.4M | 28.5K | ~$0.094 / 1k images | coming soon |
| unsloth/GLM-OCR | - | 28K | at launch | coming soon |
| numind/NuMarkdown-8B-Thinking | 8292.2M | 26.1K | ~$1.341 / 1k images | coming soon |
| PaddlePaddle/en_PP-OCRv4_mobile_rec | - | 24.6K | at launch | coming soon |
| PaddlePaddle/PP-DocLayout_plus-L | - | 21.3K | at launch | coming soon |
| PaddlePaddle/PP-DocBlockLayout | - | 18.6K | at launch | coming soon |
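
To compare the listed prices at a concrete volume, a short calculation helps. The sketch below copies five of the approximate per-1k-image prices from the listing; the function names, the 250,000 images per month figure, and the $60 budget are hypothetical choices for illustration. Since the prices are marked approximate and no model is live yet, the results are estimates.

```python
from typing import Dict, List

# Approximate prices in dollars per 1k images, copied from the listing.
PRICES_PER_1K: Dict[str, float] = {
    "alibaba-damo/mgp-str-base": 0.047,
    "microsoft/trocr-base-printed": 0.094,
    "lightonai/LightOnOCR-1B-1025": 0.235,
    "ibm-granite/granite-vision-3.3-2b": 0.626,
    "numind/NuExtract3": 1.341,
}


def monthly_cost(model: str, images_per_month: int) -> float:
    """Estimated monthly spend in dollars for one model, rounded to cents."""
    return round(images_per_month / 1000 * PRICES_PER_1K[model], 2)


def within_budget(images_per_month: int, budget: float) -> List[str]:
    """Models whose estimated monthly cost fits the budget, cheapest first."""
    ranked = sorted(PRICES_PER_1K, key=PRICES_PER_1K.get)
    return [m for m in ranked if monthly_cost(m, images_per_month) <= budget]


volume = 250_000  # ASSUMPTION: hypothetical monthly image count
for name in PRICES_PER_1K:
    print(f"{name}: ${monthly_cost(name, volume):.2f}")

affordable = within_budget(volume, budget=60.0)  # ASSUMPTION: hypothetical budget
print(affordable)
```

Price is only one axis of the trade-off: a cheaper, smaller model that fits the budget may still fall short on noisy inputs, so a filter like this is a starting point for evaluation rather than a final choice.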