models / image-to-text · coming soon

trocr-large-printed

microsoft/trocr-large-printed

A popular open image-to-text model, with 133K downloads a month. gigarouter benchmarks and hosts it as an OpenAI-compatible API.

est. price

~$0.235

/ 1k images · estimated, set at launch

API providers

downloads / mo

133K

about this model

microsoft/trocr-large-printed is a Transformer-based optical character recognition (OCR) model fine-tuned on the SROIE dataset for printed text. It converts images of single text lines into digital text, leveraging an encoder-decoder architecture where the image encoder is initialized from BEiT and the text decoder from RoBERTa. Images are processed as sequences of 16x16 patches with absolute position embeddings, enabling the model to autoregressively generate token sequences.

Key Strengths

State-of-the-art accuracy on printed text OCR, as demonstrated by its fine-tuning on the SROIE dataset (a standard benchmark for printed receipt OCR).
Transformer-based architecture eliminates the need for traditional layout or segmentation steps, providing end-to-end recognition.
Large model variant (approximately 325M parameters) for high-fidelity transcription.

Best For

This model is optimized for printed text in structured documents, such as receipts, invoices, and forms. It excels at recognizing single text-line images with clear, machine-printed characters.

Background

Introduced in “TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models” (Li et al., 2021, arXiv:2109.10282). The model is fine-tuned on the SROIE dataset and hosted as a managed API on gigarouter, providing OpenAI-compatible endpoints for image-to-text inference.

not yet live

We're benchmarking and onboarding trocr-large-printed as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.