trocr-large-printed
microsoft/trocr-large-printed
A popular open image-to-text model, with 133K downloads a month. gigarouter benchmarks and hosts it as an OpenAI-compatible API.
about this model
microsoft/trocr-large-printed is a Transformer-based optical character recognition (OCR) model fine-tuned on the SROIE dataset for printed text. It converts images of single text lines into digital text, leveraging an encoder-decoder architecture where the image encoder is initialized from BEiT and the text decoder from RoBERTa. Images are processed as sequences of 16x16 patches with absolute position embeddings, enabling the model to autoregressively generate token sequences.
Key Strengths
- State-of-the-art accuracy on printed text OCR, as demonstrated by its fine-tuning on the SROIE dataset (a standard benchmark for printed receipt OCR).
- Transformer-based architecture eliminates the need for traditional layout or segmentation steps, providing end-to-end recognition.
- Large model variant (approximately 325M parameters) for high-fidelity transcription.
Best For
This model is optimized for printed text in structured documents, such as receipts, invoices, and forms. It excels at recognizing single text-line images with clear, machine-printed characters.
Background
Introduced in “TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models” (Li et al., 2021, arXiv:2109.10282). The model is fine-tuned on the SROIE dataset and hosted as a managed API on gigarouter, providing OpenAI-compatible endpoints for image-to-text inference.
We're benchmarking and onboarding trocr-large-printed as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.