skip to content
gigarouter gigarouter
models / image-to-text · coming soon

mgp-str-base

alibaba-damo/mgp-str-base

A popular open image-to-text model, with 110.8K downloads a month. gigarouter benchmarks and hosts it as an OpenAI-compatible API.

est. price
~$0.047
/ 1k images · estimated, set at launch
API providers
0
downloads / mo
110.8K

about this model

MGP-STR (base-sized) is an image-to-text model for scene text recognition that operates as a pure vision transformer, combining a ViT backbone with multi-granularity prediction modules to decode character, subword, and word-level outputs from a single input image.

Model description

MGP-STR uses a ViT encoder (initialized from DeiT-base weights) to process 32x128 pixel input images as a sequence of 4x4 patches. The encoder output is passed through A^3 modules that select and integrate token combinations corresponding to individual characters. Subword prediction heads based on BPE and WordPiece are added to implicitly model language information. All granularity predictions are merged via a simple fusion strategy to produce the final text output.

Training data and origin

The model was trained on MJSynth and SynthText, and introduced in the paper “Multi-Granularity Prediction for Scene Text Recognition” (ECCV 2022) by Alibaba Research. The base version is published under the name alibaba-damo/mgp-str-base.

Example output

The model accepts text-line images such as the one below and returns a recognized string:

Intended use

This model is designed for optical character recognition (OCR) on cropped text images from natural scenes. It is well-suited as a drop-in specialist OCR component for applications requiring high‑accuracy scene text recognition. Gigarouter hosts this model as a managed API compatible with the OpenAI API schema, eliminating the need for local installation or GPU infrastructure.

not yet live

We're benchmarking and onboarding mgp-str-base as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.