PP-OCRv6_medium_det

PaddlePaddle/PP-OCRv6_medium_det

A popular open image-to-text model, with 89K downloads a month. gigarouter is benchmarking it and will host it as an OpenAI-compatible API.

status

coming soon

API providers

downloads / mo

89K

license

apache-2.0

about this model

PP-OCRv6_medium_det is a text detection model for image-to-text tasks, part of the PP-OCRv6 family developed by PaddlePaddle. It is designed to locate text regions in images with high accuracy, serving as the detection component in an OCR pipeline.
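
To make the role of the detection stage concrete, here is a minimal sketch of the hand-off between detection and recognition in a generic OCR pipeline. It assumes the detector returns each text region as a polygon of pixel coordinates; the actual output format of PP-OCRv6_medium_det is not described on this page, and the names polygon_to_box, crop, and crop_text_regions are hypothetical helpers, not part of any PaddlePaddle or gigarouter API.

```python
from typing import List, Tuple

Point = Tuple[int, int]            # (x, y) pixel coordinate
Box = Tuple[int, int, int, int]    # (left, top, right, bottom), inclusive
Image = List[List[int]]            # grayscale image as rows of pixel values


def polygon_to_box(polygon: List[Point]) -> Box:
    """Return the axis-aligned box that encloses a detected text polygon."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return min(xs), min(ys), max(xs), max(ys)


def crop(image: Image, box: Box) -> Image:
    """Crop the image to the box, clamping the box to the image bounds."""
    left, top, right, bottom = box
    height = len(image)
    width = len(image[0]) if height else 0
    left, top = max(left, 0), max(top, 0)
    right, bottom = min(right, width - 1), min(bottom, height - 1)
    return [row[left:right + 1] for row in image[top:bottom + 1]]


def crop_text_regions(image: Image, polygons: List[List[Point]]) -> List[Image]:
    """Turn detector output (assumed: polygons) into crops for a recognizer."""
    return [crop(image, polygon_to_box(p)) for p in polygons]


if __name__ == "__main__":
    # Toy 4x6 image whose pixel value encodes its position: row * 10 + column.
    image = [[r * 10 + c for c in range(6)] for r in range(4)]
    # One assumed detection: a quadrilateral covering columns 1-3, rows 1-2.
    detections = [[(1, 1), (3, 1), (3, 2), (1, 2)]]
    for region in crop_text_regions(image, detections):
        print(region)
```

Each crop would then be passed to a separate text recognition model. Real pipelines often rectify rotated or curved polygons before recognition; the axis-aligned crop here is a simplification.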

Key strengths

Compact model size: the PP-OCRv6 family ranges from 1.5M to 34.5M parameters, enabling deployment on resource-constrained devices.
Competitive accuracy: according to internal benchmarks cited in the model card, PP-OCRv6 models surpass billion-scale vision-language models on OCR tasks.
Optimized for real-world text detection: trained on diverse datasets to handle various fonts, scales, and orientations.
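
As a rough illustration of what the parameter counts above mean for deployment, the sketch below estimates the storage needed for the weights alone at common numeric precisions. It is back-of-the-envelope arithmetic, not a measurement: it ignores activations, runtime overhead, and file-format metadata, and whether fp16 or int8 variants of these models are published is an assumption. The function name weight_size_mb is hypothetical.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weight_size_mb(num_params: float, dtype: str = "fp32") -> float:
    """Approximate weight storage in megabytes (1 MB = 10**6 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1_000_000


if __name__ == "__main__":
    # Endpoints of the parameter range quoted above for the PP-OCRv6 family.
    for label, params in (("smallest", 1.5e6), ("largest", 34.5e6)):
        sizes = ", ".join(
            f"{dtype}: {weight_size_mb(params, dtype):.1f} MB"
            for dtype in BYTES_PER_PARAM
        )
        print(f"{label} ({params / 1e6:.1f}M params) -> {sizes}")
```

By the same arithmetic, a model with one billion parameters needs about 4,000 MB at fp32, which is the gap the "compact" claim refers to.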

Best for

OCR pipelines requiring fast, lightweight text detection.
Scenarios where computational resources are limited (e.g., mobile, edge).
Applications needing high detection precision without the overhead of large models.

Benchmark comparisons

The model card includes a table comparing the performance of PP-OCRv6 models with larger vision-language models; detailed metrics are available there.

Once it is live, gigarouter will serve the model as a managed API, providing low-latency inference without requiring local installation or model management.

not yet live

We're benchmarking and onboarding PP-OCRv6_medium_det as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.