models / image-to-text · coming soon

manga-ocr-base

kha-white/manga-ocr-base

A popular open image-to-text model, with 389.4K downloads a month. gigarouter benchmarks and hosts it as an OpenAI-compatible API.

status

coming soon

API providers

downloads / mo

389.4K

license

apache-2.0

about this model

kha-white/manga-ocr-base is an image-to-text model that performs optical character recognition (OCR) for printed Japanese text, with a primary focus on Japanese manga.

Key Strengths

Robust recognition of both vertical and horizontal text layouts.
Handles text with furigana (phonetic annotations) accurately.
Works reliably on text overlaid on images, such as speech bubbles.
Supports a wide variety of fonts, styles, and low-quality image inputs.

Best for

Developers requiring high-quality Japanese text extraction from manga pages, scanned comics, or any printed Japanese material where conventional OCR may struggle with layout complexity or image degradation.

Architecture

The model uses a Vision Encoder Decoder framework, built on Hugging Face Transformers. Underlying code and training details are available in the official repository.

Hosted by Gigarouter

This model is offered as a managed, OpenAI-compatible API. No installation or local setup is required; simply call the endpoint with an image and receive text output.

not yet live

We're benchmarking and onboarding manga-ocr-base as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.