models / vision-language · coming soon

Qwen2.5-VL-7B-Instruct

Qwen/Qwen2.5-VL-7B-Instruct

A popular open vision-language model, with 9.8M downloads a month. gigarouter benchmarks and hosts it as an OpenAI-compatible API.

est. price

~$1.341

/ 1k images · estimated, set at launch

API providers

downloads / mo

9.8M

license

apache-2.0

about this model

Qwen2.5-VL-7B-Instruct is a vision-language model hosted on gigarouter as an OpenAI-compatible API. It processes images, videos, and text to perform tasks such as visual question answering, document understanding, visual localization, and structured output generation.

Capabilities

Visual understanding — recognizes objects, text, charts, icons, and layouts within images.
Agentic behavior — can reason and dynamically direct tools, enabling computer and phone use.
Long video comprehension — understands videos over 1 hour and can pinpoint specific events with temporal localization.
Visual localization — outputs bounding boxes or points for objects, with stable JSON for coordinates and attributes.
Structured outputs — extracts structured data from invoices, forms, and tables for finance and commerce applications.

Architecture

The model extends dynamic resolution to the temporal dimension via dynamic FPS sampling and updates mRoPE with absolute time alignment. Its vision encoder uses window attention, SwiGLU, and RMSNorm, aligned with the Qwen2.5 LLM backbone.

Architecture diagram

Benchmark Performance

Selected results on standard benchmarks:

Benchmark	Score
DocVQA (test)	95.7
ChartQA (test)	87.3
MathVista (testmini)	68.2
OCRBench	864
Video-MME (w/ subs)	71.6
MVBench	69.6
ScreenSpot	84.7
Android Control (Low EM)	93.7

On image benchmarks, Qwen2.5-VL-7B outperforms comparable models (InternVL2.5-8B, GPT-4o-mini, Qwen2-VL-7B) on tasks including MMMU-Pro, DocVQA, InfoVQA, ChartQA, MMVet, MathVista, and OCRBench. For video, it surpasses its predecessor on MVBench, PerceptionTest, and Video-MME. Agent benchmarks confirm strong performance in screen grounding and mobile control.

not yet live

We're benchmarking and onboarding Qwen2.5-VL-7B-Instruct as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.