tasks / vision-language

Hosted vision-language models

32 models · 0 live as APIs · benchmarked & compared

Vision-language models process both images and text, enabling tasks such as extracting structured data from scanned documents, answering questions about photographs, and generating captions for accessibility. For example, deepseek-ai/DeepSeek-OCR-2 is specialised for optical character recognition, while series like Qwen/Qwen2.5-VL-7B-Instruct and Qwen/Qwen2-VL-2B-Instruct support visual question answering and image-to-text generation.

Document digitisation and invoice parsing
Automated content moderation on visual platforms
Visual search and retrieval-augmented generation (RAG) pipelines

In production, these models are often integrated into RAG workflows or multimodal chatbots. Choosing among the 32 models listed here involves balancing latency, accuracy, and cost: larger architectures such as Qwen/Qwen3.6-35B-A3B-FP8 yield higher quality on complex reasoning but require more compute, while quantised or smaller models like cyankiwi/gemma-4-26B-A4B-it-AWQ-4bit or Qwen/Qwen3-VL-4B-Instruct serve well at lower throughputs. For most call volumes, calling a hosted API eliminates infrastructure overhead and enables elastic scaling — benefits gigarouter provides through its benchmarked, OpenAI-compatible endpoints. (Currently 0 models are live; the remainder are being onboarded.)

compare

model	params	downloads/mo	price	status
Qwen/Qwen2.5-VL-7B-Instruct	8292.2M	9.8M	~$1.341 / 1k images	coming soon
Qwen/Qwen3.6-35B-A3B-FP8	35953.9M	6.2M	~$1.341 / 1k images	coming soon
Qwen/Qwen2.5-VL-3B-Instruct	3754.6M	5.3M	~$0.626 / 1k images	coming soon
cyankiwi/gemma-4-26B-A4B-it-AWQ-4bit	26554.3M	5.1M	~$1.341 / 1k images	coming soon
Qwen/Qwen3.6-27B-FP8	27782.9M	4.9M	~$1.341 / 1k images	coming soon
Qwen/Qwen3-VL-4B-Instruct	4437.8M	3.7M	~$1.341 / 1k images	coming soon
Qwen/Qwen2-VL-2B-Instruct	2209M	3.6M	~$0.626 / 1k images	coming soon
deepseek-ai/DeepSeek-OCR-2	3389.1M	3.3M	~$0.626 / 1k images	coming soon
llava-hf/llava-1.5-7b-hf	7063.4M	3.2M	~$1.341 / 1k images	coming soon
RedHatAI/gemma-4-31B-it-FP8-block	31274.9M	3.2M	~$1.341 / 1k images	coming soon
HauhauCS/Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive	-	3M	at launch	coming soon
microsoft/Florence-2-base	231.6M	2.6M	~$0.094 / 1k images	coming soon
Qwen/Qwen3.5-0.8B	873.4M	2.5M	~$0.235 / 1k images	coming soon
Qwen/Qwen3-VL-2B-Instruct	2127.5M	2.1M	~$0.626 / 1k images	coming soon
RedHatAI/gemma-4-26B-A4B-it-FP8-Dynamic	26560.9M	2M	~$1.341 / 1k images	coming soon
cyankiwi/Qwen3.6-35B-A3B-AWQ-4bit	35951.8M	1.8M	~$1.341 / 1k images	coming soon
Qwen/Qwen2-VL-7B-Instruct	8291.4M	1.8M	~$1.341 / 1k images	coming soon
Qwen/Qwen2-VL-7B-Instruct-AWQ	8291.4M	1.8M	~$1.341 / 1k images	coming soon
unsloth/Qwen3.6-27B-MTP-GGUF	-	1.8M	at launch	coming soon
Qwen/Qwen2.5-VL-7B-Instruct-AWQ	8292.2M	1.7M	~$1.341 / 1k images	coming soon
vikhyatk/moondream2	1927.2M	1.6M	~$0.626 / 1k images	coming soon
unsloth/gemma-4-26B-A4B-it-GGUF	-	1.5M	at launch	coming soon
OpenGVLab/InternVL2-2B	2205.8M	1.5M	~$0.626 / 1k images	coming soon
empero-ai/Qwythos-9B-Claude-Mythos-5-1M-GGUF	-	1.4M	at launch	coming soon
baidu/Unlimited-OCR	3336.1M	885K	~$0.626 / 1k images	coming soon
unsloth/Qwen3.6-35B-A3B-GGUF	-	874.6K	at launch	coming soon
unsloth/Qwen3.6-35B-A3B-MTP-GGUF	-	734.7K	at launch	coming soon
DavidAU/Qwen3.6-40B-Claude-4.6-Opus-Deckard-Heretic-Uncensored-Thinking-NEO-CODE-Di-IMatrix-MAX-GGUF	-	519.4K	at launch	coming soon
HauhauCS/Gemma4-12B-QAT-Uncensored-HauhauCS-Balanced	-	71.7K	at launch	coming soon
Jackrong/Qwopus3.6-35B-A3B-Coder-MTP-GGUF	-	44.8K	at launch	coming soon
HauhauCS/Gemma4-26B-A4B-QAT-Uncensored-HauhauCS-Balanced-MTP	-	44.5K	at launch	coming soon
sahilchachra/Unlimited-OCR-GGUF	-	43.7K	at launch	coming soon

get a key + $25 free →docs