UI-Venus 1.5 2B

inclusionAI/UI-Venus-1.5-2B

published Feb 2026 · updated Feb 2026

UI-Venus 1.5 2B is a vision-language model (VLM) that serves as a unified GUI agent for UI grounding and navigation tasks.

est. price

~$0.626

/ 1k images · estimated, set at launch

API providers

downloads / mo

946

license

apache-2.0

specs

Task	GUI Agent / UI Grounding & Navigation
Architecture	Qwen3-VL-based dense transformer
Parameters	2 billion
License	Apache 2.0

about this model

UI-Venus-1.5-2B is a vision-language model (VLM) for GUI agent tasks that performs visual grounding and autonomous navigation across mobile, desktop, and web interfaces. It is the 2B-parameter dense variant of the UI-Venus-1.5 family, which also includes 8B and 30B-A3B mixture-of-experts variants.

Benchmark Performance

UI-Venus-1.5 establishes state-of-the-art results on key grounding and agent benchmarks. The 2B variant achieves 57.7% on ScreenSpot-Pro, with the 8B and 30B-A3B models reaching 68.4% and 69.6% respectively. On VenusBench-GD the family leads with 75.0% (30B-A3B). In mobile navigation, the 30B-A3B model scores 77.6% on AndroidWorld and 55.1%/68.1% on AndroidLab, outperforming prior specialist and general-purpose VLMs.

Benchmark performance of UI-Venus-1.5 across grounding and navigation benchmarks

Grounding benchmark results for UI-Venus-1.5

Training Methodology

UI-Venus-1.5 employs a four-stage training pipeline: Mid-Training on 10B tokens from 30+ GUI datasets, offline RL, online RL with full-trajectory rollouts, and model merging that unifies grounding, web, and mobile specialists into a single checkpoint. This curriculum enables robust cross-platform generalization and long-horizon navigation in real-world environments.

best for

·Automated mobile app testing and navigation
·Web browser automation and element grounding
·GUI element detection and coordinate prediction

FAQ

What input does UI-Venus 1.5 2B accept?

It accepts a screenshot image and a textual instruction (e.g., "tap the login button").

What output does the model produce?

It outputs action predictions (e.g., tap coordinates, scroll) or grounded element locations depending on the task.

How can I call this model via the gigarouter API?

Use the OpenAI-compatible endpoint with your API key, sending a chat completion request with a system prompt, user message containing image_url and text.

What is the license for UI-Venus 1.5 2B?

It is licensed under Apache 2.0, allowing commercial use.

How does the 2B variant compare to the 8B or 30B versions?

The 2B variant is smaller and faster, suitable for resource-constrained deployments, while larger variants offer higher accuracy on benchmarks like ScreenSpot-Pro and AndroidWorld.

not yet live

We're benchmarking and onboarding UI-Venus 1.5 2B as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.

related vision-language models

compare all →

Qwen2.5-VL-7B-Instruct

9.8M dl/mo

Qwen3.6-35B-A3B-FP8

6.2M dl/mo

Qwen2.5-VL-3B-Instruct

5.3M dl/mo

gemma-4-26B-A4B-it-AWQ-4bit