Qwen2.5-0.5B-Instruct
Qwen/Qwen2.5-0.5B-Instruct
A popular open text-gen model. gigarouter benchmarks and hosts it as an OpenAI-compatible API.
about this model
Model Overview
Qwen2.5-0.5B-Instruct is a causal language model from the Qwen2.5 series, optimized for chat and instruction-following tasks. It is available as a hosted, OpenAI-compatible API on gigarouter.
Key Capabilities
- Improved coding and mathematics performance relative to Qwen2.
- Strong instruction following, with resilience to diverse system prompts for role-play and condition-setting.
- Proficient in generating long texts (up to 8,192 tokens) and understanding structured data such as tables.
- Supports structured output generation, particularly JSON.
- Full context length of 32,768 tokens, with generation context up to 8,192 tokens.
- Multilingual support for over 29 languages including Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, and Arabic.
Architecture
- Parameters: 0.49B total (0.36B non-embedding).
- Layers: 24.
- Attention heads: 14 for Q, 2 for KV (Grouped Query Attention).
- Activation: SwiGLU; normalization: RMSNorm; includes Attention QKV bias and tied word embeddings.
- Training: pretraining followed by post-training (instruction tuning).
Evaluation
Detailed evaluation results and speed benchmarks are available in the official blog and documentation.
Best For
This model is suited for applications requiring efficient, lightweight instruction-tuned chat, structured data handling, and multilingual support within a constrained parameter budget. It is particularly effective for tasks that benefit from long context or structured output generation.
We're benchmarking and onboarding Qwen2.5-0.5B-Instruct as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.