Qwen3-TTS-12Hz-0.6B-CustomVoice
Qwen/Qwen3-TTS-12Hz-0.6B-CustomVoice
A popular open text-to-speech model, with 1.2M downloads a month. gigarouter is benchmarking it and onboarding it as a hosted, OpenAI-compatible API.
About This Model
Qwen3-TTS-12Hz-0.6B-CustomVoice is a text-to-speech model that synthesizes speech in 10 languages with fine-grained control over tone, rhythm, and emotional expression via natural language instructions.
Multilingual and Controllable Synthesis
The model supports Chinese, English, Japanese, Korean, German, French, Russian, Portuguese, Spanish, and Italian. It adapts its delivery to natural-language instructions such as "speak in a very happy tone." It ships with 9 premium timbres (preset speakers), each optimized for a native language or dialect.
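As a rough illustration of instruction-controlled synthesis, the sketch below assembles a request body in the shape of the OpenAI speech endpoint (`model`, `input`, `voice`, `instructions`). Whether a given host of this model accepts exactly these fields is an assumption, and the helper name `build_speech_request` is hypothetical. Nothing is sent over the network.

```python
import json

# Repository ID as given on this page.
MODEL_ID = "Qwen/Qwen3-TTS-12Hz-0.6B-CustomVoice"


def build_speech_request(text, speaker, instruction=None):
    """Build an OpenAI-style speech request body (hypothetical helper).

    The field names follow the OpenAI speech endpoint; support for each
    field by a particular host of this model is assumed, not confirmed.
    """
    if not text.strip():
        raise ValueError("text must not be empty")
    body = {"model": MODEL_ID, "input": text, "voice": speaker}
    if instruction:
        # Natural-language style control, e.g. "Speak in a very happy tone."
        body["instructions"] = instruction
    return body


if __name__ == "__main__":
    request = build_speech_request(
        "Hello, and welcome!", "Ryan", "Speak in a very happy tone."
    )
    print(json.dumps(request, indent=2))
```

Keeping the instruction optional means the same helper covers plain synthesis and style-controlled synthesis.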
Low Latency Streaming
Built on Qwen3-TTS-Tokenizer-12Hz, the model achieves end-to-end synthesis latency as low as 97 ms, which makes streaming generation practical for real-time applications.
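For a streaming client, the number that matters in practice is time to first audio chunk. The sketch below measures it for any iterable of byte chunks. It is a generic helper under stated assumptions, not part of the model or of any client library: the name `time_to_first_chunk` is hypothetical, and a client-side measurement includes network and queueing time, so it will not match the 97 ms model-side figure.

```python
import time
from typing import Callable, Iterable, Iterator, Tuple


def time_to_first_chunk(
    chunks: Iterable[bytes],
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[float, Iterator[bytes]]:
    """Return (latency in ms until the first chunk, the full chunk stream).

    The first chunk is consumed to take the measurement and then
    re-emitted, so the caller still receives every chunk in order.
    """
    start = clock()
    iterator = iter(chunks)
    try:
        first = next(iterator)  # blocks until the first chunk arrives
    except StopIteration:
        raise ValueError("stream produced no audio chunks") from None
    latency_ms = (clock() - start) * 1000.0

    def replay() -> Iterator[bytes]:
        yield first
        yield from iterator

    return latency_ms, replay()


if __name__ == "__main__":
    def fake_stream() -> Iterator[bytes]:
        # Stand-in for a real audio stream; the sleep simulates latency.
        time.sleep(0.05)
        yield b"\x00" * 320
        yield b"\x00" * 320

    latency, stream = time_to_first_chunk(fake_stream())
    total_bytes = sum(len(chunk) for chunk in stream)
    print(f"first chunk after {latency:.1f} ms, {total_bytes} bytes total")
```

Injecting the clock keeps the helper easy to test without real delays.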
Supported Speakers
| Speaker | Voice Description | Native Language |
|---|---|---|
| Vivian | Bright young female voice. | Chinese |
| Serena | Warm, gentle young female voice. | Chinese |
| Uncle_Fu | Seasoned male voice, mellow timbre. | Chinese |
| Dylan | Youthful Beijing male voice. | Chinese (Beijing) |
| Eric | Lively Chengdu male voice. | Chinese (Sichuan) |
| Ryan | Dynamic male voice with rhythm. | English |
| Aiden | Sunny American male voice. | English |
| Ono_Anna | Playful Japanese female voice. | Japanese |
| Sohee | Warm Korean female voice. | Korean |
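
The speaker table above can be carried into client code as a small lookup. This is a minimal sketch: the names and native languages are copied from the table, while the helper `speakers_for` and its matching rule (dialect qualifiers in parentheses are ignored) are illustrative assumptions.

```python
# Speaker name -> native language, copied from the table above.
SPEAKERS = {
    "Vivian": "Chinese",
    "Serena": "Chinese",
    "Uncle_Fu": "Chinese",
    "Dylan": "Chinese (Beijing)",
    "Eric": "Chinese (Sichuan)",
    "Ryan": "English",
    "Aiden": "English",
    "Ono_Anna": "Japanese",
    "Sohee": "Korean",
}


def speakers_for(language):
    """List speakers whose native language matches (hypothetical helper).

    Matching is case-insensitive and ignores a dialect qualifier, so
    "Chinese" also matches "Chinese (Beijing)".
    """
    wanted = language.strip().lower()
    return [
        name
        for name, native in SPEAKERS.items()
        if native.lower().split(" (")[0] == wanted
    ]


if __name__ == "__main__":
    for lang in ("Chinese", "English", "German"):
        print(lang, speakers_for(lang))
```

A language with no native speaker in the table, such as German, returns an empty list; the caller then has to pick one of the nine voices explicitly.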
Recommended Use Cases
This model is best suited to applications that need multilingual voice generation with controllable style, such as interactive voice assistants, content dubbing, and accessibility tools. See the technical report for details on architecture and evaluation.
We're benchmarking and onboarding Qwen3-TTS-12Hz-0.6B-CustomVoice as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.