Qwen3-TTS-12Hz-0.6B-CustomVoice

Qwen/Qwen3-TTS-12Hz-0.6B-CustomVoice

Qwen3-TTS-12Hz-0.6B-CustomVoice is a popular open text-to-speech model with 1.2M downloads a month. gigarouter is benchmarking it and plans to host it as an OpenAI-compatible API.

status: coming soon
API providers:
downloads / mo: 1.2M
license: apache-2.0

about this model

Qwen3-TTS-12Hz-0.6B-CustomVoice is a text-to-speech model that synthesizes speech in 10 languages with fine-grained control over tone, rhythm, and emotional expression via natural language instructions.

Multilingual and Controllable Synthesis

The model supports Chinese, English, Japanese, Korean, German, French, Russian, Portuguese, Spanish, and Italian. It adapts its delivery to natural-language instructions such as "speak in a very happy tone." It ships with 9 premium timbres (preset voices), each optimized for a native language or dialect, listed under Supported Speakers.

Low Latency Streaming

Built on the Qwen3-TTS-Tokenizer-12Hz, the model achieves end-to-end synthesis latency as low as 97 ms, enabling streaming generation suitable for real-time applications.
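For streaming use, the number that matters most is the time until the first audio chunk arrives. The sketch below shows one way to measure it on any chunked byte stream. It is a minimal illustration, not part of the model or of gigarouter: `time_to_first_chunk` and `fake_stream` are hypothetical helper names, and the fake stream only simulates a delay so that the example runs without a live endpoint.

```python
import time
from typing import Iterable, Iterator, Tuple


def time_to_first_chunk(chunks: Iterable[bytes]) -> Tuple[float, bytes]:
    """Return (seconds until the first non-empty chunk arrived, that chunk)."""
    start = time.perf_counter()
    for chunk in chunks:
        if chunk:  # skip empty keep-alive chunks
            return time.perf_counter() - start, chunk
    raise ValueError("stream ended before any audio arrived")


def fake_stream(delay_s: float = 0.05) -> Iterator[bytes]:
    """Hypothetical stand-in for a streaming response: waits, then yields dummy bytes."""
    time.sleep(delay_s)
    yield b"\x00" * 320
    yield b"\x00" * 320


if __name__ == "__main__":
    latency, first = time_to_first_chunk(fake_stream())
    print(f"first chunk after {latency * 1000:.0f} ms ({len(first)} bytes)")
```

Measured from a client, this figure includes network and queueing time, so it will generally be higher than the model's own synthesis latency.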

Supported Speakers

Speaker	Voice Description	Native Language
Vivian	Bright young female voice.	Chinese
Serena	Warm, gentle young female voice.	Chinese
Uncle_Fu	Seasoned male voice, mellow timbre.	Chinese
Dylan	Youthful Beijing male voice.	Chinese (Beijing)
Eric	Lively Chengdu male voice.	Chinese (Sichuan)
Ryan	Dynamic male voice with rhythm.	English
Aiden	Sunny American male voice.	English
Ono_Anna	Playful Japanese female voice.	Japanese
Sohee	Warm Korean female voice.	Korean
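The table above can be kept as a small lookup when choosing a voice for a given language. This is a sketch: the speaker names and native languages are copied from the table, while `SPEAKERS` and `speakers_for` are hypothetical names introduced here for illustration.

```python
from typing import Dict, List

# Speaker name -> native language, copied from the Supported Speakers table.
SPEAKERS: Dict[str, str] = {
    "Vivian": "Chinese",
    "Serena": "Chinese",
    "Uncle_Fu": "Chinese",
    "Dylan": "Chinese (Beijing)",
    "Eric": "Chinese (Sichuan)",
    "Ryan": "English",
    "Aiden": "English",
    "Ono_Anna": "Japanese",
    "Sohee": "Korean",
}


def speakers_for(language: str) -> List[str]:
    """Return speakers whose native language starts with `language` (case-insensitive).

    Dialect entries such as "Chinese (Beijing)" match their base language.
    """
    wanted = language.strip().lower()
    if not wanted:
        return []
    return [name for name, native in SPEAKERS.items()
            if native.lower().startswith(wanted)]


if __name__ == "__main__":
    for lang in ("Chinese", "English", "Japanese", "German"):
        print(lang, "->", speakers_for(lang))
```

The table lists no native speaker for German, French, Russian, Portuguese, Spanish, or Italian, so the lookup returns an empty list for those languages and the caller has to pick one of the nine voices explicitly.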

Recommended Use Cases

This model is best suited to applications that need multilingual voice generation with controllable style, such as interactive voice assistants, content dubbing, and accessibility tools. See the technical report for details on architecture and evaluation.

not yet live

We're benchmarking and onboarding Qwen3-TTS-12Hz-0.6B-CustomVoice as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.
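Since the hosted API is not live yet, the following is only a sketch of what a request body could look like if the endpoint follows the shape of OpenAI's `/v1/audio/speech` route (`model`, `input`, `voice`, `instructions`). Everything specific to gigarouter is an assumption: the base URL is a placeholder, and whether `voice` accepts the speaker names above or `instructions` carries the style prompt has not been confirmed. The code builds the JSON body and sends nothing.

```python
import json
from typing import Dict, Optional

MODEL_ID = "Qwen/Qwen3-TTS-12Hz-0.6B-CustomVoice"
BASE_URL = "https://api.example.invalid/v1"  # placeholder, not a real endpoint


def build_speech_request(text: str, speaker: str,
                         instructions: Optional[str] = None) -> Dict[str, str]:
    """Build a JSON body in the shape of OpenAI's audio/speech request.

    Assumption: `voice` takes a speaker name and `instructions` a style prompt.
    """
    if not text.strip():
        raise ValueError("text must not be empty")
    body = {"model": MODEL_ID, "input": text, "voice": speaker}
    if instructions:
        body["instructions"] = instructions
    return body


if __name__ == "__main__":
    body = build_speech_request(
        "Welcome back! Your order has shipped.",
        speaker="Ryan",
        instructions="speak in a very happy tone",
    )
    print("POST", BASE_URL + "/audio/speech")
    print(json.dumps(body, indent=2))
```

Keeping the body builder separate from the HTTP call makes it easy to swap in the real base URL and field names once the service publishes them.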