
Qwen3-TTS-12Hz-1.7B-CustomVoice

Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice

A popular open text-to-speech model, with 2M downloads a month. gigarouter is benchmarking it and will host it as an OpenAI-compatible API.

status: coming soon
downloads / mo: 2M
license: apache-2.0

about this model

Qwen3-TTS-12Hz-1.7B-CustomVoice is a text-to-speech (TTS) model that provides style control over target timbres via user instructions and supports 9 premium timbres covering combinations of gender, age, language, and dialect across 10 languages.

The model uses a discrete multi-codebook language model architecture for end-to-end speech modeling, bypassing the bottlenecks and cascading errors of traditional LM+DiT pipelines. It is built on Qwen3-TTS-Tokenizer-12Hz, which performs efficient acoustic compression and high-dimensional semantic modeling while preserving paralinguistic and acoustic environmental features.

Key Capabilities

Multilingual support: Chinese, English, Japanese, Korean, German, French, Russian, Portuguese, Spanish, and Italian, including multiple dialectal voice profiles.
Low-latency streaming: the first audio packet is emitted as soon as a single character of input arrives, with end-to-end synthesis latency as low as 97 ms.
Instruction-driven voice control: natural language instructions enable adaptive adjustment of timbre, emotion, tone, and speaking rate.
Contextual understanding: adapts emotional expression and prosody based on text semantics and instructions.
Robustness to noisy input text.
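
The low-latency claim above is about how quickly the first audio packet arrives. A minimal sketch of measuring that on the client side, assuming the audio comes back as an iterable of byte chunks; `time_to_first_chunk` and `fake_stream` are hypothetical names for illustration, and the fake stream stands in for a real streaming response:

```python
import time
from typing import Iterable, Iterator, List, Tuple


def time_to_first_chunk(chunks: Iterable[bytes]) -> Tuple[float, Iterator[bytes]]:
    """Return (seconds until the first non-empty chunk, iterator over every chunk).

    If the stream ends without a non-empty chunk, the elapsed time is the
    time until the stream was exhausted.
    """
    source = iter(chunks)
    buffered: List[bytes] = []
    start = time.perf_counter()
    for chunk in source:
        buffered.append(chunk)
        if chunk:  # stop timing at the first chunk that carries audio
            break
    elapsed = time.perf_counter() - start

    def replay() -> Iterator[bytes]:
        # Hand back the chunks already consumed, then the rest of the stream.
        yield from buffered
        yield from source

    return elapsed, replay()


def fake_stream() -> Iterator[bytes]:
    # Stand-in for a streaming TTS response; this is not a real API call.
    time.sleep(0.01)
    yield b"\x00\x01"
    yield b"\x02\x03"
```

A number measured this way includes network and client overhead, so it is not directly comparable to the 97 ms synthesis latency quoted above.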

Supported Speakers (CustomVoice)

The model includes 9 built-in premium timbres. Each speaker sounds best in its native language, but every speaker can speak any of the supported languages.

Speaker	Voice Description	Native Language
Vivian	Bright, slightly edgy young female voice	Chinese
Serena	Warm, gentle young female voice	Chinese
Uncle_Fu	Seasoned male voice with a low, mellow timbre	Chinese
Dylan	Youthful Beijing male voice with a clear, natural timbre	Chinese (Beijing Dialect)
Eric	Lively Chengdu male voice with a slightly husky brightness	Chinese (Sichuan Dialect)
Ryan	Dynamic male voice with strong rhythmic drive	English
Aiden	Sunny American male voice with a clear midrange	English
Ono_Anna	Playful Japanese female voice with a light, nimble timbre	Japanese
Sohee	Warm Korean female voice with rich emotion	Korean
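
Since each speaker sounds best in its native language, a client may want to pick a speaker by language. A minimal sketch, using the speaker names and native languages from the table above; the `SPEAKERS` mapping and the `speakers_for` helper are hypothetical and not part of any published SDK:

```python
from typing import Dict, List

# Speaker name -> native language, copied from the table above.
SPEAKERS: Dict[str, str] = {
    "Vivian": "Chinese",
    "Serena": "Chinese",
    "Uncle_Fu": "Chinese",
    "Dylan": "Chinese (Beijing Dialect)",
    "Eric": "Chinese (Sichuan Dialect)",
    "Ryan": "English",
    "Aiden": "English",
    "Ono_Anna": "Japanese",
    "Sohee": "Korean",
}


def speakers_for(language: str) -> List[str]:
    """Return speakers whose native language starts with `language`.

    Matching is case-insensitive, and dialect entries such as
    "Chinese (Beijing Dialect)" match their base language.
    """
    wanted = language.strip().lower()
    if not wanted:
        return []
    return [name for name, native in SPEAKERS.items()
            if native.lower().startswith(wanted)]
```

For a supported language with no native speaker in the table (German, for example), the helper returns an empty list. The caller can then fall back to any speaker, since all of them can speak every supported language.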

Best Use Cases

Real-time interactive applications requiring low-latency streaming TTS.
Multilingual voice output with fine-grained control over timbre, emotion, and dialect.
Production deployments that need a single model to handle both streaming and non-streaming generation via a dual-track hybrid architecture.

This model will be hosted on gigarouter as a managed, OpenAI-compatible API. Once it is live, no local installation or model loading is required: you call the API endpoint.
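
The endpoint is not live yet, so the following is only a sketch of what an OpenAI-style speech request could look like. It assumes the service follows the shape of OpenAI's `/audio/speech` endpoint (`model`, `input`, `voice`, and an optional `instructions` field). The base URL is a placeholder, and the model ID, the voice names, and support for `instructions` are assumptions rather than confirmed gigarouter behavior. The code builds the request without sending it.

```python
import json
import urllib.request
from typing import Optional

# Placeholder base URL: the real endpoint has not been published.
BASE_URL = "https://api.example.invalid/v1"
# Assumption: the hosted model ID matches the repository name.
MODEL_ID = "Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice"


def build_speech_request(text: str, voice: str, api_key: str,
                         instructions: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style text-to-speech request."""
    payload = {"model": MODEL_ID, "input": text, "voice": voice}
    if instructions:
        # Assumption: style instructions are passed in an `instructions` field.
        payload["instructions"] = instructions
    return urllib.request.Request(
        url=f"{BASE_URL}/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Once a real base URL is available, passing the request to `urllib.request.urlopen` would send it. In the OpenAI API this endpoint returns the audio as the response body.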

not yet live

We're benchmarking and onboarding Qwen3-TTS-12Hz-1.7B-CustomVoice as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.