GLM-ASR Nano 2512

zai-org/GLM-ASR-Nano-2512

published Dec 2025 · updated Apr 2026

GLM-ASR Nano 2512 is a robust, open-source automatic speech recognition (ASR) model optimized for dialect support and low-volume speech.

status

coming soon

downloads / mo

133.7K

license

mit

specs

Task	Automatic Speech Recognition (ASR)
Architecture	Transformer-based seq2seq
Parameters	1.5B
Supported Languages	17 languages (WER ≤ 20%)

about this model

GLM-ASR-Nano-2512 is a 1.5-billion-parameter speech recognition model that transcribes audio into text. It is optimized for Mandarin, English, Cantonese, and 14 other languages, and reaches a word error rate (WER) of 20% or lower on each of them.
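WER is the word-level edit distance between the model's transcript and a reference transcript (substitutions + deletions + insertions), divided by the number of reference words. So "WER ≤ 20%" means at most one word in five is wrong. The sketch below is illustrative only and is not the evaluation script behind the published numbers. It splits on whitespace, so it does not handle unsegmented scripts such as Chinese, which are usually scored per character (CER) instead.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # dp[i][j] = minimum number of edits turning ref[:i] into hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insert all j hypothesis words

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # deletion
                dp[i][j - 1] + 1,         # insertion
                dp[i - 1][j - 1] + cost,  # substitution (or match when cost == 0)
            )
    return dp[len(ref)][len(hyp)] / len(ref)


if __name__ == "__main__":
    wer = word_error_rate("the meeting starts at nine", "the meeting start at nine")
    print(f"WER = {wer:.0%}")  # one substitution in five words -> "WER = 20%"
```

One wrong word out of five lands exactly at the 20% threshold quoted for the supported languages.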

Key Capabilities

Dialect support: Highly optimized for Cantonese and other dialects beyond standard Mandarin and English.
Low-volume speech robustness: Trained to accurately transcribe extremely quiet or whispered audio that traditional models often miss.

Benchmark Performance

The model achieves the lowest average error rate (4.10) among comparable open-source models and outperforms OpenAI Whisper V3 on multiple benchmarks. It shows significant advantages on Chinese benchmarks, including Wenet Meeting (real-world meetings with noise and overlapping speech) and Aishell-1 (standard Mandarin).

Figure: benchmark results comparing GLM-ASR-Nano with other models, including Whisper V3, on Wenet Meeting and Aishell-1.

The model supports 17 languages at usable accuracy (WER ≤ 20%) and is being onboarded as a hosted, OpenAI-compatible API on gigarouter.

best for

·Transcribing Cantonese and other dialects
·Low-volume or quiet speech recognition
·Real-world meeting transcription with noise and overlapping speech

FAQ

What languages does GLM-ASR Nano 2512 support?

It supports 17 languages with a word error rate (WER) of 20% or lower, including Mandarin, English, and Cantonese.

How many parameters does this model have?

It has 1.5 billion parameters.

How does it compare to OpenAI Whisper V3?

It outperforms Whisper V3 on multiple benchmarks, achieving the lowest average error rate (4.10) among comparable open-source models.

What input format does the model accept?

It accepts audio as a URL or a NumPy array, which is then preprocessed with the AutoProcessor from Hugging Face Transformers.
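If your audio is a local WAV file rather than a URL, you first need to turn it into a NumPy array. The sketch below does that with only the standard library and NumPy. It assumes 16-bit PCM WAV input and a 16 kHz mono target sample rate. The card does not state the expected rate, so 16 kHz is an assumption to check against the model's processor config. The function names are hypothetical, and the step that passes the array to the AutoProcessor is left out because its exact arguments are model-specific.

```python
import io
import wave

import numpy as np

# Assumption: 16 kHz mono input; the model card does not state a sample rate.
TARGET_SAMPLE_RATE = 16_000


def load_wav_as_float32(source, target_rate: int = TARGET_SAMPLE_RATE) -> np.ndarray:
    """Read a 16-bit PCM WAV (path or file-like) into a mono float32 array in [-1, 1)."""
    with wave.open(source, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM WAV files")
        channels = wf.getnchannels()
        rate = wf.getframerate()
        frames = wf.readframes(wf.getnframes())

    # Little-endian int16 samples scaled to floating point.
    samples = np.frombuffer(frames, dtype="<i2").astype(np.float32) / 32768.0
    if channels > 1:
        # WAV frames are interleaved: reshape to (frames, channels), then average to mono.
        samples = samples.reshape(-1, channels).mean(axis=1)

    if rate != target_rate and samples.size > 0:
        # Crude linear-interpolation resampling; prefer a proper resampler in production.
        n_out = int(round(samples.size * target_rate / rate))
        old_t = np.arange(samples.size) / rate
        new_t = np.arange(n_out) / target_rate
        samples = np.interp(new_t, old_t, samples)

    return samples.astype(np.float32)


def tone_wav(rate: int, seconds: float = 1.0, freq: float = 440.0) -> io.BytesIO:
    """Hypothetical helper: synthesize a quiet sine tone as an in-memory 16-bit WAV."""
    t = np.arange(int(rate * seconds)) / rate
    pcm = (0.1 * np.sin(2 * np.pi * freq * t) * 32767).astype("<i2")
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(rate)
        wf.writeframes(pcm.tobytes())
    buf.seek(0)
    return buf


if __name__ == "__main__":
    audio = load_wav_as_float32(tone_wav(8_000))
    print(audio.dtype, audio.shape)  # float32 (16000,)
```

The demo upsamples one second of 8 kHz audio to 16,000 float32 samples. The linear interpolation keeps the sketch dependency-free, but a dedicated resampling library will give better audio quality.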

How can I use this model via gigarouter?

Once the model is live, call gigarouter's OpenAI-compatible endpoint with your API key; see the gigarouter documentation for details.
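Because the hosted API is not live yet, the following is only a sketch. It assumes gigarouter mirrors OpenAI's audio transcription route: a multipart POST to /audio/transcriptions with "model" and "file" form fields and a JSON response containing "text". The base URL below is a placeholder, and the model ID is assumed to match the repository name. Neither has been published by gigarouter. The sketch uses only the Python standard library, and the request is built separately from sending so that it can be inspected offline.

```python
import json
import urllib.request
import uuid

# Hypothetical placeholders: gigarouter has not published a base URL or model ID for this model.
BASE_URL = "https://api.gigarouter.example/v1"
MODEL_ID = "zai-org/GLM-ASR-Nano-2512"


def build_transcription_request(audio: bytes, filename: str, api_key: str,
                                base_url: str = BASE_URL,
                                model: str = MODEL_ID) -> urllib.request.Request:
    """Build (but do not send) a multipart POST to an OpenAI-style /audio/transcriptions route."""
    boundary = uuid.uuid4().hex
    crlf = b"\r\n"
    body = b"".join([
        f"--{boundary}".encode(), crlf,
        b'Content-Disposition: form-data; name="model"', crlf, crlf,
        model.encode(), crlf,
        f"--{boundary}".encode(), crlf,
        f'Content-Disposition: form-data; name="file"; filename="{filename}"'.encode(), crlf,
        b"Content-Type: application/octet-stream", crlf, crlf,
        audio, crlf,
        f"--{boundary}--".encode(), crlf,  # closing boundary ends the multipart body
    ])
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/audio/transcriptions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
        method="POST",
    )


def transcribe(req: urllib.request.Request) -> str:
    """Send the request; OpenAI-style JSON responses carry the transcript in "text"."""
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.load(resp)["text"]
```

For example, read a file's bytes, call build_transcription_request(data, "meeting.wav", api_key="YOUR_API_KEY"), and pass the result to transcribe(). If you already use the official openai Python package, pointing its client at the provider's base_url is the more common route for OpenAI-compatible services.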

not yet live

We're benchmarking and onboarding GLM-ASR Nano 2512 as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.

related speech-to-text models

speaker-diarization-3.1

wav2vec2-large-xlsr-53-japanese

6.1M dl/mo

wav2vec2-large-xlsr-53-polish

4.7M dl/mo

wav2vec2-large-xlsr-53-dutch

4.1M dl/mo