Moonshine Streaming Small
UsefulSensors/moonshine-streaming-small
published Jan 2026 · updated Feb 2026
Moonshine Streaming Small is a 123M-parameter streaming automatic speech recognition (ASR) model that pairs a lightweight audio frontend with a sliding-window Transformer encoder for low-latency transcription on edge hardware.
specs
| Spec | Value |
|---|---|
| Task | Automatic Speech Recognition (ASR) |
| Architecture | Streaming sliding-window Transformer encoder with autoregressive Transformer decoder |
| Parameters | 123M |
| License | Not specified in model card |
about this model
Moonshine Streaming Small is an automatic speech recognition (ASR) model hosted on gigarouter, designed for low-latency streaming transcription on edge-class hardware. It pairs a lightweight 50 Hz audio frontend with a sliding-window Transformer encoder that uses bounded local attention and no positional embeddings (an ergodic encoder), so each frame attends only to a fixed-size neighborhood and audio can be encoded incrementally as it arrives. The model has 123 million parameters across 10 encoder and 10 decoder layers, with an encoder dimension of 620 and a decoder dimension of 512.
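To make "bounded local attention" concrete, here is a minimal pure-Python sketch of a sliding-window attention mask. Only the 50 Hz frame rate comes from the description above; the window sizes `left` and `right` are illustrative assumptions, not the model's actual configuration, and the frame count is an approximation that ignores any frontend padding.

```python
def num_frames(duration_s: float, frame_rate_hz: int = 50) -> int:
    """Approximate number of encoder frames for a clip at the 50 Hz frontend rate."""
    return round(duration_s * frame_rate_hz)


def sliding_window_mask(n: int, left: int, right: int) -> list[list[bool]]:
    """Return mask[i][j], True when frame i may attend to frame j.

    Each frame sees at most `left` past frames and `right` future frames,
    so the work per frame stays bounded however long the audio runs.
    `left` and `right` are hypothetical values chosen for illustration.
    """
    return [[-left <= j - i <= right for j in range(n)] for i in range(n)]


# Hypothetical window: 2 frames of left context, 1 frame of lookahead.
mask = sliding_window_mask(n=6, left=2, right=1)
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
```

The mask depends only on the offset `j - i`, so every row has at most `left + right + 1` allowed positions.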
Trained on approximately 300K hours of English speech data, Moonshine Streaming Small achieves competitive word error rates (WER) on standard benchmarks:
| Dataset | WER (%) |
|---|---|
| AMI | 12.54 |
| Earnings-22 | 13.53 |
| GigaSpeech | 10.41 |
| LibriSpeech (clean) | 2.49 |
| LibriSpeech (other) | 6.78 |
| SPGISpeech | 3.19 |
| TED-LIUM | 3.77 |
| VoxPopuli | 9.98 |
| Average | 7.84 |
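For readers unfamiliar with the metric, the sketch below shows one common way to compute WER (word-level edit distance divided by the number of reference words) and reproduces the "Average" row as an unweighted mean of the eight per-dataset figures. The function is an illustration, not the benchmark's scoring code; benchmark pipelines typically normalize text before scoring, which is omitted here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Levenshtein distance, keeping only the previous row of the DP table.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


# Per-dataset WER (%) copied from the table above.
WER_BY_DATASET = {
    "AMI": 12.54,
    "Earnings-22": 13.53,
    "GigaSpeech": 10.41,
    "LibriSpeech (clean)": 2.49,
    "LibriSpeech (other)": 6.78,
    "SPGISpeech": 3.19,
    "TED-LIUM": 3.77,
    "VoxPopuli": 9.98,
}

average_wer = sum(WER_BY_DATASET.values()) / len(WER_BY_DATASET)
print(f"Average WER: {average_wer:.2f}%")
```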
Key strengths include low initial latency, a result of the streaming encoder design, and suitability for devices with 0.1–1 TOPS of compute and sub-1 GB memory budgets. Known limitations: the autoregressive decoder adds latency proportional to transcript length; the current Hugging Face Transformers integration does not yet implement fully efficient streaming (it falls back to flash-attention for sliding-window attention); and, like other seq2seq models, it may hallucinate or repeat text on short or noisy audio. Intended use cases include live captioning, voice commands, and real-time transcription on constrained hardware.
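The decoder limitation can be pictured with a simple linear model: one decoder step per emitted token, plus a fixed encoder cost. The sketch below is only that model; the timing values are hypothetical placeholders, not measurements of Moonshine Streaming Small.

```python
def decode_latency_ms(num_tokens: int, per_token_ms: float, encoder_ms: float = 0.0) -> float:
    """Linear latency model: fixed encoder cost plus one decoder step per token."""
    return encoder_ms + num_tokens * per_token_ms


# Hypothetical timings chosen for illustration only.
short_utterance = decode_latency_ms(num_tokens=10, per_token_ms=5.0, encoder_ms=20.0)
long_utterance = decode_latency_ms(num_tokens=100, per_token_ms=5.0, encoder_ms=20.0)
print(short_utterance, long_utterance)
```

Under this model a transcript ten times longer costs ten times as much decoder time, which is why short utterances such as voice commands suit the design well.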
best for
- Live captioning on edge devices
- Voice command recognition
- Real-time transcription on memory-constrained hardware
FAQ
Moonshine Streaming Small is designed for low-latency, on-device English speech transcription on platforms with roughly 0.1–1 TOPS of compute and sub-1 GB memory budgets.
The Small model has 123M parameters, compared to 34M for Tiny and 245M for Medium. It offers a balance of accuracy and efficiency, with an average WER of 7.84% across open ASR benchmarks.
The model expects audio sampled at the processor's sampling rate, processed via the AutoProcessor into tensors with attention masks.
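As a rough illustration of the sampling-rate requirement, the sketch below loads a 16-bit mono WAV file with the Python standard library and rejects audio whose rate does not match. The 16 kHz default is an assumption; the authoritative value is whatever the model's processor reports. The helper name `load_pcm16_mono` is hypothetical.

```python
import struct
import wave

# Assumption: 16 kHz. Read the real value from the model's processor.
EXPECTED_RATE_HZ = 16_000


def load_pcm16_mono(path: str, expected_rate_hz: int = EXPECTED_RATE_HZ) -> list[float]:
    """Read a 16-bit mono WAV file and return samples scaled to [-1.0, 1.0)."""
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit mono PCM audio")
        if wav.getframerate() != expected_rate_hz:
            raise ValueError(
                f"sample rate is {wav.getframerate()} Hz, "
                f"expected {expected_rate_hz} Hz; resample first"
            )
        raw = wav.readframes(wav.getnframes())
    # Little-endian signed 16-bit integers, one per sample.
    samples = struct.unpack(f"<{len(raw) // 2}h", raw)
    return [sample / 32768.0 for sample in samples]
```

The returned samples, together with the sampling rate, are the kind of input the processor would then turn into tensors and attention masks.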
Use the gigarouter OpenAI-compatible endpoint with your API key, passing the model ID and audio input in the request.
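A request of this kind might be assembled as sketched below. The sketch assumes the endpoint follows the OpenAI audio transcription convention (a multipart POST to `/audio/transcriptions` with `model` and `file` fields and a bearer token). The base URL is a placeholder, the model ID is assumed to match the repository name shown at the top of this page, and `build_transcription_request` is a hypothetical helper. The request is built but not sent.

```python
import urllib.request
import uuid

# Placeholder: substitute the provider's real base URL.
BASE_URL = "https://api.example.com/v1"
# Assumption: the hosted model ID matches the repository name.
MODEL_ID = "UsefulSensors/moonshine-streaming-small"


def build_transcription_request(audio_bytes: bytes, filename: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a multipart transcription request."""
    boundary = uuid.uuid4().hex
    body = b"".join([
        # Form field: model
        f"--{boundary}\r\n".encode(),
        b'Content-Disposition: form-data; name="model"\r\n\r\n',
        MODEL_ID.encode() + b"\r\n",
        # Form field: file
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode(),
        b"Content-Type: application/octet-stream\r\n\r\n",
        audio_bytes + b"\r\n",
        # Closing boundary
        f"--{boundary}--\r\n".encode(),
    ])
    return urllib.request.Request(
        url=f"{BASE_URL}/audio/transcriptions",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

Passing the result to `urllib.request.urlopen` would send it; the response format is not specified on this page, so check the provider's documentation before parsing it.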
Three limitations apply: the decoder is autoregressive, so latency grows with transcript length; the Transformers implementation does not yet perform fully efficient streaming; and the model can hallucinate or repeat phrases on short or noisy audio.
We're benchmarking and onboarding Moonshine Streaming Small as a hosted, OpenAI-compatible API.