Question 1

What is the primary use case for Moonshine Streaming Small?

Accepted Answer

It is designed for low-latency, on-device English speech transcription on platforms with roughly 0.1-1 TOPS and sub-1 GB memory budgets.

Question 2

How does the Small model compare in size and speed to the Tiny and Medium variants?

Accepted Answer

The Small model has 123M parameters, between Tiny (34M) and Medium (245M), and is positioned as a middle ground between accuracy and efficiency. It averages 7.84% WER across the eight open ASR benchmark datasets in the WER table.
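As a rough check of how these parameter counts relate to the sub-1 GB memory budgets in the first answer, weight storage is approximately parameters × bytes per parameter. The sketch below assumes dense weights at common precisions and ignores activations, decoder caches and runtime overhead, so real memory use will be higher.

```python
# Approximate weight storage only: parameters x bytes per parameter.
# Assumption: dense weights; activations, decoder caches and runtime overhead are ignored.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}

# Parameter counts as stated in the answer above.
PARAMS = {"tiny": 34_000_000, "small": 123_000_000, "medium": 245_000_000}


def weight_megabytes(num_params: int, precision: str) -> float:
    """Return weight size in decimal megabytes (1 MB = 1,000,000 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1_000_000


if __name__ == "__main__":
    for name, n in PARAMS.items():
        sizes = ", ".join(f"{p}: {weight_megabytes(n, p):.0f} MB" for p in BYTES_PER_PARAM)
        print(f"{name:>6} -> {sizes}")
```

At fp32 the Small model's weights come to about 492 MB against about 980 MB for Medium, which leaves Small considerably more headroom under a sub-1 GB budget once runtime overhead is added.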

Question 3

What input format does the model expect?

Accepted Answer

The model expects audio sampled at the processor's sampling rate, so audio recorded at a different rate must be resampled first. The AutoProcessor then converts the waveform into input tensors together with an attention mask.
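A minimal sketch of that preparation, assuming a NumPy waveform as input. The downmix-to-mono step, the simple linear-interpolation resampler and the `transcribe` helper are illustrative assumptions rather than the model's documented pipeline; a dedicated resampler (for example from librosa or torchaudio) is preferable in practice. The model ID is a hypothetical placeholder, and `AutoModelForSpeechSeq2Seq` is assumed, not confirmed, to load this checkpoint.

```python
import numpy as np


def prepare_waveform(audio, orig_sr: int, target_sr: int) -> np.ndarray:
    """Downmix to mono and linearly resample to target_sr (illustrative, not high quality)."""
    audio = np.asarray(audio, dtype=np.float32)
    if audio.ndim == 2:
        # Assumption: multichannel audio is laid out as (num_samples, num_channels).
        audio = audio.mean(axis=1)
    if orig_sr == target_sr:
        return audio
    n_out = int(round(audio.shape[0] * target_sr / orig_sr))
    t_in = np.arange(audio.shape[0]) / orig_sr  # input sample times in seconds
    t_out = np.arange(n_out) / target_sr        # output sample times in seconds
    return np.interp(t_out, t_in, audio).astype(np.float32)


def transcribe(audio, orig_sr: int, model_id: str = "your-org/moonshine-streaming-small") -> str:
    """Hypothetical end-to-end call; model_id and the Auto class are assumptions."""
    # Imported lazily so prepare_waveform works without torch/transformers installed.
    import torch
    from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor

    processor = AutoProcessor.from_pretrained(model_id)
    model = AutoModelForSpeechSeq2Seq.from_pretrained(model_id)
    target_sr = processor.feature_extractor.sampling_rate  # the processor's expected rate
    waveform = prepare_waveform(audio, orig_sr, target_sr)
    inputs = processor(waveform, sampling_rate=target_sr, return_tensors="pt")
    with torch.no_grad():
        generated = model.generate(**inputs)
    return processor.batch_decode(generated, skip_special_tokens=True)[0]
```

Reading the target rate from the processor, rather than hard-coding it, keeps the sketch correct even if a checkpoint's expected rate differs from what you assume.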

Question 4

How can I call this model via the gigarouter API?

Accepted Answer

Use the gigarouter OpenAI-compatible endpoint with your API key, passing the model ID and audio input in the request.
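If the endpoint follows the OpenAI audio transcription convention (a multipart `POST` to `/audio/transcriptions` with `model` and `file` fields, authenticated with a bearer token), a request can be built with the Python standard library alone. The base URL and model ID below are hypothetical placeholders, and the route, field names and JSON `text` reply are assumptions carried over from the OpenAI API; check gigarouter's own documentation for the real values.

```python
import json
import urllib.request
import uuid

# Hypothetical placeholders: substitute the values from gigarouter's documentation.
BASE_URL = "https://api.gigarouter.example/v1"
MODEL_ID = "moonshine-streaming-small"


def build_transcription_request(api_key: str, audio_bytes: bytes,
                                filename: str = "audio.wav",
                                base_url: str = BASE_URL,
                                model_id: str = MODEL_ID) -> urllib.request.Request:
    """Build a multipart/form-data POST in the style of OpenAI's transcription endpoint."""
    boundary = uuid.uuid4().hex
    body = b"".join([
        # Text field carrying the model ID.
        f'--{boundary}\r\nContent-Disposition: form-data; name="model"\r\n\r\n'
        f"{model_id}\r\n".encode(),
        # File field header, followed by the raw audio bytes.
        f'--{boundary}\r\nContent-Disposition: form-data; name="file"; '
        f'filename="{filename}"\r\nContent-Type: application/octet-stream\r\n\r\n'.encode(),
        audio_bytes,
        # Closing boundary.
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    return urllib.request.Request(
        base_url.rstrip("/") + "/audio/transcriptions",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


def transcribe(api_key: str, audio_path: str) -> str:
    """Send the request and return the transcript, assuming a JSON reply with a "text" field."""
    with open(audio_path, "rb") as f:
        request = build_transcription_request(api_key, f.read())
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["text"]
```

Usage: `transcribe("YOUR_API_KEY", "clip.wav")`. The standard library keeps the sketch dependency-free; with the official `openai` package, setting `base_url` on the client and calling `client.audio.transcriptions.create(model=..., file=...)` would be the equivalent, again assuming gigarouter exposes that route.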

Question 5

What are the known limitations of this model?

Accepted Answer

The decoder is autoregressive, so latency grows with transcript length. In addition, the Transformers implementation does not yet perform fully efficient streaming, and the model can hallucinate or repeat phrases on short or noisy audio.
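One common mitigation for runaway repetition is to cap the number of generated tokens in proportion to the audio duration; because the decoder is autoregressive, the cap also bounds worst-case decoding latency. The sketch assumes a budget of 6.5 tokens per second of audio, which is an illustrative value rather than a documented setting for this checkpoint and should be tuned on your own data.

```python
import math

# Assumption: about 6.5 output tokens per second of speech is a generous upper bound.
TOKENS_PER_SECOND = 6.5


def max_new_tokens_for(num_samples: int, sampling_rate: int,
                       tokens_per_second: float = TOKENS_PER_SECOND,
                       minimum: int = 1) -> int:
    """Cap decoder output length in proportion to the input audio duration."""
    duration_s = num_samples / sampling_rate
    return max(minimum, math.ceil(duration_s * tokens_per_second))
```

The result can be passed to Transformers generation as `model.generate(**inputs, max_new_tokens=max_new_tokens_for(len(waveform), sampling_rate))`; `no_repeat_ngram_size` is another standard generation option worth trying against repeated phrases.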

Task	Automatic Speech Recognition (ASR)
Architecture	Streaming sliding-window Transformer encoder with autoregressive Transformer decoder
Parameters	123M
License	Not specified in model card
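A sliding-window encoder restricts each audio frame to a fixed span of neighboring frames, so per-frame attention cost stays constant as the stream grows. The mask below illustrates the idea; the window sizes (`left` past frames, `right` lookahead frames) are hypothetical, since the model card does not state Moonshine Streaming's actual window configuration.

```python
import numpy as np


def sliding_window_mask(num_frames: int, left: int, right: int) -> np.ndarray:
    """Return a boolean (num_frames, num_frames) mask.

    mask[i, j] is True when query frame i may attend to key frame j, i.e. when
    j lies within `left` frames before i or `right` frames after it.
    """
    idx = np.arange(num_frames)
    offset = idx[None, :] - idx[:, None]  # offset[i, j] = j - i
    return (offset >= -left) & (offset <= right)


if __name__ == "__main__":
    # Hypothetical window: 2 frames of history, no lookahead (fully causal).
    print(sliding_window_mask(5, left=2, right=0).astype(int))
```

Each row allows at most `left + right + 1` positions, which is what keeps total encoder attention cost linear in stream length rather than quadratic.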

Dataset	WER (%)
AMI	12.54
Earnings-22	13.53
GigaSpeech	10.41
LibriSpeech (clean)	2.49
LibriSpeech (other)	6.78
SPGISpeech	3.19
TED-LIUM	3.77
VoxPopuli	9.98
Average	7.84
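The Average row is the unweighted mean of the eight per-dataset scores, which can be checked directly. An unweighted mean gives each benchmark equal weight regardless of how much audio it contains.

```python
# Per-dataset WER (%) for Moonshine Streaming Small, copied from the table above.
WER = {
    "AMI": 12.54,
    "Earnings-22": 13.53,
    "GigaSpeech": 10.41,
    "LibriSpeech (clean)": 2.49,
    "LibriSpeech (other)": 6.78,
    "SPGISpeech": 3.19,
    "TED-LIUM": 3.77,
    "VoxPopuli": 9.98,
}


def average_wer(scores: dict[str, float]) -> float:
    """Unweighted mean: every benchmark counts equally."""
    return sum(scores.values()) / len(scores)


print(f"Average WER: {average_wer(WER):.2f}%")  # prints "Average WER: 7.84%"
```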

Moonshine Streaming Small