Moonshine Base
UsefulSensors/moonshine-base
published Nov 2024 · updated Jan 2025
Moonshine Base is an automatic speech recognition (ASR) model that transcribes English speech to text, optimized for live transcription and voice commands.
specs
| Task | Automatic Speech Recognition (ASR) – English speech-to-text |
| Architecture | Encoder-decoder transformer with Rotary Position Embedding (RoPE) |
| Parameters | 61 million |
| Languages | English only |
about this model
Moonshine is an automatic speech recognition (ASR) model that transcribes English speech audio into English text, optimized for live transcription and voice command processing on resource-constrained hardware.
Architecture and Key Strengths
Moonshine uses an encoder-decoder transformer architecture with Rotary Position Embedding (RoPE) instead of traditional absolute position embeddings. It is trained on speech segments of varying lengths without zero-padding, improving encoder inference efficiency. The model is available in two sizes: Tiny (27M parameters) and Base (61M parameters), both English-only.
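To make the position-embedding choice concrete, here is a minimal pure-Python sketch of rotary position embedding. It is an illustration of the general technique, not Moonshine's implementation: the function names `rope` and `dot`, the interleaved pairing of dimensions, and the base of 10000 are all assumptions made for this example.

```python
import math


def rope(vec, position, base=10000.0):
    """Rotate consecutive pairs of `vec` by angles that grow with `position`.

    Illustrative only: the pairing convention and `base` are assumptions,
    not details taken from the Moonshine model.
    """
    d = len(vec)
    if d % 2 != 0:
        raise ValueError("vector length must be even")
    out = []
    for i in range(0, d, 2):
        # Each pair gets its own frequency; earlier pairs rotate faster.
        theta = position * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out


def dot(a, b):
    """Plain dot product, standing in for an attention score."""
    return sum(x * y for x, y in zip(a, b))


q = [1.0, 2.0, 3.0, 4.0]
k = [0.5, -1.0, 2.0, 0.25]

# Both scores use a query/key offset of 2, at different absolute positions.
score_near_start = dot(rope(q, 3), rope(k, 1))
score_further_on = dot(rope(q, 12), rope(k, 10))
```

The two scores agree up to floating-point error, because the rotation makes the query-key product depend only on the relative offset between positions. Absolute position embeddings do not have this property.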
Performance
When benchmarked against OpenAI's Whisper tiny-en, Moonshine Tiny demonstrates a 5x reduction in compute requirements for transcribing a 10-second speech segment while incurring no increase in word error rates across standard evaluation datasets. The models are trained on 200,000 hours of audio and corresponding transcripts collected from the internet and openly available datasets.
Limitations
Like other sequence-to-sequence ASR models, Moonshine may produce hallucinations (text not present in the audio) or repetitive text, particularly with short audio segments or segments where words are cut off at the beginning or end. The model is intended for English speech transcription only and has not been robustly evaluated for classification, speaker identification, or other non-transcription tasks.
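One common way to limit runaway repetition on short clips is to cap the number of generated tokens in proportion to the audio duration. The sketch below shows that idea only. The helper name `max_new_tokens`, the 16 kHz default sampling rate, and the 6.5 tokens-per-second budget are illustrative assumptions rather than values taken from this page.

```python
import math


def max_new_tokens(num_samples, sampling_rate=16000, tokens_per_second=6.5):
    """Return a decoding cap proportional to the clip duration.

    Hypothetical helper: the default rate and token budget are assumptions
    chosen for illustration and should be tuned for the actual model.
    """
    if num_samples <= 0 or sampling_rate <= 0:
        raise ValueError("num_samples and sampling_rate must be positive")
    duration_s = num_samples / sampling_rate
    # Always allow at least one token so very short clips can still decode.
    return max(1, math.floor(duration_s * tokens_per_second))


# A 10-second clip at 16 kHz contains 160,000 samples.
cap = max_new_tokens(160_000)
```

With these assumed defaults, the 10-second clip is capped at 65 tokens. The resulting value would then be passed to whatever length limit the decoding loop exposes.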
best for
- Live transcription of English speech in real-time applications
- Voice command processing for resource-constrained devices
- Low-latency voice agents and conversational AI
FAQ
Moonshine Base is a 61-million-parameter English ASR model from Useful Sensors, designed for fast, on-device speech transcription.
Moonshine Tiny (27M) achieves a 5x compute reduction over Whisper tiny-en with no increase in word error rate; Moonshine Base (61M) offers higher accuracy while remaining efficient.
Moonshine Base is English-only; the model card lists no multilingual version of the Base size.
Input is a raw audio waveform, sampled at the rate the model's processor expects; output is transcribed English text.
Use the gigarouter OpenAI-compatible endpoint with your API key to send audio and receive transcriptions.
We're benchmarking and onboarding Moonshine Base as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.
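As a sketch of what a request to an OpenAI-compatible transcription endpoint could look like once the model is hosted, the following builds (but does not send) a multipart upload using only the Python standard library. The base URL, the model identifier string, and the helper name `build_transcription_request` are placeholders assumed for this example; the real values would come from the provider's documentation.

```python
import urllib.request
import uuid

# Hypothetical placeholders: substitute the real endpoint and model name.
BASE_URL = "https://api.example.com/v1"
MODEL_ID = "UsefulSensors/moonshine-base"


def build_transcription_request(audio_bytes, api_key, filename="speech.wav",
                                base_url=BASE_URL, model=MODEL_ID):
    """Build a POST request in the OpenAI audio-transcription style.

    The form fields `model` and `file` follow the OpenAI transcription API;
    everything else here is an assumption for illustration.
    """
    boundary = uuid.uuid4().hex
    body = b"".join([
        # Text field naming the model to run.
        (f"--{boundary}\r\n"
         'Content-Disposition: form-data; name="model"\r\n\r\n'
         f"{model}\r\n").encode(),
        # File field carrying the raw audio bytes.
        (f"--{boundary}\r\n"
         f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
         "Content-Type: audio/wav\r\n\r\n").encode(),
        audio_bytes,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    return urllib.request.Request(
        f"{base_url}/audio/transcriptions",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


# Example with dummy bytes; a real call would read a WAV file from disk.
request = build_transcription_request(b"RIFF....WAVE", api_key="test-key")
```

Sending the request with `urllib.request.urlopen(request)` and reading the `text` field of the JSON response would follow the OpenAI convention, assuming the hosted endpoint mirrors it.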