mms-300m-1130-forced-aligner

MahmoudAshraf/mms-300m-1130-forced-aligner

A popular open forced-alignment model, listed under speech-to-text, with 3.2M downloads a month. gigarouter is benchmarking and onboarding it as a hosted, OpenAI-compatible API.

status: coming soon
API providers: 0
downloads / mo: 3.2M
license: cc-by-nc-4.0

about this model

MahmoudAshraf/mms-300m-1130-forced-aligner is a speech model for forced alignment: given an audio recording and its transcript, it finds when each part of the text is spoken. The checkpoint is a conversion of the MMS-300M model, originally trained on a forced alignment dataset, from torchaudio to Hugging Face Transformers format.

Key Strengths

  • Efficient forced alignment with significantly lower memory usage compared to the TorchAudio forced alignment API.
  • Supports multilingual text preprocessing with romanization via ISO-639-3 language codes.
  • Designed for batch processing of audio emissions to improve throughput.
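
As a rough illustration of the batching idea in the last bullet, the sketch below plans how a long recording could be cut into fixed-size windows with some surrounding context, then grouped into batches for the model. It is not the aligner's actual code: `plan_windows`, `batches`, and the window, context, and batch sizes are hypothetical names and values chosen for the example.

```python
def plan_windows(num_samples, window, context):
    """Plan fixed-size windows over a recording of `num_samples` samples.

    Each plan has a "keep" range (the samples this window is responsible for)
    and a wider "read" range that adds up to `context` samples on each side,
    so the model sees some surrounding audio at the window edges.
    """
    plans = []
    for start in range(0, num_samples, window):
        end = min(start + window, num_samples)
        plans.append({
            "read": (max(0, start - context), min(num_samples, end + context)),
            "keep": (start, end),
        })
    return plans


def batches(items, batch_size):
    """Yield consecutive groups of at most `batch_size` items."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]


# Toy numbers (assumed): a 10-sample recording, 4-sample windows, 1 sample of context.
plans = plan_windows(num_samples=10, window=4, context=1)
for group in batches(plans, batch_size=2):
    print([p["read"] for p in group])
```

The "keep" ranges tile the recording without overlap, so emissions from each window can be trimmed back to that range and concatenated in order.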

Best For

  • Aligning transcribed text to audio at the word level for applications such as subtitle generation, pronunciation analysis, and audio segmentation.
  • Use cases requiring accurate timestamp extraction from speech data with minimal computational overhead.
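
To make the subtitle use case concrete, here is a minimal sketch that turns word-level timestamps into SRT cues. The input shape (a list of dicts with `text`, `start`, and `end` in seconds) and the `max_words` and `max_gap` limits are assumptions made for this example, not a documented output format of the model or of gigarouter.

```python
def format_srt_time(seconds):
    """Format seconds as the SRT timestamp HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def words_to_srt(words, max_words=7, max_gap=0.6):
    """Group aligned words into cues and render them as SRT text.

    A new cue starts when the current one holds `max_words` words or when
    the silence before the next word exceeds `max_gap` seconds.
    """
    cues, current = [], []
    for word in words:
        too_long = len(current) >= max_words
        long_pause = bool(current) and word["start"] - current[-1]["end"] > max_gap
        if current and (too_long or long_pause):
            cues.append(current)
            current = []
        current.append(word)
    if current:
        cues.append(current)

    blocks = []
    for index, cue in enumerate(cues, start=1):
        start = format_srt_time(cue[0]["start"])
        end = format_srt_time(cue[-1]["end"])
        text = " ".join(w["text"] for w in cue)
        blocks.append(f"{index}\n{start} --> {end}\n{text}\n")
    return "\n".join(blocks)


# Hypothetical aligner output for three words.
words = [
    {"text": "hello", "start": 0.0, "end": 0.4},
    {"text": "world", "start": 0.5, "end": 0.9},
    {"text": "again", "start": 2.0, "end": 2.5},
]
print(words_to_srt(words))
```

The long pause before "again" starts a second cue, so the three words produce two subtitle blocks.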

Performance

The model is built on the MMS-300M architecture, which has demonstrated strong results on forced alignment benchmarks. The conversion to Hugging Face format lets it plug into Transformers-based pipelines while keeping the alignment accuracy of the original checkpoint.

Workflow Overview

Alignment runs as a six-step pipeline: load the audio and the alignment model, generate emissions, preprocess the text (with optional romanization), compute alignments, extract spans, and produce word-level timestamps. The hosted gigarouter API is intended to handle these steps automatically once the model is live.
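
The last three steps of the workflow (compute alignments, extract spans, produce word-level timestamps) can be sketched on toy data with a plain Viterbi pass over CTC emissions. This is a dependency-free illustration of the general technique, not the model's or any library's implementation: the function names, the three-symbol vocabulary, the hand-written emissions, and the 20 ms frame duration are all assumptions for the example.

```python
import math

NEG_INF = float("-inf")


def ctc_forced_align(log_probs, tokens, blank=0):
    """Best CTC path for `tokens`: per frame, an index into `tokens` or -1 for blank."""
    ext = [blank]                      # interleave blanks: [blank, t1, blank, t2, ..., blank]
    for tok in tokens:
        ext += [tok, blank]
    S, T = len(ext), len(log_probs)
    score = [[NEG_INF] * S for _ in range(T)]
    back = [[0] * S for _ in range(T)]
    score[0][0] = log_probs[0][ext[0]]
    if S > 1:
        score[0][1] = log_probs[0][ext[1]]
    for t in range(1, T):
        for s in range(S):
            cands = [(score[t - 1][s], s)]                  # stay in the same state
            if s >= 1:
                cands.append((score[t - 1][s - 1], s - 1))  # advance one state
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                cands.append((score[t - 1][s - 2], s - 2))  # skip the blank between tokens
            best, prev = max(cands)
            if best > NEG_INF:
                score[t][s] = best + log_probs[t][ext[s]]
                back[t][s] = prev
    s = S - 1                          # a valid path ends on the last token or trailing blank
    if S > 1 and score[T - 1][S - 2] > score[T - 1][S - 1]:
        s = S - 2
    if score[T - 1][s] == NEG_INF:
        raise ValueError("not enough frames for the given tokens")
    path = []
    for t in range(T - 1, -1, -1):     # backtrack from the last frame
        path.append(s)
        s = back[t][s]
    path.reverse()
    return [(p - 1) // 2 if p % 2 == 1 else -1 for p in path]


def token_spans(frame_to_token):
    """Collapse per-frame indices into (token_index, start_frame, end_frame_exclusive)."""
    spans = []
    for frame, idx in enumerate(frame_to_token):
        if idx == -1:
            continue
        if spans and spans[-1][0] == idx and spans[-1][2] == frame:
            spans[-1] = (idx, spans[-1][1], frame + 1)
        else:
            spans.append((idx, frame, frame + 1))
    return spans


def word_timestamps(spans, word_lengths, frame_seconds=0.02):
    """Merge token spans into (start, end) seconds per word; frame_seconds is assumed."""
    out, i = [], 0
    for n in word_lengths:
        chunk = spans[i:i + n]
        out.append((chunk[0][1] * frame_seconds, chunk[-1][2] * frame_seconds))
        i += n
    return out


# Toy vocabulary (assumed): 0 = blank, 1 = "a", 2 = "b"; five hand-written frames.
H, L = math.log(0.8), math.log(0.1)
emissions = [[H, L, L], [L, H, L], [L, H, L], [H, L, L], [L, L, H]]
frames = ctc_forced_align(emissions, tokens=[1, 2])
print(frames)                          # [-1, 0, 0, -1, 1]
spans = token_spans(frames)
print(spans)                           # [(0, 1, 3), (1, 4, 5)]
print(word_timestamps(spans, word_lengths=[2]))  # one word, "ab"
```

Keeping alignment, span extraction, and timestamp conversion as separate functions mirrors the step order in the workflow and makes each stage easy to inspect on its own.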

not yet live

We're benchmarking and onboarding mms-300m-1130-forced-aligner as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.