Moonshine

Category: Audio/Speech
Framework: PyTorch
Dataset: Gigaspeech
Created: March 29, 2025

Overview

A from-scratch implementation of Moonshine in PyTorch.

Technical Details

  • Framework: PyTorch
  • Dataset: Gigaspeech
  • Category: Audio/Speech

Implementation Details

A small transformer-based ASR model, coded and trained from scratch in PyTorch.

Moonshine: Speech Recognition for Live Transcription and Voice Commands

Hyperparameters

| Parameter | Value | Description |
|----------------------|--------|-----------------------------------------------------------------------------|
| epochs | 10 | Total training epochs. |
| batch_size | 128 | Samples per batch. |
| block_size | 40 | Context window length for attention. |
| embeddings_dims | 288 | Embedding dimension (must be divisible by no_of_heads). |
| no_of_heads | 6 | Attention heads in multi-head attention. |
| no_of_decoder_layers | 6 | Transformer decoder layers. |
| dropout | 0.1 | Dropout rate for regularization. |
| max_lr | 6e-4 | Peak learning rate (use with a learning rate scheduler). |
| weight_decay_optim | 0.1 | Weight decay for AdamW (consider reducing to 0.01 if unstable). |
| sr | 16000 | Audio sampling rate (fix conflict with SAMPLING_RATE=480000 if needed). |
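As a minimal sketch, the hyperparameters above can be collected into a single config object. The class name `MoonshineConfig` is an assumption for illustration (the field names follow the table); the check in `__post_init__` mirrors the constraint that `embeddings_dims` must be divisible by `no_of_heads`:

```python
from dataclasses import dataclass

@dataclass
class MoonshineConfig:
    # Values taken from the hyperparameter table above.
    epochs: int = 10
    batch_size: int = 128
    block_size: int = 40            # context window length for attention
    embeddings_dims: int = 288      # must be divisible by no_of_heads
    no_of_heads: int = 6
    no_of_decoder_layers: int = 6
    dropout: float = 0.1
    max_lr: float = 6e-4            # peak LR for the scheduler
    weight_decay_optim: float = 0.1 # AdamW weight decay
    sr: int = 16000                 # audio sampling rate

    def __post_init__(self):
        # Each attention head gets embeddings_dims // no_of_heads channels.
        assert self.embeddings_dims % self.no_of_heads == 0, \
            "embeddings_dims must be divisible by no_of_heads"

cfg = MoonshineConfig()
head_dim = cfg.embeddings_dims // cfg.no_of_heads  # 288 // 6 = 48
```

With these defaults each head works in a 48-dimensional subspace, which the divisibility check guarantees is an integer.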


Dataset

Gigaspeech

Frameworks:

PyTorch

Epochs/Steps

Training steps = 1500

Validation = every 50 steps
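A rough sketch of this step-based schedule (the names `TRAIN_STEPS` and `VAL_EVERY` are assumptions, not the repo's identifiers): 1500 training steps with a validation pass every 50 steps yields 30 validation runs.

```python
# Sketch of the schedule above: run validation every 50th of 1500 steps.
TRAIN_STEPS = 1500
VAL_EVERY = 50

val_points = [step for step in range(1, TRAIN_STEPS + 1) if step % VAL_EVERY == 0]
# 1500 / 50 -> 30 validation passes spread across training.
num_val_runs = len(val_points)
```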

Loss Curves

📊 View Training Loss Curves

Looks like 25 hours isn't enough, as the model started to overfit!

Source Code

📁 GitHub Repository: Moonshine

View the complete implementation, training scripts, and documentation on GitHub.