Moonshine
Overview
A from-scratch implementation of Moonshine.
Technical Details
- Framework: PyTorch
- Dataset: Gigaspeech
- Category: Audio/Speech
Implementation Details
A small transformer-based ASR model, coded and trained from scratch in PyTorch.
Based on: Moonshine: Speech Recognition for Live Transcription and Voice Commands
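For orientation, here is a minimal sketch of the kind of decoder block this implies: a generic pre-norm transformer layer using the dimensions from the hyperparameter table below. `DecoderBlock` and its internals are assumptions for illustration, not the repository's exact code; the full Moonshine architecture also includes an audio encoder.

```python
import torch
import torch.nn as nn

class DecoderBlock(nn.Module):
    """Generic pre-norm transformer decoder block (causal self-attention + MLP).
    A sketch for illustration, not the repository's exact layer."""

    def __init__(self, dim: int = 288, heads: int = 6, dropout: float = 0.1):
        super().__init__()
        self.ln1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, dropout=dropout, batch_first=True)
        self.ln2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, 4 * dim),
            nn.GELU(),
            nn.Linear(4 * dim, dim),
            nn.Dropout(dropout),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Causal mask: True entries are positions a token may NOT attend to.
        T = x.size(1)
        mask = torch.triu(torch.ones(T, T, dtype=torch.bool, device=x.device), diagonal=1)
        h = self.ln1(x)
        h, _ = self.attn(h, h, h, attn_mask=mask, need_weights=False)
        x = x + h
        return x + self.mlp(self.ln2(x))

x = torch.randn(2, 40, 288)     # (batch, block_size, embeddings_dims)
print(DecoderBlock()(x).shape)  # torch.Size([2, 40, 288])
```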
Hyperparameters
| Parameter | Value | Description |
|-----------|-------|-------------|
| `epochs` | 10 | Total training epochs. |
| `batch_size` | 128 | Samples per batch. |
| `block_size` | 40 | Context window length for attention. |
| `embeddings_dims` | 288 | Embedding dimension (must be divisible by `no_of_heads`). |
| `no_of_heads` | 6 | Attention heads in multi-head attention. |
| `no_of_decoder_layers` | 6 | Transformer decoder layers. |
| `dropout` | 0.1 | Dropout rate for regularization. |
| `max_lr` | 6e-4 | Peak learning rate (use with a learning rate scheduler). |
| `weight_decay_optim` | 0.1 | Weight decay for AdamW (consider reducing to `0.01` if unstable). |
| `sr` | 16000 | Audio sampling rate in Hz (fix conflict with `SAMPLING_RATE=480000` if needed). |
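For reference, a minimal sketch of collecting these values into a single config object; the `ModelArgs` dataclass is an assumption for illustration, not necessarily how the repository organizes them.

```python
from dataclasses import dataclass

@dataclass
class ModelArgs:
    # Values mirror the hyperparameter table above.
    epochs: int = 10
    batch_size: int = 128
    block_size: int = 40            # attention context window
    embeddings_dims: int = 288      # must be divisible by no_of_heads
    no_of_heads: int = 6
    no_of_decoder_layers: int = 6
    dropout: float = 0.1
    max_lr: float = 6e-4
    weight_decay_optim: float = 0.1
    sr: int = 16000                 # audio sampling rate in Hz

args = ModelArgs()
assert args.embeddings_dims % args.no_of_heads == 0, "head dim must be integral"
```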
Dataset
Gigaspeech
Frameworks:
PyTorch
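Since `sr` in the table assumes 16 kHz input, here is a minimal sketch of resampling a clip to that rate with torchaudio; the file path is hypothetical, and Gigaspeech clips may need this step if loaded at a different rate (see the `SAMPLING_RATE` conflict noted above).

```python
import torchaudio

TARGET_SR = 16_000  # matches sr in the hyperparameter table

# Hypothetical example clip; substitute a real Gigaspeech file on disk.
waveform, orig_sr = torchaudio.load("example_clip.wav")

if orig_sr != TARGET_SR:
    resampler = torchaudio.transforms.Resample(orig_freq=orig_sr, new_freq=TARGET_SR)
    waveform = resampler(waveform)
```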
Epochs/Steps
- Training steps: 1500
- Validation: every 50 steps
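A minimal sketch of the step-based loop implied above, assuming a OneCycle schedule peaking at `max_lr`; the scheduler choice and the toy stand-in model/batch are assumptions made so the snippet runs end to end.

```python
import torch
import torch.nn as nn

TOTAL_STEPS, VAL_EVERY = 1500, 50

# Toy stand-in so the loop executes; the real model is the from-scratch
# Moonshine decoder described above.
model = nn.Linear(288, 288)

optimizer = torch.optim.AdamW(model.parameters(), lr=6e-4, weight_decay=0.1)
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=6e-4, total_steps=TOTAL_STEPS
)

for step in range(TOTAL_STEPS):
    x = torch.randn(128, 288)       # stand-in batch (batch_size=128)
    loss = model(x).pow(2).mean()   # stand-in loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()                # OneCycleLR steps once per batch

    if step % VAL_EVERY == 0:
        with torch.no_grad():
            val_loss = model(torch.randn(128, 288)).pow(2).mean()
        print(f"step {step}: train {loss.item():.4f}, val {val_loss.item():.4f}")
```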
Loss Curves
Looks like 25 hours of audio isn't enough, so the model started to overfit!
Source Code
📁 GitHub Repository: Moonshine
View the complete implementation, training scripts, and documentation on GitHub.