TTS

Category: Audio/Speech
Framework: PyTorch
Dataset: Gigaspeech
Created: March 26, 2025

Overview

A from-scratch implementation of a text-to-speech (TTS) model.

Implementation Details

A small transformer-based TTS model, coded and trained from scratch in PyTorch.

(A WaveNet implementation will be uploaded soon.)

Reference: Neural Speech Synthesis with Transformer Network
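
Below is a minimal sketch of the phoneme-to-mel encoder-decoder shape this kind of Transformer-TTS model takes. It leans on PyTorch's built-in nn.Transformer for brevity, whereas the repo implements the blocks from scratch; the class and all names in it are illustrative, layer counts follow the hyperparameter table where given, and positional encodings plus the stop-token predictor are omitted.

```python
import torch
import torch.nn as nn

class MiniTransformerTTS(nn.Module):
    """Illustrative phoneme-to-mel model; not the repo's actual architecture."""

    def __init__(self, vocab_size: int, d_model: int = 512, n_mels: int = 80):
        super().__init__()
        self.phoneme_emb = nn.Embedding(vocab_size, d_model)
        self.mel_prenet = nn.Linear(n_mels, d_model)   # decoder prenet on past mel frames
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=4,                  # no_of_heads = 4
            num_encoder_layers=8,                      # assumed; table fixes only decoder depth
            num_decoder_layers=8,                      # no_of_decoder_layers = 8
            dim_feedforward=4 * d_model,               # hidden_dim = 2048
            dropout=0.1, batch_first=True,
        )
        self.mel_head = nn.Linear(d_model, n_mels)     # project back to 80 mel channels

    def forward(self, phonemes: torch.Tensor, mels: torch.Tensor) -> torch.Tensor:
        # phonemes: (B, T_text) token ids; mels: (B, T_mel, 80) teacher-forced frames
        tgt_mask = nn.Transformer.generate_square_subsequent_mask(mels.size(1))
        h = self.transformer(
            src=self.phoneme_emb(phonemes),
            tgt=self.mel_prenet(mels),
            tgt_mask=tgt_mask,                         # causal mask for autoregressive decoding
        )
        return self.mel_head(h)                        # predicted frames: (B, T_mel, 80)

model = MiniTransformerTTS(vocab_size=100)
out = model(torch.randint(0, 100, (2, 40)), torch.randn(2, 120, 80))
print(out.shape)  # torch.Size([2, 120, 80])
```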

Model Hyperparameters

Core Architecture

| Parameter | Value | Description |
|-----------|-------|-------------|
| batch_size | 32 | Number of samples per batch |
| max_lr | 6e-4 | Maximum learning rate |
| dropout | 0.1 | General dropout rate |
| epochs | 10 | Total training epochs |
| block_size | 80 | Sequence length in tokens |
| src_vocab_size | dynamic | Source vocabulary size |
| phenome_embeddings_dims | 512 | Phoneme embedding dimension |
| embeddings_dims | 512 | Main embedding dimension |
| prenet_encoder_embeddings_dims | 512 | Encoder prenet dimension |
| embeddings_dims_decoder | 256 | Decoder-specific embedding dimension |
| attn_dropout | 0.1 | Attention dropout rate |
| no_of_heads | 4 | Attention heads per layer |
| no_of_decoder_layers | 8 | Number of decoder layers |
| weight_decay_optim | 0.01 | Optimizer weight decay |
| hidden_dim | 2048 (4×512) | FFN hidden dimension |
| clip | 1.0 | Gradient clipping threshold |
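
For orientation, here is a sketch of how these values could be collected into a single config object. The field names are copied from the table; the dataclass itself is illustrative, and the repo may define them as plain constants instead.

```python
from dataclasses import dataclass

@dataclass
class TTSConfig:
    # Training
    batch_size: int = 32
    max_lr: float = 6e-4
    epochs: int = 10
    weight_decay_optim: float = 0.01
    clip: float = 1.0                         # gradient clipping threshold

    # Model
    block_size: int = 80                      # sequence length in tokens
    src_vocab_size: int = 0                   # "dynamic": set from the tokenizer at runtime
    phenome_embeddings_dims: int = 512        # phoneme embedding dimension
    embeddings_dims: int = 512                # main embedding dimension
    prenet_encoder_embeddings_dims: int = 512
    embeddings_dims_decoder: int = 256
    no_of_heads: int = 4
    no_of_decoder_layers: int = 8
    hidden_dim: int = 4 * 512                 # 2048, FFN hidden dimension
    dropout: float = 0.1
    attn_dropout: float = 0.1

config = TTSConfig()
```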

Audio Processing

| Parameter | Value | Description |
|-----------|-------|-------------|
| log_mel_features | 80 | Mel spectrogram channels |
| kernel_size | 5 | Convolution kernel size |
| stride | (2, 10) | Convolution stride (time, freq) |
| sr, SAMPLING_RATE | 16000 | Audio sample rate (Hz) |
| N_MELS | 80 | Number of Mel bands |
| WINDOW_DURATION | 0.050 s | Analysis window duration |
| STRIDE_DURATION | 0.0125 s | Window stride duration |
| max_t | 512 | Maximum spectrogram time steps |
| n_channels | 80 | Input spectrogram channels |
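
These settings translate directly into a log-mel feature extractor: at 16 kHz, the 50 ms window is 800 samples and the 12.5 ms stride is a 200-sample hop. A minimal sketch using torchaudio follows; the repo may compute the features differently.

```python
import torch
import torchaudio

SAMPLING_RATE = 16000
N_MELS = 80
WINDOW_DURATION = 0.050    # 50 ms analysis window
STRIDE_DURATION = 0.0125   # 12.5 ms hop

mel_transform = torchaudio.transforms.MelSpectrogram(
    sample_rate=SAMPLING_RATE,
    n_fft=int(WINDOW_DURATION * SAMPLING_RATE),       # 800 samples
    hop_length=int(STRIDE_DURATION * SAMPLING_RATE),  # 200 samples
    n_mels=N_MELS,
)

def log_mel(waveform: torch.Tensor) -> torch.Tensor:
    """(1, num_samples) waveform at 16 kHz -> (80, time_steps) log-mel features."""
    mel = mel_transform(waveform)             # (1, 80, T)
    return torch.log(mel + 1e-6).squeeze(0)   # log compression; epsilon avoids log(0)
```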

Dataset

Gigaspeech (other speech corpora can be substituted)
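
One possible way to pull the corpus is through the Hugging Face hub. The sketch below assumes the speechcolab/gigaspeech dataset card and its "xs" subset; the dataset is gated, so access requires accepting its terms of use and authenticating.

```python
from datasets import load_dataset

# Assumed dataset id and subset; streaming avoids downloading the full corpus.
ds = load_dataset("speechcolab/gigaspeech", "xs", streaming=True)
sample = next(iter(ds["train"]))
print(sample["text"], sample["audio"]["sampling_rate"])
```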

Frameworks

PyTorch

Epochs/Steps

Training steps = 150

Validation = every 50 steps
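
For illustration, a minimal training loop consistent with these numbers is sketched below: AdamW with the table's weight decay, gradient clipping at 1.0, 150 steps, and validation every 50 steps. The one-cycle schedule and the stand-in linear model are assumptions; the real loop trains the transformer described above.

```python
import torch
import torch.nn as nn

TRAIN_STEPS = 150
VAL_EVERY = 50
MAX_LR = 6e-4

model = nn.Linear(80, 80)                  # stand-in for the transformer TTS model

def get_batch():
    x = torch.randn(32, 80)                # stand-in for a batch of features
    return x, x                            # (input, target)

optimizer = torch.optim.AdamW(model.parameters(), lr=MAX_LR, weight_decay=0.01)
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=MAX_LR, total_steps=TRAIN_STEPS  # assumed schedule for max_lr
)
criterion = nn.MSELoss()

for step in range(1, TRAIN_STEPS + 1):
    model.train()
    x, y = get_batch()
    loss = criterion(model(x), y)
    optimizer.zero_grad(set_to_none=True)
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)  # clip = 1.0
    optimizer.step()
    scheduler.step()

    if step % VAL_EVERY == 0:              # validate every 50 steps
        model.eval()
        with torch.no_grad():
            vx, vy = get_batch()
            print(f"step {step}: val loss {criterion(model(vx), vy).item():.4f}")
```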

Loss Curves

📊 View Training Loss Curves

Source Code

📁 GitHub Repository: TTS

View the complete implementation, training scripts, and documentation on GitHub.