# ML/DL Research Paper Implementations

## Overview

A collection of clean, from-scratch PyTorch implementations of influential ML/DL research papers. Each subfolder contains code, notes, and sometimes pretrained weights.

## Deployed Models

Visit SmolHub to view the individual model pages and implementations.

## Repository Structure

### Language Models & Transformers
- BERT: Bidirectional Encoder Representations from Transformers
- GPT: Generative Pretrained Transformer models
- Llama, Llama4: Meta’s Llama model replications
- Gemma, Gemma3: Google’s Gemma models
- Mixtral: Mixture-of-Experts Transformer models
- DeepSeekV3: DeepSeek model replications
- Kimi-K2: Kimi-K2 model replications and training scripts
- Moonshine: Moonshine model experiments
- Transformer: Original Transformer model and variants (see the attention sketch after this list)
- Differential Transformer: Novel differential attention architectures
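
All of the Transformer-family models above share one computational core: scaled dot-product attention. As orientation rather than the exact code in any subfolder, a minimal PyTorch sketch:

```python
import math
import torch

def scaled_dot_product_attention(q, k, v, mask=None):
    # q, k, v: (batch, heads, seq_len, head_dim)
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
    if mask is not None:
        # e.g. a lower-triangular mask for causal decoders like GPT
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = torch.softmax(scores, dim=-1)
    return weights @ v
```

Encoder-style models such as BERT attend bidirectionally (no mask), while decoder-style models such as GPT and Llama apply a causal mask; the Differential Transformer instead computes attention as the difference of two softmax maps.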

### Vision & Multimodal Models
- ViT: Vision Transformer models (see the patch-embedding sketch after this list)
- CLiP: CLIP (Contrastive Language-Image Pretraining) vision-language models
- SigLip: Sigmoid Loss for Language-Image Pretraining
- Llava: Large Language and Vision Assistant models
- PaliGemma: PaliGemma multimodal model replications
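
ViT-style models first turn an image into a sequence of patch tokens, which the multimodal models above then combine with text tokens. A minimal sketch of the patch-embedding step (hyperparameters are illustrative, not taken from any subfolder):

```python
import torch.nn as nn

class PatchEmbedding(nn.Module):
    """Split an image into fixed-size patches and project each to a token."""
    def __init__(self, img_size=224, patch_size=16, in_chans=3, embed_dim=768):
        super().__init__()
        self.num_patches = (img_size // patch_size) ** 2
        # A strided convolution patchifies and projects in a single step.
        self.proj = nn.Conv2d(in_chans, embed_dim,
                              kernel_size=patch_size, stride=patch_size)

    def forward(self, x):                    # x: (B, C, H, W)
        x = self.proj(x)                     # (B, embed_dim, H/P, W/P)
        return x.flatten(2).transpose(1, 2)  # (B, num_patches, embed_dim)
```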

### Generative Adversarial Networks
- DCGANs: Deep Convolutional GANs (see the training-step sketch after this list)
- WGANs: Wasserstein GANs
- CGANs: Conditional Generative Adversarial Networks
- CycleGANs: Cycle-consistent GANs for image translation
- Pix2Pix: Image-to-image translation with conditional GANs
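
Despite their differences, these GANs share the same alternating optimization: update the discriminator to separate real from generated samples, then update the generator to fool it. A schematic training step with the non-saturating loss, assuming `G`, `D` (with sigmoid output of shape `(batch, 1)`), and both optimizers are defined elsewhere:

```python
import torch
import torch.nn.functional as F

def gan_step(G, D, real, opt_g, opt_d, latent_dim=100):
    """One alternating update: discriminator first, then generator."""
    b = real.size(0)
    ones = torch.ones(b, 1, device=real.device)
    zeros = torch.zeros(b, 1, device=real.device)

    # Discriminator: push D(real) toward 1 and D(fake) toward 0.
    fake = G(torch.randn(b, latent_dim, device=real.device)).detach()
    loss_d = F.binary_cross_entropy(D(real), ones) \
           + F.binary_cross_entropy(D(fake), zeros)
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # Generator: non-saturating loss, push D(G(z)) toward 1.
    fake = G(torch.randn(b, latent_dim, device=real.device))
    loss_g = F.binary_cross_entropy(D(fake), ones)
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
    return loss_d.item(), loss_g.item()
```

WGANs replace this binary cross-entropy objective with a critic trained under a Lipschitz constraint, and the conditional variants (CGANs, Pix2Pix) feed labels or input images to both networks.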

### Recurrent & Sequence Models
- RNNs: Recurrent Neural Networks
- LSTM: Long Short-Term Memory models
- GRU: Gated Recurrent Unit models
- Seq2Seq: Sequence-to-sequence models
- Encoder-Decoder: Encoder-decoder architectures (see the GRU sketch after this list)
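
All of these follow the same encoder-decoder pattern: the encoder compresses the source sequence into a hidden state, and the decoder unrolls predictions from it. A minimal GRU-based sketch with teacher forcing (sizes are illustrative):

```python
import torch.nn as nn

class Seq2Seq(nn.Module):
    """Minimal GRU encoder-decoder without attention."""
    def __init__(self, src_vocab, tgt_vocab, emb=256, hidden=512):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb)
        self.encoder = nn.GRU(emb, hidden, batch_first=True)
        self.decoder = nn.GRU(emb, hidden, batch_first=True)
        self.out = nn.Linear(hidden, tgt_vocab)

    def forward(self, src, tgt):
        # h is the encoder's final hidden state, used to seed the decoder.
        _, h = self.encoder(self.src_emb(src))
        dec_out, _ = self.decoder(self.tgt_emb(tgt), h)  # teacher forcing
        return self.out(dec_out)                         # (B, T, tgt_vocab)
```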

### RLHF & Fine-Tuning Techniques
- DPO: Direct Preference Optimization (see the loss sketch after this list)
- ORPO: Odds Ratio Preference Optimization
- SimplePO: Simple Preference Optimization
- LoRA: Low-Rank Adaptation for efficient fine-tuning
- Fine Tuning using PEFT: Parameter-Efficient Fine-Tuning methods
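
DPO, which several of the other methods here build on, reduces preference tuning to a single classification-style loss against a frozen reference model. A sketch of the loss from the DPO paper, assuming the summed per-sequence log-probabilities are computed elsewhere:

```python
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """DPO loss over a batch of preference pairs.

    Each argument is a (batch,) tensor of summed per-sequence log-probs.
    """
    pi_logratios = policy_chosen_logps - policy_rejected_logps
    ref_logratios = ref_chosen_logps - ref_rejected_logps
    # Maximize the margin between policy and reference log-ratios.
    return -F.logsigmoid(beta * (pi_logratios - ref_logratios)).mean()
```

ORPO and SimplePO modify this recipe: ORPO folds an odds-ratio penalty into the supervised loss, while SimPO-style objectives drop the reference model entirely.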

### Audio & Speech Models
- Whisper: OpenAI’s Whisper speech recognition model (see the log-mel sketch after this list)
- TTS: Text-to-Speech models
- CLAP: Contrastive Language-Audio Pretraining
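
Speech models like Whisper do not consume raw waveforms: Whisper's front end is an 80-channel log-mel spectrogram computed at 16 kHz with a 25 ms window and 10 ms hop. A simplified torchaudio sketch of that preprocessing (Whisper itself additionally uses log10 and normalization):

```python
import torch
import torchaudio

def log_mel_features(waveform, sample_rate=16_000):
    """Whisper-style log-mel features: 80 mels, 25 ms window, 10 ms hop."""
    mel = torchaudio.transforms.MelSpectrogram(
        sample_rate=sample_rate,
        n_fft=400,        # 25 ms window at 16 kHz
        hop_length=160,   # 10 ms hop
        n_mels=80,
    )(waveform)
    return torch.log(mel.clamp(min=1e-10))
```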

### Other Architectures
- Attention Mechanisms: Various attention mechanisms and patterns
- VAE: Variational Autoencoders (see the reparameterization sketch after this list)
- DDP: Distributed Data Parallel training experiments
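
The VAE trains an encoder and decoder end to end with a reconstruction term plus a KL penalty, made differentiable via the reparameterization trick. A minimal sketch of those two pieces:

```python
import torch
import torch.nn.functional as F

def reparameterize(mu, logvar):
    """Sample z ~ N(mu, sigma^2) so gradients flow to mu and logvar."""
    std = torch.exp(0.5 * logvar)
    return mu + std * torch.randn_like(std)

def vae_loss(recon_x, x, mu, logvar):
    # Reconstruction term plus closed-form KL(N(mu, sigma^2) || N(0, I)).
    recon = F.mse_loss(recon_x, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl
```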

## Key Features
- 30+ Model Implementations: From classic RNNs to modern LLMs
- Production-Ready Code: Clean, documented, and tested implementations
- Training Scripts: End-to-end training pipelines with distributed training support