# ML/DL Research Paper Implementations

## Overview

A collection of clean, from-scratch PyTorch implementations of influential ML/DL research papers. Each subfolder contains code, notes, and sometimes pretrained weights.

## Deployed Models

Visit SmolHub to view the individual model pages and implementations.

## Repository Structure

### Language Models & Transformers
- BERT: Bidirectional Encoder Representations from Transformers
- GPT: Generative Pretrained Transformer models
- Llama, Llama4: Meta’s Llama model replications
- Gemma, Gemma3: Google’s Gemma models
- Mixtral: Mixture-of-Experts Transformer models
- DeepSeekV3: DeepSeek model replications
- Kimi-K2: Kimi-K2 model replications and training scripts
- Moonshine: Moonshine model experiments
- Transformer: Original Transformer model and variants (see the attention sketch after this list)
- Differential Transformer: Novel differential attention architectures
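
All of the Transformer-family models above share one computational core: scaled dot-product attention. As orientation rather than the exact code in any subfolder, a minimal PyTorch sketch:

```python
import math
import torch

def scaled_dot_product_attention(q, k, v, mask=None):
    # q, k, v: (batch, heads, seq_len, head_dim)
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
    if mask is not None:
        # e.g. a lower-triangular mask for causal decoders like GPT
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = torch.softmax(scores, dim=-1)
    return weights @ v
```

Encoder-style models such as BERT attend bidirectionally (no mask), while decoder-style models such as GPT and Llama apply a causal mask; the Differential Transformer instead computes attention as the difference of two softmax maps.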

### Vision & Multimodal Models
- ViT: Vision Transformer models (see the patch-embedding sketch after this list)
- CLiP: CLIP (Contrastive Language-Image Pretraining) vision-language models
- SigLip: Sigmoid Loss for Language-Image Pretraining
- Llava: Large Language and Vision Assistant models
- PaliGemma: PaliGemma multimodal model replications
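
ViT-style models first turn an image into a sequence of patch tokens, which the multimodal models above then combine with text tokens. A minimal sketch of the patch-embedding step (hyperparameters are illustrative, not taken from any subfolder):

```python
import torch.nn as nn

class PatchEmbedding(nn.Module):
    """Split an image into fixed-size patches and project each to a token."""
    def __init__(self, img_size=224, patch_size=16, in_chans=3, embed_dim=768):
        super().__init__()
        self.num_patches = (img_size // patch_size) ** 2
        # A strided convolution patchifies and projects in a single step.
        self.proj = nn.Conv2d(in_chans, embed_dim,
                              kernel_size=patch_size, stride=patch_size)

    def forward(self, x):                    # x: (B, C, H, W)
        x = self.proj(x)                     # (B, embed_dim, H/P, W/P)
        return x.flatten(2).transpose(1, 2)  # (B, num_patches, embed_dim)
```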

### Generative Adversarial Networks
- DCGANs: Deep Convolutional GANs (see the training-step sketch after this list)
- WGANs: Wasserstein GANs
- CGANs: Conditional Generative Adversarial Networks
- CycleGANs: Cycle-consistent GANs for image translation
- Pix2Pix: Image-to-image translation with conditional GANs
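
Despite their differences, these GANs share the same alternating optimization: update the discriminator to separate real from generated samples, then update the generator to fool it. A schematic training step with the non-saturating loss, assuming `G`, `D` (with sigmoid output of shape `(batch, 1)`), and both optimizers are defined elsewhere:

```python
import torch
import torch.nn.functional as F

def gan_step(G, D, real, opt_g, opt_d, latent_dim=100):
    """One alternating update: discriminator first, then generator."""
    b = real.size(0)
    ones = torch.ones(b, 1, device=real.device)
    zeros = torch.zeros(b, 1, device=real.device)

    # Discriminator: push D(real) toward 1 and D(fake) toward 0.
    fake = G(torch.randn(b, latent_dim, device=real.device)).detach()
    loss_d = F.binary_cross_entropy(D(real), ones) \
           + F.binary_cross_entropy(D(fake), zeros)
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # Generator: non-saturating loss, push D(G(z)) toward 1.
    fake = G(torch.randn(b, latent_dim, device=real.device))
    loss_g = F.binary_cross_entropy(D(fake), ones)
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
    return loss_d.item(), loss_g.item()
```

WGANs replace this binary cross-entropy objective with a critic trained under a Lipschitz constraint, and the conditional variants (CGANs, Pix2Pix) feed labels or input images to both networks.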

### Recurrent & Sequence Models
- RNNs: Recurrent Neural Networks
- LSTM: Long Short-Term Memory models
- GRU: Gated Recurrent Unit models
- Seq2Seq: Sequence-to-sequence models
- Encoder-Decoder: Encoder-decoder architectures (see the GRU sketch after this list)
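
All of these follow the same encoder-decoder pattern: the encoder compresses the source sequence into a hidden state, and the decoder unrolls predictions from it. A minimal GRU-based sketch with teacher forcing (sizes are illustrative):

```python
import torch.nn as nn

class Seq2Seq(nn.Module):
    """Minimal GRU encoder-decoder without attention."""
    def __init__(self, src_vocab, tgt_vocab, emb=256, hidden=512):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb)
        self.encoder = nn.GRU(emb, hidden, batch_first=True)
        self.decoder = nn.GRU(emb, hidden, batch_first=True)
        self.out = nn.Linear(hidden, tgt_vocab)

    def forward(self, src, tgt):
        # h is the encoder's final hidden state, used to seed the decoder.
        _, h = self.encoder(self.src_emb(src))
        dec_out, _ = self.decoder(self.tgt_emb(tgt), h)  # teacher forcing
        return self.out(dec_out)                         # (B, T, tgt_vocab)
```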

### RLHF & Fine-Tuning Techniques
- DPO: Direct Preference Optimization (see the loss sketch after this list)
- ORPO: Odds Ratio Preference Optimization
- SimplePO: Simple Preference Optimization
- LoRA: Low-Rank Adaptation for efficient fine-tuning
- Fine Tuning using PEFT: Parameter-Efficient Fine-Tuning methods
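
DPO, which several of the other methods here build on, reduces preference tuning to a single classification-style loss against a frozen reference model. A sketch of the loss from the DPO paper, assuming the summed per-sequence log-probabilities are computed elsewhere:

```python
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """DPO loss over a batch of preference pairs.

    Each argument is a (batch,) tensor of summed per-sequence log-probs.
    """
    pi_logratios = policy_chosen_logps - policy_rejected_logps
    ref_logratios = ref_chosen_logps - ref_rejected_logps
    # Maximize the margin between policy and reference log-ratios.
    return -F.logsigmoid(beta * (pi_logratios - ref_logratios)).mean()
```

ORPO and SimplePO modify this recipe: ORPO folds an odds-ratio penalty into the supervised loss, while SimPO-style objectives drop the reference model entirely.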

### Audio & Speech Models
- Whisper: OpenAI’s Whisper speech recognition model (see the log-mel sketch after this list)
- TTS: Text-to-Speech models
- CLAP: Contrastive Language-Audio Pretraining
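
Speech models like Whisper do not consume raw waveforms: Whisper's front end is an 80-channel log-mel spectrogram computed at 16 kHz with a 25 ms window and 10 ms hop. A simplified torchaudio sketch of that preprocessing (Whisper itself additionally uses log10 and normalization):

```python
import torch
import torchaudio

def log_mel_features(waveform, sample_rate=16_000):
    """Whisper-style log-mel features: 80 mels, 25 ms window, 10 ms hop."""
    mel = torchaudio.transforms.MelSpectrogram(
        sample_rate=sample_rate,
        n_fft=400,        # 25 ms window at 16 kHz
        hop_length=160,   # 10 ms hop
        n_mels=80,
    )(waveform)
    return torch.log(mel.clamp(min=1e-10))
```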

### Other Architectures
- Attention Mechanisms: Various attention mechanisms and patterns
- VAE: Variational Autoencoders (see the reparameterization sketch after this list)
- DDP: Distributed Data Parallel training experiments
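
The VAE trains an encoder and decoder end to end with a reconstruction term plus a KL penalty, made differentiable via the reparameterization trick. A minimal sketch of those two pieces:

```python
import torch
import torch.nn.functional as F

def reparameterize(mu, logvar):
    """Sample z ~ N(mu, sigma^2) so gradients flow to mu and logvar."""
    std = torch.exp(0.5 * logvar)
    return mu + std * torch.randn_like(std)

def vae_loss(recon_x, x, mu, logvar):
    # Reconstruction term plus closed-form KL(N(mu, sigma^2) || N(0, I)).
    recon = F.mse_loss(recon_x, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl
```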

## Key Features
- 30+ Model Implementations: From classic RNNs to modern LLMs
- Production-Ready Code: Clean, documented, and tested implementations
- Training Scripts: End-to-end training pipelines with distributed training support