Paper Replications

ML/DL Research Paper Implementations

Overview

A collection of clean, from-scratch implementations of influential ML/DL research papers in PyTorch. Each subfolder contains code, notes, and sometimes pretrained weights.

Deployed Models

Visit SmolHub to view the individual model pages and implementations.

Repository Structure

Language Models & Transformers

  • BERT: Bidirectional Encoder Representations from Transformers
  • GPT: Generative Pre-trained Transformer models
  • Llama, Llama4: Meta’s Llama model replications
  • Gemma, Gemma3: Google’s Gemma models
  • Mixtral: Mixture-of-Experts Transformer models
  • DeepSeekV3: DeepSeek model replications
  • Kimi-K2: Kimi-K2 model replications and training scripts
  • Moonshine: Moonshine model experiments
  • Transformer: Original Transformer model and variants
  • Differential Transformer: Novel differential attention architectures
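All of the transformer-family replications above are built around the same core operation, scaled dot-product attention. As a reference point, a minimal sketch in PyTorch (function and variable names here are illustrative, not taken from any particular subfolder):

```python
import math
import torch

def scaled_dot_product_attention(q, k, v, mask=None):
    """Minimal attention: softmax(QK^T / sqrt(d)) V.

    q, k, v: tensors of shape (batch, heads, seq, head_dim).
    mask: optional boolean tensor; True marks positions to hide.
    """
    d = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d)  # (B, H, S, S)
    if mask is not None:
        scores = scores.masked_fill(mask, float("-inf"))
    weights = torch.softmax(scores, dim=-1)
    return weights @ v

# Causal mask for autoregressive models (GPT, Llama, etc.):
# hide positions j > i so each token attends only to the past.
seq_len = 8
causal = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)
```

Decoder-only models (GPT, Llama, Gemma) apply the causal mask; encoder-only models (BERT) omit it so every token attends bidirectionally.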

Vision & Multimodal Models

  • ViT: Vision Transformer models
  • CLiP: CLIP vision-language models
  • SigLip: Sigmoid Loss for Language-Image Pretraining
  • Llava: Large Language and Vision Assistant models
  • PaliGemma: PaliGemma multimodal model replications

Generative Adversarial Networks

  • DCGANs: Deep Convolutional GANs
  • WGANs: Wasserstein GANs
  • CGANs: Conditional Generative Adversarial Networks
  • CycleGANs: Cycle-consistent GANs for image translation
  • Pix2Pix: Image-to-image translation with conditional GANs
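As background for the GAN entries: the vanilla setup trains a discriminator and a generator against each other with binary cross-entropy, usually with the non-saturating generator objective. A minimal sketch of the two loss terms (function names are illustrative, not this repo's actual code):

```python
import torch
import torch.nn.functional as F

def discriminator_loss(d_real, d_fake):
    """BCE on logits: push D(real) toward 1 and D(fake) toward 0."""
    real = F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real))
    fake = F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake))
    return real + fake

def generator_loss(d_fake):
    """Non-saturating generator loss: push D(fake) toward 1."""
    return F.binary_cross_entropy_with_logits(d_fake, torch.ones_like(d_fake))
```

WGAN replaces these with a critic score difference plus a Lipschitz constraint, and conditional variants (CGAN, Pix2Pix) feed labels or input images to both networks.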

Recurrent & Sequence Models

  • RNNs: Recurrent Neural Networks
  • LSTM: Long Short-Term Memory models
  • GRU: Gated Recurrent Unit models
  • Seq2Seq: Sequence-to-sequence models
  • Encoder-Decoder: Encoder-decoder architectures

RLHF & Fine-Tuning Techniques

  • DPO: Direct Preference Optimization
  • ORPO: Odds Ratio Preference Optimization
  • SimplePO: Simple Preference Optimization
  • LoRA: Low-Rank Adaptation for efficient fine-tuning
  • Fine Tuning using PEFT: Parameter-Efficient Fine-Tuning methods
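For context on the LoRA entry: the core idea is to freeze a pretrained weight matrix W and learn only a low-rank update BA added on top of it. A minimal sketch in PyTorch (the wrapper class, rank, and scaling values are illustrative assumptions, not this repo's exact implementation):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wraps a frozen linear layer with a trainable low-rank update.

    Output: base(x) + (alpha / r) * x A^T B^T, where A and B are the
    only trainable parameters and r << min(in_features, out_features).
    """
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # freeze the pretrained weights
        self.lora_a = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x):
        # B is zero-initialized, so at the start of fine-tuning the
        # wrapped layer computes exactly the same output as the base.
        return self.base(x) + self.scale * (x @ self.lora_a.T @ self.lora_b.T)
```

The trainable parameter count drops from `in * out` to `r * (in + out)`, which is what makes LoRA and the other PEFT methods above practical on a single GPU.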

Audio & Speech Models

  • Whisper: OpenAI’s Whisper speech recognition model
  • TTS: Text-to-Speech models
  • CLAP: Contrastive Language-Audio Pretraining

Other Architectures

  • Attention Mechanisms: Various attention mechanisms and patterns
  • VAE: Variational Autoencoders
  • DDP: Distributed Data Parallel training experiments
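The VAE entry hinges on the reparameterization trick, which keeps sampling differentiable by moving the randomness into a fixed noise source. A generic sketch (not this repo's exact code):

```python
import torch

def reparameterize(mu, logvar):
    """Sample z ~ N(mu, sigma^2) differentiably: z = mu + sigma * eps."""
    std = torch.exp(0.5 * logvar)
    eps = torch.randn_like(std)
    return mu + std * eps

def kl_divergence(mu, logvar):
    """KL(N(mu, sigma^2) || N(0, I)) per sample, summed over latent dims."""
    return -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=-1)
```

The training loss is then reconstruction error plus this KL term, with gradients flowing through `mu` and `logvar` but not through `eps`.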

Key Features

  • 30+ Model Implementations: From classic RNNs to modern LLMs
  • Production-Ready Code: Clean, documented, and tested implementations
  • Training Scripts: End-to-end training pipelines with distributed training support

Sponsors