Differential Transformer

Category: Language Models
Framework: PyTorch
Dataset: TinyShakespeare
Created: February 09, 2025

Overview

A from-scratch implementation of the Differential Transformer in PyTorch, trained on the TinyShakespeare dataset.

Key Features

  • Transformer Architecture

Technical Details

  • Framework: PyTorch
  • Dataset: TinyShakespeare
  • Category: Language Models

Implementation Details

I implemented the Differential Transformer in PyTorch and trained it on the TinyShakespeare dataset.
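For orientation, the core idea of differential attention can be sketched as follows. This is a minimal single-head, causal sketch and not the repository's code: the class name `DiffAttentionHead`, the helper `lambda_init_for_layer`, the dimensions, the 0.1 initialization scale, and the hand-written RMS normalization are all assumptions made for illustration. The head computes two softmax attention maps from two query/key groups and subtracts the second, scaled by a learnable scalar lambda, from the first.

```python
import math

import torch
import torch.nn as nn
import torch.nn.functional as F


def lambda_init_for_layer(layer_idx: int) -> float:
    """Depth-dependent starting value for lambda (layer_idx is 1-based)."""
    return 0.8 - 0.6 * math.exp(-0.3 * (layer_idx - 1))


class DiffAttentionHead(nn.Module):
    """One causal differential-attention head (hypothetical names and sizes)."""

    def __init__(self, d_model: int, d_head: int, layer_idx: int = 1):
        super().__init__()
        self.d_head = d_head
        # Queries and keys are projected to 2 * d_head, then split into two groups.
        self.q_proj = nn.Linear(d_model, 2 * d_head, bias=False)
        self.k_proj = nn.Linear(d_model, 2 * d_head, bias=False)
        self.v_proj = nn.Linear(d_model, 2 * d_head, bias=False)
        self.lambda_init = lambda_init_for_layer(layer_idx)
        # Learnable vectors that re-parameterize the scalar lambda
        # (the 0.1 init scale is an assumption of this sketch).
        self.lambda_q1 = nn.Parameter(torch.randn(d_head) * 0.1)
        self.lambda_k1 = nn.Parameter(torch.randn(d_head) * 0.1)
        self.lambda_q2 = nn.Parameter(torch.randn(d_head) * 0.1)
        self.lambda_k2 = nn.Parameter(torch.randn(d_head) * 0.1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        seq_len = x.size(1)
        q1, q2 = self.q_proj(x).chunk(2, dim=-1)
        k1, k2 = self.k_proj(x).chunk(2, dim=-1)
        v = self.v_proj(x)

        # Causal mask: True above the diagonal marks future positions.
        mask = torch.triu(
            torch.ones(seq_len, seq_len, dtype=torch.bool, device=x.device),
            diagonal=1,
        )
        scale = 1.0 / math.sqrt(self.d_head)
        s1 = (q1 @ k1.transpose(-2, -1)) * scale
        s2 = (q2 @ k2.transpose(-2, -1)) * scale
        a1 = F.softmax(s1.masked_fill(mask, float("-inf")), dim=-1)
        a2 = F.softmax(s2.masked_fill(mask, float("-inf")), dim=-1)

        # lambda = exp(lq1 . lk1) - exp(lq2 . lk2) + lambda_init
        lam = (
            torch.exp(torch.dot(self.lambda_q1, self.lambda_k1))
            - torch.exp(torch.dot(self.lambda_q2, self.lambda_k2))
            + self.lambda_init
        )

        # Differential attention: subtract the second map from the first.
        out = (a1 - lam * a2) @ v
        # Per-position RMS normalization without a learnable gain,
        # followed by the fixed (1 - lambda_init) scaling.
        out = out * torch.rsqrt(out.pow(2).mean(dim=-1, keepdim=True) + 1e-5)
        return out * (1.0 - self.lambda_init)
```

Calling `DiffAttentionHead(d_model=16, d_head=4)` on a tensor of shape `(2, 5, 16)` returns a tensor of shape `(2, 5, 8)`. A full model would run several such heads in parallel, concatenate their outputs, and apply an output projection.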

Differential Transformers

Datasets

TinyShakespeare: in the /data folder

Frameworks

PyTorch

Results (on a single A100 GPU)

Training steps: 2000 (evaluated every 100 training steps)

Train loss: 5.95

Val loss: 5.98
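A training schedule like the one reported here can be sketched as a loop that measures train and validation loss at a fixed interval. This is a generic sketch, not the repository's training script: it assumes the "per 100 training steps" figure is the evaluation interval, and the function names, the `get_batch(split)` callback, the AdamW optimizer, the learning rate, and `eval_iters` are all hypothetical choices.

```python
import torch
import torch.nn.functional as F


@torch.no_grad()
def estimate_loss(model, get_batch, split, eval_iters=10):
    """Average cross-entropy over a few batches from the given split."""
    model.eval()
    losses = []
    for _ in range(eval_iters):
        x, y = get_batch(split)
        logits = model(x)  # (batch, seq_len, vocab_size)
        loss = F.cross_entropy(logits.view(-1, logits.size(-1)), y.view(-1))
        losses.append(loss.item())
    model.train()
    return sum(losses) / len(losses)


def train(model, get_batch, max_steps=2000, eval_interval=100, lr=3e-4):
    """Train for max_steps and record (step, train_loss, val_loss) periodically."""
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)  # lr is an assumption
    history = []
    for step in range(1, max_steps + 1):
        x, y = get_batch("train")
        logits = model(x)
        loss = F.cross_entropy(logits.view(-1, logits.size(-1)), y.view(-1))
        optimizer.zero_grad(set_to_none=True)
        loss.backward()
        optimizer.step()
        if step % eval_interval == 0:
            history.append(
                (
                    step,
                    estimate_loss(model, get_batch, "train"),
                    estimate_loss(model, get_batch, "val"),
                )
            )
    return history
```

With the defaults above, `train` returns 20 entries, one for each multiple of 100 steps up to 2000.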

Source Code

๐Ÿ“ GitHub Repository: Differential Transformer

View the complete implementation, training scripts, and documentation on GitHub.