Differential Transformer
Overview
A from-scratch implementation of the Differential Transformer.
Key Features
- Transformer Architecture
Technical Details
- Framework: PyTorch
- Dataset: TinyShakespeare
- Category: Language Models
Implementation Details
I implemented the Differential Transformer using PyTorch on the TinyShakespeare dataset.
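For readers unfamiliar with the architecture, the core idea can be sketched as a single causal attention head that subtracts one softmax attention map from another, scaled by a learnable scalar lambda. This is a minimal illustration and not the code from this repository: the class name `DiffAttentionHead`, the dimensions, and the initialization scale are assumptions made for the example, and the per-head normalization and the `(1 - lambda_init)` output scaling described in the Differential Transformer paper are omitted for brevity.

```python
import math

import torch
import torch.nn as nn
import torch.nn.functional as F


class DiffAttentionHead(nn.Module):
    """Single causal differential-attention head (illustrative sketch only)."""

    def __init__(self, d_model: int, d_head: int, lambda_init: float = 0.8):
        super().__init__()
        self.d_head = d_head
        self.lambda_init = lambda_init
        # Queries and keys are projected to 2 * d_head and split into two groups.
        self.q_proj = nn.Linear(d_model, 2 * d_head, bias=False)
        self.k_proj = nn.Linear(d_model, 2 * d_head, bias=False)
        self.v_proj = nn.Linear(d_model, 2 * d_head, bias=False)
        # Learnable vectors that re-parameterize the scalar lambda
        # (the 0.1 init scale is an assumption for this example).
        self.lambda_q1 = nn.Parameter(torch.randn(d_head) * 0.1)
        self.lambda_k1 = nn.Parameter(torch.randn(d_head) * 0.1)
        self.lambda_q2 = nn.Parameter(torch.randn(d_head) * 0.1)
        self.lambda_k2 = nn.Parameter(torch.randn(d_head) * 0.1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        seq_len = x.shape[1]
        q1, q2 = self.q_proj(x).chunk(2, dim=-1)
        k1, k2 = self.k_proj(x).chunk(2, dim=-1)
        v = self.v_proj(x)

        scale = 1.0 / math.sqrt(self.d_head)
        # True above the diagonal: positions a token must not attend to.
        future = torch.triu(
            torch.ones(seq_len, seq_len, dtype=torch.bool, device=x.device),
            diagonal=1,
        )
        scores1 = (q1 @ k1.transpose(-2, -1)) * scale
        scores2 = (q2 @ k2.transpose(-2, -1)) * scale
        attn1 = F.softmax(scores1.masked_fill(future, float("-inf")), dim=-1)
        attn2 = F.softmax(scores2.masked_fill(future, float("-inf")), dim=-1)

        # lambda = exp(lq1 . lk1) - exp(lq2 . lk2) + lambda_init
        lam = (
            torch.exp(torch.dot(self.lambda_q1, self.lambda_k1))
            - torch.exp(torch.dot(self.lambda_q2, self.lambda_k2))
            + self.lambda_init
        )
        # Differential attention: the second map cancels shared attention noise.
        return (attn1 - lam * attn2) @ v


torch.manual_seed(0)
head = DiffAttentionHead(d_model=64, d_head=16)
x = torch.randn(2, 10, 64)
print(head(x).shape)  # torch.Size([2, 10, 32])
```

A full model would run several such heads in parallel, normalize each head's output, concatenate the results, and apply an output projection.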
Datasets
TinyShakespeare: in the /data folder
Frameworks
PyTorch
Results (on a single A100 GPU)
Training steps: 2000
Evaluation interval: every 100 training steps
Train loss: 5.95
Val loss: 5.98
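To put these numbers in context, a loss can be converted to perplexity. The sketch below assumes the reported values are mean per-token cross-entropy in nats (the convention of PyTorch's `nn.CrossEntropyLoss`), which the results above do not state explicitly. The helper name `perplexity` is hypothetical.

```python
import math


def perplexity(cross_entropy_nats: float) -> float:
    """Convert a mean per-token cross-entropy (in nats) to perplexity."""
    return math.exp(cross_entropy_nats)


# Losses as reported above; the "nats" interpretation is an assumption.
train_ppl = perplexity(5.95)
val_ppl = perplexity(5.98)
print(f"train perplexity: {train_ppl:.1f}, val perplexity: {val_ppl:.1f}")
```

Under that assumption, the small gap between the two values suggests the model is not overfitting at this point in training.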
Source Code
GitHub Repository: Differential Transformer
View the complete implementation, training scripts, and documentation on GitHub.