SimplePO
Overview
A from-scratch implementation of SimplePO.
Technical Details
- Framework: PyTorch
- Dataset: UltraFeedback
- Category: Fine-tuning
Implementation Details
Trained an OPT-330M model with SimplePO in PyTorch for instruction following.
SimplePO: Simple Preference Optimization with a Reference-Free Reward
ModelArgs Hyperparameters
| Parameter | Value | Description |
|---|---|---|
| batch_size | 128 | The number of samples processed before the model is updated. |
| max_lr | 2e-5 | Maximum learning rate. |
| device | 'cuda:0' | The device to run the model on (e.g., 'cuda:0' for GPU). |
| beta | 2 | Beta, the reward scaling factor in the SimplePO loss. |
| gamma | 1.6 | Gamma, the target reward margin in the SimplePO loss. |
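To show how beta and gamma enter the objective, here is a minimal sketch of the reference-free SimplePO loss, assuming the inputs are the length-normalized average per-token log-probabilities of each response (the function name `simplepo_loss` is illustrative, not taken from the repository):

```python
import torch
import torch.nn.functional as F

def simplepo_loss(chosen_logps: torch.Tensor,
                  rejected_logps: torch.Tensor,
                  beta: float = 2.0,
                  gamma: float = 1.6) -> torch.Tensor:
    """Reference-free SimplePO loss (illustrative sketch).

    chosen_logps / rejected_logps: average per-token log-probabilities
    of the chosen and rejected responses, shape [batch].
    beta scales the implicit reward; gamma is the target reward margin.
    """
    # Implicit reward difference minus the target margin.
    margin = beta * (chosen_logps - rejected_logps) - gamma
    # Bradley-Terry style negative log-sigmoid objective.
    return -F.logsigmoid(margin).mean()

# Example: the loss shrinks as the chosen response becomes more likely
# than the rejected one.
small_gap = simplepo_loss(torch.tensor([-0.5]), torch.tensor([-1.0]))
large_gap = simplepo_loss(torch.tensor([-0.5]), torch.tensor([-3.0]))
```

Because the reward is just the policy's own average log-probability, no frozen reference model is needed, which is the main practical difference from DPO.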
Datasets
- UltraFeedback
Frameworks:
- PyTorch
Source Code
📁 GitHub Repository: SimplePO
View the complete implementation, training scripts, and documentation on GitHub.