SimplePO
Overview
A from-scratch implementation of SimplePO.
Technical Details
- Framework: PyTorch
- Dataset: UltraFeedback
- Category: Fine-tuning
Implementation Details
Trained an OPT-330M model with SimplePO in PyTorch for instruction following.
SimplePO: Simple Preference Optimization with a Reference-Free Reward
ModelArgs Hyperparameters
| Parameter | Value | Description |
|---|---|---|
| batch_size | 128 | The number of samples processed before the model is updated. |
| max_lr | 2e-5 | Maximum learning rate. |
| device | 'cuda:0' | The device to run the model on (e.g., 'cuda:0' for GPU). |
| beta | 2 | Beta, the reward scaling factor in the SimplePO loss. |
| gamma | 1.6 | Gamma, the target reward margin in the SimplePO loss. |
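To show how beta and gamma enter the objective, here is a minimal sketch of the reference-free SimplePO loss, assuming the inputs are the length-normalized average per-token log-probabilities of each response (the function name `simplepo_loss` is illustrative, not taken from the repository):

```python
import torch
import torch.nn.functional as F

def simplepo_loss(chosen_logps: torch.Tensor,
                  rejected_logps: torch.Tensor,
                  beta: float = 2.0,
                  gamma: float = 1.6) -> torch.Tensor:
    """Reference-free SimplePO loss (illustrative sketch).

    chosen_logps / rejected_logps: average per-token log-probabilities
    of the chosen and rejected responses, shape [batch].
    beta scales the implicit reward; gamma is the target reward margin.
    """
    # Implicit reward difference minus the target margin.
    margin = beta * (chosen_logps - rejected_logps) - gamma
    # Bradley-Terry style negative log-sigmoid objective.
    return -F.logsigmoid(margin).mean()

# Example: the loss shrinks as the chosen response becomes more likely
# than the rejected one.
small_gap = simplepo_loss(torch.tensor([-0.5]), torch.tensor([-1.0]))
large_gap = simplepo_loss(torch.tensor([-0.5]), torch.tensor([-3.0]))
```

Because the reward is just the policy's own average log-probability, no frozen reference model is needed, which is the main practical difference from DPO.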
Datasets
- UltraFeedback
Frameworks:
- PyTorch
Source Code
📁 GitHub Repository: SimplePO
View the complete implementation, training scripts, and documentation on GitHub.