SimplePO

Category: Fine-tuning
Framework: PyTorch
Dataset: UltraFeedback
Created: April 04, 2025

Overview

A from-scratch implementation of SimplePO (Simple Preference Optimization with a Reference-Free Reward).

Technical Details

  • Framework: PyTorch
  • Dataset: UltraFeedback
  • Category: Fine-tuning

Implementation Details

Trained an OPT-330M model with SimplePO in PyTorch for instruction following.

SimplePO: Simple Preference Optimization with a Reference-Free Reward
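
Because the reward is reference-free, the objective only needs the policy's own log-probabilities for the chosen and rejected responses. Below is a minimal PyTorch sketch of such a loss, assuming the inputs are already length-averaged per-token log-probabilities and using the beta/gamma values from the table that follows; function and argument names are illustrative, not the repository's exact API.

```python
import torch.nn.functional as F

def simplepo_loss(chosen_logps, rejected_logps, beta=2.0, gamma=1.6):
    """Reference-free preference loss (sketch).

    chosen_logps / rejected_logps: average per-token log-probabilities of the
    chosen and rejected responses under the policy model, shape (batch,).
    beta scales the implicit reward; gamma is the target reward margin.
    """
    # Implicit reward = beta * length-normalized log-probability; no reference
    # model is required. Penalize pairs whose reward gap falls below gamma.
    margin = beta * (chosen_logps - rejected_logps) - gamma
    return -F.logsigmoid(margin).mean()
```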

ModelArgs Hyperparameters

| Parameter  | Value     | Description                                                        |
|------------|-----------|--------------------------------------------------------------------|
| batch_size | 128       | Number of samples processed before each model update.              |
| max_lr     | 2e-5      | Maximum (peak) learning rate.                                      |
| device     | `cuda:0`  | Device to run the model on (e.g., `cuda:0` for GPU).               |
| beta       | 2         | Reward scaling factor in the SimplePO loss.                        |
| gamma      | 1.6       | Target reward margin between chosen and rejected responses.        |
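
For reference, these hyperparameters could be grouped into a small config object. The `ModelArgs` name follows the heading above; the exact field layout in the repository may differ.

```python
from dataclasses import dataclass

@dataclass
class ModelArgs:
    batch_size: int = 128    # samples per optimizer update
    max_lr: float = 2e-5     # peak learning rate
    device: str = "cuda:0"   # device to run the model on
    beta: float = 2.0        # reward scaling factor in the SimplePO loss
    gamma: float = 1.6       # target reward margin (chosen vs. rejected)
```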

Datasets

UltraFeedback
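
A minimal sketch of loading preference pairs, assuming the binarized UltraFeedback release on the Hugging Face Hub (`HuggingFaceH4/ultrafeedback_binarized`); the split and field names follow that release and may differ from the exact data preparation used here.

```python
from datasets import load_dataset

# Each example carries a prompt plus a "chosen" and a "rejected" response,
# which is the pair format SimplePO trains on.
dataset = load_dataset("HuggingFaceH4/ultrafeedback_binarized", split="train_prefs")

example = dataset[0]
prompt = example["prompt"]
chosen = example["chosen"]      # preferred response
rejected = example["rejected"]  # dispreferred response
```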

Frameworks:

PyTorch

Source Code

📁 GitHub Repository: SimplePO

View the complete implementation, training scripts, and documentation on GitHub.