ORPO

Category: Fine-tuning

Framework: PyTorch

Dataset: UltraFeedback

Created: March 01, 2025

GitHub: View Implementation

Overview

A from-scratch implementation of ORPO (Odds Ratio Preference Optimization).

Trained an OPT-330M model with ORPO in PyTorch for instruction following.
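
For readers unfamiliar with the objective, here is a minimal sketch of the ORPO loss as described in the ORPO paper: a supervised fine-tuning term on the chosen response plus a weighted odds-ratio penalty that favors the chosen response over the rejected one. This is not taken from the repository. The function names `sequence_avg_logprob` and `orpo_loss`, the mask convention, and the weight `lam=0.1` are assumptions made for illustration.

```python
import torch
import torch.nn.functional as F


def sequence_avg_logprob(logits, labels, mask):
    """Length-normalized log-likelihood of the response tokens.

    logits: (batch, seq, vocab), labels: (batch, seq) int64,
    mask: (batch, seq) with 1 for response tokens and 0 for prompt/padding.
    """
    # Shift so that position t predicts token t + 1 (causal LM convention).
    logits = logits[:, :-1, :]
    labels = labels[:, 1:]
    mask = mask[:, 1:].to(logits.dtype)

    # Log-probability assigned to each target token.
    token_logps = torch.gather(
        F.log_softmax(logits, dim=-1), dim=2, index=labels.unsqueeze(-1)
    ).squeeze(-1)

    # Average over the unmasked (response) tokens only.
    return (token_logps * mask).sum(dim=-1) / mask.sum(dim=-1).clamp(min=1)


def orpo_loss(chosen_logps, rejected_logps, lam=0.1):
    """ORPO objective: SFT loss + lam * odds-ratio loss.

    chosen_logps / rejected_logps: (batch,) average log-probabilities,
    strictly negative. lam is a hypothetical default, not the author's value.
    """
    # log odds(y|x) = log p - log(1 - p), computed from log p.
    log_odds_chosen = chosen_logps - torch.log1p(-torch.exp(chosen_logps))
    log_odds_rejected = rejected_logps - torch.log1p(-torch.exp(rejected_logps))

    # Odds-ratio term: -log sigmoid(log odds ratio).
    or_loss = -F.logsigmoid(log_odds_chosen - log_odds_rejected)

    # SFT term: negative log-likelihood of the chosen response.
    sft_loss = -chosen_logps

    return (sft_loss + lam * or_loss).mean()
```

In a training step, `sequence_avg_logprob` would be called once on the chosen sequence and once on the rejected sequence from the same model, and the two results passed to `orpo_loss`. Unlike DPO, no separate reference model is needed.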

Parameter	Value	Description
`batch_size`	2	The number of samples processed before the model is updated.
`max_lr`	8e-6	Maximum learning rate.
`device`	'cuda:0'	The device to run the model on (e.g., 'cuda:0' for the first GPU).
`betas`	0.95, 0.99	Beta coefficients for the optimizer's moment estimates.
`weight_decay`	0.1	Weight decay coefficient for the optimizer.
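
As an illustration of how these values might be wired together, the sketch below collects them in a small config object and passes them to an optimizer. The source does not name the optimizer, so `torch.optim.AdamW` is an assumption here, as are the `TrainConfig` and `build_optimizer` names and the tiny stand-in model.

```python
from dataclasses import dataclass
from typing import Tuple

import torch


@dataclass
class TrainConfig:
    # Values copied from the parameter table above.
    batch_size: int = 2
    max_lr: float = 8e-6
    device: str = "cuda:0"
    betas: Tuple[float, float] = (0.95, 0.99)
    weight_decay: float = 0.1


def build_optimizer(model: torch.nn.Module, cfg: TrainConfig) -> torch.optim.Optimizer:
    # Assumption: AdamW. The table only lists betas and weight decay.
    return torch.optim.AdamW(
        model.parameters(),
        lr=cfg.max_lr,
        betas=cfg.betas,
        weight_decay=cfg.weight_decay,
    )


cfg = TrainConfig()
model = torch.nn.Linear(4, 4)  # stand-in for the actual language model
optimizer = build_optimizer(model, cfg)
```

The name `max_lr` suggests a learning-rate schedule that peaks at this value. The schedule itself is not described here, so the sketch simply uses it as the optimizer's base rate.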

PyTorch

Training iterations: 3k

Validation interval: every 20 iterations

Train loss: 1.70

Val loss: 1.98 (at 2.5k steps)

📁 GitHub Repository: ORPO
