RL From Scratch

Q-Learning (Value-Based)

Tabular Q-Learning and Value Iteration implemented from scratch as educational notebooks.
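
As a minimal sketch of the two tabular methods named above (not the notebooks' own code), the snippet below shows a single Q-learning update and in-place value iteration. The MDP format `P[s][a] = [(prob, next_state, reward), ...]` and all function names are assumptions made for illustration.

```python
from collections import defaultdict


def q_learning_update(Q, s, a, r, s_next, done, actions, alpha=0.1, gamma=0.99):
    """One tabular Q-learning step: Q(s,a) += alpha * (TD target - Q(s,a))."""
    # Bootstrap from the greedy action in s_next unless the episode ended.
    best_next = 0.0 if done else max(Q[(s_next, b)] for b in actions)
    td_target = r + gamma * best_next
    Q[(s, a)] += alpha * (td_target - Q[(s, a)])
    return Q[(s, a)]


def value_iteration(P, gamma=0.9, tol=1e-8):
    """In-place value iteration over a hypothetical MDP dict P[s][a] -> [(p, s2, r)]."""
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            if not P[s]:  # terminal state: no actions, value stays 0
                continue
            # Bellman optimality backup: best expected one-step return.
            best = max(
                sum(p * (r + gamma * V[s2]) for p, s2, r in outcomes)
                for outcomes in P[s].values()
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            return V


# Usage: "A" can stay (reward 0) or move to terminal "B" (reward 1).
Q = defaultdict(float)
q_learning_update(Q, "A", "go", 1.0, "B", True, ["stay", "go"], alpha=0.5)
V = value_iteration({"A": {"stay": [(1.0, "A", 0.0)], "go": [(1.0, "B", 1.0)]}, "B": {}})
print(V)  # {'A': 1.0, 'B': 0.0}
```

Q-learning learns from sampled transitions without a model, while value iteration needs the full transition table `P`; in this toy MDP both agree that moving to `B` is the better action.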

★ 223

DQN Flappy (Value-Based)

DQN agent trained on Flappy Bird using pixel observations, experience replay, and epsilon-greedy exploration.
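
The two DQN ingredients named here can be sketched without a deep learning framework. This is an illustration under assumptions, not the project's code: transitions are plain tuples, Q-values arrive as a Python list, and the linear epsilon schedule and its defaults are hypothetical.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity FIFO store of (state, action, reward, next_state, done) tuples."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)  # oldest transitions are evicted first

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling breaks the correlation between consecutive frames.
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)


def epsilon_greedy(q_values, epsilon, rng=random):
    """Random action with probability epsilon, otherwise the greedy (argmax) action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda i: q_values[i])


def linear_epsilon(step, start=1.0, end=0.05, decay_steps=100_000):
    """Hypothetical schedule: anneal epsilon linearly from start to end, then hold."""
    frac = min(step / decay_steps, 1.0)
    return start + frac * (end - start)
```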

VizDoom RL (Value-Based)

DQN agent trained on VizDoom Basic via Gymnasium wrapper, with grayscale preprocessing, replay buffer, and W&B logging.
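
Grayscale preprocessing can be sketched as below, assuming frames arrive as nested lists of `(R, G, B)` tuples; a real pipeline would operate on NumPy arrays returned by the Gymnasium wrapper. The ITU-R BT.601 luma weights are a common choice (OpenCV's RGB-to-gray conversion uses them), but whether this project uses them is an assumption, as is the strided `downsample` helper.

```python
def to_grayscale(frame):
    """frame: H x W nested list of (R, G, B) ints in 0..255 -> H x W floats in [0, 1]."""
    # ITU-R BT.601 luma weights; dividing by 255 rescales to [0, 1].
    return [[(0.299 * r + 0.587 * g + 0.114 * b) / 255.0 for (r, g, b) in row] for row in frame]


def downsample(image, factor):
    """Keep every `factor`-th row and column (simple strided subsampling)."""
    return [row[::factor] for row in image[::factor]]


# Usage: a 4x4 frame, white on the left half and black on the right.
frame = [[(255, 255, 255)] * 2 + [(0, 0, 0)] * 2 for _ in range(4)]
small = downsample(to_grayscale(frame), 2)
print([[round(p, 3) for p in row] for row in small])  # [[1.0, 0.0], [1.0, 0.0]]
```

Dropping color cuts the input to one channel per frame, which shrinks both the replay buffer and the first convolution of the Q-network.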

GRPO (Policy-Based)

Group Relative Policy Optimization — DeepSeek-R1's RL training objective implemented from scratch.
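
The core of GRPO is that advantages come from comparing each sampled completion with the others generated for the same prompt, rather than from a learned critic. The sketch below shows that normalization plus a clipped surrogate with a KL penalty toward a reference policy. It is a simplification under stated assumptions: one summed log-probability per completion instead of per-token terms, and `clip_eps` and `beta` defaults chosen for illustration.

```python
import math
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Score each completion against the other completions sampled for the same prompt."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def grpo_loss(logp_new, logp_old, logp_ref, rewards, clip_eps=0.2, beta=0.04):
    """Simplified GRPO loss for one group of completions.

    Each list holds one summed log-probability per completion (a sequence-level
    simplification; the published objective averages per-token terms).
    """
    advantages = group_relative_advantages(rewards)
    total = 0.0
    for lp_new, lp_old, lp_ref, adv in zip(logp_new, logp_old, logp_ref, advantages):
        ratio = math.exp(lp_new - lp_old)
        clipped = min(max(ratio, 1 - clip_eps), 1 + clip_eps)
        surrogate = min(ratio * adv, clipped * adv)  # PPO-style pessimistic bound
        # KL estimator pi_ref/pi - log(pi_ref/pi) - 1, which is always >= 0.
        log_ratio_ref = lp_ref - lp_new
        kl = math.exp(log_ratio_ref) - log_ratio_ref - 1
        total += surrogate - beta * kl
    return -total / len(advantages)  # negated so that minimizing it maximizes the objective
```

For a group with rewards `[1, 0, 1, 0]`, the correct completions get advantage close to +1 and the incorrect ones close to -1, with no value network involved.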

A2C (Actor-Critic)

Implementation of the Advantage Actor-Critic (A2C) algorithm.
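
In A2C the critic supplies a baseline: the actor is pushed toward actions whose bootstrapped return beats the critic's value estimate. A minimal sketch of that advantage computation, assuming a rollout of rewards and done flags plus the critic's value for the state after the last step (function names are hypothetical):

```python
def n_step_returns(rewards, dones, bootstrap_value, gamma=0.99):
    """Discounted returns R_t = r_t + gamma * R_{t+1}, bootstrapped from the critic.

    dones[t] is True if the episode ended after step t, which cuts the bootstrap.
    """
    returns = []
    running = bootstrap_value  # critic's estimate V(s_T) for the state after the rollout
    for reward, done in zip(reversed(rewards), reversed(dones)):
        running = reward + gamma * running * (1.0 - float(done))
        returns.append(running)
    return returns[::-1]


def advantages(returns, values):
    """A_t = R_t - V(s_t): positive when the action did better than the critic expected."""
    return [ret - v for ret, v in zip(returns, values)]
```

For example, `n_step_returns([1.0, 1.0, 1.0], [False, True, False], 10.0, gamma=0.5)` gives `[1.5, 1.0, 6.0]`: the done flag at step 1 stops the bootstrap value from leaking into the earlier episode.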

DDPG (Actor-Critic)

Implementation of the Deep Deterministic Policy Gradient (DDPG) algorithm.
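
Two pieces that distinguish DDPG are the critic target, which evaluates the target actor's deterministic action at the next state, and target networks that slowly track the online ones. The sketch assumes parameters are flat lists of floats and the networks are plain callables; the `tau` and `gamma` defaults are illustrative.

```python
def ddpg_critic_target(reward, done, next_state, target_actor, target_critic, gamma=0.99):
    """y = r + gamma * (1 - done) * Q'(s', mu'(s')), using the target networks."""
    next_action = target_actor(next_state)  # deterministic action, no sampling
    return reward + gamma * (1.0 - float(done)) * target_critic(next_state, next_action)


def soft_update(target_params, online_params, tau=0.005):
    """Polyak averaging: theta' <- tau * theta + (1 - tau) * theta'."""
    return [tau * p + (1.0 - tau) * tp for tp, p in zip(target_params, online_params)]
```

A small `tau` makes the critic's regression target change slowly, which is what keeps the bootstrapped updates from chasing a moving target.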

DQN FrozenLake (Exploration)

Implementation of a Deep Q-Network (DQN) agent for the FrozenLake environment.

DQN Lunar (Exploration)

Implementation of a DQN agent for the Lunar Lander environment.

DQN Taxi (Exploration)

Implementation of a DQN agent for the Taxi environment.

DQN Atari (Exploration)

Implementation of a DQN agent for Atari games.

DQN (Exploration)

Implementation of the Deep Q-Network (DQN) algorithm.
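
The DQN learning signal can be sketched as a TD target computed with a frozen target network, followed by a Huber loss on the TD error. Here both networks are assumed to be callables that return a list of Q-values for a state, and the batch format is hypothetical.

```python
def dqn_targets(batch, target_q, gamma=0.99):
    """y = r + gamma * (1 - done) * max_a' Q_target(s', a') for each transition."""
    return [r + gamma * (1.0 - float(done)) * max(target_q(s_next))
            for (_, _, r, s_next, done) in batch]


def huber(x, delta=1.0):
    """Quadratic near zero, linear in the tails, so large TD errors do not explode."""
    return 0.5 * x * x if abs(x) <= delta else delta * (abs(x) - 0.5 * delta)


def dqn_loss(batch, online_q, target_q, gamma=0.99):
    """Mean Huber loss between Q_online(s, a) and the target-network TD target."""
    targets = dqn_targets(batch, target_q, gamma)
    errors = [online_q(s)[a] - y for (s, a, _, _, _), y in zip(batch, targets)]
    return sum(huber(e) for e in errors) / len(errors)
```

Keeping `target_q` frozen for many steps (and copying the online weights into it periodically) is what stops the max in the target from tracking the very estimates it is correcting.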

Dueling DQN (Exploration)

Implementation of the Dueling DQN algorithm, which splits the Q-network into separate state-value and advantage streams.
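
Dueling DQN recombines the value and advantage streams by subtracting the mean advantage, which keeps the split between them identifiable. A one-function sketch, assuming the two heads' outputs are already available as plain floats:

```python
def dueling_q_values(state_value, advantages):
    """Q(s, a) = V(s) + A(s, a) - mean_a' A(s, a')."""
    mean_advantage = sum(advantages) / len(advantages)
    return [state_value + a - mean_advantage for a in advantages]


print(dueling_q_values(1.0, [1.0, 2.0, 3.0]))  # [0.0, 1.0, 2.0]
```

Without the mean subtraction, any constant could be shifted between `V` and `A` without changing `Q`, so neither stream would be pinned down.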

Flappy Bird PPO (Actor-Critic)

Implementation of a Proximal Policy Optimization (PPO) agent for Flappy Bird.

Frozen Lake (Exploration)

Implementation of a reinforcement learning agent for the Frozen Lake environment.

Imitation Learning

Implementation of imitation learning, in which an agent learns a policy from expert demonstrations.

MARL (Multi-Agent)

Implementation of multi-agent reinforcement learning (MARL).

IPPO (Multi-Agent)

Implementation of Independent PPO (IPPO), in which each agent is trained with its own PPO learner.

MAPPO (Multi-Agent)

Implementation of Multi-Agent PPO (MAPPO), which pairs decentralized actors with a centralized critic.

Self-Play (Multi-Agent)

Implementation of self-play, in which an agent is trained against copies of itself.

PPO (Actor-Critic)

Implementation of the Proximal Policy Optimization (PPO) algorithm.
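
A sketch of PPO's clipped surrogate objective, paired with Generalized Advantage Estimation (GAE). GAE is a common choice for PPO advantages but is assumed here rather than confirmed for this project. Inputs are plain Python lists, and the `clip_eps`, `gamma`, and `lam` defaults are illustrative.

```python
import math


def gae(rewards, values, dones, last_value, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation, computed backwards over one rollout."""
    advantages, running = [], 0.0
    next_value = last_value  # critic's V for the state after the final step
    for r, v, d in zip(reversed(rewards), reversed(values), reversed(dones)):
        mask = 1.0 - float(d)  # no bootstrapping across an episode boundary
        delta = r + gamma * next_value * mask - v  # one-step TD error
        running = delta + gamma * lam * mask * running
        advantages.append(running)
        next_value = v
    return advantages[::-1]


def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """Negative mean of min(ratio * A, clip(ratio, 1 - eps, 1 + eps) * A)."""
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(lp_new - lp_old)
        clipped = min(max(ratio, 1 - clip_eps), 1 + clip_eps)
        total += min(ratio * adv, clipped * adv)
    return -total / len(advantages)
```

The `min` makes the objective pessimistic: once the probability ratio moves past `1 +/- clip_eps` in the direction the advantage favors, the gradient through that sample vanishes, which limits how far one batch can move the policy.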

Atari (Actor-Critic)

Implementation of an actor-critic agent for Atari games.

MuJoCo (Actor-Critic)

PPO on the MuJoCo continuous-control benchmark.

REINFORCE (Policy-Based)

Implementation of the REINFORCE (Monte Carlo policy gradient) algorithm.
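
REINFORCE weights each action's log-probability by the discounted return that followed it. A minimal sketch, assuming per-step log-probabilities are already computed; the optional return normalization is a common variance-reduction trick, not necessarily what this project does.

```python
def discounted_returns(rewards, gamma=0.99):
    """G_t = r_t + gamma * G_{t+1}, computed backwards over one complete episode."""
    g, out = 0.0, []
    for r in reversed(rewards):
        g = r + gamma * g
        out.append(g)
    return out[::-1]


def reinforce_loss(log_probs, rewards, gamma=0.99, normalize=True):
    """Negative of sum_t log pi(a_t | s_t) * G_t; minimizing it ascends the policy gradient."""
    returns = discounted_returns(rewards, gamma)
    if normalize and len(returns) > 1:
        mu = sum(returns) / len(returns)
        std = (sum((g - mu) ** 2 for g in returns) / len(returns)) ** 0.5
        returns = [(g - mu) / (std + 1e-8) for g in returns]
    return -sum(lp * g for lp, g in zip(log_probs, returns))
```

Because the returns are full Monte Carlo sums, no critic is needed, but updates can only happen once an episode has finished.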

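RND (Actor-Critic)

Implementation of Random Network Distillation (RND), an exploration bonus derived from how poorly a trained predictor network matches a fixed, randomly initialized target network.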
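
A toy version of RND, using linear "networks" in pure Python to show the mechanism: the intrinsic reward is the predictor's squared error against a frozen random target, and it shrinks as the predictor trains on a state. Network sizes, learning rate, and class names are hypothetical.

```python
import random


class LinearNet:
    """y = W x with Gaussian-initialized weights (a stand-in for a neural network)."""

    def __init__(self, in_dim, out_dim, rng):
        self.W = [[rng.gauss(0.0, 1.0) for _ in range(in_dim)] for _ in range(out_dim)]

    def __call__(self, x):
        return [sum(w * xi for w, xi in zip(row, x)) for row in self.W]


class RND:
    def __init__(self, in_dim, out_dim=4, lr=0.01, seed=0):
        rng = random.Random(seed)
        self.target = LinearNet(in_dim, out_dim, rng)     # frozen, never trained
        self.predictor = LinearNet(in_dim, out_dim, rng)  # trained to imitate the target
        self.lr = lr

    def intrinsic_reward(self, x):
        """Squared prediction error: high for novel states, low for familiar ones."""
        return sum((p - t) ** 2 for p, t in zip(self.predictor(x), self.target(x)))

    def update(self, x):
        # One SGD step on the squared error; d/dW[i][j] = 2 * (p_i - t_i) * x_j.
        p, t = self.predictor(x), self.target(x)
        for i, row in enumerate(self.predictor.W):
            err = p[i] - t[i]
            for j in range(len(row)):
                row[j] -= self.lr * 2 * err * x[j]
```

Repeatedly calling `update` on the same state drives its bonus toward zero, while states the predictor has never seen keep a large error, which is the signal the agent is rewarded for seeking out.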

SAC (Actor-Critic)

Implementation of the Soft Actor-Critic (SAC) algorithm.
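
SAC's critic target adds an entropy term to a clipped double-Q estimate, and its actor minimizes `alpha * log_prob - min(Q1, Q2)`. A scalar sketch, assuming the next action was sampled from the current policy and its log-probability is given. `alpha` is fixed here, although SAC variants often tune it automatically.

```python
def sac_critic_target(reward, done, q1_next, q2_next, next_log_prob, alpha=0.2, gamma=0.99):
    """y = r + gamma * (1 - done) * (min(Q1', Q2')(s', a') - alpha * log pi(a' | s'))."""
    soft_value = min(q1_next, q2_next) - alpha * next_log_prob
    return reward + gamma * (1.0 - float(done)) * soft_value


def sac_actor_loss(log_probs, q1_values, q2_values, alpha=0.2):
    """Mean of alpha * log pi(a|s) - min(Q1, Q2)(s, a) over actions sampled from the policy."""
    terms = [alpha * lp - min(q1, q2) for lp, q1, q2 in zip(log_probs, q1_values, q2_values)]
    return sum(terms) / len(terms)
```

Subtracting `alpha * log_prob` rewards the policy for staying stochastic, so exploration is built into the objective rather than added as external noise.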

TD3 (Actor-Critic)

Implementation of the Twin Delayed DDPG (TD3) algorithm.
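
TD3 makes three changes to DDPG: clipped double-Q targets, target policy smoothing, and delayed actor updates. Below is a scalar-action sketch of the first two plus the update schedule; the action bounds, noise parameters, and function names are assumptions.

```python
import random


def td3_critic_target(reward, done, next_state, target_actor, target_q1, target_q2,
                      gamma=0.99, noise_std=0.2, noise_clip=0.5,
                      act_low=-1.0, act_high=1.0, rng=random):
    """y = r + gamma * (1 - done) * min(Q1', Q2')(s', smoothed a')."""
    # Target policy smoothing: clipped Gaussian noise on the target action.
    noise = max(-noise_clip, min(noise_clip, rng.gauss(0.0, noise_std)))
    next_action = max(act_low, min(act_high, target_actor(next_state) + noise))
    # Clipped double-Q: take the smaller of the two target critics.
    q_next = min(target_q1(next_state, next_action), target_q2(next_state, next_action))
    return reward + gamma * (1.0 - float(done)) * q_next


def should_update_actor(step, policy_delay=2):
    """Delayed policy updates: the actor and target nets update every policy_delay critic steps."""
    return step % policy_delay == 0
```

Taking the minimum of two critics counters the overestimation bias that a single max-style bootstrap accumulates, and the noise keeps the critic from exploiting sharp peaks in its own estimate.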


Yuvraj Singh
