RL From Scratch
- Q-Learning (Value-Based): Tabular Q-Learning and Value Iteration, implemented from scratch as educational notebooks.
- DQN Flappy (Value-Based): DQN agent trained on Flappy Bird using pixel observations, experience replay, and epsilon-greedy exploration.
- VizDoom RL (Value-Based): DQN agent trained on VizDoom Basic via a Gymnasium wrapper, with grayscale preprocessing, a replay buffer, and W&B logging.
- GRPO (Policy-Based): Group Relative Policy Optimization, DeepSeek-R1's RL training objective, implemented from scratch.
- A2C (Actor-Critic): Implementation of the A2C reinforcement learning algorithm.
- DDPG (Actor-Critic): Implementation of the DDPG reinforcement learning algorithm.
- DQN Frozenlake (Exploration): Implementation of DQN applied to the FrozenLake environment.
- DQN Lunar (Exploration): Implementation of DQN applied to the Lunar environment.
- DQN Taxi (Exploration): Implementation of DQN applied to the Taxi environment.
- DQN Atari (Exploration): Implementation of DQN applied to Atari environments.
- DQN (Exploration): Implementation of the DQN reinforcement learning algorithm.
- Duel DQN (Exploration): Implementation of the Duel-DQN reinforcement learning algorithm.
- Flappybird PPO (Actor-Critic): Implementation of PPO applied to Flappy Bird.
- Frozen Lake (Exploration): Reinforcement learning implementation for the Frozen Lake environment.
- Imitation Learning (Imitation Learning): Implementation of an imitation learning algorithm.
- MARL (Multi-Agent): Implementation of multi-agent reinforcement learning (MARL).
- IPPO (Multi-Agent): Implementation of the IPPO reinforcement learning algorithm.
- MAPPO (Multi-Agent): Implementation of the MAPPO reinforcement learning algorithm.
- Self Play (Multi-Agent): Implementation of self-play reinforcement learning.
- PPO (Actor-Critic): Implementation of the PPO reinforcement learning algorithm.
- Atari (Actor-Critic): Reinforcement learning implementation for Atari environments.
- MuJoCo (Actor-Critic): PPO on the MuJoCo benchmark.
- REINFORCE (Actor-Critic): Implementation of the REINFORCE reinforcement learning algorithm.
- RND (Actor-Critic): Implementation of the RND reinforcement learning algorithm.
- SAC (Actor-Critic): Implementation of the SAC reinforcement learning algorithm.
- TD3 (Actor-Critic): Implementation of the TD3 reinforcement learning algorithm.
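
For readers new to the value-based entries at the top of this list, here is a minimal sketch of tabular Q-learning with epsilon-greedy exploration. It is not taken from any of the listed notebooks: the five-state chain environment, the function names (`step`, `greedy_action`, `train`), and the hyperparameters are all assumptions chosen to keep the example small.

```python
import random

N_STATES = 5      # hypothetical chain: states 0..4, state 4 is terminal
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(state, action):
    """Hypothetical toy environment: reward 1.0 only on reaching the last state."""
    if action == 0:
        next_state = max(0, state - 1)
    else:
        next_state = min(N_STATES - 1, state + 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def greedy_action(q_row, rng):
    """Pick an action with maximal Q-value, breaking ties at random."""
    best = max(q_row)
    return rng.choice([a for a in ACTIONS if q_row[a] == best])


def train(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Run tabular Q-learning and return the learned Q-table."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy: explore with probability epsilon, else act greedily.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = greedy_action(q[state], rng)
            next_state, reward, done = step(state, action)
            # Q-learning target: bootstrap from the best next action unless terminal.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q_table = train()
# Greedy policy for the non-terminal states.
policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)]
print(policy)  # [1, 1, 1, 1]
```

The learned greedy policy moves right in every non-terminal state, which is the shortest path to the reward on this chain. The DQN entries replace the table with a neural network and sample updates from a replay buffer, but the bootstrapped target has the same form.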