# DQN Flappy

## Overview
A DQN agent trained on the Flappy Bird game environment from raw pixel observations. The agent learns to navigate through pipes by choosing to flap or do nothing at each timestep.
## Algorithm
Standard DQN with:
- Experience replay: transitions are stored in a replay buffer and sampled uniformly at random, breaking the temporal correlation between consecutive updates
- Target network: a separate, periodically synced copy of the Q-network computes the TD targets, stabilizing training
- Epsilon-greedy exploration: epsilon decays from 1.0 to a small minimum over training, shifting from exploration to exploitation
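The replay buffer and the epsilon schedule can be sketched in a few lines of plain Python. This is a minimal illustration, not the repository's code; the capacity and decay constants below are assumptions, not values from this project.

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size FIFO buffer; uniform random sampling breaks correlation."""
    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)  # oldest transitions evicted first

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)

def epsilon_at(step, eps_start=1.0, eps_min=0.05, decay_steps=100_000):
    """Linear decay from eps_start to eps_min; schedule values are illustrative."""
    frac = min(step / decay_steps, 1.0)
    return eps_start + frac * (eps_min - eps_start)
```

At each environment step the agent flaps with probability `epsilon_at(step)` chosen at random, and otherwise acts greedily with respect to the current Q-network.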
### Update Rule
`y = r + gamma * max_a' Q_target(s', a')` (the bootstrap term is dropped for terminal transitions, so `y = r` when the episode ends)

`Loss = MSE(Q(s, a), y)`
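A numerical sketch of this update on a batch, using NumPy in place of the actual deep-learning framework. The function names and shapes here are assumptions for illustration only.

```python
import numpy as np

def td_targets(rewards, next_q_values, dones, gamma=0.99):
    """y = r + gamma * max_a' Q_target(s', a'), with the bootstrap
    term zeroed for terminal transitions (done = 1)."""
    return rewards + gamma * next_q_values.max(axis=1) * (1.0 - dones)

def mse_loss(q_sa, targets):
    """Mean squared error between predicted Q(s, a) and the TD targets."""
    return float(np.mean((q_sa - targets) ** 2))
```

In training, `next_q_values` comes from the frozen target network, while `q_sa` is the online network's estimate for the actions actually taken; only the online network receives gradients.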
## Features
- Convolutional Q-network processing raw game frames
- Frame preprocessing (grayscale, resize, normalize)
- W&B experiment tracking
- Video recording of agent performance
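The frame-preprocessing step above can be sketched with NumPy alone. This is an assumption-laden stand-in: the output size, block-average downsampling, and [0, 1] scaling are illustrative, and the real pipeline may use a proper resize (e.g. OpenCV) instead.

```python
import numpy as np

def preprocess(frame, out_size=84):
    """Grayscale -> downsample by block averaging -> scale to [0, 1].
    Assumes an RGB uint8 frame; crops so each side is a multiple of out_size."""
    gray = frame.mean(axis=2)                      # RGB -> grayscale
    h, w = gray.shape
    bh, bw = h // out_size, w // out_size
    gray = gray[: bh * out_size, : bw * out_size]  # crop to exact multiple
    small = gray.reshape(out_size, bh, out_size, bw).mean(axis=(1, 3))
    return (small / 255.0).astype(np.float32)      # normalize to [0, 1]
```

The resulting float frames are what the convolutional Q-network consumes, typically stacked over a few consecutive timesteps so velocity is observable.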
## References
- Mnih et al., *Playing Atari with Deep Reinforcement Learning*, DeepMind, 2013