# DQN Flappy

## Overview
A DQN agent trained on the Flappy Bird game environment from raw pixel observations. The agent learns to navigate through pipes by choosing to flap or do nothing at each timestep.
## Algorithm
Standard DQN with:
- Experience replay: transitions are stored in a replay buffer and sampled uniformly at random, breaking the temporal correlation between consecutive updates
- Target network: a separate, periodically synced copy of the Q-network computes the TD targets, stabilizing training
- Epsilon-greedy exploration: epsilon decays from 1.0 to a small minimum over training, shifting from exploration to exploitation
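The replay buffer and the epsilon schedule can be sketched in a few lines of plain Python. This is a minimal illustration, not the repository's code; the capacity and decay constants below are assumptions, not values from this project.

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size FIFO buffer; uniform random sampling breaks correlation."""
    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)  # oldest transitions evicted first

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)

def epsilon_at(step, eps_start=1.0, eps_min=0.05, decay_steps=100_000):
    """Linear decay from eps_start to eps_min; schedule values are illustrative."""
    frac = min(step / decay_steps, 1.0)
    return eps_start + frac * (eps_min - eps_start)
```

At each environment step the agent flaps with probability `epsilon_at(step)` chosen at random, and otherwise acts greedily with respect to the current Q-network.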
### Update Rule
`y = r + gamma * max_a' Q_target(s', a')` (the bootstrap term is dropped for terminal transitions, so `y = r` when the episode ends)

`Loss = MSE(Q(s, a), y)`
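A numerical sketch of this update on a batch, using NumPy in place of the actual deep-learning framework. The function names and shapes here are assumptions for illustration only.

```python
import numpy as np

def td_targets(rewards, next_q_values, dones, gamma=0.99):
    """y = r + gamma * max_a' Q_target(s', a'), with the bootstrap
    term zeroed for terminal transitions (done = 1)."""
    return rewards + gamma * next_q_values.max(axis=1) * (1.0 - dones)

def mse_loss(q_sa, targets):
    """Mean squared error between predicted Q(s, a) and the TD targets."""
    return float(np.mean((q_sa - targets) ** 2))
```

In training, `next_q_values` comes from the frozen target network, while `q_sa` is the online network's estimate for the actions actually taken; only the online network receives gradients.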
## Features
- Convolutional Q-network processing raw game frames
- Frame preprocessing (grayscale, resize, normalize)
- W&B experiment tracking
- Video recording of agent performance
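The frame-preprocessing step above can be sketched with NumPy alone. This is an assumption-laden stand-in: the output size, block-average downsampling, and [0, 1] scaling are illustrative, and the real pipeline may use a proper resize (e.g. OpenCV) instead.

```python
import numpy as np

def preprocess(frame, out_size=84):
    """Grayscale -> downsample by block averaging -> scale to [0, 1].
    Assumes an RGB uint8 frame; crops so each side is a multiple of out_size."""
    gray = frame.mean(axis=2)                      # RGB -> grayscale
    h, w = gray.shape
    bh, bw = h // out_size, w // out_size
    gray = gray[: bh * out_size, : bw * out_size]  # crop to exact multiple
    small = gray.reshape(out_size, bh, out_size, bw).mean(axis=(1, 3))
    return (small / 255.0).astype(np.float32)      # normalize to [0, 1]
```

The resulting float frames are what the convolutional Q-network consumes, typically stacked over a few consecutive timesteps so velocity is observable.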
## References
- Mnih et al., *Playing Atari with Deep Reinforcement Learning*, DeepMind, 2013