VizDoom RL

Value-Based PyTorch VizDoom

Overview

DQN agent trained on the VizDoom Basic scenario (VizdoomBasic-v0) via VizDoom's Gymnasium wrapper. From raw pixel observations, the agent learns to line up and shoot a monster in a 3D first-person environment.

Architecture

A 3-layer CNN followed by two fully connected layers and a linear output head:

Layer    Spec
Conv 1   32 filters, 8×8, stride 4, ReLU
Conv 2   32 filters, 4×4, stride 2, ReLU
Conv 3   64 filters, 3×3, stride 3, ReLU
FC 1     512 units, ReLU
FC 2     512 units, ReLU
Output   action-space dim (one Q-value per action)
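The layer table translates to roughly the following PyTorch module. This is a sketch, not the repo's code: the class name `QNetwork` is illustrative, and the 1×128×128 grayscale input shape is taken from the preprocessing section below.

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Sketch of the Q-network above for a 1x128x128 grayscale input."""

    def __init__(self, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=8, stride=4), nn.ReLU(),   # 128 -> 31
            nn.Conv2d(32, 32, kernel_size=4, stride=2), nn.ReLU(),  # 31 -> 14
            nn.Conv2d(32, 64, kernel_size=3, stride=3), nn.ReLU(),  # 14 -> 4
            nn.Flatten(),                                           # 64 * 4 * 4 = 1024
            nn.Linear(64 * 4 * 4, 512), nn.ReLU(),                  # FC 1
            nn.Linear(512, 512), nn.ReLU(),                         # FC 2
            nn.Linear(512, n_actions),                              # one Q-value per action
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)
```

With these kernel/stride choices the spatial size shrinks 128 → 31 → 14 → 4, giving a 1024-dim flattened feature vector before the FC layers.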

Image Preprocessing

  • RGB → grayscale, channel-first
  • Resize to 128×128
  • Normalize to [0, 1]
  • Handles dict observations (obs['screen'])
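The steps above can be sketched as a single function. The function name `preprocess` is illustrative, and the nearest-neighbour resize is a NumPy-only stand-in for whatever resize routine (e.g. OpenCV's) the repo actually uses:

```python
import numpy as np

def preprocess(obs, size: int = 128) -> np.ndarray:
    """RGB frame (H, W, 3) -> normalized grayscale tensor (1, size, size)."""
    # Handle dict observations from the VizDoom Gymnasium wrapper
    if isinstance(obs, dict):
        obs = obs["screen"]
    # RGB -> grayscale via standard luminance weights
    gray = obs @ np.array([0.299, 0.587, 0.114])
    # Nearest-neighbour resize to size x size
    h, w = gray.shape
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    gray = gray[rows][:, cols]
    # Normalize to [0, 1] and add a leading channel axis (channel-first)
    return (gray / 255.0).astype(np.float32)[None, :, :]
```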

Training Config

Hyperparameter        Value
Total timesteps       1,000,000
Learning rate         2e-4
Buffer size           30,000
Batch size            128
Gamma                 0.99
Epsilon start/end     1.0 → 0.05
Exploration fraction  0.5
Target update freq    50 steps
Optimizer             Adam
Loss                  MSE
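With an exploration fraction of 0.5, epsilon decays linearly from 1.0 to 0.05 over the first half of the 1,000,000 timesteps and stays at 0.05 afterwards. A sketch of that schedule (the function name is illustrative):

```python
def epsilon(step: int, total: int = 1_000_000,
            start: float = 1.0, end: float = 0.05,
            fraction: float = 0.5) -> float:
    """Linear epsilon decay over the first `fraction` of training."""
    progress = min(step / (fraction * total), 1.0)
    return start + progress * (end - start)
```

For example, at step 250,000 (halfway through the decay window) epsilon is 0.525; from step 500,000 on it is pinned at 0.05.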

Features

  • Target network with Polyak averaging (soft updates controlled by tau)
  • SB3 replay buffer for efficient sampling
  • W&B logging (episodic return, Q-values, epsilon)
  • Periodic evaluation with video export
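The Polyak-averaged target update blends a fraction tau of the online network's weights into the target network each update. A minimal sketch (the tau value of 0.005 used in the test below is illustrative, not taken from the repo):

```python
import torch

@torch.no_grad()
def polyak_update(online: torch.nn.Module, target: torch.nn.Module,
                  tau: float = 0.005) -> None:
    """theta_target <- tau * theta_online + (1 - tau) * theta_target."""
    for p, p_t in zip(online.parameters(), target.parameters()):
        p_t.mul_(1.0 - tau).add_(tau * p)
```

With the table's "target update freq" of 50 steps, this soft update would be applied once every 50 environment steps rather than copying weights wholesale.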

References

Playing Atari with Deep Reinforcement Learning — Mnih et al., DeepMind 2013