Frozen Lake
Implementation of reinforcement learning algorithms for the Frozen Lake environment
Technical Details
- Framework: PyTorch
- Environment: FrozenLake-v1
- Category: Other
This project implements reinforcement learning algorithms for the Frozen Lake environment from Gymnasium (the maintained successor to OpenAI Gym). The agent learns to navigate across a frozen lake from the start tile to the goal tile without falling into holes.
Environment Description
FrozenLake-v1 is a grid-world environment where:
- The agent navigates on a frozen lake from start (S) to goal (G)
- Some tiles are frozen (F) and safe to walk on
- Some tiles have holes (H) and the agent falls if it steps on them
- The ice is slippery, so the agent’s movement can be stochastic
Example 4x4 map:

```
SFFF
FHFH
FFFH
HFFG
```
- State space: Discrete with 16 states (for 4x4 grid) or 64 states (for 8x8 grid)
- Action space: 4 discrete actions (LEFT, DOWN, RIGHT, UP)
- Rewards: +1 for reaching the goal, 0 otherwise
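
For reference, here is a minimal interaction loop against this environment using the standard Gymnasium API (the random agent is purely illustrative):

```python
import gymnasium as gym

# Slippery 4x4 FrozenLake; set is_slippery=False for deterministic moves
env = gym.make("FrozenLake-v1", map_name="4x4", is_slippery=True)

state, info = env.reset(seed=42)
done = False
while not done:
    action = env.action_space.sample()  # random policy, illustration only
    state, reward, terminated, truncated, info = env.step(action)
    done = terminated or truncated
env.close()
```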
Algorithms Implemented
This project includes implementations of:
- Q-Learning: A model-free, off-policy tabular algorithm (update rule sketched below)
- Deep Q-Network (DQN): Q-learning with a neural network approximating the action-value function
- Double DQN: Reduces overestimation bias by using two networks
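
The tabular Q-Learning variant boils down to one update per transition. A minimal sketch (the names here are illustrative, not the repository's exact code):

```python
import numpy as np

n_states, n_actions = 16, 4          # 4x4 map
Q = np.zeros((n_states, n_actions))  # tabular action-value estimates

def q_update(s, a, r, s_next, lr=0.1, gamma=0.99):
    """Off-policy TD(0) update toward the greedy bootstrap target."""
    td_target = r + gamma * np.max(Q[s_next])
    Q[s, a] += lr * (td_target - Q[s, a])
```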
Features
- Multiple map sizes (4x4 and 8x8)
- Option for deterministic or stochastic (slippery) environments
- Exploration vs. exploitation control with an epsilon-greedy strategy (sketched after this list)
- Visualization of learned policies
- Tracking of training metrics
- Integration with TensorBoard and WandB
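
The epsilon-greedy control mentioned above fits in a few lines; a sketch assuming a tabular Q (the project's actual helper may differ):

```python
import numpy as np

rng = np.random.default_rng(0)

def epsilon_greedy(Q, state, epsilon):
    """With probability epsilon explore; otherwise act greedily on Q."""
    if rng.random() < epsilon:
        return int(rng.integers(Q.shape[1]))  # uniform random action
    return int(np.argmax(Q[state]))           # greedy action
```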
Getting Started
Installation
```bash
pip install torch gymnasium numpy matplotlib tqdm tensorboard wandb
```
Running the Algorithms
```bash
# For tabular Q-Learning
python q_learning.py

# For DQN
python dqn.py

# For visualization of the learned policy
python visualize_policy.py
```
Configuration
Key hyperparameters can be modified in the Config class:
```python
class Config:
    # Environment settings
    env_id = "FrozenLake-v1"
    map_size = "4x4"        # or "8x8"
    is_slippery = True

    # Algorithm parameters
    learning_rate = 0.1     # for Q-Learning
    gamma = 0.99            # discount factor
    epsilon_start = 1.0
    epsilon_end = 0.01
    epsilon_decay = 0.995

    # Training parameters
    total_episodes = 10000
    max_steps = 100

    # For DQN
    buffer_size = 10000
    batch_size = 64
    target_update = 100

    # Logging
    use_wandb = True
    log_interval = 100
```
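
For the DQN variants, the discrete state index must become a network input. One common choice, assumed here rather than taken from the repository, is a one-hot encoding fed to a small MLP:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class QNetwork(nn.Module):
    """Small MLP mapping a one-hot state index to one Q-value per action."""
    def __init__(self, n_states=16, n_actions=4, hidden=64):
        super().__init__()
        self.n_states = n_states
        self.net = nn.Sequential(
            nn.Linear(n_states, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, state_idx):
        # state_idx: LongTensor of shape (batch,)
        x = F.one_hot(state_idx, num_classes=self.n_states).float()
        return self.net(x)
```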
Results
The algorithms learn efficient policies for navigating the Frozen Lake:
- Q-Learning: Converges to an optimal policy after ~5000 episodes on the 4x4 map
- DQN: Learns good policies but can be less sample-efficient than the tabular method in this simple environment
- Double DQN: Provides more stable learning, especially on the 8x8 map
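
Double DQN's extra stability comes from decoupling action selection (online network) from action evaluation (target network). A sketch of the target computation under that standard formulation:

```python
import torch

@torch.no_grad()
def double_dqn_target(online_net, target_net, rewards, next_states, dones, gamma=0.99):
    """Online net picks the argmax action; target net scores it."""
    next_actions = online_net(next_states).argmax(dim=1, keepdim=True)
    next_q = target_net(next_states).gather(1, next_actions).squeeze(1)
    return rewards + gamma * next_q * (1.0 - dones.float())
```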
Visualization
The project includes tools to visualize:
- Learning curves
- Value functions
- Optimal policies
- Step-by-step agent behavior
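
For a quick look at a learned tabular policy, printing the greedy action per tile as an arrow works well. A sketch assuming the 4x4 layout and the LEFT/DOWN/RIGHT/UP action encoding:

```python
import numpy as np

ARROWS = ["←", "↓", "→", "↑"]  # actions 0..3: LEFT, DOWN, RIGHT, UP

def print_policy(Q, n_rows=4, n_cols=4):
    """Print the greedy action for each grid cell."""
    greedy = np.argmax(Q, axis=1).reshape(n_rows, n_cols)
    for row in greedy:
        print(" ".join(ARROWS[a] for a in row))
```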
Challenges
- Sparse Rewards: The only nonzero reward comes at the goal, which makes credit assignment difficult
- Stochasticity: The slippery environment introduces randomness in transitions
- Exploration: Finding the goal in larger environments requires efficient exploration
License
MIT License
Source Code
📁 GitHub Repository: Frozen Lake
View the complete implementation, training scripts, and documentation on GitHub.