NeatRL: Deep Reinforcement Learning Algorithms Library
Deep Reinforcement Learning Projects
This repository contains single-file implementations of various deep reinforcement learning algorithms.
🚀 Primary Use: Training with NeatRL Library
NeatRL is the main training library in this repository. It provides high-quality implementations of popular RL algorithms with a focus on simplicity, performance, and ease of use.
Quick Training with NeatRL
# Install NeatRL
pip install "neatrl[classic,box2d,atari]"
# Train DQN on CartPole in a few lines
from neatrl import train_dqn

model = train_dqn(
    env_id="CartPole-v1",
    total_timesteps=10000,
    seed=42,
)
Advanced Training Features
- Experiment Tracking: Built-in Weights & Biases integration (see the sketch after this list)
- Video Recording: Automatic training progress videos
- Hyperparameter Tuning: Easy configuration of all training parameters
- Multiple Environments: Support for Gymnasium environments
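As a rough illustration of how experiment tracking might be wired up, the sketch below wraps the quick-start call in a standard Weights & Biases run. The project name and logged metric are placeholders, and this uses the plain wandb API rather than a confirmed NeatRL integration surface.

import wandb

from neatrl import train_dqn

# Project name is a placeholder; pick your own.
config = {"env_id": "CartPole-v1", "total_timesteps": 10000, "seed": 42}
run = wandb.init(project="neatrl-experiments", config=config)

model = train_dqn(**config)

# Log whatever summary metrics you compute after training.
run.log({"training_finished": 1})
run.finish()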
📖 Complete NeatRL Documentation
Project Structure
NeatRL Library (Primary Training Tool)
- neatrl/: Main NeatRL library with DQN implementation and training utilities
Additional Algorithm Implementations
- DQN: Deep Q-Network implementation for CartPole and LunarLander environments
- DQN-atari: DQN adapted for Atari games with convolutional networks
- DQN-flappy: DQN implementation for the FlappyBird environment
- DQN-Lunar: DQN specifically tuned for the Lunar Lander environment
- DQN-Taxi: DQN for the discrete Taxi-v3 environment
- DQN-FrozenLake: DQN implementation for the FrozenLake environment
- Duel-DQN: Dueling DQN with separate value and advantage streams for CliffWalking
- Q-Learning: Classic tabular Q-learning implementations
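For reference, the tabular update at the heart of the classic Q-learning implementations fits in a few lines. A minimal sketch against Gymnasium's discrete Taxi-v3 environment; the hyperparameters are assumptions, not this repo's settings:

import numpy as np
import gymnasium as gym

env = gym.make("Taxi-v3")
Q = np.zeros((env.observation_space.n, env.action_space.n))
alpha, gamma, epsilon = 0.1, 0.99, 0.1  # assumed values

for episode in range(1000):
    state, _ = env.reset()
    done = False
    while not done:
        # Epsilon-greedy action selection.
        if np.random.rand() < epsilon:
            action = env.action_space.sample()
        else:
            action = int(np.argmax(Q[state]))
        next_state, reward, terminated, truncated, _ = env.step(action)
        done = terminated or truncated
        # One-step update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        target = reward + gamma * np.max(Q[next_state]) * (not terminated)
        Q[state, action] += alpha * (target - Q[state, action])
        state = next_state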
Policy-Based Methods
- REINFORCE: Monte Carlo policy gradient method for the CartPole environment (sketched after this list)
- A2C: Advantage Actor-Critic implementation for multiple environments (CartPole, FrozenLake, LunarLander)
- PPO: Proximal Policy Optimization with clipped surrogate objective for LunarLander
- FlappyBird-PPO: PPO implementation tailored to the FlappyBird environment
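The REINFORCE entry above rests on a single loss: the negative log-probability of each action weighted by its discounted return. A minimal PyTorch sketch, assuming the policy exposes per-step log-probabilities; the normalization trick is a common variance reducer, not necessarily what this repo does:

import torch

def reinforce_loss(log_probs, rewards, gamma=0.99):
    """log_probs: list of log pi(a_t|s_t) tensors; rewards: list of floats."""
    returns, g = [], 0.0
    for r in reversed(rewards):  # accumulate discounted returns G_t
        g = r + gamma * g
        returns.append(g)
    returns = torch.tensor(list(reversed(returns)), dtype=torch.float32)
    # Normalizing returns reduces gradient variance.
    returns = (returns - returns.mean()) / (returns.std() + 1e-8)
    # Gradient ascent on E[log pi * G] == descent on its negation.
    return -(torch.stack(log_probs) * returns).sum()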
Actor-Critic Methods (Continuous Control)
- DDPG: Deep Deterministic Policy Gradient for continuous action spaces (Pendulum, BipedalWalker)
- TD3: Twin Delayed DDPG with twin critics and delayed policy updates (target computation sketched after this list)
- SAC: Soft Actor-Critic with maximum entropy reinforcement learning
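The TD3 entry above combines twin critics, target-policy smoothing, and delayed actor updates. A minimal sketch of the target computation only, with illustrative names and hyperparameters:

import torch

def td3_target(reward, next_obs, done, actor_t, critic1_t, critic2_t,
               gamma=0.99, noise_std=0.2, noise_clip=0.5):
    with torch.no_grad():
        # Target-policy smoothing: add clipped Gaussian noise to the target action.
        noise = (torch.randn_like(actor_t(next_obs)) * noise_std).clamp(-noise_clip, noise_clip)
        next_action = (actor_t(next_obs) + noise).clamp(-1.0, 1.0)
        # Twin critics: the minimum reduces overestimation bias.
        q_next = torch.min(critic1_t(next_obs, next_action),
                           critic2_t(next_obs, next_action))
        return reward + gamma * (1.0 - done) * q_next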
Exploration & Advanced Methods
- RND: Random Network Distillation combined with PPO for curiosity-driven exploration (sketched below)
- NeatRL: NEAT (NeuroEvolution of Augmenting Topologies) reinforcement learning implementations
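RND's intrinsic reward is simply the prediction error of a trained network against a frozen, randomly initialized one: novel states are poorly predicted and so earn a large bonus. A minimal sketch with assumed layer sizes:

import torch
import torch.nn as nn

obs_dim, feat_dim = 8, 64  # assumed dimensions
target = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(), nn.Linear(128, feat_dim))
predictor = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(), nn.Linear(128, feat_dim))
for p in target.parameters():
    p.requires_grad_(False)  # the target network is never trained

def intrinsic_reward(obs):
    # Prediction error doubles as the predictor's training loss.
    return (predictor(obs) - target(obs)).pow(2).mean(dim=-1)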
Game-Specific Implementations
- Pong: Classic Pong environment implementations
- VizDoom-RL: Reinforcement learning in VizDoom 3D environments
- Frozen-Lake: Specialized implementations for the FrozenLake environment
- SimpleRLGames: Collection of simple RL game implementations
Unity ML-Agents
- ml-agents: Unity ML-Agents toolkit for training intelligent agents in Unity environments
- ml-agents-train: Training scripts and utilities for Unity ML-Agents
Key Features
- Comprehensive Algorithm Coverage: Implementations spanning value-based (DQN variants), policy-based (REINFORCE, A2C, PPO), and actor-critic methods (DDPG, TD3, SAC)
- Multiple Environment Support: Code for various Gymnasium/OpenAI Gym environments including discrete and continuous action spaces
- Advanced Techniques: Experience replay, target networks, dueling architectures, curiosity-driven exploration (RND)
- Continuous Control: Specialized implementations for continuous action spaces with advanced algorithms
- Visualization & Logging: Integration with TensorBoard and Weights & Biases (WandB) for comprehensive experiment tracking
- Game-Specific Optimizations: Tailored implementations for specific games and environments
- Unity Integration: ML-Agents support for training in Unity environments
- Trained Models: Saved model weights and training logs for reproducible results
- Comprehensive Logging: Track metrics like Q-values, advantages, episode returns, and exploration statistics
Reinforcement Learning Concepts
This repository explores comprehensive RL concepts across different paradigms:
Value-Based Methods
- Deep Q-Networks (DQN): Neural network function approximation for Q-values
- Experience Replay: Store and reuse past experiences for stable learning
- Target Networks: Stabilize training by reducing correlation between updates (see the sketch after this list)
- Dueling Networks: Separate value and advantage estimation for better learning
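A minimal sketch of how a replay batch and a target network combine into the DQN loss; names and hyperparameters are illustrative, not this repo's code:

import torch

def dqn_loss(q_net, target_net, batch, gamma=0.99):
    obs, actions, rewards, next_obs, dones = batch  # sampled from the replay buffer
    # Q-values of the actions actually taken.
    q = q_net(obs).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        # The target network decouples the bootstrap target from the online net.
        q_next = target_net(next_obs).max(dim=1).values
        target = rewards + gamma * (1.0 - dones) * q_next
    return torch.nn.functional.mse_loss(q, target)

# Periodically sync: target_net.load_state_dict(q_net.state_dict())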
Policy-Based Methods
- Policy Gradient (REINFORCE): Direct policy optimization using Monte Carlo returns
- Actor-Critic Methods: Combine policy gradients with value function estimation
- Advantage Functions: Reduce variance in policy gradient estimates
- Proximal Policy Optimization: Stable policy updates with clipped objectives
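The clipped surrogate objective that keeps PPO updates stable fits in a few lines; epsilon = 0.2 is the common default, not necessarily this repository's setting:

import torch

def ppo_clip_loss(new_log_probs, old_log_probs, advantages, epsilon=0.2):
    ratio = (new_log_probs - old_log_probs).exp()  # pi_new / pi_old
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - epsilon, 1 + epsilon) * advantages
    # Pessimistic bound: take the smaller objective, then negate for gradient descent.
    return -torch.min(unclipped, clipped).mean()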
Continuous Control
- Deterministic Policy Gradients: Handle continuous action spaces efficiently
- Twin Critics: Reduce overestimation bias in Q-value estimation
- Soft Actor-Critic: Maximum entropy reinforcement learning for robust policies (target computation sketched after this list)
- Noise Injection: Exploration strategies for continuous action spaces
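A minimal sketch of SAC's entropy-regularized critic target, assuming a policy object with a sample method returning an action and its log-probability (that API is an assumption for illustration):

import torch

def sac_target(reward, next_obs, done, policy, q1_t, q2_t,
               gamma=0.99, alpha=0.2):
    with torch.no_grad():
        next_action, log_prob = policy.sample(next_obs)  # assumed policy API
        q_next = torch.min(q1_t(next_obs, next_action),
                           q2_t(next_obs, next_action))
        # Subtracting alpha * log pi rewards high-entropy (exploratory) policies.
        soft_v = q_next - alpha * log_prob
        return reward + gamma * (1.0 - done) * soft_v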
Advanced Techniques
- Curiosity-Driven Learning: Intrinsic motivation through prediction error (RND)
- Multi-Environment Training: Consistent algorithms across different domains
- Exploration vs. Exploitation: Various strategies including epsilon-greedy and entropy bonuses (epsilon schedule sketched after this list)
- Unity Integration: Real-time training in complex 3D environments
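As a concrete example of an exploration schedule, here is a linear epsilon anneal of the kind commonly paired with DQN-style agents; the endpoints and horizon are assumptions:

def linear_epsilon(step, start=1.0, end=0.05, decay_steps=50000):
    """Anneal epsilon from `start` to `end` over `decay_steps` steps."""
    fraction = min(step / decay_steps, 1.0)
    return start + fraction * (end - start)

# Epsilon at a few points of training: 1.0, 0.525, 0.05, 0.05
for step in (0, 25000, 50000, 100000):
    print(step, linear_epsilon(step))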
Results
Each implementation includes trained models and performance visualizations. Check the individual project READMEs for specific results.
Extending the Projects
Ideas for extensions and improvements:
Algorithm Enhancements
- Implement Rainbow DQN with all improvements (prioritized replay, noisy nets, etc.)
- Add Double DQN and other DQN variants
- Implement advanced policy gradient methods (TRPO, IMPALA)
- Add multi-agent reinforcement learning (MADDPG, QMIX)
Architecture Improvements
- Experiment with different neural network architectures (CNNs, RNNs, Transformers)
- Implement attention mechanisms for partially observable environments
- Add hierarchical reinforcement learning approaches
- Explore meta-learning and few-shot adaptation
Environment Extensions
- Apply algorithms to custom environments and real-world problems
- Implement curriculum learning for complex environments
- Add support for partial observability and memory-based agents
- Create multi-task learning setups
Training Enhancements
- Implement distributed training across multiple GPUs/machines
- Add hyperparameter optimization and automated tuning
- Implement model-based reinforcement learning approaches
- Add imitation learning and learning from human feedback
References
- Sutton & Barto RL Book
- DQN Paper (Mnih et al.)
- Gymnasium Documentation
- PyTorch Documentation
- Stable Baselines3
Citation
If you use this repository in your research, please cite it as:
@misc{singh2025deep-rl-projects,
  author = {YuvrajSingh-mist},
  title = {Deep Reinforcement Learning Algorithms Implementations},
  year = {2025},
  howpublished = {GitHub repository},
  url = {https://github.com/YuvrajSingh-mist/NeatRL},
  note = {commit 477ff21}
}
License
MIT License