SAC
Implementation of the Soft Actor-Critic (SAC) reinforcement learning algorithm.
Technical Details
- Framework: PyTorch
- Environment: Gymnasium
- Category: Actor-Critic Methods
This directory contains implementations of the Soft Actor-Critic (SAC) algorithm for various continuous control environments.
Overview
SAC is an off-policy actor-critic algorithm for continuous action spaces that optimizes a stochastic policy within the maximum entropy framework. It incorporates several key features (the underlying objective is sketched after this list):
- Maximum Entropy Reinforcement Learning: Encourages exploration by maximizing the policy entropy along with the expected return.
- Actor-Critic Architecture: Uses a critic to estimate the Q-values and an actor to learn the policy.
- Off-Policy Learning: Can learn from previously collected data, making it sample-efficient.
- Soft Policy Updates: Uses soft updates of the target networks to improve stability.
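Concretely, the maximum-entropy objective that SAC optimizes can be written as below, where alpha is the temperature coefficient weighting the entropy bonus and rho_pi is the state-action distribution induced by the policy (standard SAC notation, not tied to identifiers in this codebase):

$$
J(\pi) = \sum_{t} \mathbb{E}_{(s_t, a_t) \sim \rho_\pi} \Big[ r(s_t, a_t) + \alpha \, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \Big]
$$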
Key features of this implementation:
- Entropy-regularized reinforcement learning
- Actor-Critic architecture with automatic temperature tuning
- Experience replay buffer for stable learning
- Soft target network updates using Polyak averaging (sketched below, together with the temperature tuning)
- Stochastic policy for better exploration
- Support for different continuous control environments
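A minimal sketch of two of these features, soft (Polyak) target updates and automatic temperature tuning, is shown below. The variable names (online_net, target_net, log_alpha, target_entropy) are illustrative and may not match the actual code in this repository.

```python
import torch

# Soft (Polyak) update of the target network: theta_target <- tau*theta + (1 - tau)*theta_target
def soft_update(online_net, target_net, tau):
    for p, p_targ in zip(online_net.parameters(), target_net.parameters()):
        p_targ.data.copy_(tau * p.data + (1.0 - tau) * p_targ.data)

# Automatic temperature (alpha) tuning: adjust log_alpha so the policy entropy
# stays near a target entropy (commonly the negative action dimension).
log_alpha = torch.zeros(1, requires_grad=True)
alpha_optimizer = torch.optim.Adam([log_alpha], lr=3e-4)

def update_alpha(log_probs, target_entropy):
    # log_probs: log pi(a|s) for actions sampled from the current policy
    alpha_loss = -(log_alpha.exp() * (log_probs + target_entropy).detach()).mean()
    alpha_optimizer.zero_grad()
    alpha_loss.backward()
    alpha_optimizer.step()
    return log_alpha.exp().item()
```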
Environments
This implementation includes support for the following environments (a minimal environment-creation example follows the list):
- Pendulum-v1: A classic control problem where the goal is to balance a pendulum in an upright position.
- BipedalWalker-v3: A more challenging environment where a 2D biped robot must walk forward without falling.
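For reference, creating either environment with Gymnasium looks like the following; the environment IDs come from the list above, while the interaction loop is only an illustration of the Gymnasium API, not code from this repository.

```python
import gymnasium as gym

# Create one of the supported environments by its Gymnasium ID.
env = gym.make("Pendulum-v1")          # or "BipedalWalker-v3"

obs, info = env.reset(seed=0)
action = env.action_space.sample()     # random continuous action
obs, reward, terminated, truncated, info = env.step(action)
env.close()
```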
Configuration
Each implementation includes a Config class that specifies the hyperparameters for training. You can modify these parameters to experiment with different settings (a sketch of such a class follows the list):
- exp_name: Name of the experiment
- seed: Random seed for reproducibility
- env_id: ID of the Gymnasium environment
- total_timesteps: Total number of training steps
- learning_rate: Learning rate for the optimizer
- buffer_size: Size of the replay buffer
- gamma: Discount factor
- tau: Soft update coefficient for target networks
- batch_size: Batch size for training
- exploration_fraction: Fraction of total timesteps for exploration
- learning_starts: Number of timesteps before learning starts
- train_frequency: Frequency of updates to the networks
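A Config class with these fields might look roughly like the sketch below; the defaults shown are common SAC choices and may differ from the values used in the repository.

```python
from dataclasses import dataclass

@dataclass
class Config:
    # Hypothetical defaults; the values in the repository may differ.
    exp_name: str = "sac_pendulum"
    seed: int = 1
    env_id: str = "Pendulum-v1"
    total_timesteps: int = 1_000_000
    learning_rate: float = 3e-4
    buffer_size: int = 1_000_000
    gamma: float = 0.99
    tau: float = 0.005
    batch_size: int = 256
    exploration_fraction: float = 0.1
    learning_starts: int = 5_000
    train_frequency: int = 1
```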
Architecture
The SAC implementation includes the following components (the actor is sketched after this list):
- Actor Network (Policy): Outputs a mean and log standard deviation for each action dimension, defining a Gaussian distribution over actions.
- Twin Critic Networks: Two separate Q-value networks to mitigate overestimation bias.
- Temperature Parameter (Alpha): Automatically adjusted to maintain a target entropy level.
- Target Networks: Slowly updated copies of the critic networks for stability.
- Replay Buffer: Stores and samples transitions for training.
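The actor described above can be sketched as a squashed-Gaussian policy in PyTorch. Layer sizes, the log-std clipping range, and the class name here are common choices used for illustration, not necessarily those in this repository.

```python
import torch
import torch.nn as nn

LOG_STD_MIN, LOG_STD_MAX = -5.0, 2.0  # common clipping range for the log std

class Actor(nn.Module):
    """Gaussian policy with a tanh squashing function (sketch)."""
    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.mean = nn.Linear(hidden, act_dim)
        self.log_std = nn.Linear(hidden, act_dim)

    def forward(self, obs):
        h = self.net(obs)
        mean = self.mean(h)
        log_std = torch.clamp(self.log_std(h), LOG_STD_MIN, LOG_STD_MAX)
        dist = torch.distributions.Normal(mean, log_std.exp())
        x = dist.rsample()                      # reparameterized sample
        action = torch.tanh(x)                  # squash into [-1, 1]
        # Change-of-variables correction for the tanh squashing
        log_prob = dist.log_prob(x) - torch.log(1 - action.pow(2) + 1e-6)
        return action, log_prob.sum(-1, keepdim=True)
```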
Key Advantages of SAC
SAC offers several advantages over other continuous control algorithms:
- Sample Efficiency: Off-policy learning allows SAC to reuse past experiences.
- Stability: The entropy term and soft updates help stabilize training.
- Exploration-Exploitation Balance: The maximum entropy framework naturally balances exploration and exploitation.
- Performance: SAC has shown state-of-the-art performance across many continuous control tasks.
- Robustness: Less sensitive to hyperparameter tuning compared to other algorithms.
Logging and Monitoring
Training progress is logged using the tools below (a minimal setup is sketched after the list):
- TensorBoard: Local visualization of training metrics
- Weights & Biases (WandB): Cloud-based experiment tracking (optional)
- Video Capture: Records videos of agent performance at intervals
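A minimal version of this logging setup could look like the following; the run name, log directories, project name, and recording interval are placeholders, not values taken from the repository.

```python
import gymnasium as gym
import wandb
from torch.utils.tensorboard import SummaryWriter

run_name = "sac_Pendulum-v1_seed1"  # placeholder run name

# TensorBoard: local logging of training metrics
writer = SummaryWriter(f"runs/{run_name}")
writer.add_scalar("charts/episodic_return", 123.4, global_step=1000)

# Weights & Biases: optional cloud-based experiment tracking
wandb.init(project="sac", name=run_name, sync_tensorboard=True)

# Video capture: record episodes at intervals via the Gymnasium wrapper
env = gym.make("Pendulum-v1", render_mode="rgb_array")
env = gym.wrappers.RecordVideo(
    env,
    video_folder=f"videos/{run_name}",
    episode_trigger=lambda ep: ep % 50 == 0,  # record every 50th episode
)
```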
Results
Pendulum
The following image shows the training performance on the Pendulum environment:
HalfCheetah
Although not explicitly implemented in the current codebase, we have training results for the HalfCheetah environment using SAC:
Source Code
📁 GitHub Repository: SAC
View the complete implementation, training scripts, and documentation on GitHub.