
v4.0.1: Soft Actor-Critic

kengz released this on 11 Aug 18:14
Commit: 4fb2efe

This release adds a new algorithm: Soft Actor-Critic (SAC).

Soft Actor-Critic

  • implement the original paper: "Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor" https://arxiv.org/abs/1801.01290 #398
  • implement the improvements from the follow-up paper: "Soft Actor-Critic Algorithms and Applications" https://arxiv.org/abs/1812.05905 #399
  • extend SAC to work directly in discrete-action environments using a custom GumbelSoftmax distribution
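The core SAC updates behind these items can be sketched in plain Python. This is a minimal, framework-free illustration under stated assumptions, not SLM Lab's implementation. All function names (`log_softmax`, `gumbel_softmax_sample`, `soft_q_target`, `update_log_alpha`) are hypothetical. The target uses the twin-critic, sampled-action form from the follow-up paper (the original paper also trains a separate state-value network, which the follow-up drops). The temperature update uses the common log α parameterization of the automatic entropy tuning. The discrete case assumes the policy outputs logits and draws a relaxed one-hot action with the standard Gumbel-Softmax relaxation. How SLM Lab's custom GumbelSoftmax distribution differs from this is not described in these notes.

```python
import math
import random


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def gumbel_softmax_sample(logits, tau, rng):
    """Draw a relaxed one-hot action: softmax((logits + Gumbel noise) / tau).

    The argmax of the result follows Categorical(softmax(logits)) (Gumbel-max
    trick); a smaller tau pushes the sample closer to a hard one-hot vector.
    """
    # Gumbel(0, 1) noise as -log(-log(U)); U is clamped away from 0 to avoid log(0).
    noise = [-math.log(-math.log(max(rng.random(), 1e-20))) for _ in logits]
    perturbed = [(x + g) / tau for x, g in zip(logits, noise)]
    return [math.exp(v) for v in log_softmax(perturbed)]


def soft_q_target(reward, done, next_q1, next_q2, next_log_prob, alpha, gamma=0.99):
    """Soft Bellman target: r + gamma * (1 - done) * (min(Q1', Q2') - alpha * log pi(a'|s')).

    next_q1 / next_q2 are the two target critics evaluated at a' ~ pi(.|s'),
    and next_log_prob is log pi(a'|s') for that same sampled action.
    """
    soft_value = min(next_q1, next_q2) - alpha * next_log_prob
    return reward + gamma * (1.0 - done) * soft_value


def update_log_alpha(log_alpha, log_probs, target_entropy, lr):
    """One SGD step on J(log_alpha) = -log_alpha * mean(log pi(a|s) + target_entropy).

    If the policy's entropy (-mean log pi) is below target_entropy, the gradient
    is negative and log_alpha grows, strengthening the entropy bonus.
    """
    grad = -sum(lp + target_entropy for lp in log_probs) / len(log_probs)
    return log_alpha - lr * grad


if __name__ == "__main__":
    rng = random.Random(0)
    next_logits = [2.0, 0.5, -1.0]  # hypothetical policy logits at s' (3 actions)
    relaxed = gumbel_softmax_sample(next_logits, tau=1.0, rng=rng)
    next_action = max(range(len(relaxed)), key=relaxed.__getitem__)
    next_log_prob = log_softmax(next_logits)[next_action]
    # 5.0 and 4.5 stand in for hypothetical target-critic outputs at (s', a').
    y = soft_q_target(1.0, 0.0, 5.0, 4.5, next_log_prob, alpha=0.2)
    print(f"relaxed={relaxed}, a'={next_action}, target={y:.3f}")
```

Taking the minimum of two target critics follows the twin-Q trick used in the SAC papers to curb overestimation. Optimizing log α rather than α keeps the temperature positive without an explicit constraint.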

Roboschool (continuous control) Benchmark

Note that the Roboschool reward scales are different from MuJoCo's.

Environment              SAC
RoboschoolAnt            2451.55
RoboschoolHalfCheetah    2004.27
RoboschoolHopper         2090.52
RoboschoolWalker2d       1711.92

[Figure: SAC trial graph for each environment above]

LunarLander (discrete control) Benchmark

[Figure: SAC on LunarLander. Panels: trial graph of mean returns vs. frames; moving average of mean returns vs. frames]