This is a simple implementation of the Muesli algorithm. Muesli matches MuZero's performance and network architecture, but it can be trained without MCTS lookahead search, using only a one-step lookahead. This significantly reduces the computational cost compared to MuZero.
Paper: Muesli: Combining Improvements in Policy Optimization, Hessel et al., 2021 (v2)
You can run this code via the Colab demo link: train the agent, monitor training with TensorBoard, and play the LunarLander-v2 environment with the trained network. The agent can solve LunarLander-v2 within 1~2 hours on the Google Colab CPU backend, reaching an average score above roughly 250.
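As a point of reference, the one-step lookahead mentioned above boils down to the CMPO policy from the paper: advantages estimated with the learned model are clipped, exponentiated, and used to reweight the prior policy. Below is a minimal PyTorch sketch of that computation; the function name and arguments are illustrative and do not correspond to the actual code in this repository.

```python
import torch
import torch.nn.functional as F

def cmpo_policy(prior_logits, q_values, value, clip_c=1.0):
    """Minimal sketch of the CMPO policy used for the one-step lookahead.

    prior_logits: [num_actions] logits of the prior (target network) policy.
    q_values:     [num_actions] one-step model lookahead values for each action.
    value:        scalar state value v(s) used as the advantage baseline.
    """
    advantages = q_values - value                        # advantage estimates
    clipped = torch.clamp(advantages, -clip_c, clip_c)   # clip(adv, -c, c)
    prior = F.softmax(prior_logits, dim=-1)
    weights = prior * torch.exp(clipped)                 # pi_prior(a|s) * exp(clipped adv)
    return weights / weights.sum()                       # normalize -> pi_cmpo(a|s)

# Example with the 4 discrete LunarLander-v2 actions:
pi_cmpo = cmpo_policy(torch.zeros(4), torch.tensor([0.1, 0.5, -0.2, 0.0]), torch.tensor(0.1))
```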
- MuZero network
- 5-step unroll
- L_pg+cmpo
- L_v
- L_r
- L_m (5-step)
- Stacking 8 observations
- Mini-batch update
- Hidden state scaled within [-1,1] (see the scaling sketch after this list)
- Gradient clipping by value [-1,1]
- Dynamics network gradient scale 1/2
- Target network (prior parameters) moving-average update (sketched after this list)
- Categorical representation for the value and reward models (see the two-hot sketch after this list)
- Normalized advantage
- Tensorboard monitoring
- Retrace estimator
- CNN representation network
- LSTM dynamics network
- Atari environment
- Self-play uses the agent network (the paper uses the target network)
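Two items above, the hidden-state scaling into [-1,1] and the target-network (prior parameters) moving-average update, can be sketched as below. The min-max scaling and the `tau` coefficient are assumptions for illustration; the repository's exact formulation may differ.

```python
import torch

def scale_hidden_state(h):
    """Min-max scale each hidden-state vector into [-1, 1] (MuZero-style scaling)."""
    h_min = h.min(dim=-1, keepdim=True).values
    h_max = h.max(dim=-1, keepdim=True).values
    h01 = (h - h_min) / (h_max - h_min + 1e-8)    # first into [0, 1]
    return 2.0 * h01 - 1.0                        # then into [-1, 1]

@torch.no_grad()
def update_target_network(target_net, agent_net, tau=0.01):
    """Moving-average update of the target (prior) parameters toward the agent parameters."""
    for p_target, p_agent in zip(target_net.parameters(), agent_net.parameters()):
        p_target.mul_(1.0 - tau).add_(tau * p_agent)
```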
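The categorical representation of values and rewards can be sketched as a two-hot encoding over a fixed discrete support, as in MuZero. The support range below is an assumption, and the MuZero scalar transform h(x) is omitted for brevity.

```python
import torch

SUPPORT = torch.arange(-30.0, 31.0)  # assumed support of 61 bins in [-30, 30]

def scalar_to_categorical(x, support=SUPPORT):
    """Two-hot encoding of scalar targets (a tensor) onto the discrete support."""
    x = x.clamp(support[0].item(), support[-1].item())
    low = torch.floor(x)
    p_high = x - low                                    # weight on the upper neighbour bin
    probs = torch.zeros(*x.shape, len(support))
    low_idx = (low - support[0]).long()
    high_idx = (low_idx + 1).clamp(max=len(support) - 1)
    probs.scatter_(-1, low_idx.unsqueeze(-1), (1.0 - p_high).unsqueeze(-1))
    probs.scatter_add_(-1, high_idx.unsqueeze(-1), p_high.unsqueeze(-1))
    return probs

def categorical_to_scalar(probs, support=SUPPORT):
    """Expected value of the categorical distribution, converted back to a scalar."""
    return (probs * support).sum(dim=-1)
```

The value and reward heads then predict a softmax over this support and are trained with cross-entropy against the two-hot targets.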
- Target network 1-step unroll: used when calculating v_pi_prior(s) and the second (CMPO) term of L_pg+cmpo.
- Agent network 5-step unroll: the agent network is unrolled 5 steps and optimized against the losses above (see the sketch below).
- Target network 1-step unrolls for L_m: used when calculating the pi_cmpo targets of L_m.
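Putting these unrolls together, one training step looks roughly like the self-contained sketch below: the agent network is unrolled 5 steps over the stored actions, L_v and L_r are accumulated at each step, and L_m is a KL divergence against the pi_cmpo targets precomputed from the target network's 1-step unrolls. The network sizes, the plain MSE losses (instead of the categorical losses), and all names are assumptions for illustration; the root L_pg+cmpo term is omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

OBS, ACT, HID = 8, 4, 32   # LunarLander-v2: 8-dim observations, 4 discrete actions

class TinyMuZeroNet(nn.Module):
    """Toy representation / dynamics / prediction networks, MuZero-style."""
    def __init__(self):
        super().__init__()
        self.repr_net = nn.Linear(OBS, HID)
        self.dyn_net = nn.Linear(HID + ACT, HID)
        self.reward_head = nn.Linear(HID, 1)
        self.policy_head = nn.Linear(HID, ACT)
        self.value_head = nn.Linear(HID, 1)

    def representation(self, obs):
        return torch.tanh(self.repr_net(obs))

    def dynamics(self, h, action):
        a = F.one_hot(action, ACT).float()
        h_next = torch.tanh(self.dyn_net(torch.cat([h, a], dim=-1)))
        return h_next, self.reward_head(h_next).squeeze(-1)

    def prediction(self, h):
        return self.policy_head(h), self.value_head(h).squeeze(-1)

def scale_gradient(x, scale):
    """Forward value unchanged; backward gradient scaled (dynamics gradient scale 1/2)."""
    return x * scale + x.detach() * (1.0 - scale)

def unroll_losses(agent, obs, actions, reward_targets, value_targets, pi_cmpo_targets):
    """5-step unroll of the agent network. pi_cmpo_targets are assumed to be
    precomputed from 1-step unrolls of the target network."""
    h = agent.representation(obs)
    loss_v = loss_r = loss_m = 0.0
    for k in range(actions.shape[1]):                          # unroll steps (5)
        logits, value = agent.prediction(h)
        loss_m += F.kl_div(F.log_softmax(logits, dim=-1),
                           pi_cmpo_targets[:, k], reduction='batchmean')   # L_m
        loss_v += F.mse_loss(value, value_targets[:, k])                   # L_v
        h, reward = agent.dynamics(h, actions[:, k])
        loss_r += F.mse_loss(reward, reward_targets[:, k])                 # L_r
        h = scale_gradient(h, 0.5)   # halve gradients flowing back through the unrolled dynamics
    return loss_v, loss_r, loss_m

# Example shapes for a mini-batch of 16 sequences with a 5-step unroll:
agent = TinyMuZeroNet()
losses = unroll_losses(agent,
                       torch.randn(16, OBS),
                       torch.randint(0, ACT, (16, 5)),
                       torch.randn(16, 5),
                       torch.randn(16, 5),
                       torch.softmax(torch.randn(16, 5, ACT), dim=-1))
```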
Figures: score graph; loss graph; LunarLander play length and last rewards; var variables of the advantage normalization.
Your help is needed! Contributions, advice, and questions are all welcome.
Contact: emtgit2@gmail.com (available languages: English, Korean)
Author's presentation: https://icml.cc/virtual/2021/poster/10769
LunarLander-v2 environment documentation: https://www.gymlibrary.dev/environments/box2d/lunar_lander/