Today we will learn about Policy Gradient methods, and use them to land on the Moon.
Ready, set, go!
- Make sure you have Python >= 3.7. Otherwise, update it.
- Pull the code from GitHub and cd into the 04_lunar_lander folder:
  $ git clone https://github.com/Paulescu/hands-on-rl.git
  $ cd hands-on-rl/04_lunar_lander
- Make sure you have the virtualenv tool in your Python installation:
  $ pip3 install virtualenv
- Create a virtual environment and activate it:
  $ virtualenv -p python3 venv
  $ source venv/bin/activate
  From this point onwards, commands run inside the virtual environment.
- Install dependencies and code from the src folder in editable mode, so you can experiment with the code:
  $ (venv) pip install -r requirements.txt
  $ (venv) export PYTHONPATH="."
- Open the notebooks, either with good old Jupyter or JupyterLab:
  $ (venv) jupyter notebook
  $ (venv) jupyter lab
  If both launch commands fail, try these:
  $ (venv) jupyter notebook --NotebookApp.use_redirect_file=False
  $ (venv) jupyter lab --NotebookApp.use_redirect_file=False
- Play and learn. And do the homework.
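
If one of the setup steps misbehaves, a small script along these lines can help narrow down what went wrong. It is only a sketch and not part of the repository: the function names (`python_ok`, `in_virtualenv`, `pythonpath_entries`, `setup_report`) are made up for illustration. It checks the Python version required above, whether the interpreter appears to be running inside a virtual environment, and whether `PYTHONPATH` contains `.`.

```python
import os
import sys

MIN_PYTHON = (3, 7)  # minimum version stated in the setup steps


def python_ok(version_info=sys.version_info):
    """Return True if the given interpreter version is at least MIN_PYTHON."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def in_virtualenv():
    """Best-effort check that the interpreter runs inside a virtual environment.

    Older virtualenv releases set sys.real_prefix; venv and newer virtualenv
    releases make sys.base_prefix differ from sys.prefix.
    """
    real_prefix = getattr(sys, "real_prefix", None)
    base_prefix = getattr(sys, "base_prefix", sys.prefix)
    return real_prefix is not None or base_prefix != sys.prefix


def pythonpath_entries(raw=None):
    """Split a PYTHONPATH string (default: the current environment) into entries."""
    if raw is None:
        raw = os.environ.get("PYTHONPATH", "")
    return [entry for entry in raw.split(os.pathsep) if entry]


def setup_report():
    """Collect the three checks into a dictionary."""
    return {
        "python_ok": python_ok(),
        "in_virtualenv": in_virtualenv(),
        "pythonpath_has_dot": "." in pythonpath_entries(),
    }


if __name__ == "__main__":
    for name, passed in setup_report().items():
        print(f"{name}: {'ok' if passed else 'CHECK THIS'}")
```

Run it with the virtual environment activated, from the 04_lunar_lander folder. Any line that prints CHECK THIS points to the setup step worth repeating.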
- Random agent baseline
- Policy gradients with rewards as weights
- Policy gradients with rewards-to-go as weights
- Homework
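
To give a feel for the difference between the two policy gradient notebooks, here is a minimal sketch of how the per-step weights could be computed from one episode's rewards. It is an illustration under assumptions, not the code from the notebooks: the function names are hypothetical, and no discount factor is applied. With rewards as weights, every action in the episode is weighted by the total episode reward. With rewards-to-go, each action is weighted only by the rewards collected from that step onwards.

```python
def total_reward_weights(rewards):
    """Weight every step by the total reward of the episode."""
    total = sum(rewards)
    return [total] * len(rewards)


def rewards_to_go_weights(rewards):
    """Weight each step by the sum of rewards from that step to the end."""
    weights = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step reuses the sum already computed for later steps.
    for t in reversed(range(len(rewards))):
        running += rewards[t]
        weights[t] = running
    return weights


if __name__ == "__main__":
    episode_rewards = [1.0, 0.0, -2.0, 5.0]  # made-up rewards for illustration
    print(total_reward_weights(episode_rewards))   # [4.0, 4.0, 4.0, 4.0]
    print(rewards_to_go_weights(episode_rewards))  # [4.0, 3.0, 3.0, 5.0]
```

In the example, the last action gets weight 5.0 under rewards-to-go instead of 4.0, because the -2.0 penalty came before it and the action could not have caused it.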
Do you wanna become a PRO in Machine Learning?
Subscribe to the datamachines newsletter.