This repository contains the code to train Hebbian random networks on any Gym or pyBullet environment, as described in our paper *Meta-Learning through Hebbian Plasticity in Random Networks* (2020). Additionally, you can train on any custom environment by registering it.
First, install the dependencies (Python >= 3.8 is required):
```bash
# clone project
git clone https://github.com/enajx/HebbianMetaLearning

# install dependencies
cd HebbianMetaLearning
pip install -r requirements.txt
```
Next, use `train_hebb.py` to train an agent. You can train on any OpenAI Gym or pyBullet environment:
```bash
# train a Hebbian network to solve the racing car
python train_hebb.py --environment CarRacing-v0

# train a Hebbian network specifying the evolution parameters, e.g.:
python train_hebb.py --environment CarRacing-v0 --hebb_rule ABCD_lr --generations 300 --popsize 200 --print_every 1 --init_weights uni --lr 0.2 --sigma 0.1 --decay 0.995 --threads -1 --distribution normal
```
Use `python train_hebb.py --help` to display all the training options:
```
train_hebb.py [--environment] [--hebb_rule] [--popsize] [--lr] [--decay] [--sigma] [--init_weights] [--print_every] [--generations] [--threads] [--folder] [--distribution]

--environment   Environment: any OpenAI Gym or pyBullet environment may be used.
--hebb_rule     Hebbian rule type: A, AD_lr, ABC, ABC_lr, ABCD, ABCD_lr.
--popsize       Population size.
--lr            ES learning rate.
--decay         ES decay.
--sigma         ES sigma: modulates the amount of noise used to populate each new generation.
--init_weights  Distribution used to sample the random weights at each episode, or coevolve mode: uni, normal, coevolve.
--print_every   Print and save every N steps.
--generations   Number of generations the ES will run.
--threads       Number of threads used to run the evolution in parallel.
--folder        Folder in which to store the evolved Hebbian coefficients.
--distribution  Sampling distribution for initializing the Hebbian coefficients: normal, uniform.
```
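For intuition on what the Hebbian rules evolve, the ABCD_lr rule updates every connection from its pre- and post-synaptic activations and a set of evolved coefficients plus a learning rate. The numpy sketch below uses scalar coefficients for brevity (the repository evolves per-connection coefficients); it is illustrative, not the repository's exact code:

```python
import numpy as np

def abcd_lr_update(w, pre, post, A, B, C, D, eta):
    """One ABCD_lr-style Hebbian step: every weight w_ij changes according to
    the pre-synaptic activity o_i, the post-synaptic activity o_j, the
    coefficients A, B, C, D, and the learning rate eta."""
    hebb = A * np.outer(pre, post)          # correlation term o_i * o_j per connection
    delta = eta * (hebb + B * pre[:, None]  # pre-synaptic term
                        + C * post[None, :] # post-synaptic term
                        + D)                # constant term
    return w + delta
```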
Once trained, use `evaluate_hebb.py` to test the evolved agent:

```bash
python evaluate_hebb.py --environment CarRacing-v0 --hebb_rule ABCD_lr --path_hebb heb_coeffs.dat --path_coev cnn_parameters.dat --init_weights uni
```
When running on a headless server, some environments (e.g. CarRacing-v0) require a virtual display; in that case run:

```bash
xvfb-run -a -s "-screen 0 1400x900x24 +extension RANDR" -- python train_hebb.py --environment CarRacing-v0
```
If you use this code for academic or commercial purposes, please cite the associated paper:
```bibtex
@inproceedings{Najarro2020,
  title     = {{Meta-Learning through Hebbian Plasticity in Random Networks}},
  author    = {Najarro, Elias and Risi, Sebastian},
  booktitle = {Advances in Neural Information Processing Systems},
  year      = {2020},
  url       = {https://arxiv.org/abs/2007.02686}
}
```
The CarRacing-v0 results can be reproduced by running `python train_hebb.py --environment CarRacing-v0`.
The damaged quadruped morphologies can be found in the folder `damaged_bullet_morphologies`. To reproduce the damaged-quadruped results, these new morphologies first need to be registered as custom environments and then added to the fitness function: simply add a 2-fold loop that returns the average cumulative distance walked by the standard morphology and the damaged one.
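The 2-fold loop described above can be sketched as follows; `evaluate` and the damaged environment id are placeholders for this repository's rollout routine and registered environment names, not actual identifiers from the codebase:

```python
def two_morphology_fitness(evaluate, hebb_coeffs,
                           env_ids=("AntBulletEnv-v0", "DamagedAntBulletEnv-v0")):
    """Fitness averaged over the standard and the damaged morphology.

    `evaluate(coeffs, env_id)` stands in for the repo's episode rollout and
    is expected to return the cumulative distance walked in one episode.
    """
    distances = [evaluate(hebb_coeffs, env_id) for env_id in env_ids]
    return sum(distances) / len(distances)
```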
All the necessary training parameters are indicated in the paper.
The static networks used as baselines can be reproduced with the code in this repository.
If you have any trouble reproducing the paper's results, feel free to open an issue or email us.
In the paper we tested the CarRacing-v0 and AntBulletEnv-v0 environments. For both of them we wrote custom functions to bound the actions; the remaining environments use a simple clipping mechanism. Environments with a continuous action space (i.e. Box) may benefit from continuous scaling of their action spaces rather than clipping, either with a custom activation function or with Gym's RescaleAction wrapper.
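As one illustration of continuous scaling, a tanh squashing followed by an affine map onto the Box bounds could look like the hypothetical helper below (this is not the paper's exact bounding function):

```python
import numpy as np

def scale_action(raw, low, high):
    """Map an unbounded network output into [low, high] smoothly.

    tanh squashes the output into (-1, 1); the affine map then rescales it
    onto the environment's Box bounds, avoiding the flat gradients that
    hard clipping introduces at the boundary.
    """
    squashed = np.tanh(raw)                       # now in (-1, 1)
    return low + 0.5 * (squashed + 1.0) * (high - low)
```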
Another element that greatly affects performance, if you have bounded computational resources, is the choice of a suitable early-stop mechanism so that fewer CPU cycles are wasted; e.g. for the CarRacing-v0 environment we use 20 consecutive steps with negative reward as an early-stop signal.
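Such an early-stop counter takes only a few lines; the sketch below is an illustrative version, not the repository's exact implementation:

```python
class NegativeRewardEarlyStop:
    """Signal episode termination after `patience` consecutive steps with
    negative reward (the CarRacing-v0 runs described above use patience=20)."""

    def __init__(self, patience=20):
        self.patience = patience
        self.count = 0

    def step(self, reward):
        # extend the streak on negative reward, reset it otherwise
        self.count = self.count + 1 if reward < 0 else 0
        return self.count >= self.patience  # True -> stop the rollout
```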
Finally, some pixel-based environments would likely benefit from a grayscaling + frame-stacking approach rather than feeding the network the three RGB channels as our implementation does, e.g. by using Gym's FrameStack wrapper or the Atari preprocessing wrapper.
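A minimal numpy sketch of that preprocessing, mimicking the spirit of Gym's wrappers rather than their exact behaviour:

```python
import numpy as np
from collections import deque

def to_gray(frame):
    """Luminance grayscale of an HxWx3 uint8 RGB frame (ITU-R 601 weights)."""
    return (frame @ np.array([0.299, 0.587, 0.114])).round().astype(np.uint8)

class FrameStacker:
    """Keep the last k grayscale frames as a (k, H, W) observation,
    similar in spirit to Gym's FrameStack wrapper."""

    def __init__(self, k=4):
        self.frames = deque(maxlen=k)

    def push(self, frame):
        self.frames.append(to_gray(frame))
        while len(self.frames) < self.frames.maxlen:
            self.frames.append(self.frames[-1])  # pad at episode start
        return np.stack(self.frames)
```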