❗ News |
---|
The Jax version of SHAQ has been implemented and merged into the novel MARL framework for Jax called JaxMARL following CleanRL's philosophy of providing single file implementations. |
This is the implementation of the paper SHAQ: Incorporating Shapley Value Theory into Multi-Agent Q-Learning, published on NeurIPS 2022.
The implementation is based on PyMARL. Please refer to that repo for more documentation.
The baselines except for SQDDPG used in this paper are from the repo of Weighted QMIX. To know more about baselines, please refer to that repo.
SQDDPG is implemented based on https://github.com/hsvgbkhgbv/SQDDPG.
The model implemented in this paper is based on Pytorch 1.4.0.
In particular implementations for:
- SHAQ
Note that in the repository the naming of certain hyper-parameters and concepts is a little different to the paper:
-
$\hat{\alpha}$ in the paper is alpha in the code
For all SMAC experiments we used SC2.4.6.2.69232 (not SC2.4.10). The underlying dynamics are sufficiently different that you cannot compare runs across the 2 versions!
The install_sc2.sh
script will install SC2.4.6.2.69232.
The config file (src/config/algs/shaq.yaml
) contains default hyper-parameters for SHAQ.
These were changed when running the experiments for the paper (epsilon_anneal_time = 1000000
for the super-hard maps in SMAC and Predator-Prey).
About the hyperparameter settings of variant experiments, the full details are listed as below.
Scenarios | LR for alpha | Epsilon anneal time |
---|---|---|
SMAC: 2c_vs_64zg | 0.002 | 50k |
SMAC: 3s_vs_5z | 0.001 | 50k |
SMAC: 5m_vs_6m | 0.0005 | 50k |
SMAC: 6h_vs_8z | 0.0005 | 1mil |
SMAC: Corridor | 0.0005 | 1mil |
SMAC: 8m | 0.0003 | 50k |
SMAC: 3s5z | 0.0003 | 50k |
SMAC: 3s5z_vs_3s6z | 0.0003 | 1mil |
SMAC: 1c3s5z | 0.0002 | 50k |
SMAC: 10m_vs_11m | 0.0001 | 50k |
SMAC: MMM2 | 0.0001 | 1mil |
Predator-Prey | 0.0001 | 1mil |
Please see the Appendix of the paper for the exact hyper-parameters used.
As an example, to run the SHAQ on SMAC: 2c_vs_64zg with epsilon annealed over 50k time steps:
python3 src/main.py --config=shaq --env-config=sc2 with env_args.map_name=2c_vs_64zg alpha_lr=0.002 epsilon_anneal_time=50000
We also provide the method to visualize the learned values during test. The details are as follows:
-
Set the param
evaluate
asTrue
and set an address for saving the testing result in string tosave_batch_path
in thedefault.yaml
. -
Set the checkpoint address to
checkpoint_path
that saves the model you would test indefault.yaml
. -
Set the trajectory during test saving address to
save_batch_path
indefault.yaml
. -
If you would like to test with the greedy policies, you need to set
epsilon_test
asFalse
. Otherwise, you need to setepsilon_test
as a float from0
to1.
to indicate the probability of performing random actions. -
Run the following example command:
python3 src/main.py --config=shaq --env-config=pred_prey_punish with env_args.miscapture_punishment=-1 checkpoint_path=results/models/shaq__2022-07-31_11-31-44 evaluate=True epsilon_test=False save_batch_path=[...]
-
Remove the
save_batch_path
and set the trajectory saving path toload_batch_path
. -
Set the paths for saving values, actions and state to
save_values_path
,save_actions_path
andsave_state_path
respectively. These are saved in pickle files. -
Run the following example command:
python3 src/main.py --config=shaq --env-config=pred_prey_punish with env_args.miscapture_punishment=-1 checkpoint_path=results/models/shaq__2022-07-31_11-31-44 evaluate=True epsilon_test=False load_batch_path=[...] save_values_path=[...] save_actions_path=[...] save_state_path=[...]
- You can visualize and analyze the learned values, actions and states through the storage in the saved pickle files.
If you use part of the work mentioned in this paper, please cite
@article{wang2022shaq,
title={SHAQ: Incorporating Shapley Value Theory into Multi-Agent Q-Learning},
author={Wang, Jianhong and Zhang, Yuan and Gu, Yunjie and Kim, Tae-Kyun},
journal={Advances in Neural Information Processing Systems},
volume={35},
pages={5941--5954},
year={2022}
}