Adversarial Distributional Training

This repository contains the code for adversarial distributional training (ADT) introduced in the following paper

Adversarial Distributional Training for Robust Deep Learning (NeurIPS 2020)

Yinpeng Dong*, Zhijie Deng*, Tianyu Pang, Hang Su, and Jun Zhu (* indicates equal contribution)

Citation

If you find our methods useful, please consider citing:

@inproceedings{dong2020adversarial,
  title={Adversarial Distributional Training for Robust Deep Learning},
  author={Dong, Yinpeng and Deng, Zhijie and Pang, Tianyu and Su, Hang and Zhu, Jun},
  booktitle={Advances in Neural Information Processing Systems},
  year={2020}
}

Introduction

Adversarial distribution training (ADT) is a new framework to train robust deep learning models. It is formulated as a minimax optimization problem, in which the inner maximization aims to find an adversarial distribution for each natural input to characterize potential adversarial examples; and the outer minimization aims to optimize DNN parameters with the worst-case adversarial distributions.

In this paper, we proposed three different approaches to parameterize the adversarial distributions, as illustrated below.

Figure 1: An illustration of three different ADT methods, including (a) ADT_EXP; (b) ADT_EXP-AM; (c) ADT_IMP-AM.

Prerequisites

Python (3.6.8)
Pytorch (1.3.0)
torchvision (0.4.1)
numpy

Training

We have proposed three different methods for ADT. The command for each training method is specified below.

Training ADT_EXP

python adt_exp.py --model-dir adt-exp --dataset cifar10 (or cifar100/svhn)

Training ADT_EXP-AM

python adt_expam.py --model-dir adt-expam --dataset cifar10 (or cifar100/svhn)

Training ADT_IMP-AM

python adt_impam.py --model-dir adt-impam --dataset cifar10 (or cifar100/svhn)

The checkpoints will be saved at each model folder.

Evaluation

Evaluation under White-box Attacks

For FGSM attack, run

python evaluate_attacks.py --model-path ${MODEL-PATH} --attack-method FGSM --dataset cifar10 (or cifar100/svhn)

For PGD attack, run

python evaluate_attacks.py --model-path ${MODEL-PATH} --attack-method PGD --num-steps 20 (or 100) --dataset cifar10 (or cifar100/svhn)

For MIM attack, run

python evaluate_attacks.py --model-path ${MODEL-PATH} --attack-method MIM --num-steps 20 --dataset cifar10 (or cifar100/svhn)

For C&W attack, run

python evaluate_attacks.py --model-path ${MODEL-PATH} --attack-method CW --num-steps 30 --dataset cifar10 (or cifar100/svhn)

For FeaAttack, run

python feature_attack.py --model-path ${MODEL-PATH} --dataset cifar10 (or cifar100/svhn)

Evaluation under Transfer-based Black-box Attacks

First change the --white-box-attack argument in evaluate_attacks.py to False. Then run

python evaluate_attacks.py --source-model-path ${SOURCE-MODEL-PATH} --target-model-path ${TARGET-MODEL-PATH} --attack-method PGD (or MIM)

Evaluation under SPSA

python spsa.py --model-path ${MODEL-PATH} --samples_per_draw 256 (or 512/1024/2048)

Pretrained Models

We have provided the pre-trained models on CIFAR-10, whose performance is reported in Table 1. They can be downloaded at

Contact

Yinpeng Dong: dyp17@mails.tsinghua.edu.cn

Zhijie Deng: dzj17@mails.tsinghua.edu.cn

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Adversarial Distributional Training

Citation

Introduction

Prerequisites

Training

Training ADT_EXP

Training ADT_EXP-AM

Training ADT_IMP-AM

Evaluation

Evaluation under White-box Attacks

Evaluation under Transfer-based Black-box Attacks

Evaluation under SPSA

Pretrained Models

Contact

About

Releases

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
models		models
LICENSE		LICENSE
README.md		README.md
adt_exp.py		adt_exp.py
adt_expam.py		adt_expam.py
adt_impam.py		adt_impam.py
algos_adt.jpg		algos_adt.jpg
evaluate_attacks.py		evaluate_attacks.py
feature_attack.py		feature_attack.py
spsa.py		spsa.py

License

dongyp13/Adversarial-Distributional-Training

Folders and files

Latest commit

History

Repository files navigation

Adversarial Distributional Training

Citation

Introduction

Prerequisites

Training

Training ADTEXP

Training ADTEXP-AM

Training ADTIMP-AM

Evaluation

Evaluation under White-box Attacks

Evaluation under Transfer-based Black-box Attacks

Evaluation under SPSA

Pretrained Models

Contact

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Training ADT_EXP

Training ADT_EXP-AM

Training ADT_IMP-AM

Packages