This repository contains the source code for the paper "Adaptive Distillation: Aggregating Knowledge from Multiple Paths for Efficient Distillation", published at BMVC 2021.
To reproduce the experiments on the CIFAR-100 dataset, please follow the instructions below.
First, install the required packages:
conda create -n AdaptiveDistillation python=3.7 -y
conda activate AdaptiveDistillation
conda install pytorch==1.7.0 torchvision==0.8.1 cudatoolkit==11.0 -c pytorch -y
pip install mmcv-full==1.3.8 -f https://download.openmmlab.com/mmcv/dist/cu110/torch1.7.0/index.html
pip install -r requirements.txt
Please note that this repository is based on mmclassification version 0.13.0, so this particular version should be installed. Newer versions might work but have not been tested.
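To verify that the pinned versions were installed correctly, you can run a quick one-line check (a minimal sanity check; the expected output simply echoes the versions pinned above):
python -c "import torch, torchvision, mmcv; print(torch.__version__, torchvision.__version__, mmcv.__version__)"
This should print 1.7.0, 0.8.1, and 1.3.8.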
To train a model on a single GPU, run:
python tools/train.py configs/kd/cifar100/<config-file> --options model.backbone.norm_cfg.type='BN'
For instance, to distill from ResNet-50 to ResNet-18 with equal contribution from the different paths, you can run:
python tools/train.py configs/kd/cifar100/kd_resnet18_resnet50_cifar100_equal.py --options model.backbone.norm_cfg.type='BN'
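The --options model.backbone.norm_cfg.type='BN' flag overrides the normalization layer type inside the config at launch time. As a rough illustration of the field being overridden, a backbone section in an mmclassification-style config looks like the sketch below; the field names follow mmcls conventions and are not necessarily the exact contents of this repository's config files:

# Hypothetical mmclassification-style config excerpt (illustration only;
# the repository's actual kd configs may differ).
model = dict(
    type='ImageClassifier',
    backbone=dict(
        type='ResNet_CIFAR',
        depth=18,
        num_stages=4,
        out_indices=(3,),
        # --options model.backbone.norm_cfg.type='BN' rewrites this entry:
        norm_cfg=dict(type='BN')),
    head=dict(
        type='LinearClsHead',
        num_classes=100,
        in_channels=512))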
To train on multiple GPUs, use the distributed training script:
./tools/dist_train.sh configs/kd/cifar100/<config-file> <num_gpus> --options data.samples_per_gpu=<num_samples>
For instance, to reproduce our results for the same setting, use 4 GPUs with 32 samples per GPU:
./tools/dist_train.sh configs/kd/cifar100/kd_resnet18_resnet50_cifar100_equal.py 4 --options data.samples_per_gpu=32
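With 4 GPUs and 32 samples per GPU, the effective batch size is 128. After training, the resulting student checkpoint can be evaluated with mmclassification's standard test script (assuming the fork keeps tools/test.py; the work_dirs checkpoint path below is the mmcv default and is shown here as a hypothetical example):
python tools/test.py configs/kd/cifar100/kd_resnet18_resnet50_cifar100_equal.py work_dirs/kd_resnet18_resnet50_cifar100_equal/latest.pth --metrics accuracy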
Please use the following BibTeX entry to cite our work, Adaptive Distillation:
@inproceedings{chennupati2021adaptive,
  title={Adaptive Distillation: Aggregating Knowledge from Multiple Paths for Efficient Distillation},
  author={Chennupati, Sumanth and Kamani, Mohammad Mahdi and Cheng, Zhongwei and Chen, Lin},
  booktitle={British Machine Vision Conference (BMVC)},
  year={2021}
}
The paper can be accessed here.
The backbone of this repository is forked from the mmclassification repository by OpenMMLab.