This page provides basic tutorials about the usage of OpenMixup. For installation instructions, please see Install.
Learning discriminative visual representations efficiently to facilitate downstream tasks is one of the fundamental problems in computer vision. Data mixing techniques largely improve the quality of deep neural networks (DNNs) in various scenarios. Since mixup techniques are used as augmentations or auxiliary tasks in a wide range of cases, this repo focuses on mixup-related methods for Supervised, Self- and Semi-Supervised Representation Learning. Thus, we name this repo OpenMixup.
Note: The default learning rate in config files is for 4 or 8 GPUs. If you use a different number of GPUs, the total batch size changes in proportion, so you have to scale the learning rate following new_lr = old_lr * new_ngpus / old_ngpus. We recommend using tools/dist_train.sh even with 1 GPU, since some methods do not support non-distributed training.
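The scaling rule above can be written as a small helper. This is only a minimal sketch: the function name `scale_lr` and the example values are illustrative and are not part of OpenMixup.

```python
def scale_lr(old_lr, old_ngpus, new_ngpus):
    """Linearly scale the learning rate with the number of GPUs.

    Implements new_lr = old_lr * new_ngpus / old_ngpus, assuming the
    per-GPU batch size stays the same.
    """
    if old_ngpus <= 0 or new_ngpus <= 0:
        raise ValueError("GPU counts must be positive")
    return old_lr * new_ngpus / old_ngpus


# Example: a config tuned for 8 GPUs with lr=0.1 (illustrative value),
# launched on 2 GPUs instead.
print(scale_lr(0.1, old_ngpus=8, new_ngpus=2))
```

The returned value would then be written into the `lr` field of the optimizer in your config file.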
bash tools/dist_train.sh ${CONFIG_FILE} ${GPUS} [optional arguments]
Optional arguments are:
- --resume_from ${CHECKPOINT_FILE}: Resume from a previous checkpoint file. Alternatively, use --auto_resume to resume from latest.pth automatically.
- --pretrained ${PRETRAIN_WEIGHTS}: Load pretrained weights for the backbone.
- --load_checkpoint ${CHECKPOINT_FILE}: Load the whole network from the checkpoint file.
- --deterministic: Switch on "deterministic" mode, which slows down training but makes the results reproducible.
An example: Run the following command to train ResNet-50 for ImageNet classification. The training results (checkpoints, JSON files, logs) are saved in WORK_DIR=work_dirs/classification/imagenet/resnet/resnet50_rsb_a3_sz160_8xb256_ep100/.
bash tools/dist_train.sh configs/classification/imagenet/resnet/resnet50_rsb_a3_sz160_8xb256_ep100.py 8 --auto_resume
Note: During training, checkpoints and logs are saved under work_dirs/ in the same folder structure as the config file. A custom work directory is not recommended since evaluation scripts infer work directories from the config file name. If you want to save your weights somewhere else, please use a symlink, for example:
ln -s /lisiyuan/source/OPENMIXUP_WORKDIRS ${OPENMIXUP}/work_dirs
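The mapping from config file to work directory described above can be sketched as follows. The helper `infer_work_dir` is hypothetical and only mirrors the convention stated in this page. The actual scripts may infer the path differently.

```python
from pathlib import PurePosixPath


def infer_work_dir(config_file, config_root="configs", work_root="work_dirs"):
    """Map configs/a/b/name.py to work_dirs/a/b/name/ (hypothetical helper)."""
    path = PurePosixPath(config_file)
    # Drop the leading config root and the .py suffix, keep the folder structure.
    relative = path.relative_to(config_root).with_suffix("")
    return str(PurePosixPath(work_root) / relative) + "/"


print(infer_work_dir(
    "configs/classification/imagenet/resnet/resnet50_rsb_a3_sz160_8xb256_ep100.py"
))
```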
Alternatively, if you run OpenMixup on a cluster managed with slurm:
SRUN_ARGS="${SRUN_ARGS}" bash tools/srun_train.sh ${PARTITION} ${CONFIG_FILE} ${GPUS} [optional arguments]
An example: Run the following command to train ResNet-50 for self-supervised learning.
SRUN_ARGS="-w xx.xx.xx.xx" bash tools/srun_train.sh Dummy configs/selfsup/mocov2/imagenet/r50_4xb64_cos_fp16_ep200.py 4 --resume_from work_dirs/selfsup/mocov2/imagenet/r50_4xb64_cos_fp16_ep200/latest.pth
If you launch with multiple machines simply connected with Ethernet, you have to modify tools/dist_train.sh or create a new script. Please refer to the PyTorch Launch utility. Training is usually slow if you do not have high-speed networking like InfiniBand.
If you launch with slurm, the command is the same as that on a single machine described above. You only need to change ${GPUS}, e.g., to 16 for two 8-GPU machines.
If you launch multiple jobs on a single machine, e.g., 2 jobs of 4-GPU training on a machine with 8 GPUs, you need to specify different ports (29500 by default) for each job to avoid communication conflicts. No matter how many GPUs you use (e.g., 1-8 GPUs), we recommend distributed training with torch.distributed.launch or torchrun rather than plain python (single_train.sh is deprecated).
If you use dist_train.sh to launch training jobs:
CUDA_VISIBLE_DEVICES=0,1,2,3 PORT=29500 bash tools/dist_train.sh ${CONFIG_FILE} 4
CUDA_VISIBLE_DEVICES=4,5,6,7 PORT=29501 bash tools/dist_train.sh ${CONFIG_FILE} 4
For example, you can run the script below to train a ResNet-18 classifier on CIFAR-100 with 1 GPU:
CUDA_VISIBLE_DEVICES=0 PORT=29500 bash tools/dist_train.sh configs/classification/cifar100/mixups/basic/r18_mixups_CE_none.py 1
or you can run the script below to train a ResNet-50 classifier on ImageNet with 4 GPUs:
CUDA_VISIBLE_DEVICES=0,1,2,3 PORT=29500 bash tools/dist_train.sh configs/classification/imagenet/resnet/resnet50_4xb64_cos_ep100.py 4
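When several jobs share one machine, the dist_train.sh command lines above can be generated so that each job gets its own GPU group and port. The helper `build_launch_commands` below is a hypothetical convenience sketch, not a tool shipped with the repo. It only formats strings in the same shape as the commands shown above.

```python
def build_launch_commands(config_file, gpu_groups, base_port=29500):
    """Build one dist_train.sh command per GPU group, each with a unique port."""
    commands = []
    for job_index, gpu_ids in enumerate(gpu_groups):
        devices = ",".join(str(gpu_id) for gpu_id in gpu_ids)
        port = base_port + job_index  # consecutive ports avoid conflicts
        commands.append(
            f"CUDA_VISIBLE_DEVICES={devices} PORT={port} "
            f"bash tools/dist_train.sh {config_file} {len(gpu_ids)}"
        )
    return commands


# Two 4-GPU jobs on one 8-GPU machine.
for command in build_launch_commands("${CONFIG_FILE}", [[0, 1, 2, 3], [4, 5, 6, 7]]):
    print(command)
```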
If you launch training jobs with slurm:
GPUS_PER_NODE=4 bash tools/srun_train.sh ${PARTITION} ${CONFIG_FILE} 4 --port 29500
GPUS_PER_NODE=4 bash tools/srun_train.sh ${PARTITION} ${CONFIG_FILE} 4 --port 29501
If you do not have enough GPUs to launch large training jobs, we recommend gradient accumulation. Assuming that you only have 1 GPU that can hold 64 images in a batch, while you expect the batch size to be 256, you may add the following line to your config file. It performs a network update every 4 iterations, so the equivalent batch size is 256. Of course, it is about 4x slower than using 4 GPUs. Note that this workaround is not applicable to methods like SimCLR, which require intra-batch communication.
optimizer_config = dict(update_interval=4)
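To see why accumulating over 4 iterations matches a 4x larger batch, here is a toy illustration in plain Python. It assumes each micro-batch gradient is averaged over the accumulation steps. Whether the training loop normalizes the loss this way is an assumption, and layers that use batch statistics (e.g., BatchNorm) would still see only 64 images at a time.

```python
import random


def mse_grad(w, batch):
    """Gradient of mean((w * x - y)^2) with respect to w over one batch."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


random.seed(0)
# Toy dataset of 256 samples from y = 3x (illustrative only).
data = [(x, 3.0 * x) for x in (random.uniform(-1, 1) for _ in range(256))]
w = 0.5
update_interval = 4
micro_batch_size = len(data) // update_interval  # 64 samples per iteration

# Gradient from one large batch of 256 samples.
full_batch_grad = mse_grad(w, data)

# Gradient accumulated over 4 micro-batches of 64 samples each.
accumulated_grad = 0.0
for step in range(update_interval):
    micro_batch = data[step * micro_batch_size:(step + 1) * micro_batch_size]
    accumulated_grad += mse_grad(w, micro_batch) / update_interval

print(full_batch_grad, accumulated_grad)
```

The two printed gradients agree up to floating-point rounding, which is why the update after 4 accumulated iterations behaves like one step on a batch of 256.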
We support mmcv and Apex for Mixed Precision Training. To use Mixed Precision Training, add the following to the config file.
use_fp16 = True
fp16 = dict(type='mmcv', loss_scale='dynamic')
You can choose the FP16 type from 'apex' or 'mmcv'. We recommend 'mmcv' with PyTorch 1.6 or higher for faster training, and 'apex' with lower PyTorch versions. An example of the RSB A3 setting:
bash tools/dist_train.sh configs/classification/imagenet/mixups/rsb_a3/r50/r18_rsb_a3_CE_sigm_mix0_1_cut1_0_sz160_bs2048_fp16_ep100.py 4
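The recommendation above (mmcv for PyTorch 1.6 or higher, apex otherwise) could be expressed as a small selection helper. `choose_fp16_cfg` is a hypothetical function, and reusing `loss_scale='dynamic'` for the 'apex' type is an assumption rather than something this page confirms.

```python
def choose_fp16_cfg(torch_version):
    """Pick 'mmcv' for PyTorch >= 1.6 and 'apex' otherwise (hypothetical helper)."""
    # Strip local build tags such as "+cu113" and compare (major, minor) only.
    release = torch_version.split("+")[0]
    major, minor = (int(part) for part in release.split(".")[:2])
    fp16_type = "mmcv" if (major, minor) >= (1, 6) else "apex"
    # Assumption: both types accept the same loss_scale setting.
    return dict(type=fp16_type, loss_scale="dynamic")


# In a real config you would pass torch.__version__; strings are used here
# so the sketch runs without PyTorch installed.
print(choose_fp16_cfg("1.10.0+cu113"))
print(choose_fp16_cfg("1.5.1"))
```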
1. Prefetching data helps speed up IO and makes better use of CUDA stream parallelization. To use it, activate it in the config file (disabled by default) and remove ToTensor and Normalize from 'train_pipeline'. The costly ToTensor operation is reimplemented along with prefetch.
prefetch = True
if not prefetch:
    # Only add ToTensor and Normalize when prefetch is disabled.
    train_pipeline.extend([dict(type='ToTensor'), dict(type='Normalize', **img_norm_cfg)])
2. Replace Pillow with Pillow-SIMD (https://github.com/uploadcare/pillow-simd.git) to make use of the SIMD instruction sets of modern CPUs.
pip uninstall pillow
pip install Pillow-SIMD
# or, if AVX2 is available:
CC="cc -mavx2" pip install -U --force-reinstall pillow-simd
We tested it with MoCoV2 using a total batch size of 256 on Tesla V100. The training time per step decreased from 0.23s to 0.17s.
We provide several standard benchmarks to evaluate representation learning (supervised and self-supervised pre-trained models). Please refer to Benchmarks for the details. We do NOT recommend changing the evaluation config files or scripts mentioned there if you want to use this repo in your publications, so that all methods are compared fairly.
If you want to adjust some parts of a basic config file (e.g., for ablation studies or hyper-parameter tuning), we provide ConfigGenerator in the config folder of each method. For example, to train {'Mixup', 'CutMix'} with alpha in {0.2, 1.0} for {100, 300} epochs on ImageNet-1k based on the PyTorch-style settings in configs/classification/imagenet/mixups/basic/r50_mixups_CE_none.py, you can modify auto_train_in_mixups.py and run
python configs/classification/imagenet/mixups/auto_train_in_mixups.py
It will generate eight config files and a bash file r50_mixups_CE_none_xxxx.sh. You can adjust the GPUs and PORT settings and execute this bash file to run the eight experiments automatically.
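The count of eight comes from the Cartesian product of the three settings (2 methods x 2 alpha values x 2 epoch settings). The sketch below only enumerates that grid. The dictionary keys are hypothetical and do not reflect the actual ConfigGenerator interface.

```python
from itertools import product

methods = ["Mixup", "CutMix"]
alphas = [0.2, 1.0]
epochs = [100, 300]

# One entry per generated experiment; key names are illustrative only.
experiments = [
    dict(method=method, alpha=alpha, epochs=num_epochs)
    for method, alpha, num_epochs in product(methods, alphas, epochs)
]

for experiment in experiments:
    print(experiment)
print(len(experiments), "experiments in total")
```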
python tools/count_parameters.py ${CONFIG_FILE}
Compute the hash of the weight file and append the hash id to the filename. The output file is the input file name with a hash suffix.
python tools/publish_model.py ${WEIGHT_FILE}
Arguments:
- WEIGHT_FILE: The previously extracted backbone weights.
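The renaming step described above can be sketched as follows. This is not the code of tools/publish_model.py: the use of SHA-256, the 8-character suffix, and the helper name `hashed_filename` are all assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def hashed_filename(weight_file, digest_chars=8):
    """Return the input path with a short content hash appended to its stem.

    Assumptions: SHA-256 as the hash and an 8-character suffix.
    """
    path = Path(weight_file)
    sha = hashlib.sha256()
    with path.open("rb") as f:
        # Read in 1 MiB chunks so large checkpoints do not fill memory.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            sha.update(chunk)
    suffix = sha.hexdigest()[:digest_chars]
    return path.with_name(f"{path.stem}-{suffix}{path.suffix}")


# Usage (assuming backbone.pth exists):
#   new_path = hashed_filename("backbone.pth")
#   Path("backbone.pth").rename(new_path)
```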
If you want to make your performance exactly reproducible, please switch on --deterministic to train the final model to be published. Note that this flag will switch off torch.backends.cudnn.benchmark and slow down the training speed.
- Configure data augmentations in the config file.
The augmentations are the same as torchvision.transforms except that torchvision.transforms.RandomApply corresponds to RandomAppliedTrans. Lighting and GaussianBlur are additionally implemented.
img_norm_cfg = dict(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
train_pipeline = [
    dict(type='RandomResizedCrop', size=224),
    dict(type='RandomAppliedTrans',
        transforms=[
            dict(type='GaussianBlur', sigma_min=0.1, sigma_max=2.0, kernel_size=23)],
        p=0.5),
    dict(type='ToTensor'),
    dict(type='Normalize', **img_norm_cfg)
]
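The following sketch shows the behavior expected of a random-apply wrapper such as the one configured with p=0.5 above: the wrapped transforms are either all applied or all skipped. `RandomApplySketch` is a hypothetical stand-in written in plain Python, not the repo's RandomAppliedTrans implementation.

```python
import random


class RandomApplySketch:
    """Apply a list of transforms together with probability p."""

    def __init__(self, transforms, p=0.5, rng=None):
        self.transforms = transforms
        self.p = p
        self.rng = rng or random.Random()

    def __call__(self, img):
        # Skip all wrapped transforms with probability 1 - p.
        if self.rng.random() >= self.p:
            return img
        for transform in self.transforms:
            img = transform(img)
        return img


# Toy "image" and transform so the sketch runs without any image library.
add_one = lambda value: value + 1
maybe_add = RandomApplySketch([add_one], p=0.5, rng=random.Random(0))
print([maybe_add(0) for _ in range(10)])
```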
- Parameter-wise optimization parameters.
You may specify optimization parameters, including lr, momentum and weight_decay, for a certain group of parameters in the config file with paramwise_options. paramwise_options is a dict whose keys are regular expressions and whose values are options. Options include 7 fields: lr, lr_mult, momentum, momentum_mult, weight_decay, weight_decay_mult, lars_exclude (only works with the LARS optimizer).
# this config sets all normalization layers in CNN with weight_decay_mult=0.1,
# and the `head` with `lr_mult=10, momentum=0`.
# Raw strings keep regex escapes such as \d and \A intact.
paramwise_options = {
    r'(bn|gn)(\d+)?.(weight|bias)': dict(weight_decay_mult=0.1),
    r'\Ahead.': dict(lr_mult=10, momentum=0)}
optimizer_cfg = dict(type='SGD', lr=0.01, momentum=0.9,
                     weight_decay=0.0001,
                     paramwise_options=paramwise_options)
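To illustrate how such a dict could be resolved per parameter, here is a sketch under two assumptions: each regular expression is matched against the parameter name with `re.search`, and `*_mult` options multiply the base value from the optimizer config. `resolve_param_options` is a hypothetical helper, and the optimizer builder in the repo may differ in detail.

```python
import re


def resolve_param_options(param_name, base_cfg, paramwise_options):
    """Return the effective lr/momentum/weight_decay for one parameter name."""
    options = {key: base_cfg[key] for key in ("lr", "momentum", "weight_decay")}
    for pattern, overrides in paramwise_options.items():
        if re.search(pattern, param_name) is None:
            continue
        for key, value in overrides.items():
            if key.endswith("_mult"):
                # e.g. weight_decay_mult=0.1 scales the base weight_decay.
                base_key = key[:-len("_mult")]
                options[base_key] = base_cfg[base_key] * value
            else:
                options[key] = value
    return options


paramwise_options = {
    r'(bn|gn)(\d+)?.(weight|bias)': dict(weight_decay_mult=0.1),
    r'\Ahead.': dict(lr_mult=10, momentum=0),
}
base_cfg = dict(lr=0.01, momentum=0.9, weight_decay=0.0001)

# Parameter names below are illustrative examples.
for name in ("backbone.layer1.0.bn1.weight", "backbone.conv1.weight", "head.fc.bias"):
    print(name, resolve_param_options(name, base_cfg, paramwise_options))
```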
- Configure custom hooks in the config file.
The hooks will be called in order. For hook design, please refer to momentum_hook.py as an example.
custom_hooks = [
    dict(type='SAVEHook', ...),
    dict(type='CosineScheduleHook', ...),
]