Getting Started

This page provides basic tutorials about the usage of OpenMixup. For installation instructions, please see Install.

Introduction

Efficiently learning discriminative visual representations that facilitate downstream tasks is one of the fundamental problems in computer vision. Data mixing techniques largely improve the quality of deep neural networks (DNNs) in various scenarios. Since mixup techniques are used as augmentations or auxiliary tasks in a wide range of cases, this repo focuses on mixup-related methods for Supervised, Self- and Semi-Supervised Representation Learning. Thus, we name this repo OpenMixup.

Train existing methods

Note: The default learning rate in config files is for 4 or 8 GPUs. If you use a different number of GPUs, the total batch size changes in proportion, so you have to scale the learning rate following the linear rule new_lr = old_lr * new_ngpus / old_ngpus. We recommend using tools/dist_train.sh even with 1 GPU, since some methods do not support non-distributed training.
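For instance, a minimal sketch of the linear scaling rule (the numbers here are illustrative, not taken from any specific config):

# linear lr scaling: a config tuned for lr=0.1 on 8 GPUs, now run on 4 GPUs
old_lr, old_ngpus, new_ngpus = 0.1, 8, 4
new_lr = old_lr * new_ngpus / old_ngpus  # 0.05 -> set this value in your config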

Train with single/multiple GPUs

bash tools/dist_train.sh ${CONFIG_FILE} ${GPUS} [optional arguments]

Optional arguments are:

  • --resume_from ${CHECKPOINT_FILE}: Resume from a previous checkpoint file. Or you can use --auto_resume to resume from latest.pth automatically.
  • --pretrained ${PRETRAIN_WEIGHTS}: Load pretrained weights for the backbone.
  • --load_checkpoint ${CHECKPOINT_FILE}: Load the whole network from the checkpoint file.
  • --deterministic: Switch on "deterministic" mode which slows down training but the results are reproducible.

An example: run the following command to train ResNet-50 for ImageNet classification; training results (checkpoints, JSON records, and logs) are saved in WORK_DIR=work_dirs/classification/imagenet/resnet/resnet50_rsb_a3_sz160_8xb256_ep100/.

bash tools/dist_train.sh configs/classification/imagenet/resnet/resnet50_rsb_a3_sz160_8xb256_ep100.py 8 --auto_resume

Note: During training, checkpoints and logs are saved in the same folder structure as the config file under work_dirs/. A custom work directory is not recommended, since the evaluation scripts infer work directories from the config file name. If you want to save your weights somewhere else, please use a symlink, for example:

ln -s /lisiyuan/source/OPENMIXUP_WORKDIRS ${OPENMIXUP}/work_dirs

Alternatively, if you run OpenMixup on a cluster managed with slurm:

SRUN_ARGS="${SRUN_ARGS}" bash tools/srun_train.sh ${PARTITION} ${CONFIG_FILE} ${GPUS} [optional arguments]

An example: Run the following command to train ResNet-50 for self-supervised learning.

SRUN_ARGS="-w xx.xx.xx.xx" bash tools/srun_train.sh Dummy configs/selfsup/mocov2/imagenet/r50_4xb64_cos_fp16_ep200.py 4 --resume_from work_dirs/selfsup/mocov2/imagenet/r50_4xb64_cos_fp16_ep200/latest.pth

Train with multiple machines

If you launch with multiple machines simply connected with ethernet, you have to modify tools/dist_train.sh or create a new script; please refer to the PyTorch Launch utility. This is usually slow unless you have high-speed networking like InfiniBand.
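For reference, here is a hedged sketch of a two-node launch with torchrun; the IP address is a placeholder, and it is assumed that tools/train.py accepts a --launcher pytorch flag (as when it is wrapped by tools/dist_train.sh):

# on the first machine (node rank 0); 192.168.1.1 stands in for the rank-0 node's IP
torchrun --nnodes=2 --node_rank=0 --master_addr=192.168.1.1 --master_port=29500 \
    --nproc_per_node=8 tools/train.py ${CONFIG_FILE} --launcher pytorch
# on the second machine, run the same command with --node_rank=1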

If you launch with slurm, the command is the same as that on a single machine described above. You only need to change ${GPUS}, e.g., to 16 for two 8-GPU machines.
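For example, the following (partition and config are placeholders) requests 16 GPUs spread over two 8-GPU nodes:

GPUS_PER_NODE=8 bash tools/srun_train.sh ${PARTITION} ${CONFIG_FILE} 16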

Launch multiple jobs on a single machine

If you launch multiple jobs on a single machine, e.g., 2 jobs of 4-GPU training on a machine with 8 GPUs, you need to specify different ports (29500 by default) for each job to avoid communication conflicts. No matter how many GPUs you use (e.g., 1-8 GPUs), we recommend distributed training with torch.distributed.launch or torchrun rather than plain python (single_train.sh has been deprecated).

If you use dist_train.sh to launch training jobs:

CUDA_VISIBLE_DEVICES=0,1,2,3 PORT=29500 bash tools/dist_train.sh ${CONFIG_FILE} 4
CUDA_VISIBLE_DEVICES=4,5,6,7 PORT=29501 bash tools/dist_train.sh ${CONFIG_FILE} 4

For example, you can run the script below to train a ResNet-18 classifier on CIFAR-100 with 1 GPU:

CUDA_VISIBLE_DEVICES=0 PORT=29500 bash tools/dist_train.sh configs/classification/cifar100/mixups/basic/r18_mixups_CE_none.py 1

or you can run the script below to train a ResNet-50 classifier on ImageNet with 4 GPUs:

CUDA_VISIBLE_DEVICES=0,1,2,3 PORT=29500 bash tools/dist_train.sh configs/classification/imagenet/resnet/resnet50_4xb64_cos_ep100.py 4

If you launch training jobs with slurm:

GPUS_PER_NODE=4 bash tools/srun_train.sh ${PARTITION} ${CONFIG_FILE} 4 --port 29500
GPUS_PER_NODE=4 bash tools/srun_train.sh ${PARTITION} ${CONFIG_FILE} 4 --port 29501

Gradient Accumulation

If you do not have enough GPUs to launch large training jobs, we recommend gradient accumulation. Assuming that you only have 1 GPU that can hold 64 images in a batch, while you expect the batch size to be 256, you may add the following line to your config file. It performs a network update every 4 iterations, so the equivalent batch size is 256. Of course, it is about 4x slower than using 4 GPUs. Note that this workaround is not applicable to methods like SimCLR that require intra-batch communication.

optimizer_config = dict(update_interval=4)

Mixed Precision Training

We support mmcv and Apex for Mixed Precision Training. If you want to use Mixed Precision Training, add the following to the config file.

use_fp16 = True
fp16 = dict(type='mmcv', loss_scale='dynamic')

You can choose the FP16 type from 'apex' or 'mmcv'. We recommend using 'mmcv' with PyTorch 1.6 or higher for faster training, and 'apex' with lower PyTorch versions. An example of the RSB A3 setting:

bash tools/dist_train.sh configs/classification/imagenet/mixups/rsb_a3/r50/r18_rsb_a3_CE_sigm_mix0_1_cut1_0_sz160_bs2048_fp16_ep100.py 4
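If you have to fall back to Apex with an older PyTorch, a hedged sketch of the corresponding config (assuming the 'apex' type accepts the same dict form as the 'mmcv' one above) would be:

use_fp16 = True
fp16 = dict(type='apex', loss_scale='dynamic')  # assumed to mirror the 'mmcv' form above; check the repo's fp16 docs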

Speeding Up IO

1. Prefetching data helps to speed up IO and makes better use of CUDA stream parallelization. If you want to use it, you can activate it in the config file (disabled by default) and remove ToTensor and Normalize from 'train_pipeline'; the costly ToTensor operation is reimplemented along with prefetch.

prefetch = True
if not prefetch:
    train_pipeline.extend([dict(type='ToTensor'), dict(type='Normalize', **img_norm_cfg)])

2. Replace Pillow with Pillow-SIMD (https://github.com/uploadcare/pillow-simd) to make use of SIMD instruction sets on modern CPUs.

pip uninstall pillow
pip install Pillow-SIMD
# or, if AVX2 is available:
CC="cc -mavx2" pip install -U --force-reinstall pillow-simd

We tested it with MoCo V2 using a total batch size of 256 on Tesla V100 GPUs; the training time per step decreased from 0.23s to 0.17s.

(back to top)

Benchmarks

We provide several standard benchmarks to evaluate representation learning (supervised and self-supervised pre-trained models); please refer to Benchmarks for details. We do NOT recommend changing the evaluation config files or scripts mentioned there if you use this repo in your publications, so that all methods are compared fairly.

Tools and Tips

Generate fast config files

If you want to adjust some parts of a basic config file (e.g., for ablation studies or hyper-parameter tuning), we provide a ConfigGenerator in the config folder of each method. For example, if you want to train {'Mixup', 'CutMix'} with alpha in {0.2, 1.0} for {100, 300} epochs on ImageNet-1k based on the PyTorch-style settings in configs/classification/imagenet/mixups/basic/r50_mixups_CE_none.py, you can modify auto_train_in_mixups.py and run

python configs/classification/imagenet/mixups/auto_train_in_mixups.py

It will generate eight config files and a bash file r50_mixups_CE_none_xxxx.sh. You can adjust the GPUs and PORT settings and execute this bash file to run the eight experiments automatically.
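For orientation, below is a hedged sketch of the kind of sweep such a script encodes; the ConfigGenerator import path and argument names are assumptions, so follow the auto_train_in_mixups.py shipped in the repo for the exact interface:

# hypothetical sketch: 2 mix modes x 2 alphas x 2 epoch settings = 8 generated configs
from openmixup.utils import ConfigGenerator  # assumed import path

base_path = "configs/classification/imagenet/mixups/basic/r50_mixups_CE_none.py"
model_var = {'model.mix_mode': ["mixup", "cutmix"]}               # ablated model settings
gm_var = {'model.alpha': [0.2, 1.0], 'total_epochs': [100, 300]}  # grid-searched settings
abbs = {'total_epochs': 'ep'}                                     # abbreviations for file names

generator = ConfigGenerator(base_path=base_path, num_device=8)    # assumed signature
generator.generate(model_var, gm_var, abbs)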

Count number of parameters

python tools/count_parameters.py ${CONFIG_FILE}

Publish a model

Compute the hash of the weight file and append the hash id to the filename. The output file is the input file name with a hash suffix.

python tools/publish_model.py ${WEIGHT_FILE}

Arguments:

  • WEIGHT_FILE: The backbone weights extracted as described above.
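A hypothetical invocation (the input path and the hash suffix in the output name are illustrative only):

python tools/publish_model.py work_dirs/my_exp/backbone_extracted.pth
# e.g., produces work_dirs/my_exp/backbone_extracted-abcd1234.pth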

Reproducibility

If you want to make your performance exactly reproducible, please switch on --deterministic to train the final model to be published. Note that this flag will switch off torch.backends.cudnn.benchmark and slow down the training speed.
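For example, append the flag to the usual distributed launch:

bash tools/dist_train.sh ${CONFIG_FILE} ${GPUS} --deterministic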

Convenient Features

  • Configure data augmentations in the config file.

The augmentations are the same as torchvision.transforms, except that torchvision.transforms.RandomApply corresponds to RandomAppliedTrans. Lighting and GaussianBlur are additionally implemented.

img_norm_cfg = dict(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
train_pipeline = [
    dict(type='RandomResizedCrop', size=224),
    dict(type='RandomAppliedTrans',
        transforms=[
            dict(type='GaussianBlur', sigma_min=0.1, sigma_max=2.0, kernel_size=23)],
        p=0.5),
    dict(type='ToTensor'),
    dict(type='Normalize', **img_norm_cfg)
]
  • Parameter-wise optimization parameters.

You may specify optimization parameters including lr, momentum, and weight_decay for a certain group of parameters in the config file with paramwise_options. paramwise_options is a dict whose keys are regular expressions and whose values are option dicts. Options include the following fields: lr, lr_mult, momentum, momentum_mult, weight_decay, weight_decay_mult, and lars_exclude (the last only works with the LARS optimizer).

# this config sets all normalization layers in CNN with weight_decay_mult=0.1,
# and the `head` with `lr_mult=10, momentum=0`.
paramwise_options = {
    '(bn|gn)(\d+)?.(weight|bias)': dict(weight_decay_mult=0.1),
    '\Ahead.': dict(lr_mult=10, momentum=0)}
optimizer_cfg = dict(type='SGD', lr=0.01, momentum=0.9,
                     weight_decay=0.0001,
                     paramwise_options=paramwise_options)
  • Configure custom hooks in the config file.

The hooks will be called in order. For hook design, please refer to momentum_hook.py as an example.

custom_hooks = [
    dict(type='SAVEHook', ...),
    dict(type='CosineScheduleHook', ...),
]

(back to top)