CrossFormer Segmentation

Our semantic segmentation code is developed on top of MMSegmentation v0.12.0.

For more details please refer to our paper CrossFormer: A Versatile Vision Transformer Based on Cross-scale Attention.

Prerequisites

  1. Libraries (Python 3.6-based)
pip3 install mmcv-full==1.2.7 mmsegmentation==0.12.0
  2. Prepare the ADE20K dataset according to the guidelines in MMSegmentation v0.12.0

  3. Prepare pretrained CrossFormer models

import torch
ckpt = torch.load("crossformer-s.pth", map_location="cpu") ## load classification checkpoint on CPU, so no GPU is needed for this step
torch.save(ckpt["model"], "backbone-corssformer-s.pth") ## only model weights are needed
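The conversion above keeps only the entry stored under the "model" key. A minimal sketch of the same step as a reusable helper is shown below; the helper name `extract_backbone_weights` and the toy checkpoint (including its "epoch" key) are hypothetical and only illustrate the idea with plain dictionaries, so it runs without PyTorch.

```python
def extract_backbone_weights(ckpt):
    """Return the bare weights from a classification checkpoint.

    Assumption: a classification checkpoint is a dict that stores the
    weights under the "model" key, as in the snippet above. If the key is
    absent, the input is assumed to already be a bare state dict.
    """
    if isinstance(ckpt, dict) and "model" in ckpt:
        return ckpt["model"]
    return ckpt


# Toy stand-in for a real checkpoint (hypothetical keys and values).
toy_ckpt = {"model": {"layer.weight": [1.0, 2.0]}, "epoch": 300}
weights = extract_backbone_weights(toy_ckpt)
print(sorted(weights))  # ['layer.weight']

# Applying the helper to an already-converted file leaves it unchanged.
assert extract_backbone_weights(weights) is weights
```

With PyTorch installed, the result of `torch.load(...)` can be passed through this helper before `torch.save(...)`, which makes the conversion safe to run twice on the same file.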

Getting Started

  1. Modify data_root in configs/_base_/datasets/ade20k.py and configs/_base_/datasets/ade20k_swin.py to your path to the ADE20K dataset.

  2. Training

## Use config in Results table listed below as <CONFIG_FILE>
./dist_train.sh <CONFIG_FILE> <GPUS> <PRETRAIN_MODEL>

## e.g. train fpn_crossformer_b model with 8 GPUs
./dist_train.sh configs/fpn_crossformer_b_ade20k_40k.py 8 path/to/backbone-corssformer-s.pth
  3. Inference
./dist_test.sh <CONFIG_FILE> <GPUS> <SEG_CHECKPOINT_FILE>

## e.g. evaluate semantic segmentation model by mIoU
./dist_test.sh configs/fpn_crossformer_b_ade20k_40k.py 8 path/to/ckpt

Note: We use single-scale testing by default. You can enable multi-scale testing or flip testing manually by following the instructions in configs/_base_/datasets/ade20k[_swin].py.
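As a rough illustration of what enabling multi-scale and flip testing involves, the sketch below follows the test pipeline layout of the stock MMSegmentation v0.12.0 ADE20K dataset config. The copy shipped in this repository may differ, so treat the concrete values (`img_scale`, `img_ratios`, normalization constants) as assumptions and follow the comments in the actual config files.

```python
# Normalization constants as used in stock MMSegmentation configs (assumed
# to be unchanged in this repository).
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)

test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(2048, 512),
        # Multi-scale testing: evaluate at several ratios of img_scale.
        # Omitting img_ratios gives single-scale testing.
        img_ratios=[0.5, 0.75, 1.0, 1.25, 1.5, 1.75],
        # Flip testing: also evaluate on horizontally flipped inputs.
        flip=True,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img']),
        ]),
]
```

Multi-scale testing multiplies inference cost by roughly the number of ratios (and by two again with flipping), which is why single-scale testing is the default.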

Results

Semantic FPN

| Backbone | Iterations | Params | FLOPs | IOU | config | Models |
| --- | --- | --- | --- | --- | --- | --- |
| PVT-M | 80K | 48.0M | 219.0G | 41.6 | - | - |
| CrossFormer-S | 80K | 34.3M | 209.8G | 46.4 | config | Google Drive/BaiduCloud, key: sn5h |
| PVT-L | 80K | 65.1M | 283.0G | 42.1 | - | - |
| Swin-S | 80K | 53.2M | 274.0G | 45.2 | - | - |
| CrossFormer-B | 80K | 55.6M | 320.1G | 48.0 | config | Google Drive/BaiduCloud, key: joi5 |
| CrossFormer-L | 80K | 95.4M | 482.7G | 49.1 | config | Google Drive/BaiduCloud, key: 6v5d |

UPerNet

| Backbone | Iterations | Params | FLOPs | IOU | MS IOU | config | Models |
| --- | --- | --- | --- | --- | --- | --- | --- |
| ResNet-101 | 160K | 86.0M | 1029.0G | 44.9 | - | - | - |
| Swin-T | 160K | 60.0M | 945.0G | 44.5 | 45.8 | - | - |
| CrossFormer-S | 160K | 62.3M | 979.5G | 47.6 | 48.4 | config | Google Drive/BaiduCloud, key: wesb |
| Swin-S | 160K | 81.0M | 1038.0G | 47.6 | 49.5 | - | - |
| CrossFormer-B | 160K | 83.6M | 1089.7G | 49.7 | 50.6 | config | Google Drive/BaiduCloud, key: j061 |
| Swin-B | 160K | 121.0M | 1088.0G | 48.1 | 49.7 | - | - |
| CrossFormer-L | 160K | 125.5M | 1257.8G | 50.4 | 51.4 | config | Google Drive/BaiduCloud, key: 17ks |

Notes:

  • MS IOU means IOU with multi-scale testing.
  • Models are trained on ADE20K. Backbones are initialized with weights pre-trained on ImageNet-1K.
  • For Semantic FPN, models are trained for 80K iterations with batch size 16. For UPerNet, models are trained for 160K iterations.
  • More detailed training settings can be found in the corresponding configs.
  • More results can be seen in our paper.
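The IOU columns are presumably mean intersection-over-union (mIoU) in percent, the metric named in the inference example. For readers unfamiliar with it, here is a minimal pure-Python sketch of the standard definition on flattened label lists. It is not the evaluation code used by this repository; the function name `mean_iou` and the `ignore_index=255` default (a common convention for unlabeled pixels) are assumptions.

```python
def mean_iou(pred, target, num_classes, ignore_index=255):
    """Mean IoU over the classes that occur in pred or target.

    pred and target are equal-length sequences of integer class labels.
    Pixels whose target equals ignore_index are skipped (assumed convention).
    """
    intersection = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue
        if p == t:
            intersection[t] += 1
            union[t] += 1
        else:
            # A mismatch grows the union of both the predicted class
            # and the ground-truth class.
            union[t] += 1
            union[p] += 1
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious)


# Class 0: IoU = 1/2, class 1: IoU = 2/3, so the mean is 7/12.
score = mean_iou(pred=[0, 0, 1, 1], target=[0, 1, 1, 1], num_classes=2)
print(round(100 * score, 1))  # 58.3
```

Averaging per-class IoU rather than per-pixel accuracy keeps rare classes from being swamped by frequent ones, which matters on a dataset with many classes such as ADE20K.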

FLOPs and Params Calculation

Use get_flops.py to calculate the FLOPs and number of parameters of the specified model.

python get_flops.py <CONFIG_FILE> --shape <height> <width>

## e.g. get FLOPs and #params of fpn_crossformer_b with input image size [1024, 1024]
python get_flops.py configs/fpn_crossformer_b_ade20k_40k.py --shape 1024 1024

Note: The default input image size is [1024, 1024]. To calculate with a different input image size, change <height> <width> in the command above and update img_size in crossformer_factory.py to match.
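For a quick cross-check of a parameter count without running get_flops.py, the number of parameters can be summed from tensor shapes alone. The sketch below is independent of how get_flops.py works internally; the names `count_params` and `format_millions` and the toy layer shapes are hypothetical.

```python
from functools import reduce
from operator import mul


def count_params(shapes):
    """Sum the element counts of all tensors in a name -> shape mapping."""
    return sum(reduce(mul, shape, 1) for shape in shapes.values())


def format_millions(n):
    """Format a raw count in the 'M' style used in the Results tables."""
    return "{:.1f}M".format(n / 1e6)


# Toy shapes for illustration only, not taken from any CrossFormer model.
toy_shapes = {
    "conv.weight": (64, 3, 7, 7),
    "conv.bias": (64,),
    "fc.weight": (1000, 512),
    "fc.bias": (1000,),
}
total = count_params(toy_shapes)
print(total, format_millions(total))  # 522472 0.5M
```

With PyTorch, the mapping can be built from a state dict as `{k: tuple(v.shape) for k, v in state_dict.items()}`. Note that a backbone-only checkpoint will give a smaller count than the Params column if, as is likely, that column covers the full segmentation model including the decode head.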

Citing Us

@inproceedings{wang2021crossformer,
  title = {CrossFormer: A Versatile Vision Transformer Hinging on Cross-scale Attention},
  author = {Wang, Wenxiao and Yao, Lu and Chen, Long and Lin, Binbin and Cai, Deng and He, Xiaofei and Liu, Wei},
  booktitle = {International Conference on Learning Representations, {ICLR}},
  url = {https://openreview.net/forum?id=_PHymLIxuI},
  year = {2022}
}