STMask

This repository contains the code for our CVPR 2021 paper:

STMask: Spatial Feature Calibration and Temporal Fusion for Effective One-stage Video Instance Segmentation

News

[27/06/2021] Important: our previous results on the YTVIS2021 and OVIS datasets were obtained with bounding boxes that had been normalized by mistake in the function bbox_feat_extractor() of track_to_segmetn_head.py. The bounding boxes in bbox_feat_extractor() should not be normalized. We have updated the results and trained models for YTVIS2021 and OVIS accordingly, and we apologize for the oversight.
[12/06/2021] Added a fix for the error in deform_conv_cuda.cu.
[22/04/2021] Added experimental results on the YTVIS2021 and OVIS datasets.
[14/04/2021] Released the code on GitHub and the paper on arXiv.
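To illustrate the coordinate issue in the 27/06/2021 note: RoI feature extractors such as torchvision.ops.roi_align take boxes as absolute (x1, y1, x2, y2) coordinates and map them onto the feature map through spatial_scale, so boxes normalized to [0, 1] end up selecting a tiny region near the origin. The sketch below is a hypothetical plain-Python illustration. The helper names, the 8-pixel stride and the box layout are assumptions, and it is not the code of bbox_feat_extractor().

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def denormalize_boxes(boxes: List[Box], img_h: int, img_w: int) -> List[Box]:
    """Map boxes from normalized [0, 1] coordinates to absolute pixel coordinates."""
    return [(x1 * img_w, y1 * img_h, x2 * img_w, y2 * img_h) for x1, y1, x2, y2 in boxes]


def to_feature_coords(boxes: List[Box], stride: int) -> List[Box]:
    """Project pixel-space boxes onto a feature map with the given stride
    (the same scaling as roi_align's spatial_scale = 1 / stride)."""
    s = 1.0 / stride
    return [(x1 * s, y1 * s, x2 * s, y2 * s) for x1, y1, x2, y2 in boxes]


if __name__ == "__main__":
    normalized = [(0.25, 0.5, 0.75, 1.0)]
    pixel = denormalize_boxes(normalized, img_h=360, img_w=640)
    print(pixel)                               # [(160.0, 180.0, 480.0, 360.0)]
    print(to_feature_coords(pixel, stride=8))  # [(20.0, 22.5, 60.0, 45.0)]
    # Scaling `normalized` directly would give a box smaller than one feature cell.
```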

Installation

Clone this repository and enter it:

```
git clone https://github.com/MinghanLi/STMask.git
cd STMask
```

Set up the environment using one of the following methods:
- Using Anaconda
  - Run conda env create -f environment.yml
  - Activate it: conda activate STMask-env
- Manually with pip
  - Set up a Python 3 environment.
  - Install PyTorch 1.0.1 (or higher) and torchvision.
  - Install some other packages:
```
# Cython needs to be installed before pycocotools
pip install cython
pip install opencv-python pillow pycocotools matplotlib 
```

Install mmcv and mmdet

Install mmcv or mmcv-full from here, choosing the build that matches your CUDA and PyTorch versions. The command below is for CUDA 10.1 and PyTorch 1.5.0:
```
pip install mmcv-full==1.1.2 -f https://download.openmmlab.com/mmcv/dist/cu101/torch1.5.0/index.html
```
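The find-links URL above encodes the CUDA and PyTorch versions. The following is a small sketch that builds the same command for other combinations, assuming the cu<XY>/torch<version> pattern seen above also holds for your versions (the pattern may differ for other mmcv releases, so check the mmcv installation page to confirm a matching build exists). The function names are hypothetical.

```python
def mmcv_find_links_url(cuda_version: str, torch_version: str) -> str:
    """Build an OpenMMLab find-links URL following the pattern used above:
    https://download.openmmlab.com/mmcv/dist/cu<XY>/torch<version>/index.html
    """
    cuda_tag = "cu" + cuda_version.replace(".", "")  # "10.1" -> "cu101"
    torch_tag = torch_version.split("+")[0]          # "1.5.0+cu101" -> "1.5.0"
    return f"https://download.openmmlab.com/mmcv/dist/{cuda_tag}/torch{torch_tag}/index.html"


def mmcv_install_command(mmcv_version: str, cuda_version: str, torch_version: str) -> str:
    """Return the pip command for a given mmcv-full version and CUDA/PyTorch pair."""
    url = mmcv_find_links_url(cuda_version, torch_version)
    return f"pip install mmcv-full=={mmcv_version} -f {url}"


if __name__ == "__main__":
    # Reproduces the command used in this README (CUDA 10.1, PyTorch 1.5.0).
    print(mmcv_install_command("1.1.2", "10.1", "1.5.0"))
```

In practice the two versions can be read from torch.version.cuda and torch.__version__; the helper strips a local suffix such as +cu101 from the latter.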

Install cocoapi and a customized COCO API for the YouTubeVIS dataset from here:

```
pip install "git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI"
git clone https://github.com/youtubevos/cocoapi
cd cocoapi/PythonAPI
# To compile and install locally
python setup.py build_ext --inplace
# To install library to Python site-packages
python setup.py build_ext install
```

Install spatial-correlation-sampler
```
pip install spatial-correlation-sampler
```
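Before compiling DCNv2, a quick sanity check can confirm that the Python packages installed so far are importable. This is a hypothetical helper, not part of the repository. The module names listed (e.g., cv2 for opencv-python, spatial_correlation_sampler for spatial-correlation-sampler) are the usual import names of these pip packages.

```python
import importlib.util
from typing import Dict, Iterable

# Import names of the packages installed in the steps above (assumed).
REQUIRED_MODULES = (
    "torch", "torchvision", "cv2", "PIL", "pycocotools",
    "matplotlib", "mmcv", "spatial_correlation_sampler",
)


def check_modules(names: Iterable[str]) -> Dict[str, bool]:
    """Return {module_name: importable?} by locating top-level modules without importing them."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, ok in check_modules(REQUIRED_MODULES).items():
        print(f"{name:30s} {'ok' if ok else 'MISSING'}")
```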
Compile DCNv2 (see Installation)
- Download the code for deformable convolutional layers from here:
```
git clone https://github.com/CharlesShang/DCNv2.git
cd DCNv2
python setup.py build develop
```

Modify mmcv/ops/deform_conv.py so that deformable convolutions whose kernel height and width differ (e.g., 3 * 5), as used in FCB(ali) and FCB(ada), are handled correctly.

Open the file deform_conv.py:

```
vim /your_conda_env_path/mmcv/ops/deform_conv.py
```
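If you are not sure where mmcv is installed in your environment, Python can locate the package directory for you. The following is a minimal sketch using importlib; the helper name is hypothetical, and it only resolves the path without editing anything.

```python
import importlib.util
from pathlib import Path


def installed_package_file(package: str, *parts: str) -> Path:
    """Return the path of a file inside an installed package, without importing the package."""
    spec = importlib.util.find_spec(package)
    if spec is None or not spec.submodule_search_locations:
        raise ModuleNotFoundError(f"{package!r} is not an installed package")
    package_dir = Path(next(iter(spec.submodule_search_locations)))
    return package_dir.joinpath(*parts)


if __name__ == "__main__":
    try:
        print(installed_package_file("mmcv", "ops", "deform_conv.py"))
    except ModuleNotFoundError as err:
        print(err)
```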

Replace padW=ctx.padding[1], padH=ctx.padding[0] with padW=ctx.padding[0], padH=ctx.padding[1]. As an example, lines 81-89 should read as follows after the change:

```
ext_module.deform_conv_forward(
    input,
    weight,
    offset,
    output,
    ctx.bufs_[0],
    ctx.bufs_[1],
    kW=weight.size(3),
    kH=weight.size(2),
    dW=ctx.stride[1],
    dH=ctx.stride[0],
    padW=ctx.padding[0],  # was ctx.padding[1]
    padH=ctx.padding[1],  # was ctx.padding[0]
    dilationW=ctx.dilation[1],
    dilationH=ctx.dilation[0],
    group=ctx.groups,
    deformable_group=ctx.deform_groups,
    im2col_step=cur_im2col_step)
```
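Why the padding order matters for non-square kernels: "same" padding for a kH×kW kernel at stride 1 is (kH // 2, kW // 2), which for a 3×5 kernel gives different height and width paddings, so mixing them up changes the output size. The sketch below applies the standard convolution output-size formula to a hypothetical 45×80 feature map (stride 1, dilation 1). It only illustrates the arithmetic; it does not call mmcv or reproduce its CUDA kernel.

```python
from typing import Tuple


def conv_out_size(size: int, kernel: int, pad: int, stride: int = 1, dilation: int = 1) -> int:
    """Standard convolution output length along one spatial dimension."""
    return (size + 2 * pad - dilation * (kernel - 1) - 1) // stride + 1


def conv_out_hw(hw: Tuple[int, int], kernel_hw: Tuple[int, int], pad_hw: Tuple[int, int]) -> Tuple[int, int]:
    """Output (height, width) for stride 1 and dilation 1."""
    return (conv_out_size(hw[0], kernel_hw[0], pad_hw[0]),
            conv_out_size(hw[1], kernel_hw[1], pad_hw[1]))


if __name__ == "__main__":
    kernel = (3, 5)                              # non-square kernel: kH=3, kW=5
    same_pad = (kernel[0] // 2, kernel[1] // 2)  # (padH, padW) = (1, 2)
    print(conv_out_hw((45, 80), kernel, same_pad))        # (45, 80): spatial size preserved
    print(conv_out_hw((45, 80), kernel, same_pad[::-1]))  # (47, 78): H and W padding swapped
```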

Dataset

If you'd like to train STMask, please download the datasets from their official websites: YTVIS2019, YTVIS2021 and OVIS.

Evaluation

All VIS benchmarks here use an input size of 360×640.

Quantitative Results on YTVIS2019 (trained for 12 epochs)

Here are our STMask models (released in April 2021), along with their FPS on a 2080 Ti and their mAP on the validation set, where mAP and mAP* are obtained with cross-class fast NMS and fast NMS, respectively. Note that FCB(ali) and FCB(ada) are applied only to the classification branch.

Backbone	FCA	FCB	TF	FPS	mAP	mAP*	Weights
R50-DCN-FPN	FCA	-	TF	29.3	32.6	33.4	STMask_plus_resnet50.pth
R50-DCN-FPN	FCA	FCB(ali)	TF	27.8	-	32.1	STMask_plus_resnet50_ali.pth
R50-DCN-FPN	FCA	FCB(ada)	TF	28.6	32.8	33.0	STMask_plus_resnet50_ada.pth
R101-DCN-FPN	FCA	-	TF	24.5	36.0	36.3	STMask_plus_base.pth
R101-DCN-FPN	FCA	FCB(ali)	TF	22.1	36.3	37.1	STMask_plus_base_ali.pth
R101-DCN-FPN	FCA	FCB(ada)	TF	23.4	36.8	37.9	STMask_plus_base_ada.pth
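For readers unfamiliar with the two NMS variants behind mAP and mAP* above: Fast NMS, introduced in YOLACT, sorts detections by score and discards any detection whose IoU with some higher-scoring detection exceeds a threshold, without the sequential removal of greedy NMS. The cross-class variant applies this to all detections regardless of class, while plain fast NMS is assumed here to run per class. The following plain-Python sketch shows these semantics on boxes only; the code base itself works on tensors and may differ in details such as thresholds and top-k limits, and the function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def fast_nms(boxes: Sequence[Box], scores: Sequence[float], iou_thr: float = 0.5) -> List[int]:
    """Keep a box if its IoU with every higher-scoring box is <= iou_thr.

    Unlike greedy NMS, a box that is itself suppressed can still suppress
    lower-scoring boxes, which lets the check run as one matrix operation.
    Returns kept indices in descending score order.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for rank, j in enumerate(order):
        max_iou = max((iou(boxes[i], boxes[j]) for i in order[:rank]), default=0.0)
        if max_iou <= iou_thr:
            keep.append(j)
    return keep


def per_class_fast_nms(boxes: Sequence[Box], scores: Sequence[float],
                       labels: Sequence[int], iou_thr: float = 0.5) -> List[int]:
    """Fast NMS run separately within each class: different classes never suppress each other."""
    by_class: Dict[int, List[int]] = {}
    for i, label in enumerate(labels):
        by_class.setdefault(label, []).append(i)
    keep: List[int] = []
    for idx in by_class.values():
        kept = fast_nms([boxes[i] for i in idx], [scores[i] for i in idx], iou_thr)
        keep.extend(idx[k] for k in kept)
    return sorted(keep, key=lambda i: scores[i], reverse=True)


def cross_class_fast_nms(boxes: Sequence[Box], scores: Sequence[float], iou_thr: float = 0.5) -> List[int]:
    """Cross-class Fast NMS: labels are ignored, so overlapping boxes suppress each other."""
    return fast_nms(boxes, scores, iou_thr)


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 0, 11, 10), (20, 20, 30, 30)]
    scores = [0.9, 0.8, 0.7]
    labels = [0, 1, 0]  # boxes 0 and 1 overlap (IoU ~0.82) but have different classes
    print(per_class_fast_nms(boxes, scores, labels))  # [0, 1, 2]
    print(cross_class_fast_nms(boxes, scores))        # [0, 2]
```

With a chain of boxes A, B, C where A overlaps B and B overlaps C (but A and C barely overlap), Fast NMS keeps only A, whereas greedy NMS would keep A and C.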

Quantitative Results on YTVIS2021 (trained for 12 epochs)

Backbone	FCA	FCB	TF	mAP*	Weights	Results
R50-DCN-FPN	FCA	-	TF	30.6	STMask_plus_resnet50_YTVIS2021.pth	-
R50-DCN-FPN	FCA	FCB(ada)	TF	31.1	STMask_plus_resnet50_ada_YTVIS2021.pth	stdout.txt
R101-DCN-FPN	FCA	-	TF	33.7	STMask_plus_base_YTVIS2021.pth	-
R101-DCN-FPN	FCA	FCB(ada)	TF	34.6	STMask_plus_base_ada_YTVIS2021.pth	stdout.txt

Quantitative Results on OVIS (trained for 20 epochs)

Backbone	FCA	FCB	TF	mAP*	Weights	Results
R50-DCN-FPN	FCA	-	TF	15.4	STMask_plus_resnet50_OVIS.pth	-
R50-DCN-FPN	FCA	FCB(ada)	TF	15.4	STMask_plus_resnet50_ada_OVIS.pth	stdout.txt
R101-DCN-FPN	FCA	-	TF	17.3	STMask_plus_base_OVIS.pth	stdout.txt
R101-DCN-FPN	FCA	FCB(ada)	TF	15.8	STMask_plus_base_ada_OVIS.pth	-

To evaluate a model, put the corresponding weights file in the ./weights directory and run one of the commands in the Inference section. The config name is everything in the weights file name before the trailing epoch/iteration numbers, if any (e.g., STMask_plus_base for STMask_plus_base.pth or STMask_plus_base_10_32100.pth), followed by _config, as in --config=STMask_plus_base_config. All STMask models here are trained starting from yolact_plus_base_54_80000.pth or yolact_plus_resnet_54_80000.pth from Yolact++ here.
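The config naming rule can be sketched as a small helper. This is hypothetical: it assumes weights files are named <config>.pth or <config>_<epoch>_<iter>.pth (optionally ending in _interrupt), and that the value passed to --config is that base name with _config appended, as in the Inference and Training commands.

```python
import re
from pathlib import Path

# Trailing "_<epoch>_<iter>", optionally followed by "_interrupt" (assumed naming).
_CKPT_SUFFIX = re.compile(r"_\d+_\d+(?:_interrupt)?$")


def config_from_weights(weights_path: str) -> str:
    """Guess the --config value for a weights file."""
    stem = Path(weights_path).stem     # drop the directory and the ".pth" extension
    base = _CKPT_SUFFIX.sub("", stem)  # drop trailing epoch/iteration numbers, if any
    return f"{base}_config"


if __name__ == "__main__":
    print(config_from_weights("weights/STMask_plus_base_ada.pth"))       # STMask_plus_base_ada_config
    print(config_from_weights("weights/STMask_plus_base_10_32100.pth"))  # STMask_plus_base_config
```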

Quantitative Results on COCO

We also provide quantitative results of Yolact++ with our proposed feature calibration for anchors (FCA) and boxes (FCB) on COCO, without the temporal fusion module. The results below are on the COCO validation set.

Image Size	Backbone	FCA	FCB	Box AP	Mask AP	Weights
[550,550]	R50-DCN-FPN	FCA	-	34.5	32.9	yolact_plus_resnet50_54.pth
[550,550]	R50-DCN-FPN	FCA	FCB(ali)	34.6	33.3	yolact_plus_resnet50_ali_54.pth
[550,550]	R50-DCN-FPN	FCA	FCB(ada)	34.7	33.2	yolact_plus_resnet50_ada_54.pth
[550,550]	R101-DCN-FPN	FCA	-	35.7	33.3	yolact_plus_base_54.pth
[550,550]	R101-DCN-FPN	FCA	FCB(ali)	35.6	34.1	yolact_plus_base_ali_54.pth
[550,550]	R101-DCN-FPN	FCA	FCB(ada)	36.4	34.8	yolact_plus_baseada_54.pth

Inference

```
# Output a YTVOSEval JSON file to submit to the evaluation website.
# This command creates './weights/results.json' for instance segmentation.
python eval.py --config=STMask_plus_base_ada_config --trained_model=weights/STMask_plus_base_ada.pth --mask_det_file=weights/results.json

# Visualize the segmentation results.
python eval.py --config=STMask_plus_base_ada_config --trained_model=weights/STMask_plus_base_ada.pth --mask_det_file=weights/results.json --display
```
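To sanity-check results.json before submitting, a short script can summarize it. This sketch assumes the file follows the YouTube-VIS results format, i.e. a JSON list of predictions, each with video_id, category_id, score and a per-frame segmentations list; adjust the keys if your file differs. The function names and the script name in the usage comment are hypothetical.

```python
import json
import sys
from collections import Counter
from typing import Any, Dict, List

# Keys of one prediction in the YouTube-VIS results format (assumed).
REQUIRED_KEYS = {"video_id", "category_id", "score", "segmentations"}


def summarize_predictions(preds: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Count predictions per video and flag entries that miss a required key."""
    per_video = Counter(p["video_id"] for p in preds if "video_id" in p)
    return {
        "num_predictions": len(preds),
        "num_videos": len(per_video),
        "max_per_video": max(per_video.values(), default=0),
        "malformed_indices": [i for i, p in enumerate(preds) if not REQUIRED_KEYS.issubset(p)],
    }


def summarize_results(path: str) -> Dict[str, Any]:
    """Load a results.json file and summarize it."""
    with open(path) as f:
        return summarize_predictions(json.load(f))


if __name__ == "__main__":
    # Usage (hypothetical script name): python summarize_results.py weights/results.json
    if len(sys.argv) > 1:
        print(summarize_results(sys.argv[1]))
```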

Training

By default, we train on the YTVIS2019 dataset. Make sure you have downloaded the entire dataset (see Dataset).

To train, download a COCO-pretrained model and put it in ./weights.
- [Yolact++]: For ResNet-50 or ResNet-101, download yolact_plus_resnet_54_80000.pth or yolact_plus_base_54_80000.pth, respectively, from Yolact++ here.
- [Yolact++ & FC]: Alternatively, you can start from the Yolact++ models with feature calibration (FC) listed in Table 2 (Quantitative Results on COCO), which can yield relatively higher performance than the plain Yolact++ models.
Run one of the training commands below.
- Note that you can press ctrl+c while training and it will save an *_interrupt.pth file at the current iteration.
- All weights are saved in the ./weights directory by default with the file name <config>_<epoch>_<iter>.pth.

```
# Train STMask_plus_base_config with a batch size of 8 on GPUs 0 and 1.
CUDA_VISIBLE_DEVICES=0,1 python train.py --config=STMask_plus_base_config --batch_size=8 --lr=1e-4 --save_folder=weights/weights_r101

# Resume training STMask_plus_base from a specific weight file, starting from the iteration given in its file name.
CUDA_VISIBLE_DEVICES=0,1 python train.py --config=STMask_plus_base_config --resume=weights/STMask_plus_base_10_32100.pth
```
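When resuming, you usually want the most recent checkpoint for a given config. Assuming checkpoints follow the <config>_<epoch>_<iter>.pth pattern described above, and that the *_interrupt.pth files saved on Ctrl+C are named <config>_<epoch>_<iter>_interrupt.pth (an assumption), a hypothetical helper could pick the one with the highest epoch and iteration to pass to --resume:

```python
import re
from pathlib import Path
from typing import Iterable, Optional


def pick_latest(names: Iterable[str], config_base: str) -> Optional[str]:
    """Return the file name with the highest (epoch, iteration) for config_base, or None."""
    pattern = re.compile(rf"^{re.escape(config_base)}_(\d+)_(\d+)(?:_interrupt)?\.pth$")
    best_key, best_name = None, None
    for name in names:
        m = pattern.match(name)
        if m:
            key = (int(m.group(1)), int(m.group(2)))  # compare numerically, not as strings
            if best_key is None or key > best_key:
                best_key, best_name = key, name
    return best_name


def latest_checkpoint(folder: str, config_base: str) -> Optional[str]:
    """Search a weights folder (e.g. weights/weights_r101) for the latest checkpoint."""
    folder_path = Path(folder)
    if not folder_path.is_dir():
        return None
    name = pick_latest((p.name for p in folder_path.glob("*.pth")), config_base)
    return str(folder_path / name) if name else None


if __name__ == "__main__":
    print(latest_checkpoint("weights", "STMask_plus_base"))
```

Matching the full prefix followed by digits keeps STMask_plus_base from also picking up STMask_plus_base_ada checkpoints.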

Citation

If you use STMask or this code base in your work, please cite:

```
@inproceedings{STMask-CVPR2021,
  author    = {Minghan Li and Shuai Li and Lida Li and Lei Zhang},
  title     = {Spatial Feature Calibration and Temporal Fusion for Effective One-stage Video Instance Segmentation},
  booktitle = {CVPR},
  year      = {2021},
}
```

Contact

For questions about our paper or code, please contact Li Minghan (liminghan0330@gmail.com or minghancs.li@connect.polyu.hk).
