VISTEM : Vision Model Training Template

Train and evaluate all current object detection and segmentation models.

Although existing projects such as Detectron2 and MMDetection implement multiple models, each is missing some vision models. Our goal is to implement all of them within a single project.

Installation

Requirements

  • PyTorch : >=1.5
  • TensorFlow : >=1.15.0
  • CUDA : >=10.2

Setup

git clone https://github.com/major196512/vistem
python -m pip install -e vistem
cd vistem
ln -s $PRETRAINED pretrained_weights
ln -s $DATA data

$PRETRAINED : the directory containing the pretrained backbone network weights (download the pretrained models from here).

$DATA : the directory containing the datasets (see here for more information).

Performance

Pascal VOC

We train our models on 8 GPUs with 16 images per batch.

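| Meta Architecture | Backbone Network | BBox AP | BBox AP50 | BBox AP75 | Config File |
|---|---|---|---|---|---|
| RetinaNet | ResNet-50 with FPN | 55.533 | 81.730 | 60.504 | retinanet_R50_FPN |
| RetinaNet | ResNet-50 with NAS-FPN | In Progress | In Progress | In Progress | retinanet_R50_NASFPN |
| Faster RCNN | ResNet-50 with FPN | 54.282 | 81.827 | 60.048 | faster_R50_FPN |
| Faster RCNN | ResNet-50 with NAS-FPN | In Progress | In Progress | In Progress | faster_R50_NASFPN |
| CornerNet | ResNet-50 with FPN | In Progress | In Progress | In Progress | |
| RepPoints | ResNet-50 with FPN | In Progress | In Progress | In Progress | |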

When training with gradient accumulation, you must set cfg.SOLVER.ACUUMULATE and cfg.SOLVER.IMG_PER_BATCH in the config file. For the table below, we trained our models on 2 GPUs with 4 gradient accumulation steps and 4 images per batch.
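As a rough illustration of what gradient accumulation does, the sketch below averages the gradients of several small batches before a single update, using a toy one-parameter least-squares model in plain Python. It is not taken from this project's code: the function names are hypothetical, and it assumes that the accumulation setting counts the small batches combined per optimizer step and that the images-per-batch setting is the size of each small batch. Under those assumptions, 4 accumulation steps with 4 images per batch give an effective batch of 16 images.

```python
def grad_mse(w, batch):
    """Gradient of the mean squared error of y ~ w * x over one batch."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)


def accumulated_grad(w, data, img_per_batch, accumulate):
    """Average the gradients of `accumulate` micro-batches of `img_per_batch` samples.

    Both parameter names are hypothetical stand-ins for the two config values.
    """
    total = 0.0
    for step in range(accumulate):
        micro = data[step * img_per_batch:(step + 1) * img_per_batch]
        # Divide by the number of steps so the accumulated sum is a mean.
        total += grad_mse(w, micro) / accumulate
    return total


# 16 toy samples: one "full" batch, or 4 micro-batches of 4 samples each.
data = [(float(i), 3.0 * i + 1.0) for i in range(16)]
effective_batch = 4 * 4

full = grad_mse(0.5, data)
accum = accumulated_grad(0.5, data, img_per_batch=4, accumulate=4)
print(effective_batch, abs(full - accum) < 1e-9)
```

For this simple loss the accumulated gradient equals the full-batch gradient. In a real detector, layers that depend on batch statistics can behave differently with smaller batches, which may be one reason the numbers in the two tables differ.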

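| Meta Architecture | Backbone Network | BBox AP | BBox AP50 | BBox AP75 |
|---|---|---|---|---|
| RetinaNet | ResNet-50 with FPN | 51.011 | 79.542 | 54.105 |
| RetinaNet | ResNet-50 with NAS-FPN | In Progress | In Progress | In Progress |
| Faster RCNN | ResNet-50 with FPN | 49.928 | 80.683 | 53.101 |
| Faster RCNN | ResNet-50 with NAS-FPN | In Progress | In Progress | In Progress |
| CornerNet | ResNet-50 with FPN | In Progress | In Progress | In Progress |
| RepPoints | ResNet-50 with FPN | In Progress | In Progress | In Progress |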

MS-COCO

For the MS-COCO dataset, we only evaluate the 8-GPU setting.

| Meta Architecture | Backbone Network | BBox AP | Config File |
|---|---|---|---|
| RetinaNet | ResNet-50 with FPN | 36.524 | retinanet_R50_FPN |
| RetinaNet | ResNet-50 with NAS-FPN | In Progress | retinanet_R50_NASFPN |
| Faster RCNN | ResNet-50 with FPN | 38.021 | faster_R50_FPN |
| Faster RCNN | ResNet-50 with NAS-FPN | In Progress | faster_R50_NASFPN |
| CornerNet | ResNet-50 with FPN | In Progress | |
| RepPoints | ResNet-50 with FPN | In Progress | |

Training

Single Machine

When training on a single machine, you only need to specify the --config-file and --num-gpu arguments. You can select the model and dataset by choosing or modifying a config file. For more information about the config options, see here.

python tools/train.py --config-file ./configs/RetinaNet/VOC-Detection/R50_FPN_1x_8gpu.yaml --num-gpu 8

To resume training, add the --resume flag.

python tools/train.py --config-file ./configs/RetinaNet/VOC-Detection/R50_FPN_1x_8gpu.yaml --num-gpu 8 --resume

Multi Machine

Collective communication in PyTorch requires a process running on the main machine. The main machine's IP address and an unused port number for TCP communication are set automatically.

For the main process, you must set --machine-rank to 0 and --num-machine to the number of machines.

python tools/train.py --config-file ./configs/train.yaml --num-gpu 4 --num-machine 2 --machine-rank 0

On the other machines, specify each machine's own --machine-rank and set --dist-ip and --dist-port to the same values used by the main machine.

python tools/train.py --config-file ./configs/train.yaml --num-gpu 4 --num-machine 2 --machine-rank 1 --dist-ip xxx.xxx.xxx.xxx --dist-port xxxx
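The sketch below shows one common way a launcher can pick an unused TCP port and derive a global process rank from the machine rank. It is an illustration only, not this project's implementation: the helper names are hypothetical, and it assumes that --num-gpu is the number of GPUs per machine and that one process runs per GPU.

```python
import socket


def find_free_port() -> int:
    """Ask the OS for a currently unused TCP port (hypothetical helper)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind(("", 0))  # port 0 lets the OS choose a free port
        return sock.getsockname()[1]


def dist_url(ip: str, port: int) -> str:
    """Build a TCP address string of the kind used for process rendezvous."""
    return f"tcp://{ip}:{port}"


def global_rank(machine_rank: int, num_gpu: int, local_rank: int) -> int:
    """Rank of one process among all processes on all machines.

    Assumes `num_gpu` processes per machine, numbered machine by machine.
    """
    return machine_rank * num_gpu + local_rank


port = find_free_port()
print(dist_url("127.0.0.1", port))
# With 2 machines of 4 GPUs each, the first GPU of machine 1 gets rank 4.
print(global_rank(machine_rank=1, num_gpu=4, local_rank=0))
```

A port found this way can in principle be taken by another program before training starts, so passing an explicit --dist-port that is known to be free is the safer choice on shared machines.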

Evaluation

python tools/test.py --config-file ./configs/test.yaml --eval-only