This repository has been archived by the owner on Jul 2, 2021. It is now read-only.

Add ResNet training code #436

Merged
merged 74 commits into from
Nov 29, 2018

Conversation

yuyu2172
Member

@yuyu2172 yuyu2172 commented Sep 29, 2017

Merge after #652
Merge after #432, #435, #321

Aims to reproduce the scheme used in https://github.com/facebook/fb.resnet.torch.
Edit:
The training code reproduces the scheme introduced below, which performs better than fb.resnet.torch and scales to settings with many GPUs.
https://arxiv.org/pdf/1706.02677.pdf

  • Linear Scaling Rule for the learning rate (see the paper)
  • Initialize gamma of the last BN in each bottleneck block to 0
  • Use corrected momentum SGD
  • Use the correct scale for weight decay (ChainerMN accomplishes this by default)
  • Decay the learning rate at the 30th, 60th, and 80th epochs
  • Stop using color-related data augmentation
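The learning-rate recipe in the list above (the Linear Scaling Rule plus step decay at epochs 30, 60, and 80) can be sketched as plain Python. The function names and the base values (`base_lr=0.1` for a batch size of 256, as in the "ResNet in 1 hour" paper) are illustrative, not taken from this PR's actual code.

```python
def scaled_base_lr(batch_size, base_lr=0.1, base_batch_size=256):
    """Linear Scaling Rule: scale lr proportionally to the batch size."""
    return base_lr * batch_size / base_batch_size


def lr_at_epoch(epoch, batch_size):
    """Step decay: divide the lr by 10 at the 30th, 60th, and 80th epochs."""
    lr = scaled_base_lr(batch_size)
    for boundary in (30, 60, 80):
        if epoch >= boundary:
            lr *= 0.1
    return lr
```

For example, with 32 GPUs and 256 images per GPU (batch size 8192), `scaled_base_lr(8192)` gives 3.2, which is then warmed up and decayed as described in the paper.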

Edit 11/16:
Although the "ResNet in 1 hour" paper is recent, its training scheme has already been adopted by other researchers (e.g. mixup).

@yuyu2172 yuyu2172 changed the title Add ResNet training code [WIP] Add ResNet training code Sep 29, 2017
@yuyu2172 yuyu2172 changed the title [WIP] Add ResNet training code Add ResNet training code Nov 26, 2018
@yuyu2172 yuyu2172 added this to the 0.12 milestone Nov 26, 2018
@yuyu2172
Member Author

@Hakuyume
Please review this

@yuyu2172 yuyu2172 force-pushed the resnet-train branch 4 times, most recently from 1b0a075 to 7abbaf1 Compare November 26, 2018 02:54
@Hakuyume
Member

Both "a weight" and "weights" are used to mean "a set of parameters of a model". Which one is better?

'url': 'https://chainercv-models.preferred.jp/'
'resnet152_imagenet_trained_2018_11_26.npz'
},
},
Member

Don't we need 'cv2': True?

@yuyu2172
Member Author

yuyu2172 commented Nov 26, 2018

I am checking how slow the training gets without options like autotune.

RESULT
After 7382 iterations with 32 GPUs

none: 1771s
No `autotune`, no `chainer.cuda.set_max_workspace_size(1 * 1024 * 1024 * 1024)`: 1734s
`autotune`, `chainer.cuda.set_max_workspace_size(1 * 1024 * 1024 * 1024)`: 1776s
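For reference, the two knobs being compared above are typically enabled like this in Chainer; this is a config fragment, not code from this PR, and it requires a CUDA-capable environment to have any effect.

```python
import chainer

# Let cuDNN benchmark convolution algorithms and cache the fastest one.
chainer.config.autotune = True

# Allow cuDNN up to 1 GiB of workspace when selecting algorithms.
chainer.cuda.set_max_workspace_size(1 * 1024 * 1024 * 1024)
```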

@yuyu2172
Member Author

Thanks for reviewing.
I reflected our discussion in the code.


The training procedure carefully follows the "ResNet in 1 hour" paper [5].

##### Performance tip
Member

Why did you use the 5th level? (Is the 4th level #### not enough?)


def __call__(self, in_data):
img, label = in_data
_, H, W = img.shape
Member

H and W are not used?

## Performance
## ImageNet

### Weight conversion

Single crop error rate.
Member

For consistency with "Trained model", it would be nice to add something like "~ of the models converted from the Caffe model".

@@ -1,6 +1,8 @@
# Classification

## Performance
## ImageNet
Member

`he` and `fb` should be distinguished from each other in the scoreboard.

Member

@Hakuyume Hakuyume left a comment

LGTM

@Hakuyume Hakuyume merged commit 9d8a68e into chainer:master Nov 29, 2018
@yuyu2172 yuyu2172 deleted the resnet-train branch November 29, 2018 10:37