This repository has been archived by the owner on Jul 2, 2021. It is now read-only.

Add ResNet training code #436

Merged
merged 74 commits into from
Nov 29, 2018

Conversation

yuyu2172
Member

@yuyu2172 yuyu2172 commented Sep 29, 2017

Merge after #652
Merge after #432, #435, #321

Aims to reproduce the scheme used in https://github.com/facebook/fb.resnet.torch.
Edit:
The training code reproduces the scheme introduced below, which performs better than fb.resnet.torch and scales to settings with many GPUs.
https://arxiv.org/pdf/1706.02677.pdf

  • Linear Scaling Rule for the learning rate (see the paper)
  • Initialize gamma of the last BN in each bottleneck block to 0
  • Use corrected momentum SGD
  • Use the correct scale for weight decay (ChainerMN accomplishes this by default)
  • Decay the learning rate at the 30th, 60th, and 80th epochs
  • Stop using color-related data augmentation
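The learning-rate recipe in the list above (the Linear Scaling Rule plus step decay at epochs 30, 60, and 80) can be sketched as plain Python. The function names and the base values (`base_lr=0.1` for a batch size of 256, as in the "ResNet in 1 hour" paper) are illustrative, not taken from this PR's actual code.

```python
def scaled_base_lr(batch_size, base_lr=0.1, base_batch_size=256):
    """Linear Scaling Rule: scale lr proportionally to the batch size."""
    return base_lr * batch_size / base_batch_size


def lr_at_epoch(epoch, batch_size):
    """Step decay: divide the lr by 10 at the 30th, 60th, and 80th epochs."""
    lr = scaled_base_lr(batch_size)
    for boundary in (30, 60, 80):
        if epoch >= boundary:
            lr *= 0.1
    return lr
```

For example, with 32 GPUs and 256 images per GPU (batch size 8192), `scaled_base_lr(8192)` gives 3.2, which is then warmed up and decayed as described in the paper.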

Edit 11/16:
Although the "ResNet in 1 hour" paper is recent, its training scheme has already been adopted by other researchers (e.g. mixup).

@yuyu2172 yuyu2172 changed the title Add ResNet training code [WIP] Add ResNet training code Sep 29, 2017
@yuyu2172 yuyu2172 changed the title [WIP] Add ResNet training code Add ResNet training code Nov 26, 2018
@yuyu2172 yuyu2172 added this to the 0.12 milestone Nov 26, 2018
@yuyu2172
Member Author

@Hakuyume
Please review this

@yuyu2172 yuyu2172 force-pushed the resnet-train branch 4 times, most recently from 1b0a075 to 7abbaf1 Compare November 26, 2018 02:54
@Hakuyume
Member

Both "a weight" and "weights" are used to mean "a set of parameters of a model". Which one is better?

'url': 'https://chainercv-models.preferred.jp/'
'resnet152_imagenet_trained_2018_11_26.npz'
},
},
Member

Don't we need 'cv2': True?

@yuyu2172
Member Author

yuyu2172 commented Nov 26, 2018

I am checking how slow the training gets without options like autotune.

RESULT
After 7382 iterations with 32 GPUs

none: 1771s
No `autotune`, no `chainer.cuda.set_max_workspace_size(1 * 1024 * 1024 * 1024)`: 1734s
`autotune`, `chainer.cuda.set_max_workspace_size(1 * 1024 * 1024 * 1024)`: 1776s
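For reference, the two knobs being compared above are typically enabled like this in Chainer; this is a config fragment, not code from this PR, and it requires a CUDA-capable environment to have any effect.

```python
import chainer

# Let cuDNN benchmark convolution algorithms and cache the fastest one.
chainer.config.autotune = True

# Allow cuDNN up to 1 GiB of workspace when selecting algorithms.
chainer.cuda.set_max_workspace_size(1 * 1024 * 1024 * 1024)
```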

@yuyu2172
Member Author

Thanks for reviewing.
I reflected our discussion in the code.


The training procedure carefully follows the "ResNet in 1 hour" paper [5].

##### Performance tip
Member

Why did you use the 5th level? (Is the 4th level #### not enough?)


def __call__(self, in_data):
img, label = in_data
_, H, W = img.shape
Member

H and W are not used?

## Performance
## ImageNet

### Weight conversion

Single crop error rate.
Member

For consistency with "Trained model", it would be nice to add something like "~ of the models converted from the Caffe model".

@@ -1,6 +1,8 @@
# Classification

## Performance
## ImageNet
Member

`he` and `fb` should be distinguished from each other in the scoreboard.

Member

@Hakuyume Hakuyume left a comment

LGTM

@Hakuyume Hakuyume merged commit 9d8a68e into chainer:master Nov 29, 2018
@yuyu2172 yuyu2172 deleted the resnet-train branch November 29, 2018 10:37