Feature load matches #77

Open
wants to merge 11 commits into master

Conversation

BenjaminBossan
Collaborator

Loading weights now matches the shapes of the layers using Python's difflib.

  • …o that weights can be loaded even if architecture is not completely identical.
  • …al nets (see Xudong Cao); layer infos are saved in layer_infos_ attribute for potential later use. New dependency: tabulate.
  • …ameter custom_score=('mean abs error', mean_abs_error)).
  • …s, which allows to load weights into a different architecture, which would fail otherwise once the first layer does not match. This now requires the net to be initialized before loading.
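The description above says shapes are aligned with Python's difflib; below is a minimal sketch of what such an alignment could look like, where align_by_shape and the example shape lists are purely illustrative and not the code in this PR:

```python
from difflib import SequenceMatcher

def align_by_shape(old_shapes, new_shapes):
    """Pair up positions whose parameter shapes match, tolerating inserted or
    removed layers, by running difflib over the two shape sequences."""
    matcher = SequenceMatcher(a=old_shapes, b=new_shapes, autojunk=False)
    pairs = []
    for block in matcher.get_matching_blocks():
        for k in range(block.size):
            pairs.append((block.a + k, block.b + k))
    return pairs

# The new net has one extra conv layer and a wider dense layer, so only the
# first four parameter shapes line up:
old = [(32, 3, 3, 3), (32,), (64, 32, 3, 3), (64,), (1024, 10)]
new = [(32, 3, 3, 3), (32,), (64, 32, 3, 3), (64,), (128, 64, 3, 3), (128,), (2048, 10)]
print(align_by_shape(old, new))  # [(0, 0), (1, 1), (2, 2), (3, 3)]
```
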
@dnouri
Owner

dnouri commented Apr 24, 2015

What was the argument again for not matching by names? I think that would be more transparent and robust.

(Also, please squash your commits, here's a guide: http://gitready.com/advanced/2009/02/10/squashing-commits-with-rebase.html)

@BenjaminBossan
Collaborator Author

(yeah I can squash the commits next time.)

I would say that matching by names could be an additional option. The advantage here is that all is done automatically.

If you have two very deep CNNs, could you predict which layers' shapes will match? I cannot but I would have to if I wanted to match by name. Trust me, I'm often surprised by which layers' parameters match and which don't.

Furthermore, often some but not all parameters will match (bias but not weights), so you would have to match by parameter name, not just layer name! Can you make sure that parameter names are unique?

Finally, I cannot fathom a case where a user would want to load weights into a similar architecture but would want to exclude specific weights (which name matching may allow). Loaded weights should not be any worse than random weights.

@dnouri
Owner

dnouri commented Apr 24, 2015

(please squash this time, it's easy)

Not sure why you think matching by names wouldn't be done automatically? Say in one case, you might want to start out with a convnet that's not as deep, and once it's trained, transfer the weights and continue learning with a deeper architecture. The names in your first net would maybe be:

  • conv1
  • conv2
  • pool3
  • ...
  • pool5
  • dense1
  • output

And your second net would look like this:

  • conv1
  • ...
  • pool7
  • dense1
  • output

In this case, the name matching would find all the matches up until pool5, ignore dense1 because it probably doesn't have the same number of weights, but probably copy weights for output again.

A similar example would be training an autoencoder and then changing the decoder part to do something else in a second model. I believe this would also "just work".

If you need to override the standard behaviour (the standard behaviour being easy to understand: "copy weights of layers with matching names if you can"), you could maybe add a dictionary for overrides. Say I want pool7 to copy weights from pool5 (let's imagine it's possible), then in an override dict you may specify {'pool7': 'pool5'}. In your automagick solution, there wouldn't be a way to do that: Either the diffing works or it doesn't.
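A minimal sketch of this name-matching-with-overrides idea, assuming parameters are kept in plain dicts of numpy arrays keyed by layer name; copy_by_name, the overrides handling, and the example names are hypothetical, not an existing nolearn API:

```python
import numpy as np

def copy_by_name(old_params, new_params, overrides=None):
    """Copy arrays from old_params into new_params wherever the layer name and
    shape match; `overrides` maps a destination name to the source name to use."""
    overrides = overrides or {}
    for name, target in new_params.items():
        source = old_params.get(overrides.get(name, name))
        if source is not None and source.shape == target.shape:
            target[...] = source  # in-place copy; shape mismatches are skipped

# conv1 and output match by name; dense2 is told explicitly to reuse dense1's weights.
old = {'conv1': np.ones((32, 3, 3, 3)), 'dense1': np.ones((512, 256)),
       'output': np.ones((256, 10))}
new = {'conv1': np.zeros((32, 3, 3, 3)), 'dense2': np.zeros((512, 256)),
       'output': np.zeros((256, 10))}
copy_by_name(old, new, overrides={'dense2': 'dense1'})
```
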

I really don't see an advantage of using the diff approach here. As explained, my idea would be all automatic too, plus more predictable and flexible.

@BenjaminBossan
Collaborator Author

Okay, I misunderstood what you meant, I thought the user would have to spell out which layers to match with which.

Anyways, your approach would work fine as long as you only add new layers to the bottom. If I add a new layer at the top, all names would be incremented by 1 and no layer would match. The user would have to indicate all subsequent parameters.

What I can say for sure is that I have used my implementation locally for some time and it worked great. I have yet to encounter a situation where I would need the implementation that you proposed. And anyways, I would be all in favor of having the name matching in addition.

What would be great is if my implementation specified the layer names that were matched but I have not yet found a way to do this in a robust fashion.

@dnouri
Owner

dnouri commented Apr 24, 2015

On Fri, Apr 24, 2015 at 2:30, Benjamin Bossan notifications@github.com wrote:

> Okay, I misunderstood what you meant, I thought the user would have to spell out which layers to match with which.
>
> Anyways, your approach would work fine as long as you only add new layers to the bottom. If I add a new layer at the top, all names would be incremented by 1 and no layer would match. The user would have to indicate all subsequent parameters.

There's a pretty straightforward fix for that: just name your layers explicitly.

> What I can say for sure is that I have used my implementation locally for some time and it worked great. I have yet to encounter a situation where I would need the implementation that you proposed. And anyways, I would be all in favor of having the name matching in addition.

Life is too short to support two implementations of one feature. :-)

Admit that the name matching is more transparent. :-P

@BenjaminBossan
Collaborator Author

Well my implementation is ready to go and yours is not, so if life is too short, go ahead and merge ;)

Seriously, my proposal is that you give it a try and see whether you ever find yourself in need of a different solution.

dnouri added a commit that referenced this pull request Apr 26, 2015
Specifically, it will match layers by name and copy weights if names match,
instead of the brute-force approach using list indices, which doesn't work
reliably in some cases.

Aims to supersede #77.  Still needs better testing and possibly a way to
override some of the matching.
@bmilde

bmilde commented May 5, 2015

"Finally, I cannot fathom a case where a user would want to load weights into a similar architecture but would want to exclude specific weights (which name matching may allow). Loaded weights should not be any worse than random weights."

The exact opposite is true for transfer learning. I have an upcoming paper on that with a ConvNet, and here you absolutely need to transfer only the weights of the first layers and leave the last layers randomly initialised, if you use the same architecture. In a transfer learning setting with neural networks, loading weights for all layers performs worse most of the time than totally random weights; cf. "Direct transfer of learned information among neural networks", Pratt et al., 1991.

"From Generic to Specific Deep Representations for Visual Recognition", Azizpour et al., 2015 http://arxiv.org/pdf/1406.5774.pdf also has some points why you want to leave some layers randomly initialised in transfer learning to gain anything from it.

@bmilde

bmilde commented May 5, 2015

Worth pointing out, though, that Benjamin's implementation with _param_alignment(shapes0, shapes1) should correctly handle the case where you pass in weights partially, as far as I can tell. So should the implementation where you compare layers by their name.

@BenjaminBossan
Collaborator Author

You are right, bmilde, I did not think about academic applications. My favourite solution would be to have 'load_params_from' accept an argument that determines how to merge: 'auto' (my solution), 'by_name' (Daniel's new solution), 'hard' (old implementation).
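A self-contained sketch of what such a merge argument could look like, again over plain dicts of numpy arrays; load_params, its signature, and the three strategy bodies are illustrative rather than the actual nolearn implementation:

```python
from difflib import SequenceMatcher
import numpy as np

def load_params(new_params, old_params, merge='auto'):
    """Copy arrays from old_params into new_params according to the chosen
    merge strategy; new_params is modified in place."""
    new_items = list(new_params.items())
    old_items = list(old_params.items())
    if merge == 'hard':
        # Old behaviour: positional copy, which fails as soon as a shape differs.
        for (_, target), (_, source) in zip(new_items, old_items):
            target[...] = source
    elif merge == 'by_name':
        # Copy wherever both the name and the shape match.
        for name, target in new_items:
            source = old_params.get(name)
            if source is not None and source.shape == target.shape:
                target[...] = source
    elif merge == 'auto':
        # Shape alignment with difflib, roughly the approach of this PR.
        matcher = SequenceMatcher(a=[a.shape for _, a in old_items],
                                  b=[a.shape for _, a in new_items],
                                  autojunk=False)
        for block in matcher.get_matching_blocks():
            for k in range(block.size):
                new_items[block.b + k][1][...] = old_items[block.a + k][1]
    else:
        raise ValueError("merge must be 'auto', 'by_name' or 'hard'")
```
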

@bmilde

bmilde commented May 7, 2015

+1 for more flexibility.

As a compromise, the additional argument could be: 'by_name' (default), 'by_shape', 'hard' (old implementation), since dnouri favours the by_name approach.

Speaking of additional arguments: an argument to specify the number of additional layers (or a list of layer names) to import would be very useful for transfer learning. This is also useful outside academia ;), e.g. copying weights from a trained ImageNet model to your own CV problem with completely different classes and a limited number of examples.
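A small sketch of how such a layer subset could be honoured for transfer learning; load_params_subset, the only argument, and the layer names are hypothetical:

```python
import numpy as np

def load_params_subset(new_params, old_params, only):
    """Copy only the named parameters (where shapes agree) and leave every
    other layer at its random initialisation."""
    for name in only:
        source, target = old_params.get(name), new_params.get(name)
        if source is not None and target is not None and source.shape == target.shape:
            target[...] = source

# e.g. take only the early conv filters from a pretrained ImageNet model and
# keep the classifier layers random for the new task:
# load_params_subset(my_params, imagenet_params, only=['conv1_W', 'conv1_b', 'conv2_W', 'conv2_b'])
```
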

dnouri added a commit that referenced this pull request May 8, 2015
Specifically, it will match layers by name and copy weights if names match,
instead of the brute-force approach using list indices, which doesn't work
reliably in some cases.

Aims to supersede #77.  Still needs better testing and possibly a way to
override some of the matching.