Feature load matches #77

Open
wants to merge 11 commits into master

Conversation

BenjaminBossan
Collaborator

Loading weights now matches the shapes of the layers using Python's difflib.

  • …o that weights can be loaded even if architecture is not completely identical.
  • …al nets (see Xudong Cao); layer infos are saved in layer_infos_ attribute for potential later use. New dependency: tabulate.
  • …ameter custom_score=('mean abs error', mean_abs_error)).
  • …s, which allows to load weights into a different architecture, which would fail otherwise once the first layer does not match. This now requires the net to be initialized before loading.
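The description above says shapes are aligned with Python's difflib; below is a minimal sketch of what such an alignment could look like, where align_by_shape and the example shape lists are purely illustrative and not the code in this PR:

```python
from difflib import SequenceMatcher

def align_by_shape(old_shapes, new_shapes):
    """Pair up positions whose parameter shapes match, tolerating inserted or
    removed layers, by running difflib over the two shape sequences."""
    matcher = SequenceMatcher(a=old_shapes, b=new_shapes, autojunk=False)
    pairs = []
    for block in matcher.get_matching_blocks():
        for k in range(block.size):
            pairs.append((block.a + k, block.b + k))
    return pairs

# The new net has one extra conv layer and a wider dense layer, so only the
# first four parameter shapes line up:
old = [(32, 3, 3, 3), (32,), (64, 32, 3, 3), (64,), (1024, 10)]
new = [(32, 3, 3, 3), (32,), (64, 32, 3, 3), (64,), (128, 64, 3, 3), (128,), (2048, 10)]
print(align_by_shape(old, new))  # [(0, 0), (1, 1), (2, 2), (3, 3)]
```
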
@dnouri
Owner

dnouri commented Apr 24, 2015

What was the argument again for not matching by names? I think that would be more transparent and robust.

(Also, please squash your commits, here's a guide: http://gitready.com/advanced/2009/02/10/squashing-commits-with-rebase.html)

@BenjaminBossan
Collaborator Author

(yeah I can squash the commits next time.)

I would say that matching by names could be an additional option. The advantage here is that all is done automatically.

If you have two very deep CNNs, could you predict which layers' shapes will match? I cannot but I would have to if I wanted to match by name. Trust me, I'm often surprised by which layers' parameters match and which don't.

Furthermore, often some but not all parameters will match (bias but not weights), so you would have to match by parameter name, not just layer name! Can you make sure that parameter names are unique?

Finally, I cannot fathom a case where a user would want to load weights into a similar architecture but would want to exclude specific weights (which name matching may allow). Loaded weights should not be any worse than random weights.

@dnouri
Owner

dnouri commented Apr 24, 2015

(please squash this time, it's easy)

Not sure why you think matching by names wouldn't be done automatically? Say in one case, you might want to start out with a convnet that's not as deep, and once it's trained, transfer the weights and continue learning with a deeper architecture. The names in your first net would maybe be:

  • conv1
  • conv2
  • pool3
  • ...
  • pool5
  • dense1
  • output

And your second net would look like this:

  • conv1
  • ...
  • pool7
  • dense1
  • output

In this case, the name matching would find all the matches up until pool5, ignore dense1 because it probably doesn't have the same number of weights, but probably copy weights for output again.

A similar example would be training an autoencoder and then changing the decoder part to do something else in a second model. I believe this would also "just work".

If you need to override the standard behaviour (the standard behaviour being easy to understand: "copy weights of layers with matching names if you can"), you could maybe add a dictionary for overrides. Say I want pool7 to copy weights from pool5 (let's imagine it's possible), then in an override dict you may specify {'pool7': 'pool5'}. In your automagick solution, there wouldn't be a way to do that: Either the diffing works or it doesn't.
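A minimal sketch of this name-matching-with-overrides idea, assuming parameters are kept in plain dicts of numpy arrays keyed by layer name; copy_by_name, the overrides handling, and the example names are hypothetical, not an existing nolearn API:

```python
import numpy as np

def copy_by_name(old_params, new_params, overrides=None):
    """Copy arrays from old_params into new_params wherever the layer name and
    shape match; `overrides` maps a destination name to the source name to use."""
    overrides = overrides or {}
    for name, target in new_params.items():
        source = old_params.get(overrides.get(name, name))
        if source is not None and source.shape == target.shape:
            target[...] = source  # in-place copy; shape mismatches are skipped

# conv1 and output match by name; dense2 is told explicitly to reuse dense1's weights.
old = {'conv1': np.ones((32, 3, 3, 3)), 'dense1': np.ones((512, 256)),
       'output': np.ones((256, 10))}
new = {'conv1': np.zeros((32, 3, 3, 3)), 'dense2': np.zeros((512, 256)),
       'output': np.zeros((256, 10))}
copy_by_name(old, new, overrides={'dense2': 'dense1'})
```
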

I really don't see an advantage of using the diff approach here. As explained, my idea would be all automatic too, plus more predictable and flexible.

@BenjaminBossan
Collaborator Author

Okay, I misunderstood what you meant, I thought the user would have to spell out which layers to match with which.

Anyways, your approach would work fine as long as you only add new layers to the bottom. If I add a new layer at the top, all names would be incremented by 1 and no layer would match. The user would have to indicate all subsequent parameters.

What I can say for sure is that I have used my implementation locally for some time and it worked great. I have yet to encounter a situation where I would need the implementation that you proposed. And anyways, I would be all in favor of having the name matching in addition.

What would be great is if my implementation specified the layer names that were matched but I have not yet found a way to do this in a robust fashion.

@dnouri
Owner

dnouri commented Apr 24, 2015

On Fri, Apr 24, 2015 at 2:30, Benjamin Bossan notifications@github.com wrote:

> Okay, I misunderstood what you meant, I thought the user would have to spell out which layers to match with which.
>
> Anyways, your approach would work fine as long as you only add new layers to the bottom. If I add a new layer at the top, all names would be incremented by 1 and no layer would match. The user would have to indicate all subsequent parameters.

There's a pretty straightforward fix for that: just name your layers explicitly.

> What I can say for sure is that I have used my implementation locally for some time and it worked great. I have yet to encounter a situation where I would need the implementation that you proposed. And anyways, I would be all in favor of having the name matching in addition.

Life is too short to support two implementations of one feature. :-)

Admit that the name matching is more transparent. :-P

@BenjaminBossan
Collaborator Author

Well my implementation is ready to go and yours is not, so if life is too short, go ahead and merge ;)

Seriously, my proposal is that you give it a try and see whether you ever find yourself in need of a different solution.

dnouri added a commit that referenced this pull request Apr 26, 2015
Specifically, it will match layers by name and copy weights if names match,
instead of the brute-force approach using list indices, which doesn't work
reliably in some cases.

Aims to supersede #77.  Still needs better testing and possibly a way to
override some of the matching.
@bmilde

bmilde commented May 5, 2015

"Finally, I cannot fathom a case where a user would want to load weights into a similar architecture but would want to exclude specific weights (which name matching may allow). Loaded weights should not be any worse than random weights."

The exact opposite is true for transfer learning. I have an upcoming paper on that with a ConvNet, and here you absolutely need to transfer only the weights of the first layers and leave the last layers randomly initialised, if you use the same architecture. In a transfer learning setting with neural networks, loading weights for all layers performs worse most of the time than totally random weights; cf. "Direct transfer of learned information among neural networks", Pratt et al., 1991.

"From Generic to Specific Deep Representations for Visual Recognition", Azizpour et al., 2015 http://arxiv.org/pdf/1406.5774.pdf also has some points why you want to leave some layers randomly initialised in transfer learning to gain anything from it.

@bmilde

bmilde commented May 5, 2015

Worth pointing out, though, that Benjamin's implementation with _param_alignment(shapes0, shapes1) should correctly handle the case where you pass in weights partially, as far as I can tell. So should the implementation where you compare layers by their name.

@BenjaminBossan
Collaborator Author

You are right, bmilde, I did not think about academic applications. My favourite solution would be to have 'load_params_from' accept an argument that determines how to merge: 'auto' (my solution), 'by_name' (Daniel's new solution), 'hard' (old implementation).
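A self-contained sketch of what such a merge argument could look like, again over plain dicts of numpy arrays; load_params, its signature, and the three strategy bodies are illustrative rather than the actual nolearn implementation:

```python
from difflib import SequenceMatcher
import numpy as np

def load_params(new_params, old_params, merge='auto'):
    """Copy arrays from old_params into new_params according to the chosen
    merge strategy; new_params is modified in place."""
    new_items = list(new_params.items())
    old_items = list(old_params.items())
    if merge == 'hard':
        # Old behaviour: positional copy, which fails as soon as a shape differs.
        for (_, target), (_, source) in zip(new_items, old_items):
            target[...] = source
    elif merge == 'by_name':
        # Copy wherever both the name and the shape match.
        for name, target in new_items:
            source = old_params.get(name)
            if source is not None and source.shape == target.shape:
                target[...] = source
    elif merge == 'auto':
        # Shape alignment with difflib, roughly the approach of this PR.
        matcher = SequenceMatcher(a=[a.shape for _, a in old_items],
                                  b=[a.shape for _, a in new_items],
                                  autojunk=False)
        for block in matcher.get_matching_blocks():
            for k in range(block.size):
                new_items[block.b + k][1][...] = old_items[block.a + k][1]
    else:
        raise ValueError("merge must be 'auto', 'by_name' or 'hard'")
```
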

@bmilde

bmilde commented May 7, 2015

+1 for more flexibility.

As a compromise, the additional argument could be: 'by_name' (default), 'by_shape', 'hard' (old implementation), since dnouri favours the by_name approach.

Speaking of additional arguments: an argument to specify the number of additional layers (or a list of layer names) to import would be very useful for transfer learning. This is also useful outside academia ;), e.g. copying weights from a trained ImageNet model to your own CV problem with completely different classes and a limited number of examples.
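A small sketch of how such a layer subset could be honoured for transfer learning; load_params_subset, the only argument, and the layer names are hypothetical:

```python
import numpy as np

def load_params_subset(new_params, old_params, only):
    """Copy only the named parameters (where shapes agree) and leave every
    other layer at its random initialisation."""
    for name in only:
        source, target = old_params.get(name), new_params.get(name)
        if source is not None and target is not None and source.shape == target.shape:
            target[...] = source

# e.g. take only the early conv filters from a pretrained ImageNet model and
# keep the classifier layers random for the new task:
# load_params_subset(my_params, imagenet_params, only=['conv1_W', 'conv1_b', 'conv2_W', 'conv2_b'])
```
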

dnouri added a commit that referenced this pull request May 8, 2015
Specifically, it will match layers by name and copy weights if names match,
instead of the brute-force approach using list indices, which doesn't work
reliably in some cases.

Aims to supersede #77.  Still needs better testing and possibly a way to
override some of the matching.