Feature load matches #77
base: master
Conversation
…o that weights can be loaded even if the architecture is not completely identical.
…al nets (see Xudong Cao); layer infos are saved in the layer_infos_ attribute for potential later use. New dependency: tabulate.
…ulate in the log_ attribute.
…ameter custom_score=('mean abs error', mean_abs_error)).
…s, which allows loading weights into a different architecture; this would otherwise fail as soon as the first layer does not match. This now requires the net to be initialized before loading.
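To illustrate the shape-based approach described above, here is a minimal sketch of how parameters could be aligned by shape with difflib. The helper name align_params_by_shape and the example shapes are hypothetical; this is not the actual nolearn implementation.

```python
# Hypothetical sketch of shape-based parameter alignment; not nolearn's code.
from difflib import SequenceMatcher

def align_params_by_shape(shapes_old, shapes_new):
    """Return (i, j) index pairs of parameters whose shapes line up."""
    matcher = SequenceMatcher(a=shapes_old, b=shapes_new)
    pairs = []
    for block in matcher.get_matching_blocks():
        for offset in range(block.size):
            pairs.append((block.a + offset, block.b + offset))
    return pairs

# Example: same conv stack, but the new net's dense layer has fewer units.
old_shapes = [(32, 3, 3, 3), (32,), (64, 32, 3, 3), (64,), (576, 100), (100,)]
new_shapes = [(32, 3, 3, 3), (32,), (64, 32, 3, 3), (64,), (576, 10), (10,)]
print(align_params_by_shape(old_shapes, new_shapes))
# -> [(0, 0), (1, 1), (2, 2), (3, 3)]: the conv parameters are copied,
#    the mismatching dense parameters keep their random initialization.
```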
What was the argument again for not matching by names? I think that would be more transparent and robust. (Also, please squash your commits, here's a guide: http://gitready.com/advanced/2009/02/10/squashing-commits-with-rebase.html)
(yeah I can squash the commits next time.) I would say that matching by names could be an additional option. The advantage here is that all is done automatically. If you have two very deep CNNs, could you predict which layers' shapes will match? I cannot, but I would have to if I wanted to match by name. Trust me, I'm often surprised by which layers' parameters match and which don't. Furthermore, often some but not all parameters will match (bias but not weights), so you would have to match by parameter name, not just layer name! Can you make sure that parameter names are unique? Finally, I cannot fathom a case where a user would want to load weights into a similar architecture but would want to exclude specific weights (which name matching may allow). Loaded weights should not be any worse than random weights.
(please squash this time, it's easy) Not sure why you think matching by names wouldn't be done automatically? Say in one case, you might want to start out with a convnet that's not as deep, and once it's trained, transfer the weights and continue learning with a deeper architecture. The names in your first net would maybe be:
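For example (hypothetical names, just for illustration):

input, conv1, pool1, conv2, pool2, conv3, pool3, conv4, pool4, conv5, pool5, dense1, output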
And your second net would look like this:
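For example (again hypothetical, a few extra conv/pool blocks on top of the first net):

input, conv1, pool1, conv2, pool2, conv3, pool3, conv4, pool4, conv5, pool5, conv6, pool6, conv7, pool7, dense1, output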
In this case, the name matching would find all the matches up until pool5, ignore dense1 because it probably doesn't have the same number of weights, but probably copy weights for output again. A similar example would be training an autoencoder and then changing the decoder part to do something else in a second model. I believe this would also "just work". If you need to override the standard behaviour (the standard behaviour being easy to understand: "copy weights of layers with matching names if you can"), you could maybe add a dictionary for overrides. Say I want pool7 to copy weights from pool5 (let's imagine it's possible); then in an override dict you might specify something like pool7: pool5. I really don't see an advantage here of using the diff approach. As explained, my idea would be that it's all automatic, too, plus more predictable and flexible.
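A rough sketch of what that name-based matching with an override dict could look like; the function copy_matching_params, the overrides argument, and the layers_ dict access are assumptions for illustration, not nolearn's actual API.

```python
# Illustrative sketch only; copy_matching_params and overrides are hypothetical.
def copy_matching_params(source_net, target_net, overrides=None):
    """Copy weights between layers whose names match (or are mapped in overrides)."""
    overrides = overrides or {}
    source_layers = source_net.layers_  # assumed: dict-like of name -> layer
    for name, target_layer in target_net.layers_.items():
        source_layer = source_layers.get(overrides.get(name, name))
        if source_layer is None:
            continue  # no layer of that name in the source net
        # pair parameters positionally within the layer (W with W, b with b)
        for src, tgt in zip(source_layer.get_params(), target_layer.get_params()):
            values = src.get_value()
            if values.shape == tgt.get_value().shape:
                tgt.set_value(values)

# e.g. copy pool7's weights from pool5 (assuming that were possible):
# copy_matching_params(net1, net2, overrides={'pool7': 'pool5'})
```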
Okay, I misunderstood what you meant; I thought the user would have to spell out which layers to match with which. Anyways, your approach would work fine as long as you only add new layers at the bottom. If I add a new layer at the top, all names would be shifted by one and no layer would match; the user would have to specify all subsequent parameters. What I can say for sure is that I have used my implementation locally for some time and it worked great. I have yet to encounter a situation where I would need the implementation that you proposed. And anyways, I would be all in favor of having the name matching in addition. What would be great is if my implementation reported the layer names that were matched, but I have not yet found a way to do this in a robust fashion.
Admit that the name matching is more transparent. :-P
Well, my implementation is ready to go and yours is not, so if life is too short, go ahead and merge ;) Seriously, my proposal is that you give it a try and see whether you will ever find yourself in need of a different solution.
Specifically, it will match layers by name and copy weights if names match, instead of the brute-force approach using list indices, which doesn't work reliably in some cases. Aims to supersede #77. Still needs better testing and possibly a way to override some of the matching.
"Finally, I cannot fathom a case where a user would want to load weights into a similar architecture but would want to exclude specific weights (which name matching may allow). Loaded weights should not be any worse than random weights." The exact opposite is true for transfer learning. I have an upcoming paper on that with a ConvNet and here you absolutely need to only transfer weights of the first layers, but leave the last layers randomly initialised, if you use the same architecture. In a transfer learning setting with neural networks, loaded weights for all layers performs most of the time worse than totally random weights: c.f. "Direct transfer of learned information among neural networks", Pratt et al., 1991 "From Generic to Specific Deep Representations for Visual Recognition", Azizpour et al., 2015 http://arxiv.org/pdf/1406.5774.pdf also has some points why you want to leave some layers randomly initialised in transfer learning to gain anything from it. |
Worth pointing out, though, that Benjamin's implementation with _param_alignment(shapes0, shapes1) should handle the case correctly where you pass in weights partially, as far as I can tell. So should the implementation where you compare layers by their name.
You are right, bmilde, I did not think about academic applications. My favourite solution would be to have 'load_params_from' accept an argument that determines how to merge: 'auto' (my solution), 'by_name' (Daniel's new solution), 'hard' (old implementation).
+1 for more flexibility. As a compromise, the additional argument could be: 'by_name' (default), 'by_shape', 'hard' (old implementation), since dnouri favours the by_name approach. Speaking of additional arguments: an argument to specify the number of additional layers (or a list of layer names) to import would be very useful for transfer learning. This is also useful outside academia ;), e.g. for copying weights from a trained ImageNet model to your own CV problem with completely different classes and a limited number of examples.
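A hypothetical sketch of how such a combined interface might look; the argument names match and only_layers are made up for illustration and are not an actual nolearn API.

```python
# Hypothetical signature combining the proposals in this thread; illustrative only.
def load_params_from(self, source, match='by_name', only_layers=None):
    """Copy weights from `source` into this net.

    match:       'by_name'  - copy where layer names match
                 'by_shape' - align parameters by shape (approach of this PR)
                 'hard'     - positional copy (old behaviour)
    only_layers: optional list of layer names to restrict the transfer to,
                 e.g. only the first conv layers for transfer learning.
    """
```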
Weight loading now matches the shapes of the layers using Python's difflib.