Minuscule nitpicks: (1) bad default gain for model initialization, (2) misapplication of nn.ModuleList
#38
Comments
Hey mudballs. I really liked your post, but I thought I'd add some slight additions that might justify their process a bit more and add a little more depth to your post.
edit: thanks for making me refresh my ML knowledge and look at some fun code
Thanks for the explanation! Regarding (1), what you're saying makes total sense. Regarding (2), my fault for not explaining this right. Running in parallel here means applying each block to the input and then concatenating the output tensors into a big tensor (which is subsequently processed by an MLP).
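To make that parallel mode concrete, here is a minimal sketch: `nn.Linear` layers stand in as hypothetical mask blocks (the real block definitions live in `mask_net.py`), each one is applied to the same input, and the outputs are concatenated along the feature dimension.

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins for the mask blocks; sizes are arbitrary.
blocks = nn.ModuleList(nn.Linear(8, 4) for _ in range(3))

x = torch.randn(2, 8)
# Parallel mode: every block sees the same input; outputs are concatenated
# into one wide tensor, which downstream code would feed to an MLP.
out = torch.cat([block(x) for block in blocks], dim=-1)
print(out.shape)  # each block emits 4 features, so 3 blocks give (2, 12)
```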
Ah, I thought about (2) in terms of sequential/parallel computing. You certainly got it right! I see now in the constructor the …
Minuscule nitpicks.
Bad initialization gain value for ReLU nets
The initial weights for this feedforward block
the-algorithm-ml/projects/home/recap/model/mask_net.py, line 39 in 78c3235
are initialized with a gain of 1.0:
the-algorithm-ml/projects/home/recap/model/mask_net.py, line 10 in 78c3235
The recommended gain value for ReLU nets is sqrt(2), though. I see the same issue in other files as well, e.g.
the-algorithm-ml/projects/home/recap/model/mlp.py, line 42 in 78c3235
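For reference, PyTorch ships the recommended per-nonlinearity gains via `torch.nn.init.calculate_gain`; for ReLU it returns sqrt(2). A minimal sketch of a ReLU-appropriate Xavier init (the layer sizes here are made up for illustration):

```python
import math

import torch.nn as nn

# PyTorch's recommended gain for ReLU is sqrt(2), not the default 1.0.
gain = nn.init.calculate_gain("relu")
assert math.isclose(gain, math.sqrt(2.0))

layer = nn.Linear(128, 64)  # arbitrary sizes, for illustration only
nn.init.xavier_uniform_(layer.weight, gain=gain)  # instead of gain=1.0
```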
Currently using nn.ModuleList when the application calls for nn.Sequential
It seems that the masknet is composed of primitive blocks that are called either in parallel or in sequence. In the parallel mode, using nn.ModuleList to hold the blocks makes sense. However, in the sequential mode it might be more efficient to organize the blocks using nn.Sequential instead. Currently, both are done using nn.ModuleList.
This doesn't affect the correctness of the code; however, the change should slightly reduce the runtime during sequential application. Currently, the model runs a pythonic for loop over the blocks in sequence here:
the-algorithm-ml/projects/home/recap/model/mask_net.py, line 94 in 78c3235
An nn.Sequential block would do the iteration natively in C++, which should be faster and more scalable.
Thanks for making the repo public btw!
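A minimal sketch of the suggested change, with `nn.Linear` layers as hypothetical stand-ins for the mask blocks. Both variants produce identical outputs; the difference is only whether the iteration over blocks happens in a Python for loop or inside `nn.Sequential`'s own forward.

```python
import torch
import torch.nn as nn

blocks = [nn.Linear(8, 8) for _ in range(3)]  # stand-ins for the mask blocks
x = torch.randn(4, 8)

# Current pattern: nn.ModuleList plus an explicit Python loop.
module_list = nn.ModuleList(blocks)
out_loop = x
for block in module_list:
    out_loop = block(out_loop)

# Suggested pattern: nn.Sequential applies the same blocks in order.
sequential = nn.Sequential(*blocks)
out_seq = sequential(x)

# Same modules, same order, so the results match exactly.
assert torch.allclose(out_loop, out_seq)
```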