Training in C++? #8
I was interested in implementing NN programs in C++, and I wanted to improve my C++ coding ability, so I decided to write this code. I found a strange result: there are cases in which the same training runs faster in Python than in C++.
Training in C++ is slow when the GPU is used. I have heard reports that C++ is faster when training networks made up only of fully connected layers.
Huge thanks for these interesting insights. I suspect the reason for the high training time in C++ could be an under-optimized data pipeline, or serialized CPU-to-GPU and GPU-to-CPU data copies instead of parallel asynchronous copies. Still, further investigation would help. I was also curious about this, as many people used to train models in C++, and I wonder what on earth forced them to do it. :-P
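For illustration, here is a minimal LibTorch sketch of that prefetch idea: enqueue the next host-to-device copy before the current step finishes, rather than blocking on each transfer. The model, batch shapes, and loop here are placeholders, not anything from this repo, and it assumes a CUDA-enabled LibTorch build.

```cpp
#include <torch/torch.h>
#include <vector>

int main() {
    torch::Device device(torch::kCUDA);

    torch::nn::Linear model(1024, 10);  // stand-in for a real network
    model->to(device);
    torch::optim::SGD optimizer(model->parameters(), /*lr=*/0.01);

    // Synthetic "dataset": CPU batches staged in pinned (page-locked) RAM,
    // which is what lets non_blocking copies actually run asynchronously.
    std::vector<torch::Tensor> batches;
    for (int i = 0; i < 100; ++i)
        batches.push_back(torch::randn({64, 1024}).pin_memory());

    // Enqueue the copy of the first batch before entering the loop.
    auto current = batches[0].to(device, /*non_blocking=*/true);
    for (size_t i = 0; i < batches.size(); ++i) {
        auto x = current;
        // Enqueue the next host-to-device copy immediately instead of
        // waiting for this step to finish. (Full copy/compute overlap on
        // the GPU side would additionally need a separate CUDA stream.)
        if (i + 1 < batches.size())
            current = batches[i + 1].to(device, /*non_blocking=*/true);

        optimizer.zero_grad();
        auto loss = model->forward(x).sum();
        loss.backward();
        optimizer.step();
    }
}
```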
This is a follow-up report. I benchmarked again using three kinds of neural networks in PyTorch v1.8.0. See my article for details (Japanese only): https://qiita.com/koba-jon/items/59a64c6ec38ac7286d6b
As shown above, compared to before, "AE2d" in C++ is now much faster. Looking at the details, the convolutional layer and the transposed convolutional layer appear to be the bottleneck. I look forward to future improvements in the PyTorch C++ API for models 2 and 3.
Thanks a lot for such detailed experiments. One more thing I would like to share that I recently discovered: when transferring training data from RAM to GPU memory, people generally use pinned memory, a designated page-locked region of RAM from which copies into GPU memory are faster. I have seen this while working with TensorRT-related operations in C++, where an input tensor is allocated in a pinned memory area, and once the data is there, a memcpy call copies it from the pinned region to the GPU for further computation. This may give you another boost in your C++ timings. PS: Please write a paper or Medium article about your findings alongside the qiita.com blog, because the world needs to know about them. Keep it up.
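For what it's worth, LibTorch exposes pinned memory directly, so no hand-written memcpy is needed. A minimal sketch of the idea (shapes and names are illustrative, and it assumes a CUDA-enabled build):

```cpp
#include <torch/torch.h>

int main() {
    torch::Device gpu(torch::kCUDA);

    // Option 1: allocate a staging tensor directly in pinned RAM.
    auto staging = torch::empty({64, 3, 256, 256},
        torch::TensorOptions().dtype(torch::kFloat32).pinned_memory(true));

    // Option 2: make a pinned copy of an existing pageable CPU tensor.
    auto batch  = torch::randn({64, 3, 256, 256});
    auto pinned = batch.pin_memory();

    // From pinned memory, the host-to-device copy can be issued
    // asynchronously (cudaMemcpyAsync under the hood), so control
    // returns to the CPU immediately.
    auto on_gpu = pinned.to(gpu, /*non_blocking=*/true);

    // The copy may still be in flight; synchronize before timing it
    // or before the CPU reuses/overwrites the pinned buffer.
    torch::cuda::synchronize();
}
```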
Thank you for sharing this information. Please look forward to the follow-up report.
Hey, you have shown some amazing work training NNs in C++.
I would like to know your reasons for training models in C++ instead of Python. Once the model definition has been written in PyTorch in Python and the data pipeline has been set up, all the heavy computation is done on the GPU, so there shouldn't be drastic performance gains from migrating from Python to C++. Please share some of your thoughts on this.