OpenMP is not working #41
Comments
Hi Haobo. Sorry to hear you're experiencing performance problems. I can't comment much on threading in qdpxx, but I know most users have switched to the more performant qdp++ implementation, qdp-jit (https://github.com/JeffersonLab/qdp-jit, devel branch). Is there any reason you're using qdpxx instead of qdp-jit? I understand you're doing propagators using QUDA. This should work fine with qdp-jit. In case you're interested: I have a simple build package, and there should be a version to build qdp-jit/chroma/quda: https://github.com/fwinter/package. For a CUDA/QUDA build, for instance, you could do EDIT: before building everything check for the correct 'sm_xx' version in
Thank you, Frank. I don't have a specific preference. Yes, I use QUDA to calculate propagators, and I also use Chroma to write some contractions. I will try building with qdp-jit tomorrow. Is this going to solve the OpenMP problem? Thank you for sharing the build script! And why set -DQUDA_OPENMP=OFF (and not set it to ON in Chroma)?
I believe so. qdp-jit doesn't use CPU multithreading. It uses the GPU for parallelization.
Thank you! I'll let you know after I have tried the jit version.
Sorry, it took me some time to build the required packages. I tried to build qdp-jit with
I could reproduce this using gcc12; before, I had used gcc11. I'll fix this and let you know.
Thanks for the information. I have only
Committed changes to qdp-jit for gcc12. Your CUDA version 11.1 might cause you trouble when it comes to building QUDA. My guess is you need the latest version.
I posted an earlier version of this problem on the QUDA page, where I found that a lot of time was wasted when calculating the propagators, and that the time does not change appreciably no matter how I set `OMP_NUM_THREADS`. This is because OpenMP is not working at some stage. @SaltyChiang has pointed out on the QUDA page that `CMakeLists.txt` in the devel branch of qdpxx does not actually set `QDP_USE_OMP_THREADS`. I think this can be fixed in later versions.

I use the latest versions: QMP 2-5-4, QDP++ 1-46-0, QUDA 1.1.0, and Chroma 3-44-0, checked out to their development branches and all built with CMake, with `cc=mpicc`, the `-fopenmp` flag, and `-DQDP_USE_OPENMP=ON`, `-DQUDA_OPENMP=ON`, `-DChroma_ENABLE_OPENMP=ON`. The log of a typical propagator calculation shows a very low invertQuda / initQuda-endQuda ratio. If I use `top` to look at the process, I clearly see that the Chroma program uses only one thread.

However, after I modified the `CMakeLists.txt` of qdpxx, the Chroma program prints `QDP use OpenMP threading. We have x threads` as expected (it did not do so before the change), but the program still uses only one thread. Are there any possible problems going on here? I have checked with a simple C++ program that OpenMP works on the cluster.
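
For reference, a standalone check of the kind mentioned above could look roughly like the sketch below. It is not the program that was actually run on the cluster, and the function names (`count_parallel_threads`, `compiled_with_openmp`, `report_openmp`) are made up for illustration. It only tests whether the compiler enabled OpenMP and how many threads enter a parallel region; it says nothing about how qdpxx or Chroma use threads internally.

```cpp
#include <cstdio>

// Hypothetical helper: counts how many threads actually enter a parallel
// region. Each thread adds 1 to its private copy of `count`, and the
// reduction sums the copies. Without -fopenmp the pragma is ignored and
// the result is 1.
int count_parallel_threads() {
    int count = 0;
    #pragma omp parallel reduction(+:count)
    {
        count += 1;
    }
    return count;
}

// Hypothetical helper: true when the compiler was invoked with OpenMP
// enabled (the _OPENMP macro is defined by the OpenMP specification).
bool compiled_with_openmp() {
#ifdef _OPENMP
    return true;
#else
    return false;
#endif
}

// Prints both pieces of information; call this from main().
void report_openmp() {
    std::printf("compiled with OpenMP: %s\n",
                compiled_with_openmp() ? "yes" : "no");
    std::printf("threads in parallel region: %d\n",
                count_parallel_threads());
}
```

Calling `report_openmp()` from `main`, compiling with `-fopenmp`, and running with different values of `OMP_NUM_THREADS` should change the reported thread count if OpenMP is working in that environment.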