Openmp is not working #41

wittscien · 2022-11-18T14:04:42Z

I posted an earlier version of this problem on the QUDA page, where I found that a lot of time was wasted when calculating propagators, and that the time stays essentially the same no matter how I change OMP_NUM_THREADS. This is because OpenMP is not working at some stage. @SaltyChiang pointed out on the QUDA page that the CMakeLists.txt in the devel branch of qdpxx does not actually set QDP_USE_OMP_THREADS. I think this can be fixed in later versions.

I use the latest versions: QMP 2-5-4, QDP++ 1-46-0, QUDA 1.1.0, and Chroma 3-44-0, each checked out on its development branch and all built with CMake, with cc=mpicc, the -fopenmp flag, and -DQDP_USE_OPENMP=ON, -DQUDA_OPENMP=ON, -DChroma_ENABLE_OPENMP=ON. The log of a typical propagator calculation shows a very low invertQuda / initQuda-endQuda ratio. If I watch the process with top, I clearly see that the Chroma program uses only one thread.

However, after I modified the CMakeLists.txt of qdpxx, the Chroma output prints "QDP use OpenMP threading. We have x threads" as expected (it did not do so before the change), yet the program still uses only one thread. Is there anything else that could be going wrong here? I have checked with a simple C++ program that OpenMP works on the cluster.


fwinter · 2022-11-18T14:27:40Z

Hi Haobo. Sorry to hear you're experiencing performance problems. I can't comment much on threading in qdpxx but I know most users have switched to the more performant qdp++ implementation qdp-jit (https://github.com/JeffersonLab/qdp-jit, devel branch). Is there any reason you're using qdpxx instead of qdp-jit? I understand you're doing propagators using QUDA. This should work fine with qdp-jit. In case you're interested: I have a simple build package and there should be a version to build qdp-jit/chroma/quda.

https://github.com/fwinter/package

For a CUDA/QUDA build, for instance, you could do
./download_sources.sh
./download_sources_quda.sh
cd cuda-jit-quda/
./build_all.sh

EDIT: before building everything check for the correct 'sm_xx' version in build_quda.sh

wittscien · 2022-11-18T16:21:07Z

Thank you Frank, I don't have a specific preference. Yes, I use QUDA to calculate propagators, and I also use Chroma to write some contractions. I will try building with qdp-jit tomorrow. Is this going to solve the OpenMP problem? Thank you for sharing the build script! And why set -DQUDA_OPENMP=OFF (and not set it to ON in Chroma)?

fwinter · 2022-11-18T16:26:15Z

I believe so. qdp-jit doesn't use CPU multithreading. It uses the GPU for parallelization.
Good point. This switch could be turned on, I suppose.

wittscien · 2022-11-18T16:34:13Z

Thank you! I'll let you know after I have tried the jit version.

wittscien · 2022-11-24T16:04:01Z

Sorry, it took me some time to build the required packages. I tried to build qdp-jit with llvm 13.0.0 and gcc 12.2.0 (using the C++20 standard) and encountered an error (which repeats many times):
In file included from /.../gcc/12.2/include/c++/12.2.0/bits/unique_ptr.h:36,
from /.../gcc/12.2/include/c++/12.2.0/memory:76,
from /.../source/qdp-jit/lib/../include/qdp.h:77:
/.../gcc/12.2/include/c++/12.2.0/tuple:1595:45: note: declaration of ‘struct std::array<QDP::PScalarJIT<QDP::RScalarJIT<QDP::WordJIT<float> > >, 1>’
1595 | template<typename _Tp, size_t _Nm> struct array;
| ^~~~~
/.../source/qdp-jit/lib/../include/qdp_basejit.h: In instantiation of ‘class QDP::BaseJIT<QDP::RScalarJIT<QDP::WordJIT<float> >, 1>’:
/.../source/qdp-jit/lib/../include/qdp_primscalarjit.h:24:27: required from ‘class QDP::PScalarJIT<QDP::RScalarJIT<QDP::WordJIT<float> > >’
/.../source/qdp-jit/lib/../include/qdp_basejit.h:18:76: required from ‘class QDP::BaseJIT<QDP::PScalarJIT<QDP::RScalarJIT<QDP::WordJIT<float> > >, 1>’
/.../source/qdp-jit/lib/../include/qdp_primscalarjit.h:24:27: required from ‘class QDP::PScalarJIT<QDP::PScalarJIT<QDP::RScalarJIT<QDP::WordJIT<float> > > >’
/.../source/qdp-jit/lib/../include/qdp_viewleaf.h:35:28: required from here
/.../source/qdp-jit/lib/../include/qdp_basejit.h:9:21: error: ‘QDP::BaseJIT<T, N>::F’ has incomplete type
9 | std::array<T,N> F;
What did I do wrong? Thanks in advance!

fwinter · 2022-11-24T16:37:21Z

I could reproduce this using gcc12. Previously I had used gcc11. I'll fix this and let you know.
EDIT: On another note, I had found that CUDA is somewhat restrictive about the GCC versions that it supports. To build QUDA with gcc11 I had to install the very latest CUDA (v11.8). If you have gcc11 at hand it might be worth a shot. But I will fix the gcc12 issue for sure.

wittscien · 2022-11-25T02:44:08Z

Thanks for the information. I only have CUDA 11.1 and gcc 12.2; I can ask the admin to install other versions later and try to build again. (The gcc I built from source in my own directory does not work.)

fwinter · 2022-11-25T16:26:51Z

Committed changes to qdp-jit for gcc12. Your CUDA version 11.1 might cause you trouble when it comes to building QUDA. My guess is you need the latest version.
