nvtxConnector
NVIDIA Nsight profiling tools are commonly used to profile and debug CUDA programs on NVIDIA GPUs. When a Kokkos application runs with the CUDA backend on an NVIDIA GPU, how can one obtain the same profiling and debugging information in a form that is meaningful with respect to the Kokkos application? The Kokkos Tools nvtx-connector provides a solution.
You can find the Kokkos Tools nvtx-connector at:
https://github.com/kokkos/kokkos-tools/tree/develop/profiling/nvtx-connector
When NVIDIA Nsight tools are applied directly to a Kokkos parallel program, the output shows the mangled names of the Kokkos library functions invoked, which is neither meaningful nor insightful to the user. The purpose of the Kokkos Tools nvtx-connector is to replace those mangled names with user-provided labels, so that users can easily associate each Kokkos kernel with the corresponding Nsight profiling and debugging output.
The tool redirects Kokkos Tools event callbacks to NVIDIA's Nsight profiling tools when the CUDA device backend of Kokkos is used. It does so by invoking nvtxRangePush('name') in the begin callback and nvtxRangePop() in the end callback, where the name is the kernel label given by the user. For example, for Kokkos::parallel_for("myGreatParallelFor", A, 102), the name myGreatParallelFor would show up in the NVIDIA Nsight profile.
In this way, profiles are shown with respect to the names given to Kokkos kernels rather than the mangled names that would otherwise appear when profiling a Kokkos parallel program on an NVIDIA GPU directly. This gives the Kokkos user a more meaningful and easier-to-interpret profile.
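The begin/end pairing described above can be sketched as follows. This is a minimal illustration, not the connector's actual source: it assumes the callbacks do nothing more than open and close a named range, and it replaces the NVTX calls with hypothetical stand-ins (mock_range_push, mock_range_pop) so that it compiles without CUDA. The callback names and signatures follow the Kokkos Tools callback interface.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Stand-ins for the NVTX range calls so this sketch runs without CUDA.
// A real connector would call nvtxRangePush/nvtxRangePop here instead;
// the mock_* names and the two vectors below are hypothetical.
static std::vector<std::string> open_ranges;  // ranges currently open
static std::vector<std::string> event_log;    // what a profiler would record

static void mock_range_push(const char* name) {
  open_ranges.emplace_back(name);
  event_log.push_back(std::string("push:") + name);
}

static void mock_range_pop() {
  assert(!open_ranges.empty() && "pop without a matching push");
  event_log.push_back("pop:" + open_ranges.back());
  open_ranges.pop_back();
}

// Kokkos Tools callbacks. Kokkos invokes these around each parallel_for
// when a tool library is loaded; the user's kernel label arrives as `name`.
extern "C" void kokkosp_begin_parallel_for(const char* name,
                                           const uint32_t /*devID*/,
                                           uint64_t* /*kID*/) {
  mock_range_push(name);  // open a range labeled with the kernel name
}

extern "C" void kokkosp_end_parallel_for(const uint64_t /*kID*/) {
  mock_range_pop();  // close the most recently opened range
}
```

Calling kokkosp_begin_parallel_for("myGreatParallelFor", 0, &kid) followed by kokkosp_end_parallel_for(kid) leaves one push and one pop entry for myGreatParallelFor in event_log, which mirrors the labeled range a profiler would display around that kernel.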
There are two ways to build the nvtx-connector: (1) using make to create the library in the src directory and (2) using cmake to create the library in your specified install directory. Building with cmake is the recommended approach.
To use the Makefile, go to the source code directory for the nvtx-connector from the top-level Kokkos Tools directory and type make. This will generate the nvtx-connector dynamic library file (.so on most machines and .dylib on Mac) within that source code directory. Specifically,
- On the command line, type cd profiling/nvtx-connector;
- Open the Makefile and check that the compiler being used is available and is the one that you want, e.g., nvcc, gcc. If it is not the correct compiler, change it.
- Finally, type make; which generates kp_nvtx_connector.so in this directory.
Notes:
- You may have to type make clean before make if you have made modifications to your Kokkos Tools connector kp_nvtx_connector.cpp.
- The Makefile only touches the files kp_nvtx_connector.cpp and kp_nvtx_connector_domain.h in this directory.
Note that building the connector requires a Kokkos installation with the CUDA backend enabled. When building with cmake, ensure that Kokkos_DIR is set to the path of the Kokkos installation that has that CUDA backend. The Kokkos Tools cmake build checks whether Kokkos_ENABLE_CUDA is set in order to enable the nvtx-connector.
To use this connector, you must have an application that uses the Kokkos CUDA device backend. The host backend can be Serial, OpenMP, or C++ threads. NVIDIA Nsight can profile both the host and device backends. To profile the application, type:
export KOKKOS_TOOLS_LIBS=${INSTALL_Path_to_KTOOLS}/libkp_nvtx_connector.so; ncu --nvtx -o prof myKokkosApp.exe
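Alternatively, instead of setting the environment variable, you can pass the connector library to the application on its command line when running NVIDIA Nsight profiling, by typing: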
ncu profile myKokkosApp.exe --kokkos-tools-libs=${INSTALL_Path_to_KTOOLS}/libkp_nvtx_connector.so
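The two ways of naming the connector library above (the KOKKOS_TOOLS_LIBS environment variable and the --kokkos-tools-libs command-line option) can be thought of as two inputs to a single lookup. The sketch below illustrates that idea only. The helper resolve_tools_lib is hypothetical and is not part of Kokkos, and it assumes that a command-line option takes precedence over the environment variable.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical helper (not a Kokkos API): returns the tool library path
// that would be used, or an empty string if none was given.
// Assumption: a --kokkos-tools-libs= argument overrides the environment.
std::string resolve_tools_lib(const std::vector<std::string>& args,
                              const char* env_value) {
  const std::string flag = "--kokkos-tools-libs=";
  for (const std::string& arg : args) {
    // Match arguments that begin with the flag and return the path after '='.
    if (arg.compare(0, flag.size(), flag) == 0) {
      return arg.substr(flag.size());
    }
  }
  // Fall back to the environment variable value, if it is set.
  return env_value ? std::string(env_value) : std::string();
}
```

In an application, this would be called as resolve_tools_lib(args, std::getenv("KOKKOS_TOOLS_LIBS")), where args holds the program's command-line arguments.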
For more detailed information on Nsight Compute (ncu), see the documentation at developer.nvidia.com/Nsight-compute/. An informative July 2023 tutorial on Nsight Systems from LRZ is available at: https://doku.lrz.de/files/29609547/36864865/1/1689006900610/Intro_Nsight+Systems.pdf.
SAND2017-3786