nvtxConnector

Vivek Kale edited this page Jul 17, 2024 · 9 revisions

Summary

NVIDIA Nsight profiling tools are commonly used to profile and debug CUDA programs on NVIDIA GPUs. When running a Kokkos parallel application with the CUDA backend (on an NVIDIA GPU), how can one obtain that same profiling and debugging information in a way that is meaningful with respect to the Kokkos application? The Kokkos Tools nvtx-connector provides a solution.

You can find the Kokkos Tools nvtx-connector at:

https://github.com/kokkos/kokkos-tools/tree/develop/profiling/nvtx-connector

Purpose

When applying NVIDIA Nsight tools directly to a Kokkos parallel program, the output gives the mangled names of the Kokkos library functions invoked, which is not meaningful or insightful to the user. The purpose of the Kokkos Tools nvtx-connector is to replace these mangled names with readable ones, so that users can easily associate each Kokkos kernel invoked with the corresponding NVIDIA Nsight profiling and debugging output.

Functionality

The tool redirects Kokkos Tools event callbacks to NVIDIA's Nsight profiling tools when using the CUDA device backend of Kokkos. It does so by invoking nvtxRangePush('name') in the begin callback and nvtxRangePop() in the end callback, where name is the kernel name given by the user. For example, with Kokkos::parallel_for("myGreatParallelFor", A, 102), the name myGreatParallelFor shows up in the NVIDIA Nsight profile.

In this way, profiles are shown with respect to the user-given Kokkos kernel names rather than the mangled names that would otherwise appear when applying Nsight tools directly to a Kokkos parallel program run on an NVIDIA GPU. This gives the Kokkos user a more meaningful and easier-to-interpret profile.
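The begin/end pairing described above can be illustrated with a minimal sketch. It is not the connector's actual source: the two callbacks use the Kokkos Tools signatures for parallel_for events, but range_push, range_pop, open_ranges and closed_ranges are hypothetical stand-ins for the NVTX calls (nvtxRangePushA and nvtxRangePop from nvToolsExt.h), introduced so that the example compiles without the CUDA toolkit.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-ins for NVTX so this sketch builds without CUDA.
// A real connector would call nvtxRangePushA(name) and nvtxRangePop()
// from <nvToolsExt.h> instead of recording names in these vectors.
std::vector<std::string> open_ranges;    // ranges currently open (a stack)
std::vector<std::string> closed_ranges;  // ranges closed so far, in order

void range_push(const char* name) { open_ranges.emplace_back(name); }

void range_pop() {
  assert(!open_ranges.empty() && "range_pop without a matching range_push");
  closed_ranges.push_back(open_ranges.back());
  open_ranges.pop_back();
}

// Called by Kokkos Tools when a parallel_for begins; `name` is the label
// the user passed to Kokkos::parallel_for.
extern "C" void kokkosp_begin_parallel_for(const char* name,
                                           const uint32_t /*devID*/,
                                           uint64_t* /*kID*/) {
  range_push(name);
}

// Called by Kokkos Tools when that parallel_for ends; closes the range
// opened by the matching begin callback.
extern "C" void kokkosp_end_parallel_for(const uint64_t /*kID*/) {
  range_pop();
}
```

Because the range is opened and closed around each kernel, the profiler timeline can show the label (for example myGreatParallelFor) instead of a mangled symbol.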

Building

There are two ways to build the nvtx-connector: (1) using make to create the library in the src directory and (2) using cmake to create the library in your specified install directory. Building with cmake is the recommended approach.

Using make

To use the Makefile, simply go to the source code directory for nvtx-connector from the top-level Kokkos Tools directory, and then type make. This will generate the nvtx-connector dynamic library file (.so on most machines and .dylib on Mac) within that source code directory. Specifically,

  1. On the command-line type cd profiling/nvtx-connector;

  2. Go into the Makefile and check that the compiler being used is available and the one that you want, e.g., nvcc, gcc. If it is not using the correct compiler, change it.

  3. Finally, type make; which generates kp_nvtx_connector.so in this directory.

Notes:

  • You may have to type make clean before make if you have made modifications to your Kokkos Tools connector kp_nvtx_connector.cpp.

  • The Makefile only touches the files kp_nvtx_connector.cpp and kp_nvtx_connector_domain.h in this directory.

Using cmake

Building with cmake requires a Kokkos installation with the CUDA backend enabled. When building with cmake, set Kokkos_DIR to the path of that Kokkos installation. The Kokkos Tools cmake checks whether Kokkos_ENABLE_CUDA is set in order to enable the nvtx-connector.

Usage

To use this connector, you must have an application that uses the Kokkos CUDA device backend. Note that the host backend can be serial, OpenMP or C++ threads. NVIDIA Nsight profiling can profile both the host and device backends. To profile, set KOKKOS_TOOLS_LIBS to the connector library and run the application under the profiler:

export KOKKOS_TOOLS_LIBS=${INSTALL_Path_to_KTOOLS}/libkp_nvtx_connector.so ; ncu -o prof myKokkosApp.exe -t nvtx

Alternatively, you can run NVIDIA Nsight profiling by typing:

ncu myKokkosApp.exe --kokkos-tools-libs=${INSTALL_Path_to_KTOOLS}/libkp_nvtx_connector.so

For more detailed information on Nsight Compute (ncu), check out the documentation at developer.nvidia.com/Nsight-compute/. You can also find an informative tutorial on Nsight Systems from LRZ (July 2023) at: https://doku.lrz.de/files/29609547/36864865/1/1689006900610/Intro_Nsight+Systems.pdf.