Profiling

Shaokai (Jerry) Lin edited this page Apr 24, 2023 · 4 revisions

In addition to our tracing tools for C and C++, profiling can be a valuable tool for analyzing code performance and finding bottlenecks. This page gives a brief overview of some profiling tools and their usage.

Valgrind

Valgrind is best known for its memcheck tool, but it also ships with a code profiler called callgrind and a cache-miss analyzer called cachegrind. To use callgrind, run the binary under Valgrind: valgrind --tool=callgrind <binary> <args...>. To use cachegrind, replace callgrind with cachegrind in that command. Each tool writes an output file named callgrind.out.nnnnnn or cachegrind.out.nnnnnn, where nnnnnn is an integer (the process ID of the profiled run). These files can be opened with a tool called kcachegrind, which shows how often each function is called and what percentage of the overall execution time it is responsible for. For each function, kcachegrind can show a call graph (see the screenshot below) and provide detailed performance characteristics down to the instruction level. The output of kcachegrind is most useful if debug symbols are present. Therefore, an LF program should be compiled either with the target property build-type: RelWithDebInfo or by passing the --build-type RelWithDebInfo command line argument to lfc.

[Screenshot: kcachegrind call graph]
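The workflow above can be sketched as a pair of shell functions. This is a minimal sketch rather than an official script: the function names latest_profile and profile_with_valgrind are hypothetical, and it assumes that valgrind and kcachegrind are on the PATH and that the output files land in the current directory.

```shell
#!/bin/sh
# Hypothetical helpers; the function names are not part of Valgrind or LF.

# Print the most recently written output file of the given tool
# (callgrind or cachegrind) in the current directory, or nothing if none exists.
latest_profile() {
    tool="${1:-callgrind}"
    ls -t "$tool".out.* 2>/dev/null | head -n 1
}

# Run a binary under the given Valgrind tool, then open the result in kcachegrind.
# Usage: profile_with_valgrind <callgrind|cachegrind> <binary> [args...]
profile_with_valgrind() {
    if [ "$#" -lt 2 ]; then
        echo "usage: profile_with_valgrind <callgrind|cachegrind> <binary> [args...]" >&2
        return 2
    fi
    tool="$1"
    shift
    # Everything left in "$@" is the binary and its arguments.
    valgrind --tool="$tool" "$@" || return 1
    out="$(latest_profile "$tool")"
    if [ -n "$out" ]; then
        kcachegrind "$out"
    fi
}
```

A typical session might look like the following, where src/MyProgram.lf and bin/MyProgram are placeholder paths: first lfc --build-type RelWithDebInfo src/MyProgram.lf, then profile_with_valgrind callgrind bin/MyProgram.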

Linux Perf

The Linux perf tool, short for Performance Counters for Linux (PCL), is a versatile performance monitoring and profiling tool available in the Linux kernel. It allows users to analyze various aspects of system and application performance, such as CPU usage, cache misses, branch mispredictions, and other hardware-level events. The tool is useful for identifying performance bottlenecks, optimizing code, and tuning system settings. This tool is only available on Linux, however.

Follow these steps to profile the performance of an LF application.

  1. Compile the LF program with the -g flag so that debug symbols are available. This can be done by running cmake with CMAKE_BUILD_TYPE set to Debug in the src-gen folder of the LF application.

  2. Run the LF program with perf record. One specific command that works well is

sudo perf record -g --user-callchains -F max -- <lf_bin> -o <timeout_value>

where -g enables call-graph recording, --user-callchains restricts the recorded call chains to user space, -F max sets the sampling frequency to the maximum, and -- separates the perf flags from the command line of the user application (here, <lf_bin> run with the timeout <timeout_value>).

  3. Generate a human-readable report using sudo perf script > <report_name>

  4. Render a graphical view using the Firefox Profiler (or other UIs). Open the Firefox Profiler and upload the human-readable report file (<report_name> in the previous step). You should then see something like this.

[Screenshot: Firefox Profiler view of the perf report]
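Steps 2 and 3 above can be combined into one shell function. The following is a minimal sketch under stated assumptions: the function name perf_profile and the PERF and SUDO variables are hypothetical conveniences, not part of perf or LF. The perf flags are the ones from the command in step 2.

```shell
#!/bin/sh
# Hypothetical wrapper around steps 2 and 3.
# PERF and SUDO are assumptions of this sketch: override PERF to point at a
# specific perf binary, or set SUDO to empty when already running as root.
PERF="${PERF:-perf}"
SUDO="${SUDO-sudo}"

# Usage: perf_profile <report_name> <lf_bin> [lf args...]
perf_profile() {
    if [ "$#" -lt 2 ]; then
        echo "usage: perf_profile <report_name> <lf_bin> [lf args...]" >&2
        return 2
    fi
    report="$1"
    shift
    # Step 2: sample user-space call graphs at the maximum frequency.
    # perf writes its raw samples to perf.data in the current directory.
    $SUDO "$PERF" record -g --user-callchains -F max -- "$@" || return 1
    # Step 3: convert perf.data into a text report for the Firefox Profiler.
    $SUDO "$PERF" script > "$report"
}
```

For example, perf_profile report.txt ./MyProgram -o <timeout_value> records the run and leaves report.txt ready to upload, where ./MyProgram is a placeholder for the compiled LF binary.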

The perf tool can do more, such as profiling memory performance. See the perf man page for details.