Profiling
In addition to our tracing tools for C and C++, profiling can be a valuable tool for analyzing code performance and finding bottlenecks. This page gives a brief overview of some profiling tools and their usage.
Valgrind is best known for its memcheck tool, but it also ships with a code profiler called callgrind and a cache-miss analyzer called cachegrind. To use callgrind, run the binary to be analyzed under Valgrind: `valgrind --tool=callgrind <binary> <args...>`. To use cachegrind instead, replace `callgrind` with `cachegrind`. The tools write an output file named `callgrind.out.nnnnnn` or `cachegrind.out.nnnnnn`, respectively, where `nnnnnn` denotes an integer. These files can be opened with a tool called `kcachegrind`, which shows how often each function is called and what percentage of the overall execution time it accounts for. For each function, kcachegrind can show a call graph (see the screenshot below) and detailed performance characteristics down to the instruction level. Its output is most useful when debug symbols are present, so an LF program should be compiled either with the target property `build-type: RelWithDebInfo` or by passing the `--build-type RelWithDebInfo` command line argument to lfc.
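The Valgrind workflow above can be sketched as a pair of small shell helpers. This is only an illustration: `profile_with` and `newest_output` are hypothetical names, `./bin/MyProgram` is a placeholder for the binary under analysis, and the sketch assumes the output files land in the current directory.

```shell
#!/bin/sh
# Sketch of the Valgrind workflow described above.
# profile_with and newest_output are hypothetical helper names, and
# ./bin/MyProgram is a placeholder for the binary to analyze.

# Run a binary under the given Valgrind tool (callgrind or cachegrind).
profile_with() {
    tool="$1"
    shift
    valgrind --tool="$tool" "$@"
}

# Print the most recently modified <tool>.out.* file in the current directory.
newest_output() {
    ls -t "$1".out.* 2>/dev/null | head -n 1
}

# Example usage, left commented out so that sourcing this file runs nothing:
#   profile_with callgrind ./bin/MyProgram
#   kcachegrind "$(newest_output callgrind)"
```

Picking the newest file by modification time avoids having to copy the integer suffix by hand after each run.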
The Linux `perf` tool, short for Performance Counters for Linux (PCL), is a versatile performance monitoring and profiling tool available in the Linux kernel. It allows users to analyze various aspects of system and application performance, such as CPU usage, cache misses, branch mispredictions, and other hardware-level events. The tool is useful for identifying performance bottlenecks, optimizing code, and tuning system settings. It is only available on Linux, however.
Follow these steps to profile the performance of an LF application.
- Compile the LF program with the `-g` flag. This can be done by running `cmake` with `CMAKE_BUILD_TYPE` set to `Debug` in the `src-gen` folder of the LF application.
- Run the LF program with `perf record`. One specific command that works well is `sudo perf record -g --user-callchains -F max -- <lf_bin> -o <timeout_value>`, where `-g` enables call-graph recording, `--user-callchains` collects only user-space call chains, `-F max` sets the sampling frequency to the maximum, and `--` separates the `perf` flags from the application's own command line.
- Generate a human-readable report using `sudo perf script > <report_name>`.
- Render a graphical view using the Firefox Profiler (or another UI). Open the Firefox Profiler and upload the human-readable report file (`<report_name>` from the previous step). You should then see something like this.
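The record and report steps above can be wrapped in two shell functions, shown here as a minimal sketch rather than a recommended setup. The names `record_profile`, `export_report`, and the `PERF_DRY_RUN` variable are hypothetical; the dry-run mode exists only so the exact command lines can be inspected without `sudo` or `perf` being available.

```shell
#!/bin/sh
# Sketch of the perf steps above. PERF_DRY_RUN, record_profile, and
# export_report are hypothetical names introduced for illustration.

# Record a profile of an LF binary; all arguments are passed to the binary.
record_profile() {
    if [ "${PERF_DRY_RUN:-0}" = "1" ]; then
        # Dry run: print the command instead of executing it.
        printf '%s\n' "sudo perf record -g --user-callchains -F max -- $*"
    else
        sudo perf record -g --user-callchains -F max -- "$@"
    fi
}

# Convert the recorded data (perf.data in the current directory by default)
# into a text report that can be uploaded to the Firefox Profiler.
export_report() {
    if [ "${PERF_DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "sudo perf script > $1"
    else
        sudo perf script > "$1"
    fi
}

# Example usage (placeholders as in the steps above):
#   record_profile <lf_bin> -o <timeout_value>
#   export_report <report_name>
```

Setting `PERF_DRY_RUN=1` before calling either function prints the command that would be run, which is a cheap way to check the flags before starting a long recording.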
The `perf` tool can do more, such as profiling memory performance. We refer the user to the `perf` man page for more information.