proftool
is a utility for capturing and examining profiles of Java
programs. It's intended to provide machine level details about the execution to
aid with JIT performance analysis. It currently consists of three primary components:
- a JVMTI agent for capturing all the assembly generated by the JVM.
- a parser for Linux perf output
- a parser for HotSpot LogCompilation information
By combining these components into a single command line, the perf profile information can be attributed to the JIT code. Proftool can also capture and examine profiles of standalone executables built with Native Image. Profile collection currently only supports Linux perf though once the data is captured then the profile can be viewed anywhere.
This suite must be built first to create the JVMTI agent library and the suite is normally dynamically imported to make the commands available through mx.
There are currently 2 ways to capture profiles. The direct way is to use the profrecord
command which accepts either
a full Java command line, an executable compiled by Native Image, or a Truffle language launcher command line.
profrecord
launches the command line with extra arguments to collect the data.
$ mx profrecord -E fop /home/graal/oraclejdk1.8.0_291-jvmci-21.1-b03/bin/java -jar dacapo.jar fop -n 56
$ mx profrecord -E myprofile /.../graalvm/bin/js ...
$ mx profrecord -E image_profile ./my_native_image ...
The required argument -E <name>
specifies the directory that will contain the output the files. It's an error if it
already exists but the -O
will overwrite the directory if it already exists.
mx profrecord
should work with any Java command line you can supply.
The second way is to use the proftool
profiler with the mx benchmark
command. The proftool
profiler
conflicts with the rss
tracker which is enabled by default, so it currently must be explicitly turned off. For example, running:
$ mx benchmark dacapo:fop --tracker none -- --profiler proftool
produces a uniquely named directory like proftool_fop_2021-07-11_181530
which contains a profile of the full
benchmark run. Note that the --tracker
and --profiler
arguments are on different sides of the --
separator.
The same process applies to Native Image benchmarks, e.g.:
$ mx --env ni-ce benchmark dacapo-native-image:fop --tracker none -- --jvm=native-image --jvm-config=default-ce \
--profiler proftool
After capturing the profile, the perf binary output needs to be converted to text
using the profpackage
command because the current benchmark profiler infrastructure can't do that automatically.
$ mx profpackage proftool_fop_2021-07-11_181530
Created /home/graal/ws/graal/compiler/proftool_fop_2021-07-11_181530.zip
This command additionally packs everything into a zip to make it easy to move profiles around,
though that step can be skipped with the -n
option. Both the zip and directory form can be used directly
by all the proftool
commands so use whichever form is most convenient. Packaging as a zip it intended to
simplify capturing the profile on a Linux machine then moving it to another machine for analysis.
Not all benchmark suites actually support the --profiler
option even though it's broadly advertised in the
help output. At the current time, only the dacapo
, scala-dacapo
, renaissance
and renaissance-legacy
suites fully support the --profiler
option. Their Native Image counterparts (e.g., dacapo-native-image
) are also
supported. The JMH
benchmarks only correctly support it when the JMH option -f 0
is used to suppress forking by the
harness. Note that this changes the way the benchmark is run and might not produce the same results. With JMH
you
can always fall back to the JMH
perfasm
profiler which is sufficient for most uses.
For other benchmarks, you can always use mx profrecord
directly by capturing the command
line used to launch the benchmark with mx -v benchmark ...
and then passing that to mx profrecord
. For example,
for the fop benchmark above:
$ mx -v benchmark dacapo:fop --tracker none
displays a very long command line in the middle of all the output that includes something like the follow:
/home/tkrodrig/ws/unroll/graal/compiler/mxbuild/linux-amd64/graaljdks/jdk1.8-cmp/bin/java -server ... \
-jar /home/graal/.mx/cache/DACAPO_e39957904b7e79caf4fa54f30e8e4ee74d4e9e37/dacapo.jar fop -n 56
For compactness, the sample output above doesn't include all required arguments.
Copy and paste that line to use it with mx profrecord
:
$ mx profrecord -E fop /home/tkrodrig/ws/unroll/graal/compiler/mxbuild/linux-amd64/graaljdks/jdk1.8-cmp/bin/java -server ... \
-jar /home/graal/.mx/cache/DACAPO_e39957904b7e79caf4fa54f30e8e4ee74d4e9e37/dacapo.jar fop -n 56
The primary command for examining profiles is profhot
. This gives a summary
of the top C functions and the top JIT methods. The JIT methods are additionally disassembled and
annotated with the profile information. Truncated sample output looks like this:
$ mx profhot proftool_fop_2021-07-11_181530.zip
Hot C functions:
Percent Name
5.39% SpinPause
2.07% ParMarkBitMap::mark_obj
1.98% ParallelCompactData::add_obj
0.95% GreyObjectsWalker_walkGreyObjects_e59a80a81022647b793f2f995d251cf18cbcbab4
0.79% ControlFlowGraph_identifyBlocks_91c6bd2376323c10cda627f1ed5130f90d88a82f
0.69% EconomicMapImpl_compareKeys_139d60c35d64f232b5b9fac87391585f45801e09
0.67% ArrayList_add_43f6399c3b46d50c69e7ebaee5e44ba3b6a870f3
0.58% ControlFlowGraph_identifyBlock_7f27028f31036b771721050156eba6ce33899dfa
0.55% SchedulePhase$Instance_scheduleEarliestIterative_f11f1d84d698be305ec40ea50fa995f7185d393f
0.54% EconomicMapImpl_find_70589f86ac2abbc40566645a1af83af4aa5c89f0
Hot generated code:
Percent Compile Id Name
1.97% 2557 sun.misc.FloatingDecimal$BinaryToASCIIBuffer.dtoa(int, long, int, boolean)
1.86% 3101 org.apache.fop.render.ps.PSRenderer.renderText(org.apache.fop.area.inline.AbstractTextArea, java.lang.String, int[])
1.27% Interpreter
1.12% 3353 org.apache.fop.fo.FObj.addChildNode(org.apache.fop.fo.FONode)
0.75% 3112 java.text.DigitList.set(boolean, double, int, boolean)
0.63% 1357 org.apache.fop.fo.properties.PropertyMaker.get(int, org.apache.fop.fo.PropertyList, boolean, boolean)
0.61% 3145 org.apache.fop.layoutmgr.inline.TextLayoutManager.getNextKnuthElements(org.apache.fop.layoutmgr.LayoutContext, int)
0.51% 3110 java.text.DecimalFormat.subformat(java.lang.StringBuffer, java.text.Format$FieldDelegate, boolean, boolean, int, int, int, int)
0.34% 2787 org.apache.fop.layoutmgr.inline.TextLayoutManager.addAreas(org.apache.fop.layoutmgr.PositionIterator, org.apache.fop.layoutmgr.LayoutContext)
0.32% 2738 org.apache.fop.layoutmgr.inline.TextLayoutManager.createTextArea(org.apache.fop.traits.MinOptMax, int, org.apache.fop.layoutmgr.LayoutContext, int, int, int, boolean)
sun.misc.FloatingDecimal$BinaryToASCIIBuffer.dtoa(int, long, int, boolean)
0x7f7a69251e80-0x7f7a69252d38 (samples=613, period=2591391597)
Hot region 1
0x7f7a69252082: sub ecx, edi ; sun.misc.FloatingDecimal$BinaryToASCIIBuffer.dtoa:252
0x7f7a69252084: mov edx, ecx ; sun.misc.FloatingDecimal$BinaryToASCIIBuffer.dtoa:287
0x7f7a69252086: add edx, r13d
0x7f7a69252089: add edx, dword ptr [rax + rsi*4 + 0x10] ; sun.misc.FloatingDecimal$BinaryToASCIIBuffer.dtoa:307
0.16% 0x7f7a6925208d: cmp edx, 0x40 ; sun.misc.FloatingDecimal$BinaryToASCIIBuffer.dtoa:330
0x7f7a69252090: jge 0x7f7a69252a62
0x7f7a69252096: sub r10d, r14d ; sun.misc.FloatingDecimal$BinaryToASCIIBuffer.dtoa:372
0x7f7a69252099: sub r10d, edi ; sun.misc.FloatingDecimal$BinaryToASCIIBuffer.dtoa:259
0x7f7a6925209c: mov edi, r10d ; sun.misc.FloatingDecimal$BinaryToASCIIBuffer.dtoa:294
0x7f7a6925209f: add edi, dword ptr [rax + rbx*4 + 0x10]
0x7f7a692520a3: inc edi
0x7f7a692520a5: cmp edi, 0x40
0x7f7a692520a8: jge 0x7f7a69252c6f
...
Every piece of generated code is disassembled and broken into hot regions where ticks are clustered. This gives a general overview of the hot parts of the execution.
mx profhot
can also summarize profiles of native images, but the output is slightly different. To view the annotated
code, it is necessary to run the perf command given by profhot
. For example:
$ mx profhot proftool_*
Hot code:
Percent Name
10.40% com.oracle.svm.core.genscavenge.GreyObjectsWalker.walkGreyObjects()
4.06% [unknown]
3.26% java.util.HashMap.computeIfAbsent(Object, Function)
2.54% java.util.stream.AbstractPipeline.copyInto(Sink, Spliterator)
2.29% java.util.stream.AbstractPipeline.evaluate(TerminalOp)
2.12% java.util.HashMap$EntrySpliterator.forEachRemaining(Consumer)
2.06% java.util.regex.Pattern$CharPropertyGreedy.match(Matcher, int, CharSequence)
1.92% java.util.stream.Collectors.lambda$groupingBy$53(Function, Supplier, BiConsumer, Map, Object)
1.88% java.util.stream.ReferencePipeline.collect(Collector)
1.87% java.lang.StringLatin1.toUpperCase(String, byte[], Locale)
Display annotated code by running:
perf report -Mintel --sort symbol -i /path/to/perf_binary_file
Proftool displays only one section where symbols might come from different executables or libraries. This is because a native image may statically link non-Java code, and proftool cannot distinguish Java symbols from foreign symbols. As another consequence, non-Java method names might be formatted as if they were Java names.
An [unknown]
name might come from a dynamic library or the kernel if perf cannot resolve the symbols. Run with
--dso
to display the name of the library. If there are no Java symbols at all, try building the executable with -g
so that debug info is generated. Read the manual
to learn more about debugging native images.
Proftool provides limited extra support for Truffle programs at the moment. The only extra information printed is the Truffle provided name of the generated nmethod when printing the nmethod names. This results in summary output like this for the octane splay benchmark:
Hot generated code:
Percent Name
3.04% 10260: "MeasureDefault#2" OptimizedCallTarget.profiledPERoot(Object[])
1.79% 5934: "GeneratePayloadTree#2" OptimizedCallTarget.profiledPERoot(Object[])
0.74% 10118: "SplayTree.remove#2" OptimizedCallTarget.profiledPERoot(Object[])
0.65% Interpreter
0.18% 7976: "SplayRun#2" OptimizedCallTarget.profiledPERoot(Object[])
0.16% 11695: FastDtoa.digitGen(DiyFp, DiyFp, DiyFp, DtoaBuffer, int)
0.13% 8914: "SplayTree.splay_#2" OptimizedCallTarget.profiledPERoot(Object[])
0.07% 1345: OptimizedCallTarget.callBoundary(Object[])
0.07% jshort_disjoint_arraycopy
0.07% 7877: "SplayTree.splay_#1" OptimizedCallTarget.profiledPERoot(Object[])
OptimizedCallTarget.profiledPERoot
is the Java method that is used by the partial evaluator when compiling Truffle methods and the name
in quotes should correspond to the compiled JavaScript function. More support is planned for future versions of proftool.
Other options to profhot give more control over how this is printed. It's possible to strip package names from the output
using the '-s' option which can make the output more compact. The debug information associated with a pc
can be
quite deep which will overwhelm the actual assembly output. The '-c' option can be used to control the number of frames printed,
so passing 0
will hide the frame information completely and 1
will show just deepest inline frame.
Proftool has a functionality to check the relative frequencies of the basic blocks that the graal compiler computes during its transformations.
It works by telling the compiler to dump the relative frequencies of blocks during the code emittion. To enable it, use the debug option -Dgraal.PrintBBInfoPath
with the path where to dump the basic block information. During compilation the compiler will create a new file in the provided directory (if provided otherwise nothing is dumped) for each compilation and write the block information in it. To simplify its use, proftool can automatically manage this option. By using the -B
or --with-basic-block-info
option of the mx profrecord
command, it will create a directory called block_info
inside the experiment directory and tell the compiler to dump the block information into it. Similarly with mx benchmarks
, one can use the proftool-with-blocks
profiler to enable the dumping of block information in the experiment directory.
To check the blocks relative frequencies, use the mx checkblocks
command. It requires a path to a proftool experiment directory. If the block_info
directory is missing from this experiment or if an other one should be used, it can be provided using the -b
or --block-info
options. Otherwise it will use the block_info
directory from the proftool experiment. Checkblocks firstly counts the number of perf sample and their total period for each blocks. Then for the hottest methods, it performs two kind of checks.
- The first one computes the relative frequency of each block with respect to the first block and compares it with the relative frequency from the compiler. Note that if the first block didn't got any sample this check won't be done and a warning will be printed.
- The second check computes and compares the relative frequency of each block with respect to the most frequent block. Note that if the hottest block from the compiler and from perf aren't the same the check won't be performed and a warning will be printed. Additionally if the 5 hottest blocks from the compiler and from perf are disjoint sets then something is probably wrong so an error is printed.
These checks are not perfect and so to manually look at the data, checkblocks can print it using the
-r
or--raw
flag.