The ``gcc`` binary is actually a relatively small "driver" program, which
parses some command-line options and then invokes one or more other
programs to do the real work.

Consider compiling a simple "hello world" C program:

.. literalinclude:: hello.c
   :language: c

to generate an ``a.out`` binary:

.. code-block:: console

   $ gcc hello.c
   $ ./a.out
   Hello world

Internally, the driver will invoke ``cc1`` (the C compiler), which
converts the ``.c`` code to a ``.s`` assembler file. Assuming this
succeeds, the driver will typically then invoke ``as`` (the assembler),
and then the linker.

Given that, how do we debug the C compiler? The easiest way is to add
``-wrapper gdb,--args`` to the ``gcc`` command line:

.. code-block:: console

   # Invoke "cc1" (and "as", etc) under gdb:
   $ gcc hello.c -wrapper gdb,--args

The ``gcc`` driver will then invoke ``cc1`` under ``gdb``, and you can
set breakpoints and step through the code.

.. note::

   If you ever need to debug the driver itself, you can simply run it
   under ``gdb`` in the normal way:

   .. code-block:: console

      # Invoke the "gcc" driver under gdb:
      $ gdb --args gcc hello.c

   I find myself doing this much less frequently than the
   ``-wrapper gdb,--args`` invocation for debugging ``cc1``, though.

You can invoke other debugging programs this way, for example,
``valgrind``:

.. code-block:: console

   # Invoke "cc1" (and "as", etc) under valgrind:
   $ gcc hello.c -wrapper valgrind

.. note::

   For good results under ``valgrind``, it's best to configure your build
   of gcc with :option:`--enable-valgrind-annotations`, which
   automatically suppresses various known false positives.

The source tree contains two support scripts that significantly improve
the debugging experience within ``gdb``, but some setup is required.

``gcc/configure`` (from ``configure.ac``) automatically generates a
``.gdbinit`` within the ``gcc`` subdirectory of the build directory.
This should be automatically detected and run by ``gdb``. However, you
may see a message from ``gdb`` of the form:

.. code-block:: none

   "path-to-build/gcc/.gdbinit" auto-loading has been declined by your `auto-load safe-path'

as a protection against untrustworthy Python scripts. See
http://sourceware.org/gdb/onlinedocs/gdb/Auto_002dloading-safe-path.html

The fix is to mark the paths of the ``build/gcc`` directory as
trustworthy. An easy way to do so is by adding the following to your
``~/.gdbinit`` script:

.. code-block:: none

   add-auto-load-safe-path /absolute/path/to/build/gcc

for the build directories for your various checkouts of gcc.

If it's working, you should see the message:

.. code-block:: none

   Successfully loaded GDB hooks for GCC

as ``gdb`` starts up.

The generated ``.gdbinit`` script loads two files:

- ``gcc/gdbinit.in`` defines useful commands in ``gdb``'s own language,
  sets up useful breakpoints, and skips over some very heavily used
  inline functions.

- ``gcc/gdbhooks.py`` injects useful Python code into ``gdb``, for
  pretty-printing important data types.

See those two files for more information.

Consider this line of code:

.. code-block:: c

   return optimize > 0 && flag_forward_propagate;

It might be reasonable to presume that these are variables that can be
inspected in the debugger, but, despite the lack of block capitals,
they're actually macros, and hence the "obvious" approach fails:

.. code-block:: none

   (gdb) print optimize
   No symbol "optimize" in current context.
   (gdb) print flag_forward_propagate
   No symbol "flag_forward_propagate" in current context.

They're autogenerated preprocessor macros: during the build,
``BUILDDIR/gcc/options.h`` is written out, and contains code like this:

.. code-block:: c

   #ifdef GENERATOR_FILE
   extern int optimize;
   #else
     int x_optimize;
   #define optimize global_options.x_optimize
   #endif

and:

.. code-block:: c

   #ifdef GENERATOR_FILE
   extern int flag_forward_propagate;
   #else
     int x_flag_forward_propagate;
   #define flag_forward_propagate global_options.x_flag_forward_propagate
   #endif

Hence they're only variables when ``GENERATOR_FILE`` is defined (when
building certain build-time support files); for the common case of
the compiler and driver, these are actually fields within the
``global_options`` struct, with an ``x_`` prefix.

Hence, to read these values when debugging, you would use the following:

.. code-block:: none

   (gdb) print global_options.x_optimize
   $1 = 3
   (gdb) print global_options.x_flag_forward_propagate
   $2 = 1

reflecting that, in this case, :option:`-O3` was supplied on the
command line, and that it implicitly enabled
:option:`-fforward-propagate`.

If you have a tree node, you can put a watchpoint on the memory location
representing its tree code. This will trigger as the tree node is
created, which can be helpful for detecting, say, where in a front-end
something is built. The memory location might be modified a few times
before the node is allocated.

For example, when tracking down where a particular ``IDENTIFIER_NODE``
was built (to fix a bogus suggestion in the C++ frontend):

.. code-block:: none

   (gdb) p suggestion
   $5 = <identifier_node 0x7ffff0d10600 ._61>
   (gdb) p suggestion->base.code
   $6 = IDENTIFIER_NODE

Here I put the watchpoint on it:

.. code-block:: none

   (gdb) watch -l suggestion->base.code
   Hardware watchpoint 10: -location suggestion->base.code

On re-running, it takes a few writes before we hit the creation of
the ``IDENTIFIER_NODE``:

.. code-block:: none

   (gdb) run
   The program being debugged has been started already.
   Start it from the beginning? (y or n) y
   [...snip...]
   Hardware watchpoint 10: -location suggestion->base.code
   Old value = <unreadable>
   New value = 2947526575
   memset () at ../sysdeps/x86_64/memset.S:69
   69        movdqu %xmm8, -16(%rdi,%rdx)
   (gdb) cont
   Continuing.
   Hardware watchpoint 10: -location suggestion->base.code
   Old value = 2947526575
   New value = ERROR_MARK
   memset () at ../sysdeps/x86_64/memset.S:69
   69        movdqu %xmm8, -16(%rdi,%rdx)
   (gdb) cont
   Continuing.
   Hardware watchpoint 10: -location suggestion->base.code
   Old value = ERROR_MARK
   New value = IDENTIFIER_NODE
   make_node (code=IDENTIFIER_NODE) at ../../src/gcc/tree.c:1035
   1035      switch (type)

At this point, we can examine the backtrace and see what created the node.

Similar techniques can be used to track down where gimple statements are
created, and so on.

GCC uses ``location_t`` to track locations in the user's source code.
This data type is effectively a key into a database and, because it has
to pack information into a limited number of bits, it is encoded in a
non-trivial way.

A handy trick for debugging locations is to call ``inform`` from within
the debugger, which emits a "note" diagnostic at a particular
``location_t``:

.. code-block:: none

   (gdb) call inform (loc, "")
   test.c: In function ‘fn_1’:
   test.c:15:7: note:
      15 |   if (flag)
         |       ^~~~

A couple of caveats with this:

- the diagnostics subsystem doesn't print the source code if the
  ``location_t`` is the same as that of the previously emitted diagnostic

- the diagnostics subsystem is not re-entrant, so you can't use this when
  you're inside the diagnostic emission code

TODO:

- howto: stepping through the compiler, stepping through a pass

- talk about dumpfiles also