What is all this stuff in the Toolchain?
Why are all these tools necessary and what does all of this 'arm-none-eabi' stuff mean ... All of this is fairly basic if you have a lot of experience in C, and I'll gloss over the difficult bits so feel free to skip this section.
So first things first: the compiler takes a high-level language (in our case C with preprocessor statements) and converts it into low-level code in a number of steps:
-
We start with C files that include preprocessor directives (#define's and #include's)
-
First the preprocessor runs, includes all the necessary files and expands the macros. This is typically a 'virtual' step, but if you'd like to see the results of this step, you can run the compiler with the -E flag. This option is not really standardized but almost any compiler will support it. In our case:
$ arm-none-eabi-gcc -E main.c
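As a small illustration of what the preprocessor does, here is a sketch with two made-up macros (LED_COUNT and SQUARE are hypothetical names, not part of the project). The comment shows what the function body should look like after macro expansion, which is the form -E would print:

```c
/* Hypothetical macros, used only to illustrate preprocessing. */
#define LED_COUNT 4
#define SQUARE(x) ((x) * (x))

/* After preprocessing, the #define lines are gone and the return
 * statement below reads:
 *     return ((4) * (4));
 */
int led_count_squared(void) {
    return SQUARE(LED_COUNT);
}
```

The extra parentheses in SQUARE are there because the preprocessor only substitutes text; without them an argument like 1 + 1 would expand into an expression with the wrong precedence.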
-
Next, the C code is compiled. The result of this step is (typically processor-specific) assembly language. Again, you usually don't see the result of this step, but you can by using the -S compiler flag. The generated assembler code corresponds (almost) exactly to what the processor will later run, and this is the last step of the compilation process that results in anything remotely human readable. Examining assembly is useful to learn what is actually happening on the processor and has one other important use. Most compilers support options for various optimizations. If you suspect that the compiler might be screwing up your code through optimization, the -S flag is a useful debugging feature to help determine how the compiler is optimizing your code.
-
The resulting assembly files are then assembled into equivalent object code. This means that the human readable assembly language is converted into its opcode / machine code equivalent (i.e. the 0's and 1's the processor can understand). At this point the processor still couldn't execute the code though, because it still contains a number of references and symbols that need to be resolved in a further step. This probably sounds a bit abstract...
Example:
int main () {
    int i;
    for (i=0; i!=10; ++i) {}
}
The generalized form of the for loop is:
for (initialization; test; increment/decrement)
A variable is initialized to a value, each iteration checks whether the test is true, executing the loop while the condition holds. The increment/decrement step is performed AFTER each iteration of the loop.
Since the object code is for machines and it's equivalent to the assembly code, we'll generate assembly using arm-none-eabi-gcc -S. We'll discuss it step by step.
.cpu arm7tdmi
.fpu softvfp
.eabi_attribute 20, 1
.eabi_attribute 21, 1
.eabi_attribute 23, 3
.eabi_attribute 24, 1
.eabi_attribute 25, 1
.eabi_attribute 26, 1
.eabi_attribute 30, 6
.eabi_attribute 18, 4
.file "test.c"
.text
.align 2
.global main
.type main, %function
We'll gloss over the first section. It contains some instructions for the assembler, for example what sort of machine code it should generate (.cpu arm7tdmi) and that the target CPU doesn't support hardware floating point operations, so the compiler should generate code to deal with floating point values in software (.fpu softvfp).
main:
@ Function supports interworking.
@ args = 0, pretend = 0, frame = 8
@ frame_needed = 1, uses_anonymous_args = 0
@ link register save eliminated.
str fp, [sp, #-4]!
add fp, sp, #0
sub sp, sp, #12
Since main is a function, the compiler generated code to set it up for being called, and the processor needs to remember where in the code to return to when the function completes. The way this information is handled varies depending on your platform. Different architectures provide hardware features and conventions for handling this in a way that keeps generated object code interoperable. These conventions are known as the ABI (application binary interface). Without getting into too much detail, ARM uses a stack and has hardware registers known as sp, lr and fp (stack pointer, link register and frame pointer) to manage function calls. The first three lines of assembly (not starting with @) are the basic boilerplate code to deal with the function call. These ABI conventions are called eabi (embedded application binary interface), which explains the eabi part in the names of the tools! ARM's eabi is fully documented here
mov r3, #0
str r3, [fp, #-8]
The next two lines correspond to the initialization of the i variable in the for loop. The mov command moves a 0 (#0, the hash # indicates this is not an address, but a literal 0) to register r3. Next the register is stored (str) in a position relative to the frame pointer.
A quick excursion: Registers are special hardware memories that are located within the CPU. The CPU can only perform calculations with values that are in registers. As there are only a limited number of registers available, values that aren't currently needed for calculations are stored (str) in RAM. Once the values are needed they are loaded into registers (ldr).
b .L2
The b instruction is a branch (goto) that tells the processor to jump to a certain position in memory and continue to execute the code at that location. The assembly code doesn't actually contain the memory location though, but only a label (.L2). This is an example of the symbols that we talked about earlier that need to be resolved in a further step of processing.
The rest of the code consists mostly of the for loop. My compiler generated a section marked ".L3" first. This bit of code corresponds to the "increment/decrement" part of the for loop. Since this is only executed AFTER each iteration, it's skipped in the first iteration. This is the purpose of the b .L2 instruction. So note that the next section is skipped during the loop's first iteration:
.L3:
ldr r3, [fp, #-8]
add r3, r3, #1
str r3, [fp, #-8]
This corresponds to the increment/decrement part of the for loop above (++i). The code loads (ldr) the value of the i variable into the register r3, adds 1 (#1) to that value, placing the result in r3, and then stores (str) the result back in RAM. The add instruction names its destination first:
a = b + c
corresponds to
add a, b, c
Finally the check condition of the loop is executed:
.L2:
ldr r3, [fp, #-8]
cmp r3, #10
bne .L3
Again, the value of our i variable is loaded from RAM. Next it's compared (cmp) with 10. Since we specified that we want the loop to continue as long as i hasn't reached 10 (i!=10), a branch-not-equal (bne) command branches back to the increment/decrement code at .L3 as long as i!=10.
Note that the .L2: label is not actual code. At the end of the .L3 section, the processor simply continues with the code that follows it in .L2, as there is no branch instruction.
If the comparison (cmp r3, #10) finds the two values equal, the branch-not-equal instruction (bne) won't branch and the code continues executing with:
mov r0, r3
add sp, fp, #0
ldmfd sp!, {fp}
bx lr
This final bit of code does some cleanup required for returning from the function. It's more or less the reverse of the three instructions at the beginning.
.size main, .-main
.ident "GCC: (GNU) 4.6.2"
Finally a bit of housekeeping.
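To make the mapping between the C loop and the listing easier to follow, here is a sketch of the same loop rewritten with goto, so that each statement lines up with one group of instructions. The function name count_to_ten is made up for this illustration, and the return statement is an addition (the original main has none) so that the final value of i can be observed:

```c
/* The for loop from the example, restructured to mirror the assembly.
 * The labels L2 and L3 correspond to .L2 and .L3 in the listing. */
int count_to_ten(void) {
    int i;

    i = 0;          /* mov r3, #0 / str r3, [fp, #-8]            */
    goto L2;        /* b .L2: skip the increment on the first pass */
L3:
    ++i;            /* ldr / add r3, r3, #1 / str                */
L2:
    if (i != 10)    /* ldr / cmp r3, #10                         */
        goto L3;    /* bne .L3                                   */

    return i;       /* added for illustration only               */
}
```

Written this way it is easy to see why the test sits at the bottom: the loop then needs only one conditional branch per iteration.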
-
That was a long example ... So, we went from C with preprocessor directives to preprocessed C to assembly to object code that still contains (among other things) symbolic labels ("main", ".L2", ".L3") in place of actual, physical locations in memory.
-
The next step is called "linking" and this step is responsible for resolving these memory locations. This is done by a program called a "linker" (who would have guessed). The linker takes a bunch of object files (and libraries, which are basically reusable object files), and writes them into a single file. Once all the object files are stuck together, the linker is able to calculate the actual physical positions of all the symbols and resolve them.
-
On larger computers (PCs ...), the system may contain a set of standard libraries (DLLs or dynamic libraries) so that the code in those libraries doesn't need to be copied into every program. If this is the case, the linker also provides code in the binary to be able to locate and load the code at runtime.
-
The linker also does an assortment of other tasks, a number of which are skipped on embedded platforms. For example, on a PC the linker would add startup code (ever wonder how main gets called ...). On our embedded device, we write this startup code ourselves, so this is skipped.
-
Finally, the result of linking is copied into a form that the platform can execute. This is done by the program 'objcopy'. Our embedded device expects this code (the firmware) to start with the RAM address where the stack is located, followed by a bunch of addresses containing the location of code to be executed under certain circumstances (for example during startup or if an error or an interrupt occurs). The actual machine code starts immediately following these addresses. On larger systems, the binaries need to contain more information for the system to be able to execute them, and for this purpose they are packed into formats called ELF (Unix) or COFF (Windows) etc. We don't need this information to run. This ties in with the none part of the arm-none-eabi triplet: none means there is no operating system on the target, so there is nothing that would need our binary to be packaged along with additional information.
-
Finally, the checksum program. As stated above, the firmware that the M3 processors run starts with the location of the stack in memory, followed by a bunch of locations in the code and finally the code itself.
Example:
0000000: f01f 0010 f908 0000 4509 0000 4509 0000 ........E...E...
0000010: 4509 0000 4509 0000 4509 0000 4509 0000 E...E...E...E...
0000020: 4509 0000 4509 0000 4509 0000 4509 0000 E...E...E...E...
(...)
0000100: 4509 0000 4509 0000 4509 0000 4509 0000 E...E...E...E...
0000110: 4509 0000 4509 0000 4509 0000 4509 0000 E...E...E...E...
0000120: 4509 0000 80b4 85b0 00af 7860 4ff0 0003 E.........x`O...
0000130: fb60 03e0 fb68 03f1 0103 fb60 fa68 7b68 .`...h.....`.h{h
0000140: 9a42 f7db 07f1 1407 bd46 80bc 7047 00bf .B.......F..pG..
0000150: 80b5 00af 4ff0 0000 4ff0 0701 4ff0 0102 ....O...O...O...
The above is a hex dump of an example program. The stack starts at memory location 10001ff0 (it's little endian). The second word indicates where the processor is supposed to start executing code, 000008f9. All the other addresses are set to the same address, 00000945, which leads to a dead end. Actual code starts around 0x0000124.
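Reading little-endian words out of a hex dump by eye is error-prone, so here is a minimal sketch of how the first three words of the dump can be decoded. The helper read_le32 and the array dump_start are names made up for this illustration; the bytes are copied from the first line of the dump:

```c
#include <stdint.h>

/* Assemble a 32-bit word from four bytes stored least significant
 * byte first (little endian). */
uint32_t read_le32(const uint8_t *p) {
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* The first 12 bytes of the example dump, in file order. */
static const uint8_t dump_start[12] = {
    0xf0, 0x1f, 0x00, 0x10,   /* initial stack pointer  */
    0xf9, 0x08, 0x00, 0x00,   /* start-up address       */
    0x45, 0x09, 0x00, 0x00,   /* first "dead end" entry */
};
```

With these bytes, read_le32(dump_start) gives 0x10001ff0, read_le32(dump_start + 4) gives 0x000008f9 and read_le32(dump_start + 8) gives 0x00000945.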
The LPC1343 has a somewhat arbitrary safeguard: it expects a 4-byte checksum of the first 28 bytes (the first seven words) to be stored in bytes 29-32, i.e. in the eighth word. The checksum program calculates this value and copies it into the correct position:
0000000: f01f 0010 f908 0000 4509 0000 4509 0000 ........E...E...
0000010: 4509 0000 4509 0000 4509 0000 bea8 ffef E...E...E.......
0000020: 4509 0000 4509 0000 4509 0000 4509 0000 E...E...E...E...
(...)
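The value in the patched dump (bytes be a8 ff ef, i.e. the word efffa8be) is consistent with a checksum that is the two's complement of the sum of the first seven words, so that all eight words add up to zero. The following is a minimal sketch under that assumption, not the actual source of the checksum program; vector_checksum and example_vectors are names made up for this illustration:

```c
#include <stdint.h>

/* Two's complement of the 32-bit sum of the first seven vector table
 * words. Unsigned arithmetic wraps around modulo 2^32, which is
 * exactly the behaviour needed here. */
uint32_t vector_checksum(const uint32_t words[7]) {
    uint32_t sum = 0;
    for (int i = 0; i < 7; ++i) {
        sum += words[i];
    }
    return 0u - sum;
}

/* The first seven words of the example dump, already decoded from
 * little endian. */
static const uint32_t example_vectors[7] = {
    0x10001ff0u, 0x000008f9u, 0x00000945u, 0x00000945u,
    0x00000945u, 0x00000945u, 0x00000945u,
};
```

For the example data the seven words sum to 0x10005742, and vector_checksum(example_vectors) returns 0xefffa8be, which matches the four bytes that appear at offset 0x1c in the second dump.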