M680x0 options in greater detail
The gcc M680x0 Options man page gives a brief description of each -m option. However, there's much more to it, and many implementation details stay hidden unless one decides to dig deeper. This is my collection of notes and findings.
Older gcc versions (before 4.3) used a simple -m680x0-like notation, which left very little room for configurability (and for misunderstandings). gcc 4.3 introduced the -march / -mcpu / -mtune trio, which is not 100% bullet-proof and is a source of many misconceptions. One may ask: does this bother us, the m680x0 users?
The short answer is no, it doesn't: we have a fixed set of available CPUs (basically all 680x0s plus one ColdFire CPU, the MCF5474), so we can use -march= and -mcpu= interchangeably with the arguments 68000, 68010, 68020, 68030, 68040 and 68060. For the 5474 it is a bit different: it implements Revision B of the ColdFire instruction set architecture (ISA_B), which implies -march=isab, while the actual CPU is the 5474, which implies -mcpu=5474 (well, 5475 in fact). But since there are no other ColdFire CPUs for our platform, we are fine with -mcpu=5475 and that's that.
The long answer is that looking at these three parameters helps us understand how the m68k backend works.
-march says which instruction set architecture gcc should generate code for. As mentioned, for the 680x0 the architecture and the CPU are effectively the same thing, but the ColdFire (and other targets such as x86 or ARM) offer more flexibility: you can generate code for a whole family of CPUs (for instance those implementing ISA_B, ARMv6, SSE2 or whatever) and yet optimise for one specific CPU, while using only instructions from the given instruction set architecture.
-mcpu is a narrower version of -march. Again, in our 680x0 case there's nothing to narrow down (gcc doesn't treat the 68000 as an architecture for the 68030 CPU, for instance), but for other targets you can generate code for one specific CPU and not just for the whole CPU family (which can use fewer instructions and therefore produces more generic code).
You can use -march and -mcpu together, but it's not very useful: only compatible combinations are allowed and -mcpu takes precedence, so the result is essentially the same as without -march. It can have some special uses, such as making 100% sure that a given CPU belongs to a certain instruction set architecture, but otherwise try to use only one of them.
-mtune is more interesting: it says which microarchitecture to tune the code for (within the constraints set by -march and -mcpu). So it can't add new instructions if the arch/cpu is an older CPU, yet it can try to benefit from different instruction timings, or avoid instructions that are unsupported/emulated on the newer CPU we are tuning our code for.
As the 680x0's -march/-mcpu implies -mtune with the same value, you can safely use just -mcpu. There is one particular exception: -mtune=68020-40 and -mtune=68020-60 (-m68020-40 and -m68020-60 respectively).
From now on let's focus only on the 680x0 series, as the various ColdFire CPUs are not so interesting for our platform. I'll be using the shorter form (i.e. -m68000 instead of -mcpu=68000).
-m68000
- primary audience: ST/STE, generally speaking any m68k machine; it's the lowest common denominator
- lacks many useful instructions (32-bit & 64-bit mul/div, bitfield instructions, faster addressing modes)
- on the other hand, doesn't use any instruction emulated on later CPUs (movep is not used by gcc)
- all floating point operations are emulated using library calls (libgcc), so it's friendly even to the EC/LC variants of the 040 and 060 CPUs
-m68020 / -m68030
- primary audience: TT/Falcon machines (also various ST accelerators with a 68030/6888x)
- many 32-bit integer instructions and addressing modes are used
- uses the 68881/68882 FPU instructions by default, so even operations in floating point run nicely
-m68040
- primary audience: AfterBurner/Milan040/Hades040/Aranym machines (although Aranym has native support for all 68881/68882 instructions)
- integer instructions are basically identical to -m68030 (there's only one new instruction, move16, and it is not used by gcc)
- avoids the emulated 68881/68882 instructions (fmovecr, fscale or fintrz for example) and heavily uses the single/double precision 68040 instructions (fsadd/fdadd etc.)
-m68060
- primary audience: CT60/Milan060/Hades060 machines
- in addition to -m68040, it avoids 64-bit mul/div and replaces them with a library call (as those are emulated on the 68060; in theory cmp2 too, but that one isn't used by gcc)
- avoids the emulated 68881/68882 instructions (fmovecr, fscale, FDBcc, FScc, FTRAPcc, ...) and addressing modes/sizes, and heavily uses the single/double precision 68040/68060 instructions (fsadd/fdadd etc.)
- beware, the set of emulated instructions is not the same as on the 68040: some are still emulated (fmovecr, fscale), some are native again (fint, fintrz) and heavily used, and some are additionally emulated (see above)
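The 64-bit multiply point can be tried with a tiny function. This is a minimal sketch, not taken from gcc itself: the function name mul32x32_64 is made up for illustration, and I am assuming that a widening 32x32 multiply is the kind of source that maps to the 64-bit multiply instruction on the 68020-68040 and to a library call under -m68060, as described above.

```c
#include <stdint.h>

/* Hypothetical helper: signed 32x32 -> 64-bit multiply.
 * Casting one operand first makes the whole product 64 bits wide,
 * so the upper half of the result is not lost. */
int64_t mul32x32_64(int32_t a, int32_t b)
{
    return (int64_t)a * b;
}
```

To compare the two cases, compile the file with -O2 -S once with -m68040 and once with -m68060 (using whatever your cross-compiler is called, for instance m68k-atari-mint-gcc, which is an assumption about your setup) and look at how the multiplication is done in each assembly listing.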
Tuning is where it gets interesting, because since gcc 4.3 one can use different arch/cpu and tune options. As of gcc 10.x, gcc internally recognises only the following instruction set architectures: 68000, 68010, 68020 and 68040. It also distinguishes only the following microarchitectures for tuning: 68000/68010 (either of them has the same effect), 68040 and 68060.
As you can see, there is no way to generate code for, say, the 68000 and optimise it for a 68020/68030 (i.e. -mcpu=68000 -mtune=68020; it is accepted but that -mtune has no effect). This also means that currently you can't generate 68040-friendly code which also inhibits instructions incompatible with the 68060 (i.e. -mcpu=68040 -mtune=68060), because the 68040 and 68060 targets share the same instruction set and it's the tune option which directs gcc to emit/inhibit given instructions. A case in point is fintrz, which is emulated on the 68040 but native on the 68060: it shouldn't be emitted at all because of the -mcpu=68040 switch, and yet it is (which I consider incorrect behaviour).
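For anyone who wants to reproduce the fintrz case, the source code involved is nothing exotic. A minimal sketch follows; the function name truncate_to_int is made up, and I am assuming that a plain floating point to integer cast is what makes gcc reach for fintrz (or for its workaround), so do check the generated assembly yourself.

```c
/* Hypothetical helper: C's double -> int conversion always truncates
 * toward zero, regardless of the FPU's current rounding mode. */
int truncate_to_int(double x)
{
    return (int)x;
}
```

Compiling this with -O2 -S and different -mcpu/-mtune combinations (for instance -mcpu=68040 alone versus -mcpu=68040 -mtune=68060) should show whether fintrz ends up in the output.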
There are two native options for tuning available: -m68020-40 and -m68020-60.
-m68020-40
- primary audience: TT/Falcon/AfterBurner/Milan040/Hades040/Aranym machines
- equivalent to -march=68020 -mtune=68020-40, i.e. using the 68020 instruction set, with -mtune=68020, -mtune=68030 and -mtune=68040 implied
- as -mtune=68020 and -mtune=68030 are ignored (see above), code is primarily optimised for a 68040 but all 040-specific instructions are avoided, i.e. all the single/double precision FPU instructions
- the resulting program may contain 68881/68882 instructions that are emulated on the 68040, although gcc itself inhibits them (due to the implied -mtune=68040; for instance fintrz is worked around, but an external library is allowed to do otherwise)
-m68020-60
- primary audience: TT/Falcon/AfterBurner/Milan/Hades/Aranym/CT60 machines
- equivalent to -march=68020 -mtune=68020-60, i.e. using the 68020 instruction set, with -mtune=68020, -mtune=68030, -mtune=68040 and -mtune=68060 implied
- as -mtune=68020 and -mtune=68030 are ignored (see above) and -mtune=68060 takes precedence in a few places, code is primarily optimised for a 68060 but all 040/060-specific instructions are avoided, i.e. all the single/double precision FPU instructions
- additionally, 64-bit mul/div is replaced by a library call (due to the implied -mtune=68060)
- the resulting program may contain 68881/68882 instructions that are emulated on the 68040 and 68060, although gcc itself inhibits them (due to the implied -mtune=68040 and -mtune=68060; for instance fintrz is worked around even though it is native to the 68881/68882/68060, but an external library is allowed to do otherwise)
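One way to see which CPUs a given set of options implies is to look at the predefined preprocessor macros. The sketch below is only an illustration: the function name is made up, and it relies on my recollection (an assumption, not something stated above) that gcc defines the __mc680x0__ macros according to the tuning setting, so that -m68020-60 would define several of them at once. Dumping the macros with -dM -E is the way to verify this for your gcc version.

```c
/* Hypothetical helper: report the "highest" 680x0 macro that is defined.
 * Assumption: the __mc680x0__ macros follow the tuning setting rather
 * than the instruction set actually being used. */
const char *m68k_tune_macro(void)
{
#if defined(__mc68060__)
    return "__mc68060__";
#elif defined(__mc68040__)
    return "__mc68040__";
#elif defined(__mc68030__)
    return "__mc68030__";
#elif defined(__mc68020__)
    return "__mc68020__";
#elif defined(__mc68010__)
    return "__mc68010__";
#elif defined(__mc68000__)
    return "__mc68000__";
#else
    return "not an m68k target";
#endif
}
```

If the assumption holds, code guarded by such a macro reflects what the code was tuned for, not necessarily which instructions it may contain, which is worth keeping in mind with -m68020-40 and -m68020-60.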
-ffast-math
This is an interesting option for various reasons, but in the context of 680x0 optimisations one has to be careful about one thing: it inlines various trigonometric functions (sin, cos, ...) directly (using the 68881/68882's fsin, fcos, ...) instead of calling the math library (libm). So even -m68060 -ffast-math leads to an inlined fsin, for example, despite the fact that -m68060 is supposed to generate only non-emulated FPU code.
-m68881
Unintuitively for us Atari users, -m68020 and -m68030 have this setting on by default (as do -m68040 and -m68060, but those have the FPU built in). That means that FPU instructions are generated instead of gcc library calls. It can be explicitly disabled using -msoft-float, but then an appropriate libm library must also be supplied by the user. Please note that, despite its name, -m68881 doesn't mean generation of 68881/68882 instructions specifically but of FPU instructions in general.
-msoft-float
Do not generate floating-point instructions; use library calls instead. This is the default for -m68000.
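To check which of the two modes a file is being compiled in, something like the following can help. It is a sketch under assumptions: the function names are made up, and __HAVE_68881__ is the macro I believe gcc defines when FPU code generation is enabled on m68k, so treat it as something to confirm with -dM -E rather than as a given.

```c
/* Hypothetical helper: 1 when the compiler is generating FPU instructions,
 * 0 when floating point goes through library calls (or when this is not
 * an m68k build at all). __HAVE_68881__ is assumed, see the note above. */
int fpu_codegen_enabled(void)
{
#ifdef __HAVE_68881__
    return 1;
#else
    return 0;
#endif
}

/* Hypothetical helper: one double multiplication. With FPU code generation
 * this should become an FPU instruction; with -msoft-float it should
 * become a call into libgcc instead. */
double half_again(double x)
{
    return x * 1.5;
}
```

Comparing the -S output of half_again with -m68030 and with -m68030 -msoft-float shows the difference between the two modes directly.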
Explanation by Xavier Joubert:
fsgldiv and fsglmul are less accurate (and faster, I think) than fsdiv and fsmul, since they truncate data to single precision before computing results.
Explanation by Andreas Schwab:
fsglmul/fsgldiv calculate with single float rounding mode but extended precision exponent (i.e. more range than single float), whereas fsmul and fsdiv calculate completely in single float format. The difference between the two methods is visible in the overflow handling: with fsglmul you can multiply two values of FLT_MAX without overflow, whereas with fsmul you would get Infinity.
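The overflow difference can be imitated in portable C. This is only a rough model: the function names are made up, double stands in for the extended exponent range (the real FPU register is wider still), and the single-precision rounding of the mantissa that fsglmul performs is not modelled at all.

```c
#include <float.h>

/* Product squeezed into the single float format, exponent included,
 * which is how fsmul is described above. */
float mul_single_format(float a, float b)
{
    volatile float r = a * b;   /* volatile forces a real float-sized store */
    return r;
}

/* Product kept in a format with a wider exponent range, standing in for
 * fsglmul's extended exponent. Mantissa rounding is NOT modelled here. */
double mul_wide_exponent(float a, float b)
{
    return (double)a * (double)b;
}
```

Calling both with FLT_MAX for a and b gives Infinity from the first function and a large but finite value from the second, which is the same distinction as in the explanation above.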