
M680x0 options in greater detail

Miro Kropáček edited this page Feb 12, 2022 · 5 revisions

The gcc M680x0 Options man page gives a brief description of each -m option. However, there's much more to it, and many implementation details stay hidden unless one decides to dig deeper. This is my collection of notes and findings.

arch vs. cpu vs. tune

Older gcc versions (before 4.3) used a simple -m680x0-like notation, giving very little room for configurability (and misunderstandings). gcc 4.3 introduced the -march / -mcpu / -mtune trio, which is not 100% bullet-proof and is a source of many misconceptions. One may ask whether this affects us, m680x0 users.

The short answer is no, it doesn't: we have a fixed set of available CPUs (all the 680x0s + one ColdFire CPU, an MCF5474), so we can use -march= and -mcpu= interchangeably with the arguments 68000, 68010, 68020, 68030, 68040 and 68060. For the 5474 it is a bit different -- it implements Revision B of the ColdFire instruction set architecture (ISA_B), which implies -march=isab, while the actual CPU is a 5474, which implies -mcpu=5474 (well, 5475 in fact). But since there are no other ColdFire CPUs for our platform, we are fine with -mcpu=5475 and that's that.

The long answer is that we can use those three parameters to better understand how the m68k backend works.

-march says which instruction set architecture gcc should generate code for. As mentioned, for the 680x0 each architecture maps to a single CPU, but the ColdFire (and other targets like x86 or ARM) offers more flexibility: you can generate code for a whole family of CPUs (for instance those implementing ISA_B, ARMv6, SSE2 or whatever) and yet optimise for one specific CPU (while using only instructions from the given instruction set architecture).

-mcpu is a narrower version of -march. Again, in our 680x0 case there's nothing to narrow down (gcc doesn't see the 68000 as an architecture for the 68030 CPU, for instance), but for other targets you can generate code for one specific CPU and not just for the whole CPU family (which has fewer instructions available and therefore yields more generic code).

You can use -march and -mcpu together, but it's not very useful: only compatible combinations are allowed and -mcpu takes precedence, so the result is essentially the same as without -march. It can have some special uses, like when you want to be 100% sure that a given CPU belongs to a certain instruction set architecture, but otherwise try to use only one of them.

-mtune is more interesting -- it says which microarchitecture to tune the code for (within the constraints set by -march and -mcpu). So it can't add new instructions if the arch/cpu is an older CPU, yet it can try to benefit from different instruction timings or avoid instructions that are unsupported/emulated on the newer CPU we are tuning our code for.

As the 680x0's -march/-mcpu implies -mtune with the same value, you can safely use just -mcpu. The one exception is -mtune=68020-40 and -mtune=68020-60 (-m68020-40 and -m68020-60 respectively).
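To see what a given command line actually selects, one can ask the preprocessor. This is only a sketch: the macro names (`__mc68000__` ... `__mc68060__`, `__mcoldfire__`) are what I believe gcc's m68k backend predefines, and as far as I know the `__mc680x0__` macros have traditionally followed the tuning setting rather than the instruction set, so treat the answer as a hint. The function name `m68k_target_summary` and the cross-compiler name in the comment are assumptions made for this example.

```c
/* Report the m68k target the compiler was invoked for, based on
   predefined macros. The macro names are assumptions; verify them with
   something like:
       m68k-atari-mint-gcc <your -m options> -dM -E - < /dev/null
   (the compiler name is an assumption too, use your own toolchain). */
const char *m68k_target_summary(void)
{
#if defined(__mcoldfire__)
    return "ColdFire";
#elif defined(__mc68060__)
    return "68060 (possibly via -m68020-60)";
#elif defined(__mc68040__)
    return "68040 (possibly via -m68020-40)";
#elif defined(__mc68030__)
    return "68030";
#elif defined(__mc68020__)
    return "68020";
#elif defined(__mc68010__)
    return "68010";
#elif defined(__mc68000__)
    return "68000";
#else
    return "not an m68k target";
#endif
}
```

The checks run from the newest CPU to the oldest because, to my knowledge, several of these macros can be defined at once (the plain 68000 one in particular).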

Tuning to a specific Atari machine

From now on let's focus only on the 680x0 series as the various ColdFire CPUs are not so interesting for our platform. So let's dig into the 680x0 specifics. I'll be using the shorter form (i.e. -m68000 instead of -mcpu=68000).

-m68000

  • primary audience: ST/STE, generally speaking any m68k machine, it's the lowest common denominator
  • lacks many useful instructions (32-bit & 64-bit mul/div, bitfield instructions, faster addressing modes)
  • on the other hand, doesn't use any instruction emulated on later CPUs (movep is not used by gcc)
  • all floating point operations are emulated using library calls (libgcc) so it's friendly even to the EC/LC variants of the 040 and 060 CPUs.

-m68030

  • primary audience: TT/Falcon machines (also various ST accelerators with a 68030/6888x)
  • many 32-bit integer instructions and addressing modes are used
  • uses the 68881/68882 FPU instructions by default, so even operations in floating point run nicely

-m68040

  • primary audience: AfterBurner/Milan040/Hades040/Aranym machines (although Aranym has native support for all 68881/68882 instructions)
  • integer instructions are basically identical to -m68030 (there's only one new instruction, move16, and it is not used by gcc)
  • avoids usage of the emulated 68881/68882 instructions (fmovecr, fscale or fintrz for example) and heavily uses the single/double precision 68040 instructions (fsadd / fdadd etc)

-m68060

  • primary audience: CT60/Milan060/Hades060 machines
  • in addition to -m68040, it avoids 64-bit mul/div and replaces them with a library call (as those are emulated on the 68060; in theory cmp2 too, but that one isn't used by gcc)
  • avoids usage of the emulated 68881/68882 instructions (fmovecr, fscale, FDBcc, FScc, FTRAPcc, ...) and addressing modes/sizes and heavily uses the single/double precision 68040/68060 instructions (fsadd / fdadd etc)
  • beware, the emulated instructions are not the same as on the 68040 -- some are indeed still emulated (fmovecr, fscale), some are again native (fint, fintrz) and heavily used and some are additionally emulated (see above)
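The 64-bit multiply mentioned above is what a widening 32x32 multiplication in C typically turns into. A minimal sketch for experimenting: I would expect -m68020/-m68030/-m68040 to use the 64-bit form of mulu.l here and -m68060 to avoid it, but that is an expectation to check in the assembly output (-S) of your gcc version, not a guarantee. The function name `mul32x32` is made up for this example.

```c
#include <stdint.h>

/* Full 64-bit product of two 32-bit values. Casting one operand to
   uint64_t before multiplying keeps the upper 32 bits of the result. */
uint64_t mul32x32(uint32_t a, uint32_t b)
{
    return (uint64_t)a * b;
}
```

Compiling this file once with -m68040 and once with -m68060 and comparing the two .s files should show whether the multiply instruction or a library call is used.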

Tuning to a set of Atari machines

This is where it gets interesting because since gcc 4.3 one can use different arch/cpu and tune options. As far as gcc 10.x goes, gcc internally recognises only the following instruction set architectures: 68000, 68010, 68020 and 68040. It also recognises only the following microarchitectures for tuning: 68000/68010 (either of them has the same effect), 68040 and 68060.

As you can see, there is no way to generate code for, say, the 68000 and optimise it for a 68020/68030 (i.e. -mcpu=68000 -mtune=68020 is accepted but the -mtune has no effect). It also means that you currently can't generate 68040-friendly code that avoids instructions emulated on the 68060 (i.e. -mcpu=68040 -mtune=68060), because the 68040 and 68060 targets share the same instruction set and it is the tune option that directs gcc to emit or inhibit particular instructions. Take fintrz: it is emulated on the 68040 but native on the 68060, so -mcpu=68040 should prevent it from being emitted at all, and yet with -mtune=68060 it is emitted (which I consider incorrect behaviour).
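A quick way to observe this is to compile a float-to-integer conversion, which is the typical place where a round-toward-zero instruction such as fintrz is wanted, and compare the generated assembly. This is a sketch only: the compiler name `m68k-atari-mint-gcc`, the file name and the function name are assumptions, and whether fintrz actually shows up depends on the gcc version.

```c
/* Compare the assembly produced by, for example:
       m68k-atari-mint-gcc -O2 -S -mcpu=68040 trunc.c
       m68k-atari-mint-gcc -O2 -S -mcpu=68040 -mtune=68060 trunc.c
   and look for fintrz in the resulting trunc.s.
   (Compiler and file names are assumptions.) */

/* C semantics: the conversion truncates toward zero, e.g. -2.7 -> -2. */
int truncate_to_int(double x)
{
    return (int)x;
}
```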

There are two native options for tuning available: -m68020-40 and -m68020-60.

-m68020-40

  • primary audience: TT/Falcon/AfterBurner/Milan040/Hades040/Aranym machines
  • equivalent to -march=68020 -mtune=68020-40, i.e. using the 68020 instruction set and -mtune=68020, -mtune=68030, and -mtune=68040 are implied
  • as -mtune=68020 and -mtune=68030 are ignored (see above), code is primarily optimised for a 68040 but all 040-specific instructions are avoided, i.e. all the single/double precision FPU instructions
  • the instruction set still permits 68881/68882 instructions that are emulated on the 68040, but gcc itself avoids them (due to the implied -mtune=68040; for instance fintrz is worked around, though an external library is free to use it)

-m68020-60

  • primary audience: TT/Falcon/AfterBurner/Milan/Hades/Aranym/CT60 machines
  • equivalent to -march=68020 -mtune=68020-60, i.e. using the 68020 instruction set and -mtune=68020, -mtune=68030, -mtune=68040 and -mtune=68060 are implied
  • as -mtune=68020 and -mtune=68030 are ignored (see above) and -mtune=68060 takes precedence in a few places, code is primarily optimised for a 68060 but all 040/060-specific instructions are avoided, i.e. all the single/double precision FPU instructions
  • additionally, 64-bit mul/div is replaced by a library call (due to implied -mtune=68060)
  • the instruction set still permits 68881/68882 instructions that are emulated on the 68040 and 68060, but gcc itself avoids them (due to the implied -mtune=68040 and -mtune=68060; for instance fintrz is worked around even though it is native to the 68881/68882/68060, but an external library may still use it)

Other important options for 680x0

-ffast-math

This is an interesting option for various reasons, but in the context of 680x0 optimisations one has to be careful about one thing: it inlines various trigonometric functions (sin, cos, ...) directly (using the 68881/68882's fsin, fcos, ...) instead of calling the math library (libm). So even -m68060 -ffast-math leads to an inlined fsin, for example, despite the fact that -m68060 is supposed to generate only non-emulated FPU code.
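One way to guard against this combination is to detect it at compile time. A minimal sketch, assuming two predefined macros: `__FAST_MATH__`, which gcc defines under -ffast-math, and `__HAVE_68881__`, which I believe the m68k backend defines when FPU instructions are enabled. The function name is hypothetical.

```c
/* Returns 1 when this file is compiled in a way that, per the text
   above, may lead gcc to inline 68881/68882 instructions such as fsin
   or fcos instead of calling libm; returns 0 otherwise.
   __HAVE_68881__ is an assumption about the m68k backend. */
int may_inline_fpu_transcendentals(void)
{
#if defined(__FAST_MATH__) && defined(__HAVE_68881__)
    return 1;
#else
    return 0;
#endif
}
```

The same `#if` condition could instead wrap an `#error` in code that must run unemulated on a 68040 or 68060.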

-mhard-float, -m68881

Unintuitively for us Atari users, -m68020 and -m68030 have this setting on by default (as do -m68040 and -m68060, but those have the FPU built in). That means that FPU instructions are generated instead of gcc library calls. It can be explicitly disabled using -msoft-float, but then the user must also supply an appropriate libm library. Please note that, despite its name, -m68881 doesn't mean generation of 68881/68882 instructions specifically but of FPU instructions in general.

-msoft-float

Do not generate floating-point instructions; use library calls instead. This is the default for -m68000.
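The difference is easy to see on a trivial function. A sketch for experimenting; the compiler and file names in the comment are assumptions, and the exact instructions emitted depend on the gcc version.

```c
/* Compare, for example (compiler and file names are assumptions):
       m68k-atari-mint-gcc -O2 -S -m68030 add.c
       m68k-atari-mint-gcc -O2 -S -m68030 -msoft-float add.c
   With hard float I would expect an FPU add instruction; with
   -msoft-float the addition should become a call to a libgcc
   soft-float routine (__adddf3 is the one for double addition). */
double add_doubles(double a, double b)
{
    return a + b;
}
```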

Difference between fsmul/fsdiv and fsglmul/fsgldiv

Explanation by Xavier Joubert:

fsgldiv and fsglmul are less accurate (and faster, I think) than fsdiv and fsmul, since they truncate datas to single precision before computing results.

Explanation by Andreas Schwab:

fsglmul/fsgldiv calculate with single float rounding mode but extended precision exponent (i.e. more range than single float), whereas fsmul and fsdiv calculate completely in single float format. The difference between the two methods is visible in the overflow handling: with fsglmul you can multiply two values of FLT_MAX without overflow, whereas with fsmul you would get Infinity.
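The overflow difference can be imitated in portable C. This is only an analogy, not the instructions themselves: a `float` multiply stands in for fsmul, and a `double` multiply stands in for the wider exponent range of fsglmul (unlike fsglmul, it does not round the mantissa to single precision). Both function names are made up for this example.

```c
#include <float.h>

/* Stand-in for fsmul: the result is forced into single-float format,
   so FLT_MAX * FLT_MAX overflows to infinity. The volatile store makes
   sure the value is really rounded to float on every platform. */
float mul_single_range(float a, float b)
{
    volatile float r = a * b;
    return r;
}

/* Stand-in for the exponent range of fsglmul: the product is kept in a
   wider format, so FLT_MAX * FLT_MAX stays finite. */
double mul_wide_range(float a, float b)
{
    return (double)a * (double)b;
}
```

With these, `mul_single_range(FLT_MAX, FLT_MAX)` compares greater than `FLT_MAX` (it is infinity), while `mul_wide_range(FLT_MAX, FLT_MAX)` is a finite value well within the range of `double`.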