R. Matthew Emerson edited this page May 24, 2024 · 2 revisions

Register usage

The AArch64 ABI designates x18 as the platform register. Apple platforms reserve it, so we must not use it, and we have to treat it as something that can get clobbered at any time.

https://developer.apple.com/documentation/xcode/writing-arm64-code-for-apple-platforms

https://stackoverflow.com/questions/71152539/consequence-of-violating-macoss-arm64-calling-convention

Address space

CCL has often used absolute addressing to access data kept in low memory. On the Mac with Apple silicon, we're not going to be able to do this.

I also just ran across https://developer.apple.com/forums/thread/655950, which says "Modifying pagezero_size isn't a supportable option in the arm64 environment. arm64 code must be in an ASLR binary, which using a custom pagezero_size is incompatible with. An ASLR binary encodes signed pointers using a large random size along with the expected page zero size, and this combination is going to extend beyond the range of values covered in the lower 32-bits."

On an Apple silicon Mac, compiling with cc -Wl,-pagezero_size,0x4000 -g foo.c succeeds, but the resulting binary won't run: the debugger prints "error: Malformed Mach-o file".

On an Intel Mac, that same cc -Wl,-pagezero_size,0x4000 -g foo.c produces a binary that runs.

If that’s the case, then that may be an exciting problem for an ARM64 port (well, an Apple silicon port in particular, I suppose). Maybe we give up on controlling low memory and burn a register to point at the necessary data. Or maybe we can use the TCR (register pointing to per-thread data) somehow.

W^X page protection

On Macs with Apple silicon, W^X memory protection is always on. We'll have to deal with that. https://developer.apple.com/documentation/apple-silicon/porting-just-in-time-compilers-to-apple-silicon

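We need the com.apple.security.cs.allow-jit entitlement so that we can call mmap(2) with the MAP_JIT flag. Threads have to call pthread_jit_write_protect_np to enable and disable write access to the MAP_JIT memory. Note that this setting is per-thread: disabling write protection on one thread does not make the memory writable for any other thread.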

Call sys_icache_invalidate after writing new instructions to memory.
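
The sequence above might look roughly like the following in C. This is only a sketch: jit_alloc and jit_write are hypothetical helper names, not existing CCL functions, and the non-Apple branch exists only so the code builds elsewhere (it leaves the pages writable and non-executable).

```c
#define _DEFAULT_SOURCE   /* for MAP_ANON on glibc under strict -std= modes */
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>

#if defined(__APPLE__) && defined(__aarch64__)
#include <pthread.h>
#include <libkern/OSCacheControl.h>
#endif

/* Hypothetical helper: map a region meant to hold generated code.
   Returns NULL on failure. */
void *jit_alloc(size_t len)
{
#if defined(__APPLE__) && defined(__aarch64__)
    /* MAP_JIT is what needs the com.apple.security.cs.allow-jit entitlement. */
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE | PROT_EXEC,
                   MAP_PRIVATE | MAP_ANON | MAP_JIT, -1, 0);
#else
    /* Assumption: plain read/write pages, only so the sketch builds. */
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANON, -1, 0);
#endif
    return p == MAP_FAILED ? NULL : p;
}

/* Hypothetical helper: copy len bytes of machine code into dst, which
   must lie inside a region returned by jit_alloc. Returns 0. */
int jit_write(void *dst, const void *code, size_t len)
{
#if defined(__APPLE__) && defined(__aarch64__)
    pthread_jit_write_protect_np(0);  /* this thread may now write */
    memcpy(dst, code, len);
    pthread_jit_write_protect_np(1);  /* this thread may now execute */
    sys_icache_invalidate(dst, len);  /* make the new instructions visible */
#else
    memcpy(dst, code, len);
    __builtin___clear_cache((char *)dst, (char *)dst + len);
#endif
    return 0;
}
```

Because the write-protect state is per-thread, a thread that generates code while others are running it only changes its own view of the region.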

Page size

CCL currently hard-codes a page size of 4K. On Apple silicon, page size is 16K.
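
One way to avoid the hard-coded constant is to ask the operating system at startup. The following is a minimal sketch, not CCL's actual code; runtime_page_size and round_up_to_page are hypothetical names, and the 4096 fallback is an assumption.

```c
#include <stddef.h>
#include <unistd.h>

/* Ask the OS for the page size instead of assuming 4096. */
size_t runtime_page_size(void)
{
    long n = sysconf(_SC_PAGESIZE);
    return n > 0 ? (size_t)n : 4096;  /* fallback value is an assumption */
}

/* Round n up to a multiple of page (page must be a power of two). */
size_t round_up_to_page(size_t n, size_t page)
{
    return (n + page - 1) & ~(page - 1);
}
```

Any place that currently aligns or sizes memory regions with a literal 4K would use the queried value instead.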

Stack pointer

Whenever the stack pointer is used as the base register in an address operand, it must have 16-byte alignment (hardware-enforced).

For example, this doesn’t work:

str x1, [sp, #-8]! ;OK, but sp has only 8 byte alignment...
str x0, [sp, #-8]! ;... so this subsequent store fails

One possibility is to use a different register (and a separate memory area) for the value stack. GPRs don’t have the alignment restriction. This sounds like it’s probably the way to go for CCL, because it maintains the invariant that the contents of the value stack between its bottom and top are always unambiguously nodes.

Some other techniques are described at https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/using-the-stack-in-aarch64-implementing-push-and-pop
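
Where sp itself is still used (for the control stack, say), a common approach is to reserve the whole frame at once in a 16-byte multiple instead of pushing 8 bytes at a time. The arithmetic is small enough to sketch in C; aligned_frame_bytes is a hypothetical name used only for illustration.

```c
#include <stddef.h>

/* Bytes to subtract from sp to make room for `slots` 8-byte values
   while keeping sp a multiple of 16. */
size_t aligned_frame_bytes(size_t slots)
{
    size_t bytes = slots * 8;
    return (bytes + 15) & ~(size_t)15;
}
```

For an odd number of slots this wastes one 8-byte slot per frame, which is the price of keeping sp aligned.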

Addressing modes

Register indirect with offset

ldr x0, [Xn/sp, #imm]
ldr x0, [Xn] ;#0 implied if omitted

This offset can be a signed 9-bit value (-256 to +255), or an unsigned multiple of the operand size, up to 4095 times that size. For example, ldr x0, [x1, #0x7ff8] is valid: because the operand is a 64-bit (8-byte) register, the 12-bit immediate 0xfff is shifted left 3 bits, yielding the 0x7ff8 offset above.

It is possible to encode some values either way, with the unscaled signed 9-bit immediate, or with the scaled unsigned 12-bit immediate.

This is an uncomfortably small signed offset.
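
The two ranges can be written down as predicates, which is roughly the check an assembler or code generator has to make before choosing an encoding. This is a sketch with hypothetical function names; as far as I know the unscaled form is encoded as the separate LDUR instruction, which assemblers generally select automatically when given ldr with such an offset.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if off fits the unscaled signed 9-bit form (-256..255). */
bool fits_unscaled_imm9(int64_t off)
{
    return off >= -256 && off <= 255;
}

/* True if off fits the scaled unsigned 12-bit form for an access of
   `size` bytes (1, 2, 4, or 8): a non-negative multiple of size,
   at most 4095 * size. */
bool fits_scaled_imm12(int64_t off, int64_t size)
{
    return off >= 0 && off % size == 0 && off / size <= 4095;
}
```

For 64-bit loads, an offset such as 8 satisfies both predicates, 0x7ff8 only the scaled one, and -8 only the unscaled one. Anything below -256 satisfies neither and needs a separate address computation.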