LLVM: field addition with saturated fields #456

mratsim · 2024-08-14T09:48:55Z

This PR merges several experiments to implement modular addition in pure LLVM IR, so that instead of writing an assembly backend for each target we can generate multi-platform code from a single LLVM IR source. ARM and AMDGPU are of particular interest, since both support addition-with-carry as well as SIMD, and this avoids having to vectorize by hand.

That was to try to address:

Unfortunately, compilers are still inefficient at translating modular addition into optimal code.
See #357:

[Aarch64] Missed optimization: substraction with borrow llvm/llvm-project#102062
[x86 / Clang] Pessimization: missed fusion of substraction/compare/cmov after -O1 optimization or extra caller. llvm/llvm-project#102868
[Isel x86/Aarch64] i320+ generates useless 33% extra instructions when chaining sub.with.overflow and select / conditional moves. llvm/llvm-project#103717
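For readers unfamiliar with the operation under discussion, here is a minimal Python sketch of word-level modular addition: an add-with-carry chain, followed by a sub-with-borrow chain against the modulus, followed by a select. It is a model of the arithmetic only, not the IR generated by this PR. The helper names (`addcarry`, `subborrow`, `modadd_limbs`, `to_limbs`, `from_limbs`) are illustrative assumptions, and a real implementation would use a constant-time conditional move rather than a Python branch.

```python
MASK64 = (1 << 64) - 1  # one 64-bit machine word


def addcarry(a, b, carry_in):
    """Model of a 64-bit add-with-carry: returns (sum mod 2^64, carry_out)."""
    s = a + b + carry_in
    return s & MASK64, s >> 64


def subborrow(a, b, borrow_in):
    """Model of a 64-bit sub-with-borrow: returns (diff mod 2^64, borrow_out)."""
    d = a - b - borrow_in
    return d & MASK64, 1 if d < 0 else 0


def modadd_limbs(a, b, p):
    """(a + b) mod p on little-endian 64-bit limbs, assuming a < p and b < p."""
    n = len(p)
    # 1. Add limb by limb, propagating the carry.
    s, carry = [0] * n, 0
    for i in range(n):
        s[i], carry = addcarry(a[i], b[i], carry)
    # 2. Tentatively subtract the modulus, propagating the borrow.
    t, borrow = [0] * n, 0
    for i in range(n):
        t[i], borrow = subborrow(s[i], p[i], borrow)
    # 3. Keep the reduced value if the addition overflowed the top word
    #    or if the subtraction did not underflow (i.e. s >= p).
    #    Hardware code would use a conditional move here, not a branch.
    return t if (carry == 1 or borrow == 0) else s


def to_limbs(x, n):
    """Split a non-negative integer into n little-endian 64-bit limbs."""
    return [(x >> (64 * i)) & MASK64 for i in range(n)]


def from_limbs(limbs):
    """Recombine little-endian 64-bit limbs into an integer."""
    return sum(limb << (64 * i) for i, limb in enumerate(limbs))
```

The final carry has to take part in the select because the modulus may use every bit of the top word, in which case `a + b` can exceed the limb array's capacity.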

Description of experiments:

432a91e implements word-level (64-bit by 64-bit) modular addition with inlined adc/sbb "super-instructions" (and mulExt/muladd1/muladd2/mulacc). This generates optimal code on x86-64 but not on ARM, due to [Aarch64] Missed optimization: substraction with borrow llvm/llvm-project#102062. In addition, the alloca needs to be specialized to addrspace(5) on AMDGPUs: [AMDGPU] llc crash in 'AMDGPU DAG->DAG Pattern Instruction Selection' llvm/llvm-project#102058
0354d5b refactors the IR to add callable functions, linkage, calling conventions and boilerplate reductions
08b8671 introduces inline/alwaysinline functions, which fail to work around the bad codegen. Unfortunately, inlining breaks the fusion of sub/icmp instructions into a single sub-with-borrow
a76cfd8 partially works around inlining and instruction fusion by not using an inlining pass, only alwaysinline
b415418 materializes addcarry/subborrow as actual functions in LLVM IR instead of just a sequence of instructions. Unfortunately, this doesn't optimize well on ARM: [Aarch64] Missed optimization: substraction with borrow llvm/llvm-project#102062
480ede5 uses the builtins llvm.uadd.with.overflow and llvm.usub.with.overflow to try to work around the bad codegen. While this is portable to x86 and ARM for primes up to 256 bits, it generates 33% extra instructions for prime fields beyond that threshold: [Isel x86/Aarch64] i320+ generates useless 33% extra instructions when chaining sub.with.overflow and select / conditional moves. llvm/llvm-project#103717
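The last experiment treats the whole field element as one wide integer and leaves the split into machine words to LLVM. The following Python sketch models the semantics of that approach, assuming the usual pattern of add-with-overflow, subtract-with-overflow, then select. The exact IR emitted in 480ede5 may differ, and the function names here (`uadd_with_overflow`, `usub_with_overflow`, `modadd_wide`) are illustrative stand-ins for the intrinsics, not real bindings.

```python
def uadd_with_overflow(a, b, bits):
    """Model of llvm.uadd.with.overflow.iN: returns (a + b mod 2^N, overflow flag)."""
    s = a + b
    return s & ((1 << bits) - 1), s >> bits != 0


def usub_with_overflow(a, b, bits):
    """Model of llvm.usub.with.overflow.iN: returns (a - b mod 2^N, overflow flag)."""
    d = a - b
    return d & ((1 << bits) - 1), d < 0


def modadd_wide(a, b, p, bits):
    """(a + b) mod p on N-bit integers, assuming a < p, b < p and p < 2^N."""
    s, carry = uadd_with_overflow(a, b, bits)
    t, borrow = usub_with_overflow(s, p, bits)
    # Select the reduced value when the sum overflowed N bits or s >= p.
    # In IR this would be a `select`, which ideally lowers to conditional moves.
    return t if (carry or not borrow) else s
```

With `bits=256` this corresponds to the case reported as working on x86 and ARM. The i320 and i384 widths are the ones affected by llvm/llvm-project#103717, although the arithmetic modeled here is the same at every width.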


mratsim added 6 commits August 5, 2024 22:25

feat(LLVM): add codegenerator for saturated field add/sub

432a91e

LLVM: WIP refactor - boilerplate, linkage, assembly sections, ...

0354d5b

feat(llvm): try (and fail) to workaround bad modular addition codegen with inline function.

08b8671

llvm: partial workaround failure around https://github.com/llvm/llvm-project/issues/102868#issuecomment-2284935755 module inlining breaks machine instruction fusion

a76cfd8

llvm: define our own addcarry/subborrow which properly optimize on x86 (but not ARM see llvm/llvm-project#102062)

b415418

llvm: use builtin llvm.uadd.with.overflow.iXXX to try to generate optimal code (and fail for i320 and i384 llvm/llvm-project#103717)

480ede5

mratsim merged commit 569e029 into master Aug 14, 2024
23 of 24 checks passed

mratsim deleted the llvm-sat-codegen branch August 14, 2024 09:50

mratsim mentioned this pull request Aug 14, 2024

llvm: more tentatives at optimal field addition with pure LLVM IR #457

Draft

mratsim added a commit that referenced this pull request Aug 27, 2024

nvidia: update hello world following changes in #456

6fc9da3

mratsim mentioned this pull request Aug 27, 2024

Nvidia remastered #464

Merged

mratsim added a commit that referenced this pull request Aug 27, 2024

Nvidia remastered (#464)

65147ed

* nvidia: update hello world following changes in #456 * update Nvidia backend to use the new LLVM infra * update Nvidia multiplication

mratsim mentioned this pull request Aug 27, 2024

[GPU] GPU / LLVM IR Elliptic curves implementation plan #465

Open
