[LLVMCPU] Add support for dynamic quantization + reassociation of grouped qmm MegaPR

[LLVMCPU] Allow parallel tiling in LLVMCPUSplitReduction, tile reduction by 2

This commit enables tiling of parallel dimensions in LLVMCPUSplitReduction, as well as changing the tile size of the resulting reduction to 2. The latter change is an x86 specific optimization that allows targeting specific instructions through VectorContractCustomKernels.

[LLVMCPU] Add support for vecmat cases in VectorContractCustomKernel

This commit introduces some new functionality to VectorContractCustomKernels:
  1. Matching for vecmat kernels that have 1D vector shapes
  2. Support for `vector.contract` ops with split reduction dimensions
  3. Ability to allow promoting smaller bitwidth inputs with `arith.extui` or `arith.extsi` before passing into the `llvm.inline_asm` op
  4. Ability to specify explicit constraint strings per register input in a VectorContractCustomKernel
  5. Support for `i4` and `i8` input types
  6. New x86 AVX512VNNI i16xi16->i32 vecmat kernel with split reduction

This commit also adds `vector.transfer_read` flattening patterns and VectorContractCustomKernel lowering patterns to LLVMCPUVectorLowering.

[LLVMCPU] Add pass to breakdown subbyte `arith.extui`

This pass breaks down `arith.extui` ops that have `i4` inputs into a sequence of `vector.shuffle->arith.andi->arith.shrui`. This avoids bad lowering of subbyte extends in x86 backend. This pass is somewhat specific to some work on vecmat VectorContractCustomKernels right now, and has some unique matchings. The pass also attempts to make use of AVX512 registers, so the vector size for the resulting IR is hardcoded as 512 bits. This needs to change before landing. This pass in general needs some refactoring before landing.

[LLVMCPU] Add pass to fold away unit dimensions on `vector.contract` ops

This pass folds away unit dimensions on `vector.contract` ops to get these ops into a form that is recognizable by the VectorContractCustomKernels patterns.

This pass also hoists `vector.shape_cast` ops out of containing `scf.for` ops if possible when the shape cast operates on the accumulator of a `vector.contract` op. This pattern may be better off somewhere else, but for now it is here because the unit dim folding pattern can produce a hoistable `vector.shape_cast` op in cases with split reduction.

[LLVMCPU] Add flag to restrict reassociated quantized matmul optimizations

[LLVMCPU] Add additional Memref alias foldings

[LLVMCPU] Simplify VectorContractCustomKernels x86 constraint codes, add new AVX512 kernel
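
Two of the steps described above can be illustrated with a scalar C++ sketch. It is not the code added by this commit, which operates on MLIR vector ops. The first function mimics the `vector.shuffle->arith.andi->arith.shrui` breakdown of an `i4` zero-extend: each packed byte is reused once per nibble, masked, then shifted. The second shows a vecmat whose reduction is split into tiles of 2, so that each i32 accumulator receives the sum of two i16xi16 products per step. That pairing is what the AVX512-VNNI `vpdpwssd` instruction computes; the commit does not name the instruction, so treating it as the target is an assumption. The function names, the low-nibble-first packing order, and the row-major layout are also assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Zero-extends packed 4-bit values to 8 bits, two elements per input byte.
// Assumption (not stated in the commit): the low nibble holds the element
// with the lower index.
std::vector<uint8_t> extendPackedI4(const std::vector<uint8_t> &packed) {
  std::vector<uint8_t> out;
  out.reserve(packed.size() * 2);
  for (uint8_t byte : packed) {
    // "shuffle": the same byte feeds two output elements.
    // "andi": keep one nibble. "shrui": move it down to bit 0.
    out.push_back(static_cast<uint8_t>((byte & 0x0F) >> 0));
    out.push_back(static_cast<uint8_t>((byte & 0xF0) >> 4));
  }
  return out;
}

// Vector-times-matrix with the reduction dimension K split into tiles of 2.
// lhs has K elements; rhs is a row-major K x n matrix (hypothetical layout).
std::vector<int32_t> vecmatSplitBy2(const std::vector<int16_t> &lhs,
                                    const std::vector<int16_t> &rhs,
                                    std::size_t n) {
  const std::size_t k = lhs.size();
  assert(k % 2 == 0 && "reduction size must be a multiple of the tile size");
  assert(rhs.size() == k * n && "rhs must be K x n");
  std::vector<int32_t> acc(n, 0);
  for (std::size_t ko = 0; ko < k; ko += 2) {   // outer reduction loop
    for (std::size_t j = 0; j < n; ++j) {       // parallel dimension
      // Inner reduction tile of size 2: two widened products, summed,
      // then accumulated into one i32 lane.
      const int32_t p0 = int32_t(lhs[ko]) * int32_t(rhs[ko * n + j]);
      const int32_t p1 = int32_t(lhs[ko + 1]) * int32_t(rhs[(ko + 1) * n + j]);
      acc[j] += p0 + p1;
    }
  }
  return acc;
}
```

For example, `extendPackedI4({0x21, 0xF3})` yields the elements 1, 2, 3, 15, and `vecmatSplitBy2({1, 2}, {3, 4, 5, 6}, 2)` yields 13 and 16.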
nod-ai · Oct 12, 2023 · f2b185c
1 parent c5ac55e
commit f2b185c
Showing 11 changed files with 1,502 additions and 88 deletions.
diff --git a/compiler/src/iree/compiler/Codegen/LLVMCPU/BUILD.bazel b/compiler/src/iree/compiler/Codegen/LLVMCPU/BUILD.bazel
@@ -53,8 +53,11 @@ iree_compiler_cc_library(
         "KernelDispatch.cpp",
         "LLVMCPUAssignConstantOrdinals.cpp",
         "LLVMCPUAssignImportOrdinals.cpp",
+        "LLVMCPUBreakDownSubbyteExtend.cpp",
         "LLVMCPUCheckIRBeforeLLVMConversion.cpp",
         "LLVMCPUEmitVectorizationRemarks.cpp",
+        "LLVMCPUFoldMemRefAliasOps.cpp",
+        "LLVMCPUFoldVectorContractUnitDims.cpp",
         "LLVMCPULinkExecutables.cpp",
         "LLVMCPULowerExecutableTarget.cpp",
         "LLVMCPULowerToUKernels.cpp",

diff --git a/compiler/src/iree/compiler/Codegen/LLVMCPU/CMakeLists.txt b/compiler/src/iree/compiler/Codegen/LLVMCPU/CMakeLists.txt
@@ -54,8 +54,11 @@ iree_cc_library(
     "KernelDispatch.cpp"
     "LLVMCPUAssignConstantOrdinals.cpp"
     "LLVMCPUAssignImportOrdinals.cpp"
+    "LLVMCPUBreakDownSubbyteExtend.cpp"
     "LLVMCPUCheckIRBeforeLLVMConversion.cpp"
     "LLVMCPUEmitVectorizationRemarks.cpp"
+    "LLVMCPUFoldMemRefAliasOps.cpp"
+    "LLVMCPUFoldVectorContractUnitDims.cpp"
     "LLVMCPULinkExecutables.cpp"
     "LLVMCPULowerExecutableTarget.cpp"
     "LLVMCPULowerToUKernels.cpp"