Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Loop auto-vectorization #22

Open
12 of 16 tasks
dubiousconst282 opened this issue Mar 14, 2023 · 0 comments
Open
12 of 16 tasks

Loop auto-vectorization #22

dubiousconst282 opened this issue Mar 14, 2023 · 0 comments

Comments

@dubiousconst282
Copy link
Owner

dubiousconst282 commented Mar 14, 2023

Auto-vectorization is a very interesting transform because it can have a significant impact on the few cases where it works. In reality, only a few loops are effectively vectorizable due to unprevalent memory accesses and branching patterns present in normal code. The lack of efficient gather instructions makes this even more problematic with the involvement of non-sequential objects/structs rather than flat arrays.

TODOs

  • Initial implementation
    • Simple for-i loop, memory accesses and basic ops
  • Support for reductions (Add/Mul/And/Or/Xor/Min/Max)
    • It is also possible to support selects by storing the iteration index vector followed by an horz_max() at the end of the loop (this would be useful for loops searching and returning an index).
  • Vector width selection
    • Loop must only work with scalars of the same type. Mixing types and conversions makes this more difficult (general solution seems to be partial unrolling).
  • Legalization
    • Check that stores don't overlap with any other load inside the loop (fallback to runtime guards)
  • Operations
    • Memory accesses (address must be a lea invariant + loop_index)
    • Basic binops: add, sub, mul, fdiv, and, or, xor
      • idiv/frem were intentionally left out because there's no hw accel in neither x64 nor arm.
    • Other ops: neg, not
    • Basic math calls: Min, Max, Abs, Floor, Ceil, Sqrt
    • Other math calls: RSqrt, Rcp, Fmadd
      • Some of these don't have xplat intrinsics so we'd need to implement aux functions (worth proposing fma?)
    • Comparisons and selects
      • Only if operands match the op signess (e.g. ult u32, u32 is ok but not ult i32, i32)
    • Conversions
      • Float <-> Int32

Extras:

  • Basic if-conversion
    • Code gen support for SelectInst
    • Handle non-diamond graphs through path duplication (for empty blocks only)
  • Consider introducing a GetElementPtr/LEA instruction, because recognizing indexing expressions is tricky. Having this could also help consolidation of load/store instructions for arrays and fields.
  • Consider first-class support for vector types in the IR. This may not be that valuable outside of bringing the ability to perform basic peepholes.
  • Consider supporting basic transcendental math functions: Sin, Cos, Log, Exp (port from DirectXMath lib?)
  • Consider supporting some basic cost-model
dubiousconst282 added a commit that referenced this issue Mar 14, 2023
dubiousconst282 added a commit that referenced this issue Mar 15, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant