You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Auto-vectorization is a very interesting transform because it can have a significant impact on the few cases where it works. In reality, only a few loops are effectively vectorizable due to unprevalent memory accesses and branching patterns present in normal code. The lack of efficient gather instructions makes this even more problematic with the involvement of non-sequential objects/structs rather than flat arrays.
TODOs
Initial implementation
Simple for-i loop, memory accesses and basic ops
Support for reductions (Add/Mul/And/Or/Xor/Min/Max)
It is also possible to support selects by storing the iteration index vector followed by an horz_max() at the end of the loop (this would be useful for loops searching and returning an index).
Vector width selection
Loop must only work with scalars of the same type. Mixing types and conversions makes this more difficult (general solution seems to be partial unrolling).
Legalization
Check that stores don't overlap with any other load inside the loop (fallback to runtime guards)
Operations
Memory accesses (address must be a lea invariant + loop_index)
idiv/frem were intentionally left out because there's no hw accel in neither x64 nor arm.
Other ops: neg, not
Basic math calls: Min, Max, Abs, Floor, Ceil, Sqrt
Other math calls: RSqrt, Rcp, Fmadd
Some of these don't have xplat intrinsics so we'd need to implement aux functions (worth proposing fma?)
Comparisons and selects
Only if operands match the op signess (e.g. ult u32, u32 is ok but not ult i32, i32)
Conversions
Float <-> Int32
Extras:
Basic if-conversion
Code gen support for SelectInst
Handle non-diamond graphs through path duplication (for empty blocks only)
Consider introducing a GetElementPtr/LEA instruction, because recognizing indexing expressions is tricky. Having this could also help consolidation of load/store instructions for arrays and fields.
Consider first-class support for vector types in the IR. This may not be that valuable outside of bringing the ability to perform basic peepholes.
Consider supporting basic transcendental math functions: Sin, Cos, Log, Exp (port from DirectXMath lib?)
Consider supporting some basic cost-model
The text was updated successfully, but these errors were encountered:
Auto-vectorization is a very interesting transform because it can have a significant impact on the few cases where it works. In reality, only a few loops are effectively vectorizable due to unprevalent memory accesses and branching patterns present in normal code. The lack of efficient gather instructions makes this even more problematic with the involvement of non-sequential objects/structs rather than flat arrays.
TODOs
lea invariant + loop_index
)ult u32, u32
is ok but notult i32, i32
)Extras:
The text was updated successfully, but these errors were encountered: