`krnl.acos` (KrnlAcosOp)

Krnl acos scalar operation

Krnl acos scalar operation.

Operands:

Operand	Description
`in`	floating-point

Results:

Result	Description
`out`	floating-point

`krnl.acosh` (KrnlAcoshOp)

Krnl acosh scalar operation

Krnl acosh scalar operation.

Operands:

Operand	Description
`in`	floating-point

Results:

Result	Description
`out`	floating-point

`krnl.asin` (KrnlAsinOp)

Krnl asin scalar operation

Krnl asin scalar operation.

Operands:

Operand	Description
`in`	floating-point

Results:

Result	Description
`out`	floating-point

`krnl.asinh` (KrnlAsinhOp)

Krnl asinh scalar operation

Krnl asinh scalar operation.

Operands:

Operand	Description
`in`	floating-point

Results:

Result	Description
`out`	floating-point

`krnl.atan` (KrnlAtanOp)

Krnl atan scalar operation

Krnl atan scalar operation.

Operands:

Operand	Description
`in`	floating-point

Results:

Result	Description
`out`	floating-point

`krnl.atanh` (KrnlAtanhOp)

Krnl atanh scalar operation

Krnl atanh scalar operation.

Operands:

Operand	Description
`in`	floating-point

Results:

Result	Description
`out`	floating-point

`krnl.block` (KrnlBlockOp)

Krnl block operation

Syntax:

operation ::= `krnl.block` $loop $tile_size attr-dict `:` functional-type($loop, results)

Block a single for loop by a constant tile size. For instance,

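%ib, %il = krnl.block %i 4 : (!krnl.loop) -> (!krnl.loop, !krnl.loop)

means to block the for loop referred to by %i using a tile size of 4. %ib refers to the loop over tiles (loop_block) and %il to the loop within a tile (loop_local).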
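As a rough illustration, the Python sketch below models the iteration order produced by blocking one loop by a constant tile size. The function name `block_loop` is hypothetical, and clipping the last partial tile with `min` is an assumption about how the upper bound is handled. The actual lowering to affine loops is done by the Krnl-to-affine pass, not by code like this.

```python
def block_loop(lb, ub, tile_size):
    """Hypothetical model of krnl.block on a loop over [lb, ub).

    The outer loop walks tile starts (loop_block); the inner loop walks the
    elements of each tile (loop_local), clipped at ub (assumption).
    """
    order = []
    for ib in range(lb, ub, tile_size):                # loop_block: tile starts
        for il in range(ib, min(ib + tile_size, ub)):  # loop_local: within a tile
            order.append((ib, il))
    return order


# Blocking a loop over 0..9 by 4 gives tiles starting at 0, 4 and 8;
# the last tile only holds 8 and 9.
visits = block_loop(0, 10, 4)
```

Every original iteration is visited exactly once, in the same order; only the loop structure changes.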

Attributes:

Attribute	MLIR Type	Description
`tile_size`	::mlir::IntegerAttr	64-bit signless integer attribute

Operands:

Operand	Description
`loop`	any type

Results:

Result	Description
`loop_block`	any type
`loop_local`	any type

`krnl.call` (KrnlCallOp)

Call operation

The call operation provides a generic way to replace an ONNX Op with a call to an external function at the Krnl level. The `funcName` attribute determines which function to call. `parameters` holds the operands of krnl.call: both the outputs and the inputs of the ONNX Op, already lowered to MemRefs. The external function is assumed NOT to allocate or free any memory. The `numOfOutput` attribute tells how many of the MemRefs in `parameters` are outputs. mlir::OpTrait::AttrSizedOperandSegments is not used to put outputs and inputs into separate variadic operands, because the inputs and outputs may need to be interleaved as required by the external library.

The attributes of the ONNX Op are copied to KrnlCallOp under the control of the user. In the Krnl-to-LLVM lowering, the parameters and attributes are lowered to parameters of the LLVM function call.

Several builders are defined to help translate an ONNX Op to krnl.call. The user can provide the allocated MemRefs for the outputs and the inputs separately; the inputs are usually the operands of the ONNX Op. The attributes of the ONNX Op can be copied or not, based on a bool parameter of the builder. The builders also provide a mechanism for the user to selectively copy some attributes.

The krnl.call op is lowered to LLVM in the krnl-to-llvm conversion, in which OMTensor is used as the container for MemRef arguments. Other representations of parameters, such as data pointers only, will be supported in the future.

Interfaces: MemoryEffectOpInterface

Attributes:

Attribute	MLIR Type	Description
`funcName`	::mlir::StringAttr	string attribute
`numOfOutput`	::mlir::IntegerAttr	64-bit signed integer attribute

Operands:

Operand	Description
`parameters`	variadic of any type

`krnl.copy_from_tile_buffer` (KrnlCopyFromBufferOp)

Copy from buffer.

Syntax:

operation ::= `krnl.copy_from_tile_buffer` $buffer `,` $dest `[` $starts `]`  attr-dict `:` type($buffer) `,` type($dest)

Operation that copies data from a buffer memory into a destination memory. `starts` indicates where the buffer data starts to go in the destination memory. Start values must be multiples of the buffer size in all dimensions. The buffer rank and dimensions are compile-time constants.

If the buffer was oversized with respect to the actual data contained in the tile, the actual tile size can be given using the optional tileSize attribute. This attribute has the same rank as the buffer, and each of its dimensions must be smaller than or equal to the corresponding buffer dimension.
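The following Python sketch models the copy-back semantics in 2-D using nested lists. The function name `copy_from_tile_buffer` and the list-based representation are hypothetical; only the start-alignment rule and the optional tile size come from the description above.

```python
def copy_from_tile_buffer(buffer, dest, starts, tile_size=None):
    """Hypothetical 2-D model: write buffer[0:t0][0:t1] into dest at starts."""
    rows, cols = len(buffer), len(buffer[0])
    # Without tileSize, the whole buffer holds real data.
    t0, t1 = tile_size if tile_size is not None else (rows, cols)
    # tileSize must not exceed the buffer shape in any dimension.
    assert t0 <= rows and t1 <= cols
    s0, s1 = starts
    # Start values must be multiples of the buffer size in all dimensions.
    assert s0 % rows == 0 and s1 % cols == 0
    for i in range(t0):
        for j in range(t1):
            dest[s0 + i][s1 + j] = buffer[i][j]


dest = [[0] * 4 for _ in range(4)]
# Oversized 2x2 buffer that only holds a 1x2 tile of real data.
copy_from_tile_buffer([[1, 2], [3, 4]], dest, (2, 2), tile_size=(1, 2))
```

Only the 1x2 tile lands in `dest`; the oversized part of the buffer is ignored.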

Traits: MemRefsNormalizable

Attributes:

Attribute	MLIR Type	Description
`tileSize`	::mlir::ArrayAttr	64-bit integer array attribute

Operands:

Operand	Description
`buffer`	memref of any type values
`dest`	memref of any type values
`starts`	variadic of index

`krnl.copy_to_tile_buffer` (KrnlCopyToBufferOp)

Copy to buffer.

Syntax:

operation ::= `krnl.copy_to_tile_buffer` $buffer `,` $source `[` $starts `]` `,`  $padValue  attr-dict
              `:` type($buffer) `,` type($source)

Operation that copies a source memory into a buffer memory. `starts` indicates where the source data starts to come from within the source memory. Start values must be multiples of the buffer size in all dimensions. The buffer rank and dimensions are compile-time constants.

The buffer will be entirely filled with the source data. By default, the amount of data to copy is given by the size of the buffer. In some cases, we may want to oversize a buffer for better cache, SIMD, or loop unroll-and-jam behavior. If that is the case, the actual tile size of the data to be copied is given by the optional tileSize attribute. This attribute has the same rank as the buffer, and each of its dimensions must be smaller than or equal to the corresponding buffer dimension.

If there is not enough data in the source memory to fill the buffer, because the operation reaches the upper bounds of the source memory, several actions may happen.

If the padToNext attribute is given, the pad value is copied from the last source data up to the next index for which index modulo padToNext is zero, i.e. to the end of a "cache line" of size padToNext. A pad of 1 means no padding; a pad equal to the buffer size means fully padding the buffer. The default is no padding (1). padValue is used to initialize the padded areas.
If the overreadToNext attribute is given, the copy may read the source past its upper bound. This enables optimized code, e.g. using SIMD reads even when going past the last value of the source memory, or unrolling and jamming copy loops to reduce memory latency. overreadToNext is expressed like padToNext: a value of 1 means no reading past the boundary; a value equal to the buffer size enables reading as many additional source values as needed to fill the full buffer. The default is the buffer size.

padToNext and overreadToNext have the same rank as the source and buffer memrefs.
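A simplified Python sketch of the fill-and-pad behavior in 2-D follows. It assumes padToNext equals the buffer size in every dimension (fully pad the buffer), so every buffer element whose source location falls past the source upper bound receives padValue. The function name is hypothetical, and tileSize, overreadToNext, and transpose are deliberately not modeled.

```python
def copy_to_tile_buffer(buffer, source, starts, pad_value):
    """Hypothetical 2-D model of krnl.copy_to_tile_buffer.

    Assumes padToNext == buffer size (full padding); tileSize,
    overreadToNext and transpose are not modeled.
    """
    rows, cols = len(buffer), len(buffer[0])
    src_rows, src_cols = len(source), len(source[0])
    s0, s1 = starts
    # Start values must be multiples of the buffer size in all dimensions.
    assert s0 % rows == 0 and s1 % cols == 0
    for i in range(rows):
        for j in range(cols):
            si, sj = s0 + i, s1 + j
            if si < src_rows and sj < src_cols:
                buffer[i][j] = source[si][sj]  # in-bounds source data
            else:
                buffer[i][j] = pad_value       # past the source upper bound


# 3x3 source holding 0..8, copied in 2x2 tiles.
source = [[3 * i + j for j in range(3)] for i in range(3)]
first_tile = [[None] * 2 for _ in range(2)]
last_tile = [[None] * 2 for _ in range(2)]
copy_to_tile_buffer(first_tile, source, (0, 0), 0.0)
copy_to_tile_buffer(last_tile, source, (2, 2), 0.0)
```

The tile starting at (2, 2) only overlaps one source element, so the rest of that buffer is padded.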

Traits: MemRefsNormalizable

Attributes:

Attribute	MLIR Type	Description
`tileSize`	::mlir::ArrayAttr	64-bit integer array attribute
`padToNext`	::mlir::ArrayAttr	64-bit integer array attribute
`transpose`	::mlir::BoolAttr	bool attribute

Operands:

Operand	Description
`buffer`	memref of any type values
`source`	memref of any type values
`starts`	variadic of index
`padValue`	any type

`krnl.define_loops` (KrnlDefineLoopsOp)

Define_loops operation

The "krnl.define_loops" operation is used to define input loops, i.e. the for loops appearing in the input program that we intend to optimize.

Results:

Result	Description
«unnamed»	variadic of any type

`krnl.entry_point` (KrnlEntryPointOp)

Indicate ONNX entry point

The "krnl.entry_point" operation indicates the main entry point of the ONNX model.

`krnl.erf` (KrnlErfOp)

Krnl erf scalar operation

Krnl erf scalar operation.

Operands:

Operand	Description
`in`	floating-point

Results:

Result	Description
`out`	floating-point

`krnl.find_index` (KrnlFindIndexOp)

Retrieve an index into a perfect hash table described by G and V.

This operation can be used to generate a call to a runtime function that, given two arrays of int32_t values (G and V) representing a perfect hash table for a dictionary, returns the index corresponding to the input value. The returned index is valid only if 'input' is in the dictionary described by G and V.
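To show how a pair of arrays like G and V can encode a perfect hash table, here is a Python sketch of a common "hash and displace" minimal perfect hash. It assumes G stores either a positive seed for a second hash or a negative, directly encoded slot, and V stores the dictionary index. The hash function `_hash` and the helpers `build_tables` and `find_index` are hypothetical; the onnx-mlir runtime's hash function and exact table layout may differ.

```python
def _hash(seed, key):
    """Hypothetical 32-bit FNV-1a-style hash, seeded with `seed`."""
    h = 0x811C9DC5 if seed == 0 else seed
    for byte in str(key).encode():
        h = ((h ^ byte) * 0x01000193) & 0xFFFFFFFF
    return h


def build_tables(keys):
    """Build G and V so that find_index(key, G, V) == keys.index(key)."""
    size = len(keys)
    buckets = [[] for _ in range(size)]
    for key in keys:
        buckets[_hash(0, key) % size].append(key)
    buckets.sort(key=len, reverse=True)  # place crowded buckets first
    G, V = [0] * size, [None] * size
    b = 0
    # Buckets with collisions: search for a seed d mapping them to free, distinct slots.
    while b < size and len(buckets[b]) > 1:
        bucket, d = buckets[b], 1
        while True:
            slots = [_hash(d, k) % size for k in bucket]
            if len(set(slots)) == len(slots) and all(V[s] is None for s in slots):
                break
            d += 1
        G[_hash(0, bucket[0]) % size] = d
        for k, s in zip(bucket, slots):
            V[s] = keys.index(k)
        b += 1
    # Single-key buckets: encode the slot directly as a negative number in G.
    free = [i for i in range(size) if V[i] is None]
    while b < size and buckets[b]:
        key = buckets[b][0]
        slot = free.pop()
        G[_hash(0, key) % size] = -slot - 1
        V[slot] = keys.index(key)
        b += 1
    return G, V


def find_index(key, G, V):
    """Lookup; the result is meaningful only if key is in the dictionary."""
    d = G[_hash(0, key) % len(G)]
    return V[-d - 1] if d < 0 else V[_hash(d, key) % len(V)]


keys = ["apple", "banana", "cherry", "date", "elderberry", "fig", "grape"]
G, V = build_tables(keys)
```

Lookup costs at most two hashes and never compares keys, which is also why a key outside the dictionary still returns some index rather than an error.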

Traits: AlwaysSpeculatableImplTrait, MemRefsNormalizable

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Operands:

Operand	Description
`input`	string type or 64-bit signless integer
`G`	memref of 32-bit signless integer values
`V`	memref of 32-bit signless integer values
`len`	32-bit signless integer

Results:

Result	Description
`index`	index

`krnl.get_induction_var_value` (KrnlGetInductionVariableValueOp)

Krnl

Syntax:

operation ::= `krnl.get_induction_var_value` `(` $loops `)` attr-dict `:` functional-type($loops, results)

Krnl operation to convert loop references to the corresponding induction variable values. This is useful for accessing the induction variables of optimized loops, as they are not otherwise accessible at the Krnl dialect level.

For example, this operation can be applied to loop references corresponding to inter-tile iterations. The return values will be the starting index of the current tile being iterated over.

Operands:

Operand	Description
`loops`	variadic of any type

Results:

Result	Description
`ind_var_vals`	variadic of any type

`krnl.global` (KrnlGlobalOp)

Krnl global operation

Operation for holding global data values. A global constant can have a meaningful name recorded in its name attribute. Its content is stored in the value attribute as dense elements.

Traits: AlwaysSpeculatableImplTrait, MemRefsNormalizable

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Attributes:

Attribute	MLIR Type	Description
`shape`	::mlir::Attribute	any attribute
`name`	::mlir::StringAttr	string attribute
`value`	::mlir::Attribute	any attribute
`offset`	::mlir::IntegerAttr	64-bit signless integer attribute
`alignment`	::mlir::IntegerAttr	64-bit signless integer attribute

Results:

Result	Description
`output`	memref of any type values

`krnl.runtime_instrument` (KrnlInstrumentOp)

Instrumentation point.

Operation that invokes the runtime instrument utility. May be used for gdb.

Attributes:

Attribute	MLIR Type	Description
`opName`	::mlir::StringAttr	string attribute
`tag`	::mlir::IntegerAttr	64-bit signless integer attribute
`nodeName`	::mlir::StringAttr	string attribute

`krnl.isinf` (KrnlIsInfOp)

Krnl isinf scalar operation

Krnl isinf scalar operation.

Operands:

Operand	Description
`in`	floating-point

Results:

Result	Description
`out`	1-bit signless integer

`krnl.isnan` (KrnlIsNaNOp)

Krnl isnan scalar operation

Krnl isnan scalar operation.

Operands:

Operand	Description
`in`	floating-point

Results:

Result	Description
`out`	1-bit signless integer

`krnl.iterate` (KrnlIterateOp)

Iterate operation

The "krnl.iterate" operation is conceptually equivalent to a set of nested for loops.

For instance, say we have the following two loops:

%l0, %l1 = krnl.define_loops 2
%o0, %o1 = krnl.optimize_loops  {
    // Identity schedule.
    krnl.return_loops %l0, %l1
}

Then, consider the following krnl.iterate operation:

krnl.iterate (%o0, %o1) with (%l0 -> %i0 = 0 to 10, %l1 -> %i1 = 0 to 10) {
  // Some operations.
}

It is equivalent to:

for (int i0 = 0; i0 < 10; i0++)
  for (int i1 = 0; i1 < 10; i1++) {
    // Some operations.
  }

Traits: SingleBlock, SingleBlockImplicitTerminator

Interfaces: LoopLikeOpInterface

Operands:

Operand	Description
«unnamed»	variadic of any type

`krnl.load` (KrnlLoadOp)

A Krnl operation to load data from the memref.

Syntax:

operation ::= `krnl.load` $memref `[` $indices `]` attr-dict `:` type($memref)

The krnl.load op reads an element from a memref specified by an index list. The output of load is a new value with the same type as the elements of the memref. The arity of indices is the rank of the memref (i.e., if the memref loaded from is of rank 3, then 3 indices are required for the load following the memref identifier).
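The arity rule can be illustrated with a small Python model of a statically shaped memref. The class `MemRef`, its row-major linearization, and the bounds check are assumptions made for illustration; the `store` helper is included only to populate data before loading.

```python
class MemRef:
    """Hypothetical row-major memref model with a static shape."""

    def __init__(self, shape, fill=0.0):
        self.shape = tuple(shape)
        size = 1
        for dim in self.shape:
            size *= dim
        self.data = [fill] * size

    def _offset(self, indices):
        # The number of indices must equal the rank of the memref.
        if len(indices) != len(self.shape):
            raise ValueError(f"expected {len(self.shape)} indices, got {len(indices)}")
        offset = 0
        for idx, dim in zip(indices, self.shape):
            if not 0 <= idx < dim:
                raise IndexError("index out of bounds")
            offset = offset * dim + idx  # row-major linearization (assumption)
        return offset

    def load(self, *indices):
        return self.data[self._offset(indices)]

    def store(self, value, *indices):
        self.data[self._offset(indices)] = value


m = MemRef((2, 3, 4))  # rank 3, so every load needs 3 indices
m.store(7.0, 1, 2, 3)
```

Here `m.load(1, 2, 3)` returns 7.0, while `m.load(1, 2)` is rejected because a rank-3 memref needs three indices.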

Traits: MemRefsNormalizable

Operands:

Operand	Description
`memref`	memref of any type values
`indices`	variadic of index

Results:

Result	Description
`result`	any type

`krnl.matmul` (KrnlMatMulOp)

Matmul operation for a single panel.

Syntax:

operation ::= `krnl.matmul` $A `[` $aGlobalIndexMemStart `]` `,`
              $B `[` $bGlobalIndexMemStart `]` `,`
              $C `[` $cGlobalIndexMemStart `]` `,`
              `(` $loops `)` `,`
              `(` $iGlobalIndexComputeStart `,` $jGlobalIndexComputeStart `,`
              $kGlobalIndexComputeStart `)` `,`
              `(` $iGlobalUB `,` $jGlobalUB `,` $kGlobalUB `)`
              attr-dict `:` type($A) `,` type($B)`,` type($C) `,` `(` type($loops) `)`

Perform a matrix multiplication AA * BB + CC with sizes `[IxK] * [KxJ] + [IxJ]`.
The original matrices AA, BB, and CC can be buffered in buffered arrays
which may be padded. The original matrices and the padded array might
have a higher rank than 2, but the actual matrix multiplication operation
only deals with the innermost 2 ranks of the matrices to perform its matrix
multiplication operations.

The computations may also compute only a sub-tile of the buffered arrays.
This region is depicted using stars '*' below.

All indices passed to this operation are the global indices in the original
computation, so as to better know if we have boundary conditions.

ORIGINAL ARRAY: denoted as AA, BB, and CC, with sizes AA: `*xIxK`; BB: `*xKxJ`; CC: `*xIxJ`.

BUFFER ARRAYS: denoted as A, B, and C. Note that this operation does
  not require the use of buffer arrays. If none are used, then A=AA,
  B=BB, C=CC. If buffers are used, it is the responsibility of the caller
  to properly fill the buffers with the appropriate data. Buffers are
  typically used for cache tiling.

 ORIGINAL ARRAY

     -------------------------------------------------
     |                                               ]
     |                                               ]
     |             buffer array       buffer pad     ]
     |            (3)---------------- ++++           ]
     |             |                 |   +           ]
     |             |     (1)****     |   +           ]
     |             |      *    *     |   +           ]
     |             |      *    *     |   +           ]
     |             |      ****(5)    |   +           ]
     |             |                 |   +           ]
     |             |                 |   +           ]
     |             ------------------|   +           ]
     |             +                     +           ]
     |             +++++++++++++++++++++(4)          ]
     |                                               ]
     -----------------------------------------------(2)

(1) iGlobalIndexComputeStart/jGlobalIndexComputeStart/kGlobalIndexComputeStart, required; all three are global 1D indices.
(2) iGlobalUB/jGlobalUB/kGlobalUB, required; all three are global 1D indices.
(3) aGlobalIndexMemStart/bGlobalIndexMemStart/cGlobalIndexMemStart, required; global nD indices with the same rank as the buffers A, B, and C.
(4) aTileSize/bTileSize/cTileSize, required when padding; each is a 2D size.
(5) computeTileSize, required when the computation is tiled within the buffers; a 3D size (I, J, K).

The iGlobalIndexComputeStart/jGlobalIndexComputeStart/kGlobalIndexComputeStart (1) indicate the global indices of the first element of a tile to be computed in the original computation.

The iGlobalUB/jGlobalUB/kGlobalUB (2) indicate the global upper bounds in the original computations.

We provide 3 buffers for the matrix multiply: A, B, and C. For each buffer, we indicate the global indices pointing to the beginning of the buffer: aGlobalIndexMemStart, bGlobalIndexMemStart, and cGlobalIndexMemStart (3). If no buffers are used, i.e. the computation starts directly in the original memory, the global index is 0. If a buffer for AA is used to put data into it starting at indices [i1, k1], where i1 and k1 are the global indices in the original computation, then aGlobalIndexMemStart0 and aGlobalIndexMemStart1 are i1 and k1, respectively.

If the A, B, or C buffers are larger than the actual data tile they contain (see copy_to_tile_buffer), then the actual tile size must be given using an optional attribute: aTileSize, bTileSize, or cTileSize (4). These optional tile sizes have a rank of 2, and their values must be equal to or smaller than the dimensions of their corresponding buffer memrefs.

If the computation is further tiled with respect to the size of the buffers A, B, or C, then the actual computation tile is given by the optional attribute computeTileSize (5). Its rank is 3, for the I, J, and K dimensions. The actual A, B, and C buffer tile sizes (possibly specified by the optional attributes) must be multiples of the I, J, and K entries of computeTileSize in their respective dimensions (A: [IxK], B: [KxJ], C: [IxJ]).
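The index bookkeeping described above can be summarized by a scalar reference model: every loop index is global, each buffer is accessed at (global index minus its buffer start), and each compute-tile loop stops at the smaller of its tile end and its global upper bound. The function `matmul_tile` is a hypothetical 2-D sketch; it ignores simdize, unroll, overcompute, and the broadcast prefix dimensions.

```python
def matmul_tile(A, a_start, B, b_start, C, c_start,
                compute_start, compute_tile, global_ub):
    """Hypothetical scalar model of one krnl.matmul compute tile (2-D only).

    A, B, C are nested lists whose element [0][0] sits at the global
    indices a_start, b_start, c_start. Loop indices i, j, k are global.
    """
    i0, j0, k0 = compute_start
    ti, tj, tk = compute_tile
    i_ub, j_ub, k_ub = global_ub
    for i in range(i0, min(i0 + ti, i_ub)):
        for j in range(j0, min(j0 + tj, j_ub)):
            acc = C[i - c_start[0]][j - c_start[1]]
            for k in range(k0, min(k0 + tk, k_ub)):
                acc += A[i - a_start[0]][k - a_start[1]] * B[k - b_start[0]][j - b_start[1]]
            C[i - c_start[0]][j - c_start[1]] = acc


# No buffers: all starts are 0 and the upper bounds clip the oversized tile.
AA = [[1, 2], [3, 4], [5, 6]]
BB = [[1, 0, 1], [0, 1, 1]]
CC = [[0] * 3 for _ in range(3)]
matmul_tile(AA, (0, 0), BB, (0, 0), CC, (0, 0), (0, 0, 0), (4, 4, 4), (3, 3, 2))
```

With buffers, only the start tuples change: a one-row A buffer holding row 2 of AA would be passed with a_start = (2, 0).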

Note that the buffers A, B, and C can have a higher dimensionality than the traditional 2D mentioned so far, because of broadcasting rules. At this time, we only support broadcast of arrays having ranks of 2 or more. Because of the broadcast rules, the higher dimensions have a constant index during one matrix multiply. These fixed indices are given as prefix dimensions in the starting indices for AA, BB, and CC as described above. E.g. if AA has a rank of 3 and BB has a rank of 2, the starting indices for AA are [d, i1, k1], where i1 and k1 are as above and d is the index pointing to the current instance of the IxK AA matrix to be computed. The B start indices would be unchanged at [k1, j1].

The simdize attribute states whether simdization is requested. The unroll attribute requests that loops be unrolled and jammed as warranted.

Below is an example calculating a matrix multiply with pre-zeroed C matrix with the sizes below.

    %A: memref<40x60xf32>, %B: memref<60x80xf32>, %C: memref<40x80xf32>

    // 3 tiled loops.
    %ii, %jj, %kk = krnl.define_loops 3
    %ib, %il = krnl.block %ii 10 : (!krnl.loop) -> (!krnl.loop, !krnl.loop)
    %jb, %jl = krnl.block %jj 8 : (!krnl.loop) -> (!krnl.loop, !krnl.loop)
    %kb, %kl = krnl.block %kk 10 : (!krnl.loop) -> (!krnl.loop, !krnl.loop)
    // 3 subtiles.
    %ilb, %ill = krnl.block %il 5 : (!krnl.loop) -> (!krnl.loop, !krnl.loop)
    %jlb, %jll = krnl.block %jl 4 : (!krnl.loop) -> (!krnl.loop, !krnl.loop)
    %klb, %kll = krnl.block %kl 5 : (!krnl.loop) -> (!krnl.loop, !krnl.loop)
    // Permute.
    krnl.permute(%ib, %ilb, %ill, %jb, %jlb, %jll, %kb, %klb, %kll)
        [0, 3, 6, 1, 4, 7, 2, 5, 8] :
        !krnl.loop, !krnl.loop, !krnl.loop, !krnl.loop, !krnl.loop,
        !krnl.loop, !krnl.loop, !krnl.loop, !krnl.loop
    // Outer 2 for i, j.
    krnl.iterate(%ib, %jb) with (%ii -> %i = 0 to 40,
                                 %jj -> %j = 0 to 80,
                                 %kk -> %k = 0 to 60) {
        %i1, %j1 = krnl.get_induction_var_value(%ib, %jb) :
          (!krnl.loop,!krnl.loop) -> (index, index)
        // Fill C buffer.
        %Cbuff = alloca(): memref<10x8xf32>  // n x m_simd
        krnl.copy_to_tile_buffer %Cbuff, %C[%i1, %j1], %f0 :
          memref<10x8xf32>, memref<40x80xf32>
        // Outer 1 for k.
        krnl.iterate(%kb) with () {
            %k1 = krnl.get_induction_var_value(%kb) : (!krnl.loop) -> (index)
            // Fill A and B buffer
            %Abuff = alloca(): memref<10x10xf32> // i x k
            %Bbuff = alloca(): memref<10x8xf32>  // k x j_simd
            krnl.copy_to_tile_buffer %Abuff, %A[%i1, %k1], %f0 :
              memref<10x10xf32>, memref<40x60xf32>
            krnl.copy_to_tile_buffer %Bbuff, %B[%k1, %j1], %f0 :
              memref<10x8xf32>, memref<60x80xf32>

            // Inner iterations for subtiles.
            krnl.iterate(%ilb, %jlb, %klb) with () {
                %i2, %j2, %k2 = krnl.get_induction_var_value(%ilb, %jlb, %klb) :
                (!krnl.loop,!krnl.loop,!krnl.loop) -> (index,index,index)

                krnl.matmul %Abuff[%i1, %k1], %Bbuff[%k1, %j1], %Cbuff[%i1, %j1],
                    (%ill, %jll, %kll), (%i2, %j2, %k2), (%c40, %c80, %c60)
                    { computeTileSize=[5,4,5], simdize=false, unroll=false } :
                    memref<10x10xf32>, memref<10x8xf32>, memref<10x8xf32>,
                    (!krnl.loop,!krnl.loop,!krnl.loop)
            }
        }
        // Copy back the data into C.
        krnl.copy_from_tile_buffer %Cbuff, %C[%i1, %j1] :
          memref<10x8xf32>, memref<40x80xf32>
    }

Note that the code is simdized along the J dim (last dim of the B and C matrices).
For simd to be enabled, the simdize attribute must be set to true, and the
following condition must be true:
1) The vector length is the second entry of the (i, j, k) compute tile size.
   The vector length must be a compile-time constant.

Traits: AttrSizedOperandSegments, MemRefsNormalizable

Interfaces: SpecializedKernelOpInterface

Attributes:

Attribute	MLIR Type	Description
`computeTileSize`	::mlir::ArrayAttr	64-bit integer array attribute
`aTileSize`	::mlir::ArrayAttr	64-bit integer array attribute
`bTileSize`	::mlir::ArrayAttr	64-bit integer array attribute
`cTileSize`	::mlir::ArrayAttr	64-bit integer array attribute
`simdize`	::mlir::BoolAttr	bool attribute
`unroll`	::mlir::BoolAttr	bool attribute
`overcompute`	::mlir::BoolAttr	bool attribute

Operands:

Operand	Description
`A`	memref of any type values
`aGlobalIndexMemStart`	variadic of index
`B`	memref of any type values
`bGlobalIndexMemStart`	variadic of index
`C`	memref of any type values
`cGlobalIndexMemStart`	variadic of index
`loops`	variadic of any type
`iGlobalIndexComputeStart`	index
`jGlobalIndexComputeStart`	index
`kGlobalIndexComputeStart`	index
`iGlobalUB`	index
`jGlobalUB`	index
`kGlobalUB`	index

`krnl.memcpy` (KrnlMemcpyOp)

Krnl memcpy operation

Copy num_elems elements from src to dest MemRef.

Starting positions for src and dest are defined by src_offset and dest_offset, respectively.

It is the user's responsibility to make sure there is no out-of-bound read or write.
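A minimal Python sketch of the copy, assuming src_offset and dest_offset are element offsets into the linearized data of each memref (modeled here as flat lists). The function name `krnl_memcpy` is hypothetical, and the explicit bounds check is added for illustration only, since the operation itself leaves that to the user.

```python
def krnl_memcpy(dest, src, num_elems, dest_offset, src_offset):
    """Hypothetical model on flat lists: copy num_elems elements.

    The bounds check below is illustrative; krnl.memcpy itself
    leaves out-of-bound prevention to the user.
    """
    if src_offset + num_elems > len(src) or dest_offset + num_elems > len(dest):
        raise IndexError("out-of-bound read/write")
    dest[dest_offset:dest_offset + num_elems] = src[src_offset:src_offset + num_elems]


dest = [0] * 6
src = [1, 2, 3, 4, 5]
krnl_memcpy(dest, src, 3, 2, 1)  # copy src[1:4] into dest[2:5]
```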

Traits: MemRefsNormalizable

Operands:

Operand	Description
`dest`	memref of any type values
`src`	memref of any type values
`num_elems`	64-bit signless integer
`dest_offset`	index
`src_offset`	index

`krnl.memset` (KrnlMemsetOp)

Set buffer to a given value.

Syntax:

operation ::= `krnl.memset` $dest `,` $value attr-dict `:` type($dest)

Krnl operation that sets a buffer to a given value. When the buffer is a MemRef with an affine_map, delayed indicates whether values are set along the original or the extended iteration space.

For example, given

an affine_map #tile = affine_map < (i)->(i floordiv 4, i mod 4) >, and
a buffer of type memref<5xf32, #tile>

The original iteration space is along the first axis, which has 5 elements.

If we do normalization, the memref becomes memref<2x4xf32>. Now we have an extended iteration space along two axes of sizes 2 and 4, respectively. This extended iteration space has 8 elements in total.

If delayed = false, the original iteration space is used to set values. In the above example, only 5 out of 8 elements will be set to the given value.

If delayed = true, the extended iteration space is used to set values. In the above example, all 8 elements will be set to the given value.
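The 5-versus-8 count can be checked with a small Python model of the normalized buffer. The function `memset_tiled` is hypothetical; it only reproduces the #tile map i -> (i floordiv tile, i mod tile) from the example, so the normalized shape is (ceil(n / tile), tile).

```python
def memset_tiled(value, n, tile, delayed, fill=0.0):
    """Hypothetical model of krnl.memset on memref<n x f32, #tile> after normalization.

    #tile = affine_map<(i) -> (i floordiv tile, i mod tile)>.
    """
    rows = -(-n // tile)  # ceil(n / tile) rows in the normalized buffer
    buf = [[fill] * tile for _ in range(rows)]
    if delayed:
        # Extended iteration space: every element of the normalized buffer.
        for r in range(rows):
            for c in range(tile):
                buf[r][c] = value
    else:
        # Original iteration space: only the n logical elements, mapped through #tile.
        for i in range(n):
            buf[i // tile][i % tile] = value
    return buf


buf_original = memset_tiled(1.0, 5, 4, delayed=False)  # 5 of 8 elements set
buf_extended = memset_tiled(1.0, 5, 4, delayed=True)   # all 8 elements set
```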

Traits: MemRefsNormalizable

Attributes:

Attribute	MLIR Type	Description
`delayed`	::mlir::BoolAttr	bool attribute

Operands:

Operand	Description
`dest`	memref of any type values
`value`	any type

`krnl.movable` (KrnlMovableOp)

Krnl movable operation

Syntax:

operation ::= `krnl.movable` $region attr-dict

Encapsulates a list of operations, which should be moved under a newly lowered affine for operation eventually, but cannot presently because the destination affine for operation is not materialized yet.

This operation is automatically generated by the lowering of Krnl to affine dialect to assist with maintaining the relative positioning of loop and inner-loop statements. This construct is particularly helpful, for example, for lowering statements that are nested imperfectly between an "eager" and a "lazy" loop.

Traits: SingleBlock, SingleBlockImplicitTerminator

`krnl.noValue` (KrnlNoneOp)

An operation representing the absence of a value.

This operation can be used to represent the absence of a value. It is typically used as an argument to operators that have optional parameters, and it is converted into nullptr during the Krnl-to-LLVM lowering, for example for optional arguments of KrnlCallOp.

Attributes:

Attribute	MLIR Type	Description
`value`	::mlir::UnitAttr	unit attribute

Results:

Result	Description
`none_val`	none type

`krnl.parallel` (KrnlParallelOp)

Mark Krnl loops as parallel loops

Syntax:

operation ::= `krnl.parallel` `(` $loops `)` attr-dict `:` type($loops)

Parallelize the specified loops. When multiple loop specifiers are passed as parameters, these loops can be parallelized as a collapsed loop. krnl.parallel should be placed as the last operator before krnl.iterate, since we do not want to parallelize the loop until we interpret krnl.block, krnl.permute and krnl.unroll.

krnl.parallel (%i0, %i1) : !krnl.loop, !krnl.loop

Operands:

Operand	Description
`loops`	variadic of any type

`krnl.permute` (KrnlPermuteOp)

Krnl permute operation

Syntax:

operation ::= `krnl.permute` `(` $loops `)` $map attr-dict `:` type($loops)

Permute a set of affine for loops using a specified permutation map. The permutation map should be constructed in such a way that the for loop referred to by the i-th operand of the permute operation is sent to the map[i]-th position.

For example, the following krnl dialect IR:

%ii, %jj, %kk = krnl.define_loops 3
krnl.permute(%ii, %jj, %kk) [1, 2, 0] : !krnl.loop, !krnl.loop, !krnl.loop
krnl.iterate (%ii, %jj, %kk) with (%ii -> %i = 0 to 10, %jj -> %j = 0 to 20, %kk -> %k = 0 to 30) {}

will be lowered to:

// Referenced by %kk
affine.for %arg0 = 0 to 30 {
  // Referenced by %ii
  affine.for %arg1 = 0 to 10 {
    // Referenced by %jj
    affine.for %arg2 = 0 to 20 {
    }
  }
}
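The placement rule (operand i goes to position map[i]) can be captured in a few lines of Python. The helper `permute_loops` is hypothetical and only computes the resulting nesting order, outermost loop first; it reproduces the order of the lowered loops shown above.

```python
def permute_loops(loops, perm):
    """Return loop names in nesting order (outermost first) after krnl.permute.

    The i-th operand is sent to position perm[i].
    """
    if sorted(perm) != list(range(len(loops))):
        raise ValueError("perm must be a permutation of 0..n-1")
    order = [None] * len(loops)
    for i, position in enumerate(perm):
        order[position] = loops[i]
    return order


# [1, 2, 0]: %ii -> position 1, %jj -> position 2, %kk -> position 0.
nest = permute_loops(["ii", "jj", "kk"], [1, 2, 0])
```

Note that map is not the list of operands to place at each position; it is the inverse view, giving the destination of each operand.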

For a more complicated example, we demonstrate 3-D tiling using krnl.block in conjunction with krnl.permute:

%ii, %jj, %kk = krnl.define_loops 3
// Blocking each loop by a factor of 4.
%ib, %il = krnl.block %ii 4 : (!krnl.loop) -> (!krnl.loop, !krnl.loop)
%jb, %jl = krnl.block %jj 4 : (!krnl.loop) -> (!krnl.loop, !krnl.loop)
%kb, %kl = krnl.block %kk 4 : (!krnl.loop) -> (!krnl.loop, !krnl.loop)
// Move iteration over tile coordinates to be the outer loops and iteration over
// the intra-tile elements to be the inner loops.
krnl.permute(%ib, %il, %jb, %jl, %kb, %kl) [0, 3, 1, 4, 2, 5] : !krnl.loop, !krnl.loop, !krnl.loop, !krnl.loop, !krnl.loop, !krnl.loop
krnl.iterate(%ib, %il, %jb, %jl, %kb, %kl) with (%ii -> %i = 0 to 1024, %jj -> %j = 0 to 2048, %kk -> %k = 0 to 4096)  {
}

The above IR gets lowered to:

affine.for %arg0 = 0 to 1024 step 4 {
  affine.for %arg1 = 0 to 2048 step 4 {
    affine.for %arg2 = 0 to 4096 step 4 {
      affine.for %arg3 = #map0(%arg0) to #map1(%arg0) {
        affine.for %arg4 = #map0(%arg1) to #map1(%arg1) {
          affine.for %arg5 = #map0(%arg2) to #map1(%arg2) {
          }
        }
      }
    }
  }
}

Attributes:

Attribute	MLIR Type	Description
`map`	::mlir::ArrayAttr	64-bit integer array attribute

Operands:

Operand	Description
`loops`	variadic of any type

`krnl.print` (KrnlPrintOp)

Print a value.

This operation can be used to print the input value. The user needs to provide a format string (à la printf) to specify how to print the input value. If the input value is not specified, the operator prints the format string.

Traits: MemRefsNormalizable

Attributes:

Attribute	MLIR Type	Description
`format`	::mlir::StringAttr	string attribute

Operands:

Operand	Description
`input`	any type

`krnl.print_tensor` (KrnlPrintTensorOp)

Print a tensor.

This operation can be used to generate a call to a runtime function which prints a tensor. At the beginning of the msg string, user can add formatting instructions. The flags are:

%s: detailed signature (including shape, type, offsets),
%t: compact type (à la MLIR: 32x16xfloat),
%d: data values.

When no formatting is provided, %s%d is used (detailed signature and data) by default. The print operation ends with a newline, except when only the compact type (%t) is requested.

Traits: MemRefsNormalizable

Attributes:

Attribute	MLIR Type	Description
`msg`	::mlir::StringAttr	string attribute

Operands:

Operand	Description
`input`	memref of any type values

`krnl.random_normal` (KrnlRandomNormalOp)

Generate a random normal tensor.

Operation that generates a random normally distributed tensor.

Traits: MemRefsNormalizable

Operands:

Operand	Description
`output`	memref of any type values
`numberOfValues`	index
`mean`	floating-point
`scale`	floating-point
`seed`	floating-point

`krnl.region` (KrnlRegionOp)

Affine boundary for krnl loops

This Op has a region with the AffineScope trait and is used to limit the scope of affine.for. A loop inside krnl.region can be lowered to affine if its bounds are defined at the level of the krnl.region. The krnl.region does not guarantee or require the loops inside it to be affine. A krnl loop may not be affine if its bound symbols are not defined in the nearest enclosing region with the AffineScope trait; in MLIR, FuncOp has the AffineScope trait. The krnl.region is removed after affine.for is lowered. ToFix: the current krnl.region has no inputs or outputs, so a memref created inside the region cannot be used outside of it.

Traits: AffineScope, NoTerminator, SingleBlock

`krnl.seqalloc` (KrnlSeqAllocOp)

Krnl create a sequence

This op allocates a memref for a new sequence according to the input type and length. The output is tagged with the Allocate side effect, and a deallocation is defined for the sequence. This deallocation frees all the elements in the sequence as well as the sequence itself.

Traits: MemRefsNormalizable

Interfaces: AllocationOpInterface, MemoryEffectOpInterface

Operands:

Operand	Description
`length`	variadic of index

Results:

Result	Description
`output`	memref of any type values

`krnl.seqdealloc` (KrnlSeqDeallocOp)

Krnl dealloc a sequence

This op deallocates the elements in the sequence and the sequence itself with memref::dealloc. This Op is a deep dealloc for the sequence type.

Traits: MemRefsNormalizable

Operands:

Operand	Description
`input_sequence`	memref of any type values

`krnl.seqextract` (KrnlSeqExtractOp)

Krnl load from a sequence

This op loads an element from the input sequence 'seq' at position 'index'. The loaded element is copied and then returned. The position value is guaranteed to be non-negative. Negative positions, allowed by the ONNX Op definition, should be handled before lowering to KrnlSeqExtract.

The attribute 'copy' provides an optimization for copying. When 'copy' is 1 (the default value), the extracted element is copied and then returned. When 'copy' is 0, the extracted element is returned directly, without a copy.

The returned element is marked as allocated by this Op with the bufferization interface, so that deallocation can be generated correctly by the Bufferization::Deallocation pass.

Traits: MemRefsNormalizable

Interfaces: AllocationOpInterface, MemoryEffectOpInterface

Attributes:

Attribute	MLIR Type	Description
`copy`	::mlir::IntegerAttr	1-bit unsigned integer attribute

Operands:

Operand	Description
`seq`	memref of any type values
`index`	index

Results:

Result	Description
`output`	any type

`krnl.seqstore` (KrnlSeqStoreOp)

Krnl store into a seq

This op is similar to KrnlSeqInsertOp but assumes that the input seq already has space for the new element, and only needs to copy the element and store it into the sequence. Unlike KrnlSeqInsertOp, it does not return a new seq. This Op is introduced to accumulate a dynamic tensor in a LoopOp with a statically known iteration count.

Traits: MemRefsNormalizable

Operands:

Operand	Description
`input`	any type
`seq`	memref of any type values
`index`	index

`krnl.specialized_kernel` (KrnlSpecializedKernel)

Krnl specialized kernel op

Syntax:

operation ::= `krnl.specialized_kernel` `(` $loops `)` attr-dict `:` type($loops)

Krnl operation to convert.

Interfaces: SpecializedKernelOpInterface

Operands:

Operand	Description
`loops`	variadic of any type

`krnl.store` (KrnlStoreOp)

A Krnl operation to store data to the memref.

Syntax:

operation ::= `krnl.store` $value `,` $memref `[` $indices `]` attr-dict `:` type($memref)

The krnl.store op stores a value to the memref location given by indices. The stored value must have the same type as the element type of the memref, and the number of indices provided within the brackets must match the rank of the memref.
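The two constraints above (matching element type, and one index per dimension) can be illustrated with a small Python model. `Memref` and `krnl_store` are hypothetical stand-ins, not onnx-mlir APIs; the buffer is assumed to use a row-major (identity) layout, and the model does not attempt bounds checking.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class Memref:
    """Hypothetical row-major buffer standing in for a memref."""
    shape: List[int]
    elem_type: type
    data: List[Any] = field(init=False, default_factory=list)

    def __post_init__(self) -> None:
        size = 1
        for d in self.shape:
            size *= d
        # Zero-initialize with the element type's default value.
        self.data = [self.elem_type()] * size


def krnl_store(value: Any, memref: Memref, indices: List[int]) -> None:
    """Hypothetical model of krnl.store: write `value` at memref[indices]."""
    if len(indices) != len(memref.shape):
        raise ValueError("number of indices must equal the memref rank")
    if not isinstance(value, memref.elem_type):
        raise TypeError("value type must match the memref element type")
    # Linearize the multi-dimensional index with row-major strides.
    offset = 0
    for idx, dim in zip(indices, memref.shape):
        offset = offset * dim + idx
    memref.data[offset] = value


m = Memref(shape=[2, 3], elem_type=float)
krnl_store(5.0, m, [1, 2])
print(m.data)  # [0.0, 0.0, 0.0, 0.0, 0.0, 5.0]
```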

Traits: MemRefsNormalizable

Operands:

Operand	Description
`value`	any type
`memref`	memref of any type values
`indices`	variadic of index

`krnl.strlen` (KrnlStrlenOp)

Compute the length of a string.

Krnl operation that computes the length of a string.
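As an illustration only, assuming the op follows C `strlen` semantics on a NUL-terminated byte buffer (the description above does not spell this out), the result can be modeled in Python as follows. `krnl_strlen` is a hypothetical name.

```python
def krnl_strlen(buf: bytes) -> int:
    """Count the bytes before the first NUL terminator, like C strlen.

    Returned as a Python int; the op's result is a 64-bit signless integer.
    The scan stops at the end of `buf` if no NUL byte is present.
    """
    n = 0
    while n < len(buf) and buf[n] != 0:
        n += 1
    return n


print(krnl_strlen(b"hello\x00world"))  # 5
```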

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Operands:

Operand	Description
`str`	string type

Results:

Result	Description
`res`	64-bit signless integer

`krnl.strncmp` (KrnlStrncmpOp)

Perform string comparison up to N bytes.

Krnl operation that performs a string comparison up to N bytes.
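Assuming C `strncmp` semantics (an assumption; the description above only says "up to N bytes"), the comparison can be sketched in Python. `krnl_strncmp` is a hypothetical name. Bytes are compared as unsigned values, comparison stops early at a NUL byte, and only the sign of the result is meaningful.

```python
def krnl_strncmp(s1: bytes, s2: bytes, n: int) -> int:
    """Compare at most n bytes of s1 and s2, like C strncmp.

    Returns a negative, zero, or positive value (the op's result is a
    32-bit signless integer). Missing bytes past the end of a buffer
    are treated as NUL terminators.
    """
    for i in range(n):
        c1 = s1[i] if i < len(s1) else 0
        c2 = s2[i] if i < len(s2) else 0
        if c1 != c2:
            return c1 - c2
        if c1 == 0:  # both strings ended at the same position
            break
    return 0


print(krnl_strncmp(b"apple", b"apply", 4) == 0)  # True: first 4 bytes match
print(krnl_strncmp(b"apple", b"apply", 5) < 0)   # True: 'e' < 'y'
```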

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Operands:

Operand	Description
`str1`	string type
`str2`	string type
`len`	64-bit signless integer

Results:

Result	Description
`res`	32-bit signless integer

`krnl.tan` (KrnlTanOp)

Krnl tan scalar operation

Krnl tan scalar operation.

Operands:

Operand	Description
`in`	floating-point

Results:

Result	Description
`out`	floating-point

`krnl.terminate` (KrnlTerminatorOp)

Krnl terminator operation

Krnl terminator is a special terminator operation for blocks inside krnl iterate operations. It unconditionally transfers control flow to the successor of the operation enclosing the region.

This operation does not have a custom syntax. However, krnl control operations omit the terminator in their custom syntax for brevity.

Traits: ReturnLike, Terminator

Interfaces: RegionBranchTerminatorOpInterface

`krnl.unroll` (KrnlUnrollOp)

Krnl unroll operation

Syntax:

operation ::= `krnl.unroll` $loop attr-dict `:` type($loop)

Fully unrolls the specified loop. For instance,

krnl.unroll %i

unrolls the loop referred to by %i fully.

Operands:

Operand	Description
`loop`	any type

`krnl.vector_type_cast` (KrnlVectorTypeCastOp)

Vector type cast operation

Syntax:

operation ::= `krnl.vector_type_cast` $source attr-dict `:` type($source) `to` type($result)

The "vector_type_cast" operation converts a memref with a non-vector element type into another memref with a vector element type, without changing the element type of the source memref. The size of the last dimension of the source memref is divided (floor division) by the vector size to obtain the corresponding dimension of the target memref type. For example:

%MV = krnl.vector_type_cast %M : memref<64x16xf32> to memref<64x2xvector<8xf32>>
%AV = krnl.vector_type_cast %A : memref<?x?xf32> to memref<?x?xvector<8xf32>>
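The shape rule can be made concrete with a short Python sketch that computes the target shape from the source shape and the vector length. `vector_type_cast_shape` is a hypothetical helper, dynamic dimensions are modeled as the string "?", and the source is assumed to have rank at least 1.

```python
from typing import Sequence, Tuple, Union

Dim = Union[int, str]  # a static size, or "?" for a dynamic dimension


def vector_type_cast_shape(shape: Sequence[Dim], vector_len: int) -> Tuple[Dim, ...]:
    """Compute the target memref shape of a vector_type_cast.

    Only the last dimension changes: it is floor-divided by the vector
    length. A dynamic last dimension ("?") stays dynamic.
    """
    if not shape:
        raise ValueError("source memref must have rank >= 1")
    *leading, last = shape
    new_last: Dim = "?" if last == "?" else last // vector_len
    return (*leading, new_last)


print(vector_type_cast_shape([64, 16], 8))    # (64, 2)
print(vector_type_cast_shape(["?", "?"], 8))  # ('?', '?')
print(vector_type_cast_shape([4, 20], 8))     # (4, 2): floor division drops the remainder
```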

Traits: AlwaysSpeculatableImplTrait, MemRefsNormalizable

Interfaces: CastOpInterface, ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface), ViewLikeOpInterface

Effects: MemoryEffects::Effect{}

Operands:

Operand	Description
`source`	memref of any type values

Results:

Result	Description
`result`	memref of any type values
