Beaver.MLIR.Dialect.ArmSVE (beaver v0.4.6)
Summary
Functions
arm_sve.convert_from_svbool - Convert a svbool type to a SVE predicate type
arm_sve.convert_to_svbool - Convert a SVE predicate type to a svbool type
arm_sve.dupq_lane - Broadcast indexed 128-bit segment to vector
arm_sve.intr.add
arm_sve.intr.bfmmla - BFloat16 matrix multiply-accumulate
arm_sve.intr.convert.from.svbool
arm_sve.intr.convert.to.svbool
arm_sve.intr.dupq_lane
arm_sve.intr.fadd
arm_sve.intr.fdiv
arm_sve.intr.fmul
arm_sve.intr.fsub
arm_sve.intr.mul
arm_sve.intr.psel
arm_sve.intr.sdiv
arm_sve.intr.sdot
arm_sve.intr.smmla
arm_sve.intr.sub
arm_sve.intr.udiv
arm_sve.intr.udot
arm_sve.intr.ummla
arm_sve.intr.usmmla
arm_sve.intr.whilelt
arm_sve.intr.zip.x2
arm_sve.intr.zip.x4
arm_sve.masked.addf - masked addition for scalable vectors of floats
arm_sve.masked.addi - masked addition for scalable vectors of integers
arm_sve.masked.divf - masked division for scalable vectors of floats
arm_sve.masked.divi_signed - masked signed division for scalable vectors of integers
arm_sve.masked.divi_unsigned - masked unsigned division for scalable vectors of integers
arm_sve.masked.mulf - masked multiplication for scalable vectors of floats
arm_sve.masked.muli - masked multiplication for scalable vectors of integers
arm_sve.masked.subf - masked subtraction for scalable vectors of floats
arm_sve.masked.subi - masked subtraction for scalable vectors of integers
arm_sve.psel - Predicate select
arm_sve.sdot - Vector-vector dot product and accumulate op
arm_sve.smmla - Matrix-matrix multiply and accumulate op
arm_sve.udot - Vector-vector dot product and accumulate op
arm_sve.ummla - Matrix-matrix multiply and accumulate op
arm_sve.usmmla - Matrix-matrix multiply and accumulate op
arm_sve.zip.x2 - Multi-vector two-way zip op
arm_sve.zip.x4 - Multi-vector four-way zip op
Functions
arm_sve.convert_from_svbool - Convert a svbool type to a SVE predicate type
Operands
source- Single,SVBoolMask, trailing scalable vector of 1-bit signless integer values with dim -1 having a size of {16}
Results
result- Single,SVEPredicateMask, trailing scalable vector of 1-bit signless integer values with dim -1 having a size of {16, 8, 4, 2, 1}
Description
Converts svbool types (vector<[16]xi1> or vectors of that type, e.g.
vector<2x3x[16]xi1>) to SVE predicate types. Note: Only the trailing
dimension can be scalable.
Example 1: Convert a 1-D svbool mask to a SVE predicate.
%source = vector.load %memref[%c0] : memref<?xi1>, vector<[16]xi1>
%result = arm_sve.convert_from_svbool %source : vector<[4]xi1>
Example 2: Convert a 2-D svbool mask to a mask of SVE predicates.
%source = vector.load %memref[%c0, %c0] : memref<2x?xi1>, vector<2x[16]xi1>
%result = arm_sve.convert_from_svbool %source : vector<2x[8]xi1>
A svbool is the smallest SVE predicate type that has an in-memory
representation (and maps to a full predicate register). In MLIR svbool is
represented as vector<[16]xi1>. Smaller SVE predicate types
(vector<[1|2|4|8]xi1>) must be stored as a svbool then converted back to
the original predicate type after loading.
arm_sve.convert_to_svbool - Convert a SVE predicate type to a svbool type
Operands
source- Single,SVEPredicateMask, trailing scalable vector of 1-bit signless integer values with dim -1 having a size of {16, 8, 4, 2, 1}
Results
result- Single,SVBoolMask, trailing scalable vector of 1-bit signless integer values with dim -1 having a size of {16}
Description
Converts SVE predicate types (or vectors of predicate types, e.g.
vector<4x[4]xi1>) to svbool types. Note: Only the trailing dimension can
be scalable.
Example 1: Convert a 1-D SVE predicate to a svbool mask.
%source = vector.create_mask %dim_size : vector<[4]xi1>
%result = arm_sve.convert_to_svbool %source : vector<[4]xi1>
// => Results in vector<[16]xi1>
Example 2: Convert a 2-D mask of SVE predicates to a svbool mask.
%source = vector.create_mask %c2, %dim_size : vector<2x[2]xi1>
%result = arm_sve.convert_to_svbool %source : vector<2x[2]xi1>
// => Results in vector<2x[16]xi1>
A svbool is the smallest SVE predicate type that has an in-memory
representation (and maps to a full predicate register). In MLIR svbool is
represented as vector<[16]xi1>. Smaller SVE predicate types
(vector<[1|2|4|8]xi1>) must be converted to a svbool before they can be
stored.
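The two conversions can be pictured with a small Python model. This is an illustrative sketch only, not Beaver or MLIR code. It assumes the usual SVE predicate layout, which this page does not spell out: a predicate with fewer than 16 lanes per 128 bits occupies evenly spaced bits of the svbool, and the bits in between are zero. The helper names `to_svbool` and `from_svbool` are hypothetical.

```python
def to_svbool(pred, lanes_per_segment):
    """Widen a predicate to svbool (16 bits per 128-bit segment).

    Assumption: lane i lands on svbool bit i * stride and all other bits are zero.
    """
    assert lanes_per_segment in (1, 2, 4, 8, 16)
    assert len(pred) % lanes_per_segment == 0
    stride = 16 // lanes_per_segment
    out = []
    for bit in pred:
        out.append(bool(bit))
        out.extend([False] * (stride - 1))
    return out


def from_svbool(svbool, lanes_per_segment):
    """Narrow a svbool back to a predicate by keeping every stride-th bit."""
    assert lanes_per_segment in (1, 2, 4, 8, 16)
    assert len(svbool) % 16 == 0
    stride = 16 // lanes_per_segment
    return [bool(b) for b in svbool[::stride]]


# vector<[4]xi1> at vscale == 1: store as svbool, then load and convert back.
pred = [True, False, True, True]
stored = to_svbool(pred, 4)
print(len(stored), from_svbool(stored, 4) == pred)
```

Under that assumption the round trip is lossless, which is why a smaller predicate can be stored as a svbool and recovered after loading.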
arm_sve.dupq_lane - Broadcast indexed 128-bit segment to vector
Attributes
lane- Single,I64Attr, 64-bit signless integer attribute
Operands
src- Single,SVEVector, a vector type that matches the size of a SVE vector
Results
dst- Single,SVEVector, a vector type that matches the size of a SVE vector
Description
This operation fills each 128-bit segment of a vector with the elements from the indexed 128-bit segment of the source vector. If the VL is 128 bits the operation is a NOP. If the index exceeds the number of 128-bit segments in a vector the result is an all-zeroes vector.
Example:
// VL == 256
// %X = [A B C D x x x x]
%Y = arm_sve.dupq_lane %X[0] : vector<[4]xi32>
// %Y = [A B C D A B C D]
// %U = [x x x x x x x x A B C D E F G H]
%V = arm_sve.dupq_lane %U[1] : vector<[8]xf16>
// %V = [A B C D E F G H A B C D E F G H]
Note: The semantics of the operation match those of the svdupq_lane intrinsics.
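The segment-broadcast rule described above can be modelled in a few lines of Python. This is an illustrative sketch only, not part of Beaver or MLIR: the function name `dupq_lane` and its `element_bits` parameter are hypothetical, and the vector is a plain list whose length fixes the VL.

```python
def dupq_lane(src, lane, element_bits):
    """Reference model of arm_sve.dupq_lane for one concrete vector length."""
    per_segment = 128 // element_bits      # elements in one 128-bit segment
    assert len(src) % per_segment == 0, "vector must be a whole number of segments"
    segments = len(src) // per_segment     # VL / 128
    if lane >= segments:
        # Index past the last segment: the result is all zeroes.
        return [0] * len(src)
    start = lane * per_segment
    return src[start:start + per_segment] * segments


# VL == 256 with i32 elements: two segments of four elements each.
x = ["A", "B", "C", "D", "x", "x", "x", "x"]
print(dupq_lane(x, 0, 32))

# VL == 256 with f16 elements: two segments of eight elements each.
u = ["x"] * 8 + list("ABCDEFGH")
print(dupq_lane(u, 1, 16))
```

With a single segment (VL == 128) and lane 0, the function returns its input unchanged, matching the NOP case.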
arm_sve.intr.add
Operands
- anonymous - Single,AnyScalableVectorOfAnyRank, scalable vector of any type values
- anonymous - Single,AnyScalableVectorOfAnyRank, scalable vector of any type values
- anonymous - Single,AnyScalableVectorOfAnyRank, scalable vector of any type values
Results
res- Single,LLVM_Type, LLVM dialect-compatible type
arm_sve.intr.bfmmla - BFloat16 matrix multiply-accumulate
Operands
acc- Single, anonymous/composite constraint, scalable vector of 32-bit float values of length 4
src1- Single, anonymous/composite constraint, scalable vector of bfloat16 type values of length 8
src2- Single, anonymous/composite constraint, scalable vector of bfloat16 type values of length 8
Results
res- Single, anonymous/composite constraint, scalable vector of 32-bit float values of length 4
Description
BFMMLA: BFloat16 matrix multiply-accumulate into 2×2 matrices.
This operation multiplies the 2x4 BFloat16 matrix held in each 128-bit segment of the first source vector by the 4x2 BFloat16 matrix in the corresponding segment of the second source vector, then accumulates this intermediate result with the 2x2 Float32 matrix in the corresponding segment of the accumulator vector, yielding the final 2x2 Float32 segment of the result.
arm_sve.intr.convert.from.svbool
Operands
svbool- Single,SVBool, vector<[16]xi1>
Results
res- Single,LLVM_Type, LLVM dialect-compatible type
arm_sve.intr.convert.to.svbool
Operands
mask- Single,SVEPredicate, a vector type that matches the size of a SVE predicate
Results
res- Single,LLVM_Type, LLVM dialect-compatible type
arm_sve.intr.dupq_lane
Attributes
lane- Single,I64Attr, 64-bit signless integer attribute
Operands
v- Single, anonymous/composite constraint, scalable vector of rank 1
Results
res- Single,LLVM_Type, LLVM dialect-compatible type
arm_sve.intr.fadd
Operands
- anonymous - Single,AnyScalableVectorOfAnyRank, scalable vector of any type values
- anonymous - Single,AnyScalableVectorOfAnyRank, scalable vector of any type values
- anonymous - Single,AnyScalableVectorOfAnyRank, scalable vector of any type values
Results
res- Single,LLVM_Type, LLVM dialect-compatible type
arm_sve.intr.fdiv
Operands
- anonymous - Single,AnyScalableVectorOfAnyRank, scalable vector of any type values
- anonymous - Single,AnyScalableVectorOfAnyRank, scalable vector of any type values
- anonymous - Single,AnyScalableVectorOfAnyRank, scalable vector of any type values
Results
res- Single,LLVM_Type, LLVM dialect-compatible type
arm_sve.intr.fmul
Operands
- anonymous - Single,AnyScalableVectorOfAnyRank, scalable vector of any type values
- anonymous - Single,AnyScalableVectorOfAnyRank, scalable vector of any type values
- anonymous - Single,AnyScalableVectorOfAnyRank, scalable vector of any type values
Results
res- Single,LLVM_Type, LLVM dialect-compatible type
arm_sve.intr.fsub
Operands
- anonymous - Single,AnyScalableVectorOfAnyRank, scalable vector of any type values
- anonymous - Single,AnyScalableVectorOfAnyRank, scalable vector of any type values
- anonymous - Single,AnyScalableVectorOfAnyRank, scalable vector of any type values
Results
res- Single,LLVM_Type, LLVM dialect-compatible type
arm_sve.intr.mul
Operands
- anonymous - Single,AnyScalableVectorOfAnyRank, scalable vector of any type values
- anonymous - Single,AnyScalableVectorOfAnyRank, scalable vector of any type values
- anonymous - Single,AnyScalableVectorOfAnyRank, scalable vector of any type values
Results
res- Single,LLVM_Type, LLVM dialect-compatible type
arm_sve.intr.psel
Operands
p1- Single,SVBool, vector<[16]xi1>
p2- Single,SVEPredicate, a vector type that matches the size of a SVE predicate
index- Single,I32, 32-bit signless integer
Results
res- Single,LLVM_Type, LLVM dialect-compatible type
arm_sve.intr.sdiv
Operands
- anonymous - Single,AnyScalableVectorOfAnyRank, scalable vector of any type values
- anonymous - Single,AnyScalableVectorOfAnyRank, scalable vector of any type values
- anonymous - Single,AnyScalableVectorOfAnyRank, scalable vector of any type values
Results
res- Single,LLVM_Type, LLVM dialect-compatible type
arm_sve.intr.sdot
Operands
- anonymous - Single,AnyScalableVectorOfAnyRank, scalable vector of any type values
- anonymous - Single,AnyScalableVectorOfAnyRank, scalable vector of any type values
- anonymous - Single,AnyScalableVectorOfAnyRank, scalable vector of any type values
Results
res- Single,LLVM_Type, LLVM dialect-compatible type
arm_sve.intr.smmla
Operands
- anonymous - Single,AnyScalableVectorOfAnyRank, scalable vector of any type values
- anonymous - Single,AnyScalableVectorOfAnyRank, scalable vector of any type values
- anonymous - Single,AnyScalableVectorOfAnyRank, scalable vector of any type values
Results
res- Single,LLVM_Type, LLVM dialect-compatible type
arm_sve.intr.sub
Operands
- anonymous - Single,AnyScalableVectorOfAnyRank, scalable vector of any type values
- anonymous - Single,AnyScalableVectorOfAnyRank, scalable vector of any type values
- anonymous - Single,AnyScalableVectorOfAnyRank, scalable vector of any type values
Results
res- Single,LLVM_Type, LLVM dialect-compatible type
arm_sve.intr.udiv
Operands
- anonymous - Single,AnyScalableVectorOfAnyRank, scalable vector of any type values
- anonymous - Single,AnyScalableVectorOfAnyRank, scalable vector of any type values
- anonymous - Single,AnyScalableVectorOfAnyRank, scalable vector of any type values
Results
res- Single,LLVM_Type, LLVM dialect-compatible type
arm_sve.intr.udot
Operands
- anonymous - Single,AnyScalableVectorOfAnyRank, scalable vector of any type values
- anonymous - Single,AnyScalableVectorOfAnyRank, scalable vector of any type values
- anonymous - Single,AnyScalableVectorOfAnyRank, scalable vector of any type values
Results
res- Single,LLVM_Type, LLVM dialect-compatible type
arm_sve.intr.ummla
Operands
- anonymous - Single,AnyScalableVectorOfAnyRank, scalable vector of any type values
- anonymous - Single,AnyScalableVectorOfAnyRank, scalable vector of any type values
- anonymous - Single,AnyScalableVectorOfAnyRank, scalable vector of any type values
Results
res- Single,LLVM_Type, LLVM dialect-compatible type
arm_sve.intr.usmmla
Operands
- anonymous - Single,AnyScalableVectorOfAnyRank, scalable vector of any type values
- anonymous - Single,AnyScalableVectorOfAnyRank, scalable vector of any type values
- anonymous - Single,AnyScalableVectorOfAnyRank, scalable vector of any type values
Results
res- Single,LLVM_Type, LLVM dialect-compatible type
arm_sve.intr.whilelt
Operands
base- Single,I64, 64-bit signless integer
n- Single,I64, 64-bit signless integer
Results
res- Single,LLVM_Type, LLVM dialect-compatible type
arm_sve.intr.zip.x2
Operands
v1- Single,AnyScalableVectorOfAnyRank, scalable vector of any type values
v2- Single,AnyScalableVectorOfAnyRank, scalable vector of any type values
Results
res- Single,LLVM_Type, LLVM dialect-compatible type
arm_sve.intr.zip.x4
Operands
v1- Single,AnyScalableVectorOfAnyRank, scalable vector of any type values
v2- Single,AnyScalableVectorOfAnyRank, scalable vector of any type values
v3- Single,AnyScalableVectorOfAnyRank, scalable vector of any type values
v4- Single,AnyScalableVectorOfAnyRank, scalable vector of any type values
Results
res- Single,LLVM_Type, LLVM dialect-compatible type
arm_sve.masked.addf - masked addition for scalable vectors of floats
Operands
mask- Single, anonymous/composite constraint, scalable vector of 1-bit signless integer values
src1- Single, anonymous/composite constraint, scalable vector of floating-point values
src2- Single, anonymous/composite constraint, scalable vector of floating-point values
Results
res- Single, anonymous/composite constraint, scalable vector of floating-point values
Description
The arm_sve.masked.addf operation takes one scalable vector mask
and two scalable vector operands, and performs floating-point addition on
active lanes. Inactive lanes keep the value of the first vector operand (src1).
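The merging behaviour shared by all arm_sve.masked.* operations can be sketched in Python. This is only a model of the lane-wise rule described above, with vectors as plain lists; the helper names `masked_binary` and `masked_addf` are hypothetical and are not Beaver functions.

```python
import operator


def masked_binary(op, mask, src1, src2):
    """Apply op on active lanes; inactive lanes keep the lane from src1."""
    assert len(mask) == len(src1) == len(src2)
    return [op(a, b) if m else a for m, a, b in zip(mask, src1, src2)]


def masked_addf(mask, src1, src2):
    """Model of arm_sve.masked.addf on Python floats."""
    return masked_binary(operator.add, mask, src1, src2)


mask = [True, False, True, False]
print(masked_addf(mask, [1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0]))
```

Passing `operator.sub` or `operator.mul` to `masked_binary` gives the same kind of model for the subtraction and multiplication variants.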
arm_sve.masked.addi - masked addition for scalable vectors of integers
Operands
mask- Single, anonymous/composite constraint, scalable vector of 1-bit signless integer values
src1- Single, anonymous/composite constraint, scalable vector of 8-bit signless integer or 16-bit signless integer or 32-bit signless integer or 64-bit signless integer values
src2- Single, anonymous/composite constraint, scalable vector of 8-bit signless integer or 16-bit signless integer or 32-bit signless integer or 64-bit signless integer values
Results
res- Single, anonymous/composite constraint, scalable vector of 8-bit signless integer or 16-bit signless integer or 32-bit signless integer or 64-bit signless integer values
Description
The arm_sve.masked.addi operation takes one scalable vector mask
and two scalable vector operands, and performs integer addition on
active lanes. Inactive lanes keep the value of the first vector operand (src1).
arm_sve.masked.divf - masked division for scalable vectors of floats
Operands
mask- Single, anonymous/composite constraint, scalable vector of 1-bit signless integer values
src1- Single, anonymous/composite constraint, scalable vector of floating-point values
src2- Single, anonymous/composite constraint, scalable vector of floating-point values
Results
res- Single, anonymous/composite constraint, scalable vector of floating-point values
Description
The arm_sve.masked.divf operation takes one scalable vector mask
and two scalable vector operands, and performs floating-point division on
active lanes. Inactive lanes keep the value of the first vector operand (src1).
arm_sve.masked.divi_signed - masked signed division for scalable vectors of integers
Operands
mask- Single, anonymous/composite constraint, scalable vector of 1-bit signless integer values
src1- Single, anonymous/composite constraint, scalable vector of 8-bit signless integer or 16-bit signless integer or 32-bit signless integer or 64-bit signless integer values
src2- Single, anonymous/composite constraint, scalable vector of 8-bit signless integer or 16-bit signless integer or 32-bit signless integer or 64-bit signless integer values
Results
res- Single, anonymous/composite constraint, scalable vector of 8-bit signless integer or 16-bit signless integer or 32-bit signless integer or 64-bit signless integer values
Description
The arm_sve.masked.divi_signed operation takes one scalable vector mask
and two scalable vector operands, and performs signed integer division on
active lanes. Inactive lanes keep the value of the first vector operand (src1).
arm_sve.masked.divi_unsigned - masked unsigned division for scalable vectors of integers
Operands
mask- Single, anonymous/composite constraint, scalable vector of 1-bit signless integer values
src1- Single, anonymous/composite constraint, scalable vector of 8-bit signless integer or 16-bit signless integer or 32-bit signless integer or 64-bit signless integer values
src2- Single, anonymous/composite constraint, scalable vector of 8-bit signless integer or 16-bit signless integer or 32-bit signless integer or 64-bit signless integer values
Results
res- Single, anonymous/composite constraint, scalable vector of 8-bit signless integer or 16-bit signless integer or 32-bit signless integer or 64-bit signless integer values
Description
The arm_sve.masked.divi_unsigned operation takes one scalable vector mask
and two scalable vector operands, and performs unsigned integer division on
active lanes. Inactive lanes keep the value of the first vector operand (src1).
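Because the operands are signless, the signed and unsigned division ops differ only in how they read the same bit patterns. The following Python sketch makes that concrete. It is a model under stated assumptions, not the lowering itself: lanes are raw bit patterns stored as non-negative integers, signed division truncates toward zero, and division by zero is deliberately not modelled. The names `as_signed` and `masked_div` are hypothetical.

```python
def as_signed(bits, width):
    """Reinterpret a width-bit pattern as a two's-complement integer."""
    bits &= (1 << width) - 1
    return bits - (1 << width) if bits >> (width - 1) else bits


def masked_div(mask, src1, src2, width, signed):
    """Lane-wise division on active lanes; inactive lanes keep src1."""
    modulus = 1 << width
    out = []
    for m, a, b in zip(mask, src1, src2):
        a %= modulus
        b %= modulus
        if not m:
            out.append(a)
            continue
        if b == 0:
            raise ZeroDivisionError("division by zero is not modelled here")
        if signed:
            sa, sb = as_signed(a, width), as_signed(b, width)
            q = abs(sa) // abs(sb)          # truncate toward zero
            if (sa < 0) != (sb < 0):
                q = -q
            out.append(q % modulus)         # back to a raw bit pattern
        else:
            out.append(a // b)
    return out


lanes = [0xFFFFFFF8, 7]                     # 0xFFFFFFF8 is -8 when read as signed i32
print([hex(v) for v in masked_div([True, False], lanes, [2, 2], 32, signed=True)])
print([hex(v) for v in masked_div([True, False], lanes, [2, 2], 32, signed=False)])
```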
arm_sve.masked.mulf - masked multiplication for scalable vectors of floats
Operands
mask- Single, anonymous/composite constraint, scalable vector of 1-bit signless integer values
src1- Single, anonymous/composite constraint, scalable vector of floating-point values
src2- Single, anonymous/composite constraint, scalable vector of floating-point values
Results
res- Single, anonymous/composite constraint, scalable vector of floating-point values
Description
The arm_sve.masked.mulf operation takes one scalable vector mask
and two scalable vector operands, and performs floating-point multiplication on
active lanes. Inactive lanes keep the value of the first vector operand (src1).
arm_sve.masked.muli - masked multiplication for scalable vectors of integers
Operands
mask- Single, anonymous/composite constraint, scalable vector of 1-bit signless integer values
src1- Single, anonymous/composite constraint, scalable vector of 8-bit signless integer or 16-bit signless integer or 32-bit signless integer or 64-bit signless integer values
src2- Single, anonymous/composite constraint, scalable vector of 8-bit signless integer or 16-bit signless integer or 32-bit signless integer or 64-bit signless integer values
Results
res- Single, anonymous/composite constraint, scalable vector of 8-bit signless integer or 16-bit signless integer or 32-bit signless integer or 64-bit signless integer values
Description
The arm_sve.masked.muli operation takes one scalable vector mask
and two scalable vector operands, and performs integer multiplication on
active lanes. Inactive lanes keep the value of the first vector operand (src1).
arm_sve.masked.subf - masked subtraction for scalable vectors of floats
Operands
mask- Single, anonymous/composite constraint, scalable vector of 1-bit signless integer values
src1- Single, anonymous/composite constraint, scalable vector of floating-point values
src2- Single, anonymous/composite constraint, scalable vector of floating-point values
Results
res- Single, anonymous/composite constraint, scalable vector of floating-point values
Description
The arm_sve.masked.subf operation takes one scalable vector mask
and two scalable vector operands, and performs floating-point subtraction on
active lanes. Inactive lanes keep the value of the first vector operand (src1).
arm_sve.masked.subi - masked subtraction for scalable vectors of integers
Operands
mask- Single, anonymous/composite constraint, scalable vector of 1-bit signless integer values
src1- Single, anonymous/composite constraint, scalable vector of 8-bit signless integer or 16-bit signless integer or 32-bit signless integer or 64-bit signless integer values
src2- Single, anonymous/composite constraint, scalable vector of 8-bit signless integer or 16-bit signless integer or 32-bit signless integer or 64-bit signless integer values
Results
res- Single, anonymous/composite constraint, scalable vector of 8-bit signless integer or 16-bit signless integer or 32-bit signless integer or 64-bit signless integer values
Description
The arm_sve.masked.subi operation takes one scalable vector mask
and two scalable vector operands, and performs integer subtraction on
active lanes. Inactive lanes keep the value of the first vector operand (src1).
arm_sve.psel - Predicate select
Operands
p1- Single,SVEPredicate, a vector type that matches the size of a SVE predicate
p2- Single,SVEPredicate, a vector type that matches the size of a SVE predicate
index- Single,Index, index
Results
result- Single,SVEPredicate, a vector type that matches the size of a SVE predicate
Description
This operation returns the input predicate p1 or an all-false predicate
based on the bit at p2[index]. Informally, the semantics are:
if p2[index % num_elements(p2)] == 1:
  return p1 : type(p1)
return all-false : type(p1)
Example:
// Note: p1 and p2 can have different sizes.
%pd = arm_sve.psel %p1, %p2[%index] : vector<[4]xi1>, vector<[8]xi1>
Note: This requires SME or SVE2.1 (+sme or +sve2p1 in LLVM target features).
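The informal semantics above translate directly into runnable Python. This is a sketch for experimenting with the rule, with predicates as lists of booleans; the function name `psel` is a hypothetical stand-in and not a Beaver API.

```python
def psel(p1, p2, index):
    """Return p1 if the selected bit of p2 is set, else an all-false predicate."""
    if p2[index % len(p2)]:
        return list(p1)
    return [False] * len(p1)


p1 = [True, False, True, True]                  # like vector<[4]xi1> at vscale == 1
p2 = [False, True, False, False] + [False] * 4  # like vector<[8]xi1> at vscale == 1
print(psel(p1, p2, 1))   # selected bit is set
print(psel(p1, p2, 0))   # selected bit is clear
print(psel(p1, p2, 9))   # index wraps: 9 % 8 == 1
```

The result always has the type of p1, so p1 and p2 are free to have different lane counts.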
arm_sve.sdot - Vector-vector dot product and accumulate op
Operands
acc- Single, anonymous/composite constraint, scalable vector of 32-bit signless integer or 64-bit signless integer values of length 4/2
src1- Single, anonymous/composite constraint, scalable vector of 8-bit signless integer or 16-bit signless integer values of length 16/8
src2- Single, anonymous/composite constraint, scalable vector of 8-bit signless integer or 16-bit signless integer values of length 16/8
Results
dst- Single, anonymous/composite constraint, scalable vector of 32-bit signless integer or 64-bit signless integer values of length 4/2
Description
SDOT: Signed integer addition of dot product.
This function maps to the SDOT instruction, and it takes signless integer operands that the operation interprets as signed. It partitions the second and third vector inputs into groups of four elements. It calculates the dot product of each group (without loss of precision) and then adds each result to the overlapping element of the first vector input.
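The grouping rule can be checked with a short Python model of the i8 to i32 case. It is a sketch under stated assumptions: lanes are raw bit patterns held as non-negative integers, and the final addition wraps modulo 2^32, which the description above does not state explicitly. The names `to_signed` and `sdot` are hypothetical.

```python
def to_signed(bits, width):
    """Reinterpret a width-bit pattern as a two's-complement integer."""
    bits &= (1 << width) - 1
    return bits - (1 << width) if bits >> (width - 1) else bits


def sdot(acc, src1, src2, src_bits=8, acc_bits=32):
    """Each accumulator lane receives the dot product of one group of four."""
    assert len(src1) == len(src2) == 4 * len(acc)
    result = []
    for i, a in enumerate(acc):
        group1 = src1[4 * i:4 * i + 4]
        group2 = src2[4 * i:4 * i + 4]
        # Exact (unbounded) product sum, interpreting the inputs as signed.
        dot = sum(to_signed(x, src_bits) * to_signed(y, src_bits)
                  for x, y in zip(group1, group2))
        result.append((a + dot) % (1 << acc_bits))   # assumed wrap-around
    return result


acc = [10, 0, 0, 0]
src1 = [1, 2, 3, 4] + [0xFF] * 4 + [0] * 8           # 0xFF is -1 as signed i8
src2 = [1, 1, 1, 1] + [2] * 4 + [0] * 8
print([hex(v) for v in sdot(acc, src1, src2)])
```

Reading the same inputs as unsigned instead (skipping `to_signed`) gives the corresponding model for the unsigned variant.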
arm_sve.smmla - Matrix-matrix multiply and accumulate op
Operands
acc- Single, anonymous/composite constraint, scalable vector of 32-bit signless integer values of length 4
src1- Single, anonymous/composite constraint, scalable vector of 8-bit signless integer values of length 16
src2- Single, anonymous/composite constraint, scalable vector of 8-bit signless integer values of length 16
Results
dst- Single, anonymous/composite constraint, scalable vector of 32-bit signless integer values of length 4
Description
SMMLA: Signed integer matrix multiply-accumulate.
This function maps to the SMMLA instruction, and it takes signless integer operands that the operation interprets as signed. It partitions the inputs into 128-bit quadwords, with the first input containing a row-by-row 2×2 matrix of 32-bit integers, the second input containing a row-by-row 2×8 matrix of 8-bit integers, and the third input containing a column-by-column 8×2 matrix of 8-bit integers. For each quadword, it multiplies the second input matrix by the third input matrix using natural arithmetic and then adds the result to the first input using modular arithmetic.
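The quadword layout described above is easier to follow with indices written out. The Python sketch below models one or more 128-bit segments using flat lists of raw bit patterns; it only illustrates the stated layout, and the names `to_signed` and `smmla` are hypothetical rather than Beaver functions.

```python
def to_signed(bits, width):
    """Reinterpret a width-bit pattern as a two's-complement integer."""
    bits &= (1 << width) - 1
    return bits - (1 << width) if bits >> (width - 1) else bits


def smmla(acc, src1, src2):
    """Per 128-bit segment: acc (2x2, rows) += src1 (2x8, rows) * src2 (8x2, columns)."""
    assert len(acc) % 4 == 0
    assert len(src1) == len(src2) == 4 * len(acc)
    out = []
    for seg in range(len(acc) // 4):
        c = acc[4 * seg:4 * seg + 4]
        a = [to_signed(v, 8) for v in src1[16 * seg:16 * seg + 16]]
        b = [to_signed(v, 8) for v in src2[16 * seg:16 * seg + 16]]
        for row in range(2):
            for col in range(2):
                # Row `row` of a is a[8*row : 8*row+8]; column `col` of b is
                # b[8*col : 8*col+8] because b is stored column by column.
                dot = sum(a[8 * row + k] * b[8 * col + k] for k in range(8))
                out.append((c[2 * row + col] + dot) % (1 << 32))
    return out


acc = [0, 0, 0, 0]
src1 = [1] * 8 + [2] * 8          # rows: all ones, all twos
src2 = [3] * 8 + [0xFF] * 8       # columns: all threes, all minus ones
print([hex(v) for v in smmla(acc, src1, src2)])
```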
arm_sve.udot - Vector-vector dot product and accumulate op
Operands
acc- Single, anonymous/composite constraint, scalable vector of 32-bit signless integer or 64-bit signless integer values of length 4/2
src1- Single, anonymous/composite constraint, scalable vector of 8-bit signless integer or 16-bit signless integer values of length 16/8
src2- Single, anonymous/composite constraint, scalable vector of 8-bit signless integer or 16-bit signless integer values of length 16/8
Results
dst- Single, anonymous/composite constraint, scalable vector of 32-bit signless integer or 64-bit signless integer values of length 4/2
Description
UDOT: Unsigned integer addition of dot product.
This function maps to the UDOT instruction, and it takes signless integer operands that the operation interprets as unsigned. It partitions the second and third vector inputs into groups of four elements. It calculates the dot product of each group (without loss of precision) and then adds each result to the overlapping element of the first vector input.
arm_sve.ummla - Matrix-matrix multiply and accumulate op
Operands
acc- Single, anonymous/composite constraint, scalable vector of 32-bit signless integer values of length 4
src1- Single, anonymous/composite constraint, scalable vector of 8-bit signless integer values of length 16
src2- Single, anonymous/composite constraint, scalable vector of 8-bit signless integer values of length 16
Results
dst- Single, anonymous/composite constraint, scalable vector of 32-bit signless integer values of length 4
Description
UMMLA: Unsigned integer matrix multiply-accumulate.
This function maps to the UMMLA instruction, and it takes signless integer operands that the operation interprets as unsigned. It partitions the inputs into 128-bit quadwords, with the first input containing a row-by-row 2×2 matrix of 32-bit integers, the second input containing a row-by-row 2×8 matrix of 8-bit integers, and the third input containing a column-by-column 8×2 matrix of 8-bit integers. For each quadword, it multiplies the second input matrix by the third input matrix using natural arithmetic and then adds the result to the first input using modular arithmetic.
arm_sve.usmmla - Matrix-matrix multiply and accumulate op
Operands
acc- Single, anonymous/composite constraint, scalable vector of 32-bit signless integer values of length 4
src1- Single, anonymous/composite constraint, scalable vector of 8-bit signless integer values of length 16
src2- Single, anonymous/composite constraint, scalable vector of 8-bit signless integer values of length 16
Results
dst- Single, anonymous/composite constraint, scalable vector of 32-bit signless integer values of length 4
Description
USMMLA: Unsigned by signed integer matrix multiply-accumulate.
The unsigned by signed integer matrix multiply-accumulate operation multiplies the 2×8 matrix of unsigned 8-bit integer values held in the first source vector by the 8×2 matrix of signed 8-bit integer values in the second source vector. The resulting 2×2 widened 32-bit integer matrix product is then added to the 32-bit integer matrix accumulator.
arm_sve.zip.x2 - Multi-vector two-way zip op
Operands
sourceV1- Single,ZipInputVectorType, an SVE vector with element size <= 64-bit
sourceV2- Single,ZipInputVectorType, an SVE vector with element size <= 64-bit
Results
resultV1- Single,ZipInputVectorType, an SVE vector with element size <= 64-bit
resultV2- Single,ZipInputVectorType, an SVE vector with element size <= 64-bit
Description
This operation interleaves elements from two input SVE vectors, returning
two new SVE vectors (resultV1 and resultV2), which contain the low and
high halves of the result respectively.
Example:
// sourceV1 = [ A1, A2, A3, ... An ]
// sourceV2 = [ B1, B2, B3, ... Bn ]
// (resultV1, resultV2) = [ A1, B1, A2, B2, A3, B3, ... An, Bn ]
%resultV1, %resultV2 = arm_sve.zip.x2 %sourceV1, %sourceV2 : vector<[16]xi8>
Note: This requires SME 2 (+sme2 in LLVM target features).
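The split into low and high halves can be modelled in Python as follows. This is a sketch of the element ordering only, with vectors as plain lists; the function name `zip_x2` is hypothetical and not a Beaver API.

```python
def zip_x2(source_v1, source_v2):
    """Interleave two equal-length vectors; return the low and high halves."""
    assert len(source_v1) == len(source_v2)
    interleaved = [x for pair in zip(source_v1, source_v2) for x in pair]
    half = len(interleaved) // 2
    return interleaved[:half], interleaved[half:]


low, high = zip_x2(["A1", "A2", "A3", "A4"], ["B1", "B2", "B3", "B4"])
print(low)
print(high)
```

Each result has the same length as an input vector, so the pair of results together holds every input element exactly once.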
arm_sve.zip.x4 - Multi-vector four-way zip op
Operands
sourceV1- Single,ZipInputVectorType, an SVE vector with element size <= 64-bit
sourceV2- Single,ZipInputVectorType, an SVE vector with element size <= 64-bit
sourceV3- Single,ZipInputVectorType, an SVE vector with element size <= 64-bit
sourceV4- Single,ZipInputVectorType, an SVE vector with element size <= 64-bit
Results
resultV1- Single,ZipInputVectorType, an SVE vector with element size <= 64-bit
resultV2- Single,ZipInputVectorType, an SVE vector with element size <= 64-bit
resultV3- Single,ZipInputVectorType, an SVE vector with element size <= 64-bit
resultV4- Single,ZipInputVectorType, an SVE vector with element size <= 64-bit
Description
This operation interleaves elements from four input SVE vectors, returning
four new SVE vectors, each of which contains a quarter of the result. The
first quarter will be in resultV1, second in resultV2, third in
resultV3, and fourth in resultV4.
// sourceV1 = [ A1, A2, ... An ]
// sourceV2 = [ B1, B2, ... Bn ]
// sourceV3 = [ C1, C2, ... Cn ]
// sourceV4 = [ D1, D2, ... Dn ]
// (resultV1, resultV2, resultV3, resultV4)
// = [ A1, B1, C1, D1, A2, B2, C2, D2, ... An, Bn, Cn, Dn ]
%resultV1, %resultV2, %resultV3, %resultV4 = arm_sve.zip.x4
  %sourceV1, %sourceV2, %sourceV3, %sourceV4 : vector<[16]xi8>
Warning: The result of this op is undefined for 64-bit elements on hardware with less than 256-bit vectors!
Note: This requires SME 2 (+sme2 in LLVM target features)
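The quarter-by-quarter split of the four-way zip can be sketched in Python. The model covers only the element ordering described above and does not attempt to reproduce the undefined 64-bit case from the warning; the function name `zip_x4` is hypothetical.

```python
def zip_x4(v1, v2, v3, v4):
    """Interleave four equal-length vectors; return the result as four quarters."""
    n = len(v1)
    assert len(v2) == len(v3) == len(v4) == n
    interleaved = [x for group in zip(v1, v2, v3, v4) for x in group]
    # Each quarter has the length of one input vector.
    return tuple(interleaved[i * n:(i + 1) * n] for i in range(4))


a = ["A1", "A2", "A3", "A4"]
b = ["B1", "B2", "B3", "B4"]
c = ["C1", "C2", "C3", "C4"]
d = ["D1", "D2", "D3", "D4"]
for quarter in zip_x4(a, b, c, d):
    print(quarter)
```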