Beaver.MLIR.Dialect.ArmNeon (beaver v0.4.7)
Summary
Functions
arm_neon.2d.sdot - sdot op
arm_neon.intr.bfmmla - BFloat16 matrix multiply-accumulate to single-precision
arm_neon.intr.sdot - sdot op
arm_neon.intr.smmla - Matrix-matrix multiply and accumulate op
arm_neon.intr.smull - smull roundscale op
arm_neon.intr.ummla - Unsigned matrix-matrix multiply and accumulate op
arm_neon.intr.usmmla - Unsigned and signed matrix-matrix multiply and accumulate op
Functions
arm_neon.2d.sdot - sdot op
Operands
a- Single, anonymous/composite constraint, vector of 32-bit signless integer values of length 4/2
b- Single, anonymous/composite constraint, vector of 8-bit signless integer values of length 16/8
c- Single, anonymous/composite constraint, vector of 8-bit signless integer values of length 16/8
Results
res- Single, anonymous/composite constraint, vector of 32-bit signless integer values of length 4/2
Description
The two input vectors b and c have a 2D shape, consisting of either 2
or 4 rows, each row having length 4. This operation computes the pair-wise
dot-products of the rows of b and c and accumulates them with the
corresponding entry of a:
res[i] := a[i] + dot_product(b[i, ...], c[i, ...])
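The formula above can be modeled in plain Python as a minimal sketch. It assumes `b` and `c` are given as lists of rows (2 or 4 rows of 4 elements, matching the 8 or 16 total i8 elements in the operand constraints). `sdot_2d` is a hypothetical helper, not a Beaver API, and Python integers do not wrap, so 32-bit overflow is not modeled.

```python
from typing import List

def sdot_2d(a: List[int], b: List[List[int]], c: List[List[int]]) -> List[int]:
    """Reference sketch of arm_neon.2d.sdot: res[i] = a[i] + dot(b[i], c[i])."""
    rows = len(a)
    # 2 or 4 rows of length 4, i.e. 8 or 16 i8 elements per input vector
    assert rows in (2, 4), "a must have 2 or 4 lanes"
    assert len(b) == len(c) == rows
    assert all(len(row) == 4 for row in b + c), "each row must hold 4 elements"
    # pair-wise dot product of row i of b and c, accumulated into a[i]
    return [a[i] + sum(x * y for x, y in zip(b[i], c[i])) for i in range(rows)]

print(sdot_2d([10, 20], [[1, 2, 3, 4], [-1, -1, -1, -1]], [[1, 1, 1, 1], [2, 2, 2, 2]]))
```

Here row 0 contributes 1 + 2 + 3 + 4 = 10 and row 1 contributes 4 × (-1 × 2) = -8, giving `[20, 12]`.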
arm_neon.intr.bfmmla - BFloat16 matrix multiply-accumulate to single-precision
Operands
acc- Single, anonymous/composite constraint, a vector with length 4 of 32-bit float values
src1- Single, anonymous/composite constraint, a vector with length 8 of bfloat16 type values
src2- Single, anonymous/composite constraint, a vector with length 8 of bfloat16 type values
Results
res- Single, anonymous/composite constraint, a vector with length 4 of 32-bit float values
Description
BFMMLA: BFloat16 matrix multiply-accumulate to single-precision.
The operation multiplies the 2x4 BFloat16 matrix in the first source vector with the 4x2 BFloat16 matrix in the second source vector, then accumulates this intermediate result with the 2x2 Float32 matrix in the accumulator vector, yielding the final 2x2 Float32 result.
Source: https://developer.arm.com/architectures/instruction-sets/intrinsics/vbfmmlaq_f32
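The 2x4 by 4x2 product can be sketched in Python to make the element layout concrete. This is a sketch under stated assumptions: `acc` and `src1` are row-major, and `src2` stores the 4x2 matrix column by column (column `j` at `src2[4*j : 4*j + 4]`), which is how we read Arm's pseudocode. BF16 values are modeled as ordinary Python floats, so BF16 rounding is ignored. `bfmmla` is a hypothetical helper name.

```python
from typing import List

def bfmmla(acc: List[float], src1: List[float], src2: List[float]) -> List[float]:
    """Sketch of BFMMLA: acc (2x2) += src1 (2x4) @ src2 (4x2).

    Assumed layout: acc and src1 are row-major; column j of the 4x2 matrix
    occupies src2[4*j : 4*j + 4]. No BF16 rounding is modeled.
    """
    assert len(acc) == 4 and len(src1) == 8 and len(src2) == 8
    res = list(acc)
    for i in range(2):      # row of the 2x4 matrix
        for j in range(2):  # column of the 4x2 matrix
            res[2 * i + j] += sum(src1[4 * i + k] * src2[4 * j + k] for k in range(4))
    return res

print(bfmmla([0.5] * 4, [1, 2, 3, 4, 5, 6, 7, 8], [1, 0, 0, 0, 0, 1, 0, 0]))
```

With the two unit columns selecting the first and second element of each row, the result is `[1.5, 2.5, 5.5, 6.5]`.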
arm_neon.intr.sdot - sdot op
Operands
a- Single, anonymous/composite constraint, vector of 32-bit signless integer values of length 4/2
b- Single, anonymous/composite constraint, vector of 8-bit signless integer values of length 16/8
c- Single, anonymous/composite constraint, vector of 8-bit signless integer values of length 16/8
Results
res- Single, anonymous/composite constraint, vector of 32-bit signless integer values of length 4/2
Description
Signed integer addition of dot product (vector). This instruction performs the following operation on signed integer vectors: res = dot(b, c) + a, where vector operands are partitioned into groups of four elements.
Source: https://developer.arm.com/architectures/instruction-sets/simd-isas/neon/intrinsics
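The "groups of four elements" partitioning can be sketched with flat 1-D operands: lane `i` of the result adds to `a[i]` the dot product of elements `4*i .. 4*i + 3` of `b` and `c`. This is a minimal Python model under that reading; `sdot` here is a hypothetical helper, and 32-bit wraparound is not modeled.

```python
from typing import List

def sdot(a: List[int], b: List[int], c: List[int]) -> List[int]:
    """Sketch of SDOT: res[i] = a[i] + sum(b[4*i + k] * c[4*i + k] for k in 0..3)."""
    # a has 4 or 2 i32 lanes; b and c have 16 or 8 i8 elements respectively
    assert len(a) in (2, 4) and len(b) == len(c) == 4 * len(a)
    return [
        a[i] + sum(b[4 * i + k] * c[4 * i + k] for k in range(4))
        for i in range(len(a))
    ]

print(sdot([0, 100], [1, 2, 3, 4, -5, 6, -7, 8], [1, 1, 1, 1, 2, 2, 2, 2]))
```

Lane 1 computes 100 + 2 × (-5 + 6 - 7 + 8) = 104, so the output is `[10, 104]`.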
arm_neon.intr.smmla - Matrix-matrix multiply and accumulate op
Operands
acc- Single, anonymous/composite constraint, a vector with length 4 of 32-bit signless integer values
src1- Single, anonymous/composite constraint, a vector with length 16 of 8-bit signless integer values
src2- Single, anonymous/composite constraint, a vector with length 16 of 8-bit signless integer values
Results
res- Single, anonymous/composite constraint, a vector with length 4 of 32-bit signless integer values
Description
SMMLA: Signed integer matrix multiply-accumulate.
Signed 8-bit integer matrix multiply-accumulate. This instruction multiplies the 2x8 matrix of signed 8-bit integer values in the first source vector by the 8x2 matrix of signed 8-bit integer values in the second source vector. The resulting 2x2 32-bit integer matrix product is destructively added to the 32-bit integer matrix accumulator in the destination vector. This is equivalent to performing an 8-way dot product per destination element.
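Because the MLIR operands are signless `i8`, the signed interpretation happens inside the operation. The Python sketch below makes that explicit and shows the 8-way dot product per destination element. It assumes `acc` and `src1` are row-major and that column `j` of the 8x2 matrix is stored contiguously at `src2[8*j : 8*j + 8]`, following our reading of Arm's pseudocode. `smmla` and `to_signed8` are hypothetical helpers, and 32-bit accumulator wraparound is not modeled.

```python
from typing import List

def to_signed8(v: int) -> int:
    """Interpret the low 8 bits of v as a two's-complement signed value."""
    v &= 0xFF
    return v - 256 if v >= 128 else v

def smmla(acc: List[int], src1: List[int], src2: List[int]) -> List[int]:
    """Sketch of SMMLA: acc (2x2, row-major) += src1 (2x8) @ src2 (8x2).

    Assumed layout: row i of src1 is src1[8*i : 8*i + 8]; column j of the
    8x2 matrix is src2[8*j : 8*j + 8].
    """
    assert len(acc) == 4 and len(src1) == 16 and len(src2) == 16
    res = list(acc)
    for i in range(2):
        for j in range(2):
            # 8-way signed dot product per destination element
            res[2 * i + j] += sum(
                to_signed8(src1[8 * i + k]) * to_signed8(src2[8 * j + k]) for k in range(8)
            )
    return res

print(smmla([0] * 4, [1] * 8 + [0xFF] * 8, [1] * 8 + [2] * 8))
```

The second row of `src1` holds the bit pattern `0xFF`, read as -1, so the result is `[8, 16, -8, -16]`.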
arm_neon.intr.smull - smull roundscale op
Operands
a- Single, anonymous/composite constraint, vector of 8-bit signless integer or 16-bit signless integer or 32-bit signless integer values of length 8/4/2
b- Single, anonymous/composite constraint, vector of 8-bit signless integer or 16-bit signless integer or 32-bit signless integer values of length 8/4/2
Results
res- Single, anonymous/composite constraint, vector of 16-bit signless integer or 32-bit signless integer or 64-bit signless integer values of length 8/4/2
Description
Signed Multiply Long (vector). This instruction multiplies corresponding signed integer values in the lower or upper half of the vectors of the two source SIMD&FP registers, places the results in a vector, and writes the vector to the destination SIMD&FP register.
Source: https://developer.arm.com/architectures/instruction-sets/simd-isas/neon/intrinsics
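The widening behavior can be sketched in Python. The sketch assumes, as the listed lengths 8/4/2 suggest, that each operand is a 64-bit vector (8xi8, 4xi16, or 2xi32) and that each result lane is twice the input width. The product of two n-bit signed values always fits in 2n bits, so no truncation is needed. `smull` and `to_signed` are hypothetical helpers, not Beaver APIs.

```python
from typing import List

def to_signed(v: int, bits: int) -> int:
    """Interpret the low `bits` bits of v as a two's-complement value."""
    v &= (1 << bits) - 1
    return v - (1 << bits) if v >= 1 << (bits - 1) else v

def smull(a: List[int], b: List[int], bits: int) -> List[int]:
    """Sketch of SMULL: multiply lanes as signed `bits`-wide integers; each
    product is returned as a 2*bits-wide lane (it always fits)."""
    # assumed: 64-bit inputs, i.e. 8xi8, 4xi16 or 2xi32
    assert bits in (8, 16, 32) and len(a) == len(b) == 64 // bits
    return [to_signed(x, bits) * to_signed(y, bits) for x, y in zip(a, b)]

print(smull([0xFF, 2, 0x80, 127, 0, 0, 0, 0], [0xFF, 3, 0x80, 127, 0, 0, 0, 0], 8))
```

Here `0xFF` is read as -1 and `0x80` as -128, giving `[1, 6, 16384, 16129, 0, 0, 0, 0]`. Each value fits in an `i16` result lane.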
arm_neon.intr.ummla - Unsigned matrix-matrix multiply and accumulate op
Operands
acc- Single, anonymous/composite constraint, a vector with length 4 of 32-bit signless integer values
src1- Single, anonymous/composite constraint, a vector with length 16 of 8-bit signless integer values
src2- Single, anonymous/composite constraint, a vector with length 16 of 8-bit signless integer values
Results
res- Single, anonymous/composite constraint, a vector with length 4 of 32-bit signless integer values
Description
UMMLA: Unsigned integer matrix multiply-accumulate.
Unsigned 8-bit integer matrix multiply-accumulate. This instruction multiplies the 2x8 matrix of unsigned 8-bit integer values in the first source vector by the 8x2 matrix of unsigned 8-bit integer values in the second source vector. The resulting 2x2 32-bit integer matrix product is destructively added to the 32-bit integer matrix accumulator in the destination vector. This is equivalent to performing an 8-way dot product per destination element.
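The only difference from SMMLA is how the signless `i8` bit patterns are read. The Python sketch below treats every element as an unsigned value in 0..255. It assumes the same layout as Arm's pseudocode as we read it: row-major `acc` and `src1`, and column `j` of the 8x2 matrix at `src2[8*j : 8*j + 8]`. `ummla` is a hypothetical helper, and 32-bit wraparound is not modeled.

```python
from typing import List

def ummla(acc: List[int], src1: List[int], src2: List[int]) -> List[int]:
    """Sketch of UMMLA: acc (2x2, row-major) += src1 (2x8) @ src2 (8x2),
    reading every 8-bit element as unsigned (0..255).

    Assumed layout: row i of src1 is src1[8*i : 8*i + 8]; column j of the
    8x2 matrix is src2[8*j : 8*j + 8].
    """
    assert len(acc) == 4 and len(src1) == 16 and len(src2) == 16
    res = list(acc)
    for i in range(2):
        for j in range(2):
            # masking with 0xFF keeps the unsigned interpretation of each byte
            res[2 * i + j] += sum(
                (src1[8 * i + k] & 0xFF) * (src2[8 * j + k] & 0xFF) for k in range(8)
            )
    return res

print(ummla([0] * 4, [0xFF] * 16, [1] * 16))
```

Each destination element is 8 × 255 = 2040. The same bit pattern would contribute -8 under the signed SMMLA reading.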
arm_neon.intr.usmmla - Unsigned and signed matrix-matrix multiply and accumulate op
Operands
acc- Single, anonymous/composite constraint, a vector with length 4 of 32-bit signless integer values
src1- Single, anonymous/composite constraint, a vector with length 16 of 8-bit signless integer values
src2- Single, anonymous/composite constraint, a vector with length 16 of 8-bit signless integer values
Results
res- Single, anonymous/composite constraint, a vector with length 4 of 32-bit signless integer values
Description
USMMLA: Unsigned by signed integer matrix multiply-accumulate.
Unsigned and signed 8-bit integer matrix multiply-accumulate. This instruction multiplies the 2x8 matrix of unsigned 8-bit integer values in the first source vector by the 8x2 matrix of signed 8-bit integer values in the second source vector. The resulting 2x2 32-bit integer matrix product is destructively added to the 32-bit integer matrix accumulator in the destination vector. This is equivalent to performing an 8-way dot product per destination element.
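The mixed interpretation can be sketched in Python. Elements of `src1` are read as unsigned and elements of `src2` as signed, using the same assumed layout as in the SMMLA and UMMLA descriptions: row-major `acc` and `src1`, and column `j` of the 8x2 matrix at `src2[8*j : 8*j + 8]`. `usmmla` and `to_signed8` are hypothetical helpers, and 32-bit wraparound is not modeled.

```python
from typing import List

def to_signed8(v: int) -> int:
    """Interpret the low 8 bits of v as a two's-complement signed value."""
    v &= 0xFF
    return v - 256 if v >= 128 else v

def usmmla(acc: List[int], src1: List[int], src2: List[int]) -> List[int]:
    """Sketch of USMMLA: acc (2x2, row-major) += src1 (2x8) @ src2 (8x2),
    with src1 elements read as unsigned and src2 elements read as signed.

    Assumed layout: row i of src1 is src1[8*i : 8*i + 8]; column j of the
    8x2 matrix is src2[8*j : 8*j + 8].
    """
    assert len(acc) == 4 and len(src1) == 16 and len(src2) == 16
    res = list(acc)
    for i in range(2):
        for j in range(2):
            # unsigned byte from src1 times signed byte from src2
            res[2 * i + j] += sum(
                (src1[8 * i + k] & 0xFF) * to_signed8(src2[8 * j + k]) for k in range(8)
            )
    return res

print(usmmla([0] * 4, [0xFF] * 16, [0xFF] * 8 + [1] * 8))
```

The same byte `0xFF` counts as 255 in `src1` but as -1 in `src2`, so the output is `[-2040, 2040, -2040, 2040]`.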