Beaver.MLIR.Dialect.ArmNeon (beaver v0.4.7)

Summary

Functions

arm_neon.2d.sdot - sdot op

arm_neon.intr.bfmmla - BFloat16 matrix multiply-accumulate to single-precision

arm_neon.intr.sdot - sdot op

arm_neon.intr.smmla - Matrix-matrix multiply and accumulate op

arm_neon.intr.smull - smull op

arm_neon.intr.ummla - Unsigned matrix-matrix multiply and accumulate op

arm_neon.intr.usmmla - Unsigned and signed matrix-matrix multiply and accumulate op

Functions

2d_sdot(ssa)

arm_neon.2d.sdot - sdot op

Operands

  • a - Single, anonymous/composite constraint, vector of 32-bit signless integer values of length 4/2
  • b - Single, anonymous/composite constraint, vector of 8-bit signless integer values of length 16/8
  • c - Single, anonymous/composite constraint, vector of 8-bit signless integer values of length 16/8

Results

  • res - Single, anonymous/composite constraint, vector of 32-bit signless integer values of length 4/2

Description

The two input vectors b and c have a 2D shape, consisting of either 2 or 4 rows, each row having length 4. This operation computes the pair-wise dot-products of the rows of b and c and accumulates them with the corresponding entry of a:

res[i] := a[i] + dot_product(b[i, ...], c[i, ...])
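
The formula above can be sketched as a small reference model. This is plain Python for illustration only, not Beaver or MLIR code; the function name sdot_2d is hypothetical, and 32-bit wraparound of the accumulator is not modeled.

```python
def sdot_2d(a, b, c):
    """Reference semantics: res[i] = a[i] + dot_product(b[i], c[i])."""
    if not (len(a) == len(b) == len(c)):
        raise ValueError("a, b, and c must have the same number of rows")
    # One accumulator entry per row; each row of b and c has length 4.
    return [
        acc + sum(x * y for x, y in zip(row_b, row_c))
        for acc, row_b, row_c in zip(a, b, c)
    ]


a = [10, 20]
b = [[1, 2, 3, 4], [-1, -2, -3, -4]]
c = [[1, 1, 1, 1], [2, 2, 2, 2]]
result = sdot_2d(a, b, c)
print(result)  # [20, 0]
```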

intr_bfmmla(ssa)

arm_neon.intr.bfmmla - BFloat16 matrix multiply-accumulate to single-precision

Operands

  • acc - Single, anonymous/composite constraint, a vector with length 4 of 32-bit float values
  • src1 - Single, anonymous/composite constraint, a vector with length 8 of bfloat16 type values
  • src2 - Single, anonymous/composite constraint, a vector with length 8 of bfloat16 type values

Results

  • res - Single, anonymous/composite constraint, a vector with length 4 of 32-bit float values

Description

BFMMLA: BFloat16 matrix multiply-accumulate to single-precision.

The operation multiplies the 2x4 BFloat16 matrix in the first source vector with the 4x2 BFloat16 matrix in the second source vector, then accumulates this intermediate result with the 2x2 Float32 matrix in the accumulator vector, yielding the final 2x2 Float32 result.
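
As a rough illustration of the arithmetic described above, the following Python sketch models the 2x2 accumulate. It rests on assumptions not stated in this page: the accumulator and first source are taken to be row-major, the second source is taken to hold the 4x2 matrix one column after another (elements 4*j to 4*j+3 form column j), and ordinary Python floats stand in for bfloat16 and float32, so rounding behaviour is not modeled. The name bfmmla_ref is hypothetical.

```python
def bfmmla_ref(acc, src1, src2):
    """acc: 4 values (2x2, row-major); src1, src2: 8 values each."""
    if len(acc) != 4 or len(src1) != 8 or len(src2) != 8:
        raise ValueError("expected lengths 4, 8, and 8")
    res = list(acc)
    for i in range(2):          # row of src1
        for j in range(2):      # column of the second matrix (assumed layout)
            res[2 * i + j] += sum(
                src1[4 * i + k] * src2[4 * j + k] for k in range(4)
            )
    return res


acc = [0.5, 0.0, 0.0, 0.0]
src1 = [1, 0, 0, 0, 0, 1, 0, 0]
src2 = [1, 2, 3, 4, 5, 6, 7, 8]
result = bfmmla_ref(acc, src1, src2)
print(result)  # [1.5, 5.0, 2.0, 6.0]
```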

Source: https://developer.arm.com/architectures/instruction-sets/intrinsics/vbfmmlaq_f32

intr_sdot(ssa)

arm_neon.intr.sdot - sdot op

Operands

  • a - Single, anonymous/composite constraint, vector of 32-bit signless integer values of length 4/2
  • b - Single, anonymous/composite constraint, vector of 8-bit signless integer values of length 16/8
  • c - Single, anonymous/composite constraint, vector of 8-bit signless integer values of length 16/8

Results

  • res - Single, anonymous/composite constraint, vector of 32-bit signless integer values of length 4/2

Description

Signed integer addition of dot product (vector). This instruction performs the following operation on signed integer vectors: res = dot(b, c) + a, where vector operands are partitioned into groups of four elements.
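
A minimal Python sketch of this grouping, assuming that group i of the flat vectors b and c consists of elements 4*i to 4*i+3 and feeds accumulator lane i. The name sdot_ref is hypothetical, and 32-bit overflow is not modeled.

```python
def sdot_ref(a, b, c):
    """res[i] = a[i] + sum of b[4i+k] * c[4i+k] for k in 0..3."""
    if len(b) != 4 * len(a) or len(c) != len(b):
        raise ValueError("b and c must be four times as long as a")
    return [
        a[i] + sum(b[4 * i + k] * c[4 * i + k] for k in range(4))
        for i in range(len(a))
    ]


a = [0, 100]
b = [1, 2, 3, 4, 5, 6, 7, 8]
c = [1, 1, 1, 1, 1, 1, 1, 1]
result = sdot_ref(a, b, c)
print(result)  # [10, 126]
```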

Source: https://developer.arm.com/architectures/instruction-sets/simd-isas/neon/intrinsics

intr_smmla(ssa)

arm_neon.intr.smmla - Matrix-matrix multiply and accumulate op

Operands

  • acc - Single, anonymous/composite constraint, a vector with length 4 of 32-bit signless integer values
  • src1 - Single, anonymous/composite constraint, a vector with length 16 of 8-bit signless integer values
  • src2 - Single, anonymous/composite constraint, a vector with length 16 of 8-bit signless integer values

Results

  • res - Single, anonymous/composite constraint, a vector with length 4 of 32-bit signless integer values

Description

SMMLA: Signed integer matrix multiply-accumulate.

Signed 8-bit integer matrix multiply-accumulate. This instruction multiplies the 2x8 matrix of signed 8-bit integer values in the first source vector by the 8x2 matrix of signed 8-bit integer values in the second source vector. The resulting 2x2 32-bit integer matrix product is destructively added to the 32-bit integer matrix accumulator in the destination vector. This is equivalent to performing an 8-way dot product per destination element.
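
The "8-way dot product per destination element" can be sketched in Python as follows. The element layout is an assumption: row i of the first matrix is taken to be src1[8*i .. 8*i+7], column j of the second matrix is taken to be src2[8*j .. 8*j+7], and the accumulator is row-major. Inputs are ordinary Python ints already in the signed 8-bit range; the name smmla_ref is hypothetical.

```python
def smmla_ref(acc, src1, src2):
    """acc: 4 ints (2x2, row-major); src1, src2: 16 signed 8-bit ints each."""
    if len(acc) != 4 or len(src1) != 16 or len(src2) != 16:
        raise ValueError("expected lengths 4, 16, and 16")
    res = list(acc)
    for i in range(2):
        for j in range(2):
            # Each destination element receives one 8-way dot product.
            res[2 * i + j] += sum(
                src1[8 * i + k] * src2[8 * j + k] for k in range(8)
            )
    return res


acc = [1, 1, 1, 1]
src1 = [1] * 8 + [2] * 8
src2 = list(range(8)) + [-1] * 8
result = smmla_ref(acc, src1, src2)
print(result)  # [29, -7, 57, -15]
```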

Source: https://developer.arm.com/architectures/instruction-sets/intrinsics/#f:@navigationhierarchiessimdisa=[Neon]&q=smmla

intr_smull(ssa)

arm_neon.intr.smull - smull op

Operands

  • a - Single, anonymous/composite constraint, vector of 8-bit signless integer or 16-bit signless integer or 32-bit signless integer values of length 8/4/2
  • b - Single, anonymous/composite constraint, vector of 8-bit signless integer or 16-bit signless integer or 32-bit signless integer values of length 8/4/2

Results

  • res - Single, anonymous/composite constraint, vector of 16-bit signless integer or 32-bit signless integer or 64-bit signless integer values of length 8/4/2

Description

Signed Multiply Long (vector). This instruction multiplies corresponding signed integer values in the lower or upper half of the vectors of the two source SIMD&FP registers, places the results in a vector, and writes the vector to the destination SIMD&FP register.
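
Because the operand types are signless, the "signed" part lies in how the bits are interpreted. The Python sketch below illustrates a lane-wise widening multiply: each lane is reinterpreted as a signed integer of the given width and multiplied, and the product always fits in twice that width. The helper names to_signed and smull_ref are hypothetical, and selection of the lower or upper register half is not modeled.

```python
def to_signed(value, bits):
    """Reinterpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >= 1 << (bits - 1) else value


def smull_ref(a, b, bits=8):
    """Lane-wise signed multiply; each result fits in 2 * bits bits."""
    if len(a) != len(b):
        raise ValueError("a and b must have the same number of lanes")
    return [to_signed(x, bits) * to_signed(y, bits) for x, y in zip(a, b)]


a = [127, -128, -128, 0xFF, 1, 2, 3, 4]
b = [127, -128, 127, 2, 1, 2, 3, 4]
result = smull_ref(a, b)
print(result)  # [16129, 16384, -16256, -2, 1, 4, 9, 16]
```

Note that 0xFF is read as -1, and that the extreme product (-128) * (-128) = 16384 still fits in a signed 16-bit lane.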

Source: https://developer.arm.com/architectures/instruction-sets/simd-isas/neon/intrinsics

intr_ummla(ssa)

arm_neon.intr.ummla - Unsigned matrix-matrix multiply and accumulate op

Operands

  • acc - Single, anonymous/composite constraint, a vector with length 4 of 32-bit signless integer values
  • src1 - Single, anonymous/composite constraint, a vector with length 16 of 8-bit signless integer values
  • src2 - Single, anonymous/composite constraint, a vector with length 16 of 8-bit signless integer values

Results

  • res - Single, anonymous/composite constraint, a vector with length 4 of 32-bit signless integer values

Description

UMMLA: Unsigned integer matrix multiply-accumulate.

Unsigned 8-bit integer matrix multiply-accumulate. This instruction multiplies the 2x8 matrix of unsigned 8-bit integer values in the first source vector by the 8x2 matrix of unsigned 8-bit integer values in the second source vector. The resulting 2x2 32-bit integer matrix product is destructively added to the 32-bit integer matrix accumulator in the destination vector. This is equivalent to performing an 8-way dot product per destination element.

Source: https://developer.arm.com/architectures/instruction-sets/intrinsics/#f:@navigationhierarchiessimdisa=[Neon]&q=ummla

intr_usmmla(ssa)

arm_neon.intr.usmmla - Unsigned and signed matrix-matrix multiply and accumulate op

Operands

  • acc - Single, anonymous/composite constraint, a vector with length 4 of 32-bit signless integer values
  • src1 - Single, anonymous/composite constraint, a vector with length 16 of 8-bit signless integer values
  • src2 - Single, anonymous/composite constraint, a vector with length 16 of 8-bit signless integer values

Results

  • res - Single, anonymous/composite constraint, a vector with length 4 of 32-bit signless integer values

Description

USMMLA: Unsigned and signed integer matrix multiply-accumulate.

Unsigned and signed 8-bit integer matrix multiply-accumulate. This instruction multiplies the 2x8 matrix of unsigned 8-bit integer values in the first source vector by the 8x2 matrix of signed 8-bit integer values in the second source vector. The resulting 2x2 32-bit integer matrix product is destructively added to the 32-bit integer matrix accumulator in the destination vector. This is equivalent to performing an 8-way dot product per destination element.
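
Since both sources are signless 8-bit vectors, the same bit pattern means different things on each side: 0xFF is 255 in the first (unsigned) source and -1 in the second (signed) source. The Python sketch below illustrates this under an assumed layout (row i of the first matrix is src1[8*i .. 8*i+7], column j of the second is src2[8*j .. 8*j+7], row-major accumulator). All names are hypothetical, and 32-bit overflow is not modeled.

```python
def as_unsigned8(value):
    """Interpret the low 8 bits as an unsigned integer (0..255)."""
    return value & 0xFF


def as_signed8(value):
    """Interpret the low 8 bits as a two's-complement integer (-128..127)."""
    value &= 0xFF
    return value - 256 if value >= 128 else value


def usmmla_ref(acc, src1, src2):
    """acc: 4 ints (2x2); src1 read as unsigned, src2 read as signed."""
    if len(acc) != 4 or len(src1) != 16 or len(src2) != 16:
        raise ValueError("expected lengths 4, 16, and 16")
    res = list(acc)
    for i in range(2):
        for j in range(2):
            res[2 * i + j] += sum(
                as_unsigned8(src1[8 * i + k]) * as_signed8(src2[8 * j + k])
                for k in range(8)
            )
    return res


src1 = [0xFF] * 8 + [1] * 8
src2 = [0xFF] * 8 + [1] * 8
result = usmmla_ref([0, 0, 0, 0], src1, src2)
print(result)  # [-2040, 2040, -8, 8]
```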

Source: https://developer.arm.com/architectures/instruction-sets/intrinsics/#f:@navigationhierarchiessimdisa=[Neon]&q=usmmla