Beaver.MLIR.Dialect.ArmSVE (beaver v0.4.6)

Summary

Functions

convert_from_svbool(ssa)

arm_sve.convert_from_svbool - Convert a svbool type to a SVE predicate type

convert_to_svbool(ssa)

arm_sve.convert_to_svbool - Convert a SVE predicate type to a svbool type

dupq_lane(ssa)

arm_sve.dupq_lane - Broadcast indexed 128-bit segment to vector

intr_add(ssa)

arm_sve.intr.add

intr_bfmmla(ssa)

arm_sve.intr.bfmmla - BFloat16 matrix multiply-accumulate

intr_convert_from_svbool(ssa)

arm_sve.intr.convert.from.svbool

intr_convert_to_svbool(ssa)

arm_sve.intr.convert.to.svbool

intr_dupq_lane(ssa)

arm_sve.intr.dupq_lane

intr_fadd(ssa)

arm_sve.intr.fadd

intr_fdiv(ssa)

arm_sve.intr.fdiv

intr_fmul(ssa)

arm_sve.intr.fmul

intr_fsub(ssa)

arm_sve.intr.fsub

intr_mul(ssa)

arm_sve.intr.mul

intr_psel(ssa)

arm_sve.intr.psel

intr_sdiv(ssa)

arm_sve.intr.sdiv

intr_sdot(ssa)

arm_sve.intr.sdot

intr_smmla(ssa)

arm_sve.intr.smmla

intr_sub(ssa)

arm_sve.intr.sub

intr_udiv(ssa)

arm_sve.intr.udiv

intr_udot(ssa)

arm_sve.intr.udot

intr_ummla(ssa)

arm_sve.intr.ummla

intr_usmmla(ssa)

arm_sve.intr.usmmla

intr_whilelt(ssa)

arm_sve.intr.whilelt

intr_zip_x2(ssa)

arm_sve.intr.zip.x2

intr_zip_x4(ssa)

arm_sve.intr.zip.x4

masked_addf(ssa)

arm_sve.masked.addf - masked addition for scalable vectors of floats

masked_addi(ssa)

arm_sve.masked.addi - masked addition for scalable vectors of integers

masked_divf(ssa)

arm_sve.masked.divf - masked division for scalable vectors of floats

masked_divi_signed(ssa)

arm_sve.masked.divi_signed - masked signed division for scalable vectors of integers

masked_divi_unsigned(ssa)

arm_sve.masked.divi_unsigned - masked unsigned division for scalable vectors of integers

masked_mulf(ssa)

arm_sve.masked.mulf - masked multiplication for scalable vectors of floats

masked_muli(ssa)

arm_sve.masked.muli - masked multiplication for scalable vectors of integers

masked_subf(ssa)

arm_sve.masked.subf - masked subtraction for scalable vectors of floats

masked_subi(ssa)

arm_sve.masked.subi - masked subtraction for scalable vectors of integers

psel(ssa)

arm_sve.psel - Predicate select

sdot(ssa)

arm_sve.sdot - Vector-vector dot product and accumulate op

smmla(ssa)

arm_sve.smmla - Matrix-matrix multiply and accumulate op

udot(ssa)

arm_sve.udot - Vector-vector dot product and accumulate op

ummla(ssa)

arm_sve.ummla - Matrix-matrix multiply and accumulate op

usmmla(ssa)

arm_sve.usmmla - Matrix-matrix multiply and accumulate op

zip_x2(ssa)

arm_sve.zip.x2 - Multi-vector two-way zip op

zip_x4(ssa)

arm_sve.zip.x4 - Multi-vector four-way zip op

Functions

convert_from_svbool(ssa)

arm_sve.convert_from_svbool - Convert a svbool type to a SVE predicate type

Operands

source - Single, SVBoolMask, trailing scalable vector of 1-bit signless integer values with dim -1 having a size of {16}

Results

result - Single, SVEPredicateMask, trailing scalable vector of 1-bit signless integer values with dim -1 having a size of {16, 8, 4, 2, 1}

Description

Converts svbool types (vector<[16]xi1> or vectors of that type, e.g. vector<2x3x[16]xi1>) to SVE predicate types. Note: Only the trailing dimension can be scalable.

Example 1: Convert a 1-D svbool mask to a SVE predicate.

%source = vector.load %memref[%c0] : memref<?xi1>, vector<[16]xi1>
%result = arm_sve.convert_from_svbool %source : vector<[4]xi1>

Example 2: Convert a 2-D svbool mask to a mask of SVE predicates.

%source = vector.load %memref[%c0, %c0] : memref<2x?xi1>, vector<2x[16]xi1>
%result = arm_sve.convert_from_svbool %source : vector<2x[8]xi1>

A svbool is the smallest SVE predicate type that has an in-memory representation (and maps to a full predicate register). In MLIR, svbool is represented as vector<[16]xi1>. Smaller SVE predicate types (vector<[1|2|4|8]xi1>) must be stored as a svbool and then converted back to the original predicate type after loading.
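The narrowing step can be modelled on plain lists of bits. This is only a sketch: from_svbool is a hypothetical Python helper, not part of Beaver or MLIR, and the bit layout it uses (one significant bit every 16/N positions within each 128-bit granule, as in an SVE predicate register) is an assumption that this page does not spell out.

```python
def from_svbool(svbool, lanes_per_granule):
    """Model arm_sve.convert_from_svbool on a flat list of 0/1 values.

    svbool: 16 bits per 128-bit granule, i.e. the contents of vector<[16]xi1>.
    lanes_per_granule: N in the result type vector<[N]xi1>.
    """
    if lanes_per_granule not in (16, 8, 4, 2, 1):
        raise ValueError("predicate size must be one of 16, 8, 4, 2, 1")
    if len(svbool) % 16 != 0:
        raise ValueError("a svbool holds a multiple of 16 bits")
    # Assumed layout: only the first bit of every group of `stride` bits matters.
    stride = 16 // lanes_per_granule
    return svbool[::stride]


loaded = [1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0]
print(from_svbool(loaded, 4))  # the four bits a vector<[4]xi1> would see
```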

convert_to_svbool(ssa)

arm_sve.convert_to_svbool - Convert a SVE predicate type to a svbool type

Operands

source - Single, SVEPredicateMask, trailing scalable vector of 1-bit signless integer values with dim -1 having a size of {16, 8, 4, 2, 1}

Results

result - Single, SVBoolMask, trailing scalable vector of 1-bit signless integer values with dim -1 having a size of {16}

Description

Converts SVE predicate types (or vectors of predicate types, e.g. vector<4x[4]xi1>) to svbool types. Note: Only the trailing dimension can be scalable.

Example 1: Convert a 1-D SVE predicate to a svbool mask.

%source = vector.create_mask %dim_size : vector<[4]xi1>
%result = arm_sve.convert_to_svbool %source : vector<[4]xi1>
// => Results in vector<[16]xi1>

Example 2: Convert a 2-D mask of SVE predicates to a svbool mask.

%source = vector.create_mask %c2, %dim_size : vector<2x[2]xi1>
%result = arm_sve.convert_to_svbool %source : vector<2x[2]xi1>
// => Results in vector<2x[16]xi1>
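The widening direction can be sketched the same way. The helper name to_svbool is hypothetical, and the choice to zero every bit that does not correspond to a predicate lane is an assumption based on how SVE predicate registers are laid out, not something stated on this page.

```python
def to_svbool(pred, lanes_per_granule):
    """Model arm_sve.convert_to_svbool on a flat list of 0/1 values.

    pred: contents of a vector<[N]xi1>, N = lanes_per_granule per 128-bit granule.
    Returns the contents of the corresponding vector<[16]xi1>.
    """
    if lanes_per_granule not in (16, 8, 4, 2, 1):
        raise ValueError("predicate size must be one of 16, 8, 4, 2, 1")
    if len(pred) % lanes_per_granule != 0:
        raise ValueError("pred must hold a whole number of granules")
    stride = 16 // lanes_per_granule
    out = [0] * (len(pred) * stride)  # assumed: unused bits are zero
    out[::stride] = pred              # each lane lands on the first bit of its group
    return out


print(to_svbool([1, 0, 1, 1], 4))  # 16 bits, set at positions 0, 8 and 12
```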

dupq_lane(ssa)

arm_sve.dupq_lane - Broadcast indexed 128-bit segment to vector

Attributes

lane - Single, I64Attr, 64-bit signless integer attribute

Operands

src - Single, SVEVector, a vector type that matches the size of a SVE vector

Results

dst - Single, SVEVector, a vector type that matches the size of a SVE vector

Description

This operation fills each 128-bit segment of a vector with the elements from the indexed 128-bit segment of the source vector. If the VL is 128 bits the operation is a NOP. If the index exceeds the number of 128-bit segments in a vector the result is an all-zeroes vector.

Example:

// VL == 256
// %X = [A B C D x x x x]
%Y = arm_sve.dupq_lane %X[0] : vector<[4]xi32>
// %Y = [A B C D A B C D]

// %U = [x x x x x x x x A B C D E F G H]
%V = arm_sve.dupq_lane %U[1] : vector<[8]xf16>
// %V = [A B C D E F G H A B C D E F G H]

Note: The semantics of the operation match those of the svdupq_lane intrinsics.
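The broadcast rule described above can be checked with a small list-based model. This is a sketch, not Beaver or MLIR code: dupq_lane here is a hypothetical Python function, and elems_per_segment (how many elements fit in 128 bits, for example 4 for i32 or 8 for f16) is a parameter introduced for illustration.

```python
def dupq_lane(src, lane, elems_per_segment):
    """Fill every 128-bit segment of the result with segment `lane` of `src`."""
    if elems_per_segment <= 0 or len(src) % elems_per_segment != 0:
        raise ValueError("src must hold a whole number of 128-bit segments")
    if lane < 0:
        raise ValueError("lane must be non-negative")
    segments = [src[i:i + elems_per_segment]
                for i in range(0, len(src), elems_per_segment)]
    if lane >= len(segments):
        # Index past the last segment: all-zeroes result.
        return [0] * len(src)
    return segments[lane] * len(segments)


# VL == 256 with 32-bit elements: two segments of four elements each.
print(dupq_lane(list("ABCDxxxx"), 0, 4))
```

With a single segment (VL == 128) and lane 0 the function returns its input unchanged, which matches the NOP case in the description.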

intr_add(ssa)

arm_sve.intr.add

Operands

anonymous - Single, AnyScalableVectorOfAnyRank, scalable vector of any type values
anonymous - Single, AnyScalableVectorOfAnyRank, scalable vector of any type values
anonymous - Single, AnyScalableVectorOfAnyRank, scalable vector of any type values

Results

res - Single, LLVM_Type, LLVM dialect-compatible type

intr_bfmmla(ssa)

arm_sve.intr.bfmmla - BFloat16 matrix multiply-accumulate

Operands

acc - Single, anonymous/composite constraint, scalable vector of 32-bit float values of length 4
src1 - Single, anonymous/composite constraint, scalable vector of bfloat16 type values of length 8
src2 - Single, anonymous/composite constraint, scalable vector of bfloat16 type values of length 8

Results

res - Single, anonymous/composite constraint, scalable vector of 32-bit float values of length 4

Description

BFMMLA: BFloat16 matrix multiply-accumulate into 2×2 matrices.

This operation multiplies the 2x4 BFloat16 matrix held in each 128-bit segment of the first source vector by the 4x2 BFloat16 matrix in the corresponding segment of the second source vector, then accumulates this intermediate result with the 2x2 Float32 matrix in the corresponding segment of the accumulator vector, yielding the final 2x2 Float32 segment of the result.

Source: https://developer.arm.com/documentation/100987/0000

intr_convert_from_svbool(ssa)

arm_sve.intr.convert.from.svbool

Operands

svbool - Single, SVBool, vector<[16]xi1>

Results

res - Single, LLVM_Type, LLVM dialect-compatible type

intr_convert_to_svbool(ssa)

arm_sve.intr.convert.to.svbool

Operands

mask - Single, SVEPredicate, a vector type that matches the size of a SVE predicate

Results

res - Single, LLVM_Type, LLVM dialect-compatible type

intr_dupq_lane(ssa)

arm_sve.intr.dupq_lane

Attributes

lane - Single, I64Attr, 64-bit signless integer attribute

Operands

v - Single, anonymous/composite constraint, scalable vector of rank 1

Results

res - Single, LLVM_Type, LLVM dialect-compatible type

intr_fadd(ssa)

arm_sve.intr.fadd

Operands

anonymous - Single, AnyScalableVectorOfAnyRank, scalable vector of any type values
anonymous - Single, AnyScalableVectorOfAnyRank, scalable vector of any type values
anonymous - Single, AnyScalableVectorOfAnyRank, scalable vector of any type values

Results

res - Single, LLVM_Type, LLVM dialect-compatible type

intr_fdiv(ssa)

arm_sve.intr.fdiv

Operands

anonymous - Single, AnyScalableVectorOfAnyRank, scalable vector of any type values
anonymous - Single, AnyScalableVectorOfAnyRank, scalable vector of any type values
anonymous - Single, AnyScalableVectorOfAnyRank, scalable vector of any type values

Results

res - Single, LLVM_Type, LLVM dialect-compatible type

intr_fmul(ssa)

arm_sve.intr.fmul

Operands

anonymous - Single, AnyScalableVectorOfAnyRank, scalable vector of any type values
anonymous - Single, AnyScalableVectorOfAnyRank, scalable vector of any type values
anonymous - Single, AnyScalableVectorOfAnyRank, scalable vector of any type values

Results

res - Single, LLVM_Type, LLVM dialect-compatible type

intr_fsub(ssa)

arm_sve.intr.fsub

Operands

anonymous - Single, AnyScalableVectorOfAnyRank, scalable vector of any type values
anonymous - Single, AnyScalableVectorOfAnyRank, scalable vector of any type values
anonymous - Single, AnyScalableVectorOfAnyRank, scalable vector of any type values

Results

res - Single, LLVM_Type, LLVM dialect-compatible type

intr_mul(ssa)

arm_sve.intr.mul

Operands

anonymous - Single, AnyScalableVectorOfAnyRank, scalable vector of any type values
anonymous - Single, AnyScalableVectorOfAnyRank, scalable vector of any type values
anonymous - Single, AnyScalableVectorOfAnyRank, scalable vector of any type values

Results

res - Single, LLVM_Type, LLVM dialect-compatible type

intr_psel(ssa)

arm_sve.intr.psel

Operands

p1 - Single, SVBool, vector<[16]xi1>
p2 - Single, SVEPredicate, a vector type that matches the size of a SVE predicate
index - Single, I32, 32-bit signless integer

Results

res - Single, LLVM_Type, LLVM dialect-compatible type

intr_sdiv(ssa)

arm_sve.intr.sdiv

Operands

anonymous - Single, AnyScalableVectorOfAnyRank, scalable vector of any type values
anonymous - Single, AnyScalableVectorOfAnyRank, scalable vector of any type values
anonymous - Single, AnyScalableVectorOfAnyRank, scalable vector of any type values

Results

res - Single, LLVM_Type, LLVM dialect-compatible type

intr_sdot(ssa)

arm_sve.intr.sdot

Operands

anonymous - Single, AnyScalableVectorOfAnyRank, scalable vector of any type values
anonymous - Single, AnyScalableVectorOfAnyRank, scalable vector of any type values
anonymous - Single, AnyScalableVectorOfAnyRank, scalable vector of any type values

Results

res - Single, LLVM_Type, LLVM dialect-compatible type

intr_smmla(ssa)

arm_sve.intr.smmla

Operands

anonymous - Single, AnyScalableVectorOfAnyRank, scalable vector of any type values
anonymous - Single, AnyScalableVectorOfAnyRank, scalable vector of any type values
anonymous - Single, AnyScalableVectorOfAnyRank, scalable vector of any type values

Results

res - Single, LLVM_Type, LLVM dialect-compatible type

intr_sub(ssa)

arm_sve.intr.sub

Operands

anonymous - Single, AnyScalableVectorOfAnyRank, scalable vector of any type values
anonymous - Single, AnyScalableVectorOfAnyRank, scalable vector of any type values
anonymous - Single, AnyScalableVectorOfAnyRank, scalable vector of any type values

Results

res - Single, LLVM_Type, LLVM dialect-compatible type

intr_udiv(ssa)

arm_sve.intr.udiv

Operands

anonymous - Single, AnyScalableVectorOfAnyRank, scalable vector of any type values
anonymous - Single, AnyScalableVectorOfAnyRank, scalable vector of any type values
anonymous - Single, AnyScalableVectorOfAnyRank, scalable vector of any type values

Results

res - Single, LLVM_Type, LLVM dialect-compatible type

intr_udot(ssa)

arm_sve.intr.udot

Operands

anonymous - Single, AnyScalableVectorOfAnyRank, scalable vector of any type values
anonymous - Single, AnyScalableVectorOfAnyRank, scalable vector of any type values
anonymous - Single, AnyScalableVectorOfAnyRank, scalable vector of any type values

Results

res - Single, LLVM_Type, LLVM dialect-compatible type

intr_ummla(ssa)

arm_sve.intr.ummla

Operands

anonymous - Single, AnyScalableVectorOfAnyRank, scalable vector of any type values
anonymous - Single, AnyScalableVectorOfAnyRank, scalable vector of any type values
anonymous - Single, AnyScalableVectorOfAnyRank, scalable vector of any type values

Results

res - Single, LLVM_Type, LLVM dialect-compatible type

intr_usmmla(ssa)

arm_sve.intr.usmmla

Operands

anonymous - Single, AnyScalableVectorOfAnyRank, scalable vector of any type values
anonymous - Single, AnyScalableVectorOfAnyRank, scalable vector of any type values
anonymous - Single, AnyScalableVectorOfAnyRank, scalable vector of any type values

Results

res - Single, LLVM_Type, LLVM dialect-compatible type

intr_whilelt(ssa)

arm_sve.intr.whilelt

Operands

base - Single, I64, 64-bit signless integer
n - Single, I64, 64-bit signless integer

Results

res - Single, LLVM_Type, LLVM dialect-compatible type

intr_zip_x2(ssa)

arm_sve.intr.zip.x2

Operands

v1 - Single, AnyScalableVectorOfAnyRank, scalable vector of any type values
v2 - Single, AnyScalableVectorOfAnyRank, scalable vector of any type values

Results

res - Single, LLVM_Type, LLVM dialect-compatible type

intr_zip_x4(ssa)

arm_sve.intr.zip.x4

Operands

v1 - Single, AnyScalableVectorOfAnyRank, scalable vector of any type values
v2 - Single, AnyScalableVectorOfAnyRank, scalable vector of any type values
v3 - Single, AnyScalableVectorOfAnyRank, scalable vector of any type values
v4 - Single, AnyScalableVectorOfAnyRank, scalable vector of any type values

Results

res - Single, LLVM_Type, LLVM dialect-compatible type

masked_addf(ssa)

arm_sve.masked.addf - masked addition for scalable vectors of floats

Operands

mask - Single, anonymous/composite constraint, scalable vector of 1-bit signless integer values
src1 - Single, anonymous/composite constraint, scalable vector of floating-point values
src2 - Single, anonymous/composite constraint, scalable vector of floating-point values

Results

res - Single, anonymous/composite constraint, scalable vector of floating-point values

Description

The arm_sve.masked.addf operation takes one scalable vector mask and two scalable vector operands, and performs floating-point addition on the active lanes. Inactive lanes keep the value of the first operand.
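The active/inactive lane rule is shared by all arm_sve.masked.* operations, so one list-based model covers them. The sketch below is illustrative only: masked_binary is a hypothetical Python helper, and it works on ordinary Python numbers rather than on scalable vectors.

```python
import operator


def masked_binary(op, mask, src1, src2):
    """Apply `op` lane by lane where the mask is set; elsewhere keep src1."""
    if not (len(mask) == len(src1) == len(src2)):
        raise ValueError("mask and operands must have the same number of lanes")
    return [op(a, b) if m else a for m, a, b in zip(mask, src1, src2)]


mask = [1, 0, 1, 0]
src1 = [1.0, 2.0, 3.0, 4.0]
src2 = [10.0, 20.0, 30.0, 40.0]
# Lanes 0 and 2 are active and get a + b; lanes 1 and 3 keep src1.
print(masked_binary(operator.add, mask, src1, src2))
```

Passing operator.sub or operator.mul instead of operator.add gives the corresponding model for the subtraction and multiplication variants.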

masked_addi(ssa)

arm_sve.masked.addi - masked addition for scalable vectors of integers

Operands

mask - Single, anonymous/composite constraint, scalable vector of 1-bit signless integer values
src1 - Single, anonymous/composite constraint, scalable vector of 8-bit signless integer or 16-bit signless integer or 32-bit signless integer or 64-bit signless integer values
src2 - Single, anonymous/composite constraint, scalable vector of 8-bit signless integer or 16-bit signless integer or 32-bit signless integer or 64-bit signless integer values

Results

res - Single, anonymous/composite constraint, scalable vector of 8-bit signless integer or 16-bit signless integer or 32-bit signless integer or 64-bit signless integer values

Description

The arm_sve.masked.addi operation takes one scalable vector mask and two scalable vector operands, and performs integer addition on the active lanes. Inactive lanes keep the value of the first operand.

masked_divf(ssa)

arm_sve.masked.divf - masked division for scalable vectors of floats

Operands

mask - Single, anonymous/composite constraint, scalable vector of 1-bit signless integer values
src1 - Single, anonymous/composite constraint, scalable vector of floating-point values
src2 - Single, anonymous/composite constraint, scalable vector of floating-point values

Results

res - Single, anonymous/composite constraint, scalable vector of floating-point values

Description

The arm_sve.masked.divf operation takes one scalable vector mask and two scalable vector operands, and performs floating-point division on the active lanes. Inactive lanes keep the value of the first operand.

masked_divi_signed(ssa)

arm_sve.masked.divi_signed - masked signed division for scalable vectors of integers

Operands

mask - Single, anonymous/composite constraint, scalable vector of 1-bit signless integer values
src1 - Single, anonymous/composite constraint, scalable vector of 8-bit signless integer or 16-bit signless integer or 32-bit signless integer or 64-bit signless integer values
src2 - Single, anonymous/composite constraint, scalable vector of 8-bit signless integer or 16-bit signless integer or 32-bit signless integer or 64-bit signless integer values

Results

res - Single, anonymous/composite constraint, scalable vector of 8-bit signless integer or 16-bit signless integer or 32-bit signless integer or 64-bit signless integer values

Description

The arm_sve.masked.divi_signed operation takes one scalable vector mask and two scalable vector operands, and performs signed integer division on the active lanes. Inactive lanes keep the value of the first operand.

masked_divi_unsigned(ssa)

arm_sve.masked.divi_unsigned - masked unsigned division for scalable vectors of integers

Operands

mask - Single, anonymous/composite constraint, scalable vector of 1-bit signless integer values
src1 - Single, anonymous/composite constraint, scalable vector of 8-bit signless integer or 16-bit signless integer or 32-bit signless integer or 64-bit signless integer values
src2 - Single, anonymous/composite constraint, scalable vector of 8-bit signless integer or 16-bit signless integer or 32-bit signless integer or 64-bit signless integer values

Results

res - Single, anonymous/composite constraint, scalable vector of 8-bit signless integer or 16-bit signless integer or 32-bit signless integer or 64-bit signless integer values

Description

The arm_sve.masked.divi_unsigned operation takes one scalable vector mask and two scalable vector operands, and performs unsigned integer division on the active lanes. Inactive lanes keep the value of the first operand.

masked_mulf(ssa)

arm_sve.masked.mulf - masked multiplication for scalable vectors of floats

Operands

mask - Single, anonymous/composite constraint, scalable vector of 1-bit signless integer values
src1 - Single, anonymous/composite constraint, scalable vector of floating-point values
src2 - Single, anonymous/composite constraint, scalable vector of floating-point values

Results

res - Single, anonymous/composite constraint, scalable vector of floating-point values

Description

The arm_sve.masked.mulf operation takes one scalable vector mask and two scalable vector operands, and performs floating-point multiplication on the active lanes. Inactive lanes keep the value of the first operand.

masked_muli(ssa)

arm_sve.masked.muli - masked multiplication for scalable vectors of integers

Operands

mask - Single, anonymous/composite constraint, scalable vector of 1-bit signless integer values
src1 - Single, anonymous/composite constraint, scalable vector of 8-bit signless integer or 16-bit signless integer or 32-bit signless integer or 64-bit signless integer values
src2 - Single, anonymous/composite constraint, scalable vector of 8-bit signless integer or 16-bit signless integer or 32-bit signless integer or 64-bit signless integer values

Results

res - Single, anonymous/composite constraint, scalable vector of 8-bit signless integer or 16-bit signless integer or 32-bit signless integer or 64-bit signless integer values

Description

The arm_sve.masked.muli operation takes one scalable vector mask and two scalable vector operands, and performs integer multiplication on the active lanes. Inactive lanes keep the value of the first operand.

masked_subf(ssa)

arm_sve.masked.subf - masked subtraction for scalable vectors of floats

Operands

mask - Single, anonymous/composite constraint, scalable vector of 1-bit signless integer values
src1 - Single, anonymous/composite constraint, scalable vector of floating-point values
src2 - Single, anonymous/composite constraint, scalable vector of floating-point values

Results

res - Single, anonymous/composite constraint, scalable vector of floating-point values

Description

The arm_sve.masked.subf operation takes one scalable vector mask and two scalable vector operands, and performs floating-point subtraction on the active lanes. Inactive lanes keep the value of the first operand.

masked_subi(ssa)

arm_sve.masked.subi - masked subtraction for scalable vectors of integers

Operands

mask - Single, anonymous/composite constraint, scalable vector of 1-bit signless integer values
src1 - Single, anonymous/composite constraint, scalable vector of 8-bit signless integer or 16-bit signless integer or 32-bit signless integer or 64-bit signless integer values
src2 - Single, anonymous/composite constraint, scalable vector of 8-bit signless integer or 16-bit signless integer or 32-bit signless integer or 64-bit signless integer values

Results

res - Single, anonymous/composite constraint, scalable vector of 8-bit signless integer or 16-bit signless integer or 32-bit signless integer or 64-bit signless integer values

Description

The arm_sve.masked.subi operation takes one scalable vector mask and two scalable vector operands, and performs integer subtraction on the active lanes. Inactive lanes keep the value of the first operand.

psel(ssa)

arm_sve.psel - Predicate select

Operands

p1 - Single, SVEPredicate, a vector type that matches the size of a SVE predicate
p2 - Single, SVEPredicate, a vector type that matches the size of a SVE predicate
index - Single, Index, index

Results

result - Single, SVEPredicate, a vector type that matches the size of a SVE predicate

Description

This operation returns the input predicate p1 or an all-false predicate based on the bit at p2[index]. Informally, the semantics are:

if p2[index % num_elements(p2)] == 1:
  return p1 : type(p1)
return all-false : type(p1)

Example:

// Note: p1 and p2 can have different sizes.
%pd = arm_sve.psel %p1, %p2[%index] : vector<[4]xi1>, vector<[8]xi1>

Note: This requires SME or SVE2.1 (+sme or +sve2p1 in LLVM target features).
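The informal semantics above translate directly into a runnable model. This is a sketch on plain Python lists of 0/1 values; psel here is a hypothetical helper, not the Beaver function of the same name.

```python
def psel(p1, p2, index):
    """Return p1 if bit p2[index % len(p2)] is set, otherwise all-false."""
    if p2[index % len(p2)] == 1:
        return list(p1)
    # All-false predicate with the same number of lanes as p1.
    return [0] * len(p1)


p1 = [1, 0, 1, 1]              # models a vector<[4]xi1>
p2 = [0, 1, 0, 0, 0, 0, 0, 0]  # models a vector<[8]xi1>
print(psel(p1, p2, 1))  # selects p1
print(psel(p1, p2, 2))  # all-false
print(psel(p1, p2, 9))  # the index wraps around: 9 % 8 == 1, selects p1
```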

sdot(ssa)

arm_sve.sdot - Vector-vector dot product and accumulate op

Operands

acc - Single, anonymous/composite constraint, scalable vector of 32-bit signless integer or 64-bit signless integer values of length 4/2
src1 - Single, anonymous/composite constraint, scalable vector of 8-bit signless integer or 16-bit signless integer values of length 16/8
src2 - Single, anonymous/composite constraint, scalable vector of 8-bit signless integer or 16-bit signless integer values of length 16/8

Results

dst - Single, anonymous/composite constraint, scalable vector of 32-bit signless integer or 64-bit signless integer values of length 4/2

Description

SDOT: Signed integer addition of dot product.

This function maps to the SDOT instruction, and it takes signless integer operands that the operation interprets as signed. It partitions the second and third vector inputs into groups of four elements, calculates the dot product of each group (without loss of precision), and then adds each result to the overlapping element of the first vector input.

Source: https://developer.arm.com/documentation/100987/0000
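The grouping described above can be sketched on plain lists. The function below is a hypothetical Python model, not part of Beaver: it takes Python integers that are already signed values, and it does not model wrap-around of the accumulator, which is a simplifying assumption.

```python
def sdot(acc, src1, src2, group=4):
    """Add the dot product of each group of four src lanes to one acc lane."""
    if len(src1) != len(src2) or len(src1) != group * len(acc):
        raise ValueError("src1 and src2 need four lanes per accumulator lane")
    result = []
    for i, a in enumerate(acc):
        lo, hi = i * group, (i + 1) * group
        dot = sum(x * y for x, y in zip(src1[lo:hi], src2[lo:hi]))
        result.append(a + dot)
    return result


acc = [10, 20]
src1 = [1, 2, 3, 4, -1, -2, -3, -4]
src2 = [1, 1, 1, 1, 2, 2, 2, 2]
# Lane 0: 10 + (1 + 2 + 3 + 4); lane 1: 20 + (-2 - 4 - 6 - 8).
print(sdot(acc, src1, src2))
```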

smmla(ssa)

arm_sve.smmla - Matrix-matrix multiply and accumulate op

Operands

acc - Single, anonymous/composite constraint, scalable vector of 32-bit signless integer values of length 4
src1 - Single, anonymous/composite constraint, scalable vector of 8-bit signless integer values of length 16
src2 - Single, anonymous/composite constraint, scalable vector of 8-bit signless integer values of length 16

Results

dst - Single, anonymous/composite constraint, scalable vector of 32-bit signless integer values of length 4

Description

SMMLA: Signed integer matrix multiply-accumulate.

This function maps to the SMMLA instruction, and it takes signless integer operands that the operation interprets as signed. It partitions the inputs into 128-bit quadwords, with the first input containing a row-by-row 2×2 matrix of 32-bit integers, the second input containing a row-by-row 2×8 matrix of 8-bit integers, and the third input containing a column-by-column 8×2 matrix of 8-bit integers. For each quadword, it multiplies the second input matrix by the third input matrix using natural arithmetic and then adds the result to the first input using modular arithmetic.

Source: https://developer.arm.com/documentation/100987/0000
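The per-quadword layout is easier to follow in code. The following is a sketch under the layout stated in the description (accumulator and first source row by row, second source column by column); smmla and wrap_i32 are hypothetical Python helpers, and inputs are Python integers already interpreted as signed.

```python
def wrap_i32(x):
    """Reduce x modulo 2**32 and reinterpret it as a signed 32-bit value."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x


def smmla(acc, src1, src2):
    """Per 128-bit quadword: acc (2x2) + src1 (2x8) * src2 (8x2)."""
    if len(acc) % 4 != 0 or len(src1) != 4 * len(acc) or len(src2) != 4 * len(acc):
        raise ValueError("need 4 acc lanes and 16 src lanes per quadword")
    out = []
    for seg in range(len(acc) // 4):
        c = acc[4 * seg:4 * seg + 4]       # 2x2, row by row
        a = src1[16 * seg:16 * seg + 16]   # 2x8, row by row
        b = src2[16 * seg:16 * seg + 16]   # 8x2, column by column
        for i in range(2):
            for j in range(2):
                # Row i of a times column j of b, computed exactly.
                dot = sum(a[8 * i + k] * b[8 * j + k] for k in range(8))
                out.append(wrap_i32(c[2 * i + j] + dot))  # modular accumulate
    return out


src1 = [1] * 8 + [2] * 8                        # rows [1 ... 1] and [2 ... 2]
src2 = [1, 2, 3, 4, 5, 6, 7, 8] + [-1] * 8      # columns [1 ... 8] and [-1 ... -1]
print(smmla([0, 0, 0, 0], src1, src2))
```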

udot(ssa)

arm_sve.udot - Vector-vector dot product and accumulate op

Operands

acc - Single, anonymous/composite constraint, scalable vector of 32-bit signless integer or 64-bit signless integer values of length 4/2
src1 - Single, anonymous/composite constraint, scalable vector of 8-bit signless integer or 16-bit signless integer values of length 16/8
src2 - Single, anonymous/composite constraint, scalable vector of 8-bit signless integer or 16-bit signless integer values of length 16/8

Results

dst - Single, anonymous/composite constraint, scalable vector of 32-bit signless integer or 64-bit signless integer values of length 4/2

Description

UDOT: Unsigned integer addition of dot product.

This function maps to the UDOT instruction, and it takes signless integer operands that the operation interprets as unsigned. It partitions the second and third vector inputs into groups of four elements, calculates the dot product of each group (without loss of precision), and then adds each result to the overlapping element of the first vector input.

Source: https://developer.arm.com/documentation/100987/0000
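Because the operands are signless, the same bit pattern gives a different result depending on how the operation reads it. The sketch below illustrates the unsigned reading for 8-bit sources; as_unsigned and udot are hypothetical Python helpers, and accumulator wrap-around is not modelled.

```python
def as_unsigned(value, width=8):
    """Reinterpret the low `width` bits of value as an unsigned integer."""
    return value & ((1 << width) - 1)


def udot(acc, src1, src2, group=4):
    """Unsigned dot product of each group of four lanes, added to one acc lane."""
    if len(src1) != len(src2) or len(src1) != group * len(acc):
        raise ValueError("src1 and src2 need four lanes per accumulator lane")
    result = []
    for i, a in enumerate(acc):
        lo, hi = i * group, (i + 1) * group
        dot = sum(as_unsigned(x) * as_unsigned(y)
                  for x, y in zip(src1[lo:hi], src2[lo:hi]))
        result.append(a + dot)
    return result


# The bit pattern 0xFF is -1 when read as signed but 255 when read as unsigned.
print(udot([5], [-1, 2, 0, 0], [1, 3, 0, 0]))
```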

ummla(ssa)

arm_sve.ummla - Matrix-matrix multiply and accumulate op

Operands

acc - Single, anonymous/composite constraint, scalable vector of 32-bit signless integer values of length 4
src1 - Single, anonymous/composite constraint, scalable vector of 8-bit signless integer values of length 16
src2 - Single, anonymous/composite constraint, scalable vector of 8-bit signless integer values of length 16

Results

dst - Single, anonymous/composite constraint, scalable vector of 32-bit signless integer values of length 4

Description

UMMLA: Unsigned integer matrix multiply-accumulate.

This function maps to the UMMLA instruction, and it takes signless integer operands that the operation interprets as unsigned. It partitions the inputs into 128-bit quadwords, with the first input containing a row-by-row 2×2 matrix of 32-bit integers, the second input containing a row-by-row 2×8 matrix of 8-bit integers, and the third input containing a column-by-column 8×2 matrix of 8-bit integers. For each quadword, it multiplies the second input matrix by the third input matrix using natural arithmetic and then adds the result to the first input using modular arithmetic.

Source: https://developer.arm.com/documentation/100987/0000

usmmla(ssa)

arm_sve.usmmla - Matrix-matrix multiply and accumulate op

Operands

acc - Single, anonymous/composite constraint, scalable vector of 32-bit signless integer values of length 4
src1 - Single, anonymous/composite constraint, scalable vector of 8-bit signless integer values of length 16
src2 - Single, anonymous/composite constraint, scalable vector of 8-bit signless integer values of length 16

Results

dst - Single, anonymous/composite constraint, scalable vector of 32-bit signless integer values of length 4

Description

USMMLA: Unsigned by signed integer matrix multiply-accumulate.

The unsigned by signed integer matrix multiply-accumulate operation multiplies the 2×8 matrix of unsigned 8-bit integer values held in the first source vector by the 8×2 matrix of signed 8-bit integer values in the second source vector. The resulting 2×2 widened 32-bit integer matrix product is then added to the 32-bit integer matrix accumulator.

Source: https://developer.arm.com/documentation/100987/0000

zip_x2(ssa)

arm_sve.zip.x2 - Multi-vector two-way zip op

Operands

sourceV1 - Single, ZipInputVectorType, an SVE vector with element size <= 64-bit
sourceV2 - Single, ZipInputVectorType, an SVE vector with element size <= 64-bit

Results

resultV1 - Single, ZipInputVectorType, an SVE vector with element size <= 64-bit
resultV2 - Single, ZipInputVectorType, an SVE vector with element size <= 64-bit

Description

This operation interleaves elements from two input SVE vectors, returning two new SVE vectors (resultV1 and resultV2), which contain the low and high halves of the result respectively.

Example:

// sourceV1 = [ A1, A2, A3, ... An ]
// sourceV2 = [ B1, B2, B3, ... Bn ]
// (resultV1, resultV2) = [ A1, B1, A2, B2, A3, B3, ... An, Bn ]
%resultV1, %resultV2 = arm_sve.zip.x2 %sourceV1, %sourceV2 : vector<[16]xi8>

Note: This requires SME 2 (+sme2 in LLVM target features)
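The interleave-then-split behaviour can be checked with a short list-based model. This is a sketch only: zip_x2 below is a hypothetical Python function that mirrors the description, not the Beaver function of the same name.

```python
def zip_x2(v1, v2):
    """Interleave two equal-length vectors and return the low and high halves."""
    if len(v1) != len(v2):
        raise ValueError("both inputs must have the same number of lanes")
    interleaved = [x for pair in zip(v1, v2) for x in pair]
    half = len(interleaved) // 2
    return interleaved[:half], interleaved[half:]


low, high = zip_x2(["A1", "A2", "A3", "A4"], ["B1", "B2", "B3", "B4"])
print(low)   # first half of the interleaved sequence
print(high)  # second half of the interleaved sequence
```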


zip_x4(ssa)

arm_sve.zip.x4 - Multi-vector four-way zip op

Operands

sourceV1 - Single, ZipInputVectorType, an SVE vector with element size <= 64-bit
sourceV2 - Single, ZipInputVectorType, an SVE vector with element size <= 64-bit
sourceV3 - Single, ZipInputVectorType, an SVE vector with element size <= 64-bit
sourceV4 - Single, ZipInputVectorType, an SVE vector with element size <= 64-bit

Results

resultV1 - Single, ZipInputVectorType, an SVE vector with element size <= 64-bit
resultV2 - Single, ZipInputVectorType, an SVE vector with element size <= 64-bit
resultV3 - Single, ZipInputVectorType, an SVE vector with element size <= 64-bit
resultV4 - Single, ZipInputVectorType, an SVE vector with element size <= 64-bit

Description

This operation interleaves elements from four input SVE vectors, returning four new SVE vectors, each of which contains a quarter of the result. The first quarter will be in resultV1, the second in resultV2, the third in resultV3, and the fourth in resultV4.

// sourceV1 = [ A1, A2, ... An ]
// sourceV2 = [ B1, B2, ... Bn ]
// sourceV3 = [ C1, C2, ... Cn ]
// sourceV4 = [ D1, D2, ... Dn ]
// (resultV1, resultV2, resultV3, resultV4)
//   = [ A1, B1, C1, D1, A2, B2, C2, D2, ... An, Bn, Cn, Dn ]
%resultV1, %resultV2, %resultV3, %resultV4 = arm_sve.zip.x4
  %sourceV1, %sourceV2, %sourceV3, %sourceV4 : vector<[16]xi8>

Warning: The result of this op is undefined for 64-bit elements on hardware with less than 256-bit vectors!

Note: This requires SME 2 (+sme2 in LLVM target features)
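A list-based model makes the quarter split concrete. As a sketch under the description above, zip_x4 below is a hypothetical Python function, not the Beaver function of the same name, and it says nothing about the undefined 64-bit case mentioned in the warning.

```python
def zip_x4(v1, v2, v3, v4):
    """Interleave four equal-length vectors and return the four quarters."""
    if not (len(v1) == len(v2) == len(v3) == len(v4)):
        raise ValueError("all inputs must have the same number of lanes")
    interleaved = [x for group in zip(v1, v2, v3, v4) for x in group]
    quarter = len(interleaved) // 4
    return tuple(interleaved[i * quarter:(i + 1) * quarter] for i in range(4))


r1, r2, r3, r4 = zip_x4(["A1", "A2", "A3", "A4"],
                        ["B1", "B2", "B3", "B4"],
                        ["C1", "C2", "C3", "C4"],
                        ["D1", "D2", "D3", "D4"])
print(r1)  # the first quarter of the interleaved sequence
```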
