244 Commits

Author SHA1 Message Date
Victor Do Nascimento
5c77e72e01 aarch64: rcpc3: add support in general_constraint_met_p
Given the introduction of the new address operand types for rcpc3
instructions, this patch adds the necessary logic to teach
`general_constraint_met_p` how to proper handle these.
2024-01-15 13:11:48 +00:00
Victor Do Nascimento
51bb8593e6 aarch64: rcpc3: New RCPC3_ADDR operand types
The particular choices of address indexing, along with their encoding
for RCPC3 instructions lead to the requirement of a new set of operand
descriptions, along with the relevant inserter/extractor set.

That is, for the integer load/stores, there is only a single valid
indexing offset quantity and offset mode is allowed - The value is
always equivalent to the amount of data read/stored by the
operation and the offset is post-indexed for Load-Acquire RCpc, and
pre-indexed with writeback for Store-Release insns.

This indexing quantity/mode pair is selected by the setting of a
single bit in the instruction. To represent these insns, we add the
following operand types:

  - AARCH64_OPND_RCPC3_ADDR_OPT_POSTIND
  - AARCH64_OPND_RCPC3_ADDR_OPT_PREIND_WB

In the case of loads and stores involving SIMD/FP registers, the
optional offset is encoded as an 8-bit signed immediate, but neither
post-indexing or pre-indexing with writeback is available.  This
created the need for an operand type similar to
AARCH64_OPND_ADDR_OFFSET, with the difference that FLD_index should
not be checked.

We thus introduce the AARCH64_OPND_RCPC3_ADDR_OFFSET operand, a
variant of AARCH64_OPND_ADDR_OFFSET, w/o the FLD_index bitfield.
2024-01-15 13:11:48 +00:00
Victor Do Nascimento
c354600877 aarch64: rcpc3: Define address operand fields and inserter/extractors
Beyond the need to encode any registers involved in data transfer and
the address base register for load/stores, it is necessary to specify
the data register addressing mode and whether the address register is
to be pre/post-indexed, whereby loads may be post-indexed and stores
pre-indexed with write-back.

The use of a single bit to specify both the indexing mode and indexing
value requires a novel function be written to accommodate this for
address operand insertion in assembly and another for extraction in
disassembly, along with the definition of two insn fields for use with
these instructions.

This therefore defines the following functions:

  - aarch64_ins_rcpc3_addr_opt_offset
  - aarch64_ins_rcpc3_addr_offset
  - aarch64_ext_rcpc3_addr_opt_offset
  - aarch64_ext_rcpc3_addr_offset

It extends the `do_special_{encoding|decoding}' functions and defines
two rcpc3 instruction fields:

  - FLD_opc2
  - FLD_rcpc3_size
2024-01-15 13:11:48 +00:00
Victor Do Nascimento
2f8890efc5 aarch64: rcpc3: Create implicit load/store size calc function
The allowed immediate offsets in integer rcpc3 load store instructions
are not encoded explicitly in the instruction itself, being rather
implicitly equivalent to the amount of data loaded/stored by the
instruction.

This leads to the requirement that this quantity be calculated based on
the number of registers involved in the transfer, either as data
source or destination registers and their respective qualifiers.

This is done via `calc_ldst_datasize (const aarch64_opnd_info *opnds)'
implemented here, using a cumulative sum of qualifier sizes preceding
the address operand in the OPNDS operand list argument.
2024-01-15 13:11:48 +00:00
Andrew Carlotti
0796bfa487 aarch64: Fix tlbi and tlbip instructions
There are some tlbi operations that don't have a corresponding tlbip operation,
but we were incorrectly using the same list for both.  Add the missing tlbi
*nxs operations, and use the F_REG_128 flag to filter tlbi operations that
don't have a tlbip analogue.  For increased clarity, I have also used a macro
to reduce duplication between the 'nxs' and non-'nxs' variants, and added a
test to verify that no invalid combinations are accepted.

Additionally, fix two missing checks for AARCH64_OPND_SYSREG_TLBIP that were
preventing disassembly of tlbip instructions.
2024-01-15 12:42:30 +00:00
Andrew Carlotti
6344535387 aarch64: Refactor aarch64_sys_ins_reg_supported_p
Add an aarch64_feature_set field to aarch64_sys_ins_reg, and use this for
feature checks instead of testing against a list of operand codes.
2024-01-15 12:42:30 +00:00
Srinath Parvathaneni
b33f1bcd15 aarch64: Add SVE2.1 Contiguous load/store instructions.
Hi,

This patch add support for SVE2.1 instructions ld1q,
ld2q, ld3q and ld4q, st1q, st2q, st3q and st4q.

Regression testing for aarch64-none-elf target and found no regressions.

Ok for binutils-master?

Regards,
Srinath.
2024-01-15 11:45:42 +00:00
Srinath Parvathaneni
39092c7a1f aarch64: Add SVE2.1 dupq, eorqv and extq instructions.
Hi,

This patch add support for SVE2.1 instruction dupq, eorqv and extq.

Regression testing for aarch64-none-elf target and found no regressions.

Ok for binutils-master?

Regards,
Srinath.
2024-01-15 11:45:41 +00:00
Srinath Parvathaneni
89e06ec152 aarch64: Add support for FEAT_SME2p1 instructions.
Hi,

This patch add support for FEAT_SME2p1 and "movaz" instructions
along with the optional flag +sme2p1.

Following "movaz" instructions are add:
Move and zero two ZA tile slices to vector registers.
Move and zero four ZA tile slices to vector registers.

Regression testing for aarch64-none-elf target and found no regressions.

Ok for binutils-master?

Regards,
Srinath.
2024-01-15 11:45:41 +00:00
Andrew Carlotti
43291582c0 aarch64: Add +xs flag for existing instructions
Additionally, change FEAT_XS tlbi variants to be gated on "+xs" instead of
"+d128".  This is an incremental improvement; there are still some FEAT_XS tlbi
variants that are gated incorrectly or missing entirely.
2024-01-12 13:46:35 +00:00
Victor Do Nascimento
9af8f67118 aarch64: Add support for 128-bit system register mrrs and msrr insns
With the addition of 128-bit system registers to the Arm architecture
starting with Armv9.4-a, a mechanism for manipulating their contents
is introduced with the `msrr' and `mrrs' instruction pair.

These move values from one such 128-bit system register into a pair of
contiguous general-purpose registers and vice-versa, as for example:

	   msrr ttlb0_el1, x0, x1
	   mrrs x0, x1, ttlb0_el1

This patch adds the necessary support for these instructions, adding
checks for system-register width by defining a new operand type in the
form of `AARCH64_OPND_SYSREG128' and the `aarch64_sys_reg_128bit_p'
predicate, responsible for checking whether the requested system
register table entry is marked as implemented in the 128-bit mode via
the F_REG_128 flag.
2024-01-09 10:16:41 +00:00
Victor Do Nascimento
c0fbed6407 aarch64: Add xs variants of tlbip operands
The 2020 Architecture Extensions to the Arm A-profile architecture
added FEAT_XS, the XS attribute feature, giving cores the ability to
identify devices which can be subject to long response delays. TLB
invalidate (TLBI) operations and barriers can also be annotated with
this attribute[1].

With the introduction of the 128-bit translation tables with the
Armv8.9-a/Armv9.4-a Translation Hardening Extension, a series of new
TLB invalidate operations are introduced which make use of this
extension.  These are added to aarch64_sys_regs_tlbi[] for use
with the `tlbip' insn.

[1] https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/arm-a-profile-architecture-developments-2020
2024-01-09 10:16:40 +00:00
Victor Do Nascimento
3521a28f10 aarch64: Add support for the SYSP 128-bit system instruction
Mirroring the use of the `sys' - System Instruction assembly
instruction, this implements its 128-bit counterpart, `sysp'.

This optionally takes two contiguous general-purpose registers
starting at an even number or, when these are omitted, by default
sets both of these to xzr.

Syntax:

	sysp #<op1>, <Cn>, <Cm>, #<op2>{, <Xt1>, <Xt2>}
2024-01-09 10:16:40 +00:00
Victor Do Nascimento
d30eb38d5b aarch64: Add support for xzr register in register pair operands
Analysis of the allowed operand values for `sysp' and `tlbip' reveals
a significant departure from the allowed behavior for operand register
pairs (hitherto labeled AARCH64_OPND_PAIRREG) observed for other
insns in this category.

For instructions `casp', `mrrs' and `msrr' the register pair must
always start at an even index and the second register in the pair is
the index + 1.  This precludes the use of xzr as the first register,
given it corresponds to register number 31.

This is different in the case of `sysp' and `tlbip', however.  These
allow the use of xzr and, where the first operand in the pair is
omitted, this is the default value assigned to it.  When this
operand is assigned xzr, it is expected that the second operand will
likewise take on a value of xzr.

These two instructions therefore "break" two rules of register pairs:

  * The first of the two registers is odd-numbered.
  * The index of the second register is equal to that of the first,
  and not n+1.

To allow for this departure from hitherto standard behavior, we
extend the functionality of the assembler by defining an extension of
the AARCH64_OPND_PAIRREG, called AARCH64_OPND_PAIRREG_OR_XZR.

It is used in defining `sysp' and `tlbip' and allows
`operand_general_constraint_met_p' to allow the pair to both take on
the value of xzr.
2024-01-09 10:16:40 +00:00
Alan Modra
fd67aa1129 Update year range in copyright notice of binutils files
Adds two new external authors to etc/update-copyright.py to cover
bfd/ax_tls.m4, and adds gprofng to dirs handled automatically, then
updates copyright messages as follows:

1) Update cgen/utils.scm emitted copyrights.
2) Run "etc/update-copyright.py --this-year" with an extra external
   author I haven't committed, 'Kalray SA.', to cover gas testsuite
   files (which should have their copyright message removed).
3) Build with --enable-maintainer-mode --enable-cgen-maint=yes.
4) Check out */po/*.pot which we don't update frequently.
2024-01-04 22:58:12 +10:30
Srinath Parvathaneni
281fda33bc aarch64: Add new AT system instructions.
This patch adds 3 new AT system instructions through FEAT_ATS1A
feature, which are available by default from Armv9.4-A architecture.
2023-11-16 14:24:30 +00:00
Srinath Parvathaneni
ebd5c32f2f aarch64: Add SLC target for PRFM instruction.
This patch adds support for FEAT_PRFMSLC feature which enables
SLC target for PRFM instructions.
2023-11-16 12:11:51 +00:00
Victor Do Nascimento
f11f256f56 aarch64: Fix error in THE system register checking
The erroneous omission of a "reg_value == " in the THE system register
encoding check added in [1] led to an error which was not picked up in
GCC but which was flagged in Clang due to its use of
[-Werror,-Wconstant-logical-operand] check.  Together with this fix we
add a new test for the THE registers to pick up their illegal use,
adding an extra and important layer of validation.

Furthermore, in separating system register from instruction
implementation (with which only the former was of concern in the cited
patch), additions made to `aarch64-tbl.h' are rolled back so
that these can be added later when adding THE instructions to the
codebase, a more natural place for these changes.

[1] https://sourceware.org/pipermail/binutils/2023-November/130314.html

opcodes/ChangeLog:

	* aarch64-opc.c (aarch64_sys_ins_reg_supported_p): Fix typo.
	* aarch64-tbl.h (THE): Remove.
	 (aarch64_feature_set aarch64_feature_the): Likewise.

gas/ChangeLog:

	* testsuite/gas/aarch64/illegal-sysreg-8.l: Add tests for THE
	system registers.
	* testsuite/gas/aarch64/illegal-sysreg-8.s: Likewise.
2023-11-09 13:37:33 +00:00
Victor Do Nascimento
6219f9dae7 aarch64: Add LSE128 instruction operand support
Given the particular encoding of the LSE128 instructions, create the
necessary shared input+output operand register description and
handling in the code to allow for the encoding of the LSE128 128-bit
atomic operations.

gas/ChangeLog:

	* config/tc-aarch64.c (parse_operands):

include/ChangeLog:

	* opcode/aarch64.h (enum aarch64_opnd):

opcodes/ChangeLog:

	* aarch64-opc.c (fields):
	(aarch64_print_operand):
	* aarch64-opc.h (enum aarch64_field_kind):
	* aarch64-tbl.h (AARCH64_OPERANDS):
2023-11-07 21:53:59 +00:00
Victor Do Nascimento
9203a155ee aarch64: Add THE system register support
Add Binutils support for system registers associated with the
Translation Hardening Extension (THE).

In doing so, we also add core feature support for THE, enabling its
associated feature flag and implementing the necessary
feature-checking machinery.

Regression tested on aarch64-linux-gnu, no regressions.

gas/ChangeLog:

	* config/tc-aarch64.c (aarch64_features): Add "+the" feature modifier.
	* doc/c-aarch64.texi (AArch64 Extensions): Update
	documentation for `the' option.
	* testsuite/gas/aarch64/sysreg-8.s: Add tests for `the'
	associated system registers.
	* testsuite/gas/aarch64/sysreg-8.d: Likewise.

include/ChangeLog:

	* opcode/aarch64.h (enum aarch64_feature_bit): Add
	AARCH64_FEATURE_THE.

opcode/ChangeLog:

	* aarch64-opc.c (aarch64_sys_ins_reg_supported_p): Add `the'
	system register check support.
	* aarch64-sys-regs.def: Add `rcwmask_el1' and `rcwsmask_el1'
	* aarch64-tbl.h: Define `THE' preprocessor macro.
2023-11-07 20:38:11 +00:00
Srinath Parvathaneni
c58f84d899 aarch64: Add support for GCSB DSYNC instruction.
This patch adds support for Guarded control stack data synchronization
instruction (GCSB DSYNC). This instruction is allocated to existing
HINT space and uses the HINT number 19 and to match this an entry is
added to the aarch64_hint_options array.
2023-11-02 13:09:26 +00:00
Srinath Parvathaneni
6c0ecdbad7 aarch64: Add support for Check Feature Status Extension.
This patch adds support for Check Feature Status Extension (CHK) which
is mandatory from Armv8.0-A. Also this patch supports "chkfeat" instruction
(hint #40).
2023-11-02 12:45:08 +00:00
Victor Do Nascimento
5dd233b314 aarch64: Refactor system register data
This patch  moves instances of system register definitions, represented
by the SYSREG macro, out of their original place in `aarch64-opc.c'
and into a dedicated .def file, `aarch64-sys-regs.def'.

System register entries in this new file are ordered alphabetically by
name.  This choice is made to enable the use of fast search algorithms
such as binary search when validating register names.

The SYSREG macro, defined as SYSREG (name, encoding, flags, features)
is kept as is and used in the def file, but all other SR_* macros
which previously served as indirections to SYSREG are removed.

opcodes/ChangeLog:
	* aarch64-opc.c (SR_CORE): Macro definition and uses deleted.
	(SR_FEAT): Likewise.
	(SR_FEAT2): Likewise.
	(SR_V8_1_A): Likewise.
	(SR_V8_4_A): Likewise.
	(SR_V8A): Likewise.
	(SR_V8R): Likewise.
	(SR_V8_1A): Likewise.
	(SR_V8_2A): Likewise.
	(SR_V8_3A): Likewise.
	(SR_V8_4A): Likewise.
	(SR_V8_6A): Likewise.
	(SR_V8_7A): Likewise.
	(SR_V8_8A): Likewise.
	(SR_GIC): Likewise.
	(SR_AMU): Likewise.
	(SR_LOR): Likewise.
	(SR_PAN): Likewise.
	(SR_RAS): Likewise.
	(SR_RNG): Likewise.
	(SR_SME): Likewise.
	(SR_SSBS): Likewise.
	(SR_SVE): Likewise.
	(SR_ID_PFR2): Likewise.
	(SR_PROFILE): Likewise.
	(SR_MEMTAG): Likewise.
	(SR_SCXTNUM): Likewise.
	(SR_EXPAND_ELx): Likewise.
	(SR_EXPAND_EL12): Likewise.
	* opcodes/aarch64-sys-regs.def: New.
2023-10-04 12:21:53 +01:00
Victor Do Nascimento
1bf6696b59 aarch64: system register aliasing detection
This patch adds a mechanism for system register name alias detection
to register-matching mechanisms.

A new `F_REG_ALIAS' flag is added to the set of register flags and
used to label which entries in aarch64_sys_regs[] correspond to
aliases (and thus which CPENC values are non-unique in this array).

Where this is used is, for example, in `aarch64_print_operand' where,
in the case of system register decoding, the aarch64_sys_regs[] array
is iterated through until a match in CPENC value is made and the first
match accepted.  If insufficient care is given in the ordering of
system registers in this array, the alias is encountered before the
"real" register and used incorrectly as the register name in the
disassembled output.

With this flag and the new `aarch64_sys_reg_alias_p' test, search
candidates corresponding to aliases can be conveniently skipped over.

One concrete example of where this is useful is with the
`trcextinselr0' system register.  It was initially placed in the
system register list before `trcextinselr', in contrast to a more
natural alphabetical order.

include/ChangeLog:
	* opcode/aarch64.h: add `aarch64_sys_reg_alias_p' prototype.

opcodes/ChangeLog:
	* aarch64-opc.c (aarch64_sys_reg_alias_p): New.
	(aarch64_print_operand): add aarch64_sys_reg_alias_p check.
	(aarch64_sys_regs): Add F_REG_ALIAS flag to "trcextinselr"
	entry.
	* aarch64-opc.h (F_REG_ALIAS): New.
2023-10-04 12:21:53 +01:00
Richard Sandiford
4abb672ac1 aarch64: Restructure feature flag handling
The AArch64 feature-flag code is currently limited to a maximum
of 64 features.  This patch reworks it so that the limit can be
increased more easily.  The basic idea is:

(1) Turn the ARM_FEATURE_FOO macros into an enum, with the enum
    counting bit positions.

(2) Make the feature-list macros take an array index argument
    (currently always 0).  The macros then return the
    aarch64_feature_set contents for that array index.

    An N-element array would then be initialised as:

      { MACRO (0), ..., MACRO (N - 1) }

(3) Provide convenience macros for initialising an
    aarch64_feature_set for:

    - a single feature
    - a list of individual features
    - an architecture version
    - an architecture version + a list of additional features

(2) and (3) use the preprocessor to generate static initialisers.
The main restriction was that uses of the same preprocessor macro
cannot be nested.  So if a macro wants to do something for N individual
arguments, it needs to use a chain of N macros to do it.  There then
needs to be a way of deriving N, as a preprocessor token suitable for
pasting.

The easiest way of doing that was to precede each list of features
by the number of features in the list.  So an aarch64_feature_set
initialiser for three features A, B and C would be written:

  AARCH64_FEATURES (3, A, B, C)

This scheme makes it difficult to keep AARCH64_FEATURE_CRYPTO as a
synonym for SHA2+AES, so the patch expands the former to the latter.
2023-09-26 15:01:21 +01:00
Victor Do Nascimento
a4822788d7 aarch64: Improve naming conventions for A and R-profile architecture
Historically, flags and variables relating to architectural revisions
for the A-profile architecture omitted the trailing `A' such that, for
example, assembling for `-march=armv8.4-a' set the `AARCH64_ARCH_V8_4'
flag in the assembler.

This leads to some ambiguity, since Binutils also targets the
R-profile Arm architecture.  Therefore, it seems prudent to have
everything associated with the A-profile cores end in `A' and likewise
`R' for the R-profile.  Referring back to the example above, the flag
set for `-march=armv8.4-a' is better characterized if labeled
`AARCH64_ARCH_V8_4A'.

The only exception to the rule of appending `A' to variables is found
in the handling of the `AARCH64_FEATURE_V8' macro, as it is the
baseline from which ALL processors derive and should therefore be left
unchanged.

In reflecting the `ARM' architectural nomenclature choices, where we
have `ARM_ARCH_V8A' and `ARM_ARCH_V8R', the choice is made to not have
an underscore separating the numerical revision number and the
A/R-profile indicator suffix.  This has meant that renaming of
R-profile related flags and variables was warranted, thus going from
`.*_[vV]8_[rR]' to `.*_[vV]8[rR]'.

Finally, this is more in line with conventions within GCC and adds consistency
across the toolchain.

gas/ChangeLog:
	* gas/config/tc-aarch64.c:
	(aarch64_cpus): Reference to arch feature macros updated.
	(aarch64_archs): Likewise.

include/ChangeLog:
	* include/opcode/aarch64.h:
	(AARCH64_FEATURE_V8A): Updated name: V8_A -> V8A.
	(AARCH64_FEATURE_V8_1A): A-suffix added.
	(AARCH64_FEATURE_V8_2A): Likewise.
	(AARCH64_FEATURE_V8_3A): Likewise.
	(AARCH64_FEATURE_V8_4A): Likewise.
	(AARCH64_FEATURE_V8_5A): Likewise.
	(AARCH64_FEATURE_V8_6A): Likewise.
	(AARCH64_FEATURE_V8_7A): Likewise.
	(AARCH64_FEATURE_V8_8A):Likewise.
	(AARCH64_FEATURE_V9A): Likewise.
	(AARCH64_FEATURE_V8R): Updated name: V8_R -> V8R.
	(AARCH64_ARCH_V8A_FEATURES): Updated name: V8_A -> V8A.
	(AARCH64_ARCH_V8_1A_FEATURES): A-suffix added.
	(AARCH64_ARCH_V8_2A_FEATURES): Likewise.
	(AARCH64_ARCH_V8_3A_FEATURES): Likewise.
	(AARCH64_ARCH_V8_4A_FEATURES): Likewise.
	(AARCH64_ARCH_V8_5A_FEATURES): Likewise.
	(AARCH64_ARCH_V8_6A_FEATURES): Likewise.
	(AARCH64_ARCH_V8_7A_FEATURES): Likewise.
	(AARCH64_ARCH_V8_8A_FEATURES): Likewise.
	(AARCH64_ARCH_V9A_FEATURES): Likewise.
	(AARCH64_ARCH_V9_1A_FEATURES): Likewise.
	(AARCH64_ARCH_V9_2A_FEATURES): Likewise.
	(AARCH64_ARCH_V9_3A_FEATURES): Likewise.
	(AARCH64_ARCH_V8A): Updated name: V8_A -> V8A.
	(AARCH64_ARCH_V8_1A): A-suffix added.
	(AARCH64_ARCH_V8_2A): Likewise.
	(AARCH64_ARCH_V8_3A): Likewise.
	(AARCH64_ARCH_V8_4A): Likewise.
	(AARCH64_ARCH_V8_5A): Likewise.
	(AARCH64_ARCH_V8_6A): Likewise.
	(AARCH64_ARCH_V8_7A): Likewise.
	(AARCH64_ARCH_V8_8A): Likewise.
	(AARCH64_ARCH_V9A): Likewise.
	(AARCH64_ARCH_V9_1A): Likewise.
	(AARCH64_ARCH_V9_2A): Likewise.
	(AARCH64_ARCH_V9_3A): Likewise.
	(AARCH64_ARCH_V8_R): Updated name: V8_R -> V8R.

opcodes/ChangeLog:
	* opcodes/aarch64-opc.c (SR_V8A): Updated name: V8_A -> V8A.
	(SR_V8_1A): A-suffix added.
	(SR_V8_2A): Likewise.
	(SR_V8_3A): Likewise.
	(SR_V8_4A): Likewise.
	(SR_V8_6A): Likewise.
	(SR_V8_7A): Likewise.
	(SR_V8_8A): Likewise.
	(aarch64_sys_regs): Reference to arch feature macros updated.
	(aarch64_pstatefields): Reference to arch feature macros updated.
	(aarch64_sys_ins_reg_supported_p): Reference to arch feature macros
	updated.
	* opcodes/aarch64-tbl.h:
	(aarch64_feature_v8_2a): a-suffix added.
	(aarch64_feature_v8_3a): Likewise.
	(aarch64_feature_fp_v8_3a): Likewise.
	(aarch64_feature_v8_4a): Likewise.
	(aarch64_feature_fp_16_v8_2a): Likewise.
	(aarch64_feature_v8_5a): Likewise.
	(aarch64_feature_v8_6a): Likewise.
	(aarch64_feature_v8_7a): Likewise.
	(aarch64_feature_v8r): Updated name: v8_r-> v8r.
	(ARMV8R): Updated name: V8_R-> V8R.
	(ARMV8_2A): A-suffix added.
	(ARMV8_3A): Likewise.
	(FP_V8_3A): Likewise.
	(ARMV8_4A): Likewise.
	(FP_F16_V8_2A): Likewise.
	(ARMV8_5): Likewise.
	(ARMV8_6A): Likewise.
	(ARMV8_6A_SVE): Likewise.
	(ARMV8_7A): Likewise.
	(V8_2A_INSN): `A' added to macro symbol.
	(V8_3A_INSN): Likewise.
	(V8_4A_INSN): Likewise.
	(FP16_V8_2A_INSN): Likewise.
	(V8_5A_INSN): Likewise.
	(V8_6A_INSN): Likewise.
	(V8_7A_INSN): Likewise.
	(V8R_INSN): Updated name: V8_R-> V8R.
2023-08-22 16:46:33 +01:00
Richard Sandiford
8ff429203d aarch64: Add the RPRFM instruction
This patch adds the RPRFM (range prefetch) instruction.
It was introduced as part of SME2, but it belongs to the
prefetch hint space and so doesn't require any specific
ISA flags.

The aarch64_rprfmop_array initialiser (deliberately) only
fills in the leading non-null elements.
2023-03-30 11:09:18 +01:00
Richard Sandiford
dfc12f9f53 aarch64: Add new SVE dot-product instructions
This patch adds the SVE FDOT, SDOT and UDOT instructions,
which are available when FEAT_SME2 is implemented.  The patch
also reorders the existing SVE_Zm3_22_INDEX to keep the
operands numerically sorted.
2023-03-30 11:09:17 +01:00
Richard Sandiford
6efa660124 aarch64: Add the SME2 shift instructions
There are two instruction formats here:

- SQRSHR, SQRSHRU and UQRSHR, which operate on lists of two
  or four registers.

- SQRSHRN, SQRSHRUN and UQRSHRN, which operate on lists of
  four registers.

These are the first SME2 instructions to have immediate operands.
The patch makes sure that, when parsing SME2 instructions with
immediate operands, the new predicate-as-counter registers are
parsed as registers rather than as #-less immediates.
2023-03-30 11:09:16 +01:00
Richard Sandiford
ce623e7aa4 aarch64: Add the SME2 saturating conversion instructions
There are two instruction formats here:

- SQCVT, SQCVTU and UQCVT, which operate on lists of two or
  four registers.

- SQCVTN, SQCVTUN and UQCVTN, which operate on lists of
  four registers.
2023-03-30 11:09:16 +01:00
Richard Sandiford
a8cb21aa06 aarch64: Add the SME2 MLALL and MLSLL instructions
SMLALL, SMLSLL, UMLALL and UMLSLL have the same format.
USMLALL and SUMLALL allow the same operand types as those
instructions, except that SUMLALL does not have the multi-vector
x multi-vector forms (which would be redundant with USMLALL).
2023-03-30 11:09:14 +01:00
Richard Sandiford
ed429b33c1 aarch64: Add the SME2 MLAL and MLSL instructions
The {BF,F,S,U}MLAL and {BF,F,S,U}MLSL instructions share the same
encoding.  They are the first instance of a ZA (as opposed to ZA tile)
operand having a range of offsets.  As with ZA tiles, the expected
range size is encoded in the operand-specific data field.
2023-03-30 11:09:13 +01:00
Richard Sandiford
80752eb098 aarch64: Add the SME2 FMLA and FMLS instructions 2023-03-30 11:09:13 +01:00
Richard Sandiford
e87ff6724f aarch64: Add the SME2 ADD and SUB instructions
Add support for the SME2 ADD. SUB, FADD and FSUB instructions.
SUB and FSUB have the same form as ADD and FADD, except that
ADD also has a 2-operand accumulating form.

The 64-bit ADD/SUB instructions require FEAT_SME_I16I64 and the
64-bit FADD/FSUB instructions require FEAT_SME_F64F64.

These are the first instructions to have tied register list
operands, as opposed to tied single registers.

The parse_operands change prevents unsuffixed Z registers (width==-1)
from being treated as though they had an Advanced SIMD-style suffix
(.4s etc.).  It means that:

  Error: expected element type rather than vector type at operand 2 -- `add za\.s\[w8,0\],{z0-z1}'

becomes:

  Error: missing type suffix at operand 2 -- `add za\.s\[w8,0\],{z0-z1}'
2023-03-30 11:09:13 +01:00
Richard Sandiford
cbd11b8818 aarch64: Add the SME2 ZT0 instructions
SME2 adds lookup table instructions for quantisation.  They use
a new lookup table register called ZT0.

LUTI2 takes an unsuffixed SVE vector index of the form Zn[<imm>],
which is the first time that this syntax has been used.
2023-03-30 11:09:12 +01:00
Richard Sandiford
99e01a66b4 aarch64: Add the SME2 predicate-related instructions
Implementation-wise, the main things to note here are:

- the WHILE* instructions have forms that return a pair of predicate
  registers.  This is the first time that we've had lists of predicate
  registers, and they wrap around after register 15 rather than after
  register 31.

- the predicate-as-counter WHILE* instructions have a fourth operand
  that specifies the vector length.  We can treat this as an enumeration,
  except that immediate values aren't allowed.

- PEXT takes an unsuffixed predicate index of the form PN<n>[<imm>].
  This is the first instance of a vector/predicate index having
  no suffix.
2023-03-30 11:09:12 +01:00
Richard Sandiford
b408ebbf52 aarch64: Add the SME2 multivector LD1 and ST1 instructions
SME2 adds LD1 and ST1 variants for lists of 2 and 4 registers.
The registers can be consecutive or strided.  In the strided case,
2-register lists have a stride of 8, starting at register x0xxx.
4-register lists have a stride of 4, starting at register x00xx.

The instructions are predicated on a predicate-as-counter register in
the range pn8-pn15.  Although we already had register fields with upper
bounds of 7 and 15, this is the first plain register operand to have a
nonzero lower bound.  The patch uses the operand-specific data field
to record the minimum value, rather than having separate inserters
and extractors for each lower bound.  This in turn required adding
an extra bit to the field.
2023-03-30 11:09:12 +01:00
Richard Sandiford
d8773a8a5f aarch64: Add the SME2 MOVA instructions
SME2 defines new MOVA instructions for moving multiple registers
to and from ZA.  As with SME, the instructions are also available
through MOV aliases.

One notable feature of these instructions (and many other SME2
instructions) is that some register lists must start at a multiple
of the list's size.  The patch uses the general error "start register
out of range" when this constraint isn't met, rather than an error
specifically about multiples.  This ensures that the error is
consistent between these simple consecutive lists and later
strided lists, for which the requirements aren't a simple multiple.
2023-03-30 11:09:12 +01:00
Richard Sandiford
503fae1299 aarch64: Add support for predicate-as-counter registers
SME2 adds a new format for the existing SVE predicate registers:
predicates as counters rather than predicates as masks.  In assembly
code, operands that interpret predicates as counters are written
pn<N> rather than p<N>.

This patch adds support for these registers and extends some
existing instructions to support them.  Since the new forms
are just a programmer convenience, there's no need to make them
more restrictive than the earlier predicate-as-mask forms.
2023-03-30 11:09:11 +01:00
Richard Sandiford
586c62819f aarch64; Add support for vector offset ranges
Some SME2 instructions operate on a range of consecutive ZA vectors.
This is indicated by syntax such as:

   za[<Wv>, <imml>:<immh>]

Like with the earlier vgx2 and vgx4 support, we get better error
messages if the parser allows all ZA indices to have a range.
We can then reject invalid cases during constraint checking.
2023-03-30 11:09:11 +01:00
Richard Sandiford
e2dc4040f3 aarch64: Add support for vgx2 and vgx4
Many SME2 instructions operate on groups of 2 or 4 ZA vectors.
This is indicated by adding a "vgx2" or "vgx4" group size to the
ZA index.  The group size is optional in assembly but preferred
for disassembly.

There is not a binary distinction between mnemonics that have
group sizes and mnemonics that don't, nor between mnemonics that
take vgx2 and mnemonics that take vgx4.  We therefore get better
error messages if we allow any ZA index to have a group size
during parsing, and wait until constraint checking to reject
invalid sizes.

A quirk of the way errors are reported means that if an instruction
is wrong both in its qualifiers and its use of a group size, we'll
print suggested alternative instructions that also have an incorrect
group size.  But that's a general property that also applies to
things like out-of-range immediates.  It's also not obviously the
wrong thing to do.  We need to be relatively confident that we're
looking at the right opcode before reporting detailed operand-specific
errors, so doing qualifier checking first seems resonable.
2023-03-30 11:09:11 +01:00
Richard Sandiford
90cd80f8c2 aarch64: Add _off4 suffix to AARCH64_OPND_SME_ZA_array
SME2 adds various new fields that are similar to
AARCH64_OPND_SME_ZA_array, but are distinguished by the size of
their offset fields.  This patch adds _off4 to the name of the
field that we already have.
2023-03-30 11:09:11 +01:00
Richard Sandiford
abd542a2f1 aarch64: Add a _10 suffix to FLD_imm3
SME2 adds various new 3-bit immediate fields, so this patch adds
an lsb position suffix to the name of the field that we already have.
2023-03-30 11:09:10 +01:00
Richard Sandiford
4eede8c244 aarch64: Prefer register ranges & support wrapping
Until now, binutils has supported register ranges such
as { v0.4s - v3.4s } as an unofficial shorthand for
{ v0.4s, v1.4s, v2.4s, v3.4s }.  The SME2 ISA embraces this form
and makes it the preferred disassembly.  It also embraces wrapped
lists such as { z31.s - z2.s }, which is something that binutils
didn't previously allow.

The range form was already binutils's preferred disassembly for 3- and
4-register lists.  This patch prefers it for 2-register lists too.
The patch also adds support for wrap-around.
2023-03-30 11:09:10 +01:00
Richard Sandiford
f5b57feac2 aarch64: Add support for strided register lists
SME2 has instructions that accept strided register lists,
such as { z0.s, z4.s, z8.s, z12.s }.  The purpose of this
patch is to extend binutils to support such lists.

The parsing code already had (unused) support for strides of 2.
The idea here is instead to accept all strides during parsing
and reject invalid strides during constraint checking.

The SME2 instructions that accept strided operands also have
non-strided forms.  The errors about invalid strides therefore
take a bitmask of acceptable strides, which allows multiple
possibilities to be summed up in a single message.

I've tried to update all code that handles register lists.
2023-03-30 11:09:10 +01:00
Richard Sandiford
b5c36ad2e0 aarch64: Sort fields alphanumerically
This patch just sorts the field enum alphanumerically, which makes
it easier to see if a particular field has already been defined.
2023-03-30 11:09:09 +01:00
Richard Sandiford
ccb6da7c82 aarch64: Resync field names
This patch just makes the comments in aarch64-opc.c:fields match
the names of the associated FLD_* enum.
2023-03-30 11:09:09 +01:00
Richard Sandiford
1d1060427d aarch64: Regularise FLD_* suffixes
Some FLD_imm* suffixes used a counting scheme such as FLD_immN,
FLD_immN_2, FLD_immN_3, etc., while others used the lsb as the
suffix.  The latter seems more mnemonic, and was a big help
in doing the SME2 work.

Similarly, the _10 suffix on FLD_SME_size_10 was nonobvious.
Presumably it indicated a 2-bit field, but it actually starts
in bit 22.
2023-03-30 11:09:09 +01:00
Richard Sandiford
199cfcc475 aarch64: Add a aarch64_cpu_supports_inst_p helper
Quite a lot of SME2 instructions have an opcode bit that selects
between 32-bit and 64-bit forms of an instruction, with the 32-bit
forms being part of base SME2 and with the 64-bit forms being part
of an optional extension.  It's nevertheless useful to have a single
opcode entry for both forms since (a) that matches the ISA definition
and (b) it tends to improve error reporting.

This patch therefore adds a libopcodes function called
aarch64_cpu_supports_inst_p that tests whether the target
supports a particular instruction.  In future it will depend
on internal libopcodes routines.
2023-03-30 11:09:09 +01:00
Richard Sandiford
b5b4f66545 aarch64: Try to report invalid variants against the closest match
If an instruction has invalid qualifiers, GAS would report the
error against the final opcode entry that got to the qualifier-
checking stage.  It seems better to report the error against
the opcode entry that had the closest match, just like we
pick the closest match within an opcode entry for the
"did you mean this?" message.

This patch adds the number of invalid operands as an
argument to AARCH64_OPDE_INVALID_VARIANT and then picks the
AARCH64_OPDE_INVALID_VARIANT with the lowest argument.
2023-03-30 11:09:08 +01:00