This patch adds support for the new AArch64 system registers that are part of the following extensions:
* FEAT_DEBUGv8p9
* FEAT_PMUv3p9
* FEAT_PMUv3_SS
* FEAT_PMUv3_ICNTR
* FEAT_SEBEP
As already indicated during review, we can't get away without certain
adjustments here: Without these, respective {evex}-prefixed insns are
assembled to APX encodings even when APX_F is turned off.
While there also extend the respective comment in the opcode table, to
explain why this construct is used.
PR gas/31178
In da0784f961d8 ("x86: fold FMA VEX and EVEX templates") I overlooked
that C aliases StaticRounding, and hence build_vex_prefix() now needs to
be aware of that aliasing. Disambiguation is easy, as StaticRounding is
only ever used together with SAE (hence why the overlaying works in the
first place).
This patch adds support for FEAT_THE doubleword and quadword instructions.
Doubleword instructions are enabled by the "+the" flag, whereas quadword
instructions are enabled by passing both the "+the" and "+d128" flags.
Support for the following sets of instructions is added in this patch:
Read check write compare and swap doubleword:
(rcwcas, rcwcasa, rcwcasal, rcwcasl)
Read check write compare and swap quadword:
(rcwcasp, rcwcaspa, rcwcaspal, rcwcaspl)
Read check write software compare and swap doubleword:
(rcwscas, rcwscasa, rcwscasal, rcwscasl)
Read check write software compare and swap quadword:
(rcwscasp, rcwscaspa, rcwscaspal, rcwscaspl)
Read check write atomic bit clear on doubleword:
(rcwclr, rcwclra, rcwclral, rcwclrl)
Read check write atomic bit clear on quadword:
(rcwclrp, rcwclrpa, rcwclrpal, rcwclrpl)
Read check write software atomic bit clear on doubleword:
(rcwsclr, rcwsclra, rcwsclral, rcwsclrl)
Read check write software atomic bit clear on quadword:
(rcwsclrp, rcwsclrpa, rcwsclrpal, rcwsclrpl)
Read check write atomic bit set on doubleword:
(rcwset, rcwseta, rcwsetal, rcwsetl)
Read check write atomic bit set on quadword:
(rcwsetp, rcwsetpa, rcwsetpal, rcwsetpl)
Read check write software atomic bit set on doubleword:
(rcwsset, rcwsseta, rcwssetal, rcwssetl)
Read check write software atomic bit set on quadword:
(rcwssetp, rcwssetpa, rcwssetpal, rcwssetpl)
Read check write swap doubleword:
(rcwswp, rcwswpa, rcwswpal, rcwswpl)
Read check write swap quadword:
(rcwswpp, rcwswppa, rcwswppal, rcwswppl)
Read check write software swap doubleword:
(rcwsswp, rcwsswpa, rcwsswpal, rcwsswpl)
Read check write software swap quadword:
(rcwsswpp, rcwsswppa, rcwsswppal, rcwsswppl)
Add tests to cover the full range of behaviors observed around
optional register operands for the `tlbip' and `sysp' instructions,
namely:
* Not all `tlbip' operations take GPR operands. When this is the
case, we should check that neither optional operand was supplied.
* When a `tlbip' operation is labeled with the `F_HASXT' flag, xzr
is not a valid optional operand. In such a case, at least the first
optional register needs to be specified with a non-xzr value.
* The first operand for both insns should be either xzr or an
even-numbered register (n % 2 == 0). In the former scenario, the
second operand should default to xzr too, while in the latter, it
should default to n + 1.
With the addition of 128-bit system registers to the Arm architecture
starting with Armv9.4-a, a mechanism for manipulating their contents
is introduced with the `msrr' and `mrrs' instruction pair.
These move values from one such 128-bit system register into a pair of
contiguous general-purpose registers and vice-versa, as for example:
msrr ttbr0_el1, x0, x1
mrrs x0, x1, ttbr0_el1
This patch adds the necessary support for these instructions, adding
checks for system-register width by defining a new operand type in the
form of `AARCH64_OPND_SYSREG128' and the `aarch64_sys_reg_128bit_p'
predicate, responsible for checking whether the requested system
register table entry is marked as implemented in the 128-bit mode via
the F_REG_128 flag.
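
A minimal sketch of how such a flag-based predicate could look. The struct
layout, the flag value and the table contents below are assumptions made for
illustration only; they are not the definitions used in opcodes/.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical flag bit; the real F_REG_128 value may differ.  */
#define SKETCH_F_REG_128 (1u << 0)

/* Simplified stand-in for a system register table entry.  */
struct sketch_sys_reg
{
  const char *name;
  uint32_t flags;
};

/* Illustrative table.  Which entries carry the 128-bit flag here is an
   assumption made for the example.  */
static const struct sketch_sys_reg sketch_sys_regs[] =
{
  { "ttbr0_el1", SKETCH_F_REG_128 },
  { "sctlr_el1", 0 },
};

/* Return true if NAME is found in the table and its entry is marked as
   implemented in 128-bit mode.  Unknown names are rejected.  */
static bool
sketch_sys_reg_128bit_p (const char *name)
{
  size_t n = sizeof sketch_sys_regs / sizeof sketch_sys_regs[0];

  for (size_t i = 0; i < n; i++)
    if (strcmp (sketch_sys_regs[i].name, name) == 0)
      return (sketch_sys_regs[i].flags & SKETCH_F_REG_128) != 0;
  return false;
}
```

With a check of this kind, an operand parser could reject a 64-bit-only
register used with `msrr' or `mrrs' before any encoding takes place.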
The addition of 128-bit page table descriptors and, with it, the
addition of 128-bit system registers for these means that special
"invalidate translation table entry" instructions are needed to cope
with the new 128-bit model. This is introduced with the `tlbip'
instruction, implemented here.
While the CRn and CRm fields in the SYSP instruction are 4 bits wide and
are thus able to accommodate values in the range 0-15, the
specifications for the SYSP instructions limit their ranges to 8-9 for
CRm and 0-7 in the case of CRn.
This led to the need to signal in some way to the operand parser that
a given operand is under special restrictions regarding its use. This
is done via the new `F_OPD_NARROW' flag, indicating a narrowing in the
range of operand values for fields in the instruction tagged with the
flag.
The flag is then used in `parse_operands' when the instruction is
assembled, but need not be taken into consideration during
disassembly.
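
The narrowing can be sketched as a range check that only tightens when the
flag is present. This is an illustration under assumptions: the flag value,
the enum and the function name are hypothetical, and only the ranges (0-15
unrestricted, 0-7 for CRn, 8-9 for CRm) come from the description above.

```c
#include <stdbool.h>

/* Hypothetical flag value; not the real F_OPD_NARROW definition.  */
#define SKETCH_F_OPD_NARROW (1u << 0)

enum sketch_cr_field { SKETCH_CRN, SKETCH_CRM };

/* Return true if VALUE is acceptable for FIELD.  Without the narrowing
   flag any 4-bit value is fine; with it, CRn is limited to 0-7 and CRm
   to 8-9.  */
static bool
sketch_cr_value_ok_p (enum sketch_cr_field field, unsigned value,
                      unsigned flags)
{
  if (value > 15)
    return false;
  if ((flags & SKETCH_F_OPD_NARROW) == 0)
    return true;
  if (field == SKETCH_CRN)
    return value <= 7;
  return value == 8 || value == 9;
}
```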
Mirroring the use of the `sys' - System Instruction assembly
instruction, this implements its 128-bit counterpart, `sysp'.
This optionally takes two contiguous general-purpose registers
starting at an even number or, when these are omitted, by default
sets both of these to xzr.
Syntax:
sysp #<op1>, <Cn>, <Cm>, #<op2>{, <Xt1>, <Xt2>}
Two of the instructions added by the `+d128' architectural extension
add the flexibility to have two optional operands. Prior to the
addition of the `tlbip' and `sysp' instructions, no mnemonic allowed
more than one such optional operand.
With `tlbip' as an example, some TLBIP instruction names do not allow
for any optional operands, while others allow for both to be optional.
In the latter case, it is possible that either the second operand
alone is omitted or both operands are omitted.
Therefore, a considerable degree of flexibility needed to be added to
the way operands were parsed. It was, however, possible to achieve
this with relatively few changes to existing code.
It is noteworthy that opcode flags specifying the optional operand
number are non-orthogonal. For example, we have:
#define F_OPD1_OPT (2 << 12) : 0b10 << 12
#define F_OPD2_OPT (3 << 12) : 0b11 << 12
such that by virtue of the observation that
(F_OPD1_OPT | F_OPD2_OPT) == F_OPD2_OPT
it is impossible to mark both operands 1 and 2 as optional for an
instruction, and it is assumed that at most one operand can ever be
optional. This is not overly problematic given that, for optional
pairs, the second optional operand is always found immediately after
the first. Thus, it suffices for us to flag that there is a second
optional operand. With this fact, we can infer its position in the
mnemonic from the position of the first (e.g. if the second operand in
the mnemonic is optional, we know the third is too). We therefore
define the `F_OPD_PAIR_OPT' flag and calculate the position of the
second optional operand from the value encoded by the `F_OPD<n>_OPT'
flag.
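
The reasoning above can be sketched as follows. The two `F_OPD<n>_OPT'
values are taken from the text; the field mask, the pair flag bit and the
helper names are assumptions for illustration, as is the reading of the
field as "operand index plus one".

```c
/* Values as quoted in the text.  */
#define F_OPD1_OPT (2u << 12)
#define F_OPD2_OPT (3u << 12)

/* Hypothetical: 3-bit field at bit 12, and an arbitrary free bit for
   the pair flag.  */
#define SKETCH_OPD_OPT_MASK   (7u << 12)
#define SKETCH_F_OPD_PAIR_OPT (1u << 20)

/* Index of the (first) optional operand, or -1 if there is none.  The
   field is assumed to hold the operand index plus one.  */
static int
sketch_first_optional_idx (unsigned flags)
{
  unsigned n = (flags & SKETCH_OPD_OPT_MASK) >> 12;

  return n == 0 ? -1 : (int) n - 1;
}

/* Index of the second optional operand: always the one right after the
   first, and only present when the pair flag is set.  */
static int
sketch_second_optional_idx (unsigned flags)
{
  int first = sketch_first_optional_idx (flags);

  if (first < 0 || (flags & SKETCH_F_OPD_PAIR_OPT) == 0)
    return -1;
  return first + 1;
}
```

Because the operand number is an encoded value rather than a bit per
operand, OR-ing two such values cannot express a pair, while a single extra
flag can.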
Another observation is that there is a tight coupling between default
values assigned to the two registers when one (or both) are omitted
from the mnemonic. Namely, if Xt1 has a value of 0x1f (the zero
register is specified), Xt2 defaults to the same value, otherwise Xt2
will be assigned Xt1 + 1. This means that, when validating the default
value of the second optional operand, it is also necessary to look at
the value assigned to the previously-processed operand before deciding
its validity. Thus `process_omitted_operand' needs access not only to
its `operand' argument, but also to the global `inst' struct.
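
The coupling between the two defaults can be sketched as a small helper. The
names are hypothetical; only the rule itself (xzr stays xzr, anything else
becomes Xt1 + 1) is taken from the description.

```c
/* Register number of the zero register, as given in the text.  */
#define SKETCH_XZR 0x1fu

/* Default value for an omitted Xt2, derived from the already-known
   value of Xt1.  */
static unsigned
sketch_default_xt2 (unsigned xt1)
{
  return xt1 == SKETCH_XZR ? SKETCH_XZR : xt1 + 1;
}
```

The helper needs the first operand's value as input, which is the reason the
second operand cannot be defaulted in isolation.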
Analysis of the allowed operand values for `sysp' and `tlbip' reveals
a significant departure from the allowed behavior for operand register
pairs (hitherto labeled AARCH64_OPND_PAIRREG) observed for other
insns in this category.
For instructions `casp', `mrrs' and `msrr' the register pair must
always start at an even index and the second register in the pair is
the index + 1. This precludes the use of xzr as the first register,
given it corresponds to register number 31.
This is different in the case of `sysp' and `tlbip', however. These
allow the use of xzr and, where the first operand in the pair is
omitted, this is the default value assigned to it. When this
operand is assigned xzr, it is expected that the second operand will
likewise take on a value of xzr.
These two instructions can therefore "break" two rules of register
pairs:
* The first of the two registers may be odd-numbered (xzr is
register 31).
* The index of the second register may be equal to that of the first,
and not n+1.
To allow for this departure from hitherto standard behavior, we
extend the functionality of the assembler by defining an extension of
the AARCH64_OPND_PAIRREG, called AARCH64_OPND_PAIRREG_OR_XZR.
It is used in defining `sysp' and `tlbip' and lets
`operand_general_constraint_met_p' accept a pair in which both
registers are xzr.
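
The difference between the two operand kinds can be sketched as one
constraint check with an extra escape hatch for xzr. This is an illustration
with hypothetical names, not the code in
`operand_general_constraint_met_p'.

```c
#include <stdbool.h>

/* Check a register pair (XT1, XT2).  The usual rule is an even first
   register followed by the next one.  When ALLOW_XZR is true (the
   PAIRREG_OR_XZR case in this sketch), the pair (31, 31) is accepted
   as well.  */
static bool
sketch_reg_pair_ok_p (unsigned xt1, unsigned xt2, bool allow_xzr)
{
  if (xt1 > 31 || xt2 > 31)
    return false;
  if (allow_xzr && xt1 == 31)
    return xt2 == 31;
  return xt1 % 2 == 0 && xt2 == xt1 + 1;
}
```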
Indicating the presence of the Armv9.4-a features concerning 128-bit
Page Table Descriptors, 128-bit System Registers and Instructions,
the "+d128" architectural extension flag is added to the list of
possible -march options in Binutils, together with the necessary macro
for encoding d128 instructions.
Currently, only mipsisa32-linux and mipsisa32el-linux are marked
as addr32, which leaves mipsisa32rN(el) unmarked.
This change can fix 2 test failures on mipsisa32rN(el)-linux:
FAIL: MIPS MIPS64 MIPS-3D ASE instructions (-mips3d flag)
FAIL: MIPS MIPS64 MDMX ASE instructions (-mdmx flag)
These failures don't happen for mipsisa32rN-mti-elf etc. because
there the output is set as NO_ABI instead of O32, so gas won't warn:
`fp=64' used with a 32-bit ABI
Maybe we should change this behaviour in the future.
This patch adds AArch32 support for -march=armv8.9-a and
-march=armv9.4-a. The behaviour of the new options can be
expressed using a combination of existing feature flags
and tables.
The cpu_arch_ver entries for ARM_ARCH_V9_4A and ARM_ARCH_V8_9A
are technically redundant, but are included for macro code
consistency across architectures.
Also recognized are aarch64-*-gnu targets, e.g. aarch64-pc-gnu or
aarch64-unknown-gnu.
The ld/emulparams/aarch64gnu.sh file is (for now) identical to aarch64fbsd.sh,
or to aarch64linux.sh with Linux-specific logic removed; and mainly different
from the generic aarch64elf.sh in that it does not set EMBEDDED=yes.
Coupled with a corresponding GCC patch, this produces a toolchain that can
successfully build working binaries targeting aarch64-gnu.
Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
Since 0x66 is the opcode prefix for adcx, it is wrong to use the 'S'
prefix:
'S' => print 'w', 'l' or 'q' if suffix_always is true
on adcx. Add
'L' => print 'l' or 'q' if suffix_always is true
and replace S with L on adcx and adox.
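
The difference between the two macros can be sketched as below. This is a
simplified model, not the code in putop(): the function name and the
boolean parameters standing in for the REX.W and 0x66 state are
assumptions.

```c
#include <stdbool.h>

/* Return the suffix character to print for MACRO ('S' or 'L'), or 0 for
   none.  DATA16 stands for a 0x66 prefix having been seen, REX_W for a
   64-bit operand size.  */
static char
sketch_suffix (char macro, bool suffix_always, bool rex_w, bool data16)
{
  if (!suffix_always)
    return 0;
  if (rex_w)
    return 'q';
  /* Only 'S' may pick 'w'.  For adcx the 0x66 byte is part of the
     opcode, so treating it as an operand-size prefix would be wrong.  */
  if (macro == 'S' && data16)
    return 'w';
  return 'l';
}
```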
gas/
PR binutils/31219
* testsuite/gas/i386/suffix.d: Updated.
* testsuite/gas/i386/x86-64-suffix.d: Likewise.
* testsuite/gas/i386/suffix.s: Add tests for adcx and adox.
* testsuite/gas/i386/x86-64-suffix.s: Likewise.
opcodes/
PR binutils/31219
* i386-dis.c: Add the 'L' suffix.
(prefix_table): Replace S with L on adcx and adox.
(putop): Handle the 'L' suffix.
There are a number of issues with 734dfd1cc966 ("x86: pack CPU flags in
opcode table"):
- the condition when two array slots need writing wasn't correct (with
enough new Cpu* added an out of bounds array access would validly have
been complained about by the compiler),
- table generation didn't take into account CpuAttrUnused and CpuUnused
being independent, and hence there not always (not) being an "unused"
bitfield member in both structures,
- cpu_flags_from_attr() wasn't ready for use on big-endian hosts,
- there were two style violations.
Various targets have / had overrides for .bss. Make sure that in such
cases
- .previous still works correctly (requiring such targets to invoke
obj_elf_section_change_hook() from their overriding handlers),
- sub-section specifiers are accepted as far as feasible (mandated by
the doc).
It doesn't look to be a good idea to override the custom handlers that
ELF and COFF have; afaict doing so broke .previous on ELF, and a sub-
section specifier wasn't accepted either.
The comment in s_bss() looks bogus (perhaps simply stale, or wrongly
copied from another target). It also doesn't look to be a good idea to
override the custom handler that ELF has (afaict doing so broke
.previous as well as sub-section specification).
The override for .skip is simply pointless, as read.c has exactly the
same handler.
While there also drop two adjacent redundant (with read.h) declarations
(which would be outright dangerous if read.h wasn't included anyway).
While there doesn't look to be anything wrong with this override,
there's also no apparent reason why this override would be needed. Drop
it, reducing overall size a tiny bit.
The comment looks bogus (perhaps simply stale, or wrongly copied from
another target). It also doesn't look to be a good idea to override the
custom handler that ELF has (afaict doing so broke .previous as well as
sub-section specification).
While there also fold the identical handlers for .text (there likely is
more room for such folding).
The comment looks bogus (perhaps simply stale), and there are also no
other precautions against subsections being used on ELF with .bss. It
also doesn't look to be a good idea to override the custom handler that
ELF has (afaict doing so further broke .previous).
While only ELF is supported right now, (stub) code generally is in place
for the non-ELF case as well. Don't override .bss for ELF - that's
unlikely to be a good idea anyway and prevented the sub-section
specifier from being usable. Don't override .text and .data at all - for
.data and ELF for the same reason, while for .text and ELF obj-elf.c's is
all we need, and for (hypothetical) non-ELF read.c's identical handling
would have been invoked anyway.
The comment looks bogus (perhaps simply stale), and there are also no
other precautions against subsections being used on ELF with .bss. It
also doesn't look to be a good idea to override the custom handler that
ELF has (afaict doing so further broke .previous).
It doesn't look to be a good idea to override the custom handlers that
ELF and COFF have. While in this case interaction with ELF's .previous
wasn't screwed, the sub-section specifier wasn't permitted.
The comment looks bogus (perhaps simply stale, perhaps wrongly copied
from Arm in the first place), and there are also no other precautions
against subsections being used on ELF with .bss. It also doesn't look
to be a good idea to override the custom handlers that ELF and COFF
have (afaict doing so further broke .previous on ELF).
As to the mapping state update - such also doesn't appear to be done
for other section switching, so its original purpose was at best
questionable as well.
The comment looks bogus (perhaps simply stale), and there are also no
other precautions against subsections being used on ELF with .bss. It
also doesn't look to be a good idea to override the custom handlers that
ELF and COFF have (afaict doing so further broke .previous on ELF).
Since the particularity of "th.vsetvli" was not taken into account in the
initial support patches for XTheadVector, the program operation failed
due to instruction coding errors. According to T-Head SPEC ([1]), the
"vsetvl" in the XTheadVector extension consists of SEW, LMUL and EDIV,
which is quite different from the "V" extension. Therefore, we cannot
simply reuse the processing of vsetvl in V extension.
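
As an illustration of the three fields being combined into one value, here
is a minimal sketch. It assumes the field layout of the RVV 0.7.1 draft
(vlmul in bits 1:0, vsew in bits 4:2, vediv in bits 6:5); the macro and
function names are hypothetical, and the layout should be checked against
the T-Head spec [1] rather than taken from this sketch.

```c
#include <stdint.h>

/* Assumed field positions within the vtype value (see lead-in).  */
#define SKETCH_VLMUL_SHIFT 0
#define SKETCH_VLMUL_MASK  0x3u
#define SKETCH_VSEW_SHIFT  2
#define SKETCH_VSEW_MASK   0x7u
#define SKETCH_VEDIV_SHIFT 5
#define SKETCH_VEDIV_MASK  0x3u

/* Pack the three encoded fields into a single vtype value.  Each
   argument is masked to its field width before being shifted.  */
static uint32_t
sketch_pack_vtype (uint32_t vlmul, uint32_t vsew, uint32_t vediv)
{
  return ((vlmul & SKETCH_VLMUL_MASK) << SKETCH_VLMUL_SHIFT)
         | ((vsew & SKETCH_VSEW_MASK) << SKETCH_VSEW_SHIFT)
         | ((vediv & SKETCH_VEDIV_MASK) << SKETCH_VEDIV_SHIFT);
}
```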
We have set up tens of thousands of test cases to ensure that no
further encoding issues remain, and have executed all compiled test
files on real HW to make sure they don't trigger SIGILL.
Ref:
[1] https://github.com/T-head-Semi/thead-extension-spec/releases/download/2.3.0/xthead-2023-11-10-2.3.0.pdf
Co-developed-by: Lifang Xia <lifang_xia@linux.alibaba.com>
Co-developed-by: Christoph Müllner <christoph.muellner@vrull.eu>
gas/ChangeLog:
* config/tc-riscv.c (validate_riscv_insn): Add handling for
th.vsetvli.
(my_getThVsetvliExpression): New function.
(riscv_ip): Likewise.
* testsuite/gas/riscv/x-thead-vector.d: Likewise.
* testsuite/gas/riscv/x-thead-vector.s: Likewise.
include/ChangeLog:
* opcode/riscv.h (OP_MASK_XTHEADVLMUL): New macro.
(OP_SH_XTHEADVLMUL): Likewise.
(OP_MASK_XTHEADVSEW): Likewise.
(OP_SH_XTHEADVSEW): Likewise.
(OP_MASK_XTHEADVEDIV): Likewise.
(OP_SH_XTHEADVEDIV): Likewise.
(OP_MASK_XTHEADVTYPE_RES): Likewise.
(OP_SH_XTHEADVTYPE_RES): Likewise.
opcodes/ChangeLog:
* riscv-dis.c (print_insn_args): Likewise.
* riscv-opc.c: Likewise.
Adds two new external authors to etc/update-copyright.py to cover
bfd/ax_tls.m4, and adds gprofng to dirs handled automatically, then
updates copyright messages as follows:
1) Update cgen/utils.scm emitted copyrights.
2) Run "etc/update-copyright.py --this-year" with an extra external
author I haven't committed, 'Kalray SA.', to cover gas testsuite
files (which should have their copyright message removed).
3) Build with --enable-maintainer-mode --enable-cgen-maint=yes.
4) Check out */po/*.pot which we don't update frequently.
Suppose we want to use la.got to generate 32 pcrel and
32 abs instruction sequences respectively. According to
the existing conditions, to generate 32 pcrel sequences
use -mabi=ilp32*, and to generate 32 abs use -mabi=ilp32*
and -mla-global-with-abs.
Due to the fact that the conditions for generating 32 abs
also satisfy 32 pcrel, using -mabi=ilp32* and -mla-global-with-abs
will result in only generating instruction sequences of 32 pcrel.
By modifying the conditions for macro expansion and adjusting
the matching order of macro instructions, it is ensured that
the correct sequence of instructions can be generated.
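
The ordering problem can be sketched with two selection functions. This is
a simplified model with hypothetical names; it shows only the matching
order, not the actual macro tables or conditions in gas.

```c
#include <stdbool.h>

enum sketch_seq { SKETCH_SEQ_NONE, SKETCH_SEQ_PCREL32, SKETCH_SEQ_ABS32 };

/* Old order: the broader condition is tested first, so the abs case is
   never reached when both options are given.  */
static enum sketch_seq
sketch_pick_old (bool ilp32, bool la_global_with_abs)
{
  if (ilp32)
    return SKETCH_SEQ_PCREL32;
  if (ilp32 && la_global_with_abs)
    return SKETCH_SEQ_ABS32;
  return SKETCH_SEQ_NONE;
}

/* Fixed order: the more specific condition is tested first.  */
static enum sketch_seq
sketch_pick_fixed (bool ilp32, bool la_global_with_abs)
{
  if (ilp32 && la_global_with_abs)
    return SKETCH_SEQ_ABS32;
  if (ilp32)
    return SKETCH_SEQ_PCREL32;
  return SKETCH_SEQ_NONE;
}
```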
Append "#pass" to APX tests for targets which pad text sections with NOPs.
* testsuite/gas/i386/x86-64-apx-evex-promoted-bad.d: Append
"#pass".
* testsuite/gas/i386/x86-64-apx-ndd-optimize.d: Likewise.
* testsuite/gas/i386/x86-64-apx-ndd.d: Likewise.
* testsuite/gas/i386/x86-64-apx-pushp-popp-intel.d: Likewise.
* testsuite/gas/i386/x86-64-apx-pushp-popp.d: Likewise.
'/' starts a comment for some targets. Use .byte instead of .insn with
'/'.
* testsuite/gas/i386/x86-64-apx-evex-promoted-bad.s: Use .byte
instead of .insn with '/'.
commit 3d5a60de52556f6a53d71d7e607c6696450ae3e4
Author: H.J. Lu <hjl.tools@gmail.com>
Date: Thu Jun 8 10:01:03 2023 -0700
x86-64: Add R_X86_64_CODE_4_GOTPCRELX
added a new field, fx_tcbit3, to struct fix, but didn't initialize it.
Fix it by clearing it in fix_new_internal.
* write.c (fix_new_internal): Clear fx_tcbit3.
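
The kind of fix involved can be sketched as follows. The struct is a
heavily simplified stand-in for struct fix, and the allocator is
hypothetical; the point is only that memory which is not zeroed on
allocation leaves a newly added bit-field indeterminate unless the
constructor clears it.

```c
#include <stdlib.h>

/* Simplified stand-in; the real struct fix has many more members.  */
struct sketch_fix
{
  unsigned fx_tcbit : 1;
  unsigned fx_tcbit2 : 1;
  unsigned fx_tcbit3 : 1;
  int value;
};

/* Allocate a fixup with malloc, which does not zero memory, and clear
   every target-specific bit explicitly, including the newly added
   one.  */
static struct sketch_fix *
sketch_fix_new (int value)
{
  struct sketch_fix *f = malloc (sizeof *f);

  if (f == NULL)
    return NULL;
  f->fx_tcbit = 0;
  f->fx_tcbit2 = 0;
  f->fx_tcbit3 = 0;
  f->value = value;
  return f;
}
```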