AArch64 or ARM64 is the 64-bit Execution state of the ARM architecture family. It was first introduced with the Armv8-A architecture, and had many extension updates.[1]
Announced in October 2011,[2] ARMv8-A represents a fundamental change to the ARM architecture. It adds an optional 64-bit Execution state, named "AArch64", and the associated new "A64" instruction set, in addition to a 32-bit Execution state, "AArch32", supporting the 32-bit "A32" (original 32-bit Arm) and "T32" (Thumb/Thumb-2) instruction sets. The latter instruction sets provide user-space compatibility with the existing 32-bit ARMv7-A architecture. ARMv8-A allows 32-bit applications to be executed in a 64-bit OS, and a 32-bit OS to be under the control of a 64-bit hypervisor.[3] ARM announced their Cortex-A53 and Cortex-A57 cores on 30 October 2012.[4] Apple was the first to release an ARMv8-A compatible core (Cyclone) in a consumer product (iPhone 5S). AppliedMicro, using an FPGA, was the first to demo ARMv8-A.[5] The first ARMv8-A SoC from Samsung is the Exynos 5433 used in the Galaxy Note 4, which features two clusters of four Cortex-A57 and Cortex-A53 cores in a big.LITTLE configuration; but it will run only in AArch32 mode.[6]
ARMv8-A includes the VFPv3/v4 and advanced SIMD (Neon) as standard features in both AArch32 and AArch64. It also adds cryptography instructions supporting AES, SHA-1/SHA-256 and finite field arithmetic.[7]
An ARMv8-A processor can support one or both of AArch32 and AArch64; it may support AArch32 and AArch64 at lower Exception levels and only AArch64 at higher Exception levels.[8] For example, the ARM Cortex-A32 supports only AArch32,[9] the ARM Cortex-A34 supports only AArch64,[10] and the ARM Cortex-A72 supports both AArch64 and AArch32.[11] An ARMv9-A processor must support AArch64 at all Exception levels, and may support AArch32 at EL0.[8]
AES encrypt/decrypt and SHA-1/SHA-2 hashing instructions also use these registers.
A new exception system:
Fewer banked registers and modes.
Memory translation from 48-bit virtual addresses based on the existing Large Physical Address Extension (LPAE), which was designed to be easily extended to 64-bit.
Extension: Data gathering hint (ARMv8.0-DGH).
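The 48-bit virtual address translation listed above can be illustrated with a small sketch. It assumes a 4 KB translation granule with four lookup levels (9 index bits per level plus a 12-bit page offset), which is only one of the configurations the architecture permits; the function name split_va is made up for this example.

```python
# Sketch: split a 48-bit virtual address into translation-table indices.
# Assumption: 4 KB granule, four lookup levels (9 index bits each) and a
# 12-bit page offset. Other granule sizes split the address differently.

PAGE_OFFSET_BITS = 12
INDEX_BITS = 9
LEVELS = 4
VA_BITS = PAGE_OFFSET_BITS + INDEX_BITS * LEVELS  # 48


def split_va(va: int) -> tuple[list[int], int]:
    """Return ([level 0..3 table indices], page offset) for a 48-bit VA."""
    if not 0 <= va < (1 << VA_BITS):
        raise ValueError("address does not fit in 48 bits")
    offset = va & ((1 << PAGE_OFFSET_BITS) - 1)
    indices = []
    for level in range(LEVELS):
        # Level 0 uses the most significant 9 bits, level 3 the least.
        shift = PAGE_OFFSET_BITS + INDEX_BITS * (LEVELS - 1 - level)
        indices.append((va >> shift) & ((1 << INDEX_BITS) - 1))
    return indices, offset


if __name__ == "__main__":
    print(split_va(0x7FFFFFFFF123))
```

Each index selects one of 512 entries in the table for that level, which is why four levels plus the page offset cover exactly 48 bits in this configuration.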
AArch64 was introduced in ARMv8-A and is included in subsequent versions of ARMv8-A. It was also introduced in ARMv8-R as an option, after its introduction in ARMv8-A; it is not included in ARMv8-M.
Instruction formats
The main opcode for selecting which group an A64 instruction belongs to is at bits 25–28.
A64 instruction formats (fields are listed from bit 31 downward; "x" marks a bit that belongs to a named field)

| Type | Bits 28–25 | Fixed bits and named fields, bit 31 downward |
|---|---|---|
| Reserved | 0000 | 0, op0, 0000, op1 |
| SME | 0000 | 1, op0, 0000, varies |
| Unallocated | 0001 | 0001 |
| SVE | 0010 | 0010, varies |
| Unallocated | 0011 | 0011 |
| Data Processing - Immediate PC-rel. | 1000 | op, immlo, 10000, immhi, Rd |
| Data Processing - Immediate Others | 100x | sf, 100, 01–11, Rd |
| Branches + System Instructions | 101x | op0, 101, op1, op2 |
| Load and Store Instructions | x1x0 | op0, 1, op1, 0, op2, op3, op4 |
| Data Processing - Register | x101 | sf, op0, op1, 101, op2, op3 |
| Data Processing - Floating Point and SIMD | x111 | op0, 111, op1, op2, op3 |
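As a rough illustration of how the main opcode at bits 25–28 selects a group, the sketch below classifies a 32-bit instruction word. The mask/value patterns are derived from the fixed bits in the format table, the group labels are shortened versions of the table's row names, and the names GROUPS and classify are invented for this example rather than taken from any Arm tool.

```python
# Sketch: classify an A64 instruction word by its main opcode (bits 28-25).
# GROUPS and classify() are illustrative names, not part of any Arm tool.

# (mask, value, group): the 4-bit field matches when field & mask == value.
GROUPS = [
    (0b1111, 0b0001, "Unallocated"),
    (0b1111, 0b0010, "SVE"),
    (0b1111, 0b0011, "Unallocated"),
    (0b1110, 0b1000, "Data Processing - Immediate"),               # 100x
    (0b1110, 0b1010, "Branches + System Instructions"),            # 101x
    (0b0101, 0b0100, "Load and Store Instructions"),               # x1x0
    (0b0111, 0b0101, "Data Processing - Register"),                # x101
    (0b0111, 0b0111, "Data Processing - Floating Point and SIMD"), # x111
]


def classify(word: int) -> str:
    """Return the instruction group for a 32-bit A64 instruction word."""
    if not 0 <= word <= 0xFFFFFFFF:
        raise ValueError("A64 instructions are 32 bits wide")
    main = (word >> 25) & 0xF
    if main == 0b0000:
        # Bit 31 separates the SME space from the reserved space.
        return "SME" if (word >> 31) & 1 else "Reserved"
    for mask, value, name in GROUPS:
        if main & mask == value:
            return name
    raise AssertionError("all 16 field values are covered above")


if __name__ == "__main__":
    for word in (0xD503201F, 0x8B020020, 0xF9400020, 0x1E622820):
        print(f"{word:#010x}: {classify(word)}")
```

The four sample words are the usual encodings of NOP, ADD X0, X1, X2, LDR X0, [X1] and FADD D0, D1, D2.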
ARMv8.1-A
In December 2014, ARMv8.1-A,[13] an update with "incremental benefits over v8.0", was announced. The enhancements fell into two categories: changes to the instruction set, and changes to the exception model and memory translation.
Instruction set enhancements included the following:
A set of AArch64 atomic read-write instructions.
Additions to the Advanced SIMD instruction set for both AArch32 and AArch64 to enable opportunities for some library optimizations:
Signed Saturating Rounding Doubling Multiply Accumulate, Returning High Half.
Signed Saturating Rounding Doubling Multiply Subtract, Returning High Half.
The instructions are added in vector and scalar forms.
A set of AArch64 load and store instructions that can provide memory access order that is limited to configurable address regions.
The optional CRC instructions in v8.0 become a requirement in ARMv8.1.
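The two Advanced SIMD additions can be read off their names: double the product, add to or subtract from the accumulator, round, keep the high half, and saturate. The sketch below models one scalar lane in Python. It is an interpretation of that description rather than a copy of the architectural pseudocode, and the function names and the default 16-bit element size are choices made for this example.

```python
# Sketch: one lane of "Signed Saturating Rounding Doubling Multiply
# Accumulate/Subtract, Returning High Half". An interpretation of the
# instruction names, not the architectural pseudocode.

def _saturate(value: int, esize: int) -> int:
    """Clamp value to the signed range of an esize-bit element."""
    lo, hi = -(1 << (esize - 1)), (1 << (esize - 1)) - 1
    return max(lo, min(hi, value))


def _check(esize: int, *values: int) -> None:
    lo, hi = -(1 << (esize - 1)), (1 << (esize - 1)) - 1
    for v in values:
        if not lo <= v <= hi:
            raise ValueError(f"{v} is not a signed {esize}-bit value")


def sqrdmlah(acc: int, a: int, b: int, esize: int = 16) -> int:
    """acc + high half of the rounded, doubled product a*b, saturated."""
    _check(esize, acc, a, b)
    rounding = 1 << (esize - 1)
    total = (acc << esize) + 2 * a * b + rounding
    return _saturate(total >> esize, esize)  # >> floors, also for negatives


def sqrdmlsh(acc: int, a: int, b: int, esize: int = 16) -> int:
    """acc - high half of the rounded, doubled product a*b, saturated."""
    _check(esize, acc, a, b)
    rounding = 1 << (esize - 1)
    total = (acc << esize) - 2 * a * b + rounding
    return _saturate(total >> esize, esize)


if __name__ == "__main__":
    half = 1 << 14  # 0.5 in Q15 fixed point
    print(sqrdmlah(0, half, half))  # 0.5 * 0.5 accumulated onto 0
```

With 16-bit elements treated as Q15 fixed-point fractions, this is the multiply-accumulate pattern common in DSP-style library code, which fits the stated aim of enabling library optimizations.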
Enhancements for the exception model and memory translation system included the following:
A new Privileged Access Never (PAN) state bit provides control that prevents privileged access to user data unless explicitly enabled.
An increased VMID range for virtualization; supports a larger number of virtual machines.
Optional support for hardware update of the page table access flag, and the standardization of an optional, hardware updated, dirty bit mechanism.
The Virtualization Host Extensions (VHE). These enhancements improve the performance of Type 2 hypervisors by reducing the software overhead associated with transitioning between the Host and Guest operating systems. The extensions allow the Host OS to execute at EL2, as opposed to EL1, without substantial modification.
A mechanism to free up some translation table bits for operating system use, where the hardware support is not needed by the OS.
The Scalable Vector Extension (SVE) is "an optional extension to the ARMv8.2-A architecture and newer" developed specifically for vectorization of high-performance computing scientific workloads.[16][17] The specification allows for variable vector lengths to be implemented from 128 to 2048 bits. The extension is complementary to, and does not replace, the NEON extensions.
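Because the vector length is chosen by the implementation, SVE code is usually written so that the same loop is correct for any length. The sketch below simulates that idea in plain Python: a per-lane predicate switches off the lanes that would run past the end of the array. The function name vla_add, the 32-bit element size, and the requirement that the length be a multiple of 128 bits are assumptions of this example.

```python
# Sketch: a vector-length-agnostic loop, simulated in plain Python.
# vla_add and its parameters are illustrative; real SVE code would use
# predicate registers and predicated loads, adds and stores.

def vla_add(x, y, vector_bits=512, element_bits=32):
    """Add two equal-length lists, one simulated vector at a time."""
    if len(x) != len(y):
        raise ValueError("inputs must have the same length")
    if not 128 <= vector_bits <= 2048 or vector_bits % 128:
        # Assumption of this sketch: lengths are multiples of 128 bits.
        raise ValueError("vector length must be 128..2048 bits, step 128")
    lanes = vector_bits // element_bits
    out = [0] * len(x)
    i = 0
    while i < len(x):
        # Predicate: lane k is active only while i + k is inside the array.
        predicate = [i + k < len(x) for k in range(lanes)]
        for k, active in enumerate(predicate):
            if active:
                out[i + k] = x[i + k] + y[i + k]
        i += lanes  # advance by however many lanes this "hardware" has
    return out


if __name__ == "__main__":
    data = list(range(10))
    print(vla_add(data, data, vector_bits=128))
    print(vla_add(data, data, vector_bits=2048))
```

Both calls produce the same result; only the number of loop iterations changes with the vector length, which is the property that lets one binary run on implementations of different widths.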
A 512-bit SVE variant has already been implemented on the Fugaku supercomputer using the Fujitsu A64FX ARM processor; this computer[18] was the fastest supercomputer in the world for two years, from June 2020[19] to May 2022.[20] A more flexible version, 2x256 SVE, was implemented by the AWS Graviton3 ARM processor.
SVE is supported by the GCC compiler, with GCC 8 supporting automatic vectorization[17] and GCC 10 supporting C intrinsics. As of July 2020, LLVM and clang support C and IR intrinsics. ARM's own fork of LLVM supports auto-vectorization.[21]
ARMv8.3-A
In October 2016, ARMv8.3-A was announced. Its enhancements fell into six categories:[22]
Pointer authentication[23] (AArch64 only); mandatory extension (based on a new block cipher, QARMA[24]) to the architecture (compilers need to exploit the security feature, but as the instructions are in NOP space, they are backwards compatible albeit providing no extra security on older chips).
Nested virtualization (AArch64 only).
Advanced SIMD complex number support (AArch64 and AArch32); e.g. rotations by multiples of 90 degrees.
New FJCVTZS (Floating-point JavaScript Convert to Signed fixed-point, rounding toward Zero) instruction.[25]
A change to the memory consistency model (AArch64 only); to support the (non-default) weaker RCpc (Release Consistent processor consistent) model of C++11/C11 (the default C++11/C11 consistency model was already supported in previous ARMv8).
ID mechanism support for larger system-visible caches (AArch64 and AArch32).
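The FJCVTZS item in the list above targets the double-to-int32 conversion that JavaScript engines perform constantly. The sketch below shows that conversion as the ECMAScript ToInt32 operation defines it: truncate toward zero, wrap modulo 2^32, and reinterpret as signed. It models only the integer result, not the flags the instruction sets, and the name js_to_int32 is made up for this example.

```python
# Sketch: the JavaScript-style double-to-int32 conversion (ECMAScript
# ToInt32) that motivates FJCVTZS. Only the integer result is modelled.
import math


def js_to_int32(x: float) -> int:
    """Truncate toward zero, wrap modulo 2**32, reinterpret as signed."""
    if math.isnan(x) or math.isinf(x):
        return 0
    n = math.trunc(x)         # rounding toward zero
    n &= 0xFFFFFFFF           # wrap modulo 2**32 (works for negatives too)
    return n - (1 << 32) if n >= (1 << 31) else n


if __name__ == "__main__":
    for value in (3.7, -3.7, 2.0 ** 31, 2.0 ** 32 + 5.5, float("nan")):
        print(value, "->", js_to_int32(value))
```

An ordinary saturating float-to-integer conversion would clamp out-of-range inputs instead of wrapping them, so without a dedicated instruction an engine needs extra code on the out-of-range path.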
ARMv8.3-A architecture is now supported by (at least) the GCC 7 compiler.[26]
ARMv8.4-A
In November 2017, ARMv8.4-A was announced. Its enhancements fell into these categories:[27][28][29]
Branch Target Indicators (BTI) (AArch64) to reduce "the ability of an attacker to execute arbitrary code". Like pointer authentication, the relevant instructions are no-ops on earlier versions of ARMv8-A.
Random Number Generator instructions – "providing Deterministic and True Random Numbers conforming to various National and International Standards".
On 2 August 2019, Google announced Android would adopt Memory Tagging Extension (MTE).[34]
In March 2021, ARMv9-A was announced. ARMv9-A's baseline is all the features from ARMv8.5.[35][36][37] ARMv9-A also adds:
Scalable Vector Extension 2 (SVE2). SVE2 builds on SVE's scalable vectorization for increased fine-grain Data Level Parallelism (DLP), to allow more work done per instruction. SVE2 aims to bring these benefits to a wider range of software including DSP and multimedia SIMD code that currently use Neon.[38] The LLVM/Clang 9.0 and GCC 10.0 development codes were updated to support SVE2.[38][39]
SIMD matrix manipulation instructions, BFDOT, BFMMLA, BFMLAL and BFCVT.
Enhancements for virtualization, system management and security.
And the following extensions (that LLVM 11 already added support for[43]):
Enhanced Counter Virtualization (ARMv8.6-ECV).
Fine-Grained Traps (ARMv8.6-FGT).
Activity Monitors virtualization (ARMv8.6-AMU).
For example, fine-grained traps, Wait-for-Event (WFE) instructions, EnhancedPAC2 and FPAC. The bfloat16 extensions for SVE and Neon are mainly for deep learning use.[44]
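The bfloat16 format mentioned above keeps the sign and 8-bit exponent of a 32-bit float and shortens the fraction to 7 bits, so a conversion amounts to keeping the top 16 bits with rounding. The sketch below assumes round-to-nearest-even and quiets NaNs; the exact behaviour of BFCVT under other rounding-mode settings is not modelled, and both function names are invented for this example.

```python
# Sketch: float32 <-> bfloat16 bit conversion with round-to-nearest-even.
# Assumption: default rounding; function names are illustrative only.
import struct


def float_to_bf16_bits(x: float) -> int:
    """Return the 16-bit bfloat16 pattern for x (converted via float32)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    is_nan = (bits & 0x7F800000) == 0x7F800000 and (bits & 0x007FFFFF) != 0
    if is_nan:
        return (bits >> 16) | 0x0040  # keep it a NaN after truncation
    # Add 0x7FFF, plus 1 if the kept low bit is set, so ties go to even.
    rounding = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding) >> 16) & 0xFFFF


def bf16_bits_to_float(h: int) -> float:
    """Widen a bfloat16 pattern to float by appending 16 zero bits."""
    return struct.unpack("<f", struct.pack("<I", (h & 0xFFFF) << 16))[0]


if __name__ == "__main__":
    for value in (1.0, 3.14159, -2.0):
        h = float_to_bf16_bits(value)
        print(f"{value} -> {h:#06x} -> {bf16_bits_to_float(h)}")
```

Keeping the full float32 exponent range while giving up fraction bits suits workloads that tolerate low precision but need wide dynamic range, which matches the deep learning use noted above.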
ARMv8.7-A and ARMv9.2-A
In September 2020, ARMv8.7-A was announced. Its enhancements fell into these categories:[30][45]
Scalable Matrix Extension (SME) (ARMv9.2 only).[46] SME adds new features to process matrices efficiently, such as:
Matrix tile storage.
On-the-fly matrix transposition.
Load/store/insert/extract tile vectors.
Matrix outer product of SVE vectors.
"Streaming mode" SVE.
Enhanced support for PCIe hot plug (AArch64).
Atomic 64-byte loads and stores to accelerators (AArch64).
Wait For Instruction (WFI) and Wait For Event (WFE) with timeout (AArch64).
Branch-Record recording (ARMv9.2 only).
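The "matrix outer product of SVE vectors" item among the SME features is what makes tile storage useful: a matrix product can be built by accumulating one outer product per step into a tile. The sketch below shows that decomposition with plain Python lists. It illustrates the arithmetic only and says nothing about SME instruction syntax or tile sizes; both function names are hypothetical.

```python
# Sketch: matrix multiplication as a sum of outer products accumulated
# into a tile. Pure-Python illustration; names are hypothetical.

def outer_product_accumulate(tile, column, row):
    """tile[i][j] += column[i] * row[j] for every element of the tile."""
    for i, a in enumerate(column):
        for j, b in enumerate(row):
            tile[i][j] += a * b


def matmul_by_outer_products(A, B):
    """Compute A @ B with one outer-product accumulation per inner index."""
    m, k, n = len(A), len(B), len(B[0])
    if any(len(r) != k for r in A) or any(len(r) != n for r in B):
        raise ValueError("matrix shapes do not match")
    tile = [[0] * n for _ in range(m)]          # the accumulator tile
    for p in range(k):
        column = [A[i][p] for i in range(m)]    # p-th column of A
        outer_product_accumulate(tile, column, B[p])  # p-th row of B
    return tile


if __name__ == "__main__":
    print(matmul_by_outer_products([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

Each step reads one column of A and one row of B and updates every tile element, so the work per loaded element grows with the tile size.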
ARMv8.8-A and ARMv9.3-A
In September 2021, ARMv8.8-A and ARMv9.3-A were announced. Their enhancements fell into these categories:[30][47]
Non-maskable interrupts (AArch64).
Instructions to optimize memcpy() and memset() style operations (AArch64).
In September 2022, ARMv8.9-A and ARMv9.4-A were announced, including:[49]
Virtual Memory System Architecture (VMSA) enhancements.
Permission indirection and overlays.
Translation hardening.
128-bit translation tables (ARMv9 only).
Scalable Matrix Extension 2 (SME2) (ARMv9 only).
Multi-vector instructions.
Multi-vector predicates.
2b/4b weight compression.
1b binary networks.
Range Prefetch.
Guarded Control Stack (GCS) (ARMv9 only).
Confidential Computing.
Memory Encryption Contexts.
Device Assignment.
ARM-R (real-time architecture)
Optional AArch64 support was added to the Armv8-R profile, with the first Arm core implementing it being the Cortex-R82.[50] It adds the A64 instruction set, with some changes to the memory barrier instructions.[51]
References
"Overview". Learn the architecture: Understanding the Armv8.x and Armv9.x extensions.
"GCC 7 Release Series – Changes, New Features, and Fixes". The ARMv8.3-A architecture is now supported. It can be used by specifying the -march=armv8.3-a option. [..] The option -msign-return-address= is supported to enable return address protection using ARMv8.3-A Pointer Authentication Extensions.