MERCK HUNG <MERCKHUNG@GMAIL.COM>
ARM V8 INSTRUCTION OVERVIEW
& 64-BIT ANDROID BRIEFING
OUTLINE
• Benefit of 64-bit
• Drawback of 64-bit
• 64-bit Android ecosystem
• 64-bit ELF file format
• AARCH32 & AARCH64 states
• AARCH64 register files
• AARCH64 calling convention
• AARCH64 runtime ABI (AEABI)
• AARCH64 instructions
• Summary of AARCH64
• State of 64-bit Android
Benefit of 64-bit
BENEFIT OF 64-BIT
• Larger virtual address space (AARCH64 -> 49 bits)
(Eases the overlay workarounds that large programs need on 32-bit)
• Wider memory access bus
(Fewer transfers for the same amount of data)
• Wider registers
(Better for 64-bit and longer arithmetic)
• More registers (Reduces register spilling)
• New instruction set & new wider I/O peripherals
(New features)
• New marketing momentum
• Opportunities to reduce power consumption
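To put the address-space bullet in numbers, the following is a minimal sketch that compares a 32-bit space with the 49-bit figure quoted above. The function names are made up for this illustration.

```c
#include <stdint.h>
#include <stdio.h>

/* Size in bytes of a flat address space that is `bits` wide (bits < 64). */
uint64_t address_space_bytes(unsigned bits)
{
    return (uint64_t)1 << bits;
}

/* Print both sizes in GiB so they can be compared directly. */
void print_address_space_comparison(void)
{
    const uint64_t gib = (uint64_t)1 << 30;
    printf("32-bit: %llu GiB\n",
           (unsigned long long)(address_space_bytes(32) / gib));
    printf("49-bit: %llu GiB\n",
           (unsigned long long)(address_space_bytes(49) / gib));
}
```

Calling print_address_space_comparison() reports 4 GiB against 524288 GiB (512 TiB).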
Drawback of 64-bit
DRAWBACK OF 64-BIT
• Larger size of program files
(On disk format)
• Larger size of pointers
(In memory format)
(32-bit -> 64-bit, 4 bytes -> 8 bytes in size)
• Mode switching overhead
(T32, A32, and A64 execution environment)
• Ecosystem migration & backward compatibility
(Upgrade to 64-bit, compiler, software vendors,
validations, time to market, SoC vendors, …etc.)
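The pointer-size drawback above can be sketched with two layouts of the same linked-list node. Fixed-width integers stand in for the pointers so that both layouts can be inspected on one machine; the struct and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Node as laid out with 32-bit pointers: a 4-byte link plus a 4-byte value. */
struct node32 {
    uint32_t next;   /* stands in for a 4-byte pointer */
    int32_t  value;
};

/* Same node with 64-bit pointers: the link doubles in size, and on typical
   64-bit ABIs the struct is also padded to the link's alignment. */
struct node64 {
    uint64_t next;   /* stands in for an 8-byte pointer */
    int32_t  value;
};

/* Memory needed for `count` nodes of a given size. */
size_t list_bytes(size_t count, size_t node_size)
{
    return count * node_size;
}
```

On common 64-bit ABIs sizeof(struct node64) is 16 against 8 for struct node32, so a pointer-heavy data structure can take roughly twice the memory even though the payload is unchanged.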
64-bit Android ecosystem
64-BIT ANDROID ECOSYSTEM
• 64-bit CPUs, SoCs & Reference Designs
(ARM, Apple, Samsung, nVidia, QCOM, Intel, …etc.)
• 64-bit Compiler (Linaro, LLVM/GNU and community)
• On Disk Format (ELF64 format)
• In Memory Format (Program Loader/Linker)
• Runtime (Linux kernel itself, SYSCALL, ART/DALVIK,
Calling convention)
• 64-bit native core, framework, shared/static library,
and SW vendor support (Mono, AIR, Unity, …etc.)
• Marketing & time to popularity of 64-bit APPs
• Validation efforts of 64-bit system & APPs
64-bit ELF file format
64-BIT ELF FILE FORMAT
• ELF for the ARM 64-bit Architecture (AArch64)
(http://infocenter.arm.com/help/topic/com.arm.doc.ihi0056b/IHI0056B_aaelf64.pdf)
• 64-bit header format (EM_AARCH64,
SHT_AARCH64_ATTRIBUTES, …etc.)
• Larger GOT/PLT entries & addresses
• New relocation types for 64-bit
• 64-bit DWARF format of debugging info.
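As an illustration of the header fields listed above, here is a minimal sketch that checks whether a byte buffer starts with a little-endian ELF64 header for AArch64. The offsets and values (class byte at offset 4, data-encoding byte at offset 5, e_machine at offset 18, EM_AARCH64 = 183) follow the ELF specification; the K_-prefixed constants and the function name are hypothetical and chosen to avoid clashing with a system <elf.h>.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum {
    K_ELFCLASS64  = 2,    /* e_ident[4]: 64-bit objects       */
    K_ELFDATA2LSB = 1,    /* e_ident[5]: little-endian data   */
    K_EM_AARCH64  = 183   /* e_machine value for AArch64      */
};

/* Returns 1 if buf begins with a little-endian ELF64 header whose
   e_machine field is EM_AARCH64, otherwise 0. */
int is_aarch64_elf64(const unsigned char *buf, size_t len)
{
    if (len < 20)
        return 0;                               /* too short for e_machine */
    if (memcmp(buf, "\x7f" "ELF", 4) != 0)
        return 0;                               /* bad magic number */
    if (buf[4] != K_ELFCLASS64 || buf[5] != K_ELFDATA2LSB)
        return 0;
    /* e_machine is a 16-bit little-endian field at byte offset 18. */
    uint16_t machine = (uint16_t)(buf[18] | (buf[19] << 8));
    return machine == K_EM_AARCH64;
}
```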
AARCH32 (T32/A32) & AARCH64 (A64) states
AARCH32 & AARCH64 STATES
(T32 AND A32 MODES)
AARCH64 register files
AARCH64 REGISTER FILES
• 64-BIT ARM INTRODUCTION TO PORTING
(http://people.linaro.org/~rikuvoipio/aarch64-talk/)
• Integral registers
X0 X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 X14
X15 X16 X17 X18 X19 X20 X21 X22 X23 X24 X25 X26
X27 X28 X29 X30/LR SP/ZERO
AARCH64 REGISTER FILES
• SCALAR/SIMD REGISTERS
32 bit float registers: S0 ... S31
64 bit double registers: D0 ... D31
128 bit SIMD registers: V0 ... V31
SIMD and Scalar share register bank
S0 is bottom 32 bits of D0 which is the bottom 64 bits of
V0.
• There are 32 S registers and 32 D registers. The S registers
are not packed into D registers, but occupy the low 32
bits of the corresponding D register.
For example S31=D31<31:0>, not D15<63:32>
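The difference between the non-packed AArch64 view and the packed AArch32 view can be modelled in a few lines. This is only a software model of the register bank (an array of 64-bit values standing in for D0..D31); the function names are hypothetical.

```c
#include <stdint.h>

/* AArch64 view: Sn is the low 32 bits of Dn, same register index. */
uint32_t a64_read_s(const uint64_t d[32], unsigned n)
{
    return (uint32_t)(d[n] & 0xFFFFFFFFu);
}

/* AArch32 (packed) view: S(2k) and S(2k+1) are the low and high halves
   of Dk, so S0..S31 overlay D0..D15. */
uint32_t a32_read_s(const uint64_t d[32], unsigned n)
{
    uint64_t dk = d[n / 2];
    return (n & 1) ? (uint32_t)(dk >> 32) : (uint32_t)dk;
}
```

With d[31] = 0x1111111122222222 and d[15] = 0xAAAAAAAABBBBBBBB, a64_read_s(d, 31) yields 0x22222222 (D31<31:0>), while a32_read_s(d, 31) yields 0xAAAAAAAA (D15<63:32>).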
AARCH64 REGISTER FILES
• Introducing the 64-bit ARMv8 Architecture
http://andrew.wafaa.eu/files/EuroBSDConARMv8.pdf
AARCH64 REGISTER FILES
(HIGHLIGHT)
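• Register number 31 encodes either the zero register (XZR/WZR) or the stack pointer (SP), depending on the instruction
• PC (Program Counter) is not a general-purpose register and cannot be named as an operand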
• Zero extended to 64-bits in A32 mode
• General purpose registers extended from 15 to 31
• FP registers: still 32 of them, but no longer packed
AARCH64 calling convention
AARCH64 CALLING CONVENTION
• Procedure Call Standard for the ARM 64-bit
Architecture (AArch64)
http://infocenter.arm.com/help/topic/com.arm.doc.ihi0055b/IHI0055B_aapcs64.pdf
AARCH64 CALLING CONVENTION
• Procedure Call Standard for the ARM® Architecture
http://infocenter.arm.com/help/topic/com.arm.doc.ihi0042e/IHI0042E_aapcs.pdf
AARCH64 runtime ABI (AEABI)
AARCH64 RUNTIME ABI (AEABI)
• Run-time ABI for the ARM® Architecture
http://infocenter.arm.com/help/topic/com.arm.doc.ihi0043d/IHI0043D_rtabi.pdf
• C++ Application Binary Interface Standard for the
ARM 64-bit Architecture
http://infocenter.arm.com/help/topic/com.arm.doc.ihi0059b/IHI0059B_cppabi64.pdf
• Floating-point library (Removed; floating-point hardware is built in)
• Long long helper functions (Removed; 64-bit arithmetic is native)
• Other C and assembly language helper functions (Kept)
• C++ helper functions (Kept; C++ ABI reinforced)
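A small example of why the long long helpers can go away: the two functions below are plain C 64-bit division and remainder. When targeting 32-bit ARM EABI, compilers typically lower them to a call to a run-time helper such as __aeabi_ldivmod; when targeting AArch64, they can typically be done with the native 64-bit divide instruction. The function names are made up for this sketch, and the exact code generated depends on the compiler and its options.

```c
#include <stdint.h>

/* 64-bit signed quotient. Caller must ensure d != 0. */
int64_t quot64(int64_t n, int64_t d)
{
    return n / d;
}

/* 64-bit signed remainder. Caller must ensure d != 0. */
int64_t rem64(int64_t n, int64_t d)
{
    return n % d;
}
```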
AARCH64 RUNTIME ABI (AEABI)
(LONG LONG HELPERS)
AARCH64 instructions
AARCH64 INSTRUCTIONS
• Conditional instructions
• Addressing features
1. Register indexed
2. PC-relative
• The program counter
• Memory Load-Store
1. Bulk transfers
2. Exclusive accesses
3. Load-Acquire, Store-Release
• Integer Multiply/Divide (for addressing 64-bit)
AARCH64 INSTRUCTIONS
(CONDITIONAL INSTRUCTIONS)
• Modern branch predictors work well enough
• Conditional execution of every instruction no longer justifies its OPCODE space and implementation COST
• Only a very small set of “conditional data processing” instructions is provided
1. Conditional branch
2. Add/subtract (with carry)
3. Conditional select with increment, negate or invert (Select (move) or Set)
4. Conditional compare
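The C patterns below are the kind of source code that the remaining conditional instructions are aimed at. The comments name the A64 instruction a compiler may choose for each; that mapping is an expectation, not a guarantee, and all function names are hypothetical.

```c
#include <stdint.h>

/* Plain select between two values: a candidate for CSEL. */
int select_max(int a, int b)
{
    return (a > b) ? a : b;
}

/* Select with increment of the second operand: a candidate for CSINC. */
uint32_t select_or_increment(int cond, uint32_t x, uint32_t y)
{
    return cond ? x : y + 1u;
}

/* Turning a condition into 0 or 1: a candidate for CSET. */
int is_equal(int a, int b)
{
    return a == b;
}

/* Two chained comparisons: a candidate for CMP followed by CCMP. */
int in_range(int x, int lo, int hi)
{
    return x >= lo && x <= hi;
}
```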
AARCH64 INSTRUCTIONS
(ADDRESSING FEATURES)
• Register indexed addressing
Extended T32 addressing modes, allowing 64-bit
index and base registers to obtain addresses
• PC-relative addressing
PC-relative literal loads (+- 1MB)
Most conditional branches (+- 1MB)
Unconditional branches (+- 128MB)
PC-relative load/store by only 2 instructions (+- 4GB)
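The ranges above follow from the width of the signed immediate field and its scaling. A worked sketch, assuming the A64 encodings use a 19-bit immediate scaled by 4 bytes for literal loads and conditional branches, a 26-bit immediate scaled by 4 bytes for unconditional branches, and a 21-bit immediate scaled by 4 KB pages for ADRP (the function name is hypothetical):

```c
#include <stdint.h>

/* Reach in bytes (backward direction) of a signed immediate that is
   `bits` wide and counts units of `scale` bytes. The forward reach is
   one unit less. */
int64_t signed_reach(unsigned bits, int64_t scale)
{
    return ((int64_t)1 << (bits - 1)) * scale;
}
```

signed_reach(19, 4) gives 1 MB, signed_reach(26, 4) gives 128 MB, and signed_reach(21, 4096) gives 4 GB, matching the figures on the slide.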
AARCH64 INSTRUCTIONS
(THE PROGRAM COUNTER)
• In AARCH32, R15 = PC; writing to R15 changes the program counter
• In AARCH64, the PC is not a general-purpose register (register 15 is an ordinary register), so it cannot be changed by writing to a register
• In AARCH64, the PC can only be read by instructions that compute a PC-relative address (ADR, ADRP, literal load, and direct branch) and by branch-and-link instructions (BL and BLR)
• In AARCH64, the PC can only be written by conditional/unconditional branches and by exception generation/return
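To show how a PC-relative address is formed without reading the PC into a register, here is a small software model of the ADRP-plus-offset idiom: ADRP produces the 4 KB page of the target relative to the page of the current PC, and a following instruction adds the low 12 bits. This is a model of the arithmetic only, and the function names are hypothetical.

```c
#include <stdint.h>

/* Model of ADRP: clear the low 12 bits of the PC, then add a signed
   page count shifted left by 12. */
uint64_t adrp(uint64_t pc, int64_t page_delta)
{
    return (pc & ~(uint64_t)0xFFF) + ((uint64_t)page_delta << 12);
}

/* Model of the two-instruction sequence: ADRP for the page, then an
   add (or a load/store offset) for the low 12 bits of the target. */
uint64_t materialize_address(uint64_t pc, uint64_t target)
{
    int64_t  page_delta = (int64_t)(target >> 12) - (int64_t)(pc >> 12);
    uint64_t lo12       = target & 0xFFF;
    return adrp(pc, page_delta) + lo12;
}
```

For any pc and target whose pages are within the ADRP range, materialize_address(pc, target) returns target, in either direction.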
AARCH64 INSTRUCTIONS
(MEMORY LOAD-STORE)
• Bulk transfers
1. LDM, STM, PUSH, and POP removed
2. LDP and STP added (Paired dest. registers)
3. LDNP and STNP added (streaming and non-temporal)
4. PRFM (prefetch memory) added
• Exclusive accesses (atomic operations)
• Load-acquire, Store-release
(Release-consistency, RCsc), reducing the need for
explicit memory barriers
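In C, load-acquire and store-release are reachable through C11 atomics. The sketch below publishes a value with a release store and reads it with an acquire load; on AArch64, compilers typically implement these with the load-acquire and store-release instructions instead of separate barrier instructions, though the exact code depends on the compiler and target options. The variable and function names are hypothetical.

```c
#include <stdatomic.h>

static int        payload;   /* ordinary data being handed over        */
static atomic_int ready;     /* flag, zero-initialised: 0 = not ready  */

/* Producer: write the data, then set the flag with release semantics so
   the payload write cannot be reordered after the flag write. */
void publish(int value)
{
    payload = value;
    atomic_store_explicit(&ready, 1, memory_order_release);
}

/* Consumer: read the flag with acquire semantics; if it is set, the
   payload written before the release store is guaranteed to be visible.
   Returns 1 and stores the payload in *out on success, 0 otherwise. */
int try_consume(int *out)
{
    if (atomic_load_explicit(&ready, memory_order_acquire) == 0)
        return 0;
    *out = payload;
    return 1;
}
```

In real use publish() and try_consume() would run on different threads; the pairing of the release store with the acquire load is what makes the hand-over safe without an explicit barrier.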
AARCH64 INSTRUCTIONS
(INTEGER MULTIPLY/DIVIDE)
Summary of AARCH64
SUMMARY OF AARCH64
• New instruction set (decoding) & 32-bit fixed length
• More registers (31 GP, 32 FP/SIMD)
• 64-bit pointer and integral registers
• Interoperability of AARCH32 (T32/A32) & AARCH64
• VFP and Advanced SIMD are mandatory (built-in)
• LDM/STM removed, LDP/STP added
• Conditional instructions are reduced, few left
• PC-relative addressing
• Memory ordering (new LDAR/STLR, Load-Acquire/Store-Release)
State of 64-bit Android
STATE OF 64-BIT ANDROID
• ARM64 CPUs, SoCs & Reference Designs
1. Samsung Exynos 5433 (Samsung Galaxy Note 4)
2. Qualcomm Snapdragon 410 (MSM8916) (Upcoming)
3. NVIDIA Tegra K1 (Nexus 9)
• X86_64 CPUs, SoCs & Reference Designs
1. Intel Bay Trail-T Atom SoC
2. Intel Moorefield Atom SoC
• ARM64 Compiler
GCC (Ready, by Linaro and communities)
LLVM (Ready, by Apple for iOS development)
• X86_64 Compiler
GCC and LLVM (Ready)
STATE OF 64-BIT ANDROID
• ELF64 Format for ARM64 (On Disk)
ARM64 (Ready)
X86_64 (Ready)
• ELF64 Program Loader/Linker (In Memory)
ARM64 GNU Linker (Ready, by Linaro)
x86_64 GNU Linker (Ready)
• 64-bit Calling Convention
ARM64 (Ready)
X86_64 (Ready)
• 64-bit Linux Kernel & ABI
ARM64 (Ready, by Linaro)
X86_64 (Ready)
STATE OF 64-BIT ANDROID
• 64-bit Android ART Runtime
ARM64 (Ready)
X86_64 (Ready)
• 64-bit Android Emulator
ARM64 (Ready)
X86_64 (Ready)
• 64-bit native core, framework, shared/static library
ARM64 (Ready)
X86_64 (Ready)
STATE OF 64-BIT ANDROID
• 64-bit Android software infrastructure
(Mono, AIR, Unity, …etc.)
ARM64 (Not available yet)
X86_64 (Not available yet)
• 64-bit Android APPs
ARM64 (Not available yet)
X86_64 (Not available yet)
• RenderScript
ARM64 (Not available yet)
X86_64 (Not available yet)
THANK YOU
