ARM 64bit has come!

A first impression of the A64 instruction set.

  1. ARM 64bit has come!
     Tetsuyuki Kobayashi
     2014.5.23 Japan Technical Jamboree
     2014.5.25 Updated for カーネル/VM 探検隊 (Kernel/VM Expedition)
  2. The latest version of this slide deck is available here: http://www.slideshare.net/tetsu.koba/presentations
  3. Who am I?
     - 20+ years involved in embedded systems
       - 10 years in real-time OSes such as iTRON
       - 10 years in embedded Java virtual machines
       - Now GCC, Linux, QEMU, Android, …
     - Blogs
       - http://d.hatena.ne.jp/embedded/ (personal)
       - http://blog.kmckk.com/ (corporate)
       - http://kobablog.wordpress.com/ (English)
     - Twitter: @tetsu_koba
  4. Today's topics
     - Introduction to ARM 64bit
       - Not comprehensive; only the parts I find interesting :)
     - Trying AArch64 with QEMU
  5. ARMv8 terminology
     - AArch64: 64-bit execution state
       - One instruction set: A64
       - A64: 32-bit fixed-length instructions
     - AArch32: 32-bit execution state
       - Backward compatible with the ARMv7-A architecture
       - Two instruction sets: A32 and T32
       - A32: ARM, 32-bit fixed-length instructions
       - T32: Thumb2, 16-bit/32-bit instructions
  6. "ARM64" is not an official name
     - It is the name used in the Linux kernel source: arch/arm64
  7. Exception levels
     - Four levels; typical usage:
       - EL0: user applications
       - EL1: OS kernel
       - EL2: hypervisor
       - EL3: secure monitor
     - Switching between AArch64 and AArch32 happens only when the exception level changes
     - Cf. PL0–PL2 (privilege levels) in ARMv7
  8. AArch64 execution model
     - R0–R30: 64-bit general-purpose registers
       - Xn: the full 64 bits
       - Wn: the lower 32 bits
     - Register number 31 means either the zero register (XZR, WZR) or SP, depending on the instruction
     - SP: stack pointer
       - Must be 16-byte aligned
       - WSP for the lower 32 bits
     - PC: program counter
       - Not a general-purpose register; it cannot be used as the destination of data-processing instructions
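To make the Wn/Xn relationship concrete, here is a minimal C model of the general-purpose register file. All names (`gpr_file`, `read_x`, `write_w`, ...) are hypothetical, and it assumes an instruction where encoding 31 means the zero register rather than SP. It also encodes one rule not spelled out on the slide: writing Wn zero-extends the result into the upper 32 bits of Xn (which is why `mov w19, #0x5` in Sample #1 leaves a clean 64-bit value).

```c
#include <stdint.h>

/* Hypothetical model of the AArch64 general-purpose registers X0-X30.
   Encoding 31 is not stored: here it always acts as XZR/WZR. */
typedef struct {
    uint64_t x[31];
} gpr_file;

/* Read Xn; register 31 reads as zero (XZR). */
static uint64_t read_x(const gpr_file *r, unsigned n)
{
    return n == 31 ? 0 : r->x[n];
}

/* Wn is the lower 32 bits of Xn. */
static uint32_t read_w(const gpr_file *r, unsigned n)
{
    return (uint32_t)read_x(r, n);
}

/* Write Xn; writes to register 31 (XZR) are discarded. */
static void write_x(gpr_file *r, unsigned n, uint64_t v)
{
    if (n != 31)
        r->x[n] = v;
}

/* Writing Wn zero-extends into the whole 64-bit Xn. */
static void write_w(gpr_file *r, unsigned n, uint32_t v)
{
    write_x(r, n, (uint64_t)v);
}
```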
  9. AArch64 execution model (cont.)
     - V0–V31: 128-bit registers
       - For floating point and SIMD
       - AArch64 always has an FPU, so there is no calling standard for soft-float
       - Scalar views: Bn, Hn, Sn, Dn, Qn (8, 16, 32, 64, 128 bits)
       - Vector views: Vn.8B, Vn.16B, Vn.4H, Vn.8H, Vn.2S, Vn.4S, Vn.1D, Vn.2D
     - FPCR: Floating-point Control Register
     - FPSR: Floating-point Status Register
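The vector arrangement names are just different lane views of the same 128-bit register. A sketch as a C union, assuming a little-endian host (matching little-endian AArch64 lane order); `vreg` and `write_s` are hypothetical names. It also reflects the AArch64 rule that writing a scalar such as Sn clears the remaining upper bits of the V register.

```c
#include <stdint.h>

/* One 128-bit V register seen through its lane arrangements.
   Assumption: little-endian host, so lane 0 is the lowest-addressed element. */
typedef union {
    uint8_t  b[16]; /* Vn.16B (Vn.8B uses b[0..7]); Bn is b[0] */
    uint16_t h[8];  /* Vn.8H  (Vn.4H uses h[0..3]); Hn is h[0] */
    uint32_t s[4];  /* Vn.4S  (Vn.2S uses s[0..1]); Sn is s[0] */
    uint64_t d[2];  /* Vn.2D  (Vn.1D uses d[0]);    Dn is d[0] */
} vreg;

/* Writing scalar Sn (given as raw bits) zeroes bits 32..127 of the register. */
static void write_s(vreg *v, uint32_t bits)
{
    v->d[0] = 0;
    v->d[1] = 0;
    v->s[0] = bits;
}
```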
  10. AArch64 addressing model
      - Without a tag: 64-bit virtual address
      - With a tag: 8-bit tag + 56-bit virtual address
        - The tag is ignored by loads, stores and branches
        - Useful for implementing dynamically typed languages
      - The effective virtual address length is 48 bits
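A sketch of the pointer arithmetic a language runtime might use for top-byte tags, assuming the 8-bit tag / 56-bit address split from the slide. The helper names are hypothetical, and `strip_tag` assumes the address hardware sees is the low 56 bits sign-extended from bit 55. It is pure integer arithmetic; actually dereferencing tagged pointers also depends on the OS enabling tagged addressing.

```c
#include <stdint.h>

#define TAG_SHIFT 56
#define ADDR_MASK ((UINT64_C(1) << TAG_SHIFT) - 1) /* low 56 bits */

/* Put an 8-bit tag (e.g. a type code) into the top byte. */
static uint64_t add_tag(uint64_t addr, uint8_t tag)
{
    return (addr & ADDR_MASK) | ((uint64_t)tag << TAG_SHIFT);
}

static uint8_t get_tag(uint64_t tagged)
{
    return (uint8_t)(tagged >> TAG_SHIFT);
}

/* Address with the tag ignored: bit 55 is replicated into the top byte,
   so low (user) addresses get 0x00 and high (kernel) addresses 0xff. */
static uint64_t strip_tag(uint64_t tagged)
{
    uint64_t low = tagged & ADDR_MASK;
    return (low & (UINT64_C(1) << 55)) ? (low | ~ADDR_MASK) : low;
}
```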
  11. Calling standard (AAPCS64)
      - R30 = LR (link register)
      - R29 = FP (frame pointer)
      - Parameter passing
        - R0–R7 for integers and pointers
        - V0–V7 for floating-point values
      - Callee must preserve
        - R19–R29, SP
        - V8–V15 (only the lower 64 bits)
      - No calling standard for soft-float
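A small illustration of the parameter-passing rules, assuming AAPCS64 as summarized above: integer and floating-point arguments use separate register sequences, so interleaving them does not waste registers, and only the ninth integer argument spills to the stack. The function `mix_args` is hypothetical; the register assignment in the comment is what AAPCS64 specifies, not something the C code can observe.

```c
#include <stdint.h>

/* AAPCS64 assignment for this signature:
     a, b, c, d, e, f, g, h -> X0..X7 (integer registers, in order)
     x, y                   -> D0, D1 (low halves of V0, V1)
     i                      -> stack (ninth integer argument)
     return value (double)  -> D0 */
static double mix_args(int64_t a, double x, int64_t b, double y,
                       int64_t c, int64_t d, int64_t e, int64_t f,
                       int64_t g, int64_t h, int64_t i)
{
    return (double)(a + b + c + d + e + f + g + h + i) + x + y;
}
```

Compiling this with `aarch64-linux-gnu-gcc -O2 -S` and reading the assembly is an easy way to check the register assignment.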
  12. A64 instruction set
      - A brand-new, clean design for a 64-bit architecture
      - Most instructions are not conditional; there is only a very small set of "conditional data processing" instructions (e.g. CSEL, CSINC, CSINV, CSNEG, CCMP)
      - No equivalent of Thumb2's IT instruction
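Without IT blocks, short conditionals are typically handled with conditional select instructions instead of branches. The functions below (hypothetical names) are typical candidates; the instructions in the comments are what GCC commonly emits at -O2, but the actual output depends on compiler version and options.

```c
/* Often: cmp w0, w1 ; csel w0, w0, w1, gt */
static int max_int(int a, int b)
{
    return a > b ? a : b;
}

/* Often: cmp w1, w2 ; cinc w0, w0, lt   (CINC is an alias of CSINC) */
static int count_less(int count, int a, int b)
{
    return count + (a < b);
}

/* Often: cmp w0, #0 ; cneg w0, w0, lt   (CNEG is an alias of CSNEG) */
static int abs_int(int a)
{
    return a < 0 ? -a : a; /* undefined for INT_MIN, as in plain C */
}
```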
  13. No multiple load/store
      - No instructions that load/store many GP registers at once, such as LDM/STM or PUSH/POP
      - Instead, there are two-register loads/stores: LDP/STP
  14. YIELD instruction
      - Behaves as a NOP, but hints that the current thread is not doing important work
      - Used in spin loops, so that an SMT (Simultaneous Multi-Threading) core can switch to another thread
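A minimal test-and-set spinlock sketch showing where YIELD fits: the waiting loop calls a hypothetical `cpu_relax()` helper that emits `yield` on AArch64 (assuming GCC/Clang inline asm) and does nothing on other targets. The lock type and function names are illustrative only.

```c
#include <stdatomic.h>

/* Emit the YIELD hint on AArch64; no-op elsewhere. */
static inline void cpu_relax(void)
{
#if defined(__aarch64__)
    __asm__ volatile("yield" ::: "memory");
#endif
}

typedef struct {
    atomic_flag locked;
} spinlock;

static void spin_lock(spinlock *l)
{
    /* Spin until the flag was previously clear; tell the core we are only waiting. */
    while (atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire))
        cpu_relax();
}

static void spin_unlock(spinlock *l)
{
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}
```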
  15. Sample #1 source

```c
#include <stdio.h>

int main()
{
    int i;
    for (i = 5; i >= 0; i--) {
        printf("count down: %d\n", i);
    }
    return 0;
}
```
  16. Sample #1 Thumb2

```
000083f8 <main>:
    83f8:  b570       push   {r4, r5, r6, lr}
    83fa:  2405       movs   r4, #5
    83fc:  f248 456c  movw   r5, #33900          ; 0x846c
    8400:  f2c0 0500  movt   r5, #0
    8404:  2601       movs   r6, #1
    8406:  4630       mov    r0, r6
    8408:  4629       mov    r1, r5
    840a:  4622       mov    r2, r4
    840c:  f7ff ef7a  blx    8304 <_init+0x38>
    8410:  3c01       subs   r4, #1
    8412:  f1b4 3fff  cmp.w  r4, #4294967295     ; 0xffffffff
    8416:  d1f6       bne.n  8406 <main+0xe>
    8418:  2000       movs   r0, #0
    841a:  bd70       pop    {r4, r5, r6, pc}
```
  17. Sample #1 A64

```
0000000000400440 <main>:
  400440:  a9be7bfd  stp   x29, x30, [sp,#-32]!
  400444:  910003fd  mov   x29, sp
  400448:  a90153f3  stp   x19, x20, [sp,#16]
  40044c:  90000014  adrp  x20, 400000 <_init-0x3c0>
  400450:  528000b3  mov   w19, #0x5           // #5
  400454:  911a0294  add   x20, x20, #0x680
  400458:  2a1303e2  mov   w2, w19
  40045c:  52800020  mov   w0, #0x1            // #1
  400460:  aa1403e1  mov   x1, x20
  400464:  97ffffeb  bl    400410 <__printf_chk@plt>
  400468:  51000673  sub   w19, w19, #0x1
  40046c:  3100067f  cmn   w19, #0x1
  400470:  54ffff41  b.ne  400458 <main+0x18>
  400474:  52800000  mov   w0, #0x0            // #0
  400478:  a94153f3  ldp   x19, x20, [sp,#16]
  40047c:  a8c27bfd  ldp   x29, x30, [sp],#32
  400480:  d65f03c0  ret
```
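The prologue and epilogue above replace PUSH/POP with the pre-index form `stp x29, x30, [sp,#-32]!` (update SP first, then store the pair) and the post-index form `ldp x29, x30, [sp],#32` (load the pair, then update SP). A hypothetical C model of just those two addressing modes, using a 64-byte array as the stack:

```c
#include <stdint.h>
#include <string.h>

/* Toy stack: sp is a byte offset into mem (hypothetical model). */
typedef struct {
    uint8_t  mem[64];
    uint64_t sp;
} stack_model;

/* stp xa, xb, [sp, #imm]!  : pre-index, write back sp + imm, then store both */
static void stp_pre(stack_model *s, uint64_t a, uint64_t b, int64_t imm)
{
    s->sp += (uint64_t)imm; /* negative imm wraps, i.e. subtracts */
    memcpy(&s->mem[s->sp], &a, 8);
    memcpy(&s->mem[s->sp + 8], &b, 8);
}

/* ldp xa, xb, [sp], #imm   : post-index, load both from sp, then sp += imm */
static void ldp_post(stack_model *s, uint64_t *a, uint64_t *b, int64_t imm)
{
    memcpy(a, &s->mem[s->sp], 8);
    memcpy(b, &s->mem[s->sp + 8], 8);
    s->sp += (uint64_t)imm;
}
```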
  18. Sample #2 source

```c
int iaload(int *base, int index)
{
    return base[index];
}

long long laload(long long *base, int index)
{
    return base[index];
}

char ibload(char *base, int index)
{
    return base[index];
}

short isload(short *base, int index)
{
    return base[index];
}
```
  19. Sample #2 Thumb2

```
00000000 <iaload>:
   0:  f850 0021  ldr.w    r0, [r0, r1, lsl #2]
   4:  4770       bx       lr
   6:  bf00       nop

00000008 <laload>:
   8:  eb00 01c1  add.w    r1, r0, r1, lsl #3
   c:  e9d1 0100  ldrd     r0, r1, [r1]
  10:  4770       bx       lr
  12:  bf00       nop

00000014 <ibload>:
  14:  5c40       ldrb     r0, [r0, r1]
  16:  4770       bx       lr

00000018 <isload>:
  18:  f930 0011  ldrsh.w  r0, [r0, r1, lsl #1]
  1c:  4770       bx       lr
  1e:  bf00       nop
```
  20. Sample #2 A64

```
0000000000000000 <iaload>:
   0:  b861d800  ldr   w0, [x0,w1,sxtw #2]
   4:  d65f03c0  ret

0000000000000008 <laload>:
   8:  f861d800  ldr   x0, [x0,w1,sxtw #3]
   c:  d65f03c0  ret

0000000000000010 <ibload>:
  10:  3861c800  ldrb  w0, [x0,w1,sxtw]
  14:  d65f03c0  ret

0000000000000018 <isload>:
  18:  7861d800  ldrh  w0, [x0,w1,sxtw #1]
  1c:  d65f03c0  ret
```
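Each A64 load above does the whole `base[index]` computation in its addressing mode: the 32-bit `int` index in W1 is sign-extended (SXTW), shifted left by log2 of the element size, and added to the 64-bit base in X0. A sketch of that effective-address arithmetic (`ea_sxtw` is a hypothetical name):

```c
#include <stdint.h>

/* Effective address of e.g. "ldr w0, [x0, w1, sxtw #2]":
   base + (sign_extend_32_to_64(w_index) << shift) */
static uint64_t ea_sxtw(uint64_t base, uint32_t w_index, unsigned shift)
{
    int64_t index = (int64_t)(int32_t)w_index; /* SXTW */
    return base + ((uint64_t)index << shift);
}
```

The sign extension is what makes a negative `int` index (for example `base[-1]`) address memory below the base instead of about 16 GiB above it.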
  21. Sample #3 source

```c
double range(double x, double min, double max)
{
    if (x < min)
        return min;
    else if (x > max)
        return max;
    else
        return x;
}
```
  22. Sample #3 Thumb2

```
00000000 <range>:
   0:  eeb4 0bc1  vcmpe.f64   d0, d1
   4:  eef1 fa10  vmrs        APSR_nzcv, fpscr
   8:  d407       bmi.n       1a <range+0x1a>
   a:  eeb4 0bc2  vcmpe.f64   d0, d2
   e:  eef1 fa10  vmrs        APSR_nzcv, fpscr
  12:  bfc8       it          gt
  14:  eeb0 0b42  vmovgt.f64  d0, d2
  18:  4770       bx          lr
  1a:  eeb0 0b41  vmov.f64    d0, d1
  1e:  4770       bx          lr
```
  23. Sample #3 A64

```
0000000000000000 <range>:
   0:  1e612010  fcmpe  d0, d1
   4:  540000a4  b.mi   18 <range+0x18>
   8:  1e622010  fcmpe  d0, d2
   c:  1e604041  fmov   d1, d2
  10:  5400004c  b.gt   18 <range+0x18>
  14:  1e604001  fmov   d1, d0
  18:  1e604020  fmov   d0, d1
  1c:  d65f03c0  ret
```
  24. Cache control
      - Cache maintenance instructions usable at application level
        - Data cache: DC CVAU, DC CVAC, DC CIVAC
        - Instruction cache: IC IVAU
      - No system call needed (32-bit ARM Linux requires the cacheflush system call)
      - JIT friendly
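A minimal JIT sketch of why this matters, assuming Linux (POSIX `mmap`/`mprotect`), a little-endian AArch64 target, and GCC or Clang. It writes the two instruction words `mov w0, #42` and `ret` into memory, synchronizes the caches with the real GCC/Clang builtin `__builtin___clear_cache`, and calls the result. On AArch64, GCC's runtime typically implements that builtin with DC CVAU / IC IVAU in user space. The function name `jit_return_42` is hypothetical; on other architectures it just returns -1.

```c
#define _DEFAULT_SOURCE /* MAP_ANONYMOUS under strict -std= modes */
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>

int jit_return_42(void)
{
#if defined(__aarch64__)
    /* A64 encodings; instruction memory is always little-endian. */
    static const uint32_t code[] = {
        0x52800540, /* mov w0, #42  (MOVZ) */
        0xd65f03c0, /* ret                 */
    };
    const size_t len = 4096; /* one page is plenty for two instructions */

    void *mem = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return -1;
    memcpy(mem, code, sizeof code);

    /* Clean D-cache / invalidate I-cache for the new code (no syscall on AArch64). */
    __builtin___clear_cache((char *)mem, (char *)mem + sizeof code);

    if (mprotect(mem, len, PROT_READ | PROT_EXEC) != 0) {
        munmap(mem, len);
        return -1;
    }
    int (*fn)(void) = (int (*)(void))mem;
    int result = fn();
    munmap(mem, len);
    return result;
#else
    return -1; /* not an AArch64 target */
#endif
}
```

Mapping the page writable first and switching it to executable afterwards avoids needing a page that is writable and executable at the same time.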
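  25. Cache preloading
      - PRFM <prfop>, addr|label
        - <prfop> ::= <type><target><policy>
        - <type> ::= PLD | PST | PLI   (prefetch for load, store, instructions)
        - <target> ::= L1 | L2 | L3
        - <policy> ::= KEEP | STRM     (keep in cache, or streaming / used once)
        - Example: PLDL1KEEP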
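From C, the portable way to request a prefetch is the GCC/Clang builtin `__builtin_prefetch(addr, rw, locality)`; on AArch64, GCC typically lowers a read prefetch with locality 3 to `PRFM PLDL1KEEP`. The sketch below sums an array while prefetching ahead. `sum_with_prefetch` is a hypothetical name and `PREFETCH_DISTANCE` is an arbitrary assumption, not a tuned value.

```c
#include <stddef.h>
#include <stdint.h>

#define PREFETCH_DISTANCE 16 /* elements ahead; arbitrary, not tuned */

static int64_t sum_with_prefetch(const int32_t *a, size_t n)
{
    int64_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_DISTANCE < n)
            /* rw = 0: prefetch for load (PLD); locality = 3: keep in the closest cache */
            __builtin_prefetch(&a[i + PREFETCH_DISTANCE], 0, 3);
        sum += a[i];
    }
    return sum;
}
```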
  26. Non-temporal load/store
      - LDNP/STNP
      - Hint that the data is unlikely to be accessed again soon (e.g. streaming)
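There is no standard C way to request STNP, so this sketch uses GCC/Clang inline assembly on AArch64 and falls back to ordinary stores elsewhere. It fills a buffer two 64-bit words at a time, the kind of write-once streaming pattern the hint is meant for. The function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Fill dst[0..n-1] with value using non-temporal pair stores on AArch64. */
static void fill_nontemporal(uint64_t *dst, size_t n, uint64_t value)
{
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
#if defined(__aarch64__)
        /* stnp xV, xV, [xP]: store two 64-bit words with a non-temporal hint */
        __asm__ volatile("stnp %1, %1, [%0]" : : "r"(dst + i), "r"(value) : "memory");
#else
        dst[i] = value;
        dst[i + 1] = value;
#endif
    }
    if (i < n) /* odd element left over */
        dst[i] = value;
}
```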
  27. AArch32
      - Backward compatible with ARMv7
      - Added the Cryptography Extension (AES, SHA)
      - Added some other new instructions matching AArch64
      - Removed Jazelle and ThumbEE
  28. Let's try AArch64 using QEMU
      - QEMU 2.0 supports AArch64 user-mode emulation
      - Ubuntu 14.04 ships QEMU 2.0 and a cross compiler for AArch64

```
$ sudo apt-get install qemu-user-static
$ sudo apt-get install g++-aarch64-linux-gnu
```
  29. Prepare gdb for AArch64

```
$ sudo apt-get build-dep gdb
$ wget http://ftp.gnu.org/gnu/gdb/gdb-7.7.1.tar.bz2
$ tar xf gdb-7.7.1.tar.bz2
$ mkdir obj
$ cd obj
$ ../gdb-7.7.1/configure --target=aarch64-linux-gnu
$ make
$ sudo make install
```
  30. Run with QEMU and connect gdb

```
$ aarch64-linux-gnu-gcc -g a.c
$ export QEMU_LD_PREFIX=/usr/aarch64-linux-gnu/
$ qemu-aarch64-static -g 1234 ./a.out
```

QEMU waits for the debugger, so run gdb in another terminal:

```
$ aarch64-linux-gnu-gdb ./a.out
  ...
(gdb) target remote :1234
(gdb) b main
(gdb) c
(gdb) x/i $pc
=> 0x4005a0 <main>:  stp  x29, x30, [sp,#-48]!
(gdb)
```
  31. DEMO
  32. References
      - ARMv8 Technology Preview
      - ARMv8 Instruction Set Overview
      - ARM® Architecture Reference Manual
      - Procedure Call Standard for the ARM 64-bit Architecture (AArch64)
      - Overview of the ARM 64bit ARMv8 architecture (in Japanese)
      - Compiling and running ARM 64bit (aarch64) code on Ubuntu 14.04 (in Japanese)
  33. Any comments? @tetsu_koba
      Thank you for listening!
