ARM 64bit has come!

First impressions of the A64 instruction set.


  1. ARM 64bit has come!
      Tetsuyuki Kobayashi
      2014.5.23 Japan Technical Jamboree
      2014.5.25 Updated for カーネル/VM探検隊 (Kernel/VM Tankentai)
  2. The latest version of this slide will be available from here
      • http://www.slideshare.net/tetsu.koba/presentations
  3. Who am I?
      • 20+ years involved in embedded systems
          • 10 years in real time OS, such as iTRON
          • 10 years in embedded Java Virtual Machine
          • Now GCC, Linux, QEMU, Android, …
      • Blogs
          • http://d.hatena.ne.jp/embedded/ (Personal)
          • http://blog.kmckk.com/ (Corporate)
          • http://kobablog.wordpress.com/ (English)
      • Twitter
          • @tetsu_koba
  4. Today's topics
      • Introduction to ARM 64bit
          • Does not cover everything, only what I find interesting :)
      • Try aarch64 using QEMU
  5. ARMv8 terminology
      • AArch64: 64 bit mode
          • 1 instruction set: A64
              • A64: 32bit fixed length instructions
      • AArch32: 32 bit mode
          • Backward compatible with the ARMv7-A architecture
          • 2 instruction sets: A32, T32
              • A32: ARM, 32bit fixed length instructions
              • T32: Thumb2, 16bit/32bit instructions
  6. ARM64 is not an official name
      • But it is used in the kernel source
          • arch/arm64
  7. Exception level
      • 4 levels
      • Typical usage
          • EL0: User application
          • EL1: Kernel of OS
          • EL2: Hypervisor
          • EL3: Secure monitor
      • Switching between AArch64 and AArch32 happens only on a change of exception level
      • Cf. PL0-PL2 (Privilege level) in ARMv7
  8. AArch64 execution model
      • R0 – R30: 64bit general purpose registers
          • Wn: lower 32bit
          • Xn: 64bit
      • Register number 31 means either the zero register (XZR, WZR) or SP, depending on the instruction
      • SP: Stack Pointer
          • Must be 16 byte aligned
          • WSP for lower 32bit
      • PC: Program Counter
          • Cannot be used as the destination of a calculation
  9. AArch64 execution model (cont.)
      • V0 – V31: 128 bit registers
          • For floating point and SIMD
          • AArch64 must have FPU. No calling standard for soft-float.
      • Scalar
          • Bn, Hn, Sn, Dn, Qn
      • Vector
          • Vn.8B, Vn.16B, Vn.4H, Vn.8H, Vn.2S, Vn.4S, Vn.1D, Vn.2D
      • FPCR: Floating Point Control Register
      • FPSR: Floating Point Status Register
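As an illustration of the `Vn.4S` arrangement (four 32-bit lanes in one 128-bit register), here is a minimal sketch using the NEON intrinsics from `arm_neon.h`. The function name `add4` is hypothetical, and the scalar fallback is only there so the sketch also builds on non-AArch64 hosts.

```c
#if defined(__aarch64__)
#include <arm_neon.h>
#endif

/* Add four floats lane by lane. On AArch64 the NEON path keeps each
 * operand in one V register viewed as Vn.4S; other targets use the
 * plain scalar loop (fallback added for portability, not from the slides). */
void add4(const float a[4], const float b[4], float out[4])
{
#if defined(__aarch64__)
    float32x4_t va = vld1q_f32(a);      /* load 4 x 32-bit lanes */
    float32x4_t vb = vld1q_f32(b);
    vst1q_f32(out, vaddq_f32(va, vb));  /* lane-wise add, then store */
#else
    for (int i = 0; i < 4; i++)
        out[i] = a[i] + b[i];
#endif
}
```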
  10. AArch64 addressing model
      • Without tag: 64bit virtual address
      • With tag: 8bit tag + 56bit virtual address
          • Tag is ignored on load/store/branch
          • Good for implementing dynamically typed languages
      • Effective virtual address length is 48bit.
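The tagged layout can be sketched in portable C as follows. This is only an illustration under assumptions: pointers are 64 bits wide, ordinary user-space addresses have a zero top byte, and the helper names (`tag_ptr`, `ptr_tag`, `untag_ptr`) are hypothetical. With tagged addressing enabled on AArch64 the hardware ignores the top byte, so the explicit untag step would not be needed there; it is kept so the sketch works on any 64-bit host.

```c
#include <stdint.h>

#define TAG_SHIFT 56
#define ADDR_MASK ((UINT64_C(1) << TAG_SHIFT) - 1)  /* low 56 bits */

/* Store an 8-bit tag in bits 63..56 of a pointer value. */
static inline uint64_t tag_ptr(const void *p, uint8_t tag)
{
    return ((uint64_t)(uintptr_t)p & ADDR_MASK) | ((uint64_t)tag << TAG_SHIFT);
}

/* Read the tag back, e.g. a type code in a dynamic language runtime. */
static inline uint8_t ptr_tag(uint64_t tagged)
{
    return (uint8_t)(tagged >> TAG_SHIFT);
}

/* Clear the tag to recover a pointer that is usable on any host. */
static inline void *untag_ptr(uint64_t tagged)
{
    return (void *)(uintptr_t)(tagged & ADDR_MASK);
}
```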
  11. Calling standard (AAPCS64)
      • R30 = LR (Link Register)
      • R29 = FP (Frame Pointer)
      • Parameter passing
          • R0 – R7 for integer and pointer
          • V0 – V7 for float
      • Callee must preserve
          • R19 – R29, SP
          • V8 – V15
      • No calling standard for soft-float
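To make the parameter-passing rules concrete, here is a small sketch with hypothetical functions. The register assignments in the comments are what AAPCS64 prescribes for these signatures; the C code itself is portable and behaves the same on any target.

```c
/* Nine integer arguments: a0..a7 are passed in W0..W7 (the 32-bit views
 * of X0..X7); a8 does not fit in registers and is passed on the stack. */
long sum9(int a0, int a1, int a2, int a3, int a4,
          int a5, int a6, int a7, int a8)
{
    return (long)a0 + a1 + a2 + a3 + a4 + a5 + a6 + a7 + a8;
}

/* Integer and floating point arguments are counted separately:
 * n -> W0, x -> D0, m -> W1, y -> D1. The double result comes back in D0. */
double mix(int n, double x, int m, double y)
{
    return n * x + m * y;
}
```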
  12. A64 instruction set
      • Brand-new, clean design for the 64bit architecture
      • Most instructions are not conditional; there is only a very small set of "conditional data processing" instructions
      • No equivalent of Thumb2's IT instruction.
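The kind of source code that benefits from the conditional data-processing instructions looks like the sketch below. The function names are hypothetical. A compiler targeting A64 will typically turn such expressions into a compare followed by CSEL or CSINC rather than a branch, but that depends on the compiler and optimization flags, so treat the comments as an expectation, not a guarantee.

```c
/* Select one of two values: a candidate for CMP + CSEL. */
int max_int(int a, int b)
{
    return a > b ? a : b;
}

/* Conditionally add one: a candidate for CMP + CSINC. */
int inc_below(int x, int limit)
{
    return x < limit ? x + 1 : x;
}
```

Compiling with `aarch64-linux-gnu-gcc -O2 -S` and reading the generated assembly is an easy way to check what a given toolchain actually emits.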
  13. No multiple load/store
      • No multiple load/store of GP registers such as LDM/STM, PUSH/POP
      • Instead, there are register-pair load/store instructions: LDP/STP
  14. YIELD instruction
      • NOP with a hint that the current thread is not doing anything important
      • Used in spin loops to trigger context switching in SMT (Symmetric Multi-Threading)
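A spin loop using YIELD might look like the following minimal sketch. It assumes a C11 compiler with GCC-style inline assembly; the names `cpu_relax`, `spin_t`, `spin_lock`, `spin_trylock` and `spin_unlock` are hypothetical. On targets other than AArch64 the hint compiles to nothing.

```c
#include <stdatomic.h>

/* Tell the core that this thread is only waiting. */
static inline void cpu_relax(void)
{
#if defined(__aarch64__)
    __asm__ volatile("yield" ::: "memory");
#endif
}

typedef struct { atomic_flag locked; } spin_t;

/* Returns 1 if the lock was acquired, 0 if it was already held. */
static inline int spin_trylock(spin_t *l)
{
    return !atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire);
}

/* Spin until the lock is free, issuing YIELD on each failed attempt. */
static inline void spin_lock(spin_t *l)
{
    while (!spin_trylock(l))
        cpu_relax();
}

static inline void spin_unlock(spin_t *l)
{
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}
```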
  15. Sample #1 source
      #include <stdio.h>

      int main()
      {
          int i;
          for (i = 5; i >= 0; i--) {
              printf("count down: %d\n", i);
          }
          return 0;
      }
  16. Sample #1 Thumb2
      000083f8 <main>:
          83f8: b570       push {r4, r5, r6, lr}
          83fa: 2405       movs r4, #5
          83fc: f248 456c  movw r5, #33900 ; 0x846c
          8400: f2c0 0500  movt r5, #0
          8404: 2601       movs r6, #1
          8406: 4630       mov r0, r6
          8408: 4629       mov r1, r5
          840a: 4622       mov r2, r4
          840c: f7ff ef7a  blx 8304 <_init+0x38>
          8410: 3c01       subs r4, #1
          8412: f1b4 3fff  cmp.w r4, #4294967295 ; 0xffffffff
          8416: d1f6       bne.n 8406 <main+0xe>
          8418: 2000       movs r0, #0
          841a: bd70       pop {r4, r5, r6, pc}
  17. Sample #1 A64
      0000000000400440 <main>:
          400440: a9be7bfd  stp x29, x30, [sp,#-32]!
          400444: 910003fd  mov x29, sp
          400448: a90153f3  stp x19, x20, [sp,#16]
          40044c: 90000014  adrp x20, 400000 <_init-0x3c0>
          400450: 528000b3  mov w19, #0x5 // #5
          400454: 911a0294  add x20, x20, #0x680
          400458: 2a1303e2  mov w2, w19
          40045c: 52800020  mov w0, #0x1 // #1
          400460: aa1403e1  mov x1, x20
          400464: 97ffffeb  bl 400410 <__printf_chk@plt>
          400468: 51000673  sub w19, w19, #0x1
          40046c: 3100067f  cmn w19, #0x1
          400470: 54ffff41  b.ne 400458 <main+0x18>
          400474: 52800000  mov w0, #0x0 // #0
          400478: a94153f3  ldp x19, x20, [sp,#16]
          40047c: a8c27bfd  ldp x29, x30, [sp],#32
          400480: d65f03c0  ret
  18. Sample #2 source
      int iaload(int *base, int index)
      {
          return base[index];
      }

      long long laload(long long *base, int index)
      {
          return base[index];
      }

      char ibload(char *base, int index)
      {
          return base[index];
      }

      short isload(short *base, int index)
      {
          return base[index];
      }
  19. Sample #2 Thumb2
      00000000 <iaload>:
           0: f850 0021  ldr.w r0, [r0, r1, lsl #2]
           4: 4770       bx lr
           6: bf00       nop
      00000008 <laload>:
           8: eb00 01c1  add.w r1, r0, r1, lsl #3
           c: e9d1 0100  ldrd r0, r1, [r1]
          10: 4770       bx lr
          12: bf00       nop
      00000014 <ibload>:
          14: 5c40       ldrb r0, [r0, r1]
          16: 4770       bx lr
      00000018 <isload>:
          18: f930 0011  ldrsh.w r0, [r0, r1, lsl #1]
          1c: 4770       bx lr
          1e: bf00       nop
  20. Sample #2 A64
      0000000000000000 <iaload>:
           0: b861d800  ldr w0, [x0,w1,sxtw #2]
           4: d65f03c0  ret
      0000000000000008 <laload>:
           8: f861d800  ldr x0, [x0,w1,sxtw #3]
           c: d65f03c0  ret
      0000000000000010 <ibload>:
          10: 3861c800  ldrb w0, [x0,w1,sxtw]
          14: d65f03c0  ret
      0000000000000018 <isload>:
          18: 7861d800  ldrh w0, [x0,w1,sxtw #1]
          1c: d65f03c0  ret
  21. Sample #3 source
      double range(double x, double min, double max)
      {
          if (x < min)
              return min;
          else if (x > max)
              return max;
          else
              return x;
      }
  22. Sample #3 Thumb2
      00000000 <range>:
           0: eeb4 0bc1  vcmpe.f64 d0, d1
           4: eef1 fa10  vmrs APSR_nzcv, fpscr
           8: d407       bmi.n 1a <range+0x1a>
           a: eeb4 0bc2  vcmpe.f64 d0, d2
           e: eef1 fa10  vmrs APSR_nzcv, fpscr
          12: bfc8       it gt
          14: eeb0 0b42  vmovgt.f64 d0, d2
          18: 4770       bx lr
          1a: eeb0 0b41  vmov.f64 d0, d1
          1e: 4770       bx lr
  23. Sample #3 A64
      0000000000000000 <range>:
           0: 1e612010  fcmpe d0, d1
           4: 540000a4  b.mi 18 <range+0x18>
           8: 1e622010  fcmpe d0, d2
           c: 1e604041  fmov d1, d2
          10: 5400004c  b.gt 18 <range+0x18>
          14: 1e604001  fmov d1, d0
          18: 1e604020  fmov d0, d1
          1c: d65f03c0  ret
  24. Cache control
      • Application level cache instructions
          • Data cache
              • DC CVAU
              • DC CVAC
              • DC CIVAC
          • Instruction cache
              • IC IVAU
      • No need to call a kernel syscall
      • JIT friendly
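For a JIT, the usual way to reach these instructions from C is the GCC/Clang builtin `__builtin___clear_cache`. The sketch below assumes one of those compilers. The function name `publish_code` is hypothetical, and the claim that the builtin expands to user-space DC CVAU / IC IVAU sequences on AArch64 Linux reflects my understanding of the toolchain's runtime library, not something stated in the slides.

```c
#include <stddef.h>
#include <string.h>

/* Copy freshly generated machine code into a buffer and make it visible
 * to instruction fetch. The buffer must additionally be mapped executable
 * (for example with mmap/mprotect) before jumping to it; that part is
 * omitted from this sketch. */
void publish_code(void *dst, const void *code, size_t len)
{
    memcpy(dst, code, len);
    /* Clean the data cache and invalidate the instruction cache
     * for the address range [dst, dst + len). */
    __builtin___clear_cache((char *)dst, (char *)dst + len);
}
```

For example, the four bytes `c0 03 5f d6` are the little-endian encoding of the `ret` instruction (`d65f03c0`) seen in the A64 samples.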
  25. Preloading cache
      • PRFM <prfop>, addr|label
          • <prfop> ::= <type><target><policy>
          • <type> ::= PLD | PST | PLI
          • <target> ::= L1 | L2 | L3
          • <policy> ::= KEEP | STRM
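From C, prefetch hints are usually requested with the GCC/Clang builtin `__builtin_prefetch(addr, rw, locality)`. The sketch below assumes one of those compilers; the function name and the prefetch distance are hypothetical and untuned. With `rw = 0` and `locality = 0`, GCC for AArch64 is expected to emit `PRFM PLDL1STRM`, but the exact mapping is up to the compiler.

```c
#include <stddef.h>

#define PREFETCH_AHEAD 16  /* hypothetical distance, in elements */

/* Sum an array while hinting that data further ahead will be read once
 * (rw = 0: load, locality = 0: streaming, no need to keep it cached). */
long sum_with_prefetch(const int *data, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_AHEAD < n)
            __builtin_prefetch(&data[i + PREFETCH_AHEAD], 0, 0);
        total += data[i];
    }
    return total;
}
```

The hint never changes the result, only (possibly) the timing, so it is safe to experiment with the distance and locality values.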
  26. Non-temporal load/store
      • LDNP/STNP
      • Hint that the data is unlikely to be accessed again soon (like streaming)
  27. AArch32
      • Backward compatible with ARMv7
      • Added cryptography extension
      • Added some other new instructions in line with AArch64
      • Removed Jazelle, ThumbEE
  28. Let's try AArch64 using QEMU
      • QEMU 2.0 supports aarch64 user mode emulation
      • Ubuntu 14.04 has QEMU 2.0 and a cross compiler for aarch64
      $ sudo apt-get install qemu-user-static
      $ sudo apt-get install g++-aarch64-linux-gnu
  29. Prepare gdb for aarch64
      $ sudo apt-get build-dep gdb
      $ wget http://ftp.gnu.org/gnu/gdb/gdb-7.7.1.tar.bz2
      $ tar xf gdb-7.7.1.tar.bz2
      $ mkdir obj
      $ cd obj
      $ ../gdb-7.7.1/configure --target=aarch64-linux-gnu
      $ make
      $ sudo make install
  30. Execute with QEMU and connect gdb
      $ aarch64-linux-gnu-gcc -g a.c
      $ export QEMU_LD_PREFIX=/usr/aarch64-linux-gnu/
      $ qemu-aarch64-static -g 1234 ./a.out
      (QEMU now waits for a debugger on port 1234; run gdb from another terminal)
      $ aarch64-linux-gnu-gdb ./a.out
      ...
      (gdb) target remote :1234
      (gdb) b main
      (gdb) c
      (gdb) x/i $pc
      => 0x4005a0 <main>: stp x29, x30, [sp,#-48]!
      (gdb)
  31. DEMO
  32. References
      • ARMv8 Technology Preview
      • ARMv8 Instruction Set Overview
      • ARM® Architecture Reference Manual
      • Procedure Call Standard for the ARM 64-bit Architecture (AArch64)
      • Overview of the ARM 64bit ARMv8 architecture (in Japanese)
      • Compiling and running arm 64bit (aarch64) code on Ubuntu 14.04 (in Japanese)
  33. Any comments?
      @tetsu_koba
      Thank you for listening!
