ARM AAE - Developing Code for ARM

ARM AAE Techcon 2012 - Developing code for ARM

Published in: Education

  1. 1. SOFTWARE & SYSTEMS DESIGN 5 – Developing Code for ARM
  2. 2. AGENDA • Embedded Software Development | Compilation Tools | Linking and Libraries | Target Platforms | Debug | Invasive Debug | Non-Invasive Debug | Performance Monitoring [AAETC5v00 Developing Code for ARM]
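  3. 3. BUILDING EMBEDDED SOFTWARE [Diagram: .c sources are compiled and .s sources are assembled into .o object files; a librarian can collect objects into a .lib; the link step produces an .axf image, which is loaded onto the target system, optionally after binary conversion]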
  4. 4. AGENDA Embedded Software Development | • Compilation Tools | Linking and Libraries | Target Platforms | Debug | Invasive Debug | Non-Invasive Debug | Performance Monitoring
  5. 5. THE COMPILER • Set optimization level appropriately – In general, increasing optimization level reduces debug visibility – For the ARM compiler… -O0: best debug view, restricted optimization; -O1: most optimizations, good debug view; -O2: full optimization (the default), limited debug view; -O3: higher optimization, “more aggressive” than -O2 • Most compilers allow you to optimize for either code size or execution speed – For the ARM compiler… -Otime / -Ospace • It is vital to specify the target processor or architecture – For the ARM compiler… specify architecture: --cpu 5TE; specify processor: --cpu Cortex-A9 – Be as specific as possible to enable maximum optimization – Make sure you specify other features of the target platform, e.g. unaligned access: --no_unaligned_access; floating point support: --fpu=vfpv3_d16
  6. 6. VARIABLE TYPES • An ABI-compliant ARM compiler supports these basic types:
       int / long    32-bit (word) integer
       short         16-bit (half-word) integer
       char          8-bit byte, unsigned by default
       long long     64-bit integer
       float         32-bit single-precision IEEE floating point
       double        64-bit double-precision IEEE floating point
       bool          8-bit Boolean (C++ only)
       wchar_t       16-bit “wide character” type (C++ only)
       Pointers      32-bit integer addresses
     • Take care when porting legacy code from other vendors’ architectures
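One porting hazard behind that last bullet: plain char is unsigned by default here, while several other architectures treat it as signed, so legacy code that stores negative values in a plain char can change behavior. A minimal sketch in portable C follows; the helper names plain_char_is_unsigned and byte_to_signed are hypothetical and not part of any ARM library.

```c
#include <limits.h>

/* Returns 1 when this compiler treats plain char as unsigned
   (the default described on the slide), 0 when it is signed. */
int plain_char_is_unsigned(void)
{
    return CHAR_MIN == 0;
}

/* Interprets a raw byte as a signed 8-bit value without relying on
   the signedness of plain char, so the result is the same on any target. */
int byte_to_signed(unsigned char raw)
{
    return raw >= 128 ? (int)raw - 256 : (int)raw;
}
```

Writing signed char or unsigned char explicitly wherever the sign matters gives the same effect without a helper.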
  7. 7. INSTRUCTION SET SELECTION • ARMv7-AR processors support two instruction sets • ARM – Use for critical functions which perform better with access to the whole register set and all instruction features – For the ARM compiler… --arm or --arm_only • Thumb with Thumb-2 extensions – Use for the majority of compiled code – For the ARM compiler… --thumb • Some compilers support #pragmas for selecting the instruction set on a per-function basis – For the ARM compiler… #pragma arm or #pragma thumb
  8. 8. INTRINSICS • C/C++ are suited to a wide variety of tasks but do not provide built-in support for specific areas of application, e.g. DSP operations • Most compilers support various families of intrinsics – Instruction intrinsics for realizing ARM instructions from your C/C++ code • Generic intrinsics: __current_pc, __current_sp, __return_address, ... • IRQ/FIQ intrinsics: __disable_irq, __enable_irq, ... • Optimization barriers: __schedule_barrier, __force_stores, ... • Native instructions: __pld, __ldrex, __isb, __dsb, ... • DSP intrinsics: __clz, __fabs, __sqrt, ... – Named register variables (e.g., register int cpsr __asm("CPSR")) – NEON intrinsics for use with the NEON instruction set to access NEON features (in arm_neon.h)
  9. 9. AUTOMATIC VECTORIZATION
       void add_int(int * restrict pa, int * restrict pb, unsigned int n, int x)
       {
           unsigned int i;
           for(i = 0; i < (n & ~3); i++)
               pa[i] = pb[i] + x;
       }
     armcc --cpu=Cortex-A8 -O3 -Otime
       add_int PROC
           BICS     r12,r2,#3
           BEQ      |L0.36|
           VDUP.32  q1,r3
           LSR      r2,r2,#2
       |L0.16|
           VLD1.32  {d0,d1},[r1]!
           SUBS     r2,r2,#1
           VADD.I32 q0,q0,q1
           VST1.32  {d0,d1},[r0]!
           BNE      |L0.16|
       |L0.36|
           BX       lr
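The loop bound (n & ~3) in the slide's function makes the trip count a multiple of four, which is why the generated code has no remainder handling; it also means up to three trailing elements are never processed. A sketch of a variant that covers all n elements, assuming that is what the caller wants; the name add_int_all is hypothetical and the compiler output for it has not been checked.

```c
/* Adds x to every element of pb and stores the result in pa.
   The main loop covers a multiple of four elements, matching the
   slide's loop shape; the tail loop handles the 0-3 leftovers. */
void add_int_all(int * restrict pa, const int * restrict pb,
                 unsigned int n, int x)
{
    unsigned int i;
    unsigned int vec_n = n & ~3u;   /* largest multiple of 4 that is <= n */

    for (i = 0; i < vec_n; i++)     /* candidate for vectorization */
        pa[i] = pb[i] + x;

    for (; i < n; i++)              /* scalar remainder */
        pa[i] = pb[i] + x;
}
```

The restrict qualifiers carry the same promise as in the original: pa and pb must not overlap.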
  10. 10. CROSS VS. NATIVE COMPILATION • In traditional (native) software development, the compilation tools are executed on the same platform that runs the output code • In embedded software development, this is not usually the case [Diagram: on the host system, .c is compiled to .o and linked to .axf; the image is downloaded to the target system, with a debugger attached] • In the above example, code is compiled and linked on the host system and then downloaded to the target system for execution • The debugger may run on the same host or on a different one
  11. 11. AGENDA Embedded Software Development | Compilation Tools | • Linking and Libraries | Target Platforms | Debug | Invasive Debug | Non-Invasive Debug | Performance Monitoring
  12. 12. THE LINKER (armlink) • It links object files produced by a compiler or assembler into an executable image [Diagram: object files, libraries, and a memory description feed armlink, which produces image.axf] • To do this it must: – Ensure that all the required functions and data are present in the image – Place the contents of the object files to suit the specified memory map – Fill in any required addresses
  13. 13. HOW DOES THE LINKER KNOW WHAT TO DO? • The linker uses several inputs to decide what to do – Command line • List of object files and user library files • Output file name • Other options, for example diagnostic information – Description of the memory map • Command line options for simple images, scatterfile for complex images – Object files • Symbol table: contains information on which variables/functions are defined in the object file (definitions) and which are required by it (references) • Relocation information: informs the linker where it needs to fill in address information • For a link step to succeed, it must match a single symbol definition to every reference • Example command line: armlink object1.o object2.o lib1.a --scatter memory.scat -o image.axf
  14. 14. OBJECT FILE STRUCTURE • Object files (and images) are ELF format • Contents are split into a number of sections • Program sections – Program code – Initialized (RW) data – Zero-Initialized (ZI) data • Non-program sections – Symbol table – Relocation information – Debug data (DWARF2/3) • The linker works with whole sections – Cannot split sections or add to sections – A section can be moved independently of other sections [Diagram: ELF file layout: ELF header, Code, RW Data, ZI Data, Symbol table, Relocation information, …]
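To make the program sections concrete, here is a small C sketch annotated with where each object would typically be placed. The placement comments are assumptions about common toolchain behavior, not output from armlink, and all names are hypothetical.

```c
/* Typical placement; exact section names and any merging of
   sections are toolchain-dependent assumptions. */

const int lookup[4] = {1, 2, 4, 8}; /* constant data, usually read-only alongside code */
int call_count = 5;                 /* initialized (RW) data: initial value stored in the image */
int scratch[64];                    /* zero-initialized (ZI) data: no initial values stored */

/* Program code. */
int next_value(int index)
{
    call_count++;
    scratch[index & 63] = lookup[index & 3];
    return scratch[index & 63] + call_count;
}
```

Because the linker works with whole sections, moving scratch to a different memory region than call_count is a placement decision, not a source change.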
  15. 15. LIBRARY STRUCTURE • A library is a collection of object files gathered together into a single “ar” format file • Symbol table – Symbol names – Object file(s) that contain the symbol – File offset to object file [Diagram: library file layout: library header, symbol table, Object1.o, Object2.o, Object3.o, Object4.o]
  16. 16. SCATTER-LOADING • “Scatter-loading” is the ARM tools mechanism to describe the memory layout for the program • The memory description can be specified with – Command line options for simple images (--ro-base, --rw-base) – Text description file (scatterfile) for more complex images (--scatter) • This describes the placement of code and data • The syntax of the scatter-loading file is not discussed in detail in this training course
  17. 17. ENTRY POINTS • An application usually has to have at least one entry point – This is where the application starts executing – When running with a debugger, this is the initial program counter value – When executing stand-alone on target hardware, the entry point is usually the reset vector • Entry points are used by the linker to identify which modules are required by an application – Unused modules will be automatically eliminated – Modules which are not called or referenced must be marked as entry points to prevent their removal • Examples include the vector table • Linkers vary in how entry points are defined
  18. 18. STATIC AND DYNAMIC LIBRARIES [Diagram: creating static and dynamic libraries from .c/.o files, and using them. Static linking at build time: Program X and Program Y each contain a copy of the static library. Dynamic linking at run time: Program X and Program Y share one shared library.] • A dynamic, shared library may be loaded automatically by the operating system or on demand by the application
  19. 19. C LIBRARY
  20. 20. RETARGETING THE C LIBRARY • You should replace the C library’s device driver level functionality with an implementation that is tailored to your target hardware – For example: printf() should go to the LCD screen, not the debugger console • You must also tailor the C library memory map to your target, e.g. by setting the initial value of the stack pointer
  21. 21. REMOVING SEMIHOSTING • The standard ARM C library makes use of a technique called “semihosting” to access hardware-specific features – In the absence of drivers, these are intercepted by the debugger and routed to the host system – For more detail on semihosting, see the Software Debug section • To “retarget” the C library, simply replace those C library functions which use semihosting with your own implementations, to suit your system – For example, the printf() family of functions (except sprintf()) all ultimately call fputc() – The default implementation of fputc() uses semihosting – Replace this with:
       #include <stdio.h>
       extern void sendchar(char *ch);
       int fputc(int ch, FILE *f)
       {
           /* e.g. write a character to an LCD */
           char tempch = ch;
           sendchar(&tempch);
           return ch;
       }
  22. 22. RUN-TIME MEMORY MODELS • You must decide whether to place your stack and heap in a single region of memory (one-region model) or in separate regions (two-region model) [Diagram: one-region model: heap (from heap base, HB) and stack (from stack base, SB) share one region, and the heap is checked against the stack pointer; two-region model: stack and heap occupy separate regions, and the heap is checked against the heap limit (HL)] • One-region model is the default • To implement a two-region model, import __use_two_region_memory • The initial value of the stack pointer must be doubleword-aligned
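The doubleword-alignment requirement on the initial stack pointer comes down to a mask on the address. A minimal sketch, assuming a stack that grows downward from the chosen top address; the function names and the example addresses are hypothetical.

```c
#include <stdint.h>

/* Rounds an address down to the nearest 8-byte (doubleword) boundary.
   Rounding down keeps the result at or below the requested top. */
uintptr_t align_stack_top(uintptr_t stack_top)
{
    return stack_top & ~(uintptr_t)7;
}

/* Returns 1 if addr is a multiple of 8, otherwise 0. */
int is_doubleword_aligned(uintptr_t addr)
{
    return (addr & 7u) == 0;
}
```

For example, a requested top of 0x20001005 would be rounded down to 0x20001000 before being used as the initial stack pointer.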
  23. 23. ABI • The standard C library will conform to the ARM ABI – Application Binary Interface • The most important part of this is the calling convention – Otherwise known as the Procedure Call Standard for the ARM Architecture, or “AAPCS” – This governs register usage across function calls – It also specifies stack alignment requirements • Floating point linkage… is tricky!
  24. 24. SOFT OR HARD FLOATING POINT • Soft FP does not require hardware capability – Entirely software solution using run-time library – Slower than hardware solutions • Hard FP requires coprocessor (e.g. VFP/NEON) – Later versions do not require library support – Faster than software emulation • Code compiled for hard FP will not run on systems which do not have floating point hardware support • Code can be compiled with a variety of linkage options to maximize flexibility
  25. 25. FLOATING POINT LINKAGE • How floating point parameters and return values are passed into and returned from functions is called the “floating point linkage” • Hardware floating point linkage – Floating point arguments are passed to (and returned from) functions in VFP coprocessor registers – Requires a VFP coprocessor to be present – Can only be used with ARM and Thumb-2 code • Software floating point linkage – Floating point arguments are passed to (and returned from) functions in ARM registers – Compatible with all ARM cores, with or without VFP – Can still have code that uses VFP instructions • Functions that use different floating point linkage cannot be mixed – Arguments will not be in the correct registers
  26. 26. FLOATING POINT EXAMPLE
     float.c
       float foo(float num1, float num2)
       {
           float temp, temp2;
           temp = num1 + num2;
           temp2 = num2 * num2;
           return temp2 - temp;
       }
     armcc float.c
       foo
           PUSH     {r4-r6, lr}
           MOV      r4, r1
           BL       __aeabi_fadd
           MOV      r5, r0
           MOV      r1, r4
           MOV      r0, r4
           BL       __aeabi_fmul
           MOV      r1, r5
           POP      {r4-r6, lr}
           B        __aeabi_fsub
     armcc --fpu=vfpv2 float.c
       foo
           VADD.F32 s2, s0, s1
           VMUL.F32 s0, s1, s1
           VSUB.F32 s0, s0, s2
           BX       lr
     armcc --fpu=softvfp+vfpv2 float.c
       foo
           VMOV     s1,r0
           VMOV     s0,r1
           VADD.F32 s1,s1,s0
           VMUL.F32 s0,s0,s0
           VSUB.F32 s0,s0,s1
           VMOV     r0,s0
           BX       lr
     armcc --fpu=softvfp+vfpv2 --thumb float.c
  27. 27. AGENDA Embedded Software Development | Compilation Tools | Linking and Libraries | • Target Platforms | Debug | Invasive Debug | Non-Invasive Debug | Performance Monitoring
  28. 28. TARGET • Models – Programmer’s View model (PV) – Cycle Accurate model (CA) • Development boards • Final hardware
  29. 29. AGENDA Embedded Software Development | Compilation Tools | Linking and Libraries | Target Platforms | • Debug | Invasive Debug | Non-Invasive Debug | Performance Monitoring
  30. 30. WHY DEBUG? • Debugging can be a useful way to determine why events are occurring on your system • For example: – Why is an abort occurring when the core executes a particular function? – Why is an interrupt not being taken as expected? – Why am I not seeing the expected result for a set of computations? – Why does my application crash when X occurs? – What was happening when my application crashed? • ARM debug falls into two categories, invasive and non-invasive
  31. 31. TYPES OF DEBUG • Invasive – Any debug method that affects the state of the system – For example: • Stopping execution • Modifying registers • Reading from and writing to memory via the core • Non-invasive – Any debug method that does not affect the state of the system – For example: • Performance Monitoring Unit (without interrupts) • Trace
  32. 32. DEBUG INFRASTRUCTURE [Diagram: a debugger (e.g. DS-5) connects over USB/Ethernet to debug hardware (e.g. DStream), which connects over JTAG to the CoreSight infrastructure linking ARM debug logic and third-party IP] • ARM processors have integrated debug logic, which contains the necessary registers and comparators to perform debug operations – The Debug Status and Control Register (DSCR) in the debug logic controls the debug mode and state of the core • CoreSight is the standard for connecting together multiple debug components in a system – This course does not cover low-level debug information or CoreSight
  33. 33. ARM DEBUG LOGIC COMPONENTS [Diagram of a single-core system on chip. Labels: ARM Core, Debug Logic, ETM, ETB, Memory (Control, Address, Data), Debug Port, Trace Port]
  34. 34. AGENDA Embedded Software Development | Compilation Tools | Linking and Libraries | Target Platforms | Debug | • Invasive Debug | Non-Invasive Debug | Performance Monitoring
  35. 35. HALTING MODE DEBUG • Debug state – Core is halted and isolated from the rest of the system – Processor and system state can be viewed/modified – No interrupts will be handled until execution is restarted by the debugger • Entry into debug state is caused by – Request from external debug agent, or – Core hitting a breakpoint • In debug state, the core is isolated from the clock • The external debugger may read the status of core signals • Under the control of the debugger, the core may be made to execute instructions – This allows the debugger to read and modify system state
  36. 36. MONITOR MODE DEBUG • Used when it is not possible or desirable to halt the target CPU – e.g. hard disk controller, engine management system • External debugger communicates with the system via a resident software monitor – This is downloaded to the target by the debugger • Monitor program is entered via an exception – Caused when a BKPT instruction is executed – This instruction is placed in instruction memory by the debugger in order to set breakpoints • The debugger communicates with the monitor via a reserved channel called the “Debug Communications Channel” (DCC) • Breakpoints and watchpoints can be set when in Monitor Mode Debug – Using MRC, MCR instructions from a privileged mode
  37. 37. BREAKPOINTS AND WATCHPOINTS [Diagram: value comparators with mask and control logic monitor the instruction address bus and the data and address buses between the ARM core and memory; a match “tags” the instruction in the fetch/decode/execute pipeline, so the break will only occur if the instruction reaches the execute stage] • Separate comparators on instruction and data buses • Breakpoints for instructions, watchpoints for data
  38. 38. VECTOR CATCH • Dedicated logic for trapping exceptions – Sensitive only to hard exceptions – A branch into the vector table will not be trapped • Useful during early stages of development when software handlers may not be implemented • Allows the core to be reset and execution to stop at the reset vector – Prevents any code from being executed out of reset • Useful to trap data aborts
     Vector table:
       0x1C   FIQ
       0x18   IRQ
       0x14   (Reserved)
       0x10   Data Abort
       0x0C   Prefetch Abort
       0x08   Software Interrupt
       0x04   Undefined Instruction
       0x00   Reset
  39. 39. SINGLE STEPPING • Stepping can occur at high or low levels in a debugger • High level step – Step over a single line of C/C++ code • Can involve execution of many instructions • Usually implemented by setting an instruction breakpoint at the destination address and running the core • Low level step – Step a single machine instruction • Breakpoints are configured to halt the core at any address except the current one when the core is run • Execution is halted when the next instruction is about to be executed • Interrupts may or may not be respected when stepping (depending on debug logic configuration)
  40. 40. SOFTWARE INSTRUCTION BREAKPOINTS • Software instruction breakpoints rely on modifying the contents of memory – Can therefore only be used in RAM – In theory, an unlimited number of software breakpoints can be set • Debug tools use the BKPT instruction • The original instruction needs to be restored to “step off” the breakpoint – The original instruction is stepped first, and the breakpoint is put back when target execution starts [Diagram: setting a breakpoint in memory: 1. Read and store the opcode; 2. Write the BKPT opcode]
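The read-store-write sequence above can be sketched as host-side bookkeeping over an ordinary array standing in for target RAM. This is an illustrative model only, not how any particular debugger is implemented: the table size, type, and function names are hypothetical, 0xE1200070 is the ARM-state encoding of BKPT #0, and a real tool would also perform cache maintenance after each write.

```c
#include <stdint.h>

#define BKPT_OPCODE     0xE1200070u  /* ARM-state BKPT #0 */
#define MAX_BREAKPOINTS 8            /* arbitrary limit for this sketch */

typedef struct {
    uint32_t *addr;          /* patched location in (simulated) RAM */
    uint32_t  saved_opcode;  /* original instruction, restored on clear */
    int       in_use;
} sw_breakpoint;

static sw_breakpoint bp_table[MAX_BREAKPOINTS];

/* Sets a breakpoint: saves the original opcode, then writes BKPT.
   Returns a slot id, or -1 if the table is full. */
int bp_set(uint32_t *addr)
{
    int i, free_slot = -1;

    for (i = 0; i < MAX_BREAKPOINTS; i++) {
        if (bp_table[i].in_use && bp_table[i].addr == addr)
            return i;                     /* already set at this address */
        if (!bp_table[i].in_use && free_slot < 0)
            free_slot = i;
    }
    if (free_slot < 0)
        return -1;

    bp_table[free_slot].addr = addr;
    bp_table[free_slot].saved_opcode = *addr;   /* 1. read and store opcode */
    bp_table[free_slot].in_use = 1;
    *addr = BKPT_OPCODE;                        /* 2. write BKPT opcode */
    return free_slot;
}

/* Clears a breakpoint by restoring the saved opcode. Returns 0 on success. */
int bp_clear(int id)
{
    if (id < 0 || id >= MAX_BREAKPOINTS || !bp_table[id].in_use)
        return -1;
    *bp_table[id].addr = bp_table[id].saved_opcode;
    bp_table[id].in_use = 0;
    return 0;
}
```

Stepping off a breakpoint then amounts to bp_clear, a single-instruction step, and bp_set again at the same address.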
  41. 41. VIEWING MEMORY • When in debug state, most debuggers access memory through the processor – This means the debugger displays memory as seen by the processor • Will see the effects of memory management, caches, etc. • If a debugger reads or writes target memory, it may need to perform cache maintenance operations – For instance, to write a software breakpoint, the debugger may need to clean the data cache and invalidate the instruction cache • Beware of possible side-effects when accessing memory via the debugger [Diagram: chip containing the ARM core and DAP (Debug Access Port), connected to memory]
  42. 42. SEMIHOSTING • Semihosting: library code runs on the ARM target, low-level I/O support is provided by the debug tools [Diagram: application code calls printf("Hello World!\n"); library code issues SVC 0x123456] – Commonly used to provide file and string I/O before hardware-specific device drivers are available • Uses reserved SVC numbers (0x123456 or 0xAB) – ARM compilation tools use semihosting implementations for many default C library I/O functions (e.g. printf, scanf, fopen) • Full details in compiler documentation – Semihosting is supported by all of ARM's debug tools
  43. 43. AGENDA Embedded Software Development | Compilation Tools | Linking and Libraries | Target Platforms | Debug | Invasive Debug | • Non-Invasive Debug | Performance Monitoring
  44. 44. ARM TRACE LOGIC COMPONENTS [Diagram of a single-core system on chip. Labels: ARM Core, Debug Logic, ETM, ETB, Memory (Control, Address, Data), Debug Port, Trace Port] • What is trace? • Trace is non-invasive debug
  45. 45. ON-CHIP TRACE CAPTURE VS. OFF-CHIP • High speed, high bandwidth trace – Small on-chip embedded trace buffer (ETB) – Limited execution history and data capture – Useful for in-field failure analysis • Lower speed, lower bandwidth trace – Larger off-chip trace buffer • Trace port analyzer (e.g. RVT unit) – Increased execution history – Better for profiling and code coverage • More trace port pins, higher bandwidth
  46. 46. STANDARD DEBUG TECHNIQUES • Call stack – A call stack trace will show you the history of function calls up to the point the program halted • Single Step/Start/Stop – Allows you to execute • Single instructions • Single high-level source statements • Functions • Etc. • Printf – Simple text output (often generated via printf) is a very common way of tracking program execution, e.g. printf("Hello World!\n");
  47. 47. DEBUG SERVER VS. BARE METAL • A debug server (e.g. GDBserver) is a control program which runs on the target platform alongside the application you wish to debug – Requires that the application to be debugged is already resident on the target – Operates under Unix-based OSes (like Linux or Android) [Diagram: the debugger connects over TCP or a serial port to GDBServer running beside the application on the ARM target] • Bare metal debug – Used to debug non-OS based images, kernels and device drivers – Images can be dynamically downloaded to target memory [Diagram: the debugger connects to the ARM target over JTAG]
  48. 48. AGENDA Embedded Software Development | Compilation Tools | Linking and Libraries | Target Platforms | Debug | Invasive Debug | Non-Invasive Debug | • Performance Monitoring
  49. 49. PERFORMANCE MONITORING HARDWARE • ARMv7-A cores include a performance monitoring unit (PMU) • A PMU provides a non-intrusive method of collecting execution information from the core – Enabling the PMU does not change the timing of the core • The PMU provides: – Cycle counter: counts execution cycles (optional 1/64 divider) – Programmable event counters • The number of counters and available events vary between cores – The PMU can be configured to generate interrupts if a counter overflows • Counting the interrupts allows data to be collected over an arbitrarily long time period • Some examples common to most cores: – Cache hits or misses, TLB misses (on MMU cores), correct/incorrect branch predictions, number of instructions executed, etc. • Some events are architecturally defined while others are core-dependent – Check the ARM ARM and your core’s TRM for a full list
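Counting overflow interrupts extends a fixed-width counter to a longer measurement. A sketch of the arithmetic, assuming a 32-bit cycle counter and an overflow count maintained by a (hypothetical) interrupt handler; pmu_total_cycles is an illustrative name, and the race between reading the counter and taking an overflow interrupt is ignored here.

```c
#include <stdint.h>

/* Combines the number of counter overflows with the current 32-bit
   counter value into a 64-bit total. If the 1/64 divider is enabled,
   each counter tick represents 64 core cycles. */
uint64_t pmu_total_cycles(uint32_t overflow_count,
                          uint32_t counter_value,
                          int divider_enabled)
{
    uint64_t ticks = ((uint64_t)overflow_count << 32) | counter_value;

    return divider_enabled ? ticks * 64u : ticks;
}
```

Remember that handling the overflow interrupt is itself invasive, which is why the earlier slide lists the PMU as non-invasive only when used without interrupts.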
  50. 50. USING THE PMU IN LINUX • In an OS environment you may not have direct access to the PMU • Most OSes will provide some other method to access the PMU – Typically an API, e.g. Linux provides PerfEvents
       armv7_pmnc_enable_counter(ARMV7_CCNT);
       armv7pmu_start(void);
       armv7pmu_stop(void);
       armv7pmu_read_counter(ARMV7_CCNT);
  51. 51. HOW CLOSE TO REALITY? • When debugging you need to consider how close the development platform is to your final target hardware (listed here in order of decreasing realism): • Custom board (final hardware) – May not be available until late in the development cycle • Development board based on the same part – Available earlier with a similar base peripheral set, but may not include custom IP • Development platform based on the same core – Available very early, but could have a very different peripheral set/memory characteristics to the final design • Cycle Accurate model – Limited availability • Programmer’s View model – Available early, may not show errors due to timing or access ordering
  52. 52. ARE MY NUMBERS MEANINGFUL? • It is easy to get a set of numbers, but how can you ensure that they are meaningful? • System configuration – Are you configuring the core and board features (MMU/MPU, caches, branch prediction…) as they will be in the final design? • Semihosting – An “out of the box” build with DS-5 will use semihosting for many operations • For example input/output and calls to time() – Semihosted operations can take hundreds or thousands of extra cycles – Useful for getting something working, but will not be included in the final design • Code fragments: caches and interrupts – When testing small code sections (e.g. an algorithm) in isolation, you might get different performance from running the same code under an OS • Code may fit entirely within cache, without risk of being evicted • Might not have interrupts enabled
  53. 53. SOFTWARE & SYSTEMS DESIGN 5 – Developing Code for ARM
