1 
Embedded System - ARM 
Development and Optimization Sharing Session 
• Thumb2 Conditional Code
• Benchmark using cycle count
• Optimization Technique
• Mutex & ARM Exclusive Monitor
竹內宏輝 hirokiht <hiroki04030@yahoo.com>
2 
Thumb2 Conditional Code 
                   Thumb                      Thumb2
ISA                16bit                      16bit/32bit
Conditional Code   Only supported in branch   Use IT-block
Bitwise            -                          BFC/BFI/SBFX/UBFX
Table-branch       -                          TBB/TBH
List of Condition Codes:
• EQ, HS, MI, VS, HI, GE, GT, AL, NE, LO, PL, VC, LS, LT, LE
• Example:
    ITETT EQ
    MOVEQ r0, #1
    MOVNE r0, #0
    MOVEQ r1, #0
    MOVEQ r2, #0
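In C terms, the IT block in the example behaves roughly like the function below. This is only a sketch: `regs_t` and `itett_eq` are made-up names, and `eq` stands in for the Z flag set by an earlier compare. A compiler targeting Thumb2 may emit an IT block for code like this instead of branches, but it is not required to.

```c
#include <stdint.h>

/* Hypothetical register file holding r0..r2, for illustration only. */
typedef struct { uint32_t r0, r1, r2; } regs_t;

/* Mirrors: ITETT EQ / MOVEQ r0,#1 / MOVNE r0,#0 / MOVEQ r1,#0 / MOVEQ r2,#0.
 * The letters after "IT" (E, T, T) describe instructions 2..4:
 * T = same condition as the first instruction, E = the opposite condition. */
static void itett_eq(regs_t *r, int eq)
{
    if (eq) {
        r->r0 = 1;   /* MOVEQ r0, #1 */
        r->r1 = 0;   /* MOVEQ r1, #0 */
        r->r2 = 0;   /* MOVEQ r2, #0 */
    } else {
        r->r0 = 0;   /* MOVNE r0, #0 */
    }
}
```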
3 
Thumb2 Conditional Code
4 
Benchmark using cycle count 
Naive Approach:
• Measure the cycle count/SysTick/system time difference around the
code under test; less is better.
• QEMU doesn't have SysTick (and is not cycle-accurate), so this
can't be emulated. Have to use real hardware.
Commercial Approach:
• Use an SDK/Development Kit/Workbench with an accurately
emulated MCU. Very expensive.
5 
Benchmark using cycle count 
My Thoughts:
• For low-end hardware, the hardware is cheaper
than the commercial development tools: just
buy it and plug it in. The best benchmark is
a REAL benchmark on hardware.
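A minimal sketch of the cycle-count difference approach, assuming a Cortex-M3/M4 part that implements the DWT cycle counter (it is optional, and absent on Cortex-M0). The helper names `cycles_init`, `cycles_now`, `measure` and the workload `sum_to` are made up for illustration. On any other target the sketch falls back to `clock()`, which counts clock ticks rather than CPU cycles and is there only so the code builds on a PC.

```c
#include <stdint.h>
#include <time.h>

#if defined(__ARM_ARCH_7M__) || defined(__ARM_ARCH_7EM__)
/* Cortex-M3/M4 debug registers (ARMv7-M memory map). */
#define DEMCR      (*(volatile uint32_t *)0xE000EDFCu)
#define DWT_CTRL   (*(volatile uint32_t *)0xE0001000u)
#define DWT_CYCCNT (*(volatile uint32_t *)0xE0001004u)

static void cycles_init(void)
{
    DEMCR |= (1u << 24);   /* TRCENA: enable the DWT unit */
    DWT_CYCCNT = 0;        /* reset the counter */
    DWT_CTRL |= 1u;        /* CYCCNTENA: start counting cycles */
}
static uint32_t cycles_now(void) { return DWT_CYCCNT; }
#else
/* Host fallback: clock() ticks, NOT CPU cycles. */
static void cycles_init(void) { }
static uint32_t cycles_now(void) { return (uint32_t)clock(); }
#endif

/* Hypothetical workload to be measured. */
static uint32_t sum_to(uint32_t n)
{
    volatile uint32_t acc = 0;   /* volatile keeps the loop from being removed */
    for (uint32_t i = 1; i <= n; i++) acc += i;
    return acc;
}

/* Returns the elapsed count for one call of sum_to(n).
 * Unsigned subtraction stays correct across a single counter wrap. */
static uint32_t measure(uint32_t n, uint32_t *result)
{
    cycles_init();
    uint32_t start = cycles_now();
    *result = sum_to(n);
    return cycles_now() - start;
}
```

Running `measure` several times and comparing two implementations of the same workload on the same board gives the "less is better" number from the slide.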
6 
Optimization Technique 
Choose the best compilation options and
fully utilize the hardware (if
possible):
• Hardware Multiplication and Division
• DSP
• FPU
• DMA
• Saturation
• Thumb2 ISA
• Hardware controllers - Ethernet, Graphics,
SDIO, RTC, Encoder/Decoder, etc.
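To illustrate what the "Saturation" item saves, here is a portable saturating add written by hand. The function name `sat_add32` is made up. On cores with the DSP extension, a single QADD instruction performs the same clamping, so using the hardware (through the compiler or an intrinsic, depending on the toolchain) can replace all of this code.

```c
#include <stdint.h>

/* Portable saturating 32-bit add: clamps to the int32_t range
 * instead of overflowing. */
static int32_t sat_add32(int32_t a, int32_t b)
{
    int64_t sum = (int64_t)a + (int64_t)b;   /* widen so the sum cannot overflow */
    if (sum > INT32_MAX) return INT32_MAX;
    if (sum < INT32_MIN) return INT32_MIN;
    return (int32_t)sum;
}
```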
7 
Optimization Technique 
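• If the operation involves 2^n, think again;
maybe a bitwise operator can help
• Multiplication: x << n = x * 2^n
• Division: x >> n = x / 2^n (Note: the remainder is
lost; use unsigned x, since shifting a negative
value right is implementation-defined in C)
• Modulus: x & (2^n - 1) = x % 2^n (unsigned x)
• Check whether x is 2^n: (x & (x-1)) == 0, with x != 0
(the parentheses are required in C, because ==
binds tighter than &)
• Swap (without using tmp):
– x ^= y; y ^= x; x ^= y;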
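The identities above can be sketched as small C helpers. The function names are made up for illustration, `n` is assumed to be less than 32, and all values are unsigned. Note that the modulus mask is 2^n - 1, and that the XOR swap must not be applied to an object and itself.

```c
#include <stdint.h>
#include <stdbool.h>

/* x * 2^n (result wraps modulo 2^32) */
static uint32_t mul_pow2(uint32_t x, unsigned n) { return x << n; }

/* x / 2^n, remainder dropped */
static uint32_t div_pow2(uint32_t x, unsigned n) { return x >> n; }

/* x % 2^n: keep only the low n bits */
static uint32_t mod_pow2(uint32_t x, unsigned n) { return x & ((1u << n) - 1u); }

/* True if x is a power of two; 0 is excluded explicitly. */
static bool is_pow2(uint32_t x) { return x != 0 && (x & (x - 1u)) == 0; }

/* Swap without a temporary. */
static void xor_swap(uint32_t *x, uint32_t *y)
{
    if (x == y) return;   /* XOR-swapping an object with itself would zero it */
    *x ^= *y;
    *y ^= *x;
    *x ^= *y;
}
```

Modern compilers usually perform the multiply/divide/modulus rewrites on their own when the power of two is a compile-time constant, so these forms matter most when `n` is only known at run time.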
8 
Optimization Technique (for C) 
• Pointers are your friend: learn them, use them! (Esp. in memory-limited
environments)
• Struct/Union is the best tool to pack data
– Standard C (C89 and later) supports bit-field structs. (Bitwise shifting and
masking is the alternative when the exact bit layout must be controlled.)
– Member order is important, in memory and in files.
• If many boolean variables are needed, consider a "flag"
implementation
• #define can make code more readable, and can also be used
to define macros
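A short sketch of the packing ideas on this slide. All names (`FLAG_*`, `status_bits`, `loose`, `tight`) are made up for illustration, and the exact struct sizes depend on the compiler and ABI, so none are stated here.

```c
#include <stdint.h>
#include <stddef.h>

/* "Flag" implementation: several booleans packed into one byte. */
#define FLAG_READY  (1u << 0)
#define FLAG_ERROR  (1u << 1)
#define FLAG_BUSY   (1u << 2)

#define FLAG_SET(v, f)    ((v) |= (f))
#define FLAG_CLEAR(v, f)  ((v) &= (uint8_t)~(f))
#define FLAG_TEST(v, f)   (((v) & (f)) != 0)

/* Bit-field struct: the compiler does the shifting and masking.
 * The bit order within the storage unit is implementation-defined. */
struct status_bits {
    unsigned ready : 1;
    unsigned error : 1;
    unsigned busy  : 1;
    unsigned mode  : 3;   /* holds 0..7 */
};

/* Member order affects padding: the 32-bit member usually forces
 * alignment gaps around the single bytes in `loose`. */
struct loose { uint8_t a; uint32_t b; uint8_t c; };
struct tight { uint32_t b; uint8_t a; uint8_t c; };
```

Comparing `sizeof(struct loose)` with `sizeof(struct tight)` on the target shows how much the reordering saves there.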
9 
Optimization Technique 
Best Code Density != Best Code Performance
(most of the time)
• Decide before starting development
• Again, choose the compilation options accordingly
10 
Optimization Technique 
The algorithm is important; choose the one that
best fits your application.
• Searching, Sorting, Random, Hashing, etc.
The data structure is more important!
• But don't overkill: there is no need for an indexed list to
store 10 single-attribute items; a simple array will do.
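For the "simple array will do" case, a linear scan is often all the lookup code that is needed. A minimal sketch, with `find_item` as a made-up name:

```c
#include <stddef.h>
#include <stdint.h>

/* Linear search over a small array.
 * Returns the index of `key`, or -1 if it is not present. */
static int find_item(const uint16_t *items, size_t count, uint16_t key)
{
    for (size_t i = 0; i < count; i++) {
        if (items[i] == key) return (int)i;
    }
    return -1;
}
```

For around ten items this needs no extra memory and no index maintenance, which is hard to beat on a small MCU.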
11 
Optimization Technique 
But sometimes overkill is good:
• If the software needs 128MB RAM, there is no harm in giving it
512MB (memory is cheap; debugging time is expensive).
It leaves breathing room for further software modification.
• A well-structured program may not be needed for a small
project, but programs only grow bigger and bigger.
12 
Optimization Technique 
Standards and protocols are complicated, but
they're there for a reason:
• Well designed, implemented, and tested.
• Compatible with other software/hardware.
13 
Optimization Technique 
Don't reinvent the wheel; make the wheel
better.
• Google/Stack Overflow is your friend: see how other
people implement it and why.
• If there's an (open source) library, why not use it?
14 
Mutex & ARM Exclusive Monitor 
What is a Mutex?
• Mutual Exclusion - makes sure no more than one
concurrent process is in its critical section, so that
a race condition cannot occur. A mutex has only two
states: "locked" and "unlocked".
15 
Mutex & ARM Exclusive Monitor 
Mutex vs Binary Semaphore
• A semaphore controls access by multiple processes to a common resource. The typical
producer-consumer problem can be solved using semaphores.
• A binary semaphore is a semaphore whose value is only 0 or 1, meaning only one process can
access the resource at a time. Then how is it different from a mutex?
• A mutex has an owner concept: only the process that locked it can unlock it. A
semaphore doesn't. Process A may be using the resource while a buggy process B
"releases" the resource it doesn't own; process C then acquires it,
causing concurrent access with process A.
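The ownership difference can be modelled with two toy types. This is a single-threaded sketch for illustration only: the names (`toy_mutex_t`, `toy_bsem_t`, and their functions) are made up, task IDs are plain integers, and none of the operations are atomic, so it is not usable as a real lock.

```c
#include <stdbool.h>

#define NO_OWNER (-1)

typedef struct { bool locked; int owner; } toy_mutex_t;
typedef struct { int count; } toy_bsem_t;   /* 0 = taken, 1 = available */

static bool toy_mutex_lock(toy_mutex_t *m, int task)
{
    if (m->locked) return false;
    m->locked = true;
    m->owner = task;          /* remember who holds the lock */
    return true;
}

static bool toy_mutex_unlock(toy_mutex_t *m, int task)
{
    if (!m->locked || m->owner != task) return false;   /* only the owner may unlock */
    m->locked = false;
    m->owner = NO_OWNER;
    return true;
}

static bool toy_bsem_take(toy_bsem_t *s)
{
    if (s->count == 0) return false;
    s->count = 0;
    return true;
}

/* No ownership check: any task can release the semaphore. */
static void toy_bsem_give(toy_bsem_t *s) { s->count = 1; }
```

With the mutex, a stray `toy_mutex_unlock` from task B is rejected while A holds the lock. With the semaphore, B's stray `toy_bsem_give` succeeds, and C's next `toy_bsem_take` succeeds while A is still using the resource.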
16 
Mutex & ARM Exclusive Monitor 
ARM Exclusive Monitor
• Each processor that supports
exclusive accesses has a local
monitor. The local monitor tracks that
processor's own exclusive accesses
(to shared and non-shared memory),
whereas the global monitor tracks
exclusive accesses to shared memory. Access to
shared memory is checked against
both the local and the global monitor.
17 
Mutex & ARM Exclusive Monitor 
• While the monitor is in the "exclusive" state, an update of the marked
memory can succeed for only one process/thread.
• To enter the "exclusive" state, we must use LDREX. If we decide
to update the value, we use STREX, which writes the value and returns
to the "open access" state; otherwise it is advisable to use
CLREX to return to the "open access" state without changing any value.
• A STREX attempted in the "open access" state fails.
For safety, always check the STREX status result (0 = success,
1 = failure) and retry on failure.
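A hedged sketch of a lock built on this mechanism, written with C11 atomics rather than inline assembly so that it also builds on a PC. On ARMv7 targets, compilers such as GCC typically lower the compare-exchange to an LDREX/STREX retry loop, but that is a property of the toolchain, not something this code guarantees. The names `spin_mutex_t`, `spin_trylock`, `spin_lock` and `spin_unlock` are made up.

```c
#include <stdatomic.h>
#include <stdbool.h>

typedef struct { atomic_int state; } spin_mutex_t;   /* 0 = unlocked, 1 = locked */

/* One attempt to take the lock. Conceptually:
 *   LDREX  reads state and enters the "exclusive" state
 *   STREX  writes 1 only if the monitor is still exclusive
 * The boolean result plays the role of the checked STREX status. */
static bool spin_trylock(spin_mutex_t *m)
{
    int expected = 0;
    return atomic_compare_exchange_strong_explicit(
        &m->state, &expected, 1,
        memory_order_acquire, memory_order_relaxed);
}

/* Busy-wait until the lock is taken. */
static void spin_lock(spin_mutex_t *m)
{
    while (!spin_trylock(m)) { /* retry */ }
}

/* Release the lock with a plain atomic store. */
static void spin_unlock(spin_mutex_t *m)
{
    atomic_store_explicit(&m->state, 0, memory_order_release);
}
```

A real RTOS mutex would add the owner check from the earlier slide and would block or yield instead of spinning.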
