#ESCBOS
From Hw to Sw: Parallel Logic Applied to Event-Driven Firmware
Jonny Doin, GridVortex
  
#ESCBOS
From Hardware to Firmware
• Introduction
• Multitasking: the holy grail of computing
• Parallel computing and VHDL
• process() and sequential parallel logic
• Signals and Sensitivity lists in VHDL
• Signals and Sensitivity lists in Firmware
• Bit-banding on Cortex-M
• Event-driven scheduling
• Hardware scheduling and Multicore µC
• Final thoughts
  
#ESCBOS
Intro
In this talk we will see:
• Architectural aspects of multitasking
• Some techniques for implementing event-driven firmware
• Concepts of Hardware Design that can be applied to Firmware development
  
#ESCBOS
Multitasking
Multitasking is one of the most important concepts of modern computing.
Efficient use of processing bandwidth affects energy and real-time response.
Microcontrollers with over 200 MIPS are becoming very accessible to even the smallest applications.
Image: https://s-media-cache-ak0.pinimg.com/736x/d5/6e/06/d56e06a6441353a405456bbdc29df294.jpg
  
#ESCBOS
Multitasking (2)
Multitasking can be described as simulation of a parallel processing system using a smaller number of sequential processors.
Several multitasking schemes evolved over time for traditional computing systems:
• Priority-based scheduling and multithreading
• Collaborative multitasking
• Interrupt-based real-time systems
• Event-driven multitasking
  
#ESCBOS
Multitasking (3)
Multitasking schemes are a compromise:
• Cost of scheduling
• System blocking time
• Effective processing bandwidth
• System response time
[Figure: CPU time divided between user tasks and the scheduler]
  
#ESCBOS
Parallel processing and VHDL
Truly parallel systems can be implemented in digital hardware.
Languages to describe and design such systems have specific language features to describe parallel logic.
VHDL uses a state-based model to describe parallel processing.
  
#ESCBOS
process() and parallel logic
In VHDL, sections of sequential logic that run in parallel with the rest of the system are defined using the process() structure:
    -- Register, sequential logic
    counter: process (clk_i, cnt_clear) is
    begin
        if cnt_clear = '1' then
            cnt_reg <= 0;
        else
            if clk_i'event and clk_i = '1' then
                if cnt_ce = '1' then
                    cnt_reg <= cnt_next;
                end if;
            end if;
        end if;
    end process counter;

    -- Adder, combinational logic
    cnt_next <= cnt_reg + 1 when cnt_top = '0' else cnt_reg;
  
#ESCBOS
Signals and sensitivity lists
The process() definition includes a list of signals:
    process (clk_i, cnt_clear)
Logic in the process() is only “executed” when a signal declared in its sensitivity list changes state.
Any other logic in the circuit can alter the state of these signals, and when that happens, the process is executed.
Signals in VHDL have much more to them: each has a “transaction timeline” and supports scheduling future transactions on the signal.
  	
  
#ESCBOS
Signals and sensitivity lists (2)
VHDL sensitivity lists:
• Simple state-based, event-driven paradigm
• Simulate parallel hardware logic
• Simulators use processing bandwidth efficiently
The paradigm is based on the delta cycle, a concept similar to an execution pass of the logic. All signals will be assigned their values only at the end of the current delta cycle.
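The delta-cycle rule above can be sketched in C as a two-buffer signal store. This is a minimal host-side sketch, not VHDL simulator code: the names `signals_t`, `sig_read`, `sig_assign`, `delta_commit` and `swap_pass` are hypothetical and chosen only for illustration. Processes read the current values and write the next values, and all assignments become visible together when the pass ends.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_SIGNALS 4u

/* Hypothetical signal store: 'current' is what processes read during a
 * pass, 'next' collects the assignments made during that pass. */
typedef struct {
    uint32_t current[NUM_SIGNALS];
    uint32_t next[NUM_SIGNALS];
} signals_t;

static uint32_t sig_read(const signals_t *s, unsigned id)
{
    assert(id < NUM_SIGNALS);
    return s->current[id];
}

static void sig_assign(signals_t *s, unsigned id, uint32_t value)
{
    assert(id < NUM_SIGNALS);
    s->next[id] = value;    /* not visible until the pass ends */
}

/* End of the delta cycle: every scheduled value becomes visible at once.
 * Returns the number of signals that changed. */
static unsigned delta_commit(signals_t *s)
{
    unsigned changed = 0;
    for (unsigned i = 0; i < NUM_SIGNALS; i++) {
        if (s->current[i] != s->next[i]) {
            s->current[i] = s->next[i];
            changed++;
        }
    }
    return changed;
}

/* Two "processes" that swap signals 0 and 1. The order of the two
 * assignments does not matter, because both read pre-pass values. */
static void swap_pass(signals_t *s)
{
    sig_assign(s, 0, sig_read(s, 1));
    sig_assign(s, 1, sig_read(s, 0));
    (void)delta_commit(s);
}
```

Because reads and writes go to different buffers, the swap works without a temporary variable, which is the same behaviour two concurrent VHDL signal assignments would show.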
  	
  
#ESCBOS
Signals and sensitivity lists (3)
The VHDL concepts of process() with sensitivity lists and delta cycles can be implemented in bare-metal firmware to achieve multitasking with low processing cost.
The benefits of these elements of multitasking are:
• Fast event-driven scheduling
• Structural integrity of the logic
• Scalability for multicore systems
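One way to picture a process() with a sensitivity list in firmware is a table of handlers, each paired with a mask of the event bits it reacts to. The sketch below is an assumption made for illustration, not the scheduler used in the talk: `process_t`, `process_table`, `scheduler_pass` and the `EV_*` names are hypothetical, and the event word is a plain variable so the code runs on a host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical process descriptor: a handler plus the set of event bits
 * it is sensitive to (the firmware analogue of a sensitivity list). */
typedef struct {
    uint32_t sensitivity;
    void (*run)(uint32_t fired);
} process_t;

enum { EV_TICK = 1u << 0, EV_CLEAR = 1u << 1, EV_REDRAW = 1u << 2 };

static unsigned counter_runs;
static unsigned display_runs;

static void process_counter(uint32_t fired) { (void)fired; counter_runs++; }
static void process_display(uint32_t fired) { (void)fired; display_runs++; }

static const process_t process_table[] = {
    { EV_TICK | EV_CLEAR, process_counter },
    { EV_REDRAW,          process_display },
};

/* One scheduler pass: snapshot and clear the pending events, then call
 * only the processes whose sensitivity list matches the snapshot.
 * On a real target the snapshot-and-clear step would have to be atomic
 * with respect to interrupts; that is left out of this host sketch. */
static unsigned scheduler_pass(uint32_t *pending)
{
    assert(pending != NULL);
    uint32_t fired = *pending;
    unsigned calls = 0;

    *pending = 0;
    for (size_t i = 0; i < sizeof process_table / sizeof process_table[0]; i++) {
        uint32_t hit = fired & process_table[i].sensitivity;
        if (hit != 0) {
            process_table[i].run(hit);
            calls++;
        }
    }
    return calls;
}
```

A pass with no pending events calls no handler at all, which is where the low processing cost comes from.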
  
#ESCBOS
Bit-banding on Cortex-M
ARM Cortex-M cores that implement bit-banding (an optional feature of the Cortex-M3 and Cortex-M4) have dedicated memory addressing hardware to implement atomic bit-access in memory without read-modify-write artifacts.
• Bit-signals can be used as efficient Inter Process Communication (IPC)
• Fastest atomic operations in a Cortex-M (faster than LDREX/STREX)
• Map to a special area in RAM
  
#ESCBOS
Bit-banding on Cortex-M (2)
Figure 3-1 System address map, summarized:

    Region                    Start        Size
    Code                      0x00000000   0.5GB
    SRAM                      0x20000000   0.5GB
    Peripheral                0x40000000   0.5GB
    External RAM              0x60000000   1.0GB
    External device           0xA0000000   1.0GB
    Private peripheral bus    0xE0000000   internal at 0xE0000000 (ITM, DWT, FPB, SCS), external at 0xE0040000 (TPIU, ETM, ROM table)
    System                    0xE0100000   up to 0xFFFFFFFF

    Bit-band areas            Bit-band region (1MB)   Bit-band alias (32MB)
    SRAM                      0x20000000              0x22000000
    Peripheral                0x40000000              0x42000000

• Hardware remapping of accesses
• Known addresses for any Cortex-M
• Atomic writes on individual bits
• Simultaneous reads on all 32 bits
source: ARM DDI 0439C, page 3-20
  
#ESCBOS
Bit-banding on Cortex-M (3)
Bit-banding memory remap structure:
• Words (32-bit) in the alias region map to individual bits in the normal SRAM memory
• The remapped writes are guaranteed atomic
Example: the alias word at 0x2200001C maps to bit [7] of the bit-band byte at 0x20000000: 0x2200001C = 0x22000000 + (0*32) + 7*4.
[Figure 3-2 Bit-band mapping: each word of the 32MB alias region (0x22000000 to 0x23FFFFFC) maps to one bit of a byte in the 1MB SRAM bit-band region (0x20000000 to 0x200FFFFF)]
source: ARM DDI 0439C, page 3-20
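The mapping rule quoted from the ARM manual can be written as a small address calculation. This is a sketch that only computes addresses, so it runs on any host; the function name `bitband_alias_addr` and the macro names are hypothetical, while the base addresses are the SRAM values from the address map.

```c
#include <assert.h>
#include <stdint.h>

#define SRAM_BITBAND_BASE 0x20000000u   /* start of the 1MB bit-band region */
#define SRAM_BITBAND_SIZE 0x00100000u   /* 1MB                              */
#define SRAM_ALIAS_BASE   0x22000000u   /* start of the 32MB alias region   */

/* Address of the alias word that maps to bit 'bit' (0..7) of the byte at
 * 'byte_addr' in the SRAM bit-band region:
 * alias = alias_base + (byte_offset * 32) + (bit * 4) */
static uint32_t bitband_alias_addr(uint32_t byte_addr, unsigned bit)
{
    assert(byte_addr >= SRAM_BITBAND_BASE);
    assert(byte_addr - SRAM_BITBAND_BASE < SRAM_BITBAND_SIZE);
    assert(bit < 8u);

    uint32_t byte_offset = byte_addr - SRAM_BITBAND_BASE;
    return SRAM_ALIAS_BASE + byte_offset * 32u + bit * 4u;
}
```

For the manual's own example, `bitband_alias_addr(0x20000000u, 7)` gives 0x2200001C, and the last bit of the region, bit 7 of the byte at 0x200FFFFF, lands on the last alias word at 0x23FFFFFC.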
  
#ESCBOS
Event-driven scheduling
Using the concepts from VHDL and the atomic bit-banding from Cortex-M it is possible to:
• Implement event-driven multitasking
• Have process()-like handlers with light overhead
• Implement state machine logic efficiently
• Use bit signals as efficient IPC
  
#ESCBOS
Event-driven scheduling (2)
    #include <stdint.h>

    // Pointer to flag words; the pointee is volatile because interrupt handlers
    // and other processes change the flags outside the compiler's view
    typedef volatile uint32_t *PFLAGS_T;

    typedef volatile struct ipc_flags_t {   // any object of this type is volatile qualified
        PFLAGS_T pflags_bits;   // Ptr to the 'bit bandable' word with 32 ipc bits
        PFLAGS_T pflags_base;   // Ptr to the base of the word alias array
    } IPC_FLAGS_T;

    // for the ipc macros, pass an IPC_FLAGS_T struct
    #define get_bit(flags, bit)      ((flags).pflags_base[(bit)])
    #define set_bit(flags, bit)      ((flags).pflags_base[(bit)] = 1)
    #define clr_bit(flags, bit)      ((flags).pflags_base[(bit)] = 0)
    // toggle reads and then writes the alias word, so it is not a single atomic access
    #define toggle(flags, bit)       ((flags).pflags_base[(bit)] ^= 1)
    // event: test the bit and, if it is set, clear it and yield 1
    #define event(flags, bit)        (get_bit((flags), (bit)) ? ((clr_bit((flags), (bit))), 1) : 0)
    #define clr_bits(flags)          (*((flags).pflags_bits) = 0)
    #define get_bits(flags, bitmask) (*((flags).pflags_bits) & (bitmask))

    extern void init_ipc(void);
    extern uint32_t request_ipc_word(IPC_FLAGS_T *pflags);
#ESCBOS
Event-driven scheduling (3)
    #define set_bit(flags, bit) ((flags).pflags_base[(bit)] = 1)
so:
    set_bit(myflags, 7);
translates to:
    myflags.pflags_base[7] = 1;
where:
    IPC_FLAGS_T myflags;
    myflags.pflags_base = (PFLAGS_T) 0x22000000;
    myflags.pflags_bits = (PFLAGS_T) 0x20000000;
[Figure: the write stores 0x00000001 in the alias word for bit 7, inside the 32-word alias array that spans 0x22000000 to 0x22000080 in the bit-band alias area; as a result the flag word at 0x20000000 in the bit-band region reads 0x00000080]
  
#ESCBOS
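Event-driven scheduling (4)
    #define event(flags, bit) (get_bit((flags), (bit)) ? ((clr_bit((flags), (bit))), 1) : 0)
so:
    if(event(myflags, 7))
    {
        ...
    }
translates, when bit 7 is set, to:
    if(((myflags.pflags_base[7] = 0), 1))
where the comma operator separates the side effect part (clearing the bit) from the result (1). After evaluation of the side effect, this becomes:
    if((1))
When bit 7 is clear, the condition evaluates to 0 and nothing is written.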
  
#ESCBOS
Event-driven scheduling (5)
    enum keypad_bits_t {
        bit_keypad_value_update = 0,
        bit_keypressed_wait,
        bit_refresh_debounce_tmr,
    };

    // Keypad process: its logic runs only when its event bit has fired
    void process_keypad(void)
    {
        if(event_refresh_debounce_tmr())
        {
            keypad_data.debounce_tmr = KEYPAD_DEBOUNCE_TIME;
            keypad_data.state = KEYPAD_DEBOUNCE;
        }
        ...
    }

    // Trigger: latches the keypad value and signals the keypad process
    static void trigger_keypad_update(void *object)
    {
        (void)object;
        keypad_data.latched = read_keypad_value();
        set_bit_refresh_debounce_tmr();
    }
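The slide does not show how `event_refresh_debounce_tmr()` and `set_bit_refresh_debounce_tmr()` are defined. The sketch below is one plausible reading, stated as an assumption: they are thin wrappers over the `event` and `set_bit` macros for one named bit. The `keypad_flags` object, the `keypad_alias` array and the `debounce_restarts` counter are hypothetical, the handler bodies are reduced to a counter, and the alias words are an ordinary array so the code runs on a host rather than in the bit-band alias region.

```c
#include <assert.h>
#include <stdint.h>

typedef volatile uint32_t *PFLAGS_T;
typedef volatile struct ipc_flags_t {
    PFLAGS_T pflags_bits;   /* flag word (unused in this host sketch) */
    PFLAGS_T pflags_base;   /* base of the 32-word alias array        */
} IPC_FLAGS_T;

#define get_bit(flags, bit) ((flags).pflags_base[(bit)])
#define set_bit(flags, bit) ((flags).pflags_base[(bit)] = 1)
#define clr_bit(flags, bit) ((flags).pflags_base[(bit)] = 0)
#define event(flags, bit)   (get_bit((flags), (bit)) ? ((clr_bit((flags), (bit))), 1) : 0)

enum keypad_bits_t {
    bit_keypad_value_update = 0,
    bit_keypressed_wait,
    bit_refresh_debounce_tmr,
};

/* Host stand-in for the alias words; on a target these would be the 32
 * alias words of one bit-band flag word. */
static volatile uint32_t keypad_alias[32];
static IPC_FLAGS_T keypad_flags = { 0, keypad_alias };

/* Hypothetical per-signal wrappers, one pair per named bit. */
#define set_bit_refresh_debounce_tmr() set_bit(keypad_flags, bit_refresh_debounce_tmr)
#define event_refresh_debounce_tmr()   event(keypad_flags, bit_refresh_debounce_tmr)

static unsigned debounce_restarts;

/* Simplified process: counts how often the event was consumed. */
static void process_keypad(void)
{
    if (event_refresh_debounce_tmr()) {
        debounce_restarts++;
    }
}

/* Simplified trigger: only raises the signal. */
static void trigger_keypad_update(void)
{
    assert(keypad_flags.pflags_base != 0);
    set_bit_refresh_debounce_tmr();
}
```

Each trigger is consumed exactly once: calling `process_keypad()` twice after a single `trigger_keypad_update()` restarts the debounce logic only on the first call.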
#ESCBOS
Event-driven scheduling (6)
This event-driven architecture:
• Is simple to implement
• Scales well even with multicore Cortex-M systems
• Improves processing granularity
• Can be implemented in hardware on ARM+FPGA systems
  
#ESCBOS
Hardware scheduling
The event-driven scheduling can be implemented directly in hardware on an ARM+FPGA system.
Instead of using a round-robin cycle in firmware, the underlying hardware can place a “call” to each process() according to its sensitivity list.
This approach can reduce overhead to a few instruction cycles for a very responsive real-time system.
  
#ESCBOS
Multicore Cortex-M devices
The event-driven paradigm can be effectively implemented in a multicore Cortex-M system with common memory.
[Figure: Cortex-M cores connected through a bus matrix to shared RAM and shared flash]
Image: http://hothardware.com/newsimages/Item9563/cortex-m3-arm-cpu.png
This approach simplifies system partitioning on the processor cores, and can decrease system response time for event-driven bare-metal logic.
Even when no bit-banding is available in the shared memory, atomic events can be used.
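The slide does not say how atomic events are built when bit-banding is unavailable. One possible approach, given here as an assumption rather than the author's method, is an atomic read-modify-write on a shared event word using C11 `<stdatomic.h>`. The names `event_word_t`, `event_set` and `event_take` are hypothetical. On a real multicore part this also assumes that the shared memory supports the exclusive or atomic accesses the compiler emits.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical shared event word: 32 event bits in one atomic object. */
typedef _Atomic(uint32_t) event_word_t;

/* Raise an event bit; safe against concurrent setters and takers. */
static inline void event_set(event_word_t *w, unsigned bit)
{
    assert(bit < 32u);
    atomic_fetch_or(w, UINT32_C(1) << bit);
}

/* Atomically test and clear an event bit. Returns true exactly once for
 * each time the bit went from clear to set. */
static inline bool event_take(event_word_t *w, unsigned bit)
{
    assert(bit < 32u);
    uint32_t mask = UINT32_C(1) << bit;
    uint32_t old = atomic_fetch_and(w, ~mask);
    return (old & mask) != 0;
}
```

Unlike the test-then-clear `event` macro, `event_take` tests and clears in one atomic step, so two cores polling the same bit cannot both consume the same event.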
  
#ESCBOS
Final Thoughts
The event-driven paradigm is a powerful and scalable architectural structure.
It is being used in bare-metal embedded systems with over 300 KLOC.
If coupled with hardware scheduling support, it can be used to implement very fast event response systems that are very hard to implement with priority-based schedulers.
  
#ESCBOS
Thank you
Jonny Doin
jonnydoin@gridvortex.com
  
	
  
