ParallelLogicToEventDrivenFirmware_Doin

#ESCBOS
From Hw to Sw: Parallel Logic Applied to Event-Driven Firmware

Jonny Doin – GridVortex

#ESCBOS
From Hardware to Firmware
•  Introduction
•  Multitasking: the holy grail of computing
•  Parallel computing and VHDL
•  process() and sequential parallel logic
•  Signals and Sensitivity lists in VHDL
•  Signals and Sensitivity lists in Firmware
•  Bit-banding on Cortex-M
•  Event-driven scheduling
•  Hardware scheduling and Multicore µC
•  Final thoughts

#ESCBOS
Intro
In this talk we will see:

•  Architectural aspects of multi-tasking
•  Some techniques for implementing event-driven firmware
•  Concepts of Hardware Design that can be applied to Firmware development

#ESCBOS
Multitasking
Multitasking is one of the most important concepts of modern computing.

Efficient use of processing bandwidth affects energy and real-time response.

Microcontrollers with over 200 MIPS are becoming very accessible to even the smallest applications.

https://s-media-cache-ak0.pinimg.com/736x/d5/6e/06/d56e06a6441353a405456bbdc29df294.jpg

#ESCBOS
Multitasking (2)
Multitasking can be described as the simulation of a parallel processing system using a smaller number of sequential processors.

Several multitasking schemes evolved over time for traditional computing systems:

•  Priority-based scheduling and multithreading
•  Collaborative (cooperative) multitasking
•  Interrupt-based real-time systems
•  Event-driven multitasking

#ESCBOS
Multitasking (3)
Multitasking schemes are a compromise:

•  Cost of scheduling
•  System blocking time
•  Effective processing bandwidth
•  System response time

[Figure: CPU time spent in the user task versus CPU time spent in the scheduler]

#ESCBOS
Parallel processing and VHDL
Truly parallel systems can be implemented in digital hardware.

Languages to describe and design such systems have specific language features to describe parallel logic.

VHDL uses a state-based model to describe parallel processing.

#ESCBOS
process() and parallel logic
In VHDL, sections of sequential logic that run in parallel with the rest of the system are defined using the process() structure:

-- Register, sequential logic
counter: process (clk_i, cnt_clear) is
begin
    if cnt_clear = '1' then
        cnt_reg <= 0;
    else
        if clk_i'event and clk_i = '1' then
            if cnt_ce = '1' then
                cnt_reg <= cnt_next;
            end if;
        end if;
    end if;
end process counter;

-- Adder, combinational logic
cnt_next <= cnt_reg + 1 when cnt_top = '0' else cnt_reg;

#ESCBOS
Signals and sensitivity lists
The process() definition includes a list of signals:

process (clk_i, cnt_clear)

Logic in the process() is only “executed” when any signal declared on its sensitivity list changes state.

Any other logic in the circuit can alter the state of these signals, and when that happens, the process is executed.

VHDL signals carry more than a current value. Each signal has a “transaction timeline”, so future transactions can be scheduled on the signal.

#ESCBOS
Signals and sensitivity lists (2)
VHDL sensitivity lists:

•  Simple state-based, event-driven paradigm
•  Simulate parallel hardware logic
•  Simulators use processing bandwidth efficiently

The paradigm is based on the delta cycle, a concept similar to an execution pass of the logic. All signals will be assigned their values only at the end of the current delta cycle.
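
The delta-cycle rule can be illustrated with a small host-side sketch in C. Two hypothetical "processes" cross-assign two signals (a <= b and b <= a); because each one reads only the current values and writes into a separate next-value copy, the two signals swap cleanly in one pass. All names here (`signals_t`, `proc_a`, `proc_b`, `delta_cycle`) are assumptions made for illustration and do not come from the talk.

```c
#include <stdint.h>
#include <stdbool.h>

/* Two signals, standing in for VHDL signals a and b. */
typedef struct {
    uint32_t a;
    uint32_t b;
} signals_t;

/* Each process reads only `cur` and writes only `nxt`. */
static void proc_a(const signals_t *cur, signals_t *nxt) { nxt->a = cur->b; } /* a <= b */
static void proc_b(const signals_t *cur, signals_t *nxt) { nxt->b = cur->a; } /* b <= a */

/* Runs one delta cycle; returns true if any signal changed value. */
bool delta_cycle(signals_t *sig)
{
    signals_t nxt = *sig;      /* pending values start as the current ones */

    proc_a(sig, &nxt);         /* execution order does not matter: */
    proc_b(sig, &nxt);         /* both processes see the old values */

    bool changed = (nxt.a != sig->a) || (nxt.b != sig->b);
    *sig = nxt;                /* all signals get their values at the end */
    return changed;
}
```

Calling `delta_cycle()` on a = 1, b = 2 leaves a = 2, b = 1. With direct assignment instead of the deferred commit, the second process would read the value already overwritten by the first.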

#ESCBOS
Signals and sensitivity lists (3)
The VHDL concepts of process() with sensitivity lists and delta cycles can be implemented in bare-metal firmware to achieve multitasking with low processing cost.

The benefits of these elements of multitasking are:

•  Fast event-driven scheduling
•  Structural integrity of the logic
•  Scalability for multicore systems
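
One way to picture a process() with a sensitivity list in firmware is a table of handlers, each with a bit mask of the signals it reacts to, visited by a round-robin pass. The sketch below is a host-runnable illustration under that assumption. The type and function names, the two example processes and the plain `pending_signals` word are all hypothetical; the talk's own implementation uses bit-band flags tested inside each handler.

```c
#include <stdint.h>
#include <stddef.h>

typedef void (*process_fn)(uint32_t fired);

typedef struct {
    uint32_t   sensitivity;   /* bit mask of signals this process reacts to */
    process_fn run;           /* handler body, runs to completion */
} process_t;

/* One bit per signal. A plain |= is NOT interrupt-safe; a real target
 * would need bit-banding or another atomic operation here. */
static volatile uint32_t pending_signals;

void signal_set(unsigned bit) { pending_signals |= (1u << bit); }

/* One round-robin pass: runs every process whose sensitivity list
 * matches a pending signal. Returns the number of processes executed.
 * Signals raised during the pass are seen by the next pass. */
unsigned scheduler_pass(const process_t *table, size_t count)
{
    uint32_t snapshot = pending_signals;
    pending_signals &= ~snapshot;          /* consume what this pass saw */

    unsigned executed = 0;
    for (size_t i = 0; i < count; i++) {
        uint32_t fired = snapshot & table[i].sensitivity;
        if (fired != 0u) {
            table[i].run(fired);
            executed++;
        }
    }
    return executed;
}

/* Example processes (hypothetical): count how often each one runs. */
enum { SIG_KEY = 0, SIG_TICK = 1 };
static unsigned keypad_runs, display_runs;
static void keypad_process(uint32_t fired)  { (void)fired; keypad_runs++; }
static void display_process(uint32_t fired) { (void)fired; display_runs++; }

static const process_t process_table[] = {
    { (1u << SIG_KEY),                    keypad_process  },
    { (1u << SIG_KEY) | (1u << SIG_TICK), display_process },
};
```

A pass with no pending signal costs only the mask tests, which is where the low scheduling cost comes from in this kind of design.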

#ESCBOS
Bit-banding on Cortex-M
ARM Cortex-M cores that implement bit-banding (an optional feature of the Cortex-M3 and Cortex-M4) have dedicated memory addressing hardware that gives atomic bit access in memory, with no read-modify-write sequence in software.

•  bit-signals can be used as efficient Inter Process Communication (IPC)
•  Fastest atomic operations in a Cortex-M (faster than LDREX/STREX)
•  Map to a special area in RAM
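
To make the read-modify-write problem concrete, the sketch below replays on a host, step by step, what can happen when a main loop executes `flags |= bit` while an interrupt sets another bit of the same word. The function names and bit numbers are made up for illustration, and the "interrupt" is just a statement placed between the load and the store; on a real device the interleaving would come from an actual interrupt.

```c
#include <stdint.h>

/* Replays a lost update on a shared flag word; returns the final word. */
uint32_t lost_update_demo(void)
{
    uint32_t flags = 0;

    uint32_t tmp = flags;      /* main loop: LOAD the flag word        */
    flags |= (1u << 3);        /* "interrupt" sets bit 3 in between    */
    tmp |= (1u << 7);          /* main loop: MODIFY its stale copy     */
    flags = tmp;               /* main loop: STORE, bit 3 is lost      */

    return flags;
}

/* Same sequence when each writer touches only its own bit with a single
 * store, which is what a write to a bit-band alias word amounts to. */
uint32_t single_bit_store_demo(void)
{
    uint32_t bit[32] = { 0 };  /* host stand-in for 32 alias words */

    bit[3] = 1;                /* "interrupt": one store, no prior read */
    bit[7] = 1;                /* main loop: one store                  */

    uint32_t flags = 0;        /* rebuild the 32-bit view of the flags  */
    for (unsigned i = 0; i < 32; i++) {
        flags |= bit[i] << i;
    }
    return flags;
}
```

In the first function the interrupt's bit disappears; in the second both bits survive because no writer ever holds a stale copy of the whole word.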

#ESCBOS
Bit-banding on Cortex-M (2)
[Figure 3-1: System address map. The SRAM region holds a 1MB bit-band region at 0x20000000 with its 32MB bit-band alias at 0x22000000; the Peripheral region holds a 1MB bit-band region at 0x40000000 with its 32MB bit-band alias at 0x42000000.]

•  Hardware remapping of accesses
•  Known addresses on any Cortex-M that implements bit-banding
•  Atomic writes on individual bits
•  Simultaneous reads on all 32 bits

source: ARM DDI 0439C, page 3-20

#ESCBOS
Bit-banding on Cortex-M (3)
Bit-banding memory remap structure:

•  Words (32bit) in the alias region map to individual bits in the normal SRAM memory
•  The remapped writes are guaranteed atomic
•  The alias word at 0x2200001C maps to bit [7] of the bit-band byte at 0x20000000: 0x2200001C = 0x22000000 + (0*32) + 7*4.

[Figure 3-2: Bit-band mapping. Each 32-bit word of the 32MB alias region (0x22000000 to 0x23FFFFFC) maps to one bit of the 1MB SRAM bit-band region (0x20000000 to 0x200FFFFF).]

source: ARM DDI 0439C, page 3-20
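
The mapping rule quoted above (alias base, plus byte offset times 32, plus bit number times 4) can be written as a small helper. This is a sketch for the SRAM bit-band region only; the macro and function names are assumptions made for this example. It only computes the address, so it runs on a host; on a target the result would be cast to a `volatile uint32_t *` before being written.

```c
#include <stdint.h>

#define BITBAND_SRAM_BASE   0x20000000u  /* start of the 1MB SRAM bit-band region */
#define BITBAND_ALIAS_BASE  0x22000000u  /* start of the 32MB alias region        */

/* Address of the alias word for bit `bit` (0..7) of the byte at `byte_addr`:
 * alias = alias_base + (byte_offset * 32) + (bit * 4) */
uint32_t bitband_alias_addr(uint32_t byte_addr, unsigned bit)
{
    uint32_t byte_offset = byte_addr - BITBAND_SRAM_BASE;
    return BITBAND_ALIAS_BASE + (byte_offset * 32u) + ((uint32_t)bit * 4u);
}
```

For bit 7 of the byte at 0x20000000 this gives 0x2200001C, the value worked out on the slide, and for bit 7 of the last byte of the region (0x200FFFFF) it gives 0x23FFFFFC, the last word of the alias region.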

#ESCBOS
Event-driven scheduling
Using the concepts from VHDL and the atomic Bit-banding from Cortex-M it is possible to:

•  Implement event-driven multitasking
•  Have process()-like handlers with light overhead
•  Implement state machine logic efficiently
•  Use bit signals as efficient IPC

#ESCBOS
Event-driven scheduling (2)
#include <stdint.h>

// The pointee is volatile so that every flag access really reaches memory
typedef volatile uint32_t * PFLAGS_T;
typedef volatile struct ipc_flags_t { // any object of this type is volatile qualified
    PFLAGS_T pflags_bits; // Ptr to the 'bit bandable' word with 32 ipc bits
    PFLAGS_T pflags_base; // Ptr to the base of the word alias array
} IPC_FLAGS_T;

// for the ipc macros, pass a IPC_FLAGS_T struct
#define get_bit(flags, bit) ((flags).pflags_base[(bit)])        // read one bit as 0 or 1
#define set_bit(flags, bit) ((flags).pflags_base[(bit)] = 1)    // single atomic store
#define clr_bit(flags, bit) ((flags).pflags_base[(bit)] = 0)    // single atomic store
#define toggle(flags, bit) ((flags).pflags_base[(bit)] ^= 1)    // read then write: two accesses
// test-and-clear: evaluates to 1 once per set bit; the test and the clear are separate accesses
#define event(flags, bit) (get_bit((flags), (bit)) ? ((clr_bit((flags), (bit))), 1) : 0)
#define clr_bits(flags) (*((flags).pflags_bits) = 0)            // clear all 32 bits at once
#define get_bits(flags, bitmask) (*((flags).pflags_bits) & (bitmask)) // read several bits at once

extern void init_ipc(void);
extern uint32_t request_ipc_word(IPC_FLAGS_T *pflags);

#ESCBOS
#define set_bit(flags, bit) ((flags).pflags_base[(bit)] = 1)

so:

set_bit(myflags, 7);

translates to:

myflags.pflags_base[7] = 1;

where:

IPC_FLAGS_T myflags;
myflags.pflags_base = (PFLAGS_T) 0x22000000;
myflags.pflags_bits = (PFLAGS_T) 0x20000000;
...

[Figure: the 32 alias words of the flag word span 0x22000000 to 0x22000080 in the bit-band alias area; writing 0x00000001 to the alias word for bit 7 makes the word at 0x20000000 in the bit-band region read 0x00000080.]

#ESCBOS
#define event(flags, bit) (get_bit((flags), (bit)) ? ((clr_bit((flags), (bit))), 1) : 0)

so:

if(event(myflags, 7))
{
    ...
}

when bit 7 is set, translates to:

if(((myflags.pflags_base[7] = 0), 1))

after evaluation of the side effect, becomes:

if((1))

In the comma operator, the left operand (the clear) is the side effect part and the right operand (1) is the result.
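
The test-and-clear behaviour of event() can be tried on a host by pointing the flags structure at an ordinary array instead of the bit-band alias area. The macros below are taken from the slides; the arrays `fake_bits` and `fake_alias`, the initialiser for `myflags` and the function `poll_bit7` are assumptions added for this sketch, and the pointee of PFLAGS_T is declared volatile here. On a host the alias array is not tied to the bits word, so only the per-bit macros are meaningful.

```c
#include <stdint.h>

typedef volatile uint32_t *PFLAGS_T;
typedef volatile struct ipc_flags_t {
    PFLAGS_T pflags_bits;   /* word holding the 32 ipc bits */
    PFLAGS_T pflags_base;   /* base of the word alias array */
} IPC_FLAGS_T;

#define get_bit(flags, bit) ((flags).pflags_base[(bit)])
#define set_bit(flags, bit) ((flags).pflags_base[(bit)] = 1)
#define clr_bit(flags, bit) ((flags).pflags_base[(bit)] = 0)
#define event(flags, bit) (get_bit((flags), (bit)) ? ((clr_bit((flags), (bit))), 1) : 0)

/* Host stand-ins for the bit-band word and its 32 alias words. */
static volatile uint32_t fake_bits;
static volatile uint32_t fake_alias[32];
static IPC_FLAGS_T myflags = { &fake_bits, fake_alias };

/* Consumer: returns 1 if the event on bit 7 was pending, and clears it. */
unsigned poll_bit7(void)
{
    unsigned seen = 0;
    if (event(myflags, 7)) {
        seen = 1;           /* the handler body would go here */
    }
    return seen;
}
```

After `set_bit(myflags, 7)`, the first call to `poll_bit7()` reports the event and clears the bit, and further calls report nothing until the bit is set again.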

#ESCBOS
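// Bit positions of the keypad signals in the ipc flags word
enum keypad_bits_t {
    bit_keypad_value_update = 0,
    bit_keypressed_wait,
    bit_refresh_debounce_tmr,
};

// process()-like handler: its logic only runs when its event bit is set
void process_keypad(void)
{
    // wrapper not shown on the slide; presumably event() on bit_refresh_debounce_tmr
    if(event_refresh_debounce_tmr())
    {
        keypad_data.debounce_tmr = KEYPAD_DEBOUNCE_TIME;
        keypad_data.state = KEYPAD_DEBOUNCE;
    }
    ...
}

// Producer: latches the keypad value and signals process_keypad()
static void trigger_keypad_update(void *object)
{
    keypad_data.latched = read_keypad_value();
    set_bit_refresh_debounce_tmr();
}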

#ESCBOS
This event-driven architecture:

•  Is simple to implement
•  Scales well even with multicore Cortex-M systems
•  Improves processing granularity
•  Can be implemented in hardware on ARM+FPGA systems

#ESCBOS
Hardware scheduling
The event-driven scheduling can be implemented directly in hardware on an ARM+FPGA system.

Instead of using a round-robin cycle in firmware, the underlying hardware can place a “call” to each process() according to its sensitivity list.

This approach can reduce overhead to a few instruction cycles for a very responsive real-time system.

#ESCBOS
Multicore Cortex-M devices
The event-driven paradigm can be effectively implemented in a multicore Cortex-M system with common memory.

http://hothardware.com/newsimages/Item9563/cortex-m3-arm-cpu.png

[Figure: Cortex-M cores connected through a bus matrix to shared RAM and shared flash]

This approach simplifies system partitioning on the processor cores, and can decrease system response time for event-driven bare-metal logic.

Even when no bit-banding is available in the shared memory, atomic events can be used.
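
The talk does not say which mechanism provides atomic events without bit-banding. One possibility, shown here purely as an assumption, is the C11 `<stdatomic.h>` interface, which a compiler for Cortex-M3/M4 typically implements with LDREX/STREX loops. The names `shared_events`, `event_post` and `event_take` are hypothetical, and whether these operations are atomic between cores depends on the exclusive-access support of the particular device and its shared memory.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* One bit per event; assumed to live in RAM visible to all cores. */
static _Atomic uint32_t shared_events;

/* Producer side: raise event `bit` (0..31). */
void event_post(unsigned bit)
{
    atomic_fetch_or(&shared_events, (uint32_t)1u << bit);
}

/* Consumer side: atomic test-and-clear.
 * Returns true exactly once for each posted event. */
bool event_take(unsigned bit)
{
    uint32_t mask = (uint32_t)1u << bit;
    uint32_t old  = atomic_fetch_and(&shared_events, ~mask);
    return (old & mask) != 0u;
}
```

Unlike the two-step event() macro, `event_take()` tests and clears in one atomic operation, so two consumers cannot both claim the same event.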

#ESCBOS
Final Thoughts
The event-driven paradigm is a powerful and scalable architectural structure.

It is being used in bare-metal embedded systems of more than 300 KLOC.

If coupled with hardware scheduling support, it can be used to implement very fast event response systems that are very hard to implement with priority-based schedulers.

#ESCBOS
Thank you

Jonny Doin
jonnydoin@gridvortex.com
