Slides from the tutorial on "Machine Learning for Body Sensor Networks" at the BSN Conference in Zürich, Switzerland, June 2014. It mainly covers reinforcement learning, neural networks, and decision trees, and their applications in body sensor networks.
1. Machine Learning for BSN
Dr. Anna Förster, Alessandro Puiatti
BSN Tutorial, June 17th 2014, Zürich, Switzerland
Copyright A. Förster, A. Puiatti 2014
2. Presenters
Dr. Anna Förster, Researcher at SUPSI (anna.foerster@ieee.org)
Alessandro Puiatti, Senior researcher at SUPSI (alessandro.puiatti@supsi.ch)
3. Schedule and outlook
• Data in Body Sensor Networks
• What is Machine Learning?
• Decision Trees and their applications
• Discussion
• Break
• Neural networks and their applications
• Reinforcement Learning and its applications
• Other Machine Learning techniques
• Comparison of ML for BSNs
• Open discussion!
4. BSN: The Challenges
Dr. Anna Förster, Alessandro Puiatti
BSN Tutorial, June 17th 2014, Zürich, Switzerland
6. BSN vs WSN: Number of Nodes
(Figure: comparison of typical network sizes, WSN vs. BSN.)
7. BSN vs WSN: Parameters
WSN: almost homogeneous, the same sensors in every node (Temperature, Humidity, Light).
BSN: extremely heterogeneous, different sensors on each node (Body Temperature, EEG, EMG, SpO2).
8. BSN vs WSN: Other requirements

Requirement      | WSN                | BSN
Battery life     | Years              | App. dependent
Network topology | Mostly mesh        | Star
Mobility         | Static             | Mobile
Computation      | Low                | Low, medium, high
Frequency        | Low                | High
Form factor      | Almost indifferent | Hidden, invisible
"Wearability"    | --                 | Mandatory
17. BSN: In Summary
• Highly heterogeneous data
• High sampling/sending frequency
• Small number of nodes (even only one)
• Many applications: not only e-health
18. Introduction to Machine Learning
Dr. Anna Förster, Alessandro Puiatti
BSN Tutorial, June 17th 2014, Zürich, Switzerland
19. Major goal
Produce models (rules, patterns) from data.
Properties: robust and flexible; global models from local data; no environmental model.
(Figure: the Machine Learning umbrella: Neural Networks, Reinforcement Learning, Genetic Algorithms, Decision Trees, Swarm Intelligence, Clustering, ...)
20. Classes of Machine Learning Algorithms
• Supervised learning: a model is trained on a pre-labeled training dataset, then applied to a testing dataset (usage)
• Unsupervised learning: a model is built from non-labeled data items
• Reinforcement learning: an agent/model learns by interacting with its environment
21. Online versus Batch Learning
• Batch learning: build the model once from the complete training dataset, then use it
• Online learning: use the model and update it with each next data item, as sketched below
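To make the contrast concrete, here is a minimal Python sketch; the Model class and its methods are illustrative placeholders, not part of the tutorial:

```python
# A hypothetical model with batch and incremental training (illustration only).
class Model:
    def __init__(self):
        self.seen = 0

    def update(self, item):
        # Incorporate a single (sample, label) pair, e.g. adjust parameters.
        self.seen += 1

    def fit(self, dataset):
        # Batch: process the complete training dataset once.
        for item in dataset:
            self.update(item)

data = [("sample1", "label1"), ("sample2", "label2")]

# Batch learning: build the model from the training dataset, then only use it.
batch_model = Model()
batch_model.fit(data)

# Online learning: use the model and refine it with each next data item.
online_model = Model()
for item in data:
    # ... use online_model for a prediction here ...
    online_model.update(item)
```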
22. Introduction to Decision Trees
Dr. Anna Förster, Alessandro Puiatti
BSN Tutorial, June 17th 2014, Zürich, Switzerland
23. Decision Based Learning
• Classifying objects into groups based on attribute pairs
Orange: form = round, color = orange, taste = sour
Apple: form = round, color = red/orange/green, taste = sweet
Unknown fruit: ?
24. Decision Based Learning
• The same orange and apple examples; for the unknown fruit: form = ?, color = ?, taste = ?
25. Decision Based Learning
• First answer: form = round, color = ?, taste = ? -> still unknown (???)
26. Decision Based Learning
• Second answer: form = round, color = orange, taste = ? -> still unknown (???)
27. Decision Based Learning
• Third answer: form = round, color = orange, taste = sweet -> apple!
28. Decision Based Learning
• Asking form, then color, then taste identified the fruit, but it took 3 questions!
29. Decision Based Learning
• Asking taste first: taste = sweet, with color = ? and form = ? still unasked -> apple!
30. Decision Based Learning
• Since only the orange is sour, taste alone identifies the apple: 1 question!
31. Decision Tree Learning
• Supervised learning approach (uses a pre-labeled dataset)
• Maps observations (features, attributes) into classes (decisions)
• Very powerful and efficient technique for analyzing large and fuzzy datasets
(Figure: decision tree for the probability of survival on the Titanic. Inner nodes ask "Is male?", "Is age < 9.5?", "Family on board > 2.5?"; each leaf shows survival probability : % of observations: survived 0.73 : 36%, survived 0.89 : 2%, died 0.05 : 2%, died 0.17 : 61%.)
32. Decision Based Learning
• Classifying objects into groups based on attribute pairs
• Which questions to ask first, which next?
• Compute the information gain of the attributes
• How well does an attribute separate the training set?
33. C4.5 algorithm
Goal: construct a decision tree with an attribute at each node (a code sketch follows below).
1. Start at the root
2. Find the attribute with maximal information gain that is not an ancestor of the node
3. Put a child node for each value of this attribute
4. Add all examples from the training set to the corresponding child
5. If all examples of a child belong to the same class, put the class there and go back up in the tree
6. If not, continue with step 2 while attributes are left
7. When no more attributes are left, put the classification of the majority of the examples at this node
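The steps above translate almost directly into a recursive procedure. Below is a compact Python sketch of this construction (ID3-style, i.e. C4.5 without pruning and without continuous attributes); the data representation, dicts with a "class" key, is our illustrative assumption:

```python
from collections import Counter
from math import log2

def entropy(examples):
    counts = Counter(ex["class"] for ex in examples)
    total = len(examples)
    return -sum(c / total * log2(c / total) for c in counts.values())

def info_gain(examples, attr):
    total = len(examples)
    rest = 0.0
    for v in {ex[attr] for ex in examples}:
        subset = [ex for ex in examples if ex[attr] == v]
        rest += len(subset) / total * entropy(subset)
    return entropy(examples) - rest

def build_tree(examples, attributes):
    classes = {ex["class"] for ex in examples}
    if len(classes) == 1:                        # step 5: one class -> leaf
        return classes.pop()
    if not attributes:                           # step 7: majority class
        return Counter(ex["class"] for ex in examples).most_common(1)[0][0]
    best = max(attributes, key=lambda a: info_gain(examples, a))   # step 2
    children = {}                                # steps 3 and 4
    for v in {ex[best] for ex in examples}:
        subset = [ex for ex in examples if ex[best] == v]
        children[v] = build_tree(subset, [a for a in attributes if a != best])
    return (best, children)                      # step 6: recurse

# The fruit example: splits on color first, then form, with majority "orange".
fruit = [
    {"form": "round", "color": "red",    "class": "apple"},
    {"form": "round", "color": "orange", "class": "apple"},
    {"form": "round", "color": "orange", "class": "orange"},
    {"form": "round", "color": "green",  "class": "apple"},
    {"form": "round", "color": "yellow", "class": "apple"},
    {"form": "round", "color": "orange", "class": "orange"},
]
print(build_tree(fruit, ["form", "color"]))
```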
34. C4.5 algorithm: Example

example | form  | color  | class
1       | round | red    | apple
2       | round | orange | apple
3       | round | orange | orange
4       | round | green  | apple
5       | round | yellow | apple
6       | round | orange | orange

• Information gain of FORM: zero
• Information gain of COLOR: more
35. C4.5 algorithm: Example
• COLOR has the higher information gain, so the root splits on color, with children red, green, orange, yellow
36. C4.5 algorithm: Example
• Add the training examples to the corresponding children: red -> 1; green -> 4; yellow -> 5; orange -> 2, 3, 6
37. C4.5 algorithm: Example
• The children red (1), green (4) and yellow (5) each contain only apples and become "apple" leaves; orange (2, 3, 6) is still mixed (?)
38. C4.5 algorithm: Example
• Only one attribute is left for the orange branch: FORM; its single value "round" still holds examples 2, 3, 6, so the majority class "orange" is assigned
39. C4.5 algorithm: Problems
• All orange apples will be classified as oranges
• The leaf node FORM is unnecessary
• THE DECISION TREE DEPENDS ON THE TRAINING SET
40. Information Gain
• Input are T tuples (classified samples with K features):
  $(\mathbf{x}, Y) = (x_1, x_2, x_3, \ldots, x_K, Y)$, with $x_a \in \mathrm{vals}(a)$ and $Y$ the class
• The information gain of feature $a$ is defined in terms of the entropy as follows:
  $IG(T, a) = H(T) - \sum_{v \in \mathrm{vals}(a)} \frac{|\{x \in T \mid x_a = v\}|}{|T|} \cdot H(\{x \in T \mid x_a = v\})$
  $H(T) = -\sum_{i=1}^{|Y|} p_i \log_2(p_i)$
• $H(T)$ is the entropy of the full dataset; the weighted sum is over the entropies of the sub-datasets (e.g. "MALE" and "FEMALE" in the Titanic example); a numeric check follows below
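As a quick check of these formulas on the six fruit examples from slide 34 (a rough sketch; the helper names are ours):

```python
from collections import Counter
from math import log2

def H(labels):
    # Entropy of a list of class labels.
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

classes = ["apple", "apple", "orange", "apple", "apple", "orange"]
form    = ["round"] * 6
color   = ["red", "orange", "orange", "green", "yellow", "orange"]

def IG(attr):
    # H(T) minus the weighted entropies of the sub-datasets per value.
    n = len(classes)
    rest = sum(
        attr.count(v) / n * H([c for a, c in zip(attr, classes) if a == v])
        for v in set(attr)
    )
    return H(classes) - rest

print(f"H(T)      = {H(classes):.3f}")   # 0.918
print(f"IG(form)  = {IG(form):.3f}")     # 0.000: form is always 'round'
print(f"IG(color) = {IG(color):.3f}")    # 0.459: color separates better
```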
41. Properties of Decision Based Learning
• Good for fast classification of fuzzy, overlapping groups
• The tree is generated only once
• Well suited for static, but error-prone environments
• Needs a good, large training set
• Moderate processing and large memory requirements (to hold the training set)
42. Incremental Decision Trees
• Hoeffding tree algorithm [Domingos:2000]
• For each new sample: classify it, save it at the corresponding leaf, and compute the information gain IG for each feature X at that leaf
• If the samples at the leaf do not all belong to the same class, and the gap between the two best features satisfies IG(Xa) - IG(Xb) >= ε, split the node according to feature Xa (see the sketch below)
• The Hoeffding bound guarantees that Xa is indeed the best feature, except with some small probability

[Domingos:2000] P. Domingos and G. Hulten: Mining High-Speed Data Streams, in Proceedings of the 6th ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD), 2000.
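A small sketch of that split test, following the usual VFDT formulation of the Hoeffding bound; the values of δ and of the gain's range R are our assumptions, not from the tutorial:

```python
from math import log, log2, sqrt

def hoeffding_bound(value_range, delta, n):
    # With probability 1 - delta, the observed mean of n samples is within
    # epsilon of the true mean of a variable with the given range.
    return sqrt(value_range ** 2 * log(1.0 / delta) / (2.0 * n))

def should_split(ig_best, ig_second, n_samples, n_classes, delta=1e-6):
    # Split once the observed gap between the two best features exceeds epsilon.
    epsilon = hoeffding_bound(log2(n_classes), delta, n_samples)
    return (ig_best - ig_second) > epsilon

# Example: 500 samples at a leaf, two classes.
print(should_split(ig_best=0.46, ig_second=0.30, n_samples=500, n_classes=2))
```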
43. Neural Networks – Introduction and Applications
Dr. Anna Förster, Alessandro Puiatti
BSN Tutorial, June 17th 2014, Zürich, Switzerland
44. Background
• Simplified (extremely!) model of the human brain and its neurons
46. Perceptron
• Simplest form of neural network
• Computes linear functions only
• The activation function is a simple threshold
• Where do the weights come from?
47. Perceptron Learning
1. Present the network with an input
2. Calculate its current output
3. Compare with the real output (supervised learning!)
4. Correct the weights to minimize the error between the computed output and the desired one:
   w_new = w_old + α * (desired - output) * input,  with α the learning constant
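Steps 1-4 fit in a few lines. A minimal sketch with a threshold activation; training it on the AND function is purely our illustration:

```python
def predict(weights, bias, x):
    s = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1 if s >= 0 else 0                       # simple threshold activation

def train(samples, alpha=0.1, epochs=20):
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, desired in samples:                  # 1. present an input
            output = predict(weights, bias, x)      # 2. current output
            error = desired - output                # 3. compare with real output
            weights = [w + alpha * error * xi       # 4. correct the weights
                       for w, xi in zip(weights, x)]
            bias += alpha * error
    return weights, bias

and_data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = train(and_data)
print([predict(w, b, x) for x, _ in and_data])      # [0, 0, 0, 1]
```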
48. Multi-Layer Networks
• Can represent all possible functions, not only linear ones
• Uses the logistic function (sigmoid) for activation
• Backpropagation is the most often used weight-learning method
49. Applications
• Very well suited for:
  • Pattern recognition, image recognition
  • Noise cancelling
  • Prediction (based on extrapolated data)
• Properties:
  • Supervised learning, requires a large training set
  • Memory- and processing-intensive training
  • Testing is also processing-intensive
• Examples from BSN: pattern recognition based on multi-modal data
  • Cardiovascular problems, heart attacks
  • Falls
  • Activities

Zhanpeng Jin, Yuwen Sun, and Allen C. Cheng: Predicting Cardiovascular Disease from Real-Time Electrocardiographic Monitoring: An Adaptive Machine Learning Approach on a Cell Phone, IEEE EMBS 2009.
50. Introduction to Reinforcement Learning
Dr. Anna Förster, Alessandro Puiatti
BSN Tutorial, June 17th 2014, Zürich, Switzerland
51. Reinforcement Learning
• A learning agent
• A pool of possible actions
• Goodness of actions
• A reward function
Learning loop: select one action, execute the action, observe the reward, correct the goodness of the executed action.
56. Introduction to Q-Learning
(Figure: example graph with states A-F, starting at START.)
• Learning agent
• Internal current state st
• Pool of possible actions At(st)
• An associated Q-value for each action in each state
57. Introduction to Q-Learning
(Figure: the same graph; actions either carry immediate reward 0 and cost -1, or immediate reward 100 and cost -2.)
• As before, plus: an immediate reward after each action
Step 1: select an action
58. Introduction to Q-Learning
• Learning procedure, step 1: select an action
59. Introduction to Q-Learning
• Step 2: execute the action
60. Introduction to Q-Learning
• Step 3: observe (receive) the reward
61. Introduction to Q-Learning
• Step 4: update the state and the Q-values (here: st = D, update Q(aD, C))
63. How to recompute the Q-values?

$Q_{t+1}(s_t, a_t) = Q_t(s_t, a_t) + \gamma \left( R(s_t, a_t) - Q_t(s_t, a_t) \right)$

where $Q_t(s_t, a_t)$ is the old Q-value, $Q_{t+1}(s_t, a_t)$ the new Q-value, $R(s_t, a_t)$ the immediate reward received after executing action a in state s at time t, and $\gamma$ the learning constant.
• Learning constant: avoids oscillations of the Q-values at the beginning of the learning process (smooths the Q-values)
• γ ≈ 1: the new Q-value is exchanged with the reward
• γ ≈ 0: the new Q-value is the same as the old one
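The update rule maps one-to-one onto code. A minimal sketch; the state and action names follow the example graph, and the stored numbers are illustrative:

```python
def update_q(q, state, action, reward, gamma=0.5):
    """Q_{t+1}(s,a) = Q_t(s,a) + gamma * (R(s,a) - Q_t(s,a))."""
    q[(state, action)] += gamma * (reward - q[(state, action)])

q_values = {("C", "to D"): 0.0}
update_q(q_values, "C", "to D", reward=100)
print(q_values[("C", "to D")])   # 50.0: smoothed halfway toward the reward
# gamma close to 1: the new Q-value is essentially replaced by the reward;
# gamma close to 0: the new Q-value stays close to the old one.
```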
64. How to define the reward function?
• Two main types: pre-defined, or computed after each action
• Often used: zero reward for actions leading directly to the goal, negative for all others (e.g. -1)
• Also used: Manhattan distance to the goal; geographic distance to the goal; the currently best available Q-value at the state (!!)
65. How to decide which action to take?
• Exploration strategy (action selection policy)
• Cannot be random: need to use the accumulated knowledge
• Cannot be greedy: need to explore all possibilities
• Often used: ε-greedy (sketched below)
  • select a random action with probability ε
  • select the best available one (best Q-value) with probability 1-ε
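A minimal sketch of this policy. Here "best" means the highest Q-value; for cost-style Q-values, as in the routing example later, one would take the minimum instead:

```python
import random

def epsilon_greedy(q_values, epsilon=0.1):
    actions = list(q_values)
    if random.random() < epsilon:                       # explore ...
        return random.choice(actions)
    return max(actions, key=lambda a: q_values[a])      # ... or exploit

q = {"left": 11.0, "right": 10.0}
print(epsilon_greedy(q))   # usually "left", sometimes a random action
```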
66. Properties of Reinforcement Learning
• Simple, flexible model
• Adapts to changing environments, re-learns quickly
• Copes successfully with mobile or unreliable environments
• Simple to design and implement
• Small to moderate processing and memory needs
• Can be implemented in a fully distributed way
67. Reinforcement Learning for BSNs?
• All distributed problems:
  • Routing protocols
  • Clustering protocols
  • Neighborhood management protocols
  • Medium access protocols
• Further:
  • Parameter optimization and learning
  • Application-level cooperation among nodes
68. Applications of Reinforcement Learning
Dr. Anna Förster, Alessandro Puiatti
BSN Tutorial, June 17th 2014, Zürich, Switzerland
69. Q-Learning in WSN Routing
• Agents: the packets
• States: the nodes
• Actions: next hops
• Q-values: estimations of routing costs
• Initial Q-values: some first guess about routing costs
• Reward function: the best cost estimation of the next hop
• Exploration strategy: simple, e.g. ε-greedy
70. Unicast routing with RL
Sending a packet from A to D; initialize all Q-values to 10 (a guess).
(Figure: network with nodes A, B, C, D.)
Rewards: r = q_best if not the sink; r = 0 if the sink. Rewards are sent to all neighbors (broadcast).
71. Unicast routing with RL
All Q-values initialized to 10: A's table: Q(B) = 10, Q(C) = 10. B's table: Q(A) = 10, Q(C) = 10, Q(D) = 10. C's table: Q(B) = 10, Q(A) = 10, Q(D) = 10.
Action selection policy (exploration strategy): ε-greedy, balancing exploration and exploitation.
72. Unicast routing with RL
A selects the next hop (state) B, using Q(B) = 10 (initial).
73. Unicast routing with RL
B has 3 possible next hops (A, C, D), all with the initial value q_best = 10.
74. Unicast routing with RL
B selects D as the next hop and forwards the packet.
75. Unicast routing with RL
B selects D as the next hop and broadcasts the reward = q_best = 10 to its neighbors.
76. Unicast routing with RL
The neighbors A and C update their estimates for routing via B: Q(B) = c_B + r_B = 1 + 10 = 11.
77. Unicast routing with RL
Updated tables: A: Q(B) = 11, Q(C) = 10. C: Q(B) = 11, Q(A) = 10, Q(D) = 10.
78. Unicast routing with RL
D is the sink: goal reached.
79. Unicast routing with RL
D is the sink, so it broadcasts reward = 0 (the real cost).
80. Unicast routing with RL
B and C update their estimates for routing via D: Q(D) = c_D + r_D = 1 + 0 = 1.
81. Unicast routing with RL
State of the network after the first packet:
A: Q(B) = 11, Q(C) = 10. B: Q(A) = 10, Q(C) = 10, Q(D) = 1. C: Q(B) = 11, Q(A) = 10, Q(D) = 1.
82. Unicast routing with RL
State of the network after many packets:
A: Q(B) = 2, Q(C) = 2. B: Q(A) = 3, Q(C) = 2, Q(D) = 1. C: Q(B) = 2, Q(A) = 3, Q(D) = 1.
How to go faster? Make better guesses! (See the sketch below.)
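A rough sketch of the per-node bookkeeping just walked through; the class layout is ours, but the numbers reproduce the example (hop cost 1, initial guess 10, reward 0 at the sink):

```python
HOP_COST = 1

class Node:
    def __init__(self, name, neighbors, initial_q=10):
        self.name = name
        self.q = {n: initial_q for n in neighbors}  # cost estimate via each neighbor

    def best(self):
        return min(self.q.values())                 # lowest estimated route cost

    def on_reward(self, neighbor, reward):
        # Reward broadcast by a neighbor: its own best estimate (or 0 at the sink).
        self.q[neighbor] = HOP_COST + reward

a = Node("A", ["B", "C"])
b = Node("B", ["A", "C", "D"])
b.on_reward("D", 0)           # D is the sink: reward 0 (real cost), so Q(D) = 1
a.on_reward("B", b.best())    # B broadcasts its best estimate, so Q(B) = 1 + 1 = 2
print(a.q)                    # {'B': 2, 'C': 10}: converging toward slide 82
```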
83. Unicast routing with RL: Benefits
• Simple and powerful
• Reacts immediately to changes:
  • New rewards propagate quickly
  • New routes are learnt
  • Only the necessary changes in the immediate neighborhood of a failure
• Route initialization is sink/source driven
• Low memory and processing overhead
84. Unicast Routing with RL
• Hops: too trivial to deserve a publication...
• Maximum aggregation rate: P. Beyens, M. Peeters, K. Steenhaut, and A. Nowe. Routing with compression in wireless sensor networks: A Q-learning approach. In Proceedings of the 5th European Workshop on Adaptive Agents and Multi-Agent Systems (AAMAS), 12pp., Paris, France, 2005.
• Combined with geographic routing: R. Arroyo-Valles, R. Alaiz-Rodrigues, A. Guerrero-Curieses, and J. Cid-Sueiro. Q-probabilistic routing in wireless sensor networks. In Proceedings of the 3rd International Conference on Intelligent Sensors, Sensor Networks and Information Processing (ISSNIP), pages 1-6, Melbourne, Australia, 2007.
• Minimum delay: J. A. Boyan and M. L. Littman. Packet routing in dynamically changing networks: A reinforcement learning approach. Advances in Neural Information Processing Systems, 6:671-678, 1994.
85. Multicast Routing with RL
• Challenges:
  • Actions need to reflect not the next hop, but a set of HOPS
  • The reward function is distributed among several neighbors
  • The set of actions is very large: needs a lot of exploration!
• Solution steps:
  • Separate actions into sub-actions
  • Smart initial Q-values

A. Förster and A. L. Murphy. FROMS: A Failure Tolerant and Mobility Enabled Multicast Routing Paradigm with Reinforcement Learning. Elsevier Ad Hoc Networks, 2011.
86. FROMS: Multicast routing with Q-Learning
(Figure: localized view after sink announcement; the neighbors 1, 2, 3 advertise their hop counts to sinks A and B: A in 5 / B in 3 hops, A in 3 / B in 5 hops, A in 4 / B in 4 hops.)
• The minimum estimate is not the optimal one:
  • best estimate for (A,B): 3 + 3 - 1 = 5 hops
  • optimal for (A,B): 4 hops
87. FROMS: Multicast routing with Q-Learning
(Diagram: the standard agent-environment loop: the agent, in state st with actions At and Q-values Qt, executes at, receives reward rt(st, at), and moves to st+1 with updated Qt+1.)
• Agent: each node in the network
88. FROMS: Multicast routing with Q-Learning
• State: the agent's neighbors
89. FROMS: Multicast routing with Q-Learning
• Possible actions: combinations of neighbors to reach all sinks, built from sub-actions
  Example: ai = {n1 for A}, {n3 for B}; aj = {n2 for A,B}
90. FROMS: Multicast routing with Q-Learning
• Q-values: associated with each sub-action, computable for each (full) action
  Example: Q(n1, {A}), Q(n3, {B}), Q(n2, {A,B})
91. FROMS: Multicast routing with Q-Learning
• Initialize the Q-values with the number of estimated hops, as sketched below
  Example: for sinks A (4 hops) and B (4 hops) via n2: Q(n2, {A,B}) = 4 + 4 - 1
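A tiny sketch of this initialization (the helper name is ours): sum the estimated hops to each sink via the chosen neighbor, minus the hop(s) the routes share, since one broadcast serves all sinks:

```python
def initial_q(hops_to_sinks, shared_hops=1):
    # Estimated cost of serving all sinks through one neighbor: the individual
    # hop counts, minus the shared first hop(s) covered by a single broadcast.
    return sum(hops_to_sinks) - shared_hops

# Neighbor n2 announced 4 hops to sink A and 4 hops to sink B:
print(initial_q([4, 4]))   # 7, i.e. Q(n2, {A,B}) = 4 + 4 - 1
```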
92. FROMS: Multicast routing with Q-Learning
• Environment: all other nodes
94. FROMS: Multicast routing with Q-Learning
• Reward: the best available Q-value + 1 hop
95. FROMS: Multicast routing with Q-Learning
• Update the Q-values at the neighboring nodes (learn)
• The open design choices in the loop: exploration strategy, update rules, reward computation
96. Parameters of FROMS
• Possible cost functions:
  • Any cost function defined over the edges or nodes of the communication graph
  • Here: minimum hops to destinations
  • Further: minimum delay to the sinks; minimum geographic progress; minimum transmission power; maximum remaining energy on the nodes; combinations; ...
• Exploration strategy:
  • Balance exploration against exploitation
  • Depends on the used cost function
• Memory management:
  • Heuristics for pruning the available actions and sub-actions
97. Further Applications of RL to WSNs
• Clustering for WSNs: Anna Förster and Amy L. Murphy, Clique: Role-free Clustering with Q-Learning for Wireless Sensor Networks, in Proceedings of the 29th International Conference on Distributed Computing Systems (ICDCS), 9pp., Canada, June 2009.
• MAC protocols: Z. Liu and I. Elhanany. RL-MAC: A reinforcement learning based MAC protocol for wireless sensor networks. International Journal on Sensor Networks, 1(3/4):117-124, 2006.
• Best coverage: M.W.M. Seah, C.K. Tham, K. Srinivasan, and A. Xin. Achieving coverage through distributed reinforcement learning in wireless sensor networks. In Proceedings of the 3rd International Conference on Intelligent Sensors, Sensor Networks and Information Processing (ISSNIP), 2007.
98. Discussion
Dr. Anna Förster, Alessandro Puiatti
BSN Tutorial, June 17th 2014, Zürich, Switzerland
99. Comparison of properties

ML Technique           | Memory | Computation | Tolerance to topology changes | Optimality | Init. costs | Add. costs
Reinforcement Learning | low    | low         | high                          | high       | medium      | low
Swarm Intelligence     | medium | low         | high                          | high       | high        | medium
Heuristics             | low    | low         | low/medium                    | medium     | high        | low
Mobile Agents          | low    | low         | medium                        | low        | low         | medium/high
Neural networks        | medium | medium      | low                           | high       | high        | low
Genetic algorithms     | high   | medium      | low                           | high       | high        | low

Legend: Memory = required memory for on-node storage; Computation = required processing on the node or base station; Tolerance = flexibility of the found solution to environmental changes; Optimality = optimality of the derived solution compared to a centrally computed optimal solution; Init. costs = required communication or processing costs before starting normal work; Add. costs = additional communication or processing costs during runtime.
109. Further readings
M. Dorigo and T. Stuetzle. Ant Colony Optimization. MIT Press, 2004.
J. Kennedy and R.C. Eberhart. Swarm Intelligence. Morgan Kaufmann, 2001.
T.M. Mitchell. Machine Learning. McGraw-Hill, 1997.
A. Förster. Teaching Networks How to Learn. SVH Verlag, 2009.
S.J. Russell and P. Norvig. Artificial Intelligence: A Modern Approach. Prentice Hall International, 2003.
R. S. Sutton and A. G. Barto. Reinforcement Learning: An Introduction. The MIT Press, March 1998.