A Reinforcement Learning Based Routing Protocol with QoS Support for Biomedical Sensor Networks

A Reinforcement Learning Based Routing Protocol
with QoS Support for Biomedical Sensor Networks
Authors:
Xuedong Liang
Ilangko Balasingham
Sang-Seon Byun
The Interventional Center, Rikshospitalet University Hospital, Oslo, Norway N-0027
Dept. of Informatics, University of Oslo, Oslo, Norway N-0316
Dept. of Electronics and Telecommunications, Norwegian University of Science and
Technology, Trondheim, Norway N-7491
Presented by:
Iffat Anjum (Roll: 16)
Nazia Alam (Roll: 28)
15th Batch.
Date: 26th April, 2012
Green Networking Research Group
Dept. of Computer Science and Engineering, University of Dhaka
Slide 1

Contents
➢ Contribution.
➢ Problem Definition.
• Related works.
• Biomedical Sensor Networks
• Reinforcement Learning
• Q-learning
➢ Design of RL-QRP
• Local Information Exchange
• Q-learning Implementation
• Learning-Based Routing Algorithm
➢ Performance Evaluation.
➢ Limitation.
Slide 2

Contributions
➢ In RL-QRP, optimal routing policies can be found through
experience and rewards, without the need to maintain precise
network state information.
➢ Because it accounts for the impact of network traffic load and
sensor node mobility on network performance, RL-QRP fits well in
dynamic environments.
➢ RL-QRP performs well in terms of a number of QoS metrics and
energy efficiency in various medical scenarios.
Slide 3

Problem Definition
The main function of biomedical sensor networks is to ensure that
data packets can be sensed and delivered to the medical server
reliably and efficiently.
Related works
A number of QoS support routing protocols have recently been
proposed for wireless sensor networks:
➢ INSIGNIA is a QoS framework for mobile ad hoc networks based on
in-band signaling and soft-state resource management. It is not
suitable for biomedical sensor networks because of the inflexible
nature of its resource reservation scheme.

Related works
➢ CEDAR is a core-extraction distributed ad hoc routing algorithm
for QoS routing in ad hoc network environments. However, the core
can become the bottleneck of the network, and selecting and
maintaining the core consumes extra network resources.
➢ AdaR adaptively learns an optimal strategy to achieve multiple
optimization goals. However, how to map diverse QoS requirements
into concrete Q-values is not defined.
Most of the previous QoS support routing protocols suffer from:
• Heavy communication overhead.
• Computation burden of complicated algorithms.
Slide 5

Biomedical Sensor Networks
➢ A biomedical sensor network is deployed in a certain area. Sensor
nodes are implanted in or attached to the patient's body, and sink
nodes are deployed in fixed positions.
➢ Biomedical sensor networks have the following features:
• Dynamic network topology: a sensor node may leave, join, or die
(run out of battery);
• Time-varying wireless channel with serious electrical
interference;
• Each sensor node has different QoS requirements, duty cycle,
packet arrival rate, and forwarding willingness.
Slide 6

Biomedical Sensor Networks
➢ Each mobile node is aware of its geographic location, either
using the global positioning system (GPS) or distributed
localization services.
➢ Each node is aware of its immediate neighbors (within its radio
range) and their locations through beacon exchanges.
➢ Mobile sensor nodes move according to the Random Waypoint
Mobility Model (RWMM).
➢ This paper focuses on two types of QoS requirements:
• Packet delivery ratio.
• End-to-end delay.
Slide 7
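The two QoS metrics named on this slide, packet delivery ratio and end-to-end delay, can be computed from per-packet send and receive timestamps. The following is a minimal sketch under stated assumptions: the `PacketRecord` type, its field names, and the sample values are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class PacketRecord:
    """Hypothetical per-packet log entry (field names are assumptions)."""
    sent_at: float                # time the source sensor sent the packet (s)
    received_at: Optional[float]  # time the sink received it, None if lost


def packet_delivery_ratio(records: Sequence[PacketRecord]) -> float:
    """Fraction of sent packets that reached the sink."""
    if not records:
        return 0.0
    delivered = sum(1 for r in records if r.received_at is not None)
    return delivered / len(records)


def average_end_to_end_delay(records: Sequence[PacketRecord]) -> Optional[float]:
    """Mean source-to-sink delay over delivered packets, None if none arrived."""
    delays = [r.received_at - r.sent_at
              for r in records if r.received_at is not None]
    return sum(delays) / len(delays) if delays else None


# Made-up sample log: three delivered packets and one lost packet.
records = [
    PacketRecord(0.0, 0.25),
    PacketRecord(1.0, 1.75),
    PacketRecord(2.0, None),
    PacketRecord(3.0, 3.5),
]
pdr = packet_delivery_ratio(records)
delay = average_end_to_end_delay(records)
```

Lost packets count against the delivery ratio but are excluded from the delay average, which is one common convention; the paper may define the averages differently.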

Reinforcement Learning
Figure: A reinforcement learning model.
Slide 8

Reinforcement Learning
➢ Reinforcement learning is built on the concept of a Markov
Decision Process (MDP).
➢ An MDP models an agent with a tuple (S, A, P, R):
• S is the set of states,
• A is the set of actions,
• P(s' | s, a) is the transition model that describes the
probability of entering state s' after executing action a at
state s,
• R(s, a, s') is the reward obtained when the agent executes a at
s and enters s'.
➢ The goal of solving an MDP is to find an optimal policy,
π : S → A, that maps states to actions such that the cumulative
reward is maximized.
Slide 9
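To make the tuple (S, A, P, R) and the policy π : S → A concrete, here is a small sketch that solves a toy MDP with value iteration. Everything in it is an assumption made for illustration: the two states, the two actions, the transition probabilities, the reward, and the discount factor `gamma` are invented, and value iteration is only one standard way of solving an MDP, not a method the slides prescribe.

```python
from typing import Dict, List, Tuple

State = str
Action = str

# Hypothetical two-state MDP, invented purely for illustration.
S: List[State] = ["far", "near"]
A: List[Action] = ["wait", "forward"]

# P[(s, a)] lists (next_state, probability) pairs: the model P(s' | s, a).
P: Dict[Tuple[State, Action], List[Tuple[State, float]]] = {
    ("far", "wait"): [("far", 1.0)],
    ("far", "forward"): [("near", 0.8), ("far", 0.2)],
    ("near", "wait"): [("near", 1.0)],
    ("near", "forward"): [("near", 1.0)],
}


def R(s: State, a: Action, s_next: State) -> float:
    """Reward R(s, a, s'): 1 for moving from 'far' to 'near', else 0."""
    return 1.0 if (s == "far" and s_next == "near") else 0.0


def value_iteration(gamma: float = 0.9, sweeps: int = 200) -> Dict[State, Action]:
    """Return a greedy policy pi: S -> A after repeated Bellman backups."""
    V = {s: 0.0 for s in S}

    def q(s: State, a: Action) -> float:
        # Expected reward plus discounted value of the successor state.
        return sum(p * (R(s, a, s2) + gamma * V[s2]) for s2, p in P[(s, a)])

    for _ in range(sweeps):
        V = {s: max(q(s, a) for a in A) for s in S}
    return {s: max(A, key=lambda a: q(s, a)) for s in S}


policy = value_iteration()
```

In this toy model the resulting policy chooses "forward" in the "far" state, because waiting never earns a reward.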

Q-learning
➢ Q-learning is a model-free method that computes the function
Q(s, a) to find an optimal decision policy.
➢ Each time an action a is executed, the agent receives an
immediate reward r from the environment.
• Q(s, a) denotes the quality of action a at state s, α is the
learning rate, and γ models the weight of future rewards.
• Q(s', a') is the expected future reward at state s' by taking
action a'.
Slide 10
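The symbols above (Q, α, γ, r, s', a') fit the standard tabular Q-learning update, Q(s, a) ← (1 − α) Q(s, a) + α [r + γ max over a' of Q(s', a')]. The sketch below implements that textbook rule; it is assumed, not confirmed, that the slide used this exact form, and the default values of `alpha` and `gamma` as well as the state and action labels are arbitrary.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Tuple

QTable = Dict[Tuple[Hashable, Hashable], float]


def q_update(q: QTable, s, a, r: float, s_next, next_actions: Iterable,
             alpha: float = 0.5, gamma: float = 0.9) -> float:
    """Q(s,a) <- (1 - alpha) * Q(s,a) + alpha * (r + gamma * max_a' Q(s',a'))."""
    # Best estimated future value from the next state (0 if no actions exist).
    best_next = max((q[(s_next, a2)] for a2 in next_actions), default=0.0)
    q[(s, a)] = (1 - alpha) * q[(s, a)] + alpha * (r + gamma * best_next)
    return q[(s, a)]


q: QTable = defaultdict(float)  # unseen (state, action) pairs start at 0
q[("s2", "a1")] = 1.0
q[("s2", "a2")] = 2.0
new_value = q_update(q, "s1", "a1", r=1.0, s_next="s2",
                     next_actions=["a1", "a2"])
```

With these sample values the update gives 0.5 · 0 + 0.5 · (1 + 0.9 · 2) = 1.4 for Q(s1, a1).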

Design of RL-QRP
➢ QoS route computation and selection are based on a distributed
reinforcement learning algorithm.
➢ Each sensor node calculates its routes independently.
➢ The Q-value Q(s, a) stands for the quality (the progress that has
been made) of action a at state s.
Figure: Reinforcement learning based routing model.
Slide 11

QoS Support Consideration
➢ Each node checks the QoS requirement of the data packet and its
Q-value table.
➢ The node then checks whether it can make a certain progress for
the data packet. If so, it forwards the packet to the neighboring
node with the highest Q-value; if not, the packet is dropped or sent
with ‘best effort’.
Local Information Exchange
Local information exchange is facilitated by beacon exchanges with
1-hop neighboring sensor nodes. It consists of:
➢ Position Information Exchange.
➢ Q-values Exchange.
Slide 12
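The forwarding decision described above can be sketched as a lookup in the node's Q-value table. This is a minimal illustration, not the authors' algorithm: the function name `choose_next_hop`, the `required_progress` threshold, and the idea of comparing it directly against the best Q-value are assumptions about how "a certain progress" might be checked.

```python
from typing import Dict, Optional


def choose_next_hop(q_values: Dict[str, float],
                    required_progress: float) -> Optional[str]:
    """Return the neighbor with the highest Q-value, or None when no
    neighbor meets the required progress (the caller then drops the
    packet or sends it with best effort)."""
    if not q_values:
        return None
    best = max(q_values, key=q_values.get)
    return best if q_values[best] >= required_progress else None


# Hypothetical Q-values learned for three 1-hop neighbors.
q_table = {"n1": 0.2, "n2": 0.7, "n3": 0.5}
next_hop = choose_next_hop(q_table, required_progress=0.6)
fallback = choose_next_hop(q_table, required_progress=0.9)
```

Here `next_hop` is "n2", while `fallback` is None because no neighbor reaches the stricter threshold.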

Q-learning Implementation
➢ State: S = {si}, i = 1, 2, ..., N, where N is the number of sensor
nodes. Each node is a state s ∈ S.
➢ Action: A = {a(sj | si)}, si, sj ∈ S. Executing a(sj | si) means
that a packet is forwarded from state si to sj, provided si and sj
are within each other’s communication range.
➢ Reward function: R = prg(Pn).
Rn is the reward for executing the action; it describes the progress
that has been made in forwarding data packet Pn.
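The state and action definitions above can be illustrated by deriving each node's action set from node positions. This is a sketch under stated assumptions: the helper `build_actions`, the 2-D coordinates, and a single shared `radio_range` for all nodes are hypothetical simplifications.

```python
import math
from typing import Dict, List, Tuple

Position = Tuple[float, float]


def build_actions(positions: Dict[str, Position],
                  radio_range: float) -> Dict[str, List[str]]:
    """Map each node si to the nodes sj it can forward to, i.e. a(sj | si)."""
    actions: Dict[str, List[str]] = {}
    for si, pi in positions.items():
        # sj is a valid target only if it lies within si's radio range.
        actions[si] = [
            sj for sj, pj in positions.items()
            if sj != si and math.dist(pi, pj) <= radio_range
        ]
    return actions


# Hypothetical node coordinates in metres.
positions = {"s1": (0.0, 0.0), "s2": (3.0, 4.0), "s3": (20.0, 0.0)}
actions = build_actions(positions, radio_range=10.0)
```

With a shared range the relation is symmetric, which matches the condition that si and sj are within each other's communication range; here s1 and s2 can forward to each other and s3 has no valid action.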
Slide 13

Q-learning Implementation
➢ The reward of an action is implemented using an ACK scheme.
When node sj receives a packet from node si, sj acknowledges the
packet by sending an ACK packet.
➢ By calculating the 1-hop delay and the ratio of the number of ACKs
received to the number of data packets sent, si can estimate the
link properties between si and sj.
Tsisj is the experienced delay between nodes si and sj.
Slide 14
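The ACK-based link estimation described above can be sketched as a small per-link counter kept at node si. The class and method names are hypothetical, and the 1-hop delay is approximated here by the time between sending a packet and receiving its ACK, which is an assumption rather than the paper's definition of Tsisj.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class LinkEstimator:
    """Tracks ACK statistics for one link si -> sj (names are hypothetical)."""
    sent: int = 0
    acked: int = 0
    total_delay: float = 0.0
    _pending: Dict[int, float] = field(default_factory=dict)  # seq -> send time

    def on_send(self, seq: int, now: float) -> None:
        self.sent += 1
        self._pending[seq] = now

    def on_ack(self, seq: int, now: float) -> None:
        sent_at = self._pending.pop(seq, None)
        if sent_at is None:  # duplicate or unknown ACK: ignore it
            return
        self.acked += 1
        self.total_delay += now - sent_at

    def ack_ratio(self) -> float:
        """ACKs received divided by data packets sent."""
        return self.acked / self.sent if self.sent else 0.0

    def mean_delay(self) -> Optional[float]:
        """Average send-to-ACK time, None until the first ACK arrives."""
        return self.total_delay / self.acked if self.acked else None


link = LinkEstimator()
link.on_send(1, now=0.0)
link.on_ack(1, now=0.25)
link.on_send(2, now=1.0)   # this packet is never acknowledged
link.on_send(3, now=2.0)
link.on_ack(3, now=2.75)
```

In this example two of three packets are acknowledged, so the ACK ratio is 2/3 and the mean measured delay is 0.5 s.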

Learning-Based Routing Algorithm
Slide 15

Learning-Based Routing Algorithm
Slide 16

Performance Evaluation
Fig: Average end-to-end delay to the sink node.
Fig: Average packet delivery ratio to the sink node.
Slide 17

Performance Evaluation
Fig: The impact of node mobility on average packet delivery ratio.
Fig: The impact of network traffic load on average end-to-end delay.
Slide 18

Limitation
➢ RL-QRP neglects many common QoS requirements, such as network
lifetime, throughput, and connectivity.
➢ Each sensor node does not consider the interactions between
itself and other sensor nodes, and this approach is not sufficient
to achieve global optimization.
• Sensor nodes should consider their interactions with both the
environment and the other nodes in the network, and cooperatively
calculate the QoS routes within a multi-agent reinforcement
learning (MaRL) framework.
Slide 19

THANK YOU
