The amount of network traffic generated is increasing exponentially with time. Along with the growth in the number of users, the volume of data generated by each user is also growing, owing to new use cases such as virtual and augmented reality, ultra-HD video streaming and other applications. When a user generates a large amount of information that needs to be computed, and the user device is not equipped with hardware that can meet the latency requirements, the computation task is offloaded to nearby edge or fog servers. I will speak on a novel method of modelling the task offloading mechanism that represents real-life situations.
1. Test-bench for Task Offloading Mechanisms: Modelling the Rewards of Non-stationary Nodes
Aniq Ur Rahman, Sarnava Konar, Ayan Banerjee
Department of Electronics and Communication Engineering
National Institute of Technology Durgapur, India
2. Overview
1 Motivations
2 Non-stationary Rewards and Task Offloading
3 Modelling Using Markov Chain
4 Transition Probability Matrix
5 State Probability Vector
6 Creating Transition Probability Matrix
7 Why the Interval Rate is a Non-Homogeneous Poisson Process
8 Simulation Results
9 State Jump
10 Framework
11 Future Plan
3. Motivations
Many works that deal with task offloading define their own models for the resource availability of the nodes, which makes direct comparison difficult.
No standard has been set that emulates the non-stationary behaviour of the nodes.
Investigations that treat the task offloading problem in the multi-armed bandit framework often consider a non-stationary scenario where the amount of available resources is sampled from a fixed distribution at regular intervals of time. This simple treatment makes the solution less realistic.
Thus, here we present a technique for the dynamic modelling of a time-evolving quantity.
4. Non-stationary Rewards and Task Offloading
The major reason for task offloading is to minimize the system latency. Therefore, the amount of resources available directly translates to the reward of the system.
Learning algorithms like UCB and Thompson Sampling are applied to find the most reliable node. The performance of these algorithms varies significantly with the reward distribution and with the interval at which the behaviour changes.
This calls for a standard model of the reward distribution across time, which can serve as a test-bench for comparing algorithms.
Non-stationary reward here means time-evolving reward.
5. Modelling Using Markov Chain
We model the server nodes' non-stationary reward distribution as a Markov chain in which the state transitions occur at intervals defined by a non-homogeneous Poisson process.
We consider the state space [0, 1], where 0 is the minimum reward and 1 is the maximum reward.
The state space is divided into N equal slots X = {x_1, x_2, x_3, ..., x_N}, where x_i ∈ [(i−1)/N, i/N) are the N states of the system.
The probability of the server being in state x_i is denoted by p_i.
When the server is in state x_i, the reward r is obtained by drawing from a uniform probability distribution over the range of x_i, i.e., r ∼ U(x_i).
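As a concrete illustration, here is a minimal Python sketch of this reward draw; the function name and signature are my own, not part of the genMarkov API:

```python
import numpy as np

rng = np.random.default_rng()

def draw_reward(i, N):
    """Reward for state x_i: r ~ U([(i-1)/N, i/N)), with i in 1..N."""
    return rng.uniform((i - 1) / N, i / N)

# Example: in a 4-state system, state x_3 yields a reward in [0.5, 0.75).
r = draw_reward(i=3, N=4)
```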
6. Transition Probability Matrix
For demonstration purposes, we set N = 4 and define the transition probability matrix as:

T_4(p_1, p_2, p_3, p_4) =
         x_1   x_2   x_3   x_4
  x_1  [ a_11  a_12  a_13  a_14 ]
  x_2  [ a_21  a_22  a_23  a_24 ]
  x_3  [ a_31  a_32  a_33  a_34 ]
  x_4  [ a_41  a_42  a_43  a_44 ]

where a_ij denotes the probability of transition from state x_i to x_j, constrained by ∑_{j=1}^{4} a_ij = 1.
7. State Probability Vector
The transition probability matrix elements [a_ij]_{N×N} are functions of the state probability vector [p_i]_N:

( ∑_{i=1}^{N} a_ik ) / ( ∑_{j=1}^{N} ∑_{i=1}^{N} a_ij ) = p_k

Another constraint:

∑_{j=1}^{N} a_ij = 1

Since each row sums to 1, the denominator of the first condition equals N, so it reads (1/N) ∑_{i=1}^{N} a_ik = p_k.
We have N² values to find and only 2N equations to aid us; a unique solution therefore requires 2N = N², i.e., N = 2.
To remove this restriction to a 2-state Markov process, we introduce the following condition for an N-state Markov process:

T_2(p_1, p_2) =
         x_1   x_2
  x_1  [ p_1   p_2 ]
  x_2  [ p_1   p_2 ]
8. State Probability Vector (contd.)
T_2(p_1, p_2) =
         x_1   x_2
  x_1  [ p_1   p_2 ]
  x_2  [ p_1   p_2 ]

This solution involves the simple relation a_ij = p_j, which satisfies both constraints. This way, there are no constraints on the value of N, except N > 0.
9. Creating Transition Probability Matrix
We first need the probability density function of the reward, h(r), where r is the reward.
In an N-state system, we find the probability p_i as:

p_i = ( ∫_{(i−1)/N}^{i/N} h(r) dr ) / ( ∫_0^1 h(r) dr )

This set {p_1, p_2, ..., p_N} is used to generate the transition probability matrix, and hence shape the Markov model, as per the following relation:

a_ij = p_j   ∀ i, j ∈ [1, N]
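Putting this slide and the previous one together, a minimal sketch of the construction in Python (illustrative names, not the genMarkov API; the trapezoidal quadrature is my own choice):

```python
import numpy as np
from scipy.stats import norm

def state_probabilities(h, N, pts=1000):
    """p_i = integral of h over [(i-1)/N, i/N], normalised by the integral
    of h over [0, 1]; both approximated by the trapezoidal rule."""
    p = np.empty(N)
    for i in range(N):
        r = np.linspace(i / N, (i + 1) / N, pts)
        p[i] = np.trapz(h(r), r)
    return p / p.sum()

def transition_matrix(p):
    """a_ij = p_j for all i: every row of T equals the state probability vector."""
    return np.tile(p, (len(p), 1))

# Example: reward pdf h(r) = Gaussian with mu = 0.2, sigma = 0.3 (model M1
# from the simulation section), discretized into a 4-state system.
h = lambda r: norm.pdf(r, loc=0.2, scale=0.3)
T = transition_matrix(state_probabilities(h, N=4))
```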
10. Why the Interval Rate is a Non-Homogeneous Poisson Process
The state transitions occur at an interval τ, which is a non-homogeneous Poisson distributed random variable with a sinusoidally varying interarrival rate:

τ ∼ P(Λ + β sin ωt)

where P denotes a Poisson distribution. The parameter Λ is the mean interval, β is the swing parameter and ω is the fluctuation frequency.
Poisson processes are used to model various phenomena because they are analytically tractable. However, not every process has the exponentially distributed interarrival times that a homogeneous Poisson process assumes.
In particular, a fixed-rate Poisson process fails to capture bursts in traffic. For this reason, instead of fixing the interarrival rate, we have made it a sinusoidal function of time.
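A sketch of how such an interval might be sampled; the max(·, 0) guard against a momentarily negative rate is my own assumption, not part of the stated model:

```python
import numpy as np

rng = np.random.default_rng()

def sample_interval(t, Lam, beta, omega):
    """tau ~ Poisson(Lam + beta * sin(omega * t)). The rate depends on the
    current time t, which is what makes the process non-homogeneous."""
    rate = max(Lam + beta * np.sin(omega * t), 0.0)
    return rng.poisson(rate)

# Example: behaviour model T1 (Lam = 300 s, beta = Lam/3, omega = 3 * 2*pi/3600).
tau = sample_interval(t=0.0, Lam=300.0, beta=100.0, omega=3 * 2 * np.pi / 3600)
```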
11. Simulation Results
To demonstrate the usage of the framework, we set h(r) = N(µ, σ)¹, N = 4, Λ0 = 300 s, ω0 = 2π/3600 and a time-step of 1 s. There are three long-term reward distribution models {M1, M2, M3} and three state-transition behaviour models {T1, T2, T3}.

Parameter   M1    M2    M3
µ           0.2   0.5   0.8
σ           0.3   0.2   0.3
Table: Long-term reward distribution.

Parameter   T1     T2     T3
Λ           Λ0     Λ0/2   2Λ0
β           Λ0/3   Λ0/3   Λ0
ω           3ω0    2ω0    ω0
Table: State-transition behaviour.

¹ N(µ, σ) denotes a Gaussian distribution with mean µ and standard deviation σ.
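A compact end-to-end sketch of one run under these settings (illustrative code, not the genMarkov API; 0-based state indices stand in for x_1 ... x_N):

```python
import numpy as np

def simulate(T, Lam, beta, omega, horizon=20_000, seed=0):
    """Trace of (time, state, reward) at a 1 s time-step. The node holds its
    state until the sampled interval tau elapses, then jumps to a state
    drawn from the current row of T (which, by construction, equals p)."""
    rng = np.random.default_rng(seed)
    N = T.shape[0]
    state = rng.integers(N)
    next_jump = max(rng.poisson(Lam), 1)
    trace = []
    for t in range(horizon):
        if t >= next_jump:
            state = rng.choice(N, p=T[state])
            rate = max(Lam + beta * np.sin(omega * t), 0.0)
            next_jump = t + max(rng.poisson(rate), 1)
        trace.append((t, state, rng.uniform(state / N, (state + 1) / N)))
    return trace

# Example: model M1 with behaviour T1, using T built as on Slide 9.
```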
12. Simulation Results (contd.)
[Figure panels: (a) M1, (c) M2, (e) M3 show the long-term state-probability histograms (State 1–4 vs. Probability 0.0–0.5); (b) M1-T1, (d) M2-T1, (f) M3-T1 show the corresponding state trajectories over 0–20000 s (Time [sec] vs. State 1–4).]
Figure: State transition characteristics for three contrasting reward distributions in a 4-state system. M1 resembles a busy node, whereas M3 resembles a node that is mostly available. M2 represents a moderately available node.
14. State Jump
Height of state jump: H_N(x_i, x_j) = |j − i| / N
Mean height of state jump: H̄_N = E[H_N(x_i, x_j)] = γ · (1/N) (an estimation sketch follows at the end of this slide)
From the graph it is evident that H̄_N is a linear function of 1/N.
[Figure: H̄_N plotted against 1/N for 1/N ∈ (0, 0.5], with H̄_N ranging over 0.0–0.4. Caption: µ = 0.2, σ = 0.3, γ = 0.829.]
In general, a higher value of N resembles a less dynamic scenario, where the rewards fluctuate steadily, whereas for lower values of N, the reward fluctuations and state jumps affect the learning algorithm adversely.
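A minimal sketch of estimating H̄_N from a simulated trace; whether self-transitions (height 0) count toward the mean is not stated on the slide, so excluding them here is my own assumption:

```python
import numpy as np

def mean_jump_height(states, N):
    """Empirical H_bar_N from the sequence of state indices recorded at the
    jump instants. Self-transitions are excluded (an assumption)."""
    heights = np.abs(np.diff(np.asarray(states))) / N
    return heights[heights > 0].mean()
```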
15. Framework
Open source framework at https://github.com/Aniq55/genMarkov.
The library can be readily imported and used to generate resource-availability data based on user-defined parameter values.
The state transition characteristics can also be visualized with the help of built-in functions.
16. Future Plan
Analyze the resource monitor information of real cloud servers to assess how well our framework fits the actual measurements.