Machine Learning and Stochastic Geometry: Statistical Frameworks Against Uncertainty in Wireless LANs --- Part A

Machine Learning and Stochastic Geometry:
Statistical Frameworks Against Uncertainty in
Wireless LANs
Koji Yamamoto and Takayuki Nishio
Graduate School of Informatics, Kyoto University
2019-05-24 — IEEE ICC 2019 Tutorial

Contents of this tutorial
Part A — Deep Reinforcement Learning and Stochastic Geometry for
microwave WLANs
Koji Yamamoto — 90 min
I. Wireless LANs — IEEE 802.11ax and 11be
II. Deep Reinforcement Learning — Channel Allocation in WLANs
III. Stochastic Geometry — Modeling and Analysis of WLANs
Break
Part B — Deep Learning for/in mmWave WLANs
Takayuki Nishio — 90 min
2 / 88

Radio parameters
Transmission power
Frequency channel
and states
topology
fading
−−−−−−−−→ Communication quality
Throughput
Delay
x
Model f
−−−−−−−−→ y = f(x)
We would like to
▶ maximize the communication quality f(x)
▶ ﬁnd the optimal parameter arg max
x
f(x)
3 / 88

Model f
is given?
Optimization/
Game theory
yes
Data (xi, yi)
is given?
yi = f(xi) + noise
no
Supervised
learning
yes
Reinforcement
learning
no
Deep learning
(CNN)
xi is image
Deep RL
xi is image
Part A-III
Part A-IIPart B
4 / 88

Goal of this tutorial
Deep understanding of important ideas
▶ Each of ML, RL, SG, and WLANs can be one tutorial
▶ A deep understanding of important techniques helps you to
understand the relevant techniques yourself
Machine learning and stochastic geometry analysis
▶ Mainly for resource management and system design
5 / 88

Part I
Wireless LANs — IEEE 802.11ax and
11be
6 / 88

Maximum data rate in IEEE 802.11
11
11a
11b
11g
11n
11ac
11ax
11be(bit/s)
1995 2000 2005 2010 2015 2020
1M
10M
100M
1G
10G
7 / 88

PHY (cited from [Khorov+2019, Table II])
Legacy 802.11ax
OFDM Constellation Order 256-QAM (11ac) 1024-QAM
OFDM Symbol Duration 3.2 µs 12.8µs
OFDM Guard Interval 0.4 or 0.8 µs 0.8, 1.6 or 3.2 µs
Maximum Data Rate ≈ 7 Gbit/s ≈ 9.6 Gbit/s
8 / 88

The 802.11ax Challenge is
Dense Networks
[Khorov+2019]
9 / 88

0
0.2
0.4
0.6
0.8
1
1 10 100 1000 10000
Probabilityofcollisions
Number of transmitters
Bianchi model (saturated traﬃc) [Bianchi2000]
IEEE 802.11a, Frame size: 1500 B, Data rate: 6 Mbit/s
11 / 88

Channel Access and OBSS Management
(cited from [Khorov+2019, Table II])
Legacy (802.11ac and earlier) 802.11ax
Basic Channel
Access
CSMA/CA OFDMA on top of CSMA/CA
Random Channel
Access
DCF, EDCA UL OFDMA Random Access
on top of CSMA/CA
Contention-free
Access
PCF, HCCA (not
implemented in real devices),
RAW (11ah)
Trigger-based UL OFDMA
MU Technology MU-MIMO (11ac) MU-MIMO, OFDMA
MU Transmission
Direction
DL (11ac) DL and UL
Fragmentation Static Flexible
Interference
Mitigation
NAV, RTS/CTS,
HCCA TXOP Negotiation
Two NAVs, Quiet Period
Spatial Reuse Sectorization (11ah) Adaptive Power and Sensitiv-
ity Thresholds, Color
12 / 88

Spatial Reuse Using BSS “Color”
Start
Channel
Transmit
Idle
End
Busy
(11ac)
“Color” < OBSS PD
Busy
(11ax)
Same BSS
OBSS
Yes
No
▶ “Color” header
▶ Parameter: Carrier sense threshold for OBSS (OBSS PD)
▶ As a result, channel is spatially reused (spatial reuse).
OBSS: Overlapped Basic Service Set, i.e., the other adjacent cells 13 / 88

IEEE 802.11ax IEEE 802.11be
Will be finalized in 2020 (in schedule) Starts in 2018
High Efficiency WLAN (HEW) Study
Group
Extremely High Throughput (EHT)
Study Group
Wi-Fi 6 Wi-Fi 7 (?)
http://www.ieee802.org/11/
Reports/tgax_update.htm
http://www.ieee802.org/11/
Reports/ehtsg_update.htm
[Khorov+2019] E. Khorov, A. Kiryanov, A.
Lyakhov, and G. Bianchi, “A tutorial on
IEEE 802.11ax high efficiency WLANs,”
IEEE Commun. Surveys Tuts., vol. 21,
no. 1, 2019
▶ At least 30 Gbit/s
▶ 320 MHz bandwidth and
non-contiguous spectrum
▶ Multi-band/multi-channel
aggregation
▶ 16 spatial streams MIMO
▶ Multi-Access Point Coordination
▶ Enhanced link adaptation and
retransmission protocol
14 / 88

Throughput Starvation
BA C D E
C
D
E t
Busy
First order starvation
[Mhatre+2007]
Flow-in-the middle starvation
[Garetto+2008]
Second order starvation [Mhatre+2007]
Under saturated traﬃc, A and D always detect signals thus they cannot
start transmissions
15 / 88

Goal of performance modeling
▶ Points represent TXs or
TX-RX pairs.
▶ Circle represents carrier sense
area (contention domain).
We would like to evaluate the following performances without Monte
Carlo simulations to ensure reproducibility
1. Individual throughput in a speciﬁc topology — Graph theory Part I
2. Average throughput over possible topologies — SG Part III
3. Distributions of interference and SIR — SG Part III
16 / 88

Contention graph
TX-RX pair Vertex
Pairs detecting signals each other Vertices are connected with edge
Implicit assumption: TX and RX forming a given link have the same carrier sense
relationship (The situation where one of them detects the carrier but the other does
not detect it is not allowed)
Carrier
detection
Carrier sensing relationship Contention graph
17 / 88

Back-of-Envelope (BoE) throughput [Liew+2010]
Contention graph
0.50.5
0
1
BoE throughput
Normalized throughput
Medium access probability
18 / 88

Model of simultaneous transmissions according to CSMA
Maximum independent sets (MISs)
▶ Induced subgraph without edges with the maximum number of
vertices
▶ Possible patterns of simultaneous TXs according to CSMA
(Vertex-)induced subgraph
▶ A subset of the vertex set of the original graph and related edge set
Induced subgraph o o o o
MIS x x o o
Wolfram—Alpha: Subgraph, Independent Set
19 / 88

n1 = 2
n1/n = 2/2 = 1
n2 = 0
n2/n = 0/2 = 0
n3 = 1
n3/n = 1/2 = 0.5
1. Find all MISs of the contention graph
(possible simultaneous transmission patterns)
2. The normalized throughput of link (vertex) i is determined as ni/n,
where
▶ n is the number of MISs and
▶ ni is the number of MISs containing i.
We assume that all possible patterns occur with equal probability
20 / 88

Example of BoE throughput
0.6
0.0
1.0
0.4
0.2
1.0
0.3
0.2
0.8
0.3
1.0
0.6
1.0
0.4
0.3
0.8
21 / 88

1 require("igraph")
2 require("spatstat")
3 require("plotrix")
4 set.seed(9)
5
6 LAMBDA <- 2e-5
7 RADIUS <- 200
8 W.LENGTH <- 500
9 W <- owin(xrange = c(-W.LENGTH, W.LENGTH), yrange = c(-W.LENGTH, W.LENGTH))
10
11 # Poisson point process
12 P <- rpoispp(lambda = LAMBDA, win = W)
13
14 # Adjacency matrix
15 D <- pairdist(P)
16 adjacency.matrix <- (D<RADIUS)
17 diag(adjacency.matrix) <- 0
18
19 # Contention graph
20 adjacency.graph <- graph.adjacency(adjacency.matrix, mode="undirected")
21
22 # Maximum independent sets
23 MISs <- largest_ivs(adjacency.graph)
24
25 # Evaluate throughput
26 ni <- 0
27 for ( i in 1:(P$n) ) {
28 N.MIS <- 0
29 for ( j in 1:length(MISs) ) {
30 N.MIS <- N.MIS + sum(MISs[[j]]==i)
31 }
32 ni[i] <- N.MIS
33 }
34
22 / 88

35 # Plotting figures
36 quartz(type="pdf", file="160408_BoE_simple_position.pdf")
37 par(pty="s",mar=c(0, 0, 0, 0)+0.1)
38 plot(P$x, P$y, xlim=c(-W.LENGTH,W.LENGTH), ylim=c(-W.LENGTH,W.LENGTH))
39 for (i in 1:(P$n) ) {
40 draw.circle(P[i]$x,P[i]$y,200,border=NA,col=adjustcolor( "blue", alpha.f = 0.1))
41 }
42 dev.off()
43 quartz(type="pdf", file="160408_BoE_simple.pdf")
44 par(pty="s",mar=c(0, 0, 0, 0)+0.1)
45 plot(P$x, P$y, xlim=c(-W.LENGTH,W.LENGTH), ylim=c(-W.LENGTH,W.LENGTH))
46 for ( i in 1:(P$n-1) ) {
47 for ( j in (i+1):(P$n) ) {
48 if (adjacency.matrix[i,j]==1) {
49 segments(P[i]$x, P[i]$y, P[j]$x, P[j]$y, col="blue", lwd = 2)
50 }
51 }
52 }
53 text(P$x, P$y, sprintf("%.1f",ni/length(MISs)), pos=1)
54 dev.off()
23 / 88

Potential game-based channel allocation [Yin+2017]
0
200
400
600
800
1000
1200
0 200 400 600 800 1000 1200
Location(m)
Location (m)
Proposed scheme
0
200
400
600
800
1000
1200
0 200 400 600 800 1000 1200
Location(m)
Location (m)
Compared scheme
▶ Cost function taking into account the number of three-node chains
which result in ﬂow-in-the-middle starvation
▶ Cost function is designed to be a potential game [Yamamoto2015],
i.e., channel selection dynamics is proved to converge to a Nash
equilibrium
B. Yin, S. Kamiya, K. Yamamoto, T. Nishio, M. Morikura, and H. Abeysekera, “Mitigating throughput
starvation in dense WLANs through potential game-based channel selection,” IEICE Trans. Fundamentals,
vol. E100-A, no. 11, Nov. 2017 24 / 88

Summary (Part I)
▶ Spatial reuse using “color” is a promising technology in IEEE
802.11ax
▶ The next standard — IEEE 802.11be
▶ Throughput starvation problem
▶ Back-of-envelope throughput evaluation
25 / 88

Part II
Deep Reinforcement Learning —
Channel Allocation in WLANs
26 / 88

BoE throughput evaluation takes into account
▶ Adjacency (carrier sensing relationship) among stations
▶ Saturated traffic
But there are many other factors
▶ Unsaturated traffic and its fluctuations
▶ Random backoff algorithm in CSMA/CA
▶ Fading and its impact on adjacency
▶ Other factors
Optimization based on observations
↓
Machine learning
27 / 88

Machine Learn-
ing (ML)
Unsupervised
Learning
Supervised
Learning
Deep
Learning
Reinforcement
Learning
(RL)
Multi-
armed
Bandit
Q-learning
Deep RL
Multi-
agent RL
Adversarial
RL
28 / 88
Which framework can we utilize
for performance improvement in WLANs?
From a viewpoint of radio resource
management

Reinforcement
Learning (RL)
Multi-armed
Bandit
Q-learning
Deep RL
Multi-
agent RL
Adversarial
RL
“Deep RL with graph convolution”
— My opinion at this point
Multi-agent RL is also attractive because WLANs are decentralized
networks. However, at this point,
▶ Multi-agent RL is mainly developed for cooperative control systems
▶ It is hard to apply deep learning with recent progress
29 / 88

Deep RL
with graph convolution
RL
Q-learning
Deep learning
Function approximation
with deep neural networks
Graph convolution
30 / 88

Deep RL
with graph convolution
RL
Q-learning
31 / 88

Motivation — Optimal Channel Allocation
Optimal channel allocation?
Criterion: Aggregated throughput
Number of channels: 2
1
2
3
4
5
1
2
3
4
5
1
1
1
1
1
1
2
3
4
5
1
1
1
1
1← BoE throughput
Framework: Combinatorial
optimization (NP-hard)
BoE throughput is shown for the ease of explanation.
32 / 88

Motivation — Optimal Sequence
Different problem setting:
▶ Finding the minimal sequence
▶ Only one AP can change its channel at a given time
▶ Throughput can be observed only after channel allocation
Initial state Optimal states
1
2
3
4
5
1
0
1
0
1
We would like to
find the minimum
sequence.
1
2
3
4
5
1
1
1
1
1
1
2
3
4
5
1
1
1
1
1
This part is simplified version of [Nakashima+2019]
33 / 88

Channel Allocation Sequence
1
2
3
4
5
1
0
1
0
1
1
2
3
4
5
1
1
1
0
1
1
2
3
4
5
1
1
1
1
1
←
We want to ac-
quire this sequence
based on observed
throughput
Criterion?
1
2
3
4
5
1
0
1
0
1
1
2
3
4
5
0.5
0.5
1
0.5
0.5
1
2
3
4
5
1
1
1
0.5
0.5
1
2
3
4
5
1
1
1
1
1
t = 0 t = 1 t = 2 t = 3 34 / 88

Automated driving system — Self-driving cars
https://www.metacar-project.com/
Markov decision process
(MDP)
Agent Environment
Action
State=Position
Reward
Metacar: “A reinforcement learning environment for self-driving cars in
the browser”
35 / 88

Problem
At time t, the agent has a sequence of observations (
state
S0, A0
action
,
reward
R1, S1),
(S1, A1, R2, S2), . . . , (St, At, Rt+1, St+1)
Based on the observations, what action At+1 should the agent take to
get higher reward?
The agent (decision maker) should have the criterion
36 / 88

Total reward and optimal action-value function
Criterion of agent: Expected total discounted reward
with discount factor 0 ≤ γ ≤ 1
E
[ ∞∑
t=0
γt
Rt+1
]
By using the optimal action-value function
Q∗
(s, a)
.
= max
π
Eπ
[ ∞∑
t=0
γt
Rt+1 S0 = s, A0 = a
]
, s ∈ S, a ∈ A
The next action should be determined as
At+1 = arg max
a∈A
Q∗
(St+1, a)
.
= denotes “equality relationship that is true by deﬁnition” [Sutton+2018]
Eπ
[·] denotes the expected value of a random variable given that the agent follows policy π. [Sutton+2018]
37 / 88

For the following initial state S0, the following sequence starting with
A0 = “AP 4 uses CH 2” is optimal because the sequence after t = 1 is optimal
t = 0 t = 1 t = 2 · · ·
State St
1
2
3
4
5
1
1
1
0
1
1
2
3
4
5
1
1
1
1
1
1
2
3
4
5
1
1
1
1
1
· · ·
Action At AP 4 uses CH 2
Reward Rt R1 = 5 R2 = 5 · · ·
Q∗
(
S0 =
1
2
3
4
5
1
1
1
0
1
, A0 = AP 4 uses CH 2
)
= R1 + γR2 + γ2
R3 + · · ·
= 5 + γ5 + γ2
5 + · · · =
5
1 − γ
= 50 (γ = 0.9)
38 / 88

Part of Optimal Q-table
For the initial state
1
2
3
4
5
1
1
1
0
1
CH
1 2
AP
1 4 + γ5 + γ25 + · · · = 49 3 + γ4 + γ25 + · · · = 47.1
2 3 + γ4 + γ25 + · · · = 47.1 4 + γ5 + γ25 + · · · = 49
3 4 + γ5 + γ25 + · · · = 49 3 + γ4 + γ25 + · · · = 47.1
4 4 + γ5 + γ25 + · · · = 49 5 + γ5 + γ25 + · · · = 50
5 4 + γ5 + γ25 + · · · = 49 4 + γ4 + γ25 + · · · = 48.1
γ = 0.9
▶ The maximum total reward is achieved starting with “AP 4 uses CH
2”.
▶ If we have the optimal Q-table, we can take the optimal action.
39 / 88

▶ The optimal action-value function Q∗ is unknown
▶ We need to estimate Q∗ from observations
New problem
Based on observations, how can we estimate Q∗?
▶ Qt(s, a): Estimate of Q∗(s, a) at time t
40 / 88

(Tabular) Q-learning [Watkins+1992]
Updated estimate
from observations
↓
Qt+1(St, At)
.
=
Estimate at t
↓
Qt(St, At) + αt δt+1(Qt)
δt+1(Qt)
.
= Rt+1 + γ max
a′∈A
Qt(St+1, a′
) − Qt(St, At)
▶ (St, At, Rt+1, St+1): Observation
▶ 0 ≤ αt ≤ 1: (Small) non-negative number
Before observation After observation
Qt(St, At)
Rt+1 + γ max
a′∈A
Qt(St+1, a′
)
αt = 0 αt = 1
δt+1(Qt)
Time-diﬀerence error
(If Qt = Q∗, δt+1 = 0 from the Bellman equation)
41 / 88

Feature of Q-learning [Szepesv´ari2010, 3.3.1]
“Under the assumption that every state-action pair is visited inﬁnitely
often, the sequence (Qt; t ≥ 0) converges to Q∗.”
42 / 88

Markov decision process
▶ Agent: Centralized controller of all APs
▶ State: Channels and contention graph of APs
▶ Action: Channel selection of one AP
▶ Reward: Throughput
CH 1
i
Current state
CH 1 CH 2
i
Next state
Action: AP i changes its channel to 2
[Nakashima+2019]
43 / 88

State Space
To estimate Q∗(s, a), every state-action pair should be visited
When the number of states is huge, training is infeasible — State
explosion
Number of APs Number of channels Number of states
4 2 1024
10 3 2 · 1018
44 / 88

Tabular Q-learning (p. 46)
Qt+1(St, At)
.
= Qt(St, At) + αt δt+1(Qt)
δt+1(Qt)
.
= Rt+1 + γ max
a′∈A
Qt(St+1, a′
) − Qt(St, At)
Q-learning with function approximation
Extension to function approximation with a parameterized function Qθ,
θ ∈ Rd
θt+1
.
= θt + αt δt+1(Qθt ) ∇θ Qθt (St, At)
δt+1(Qθt )
.
= Rt+1 + γ max
a′∈A
Qθt (St+1, a′
) − Qθt (St, At)
Deep reinforcement learning
Q-learning with function approximation using deep neural networks
45 / 88

Regression in supervised learning
▶ Model f is not given
▶ Samples {xi, yi}i∈{1,...,n} are available,
where yi = f(xi) + noise
▶ Estimate the output for test input
x ̸∈ {xi}i∈{1,...,n}
x −−−−→ y
∈
∈
Rd f
−−−−→ R
Approach
▶ Consider a form of parameterized function fθ to approximate f
e.g., linear basis function model: fθ(x)
.
=
b∑
j=1
θj
basis
function
ϕj(x) = θ⊤
ϕ(x)
▶ θ is trained (estimated) using observed samples {xi, yi}i∈{1,...,n}
e.g., estimation by minimizing a sum-of-squares error
θ
.
= arg min
θ
n∑
i=1
(
fθ(xi) − yi
)2
.= J(θ) loss function
▶ Estimation for test input x: y
.
= fθ
(x)
46 / 88

Training set
n observations
(xi, yi)i=1,...,n
Learning algorithm
θ
.
= arg min
θ
J(θ)
New input
x
Estimated output
fθ
(x)
Model fθ(·)
Training Regression
47 / 88

Models (parameterized functions)
fθ(x)
Learning algorithms
Linear basis function model
fθ(x)
.
=
b∑
j=1
θj ϕj(x) = θ⊤
ϕ(x)
Stochastic gradient descent
Neural networks
fθ(x)
.
=
b∑
j=1
αj ϕ(x; βj)
ϕ(x; β)
.
=
1
1 + exp(−x⊤w − γ)
θ
.
= (αj, βj)j, β
.
= (w⊤, γ)⊤
Deep neural networks
(CNN, LSTM)
Deep neural networks oﬀer better
function approximation capabilities
compared to shallow networks
Backpropagation
48 / 88

State Design and Feature Extraction [Nakashima+2019]
State: Adjacency of APs and selected channel
Physical expression
CH 2
CH 1
2
1
3
4
5
Graph expression
2
1
3
4
5
Tabular expression
5x7 matrix






0 1 1 1 0
1 0 0 0 0
1 0 0 1 0
1 0 1 0 0
0 0 0 0 0






[
1 0 0 1 1
0 1 1 0 0
]
Ch 1
Ch 2
Feature extraction (pre-processing):
Graph convolutional networks Convolutional neural network (CNN)
Feature extraction for graphs Feature extraction for images
49 / 88

State: 5x7 matrix
input_1: InputLayer
input:
output:
(None, 1, 5, 7)
(None, 1, 5, 7)
lambda_2: Lambda
input:
output:
(None, 1, 5, 7)
(None, 1, 5, 2)
lambda_1: Lambda
input:
output:
(None, 1, 5, 7)
(None, 1, 5, 5)
spectral_graph_convolution_1: SpectralGraphConvolution
input:
output:
[(None, 1, 5, 2), (None, 1, 5, 5)]
(None, 1, 5, 32)
spectral_graph_convolution_2: SpectralGraphConvolution
input:
output:
[(None, 1, 5, 32), (None, 1, 5, 5)]
(None, 1, 5, 16)
batch_normalization_1: BatchNormalization
input:
output:
(None, 1, 5, 32)
(None, 1, 5, 32)
batch_normalization_2: BatchNormalization
input:
output:
(None, 1, 5, 16)
(None, 1, 5, 16)
ﬂatten_1: Flatten
input:
output:
(None, 1, 5, 16)
(None, 80)
dense_1: Dense
input:
output:
(None, 80)
(None, 10)
Q-values for 10 actions (5 APs x 2 CHs)
e
50 / 88

Libraries and model
1 import numpy as np
2 import tensorflow as tf
3 from keras.models import Model
4 from keras.layers import Dense, Activation, Flatten, Input, Reshape, Lambda, Concatenate,
BatchNormalization→
5 from rl.policy import EpsGreedyQPolicy # keras-rl
6
7 def get_model_functional(env, nb_actions, ap_number, ch_number):
8 inputs = Input(shape=(1,) + env.observation_space.shape)
9 adj = Lambda(lambda x: x[:, :, :, :ap_number],
10 output_shape=(1, ap_number, ap_number))(inputs) # adjacency matrix
11 ch = Lambda(lambda x: x[:, :, :, ap_number:],
12 output_shape=(1, ap_number, ch_number))(inputs) # channel matrix
13 # GraphConvolution is not provided in libraries
14 H = GraphConvolution(units=32, activation='relu',
15 kernel_initializer='he_normal')([ch, adj])
16 H = BatchNormalization()(H)
17 H = GraphConvolution(units=16, activation='relu',
18 kernel_initializer='he_normal')([H, adj])
19 H = BatchNormalization()(H)
20 H = Flatten()(H)
21 predictions = Dense(nb_actions, activation='linear',
22 kernel_initializer='he_normal')(H)
23 model = Model(inputs=inputs, outputs=predictions)
24 return model
51 / 88

0 200000 400000 600000 800000
Step
0.55
0.60
0.65
0.70
0.75
0.80
Reward
▶ Five APs are located uniformly and randomly
▶ Reward: The minimum individual BoE throughput of ﬁve APs
Simpliﬁed version of [Nakashima+2019]
52 / 88

End-to-End Learning of Proactive Handover Policy
for Camera-Assisted mmWave Networks Using
Deep Reinforcement Learning
Yusuke Koda Kota Nakashima Koji Yamamoto
Takayuki Nishio Masahiro Morikura
arXiv:1904.04585

Image-to-decision proactive handover
▶ Determine one BS from two candidate BSs
▶ The output of NNs is Q-value for input images
NN [16]
(Combination of
CNN and
LSTM)
Consequtive input
images
: Index of associated BS
: Remaining time step until handover process is completed : Action
Output layer
Output of NN
prior to output layer
54 / 88

12.2 12.4 12.6 12.8 13.0 13.2 13.4 13.6 13.8 14.0
050150250
Time (s)
DatarateinBS1(Mbit/s)
Data rate degradation
Data rate of BS 1
12.2 12.4 12.6 12.8 13.0 13.2 13.4 13.6 13.8 14.0
40506070
Time (s)
Actionvalue
Action value for selecting BS 1
Handover
to BS 2
Handover
to BS 1
Q-value
for selecting BSs 1 and
2
without camera images
▶ Because the received power degrades sharply, handover decision is
conducted after degradation in general
55 / 88

12.2 12.4 12.6 12.8 13.0 13.2 13.4 13.6 13.8 14.0
40506070
Time (s)
Actionvalue
Handover
to BS 2
Handover
to BS 1
Q-value for
selecting BSs 1 and 2
without camera images
12.2 12.4 12.6 12.8 13.0 13.2 13.4 13.6 13.8 14.0
1015202530
Time (s)
Action−value
Handover
to BS 2
Handover
to BS 1
Action value degradation
ahead of blockage
Q-value for
selecting BSs 1 and 2
with camera images
▶ By extending the state space using camera images, proactive
handover is enabled
56 / 88

Our related works
Deep reinforcement learning with graph convolution
▶ Channel allocation for WLANs [Nakashima+2019]
We utilize
▶ Deep Q network (DQN) [Mnih+2015]
▶ Double Deep Q-network [vHasselt+2016]
▶ Dueling network [Wang+2015]
▶ Prioritized experience replay [Schaul+2015]
Deep reinforcement learning with CNN
▶ Image-based proactive handover for mmWave WLANs
[Koda+2018], [Koda+2019]
57 / 88

References
▶ R. S. Sutton and A. G. Barto, Reinforcement Learning, 2nd ed.
MIT Pr., 2018
▶ C. Szepesv´ari, Algorithms for Reinforcement Learning. Morgan and
Claypool Pub., 2010
▶ K. Arulkumaran, M. P. Deisenroth, M. Brundage, and
A. A. Bharath, “Deep reinforcement learning: A brief survey,” IEEE
Signal Process. Mag., vol. 34, no. 6, Nov. 2017
58 / 88

Summary (Part II)
▶ Reinforcement learning is used to acquire the optimal sequence to
maximize the total reward.
▶ Q-learning can estimate the optimal action-value function Q∗.
▶ Deep neural networks are used for function approximation of Q
function (deep RL).
59 / 88

Part III
Stochastic Geometry — Modeling
and Analysis of WLANs
60 / 88

TX-RX pairs.
1. Individual throughput in a speciﬁc topology — Graph theory Part 1
2. Average throughput over possible topologies — SG
3. Distributions of interference and SIR — SG
61 / 88

0.51
0.31
0.43
0.69
0.09
0.23
0.27
0.27
0.62
0.43
0.65
0.57
0.11
0.60
0.36
0.43
All points Potential TXs Poisson point process (PPP)
Φ
Numbers Backoﬀ counter Uniform distribution
Circles Contention domain
Points with disks Transmitting TXs Mat´ern hardcore point pro-
cess (MHCPP) ΦM
62 / 88

Poisson point process (PPP) Φ
Φ
.
= {x} A set of locations of random points (point process)
Φ(B) A r.v. representing the number of points of Φ in region B,
i.e., Φ(B)
.
= #{Φ ∩ B}
Φ is a set of independent random points with homogeneous density λ
⇐⇒ Φ is a PPP with intensity (density) λ
⇐⇒ Φ(B) ∼ Pois(λ|B|), i.e., Poisson distribution with intensity λ|B|
⇐⇒ P
(
Φ(B) = n
)
=
e−λ|B|(λ|B|)n
n!
, n = 0, 1, . . .
λ = 1, area of square grid is 1, total area is 40
63 / 88

Probabilities derived assuming PPP
▶ Probability that Φ has no point in B
P
(
Φ(B) = 0
)
= e−λ|B|
▶ Probability that Φ has at least one point in B
P
(
Φ(B) ≥ 1
)
= 1 − e−λ|B|
Φ is a PPP with intensity λ
⇐⇒ Φ(B) ∼ Pois(λ|B|)
⇐⇒ P
(
Φ(B) = n
)
=
e−λ|B|(λ|B|)n
n!
, n = 0, 1, . . .
64 / 88

Generating Realizations of PPP
Both codes generates the ﬁgure on p. 68
B
.
= { (x, y) | 0 ≤ x ≤ 10, 0 ≤ y ≤ 4 }, i.e., |B| = 40, and λ = 1
General way in R or other languages
n <- rpois(1, lambda=40)
x <- runif(n, min=0,max=10)
y <- runif(n, min=0,max=4)
plot(x,y)
1. For region B, generating a realization n of Poisson random variable
with expectation λ|B| (= lambda in the code)
2. Generating n uniformly distributed points in region B
Package “spatstat” [Baddeley+2015] in R
P <- rpoispp(lambda=1, win=owin(xrange=c(0,10), yrange=c(0,4)))
plot(P)
For more complicated point processes, this package is useful.
65 / 88

CSMA
0.51
0.31
0.43
0.69
0.09
0.23
0.27
0.27
0.62
0.43
0.65
0.57
0.11
0.60
0.36
0.43
Points Potential TXs Poisson point process Φ
Numbers Backoff counter m(x) Uniform distributed r.v.
〇 Contention domain b(x, r) Disc with radius r
● Contended TXs Φ ∩ b(x, r)
● Simultaneous TXs Matérn hardcore point process ΦM
→ Interference
RXs are omitted in figures. 66 / 88

Contention domain
〇 b(x, r)
▶ Disc with radius r centered at x
▶ Model of contention domain of x ∈ Φ, i.e., area where the received
power is grater than the carrier sense threshold
67 / 88

Contended TXs
● Locations of contended TXs with TX x
Φ ∩ b(x, r)
Number of Contended TXs with TX x
Φ(b(x, r)) ∼ Pois(λ|b(x, r)|) = Pois(λπr2)
68 / 88

Backoff and simultaneous TXs
0.51
0.31
0.43
0.69
0.09
0.23
0.27
0.27
0.62
0.43
0.65
0.57
0.11
0.60
0.36
0.43
m(x)
▶ “Mark”
▶ Uniformly distributed r.v. on [0, 1] for each point x ∈ Φ
▶ Model of backoff counter
● MHCPP ΦM
.
= { x ∈ Φ : m(x) < m(y) for all y ∈ Φ ∩ b(x, r) {x} }
▶ TXs with smallest backoff counter in their contention domains
▶ Model of a set of simultaneous TXs
69 / 88

Expected normalized throughput (medium access probability)
P(x ∈ Φ : x ∈ ΦM) = 1−e−λπr2
λπr2
▶ Probability that a point in PPP retains in MHCPP
▶ Probability that a potential TX has the smallest backoff counter, in
other words, no neighboring TXs have smaller backoff counter, in its
contention domain, and thus can transmit
P(x ∈ Φ : x ∈ ΦM) = P
(
#{ z ∈ Φ ∩ b(x, r) | m(z) < m(x) } = 0
)
Conditioning on m(x) = t and considering ˜Φ
.
= { z ∈ Φ | m(z) < t }
▶ TXs with backoff counter less than t
▶ Intensity of ˜Φ is tλ
▶ ˜Φ(b(x, r)) ∼ Pois(tλ|b(x, r)|) = Pois(tλπr2)
P
(
#{ z ∈ Φ ∩ b(x, r) | m(z) < t } = 0
)
= P
(
˜Φ(b(x, r)) = 0
)
= e−tλπr2
P
(
#{ z ∈ Φ ∩ b(x, r) | m(z) < m(x) } = 0
)
=
∫ 1
0
e−tλπr2
dt =
1 − e−λπr2
λπr2
[Baccelli+2009, 18] derived medium access probability taking into account fading
70 / 88

Intensity of MHCPP
▶ Intensity of MHCPP λM
.
=
Intensity of
potential TXs
λ ·
1 − e−λπr2
λπr2
=
1 − e−λπr2
πr2
▶ In ultra dense networks,
lim
λ→∞
λM =
1
πr2
,
The intensity of simultaneous transmissions only depends on the
area of contention domain.
71 / 88

Analysis of Inversely Proportional Carrier Sense
Threshold and Transmission Power Setting
Koji Yamamoto1 Xuedan Yang1 Takayuki Nishio1
Masahiro Morikura1 Hirantha Abeysekera2
1
2
NTT Access Network Service Systems Laboratories, NTT Corporation
IEEE CCNC 2017, Jan. 2017

Carrier Sense Threshold Setting for Spatial Reuse
▶ 〇 Contention domain:
RxPower > CST
▶ ● Contended TXs
▶ ● Simultaneous TXs
▶ → Interference
Increasing Carrier Sense Threshold (CST)
▶ ● reduces
▶ → increases
In densely deployed WLANs, CST adjustment is crucial
However, CST setting can cause throughput starvation
RXs are omitted in ﬁgures.
73 / 88

-82
0
-1 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
-90
-80
-70
-60
-50
-40
-30
-20
-10
0
10
20
AP1 AP2
Receivedpower(dBm)
Position (m)
p1=13 dBm p2=13 dBm
θ1=-82 dBm θ2=-82 dBm
▶ Received signal power > threshold, θx,
i.e., two APs detect signals from each other.
▶ Two APs time-share the channel.
74 / 88

-82
0
-1 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
-90
-80
-70
-60
-50
-40
-30
-20
-10
0
10
20
AP1 AP2
Receivedpower(dBm)
Position (m)
p1=13 dBm p2=13 dBm
θ1=-82 dBm
θ2=-82 dBm
+34 dB
▶ AP2 increases θ2 not to detect signals from AP1,
then AP2 would transmit independently of AP1
▶ AP1 should defer its transmission during transmission of AP2
→ Starvation due to asymmetric carrier sensing relationship
[Mhatre+2007]
75 / 88

-82
0
-1 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
-90
-80
-70
-60
-50
-40
-30
-20
-10
0
10
20
AP1 AP2
Receivedpower(dBm)
Position (m)
p1=13 dBm
p2=13 dBm
-34 dB
θ1=-82 dBm
θ2=-82 dBm
+34 dB
▶ Reduction in transmission power along with increase in CST
[Fuemmeler+2006; Mhatre+2007]
Inversely proportional setting (IPS) of CST and transmission power
▶ Symmetric carrier sensing relationship is maintained and APs
spatially reuse the channel
76 / 88

Numerical example: Expected throughput
▶ Carrier sense range is a function of CST and transmission power
0
1
2
3
0 2 4 6 8 10 12 14 16
n=4
n=10
Throughput(bit/s/Hz)
ax= θx /Θ (dB)
Throughput
Asymptotic throughput
Approximation
SIR = 30 dB
α = 3.5
▶ The optimal value of CST ax assuming approximation is close to
that without approximation
▶ [Iwata+2019] extend the work under nearest AP association
M. Iwata, K. Yamamoto, B. Yin, T. Nishio, M. Morikura, and H. Abeysekera, “Analysis of inversely
proportional carrier sense threshold and transmission power setting based on received power for IEEE
802.11ax,” in Proc. IEEE Consum. Commun. Netw. Conf. (CCNC), Las Vegas, NV, USA, Jan. 2019
77 / 88

TX-RX pairs.
1. Individual throughput in a speciﬁc topology — Graph theory Part 1
2. Average throughput over possible topologies — SG
3. Distributions of interference and SIR — SG
78 / 88

SG Analysis of CSMA
▶ [Nguyen+2007; Baccelli+2009] MHCPP type II for CSMA modeling
▶ [Alfano+2011; Alfano+2014] Distribution of throughput
▶ [Busson+2009] Intensity underestimation of MHCPP
▶ [Kaynia+2011] Optimization of carrier sensing threshold
▶ [ElSawy+2012; ElSawy+2013] Modiﬁed hard-core point process
▶ [Venkataraman+2006] Interference distribution from PPP outside
the contention domain
Approximating MHCPP by PPP with intensity λMHCPP
[ElSawy+2012]
79 / 88

SIR distribution in Poisson cellular networks [Andrews+2011]
▶ Positions of base stations (BSs) follows a
Poisson point process (PPP)
▶ Nearest BS association (Poisson-Voronoi cells)
▶ Rayleigh fading
▶ Exponential-decaying path loss with α > 2
Spatial distribution of SIR in downlink can be derived w/o fading
P(SIR > θ) = · · ·
=
1
1 + 2θ
α−2 2F1
(
1, 1 − 2
α ; 2 − 2
α ; −θ
)
=
√
θ arctan
√
θ (α=4)
Closed form when α = 4
0
1
−20 −10 0 10 20 30
P(SIR≤θ)
θ (dB)
−20
−10
0
10
20
30
with fading
80 / 88

Expected sum — Campbell’s theorem for sums [Haenggi2012, §4.2]
Let Φ be a PPP on R2 with intensity λ, and f : R2 → R be a
measurable function. Then,
EΦ
[
∑
x∈Φ
ex. interference from x to origin
↓
f(x)
ex. sum of interference from Φ
]
= λ
∫
R2
f(x) dx
▶ Expected sum can be evaluated as an area integral
▶ Only applicable for PPPs. Thus, PPPs are often assumed
81 / 88

Stochastic Geometry Analysis of Normalized SNR
Scheduling in Downlink Cellular Networks
Takuya Ohto† Koji Yamamoto† Seong-Lyun Kim‡
Takayuki Nishio† Masahiro Morikura†
†Kyoto University ‡Yonsei University
IEEE Wireless Communications Letters, Aug. 2017

1) Round-robin scheduler
A user is selected in rotation independent of the channel state.
2) Proportional fair (PF) scheduler
A user with the largest instantaneous
rate normalized by the short-term aver-
age value,
arg max
i
ratei
E[ratei | ri]
0
0.2
0.4
0.6
0.8
1
1.2
1.4
1.6
1.8
2
Normalizeddatarate
3) Normalized SNR scheduler
A user with the largest instantaneous
SNR normalized by the short-term aver-
age value,
arg max
i
SNRi
E[SNRi | ri]
= arg max
i
hi
0
1
2
3
4
5
6
NormalizedSNR
SNRi
.
= hiri
−α
/σ2
Short-term average SNR: E[SNRi | ri] = ri
−α
/σ2
83 / 88

SIR distributions (success prob. / coverage prob.)
[Andrews+2011]
SIR ccdf under Rayleigh fading channel in Poisson cellular networks
P(SIR > θ) =
1
1 + 2θ
α−2 2F1
(
1, 1 − 2
α
; 2 − 2
α
; −θ
)
√
θ arctan
√
θ (when α = 4)
α: path loss exponent
User is selected randomly independent of channel state.
This performance can be achieved under round-robin scheduling.
Normalized SNR scheduling [Ohto+2017]
P(SIR > θ) ≈
∞∑
n=0
(λu/cλb)n
c B(n + 1, c) (λu/cλb + 1)n+c+1
×
n+1∑
k=1
(n+1
k
)
(−1)k+1
1 + 2kθ
α−2 2F1
(
1, 1 − 2
α
; 2 − 2
α
; −kθ
)
√
kθ arctan
√
kθ (when α = 4)
λb: intensity of BSs, λu: intensity of users, c = 3.5
84 / 88

SIR ccdf (Interference-limited, Rayleigh, and α = 4) [Ohto+2017]
0.0
0.2
0.4
0.6
0.8
1.0
−10 −5 0 5 10 15 20
λu/λb = 4 λu/λb = 8P(SIR>θ)
θ (dB)
Normalized SNR (Analysis)
(Simulation)
Round-robin (Analysis)
Round-robin [Andrews+2011] P(SIR > θ) =
1
1 +
√
θ arctan
√
θ
Normalized SNR P(SIR⋆
> θ) ≈
∞∑
n=0
fN (n)
n+1∑
k=1
(n+1
k
)
(−1)k+1
1 +
√
kθ arctan
√
kθ
85 / 88

Related works
Stochastic geometry analysis of channel-adaptive user scheduling in
cellular networks
Downlink
▶ [Ohto+2017] T. Ohto, K. Yamamoto, S.-L. Kim, T. Nishio, and M. Morikura,
“Stochastic geometry analysis of normalized SNR-based scheduling in downlink
cellular networks,” IEEE Wireless Commun. Lett., vol. 6, no. 4, Aug. 2017
▶ [Yamamoto2018] K. Yamamoto, “SIR distribution and scheduling gain of
normalized SNR scheduling in Poisson networks,” in Proc. Int. Symp. Model.
Optim. Mobile Ad Hoc Wireless Netw. (WiOpt), Shanghai, China, May 2018
Uplink
▶ [Kamiya+2018] S. Kamiya, K. Yamamoto, S.-L. Kim, T. Nishio, and
M. Morikura, “Asymptotic analysis of normalized SNR-based scheduling in
uplink cellular networks with truncated channel inversion power control,” in
Proc. IEEE Int. Conf. Commun. (ICC), Kansas, MO, USA, May 2018
86 / 88

References
▶ M. Haenggi, Stochastic Geometry for Wireless Networks.
Cambridge, U.K.: Cambridge Univ. Press, 2012
▶ F. Baccelli and B. Blaszczyszyn, “Stochastic geometry and wireless
networks: Volume II applications,” Found. Trends Netw., vol. 4,
no. 1-2, 2009
87 / 88

Summary (Part III)
Stochastic geometry modeling
▶ Poisson point process — Potential TXs
▶ Mat´ern CSMA point process — Simultaneous TXs
▶ Distribution of interference and SIR
Applications
▶ Joint carrier sense threshold and transmission power setting
▶ User scheduling
88 / 88

References I
[Alfano+2011] G. Alfano, M. Garetto, and E. Leonardi, “New insights
into the stochastic geometry analysis of dense CSMA
networks,” in Proc. IEEE Int. Conf. Comput. Commun.
(INFOCOM), Shanghai, China, Apr. 2011.
[Alfano+2014] ——,“New directions into the stochastic geometry
analysis of dense CSMA networks,” IEEE Trans. Mobile
Comput., vol. 13, no. 2, Feb. 2014.
[Andrews+2011] J. G. Andrews, F. Baccelli, and R. K. Ganti, “A tractable
approach to coverage and rate in cellular networks,”
IEEE Trans. Commun., vol. 59, no. 11, Nov. 2011.
[Arulkumaran+2017] K. Arulkumaran, M. P. Deisenroth, M. Brundage, and
A. A. Bharath, “Deep reinforcement learning: A brief
survey,” IEEE Signal Process. Mag., vol. 34, no. 6, Nov.
2017.
[Baccelli+2009] F. Baccelli and B. Blaszczyszyn, “Stochastic geometry
and wireless networks: Volume II applications,” Found.
Trends Netw., vol. 4, no. 1-2, 2009.
89 / 88

References II
[Baddeley+2015] A. Baddeley, E. Rubak, and R. Turner, Spatial Point
Patterns: Methodology and Applications with R. CRC
Pr., 2015.
[Bianchi2000] G. Bianchi, “Performance analysis of the IEEE 802.11
distributed coordination function,” IEEE J. Sel. Areas
Commun., vol. 18, no. 3, Mar. 2000.
[Busson+2009] A. Busson and G. Chelius, “Point processes for
interference modeling in CSMA/CA ad-hoc networks,” in
Proc. the 6th ACM Symp. Perform. Eval. Wireless Ad
Hoc, Sensor, Ubiquitous Netw. (PE-WASUN), Canary
Islands, Spain, Oct. 2009.
[ElSawy+2012] H. ElSawy, E. Hossain, and S. Camorlinga,
“Characterizing random CSMA wireless networks: A
stochastic geometry approach,” in Proc. IEEE Int. Conf.
Commun. (ICC), Ottawa, ON, Jun. 2012.
90 / 88

References III
[ElSawy+2013] H. ElSawy and E. Hossain, “A modiﬁed hard core point
process for analysis of random CSMA wireless networks
in general fading environments,” IEEE Trans. Commun.,
vol. 61, no. 4, Apr. 2013.
[Fuemmeler+2006] J. A. Fuemmeler, N. H. Vaidya, and V. V. Veeravalli,
“Selecting transmit powers and carrier sense thresholds
in csma protocols for wireless ad hoc networks,” in
Proceedings of the 2nd annual international workshop on
Wireless internet - WICON ’06, New York, New York,
USA, Aug. 2006.
[Garetto+2008] M. Garetto, T. Salonidis, and E. W. Knightly,
“Modeling per-ﬂow throughput and capturing starvation
in CSMA multi-hop wireless networks,” IEEE/ACM
Trans. Netw., vol. 16, no. 4, Aug. 2008.
[Haenggi2012] M. Haenggi, Stochastic Geometry for Wireless Networks.
Cambridge, U.K.: Cambridge Univ. Press, 2012.
91 / 88

References IV
[Iwata+2019] M. Iwata, K. Yamamoto, B. Yin, T. Nishio,
M. Morikura, and H. Abeysekera, “Analysis of inversely
proportional carrier sense threshold and transmission
power setting based on received power for IEEE
802.11ax,” in Proc. IEEE Consum. Commun. Netw.
Conf. (CCNC), Las Vegas, NV, USA, Jan. 2019.
[Kamiya+2018] S. Kamiya, K. Yamamoto, S.-L. Kim, T. Nishio, and
M. Morikura, “Asymptotic analysis of normalized
SNR-based scheduling in uplink cellular networks with
truncated channel inversion power control,” in Proc.
IEEE Int. Conf. Commun. (ICC), Kansas, MO, USA,
May 2018.
[Kaynia+2011] M. Kaynia, N. Jindal, and G. E. Oien, “Improving the
performance of wireless ad hoc networks through MAC
layer design,” IEEE Trans. Wireless Commun., vol. 10,
no. 1, Jan. 2011.
[Khorov+2019] E. Khorov, A. Kiryanov, A. Lyakhov, and G. Bianchi, “A
tutorial on IEEE 802.11ax high eﬃciency WLANs,” IEEE
Commun. Surveys Tuts., vol. 21, no. 1, 2019.
92 / 88

References V
[Koda+2018] Y. Koda, K. Yamamoto, T. Nishio, and M. Morikura,
“Reinforcement learning based predictive handover for
pedestrian-aware mmWave networks,” in Proc. IEEE Int.
Conf. Comput. Commun. Workshops (INFOCOM
Workshops), Honolulu, HI, USA, Apr. 2018.
[Koda+2019] Y. Koda, K. Nakashima, K. Yamamoto, T. Nishio, and
M. Morikura, “End-to-end learning of proactive handover
policy for camera-assisted mmWave networks using deep
reinforcement learning,” arXiv preprint arXiv:1904.04585,
Apr. 2019.
[Liew+2010] S. C. Liew, C. H. Kai, H. C. Leung, and P. Wong,
“Back-of-the-envelope computation of throughput
distributions in CSMA wireless networks,” IEEE Trans.
Mobile Comput., vol. 9, no. 9, Sep. 2010.
[Mhatre+2007] V. P. Mhatre, K. Papagiannaki, and F. Baccelli,
“Interference mitigation through power control in high
density 802.11 WLANs,” in Proc. IEEE Int. Conf.
Comput. Commun. (INFOCOM), Anchorage, AK, May
2007.
93 / 88

References VI
[Mnih+2015] V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu,
J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller,
A. K. Fidjeland, G. Ostrovski, S. Petersen, C. Beattie,
A. Sadik, I. Antonoglou, H. King, D. Kumaran,
D. Wierstra, S. Legg, and D. Hassabis, “Human-level
control through deep reinforcement learning,” Nature,
vol. 518, no. 7540, Feb. 2015.
[Nakashima+2019] K. Nakashima, S. Kamiya, K. Ohtsu, K. Yamamoto,
T. Nishio, and Masahiro Morikura, “Deep reinforcement
learning-based channel allocation for wireless LANs with
graph convolutional networks,” arXiv preprint
arXiv:1905.07144, May 2019.
[Nguyen+2007] H. Q. Nguyen, F. Baccelli, and D. Kofman, “A
stochastic geometry analysis of dense IEEE 802.11
networks,” in Proc. IEEE Int. Conf. Comput. Commun.
(INFOCOM), Anchorage, AK, May 2007.
94 / 88

References VII
[Ohto+2017] T. Ohto, K. Yamamoto, S.-L. Kim, T. Nishio, and
M. Morikura, “Stochastic geometry analysis of
normalized SNR-based scheduling in downlink cellular
networks,” IEEE Wireless Commun. Lett., vol. 6, no. 4,
Aug. 2017.
[Schaul+2015] T. Schaul, J. Quan, I. Antonoglou, and D. Silver,
“Prioritized experience replay,” in ICLR, Sna Juan,
Puerto Rico, May 2015.
[Sutton+2018] R. S. Sutton and A. G. Barto, Reinforcement Learning,
2nd ed. MIT Pr., 2018.
[Szepesv´ari2010] C. Szepesv´ari, Algorithms for Reinforcement Learning.
Morgan and Claypool Pub., 2010.
[Venkataraman+2006] J. Venkataraman, M. Haenggi, and O. Collins, “Shot
noise models for outage and throughput analyses in
wireless ad hoc networks,” in Proc. IEEE Military
Commun. Conf. (MILCOM), Oct. 2006.
95 / 88

References VIII
[vHasselt+2016] H. van Hasselt, A. Guez, and D. Silver, “Deep
reinforcement learning with double q-learning,” in 30th
AAAI Conf. Artiﬁcial Intelligence, Mar. 2016.
[Wang+2015] Z. Wang, T. Schaul, M. Hessel, H. van Hasselt,
M. Lanctot, and N. de Freitas, “Dueling network
architectures for deep reinforcement learning,” arXiv
preprint arXiv:1511.06581, Nov. 2015.
[Watkins+1992] C. J. C. H. Watkins and P. Dayan, “Q-learning,” Mach.
Learn., vol. 8, no. 3-4, May 1992.
[Yamamoto2015] K. Yamamoto, “A comprehensive survey of potential
game approaches to wireless networks,” IEICE Trans.
Commun., vol. E98-B, no. 9, Sep. 2015.
[Yamamoto2018] ——,“SIR distribution and scheduling gain of normalized
SNR scheduling in Poisson networks,” in Proc. Int.
Symp. Model. Optim. Mobile Ad Hoc Wireless Netw.
(WiOpt), Shanghai, China, May 2018.
96 / 88

References IX
[Yin+2017] B. Yin, S. Kamiya, K. Yamamoto, T. Nishio,
M. Morikura, and H. Abeysekera, “Mitigating
throughput starvation in dense WLANs through potential
game-based channel selection,” IEICE Trans.
Fundamentals, vol. E100-A, no. 11, Nov. 2017.
Some ﬁgures in this slide are licensed to speakers only.
97 / 88

Machine Learning and Stochastic Geometry: Statistical Frameworks Against Uncertainty in Wireless LANs --- Part A

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Similar to Machine Learning and Stochastic Geometry: Statistical Frameworks Against Uncertainty in Wireless LANs --- Part A

Similar to Machine Learning and Stochastic Geometry: Statistical Frameworks Against Uncertainty in Wireless LANs --- Part A (20)

Recently uploaded

Recently uploaded (20)

Machine Learning and Stochastic Geometry: Statistical Frameworks Against Uncertainty in Wireless LANs --- Part A