Computer Networks 183 (2020) 107602
Available online 17 October 2020
1389-1286/© 2020 Elsevier B.V. All rights reserved.
Joint access and backhaul resource allocation for D2D-assisted dense
mmWave cellular networks
Xiangwen Dai, Jinsong Gui *
School of Computer Science and Engineering, Central South University, South Road of Lushan, Changsha, Hunan Province, 410083, PR China
ARTICLE INFO
Keywords:
Dense mmWave cellular network
Resource allocation
D2D communication
Potential game
ABSTRACT
Device-to-Device (D2D) communication is a promising way to cope with the vulnerability of
millimeter-wave (mmWave) signals to blockage, which rapidly weakens them. However, once
in-band D2D communication is introduced into dense mmWave cellular networks, resources must be
shared and interference must be controlled among more types of communication links (e.g., D2D,
access, and backhaul links), which makes joint access and backhaul resource allocation more
challenging. To address this problem, we first decouple it into two sub-problems: a joint access
and backhaul resource allocation sub-problem, and a joint D2D access and forwarding link
resource allocation sub-problem. We then formulate the two sub-problems as two non-cooperative
games, and prove that both are exact potential games that admit a feasible pure strategy Nash
equilibrium under some mild conditions. Finally, based on the formulated game models, we
propose a centralized algorithm and a decentralized algorithm to obtain the resource allocation
results. Extensive simulation results show that the proposed algorithms can effectively mitigate
the impact of blockage on network performance.
1. Introduction
Mobile data traffic grows exponentially in the fifth-generation (5G) cellular network. To
handle this growth, researchers have proposed many solutions to increase network capacity.
These include introducing massive multi-input multi-output (MIMO) and beamforming
technologies into wireless networks, using millimeter wave (mmWave) bands, and densely
deploying base stations to achieve throughput in the range of gigabits per second [1]. However,
the large path loss, the susceptibility to blockage, and other features of mmWave bands make
these solutions difficult to implement. To tackle these problems, mmWave small base stations
should be densely deployed to increase the line-of-sight (LOS) probability [2,3]. However, as
network density increases, wired fiber backhaul becomes infeasible: dense mmWave cellular
networks need many backhaul links, and providing them over wired fiber would significantly
increase the network construction cost and the operation and maintenance overhead. Therefore,
wireless self-backhaul was proposed as a promising scheme for dense small cell networks, where
the common mmWave radio spectrum is shared by access and backhaul transmission [4,5]. The
requirement for wired backhaul can thus be reduced or even avoided.
In dense mmWave cellular networks with wireless backhaul, access links and backhaul links
come in pairs. Hence, if they all use the same mmWave bands, they compete for resources. How
to allocate resources so that the network sum rate or the network energy efficiency is as high
as possible has long been a focus of research, and many works address resource allocation in
dense mmWave networks [6–9]. In [6], the authors focused on the backhaul spectrum allocation
problem in hybrid mmWave and sub-6 GHz bands, and the authors of [7] optimized the scheduling
and power control schemes to maximize the energy efficiency of mmWave wireless backhaul.
However, they did not take joint access and backhaul resource allocation into account.
Although the authors of [8] considered joint access and backhaul resource allocation in dense
mmWave networks, the computational complexity of their solution is too high for practical use.
Therefore, in [9], the authors proposed a low-complexity algorithm for joint access and
backhaul resource allocation in self-backhauling dense mmWave cellular networks. However, the
authors of [9] did not consider that mmWave signal propagation is susceptible to blockage. In
particular, at the access end, due to the random movement of user equipments (UEs), both
non-line-of-sight (NLOS) and LOS transmission links coexist in actual networks, and the network
energy efficiency is greatly reduced if NLOS transmission links are used. However, an NLOS
link can be replaced by a path formed by multiple LOS link segments with the assistance of
several idle UEs. The communication mode between the blocked UE and the idle UE is the
device-to-device (D2D) communication mode. Owing to its inherent physical proximity and
spectrum reuse gain, D2D communication can alleviate spectrum deficiency and offload cellular
traffic, which makes it one of the key technologies in the 5G cellular network [10,11].
* Corresponding author.
E-mail addresses: 184612321@csu.edu.cn (X. Dai), jsgui2010@csu.edu.cn (J. Gui).
https://doi.org/10.1016/j.comnet.2020.107602
Received 20 May 2020; Received in revised form 9 October 2020; Accepted 10 October 2020
Although applying D2D communication technology to wireless backhaul dense mmWave cellular
networks is promising, it also poses some new challenges. Firstly, in order to combine D2D
communication technology with joint access and backhaul resource allocation schemes, it is
necessary to choose appropriate D2D relaying UEs, so the D2D relay selection strategy will
affect the final result to a certain extent. Secondly, due to the spectrum uncontrollability of
out-band D2D communication [12], in-band D2D communication is more suitable for dense mmWave
cellular networks. We adopt underlay in-band D2D communication since it is better than overlay
in-band D2D communication in terms of spectrum efficiency. However, introducing D2D
communications to dense mmWave cellular networks raises a new resource allocation problem. For
example, on the access side of a mmWave small base station, D2D communication replaces a
blocked access link with an access path consisting of multiple links, which makes the resource
allocation problem more complex because different patterns of resource allocation have a
significant impact on network energy efficiency.
In order to address the above challenges, in this paper, we investigate the resource
allocation optimization problem in D2D-assisted dense mmWave cellular networks with wireless
backhaul. Firstly, we design a D2D relay selection strategy to handle the NLOS transmission
problem. Then, we propose a resource allocation optimization problem that covers two situations
at an access end (i.e., with D2D communications and without D2D communications). Moreover, in
order to solve the resource allocation problem with lower computational complexity, we use a
game theory method to formulate the problem as a non-cooperative game and then solve it. There
are other good optimization techniques for the resource allocation problem, for example,
heuristic methods (e.g., reinforcement learning, neural network algorithms, genetic algorithms,
particle swarm optimization, ant colony algorithms, simulated annealing) and convex
optimization methods. However, these methods have some limitations when applied to the resource
allocation optimization problem in this paper, which we will explain later. In addition, the
solutions in [9] that are most relevant to our work use game theory. To facilitate the
comparison with them, we also adopt a game-theoretic approach.
To the best of our knowledge, the joint access and backhaul resource allocation problem for
D2D-assisted dense mmWave cellular networks has not been studied in any previous work. The main
contributions of this paper are summarized as follows:
1) We introduce in-band D2D communication into the dense mmWave cellular network and formulate
the resource allocation problem as two sub-problems, P1 and P2. Sub-problem P1 aims at
maximizing the intermediate energy efficiency (this parameter is explained in Section 3.2.1)
subject to a backhaul throughput constraint. Sub-problem P2 aims at maximizing the average
energy efficiency of access paths subject to a D2D throughput constraint.
2) In order to solve the two sub-problems with lower computational complexity, we formulate
them as non-cooperative games 𝒢1 and 𝒢2, respectively. Games 𝒢1 and 𝒢2 are exact potential
games, which have a common utility for all the players and at least one pure strategy Nash
equilibrium (NE). We also prove that both 𝒢1 and 𝒢2 have at least one feasible pure strategy NE
under some mild conditions.
3) We propose a centralized algorithm and a decentralized algorithm based on these two games.
The proposed centralized algorithm converges to a feasible pure strategy NE from any feasible
initial strategy profile in a finite number of steps, while the proposed decentralized
algorithm achieves the globally optimal solution with an arbitrarily high probability. The
analysis of the algorithms shows that their convergence speed is of the same order of magnitude
as that of the existing algorithms most relevant to ours, while our algorithms outperform them
in terms of network energy efficiency and sum data rate.
4) We compare the proposed algorithms with existing similar algorithms under different system
configurations. The simulation results verify that the proposed algorithms can effectively
mitigate the impact of blockage on network performance and improve the throughput of links in
NLOS state.
The rest of this paper is organized as follows. Section 2 gives an overview of the works on
the resource allocation problem. In Section 3, we describe the system model, including the
network architecture and the problem formulation. In Section 4, we present the game-based
resource allocation algorithms. The algorithms are evaluated in Section 5, and the conclusions
are given in Section 6. In addition, because this paper uses a large number of notations, the
main notations are explained in Table 1 for the convenience of readers.
2. Related work
Network resource allocation has long been an active research topic. Many researchers have
studied resource allocation in different networks (e.g., Internet of Things [13], Internet of
vehicles [14], satellite-drone networks [15,16], mmWave cellular networks [17,18]) with the
aim of improving network performance (e.g., spectrum efficiency, network capacity, energy
efficiency). The authors of [16] studied a joint user association and resource allocation
problem in an integrated satellite-drone network, where the problem is modeled by using a
competitive market setting. This formulated market consists of goods, buyers, and sellers, in
which the buyers and sellers seek to reach a Walrasian equilibrium solved by a heavy ball based
iterative algorithm.
The authors of [17] proposed a framework assisted by a reconfigurable intelligent reflector
(IR) to optimize the downlink multi-user communications in mmWave cellular networks. They
proposed a distributional reinforcement learning (DRL) approach to learn the optimal IR
reflection and maximize the expectation of downlink capacity. Furthermore, they also developed
a learning algorithm based on quantile regression (QR), and proved that the proposed QR-DRL
method converges to a stable distribution of downlink transmission rate. The authors of [18]
proposed a D2D-assisted cooperative edge caching policy in mmWave dense networks, which aims at
reducing the content retrieval delay and relieving the huge burden on the backhaul links by
cooperatively utilizing the cache resources of users and SBSs in proximity.
Besides the above theoretical tools, game theory is used in [19] to analyze the selection of
access networks by end users and the allocation of connection resources by network operators.
The authors of [20] proposed two joint transceiver placement and resource allocation schemes to
improve reliability, optimize average transmission power, and reduce the average bit error rate
(BER) and the number of disrupted links of redirected cooperation in hybrid free-space optics
(FSO) and mmWave fronthaul networks.
X. Dai and J. Gui
Among the many studies on network resource allocation, some address joint access and backhaul
resource allocation. The authors of [21] used a two-stage Stackelberg game to investigate the
influence of the uplink access spectrum allocation on the backhaul energy allocation. In [22],
in order to maximize the minimum user rate in two-tier heterogeneous networks (HetNets), the
authors proposed a low-complexity matching theory and successive convex approximation
(SCA)-based geometric programming approximation framework for full-duplex (FD) joint
backhaul-access power and sub-channel allocation. A joint access and backhaul transmission
optimization was proposed in a time-division duplexing (TDD)-based system to maximize the
network weight sum-rate [23].
Recently, more and more studies are related to mmWave networks. For instance, the authors of
[24] proposed a cross-layer optimization problem to investigate the joint optimization of
resource allocation in a cellular network that combines mmWave and traditional bands.
Furthermore, there are also some studies related to joint access and backhaul resource
allocation in mmWave networks. In [25], the authors investigated a predictive joint backhaul
and access scheduling and routing mechanism for dense mmWave networks to optimize the
performance of enhanced mobile broadband (eMBB) services while maintaining a stable performance
allowing an optimal quality of experience (QoE). The authors of [26] studied the joint access
and backhaul resource optimization problem in ultra-dense networks with wireless backhaul.
Apart from works like [24–26] that only consider the optimization of mmWave resource
allocation, there are some works like [27–30] that consider resource optimization for
D2D-assisted mmWave networks. The authors of [27] proposed a timeslot and transmitting angle
joint resource allocation scheme for D2D-assisted mmWave networks with heterogeneous antenna
arrays to improve the system throughput. In [28], a band selection and channel allocation
problem was formulated for D2D-assisted small cell networks with heterogeneous spectrum to
maximize the system utility. The authors of [29] investigated the resource allocation for
underlay D2D in the outdoor mmWave scenario, emphasizing fair distribution of resources in a
cell while maximizing spectral efficiency. The authors of [30] studied the resource allocation
on the user provided network (UPN) formed by D2D links under a 5G integrated access and
backhaul network. In order to ensure that each user is willing to serve other users, they
further proposed a joint incentive and resource allocation scheme.
Although there have been studies on resource allocation in D2D-assisted mmWave networks, none
of these works involves joint access and backhaul resource allocation. Although works like [9]
take joint access and backhaul resource allocation into account, they do not consider the low
network efficiency caused by obstacles on mmWave access links. Therefore, we introduce D2D
communication to mmWave networks to avoid obstacles. Moreover, we investigate the joint access
and backhaul resource allocation problem for D2D-assisted dense mmWave networks with wireless
backhaul to improve the network energy efficiency as much as possible.
3. System model
3.1. Network architecture
In this paper, we consider the uplink communication scenario in an integrated mmWave/sub-6 GHz
cellular network, which is shown in Fig. 1. There is a single macro base station (MBS), $M$
small base stations (SBSs) and $N$ UEs. We denote $\mathcal{M} = \{1,2,\ldots,M\}$ as the set of
$M$ SBSs and $\mathcal{N} = \{1,2,\ldots,N\}$ as the set of $N$ UEs. The MBS is located in the
center of the network, while the $M$ SBSs are evenly distributed within the cell coverage of the
MBS and the $N$ UEs are randomly located in the cell coverage of the $M$ SBSs. In addition, the
number of UEs to which beams have been assigned for transmitting data in a scheduling period is
denoted as $N^{tra}$, and the corresponding set is denoted as
$\mathcal{N}^{tra} = \{1,2,\ldots,N^{tra}\}$, where $\mathcal{N}^{tra} \subset \mathcal{N}$.
An SBS can communicate with the MBS via a backhaul link, while a UE can communicate with an SBS
via a single-hop connection or a multi-hop connection. To distinguish between single-hop and
multi-hop connections at the access end, we define an access link as a single-hop connection
between a UE and an SBS. Multiple links can form an access path from a UE to an SBS via another
UE, where we only consider one relaying UE in each path because each additional hop increases
the path delay and energy consumption. Such an access path consists of two types of links
(i.e., a D2D link and a forwarding link, as shown in Fig. 1).
Table 1
Notations used in our work.
Notation                  Description
$M$                       The number of SBSs
$\mathcal{M}$             The set of $M$ SBSs
$N$                       The number of UEs
$\mathcal{N}$             The set of $N$ UEs
$N^{tra}$                 The number of the UEs to which beams have been assigned for transmitting data in a scheduling period
$\mathcal{N}^{tra}$       The set of $N^{tra}$ UEs to which beams have been assigned for transmitting data in a scheduling period
$\alpha^T_{n,r}$          Angle between the line from UE $n$ to UE $r$ and the center line of the transmitting beam of UE $n$
$\alpha^R_{n,r}$          Angle between the line from UE $n$ to UE $r$ and the center line of the receiving beam of UE $r$
$\varphi^T_{n,r}$         Operation beam width of UE $n$ to UE $r$
$\varphi^R_{n,r}$         Operation beam width of UE $r$ to UE $n$
$\xi$                     Gain in the side lobe
$G^T_{n,r}$               Transmission gain of the beam of UE $n$ directed to UE $r$
$G^R_{r,n}$               Reception gain of the beam of UE $r$ directed to UE $n$
$\tau_{n,r}$              Propagation delay of the D2D link from UE $n$ to UE $r$
$\chi^C_{n,r}$            Amplitude of the D2D link from UE $n$ to UE $r$
$G^C_{n,r}$               Channel gain of the D2D link from UE $n$ to UE $r$
$d_{n,r}$                 Distance from UE $n$ to UE $r$
$c$                       Speed of light
$\lambda$                 Wavelength
$R$                       The number of relaying UEs
$\mathcal{R}$             The set of $R$ relaying UEs
$p^u_n$                   Transmission power of UE $n$ in an access link or D2D link
$C^u$                     The cardinality of the transmission power set $\wp^u$
$t^{ra}_m$                Access transmission duration of SBS $m$
$t^{D2D}_r$               D2D transmission duration in a time slot
$\bar{t}^{ra}_1$          Minimum access transmission duration in an access transmission duration sequence
$\bar{t}^{ra}_M$          Maximum access transmission duration in an access transmission duration sequence
$\bar{t}^{D2D}_1$         Minimum D2D access duration in a D2D access duration sequence
$\bar{t}^{D2D}_{R^m}$     Maximum D2D access duration in a D2D access duration sequence
$\mathcal{M}_i$           The set of the SBSs with indices less than $i$
$\mathcal{N}_{m'}$        The UE set associated with SBS $m'$
$\mathrm{sgn}(\cdot)$     Signum function
$\mathcal{R}^m$           The set of relaying UEs of the $m$-th SBS coverage
$R^m$                     The number of members in $\mathcal{R}^m$
Each UE needs to connect to an SBS before transmitting data. However, even if SBSs have been
deployed densely in mmWave networks, there may be NLOS links due to the random movement of
every UE. Because mmWave signals diffract weakly, transmission over an NLOS link reduces
network energy efficiency significantly [31]. To solve this problem, we consider D2D
communication. By selecting a D2D relay for any UE whose transmission link is in NLOS state, we
can use an access path with two LOS links instead of the original access link in NLOS state.
The D2D relay selection strategy is described as follows.
1) Each UE broadcasts a D2D relay request message at its maximum transmission power if it
finds itself in NLOS state.
2) Any UE that receives the D2D relay request message (the message is considered successfully
received if its received signal strength is not less than the reception threshold) replies with
a D2D relay response message at its maximum transmission power if it meets the following
conditions: a) it is idle (i.e., it neither has data to deliver nor acts as a relay) and b) its
transmission link with an SBS is in LOS state.
3) Each requesting UE may receive multiple D2D relay response messages, in which case it
selects the response with the strongest received signal and then sends a confirmation message
to the selected D2D relay.
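The three steps above can be sketched in a few lines of Python. This is only an illustrative sketch: the `UE` record, the function name `select_relay`, the two signal-strength dictionaries, and the dBm unit are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass
class UE:
    ue_id: int
    is_idle: bool      # neither has data to deliver nor already acts as a relay
    los_to_sbs: bool   # its link to an SBS is in LOS state


def select_relay(candidates: Iterable[UE],
                 request_rss_dbm: Dict[int, float],
                 response_rss_dbm: Dict[int, float],
                 rx_threshold_dbm: float) -> Optional[int]:
    """Return the id of the relay chosen by a blocked UE, or None.

    request_rss_dbm[k]:  strength at which candidate k receives the request.
    response_rss_dbm[k]: strength at which the blocked UE receives k's response.
    """
    best_id, best_rss = None, float("-inf")
    for ue in candidates:
        # Step 2: the request must be received at or above the threshold.
        if request_rss_dbm[ue.ue_id] < rx_threshold_dbm:
            continue
        # Step 2 a) and b): only idle UEs with a LOS link to an SBS respond.
        if not (ue.is_idle and ue.los_to_sbs):
            continue
        # Step 3: keep the response with the strongest received signal.
        rss = response_rss_dbm[ue.ue_id]
        if rss > best_rss:
            best_id, best_rss = ue.ue_id, rss
    return best_id
```

The confirmation message of step 3 is not modeled here; the function only returns which candidate would receive it.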
Besides, we adopt the C-plane/U-plane split network architecture [32], where data transmission
takes place in the U-plane via mmWave bands and the transmission of other signals is performed
in the C-plane via sub-6 GHz bands. Meanwhile, since the access links (or paths) and backhaul
links in the mmWave network share the mmWave bands, the time division multiple access mechanism
and the beamforming technology are adopted.
To facilitate the interference analysis and the calculation of other parameters, we give the
corresponding D2D link gain equations as follows by referring to the similar formulas in [9].
For a D2D link from UE $n$ to UE $r$, we denote by $\alpha^T_{n,r}$ and $\alpha^R_{n,r}$ the
angle between the line from UE $n$ to UE $r$ and the center line of the transmitting beam of UE
$n$, and the angle between the line from UE $n$ to UE $r$ and the center line of the receiving
beam of UE $r$, respectively. We denote by $\varphi^T_{n,r}$ and $\varphi^R_{n,r}$ the operation
beam width of UE $n$ to UE $r$ and the operation beam width of UE $r$ to UE $n$, respectively.
For this D2D link, the transmission and reception gains are given by
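$$
G^T_{n,r}\left(\alpha^T_{n,r}, \varphi^T_{n,r}\right) =
\begin{cases}
\dfrac{2\pi - \left(2\pi - \varphi^T_{n,r}\right)\xi}{\varphi^T_{n,r}}, & \text{if } \left|\alpha^T_{n,r}\right| \le \dfrac{\varphi^T_{n,r}}{2} \\
\xi, & \text{otherwise}
\end{cases}
\tag{1}
$$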
and
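$$
G^R_{n,r}\left(\alpha^R_{n,r}, \varphi^R_{n,r}\right) =
\begin{cases}
\dfrac{2\pi - \left(2\pi - \varphi^R_{n,r}\right)\xi}{\varphi^R_{n,r}}, & \text{if } \left|\alpha^R_{n,r}\right| \le \dfrac{\varphi^R_{n,r}}{2} \\
\xi, & \text{otherwise}
\end{cases}
\tag{2}
$$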
where $0 \le \xi < 1$ denotes the gain in the side lobe, with $\xi \ll 1$ for narrow beams.
Similarly, we can calculate the transmission and reception gains between the beam of UE $n$
directed to SBS $m$ and the beam of SBS $m$ directed to UE $n$, denoted by $G^T_{n,m}$ and
$G^R_{m,n}$, and the transmission and reception gains between the beam of SBS $m$ directed to
the MBS and the beam of the MBS directed to SBS $m$, denoted by $G^T_m$ and $G^R_m$,
respectively.
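As a numerical illustration of the sectored gain model in (1) and (2), the following Python sketch evaluates the gain for a given misalignment angle, beam width, and side-lobe gain. The function name `beam_gain` and the use of radians are assumptions made for this example only.

```python
import math


def beam_gain(alpha: float, phi: float, xi: float) -> float:
    """Sectored beam gain following the form of Eqs. (1) and (2).

    alpha: angle between the link direction and the beam center line (radians)
    phi:   operation beam width (radians), 0 < phi <= 2*pi
    xi:    side-lobe gain, 0 <= xi < 1
    """
    if not 0.0 < phi <= 2.0 * math.pi:
        raise ValueError("beam width must lie in (0, 2*pi]")
    if not 0.0 <= xi < 1.0:
        raise ValueError("side-lobe gain must lie in [0, 1)")
    if abs(alpha) <= phi / 2.0:
        # Main lobe: the link direction falls inside the beam.
        return (2.0 * math.pi - (2.0 * math.pi - phi) * xi) / phi
    # Side lobe: the link direction falls outside the beam.
    return xi
```

With this form, the main-lobe gain times the beam width plus the side-lobe gain times the remaining angle always equals $2\pi$, so narrowing the beam raises the main-lobe gain.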
We assume that $\tau_{n,r}$ denotes the propagation delay of the D2D link from UE $n$ to UE $r$
and $\chi^C_{n,r}$ denotes the amplitude of the D2D link from UE $n$ to UE $r$. According to
[33], the channel gain of the D2D link from UE $n$ to UE $r$ can be given by
$$
G^C_{n,r} = \left|\chi^C_{n,r}\,\delta\left(\tau - \tau_{n,r}\right)\right|^2
\tag{3}
$$
where $\delta(\cdot)$ is the Dirac delta function, and $\tau_{n,r}$ is given by
$$
\tau_{n,r} = d_{n,r}/c
\tag{4}
$$
where $d_{n,r}$ is the distance from UE $n$ to UE $r$ and $c$ is the speed of light.
According to [34], the amplitude of a LOS link is estimated by $\frac{\lambda}{4\pi d_{n,r}}$,
where $\lambda$ is the wavelength, $\lambda = c/f_c$, and $f_c$ is the carrier frequency. The
amplitude of an NLOS link includes both path loss and reflection coefficients and thus is
estimated by $\frac{\lambda}{4\pi d_{n,r}} \prod_{ref=1}^{REF} \Gamma_{ref}$, where
$\Gamma_{ref}$ is the reflection coefficient of the $ref$-th reflection of the path between UE
$n$ and UE $r$, and $REF$ is the number of reflections of the path. According to [35], the
reflection loss of the mmWave band is very high, and thus we consider one reflection for a given
path (i.e., $REF$ is set to 1). Based on the above, we summarize the amplitude estimation
formula as follows.
$$
\chi^C_{n,r} =
\begin{cases}
\dfrac{\lambda}{4\pi d_{n,r}}, & \text{if path is in LOS state} \\
\dfrac{\lambda}{4\pi d_{n,r}}\,\Gamma_{ref}, & \text{if path is in NLOS state}
\end{cases}
\tag{5}
$$
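A short Python sketch of (4) and (5) may help to see how distance, carrier frequency, and the single reflection coefficient enter the amplitude. The function names and the 28 GHz carrier used in the usage note are illustrative assumptions, not values taken from the paper.

```python
import math
from typing import Optional

C_LIGHT = 299_792_458.0  # speed of light in m/s


def propagation_delay(distance_m: float) -> float:
    """Eq. (4): propagation delay in seconds."""
    return distance_m / C_LIGHT


def amplitude(distance_m: float, carrier_hz: float, los: bool,
              gamma_ref: Optional[float] = None) -> float:
    """Eq. (5): link amplitude with at most one reflection (REF = 1)."""
    wavelength = C_LIGHT / carrier_hz            # lambda = c / f_c
    free_space = wavelength / (4.0 * math.pi * distance_m)
    if los:
        return free_space
    if gamma_ref is None:
        raise ValueError("an NLOS path needs a reflection coefficient")
    # NLOS: path loss multiplied by the single reflection coefficient.
    return free_space * gamma_ref
```

For example, `amplitude(10.0, 28e9, los=False, gamma_ref=0.5)` is half of `amplitude(10.0, 28e9, los=True)`, which shows directly how a reflection weakens the NLOS path relative to a LOS path of the same length.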
Similarly, we can calculate the channel gain $G^C_m$ of the backhaul link from SBS $m$ to the
MBS and the channel gain $G^C_{n,m}$ of the access link from UE $n$ to SBS $m$.
Fig. 1. A mmWave cellular communication scenario.
We let $D_n = 1$ indicate that UE $n$ uses D2D communication and $D_n = 0$ indicate that UE $n$
does not use D2D communication. Based on the previous description, the UE that communicates
directly with the SBS in the access path is regarded as the relaying UE, while the other one is
regarded as the source UE. We also assume that the number of relaying UEs is $R$ and that
$\mathcal{R} = \{1,2,\ldots,R\}$ is the set of relaying UEs, where
$\mathcal{R} \subset \mathcal{N}$. Since this paper focuses on network resource allocation, we
do not delve into the reward mechanism of D2D relaying UEs and simply assume that any UE is
willing to offer relay service as long as it is idle.
Over a continuous power range, there are theoretically infinitely many available power values.
Discretization converts this range into a finite number of power values. Compared with
continuous power control, discrete power control has two advantages in practice [36]: 1)
simpler transmitter design, and 2) less information exchange overhead among network nodes.
Therefore, we use a discrete power control scheme in this paper. We denote the transmission
power of the backhaul link from SBS $m$ to the MBS as $p^s_m$, which is assumed to be a fixed
value because most backhaul links are predetermined. We also denote the transmission power of
the access link from UE $n$ to SBS $m$, or of the D2D link from UE $n$ to UE $r$, as $p^u_n$,
which belongs to the set $\wp^u = \{p^u_{n1} < p^u_{n2} < \cdots < p^u_{nC^u}\}$, where $C^u$ is
the cardinality of $\wp^u$ and $n \in \mathcal{N}$.
To meet the requirements of different users for data traffic, we adopt a mmWave duration
architecture that includes a downlink (DL) sweeping sub-frame, an uplink (UL) sweeping
sub-frame, and a configurable DL/UL sub-frame; a detailed description of this architecture can
be found in [9]. In addition, each sub-frame can be further divided into two segments (i.e.,
access duration and backhaul duration), and the length of each segment can be dynamically
adjusted to adapt to different channel and traffic requirements. Given the large path loss, the
susceptibility to blockage, the narrow beams, and the lower cross-link interference of mmWave
links, we use the non-unified transmission duration allocation scheme [9] for each small cell.
Based on the above-mentioned benefits of discretization, we also construct a discrete duration
sequence $[t^{ra}_1, t^{ra}_2, \ldots, t^{ra}_m, \ldots, t^{ra}_M]$, where $t^{ra}_m$
corresponds to the access transmission duration of SBS $m$. Since there are multiple time slots
in each scheduling period (i.e., a sub-frame period) in a cellular network, we assume that each
scheduling period has $Q_{ts}$ time slots. Therefore, the access transmission duration of SBS
$m$ after normalization satisfies $t^{ra}_m \in \mathcal{Q}_{ts}$, with
$\mathcal{Q}_{ts} = \left[\frac{1}{Q_{ts}}, \frac{2}{Q_{ts}}, \ldots, \frac{Q_{ts}-1}{Q_{ts}}\right]$.
As described above, an access path is divided into a D2D link and a forwarding link. Since we
use underlay D2D communication in this paper, the transmission of these two links requires a
time division multiple access mechanism to avoid access conflicts. For computational
convenience, we assume that each time slot is divided into two parts (i.e., a D2D transmission
part and a forwarding transmission part), and that the D2D transmission part accounts for
$t^{D2D}_r$ of each time slot. Assume that each time slot can be further divided into $Q^{ra}$
smaller time slots, so the D2D access duration of relaying UE $r$ after normalization satisfies
$t^{D2D}_r \in \mathcal{Q}^{ra}$, with
$\mathcal{Q}^{ra} = \left[\frac{1}{Q^{ra}}, \frac{2}{Q^{ra}}, \ldots, \frac{Q^{ra}-1}{Q^{ra}}\right]$.
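The two normalized duration sets have the same structure, so one small helper is enough to enumerate either of them. The sketch below uses exact fractions; the function names are hypothetical, and the forwarding share `1 - t_d2d` is an inference from the statement that each time slot is split into exactly two parts.

```python
from fractions import Fraction
from typing import List


def normalized_durations(q: int) -> List[Fraction]:
    """Return [1/q, 2/q, ..., (q-1)/q], the admissible normalized durations."""
    if q < 2:
        raise ValueError("need at least two slots to form a non-empty set")
    return [Fraction(k, q) for k in range(1, q)]


def forwarding_share(t_d2d: Fraction) -> Fraction:
    """Share of a time slot left for the forwarding link (assumed 1 - t_d2d)."""
    if not 0 < t_d2d < 1:
        raise ValueError("the D2D share must lie strictly between 0 and 1")
    return 1 - t_d2d
```

For instance, `normalized_durations(4)` yields the three values 1/4, 1/2, and 3/4, so neither the access part nor the backhaul part of a scheduling period can be empty.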
3.2. Problem formulation
Both the scheme described in the following text and that in [9] construct the estimation
formulas for interference values on the basis of the millimeter-wave propagation model, and
both estimate the throughput of the various links based on the Shannon theorem. After
introducing D2D communication, there are more types of links (i.e., access links, backhaul
links, forwarding links, and D2D links) in our scheme, while the scheme in [9] only involves
access links and backhaul links. Our scheme needs to construct more types of interference
estimation formulas, since the interference details differ between different types of links.
Because the interference types differ, the expression of the signal to interference and noise
ratio (SINR) also varies, and it is difficult to cover all the cases with a unified expression.
Once the necessary SINR values are obtained, the Shannon theorem can be used to compute the
link throughput. Furthermore, unlike the scheme in [9], we focus on the network energy
efficiency, so we also have to construct a set of energy efficiency estimation formulas. To
this end, we first derive a set of formulas to estimate the SINR values of all the types of
links. Then, we obtain a set of throughput estimation formulas for all the types of links.
Next, after deriving a set of power consumption estimation formulas for all the types of links,
we construct an optimization model of network energy efficiency.
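The step from SINR to throughput can be illustrated with a generic Shannon-capacity sketch. It is not the paper's own throughput formula: the bandwidth, the noise power, the `time_fraction` weighting, and the function names are assumptions introduced here, and the received power is taken as the product of transmission power and the three gains, mirroring the form of the interference terms.

```python
import math


def sinr(p_tx: float, g_t: float, g_c: float, g_r: float,
         interference: float, noise: float) -> float:
    """Received power p * G^T * G^C * G^R over interference plus noise."""
    return (p_tx * g_t * g_c * g_r) / (interference + noise)


def shannon_rate(bandwidth_hz: float, sinr_value: float,
                 time_fraction: float = 1.0) -> float:
    """Shannon rate in bit/s, scaled by the share of time the link is active."""
    return time_fraction * bandwidth_hz * math.log2(1.0 + sinr_value)
```

Dividing such a rate by the corresponding power consumption gives an energy efficiency figure in bit/J, which is the kind of quantity the optimization model targets.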
To facilitate analysis, we rearrange the access transmission durations of all the SBSs in
ascending order, which can be expressed as
$[\bar{t}^{ra}_1, \bar{t}^{ra}_2, \ldots, \bar{t}^{ra}_M]$, where $\bar{t}^{ra}_1$ and
$\bar{t}^{ra}_M$ are the minimum and maximum access transmission durations in this sequence,
respectively. Let $L = [L_1, L_2, \ldots, L_M]$ be the index list of all the SBSs corresponding
to the above sequence, where $L_m$ denotes the index of the $m$-th SBS. Besides, we introduce
two auxiliary constants $\bar{t}^{ra}_0 = 0$ and $\bar{t}^{ra}_{M+1} = 1$, and an empty set
$\mathcal{M}_0 = \emptyset$.
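The rearrangement can be sketched as follows. The function name, the dictionary input, and the tie-breaking by SBS index are assumptions made for this example; the paper does not specify how equal durations are ordered.

```python
from typing import Dict, List, Tuple


def sort_access_durations(t_ra: Dict[int, float]) -> Tuple[List[int], List[float], List[float]]:
    """Sort SBSs by normalized access duration.

    Returns (order, padded, intervals), where
      order     lists SBS ids by ascending duration (ties broken by id, an assumption),
      padded    is the sorted sequence with the auxiliary constants 0 and 1 added,
      intervals holds the lengths padded[i] - padded[i-1] of consecutive sub-intervals.
    """
    order = sorted(t_ra, key=lambda m: (t_ra[m], m))
    padded = [0.0] + [t_ra[m] for m in order] + [1.0]
    intervals = [padded[i] - padded[i - 1] for i in range(1, len(padded))]
    return order, padded, intervals
```

The sub-intervals returned here are the durations over which the set of interfering links stays fixed, which is why the interference analysis is carried out interval by interval.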
Furthermore, we denote the set of relaying UEs in the coverage of the $m$-th SBS by
$\mathcal{R}^m$ and their number by $R^m$. For analytical tractability, we rearrange the D2D
transmission durations of all the relaying UEs of $\mathcal{R}^m$ in ascending order, and the
sequence can be given as
$[\bar{t}^{D2D}_1, \bar{t}^{D2D}_2, \ldots, \bar{t}^{D2D}_r, \ldots, \bar{t}^{D2D}_{R^m}]$,
where $\bar{t}^{D2D}_1$ and $\bar{t}^{D2D}_{R^m}$ are the minimum and maximum D2D access
durations in this sequence, respectively. Let
$L^m = [L^m_1, L^m_2, \ldots, L^m_r, \ldots, L^m_{R^m}]$ be the index list of all the relaying
UEs corresponding to the above sequence. To facilitate analysis, we also introduce two
auxiliary constants $\bar{t}^{D2D}_0 = 0$ and $\bar{t}^{D2D}_{R^m+1} = 1$.
In order to get the SINR of each link, we analyze the interference experienced by the
different types of links to obtain the corresponding estimation formulas. Since we use the
non-unified transmission duration allocation and D2D communication technology, the interference
experienced by each link is more complicated. In addition, because there are two situations at
the access end (i.e., using D2D communication and not using D2D communication), we need to
consider these two situations separately.
Following the interference analysis method in [9], we successively carry out the interference
analysis for access links, access paths (where each access path consists of a D2D link and a
forwarding link) and backhaul links in different transmission durations. Firstly, for the
access link of UE $n$ associated with SBS $m$ in the duration
$\bar{t}^{ra}_i - \bar{t}^{ra}_{i-1}$ for $1 \le i \le L_m$, it is interfered by the access
links of all the SBSs except those whose indices are less than $i$ in list $L$; this
interference is denoted as $I^{ra}_{n,m}$. The access link $n \to m$ is also interfered by the
backhaul links of the SBSs whose indices are less than $i$ in $L$; this interference is denoted
as $I^{bh}_{n,m}$. The corresponding estimation formulas are respectively given as follows.
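$$
I^{ra}_{n,m} = \sum_{m' \in \mathcal{M} \setminus \mathcal{M}_i} \; \sum_{n' \in \mathcal{N}_{m'} \setminus n}
\left(1 - \mathrm{sgn}(D_{n'})\right) p^u_{n'} \, G^T_{n',m} \, G^C_{n',m} \, G^R_{n',m}
\tag{6}
$$
and
$$
I^{bh}_{n,m} = \sum_{m' \in \mathcal{M}_i} p^s_{m'} \, G^T_{m',m} \, G^C_{m',m} \, G^R_{m',m}
\tag{7}
$$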
where $\mathcal{M}_i$ represents the set of the SBSs with indices less than $i$ in $L$, that is, the set of all the SBSs whose access transmission duration is less than that of the $i$-th SBS, $\mathcal{N}_{m'}$ is the UE set associated with SBS $m'$, and $\mathrm{sgn}(\cdot)$ is the signum function. The derivation of the above two formulas is the same as that in [9]. However, in (6), the signum function is added to distinguish the UEs that use D2D from those that do not, since
we introduce D2D communication in this paper.
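To make the role of the signum term in (6) concrete, the following is a minimal Python sketch of the access-link interference sum. The data layout (`sbs_order`, `ues_of`, `uses_d2d`, `p_u`) and the `gain` callable, which stands for the product $G^T G^c G^R$ between a transmitter and a receiver, are assumptions made for illustration and are not part of the original model description.

```python
def access_interference(n, m, sbs_order, i, ues_of, uses_d2d, p_u, gain):
    """Sketch of I^ra_{n,m} in (6).

    sbs_order: SBS ids sorted by ascending access duration (the list L).
    i: 1-based index of the current access sub-interval.
    uses_d2d[n2]: True when sgn(D_n2) = 1, so the UE has no direct access link.
    """
    finished = set(sbs_order[: i - 1])  # M_i: SBSs with index less than i
    total = 0.0
    for m2 in sbs_order:
        if m2 in finished:
            continue  # this SBS has already left the access phase
        for n2 in ues_of[m2]:
            if n2 == n or uses_d2d[n2]:
                continue  # skip the victim itself and D2D source UEs
            total += p_u[n2] * gain(n2, m)
    return total


# Hypothetical two-SBS example with unit powers and a constant gain.
sbs_order = ["a", "b"]
ues_of = {"a": ["u1", "u2"], "b": ["u3", "u4", "u5"]}
uses_d2d = {"u1": False, "u2": False, "u3": False, "u4": True, "u5": False}
p_u = {u: 1.0 for u in uses_d2d}
const_gain = lambda tx, rx: 0.5

i_first = access_interference("u1", "a", sbs_order, 1, ues_of, uses_d2d, p_u, const_gain)
i_second = access_interference("u3", "b", sbs_order, 2, ues_of, uses_d2d, p_u, const_gain)
```

In the first sub-interval UE u1 is interfered by u2, u3 and u5; in the second, SBS a has finished its access phase, so only u5 interferes with u3.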
In addition, in the duration $t^{ra}_i - t^{ra}_{i-1}$ for $1 \le i \le L_m$, this access link is also interfered by the access paths of all the SBSs except those whose indices are less than $i$ in list $L$. Since there are two link types for access paths in each access duration, we need to refine each access duration into a D2D access duration and a forwarding duration to estimate their interference values. For any SBS $m' \in \mathcal{M} \setminus \mathcal{M}_i$, in the duration $t^{D2D}_j - t^{D2D}_{j-1}$ for $1 \le j \le R^{m'}$ of each time slot, the D2D links of all the relaying UEs in $\mathcal{R}^{m'}$, except for those whose indices in list $L^{m'}$ are less than $j$, interfere with the access link of UE $n$ to SBS $m$; this interference is denoted as $I^{D2D}_{n,m}$. Also, the forwarding links of all the relaying UEs in $\mathcal{R}^{m'}$ whose indices in list $L^{m'}$ are less than $j$ interfere with the access link of UE $n$ to SBS $m$ in the duration $t^{D2D}_{j+1} - t^{D2D}_j$ for $1 \le j \le R^m$ of each time slot; this interference is denoted as $I^{fwd}_{n,m}$. Therefore, the interference from the D2D links and the forwarding links experienced by the access link of UE $n$ to SBS $m$ can be respectively given by
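$$I^{D2D}_{n,m} = \sum_{m' \in \mathcal{M} \setminus \mathcal{M}_i} \sum_{r' \in \mathcal{R}^{m'} \setminus \mathcal{R}^{m'}_j} \mathrm{sgn}(D_{n'})\, p^u_{n'} G^T_{n',m} G^c_{n',m} G^R_{n',m} \tag{8}$$
and
$$I^{fwd}_{n,m} = \sum_{m' \in \mathcal{M} \setminus \mathcal{M}_i} \sum_{r' \in \mathcal{R}^{m'}_j} p^u_{r'} G^T_{r',m} G^c_{r',m} G^R_{r',m} \tag{9}$$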
where $\mathcal{R}^{m'}_j$ denotes the set of the D2D relaying UEs of the $m'$-th SBS whose indices in $L^{m'}$ are less than $j$, that is, the set of the D2D relaying UEs of the $m'$-th SBS whose D2D access duration is less than that of the $j$-th D2D relaying UE. In (8), UE $n'$ is the source UE of relaying UE $r'$. The derivation of the above two formulas is similar to that of the interference formulas for access links and backhaul links, but it is based on a finer time division (e.g., the D2D access duration). The formulas that follow are derived in the same way as the above four formulas.
Secondly, for the access path of UE $n$ indirectly connected to SBS $m$ via the $r$-th relaying UE in $\mathcal{R}^m$ in the duration $t^{ra}_i - t^{ra}_{i-1}$ for $1 \le i \le L_m$, we divide this access path into the D2D link $n \to r$ and the forwarding link $r \to m$ and model their interference separately. In the duration $t^{D2D}_j - t^{D2D}_{j-1}$ for $1 \le j \le L^m_r$ of each time slot, the D2D link $n \to r$ experiences interference from the access links, the backhaul links, the D2D links, and the forwarding links, where the modeling idea is similar to that of formulas (6) to (9). These interferences can be respectively given by
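$$I^{ra}_{n,r} = \sum_{m' \in \mathcal{M} \setminus \mathcal{M}_i} \sum_{n' \in \mathcal{N}_{m'} \setminus \{n\}} \left(1 - \mathrm{sgn}(D_{n'})\right) p^u_{n'} G^T_{n',r} G^c_{n',r} G^R_{n',r} \tag{10}$$
and
$$I^{bh}_{n,r} = \sum_{m' \in \mathcal{M}_i} p^s_{m'} G^T_{m',r} G^c_{m',r} G^R_{m',r} \tag{11}$$
and
$$I^{D2D}_{n,r} = \sum_{m' \in \mathcal{M} \setminus \mathcal{M}_i} \sum_{r' \in \mathcal{R}^{m'} \setminus (\mathcal{R}^{m'}_j \cup \{r\})} \mathrm{sgn}(D_{n'})\, p^u_{n'} G^T_{n',r} G^c_{n',r} G^R_{n',r} \tag{12}$$
and
$$I^{fwd}_{n,r} = \sum_{m' \in \mathcal{M} \setminus \mathcal{M}_i} \sum_{r' \in \mathcal{R}^{m'}_j} p^u_{r'} G^T_{r',r} G^c_{r',r} G^R_{r',r} \tag{13}$$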
In (10) to (13), $I^{ra}_{n,r}$, $I^{bh}_{n,r}$, $I^{D2D}_{n,r}$, and $I^{fwd}_{n,r}$ are the interferences from the access links, backhaul links, D2D links, and forwarding links experienced by the D2D link $n \to r$, respectively. However, in the duration $t^{D2D}_{j+1} - t^{D2D}_j$ for $L^m_r \le j \le R^m$ of each time slot, these interfering sources interfere with the forwarding link $r \to m$, and the corresponding interferences can be respectively given by
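$$I^{ra}_{r,m} = \sum_{m' \in \mathcal{M} \setminus \mathcal{M}_i} \sum_{n' \in \mathcal{N}_{m'}} \left(1 - \mathrm{sgn}(D_{n'})\right) p^u_{n'} G^T_{n',m} G^c_{n',m} G^R_{n',m} \tag{14}$$
and
$$I^{bh}_{r,m} = \sum_{m' \in \mathcal{M}_i} p^s_{m'} G^T_{m',m} G^c_{m',m} G^R_{m',m} \tag{15}$$
and
$$I^{D2D}_{r,m} = \sum_{m' \in \mathcal{M} \setminus \mathcal{M}_i} \sum_{r' \in \mathcal{R}^{m'} \setminus \mathcal{R}^{m'}_j} \mathrm{sgn}(D_{n'})\, p^u_{n'} G^T_{n',m} G^c_{n',m} G^R_{n',m} \tag{16}$$
and
$$I^{fwd}_{r,m} = \sum_{m' \in \mathcal{M} \setminus \mathcal{M}_i} \sum_{r' \in \mathcal{R}^{m'}_j \setminus \{r\}} p^u_{r'} G^T_{r',m} G^c_{r',m} G^R_{r',m} \tag{17}$$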
In (14) to (17), $I^{ra}_{r,m}$, $I^{bh}_{r,m}$, $I^{D2D}_{r,m}$, and $I^{fwd}_{r,m}$ are the interferences from the access links, backhaul links, D2D links, and forwarding links experienced by the forwarding link $r \to m$, respectively.
Finally, the backhaul link of SBS $m$ to the MBS in the duration $t^{ra}_{i+1} - t^{ra}_i$ for $L_m \le i \le M$ is interfered by the access links of all the SBSs except those whose indices are less than $i$ in list $L$; this interference is denoted as $I^{ra}_m$. This backhaul link is also interfered by the backhaul links of the other SBSs whose indices are less than $i$ in $L$ (i.e., the SBSs in $\mathcal{M}_i$ other than $m$); this interference is denoted as $I^{bh}_m$. The corresponding estimation formulas are respectively given as follows.
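$$I^{ra}_m = \sum_{m' \in \mathcal{M} \setminus \mathcal{M}_i} \sum_{n' \in \mathcal{N}_{m'}} \left(1 - \mathrm{sgn}(D_{n'})\right) p^u_{n'} G^T_{n'} G^c_{n'} G^R_{n'} \tag{18}$$
and
$$I^{bh}_m = \sum_{m' \in \mathcal{M}_i \setminus \{m\}} p^s_{m'} G^T_{m'} G^c_{m'} G^R_{m'} \tag{19}$$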
Similar to the access link, this backhaul link also experiences interference from the D2D links in the duration $t^{D2D}_j - t^{D2D}_{j-1}$ for $1 \le j \le R^m$ of each time slot, and from the forwarding links in the duration $t^{D2D}_{j+1} - t^{D2D}_j$ for $1 \le j \le R^m$ of each time slot, of all the SBSs except those whose indices are less than $i$ in list $L$. These interferences are respectively given as follows.
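$$I^{D2D}_m = \sum_{m' \in \mathcal{M} \setminus \mathcal{M}_i} \sum_{r' \in \mathcal{R}^{m'} \setminus \mathcal{R}^{m'}_j} \mathrm{sgn}(D_{n'})\, p^u_{n'} G^T_{n'} G^c_{n'} G^R_{n'} \tag{20}$$
and
$$I^{fwd}_m = \sum_{m' \in \mathcal{M} \setminus \mathcal{M}_i} \sum_{r' \in \mathcal{R}^{m'}_j} p^u_{r'} G^T_{r'} G^c_{r'} G^R_{r'} \tag{21}$$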
In (20) and (21), $I^{D2D}_m$ and $I^{fwd}_m$ are the interferences from the D2D links and forwarding links experienced by this backhaul link, respectively. The solutions in [9] only consider the interferences $I^{ra}_{n,m}$, $I^{bh}_{n,m}$, $I^{ra}_m$, and $I^{bh}_m$ since they do not use D2D communication. Besides these, we must also consider the interferences $I^{D2D}_{n,m}$, $I^{fwd}_{n,m}$, $I^{ra}_{n,r}$, $I^{bh}_{n,r}$, $I^{D2D}_{n,r}$, $I^{fwd}_{n,r}$, $I^{ra}_{r,m}$, $I^{bh}_{r,m}$, $I^{D2D}_{r,m}$, $I^{fwd}_{r,m}$, $I^{D2D}_m$, and $I^{fwd}_m$, since we focus on joint access and backhaul resource allocation for in-band D2D-assisted dense mmWave cellular networks. The resource allocation problem to be solved in this paper is therefore affected by more types of interference sources.
According to the interference estimation formulas obtained above, we can get the SINR estimation formulas of the corresponding links in different transmission durations. Firstly, in the duration $t^{ra}_i - t^{ra}_{i-1}$ for
$1 \le i \le L_m$, the SINR of the access link of UE $n$ associated with SBS $m$ in the duration $t^{D2D}_j - t^{D2D}_{j-1}$ for $1 \le j \le R^m + 1$ of each time slot is given by
$$\mathrm{SINR}^{i,j}_{n,m} = \frac{p^u_n G^T_{n,m} G^c_{n,m} G^R_{n,m}}{I^{ra}_{n,m} + I^{bh}_{n,m} + I^{D2D}_{n,m} + I^{fwd}_{n,m} + W N_0} \tag{22}$$
where $W$ is the bandwidth, and $N_0$ is the background noise power spectral density.
Then, in the duration $t^{ra}_{i+1} - t^{ra}_i$ for $L_m \le i \le M$, the SINR of the backhaul link of SBS $m$ associated with the MBS in the duration $t^{D2D}_j - t^{D2D}_{j-1}$ for $1 \le j \le R^m + 1$ of each time slot is given by
$$\mathrm{SINR}^{i,j}_m = \frac{p^s_m G^T_m G^c_m G^R_m}{I^{ra}_m + I^{bh}_m + I^{D2D}_m + I^{fwd}_m + W N_0} \tag{23}$$
Compared with the SINR formulas in [9], we need to consider the additional interference from D2D links and forwarding links since we introduce D2D communication in this paper. Moreover, the interference from D2D links and forwarding links is estimated on the basis of the D2D access duration, which is shorter than the access duration, so the SINR calculations should also be based on the D2D access duration.
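As a numerical illustration of the common structure of (22) and (23), the following Python sketch evaluates an SINR from a transmit power, the three gain factors, a list of interference terms, and the noise term $W N_0$. The function name and all numerical values are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
import math


def sinr(p_tx, g_t, g_c, g_r, interference_terms, bandwidth_hz, n0_w_per_hz):
    """Received signal power divided by (total interference + W * N0)."""
    signal = p_tx * g_t * g_c * g_r
    noise = bandwidth_hz * n0_w_per_hz
    return signal / (sum(interference_terms) + noise)


# Hypothetical values: I_ra, I_bh, I_D2D and I_fwd in watts.
value = sinr(
    p_tx=0.2, g_t=10.0, g_c=1e-9, g_r=10.0,
    interference_terms=[1e-9, 0.0, 5e-10, 5e-10],
    bandwidth_hz=1e9, n0_w_per_hz=1e-18,
)
value_db = 10.0 * math.log10(value)
```

Here the signal power is 2e-8 W and the denominator is 3e-9 W, so the SINR is about 6.67 in linear scale.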
Finally, similar to the SINR derivation of access links and backhaul links, we can easily get the SINR of D2D links and forwarding links. For the access path of UE $n$ indirectly connected to SBS $m$ via relaying UE $r$ in the duration $t^{ra}_i - t^{ra}_{i-1}$ for $1 \le i \le L_m$, the SINR of the D2D link $n \to r$ in the duration $t^{D2D}_j - t^{D2D}_{j-1}$ for $1 \le j \le L^m_r$ of each time slot is given by
$$\mathrm{SINR}^{i,j}_{n,r} = \frac{p^u_n G^T_{n,r} G^c_{n,r} G^R_{n,r}}{I^{ra}_{n,r} + I^{bh}_{n,r} + I^{D2D}_{n,r} + I^{fwd}_{n,r} + W N_0} \tag{24}$$
and in the duration $t^{D2D}_{j+1} - t^{D2D}_j$ for $L^m_r \le j \le R^m$ of each time slot, the SINR of the forwarding link $r \to m$ is given by
$$\mathrm{SINR}^{i,j}_{r,m} = \frac{p^u_r G^T_{r,m} G^c_{r,m} G^R_{r,m}}{I^{ra}_{r,m} + I^{bh}_{r,m} + I^{D2D}_{r,m} + I^{fwd}_{r,m} + W N_0} \tag{25}$$
Because of the further division of time, the derivation of the throughput formulas is divided into two steps. In the first step, for each access duration, we obtain the throughput estimation formulas from the above SINR estimation formulas of each type of link in different transmission durations. Firstly, for the $m$-th SBS, its throughput of access links in the duration $t^{ra}_i - t^{ra}_{i-1}$ for $1 \le i \le L_m$ is denoted as $T^{ral}_{m,i}$, which can be estimated by
$$T^{ral}_{m,i} = W \left(t^{ra}_i - t^{ra}_{i-1}\right) \sum_{n \in \mathcal{N}_m} \left(1 - \mathrm{sgn}(D_n)\right) \sum_{j=1}^{R^m+1} \left( \left(t^{D2D}_j - t^{D2D}_{j-1}\right) \cdot \log_2\left(1 + \mathrm{SINR}^{i,j}_{n,m}\right) \right) \tag{26}$$
Then, for the $m$-th SBS, its throughput of access paths in the duration $t^{ra}_i - t^{ra}_{i-1}$ for $1 \le i \le L_m$ is denoted as $T^{rap}_{m,i}$. Since the throughput of each access path is determined by the lower of the throughputs of its D2D link and its forwarding link, $T^{rap}_{m,i}$ can be estimated by
$$T^{rap}_{m,i} = \left(t^{ra}_i - t^{ra}_{i-1}\right) \sum_{r \in \mathcal{R}^m} \min\left(T^{D2D}_{r,i}, T^{fwd}_{r,i}\right) \tag{27}$$
where $T^{D2D}_{r,i}$ and $T^{fwd}_{r,i}$ represent the D2D access throughput and the forwarding throughput of the $r$-th relaying UE of the $m$-th SBS, respectively, which can be respectively given by
$$T^{D2D}_{r,i} = W \sum_{j=1}^{L^m_r} \left(t^{D2D}_j - t^{D2D}_{j-1}\right) \log_2\left(1 + \mathrm{SINR}^{i,j}_{n,r}\right) \tag{28}$$
and
$$T^{fwd}_{r,i} = W \sum_{j=L^m_r}^{R^m} \left(t^{D2D}_{j+1} - t^{D2D}_j\right) \log_2\left(1 + \mathrm{SINR}^{i,j}_{r,m}\right) \tag{29}$$
In (28), UE $n$ is the source UE of the $r$-th relaying UE. Finally, the backhaul throughput of the $m$-th SBS in the duration $t^{ra}_{i+1} - t^{ra}_i$ for $L_m \le i \le M$ is denoted as $T^{bh}_{m,i}$, which can be estimated by
$$T^{bh}_{m,i} = W \left(t^{ra}_{i+1} - t^{ra}_i\right) \sum_{j=1}^{R^m+1} \left(t^{D2D}_j - t^{D2D}_{j-1}\right) \log_2\left(1 + \mathrm{SINR}^{i,j}_m\right) \tag{30}$$
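The per-duration throughput formulas share one pattern: a sum over consecutive sub-intervals of the sub-interval length times $\log_2(1 + \mathrm{SINR})$, scaled by $W$. The Python sketch below shows that pattern and the bottleneck rule used in (27). The function names and the sample SINR values are illustrative assumptions, and the bandwidth is normalized to 1 so the numbers stay readable.

```python
import math


def piecewise_throughput(bandwidth_hz, boundaries, sinrs):
    """W * sum_j (t_j - t_{j-1}) * log2(1 + SINR_j).

    boundaries: increasing time boundaries; sinrs[j] applies between
    boundaries[j] and boundaries[j + 1].
    """
    if len(boundaries) != len(sinrs) + 1:
        raise ValueError("need exactly one SINR value per sub-interval")
    return bandwidth_hz * sum(
        (hi - lo) * math.log2(1.0 + s)
        for lo, hi, s in zip(boundaries, boundaries[1:], sinrs)
    )


def access_path_throughput(t_d2d, t_fwd):
    """An access path is limited by the slower of its two hops, as in (27)."""
    return min(t_d2d, t_fwd)


# Hypothetical relay: the D2D hop uses [0, 0.5], the forwarding hop [0.5, 1].
t_d2d = piecewise_throughput(1.0, [0.0, 0.25, 0.5], [3.0, 3.0])
t_fwd = piecewise_throughput(1.0, [0.5, 1.0], [7.0])
t_path = access_path_throughput(t_d2d, t_fwd)
```

With these values the D2D hop carries 1.0 and the forwarding hop 1.5 (in normalized units), so the path throughput is limited by the D2D hop.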
After getting the throughput formulas of each access duration, we can easily obtain the access throughput and backhaul throughput of each SBS in each scheduling period. For the $m$-th SBS, its access throughput is denoted as $T^{ra}_m$, which can be estimated by
$$T^{ra}_m = \sum_{i=1}^{L_m} \left(T^{ral}_{m,i} + T^{rap}_{m,i}\right) \tag{31}$$
Also, for the $m$-th SBS, its backhaul throughput is denoted as $T^{bh}_m$, which can be estimated by
$$T^{bh}_m = \sum_{i=L_m}^{M} T^{bh}_{m,i} \tag{32}$$
After the throughput of each scheduling period is obtained, in order to calculate the network energy efficiency, we also need to figure out the power consumption of each SBS in each scheduling period. Therefore, we calculate the power consumption of each type of link in each SBS in each scheduling period. In a scheduling period, for the $m$-th SBS, the power consumption of all access links is denoted as $P^{ral}_m$, which can be estimated by
$$P^{ral}_m = \sum_{n \in \mathcal{N}_m} \left(1 - \mathrm{sgn}(D_n)\right) t^{ra}_m \left(p^u_n + P_{RF}\right) \tag{33}$$
where $P_{RF}$ represents the energy consumption of an RF chain. For the access paths with D2D communication of the $m$-th SBS, the power consumption in a scheduling period is denoted as $P^{rap}_m$, which can be estimated by
$$P^{rap}_m = \sum_{n \in \mathcal{N}_m} \mathrm{sgn}(D_n)\, t^{ra}_m \left( t^{D2D}_r \left(p^u_n + P_{RF}\right) + \left(1 - t^{D2D}_r\right) \left(p^u_r + P_{RF}\right) \right) \tag{34}$$
Also, the power consumption of the backhaul links of the $m$-th SBS in a scheduling period is denoted as $P^{bh}_m$, which can be estimated by
$$P^{bh}_m = \left(1 - t^{ra}_m\right) \left(p^s_m + P_{RF}\right) \tag{35}$$
Finally, we can get the network energy efficiency expression based on the formulas derived above, which is given as follows.
$$E_b = \frac{\sum_{m \in \mathcal{M}} T^{ra}_m}{\sum_{m \in \mathcal{M}} \left(P^{ral}_m + P^{rap}_m + P^{bh}_m\right)} \tag{36}$$
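The power terms (33) to (35) and the ratio (36) can be checked with a small Python sketch. The function names, the tuple layout used for access paths, and all numerical values are assumptions introduced for illustration.

```python
import math


def sbs_power(t_ra, p_s, direct_powers, paths, p_rf):
    """Power of one SBS in a scheduling period, following (33) to (35).

    direct_powers: transmit powers of the UEs with a direct access link.
    paths: (p_n, p_r, t_d2d) per access path, i.e. source-UE power,
           relaying-UE power, and D2D share of the access duration.
    """
    p_ral = sum(t_ra * (p_n + p_rf) for p_n in direct_powers)
    p_rap = sum(
        t_ra * (t_d2d * (p_n + p_rf) + (1.0 - t_d2d) * (p_r + p_rf))
        for p_n, p_r, t_d2d in paths
    )
    p_bh = (1.0 - t_ra) * (p_s + p_rf)
    return p_ral + p_rap + p_bh


def network_energy_efficiency(access_throughputs, powers):
    """Ratio of total access throughput to total power, as in (36)."""
    return sum(access_throughputs) / sum(powers)


# Hypothetical single-SBS example (powers in W, throughput in bit/s).
power = sbs_power(t_ra=0.5, p_s=1.0, direct_powers=[0.2],
                  paths=[(0.1, 0.3, 0.5)], p_rf=0.1)
e_b = network_energy_efficiency([1.7e9], [power])
```

In this example the three power terms are 0.15 W, 0.15 W and 0.55 W, giving 0.85 W in total and an energy efficiency of 2e9 bit/s per watt.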
Similar to [37], we assume that the methods in [38] can be used to measure distance and direction in this network precisely. In addition, we also assume that the beam training algorithms in [39–41] can be used to obtain the optimal beam between any SBS (or UE) and UE (or SBS). In order to take both cases of the access end (i.e., with and without D2D communication) into account when maximizing the network energy efficiency, we propose a resource allocation problem consisting of two joint discrete power control and non-unified transmission duration allocation optimization
3.2.1. Joint access and backhaul resource allocation optimization sub-problem
To solve the blocking problem of access end, we introduce D2D
communications to convert one NLOS link to an access path containing two LOS links. Usually, each LOS link in this access path performs better than the original NLOS link, so the access throughput of this access path is expected to be higher than that of the original NLOS link. Therefore, when the access link between a UE (e.g., $n$) and its nearest SBS (e.g., $m$) is in the NLOS state, UE $n$ selects a relaying UE (e.g., $r$) according to the relay selection strategy described above. To simplify the problem, UE $n$ delegates $r$ to represent it in joint access and backhaul resource allocation, where the behaviors of all the UEs similar to UE $n$ (i.e., the source UEs of all the selected relaying UEs) are temporarily ignored. So, the D2D access duration of all the relaying UEs must be set to 0, and the throughput of access paths of the $m$-th SBS in the duration $t^{ra}_i - t^{ra}_{i-1}$ for $1 \le i \le L_m$ is actually equal to the throughput of the corresponding forwarding links. It is denoted as $T^{raf}_{m,i}$ and can be estimated by
$$T^{raf}_{m,i} = \left(t^{ra}_i - t^{ra}_{i-1}\right) \sum_{r \in \mathcal{R}^m} T^{fwd}_{r,i} \tag{37}$$
In this case, for the $m$-th SBS, its access throughput can be given by
$$T^{ra}_m = \sum_{i=1}^{L_m} \left(T^{ral}_{m,i} + T^{raf}_{m,i}\right) \tag{38}$$
Also, in this case, the power consumption of the access paths of the $m$-th SBS in a scheduling period is actually equal to the power consumption of the corresponding forwarding links. It is denoted as $P^{raf}_m$ and can be estimated by
$$P^{raf}_m = \sum_{n \in \mathcal{N}_m} \mathrm{sgn}(D_n)\, t^{ra}_m \left(p^u_r + P_{RF}\right) \tag{39}$$
Based on formulas (37) to (39), the meaning of network energy efficiency is different from that expressed in formula (36). Therefore, we call it the intermediate energy efficiency, which is denoted as $E_f$ and estimated by the following formula.
$$E_f = \frac{\sum_{m \in \mathcal{M}} T^{ra}_m}{\sum_{m \in \mathcal{M}} \left(P^{ral}_m + P^{raf}_m + P^{bh}_m\right)} \tag{40}$$
Based on the above, we can formulate the first sub-problem as P1.
$$\begin{aligned}
\mathrm{P1}: \quad & \max_{t^{ra},\, p^u} \; E_f\left(t^{ra}_m, p^u_n, p^u_r\right) \\
\text{s.t.} \quad & \mathrm{C1.1}: \; t^{ra}_m \in \mathcal{Q}_{ts}, \; \forall m, \\
& \mathrm{C1.2}: \; p^u_n, p^u_r \in \wp^u, \; \forall n, r, \\
& \mathrm{C1.3}: \; T^{ra}_m \le T^{bh}_m, \; \forall m
\end{aligned} \tag{41}$$
where $t^{ra}$ with elements $t^{ra}_m$ and $p^u$ with elements $p^u_n$ and $p^u_r$ are $1 \times M$ and $1 \times N^{tra}$ dimensional matrices, respectively. Constraint C1.1 specifies the available set of access transmission durations for each SBS. Constraint C1.2 specifies the available set of power levels for all the UEs with $D_n = 0$ in $\mathcal{N}^{tra}$ and all the relaying UEs in $\mathcal{R}$. Constraint C1.3 ensures that the uplink throughput of each SBS is ultimately controlled by its backhaul link.
3.2.2. Joint D2D access and forwarding link resource allocation
optimization sub-problem
After the joint access and backhaul resource allocation described above is completed, we can get the access transmission duration $t^{ra}_m$ of each SBS (e.g., $m$), the transmission power $p^u_n$ of each UE (e.g., $n$) that does not select any relaying UE, and the transmission power $p^u_r$ of each selected relaying UE (e.g., $r$). Therefore, based on these results, we can use formula (40) to get the intermediate network energy efficiency, which is the result obtained when the behaviors of the source UEs of all the selected relaying UEs are temporarily ignored.
Here, we need to further perform resource allocation in each access path. For each access path (e.g., $n \to r \to m$) in the coverage of SBS $m$, SBS $m$ needs to divide its access transmission duration $t^{ra}_m$ into two parts (i.e., the D2D access duration and the forwarding duration), to adjust the transmission power $p^u_r$ of relaying UE $r$, and to determine the transmission power $p^u_n$ of source UE $n$ of relaying UE $r$. The corresponding D2D access throughput is denoted as $T^{D2D}_r$, which can be estimated by
$$T^{D2D}_r = \sum_{i=1}^{L_m} \left(t^{ra}_i - t^{ra}_{i-1}\right) T^{D2D}_{r,i} \tag{42}$$
and the corresponding forwarding throughput is denoted as $T^{fwd}_r$, which can be estimated by
$$T^{fwd}_r = \sum_{i=1}^{L_m} \left(t^{ra}_i - t^{ra}_{i-1}\right) T^{fwd}_{r,i} \tag{43}$$
Therefore, the average energy efficiency for access paths (e.g., $n \to r \to m$) is denoted as $E_r$, which can be estimated by
$$E_r = \frac{\sum_{r \in \mathcal{R}} T^{D2D}_r}{\sum_{m \in \mathcal{M}} P^{rap}_m} \tag{44}$$
Since the transmission distance of a D2D link is generally short and the number of D2D links is much smaller than that of access links, we reconstruct the transmission power set $\wp^{su}$ of D2D links by adding several smaller optional transmission powers to prevent excessive throughput on D2D links. Then, based on the value of the transmission power $p^u_r$ of each relaying UE (e.g., $r$) determined by solving the first sub-problem P1, the corresponding forwarding throughput can be obtained by formula (43), which is denoted as $T^{fwd}_{r,\max}$. Therefore, we can formulate the second sub-problem as P2 for all the access paths.
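$$\begin{aligned}
\mathrm{P2}: \quad & \max_{t^{D2D},\, p^u} \; E_r\left(t^{D2D}_r, p^u_n, p^u_r\right) \\
\text{s.t.} \quad & \mathrm{C2.1}: \; t^{D2D}_r \in \mathcal{Q}^{ra}, \; \forall r, \\
& \mathrm{C2.2}: \; p^u_n \in \wp^{su}, \; \forall n, \\
& \mathrm{C2.3}: \; p^u_r \in \wp^u, \; \forall r, \\
& \mathrm{C2.4}: \; T^{D2D}_r \le T^{fwd}_r \le T^{fwd}_{r,\max}, \; \forall r.
\end{aligned} \tag{45}$$
where $t^{D2D}$ with elements $t^{D2D}_r$ is a $1 \times R$ dimensional matrix, while $p^u$ with pairs of elements $(p^u_r, p^u_n)$ is a $2 \times R$ dimensional matrix; constraint C2.1 specifies the available set of D2D access durations for each relaying UE; constraints C2.2 and C2.3 specify the available sets of power levels for all the source UEs with $D_n = 1$ in $\mathcal{N}^{tra}$ and for all the relaying UEs in $\mathcal{R}$, respectively; constraint C2.4 ensures that the D2D access throughput never exceeds the forwarding throughput in the same access path, and that the forwarding throughput never exceeds $T^{fwd}_{r,\max}$.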
After getting the adjusted transmission powers of all the relaying UEs and the determined transmission powers $p^u_n$ of the source UEs of these relaying UEs, we can use formula (36) to verify the network energy efficiency, which is the result obtained when the behaviors of the source UEs of all the selected relaying UEs are considered. If the difference between the two indicators (i.e., $E_b$ and $E_f$) is within the tolerance range, it indicates that decomposing the complex problem into two smaller sub-problems is feasible, which reduces the complexity of solving the original overall problem.
4. Game theory and resource allocation
4.1. Resource allocation based on game theory
For the two sub-problems P1 and P2 proposed in Section 3, the optimal solutions can be obtained by a brute-force algorithm. However, the computational complexity of this method is too high for practical use. Heuristic methods are algorithms built on intuition or experience, which can give a feasible solution to a combinatorial optimization problem at an acceptable cost (i.e., computational time and space). However, most of
these methods need to be trained in advance before they can be used. Moreover, the degree of deviation from the optimal solution generally cannot be predicted. The sub-problem P1 is non-convex because of constraint C1.3, while the sub-problem P2 is non-convex because of constraint C2.4. Therefore, convex optimization methods cannot be directly applied to them. As in [9], in order to reduce the computational complexity, we use game theory to formulate the two sub-problems as non-cooperative games and obtain their suboptimal solutions. Moreover, we design a centralized algorithm based on the best response dynamic to achieve a feasible pure strategy NE for each of the two games. After that, we propose a decentralized algorithm based on log-linear learning to obtain a feasible pure strategy NE for each of the two games, which is also based on the C-plane/U-plane split architecture.
In particular, there is a special non-cooperative game (i.e., a potential game with a common utility) whose utility function design principle is concise and practical. This makes it easier to design a utility function that is monotonic within a specified range of its variables and whose value reverses sharply beyond that range. Given such characteristics, if each game player's action strategy space is discretized into a finite number of action strategies, the player can start the game process by taking a conservative value for each variable and then search its own action strategy space in order, which avoids traversing the entire action strategy space and thus reduces searching time. If each game player has a larger number of action strategies, the game is more likely to converge to a desired result that is closer to the optimal one. However, each game player will then spend more time searching its own action strategy space. Fortunately, since the elements (i.e., action strategies) of the action strategy space can be arranged in order, binary search can be used to improve search performance, especially when the number of action strategies is large.
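The binary search mentioned above can be sketched in Python as follows. The sketch assumes, as a simplification of the description above, that the ordered strategies are feasible up to some threshold and infeasible beyond it, and that the utility increases within the feasible range, so the best strategy is the last feasible one. The function name and the feasibility predicate are hypothetical.

```python
def last_feasible_index(strategies, is_feasible):
    """Index of the last feasible strategy in an ordered list, or -1 if none.

    Assumes feasibility holds for a prefix of the list and fails afterwards.
    """
    lo, hi = 0, len(strategies) - 1
    best = -1
    while lo <= hi:
        mid = (lo + hi) // 2
        if is_feasible(strategies[mid]):
            best = mid       # feasible: try a larger strategy
            lo = mid + 1
        else:
            hi = mid - 1     # infeasible: the threshold lies to the left
    return best


# Hypothetical discretized access durations; the threshold 0.55 is arbitrary.
durations = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
idx = last_feasible_index(durations, lambda t: t <= 0.55)
```

For a strategy set of size $n$, this needs on the order of $\log_2 n$ feasibility checks instead of $n$.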
4.1.1. Solution of joint access and backhaul resource allocation
Since the transmission durations of the access links and backhaul links affect each other, we formulate the first sub-problem as a non-cooperative game denoted by $\mathcal{G}_1 = [\mathcal{K}_1, \{\mathcal{A}_{1k}\}_{k \in \mathcal{K}_1}, \{u_{1k}\}_{k \in \mathcal{K}_1}]$, where $\mathcal{K}_1 = \{1, 2, \ldots, M, M+1, \ldots, K_1\}$ is the set of players (i.e., all the SBSs, the UEs with $D_n = 0$ in $\mathcal{N}^{tra}$, and all the relaying UEs) with $K_1 = M + N^{tra}$, and $\mathcal{A}_{1k}$ and $u_{1k}$ are the set of available pure strategies and the utility function for player $k$, respectively. $u_{1k}$ is a function of the decision variables (i.e., $t^{ra}_m$, $p^u_n$, $p^u_r$) and is thus defined as
$$u_{1k}\left(t^{ra}_m, p^u_n, p^u_r\right) = U_1\left(t^{ra}_m, p^u_n, p^u_r\right) = E_f\left(t^{ra}_m, p^u_n, p^u_r\right) + \eta_1 \sum_{m \in \mathcal{M}} \Phi\left(T^{bh}_m, T^{ra}_m\right) \tag{46}$$
where $\eta_1$ represents the penalty coefficient with the unit "bps/W", and $\Phi(x, y)$ is the penalty function [42], which satisfies $\Phi(x, y) = -1$ if $x < y$, and $\Phi(x, y) = 0$ if $x \ge y$. Each player $k \in \mathcal{K}_1$ is regarded as an SBS if $1 \le k \le M$, and its strategy is defined as $A_{1k} = t^{ra}_k$. If $M + 1 \le k \le M + R$, player $k$ is regarded as a relaying UE and its strategy is defined as $A_{1k} = p^u_k$. If $M + R + 1 \le k \le K_1$, player $k$ is regarded as a UE with $D_n = 0$ in $\mathcal{N}^{tra}$ and its strategy is defined as $A_{1k} = p^u_k$. The strategies of SBSs represent their access transmission durations, the strategies of relaying UEs represent the transmission powers of their forwarding links, and the strategies of the UEs without D2D communication represent the transmission powers of their access links.
The first term in (46) corresponds to the intermediate energy efficiency, and the second term represents constraint C1.3 of the first sub-problem for all the SBSs, which ensures that a player choosing a strategy that violates constraint C1.3 gets a lower utility value.
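The structure of the utility in (46) can be illustrated with a short Python sketch. The function names and the numerical values are hypothetical; the efficiency value is passed in directly rather than computed from the network model.

```python
def penalty(x, y):
    """Phi(x, y): -1 if x < y, and 0 otherwise."""
    return -1.0 if x < y else 0.0


def utility_g1(e_f, eta_1, t_bh, t_ra):
    """Common utility of (46): efficiency plus penalties for C1.3 violations.

    t_bh, t_ra: per-SBS backhaul and access throughputs, in the same order.
    """
    return e_f + eta_1 * sum(penalty(bh, ra) for bh, ra in zip(t_bh, t_ra))


# Hypothetical numbers: the second SBS violates C1.3 in the first call only.
u_violating = utility_g1(e_f=5.0, eta_1=10.0, t_bh=[3.0, 2.0], t_ra=[2.0, 2.5])
u_feasible = utility_g1(e_f=5.0, eta_1=10.0, t_bh=[3.0, 2.0], t_ra=[2.0, 2.0])
```

With a penalty coefficient larger than the efficiency term, the violating profile ends up with a negative utility, while the feasible profile keeps the efficiency value unchanged.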
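Definition 1. (pure strategy NE) In game $\mathcal{G} = [\mathcal{K}, \mathcal{S}, U]$, a strategy profile $(S^*_k, S^*_{-k})$ is a pure strategy NE if, $\forall k \in \mathcal{K}$ and $\forall S_k \in \mathcal{S}_k$,
$$u_k\left(S^*_k, S^*_{-k}\right) \ge u_k\left(S_k, S^*_{-k}\right) \tag{47}$$
where $S^*_{-k} \in \mathcal{S}_{-k}$ represents the strategies of all the players except for player $k$.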
If we do not consider constraint C1.3, there may be many Nash equilibrium points for game $\mathcal{G}_1$. However, because of constraint C1.3, it is difficult to determine whether those Nash equilibrium points are feasible. Therefore, we need to investigate the feasibility of the pure strategy NE of the proposed game $\mathcal{G}_1$. For convenience, we denote the maximum energy efficiency for the first sub-problem as $\bar{\eta}_1 = \max E_f(t^{ra}_m, p^u_n, p^u_r)$. In addition, we also assume that each scheduling period consists of enough time slots; the reason for this assumption can be found in [9].
It is easy to see that any strategy of player $k$ that violates constraint C1.3 leads to $u_{1k}(t^{ra}_m, p^u_n, p^u_r) < 0$ if $\eta_1 \ge \bar{\eta}_1$. Therefore, a strategy that violates constraint C1.3 will never be the optimal strategy.
Theorem 1. If $\eta_1 \ge \bar{\eta}_1$, a strategy profile which violates constraint C1.3 of game $\mathcal{G}_1$ will never be a pure strategy NE of game $\mathcal{G}_1$.
Proof: We assume that $(A^*_{11}, A^*_{12}, \ldots, A^*_{1M})$ is a pure strategy NE of game $\mathcal{G}_1$ which violates constraint C1.3. Furthermore, without loss of generality, we assume that SBS $m'$ violates constraint C1.3. Due to the assumption that each scheduling period has enough time slots, SBS $m'$ can choose a better strategy $A'_{1m'}$ to make $T^{bh}_{m'} \ge T^{ra}_{m'}$. We only change the strategy of SBS $m'$ unilaterally, so we can get
$$\begin{aligned}
& u_{1m'}\left(A^*_{1m'}, A^*_{1-m'}\right) - u_{1m'}\left(A'_{1m'}, A^*_{1-m'}\right) \\
&= E_f\left(A^*_{1m'}, A^*_{1-m'}\right) - E_f\left(A'_{1m'}, A^*_{1-m'}\right) + \sum_{m \in \mathcal{M}^{nei}_{m'}} \left( \phi_m\left(A^*_{1m'}, A^*_{1-m'}\right) - \phi_m\left(A'_{1m'}, A^*_{1-m'}\right) \right) \\
&= E_f\left(A^*_{1m'}, A^*_{1-m'}\right) - E_f\left(A'_{1m'}, A^*_{1-m'}\right) - \eta_1 < 0
\end{aligned} \tag{48}$$
The result in (48) contradicts (47) in Definition 1. So, the strategy profile $(A^*_{11}, A^*_{12}, \ldots, A^*_{1M})$ is not a pure strategy NE. Therefore, we can conclude that a strategy profile which violates constraint C1.3 will never be a pure strategy NE of game $\mathcal{G}_1$. This completes the proof.
According to Theorem 1, the feasibility of a solution of game $\mathcal{G}_1$ is governed by a condition (i.e., constraint C1.3), so a strategy profile of game $\mathcal{G}_1$ that does not satisfy this condition is not a feasible solution. Combining Theorem 1 with the utility formula (46) of game $\mathcal{G}_1$, any pure strategy violating C1.3 gets a negative utility value, so it will never become a pure strategy Nash equilibrium of game $\mathcal{G}_1$. Next, we study the existence of a feasible pure strategy NE of game $\mathcal{G}_1$.
Definition 2. Game $\mathcal{G} = [\mathcal{K}, \mathcal{S}, U]$ is an Ordinal Potential Game (OPG) if there exists a potential function $O: \mathcal{S} \to \mathbb{R}$ such that, $\forall i \in \mathcal{K}$, $\forall s_i, s'_i \in \mathcal{S}_i$ and $\forall s_{-i} \in \mathcal{S}_{-i}$,
$$U_i\left(s_i, s_{-i}\right) > U_i\left(s'_i, s_{-i}\right) \Leftrightarrow O\left(s_i, s_{-i}\right) > O\left(s'_i, s_{-i}\right) \tag{49}$$
Furthermore, the game $\mathcal{G}$ is an Exact Potential Game (EPG) if the potential function $O$ satisfies
$$U_i\left(s_i, s_{-i}\right) - U_i\left(s'_i, s_{-i}\right) = O\left(s_i, s_{-i}\right) - O\left(s'_i, s_{-i}\right) \tag{50}$$
From Definition 2, we can see that the EPG is a special kind of OPG. So, the EPG also has all the properties of the OPG.
Lemma 1. If game $\mathcal{G}$ is an OPG, it has at least one pure strategy NE.
Lemma 2. Let game $\mathcal{G}$ be an OPG and function $O$ be the potential function
of game $\mathcal{G}$. If the strategy profile $S^* \in \mathcal{S}$ maximizes function $O$, $S^*$ is a pure strategy NE of game $\mathcal{G}$.
Lemmas 1 and 2 indicate that every OPG has at least one pure strategy NE, namely one that maximizes the potential function of the game. Lemmas 1 and 2 come from the literature [43]. For Lemma 1, according to Definition 2, any change that improves an individual utility in an OPG also improves the potential function. Moreover, the number of strategy profiles of the game is finite. So, each improving move increases the potential function, and after a finite number of iterations no further improvement is possible; the resulting strategy profile is a pure strategy NE of the game. For Lemma 2, we give the following proof.
Proof: Assume that $\mathcal{G} = [\mathcal{K}, \mathcal{S}, U]$ is an OPG and function $O$ is the potential function of game $\mathcal{G}$. Since the strategy profile $S^* \in \mathcal{S}$ maximizes function $O$, we can get $O(S^*) > O(S')$, $\forall S' \in \mathcal{S} \setminus S^*$. So, according to Definition 2, we can get $U_k(S^*) > U_k(S')$, $\forall S' \in \mathcal{S} \setminus S^*$ and $\forall k \in \mathcal{K}$. Therefore, $S^*$ is a pure strategy NE of game $\mathcal{G}$ according to Definition 1.
Theorem 2. If $\eta_1 \ge \bar{\eta}_1$ and each scheduling period consists of enough time slots, game $\mathcal{G}_1$ is an EPG which has at least one pure strategy NE, and the globally optimal solution to the sub-problem P1 constitutes a pure strategy NE of game $\mathcal{G}_1$.
Proof: Obviously, $U_1$ is a potential function for game $\mathcal{G}_1$. Then we assume that
$$U_1\left(t^{ra}_m, p^u_n, p^u_r\right) = E_f\left(t^{ra}_m, p^u_n, p^u_r\right) + \eta_1 \sum_{m \in \mathcal{M}} \Phi\left(T^{bh}_m, T^{ra}_m\right) = E_f\left(t^{ra}_m, p^u_n, p^u_r\right) + \sum_{m \in \mathcal{M}} \phi_m \tag{51}$$
For any player $k$ and its strategy $(A_{1k}, A_{1-k})$, if we change its strategy unilaterally, then
$$U_1\left(A'_{1k}, A_{1-k}\right) - U_1\left(A_{1k}, A_{1-k}\right) = u_{1k}\left(A'_{1k}, A_{1-k}\right) - u_{1k}\left(A_{1k}, A_{1-k}\right) \tag{52}$$
According to Definition 2 and Lemma 1, we can conclude that game $\mathcal{G}_1$ is an EPG that has at least one pure strategy NE.
Then, we assume that $A^*_1 = (A^*_{11}, A^*_{12}, \ldots, A^*_{1M})$ is the globally optimal solution of the sub-problem P1, so that $E_f(A^*_1) > E_f(A'_1)$, $\forall A'_1 \in \mathcal{A}_1 \setminus A^*_1$. Moreover, for strategy $A^*_1$, $T^{bh}_m \ge T^{ra}_m$, $\forall m \in \mathcal{M}$. Therefore, we can get
$$E_f\left(A^*_1\right) + \sum_{m \in \mathcal{M}} \phi_m\left(A^*_1\right) > E_f\left(A'_1\right) + \sum_{m \in \mathcal{M}} \phi_m\left(A'_1\right) \tag{53}$$
That is, $U_1(A^*_1) > U_1(A'_1)$. So, strategy $A^*_1$ maximizes the potential function. According to Lemma 2, we can conclude that strategy $A^*_1$ is a pure strategy NE of game $\mathcal{G}_1$. This completes the proof.
Theorem 2 shows that game $\mathcal{G}_1$ is an exact potential game with at least one pure strategy Nash equilibrium. Therefore, we can find a feasible solution to game $\mathcal{G}_1$. In addition, since the optimal solution of the sub-problem P1 constitutes a pure strategy Nash equilibrium of game $\mathcal{G}_1$, a feasible solution of game $\mathcal{G}_1$ is also a feasible solution of the sub-problem P1.
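The statement that a maximizer of a common utility is a pure strategy NE can be checked by brute force on a toy game. The Python sketch below uses a hypothetical two-player game (one player picks a duration, the other a power); the throughput and power expressions are simple stand-ins chosen for illustration and are not the network model of this paper.

```python
import math
from itertools import product

# Hypothetical toy game: player 0 picks a duration t, player 1 a power p.
DURATIONS = (0.2, 0.4, 0.6, 0.8)
POWERS = (1.0, 2.0, 3.0)
STRATEGY_SETS = (DURATIONS, POWERS)
ETA = 10.0  # penalty coefficient, larger than any efficiency value below


def is_feasible(profile):
    """Toy version of C1.3: access throughput must not exceed backhaul."""
    t, p = profile
    access = t * math.log2(1.0 + p)
    backhaul = (1.0 - t) * 3.0
    return access <= backhaul


def common_utility(profile):
    """Toy efficiency plus a penalty of -ETA for infeasible profiles."""
    t, p = profile
    efficiency = t * math.log2(1.0 + p) / (t * p + 0.5)
    return efficiency + (0.0 if is_feasible(profile) else -ETA)


def is_pure_nash(profile, strategy_sets, utility):
    """True if no player gains by a unilateral deviation."""
    for k, options in enumerate(strategy_sets):
        for alt in options:
            deviated = profile[:k] + (alt,) + profile[k + 1:]
            if utility(deviated) > utility(profile):
                return False
    return True


# The profile that maximizes the common utility (the potential function).
best = max(product(*STRATEGY_SETS), key=common_utility)
```

Because every player shares the same utility, the utility itself serves as an exact potential function, and `is_pure_nash(best, STRATEGY_SETS, common_utility)` holds for the maximizer.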
4.1.2. Solution of joint D2D access and forwarding link resource allocation
Similarly, we formulate the second sub-problem as a non-cooperative game denoted by $\mathcal{G}_2 = [\mathcal{K}_2, \{\mathcal{A}_{2k}\}_{k \in \mathcal{K}_2}, \{u_{2k}\}_{k \in \mathcal{K}_2}]$, where $\mathcal{K}_2 = \{1, 2, \ldots, K_2\}$ is the set of players (i.e., all the relaying UEs and their source UEs) with $K_2 = 2R$, and $\mathcal{A}_{2k}$ and $u_{2k}$ are the set of available pure strategies and the utility function for player $k$, respectively. $u_{2k}$ is a function of the decision variables (i.e., $t^{D2D}_r$, $p^u_n$, $p^u_r$) and is thus defined as
$$u_{2k}\left(t^{D2D}_r, p^u_n, p^u_r\right) = U_2\left(t^{D2D}_r, p^u_n, p^u_r\right) = E_r\left(t^{D2D}_r, p^u_n, p^u_r\right) + \eta_2 \sum_{r \in \mathcal{R}} \left( \Phi\left(T^{D2D}_r, T^{fwd}_r\right) + \Phi\left(T^{fwd}_r, T^{fwd}_{r,\max}\right) \right) \tag{54}$$
where $\eta_2$ represents the penalty coefficient with the unit "bps/W". Each player $k \in \mathcal{K}_2$ is regarded as a relaying UE if $1 \le k \le R$, and its strategy is defined as $A_{2k} = (t^{D2D}_k, p^u_k)$. If $R + 1 \le k \le K_2$, player $k$ is regarded as a source UE and its strategy is defined as $A_{2k} = p^u_k$. The strategies of relaying UEs represent their D2D access durations and the transmission powers of their forwarding links, and the strategies of source UEs represent the transmission powers of their D2D links.
The first term in (54) corresponds to the average energy efficiency of access paths, and the second term represents constraint C2.4 of the second sub-problem for each access path, which implies that a player who chooses a strategy violating constraint C2.4 is punished.
Similar to the first sub-problem, we denote the maximum energy efficiency for the second sub-problem as $\bar{\eta}_2 = \max E_r(t^{D2D}_r, p^u_n, p^u_r)$. Then we can get the following theorems.
Theorem 3. If $\eta_2 \ge \bar{\eta}_2$, then a strategy profile which violates constraint C2.4 of game $\mathcal{G}_2$ will never be a pure strategy NE of game $\mathcal{G}_2$.
Theorem 4. If $\eta_2 \ge \bar{\eta}_2$ and each scheduling period consists of enough time slots, game $\mathcal{G}_2$ is an EPG which has at least one pure strategy NE, and the optimal solution to the sub-problem P2 constitutes a pure strategy NE of game $\mathcal{G}_2$.
The proofs of the above theorems follow those of Theorems 1 and 2 respectively, and they are omitted here due to the limited space. Similar to Theorems 1 and 2, Theorems 3 and 4 lead to the following conclusions: 1) any strategy violating constraint C2.4 gets a negative utility value according to formula (54), and thus it will never be a pure strategy Nash equilibrium of game $\mathcal{G}_2$; 2) game $\mathcal{G}_2$ is an exact potential game with at least one pure strategy Nash equilibrium, and thus we can find at least one feasible solution for game $\mathcal{G}_2$, which is also a feasible solution of sub-problem P2.
4.2. Centralized resource allocation algorithm with D2D communications
In this subsection, referring to the design idea of the centralized
resource allocation algorithm (CRA) in [9], we propose a centralized
resource allocation algorithm with D2D communications (CRA-D2D) to
get the pure strategy NEs of the first and second games, which is based
on the best response dynamic. The algorithm is described in Algorithm
1.
Since Algorithm 1 is a centralized algorithm, we can run it on a
powerful node (e.g., the MBS). Also, in order to meet the constraints of
sub-problem P1 and sub-problem P2, each game of Algorithm 1 must
start from a feasible initial strategy. Therefore, in Algorithm 1, the initial
access transmission duration for each SBS and the initial D2D
transmission duration for each source UE selecting a relaying UE are set to
the minimum of their respective available transmission duration sets.
Lines 3~9 of Algorithm 1 describe the process by which the MBS
executes the game operation for each player in the first game, where the
game process continues until the MBS determines that no player will
change its strategy. From line 9 of Algorithm 1, we can see that, if
any player's utility in the current round is better than that in the last round,
all the players have the opportunity to continue improving their
utilities by changing their strategies in the next game decision. Also, we
can see from lines 5~6 that, when each player has the opportunity to
make its decision, the MBS selects the strategy that maximizes that
player's current utility and updates its current strategy. The
execution process of the second game is similar to that of the first game, so
we do not elaborate on it. The convergence of the proposed Algorithm
1 is established in the following theorem, whose proof is similar to that
given in [42].
Theorem 5. If ∀ηi ≥ ηi, i ∈ {1,2} and each scheduling period consists of
enough time slots, Algorithm 1 converges to the feasible pure strategy
NEs of game 𝒢1 and game 𝒢2 respectively in finite steps from any initial
feasible strategy profile.
Proof. We first prove the result for game 𝒢1. For any feasible strategy, since we use
the best response dynamic, for ∀k ∈ 𝒦1 we have
$$u'_{1k}\left(A_{1k}^{t_1+1}, A_{1-k}^{t_1}\right) > u'_{1k}\left(A_{1k}^{t_1}, A_{1-k}^{t_1}\right) \tag{55}$$
and (55) will be satisfied in each iteration. We assume $U_1^*$ is the
maximum potential function value and $U_1^* < \infty$ because of the limited
number of players and strategies. According to Lemma 2, the strategy
profile which makes the potential function value equal to $U_1^*$ is a
pure strategy NE of game 𝒢1. Since game 𝒢1 is a potential
game, for ∀k ∈ 𝒦1 we have
$$u_{1k}\left(A_{1k}^{t_1+1}, A_{1-k}^{t_1}\right) - u_{1k}\left(A_{1k}^{t_1}, A_{1-k}^{t_1}\right) = U_1\left(A_{1k}^{t_1+1}, A_{1-k}^{t_1}\right) - U_1\left(A_{1k}^{t_1}, A_{1-k}^{t_1}\right) \tag{56}$$
So, we can get $U_1(A_{1k}^{t_1+1}, A_{1-k}^{t_1}) > U_1(A_{1k}^{t_1}, A_{1-k}^{t_1})$ for ∀k ∈ 𝒦1, which
means that the value of the potential function increases after each
iteration. Since $U_1^* < \infty$, we can assume that there is a time
$T_1$ ($0 < T_1 < \infty$) such that $U_1(A_{1k}^{T_1+1}, A_{1-k}^{T_1}) = U_1^*$ when $T_1$ is sufficiently large.
Therefore, Algorithm 1 converges to a feasible pure strategy NE of
game 𝒢1 in finite steps from any initial feasible strategy profile. The
proof for game 𝒢2 is similar to that for game 𝒢1.
Theorem 5 shows that both games in Algorithm 1 can start from a
feasible strategy and eventually converge to a feasible pure strategy
Nash equilibrium. This also means that, as long as we ensure that the
initial strategies of the two games do not violate C1.3 and C2.4, we can
finally get a feasible solution satisfying all the constraints through
Algorithm 1.
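The round-robin best response loop behind Algorithm 1 can be illustrated with a minimal sketch. It does not use the paper's utilities (46) and (54). Instead it assumes a hypothetical toy game in which each player picks one of two channels and its utility is minus the number of other players on the same channel, which is an exact potential game whose potential is minus the number of colliding pairs. The names `best_response_dynamics`, `congestion_utility` and `potential` are illustrative only, and the loop stops when no player changes its strategy, a simplification of the stopping rule in line 9 of Algorithm 1.

```python
from typing import Callable, List, Sequence, Tuple

Profile = List[int]


def best_response_dynamics(
    strategy_sets: Sequence[Sequence[int]],
    utility: Callable[[int, Profile], float],
    initial: Profile,
    max_rounds: int = 100,
) -> Tuple[Profile, int]:
    """Round-robin best response; returns the final profile and the rounds used."""
    profile = list(initial)
    for round_index in range(1, max_rounds + 1):
        changed = False
        for k, options in enumerate(strategy_sets):
            best, best_u = profile[k], utility(k, profile)
            for s in options:
                trial = profile[:k] + [s] + profile[k + 1:]
                u = utility(k, trial)
                if u > best_u:  # strict improvement only, so ties keep the current strategy
                    best, best_u = s, u
            if best != profile[k]:
                profile[k] = best
                changed = True
        if not changed:  # no unilateral improvement exists: a pure strategy NE
            return profile, round_index
    return profile, max_rounds


def congestion_utility(k: int, profile: Profile) -> float:
    """Toy utility (assumption): minus the number of other players on player k's channel."""
    return -float(sum(1 for j, s in enumerate(profile) if j != k and s == profile[k]))


def potential(profile: Profile) -> float:
    """Exact potential of the toy game: minus the number of colliding pairs."""
    n = len(profile)
    return -float(sum(1 for i in range(n) for j in range(i + 1, n) if profile[i] == profile[j]))


final_profile, rounds = best_response_dynamics([[0, 1]] * 4, congestion_utility, [0, 0, 0, 0])
print(final_profile, rounds, potential(final_profile))
```

In this toy instance every strategy change strictly raises the potential, which is the mechanism the proof of Theorem 5 relies on.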
4.3. Decentralized resource allocation algorithm with D2D communications
Compared with the exhaustive search algorithm, the centralized
algorithm proposed in the previous subsection can significantly reduce the
computational complexity. However, its disadvantage is that the
node running this algorithm needs to know the channel state
information (CSI) of all the players, which incurs significant system
overhead. The authors in [9] proposed a decentralized resource allocation
algorithm (DRA) based on log-linear learning to reduce system overhead,
and a concurrent decentralized resource allocation algorithm
(CDRA) to further improve convergence speed. However, since it is
difficult to accurately calculate the network energy efficiency
with local information, we only consider the application of the DRA
algorithm in this paper. Thus, in this subsection, we design our
decentralized algorithm based on the idea of the DRA, whose convergence
has been proved in [44–46]. The decentralized resource
allocation algorithm with D2D communications (DRA-D2D) is described
in Algorithm 2.
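The strategy updating rule (57) of Algorithm 2 is a two-option softmax over the old and the explored utility. The sketch below is one way to evaluate it, assuming the utilities are plain floats in bps. It rewrites the ratio as a logistic function of the utility difference so that large values of βu/ε cannot overflow. The function names are hypothetical, and the values β = 100 and ε1 = 10^9 bps are taken from Table 2.

```python
import math
import random


def adoption_probability(u_new: float, u_old: float, beta: float, eps: float) -> float:
    """Pr(adopt the explored strategy) = e^{b*u_new/eps} / (e^{b*u_new/eps} + e^{b*u_old/eps}).

    Evaluated as a logistic function of the difference to stay numerically stable.
    """
    x = beta * (u_new - u_old) / eps
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def update_strategy(current, explored, u_old, u_new, beta, eps, rng):
    """Keep the explored strategy with the probability given by rule (57)."""
    p = adoption_probability(u_new, u_old, beta, eps)
    return explored if rng.random() < p else current


rng = random.Random(0)
beta, eps1 = 100.0, 1e9  # learning and scaling parameters from Table 2
p_equal = adoption_probability(1.0e9, 1.0e9, beta, eps1)
p_better = adoption_probability(1.2e9, 1.0e9, beta, eps1)
p_worse = adoption_probability(1.0e9, 1.2e9, beta, eps1)
print(p_equal, p_better, p_worse)
```

With these parameters a utility gain of 0.2 Gbps already makes adoption almost certain, while a loss of the same size is accepted only rarely, so a small amount of exploration remains.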
The scaling parameter εi (i ∈ {1,2}) is used to keep the exponential
terms in the update rule finite, and its specific value can be set
according to experience. The information exchange procedure for
calculating utilities in the first game of Algorithm 2 is similar to that
in [9], and its details can be found in Section III of [9]. However, in
the second game of Algorithm 2, when the selected player l calculates
its utility, the procedure of exchanging information is as follows:
1) player l sends a request for collecting other
players' utilities to the MBS through its C-plane; 2) after receiving the
request, the MBS broadcasts a command message to all the SBSs to
command them to report their D2D access throughputs and forwarding
throughputs through the C-plane; 3) after receiving the command message
from the MBS, all the SBSs order their relaying UEs to report their D2D
access throughputs and forwarding throughputs to the MBS through
their C-plane; 4) the MBS first evaluates $\Phi(T_r^{D2D}, T_r^{fwd})$ and
$\Phi(T_r^{fwd}, T_{r,\max}^{fwd})$ for each relaying UE r, then calculates the network utility,
and finally forwards the result to player l through the C-plane.
In fact, in both CRA-D2D and DRA-D2D, the utility of each player is
calculated by the MBS, since calculating utility requires global
information. In CRA-D2D, each player actively reports its CSI-related
information to the MBS according to a pre-determined strategy. The
more frequent such reporting is, the more timely the information is,
but the higher the network transmission cost is. In DRA-D2D, however,
a player asks the MBS to calculate the utility used in
its game decision only when it is selected by the MBS to make a game
decision. Only then does the MBS start collecting the information for
calculating utility, and thus the network transmission cost of DRA-D2D
is usually smaller than that of CRA-D2D. However, since only one player is
selected in each game round to execute the decision in DRA-D2D, its
convergence speed is much slower than that of CRA-D2D.
In addition, we investigate the stability and optimality of the
proposed Algorithm 2 based on the proofs in [45,46].
Theorem 6. For any game, if all the players of the corresponding game
adhere to Algorithm 2, the unique stationary distribution ρi(A) of any
strategy profile A in the game is given as:
$$\rho_i(A) = \frac{\exp\{\beta U_i(A)/\varepsilon_i\}}{\sum_{\hat{A} \in \mathcal{A}_i} \exp\{\beta U_i(\hat{A})/\varepsilon_i\}} \tag{58}$$
where i ∈ {1, 2}, 𝒜i is the space of power control and transmission
duration allocation strategy profiles for all the players in game 𝒢i, and
Ui(⋅) is the potential function of game 𝒢i.
Proof. We first prove the result for game 𝒢1. We denote all the players' state vector
at the t-th iteration as $A_1(t) = (A_{11}^t, A_{12}^t, \ldots, A_{1M}^t)$. Obviously, $A_1(t)$ is a
discrete time Markov process, which is irreducible and aperiodic.
Therefore, it has a unique stationary distribution, and then we only need
to prove that the distribution in (58) satisfies the following equation.
$$\sum_{X \in \mathcal{A}_1} \rho_1(X) \Pr(Y|X) = \rho_1(Y) \tag{59}$$
where X, Y ∈ 𝒜1 are arbitrary resource allocation states, and Pr(Y|X) is
the transition probability from X to Y.
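First of all, we suppose that $X = (A_{11}, \ldots, A_{1i}, \ldots, A_{1M})$, where the
iteration index t is omitted. In Algorithm 2, only one player is selected to
change its strategy. Without loss of generality, we can assume that
$Y = (A_{11}, \ldots, A'_{1i}, \ldots, A_{1M})$. Since player i is selected with probability $1/K_1$, we can get
$$\rho_1(X)\Pr(Y|X) = \frac{\exp\{\beta U_1(X)/\varepsilon_1\}}{\sum_{\hat{A} \in \mathcal{A}_1} \exp\{\beta U_1(\hat{A})/\varepsilon_1\}} \times \frac{1}{K_1} \times \frac{\exp\{\beta u_{1i}(Y)/\varepsilon_1\}}{\exp\{\beta u_{1i}(X)/\varepsilon_1\} + \exp\{\beta u_{1i}(Y)/\varepsilon_1\}} \tag{60}$$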
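Denoting γ as
$$\gamma = \frac{1}{K_1} \times \frac{1}{\sum_{\hat{A} \in \mathcal{A}_1} \exp\{\beta U_1(\hat{A})/\varepsilon_1\}} \times \frac{1}{\exp\{\beta u_{1i}(X)/\varepsilon_1\} + \exp\{\beta u_{1i}(Y)/\varepsilon_1\}} \tag{61}$$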
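Then we can get
$$\rho_1(X)\Pr(Y|X) = \gamma \exp\{\beta U_1(X)/\varepsilon_1\} \times \exp\{\beta u_{1i}(Y)/\varepsilon_1\} = \gamma \exp\{\beta U_1(X)/\varepsilon_1 + \beta u_{1i}(Y)/\varepsilon_1\} \tag{62}$$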
According to the symmetry property, we can obtain
$$\rho_1(Y)\Pr(X|Y) = \gamma \exp\{\beta U_1(Y)/\varepsilon_1 + \beta u_{1i}(X)/\varepsilon_1\} \tag{63}$$
Since $u_{1i}(X) - u_{1i}(Y) = U_1(X) - U_1(Y)$, the following equation holds
$$\rho_1(X)\Pr(Y|X) = \rho_1(Y)\Pr(X|Y) \tag{64}$$
Obviously, (64) also holds when X = Y. Therefore, we have
$$\sum_{X \in \mathcal{A}_1} \rho_1(X)\Pr(Y|X) = \sum_{X \in \mathcal{A}_1} \rho_1(Y)\Pr(X|Y) = \rho_1(Y) \sum_{X \in \mathcal{A}_1} \Pr(X|Y) = \rho_1(Y) \tag{65}$$
which implies that the distribution in (58) satisfies the balanced
stationary equation of the Markov process $A_1(t)$. Since $A_1(t)$ has a unique
stationary distribution, we can conclude that its stationary distribution
must be (58), which completes the proof. The proof for game 𝒢2 is
similar to that for game 𝒢1 and is therefore omitted.
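The detailed balance argument in (59)–(65) can be checked numerically on a small example. The sketch below assumes a hypothetical toy exact potential game (three players, two channels, utility equal to minus the number of other players on the same channel) rather than the paper's games, and it assumes that the selected player explores one of its other strategies uniformly at random. It builds the full transition matrix of the log-linear rule and verifies that the Gibbs distribution (58) is left unchanged by one step.

```python
import itertools
import math


def utility(i, profile):
    """Toy utility (assumption): minus the number of other players on player i's channel."""
    return -sum(1 for j, s in enumerate(profile) if j != i and s == profile[i])


def potential(profile):
    """Exact potential of the toy game: minus the number of colliding pairs."""
    n = len(profile)
    return -sum(1 for a in range(n) for b in range(a + 1, n) if profile[a] == profile[b])


def gibbs_distribution(profiles, beta_over_eps):
    """Distribution (58): rho(A) proportional to exp(beta * U(A) / eps)."""
    weights = [math.exp(beta_over_eps * potential(p)) for p in profiles]
    total = sum(weights)
    return [w / total for w in weights]


def transition_matrix(profiles, strategy_sets, beta_over_eps):
    """One step: pick a player uniformly, explore uniformly, accept as in rule (57)."""
    index = {p: n for n, p in enumerate(profiles)}
    num_players = len(strategy_sets)
    matrix = [[0.0] * len(profiles) for _ in profiles]
    for x in profiles:
        row = matrix[index[x]]
        for i, options in enumerate(strategy_sets):
            alternatives = [s for s in options if s != x[i]]
            for s in alternatives:
                y = x[:i] + (s,) + x[i + 1:]
                stay = math.exp(beta_over_eps * utility(i, x))
                move = math.exp(beta_over_eps * utility(i, y))
                row[index[y]] += move / (stay + move) / (num_players * len(alternatives))
        row[index[x]] = 1.0 - sum(row)  # remaining mass: the profile is unchanged
    return matrix


strategy_sets = [(0, 1)] * 3
profiles = list(itertools.product(*strategy_sets))
rho = gibbs_distribution(profiles, 1.5)
P = transition_matrix(profiles, strategy_sets, 1.5)
rho_next = [sum(rho[a] * P[a][b] for a in range(len(profiles))) for b in range(len(profiles))]
stationarity_error = max(abs(r - s) for r, s in zip(rho, rho_next))
print(stationarity_error)
```

The check succeeds only because the utility difference of the deviating player equals the potential difference. Replacing `utility` with a function that breaks this property would in general make the error nonzero.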
Theorem 7. Given that ηi > ηi, ∀i ∈ {1,2} and each scheduling period
consists of enough time slots, the proposed Algorithm 2 achieves the
global optimal solution of sub-problem P1 and sub-problem
P2 respectively with an arbitrarily high probability, if β/εi is
sufficiently large.
Proof. We first prove the result for game 𝒢1. We assume that $A_1^{opt}$ is the global
optimal solution of game 𝒢1, and $A_1^{opt}$ is a pure strategy NE of game
𝒢1 according to Theorem 2 which maximizes the potential function value,
that is, $U_1(A_1^{opt}) > U_1(A_1)$, $\forall A_1 \in \mathcal{A}_1 \setminus \{A_1^{opt}\}$. Therefore, we can get
$\exp\{\beta U_1(A_1^{opt})/\varepsilon_1\} \gg \exp\{\beta U_1(A_1)/\varepsilon_1\}$, $\forall A_1 \in \mathcal{A}_1 \setminus \{A_1^{opt}\}$, when β/ε1 is
sufficiently large. According to (58), we can obtain
$\lim_{\beta/\varepsilon_1 \to \infty} \rho_1(A_1^{opt}) = 1$ and
$\lim_{\beta/\varepsilon_1 \to \infty} \rho_1(A_1) = 0$, which means that the probability of choosing the
global optimal solution approaches 1 and the probability of choosing any
other solution approaches 0. Hence, we can conclude that Algorithm 2 achieves the global
optimal solution of sub-problem P1 with arbitrarily high probability, if
β/ε1 is sufficiently large. This completes the proof. The proof for
game 𝒢2 is similar to that for game 𝒢1 and is therefore
omitted.
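The limit used in the proof of Theorem 7 can be made concrete with a short numerical sketch. It assumes three hypothetical strategy profiles with potential values 3, 2 and 1 (arbitrary units, unique maximum) and evaluates the stationary probability (58) of the best profile for several values of β/ε. The maximum is subtracted before exponentiating, which leaves the ratio in (58) unchanged and avoids overflow.

```python
import math


def gibbs_probabilities(potentials, beta_over_eps):
    """Distribution (58) over a list of potential values, computed stably."""
    peak = max(potentials)
    weights = [math.exp(beta_over_eps * (u - peak)) for u in potentials]
    total = sum(weights)
    return [w / total for w in weights]


potentials = [3.0, 2.0, 1.0]  # hypothetical potential values; index 0 is the optimum
ratios = (0.0, 1.0, 5.0, 20.0)  # increasing values of beta / eps
optimum_mass = [gibbs_probabilities(potentials, r)[0] for r in ratios]
for r, mass in zip(ratios, optimum_mass):
    print(r, mass)
```

At β/ε = 0 all profiles are equally likely, and the probability of the optimum grows monotonically toward 1 as β/ε increases, which is the behaviour the theorem describes.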
According to Theorems 6 and 7, Algorithm 2, which makes decisions
according to formula (57), is stable and can obtain the
optimal solution. First, Theorem 6 shows that, for both game
𝒢1 and game 𝒢2, the strategy profile follows a unique stationary
distribution, which means that, no matter what the initial strategy is,
the algorithm converges to the same distribution after sufficiently many
iterations. Also, Theorem 7 shows that, when β/εi is sufficiently large,
this distribution concentrates on the optimal solution of the
corresponding sub-problem. Therefore, after sufficiently many iterations,
Algorithm 2 converges to the optimal strategy with arbitrarily high probability.
4.4. Complexity analysis
In this subsection, we compare the computational complexity of the
proposed algorithms, the algorithms most relevant to ours (i.e., the
CRA and DRA algorithms [9]), and the exhaustive search algorithm.
CRA-D2D needs to search
$O(\zeta_1 \sum_{k=1}^{K_1} M_k^1 \delta_1 + \zeta_2 \sum_{k=1}^{K_2} M_k^2 \delta_2)$ combinations, where ζ1 and ζ2 represent
the numbers of iterations of the first game and the second game
respectively, which are less than 4 in the simulations, $M_k^1$ and $M_k^2$ represent the
complexity of choosing the strategy with the maximum utility in the first
game and the second game, and δ1 and δ2 represent the computation
complexity of the utilities of game 𝒢1 and game 𝒢2 respectively.
Different data processing methods have a great influence on $M_k^1$ and $M_k^2$.
For instance, assume that the number of strategies of player k of
game 𝒢1 is $V_k^1$. If we do not perform data processing for the utility values, then
$M_k^1 = V_k^1$. However, if we sort the utility values after each calculation,
then $M_k^1 = \log_2 V_k^1$. Therefore, choosing a data processing
method according to the number of strategies can effectively reduce the
complexity of calculation.
According to [47] and [9], the computational complexity
of DRA-D2D is $\pi_1(O(2\delta_1) + O(\sigma_1)) + \pi_2(O(2\delta_2) + O(\sigma_2))$, where
π1 and π2 denote the numbers of iterations of the first game and the second
game respectively, and σ1 and σ2 are the strategy updating complexities,
each of which includes 2 exponents, 1 sum and 2 divisions. Both
σ1 and σ2 are small constants.
According to [9], the complexity of the CRA algorithm
and the DRA algorithm is $O(\zeta_1 \sum_{k=1}^{K_1} M_k^1 \delta_1)$ and $\pi_1(O(2\delta_1) + O(\sigma_1))$,
respectively. Obviously, the complexity of our proposed algorithms is of
the same order of magnitude as that of the most relevant algorithms. However, our
scheme has better network performance, especially when the network
communication condition is relatively poor. In addition, although the
exhaustive search algorithm can usually find the optimal solution, which
is always very difficult for our proposed algorithms, it needs to search
$O(\prod_{k=1}^{K_1} M_k^1 \delta_1 + \prod_{k=1}^{K_2} M_k^2 \delta_2)$ combinations, which
is far more than our proposed algorithms require, and the gap grows with
the network scale. Therefore, our proposed
algorithms have more practical value.
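The gap between the sum in the CRA-D2D bound and the product in the exhaustive search bound is easy to see with concrete numbers. The sketch below counts utility evaluations for one game only, assuming no sorting (so that M_k = V_k) and ignoring the per-utility cost δ. The figures of 10 players, 6 strategies per player and 3 iterations are illustrative assumptions, not values reported by the paper for this purpose.

```python
import math
from typing import Sequence


def best_response_evaluations(iterations: int, strategies_per_player: Sequence[int]) -> int:
    """Evaluations for round-robin best response: iterations * sum_k V_k."""
    return iterations * sum(strategies_per_player)


def exhaustive_evaluations(strategies_per_player: Sequence[int]) -> int:
    """Evaluations for exhaustive search: prod_k V_k joint strategy profiles."""
    return math.prod(strategies_per_player)


sizes = [6] * 10  # hypothetical: 10 players with 6 strategies each
print(best_response_evaluations(3, sizes))  # 180
print(exhaustive_evaluations(sizes))        # 60466176
```

Adding one more player with 6 strategies raises the first count by 18 but multiplies the second by 6, which is why the gap widens with the network scale.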
5. Performance evaluation
5.1. Simulation setting
The simulation scenario is shown in Fig. 2, where the radius of macro
cell coverage is 500 meters and that of small cell coverage is 100 meters.
There are five groups of non-overlapping small cells in the macro
cell, where the distance between the MBS and each group center is at
least 200 meters. The small cells within each group do not overlap.
Besides, for simplicity without loss of generality, we assume that
there are four UEs randomly distributed in the coverage of each small
cell and each SBS can simultaneously connect to two UEs at most. The
angle of departure (AOD) and angle of arrival (AOA) for each beam pair
link can be estimated by using the method in [48].
We adopt the mmWave channel model of the 28 GHz band, and consider
two types of mmWave link states (i.e., LOS and NLOS). In order to
better compare the changes of performance metrics after using D2D
communications, we assume that the two access links of each SBS are in
the NLOS state, while we also show the performance in the situation where
each SBS has one NLOS access link and one LOS access link.
In addition, we assume that the beam width of the MBS is 2° or 5° and the
range of the beam width of each UE or SBS is 10°∼60°. Unless otherwise
stated, we assume that the beam width of each UE or SBS is set to 30° and
the beam width of the MBS is set to 5°. The other main simulation
parameters are listed in Table 2.
5.2. Comparison algorithms and performance metrics
We evaluate the performance of our two algorithms (i.e., CRA-D2D
and DRA-D2D) and the two comparison algorithms (i.e. CRA and
DRA) in [9]. For convenience and intuition, the comparison algorithms
are called CRA-ND2D and DRA-ND2D since they do not use D2D
communications.
The two comparison algorithms take network sum rate as the
optimization objective to allocate network resources, while our two
algorithms take network energy efficiency as the optimization objective to
allocate network resources. The two comparison algorithms only need to
solve one game problem, while our two algorithms need to solve two
game sub-problems, where the result of the first game sub-problem is the
basis for solving the second game sub-problem. The information
exchange procedure and the exchanged content for calculating game
utilities in the second game of DRA-D2D are different from those of
DRA-ND2D, while the corresponding procedure and content in the first game
of DRA-D2D are similar to those of DRA-ND2D.
In the simulations, the interference distance threshold dth of a
non-interfering player is set to 70 meters. It is worth noting that, in both
the first game and the second game, unless otherwise specified, the maximum
number of iterations of the two centralized algorithms is set to 3 and that
of the two decentralized algorithms is set to 1000.
We will observe the performance of the above algorithms in terms of
the network energy efficiency and the network sum rate. The network
energy efficiency is defined as the ratio of network access throughput to
total power consumption in a scheduling period, while the network sum
rate is defined as the network access throughput in a scheduling period.
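The two metrics defined above reduce to a sum and a ratio. The following sketch states them as code, assuming that per-link access throughputs are given in bps and per-node power consumptions in watts over the same scheduling period. The function names and the sample numbers are hypothetical and are not simulation results.

```python
from typing import Sequence


def network_sum_rate(access_throughputs_bps: Sequence[float]) -> float:
    """Network sum rate: total access throughput in a scheduling period (bps)."""
    return float(sum(access_throughputs_bps))


def network_energy_efficiency(
    access_throughputs_bps: Sequence[float],
    power_consumptions_w: Sequence[float],
) -> float:
    """Network energy efficiency: access throughput divided by total power (bit/J)."""
    total_power = float(sum(power_consumptions_w))
    if total_power <= 0.0:
        raise ValueError("total power consumption must be positive")
    return network_sum_rate(access_throughputs_bps) / total_power


throughputs = [2.0e9, 1.0e9]  # hypothetical access throughputs of two links
powers = [5.0, 1.0]           # hypothetical power consumptions in watts
print(network_sum_rate(throughputs), network_energy_efficiency(throughputs, powers))
```

Because the denominator includes every consumer of power, a change that raises throughput can still lower the energy efficiency when it costs proportionally more power.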
5.3. Simulation results and analysis
5.3.1. Convergence behaviors of the proposed solutions
Fig. 3 shows the convergence behaviors of the CRA-D2D algorithm,
while Fig. 4 shows the convergence behaviors of the DRA-D2D
algorithm. The simulation parameters are the same as those in Table 2. From
Fig. 3(a), we can see that the CRA-D2D algorithm can always achieve
Fig. 2. Simulation scenario.
Table 2
Simulation parameters.
| Notation | Description | Value |
|---|---|---|
| Qts | The number of time slots in a scheduling period | 10 |
| Qra | The number of sub-slots (smaller time slots) in a time slot | 10 |
| fc | Carrier frequency | 28 GHz |
| W | Bandwidth | 1 GHz |
| N0 | Background noise power spectrum density | -150 dBm/Hz |
| ℘u | Transmission power set of UE | [0.05, 0.1, 0.2] Watts |
| ℘su | Transmission power set of source UE | [0.0005, 0.001, 0.01, 0.05, 0.1, 0.2] Watts |
| $p_m^s$ | Transmission power of backhaul link | 5 Watts |
| Γref | Reflection coefficient | 0.3 |
| ε1 | Scaling parameter for game 𝒢1 | $10^9$ bps |
| ε2 | Scaling parameter for game 𝒢2 | $10^{10}$ bps |
| β | Learning parameter | 100 |
| PRF | The energy consumed by an RF chain | 0.0344 Watts |
Algorithm 1
CRA-D2D.
Input: the initial strategy 𝒬ts(1) or random power for all the players of game 𝒢1 and the
initial strategy 𝒬ra(1) or random power for all the players of game 𝒢2
Output: the final strategy
1. First game: joint access and backhaul resource allocation
2. Set iteration index $t_1 = 0$;
3. repeat
4.   for k = 1 to $K_1$ do
5.     $A_{1k}^{t_1+1} = \arg\max_{A_{1k} \in \mathcal{A}_1} u'_{1k}(A_{1k}, A_{1-k})$;
6.     Update $A_{1k}^{t_1} = A_{1k}^{t_1+1}$;
7.   end for
8.   Update $t_1 = t_1 + 1$;
9. until $u'_{1k}(A_{1k}^{t_1}, A_{1-k}^{t_1}) = u'_{1k}(A_{1k}^{t_1-1}, A_{1-k}^{t_1-1})$ ∀k ∈ 𝒦1
10. Second game: access paths resource allocation
11. Set iteration index $t_2 = 0$;
12. repeat
13.   for k = 1 to $K_2$ do
14.     $A_{2k}^{t_2+1} = \arg\max_{A_{2k} \in \mathcal{A}_2} u'_{2k}(A_{2k}, A_{2-k})$;
15.     Update $A_{2k}^{t_2} = A_{2k}^{t_2+1}$;
16.   end for
17.   Update $t_2 = t_2 + 1$;
18. until $u'_{2k}(A_{2k}^{t_2}, A_{2-k}^{t_2}) = u'_{2k}(A_{2k}^{t_2-1}, A_{2-k}^{t_2-1})$ ∀k ∈ 𝒦2
convergence within 3 iterations in the first game process. In addition, we
can also see that it can achieve convergence within 2 iterations in the
second game process from Fig. 3(b). The main reason behind this rapid
convergence is that the node running the CRA-D2D algorithm executes a
decision operation for each player in each game round, and the best
response strategy is applied to each game decision process.
From Fig. 3, we can see that, as the network scale increases, the
performance of the CRA-D2D algorithm gets worse in the first game
process, while the reverse is true in the second game process. This is because
a larger network scale means more SBSs, and thus there are
more concurrent backhaul transmissions to the MBS in the first game
process, which results in greater co-frequency interference.
However, in the second game process, the number of access paths for each
SBS does not exceed 2 in this simulation, so the co-frequency
interference is very small. Moreover, the mutual interference
between different SBSs' access paths may be small. Therefore, when
the number of SBSs in our simulation increases, the number of
access paths also increases. At this point, although there is a
corresponding increase in mutual interference, the total access throughput
increases even more, and thus the overall network performance is better.
By comparing the results in Fig. 4 with those in Fig. 3, we can see that
the convergence speed of DRA-D2D is much slower than that of
CRA-D2D. The main reason is that the DRA-D2D
algorithm only allows one player to make a decision in each game
round. Also, from Fig. 4, we can see that, in both the first game
process and the second game process, as the network scale increases,
the convergence speed of the DRA-D2D algorithm becomes slower.
The reason is straightforward: since only one player gets a decision-making
opportunity in each game round, an increasing number of players
naturally reduces the convergence speed of the DRA-D2D algorithm.
Fig. 4(a) shows that the impact of network expansion on the
performance of DRA-D2D follows a trend similar to that in Fig. 3(a),
and the corresponding explanation also applies here. However,
although Fig. 4(b) broadly reflects a trend similar to Fig. 3(b), there is
some irregularity when there are more SBSs. This may be because
DRA-D2D randomly selects a player in each game
round, so some favorable players may hardly be selected
and thus have less opportunity to adjust their strategies to a more optimized
state, especially when there are a lot of players. Although this possibility
is small, it can hardly be avoided by the DRA-D2D algorithm.
5.3.2. Performance comparisons for different solutions
Fig. 5 compares the four algorithms in terms of
network energy efficiency and network sum rate. In Fig. 5, regardless of
the number of SBSs, the network energy efficiency
and the network sum rate of CRA-D2D are better than those of CRA-ND2D, and the
network energy efficiency of DRA-D2D is similar to that of DRA-ND2D while the
network sum rate of DRA-D2D is better than that of DRA-ND2D. The main
reason is that NLOS links significantly reduce the network
performance, especially in terms of network throughput, while using D2D
Algorithm 2
DRA-D2D.
Input: the initial strategy $A_{1k}^0$ and $A_{2l}^0$ for all k ∈ 𝒦1 and l ∈ 𝒦2, the maximum time
$T_{\max}^1$ and $T_{\max}^2$
Output: the final strategy
1. First game: joint access and backhaul resource allocation
2. Set time iteration $t_1 = 0$
3. repeat
4.   Player selection: Player k is randomly selected at time step $t_1$. Then, player
k exchanges information with other players by C-plane to calculate its utility $u_{1k}^{t_1}$ by
Eq. (46).
5.   Strategy exploration: Player k randomly, independently and autonomously
chooses a new strategy and performs this strategy in an estimation period, and then
other players calculate their utilities by Eq. (46). Player k exchanges information
with other players by C-plane again to calculate its new utility $\hat{u}_{1k}^{t_1}$.
6.   Strategy updating: Player k updates its strategy according to the rule:
$$\begin{cases} \Pr\left(A_{1k}^{t_1+1} = \hat{A}_{1k}^{t_1}\right) = \dfrac{\exp\{\beta \hat{u}_{1k}^{t_1}/\varepsilon_1\}}{\Theta} \\ \Pr\left(A_{1k}^{t_1+1} = A_{1k}^{t_1}\right) = \dfrac{\exp\{\beta u_{1k}^{t_1}/\varepsilon_1\}}{\Theta} \end{cases} \tag{57}$$
where β is a learning parameter, εi (i ∈ {1,2}) is a scaling parameter, Pr(⋅) is the
probability of the event in (⋅) and $\Theta = \exp\{\beta \hat{u}_{1k}^{t_1}/\varepsilon_1\} + \exp\{\beta u_{1k}^{t_1}/\varepsilon_1\}$. Meanwhile,
all other players keep their strategies unchanged, i.e., $A_{1k'}^{t_1+1} = A_{1k'}^{t_1}$ for all $k' \in \mathcal{K}_1 \setminus \{k\}$.
7.   Update $t_1 = t_1 + 1$.
8. until $t_1 \geq T_{\max}^1$.
9. Second game: access paths resource allocation
10. Set time iteration $t_2 = 0$
11. repeat
12.   Player selection: Player l is randomly selected at time step $t_2$. Then, player
l exchanges information with other players by C-plane to calculate its utility $u_{2l}^{t_2}$ by
Eq. (54).
13.   Strategy exploration: Player l randomly, independently and autonomously chooses a
new strategy and performs this strategy in an estimation period, and then other
players calculate their utilities by Eq. (54). Player l exchanges information with
other players by C-plane again to calculate its new utility $\hat{u}_{2l}^{t_2}$.
14.   Strategy updating: Player l updates its strategy according to a rule
similar to (57), while all other players keep their strategies unchanged, i.e.,
$A_{2l'}^{t_2+1} = A_{2l'}^{t_2}$ for all $l' \in \mathcal{K}_2 \setminus \{l\}$.
15.   Update $t_2 = t_2 + 1$.
16. until $t_2 \geq T_{\max}^2$.
Fig. 3. Convergence behaviors of CRA-D2D in different network scales. (a) the first game process (b) the second game process.
communications can effectively reduce the decline of the network
performance. It is worth mentioning that performance differences between
using CRA-D2D and using DRA-D2D are mainly due to the different ways
of convergence and the limit of the number of iterations.
5.3.3. Effects of different factors on network performance
Next, we further investigate the effect of the radius of each SBS,
the thermal noise density, the transmission environment, the transmission
power of backhaul links, and the beam width of UEs and SBSs on the
network energy efficiency and the network sum rate. By changing the
radius of each SBS and the thermal noise density, we expect to demonstrate the
greater adaptability of our solutions to sparse base station
deployments and high ambient noise environments. Reducing the
transmission power of backhaul links and increasing the beam width of UEs
and SBSs make the communication environment of mmWave networks
worse. In this case, on the one hand, the signal received at
the receiver (e.g., the MBS) is weaker; on the other hand, the receiver is
subject to greater interference. As a result, the
signal-to-interference-plus-noise ratio (SINR) is lower. We expect to
demonstrate our solutions' ability to cope
with a more hostile communication environment. The corresponding
results are shown in Figs. 6–10.
Fig. 6 shows that the two performance indexes vary with the radius
of each SBS in a network with M = 10 and N = 40. When the radius of
each SBS is 20 meters or 50 meters, the simulation scenario is similar to
that of [9]. The other parameters are the same as those in Table 2. From
Fig. 6(a), we can see that with the increase of the radius, the network
energy efficiency of CRA-ND2D and DRA-ND2D will generally decrease,
but the network energy efficiency of CRA-D2D and DRA-D2D can be
relatively stable.
From Fig. 6(b), it is obvious that no matter what algorithm we use,
the network sum rate will decrease with the increase of the radius. In
addition, when the radius of each SBS is 100 meters, both the network
energy efficiency and the network sum rate of CRA-D2D and DRA-D2D
are better than those of CRA-ND2D and DRA-ND2D. The main reason is
that when the radius increases, the average SINR of links will decrease,
so the impact of NLOS links on network performance will increase and
the role of D2D communications will be more prominent. It should be
noted that a larger radius will reduce the network sum rate, but it can
also reduce the cost of network deployment.
In reality, seamless coverage is difficult to achieve because of the
huge investment required for dense base station deployment. It is a
feasible way to densely deploy base stations in hotspots. However,
hotspot prediction is not always accurate and thus there will
inevitably be weak coverage areas in a certain period or in a certain region.
Therefore, it is also important to consider the adaptability of a scheme to
relatively sparse networks.
Fig. 7 shows that the two performance indexes vary with thermal
noise density, where the network scale is the same as that in Fig. 6. It can
Fig. 4. Convergence behaviors of DRA-D2D in different network scales. (a) the first game process (b) the second game process.
Fig. 5. Performance comparisons vs. the number of SBSs.
be seen from Fig. 7 that no matter what algorithm we use, both the
network energy efficiency and the network sum rate generally decrease
with the increase of thermal noise density. When thermal noise density
is -150 dBm/Hz, both the network energy efficiency and the network
sum rate of CRA-D2D and DRA-D2D are better than those of CRA-ND2D
and DRA-ND2D. The main reason is similar to that of Fig. 6. It is worth
mentioning that thermal noise density of -150 dBm/Hz may be relatively
strong, but the interference from other links may be of the same order of
magnitude due to the presence of spectrum multiplexing in practical
applications. Therefore, there is also the possibility of strong noise.
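To put the noise density in perspective, it can be converted into a total noise power over the 1 GHz bandwidth of Table 2. The sketch below assumes the noise is flat over the band, so the power in dBm is the density plus 10·log10(W). The function names are hypothetical. The value of about -174 dBm/Hz used for comparison is the commonly quoted room-temperature thermal noise floor and does not come from the paper.

```python
import math


def noise_power_dbm(density_dbm_per_hz: float, bandwidth_hz: float) -> float:
    """Total noise power in dBm for a flat noise density over the given bandwidth."""
    return density_dbm_per_hz + 10.0 * math.log10(bandwidth_hz)


def dbm_to_watts(power_dbm: float) -> float:
    """Convert a power level from dBm to watts."""
    return 10.0 ** ((power_dbm - 30.0) / 10.0)


bandwidth_hz = 1e9  # W = 1 GHz from Table 2
paper_noise_dbm = noise_power_dbm(-150.0, bandwidth_hz)    # N0 from Table 2
thermal_noise_dbm = noise_power_dbm(-174.0, bandwidth_hz)  # reference value, not from the paper
print(paper_noise_dbm, dbm_to_watts(paper_noise_dbm))
print(paper_noise_dbm - thermal_noise_dbm)
```

Under these assumptions the simulated noise floor is about -60 dBm (1 nW), roughly 24 dB above the room-temperature reference, which matches the remark that -150 dBm/Hz is relatively strong.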
The effect of the transmission environment on the two performance
indexes is shown in Fig. 8, where we assume that each SBS has one LOS link and
one NLOS link. From Fig. 8, it is easy to see that no matter
what algorithm we use, as the number of SBSs increases, the network
energy efficiency generally decreases while the network sum rate
generally increases. Clearly, more SBSs can receive more UEs' access
requests at the same time and thus result in more concurrent
transmissions, which helps increase the network sum rate. However, more
SBSs also need more backhaul links, whose average transmission
power is very high, and thus adding SBSs is not necessarily
conducive to improving the network energy efficiency.
In addition, from Fig. 8(a), we can see that there is little difference in
terms of the network energy efficiency between CRA-D2D and CRA-
ND2D (or between DRA-D2D and DRA-ND2D). However, from Fig. 8
(b), we can see that the network sum rate of CRA-D2D (or DRA-D2D) is
slightly better than that of CRA-ND2D (or DRA-ND2D).
The reasons behind this phenomenon can be analyzed from two
aspects. When one NLOS link of each SBS is not assisted by D2D, the access
transmission duration obtained from the corresponding algorithm is
longer. At this moment, although this NLOS link has smaller throughput,
the other LOS link of the same SBS has a higher throughput and thus the
average access throughput of each SBS may not significantly change.
On the other hand, if one NLOS link of each SBS is assisted by D2D, it
can improve its throughput, but the access transmission duration
obtained from the corresponding algorithm is shorter. At this time,
although the two links of each SBS have a good throughput, their
common access transmission duration gets shorter and thus the average
access throughput of each SBS may not significantly change.
Although D2D communications can indeed improve the throughput
of NLOS links, under the simulation environment set in Fig. 8, the
improvement of average access throughput is basically offset by the
increase of power consumption. Therefore, the network energy
efficiency does not change significantly.
The effect of the transmission power of backhaul links $p_m^s$ on the two
performance indexes is shown in Fig. 9, where we assume that all the
backhaul links use the same power level. From Fig. 9(a), we can see that,
Fig. 7. Effect of thermal noise density.
Fig. 6. Effect of the radius of each SBS on network performance.
when the power of backhaul links is within a range from 0.5 to 5 in our
simulation, the network energy efficiency of the algorithms with D2D is
better than that of the algorithms without D2D. Outside this range, the
algorithms with D2D degrade to those without D2D. The reasons can be analyzed as
follows. When the backhaul power is large enough, the
access transmission duration in a scheduling period can be increased while
the backhaul transmission duration is correspondingly reduced without
becoming a bottleneck itself. In this case, no matter what algorithm is
adopted, the difference in network energy efficiency is very small due to
the weakened effect of D2D and the high backhaul power. However, as the
backhaul links' power level gets larger, the total energy consumption
gets greater, so the network energy efficiency of all four algorithms
gets lower and lower. In addition, it can be seen from Fig. 9(b) that the
network sum rate of CRA-D2D (or DRA-D2D) is better than that of
CRA-ND2D (or DRA-ND2D). The main reason is that the solution using D2D
communication increases throughput by avoiding NLOS
communications.
The effect of the beam width of UEs and SBSs on the two performance
indexes is shown in Fig. 10. We assume that all the UEs use the same
transmitting beam width, all the SBSs use the same receiving beam
width, the transmitting beam width of all the SBSs is fixed at 2°, and the
receiving beam width of the MBS is set to 5°. From Fig. 10(a), we can see
that, with the increase of the beam width of UEs and SBSs, the network
energy efficiency of CRA-ND2D and DRA-ND2D generally decreases,
while the network energy efficiency of CRA-D2D and DRA-D2D is
relatively stable. In addition, in the range from 30° to 60° in our simulation,
the network energy efficiency of CRA-D2D (or DRA-D2D) is better than
that of CRA-ND2D (or DRA-ND2D). Outside this range, the network
energy efficiency of CRA-D2D (or DRA-D2D) is worse than that of
CRA-ND2D (or DRA-ND2D). From Fig. 10(b), we can also see that, except
when the beam width is 10°, the network sum rate of CRA-D2D (or
DRA-D2D) is better than that of CRA-ND2D (or DRA-ND2D). Besides, with the
increase of the beam width, the network sum rate generally decreases.
The main reason is that, as the beam width increases, the transmission
and reception gains decrease and the interference from other
links increases. In this case, the impact of NLOS links on network
performance is greater, and D2D communications play a more prominent
role.
Although a very narrow beam is beneficial to enhance the
transmission quality of mmWave links, it increases beam alignment burden,
especially if any transceiver position fluctuates. The beam width used in
practical applications should be tailored based on the need of a specific
application. If any transceiver is not moving and the surrounding
environment is relatively stable, we may consider using a narrower beam.
Otherwise, the beam width should be enlarged to reduce the probability
of link interruption.
In the network environment where the average communication link
is relatively long, the environmental noise power is relatively large, the
Fig. 8. Effect of transmission environment on network performance.
Fig. 9. Effect of the transmission power of backhaul link on network performance.