6, JUNE 2018
Abstract—The millimeter wave (mmWave) radio band is promising for next-generation heterogeneous cellular networks (HetNets) because its large available bandwidth can meet the increasing demand of mobile traffic. However, the unique propagation characteristics of the mmWave band cause a large number of redundant handoffs in mmWave HetNets, which bring heavy signaling overhead, low energy efficiency and increased user equipment (UE) outage probability if the conventional Reference Signal Received Power (RSRP) based handoff mechanism is used. In this paper, we propose a reinforcement learning based handoff policy named SMART to reduce the number of handoffs while maintaining user Quality of Service (QoS) requirements in mmWave HetNets. In SMART, we determine handoff trigger conditions by taking into account both mmWave channel characteristics and the QoS requirements of UEs. Furthermore, we propose reinforcement-learning based BS selection algorithms for different UE densities. Numerical results show that in typical scenarios, SMART can significantly reduce the number of handoffs when compared with traditional handoff policies without learning.
1 INTRODUCTION
3GPP Standard probabilistic LOS-NLOS models [18], meaning that the channel condition between UE and mm-FBS can alternate between the two states, Line-of-Sight (LOS) and Non-Line-of-Sight (NLOS). LOS state means that a line-of-sight mmWave link between UE and mm-FBS exists. The channel state transition probability is related to the environment, and this probability is typically unknown [16]. Note that the channel state for UEs may be different, even when they are located at the same position and associated with the same mm-FBS. This is due to blockages, and thus the UEs may have different SNR. Similar to that in [5], [6], we assume that the path loss model is

$$L(d) = \alpha + 10\eta \log_{10}(d) + \xi \ \text{[dB]}, \quad \xi \sim \mathcal{N}(0, \theta^2), \tag{1}$$

where $d$ is the distance in meters, $\alpha$ and $\eta$ are the least square fits of floating intercept and slope over the measured distances (30 to 200 m), and $\theta^2$ is the lognormal shadowing variance. The values of $\alpha$, $\eta$ and $\theta$ are different for LOS and NLOS states [5], [6]. Since interference can be ignored for mm-FBS, for a specific UE, say UE $n$, the SNR when associated with mm-FBS $j$ can be written as

$$\mathrm{SNR}_n^j = \frac{P_j \, \psi \, L(d)^{-1}}{\sigma^2}, \tag{2}$$

where $P_j$ is the transmit power of mm-FBS $j$, $\sigma^2$ is the noise power and $\psi$ is the antenna gain. We assume that all mm-FBSs are equipped with directional antennas, which are necessary to support beamforming and beam tracking for the mmWave system. On the other hand, we assume that UEs are equipped with omnidirectional antennas, and thus the antenna gains are only accounted for at the mm-FBS side [19]. Similar to that in [5], [19], we assume that the antenna gain model can be expressed as

$$\psi(\theta) = \begin{cases} \psi_{\max}, & \text{if } |\theta| \le \theta_s / 2, \\ \psi_{\min}, & \text{otherwise}, \end{cases} \tag{3}$$

where $\theta$ is the angle between UE and mm-FBS, and $\theta_s$ is the width of the antenna main lobe. When a UE is associated with an mm-FBS, in order to maintain the mmWave communication link, beam tracking could be used. We assume perfect beam tracking is performed, and thus the transmission direction of the UE is always in the main lobe, so as to enjoy a high antenna gain.

Next, we present the traditional radio band channel model. We assume that the MBS and Tr-FBSs are equipped with omnidirectional antennas to guarantee coverage area [19]. For traditional band links, we need to consider co-channel interference due to shared bandwidth deployment. The SINR of UE $n$ associated with BS $j$ can be expressed as

$$\mathrm{SINR}_n^j = \begin{cases} \dfrac{P_j g_{nj}}{\sum_{k \in \mathcal{M}_t} P_k g_{nk} + \sigma^2}, & j \text{ is MBS}, \\[2ex] \dfrac{P_j g_{nj}}{\sum_{k \in \{\mathcal{M}_t \cup \mathrm{MBS}\} \setminus \{j\}} P_k g_{nk} + \sigma^2}, & j \in \mathcal{M}_t, \end{cases} \tag{4}$$

where $g_{nj}$ is the channel gain between UE $n$ and BS $j$, which includes path loss and shadowing. Since we assume that Tr-FBSs and UEs are equipped with omnidirectional antennas, there is no extra antenna gain. For path loss, we use the flexible path loss exponent model [20]

$$PL(d) = 10 \nu \log_{10}(d) + 20 \log_{10} f + 32.45, \tag{5}$$

where $d$ is the distance in meters, $\nu$ is the path loss exponent and $f$ is the carrier frequency in MHz. For NLOS environments, a larger exponent is used [20]. For shadow fading, we use a zero mean Gaussian random variable to describe it [21].

We assume that all BSs allocate bandwidth resources to their serving UEs uniformly. According to the Shannon capacity formula, the achievable transmission rate for UE $n$ associated with BS $j$ can be written as

$$r_n^j = \begin{cases} \dfrac{B_m}{U_j} \log_2\left(1 + \mathrm{SNR}_n^j\right), & j \in \mathcal{M}_m, \\[2ex] \dfrac{B_t}{U_j} \log_2\left(1 + \mathrm{SINR}_n^j\right), & j \in \{\mathcal{M}_t \cup \mathrm{MBS}\}, \end{cases} \tag{6}$$

where $B_m$ ($B_t$) is the bandwidth of mm-FBS (Tr-FBS and MBS) and $U_j$ is the total number of UEs served by BS $j$.

3.3 Initial Access Model
In this section, we illustrate how to discover a new BS, and establish a possible connection in case a handover is performed. We assume that the cell search procedure for the traditional band is identical to that in LTE, i.e., the MBS and Tr-FBSs perform cell search by transmitting omnidirectional synchronization signals [22]. For the mmWave system, the 4G-LTE initial access procedure is infeasible due to the problem of discovery range mismatch [23], [24]. In our model, we adopt an efficient initial cell search scheme, iterative search [25], [26], which performs a two-stage scanning procedure of the angular space. In detail, the space is partitioned into several wide sectors, and each wide sector is divided into several narrow sectors. In the first phase, the BS transmits pilots over wide sectors. In the second phase, the BS refines its search within the best wide sector by steering narrow beams, and thus finds the best narrow sector [23]. All the pilots are transmitted on a directional mmWave channel.

Technically, the cell search procedure is independent of the target BS selection policy. Hence, although the initial cell search scheme could affect the absolute value of the number of handoffs [27], [28], it does not affect the relative performance enhancement of the proposed SMART policy. Intuitively, some new cell search schemes, such as those proposed in [27] and [28] which use context information to speed up the cell search process, could be implemented with the proposed SMART handoff policy in mmWave HetNets. As this is beyond the scope of the work, we use the afore-mentioned iterative search scheme for mmWave band initial cell search.

3.4 QoS Model
Similar to that in [29], [30], we use two factors to describe the QoS requirement: the minimum threshold of transmission rate $\gamma_n^{\min}$ and the endurable time $\tau_n$. The endurable time is the maximum time a UE is allowed to have the transmission rate lower than the minimum threshold. We state that the QoS of UE $n$ is satisfied when the following condition holds

$$\exists\, t' \in [t - \tau_n, t], \ \text{s.t.} \ r_n^j(t') \ge \gamma_n^{\min}. \tag{7}$$
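As an illustration of the mmWave link model in (1) to (3), the rate in (6) and the QoS condition in (7), the following is a minimal Python sketch. All function and parameter names are hypothetical and chosen only for this example; the sketch works in the dB domain, takes the shadowing term as an input rather than sampling it, and represents the window $[t - \tau_n, t]$ by a list of rate samples, which are simplifying assumptions rather than part of the original model.

```python
import math


def mmwave_path_loss_db(distance_m, alpha, eta, shadowing_db=0.0):
    """Path loss (1) in dB: alpha + 10*eta*log10(d) + shadowing term."""
    return alpha + 10.0 * eta * math.log10(distance_m) + shadowing_db


def antenna_gain_db(theta, theta_s, gain_max_db, gain_min_db):
    """Two-level antenna gain (3): main-lobe gain if |theta| <= theta_s / 2."""
    return gain_max_db if abs(theta) <= theta_s / 2.0 else gain_min_db


def mmwave_snr(tx_power_dbm, gain_db, path_loss_db, noise_dbm):
    """SNR (2) as a linear ratio, evaluated in the dB domain."""
    snr_db = tx_power_dbm + gain_db - path_loss_db - noise_dbm
    return 10.0 ** (snr_db / 10.0)


def shannon_rate(bandwidth_hz, num_served_ues, snr_linear):
    """Rate (6): equal bandwidth share times log2(1 + SNR or SINR)."""
    return bandwidth_hz / num_served_ues * math.log2(1.0 + snr_linear)


def qos_satisfied(rate_samples, gamma_min):
    """QoS condition (7): some sample in the window reaches gamma_min.

    rate_samples holds the rates observed during [t - tau_n, t]
    (a discretization assumed for this sketch).
    """
    return any(rate >= gamma_min for rate in rate_samples)


# Example with illustrative numbers: a UE 100 m from an mm-FBS, in the main lobe.
loss_db = mmwave_path_loss_db(100.0, alpha=61.4, eta=2.0)
gain_db = antenna_gain_db(theta=0.1, theta_s=0.5, gain_max_db=18.0, gain_min_db=2.0)
snr = mmwave_snr(tx_power_dbm=30.0, gain_db=gain_db, path_loss_db=loss_db, noise_dbm=-77.0)
rate = shannon_rate(bandwidth_hz=500e6, num_served_ues=10, snr_linear=snr)
```

The sketch keeps each equation in its own function so that LOS and NLOS parameter sets, or a sampled shadowing value, can be swapped in without touching the rate or QoS logic.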
SUN ET AL.: THE SMART HANDOFF POLICY FOR MILLIMETER WAVE HETEROGENEOUS CELLULAR NETWORKS 1459
4.1 Handoff Trigger Conditions
Event A2 occurs when the RSRP of the serving BS becomes worse than a threshold [3], and the trigger condition can be expressed as

$$\mathrm{RSRP}_n^j < \text{threshold} - Hys, \tag{8}$$

where $Hys$ is a hysteresis parameter added for reducing redundant handoffs (e.g., ping-pong effect). Event A2 handoff is performed when the serving BS cannot fulfill the minimum UE QoS requirement. Thus, in SMART, the trigger condition can be written as

$$\forall\, t' \in [t - \tau_n, t], \ r_n^i(t') < \gamma_n^{\min}, \tag{9}$$

where $i$ denotes the serving BS, and $\tau_n$ and $\gamma_n^{\min}$ are UE service type parameters. This change can avoid many unnecessary handoffs. Once inequality (9) is satisfied for UE $n$, an Event A2 handoff is triggered, and the UE needs to select a suitable target BS.

Event A3 occurs when a neighbor BS becomes offset better than the serving BS [3], and the trigger condition can be expressed as

$$\mathrm{RSRP}_n^k \ge \mathrm{RSRP}_n^j + \text{offset}, \tag{10}$$

for a time-to-trigger (TTT) period, where $\mathrm{RSRP}_n^k$ and $\mathrm{RSRP}_n^j$ are the RSRPs of target BS $k$ and current serving BS $j$ measured by UE $n$ respectively, and offset and TTT are two parameters defined in 3GPP. Once a UE experiences a handoff in this event, it means that the UE switches to a better BS which can improve its QoS although the current serving BS can fulfill the minimum QoS requirement. Thus, SMART uses the following three trigger conditions

$$\exists\, t' \in [t - \tau_n, t], \ \text{s.t.} \ r_n^j(t') \ge \gamma_n^{\min}, \tag{11-1}$$

$$r_n^k(t) \ge r_n^j(t) + \text{offset}, \tag{11-2}$$

$$\gamma_n^{\max} - \gamma_n^{\min} > \Delta. \tag{11-3}$$

Condition (11-1) states that the current serving BS can fulfill the minimum UE QoS requirement. Condition (11-2) constrains the transmission rate of the target BS $k$ to be at least offset higher than that of the serving BS $j$. Condition (11-3) indicates that the difference of transmission rate between the maximum threshold and the minimum threshold is greater than $\Delta$ in the QoS requirement. Similar to traditional handoff

4.2 BS Selection
Once handoff trigger conditions are met, UEs need to select suitable target BSs. In SMART, we use reinforcement learning for selecting BSs to reduce the number of unnecessary handoffs. We design two BS selection algorithms, SMART-S and SMART-M, for different UE density circumstances. SMART-S, with low computational complexity, is for a specific UE. It is suitable for sparse UE density circumstances. SMART-M is a joint optimal policy for multiple UEs triggering handoffs in the same measurement report period. It is suitable for dense UE distribution circumstances with a central controller.

5 SMART-S ALGORITHM FOR SINGLE TARGET BS SELECTION
Note that once a specific BS satisfies the trigger conditions of Event A3, the target BS is determined. We therefore focus on the BS selection for Event A2. Let $\mathcal{A}_n(t)$ be the set of admissible BSs when UE $n$ triggers an Event A2 handoff at time $t$,

$$\mathcal{A}_n(t) = \{k \mid r_n^k(t) \ge \gamma_n^{\min} + \Gamma, \ \forall k \in \mathcal{M} \cup \mathrm{MBS}\}, \tag{12}$$

where $\Gamma$ is a criteria offset parameter. For UE $n$ with volume of data $Q_n$ to be transmitted, we use $H_n$ to denote the number of handoffs. Our goal is to select the BS in set $\mathcal{A}_n(t)$ with minimum $H_n$ once the Event A2 condition is triggered.

5.1 Reinforcement-Learning Framework
We model the BS selection problem as a reinforcement learning problem. It consists of three elements: agent, environment and action. In our model shown in Fig. 2, the agent is a specific UE $n$, the environment is the channel conditions of BSs, and the action is the BS selection policy. The aim is to maximize the total reward by a sequence of BS selections.

Our objective is to minimize the total number of handoffs $H_n$. As it is difficult to incorporate $H_n$ into the reward function directly, we make a transformation as follows. Let the reward function $R_n^k(t)$ be defined as the volume of transmitted data from time $t$ to $t_n^k$ when UE $n$ switches to BS $k$ at time $t$, or

$$R_n^k(t) = \int_t^{t_n^k} r_n^k(t)\, dt. \tag{13}$$

Proposition 1. Minimizing the total number of handoffs $H_n$ for UE $n$ is equivalent to solving the proposed reinforcement learning problem with the reward function defined in (13).
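As an illustration of the SMART trigger conditions (9) and (11-1) to (11-3) in Section 4.1 and the admissible set (12), the following is a minimal Python sketch. The function names, the list-of-samples representation of the window $[t - \tau_n, t]$, and the parameter name `rate_gap_threshold` (standing for the right-hand side of (11-3)) are assumptions made for this example only; the time-to-trigger handling of the 3GPP events is omitted.

```python
def event_a2_triggered(serving_rate_samples, gamma_min):
    """Condition (9): the serving rate stays below gamma_min over the whole window."""
    # An empty window carries no evidence, so it does not trigger a handoff.
    return bool(serving_rate_samples) and all(
        rate < gamma_min for rate in serving_rate_samples
    )


def event_a3_triggered(serving_rate_samples, serving_rate_now, target_rate_now,
                       offset, gamma_min, gamma_max, rate_gap_threshold):
    """Conditions (11-1), (11-2) and (11-3) must hold together."""
    serving_meets_qos = any(rate >= gamma_min for rate in serving_rate_samples)  # (11-1)
    target_is_better = target_rate_now >= serving_rate_now + offset              # (11-2)
    gap_is_large = gamma_max - gamma_min > rate_gap_threshold                    # (11-3)
    return serving_meets_qos and target_is_better and gap_is_large


def admissible_set(candidate_rates, gamma_min, criteria_offset):
    """Admissible BSs (12): rate at least gamma_min plus the criteria offset."""
    return {bs for bs, rate in candidate_rates.items()
            if rate >= gamma_min + criteria_offset}


# Example with illustrative rates in Mbit/s.
window = [3.0, 2.5, 3.5]
if event_a2_triggered(window, gamma_min=4.0):
    candidates = admissible_set({"bs1": 6.0, "bs2": 4.2, "bs3": 9.0},
                                gamma_min=4.0, criteria_offset=1.0)
```

In this example the Event A2 condition holds because every sample is below the minimum rate, and only the BSs whose current rate clears the threshold plus the offset remain as candidates.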
1460 IEEE TRANSACTIONS ON MOBILE COMPUTING, VOL. 17, NO. 6, JUNE 2018
Proof. Let $t_n^k$ in (13) equal the time when the next handoff for UE $n$ is triggered after time $t$, and define a sort function $F$ in a finite set $X$ as

$$F(x) = k, \quad x \in X \text{ and } x \text{ is the } k\text{-th smallest element in } X. \tag{14}$$

The objective of the above reinforcement learning model is to find the optimal policy $\pi^*$:

$$\pi^* = \arg\max_{\pi} E_{\pi}\left[\sum_{F(t_n^k)=1}^{K} R_n^k(t)\right], \tag{15}$$

where $K$ is the maximum value of $F(t_n^k)$, which equals the number of handoffs in the time period. If we fix the volume of transmitted data of UE $n$ as $Q_n$, applying policy $\pi^*$ can minimize the total number of handoffs of UE $n$ when transmitting $Q_n$ data, which equals our optimization objective $\min H_n$. □

5.2 Expected Reward Estimation
As $t_n^k$ and $r_n^k(t)$ in (13) are unknown random variables, the expected reward $E[R_n^k(t)]$ can only be estimated from historical information. We use $\bar{R}_n^k(t)$ to denote the observed value of $R_n^k(t)$, which can be obtained once UE $n$ switches to BS $k$. However, a UE may not stay around a specific BS $k$ for a long time, and thus we cannot have enough historical information to estimate $R_n^k(t)$ accurately. To get around this, we define the type reward $\bar{R}_{C_n}^k(T_{C_n}^k)$ as

$$\bar{R}_{C_n}^k(0) = 0, \tag{16-1}$$

$$\bar{R}_{C_n}^k(T_{C_n}^k + 1) = \frac{T_{C_n}^k \bar{R}_{C_n}^k(T_{C_n}^k) + \bar{R}_n^k(t)}{T_{C_n}^k + 1}, \tag{16-2}$$

where $T_{C_n}^k$ denotes the number of times that BS $k$ is selected by UEs with service type $C_n$. We take this observed value $\bar{R}_{C_n}^k(T_{C_n}^k)$ as the mean reward for UEs with the same service type $C_n$, and each UE uses its own observed reward $\bar{R}_n^k(t)$ to update the type reward $\bar{R}_{C_n}^k(T_{C_n}^k)$ after a handoff occurs based on (16-2). Thus, the expected reward can be estimated as

$$E[R_n^k(t)] = \begin{cases} \bar{R}_{C_n}^k(T_{C_n}^k), & \text{if } n \in C_n, \\ 0, & \text{otherwise}. \end{cases} \tag{17}$$

Since the handoff trigger conditions of UEs with the same service type are similar, the type reward $\bar{R}_{C_n}^k(T_{C_n}^k)$ can be accurately estimated by reinforcement learning.

5.3 BS Selection Algorithm
We cannot always select the BS with the highest reward, since the well-known exploration versus exploitation dilemma exists in reinforcement learning. This dilemma states that there is a tradeoff between improving UEs' knowledge about the reward distributions of BSs (exploration) and switching to the BS with the highest empirical mean reward (exploitation). Regret is a concept to measure the performance of a policy [16], which is defined as the difference of total reward between the adopted policy and the global optimal policy. In our problem, the regret of policy $\pi$ after $W$ handoffs occur for UE $n$ can be expressed as

$$\mathrm{Regret}_{\pi}(W) = \sum_{F(t_n^k)=1}^{W} \left[R_{\pi^*}(t_n^k) - R_n^k(t_n^k)_{\pi}\right], \tag{18}$$

where $R_{\pi^*}(t_n^k)$ is the reward of the optimal policy $\pi^*$ at time $t_n^k$. It was shown in [31] that the best regret is logarithmic with respect to the number of handoffs $W$. Based on that, the authors of [32] proposed an Upper Confidence Bound (UCB) algorithm to deal with this tradeoff. It can achieve logarithmic regret with low computational complexity. The UCB policy states that the agent chooses machine $j^*$ at each decision time according to the following index

$$j^* = \arg\max_j \left(\bar{x}_j + \sqrt{\frac{2 \ln W}{W_j}}\right), \tag{19}$$

where $\bar{x}_j$ is the average reward obtained from machine $j$, $W_j$ is the number of times machine $j$ has been chosen and $W$ is the overall number of decisions so far.

The BS selection algorithm when UE $n$ triggers Event A2 handoffs is based on UCB. We set the index of BS $k$ for UE $n$ as $\bar{R}_{C_n}^k(T_{C_n}^k) + \ell \sqrt{2 \ln H_n / T_{C_n}^k}$, where $\ell = \max_{k \in \mathcal{A}_n, C_n \in \mathcal{C}} \bar{R}_{C_n}^k(T_{C_n}^k)$ and $H_n$ is the total number of handoffs for UE $n$ so far. Thus, the policy is selecting BS $k^*$ in set $\mathcal{A}_n$ for UE $n$ once an Event A2 handoff occurs, where $k^*$ can be expressed as

$$k^* = \arg\max_k \left(\bar{R}_{C_n}^k(T_{C_n}^k) + \ell \sqrt{\frac{2 \ln H_n}{T_{C_n}^k}}\right). \tag{20}$$

We summarize our proposed SMART-S BS selection algorithm in Algorithm 1.

Algorithm 1. SMART-S BS Selection Algorithm Based on UCB
Input: Network topology (BS and UE distributions, ); service type of UEs.
Output: BS selection decisions $k^*$.
1: Initialization: obtain $T_{C_n}^k$, $H_n$, $\bar{R}_{C_n}^k(T_{C_n}^k)$ in time $T$ based on traditional handoff policy
2: while handoff conditions are met for a certain UE $n$ do
3:   if Event A2 handoff then
4:     Judge service type $C_n$ of UE $n$
5:     $\bar{R}_{C_n}^k(T_{C_n}^k + 1) \leftarrow \dfrac{T_{C_n}^k \bar{R}_{C_n}^k(T_{C_n}^k) + \bar{R}_n^k(t)}{T_{C_n}^k + 1}$
6:     $k^* = \arg\max_k \left(\bar{R}_{C_n}^k(T_{C_n}^k) + \ell \sqrt{2 \ln H_n / T_{C_n}^k}\right)$
7:     $T_{C_n}^k \leftarrow T_{C_n}^k + 1$, $H_n \leftarrow H_n + 1$
8:   else
9:     switch to the unique target BS $k$
10:  end if
11: end while

5.4 Properties of SMART-S
The SMART-S algorithm does not perform iteration and thus does not have a convergence issue. We investigate here the performance bound and signaling overhead. The performance bound is established from Fact 1 and Corollary 1.

Fact 1. For all $K > 1$, if policy UCB is run on $K$ machines having arbitrary reward distributions $P_1, \ldots, P_K$ with support in $[0, 1]$, its expected regret can achieve a logarithmic bound.

Proof. cf. [32] for proofs. □

Corollary 1. The proposed UCB-based SMART-S BS selection policy achieves logarithmic regret with respect to the total number of handoffs $H_n$.

Proof. We construct a new reinforcement learning model which is the same as our above proposed model except for the reward function. For the sake of convenience, we denote the above proposed and the new reinforcement learning model as RL 1 and RL 2 respectively. The reward function of RL 2 is defined as

$$Y_n^k(t) = \frac{R_n^k(t)}{\ell}, \tag{21}$$

where $R_n^k(t)$ is the reward function of RL 1 defined in (13). As $\ell$ is a constant, RL 1 and RL 2 have the same policy solution. Since $Y_n^k(t)$ has a bounded support in $[0, 1]$, we use the UCB algorithm to solve the RL 2 problem, and thus the index in (19) can be expressed as

$$k^* = \arg\max_k \left(\bar{y}_k + \sqrt{\frac{2 \ln H_n}{T_{C_n}^k}}\right), \tag{22}$$

where $\bar{y}_k$ is the average reward obtained from BS $k$, which equals $\bar{R}_{C_n}^k(T_{C_n}^k) / \ell$. Thus, the index can be rewritten as

$$k^* = \arg\max_k \left(\frac{\bar{R}_{C_n}^k(T_{C_n}^k)}{\ell} + \sqrt{\frac{2 \ln H_n}{T_{C_n}^k}}\right). \tag{23}$$

As $\ell$ is a positive constant, multiplying the index by $\ell$ does not change the maximizer, so we can use

$$k^* = \arg\max_k \left(\bar{R}_{C_n}^k(T_{C_n}^k) + \ell \sqrt{\frac{2 \ln H_n}{T_{C_n}^k}}\right) \tag{24}$$

to replace the index in (23), which is the same as the proposed BS selection policy. □

Next we discuss the signaling overhead for the SMART-S BS selection algorithm. When the handoff trigger conditions are satisfied for a specific UE, it notifies its service type to the admissible BSs in set $\mathcal{A}_n$, and the BSs calculate and send their corresponding indexes to the UE. The UE switches to the target BS, say BS $k$, determined by using (20). When the next handoff occurs, the UE obtains the value of $\bar{R}_n^k(t)$, and transmits it to BS $k$. The BS uses it to update the expected reward and index according to (16) and (17). Thus, the number of signaling exchanges needed is $2|\mathcal{A}_n|$ and each signaling exchange uses several bits.

6 SMART-M ALGORITHM FOR MULTIPLE TARGET BS SELECTION
The BS selection algorithm discussed in Section 5 focuses on individual UEs. However, in the time interval between two adjacent measurement report periods, there may be multiple UEs that need handoff, especially for dense UE distribution. Moreover, multiple UEs may trigger handoffs in the same time period or even simultaneously in typical scenarios, such as a group of UEs riding in a moving bus. We therefore design the SMART-M algorithm for optimal multi-BS selection.

6.1 Problem Formulation Based on Learning Results
Let $\mathcal{N}$ be the set of UEs sending handoff requests to the network central controller in a measurement period and let $N = |\mathcal{N}|$. As the period is usually short (e.g., in tens of milliseconds), we assume that the BS selection decisions are made at the end of individual periods. Here, the objective function $Y$ is again chosen as the volume of transmitted data before the next handoff occurs for these $N$ UEs. Also we use $\bar{R}_{C_i}^j(T_{C_i}^j)$ to estimate $E[R_n^k(t)]$ based on the above reinforcement learning. The problem is formulated as

$$\max \ Y = \sum_{i \in \mathcal{N}} \sum_{j \in \mathcal{A}_i} x_{ij} \bar{R}_{C_i}^j(T_{C_i}^j) \tag{25}$$

$$\text{s.t.} \ \sum_{i \in \mathcal{N}} x_{ij} \le N_j, \quad \forall j \in \cup_{i \in \mathcal{N}} \mathcal{A}_i, \tag{25-1}$$

$$\sum_{j \in \mathcal{A}_i} x_{ij} = 1, \quad \forall i \in \mathcal{N}, \tag{25-2}$$

$$x_{ij} \in \{0, 1\}, \quad \forall i \in \mathcal{N}, \ \forall j \in \cup_{i \in \mathcal{N}} \mathcal{A}_i. \tag{25-3}$$

6.2 BS Selection
The problem stated in (25) is a special case of the well-known NP-hard Generalized Assignment Problem (GAP), with $O(N^{|\mathcal{A}|})$ complexity using a brute force algorithm. Obviously it is infeasible to use the brute force algorithm for solving dense deployment mmWave HetNets due to prohibitively high computational complexity. Instead, we propose the following efficient heuristics. We first relax the binary variables $x_{ij}$ in constraints (25-3) to be continuous variables in $[0, 1]$. We then exploit the Lagrange dual decomposition method [33] to solve this optimization problem.

After relaxing $x_{ij}$, problem (25) becomes a linear problem with Lagrange function
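$$L(\mathbf{x}, \boldsymbol{\mu}) = \sum_{i \in \mathcal{N}} \sum_{j \in \mathcal{A}_i} x_{ij} \bar{R}_{C_i}^j(T_{C_i}^j) - \sum_{j \in \mathcal{A}} \mu_j \left(\sum_{i \in \mathcal{N}} x_{ij} - N_j\right), \tag{26}$$

where $\mu_j$ is the Lagrange multiplier. For a fixed vector $\boldsymbol{\mu}$, the Lagrange dual function can be expressed as

$$g(\boldsymbol{\mu}) = \sup_{\mathbf{x}} L(\mathbf{x}, \boldsymbol{\mu}) \tag{27}$$

$$\text{s.t.} \ \sum_{j \in \mathcal{A}_i} x_{ij} = 1, \quad \forall i \in \mathcal{N}, \tag{27-1}$$

$$0 \le x_{ij} \le 1, \quad \forall i \in \mathcal{N}, \ \forall j \in \mathcal{A}, \tag{27-2}$$

and the dual problem is $\min_{\boldsymbol{\mu}} g(\boldsymbol{\mu})$. Rewriting function $g(\boldsymbol{\mu})$ yields

$$g(\boldsymbol{\mu}) = \sup_{\mathbf{x}} \sum_{i \in \mathcal{N}} \sum_{j \in \mathcal{A}_i} x_{ij} \left(\bar{R}_{C_i}^j(T_{C_i}^j) - \mu_j\right) + \sum_{j \in \mathcal{A}} \mu_j N_j. \tag{28}$$

Since it does not include the cross-term of $x_{ij}$, we can exchange the computation order as:

$$g(\boldsymbol{\mu}) = \sum_{i \in \mathcal{N}} \sup_{x_{ij},\, j \in \mathcal{A}_i} \sum_{j \in \mathcal{A}_i} x_{ij} \left(\bar{R}_{C_i}^j(T_{C_i}^j) - \mu_j\right) + \sum_{j \in \mathcal{A}} \mu_j N_j. \tag{29}$$

Thus, we can solve the following problem for each UE $i$ separately,

$$g_i(\boldsymbol{\mu}) = \sup_{x_{ij},\, j \in \mathcal{A}_i} \sum_{j \in \mathcal{A}_i} x_{ij} \left(\bar{R}_{C_i}^j(T_{C_i}^j) - \mu_j\right) \tag{30}$$

$$\text{s.t.} \ \sum_{j \in \mathcal{A}_i} x_{ij} = 1, \tag{30-1}$$

$$0 \le x_{ij} \le 1, \quad \forall j \in \mathcal{A}_i. \tag{30-2}$$

Since we want to find a binary solution of $x_{ij}$, for a fixed vector $\boldsymbol{\mu}$, problem (30) is described as: for UE $i$, we choose a BS $j$ from set $\mathcal{A}_i$ to maximize the value of $\bar{R}_{C_i}^j(T_{C_i}^j) - \mu_j$. Therefore, when $\boldsymbol{\mu}$ is fixed, problem (27) can be solved by choosing the optimal BS $j^*$ for each UE respectively. Then we minimize $g(\boldsymbol{\mu})$ over $\boldsymbol{\mu}$ to obtain the optimal value $\boldsymbol{\mu}^*$ for the dual problem. We use the negative gradient direction to update $\mu_j$ subject to $\mu_j \ge 0$,

$$\mu_j(k+1) = \left[\mu_j(k) - \delta(k) \left(N_j - \sum_{i \in \mathcal{N}} x_{ij}\right)\right]^+, \quad \forall j \in \mathcal{A}, \tag{31}$$

where $\delta(k) > 0$ is the update step size, and is given by

$$\delta(k) = \frac{g(\boldsymbol{\mu}^k) - g^k}{\|\mathbf{h}^k\|^2}, \quad \forall k \ge 0, \tag{32}$$

where $g^k$ is an estimate of the optimal value $g^*$. The procedure of updating $g^k$ is given by

$$g^k = \min_{1 \le j \le k} g(\boldsymbol{\mu}^j) - \varepsilon_k, \tag{33}$$

where $\varepsilon$, $\beta$ and $\rho$ are fixed positive constants with $\beta < 1$ and $\rho \ge 1$ [34].

For linear programs, strong duality holds. Therefore, the minimum value of $g(\boldsymbol{\mu})$ is equal to the maximum value of the original problem. The solution process is as follows: we first obtain the maximum value $g(\boldsymbol{\mu})$ over $\mathbf{x}$ with fixed $\boldsymbol{\mu}$, and then minimize $g(\boldsymbol{\mu})$ over $\boldsymbol{\mu}$, denoted as $g(\boldsymbol{\mu}^*)$. The optimal binary solution $\mathbf{x}^*$ is obtained with the corresponding solution $\boldsymbol{\mu}^*$. According to $\mathbf{x}^*$ we make BS selections for those handoff UEs.

Similar to that in Section 5, $\bar{R}_{C_i}^j(T_{C_i}^j)$ is updated once the next handoff occurs according to (16) and (17). Note that the reinforcement-learning process in Section 5 can improve the accuracy of the value of $\bar{R}_{C_i}^j(T_{C_i}^j)$ and thus the solution of this optimization problem. We summarize the SMART-M algorithm in Algorithm 2.

Algorithm 2. Joint Optimal SMART-M BS Selection Algorithm
Input: Network topology (BS and UE distributions, ); handoff UEs $\mathcal{N}$.
Output: BS selection decisions $\mathbf{x}^*$.
Initialization:
1: Judge service type of UEs
2: Determine admissible BSs
3: The BSs send the value of $\bar{R}_{C_i}^j(T_{C_i}^j)$ and $N_j$ to the central controller
BS selection decisions:
4: $\mathbf{x}^0 \leftarrow 0$, $\mathbf{x}^0 \leftarrow$ current connections, $k \leftarrow 1$
5: while $\mathbf{x}^k \ne \mathbf{x}^{k-1}$ do
6:   $k \leftarrow k + 1$
7:   for each UE $i \in \mathcal{N}$ do
8:     solve problem (30)
9:   end for (obtain $\mathbf{x}^k$)
10:  update $\boldsymbol{\mu}^k$ according to (31)
11: end while
12: $\mathbf{x}^* \leftarrow \mathbf{x}^k$

6.3 Convergence and Computational Complexity of SMART-M
To prove the convergence of the SMART-M BS selection algorithm, we need Propositions 2 and 3.

Proposition 2. Let $\{\boldsymbol{\mu}^k\}$ be the sequence generated by (31). Then, for all non-negative $|\mathcal{A}|$-dimensional vectors $\mathbf{v}$ and $k \ge 0$

$$\|\boldsymbol{\mu}^{k+1} - \mathbf{v}\|^2 \le \|\boldsymbol{\mu}^k - \mathbf{v}\|^2 - 2\delta(k)\left(g(\boldsymbol{\mu}^k) - g(\mathbf{v})\right) + \delta(k)^2 \|\mathbf{h}(\mathbf{x}^k)\|^2,$$

where $\mathbf{h}(\mathbf{x}^k)$ is an $|\mathcal{A}|$-dimensional vector with elements $h_j = N_j - \sum_{i \in \mathcal{N}} x_{ij}^k$, $|\mathcal{A}|$ is the cardinality of set $\mathcal{A}$, and $\mathbf{x}^k$ is a vector that satisfies $L(\mathbf{x}^k, \boldsymbol{\mu}^k) = \sup_{\mathbf{x}} L(\mathbf{x}, \boldsymbol{\mu}^k) = g(\boldsymbol{\mu}^k)$.

Proof. According to (31), we have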
$$\|\boldsymbol{\mu}^{k+1} - \mathbf{v}\|^2 \le \|\boldsymbol{\mu}^k - \delta(k)\mathbf{h}(\mathbf{x}^k) - \mathbf{v}\|^2 = \|\boldsymbol{\mu}^k - \mathbf{v}\|^2 - 2\delta(k)(\boldsymbol{\mu}^k - \mathbf{v})^T \mathbf{h}(\mathbf{x}^k) + \delta(k)^2 \|\mathbf{h}(\mathbf{x}^k)\|^2. \tag{35}$$

As $L(\mathbf{x}, \boldsymbol{\mu})$ in (26) is linear and $\mathbf{h}(\mathbf{x}) = \partial L(\mathbf{x}, \boldsymbol{\mu}) / \partial \boldsymbol{\mu}$, for all vectors $\mathbf{x} \in [0, 1]^{|\mathcal{N}||\mathcal{A}|}$ we have

$$L(\mathbf{x}, \boldsymbol{\mu}^k) - L(\mathbf{x}, \mathbf{v}) = (\boldsymbol{\mu}^k - \mathbf{v})^T \mathbf{h}(\mathbf{x}). \tag{36}$$

For vector $\mathbf{x}^k$,

$$L(\mathbf{x}^k, \boldsymbol{\mu}^k) - L(\mathbf{x}^k, \mathbf{v}) = g(\boldsymbol{\mu}^k) - L(\mathbf{x}^k, \mathbf{v}) \ge g(\boldsymbol{\mu}^k) - \sup_{\mathbf{x}} L(\mathbf{x}, \mathbf{v}) = g(\boldsymbol{\mu}^k) - g(\mathbf{v}). \tag{37}$$

Combining (36) and (37) yields

$$(\boldsymbol{\mu}^k - \mathbf{v})^T \mathbf{h}(\mathbf{x}^k) \ge g(\boldsymbol{\mu}^k) - g(\mathbf{v}). \tag{38}$$

Combining (35) and (38) yields

$$\|\boldsymbol{\mu}^{k+1} - \mathbf{v}\|^2 \le \|\boldsymbol{\mu}^k - \mathbf{v}\|^2 - 2\delta(k)\left(g(\boldsymbol{\mu}^k) - g(\mathbf{v})\right) + \delta(k)^2 \|\mathbf{h}(\mathbf{x}^k)\|^2.$$

□

Proposition 3. Assuming that step size $\delta(k)$ is determined by (32), (33) and (34), if $g^* > -\infty$ then $\liminf_{k \to \infty} g(\boldsymbol{\mu}^k) \le g^* + \varepsilon$.

Proof. As $N_j$ and $x_{ij}^k$ are bounded, $h_j = N_j - \sum_{i \in \mathcal{N}} x_{ij}^k$ and then the vector $\mathbf{h}(\mathbf{x}^k)$ are also bounded. Since $\mathbf{h}(\mathbf{x}) = \partial L(\mathbf{x}, \boldsymbol{\mu}) / \partial \boldsymbol{\mu}$, there exists a scalar $c$ such that

$$c \ge \sup\{\|\partial g(\boldsymbol{\mu})\|\}. \tag{39}$$

Combining Proposition 2 and (39), we can conclude that our problem satisfies the necessary conditions of Proposition 6.3.6 in [34]. By applying this proposition with $\gamma_k = 1$, we obtain Proposition 3, and the convergence of SMART-M is proved. □

Next we discuss the computational complexity of the SMART-M BS selection algorithm. At each iteration, we decompose $g(\boldsymbol{\mu})$ into $N$ sub-problems $g_i(\boldsymbol{\mu})$, and the computational complexity of $g_i(\boldsymbol{\mu})$ is $O(|\mathcal{A}_i|)$. Thus, the complexity of the SMART-M BS selection algorithm is $O(k|\mathcal{N}||\mathcal{A}|)$, where $k$ is the number of iterations. In most simulation experiments, the algorithm converges in less than 10 iterations with a total run time of several milliseconds, which can satisfy real-time requirements.

We again evaluate the signaling overhead for the SMART-M BS selection algorithm. The UEs that trigger handoff conditions need to notify the service types to their admissible BSs, which calculate and send the corresponding value $\bar{R}_{C_i}^j(T_{C_i}^j)$ to the central controller. The central controller makes handoff decisions based on the SMART-M policy, and sends these decisions to UEs. Thus, the number of signaling exchanges needed is $\sum_{i=1}^{N} |\mathcal{A}_i| + |\mathcal{A}| + |\mathcal{N}|$, and each signaling exchange uses several bits.

6.4 Further Discussions on SMART-S and SMART-M
After discussing the details of the two algorithms separately, in this section, we clarify the two algorithms together from two aspects: (1) the relation between the two algorithms; and (2) the implementation of the algorithms.

First, the two algorithms are designed for different UE density scenarios. SMART-S is appropriate for sparse UE density, while SMART-M is designed for dense UE distribution. Specifically, SMART-S chooses the target BS for a single UE without considering the states and decisions of other UEs. SMART-M can achieve a joint optimal BS selection policy for multiple UEs which are triggered to perform handoffs in the same measurement report period. The computational complexity of SMART-M is higher than that of the SMART-S algorithm. Thus, we choose SMART-S or SMART-M according to the UE density.

On the other hand, SMART-M makes handoff decisions for multiple UEs by solving an optimization problem with unknown parameters. We employ the learning algorithm of SMART-S to evaluate the unknown parameters in the optimization framework. In more detail, SMART-M needs to solve (25) with the expected reward $\bar{R}_{C_i}^j(T_{C_i}^j)$ in the reinforcement learning model of SMART-S, which is indeed the estimated value of $E[R_n^k(t)]$. In this sense, SMART-S can be used to enhance the accuracy of $\bar{R}_{C_i}^j(T_{C_i}^j)$ from historical data, and thus improve the performance of SMART-M.

Second, let us discuss the implementations of the two algorithms, in order to further clarify their relation. For the selection between the two algorithms, SMART-S is indeed feasible for any UE density. From Corollary 1, we can see that SMART-S achieves logarithmic regret with respect to the total number of handoffs. In other words, even if we always run SMART-S for any UE density, we can still achieve at least a logarithmic regret bound and enjoy performance improvement. Certainly, for dense UE distribution circumstances, we can run the SMART-M algorithm to further improve the handoff performance at some computational cost.

In our system, we adopt a simple selection policy between the two algorithms. We define a UE density threshold $\Gamma$ to identify sparse or dense UE distribution. When the number of UEs which send handoff requests to the controller in a measurement period is lower than $\Gamma$, SMART-S is selected, otherwise SMART-M is selected. The other handoff procedures remain the same as those in the conventional handoff policy.

7 NUMERICAL RESULTS
In this section, we compare the performance of SMART with two conventional handoff policies as follows. (1) Rate-based handoff (RBH). RBH has similar trigger conditions as those in 3GPP. When choosing target BSs for handoffs, the ones with maximum transmission data rates are chosen (instead of maximum RSRP in 3GPP [3]). (2) SINR based handoff (SBH). SBH has the same handoff trigger conditions as those of SMART and uses maximum SINR for target BS selection.

7.1 Simulation Settings
We consider a two-tier HetNet deployed in an urban area, and the HetNet consists of an MBS and a varying number of mm-FBSs, Tr-FBSs and UEs. The MBS is located at the center of a circular area with radius equal to 500 m, and both mm-FBSs and Tr-FBSs are randomly distributed in the area. The transmit power of MBS, mm-FBS and Tr-FBS is set to 46 dBm, 30 dBm and 20 dBm, respectively. Both the number and region of blockages in mm-FBS are randomly generated. Similar to that in [6], when UEs in mm-FBS move to
TABLE 1
Simulation Parameters

Parameter                            Value
MBS radius                           500 m
Power of MBS                         46 dBm
Power of mm-FBSs                     30 dBm
Power of Tr-FBSs                     20 dBm
Bandwidth of MBS/Tr-FBS              20 MHz
Bandwidth of mm-FBS                  500 MHz
Path loss exponent for LOS           2
Path loss exponent for NLOS          3
$\psi_{\max}$                        18 dB
$\psi_{\min}$                        2 dB
Carrier frequency                    2,000 MHz
Parameters for LOS path loss         $\alpha = 72$, $\eta = 2.92$
Parameters for NLOS path loss        $\alpha = 61.4$, $\eta = 2$
Noise power for mmWave band          -77 dBm
Noise power for traditional band     -101 dBm
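The policies evaluated in this section rest on two selection rules: the UCB index (20) with the running-mean update (16-2) for SMART-S, and the per-UE maximization (30) with the multiplier update (31) for SMART-M. The following Python sketch illustrates both on toy data. It is not the authors' implementation: all names are hypothetical, BSs that were never tried are explored first (the index in (20) is undefined when a BS has no selections yet), a constant step size replaces the adaptive rule of (32) to (34), and the SMART-M loop stops once the assignment is feasible and satisfies complementary slackness, which is an assumed stopping rule.

```python
import math
from collections import Counter


def update_type_reward(mean_reward, times_selected, observed_reward):
    """Running-mean update of the type reward, as in (16-2)."""
    return (times_selected * mean_reward + observed_reward) / (times_selected + 1)


def smart_s_select(admissible, mean_reward, times_selected, num_handoffs, scale):
    """UCB-style choice in the spirit of (20) over the admissible BSs.

    mean_reward and times_selected are dicts keyed by BS id; scale plays
    the role of the constant multiplying the exploration term.
    """
    def index(bs):
        tries = times_selected.get(bs, 0)
        if tries == 0:
            return math.inf  # assumption: explore untried BSs first
        bonus = math.sqrt(2.0 * math.log(max(num_handoffs, 1)) / tries)
        return mean_reward[bs] + scale * bonus

    return max(admissible, key=index)


def smart_m_select(rewards, capacity, step=0.5, max_iter=1000):
    """Dual-decomposition sketch for (25).

    rewards: {ue: {bs: estimated reward}}; capacity: {bs: N_j}.
    Every BS appearing in rewards must also appear in capacity.
    Returns the last assignment and multipliers; with a constant step
    the loop may end at max_iter without a feasible assignment.
    """
    mu = {bs: 0.0 for bs in capacity}
    choice = {}
    for _ in range(max_iter):
        # (30): each UE picks the BS with the largest reward minus price.
        choice = {ue: max(r, key=lambda bs: r[bs] - mu[bs])
                  for ue, r in rewards.items()}
        load = Counter(choice.values())
        slack = {bs: capacity[bs] - load[bs] for bs in capacity}  # h_j
        feasible = all(s >= 0 for s in slack.values())
        complementary = all(mu[bs] == 0.0 or slack[bs] == 0 for bs in capacity)
        if feasible and complementary:
            break
        # (31): projected multiplier update.
        mu = {bs: max(0.0, mu[bs] - step * slack[bs]) for bs in capacity}
    return choice, mu


# Toy example: two UEs compete for BS "A", which can admit only one of them.
toy_rewards = {"u1": {"A": 10.0, "B": 4.0}, "u2": {"A": 9.0, "B": 8.0}}
assignment, prices = smart_m_select(toy_rewards, {"A": 1, "B": 1})
```

In the toy example the price of the overloaded BS rises until the UE that loses least by moving switches to the other BS, which mirrors how the multipliers in (31) steer the per-UE choices toward the capacity constraint (25-1).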
Fig. 7. Handoff performance as a function of the number of FBSs.

number of handoffs. On the other hand, we would like to mention that the computational complexity of SMART is much lower than that of brute force algorithm. We find that the brute force algorithm is at least an order of magnitude lower than the other three policies.

Fig. 8. The comparison of number of handoffs with optimal solution.

8 CONCLUSION
In this paper, the SMART handoff policy is proposed for mmWave HetNets based on reinforcement learning. In

ACKNOWLEDGMENTS
This work was supported by the National Science Foundation of China under Grant number 61631005 and 61471089, and the Fundamental Research Funds for the Central Universities under Grant number ZYGX2015Z005.

REFERENCES
[1] B. V. Quang, R. V. Prasad, and I. Niemegeers, “A survey on Handoffs lessons for 60 GHz based,” IEEE Commun. Surveys Tutorials, vol. 14, no. 1, pp. 64–86, Jan.–Mar. 2012.
[2] G. Gódor, Z. Jakó, Á. Knapp, and S. Imre, “A survey of handover management in LTE-based multi-tier femtocell networks: Requirements, challenges and solutions,” Comput. Netw., vol. 76, pp. 17–41, 2015.
[3] 3GPP TS 36.331, “E-UTRA Radio Resource Control (RRC); Protocol specification (Release 9),” 2016.
[4] Qualcomm Europe, “Range expansion for efficient support of heterogeneous networks,” TSG-RAN WG1, 2008.
[5] S. Singh, M. N. Kulkarni, A. Ghosh, and J. G. Andrews, “Tractable model for rate in self-backhauled millimeter wave cellular networks,” IEEE J. Sel. Areas Commun., vol. 33, no. 10, pp. 2196–2211, Oct. 2015.
[6] M. R. Akdeniz, et al., “Millimeter wave channel modeling and cellular capacity evaluation,” IEEE J. Sel. Areas Commun., vol. 32, no. 6, pp. 1164–1179, Jun. 2014.
[7] A. Talukdar, M. Cudak, and A. Ghosh, “Handoff rates for millimeterwave 5G systems,” in Proc. IEEE 79th Veh. Technol. Conf., 2014, pp. 1–5.
[8] F. Guidolin, I. Pappalardo, A. Zanella, and M. Zorzi, “Context-aware handover policies in HetNets,” IEEE Trans. Wireless Commun., vol. 15, no. 3, pp. 1895–1906, Mar. 2016.
[9] A. H. Arani, M. J. Omidi, A. Mehbodniya, and F. Adachi, “A handoff algorithm based on estimated load for dense green 5G networks,” in Proc. IEEE Global Commun. Conf., 2015, pp. 1–7.
[10] Z. Guohua, P. Legg, and G. Hui, “A network controlled handover mechanism and its optimization in LTE heterogeneous networks,” in Proc. IEEE Wireless Commun. Netw. Conf., 2013, pp. 1915–1919.
[11] G. Araniti, J. Cosmas, A. Iera, A. Molinaro, A. Orsino, and P. Scopelliti, “Energy efficient handover algorithm for green radio networks,” in Proc. IEEE Int. Symp. Broadband Multimedia Syst. Broadcast., 2014, pp. 1–6.
[12] H. Leem, J. Kim, D. K. Sung, Y. Yi, and B.-H. Kim, “A novel handover scheme to support small-cell users in a HetNet environment,” in Proc. IEEE Wireless Commun. Netw. Conf., 2015, pp. 1978–1983.
[13] B. Linh, M. G. Larrode, R. V. Prasad, I. Niemegeers, and A. M. J. Koonen, “Radio-over-fiber based architecture for seamless wireless indoor communication in the 60 GHz band,” Comput. Commun., vol. 30, no. 18, pp. 3598–3613, 2007.
[14] M. Polese, “Performance Comparison of Dual Connectivity and Hard Handover for LTE-5G Tight Integration in mmWave cellular networks,” preprint arXiv:1607.05425, vol. 3, 2016. [Online]. Available: http://arxiv.org/abs/1607.04330
[15] M. Mezzavilla, S. Goyal, S. Panwar, S. Rangan, and M. Zorzi, “An MDP model for optimal handover decisions in mmWave cellular networks,” in Proc. IEEE Eur. Conf. Netw. Commun., 2016, pp. 100–105.
[16] M. Wang, A. Dutta, S. Buccapatnam, and M. Chiang, “Smart exploration in HetNets: Minimizing total regret with mmWave,” presented at the IEEE Int. Conf. Sens., Commun. Netw., London, U.K., 2016.
[17] M. N. Soorki, M. J. Abdel-rahman, A. Mackenzie, and W. Saad, “Joint access point deployment and assignment in mmWave networks with stochastic user orientation,” in Proc. 15th Int. Symp. Model. Optim. Mobile Ad Hoc Wireless Netw., 2017, pp. 1–6.
[18] International Telecommunication Union, “Requirements related to technical performance for IMT-Advanced radio interfaces,” ITU I.2134, 2009.
[19] H. Elshaer, M. N. Kulkarni, F. Boccardi, J. G. Andrews, and M. Dohler, “Downlink and uplink cell association with traditional macrocells and millimeter wave small cells,” IEEE Trans. Wireless Commun., vol. 15, no. 9, pp. 6244–6258, Sep. 2016.
[20] C. Phillips, D. Sicker, and D. Grunwald, “A survey of wireless path loss prediction and coverage mapping methods,” IEEE Commun. Surveys Tutorials, vol. 15, no. 1, pp. 255–270, Jan.-Mar. 2013.
[21] A. Kumar, D. Manjunath, and J. Kuri, Wireless Networking. Burlington, MA, USA: Morgan Kaufmann, 2008.
[22] E. Dahlman, S. Parkvall, J. Skold, and P. Beming, 3G Evolution: HSPA and LTE for Mobile Broadband. Oxford, U.K.: Academic Press, 2007.
[23] M. Giordani, M. Mezzavilla, and M. Zorzi, “Initial access in 5G mm-wave cellular networks,” IEEE Commun. Mag., vol. 54, no. 11, pp. 40–47, Nov. 2016.
[24] C. N. Barati, et al., “Directional cell discovery in millimeter wave cellular networks,” IEEE Trans. Wireless Commun., vol. 14, no. 12, pp. 1–13, Dec. 2015.
[25] M. Giordani, M. Mezzavilla, C. N. Barati, S. Rangan, and M. Zorzi, “Comparative analysis of initial access techniques in 5G mmWave cellular networks,” in Proc. Annu. Conf. Inform. Sci. Syst., 2016, pp. 268–273.
[26] V. Desai, et al., “Initial beamforming for mmWave communications,” in Proc. 48th Asilomar Conf. Signals Syst. Comput., 2015, pp. 1926–1930.
[27] A. Capone, I. Filippini, and V. Sciancalepore, “Context information for fast cell discovery in mm-wave 5G networks,” in Proc. IEEE Eur. Wireless Conf., 2015, pp. 1–6.
[28] F. Devoti, I. Filippini, and A. Capone, “Facing the millimeter-wave cell discovery challenge in 5G networks with context-awareness,” IEEE Access, vol. 4, pp. 8019–8034, 2016.
[29] F. Pantisano, M. Bennis, W. Saad, S. Valentin, and M. Debbah, “Matching with externalities for context-aware user-cell association in small cell networks,” in Proc. GLOBECOM Workshops, 2013, pp. 4483–4488. [Online]. Available: http://arxiv.org/abs/1307.2763
[30] H. Wang, L. Ding, P. Wu, Z. Pan, N. Liu, and X. You, “QoS-aware load balancing in 3GPP long term evolution multi-cell networks,” in Proc. IEEE Int. Conf. Commun., 2011, pp. 1–5.
[31] T. L. Lai and H. Robbins, “Asymptotically efficient adaptive allocation rules,” Advances Appl. Math., vol. 6, no. 1, pp. 4–22, 1985.
[32] P. Auer, N. Cesa-Bianchi, and P. Fischer, “Finite-time analysis of the multiarmed bandit problem,” Mach. Learning, vol. 47, no. 2/3, pp. 235–256, 2002.
[33] S. H. Low and D. E. Lapsley, “Optimization flow control, I: Basic algorithm and convergence,” IEEE/ACM Trans. Netw., vol. 7, no. 6, pp. 861–874, Dec. 1999.
[34] D. P. Bertsekas, Convex Optimization Theory. Belmont, MA, USA: Athena Scientific, 2009.

Yao Sun received the BS degree in mathematical sciences from the University of Electronic Science and Technology of China (UESTC). He is currently working towards the PhD degree at the National Key Laboratory of Science and Technology on Communications, UESTC. His research and study interests include intelligent access control, handoff and resource management in mobile networks based on machine learning and other data analytics.

Gang Feng (M’01-SM’06) received the BEng and MEng degrees in electronic engineering from the University of Electronic Science and Technology of China (UESTC), in 1986 and 1989, respectively, and the PhD degree in information engineering from The Chinese University of Hong Kong, in 1998. He joined the School of Electric and Electronic Engineering, Nanyang Technological University, in Dec. 2000 as an assistant professor and was promoted to associate professor in October 2005. At present he is a professor with the National Laboratory of Communications, University of Electronic Science and Technology of China. He has extensive research experience and has published widely in computer networking and wireless networking research. His research interests include resource management in wireless networks, next generation cellular networks, etc. He is a senior member of the IEEE.

Shuang Qin received the BS degree in electronic information science and technology, and the PhD degree in communication and information system from the University of Electronic Science and Technology of China (UESTC), in 2006 and 2012, respectively. He is currently an associate professor with the National Key Laboratory of Science and Technology on Communications in UESTC. His research interests include cooperative communication in wireless networks, data transmission in opportunistic networks and green communication in heterogeneous networks. He is a member of the IEEE.
Ying-Chang Liang (F’11) is a professor in the University of Electronic Science and Technology of China (UESTC), China, and also a professor in the University of Sydney, Australia. He was a principal scientist and technical advisor in the Institute for Infocomm Research (I2R), Singapore. His research interest lies in the general area of wireless networking and communications, with current focus on applying artificial intelligence, big data analytics and machine learning techniques to wireless network design and optimization. He was elected a fellow of the IEEE in December 2010, and was recognized by Thomson Reuters as a Highly Cited Researcher in 2014, 2015 and 2016. He received IEEE ComSoc's TAOS Best Paper Award in 2016, IEEE Jack Neubauer Memorial Award in 2014, the First IEEE ComSoc's APB Outstanding Paper Award in 2012, and the EURASIP Journal of Wireless Communications and Networking Best Paper Award in 2010. He also received the Institute of Engineers Singapore (IES)'s Prestigious Engineering Achievement Award in 2007, and the IEEE Standards Association's Outstanding Contribution Appreciation Award in 2011, for his contributions to the development of the IEEE 802.22 standard. He is now serving as the chair of the IEEE Communications Society Technical Committee on Cognitive Networks, an associate editor of the IEEE Transactions on Signal and Information Processing over Network, and an associate editor-in-chief of the World Scientific Journal on Random Matrices: Theory and Applications. He served as founding editor-in-chief of the IEEE Journal on Selected Areas in Communications: Cognitive Radio Series, and was the key founder of the new journal the IEEE Transactions on Cognitive Communications and Networking. He has been an (associate) editor of the IEEE Transactions on Wireless Communications, the IEEE Transactions on Vehicular Technology, and the IEEE Signal Processing Magazine. He was a distinguished lecturer of the IEEE Communications Society and the IEEE Vehicular Technology Society, and has been a member of the Board of Governors of the IEEE Asia-Pacific Wireless Communications Symposium since 2009. He served as Technical Program Committee (TPC) Chair of CROWN08 and DySPAN10, Symposium Chair of ICC12 and Globecom12, and General co-chair of ICCS10 and ICCS14. He serves as TPC Chair and Executive co-chair of Globecom17 to be held in Singapore. He is a fellow of the IEEE.

Tak-Shing Peter Yum (F’13) received primary and secondary school education in Hong Kong. He received the BS, MS, MPh, and PhD degrees from Columbia University, in 1974, 1975, 1977, and 1978 respectively. He joined Bell Telephone Laboratories in April 1978 working on switching and signaling systems for 2.5 years. Then, he taught at National Chiao Tung University, Taiwan for 2 years before joining The Chinese University of Hong Kong in 1982. He was appointed chairman of the IE Department two times and elected dean of Engineering for two terms (2004-2010). On June 1, 2010, he took no-pay leave from CUHK to serve as CTO of ASTRI (Hong Kong Applied Science and Technology Research Institute Company Limited, www.astri.org). He is currently a professor with Hunan University. He has published widely in Internet research with contributions to routing, buffer management, deadlock handling, message resequencing and multi-access protocols. He then branched out to work on cellular networks, lightwave networks, video distribution networks and 3G networks. His recent research is in the areas of RFID, sensor networks and wireless positioning technologies. He and student Lei Zhu were awarded the Best Paper Award of ACM MSWiM 2009 with paper title, “The Optimization of Framed Aloha based RFID Algorithms.” He and another student Xu Chen were awarded the Honorable Mention Award (the first runner-up of the best paper award) with paper title “Cross Entropy Approach for Patrol Route Planning in Dynamic Environments” in IEEE international conference on Intelligence and Security Informatics (ISI), 2010. He is a fellow of the IEEE.