1456 IEEE TRANSACTIONS ON MOBILE COMPUTING, VOL. 17, NO. 6, JUNE 2018
The SMART Handoff Policy for Millimeter Wave Heterogeneous Cellular Networks
Yao Sun, Gang Feng, Senior Member, IEEE, Shuang Qin, Member, IEEE, Ying-Chang Liang, Fellow, IEEE, and Tak-Shing Peter Yum, Fellow, IEEE
Abstract—The millimeter wave (mmWave) radio band is promising for next-generation heterogeneous cellular networks (HetNets) because its large available bandwidth can meet the increasing demand of mobile traffic. However, if the conventional Reference Signal Received Power (RSRP) based handoff mechanism is used, the unique propagation characteristics of the mmWave band cause a large number of redundant handoffs in mmWave HetNets, which bring heavy signaling overhead, low energy efficiency and increased user equipment (UE) outage probability. In this paper, we propose a reinforcement learning based handoff policy named SMART to reduce the number of handoffs while maintaining user Quality of Service (QoS) requirements in mmWave HetNets. In SMART, we determine handoff trigger conditions by taking into account both mmWave channel characteristics and QoS requirements of UEs. Furthermore, we propose reinforcement-learning based BS selection algorithms for different UE densities. Numerical results show that in typical scenarios, SMART can significantly reduce the number of handoffs when compared with traditional handoff policies without learning.
Index Terms—Handoff, HetNets, millimeter wave, reinforcement learning
Y. Sun, G. Feng, S. Qin, and Y-C. Liang are with the National Key Laboratory of Science and Technology on Communications, University of Electronic Science and Technology of China, Chengdu 611731, P.R. China. E-mail: sunyao@std.uestc.edu.cn, {fenggang, blueqs, ycliang}@uestc.edu.cn.
T-S. Peter Yum is with College of Computer Science and Electronic Engineering, Hunan University, Changsha 611731, P.R. China. E-mail: tsyum@ie.cuhk.edu.hk.
Manuscript received 11 Jan. 2017; revised 19 Sept. 2017; accepted 2 Oct. 2017. Date of publication 13 Oct. 2017; date of current version 3 May 2018. (Corresponding author: Yao Sun.)
For information on obtaining reprints of this article, please send e-mail to: reprints@ieee.org, and reference the Digital Object Identifier below.
Digital Object Identifier no. 10.1109/TMC.2017.2762668
1536-1233 © 2017 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

1 INTRODUCTION

The 5th generation (5G) networks are expected to support the exponentially increasing demand of mobile traffic. A simple way to increase the network capacity is to allocate more bandwidth to 5G networks. Since the radio spectrum from 300 MHz to 3 GHz is very crowded, an effective solution is to design the 5G networks as two-tier heterogeneous cellular networks (HetNets) where the macrocell is supported by the traditional cellular band, while some small or femto cells are supported by the globally available spectrum at the millimeter wave (mmWave) band ranging from 30 GHz to 300 GHz [1]. This network architecture is called mmWave HetNets.

The key propagation properties at the mmWave band are large propagation path loss and high sensitivity to blockage. These properties cause many design challenges for mmWave HetNets, including integrated circuit design, beamforming design, user association and handoff mechanisms. In particular, handoff is crucial for keeping users connected while moving around [1], [2]. Handoff mechanisms affect not only the service quality of users but also network performance, such as throughput and energy efficiency. Conventional handoff mechanisms are based on Reference Signal Received Power (RSRP) measured by user equipments (UEs) [3]. In HetNets which are composed of traditional macro base stations and small base stations with low transmit power, handoff policies are mostly based on RSRP with Cell Range Expansion (CRE) [4]. With the introduction of the mmWave band into cellular networks, it needs to co-exist with traditional communication bands, forming a complex heterogeneous network [5]. Moreover, mmWave channel quality often changes rapidly and intermittently [6]. Therefore, using conventional handoff mechanisms in mmWave HetNets may lead to the ping-pong effect, causing high outage probability and redundant handoffs.

Handoffs in mmWave HetNets are more frequent as mmWave cells are smaller. It was shown in [7] that the average handoff interval can be as low as 0.75 second in typical scenarios. A separate study [1] showed by computer simulation that more than 61 percent of handoffs are unnecessary. The very large number of redundant handoffs causes heavy signaling overhead, low energy efficiency and high UE outage probability.

To reduce redundant handoffs in traditional HetNets, two parameters, hysteresis and threshold, are introduced in 3GPP [3]. For a specific UE, handoff is triggered if the RSRP of the current serving BS is lower than the threshold value and the RSRP of the target BS is stronger than that of the serving BS by hysteresis. This method, however, is not suitable for use in mmWave HetNets due to highly dense BS deployments, small BS coverage and fast varying mmWave channel quality. There are also occasions where the "two-parameter" method misses necessary handoffs. Under these circumstances, artificial intelligence tools that incorporate information on the surrounding environment can be used to design a smart handoff mechanism in mmWave HetNets.

In this paper, we propose a reinforcement learning based handoff policy named SMART for mmWave HetNets. Our
design objective is to reduce the number of unnecessary handoffs while guaranteeing the QoS of UEs. SMART consists of two parts. Part 1 determines the handoff trigger condition from the mmWave channel characteristics and QoS requirements of UEs. Part 2 is on BS selection, and is carried out by two algorithms, SMART-S and SMART-M, for different UE density circumstances. SMART-S chooses the target BS for a single UE based on the Upper Confidence Bound (UCB) algorithm, which can achieve logarithmic regret when compared with the optimal algorithm that uses global perfect information. SMART-M is used for the dense UE distribution circumstance to choose BSs for multiple UEs triggering handoffs in the same measurement report period. We formulate it as a 0-1 integer programming problem, and solve it by Lagrange dual decomposition with relaxation.

In the following, we introduce related works and the system model in Sections 2 and 3, respectively. In Section 4 we present the framework of our proposed handoff policy SMART. In Sections 5 and 6, we present the BS selection algorithms for a single UE and for multiple UEs, respectively. We compare the performance of SMART with traditional handoff policies in Section 7 and conclude the paper in Section 8.

2 RELATED WORK

Here, we present the handoff policies for traditional HetNets and mmWave HetNets separately.

2.1 Handoff Strategies for Traditional HetNets

In recent years, research on handoff has focused mainly on HetNets operating in the traditional frequency band of 900 MHz-2.4 GHz, considering one or more factors including RSRP, QoS of UEs, UE mobility characteristics, BS load, etc., [8], [9], [10], [11], [12]. In [8], a handoff policy is proposed that considers context parameters, such as user speed, channel gains and cell load information. The BS selection decision is based on a Markov Decision Process (MDP) model with the aim of maximizing UE average capacity. In [9], the authors proposed a handoff algorithm based on the estimated load of BSs. They combined handoff decisions with a BS sleeping policy so as to improve system energy efficiency. [10] and [11] are mainly focused on the improvement of handoff trigger conditions. The authors in [10] proposed a new handoff triggering mechanism named Network Controlled Handover (NCH) for 3GPP Long Term Evolution (LTE) HetNets. NCH can optimize handoff trigger parameters such as the Channel Quality Indication (CQI) threshold based on the statistics of the handoff performance. In [11], the authors proposed a new handoff algorithm aiming at the efficient management of BS transmitted power and the reduction of unnecessary handoffs. The authors of [12] proposed a novel handoff policy based on cooperation-based cell clustering in densely deployed HetNets to reduce handoff signaling overhead.

2.2 Handoff Strategies for mmWave HetNets

Thus far, there is little research work on handoff in mmWave HetNets [13], [14], [15], [16]. The authors of [13] proposed the Radio-over-Fiber (RoF) network architecture for mmWave communications, which facilitates flexible and cost effective deployment of distributed antennas. They then proposed the Extended Cell (EC) in the RoF architecture. An EC is a group of adjacent cells or antennas that transmit the same data over the same frequency channel for a specific UE. This can increase overlapping areas and thus decrease the outage probability of the UE during handoffs. The method is suitable for indoor environments with low UE mobility. The authors of [14] proposed a dual connectivity (DC) network architecture to deal with the handoff between two radio access technologies (RATs): mmWave and LTE.

Focusing on the optimization of handoff policies for mmWave HetNets, the authors of [15] solved the BS selection problem by MDP through combining the contributions of handoff overhead, cell load and channel conditions into a reward function. The handoff policy can achieve high throughput while decreasing the number of handoffs. As the computation complexity of solving the MDP is formidable, this strategy cannot readily be applied to densely deployed HetNets. The authors of [16] developed an online learning-based approach to solve the single UE network selection problem in heterogeneous wireless networks consisting of mmWave and other RATs, such as Wi-Fi and LTE. This work is focused on RAT selection for a single UE and aims at maximizing the long-term throughput of the UE. We will develop an approach in the following that reduces the number of unnecessary handoffs while guaranteeing the QoS of UEs. Besides, due to the random line-of-sight mmWave link, the authors of [17] suggest assigning more than one mmWave link to each user equipment so as to decrease the signaling overhead for handoff in mmWave networks. They propose a joint access point placement and mobile device assignment scheme for mmWave networks with the aim of minimizing the number of access points while satisfying the line of sight coverage of mobile devices.

3 SYSTEM MODEL

3.1 Network Scenario

Consider a densely deployed HetNet with $M$ femto cells underlying a macrocell as shown in Fig. 1. Let $\mathcal{M}$ be the set of femto base stations (FBSs). FBSs can use either mmWave or the traditional cellular frequency shared with the macro base station (MBS). A given ratio of the FBSs use mmWave frequency. Let $\mathcal{M}_m$ be the set of mmWave FBSs (denoted as mm-FBS), and $\mathcal{M}_t$ be the set of traditional FBSs (denoted as Tr-FBS). UEs move randomly in the HetNet.

Fig. 1. The system model of mmWave HetNets.

3.2 Propagation Model

First, we discuss the channel model of the mmWave band. We assume that the channel of mm-FBS is based on
3GPP Standard probabilistic LOS-NLOS models [18], meaning that the channel condition between UE and mm-FBS can alternate between the two states, Line-of-Sight (LOS) and Non-Line-of-Sight (NLOS). LOS state means that a line-of-sight mmWave link between UE and mm-FBS exists. The channel state transition probability is related to the environment, and this probability is typically unknown [16]. Note that the channel state for UEs may be different, even when they are located at the same position and associated with the same mm-FBS. This is due to blockages, and thus the UEs may have different SNR. Similar to that in [5], [6], we assume that the path loss model is

$$L(d) = \alpha + 10\eta \log_{10}(d) + \xi \ [\mathrm{dB}], \quad \xi \sim \mathcal{N}(0, \theta^2), \tag{1}$$

where $d$ is the distance in meters, $\alpha$ and $\eta$ are the least square fits of floating intercept and slope over the measured distances (30 to 200 m), and $\theta^2$ is the lognormal shadowing variance. The values of $\alpha$, $\eta$ and $\theta$ are different for LOS and NLOS states [5], [6]. Since interference can be ignored for mm-FBS, for a specific UE, say UE $n$, the SNR when associated with mm-FBS $j$ can be written as

$$\mathrm{SNR}_n^j = \frac{P_j\, c\, L(d)^{-1}}{\sigma^2}, \tag{2}$$

where $P_j$ is the transmit power of mm-FBS $j$, $\sigma^2$ is the noise power and $c$ is the antenna gain. We assume that all mm-FBSs are equipped with directional antennas, which are necessary to support beamforming and beam tracking for the mmWave system. On the other hand, we assume that UEs are equipped with omnidirectional antennas, and thus the antenna gains are only accounted for at the mm-FBS side [19]. Similar to that in [5], [19], we assume that the antenna gain model can be expressed as

$$c(\theta) = \begin{cases} c_{\max}, & \text{if } |\theta| \le \theta_s / 2 \\ c_{\min}, & \text{otherwise,} \end{cases} \tag{3}$$

where $\theta$ is the angle between UE and mm-FBS, and $\theta_s$ is the width of the antenna main lobe. When a UE is associated with an mm-FBS, in order to maintain the mmWave communication link, beam tracking could be used. We assume perfect beam tracking is performed, and thus the transmission direction of the UE is always in the main lobe, so as to enjoy a high antenna gain.

Next, we present the traditional radio band channel model. We assume that the MBS and Tr-FBSs are equipped with omnidirectional antennas to guarantee coverage area [19]. For traditional band links, we need to consider co-channel interference due to shared bandwidth deployment. The SINR of UE $n$ associated with BS $j$ can be expressed as

$$\mathrm{SINR}_n^j = \begin{cases} \dfrac{P_j g_{nj}}{\sum_{k \in \mathcal{M}_t} P_k g_{nk} + \sigma^2}, & j \text{ is MBS} \\[2ex] \dfrac{P_j g_{nj}}{\sum_{k \in \{\mathcal{M}_t \cup \mathrm{MBS}\} \setminus \{j\}} P_k g_{nk} + \sigma^2}, & j \in \mathcal{M}_t, \end{cases} \tag{4}$$

where $g_{nj}$ is the channel gain between UE $n$ and BS $j$, which includes path loss and shadowing. Since we assume that Tr-FBSs and UEs are equipped with omnidirectional antennas, there is no extra antenna gain. For path loss, we use the flexible path loss exponent model [20]

$$PL(d) = 10\,(\text{path loss exponent})\,\log_{10}(d) + 20 \log_{10} f + 32.45, \tag{5}$$

where $d$ is the distance in meters, the factor multiplying $10\log_{10}(d)$ is the path loss exponent and $f$ is the carrier frequency in MHz. For NLOS environments, a larger exponent is used [20]. For shadow fading, we use a zero mean Gaussian random variable to describe it [21].

We assume that all BSs allocate bandwidth resources to their serving UEs uniformly. According to the Shannon capacity formula, the achievable transmission rate for UE $n$ associated with BS $j$ can be written as

$$r_n^j = \begin{cases} \dfrac{B_m}{U_j} \log_2\!\left(1 + \mathrm{SNR}_n^j\right), & j \in \mathcal{M}_m \\[2ex] \dfrac{B_t}{U_j} \log_2\!\left(1 + \mathrm{SINR}_n^j\right), & j \in \{\mathcal{M}_t \cup \mathrm{MBS}\}, \end{cases} \tag{6}$$

where $B_m$ ($B_t$) is the bandwidth of mm-FBS (Tr-FBS and MBS) and $U_j$ is the total number of UEs served by BS $j$.

3.3 Initial Access Model

In this section, we illustrate how to discover a new BS, and establish a possible connection in case a handover is performed. We assume that the cell search procedure for the traditional band is identical to that in LTE, i.e., the MBS and Tr-FBSs perform cell search by transmitting omnidirectional synchronization signals [22]. For the mmWave system, the 4G-LTE initial access procedure is infeasible due to the problem of discovery range mismatch [23], [24]. In our model, we adopt an efficient initial cell search scheme, iterative search [25], [26], which performs a two-stage scanning procedure of the angular space. In detail, the space is partitioned into several wide sectors, and each wide sector is divided into several narrow sectors. In the first phase, the BS transmits pilots over wide sectors. In the second phase, the BS refines its search within the best wide sector by steering narrow beams, and thus finds the best narrow sector [23]. All the pilots are transmitted on a directional mmWave channel.

Technically, the cell search procedure is independent of the target BS selection policy. Hence, although the initial cell search scheme could affect the absolute value of the number of handoffs [27], [28], it does not affect the relative performance enhancement of the proposed SMART policy. Intuitively, some new cell search schemes, such as those proposed in [27] and [28] which use context information to speed up the cell search process, could be implemented with the proposed SMART handoff policy in mmWave HetNets. As this is beyond the scope of the work, we use the afore-mentioned iterative search scheme for mmWave band initial cell search.

3.4 QoS Model

Similar to that in [29], [30], we use two factors to describe the QoS requirement: the minimum threshold of transmission rate $\gamma_n^{\min}$ and the endurable time $\tau_n$. The endurable time is the maximum time a UE is allowed to have the transmission rate lower than the minimum threshold. We state that the QoS of UE $n$ is satisfied when the following condition holds

$$\exists\, t' \in [t - \tau_n, t], \ \text{s.t.} \ r_n^j(t') \ge \gamma_n^{\min}. \tag{7}$$

Furthermore, to classify the type of service more precisely, we introduce a third factor: the maximum threshold of transmission rate, denoted by $\gamma_n^{\max}$. Let $C = \{C_1, C_2, \ldots, C_L\}$ be the set of all service types, and specify that the service of UE $n$ belongs to type $C_i$ when $\tau_n \in [\tau_i, \tau_{i+1})$, $\gamma_n^{\min} \in [\gamma_i^{\min}, \gamma_{i+1}^{\min})$ and $\gamma_n^{\max} \in [\gamma_i^{\max}, \gamma_{i+1}^{\max})$. We assume that UEs in the system move at a random speed and in a random direction.
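The mmWave link model of Section 3.2 and the QoS condition of Section 3.4 can be sketched numerically. The following Python fragment is a minimal illustration, not the authors' simulator: it evaluates the mean path loss of (1) with the shadowing term omitted, the two-level antenna gain of (3), the SNR of (2), the rate of (6) and the QoS test of (7). All function names and all numeric values are hypothetical and chosen only for illustration.

```python
import math

def path_loss_db(d_m, alpha, eta):
    """Mean path loss of (1) in dB; the random shadowing term is omitted."""
    return alpha + 10.0 * eta * math.log10(d_m)

def antenna_gain(theta, theta_s, c_max, c_min):
    """Two-level gain of (3): c_max inside the main lobe, c_min outside."""
    return c_max if abs(theta) <= theta_s / 2.0 else c_min

def snr_mm(p_tx_w, gain, pl_db, noise_w):
    """SNR of (2): P_j * c * L(d)^-1 / sigma^2, with L converted from dB."""
    return p_tx_w * gain / (10.0 ** (pl_db / 10.0)) / noise_w

def rate_bps(bandwidth_hz, n_users, sxnr):
    """Rate of (6): equal bandwidth share times log2(1 + SNR or SINR)."""
    return bandwidth_hz / n_users * math.log2(1.0 + sxnr)

def qos_satisfied(rate_window, gamma_min):
    """Condition (7): at least one rate sample within the last tau_n
    seconds (the window passed in) reaches gamma_min."""
    return any(r >= gamma_min for r in rate_window)

# Hypothetical parameters (assumptions, not values taken from the paper).
pl = path_loss_db(d_m=100.0, alpha=61.4, eta=2.0)
c = antenna_gain(theta=0.1, theta_s=0.5, c_max=10.0, c_min=0.1)
snr = snr_mm(p_tx_w=1.0, gain=c, pl_db=pl, noise_w=1e-10)
r = rate_bps(bandwidth_hz=1e9, n_users=4, sxnr=snr)
```

The rate window passed to `qos_satisfied` is assumed to hold the samples of $r_n^j(t')$ for $t' \in [t - \tau_n, t]$; how those samples are collected is left open here.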
4 FRAMEWORK OF SMART HANDOFF POLICY

The 3GPP Standard defines six handoff events for cellular networks [3], with Event A2 and Event A3 being the most common ones in HetNets. Our proposed SMART handoff policy focuses on these two handoff events, and other handoff decisions remain the same as those in 3GPP.

4.1 Handoff Trigger Conditions

Event A2 occurs when the RSRP of the serving BS becomes worse than a threshold [3], and the trigger condition can be expressed as

$$\mathrm{RSRP}_n^j < \mathrm{threshold} - \mathrm{Hys}, \tag{8}$$

where Hys is a hysteresis parameter added for reducing redundant handoffs (e.g., ping-pong effect). Event A2 handoff is performed when the serving BS cannot fulfill the minimum UE QoS requirement. Thus, in SMART, the trigger condition can be written as

$$\forall\, t' \in [t - \tau_n, t], \quad r_n^j(t') < \gamma_n^{\min}, \tag{9}$$

where $\tau_n$ and $\gamma_n^{\min}$ are UE service type parameters. This change can avoid many unnecessary handoffs. Once inequality (9) is satisfied for UE $n$, an Event A2 handoff is triggered, and the UE needs to select a suitable target BS.

Event A3 occurs when a neighbor BS becomes offset better than the serving BS [3], and the trigger condition can be expressed as

$$\mathrm{RSRP}_n^k \ge \mathrm{RSRP}_n^j + \mathrm{offset}, \tag{10}$$

for a time-to-trigger (TTT) period, where $\mathrm{RSRP}_n^k$ and $\mathrm{RSRP}_n^j$ are the RSRPs of target BS $k$ and current serving BS $j$ measured by UE $n$ respectively, and offset and TTT are two parameters defined in 3GPP. Once a UE experiences a handoff in this event, it means that the UE switches to a better BS which can improve its QoS although the current serving BS can fulfill the minimum QoS requirement. Thus, SMART uses the following three trigger conditions

$$\exists\, t' \in [t - \tau_n, t], \ \text{s.t.} \ r_n^j(t') \ge \gamma_n^{\min}, \tag{11-1}$$

$$r_n^k(t) \ge r_n^j(t) + \mathrm{offset}, \tag{11-2}$$

$$\gamma_n^{\max} - \gamma_n^{\min} > (\text{threshold specified in the QoS requirement}). \tag{11-3}$$

Condition (11-1) states that the current serving BS can fulfill the minimum UE QoS requirement. Condition (11-2) requires that the transmission rate of the target BS $k$ is at least offset higher than that of the serving BS $j$. Condition (11-3) indicates that the difference in transmission rate between the maximum threshold and the minimum threshold is greater than a given value specified in the QoS requirement. Similar to the traditional handoff mechanism, when the above three conditions hold for TTT time, Event A3 handoff is triggered.

4.2 BS Selection

Once handoff trigger conditions are met, UEs need to select suitable target BSs. In SMART, we use reinforcement learning for selecting BSs to reduce the number of unnecessary handoffs. We design two BS selection algorithms, SMART-S and SMART-M, for different UE density circumstances. SMART-S, with low computational complexity, is for a specific UE. It is suitable for the sparse UE density circumstance. SMART-M is a joint optimal policy for multiple UEs triggering handoffs in the same measurement report period. It is suitable for the dense UE distribution circumstance with a central controller.

5 SMART-S ALGORITHM FOR SINGLE TARGET BS SELECTION

Note that once a specific BS satisfies the trigger conditions of Event A3, the target BS is determined. We therefore focus on the BS selection for Event A2. Let $\mathcal{A}_n(t)$ be the set of admissible BSs when UE $n$ triggers Event A2 handoff at time $t$,

$$\mathcal{A}_n(t) = \{k \mid r_n^k(t) \ge \gamma_n^{\min} + G, \ \forall k \in \mathcal{M} \cup \mathrm{MBS}\}, \tag{12}$$

where $G$ is a criteria offset parameter. For UE $n$ with volume of data $Q_n$ to be transmitted, we use $H_n$ to denote the number of handoffs. Our goal is to select the BS in set $\mathcal{A}_n(t)$ with minimum $H_n$ once the Event A2 condition is triggered.

5.1 Reinforcement-Learning Framework

We model the BS selection problem as a reinforcement learning problem. It consists of three elements: agent, environment and action. In our model shown in Fig. 2, the agent is a specific UE $n$, the environment is the channel conditions of BSs, and the action is the BS selection policy. The aim is to maximize the total reward by a sequence of BS selections.

Fig. 2. Reinforcement learning based BS selection framework.

Our objective is to minimize the total number of handoffs $H_n$. As it is difficult to incorporate $H_n$ into the reward function directly, we make a transformation as follows. Let the reward function $R_n^k(t)$ be defined as the volume of transmitted data from time $t$ to $t_n^k$ when UE $n$ switches to BS $k$ at time $t$, or

$$R_n^k(t) = \int_t^{t_n^k} r_n^k(t)\, dt. \tag{13}$$

Proposition 1. Minimizing the total number of handoffs $H_n$ for UE $n$ is equivalent to solving the proposed reinforcement learning problem with the reward function defined in (13).
Proof. Let $t_n^k$ in (13) equal the time when the next handoff for UE $n$ is triggered after time $t$, and define a sort function $F$ on a finite set $X$ as

$$F(x) = k, \quad x \in X \text{ and } x \text{ is the } k\text{-th smallest element in } X. \tag{14}$$

The objective of the above reinforcement learning model is to find the optimal policy $\pi^*$:

$$\pi^* = \arg\max_{\pi} \ \mathbb{E}_{\pi}\!\left[\sum_{F(t_n^k)=1}^{K} R_n^k(t)\right], \tag{15}$$

where $K$ is the maximum value of $F(t_n^k)$, which is equal to the number of handoffs in the time period.

If we fix the volume of transmitted data of UE $n$ as $Q_n$, applying policy $\pi^*$ can minimize the total number of handoffs of UE $n$ when transmitting $Q_n$ data, which equals our optimization objective $\min H_n$. $\square$

5.2 Expected Reward Estimation

As $t_n^k$ and $r_n^k(t)$ in (13) are unknown random variables, the expected reward $\mathbb{E}[R_n^k(t)]$ can only be estimated from historical information. We use $\hat{R}_n^k(t)$ to denote the observed value of $R_n^k(t)$, which can be obtained once UE $n$ switches to BS $k$. However, a UE may not stay around a specific BS $k$ for a long time, and thus we cannot have enough historical information to estimate $R_n^k(t)$ accurately. To get around this, we define the type reward $\bar{R}_{C_n}^k(T_{C_n}^k)$ as

$$\bar{R}_{C_n}^k(0) = 0, \tag{16-1}$$

$$\bar{R}_{C_n}^k(T_{C_n}^k + 1) = \frac{T_{C_n}^k\, \bar{R}_{C_n}^k(T_{C_n}^k) + \hat{R}_n^k(t)}{T_{C_n}^k + 1}, \tag{16-2}$$

where $T_{C_n}^k$ denotes the number of times that BS $k$ is selected by UEs with service type $C_n$. We take this observed value $\bar{R}_{C_n}^k(T_{C_n}^k)$ as the mean reward for UEs with the same service type $C_n$, and each UE uses its own observed reward $\hat{R}_n^k(t)$ to update the type reward $\bar{R}_{C_n}^k(T_{C_n}^k)$ after a handoff occurs based on (16-2). Thus, the expected reward can be estimated as

$$\mathbb{E}[R_n^k(t)] = \begin{cases} \bar{R}_{C_n}^k(T_{C_n}^k), & \text{if } n \in C_n \\ 0, & \text{otherwise.} \end{cases} \tag{17}$$

Since the handoff trigger conditions of UEs with the same service type are similar, the type reward $\bar{R}_{C_n}^k(T_{C_n}^k)$ can be accurately estimated by reinforcement learning.

5.3 BS Selection Algorithm

We cannot always select the BS with the highest reward, since the well-known exploration versus exploitation dilemma exists in reinforcement learning. This dilemma states that there is a tradeoff between improving the UE's knowledge about the reward distributions of BSs (exploration) and switching to the BS with the highest empirical mean reward (exploitation). Regret is a concept to measure the performance of a policy [16], which is defined as the difference in total reward between the adopted policy and the global optimal policy. In our problem, the regret of policy $\pi$ after $W$ handoffs occur for UE $n$ can be expressed as

$$\mathrm{Regret}_{\pi}(W) = \sum_{F(t_n^k)=1}^{W} \left[ R_{\pi^*}(t_n^k) - R_n^k(t_n^k)_{\pi} \right], \tag{18}$$

where $R_{\pi^*}(t_n^k)$ is the reward of the optimal policy $\pi^*$ at time $t_n^k$. It was shown in [31] that the best regret is logarithmic with respect to the number of handoffs $W$. Based on that, the authors of [32] proposed an Upper Confidence Bound (UCB) algorithm to deal with this tradeoff. It can achieve logarithmic regret with low computation complexity. The UCB policy states that the agent chooses machine $j^*$ at each decision time according to the following index

$$j^* = \arg\max_{j} \left( x_j + \sqrt{\frac{2 \ln W}{W_j}} \right), \tag{19}$$

where $x_j$ is the average reward obtained from machine $j$, $W_j$ is the number of times machine $j$ has been chosen and $W$ is the overall number of decisions so far.

The BS selection algorithm when UE $n$ triggers Event A2 handoffs is based on UCB. We set the index of BS $k$ for UE $n$ as $\bar{R}_{C_n}^k(T_{C_n}^k) + \ell \sqrt{2 \ln H_n / T_{C_n}^k}$, where $\ell = \max_{k \in \mathcal{A}_n,\, C_n \in C} \bar{R}_{C_n}^k(T_{C_n}^k)$ and $H_n$ is the total number of handoffs for UE $n$ so far. Thus, the policy is selecting BS $k^*$ in set $\mathcal{A}_n$ for UE $n$ once an Event A2 handoff occurs, where $k^*$ can be expressed as

$$k^* = \arg\max_{k} \left( \bar{R}_{C_n}^k(T_{C_n}^k) + \ell \sqrt{\frac{2 \ln H_n}{T_{C_n}^k}} \right). \tag{20}$$

We summarize our proposed SMART-S BS selection algorithm in Algorithm 1.

Algorithm 1. SMART-S BS Selection Algorithm Based on UCB
Input: Network topology (BS and UE distributions, ratio of mm-FBSs); service type of UEs.
Output: BS selection decisions $k^*$.
1: Initialization: obtain $T_{C_n}^k$, $H_n$, $\bar{R}_{C_n}^k(T_{C_n}^k)$ in time $T$ based on traditional handoff policy
2: while handoff conditions are met for a certain UE $n$ do
3:   if Event A2 handoff then
4:     Judge service type $C_n$ of UE $n$
5:     $\bar{R}_{C_n}^k(T_{C_n}^k + 1) \leftarrow \dfrac{T_{C_n}^k\, \bar{R}_{C_n}^k(T_{C_n}^k) + \hat{R}_n^k(t)}{T_{C_n}^k + 1}$
6:     $k^* = \arg\max_k \left( \bar{R}_{C_n}^k(T_{C_n}^k) + \ell \sqrt{2 \ln H_n / T_{C_n}^k} \right)$
7:     $T_{C_n}^k \leftarrow T_{C_n}^k + 1$, $H_n \leftarrow H_n + 1$
8:   else
9:     switch to the unique target BS $k$
10:  end if
11: end while

5.4 Properties of SMART-S

The SMART-S algorithm does not perform iteration and thus does not have a convergence issue. We investigate here the
performance bound and signaling overhead. The performance bound is established from Fact 1 and Corollary 1.

Fact 1. For all $K > 1$, if policy UCB is run on $K$ machines having arbitrary reward distributions $P_1, \ldots, P_K$ with support in $[0, 1]$, its expected regret can achieve a logarithmic bound.

Proof. cf. [32] for proofs. $\square$

Corollary 1. The proposed UCB-based SMART-S BS selection policy achieves logarithmic regret with respect to the total number of handoffs $H_n$.

Proof. We construct a new reinforcement learning model which is the same as our above proposed model except for the reward function. For the sake of convenience, we denote the above proposed and the new reinforcement learning models as RL 1 and RL 2 respectively. The reward function of RL 2 is defined as

$$Y_n^k(t) = \frac{R_n^k(t)}{\ell}, \tag{21}$$

where $R_n^k(t)$ is the reward function of RL 1 defined in (13). As $\ell$ is a constant, RL 1 and RL 2 have the same policy solution. Since $Y_n^k(t)$ has a bounded support in $[0, 1]$, we use the UCB algorithm to solve the RL 2 problem, and thus the index in (19) can be expressed as

$$k^* = \arg\max_{k} \left( y_k + \sqrt{\frac{2 \ln H_n}{T_{C_n}^k}} \right), \tag{22}$$

where $y_k$ is the average reward obtained from BS $k$, which equals $\bar{R}_{C_n}^k(T_{C_n}^k) / \ell$. Thus, the index can be rewritten as

$$k^* = \arg\max_{k} \left( \frac{\bar{R}_{C_n}^k(T_{C_n}^k)}{\ell} + \sqrt{\frac{2 \ln H_n}{T_{C_n}^k}} \right). \tag{23}$$

According to Fact 1, we know that the regret bound is logarithmic for the RL 2 problem, and thus for RL 1. Since $\ell$ is a constant, we use

$$k^* = \arg\max_{k} \left( \bar{R}_{C_n}^k(T_{C_n}^k) + \ell \sqrt{\frac{2 \ln H_n}{T_{C_n}^k}} \right) \tag{24}$$

to replace the index in (23), which is the same as the proposed BS selection policy. $\square$

Next we discuss the signaling overhead for the SMART-S BS selection algorithm. When the handoff trigger conditions are satisfied for a specific UE, it reports its service type to the admissible BSs in set $\mathcal{A}_n$, and the BSs calculate and send their corresponding indexes to the UE. The UE switches to the target BS, say BS $k$, determined by using (20). When the next handoff occurs, the UE obtains the value of $\hat{R}_n^k(t)$, and transmits it to BS $k$. The BS uses it to update the expected reward and index according to (16) and (17). Thus, the number of signaling exchanges needed is $2|\mathcal{A}_n|$ and each signaling exchange uses several bits.

6 SMART-M ALGORITHM FOR MULTIPLE TARGET BS SELECTION

The BS selection algorithm discussed in Section 5 focuses on individual UEs. However, in the time interval between two adjacent measurement report periods, there may be multiple UEs that need handoff, especially for dense UE distribution. Moreover, multiple UEs may trigger handoffs in the same time period or even simultaneously in typical scenarios, such as a group of UEs riding in a moving bus. We therefore design the SMART-M algorithm for optimal multi-BS selection.

6.1 Problem Formulation Based on Learning Results

Let $\mathcal{N}$ be the set of UEs sending handoff requests to the network central controller in a measurement period and let $N = |\mathcal{N}|$. As the period is usually short (e.g., in tens of milliseconds), we assume that the BS selection decisions are made at the end of individual periods. Here, the objective function $Y$ is again chosen as the volume of transmitted data before the next handoff occurs for these $N$ UEs. Also we use $\bar{R}_{C_i}^j(T_{C_i}^j)$ to estimate $\mathbb{E}[R_n^k(t)]$ based on the above reinforcement learning. The problem is formulated as

$$\max \ Y = \sum_{i \in \mathcal{N}} \sum_{j \in \mathcal{A}_i} x_{ij}\, \bar{R}_{C_i}^j(T_{C_i}^j) \tag{25}$$

$$\text{s.t.} \quad \sum_{i \in \mathcal{N}} x_{ij} \le N_j, \quad \forall j \in \cup_{i \in \mathcal{N}} \mathcal{A}_i, \tag{25-1}$$

$$\sum_{j \in \mathcal{A}_i} x_{ij} = 1, \quad \forall i \in \mathcal{N}, \tag{25-2}$$

$$x_{ij} \in \{0, 1\}, \quad \forall i \in \mathcal{N}, \ \forall j \in \cup_{i \in \mathcal{N}} \mathcal{A}_i, \tag{25-3}$$

where $x_{ij}$ is a binary variable indicating whether UE $i$ switches to BS $j$, $N_j$ is the current connection capacity of BS $j$ (equal to the maximum connection capacity minus the number of current serving UEs), and $\mathcal{A}_i$ is the set of admissible BSs for UE $i$. Constraint (25-1) ensures that the number of UEs which switch to the same BS does not exceed the current BS connection capacity. Constraints (25-2) and (25-3) guarantee that each UE can only be associated with one BS at a time. For convenience, we use set $\mathcal{A}$ to denote $\cup_{i \in \mathcal{N}} \mathcal{A}_i$ in the rest of the paper.

6.2 BS Selection

The problem stated in (25) is a special case of a well-known NP-hard problem, the Generalized Assignment Problem (GAP), with $O(N^{|\mathcal{A}|})$ complexity using a brute force algorithm. Obviously it is infeasible to use the brute force algorithm for solving dense deployment mmWave HetNets due to prohibitively high computational complexity. Instead, we propose the following efficient heuristics. We first relax the binary variables $x_{ij}$ in constraints (25-3) to be continuous variables in $[0, 1]$. We then exploit the Lagrange dual decomposition method [33] to solve this optimization problem.

After relaxing $x_{ij}$, problem (25) becomes a linear problem with Lagrange function
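$$L(\mathbf{x}, \boldsymbol{\mu}) = \sum_{i \in \mathcal{N}} \sum_{j \in \mathcal{A}_i} x_{ij} \bar{R}_{C_i}^{j}(T_{C_i}^{j}) - \sum_{j \in \mathcal{A}} \mu_j \Big( \sum_{i \in \mathcal{N}} x_{ij} - N_j \Big), \tag{26}$$

where $\mu_j$ is the Lagrange multiplier. For a fixed vector $\boldsymbol{\mu}$, the Lagrange dual function can be expressed as

$$g(\boldsymbol{\mu}) = \sup_{\mathbf{x}} L(\mathbf{x}, \boldsymbol{\mu}) \tag{27}$$

$$\text{s.t.} \quad \sum_{j \in \mathcal{A}_i} x_{ij} = 1, \quad \forall i \in \mathcal{N}, \tag{27-1}$$

$$0 \le x_{ij} \le 1, \quad \forall i \in \mathcal{N}, \ \forall j \in \mathcal{A}, \tag{27-2}$$

and the dual problem is $\min_{\boldsymbol{\mu}} g(\boldsymbol{\mu})$. Rewriting function $g(\boldsymbol{\mu})$ yields

$$g(\boldsymbol{\mu}) = \sup_{\mathbf{x}} \sum_{i \in \mathcal{N}} \sum_{j \in \mathcal{A}_i} x_{ij} \big( \bar{R}_{C_i}^{j}(T_{C_i}^{j}) - \mu_j \big) + \sum_{j \in \mathcal{A}} \mu_j N_j. \tag{28}$$

Since it does not include the cross-term of $x_{ij}$, we can exchange the computation order as:

$$g(\boldsymbol{\mu}) = \sum_{i \in \mathcal{N}} \sup_{x_{ij},\, j \in \mathcal{A}_i} \sum_{j \in \mathcal{A}_i} x_{ij} \big( \bar{R}_{C_i}^{j}(T_{C_i}^{j}) - \mu_j \big) + \sum_{j \in \mathcal{A}} \mu_j N_j. \tag{29}$$

Thus, we can solve the following problem for each UE $i$ separately,

$$g_i(\boldsymbol{\mu}) = \sup_{x_{ij},\, j \in \mathcal{A}_i} \sum_{j \in \mathcal{A}_i} x_{ij} \big( \bar{R}_{C_i}^{j}(T_{C_i}^{j}) - \mu_j \big) \tag{30}$$

$$\text{s.t.} \quad \sum_{j \in \mathcal{A}_i} x_{ij} = 1, \tag{30-1}$$

$$0 \le x_{ij} \le 1, \quad \forall j \in \mathcal{A}_i. \tag{30-2}$$

Since we want to find a binary solution of $x_{ij}$, for a fixed vector $\boldsymbol{\mu}$, problem (30) is described as: for UE $i$, we choose a BS $j$ from set $\mathcal{A}_i$ to maximize the value of $R_{C_i}^{j}(t) - \mu_j$. Therefore, when $\boldsymbol{\mu}$ is fixed, problem (27) can be solved by choosing the optimal BS $j$ for each UE separately. Then we minimize $g(\boldsymbol{\mu})$ over $\boldsymbol{\mu}$ to obtain the optimal value $\boldsymbol{\mu}^*$ for the dual problem. We use the negative gradient direction to update $\mu_j$ subject to $\mu_j \ge 0$,

$$\mu_j(k+1) = \Big[ \mu_j(k) - \delta(k) \Big( N_j - \sum_{i \in \mathcal{N}} x_{ij} \Big) \Big]^{+}, \quad \forall j \in \mathcal{A}, \tag{31}$$

where $\delta(k) > 0$ is the update step size, and is given by

$$\delta(k) = \frac{g(\boldsymbol{\mu}^k) - g^k}{\|\mathbf{h}^k\|^2}, \quad \forall k \ge 0, \tag{32}$$

where $g^k$ is an estimate of the optimal value $g^*$. The procedure of updating $g^k$ is given by

$$g^k = \min_{1 \le j \le k} g(\boldsymbol{\mu}^j) - \varepsilon_k, \tag{33}$$

and $\varepsilon_k$ is updated according to

$$\varepsilon_{k+1} = \begin{cases} \rho\, \varepsilon_k & \text{if } g(\boldsymbol{\mu}^{k+1}) \le g^k, \\ \max\{\beta \varepsilon_k, \varepsilon\} & \text{otherwise,} \end{cases} \tag{34}$$

where $\varepsilon$, $\beta$ and $\rho$ are fixed positive constants with $\beta < 1$ and $\rho \ge 1$ [34].

For linear programs, strong duality holds. Therefore, the minimum value of $g(\boldsymbol{\mu})$ is equal to the maximum value of the original problem. The solution process is as follows: we first obtain the maximum value $g(\boldsymbol{\mu})$ over $\mathbf{x}$ with fixed $\boldsymbol{\mu}$, and then minimize $g(\boldsymbol{\mu})$ over $\boldsymbol{\mu}$, denoted as $g(\boldsymbol{\mu}^*)$. The optimal binary solution $\mathbf{x}^*$ is obtained with the corresponding solution $\boldsymbol{\mu}^*$. According to $\mathbf{x}^*$ we make BS selections for those handoff UEs.

Similar to that in Section 5, $\bar{R}_{C_i}^{j}(T_{C_i}^{j})$ is updated once the next handoff occurs according to (14) and (15). Note that the reinforcement-learning process in Section 2 can improve the accuracy of the value of $\bar{R}_{C_i}^{j}(T_{C_i}^{j})$ and thus the solution of this optimization problem. We summarize the SMART-M algorithm in Algorithm 2.

Algorithm 2. Joint Optimal SMART-M BS Selection Algorithm.
Input: Network topology (BS and UE distributions, ); handoff UEs $\mathcal{N}$.
Output: BS selection decisions $\mathbf{x}^*$.
Initialization:
1: Judge service type of UEs
2: Determine admissible BSs
3: The BSs send the value of $\bar{R}_{C_i}^{j}(T_{C_i}^{j})$ and $N_j$ to the central controller
BS selection decisions:
4: $\mathbf{x}^{0} \leftarrow \mathbf{0}$, $\mathbf{x}^{0} \leftarrow$ current connections, $k \leftarrow 1$
5: while $\mathbf{x}^{k} \ne \mathbf{x}^{k-1}$ do
6:   $k \leftarrow k + 1$
7:   for each UE $i \in \mathcal{N}$ do
8:     solve problem (30)
9:   end for (obtain $\mathbf{x}^{k}$)
10:  update $\boldsymbol{\mu}^{k}$ according to (31)
11: end while
12: $\mathbf{x}^* \leftarrow \mathbf{x}^{k}$

6.3 Convergence and Computational Complexity of SMART-M
To prove the convergence of the SMART-M BS selection algorithm, we need Propositions 2 and 3.

Proposition 2. Let $\{\boldsymbol{\mu}^k\}$ be the sequence generated by (31). Then, for all non-negative $|\mathcal{A}|$-dimensional vectors $\mathbf{v}$ and $k \ge 0$

$$\|\boldsymbol{\mu}^{k+1} - \mathbf{v}\|^2 \le \|\boldsymbol{\mu}^{k} - \mathbf{v}\|^2 - 2\delta(k)\big(g(\boldsymbol{\mu}^k) - g(\mathbf{v})\big) + \delta(k)^2 \|\mathbf{h}(\mathbf{x}^k)\|^2,$$

where $\mathbf{h}(\mathbf{x}^k)$ is an $|\mathcal{A}|$-dimensional vector with elements $h_j = N_j - \sum_{i \in \mathcal{N}} x_{ij}^k$, $|\mathcal{A}|$ is the cardinality of set $\mathcal{A}$, and $\mathbf{x}^k$ is a vector that satisfies $L(\mathbf{x}^k, \boldsymbol{\mu}^k) = \sup_{\mathbf{x}} L(\mathbf{x}, \boldsymbol{\mu}^k) = g(\boldsymbol{\mu}^k)$.

Proof. According to (31), we have

$$\|\boldsymbol{\mu}^{k+1} - \mathbf{v}\|^2 = \|\boldsymbol{\mu}^{k} - \delta(k)\mathbf{h}(\mathbf{x}^k) - \mathbf{v}\|^2 = \|\boldsymbol{\mu}^{k} - \mathbf{v}\|^2 - 2\delta(k)(\boldsymbol{\mu}^{k} - \mathbf{v})^T \mathbf{h}(\mathbf{x}^k) + \delta(k)^2 \|\mathbf{h}(\mathbf{x}^k)\|^2. \tag{35}$$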
As $L(\mathbf{x}, \boldsymbol{\mu})$ in (26) is linear and $\mathbf{h}(\mathbf{x}) = \partial L(\mathbf{x}, \boldsymbol{\mu}) / \partial \boldsymbol{\mu}$, for all vectors $\mathbf{x} \in [0, 1]^{|\mathcal{N}||\mathcal{A}|}$ we have

$$L(\mathbf{x}, \boldsymbol{\mu}^k) - L(\mathbf{x}, \mathbf{v}) = (\boldsymbol{\mu}^k - \mathbf{v})^T \mathbf{h}(\mathbf{x}). \tag{36}$$

For vector $\mathbf{x}^k$,

$$L(\mathbf{x}^k, \boldsymbol{\mu}^k) - L(\mathbf{x}^k, \mathbf{v}) = g(\boldsymbol{\mu}^k) - L(\mathbf{x}^k, \mathbf{v}) \ge g(\boldsymbol{\mu}^k) - \sup_{\mathbf{x}} L(\mathbf{x}, \mathbf{v}) = g(\boldsymbol{\mu}^k) - g(\mathbf{v}). \tag{37}$$

Combining (36) and (37) yields

$$(\boldsymbol{\mu}^k - \mathbf{v})^T \mathbf{h}(\mathbf{x}^k) \ge g(\boldsymbol{\mu}^k) - g(\mathbf{v}). \tag{38}$$

Combining (35) and (38) yields

$$\|\boldsymbol{\mu}^{k+1} - \mathbf{v}\|^2 \le \|\boldsymbol{\mu}^{k} - \mathbf{v}\|^2 - 2\delta(k)\big(g(\boldsymbol{\mu}^k) - g(\mathbf{v})\big) + \delta(k)^2 \|\mathbf{h}(\mathbf{x}^k)\|^2.$$

□

Proposition 3. Assuming that step size $\delta(k)$ is determined by (32), (33) and (34), if $g^* > -\infty$ then $\lim_{k \to \infty} \inf g(\boldsymbol{\mu}^k) \le g^* + \varepsilon$.

Proof. As $N_j$ and $x_{ij}^k$ are bounded, $h_j = N_j - \sum_{i \in \mathcal{N}} x_{ij}^k$ and hence the vector $\mathbf{h}(\mathbf{x}^k)$ are also bounded. Since $\mathbf{h}(\mathbf{x}) = \partial L(\mathbf{x}, \boldsymbol{\mu}) / \partial \boldsymbol{\mu}$, there exists a scalar $c$ such that

$$c \ge \sup\{\|\partial g(\boldsymbol{\mu})\|\}. \tag{39}$$

Combining Proposition 2 and (39), we can conclude that our problem satisfies the necessary conditions of Proposition 6.3.6 in [34]. By applying this proposition with $\gamma_k = 1$, we obtain Proposition 3, and the convergence of SMART-M is proved. □

Next we discuss the computational complexity of the SMART-M BS selection algorithm. At each iteration, we decompose $g(\boldsymbol{\mu})$ into $N$ sub-problems $g_i(\boldsymbol{\mu})$, and the computational complexity of $g_i(\boldsymbol{\mu})$ is $O(|\mathcal{A}_i|)$. Thus, the complexity of the SMART-M BS selection algorithm is $O(k|\mathcal{N}||\mathcal{A}|)$, where $k$ is the number of iterations. In most simulation experiments, the algorithm converges in fewer than 10 iterations with a total run time of several milliseconds, which can satisfy real-time requirements.

We again evaluate the signaling overhead for the SMART-M BS selection algorithm. The UEs that trigger handoff conditions need to notify their service types to their admissible BSs, which calculate and send the corresponding value $\bar{R}_{C_i}^{j}(T_{C_i}^{j})$ to the central controller. The central controller makes handoff decisions based on the SMART-M policy, and sends these decisions to the UEs. Thus, the number of signaling exchanges needed is $\sum_{i=1}^{N} |\mathcal{A}_i| + |\mathcal{A}| + |\mathcal{N}|$, and each signaling exchange uses several bits.

6.4 Further Discussions on SMART-S and SMART-M
After discussing the details of the two algorithms separately, in this section we consider the two algorithms together from two aspects: (1) the relation between the two algorithms; and (2) the implementation of the algorithms.

First, the two algorithms are designed for different UE density scenarios. SMART-S is appropriate for sparse UE density, while SMART-M is designed for dense UE distribution. Specifically, SMART-S chooses the target BS for a single UE without considering the states and decisions of other UEs. SMART-M can achieve a joint optimal BS selection policy for multiple UEs which are triggered to perform handoffs in the same measurement report period. The computational complexity of SMART-M is higher than that of SMART-S. Thus, we choose SMART-S or SMART-M according to the UE density.

On the other hand, SMART-M makes handoff decisions for multiple UEs by solving an optimization problem with unknown parameters. We employ the learning algorithm of SMART-S to evaluate the unknown parameters in the optimization framework. In more detail, SMART-M needs to solve (25) with the expected reward $\bar{R}_{C_i}^{j}(T_{C_i}^{j})$ in the reinforcement learning model of SMART-S, which is the estimated value of $E[R_n^k(t)]$. In this sense, SMART-S can be used to enhance the accuracy of $\bar{R}_{C_i}^{j}(T_{C_i}^{j})$ from historical data, and thus improve the performance of SMART-M.

Second, let us discuss the implementations of the two algorithms, in order to further clarify their relation. For the selection between the two algorithms, SMART-S is feasible for any UE density. From Corollary 1, we can see that SMART-S achieves logarithmic regret with respect to the total number of handoffs. In other words, even if we always run SMART-S for any UE density, we can still achieve at least a logarithmic regret bound and enjoy performance improvement. For dense UE distributions, we can run the SMART-M algorithm to further improve the handoff performance at some computational cost.

In our system, we adopt a simple selection policy between the two algorithms. We define a UE density threshold $\Gamma$ to identify sparse or dense UE distribution. When the number of UEs which send handoff requests to the controller in a measurement period is lower than $\Gamma$, SMART-S is selected; otherwise SMART-M is selected. The other handoff procedures remain the same as those in the conventional handoff policy.

7 NUMERICAL RESULTS
In this section, we compare the performance of SMART with two conventional handoff policies as follows. (1) Rate-based handoff (RBH). RBH has similar trigger conditions as those in 3GPP. When choosing target BSs for handoffs, the ones with maximum transmission data rates are chosen (instead of maximum RSRP in 3GPP [3]). (2) SINR-based handoff (SBH). SBH has the same handoff trigger conditions as SMART and uses maximum SINR for target BS selection.

7.1 Simulation Settings
We consider a two-tier HetNet deployed in an urban area, and the HetNet consists of an MBS and a varying number of mm-FBSs, Tr-FBSs and UEs. The MBS is located at the center of a circular area with radius equal to 500 m, and both mm-FBSs and Tr-FBSs are randomly distributed in the area. The transmit power of the MBS, mm-FBS and Tr-FBS is set to 46 dBm, 30 dBm and 20 dBm, respectively. Both the number and region of blockages in mm-FBS are randomly generated. Similar to that in [6], when UEs in mm-FBS move to
blockage regions, the channel state is assumed to be NLOS with parameters $\alpha = 72$ and $\eta = 2.92$ in (1). In non-blockage areas, the channel state is assumed to be LOS with parameters $\alpha = 61.4$ and $\eta = 2$ in (1). Other parameters related to the mmWave band path loss model are the same as those in [6]. For the traditional band, the carrier frequency is set to 2 GHz, and we use the path loss model in (5) with a path loss exponent of 2 for LOS and 3 for NLOS. The bandwidth allocated to MBS/Tr-FBSs and mm-FBSs is 20 MHz and 500 MHz respectively. The noise power is set to −101 dBm and −77 dBm for the traditional and mmWave bands respectively [5]. We assume that the UEs are randomly distributed in the area and move in a random direction at a random speed. We assume that perfect initial cell search can be performed, and thus UEs can discover BSs correctly when a handoff occurs. Table 1 summarizes the system parameters we use in the simulations. Our simulations are implemented in MATLAB and carried out on a PC equipped with an Intel i5 4-core 3.2 GHz processor and 4 GB RAM.

TABLE 1
Simulation Parameters

Parameter                            Value
MBS radius                           500 m
Power of MBS                         46 dBm
Power of mm-FBSs                     30 dBm
Power of Tr-FBSs                     20 dBm
Bandwidth of MBS/Tr-FBS              20 MHz
Bandwidth of mm-FBS                  500 MHz
Path loss exponent for LOS           2
Path loss exponent for NLOS          3
cmax                                 18 dB
cmin                                 2 dB
Carrier frequency                    2,000 MHz
Parameters for LOS path loss         α = 61.4, η = 2
Parameters for NLOS path loss        α = 72, η = 2.92
Noise power for mmWave band          −77 dBm
Noise power for traditional band     −101 dBm

7.2 Numerical Results and Discussions
In Experiment 1, we compare the number of handoffs and system throughput of the three handoff policies. In this experiment, we fix the number of FBSs and UEs as 100 and 500 respectively. The average UE movement speed is 5 m/s. Fig. 3 shows the number of handoffs and system throughput for the three handoff policies with different mm-FBS ratios in 1,000 seconds. Fig. 3a shows that when the mm-FBS ratio is 0.2, the total number of handoffs for RBH, SBH and SMART is $8.3 \times 10^4$, $6.1 \times 10^4$ and $4.4 \times 10^4$, respectively. These numbers show that SMART can reduce handoffs by 47 and 28 percent when compared with RBH and SBH respectively. For an mm-FBS ratio of 0.8, the reduction percentages are 50 and 46 percent. Note that fewer handoffs imply reduced signaling overhead, energy consumption and UE outage probability. Fig. 3b shows that the system throughput of all three handoff policies increases with the ratio of mm-FBSs because of the increasing available bandwidth in mm-FBSs. The system throughput of RBH is higher than that of the other two schemes since the handoff trigger conditions in RBH take into account only the UE data rate. In other words, in RBH a UE may frequently perform handoffs to achieve the maximum data rate, while ignoring the negative effect of handoffs. We also find that the difference in system throughput between SMART and RBH is relatively small (2 percent for an mm-FBS ratio of 0.8, 5 percent for a ratio of 0.2), implying that a significant handoff performance gain can be accomplished with a small compromise on throughput.

Fig. 3. Handoff performance as a function of mm-FBS ratio.

In Experiment 2, we evaluate the average running time per handoff (RT) of the three handoff policies with a varying number of FBSs. The simulation settings remain the same as those in Experiment 1. RT directly reflects the computational complexity of a handoff policy. Fig. 4 shows the RT of the three handoff policies as a function of the number of FBSs. From the figure, we can see that the running time of SMART increases approximately linearly with the number of FBSs. Moreover, we can see that although the RT of SMART is always the largest, it is still in the same order of magnitude (within 2-3 times) as that of the other two policies.

Fig. 4. Running time comparisons for handoff policies.

In Experiment 3, we examine the average signaling overhead (SOH), which is defined as the number of signaling exchanges per handoff, with a varying number of FBSs and the same simulation settings. Fig. 5 shows the SOH of the three handoff policies as a function of the number of FBSs. Note that in the experiment, the SMART-S or SMART-M algorithm is selected according to the number of handoff UEs in each measurement report period, and we count the total number of signaling exchanges in 1,000 s as the SOH of SMART. From the analysis in Section 5 (D) and Section 6 (C), we know that the SOH of both SMART-S and SMART-M increases linearly with the number of FBSs, with different slopes. From the figure, we can observe that the SOH for SMART increases approximately linearly with the number of FBSs, which agrees with the theoretical analysis. Moreover, we find that the curve of SMART is very close to that of RBH, which means that the SMART handoff policy introduces almost no additional signaling overhead. This is because almost all the handoff procedures of SMART remain the same as those in the conventional handoff policy, except for reward update and handoff algorithm selection.
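As a rough illustration of the bookkeeping behind Experiment 3, the following Python sketch selects SMART-S or SMART-M from the number of handoff UEs in one measurement report period and counts signaling exchanges with the expressions from the overhead analyses: $2|\mathcal{A}_n|$ per UE for SMART-S and $\sum_i |\mathcal{A}_i| + |\mathcal{A}| + |\mathcal{N}|$ for SMART-M. The type `HandoffUE`, the function names, and the example threshold values are hypothetical and chosen only for illustration; they are not taken from the authors' MATLAB simulation code. The sketch also assumes that $\mathcal{A}$ can be taken as the union of the admissible sets $\mathcal{A}_i$.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class HandoffUE:
    """A UE that triggered a handoff in the current measurement period (hypothetical type)."""
    ue_id: int
    admissible_bss: FrozenSet[int]  # the admissible BS set A_i of this UE


def select_algorithm(num_handoff_ues: int, gamma_threshold: int) -> str:
    """Selection rule of Section 6.4: SMART-S below the threshold, SMART-M otherwise."""
    return "SMART-S" if num_handoff_ues < gamma_threshold else "SMART-M"


def soh_smart_s(ue: HandoffUE) -> int:
    """Signaling exchanges for one UE under SMART-S: 2 * |A_n|."""
    return 2 * len(ue.admissible_bss)


def soh_smart_m(ues: List[HandoffUE]) -> int:
    """Signaling exchanges under SMART-M: sum_i |A_i| + |A| + |N|.

    Assumption: A is the union of all admissible sets A_i.
    """
    all_bss = set().union(*(ue.admissible_bss for ue in ues))
    return sum(len(ue.admissible_bss) for ue in ues) + len(all_bss) + len(ues)


def period_soh(ues: List[HandoffUE], gamma_threshold: int) -> int:
    """Total signaling exchanges in one measurement report period."""
    if select_algorithm(len(ues), gamma_threshold) == "SMART-S":
        return sum(soh_smart_s(ue) for ue in ues)
    return soh_smart_m(ues)


if __name__ == "__main__":
    # Two handoff UEs with overlapping admissible BS sets (made-up example data).
    ues = [
        HandoffUE(0, frozenset({1, 2, 3, 4})),
        HandoffUE(1, frozenset({2, 3, 4, 5})),
    ]
    for gamma in (2, 3):  # illustrative thresholds only
        print(gamma, select_algorithm(len(ues), gamma), period_soh(ues, gamma))
```

Both counts grow linearly in the sizes of the admissible sets, which is consistent with the approximately linear SOH trend reported for SMART as the number of FBSs increases.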
Fig. 5. Signaling overhead comparisons for handoff policies.

In Experiment 4, we examine the effect of UE movement speed at an mm-FBS ratio of 0.5, with the other parameters the same as in Experiment 1. Fig. 6 shows the number of handoffs and system throughput for the three handoff policies as a function of the mean UE movement speed. From Fig. 6a, we see that from a fast walking speed of 2 m/s (7.2 km/h) to a slow driving speed of 14 m/s (50 km/h), the number of handoffs increases slightly for all three policies. The relative advantage of SMART remains. As expected, Fig. 6b shows that the system throughput of all three policies decreases with UE movement speed due to the faster change of channel quality.

Fig. 6. Relationship between handoff performance and UE speed.

In Experiment 5, we examine the performance of the handoff policies for a varying number of FBSs while using a fixed mm-FBS ratio of 0.5. Other parameters remain the same as those of Experiment 1. Fig. 7 shows the number of handoffs and system throughput as a function of the number of FBSs. From Fig. 7a we can see that the number of handoffs for SMART is always significantly smaller than that of the other two policies. When the number of FBSs increases from 40 to 140, the number of handoffs for SMART increases slightly. Fig. 7b shows that the throughput of SMART and SBH increases with the number of FBSs due to more available wireless resources.

In Experiment 6, we examine the optimality of the SMART policy. The SMART-M algorithm cannot achieve the exact optimum solution due to the relaxation in solving problem (25). Hence, we compare the SMART policy with the optimal solution, denoted by SMART-OPT, which is obtained by using an integer programming solver, in small scale scenarios. In the experiments, we set the number of BSs to 20, and vary the number of UEs from 50 to 200. Other simulation settings are the same as those in Experiment 1. Fig. 8 shows the comparison of SMART-OPT with the other three handoff policies in terms of the number of handoffs in 500 seconds. From this figure, we can see that the difference between SMART-OPT and SMART is rather small, which means that the SMART policy can reach a near-optimal performance in terms of the
number of handoffs. On the other hand, we would like to mention that the computational complexity of SMART is much lower than that of the brute force algorithm. We find that the brute force algorithm is at least an order of magnitude slower than the other three policies.

Fig. 7. Handoff performance as a function of the number of FBSs.

Fig. 8. The comparison of number of handoffs with optimal solution.

8 CONCLUSION
In this paper, the SMART handoff policy is proposed for mmWave HetNets based on reinforcement learning. In SMART, the handoff trigger conditions are determined by taking into account both mmWave channel characteristics and QoS requirements of UEs. SMART has two BS selection algorithms for different UE density circumstances. SMART-S is for a single UE and uses reinforcement learning for BS selection. SMART-M is for multiple UEs and uses a heuristic for the simultaneous identification of the best target BSs. The computational complexity of SMART is much lower than that of the brute force algorithm for calculating the optimal solution. Moreover, as SMART is based on learning, it can be implemented in a distributed manner. Numerical results have shown that the performance of SMART is near the optimal solution. Without sacrificing UE QoS, SMART can reduce the number of handoffs by about 50 percent when compared with handoff policies without machine learning.

ACKNOWLEDGMENTS
This work was supported by the National Science Foundation of China under Grant numbers 61631005 and 61471089, and the Fundamental Research Funds for the Central Universities under Grant number ZYGX2015Z005.

REFERENCES
[1] B. V. Quang, R. V. Prasad, and I. Niemegeers, “A survey on Handoffs lessons for 60 GHz based,” IEEE Commun. Surveys Tutorials, vol. 14, no. 1, pp. 64–86, Jan.–Mar. 2012.
[2] G. Godor, Z. Jakó, Á. Knapp, and S. Imre, “A survey of handover management in LTE-based multi-tier femtocell networks: Requirements, challenges and solutions,” Comput. Netw., vol. 76, pp. 17–41, 2015.
[3] 3GPP TS 36.331, “E-UTRA Radio Resource Control (RRC); Protocol specification (Release 9),” 2016.
[4] Qualcomm Europe, “Range expansion for efficient support of heterogeneous networks,” TSG-RAN WG1, 2008.
[5] S. Singh, M. N. Kulkarni, A. Ghosh, and J. G. Andrews, “Tractable model for rate in self-backhauled millimeter wave cellular networks,” IEEE J. Sel. Areas Commun., vol. 33, no. 10, pp. 2196–2211, Oct. 2015.
[6] M. R. Akdeniz, et al., “Millimeter wave channel modeling and cellular capacity evaluation,” IEEE J. Sel. Areas Commun., vol. 32, no. 6, pp. 1164–1179, Jun. 2014.
[7] A. Talukdar, M. Cudak, and A. Ghosh, “Handoff rates for millimeterwave 5G systems,” in Proc. IEEE 79th Veh. Technol. Conf., 2014, pp. 1–5.
[8] F. Guidolin, I. Pappalardo, A. Zanella, and M. Zorzi, “Context-aware handover policies in HetNets,” IEEE Trans. Wireless Commun., vol. 15, no. 3, pp. 1895–1906, Mar. 2016.
[9] A. H. Arani, M. J. Omidi, A. Mehbodniya, and F. Adachi, “A handoff algorithm based on estimated load for dense green 5G networks,” in Proc. IEEE Global Commun. Conf., 2015, pp. 1–7.
[10] Z. Guohua, P. Legg, and G. Hui, “A network controlled handover mechanism and its optimization in LTE heterogeneous networks,” in Proc. IEEE Wireless Commun. Netw. Conf., 2013, pp. 1915–1919.
[11] G. Araniti, J. Cosmas, A. Iera, A. Molinaro, A. Orsino, and P. Scopelliti, “Energy efficient handover algorithm for green radio networks,” in Proc. IEEE Int. Symp. Broadband Multimedia Syst. Broadcast., 2014, pp. 1–6.
[12] H. Leem, J. Kim, D. K. Sung, Y. Yi, and B.-H. Kim, “A novel handover scheme to support small-cell users in a HetNet environment,” in Proc. IEEE Wireless Commun. Netw. Conf., 2015, pp. 1978–1983.
[13] B. Linh, M. G. Larrode, R. V. Prasad, I. Niemegeers, and A. M. J. Koonen, “Radio-over-fiber based architecture for seamless wireless indoor communication in the 60 GHz band,” Comput. Commun., vol. 30, no. 18, pp. 3598–3613, 2007.
[14] M. Polese, “Performance Comparison of Dual Connectivity and Hard Handover for LTE-5G Tight Integration in mmWave cellular networks,” preprint arXiv:1607.05425, vol. 3, 2016. [Online]. Available: http://arxiv.org/abs/1607.04330
[15] M. Mezzavilla, S. Goyal, S. Panwar, S. Rangan, and M. Zorzi, “An MDP model for optimal handover decisions in mmWave cellular networks,” in Proc. IEEE Eur. Conf. Netw. Commun., 2016, pp. 100–105.
[16] M. Wang, A. Dutta, S. Buccapatnam, and M. Chiang, “Smart exploration in HetNets: Minimizing total regret with mmWave,” presented at the IEEE Int. Conf. Sens., Commun. Netw., London, U.K., 2016.
[17] M. N. Soorki, M. J. Abdel-rahman, A. Mackenzie, and W. Saad, “Joint access point deployment and assignment in mmWave networks with stochastic user orientation,” in Proc. 15th Int. Symp. Model. Optim. Mobile Ad Hoc Wireless Netw., 2017, pp. 1–6.
[18] International Telecommunication Union, “Requirements related to technical performance for IMT-Advanced radio interfaces,” ITU I.2134, 2009.
[19] H. Elshaer, M. N. Kulkarni, F. Boccardi, J. G. Andrews, and M. Dohler, “Downlink and uplink cell association with traditional macrocells and millimeter wave small cells,” IEEE Trans. Wireless Commun., vol. 15, no. 9, pp. 6244–6258, Sep. 2016.
[20] C. Phillips, D. Sticker, and D. Grunwald, “A survey of wireless path loss prediction and coverage mapping methods,” IEEE Commun. Surveys Tutorials, vol. 15, no. 1, pp. 255–270, Jan.–Mar. 2013.
[21] A. Kumar, D. Manjunath, and J. Kuri, Wireless Networking. Burlington, MA, USA: Morgan Kaufmann, 2008.
[22] E. Dahlman, S. Parkvall, J. Skold, and P. Beming, 3G Evolution: HSPA and LTE for Mobile Broadband. Oxford, U.K.: Academic Press, 2007.
[23] M. Giordani, M. Mezzavilla, and M. Zorzi, “Initial access in 5G mm-wave cellular networks,” IEEE Commun. Mag., vol. 54, no. 11, pp. 40–47, Nov. 2016.
[24] C. N. Barati, et al., “Directional cell discovery in millimeter wave cellular networks,” IEEE Trans. Wireless Commun., vol. 14, no. 12, pp. 1–13, Dec. 2015.
[25] M. Giordani, M. Mezzavilla, C. N. Barati, S. Rangan, and M. Zorzi, “Comparative analysis of initial access techniques in 5G mmWave cellular networks,” in Proc. Annu. Conf. Inform. Sci. Syst., 2016, pp. 268–273.
[26] V. Desai, et al., “Initial beamforming for mmWave communications,” in Proc. 48th Asilomar Conf. Signals Syst. Comput., 2015, pp. 1926–1930.
[27] A. Capone, I. Filippini, and V. Sciancalepore, “Context information for fast cell discovery in mm-wave 5G networks,” in Proc. IEEE Eur. Wireless Conf., 2015, pp. 1–6.
[28] F. Devoti, I. Filippini, and A. Capone, “Facing the millimeter-wave cell discovery challenge in 5G networks with context-awareness,” IEEE Access, vol. 4, pp. 8019–8034, 2016.
[29] F. Pantisano, M. Bennis, W. Saad, S. Valentin, and M. Debbah, “Matching with externalities for context-aware user-cell association in small cell networks,” in Proc. GLOBECOM Workshops, 2013, pp. 4483–4488. [Online]. Available: http://arxiv.org/abs/1307.2763
[30] H. Wang, L. Ding, P. Wu, Z. Pan, N. Liu, and X. You, “QoS-aware load balancing in 3GPP long term evolution multi-cell networks,” in Proc. IEEE Int. Conf. Commun., 2011, pp. 1–5.
[31] T. L. Lai and H. Robbins, “Asymptotically efficient adaptive allocation rules,” Advances Appl. Math., vol. 6, no. 1, pp. 4–22, 1985.
[32] P. Auer, N. Cesa-Bianchi, and P. Fischer, “Finite-time analysis of the Multiarmed bandit problem,” Mach. Learning, vol. 47, no. 2/3, pp. 235–256, 2002.
[33] S. H. Low and D. E. Lapsley, “Optimization flow control. I. Basic algorithm and convergence,” IEEE/ACM Trans. Netw., vol. 7, no. 6, pp. 861–874, Dec. 1999.
[34] D. P. Bertsekas, Convex Optimization Theory. Belmont, MA, USA: Athena Scientific, 2009.

Yao Sun received the BS degree in mathematical sciences from the University of Electronic Science and Technology of China (UESTC). He is currently working towards the PhD degree at the National Key Laboratory of Science and Technology on Communications, UESTC. His research and study interests include intelligent access control, handoff and resource management in mobile networks based on machine learning and other data analytics.

Gang Feng (M’01-SM’06) received the BEng and MEng degrees in electronic engineering from the University of Electronic Science and Technology of China (UESTC), in 1986 and 1989, respectively, and the PhD degree in information engineering from The Chinese University of Hong Kong, in 1998. He joined the School of Electric and Electronic Engineering, Nanyang Technological University, in Dec. 2000 as an assistant professor and was promoted to associate professor in October 2005. At present he is a professor with the National Laboratory of Communications, University of Electronic Science and Technology of China. He has extensive research experience and has published widely in computer networking and wireless networking research. His research interests include resource management in wireless networks, next generation cellular networks, etc. He is a senior member of the IEEE.

Shuang Qin received the BS degree in electronic information science and technology, and the PhD degree in communication and information system from the University of Electronic Science and Technology of China (UESTC), in 2006 and 2012, respectively. He is currently an associate professor with the National Key Laboratory of Science and Technology on Communications in UESTC. His research interests include cooperative communication in wireless networks, data transmission in opportunistic networks and green communication in heterogeneous networks. He is a member of the IEEE.
Ying-Chang Liang (F’11) is a professor in the University of Electronic Science and Technology of China (UESTC), China, and also a professor in the University of Sydney, Australia. He was a principal scientist and technical advisor in the Institute for Infocomm Research (I2R), Singapore. His research interest lies in the general area of wireless networking and communications, with current focus on applying artificial intelligence, big data analytics and machine learning techniques to wireless network design and optimization. He was elected a fellow of the IEEE in December 2010, and was recognized by Thomson Reuters as a Highly Cited Researcher in 2014, 2015 and 2016. He received the IEEE ComSoc’s TAOS Best Paper Award in 2016, the IEEE Jack Neubauer Memorial Award in 2014, the First IEEE ComSoc’s APB Outstanding Paper Award in 2012, and the EURASIP Journal of Wireless Communications and Networking Best Paper Award in 2010. He also received the Institute of Engineers Singapore (IES)’s Prestigious Engineering Achievement Award in 2007, and the IEEE Standards Association’s Outstanding Contribution Appreciation Award in 2011, for his contributions to the development of the IEEE 802.22 standard. He is now serving as the chair of the IEEE Communications Society Technical Committee on Cognitive Networks, an associate editor of the IEEE Transactions on Signal and Information Processing over Network, and an associate editor-in-chief of the World Scientific Journal on Random Matrices: Theory and Applications. He served as founding editor-in-chief of the IEEE Journal on Selected Areas in Communications: Cognitive Radio Series, and was the key founder of the new journal the IEEE Transactions on Cognitive Communications and Networking. He has been an (associate) editor of the IEEE Transactions on Wireless Communications, the IEEE Transactions on Vehicular Technology, and the IEEE Signal Processing Magazine. He was a distinguished lecturer of the IEEE Communications Society and the IEEE Vehicular Technology Society, and has been a member of the Board of Governors of the IEEE Asia-Pacific Wireless Communications Symposium since 2009. He served as Technical Program Committee (TPC) Chair of CROWN’08 and DySPAN’10, Symposium Chair of ICC’12 and Globecom’12, and General co-chair of ICCS’10 and ICCS’14. He serves as TPC Chair and Executive co-chair of Globecom’17 to be held in Singapore. He is a fellow of the IEEE.

Tak-Shing Peter Yum (F’13) received his primary and secondary school education in Hong Kong. He received the BS, MS, MPh, and PhD degrees from Columbia University, in 1974, 1975, 1977, and 1978 respectively. He joined Bell Telephone Laboratories in April 1978, working on switching and signaling systems for 2.5 years. Then, he taught at National Chiao Tung University, Taiwan for 2 years before joining The Chinese University of Hong Kong in 1982. He was appointed chairman of the IE Department two times and elected dean of Engineering for two terms (2004-2010). Since June 1, 2010 he took no-pay leave from CUHK to serve as CTO of ASTRI www.astri.org (Hong Kong Applied Science and Technology Research Institute Company Limited). He is currently a professor with Hunan University. He has published widely in Internet research with contributions to routing, buffer management, deadlock handling, message resequencing and multi-access protocols. He then branched out to work on cellular networks, lightwave networks, video distribution networks and 3G networks. His recent research is in the areas of RFID, sensor networks and wireless positioning technologies. He and student Lei Zhu were awarded the Best Paper Award of ACM MSWiM 2009 with paper title “The Optimization of Framed Aloha based RFID Algorithms.” He and another student Xu Chen were awarded the Honorable Mention Award (the first runner-up of the best paper award) with paper title “Cross Entropy Approach for Patrol Route Planning in Dynamic Environments” in the IEEE International Conference on Intelligence and Security Informatics (ISI), 2010. He is a fellow of the IEEE.

For more information on this or any other computing topic, please visit our Digital Library at www.computer.org/publications/dlib.
