
Wireless Netw

DOI 10.1007/s11276-017-1551-9

A hierarchical learning approach to anti-jamming channel selection strategies
Fuqiang Yao1,2 • Luliang Jia2,3 • Youming Sun2,3 • Yuhua Xu2,3 • Shuo Feng4 • Yonggang Zhu1,2

© Springer Science+Business Media, LLC 2017

Corresponding author: Luliang Jia, jiallts@163.com
Fuqiang Yao, yfq2030@163.com
Youming Sun, sunyouming10@163.com
Yuhua Xu, yuhuaenator@gmail.com
Shuo Feng, fengs13@mcmaster.ca
Yonggang Zhu, zhumaka1982@163.com

1 Nanjing Telecommunication Technology Institute, Nanjing 210007, China
2 College of Communication Engineering, PLA University of Science and Technology, Nanjing 210007, China
3 Science and Technology on Communication Networks Laboratory, Shijiazhuang 050002, China
4 Computational Science and Engineering at McMaster University, Hamilton, ON L8S 4L8, Canada

Abstract This paper investigates the channel selection problem for anti-jamming defense in an adversarial environment. In our work, we simultaneously consider malicious jamming and co-channel interference among users, and formulate this anti-jamming defense problem as a Stackelberg game with one leader and multiple followers. Specifically, the users and jammer independently and selfishly select their respective optimal strategies and obtain the optimal channels based on their own utilities. To derive the Stackelberg Equilibrium, a hierarchical learning framework is formulated, and a hierarchical learning algorithm (HLA) is proposed. In addition, the convergence performance of the proposed HLA algorithm is analyzed. Finally, we present simulation results to validate the effectiveness of the proposed algorithm.

Keywords Anti-jamming · Stackelberg game · Channel selection · Stochastic learning

1 Introduction

The security of the spectrum availability (SSA) is a critical issue in wireless communication networks under the condition of complicated electromagnetic environment. It mainly faces the following challenges. Firstly, malicious jamming attack poses a significant security threat to the SSA. Secondly, mutual interference, which comes from the users with the same channel and unwanted emissions in spurious and out-of-band domain, also severely degrades network performance, especially in the area with enormous number of wireless devices [1]. In our work, for the mutual interference, we mainly consider the co-channel interference, which is the main source of mutual interference. For the malicious jamming attack, we focus on the smart jammer, which can adjust its strategies adaptively according to users' strategies as well as dynamic environment to maximize its damage.

Anti-jamming defense is of great importance in the SSA of wireless communications and has attained growing attention in recent years [2, 3]. It is known that various countermeasures have been proposed in order to deal with the threats of jamming attacks, such as Frequency Hopping Spread Spectrum (FHSS) and Uncoordinated Frequency Hopping (UFH). However, these approaches need wideband spectrum or continuous broadband, and they are spectrally inefficient. Typically, these methods are infeasible in the dynamic and scarce spectrum environment


[4–6]. Therefore, more flexible and efficient anti-jamming techniques are needed.

Cognitive radio technology can re-configure communication parameters intelligently, and it can be exploited by jamming attacks to launch more complicated and threatening attacks [7, 8]. Since users and jamming attacks independently and selfishly select their strategies, game theory [9–14] can be employed to analyze the interactions between the users and jamming attacks. In [15], an anti-jamming stochastic game framework was proposed in the presence of cognitive attackers. In [16], a dogfight game was modeled to analyze the interactions between defending secondary users and attacks in cognitive radio systems. The authors in [17] investigated the security problem in cognitive radio ad hoc networks and proposed a stochastic game to defend against sweep attack. In [18], a Bayesian game was formulated to analyze the competitive interactions between a transmitter–receiver pair and a jammer. A zero-sum game was formulated in [19] to model the strategic decision-making for both AWGN channels and frequency selective fading channels. In [20] and [21], a Markov game was formulated to model the interactions between the legitimate transmitter and jammer, and the optimal defense strategy was obtained. In [22], a game-theoretic framework was adopted to investigate the anti-jamming rendezvous problem, and the known and unknown rendezvous channel cases were considered.

Considering the hierarchical competition among players, the Stackelberg game [23, 24] is a suitable framework to capture the sequential interactions between the users and jammers. In [25] and [26], the anti-jamming defense problem was formulated as a Stackelberg game, and the optimal power control strategy was achieved. An anti-jamming Stackelberg game with observation error was analyzed in [27]. The uncertainties of the channel state information and transmission cost information were considered, and a Bayesian anti-jamming Stackelberg game was proposed in [28]. The anti-reactive-jamming problem was investigated, and the transmitter-jammer interactions were modeled as a Stackelberg game in [29]. In [30] and [31], the authors formulated the sequential interactions among a mobile device, a smart attacker, and a security agent as a Stackelberg game, and the Stackelberg Equilibrium (SE) was obtained. However, little work has been devoted to the channel selection problem in the anti-jamming field.

Channel selection plays an important role in anti-jamming defense. There are some studies on channel selection to mitigate the co-channel interference among users. Specifically, in [32–34], the interference mitigation game was formulated and proved to be an exact potential game. These studies, however, did not consider the malicious jamming, which can severely deteriorate the intended transmission and pose a significant threat to wireless communications. In [4], a game-theoretic anti-jamming scheme was proposed, and a channel selection strategy was provided to combat jamming attacks. Wu et al. [8] investigated the anti-jamming defense in cognitive radio networks with a secondary user and proposed a flexible channel selection scheme. However, the studies mentioned above paid little attention to either the mutual competition among users for limited spectrum resources or the sequential actions between users and external jamming attacks.

Although there exist some related works on interference mitigation and anti-jamming, the two important aspects were separately investigated in previous studies. In this paper, the malicious jamming and co-channel interference among users are simultaneously considered, and we formulate the channel selection problem for anti-jamming defense as a Stackelberg game, which is an appropriate method to model the hierarchical interactions and the sequential decision-making between the users and smart jammer. However, it is challenging to obtain the SE solutions. Inspired by [32–36], we can incorporate learning technologies into the proposed game and obtain the learning solution.

Since game theory is a well-developed and flexible tool to analyze interactions among players, and learning technologies can cope with the constraints of lacking information exchange and random environment, incorporating learning technologies into game theory is a promising and important research direction [32–36]. In this paper, we extend the stochastic learning into a hierarchical learning framework and propose a hierarchical learning algorithm (HLA) based on the stochastic learning theory to acquire the solutions of the proposed game. The main contributions of this paper are given as follows.

• We investigate the channel selection problem in wireless communication networks when malicious jamming and co-channel interference among users are simultaneously considered, and formulate this anti-jamming defense problem as a Stackelberg game. Moreover, its properties are investigated. The network utility is the expected weighted aggregate interference and jamming. Specifically, the smart jammer and users independently and selfishly select their optimal strategies, and determine their optimal channels to maximize their own utilities.
• To obtain the solution of the proposed game, based on the stochastic learning theory, a hierarchical learning framework is formulated, in which each player chooses its channel strategy based on a probability distribution. Furthermore, a hierarchical and learning-based channel


  selection algorithm is proposed, which can converge to the SE solution of the proposed game.

The rest of this paper is organized as follows. In Sect. 2, we introduce the system model and problem formulation. In Sect. 3, an anti-jamming channel selection game is formulated, and its properties are investigated. In Sect. 4, a hierarchical stochastic learning algorithm is proposed. In Sect. 5, simulation results are presented. Finally, Sect. 6 draws the conclusions.

2 System model and problem formulation

2.1 System model

As shown in Fig. 1, there is one jammer and $N$ users in our system. Denote the user set as $\mathcal{N} = \{1, 2, \ldots, N\}$ and the users' available channel set as $\mathcal{M} = \{1, 2, \ldots, M\}$. The jammer is smart, and it can sense the users' available channels and adjust its strategies adaptively according to the users' strategies as well as dynamic environment to achieve maximum damage. On the other hand, each user is intelligent such that it adopts a flexible channel selection strategy, which can minimize its received external jamming and co-channel interference among users.

The channel selection profile of all users is $a = \{a_1, a_2, \ldots, a_N\}$. Let $a_{-n} = [a_1, \ldots, a_{n-1}, a_{n+1}, \ldots, a_N]$ denote the channel selection profile of all the users except user $n$. For users, if two or more users choose the same channel, mutual co-channel interference emerges. For the smart jammer, it chooses h channels to launch the jamming attack, and the jamming channel set is $\mathcal{C} = \{c_1, c_2, \ldots, c_H\}$. In this paper, to simplify analysis, we assume that only one channel $c_j \in \mathcal{C}$ is jammed at a time, and the jamming channel set $\mathcal{C}$ is the same as the available channel set $\mathcal{M}$.

Similar to [32], it is assumed that the available channels undergo block fading, and the Rayleigh fading model is considered. Moreover, we assume that each user selects one channel for transmission. The instantaneous interference gain from user $m$ to $n$ is defined as $H_{mn}^{a_n} = (d_{mn})^{-\alpha_1} \beta_{mn}^{a_n}$, where superscript $a_n$ represents the selected channel of user $n$, and $d_{mn}$, $\alpha_1$ and $\beta_{mn}^{a_n}$ denote the distance, path-loss exponent and instantaneous fading coefficient between user $m$ and user $n$, respectively. The instantaneous jamming gain from the smart jammer to user $n$ is $H_{jn}^{c_j} = (d_{jn})^{-\alpha_2} \beta_{jn}^{c_j}$, where superscript $c_j$ denotes the selected channel of the smart jammer, and $d_{jn}$, $\alpha_2$ and $\beta_{jn}^{c_j}$ are the distance, path-loss exponent and instantaneous fading coefficient between the smart jammer and user $n$, respectively.

2.2 Problem formulation

To improve network performance, efficient channel selection faces the following challenges in the anti-jamming field. Firstly, it is indispensable to defend against the malicious jamming. Secondly, it needs to mitigate the co-channel interference among users due to the conflicting competition for limited channel resources. The selected channels of user $n$ and the smart jammer are denoted as $a_n \in \mathcal{M}$ and $c_j \in \mathcal{C}$, respectively. Then, the received interference and jamming of user $n$ is given by:

$$D_n = I_n + J_n = -\sum_{m \in \mathcal{N} \setminus \{n\}} P_m H_{mn}^{a_n} f(a_m, a_n) - P_j H_{jn}^{c_j} f(c_j, a_n), \tag{1}$$

where $I_n$ denotes the received co-channel interference from other users, $J_n$ represents the malicious jamming, $P_m$ and $P_j$ denote the transmitting power of user $m$ and the smart jammer, respectively, and $f(\cdot)$ is the Kronecker delta function, which is expressed as:

$$f(a_m, a_n) = \begin{cases} 1, & a_n = a_m, \\ 0, & a_n \neq a_m. \end{cases} \tag{2}$$
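The per-user quantity in Eqs. (1) and (2) can be illustrated with a minimal sketch. It assumes NumPy, zero-based channel indices, and gain arrays `H_user` and `H_jam` that are already given; the function name and array layout are hypothetical and are not taken from the paper.

```python
import numpy as np

def received_disturbance(a, c_j, P, P_j, H_user, H_jam):
    """Return D_n = I_n + J_n for every user n, following Eqs. (1)-(2).

    a       : length-N integer sequence, a[n] is the channel chosen by user n
    c_j     : channel chosen by the jammer
    P       : length-N sequence of user transmit powers
    P_j     : jammer transmit power
    H_user  : H_user[m, n, c], gain from user m to user n on channel c
    H_jam   : H_jam[n, c], gain from the jammer to user n on channel c
    """
    a = np.asarray(a)
    P = np.asarray(P, dtype=float)
    D = np.zeros(a.size)
    for n in range(a.size):
        same = (a == a[n])      # Kronecker delta f(a_m, a_n) for all m
        same[n] = False         # the sum excludes m = n
        I_n = -np.sum(P[same] * H_user[same, n, a[n]])
        J_n = -P_j * H_jam[n, a[n]] * float(c_j == a[n])
        D[n] = I_n + J_n
    return D
```

With three users on channels `[0, 0, 1]` and the jammer on channel 0, the first two users collect both interference and jamming, while the third user, alone on an unjammed channel, receives zero.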
Fig. 1 System model

Referring to [32–34], the network utility function is the expected weighted aggregate interference and jamming given by:

$$U = -\sum_{n \in \mathcal{N}} P_n E[D_n] = \sum_{n \in \mathcal{N}} \sum_{m \in \mathcal{N} \setminus \{n\}} P_n P_m \bar{H}_{mn}^{a_n} f(a_m, a_n) + \sum_{n \in \mathcal{N}} P_n P_j \bar{H}_{jn}^{c_j} f(c_j, a_n), \tag{3}$$

where $\bar{H}_{mn}^{a_n} = E[H_{mn}^{a_n}] = (d_{mn})^{-\alpha_1} \bar{\beta}_{mn}^{a_n}$, $\bar{H}_{jn}^{c_j} = E[H_{jn}^{c_j}] = (d_{jn})^{-\alpha_2} \bar{\beta}_{jn}^{c_j}$, and $E[\cdot]$ denotes the statistical expectation. The objective of an efficient channel selection strategy is to minimize the received interference and jamming, thus improving the network performance, i.e.,

$$(\text{P1}): \quad a_{opt} = \arg\min U = \arg\min \Big( -\sum_{n \in \mathcal{N}} P_n E[D_n] \Big). \tag{4}$$

However, due to the randomness of the environment, the combinatorial optimization problem P1 is intractable through standard optimization methods. In the following, this problem is formulated as a Stackelberg game, and a learning-based algorithm is proposed to obtain its SE.

3 Anti-jamming channel selection game

3.1 Game model

We formulate the channel selection problem for anti-jamming defense as a Stackelberg game. Mathematically, the anti-jamming channel selection game is denoted as $\mathcal{G} = \{\mathcal{N}, \mathcal{J}, \mathcal{M}, \mathcal{C}, u_n, u_j\}$, where $\mathcal{N}$ is the user set, $\mathcal{J}$ denotes the smart jammer, $\mathcal{M}$ and $\mathcal{C}$ respectively represent the strategy sets of the users and smart jammer, and $u_n$ and $u_j$ are the utility functions of user $n$ and the smart jammer, respectively. To be specific, the smart jammer acts as the leader, whereas the users are followers.

Each user aims to minimize the received interference and jamming, and its utility can be defined as [32]:

$$u_n(a_n, a_{-n}, c_j) = L + P_n E[D_n] = L - \sum_{m \in \mathcal{N} \setminus \{n\}} P_n P_m \bar{H}_{mn}^{a_n} f(a_m, a_n) - P_n P_j \bar{H}_{jn}^{c_j} f(c_j, a_n), \tag{5}$$

where $L$ denotes a predefined positive constant and will be explained in Sect. 4.1. User $n$'s objective is to tune its channel selection to maximize its utility, and its optimization problem can be written as $a_n^* = \arg\max_{a_n} u_n(a_n, a_{-n}, c_j)$.

From the perspective of the users, the follower sub-game can be defined as:

$$\mathcal{G}_f = \{\mathcal{N}, \mathcal{M}, u_n(a_n, a_{-n}, c_j)\}, \tag{6}$$

where all the users are players, and the strategy set is the available channel set $\mathcal{M}$. Each user selfishly and independently chooses its optimal channel, which maximizes its utility.

The smart jammer aims at achieving maximum damage, and its utility function can be expressed as:

$$u_j(a, c_j) = \sum_{n \in \mathcal{N}} P_n P_j \bar{H}_{jn}^{c_j} f(c_j, a_n). \tag{7}$$

The smart jammer's optimization problem is $c_j^* = \arg\max_{c_j} u_j(a, c_j)$.

From the smart jammer's perspective, the leader sub-game can be defined as:

$$\mathcal{G}_l = \{\mathcal{J}, \mathcal{C}, u_j(a, c_j)\}, \tag{8}$$

where the strategy set is the jamming channel set $\mathcal{C}$, which is the same as the available channel set $\mathcal{M}$ in this paper.

3.2 Stackelberg game solution

Definition 1 (Exact potential game [37]) The game $\mathcal{G}_f$ is an exact potential game if there is a potential function $\Phi$ such that the variation in the potential function equals the variation in the utility function due to any player's unilateral deviation. Mathematically,

$$\Phi(\tilde{a}_n, a_{-n}, c_j) - \Phi(a_n, a_{-n}, c_j) = u_n(\tilde{a}_n, a_{-n}, c_j) - u_n(a_n, a_{-n}, c_j), \tag{9}$$

where $\tilde{a}_n$ is user $n$'s action after unilateral deviation.

Lemma 1 ([37]) Every exact potential game possesses at least one pure strategy Nash Equilibrium (NE).

Theorem 1 The follower sub-game $\mathcal{G}_f$ with given strategy $c_j$ of the smart jammer is an exact potential game.

Proof Motivated by [32–34], given the smart jammer's strategy $c_j$, a potential function of the follower sub-game can be formulated as:

$$\Phi(a_n, a_{-n}, c_j) = \Phi_1(a_n, a_{-n}, c_j) + \Phi_2(a_n, a_{-n}, c_j) = \frac{1}{2} \sum_{n \in \mathcal{N}} P_n E[I_n] + \sum_{n \in \mathcal{N}} P_n E[J_n], \tag{10}$$

where

$$\begin{aligned} \Phi_1(a_n, a_{-n}, c_j) &= \frac{1}{2} \sum_{n \in \mathcal{N}} P_n E[I_n] = -\frac{1}{2} \sum_{n \in \mathcal{N}} \sum_{m \in \mathcal{N} \setminus \{n\}} P_n P_m \bar{H}_{mn}^{a_n} f(a_m, a_n) \\ &= -\frac{1}{2} \Bigg\{ \sum_{m \in \mathcal{N} \setminus \{n\}} P_n P_m \bar{H}_{mn}^{a_n} f(a_m, a_n) + \sum_{i \in \mathcal{N} \setminus \{n\}} \sum_{m \in \mathcal{N} \setminus \{i\}} P_i P_m \bar{H}_{mi}^{a_i} f(a_m, a_i) \Bigg\} \\ &= -\frac{1}{2} \Bigg\{ \sum_{m \in \mathcal{N} \setminus \{n\}} P_n P_m \bar{H}_{mn}^{a_n} f(a_m, a_n) + \sum_{i \in \mathcal{N} \setminus \{n\}} \sum_{m \in \mathcal{N} \setminus \{i, n\}} P_i P_m \bar{H}_{mi}^{a_i} f(a_m, a_i) + \sum_{i \in \mathcal{N} \setminus \{n\}} P_i P_n \bar{H}_{ni}^{a_i} f(a_n, a_i) \Bigg\}. \end{aligned} \tag{11}$$

Similar to [32–34], interference symmetry is considered. By applying $\bar{H}_{mn}^{a_n} = \bar{H}_{nm}^{a_n}$, we have

$$\sum_{m \in \mathcal{N} \setminus \{n\}} P_n P_m \bar{H}_{mn}^{a_n} f(a_m, a_n) = \sum_{i \in \mathcal{N} \setminus \{n\}} P_i P_n \bar{H}_{ni}^{a_i} f(a_n, a_i). \tag{12}$$

Hence, we can rewrite $\Phi_1(a_n, a_{-n}, c_j)$ as follows:

$$\Phi_1(a_n, a_{-n}, c_j) = -\sum_{m \in \mathcal{N} \setminus \{n\}} P_n P_m \bar{H}_{mn}^{a_n} f(a_m, a_n) - T(a_{-n}), \tag{13}$$

where $T(a_{-n}) = \frac{1}{2} \sum_{i \in \mathcal{N} \setminus \{n\}} \sum_{m \in \mathcal{N} \setminus \{i, n\}} P_i P_m \bar{H}_{mi}^{a_i} f(a_m, a_i)$ is independent of $a_n$.

In addition,

$$\begin{aligned} \Phi_2(a_n, a_{-n}, c_j) &= \sum_{n \in \mathcal{N}} P_n E[J_n] = -\sum_{n \in \mathcal{N}} P_n P_j \bar{H}_{jn}^{c_j} f(c_j, a_n) \\ &= -P_n P_j \bar{H}_{jn}^{c_j} f(c_j, a_n) - \sum_{i \in \mathcal{N} \setminus \{n\}} P_i P_j \bar{H}_{ji}^{c_j} f(c_j, a_i) \\ &= -P_n P_j \bar{H}_{jn}^{c_j} f(c_j, a_n) - C_2(a_{-n}), \end{aligned} \tag{14}$$

where $C_2(a_{-n}) = \sum_{i \in \mathcal{N} \setminus \{n\}} P_i P_j \bar{H}_{ji}^{c_j} f(c_j, a_i)$ is also independent of $a_n$.

Therefore, we have

$$\begin{aligned} \Phi(\tilde{a}_n, a_{-n}, c_j) - \Phi(a_n, a_{-n}, c_j) &= \Phi_1(\tilde{a}_n, a_{-n}, c_j) + \Phi_2(\tilde{a}_n, a_{-n}, c_j) - \Phi_1(a_n, a_{-n}, c_j) - \Phi_2(a_n, a_{-n}, c_j) \\ &= \sum_{m \in \mathcal{N} \setminus \{n\}} P_n P_m \bar{H}_{mn}^{a_n} f(a_m, a_n) + P_n P_j \bar{H}_{jn}^{c_j} f(c_j, a_n) \\ &\quad - \sum_{m \in \mathcal{N} \setminus \{n\}} P_n P_m \bar{H}_{mn}^{\tilde{a}_n} f(a_m, \tilde{a}_n) - P_n P_j \bar{H}_{jn}^{c_j} f(c_j, \tilde{a}_n) \\ &= u_n(\tilde{a}_n, a_{-n}, c_j) - u_n(a_n, a_{-n}, c_j). \end{aligned} \tag{15}$$

Thus, according to Definition 1, the follower sub-game $\mathcal{G}_f$ is an exact potential game. □

In the following, we will analyze the existence of SE in the formulated Stackelberg game.

Theorem 2 Given the smart jammer's strategy $c_j$, the follower sub-game $\mathcal{G}_f$ always has a strategy profile that satisfies $u_n(a_n^*, a_{-n}^*, c_j) \geq u_n(a_n, a_{-n}^*, c_j)$, which is a pure strategy NE point.

Proof Given the smart jammer's strategy $c_j$, the users, which are rational and independent, play a non-cooperative channel selection sub-game. Based on Theorem 1, the follower sub-game $\mathcal{G}_f$ is an exact potential game, and it has at least one pure strategy NE point. □

The leader sub-game $\mathcal{G}_l$ is a non-cooperative game, and every finite strategic game has a mixed strategy equilibrium [9]. Therefore, there exists an SE in the sense of stationary strategy.

Based on the above analysis, we assume that the smart jammer's mixed strategy is $\theta_0$. In the following, we define the SE for the proposed game [9].

Definition 2 The strategy profile $(a^*, \theta_0^*)$ constitutes the SE for the proposed game if the following conditions hold:

$$u_j(a^*, \theta_0^*) \geq u_j(a^*, \theta_0), \tag{16}$$

$$u_n(a_n^*, a_{-n}^*, \theta_0^*) \geq u_n(a_n, a_{-n}^*, \theta_0^*). \tag{17}$$

Theorem 3 In the proposed game, there exist a smart jammer's stationary policy and a users' NE policy that constitute an SE.

Proof Inspired by [35], if a smart jammer's stationary policy $\theta_0$ is given, the proposed game reduces to a non-cooperative game. Based on Theorem 1, the follower sub-game is an exact potential game. According to [37], every finite exact potential game possesses at least one NE.


Therefore, $NE(\theta_0)$ always exists in the proposed game given the smart jammer's policy $\theta_0$. However, the smart jammer's stationary policy can be given by:

$$\theta_0^* = \arg\max_{\theta_0} u_j(\theta_0, NE(\theta_0)). \tag{18}$$

According to [9], every finite strategic game has a mixed strategy equilibrium. Hence, $(\theta_0^*, NE(\theta_0^*))$ constitutes an SE in the sense of stationary strategy. □
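The exact-potential property behind Theorem 1, on which the existence results above rely, can be checked numerically on a small instance. The sketch below is illustrative only. It assumes the sign convention $u_n = L - (\text{interference}) - (\text{jamming})$, symmetric mean gains, zero-based channel indices, and randomly drawn toy gains; the function names and all numeric values are hypothetical.

```python
import itertools
import numpy as np

def utility(n, a, c_j, P, P_j, Hbar_user, Hbar_jam, L):
    """u_n = L - weighted mean co-channel interference - weighted mean jamming."""
    interference = sum(P[n] * P[m] * Hbar_user[m, n, a[n]]
                       for m in range(len(a)) if m != n and a[m] == a[n])
    jamming = P[n] * P_j * Hbar_jam[n, a[n]] if a[n] == c_j else 0.0
    return L - interference - jamming

def potential(a, c_j, P, P_j, Hbar_user, Hbar_jam):
    """Phi = Phi_1 + Phi_2, with half weight on the pairwise interference terms."""
    N = len(a)
    phi1 = -0.5 * sum(P[n] * P[m] * Hbar_user[m, n, a[n]]
                      for n in range(N) for m in range(N)
                      if m != n and a[m] == a[n])
    phi2 = -sum(P[n] * P_j * Hbar_jam[n, a[n]] for n in range(N) if a[n] == c_j)
    return phi1 + phi2

def is_exact_potential(P, P_j, Hbar_user, Hbar_jam, L, c_j=0, tol=1e-9):
    """Check Phi(a') - Phi(a) == u_n(a') - u_n(a) for every unilateral deviation."""
    N, M = Hbar_jam.shape
    for a in itertools.product(range(M), repeat=N):
        for n in range(N):
            for new_channel in range(M):
                b = list(a)
                b[n] = new_channel
                du = (utility(n, b, c_j, P, P_j, Hbar_user, Hbar_jam, L)
                      - utility(n, a, c_j, P, P_j, Hbar_user, Hbar_jam, L))
                dphi = (potential(b, c_j, P, P_j, Hbar_user, Hbar_jam)
                        - potential(a, c_j, P, P_j, Hbar_user, Hbar_jam))
                if abs(du - dphi) > tol:
                    return False
    return True

# Toy instance: 3 users, 3 channels, random mean gains (hypothetical values).
rng = np.random.default_rng(1)
N, M = 3, 3
G = rng.random((N, N, M))
Hbar_user = 0.5 * (G + G.transpose(1, 0, 2))   # enforce Hbar_mn = Hbar_nm
Hbar_jam = rng.random((N, M))
P = np.array([2.0, 2.0, 2.0])
print(is_exact_potential(P, 25.0, Hbar_user, Hbar_jam, L=100.0))
```

Passing the unsymmetrized array `G` instead of `Hbar_user` makes the check fail, which shows where the interference symmetry assumption enters the argument.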

4 Hierarchical learning solution for anti-jamming channel selection game

4.1 Algorithm description

In this subsection, to achieve the solution of the proposed game, a hierarchical learning framework based on the stochastic learning theory is formulated, and a hierarchical learning algorithm (HLA) is proposed. We extend the proposed anti-jamming channel selection game to a mixed strategy form in order to characterize the proposed HLA. A mixed strategy of user $n$ at time slot $t$ is defined as $\theta_n(t) = (\theta_{n1}(t), \ldots, \theta_{nm}(t), \ldots, \theta_{nM}(t))$ with $\sum_{m \in \mathcal{M}} \theta_{nm}(t) = 1$, where $\theta_{nm}(t)$ is the probability that user $n$ selects channel $m$ from the users' available channel set $\mathcal{M} = \{1, 2, \ldots, M\}$ at time slot $t$. For the smart jammer, a mixed strategy at epoch $k$ is denoted by $\theta_0(k) = (\theta_{01}(k), \ldots, \theta_{0h}(k), \ldots, \theta_{0H}(k))$ with $\sum_{h \in \mathcal{C}} \theta_{0h}(k) = 1$, where $\theta_{0h}(k)$ is the probability that the smart jammer selects channel $h$ from the jamming channel set $\mathcal{C} = \{c_1, c_2, \ldots, c_H\}$ at epoch $k$. In the hierarchical learning framework, the smart jammer and users update their policies at different time scales. The smart jammer updates its policy at each epoch $k$, whereas users update their policies at each time slot $t$, and each epoch contains $T$ time slots.

In the follower sub-game, based on the stochastic learning automata (SLA) [38], we propose a learning-based anti-jamming channel selection algorithm, which is an uncoupled and distributed algorithm. Specifically, it does not need information exchange, and each user updates its policy based on its individual payoff. The stochastic learning automata [38] can learn the optimal strategies from a set of strategies through repeated interactions with a random environment, and have been employed for adaptive decision-making in wireless communications [32, 39–42]. For the $t$th time slot, we assume that the users' channel selection profile is $a(t) = \{a_1(t), a_2(t), \ldots, a_N(t)\}$. User $n$'s random payoff can be written as:

$$u_n(t) = L - \sum_{m \in \mathcal{N} \setminus \{n\}} P_n P_m \bar{H}_{mn}^{a_n(t)} f(a_m(t), a_n(t)) - P_n P_j \bar{H}_{jn}^{c_j(t)} f(c_j(t), a_n(t)), \tag{19}$$

where the predefined positive constant $L$ is introduced to ensure that the random payoff is nonnegative, so that the algorithm proposed in the following is effective.

In the leader sub-game, an online channel selection algorithm based on Q-learning [35, 36, 43, 44] is proposed for the smart jammer. In the learning process, the strategy is updated by repeated interaction with the surrounding environment, and the policies that yield high payoff are reinforced through trial-and-error exploration. The overall goal of learning is to optimize the long-term cumulative payoff. The smart jammer's random payoff can be estimated by exploration. For the $k$th epoch, the estimated random payoff of the smart jammer is given by:

$$u_j(k) = \sum_{n \in \mathcal{N}} P_n P_j \bar{H}_{jn}^{c_j(k)} f(c_j(k), a_n(k)). \tag{20}$$

An illustrative diagram of the proposed HLA is shown in Fig. 2, and the proposed algorithm is presented in Algorithm 1. At each time slot $t$, based on the probability distribution $\theta_n(t)$, each user chooses a channel $a_n(t) \in \mathcal{M}$ at random. The probability distribution $\theta_n(t)$ is updated at each time slot $t$ according to the feedback from the environment. The smart jammer takes actions at each epoch $k$ in a similar way. For the smart jammer, the stopping criterion is either that the channel selection probability vector satisfies $\theta_0(k) = \theta_0(k+1)$, or that the maximum iteration number is reached. For users, we have similar stopping criteria.

Fig. 2 The diagram of the proposed HLA


Algorithm 1: Hierarchical learning algorithm (HLA)

Step 1: Set $t = 0$, $k = 0$ and initialize the mixed strategy for both the users and smart jammer: $\theta_{0h}(k) = 1/|\mathcal{C}|$, $\theta_{nm}(t) = 1/|\mathcal{M}|$, $\forall h \in \mathcal{C}$, $\forall m \in \mathcal{M}$.

Step 2: In the $k$th epoch, the smart jammer stochastically selects jamming channel $c_h(k)$ according to its policy $\theta_0(k)$.

Step 3: Stochastic learning process of all users, for each epoch $k$.

(1) In the $t$th slot, each user $n$ stochastically selects channel $a_n(t)$ according to its current strategy $\theta_n(t)$.

(2) Each user $n$ measures its utility $u_n(t)$.

(3) Each user $n$ updates its strategy according to the following rules:

$$\begin{aligned} \theta_{nm}(t+1) &= \theta_{nm}(t) + b_1 \tilde{u}_n(t) \left(1 - \theta_{nm}(t)\right), & m &= a_n(t), \\ \theta_{nm}(t+1) &= \theta_{nm}(t) - b_1 \tilde{u}_n(t) \theta_{nm}(t), & m &\neq a_n(t), \end{aligned} \tag{21}$$

where $0 < b_1 < 1$ is the learning step size of users, and $\tilde{u}_n(t)$ is the normalized utility:

$$\tilde{u}_n(t) = u_n(t)/L. \tag{22}$$

Step 4: The smart jammer measures its utility $u_j(k)$.

Step 5: The smart jammer updates its Q values according to the following rule:

$$Q_0^{k+1}(c_h) = (1 - \kappa_0^k) Q_0^k(c_h) + \kappa_0^k u_j(k), \tag{23}$$

where $\kappa_0^k \in [0, 1)$ is the learning rate, which satisfies $\sum_{k=0}^{\infty} \kappa_0^k = \infty$ and $\sum_{k=0}^{\infty} (\kappa_0^k)^2 < \infty$. The smart jammer then updates its policy based on the Boltzmann distribution:

$$\theta_{0h}(k) = \frac{\exp\left(Q_0^k(c_h)/\tau_0\right)}{\sum_{c \in \mathcal{C}} \exp\left(Q_0^k(c)/\tau_0\right)}, \tag{24}$$

where the temperature $\tau_0$ controls the exploration-exploitation tradeoff. Specifically, for $\tau_0 \to 0$, the smart jammer tends to select the policy with the maximum Q value, whereas for $\tau_0 \to \infty$, the smart jammer's policy is completely random.

Step 6: If the stopping criterion holds, stop. Otherwise, update $k = k + 1$ and go to Step 2.
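The steps of Algorithm 1 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The assumptions are:

- channel gains are static arrays, with no fading draw per slot;
- users restart from the uniform strategy in every epoch;
- the jammer's payoff uses the users' last profile of the epoch;
- the learning rate is $\kappa_0^k = 1/(k+2)$;
- the temperature defaults to `tau0 = 0.01`;
- $L$ is at least the largest possible per-user cost, so that the normalized payoff stays in $[0, 1]$.

The function name `run_hla` and its argument layout are hypothetical.

```python
import numpy as np

def run_hla(P, P_j, H_user, H_jam, L, b1=0.08, tau0=0.01,
            epochs=50, slots=500, seed=0):
    """Sketch of Algorithm 1. Returns (theta, theta0, Q) after the last epoch.

    P, P_j  : user powers (length N) and jammer power
    H_user  : H_user[m, n, c], gain from user m to user n on channel c
    H_jam   : H_jam[n, c], gain from the jammer to user n on channel c
    L       : positive constant, assumed >= the largest possible per-user cost
    """
    rng = np.random.default_rng(seed)
    P = np.asarray(P, dtype=float)
    N, M = H_jam.shape
    theta0 = np.full(M, 1.0 / M)            # Step 1: uniform jammer policy
    Q = np.zeros(M)
    for k in range(epochs):
        c = rng.choice(M, p=theta0)         # Step 2: jammer picks a channel
        theta = np.full((N, M), 1.0 / M)    # assumption: users restart every epoch
        for _ in range(slots):              # Step 3: follower learning
            a = np.array([rng.choice(M, p=theta[n]) for n in range(N)])
            for n in range(N):
                same = (a == a[n])
                same[n] = False
                cost = (P[n] * np.sum(P[same] * H_user[same, n, a[n]])
                        + P[n] * P_j * H_jam[n, a[n]] * float(c == a[n]))
                u_tilde = (L - cost) / L    # Eqs. (19) and (22)
                # Eq. (21): shrink all entries, then add the step to the chosen one
                theta[n] -= b1 * u_tilde * theta[n]
                theta[n, a[n]] += b1 * u_tilde
        # Step 4: jammer payoff, Eq. (20), from the users' last profile (assumption)
        u_j = sum(P[n] * P_j * H_jam[n, c] for n in range(N) if a[n] == c)
        # Step 5: Q update, Eq. (23), with kappa = 1/(k+2) (assumption)
        kappa = 1.0 / (k + 2)
        Q[c] = (1.0 - kappa) * Q[c] + kappa * u_j
        z = np.exp((Q - Q.max()) / tau0)    # Eq. (24), shifted for numerical stability
        theta0 = z / z.sum()
    return theta, theta0, Q
```

Writing the update in Eq. (21) as "scale every entry by $1 - b_1 \tilde{u}_n(t)$, then add $b_1 \tilde{u}_n(t)$ to the selected channel" keeps each row of `theta` summing to one without an explicit renormalization. A call such as `run_hla(P, 25.0, H_user, H_jam, L)` returns the users' strategies from the final epoch together with the jammer's policy and Q values.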

4.2 Convergence analysis

Let $a(t) = \{a_1(t), a_2(t), \ldots, a_N(t)\}$ represent the channel selection profile of all users at time slot $t$, and let $\theta(t) = \{\theta_1(t), \theta_2(t), \ldots, \theta_N(t)\}$ denote a mixed strategy profile of users, where $\theta_n(t) = (\theta_{n1}(t), \theta_{n2}(t), \ldots, \theta_{nM}(t))$, $\forall n \in \mathcal{N}$, denotes the channel selection probability of user $n$ at time slot $t$. Moreover, the function $h_{ns}(\theta)$ is defined by:

$$h_{ns}(\theta) = \sum_{a_t, t \neq n} u_n(a_1, \ldots, a_{n-1}, s, a_{n+1}, \ldots, a_N) \prod_{t \neq n} \theta_{t a_t}, \tag{25}$$

where $h_{ns}(\theta)$ represents user $n$'s expected utility function when user $n$ selects action $s$ and the other users adopt mixed strategy $\theta_{-n}$.

Motivated by [32, 35, 36, 41], the asymptotic convergence behavior of the proposed HLA algorithm is analyzed in the following Lemmas 2 and 3 and Theorem 4.

Lemma 2 As $b_1 \to 0$, the follower-SLA converges to a pure NE point in the follower learning sub-game.

Proof Based on Theorem 3.1 in [38], as $b_1 \to 0$, the sequence $\theta_n(t)$ converges to the solution of the ordinary differential equation (ODE):

$$\frac{d\theta}{dt} = f(\theta), \quad \theta(0) = \theta_0, \tag{26}$$

where $f(\theta) = E[G(\theta_n(t), a_n(t), u_n(t)) \mid \theta_n(t) = \theta_n]$, and $\theta(0)$ is the initial condition.

Moreover, similar to [32, 41], define the function $H(\theta) = E[\Phi(\theta)]$, where $\Phi(\cdot)$ denotes the potential function specified by Eq. (10). Therefore,

$$H(m, \theta_{-n}) = \sum_{a_t, t \neq n} \Phi(a_1, \ldots, a_{n-1}, m, a_{n+1}, \ldots, a_N) \prod_{t \neq n} \theta_{t a_t}. \tag{27}$$

Applying (15), (25) and (27), we have

$$H(m_1, \theta_{-n}) - H(m_2, \theta_{-n}) = h_{n m_1}(\theta) - h_{n m_2}(\theta). \tag{28}$$

According to Theorem 3.2 in [38] and Theorem 5 in [41], Lemma 2 is proved. □

Lemma 3 The leader Q-learning converges to $(\theta_0^*, NE(\theta_0^*))$ in the leader learning sub-game.

Proof Motivated by [35, 36, 44], we describe the evolution of the Q values by the following differential equation:

$$\frac{dQ_0^k(c_h)}{dk} = \kappa_0^k \left[ u_j(k) - Q_0^k(c_h) \right]. \tag{29}$$

Rather than the dynamics of the Q values, we would like to express the dynamics in terms of strategies. By differentiating (24) with respect to $k$ and using (29), we have

$$\frac{d\theta_{0h}(k)}{dk} = \theta_{0h}(k) \left\{ \frac{\kappa_0^k}{\tau_0} \left[ u_j(k-1) - \sum_{g \in \mathcal{C}} \theta_{0g}(k) u_j(k-1) \right] - \tau_0 \sum_{g \in \mathcal{C}} \theta_{0g}(k) \ln \frac{\theta_{0h}(k)}{\theta_{0g}(k)} \right\}. \tag{30}$$

In the following, we resort to the ODE to analyze the convergence of $\theta_0(k)$, which converges to the solution of the ODE [38]. The right-hand side of (30) can be denoted as $f(\theta_0)$. As $\kappa_0^k \to 0$, $\theta_0(k)$ will converge weakly to $(\theta_0^*, NE(\theta_0^*))$, which is the solution of $\frac{d\theta_0}{dk} = f(\theta_0)$, $\theta_0(0) = \theta_0$, where $\theta_0(0)$ is the initial condition. □

Theorem 4 The proposed HLA can always find an SE.

Proof Inspired by [35], we prove it by contradiction. It is assumed that the learning process converges to a point that is not an SE. According to Theorem 3.1 in [38], the learning process converges to solutions of the ODE that are stable points. Therefore, the proposed HLA only converges to stable points. Thus, the stationary point is a non-SE, which contradicts Theorem 3. This completes the proof. □

5 Numerical results and discussions

In this section, simulation results are presented to show the performance of the proposed HLA. In the simulation, the system involves four channels, the users are randomly located in a region of 100 m × 100 m, and the jammer is located in a region of [150, 200] m. An illustrative diagram of the placement of users and smart jammer is shown in Fig. 3. The channels follow the Rayleigh fading model, and the fading parameters are exponentially distributed with unit mean. Referring to [32, 34], the parameters for this paper are given as follows: user $n$'s transmitting power $P_n = 2$ W, the smart jammer's transmitting power $P_j = 25$ W, the path loss exponent $\alpha_1 = \alpha_2 = 2$, the positive constant $L = 0.005$, and the learning step size $b_1 = 0.08$. Unless otherwise stated, these parameters are fixed.

The convergence of the proposed HLA for a single simulation run is shown in Figs. 4 and 5. A system with five users and one smart jammer is considered. Taking user 1 as an example, its convergence behavior in the first epoch is presented in Fig. 4. At time slot $t = 0$, user 1 randomly selects its channel with equal probabilities ($\theta_{11} = \theta_{12} = \theta_{13} = \theta_{14} = 1/4$). The channel selection probability $\theta_{14}$ converges to 1 in about 300 iterations, while $\theta_{11}$, $\theta_{12}$ and $\theta_{13}$ converge to 0. Figure 5 shows the smart jammer's convergence behavior. At $k = 0$, the smart jammer


selects the channels randomly with equal probabilities ($\theta_{01} = \theta_{02} = \theta_{03} = \theta_{04} = 1/4$). As the algorithm iterates, the channel selection probabilities converge in about 13 epochs.

Fig. 3 The diagram of the placement of the users and smart jammer

Fig. 4 Evolution of channel selection probabilities of the user 1 in the first epoch for a single simulation run (channel selection probabilities $\theta_{11}$ to $\theta_{14}$ versus iteration number)

Fig. 5 Evolution of channel selection probabilities of the smart jammer for a single simulation run (channel selection probabilities $\theta_{01}$ to $\theta_{04}$ versus epoch number)

In order to evaluate the performance of the proposed HLA, it is compared with the random selection scheme, in which each user randomly selects one channel at each time slot. The presented results are obtained by simulating 1000 independent trials and then taking the mean. As can be seen from Fig. 6, the proposed HLA is superior to the random selection scheme and yields lower expected weighted aggregate interference and jamming. Specifically, the proposed HLA reduces the expected weighted aggregate interference and jamming by 56% for 8 users versus random selection. Moreover, the performance gap becomes more significant as the number of users increases. This is due to the fact that the proposed HLA may converge to a desirable solution, whereas the random selection scheme is an instinctive approach.

Fig. 6 Performance comparison of expected weighted aggregate interference and jamming for different channel selection strategies of the users (HLA and random selection; expected weighted aggregate interference and jamming (U) versus number of users (N))

In Fig. 7, we show the influence of the transmitting power of the users and smart jammer. Figure 7 indicates that the expected weighted aggregate interference and jamming will increase with the growing transmitting power of the users. The reason is that higher transmitting power of the users leads to more serious mutual interference. Moreover, as higher transmitting power of the smart jammer results in more serious damage, the expected weighted aggregate interference and jamming will also


x 10
−3 0.025

Expected weighted aggregate interference and


11 HLA
HLA, P =20W No jamming
j
Expected weighted aggregate interference and

10 HLA, Pj=25W Random jamming


0.02
HLA, P =30W
9 j

jamming (U)
8 0.015
jamming (U)

0.01
6

5
0.005
4

3
0
4 5 6 7 8 9 10
2 Number of users (N)
1 1.5 2 2.5 3
Transmitting power of users (W)
Fig. 9 Performance comparison of expected weighted aggregate
interference and jamming for different types of jamming attacks
Fig. 7 The expected weighted aggregate interference and jamming
versus transmitting power of the users (N ¼ 5)

Fig. 8 Performance comparison of expected weighted aggregate interference and jamming for different Rayleigh fading parameters. [Plot omitted. x-axis: number of users (N); y-axis: expected weighted aggregate interference and jamming (U); curves: Rayleigh (0 dB), Rayleigh (1 dB), Rayleigh (3 dB).]

increase with the growing transmitting power of the smart jammer. The performance comparison for different Rayleigh fading parameters is shown in Fig. 8. Specifically, Fig. 8 shows that the expected weighted aggregate interference and jamming increases as the Rayleigh fading parameter grows.

In Fig. 9, performance evaluation is presented for different types of jamming attacks. For the proposed HLA, the jammer is smart, that is, it can adjust its strategies adaptively to maximize its damage. For comparison, we present the performance for the random jammer and no jammer scenarios. In the random jammer scenario, the jammer chooses one channel at random at a time. For the no jammer scenario, the algorithm is consistent with the algorithm in [32], in which only mutual co-channel interference is considered. As indicated in Fig. 9, compared to the random jammer and no jammer scenarios, the network undergoes the highest expected weighted aggregate interference and jamming in the smart jammer scenario. The reason is that the smart jammer can adjust its strategies adaptively according to the users’ strategies as well as the dynamic environment, and thus achieve maximum damage.

6 Conclusion

In this paper, the channel selection scheme for anti-jamming defense has been investigated in an adversarial environment. First, in order to analyze the sequential interactions between the users and the smart jammer, a Stackelberg game was formulated, and its properties were analyzed. Then, based on stochastic learning theory, a hierarchical learning algorithm (HLA) was proposed. Moreover, the proposed HLA was proven to converge to the Stackelberg Equilibrium (SE) solution. Finally, simulation results were presented to validate the effectiveness of the learning solution.

Acknowledgements This work was supported in part by the Natural Science Foundation for Distinguished Young Scholars of Jiangsu Province under Grant BK20160034, in part by the National Science Foundation of China under Grant 61631020, Grant 61671473, Grant 61401508, and Grant 61401505, in part by Jiangsu Provincial Natural Science Foundation of China under Grant BK20130069 and Grant BK20151450, and in part by the Open Research Foundation of Science and Technology in Communication Networks Laboratory.
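
As a rough illustration of how the random selection and random jammer baselines discussed with Figs. 6 and 9 can be averaged over independent trials, the following Python sketch estimates a toy cost by Monte Carlo simulation. It is not the authors' simulator: the cost used here (number of co-channel user pairs plus a weighted count of users on the jammed channel) is a hypothetical stand-in for the expected weighted aggregate interference and jamming, and the names `trial_cost`, `expected_cost`, `closed_form`, the parameter `jam_weight`, and the choice of 4 channels are assumptions made only for this example.

```python
import random
from collections import Counter


def trial_cost(num_users, num_channels, jam_weight, rng):
    """One trial of a toy cost: co-channel user pairs plus weighted jamming hits.

    Hypothetical proxy only; the paper's metric also depends on powers,
    distances and fading, none of which are modeled here.
    """
    # Each user independently picks a channel uniformly at random.
    choices = [rng.randrange(num_channels) for _ in range(num_users)]
    # The random jammer also picks one channel uniformly at random.
    jammed = rng.randrange(num_channels)
    load = Counter(choices)
    # k users on the same channel form k*(k-1)/2 mutually interfering pairs.
    pairs = sum(k * (k - 1) // 2 for k in load.values())
    # Counter returns 0 for a channel that no user selected.
    return pairs + jam_weight * load[jammed]


def expected_cost(num_users, num_channels, jam_weight=1.0, trials=1000, seed=0):
    """Monte Carlo mean of trial_cost over independent trials."""
    rng = random.Random(seed)
    total = sum(trial_cost(num_users, num_channels, jam_weight, rng)
                for _ in range(trials))
    return total / trials


def closed_form(num_users, num_channels, jam_weight=1.0):
    """Exact expectation of the toy cost under uniform random selection."""
    n, m = num_users, num_channels
    return n * (n - 1) / (2 * m) + jam_weight * n / m


if __name__ == "__main__":
    # Compare the 1000-trial estimate with the exact value for several N.
    for n in (4, 6, 8, 10):
        print(n, expected_cost(n, 4), closed_form(n, 4))
```

Because the toy cost has a simple closed form under uniform random selection, N(N-1)/(2M) + wN/M for N users, M channels and jamming weight w, the Monte Carlo mean can be checked against it. Setting `jam_weight=0` gives the corresponding no jammer case.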


References

1. International Telecommunication Union (2015). Technical and operational principles for HF sky-wave communication stations to improve the man-made noise HF environment (ITU-R 258/5). ITU.
2. Zou, Y., Zhu, J., Wang, X., & Hanzo, L. (2016). A survey on wireless security: Technical challenges, recent advances, and future trends. Proceedings of the IEEE, 104(9), 1727–1765.
3. Sagduyu, Y. E., Berry, R. A., & Ephremides, A. (2011). Jamming games in wireless networks with incomplete information. IEEE Communications Magazine, 49(8), 112–118.
4. Chen, C., Song, M., Xin, C., & Backens, J. (2013). A game-theoretical anti-jamming scheme for cognitive radio networks. IEEE Network, 27(3), 22–27.
5. Zhang, L., Guan, Z., & Melodia, T. (2016). United against the enemy: anti-jamming based on cross-layer cooperation in wireless networks. IEEE Transactions on Wireless Communications, 15(8), 5733–5747.
6. Zhu, H., Fang, C., Liu, Y., et al. (2016). You can jam but you can’t hide: defending against jamming attacks for Geo-location database driven spectrum sharing. IEEE Journal on Selected Areas in Communications. doi:10.1109/JSAC.2016.2605799.
7. Pietro, R. D., & Oligeri, G. (2013). Jamming mitigation in cognitive radio networks. IEEE Network, 27(3), 10–15.
8. Wu, Y., Wang, B., Liu, K., & Clancy, T. (2012). Anti-jamming games in multi-channel cognitive radio networks. IEEE Journal on Selected Areas in Communications, 30(1), 4–15.
9. Han, Z., Niyato, D., Saad, W., Basar, T., et al. (2012). Game theory in wireless and communication networks. Cambridge: Cambridge University Press.
10. Xu, Y., Wang, J., Wu, Q., et al. (2015). A game-theoretic perspective on self-organizing optimization for cognitive small cells. IEEE Communications Magazine, 53(7), 100–108.
11. Sun, Y., Wu, Q., Wang, J., Xu, Y., & Anpalagan, A. (2016). Veracity: Overlapping coalition formation based double auction for heterogeneous demand and spectrum reusability. IEEE Journal on Selected Areas in Communications, 34(10), 2690–2705.
12. Shao, H., Sun, Y., Zhao, H., et al. (2016). Locally cooperative traffic-offloading in multi-mode small cell networks via potential games. Transactions on Emerging Telecommunications Technologies, 27(7), 968–981.
13. Sun, Y., Wang, J., et al. (2016). Local altruistic coalition formation game for spectrum sharing and interference management in hyper-dense cloud-RANs. IET Communications, 10(15), 1914–1921.
14. Sharma, R. K., & Rawat, D. B. (2015). Advances on security threats and countermeasures for cognitive radio networks: A survey. IEEE Communications Surveys & Tutorials, 17(2), 1023–1043 (Second Quarter).
15. Wang, B., Wu, Y., Liu, K., & Clancy, T. (2011). An anti-jamming stochastic game for cognitive radio networks. IEEE Journal on Selected Areas in Communications, 29(4), 877–889.
16. Li, H., & Han, Z. (2010). Dogfight in spectrum: Combating primary user emulation attacks in cognitive radio systems, part I: Known channel statistics. IEEE Transactions on Wireless Communications, 9(11), 3566–3577.
17. Oskoui, M. G., Khorramshahi, P., & Salehi, T. A. (2016). Using game theory to battle jammer in control channels of cognitive radio ad hoc networks. In Proceedings of the IEEE ICC (pp. 1–5).
18. El-Bardan, R., Brahma, S., & Varshney, P. K. (2016). Strategic power allocation with incomplete information in the presence of jammer. IEEE Transactions on Communications, 64(8), 3467–3479.
19. Song, T., Stark, W. E., Li, T., & Tugnait, J. K. (2016). Optimal multiband transmission under hostile jamming. IEEE Transactions on Communications, 64(9), 4013–4027.
20. Hanawal, M. K., Abdel-Rahman, M. J., & Krunz, M. (2014). Game theoretic anti-jamming dynamic frequency hopping and rate adaptation in wireless systems. In Proceedings of the WiOpt Conference (pp. 247–254).
21. Hanawal, M. K., Abdel-Rahman, M. J., & Krunz, M. (2016). Joint adaptation of frequency hopping and transmission rate for anti-jamming wireless systems. IEEE Transactions on Mobile Computing, 15(9), 2247–2259.
22. Abdel-Rahman, M. J., & Krunz, M. (2014). Game-theoretic quorum-based frequency hopping for anti-jamming rendezvous in DSA networks. In Proceedings of the IEEE DYSPAN (pp. 248–258).
23. Sun, Y., Wang, J., Sun, F., & Zhang, J. (2016). Energy-aware joint user scheduling and power control for two-tier femtocell networks: A hierarchical game approach. IEEE Systems Journal. doi:10.1109/JSYST.2016.2580560.
24. Kang, X., Zhang, R., & Motani, M. (2012). Price-based resource allocation for spectrum-sharing femtocell networks: A Stackelberg game approach. IEEE Journal on Selected Areas in Communications, 30(3), 538–549.
25. Yang, D., Zhang, J., Fang, X., Richa, A., et al. (2012). Optimal transmission power control in the presence of a smart jammer. In Proceedings of the IEEE Globecom (pp. 5506–5511).
26. Yang, D., Xue, G., Zhang, J., Richa, A., & Fang, X. (2013). Coping with a smart jammer in wireless networks: A stackelberg game approach. IEEE Transactions on Wireless Communications, 12(8), 4038–4047.
27. Xiao, L., Chen, T., Liu, J., & Dai, H. (2015). Anti-jamming transmission stackelberg game with observation errors. IEEE Communications Letters, 19(6), 949–952.
28. Jia, L., Yao, F., Sun, Y., et al. (2016). Bayesian Stackelberg game for anti-jamming with incomplete information. IEEE Communications Letters, 20(10), 1991–1994.
29. Tang, X., Ren, P., Wang, Y., et al. (2015). Securing wireless transmission against reactive jamming: A Stackelberg game framework. In Proceedings of the IEEE GLOBECOM (pp. 1–6).
30. Xiao, L., Xie, C., Chen, T., et al. (2016). Mobile offloading game against smart attacks. In Proceedings of the IEEE INFOCOM (pp. 403–408).
31. Xiao, L., Xie, C., Chen, T., et al. (2016). A mobile offloading game against smart attacks. IEEE Access, 4, 2281–2291.
32. Wu, Q., Xu, Y., Wang, J., et al. (2013). Distributed channel selection in time-varying radio environment: Interference mitigation game with uncoupled stochastic learning. IEEE Transactions on Vehicular Technology, 62(9), 4524–4538.
33. Zheng, J., Cai, Y., Yang, W., et al. (2013). A fully distributed algorithm for dynamic channel adaptation in canonical communication networks. IEEE Wireless Communications Letters, 2(5), 491–494.
34. Zheng, J., Cai, Y., Xu, Y., & Anpalagan, A. (2014). Distributed channel selection for interference mitigation in dynamic environment: A game theoretic stochastic learning solution. IEEE Transactions on Vehicular Technology, 63(9), 4757–4762.
35. Chen, X., Zhang, H., Chen, T., & Lasanen, M. (2013). Improving energy efficiency in green femtocell networks: A hierarchical reinforcement learning framework. In Proceedings of the IEEE ICC 2013 (pp. 2241–2245).
36. Sun, Y., Shao, H., Liu, X., et al. (2015). Traffic offloading in two-tier multi-mode small cell networks over unlicensed bands: A hierarchical learning framework. KSII Transactions on Internet and Information Systems, 9(11), 4291–4310.
37. Monderer, D., & Shapley, L. S. (1996). Potential games. Games and Economic Behavior, 14(1), 124–143.


38. Sastry, P., Phansalkar, V., & Thathachar, M. (1994). Decentralized learning of Nash equilibria in multi-person stochastic games with incomplete information. IEEE Transactions on Systems, Man, and Cybernetics, 24(5), 769–777.
39. Zhong, W., Xu, Y., Tao, M., et al. (2010). Game theoretic multimode precoding strategy selection for MIMO multiple access channels. IEEE Signal Processing Letters, 17(6), 563–566.
40. Zheng, J., Cai, Y., Lu, N., et al. (2015). Stochastic game-theoretic spectrum access in distributed and dynamic environment. IEEE Transactions on Vehicular Technology, 64(10), 4807–4820.
41. Xu, Y., Wang, J., Wu, Q., et al. (2012). Opportunistic spectrum access in unknown dynamic environment: A game-theoretic stochastic learning solution. IEEE Transactions on Wireless Communications, 11(4), 1380–1391.
42. Xu, Y., Xu, Y., & Anpalagan, A. (2015). Database-assisted spectrum access in dynamic networks: A distributed learning solution. IEEE Access, 3, 1071–1078.
43. Watkins, C. J. C. H., & Dayan, P. (1992). Q-learning. Machine Learning, 8, 279–292.
44. Kianercy, A., & Galstyan, A. (2012). Dynamics of Boltzmann Q-learning in two-player two-action games. Physical Review E, 85(4), 1–10.

Fuqiang Yao received his M.S. degree and Ph.D. degree in Communications and Electronic Systems from Xidian University, Xi’an, China, in 1990 and 1993, respectively. Currently, he is a research fellow at the Nanjing Telecommunication Technology Institute, China. His current research interests include wireless communications and communication anti-jamming.

Luliang Jia received his B.S. degree in communications engineering from Lanzhou Jiaotong University, Lanzhou, China, in 2011 and M.S. degree in Communications and Information Systems from College of Communication Engineering, PLA University of Science and Technology, Nanjing, China, in 2014, respectively. He is currently working toward the Ph.D. degree in College of Communication Engineering, PLA University of Science and Technology. His current research interests include game theory, learning theory and communication anti-jamming technology.

Youming Sun received his B.S. degree in electronic and information engineering from Yanshan University, Qinhuangdao, China, in 2010 and M.S. degree from National Digital Switching System Engineering and Technological Research Center (NDSC), Zhengzhou, China, in 2013, respectively. He is currently working toward the Ph.D. degree in communications and information system in NDSC. His research interests include resource allocation in small cell networks, cognitive radio networks, game theory and statistical learning. He currently serves as a regular reviewer for many technical journals, including IEEE Journal on Selected Areas in Communications, IEEE Systems Journal, IEEE Access, Wireless Networks, IET Communications, KSII Transaction on Internet and Information Systems. He has acted as Technical Program Committees (TPC) member for IEEE International Conference on Wireless Communications and Signal Processing 2015 (WCSP 2015).

Yuhua Xu received his B.S. degree in Communications Engineering, and Ph.D. degree in Communications and Information Systems from College of Communications Engineering, PLA University of Science and Technology, in 2006 and 2014 respectively. He has been with College of Communications Engineering, PLA University of Science and Technology since 2012, and currently as an Assistant Professor. His research interests focus on opportunistic spectrum access, learning theory, game theory, and distributed optimization techniques for wireless communications. He has published several papers in international conferences and reputed journals in his research area. He served as Associate Editor for Wiley Transactions on Emerging Telecommunications Technologies and KSII Transactions on Internet and Information Systems, and the Guest Editor of Special Issue on "The Evolution and the Revolution of 5G Wireless Communication Systems" for IET Communications. In 2011 and 2012, he was awarded Certificate of Appreciation as Exemplary Reviewer for the IEEE Communications Letters. He was selected to receive the IEEE Signal Processing Society’s (SPS) 2015 Young Author Best Paper Award, and the Funds for Distinguished Young Scholars of Jiangsu Province in 2015.


Shuo Feng (S’15) received the B.Sc. degree (Hons.) in electrical engineering from the University of Electronic Science and Technology of China, Chengdu, China, in 2011, and the M.Sc. degree in communications and information systems from the College of Communications Engineering, PLA University of Science and Technology, Nanjing, China, in 2014. He is currently pursuing the Ph.D. degree in computational science and engineering at McMaster University, Hamilton, ON, Canada. His research interests include cognitive radio networks, machine learning, cognitive dynamic system, and information geometry. He is an invited reviewer for several journals such as IEEE Journal on Selected Areas in Communications, IEEE Communications Letters, etc. He served as the Technical Program Committee (TPC) member of IEEE WCSP 2015 and WCCN 2015. He was a co-recipient of the Best Paper Award from IEEE VTC 2014-Fall.

Yonggang Zhu received his B.S. degree in electrical engineering and Ph.D. degree in communications engineering from College of Communication Engineering, PLA University of Science and Technology, Nanjing, China, in 2004 and 2009, respectively. From 2011 to 2015, he was a postdoctoral fellow at Nanjing Telecommunication Technology Research Institute of CESEC. His current research interests include wireless communication anti-jamming, compressive sampling and adaptive signal processing.
