This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/LCOMM.2018.2871442, IEEE Communications Letters

Deep Learning Based Channel Estimation for Wireless Energy Transfer

Jae-Mo Kang, Chang-Jae Chun, and Il-Min Kim, Senior Member, IEEE

This work has been supported by Natural Sciences and Engineering Research Council (NSERC).
J.-M. Kang, C.-J. Chun, and I.-M. Kim are with the Department of Electrical and Computer Engineering, Queen's University, Kingston, ON K7L 3N6, Canada (e-mail: jaemo.kang@queensu.ca; changjae.chun@queensu.ca; ilmin.kim@queensu.ca).
1089-7798 (c) 2018 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

Abstract: We propose a deep learning based channel estimation technique for wireless energy transfer. Specifically, we develop a channel learning scheme using the deep autoencoder, which learns the channel state information (CSI) at the energy transmitter based on the harvested energy feedback from the energy receiver, in the sense of minimizing the mean square error (MSE) of the channel estimation. Numerical results demonstrate that the proposed scheme learns the CSI very well and significantly outperforms the conventional scheme in terms of the channel estimation MSE as well as the harvested energy.

Index Terms: Autoencoder, channel estimation, deep learning, wireless energy transfer.

I. INTRODUCTION

Wireless energy transfer (WET) through the radio frequency (RF) wave has attracted significant interest in the literature [1]–[7]. The critical limitations of WET are the short operating distance and the low energy transfer efficiency due to the propagation attenuation. To overcome these limitations, multi-antenna techniques such as energy beamforming can be used [1], [2]. However, the performance of this technique heavily depends on the knowledge of channel state information (CSI) at the energy transmitter (ET) [3]–[7]. Therefore, acquiring the CSI at the ET is a practically very important problem for WET systems.

The practical RF power (or energy) harvesting circuitry is composed of only rectifiers, and thus, it has strictly limited (essentially, no) capability of RF-to-baseband conversion and baseband processing [3]–[7]. Therefore, the traditional CSI acquisition approach at the transmitter, which requires channel estimation at the receiver with complicated RF-to-baseband conversion and baseband processing prior to the CSI feedback, is almost impossible to realize at the practical energy receiver (ER) [3]–[7]. To overcome this critical challenge, several techniques were developed in the recent literature to acquire the CSI at the ET of the WET system [3]–[7]. In [3], [4], assuming channel reciprocity, the CSI was estimated at the ET using the uplink pilots sent from the ER. Clearly, the techniques developed in [3], [4] are applicable only to the time-division duplex (TDD) scenario where the downlink and uplink channels are reciprocal. However, for the general scenario such as the frequency-division duplex (FDD) case, it is generally not possible to use the techniques of [3], [4], because the downlink and uplink channels are different [5]–[7]. To address this issue, in [5]–[7], the CSI was estimated at the ET based on the harvested energy feedback information from the ER. This approach does not require channel reciprocity, and thus, is effective and applicable even when the downlink and uplink channels are completely independent. In [5]–[7], however, the actual channel coefficients themselves were not estimated at the ET; instead, some other values were estimated. Specifically, in [5], [6], the Gram (or one-point sample correlation) matrix of the channel coefficients was estimated and, in [7], the phase differences between the channel coefficients were estimated.

To the best of our knowledge, the problem of estimating the (downlink) channel coefficients using the harvested energy feedback information has not been studied in the literature. Estimation-theoretically, this problem can be considered as the phase retrieval problem [8], [9]. The well-known and popular technique to tackle this problem is the Gerchberg-Saxton algorithm developed in [9], which has recently drawn renewed interest in the areas of machine learning, signal processing, and communications [8]. Unlike the techniques of [5]–[7], this technique can be directly used for the WET system to estimate the channel coefficients at the ET based on the harvested energy feedback information from the ER. However, the performance of the Gerchberg-Saxton algorithm is not satisfactory due to the global phase ambiguity issue. Furthermore, a number of iterations are required to compute the channel estimate.

To the best of our knowledge, all the critical issues above have not been addressed in the literature. This motivated our work. In this paper, we study the channel estimation problem for the WET system, which is very difficult to solve analytically or even numerically due to both nonlinearity and nonconvexity. To intelligently and effectively address the difficulty, we exploit the deep learning technique, which is inspired by the fact that the deep neural network is a universal function approximator and has succeeded in solving many nonlinear and nonconvex problems in various areas [10]. To the best of our knowledge, our work is the first to utilize deep learning for the channel estimation in the WET system. We develop a channel learning scheme using the deep autoencoder, in which the channel coefficients are autonomously learned at the ET based on the harvested energy feedback from the ER and the pilot signals are adaptively optimized, such that the mean square error (MSE) of the channel estimation is minimized. In sharp contrast to the Gerchberg-Saxton method, the proposed scheme learns the channel parameters very well by resolving the phase ambiguity issue. Also, after the offline training, our proposed scheme can compute the channel estimate very efficiently without any iteration.

Notation: We use $E[\cdot]$, $(\cdot)^T$, $|\cdot|$, $\|\cdot\|$, $\angle(\cdot)$, and $\nabla$ to denote the expectation, the transpose, the absolute value or cardinality, the Euclidean norm, the phase, and the gradient operator, respectively.

II. SYSTEM MODEL AND PROBLEM FORMULATION

A. System Model

We consider a WET system composed of an ET and an ER. The ET is equipped with $M$ antennas and the ER is
equipped with a single antenna. Let $\mathbf{h} = [h_1, \cdots, h_M]^T = [r_1 e^{j\phi_1}, \cdots, r_M e^{j\phi_M}]^T$ denote the $M \times 1$ channel vector from the ET to the ER, where $h_m$ denotes the channel coefficient from the $m$th antenna of the ET to the ER. Also, $r_m = |h_m|$ and $\phi_m = \angle(h_m)$ denote the magnitude and the phase of $h_m$, respectively. The channel is constant within each coherent block. In each block, there are two phases: the channel estimation phase, followed by the WET phase.

1) Channel Estimation Phase: In this phase, the ET sends the pilot signals to the ER over $N$ time slots. Let $\mathbf{x}(n) = [x_1(n), \cdots, x_M(n)]^T$ denote the $M \times 1$ pilot signal vector of the ET at the $n$th time slot, $n \in \{1, 2, \cdots, N\}$, where $x_m(n)$ is the pilot symbol of the $m$th antenna at the ET. The power constraint of the pilot signal is given by $\|\mathbf{x}(n)\|^2 \le P$, $\forall n$, where $P$ denotes the maximum transmit power at the ET. At the $n$th time slot, the amount of energy (or interchangeably, the power assuming the unit time) harvested by the power harvesting circuit at the ER is given by $\zeta\,|\mathbf{x}^T(n)\mathbf{h}|^2$, which is measured by the energy (or power) meter, where $0 < \zeta \le 1$ is the energy (or power) conversion efficiency. The ER then feeds back to the ET the amount of harvested energy through the feedback channel at the cost of consuming additional energy and certain amounts of time and bandwidth. We assume the error-free feedback link as in [5]–[7]. Thus, the harvested energy feedback signal received by the ET is given by

$$y(n) = \zeta\,\big|\mathbf{x}^T(n)\mathbf{h}\big|^2 + z(n), \quad n = 1, \cdots, N, \tag{1}$$

where $z(n)$ denotes the noise term, which accounts for the impairments in the measurement and quantization processes such as the antenna noise, the rectifier noise, the measurement error, and the feedback quantization error. In this paper, the goal of the channel estimation is to estimate the channel vector $\mathbf{h}$ at the ET based on the harvested energy feedback information from the ER (i.e., $\{y(n)\}$).

2) WET Phase: In this phase, the ET sends the energy signal $\mathbf{s}$ to the ER. If the channel vector $\mathbf{h}$ were perfectly known, the ET could deliver the maximum amount of energy to the ER via the optimal energy beamforming technique [1]–[6]: $\mathbf{s}^* = \frac{\sqrt{P}\,\mathbf{h}}{\|\mathbf{h}\|}\mu$, where $\mu$ is the energy-carrying symbol with unit power. However, this approach is not possible in our case, because the knowledge of the channel vector $\mathbf{h}$ is imperfect due to channel estimation errors. Therefore, in this paper, the energy signal is determined based on the channel estimate (denoted by $\hat{\mathbf{h}}$) as follows: $\mathbf{s} = \frac{\sqrt{P}\,\hat{\mathbf{h}}}{\|\hat{\mathbf{h}}\|}\mu$. The harvested energy at the ER is thus given by $Q = \zeta\,|\mathbf{s}^T\mathbf{h}|^2 = \frac{\zeta P\,|\hat{\mathbf{h}}^T\mathbf{h}|^2}{\|\hat{\mathbf{h}}\|^2}$.

B. Problem Formulation

In this paper, we aim to find the channel estimator $\hat{\mathbf{h}} = F(\{y(n)\}; \{\mathbf{x}(n)\})$ at the ET, and at the same time, we aim to design the pilot signals $\{\mathbf{x}(n)\}$ in the sense of minimizing the MSE of the channel estimation as follows:

$$(\mathrm{P1}): \ \min_{F(\cdot),\,\mathbf{X}} \ E\Big[\big\|\mathbf{h} - \hat{\mathbf{h}}\big\|^2\Big] \quad \text{s.t.} \quad \hat{\mathbf{h}} = F(\mathbf{y}; \mathbf{X}) \tag{2}$$

where $\mathbf{X} = [\mathbf{x}(1), \cdots, \mathbf{x}(N)]^T$ and $\mathbf{y} = [y(1), \cdots, y(N)]^T$. Note that it is generally very difficult to solve the problem (P1) analytically or even numerically due to the nonlinearity and nonconvexity. Also, it is generally not possible to use the linear estimation technique because the observations $\{y(n)\}$ of (1) are all nonlinear functions of the variable $\mathbf{h}$ to be estimated, and thus, the channel estimation is a nonlinear estimation problem. One might try to use the Bayes estimation techniques for the channel estimation. However, these techniques require very high complexity to compute the channel estimate. Also, the analysis becomes very difficult due to the non-closed form of the channel estimate. In order to overcome these challenges and shortcomings, and to carry out the challenging optimization of (P1) efficiently, we exploit the deep learning technique in the next section.

III. DEEP LEARNING BASED CHANNEL ESTIMATION

A. Conventional Scheme: Gerchberg-Saxton Algorithm

From the estimation-theoretic perspective, estimating the channel vector $\mathbf{h}$ from the harvested energy feedback information $\{y(n)\}$ can be considered as the phase retrieval problem [8], [9]. The most well-known and popular approach to tackle this problem is the Gerchberg-Saxton algorithm [9], which has recently drawn renewed interest in the literature, because this scheme can be considered as a machine learning (or data clustering) technique [8]. In the Gerchberg-Saxton algorithm, the estimate of $\mathbf{h}$ is computed iteratively as follows:

$$\hat{\mathbf{h}}^{(i+1)} = \Big(\sum_{n=1}^{N} \mathbf{x}(n)\mathbf{x}^T(n)\Big)^{-1} \sum_{n=1}^{N} \mathbf{x}(n)\,|y(n)|\,e^{j\angle(\mathbf{x}^T(n)\hat{\mathbf{h}}^{(i)})},$$

where $\hat{\mathbf{h}}^{(i)}$ is the estimate of $\mathbf{h}$ at the $i$th iteration. However, in the Gerchberg-Saxton algorithm, a considerable number of iterations are required, which might be difficult to carry out in stringent real-time applications. Furthermore, there always exists the global phase ambiguity (i.e., $\hat{\mathbf{h}}^{(i)} e^{j\psi}$ for any $\psi$ yields the same observations $\{y(n)\}$ as those of $\hat{\mathbf{h}}^{(i)}$), which severely degrades the channel estimation performance.

B. Proposed Channel Estimation with Deep Autoencoder

The structure of the proposed learning scheme is presented in Fig. 1, which is based on the deep autoencoder. A similar structure was studied in the literature, e.g., in [11]. It consists of two parts: the encoder in Fig. 1(b) and the decoder in Fig. 1(c). The motivation for adopting the autoencoder is to use the encoder and the decoder for two different purposes: the pilot design and the channel learning, respectively.

1) Encoder: For the optimization of (P1) (i.e., joint channel learning and pilot design), how to construct the encoder is very important, but it is generally challenging, because the neural network must be constructed such that the pilots are optimizable. One might try to take the naive autoencoder approach to construct the encoder such as in [11]. Specifically, the encoder might be constructed using a neural network with arbitrary weights and arbitrary activation functions to learn an arbitrary feature of the channel $\mathbf{h}$. However, it turns out that such a naive approach is effective for neither channel learning nor pilot design. To address this issue, in our paper, we construct the encoder using a two-layer (i.e., shallow) neural network with the specific weights and activation function to learn the specific feature. The structure of the proposed
encoder is presented in Fig. 1(b). Our approach is motivated by the fact that the physical mechanism for the pilot transmission and harvested energy feedback in (1) can be exactly modeled by the neural network in Fig. 1(b).

Fig. 1. Structure of the proposed channel learning scheme: (a) structure of the proposed deep autoencoder; (b) structure of the encoder; (c) structure of the decoder.

To model the mechanism of (1), the proposed encoder takes the channel $\mathbf{h}$ as the input and the harvested energy feedback information $\{y(n)\}$ (rather than an arbitrary feature) as the output. To learn $\{y(n)\}$ given $\mathbf{h}$, we take two specific approaches: we propose to use the pilots $\{x_m(n)\}$ as the weights of the encoder (rather than arbitrary weights), and we propose to use a new activation function, namely, the scaled magnitude square function $\phi(v) = \zeta|v|^2$, at the output nodes of the encoder (rather than an arbitrary activation function). The proposed encoder consists of two layers: the input layer with $M$ nodes (i.e., neurons) and the output layer with $N$ nodes. The operation of the $n$th output node is as follows. First, the weighted sum of the $M$ inputs is computed. Then the result is fed into the activation function $\phi(\cdot)$. Finally, the noise $z(n)$ is added. Overall, the operation of the encoder can be mathematically written as

$$\mathbf{y} = \phi(\mathbf{X}\mathbf{h}) + \mathbf{z} = \Big[\zeta\,\Big|\sum_{m=1}^{M} x_m(n)h_m\Big|^2 + z(n)\Big]_{n=1}^{N},$$

where $\mathbf{z} = [z(1), \cdots, z(N)]^T$.

2) Decoder: We construct the decoder using a deep neural network to represent (or approximate) the channel estimator $F(\cdot)$. The structure of the proposed decoder is presented in Fig. 1(c). For the channel learning given the harvested energy feedback information, the proposed decoder takes the values of $\{y(n)\}$ predicted by the encoder as the input and the channel estimate $\hat{\mathbf{h}}$ (both the magnitudes and phases of $\{\hat{h}_m\}$) as the output. Also, there are an input layer with $N$ nodes, an output layer with $M$ nodes, and $L$ hidden layers, where the $\ell$th hidden layer has $K_\ell$ nodes. The operation of each of the hidden and output nodes is as follows. First, the weighted sum of the inputs is computed and a bias is added. Then the result is fed into an activation function. Overall, the operation of the proposed decoder can be mathematically written as

$$\hat{\mathbf{h}} = \varphi_o\Big(\mathbf{W}_o\,\varphi_L\big(\mathbf{W}_L\,\varphi_{L-1}\big(\cdots\varphi_1(\mathbf{W}_1\mathbf{y} + \mathbf{b}_1)\cdots\big) + \mathbf{b}_L\big) + \mathbf{b}_o\Big),$$

where $\mathbf{W}_\ell$ and $\mathbf{b}_\ell$ (or $\mathbf{W}_o$ and $\mathbf{b}_o$) are the matrix of the weights and the vector of the biases in the $\ell$th hidden layer (or the output layer), respectively. Also, $\varphi_\ell(\cdot)$ and $\varphi_o(\cdot)$ denote the activation functions at the nodes of the $\ell$th hidden layer and the output layer, respectively. It is generally possible to use any activation function at each of the hidden and output nodes. In our proposed scheme, the linear (or identity) function is used as the activation function at the output nodes, i.e., $\varphi_o(v) = v$. Also, we use the rectified linear unit (ReLU) as the activation function at the hidden nodes, i.e., $\varphi_\ell(v) = \max\{v, 0\}$, $\ell = 1, \cdots, L$, due to its benefits of avoiding the gradient vanishing problem, faster learning, and computational efficiency [10].

In the proposed learning scheme, the operation $G_{\mathrm{enc}}(\cdot)$ of the encoder is first performed, and then, the operation $G_{\mathrm{dec}}(\cdot)$ of the decoder is performed. Thus, the operation can be mathematically written as $\hat{\mathbf{h}} = G_{\mathrm{dec}}\big(G_{\mathrm{enc}}(\mathbf{h}; \mathbf{X}); \theta\big)$, where $\mathbf{X}$ is the matrix of the pilot symbols $\{x_m(n)\}$ (i.e., the weights of the encoder) and $\theta$ is the set of the parameters (i.e., the weights and biases) of the decoder.

3) Training Procedure: The proposed channel learning scheme has to be trained to carry out the optimization of (P1) (i.e., to minimize the MSE). To achieve this goal, the choice of the loss function is very important, because the loss function critically affects the performance to be optimized. To address this issue, we use the cost function of (P1) (i.e., the MSE) to select the loss function. Considering the operation of the proposed learning scheme and using the sample mean approach, the cost function of (P1) can be computed as

$$\mathcal{L}\big(\mathbf{h}, \hat{\mathbf{h}}; \mathbf{X}, \theta\big) = \frac{1}{|\mathcal{T}|}\sum_{\mathbf{h}\in\mathcal{T}} \Big\|\mathbf{h} - G_{\mathrm{dec}}\big(G_{\mathrm{enc}}(\mathbf{h}; \mathbf{X}); \theta\big)\Big\|^2 \tag{3}$$

where $\mathcal{T}$ denotes the set of training samples. Therefore, we select the loss function as in (3).

To minimize the loss function $\mathcal{L}(\cdot)$ of (3), in this paper, we optimize the parameters $\mathbf{X}$ and $\theta$ according to the gradient descent method. Since there is no constraint for the weights and biases of the decoder, those parameters can be updated by performing the gradient descent step as follows:

$$\theta \leftarrow \theta - \alpha\nabla_{\theta}\,\mathcal{L}\big(\mathbf{h}, \hat{\mathbf{h}}; \mathbf{X}, \theta\big) \tag{4}$$

where $\alpha > 0$ is the learning rate that determines the step size of the update. However, the naive gradient descent method of (4) cannot be directly used to update the parameters of the encoder (i.e., the pilots), because there are the transmit power constraints imposed on the pilot signals, i.e., $\|\mathbf{x}(n)\|^2 \le P$, $\forall n$. To address this issue and to satisfy the power constraints, we propose to update the values of $\{\mathbf{x}(n)\}$ using the projected gradient descent method¹,² as follows:

$$\mathbf{x}(n) \leftarrow \Pi_{\mathcal{X}}\Big[\mathbf{x}(n) - \alpha\nabla_{\mathbf{x}(n)}\,\mathcal{L}\big(\mathbf{h}, \hat{\mathbf{h}}; \mathbf{X}, \theta\big)\Big], \quad \forall n, \tag{5}$$

where $\mathcal{X} = \{\mathbf{v} : \|\mathbf{v}\|^2 \le P\}$ denotes the power constraint set and $\Pi_{\mathcal{X}}[\mathbf{v}] = \arg\min_{\mathbf{u}\in\mathcal{X}} \|\mathbf{u} - \mathbf{v}\|^2$ is the projection of $\mathbf{v}$ onto the set $\mathcal{X}$. In the following, we derive the expression of the projection operator $\Pi_{\mathcal{X}}[\cdot]$ in closed form.

¹ Note that the naive gradient descent method solves unconstrained minimization problems, whereas the projected gradient descent method solves constrained minimization problems.
² In the literature for the image recognition, e.g., in [12], the projected gradient descent method was used for certain purposes. To the best of our knowledge, however, the projected gradient descent method has never been utilized in the literature of deep learning for communications including [11].

Fig. 2. Learning performance of the proposed scheme and the conventional scheme. (Two panels: MSE and average harvested energy [J], each versus the number of gradient steps; curves: Proposed, (P1) and Conventional [9].)

Lemma 1: The projection of $\mathbf{v}$ onto the set $\mathcal{X}$ is given by $\Pi_{\mathcal{X}}[\mathbf{v}] = \mathbf{v}$ if $\|\mathbf{v}\|^2 \le P$ and $\Pi_{\mathcal{X}}[\mathbf{v}] = \frac{\sqrt{P}\,\mathbf{v}}{\|\mathbf{v}\|}$ if $\|\mathbf{v}\|^2 > P$.

Proof: When $\|\mathbf{v}\|^2 \le P$, since $\mathbf{v} = \arg\min_{\mathbf{u}\in\mathcal{X}} \|\mathbf{u} - \mathbf{v}\|^2$, we have $\Pi_{\mathcal{X}}[\mathbf{v}] = \mathbf{v}$. When $\|\mathbf{v}\|^2 > P$, using the Lagrange duality method, we have $\mathbf{v}/(1 + \lambda) = \arg\min_{\mathbf{u}\in\mathcal{X}} \|\mathbf{u} - \mathbf{v}\|^2$, where $\lambda$ is the Lagrange multiplier. From $\|\mathbf{v}/(1 + \lambda)\|^2 = P$, we have $\lambda = \|\mathbf{v}\|/\sqrt{P} - 1$, which yields $\Pi_{\mathcal{X}}[\mathbf{v}] = \sqrt{P}\,\mathbf{v}/\|\mathbf{v}\|$.
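The closed-form projection of Lemma 1 and a single update of the form (5) can be sketched in a few lines of plain Python. This is only an illustrative sketch: the function names `project_onto_power_set` and `projected_gradient_step` and all numerical values are our own assumptions, and the gradient passed to the step is a made-up vector rather than one obtained from the loss (3).

```python
import math

def project_onto_power_set(v, P):
    """Projection of Lemma 1 onto {u : ||u||^2 <= P} for a complex vector v."""
    power = sum(abs(c) ** 2 for c in v)      # ||v||^2
    if power <= P:
        return list(v)                        # already feasible: unchanged
    scale = math.sqrt(P / power)              # equals sqrt(P) / ||v||
    return [scale * c for c in v]

def projected_gradient_step(x, grad, alpha, P):
    """One update of the form (5): gradient step, then projection."""
    moved = [xi - alpha * gi for xi, gi in zip(x, grad)]
    return project_onto_power_set(moved, P)

# Hypothetical numbers, for illustration only.
feasible = project_onto_power_set([0.5 + 0.5j, 0.1j], P=2.0)
clipped = project_onto_power_set([3 + 0j, 4j], P=2.0)
clipped_power = sum(abs(c) ** 2 for c in clipped)
stepped = projected_gradient_step([1 + 0j, 1j], [-10 + 0j, 0j], alpha=0.5, P=2.0)
stepped_power = sum(abs(c) ** 2 for c in stepped)
```

A vector inside the set is returned unchanged, while a vector outside is rescaled along its own direction so that its power equals $P$; the phases of the pilot symbols are not altered by the projection.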

Fig. 3. Channel estimation MSE and average harvested energy versus the SNR. (Two panels: MSE and average harvested energy [J], each versus SNR [dB]; curves: Proposed, (P1) and Conventional [9].)

Note that the training procedure above can be carried out offline as follows: (i) generate samples of the channel $\mathbf{h}$ and noises $\{z(n)\}$ according to the channel and noise statistics, respectively; (ii) optimize the parameters $\theta$ and the pilots $\{x_m(n)\}$ according to (4) and (5), respectively. During the training procedure, therefore, the proposed encoder learns how the pilots $\{x_m(n)\}$ must be designed and how the harvested energy feedback information $\{y(n)\}$ must be predicted based on the channel and noise samples. Also, the proposed decoder learns how the channel $\mathbf{h}$ must be estimated from the values of $\{y(n)\}$ learned by the encoder.
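Steps (i) and (ii) can be sketched as a small, self-contained training loop in plain Python. This is a toy illustration under several assumptions of ours and not the configuration used in the paper: the sizes (M = 2, N = 4), a single hidden ReLU layer with 8 nodes, a decoder that outputs real and imaginary parts rather than magnitudes and phases, a fixed line-of-sight mean `H_LOS`, a noise standard deviation of 0.1, full-batch updates, and hand-written backpropagation. A nonzero channel mean is assumed because the observations in (1) do not change when the channel is rotated by a global phase.

```python
import math
import random

random.seed(0)
M, N, K = 2, 4, 8                  # antennas, pilot slots, hidden nodes (toy sizes)
P, ZETA, ALPHA = 2.0, 0.8, 0.002   # power budget, efficiency, learning rate
STEPS, SAMPLES = 200, 32
H_LOS = [1.0 + 0.0j] * M           # hypothetical line-of-sight mean (assumption)

def cgauss(std):
    return complex(random.gauss(0.0, std), random.gauss(0.0, std))

def project(x):
    """Lemma 1: rescale x onto {v : ||v||^2 <= P} when it lies outside."""
    power = sum(abs(c) ** 2 for c in x)
    return list(x) if power <= P else [c * math.sqrt(P / power) for c in x]

# Step (i): draw channel and noise samples once, offline.
H = [[mu + cgauss(math.sqrt(0.5)) for mu in H_LOS] for _ in range(SAMPLES)]
Z = [[random.gauss(0.0, 0.1) for _ in range(N)] for _ in range(SAMPLES)]
T = [[c.real for c in h] + [c.imag for c in h] for h in H]   # real-valued targets

# Encoder weights are the pilots X; the decoder has one hidden ReLU layer.
X = [project([cgauss(1.0) for _ in range(M)]) for _ in range(N)]
W1 = [[random.uniform(-0.1, 0.1) for _ in range(N)] for _ in range(K)]
b1 = [0.0] * K
W2 = [[random.uniform(-0.1, 0.1) for _ in range(K)] for _ in range(2 * M)]
b2 = [0.0] * (2 * M)

def forward(h, z):
    s = [sum(xm * hm for xm, hm in zip(X[n], h)) for n in range(N)]   # x^T(n) h
    y = [ZETA * abs(s[n]) ** 2 + z[n] for n in range(N)]              # encoder, cf. (1)
    a = [sum(W1[k][n] * y[n] for n in range(N)) + b1[k] for k in range(K)]
    r = [max(v, 0.0) for v in a]                                      # ReLU
    out = [sum(W2[i][k] * r[k] for k in range(K)) + b2[i] for i in range(2 * M)]
    return s, y, a, r, out

def loss():
    """Sample-mean squared error, cf. (3)."""
    total = 0.0
    for h, z, t in zip(H, Z, T):
        out = forward(h, z)[4]
        total += sum((ti - oi) ** 2 for ti, oi in zip(t, out))
    return total / SAMPLES

initial_loss = loss()
for _ in range(STEPS):             # step (ii)
    gW1 = [[0.0] * N for _ in range(K)]
    gb1 = [0.0] * K
    gW2 = [[0.0] * K for _ in range(2 * M)]
    gb2 = [0.0] * (2 * M)
    gX = [[0j] * M for _ in range(N)]
    for h, z, t in zip(H, Z, T):   # backpropagation through decoder and encoder
        s, y, a, r, out = forward(h, z)
        d_out = [2.0 * (oi - ti) / SAMPLES for oi, ti in zip(out, t)]
        d_r = [sum(W2[i][k] * d_out[i] for i in range(2 * M)) for k in range(K)]
        d_a = [d_r[k] if a[k] > 0.0 else 0.0 for k in range(K)]
        d_y = [sum(W1[k][n] * d_a[k] for k in range(K)) for n in range(N)]
        for i in range(2 * M):
            gb2[i] += d_out[i]
            for k in range(K):
                gW2[i][k] += d_out[i] * r[k]
        for k in range(K):
            gb1[k] += d_a[k]
            for n in range(N):
                gW1[k][n] += d_a[k] * y[n]
        for n in range(N):
            for m in range(M):
                # gradient w.r.t. (Re x, Im x), packed as one complex number
                gX[n][m] += d_y[n] * 2.0 * ZETA * s[n] * h[m].conjugate()
    # (4): plain gradient step on the decoder parameters theta
    for i in range(2 * M):
        b2[i] -= ALPHA * gb2[i]
        for k in range(K):
            W2[i][k] -= ALPHA * gW2[i][k]
    for k in range(K):
        b1[k] -= ALPHA * gb1[k]
        for n in range(N):
            W1[k][n] -= ALPHA * gW1[k][n]
    # (5): projected gradient step on each pilot vector x(n)
    for n in range(N):
        X[n] = project([X[n][m] - ALPHA * gX[n][m] for m in range(M)])
final_loss = loss()
pilot_powers = [sum(abs(c) ** 2 for c in x) for x in X]
```

After the loop, the rows of `X` play the role of the learned pilots and `W1`, `b1`, `W2`, `b2` play the role of $\theta$. In a practical implementation an automatic differentiation framework would normally replace the hand-written gradients; they are spelled out here only so that the two update rules are visible side by side.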
After the training, the optimized weights $\{x_m(n)\}$ of the proposed encoder $G_{\mathrm{enc}}(\cdot)$ can be used as the pilots in practice. Interestingly enough, it turns out that the pilots designed by our proposed scheme are generally nonorthogonal, which is in sharp contrast to the existing result such as in [6]. Once the harvested energy is fed back from the ER, the ET can compute the channel estimate $\hat{\mathbf{h}}$ very efficiently using the proposed decoder $G_{\mathrm{dec}}(\cdot)$ without any iteration or any numerical computation, which is in sharp contrast to the Gerchberg-Saxton algorithm and the Bayes estimation techniques.

IV. NUMERICAL RESULTS

In the simulations, we consider a WET system with $M = 4$, $N = 8$, and $P = 2$ W. Also, we consider the Rician fading model. The line-of-sight channel components are generated from the Nakagami-$m$ distribution with the shape parameter of $m = 0.5$ and the spread parameter of $\Omega = 1$. Also, the scattering channel components and the noises $\{z(n)\}$ are generated independently according to the Gaussian distribution with zero mean and unit variance. The signal-to-noise ratio (SNR) is $P/\sigma^2$. In the proposed scheme, we set $L = 5$, $K_\ell = 50$, $\forall \ell$, and $\alpha = 0.01$. We train our proposed scheme using $2 \times 10^5$ training samples.

Fig. 2 compares the performance of the proposed scheme and the conventional scheme. It can be observed that the channel estimation MSE of the proposed scheme (i.e., the cost function of (P1)) decreases as the number of gradient steps increases, meaning that the learning performance improves. After about 4 (or 6) iterations, the proposed scheme starts to yield a lower MSE (or a larger amount of harvested energy) than the conventional scheme.

In Fig. 3, the channel estimation MSE and the average harvested energy are shown versus the SNR. One can see that the proposed scheme considerably outperforms the conventional scheme. This result shows that, unlike the conventional scheme, the proposed scheme learns the channel coefficients very well by resolving the phase ambiguity issue.

V. CONCLUSION

We proposed the deep autoencoder based channel learning scheme for the WET system to minimize the MSE of the channel estimation. The numerical results demonstrated the superior performance and effectiveness of the proposed scheme.

REFERENCES

[1] R. Zhang and C. K. Ho, "MIMO broadcasting for simultaneous wireless information and power transfer," IEEE Trans. Wireless Commun., vol. 12, no. 5, pp. 1989–2001, May 2013.
[2] J. Xu, L. Liu, and R. Zhang, "Multiuser MISO beamforming for simultaneous wireless information and power transfer," IEEE Trans. Signal Process., vol. 62, no. 18, pp. 4798–4810, Sep. 2014.
[3] Y. Zeng and R. Zhang, "Optimized training design for wireless energy transfer," IEEE Trans. Commun., vol. 63, no. 2, pp. 536–550, Feb. 2015.
[4] Y. Zeng and R. Zhang, "Optimized training for net energy maximization in multi-antenna wireless energy transfer over frequency-selective channel," IEEE Trans. Commun., vol. 63, no. 6, pp. 2360–2373, Jun. 2015.
[5] J. Xu and R. Zhang, "Energy beamforming with one-bit feedback," IEEE Trans. Signal Process., vol. 62, no. 20, pp. 5370–5381, Oct. 2014.
[6] K. W. Choi et al., "Received power-based channel estimation for energy beamforming in multiple-antenna RF energy transfer system," IEEE Trans. Signal Process., vol. 65, no. 6, pp. 1461–1476, Mar. 2017.
[7] S. Abeywickrama, T. Samarasinghe, C. K. Ho, and C. Yuen, "Wireless energy beamforming using received signal strength indicator feedback," IEEE Trans. Signal Process., vol. 66, no. 1, pp. 224–235, Jan. 2018.
[8] Y. Sun, P. Babu, and D. Palomar, "Majorization-minimization algorithms in signal processing, communications, and machine learning," IEEE Trans. Signal Process., vol. 65, no. 3, pp. 794–816, Feb. 2017.
[9] R. W. Gerchberg and W. O. Saxton, "A practical algorithm for the determination of phase from image and diffraction plane pictures," Optik, vol. 35, pp. 237–250, 1972.
[10] Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning," Nature, vol. 521, pp. 436–444, May 2015.
[11] M. Kim, N.-I. Kim, W. Lee, and D.-H. Cho, "Deep learning aided SCMA," IEEE Commun. Lett., vol. 22, no. 7, pp. 720–723, Apr. 2018.
[12] J. Chorowski and J. M. Zurada, "Learning understandable neural networks with nonnegative weight constraints," IEEE Trans. Neural Netw. Learn. Syst., vol. 26, no. 1, pp. 62–69, Jan. 2015.
