
MIMO Communication Systems

Lecture 2
The Capacity of Wireless Channels

Prof. Chun-Hung Liu


Dept. of Electrical and Computer Engineering
National Chiao Tung University
Spring 2017



Outline
Capacity of SISO Channels (Chapter 4 in Goldsmith's Book)
Capacity in AWGN
Capacity of Flat-Fading Channels
Channel and System Model
Channel Distribution Information (CDI) Known
Channel Side Information at Receiver
Channel Side Information at Transmitter and Receiver

Capacity of MIMO Channels (Chapters 10.1~10.3 in Goldsmith's Book)


Introduction to Shannon Channel Capacity
"The fundamental problem of communication is that of reproducing at one point, either exactly or approximately, a message selected at another point."
~ Claude Shannon

[Figure: block diagram of a communication channel.]

Channel capacity:
$C = B\log_2(1 + \mathrm{SNR})$


Introduction to Shannon Channel Capacity
Consider a discrete-time additive white Gaussian noise (AWGN) channel with channel input/output relationship $y[i] = x[i] + n[i]$, where $x[i]$ is the channel input at time i, $y[i]$ is the corresponding channel output, and $n[i]$ is a white Gaussian noise random process.
Assume a channel bandwidth B and transmit power P.
The channel SNR, the power in $x[i]$ divided by the power in $n[i]$, is constant and given by $\gamma = P/(N_0 B)$, where $N_0$ is the power spectral density of the noise.
The capacity of this channel is given by Shannon's well-known formula
$$C = B\log_2(1 + \gamma),$$
where the capacity units are bits/second (bps).
Shannon's coding theorem proves that there exists a code that achieves data rates arbitrarily close to capacity with arbitrarily small probability of bit error.
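As a quick numeric illustration of this formula, here is a minimal sketch (the bandwidth and SNR values are arbitrary choices, not from the lecture):

```python
import math

def awgn_capacity(bandwidth_hz: float, snr_linear: float) -> float:
    """Shannon capacity C = B*log2(1 + SNR) of an AWGN channel, in bits/s."""
    return bandwidth_hz * math.log2(1 + snr_linear)

B = 30e3                      # 30 kHz channel (arbitrary)
snr = 10 ** (20 / 10)         # 20 dB SNR (arbitrary)
print(f"C = {awgn_capacity(B, snr) / 1e3:.1f} kbps")   # ~199.8 kbps
```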
Introduction to Shannon Channel Capacity
Shannon proved that channel capacity equals the mutual information of
the channel maximized over all possible input distributions:
$$C = \max_{p(x)} I(X;Y) = \max_{p(x)} \sum_{x,y} p(x,y)\,\log\frac{p(x,y)}{p(x)\,p(y)}. \qquad (4.3)$$

For the AWGN channel, the maximizing input distribution is Gaussian, which results in the channel capacity $C = B\log_2(1+\gamma)$.
At the time that Shannon developed his theory of information, data rates
over standard telephone lines were on the order of 100 bps. Thus, it was
believed that Shannon capacity, which predicted speeds of roughly 30
Kbps over the same telephone lines, was not a very useful bound for real
systems.
However, breakthroughs in hardware, modulation, and coding
techniques have brought commercial modems of today very close to the
speeds predicted by Shannon in the 1950s.



I. Capacity of SISO Channels



Mutual Information
The converse theorem shows that any code with rate R > C has a
probability of error bounded away from zero.
For a memoryless time-invariant channel with random input x and
random output y, the channel's mutual information is defined as
$$I(X;Y) = \sum_{x\in\mathcal{X},\,y\in\mathcal{Y}} p(x,y)\,\log\frac{p(x,y)}{p(x)\,p(y)},$$
where the sum is taken over all possible input and output pairs $x\in\mathcal{X}$ and $y\in\mathcal{Y}$, with $\mathcal{X}$ and $\mathcal{Y}$ the input and output alphabets.
Mutual information can also be written in terms of the entropy in the channel output y and conditional output y|x as $I(X;Y) = H(Y) - H(Y|X)$, where $H(Y) = -\sum_y p(y)\log p(y)$ is the entropy of Y and $H(Y|X) = -\sum_{x,y} p(x,y)\log p(y|x)$ is the conditional entropy of Y given X.


The AWGN Channel
The most important continuous alphabet channel is the Gaussian
channel depicted in the following figure. This is a time-discrete
channel with output Yi at time i, where Yi is the sum of the input Xi
and the noise Zi .
The noise $Z_i$ is drawn i.i.d. from a Gaussian distribution with variance N. Thus,
$$Y_i = X_i + Z_i, \qquad Z_i \sim \mathcal{N}(0, N).$$
The noise $Z_i$ is assumed to be independent of the signal $X_i$.
If the noise variance is zero or the input is unconstrained, the capacity of the channel is infinite.
Figure: A Gaussian Channel



The AWGN Channel
Assume an average power constraint. For any codeword $(x_1, x_2, \ldots, x_n)$ transmitted over the channel, we require that
$$\frac{1}{n}\sum_{i=1}^{n} x_i^2 \le P.$$
Assume that we want to send 1 bit over the channel in one use of the channel.
Given the power constraint, the best that we can do is to send one of two levels, $+\sqrt{P}$ or $-\sqrt{P}$. The receiver looks at the corresponding received Y and tries to decide which of the two levels was sent.
Assuming that both levels are equally likely (this would be the case if we wish to send exactly 1 bit of information), the optimum decoding rule is to decide that $+\sqrt{P}$ was sent if $Y > 0$ and that $-\sqrt{P}$ was sent if $Y \le 0$.


Probability of Error in AWGN Channels
The probability of error with such a decoding scheme is
$$P_e = \frac{1}{2}\,P\big[Y < 0 \,\big|\, X = +\sqrt{P}\,\big] + \frac{1}{2}\,P\big[Y > 0 \,\big|\, X = -\sqrt{P}\,\big]$$
$$= \frac{1}{2}\,P\big[Z < -\sqrt{P} \,\big|\, X = +\sqrt{P}\,\big] + \frac{1}{2}\,P\big[Z > \sqrt{P} \,\big|\, X = -\sqrt{P}\,\big] = P\big[Z > \sqrt{P}\,\big] = 1 - \Phi\big(\sqrt{P/N}\big),$$
where $\Phi(x)$ is the cumulative normal function
$$\Phi(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}}\,e^{-t^2/2}\,dt = 1 - Q(x).$$
Using such a scheme, we have converted the Gaussian channel into
a discrete binary symmetric channel with crossover probability Pe .
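A short sketch that evaluates this crossover probability and checks it against a simulation of the two-level scheme (the power and noise values are arbitrary choices):

```python
import numpy as np
from math import erf, sqrt

def Q(x: float) -> float:
    """Gaussian tail probability Q(x) = 1 - Phi(x)."""
    return 0.5 * (1 - erf(x / sqrt(2)))

P, N = 1.0, 0.5                          # signal power, noise variance (arbitrary)
pe_theory = Q(sqrt(P / N))               # Pe = 1 - Phi(sqrt(P/N))

rng = np.random.default_rng(0)
bits = rng.integers(0, 2, 200_000)
x = np.where(bits == 1, sqrt(P), -sqrt(P))    # two-level signaling
y = x + rng.normal(0.0, sqrt(N), bits.size)   # AWGN channel
pe_sim = np.mean((y > 0) != (bits == 1))      # decide +sqrt(P) iff Y > 0

print(f"theory {pe_theory:.4f}, simulation {pe_sim:.4f}")
```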
Differential Entropy
Let X now be a continuous r.v. with cumulative distribution function
$$F(x) = P[X \le x],$$
and let $f(x) = \frac{d}{dx}F(x)$ be the density function.
Let $S = \{x : f(x) > 0\}$ be the support set. Then:

Definition of differential entropy $h(X)$:
$$h(X) = -\int_S f(x)\,\log f(x)\,dx$$

Since we integrate over only the support set, no worries about log 0.


Differential Entropy of Gaussian Distribution
Let $X \sim \mathcal{N}(0, \sigma^2)$. Let's compute $h(X)$ in nats first and then convert:
$$h(X) = -\int f(x)\ln f(x)\,dx = \frac{1}{2}\ln(2\pi\sigma^2) + \frac{E[X^2]}{2\sigma^2} = \frac{1}{2}\ln(2\pi e\sigma^2)\ \text{nats} = \frac{1}{2}\log_2(2\pi e\sigma^2)\ \text{bits}.$$
Note: $h(X)$ is only a function of the variance $\sigma^2$, not the mean. Why?
So the entropy of a Gaussian is monotonically related to the variance.
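A minimal check of this closed form against a Monte Carlo estimate of $h(X) = E[-\log_2 f(X)]$ (the variance below is an arbitrary choice):

```python
import numpy as np

sigma2 = 4.0                                           # variance (arbitrary)
h_closed = 0.5 * np.log2(2 * np.pi * np.e * sigma2)    # (1/2) log2(2*pi*e*sigma^2)

rng = np.random.default_rng(1)
x = rng.normal(0.0, np.sqrt(sigma2), 1_000_000)
ln_f = -0.5 * np.log(2 * np.pi * sigma2) - x**2 / (2 * sigma2)  # ln f(x)
h_mc = np.mean(-ln_f) / np.log(2)                      # E[-ln f(X)] in bits

print(f"closed form {h_closed:.4f} bits, Monte Carlo {h_mc:.4f} bits")
```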



Gaussian Distribution Maximizes Differential Entropy
Definition: The relative entropy (or Kullback-Leibler distance) $D(g\|\phi)$ between two densities g and $\phi$ is defined by
$$D(g\|\phi) = \int g(x)\log\frac{g(x)}{\phi(x)}\,dx \ge 0. \quad (\text{Why?})$$
Theorem: Let the random variable X have zero mean and variance $\sigma^2$. Then $h(X) \le \frac{1}{2}\log_2(2\pi e\sigma^2)$, with equality iff $X \sim \mathcal{N}(0, \sigma^2)$.
Proof: Let $g(x)$ be any density satisfying $\int x^2 g(x)\,dx = \sigma^2$. Let $\phi(x)$ be the density of a Gaussian random variable with zero mean and variance $\sigma^2$.
Note that $\log\phi(x)$ is a quadratic form: it is $-\frac{x^2}{2\sigma^2} - \frac{1}{2}\log(2\pi\sigma^2)$. Then
$$0 \le D(g\|\phi) = \int g(x)\log\frac{g(x)}{\phi(x)}\,dx = -h(g) - \int g(x)\log\phi(x)\,dx = -h(g) - \int \phi(x)\log\phi(x)\,dx \;(\text{Why?}) = -h(g) + h(\phi).$$
Therefore, the Gaussian distribution maximizes the entropy over all distributions with the same variance.
Capacity of Gaussian Channels
Definition: The information capacity of the Gaussian channel with power constraint P is
$$C = \max_{f(x):\,E[X^2]\le P} I(X;Y),$$
where $I(X;Y)$ is called the mutual information between X and Y.


We can calculate the information capacity as follows. Expanding $I(X;Y)$, we have
$$I(X;Y) = h(Y) - h(Y|X) = h(Y) - h(X+Z|X) = h(Y) - h(Z|X) = h(Y) - h(Z),$$
since Z is independent of X. Now $h(Z) = \frac{1}{2}\log 2\pi e N$. Also,
$$E[Y^2] = E[(X+Z)^2] = E[X^2] + 2E[X]E[Z] + E[Z^2] = P + N,$$
since X and Z are independent and $E[Z] = 0$.
Capacity in AWGN
Given $E[Y^2] = P + N$, the entropy of Y is bounded by $\frac{1}{2}\log 2\pi e(P+N)$ (the Gaussian distribution maximizes the entropy for a given variance).
Applying this result to bound the mutual information, we obtain
$$I(X;Y) = h(Y) - h(Z) \le \frac{1}{2}\log 2\pi e(P+N) - \frac{1}{2}\log 2\pi e N = \frac{1}{2}\log\Big(1 + \frac{P}{N}\Big).$$
Hence, the information capacity of the Gaussian channel is
$$C = \max_{E[X^2]\le P} I(X;Y) = \frac{1}{2}\log\Big(1 + \frac{P}{N}\Big),$$
and the maximum is attained when $X \sim \mathcal{N}(0, P)$.


Bandlimited Channels
The output of a channel can be described as the convolution
$$Y(t) = [X(t) + Z(t)] * c(t),$$
where X(t) is the signal waveform, Z(t) is the waveform of the white Gaussian noise, and c(t) is the impulse response of an ideal bandpass filter, which cuts out all frequencies greater than B.
Nyquist-Shannon Sampling Theorem: Suppose that a function f(t) is bandlimited to B; namely, the spectrum of the function is 0 for all frequencies greater than B. Then the function is completely determined by samples of the function spaced $\frac{1}{2B}$ seconds apart.
Proof: Let $F(\omega)$ be the Fourier transform of $f(t)$. Then
$$f(t) = \frac{1}{2\pi}\int_{-\infty}^{\infty} F(\omega)e^{i\omega t}\,d\omega = \frac{1}{2\pi}\int_{-2\pi B}^{2\pi B} F(\omega)e^{i\omega t}\,d\omega,$$


Bandlimited Channels
since $F(\omega)$ is zero outside the band $-2\pi B \le \omega \le 2\pi B$. If we consider samples spaced $\frac{1}{2B}$ seconds apart, the value of the signal at the sample points can be written
$$f\Big(\frac{n}{2B}\Big) = \frac{1}{2\pi}\int_{-2\pi B}^{2\pi B} F(\omega)e^{i\omega n/(2B)}\,d\omega.$$
The right-hand side of this equation is also the definition of the coefficients of the Fourier series expansion of the periodic extension of the function $F(\omega)$, taking the interval $-2\pi B$ to $2\pi B$ as the fundamental period.
The sample values $f(\frac{n}{2B})$ determine the Fourier coefficients and, by extension, they determine the value of $F(\omega)$ in the interval $(-2\pi B, 2\pi B)$.
Consider the function
$$\operatorname{sinc}(t) = \frac{\sin(2\pi Bt)}{2\pi Bt}. \quad (\text{This function is 1 at } t = 0 \text{ and is 0 for } t = n/(2B),\ n \ne 0.)$$
Bandlimited Channels
The spectrum of this function is constant in the band $(-B, B)$ and is zero outside this band.
Now define
$$g(t) = \sum_{n=-\infty}^{\infty} f\Big(\frac{n}{2B}\Big)\operatorname{sinc}\Big(t - \frac{n}{2B}\Big).$$
From the properties of the sinc function, it follows that $g(t)$ is bandlimited to B and is equal to $f(n/2B)$ at $t = n/2B$. Since there is only one function satisfying these constraints, we must have $g(t) = f(t)$.
This provides an explicit representation of $f(t)$ in terms of its samples.
This provides an explicit representation of f (t) in terms of its samples.
What does this theorem tell us?
A general function has an infinite number of degrees of freedom: the value of the function at every point can be chosen independently. The Nyquist-Shannon sampling theorem shows that a bandlimited function has only 2B degrees of freedom per second. The values of the function at the sample points can be chosen independently, and this specifies the entire function.
Bandlimited Channels
If a function is bandlimited, it cannot be limited in time. But we can consider functions that have most of their energy in bandwidth B and most of their energy in a finite time interval, say (0, T).
Now let's look back at the problem of communication over a bandlimited channel.
Assuming that the channel has bandwidth B, we can represent both the input and the output by samples taken $\frac{1}{2B}$ seconds apart.
If the noise has power spectral density $N_0/2$ watts/hertz and bandwidth B hertz, the noise has power $\frac{N_0}{2}\cdot 2B = N_0 B$, and each of the 2BT noise samples in time T has variance $N_0 BT/(2BT) = N_0/2$.
Looking at the input as a vector in the 2TB-dimensional space, we see that the received signal is spherically normally distributed about this point with covariance $\frac{N_0}{2}\,\mathbf{I}$.


Capacity of Bandlimited Channels
Let the channel be used over the time interval [0, T]. In this case, the energy per sample is $PT/(2BT) = P/(2B)$, the noise variance per sample is $\frac{N_0}{2}\cdot 2B\cdot\frac{T}{2BT} = N_0/2$, and hence the capacity per sample is
$$C = \frac{1}{2}\log_2\Big(1 + \frac{P/(2B)}{N_0/2}\Big) = \frac{1}{2}\log_2\Big(1 + \frac{P}{N_0 B}\Big)\ \text{(bits per sample)}.$$
Since there are 2B samples each second, the capacity of the channel can be rewritten as
$$C = B\log_2\Big(1 + \frac{P}{N_0 B}\Big)\ \text{(bits/s, bps)}.$$
(This equation is one of the most famous formulas of information theory. It gives the capacity of a bandlimited Gaussian channel with noise spectral density $N_0/2$ watts/Hz and power P watts.)
If we let $B \to \infty$ in the above capacity formula, we obtain
$$C = \frac{P}{N_0}\log_2 e.$$
(For infinite-bandwidth channels, the capacity grows linearly with the power.)
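A small sketch evaluating both formulas, showing the capacity saturating at $(P/N_0)\log_2 e$ as B grows (power and noise density values are arbitrary choices):

```python
import math

P, N0 = 1e-3, 1e-9            # transmit power (W) and noise PSD (W/Hz), arbitrary

def c_bandlimited(B: float) -> float:
    """C = B log2(1 + P/(N0 B)) in bits/s."""
    return B * math.log2(1 + P / (N0 * B))

c_limit = (P / N0) * math.log2(math.e)   # wideband limit (P/N0) log2 e

for B in (1e4, 1e6, 1e8, 1e10):
    print(f"B = {B:.0e} Hz: C = {c_bandlimited(B) / 1e6:.3f} Mbps")
print(f"B -> infinity:   C = {c_limit / 1e6:.3f} Mbps")
```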
Capacity of Flat-Fading Channels
Assume a discrete-time channel with stationary and ergodic time-varying gain $\sqrt{g[i]}$, $0 \le g[i]$, and AWGN $n[i]$, as shown in Figure 4.1.

The channel power gain $g[i]$ follows a given distribution $p(g)$; e.g., for Rayleigh fading $p(g)$ is exponential. The channel gain $g[i]$ can change at each time i, either as an i.i.d. process or with some correlation over time.


Capacity of Flat-Fading Channels
In a block-fading channel, $g[i]$ is constant over some block length T, after which time $g[i]$ changes to a new independent value based on the distribution $p(g)$.
Let $\bar{P}$ denote the average transmit signal power, $N_0/2$ denote the noise power spectral density of $n[i]$, and B denote the received signal bandwidth. The instantaneous received signal-to-noise ratio (SNR) is then $\gamma[i] = \bar{P}g[i]/(N_0 B)$, $0 \le \gamma[i] < \infty$, and its expected value over all time is $\bar{\gamma} = \bar{P}\bar{g}/(N_0 B)$.
The channel gain $g[i]$, also called the channel side (state) information (CSI), changes during the transmission of the codeword.
The capacity of this channel depends on what is known about $g[i]$ at the transmitter and receiver. We will consider three different scenarios regarding this knowledge: Channel Distribution Information (CDI), receiver CSI, and transmitter and receiver CSI.


Capacity of Flat-Fading Channels
First consider the case where the channel gain distribution $p(g)$ or, equivalently, the distribution of SNR $p(\gamma)$ is known to the transmitter and receiver.
For i.i.d. fading the capacity is given by (4.3), but solving for the capacity-achieving input distribution, i.e., the distribution achieving the maximum in (4.3), can be quite complicated depending on the fading distribution.
For these reasons, finding the capacity-achieving input distribution and corresponding capacity of fading channels under CDI remains an open problem for almost all channel distributions.
Now consider the case where the CSI $g[i]$ is known at the receiver at time i. Equivalently, $\gamma[i]$ is known at the receiver at time i.
Also assume that both the transmitter and receiver know the
distribution of g[i] . In this case, there are two channel capacity
definitions that are relevant to system design: Shannon capacity, also
called ergodic capacity, and capacity with outage.
Capacity of Flat-Fading Channels
As for the AWGN channel, Shannon capacity defines the maximum data rate that can be sent over the channel with asymptotically small error probability.
Note that for Shannon capacity the rate transmitted over the channel is
constant: the transmitter cannot adapt its transmission strategy relative
to the CSI.
Capacity with outage is defined as the maximum rate that can be
transmitted over a channel with some outage probability corresponding
to the probability that the transmission cannot be decoded with
negligible error probability.
The basic premise of capacity with outage is that a high data rate can be sent over the channel and decoded correctly except when the channel is in a deep fade.
By allowing the system to lose some data in the event of deep fades, a higher data rate can be maintained than if all data must be received correctly regardless of the fading state.


Capacity of Flat-Fading Channels
Shannon capacity of a fading channel with receiver CSI for an average power constraint $\bar{P}$ can be obtained as
$$C = \int_0^{\infty} B\log_2(1+\gamma)\,p(\gamma)\,d\gamma. \qquad (4.4)$$
By Jensen's inequality,
$$E[B\log_2(1+\gamma)] = \int B\log_2(1+\gamma)\,p(\gamma)\,d\gamma \le B\log_2(1+E[\gamma]) = B\log_2(1+\bar{\gamma}),$$
where $\bar{\gamma}$ is the average SNR on the channel.
Here we see that the Shannon capacity of a fading channel with receiver CSI only is less than the Shannon capacity of an AWGN channel with the same average SNR.
In other words, fading reduces Shannon capacity when only the receiver has CSI.
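A quick Monte Carlo sketch of this Jensen gap for Rayleigh fading, where $\gamma$ is exponentially distributed (bandwidth and average SNR are arbitrary choices):

```python
import numpy as np

B, gamma_bar = 30e3, 100.0                       # bandwidth, average SNR (arbitrary)
rng = np.random.default_rng(2)
gamma = rng.exponential(gamma_bar, 1_000_000)    # Rayleigh fading -> exponential SNR

c_fading = B * np.mean(np.log2(1 + gamma))       # eq. (4.4), receiver CSI only
c_awgn = B * np.log2(1 + gamma_bar)              # AWGN at the same average SNR

print(f"fading {c_fading / 1e3:.1f} kbps <= AWGN {c_awgn / 1e3:.1f} kbps")
```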



Capacity of Flat-Fading Channels
Example 4.2: Consider a flat-fading channel with i.i.d. channel gain $g[i]$ which can take on three possible values. The transmit power is 10 mW and the channel bandwidth is 30 KHz; the resulting received SNRs are $\gamma_1 = 0.8333$ with probability 0.1, $\gamma_2 = 83.33$ with probability 0.5, and $\gamma_3 = 333.33$ with probability 0.4 (the same SNR values used in Examples 4.4-4.6, which reference this channel). Assume the receiver has knowledge of the instantaneous value of $g[i]$ but the transmitter does not. Find the Shannon capacity of this channel and compare with the capacity of an AWGN channel with the same average SNR.
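A worked sketch of this example using those three SNR values:

```python
import numpy as np

B = 30e3
gamma = np.array([0.8333, 83.33, 333.33])   # received SNRs
p = np.array([0.1, 0.5, 0.4])               # their probabilities

c_csir = B * np.sum(p * np.log2(1 + gamma))   # discrete form of eq. (4.4)
gamma_bar = np.sum(p * gamma)                 # average SNR ~ 175.08
c_awgn = B * np.log2(1 + gamma_bar)

print(f"receiver-CSI capacity: {c_csir / 1e3:.1f} kbps")   # ~199.3 kbps
print(f"AWGN, same avg SNR:    {c_awgn / 1e3:.1f} kbps")   # ~223.8 kbps
```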



Capacity of Flat-Fading Channels

Capacity with outage applies to slowly varying channels, where the instantaneous SNR $\gamma$ is constant over a large number of transmissions (a transmission burst) and then changes to a new value based on the fading distribution.
If the channel has received SNR $\gamma$ during a burst, then data can be sent over the channel at rate $B\log_2(1+\gamma)$ with negligible probability of error.
Capacity with outage allows bits sent over a given transmission burst to be decoded at the end of the burst with some probability that these bits will be decoded incorrectly.
Specifically, the transmitter fixes a minimum received SNR $\gamma_{\min}$ and encodes for a data rate $C = B\log_2(1+\gamma_{\min})$.
The data is correctly received if the instantaneous received SNR is greater than or equal to $\gamma_{\min}$.
Capacity of Flat-Fading Channels
If the received SNR is below $\gamma_{\min}$ then the receiver declares an outage. The probability of outage is thus $p_{out} = p(\gamma < \gamma_{\min})$.
The average rate correctly received over many transmission bursts is $C_o = (1-p_{out})\,B\log_2(1+\gamma_{\min})$, since data is only correctly received on $1-p_{out}$ of the transmissions.
Capacity with outage is typically characterized by a plot of capacity versus outage, as shown in Figure 4.2.
In Figure 4.2 we plot the normalized capacity $C/B = \log_2(1+\gamma_{\min})$ as a function of outage probability $p_{out} = p(\gamma < \gamma_{\min})$ for a Rayleigh fading channel with $\bar{\gamma} = 20$ dB.
Capacity of Flat-Fading Channels
When both the transmitter and receiver have CSI, the transmitter can adapt its transmission strategy relative to this CSI, as shown in Figure 4.3.

In this case, there is no notion of capacity versus outage where the transmitter sends bits that cannot be decoded, since the transmitter knows the channel and thus will not send bits unless they can be decoded correctly.


Capacity of Flat-Fading Channels
Now consider the Shannon capacity when the channel power gain g[i] is
known to both the transmitter and receiver at time i.
Let s[i] be a stationary and ergodic stochastic process representing the
channel state, which takes values on a finite set S of discrete memoryless
channels.
Let $C_s$ denote the capacity of a particular channel $s \in \mathcal{S}$, and $p(s)$ denote the probability, or fraction of time, that the channel is in state s. The capacity of this time-varying channel is then given by
$$C = \sum_{s\in\mathcal{S}} C_s\,p(s). \qquad (4.6)$$
The capacity of an AWGN channel with average received SNR $\gamma$ is
$$C_\gamma = B\log_2(1+\gamma).$$


Capacity of Flat-Fading Channels
From (4.6), the capacity of the fading channel with transmitter and receiver side information is
$$C = \int_0^{\infty} C_\gamma\,p(\gamma)\,d\gamma = \int_0^{\infty} B\log_2(1+\gamma)\,p(\gamma)\,d\gamma.$$
Let us now allow the transmit power $P(\gamma)$ to vary with $\gamma$, subject to an average power constraint $\bar{P}$:
$$\int_0^{\infty} P(\gamma)\,p(\gamma)\,d\gamma \le \bar{P}. \qquad (4.8)$$
Define the fading channel capacity with average power constraint as
$$C = \max_{P(\gamma):\,\int P(\gamma)p(\gamma)d\gamma = \bar{P}}\; \int_0^{\infty} B\log_2\Big(1 + \frac{\gamma P(\gamma)}{\bar{P}}\Big)\,p(\gamma)\,d\gamma. \qquad (4.9)$$
(Can this capacity be achieved?)
Capacity of Flat-Fading Channels
Figure 4.4 shows the main idea of how to achieve the capacity in (4.9).

To find the optimal power allocation $P(\gamma)$, we form the Lagrangian
$$J(P(\gamma)) = \int_0^{\infty} B\log_2\Big(1 + \frac{\gamma P(\gamma)}{\bar{P}}\Big)\,p(\gamma)\,d\gamma \;-\; \lambda\int_0^{\infty} P(\gamma)\,p(\gamma)\,d\gamma.$$


Capacity of Flat-Fading Channels
Next we differentiate the Lagrangian and set the derivative equal to zero:
$$\frac{\partial J(P(\gamma))}{\partial P(\gamma)} = \bigg[\Big(\frac{B\gamma/\bar{P}}{1+\gamma P(\gamma)/\bar{P}}\Big)\frac{1}{\ln 2} - \lambda\bigg]\,p(\gamma) = 0.$$
Solving for $P(\gamma)$ with the constraint that $P(\gamma) \ge 0$ yields the optimal power adaptation that maximizes (4.9) as
$$\frac{P(\gamma)}{\bar{P}} = \begin{cases} \dfrac{1}{\gamma_0} - \dfrac{1}{\gamma}, & \gamma \ge \gamma_0 \\[4pt] 0, & \gamma < \gamma_0 \end{cases} \qquad (4.12)$$
for some cutoff value $\gamma_0$.
If $\gamma[i]$ is below this cutoff then no data is transmitted over the ith time interval, so the channel is only used at time i if $\gamma[i] \ge \gamma_0$.
Substituting (4.12) into (4.9) then yields the capacity formula:
$$C = \int_{\gamma_0}^{\infty} B\log_2\Big(\frac{\gamma}{\gamma_0}\Big)\,p(\gamma)\,d\gamma. \qquad (4.13)$$
Capacity of Flat-Fading Channels
The multiplexing nature of the capacity-achieving coding strategy indicates that (4.13) is achieved with a time-varying data rate, where the rate corresponding to instantaneous SNR $\gamma$ is $B\log_2(\gamma/\gamma_0)$.
Note that the optimal power allocation policy (4.12) depends on the fading distribution $p(\gamma)$ only through the cutoff value $\gamma_0$. This cutoff value is found from the power constraint.
Rearranging the power constraint and replacing the inequality with equality (since using the maximum available power will always be optimal) yields
$$\int_0^{\infty} \frac{P(\gamma)}{\bar{P}}\,p(\gamma)\,d\gamma = 1.$$
Using the optimal power allocation (4.12), we have
$$\int_{\gamma_0}^{\infty} \Big(\frac{1}{\gamma_0} - \frac{1}{\gamma}\Big)\,p(\gamma)\,d\gamma = 1. \qquad (4.15)$$
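A minimal sketch that solves this equation numerically for Rayleigh fading (exponential $p(\gamma)$; the average SNR is an arbitrary choice), using bisection on $\gamma_0$:

```python
import numpy as np
from scipy.integrate import quad

gamma_bar = 100.0                                  # average SNR (arbitrary)
p = lambda g: np.exp(-g / gamma_bar) / gamma_bar   # exponential SNR pdf

def power_balance(g0: float) -> float:
    """LHS of (4.15): integral over (g0, inf) of (1/g0 - 1/g) p(g) dg."""
    val, _ = quad(lambda g: (1 / g0 - 1 / g) * p(g), g0, np.inf)
    return val

lo, hi = 1e-6, gamma_bar                 # power_balance is decreasing in g0
for _ in range(60):
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if power_balance(mid) > 1 else (lo, mid)

print(f"cutoff gamma_0 = {0.5 * (lo + hi):.4f}")
```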



Capacity of Flat-Fading Channels
Note that this expression depends only on the distribution $p(\gamma)$. The value of $\gamma_0$ cannot be solved for in closed form for typical continuous pdfs $p(\gamma)$ and thus must be found numerically.
Since $\gamma$ is time-varying, the maximizing power adaptation policy (4.12) is a water-filling formula in time, as illustrated in Figure 4.5.
[Figure 4.5: Water-filling power adaptation, plotting $P(\gamma)/\bar{P} = 1/\gamma_0 - 1/\gamma$ versus $\gamma$.] The water-filling terminology refers to the fact that the curve $1/\gamma$ sketches out the bottom of a bowl, and power is poured into the bowl to a constant water level of $1/\gamma_0$.


Capacity of Flat-Fading Channels
Example 4.4: Assume the same channel as in the previous example, with a bandwidth of 30 KHz and three possible received SNRs: $\gamma_1 = 0.8333$ with $p(\gamma_1) = 0.1$, $\gamma_2 = 83.33$ with $p(\gamma_2) = 0.5$, and $\gamma_3 = 333.33$ with $p(\gamma_3) = 0.4$. Find the ergodic capacity of this channel assuming both transmitter and receiver have instantaneous CSI.
Solution: We know the optimal power allocation is water-filling, and we need to find the cutoff value $\gamma_0$ that satisfies the discrete version of (4.15), given by
$$\sum_{\gamma_i \ge \gamma_0} \Big(\frac{1}{\gamma_0} - \frac{1}{\gamma_i}\Big)\,p(\gamma_i) = 1. \qquad (4.17)$$
We first assume that all channel states are used to obtain $\gamma_0$, i.e., assume $\gamma_0 \le \min_i \gamma_i$, and see if the resulting cutoff value is below that of the weakest channel. If not, then we have an inconsistency and must redo the calculation assuming at least one of the channel states is not used. Applying (4.17) to our channel model yields the following calculation.
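A sketch of that calculation in code, following the procedure just described:

```python
import numpy as np

B = 30e3
gamma = np.array([0.8333, 83.33, 333.33])
p = np.array([0.1, 0.5, 0.4])

# Drop the weakest states one at a time until the cutoff is consistent.
order = np.argsort(gamma)
for k in range(len(gamma)):
    g, q = gamma[order[k:]], p[order[k:]]       # states assumed active
    gamma0 = np.sum(q) / (1 + np.sum(q / g))    # from sum q*(1/g0 - 1/g) = 1
    if gamma0 <= g.min():                       # consistency check
        break

C = B * np.sum(q * np.log2(g / gamma0))         # discrete form of (4.13)
print(f"gamma_0 = {gamma0:.4f}")                # ~0.894 (weakest state unused)
print(f"C = {C / 1e3:.1f} kbps")                # ~200.7 kbps
```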


Capacity of Flat-Fading Channels
Comparing with the results of the previous example, we see that this rate is only slightly higher than for the case of receiver CSI only, and is still significantly below that of an AWGN channel with the same average SNR.
That is because the average SNR for this channel is relatively high: for low-SNR channels, capacity in flat fading can exceed that of the AWGN channel with the same average SNR by taking advantage of the rare times when the channel is in a very good state.

Zero-Outage Capacity and Channel Inversion: Now consider a suboptimal transmitter adaptation scheme where the transmitter uses the CSI to maintain a constant received power, i.e., it inverts the channel fading.

The channel then appears to the encoder and decoder as a time-invariant AWGN channel. This power adaptation, called channel inversion, is given by $P(\gamma)/\bar{P} = \sigma/\gamma$, where $\sigma$ equals the constant received SNR that can be maintained with the transmit power constraint (4.8).


Capacity of Flat-Fading Channels
The constant $\sigma$ thus satisfies $\int (\sigma/\gamma)\,p(\gamma)\,d\gamma = 1$, so $\sigma = 1/E[1/\gamma]$.
Fading channel capacity with channel inversion is just the capacity of an AWGN channel with SNR $\sigma$:
$$C = B\log_2(1+\sigma) = B\log_2\Big(1 + \frac{1}{E[1/\gamma]}\Big). \qquad (4.18)$$
The capacity-achieving transmission strategy for this capacity uses a fixed-rate encoder and decoder designed for an AWGN channel with SNR $\sigma$. This has the advantage of maintaining a fixed data rate over the channel regardless of channel conditions.
For this reason the channel capacity given in (4.18) is called zero-outage capacity.
Zero-outage capacity can exhibit a large data rate reduction relative to Shannon capacity in extreme fading environments. For example, in Rayleigh fading $E[1/\gamma]$ is infinite, and thus the zero-outage capacity given by (4.18) is zero.
Capacity of Flat-Fading Channels
Example 4.5: Assume the same channel as in the previous example, with a bandwidth of 30 KHz and three possible received SNRs: $\gamma_1 = 0.8333$ with $p(\gamma_1) = 0.1$, $\gamma_2 = 83.33$ with $p(\gamma_2) = 0.5$, and $\gamma_3 = 333.33$ with $p(\gamma_3) = 0.4$. Assuming transmitter and receiver CSI, find the zero-outage capacity of this channel.
Solution: With channel inversion, $\sigma = 1/E[1/\gamma]$, where
$$E[1/\gamma] = \frac{0.1}{0.8333} + \frac{0.5}{83.33} + \frac{0.4}{333.33} = 0.1272,$$
so $\sigma = 7.86$ and $C = B\log_2(1+\sigma) = 30000\cdot\log_2(8.86) \approx 94.43$ kbps.
The outage capacity is defined as the maximum data rate that can be maintained in all non-outage channel states times the probability of non-outage.


Capacity of Flat-Fading Channels
Outage capacity is achieved with a truncated channel inversion policy for power adaptation that compensates for fading only above a certain cutoff fade depth $\gamma_0$:
$$\frac{P(\gamma)}{\bar{P}} = \begin{cases} \dfrac{\sigma}{\gamma}, & \gamma \ge \gamma_0 \\[4pt] 0, & \gamma < \gamma_0. \end{cases}$$
Since the channel is only used when $\gamma \ge \gamma_0$, the power constraint (4.8) yields $\sigma = 1/E_{\gamma_0}[1/\gamma]$, where
$$E_{\gamma_0}[1/\gamma] \triangleq \int_{\gamma_0}^{\infty} \frac{1}{\gamma}\,p(\gamma)\,d\gamma.$$
The outage capacity associated with a given outage probability $p_{out}$ and corresponding cutoff $\gamma_0$ is given by
$$C(p_{out}) = B\log_2\Big(1 + \frac{1}{E_{\gamma_0}[1/\gamma]}\Big)\,P[\gamma \ge \gamma_0].$$


Capacity of Flat-Fading Channels
We can also obtain the maximum outage capacity by maximizing outage capacity over all possible $\gamma_0$:
$$C = \max_{\gamma_0}\; B\log_2\Big(1 + \frac{1}{E_{\gamma_0}[1/\gamma]}\Big)\,P[\gamma \ge \gamma_0].$$
This maximum outage capacity will still be less than the Shannon capacity (4.13), since truncated channel inversion is a suboptimal transmission strategy.

Example 4.6: Assume the same channel as in the previous example, with a bandwidth of 30 KHz and three possible received SNRs: $\gamma_1 = 0.8333$ with $p(\gamma_1) = 0.1$, $\gamma_2 = 83.33$ with $p(\gamma_2) = 0.5$, and $\gamma_3 = 333.33$ with $p(\gamma_3) = 0.4$. Find the outage capacity of this channel and the associated outage probabilities for cutoff values $\gamma_0 = 0.84$ and $\gamma_0 = 83.4$. Which of these cutoff values yields a larger outage capacity?
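A sketch of the computation for both cutoffs, using the truncated-inversion formulas above:

```python
import numpy as np

B = 30e3
gamma = np.array([0.8333, 83.33, 333.33])
p = np.array([0.1, 0.5, 0.4])

def outage_capacity(gamma0: float):
    active = gamma >= gamma0
    e_inv = np.sum(p[active] / gamma[active])   # E_{gamma0}[1/gamma]
    p_ok = np.sum(p[active])                    # P[gamma >= gamma0]
    return B * np.log2(1 + 1 / e_inv) * p_ok, 1 - p_ok

for g0 in (0.84, 83.4):
    C, pout = outage_capacity(g0)
    print(f"gamma_0 = {g0}: C = {C / 1e3:.1f} kbps, p_out = {pout:.1f}")
# gamma_0 = 0.84 yields the larger outage capacity (~192.5 vs ~116.5 kbps).
```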



II. Capacity of MIMO Channels



Circular Complex Gaussian Vectors
A complex random vector is of the form x = xR + jxI where xR and xI
are real random vectors.
Complex Gaussian random vectors are ones in which [xR , xI ]t is a
real Gaussian random vector.
The distribution is completely specified by the mean and covariance
matrix of the real vector [xR , xI ]t .
Define the mean $\mu = E[\mathbf{x}]$, the covariance matrix $\mathbf{K} = E[(\mathbf{x}-\mu)(\mathbf{x}-\mu)^*]$, and the pseudo-covariance matrix $\mathbf{J} = E[(\mathbf{x}-\mu)(\mathbf{x}-\mu)^t]$,
where $\mathbf{A}^*$ is the transpose of the matrix A with each element replaced by its complex conjugate, and $\mathbf{A}^t$ is just the transpose of A.
Note that in general the covariance matrix K of the complex random vector x by itself is not enough to specify the full second-order statistics of x. Indeed, since K is Hermitian, i.e., $\mathbf{K}^* = \mathbf{K}$, the diagonal elements are real and the elements in the lower and upper triangles are complex conjugates of each other.
Circular Complex Gaussian Vectors
In wireless communication, we are almost exclusively interested in complex random vectors that have the circular symmetry property: $e^{j\theta}\mathbf{x}$ has the same distribution as $\mathbf{x}$ for any $\theta$.
For a circular symmetric complex random vector x,
$$E[\mathbf{x}] = E[e^{j\theta}\mathbf{x}] = e^{j\theta}E[\mathbf{x}]$$
for any $\theta$; hence the mean $\mu = 0$. Moreover,
$$E[\mathbf{x}\mathbf{x}^t] = E[(e^{j\theta}\mathbf{x})(e^{j\theta}\mathbf{x})^t] = e^{j2\theta}E[\mathbf{x}\mathbf{x}^t]$$
for any $\theta$; hence the pseudo-covariance matrix J is also zero.

Thus, the covariance matrix K fully specifies the first- and second-order statistics of a circular symmetric random vector.


Circular Complex Gaussian Vectors
And if the complex random vector is also Gaussian, K in fact specifies its entire statistics.
A circular symmetric Gaussian random vector with covariance matrix K is denoted as $\mathcal{CN}(0, \mathbf{K})$.
Some special cases:
A complex Gaussian random variable $w = w_R + jw_I$ with i.i.d. zero-mean Gaussian real and imaginary components is circular symmetric. In fact, a circular symmetric Gaussian random variable must have i.i.d. zero-mean real and imaginary components.
A collection of n i.i.d. $\mathcal{CN}(0,1)$ random variables forms a standard circular symmetric Gaussian random vector w, denoted by $\mathcal{CN}(0, \mathbf{I})$. The density function of w can be explicitly written as
$$f(\mathbf{w}) = \frac{1}{\pi^n}\exp\big(-\|\mathbf{w}\|^2\big).$$
$\mathbf{Uw}$ has the same distribution as w for any complex orthogonal matrix U (such a matrix is called a unitary matrix and is characterized by the property $\mathbf{U}^*\mathbf{U} = \mathbf{I}$).
Narrowband MIMO Model
Here we consider a narrowband MIMO channel. A narrowband point-to-point communication system with $M_t$ transmit and $M_r$ receive antennas is shown in the following figure.


Narrowband MIMO Model
This input/output relationship can be written simply as $\mathbf{y} = \mathbf{Hx} + \mathbf{n}$, where $\mathbf{x}$ is the transmitted signal vector, $\mathbf{y}$ is the received signal vector, and H is the $M_r \times M_t$ channel gain matrix.

We assume a channel bandwidth of B and complex Gaussian noise with zero mean and covariance matrix $\sigma_n^2\mathbf{I}_{M_r}$, where typically $\sigma_n^2 = N_0 B$.
For simplicity, given a transmit power constraint P we will assume an equivalent model with a noise power of unity and transmit power $P/\sigma_n^2 = \rho$, where $\rho$ can be interpreted as the average SNR per receive antenna under unity channel gain.
This power constraint implies that the input symbols satisfy
$$\sum_{i=1}^{M_t} E[x_i x_i^*] = \rho, \qquad (10.1)$$
or, equivalently, $\mathrm{Tr}(\mathbf{R_x}) = \rho$, where $\mathrm{Tr}(\mathbf{R_x})$ is the trace of the input covariance matrix $\mathbf{R_x} = E[\mathbf{xx}^H]$.


Parallel Decomposition of the MIMO Channel
When both the transmitter and receiver have multiple antennas, there is another mechanism for performance gain called (spatial) multiplexing gain.
The multiplexing gain of a MIMO system results from the fact that a MIMO channel can be decomposed into a number $R_H$ of parallel independent channels.
By multiplexing independent data onto these independent channels, we get an $R_H$-fold increase in data rate in comparison to a system with just one antenna at the transmitter and receiver.
Consider a MIMO channel with an $M_r \times M_t$ channel gain matrix H known to both the transmitter and the receiver.
Let $R_H$ denote the rank of H. From matrix theory, for any matrix H we can obtain its singular value decomposition (SVD) as
$$\mathbf{H} = \mathbf{U\Sigma V}^H, \qquad (10.2)$$
where U ($M_r \times M_r$) and V ($M_t \times M_t$) are unitary and $\mathbf{\Sigma}$ is an $M_r \times M_t$ diagonal matrix of singular values $\{\sigma_i\}$ of H.
Parallel Decomposition of the MIMO Channel
Since $R_H$ cannot exceed the number of columns or rows of H, $R_H \le \min\{M_t, M_r\}$.
If H is full rank, which is sometimes referred to as a rich scattering environment, then $R_H = \min\{M_t, M_r\}$.
The parallel decomposition of the channel is obtained by defining a transformation on the channel input and output x and y through transmit precoding and receiver shaping, as shown in the following figure:

Figure 10.2: Transmit Precoding and Receiver Shaping.

The transmit precoding $\mathbf{x} = \mathbf{V}\tilde{\mathbf{x}}$ and receiver shaping $\tilde{\mathbf{y}} = \mathbf{U}^H\mathbf{y}$ transform the MIMO channel into $R_H$ parallel single-input single-output (SISO) channels with input $\tilde{\mathbf{x}}$ and output $\tilde{\mathbf{y}}$, since from the SVD we have that
$$\tilde{\mathbf{y}} = \mathbf{U}^H(\mathbf{Hx} + \mathbf{n}) = \mathbf{U}^H(\mathbf{U\Sigma V}^H\mathbf{V}\tilde{\mathbf{x}} + \mathbf{n}) = \mathbf{\Sigma}\tilde{\mathbf{x}} + \mathbf{U}^H\mathbf{n} = \mathbf{\Sigma}\tilde{\mathbf{x}} + \tilde{\mathbf{n}},$$
where $\tilde{\mathbf{n}} = \mathbf{U}^H\mathbf{n}$ and $\mathbf{\Sigma}$ is the diagonal matrix of singular values of H with $\sigma_i$ on the ith diagonal.
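A small numpy sketch of this decomposition on a random channel (the 4 × 4 dimensions are an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(3)
Mr, Mt = 4, 4
H = (rng.normal(size=(Mr, Mt)) + 1j * rng.normal(size=(Mr, Mt))) / np.sqrt(2)

U, s, Vh = np.linalg.svd(H)                  # H = U @ diag(s) @ Vh

x_tilde = rng.normal(size=Mt) + 1j * rng.normal(size=Mt)
n = (rng.normal(size=Mr) + 1j * rng.normal(size=Mr)) / np.sqrt(2)

x = Vh.conj().T @ x_tilde                    # transmit precoding: x = V x~
y = H @ x + n                                # channel
y_tilde = U.conj().T @ y                     # receiver shaping: y~ = U^H y

# Each stream now sees an independent scalar channel: y~ = Sigma x~ + n~.
print(np.allclose(y_tilde, s * x_tilde + U.conj().T @ n))   # True
```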



Parallel Decomposition of the MIMO Channel
This parallel decomposition is shown in the following figure:

Figure 10.3: Parallel Decomposition of the MIMO Channel.


MIMO Channel Capacity
The capacity of a MIMO channel is an extension of the mutual information formula for a SISO channel given in the previous lecture to a matrix channel.
Specifically, the capacity is given in terms of the mutual information between the channel input vector x and output vector y as
$$C = \max_{p(\mathbf{x})} I(\mathbf{X};\mathbf{Y}) = \max_{p(\mathbf{x})}\big[H(\mathbf{Y}) - H(\mathbf{Y}|\mathbf{X})\big]. \qquad (10.5)$$
The definition of entropy yields that $H(\mathbf{Y}|\mathbf{X}) = H(\mathbf{N})$, the entropy in the noise. Since this noise n has fixed entropy independent of the channel input, maximizing mutual information is equivalent to maximizing the entropy in y.
The mutual information of y depends on its covariance matrix, which for the narrowband MIMO model is given by
$$\mathbf{R_y} = E[\mathbf{yy}^H] = \mathbf{HR_xH}^H + \mathbf{I}_{M_r}, \qquad (10.6)$$


MIMO Channel Capacity
where $\mathbf{R_x}$ is the covariance of the MIMO channel input.

Consider vector Gaussian input and noise. The mutual information between x and y for a given channel gain matrix can be found as
$$I(\mathbf{x};\mathbf{y}|\mathbf{H}) = B\log_2\det(\pi e\mathbf{R_y}) - B\log_2\det(\pi e\mathbf{R_n}),$$
where $\mathbf{R_n} = \mathbf{I}_{M_r}$.
Using the results for $\mathbf{R_y}$ and $\mathbf{R_n}$, we have
$$I(\mathbf{x};\mathbf{y}|\mathbf{H}) = B\log_2\frac{\det(\mathbf{R_y})}{\det(\mathbf{R_n})} = B\log_2\det\big(\mathbf{I}_{M_r} + \mathbf{HR_xH}^H\big),$$
where $\det(\mathbf{A})$ denotes the determinant of the matrix A.


MIMO Channel Capacity
Thus, the mutual information for input set X and output Y for a constant channel gain matrix H can be shown to be
$$I(\mathbf{X};\mathbf{Y}) = B\log_2\det\big(\mathbf{I}_{M_r} + \mathbf{HR_xH}^H\big). \qquad (10.7)$$
The MIMO capacity is achieved by maximizing the mutual information over all input covariance matrices satisfying the power constraint:
$$C = \max_{\mathbf{R_x}:\,\mathrm{Tr}(\mathbf{R_x})=\rho} B\log_2\det\big(\mathbf{I}_{M_r} + \mathbf{HR_xH}^H\big). \qquad (10.8)$$
Now let us consider the case of Channel Known at Transmitter.


MIMO Channel Capacity
Substituting the matrix SVD of H into C and using properties of unitary matrices, we get the MIMO capacity with CSIT and CSIR as
$$C = \max_{\rho_i:\,\sum_i\rho_i\le\rho}\; \sum_i B\log_2(1 + \sigma_i^2\rho_i). \qquad (10.9)$$
Since $\rho = P/\sigma_n^2$, the capacity (10.9) can also be expressed in terms of the power allocation $P_i$ to the ith parallel channel as
$$C = \max_{P_i:\,\sum_i P_i\le P}\; \sum_i B\log_2\Big(1 + \frac{P_i\gamma_i}{P}\Big), \qquad (10.10)$$
where $\rho_i = P_i/\sigma_n^2$ and $\gamma_i = \sigma_i^2 P/\sigma_n^2$ is the SNR associated with the ith channel at full power.
Solving the optimization leads to a water-filling power allocation for the MIMO channel:
$$\frac{P_i}{P} = \begin{cases} \dfrac{1}{\gamma_0} - \dfrac{1}{\gamma_i}, & \gamma_i \ge \gamma_0 \\[4pt] 0, & \gamma_i < \gamma_0 \end{cases} \qquad (10.11)$$


MIMO Channel Capacity
The resulting capacity is then
$$C = \sum_{i:\,\gamma_i \ge \gamma_0} B\log_2(\gamma_i/\gamma_0). \qquad (10.12)$$
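A sketch of this water-filling allocation over the eigenchannels of one random channel realization (dimensions and SNR are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(4)
Mr, Mt, B, rho = 4, 4, 1.0, 10.0          # B = 1 Hz: capacity in bits/s/Hz

H = (rng.normal(size=(Mr, Mt)) + 1j * rng.normal(size=(Mr, Mt))) / np.sqrt(2)
g = np.linalg.svd(H, compute_uv=False) ** 2 * rho   # gamma_i at full power

# Water-filling: find gamma_0 with sum over active i of (1/gamma_0 - 1/gamma_i) = 1.
strongest = np.sort(g)[::-1]
for k in range(len(g), 0, -1):
    gk = strongest[:k]                    # k strongest eigenchannels
    gamma0 = k / (1 + np.sum(1 / gk))
    if gamma0 <= gk.min():                # consistent cutoff found
        break

C = B * np.sum(np.log2(gk / gamma0))      # eq. (10.12)
print(f"gamma_0 = {gamma0:.3f}, C = {C:.2f} bits/s/Hz")
```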

Now consider the case of Channel Unknown at Transmitter (Uniform Power Allocation).
Suppose now that the receiver knows the channel but the transmitter does not. Without channel information, the transmitter cannot optimize its power allocation or input covariance structure across antennas.
If the distribution of H follows the ZMSW channel gain model, there is no bias in terms of the mean or covariance of H.
Thus, it seems intuitive that the best strategy should be to allocate equal power to each transmit antenna, resulting in an input covariance matrix equal to the scaled identity matrix:
$$\mathbf{R_x} = \frac{\rho}{M_t}\,\mathbf{I}_{M_t}.$$
MIMO Channel Capacity
It is shown that under these assumptions this input covariance matrix indeed maximizes the mutual information of the channel.
For an $M_t$-transmit, $M_r$-receive antenna system, this yields mutual information given by
$$I = B\log_2\det\Big(\mathbf{I}_{M_r} + \frac{\rho}{M_t}\mathbf{HH}^H\Big). \qquad (10.13)$$
Using the SVD of H, we can express this as
$$I = \sum_{i=1}^{R_H} B\log_2\Big(1 + \frac{\gamma_i}{M_t}\Big),$$
where $\gamma_i = \sigma_i^2\rho = \sigma_i^2 P/\sigma_n^2$ and $R_H$ is the number of nonzero singular values of H.
The mutual information of the MIMO channel (10.13) depends on the specific realization of the matrix H, in particular its singular values $\{\sigma_i\}$.
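A sketch evaluating (10.13) both ways, via the determinant and via the singular values, for one channel realization:

```python
import numpy as np

rng = np.random.default_rng(5)
Mr, Mt, B, rho = 4, 4, 1.0, 10.0

H = (rng.normal(size=(Mr, Mt)) + 1j * rng.normal(size=(Mr, Mt))) / np.sqrt(2)

# Determinant form of (10.13).
I_det = B * np.log2(np.linalg.det(np.eye(Mr) + (rho / Mt) * H @ H.conj().T).real)

# Equivalent sum over the squared singular values of H.
gamma = np.linalg.svd(H, compute_uv=False) ** 2 * rho
I_svd = B * np.sum(np.log2(1 + gamma / Mt))

print(f"{I_det:.4f} vs {I_svd:.4f} bits/s/Hz")   # identical
```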



MIMO Channel Capacity
In capacity with outage, the transmitter fixes a transmission rate C, and the outage probability associated with C is the probability that the transmitted data will not be received correctly or, equivalently, the probability that the channel H has mutual information less than C.
This probability is given by
$$P_{out} = p\Big(\mathbf{H} : B\log_2\det\Big(\mathbf{I}_{M_r} + \frac{\rho}{M_t}\mathbf{HH}^H\Big) < C\Big). \qquad (10.14)$$
Note that for fixed $M_r$, under the ZMSW (Zero-Mean Spatially White) model the law of large numbers implies that
$$\lim_{M_t\to\infty} \frac{1}{M_t}\mathbf{HH}^H = \mathbf{I}_{M_r}. \qquad (10.15)$$
Substituting this into (10.13) yields that the mutual information in the asymptotic limit of large $M_t$ becomes a constant equal to $C = M_r B\log_2(1+\rho)$. (The maximum rate can be achieved by massive MIMO!)
MIMO Channel Capacity
We can make two important observations from the results in (10.14) and (10.15):
As SNR grows large, capacity also grows linearly with $M = \min\{M_t, M_r\}$ for any $M_t$ and $M_r$.
At very low SNRs, transmit antennas are not beneficial: capacity scales only with the number of receive antennas, independent of the number of transmit antennas.

Fading Channels
Channel Known at Transmitter: Water-Filling
With a short-term power constraint, where the transmit power for each channel realization is $\rho$, the ergodic capacity is
$$C = E_{\mathbf{H}}\Big[\max_{\mathbf{R_x}:\,\mathrm{Tr}(\mathbf{R_x})=\rho} B\log_2\det\big(\mathbf{I}_{M_r} + \mathbf{HR_xH}^H\big)\Big]. \qquad (10.16)$$


MIMO Channel Capacity
A less restrictive constraint is a long-term power constraint, where we can use different powers for different channel realizations subject to the average power constraint over all channel realizations.
The ergodic capacity in this case is
$$C = \max_{\rho(\mathbf{H}):\,E_{\mathbf{H}}[\rho(\mathbf{H})]=\rho}\; E_{\mathbf{H}}\Big[\max_{\mathbf{R_x}:\,\mathrm{Tr}(\mathbf{R_x})=\rho(\mathbf{H})} B\log_2\det\big(\mathbf{I}_{M_r} + \mathbf{HR_xH}^H\big)\Big]. \qquad (10.17)$$

Channel Unknown at Transmitter: Ergodic Capacity and Capacity with Outage.
Consider now a time-varying channel with random matrix H known at the receiver but not the transmitter. The transmitter assumes a ZMSW distribution for H.
The two relevant capacity definitions in this case are ergodic capacity and capacity with outage.
Ergodic capacity defines the maximum rate, averaged over all channel realizations, that can be transmitted over the channel for a transmission strategy based only on the distribution of H.
MIMO Channel Capacity
This leads to the transmitter optimization problem, i.e., finding the optimum input covariance matrix to maximize ergodic capacity subject to the transmit power constraint.
Mathematically, the problem is to characterize the optimum $\mathbf{R_x}$ to maximize
$$E_{\mathbf{H}}\big[B\log_2\det\big(\mathbf{I}_{M_r} + \mathbf{HR_xH}^H\big)\big], \qquad (10.18)$$
where the expectation is with respect to the distribution on the channel matrix H, which for the ZMSW model is i.i.d. zero-mean, circularly symmetric, unit variance.
As in the case of scalar channels, the optimum input covariance matrix that maximizes ergodic capacity for the ZMSW model is the scaled identity matrix $\frac{\rho}{M_t}\mathbf{I}_{M_t}$. Thus the ergodic capacity is given by
$$C = E_{\mathbf{H}}\Big[B\log_2\det\Big(\mathbf{I}_{M_r} + \frac{\rho}{M_t}\mathbf{HH}^H\Big)\Big]. \qquad (10.19)$$
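A Monte Carlo sketch of (10.19) for an i.i.d. complex Gaussian (ZMSW) channel (trial count and SNR points are arbitrary choices):

```python
import numpy as np

def ergodic_capacity(Mt: int, Mr: int, rho_db: float, trials: int = 2000) -> float:
    """Average of log2 det(I + rho/Mt H H^H) over ZMSW realizations (B = 1 Hz)."""
    rng = np.random.default_rng(6)
    rho = 10 ** (rho_db / 10)
    total = 0.0
    for _ in range(trials):
        H = (rng.normal(size=(Mr, Mt)) + 1j * rng.normal(size=(Mr, Mt))) / np.sqrt(2)
        total += np.log2(np.linalg.det(np.eye(Mr) + (rho / Mt) * H @ H.conj().T).real)
    return total / trials

for snr_db in (0, 10, 20):
    print(f"{snr_db:2d} dB: {ergodic_capacity(4, 4, snr_db):.2f} bits/s/Hz")
```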



MIMO Channel Capacity
The ergodic capacity of a 4 × 4 MIMO system with i.i.d. complex Gaussian channel gains is shown in Figure 10.4.

Figure 10.4: Ergodic Capacity of a 4 × 4 MIMO Channel.


MIMO Channel Capacity
Capacity with outage is defined similarly to the definition for static channels described previously, although now capacity with outage applies to a slowly varying channel where the channel matrix H is constant over a relatively long transmission time and then changes to a new value.
As in the static channel case, the channel realization and corresponding channel capacity are not known at the transmitter, yet the transmitter must still fix a transmission rate to send data over the channel.
For any choice of this rate C, there will be an outage probability associated
with C, which defines the probability that the transmitted data will not be
received correctly.
The outage capacity can sometimes be improved by not allocating power to
one or more of the transmit antennas, especially when the outage
probability is high. This is because outage capacity depends on the tail of
the probability distribution.
With fewer antennas, less averaging takes place and the spread of the tail
increases.



MIMO Channel Capacity
The capacity with outage of a 4 × 4 MIMO system with i.i.d. complex Gaussian channel gains is shown in Figure 10.5.

Figure 10.5: Capacity with Outage of a 4 × 4 MIMO Channel.


MIMO Channel Capacity

Figure 10.6: Outage Probability Distribution of a 4 × 4 MIMO Channel.
