
Chapter 4

Signal Design Trade-Offs


4.1 Introduction
In Chapters 2 and 3 we have focused on the receiver, assuming that the signal set was given to us. In this chapter we introduce the signal design. We have three main goals in mind: (i) introduce the design parameters we care mostly about; (ii) sharpen our intuition about the role played by the dimensions of the signal space as we increase the number of bits to be transmitted; and (iii) define the signal design strategy to be pursued in the next two chapters. We will also discuss isometries that may be applied to the signal set to vary some design parameters while keeping other parameters fixed, notably the error probability. The continuous-time AWGN channel model is assumed.
4.2 Design Parameters
The problem of choosing a convenient signal constellation is not as clean-cut as the receiver design problem. The reason is that the receiver design problem has a clear objective, to minimize the error probability, and one solution, namely the MAP rule. In contrast, when we choose a signal constellation we make trade-offs among conflicting objectives. The design parameters and the performance measures we are mostly concerned with are:

• The cardinality $m$ of the message set $\mathcal{H}$. Since in most cases the message consists of bits, typically we choose $m$ to be a power of 2. Whether $m$ is a power of 2 or not, we say that a message is worth $k = \log_2 m$ bits.

• The implementation cost and computational complexity. To keep the discussion as simple as possible, we continue to assume that the cost is determined by the number of matched filters in the $n$-tuple former and the complexity is that of the decoder.
• The message error probability $P_e$ and the bit error rate $P_b$. The former, also called block error probability, is the error probability we have considered so far. The latter can be computed, in principle, once we specify the mapping between the set of $k$-bit sequences and the set of messages. Until then, the only statement we can make about $P_b$ is that $\frac{P_e}{k} \le P_b \le P_e$. The left bound applies with equality if a message error always translates into 1-out-of-$k$ bits being incorrectly reproduced. The right is an equality if all bits are incorrectly reproduced each time that there is a message error. Whether we care more about $P_e$ or $P_b$ depends on the application. If we send a file that contains a computer program, every single bit of the file has to be received correctly in order for the transmission to be successful. In this case we clearly want $P_e$ to be small. However, there are sources that are more tolerant to occasional errors. This is the case of a digitized voice signal. For voice it is sufficient to have $P_b$ small. To appreciate the difference between $P_e$ and $P_b$, consider the hypothetical situation in which one message corresponds to $k = 10^3$ bits and 1 bit of every message is incorrectly reconstructed. Then the message error probability is 1 (every message is incorrectly reconstructed), whereas the bit error probability is $10^{-3}$.
• The average signal energy $E$ and the average energy per bit $E_b$. We know already how to compute the former. The latter is $E_b = \frac{E}{k}$. We are typically willing to double the energy to send twice as many bits. In this case we fix $E_b$ and let $E$ be a function of $k$.
• The transmission rate.

• The bandwidth. Several definitions will be introduced.

• Scalability, in the sense that with the same sender/receiver pair we ought to be able to communicate bit sequences of any length.
Clearly we desire scalability, high transmission rate, little energy spent per bit, small bandwidth, small error probability (message or bit, depending on the application), low cost and low complexity. As already mentioned, some of these goals conflict. For instance, starting from a given codebook we can trade energy for error probability by scaling down all the codewords by some factor. In so doing the average energy will decrease and so will the distance between codewords, which implies that the error probability will increase. Alternatively, once we have reduced the energy by scaling down the codewords, we can add new codewords at the periphery of the codeword constellation, choosing their location in such a way that the new codewords do not further increase the error probability. We keep doing this until the average energy has returned to the original value. In so doing we trade bit rate for error probability. By removing codewords at the periphery of the codeword constellation we can trade bit rate for energy. All these manipulations pertain to the encoder. By acting inside the waveform former, we can boost the bit rate at the expense of bandwidth. For instance, we can substitute $\psi_i(t)$ with $\tilde{\psi}_i(t) = \sqrt{b}\,\psi_i(bt)$ for some $b > 1$. This scales the duration of all signals by $1/b$ with two consequences. First, the bit rate is multiplied by $b$. (It takes a fraction $1/b$ of the time to send the same number of bits.) Second, the signal's bandwidth expands by $b$. (The scaling property of the Fourier transform asserts that the Fourier transform of $\psi(bt)$ is $\frac{1}{|b|}\psi_{\mathcal{F}}(\frac{f}{b})$.) These examples are meant to show that there is considerable margin for trading among bit rate, bandwidth, error probability, and average energy.
4.3 Bandwidth Definitions
The fact that all man-made signals have a finite duration implies that their Fourier transform has infinite support. This is slightly annoying since it would be nice if we could define the duration $T$ and the bandwidth $W$ of a signal as the width of the signal's time-domain and frequency-domain support, respectively.

There are many reasonable definitions of bandwidth. To focus on the essential, in this section we assume that $h(t)$ is a function such that $|h_{\mathcal{F}}(f)|$ is even. This is always the case if $h(t)$ is real-valued. We also assume that $|h_{\mathcal{F}}(f)|$ has non-negligible energy around $f = 0$, i.e., that $h(t)$ is baseband (as opposed to passband). The distinction between a baseband and a passband function may seem fussy at this point, but for all practical pulses and signals it is very clear, because $|h_{\mathcal{F}}(f)|$ has either most of its energy concentrated in an interval around $f = 0$ or most of its energy around $\pm f_c$ for some large carrier frequency $f_c$. For now we focus on baseband pulses and signals. Chapter 7 is dedicated to the passband counterparts.

The pulse $h(t)$ represents either the impulse response of a filter, in which case its bandwidth is the filter's bandwidth, or a signal we use to communicate. For each bandwidth definition given below, there is an example in Exercise 1.
Absolute Bandwidth: It is the width of the support of $h_{\mathcal{F}}(f)$. So if the support of $h_{\mathcal{F}}(f)$ is the interval $[-\frac{W}{2}, \frac{W}{2}]$, then the absolute bandwidth is $W$. As mentioned in the previous paragraph, for signals that we use in practice the absolute bandwidth is infinite. However, in examples we sometimes use signals that do have a finite absolute bandwidth.
3-dB Bandwidth: The 3-dB bandwidth, if it exists, is the width of the interval $I = [-\frac{W}{2}, \frac{W}{2}]$ in the interior of which $|h_{\mathcal{F}}(f)|^2 \ge \frac{|h_{\mathcal{F}}(0)|^2}{2}$ and outside of which $|h_{\mathcal{F}}(f)|^2 < \frac{|h_{\mathcal{F}}(0)|^2}{2}$. In other words, outside $I$ the value of $|h_{\mathcal{F}}(f)|$ has dropped by more than 3 dB from its value at $f = 0$.
$\epsilon$-Bandwidth: For any number $\epsilon \in (0, 1)$, the $\epsilon$-bandwidth is the smallest number $W$ such that

$$\int_{-\frac{W}{2}}^{\frac{W}{2}} |h_{\mathcal{F}}(f)|^2\, df = (1 - \epsilon) \int_{-\infty}^{\infty} |h_{\mathcal{F}}(f)|^2\, df.$$

It is the size of the interval $[-\frac{W}{2}, \frac{W}{2}]$ that contains a fraction $(1 - \epsilon)$ of the signal's energy. Reasonable values for $\epsilon$ are $\epsilon = 0.1$ and $\epsilon = 0.01$. (Recall that by Parseval's relationship, the integral on the right equals the squared norm $\|h\|^2$.)
First Zero-Crossing Bandwidth: The first zero-crossing bandwidth, if it exists, is that $W$ for which $|h_{\mathcal{F}}(f)|$ is positive in the interior of $I = [-\frac{W}{2}, \frac{W}{2}]$ and vanishes on the boundary of $I$.
Equivalent Noise Bandwidth: It is $W$ if

$$\int_{-\infty}^{\infty} |h_{\mathcal{F}}(f)|^2\, df = W |h_{\mathcal{F}}(0)|^2.$$

This bandwidth name comes from the fact that if we feed a filter of impulse response $h(t)$ with white noise, and we feed with the same input an ideal lowpass filter of frequency response $|h_{\mathcal{F}}(0)|\, 1_{[-\frac{W}{2}, \frac{W}{2}]}(f)$, then the output power is the same in both situations.
Root-Mean Square (RMS) Bandwidth: It is defined as

$$W = \left( \frac{\int_{-\infty}^{\infty} f^2 |h_{\mathcal{F}}(f)|^2\, df}{\int_{-\infty}^{\infty} |h_{\mathcal{F}}(f)|^2\, df} \right)^{\frac{1}{2}}.$$

To understand this definition, notice that the function $g(f) := \frac{|h_{\mathcal{F}}(f)|^2}{\int_{-\infty}^{\infty} |h_{\mathcal{F}}(f)|^2\, df}$ is non-negative, even, and integrates to 1. Hence it is the density of some zero-mean random variable and $W = \left(\int f^2 g(f)\, df\right)^{1/2}$ is the standard deviation of that random variable.
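As a quick sanity check of this random-variable interpretation, the sketch below (ours, not from the text) computes the RMS bandwidth of the Gaussian pulse $h(t) = e^{-\pi t^2}$ used in Exercise 1(f), whose Fourier transform is $e^{-\pi f^2}$; the standard deviation of the associated density is $1/(2\sqrt{\pi})$.

```python
import numpy as np

# RMS bandwidth of the Gaussian pulse h(t) = exp(-pi t^2), whose Fourier
# transform is exp(-pi f^2); the analytical answer is 1/(2 sqrt(pi)) ~ 0.2821.
f = np.linspace(-10, 10, 200001)
hF_sq = np.exp(-2 * np.pi * f ** 2)                  # |h_F(f)|^2
W = np.sqrt(np.sum(f ** 2 * hF_sq) / np.sum(hF_sq))  # grid spacing cancels in the ratio
print(W, 1 / (2 * np.sqrt(np.pi)))                   # the two values agree
```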
The reader should be aware that some authors define the bandwidth by considering only positive frequencies. Since we have assumed that $|h_{\mathcal{F}}(f)|$ is an even function, the value they obtain is exactly half the value obtained by considering the entire frequency axis. The definition we have used easily extends to the cases where $|h_{\mathcal{F}}(f)|$ is not even, and it is more useful in answering some fundamental questions; see in particular Section 4.7. The other, single-sided, definition makes sense for real-valued functions and is somewhat more useful for passband signals (see Chapter 7). We use them both, reserving the letter $B$ for single-sided bandwidths of real-valued signals.
4.4 Isometric Transformations Applied to the Codebook
If the channel is AWGN and the receiver implements a MAP rule, the error probability is completely determined by the codebook $\mathcal{C} = \{c_0, \ldots, c_{m-1}\}$. The purpose of this section is to identify transformations of the codebook that do not affect the error probability. What we do generalizes to complex-valued codebooks and complex-valued noise. However, since we are not ready to discuss complex-valued random variables, for the moment we assume that the codebook and the noise are real-valued.

From the geometrical intuition gained in Chapter 2, it should be clear to the reader that the probability of error remains the same if a given codebook and the corresponding decoding regions are translated by the same $n$-tuple $b \in \mathbb{R}^n$.

A translation is a particular instance of an isometry. An isometry is a distance-preserving transformation. Formally, given an inner product space $\mathcal{V}$, $a: \mathcal{V} \to \mathcal{V}$ is an isometry if and only if for any $\alpha \in \mathcal{V}$ and $\beta \in \mathcal{V}$, the distance between $\alpha$ and $\beta$ equals that between $a(\alpha)$ and $a(\beta)$. The following example gives three isometries applied to a codebook.
Example 61. Figure 4.1 shows an original codebook $\mathcal{C} = \{c_0, c_1, c_2, c_3\}$ and three variations obtained by applying to $\mathcal{C}$ a reflection, a rotation, and a translation, respectively. In each case the isometry $a: \mathbb{R}^n \to \mathbb{R}^n$ sends $c_i$ to $\tilde{c}_i = a(c_i)$.

[Figure 4.1: Isometries. (a) Original codebook $\mathcal{C}$; (b) reflected codebook; (c) rotated codebook; (d) translated codebook.]
Next we give a formal proof that if we apply an isometry to a codebook and its decoding regions, then the error probability associated to the new codebook and the new regions is the same as that of the original codebook and original regions. Let

$$g(\alpha) = \frac{1}{(2\pi\sigma^2)^{n/2}} \exp\left(-\frac{\alpha^2}{2\sigma^2}\right), \quad \alpha \in \mathbb{R},$$

so that for $Z \sim \mathcal{N}(0, \sigma^2 I_n)$ we can write $f_Z(z) = g(\|z\|)$. Then for any codebook $\mathcal{C} = \{c_0, \ldots, c_{m-1}\}$, decoding regions $\mathcal{R}_0, \ldots, \mathcal{R}_{m-1}$, and isometry $a: \mathbb{R}^n \to \mathbb{R}^n$ we have

$$\begin{aligned}
P_c(i) &= \Pr\{Y \in \mathcal{R}_i \mid \text{codeword } c_i \text{ is transmitted}\} \\
&= \int_{y \in \mathcal{R}_i} g(\|y - c_i\|)\, dy \\
&\stackrel{(a)}{=} \int_{y \in \mathcal{R}_i} g(\|a(y) - a(c_i)\|)\, dy \\
&\stackrel{(b)}{=} \int_{a(y) \in a(\mathcal{R}_i)} g(\|a(y) - a(c_i)\|)\, dy \\
&\stackrel{(c)}{=} \int_{\alpha \in a(\mathcal{R}_i)} g(\|\alpha - a(c_i)\|)\, d\alpha \\
&= \Pr\{Y \in a(\mathcal{R}_i) \mid \text{codeword } a(c_i) \text{ is transmitted}\},
\end{aligned}$$

where in (a) we use the distance-preserving property of an isometry, in (b) we use the fact that $y \in \mathcal{R}_i$ if and only if $a(y) \in a(\mathcal{R}_i)$, and in (c) we make the change of variable $\alpha = a(y)$ and use the fact that the Jacobian of an isometry is 1. The last line is the probability of decoding correctly when the transmitter sends $a(c_i)$ and the corresponding decoding region is $a(\mathcal{R}_i)$.
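The invariance can also be checked by simulation. The sketch below is ours (illustrative parameters; `error_rate` is a hypothetical helper, and minimum-distance ML decoding stands in for the decoding regions): it estimates the error probability of a four-point codebook before and after an arbitrary rotation.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma, trials = 1.0, 200_000

C = np.array([[1.0, 1.0], [-1.0, 1.0], [-1.0, -1.0], [1.0, -1.0]])  # QPSK-like codebook
theta = 0.6                                    # an arbitrary rotation angle
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

def error_rate(codebook):
    # ML decoding for equiprobable codewords: pick the closest codeword.
    i = rng.integers(len(codebook), size=trials)
    y = codebook[i] + sigma * rng.normal(size=(trials, 2))
    d = ((y[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return np.mean(d.argmin(axis=1) != i)

print(error_rate(C), error_rate(C @ R.T))      # two nearly identical estimates
```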
One can show that all isometries in $\mathbb{R}^n$ are obtained from the composition of translations, rotations, and reflections. If we apply a rotation or a reflection to an $n$-tuple, we do not change its norm. Hence reflections and rotations applied to a signal set do not change the average energy, but translations generally do. In the next section, we determine the translation that minimizes the average energy.
4.5 The Energy-Minimizing Translation
We keep assuming that vectors, scalars, and random variables are defined over the reals. Generalization to the complex-valued counterparts is straightforward, but for this we first need to introduce complex-valued random variables (Appendix 7.B).

Let $\tilde{Y}$ be a zero-mean random vector in $\mathbb{R}^n$. For any $b \in \mathbb{R}^n$,

$$E\|\tilde{Y} + b\|^2 = E\|\tilde{Y}\|^2 + \|b\|^2 + 2E\langle \tilde{Y}, b \rangle = E\|\tilde{Y}\|^2 + \|b\|^2 \ge E\|\tilde{Y}\|^2,$$

with equality if and only if $b = 0$. An arbitrary (not necessarily zero-mean) random vector $Y \in \mathbb{R}^n$ can be written as $Y = \tilde{Y} + m$ where $m = E[Y]$ and $\tilde{Y} = Y - m$ is zero-mean. The above inequality can then be restated as

$$E\|Y - b\|^2 \ge E\|\tilde{Y}\|^2,$$

with equality if and only if $b = m$.

We apply the above to a codebook $\mathcal{C} = \{c_0, \ldots, c_{m-1}\}$. If we let $Y$ be the random variable that takes value $c_i$ with probability $P_H(i)$, then we see that the average energy $E = E[\|Y\|^2]$ can be decreased by a translation if and only if the mean $m = E[Y] = \sum_i P_H(i) c_i$ is non-zero. If it is non-zero, then the translated constellation $\tilde{\mathcal{C}} = \{\tilde{c}_0, \ldots, \tilde{c}_{m-1}\}$, where $\tilde{c}_i = c_i - m$, will achieve the minimum energy among all possible translated versions of $\mathcal{C}$. The average energy associated to the translated constellation is

$$\tilde{E} = E - \|m\|^2.$$

If $\mathcal{S} = \{w_0(t), \ldots, w_{m-1}(t)\}$ is the set of waveforms linked to $\mathcal{C}$ via some orthonormal basis, then through the same basis $\tilde{c}_i$ will be associated to $\tilde{w}_i(t) = w_i(t) - m(t)$, where $m(t) = \sum_i P_H(i) w_i(t)$. An example follows.
Example 62. Let $w_0(t)$ and $w_1(t)$ be rectangular pulses with support $[0, T]$ and $[T, 2T]$, respectively, as shown on the left of Figure 4.2(a). Assuming that $P_H(0) = P_H(1) = \frac{1}{2}$, we calculate the average $m(t) = \frac{1}{2} w_0(t) + \frac{1}{2} w_1(t)$ and see that it is non-zero (center waveform). Hence we can save energy by using the new signal set defined by $\tilde{w}_i(t) = w_i(t) - m(t)$, $i = 0, 1$ (right). In Figure 4.2(b) we see the same idea, but acting on the codewords $c_0$ and $c_1$ obtained through the orthonormal basis $\psi_i(t) = \frac{w_{i-1}(t)}{\|w_{i-1}\|}$, $i = 1, 2$. As we see from the figures, $\tilde{w}_0(t)$ and $\tilde{w}_1(t)$ are antipodal signals. This is not a coincidence: after we remove the mean, the two signals become the negative of each other. This is best seen from the codeword viewpoint of Figure 4.2(b).

[Figure 4.2: Energy minimization by translation. (a) Waveform viewpoint: $w_0(t)$, $w_1(t)$, the mean $m(t)$, and the translated waveforms $\tilde{w}_0(t)$, $\tilde{w}_1(t)$. (b) Codeword viewpoint: $c_0$, $c_1$, and their translated counterparts.]
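In codeword coordinates, the computation of Example 62 amounts to subtracting the mean codeword. A minimal sketch (ours; the codeword values are the ones suggested by the example, taken with unit norm):

```python
import numpy as np

# Codewords of Example 62: c0 and c1 along orthogonal axes, equiprobable.
C = np.array([[1.0, 0.0], [0.0, 1.0]])
p = np.array([0.5, 0.5])

m = p @ C                                 # mean codeword m = sum_i P_H(i) c_i
C_translated = C - m                      # minimum-energy translate; antipodal here
E  = p @ (C ** 2).sum(axis=1)             # average energy before the translation
Et = p @ (C_translated ** 2).sum(axis=1)  # and after
print(E, Et, E - m @ m)                   # Et equals E - ||m||^2
```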
4.6 Isometric Transformations Applied to the Waveform Set
The definition of isometry is based on the notion of distance, which is defined in every inner product space: the distance between $\alpha \in \mathcal{V}$ and $\beta \in \mathcal{V}$ is the norm $\|\alpha - \beta\|$.

Let $\mathcal{V}$ be the inner product space spanned by a signal set $\mathcal{W} = \{w_0(t), \ldots, w_{m-1}(t)\}$ and let $a: \mathcal{V} \to \mathcal{V}$ be an isometry. If we apply this isometry to $\mathcal{W}$, we obtain a new signal set $\tilde{\mathcal{W}} = \{\tilde{w}_0(t), \ldots, \tilde{w}_{m-1}(t)\} \subset \mathcal{V}$. Let $\mathcal{B} = \{\psi_1(t), \ldots, \psi_n(t)\}$ be an orthonormal basis for $\mathcal{V}$ and let $\mathcal{C} = \{c_0, \ldots, c_{m-1}\}$ be the codebook associated to $\mathcal{W}$ via $\mathcal{B}$. Could we have obtained $\tilde{\mathcal{W}}$ by applying some isometry to the codebook $\mathcal{C}$?

Yes we could. Through $\mathcal{B}$, we obtain the codebook $\tilde{\mathcal{C}} = \{\tilde{c}_0, \ldots, \tilde{c}_{m-1}\}$ associated to $\tilde{\mathcal{W}}$. Through the composition that sends $c_i \to w_i(t) \to \tilde{w}_i(t) \to \tilde{c}_i$, we obtain a map from $\mathcal{C}$ to $\tilde{\mathcal{C}}$. It is easy to see that this map is an isometry of the kind considered in Section 4.4.

Are there other kinds of isometries applied to $\mathcal{W}$ that cannot be obtained simply by applying an isometry to $\mathcal{C}$? Yes there are. The easiest way to see this is to keep the codebook the same and substitute the original orthonormal basis $\mathcal{B} = \{\psi_1(t), \ldots, \psi_n(t)\}$ with some other orthonormal basis $\tilde{\mathcal{B}} = \{\tilde{\psi}_1(t), \ldots, \tilde{\psi}_n(t)\}$. In so doing, we obtain an isometry from $\mathcal{V}$ to some other subspace $\tilde{\mathcal{V}}$ of the set of finite-energy signals. (See Exercise 4 for more insight on this.)

The new signal set $\tilde{\mathcal{W}}$ might not bear any resemblance to $\mathcal{W}$, yet the resulting error probability will be identical since the codebook is unchanged. This sort of transformation is implicit in Example 60 of Section 3.4.
4.7 Time Bandwidth Product versus Dimensionality
If we substitute an orthonormal basis $\{\psi_1(t), \ldots, \psi_n(t)\}$ with the related orthonormal basis $\{\tilde{\psi}_1(t), \ldots, \tilde{\psi}_n(t)\}$ obtained via the relationship $\tilde{\psi}_i(t) = \sqrt{b}\,\psi_i(bt)$ for some $b \ge 1$, $i = 1, \ldots, n$, then all signals are time-compressed and frequency-expanded by the same factor $b$. Regardless of how we define the duration $T$ and the bandwidth $W$, this example suggests that we can increase one of the two at the expense of the other while keeping $WT$ constant. Is there a minimum to $WT$ for a fixed dimensionality?

There is indeed a fundamental relationship between the dimensionality $n$ and the product $WT$. Formulating this relationship precisely is tricky because the details depend on how we define duration and bandwidth. As it turns out, any reasonable definition leads to the same conclusion, specifically that the dimensionality of a set of time- and frequency-limited signals grows linearly with $WT$ when $WT$ is large. Perhaps the cleanest formulation is the one presented by David Slepian in his Shannon Lecture¹ [8]. Hereafter we summarize his main result without proof.
Formulating Slepian's view on the relationship between $n$ and $WT$ requires a worthwhile philosophical digression about mathematical models versus reality. When we say that a real-world signal $h(t)$ is time-limited to $(-\frac{T}{2}, \frac{T}{2})$, we can at best mean that if we measure it we cannot tell the difference between $h(t)$ and $h(t)\, 1_{[-\frac{T}{2}, \frac{T}{2}]}(t)$. Our limited ability to tell the difference between one signal and another could be due to a number of things, including the fact that the instruments we use to make measurements are made of wires that filter the signal and add noise.

To cope with the indistinguishability of certain signals, we say that two signals are indistinguishable at level $\epsilon$ if their difference has norm less than $\epsilon$.

We say that $h(t)$ is time-limited to the interval $(-\frac{T}{2}, \frac{T}{2})$ at level $\epsilon$ if $h(t)$ is indistinguishable from $1_{(-\frac{T}{2}, \frac{T}{2})}(t)\, h(t)$ at level $\epsilon$. If $T_0$ is the smallest such $T$, then we say that $h(t)$ is of duration $T_0$ at level $\epsilon$.
Example 63. Consider the signal $h(t) = e^{-|t|}$, $t \in \mathbb{R}$. The norm of $h(t) - h(t)\, 1_{(-T/2, T/2)}(t)$ is $\sqrt{e^{-T}} = e^{-T/2}$. Hence, for any fixed $\epsilon > 0$, $h(t)$ is of duration $T_0 = 2 \ln \frac{1}{\epsilon}$. For instance, for $\epsilon = 10^{-5}$, $T_0 \approx 23.03$.
Similarly, we say that $h(t)$ is frequency-limited to the interval $(-\frac{W}{2}, \frac{W}{2})$ at level $\epsilon$ if $h_{\mathcal{F}}(f)$ is indistinguishable from $1_{(-\frac{W}{2}, \frac{W}{2})}(f)\, h_{\mathcal{F}}(f)$ at level $\epsilon$. If $W_0$ is the smallest such $W$, then we say that $h(t)$ is a signal of bandwidth $W_0$ at level $\epsilon$.

A particularity of these definitions is that if we increase the strength of a signal, we could very well increase its duration and bandwidth. This is in distinct contradiction with the usual definitions, where duration and bandwidth are not affected by scaling. Another particularity is that all finite-energy signals are both frequency-limited to some finite bandwidth $W$ and time-limited to some finite duration $T$.
The dimensionality of a signal set² is modified accordingly. We say that a set $\mathcal{G}$ of signals has approximate dimension $n$ at level $\epsilon$ during the interval $(-\frac{T}{2}, \frac{T}{2})$ if there is a fixed collection of $n = n(T, \epsilon)$ signals, say $\{\psi_1(t), \ldots, \psi_n(t)\}$, such that over the interval $(-\frac{T}{2}, \frac{T}{2})$ every signal in $\mathcal{G}$ is indistinguishable at level $\epsilon$ from some signal of the form $\sum_{i=1}^{n} a_i \psi_i(t)$. That is, we require for each $h(t) \in \mathcal{G}$ that there exist $a_1, \ldots, a_n$ such that $1_{(-\frac{T}{2}, \frac{T}{2})}(t)\, h(t)$ and $1_{(-\frac{T}{2}, \frac{T}{2})}(t) \sum_{i=1}^{n} a_i \psi_i(t)$ are indistinguishable at level $\epsilon$. We further require that $n$ be the smallest such number. We can now state the main result.
Theorem 64. (Slepian) Let $\mathcal{G}_\epsilon$ be the set of all signals frequency-limited to $(-\frac{W}{2}, \frac{W}{2})$ and time-limited to $(-\frac{T}{2}, \frac{T}{2})$ at level $\epsilon$. Let $n(W, T, \epsilon, \eta)$ be the approximate dimension of $\mathcal{G}_\epsilon$ at level $\eta$ during the interval $(-\frac{T}{2}, \frac{T}{2})$. Then, for every $\eta > \epsilon$,

$$\lim_{T \to \infty} \frac{n(W, T, \epsilon, \eta)}{T} = W, \qquad \lim_{W \to \infty} \frac{n(W, T, \epsilon, \eta)}{W} = T.$$

¹The Shannon Award is the most prestigious award bestowed by the Information Theory Society. Slepian was the first after Shannon himself to receive the award. The recipient presents the Shannon Lecture at the next IEEE International Symposium on Information Theory.

²We do not require that this signal set be closed under addition and under multiplication by scalars, i.e., we do not require that it forms an inner product space.
So for large values, $n$ is essentially $WT$. As already mentioned, the linear asymptotic relationship between $n$ and $WT$ is not tied to how we define duration and bandwidth. In the following example, for every positive integer $n$ we construct a signal space for which $WT = n$.

Example 65. Let $\phi(t) = \frac{1}{\sqrt{T_s}} \operatorname{sinc}(t/T_s)$ and $\phi_{\mathcal{F}}(f) = \sqrt{T_s}\, 1_{[-1/(2T_s), 1/(2T_s)]}(f)$ be a normalized pulse and its Fourier transform. Let $\phi_l(t) = \phi(t - lT_s)$, $l = 1, \ldots, n$. The collection $\mathcal{B} = \{\phi_1(t), \ldots, \phi_n(t)\}$ forms an orthonormal set. One way to see that $\phi_i(t)$ and $\phi_j(t)$ are orthogonal to one another when $i \ne j$ is to go to the Fourier domain and use Parseval's relationship. (Another way is to invoke Theorem 79 of Chapter 5.) Let $\mathcal{G}$ be the space spanned by the orthonormal basis $\mathcal{B}$. It has dimension $n$ by construction. All signals of $\mathcal{G}$ are strictly frequency-limited to $(-W/2, W/2)$ for $W = 1/T_s$ and in a sense (that we could define) time-limited to $(0, T)$ for $T = nT_s$. For this example $WT = n$.
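Orthonormality of the shifted sincs can be verified numerically by computing the Gram matrix of the first few shifts on a fine grid. A sketch (ours; the truncation of the integration interval introduces a small residual error):

```python
import numpy as np

# Gram matrix of phi_l(t) = phi(t - l*Ts), phi(t) = sinc(t/Ts)/sqrt(Ts),
# computed on a fine grid over a long (truncated) interval.
Ts, n = 1.0, 4
t = np.linspace(-200.0, 200.0, 2_000_001)
dt = t[1] - t[0]
phis = [np.sinc((t - l * Ts) / Ts) / np.sqrt(Ts) for l in range(1, n + 1)]
G = np.array([[np.sum(p * q) * dt for q in phis] for p in phis])
print(np.round(G, 2))                  # approximately the identity matrix
```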
In light of the above example, the next example might be surprising at first.
Example 66. Let $\phi(t) = \frac{1}{\sqrt{T_s}}\, 1_{[-T_s/2, T_s/2]}(t)$ and $\phi_{\mathcal{F}}(f) = \sqrt{T_s} \operatorname{sinc}(T_s f)$ be the normalized rectangular pulse of duration $T_s$ and its Fourier transform. The collection $\{\phi_1(t), \ldots, \phi_n(t)\}$, where $\phi_l(t) = \phi(t - lT_s)$, $l = 1, \ldots, n$, forms an orthonormal set. (This is obvious from the time domain.) Let $\mathcal{G}$ be the space spanned by the orthonormal basis $\mathcal{B}$. It has dimension $n$ by construction. All signals of $\mathcal{G}$ are strictly time-limited to $(T_s/2, T + T_s/2)$ for $T = nT_s$ and in a sense (that we could define) frequency-limited to $(-W/2, W/2)$ for $W = 2/T_s$. For this example $WT = 2n$.

In both of the above two examples, we have constructed a signal set of dimensionality $n$ for which there is a linear relationship between $n$ and $WT$, where $W$ is the bandwidth (according to some reasonable definition) of every signal in the set and $T$ is the width of the smallest interval of time that contains (according to some reasonable definition) every signal in the set. What might be surprising is that in the second example $WT$ is twice the value taken in the first example.
The explanation lies in the fact that in the first example we shift the sinc by half its width. (By width we mean the main-lobe width.) So it is half the width of the sinc that matters for the growth of $WT$ for large $n$: every shift contributes 1 to the final $WT$ count. In contrast, in the second example it is the full width of the sinc main lobe that matters for the final count: every time we shift the rectangle by $T_s$, we contribute 2 to the final $WT$ count. We can summarize this by saying that the width of a sinc's main lobe is not representative of how tightly we can pack shifted sincs next to each other while keeping them orthogonal to one another. It is enough to shift the sinc by half that width to ensure orthogonality.
Note that, in this section, $n$ is the dimensionality of the signal space, which may or may not be related to a codeword length (also denoted by $n$). For instance, if we communicate using signals from a signal space of dimensionality $n$, we can choose the codeword length to be any integer smaller than or equal to $n$. For example, we could use a space of dimension $n = 10^7$ to send one thousand codewords of length $n = 10^4$ each. It is standard practice to use $n$ for both the dimensionality and the codeword length: which is which should always be clear from the context.
Theorem 64 establishes a fundamental relationship between the continuous-time and the discrete-time channel models. It says that if we are allowed to use a frequency interval of width $W$ Hz during $T$ seconds, then we can make approximately (asymptotically exactly) up to $WT$ uses of the equivalent discrete-time channel model. In other words, we get to use the discrete-time channel at a rate of up to $W$ channel uses per second. This is perhaps the most useful interpretation. It tells us that a discrete-time system that sends $k = \frac{\log_2 m}{n}$ bits per channel use, where $m$ is the codebook size and $n$ the codeword length, can be used to send $kW$ bits per second.

Theorem 64 tells us that time and frequency are on an equal footing in terms of providing the degrees of freedom exploited by the discrete-time channel. It is sometimes useful to think of $T$ and $W$ as the width and height of a rectangle in the time-frequency plane, as shown in Figure 4.3. We associate such a rectangle with the set of signals that have the corresponding time and frequency limitations according to, say, the criterion used in Theorem 64. Like a piece of land, such a rectangle represents a natural resource, and what matters for its exploitation is its area.
[Figure 4.3: Time-frequency plane. A rectangle of width $T$ (along the $t$ axis) and height $W$ (along the $f$ axis).]
While Theorem 64 assumes a signal set that can be represented in the time-frequency plane by a rectangle as in Figure 4.3, as we can see from Exercise 5, one can argue that the relationship between the dimensionality of a signal set and the area occupied by its representation in the time-frequency plane extends to any shape. So the shape does not matter in some sense, but in practice it does, as it affects the implementation. This explains why we typically communicate using signal sets that correspond to a rectangle in the time-frequency plane.
4.8 Building Intuition about Scalability: n versus k
The aim of this section is to sharpen our intuition by looking at a few examples of signal constellations that contain a large number $m$ of signals. We are interested in exploring what happens to the probability of error when the number $k = \log_2 m$ of bits carried by one signal becomes large. In doing so, we will let the energy grow linearly with $k$ so as to keep constant the energy per bit, which seems to be fair. The dimensionality of the signal space will be $n = 1$ for the first example (PAM) and $n = 2$ for the second (PSK). In the third example (bit-by-bit on a pulse train) $n$ will be equal to $k$. In the final example (block-orthogonal signaling) we will let $n = 2^k$. These examples will provide us with useful insight on the asymptotic relationship between the number of transmitted bits and the dimensionality of the signal space.

What matters for all these examples is the choice of codebook. There is no need, in principle, to specify the waveform signal $w_i(t)$ associated to a codeword $c_i$. Nevertheless, we will specify $w_i(t)$ to make the examples more realistic.
4.8.1 Keeping n Fixed as k Grows
Example 67. (PAM) In this example, we fix $n = 1$. Let $m$ be a positive even integer, $\mathcal{H} = \{0, 1, \ldots, m-1\}$ be the message set, and for each $i \in \mathcal{H}$ let $c_i$ be a distinct element of $\{\pm a, \pm 3a, \pm 5a, \ldots, \pm(m-1)a\}$, as shown in Figure 4.4.

[Figure 4.4: Codebook for PAM signaling; for $m = 6$ the codewords $c_0, \ldots, c_5$ sit at $-5a, -3a, -a, a, 3a, 5a$.]

The waveform associated to message $i$ is

$$w_i(t) = c_i \phi(t),$$

where $\phi(t)$ is an arbitrary unit-energy waveform. This signaling method is called Pulse Amplitude Modulation (PAM). (With $n = 1$ we do not have any choice other than modulating the amplitude of a pulse.) We are totally free to choose the pulse. For the sake of completeness we arbitrarily choose a rectangular pulse such as $\phi(t) = \frac{1}{\sqrt{T}}\, 1_{[-T/2, T/2]}(t)$.
We have already computed the error probability of PAM in Example 6 of Section 2.4.3, namely

$$P_e = 2\left(1 - \frac{1}{m}\right) Q\left(\frac{a}{\sigma}\right),$$

where $\sigma^2 = N_0/2$. As shown in Exercise 9, the average energy of the above constellation when signals are uniformly distributed is

$$E = \frac{a^2 (m^2 - 1)}{3}. \quad (4.1)$$

Equating to $E = k E_b$ and using the fact that $k = \log_2 m$ yields

$$a = \sqrt{\frac{3 E_b \log_2 m}{m^2 - 1}},$$

which goes to 0 as $m$ goes to $\infty$. Hence $P_e$ goes to 1 as $m$ goes to $\infty$.
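The following sketch (ours; the values of $E_b$ and $N_0$ are illustrative) evaluates this error probability as $k$ grows and shows it climbing toward 1.

```python
from math import erfc, sqrt

def Q(x):
    return 0.5 * erfc(x / sqrt(2.0))

# Error probability of m-PAM at fixed energy per bit Eb (Example 67).
Eb, N0 = 4.0, 2.0                       # illustrative values; sigma^2 = N0/2 = 1
for k in range(1, 11):
    m = 2 ** k
    a = sqrt(3 * Eb * k / (m ** 2 - 1)) # shrinks as the constellation grows
    Pe = 2 * (1 - 1 / m) * Q(a / sqrt(N0 / 2))
    print(k, Pe)                        # Pe climbs toward 1 as k grows
```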
Example 68. (PSK) In this example, we keep $n = 2$. We could start by defining the codebook as in the previous example and then choose an arbitrary orthonormal basis, but for the codebook we are after in this example, there is a natural signal set. Hence, we start from there. Let $T$ be a positive number and define the Phase-Shift-Keying constellation

$$w_i(t) = \sqrt{\frac{2E}{T}} \cos\left(2\pi f_0 t + \frac{2\pi}{m} i\right) 1_{[0,T]}(t), \quad i = 0, 1, \ldots, m-1. \quad (4.2)$$

We assume that $2 f_0 T$ is an integer, so that $\|w_i\|^2 = E$ for all $i$. (When $2 f_0 T$ is an integer, $w_i(t)$ has an integer number of half periods in a length-$T$ interval. This ensures that its norm is the same, regardless of the initial phase.) The signal space representation can be obtained by using the trigonometric identity $\cos(\alpha + \beta) = \cos(\alpha)\cos(\beta) - \sin(\alpha)\sin(\beta)$ to rewrite (4.2) as

$$w_i(t) = c_{i,1} \psi_1(t) + c_{i,2} \psi_2(t),$$

where

$$c_{i,1} = \sqrt{E} \cos\left(\frac{2\pi i}{m}\right), \quad \psi_1(t) = \sqrt{\frac{2}{T}} \cos(2\pi f_0 t)\, 1_{[0,T]}(t),$$

$$c_{i,2} = \sqrt{E} \sin\left(\frac{2\pi i}{m}\right), \quad \psi_2(t) = -\sqrt{\frac{2}{T}} \sin(2\pi f_0 t)\, 1_{[0,T]}(t).$$

Hence the codeword associated to $w_i(t)$ is

$$c_i = \sqrt{E} \begin{pmatrix} \cos(2\pi i/m) \\ \sin(2\pi i/m) \end{pmatrix}.$$
In Example 19, we have already studied this constellation and have derived the following lower bound to the error probability:

$$P_e \ge 2\, Q\left(\sqrt{\frac{E}{\sigma^2}} \sin\frac{\pi}{m}\right) \frac{m-1}{m},$$

where $\sigma^2 = \frac{N_0}{2}$ is the variance of the noise in each coordinate. If we let $E = k E_b$ grow linearly with $k$, the circle that contains the codewords has radius $\sqrt{E} = \sqrt{k E_b}$. Its circumference grows with $\sqrt{k}$, and the number $m = 2^k$ of points on this circle grows exponentially with $k$. Hence the minimum distance between points goes to zero (indeed exponentially fast). As a consequence, the argument of the $Q$ function that lower-bounds the probability of error for PSK goes to 0 and the probability of error goes to 1.
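A sketch of the same effect for PSK (ours; illustrative parameters), evaluating the lower bound as $k$ grows:

```python
from math import erfc, sqrt, pi, sin

def Q(x):
    return 0.5 * erfc(x / sqrt(2.0))

# Lower bound on the PSK error probability at fixed Eb (Example 68).
Eb, sigma2 = 4.0, 1.0                   # illustrative values
for k in range(1, 13):
    m = 2 ** k
    lb = 2 * Q(sqrt(k * Eb / sigma2) * sin(pi / m)) * (m - 1) / m
    print(k, lb)                        # the bound, and hence Pe, tends to 1
```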
As they are, the signal constellations used in the above two examples are not suitable for transmitting a large number $k$ of bits by letting the constellation size $m = 2^k$ grow exponentially with $k$. The problem with the above two examples is that, as $m$ grows, we are trying to pack an increasing number of points into a space that also grows in size, but not fast enough. The space becomes crowded as $m$ grows, meaning that the minimum distance becomes smaller and the probability of error increases.

We should not conclude that PAM and PSK are not useful for sending many bits. On the contrary, these signaling methods are used in conjunction with some variation of the technique described in the next example. The idea is to keep $m$ relatively small and reuse the constellation over and over along different dimensions. See the comment after the next example.
4.8.2 Growing n Linearly with k
Example 69. (Bit by Bit on a Pulse Train) The idea is to use a different dimension for each bit. Let $(b_{i,1}, b_{i,2}, \ldots, b_{i,k})$ be the binary sequence corresponding to message $i$. For mathematical convenience, we assume these bits to take value in $\{\pm 1\}$ rather than $\{0, 1\}$. We let the associated codeword $c_i = (c_{i,1}, c_{i,2}, \ldots, c_{i,k})^{\mathsf{T}}$ be defined by $c_{i,j} = b_{i,j} \sqrt{E_b}$, where $E_b = \frac{E}{k}$ is the energy per bit. The transmitted signal is

$$w_i(t) = \sum_{j=1}^{k} c_{i,j} \psi_j(t), \quad t \in \mathbb{R}. \quad (4.3)$$

As already mentioned, the choice of orthonormal basis is immaterial for the point we are making, but in practice some choices are more convenient than others. Specifically, if we choose $\psi_j(t) = \phi(t - jT_s)$ for some waveform $\phi(t)$ that fulfills $\langle \psi_i, \psi_j \rangle = \delta_{ij}$, then the $n$-tuple former is drastically simplified, because a single matched filter is sufficient to obtain all $n$ projections (see Subsection 3.4.1). For instance, we can choose $\phi(t) = \frac{1}{\sqrt{T_s}}\, 1_{[-T_s/2, T_s/2]}(t)$, which fulfills the mentioned constraints. We can now rewrite the waveform signal as

$$w_i(t) = \sum_{j=1}^{k} c_{i,j} \phi(t - jT_s), \quad t \in \mathbb{R}. \quad (4.4)$$

The above expression justifies the name bit-by-bit on a pulse train given to this signaling method. As we will see in Chapter 5, there are many other possible choices for $\phi(t)$.

[Figure 4.5: Codebooks for bit-by-bit on a pulse train signaling: (a) $k = 1$, antipodal points $c_0, c_1$; (b) $k = 2$, the four vertices $c_0, \ldots, c_3$ of a square; (c) $k = 3$, the eight vertices $c_0, \ldots, c_7$ of a cube.]
The codewords $c_0, \ldots, c_{m-1}$ are the vertices of a $k$-dimensional hypercube, as shown in Figure 4.5 for $k = 1, 2, 3$. For these values of $k$ we immediately see from the figure what the decoding regions of a ML decoder are, but let us proceed analytically and find a ML decoding rule that works for any $k$. The ML receiver decides that the constellation point used by the sender is the $c_i \in \{\pm\sqrt{E_b}\}^k$ that maximizes $\langle y, c_i \rangle - \frac{\|c_i\|^2}{2}$. Since $\|c_i\|^2$ is the same for all $i$, the previous expression is maximized by the $c_i$ that maximizes $\langle y, c_i \rangle = \sum_j y_j c_{i,j}$. The maximum is achieved for the $i$ for which $c_{i,j} = \mathrm{sign}(y_j) \sqrt{E_b}$, where

$$\mathrm{sign}(y) = \begin{cases} 1, & y \ge 0 \\ -1, & y < 0. \end{cases}$$

We now compute the error probability. As usual, we first compute the error probability conditioned on a specific $c_i$. From the codebook symmetry, we expect that the error probability will not depend on $i$. If $c_{i,j}$ is positive, $Y_j = \sqrt{E_b} + Z_j$ and a maximum likelihood decoder will make the correct decision if $Z_j > -\sqrt{E_b}$. (The statement is an "if and only if" if we ignore the zero-probability event that $Z_j = -\sqrt{E_b}$.) This happens with probability $1 - Q\left(\frac{\sqrt{E_b}}{\sigma}\right)$. Based on similar reasoning, it is straightforward to verify that the probability of error is the same if $c_{i,j}$ is negative. Now let $C_j$ be the event that the decoder makes the correct decision about the $j$th bit. The probability of $C_j$ depends only on $Z_j$. The independence of the noise components implies the independence of $C_1, C_2, \ldots, C_k$. Thus, the probability that all $k$ bits are decoded correctly when $H = i$ is

$$P_c(i) = \left[1 - Q\left(\frac{\sqrt{E_b}}{\sigma}\right)\right]^k,$$

which is the same for all $i$ and, therefore, it is also the average probability $P_c$ of deciding correctly.

Notice that $P_c \to 0$ as $k \to \infty$. However, the probability that any specific bit is decoded incorrectly is $Q\left(\frac{\sqrt{E_b}}{\sigma}\right)$, which does not depend on $k$.
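A short sketch (ours; illustrative parameters) makes the contrast concrete: the block error probability $1 - P_c$ tends to 1 while the per-bit error probability stays put.

```python
from math import erfc, sqrt

def Q(x):
    return 0.5 * erfc(x / sqrt(2.0))

# Bit-by-bit on a pulse train (Example 69): block errors accumulate with k,
# but the probability of error on any one bit is independent of k.
Eb, sigma = 4.0, 1.0                    # illustrative values
p_bit = Q(sqrt(Eb) / sigma)
for k in (1, 10, 100, 1000):
    Pc = (1 - p_bit) ** k
    print(k, p_bit, 1 - Pc)             # 1 - Pc -> 1, p_bit stays constant
```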
Although in this example we chose to transmit a single bit per dimension, we could have transmitted instead some small number of bits per dimension by means of one of the methods discussed in the previous two examples. In that case we would have called the signaling scheme symbol-by-symbol on a pulse train: this term will come up often in this text. In fact it is the basis for many digital communication systems.

The following question seems natural at this point: Is it possible to avoid that $P_c \to 0$ as $k \to \infty$? The next example gives us the answer.
4.8.3 Growing n Exponentially With k
Example 70. (Block-Orthogonal Signaling) Let $n = m = 2^k$, choose $n$ orthonormal waveforms $\psi_1(t), \ldots, \psi_n(t)$ and define $w_1(t), \ldots, w_m(t)$ to be

$$w_i(t) = \sqrt{E}\, \psi_i(t).$$

This is called block-orthogonal signaling. The name stems from the fact that in practice a block of $k$ bits is collected and then mapped into one of $m$ orthogonal waveforms. Notice that $\|w_i\|^2 = E$ for all $i$.

There are many ways to choose the $2^k$ waveforms $\psi_i(t)$. One way is to choose $\psi_i(t) = \phi(t - iT)$ for some normalized pulse $\phi(t)$ such that $\phi(t - iT)$ and $\phi(t - jT)$ are orthogonal when $i \ne j$. In this case the requirement for $\phi(t)$ is the same as that in bit-by-bit on a pulse train, but now we need $2^k$ rather than $k$ shifted versions, and we send one pulse rather than a train of $k$ weighted pulses. For obvious reasons the resulting signaling scheme is sometimes called pulse position modulation.

Another example is to choose

$$w_i(t) = \sqrt{\frac{2E}{T}} \cos(2\pi f_i t)\, 1_{[0,T]}(t). \quad (4.5)$$
This is called $m$-FSK ($m$-ary frequency shift keying). If we choose $f_i T = k_i/2$ for some integer $k_i$ such that $k_i \ne k_j$ if $i \ne j$, then

$$\langle w_i, w_j \rangle = \frac{2E}{T} \int_0^T \left[ \frac{1}{2} \cos[2\pi(f_i + f_j)t] + \frac{1}{2} \cos[2\pi(f_i - f_j)t] \right] dt = E \delta_{ij},$$

as desired.

[Figure 4.6: Codebooks for block-orthogonal signaling in two and three dimensions: the codewords $c_1, c_2$ (respectively $c_1, c_2, c_3$) lie along orthogonal axes.]
When $m \ge 3$, it is not easy to visualize the decoding regions. However, we can proceed analytically, using the fact that all coordinates of $c_i$ are 0 except for the $i$th, which has value $\sqrt{E}$. Hence,

$$\hat{H}_{\mathrm{ML}}(y) = \arg\max_i \left( \langle y, c_i \rangle - \frac{E}{2} \right) = \arg\max_i \langle y, c_i \rangle = \arg\max_i y_i.$$

To compute (or bound) the error probability, we start as usual with a fixed $c_i$. We choose $i = 1$. When $H = 1$,

$$Y_j = \begin{cases} \sqrt{E} + Z_j & \text{if } j = 1, \\ Z_j & \text{if } j \ne 1. \end{cases}$$
Then

$$P_c(1) = \Pr\{Y_1 > Z_2, Y_1 > Z_3, \ldots, Y_1 > Z_m \mid H = 1\}.$$

To evaluate the right side, we first condition on $Y_1 = \alpha$, where $\alpha \in \mathbb{R}$ is an arbitrary number:

$$\Pr\{\hat{H} = H \mid H = 1, Y_1 = \alpha\} = \Pr\{\alpha > Z_2, \ldots, \alpha > Z_m\} = \left[1 - Q\left(\frac{\alpha}{\sqrt{N_0/2}}\right)\right]^{m-1},$$

and then remove the conditioning on $Y_1$:

$$P_c(1) = \int_{-\infty}^{\infty} f_{Y_1|H}(\alpha|1) \left[1 - Q\left(\frac{\alpha}{\sqrt{N_0/2}}\right)\right]^{m-1} d\alpha = \int_{-\infty}^{\infty} \frac{1}{\sqrt{\pi N_0}} \exp\left(-\frac{(\alpha - \sqrt{E})^2}{N_0}\right) \left[1 - Q\left(\frac{\alpha}{\sqrt{N_0/2}}\right)\right]^{m-1} d\alpha,$$

where we use the fact that when $H = 1$, $Y_1 \sim \mathcal{N}(\sqrt{E}, \frac{N_0}{2})$. The above expression for $P_c(1)$ cannot be simplified further, but we can evaluate it numerically. By symmetry, $P_c(1) = P_c(i)$ for all $i$. Hence $P_c = P_c(1) = P_c(i)$.
The fact that the distance between any two distinct codewords is a constant simplifies the union bound considerably:

$$P_e = P_e(i) \le (m-1)\, Q\left(\frac{d}{2\sigma}\right) = (m-1)\, Q\left(\sqrt{\frac{E}{N_0}}\right) < 2^k \exp\left(-\frac{E}{2N_0}\right) = \exp\left(-k\left(\frac{E/k}{2N_0} - \ln 2\right)\right),$$

where we used $\sigma^2 = \frac{N_0}{2}$ and $d^2 = \|c_i - c_j\|^2 = \|c_i\|^2 + \|c_j\|^2 - 2\langle c_i, c_j \rangle = \|c_i\|^2 + \|c_j\|^2 = 2E$. By letting $E = E_b k$ we obtain

$$P_e < \exp\left(-k\left(\frac{E_b}{2N_0} - \ln 2\right)\right).$$

We see that $P_e \to 0$ as $k \to \infty$, provided that $\frac{E_b}{N_0} > 2 \ln 2$.
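The integral for $P_c(1)$ is straightforward to evaluate numerically. The sketch below is ours; the choice $E_b/N_0 = 3 > 2\ln 2$ is illustrative. It shows the message error probability $1 - P_c$ shrinking as $k$ grows.

```python
import numpy as np
from math import erfc, sqrt, pi

# Numerical evaluation of Pc(1) for block-orthogonal signaling (Example 70).
N0, Eb = 2.0, 6.0                        # illustrative choice: Eb/N0 = 3 > 2 ln 2
for k in (1, 2, 4, 8, 12):
    m, E = 2 ** k, Eb * k
    a = np.linspace(sqrt(E) - 12, sqrt(E) + 12, 20001)  # Y1 ~ N(sqrt(E), N0/2)
    da = a[1] - a[0]
    fY1 = np.exp(-(a - sqrt(E)) ** 2 / N0) / sqrt(pi * N0)
    Qa = 0.5 * np.vectorize(erfc)(a / sqrt(N0 / 2) / sqrt(2.0))  # Q(a / sqrt(N0/2))
    Pc = np.sum(fY1 * (1 - Qa) ** (m - 1)) * da
    print(k, 1 - Pc)                     # the message error probability shrinks
```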
The result of the above example is quite surprising at first. The more bits we send, the larger the probability $P_c$ that they will all be decoded correctly. On second thought, it is less surprising. We have more options to resist the noise if we are allowed to choose an encoder that maps many bits at a time. In fact, in $n$ dimensions the noise source is more constrained than the encoder in the following sense. The encoder is free to choose the components of the codewords it produces, with the only restriction that the average energy constraint be met. A codeword that is zero in all components except for one would not raise any eyebrows. To the contrary, you should be very suspicious of an iid Gaussian source that outputs such an $n$-tuple. Implicitly, designing a good encoder is a matter of using the encoder's freedom to create patterns (codewords) that remain identifiable after being corrupted by noise.
Unfortunately, there is a major problem with the requirement that $n$ grow exponentially with $k$. From Theorem 64, it means that $WT$ has to grow exponentially with $k$. In general, we expect $W$ to be fixed and $T$ to grow linearly with $k$. We deduce that for large values of $k$ we can use bit-by-bit on a pulse train but not block-orthogonal signaling.

In the next subsection we gain additional insight on why the message error probability goes to 0 for block-orthogonal signaling and to 1 for bit-by-bit on a pulse train. The union bound is very useful for this.
4.8.4 Bit-By-Bit Versus Block-Orthogonal
We have seen that the message error probability goes to 1 in bit-by-bit on a pulse train and goes to zero (exponentially fast) in block-orthogonal signaling. The union bound is quite useful to understand what goes on.

In computing the error probability when message $i$ is transmitted, the union bound has one term for each $j \ne i$. The dominating terms correspond to the signals $c_j$ that are closest to $c_i$. If we neglect the other terms, we obtain an expression of the form

$$P_e(i) \approx N_d\, Q\left(\frac{d_m}{2\sigma}\right),$$

where $N_d$ is the number of dominant terms, i.e., the number of nearest neighbors to $c_i$, $d_m$ is the minimum distance, i.e., the distance to a nearest neighbor, and $2\sigma^2 = N_0$.

For bit-by-bit on a pulse train, there are $k$ closest neighbors, each obtained by changing $c_i$ in exactly one component, and each of them is at distance $2\sqrt{E_b}$ from $c_i$. As $k$ increases, $N_d$ increases and $Q\left(\frac{d_m}{2\sigma}\right)$ stays constant. The increase of $N_d$ makes $P_e(i)$ increase.
Now consider block-orthogonal signaling. All signals are at the same distance from each other. Hence there are $N_d = 2^k - 1$ nearest neighbors to $c_i$, all at distance $d_m = \sqrt{2E} = \sqrt{2k E_b}$. Hence

$$Q\left(\frac{d_m}{2\sigma}\right) \le \frac{1}{2} \exp\left(-\frac{d_m^2}{8\sigma^2}\right) = \frac{1}{2} \exp\left(-\frac{k E_b}{4\sigma^2}\right), \qquad N_d = 2^k - 1 = e^{k \ln 2} - 1.$$

We see that the probability that the noise carries a signal closer to a specific neighbor decreases as $\exp\left(-\frac{k E_b}{4\sigma^2}\right)$, whereas the number of nearest neighbors increases as $\exp(k \ln 2)$. For $\frac{E_b}{4\sigma^2} > \ln 2$ the product decreases; otherwise it increases.
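The tension between the two factors is easy to tabulate. The sketch below (ours; illustrative parameters chosen so that $E_b/(4\sigma^2) > \ln 2$) evaluates the dominant term $N_d \exp(-d_m^2/(8\sigma^2))$ for both schemes.

```python
import numpy as np

# Dominant union-bound term N_d * exp(-d_m^2 / (8 sigma^2)) for both schemes,
# with E = k * Eb (illustrative values; Eb/(4 sigma^2) = 1 > ln 2).
Eb, sigma2 = 4.0, 1.0
for k in (1, 5, 10, 20, 40):
    bitbybit  = k * np.exp(-4 * Eb / (8 * sigma2))               # N_d = k, d_m = 2 sqrt(Eb)
    blockorth = (2.0 ** k - 1) * np.exp(-2 * k * Eb / (8 * sigma2))  # d_m = sqrt(2 k Eb)
    print(k, bitbybit, blockorth)       # the first term grows, the second shrinks
```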
In all four examples considered in this section, there is a common thread. As $k$ increases, the space populated by the signals grows in size and the number of signals increases. If the former does not grow fast enough, the space becomes more and more crowded and the error probability goes up. Sophisticated coding techniques can avoid this while keeping a linear relationship between $n$ and $k$. In Chapter 6, we do a case study of such a technique.
4.9 Conclusion and Outlook
We have discussed some of the trade-offs between the number of transmitted bits, the signal duration, the bandwidth, the signal's energy, and the error probability. We have seen that, rather surprisingly, it is possible to transmit an increasing number $k$ of bits at a fixed energy per bit $E_b$ and to make the probability that even a single bit is decoded incorrectly go to zero as $k$ increases. However, the scheme we used to prove this has the undesirable property of requiring an exponential growth of the time-bandwidth product. In any given channel, we would quickly run out of time and/or bandwidth even with moderate values of $k$. In real-world applications, we are given a fixed bandwidth and we let the duration grow linearly with $k$. It is not a coincidence that most signaling methods in use today can be seen one way or another as refinements of bit-by-bit on a pulse train. This line of signaling technique will be pursued in the next two chapters.

This is a good time to clarify our non-standard use of the words coding, encoder, codeword, and codebook. We have seen that no matter which waveform signals we use to communicate, we can always break down the sender into a block that provides an $n$-tuple and one that maps the $n$-tuple into the corresponding waveform. This view is completely general and serves us well, whether we analyze or implement a system. Unfortunately there is no standard name for the first block. Calling it an encoder is a good name, but the reader should be aware that the current practice is to say that there is coding when the mapping from bits to codewords is non-trivial and that there is no coding when the map is trivial, as in bit-by-bit on a pulse train. Making a distinction is not a satisfactory solution in our view. First of all, there is no good way to make a clean distinction between a trivial and a non-trivial map. Second, there is no good substitute for the term encoder for the block that implements a trivial map. We find it to be a cleaner solution to talk about an encoder, regardless of complexity. An example of a non-trivial encoder will be studied in depth in Chapter 6.

Calling the second block a waveform former is definitely non-standard, but we find this name to be more appropriate than calling it a modulator, which is the most common name used for it. The latter has been inherited from the old days of analog communication techniques, such as amplitude modulation (AM), for which it was an appropriate name.
In this chapter we have looked at the relationship between $k$, $T$, $W$, $E$ and $P_e$ by considering specific signaling methods. Information theory is a field that searches for the ultimate trade-offs, regardless of the signaling method. A main result from information theory is the famous formula

$$C = \frac{W}{2} \log_2\left(1 + \frac{2P}{N_0 W}\right). \quad (4.6)$$

It gives a precise value to the ultimate rate $C$ [bps] at which we can transmit reliably over a waveform AWGN channel of power spectral density $N_0/2$ [Watts/Hz] if we are allowed to use signals of power not exceeding $P$ [Watts] and absolute bandwidth not exceeding $W$ [Hz].
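As a numerical illustration of (4.6) (our example values, not from the text):

```python
from math import log2

def awgn_capacity(W, P, N0):
    """C = (W/2) * log2(1 + 2P/(N0*W)) in bits per second, equation (4.6)."""
    return (W / 2) * log2(1 + 2 * P / (N0 * W))

# Illustrative numbers: 1 MHz of absolute bandwidth, 2P/(N0*W) = 2.
print(awgn_capacity(W=1e6, P=1.0, N0=1e-6))   # about 7.9e5 bps
```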
As already mentioned, it is quite common to consider only positive frequencies in determining the bandwidth of a signal. We can identify two reasons this point of view has become popular. One reason is that the positive frequencies are those we see when we observe a signal with a spectrum analyzer. The other reason is that the use of complex-valued notation to design and analyze communication systems is a relatively recent practice. If we consider only real-valued signals, then the only difference between the two bandwidth definitions is a factor 2. As negative frequencies count as much as the positive ones in determining the dimensionality of a set of signals, if we allow for complex-valued signals we can no longer disregard the negative frequencies in the definition of bandwidth. Notice that if we define bandwidth accounting only for positive frequencies, then the relationship $n = WT$ becomes $n = 2BT$ (which is less attractive) and (4.6) becomes $C = B \log_2\left(1 + \frac{P}{N_0 B}\right)$ (which is more attractive³).

³The factor $1/2$ in front of (4.6) is fundamental: it reflects the fact that there is a factor $1/2$ in front of the capacity of the discrete-time channel model. The factor 2 inside the log is an artifact of an unfortunate practice, specifically that we denote the noise power spectral density by $\frac{N_0}{2}$ rather than by the better choice $N_0$.
4.10 Exercises
Problem 1. (Bandwidth) Verify the following statements.

(a) The absolute bandwidth of $\operatorname{sinc}(\frac{t}{T_s})$ is $W = \frac{1}{T_s}$.

(b) The 3-dB bandwidth of an RC lowpass filter is $W = \frac{1}{\pi RC}$.

(c) The $\epsilon$-bandwidth of an RC lowpass filter is $W = \frac{1}{\pi RC} \tan\left(\frac{\pi}{2}(1 - \epsilon)\right)$.

(d) The first zero-crossing bandwidth of $1_{[-\frac{T_s}{2}, \frac{T_s}{2}]}(t)$ is $W = \frac{2}{T_s}$.

(e) The equivalent noise bandwidth of an RC lowpass filter is $W = \frac{1}{2RC}$.

(f) The RMS bandwidth of $h(t) = \exp(-\pi t^2)$ is $W = \frac{1}{2\sqrt{\pi}}$. Hint: $h_{\mathcal{F}}(f) = \exp(-\pi f^2)$.

Hint: For an RC lowpass filter, we have $h(t) = \frac{1}{RC} \exp\left(-\frac{t}{RC}\right)$ for $t \ge 0$ and $h(t) = 0$ otherwise. The squared magnitude of its Fourier transform is $|h_{\mathcal{F}}(f)|^2 = \frac{1}{1 + (2\pi RC f)^2}$.
Problem 2. (Signal Translation) Consider the signals $w_0(t)$ and $w_1(t)$ shown in Figure 4.7, used to communicate one bit across an AWGN channel of power spectral density $N_0/2$.
[Figure 4.7: The waveforms $w_0(t)$ and $w_1(t)$; both are piecewise constant with values $\pm 1$ on subintervals of $[0, 2T]$.]
(a) Determine an orthonormal basis $\{\psi_0(t), \psi_1(t)\}$ for the space spanned by $\{w_0(t), w_1(t)\}$ and find the corresponding codewords $c_0$ and $c_1$. Work out two solutions, one obtained via Gram-Schmidt and one in which the second element of the orthonormal basis is a delayed version of the first. Which of the two solutions would you choose if you had to implement the system?

(b) Let $X$ be a uniformly distributed binary random variable that takes values in $\{0, 1\}$. We want to communicate the value of $X$ over an additive white Gaussian noise channel. When $X = 0$, we send $w_0(t)$, and when $X = 1$, we send $w_1(t)$. Draw the block diagram of a ML receiver based on a single matched filter.

(c) Determine the error probability $P_e$ of your receiver as a function of $T$ and $N_0$.

(d) Find a suitable waveform $v(t)$, such that the new signals $\tilde{w}_0(t) = w_0(t) - v(t)$ and $\tilde{w}_1(t) = w_1(t) - v(t)$ have minimal energy, and plot the resulting waveforms.

(e) What is the name of the kind of signaling scheme that uses $\tilde{w}_0(t)$ and $\tilde{w}_1(t)$? Argue that one obtains this kind of signaling scheme independently of the initial choice of $w_0(t)$ and $w_1(t)$.
Problem 3. (Orthogonal Signal Sets) Consider a set $\mathcal{W} = \{w_0(t), \ldots, w_{m-1}(t)\}$ of mutually orthogonal signals with squared norm $E$ each, used with equal probability.

(a) Find the minimum-energy signal set $\tilde{\mathcal{W}} = \{\tilde{w}_0(t), \ldots, \tilde{w}_{m-1}(t)\}$ obtained by translating the original set.

(b) Let $\tilde{E}$ be the average energy of a signal picked at random within $\tilde{\mathcal{W}}$. Determine $\tilde{E}$ and the energy saving $E - \tilde{E}$.

(c) Determine the dimension of the inner-product space spanned by $\tilde{\mathcal{W}}$.
Problem 4. (Isometries) Let $\tilde{\mathcal{W}} = \{\tilde{w}_0(t), \ldots, \tilde{w}_{m-1}(t)\}$ be obtained from a given signal set $\mathcal{W} = \{w_0(t), \ldots, w_{m-1}(t)\}$ via an isometry that sends $w_i(t) \in \mathcal{W}$ to $\tilde{w}_i(t) \in \tilde{\mathcal{W}}$, $i = 0, \ldots, m-1$. To simplify notation, we assume that $\mathcal{W}$ and $\tilde{\mathcal{W}}$ are orthogonal to one another and that each spans a space of dimension $n$. Let $\mathcal{B} = \{\psi_1(t), \ldots, \psi_n(t), \psi_{n+1}(t), \ldots, \psi_{2n}(t)\}$ be an orthonormal basis for the space spanned by $\tilde{\mathcal{W}} \cup \mathcal{W}$, where the first $n$ elements of this basis form a basis for the inner product space spanned by $\mathcal{W}$ and the last $n$ form a basis for that spanned by $\tilde{\mathcal{W}}$. Let $\mathcal{C}$ be the codebook associated to $\mathcal{W}$ with respect to the basis $\mathcal{B}$ and let $\tilde{\mathcal{C}}$ be the one associated to $\tilde{\mathcal{W}}$. Notice that codewords have length $2n$. Describe the isometry that maps $\mathcal{C}$ to $\tilde{\mathcal{C}}$.
Problem 5. (Dimensionality vs Area) The purpose of this exercise is to explore in what sense the dimensionality of a signal set equals the area it occupies in the time-frequency plane. The focus is on ideas rather than mathematical rigor.

Consider the three regions in the time-frequency plane of Figure 4.8. The region that contains the origin represents a set $\mathcal{G}$ of signals that are time-limited to $(-\frac{T}{2}, \frac{T}{2})$ and frequency-limited to $(-\frac{W}{2}, \frac{W}{2})$ in the sense of Theorem 64. The other two regions are obtained by frequency-shifting all signals by $f_0$ or by time-shifting the signals by $t_0$. We assume that $f_0 > W$ and $t_0 > T$.

(a) Argue that for $\mathcal{G}$, there exists a real-valued basis $\mathcal{B}$. Hint: See Exercise 8.

(b) Describe an orthonormal basis for each of the other two signal sets knowing that $\mathcal{B} = \{\psi_1(t), \ldots, \psi_n(t)\}$ is an orthonormal basis for $\mathcal{G}$. Conclude that the three sets have the same dimensionality.

(c) Argue that the three sets are orthogonal to one another.

(d) Argue that the dimensionality of a large signal space $\tilde{\mathcal{G}}$ described by one or more non-overlapping regions (not necessarily rectangular) in the time-frequency plane is essentially equal to the total area of the describing regions.
[Figure 4.8: Three rectangles of width $T$ and height $W$ in the time-frequency plane: one at the origin (the set $\mathcal{G}$), one shifted up in frequency by $f_0$, and one shifted in time by $t_0$.]
Problem 6. (Time and Frequency-Limited Orthonormal Sets) Complement Examples 65
and 66 with similar examples in which the shifts occur in the frequency domain. The
corresponding time-domain signals can be complex-valued.
Problem 7. (Root-Mean Square Bandwidth) The root-mean square (rms) bandwidth of a low-pass signal $g(t)$ of finite energy is defined by

$$W_{\mathrm{rms}} = \left( \frac{\int_{-\infty}^{\infty} f^2 |G(f)|^2\, df}{\int_{-\infty}^{\infty} |G(f)|^2\, df} \right)^{1/2},$$

where $|G(f)|^2$ is the energy spectral density of the signal. Correspondingly, the root-mean-square (rms) duration of the signal is defined by

$$T_{\mathrm{rms}} = \left( \frac{\int_{-\infty}^{\infty} t^2 |g(t)|^2\, dt}{\int_{-\infty}^{\infty} |g(t)|^2\, dt} \right)^{1/2}.$$

We want to show that, with the above definitions and assuming that $|g(t)| \to 0$ faster than $1/\sqrt{|t|}$ as $|t| \to \infty$, the time-bandwidth product satisfies

$$T_{\mathrm{rms}} W_{\mathrm{rms}} \ge \frac{1}{4\pi}.$$

(a) Use the Schwarz inequality and the fact that for any $c \in \mathbb{C}$, $c + c^* = 2\Re\{c\} \le 2|c|$, to prove that

$$\left[ \int_{-\infty}^{\infty} [g_1^*(t) g_2(t) + g_1(t) g_2^*(t)]\, dt \right]^2 \le 4 \int_{-\infty}^{\infty} |g_1(t)|^2\, dt \int_{-\infty}^{\infty} |g_2(t)|^2\, dt.$$

(b) In the above inequality insert $g_1(t) = t g(t)$ and $g_2(t) = \frac{dg(t)}{dt}$ and show that

$$\left[ \int_{-\infty}^{\infty} t \frac{d}{dt}[g(t) g^*(t)]\, dt \right]^2 \le 4 \int_{-\infty}^{\infty} t^2 |g(t)|^2\, dt \int_{-\infty}^{\infty} \left| \frac{dg(t)}{dt} \right|^2 dt.$$

(c) Integrate the left-hand side by parts and use the fact that $|g(t)| \to 0$ faster than $1/\sqrt{|t|}$ as $|t| \to \infty$ to obtain

$$\left[ \int_{-\infty}^{\infty} |g(t)|^2\, dt \right]^2 \le 4 \int_{-\infty}^{\infty} t^2 |g(t)|^2\, dt \int_{-\infty}^{\infty} \left| \frac{dg(t)}{dt} \right|^2 dt.$$

(d) Argue that the above is equivalent to

$$\int_{-\infty}^{\infty} |g(t)|^2\, dt \int_{-\infty}^{\infty} |G(f)|^2\, df \le 4 \int_{-\infty}^{\infty} t^2 |g(t)|^2\, dt \int_{-\infty}^{\infty} 4\pi^2 f^2 |G(f)|^2\, df.$$

(e) Complete the proof to obtain $T_{\mathrm{rms}} W_{\mathrm{rms}} \ge \frac{1}{4\pi}$.

(f) As a special case, consider a Gaussian pulse defined by $g(t) = \exp(-\pi t^2)$. Show that for this signal $T_{\mathrm{rms}} W_{\mathrm{rms}} = \frac{1}{4\pi}$, i.e., the above inequality holds with equality. (Hint: $\exp(-\pi t^2) \stackrel{\mathcal{F}}{\longleftrightarrow} \exp(-\pi f^2)$.)
Problem 8. (Real Basis for Complex Space) Let $\mathcal{G}$ be a complex inner-product space of finite-energy waveforms with the property that $g(t) \in \mathcal{G}$ implies $g^*(t) \in \mathcal{G}$.

(a) Let $\mathcal{G}_{\mathbb{R}}$ be the subset of $\mathcal{G}$ that contains only real-valued waveforms. Argue that $\mathcal{G}_{\mathbb{R}}$ is a real inner-product space.

(b) Prove that if $g(t) = a(t) + jb(t)$ is in $\mathcal{G}$, then both $a(t)$ and $b(t)$ are in $\mathcal{G}_{\mathbb{R}}$.

(c) Prove that if $\{\psi_1(t), \ldots, \psi_n(t)\}$ is an orthonormal basis for the real inner-product space $\mathcal{G}_{\mathbb{R}}$, then it is also an orthonormal basis for the complex inner-product space $\mathcal{G}$.

Comment: In this exercise we have shown that we can always find a real-valued orthonormal basis for an inner product space that fulfills the stated condition with respect to conjugation. An equivalent condition is that if $g(t) \in \mathcal{G}$ then the inverse Fourier transform of $g_{\mathcal{F}}^*(f)$ is also in $\mathcal{G}$. The set $\mathcal{G}$ of complex-valued finite-energy waveforms that are time-limited to $(-\frac{T}{2}, \frac{T}{2})$ (in a strict sense) and frequency-limited to $(-\frac{W}{2}, \frac{W}{2})$ fulfills the condition for any of the bandwidth definitions given in Section 4.3. (If we use the absolute bandwidth definition, then $T$ must be infinite or else the set $\mathcal{G}$ is empty.)
Problem 9. (Average Energy of PAM) Let $U$ be a random variable uniformly distributed in $[-a, a]$ and let $S$ be a discrete random variable, independent of $U$ and uniformly distributed over the PAM constellation $\{\pm a, \pm 3a, \ldots, \pm(m-1)a\}$, where $m$ is an even integer. Let $V = S + U$.

(a) Find the distribution of $V$.

(b) Find the variance of $U$ and that of $V$.

(c) Use part (b) to determine the variance of $S$. Notice that the variance of $S$ is the average energy of the PAM constellation used with uniform distribution.
Problem 10. (Suboptimal Receiver for Orthogonal Signaling) This exercise takes a different approach to the evaluation of the performance of block-orthogonal signaling (Example 70). Let the message $H \in \{1, \ldots, m\}$ be uniformly distributed and consider the communication problem described by:

$$H = i: \quad Y = c_i + Z, \quad Z \sim \mathcal{N}(0, \sigma^2 I_m),$$

where $Y = (Y_1, \ldots, Y_m)^{\mathsf{T}} \in \mathbb{R}^m$ is the received vector and $\{c_1, \ldots, c_m\} \subset \mathbb{R}^m$ the codebook consisting of constant-energy codewords that are orthogonal to each other. Without loss of essential generality, we can assume

$$c_i = \sqrt{E}\, e_i,$$

where $e_i$ is the $i$th unit vector in $\mathbb{R}^m$, i.e., the vector that contains 1 at position $i$ and 0 elsewhere, and $E$ is some positive constant.

(a) Describe the statistic of $Y_j$ for $j = 1, \ldots, m$ given that $H = 1$.

(b) Consider a suboptimal receiver that uses a threshold $t = \alpha\sqrt{E}$ where $0 < \alpha < 1$. The receiver declares $\hat{H} = i$ if $i$ is the only integer such that $Y_i \ge t$. If there is no such $i$, or there is more than one index $i$ for which $Y_i \ge t$, the receiver declares that it cannot decide. This will be viewed as an error. Let $E_i = \{Y_i \ge t\}$, $E_i^c = \{Y_i < t\}$, and describe, in words, the meaning of the event

$$E_1 \cap E_2^c \cap E_3^c \cap \cdots \cap E_m^c.$$

(c) Find an upper bound to the probability that the above event does not occur when $H = 1$. Express your result using the $Q$ function.

(d) Now we let $E$ and $\ln m$ go to $\infty$ while keeping their ratio constant, namely $E = E_b \ln m \log_2 e$. (Here $E_b$ is the energy per transmitted bit.) Find the smallest value of $E_b/\sigma^2$ (according to your bound) for which the error probability goes to zero as $E$ goes to $\infty$. Hint: Use $m - 1 < m = \exp(\ln m)$ and $Q(x) < \frac{1}{2} \exp\left(-\frac{x^2}{2}\right)$.
Problem 11. (Receiver Diagrams) For each signaling method discussed in Section 4.8,
draw the block diagram of an ML receiver.
Problem 12. (Bit-By-Bit on a Pulse Train) A communication system uses bit-by-bit on a pulse train to communicate at 1 Mbps using a rectangular pulse. The transmitted signal is of the form

$$\sum_j B_j\, 1_{[0, T_s]}(t - j T_s),$$

where $B_j \in \{\pm b\}$. Determine the value of $b$ needed to achieve bit-error probability $P_b = 10^{-5}$, knowing that the channel corrupts the transmitted signal with additive white Gaussian noise of power spectral density $N_0/2$, where $N_0 = 10^{-2}$ W/Hz.
Problem 13. (Bit Error Probability) A discrete memoryless source produces bits at a rate of $10^6$ bps. The bits, which are uniformly distributed and iid, are grouped into pairs and each pair is mapped into a distinct waveform and sent over an AWGN channel of noise power spectral density $N_0/2$. Specifically, the first two bits are mapped into one of the four waveforms shown in Figure 4.9 with $T_s = 2 \cdot 10^{-6}$ seconds, the next two bits are mapped onto the same set of waveforms delayed by $T_s$, etc.

(a) Describe an orthonormal basis for the inner product space $\mathcal{W}$ spanned by $w_i(t)$, $i = 0, \ldots, 3$, and plot the signal constellation in $\mathbb{R}^n$, where $n$ is the dimensionality of $\mathcal{W}$.

(b) Determine an assignment between pairs of bits and waveforms such that the bit error probability is minimized, and derive an expression for $P_b$.

(c) Draw a block diagram of the receiver that achieves the above $P_b$ and uses a single and causal filter.

(d) Determine the energy per bit $E_b$ and the power of the transmitted signal.
[Figure 4.9: The four waveforms $w_0(t), \ldots, w_3(t)$, piecewise constant with amplitudes $\pm 1$ on the interval $[0, T_s]$.]
Problem 14. ($m$-ary Frequency Shift Keying) $m$-ary Frequency Shift Keying ($m$-FSK) is a signaling method that uses signals of the form

$$w_i(t) = A \sqrt{\frac{2}{T}} \cos(2\pi(f_c + i\Delta f)t)\, 1_{[0,T]}(t), \quad i = 0, \ldots, m-1,$$

where $A$, $T$, $f_c$, and $\Delta f$ are fixed parameters.

(a) Assuming that $f_c T$ is an integer, find the smallest value of $\Delta f$ that makes $w_i(t)$ orthogonal to $w_j(t)$ when $i \ne j$.

(b) In practice the signals $w_i(t)$, $i = 0, 1, \ldots, m-1$, can be generated by changing the frequency of a single oscillator. In passing from one frequency to another a phase shift is introduced. Again assuming that $f_c T$ is an integer, determine the smallest value of $\Delta f$ that ensures orthogonality between $\cos(2\pi(f_c + i\Delta f)t + \varphi_i)$ and $\cos(2\pi(f_c + j\Delta f)t + \varphi_j)$ whenever $i \ne j$, regardless of $\varphi_i$ and $\varphi_j$.

(c) Sometimes we do not have complete control over $f_c$ either, in which case it is not possible to set $f_c T$ to an integer. Argue that if we choose $f_c \gg m\Delta f$, then for all practical purposes the signals will be orthogonal to one another.

(d) Determine the average energy $E$ and the frequency-domain interval occupied by the signal constellation. How does the $BT$ product behave as a function of $k = \log_2 m$?
Problem 15. (Antipodal Signaling and Rayleigh Fading) Consider using antipodal signaling, i.e., $w_0(t) = -w_1(t)$, to communicate one bit across a Rayleigh fading channel that we model as follows. When $w_i(t)$ is transmitted, the channel output is

$$R(t) = A w_i(t) + N(t),$$

where $N(t)$ is white Gaussian noise of power spectral density $N_0/2$ and $A$ is a random variable of probability density function

$$f_A(a) = \begin{cases} 2a e^{-a^2}, & \text{if } a \ge 0, \\ 0, & \text{otherwise.} \end{cases} \quad (4.7)$$

We assume that, unlike the transmitter, the receiver knows the realization of $A$. We also assume that the receiver implements a maximum likelihood decision, and that the signal energy is $E_b$.

(a) Describe the receiver.

(b) Determine the error probability conditioned on the event $A = a$.

(c) Determine the unconditional error probability $P_f$. (The subscript stands for fading.)

(d) Compare $P_f$ to the error probability $P_e$ achieved by an ML receiver that observes $R(t) = m w_i(t) + N(t)$, where $m = E[A]$. Comment on the different behavior of the two error probabilities. For each of them, find the $E_b/N_0$ value necessary to obtain the probability of error $10^{-5}$.
Problem 16. (Non-White Gaussian Noise) Consider the following transmitter/receiver design problem for an additive non-white Gaussian noise channel.

(a) Let the hypothesis $H$ be uniformly distributed in $\mathcal{H} = \{0, \ldots, m-1\}$ and, when $H = i$, $i \in \mathcal{H}$, let $w_i(t)$ be the channel input. The channel output is then

$$R(t) = w_i(t) + N(t),$$

where $N(t)$ is a Gaussian process of known power spectral density $G(f)$, where we assume that $G(f) \ne 0$ for all $f$. Describe a receiver that, based on the channel output $R(t)$, decides on the value of $H$ with least probability of error. Hint: Find a way to transform this problem into one that you can solve.

(b) Consider the same setting as in part (a), except that now you get to design the signal set, with the restrictions that $m = 2$ and that the average energy cannot exceed $E$. We also assume that $G^2(f)$ is constant in the interval $[a, b]$, $a < b$, where it also achieves its global minimum. What are the two signals that allow for the smallest possible probability of error to be achieved?
Problem 17. (Continuous-Time AWGN Capacity) To prove the formula for the capacity $C$ of the continuous-time AWGN channel of noise power spectral density $N_0/2$, when signals are power-limited to $P$ and frequency-limited to $(-\frac{W}{2}, \frac{W}{2})$, we first derive the capacity $C_d$ of the discrete-time AWGN channel of noise variance $\sigma^2$ with symbols constrained to an average energy not exceeding $E_s$. The two expressions are:

$$C_d = \frac{1}{2} \log_2\left(1 + \frac{E_s}{\sigma^2}\right) \quad \text{[bits per channel use]},$$

$$C = \frac{W}{2} \log_2\left(1 + \frac{P}{N_0 (W/2)}\right) \quad \text{[bps]}.$$

To derive $C_d$ we need tools from information theory. However, going from $C_d$ to $C$ using Theorem 64 is straightforward. To do so, let $\mathcal{G}_\epsilon$ be the set of all signals that are frequency-limited to $(-\frac{W}{2}, \frac{W}{2})$ and time-limited to $(-\frac{T}{2}, \frac{T}{2})$ at level $\epsilon$. We choose $\epsilon$ small enough that for all practical purposes all signals of $\mathcal{G}_\epsilon$ are strictly frequency-limited to $(-\frac{W}{2}, \frac{W}{2})$ and strictly time-limited to $(-\frac{T}{2}, \frac{T}{2})$. Each waveform in $\mathcal{G}_\epsilon$ is represented by an $n$-tuple and, as $T$ goes to infinity, $n$ approaches $WT$. Complete the argument assuming $n = WT$ and without worrying about convergence issues.
Problem 18. (Energy Efficiency of PAM) This exercise complements what we have learned in Example 67. Consider using the $m$-PAM constellation $\{\pm a, \pm 3a, \pm 5a, \ldots, \pm(m-1)a\}$ to communicate across the discrete-time AWGN channel of noise variance $\sigma^2 = 1$. Our goal is to communicate at some level of reliability, say with error probability $P_e = 10^{-5}$. We are interested in comparing the energy needed by PAM versus the energy needed by a system that operates at channel capacity, namely at $\frac{1}{2} \log_2\left(1 + \frac{E_s}{\sigma^2}\right)$ bits per channel use.

(a) Using the capacity formula, determine the energy per symbol $E_s^C(k)$ needed to transmit $k$ bits per channel use. (The superscript $C$ stands for channel capacity.) At any rate below capacity it is possible to make the error probability decrease without limit by increasing the codeword length. This implies that there is a way to achieve the desired error probability at energy per symbol $E_s^C(k)$.

(b) Using $m$-PAM, we can achieve an arbitrarily small error probability by making the parameter $a$ sufficiently large. As the size $m$ of the constellation increases, the edge effects become negligible, and the average error probability approaches $Q\left(\frac{a}{\sigma}\right)$, which is the probability of error conditioned on an interior point being transmitted. Find the numerical value of the parameter $a$ for which $Q\left(\frac{a}{\sigma}\right) = 10^{-5}$. (You may use $\frac{1}{2} \exp\left(-\frac{x^2}{2}\right)$ as an approximation of $Q(x)$.)

(c) Having fixed the value of $a$, we can use equation (4.1) to determine the average energy $E_s^P(k)$ needed by PAM to send $k$ bits at the desired error probability. (The superscript $P$ stands for PAM.) Find and compare the numerical values of $E_s^P(k)$ and $E_s^C(k)$ for $k = 1, 2, 4$.

(d) Find $\lim_{k \to \infty} \frac{E_s^C(k+1)}{E_s^C(k)}$ and $\lim_{k \to \infty} \frac{E_s^P(k+1)}{E_s^P(k)}$.

(e) Comment on PAM's efficiency in terms of energy per bit for small and large values of $k$. Comment also on the relationship between this exercise and Example 67.
S-ar putea să vă placă și