
Probability Theory in Digital Communication

An Example: Binary Symmetric Channel


Consider a discrete memoryless channel used to transmit binary data.
Assume the channel is symmetric, which means the probability of
receiving symbol 1 when symbol 0 is sent is the same as the
probability of receiving symbol 0 when symbol 1 is sent.
To describe the probabilistic behaviour of this channel fully, we
need two sets of probabilities:

1. The a priori probabilities of sending binary symbols 0 and 1:

A0, A1: events of transmitting symbols 0 and 1, respectively, with
p0 = P(A0) and p1 = P(A1)

Note: p0 + p1 = 1
2. The conditional probabilities of error:

B0, B1: events of receiving symbols 0 and 1, respectively
By the symmetry assumption, the transition (error) probability is
p = P(B1 | A0) = P(B0 | A1)

Requirement: To determine the a posteriori probabilities
P(A0 | B0) and P(A1 | B1)

P(A0 | B0): probability that symbol 0 was sent, given that
symbol 0 is received
P(A1 | B1): probability that symbol 1 was sent, given that
symbol 1 is received

(The events B0 and B1 are mutually exclusive and exhaustive, so
P(B0) + P(B1) = 1.)

[Transition probability diagram of BSC]


From the figure,
1. The probability of receiving symbol 0 is given by the total probability theorem:

$$P(B_0) = P(B_0 \mid A_0)\,p_0 + P(B_0 \mid A_1)\,p_1 = (1-p)\,p_0 + p\,p_1$$

2. The probability of receiving symbol 1 is given by

$$P(B_1) = P(B_1 \mid A_0)\,p_0 + P(B_1 \mid A_1)\,p_1 = p\,p_0 + (1-p)\,p_1$$

Application of Bayes' rule gives

$$P(A_0 \mid B_0) = \frac{P(B_0 \mid A_0)\,P(A_0)}{P(B_0)} = \frac{(1-p)\,p_0}{(1-p)\,p_0 + p\,p_1}$$

$$P(A_1 \mid B_1) = \frac{P(B_1 \mid A_1)\,P(A_1)}{P(B_1)} = \frac{(1-p)\,p_1}{p\,p_0 + (1-p)\,p_1}$$
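As a quick numerical check, here is a minimal Python sketch of the calculation above; the values chosen for p and p0 are illustrative assumptions, not from the slides:

```python
# A posteriori probabilities of a binary symmetric channel.
# p  : transition (error) probability P(B1|A0) = P(B0|A1) -- assumed value
# p0 : a priori probability of sending symbol 0

p, p0 = 0.1, 0.6
p1 = 1.0 - p0

# Total probability of each received symbol
P_B0 = (1 - p) * p0 + p * p1
P_B1 = p * p0 + (1 - p) * p1

# Bayes' rule
P_A0_given_B0 = (1 - p) * p0 / P_B0
P_A1_given_B1 = (1 - p) * p1 / P_B1

print(f"P(B0) = {P_B0:.4f}, P(B1) = {P_B1:.4f}")
print(f"P(A0|B0) = {P_A0_given_B0:.4f}, P(A1|B1) = {P_A1_given_B1:.4f}")
```

Note how the skewed prior (p0 = 0.6) pulls the posteriors away from 1 − p = 0.9: up for the more probable symbol 0 and down for the less probable symbol 1.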

Random Variables:
The outcome of a random experiment
may be a real number (as in the case of rolling a die), or
it may be non-numerical and described by a phrase (such as
"heads" or "tails" in tossing a coin)
From a mathematical point of view, it is desirable to have
numerical values for all outcomes. For this reason, we assign a
real number to each sample point according to some rule.
Defn.: A function whose domain is a sample space and whose range
is some set of real numbers is called a random variable of
the experiment
Note: It is a function that maps sample points into real numbers

Notation: When the outcome of an experiment is s, the random
variable is denoted as X(s) or simply X

Discrete Random Variable: X takes on only a discrete set of
values
Example: outcome of the throw of a die

Continuous Random Variable: X may take on any value in a
whole observational interval
Example: a variable that represents the amplitude of a noise voltage
at a particular instant of time

Cumulative Distribution Function:
The CDF FX(x) of a random variable X is the probability that X
takes a value less than or equal to x; i.e.,

$$F_X(x) = P(X \le x)$$

Note: For any point x, the distribution function FX(x) expresses a
probability.
A CDF FX(x) has the following properties:
1. $0 \le F_X(x) \le 1$

2. $F_X(\infty) = 1$

3. $F_X(-\infty) = 0$

4. FX(x) is a monotone non-decreasing function of x; i.e.,
   $F_X(x_1) \le F_X(x_2)$ if $x_1 \le x_2$
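A minimal sketch checking these properties numerically, assuming the standard normal CDF from scipy.stats as a concrete example (the choice of distribution is arbitrary):

```python
import numpy as np
from scipy.stats import norm  # standard normal chosen purely for illustration

x = np.linspace(-5, 5, 1001)
F = norm.cdf(x)

assert np.all((F >= 0) & (F <= 1))          # property 1: a CDF is a probability
assert np.all(np.diff(F) >= 0)              # property 4: monotone non-decreasing
print(norm.cdf(np.inf), norm.cdf(-np.inf))  # properties 2 and 3: prints 1.0 0.0
```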

Probability Density Function:
The pdf of the random variable X is defined by

$$f_X(x) = \frac{d}{dx} F_X(x)$$

Justification of the name: the name "density function" arises
from the fact that the probability of the event $x_1 < X \le x_2$ equals

$$P(x_1 < X \le x_2) = F_X(x_2) - F_X(x_1) = \int_{x_1}^{x_2} f_X(x)\,dx$$

The probability of an interval is therefore the area under the
probability density function over that interval.

Properties of pdf:
1. $f_X(x) \ge 0$ for all x

This results from the fact that FX(x) increases monotonically:
as x increases, more outcomes are included in the probability
of occurrence represented by FX(x).

2. $\int_{-\infty}^{\infty} f_X(x)\,dx = 1$

This result follows from the fact that

$$\int_{-\infty}^{\infty} f_X(x)\,dx = F_X(\infty) - F_X(-\infty) = 1 - 0 = 1$$

3. $F_X(x) = \int_{-\infty}^{x} f_X(u)\,du$

This result follows directly from the defn. of fX(x).
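These properties can be verified by numerical integration; a minimal Python sketch assuming an example Gaussian pdf (parameters chosen arbitrarily):

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

mu, sigma = 1.0, 2.0  # illustrative parameters
f = lambda t: norm.pdf(t, loc=mu, scale=sigma)

# Property 2: the total area under the pdf is 1
total, _ = quad(f, -np.inf, np.inf)
print(f"total area = {total:.6f}")

# Property 3: F(x0) equals the integral of f up to x0
x0 = 2.5
area, _ = quad(f, -np.inf, x0)
print(f"area up to {x0} = {area:.6f}, F({x0}) = {norm.cdf(x0, mu, sigma):.6f}")
```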

Several Random Variables:


It may be necessary to identify the outcome of an experiment by
two (or more) random variables.
Let us consider the case of two random variables X and Y.
Joint Distribution Function:
The joint distribution function FXY(x,y) is defined as the
probability that the random variable X is less than or equal to a
specified value x and that the random variable Y is less than or
equal to a specified value y.

$$F_{XY}(x,y) = P(X \le x,\ Y \le y) = \int_{-\infty}^{y}\int_{-\infty}^{x} f_{XY}(u,v)\,du\,dv$$

Note: (i) The joint sample space is the xy-plane


(ii) FXY(x,y) is the probability that the outcome of an
experiment will result in a sample point lying inside the
quadrant $(-\infty < X \le x,\ -\infty < Y \le y)$ of the joint sample space
Joint pdf:
The joint pdf of the random variables X and Y is defined by

$$f_{XY}(x,y) = \frac{\partial^2 F_{XY}(x,y)}{\partial x\,\partial y}$$

provided this partial derivative exists. Then

$$P(x_1 < X \le x_2,\ y_1 < Y \le y_2) = \int_{y_1}^{y_2}\int_{x_1}^{x_2} f_{XY}(x,y)\,dx\,dy$$

Properties of Joint pdf:
1. The joint distribution function FXY(x,y) is a monotone
non-decreasing function of both x and y. Therefore the joint
pdf fXY(x,y) is always NON-NEGATIVE.
2. The total volume under the graph of a joint pdf must be unity:

$$\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f_{XY}(x,y)\,dx\,dy = 1$$

3. If we are concerned only with the cumulative probability
up to, say, some value x, quite independently of y, we write

$$F_X(x) = P(X \le x,\ Y \le \infty) = \int_{-\infty}^{x}\int_{-\infty}^{\infty} f_{XY}(u,y)\,dy\,du$$

The probability density corresponding to FX(x) is

$$f_X(x) = \frac{d}{dx}F_X(x) = \int_{-\infty}^{\infty} f_{XY}(x,y)\,dy$$

i.e., the pdf of a single random variable can be obtained from its
joint pdf with a second random variable.

Similarly,

$$f_Y(y) = \int_{-\infty}^{\infty} f_{XY}(x,y)\,dx$$

i.e., the pdf of one random variable is obtained from the joint pdf
fXY(x,y) by simply integrating it over all possible values of the
other (undesired) random variable.
The pdfs fX(x) and fY(y) are called MARGINAL DENSITIES.
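As an illustration, the sketch below recovers a marginal density by integrating out the other variable; the bivariate Gaussian joint pdf and the correlation value 0.5 are assumptions chosen because the marginal is known in closed form (a standard normal):

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

rho = 0.5  # assumed correlation of the illustrative joint pdf

def f_xy(x, y):
    """Standard bivariate Gaussian pdf with correlation rho."""
    z = (x**2 - 2 * rho * x * y + y**2) / (1 - rho**2)
    return np.exp(-z / 2) / (2 * np.pi * np.sqrt(1 - rho**2))

# Marginal: f_X(x0) = integral of f_XY(x0, y) over all y
x0 = 0.7
marginal, _ = quad(lambda y: f_xy(x0, y), -np.inf, np.inf)
print(f"f_X({x0}) by integration = {marginal:.6f}")
print(f"standard normal pdf      = {norm.pdf(x0):.6f}")  # known closed form
```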

Conditional pdf:
The conditional pdf of Y given that X = x is defined by

$$f_Y(y \mid x) = \frac{f_{XY}(x,y)}{f_X(x)}$$

provided fX(x) > 0.
Properties:
1. $f_Y(y \mid x) \ge 0$

2. $\int_{-\infty}^{\infty} f_Y(y \mid x)\,dy = 1$

3. If the RVs X and Y are statistically independent, the
conditional pdf fY(y|x) reduces to the marginal density
function fY(y); i.e., fY(y|x) = fY(y).
In such a case

$$f_{XY}(x,y) = f_X(x)\,f_Y(y)$$

4. When X and Y are independent,

$$P(x_1 < X \le x_2,\ y_1 < Y \le y_2) = \int_{x_1}^{x_2} f_X(x)\,dx \int_{y_1}^{y_2} f_Y(y)\,dy$$

Statistical Average:
Let us consider the problem of determining the average height of
the entire population of a country.
If the data are recorded to within an accuracy of an inch, then the
height X of every person will be approximated to one of the n
numbers x1, x2, ..., xn.
If there are Ni persons of height xi, then the average height is given
by

$$\bar{X} = \frac{N_1 x_1 + N_2 x_2 + \cdots + N_n x_n}{N}$$

where N is the total number of persons; equivalently,

$$\bar{X} = \frac{N_1}{N}x_1 + \frac{N_2}{N}x_2 + \cdots + \frac{N_n}{N}x_n$$

In the limit as $N \to \infty$, the ratio Ni/N approaches P(xi) according to
the relative frequency definition, so

$$\bar{X} = \sum_{i=1}^{n} x_i\,P(x_i)$$

The mean of a random variable X is also called the
EXPECTATION of X and is represented by E[X]:

$$\bar{X} \equiv E[X] = \sum_i x_i\,P(x_i) = m$$

To calculate the average for a continuous random variable, let us
divide the range of the variable into small intervals $\Delta x$.

The probability that X lies in the range between $x_i$ and $x_i + \Delta x$,

$$P(x_i \le X \le x_i + \Delta x) \equiv P(x_i),$$

is given approximately by

$$P(x_i) = f(x_i)\,\Delta x$$

So we have

$$m = \sum_i x_i\,f(x_i)\,\Delta x$$

In the limit as $\Delta x \to 0$,

$$m = \int_{-\infty}^{\infty} x f(x)\,dx$$

Therefore

$$E[X] = \int_{-\infty}^{\infty} x f(x)\,dx$$

So the expected value, or mean, of a RV X is defined by

$$\bar{x} = E[X] = \int_{-\infty}^{\infty} x f_X(x)\,dx$$

It is often necessary to find the mean value of a function of a RV.
For instance, we are often interested in the mean-square
amplitude of a signal. The mean-square amplitude is the mean of
the square of the amplitude X, i.e., $\overline{X^2}$.
Let X denote the random variable and let Y = g(X).

The average value, or expectation, of a function g(X) of the RV X is
given by

$$E[Y] = \int_{-\infty}^{\infty} y f_Y(y)\,dy$$

or

$$E[g(X)] = \int_{-\infty}^{\infty} g(x) f_X(x)\,dx$$

Example: Let Y = g(X) = X², the square of the amplitude.
The expected value of Y is

$$E[Y] = E[X^2] = \int_{-\infty}^{\infty} x^2 f_X(x)\,dx$$

The above result can easily be generalized for functions of two,
or more, random variables:

$$E[g(X,Y)] = \sum_x \sum_y g(x,y)\,f(x,y) \qquad \text{(discrete case)}$$

$$E[g(X,Y)] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g(x,y)\,f(x,y)\,dx\,dy \qquad \text{(continuous case)}$$

In particular, if Z = XY, then

$$E[Z] = \bar{Z} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} xy\,f_{XY}(x,y)\,dx\,dy$$

And if X and Y are independent RVs, then

$$\bar{Z} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} xy\,f_X(x)\,f_Y(y)\,dx\,dy
= \int_{-\infty}^{\infty} x f_X(x)\,dx \int_{-\infty}^{\infty} y f_Y(y)\,dy
= \bar{X}\,\bar{Y} = m_x m_y$$
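A minimal Monte Carlo sketch of this result; the two distributions are arbitrary illustrative choices, generated independently so that E[XY] = E[X]E[Y] should hold:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

x = rng.normal(loc=2.0, scale=1.0, size=n)  # E[X] = 2
y = rng.uniform(0.0, 4.0, size=n)           # E[Y] = 2, independent of x

print(f"E[XY]    = {np.mean(x * y):.4f}")
print(f"E[X]E[Y] = {np.mean(x) * np.mean(y):.4f}")  # ~equal: X, Y independent
```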

Properties of Expectation:
1. If c is any constant, then E(cX) = c E(X)
2. If X and Y are any RVs, then E(X+Y) = E(X) + E(Y)
3. If X and Y are independent RVs, then E(XY) = E(X) E(Y)


Moments:
If the function g(X) is X raised to a power, i.e., g(X) = X^n, the
average value E[X^n] is referred to as the nth moment of the
random variable:

$$E[X^n] = \int_{-\infty}^{\infty} x^n f_X(x)\,dx$$

The most important moments of X are the first two:
n = 1 gives the mean of the random variable ($\bar{X}$, the first moment of X)
n = 2 gives the mean-square value of X:

$$E[X^2] = \int_{-\infty}^{\infty} x^2 f_X(x)\,dx$$

Central Moment:
Central moments are the moments of the difference between a
random variable and its mean $\bar{X}$.
The nth central moment is

$$E[(X - \bar{X})^n] = \int_{-\infty}^{\infty} (x - \bar{X})^n f_X(x)\,dx$$

n = 1: the first central moment is zero
n = 2: the second central moment is referred to as the VARIANCE ($\sigma_X^2$) of the RV X

The square root of the variance, namely $\sigma_X$, is called the
STANDARD DEVIATION of the random variable X.

Note: The variance of a RV X is, in some sense, a measure of
the variable's randomness.
By specifying the variance, we essentially constrain the
effective width of the pdf fX(x) of the random variable X
about the mean $\bar{X}$.

Variance and Standard Deviation

[Figure: pdfs of two RVs, Curves I and II, with the same mean μ but different spreads]

Consider two RVs whose density functions are given by Curves I
and II. Both of them have the same mean (μ), yet the two RVs
differ in their spread about that mean. That is, the mean alone does
not characterize a RV. To characterize a RV, we must also know
how it varies, or deviates, from its mean.

We may be inclined to take E(X − μ) as another characteristic of a
RV to indicate its deviation, or dispersion, about its mean.
Let us consider Curves I and II to be symmetric about μ.
Then (X − μ) is positive for X > μ and negative for X < μ, resulting
in E(X − μ) = 0 in both cases.
Thus E(X − μ) does not serve our purpose; a function that treats
positive and negative differences identically would serve it.
E(|X − μ|) and E[(X − μ)²] are two such functions; however,
E[(X − μ)²] is found to be the more useful of the two.
When we interpret f(x) as a mass density on the x-axis, the mean is
the center of gravity of the mass. The variance equals the moment of
inertia of the probability masses and gives some notion of their
concentration near the mean.

Properties of Variance:
1. $\sigma^2 = E[(X - \mu)^2] = E(X^2) - \mu^2$

2. If c is any constant, then

$$\mathrm{Var}(cX) = c^2\,\mathrm{Var}(X)$$

3. If X and Y are independent random variables, then

(a) Var(X+Y) = Var(X) + Var(Y)
(b) Var(X−Y) = Var(X) + Var(Y)
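A short Monte Carlo sketch of properties 2 and 3, with assumed example distributions:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000
c = 3.0

x = rng.exponential(scale=2.0, size=n)  # Var(X) = 4 for this illustrative choice
y = rng.normal(0.0, 1.5, size=n)        # independent of x, Var(Y) = 2.25

print(f"Var(cX)  = {np.var(c * x):.3f} vs c^2 Var(X)    = {c**2 * np.var(x):.3f}")
print(f"Var(X+Y) = {np.var(x + y):.3f} vs Var(X)+Var(Y) = {np.var(x) + np.var(y):.3f}")
print(f"Var(X-Y) = {np.var(x - y):.3f}")  # also ~Var(X)+Var(Y) for independent RVs
```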

The Gaussian Probability Density

The Gaussian (also called normal) pdf is of the greatest importance
because many naturally occurring experiments are characterized
by random variables with a Gaussian density.
It is defined as

$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\,e^{-(x-\mu)^2/2\sigma^2}$$

Its mean and variance are

$$\bar{X} = \int_{-\infty}^{\infty} \frac{x\,e^{-(x-\mu)^2/2\sigma^2}}{\sqrt{2\pi\sigma^2}}\,dx = \mu$$

$$E[(X-\mu)^2] = \int_{-\infty}^{\infty} \frac{(x-\mu)^2\,e^{-(x-\mu)^2/2\sigma^2}}{\sqrt{2\pi\sigma^2}}\,dx = \sigma^2$$

It may also be verified that

$$\int_{-\infty}^{\infty} f(x)\,dx = 1$$
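All three integrals can be checked numerically; a minimal sketch with assumed example values of μ and σ:

```python
import numpy as np
from scipy.integrate import quad

mu, sigma = 1.5, 0.8  # illustrative parameters

def f(x):
    """Gaussian pdf with mean mu and variance sigma**2."""
    return np.exp(-(x - mu)**2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)

area, _ = quad(f, -np.inf, np.inf)
mean, _ = quad(lambda x: x * f(x), -np.inf, np.inf)
var, _ = quad(lambda x: (x - mu)**2 * f(x), -np.inf, np.inf)
print(f"area = {area:.6f}, mean = {mean:.6f}, variance = {var:.6f}")
# Expect: area = 1, mean = mu = 1.5, variance = sigma**2 = 0.64
```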

Joint Moments
The joint moment for a pair of RVs X and Y is defined by

$$E[X^i Y^k] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x^i y^k\,f_{XY}(x,y)\,dx\,dy$$

where i and k are any positive integers.
When i = k = 1, the joint moment is called the CORRELATION,
defined by E[XY].
The correlation of the centered random variables X − E[X] and
Y − E[Y] is called the COVARIANCE of X and Y:

$$\mathrm{Cov}(X,Y) = E\big[(X - E[X])(Y - E[Y])\big]$$

or

$$\mathrm{Cov}(X,Y) = E[XY] - E[X]\,E[Y]$$

The covariance of X and Y, normalized w.r.t. $\sigma_x \sigma_y$, is called the
CORRELATION COEFFICIENT of X and Y:

$$\rho = \frac{\mathrm{Cov}(X,Y)}{\sigma_x \sigma_y}$$

Two RVs X and Y are UNCORRELATED if and only if their
covariance is zero, i.e., if and only if

$$E[XY] = E[X]\,E[Y]$$

We say that they are ORTHOGONAL if and only if their
correlation is zero, i.e., if and only if

$$E[XY] = 0$$
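A short simulation sketch; constructing Y as a linear combination of X and independent noise is an assumption made so that the correlation coefficient (0.6) is known in advance:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500_000

x = rng.normal(size=n)
y = 0.6 * x + 0.8 * rng.normal(size=n)  # unit variance, correlation 0.6 with x

cov = np.mean(x * y) - np.mean(x) * np.mean(y)  # Cov(X,Y) = E[XY] - E[X]E[Y]
rho = cov / (np.std(x) * np.std(y))             # correlation coefficient
print(f"covariance ~ {cov:.3f}, correlation coefficient ~ {rho:.3f}")  # both ~0.6
```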

Transformations of RVs
Question: How to determine the pdf of a RV Y related to another
RV X by the transformation Y = g(X)

Case I: Monotone (one-to-one) Transformations


(one value of x is transformed into one value of y)
Case II: Non-monotone (many-to-one) Transformations
(several values of x can be transformed into one value of y)


Monotone Transformations
Let X be a RV with pdf fX(x) and let Y = g(X) be a monotone
differentiable function of X.
dy is the infinitesimal change in y that occurs due to an
infinitesimal change dx in x.

Both the events (y < Y ≤ y+dy) and (x < X ≤ x+dx) contain the
same outcomes, so the probabilities of these two events must be equal:
P(y < Y ≤ y+dy) = P(x < X ≤ x+dx)
fY(y) dy = fX(x) dx  if g(x) is a monotone increasing function
fY(y) dy = −fX(x) dx  if g(x) is a monotone decreasing function
In either case,

$$f_Y(y)\,|dy| = f_X(x)\,|dx| \quad\Longrightarrow\quad f_Y(y) = f_X(x)\left|\frac{dx}{dy}\right|$$
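A small sketch of this result, using the assumed monotone transformation Y = −ln X of a uniform RV; here x = e^(−y), |dx/dy| = e^(−y), and fX(x) = 1, so the rule predicts fY(y) = e^(−y):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(0.0, 1.0, size=1_000_000)
y = -np.log(x)  # monotone decreasing transformation of X

# Compare a normalized histogram of Y with the predicted pdf e^{-y}
counts, edges = np.histogram(y, bins=50, range=(0.0, 5.0))
width = edges[1] - edges[0]
density = counts / (y.size * width)       # empirical pdf of Y
centers = 0.5 * (edges[:-1] + edges[1:])
print(np.max(np.abs(density - np.exp(-centers))))  # small: matches e^{-y}
```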

Many-to-one Transformations
Let the equation g(x) = y have three roots x1, x2 and x3.

Then g(x + dx) = y + dy has the three roots x1 + dx1, x2 + dx2,
and x3 + dx3.

The event (y < Y ≤ y+dy) occurs when any of the three
events (x1 < X ≤ x1+dx1), (x2 < X ≤ x2+dx2), or (x3 < X ≤ x3+dx3)
occurs, so

$$f_Y(y)\,|dy| = f_X(x_1)\,|dx_1| + f_X(x_2)\,|dx_2| + f_X(x_3)\,|dx_3|$$

[provided dy is infinitesimally small and the three events
involving the random variable X are mutually exclusive]

Example: Consider the transformation Y = cos X, where the RV
X is uniformly distributed in the interval (−π, π). Find the pdf of Y.
Soln.: For −1 < Y ≤ 1, the equation cos x = y has two solutions,
namely

$$x_1 = \cos^{-1} y \quad \text{and} \quad x_2 = -\cos^{-1} y$$

With y = cos x, we have

$$\left|\frac{dy}{dx}\right| = |\sin x| = \sqrt{1 - y^2}$$

The pdf of X is given by

$$f_X(x) = \frac{1}{2\pi}, \qquad -\pi < x \le \pi$$

so, summing over the two roots,

$$f_Y(y) = \frac{f_X(x_1) + f_X(x_2)}{\sqrt{1 - y^2}} = \frac{1}{\pi\sqrt{1 - y^2}}, \qquad -1 < y < 1$$
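A quick simulation sketch confirming this pdf against a histogram; the sample size, bin count, and range (restricted to avoid the integrable singularities at ±1) are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.uniform(-np.pi, np.pi, size=1_000_000)
y = np.cos(x)

counts, edges = np.histogram(y, bins=50, range=(-0.9, 0.9))
width = edges[1] - edges[0]
density = counts / (y.size * width)       # empirical pdf of Y
centers = 0.5 * (edges[:-1] + edges[1:])
analytic = 1.0 / (np.pi * np.sqrt(1.0 - centers**2))
print(np.max(np.abs(density - analytic) / analytic))  # small relative error
```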
