
PROBABILITY AND PROBABILITY DISTRIBUTIONS REVIEW

Topics Outline
Probability of Events
Probability Rules
Random Variables
Probability Distributions
Probability of Events
There are many interpretations of probability. The three most widely used approaches are:
1. Classical method based on the assumption of equally likely events.
Example: Six-sided fair die.
Each side has the same chance of turning up. Therefore, each has a probability 1/6.
2. Empirical method based on experimental or historic data.
Example: Predicting the weather.
A 30% chance of rain today means that it rained on 30% of all days with similar
atmospheric conditions.
3. Subjective method based on judgment, experience, intuition.
Example: Chris and Sally make an offer to purchase a house.
Sally believes that the probability their offer will be accepted is 0.8.
Chris, however, believes that the probability their offer will be accepted is 0.6.
In the case of equally likely events, a convenient way of thinking about probabilities is:
$P(A) = \dfrac{\text{number of outcomes in } A}{\text{number of all outcomes}}$

Probability Rules
No matter which method is used to assign probabilities to events, the following
probability rules (axioms) must hold.
Rule 1. $0 \le P(A) \le 1$ for any event A
That is, the probability of any event A is a number that lies between 0 and 1.
Rule 2. P(all outcomes) = 1
That is, the probability of something happening is 1.
The complement $A^c$ of A is the event consisting of all sample points that are not in A.
That is, $A^c$ is the event that A does not occur.
Rule 3. Complement rule
$P(A^c) = 1 - P(A)$
That is, the probability of an event not occurring is 1 minus the probability that the event does occur.

Rule 4. Addition rule
$P(A \cup B) = P(A) + P(B) - P(A \cap B)$
$A \cup B$ = union = "or" = either A or B or both
$A \cap B$ = intersection = "and" = both A and B
Two events A and B are mutually exclusive (disjoint) if they have no outcomes in common and
so can never happen together.
Addition rule for mutually exclusive events
If A and B are mutually exclusive,
$P(A \cup B) = P(A) + P(B)$
Addition rule for more than two mutually exclusive events:
$P(A_1 \cup A_2 \cup \dots \cup A_n) = P(A_1) + P(A_2) + \dots + P(A_n)$
The conditional probability of the event A given that the event B has already occurred is given by
$P(A \mid B) = \dfrac{P(A \cap B)}{P(B)}$
Rule 5. Multiplication rule
$P(A \cap B) = P(B)\,P(A \mid B)$
Two events A and B are independent if knowledge of the occurrence of one has no influence on
the probability of occurrence of the other, that is
P(A|B) = P(A)
Multiplication rule for independent events
$P(A \cap B) = P(A)P(B)$
Mutually exclusive events with nonzero probabilities are not independent! If two such events are
mutually exclusive, then knowledge of the occurrence of one changes the probability of the other.
(If you know that event B has occurred, you know that event A cannot have occurred.)
How to check independence?
Use any one of the following techniques:
1. Check if A and B are mutually exclusive.
If they are mutually exclusive (and each has nonzero probability), then they are not independent.
2. Check if the multiplication rule for independent events $P(A \cap B) = P(A)P(B)$ holds.
If it holds, then the events are independent.
3. Use the definition of independent events: P(A|B) = P(A)
If it is satisfied, then the events are independent.
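The rules above can be checked by brute-force enumeration on a small sample space. The following is a minimal sketch, assuming one roll of a fair six-sided die; the events A, B, and C and the helper names prob, cond_prob, and independent are hypothetical and chosen only for illustration.

```python
from fractions import Fraction

# Sample space for one roll of a fair six-sided die (classical method).
OUTCOMES = frozenset(range(1, 7))

def prob(event):
    """P(event) = (number of outcomes in event) / (number of all outcomes)."""
    return Fraction(len(event & OUTCOMES), len(OUTCOMES))

def cond_prob(a, b):
    """Conditional probability P(A|B) = P(A and B) / P(B); needs P(B) > 0."""
    return prob(a & b) / prob(b)

def independent(a, b):
    """Multiplication-rule check: P(A and B) == P(A) P(B)."""
    return prob(a & b) == prob(a) * prob(b)

# Hypothetical events, chosen only for illustration.
A = frozenset({2, 4, 6})     # roll is even
B = frozenset({1, 2, 3, 4})  # roll is at most 4
C = frozenset({1, 3, 5})     # roll is odd (mutually exclusive with A)

# Rule 3 (complement rule) and Rule 4 (addition rule) hold exactly.
complement_ok = prob(OUTCOMES - A) == 1 - prob(A)
addition_ok = prob(A | B) == prob(A) + prob(B) - prob(A & B)

print("P(A|B) =", cond_prob(A, B))             # equals P(A) here
print("A, B independent?", independent(A, B))  # True
print("A, C independent?", independent(A, C))  # False: mutually exclusive
```

Exact fractions are used so that the equality checks are not affected by floating-point rounding.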

Example 1
Due to rising health insurance costs, 43 million people in the United States go without health
insurance (Time, December 1, 2003). Sample data representative of the national health insurance
coverage for individuals 18 years of age and older are shown here.
                    Health Insurance
Age                 Yes       No
18 to 34            750       170
35 and over         950       130

a. Develop a joint probability table for these data.


First we calculate the totals in the table above.
                    Health Insurance
Age                 Yes       No        Total
18 to 34            750       170       920
35 and over         950       130       1080
Total               1700      300       2000

Total sample size = 2000.


Dividing each entry by 2000 provides the following joint probability table (or contingency table).
                    Health Insurance
Age                 Yes       No        Total
18 to 34            0.375     0.085     0.46
35 and over         0.475     0.065     0.54
Total               0.850     0.150     1.00

Marginal probabilities are displayed in the margins and represent the probability of one event.
Joint probabilities are displayed in the interior cells and represent probabilities of intersections.
Let A = 18 to 34 age group
B = 35 and over age group
Y = Insurance coverage
N = No insurance coverage
The marginal probabilities are:
P(A) = 0.46, P(B) = 0.54, P(Y) = 0.85, P(N) = 0.15
The joint probabilities are:
$P(A \cap Y) = 0.375$, $P(A \cap N) = 0.085$, $P(B \cap Y) = 0.475$, $P(B \cap N) = 0.065$

b. What is the probability that a randomly selected individual does not have health insurance
coverage?

c. If the individual is between the ages of 18 and 34,
what is the probability that the individual does not have health insurance coverage?

d. If the individual does not have health insurance coverage,
what is the probability that the individual is in the 18 to 34 age group?

e. Are the events A and N independent?

f. What does the probability information tell you about the health insurance coverage in the United
States?


Solution:
b. What is the probability that a randomly selected individual does not have health insurance
coverage?
P(N) = 0.15
c. If the individual is between the ages of 18 and 34,
what is the probability that the individual does not have health insurance coverage?
$P(N \mid A) = \dfrac{P(N \cap A)}{P(A)} = \dfrac{0.085}{0.46} = 0.1848$

d. If the individual does not have health insurance coverage,
what is the probability that the individual is in the 18 to 34 age group?
$P(A \mid N) = \dfrac{P(A \cap N)}{P(N)} = \dfrac{0.085}{0.15} = 0.5667$

Please note that the probabilities in (c) and (d) are different.
e. Are the events A and N independent?
$P(A \cap N) = 0.085$
$P(A)P(N) = (0.46)(0.15) = 0.069$
Since $0.085 \ne 0.069$, A and N are not independent.
Or, equivalently,
P(A|N) = 0.5667
P(A) = 0.46
Since $0.5667 \ne 0.46$, A and N are not independent.
f. What does the probability information tell you about the health insurance coverage in the
United States?
Probability of no health insurance coverage is 0.15.
A higher probability for no insurance exists for the younger population:
0.1848 (or approximately 18.5%) versus 0.1204 (or approximately 12%).
Of the no insurance group, more are in the 18 to 34 age group:
0.5667, or approximately 57% are ages 18 to 34.
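
The calculations in Example 1 can be reproduced from the raw counts. This is a minimal sketch, not the method used in the notes (which rely on a hand-built table); the dictionary layout and the helper name marginal are assumptions made for illustration.

```python
import math

# Sample counts from Example 1: (age group, insurance status) -> count.
# A = 18 to 34, B = 35 and over, Y = insured, N = not insured.
counts = {
    ("A", "Y"): 750, ("A", "N"): 170,
    ("B", "Y"): 950, ("B", "N"): 130,
}
total = sum(counts.values())

# Joint probability table: each count divided by the sample size.
joint = {cell: n / total for cell, n in counts.items()}

def marginal(label):
    """Sum the joint probabilities over every cell that contains the label."""
    return sum(p for cell, p in joint.items() if label in cell)

# Conditional probabilities: joint probability / marginal of the given event.
p_n_given_a = joint[("A", "N")] / marginal("A")   # part (c)
p_a_given_n = joint[("A", "N")] / marginal("N")   # part (d)
p_n_given_b = joint[("B", "N")] / marginal("B")   # used in part (f)

# Independence check from part (e): is P(A and N) equal to P(A) P(N)?
a_n_independent = math.isclose(joint[("A", "N")], marginal("A") * marginal("N"))

print(round(p_n_given_a, 4), round(p_a_given_n, 4), round(p_n_given_b, 4))
print("A and N independent?", a_n_independent)
```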


Random Variables
A random variable is a variable taking numerical values determined by the outcome of a
random phenomenon.
Example: Toss two coins and let X stand for the number of heads: 0, 1, or 2.
Outcome    TT    HT    TH    HH
X          0     1     1     2

A random variable can be classified as being either discrete or continuous depending on the
numerical values it assumes.
Discrete random variables can take only a finite number, or a countable infinity of values.
For example, the random variable X defined above can assume only three values (0, 1, or 2),
so it is discrete.
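The two-coin example can be enumerated directly. The sketch below assumes both coins are fair (the notes do not state this), so that all four outcomes are equally likely; the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All outcomes of tossing two coins: HH, HT, TH, TT.
# Assumption: the coins are fair, so the outcomes are equally likely.
outcomes = ["".join(toss) for toss in product("HT", repeat=2)]

# X = number of heads in each outcome.
counts = Counter(outcome.count("H") for outcome in outcomes)

# Probability of each value of X under the classical method.
dist = {x: Fraction(n, len(outcomes)) for x, n in sorted(counts.items())}

for x, p in dist.items():
    print(f"P(X = {x}) = {p}")
```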
The random variable
X = {# of customers at a gas station for one day}
is a discrete random variable which can take an infinite sequence of values (0, 1, 2, and so on).
Continuous random variables may take any numerical value in an interval or collection of
intervals. For example,
X = {height of students in this class}
is a continuous random variable.
In general, continuous random variables represent measured data, such as height, weight,
temperature, distance, or time, whereas discrete random variables represent count data,
such as the number of defectives in a sample of 50 items or the number of cars arriving at
a tollbooth during a one-day period.
One way to determine whether a random variable is discrete or continuous is to think of the
values of the random variable as points on a line segment. Choose two points representing values
of the random variable. If the entire line segment between the two points also represents possible
values for the random variable, then the random variable is continuous.


Probability Distributions
The probability distribution function f(x) for a discrete random variable X provides the probability
for each value of the random variable. A given assignment of probabilities produces a valid
probability distribution if it satisfies the following two rules:
Rule 1. $f(x) \ge 0$ for all x
Rule 2. $\sum_{\text{all } x} f(x) = 1$

To calculate probabilities involving continuous random variables we use a special function
called a density function. The probability that the continuous random variable X takes on a value
in the interval [a, b] is equal to the area under the graph of the probability density function f(x)
over the interval [a, b]. The density function must satisfy the following two rules:
Rule 1. $f(x) \ge 0$ for all x
That is, it is always on or above the horizontal axis.
Rule 2. $\int_{\text{all } x} f(x)\,dx = 1$

That is, the total area under the graph of f(x) is equal to 1.
Note the similarities between these conditions and those for a probability distribution function of a
discrete random variable. However, there are important differences between the two kinds of
probability functions. Note that for a continuous random variable:
1. P(X = c) = 0 for all c
That is, the probability of any given point equals zero.
(Because P(X = c) is the area of the line segment over the point c and the area of a line segment is zero.)
2. It follows that in the continuous case
$P(a \le X \le b) = P(a < X \le b) = P(a \le X < b) = P(a < X < b)$
3. f(x) may exceed a value of 1.
The calculation of the expected value or mean and variance for a continuous random variable is
analogous to that for a discrete random variable. The difference is that instead of summations we use
integrals:
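Discrete random variable:
$E(X) = \mu = \sum_{\text{all } x} x f(x)$
$\mathrm{Var}(X) = \sigma^2 = \sum_{\text{all } x} (x - \mu)^2 f(x)$

Continuous random variable:
$E(X) = \mu = \int_{\text{all } x} x f(x)\,dx$
$\mathrm{Var}(X) = \sigma^2 = \int_{\text{all } x} (x - \mu)^2 f(x)\,dx$

The standard deviation in both cases is $\sigma = \sqrt{\sigma^2}$.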

Example 2
A psychologist determined that the number of sessions required to obtain the trust of a new patient is
either 1, 2, or 3. Let X be a random variable indicating the number of sessions required to gain the
patient's trust. The following probability function has been proposed.
$f(x) = \dfrac{x}{6}$ for x = 1, 2, or 3
(a) Is this probability function valid? Explain.
x                      f(x)
(Values which          (Associated
X can take)            probabilities)
1                      1/6
2                      2/6
3                      3/6
It is a valid probability distribution since
1. $f(x) \ge 0$ for all x
2. f(1) + f(2) + f(3) = 1/6 + 2/6 + 3/6 = 1
(b) What is the probability that it takes exactly 2 sessions to gain the patient's trust?
f(2) = 2/6 = 0.333
(c) What is the probability that it takes at least 2 sessions to gain the patient's trust?
f(2) + f(3) = 2/6 + 3/6 = 5/6 = 0.833
(d) What is the expected number of sessions required to obtain the trust of a new patient?
$E(X) = \mu = \sum_{\text{all } x} x f(x) = (1)\frac{1}{6} + (2)\frac{2}{6} + (3)\frac{3}{6} = \frac{1}{6} + \frac{4}{6} + \frac{9}{6} = \frac{14}{6} = 2.333$
(e) What is the variance of the number of sessions required to obtain the trust of a new patient?
$\mathrm{Var}(X) = \sigma^2 = \sum_{\text{all } x} (x - \mu)^2 f(x) = \left(1 - \frac{14}{6}\right)^2 \frac{1}{6} + \left(2 - \frac{14}{6}\right)^2 \frac{2}{6} + \left(3 - \frac{14}{6}\right)^2 \frac{3}{6} = \frac{5}{9} = 0.556$
(f) What is the standard deviation of the number of sessions required to obtain the trust of a new patient?
$\sigma = \sqrt{\sigma^2} = \sqrt{\frac{5}{9}} = 0.745$
Using Excel:
x      f(x)     xf(x)    x-μ       (x-μ)^2    (x-μ)^2f(x)
1      0.167    0.167    -1.333    1.778      0.296
2      0.333    0.667    -0.333    0.111      0.037
3      0.500    1.500     0.667    0.444      0.222
       μ =      2.333              σ^2 =      0.556
                                   σ =        0.745
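
The spreadsheet columns above can also be computed in a few lines of code. This is a minimal sketch using exact fractions; the variable names are illustrative and not part of the original example.

```python
from fractions import Fraction
from math import sqrt

# Proposed probability function from Example 2: f(x) = x/6 for x = 1, 2, 3.
f = {x: Fraction(x, 6) for x in (1, 2, 3)}

# Validity: every f(x) is nonnegative and the probabilities sum to 1.
is_valid = all(p >= 0 for p in f.values()) and sum(f.values()) == 1

# Expected value: sum of x * f(x) over all x.
mean = sum(x * p for x, p in f.items())

# Variance: sum of (x - mean)^2 * f(x) over all x.
variance = sum((x - mean) ** 2 * p for x, p in f.items())

# Standard deviation: square root of the variance (converted to float).
std_dev = sqrt(variance)

print(is_valid, mean, variance, round(std_dev, 3))
```

Here mean and variance stay as exact fractions (14/6 reduces to 7/3), which makes it easy to compare against the hand calculation.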

Binomial Distribution
Suppose we have a random process with just two possible outcomes, for example:
tossing a coin (heads or tails)
football game (win or loss)
auto smog inspection (pass or fail)
If the following properties are present we say the random process is a binomial experiment:
1. The experiment consists of a sequence of n identical trials.
2. The result of each trial may be either success or failure.
3. At each trial, the probability of a success is equal to p, and the probability of a failure is equal to $1 - p$.
4. The trials are independent.
(That is, the outcome of one trial has no influence on later outcomes.)
The binomial random variable is defined as
X = number of successes in n trials
The probability of having x successes in n trials is given by the binomial distribution function:
$f(x) = \binom{n}{x} p^x (1 - p)^{n - x}$
where
n = number of trials
x = number of successes; x = 0, 1, 2, ..., n
p = probability of success
Recall: The binomial coefficient "n choose x" is equal to
$\binom{n}{x} = \dfrac{n!}{x!\,(n - x)!}$
and represents the number of ways to choose x successes in a sequence of n observations.

Factorial: $n! = n(n - 1)(n - 2) \cdots (2)(1)$
0! = 1
Note that X is a discrete random variable that can assume any of the values 0, 1, 2, ..., n.
It can be shown that in the case of a binomial random variable, the general formulas for the
expected value, variance, and standard deviation simplify to the following:
$E(X) = \mu = np$
$\mathrm{Var}(X) = \sigma^2 = np(1 - p)$
$\sigma = \sqrt{np(1 - p)}$

Example 3
Suppose that the likelihood that someone who logs onto a particular site in a shopping mall on the
World Wide Web will purchase an item is 0.2.
If the site has 5 people accessing it in the next minute, what is the probability that
(a) exactly 2 individuals will purchase an item?
Let X = number of individuals who will purchase an item.
$P(X = 2) = f(2) = \binom{5}{2}(0.2)^2(1 - 0.2)^{5 - 2} = \dfrac{5!}{2!\,(5 - 2)!}(0.2)^2(0.8)^3 = 0.2048$
Or using Excel,
P(X=2) = f(2) = BINOMDIST(2,5,0.2,FALSE) = 0.2048
(b) at most 2 individuals will purchase an item?
$P(X \le 2) = P(X = 0) + P(X = 1) + P(X = 2)$
$= \binom{5}{0}(0.2)^0(0.8)^5 + \binom{5}{1}(0.2)^1(0.8)^4 + \binom{5}{2}(0.2)^2(0.8)^3$
= 0.32768 + 0.4096 + 0.2048 = 0.94208
Using Excel,
$P(X \le 2)$ = BINOMDIST(2,5,0.2,TRUE) = 0.94208
(c) more than 2 individuals will purchase an item?
The event X > 2 is the complement of the event $X \le 2$.
Because all the probabilities in a probability distribution must sum to 1,
$P(X > 2) = 1 - P(X \le 2) = 1 - 0.94208 = 0.05792$

(d) On average, how many individuals will purchase an item?
$E(X) = \mu = np = (5)(0.2) = 1$

(e) What is the variance of the number of individuals who will purchase an item?
$\sigma^2 = np(1 - p) = (5)(0.2)(1 - 0.2) = 0.8$

(f) What is the standard deviation of the number of individuals who will purchase an item?
$\sigma = \sqrt{\sigma^2} = \sqrt{0.8} = 0.8944$
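
As an alternative to Excel's BINOMDIST, the binomial calculations in Example 3 can be sketched with the standard library. The function names binom_pmf and binom_cdf are hypothetical helpers written for this illustration, not library functions.

```python
from math import comb, sqrt

def binom_pmf(x, n, p):
    """Binomial probability f(x) = C(n, x) p^x (1 - p)^(n - x)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def binom_cdf(x, n, p):
    """Cumulative probability P(X <= x), summing f(0) through f(x)."""
    return sum(binom_pmf(k, n, p) for k in range(x + 1))

n, p = 5, 0.2  # 5 visitors, each purchasing with probability 0.2

p_exactly_2 = binom_pmf(2, n, p)     # part (a), about 0.2048
p_at_most_2 = binom_cdf(2, n, p)     # part (b), about 0.94208
p_more_than_2 = 1 - p_at_most_2      # part (c), about 0.05792

mean = n * p                         # part (d)
variance = n * p * (1 - p)           # part (e)
std_dev = sqrt(variance)             # part (f)

print(round(p_exactly_2, 4), round(p_at_most_2, 5), round(p_more_than_2, 5))
print(mean, round(variance, 4), round(std_dev, 4))
```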

Uniform Distribution
A continuous random variable X is said to be uniformly distributed over the interval [a, b] if its
probability density function is given by
$f(x) = \begin{cases} \dfrac{1}{b - a} & \text{for } a \le x \le b \\ 0 & \text{elsewhere} \end{cases}$

The expected value, variance, and standard deviation are given by


$E(X) = \mu = \dfrac{a + b}{2}$
$\mathrm{Var}(X) = \sigma^2 = \dfrac{(b - a)^2}{12}$
$\sigma = \sqrt{\sigma^2} = \sqrt{\dfrac{(b - a)^2}{12}}$

Possible applications:
1. It is used as a first model for a quantity that is felt to be randomly varying between a and b
but about which little else is known.
2. It is essential in generating random values from all other distributions.


Example 4
The time X it takes to build a laser printer is thought to be uniformly distributed between 7 and 15 hours.
(a) Plot the density curve of X.
The height should be $\frac{1}{8}$ because the area under the curve must be 1.

(b) What are the chances that it will take more than 10 hours to build a printer?
$P(X > 10)$ = area of the rectangle with base 5 (= 15 - 10) and height $\frac{1}{8}$
$= \frac{1}{8}(5) = \frac{5}{8} = 0.625$ or 62.5%

(c) Determine the probability that it will take between 12 and 14 hours to build a printer.
$P(12 < X < 14)$ = area of the rectangle with base 2 (= 14 - 12) and height $\frac{1}{8}$
$= \frac{1}{8}(2) = \frac{2}{8} = 0.25$

(d) How likely is it that a printer will require fewer than 9 hours?
$P(X < 9) = \frac{1}{8}(9 - 7) = \frac{2}{8} = 0.25$

(e) What is the probability that it will take exactly 11 hours to build a printer?
P(X = 11) = 0
(f) Find the average time required to build a printer.
$E(X) = \mu = \dfrac{a + b}{2} = \dfrac{7 + 15}{2} = 11$ hours

(g) What is the standard deviation of the time required to build a printer?

$\sigma = \sqrt{\dfrac{(b - a)^2}{12}} = \sqrt{\dfrac{(15 - 7)^2}{12}} = \sqrt{\dfrac{64}{12}} = 2.31$ hours
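
The rectangle-area reasoning in Example 4 can be written as a small cumulative-probability function. This is a minimal sketch; uniform_cdf is a hypothetical helper name, not a library function.

```python
from math import sqrt

def uniform_cdf(x, a, b):
    """P(X <= x) for X uniform on [a, b]: area of the rectangle left of x."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)   # base (x - a) times height 1 / (b - a)

a, b = 7, 15  # build time is uniform between 7 and 15 hours

p_more_than_10 = 1 - uniform_cdf(10, a, b)                   # part (b)
p_12_to_14 = uniform_cdf(14, a, b) - uniform_cdf(12, a, b)   # part (c)
p_less_than_9 = uniform_cdf(9, a, b)                         # part (d)

mean = (a + b) / 2                  # part (f)
std_dev = sqrt((b - a) ** 2 / 12)   # part (g)

print(p_more_than_10, p_12_to_14, p_less_than_9)
print(mean, round(std_dev, 2))
```

Because P(X = c) = 0 for a continuous variable, the same function answers both "less than" and "at most" questions.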

Normal Distribution
The most important probability distribution in the entire field of statistics is the normal or
Gaussian distribution. It was discovered by De Moivre in 1733 and reintroduced by Gauss near
the beginning of the 19th century. The density function of the normal distribution with mean $\mu$
and standard deviation $\sigma$ is given by
$f(x) = \dfrac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}}$
where $\pi \approx 3.14159$ and $e \approx 2.71828$ is the base of the natural logarithm.

Note that:
1. The normal curve is single-peaked, bell-shaped, and symmetric about the mean $\mu$.
2. $\mu$ can be any real number: positive, negative, or 0.
3. f(x) has a maximum at $\mu$, and the maximum value of the density function is $f(\mu) = \dfrac{1}{\sigma\sqrt{2\pi}}$.
4. $\sigma$ is the distance from $\mu$ to the change-of-curvature points on either side;
$\sigma$ determines how widely spread the distribution will be; larger $\sigma$ implies a more
dispersed curve.
5. The normal curve approaches the horizontal axis asymptotically as we proceed in either
direction from the mean.
We abbreviate the normal distribution with mean $\mu$ and standard deviation $\sigma$ as N($\mu$, $\sigma$).
Once $\mu$ and $\sigma$ are specified, the normal curve is completely determined.
For example, if $\mu = 7$ and $\sigma = 5$, then the ordinates $f(x) = \dfrac{1}{5\sqrt{2\pi}}\, e^{-\frac{(x - 7)^2}{50}}$ can easily be computed
for various values of x and the curve drawn.
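
To illustrate how the ordinates might be computed, here is a minimal sketch that evaluates the density for a mean of 7 and a standard deviation of 5. The function name normal_pdf and the grid of x values are assumptions made for this illustration.

```python
from math import exp, pi, sqrt

def normal_pdf(x, mu, sigma):
    """Normal density: exp(-(x - mu)^2 / (2 sigma^2)) / (sigma sqrt(2 pi))."""
    return exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * sqrt(2 * pi))

mu, sigma = 7, 5

# Ordinates on an illustrative grid from mu - 3 sigma to mu + 3 sigma.
ordinates = {x: normal_pdf(x, mu, sigma) for x in range(-8, 23, 5)}

for x, height in ordinates.items():
    print(f"f({x}) = {height:.5f}")
```

The largest ordinate occurs at x = 7, and ordinates at equal distances from 7 match, which agrees with the symmetry and maximum noted in the list above.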

Standard Normal Distribution


The standard normal distribution N(0,1) is a normal distribution with mean $\mu = 0$ and standard
deviation $\sigma = 1$. The density curve of the standard normal distribution is given by
$f(z) = \dfrac{1}{\sqrt{2\pi}}\, e^{-\frac{z^2}{2}}$
Any normal curve N($\mu$, $\sigma$) with mean $\mu$ and standard deviation $\sigma$ can be converted to the
standard normal curve N(0,1) using the formula
$z = \dfrac{x - \mu}{\sigma}$

This equation rescales any normal distribution axis from its true units (time, weight, dollars,
barrels, and so forth) to the standard measure referred to as a z-score.
Thus, any observation x from a normal distribution can be represented by a
unique z-score. The z-score represents the number of standard deviations that a data value x is
away from the distribution mean $\mu$.

Please study the formula in words:
$z\text{-score} = \dfrac{\text{variable} - \text{mean}}{\text{standard deviation}}$


Calculating Normal Probabilities


Normal probabilities can be calculated using a table of the standard normal distribution or
software. We will be using a table and Excel. The table gives the area under the standard normal
curve to the left of a value z.
There are two types of calculations: forward and backward (or inverse) calculations. Forward
calculations are used to find probabilities (areas under the curve) given a value of the normal variable.
Backward calculations are used to find a value of the normal variable given the probability.

Forward Calculations
(Finding probabilities)
Example 5
An electrical firm manufactures light bulbs that have a length of life that is normally distributed
with mean equal to 800 hours and a standard deviation of 40 hours.
Find the probability that a given bulb burns:
(a) in less than 700 hours
(b) between 778 and 834 hours
(c) after 850 hours
(d) exactly 850 hours
(e) What is the value of the density function at 850 hours?

Solution: $\mu = 800$, $\sigma = 40$
The formula for finding z-scores is $z = \dfrac{x - \mu}{\sigma}$
(a) $z = \dfrac{700 - 800}{40} = -2.5$
The area left of (-2.5) is 0.0062.

The calculations if we use Excel instead of a table are:


P(X < 700) = NORMDIST(700,800,40,TRUE) = 0.00621
The proportion of bulbs that burn in less than 700 hours is 0.0062, or 0.62%.
(b) $z_{834} = \dfrac{834 - 800}{40} = 0.85$
$z_{778} = \dfrac{778 - 800}{40} = -0.55$
area for X between 778 and 834 = area for x left of 834 - area for x left of 778
= area for z left of 0.85 - area for z left of (-0.55)
= 0.8023 - 0.2912
= 0.5111
Calculations if Excel is used:
P(778 < X < 834) = P(X < 834) - P(X < 778)
= NORMDIST(834,800,40,TRUE) - NORMDIST(778,800,40,TRUE)
= 0.8023 - 0.2912
= 0.5111
The proportion of bulbs that burn between 778 and 834 hours is 0.5111, or 51.11%.

(c) $z = \dfrac{850 - 800}{40} = 1.25$
area right of 1.25 = 1 - area left of 1.25
= 1 - 0.8944
= 0.1056
Excel calculations:
$P(X > 850) = 1 - P(X \le 850)$
= 1 - NORMDIST(850,800,40,TRUE)
= 1 - 0.8944
= 0.1056
The proportion of bulbs that burn after 850 hours is 0.1056, or 10.56%.
(d) The answer is 0, since the area under a smooth curve exactly over the single point 850 is zero.

(e) What is the value of the density function at 850 hours?


f(850) = NORMDIST(850,800,40,FALSE) = 0.0046
The value of the density function is 0.0046 and it represents the height of the distribution at
the point 850.
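
For readers without Excel, the forward calculations in Example 5 can be sketched with Python's standard library, where NormalDist.cdf plays the role of NORMDIST(..., TRUE) and NormalDist.pdf the role of NORMDIST(..., FALSE). The variable names are illustrative.

```python
from statistics import NormalDist

# Bulb life: normal with mean 800 hours and standard deviation 40 hours.
life = NormalDist(mu=800, sigma=40)

# (a) P(X < 700): area to the left of 700.
p_less_700 = life.cdf(700)

# (b) P(778 < X < 834): difference of two left-tail areas.
p_778_to_834 = life.cdf(834) - life.cdf(778)

# (c) P(X > 850): complement of the left-tail area.
p_after_850 = 1 - life.cdf(850)

# (e) Height of the density curve at 850 (not a probability).
density_850 = life.pdf(850)

# The same answer to (a) via the z-score and the standard normal curve.
z_700 = (700 - life.mean) / life.stdev
p_less_700_via_z = NormalDist().cdf(z_700)

print(round(p_less_700, 5), round(p_778_to_834, 4),
      round(p_after_850, 4), round(density_850, 4))
```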

Backward (inverse) calculations


(Finding cut-off points)
Example 5 (continued)
(f) One percent of the bulbs will have a life expectancy of at most how many hours?
(g) One percent of the bulbs will have a life expectancy of at least how many hours?

Solution:
(f) The point $z = \dfrac{x - 800}{40}$ cuts off 1%, or 0.01, in the lower tail of the standard normal distribution.

Using the table backward, we find that the entry closest to 0.01 corresponds to z = -2.33.
Substituting z in the above equation and solving for x we get
$-2.33 = \dfrac{x - 800}{40}$
$(40)(-2.33) = x - 800$
$x = 800 + (40)(-2.33) = 706.80 \approx 707$
Note: To find x, we can also use the formula
$x = \mu + z\sigma$
$x = 800 + (-2.33)(40) = 706.80 \approx 707$
Or, using Excel:
x = NORMINV(0.01,800,40) = 706.95, or approximately 707

Thus, 1% of the bulbs will have a life expectancy of at most 707 hours.

(g) The point $z = \dfrac{x - 800}{40}$ cuts off 1%, or 0.01, in the upper tail of the standard normal distribution.
This means that the area to the left of this point is 99%, or 0.99.

Using the table backward, we find that the entry closest to 0.99 corresponds to z = 2.33.
(Note: We could also find the z-value using the result from part (f) and the symmetry
of the standard normal distribution.)
Substituting z in the above equation and solving for x we get
$2.33 = \dfrac{x - 800}{40}$
$(40)(2.33) = x - 800$
$x = 800 + (40)(2.33) = 893.20 \approx 893$
Or, using the formula
$x = \mu + z\sigma$
$x = 800 + (40)(2.33) = 893.20 \approx 893$
Using Excel,
x = NORMINV(0.99,800,40) = 893.05, or approximately 893

Therefore, 1% of the bulbs will have a life expectancy of at least 893 hours.
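
The backward calculations can likewise be sketched with NormalDist.inv_cdf, which corresponds to Excel's NORMINV. The variable names are illustrative; the small difference from the table-based answers (706.80 and 893.20) comes from rounding z to 2.33.

```python
from statistics import NormalDist

life = NormalDist(mu=800, sigma=40)

# (f) Cut-off with 1% of the area to its left (lower tail).
lower_cutoff = life.inv_cdf(0.01)

# (g) Cut-off with 1% of the area to its right, i.e. 99% to its left.
upper_cutoff = life.inv_cdf(0.99)

# The unrounded z-value behind the table entry z = -2.33.
z_lower = NormalDist().inv_cdf(0.01)

# Same cut-off from the formula x = mu + z * sigma.
lower_via_formula = life.mean + z_lower * life.stdev

print(round(lower_cutoff, 2), round(upper_cutoff, 2), round(z_lower, 3))
```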

