
Chapter 6

Normal Distribution
The most important type of random variable is
the normal, or Gaussian, random variable, which has
a normal distribution. In fact, the binomial
distribution can be approximated by the normal
distribution.
Note that a normal (Gaussian) random variable is
continuous.

Graph of Normal Probability Distribution
Let $\mu$ and $\sigma$ be the mean and standard deviation
of the given population. Recall that a normal
(Gaussian) random variable is continuous. Its
distribution function is also continuous on the
set of all real numbers. Therefore, its graph is
continuous on the entire real line.
The normal (Gaussian) graph is shown on
the next slide.

The Normal Curve

Properties of a Normal Curve


The notable features of a normal distribution (density) curve are
as follows:
a. The curve is bell-shaped with the highest point (the mode) at
the mean $\mu$.
b. It is symmetrical about the mean.
c. The curve is always above the horizontal axis. In other words,
the curve approaches the horizontal axis (an asymptote) but never
touches or crosses it.
d. It has two inflection points, at $\mu - \sigma$ and $\mu + \sigma$.
e. The area bounded by the normal curve, the horizontal axis, and two
vertical lines is the probability that the normal random
variable falls in the interval determined by the two vertical
lines.

The Normal Distribution Function


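Moreover, the normal distribution curve, $y = f(x)$, is
in fact the normal density function. Its mathematical
representation is given by

$$f(x \mid \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2}$$

where $-\infty < x < \infty$, $-\infty < \mu < \infty$, and $\sigma > 0$.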
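As a rough illustration of the density formula, the following Python sketch evaluates the normal density at a few points. The function name `normal_pdf` and the values $\mu = 0$, $\sigma = 1$ are assumptions chosen for illustration; they do not come from the slides.

```python
import math


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Normal density f(x | mu, sigma); sigma must be positive."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


# Illustrative values only (mu = 0 and sigma = 1 are assumptions).
peak = normal_pdf(0.0, 0.0, 1.0)    # height of the curve at the mean
left = normal_pdf(-1.0, 0.0, 1.0)   # one standard deviation below the mean
right = normal_pdf(1.0, 0.0, 1.0)   # one standard deviation above the mean
print(peak, left, right)
```

The highest value occurs at the mean, and the values one standard deviation on either side agree, which matches the mode and symmetry properties of the curve.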

Effect of the mean and variance on the normal curve
[Figure panels: fixed $\sigma^2$, varying $\mu$; varying $\sigma^2$, fixed $\mu$]

Empirical Rule
For a distribution that is symmetrical and bell-shaped
(in particular, for a normal distribution):
Approximately 68% of the data fall in the interval
$(\mu - \sigma,\ \mu + \sigma)$
Approximately 95% of the data fall in the interval
$(\mu - 2\sigma,\ \mu + 2\sigma)$
Approximately 99.7% of the data fall in the interval
$(\mu - 3\sigma,\ \mu + 3\sigma)$

Empirical Rule
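In fact, from this empirical rule one can conclude that the
probabilities of the events
i. $\mu - \sigma < x < \mu + \sigma$
ii. $\mu - 2\sigma < x < \mu + 2\sigma$
iii. $\mu - 3\sigma < x < \mu + 3\sigma$
are approximately 0.68, 0.95, and 0.997, respectively.
That is,
$P(|x - \mu| < 1\sigma) \approx 0.68$
$P(|x - \mu| < 2\sigma) \approx 0.95$
$P(|x - \mu| < 3\sigma) \approx 0.997$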
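The three empirical-rule probabilities can be checked numerically. The sketch below uses the standard identity $P(|x - \mu| \le k\sigma) = \operatorname{erf}(k/\sqrt{2})$ for a normal variable; the helper name `prob_within` is hypothetical.

```python
import math


def prob_within(k: float) -> float:
    """P(|x - mu| <= k * sigma) for a normal variable, via the error function."""
    return math.erf(k / math.sqrt(2.0))


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {prob_within(k):.4f}")
```

The computed values round to the 68%, 95%, and 99.7% figures quoted by the empirical rule.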

Graphical Representation of the Empirical Rule

Control Charts
A control chart is used to examine data over a
period of equally spaced time intervals.
For a given random variable X, the control chart
is a plot of the observed values of X = x in time
sequence order.


Procedure for Making Control Chart for Random Variable X

Example 1: Graphing Control Charts

Inferences about data using control chart

Out-of-Control Signal I:
One point beyond the three standard deviation level,
either above or below the center line (mean, $\mu$).

Out-of-Control Signal II:
A run of nine consecutive points on one side of the
center line (mean, $\mu$).

Out-of-Control Signal III:
At least two of three consecutive points beyond the
two standard deviation level on the same side of the
center line (mean, $\mu$).
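The three out-of-control signals can be sketched as a simple scan over the observations. This is a minimal illustration, not the procedure from the slides: the function name, the choice to report the index of the last point in each window, and the sample data ($\mu = 10$, $\sigma = 2$) are all assumptions.

```python
from typing import Dict, List


def out_of_control_signals(values: List[float], mu: float, sigma: float) -> Dict[str, List[int]]:
    """Return, per signal, the indices (last point of the window) where it fires."""
    signals: Dict[str, List[int]] = {"I": [], "II": [], "III": []}
    for i, x in enumerate(values):
        # Signal I: one point beyond three standard deviations from the mean.
        if abs(x - mu) > 3 * sigma:
            signals["I"].append(i)
        # Signal II: nine consecutive points on one side of the center line.
        if i >= 8:
            window = values[i - 8 : i + 1]
            if all(v > mu for v in window) or all(v < mu for v in window):
                signals["II"].append(i)
        # Signal III: at least two of three consecutive points beyond two
        # standard deviations on the same side of the center line.
        if i >= 2:
            window = values[i - 2 : i + 1]
            above = sum(v > mu + 2 * sigma for v in window)
            below = sum(v < mu - 2 * sigma for v in window)
            if above >= 2 or below >= 2:
                signals["III"].append(i)
    return signals


# Hypothetical observations with assumed mu = 10 and sigma = 2.
data = [10.5, 9.0, 17.0, 10.2, 14.5, 9.8, 14.2]
result = out_of_control_signals(data, mu=10.0, sigma=2.0)
print(result)
```

With these made-up values, the third point lies beyond the three standard deviation level, and two windows of three points each contain two values above the two standard deviation level.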

Graphical Illustration of Out-of-Control Signal I

Probability of Out-of-Control Signal I Using the Empirical Rule
[Figure: control chart of sample mean versus trial]
Empirically, $P(|x - \mu| > 3\sigma) \approx 1 - 0.997 = 0.003$

Graphical Illustration of Out-of-Control Signal II

Probability of Out-of-Control Signal II Using the Empirical Rule
[Figure: control chart of sample mean versus trial]
$P(\text{nine on one given side of the mean}) = (0.5)^9 \approx 0.002$
$P(\text{nine on either side of the mean}) = 2 \times 0.002 = 0.004$

Graphical Illustration of Out-of-Control Signal III

Probability of Out-of-Control Signal III Using the Empirical Rule
[Figure: control chart of sample mean versus trial]

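Probability of Signal III (cont.)

Empirically, the probability that a value lies more than two
standard deviations above the mean is
$P(x > \mu + 2\sigma) \approx \frac{1 - 0.95}{2} = 0.025$
(more accurately, 0.0228 from the standard normal table).

$P(\text{at least two out of three data values more than two standard deviations above the mean})$
$= {}_3C_2\,(0.025)^2 (0.975)^1 + {}_3C_3\,(0.025)^3 (0.975)^0 \approx 0.0018$

$P(\text{at least two out of three data values more than two standard deviations above or below the mean})$
$= 2 \times 0.0018 = 0.0036 \approx 0.004$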

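Summary of Signals Probabilities
Using the empirical rule:
$P(\text{Signal I}) \approx 1 - 0.997 = 0.003$
$P(\text{Signal II}) = 2\,(0.5)^9 \approx 0.004$
$P(\text{Signal III}) \approx 2\left[{}_3C_2\,(0.025)^2 (0.975)^1 + {}_3C_3\,(0.025)^3 (0.975)^0\right] \approx 0.004$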
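The arithmetic behind the three signal probabilities can be reproduced in a few lines. This is only a sketch using the empirical-rule figures (0.997 and 0.95) quoted on the slides; the variable names are illustrative.

```python
from math import comb

# Signal I: one point beyond three standard deviations (either side).
p_signal_1 = 1 - 0.997

# Signal II: nine consecutive points on one side, doubled for either side.
p_signal_2 = 2 * 0.5 ** 9

# Signal III: at least two of three points beyond two standard deviations
# on the same side, using the empirical tail probability (1 - 0.95) / 2.
p_tail = (1 - 0.95) / 2
p_one_side = sum(comb(3, k) * p_tail ** k * (1 - p_tail) ** (3 - k) for k in (2, 3))
p_signal_3 = 2 * p_one_side

print(round(p_signal_1, 3), round(p_signal_2, 4), round(p_signal_3, 4))
```

Doubling the unrounded one-sided value gives about 0.0037 for Signal III, while doubling the rounded 0.0018 gives 0.0036; both round to 0.004.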

Chebyshev's Theorem*
For any set of data (population or sample) with
sample size greater than 1, regardless of the
distribution of the data set, the proportion of the
data that must lie within $k$ standard deviations on
either side of the mean is at least
$1 - \frac{1}{k^2}$, for $k > 1$.

Results of Chebyshev's Theorem
According to Chebyshev's Theorem, for any set of data, the
proportion (percentage) of data within the given number
of standard deviations of the mean is bounded as follows:
At least 75% of the data fall in the interval $(\mu - 2\sigma,\ \mu + 2\sigma)$
At least 88.9% of the data fall in the interval $(\mu - 3\sigma,\ \mu + 3\sigma)$
At least 93.8% of the data fall in the interval $(\mu - 4\sigma,\ \mu + 4\sigma)$
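The percentages above follow from the bound $1 - 1/k^2$. A minimal sketch, with a hypothetical helper name:

```python
def chebyshev_lower_bound(k: float) -> float:
    """Minimum proportion of data within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("the bound is informative only for k > 1")
    return 1.0 - 1.0 / (k * k)


for k in (2, 3, 4):
    print(f"k = {k}: at least {100 * chebyshev_lower_bound(k):.1f}%")
```

Unlike the empirical rule, these are guaranteed lower bounds for any distribution, which is why they are noticeably smaller than the normal-curve percentages.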

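The Normal Distribution

Probability Density Function (PDF):
$$f(x \mid \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2}$$
where the quantity $\frac{x - \mu}{\sigma}$ is the z score.

Cumulative Probability Distribution Function (CDF):
$$F(x \mid \mu, \sigma) = \int_{-\infty}^{x} f(t \mid \mu, \sigma)\,dt$$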

The Z-value (or Z-score)

The z-value, or z-score, is the deviation of the
measurement from the mean per unit standard
deviation. It is defined by
$z = \frac{x - \mu}{\sigma}$,
where $x$ is the original measurement, $\mu$ is the
mean of the x distribution, and $\sigma$ is the standard
deviation.

Remarks

Note that we take the word average to mean
either the sample mean $\bar{x}$ or the population mean
$\mu$. We further note that the original score x is
referred to as the raw score x.
Knowing the z-score, $\mu$, and $\sigma$, the raw
score x is determined by
$x = \mu + z\sigma$.
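The two conversions, raw score to z-score and back, are inverses of each other. The sketch below uses hypothetical function names and borrows the values mean 25, standard deviation 5, and raw score 28.15 from Example 5 later in this chapter.

```python
def z_score(x: float, mu: float, sigma: float) -> float:
    """Deviation of x from the mean, in units of standard deviation."""
    return (x - mu) / sigma


def raw_score(z: float, mu: float, sigma: float) -> float:
    """Recover the raw score from a z-score."""
    return mu + z * sigma


z = z_score(28.15, mu=25.0, sigma=5.0)       # approximately 0.63
x_back = raw_score(z, mu=25.0, sigma=5.0)    # approximately 28.15
print(round(z, 2), round(x_back, 2))
```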

Standard Normal Distribution
If the original distribution of the x values is
normal with mean $\mu$ and standard deviation $\sigma$,
then the corresponding z values have a normal
distribution with mean $\mu = 0$ and standard
deviation $\sigma = 1$.
This transformed normal distribution with mean
$\mu = 0$ and standard deviation $\sigma = 1$ is called the
standard normal distribution.

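Proof of Mean & Variance of Z-score
Given $E(x) = \mu$ and $V(x) = \sigma^2$.
Then, let $y = x - \mu$. What are $E(y)$ and $V(y)$?

$E(y) = E(x - \mu) = E(x) + (-1)E(\mu) = \mu - \mu$, so $E(y) = 0$
$V(y) = V(x - \mu) = V(x) + (-1)^2 V(\mu) = \sigma^2 + 0$, so $V(y) = \sigma^2$

Hence, let $z = \frac{x - \mu}{\sigma} = \frac{1}{\sigma}(x - \mu)$. What are $E(z)$ and $V(z)$?

$E(z) = E\left(\frac{x - \mu}{\sigma}\right) = \frac{1}{\sigma}E(x - \mu) = \frac{1}{\sigma}\cdot 0 = 0$, so $E(z) = 0$
$V(z) = V\left(\frac{x - \mu}{\sigma}\right) = \frac{1}{\sigma^2}V(x - \mu) = \frac{1}{\sigma^2}\cdot\sigma^2 = 1$, so $V(z) = 1$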

The Empirical Rule under the Standard Normal Curve

The Standard Normal Table


The textbook uses the left tail and half tail style
tables interchangeably to solve problems
involving the normal distribution. However, for
the sake of uniformity, we will focus only on
the left tail style table.
This style of table provides the cumulative area to
the left of a given z score associated with an
original raw score x.

The Standard Normal Left Tail Table (Left Tail Z Table)

Some Remarks about Normal Probability
The total area under the normal curve is always
equal to 1.
The portion of the area under the curve within a
given interval represents the probability that a
measurement will lie in that interval.
The probability that z equals a certain number is
always 0.
$P(z = a) = 0$
Therefore, $<$ and $\le$ can be used interchangeably.
Similarly, $>$ and $\ge$ can be used interchangeably.
$P(z < b) = P(z \le b)$
$P(z > c) = P(z \ge c)$

Convention? Arguable.
Some instructors and books state that:
The area to the left of a z-value smaller than
$-3.49$ is 0.000.
It is better to state 0.0002 from the table.
The area to the left of a z-value greater than 3.49
is 1.000.
It is better to state 0.9998 from the table.
Always avoid absolute statements.

Use of the Left Tail Normal Table (looking up area under the curve)
Example 2: $P(z \le 0.63) = 0.7357$
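A left tail table lookup can be approximated in code. The sketch below computes the cumulative area to the left of $z$ from the error function, $\Phi(z) = \frac{1}{2}\left[1 + \operatorname{erf}(z/\sqrt{2})\right]$; the function name `phi` is an assumption, and the result may differ from a printed table in the last digit because tables are rounded.

```python
import math


def phi(z: float) -> float:
    """Cumulative area to the left of z under the standard normal curve."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


area = phi(0.63)          # table value: 0.7357
print(round(area, 4))
```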

Example 3
a. $P(z \ge 2.43) = 1 - 0.9925 = 0.0075$

b. $P(z \le 1.78) = 0.9625$

c. $P(z \le -3.09) = 0.001$

d. $P(z \le 0.227) \approx P(z \le 0.23) = 0.5910$

Example 4
$P(-2.18 \le z \le 1.34) = P(z \le 1.34) - P(z \le -2.18) = 0.9099 - 0.0146 = 0.8953$
$P(z \le 1.34) = 0.9099$ (table row 1.3, column 0.04)
$P(z \le -2.18) = 1 - P(z \le 2.18) = 1 - 0.9854 = 0.0146$ (table row 2.1, column 0.08)

Example 5
Given that the mean is 25 and the standard deviation is 5,
what is the probability that the observed data point is at most
28.15?

$P(x \le 28.15) = P\left(\frac{x - \mu}{\sigma} \le \frac{28.15 - 25}{5}\right) = P(z \le 0.63) = 0.7357$

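Example 6
Given $\mu = 4$, $\sigma = 2$:
$P(3 \le x \le 6) = P\left(\frac{3 - 4}{2} \le \frac{x - \mu}{\sigma} \le \frac{6 - 4}{2}\right)$
$= P(-0.50 \le z \le 1.00) = 0.8413 - 0.3085 = 0.5328$
$P(z \le 1.00) = 0.8413$ (table row 1.0, column 0.00)
By symmetry,
$P(z \le -0.50) = 1 - P(z \le 0.50) = 1 - 0.6915 = 0.3085$ (table row 0.5, column 0.00)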
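The interval probability in Example 6 can be cross-checked with Python's standard library class `statistics.NormalDist` (available from Python 3.8). This is a sketch that swaps the table lookup for a direct CDF evaluation, so the result may differ from the table-based answer in the fourth decimal place.

```python
from statistics import NormalDist

# Values from Example 6: mean 4, standard deviation 2.
dist = NormalDist(mu=4, sigma=2)

# P(3 <= x <= 6) as a difference of two cumulative areas.
p_interval = dist.cdf(6) - dist.cdf(3)
print(round(p_interval, 4))
```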

Example 7

b.


Example 7 (continued)
c.

Inverse Normal Distribution

Sometimes we may be required to find the z
value or raw score x that corresponds to a given
area under the normal curve.
To do this, we look up the area associated with
the given problem and find the corresponding z
value.
Next, the raw score x can be computed as
follows:
$x = \mu + z\sigma$

Example 8: Using the information given in example 7,


Example 9: Using the information given in example 7,


Example 10
1. Find the z value such that 90% of the area
under the standard normal curve lies between
$-z$ and $z$.
2. Find the z value such that 3% of the area under
the standard normal curve lies to the right of z.
3. If a random variable X is normally distributed
with mean 50 and standard deviation 10, find k
so that the P (X k) = 0.99
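The first two parts of Example 10 are inverse lookups: given an area, find z. A minimal sketch using `statistics.NormalDist.inv_cdf` from the Python standard library is shown below; the variable names are illustrative, and a printed table would give the same values rounded to two decimals.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

# Part 1: 90% of the area between -z and z leaves 5% in each tail,
# so the area to the left of z is 0.95.
z_central_90 = standard_normal.inv_cdf(0.95)

# Part 2: 3% of the area to the right of z means 97% lies to the left.
z_right_3 = standard_normal.inv_cdf(0.97)

print(round(z_central_90, 3), round(z_right_3, 3))
```

Once z is known, a raw score follows from $x = \mu + z\sigma$.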

Sampling Distribution
A sampling distribution is a
probability distribution of a
sample statistic based on all
possible simple random samples
of the same size from the same
population.


Example 11: Sampling Distribution
An appliance center has six sales representatives at its North
Jacksonville outlet. Listed below is the number of refrigerators
sold by each last month.

Sales Representative    Number Sold
Zina                    54
Woon                    50
Ernie                   52
Jan                     48
Molly                   50
Rachel                  52

a. Select all possible samples of size 2 and compute
the sample mean number sold for each sample.
b. What is the distribution of the sample means?
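One way to work through Example 11 is to enumerate every sample of size 2 and tabulate the sample means. The sketch below assumes sampling without replacement, so there are 15 unordered pairs; that reading of "all possible samples" is an assumption.

```python
from collections import Counter
from itertools import combinations

# Refrigerators sold last month, from the table in Example 11.
sold = {"Zina": 54, "Woon": 50, "Ernie": 52, "Jan": 48, "Molly": 50, "Rachel": 52}

# All unordered samples of size 2 and their sample means.
samples = list(combinations(sold, 2))
sample_means = [(sold[a] + sold[b]) / 2 for a, b in samples]

# Sampling distribution of the sample mean: value, count, probability.
distribution = Counter(sample_means)
for value in sorted(distribution):
    print(value, distribution[value], round(distribution[value] / len(samples), 4))

population_mean = sum(sold.values()) / len(sold)
mean_of_sample_means = sum(sample_means) / len(sample_means)
print(population_mean, mean_of_sample_means)
```

The mean of the sample means equals the population mean, which previews the result proved for the Central Limit Theorem.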

Central Limit Theorem (CLT)
In general, if the data are normally
distributed, then regardless of the sample size,
the sampling distribution of the sample mean follows a normal
probability distribution.
On the other hand, if the data
do not follow a normal distribution, the sampling
distribution approaches the normal probability
distribution only as the sample size increases.

Central Limit Theorem


Regardless of the distribution of the data, as the
sample size increases, the sampling distribution
approaches normality.


Central Limit Theorem


Let $x_1, x_2, \ldots, x_n$ be a random sample from a population
with finite mean $\mu$ and finite variance $\sigma^2$.
Let $\bar{x}$ be the sample mean; that is,
$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$.
Then as the sample size increases, the
probability distribution of the sample mean
approaches a normal probability distribution
with mean $\mu$ and variance $\sigma^2/n$.

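Proof
Given that $E(x) = \mu$, $V(x) = \sigma^2$, and $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$.

Prove: $\mu_{\bar{x}} = E(\bar{x}) = \mu$ and $\sigma_{\bar{x}}^2 = V(\bar{x}) = \frac{\sigma^2}{n}$, and therefore
the standard error for the sampling distribution is
$SE_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$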

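Detailed Proof

$\mu_{\bar{x}} = E(\bar{x}) = E\left(\frac{1}{n}\sum_{i=1}^{n} x_i\right) = \frac{1}{n}E\left(\sum_{i=1}^{n} x_i\right)$
$= \frac{1}{n}E(x_1 + x_2 + \cdots + x_n) = \frac{1}{n}\left[E(x_1) + E(x_2) + \cdots + E(x_n)\right]$
$= \frac{1}{n}\sum_{i=1}^{n} E(x_i) = \frac{1}{n}\sum_{i=1}^{n}\mu = \frac{1}{n}(\underbrace{\mu + \mu + \cdots + \mu}_{n \text{ times}})$
$= \frac{1}{n}(n\mu) = \mu$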

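Detailed Proof (continued)

Because the $x_i$ in a random sample are independent, the variance of their sum is the sum of their variances:

$\sigma_{\bar{x}}^2 = V(\bar{x}) = V\left(\frac{1}{n}\sum_{i=1}^{n} x_i\right) = \frac{1}{n^2}V\left(\sum_{i=1}^{n} x_i\right)$
$= \frac{1}{n^2}V(x_1 + x_2 + \cdots + x_n) = \frac{1}{n^2}\left[V(x_1) + V(x_2) + \cdots + V(x_n)\right]$
$= \frac{1}{n^2}\sum_{i=1}^{n} V(x_i) = \frac{1}{n^2}\sum_{i=1}^{n}\sigma^2 = \frac{1}{n^2}(\underbrace{\sigma^2 + \sigma^2 + \cdots + \sigma^2}_{n \text{ times}})$
$= \frac{1}{n^2}(n\sigma^2) = \frac{\sigma^2}{n}$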

Detailed Proof (continued)

$SE_{\bar{x}} = \sqrt{V(\bar{x})} = \sqrt{\frac{\sigma^2}{n}} = \frac{\sigma}{\sqrt{n}}$

Example 12
Assume that the weights of marbles are normally
distributed with mean 172 grams and standard
deviation 29 grams.
a. If 4 marbles are selected, find the probability that their
mean weight is less than 167 grams.
b. If 25 marbles are selected, find the probability that
they have a mean weight of more than 167 grams.
c. If 100 marbles are selected, find the probability that
they have a mean weight between 167 grams and
180 grams.
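Example 12 can be set up by giving the sample mean a normal distribution with mean $\mu$ and standard error $\sigma/\sqrt{n}$. The following is a sketch using `statistics.NormalDist`, not a worked solution from the slides; the helper name `sample_mean_dist` is hypothetical, and table-based answers may differ slightly because z is rounded to two decimals before lookup.

```python
from math import sqrt
from statistics import NormalDist

MU, SIGMA = 172.0, 29.0  # population mean and standard deviation (grams)


def sample_mean_dist(n: int) -> NormalDist:
    """Sampling distribution of the mean for samples of size n."""
    return NormalDist(MU, SIGMA / sqrt(n))


p_a = sample_mean_dist(4).cdf(167)                                   # mean < 167, n = 4
p_b = 1 - sample_mean_dist(25).cdf(167)                              # mean > 167, n = 25
p_c = sample_mean_dist(100).cdf(180) - sample_mean_dist(100).cdf(167)  # 167 < mean < 180

print(round(p_a, 4), round(p_b, 4), round(p_c, 4))
```

As n grows, the standard error shrinks, so the same 5-gram deviation from the mean corresponds to a larger z value.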

Normal Approximation to the Binomial Distribution
In the binomial distribution, if the sample size is very large, finding the probability
that $r \le j$ for some $j$, where $1 \le j \le n$, requires tedious and lengthy
calculations. In such cases, the problem can be solved by using the normal
approximation to the binomial distribution.
Procedure:
Step 1: Given a binomial distribution with n, r, and p, where
n stands for the total number of trials,
r stands for the number of successes (r = 0, 1, 2, ..., n),
p stands for the probability of success in a single trial.
Step 2: The criterion for the normal approximation to the binomial distribution is
that
$np > 5$ and $nq > 5$ (or $np \ge 5$ and $nq \ge 5$), where $q = 1 - p$.
Then r has a binomial distribution that can be approximated by a normal
distribution with
$\mu = np$ and $\sigma = \sqrt{npq}$.

Continuity Correction


Correction for Continuity


Converting Binomial to Standard Normal without correction for continuity

$z = \frac{x - \mu}{\sigma} = \frac{x - np}{\sqrt{np(1 - p)}}$

Converting Binomial to Standard Normal with correction for continuity

$z = \frac{(x \pm 0.5) - \mu}{\sigma} = \frac{(x \pm 0.5) - np}{\sqrt{np(1 - p)}}$
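Both conversions can be expressed with one helper that takes an optional shift of $\pm 0.5$. This is a sketch with a hypothetical function name; the numbers (75 trials, success probability 0.80, boundary 54) are taken from Example 13, where the lower boundary of "54 or more" is shifted down by 0.5.

```python
from math import sqrt


def binomial_z(x: float, n: int, p: float, shift: float = 0.0) -> float:
    """z value for a binomial count x; shift is the continuity correction (+0.5, -0.5, or 0)."""
    mu = n * p
    sigma = sqrt(n * p * (1 - p))
    return (x + shift - mu) / sigma


z_plain = binomial_z(54, 75, 0.80)                  # without correction
z_corrected = binomial_z(54, 75, 0.80, shift=-0.5)  # with correction
print(round(z_plain, 2), round(z_corrected, 2))
```

The sign of the shift depends on the inequality: the boundary moves outward by 0.5 so that the whole bar for the boundary count is included.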

Example 13
The Denver Post stated that 80% of all new products introduced in
grocery stores fail (and are taken off the market) within 2 years.
Using normal approximation for this binomial distribution and
correction for continuity, if a grocery store chain introduces 75 new
products,
a. Verify that the assumption for normal approximation to the
binomial is satisfied.
b. What is the probability that within two years, 54 or more will fail?
c. What is the probability that within two years, fewer than 62 will
fail?
d. What is the probability that within two years, more than 49 will
fail?
e. What is the probability that within two years, 58 or fewer fail?

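Example 13 (solution)
a. $np = 75(0.80) = 60 > 5$ and $nq = 75(0.20) = 15 > 5$, so the criterion for the normal approximation is satisfied.

b. Without correction for continuity
$\mu = 60$ and $\sigma \approx 3.464$
$P(x \ge 54) = P\left(\frac{x - 60}{3.464} \ge \frac{54 - 60}{3.464}\right) = P(z \ge -1.73) = 0.9582$
With correction for continuity
$\mu = 60$ and $\sigma \approx 3.464$
$P(x \ge 54 - 0.5) = P\left(z \ge \frac{54 - 0.5 - 60}{3.464}\right) = P(z \ge -1.88) = 0.9699$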

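Example 13 (solution)
c. Without correction for continuity
$\mu = 60$ and $\sigma \approx 3.464$
$P(x < 62) = P(x \le 61) = P\left(\frac{x - 60}{3.464} \le \frac{61 - 60}{3.464}\right) = P(z \le 0.29) = 0.6141$
With correction for continuity
$\mu = 60$ and $\sigma \approx 3.464$
$P(x \le 61 + 0.5) = P\left(z \le \frac{61 + 0.5 - 60}{3.464}\right) = P(z \le 0.43) = 0.6664$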

Example 13 (solution)
d. Without correction for continuity

With correction for continuity

Example 13 (solution)
e. Without correction for continuity

With correction for continuity

PP & QQ Plots for Testing the Assumption of Normality
[Figure panels: PP plot; QQ plot]

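Normal Probability Plot
$z = \frac{x - \mu}{\sigma} = \frac{1}{\sigma}x - \frac{\mu}{\sigma}$

Let $m = \frac{1}{\sigma}$ and $b = -\frac{\mu}{\sigma}$; then

$z = mx + b$
Hence, the data are normal if the scatter plot of the data against
the corresponding z-scores (matched by percentile) is a straight
line.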
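The percentile-matching idea can be sketched numerically. In the code below, each sorted data value is paired with the z-score at percentile $(i + 0.5)/n$; that plotting position is an assumption (other conventions exist), and both data sets are synthetic. Straightness is summarized with the correlation between x and z, which is close to 1 when the points lie on a line.

```python
from statistics import NormalDist, mean


def normal_plot_points(data):
    """Pair each sorted value with the z-score at its (assumed) percentile."""
    xs = sorted(data)
    n = len(xs)
    zs = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(xs, zs))


def correlation(points):
    """Pearson correlation between the x and z coordinates of the points."""
    xs, zs = zip(*points)
    mx, mz = mean(xs), mean(zs)
    sxz = sum((x - mx) * (z - mz) for x, z in points)
    sxx = sum((x - mx) ** 2 for x in xs)
    szz = sum((z - mz) ** 2 for z in zs)
    return sxz / (sxx * szz) ** 0.5


# Synthetic data built to lie exactly on the line z = (x - 50) / 10.
line_data = [50 + 10 * NormalDist().inv_cdf((i + 0.5) / 20) for i in range(20)]
# Synthetic, strongly right-skewed data for contrast.
skewed_data = [1, 1, 2, 2, 3, 4, 6, 9, 15, 40]

r_line = correlation(normal_plot_points(line_data))
r_skewed = correlation(normal_plot_points(skewed_data))
print(round(r_line, 4), round(r_skewed, 4))
```

A correlation near 1 is consistent with normality, while a visibly lower value reflects the curvature seen in a plot of non-normal data.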

Testing the Assumption of Normality using the Probability-Probability Plot (PP Plot)
[Figure panels: approximately normal; not normal]

Normal Quantile Plot

Testing the Assumption of Normality using the Quantile-Quantile Plot (QQ Plot)

Standardized & Percentage Plots
[Figure panels: standardized plot; percentage plot]

Normality
Central Limit Theorem
Continuous Random Variable
Correction for Continuity
Gaussian Probability Distribution
Normal Approximation
Normal Probability Distribution
Sampling Distribution
Standard Score
PP plot
QQ plot
Standardized plot
Percentage plot

Regardless of the data's distribution, as the sample
size increases, the sampling distribution approaches
normality.
Most continuous variables are assumed normal, and
even a discrete probability distribution, the binomial,
can be approximated using normality.
The normal probability distribution is named after
Gauss; a Gaussian probability distribution shows
normality.
To test for normality we can use the PP plot (a
metaphoric t-shirt) or the QQ plot (the t-shirt
turned inside-out).
To force normality, or normalize the data, we can
use the standardized plot or the percentage
change plot.

Assignment Problems
Section 6.1: # 6.1
Section 6.2: # 6.6
Section 6.3: # 6.15, 6.17, 6.19, 6.27, 6.29
Section 6.4: # 6.31, 6.33, 6.41
Section 6.5: # 6.49, 6.54, 6.60
Section 6.6: # 6.65, 6.68, 6.70

Assignment for chapter-06

Section 6.1
# 6.1 Determine if the following are continuous or
discrete random variables:
a. Number of characters in a document.
b. The amount of time it takes to make dinner.
c. The height of a palm tree.

Section 6.2
# 6.6 Illustrate the following curves indicating the
points of inflection .
a. X~N()
b. X~N()
c. X~N()

Section 6.3
# 6.15 Determine the probability that the standard
normal random variable Z will assume a single value
between -1.42 and 0.75.
# 6.17 The random variable X is normally
distributed with mean and . Find the following
probabilities:

# 6.19 If random variable X is normally distributed


with and , find K so that the .

# 6.27 The amount of solid fuels, X, which assumes


values of X metric tons, is normally distributed
with mean thousand metric tons (kmt) and
standard deviation thousand metric tons.
a. Determine the probability that the amount
is between 250 kmt and 320 kmt, that is
b. Find the metric tonnage such that the
probability that the tonnage is exceeded is
0.80.

# 6.29 The price of coffee, X, which assumes values of


x dollars, is normally distributed with mean and
standard deviation .
a. Determine the probability that the cost is
between $10 and $15 that is
b. Find the cost such that the probability that this
price is exceeded is 0.2.
Section 6.4
# 6.31 The mean height of a group of 500 nonsmoking college
students is 74 inches and the standard deviation is 5 inches. What
is the probability that in a random sample of 25 students from this
group, the average height will be between 73 and 75 inches?

# 6.33 Suppose that the weights of packages from a candy packing
machine are distributed about a mean of 16 ounces
with a standard deviation of 2 ounces. What is the
probability that, if nine packages of candy are
weighed, their average weight:
a. Will be less than 14 ounces?
b. Will be more than 16 ounces?
# 6.41 A survey of IQ scores of all the United States
senators in the history of this body revealed a mean
of and standard deviation . What is the probability
that the average IQ score of a random sample of 16
senators:

a. Will be lower than 85?


b. Will exceed 85?
c. Will be between 85 and 130?
Section 6.5
# 6.49 According to Chebyshev's rule, what
percentage of the data values will lie within one
standard deviation of the mean?
# 6.54 According to the empirical rule, what percentage of
data values will lie within three standard deviations
of the mean?
# 6.60 Based on the empirical rule, how many data
values in a set of size 200 would you expect to lie
within three standard deviations of the mean?

Section 6.6
# 6.65 Let the random variable X be binomially
distributed with and . Evaluate the following
probabilities:

# 6.68 A fair coin is tossed 15 times. Determine the
probability that between 6 and 8 heads, inclusive, will
occur:
a. Using the binomial probability distribution.
b. Using the normal approximation without
correction for continuity.
c. Using the normal approximation with
correction for continuity.
# 6.70 A fair die is rolled 200 times. Using normal
approximation to the binomial, what is the
probability that an ace (one) will appear between 34
and 36 times?
