INDEX
003-059 --- MEASURE OF CENTRAL TENDENCY
060-118 --- PROBABILITY 1
119-201 --- RANDOM VARIABLE
202-246 --- NORMAL VARIABLE 2
247-293 --- SAMPLING 2
294-336 --- CONFIDENCE INTERVAL 2
337-438 --- HYPOTHESIS TESTING 2
439-501 --- COMPARISON OF TWO VARIABLES
502-523 --- ANALYSIS OF VARIANCE
WHAT IS STATISTICS?
Statistics is a science that helps us make better
decisions in business and economics as well as in
other fields.
Statistics teaches us how to summarize, analyze,
Gender Salaries
elements.
Why Sample?
Arithmetic Mean or
Average
◼ The arithmetic mean is the average of a group of numbers and is
computed by summing all numbers and dividing by the number of
numbers
Example – Mean
\[ \bar{x} = \frac{\sum x_i}{n} = \frac{538}{20} = 26.9 \]
Percentiles
⚫ Percentiles are measures of position that divide a group
of data into 100 parts
The nth percentile is the value such that at least n percent of the
data are below that value and at most (100 - n) percent are above
that value
Steps in Determining the Location of a Percentile
1. Organize the numbers into an ascending-order array.
Example - Billionaires
Example - Percentiles
Find the 50th, 80th and the 90th percentiles of this
data set.
To find the 50th percentile, determine the data point
in position (n + 1)P/100 = (20 + 1)(50/100)
= 10.5.
Thus, the percentile is located at the 10.5th
position.
The 10th observation in the ordered set is 22, and
the 11th observation is also 22.
The 50th percentile will lie halfway between the
10th and 11th values (which are both 22 in this case)
and is thus 22.
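The location rule used above can be sketched in Python. This is a minimal illustration, assuming the (n + 1)P/100 position rule from the example and linear interpolation between neighbouring observations; the function name `percentile` and the list `billions` (the sorted data set of the example) are names chosen here for illustration.

```python
import math

def percentile(sorted_values, p):
    """Value at the p-th percentile using the (n + 1)P/100 location rule."""
    n = len(sorted_values)
    position = (n + 1) * p / 100      # 1-based position, may be fractional
    lower = math.floor(position)
    fraction = position - lower
    if lower < 1:                     # position falls before the first value
        return sorted_values[0]
    if lower >= n:                    # position falls at or after the last value
        return sorted_values[-1]
    low_val = sorted_values[lower - 1]
    high_val = sorted_values[lower]
    # Interpolate between the two neighbouring observations.
    return low_val + fraction * (high_val - low_val)

# Sorted data set of the example (n = 20, sum = 538).
billions = [18, 18, 18, 18, 19, 20, 20, 20, 21, 22,
            22, 23, 24, 26, 27, 32, 33, 49, 52, 56]

mean = sum(billions) / len(billions)
p50 = percentile(billions, 50)
p80 = percentile(billions, 80)
p90 = percentile(billions, 90)
print(mean, p50, p80, p90)
```

Other software may use a different interpolation convention, so results for small data sets can differ slightly from this rule.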
Example – Median
The median is the middle value of data sorted in order of magnitude.
Sorted Billions (n = 20):
18, 18, 18, 18, 19, 20, 20, 20, 21, 22, 22, 23, 24, 26, 27, 32, 33, 49, 52, 56
50th Percentile: (20 + 1)(50/100) = 10.5
Median = 22 + (.5)(0) = 22
b. 27 27 27 55 55 55 88 88 99
Bimodal - 27 & 55
c. 1 2 3 6 7 8 9 10
No Mode
Consider following data sets,
Group Data
Exhaustive
Every observation is assigned to a group
Equal-width (if possible)
First or last group may be open-ended
Frequency Distribution
x f(x) f(x)/n
Spending Class ($) Frequency (number of customers) Relative Frequency
184 1.000
x F(x) F(x)/n
Spending Class ($) Cumulative Frequency Cumulative Relative Frequency
Basic Definitions
Experiment
Intersection (And), A ∩ B
– a set containing all elements in both A and B
Union (Or), A ∪ B
– a set containing all elements in A or B or both
Sets: A Union B
[Figures: Venn diagrams of sets A and B, with the intersection and the union shaded]
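Complement - Probability of not A
\[ P(\bar{A}) = 1 - P(A) \]
Intersection - Probability of both A and B
\[ P(A \cap B) = \frac{n(A \cap B)}{n(S)} \]
Mutually exclusive events (A and C):
\[ P(A \cap C) = 0 \]
Union - Probability of A or B or both
\[ P(A \cup B) = \frac{n(A \cup B)}{n(S)} = P(A) + P(B) - P(A \cap B) \]
Mutually exclusive events: If A and B are mutually exclusive, then
\[ P(A \cap B) = 0 \quad \text{so} \quad P(A \cup B) = P(A) + P(B) \]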
Sampling from a Population
with Replacement
Sampling n items from a population of size N with replacement
would provide
\[ N^n \] possibilities
where
N = population size
n = sample size
Q. Each time a die, which has six sides, is rolled, the outcomes are
independent (with replacement) of the previous roll. If a die is
rolled three times in succession, how many different outcomes
can occur?
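Sampling n items from a population of size N without replacement
provides
\[ {}_{N}C_{n} = \binom{N}{n} = \frac{N!}{n!\,(N - n)!} \] possibilities
where
N = population size
n = sample size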
Q. Suppose a small law firm has 16 employees and three are to be
selected randomly to represent the company at the annual
meeting of the American Bar Association. How many different
combinations of lawyers could be sent to the meeting?
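Both counting questions can be checked with a few lines of Python. This is a small sketch; the function names are illustrative and not part of the slides. It uses N^n for sampling with replacement and the combinations count for sampling without replacement.

```python
import math

def outcomes_with_replacement(population_size, sample_size):
    """Number of ordered outcomes when sampling with replacement: N^n."""
    return population_size ** sample_size

def combinations_without_replacement(population_size, sample_size):
    """Number of unordered selections without replacement: N! / (n!(N - n)!)."""
    return math.comb(population_size, sample_size)

# A six-sided die rolled three times in succession.
die_outcomes = outcomes_with_replacement(6, 3)

# Three representatives chosen from 16 employees.
law_firm_groups = combinations_without_replacement(16, 3)

print(die_outcomes, law_firm_groups)
```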
Types of Probabilities
Addition Laws
Q. Yankelovich Partners conducted a survey for the American
Society of Interior Designers in which workers were asked which
changes in office design would increase productivity. Respondents
were allowed to answer more than one type of design change. The
number one change that 70% of the workers said would increase
productivity was reducing noise. In second place was more
storage/filing space, selected by 67%. In addition, suppose 56% of
the respondents believed both noise reduction and increased
storage space would improve productivity. If one of the survey
respondents was randomly selected and asked what office design
changes would increase worker productivity, what is the probability
that this person would select reducing noise or more storage/filing
space?
Let N represent the event “reducing noise.”
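A worked sketch of the general addition law for this question, assuming S denotes the event "more storage/filing space" (a symbol introduced here for illustration):

```latex
\[
P(N \cup S) = P(N) + P(S) - P(N \cap S) = .70 + .67 - .56 = .81
\]
```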
P(W ∩ T) = P(W) · P(T | W)
Select one industry and one geographic location (say, A—Finance and G—West).
2-42
P ( U W ) .75
P(U W ) 30
\[ P(W) = .80 \qquad P(\bar{W}) = 1 - .8 = .2 \]
Bayes’ Theorem
• Bayes’ theorem enables you, knowing just a little more than the
probability of A given B, to find the probability of B given A.
• Based on the definition of conditional probability and the law of total
probability.
\[ P(B \mid A) = \frac{P(A \cap B)}{P(A)} \]
Applying the law of total probability to the denominator:
\[ P(B \mid A) = \frac{P(A \cap B)}{P(A \cap B) + P(A \cap \bar{B})} \]
Applying the definition of conditional probability throughout:
\[ P(B \mid A) = \frac{P(A \mid B)\,P(B)}{P(A \mid B)\,P(B) + P(A \mid \bar{B})\,P(\bar{B})} \]
Q. A company screens job applicants for illegal drug use at a certain stage
in their hiring process. The specific test they use has a false positive rate
of 2% and a false negative rate of 1%. Suppose that 5% of all applicants
are actually using illegal drugs, and we randomly select an applicant.
Given the applicant tests positive, what is the probability that they are
actually on drugs ?
Tree diagram for 10,000 applicants:
  On drugs (5%): 500
    'positive' (99%): 495
    'negative' (1%): 5
  Not on drugs (95%): 9,500
    'positive' (2%): 190
    'negative' (98%): 9,310
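The posterior probability asked for in the question can be sketched directly from Bayes' theorem. This is a minimal sketch; the function `posterior` and its parameter names are illustrative, and the inputs are the rates stated in the question (5% prior, 1% false negatives, 2% false positives).

```python
def posterior(prior, p_pos_given_true, p_pos_given_false):
    """P(condition | positive test) via Bayes' theorem for a two-event partition."""
    numerator = p_pos_given_true * prior
    denominator = numerator + p_pos_given_false * (1 - prior)
    return numerator / denominator

# 5% of applicants use drugs; a 1% false negative rate means 99% of users
# test positive; a 2% false positive rate applies to non-users.
p_drug_given_positive = posterior(prior=0.05,
                                  p_pos_given_true=0.99,
                                  p_pos_given_false=0.02)
print(round(p_drug_given_positive, 4))
```

The same ratio appears in the counts of the tree: positives from users divided by all positives, 495 / (495 + 190).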
Q. Consider a test for an illness. The test has a known reliability.
1. When administered to an ill person, the test will indicate so with
probability 0.92
2. When administered to a person who is not ill, the test will
erroneously give a positive result with probability 0.04
Suppose the illness is rare and is known to affect only 0.1% of the entire
population. If a person is randomly selected from the entire population
and is given the test and the result is positive, what is the posterior
probability that the person is ill?
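A worked sketch of this question with Bayes' theorem, writing I for "ill" and + for "positive test" (symbols introduced here for illustration):

```latex
\[
P(I \mid +) = \frac{P(+ \mid I)\,P(I)}{P(+ \mid I)\,P(I) + P(+ \mid \bar{I})\,P(\bar{I})}
            = \frac{(0.92)(0.001)}{(0.92)(0.001) + (0.04)(0.999)}
            = \frac{0.00092}{0.04088} \approx 0.0225
\]
```

Even with a positive result, the posterior probability stays small because the illness is so rare in the population.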
2-50
Example
Q. An economist believes that during periods of high economic growth, the U.S.
dollar appreciates with probability 0.70; in periods of moderate economic
growth, the dollar appreciates with probability 0.40; and during periods of low
economic growth, the dollar appreciates with probability 0.20. During any period
of time, the probability of high economic growth is 0.30, the probability of
moderate economic growth is 0.50, and the probability of low economic growth
is 0.20. Suppose the dollar has been appreciating during the present period.
What is the probability we are experiencing a period of high economic growth?
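\[ P(H \mid A) = \frac{P(H \cap A)}{P(A)} = \frac{P(H \cap A)}{P(H \cap A) + P(M \cap A) + P(L \cap A)} \]
\[ = \frac{P(A \mid H)\,P(H)}{P(A \mid H)\,P(H) + P(A \mid M)\,P(M) + P(A \mid L)\,P(L)} \]
\[ = \frac{(0.70)(0.30)}{(0.70)(0.30) + (0.40)(0.50) + (0.20)(0.20)} = \frac{0.21}{0.21 + 0.20 + 0.04} = \frac{0.21}{0.45} = 0.467 \]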
Tree Diagram
\[ P(H) = 0.30 \qquad P(\bar{A} \mid H) = 0.30 \qquad P(\bar{A} \cap H) = (0.30)(0.30) = 0.09 \]
\[ P(M) = 0.50 \]
[Table and figure: outcomes of rolling two dice (1,1; 1,2; ...) and the probability distribution p(x) of their sum x, for example p(4) = 3/36, p(5) = 4/36, p(6) = 5/36, p(7) = 6/36, p(8) = 5/36]
counts
Once continuous (e.g., length) data are measured and recorded, they become
discrete data because the data are rounded off to a discrete number
1. \( P(x) \ge 0 \) for all values of x.
2. \[ \sum_{\text{all } x} P(x) = 1 \]
Corollary: \( 0 \le P(x) \le 1 \)
Problem
800, 900 and Now: the 500 Telephone Numbers
The new code 500 is for busy, affluent people who travel a lot; it can work
with a cellular phone, your home phone, office phone, second-home phone,
up to five additional phones besides your regular one. The computer
technology behind this service is astounding: it first rings you up at the
telephone number specified as primary number. If there is no answer, the
computer switches to search for you at your second-specified phone number;
if you do not answer there, it will switch to your third phone; and so on up
to five allowable switches. From the data available on an experimental run,
the following probability distribution is constructed for the number of
dialing switches that are necessary before a
x      P(x)
0      0.1
1      0.2
2      0.3
3      0.2
4      0.1
5      0.1
       1.00
[Figure: cumulative distribution function F(x) of the number of switches x, rising from 0.1 at x = 0 to 1.0 at x = 5]
x P(x) F(x)
0 0.1 0.1
1 0.2 0.3
2 0.3 0.6
3 0.2 0.8
4 0.1 0.9
5 0.1 1.0
1
Note: P(X > 1) = P(X ≥ 2) = 1 – P(X ≤ 1) = 1 – F(1) = 1 – 0.3 = 0.7
Note: P(1 ≤ X ≤ 3) = P(X ≤ 3) – P(X ≤ 0) = F(3) – F(0) = 0.8 – 0.1 = 0.7
The expected value of a discrete random variable X is equal to the sum of each
value of the random variable multiplied by its probability.
\[ \mu = E(X) = \sum_{\text{all } x} x P(x) \]
x      P(x)   xP(x)
0      0.1    0.0
1      0.2    0.2
2      0.3    0.6
3      0.2    0.6
4      0.1    0.4
5      0.1    0.5
       1.0    2.3 = E(X) = μ
A Fair Game
Suppose you are playing a coin toss game in which you are
paid $1 if the coin turns up heads and you lose $1 when the
coin turns up tails. The expected value of this game is E(X) = 0.
A game of chance with an expected payoff of 0 is called a
fair game.
x      P(x)   xP(x)
-1     0.5    -0.50
1      0.5    0.50
       1.0    0.00 = E(X)
\[ \sigma^2 = V(X) = E[(X - \mu)^2] = \sum_{\text{all } x} (x - \mu)^2 P(x) \]
\[ = E(X^2) - [E(X)]^2 = \sum_{\text{all } x} x^2 P(x) - \Big[ \sum_{\text{all } x} x P(x) \Big]^2 \]
Number of
Switches, x   P(x)   xP(x)   (x-μ)   (x-μ)²   (x-μ)²P(x)   x²P(x)
0             0.1    0.0     -2.3    5.29     0.529        0.0
1             0.2    0.2     -1.3    1.69     0.338        0.2
2             0.3    0.6     -0.3    0.09     0.027        1.2
3             0.2    0.6      0.7    0.49     0.098        1.8
4             0.1    0.4      1.7    2.89     0.289        1.6
5             0.1    0.5      2.7    7.29     0.729        2.5
                     2.3                      2.010        7.3
Recall: μ = 2.3.
\[ \sigma^2 = \sum_{\text{all } x} (x - \mu)^2 P(x) = 2.01 \]
\[ \sigma^2 = E(X^2) - [E(X)]^2 = 7.3 - 2.3^2 = 2.01 \]
Variance of a Linear Function of a
Random Variable
The variance of a linear function of a random variable is:
\[ V(aX + b) = a^2 V(X) = a^2 \sigma^2 \]
Number
of items, x   P(x)   xP(x)   x²P(x)
5000          0.2    1000    5000000
6000          0.3    1800    10800000
7000          0.2    1400    9800000
8000          0.2    1600    12800000
9000          0.1    900     8100000
              1.0    6700    46500000
\[ \sigma^2 = V(X) = E(X^2) - [E(X)]^2 = \sum_{\text{all } x} x^2 P(x) - \Big[ \sum_{\text{all } x} x P(x) \Big]^2 = 46500000 - (6700^2) = 1610000 \]
\[ \sigma = SD(X) = \sqrt{1610000} = 1268.86 \]
\[ V(2X - 8000) = (2^2)\,V(X) = (4)(1610000) = 6440000 \]
\[ \sigma_{(2X - 8000)} = SD(2X - 8000) = 2\sigma_x = (2)(1268.86) = 2537.72 \]
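The tabular calculations above can be reproduced with a short Python sketch. The helper names (`mean_and_variance`, `linear_transform`) are illustrative; the two dictionaries hold the switch-count and item-count distributions from the tables. The constant b = -8000 is an assumption about the sign of the linear function; it does not affect the variance.

```python
import math

def mean_and_variance(distribution):
    """Return (E(X), V(X)) for a dict mapping each value x to P(x)."""
    mean = sum(x * p for x, p in distribution.items())
    mean_of_squares = sum(x ** 2 * p for x, p in distribution.items())
    # Computational formula: V(X) = E(X^2) - [E(X)]^2
    return mean, mean_of_squares - mean ** 2

def linear_transform(mean, variance, a, b):
    """Mean and variance of aX + b: E = a*mean + b, V = a^2 * variance."""
    return a * mean + b, a ** 2 * variance

switches = {0: 0.1, 1: 0.2, 2: 0.3, 3: 0.2, 4: 0.1, 5: 0.1}
items = {5000: 0.2, 6000: 0.3, 7000: 0.2, 8000: 0.2, 9000: 0.1}

mu_switches, var_switches = mean_and_variance(switches)
mu_items, var_items = mean_and_variance(items)

# Linear function 2X - 8000 of the number of items (sign of b assumed).
mu_lin, var_lin = linear_transform(mu_items, var_items, a=2, b=-8000)
sd_lin = math.sqrt(var_lin)
print(mu_switches, var_switches, mu_items, var_items, var_lin, sd_lin)
```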
Problem
During one holiday season, the Texas lottery played a game
called the Stocking Stuffer. With this game, total instant
winnings of $34.8 million were available in 70 million $1
tickets, with ticket prizes ranging from $1 to $1,000. Shown
here are the various prizes and the probability of winning
each prize. Use these data to compute the expected value
of the game, the variance of the game, and the standard
deviation of the game.
Solution
Sum and Linear Composites of
Random Variables
The mean or expected value of the sum of random
variables is the sum of their means or expected values:
\[ \mu_{(X+Y)} = E(X + Y) = E(X) + E(Y) = \mu_X + \mu_Y \]
For example: E(X) = $350 and E(Y) = $200
E(X+Y) = $350 + $200 = $550
The variance of the sum of mutually independent random
variables is the sum of their variances:
\[ \sigma^2_{(X+Y)} = V(X + Y) = V(X) + V(Y) = \sigma^2_X + \sigma^2_Y \]
if X and Y are independent (more generally, whenever they are uncorrelated),
and
\[ V(a_1 X_1 + a_2 X_2 + \dots + a_k X_k) = a_1^2 V(X_1) + a_2^2 V(X_2) + \dots + a_k^2 V(X_k) \]
Sum and Linear Composites of
Random Variables (Continued)
A portfolio includes stocks in three industries: financial, energy, and
consumer goods. Assume that the three sectors are independent of each other.
The expected annual return and standard deviations are as follows:
Financial – 1,000, and 700; energy – 1,200 and 1,100; consumer goods – 600 and 300.
What is the mean and standard deviation of the annual return on this portfolio?
The mean of the sum of the three random variables is: 1,000 + 1,200 + 600 = $2,800.
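For the standard deviation, a worked sketch using the rule that variances of independent random variables add (the symbols F, E, and C for the three sectors are introduced here for illustration):

```latex
\[
\sigma_{F+E+C} = \sqrt{\sigma_F^2 + \sigma_E^2 + \sigma_C^2}
               = \sqrt{700^2 + 1100^2 + 300^2}
               = \sqrt{1{,}790{,}000} \approx 1{,}337.9
\]
```

Note that the standard deviations themselves do not add; only the variances do.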
* The terms success and failure are simply statistical terms, and do not have
positive or negative implications. In a production setting, finding a defective
product may be termed a “success,” although it is not a positive result.
The Binomial Random Variable
10 × (1/32) = (Number of outcomes with 2 heads) × (Probability of each outcome with 2 heads)
In general:
1. The probability of a given sequence of x successes out of n trials with
probability of success p and probability of failure q is equal to:
\[ p^x q^{(n-x)} \]
2. The number of different sequences of n trials that result in exactly x
successes is equal to the number of choices of x elements out of a total of
n elements. This number is denoted:
\[ nCx = \binom{n}{x} = \frac{n!}{x!\,(n - x)!} \]
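Combining the two parts gives the binomial probability of exactly x successes. The sketch below is a minimal Python illustration; the function names are chosen here and are not from the slides. It checks the coin example (n = 5 tosses, p = 0.5).

```python
import math

def binomial_pmf(x, n, p):
    """P(X = x): number of sequences times the probability of each sequence."""
    q = 1 - p
    return math.comb(n, x) * p ** x * q ** (n - x)

def binomial_cdf(x, n, p):
    """F(x) = P(X <= x), the sum of individual probabilities up to x."""
    return sum(binomial_pmf(i, n, p) for i in range(x + 1))

# Two heads in five tosses of a fair coin: 10 sequences, each with probability 1/32.
p_two_heads = binomial_pmf(2, 5, 0.5)
f_two_heads = binomial_cdf(2, 5, 0.5)
print(p_two_heads, f_two_heads)
```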
3-32
n=5
p
x 0.01 0.05 0.10 0.20 0.30 0.40 0.50 0.60 0.70 0.80 0.90 0.95 0.99
0 .951 .774 .590 .328 .168 .078 .031 .010 .002 .000 .000 .000 .000
1 .999 .977 .919 .737 .528 .337 .187 .087 .031 .007 .000 .000 .000
2 1.000 .999 .991 .942 .837 .683 .500 .317 .163 .058 .009 .001 .000
3 1.000 1.000 1.000 .993 .969 .913 .813 .663 .472 .263 .081 .023 .001
4 1.000 1.000 1.000 1.000 .998 .990 .969 .922 .832 .672 .410 .226 .049
Cumulative Binomial Probability Distribution and Binomial Probability
Distribution of H, the Number of Heads
h      F(h)    P(h)
0      0.031   0.031
1      0.187   0.156
2      0.500   0.313
3      0.813   0.313
Deriving Individual Probabilities from Cumulative Probabilities
\[ F(x) = P(X \le x) = \sum_{\text{all } i \le x} P(i) \]
Problem
\[ \sigma^2 = V(X) = npq \]
\[ \sigma = SD(X) = \sqrt{npq} \]
\[ \sigma_H^2 = V(H) = (5)(.5)(.5) = 1.25 \]
[Figure: binomial probability distributions P(x) for n = 10 and n = 20 with p = 0.1, 0.3, and 0.5]
Example:
• The average number of customers arriving at a Big-Bazaar store during a one-minute interval
will vary from hour to hour, day to day, and month to month.
• The number of flaws per pair of jeans might vary from Monday to Friday.
Problem
Suppose bank customers arrive randomly on weekday afternoons at
an average of 3.2 customers every 4 minutes. What is the probability
of exactly 5 customers arriving in a 4-minute interval on a weekday
afternoon?
Problem
Bank customers arrive randomly on weekday afternoons at an
average of 3.2 customers every 4 minutes. What is the
probability of having more than 7 customers in a 4-minute
interval on a weekday afternoon?
Problem
A bank has an average random arrival rate of 3.2 customers
every 4 minutes. What is the probability of getting exactly 10
customers during an 8-minute interval?
Sol.
Never adjust or change x in a problem. Just because 10 customers arrive
in one 8-minute interval does not mean that there would necessarily
have been five customers in a 4-minute interval. There is no guarantee
how the 10 customers are spread over the 8-minute interval. Always
adjust the lambda value.
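The three bank problems above can be sketched with the Poisson formula P(x) = λ^x e^(-λ) / x!, which is the standard Poisson probability function. The helper names below are illustrative. For the 8-minute interval, λ is doubled to 6.4 while x stays at 10, as the note advises.

```python
import math

def poisson_pmf(x, lam):
    """P(X = x) for a Poisson random variable with mean lam."""
    return lam ** x * math.exp(-lam) / math.factorial(x)

def poisson_more_than(x, lam):
    """P(X > x) = 1 - P(X <= x)."""
    return 1 - sum(poisson_pmf(i, lam) for i in range(x + 1))

# Exactly 5 customers in a 4-minute interval, lambda = 3.2.
p_exactly_5 = poisson_pmf(5, 3.2)

# More than 7 customers in a 4-minute interval, lambda = 3.2.
p_more_than_7 = poisson_more_than(7, 3.2)

# Exactly 10 customers in an 8-minute interval: adjust lambda, not x.
lam_8_minutes = 3.2 * (8 / 4)
p_exactly_10 = poisson_pmf(10, lam_8_minutes)

print(round(p_exactly_5, 4), round(p_more_than_7, 4), round(p_exactly_10, 4))
```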
Using the Poisson Tables
Problem
If a real estate office sells 1.6 houses on an average weekday
and sales of houses on weekdays are Poisson distributed,
what is the probability of selling exactly 4 houses in one day?
What is the probability of selling no houses in one day? What
is the probability of selling more than five houses in a day?
What is the probability of selling 10 or more houses in a day?
What is the probability of selling exactly 4 houses in two
days?
Mean and Standard Deviation of a
Poisson Distribution
A continuous random variable is a random variable that can take on any value in
an interval of numbers.
The probabilities associated with a continuous random variable X are determined by the
probability density function of the random variable. The function, denoted f(x), has the
following properties.
F(x) = P(X ≤ x) = Area under f(x) between the smallest possible value of X (often −∞)
and the point x.
P(a ≤ X ≤ b) = Area under f(x) between a and b
= F(b) − F(a)
[Figure: density f(x) with the area between a and b shaded]
Standard Distributions
Six continuous distributions :
1. uniform distribution
2. normal distribution
3. exponential distribution
4. t distribution
5. chi-square distribution
6. F distribution
3-71
Uniform Distribution
x1 x2
3-72
Problem
Suppose a production line is set up to manufacture machine braces
in lots of five per minute during a shift. When the lots are
weighed, variation among the weights is detected, with lot weights
ranging from 41 to 47 grams in a uniform distribution.
a) What is the mean and standard deviation ?
b) Determine the probability that a lot weighs between 42 and 45
grams.
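A sketch of the uniform-distribution calculations, assuming the standard results for a uniform distribution on [a, b]: mean (a + b)/2, standard deviation (b − a)/√12, and probability equal to interval length divided by (b − a). The function names are illustrative.

```python
import math

def uniform_mean(a, b):
    """Mean of a uniform distribution on [a, b]."""
    return (a + b) / 2

def uniform_sd(a, b):
    """Standard deviation of a uniform distribution on [a, b]."""
    return (b - a) / math.sqrt(12)

def uniform_prob(a, b, x1, x2):
    """P(x1 <= X <= x2), clipping the interval to [a, b]."""
    low = max(a, x1)
    high = min(b, x2)
    return max(high - low, 0) / (b - a)

# Lot weights uniformly distributed from 41 to 47 grams.
lot_mean = uniform_mean(41, 47)
lot_sd = uniform_sd(41, 47)
lot_prob = uniform_prob(41, 47, 42, 45)
print(lot_mean, round(lot_sd, 3), lot_prob)
```

The same helpers apply to the assembly-time and insurance-cost problems by changing a, b, and the interval endpoints.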
Problem
Suppose the amount of time it takes to assemble a plastic
module ranges from 27 to 39 seconds and that assembly
times are uniformly distributed. What is the probability that
a given assembly will take between 30 and 35 seconds?
Fewer than 30 seconds?
Problem
According to the National Association of Insurance
Commissioners, the average annual cost for automobile
insurance in the United States in a recent year was $691.
Suppose automobile insurance costs are uniformly
distributed in the United States with a range of from $200 to
$1,182. What is the standard deviation of this uniform
distribution? What is the height of the distribution? What
is the probability that a person’s annual cost for
automobile insurance in the United States is between
$410 and $825?
Problem
Exponential Distribution
• It is closely related to the Poisson distribution. Whereas the Poisson
distribution is discrete and describes random occurrences over some
interval, the exponential distribution is continuous and describes a
probability distribution of the times between random occurrences
• Characteristics of the exponential distribution.
■ It is a continuous distribution.
■ It is a family of distributions.
■ It is skewed to the right.
■ The x values range from zero to infinity.
■ Its apex is always at x = 0.
■ The curve steadily decreases as x gets larger
Exponential Distribution
Mean = 1 / λ
σ=1/λ
Introduction
As n increases, the binomial distribution approaches a ...
[Figure: binomial distributions P(x) for n = 6, n = 10, and n = 14 with p = .5]
\[ f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}} \quad \text{for } -\infty < x < \infty \]
...
[Figure: standard normal density f(z), with μ = 0 and σ = 1]
[Figure: standard normal density f(z) with z = 1.56 marked]
z     .00     .01     .02     .03     .04     .05     .06     .07     .08     .09
0.8   0.2881  0.2910  0.2939  0.2967  0.2995  0.3023  0.3051  0.3078  0.3106  0.3133
0.9   0.3159  0.3186  0.3212  0.3238  0.3264  0.3289  0.3315  0.3340  0.3365  0.3389
1.0   0.3413  0.3438  0.3461  0.3485  0.3508  0.3531  0.3554  0.3577  0.3599  0.3621
1.1   0.3643  0.3665  0.3686  0.3708  0.3729  0.3749  0.3770  0.3790  0.3810  0.3830
1.2   0.3849  0.3869  0.3888  0.3907  0.3925  0.3944  0.3962  0.3980  0.3997  0.4015
1.3   0.4032  0.4049  0.4066  0.4082  0.4099  0.4115  0.4131  0.4147  0.4162  0.4177
1.4   0.4192  0.4207  0.4222  0.4236  0.4251  0.4265  0.4279  0.4292  0.4306  0.4319
1.5   0.4332  0.4345  0.4357  0.4370  0.4382  0.4394  0.4406  0.4418  0.4429  0.4441
1.6   0.4452  0.4463  0.4474  0.4484  0.4495  0.4505  0.4515  0.4525  0.4535  0.4545
1.7   0.4554  0.4564  0.4573  0.4582  0.4591  0.4599  0.4608  0.4616  0.4625  0.4633
1.8   0.4641  0.4649  0.4656  0.4664  0.4671  0.4678  0.4686  0.4693  0.4699  0.4706
1.9   0.4713  0.4719  0.4726  0.4732  0.4738  0.4744  0.4750  0.4756  0.4761  0.4767
2.0   0.4772  0.4778  0.4783  0.4788  0.4793  0.4798  0.4803  0.4808  0.4812  0.4817
1. Find table area for 2.00
F(2) = P(Z ≤ 2.00) = .5 + .4772 = .9772
Area between 1 and 2
P(1 ≤ Z ≤ 2) = .9772 − .8413 = 0.1359
4-10
Finding Values of the Standard Normal
Random Variable: P(0 < Z < 1.28) and
P (Z< 1.28)
z     .00     .01     .02     .03     .04     .05     .06     .07     .08     .09
0.0   0.0000  0.0040  0.0080  0.0120  0.0160  0.0199  0.0239  0.0279  0.0319  0.0359
0.1   0.0398  0.0438  0.0478  0.0517  0.0557  0.0596  0.0636  0.0675  0.0714  0.0753
0.2   0.0793  0.0832  0.0871  0.0910  0.0948  0.0987  0.1026  0.1064  0.1103  0.1141
0.3   0.1179  0.1217  0.1255  0.1293  0.1331  0.1368  0.1406  0.1443  0.1480  0.1517
0.4   0.1554  0.1591  0.1628  0.1664  0.1700  0.1736  0.1772  0.1808  0.1844  0.1879
0.5   0.1915  0.1950  0.1985  0.2019  0.2054  0.2088  0.2123  0.2157  0.2190  0.2224
0.6   0.2257  0.2291  0.2324  0.2357  0.2389  0.2422  0.2454  0.2486  0.2517  0.2549
0.7   0.2580  0.2611  0.2642  0.2673  0.2704  0.2734  0.2764  0.2794  0.2823  0.2852
0.8   0.2881  0.2910  0.2939  0.2967  0.2995  0.3023  0.3051  0.3078  0.3106  0.3133
To find z such that P(0 ≤ Z ≤ z) = .40: Z = 1.28
P(Z ≤ 1.28) ≈ .90
Total area in center = .99
Area in center left = .495
Area in center right = .495
Area in left tail = .005
Area in right tail = .005
Look to the table of standard normal probabilities to find that:
z.005 = 2.575
P(−2.575 ≤ Z ≤ 2.575) = .99
[Figure: the probabilities P(25 ≤ X ≤ 35), P(47 ≤ Y ≤ 53), and P(−1 ≤ Z ≤ 1), with Z ~ N(0,1), shown as areas under the corresponding normal probability density functions]
The transformation of X to Z:
\[ Z = \frac{X - \mu_x}{\sigma_x} \]
Transformation
(1) Subtraction: (X − μx)
The inverse transformation of Z to X:
\[ X = \mu_x + Z\sigma_x \]
[Figure: normal distribution with μ = 50, σ = 10, and the standard normal distribution with σ = 1.0]
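\[ P(X < a) = P\left(Z < \frac{a - \mu}{\sigma}\right) \]
\[ P(X > b) = P\left(Z > \frac{b - \mu}{\sigma}\right) \]
\[ P(a < X < b) = P\left(\frac{a - \mu}{\sigma} < Z < \frac{b - \mu}{\sigma}\right) \]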
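\[ P(100 < X < 180) = P\left(\frac{100 - \mu}{\sigma} < \frac{X - \mu}{\sigma} < \frac{180 - \mu}{\sigma}\right) \]
\[ = P\left(\frac{100 - 160}{30} < Z < \frac{180 - 160}{30}\right) = P(-2 < Z < .6666) \]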
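\[ P(X < 150) = P\left(\frac{X - \mu}{\sigma} < \frac{150 - \mu}{\sigma}\right) = P\left(Z < \frac{150 - 127}{22}\right) \]
\[ = P(Z < 1.045) = 0.5 + 0.3520 = 0.8520 \]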
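\[ P(394 < X < 399) = P\left(\frac{394 - \mu}{\sigma} < \frac{X - \mu}{\sigma} < \frac{399 - \mu}{\sigma}\right) \]
\[ = P\left(\frac{394 - 383}{12} < Z < \frac{399 - 383}{12}\right) = P(0.9166 < Z < 1.333) \]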
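The table lookups in the three examples above can be cross-checked numerically. The sketch below standardizes X to Z and evaluates the standard normal cumulative distribution through the error function in Python's standard library; the function names are illustrative. Results differ slightly from table values because the tables round z to two decimals.

```python
import math

def standard_normal_cdf(z):
    """P(Z <= z) for the standard normal distribution, via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def normal_prob(mu, sigma, lower=-math.inf, upper=math.inf):
    """P(lower < X < upper) for X ~ N(mu, sigma^2), using Z = (X - mu) / sigma."""
    z_upper = (upper - mu) / sigma
    z_lower = (lower - mu) / sigma
    return standard_normal_cdf(z_upper) - standard_normal_cdf(z_lower)

p_100_180 = normal_prob(160, 30, lower=100, upper=180)
p_below_150 = normal_prob(127, 22, upper=150)
p_394_399 = normal_prob(383, 12, lower=394, upper=399)
print(round(p_100_180, 4), round(p_below_150, 4), round(p_394_399, 4))
```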
f(z)
0.
0.
random variable will be within 2 1
X ~ N(124, 12²)
P(X > x) = 0.10. Find x.
P(X > x) = 0.10 and P(Z > 1.28) ≈ 0.10
x = μ + zσ = 124 + (1.28)(12) = 139.36
X ~ N(5.7, 0.5²)
P(X > x) = 0.01. Find x.
P(X > x) = 0.01 and P(Z > 2.33) ≈ 0.01
x = μ + zσ = 5.7 + (2.33)(0.5) = 6.865
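The inverse lookup can also be sketched with `statistics.NormalDist` from Python's standard library, whose `inv_cdf` method returns the value below which a given proportion of the distribution lies. The function name `upper_tail_cutoff` is illustrative. Small differences from the slide values come from rounding z to two decimals in the table.

```python
from statistics import NormalDist

def upper_tail_cutoff(mu, sigma, tail_area):
    """Return x such that P(X > x) = tail_area for X ~ N(mu, sigma^2)."""
    return NormalDist(mu, sigma).inv_cdf(1 - tail_area)

x_first = upper_tail_cutoff(124, 12, 0.10)    # table answer: 139.36
x_second = upper_tail_cutoff(5.7, 0.5, 0.01)  # table answer: 6.865
print(round(x_first, 2), round(x_second, 3))
```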
4-30
Approximating a Binomial
Probability Using the Normal
Distribution
For large n values, the binomial distribution is cumbersome to
analyze without a computer. Binomial probability tables go only
to n = 25. The normal distribution is a good approximation for
binomial distribution problems for large values of n.
[Figures: binomial probability distributions P(x) for n = 10 and n = 20 with p = 0.1, 0.3, and 0.5, and a binomial distribution P(x) shown next to a normal density f(x)]
= 0.0336
Correcting for Continuity
Problem
Problem
Using Statistics
• Statistical Inference:
Predict and forecast values of population parameters...
Test hypotheses about values of population parameters...
Make decisions...
On basis of sample statistics derived from limited and incomplete
sample information.
Make generalizations about the characteristics of a population...
On the basis of observations of a sample, a part of a population
Sampling - Frame
• Every research study has a target population that consists of
the individuals, institutions, or entities that are the object of
investigation. The sample is taken from a population list, map,
directory, or other source used to represent the population. This
list, map, or directory is called the frame.
• School lists, trade association lists, or even lists sold by list
brokers.
• Frames that have overregistration contain the target population
units plus some additional units.
• Frames that have underregistration contain fewer units than does
the target population.
• Sampling is done from the frame, not the target population.
• In theory, the target population and the frame are the same.
• In reality, a business researcher’s goal is to minimize the
differences between the frame and the target population.
Types of Sampling
The two main types of sampling are random and nonrandom.
• In random sampling every unit of the population has the same
probability of being selected into the sample. Random sampling
implies that chance enters into the process of selection.
• Winners of nationwide magazine sweepstakes or numbers
selected as state lottery winners are selected by some random
draw of numbers.
• In nonrandom sampling not every unit of the population has the
same probability of being selected into the sample. Members of
nonrandom samples are not selected by chance.
• People might be selected because they are at the right place at the
right time or because they know the people conducting the
research.
Random Sampling Techniques
[Figure: sample points (X) and the sample mean (X̄)]
Sampling Distributions
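X      P(X)    XP(X)   (X-μ)   (X-μ)²   P(X)(X-μ)²
1      0.125   0.125   -3.5    12.25    1.53125
2      0.125   0.250   -2.5    6.25     0.78125
3      0.125   0.375   -1.5    2.25     0.28125
4      0.125   0.500   -0.5    0.25     0.03125
5      0.125   0.625   0.5     0.25     0.03125
6      0.125   0.750   1.5     2.25     0.28125
7      0.125   0.875   2.5     6.25     0.78125
8      0.125   1.000   3.5     12.25    1.53125
       1.000   4.500                    5.25000
E(X) = μ = 4.5
V(X) = σ² = 5.25
SD(X) = σ = 2.2913
[Figure: uniform population distribution P(X) for X = 1, ..., 8]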
X̄      P(X̄)       X̄P(X̄)      (X̄-μ)   (X̄-μ)²   P(X̄)(X̄-μ)²
1.0    0.015625   0.015625   -3.5    12.25    0.191406
1.5    0.031250   0.046875   -3.0    9.00     0.281250
2.0    0.046875   0.093750   -2.5    6.25     0.292969
2.5    0.062500   0.156250   -2.0    4.00     0.250000
3.0    0.078125   0.234375   -1.5    2.25     0.175781
3.5    0.093750   0.328125   -1.0    1.00     0.093750
4.0    0.109375   0.437500   -0.5    0.25     0.027344
4.5    0.125000   0.562500   0.0     0.00     0.000000
5.0    0.109375   0.546875   0.5     0.25     0.027344
5.5    0.093750   0.515625   1.0     1.00     0.093750
6.0    0.078125   0.468750   1.5     2.25     0.175781
6.5    0.062500   0.406250   2.0     4.00     0.250000
7.0    0.046875   0.328125   2.5     6.25     0.292969
7.5    0.031250   0.234375   3.0     9.00     0.281250
8.0    0.015625   0.125000   3.5     12.25    0.191406
       1.000000   4.500000                    2.625000
E(X̄) = μx̄ = 4.5
V(X̄) = σ²x̄ = 2.625
SD(X̄) = σx̄ = 1.6202
[Figure: sampling distribution P(X̄) of the sample mean for samples of size 2]
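The sampling distribution in the table can be reproduced by enumerating every ordered sample of size 2 drawn with replacement from the population 1, ..., 8. This is a sketch under the assumption, consistent with the table totals, that all 64 samples are equally likely; the variable names are illustrative.

```python
from itertools import product
import math

population = range(1, 9)          # values 1..8, each with probability 1/8
n = 2                             # sample size

# All 64 equally likely ordered samples and their sample means.
sample_means = [sum(sample) / n for sample in product(population, repeat=n)]

mean_of_means = sum(sample_means) / len(sample_means)
var_of_means = sum((m - mean_of_means) ** 2 for m in sample_means) / len(sample_means)
sd_of_means = math.sqrt(var_of_means)

# Population variance for comparison with V(X-bar) = sigma^2 / n.
pop_mean = sum(population) / len(population)
pop_var = sum((x - pop_mean) ** 2 for x in population) / len(population)

print(mean_of_means, var_of_means, round(sd_of_means, 4), pop_var / n)
```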
5-14
P(X)
0.1
symmetric. 1 2 3 4
X
5 6 7 8
0.00
1.01.52.02.53.03.54.04.5 5.05.56.06.57.07.58.0
X
5-15
n=5
When sampling from a population 0.2
P(
X)
0
P(
X)
0.1
(n >30). Large n
0.
4
0.
X)
f(
3
\[ V(\bar{X}) = \sigma_{\bar{X}}^2 = \frac{\sigma^2}{n} \]
The standard deviation of the sample mean, known as the
standard error of the mean, is equal to the population standard
deviation divided by the square root of the sample size:
\[ SD(\bar{X}) = \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} \]
This means that, as the sample size increases, the sampling distribution
of the sample mean remains centered on the population mean, but becomes
more compactly distributed around that population mean.
[Figure: sampling distribution of the sample mean for a normal population, including n = 2]
The Central Limit Theorem
The central limit theorem creates the potential for applying
the normal distribution to many problems when sample size is
sufficiently large. Sample means that have been computed
for random samples drawn from normally distributed
populations are normally distributed.
However, the real advantage of the central limit theorem
comes when sample data drawn from populations not normally
distributed or from populations of unknown shape also can be analyzed
by using the normal distribution, because the sample means are
approximately normally distributed for sufficiently large sample sizes.
Population
n=2
n = 30
X X X X
Problem
The mean expenditure per customer at a tire store is $85.00, with a
standard deviation of $9.00. If a random sample of 40 customers is
taken, what is the probability that the sample average expenditure
per customer for this sample will be $87.00 or more?
Solution
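A worked sketch of this solution, standardizing the sample mean with the standard error σ/√n and using the standard normal table (area .4207 for z = 1.41):

```latex
\[
z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \frac{87 - 85}{9 / \sqrt{40}} = \frac{2}{1.42} \approx 1.41
\]
\[
P(\bar{X} \ge 87) = P(Z \ge 1.41) = .5 - .4207 = .0793
\]
```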
Problem
Mercury makes a 2.4 liter V-6 engine, the Laser XRi, used in
speedboats. The company’s engineers believe the engine
delivers an average power of 220 horsepower and that the
standard deviation of power delivered is 15 HP. A potential
buyer intends to sample 100 engines (each engine is to be run a
single time). What is the probability that the sample mean will
be less than 217HP?
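Solution
\[ P(\bar{X} < 217) = P\left(\frac{\bar{X} - \mu}{\sigma / \sqrt{n}} < \frac{217 - \mu}{\sigma / \sqrt{n}}\right) \]
\[ = P\left(Z < \frac{217 - 220}{15 / \sqrt{100}}\right) = P\left(Z < \frac{217 - 220}{15 / 10}\right) \]
\[ = P(Z < -2) = 0.0228 \]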
Sampling from a Finite Population
In cases of a finite population, a statistical adjustment can be
made to the z formula for sample means. The adjustment is
called the finite correction factor
Problem
A production company’s 350 hourly employees average 37.6 years
of age, with a standard deviation of 8.3 years. If a random sample
of 45 hourly employees is taken, what is the probability that the
sample will have an average age of less than 40 years?
Solution
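A sketch of the calculation in Python, assuming the usual form of the finite correction factor, √((N − n)/(N − 1)), multiplied into the standard error σ/√n. The function and parameter names are illustrative. Leaving out the population size gives the ordinary z formula, shown here on the engine example for comparison.

```python
import math
from statistics import NormalDist

def prob_sample_mean_below(x_bar, mu, sigma, n, population_size=None):
    """P(sample mean < x_bar), with an optional finite population correction."""
    standard_error = sigma / math.sqrt(n)
    if population_size is not None:
        # Finite correction factor sqrt((N - n) / (N - 1)).
        standard_error *= math.sqrt((population_size - n) / (population_size - 1))
    z = (x_bar - mu) / standard_error
    return NormalDist().cdf(z)

# 350 hourly employees, mean age 37.6, sd 8.3, sample of 45, average age below 40.
p_age = prob_sample_mean_below(40, 37.6, 8.3, 45, population_size=350)

# Engine example without the correction: P(sample mean < 217).
p_engine = prob_sample_mean_below(217, 220, 15, 100)
print(round(p_age, 4), round(p_engine, 4))
```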
0 .4
)
P
0 .2
number of trials, n.
0 .0
01 2
n=10,p=0.3
X
0.3
$ p
Sample proportion:
0.2
P(X
N
)
0.1
0.0
deviation X
(
)
P 0.1
p(1
p) n 0.0
0 1 2 3 4 5 6 7 8 9 1011 21 31 41 51
0 1 2 3 4 5 6 7 8 9 10 11 12 13 1415
X
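Solution
n = 100
p = 0.25 = E(p̂)
np = (100)(0.25) = 25
\[ V(\hat{p}) = \frac{p(1 - p)}{n} = \frac{(.25)(.75)}{100} = 0.001875 \]
\[ SD(\hat{p}) = \sqrt{\frac{p(1 - p)}{n}} = \sqrt{0.001875} = 0.04330127 \]
\[ P(\hat{p} > 0.20) = P\left(\frac{\hat{p} - p}{\sqrt{p(1 - p)/n}} > \frac{.20 - p}{\sqrt{p(1 - p)/n}}\right) = P\left(z > \frac{.20 - .25}{\sqrt{(.25)(.75)/100}}\right) \]
\[ = P\left(z > \frac{-.05}{.0433}\right) = P(z > -1.15) = 0.8749 \]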
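The same calculation can be sketched in Python using the normal approximation for the sample proportion. The function name is illustrative. The result differs slightly from 0.8749 because the table rounds z to −1.15.

```python
import math
from statistics import NormalDist

def prob_sample_proportion_above(threshold, p, n):
    """P(p-hat > threshold) using the normal approximation with SD sqrt(p(1-p)/n)."""
    standard_error = math.sqrt(p * (1 - p) / n)
    z = (threshold - p) / standard_error
    return 1 - NormalDist().cdf(z)

sd_p_hat = math.sqrt(0.25 * 0.75 / 100)
p_above_20 = prob_sample_proportion_above(0.20, 0.25, 100)
print(round(sd_p_hat, 8), round(p_above_20, 4))
```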
5-38
◼ A sample statistic is a numerical measure of a summary
characteristic of a sample.
◼ A population parameter is a numerical measure of a summary
characteristic of a population.
5-39
Estimators
5-42
Unbiasedness
{
Bias
Efficiency
Consistency
n = 10 n = 100
The sample variance (the sum of the squared deviations from the
sample mean divided by (n-1)) is an unbiased estimator of the
population variance. In contrast, the average squared deviation
from the sample mean is a biased (though consistent) estimator of
the population variance.
\[ E(s^2) = E\left[\frac{\sum (x - \bar{x})^2}{n - 1}\right] = \sigma^2 \]
\[ E\left[\frac{\sum (x - \bar{x})^2}{n}\right] < \sigma^2 \]
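The bias can be illustrated exactly, without simulation, by averaging both estimators over every equally likely sample. The sketch below assumes a small population of the values 1 to 8 sampled with replacement at n = 2 (an illustrative choice, not from this slide) and uses `statistics.variance` (divisor n − 1) and `statistics.pvariance` (divisor n).

```python
from itertools import product
from statistics import pvariance, variance

population = list(range(1, 9))               # illustrative population 1..8
samples = list(product(population, repeat=2))  # all 64 ordered samples of size 2

# Average of each estimator over all equally likely samples.
avg_unbiased = sum(variance(s) for s in samples) / len(samples)   # divisor n - 1
avg_biased = sum(pvariance(s) for s in samples) / len(samples)    # divisor n

sigma_squared = pvariance(population)        # true population variance
print(sigma_squared, avg_unbiased, avg_biased)
```

Under these assumptions the first average equals the population variance, while the second falls short of it by the factor (n − 1)/n.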
6-1
Confidence Intervals
6-2
Using Statistics
Types of Estimators
• Point Estimate
A single-valued estimate.
A single element chosen from a sampling distribution.
Conveys little information about the actual value of the
population parameter, about the accuracy of the estimate.
[Figure: sampling distribution of the sample mean, with 95% of sample means within 1.96σ/√n of μ, 2.5% falling below the interval and 2.5% falling above it]
Conversely, about 2.5% can be expected to be above μ + 1.96σ/√n and
2.5% can be expected to be below μ − 1.96σ/√n.
So 5% can be expected to fall outside the interval
\[ \left[\mu - 1.96\frac{\sigma}{\sqrt{n}},\ \mu + 1.96\frac{\sigma}{\sqrt{n}}\right] \]
\[ P\left(\mu - 1.96\frac{\sigma}{\sqrt{n}} < \bar{x} < \mu + 1.96\frac{\sigma}{\sqrt{n}}\right) = 0.95 \]
or
\[ P\left(\bar{x} - 1.96\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + 1.96\frac{\sigma}{\sqrt{n}}\right) = 0.95 \]
6-8
(1 - α)   α/2     z(α/2)
0.99      0.005   2.576
0.95      0.025   1.960
[Figure: standard normal distribution f(z) with central area (1 - α) and tail areas α/2.]
6-12
Problem
An economist wants to estimate the average amount in checking
accounts at banks in a given region. A random sample of 100 accounts
gives x-bar = $357.60 and s = $140.00. Give a 95% confidence
interval for μ, the average amount in any checking account at a bank in
the given region.
$\bar{x} \pm z_{0.025}\frac{s}{\sqrt{n}} = 357.60 \pm 1.96\frac{140.00}{\sqrt{100}} = 357.60 \pm 27.44 = [330.16,\ 385.04]$
6-21
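The checking-account interval can be reproduced with a few lines of Python. This is a minimal sketch: the function name `z_confidence_interval` is made up for illustration, and the critical value comes from the standard library's `statistics.NormalDist` rather than a printed table, so it uses 1.95996 instead of the rounded 1.96.

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(mean, sd, n, confidence=0.95):
    """Large-sample interval: mean +/- z(alpha/2) * sd / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sd / sqrt(n)
    return mean - margin, mean + margin


low, high = z_confidence_interval(357.60, 140.00, 100)
print(f"95% CI: [{low:.2f}, {high:.2f}]")
```

To two decimals this agrees with the interval [330.16, 385.04] computed by hand.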
The estimator of the population proportion, p, is the sample proportion, $\hat{p}$.
For estimating p, a sample is considered large enough when both np and nq (where q = 1 - p) are greater
than 5.
6-22
Example
Example
$\bar{x} \pm t_{\alpha/2}\frac{s}{\sqrt{n}}$
where $t_{\alpha/2}$ is the value of the t distribution with n-1 degrees of
freedom that cuts off a tail area of α/2 to its right.
6-31
The t Distribution
df t0.100 t0.050 t0.025 t0.010 t0.005
--- ----- ----- ------ ------ ------
1 3.078 6.314 12.706 31.821 63.657
2 1.886 2.920 4.303 6.965 9.925
3 1.638 2.353 3.182 4.541 5.841
4 1.533 2.132 2.776 3.747 4.604
5 1.476 2.015 2.571 3.365 4.032
6 1.440 1.943 2.447 3.143 3.707
7 1.415 1.895 2.365 2.998 3.499
8 1.397 1.860 2.306 2.896 3.355
9 1.383 1.833 2.262 2.821 3.250
10 1.372 1.812 2.228 2.764 3.169
11 1.363 1.796 2.201 2.718 3.106
12 1.356 1.782 2.179 2.681 3.055
13 1.350 1.771 2.160 2.650 3.012
14 1.345 1.761 2.145 2.624 2.977
15 1.341 1.753 2.131 2.602 2.947
16 1.337 1.746 2.120 2.583 2.921
17 1.333 1.740 2.110 2.567 2.898
18 1.330 1.734 2.101 2.552 2.878
19 1.328 1.729 2.093 2.539 2.861
20 1.325 1.725 2.086 2.528 2.845
21 1.323 1.721 2.080 2.518 2.831
22 1.321 1.717 2.074 2.508 2.819
23 1.319 1.714 2.069 2.500 2.807
24 1.318 1.711 2.064 2.492 2.797
25 1.316 1.708 2.060 2.485 2.787
26 1.315 1.706 2.056 2.479 2.779
27 1.314 1.703 2.052 2.473 2.771
28 1.313 1.701 2.048 2.467 2.763
29 1.311 1.699 2.045 2.462 2.756
30 1.310 1.697 2.042 2.457 2.750
40 1.303 1.684 2.021 2.423 2.704
60 1.296 1.671 2.000 2.390 2.660
120 1.289 1.658 1.980 2.358 2.617
∞ 1.282 1.645 1.960 2.326 2.576
[Figure: t distribution f(t) with df = 10. The area to the right of 1.372 and to the left of -1.372 is 0.10 each; the area to the right of 2.228 and to the left of -2.228 is 0.025 each.]
Whenever σ is not known (and the population is assumed normal), the correct distribution to use is
the t distribution with n-1 degrees of freedom.
Note, however, that for large degrees of freedom, the t distribution is approximated well by the Z
distribution.
6-32
Example
A stock market analyst wants to estimate the average return on a
certain stock. A random sample of 15 days yields an average
(annualized) return of 10.37% and a standard deviation of s =
3.5%. Assuming that the returns are normal, give a 95%
confidence interval for the average return on this stock.
$\bar{x} = 10.37\%$
df t0.100 t0.050 t0.025 t0.010 t0.005
--- ----- ----- ------ ------ ------
1 3.078 6.314 12.706 31.821 63.657
. . . . . .
13 1.350 1.771 2.160 2.650 3.012
14 1.345 1.761 2.145 2.624 2.977
15 1.341 1.753 2.131 2.602 2.947
. . . . . .
The critical value of t for df = (n - 1) = (15 - 1) = 14 and a right-tail
area of 0.025 is: $t_{0.025} = 2.145$
The corresponding confidence interval or interval estimate is:
$\bar{x} \pm t_{0.025}\frac{s}{\sqrt{n}} = 10.37 \pm 2.145\frac{3.5}{\sqrt{15}} = 10.37 \pm 1.94 = [8.43,\ 12.31]$
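The same small-sample interval can be sketched in Python. The function name `t_confidence_interval` is hypothetical, and because the Python standard library has no t quantile function, the critical value 2.145 (df = 14, right-tail area 0.025) is passed in by hand from the t table.

```python
from math import sqrt


def t_confidence_interval(mean, s, n, t_crit):
    """Small-sample interval: mean +/- t(alpha/2) * s / sqrt(n).

    t_crit must be looked up for n - 1 degrees of freedom.
    """
    margin = t_crit * s / sqrt(n)
    return mean - margin, mean + margin


# Stock return example: x-bar = 10.37%, s = 3.5%, n = 15, t(0.025, 14 df) = 2.145
low, high = t_confidence_interval(10.37, 3.5, 15, t_crit=2.145)
print(f"95% CI: [{low:.2f}, {high:.2f}]")
```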
Problem
The owner of a large equipment rental company wants to make a
rather quick estimate of the average number of days a piece of
ditchdigging equipment is rented out per person per time. The company
has records of all rentals, but the amount of time required to
conduct an audit of all accounts would be prohibitive. The owner
decides to take a random sample of rental invoices. Fourteen different
rentals of ditchdiggers are selected randomly from the files, yielding
the following data. She uses these data to construct a 99% confidence
interval to estimate the average number of days that a ditchdigger is
rented and assumes that the number of days per rental is normally
distributed in the population.
Solution
distribution.
6-37
The chi-square distribution is skewed to the right. It becomes more symmetric as the degrees of freedom increase.
[Figure: chi-square densities f(χ²) for df = 30 and df = 50.]
In sampling from a normal population, the random variable:
$\chi^2 = \frac{(n-1)s^2}{\sigma^2}$
has a chi-square distribution with (n - 1) degrees of
freedom.
6-38
df .005 .010 .025 .050 .100 .900 .950 .975 .990 .995
1 0.0000393 0.000157 0.000982 0.00393 0.0158 2.71 3.84 5.02 6.63 7.88
2 0.0100 0.0201 0.0506 0.103 0.211 4.61 5.99 7.38 9.21 10.60
3 0.0717 0.115 0.216 0.352 0.584 6.25 7.81 9.35 11.34 12.84
4 0.207 0.297 0.484 0.711 1.06 7.78 9.49 11.14 13.28 14.86
5 0.412 0.554 0.831 1.15 1.61 9.24 11.07 12.83 15.09 16.75
6 0.676 0.872 1.24 1.64 2.20 10.64 12.59 14.45 16.81 18.55
7 0.989 1.24 1.69 2.17 2.83 12.02 14.07 16.01 18.48 20.28
8 1.34 1.65 2.18 2.73 3.49 13.36 15.51 17.53 20.09 21.95
9 1.73 2.09 2.70 3.33 4.17 14.68 16.92 19.02 21.67 23.59
10 2.16 2.56 3.25 3.94 4.87 15.99 18.31 20.48 23.21 25.19
11 2.60 3.05 3.82 4.57 5.58 17.28 19.68 21.92 24.72 26.76
12 3.07 3.57 4.40 5.23 6.30 18.55 21.03 23.34 26.22 28.30
13 3.57 4.11 5.01 5.89 7.04 19.81 22.36 24.74 27.69 29.82
14 4.07 4.66 5.63 6.57 7.79 21.06 23.68 26.12 29.14 31.32
15 4.60 5.23 6.26 7.26 8.55 22.31 25.00 27.49 30.58 32.80
16 5.14 5.81 6.91 7.96 9.31 23.54 26.30 28.85 32.00 34.27
17 5.70 6.41 7.56 8.67 10.09 24.77 27.59 30.19 33.41 35.72
18 6.26 7.01 8.23 9.39 10.86 25.99 28.87 31.53 34.81 37.16
19 6.84 7.63 8.91 10.12 11.65 27.20 30.14 32.85 36.19 38.58
20 7.43 8.26 9.59 10.85 12.44 28.41 31.41 34.17 37.57 40.00
21 8.03 8.90 10.28 11.59 13.24 29.62 32.67 35.48 38.93 41.40
22 8.64 9.54 10.98 12.34 14.04 30.81 33.92 36.78 40.29 42.80
23 9.26 10.20 11.69 13.09 14.85 32.01 35.17 38.08 41.64 44.18
24 9.89 10.86 12.40 13.85 15.66 33.20 36.42 39.36 42.98 45.56
25 10.52 11.52 13.12 14.61 16.47 34.38 37.65 40.65 44.31 46.93
26 11.16 12.20 13.84 15.38 17.29 35.56 38.89 41.92 45.64 48.29
27 11.81 12.88 14.57 16.15 18.11 36.74 40.11 43.19 46.96 49.65
28 12.46 13.56 15.31 16.93 18.94 37.92 41.34 44.46 48.28 50.99
29 13.12 14.26 16.05 17.71 19.77 39.09 42.56 45.72 49.59 52.34
30 13.79 14.95 16.79 18.49 20.60 40.26 43.77 46.98 50.89 53.67
6-39
A (1 - α)100% confidence interval for the population variance σ²:
$\left[\frac{(n-1)s^2}{\chi^2_{\alpha/2}},\ \frac{(n-1)s^2}{\chi^2_{1-\alpha/2}}\right]$
where $\chi^2_{\alpha/2}$ is the value of the chi-square distribution with n - 1 degrees of freedom
that cuts off an area α/2 to its right and $\chi^2_{1-\alpha/2}$ is the value of the distribution that
cuts off an area of α/2 to its left (equivalently, an area of 1 - α/2 to its
right).
* Note: Because the chi-square distribution is skewed, the confidence interval for the
population variance is not symmetric.
Problem
The U.S. Bureau of Labor Statistics publishes data on the
hourly compensation costs for production workers in
manufacturing for various countries. The latest figures
published for Greece show that the average hourly wage
for a production worker in manufacturing is $16.10. Suppose
the business council of Greece wants to know how consistent
this figure is. They randomly select 25 production workers in
manufacturing from across the country and determine that the
standard deviation of hourly wages for such workers is $1.12.
Use this information to develop a 95% confidence interval to
estimate the population variance for the hourly wages of
production workers in manufacturing in Greece. Assume that
the hourly wages for production workers across the country in
manufacturing are normally distributed.
Solution
A 95% confidence means that alpha is 1 - .95 = .05. This
value is split to determine the area in each tail of the chi-
square distribution: α/2 = .025.
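The remaining steps of this solution can be sketched as follows. The function name `variance_confidence_interval` is made up for illustration. The two chi-square values are read from the df = 24 row of the table above (12.40 in the .025 column and 39.36 in the .975 column) and are typed in by hand, since the Python standard library has no chi-square quantile function.

```python
def variance_confidence_interval(s2, n, chi2_right, chi2_left):
    """Interval [(n-1)s^2 / chi2_right, (n-1)s^2 / chi2_left].

    chi2_right cuts off area alpha/2 in the right tail,
    chi2_left cuts off area alpha/2 in the left tail (both with n - 1 df).
    """
    df = n - 1
    return df * s2 / chi2_right, df * s2 / chi2_left


# Greek wage example: s = 1.12, n = 25, so df = 24
low, high = variance_confidence_interval(1.12 ** 2, 25,
                                         chi2_right=39.36, chi2_left=12.40)
print(f"95% CI for the variance: [{low:.4f}, {high:.4f}]")
```

Under these table values the interval for σ² comes out at roughly 0.76 to 2.43, and it is not symmetric around s² = 1.2544.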
Example
In an automated process, a machine fills cans of coffee. If the average
amount filled is different from what it should be, the machine may
be adjusted to correct the mean. If the variance of the filling
process is too high, however, the machine is out of control and needs
to be repaired. Therefore, from time to time regular checks of the
variance of the filling process are made. This is done by randomly
sampling filled cans, measuring their amounts, and computing the
sample variance. A random sample of 30 cans gives an estimate s² =
18,540.
Give a 95% confidence interval for the population variance, σ².
6-43
Solution
$\left[\frac{(n-1)s^2}{\chi^2_{\alpha/2}},\ \frac{(n-1)s^2}{\chi^2_{1-\alpha/2}}\right] = \left[\frac{(30-1)18540}{45.7},\ \frac{(30-1)18540}{16.0}\right] = [11765,\ 33604]$
7-1
Hypothesis
Testing
7-2
Using Statistics
• A hypothesis is a statement or assertion about the state of nature
(about the true value of an unknown population parameter):
  The accused is innocent
  μ = 100
• Every hypothesis implies its contradiction or alternative:
  The accused is guilty
  μ ≠ 100
• A hypothesis is either true or false, and you may fail to reject it or
you may reject it on the basis of information:
  Trial testimony and evidence
  Sample data
7-3
• H0 and H1 are:
Mutually exclusive
– Only one can be true.
Exhaustive
– Together they cover all possibilities, so one or the other must be
true.
Example
Suppose flour packaged by a manufacturer is sold by weight; and a
particular size of package is supposed to average 40 ounces.
Suppose the manufacturer wants to test to determine whether their
packaging process is out of control as determined by the weight of
the flour packages.
7-6
The tails of a statistical test are determined by the need for an action. If action
is to be taken if a parameter is greater than some value a, then the alternative
hypothesis is that the parameter is greater than a, and the test is a right-tailed
test.
H0: μ ≤ 50
H1: μ > 50
H0: p ≥ 40%
H1: p < 40%
H0:
H1:
Example
A vendor claims that his company fills any accepted order, on the
average, in at most six working days. You suspect that the average is
greater than six working days and want to test the claim. How will you
set up the null and alternative hypotheses?
Example
A manufacturer of golf balls claims that the variance of the weights of
the company’s golf balls is controlled to within 0.0028 oz². If you wish
to test this claim, how will you set up the null and alternative
hypotheses?
Example
At least 20% of the visitors to a particular commercial Web site where
an electronic product is sold are said to end up ordering the product. If
you wish to test this claim, how will you set up the null and alternative
hypotheses?
Example
Suppose a company has held an 18% share of the market. However,
because of an increased marketing effort, company officials believe the
company’s market share is now greater than 18%, and the officials
would like to prove it.
The company officials are interested only in “proving” that the
market share has increased, so including the “less than” sign in the
null hypothesis is confusing. Also, if the equal part of the null hypothesis is
rejected because the market share is seemingly greater, then certainly the
“less than” portion of the null hypothesis is also rejected because it is
further away from “greater than” than is “equal.” Using this logic, the null
hypothesis for the market share problem can be written as
H0: p = .18
Hypothesis Testing Process
When is the sample mean so far away from the population mean that
the null hypothesis is rejected?
• Confidence Interval
7-19
Rejection Region
Nonrejection Region
• The nonrejection region is the range of values (also
determined by the critical points) that will lead us not to reject
the null hypothesis if the test statistic should fall within this
region. The nonrejection region is designed so that, before
the sampling takes place, our test statistic will have a
probability 1 - α of falling within the nonrejection region if the null
hypothesis is true.
In a two-tailed test, the rejection region consists of the
values in both tails of the sampling distribution.
7-21
Decision Making
• A decision may be correct in two ways:
Fail to reject a true H0
Reject a false H0
• A decision may be incorrect in two ways:
Type I Error: Reject a true H0
• The probability of a Type I error is denoted by α.
Type II Error: Fail to reject a false H0
• The probability of a Type II error is denoted by β.
7-24
The “state of nature” is how things actually are and the “action” is
the decision that the business researcher actually makes
Type - I Error
Suppose the flour-packaging process actually is “in control” and is
averaging 40 ounces of flour per package. Suppose also that a business
researcher randomly selects 100 packages, weighs the contents of
each, and computes a sample mean. It is possible, by chance, to
randomly select 100 of the more extreme packages (mostly heavy
weighted or mostly light weighted) resulting in a mean that falls in the
rejection region. The decision is to reject the null hypothesis even
though the population mean is actually 40 ounces. In this case, the
business researcher has committed a Type I error.
Similarly, if a manager fires an employee because some evidence indicates
that she is stealing from the company and she really is not stealing
from the company, the manager has committed a Type I error.
Suppose a worker on the assembly line of a large manufacturer
hears an unusual sound and decides to shut the line down. If the
sound turns out not to be related to the assembly line and no
problems are occurring with the assembly line, the worker has committed a Type I error.
7-29
Statistical Significance
While the null hypothesis is maintained to be true throughout a
hypothesis test, until sample data lead to a rejection, the aim of a
hypothesis test is often to disprove the null hypothesis in favor of
the alternative hypothesis. This is because we can determine and
regulate α, the probability of a Type I error, making it as small as we
desire, such as 0.01 or 0.05. Thus, when we reject a null hypothesis,
we have a high level of confidence in our decision, since we know
there is a small probability that we have made an error.
Example
A survey of CPAs across the United States found that the average
net income for sole proprietor CPAs is $74,914. Because this survey
is now more than ten years old, an accounting researcher wants to
test this figure by taking a random sample of 112 sole proprietor
accountants in the United States which showed a sample mean of
$78,695. Assume the population standard deviation of net incomes
for sole proprietor CPAs is $14,530.
Solution
Step 1
Step 2
Solution
Step 3
Type I error rate, or alpha, which is .05 in this problem
Step 4
Because the test is two-tailed and alpha is .05, there is α/2 or .025 area in each
of the tails of the distribution. Thus, the rejection region is in the two ends
of the distribution with 2.5% of the area in each.
Solution
Step 5
Step 6
Because this test statistic, z = 2.75, is greater than the critical value of z in the
upper tail of the distribution, z = +1.96,
Step 7
Reject the null hypothesis
Step 8
Statistically, the
researcher has enough
7-34
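The CPA income test can be traced in code. This is a sketch under stated assumptions: the function name `z_test_two_tailed` is hypothetical, the hypotheses are taken to be H0: μ = 74,914 against H1: μ ≠ 74,914 (the text describes the test as two-tailed), and the critical value and p-value come from the standard library's `statistics.NormalDist` instead of a z table.

```python
from math import sqrt
from statistics import NormalDist


def z_test_two_tailed(x_bar, mu0, sigma, n, alpha=0.05):
    """Two-tailed z test for a mean with known population sigma."""
    z = (x_bar - mu0) / (sigma / sqrt(n))
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, z_crit, p_value, abs(z) > z_crit


z, z_crit, p_value, reject = z_test_two_tailed(78695, 74914, 14530, 112)
print(f"z = {z:.2f}, critical values = +/-{z_crit:.2f}, p-value = {p_value:.4f}")
print("Reject H0" if reject else "Do not reject H0")
```

The computed statistic rounds to the z = 2.75 quoted in Step 6 and exceeds the upper critical value, so the decision matches Step 7.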
[Figure: population mean under H0 and the 95% confidence interval around the observed sample mean.]
Note that the population mean may be 28 (the null hypothesis might be true), but
then the observed sample mean, 31.5, would be a very unlikely occurrence. There
is still the small chance (α = 0.05) that we might reject the true null hypothesis.
α represents the level of significance of the test.
7-36
Nonrejection Region
If the observed sample mean falls within the nonrejection region, then you fail
to reject the null hypothesis as true. Construct a 95% nonrejection region
around the hypothesized population mean, and compare it with the 95%
confidence interval around the observed sample mean:
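95% nonrejection region around the population mean:
$\mu_0 \pm z_{.025}\frac{s}{\sqrt{n}} = 28 \pm 1.96\frac{5}{\sqrt{100}} = 28 \pm .98 = [27.02,\ 28.98]$
95% confidence interval around the sample mean:
$\bar{x} \pm z_{.025}\frac{s}{\sqrt{n}} = 31.5 \pm 1.96\frac{5}{\sqrt{100}} = 31.5 \pm .98 = [30.52,\ 32.48]$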
The nonrejection region and the confidence interval are the same width, but
centered on different points. In this instance, the nonrejection region does not
include the observed sample mean, and the confidence interval does not
include the hypothesized population mean.
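The comparison of the two intervals can be sketched in a few lines. The helper name `interval` is made up for illustration, and z = 1.96 is the rounded table value used in the text.

```python
from math import sqrt


def interval(center, s, n, z=1.96):
    """Return center +/- z * s / sqrt(n) as a (low, high) pair."""
    half_width = z * s / sqrt(n)
    return center - half_width, center + half_width


mu0, x_bar, s, n = 28, 31.5, 5, 100
nonrejection = interval(mu0, s, n)     # centered on the hypothesized mean
confidence = interval(x_bar, s, n)     # centered on the observed sample mean

# The two checks are equivalent because both intervals have the same width.
mean_outside_region = not (nonrejection[0] <= x_bar <= nonrejection[1])
mu0_outside_ci = not (confidence[0] <= mu0 <= confidence[1])
print(nonrejection, confidence, mean_outside_region, mu0_outside_ci)
```

Both checks give the same answer here, which is why either the nonrejection region or the confidence interval can be used to reach the decision.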
7-37
Solution
A company that delivers packages within a large metropolitan
area claims that it takes an average of 28 minutes for a package to
be delivered from your door to the destination. Suppose you
want to carry out a hypothesis test of this claim at 95%
confidence by taking a sample of 100 packages, with an average
delivery time of
31.5 minutes and a standard deviation of 5 minutes.
Set the null and alternative hypotheses:
H0: μ = 28
H1: μ ≠ 28
$\bar{x} \pm z_{.025}\frac{s}{\sqrt{n}} = 31.5 \pm 1.96\frac{5}{\sqrt{100}}$
The p-Value
7-42
Example
An automatic bottling machine fills cola into two liter (2000 cc) bottles. A
consumer advocate wants to test the null hypothesis that the average amount
filled by the machine into a bottle is at least 2000 cc. A random sample of 40
bottles coming out of the machine was selected and the exact content of the
selected bottles are recorded. The sample mean was 1999.6 cc. The population
standard deviation is known from past experience to be 1.30 cc. Test this
hypothesis at 95% confidence with the help of the p-value.
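n = 40, x̄ = 1999.6, σ = 1.3
H0: μ ≥ 2000
H1: μ < 2000
For α = 0.05, the critical value of z is -1.645.
The test statistic is:
$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{1999.6 - 2000}{1.3/\sqrt{40}} = -1.95$
Do not reject H0 if: [z ≥ -1.645]
Reject H0 if: [z < -1.645]
Since z = -1.95 < -1.645: Reject H0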
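H0: μ ≥ 2000
H1: μ < 2000
n = 40, μ0 = 2000, x̄ = 1999.6, σ = 1.3
The test statistic is:
$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{1999.6 - 2000}{1.3/\sqrt{40}} = -1.95$
$p\text{-value} = P(Z < -1.95) = 0.5000 - 0.4744 = 0.0256$
Since the p-value is smaller than α = 0.05, H0 is rejected.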
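The p-value for the bottling example can also be computed directly instead of being read from a z table. The variable names below are illustrative, and the left-tail area comes from the standard library's `statistics.NormalDist`, so the result differs slightly from the table-based 0.0256 because z is not rounded to -1.95 first.

```python
from math import sqrt
from statistics import NormalDist

x_bar, mu0, sigma, n, alpha = 1999.6, 2000.0, 1.3, 40, 0.05

# Left-tailed test: H0: mu >= 2000 against H1: mu < 2000
z = (x_bar - mu0) / (sigma / sqrt(n))
p_value = NormalDist().cdf(z)          # area to the left of the test statistic
reject = p_value < alpha

print(f"z = {z:.2f}, p-value = {p_value:.4f}")
print("Reject H0" if reject else "Do not reject H0")
```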
Example
Step 2
Step 3
Type I error rate, or alpha, which is .05 in this problem
Solution
Step 4
Because the test is one tailed test and alpha is .05, there is .05 area in the
left tail of the distribution.
Step 5,6
I)
II)
Solution
Step 7
1. Observed value method
Because the observed test statistic is not less than the critical value and is
not in the rejection region, the statistical conclusion is that the null
hypothesis cannot be rejected
2. Critical Value method
Because the mean obtained from the sample data is 4.156, the researchers
fail to reject the null hypothesis
3. p-Value method
The observed test statistic is z = -1.42. The probability of getting a z value at
least this extreme when the null hypothesis is true is .5000 - .4222 = .0778.
Step 8
The test does not result in enough evidence to conclude that U.S. managers
think it is less important to use customer service as a means of retaining
customers than do UK managers. Customer service is an important tool for
retaining customers in both countries according to managers.
7-49
7-62
$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$
7-64
$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$
7-65
Example
A coin is to be tested for fairness. It is tossed 25 times and only 8 heads are
observed. Test if the coin is fair at an α of 5% (significance level).
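One way to carry out this test is an exact binomial calculation, sketched below. The function name `binomial_two_sided_p` is hypothetical, and doubling the smaller tail probability is an assumed convention for the two-sided p-value; the slides do not show which method was used.

```python
from math import comb


def binomial_two_sided_p(successes, n, p0=0.5):
    """Two-sided p-value: twice the smaller binomial tail probability."""
    def pmf(k):
        return comb(n, k) * p0 ** k * (1 - p0) ** (n - k)

    lower_tail = sum(pmf(k) for k in range(0, successes + 1))
    upper_tail = sum(pmf(k) for k in range(successes, n + 1))
    return min(1.0, 2 * min(lower_tail, upper_tail))


# H0: p = 0.5 (fair coin) against H1: p != 0.5, with 8 heads in 25 tosses
p_value = binomial_two_sided_p(8, 25)
print(f"p-value = {p_value:.4f}")
print("Reject H0" if p_value < 0.05 else "Do not reject H0")
```

Under this convention the p-value is larger than 0.05, so the hypothesis that the coin is fair is not rejected at the 5% level.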
Note: Since the chi-square table only provides the critical values, it
cannot be used to calculate exact p-values. As in the case of the t-tables,
only a range of possible values can be inferred.
7-68
Example
A manufacturer of golf balls claims that they control the weights of the golf balls
accurately so that the variance of the weights is not more than 1 mg². A random sample
of 31 golf balls yields a sample variance of 1.62 mg². Is that sufficient evidence to
reject the claim at an α of 5%?
7-69
Solution
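A sketch of the chi-square test for this claim follows. It assumes the hypotheses H0: σ² ≤ 1 against H1: σ² > 1 (a right-tailed test), uses a made-up function name, and takes the critical value 43.77 by hand from the df = 30 row, .950 column of the chi-square table, since the Python standard library has no chi-square quantile function.

```python
def chi_square_statistic(s2, sigma0_sq, n):
    """Test statistic (n - 1) * s^2 / sigma0^2 with n - 1 degrees of freedom."""
    return (n - 1) * s2 / sigma0_sq


chi2 = chi_square_statistic(s2=1.62, sigma0_sq=1.0, n=31)
critical_value = 43.77        # table value: df = 30, right-tail area 0.05
reject = chi2 > critical_value

print(f"chi-square = {chi2:.1f}, critical value = {critical_value}")
print("Reject H0" if reject else "Do not reject H0")
```

With these inputs the statistic is 48.6, which exceeds the critical value, so under the stated assumptions the manufacturer's claim would be rejected at the 5% level.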
Example
Example
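[Figure: standard normal density with rejection regions of area .025 beyond the critical values z = ±1.96.]
s = 7.8
$z = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{14.6 - 12}{0.65} = \frac{2.6}{0.65} = 4$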
Since the test statistic falls in the upper rejection region, H0 is rejected, and we may
conclude that the average amount of carry-on baggage is more than 12 pounds.
7-73
Examples
An insurance company believes that, over the last few years, the average liability
insurance per board seat in companies defined as “small companies” has been
$2000. Using α = 0.01, test this hypothesis using Growth Resources, Inc. survey
data.
n = 100, x̄ = 2700, s = 947
H0: μ = 2000
H1: μ ≠ 2000
Example
Example
The average time it takes a computer to perform a certain task is believed to be 3.24
seconds. It was decided to test the statistical hypothesis that the average performance
time of the task using the new algorithm is the same, against the alternative that the
average performance time is no longer the same, at the 0.05 level of significance.
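n = 200, x̄ = 3.48, s = 2.8
For α = 0.05, critical values of z are ±1.96.
The test statistic is:
$z = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{3.48 - 3.24}{2.8/\sqrt{200}} = \frac{0.24}{0.20} = 1.21$
Do not reject H0 if: [-1.96 ≤ z ≤ 1.96]
Reject H0 if: [z < -1.96] or [z > 1.96]
Since z = 1.21 falls between -1.96 and 1.96: Do not reject H0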
7-78
Example
Example
According to the Japanese National Land Agency, average land prices in central Tokyo
soared 49% in the first six months of 1995. An international real estate investment
company wants to test this claim against the alternative that the average price did not
rise by 49%, at a 0.01 level of significance.
n = 18, x̄ = 38, s = 14
H0: μ = 49
H1: μ ≠ 49
For α = 0.01 and (18-1) = 17 df, critical values of t are ±2.898.
The test statistic is:
$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{38 - 49}{14/\sqrt{18}} = \frac{-11}{3.3} = -3.33$
Do not reject H0 if: [-2.898 ≤ t ≤ 2.898]
Since t = -3.33 < -2.898: Reject H0
Example
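[Figure: t distribution with lower rejection region, nonrejection region, and upper rejection region.]
Since the test statistic falls in the lower rejection region, H0 is rejected, and we may
conclude that the average price has risen by less than 49%.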
7-82
Example
Canon, Inc., has introduced a copying machine that features two-color copying capability
in a compact system copier. The average speed of the standard compact system copier is
27 copies per minute. Suppose the company wants to test whether the new copier has the
same average speed as its standard compact copier. Conduct a test at an α = 0.05 level of
significance.
n = 24, x̄ = 24.6, s = 7.4
H0: μ = 27
H1: μ ≠ 27
For α = 0.05 and (24-1) = 23 df, critical values of t are ±2.069.
The test statistic is:
$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{24.6 - 27}{7.4/\sqrt{24}} = \frac{-2.4}{1.51} = -1.59$
Do not reject H0 if: [-2.069 ≤ t ≤ 2.069]
Reject H0 if: [t < -2.069] or [t > 2.069]
Since t = -1.59 falls between -2.069 and 2.069: Do not reject H0
7-84
Example
The t Distribution
[Figure: t density with central area .95 and tail areas .025 beyond the critical values t = -2.069 and t = 2.069, marking the lower rejection region, the nonrejection region, and the upper rejection region.]
Since the test statistic falls in the nonrejection region, H0 is not rejected, and we may
not conclude that the average speed is different from 27 copies per minute.
7-85
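The two-tailed t tests in the copier and Tokyo land price examples follow the same recipe, which can be sketched as below. The function names are made up for illustration, and the critical values (2.069 for 23 df at α = 0.05, 2.898 for 17 df at α = 0.01) are typed in from the t table because the Python standard library has no t quantile function.

```python
from math import sqrt


def t_statistic(x_bar, mu0, s, n):
    """One-sample t statistic (x_bar - mu0) / (s / sqrt(n))."""
    return (x_bar - mu0) / (s / sqrt(n))


def two_tailed_decision(t, t_crit):
    """Reject H0 when t falls beyond either critical value."""
    return "reject H0" if abs(t) > t_crit else "do not reject H0"


t_copier = t_statistic(24.6, 27, 7.4, 24)    # copier speed example
t_tokyo = t_statistic(38, 49, 14, 18)        # Tokyo land price example

print(f"copier: t = {t_copier:.2f}, {two_tailed_decision(t_copier, 2.069)}")
print(f"Tokyo:  t = {t_tokyo:.2f}, {two_tailed_decision(t_tokyo, 2.898)}")
```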
Example
An investment analyst for Goldman Sachs and Company wanted to test the hypothesis
made by British securities experts that 70% of all foreign investors in the British market
were American. The analyst gathered a random sample of 210 accounts of foreign
investors in London and found that 130 were owned by U.S. citizens. At the α = 0.05
level of significance, is there evidence to reject the claim of the British securities
experts?
n = 210, $\hat{p} = \frac{130}{210} = 0.619$
H0: p = 0.70
H1: p ≠ 0.70
For α = 0.05, critical values of z are ±1.96.
The test statistic is:
$z = \frac{\hat{p} - p_0}{\sqrt{p_0 q_0/n}} = \frac{0.619 - 0.70}{\sqrt{(0.70)(0.30)/210}} = \frac{-0.081}{0.0316} = -2.5614$
Do not reject H0 if: [-1.96 ≤ z ≤ 1.96]
Reject H0 if: [z < -1.96] or [z > 1.96]
Since z = -2.5614 < -1.96: Reject H0
7-87
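The large-sample test for a proportion in the foreign investors example can be sketched as follows. The function name `proportion_z` is hypothetical. Because the code does not round the sample proportion to 0.619 or the standard error to 0.0316, its z value differs from the hand calculation in the third decimal place.

```python
from math import sqrt


def proportion_z(successes, n, p0):
    """z statistic for H0: p = p0, using the standard error under H0."""
    p_hat = successes / n
    return (p_hat - p0) / sqrt(p0 * (1 - p0) / n)


z = proportion_z(130, 210, 0.70)
reject = abs(z) > 1.96          # two-tailed test at alpha = 0.05

print(f"z = {z:.2f}")
print("Reject H0" if reject else "Do not reject H0")
```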
Example
The EPA sets limits on the concentrations of pollutants emitted by various industries. Suppose that the
upper allowable limit on the emission of vinyl chloride is set at an average of 55 ppm within a range of two
miles around the plant emitting this chemical. To check compliance with this rule, the EPA collects a
random sample of 100 readings at different times and dates within the two-mile range around the plant. The
findings are that the sample average concentration is 60 ppm and the sample standard deviation is 20 ppm.
Is there evidence to conclude that the plant in question is violating the law?
n = 100, x̄ = 60, s = 20
H0: μ ≤ 55
H1: μ > 55
For α = 0.01, the critical value of z is 2.326.
The test statistic is:
$z = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{60 - 55}{20/\sqrt{100}} = \frac{5}{2} = 2.5$
Do not reject H0 if: [z ≤ 2.326]
Reject H0 if: [z > 2.326]
Since z = 2.5 > 2.326: Reject H0
7-89
Example
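[Figure: standard normal density f(z) with a nonrejection region of area 0.99 to the left of the critical value z = 2.326 and the rejection region to its right; the test statistic z = 2.5 falls in the rejection region.]
Since the test statistic falls in the rejection region, H0 is rejected, and we may
conclude that the average concentration of vinyl chloride is more
than 55 ppm.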
7-90
Example
A certain kind of packaged food bears the following statement on the package: “Average net weight 12 oz.”
Suppose that a consumer group has been receiving complaints from users of the product who believe that they are
getting smaller quantities than the manufacturer states on the package. The consumer group wants, therefore, to
test the hypothesis that the average net weight of the product in question is 12 oz. versus the alternative that the
packages are, on average, underfilled. A random sample of 144 packages of the food product is collected, and it
is found that the average net weight in the sample is 11.8 oz. and the sample standard deviation is 6 oz. Given
these findings, is there evidence the manufacturer is underfilling the packages?
n = 144, x̄ = 11.8, s = 6
H0: μ ≥ 12
H1: μ < 12
For α = 0.05, the critical value of z is -1.645.
The test statistic is:
$z = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{11.8 - 12}{6/\sqrt{144}} = \frac{-.2}{.5} = -0.4$
Do not reject H0 if: [z ≥ -1.645]
Reject H0 if: [z < -1.645]
Since z = -0.4 ≥ -1.645: Do not reject H0
7-92
Example
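[Figure: standard normal density with the rejection region to the left of the critical value z = -1.645 and the nonrejection region to its right; the test statistic z = -0.4 falls in the nonrejection region.]
Since the test statistic falls in the nonrejection region, H0 is not rejected, and we may
not conclude that the manufacturer is underfilling packages on average.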
7-93
A floodlight is said to last an average of 65 hours. A competitor believes that the average life of the
floodlight is less than that stated by the manufacturer and sets out to prove that the manufacturer’s
claim is false. A random sample of 21 floodlight elements is chosen and shows that the sample
average is 62.5 hours and the sample standard deviation is 3. Using α = 0.01, determine whether
there is evidence to conclude that the manufacturer’s claim is false.
H0: μ ≥ 65
H1: μ < 65
n = 21
For α = 0.01 and (21 - 1) = 20 df, the critical value is -2.528.
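The floodlight test can be carried through numerically. The sketch below assumes the usual one-sample t statistic, t = (x̄ - μ0)/(s/√n); the function name `one_sample_t` is illustrative, and the critical value -2.528 is taken from the slide rather than computed.

```python
import math

def one_sample_t(sample_mean, mu0, s, n):
    """t statistic for a test about one mean when sigma is unknown."""
    return (sample_mean - mu0) / (s / math.sqrt(n))

# Floodlight example: x-bar = 62.5, mu0 = 65, s = 3, n = 21
t_stat = one_sample_t(62.5, 65, 3, 21)

# Left-tailed critical point for alpha = 0.01 with 20 df (value from the slide)
critical_value = -2.528
reject_h0 = t_stat < critical_value

print(f"t = {t_stat:.3f}, critical value = {critical_value}, reject H0: {reject_h0}")
```

Under these assumptions the statistic falls well below the critical value, so the data would support the competitor's belief that the average life is less than 65 hours.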
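H0: p ≤ 0.0096
H1: p > 0.0096
n = 600
[Figure: standard normal density with the rejection region to the right of z = 1.282; the test statistic z = 0.519 falls in the nonrejection region.]
Since the test statistic falls in the nonrejection region, there is not enough evidence to conclude that the proportion of all hotels in the country that meet the association’s standards is greater than 0.0096.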
[Figure: two standard normal densities f(z). Left panel: p-value = area to the right of the test statistic z = 0.519, which is 0.3018. Right panel: p-value = area to the right of the test statistic z = 2.5, which is 0.0062.]
The p-value is the probability of obtaining a value of the test statistic as extreme as,
or more extreme than, the actual value obtained, when the null hypothesis is true.
The p-value is the smallest level of significance, α, at which the null hypothesis
may be rejected using the obtained value of the test statistic.
[Figure: standard normal density f(z) with test statistics marked at z = -0.4 and z = 0.4.]
The further away in the tail of the distribution the test statistic falls, the smaller
is the p-value and, hence, the more convinced we are that the null hypothesis
is false and should be rejected.
In a right-tailed test, the p-value is the area to the right of the test statistic if the
test statistic is positive.
In a left-tailed test, the p-value is the area to the left of the test statistic if
the test statistic is negative.
In a two-tailed test, the p-value is twice the area to the right of a positive
test statistic or to the left of a negative test statistic.
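The three p-value rules above can be sketched with the standard normal distribution from Python's standard library. The helper name `p_value` and its `tail` argument are illustrative choices, not part of the slides; the two right-tailed values reproduce the areas shown for z = 0.519 and z = 2.5.

```python
from statistics import NormalDist

def p_value(z, tail):
    """p-value of a z test statistic for a 'right', 'left', or 'two' tailed test."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if tail == "right":
        return 1 - std_normal.cdf(z)             # area to the right of z
    if tail == "left":
        return std_normal.cdf(z)                 # area to the left of z
    if tail == "two":
        return 2 * (1 - std_normal.cdf(abs(z)))  # twice the tail area beyond |z|
    raise ValueError("tail must be 'right', 'left', or 'two'")

p_right_small = p_value(0.519, "right")
p_right_large = p_value(2.5, "right")
p_two = p_value(-0.4, "two")

print(f"{p_right_small:.4f} {p_right_large:.4f} {p_two:.4f}")
```

The further the statistic sits in the tail, the smaller the returned area, which is the point made in the text.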
Using Statistics
• Inferences about differences between parameters of two
populations
Paired-Observation Comparisons of Means
Test statistic for the paired-observations test:
\[ t = \frac{\bar{D} - \mu_{D_0}}{s_D/\sqrt{n}}, \qquad df = n - 1 \]
where
D̄ = sample average of the differences
sD = sample standard deviation of the differences
n = sample size
μD0 = mean of the population of differences under the null hypothesis
An assumption for this test is that the differences between the two populations are
normally distributed. The after measurement is not independent of the before measurement.
Example
A random sample of 16 viewers of Home Shopping Network was
selected for an experiment. All viewers in the sample had
recorded the amount of money they spent shopping during the
holiday season of the previous year. The next year, these people
were given access to the cable network and were asked to keep a
record of their total purchases during the holiday season. Home
Shopping Network managers want to test the null hypothesis that
their service does not increase shopping volume, versus the
alternative hypothesis that it does.
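H0: μD ≤ 0
H1: μD > 0
df = (n - 1) = (16 - 1) = 15
Test statistic:
\[ t = \frac{\bar{D} - \mu_{D_0}}{s_D/\sqrt{n}} \]
D̄ = $32.81, sD = $55.75, μD0 = 0
Solution
\[ t = \frac{\bar{D} - \mu_{D_0}}{s_D/\sqrt{n}} = \frac{32.81 - 0}{55.75/\sqrt{16}} = 2.354 \]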
[Figure: t distribution with df = 15, showing the critical points t0.05 = 1.753, t0.025 = 2.131, and t0.01 = 2.602. The test statistic, 2.354, lies between t0.025 and t0.01, so it falls in the rejection region at α = 0.05 and α = 0.025 but not at α = 0.01.]
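The paired-observations statistic for the Home Shopping Network example can be checked from the summary figures alone. This is a minimal sketch: the function name `paired_t` is illustrative, and the critical points are the tabled values quoted on the slide, not computed here.

```python
import math

def paired_t(d_bar, s_d, n, mu_d0=0.0):
    """t statistic and degrees of freedom for the paired-observations test."""
    t = (d_bar - mu_d0) / (s_d / math.sqrt(n))
    return t, n - 1

# Summary statistics from the example: D-bar = 32.81, sD = 55.75, n = 16
t_stat, df = paired_t(32.81, 55.75, 16)

# Right-tailed critical points for 15 df, as quoted on the slide
critical_points = {0.05: 1.753, 0.025: 2.131, 0.01: 2.602}
decisions = {alpha: t_stat > cp for alpha, cp in critical_points.items()}

print(f"t = {t_stat:.3f}, df = {df}")
for alpha, rejected in decisions.items():
    print(f"alpha = {alpha}: reject H0 = {rejected}")
```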
Example
Suppose a stock market investor is interested in determining whether
there is a significant difference in the P/E (price to earnings) ratio for
companies from one year to the next. In an effort to study this
question, the investor randomly samples nine companies from the
Handbook of Common Stocks and records the P/E ratios for each of
these companies at the end of year 1 and at the end of year 2. Test at
α = 0.01
Solution
There is not enough evidence from the data to declare a significant difference in
the average P/E ratio between year 1 and year 2.
Example
Consumers are asked to rate a company both before and
after viewing a video on the company twice a day for a
week. The data is displayed as follows. Use an alpha of .05
to test whether there is a significant increase in the
ratings of the company after the one-week video treatment.
Assume that differences in ratings are normally distributed
in the population.
Solution
or
Both populations are normal and σ1 and σ2 are both known
Small sample test if:
unknown
8-16
◼ H0: 1 -2 0
◼ H1: 1 -2 0
• V: Difference between two population means is greater than D
μ1 > μ2 + D
◼ H0: μ1 - μ2 ≤ D
◼ H1: μ1 - μ2 > D
Test Statistic
Example
A random sample of 32 advertising managers from across the United States is
taken. The advertising managers are contacted by telephone and asked what their
annual salary is. A similar random sample is taken of 34 auditing managers. The
resulting salary data is listed below. The analyst wants to test at α= 0.05 whether
there is a difference in the average wage of an advertising manager and an
auditing manager.
Solution
The business researcher rejects the null hypothesis and can say that there
is a significant difference between the average annual wage of an advertising
manager and the average annual wage of an auditing manager.
Example
A sample of 87 professional working women showed that
the average amount paid annually into a private pension
fund per person was $3352. The population standard
deviation is $1100. A sample of 76 professional working men showed
that the average amount paid annually into a private
pension fund per person was $5727, with a population
standard deviation of $1700. A women’s activist group
wants to “prove” that women do not pay as much per
year as men into private pension funds. If they use α= .01
and these sample data, will they be able to reject a null
hypothesis that women annually pay the same as or
more than men into private pension funds? Use the
eight-step hypothesis-testing process.
Solution
The null hypothesis is that women pay the same as or more than men.
The alternative hypothesis is that women pay less than men.
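The numbers in this example can be worked through with the two-sample z statistic for known population standard deviations, z = [(x̄1 - x̄2) - D0] / √(σ1²/n1 + σ2²/n2). The sketch below is an illustration under that assumption, with women as population 1 and men as population 2; the function name `two_sample_z` is hypothetical.

```python
import math
from statistics import NormalDist

def two_sample_z(mean1, sigma1, n1, mean2, sigma2, n2, d0=0.0):
    """z statistic for the difference between two means, sigmas known."""
    standard_error = math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    return ((mean1 - mean2) - d0) / standard_error

# Population 1: women (n = 87, mean = 3352, sigma = 1100)
# Population 2: men   (n = 76, mean = 5727, sigma = 1700)
z_stat = two_sample_z(3352, 1100, 87, 5727, 1700, 76)

# Left-tailed test at alpha = 0.01: reject H0 if z is below the 1st percentile
alpha = 0.01
critical_value = NormalDist().inv_cdf(alpha)
reject_h0 = z_stat < critical_value

print(f"z = {z_stat:.2f}, critical value = {critical_value:.3f}, reject H0: {reject_h0}")
```

With these inputs the statistic lies far in the left tail, so the null hypothesis that women pay the same as or more than men would be rejected at α = .01.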
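Example
Population 1: Preferred Visa
n1 = 1200, x̄1 = 452, σ1 = 212
Population 2: Gold Card
n2 = 800, x̄2 = 523, σ2 = 185
H0: μ1 - μ2 = 0
H1: μ1 - μ2 ≠ 0
\[ z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)_0}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} = \frac{(452 - 523) - 0}{\sqrt{\dfrac{212^2}{1200} + \dfrac{185^2}{800}}} = \frac{-71}{\sqrt{80.2346}} = \frac{-71}{8.96} = -7.926 \]
p-value: p(z < -7.926) ≈ 0
H0 is rejected at any common level of significance.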
Since the value of the test statistic is far below the lower critical point, the
null hypothesis may be rejected, and we may conclude that there is a
statistically significant difference between the population means
of Gold Card and Preferred Visa cardholders.
[Figure: standard normal distribution with rejection regions below -2.576 and above 2.576 and the nonrejection region between them; test statistic = -7.926.]
Example
Suppose that the makers of Duracell batteries want to
demonstrate that their size AA battery lasts an average of at
least 45 minutes longer than Duracell’s main competitor, the
Energizer. Two independent random samples of 100 batteries of
each kind are selected, and the batteries are run continuously
until they are no longer operational. The sample average life for
Duracell is found to be 308 minutes. The result for the
Energizer batteries is 254 minutes. Assume the population standard
deviations are σ1 = 84 minutes and σ2 = 67 minutes. Is there evidence to
substantiate Duracell’s claim that its batteries last, on average,
at least 45 minutes longer than Energizer batteries of the same
size?
Example
Population 1: Duracell
n1 = 100, x̄1 = 308, σ1 = 84
Population 2: Energizer
n2 = 100, x̄2 = 254, σ2 = 67
H0: μ1 - μ2 ≤ 45
H1: μ1 - μ2 > 45
\[ z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)_0}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} = \frac{(308 - 254) - 45}{\sqrt{\dfrac{84^2}{100} + \dfrac{67^2}{100}}} = \frac{9}{\sqrt{115.45}} = \frac{9}{10.75} = 0.838 \]
p-value: p(z > 0.838) = 0.201
H0 may not be rejected at any common level of significance.
◼ If we may assume that the population variances σ1² and σ2² are equal (even though
unknown), then the two sample variances, s1² and s2², provide two separate estimators of
the common population variance. Combining the two separate estimates into a pooled
estimate should give us a better estimate than either sample variance by itself.
[Figure: dot plots of Sample 1 and Sample 2 scattered around their sample means x̄1 and x̄2.]
From sample 1 we get the estimate s1² with (n1 - 1) degrees of freedom.
From sample 2 we get the estimate s2² with (n2 - 1) degrees of freedom.
From both samples together we get a pooled estimate, sp², with (n1 - 1) + (n2 - 1) = (n1 + n2 - 2)
total degrees of freedom.
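The pooled estimate of the common population variance is:
\[ s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} \]
The degrees of freedom associated with this estimator is:
df = (n1 + n2 - 2)
The estimate of the standard deviation of (x̄1 - x̄2) is given by:
\[ \sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} \]
Test statistic for the difference between two population means, assuming equal
population variances:
\[ t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)_0}{\sqrt{s_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} \]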
Solution
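Population 1: Oil price = $66.00
n1 = 14, x̄1 = 0.317%, s1 = 0.12%
Population 2: Oil price = $58.00
n2 = 9, x̄2 = 0.21%, s2 = 0.11%
H0: μ1 - μ2 = 0
H1: μ1 - μ2 ≠ 0
df = (n1 + n2 - 2) = (14 + 9 - 2) = 21
\[ t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)_0}{\sqrt{\dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} = \frac{(0.317 - 0.21) - 0}{\sqrt{\dfrac{(13)(0.12)^2 + (8)(0.11)^2}{21}\left(\dfrac{1}{14} + \dfrac{1}{9}\right)}} = 2.154 \]
Critical point: t0.025 = 2.080
H0 may be rejected at the 5% level of significance.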
Example
The manufacturers of compact disk players want to test whether a
small price reduction is enough to increase sales of their product.
Randomly chosen data on 15 weekly sales totals at outlets in a
given area before the price reduction show a sample mean of
$6,598 and a sample standard deviation of $844. A random
sample of 12 weekly sales totals after the small price reduction
gives a sample mean of $6,870 and a sample standard deviation
of $669. Is there evidence that the small price reduction is
enough to increase sales of compact disk players? Assume that
the populations are normally distributed and that the population
variances are equal.
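Example 8-6
Population 1: Before Reduction
n1 = 15, x̄1 = $6598, s1 = $844
Population 2: After Reduction
n2 = 12, x̄2 = $6870, s2 = $669
H0: μ2 - μ1 ≤ 0
H1: μ2 - μ1 > 0
\[ t = \frac{(\bar{x}_2 - \bar{x}_1) - (\mu_2 - \mu_1)_0}{\sqrt{\dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} = \frac{(6870 - 6598) - 0}{\sqrt{\dfrac{(14)844^2 + (11)669^2}{15 + 12 - 2}\left(\dfrac{1}{15} + \dfrac{1}{12}\right)}} = \frac{272}{\sqrt{89375.25}} = \frac{272}{298.96} = 0.91 \]
df = (n1 + n2 - 2) = (15 + 12 - 2) = 25
Critical point: t0.10 = 1.316
Since 0.91 < 1.316, H0 may not be rejected even at the 10% level of significance.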
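The pooled-variance calculation for the compact disk player example can be reproduced step by step. This is a sketch under the equal-variance assumption stated in the example; the function name `pooled_t` is illustrative, and the critical point 1.316 is the tabled value from the slide.

```python
import math

def pooled_t(mean_a, s_a, n_a, mean_b, s_b, n_b, d0=0.0):
    """Pooled-variance t test of (mean_b - mean_a) against d0.

    Returns the pooled variance, the t statistic, and the degrees of freedom.
    """
    df = n_a + n_b - 2
    sp2 = ((n_a - 1) * s_a**2 + (n_b - 1) * s_b**2) / df
    standard_error = math.sqrt(sp2 * (1 / n_a + 1 / n_b))
    t = ((mean_b - mean_a) - d0) / standard_error
    return sp2, t, df

# Before reduction: n = 15, mean = 6598, s = 844
# After reduction:  n = 12, mean = 6870, s = 669
sp2, t_stat, df = pooled_t(6598, 844, 15, 6870, 669, 12)

critical_point = 1.316  # t(0.10) with 25 df, from the slide
reject_h0 = t_stat > critical_point

print(f"sp^2 = {sp2:.0f}, t = {t_stat:.2f}, df = {df}, reject H0: {reject_h0}")
```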
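Confidence interval for the difference between two population means, assuming equal population variances:
\[ (\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} \]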
H0: p1 - p2 = 0
H1: p1 - p2 ≠ 0
[Figure: standard normal density with rejection regions below -z0.05 = -1.645 and above z0.05 = 1.645; the test statistic, 1.415, falls in the nonrejection region.]
Since the test statistic falls in the nonrejection region at a 10% level of significance,
we may conclude that there is no statistically significant difference between banks’
shares of car loans in 2000 and 2007.
Population 1: With Sweepstakes
n1 = 300, x1 = 120, p̂1 = 0.40
Population 2: No Sweepstakes
n2 = 700, x2 = 140, p̂2 = 0.20
H0: p1 - p2 ≤ 0.10
H1: p1 - p2 > 0.10
\[ z = \frac{(\hat{p}_1 - \hat{p}_2) - D}{\sqrt{\dfrac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \dfrac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}} = \frac{(0.40 - 0.20) - 0.10}{\sqrt{\dfrac{(0.40)(0.60)}{300} + \dfrac{(0.20)(0.80)}{700}}} = \frac{0.10}{0.03207} = 3.118 \]
Critical point: z0.001 = 3.09
H0 may be rejected at any common level of significance.
Even at a level of significance as small as 0.001, the null hypothesis may be
rejected, and we may conclude that the proportion of customers buying at
least $2500 of travelers checks is at least 10% higher when sweepstakes are on.
[Figure: standard normal density with the rejection region to the right of z0.001 = 3.09; test statistic = 3.118.]
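The sweepstakes test statistic can be recomputed from the counts. The sketch below assumes the unpooled standard error used when the hypothesized difference D is nonzero; the function name `two_proportion_z` is illustrative.

```python
import math
from statistics import NormalDist

def two_proportion_z(x1, n1, x2, n2, d):
    """z statistic for H0: p1 - p2 <= d, using the unpooled standard error."""
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    standard_error = math.sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    return ((p1_hat - p2_hat) - d) / standard_error

# With sweepstakes: 120 of 300; without: 140 of 700; hypothesized difference 0.10
z_stat = two_proportion_z(120, 300, 140, 700, 0.10)

# Right-tailed p-value and the critical point for alpha = 0.001
p_val = 1 - NormalDist().cdf(z_stat)
critical_point = NormalDist().inv_cdf(1 - 0.001)
reject_h0 = z_stat > critical_point

print(f"z = {z_stat:.3f}, p-value = {p_val:.5f}, reject H0 at 0.001: {reject_h0}")
```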
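Confidence interval for the difference between two population proportions:
\[ (\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}} \]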
\[ F_{(k_1, k_2)} = \frac{\chi_1^2 / k_1}{\chi_2^2 / k_2} \]
The F Distribution
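• The F random variable cannot be negative, so it is bound by zero on the left.
• The F distribution is skewed to the right.
• The F distribution is identified by the number of degrees of freedom in the numerator, k1, and the number of degrees of freedom in the denominator, k2.
[Figure: F Distributions with different Degrees of Freedom: densities f(F) for F(5,6), F(10,15), and F(25,30).]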
\[ F_{(k_1, k_2, 1 - \alpha/2)} = \frac{1}{F_{(k_2, k_1, \alpha/2)}} \]
Critical points of the F distribution for a right-tail area of 0.05
(columns: k1 = 1, ..., 9; rows: k2)
k2      1      2      3      4      5      6      7      8      9
 1  161.4  199.5  215.7  224.6  230.2  234.0  236.8  238.9  240.5
 2  18.51  19.00  19.16  19.25  19.30  19.33  19.35  19.37  19.38
 3  10.13   9.55   9.28   9.12   9.01   8.94   8.89   8.85   8.81
 4   7.71   6.94   6.59   6.39   6.26   6.16   6.09   6.04   6.00
 5   6.61   5.79   5.41   5.19   5.05   4.95   4.88   4.82   4.77
 6   5.99   5.14   4.76   4.53   4.39   4.28   4.21   4.15   4.10
 7   5.59   4.74   4.35   4.12   3.97   3.87   3.79   3.73   3.68
 8   5.32   4.46   4.07   3.84   3.69   3.58   3.50   3.44   3.39
 9   5.12   4.26   3.86   3.63   3.48   3.37   3.29   3.23   3.18
10   4.96   4.10   3.71   3.48   3.33   3.22   3.14   3.07   3.02
11   4.84   3.98   3.59   3.36   3.20   3.09   3.01   2.95   2.90
12   4.75   3.89   3.49   3.26   3.11   3.00   2.91   2.85   2.80
13   4.67   3.81   3.41   3.18   3.03   2.92   2.83   2.77   2.71
14   4.60   3.74   3.34   3.11   2.96   2.85   2.76   2.70   2.65
15   4.54   3.68   3.29   3.06   2.90   2.79   2.71   2.64   2.59
[Figure: density f(F) with the critical point F0.05 = 3.01 marked (the table entry for k1 = 7, k2 = 11).]
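[Figure: density f(F) of the F(6,9) distribution; the critical point F0.05 = 3.37 cuts off a right-tail area of 0.05.]
Test statistic for the equality of two population variances:
\[ F_{(n_1 - 1,\, n_2 - 1)} = \frac{s_1^2}{s_2^2} \]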
I: Two-Tailed Test
• σ1² = σ2²
• H0: σ1² = σ2²
• H1: σ1² ≠ σ2²
II: One-Tailed Test
• σ1² ≤ σ2²
• H0: σ1² ≤ σ2²
• H1: σ1² > σ2²
Example
One of the problems that insider trading supposedly causes is
unnaturally high stock price volatility. When insiders rush to buy a
stock they believe will increase in price, the buying pressure
causes the stock price to rise faster than under usual conditions.
Then, when insiders dump their holdings to realize quick gains,
the stock price dips fast. Price volatility can be measured as the
variance of prices. An economist wants to study the effect of the
insider trading scandal and ensuing legislation on the volatility of
the price of a certain stock. The economist collects price data for
the stock during the period before the event (interception and
prosecution of insider traders) and after the event. The economist
makes the assumptions that prices are approximately normally
distributed and that the two price data sets may be considered
independent random samples from the populations of prices
before and after the event. As we mentioned earlier, the theory of
finance supports the normality assumption. (The assumption of
random sampling may be somewhat problematic in this case, but
later we will deal with time-dependent observations more
effectively.) Suppose that the economist wants to test whether
the event has decreased the variance of prices of the stock.
The 25 daily stock prices before the event give s1² = 9.3
(dollars squared), and the 24 stock prices after the event
give s2² = 3.0 (dollars squared). Conduct the test at α = 0.05.
Solution
Population 1: Before
n1 = 25, s1² = 9.3
Population 2: After
n2 = 24, s2² = 3.0
H0: σ1² ≤ σ2²
H1: σ1² > σ2²
\[ F_{(n_1 - 1,\, n_2 - 1)} = F_{(24, 23)} = \frac{s_1^2}{s_2^2} = \frac{9.3}{3.0} = 3.1 \]
α = 0.05: F(24,23) = 2.01
α = 0.01: F(24,23) = 2.70
H0 may be rejected at a 1% level of significance.
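[Figure: density f(F) of the F(24,23) distribution with the critical point F0.01 = 2.7 and the test statistic = 3.1 in the rejection region.]
Since the test statistic exceeds the critical point even at α = 0.01, the null hypothesis may be
rejected, and we may conclude that the variance of stock
prices is reduced after the interception and prosecution
of insider traders.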
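The variance-ratio test above reduces to one division and a comparison against tabled critical points. A minimal sketch follows; the function name `variance_ratio_f` is illustrative, and the critical points for F(24,23) are the values quoted in the solution, since Python's standard library has no F distribution.

```python
def variance_ratio_f(s1_squared, n1, s2_squared, n2):
    """F statistic s1^2 / s2^2 with its numerator and denominator df."""
    return s1_squared / s2_squared, n1 - 1, n2 - 1

# Before the event: n1 = 25, s1^2 = 9.3; after the event: n2 = 24, s2^2 = 3.0
f_stat, df_num, df_den = variance_ratio_f(9.3, 25, 3.0, 24)

# Right-tail critical points of F(24,23) as quoted in the solution
critical_points = {0.05: 2.01, 0.01: 2.70}
decisions = {alpha: f_stat > cp for alpha, cp in critical_points.items()}

print(f"F({df_num},{df_den}) = {f_stat:.1f}")
for alpha, rejected in decisions.items():
    print(f"alpha = {alpha}: reject H0 = {rejected}")
```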
Example
Suppose a machine produces metal sheets that are specified to be 22 millimeters
thick. Because of the machine, the operator, the raw material, the manufacturing
environment, and other factors, there is variability in the thickness. Two machines
produce these sheets. Operators are concerned about the consistency of the two
machines. To test consistency, they randomly sample 10 sheets produced by
machine 1 and 12 sheets produced by machine 2. The thickness measurements of
sheets from each machine are given below. Assume sheet thickness is normally
distributed in the population. Test at α = .05 whether the two samples
come from populations with the same variance (that is, whether the
population variances are equal).
Solution
The variance for families in the United States is greater than the
variance of families in Manhattan. Families in Manhattan are more
homogeneous in amount spent on basics than families across the
United States.
Analysis of Variance
Using Statistics
• ANOVA (ANalysis Of VAriance) is a statistical method for determining
the existence of differences among several population means.
ANOVA is designed to detect differences among means from
populations subject to different treatments (different populations).
ANOVA is a joint test
◼ The equality of several population means is tested
simultaneously or jointly.
H0: μ1 = μ2 = μ3 = μ4 = ... = μr
H1: Not all μi (i = 1, ..., r) are equal
N = n1 + n2 + … + nr
One-Way ANOVA
Suppose a researcher decides to analyze the effects of the four machine
operators on the valve opening measurements of valves produced in a
manufacturing plant. Is there a significant difference in the mean valve openings
of 24 valves produced by the four operators?
Is it possible to analyze the four samples by using a t test for the difference in two
sample means? These four samples would require 4C2 = 6 individual t tests to
accomplish the analysis of two groups at a time. If α= .05 for a particular test,
there is a 5% chance of committing a Type I error. In this problem, with
six t tests, the error rate compounds, so when the analyst is finished with the
problem there is a much greater than .05 chance of committing a Type I error.
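The compounding of the Type I error rate can be illustrated with a short calculation. The sketch assumes, as a simplification, that the six t tests are independent; pairwise tests on shared samples are not strictly independent, so the figure is only indicative.

```python
import math

groups = 4
alpha = 0.05

# Number of pairwise comparisons among the four operators: 4C2
n_tests = math.comb(groups, 2)

# Probability of at least one Type I error across all tests,
# ASSUMING the tests are independent (a simplification)
familywise_error = 1 - (1 - alpha) ** n_tests

print(f"{n_tests} tests, overall Type I error rate about {familywise_error:.3f}")
```

Under this assumption the overall rate is roughly 0.26 rather than 0.05, which is why a single joint test such as ANOVA is preferred.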
[Figure: r populations, labeled Population 1, Population 2, ..., Population r, with means μ1, μ2, ..., μr.]
The ANOVA Test Statistic for r = 4 Populations and n = 54 Total
Sample Observations
[Figure: density of the F(3,50) distribution; the critical point for α = 0.05 is 2.79.]
If the test statistic is less than 2.79, we would not reject the null hypothesis, and we
would conclude the 4 population means are equal. If the test statistic is greater than 2.79,
we would reject the null hypothesis and conclude that the four population means are not all equal.
The Hypothesis Test of ANOVA
The Hypothesis Test of Analysis of
Variance
The main principle behind the analysis of
variance is,
The Hypothesis Test of Analysis of
Variance
The ANOVA principle thus
says
The Sum-of-Squares Principle
The sum-of-squares total (SST) is the sum of the two terms: the sum of squares for
treatment (SSTR) and the sum of squares for error (SSE).
SST = SSTR + SSE
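The sum-of-squares principle can be verified numerically on any grouped data set. The sketch below uses three small made-up samples (hypothetical values, not taken from the slides) and checks that the total sum of squares splits into the treatment and error parts.

```python
import math

# Hypothetical observations for three treatments (illustrative values only)
samples = [
    [4.0, 5.0, 7.0, 8.0],
    [10.0, 11.0, 12.0, 13.0],
    [1.0, 2.0, 3.0],
]

all_values = [x for group in samples for x in group]
grand_mean = sum(all_values) / len(all_values)
group_means = [sum(group) / len(group) for group in samples]

# SSTR: squared deviations of group means from the grand mean, weighted by group size
sstr = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(samples, group_means))

# SSE: squared deviations of each observation from its own group mean
sse = sum((x - m) ** 2 for g, m in zip(samples, group_means) for x in g)

# SST: squared deviations of each observation from the grand mean
sst = sum((x - grand_mean) ** 2 for x in all_values)

print(f"SSTR = {sstr:.3f}, SSE = {sse:.3f}, SST = {sst:.3f}")
print("SST equals SSTR + SSE:", math.isclose(sst, sstr + sse))
```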