Dsci 301 Blackboard

Mark G. Haug
Probability
Making Sound
Decisions
1) Correlation
2) Modeling
Collecting Data
Describing Data
Natural
Phenomena

Mark G. Haug
Outline of Probability
A. Percentages
B. Risk
C. Formal probability
D. Randomness
E. Miscellaneous Problems

Mark G. Haug
A. Percentages
One simple rule: whenever you are given a
percentage (or fraction), always be clear on
percentage of what?

Mark G. Haug
B. Risk
Risk is the probability that an event will
occur within a given time period.

Mark G. Haug
Which do you think is safest based on your own
experience and the following information?

1. National Ski Patrol, 1984-97: 34 deaths per 52,250,000 per year
2. Denver Post claims: National Safety Council, 1995: 17 drowning
deaths per million water-sports participants
3. Denver Post claims: National Safety Council, 1995: 7.1 (bicycling?)
deaths per million bicyclists
4. Skiing magazine claims: National Severe Storms Laboratory: 89
lightning deaths per year in US

A. Snow Skiing
B. Water-Sports
C. Bicycling
D. Lightning

Mark G. Haug
Review of Risk

1. Measures of risk should be of the same units
(per number per time)

2. What is the baseline risk?

3. Does the unit of measurement adequately represent
the concern? (E.g., safety is a relative criterion.)

4. Is the risk your risk or some general population?
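
The first review point (same units) can be illustrated with a minimal sketch that converts the quoted figures to deaths per million. It assumes each figure's denominator counts comparable participants, which the slide does not confirm. The lightning figure is left out because the slide gives no denominator for it. The names `risks` and `per_million` are illustrative only.

```python
from fractions import Fraction

# Figures as quoted on the slide: deaths per number of participants (per year).
risks = {
    "snow skiing": Fraction(34, 52_250_000),
    "water sports": Fraction(17, 1_000_000),
    "bicycling": Fraction(71, 10_000_000),  # 7.1 per million
}

def per_million(rate):
    """Express a rate as deaths per one million participants."""
    return float(rate * 1_000_000)

rates = {activity: per_million(rate) for activity, rate in risks.items()}

# Print from lowest to highest rate.
for activity, rate in sorted(rates.items(), key=lambda item: item[1]):
    print(f"{activity}: {rate:.2f} deaths per million")
```

Whether these rates answer the safety question still depends on the other review points, such as the baseline and whose risk is being measured.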

Mark G. Haug
C. Formal Probability
A preliminary note on probability:

1. Formal (Classical)

2. Empirical (Relative Frequency)

3. Subjective

Mark G. Haug
C. Formal Probability
1. Events
2. Outcomes
3. Addition Law
4. Mutually Exclusive Events
5. Complements
6. Conditional Probability
7. Multiplication Law
8. Independent Events
9. Combinations

Mark G. Haug
pepperoni
anchovy
mushroom

Mark G. Haug
Definitions in Formal Probability
1. Events: Events are the specified results
of a situation.
2. Outcomes: Outcomes are all of the
possible results of a situation.
Note: The sum of the probability across all
outcomes always adds up to 1.

Mark G. Haug
1. Events: Events are the specified results of a
situation.
2. Outcomes: Outcomes are all of the possible results
of a situation.
If all of the outcomes are equally likely,
then the probability of an event, P(E), is
equal to the number of outcomes in the event
divided by the total number of outcomes.
Note: The sum of the probability of each outcome
always adds up to 1.

Mark G. Haug
= P (pepperoni)
= A (anchovy)
= M (mushroom)
Example: There are
eight slices of pizza.
What is P(A) ?
(What is the
probability that a
slice will contain at
least one anchovy?)

Mark G. Haug
1946: Malone v. Commonwealth
Which of the following describes
the revolver and the shooting sequence?

A. 4 chambers, sequential
B. 4 chambers, spin after each trigger
C. 5 chambers, sequential
D. 5 chambers, spin after each trigger
E. 6 chambers, sequential
F. 6 chambers, spin after each trigger

Mark G. Haug
3. Addition Law:
$P(A \cup B) = P(A) + P(B) - P(A \cap B)$
Read as:
P(A or B) = P(A) + P(B) - P(A and B).
4. Mutually Exclusive Events:
If $P(A \cap B) = 0$, then events A and B are
called mutually exclusive events.

Mark G. Haug
= P (pepperoni)
= A (anchovy)
= M (mushroom)
Example:
What is
$P(A \cup M)$ = ?

Mark G. Haug
= P (pepperoni)
= A (anchovy)
= M (mushroom)

Are slices with
anchovies and
slices with mushrooms
mutually exclusive?

No, 2 of the 8 slices,
$P(A \cap M)$, contain both
anchovies and
mushrooms.

$P(A \cup M) = P(A) + P(M) - P(A \cap M)$
= 5/8 + 4/8 - 2/8 = 7/8
= 0.875
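
The addition law for the pizza example can be checked with exact fractions. This is a small sketch using the slice counts quoted on the slide (5 anchovy, 4 mushroom, 2 with both, out of 8). The variable names are illustrative.

```python
from fractions import Fraction

p_a = Fraction(5, 8)        # slices with anchovies
p_m = Fraction(4, 8)        # slices with mushrooms
p_a_and_m = Fraction(2, 8)  # slices with both toppings

# Addition law: subtract the overlap so it is not counted twice.
p_a_or_m = p_a + p_m - p_a_and_m

# Events are mutually exclusive only if they never occur together.
mutually_exclusive = (p_a_and_m == 0)

print(p_a_or_m, float(p_a_or_m), mutually_exclusive)
```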

Mark G. Haug
5. Complements: $P(A') = 1 - P(A)$
Read as: P(Not A) = 1 - P(A)

Mark G. Haug
= P (pepperoni)
= A (anchovy)
= M (mushroom)
Example:
What is $P(A')$ = ?
(What is the
probability that a
slice will not
contain at least
one anchovy?)

Mark G. Haug
6. Conditional Probability:
$P(B|A) = \frac{P(A \cap B)}{P(A)}$
Read as:
P(B given A) = P(A and B) / P(A)

Mark G. Haug
= P (pepperoni)
= A (anchovy)
= M (mushroom)
Example #1:
What is P(P|A)=?

Mark G. Haug
= P (pepperoni)
= A (anchovy)
= M (mushroom)
Example #2:
What is P(A|P)=?

Mark G. Haug
7. Multiplication Law:
$P(A \cap B) = P(A)\,P(B|A)$
$P(A \cap B) = P(A)\,P(B)$
if events A and B are independent.
8. Independent Events:
Events A and B are independent
if (and only if) $P(B|A) = P(B)$.
[It also follows that $P(A|B) = P(A)$
when A and B are independent.]

Mark G. Haug
= P (pepperoni)
= A (anchovy)
= M (mushroom)

Are slices with
anchovies and slices
with mushrooms
independent?
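
One way to test independence is to compare P(M|A) with P(M). The sketch below reuses the slice counts from the earlier pizza example (5 anchovy, 4 mushroom, 2 with both, out of 8). The variable names are illustrative.

```python
from fractions import Fraction

p_a = Fraction(5, 8)        # slices with anchovies
p_m = Fraction(4, 8)        # slices with mushrooms
p_a_and_m = Fraction(2, 8)  # slices with both toppings

# Conditional probability: P(M | A) = P(A and M) / P(A)
p_m_given_a = p_a_and_m / p_a

# Independence requires P(M | A) == P(M).
independent = (p_m_given_a == p_m)

print(p_m_given_a, p_m, independent)
```

Under these counts, knowing that a slice has anchovies changes the chance that it has mushrooms, so the two events are not independent.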

Mark G. Haug
Reading: pp. 74-80
Sections 5.1-5.6

Homework: pp. 84-88
Problems 5.1-5.12

101 Special Problems: 1.4, 2.7

Mark G. Haug
1973 Candidates:
b1, b2, b3, w1, w2, w3

1975 Candidates:
b1, b2, b3, b4, w1, w2, w3, w4

Mark G. Haug
LEE v. CITY OF RICHMOND
United States District Court, E.D. Virginia, 1978
456 F.Supp. 756

***

The Court concludes that the numbers
involved are too small to allow a finding of
disparate impact under the Griggs rule. The total of
applicants for 1973 and 1975 was fourteen, seven
white and seven black.

***

Mark G. Haug

Homework: p. 88
Problem 5.13

Mark G. Haug

Homework: p. 90
Problem 5.17

Mark G. Haug
Acceptance Sampling
[Flowchart: four lots of 100 shells each. Fire one shell. Explode? Yes: then take all 399. No: fire another. Explode? Yes: then take all 398. No: reject 100.]

Mark G. Haug

If 84% of the shells are duds, what is the
probability that approximately 75% or more of
all the shells will be accepted?

A. 1/24
B. 3/24 = 1/8
C. 1/6
D. 1/4
E. 1/3
F. 1/2

Mark G. Haug
9. Combinations:
$\binom{n}{r} = \frac{n!}{r!\,(n-r)!}$
Read as: n choose r
Example: How many distinct pairs of people are there
in a group of 5 people? (Read as: 5 choose 2)
Answer: You can form exactly 10 distinct pairings of
people within a group of 5 people.

Mark G. Haug
9. Combinations:
Probability example: If Kathryn, Joseph,
Luke, Alexander, and Dominic are the only
members of a committee, then what is the
probability that Kathryn and Joseph will be
selected as co-chairs in a random selection
process?
$\binom{n}{r} = \frac{n!}{r!\,(n-r)!}$

Mark G. Haug
9. Combinations:
Answer: 1/10 = 0.1.

Kathryn and Joseph are one distinct pairing
(the event) of ten equally likely pairings
(the outcomes).
$\binom{n}{r} = \frac{n!}{r!\,(n-r)!}$
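
The committee example can be verified by listing every pair. This is a minimal sketch using Python's standard library. It assumes every pairing is equally likely, as the slide states.

```python
from itertools import combinations
from math import comb

members = ["Kathryn", "Joseph", "Luke", "Alexander", "Dominic"]

# Every distinct unordered pair of committee members.
pairs = list(combinations(members, 2))

# The same count from the formula n! / (r! (n - r)!).
n_pairs = comb(len(members), 2)

# One favorable pairing out of n_pairs equally likely pairings.
p_kathryn_joseph = 1 / n_pairs

print(n_pairs, p_kathryn_joseph)
```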

Mark G. Haug
Reading: p. 83
Section 5.10

Homework: pp. 93-94
Problem 5.24

From the Materials Published by Harvard
Business School: Charles River Jazz Festival
Read case and answer Question #1
(Answer is on Blackboard HB crjf answers)

Mark G. Haug
Assume that this week you receive a stock
market prediction in the mail. It says the market
will go down next week. You wait and see. The
market then goes down. The following week,
same thing happens: a prediction, which is later
confirmed as accurate. This happens seven
consecutive times. Then, the mailer asks you to
subscribe to his/her newsletter for 52 issues at
$500.

Mark G. Haug
D. Some Curious Aspects of Random
What does random mean?

Mark G. Haug
Which of the following best describes the
meaning of random?

A. chaos
B. absence of a pattern
C. equal likelihood of all outcomes
D. independence of outcomes

Mark G. Haug
D. Some Curious Aspects of Random
What does random mean?
It is difficult to define, but if it is time-dependent,
then each random event must be independent
of preceding random events. We
infer non-random processes if we observe an
unlikely pattern(s) in the process. The absence
of a pattern, however, does not guarantee a
random process.

Mark G. Haug
Luke Haug dialed 9-1-1 by accident. Was this a
random event?

A. Yes, because he was only 2 years old
B. No, because there is a pattern
C. No, because the numbers are not
equally likely
D. Yes, because the numbers are
equally likely
E. Yes, but Luke's parents should be
fined anyway for neglecting their kid

Mark G. Haug
How long would you expect the longest
sequence of heads (H) to be in a collection
of 200 consecutive tosses of a coin?
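
A quick simulation gives a feel for this question. The sketch below assumes a fair coin and independent tosses. The seed and the number of trials are arbitrary choices, not part of the original problem.

```python
import random

def longest_run_of_heads(n_tosses, rng):
    """Length of the longest run of consecutive heads in n_tosses fair tosses."""
    longest = current = 0
    for _ in range(n_tosses):
        if rng.random() < 0.5:  # treat this toss as heads
            current += 1
            longest = max(longest, current)
        else:
            current = 0
    return longest

rng = random.Random(301)  # fixed seed so the run is repeatable
trials = 2000
average = sum(longest_run_of_heads(200, rng) for _ in range(trials)) / trials

print(f"average longest run of heads over {trials} trials: {average:.2f}")
```

Long runs turn up routinely in purely random sequences, which is why a streak alone is weak evidence of a non-random process.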

Mark G. Haug
In one season of 30 basketball games, what
would you expect is the longest streak of
consecutive shots made for a 40% shooter who
shoots on average 20 shots per game? (Assume
that the Hot Hand does not exist: i.e., shots are
independent.)

A. 6
B. 7
C. 8
D. 12
E. 24

Mark G. Haug
In one season of 162 baseball games, what
would you expect is the longest streak of
consecutive games with one or more hits for a
.250 batter who gets on average 5 at bats?
(Assume that hits and games are independent.)

A. 7
B. 11
C. 15
D. 19
E. 23

Mark G. Haug
1. Bombs on an Airplane
2. Birthdays**
3. A Family of Four*
4. Testing for HIV**
5. Let's Make a Deal!* **
6. Life and Times of Orchestra Conductors
* Treated in Parade Magazine by Marilyn vos Savant.
** Treated in The Economist.

Mark G. Haug
1. Bombs on an Airplane
When discovering that the chance of a
bomb on an airplane is 1 in 13,000,000,
frequent-flyer Frank got nervous.
Consulting with you, he learns that the
probability of 2 bombs on the same plane is
1 in 169,000,000,000,000, assuming the
two bombs are independent. Relieved,
Frank now carries a bomb with him when
he flies.

Mark G. Haug
Would Frank's bomb be considered
independent of another bomb on the
airplane?
YES.
So, what's wrong with Frank's logic?

Mark G. Haug
So, what's wrong with Frank's logic?
This is an example of conditional probability.
Given that Frank has already brought one bomb, the
probability of two bombs on the plane is
$P(B \ge 2 \mid B \ge 1)$ =
$P(B \ge 2 \cap B \ge 1) / P(B \ge 1)$ =
$P(B \ge 2) / P(B \ge 1)$ =
(~1/169,000,000,000,000) / (~1/13,000,000) =
1/13,000,000, or simply 1 in 13,000,000
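
The arithmetic can be checked with exact fractions. This sketch follows the slide's approximation that the chance of at least one bomb is 1 in 13,000,000 and that two bombs are independent. The variable names are illustrative.

```python
from fractions import Fraction

p_one = Fraction(1, 13_000_000)  # chance of a bomb on the plane
p_two = p_one * p_one            # two independent bombs

# Conditional probability of a second bomb, given one is already aboard.
p_second_given_first = p_two / p_one

print(p_two, p_second_given_first)
```

Carrying a bomb does not change the chance that someone else brings one.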

Mark G. Haug
2. Birthdays
What is the probability of a matching
birthday in any randomly selected group
(and most non-random groups) of 25
people?

(Note: Consider the standard 365 days a
year -- the months and days -- and
disregard the year of birth. Also disregard
February 29 for the sake of simplicity.)

Mark G. Haug
What is the probability of a matching birthday
in any randomly selected group
(and most non-random groups) of 25 people?
A. 25 / 365
B. 25! / 365!
C. (365-25)! / 365!
D. 0.14
E. 0.28
F. 0.42
G. 0.56
H. 0.70
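
The usual approach is to compute the complement: the probability that all birthdays differ. The sketch below assumes 365 equally likely birthdays and independent people, as the note in the problem suggests. The function name is illustrative.

```python
def p_shared_birthday(group_size, days=365):
    """Probability that at least two people in the group share a birthday."""
    p_all_distinct = 1.0
    for i in range(group_size):
        # Person i must avoid the i birthdays already taken.
        p_all_distinct *= (days - i) / days
    return 1.0 - p_all_distinct

p_25 = p_shared_birthday(25)
print(f"{p_25:.4f}")
```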

Mark G. Haug
101 Special Problems: 1.3

Mark G. Haug
What is the probability that in a family of four
kids (four singleton births), all four kids have
the same birthdate? The Boston Globe reported this
true story and suggested the probability was
about 1 in 18,000,000,000 (1 in 18 billion).
A. $(1/365)^4$
B. $(1/365)^3$
C. $(1/90)^4$
D. $(1/90)^3$
E. cannot be calculated with this information

Mark G. Haug
3. A Family of Four
Question #1: A husband and wife have two
children: the oldest one is a boy. What's the
probability that the other is a girl?
Question #2: A husband and wife have two
children: one is a boy. What's the
probability that the other is a girl?

Mark G. Haug
Question #2: A husband and wife have two
children: one is a boy. What's the probability
that the other is a girl?
A. 0.50, 0.25
B. 0.50, 0.33
C. 0.50, 0.50
D. 0.50, 0.67
E. 0.50, 0.75

Mark G. Haug
3. A Family of Four
Question #2: A husband and wife have two
children: one is a boy. What's the probability
that the other is a girl?

Mark G. Haug
Question #2: A husband and wife have two children:
one is a boy. What's the probability that the other is a
girl?
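
Both versions of the question can be settled by listing the four equally likely families. The sketch assumes boys and girls are equally likely and births are independent. It also reads "one is a boy" as "at least one is a boy", which is an interpretation the slide does not spell out.

```python
from fractions import Fraction
from itertools import product

# Each family is (older child, younger child); all four are equally likely.
families = list(product("BG", repeat=2))

def conditional(event, given):
    """P(event | given) over the equally likely families."""
    kept = [family for family in families if given(family)]
    return Fraction(sum(1 for family in kept if event(family)), len(kept))

def has_girl(family):
    return "G" in family

# The oldest child is a boy.
p_oldest_is_boy = conditional(has_girl, lambda family: family[0] == "B")

# At least one child is a boy.
p_one_is_boy = conditional(has_girl, lambda family: "B" in family)

print(p_oldest_is_boy, p_one_is_boy)
```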

Mark G. Haug
4. Testing for HIV
1993: Wellcome- Elisa Test for HIV
P( pos. | HIV ) = 0.993 (Sensitivity)
P( neg. | Not HIV ) = 0.9999 (Specificity)
P( neg. | HIV ) = ?

Mark G. Haug
P( pos. | HIV ) = 0.993 (This is sensitivity)
P( neg. | Not HIV ) = 0.9999 (This is specificity)
P( neg. | HIV ) = ?
This is a conditional probability that is simply a
complement to a known conditional probability,
namely, P( pos. | HIV ). Given that someone has
HIV, there are only two possible outcomes from
the test: pos. and neg.

Mark G. Haug
P( pos. | Not HIV ) = ?
This is a conditional probability that is simply a
complement to P( neg. | Not HIV ). Given that
someone does not have HIV, there are only two
possible outcomes: pos. and neg.

Mark G. Haug
Much more interesting: If you tested people at
random, what is the probability that someone
really has HIV given he tested positive?
[P( HIV | pos.) = ?]

Mark G. Haug
If you tested people at random,
what is the probability that someone really
has HIV given that he tested positive?
[P( HIV | pos.) = ?]
A. 0.00
B. 0.20
C. 0.40
D. 0.60
E. 0.80
F. greater than 0.80 but less than or equal to 0.90
G. greater than 0.90 but less than or equal to 0.99
H. greater than 0.99

Mark G. Haug
To answer this question, we need to add a new
topic to our discussion of formal probability:
10. Bayes' Rule:
$P(B|A) = \frac{P(A|B)\,P(B)}{P(A|B)\,P(B) + P(A|B')\,P(B')}$

Mark G. Haug
$P(\text{HIV} \mid \text{pos.}) = \frac{P(\text{pos.} \mid \text{HIV})\,P(\text{HIV})}{P(\text{pos.} \mid \text{HIV})\,P(\text{HIV}) + P(\text{pos.} \mid \text{Not HIV})\,P(\text{Not HIV})}$
$P(B|A) = \frac{P(A|B)\,P(B)}{P(A|B)\,P(B) + P(A|B')\,P(B')}$
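
The answer depends heavily on how common HIV is among the people tested, a figure these slides do not give. The sketch below applies Bayes' rule with the sensitivity and specificity quoted earlier. The prevalence values are hypothetical and chosen only to show the effect.

```python
def posterior(sensitivity, specificity, prevalence):
    """P(condition | positive test) by Bayes' rule."""
    true_positive = sensitivity * prevalence
    false_positive = (1 - specificity) * (1 - prevalence)
    return true_positive / (true_positive + false_positive)

SENSITIVITY = 0.993   # P(pos. | HIV), from the slide
SPECIFICITY = 0.9999  # P(neg. | Not HIV), from the slide

# Hypothetical prevalences, for illustration only.
for prevalence in (0.0001, 0.001, 0.01, 0.1):
    p = posterior(SENSITIVITY, SPECIFICITY, prevalence)
    print(f"prevalence {prevalence}: P(HIV | pos.) = {p:.4f}")
```

A self-selected group would typically have a higher prevalence than a random sample, which raises P(HIV | pos.).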

Mark G. Haug
What if the people being tested are a self-
selected group? In other words, people coming
to the clinic specifically for the test?
$P(B|A) = \frac{P(A|B)\,P(B)}{P(A|B)\,P(B) + P(A|B')\,P(B')}$

Mark G. Haug
Homework Problem:

1) A Manufacturer produces sweatshirts through a
standardized process. 2) The process is efficient and
cost effective, but as such, it produces 20% defects
(cut-outs). 3) A quality management program of
thoroughly inspecting every sweatshirt is not cost
effective. 4) A quality program of cursory inspection of
each sweatshirt is cost effective. 5) A satisfactory
sweatshirt will always pass the proposed inspection. 6)
Approximately 25% of the defective sweatshirts will
also pass the test.
Question: P(defective sweatshirt | pass)?
Answer: 0.0588
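
The stated answer follows from Bayes' rule. This sketch uses only the figures in the problem: 20% defective, every satisfactory sweatshirt passes, and 25% of defective sweatshirts pass. The variable names are illustrative.

```python
from fractions import Fraction

p_defective = Fraction(20, 100)
p_pass_given_defective = Fraction(25, 100)
p_pass_given_good = Fraction(1)  # a satisfactory sweatshirt always passes

# Total probability of passing the cursory inspection.
p_pass = (p_pass_given_defective * p_defective
          + p_pass_given_good * (1 - p_defective))

# Bayes' rule: share of passing sweatshirts that are defective.
p_defective_given_pass = p_pass_given_defective * p_defective / p_pass

print(p_defective_given_pass, round(float(p_defective_given_pass), 4))
```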

Mark G. Haug
Reading: p. 80
Section 5.7

Note:
Consider the tree diagram approach as an aid for
setting up these types of problems.

Homework: pp. 91-92
Problems 5.19, 5.20

Mark G. Haug
5. Let's Make a Deal!

Mark G. Haug
6. Life and Times of Orchestra Conductors

A study found that the average life expectancy
of famous male orchestral conductors was 73.4
years, significantly higher than the life
expectancy for all males, which was 68.5 years
at the time of the study. Jane Brody in her New
York Times health column reported that this was
thought to be due to arm exercise.

Mark G. Haug
Outline of Probability
A. Percentages
B. Risk
C. Formal probability
D. Randomness

Mark G. Haug
Probability Collecting Data
Making Sound
Decisions
1) Correlation
2) Modeling
Describing Data
Natural
Phenomena

Mark G. Haug
Sampling
An infamous case: the Literary Digest
presidential poll of 1936.

Mark G. Haug
Popular Analysis: Since Republicans were
wealthier as a group, they were over-
represented in the sample of ten million
(i.e., more likely to own phones and
automobiles). Thus, the results were biased
toward the Republican, Alf Landon.

Mark G. Haug
Another Analysis: Only 2.3 million of the
10 million responded to the survey.
Voluntary response is nearly always biased,
since volunteers are typically different from
those who don't volunteer.

Mark G. Haug
Reading

Harvard Business School Materials:
Selection Bias and the Perils of Benchmarking

Mark G. Haug
Life and Times of Orchestra Conductors

A study found that the average life expectancy
of famous male orchestral conductors was 73.4
years, significantly higher than the life
expectancy for all males, which was 68.5 years
at the time of the study. Jane Brody in her New
York Times health column reported that this was
thought to be due to arm exercise.

Mark G. Haug
1970
Draft
Lottery

Mark G. Haug
Sampling in Accounting
The Chesapeake and Ohio (C&O) Railroad
Company: There were 23,000 waybills for
a six month period in a district where
freight charges were divided among C&O
and another railroad company.
C&O can 1) examine all waybills to determine
the amount due to C&O, or 2) take a sample of
the waybills and make an estimate.

Mark G. Haug
1. Simple Random Sampling: Randomly
select a number of the 23,000 waybills.
2. Stratified Random Sampling: Create
strata (categories) of waybills based on
some feature (e.g., total $), and then
randomly select a number of the waybills
within each stratum.
Waybills vary from $2 to $200, with most
being low and few being high.

Mark G. Haug
Here's how C&O stratified
(based on statistical theory not covered in this
course):

Waybill (Total Charges)    Proportion Sampled
$ 0 to $ 5.00                1%
$ 5.01 to $10.00            10%
$10.01 to $20.00            20%
$20.01 to $40.00            50%
$40.01 and over            100%

Mark G. Haug
Waybill (Total Charges)    Proportion Sampled
$ 0 to $ 5.00                1%
$ 5.01 to $10.00            10%
$10.01 to $20.00            20%
$20.01 to $40.00            50%
$40.01 and over            100%

This sampling scheme generated a little
over 2,000 of the 23,000 waybills (9%).
Using the sample data, C&O estimated the
portion due to C&O to be $64,568.

Mark G. Haug
Analysis of C&O's Accounting:

Method                       Amount Due to C&O    Cost of Method
Stratified Random Sample
of 2,000 Waybills
Complete Examination
of 23,000 Waybills
Difference

Mark G. Haug

Reading: p. 3
Section 1.6

Mark G. Haug
Describing Data
Making Sound
Decisions
1) Correlation
2) Modeling
Natural
Phenomena

Mark G. Haug
Outline of Descriptive Statistics
Population vs. Sample
1. Qualitative vs. Quantitative Data
2. Visually Presenting Data, I
3. Measuring the Center of the Data
4. Measuring the Variability of the Data, I
5. A Special Type of Data: Proportions
6. Visually Presenting Data, II
7. Measuring the Variability of the Data, II

Mark G. Haug
Outline of Descriptive Statistics
Population versus Sample

Mark G. Haug
1. Qualitative vs. Quantitative Data
Qualitative data is data that describes an
attribute rather than a measure.

Quantitative data is data that measures,
rather than describes, an attribute.

Mark G. Haug
[Histogram of the light bulb data: x-axis from 0 to 72 in steps of 8, bars labeled with frequencies]

Mark G. Haug
[Histogram of the light bulb data: x-axis from 0 to 72 in steps of 8, bars labeled with frequencies]
With a histogram, you can estimate
probabilities. For example, what's the
estimated probability that a lightbulb lasts
longer than 40 seconds?

Mark G. Haug
What is the estimated probability that a lightbulb
will last more than 64 seconds given that it survives
for more than 40 seconds? $P(Y > 64 \mid Y > 40)$ = ?
A. 3/24
B. 9/24
C. 3/9
[Histogram of the light bulb data: x-axis from 0 to 72 in steps of 8, bars labeled with frequencies]
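
These estimates can also be computed from the raw observations rather than read off the bars. The sketch below uses the 24 light bulb lifetimes listed elsewhere in these slides. The function name `p_more_than` is illustrative.

```python
from fractions import Fraction

lifetimes = [6, 9, 11, 14, 16, 19, 20, 22, 24, 26, 28, 29,
             31, 34, 36, 44, 46, 50, 54, 56, 63, 66, 70, 70]

def p_more_than(threshold, data):
    """Estimated probability that an observation exceeds threshold."""
    return Fraction(sum(1 for y in data if y > threshold), len(data))

# Estimated P(Y > 40) over all bulbs.
p_over_40 = p_more_than(40, lifetimes)

# Conditioning on Y > 40 restricts attention to the survivors.
survivors = [y for y in lifetimes if y > 40]
p_over_64_given_over_40 = p_more_than(64, survivors)

print(p_over_40, p_over_64_given_over_40)
```

Note that `Fraction` prints results in lowest terms, so 9/24 appears as 3/8 and 3/9 as 1/3.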

Mark G. Haug
Reading: pp. 10-12
Sections 2.1-2.3

Homework: p. 21
Problem 2.2

Mark G. Haug

Mark G. Haug
[Chart: Deaths from Cholera by week (Aug 18-24, Aug 25-31, Sep 1-7, Sep 8-14, Sep 15-21, Sep 22-28); y-axis from 100 to 500; annotation: Pump Handle Removed, September 8]

Mark G. Haug

Mark G. Haug
a. Mean
b. Median
c. Mode

Mark G. Haug
The mean is the sum of the observations
divided by the number of the observations.

For example, the mean of the light bulb
data is the sum of:
6, 9, 11, 14, 16, 19, 20, 22, 24, 26, 28, 29,
31, 34, 36, 44, 46, 50, 54, 56, 63, 66, 70, 70
divided by 24: (844) / (24) ≈ 35.

Mark G. Haug
Population Mean: $\mu$

Sample Mean: $\bar{x}$

Mark G. Haug
The median is simply the middle
observation after the observations have
been arranged in increasing (or decreasing)
order.
For example, the median of the light bulb
data: 6, 9, 11, 14, 16, 19, 20, 22, 24, 26, 28, 29, 31,
34, 36, 44, 46, 50, 54, 56, 63, 66, 70, 70.
Take the average of 29 and 31 since
they are both in the middle: 30.

Mark G. Haug
The mode is the observation that appears
most frequently.

For example, the mode of the light bulb
data: 6, 9, 11, 14, 16, 19, 20, 22, 24, 26, 28, 29, 31,
34, 36, 44, 46, 50, 54, 56, 63, 66, 70, 70.
70 appears more frequently than any
other number, so 70 is the mode.
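
The three measures of center for the light bulb data can be reproduced with Python's `statistics` module. This is a small sketch and the variable names are illustrative.

```python
import statistics

lifetimes = [6, 9, 11, 14, 16, 19, 20, 22, 24, 26, 28, 29,
             31, 34, 36, 44, 46, 50, 54, 56, 63, 66, 70, 70]

mean = statistics.mean(lifetimes)      # sum of observations / count
median = statistics.median(lifetimes)  # average of the two middle values
mode = statistics.mode(lifetimes)      # most frequent observation

print(sum(lifetimes), len(lifetimes), round(mean, 2), median, mode)
```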

Mark G. Haug
Reading: pp. 47-48
Section 3.8

Homework: p.49
Problem 3.1

Mark G. Haug
a. Range
b. Variance
c. Standard Deviation

Mark G. Haug
The range is simply the highest value
minus the lowest value. For the light bulb
data:

6, 9, 11, 14, 16, 19, 20, 22, 24, 26, 28, 29,
31, 34, 36, 44, 46, 50, 54, 56, 63, 66, 70, 70

the range is 70 - 6 = 64.

Mark G. Haug
The variance and the standard deviation are
the most abstract descriptive statistics to
this point. The variance is the mean of the
squared differences between each data
point and the mean. The standard deviation
is the square root of the variance.

Mark G. Haug
$\sigma^2$: population variance
$\sigma$: population standard deviation

Mark G. Haug
The variance and the standard deviation are
given as follows:
Sample Variance:
$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
Sample Standard Deviation:
$s = \sqrt{s^2}$

Mark G. Haug
For the light bulb data:

6, 9, 11, 14, 16, 19, 20, 22, 24, 26, 28, 29,
31, 34, 36, 44, 46, 50, 54, 56, 63, 66, 70, 70

sample mean is 35, and n is 24:
$s^2 = \frac{\sum (x - 35)^2}{24 - 1}$

Mark G. Haug
6, 9, 11, 14, 16, 19, 20, 22, 24, 26, 28, 29,
31, 34, 36, 44, 46, 50, 54, 56, 63, 66, 70, 70
$s^2 = \frac{(6 - 35)^2 + (9 - 35)^2 + (11 - 35)^2 + \dots + (70 - 35)^2}{23}$
$s^2 \approx 403$
$s = \sqrt{s^2} \approx \sqrt{403} \approx 20$
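
The range, sample variance, and sample standard deviation can be checked with the `statistics` module. One difference from the slide: this sketch uses the unrounded mean (about 35.17), while the slide rounds the mean to 35. The results agree after rounding.

```python
import statistics

lifetimes = [6, 9, 11, 14, 16, 19, 20, 22, 24, 26, 28, 29,
             31, 34, 36, 44, 46, 50, 54, 56, 63, 66, 70, 70]

data_range = max(lifetimes) - min(lifetimes)

# statistics.variance and statistics.stdev divide by n - 1 (sample versions).
sample_variance = statistics.variance(lifetimes)
sample_std_dev = statistics.stdev(lifetimes)

print(data_range, round(sample_variance, 2), round(sample_std_dev, 2))
```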

Mark G. Haug
Reading: p. 60
Section 4.6

Homework: p. 67
Problem 4.6

Mark G. Haug
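The Empirical Rule: For data that yield symmetric and
mound shaped histograms (normally distributed):
Interval                              Proportion
below $\bar{x} - 2s$                    .025
$\bar{x} - 2s$ to $\bar{x} - 1s$        .135
$\bar{x} - 1s$ to $\bar{x}$             .340
$\bar{x}$ to $\bar{x} + 1s$             .340
$\bar{x} + 1s$ to $\bar{x} + 2s$        .135
above $\bar{x} + 2s$                    .025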

Mark G. Haug
[Histogram of the light bulb data: x-axis from 0 to 72 in steps of 8, bars labeled with frequencies]
?

Mark G. Haug

The Empirical Rule: For data that yield symmetric and
mound shaped histograms (normally distributed):
Example: Suppose data exhibit a symmetric and
mound shaped histogram, with a sample mean
of 100 and a standard deviation of 15.
P(X>115) ≈ ?
P(X<100) ≈ ?
P(85<X<130) ≈ ?
P(100<X<110) ≈ ?
P(X=100) ≈ ?

Mark G. Haug
Reading: p. 62
Section 4.9

Homework: p. 69
Problems 4.10, 4.11

Mark G. Haug
When we are interested in data that has
only two outcomes, binary data,
(e.g., yes/no, male/female, etc.), then we
analyze our data as proportions.

Mark G. Haug
Assume 80 successes
out of a sample of 100.
$p = \frac{x}{n} = \frac{80}{100} = 0.80$
$\sqrt{\frac{p(1 - p)}{n}} = \sqrt{\frac{0.8\,(1 - 0.8)}{100}} = 0.04$

Mark G. Haug
6. Visually Presenting Data, II

Mark G. Haug
1970
Draft
Lottery
?

Mark G. Haug
[Chart: Approximate Median of Ranks Within Each Month (1970 Draft Lottery); y-axis from 0 to 300; x-axis January through December]

Mark G. Haug
[Chart: Damage (Low to High) versus Temperature (F), with the temperature axis from 50 to 80]

Mark G. Haug
7. Measuring the Variability of the Data, II
There are several types of variation or
explanations as to why we observe
something different than we expected:

a. random variation
b. systematic variation
c. measurement variation

Mark G. Haug
random variation          reliability
  noise                   consistency
  chance

systematic variation      validity
  bias                    accuracy

measurement variation     precision
  mistake

Mark G. Haug
Exercise: Count the number of times the
letter e appears in the following text:

Mark G. Haug
random variation          reliability
  noise                   consistency
  chance

systematic variation      validity
  bias                    accuracy

measurement variation     precision
  mistake

Mark G. Haug
[Chart: Water Consumption over time; x-axis marks 6/96, 6/97, 6/98, 6/99]

Mark G. Haug
Making Sound
Decisions
1) Correlation
2) Modeling
Describing Data
Natural
Phenomena

Mark G. Haug
Random Variables:
Measures Exhibiting Uncertainty
1. Relationship to Histograms
2. Probability Density Function, P(x), f(x)
3. Cumulative Distribution Function, F(x)
4. Expectation
5. Variation

Mark G. Haug
1 2 3 4 5 6 x
Example #1: Roll a die and observe the
outcome. The random variable is the value
observed, which is discrete: a whole
integer, 1 through 6.

Mark G. Haug
Example #2: Flip a coin until
you get tails. The random
variable is the number of
flips until (and including)
tails, which is discrete: a
whole integer, 1 or more.
1 2 3 4 5 6 7 8 9 10

Mark G. Haug
Example #3: Record the height of all KU
students. The random variable is the height
observed, which is continuous.

Mark G. Haug
2. P(x), f(x)
P(x) is the ordinate (height) for a given x
on a discrete distribution. P(x) also
represents the probability of x.

f(x) is the ordinate (height) for a given x on
a continuous distribution. f(x) does not
represent the probability of x.

Mark G. Haug
2. P(x), f(x)
After 100 rolls, you observed a 1 14
times. 14 / 100 = 0.14. P(X=1) ≈ 0.14.
x       1     2     3     4     5     6
P(x)   .14   .17   .19   .15   .18   .17

Mark G. Haug

Mark G. Haug
Example #3: Record the height (x) ...
2. P(x), f(x)
x
f(x)

Mark G. Haug
2. P(x), f(x)
[Figure: density curve f(x) versus x, with x = 5'6" marked]
f(x = 5'6") =

Mark G. Haug
Discrete: $F(x) = P(X \le x) = P(0) + P(1) + \dots + P(x)$
Continuous: $F(x) = P(X \le x) = \int_{\min x}^{x} f(x)\,dx$

Mark G. Haug
x       1     2     3     4     5     6
P(x)   .14   .17   .19   .15   .18   .17
Example: $F(3) = P(X \le 3) = P(1) + P(2) + P(3)$
≈ 0.14 + 0.17 + 0.19 = 0.50

Mark G. Haug
[Figure: density curve f(x) versus x, with the area up to x = 5'6" shaded]
Example: $F(5'6'') = P(X \le 5'6'') = \int_{0}^{5.5} f(x)\,dx$

Mark G. Haug
4. Expectation
The expected value of a random variable is
synonymous with the mean.
Discrete: $E(X) = \sum_{\text{all } x} x\,P(x)$
Continuous: $E(X) = \int_{\text{all } x} x\,f(x)\,dx$

Mark G. Haug
4. Expectation
x       1     2     3     4     5     6
P(x)   .14   .17   .19   .15   .18   .17
E(X) = (1)(0.14) + (2)(0.17) + ... + (6)(0.17)
= 3.57
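
The cumulative probability and the expected value for the observed die rolls can be computed directly. In this sketch P(1), P(2), P(3), and P(6) are the values used in the slides' own calculations. Placing 0.15 at x = 4 and 0.18 at x = 5 is an assumption, chosen because it reproduces the stated E(X) = 3.57.

```python
# Observed relative frequencies from 100 rolls.
# The values at 4 and 5 are an assumed assignment (see the note above).
pmf = {1: 0.14, 2: 0.17, 3: 0.19, 4: 0.15, 5: 0.18, 6: 0.17}

def cdf(x, pmf):
    """F(x) = P(X <= x): sum the probabilities of all values up to x."""
    return sum(p for value, p in pmf.items() if value <= x)

# E(X): each value weighted by its probability.
expected_value = sum(value * p for value, p in pmf.items())

print(round(cdf(3, pmf), 2), round(expected_value, 2))
```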

Mark G. Haug
Homework
Case: Charles River Jazz Festival
Read case and answer Questions ## 3, 4
(Answers on Blackboard HB crjf answers)

Mark G. Haug
5. Variation
The variation of a random variable is
synonymous with the variance.
Discrete: $Var(X) = \sum_{\text{all } x} \left(x - E(X)\right)^2 P(x)$
Continuous: $Var(X) = \int_{\text{all } x} \left(x - E(X)\right)^2 f(x)\,dx$

Mark G. Haug
5. Variation
x       1     2     3     4     5     6
P(x)   .14   .17   .19   .15   .18   .17
$Var(X) \approx (1 - 3.57)^2 (0.14) + (2 - 3.57)^2 (0.17) + \dots + (6 - 3.57)^2 (0.17) = 2.8$

Std.Dev.(X) ≈ 1.7
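
The variance and standard deviation of the observed die distribution follow the same weighted-sum pattern as the mean. As a sketch, the values at x = 4 and x = 5 (0.15 and 0.18) are an assumed assignment that reproduces the slides' E(X) = 3.57. The other four values appear in the slides' calculations.

```python
from math import sqrt

# Observed relative frequencies; the values at 4 and 5 are assumed.
pmf = {1: 0.14, 2: 0.17, 3: 0.19, 4: 0.15, 5: 0.18, 6: 0.17}

mean = sum(value * p for value, p in pmf.items())

# Var(X): squared distance from the mean, weighted by probability.
variance = sum((value - mean) ** 2 * p for value, p in pmf.items())
std_dev = sqrt(variance)

print(round(mean, 2), round(variance, 1), round(std_dev, 1))
```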

Mark G. Haug
Reading: pp. 99-101, 122-123
Sections 6.1, 6.2, 7.1

Homework: p. 107
Problem 6.1


BlackBoard: HW Random Variables.doc

Mark G. Haug
Probability Distributions

Mark G. Haug
The Normal Distribution
$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-0.5\left(\frac{x - \mu}{\sigma}\right)^2}$ for $-\infty < x < \infty$
[Figure: normal curve, f(x) versus x]
You will not need to learn this formula.

Mark G. Haug
[Figure: normal curve, f(x) versus x]
$\int_{-\infty}^{\infty} f(x)\,dx = 1$

Mark G. Haug
[Figure: normal curve, f(x) versus x]
$E(X) = \int_{-\infty}^{\infty} x\,f(x)\,dx = \mu$

Mark G. Haug
[Figure: normal curve, f(x) versus x]
$Var(X) = E(x^2) - E^2(x)$
$Var(X) = \int_{-\infty}^{\infty} x^2 f(x)\,dx - \mu^2 = \sigma^2$

Mark G. Haug
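Empirical Rule Revisited
$\mu$: mean
$\sigma$: standard deviation
Interval                              Proportion
below $\mu - 2\sigma$                   .025
$\mu - 2\sigma$ to $\mu - 1\sigma$      .135
$\mu - 1\sigma$ to $\mu$                .340
$\mu$ to $\mu + 1\sigma$                .340
$\mu + 1\sigma$ to $\mu + 2\sigma$      .135
above $\mu + 2\sigma$                   .025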

Mark G. Haug
Example: The Wechsler IQ test has a mean
of 100 and a standard deviation of 15.
IQ            below 70   70 to 85   85 to 100   100 to 115   115 to 130   above 130
Proportion      .025       .135       .340         .340         .135         .025

Mark G. Haug
IQ            below 70   70 to 85   85 to 100   100 to 115   115 to 130   above 130
Proportion      .025       .135       .340         .340         .135         .025
P(IQ>100) = ?
P(IQ<70) = ?
P(85<IQ<130) = ?
P(IQ=115) = ?

Mark G. Haug
By converting x-values to z-scores, we can
convert all normal curves, regardless of
$\mu$ or $\sigma$, into the standard normal curve.
$z = \frac{x - \mu}{\sigma}$
Z, or z-scores, represent the
number of standard
deviations that x is from $\mu$.
By converting x-values to z-scores,
$\mu = 0$ and $\sigma = 1$.

Mark G. Haug
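[Figure: normal curve; x-axis 70, 85, 100, 115, 130, corresponding to $-2\sigma$, $-1\sigma$, $\mu$, $+1\sigma$, $+2\sigma$]
$z = \frac{x - \mu}{\sigma}$
For x=70, z = (70-100) / (15) = -2.
For x=100, z = (100-100) / (15) = 0.
For x=120, z = (120-100) / (15) = 1.33.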

Mark G. Haug
z
0 +1 -1 -2 +2
Standard Normal Curve

Mark G. Haug
Area (probability)
defined by z.
z
0 z
See Page 389

Mark G. Haug
z .00 .01 .02 .03 ...
0.00 .0000 .0040 .0080 .0120 ...
0.10 .0398 .0438 .0478 .0517 ...
0.20 .0793 .0832 .0871 .0910 ...
... ... ... ... ...
P(0 < Z < 0.21) = ? = .0832
P(Z > 0.21) = ?

Mark G. Haug
z
0 0.21
The table only gives values for the gray area.
In this example, P(0 < Z < 0.21) = .0832.
P(Z > 0.21) is the remaining area to the right of 0.21. This area and the gray area add
to .50. So, P(Z > 0.21) = .5000 - .0832 = .4168.

Mark G. Haug
z .00 .01 .02 .03 ...
0.00 .0000 .0040 .0080 .0120 ...
0.10 .0398 .0438 .0478 .0517 ...
0.20 .0793 .0832 .0871 .0910 ...
... ... ... ... ...
P(0 < Z < 0.21) = ?
P(Z < 0.21) = ?
P(Z < 0.21) = ?

Mark G. Haug
z .00 .01 .02 .03 ...
0.00 .0000 .0040 .0080 .0120 ...
0.10 .0398 .0438 .0478 .0517 ...
0.20 .0793 .0832 .0871 .0910 ...
... ... ... ... ...
P(-0.21 < Z < 0) = ?

Mark G. Haug
z
-z 0 z
The Normal Distribution curve is
symmetrical: P(0 to Z) = P(0 to -Z).
Therefore, these areas are equal in size.

Mark G. Haug
z .00 .01 .02 .03 ...
0.00 .0000 .0040 .0080 .0120 ...
0.10 .0398 .0438 .0478 .0517 ...
0.20 .0793 .0832 .0871 .0910 ...
... ... ... ... ...
P(-0.21 < Z < 0) =
P(-0.10 < Z < 0.23) =

Mark G. Haug
Example: What proportion of people have an IQ
of 130 or higher?
(Use the Wechsler IQ test.)
Answer: 2.28% or 0.0228.
Since $\mu = 100$ and $\sigma = 15$,
z = (130 - 100) / (15) = 2.00
From the table, P(0<Z<2.00) = .4772
Since P(Z<0)=.5000,
P(Z<2.00) = .5000 + .4772 = .9772,
P(Z>2.00) = 1 - P(Z<2.00) = 1 - .9772 = .0228.
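
The table lookup can be cross-checked in software. This sketch uses `statistics.NormalDist` from the Python standard library with the mean of 100 and standard deviation of 15 given for the Wechsler test. The variable names are illustrative.

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=15)

# z-score: number of standard deviations that 130 lies above the mean.
z = (130 - iq.mean) / iq.stdev

# Area between 0 and z, the quantity tabulated in the z table.
p_0_to_z = NormalDist().cdf(z) - 0.5

# Upper tail: proportion with an IQ of 130 or higher.
p_above_130 = 1 - iq.cdf(130)

print(z, round(p_0_to_z, 4), round(p_above_130, 4))
```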

Mark G. Haug
Reading: pp. 123-126
Sections 7.2, 7.3

Homework: pp. 129-133
Problems 7.1, 7.2, 7.3, 7.4, 7.5, 7.6, 7.7, 7.8

Mark G. Haug
P(k < Z < 0) = 0.0832.
What is k?
A. -0.21
B. 0
C. 0.21
z .00 .01 .02 .03 ...
0.00 .0000 .0040 .0080 .0120 ...
0.10 .0398 .0438 .0478 .0517 ...
0.20 .0793 .0832 .0871 .0910 ...
... ... ... ... ...

Mark G. Haug
P(k < Z) = 0.5832.
What is k?
A. -0.21
B. 0
C. 0.21
z .00 .01 .02 .03 ...
0.00 .0000 .0040 .0080 .0120 ...
0.10 .0398 .0438 .0478 .0517 ...
0.20 .0793 .0832 .0871 .0910 ...
... ... ... ... ...

Mark G. Haug
Wechsler IQ Test: $\mu = 100$, $\sigma = 15$
P(Z < k) = 0.4207
What is k?
A. -0.2
B. 0
C. 0.2
z .00 .01 .02 .03 ...
0.00 .0000 .0040 .0080 .0120 ...
0.10 .0398 .0438 .0478 .0517 ...
0.20 .0793 .0832 .0871 .0910 ...
... ... ... ... ...

Mark G. Haug
Wechsler IQ Test: $\mu = 100$, $\sigma = 15$
P(X < k) = 0.4207
What is k?
A. -0.2
B. 0
C. 0.2
D. 97
E. 103
z .00 .01 .02 .03 ...
0.00 .0000 .0040 .0080 .0120 ...
0.10 .0398 .0438 .0478 .0517 ...
0.20 .0793 .0832 .0871 .0910 ...
... ... ... ... ...

Mark G. Haug
Binomial Distribution
$P(x) = f(x) = \binom{n}{x}\,\pi^x (1 - \pi)^{n - x}$ for $x = 0, 1, 2, \dots, n$

NOTE: $\pi$ is a probability to be specified
in the binomial distribution, and does
not stand for 3.14...

Mark G. Haug
Example: Flip a coin four times and count
the number of heads.

1. Series of independent trials: Four flips of the coin

2. Success or Failure: Heads (or Tails)

3. Probability of success is constant: One-half (50%)

Mark G. Haug
Example: Flip a coin four times and count the
number of heads.

$P(x) = f(x) = \binom{n}{x}\,\pi^x (1 - \pi)^{n - x}$
$P(X = 0) = \binom{4}{0}\left(\frac{1}{2}\right)^0 \left(1 - \frac{1}{2}\right)^{4 - 0} = (1)(1)\left(\frac{1}{2}\right)^4 = 0.0625$
$P(X = 1) = \binom{4}{1}\left(\frac{1}{2}\right)^1 \left(1 - \frac{1}{2}\right)^{4 - 1} = (4)\left(\frac{1}{2}\right)\left(\frac{1}{2}\right)^3 = 0.2500$
$P(X = 2) = 0.3750$
$P(X = 3) = 0.2500$
$P(X = 4) = 0.0625$
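
The five probabilities for the four-flip example can be generated from the binomial formula. This is a minimal sketch. The name `binomial_pmf` is illustrative, and `pi` here is the probability of success, not 3.14...

```python
from math import comb

def binomial_pmf(x, n, pi):
    """P(X = x) for n independent trials with success probability pi."""
    return comb(n, x) * pi**x * (1 - pi) ** (n - x)

n, pi = 4, 0.5
pmf = [binomial_pmf(x, n, pi) for x in range(n + 1)]

# Mean and variance computed from the distribution itself.
mean = sum(x * p for x, p in enumerate(pmf))
variance = sum((x - mean) ** 2 * p for x, p in enumerate(pmf))

print(pmf, mean, variance)
```

The computed mean and variance can be compared with n times pi and n times pi times (1 - pi).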

Mark G. Haug
[Bar chart: Number of Heads from 4 Coin Tosses; x = 0, 1, 2, 3, 4 with P(x) = 0.0625, 0.2500, 0.3750, 0.2500, 0.0625]

Mark G. Haug
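[Bar chart: Number of Heads from 4 Coin Tosses; x = 0, 1, 2, 3, 4 with P(x) = 0.0625, 0.2500, 0.3750, 0.2500, 0.0625]
For $\pi = 0.5$, regardless of n, you will always
observe symmetry. If $\pi \ne 0.5$, then there will
not be symmetry.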

Mark G. Haug
$P(x) = f(x) = \binom{n}{x}\,\pi^x (1 - \pi)^{n - x}$
$E(X) = \sum_{i=0}^{n} x_i\,P(x_i) = n\pi$
$Var(X) = \sum_{i=0}^{n} \left(x_i - E(X)\right)^2 P(x_i) = n\pi(1 - \pi)$

Mark G. Haug
$P(x) = f(x) = \binom{n}{x}\,\pi^x (1 - \pi)^{n - x}$
$E(X) = n\pi$
$Var(X) = n\pi(1 - \pi)$

Mark G. Haug
[Bar chart: Number of Heads from 4 Coin Tosses; x = 0, 1, 2, 3, 4 with P(x) = 0.0625, 0.2500, 0.3750, 0.2500, 0.0625]
For n=4 and $\pi = 0.5$, the mean is (4)(0.5)=2.
The variance is (4)(0.5)(1-0.5)=1.
The standard deviation is 1.

Mark G. Haug
Problem: A plays against B.
1. A chooses a number from 1 to 6
2. A rolls 3 dice
3. If all 3 dice show A's number, B pays $3 to A
4. If 2 dice show A's number, B pays $2 to A
5. If 1 die shows A's number, B pays $1 to A
6. Otherwise, A pays $1 to B.

Would you rather be A or B?
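
One way to decide is to compute A's expected payoff per game. The sketch assumes three fair, independent dice, so the number of dice showing A's number is binomial with n = 3 and success probability 1/6. The names are illustrative.

```python
from fractions import Fraction
from math import comb

p_match = Fraction(1, 6)  # chance that one die shows A's number

# Dollars A receives (negative means A pays B) for k matching dice.
payoff_to_a = {0: -1, 1: 1, 2: 2, 3: 3}

def p_matches(k, n=3, p=p_match):
    """Binomial probability that exactly k of n dice show A's number."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

expected_payoff_to_a = sum(
    payoff * p_matches(k) for k, payoff in payoff_to_a.items()
)

print(expected_payoff_to_a, float(expected_payoff_to_a))
```

A negative expected payoff for A means the game favors B over many plays.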

Mark G. Haug
Sections 6.3

Problem 6.4, 6.6, 6.7, 6.8, 6.9

101 Special Problems: 5.1, 5.3, 5.7

Mark G. Haug
Example: Flip a coin four times and count
the number of heads.

Note the shape of the distribution: It is
shaped like the Normal Distribution.

What happens
if there are 50 flips instead of 4?

Mark G. Haug
The Normal Approximation to the Binomial Distribution
If $n\pi > 5$ and $n(1 - \pi) > 5$,
then you can approximate binomial
distribution probabilities with the normal
approximation. (Even when $\pi \ne 0.5$.)

Mark G. Haug
Example: Let's Make a Deal! If you choose
not to switch doors after the goat is
revealed, what is the probability that you
will win 10 or more times in 18 tries?
Is $n\pi > 5$ and $n(1 - \pi) > 5$?
Yes:
$n\pi$ = (18)(0.33) ≈ 6.
$n(1 - \pi)$ = (18)(0.67) ≈ 12.

Mark G. Haug
The Normal Approximation to the Binomial Distribution
Example: Let's Make a Deal! If you choose not to switch
doors after the goat is revealed, what is the probability that
you will win 10 or more times in 18 tries?

Mark G. Haug
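$P(X \ge 10)$
[Figure: distribution of the number of wins, x-axis 0, 2, 4, 6, 8, 10, ..., with the region at 10 and above marked]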

Mark G. Haug
$P(X \ge 10)$:
$Z = \frac{x - n\pi}{\sqrt{n\pi(1 - \pi)}} = \frac{10 - 18\left(\frac{1}{3}\right)}{\sqrt{18\left(\frac{1}{3}\right)\left(1 - \frac{1}{3}\right)}} = \frac{10 - 6}{\sqrt{4}} = 2$
$P(X \ge 10) = P(Z \ge 2) = 0.0228$

Mark G. Haug
You can also solve the problem the long way:
$P(X \ge 10) = P(10) + P(11) + P(12) + \dots + P(18)$
$P(10) = \binom{18}{10}\left(\frac{1}{3}\right)^{10}\left(1 - \frac{1}{3}\right)^{18 - 10}$
$P(11) = \binom{18}{11}\left(\frac{1}{3}\right)^{11}\left(1 - \frac{1}{3}\right)^{18 - 11}$
$P(12) = \binom{18}{12}\left(\frac{1}{3}\right)^{12}\left(1 - \frac{1}{3}\right)^{18 - 12}$
...
$P(18) = \binom{18}{18}\left(\frac{1}{3}\right)^{18}\left(1 - \frac{1}{3}\right)^{18 - 18}$
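
The long way is easy to carry out in code, which also shows how close the normal approximation comes. This sketch uses n = 18 and a success probability of 1/3 from the example. The approximation omits the continuity correction, in line with the course note to ignore it.

```python
from math import comb, sqrt
from statistics import NormalDist

n, pi, k = 18, 1 / 3, 10  # pi is the probability of winning, not 3.14...

# Exact binomial tail: P(10) + P(11) + ... + P(18).
exact = sum(comb(n, x) * pi**x * (1 - pi) ** (n - x) for x in range(k, n + 1))

# Normal approximation without a continuity correction.
mean = n * pi
std_dev = sqrt(n * pi * (1 - pi))
z = (k - mean) / std_dev
approx = 1 - NormalDist().cdf(z)

print(f"exact = {exact:.4f}, normal approximation = {approx:.4f}")
```

With n this small, the two answers differ noticeably, so the approximation should be read as a rough guide.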

Mark G. Haug
Another problem:
Assume n=870 and $\pi = 0.791$.
What is the expected value?
What is the standard deviation?
Expected Value = $n\pi$
Variance = $n\pi(1 - \pi)$

Mark G. Haug
Section 7.4

NOTE: Ignore the correction for continuity and the n ≥ 30 rule.

Homework: p. 133
Problems 7.9, 7.10

BlackBoard:
HW Binomial Normal Approximation.doc

Mark G. Haug
Poisson Distribution
$$P(x) = f(x) = \frac{\lambda^{x} e^{-\lambda}}{x!} \quad \text{for } x = 0, 1, 2, \ldots$$

Mark G. Haug
$$P(x) = f(x) = \frac{\lambda^{x} e^{-\lambda}}{x!}$$
$$E(X) = \sum_{i=0}^{\infty} x_i \, P(x_i)$$
$$Var(X) = \sum_{i=0}^{\infty} \left[ x_i - E(X) \right]^2 P(x_i)$$

Mark G. Haug
$$P(x) = f(x) = \frac{\lambda^{x} e^{-\lambda}}{x!}$$
$$E(X) = \lambda$$
$$Var(X) = \lambda$$

Mark G. Haug
Poisson Distribution Problem
Kansas Lemon Law applies to all vehicles under
12,000 lb. GVW after 10 repair attempts for
different defects within the lesser of the warranty
period or 1 year

Assume the industry average is 2.73 different defects
per car per year. What is the general probability that
a consumer will avail himself of the KS Lemon Law?

Mark G. Haug
Kansas Lemon Law applies to all vehicles under 12,000 lb. GVW after 10 repair
attempts for different defects within the lesser of the warranty period or 1 year

Assume the industry average is 2.73 different defects per car per year. What is the
general probability that a consumer will avail himself of the KS Lemon Law?
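
A rough numerical treatment of the Lemon Law question is sketched below. It assumes defects per car per year follow a Poisson distribution with mean 2.73, that each different defect leads to one repair attempt, and that the law applies once there are 10 or more; these modelling choices are assumptions for illustration, not statements from the statute.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) for a Poisson distribution with mean lam."""
    return lam**x * exp(-lam) / factorial(x)

def poisson_upper_tail(k_min, lam):
    """P(X >= k_min), computed as 1 - P(X <= k_min - 1)."""
    return 1.0 - sum(poisson_pmf(x, lam) for x in range(k_min))

lam = 2.73                                  # assumed defects per car per year
p_lemon = poisson_upper_tail(10, lam)
print(f"P(10 or more defects in a year) = {p_lemon:.6f}")
```

Under these assumptions the probability is a small fraction of one percent.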

Mark G. Haug

BlackBoard: HW Poisson.doc
Section 6.6

Mark G. Haug
Geometric Distribution
A marketing problem: To tie a consumer
kid to your product, you decide to put 6
different toys in your cereal. Believing kid
consumer will want to collect all 6 toys,
how many boxes per child can you expect
to sell as a result of your marketing
scheme?

Mark G. Haug
A marketing problem: To tie a consumer kid to your
product, you decide to put 6 different toys in your
cereal. Believing kid consumer will want to collect all
6 toys, how many boxes can you expect to sell as a
result of your marketing scheme?

A. 6
B. 12
C. 15
D. 18
E. 21
F. 24
G. 120

Mark G. Haug
Starting with no toys, kid consumer has a 6
out of 6 chance of getting a
non-duplicate toy in his/her first box.
The expected value for a geometric
distribution is (1/t). Therefore, the
expected value, the number of boxes, is:
1 / (6/6) = 1.

Mark G. Haug
Kid consumer has a 5 out of 6 chance of
getting a non-duplicate toy in his/her next
box.

Mark G. Haug
Once a second unique toy is obtained, kid
consumer has a 4 out of 6 chance of getting
a non-duplicate toy in his/her next box.

Mark G. Haug
Unique Toys    Expected Number of Boxes
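
The stage-by-stage reasoning (6 of 6, then 5 of 6, then 4 of 6, and so on) can be tabulated with a short script. This sketch assumes each box contains one toy chosen uniformly at random and independently; `expected_boxes` is an illustrative name.

```python
from fractions import Fraction

def expected_boxes(num_toys):
    """Sum the geometric expectations 1/t over each stage of the collection."""
    total = Fraction(0)
    for collected in range(num_toys):
        t = Fraction(num_toys - collected, num_toys)  # chance of a new toy
        total += 1 / t                                # expected boxes this stage
    return total

for collected in range(6):
    t = Fraction(6 - collected, 6)
    print(f"unique toy #{collected + 1}: expected boxes = {float(1 / t):.2f}")

total_boxes = expected_boxes(6)
print("total expected boxes:", float(total_boxes))
```

The total for 6 toys lands closest to one of the multiple-choice answers listed with the problem.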

Mark G. Haug
Geometric Distribution
Homework: What is the expected number
of boxes sold per child for a collection of 4
unique toys? 8 unique toys?

Mark G. Haug
Making Sound
Decisions
1) Correlation
2) Modeling
Describing Data
Natural
Phenomena

Mark G. Haug
The Central Limit Theorem
For any population, the

sampling distribution of the sample mean

is approximately normal

if the sample size, n, is sufficiently large.

Mark G. Haug
The Central Limit Theorem
For any population, the
sampling distribution of the sample mean
is approximately normal
if the sample size, n, is sufficiently large.

This is true for the sample mean, and it is
also true for the sample sum.

Mark G. Haug
A. less than 1.00
B. 1.00 to 1.99
C. 2.00 to 2.99
D. 3.00 to 3.99
E. 4.00 to 4.99
F. 5.00 to 5.99
G. 6.00 to 6.99
H. 7.00 to 7.99
I. 8.00 to 8.99
J. 9.00 or greater
What is the mean (average) of the last four
digits in your residential phone number?

Mark G. Haug
Germans on sampling, 16th Century: Stand at
the door of a church on a Sunday and bid 16
men to stop, tall ones and small ones, as they
happen to pass out when the service is
finished; then make them put their left feet one
behind the other, and the length thus obtained
shall be a right and lawful rood to measure and
survey the land with, and the 16th part of it
shall be a right and lawful foot.

Mark G. Haug
Describe
the probability distribution
for the length
of a rood.

Mark G. Haug
Example: Assume that 16th Century
German men's left feet have a mean length
of 13 inches with a standard deviation of 2
inches. We can only guess about the
probability distribution. We'll also assume
that there are no sampling issues.

Describe the probability distribution for the
length of a rood.

Mark G. Haug
Sampling distribution for
the length of a rood.

Mark G. Haug
length of a rood.
The distribution for the length of a rood is
approximately normal with a mean of 17.33
feet and a standard deviation of 8 inches.

Mark G. Haug
[Figure: normal curve with the horizontal axis marked at μ_T - 2σ_T, μ_T - 1σ_T, μ_T, μ_T + 1σ_T, μ_T + 2σ_T, corresponding to 16.00, 16.67, 17.33, 18.00, 18.67 feet]
The distribution for
the length of a rood
is approximately
normal with a mean
of 17.33 feet and a
standard deviation
of 8 inches.

Mark G. Haug
Describe
the probability distribution
for the length
of a right and lawful foot.

Mark G. Haug
Example: Assume that 16th Century
German men's left feet have a mean length
of 13 inches with a standard deviation of 2
inches. We can only guess about the
probability distribution. We'll also assume
that there are no sampling issues.

Describe the probability distribution for the
length of a right and lawful foot.

Mark G. Haug
length of a right and lawful foot.
The distribution for the length of a right
and lawful foot is approximately normal
with a mean of 13 inches
and a standard deviation of 0.5 inches.

Mark G. Haug
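[Figure: normal curve with the horizontal axis marked at μ_x̄ - 2σ_x̄, μ_x̄ - 1σ_x̄, μ_x̄, μ_x̄ + 1σ_x̄, μ_x̄ + 2σ_x̄, corresponding to 12, 12.5, 13, 13.5, 14 inches]
The distribution for
the length of a right
and lawful foot is
approximately
normal with a mean
of 13 inches and a
standard deviation
of 0.5 inches.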

Mark G. Haug
Sections 8.1-8.5

Problems 8.1, 8.2, 8.3

Mark G. Haug
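Standard Deviation: $\sigma$

Standard Deviation
of Estimated Proportion: $\sqrt{\dfrac{t(1-t)}{n}}$

Standard Deviation
of Sample Sum: $\sigma_T = \sigma\sqrt{n}$

Standard Deviation
of Sample Mean
(Standard Error of the Mean): $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$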
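
As a quick check of the rood and foot examples, the sketch below applies the sample-sum and sample-mean standard deviations to 16 feet with mean 13 inches and standard deviation 2 inches. The function names are illustrative; independence of the 16 feet is assumed.

```python
from math import sqrt

def sample_sum_sd(sigma, n):
    """Standard deviation of the sum of n independent draws."""
    return sigma * sqrt(n)

def sample_mean_sd(sigma, n):
    """Standard deviation of the mean of n independent draws (standard error)."""
    return sigma / sqrt(n)

mu, sigma, n = 13.0, 2.0, 16          # inches, inches, number of men

rood_mean_in = n * mu                 # a rood is the SUM of 16 feet
rood_sd_in = sample_sum_sd(sigma, n)
foot_mean_in = mu                     # a lawful foot is the MEAN of 16 feet
foot_sd_in = sample_mean_sd(sigma, n)

print(f"rood: mean {rood_mean_in / 12:.2f} ft, sd {rood_sd_in:.1f} in")
print(f"foot: mean {foot_mean_in:.1f} in, sd {foot_sd_in:.2f} in")
```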

Mark G. Haug
You operate a call center that requires a minimum of
30,000 outbound calls per day. In other words, you
must produce at least 30,000 calls to accomplish
several operational objectives.

The number of calls each call center employee handles
per day follows a very strange distribution (it
has two bumps and is asymmetric and ugly compared
to all the pretty distributions we've seen before).
Using our basic statistics calculations, the data
forming the distribution has an average of 100 calls
per employee per day and a standard deviation of 15
calls per employee per day.

Mark G. Haug
Q1: If you employ 300 callers, what is the
probability that you will make at least 30,000 calls in
the day?

probability that you will make at least 30,000 calls in
the day?

Q3: To ensure a 99% probability that all 30,000 calls
are handled for a given day, how many call center
employees need to be present for that day?

Mark G. Haug
The number of calls each call center employee handles per day follows
a very strange distribution (it has two bumps and is asymmetric and
ugly compared to all the pretty distributions we've seen before). Using
our basic statistics calculations, the data forming the distribution has an
average of 100 calls per employee per day and a standard deviation of
15 calls per employee per day.

Average = 100
Standard Deviation = 15
Distribution: Unknown

Mark G. Haug
probability that you will make at least 30,000
calls in the day?

Mark G. Haug
Q3: To ensure a 99% probability that all 30,000
calls are handled for a given day, how many
call center employees need to be present for
that day?
[Figure: normal curve for the daily total T, centered at E(T) = n E(X), with T = 30,000 marked. The gray area is 0.99 of the entire area; the red area is 0.01 of the entire area.]

Mark G. Haug
z = 0 z = ?
that day?

Mark G. Haug
that day?
[Figure: standard normal curve with z = 0 and z = ? marked. The gray area between them is 0.49 of the entire area and is the appropriate area to consider in the z-Table, giving z = -2.33.]
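
The call-center questions can be worked numerically with the Central Limit Theorem. The sketch below treats the daily total as approximately normal with mean 100n and standard deviation 15√n, and solves for the staffing level at which 30,000 sits 2.33 standard deviations below the mean. The function names are illustrative, and independence between employees is assumed.

```python
from math import ceil, erf, sqrt

MEAN_CALLS = 100.0     # average calls per employee per day
SD_CALLS = 15.0        # standard deviation per employee per day
TARGET = 30_000        # required calls per day

def normal_cdf(z):
    return 0.5 * (1 + erf(z / sqrt(2)))

def prob_meeting_target(n):
    """P(total calls >= TARGET) with n employees, using the CLT for a sum."""
    z = (TARGET - n * MEAN_CALLS) / (SD_CALLS * sqrt(n))
    return 1 - normal_cdf(z)

def employees_needed(z_crit=-2.33):
    """Smallest n with (TARGET - 100 n) / (15 sqrt(n)) <= z_crit.

    Writing s = sqrt(n) gives the quadratic 100 s^2 + 15 z_crit s - TARGET = 0.
    """
    a, b, c = MEAN_CALLS, SD_CALLS * z_crit, -TARGET
    s = (-b + sqrt(b * b - 4 * a * c)) / (2 * a)
    return ceil(s * s)

print("Q1, n = 300:", prob_meeting_target(300))
n_needed = employees_needed()
print("Q3, employees needed:", n_needed, prob_meeting_target(n_needed))
```

With 300 callers the target equals the expected total, so the probability is one half; a handful of additional employees is enough to push it past 99%.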

Mark G. Haug

Homework:
BlackBoard HW CLT.doc

Mark G. Haug
Central Limit Theorem
Application: Confidence Intervals (CI)

Mark G. Haug
Random Sample (re: CLT)
-> Calculate $\bar{x}$ and $s$
-> Recognize: $\bar{x} \pm 2 s_{\bar{x}} \approx 95\%$

Mark G. Haug
Confidence Interval (CI):

1. Must be designated by X%, i.e., a 95% Confidence
Interval.

2. Our class will only consider Two-Sided 95% CIs
(except wherever the HW asks you to do otherwise).

3. A 95% CI means the following:
For every random sample possible, approximately
95% of the random samples will produce CIs that
include the true value (that which is being estimated).

4. A 95% CI means nothing more than #3 above.

Mark G. Haug
$$\bar{x} \pm 2 s_{\bar{x}} = \bar{x} \pm 2\left(\frac{s}{\sqrt{n}}\right)$$
How wide or narrow will the interval be?
Answer: Depends:

Mark G. Haug
Central Limit Theorem
Application: Confidence Intervals (CI)

1. CI for a mean
2. CI for a proportion
3. Sample size issues

Mark G. Haug
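[Figure: normal curve with the horizontal axis marked at μ_x̄ - 2σ_x̄, μ_x̄ - 1σ_x̄, μ_x̄, μ_x̄ + 1σ_x̄, μ_x̄ + 2σ_x̄, and cutoffs at -1.96 and +1.96]
$z_{\alpha/2} = 1.96$ for $\alpha = 0.05$
$z_{\alpha/2} = z_{0.05/2} = z_{0.025}$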

Mark G. Haug
1. CI for a mean (known σ)
$$\bar{x} - z_{\alpha/2}\,\sigma_{\bar{x}} \le \mu \le \bar{x} + z_{\alpha/2}\,\sigma_{\bar{x}}$$
Example: Based on a sample, n=100, you
find that the sample mean is 50 and know
that the population standard deviation is 10.
Determine the 95% CI for μ.
Note: α = 1 - 0.95. α = 0.05.

Mark G. Haug
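1. CI for a mean (known σ)
$$\bar{x} - z_{\alpha/2}\,\sigma_{\bar{x}} \le \mu \le \bar{x} + z_{\alpha/2}\,\sigma_{\bar{x}}$$
$$50 - (1.96)\left(\frac{10}{\sqrt{100}}\right) \le \mu \le 50 + (1.96)\left(\frac{10}{\sqrt{100}}\right)$$
$$48.04 \le \mu \le 51.96$$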
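
The interval arithmetic above is easy to reproduce. This sketch takes the critical value as an input (1.96 from the z-table here), so the same hypothetical helper `mean_ci` also works with a t-table value when σ is unknown.

```python
from math import sqrt

def mean_ci(x_bar, sd, n, critical_value):
    """Two-sided CI for a mean: x_bar +/- critical_value * sd / sqrt(n)."""
    half_width = critical_value * sd / sqrt(n)
    return x_bar - half_width, x_bar + half_width

low, high = mean_ci(x_bar=50, sd=10, n=100, critical_value=1.96)
print(f"95% CI: {low:.2f} to {high:.2f}")
```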

Mark G. Haug
Section 8.6

Homework: p. 153
Problems 8.7, 8.8

Mark G. Haug
1. CI for a mean (unknown σ)
$$\bar{x} - t_{(n-1),\,\alpha/2}\; s_{\bar{x}} \le \mu \le \bar{x} + t_{(n-1),\,\alpha/2}\; s_{\bar{x}}$$

Example: Based on a sample, n=100, you
find that the sample mean is 50 and the
sample standard deviation is 10. Determine
the 95% CI for μ.
See Page 390

Mark G. Haug
1. CI for a mean (unknown σ)
$$\bar{x} - t_{(n-1),\,\alpha/2}\; s_{\bar{x}} \le \mu \le \bar{x} + t_{(n-1),\,\alpha/2}\; s_{\bar{x}}$$
$$50 - (1.98)\left(\frac{10}{\sqrt{100}}\right) \le \mu \le 50 + (1.98)\left(\frac{10}{\sqrt{100}}\right)$$
$$48.02 \le \mu \le 51.98$$

Mark G. Haug
Section 8.8

Problem 8.17

Mark G. Haug
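2. CI for a proportion
$$\hat{t} - z_{\alpha/2}\sqrt{\frac{\hat{t}(1-\hat{t})}{n}} \le t \le \hat{t} + z_{\alpha/2}\sqrt{\frac{\hat{t}(1-\hat{t})}{n}} \quad \text{for } n\hat{t} > 5 \text{ and } n(1-\hat{t}) > 5$$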

Mark G. Haug
The Use and Misuse of Statistics
Read Sections 4 and 5
Section 9.3

Homework: p. 167
Problem 9.6

Mark G. Haug
3. Sample Size Issues
$$E = z_{\alpha/2}\sqrt{\frac{t(1-t)}{n}}$$
Example: How large must a sample be if I
want to estimate t within 0.03, and be 95%
confident of this estimate?
$$0.03 = E = z_{\alpha/2}\sqrt{\frac{t(1-t)}{n}}$$

Mark G. Haug
[Figure: Var(t-hat) plotted against t from 0 to 1; the peak is where $\dfrac{d\,Var(\hat{t})}{dt} = 0$]
If we take a conservative approach, that is,
assume the greatest variability,
then we can't really go wrong.

Mark G. Haug
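$$Var(\hat{t}) = \frac{t(1-t)}{n} = \frac{t}{n} - \frac{t^2}{n}$$
$$\frac{d\,Var(\hat{t})}{dt} = \frac{1}{n} - \frac{2t}{n} = \frac{1 - 2t}{n}$$
$$\frac{1 - 2t}{n} = 0, \text{ therefore when } 1 - 2t = 0, \text{ which is when } t = \tfrac{1}{2}, \text{ then } Var(\hat{t}) \text{ is maximized.}$$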

Mark G. Haug
$$0.03 = E = z_{\alpha/2}\sqrt{\frac{t(1-t)}{n}}$$
$$0.03 = (1.96)\sqrt{\frac{0.5(1-0.5)}{n}}$$
$$\left(\frac{0.03}{1.96}\right)^2 = \frac{0.25}{n}$$
$$n = 0.25\left(\frac{1.96}{0.03}\right)^2 \approx 1{,}067$$
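
The sample-size algebra can be wrapped in a small function. This is a sketch under the conservative assumption t = 0.5 used above; `sample_size_for_proportion` is an illustrative name.

```python
def sample_size_for_proportion(margin, z=1.96, t=0.5):
    """Solve margin = z * sqrt(t (1 - t) / n) for n (not yet rounded)."""
    return t * (1 - t) * (z / margin) ** 2

n_required = sample_size_for_proportion(0.03)
print(f"n = {n_required:.2f}")
```

The unrounded result is a little over 1,067, which the slide reports as 1,067; rounding up to the next whole respondent would be the stricter choice.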

Mark G. Haug
Within 1E

Interval 2E
Width 2E
Range 2E

Always use 0.50 for t-hat when calculating
sample size (unless otherwise directed).

Always use the observed t-hat when
calculating a confidence interval for t
(unless otherwise directed).

Mark G. Haug
Reading: pp. 147-148, 163
Sections 8.7, 9.4

Homework: pp. 154, 168
Problems 8.12, 8.13, 9.8

Mark G. Haug
Making Sound
Decisions
1) Correlation
2) Modeling
Describing Data
Natural
Phenomena

Mark G. Haug
Inference
Criminal Justice View

                            State of Nature
Outcome                     Defendant is Innocent       Defendant is Guilty
Evidence Does Not           1. This is our hope.        3. This is an error.
Support Guilt                                           (Type 2 Error)
Evidence Supports           2. This is an error.        4. This is our hope.
Guilt                       (Type 1 Error)

Mark G. Haug
Reading:

Blackboard: In Re Winship.doc

Mark G. Haug
Inference
Science View

                            State of Nature
Outcome                     New Theory is Wrong         New Theory is Right
Evidence Does Not           1. This is what we assume   3. This is an error.
Support New Theory          throughout the              (Type 2 Error)
                            scientific method.
Evidence Does               2. This is an error.        4. This is our motivation.
Support New Theory          (Type 1 Error)

Mark G. Haug
Inference
Science View

                            State of Nature
Outcome                     H_O: New Theory is Wrong    H_a: New Theory is Right
Evidence Does Not           1. This is what we assume   3. In science, we never
Support New Theory          throughout the              really know β.
                            scientific method.
Evidence Does               2. In science, we           4. This is our motivation.
Support New Theory          control α: 0.05.

Mark G. Haug
Inference
Medical Lab View

                            State of Nature
Outcome                     Negative                    Positive
Test Shows Negative         1. Specificity              3. False-Negative
Test Shows Positive         2. False-Positive           4. Sensitivity

Mark G. Haug
1. Test for a Mean, Known Std. Dev.
Example: I.D.E.A. guarantees a free
appropriate public education for children
with disabilities. Idiopathic cognitive
delays can be measured with IQ tests.
Suppose a child scored a 68 on an IQ test.
Is this evidence of a learning disability? Is
an IQ of 68 statistically significantly
different than the norm?

Mark G. Haug
Example: I.D.E.A. guarantees a free appropriate public
education for children with disabilities. Idiopathic cognitive
delays can be measured with IQ tests. Suppose a child scored a
68 on an IQ test. Is this evidence of a learning disability? Is an
IQ of 68 statistically significantly different than the norm?
H_O: No Learning Disability (Null Hypothesis)
H_a: Learning Disability (Alternative Hypothesis)

Mark G. Haug
Example: I.D.E.A. guarantees a free appropriate public
education for children with disabilities. Idiopathic cognitive
delays can be measured with IQ tests. Suppose a child scored a
68 on an IQ test. Is this evidence of a learning disability? Is an
IQ of 68 statistically significantly different than the norm?
We know the population means and
standard deviations for most IQ tests
(e.g., Wechsler).
For the Wechsler test, X has a normal
distribution with μ = 100 and σ = 15.

Mark G. Haug
z - scores
-2 -1 0 1 2
Set α = 0.05
α/2 = 0.025

Mark G. Haug
-2 -1 0 1 2
Set α = 0.05
α/2 = 0.025
two-sided
(should it be?)

Mark G. Haug
Rejection Region
-2 -1 0 1 2
Set α = 0.05
α/2 = 0.025

Mark G. Haug
z - scores
-2 -1 0 1 2
Set α = 0.05
α/2 = 0.025
|z| = 1.96
Critical Value

Mark G. Haug
-2 -1 0 1 2
Test Statistic:
$$z = \frac{x - \mu}{\sigma} = \frac{68 - 100}{15} \approx -2.13$$

Mark G. Haug
-2 -1 0 1 2
$$z = \frac{x - \mu}{\sigma} = \frac{68 - 100}{15} \approx -2.13$$
Since the test statistic, -2.13, is in the rejection region, we reject H_O.
An IQ test score of 68 is statistically significantly
different than the norm.

Mark G. Haug
-2 -1 0 1 2
P-value = 0.0166 × 2 = 0.0332
$$z = \frac{x - \mu}{\sigma} = \frac{68 - 100}{15} \approx -2.13$$
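
The test statistic and p-values for the IQ example can be reproduced without a z-table. The sketch below uses the standard library's `erf` in place of the table; the function names are illustrative. Because it does not round z to -2.13 first, its p-values differ slightly from the table-based 0.0166 and 0.0332.

```python
from math import erf, sqrt

def normal_cdf(z):
    return 0.5 * (1 + erf(z / sqrt(2)))

def z_test(x, mu, sigma):
    """Return the z statistic with one-sided and two-sided p-values."""
    z = (x - mu) / sigma
    p_one_sided = normal_cdf(-abs(z))    # area in one tail beyond |z|
    return z, p_one_sided, 2 * p_one_sided

z, p_one_sided, p_two_sided = z_test(x=68, mu=100, sigma=15)
print(f"z = {z:.2f}, one-sided p = {p_one_sided:.4f}, two-sided p = {p_two_sided:.4f}")
```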

Mark G. Haug
-2 -1 0 1 2
Set α = 0.05
α/2 = 0.025
two-sided
(should it be?)

Mark G. Haug
-2 -1 0 1 2
Set α = 0.05 one-sided
z = -1.645

Mark G. Haug
-2 -1 0 1 2
z = -1.645
Reject H_O
$$z = \frac{x - \mu}{\sigma} = \frac{68 - 100}{15} \approx -2.13$$

Mark G. Haug
-2 -1 0 1 2
z = -1.645
Reject H_O

P-value?
P-value = 0.0166

Mark G. Haug
$$z = \frac{x - \mu}{\sigma} = \frac{68 - 100}{15} \approx -2.13$$
Also:
$$z = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}}$$

Mark G. Haug
Reading: pp. 174-178, 182-183
Sections 10.1-10.3, 10.7

BlackBoard:
Buffett The Superinvestors of Graham and Dodd.doc

Homework: p. 185, 191
Problem 10.2, 10.19

Mark G. Haug
Terms to Know Concerning Statistics and
the Idea of Hypothesis Test (Ideas)
Type 1 Error: Rejecting H_O when in fact H_O is true.
Type 2 Error: Not Rejecting H_O when in fact H_O is false.
α (alpha): The probability of a Type 1 Error.
β (beta): The probability of a Type 2 Error.
H_O (Null Hypothesis): What is assumed for policy
reasons, or can be assumed to be true by reasonable
people, or a reasonable beginning to an investigation.
H_a (or H_1, Alternative Hypothesis): What is promoted,
or what is advocated and must be proven.

Mark G. Haug
More Terms to Know Concerning Statistics
and the Idea of Hypothesis Test (Planning)
Two-Sided: Divides α (alpha) into two parts.
Critical Value: A threshold value that the test
statistic for the observed data must pass to
reject H_O. For this class, a critical value may be a
z-statistic, a t-statistic, or a χ² statistic (covered later).

Rejection Region: The area of a probability distribution
identified by α (alpha).

Mark G. Haug
More Terms to Know Concerning Statistics
and the Idea of Hypothesis Test (Observed)
Test Statistic: A measure of the evidence derived
from the observed data.

Statistical Significance: When a test statistic passes the
threshold established by the critical value, then
we reject H_O, and say that the evidence derived
from the observed data is statistically significant.

P-Value: The probability, assuming H_O is true, of a
difference between what is observed and what is
expected (H_O) at least as large as the one actually
observed. If the p-value is less than α (alpha), then we
reject H_O and say the observed data are statistically
significant.

Mark G. Haug
Now suppose that you tested a random
sample of 25 judges, suspecting that they
may have higher IQs than average.

What is the probability that you will reject
H_O based on the sample?
Type II Error and Power

Mark G. Haug
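Based on a random sample of n=25, we have
a sampling distribution for H_O with a
mean (of the sample mean) of 100 and a
standard deviation (of the sample mean) of 3.
[Figure: sampling distribution under H_O, axis marked 94 97 100 103 106, with upper-tail area α = 0.05 / 2 beyond z = 1.96]
$$\mu + z\,\sigma_{\bar{x}} = 100 + (1.96)(3) = 105.88 \approx 105.9$$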

Mark G. Haug
Now suppose that you tested a random
sample of 25 judges, suspecting that they
may have higher IQs than average.
Assume that in truth, the population of
judges has a mean IQ of 105. What is the
probability that you will reject H_O based on
the sample?

Mark G. Haug
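[Figure: sampling distribution under H_O (axis 94 97 100 103 106) and under the true mean of 105 (axis 99 102 105 108 111), with the critical value z = 1.96 at $\mu + z\,\sigma_{\bar{x}} \approx 105.9$]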

Mark G. Haug
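This region represents β, the
measure of the Type 2 Error.

$$z = \frac{105.9 - 105}{3} = 0.3$$
P(Z<0.3) = 0.6179
[Figure: sampling distribution centered at 105 (axis 99 102 105 108 111), with the critical value z = 1.96 at $\mu + z\,\sigma_{\bar{x}} \approx 105.9$]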

Mark G. Haug
Power is the probability of rejecting H_O,
when in fact you should reject H_O.

Power is the complement of β, therefore the
power of this particular test is
( 1 - 0.6179 ) = 0.3821.

Mark G. Haug
                            State of Nature
Outcome                     IQ = 100                    IQ = 105
Evidence Does Not           1. This is what we assumed  3. This is an error.
Support IQ > 100            to get started.             (Type 2 Error)
                                                        β = 0.62
Evidence Does               2. This is an error.        4. This is our motivation.
Support IQ > 100            (Type 1 Error)              Power = 0.38
                            α = 0.05

Mark G. Haug
Power (and 1-Power = Type II)

1. Declare null hypothesis to
2. Establish rejection region for null hypothesis and

3. Calculate P (reject null hypothesis | truth )

Mark G. Haug
Now suppose that you tested

a random sample of 225 judges,

suspecting that they may have higher IQs
than average.

H
O

Mark G. Haug
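Based on a random sample of n = 225, we
have a sampling distribution for H_O with a
mean (of the sample mean) of 100 and a
standard deviation (of the sample mean) of 15/√225 = 1.
[Figure: sampling distribution under H_O, axis marked 97 100 103, with 101 and 102 marked]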

Mark G. Haug
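Based on a random sample of n = 225, we
have a sampling distribution for H_O with a
mean (of the sample mean) of 100 and a
standard deviation (of the sample mean) of 15/√225 = 1.
[Figure: sampling distribution under H_O, axis marked 97 100 103, with upper-tail area α = 0.05 / 2 beyond z = 1.96]
$$\mu + z\,\sigma_{\bar{x}} = 100 + (1.96)(1) = 101.96 \approx 102$$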

Mark G. Haug
Now suppose that you tested

a random sample of 225 judges,

suspecting that they may have higher IQs
than average. Assume that in truth, the
population of judges has a mean IQ of 105.
H
O

Mark G. Haug
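[Figure: sampling distributions centered at 100 and at 105 (axis 97 100 103 105), with the critical value z = 1.96 at $\mu + z\,\sigma_{\bar{x}} = 101.96 \approx 102$]

Mark G. Haug
[Figure: sampling distribution centered at 105, with the critical value z = 1.96 at $\mu + z\,\sigma_{\bar{x}} = 101.96 \approx 102$]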

Mark G. Haug
[Figure: sampling distribution centered at 105, with the critical value 101.96 ≈ 102 marked]
Area represents β, the
probability of a Type 2 Error.
Here, about 0.0013, yielding
a power of 0.9987.
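
The two power calculations (n = 25 and n = 225) follow the same three steps, so they can share one function. This sketch mirrors the slides in using the upper critical value z = 1.96 and ignoring the lower rejection region; the function name is illustrative, and the results differ slightly from the slides because no intermediate rounding is applied.

```python
from math import erf, sqrt

def normal_cdf(z):
    return 0.5 * (1 + erf(z / sqrt(2)))

def power_upper_tail(mu_null, mu_true, sigma, n, z_crit=1.96):
    """Return (critical sample mean, beta, power) for an upper-tail rejection region."""
    se = sigma / sqrt(n)                        # standard error of the mean
    critical_mean = mu_null + z_crit * se       # reject H_O above this value
    beta = normal_cdf((critical_mean - mu_true) / se)
    return critical_mean, beta, 1 - beta

crit25, beta25, power25 = power_upper_tail(100, 105, 15, 25)
crit225, beta225, power225 = power_upper_tail(100, 105, 15, 225)
print(f"n = 25:  critical mean {crit25:.2f}, beta {beta25:.4f}, power {power25:.4f}")
print(f"n = 225: critical mean {crit225:.2f}, beta {beta225:.4f}, power {power225:.4f}")
```

Holding everything else fixed, the larger sample shrinks the standard error from 3 to 1, which is what drives the jump in power.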

Mark G. Haug
101 Special Problems:
10.2, 10.8, 10.13, 10.14
Section 10.4

Homework: p. 189
Problem 10.13

Mark G. Haug
2. Test for a Mean, Unknown Std. Dev.
Test Statistic:
$$t = \frac{\bar{x} - \mu_O}{s / \sqrt{n}}$$
Rejection Region:
1) Determine α
2) df = n - 1
3) Use t-table
(See Page 390)

Mark G. Haug
Reading: pp. 182
Section 10.6

Homework: p. 191
Problem 10.18

Mark G. Haug
3. Confidence Intervals

Mark G. Haug
Sections 10.8, 10.9

Homework: p. 192
Problem 10.22

Mark G. Haug
4. Test for Difference Between Two Means
a. t-Test (Independent Samples)
b. Paired Sample Test

Mark G. Haug
Known σ: σ_1 = σ_2
Unknown σ: σ_1 = σ_2
Known σ_1 and Known σ_2, but σ_1 ≠ σ_2
Unknown σ_1 and Unknown σ_2, and σ_1 ≠ σ_2

BY

Equal / Unequal Sample Sizes

Mark G. Haug
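$$H_O: \mu_1 = \mu_2, \text{ i.e. } \mu_1 - \mu_2 = 0 \qquad H_a: \mu_1 \ne \mu_2$$
$$t' = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$
$$df = \frac{(n_1 - 1)(n_2 - 1)}{(n_1 - 1)\left(1 - \dfrac{s_1^2/n_1}{s_1^2/n_1 + s_2^2/n_2}\right)^2 + (n_2 - 1)\left(\dfrac{s_1^2/n_1}{s_1^2/n_1 + s_2^2/n_2}\right)^2}$$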

Mark G. Haug
Example:
       Sample Size   Mean     Std. Dev.
B      16            1.075    0.5796
P      16            1.159    0.6134

Mark G. Haug
Motor Vehicle Mfrs. Ass'n of U.S. v. E.P.A.
768 F.2d 385 (D.C. Cir. 1985)

Nitrous Oxide Emissions from 16 Cars:

Car Base Fuel (B) Petrocoal (P) Difference (P-B) Sign of Difference

1 1.195 1.385 0.190 +
2 1.185 1.230 0.045 +
3 0.755 0.755 0.000 tie
4 0.715 0.775 0.060 +

5 1.805 2.024 0.219 +
6 1.807 1.792 -0.015 -
7 2.207 2.387 0.180 +
8 0.301 0.532 0.231 +

9 0.687 0.875 0.188 +
10 0.498 0.541 0.043 +
11 1.843 2.186 0.343 +
12 0.838 0.809 -0.029 -

13 0.720 0.900 0.180 +
14 0.580 0.600 0.020 +
15 0.630 0.720 0.090 +
16 1.440 1.040 -0.400 -

Mean 1.075 1.159 0.0841
S. Dev. 0.5796 0.6134 0.1672

Mark G. Haug
The Federal Clean Air Act on new fuels:

The Administrator of the EPA may grant a
waiver if the new fuel will not cause or
contribute to a failure of any emission
control device or system
Read as: H_O: μ_new = μ_standard is tenable

Mark G. Haug
Motor Vehicle Mfrs. Ass'n of U.S. v. E.P.A.
768 F.2d 385 (D.C. Cir. 1985)
Nitrous Oxide Emissions
Car         Control Fuel   Petrocoal   Difference   Sign
1           1.195          1.385       +0.190       +
2           1.185          1.230       +0.045       +
...
16          1.440          1.040       -0.400       -

Mean        1.075          1.159       0.0841
Std. Dev.   0.580          0.613       0.1672

Mark G. Haug
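$$H_O: \mu_1 = \mu_2, \text{ i.e. } \mu_1 - \mu_2 = 0$$
$$t' = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \frac{1.075 - 1.159}{\sqrt{\dfrac{(0.5796)^2}{16} + \dfrac{(0.6134)^2}{16}}} \approx -0.40$$
$$df = \frac{(n_1 - 1)(n_2 - 1)}{(n_1 - 1)\left(1 - \dfrac{s_1^2/n_1}{s_1^2/n_1 + s_2^2/n_2}\right)^2 + (n_2 - 1)\left(\dfrac{s_1^2/n_1}{s_1^2/n_1 + s_2^2/n_2}\right)^2} \approx 30$$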

Mark G. Haug
$$t' = \frac{1.075 - 1.159}{\sqrt{\dfrac{(0.5796)^2}{16} + \dfrac{(0.6134)^2}{16}}} \approx -0.40, \qquad df \approx 30$$
Compare: $(n_1 + n_2 - 2) = 16 + 16 - 2 = 30$
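
The independent-samples statistic and its degrees of freedom can be recomputed from the summary numbers in the example (means, standard deviations, and sample sizes for B and P). This is a sketch of the unequal-variance form; `welch_t` is an illustrative name.

```python
from math import sqrt

def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """t' statistic and approximate df for two independent samples."""
    v1 = sd1**2 / n1                     # s1^2 / n1
    v2 = sd2**2 / n2                     # s2^2 / n2
    t_prime = (mean1 - mean2) / sqrt(v1 + v2)
    c = v1 / (v1 + v2)
    df = (n1 - 1) * (n2 - 1) / ((n1 - 1) * (1 - c) ** 2 + (n2 - 1) * c**2)
    return t_prime, df

t_prime, df = welch_t(1.075, 0.5796, 16, 1.159, 0.6134, 16)
print(f"t' = {t_prime:.2f}, df = {df:.1f}")
```

Treated as two independent samples, the fuels look nearly indistinguishable; the car-to-car variation swamps the average difference.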

Mark G. Haug
b. Paired Sample Test

Mark G. Haug
Motor Vehicle Mfrs. Ass'n of U.S. v. E.P.A.
768 F.2d 385 (D.C. Cir. 1985)

Nitrous Oxide Emissions from 16 Cars:

Car Base Fuel (B) Petrocoal (P) Difference (P-B) Sign of Difference

1 1.195 1.385 0.190 +
2 1.185 1.230 0.045 +
3 0.755 0.755 0.000 tie
4 0.715 0.775 0.060 +

5 1.805 2.024 0.219 +
6 1.807 1.792 -0.015 -
7 2.207 2.387 0.180 +
8 0.301 0.532 0.231 +

9 0.687 0.875 0.188 +
10 0.498 0.541 0.043 +
11 1.843 2.186 0.343 +
12 0.838 0.809 -0.029 -

13 0.720 0.900 0.180 +
14 0.580 0.600 0.020 +
15 0.630 0.720 0.090 +
16 1.440 1.040 -0.400 -

Mean 1.075 1.159 0.0841
S. Dev. 0.5796 0.6134 0.1672

Mark G. Haug
Example:
       Sample Size   Mean     Std. Dev.
B      16            1.075    0.5796
P      16            1.159    0.6134
P-B    16            0.0841   0.1672

Mark G. Haug
$$H_O: \mu_d = 0$$
Test Statistic:
$$t = \frac{\bar{d} - \mu_d}{s_d / \sqrt{n}} = \frac{0.0841 - 0}{0.1672 / \sqrt{16}} \approx 2.01$$
df = 15
$$t_{\alpha = 0.05\,(15)} = 1.753$$
$$t_{\alpha/2 = 0.025\,(15)} = 2.131$$
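
The paired statistic can be recomputed from the 16 car-by-car measurements in the emissions table. The sketch below uses only the standard library; the critical values 1.753 and 2.131 are taken from the slide rather than computed.

```python
from math import sqrt
from statistics import mean, stdev

base = [1.195, 1.185, 0.755, 0.715, 1.805, 1.807, 2.207, 0.301,
        0.687, 0.498, 1.843, 0.838, 0.720, 0.580, 0.630, 1.440]
petrocoal = [1.385, 1.230, 0.755, 0.775, 2.024, 1.792, 2.387, 0.532,
             0.875, 0.541, 2.186, 0.809, 0.900, 0.600, 0.720, 1.040]

# Work with the per-car differences (P - B), not the two columns separately.
diffs = [p - b for b, p in zip(base, petrocoal)]
n = len(diffs)
d_bar = mean(diffs)
s_d = stdev(diffs)                       # sample standard deviation (n - 1)
t_stat = (d_bar - 0) / (s_d / sqrt(n))

print(f"mean diff = {d_bar:.4f}, sd = {s_d:.4f}, t = {t_stat:.2f}, df = {n - 1}")
```

The statistic falls between the one-sided and two-sided critical values, so the conclusion depends on which alternative is being tested.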

Mark G. Haug
Section 11.3

Problem 11.5

Homework:
Blackboard HW Paired Difference.doc

Mark G. Haug
Volunteers
-> 124 Volunteers
-> Step 1 Diet
-> Stratified Randomization
   Group 1: Pre-Test Cholesterol -> 6 weeks of cornflakes -> Post-Test Cholesterol
   Group 2: Pre-Test Cholesterol -> 6 weeks of Cheerios -> Post-Test Cholesterol
Mark G. Haug
Cheerios may reduce your cholesterol by as
much as 18%!
Is this claim true?
Is it deceptive?
A. Yes, Yes
B. Yes, No
C. No, Yes
D. No, No

Mark G. Haug
5. Test for Proportions
Expected Value = nt = (870)(.791) ≈ 688.
Variance = nt(1- t) = (870)(0.791)(1-.791)
= 143.83, and the standard deviation is the
square root of the variance, 143.83, so the
standard deviation ≈ 12.

Mark G. Haug
Expected Value = nt = (870)(.791) ≈ 688.
Variance = nt(1- t) = (870)(0.791)(1-.791) = 143.83, and the
standard deviation is the square root of the variance, 143.83,
so the standard deviation ≈ 12.
H_O: t = 0.791

Mark G. Haug
H_O: t = 0.791
x=339
$$z = \frac{x - n t_O}{\sqrt{n t_O (1 - t_O)}} = \frac{339 - (870)(0.791)}{\sqrt{(870)(0.791)(1 - 0.791)}} \approx \frac{339 - 688}{12} \approx -29$$
$$P(Z < -29) < 10^{-140}$$
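
The proportion test above can be checked numerically. The sketch below computes z from the hypothesized proportion and then the lower-tail probability using `erfc`, which stays accurate far out in the tail. The function name is illustrative.

```python
from math import erfc, sqrt

def proportion_z(x, n, t_null):
    """z statistic for an observed count x under H_O: t = t_null."""
    expected = n * t_null
    sd = sqrt(n * t_null * (1 - t_null))
    return (x - expected) / sd

z = proportion_z(x=339, n=870, t_null=0.791)
p_lower = 0.5 * erfc(-z / sqrt(2))       # P(Z < z) for a standard normal Z
print(f"z = {z:.1f}, P(Z < z) = {p_lower:.3e}")
```

A z statistic this extreme leaves essentially no room for chance as an explanation under the null hypothesis.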

Mark G. Haug

Mark G. Haug
Reading:

Blackboard: JAMA Therapeutic Touch.pdf

Mark G. Haug
Probability
Making Sound
Decisions
1) Correlation
2) Modeling
Collecting Data
Describing Data
Natural
Phenomena

Mark G. Haug
            Heart Attack   No Heart Attack
Aspirin     104            10,933
Placebo     189            10,842
H_0: Variables Are Independent
(No Association)
H_a: Variables Are Dependent
(Association)

Mark G. Haug
Chi-Squared (χ²) Test for Association
Step #1: Calculate Row Totals,
Column Totals, and Grand Total.
            Heart Attack   No Heart Attack   Total
Aspirin     104            10,933            11,037
Placebo     189            10,842            11,031
Total       293            21,775            22,068

Mark G. Haug
Step #2: Calculate expected value for
each cell under the assumption of
independence:
            Heart Attack   No Heart Attack   Total
Aspirin     104            10,933            11,037
Placebo     189            10,842            11,031
Total       293            21,775            22,068
$$E_{1,1} = \frac{R_1 C_1}{G} = \frac{(11037)(293)}{22068} \approx 147$$

Mark G. Haug
            H.A.    No H.A.   Total
Aspirin     104     10,933    11,037
Placebo     189     10,842    11,031
Total       293     21,775    22,068
$$E_{1,1} = \frac{R_1 C_1}{G} = \frac{(11037)(293)}{22068} \approx 147$$
$$E_{2,1} = \frac{R_2 C_1}{G} = \frac{(11031)(293)}{22068} \approx 146$$
$$E_{1,2} = \frac{R_1 C_2}{G} = \frac{(11037)(21775)}{22068} \approx 10{,}890$$
$$E_{2,2} = \frac{R_2 C_2}{G} = \frac{(11031)(21775)}{22068} \approx 10{,}885$$

Mark G. Haug
Observed:
            H.A.    No H.A.
Aspirin     104     10,933
Placebo     189     10,842
Expected:
            H.A.    No H.A.
Aspirin     147     10,890
Placebo     146     10,885
Step #3:
$$\chi^2 = \sum_{ij} \left[ \frac{(n_{ij} - E_{ij})^2}{E_{ij}} \right]$$

Mark G. Haug
Observed:
            H.A.    No H.A.
Aspirin     104     10,933
Placebo     189     10,842
Expected:
            H.A.    No H.A.
Aspirin     147     10,890
Placebo     146     10,885
$$\chi^2 = \frac{(104 - 147)^2}{147} + \frac{(10933 - 10890)^2}{10890} + \frac{(189 - 146)^2}{146} + \frac{(10842 - 10885)^2}{10885} \approx 26$$

Mark G. Haug
$$\chi^2 = \frac{(104 - 147)^2}{147} + \frac{(10933 - 10890)^2}{10890} + \frac{(189 - 146)^2}{146} + \frac{(10842 - 10885)^2}{10885} \approx 26$$
Step #4:
Calculate the degrees of freedom for this test:
[(number of rows) - 1][(number of columns) - 1]
[(r-1)(c-1)] = [2-1][2-1] = 1

Mark G. Haug
χ² test statistic ≈ 26 with 1 degree of freedom
Conclusion:
Since 26 > 3.84 (See Page 392),
we conclude that there is an association
between aspirin intake and heart attacks.
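
Steps #1 through #4 can be collected into one short function. The sketch below keeps the expected counts unrounded, so its statistic for the aspirin table comes out near 25 rather than the 26 obtained from the rounded expected values; the conclusion against the 3.84 cutoff is unchanged. `chi_square_test` is an illustrative name.

```python
def chi_square_test(table):
    """Chi-squared statistic, degrees of freedom, and expected counts for an r x c table."""
    row_totals = [sum(row) for row in table]            # Step 1
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    chi2 = 0.0
    expected = []
    for i, row in enumerate(table):
        expected_row = []
        for j, observed in enumerate(row):
            e = row_totals[i] * col_totals[j] / grand_total   # Step 2
            expected_row.append(e)
            chi2 += (observed - e) ** 2 / e                   # Step 3
        expected.append(expected_row)

    df = (len(table) - 1) * (len(table[0]) - 1)               # Step 4
    return chi2, df, expected

aspirin_table = [[104, 10933],    # Aspirin: heart attack, no heart attack
                 [189, 10842]]    # Placebo
chi2, df, expected = chi_square_test(aspirin_table)
print(f"chi-squared = {chi2:.2f} with {df} df")
```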

Mark G. Haug
Section 12.3

Homework: p. 233
Problem 12.9

AND
Problems on the next two slides

Mark G. Haug

Homework:
10 1,605
1,770 560,010
χ² ≈ 4.7 with 1 df.
0.05 > p-value

Mark G. Haug
Homework:

71 102
44 17
χ² ≈ 17.4 with 1 df.
0.001 > p-value

Mark G. Haug
Association (Continued)

Classical Experimentation
Clinical Trials
Cohort Studies
Case-Control Studies

Mark G. Haug
Classical Experimentation
Theory:
My new super-polymer rubber tire
lasts longer than other tires.
H_O: μ_super-polymer = μ_other

Mark G. Haug
Random samples
Classical Experimentation: Diagram
Other tires
New tires
Apply statistical methods to determine
if there is a significant difference

Mark G. Haug
Clinical Trials
Theory:
Aspirin is effective against heart attacks.
H_O: μ_aspirin = μ_placebo
OR
H_O: No Association

Mark G. Haug
Clinical Trials: Diagram
The population
of people who
may suffer from
heart attacks
Volunteers
Randomly assign
volunteers into groups:
placebo and aspirin
Is there a difference?

Mark G. Haug
Heart No Heart
Attack Attack

Aspirin 104 10,933
Placebo 189 10,842
Conclusion: Since χ² = 26 > 3.84, we
conclude that there is an association
between aspirin intake and heart attacks.

Mark G. Haug
Volunteers
-> 124 Volunteers
-> Step 1 Diet
-> Stratified Randomization
   Group 1: Pre-Test Cholesterol -> 6 weeks of cornflakes -> Post-Test Cholesterol
   Group 2: Pre-Test Cholesterol -> 6 weeks of Cheerios -> Post-Test Cholesterol

Mark G. Haug

Mark G. Haug
Cohort Study (Prospective Study)
Theory:
Women in executive positions
are at risk for breast cancer.
H_O: No Association

Mark G. Haug
Cohort Study: Diagram
The entire
population of
women without
breast cancer
Volunteers
Follow
volunteers
over time
Is there a
difference in
the proportion
of women
who are
executives?
Volunteers without
breast cancer
Volunteers with
breast cancer

Mark G. Haug
Breast No Breast
Cancer Cancer
Executive 10 1,605
Non-Exec. 1,770 560,010

Mark G. Haug
Breast No Breast
Cancer Cancer
Executive 10 1,605
Non-Exec. 1,770 560,010
Relative Risk (RR):
$$RR = \frac{10 / (10 + 1605)}{1770 / (1770 + 560{,}010)} = 1.97 \approx 2$$

Mark G. Haug
Case-Control Study (Retrospective Study)
Theory:
Asbestos causes
lung cancer.
H_O: No Association

Mark G. Haug
Case-Control Study: Diagram
Was there a difference
in asbestos exposure
between these two
groups prior to this
study?
Volunteers
The entire population
of people
People with
lung cancer
People without
lung cancer

Mark G. Haug
Case-Control Study
Lung No Lung
Cancer Cancer
No Exposure 71 102
Exposure 44 17

Mark G. Haug
Case-Control Study
Lung No Lung
Cancer Cancer
No Exposure 71 102
Exposure 44 17
Odds Ratio (OR):
$$OR = \frac{44 / 71}{17 / 102} \approx 3.7$$
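
The relative risk for the cohort table and the odds ratio for the case-control table can both be computed in a few lines. The function and argument names below are illustrative.

```python
def relative_risk(exposed_cases, exposed_noncases, unexposed_cases, unexposed_noncases):
    """Risk in the exposed group divided by risk in the unexposed group."""
    risk_exposed = exposed_cases / (exposed_cases + exposed_noncases)
    risk_unexposed = unexposed_cases / (unexposed_cases + unexposed_noncases)
    return risk_exposed / risk_unexposed

def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Odds of exposure among cases divided by odds of exposure among controls."""
    return (exposed_cases / unexposed_cases) / (exposed_controls / unexposed_controls)

# Cohort study: executives vs. non-executives, breast cancer vs. none.
rr = relative_risk(10, 1605, 1770, 560010)

# Case-control study: asbestos exposure among lung cancer cases vs. controls.
odds = odds_ratio(exposed_cases=44, unexposed_cases=71,
                  exposed_controls=17, unexposed_controls=102)

print(f"RR = {rr:.2f}, OR = {odds:.1f}")
```

The odds ratio is used for the case-control table because the researcher fixes how many cases and controls are sampled, so group risks cannot be estimated directly from it.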

Mark G. Haug
Reading:

Blackboard:
Short Story from Mark Twain.doc

Mark G. Haug
Causation

1. Clinical Trials: YES, but limited
2. Cohort Studies: NO
3. Case-Control Studies: NO

Mark G. Haug
Three types of an association
1. Causation: A causes B

2. Common Response: Changes in A and B are
caused by changes in C, which may or may not
be known.

3. Confounding: Changes in B may be caused
by changes in A and by changes in C, which
may or may not be known.

Mark G. Haug
1. Causation: A causes B
A B
A finding of association between A and B

Mark G. Haug
2. Common Response:
Changes in A and B are caused by changes in C,
which may or may not be known.
A B
C

Mark G. Haug
3. Confounding:
Changes in B may be caused by changes in A and by
changes in C, which may or may not be known.
A B
C
Association between A and C

Mark G. Haug

(Finding Causation in an Association)

Mark G. Haug
Correlation
When talking about correlation in statistics, we are
usually talking about a relationship that is linear
Example:
Weight
Height

Mark G. Haug
The correlation coefficient, ρ (rho) -- called r for
sample data -- can take on any value from -1 to 1:
-1: Perfect inverse linear relationship
Between -1 and 0: Partial inverse linear relationship
0: No linear relationship
Between 0 and 1: Partial linear relationship
1: Perfect linear relationship

Mark G. Haug
[Eight scatterplot slides: Y plotted against X]

Mark G. Haug
r = -0.27
1970
Draft
Lottery

Mark G. Haug
Reading: p. 264
Section 14.2

Mark G. Haug
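The (linear) correlation coefficient, r_yx:
$$r_{yx} = \frac{\sum xy - \dfrac{\left(\sum x\right)\left(\sum y\right)}{n}}{\left[\sum x^2 - \dfrac{\left(\sum x\right)^2}{n}\right]^{1/2} \left[\sum y^2 - \dfrac{\left(\sum y\right)^2}{n}\right]^{1/2}}$$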

Mark G. Haug
The (linear) correlation coefficient, r_yx:
An Example:
Height (ft.)   Weight (lbs.)
5.8            145
5.5            150
4.9            100
6.2            195
5.9            205
5.4            125
6.1            215
5.6            165

Mark G. Haug
X = Height (ft.)   Y = Weight (lbs.)   X²       Y²       XY
5.8                145                 33.64    21025    841
5.5                150                 30.25    22500    825
4.9                100                 24.01    10000    490
6.2                195                 38.44    38025    1209
5.9                205                 34.81    42025    1210
5.4                125                 29.16    15625    675
6.1                215                 37.21    46225    1312
5.6                165                 31.36    27225    924
45.4               1300                258.88   222650   7485
Σx                 Σy                  Σ(x²)    Σ(y²)    Σ(xy)

Mark G. Haug
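$$r_{yx} = \frac{\sum xy - \dfrac{\left(\sum x\right)\left(\sum y\right)}{n}}{\left[\sum x^2 - \dfrac{\left(\sum x\right)^2}{n}\right]^{1/2} \left[\sum y^2 - \dfrac{\left(\sum y\right)^2}{n}\right]^{1/2}} = \frac{7485 - \dfrac{(45.4)(1300)}{8}}{\left[258.88 - \dfrac{(45.4)^2}{8}\right]^{1/2} \left[222650 - \dfrac{(1300)^2}{8}\right]^{1/2}} \approx 0.91$$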

Mark G. Haug
H_O: ρ = 0
$$t = \frac{r_{yx}\sqrt{n - 2}}{\sqrt{1 - r_{yx}^2}} = \frac{(0.91)\sqrt{8 - 2}}{\sqrt{1 - (0.91)^2}} \approx 5.38$$
The degrees of freedom for t is (n-2)
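
The correlation coefficient and its t statistic can be recomputed from the eight height and weight pairs. This sketch follows the computational formula with sums; the function names are illustrative. It reports t both from r rounded to 0.91 (as on the slide) and from the unrounded r, which gives a somewhat smaller value.

```python
from math import sqrt

heights = [5.8, 5.5, 4.9, 6.2, 5.9, 5.4, 6.1, 5.6]     # X, feet
weights = [145, 150, 100, 195, 205, 125, 215, 165]     # Y, pounds

def correlation(xs, ys):
    """Linear correlation coefficient from the sums of x, y, x^2, y^2, and xy."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    sum_yy = sum(y * y for y in ys)
    numerator = sum_xy - sum_x * sum_y / n
    denominator = sqrt(sum_xx - sum_x**2 / n) * sqrt(sum_yy - sum_y**2 / n)
    return numerator / denominator

def correlation_t(r, n):
    """t statistic for H_O: rho = 0, with n - 2 degrees of freedom."""
    return r * sqrt(n - 2) / sqrt(1 - r**2)

r = correlation(heights, weights)
t_rounded = correlation_t(round(r, 2), len(heights))
t_unrounded = correlation_t(r, len(heights))
print(f"r = {r:.3f}, t (r rounded) = {t_rounded:.2f}, t (unrounded) = {t_unrounded:.2f}")
```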

Mark G. Haug
Three types of correlation
1. Causation

2. Common Response

3. Confounding

Mark G. Haug
Covariance
$$\sigma_{ij} = \rho_{ij}\,\sigma_i\,\sigma_j$$

Market Condition   Asset A   Asset B
Good               $1.16     $1.01
Average            $1.10     $1.10
Poor               $1.04     $1.19
Adapted from Elton & Gruber,
Modern Portfolio Theory and Investment Analysis, 5e

Mark G. Haug
Probability
Making Sound
Decisions
Correlation
Modeling
Collecting Data
Describing Data
Natural
Phenomena

Mark G. Haug
Regression

Simple Linear Regression (SLR)
Three Types of Uncertainty
Method
Criterion for Establishing SLR (and MLR below)
Excel

Statistics
Multiple Linear Regression
Excel

Statistics - the bright side
Statistics - the dark side
Applications
The Market Model and BETA
Transformations
Calculating the Rate of Return for an Investment Portfolio
Operations Strategy
Cautionary Tale about Where do babies come from?
Operational Costs for Commercial Airlines
Advertising Case

Mark G. Haug
Simple Linear Regression
[Figure: scatterplot of Weight against Height with a fitted line]
$$\hat{y} = \beta_0 + \beta_1 x$$

Mark G. Haug
$$\hat{y} = \beta_0 + \beta_1 x$$
y = b + m x
y = a + b x
(Intercept + Slope × x)

Mark G. Haug
[Figure: scatterplot of Weight against Height with a fitted line]
$$\hat{y} = \beta_0 + \beta_1 x$$

Mark G. Haug
Regression

Method
Excel

Statistics
Excel

Applications
Transformations
Operations Strategy
Advertising Case

Mark G. Haug
Sales
Time

Mark G. Haug
Reading:
The Use and Misuse of Statistics
Read Section 3

Mark G. Haug
If you estimate $\beta_0$ and $\beta_1$ with the following
formulae, you will satisfy the least squares
criterion.
\[
\beta_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}
\qquad
\beta_0 = \bar{y} - \beta_1 \bar{x}
\]

Mark G. Haug
X = Ht. (ft.)   Y = Wt. (lbs.)   X - 5.675   Y - 162.5   (X - 5.675)(Y - 162.5)   (X - 5.675)^2
5.8             145               0.13        -18          -2.19                   0.0156
5.5             150              -0.18        -13           2.19                   0.0306
4.9             100              -0.77        -63          48.44                   0.6006
6.2             195               0.53         33          17.06                   0.2756
5.9             205               0.23         43           9.56                   0.0506
5.4             125              -0.27        -38          10.31                   0.0756
6.1             215               0.43         53          22.31                   0.1806
5.6             165              -0.08          3          -0.19                   0.0056
mean = 5.675    mean = 162.5                              sum = 107.5000          sum = 1.2350

Mark G. Haug
\[
\beta_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}
= \frac{107.5}{1.235} \approx 87
\]
\[
\beta_0 = \bar{y} - \beta_1 \bar{x} = 162.5 - (87)(5.675) \approx -331
\]
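
The slope and intercept calculation can be reproduced in a few lines. This is a minimal Python sketch of the deviation formulae applied to the eight data points; the function name `least_squares` is illustrative, not something from the slides.

```python
# Height (ft.) and weight (lbs.) pairs from the slide's table.
heights = [5.8, 5.5, 4.9, 6.2, 5.9, 5.4, 6.1, 5.6]
weights = [145, 150, 100, 195, 205, 125, 215, 165]

def least_squares(x, y):
    """Return (intercept, slope) that satisfy the least squares criterion."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of cross-deviations and sum of squared x-deviations.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope

b0, b1 = least_squares(heights, weights)
print(f"wt = {b0:.2f} + {b1:.2f} * ht")
```

Without intermediate rounding the estimates are about -331.48 and 87.04, the same values Excel reports for this data.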

Mark G. Haug
[Figure: Weight (y) plotted against Height (x)]
\[ \hat{y} = \beta_0 + \beta_1 x \]
\[ \text{wt} \approx (-331) + (87)\,\text{ht} \]

Mark G. Haug
[Figure: Weight plotted against Height, with the squared deviations $(y_i - \hat{y})^2$ marked]
Find the line, $\hat{y}$, that minimizes
\[ \sum_{i=1}^{n} (y_i - \hat{y})^2 \]

Mark G. Haug
Ordinary Least Squares (OLS) Regression
Height   Weight   Predicted Wt              Difference   Diff Sq
                  [Wt = -331 + 87 (Ht)]
5.8      145      173.38                     28.38       805.46
5.5      150      147.27                     -2.73         7.47
4.9      100       95.04                     -4.96        24.60
6.2      195      208.20                     13.20       174.20
5.9      205      182.09                    -22.91       525.10
5.4      125      138.56                     13.56       183.95
6.1      215      199.49                    -15.51       240.44
5.6      165      155.97                     -9.03        81.51
Sum of each difference squared:                          2042.71

Difference: difference between actual weight and predicted weight.
Diff Sq: difference squared. Why?
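
One way to approach the "Why?" is to compute the differences directly. The sketch below refits the line and recomputes the Difference and Diff Sq columns; it uses the same sign convention as the table (predicted minus actual), and the variable names are illustrative. The raw differences sum to essentially zero because positive and negative errors cancel, so they cannot measure fit on their own; the squared differences cannot cancel.

```python
# Height (ft.) and weight (lbs.) pairs from the slide's table.
heights = [5.8, 5.5, 4.9, 6.2, 5.9, 5.4, 6.1, 5.6]
weights = [145, 150, 100, 195, 205, 125, 215, 165]

n = len(heights)
x_bar = sum(heights) / n
y_bar = sum(weights) / n
slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(heights, weights))
         / sum((x - x_bar) ** 2 for x in heights))
intercept = y_bar - slope * x_bar

predicted = [intercept + slope * x for x in heights]
# Same sign convention as the slide's table: predicted minus actual.
differences = [p - y for p, y in zip(predicted, weights)]

sum_of_differences = sum(differences)               # cancels to about zero
sum_of_squares = sum(d * d for d in differences)    # the quantity OLS minimizes

print(f"sum of differences = {sum_of_differences:.6f}")
print(f"sum of squared differences = {sum_of_squares:.2f}")
```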

Mark G. Haug
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.91
R Square             0.82
Adjusted R Square    0.79
Standard Error       18.45
Observations         8

ANOVA
             df   SS          MS          F         Significance F
Regression   1    9357.2874   9357.2874   27.4849   0.0019
Residual     6    2042.7126   340.4521
Total        7    11400

             Coefficients   Standard Error   t Stat    P-value   Lower 95%   Upper 95%
Intercept    -331.48        94.4493          -3.5096   0.0127    -562.59     -100.37
Height       87.04          16.6033          5.2426    0.0019    46.42       127.67

Mark G. Haug
Multiple R 0.91
R Square 0.82
Observations 8
Significance F
0.0019
Adjusted R Square: The proportion of variation in Y explained by X,
adjusted for the number of predictors in the model.
In this case,
about 79% of the variation in Y (weight) can be explained by X (height).

Standard Error:
a measure of the typical distance between the data points and the OLS line,
in the units of Y.

Significance F: This is the p-value for the null hypothesis that the
model as a whole is irrelevant (that is, that X explains none of the variation in Y).
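
These summary statistics can be recomputed from the ANOVA sums of squares in the Excel output. The following is a minimal sketch assuming the usual definitions, with k the number of predictors (1 here); the variable names are illustrative.

```python
from math import sqrt

# Sums of squares and sample size taken from the Excel ANOVA table.
ss_regression = 9357.2874
ss_residual = 2042.7126
n = 8   # observations
k = 1   # predictors (height only)

ss_total = ss_regression + ss_residual
r_square = ss_regression / ss_total
adjusted_r_square = 1 - (1 - r_square) * (n - 1) / (n - k - 1)
standard_error = sqrt(ss_residual / (n - k - 1))
f_statistic = (ss_regression / k) / (ss_residual / (n - k - 1))

print(f"R Square          = {r_square:.2f}")
print(f"Adjusted R Square = {adjusted_r_square:.2f}")
print(f"Standard Error    = {standard_error:.2f}")
print(f"F                 = {f_statistic:.4f}")
```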

Mark G. Haug
             Coefficients   P-value
Intercept    -331.48        0.0127
Height       87.04          0.0019

Coefficients are the estimated values for $\beta_0$ and $\beta_1$. In this
case $\beta_0$ is Intercept and $\beta_1$ is Height.

The p-value for each coefficient is the result of a hypothesis
test for $H_0$: $\beta_i = 0$, where
i = 0 (in the case of the intercept),
i = 1 (in the case of the slope), and
any other subscript that is used in the model (covered later).

Mark G. Haug
SUMMARY OUTPUT
Multiple R 0.91
R Square 0.82
Observations 8
ANOVA
df SS MS F Significance F
Regression 1 9357.2874 9357.2874 27.4849 0.0019
Residual 6 2042.7126 340.4521
Total 7 11400
Coefficients Standard Error t Stat P-value L 95% U 95% L 95% U 95%
Intercept -331.48 94.4493 -3.5096 0.0127 -562.59 -100.37 -562.59 -100.37
Height 87.04 16.6033 5.2426 0.0019 46.42 127.67 46.42 127.67
For simple linear regression (SLR) only, the p-value for the slope coefficient
equals the p-value for the model (Significance F). Here both are 0.0019.
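
A brief sketch of why the two p-values agree, using the values in the output above: with a single predictor, the F statistic for the model is the square of the t statistic for the slope, so the two tests are equivalent.

\[ t^2 = (5.2426)^2 \approx 27.4849 = F \]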

Mark G. Haug
Reading: pp. 265-267, 268-269, 271
Sections 14.3-14.6, 14.10, 14.14

Homework: p. 272
Problem 14.1 Data:
Do 1, 2, and 3 by hand & 1 and 3 by Excel:
1. Calculate correlation coefficient
2. Determine whether correlation coefficient is
statistically significant
3. Determine the simple linear regression equation
for these data

Mark G. Haug
\[ \hat{y} = \beta_0 + \beta_1(\text{height}) + \beta_2(\text{???}) \]
\[ \hat{y} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 \]

Mark G. Haug
Multiple R 0.99
R Square 0.98
Standard Error 7.45
Observations 8
Significance F
0.000092

Mark G. Haug
             Coefficients   P-value
Intercept    -234.29        0.0025
Height       41.07          0.0115
Waist        4.90           0.0024

Mark G. Haug
Height Only
Regression Statistics
Multiple R       0.91
R Square         0.82
Observations     8
Significance F   0.0019
             Coefficients   P-value
Intercept    -331.48        0.0127
Height       87.04          0.0019

Height and Waist
Regression Statistics
Multiple R       0.99
R Square         0.98
Standard Error   7.45
Observations     8
Significance F   0.000092
             Coefficients   P-value
Intercept    -234.29        0.0025
Height       41.07          0.0115
Waist        4.90           0.0024

Mark G. Haug

Homework:
Blackboard HW Regression Calories.doc

Mark G. Haug
x y
10 8.04
8 6.95
13 7.58
9 8.81
11 8.33
14 9.96
6 7.24
4 4.26
12 10.84
7 4.82
5 5.68
A
Adj R Square Significance F
Standard Error
Intercept
x

Mark G. Haug
x y
10 9.14
8 8.14
13 8.74
9 8.77
11 9.26
14 8.1
6 6.13
4 3.1
12 9.13
7 7.26
5 4.74
B
Standard Error
Intercept
x

Mark G. Haug
Standard Error
Intercept
x
x y
10 7.46
8 6.77
13 12.74
9 7.11
11 7.81
14 8.84
6 6.08
4 5.39
12 8.15
7 6.42
5 5.73
C

Mark G. Haug
x y
8 6.58
8 5.76
8 7.71
8 8.84
8 8.47
8 7.04
8 5.25
19 12.5
8 5.56
8 7.91
8 6.89
D
Standard Error
Intercept
x
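
The regression values for datasets A through D did not survive on these slides; only the labels remain. The four datasets appear to match the well-known Anscombe quartet, so the fitted lines can be recomputed. The sketch below is illustrative only, and the names `datasets` and `least_squares` are not from the slides.

```python
x_shared = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]

# (x, y) values copied from slides A, B, C, and D.
datasets = {
    "A": (x_shared, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "B": (x_shared, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "C": (x_shared, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "D": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
          [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

def least_squares(x, y):
    """Return (intercept, slope) of the OLS line."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope

fits = {name: least_squares(x, y) for name, (x, y) in datasets.items()}
for name, (b0, b1) in fits.items():
    print(f"{name}: y = {b0:.2f} + {b1:.3f} x")
```

All four fits come out close to y = 3 + 0.5x, even though the data look very different when plotted, which is a reason to plot the data before trusting the regression output.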

Mark G. Haug
\[ E(R_i) = \alpha_i + \beta_i E(R_m) + \varepsilon_i \]

$R_i$ = The return on a stock, stock i
$\alpha_i$ = Represents the component of the stock i return
that is independent of the market
$\beta_i$ = Represents the component of the stock i return
that is dependent on the market (The BETA)
$R_m$ = The return of the stock market
$\varepsilon_i$ = Random variable that represents the uncertainty
in the component of the stock i return that is
independent of the market

Mark G. Haug
Homework:

Blackboard:
data for semilog and market model hw.xls
(problem #2 only)

Mark G. Haug
Year S&P 500
1976 $10,000
1977 $9,281
1978 $9,886
1979 $11,710
1980 $15,509
1981 $14,753
1982 $17,924
1983 $21,949
1984 $23,315
1985 $30,692
1986 $36,407
1987 $38,292
1988 $44,610
1989 $58,702
1990 $56,871
1991 $74,159
1992 $79,804
1993 $87,832

Mark G. Haug
\[ P_t = P_0 (1 + i)^t \]

$P_t$  The principal after t periods of time.

$P_0$  The principal after t=0 periods of time.
Here t=0, meaning this is the initial amount
invested.

i  The fixed rate of return for the investment.

t  The number of periods of time.

Mark G. Haug
\[ P_t = P_0 (1 + i)^t \]
\[ \ln(P_t) = \ln\left(P_0 (1 + i)^t\right) \]
\[ \ln(P_t) = \ln(P_0) + t \ln(1 + i) \]
\[ y = \beta_0 + \beta_1 x \]

\[ \beta_1 = \ln(1 + i) \]
\[ e^{\beta_1} = 1 + i \]
\[ i = e^{\beta_1} - 1 \]

Mark G. Haug
Year   S&P 500    Period   ln(P_t)
1976   $10,000    0        9.21
1977   $9,281     1        9.14
1978   $9,886     2        9.20
1979   $11,710    3        9.37
1980   $15,509    4        9.65
1981   $14,753    5        9.60
1982   $17,924    6        9.79
1983   $21,949    7        10.00
1984   $23,315    8        10.06
1985   $30,692    9        10.33
1986   $36,407    10       10.50
1987   $38,292    11       10.55
1988   $44,610    12       10.71
1989   $58,702    13       10.98
1990   $56,871    14       10.95
1991   $74,159    15       11.21
1992   $79,804    16       11.29
1993   $87,832    17       11.38

\[ \ln(P_t) = \ln(P_0) + t \ln(1 + i) \]
\[ y = \beta_0 + \beta_1 x \]
\[ y = \ln(P_t), \qquad x = t \]

Mark G. Haug
            Coefficients
Intercept   9.006049
Period      0.142524

\[ i = e^{\beta_1} - 1 = e^{0.142524} - 1 = 1.1532 - 1 = 15.32\% \]
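
The semilog regression can be reproduced from the S&P 500 values. This is a minimal Python sketch: it regresses ln(P_t) on the period number and converts the slope to a rate of return. The helper `least_squares` and the variable names are illustrative, not part of the slides.

```python
from math import exp, log

# S&P 500 values from the slide, 1976 (period 0) through 1993 (period 17).
values = [10000, 9281, 9886, 11710, 15509, 14753, 17924, 21949, 23315,
          30692, 36407, 38292, 44610, 58702, 56871, 74159, 79804, 87832]
periods = list(range(len(values)))
log_values = [log(v) for v in values]

def least_squares(x, y):
    """Return (intercept, slope) of the OLS line."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope

b0, b1 = least_squares(periods, log_values)
rate = exp(b1) - 1   # i = e^(beta_1) - 1

print(f"intercept = {b0:.4f}, slope = {b1:.4f}, rate of return = {rate:.2%}")
```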

Mark G. Haug
Homework:

Blackboard:
data for semilog and market model hw.xls
(problem #1 only)

Mark G. Haug
[Figure: Winning Time (vertical axis) plotted against Temperature (horizontal axis, from Cold to Hot)]

Mark G. Haug
Adjusted R Square 0.56 Significance F
Standard Error 1.18 0.0004
Intercept 148.5749 0.0000
Temp -0.7128 0.1042
Temp SQ 0.0066 0.0586
Intercept 126.1854 0.0000
Temp SQ 0.0010 0.0003

Mark G. Haug
Intercept 148.5749 0.0000
Temp -0.7128 0.1042
Temp SQ 0.0066 0.0586

Mark G. Haug
[Figure: Winning Time (vertical axis) plotted against Temperature (horizontal axis)]

Mark G. Haug
Year   Temp   Temp SQ   Men     Women
1978   75     5625      132.2   152.5
1979   80     6400      131.7   147.6
1980   50     2500      129.7   145.7
1981   54     2916      128.2   145.5
1982   52     2704      129.5   147.2
1983   59     3481      129.0   147.0
1984   79     6241      134.9   149.5
1985   72     5184      131.6   148.6
1986   65     4225      131.1   148.1
1987   64     4096      131.0   150.3
1988   67     4489      128.3   148.1
1989   56     3136      128.0   145.5
1990   73     5329      132.7   150.8
1991   57     3249      129.5   147.5
1992   51     2601      129.5   144.7
1993   73     5329      130.1   146.4
1994   70     4900      131.4   147.6
1995   62     3844      131.0   148.1
1996   49     2401      129.9   148.3
1997   61     3721      128.2   148.7
1998   55     3025      128.8   145.3

Mark G. Haug
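Model with Temp and Temp SQ
            Coefficients   P-value
Intercept   128.9118       0.0000
Temp        0.4809         0.4125
Temp SQ     -0.0028        0.5358

Model with Temp SQ only
            Coefficients   P-value
Intercept   144.0209       0.0000
Temp SQ     0.0009         0.0046

Model with Temp only
            Coefficients   P-value
Intercept   140.2085       0.0000
Temp        0.1198         0.0039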
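
The slide does not say which series these models use. As a check, the sketch below fits the Women column on Temp alone; that choice of series is an assumption, and the names are illustrative. The result agrees with the Temp-only coefficients listed above (intercept about 140.21, slope about 0.1198).

```python
# Temperature and women's winning times, 1978 through 1998, from the data slide.
temps = [75, 80, 50, 54, 52, 59, 79, 72, 65, 64, 67,
         56, 73, 57, 51, 73, 70, 62, 49, 61, 55]
women = [152.5, 147.6, 145.7, 145.5, 147.2, 147.0, 149.5, 148.6, 148.1, 150.3, 148.1,
         145.5, 150.8, 147.5, 144.7, 146.4, 147.6, 148.1, 148.3, 148.7, 145.3]

def least_squares(x, y):
    """Return (intercept, slope) of the OLS line."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope

# ASSUMPTION: the Temp-only model was fitted to the Women column.
b0, b1 = least_squares(temps, women)
print(f"winning time = {b0:.4f} + {b1:.4f} * temp")
```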

Mark G. Haug
Homework:

National Highway Traffic Safety Administration
accident rate as a function of driver's age:

\[ f(y) = 60.0 - 2.28y + 0.0232y^2 \]
where y is age, $16 \le y \le 85$

Find the estimated age when
accident rate is a minimum.

Answer: 49
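
The stated answer can be checked by finding where the slope of the quadratic is zero. This is a minimal sketch; the names `accident_rate` and `age_at_minimum` are illustrative.

```python
# Coefficients of f(y) = a + b*y + c*y**2 from the homework slide.
a, b, c = 60.0, -2.28, 0.0232

def accident_rate(age):
    return a + b * age + c * age ** 2

# Setting the derivative f'(y) = b + 2*c*y to zero gives the turning point.
# Because c > 0 the parabola opens upward, so the turning point is a minimum.
age_at_minimum = -b / (2 * c)

print(f"minimum at age {age_at_minimum:.2f}, rounded to {round(age_at_minimum)}")
```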

Mark G. Haug
Country Storks (Pairs) Human Birth Rate (1000/yr)
Albania 100 83
Austria 300 87
Belgium 1 118
Bulgaria 5000 117
Denmark 9 59
France 140 774
Germany 3300 901
Greece 2500 106
Holland 4 188
Hungary 5000 124
Italy 5 551
Poland 30000 610
Portugal 1500 120
Romania 5000 367
Spain 8000 439
Switzerland 150 82
Turkey 25000 1576

Mark G. Haug
\[ \widehat{\text{human birth rate}} = \beta_0 + \beta_1(\text{storks}) \]
\[ \hat{y} = \beta_0 + \beta_1 x_1 \]

Adjusted R Square   0.34
Standard Error      332.19
Significance F      0.0079
            Coefficients   P-value
Intercept   225.0287       0.0295
Storks      0.0288         0.0079
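
The stork regression can be reproduced from the country table. The sketch below is illustrative only, and its names are not from the slides. It shows that a statistically significant slope can come out of data where causation is implausible.

```python
from math import sqrt

# Stork pairs and human birth rate (1000/yr) from the slide, Albania through Turkey.
storks = [100, 300, 1, 5000, 9, 140, 3300, 2500, 4, 5000, 5,
          30000, 1500, 5000, 8000, 150, 25000]
births = [83, 87, 118, 117, 59, 774, 901, 106, 188, 124, 551,
          610, 120, 367, 439, 82, 1576]

n = len(storks)
x_bar = sum(storks) / n
y_bar = sum(births) / n
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(storks, births))
s_xx = sum((x - x_bar) ** 2 for x in storks)
s_yy = sum((y - y_bar) ** 2 for y in births)

slope = s_xy / s_xx
intercept = y_bar - slope * x_bar
r = s_xy / sqrt(s_xx * s_yy)

print(f"birth rate = {intercept:.4f} + {slope:.4f} * storks, r = {r:.2f}")
```

The correlation is moderate and positive, which fits the common response and confounding explanations at least as well as causation.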

Mark G. Haug
Model Passenger Seats Flight Speed (miles/hr) Length of Flight (miles) Cost per Hour (dollars)
B747-100 410 518 2882 6567
B747-400 400 539 5063 7075
B747-200/300 369 529 3231 7790
L-1011-100/200 305 498 1363 5081
B-777 291 513 2451 4194
DC-10-10 286 498 1493 5092
DC-10-40 284 504 1963 4684
DC-10-30 272 516 2379 5859
A300-600 266 467 1126 5123
MD-11 260 524 3253 6335
L1011-500 222 523 2995 4764
B767-300ER 216 495 2331 3616
All Numbers Are Based Upon Averages
More seats equate to greater costs per hour:
1. Larger aircraft (fixed)
a. cost of airplane
b. larger staff to operate
c. higher insurance
d. etc
2. Heavier--thus more fuel (fixed and variable)
3. In flight service (variable)

Mark G. Haug
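\[ \widehat{\text{cost}} = \beta_0 + \beta_1(\text{seats}) \]
\[ \hat{y} = \beta_0 + \beta_1 x_1 \]
Cost = $1,136.34 + $14.67 (No. of Seats)

Adjusted R Square   0.53
Standard Error      845.31
Significance F      0.0045
            Coefficients   P-value
Intercept   1136.3405      0.3759
Seats       14.6730        0.0045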
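
The cost equation can be reproduced from the twelve aircraft in the table and then used for a prediction. In this sketch the 300-seat aircraft is hypothetical, chosen only to illustrate the calculation, and the names are illustrative.

```python
# Passenger seats and cost per hour (dollars) from the slide, B747-100 through B767-300ER.
seats = [410, 400, 369, 305, 291, 286, 284, 272, 266, 260, 222, 216]
cost = [6567, 7075, 7790, 5081, 4194, 5092, 4684, 5859, 5123, 6335, 4764, 3616]

n = len(seats)
x_bar = sum(seats) / n
y_bar = sum(cost) / n
slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(seats, cost))
         / sum((x - x_bar) ** 2 for x in seats))
intercept = y_bar - slope * x_bar

# HYPOTHETICAL input: a 300-seat aircraft, which lies within the range of the data.
hypothetical_seats = 300
predicted_cost = intercept + slope * hypothetical_seats

print(f"cost = {intercept:.2f} + {slope:.2f} * seats")
print(f"predicted cost per hour at {hypothetical_seats} seats: ${predicted_cost:,.0f}")
```

Note that the intercept's p-value (0.3759) is large, so the fitted line is best read within the observed range of roughly 216 to 410 seats rather than extrapolated toward zero seats.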

Mark G. Haug
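\[ \widehat{\text{cost}} = \beta_0 + \beta_1(\text{seats}) \]
\[ \hat{y} = \beta_0 + \beta_1 x_1 \]

Adjusted R Square   0.32
Standard Error      587.34
Significance F      0.1091
            Coefficients   P-value
Intercept   13604.3388     0.0255
Seats       -29.9707       0.1091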

Mark G. Haug
Model Passenger Seats Cost per Hour (dollars)
B747-100 410 6567
B747-400 400 7075
B747-200/300 369 7790
L-1011-100/200 305 5081
B-777 291 4194
DC-10-10 286 5092
DC-10-40 284 4684
DC-10-30 272 5859
A300-600 266 5123
MD-11 260 6335
L1011-500 222 4764
B767-300ER 216 3616

Mark G. Haug
Observation  Network  Month  Day  Rating  Fact  Stars  Previous Rating  Competition
1 BBS 1 1 15.6 0 1 14.2 14.5
2 BBS 1 7 10.8 1 0 15.3 17.2
3 BBS 1 7 14.1 0 1 13.8 14.4
4 BBS 1 1 16.8 1 1 12.8 15.3
5 BBS 2 1 14.3 1 1 12.4 13.3
6 BBS 2 1 17.1 1 1 12.9 15.1
7 BBS 3 1 8.9 0 0 10.8 14.9
8 BBS 3 7 16.2 1 0 13.3 11.6
9 BBS 4 7 9.4 0 1 12.3 12.8
10 BBS 5 1 10.2 0 1 10.7 15.6
11 BBS 5 7 9.4 0 0 10.7 14.5
12 BBS 5 1 12.1 0 1 10.1 15.6
13 BBS 5 1 10.7 1 0 8.6 17.0
14 BBS 9 7 15.0 1 0 9.8 8.2
15 BBS 9 7 10.2 0 0 11.7 13.5
16 BBS 9 7 10.3 0 1 10.1 15.2
17 BBS 10 7 10.8 0 1 10.9 13.1
18 BBS 10 7 14.4 1 0 15.9 12.6
19 BBS 11 7 14.4 1 1 12.1 14.2
20 BBS 11 7 13.6 1 0 11.4 11.9
21 ABN 1 7 14.6 0 0 19.3 14.4
22 ABN 1 2 10.8 0 1 16.3 15.2
23 ABN 1 7 16.2 0 0 20.1 14.4
24 ABN 1 2 12.8 0 0 14.8 13.1
25 ABN 1 7 16.0 0 1 19.3 13.5
More Data: 88 Observations Total
