Lecture 1.3
Men willingly believe what they wish. --Julius Caesar (100-44 BC)
Response variable: Categorical (YES/NO, Success/Failure, True/False)
Explanatory variable: ________
Population parameter: $p$
Is the true population proportion of adults who believe in life after death more than 70%?
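Large Sample Hypothesis Test for Population Proportion
H0: $p = p_0$
Decision Rule:
Ha: $p > p_0$: reject H0 if $Z \ge Z_{1-\alpha}$
Ha: $p \ne p_0$: reject H0 if $|Z| \ge Z_{1-\alpha/2}$
Ha: $p < p_0$: reject H0 if $Z \le -Z_{1-\alpha}$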
Assumptions
Large n
--for CI check: $n\hat{p}$ and $n(1-\hat{p})$ are both sufficiently large
--for HT check: $np_0$ and $n(1-p_0)$ are both sufficiently large
Data: 1 sample
Sample statistic: $\hat{p}$
Test Statistic
$$Z = \frac{\hat{p} - p_0}{\sqrt{\dfrac{p_0(1-p_0)}{n}}}, \quad \text{where } \hat{p} = \frac{X}{n}$$
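The one-sample test statistic can be sketched in a few lines of Python. This is only an illustration: the counts (750 "yes" answers among 1000 adults) are hypothetical, not data from the lecture, and the function name `one_sample_prop_z` is made up for this example. The critical value comes from the standard library's `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_prop_z(x, n, p0):
    """Return (p_hat, Z) for H0: p = p0, using the standard error under H0."""
    p_hat = x / n
    se0 = sqrt(p0 * (1 - p0) / n)
    return p_hat, (p_hat - p0) / se0


# Hypothetical data (an assumption for illustration only).
x, n, p0 = 750, 1000, 0.70
alpha = 0.05

p_hat, z = one_sample_prop_z(x, n, p0)

# Upper-tailed alternative Ha: p > p0, so compare Z with Z_{1-alpha}.
z_crit = NormalDist().inv_cdf(1 - alpha)
reject_h0 = z >= z_crit

print(f"p_hat = {p_hat:.3f}, Z = {z:.3f}, critical value = {z_crit:.3f}")
print("Reject H0" if reject_h0 else "Do not reject H0")
```

For the two-sided alternative the comparison would use `abs(z)` against `NormalDist().inv_cdf(1 - alpha / 2)` instead.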
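Confidence Interval
$$\hat{p} \pm Z_{1-\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$
Sample size for margin of error $E$:
$$n = p(1-p)\left(\frac{Z_{1-\alpha/2}}{E}\right)^2$$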
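As a rough check of the interval and sample-size formulas, the sketch below computes both with the Python standard library. The inputs (750 successes out of 1000, a planning value p = 0.5, and a margin of error E = 0.03) are hypothetical, and the function names are invented for this example.

```python
from math import ceil, sqrt
from statistics import NormalDist


def prop_confidence_interval(x, n, conf=0.95):
    """Large-sample CI: p_hat +/- Z_{1-alpha/2} * sqrt(p_hat(1-p_hat)/n)."""
    p_hat = x / n
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


def sample_size_for_margin(p, margin, conf=0.95):
    """n = p(1-p) * (Z_{1-alpha/2} / E)^2, rounded up to a whole subject."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return ceil(p * (1 - p) * (z / margin) ** 2)


# Hypothetical inputs (assumptions for illustration only).
lower, upper = prop_confidence_interval(750, 1000)
n_needed = sample_size_for_margin(p=0.5, margin=0.03)

print(f"95% CI: ({lower:.4f}, {upper:.4f})")
print(f"n needed for E = 0.03 with p = 0.5: {n_needed}")
```

Using p = 0.5 in the sample-size formula gives the most conservative (largest) n, since p(1-p) is maximized there.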
Response variable: Categorical (YES/NO, Success/Failure, True/False)
Explanatory variable: Categorical group (1, 2) (Gender, Student/Non-Students)
Data: 2 independent samples
Is there a significant difference in the proportions of athletes between male and female students at BU?
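Large Sample Hypothesis Test for Difference in Population Proportions
H0: $p_1 - p_2 = 0$
Decision Rule:
Ha: $p_1 > p_2$: reject H0 if $Z \ge Z_{1-\alpha}$
Ha: $p_1 \ne p_2$: reject H0 if $|Z| \ge Z_{1-\alpha/2}$
Ha: $p_1 < p_2$: reject H0 if $Z \le -Z_{1-\alpha}$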
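Assumptions
Large n1, n2
$n_1\hat{p}_1$ and $n_1(1-\hat{p}_1)$ are both sufficiently large
$n_2\hat{p}_2$ and $n_2(1-\hat{p}_2)$ are both sufficiently large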
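Population parameter: $p_1 - p_2$
Sample statistic: $\hat{p}_1 - \hat{p}_2$
Test Statistic
$$Z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1-\hat{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$
where $\hat{p}_1 = X_1/n_1$, $\hat{p}_2 = X_2/n_2$, and $\hat{p} = \dfrac{X_1 + X_2}{n_1 + n_2}$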
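The pooled two-sample test statistic can be sketched as follows. The counts (45 athletes among 300 male students and 30 among 300 female students) are hypothetical and are not taken from the lecture; the function name is also made up for the example.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_prop_z(x1, n1, x2, n2):
    """Z statistic for H0: p1 - p2 = 0 using the pooled proportion."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    p_pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    return (p1_hat - p2_hat) / se


# Hypothetical counts (assumptions for illustration only).
z = two_sample_prop_z(x1=45, n1=300, x2=30, n2=300)

alpha = 0.05
z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided: Z_{1-alpha/2}
reject_h0 = abs(z) >= z_crit
p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # two-sided p-value

print(f"Z = {z:.3f}, critical value = {z_crit:.3f}, p-value = {p_value:.4f}")
print("Reject H0" if reject_h0 else "Do not reject H0")
```

The pooled proportion is used in the standard error because the test is carried out assuming H0 (equal proportions) is true.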
Confidence Interval
$$(\hat{p}_1 - \hat{p}_2) \pm Z_{1-\alpha/2}\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$$
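A minimal sketch of the confidence interval for a difference in proportions is given here. Unlike the test statistic, the interval uses the unpooled standard error. The counts are the same kind of hypothetical values (45 of 300 and 30 of 300) and do not come from the lecture.

```python
from math import sqrt
from statistics import NormalDist


def diff_prop_confidence_interval(x1, n1, x2, n2, conf=0.95):
    """CI for p1 - p2 with the unpooled standard error."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    se = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    diff = p1_hat - p2_hat
    return diff - z * se, diff + z * se


# Hypothetical counts (assumptions for illustration only).
ci_lower, ci_upper = diff_prop_confidence_interval(45, 300, 30, 300)
print(f"95% CI for p1 - p2: ({ci_lower:.4f}, {ci_upper:.4f})")
```

If the interval contains 0, the data are compatible with no difference between the two population proportions at the corresponding significance level.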
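$$n_i = \left(\frac{\sqrt{2\bar{p}\bar{q}}\;Z_{1-\alpha/2} + \sqrt{p_1 q_1 + p_2 q_2}\;Z_{1-\beta}}{ES}\right)^2, \quad \bar{p} = \frac{p_1 + p_2}{2}, \quad \bar{q} = 1 - \bar{p}$$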
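The per-group sample size formula can be evaluated as in the sketch below. Several things here are assumptions rather than statements from the notes: the second quantile is taken to be Z_{1-β} (the power term), ES is taken to be |p1 - p2|, each q is 1 minus the matching p, and the planning values p1 = 0.15, p2 = 0.10 with 80% power are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Per-group n for comparing two proportions (two-sided test).

    Assumes ES = |p1 - p2| and that the second quantile is Z_{1-beta}.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)          # power = 1 - beta
    p_bar = (p1 + p2) / 2
    q_bar = 1 - p_bar
    es = abs(p1 - p2)
    numerator = (sqrt(2 * p_bar * q_bar) * z_alpha
                 + sqrt(p1 * (1 - p1) + p2 * (1 - p2)) * z_beta)
    return ceil((numerator / es) ** 2)


# Hypothetical planning values (assumptions for illustration only).
n_i = n_per_group(p1=0.15, p2=0.10)
print(f"Subjects needed in each group: {n_i}")
```

Rounding up with `ceil` keeps the achieved power at or above the target.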
Chi-Square Tests

Name of the Test: Goodness-of-Fit Test (how well categorical data FITS an expected distribution)
Description: Uses the multinomial distribution: the response is the choice of a category from more than 2 possible categories. H0 involves statements about what the proportions should be for the k response categories.
Examples: Do people park equally often on all 4 levels of a parking garage on rainy days? Are students equally likely to register for morning, afternoon, and evening sessions?

Name of the Test: Test of Independence (whether two categorical variables can be considered INDEPENDENT)
Chi-Square statistic:
$$\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}, \quad \text{reject } H_0 \text{ if } \chi^2 \ge \chi^2_{1-\alpha}(df)$$
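A goodness-of-fit computation can be sketched as below for the parking-garage question. The observed counts are hypothetical, the function name is invented for the example, and the critical value 7.815 is the standard chi-square table value for df = 3 at α = 0.05 (hard-coded here because the Python standard library has no chi-square quantile function).

```python
def chi_square_gof(observed, null_props):
    """Return (chi2, df, expected) for a goodness-of-fit test with k categories."""
    n = sum(observed)
    expected = [n * p for p in null_props]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1          # df = k - 1
    return chi2, df, expected


# Hypothetical counts of cars on the 4 garage levels (illustration only).
observed = [30, 25, 25, 20]
null_props = [0.25, 0.25, 0.25, 0.25]   # H0: all levels used equally often

chi2, df, expected = chi_square_gof(observed, null_props)

CRITICAL_DF3_ALPHA_05 = 7.815           # chi-square table value, df = 3
reject_h0 = chi2 >= CRITICAL_DF3_ALPHA_05

print(f"chi2 = {chi2:.3f}, df = {df}, expected = {expected}")
print("Reject H0" if reject_h0 else "Do not reject H0")
```

Each expected count is n times the proportion stated in H0, so the expected counts always add up to n when the null proportions add up to 1.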
Drugs | Sex | Stress | Education | Total
52    | 38  | 21     | 9         | n=120
$\alpha$ = 0.05
To compute the chi-square statistic, extend the table of observed counts and compute expected counts:

Topic Issue | Number of calls = Oi | Expected Counts = Ei | (Oi - Ei)2/Ei
Drugs       | 52                   | 120*0.4 = 48         | (52 - 48)2/48 = 0.33
Sex         | 38                   | 120*0.25 =           | 2.133
Stress      | 21                   |                      |
Education   | 9                    |                      |
Total       | n=120                |                      |
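The two filled-in cells of the table can be checked with a few lines of Python. Only the Drugs and Sex rows are computed, because the null proportions for Stress and Education are not shown in the table; the variable names are made up for this check.

```python
n = 120

# (observed count, proportion under H0) for the two rows the table spells out.
rows = {
    "Drugs": (52, 0.40),
    "Sex": (38, 0.25),
}

contributions = {}
for topic, (observed, p_null) in rows.items():
    expected = n * p_null                       # E_i = n * p_i
    contributions[topic] = (observed - expected) ** 2 / expected
    print(f"{topic}: E = {expected:.0f}, "
          f"(O - E)^2 / E = {contributions[topic]:.3f}")
```

The chi-square statistic is the sum of these contributions over all four topics, so the remaining two rows have to be filled in the same way before the test can be completed.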
Exercise:
According to the M&Ms web site, each regular package of Milk Chocolate M&Ms should contain 24%
blue, 14% brown, 16% green, 20% orange, 13% red, and 14% yellow M&Ms. Count candies, and use
an appropriate test of hypotheses to check whether the observed color distribution is consistent
with the stated proportions.
Step 1: Define parameter of interest and state the hypothesis.
H0:________________________________________________
Ha:________________________________________________
Step 2: Summarize the data into an appropriate test statistic:
First, create table of observed and expected counts:
Colors | Blue | Brown | Green | Orange | Red  | Yellow
H0     | 0.24 | 0.14  | 0.16  | 0.20   | 0.13 | 0.14
Oi     |      |       |       |        |      |
Ei     |      |       |       |        |      |

$\chi^2$ = _____________
df =
Step 3: Assuming H0 is true, define decision rule:
Step 4: Decide whether or not the result is statistically significant based on rejection region:
Step 5: Report the conclusion in the context of the problem (question of interest).
Based on a random sample of n= ___________, there is __________ significant
evidence, at level $\alpha$ = __________, to conclude that_______________________
_________________________________________________________________
Exercise: As part of an on-going study, men who quit smoking using a variety of methods are
being followed for several years. A group of 350 men (n=350) who quit smoking using a nicotine
patch were tracked down 3 years after their quitting date. They were asked if they had successfully quit
or if they had gone back to smoking. The possible answers and number of men who answered the
question that way are given below.
Outcome                                  | # responses Oi | Ei | (Oi - Ei)2/Ei
I haven't smoked since (#1)              | 188            |    |
(#2)                                     | 111            |    |
I smoke as much now as I did before (#3) | 35             |    |
I smoke more now than I did before (#4)  | 16             |    |
Total                                    | 350            |    |
The researchers wish to test the following hypothesis:
H0: p1 = 0.60, p2 = 0.10, p3 = 0.25, p4 = 0.05
Step 1: Define parameter of interest and state the hypothesis.
Parameter:
H0:________________________________________________
Ha: ________________________________________________ $\alpha$ = 0.05
Step 2: Summarize the data into an appropriate test statistic:
$\chi^2$ = _____________ with df = k-1 =
Step 4: Decide whether or not the result is statistically significant based on rejection region:
Step 5: Report the conclusion in the context of the problem (question of interest).
Based on a random sample of n= ___________, there is __________ significant evidence, at
level $\alpha$ = __________, to conclude that_____________________________________________
Test of Independence
A fundamental question: Is there a relationship between the two variables so that the chance
that an individual falls into a particular category for one variable depends upon the particular
category they fall into for the other variable? A procedure for assessing the statistical significance
of a relationship between categorical variables is the chi-square test of independence.
Chi-Square Test of Independence
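$$\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}, \quad \text{reject } H_0 \text{ if } \chi^2 \ge \chi^2_{1-\alpha}(df), \quad df = (R-1)(C-1)$$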
Example:
Are angry people more likely to have Coronary Heart Disease? Coronary heart disease (CHD) is
a narrowing of the small blood vessels that supply blood and oxygen to the heart. CHD is also called
coronary artery disease.
People who get angry easily tend to be more likely to have heart disease. That is the conclusion of a
study that followed a random sample of 12,986 people from three locations over about four years. All
subjects were free of heart disease at the beginning of the study. The subjects
took the Spielberger Trait Anger Scale, which measures how prone a person
is to sudden anger. The 8474 people in the sample who had normal blood
pressure were classified according to whether they had coronary heart
disease (CHD) or not and whether they had low anger, moderate anger, or
high anger according to the Anger Scale.

CHD * TEMPER Crosstabulation (Count)

CHD    | Low anger | Moderate anger | High anger | Total
CHD    | 53        | 110            | 27         | 190
No CHD | 3057      | 4621           | 606        | 8284
Total  | 3110      | 4731           | 633        | 8474
$$E = \frac{\text{row total} \times \text{column total}}{n} = \frac{190 \times 3110}{8474} = 69.7$$
CHD * TEMPER Crosstabulation (Count and Expected Count)

CHD                     | Low anger | Moderate anger | High anger | Total
CHD: Count              | 53        | 110            | 27         | 190
CHD: Expected Count     | 69.7      | 106.1          | 14.2       | 190.0
No CHD: Count           | 3057      | 4621           | 606        | 8284
No CHD: Expected Count  | 3040.3    | 4624.9         | 618.8      | 8284.0
Total: Count            | 3110      | 4731           | 633        | 8474
Total: Expected Count   | 3110.0    | 4731.0         | 633.0      | 8474.0
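The expected counts in the table, and the resulting chi-square statistic, can be reproduced with the short sketch below, which uses the observed counts from the CHD and anger crosstabulation. Because df = (2-1)(3-1) = 2 here, the chi-square tail probability has the closed form exp(-x/2), so the p-value and critical value can be computed with the standard library alone; that shortcut holds only for df = 2.

```python
from math import exp, log

# Observed counts: rows are CHD / No CHD, columns are low / moderate / high anger.
observed = [
    [53, 110, 27],
    [3057, 4621, 606],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count under independence: row total * column total / n.
expected = [[r * c / n for c in col_totals] for r in row_totals]

chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(row_totals) - 1) * (len(col_totals) - 1)

# Closed forms that are valid only when df == 2.
alpha = 0.05
critical = -2 * log(alpha)      # about 5.991
p_value = exp(-chi2 / 2)

print([[round(e, 1) for e in row] for row in expected])
print(f"chi2 = {chi2:.3f}, df = {df}, critical = {critical:.3f}, p = {p_value:.5f}")
print("Reject H0" if chi2 >= critical else "Do not reject H0")
```

The largest contribution comes from the CHD and high anger cell, where 27 cases were observed against about 14.2 expected.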
18
Not Sleep
Deprived
Not
Stressed
7
Total
12
Total
$\chi^2$ = _____________
df =
Step 3: Assuming H0 is true, define decision rule:
Step 4: Decide whether or not the result is statistically significant based on rejection region:
Step 5: Report the conclusion in the context of the problem (question of interest).
Based on a random sample of n= ___________, there is __________ significant
evidence, at level $\alpha$ = __________, to conclude that_______________________
_________________________________________________________________