Week1LectureNotes3 PDF

CAS MA 116 Statistics II Summer 2, 2012 Lecture 1.3
Lecture 1.3
Men willingly believe what they wish. --Julius Caesar (100-44 BC)
1. Chi-Square Test of Goodness-of-Fit

2. Conditional Probability. Independence. Chi-Square Test: Test of Independence
5 Basic Steps in Hypothesis Test

Step 1: Determine the null (H0) and alternative (Ha) hypotheses.
Note: Hypotheses are statements ABOUT population parameters NOT ABOUT sample statistics.
Step 2: Verify necessary data conditions (assumptions), and if met, summarize the data into an
appropriate test statistic (using appropriate data summary, or sample statistic).
Step 3: Assuming the null hypothesis (H0) is true, find either the rejection region or the p-value.
Step 4: Decide whether or not the result is statistically significant, based either on the rejection
region or on the p-value. The p-value is the probability of getting a test statistic as extreme as or
more extreme than the observed value (in the direction of Ha), assuming H0 is true.
Step 5: Report the conclusion in the context of the problem (question of interest).
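One-Sample Hypothesis Test for Population Proportion (Large n)

Test scenario: Population Proportion
Data: 1 Sample
Response variable: Categorical (YES/NO, Success/Failure, True/False)
Explanatory variable: ________
Population parameter: $p$
Sample statistic: $\hat{p} = X/n$

Is the true population proportion of adults who believe in life after death more than 70%?

Large Sample Hypothesis Test for Population Proportion
H0: $p = p_0$
Decision Rule:
Ha: $p > p_0$: reject H0 if $Z \ge Z_{1-\alpha}$
Ha: $p \ne p_0$: reject H0 if $|Z| \ge Z_{1-\alpha/2}$
Ha: $p < p_0$: reject H0 if $Z \le -Z_{1-\alpha}$

Assumptions: Large n
--for CI, check that $n\hat{p}$ and $n(1-\hat{p})$ are both large enough
--for HT, check that $np_0$ and $n(1-p_0)$ are both large enough

Test Statistic: $Z = \dfrac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}}$, where $\hat{p} = X/n$
Confidence Interval: $\hat{p} \pm Z_{1-\alpha/2}\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}$
Sample size for margin of error E: $n = p(1-p)\left(\dfrac{Z_{1-\alpha/2}}{E}\right)^2$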
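The large-sample test and confidence interval can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the function names and the counts (750 "yes" answers out of 1000) are assumptions made up for the example, not data from the lecture.

```python
import math
from statistics import NormalDist


def one_sample_proportion_z(x, n, p0, alternative="greater"):
    """Large-sample z test of H0: p = p0. Returns (z, p_value)."""
    p_hat = x / n
    # The standard error uses p0 because the test assumes H0 is true.
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
    std_normal = NormalDist()
    if alternative == "greater":      # Ha: p > p0
        p_value = 1 - std_normal.cdf(z)
    elif alternative == "less":       # Ha: p < p0
        p_value = std_normal.cdf(z)
    else:                             # Ha: p != p0 (any other value)
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
    return z, p_value


def proportion_confidence_interval(x, n, level=0.95):
    """Large-sample interval: p_hat +/- Z_{1-alpha/2} * sqrt(p_hat(1-p_hat)/n)."""
    p_hat = x / n
    z_crit = NormalDist().inv_cdf(1 - (1 - level) / 2)
    margin = z_crit * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# Hypothetical data (not from the lecture): 750 of 1000 adults answer "yes".
z, p_value = one_sample_proportion_z(750, 1000, 0.70, alternative="greater")
low, high = proportion_confidence_interval(750, 1000)
print(f"Z = {z:.3f}, p-value = {p_value:.5f}")
print(f"95% CI for p: ({low:.3f}, {high:.3f})")
```

Note that the test uses $p_0$ in the standard error while the interval uses $\hat{p}$, which mirrors the two different "large n" checks listed above.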
One-Sample Hypothesis Test for Population Proportion (small n)

(Binomial Test)
Small Sample Hypothesis Test for Population Proportion H0: p=p0
Required assumption: Data are a random sample from Binomial population.
Test Statistic: X (observed number of successes); under H0, X ~ Binomial(n, p0)
Decision rule is based on p-value:
Ha: $p > p_0$: p-value = $P(X \ge x) = P(X=x) + P(X=x+1) + \dots + P(X=n)$
Ha: $p < p_0$: p-value = $P(X \le x) = P(X=0) + P(X=1) + \dots + P(X=x)$
Ha: $p \ne p_0$: p-value = $P(X \ne x) = 1 - P(X=x)$
If the p-value $\le \alpha$, then reject H0.
If the p-value $> \alpha$, then fail to reject H0.
Recall: the probability of exactly k successes in n trials is
$P(X = k) = \binom{n}{k}\, p^k (1-p)^{n-k} = \dfrac{n!}{k!\,(n-k)!}\, p^k (1-p)^{n-k}$, where $\binom{n}{k} = \dfrac{n!}{k!\,(n-k)!}$
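The one-sided p-values above are just sums of binomial probabilities, so they are easy to compute directly. The sketch below covers only the two one-sided alternatives; the function names and the data (8 successes in 10 trials, with $p_0 = 0.5$) are hypothetical and chosen only for illustration.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binomial_p_value(x, n, p0, alternative="greater"):
    """Exact one-sided p-value for H0: p = p0, summing binomial probabilities."""
    if alternative == "greater":      # Ha: p > p0, p-value = P(X >= x)
        ks = range(x, n + 1)
    elif alternative == "less":       # Ha: p < p0, p-value = P(X <= x)
        ks = range(0, x + 1)
    else:
        raise ValueError("this sketch covers only the one-sided alternatives")
    return sum(binomial_pmf(k, n, p0) for k in ks)


# Hypothetical data (not from the lecture): 8 successes in n = 10 trials.
p_value = binomial_p_value(8, 10, 0.5, alternative="greater")
print(f"p-value = {p_value:.4f}")
print("Reject H0" if p_value <= 0.05 else "Fail to reject H0")
```

With these made-up numbers the p-value is 56/1024, slightly above 0.05, so H0 would not be rejected at the 5% level.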
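Two-Sample Hypothesis Test for Difference in Population Proportions
(Large n1, n2)

Test scenario: Difference in Population Proportions
Data: 2 Independent Samples
Response variable: Categorical (YES/NO, Success/Failure, True/False)
Explanatory variable: Group (1,2) (Gender, Students/Non-Students)
Population parameter: $p_1 - p_2$
Sample statistic: $\hat{p}_1 - \hat{p}_2$

Is there a significant difference in the proportions of athletes among male and female students at BU?

Large Sample Hypothesis Test for Difference in Population Proportions
H0: $p_1 - p_2 = 0$
Decision Rule:
Ha: $p_1 > p_2$: reject H0 if $Z \ge Z_{1-\alpha}$
Ha: $p_1 \ne p_2$: reject H0 if $|Z| \ge Z_{1-\alpha/2}$
Ha: $p_1 < p_2$: reject H0 if $Z \le -Z_{1-\alpha}$

Assumptions: Large n1, n2
Check that $n_1\hat{p}_1$ and $n_1(1-\hat{p}_1)$, and $n_2\hat{p}_2$ and $n_2(1-\hat{p}_2)$, are all large enough

Test Statistic: $Z = \dfrac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1-\hat{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$
where $\hat{p}_1 = X_1/n_1$, $\hat{p}_2 = X_2/n_2$, $\hat{p} = \dfrac{X_1 + X_2}{n_1 + n_2}$

Confidence Interval: $(\hat{p}_1 - \hat{p}_2) \pm Z_{1-\alpha/2}\sqrt{\dfrac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \dfrac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$

Sample size per group: $n_i = \left(\dfrac{\sqrt{2\bar{p}\bar{q}}\; Z_{1-\alpha/2} + \sqrt{p_1 q_1 + p_2 q_2}\; Z_{1-\beta}}{ES}\right)^2$
where ES = $|p_2 - p_1|$ (under Ha), $\bar{p} = \dfrac{p_1 + p_2}{2}$, $\bar{q} = 1 - \bar{p}$, and $q_i = 1 - p_i$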
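As a rough illustration of the two-sample test statistic with the pooled proportion, here is a short standard-library sketch. The function name and the counts (60 athletes among 200 male students, 45 among 250 female students) are invented for the example and are not data from the lecture.

```python
import math
from statistics import NormalDist


def two_proportion_z(x1, n1, x2, n2):
    """Z statistic and two-sided p-value for H0: p1 - p2 = 0."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    # Under H0 both groups share one proportion, estimated by pooling the samples.
    p_pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # Ha: p1 != p2
    return z, p_value


# Hypothetical counts (not from the lecture).
z, p_value = two_proportion_z(60, 200, 45, 250)
print(f"Z = {z:.3f}, two-sided p-value = {p_value:.4f}")
```

The pooled estimate appears only in the test, where H0 says the two proportions are equal; the confidence interval above uses the two sample proportions separately.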
Chi-Square Tests

Name of the Test: Goodness-of-Fit Test (how well categorical data FITS expected distribution)
Description: Uses the Multinomial Distribution: the response is the choice of a category from more
than 2 possible categories. H0 involves statements about what the proportions should be for the k
response categories.
Examples: Do people park equally often on all 4 levels of a parking garage on rainy days?
Are students equally likely to register for morning, afternoon, and evening sessions?
H0: $p_1 = p_{01}, p_2 = p_{02}, \dots, p_k = p_{0k}$
Ha: H0 is false
Note: all proportions add up to 1: $\sum_{i=1}^{k} p_i = 1$

Name of the Test: Test of Independence (if two categorical variables can be considered INDEPENDENT)
Description: Consider 1 population and 2 categorical variables. Test if the two categorical
variables appear to be related (dependent) for a given population of interest.
Examples: Are angry people more likely to have Heart Disease?
Are sleep-deprived students more likely to be stressed?
H0: [variable 1] and [variable 2] are INDEPENDENT
Ha: [variable 1] and [variable 2] are DEPENDENT

Chi-Square statistic: $\chi^2 = \sum_i \dfrac{(O_i - E_i)^2}{E_i}$, reject H0 if $\chi^2 \ge \chi^2_{1-\alpha}(df)$
where $O_i$ is the observed count and $E_i$ is the expected count under the corresponding null hypothesis.
Test of Goodness of Fit

The goodness of fit test is used to assess whether one sample fits well with a specified distribution.
The probabilities for all the categories add up to 1.
Chi-Square Goodness-of-Fit Test
H0: $p_1 = p_{01}, p_2 = p_{02}, \dots, p_k = p_{0k}$
Test Statistic: $\chi^2 = \sum_{i=1}^{k} \dfrac{(O_i - E_i)^2}{E_i}$
Decision Rule: reject H0 if $\chi^2 \ge \chi^2_{1-\alpha}(df)$, df = k-1

$O_i$ is the observed count and
$E_i = n\,p_{0i}$ is the expected count under the corresponding H0.
Example (D'Agostino, Example 7.10): Goodness of Fit for Teen Issues

Volunteers at a teen hotline have been assigned based on the assumption that 40% of all calls are
drug related, 25% are sex related (e.g., date rape), 25% are stress related, and 10% concern educational
issues. For this investigation, each call is classified into one category based on the primary issue raised
by the caller.
To test the hypothesis, the following data are collected from 120 randomly selected calls placed to
the teen hotline. Based on the data, is the assumption regarding the distribution of topic issues
appropriate?
Topic Issue:       Drugs   Sex   Stress   Education   Total
Number of calls:   52      38    21       9           n=120
Step 1: Define parameter of interest and state the hypothesis.

Parameter: $p_i$ = the proportion of calls related to topic i (1 - Drugs, 2 - Sex, 3 - Stress, 4 - Education)
H0: p1 = 0.40, p2 = 0.25, p3 = 0.25, p4 = 0.10
Ha: H0 is false
Significance Level: $\alpha = 0.05$
Step 2: Summarize the data into an appropriate test statistic:

$\chi^2 = \sum_{i=1}^{k} \dfrac{(O_i - E_i)^2}{E_i}$
with k = 4 (number of categories), $E_i = n\,p_{0i}$
To compute the chi-square statistic, extend the table of observed counts and compute expected counts:

Topic Issue   Number of calls = O_i   Expected count E_i = n p_0i   (O_i - E_i)^2 / E_i
Drugs         52                      120*0.40 = 48                 (52 - 48)^2/48 = 0.333
Sex           38                      120*0.25 = 30                 (38 - 30)^2/30 = 2.133
Stress        21                      120*0.25 = 30                 (21 - 30)^2/30 = 2.700
Education     9                       120*0.10 = 12                 (9 - 12)^2/12 = 0.750
Total         n=120                   120

$\chi^2 = 0.333 + 2.133 + 2.700 + 0.750 = 5.917$
Step 3: Assuming H0 is true, define the decision rule:

Decision Rule:
Reject H0 if $\chi^2 \ge \chi^2_{1-\alpha}(df)$
Using Table B.5, $\chi^2_{1-\alpha}(df) = \chi^2_{0.95}(3) = 7.81$
Decision: Fail to Reject H0, since the observed $\chi^2$ is less than 7.81
Based on a random sample of n=120 phone calls, there is no significant evidence at the 5%
significance level to conclude that volunteers at the teen hotline have been assigned inappropriately.
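The arithmetic of the hotline example can be checked with a short script. This is a sketch only: the function name is an assumption, and the critical value 7.81 is simply the table value quoted in the example rather than something computed here.

```python
def chi_square_gof(observed, null_proportions):
    """Goodness-of-fit statistic. Returns (statistic, expected counts, df)."""
    n = sum(observed)
    expected = [n * p for p in null_proportions]      # E_i = n * p_0i
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return statistic, expected, len(observed) - 1     # df = k - 1


observed = [52, 38, 21, 9]                    # Drugs, Sex, Stress, Education
null_proportions = [0.40, 0.25, 0.25, 0.10]   # proportions stated in H0
statistic, expected, df = chi_square_gof(observed, null_proportions)

critical_value = 7.81   # chi-square table value for df = 3 at the 5% level (from the example)
print(f"chi-square = {statistic:.3f} with df = {df}")
print("Reject H0" if statistic >= critical_value else "Fail to reject H0")
```

Carrying the terms at full precision gives a statistic of about 5.917, which is well below 7.81, so the decision matches the one stated in the example.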
Exercise:
According to the M&M's web site, each regular package of Milk Chocolate M&M's should contain 24%
blue, 14% brown, 16% green, 20% orange, 13% red, and 14% yellow M&M's. Count candies, and use
an appropriate test of hypotheses to check whether the observed color counts are consistent with the
stated distribution of proportions.
H0:________________________________________________
Ha:________________________________________________
$\alpha$ =_____________
First, create a table of observed and expected counts:

Colors   Blue   Brown   Green   Orange   Red    Yellow
H0       0.24   0.14    0.16    0.20     0.13   0.14
O_i
E_i

Then compute the test statistic:
$\chi^2$ =
df =
Step 3: Assuming H0 is true, define the decision rule:
Based on a random sample of n= ___________, there is __________ significant
evidence, at level $\alpha$ = __________, to conclude that_______________________
_________________________________________________________________
Exercise: As part of an ongoing study, men who quit smoking using a variety of methods are
being followed for several years. A group of 350 men (n=350) who quit smoking using a nicotine
patch were tracked down 3 years after their quitting date. They were asked whether they had
successfully quit or had gone back to smoking. The possible answers and the number of men who gave
each answer are given below.

Outcome                                                        # responses O_i   E_i =   (O_i - E_i)^2 / E_i
I haven't smoked since (#1)                                    188
I don't smoke much anymore, but I occasionally light up (#2)   35
I smoke as much now as I did before (#3)                       111
I smoke more now than I did before (#4)                        16
Total                                                          350

The researchers wish to test the following hypothesis:
H0: p1 = 0.60, p2 = 0.10, p3 = 0.25, p4 = 0.05
Parameter:
H0:________________________________________________
Ha: ________________________________________________
$\alpha$ = 0.05
$\chi^2$ =
with df = k-1 =
Based on a random sample of n= ___________, there is __________ significant evidence, at
level $\alpha$ = __________, to conclude that_____________________________________________
Test of Independence
A fundamental question: Is there a relationship between the two variables so that the chance
that an individual falls into a particular category for one variable depends upon the particular
category they fall into for the other variable? A procedure for assessing the statistical significance
of a relationship between categorical variables is the chi-square test of independence.
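Chi-Square Test of Independence
H0: [variable 1] and [variable 2] are INDEPENDENT
$\chi^2 = \sum_i \dfrac{(O_i - E_i)^2}{E_i}$, reject H0 if $\chi^2 \ge \chi^2_{1-\alpha}(df)$, df = (R-1)(C-1)

$O_i$ is the observed count and
$E_i = (\text{row total} \times \text{column total})/n$ is the expected count under the
corresponding H0. R and C are the numbers of rows and columns in the table.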
Conditional Probability and Independence

Definition:
Recall that conditional probability is the probability of some event A, given the occurrence of some
other event B. Conditional probability is written $P(A \mid B)$, and is read "the (conditional) probability of
A, given B" or "the probability of A under the condition B".
When in a random experiment the event B is known to have occurred, the possible outcomes of the
experiment are reduced to B, and hence the probability of the occurrence of A is changed from the
unconditional probability into the conditional probability given B.
Marginal probability is then the unconditional probability P(A) of the event A; that is, the probability
of A, regardless of whether event B did or did not occur.
The Conditional Probability Rule:
$P(A \mid B) = \dfrac{P(A \cap B)}{P(B)}$, for $P(B) > 0$
Two random events A and B are (statistically) independent if and only if

$P(A \cap B) = P(A \text{ and } B) = P(A)\,P(B)$
Thus, if A and B are independent, then their joint probability can be expressed as a simple product of
their individual probabilities. Equivalently, for two independent events A and B with non-zero
probabilities:
$P(A \mid B) = P(A)$, or $P(B \mid A) = P(B)$.
In other words, if A and B are independent, then the conditional probability of A, given B is simply
the individual (marginal) probability of A alone; likewise, the probability of B given A is simply the
probability of B alone.
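The product rule for independent events is also where the expected counts in the chi-square test of independence come from. A short derivation, as a sketch: the symbols $R_i$ (total of row i), $C_j$ (total of column j), and $E_{ij}$ (expected count in cell (i, j)) are notation assumed here for illustration and do not appear in the lecture.

```latex
% Under H0 (independence), the joint probability factors into the marginals:
P(\text{row } i \cap \text{column } j) = P(\text{row } i)\, P(\text{column } j)

% Estimate each marginal probability by its sample proportion:
\hat{P}(\text{row } i) = \frac{R_i}{n}, \qquad \hat{P}(\text{column } j) = \frac{C_j}{n}

% Expected count = sample size times the estimated joint probability:
E_{ij} = n \cdot \frac{R_i}{n} \cdot \frac{C_j}{n} = \frac{R_i\, C_j}{n}
```

This is the same "row total times column total, divided by n" rule used for the expected counts in the test of independence.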
Example:
Are angry people more likely to have Coronary Heart Disease? Coronary heart disease (CHD) is
a narrowing of the small blood vessels that supply blood and oxygen to the heart. CHD is also called
coronary artery disease.
People who get angry easily tend to be more likely to have heart disease. That is the conclusion of a
study that followed a random sample of 12,986 people from three locations over about four years. All
subjects were free of heart disease at the beginning of the study. The subjects
took the Spielberger Trait Anger Scale, which measures how prone a person
is to sudden anger. The 8474 people in the sample who had normal blood
pressure were classified according to whether they had coronary heart
disease (CHD) or not and whether they had low anger, moderate anger, or
high anger according to the Anger Scale.

CHD * TEMPER Crosstabulation (Count)
          Low anger   Moderate anger   High anger   Total
CHD       53          110              27           190
No CHD    3057        4621             606          8284
Total     3110        4731             633          8474
1. What proportion of sampled subjects had CHD?

(Answer: 0.022)
2. What proportion of High anger subjects had CHD?
(Answer: 0.043)
3. What proportion of Moderate anger subjects had CHD?
(Answer: 0.023)
4. What proportion of Low anger subjects had CHD?

(Answer: 0.017)
5. Do anger classification and coronary heart disease status seem to be independent?
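Questions 1 to 4 are a marginal proportion and three conditional proportions, so comparing them gives an informal first look at question 5 before the formal test. A minimal sketch of that comparison follows; the dictionary layout and the helper name are choices made for this illustration.

```python
# Observed counts from the CHD * TEMPER crosstabulation.
counts = {
    "CHD":    {"Low": 53,   "Moderate": 110,  "High": 27},
    "No CHD": {"Low": 3057, "Moderate": 4621, "High": 606},
}
n = sum(sum(row.values()) for row in counts.values())

# Marginal probability P(CHD), ignoring anger level.
p_chd = sum(counts["CHD"].values()) / n


def p_chd_given(anger):
    """Conditional probability P(CHD | anger) = P(CHD and anger) / P(anger)."""
    joint = counts["CHD"][anger] / n
    marginal = (counts["CHD"][anger] + counts["No CHD"][anger]) / n
    return joint / marginal


print(f"P(CHD) = {p_chd:.3f}")
for anger in ("Low", "Moderate", "High"):
    print(f"P(CHD | {anger} anger) = {p_chd_given(anger):.3f}")
```

If CHD status and anger level were independent, each conditional proportion would be close to the marginal one. Here the High anger group stands out, which is what the formal test then assesses.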
Step 1: State the hypotheses.

H0: Having CHD is independent of level of anger.
Ha: _______________________________________________________
Significance level $\alpha$ = 0.05
Step 2: Summarize the data into an appropriate test statistic:

CHD * TEMPER Crosstabulation (Count)
          Low anger   Moderate anger   High anger   Total
CHD       53          110              27           190
No CHD    3057        4621             606          8284
Total     3110        4731             633          8474
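NOTE: All expected cell counts must be $\ge 5$.

First, compute expected counts:
$E_i = (\text{row total} \times \text{column total})/n$

For example, for the (CHD, Low anger) cell: $E = \dfrac{190 \times 3110}{8474} = 69.7$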
CHD * TEMPER Crosstabulation
                          Low anger   Moderate anger   High anger   Total
CHD      Count            53          110              27           190
         Expected Count   69.7        106.1            14.2         190.0
No CHD   Count            3057        4621             606          8284
         Expected Count   3040.3      4624.9           618.8        8284.0
Total    Count            3110        4731             633          8474
         Expected Count   3110.0      4731.0           633.0        8474.0
$\chi^2 = 16.077$ with df = (R-1)(C-1) = (2-1)(3-1) = 2

Critical value: $\chi^2_{0.95}(2) = 5.99$
___________________________________________________________
Based on a random sample of n= __________, there is __________ significant
_________________________________________________________________
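The expected counts and the chi-square statistic for the CHD example can be reproduced with a few lines of Python. This is a sketch under two stated assumptions: the critical value 5.99 is the standard chi-square table value for df = 2 at the 5% level, and the p-value uses the closed form $e^{-x/2}$, which holds only for df = 2.

```python
import math

observed = [
    [53, 110, 27],       # CHD:    low, moderate, high anger
    [3057, 4621, 606],   # No CHD: low, moderate, high anger
]
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count under independence: (row total * column total) / n.
expected = [[r * c / n for c in col_totals] for r in row_totals]

statistic = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(row_totals) - 1) * (len(col_totals) - 1)

# For df = 2 only, the chi-square upper-tail probability is exp(-x / 2).
p_value = math.exp(-statistic / 2)
critical_value = 5.99   # table value for df = 2 at the 5% level

print(f"chi-square = {statistic:.3f}, df = {df}, p-value = {p_value:.5f}")
print("Reject H0" if statistic >= critical_value else "Fail to reject H0")
```

Checking that every entry of `expected` is at least 5 before trusting the result corresponds to the NOTE on cell counts in Step 2.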
Exercise: Are sleep-deprived students more likely to be stressed?

Using the following data, perform an appropriate test and conclude whether there is a relationship
between level of stress and lack of sleep among BU students at the 5% significance level.
Stressed
Sleep Deprived
18
Not Sleep
Deprived
Not
Stressed
7
Total
12
Total

H0:________________________________________________
Ha:________________________________________________
$\alpha$ =_____________

$\chi^2$ =
df =
Based on a random sample of n= ___________, there is __________ significant
_________________________________________________________________