
CAS MA 116 Statistics II Summer 2, 2012 Lecture 1.

Lecture 1.3
Men willingly believe what they wish. --Julius Caesar (100-44 BC)

1. Chi-Square Test of Goodness-of-Fit


2. Conditional Probability. Independence. Chi-Square Test: Test of Independence

5 Basic Steps in Hypothesis Test


Step 1: Determine the null (H0) and alternative (Ha) hypotheses.
Note: Hypotheses are statements ABOUT population parameters NOT ABOUT sample statistics.
Step 2: Verify necessary data conditions (assumptions), and if met, summarize the data into an
appropriate test statistic (using appropriate data summary, or sample statistic).
Step 3: Assuming the null hypothesis (H0) is true, find either the rejection region or the p-value.
Step 4: Decide whether or not the result is statistically significant, based either on the rejection region
or on the p-value: the probability of getting a test statistic as extreme as, or more extreme than (in the
direction of Ha), the observed value of the test statistic, assuming H0 is true.
Step 5: Report the conclusion in the context of the problem (question of interest).

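One-Sample Hypothesis Test for Population Proportion (Large n)

Test Scenario: Population Proportion
Data: 1 Sample
Response: Categorical (YES/NO, Success/Failure, True/False)
Explanatory Variable: ________
Population Parameter: $p$
Sample Statistic: $\hat{p} = X/n$

Example: Is the true population proportion of adults who believe in life after death more than 70%?

Large Sample Hypothesis Test for Population Proportion
H0: $p = p_0$
Decision Rule:
Ha: $p > p_0$: reject H0 if $Z \ge Z_{1-\alpha}$
Ha: $p \ne p_0$: reject H0 if $|Z| \ge Z_{1-\alpha/2}$
Ha: $p < p_0$: reject H0 if $Z \le -Z_{1-\alpha}$

Assumptions: Large n
--for CI check: $n\hat{p}$, $n(1-\hat{p})$
--for HT check: $np_0$, $n(1-p_0)$

Test Statistic:
$Z = \dfrac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}}$, where $\hat{p} = X/n$

Confidence Interval:
$\hat{p} \pm Z_{1-\alpha/2}\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}$

Sample size for a margin of error $E$:
$n = p(1-p)\left(\dfrac{Z_{1-\alpha/2}}{E}\right)^2$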
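
The large-sample formulas above can be sketched in a few lines of Python. This is only an illustration: the counts (750 "yes" answers out of 1000 adults) are hypothetical and do not come from the lecture, the function names are made up for this sketch, and the p-value shown is for the upper-tailed alternative Ha: p > p0.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def one_sample_z(x, n, p0):
    """Return (p_hat, Z, upper-tail p-value) for H0: p = p0 vs Ha: p > p0."""
    p_hat = x / n
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
    p_value = 1 - STD_NORMAL.cdf(z)
    return p_hat, z, p_value


def proportion_ci(x, n, conf=0.95):
    """Large-sample interval: p_hat +/- Z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = x / n
    z_crit = STD_NORMAL.inv_cdf(1 - (1 - conf) / 2)
    half_width = z_crit * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


def sample_size(p, margin, conf=0.95):
    """n = p(1-p) * (Z / E)^2, rounded up to a whole subject."""
    z_crit = STD_NORMAL.inv_cdf(1 - (1 - conf) / 2)
    return math.ceil(p * (1 - p) * (z_crit / margin) ** 2)


# HYPOTHETICAL data, not from the lecture: 750 "yes" answers among 1000 adults.
p_hat, z, p_value = one_sample_z(x=750, n=1000, p0=0.70)
low, high = proportion_ci(x=750, n=1000)
print(f"p_hat={p_hat:.3f}  Z={z:.2f}  p-value={p_value:.4f}")
print(f"95% CI: ({low:.3f}, {high:.3f})")
print("n needed for E=0.03 when p=0.5:", sample_size(0.5, 0.03))
```

Note that the test statistic uses p0 in the standard error while the confidence interval uses the sample proportion, which mirrors the two different assumption checks listed above.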


One-Sample Hypothesis Test for Population Proportion (small n)
(Binomial Test)

Small Sample Hypothesis Test for Population Proportion
H0: $p = p_0$
Required assumption: Data are a random sample from a Binomial population.
Test Statistic: X (observed # of successes), X ~ Binomial(n, p0) under H0
Decision rule is based on the p-value, where x is the observed value of X:
Ha: $p > p_0$: p-value = $P(X \ge x) = P(X=x) + P(X=x+1) + \dots + P(X=n)$
Ha: $p < p_0$: p-value = $P(X \le x) = P(X=0) + P(X=1) + \dots + P(X=x)$
Ha: $p \ne p_0$: p-value = $2\min\{P(X \ge x),\, P(X \le x)\}$, capped at 1 (one common two-sided convention)

If the p-value $\le \alpha$, then reject H0.
If the p-value $> \alpha$, then fail to reject H0.

Recall: the probability of exactly k successes in n trials is
$P(X = k) = \binom{n}{k} p^k (1-p)^{n-k} = \dfrac{n!}{k!\,(n-k)!}\, p^k (1-p)^{n-k}$, where $n! = n(n-1)\cdots 2 \cdot 1$.
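
As a rough illustration of the binomial test, the sketch below sums binomial probabilities directly. The data (8 successes in 10 trials, p0 = 0.5) are hypothetical, the function names are invented for this example, and the two-sided branch uses the "double the smaller tail" convention, which is an assumption of this sketch; other two-sided conventions exist.

```python
from math import comb


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binomial_p_value(x, n, p0, alternative):
    """Exact p-value for H0: p = p0 given x observed successes in n trials."""
    upper = sum(binom_pmf(k, n, p0) for k in range(x, n + 1))  # P(X >= x)
    lower = sum(binom_pmf(k, n, p0) for k in range(0, x + 1))  # P(X <= x)
    if alternative == "greater":
        return upper
    if alternative == "less":
        return lower
    if alternative == "two-sided":
        # Assumed convention: twice the smaller tail, capped at 1.
        return min(1.0, 2 * min(upper, lower))
    raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")


# HYPOTHETICAL data, not from the lecture: 8 successes in 10 trials, p0 = 0.5.
for alt in ("greater", "less", "two-sided"):
    print(alt, round(binomial_p_value(8, 10, 0.5, alt), 4))
```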

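Two-Sample Hypothesis Test for Difference in Population Proportions
(Large n1, n2)

Test Scenario: Difference in Population Proportions
Data: 2 Independent Samples
Response: Categorical (YES/NO, Success/Failure, True/False)
Explanatory Variable: Group (1,2) (Gender, Student/Non-Students)
Population Parameter: $p_1 - p_2$
Sample Statistic: $\hat{p}_1 - \hat{p}_2$

Example: Is there a significant difference in the proportions of athletes between male and female students at BU?

Large Sample Hypothesis Test for Difference in Population Proportions
H0: $p_1 - p_2 = 0$
Decision Rule:
Ha: $p_1 > p_2$: reject H0 if $Z \ge Z_{1-\alpha}$
Ha: $p_1 \ne p_2$: reject H0 if $|Z| \ge Z_{1-\alpha/2}$
Ha: $p_1 < p_2$: reject H0 if $Z \le -Z_{1-\alpha}$

Assumptions: Large n1, n2
check $n_1\hat{p}_1$ and $n_1(1-\hat{p}_1)$, and $n_2\hat{p}_2$ and $n_2(1-\hat{p}_2)$

Test Statistic:
$Z = \dfrac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1-\hat{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$,
where $\hat{p}_1 = X_1/n_1$, $\hat{p}_2 = X_2/n_2$, and $\hat{p} = \dfrac{X_1 + X_2}{n_1 + n_2}$

Confidence Interval:
$(\hat{p}_1 - \hat{p}_2) \pm Z_{1-\alpha/2}\sqrt{\dfrac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \dfrac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$

Sample size per group:
$n_i = \left(\dfrac{\sqrt{2\bar{p}\bar{q}}\; Z_{1-\alpha/2} + \sqrt{p_1 q_1 + p_2 q_2}\; Z_{1-\beta}}{ES}\right)^2$,
where $ES = |p_2 - p_1|$ (under Ha), $\bar{p} = \dfrac{p_1 + p_2}{2}$, $\bar{q} = 1 - \bar{p}$, and $q_i = 1 - p_i$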
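
A minimal sketch of the two-sample calculations follows. The counts (30 of 200 versus 18 of 200) and the planning values in the sample-size call are hypothetical, chosen only to exercise the formulas; the function names and the default power of 0.80 are likewise assumptions of this sketch. The p-value shown is for the two-sided alternative.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def two_proportion_z(x1, n1, x2, n2):
    """Pooled Z statistic and two-sided p-value for H0: p1 - p2 = 0."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # common proportion assumed under H0
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / se
    p_value = 2 * (1 - STD_NORMAL.cdf(abs(z)))
    return z, p_value


def n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Per-group sample size from the formula above, rounded up."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_beta = STD_NORMAL.inv_cdf(power)  # Z_{1-beta}, with power = 1 - beta
    p_bar = (p1 + p2) / 2
    q_bar = 1 - p_bar
    effect = abs(p2 - p1)  # ES under Ha
    numerator = (math.sqrt(2 * p_bar * q_bar) * z_alpha
                 + math.sqrt(p1 * (1 - p1) + p2 * (1 - p2)) * z_beta)
    return math.ceil((numerator / effect) ** 2)


# HYPOTHETICAL data, not from the lecture.
z, p_value = two_proportion_z(x1=30, n1=200, x2=18, n2=200)
print(f"Z={z:.3f}  two-sided p-value={p_value:.4f}")
print("n per group to detect 0.15 vs 0.09:", n_per_group(0.15, 0.09))
```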


Chi-Square Tests

Name of the Test: Goodness-of-Fit Test (how well categorical data FITS an expected distribution)
Description:
Uses the Multinomial Distribution: the response is the choice of a category from more than 2 possible categories.
H0 involves statements about what the proportions should be for the k response categories.
H0: p1 = p01, p2 = p02,... , pk = p0k
Ha: H0 is false
Note: all proportions add up to 1: $\sum_{i=1}^{k} p_i = 1$
Examples:
Do people park equally often on all 4 levels of a parking garage on rainy days?
Are students equally likely to register for morning, afternoon, and evening sessions?

Name of the Test: Test of Independence (whether two categorical variables can be considered INDEPENDENT)
Description:
Consider 1 population and 2 categorical variables.
Test if the two categorical variables appear to be related (dependent) for a given population of interest.
H0: [variable 1] and [variable 2] are INDEPENDENT
Ha: [variable 1] and [variable 2] are DEPENDENT
Examples:
Are angry people more likely to have Heart Disease?
Are sleep-deprived students more likely to be stressed?

Chi-Square statistic:
$\chi^2 = \sum_i \dfrac{(O_i - E_i)^2}{E_i}$, reject H0 if $\chi^2 \ge \chi^2_{\alpha}(df)$,
where $O_i$ is the observed count,
$E_i$ is the expected count under the corresponding null hypothesis, and $\chi^2_{\alpha}(df)$ is the upper-$\alpha$ critical value of the chi-square distribution with df degrees of freedom.


Test of Goodness of Fit


The goodness-of-fit test is used to assess whether one sample fits well with a specified distribution.
Probabilities for all the categories add up to 1.
Chi-Square Goodness-of-Fit Test
H0: p1 = p01, p2 = p02,... , pk = p0k
Test Statistic: $\chi^2 = \sum_{i=1}^{k} \dfrac{(O_i - E_i)^2}{E_i}$
Decision Rule: reject H0 if $\chi^2 \ge \chi^2_{\alpha}(df)$, df = k-1
$O_i$ is the observed count and
$E_i = n\,p_{0i}$ is the expected count under the corresponding H0.

Example (D'Agostino, Example 7.10): Goodness of Fit for Teen Issues


Volunteers at a teen hotline have been assigned based on the assumption that 40% of all calls are
drug related, 25% are sex related (e.g., date rape), 25% are stress related, and 10% concern educational
issues. For this investigation, each call is classified into one category based on the primary issue raised
by the caller.
To test the hypothesis, the following data are collected from 120 randomly selected calls placed to
the teen hotline. Based on the data, is the assumption regarding the distribution of topic issues
appropriate?
Topic Issue:        Drugs    Sex    Stress    Education    Total
Number of calls:    52       38     21        9            n=120

Step 1: Define parameter of interest and state the hypothesis.
Parameter: $p_i$ = the proportion of calls related to topic i (1 = Drugs, 2 = Sex, 3 = Stress, 4 = Education)
H0: p1 = 0.40, p2 = 0.25, p3 = 0.25, p4 = 0.10
Ha: H0 is false
Significance Level: $\alpha = 0.05$

Step 2: Summarize the data into an appropriate test statistic:
$\chi^2 = \sum_{i=1}^{k} \dfrac{(O_i - E_i)^2}{E_i}$ with k = 4 (number of categories), df = k - 1 = 3


To compute the chi-square statistic, extend the table of observed counts and compute expected counts:

Topic Issue    Number of calls = Oi    (Expected Counts) Ei = n*p0i    (Oi - Ei)^2/Ei
Drugs          52                      120*0.40 = 48                   (52-48)^2/48 = 0.333
Sex            38                      120*0.25 = 30                   (38-30)^2/30 = 2.133
Stress         21                      120*0.25 = 30                   (21-30)^2/30 = 2.700
Education      9                       120*0.10 = 12                   (9-12)^2/12 = 0.750
Total          n=120                   120

$\chi^2$ = 0.333 + 2.133 + 2.700 + 0.750 = 5.917

Step 3: Assuming H0 is true, define the decision rule:
Decision Rule:
Reject H0 if $\chi^2 \ge \chi^2_{0.05}(3)$
Using Table B.5, $\chi^2_{0.05}(3) = 7.81$
Step 4: Decide whether or not the result is statistically significant based on rejection region:
Decision: Fail to Reject H0, since the observed $\chi^2 \approx 5.9$ is below 7.81.
Step 5: Report the conclusion in the context of the problem (question of interest).
Based on a random sample of n=120 phone calls, there is no significant evidence at the 5%
significance level to conclude that volunteers at the teen hotline have been assigned inappropriately.
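
The hotline calculation can be checked with a short script. This is a sketch rather than a prescribed method: the function name is made up, and the critical value 7.81 is simply copied from the table lookup quoted above instead of being computed. With unrounded terms the statistic comes out at about 5.92.

```python
def chi_square_gof(observed, null_props):
    """Return (expected counts, chi-square statistic, df) for a goodness-of-fit test."""
    n = sum(observed)
    expected = [n * p for p in null_props]  # E_i = n * p0_i
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return expected, stat, len(observed) - 1


# Observed calls: Drugs, Sex, Stress, Education (from the example above).
observed = [52, 38, 21, 9]
null_props = [0.40, 0.25, 0.25, 0.10]

expected, chi2, df = chi_square_gof(observed, null_props)
CRITICAL_VALUE = 7.81  # alpha = 0.05, df = 3, as quoted from Table B.5

reject_h0 = chi2 >= CRITICAL_VALUE
print("expected:", [round(e, 1) for e in expected])
print(f"chi-square = {chi2:.3f}, df = {df}, reject H0: {reject_h0}")
```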


Exercise:
According to the M&M's web site, each regular package of Milk Chocolate M&M's should contain 24%
blue, 13% brown, 16% green, 20% orange, 13% red, and 14% yellow M&M's. Count candies, and use
an appropriate test of hypotheses to check whether the observed color counts are consistent with the
stated distribution of proportions.
Step 1: Define parameter of interest and state the hypothesis.
H0:________________________________________________
Ha:________________________________________________
$\alpha$ = _____________
Step 2: Summarize the data into an appropriate test statistic:
First, create a table of observed and expected counts:

Colors    Blue    Brown    Green    Orange    Red     Yellow
H0        0.24    0.13     0.16     0.20      0.13    0.14
Oi
Ei

Then compute the test statistic:

$\chi^2$ =

df =
Step 3: Assuming the H0 is true, define decision rule:

Step 4: Decide whether or not the result is statistically significant based on rejection region:
Step 5: Report the conclusion in the context of the problem (question of interest).
Based on a random sample of n= ___________, there is __________ significant
evidence, at level $\alpha$ = __________, to conclude that_______________________
_________________________________________________________________

Exercise: As part of an on-going study, men who quit smoking using a variety of methods are
being followed for several years. A group of 350 men (n=350) who quit smoking using a nicotine
patch were tracked down 3 years after their quitting date. They were asked if they had successfully quit
or if they had gone back to smoking. The possible answers and number of men who answered the
question that way are given below.
Outcome                                                               # responses Oi    Ei    (Oi - Ei)^2/Ei
(#1) I haven't smoked since                                           188
(#2) I don't smoke much anymore, but I occasionally light up          35
(#3) I smoke as much now as I did before                              111
(#4) I smoke more now than I did before                               16
Total                                                                 350

The researchers wish to test the following hypothesis:
H0: p1 = 0.60, p2 = 0.10, p3 = 0.25, p4 = 0.05
Step 1: Define parameter of interest and state the hypothesis.
Parameter:
H0:________________________________________________
Ha: ________________________________________________
$\alpha$ = 0.05
Step 2: Summarize the data into an appropriate test statistic:
$\chi^2$ =

with df = k-1 =

Step 3: Assuming the H0 is true, define decision rule:

Step 4: Decide whether or not the result is statistically significant based on rejection region:

Step 5: Report the conclusion in the context of the problem (question of interest).
Based on a random sample of n= ___________, there is __________ significant evidence, at
level $\alpha$ = __________, to conclude that_____________________________________________

Test of Independence
A fundamental question: Is there a relationship between the two variables so that the chance
that an individual falls into a particular category for one variable depends upon the particular
category they fall into for the other variable? A procedure for assessing the statistical significance
of a relationship between categorical variables is the chi-square test of independence.
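Chi-Square Test of Independence
H0: [variable 1] and [variable 2] are INDEPENDENT
$\chi^2 = \sum_i \dfrac{(O_i - E_i)^2}{E_i}$, reject H0 if $\chi^2 \ge \chi^2_{\alpha}(df)$, df = (R-1)(C-1)

$O_i$ is the observed count and
$E_i$ = (row total × column total)/n is the expected count under the
corresponding H0. R and C are the numbers of rows and columns in the table of counts.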

Conditional Probability and Independence


Definition:
Recall that conditional probability is the probability of some event A, given the occurrence of some
other event B. Conditional probability is written $P(A \mid B)$, and is read "the (conditional) probability of
A, given B" or "the probability of A under the condition B".
When in a random experiment the event B is known to have occurred, the possible outcomes of the
experiment are reduced to B, and hence the probability of the occurrence of A is changed from the
unconditional probability into the conditional probability given B.
Marginal probability is then the unconditional probability P(A) of the event A; that is, the probability
of A, regardless of whether event B did or did not occur.
The Conditional Probability Rule:
$P(A \mid B) = \dfrac{P(A \cap B)}{P(B)}$, provided $P(B) > 0$

Two random events A and B are (statistically) independent if and only if
$P(A \cap B) = P(A \text{ and } B) = P(A)P(B)$
Thus, if A and B are independent, then their joint probability can be expressed as a simple product of
their individual probabilities. Equivalently, for two independent events A and B with non-zero
probabilities:
$P(A \mid B) = P(A)$, or $P(B \mid A) = P(B)$.
In other words, if A and B are independent, then the conditional probability of A, given B is simply
the individual (marginal) probability of A alone; likewise, the probability of B given A is simply the
probability of B alone.

Example:
Are angry people more likely to have Coronary Heart Disease? Coronary heart disease (CHD) is
a narrowing of the small blood vessels that supply blood and oxygen to the heart. CHD is also called
coronary artery disease.
People who get angry easily tend to be more likely to have heart disease. That is the conclusion of a
study that followed a random sample of 12,986 people from three locations over about four years. All
subjects were free of heart disease at the beginning of the study. The subjects
took the Spielberger Trait Anger Scale, which measures how prone a person
is to sudden anger. The 8474 people in the sample who had normal blood
pressure were classified according to whether they had coronary heart
disease (CHD) or not and whether they had low anger, moderate anger, or
high anger according to the Anger Scale.

CHD * TEMPER Crosstabulation (Count)

            Low anger    Moderate anger    High anger    Total
CHD         53           110               27            190
No CHD      3057         4621              606           8284
Total       3110         4731              633           8474

1. What proportion of sampled subjects had CHD?


(Answer: 0.022)
2. What proportion of High anger subjects had CHD?
(Answer: 0.043)
3. What proportion of Moderate anger subjects had CHD?
(Answer: 0.023)

4. What proportion of Low anger subjects had CHD?


(Answer: 0.017)

5. Do anger classification and coronary heart disease status seem to be independent?
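
The proportions asked for in questions 1 to 4 are a marginal proportion and three conditional proportions, P(CHD | anger level). A small sketch of the arithmetic, using the counts from the crosstabulation above (the dictionary layout and variable names are just one convenient choice):

```python
# Counts from the CHD * TEMPER crosstabulation.
counts = {
    "Low anger": {"CHD": 53, "No CHD": 3057},
    "Moderate anger": {"CHD": 110, "No CHD": 4621},
    "High anger": {"CHD": 27, "No CHD": 606},
}

n = sum(cell for row in counts.values() for cell in row.values())
total_chd = sum(row["CHD"] for row in counts.values())

# Marginal proportion: P(CHD), ignoring anger level.
p_chd = total_chd / n

# Conditional proportions: P(CHD | anger level) = CHD count / column total.
p_chd_given = {
    level: row["CHD"] / (row["CHD"] + row["No CHD"])
    for level, row in counts.items()
}

print(f"P(CHD) = {p_chd:.3f}")
for level, p in p_chd_given.items():
    print(f"P(CHD | {level}) = {p:.3f}")
```

If CHD status and anger level were independent, each conditional proportion would be close to the marginal proportion; whether the observed differences are larger than chance alone would produce is what the chi-square test assesses.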


Step 1: State the hypotheses.


H0: Having CHD is independent of Level of Anger
Ha: _______________________________________________________
Significance level $\alpha$ = 0.05

Step 2: Summarize the data into an appropriate test statistic:

CHD * TEMPER Crosstabulation (Count)

            Low anger    Moderate anger    High anger    Total
CHD         53           110               27            190
No CHD      3057         4621              606           8284
Total       3110         4731              633           8474

NOTE: All expected cell counts must be $\ge 5$.

First, compute expected counts:
$E$ = (row total × column total)/n
For example, for the (CHD, Low anger) cell:
$E = (190 \times 3110)/8474 = 69.7$

CHD * TEMPER Crosstabulation

                            Low anger    Moderate anger    High anger    Total
CHD       Count             53           110               27            190
          Expected Count    69.7         106.1             14.2          190.0
No CHD    Count             3057         4621              606           8284
          Expected Count    3040.3       4624.9            618.8         8284.0
Total     Count             3110         4731              633           8474
          Expected Count    3110.0       4731.0            633.0         8474.0

$\chi^2$ = 16.077 with df = (R-1)(C-1) = (2-1)(3-1) = 2

Step 3: Assuming H0 is true, define the decision rule:
Reject H0 if $\chi^2 \ge \chi^2_{0.05}(2) = 5.99$
Step 4: Decide whether or not the result is statistically significant based on rejection region:
___________________________________________________________
Step 5: Report the conclusion in the context of the problem (question of interest).
Based on a random sample of n= __________, there is __________ significant
evidence, at level $\alpha$ = __________, to conclude that_______________________
_________________________________________________________________
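
The expected counts and the chi-square statistic for the CHD example can be reproduced with the sketch below. The function name is invented for illustration. The p-value line relies on a special property of the chi-square distribution with exactly 2 degrees of freedom (its upper-tail probability is exp(-x/2)), so it should not be reused for other table sizes.

```python
import math


def chi_square_independence(table):
    """Return (expected counts, chi-square statistic, df) for an R x C table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # E = (row total * column total) / n for every cell.
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return expected, stat, df


# Rows: CHD, No CHD. Columns: Low, Moderate, High anger.
chd_table = [
    [53, 110, 27],
    [3057, 4621, 606],
]

expected, chi2, df = chi_square_independence(chd_table)
p_value = math.exp(-chi2 / 2)  # valid ONLY because df = 2 here

print("expected:", [[round(e, 1) for e in row] for row in expected])
print(f"chi-square = {chi2:.3f}, df = {df}, p-value = {p_value:.5f}")
print("exceeds the 5.99 critical value:", chi2 >= 5.99)
```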

Exercise: Are sleep-deprived students more likely to be stressed?

Using the following data, perform an appropriate test and conclude whether there is a relationship
between level of stress and lack of sleep among BU students at the 5% significance level.
Stressed
Sleep Deprived

18

Not Sleep

Deprived

Not
Stressed
7

Total

12

Total

Step 1: Define parameter of interest and state the hypothesis.


H0:________________________________________________
Ha:________________________________________________

$\alpha$ = _____________

Step 2: Summarize the data into an appropriate test statistic:

$\chi^2$ =

df =
Step 3: Assuming the H0 is true, define decision rule:

Step 4: Decide whether or not the result is statistically significant based on rejection region:
Step 5: Report the conclusion in the context of the problem (question of interest).
Based on a random sample of n= ___________, there is __________ significant
evidence, at level $\alpha$ = __________, to conclude that_______________________
_________________________________________________________________
