Contents

▶ Introduction, the null and alternative hypotheses
▶ Hypothesis testing process
▶ Type I and Type II errors, power
▶ Test statistic, level of significance and rejection/acceptance regions in upper-, lower- and two-tail tests
▶ Test of hypothesis: procedure
▶ p-value
▶ Two-tail tests and confidence intervals
▶ Examples with various parameters
▶ Power and sample size calculations
Chapter 2. Hypothesis testing in one population

Learning goals

At the end of this chapter you should be able to:
▶ Perform a test of hypothesis in a one-population setting
▶ Formulate the null and alternative hypotheses
▶ Understand Type I and Type II errors, define the significance level, define the power
▶ Choose a suitable test statistic and identify the corresponding rejection region in upper-, lower- and two-tail tests
▶ Use the p-value to perform a test
▶ Know the connection between a two-tail test and a confidence interval
▶ Calculate the power of a test and identify a sample size needed to achieve a desired power
References

▶ Newbold, P., Statistics for Business and Economics, Chapter 9 (9.1–9.5)
▶ Ross, S., Introduction to Statistics, Chapter 9
Test of hypothesis: introduction

▶ Population: X = weight of a box of cereal (in oz). Null hypothesis (with μ0 = 20): H0: μ = 20, to be tested on a SRS against an alternative hypothesis.
▶ Population: X ∼ Bernoulli(p), p = proportion of defective parts in the entire shipment. Null hypothesis (with p0 = 0.5): H0: p = 0.5, to be tested on a SRS against an alternative hypothesis.
▶ Population: X = height of a UC3M student (in m). Is it likely to observe a sample mean x̄ = 1.65 if the population mean is μ = 1.6? Claim: on average, students are taller than 1.6 ⇒ Hypotheses:
  H0: μ ≤ 1.6 versus H1: μ > 1.6
                          Actual situation
  Decision            H0 true              H0 false
  Do not reject H0    No error (1 − α)     Type II error (β)
  Reject H0           Type I error (α)     No error (1 − β = power)
Type I and Type II errors, power

▶ Type I and Type II errors cannot happen at the same time
▶ A Type I error can only occur if H0 is true
▶ A Type II error can only occur if H0 is false
▶ If the Type I error probability (α) ↓, then the Type II error probability (β) ↑
▶ All else being equal:
  ▶ β ↓ when the difference between the hypothesized parameter value and its true value ↑
  ▶ β ↓ when α ↑
  ▶ β ↓ when σ ↓
  ▶ β ↓ when n ↑
▶ The power of the test increases as the sample size increases
▶ For θ ∈ H1: power(θ) = 1 − β(θ)
▶ For θ ∈ H0: power(θ) ≤ α
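These error rates can be checked by simulation. A minimal sketch, assuming an upper-tail z-test for a normal mean with known σ (the specific values μ0 = 5, σ = 0.1, n = 16 are illustrative choices, not prescribed by the slides):

```python
import random
from statistics import NormalDist, mean

def reject_rate(mu_true, mu0=5.0, sigma=0.1, n=16, alpha=0.05, reps=20_000):
    """Fraction of simulated samples in which the upper-tail z-test
    rejects H0: mu <= mu0, i.e. z > z_alpha."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)   # upper-alpha critical value
    rng = random.Random(42)                     # fixed seed for reproducibility
    hits = 0
    for _ in range(reps):
        xbar = mean(rng.gauss(mu_true, sigma) for _ in range(n))
        z = (xbar - mu0) / (sigma / n ** 0.5)
        hits += z > z_alpha
    return hits / reps

alpha_hat = reject_rate(mu_true=5.0)    # H0 true: Type I error rate, ~alpha
power_hat = reject_rate(mu_true=5.05)   # H0 false: power = 1 - beta
beta_hat = 1 - power_hat                # Type II error rate
```

Rerunning with a larger n or a true mean farther from μ0 shows β shrinking, matching the bullets above.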
Test statistic, level of significance and rejection region

Test statistic, T
▶ Allows us to decide whether the sample data are likely or unlikely to occur, assuming the null hypothesis is true.
▶ It is the pivotal quantity from Chapter 1, calculated under the null hypothesis.
▶ The decision in the test of hypothesis is based on the observed value of the test statistic, t.
▶ The idea is that, if the data provide evidence against the null hypothesis, the observed test statistic should be extreme, that is, very unusual; it should be typical otherwise.
▶ To distinguish between extreme and typical we use:
  ▶ the sampling distribution of the test statistic;
  ▶ the significance level α, which defines the so-called rejection (or critical) region and the acceptance region.
Test statistic, level of significance and rejection region

Rejection region (RR) and acceptance region (AR) in size-α tests:

▶ Upper-tail test: RR = {t : t > c_α}, AR = {t : t ≤ c_α}, where the critical value c_α is the upper-α quantile of the null distribution of T.
▶ Lower-tail test: RR = {t : t < c_{1−α}}, AR = {t : t ≥ c_{1−α}}, where c_{1−α} is the corresponding lower critical value.
Two-tail test, H1: θ ≠ θ0 (size α):

Mean μ — normal data, known variance:
  T = (X̄ − μ0)/(σ/√n) ∼ N(0, 1)
  RR_α = {z : z < −z_{α/2} or z > z_{α/2}}

Mean μ — non-normal data, large sample:
  T = (X̄ − μ0)/(σ/√n) ∼ N(0, 1) approximately; same RR_α as above

Proportion p — Bernoulli data, large sample:
  T = (p̂ − p0)/√(p0(1 − p0)/n) ∼ N(0, 1) approximately
  RR_α = {z : z < −z_{α/2} or z > z_{α/2}}

Mean μ — normal data, unknown variance:
  T = (X̄ − μ0)/(s/√n) ∼ t_{n−1}
  RR_α = {t : t < −t_{n−1;α/2} or t > t_{n−1;α/2}}

Variance σ² — normal data:
  T = (n − 1)s²/σ0² ∼ χ²_{n−1}
  RR_α = {χ² : χ² < χ²_{n−1;1−α/2} or χ² > χ²_{n−1;α/2}}

Standard deviation σ — normal data: same statistic and RR_α as for the variance.
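The first row of the table (mean, normal data, known variance) can be coded directly; this is a sketch, and the data in the demo call are illustrative numbers, not from an exercise:

```python
from statistics import NormalDist

def two_tail_z_test(xbar, mu0, sigma, n, alpha=0.05):
    """Two-tail z-test of H0: mu = mu0 vs H1: mu != mu0,
    for normal data with known variance sigma**2."""
    z = (xbar - mu0) / (sigma / n ** 0.5)          # observed test statistic
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}
    p = 2 * (1 - NormalDist().cdf(abs(z)))         # P(|Z| >= |z|)
    return z, p, abs(z) > z_crit                   # reject iff z lands in RR

# Illustrative call: xbar = 5.038, mu0 = 5, sigma = 0.1, n = 16
z, p, reject = two_tail_z_test(5.038, 5.0, 0.1, 16)
```

Each remaining row of the table would only change the statistic and the reference distribution.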
Upper-tail test for the mean, variance known: example

Population: X ∼ N(μ, σ² = 0.1²)
SRS: n = 16
Sample: x̄ = 5.038
Objective: test (with μ0 = 5)
  H0: μ = 5 against H1: μ > 5 (upper-tail test)
Observed test statistic:
  z = (x̄ − μ0)/(σ/√n) = (5.038 − 5)/(0.1/√16) = 1.52
p-value

▶ Upper-tail test: p-value = P(T ≥ t)
▶ Lower-tail test: p-value = P(T ≤ t)
▶ Two-tail test: p-value = P(|T| ≥ |t|)
where t is the observed value of the test statistic.

Example (cont.): X ∼ N(μ, σ² = 0.1²), SRS with n = 16, x̄ = 5.038.
Test statistic: Z = (X̄ − μ0)/(σ/√n) ∼ N(0, 1); observed value z = 1.52.

p-value = P(Z ≥ z) = P(Z ≥ 1.52) = 0.0643, where Z ∼ N(0, 1).

Since p-value = 0.0643 > α = 0.05, we fail to reject H0 (but we would reject at any α greater than 0.0643, e.g., α = 0.1).

[Figure: N(0, 1) density; the p-value is the shaded area to the right of z = 1.52]
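The p-value above can be reproduced with Python's standard library; a sketch of the slide's computation:

```python
from math import sqrt
from statistics import NormalDist

xbar, mu0, sigma, n = 5.038, 5.0, 0.1, 16
z = (xbar - mu0) / (sigma / sqrt(n))   # observed test statistic: 1.52
p_value = 1 - NormalDist().cdf(z)      # upper-tail p-value: P(Z >= z)
# p_value ~ 0.0643 > 0.05, so we fail to reject H0 at alpha = 0.05
```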
The p-value and the probability of the null hypothesis

▶ The p-value:
  ▶ is not the probability of H0, nor the Type I error probability α;
  ▶ but it can be used as a test statistic to be compared with α (i.e., reject H0 if p-value < α).
▶ We are interested in answering: how probable is the null given the data?
▶ Remember that we defined the p-value as the probability of the data (or values even more extreme) given the null.
▶ We cannot answer exactly.
▶ But under fairly general conditions, and assuming that before seeing any observations Pr(H0) = Pr(H1) = 1/2, then for p-values p such that p < 0.36:

  Pr(H0 | Observed Data) ≥ (−e p ln(p)) / (1 − e p ln(p)).

▶ For a p-value equal to 0.05 the null has a probability of at least 29% of being true.
▶ While if we want the probability of the null being true to be at most 5%, the p-value should be no larger than 0.0034.
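A quick sketch evaluating the bound for the two p-values quoted above (ln(p) < 0 for p < 1, so the bound is positive):

```python
from math import e, log

def prob_H0_lower_bound(p):
    """Lower bound on Pr(H0 | data) for a p-value p < 0.36,
    assuming prior Pr(H0) = Pr(H1) = 1/2."""
    b = -e * p * log(p)   # log(p) < 0, hence b > 0
    return b / (1 + b)    # equivalently (-e*p*ln p) / (1 - e*p*ln p)

bound_005 = prob_H0_lower_bound(0.05)     # ~0.289: the "at least 29%" claim
bound_0034 = prob_H0_lower_bound(0.0034)  # ~0.05: the "no larger than 0.0034" claim
```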
Confidence intervals and two-tail tests: duality

H0: μ = μ0 against H1: μ ≠ μ0

Population: X ∼ N(μ, σ² = 0.06²)
SRS: n = 9
Sample: x̄ = 1.95
Objective: test (with μ0 = 2)
  H0: μ = 2 against H1: μ ≠ 2 (two-tail test)

CI_{0.95}(μ) = x̄ ∓ 1.96 σ/√n = 1.95 ∓ 1.96 (0.06/√9) = (1.9108, 1.9892)

Since μ0 = 2 ∉ CI_{0.95}(μ), we reject H0 at a 5% significance level.
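The duality can be sketched in code with the slide's numbers, computing z_{0.025} from the normal quantile function instead of rounding to 1.96:

```python
from math import sqrt
from statistics import NormalDist

xbar, sigma, n, mu0, alpha = 1.95, 0.06, 9, 2.0, 0.05
z_half = NormalDist().inv_cdf(1 - alpha / 2)    # ~1.96
half_width = z_half * sigma / sqrt(n)
ci = (xbar - half_width, xbar + half_width)     # ~(1.9108, 1.9892)
reject = not (ci[0] <= mu0 <= ci[1])            # reject H0 iff mu0 outside the CI
```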
Two-tail test for the proportion: example

Example: 9.6 (Newbold) In a random sample of 199 audit partners in U.S. accounting firms, 104 partners indicated some measure of agreement with the statement: "Cash flow from operations is a valid measure of profitability." Test at the 10% level against a two-sided alternative the null hypothesis that one-half of the members of this population would agree with the preceding statement.

Population: X = 1 if a member agrees with the statement and 0 otherwise; X ∼ Bernoulli(p)
SRS: n = 199 (large n)
Sample: p̂ = 104/199 = 0.523
Objective: test (with p0 = 0.5)
  H0: p = 0.5 against H1: p ≠ 0.5 (two-tail test)
Test statistic: Z = (p̂ − p0)/√(p0(1 − p0)/n) ∼ N(0, 1) approximately
Observed test statistic:
  z = (0.523 − 0.5)/√(0.5(1 − 0.5)/199) = 0.65
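A sketch of the computation keeping full precision (the slide rounds p̂ to 0.523 before computing z, which gives 0.65; the unrounded statistic is ≈ 0.64, and the conclusion is the same either way):

```python
from math import sqrt
from statistics import NormalDist

n, successes, p0, alpha = 199, 104, 0.5, 0.10
p_hat = successes / n                            # ~0.5226
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)       # ~0.64 unrounded
p_value = 2 * (1 - NormalDist().cdf(abs(z)))     # two-tail p-value
reject = p_value < alpha                         # False: fail to reject H0
```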
Two-tail test for the proportion: example

Example: 9.6 (cont.)
Rejection (or critical) region:
  RR_{0.1} = {z : z < −z_{0.05} or z > z_{0.05}} = {z : |z| > 1.645}
Since |z| = 0.65 < 1.645, the observed statistic falls in the acceptance region and we fail to reject H0 at the 10% level.
Lower-tail test for the mean, variance unknown: example

Example: 9.4 (Newbold)
Population: X ∼ N(μ, σ²), σ² unknown
SRS: n = 6 (small n)
Sample: x̄ = Σx/n = 117/6 = 19.5
  s² = (Σx² − n x̄²)/(n − 1) = (2284.44 − 6(19.5)²)/(6 − 1) = 0.588, s = √0.588 = 0.767
Objective: test (with μ0 = 20)
  H0: μ ≥ 20 against H1: μ < 20 (lower-tail test)
Observed test statistic:
  t = (x̄ − μ0)/(s/√n) = (19.5 − 20)/(0.767/√6) = −1.597
Lower-tail test for the mean, variance unknown: example

Example: 9.4 (cont.)

[Figure: t_{n−1} density with critical values −t_{5;0.05} = −2.015 and −t_{5;0.1} = −1.476]

Since t = −1.597 < −t_{5;0.1} = −1.476, we reject H0 at the 10% level.

Conclusion: the sample data gave enough evidence to reject the claim that the average increase in sales was at least 20%.

p-value interpretation: if the null hypothesis were true, the probability of obtaining such sample data would be at most 10%, which is quite unlikely, so we reject the null hypothesis.
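The slide's numbers can be checked with a short script; the critical value −t_{5;0.1} = −1.476 is taken from the slide, since the Python standard library has no Student-t quantile function:

```python
from math import sqrt

n = 6
sum_x, sum_x2 = 117.0, 2284.44
xbar = sum_x / n                              # 19.5
s2 = (sum_x2 - n * xbar ** 2) / (n - 1)       # 0.588
t = (xbar - 20.0) / (sqrt(s2) / sqrt(n))      # ~ -1.597
reject_at_10pct = t < -1.476                  # True: reject H0: mu >= 20
```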
Lower-tail test for the mean, variance unknown: example

Example: 9.4 (cont.) in Excel: go to the menu Data, submenu Data Analysis, and choose the function "t-Test: Two-Sample Assuming Unequal Variances". Put the data in column A and n repetitions of μ0 = 20 in column B; the output highlights (in yellow) the observed t statistic, the p-value and t_{n−1;α}.
Upper-tail test for the variance: example

Example: 9.5 (Newbold) In order to meet the standards in consignments of a chemical product, it is important that the variance of their percentage impurity levels does not exceed 4. A random sample of twenty consignments had a sample quasi-variance of 5.62 for impurity level percentages.
a) Perform a suitable test of hypothesis (α = 0.1).
b) Find the power of the test. What is the power at σ1² = 7?
c) What sample size would guarantee a power of 0.9 at σ1² = 7?

Population: X = percentage impurity level of a consignment; X ∼ N(μ, σ²)
SRS: n = 20
Sample: s² = 5.62
Objective: test (with σ0² = 4)
  H0: σ² ≤ 4 against H1: σ² > 4 (upper-tail test)
Observed test statistic:
  χ² = (n − 1)s²/σ0² = (20 − 1)5.62/4 = 26.695
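The observed statistic and the decision can be sketched in a few lines; the critical value χ²_{19;0.1} = 27.2 is taken from the slide's chi-square table:

```python
n, s2, sigma0_sq, alpha = 20, 5.62, 4.0, 0.10
chi2_stat = (n - 1) * s2 / sigma0_sq   # 19 * 5.62 / 4 = 26.695
chi2_crit = 27.2                       # chi2_{19;0.1}, from the slide's table
reject = chi2_stat > chi2_crit         # False: cannot reject H0 at alpha = 0.1
```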
Upper-tail test for the variance: example

Example: 9.5 a) (cont.)

p-value = P(χ²_{19} ≥ 26.695) ∈ (0.1, 0.25), because
  χ²_{19;0.25} = 22.7 < 26.695 < 27.2 = χ²_{19;0.1}

[Figure: χ²_{n−1} density; the p-value is the area to the right of 26.695, which lies between the quantiles 22.7 and 27.2]

Hence, given that the p-value exceeds α = 0.1, we cannot reject the null hypothesis at this level.

Conclusion: the sample data did not provide enough evidence to reject the claim that the variance of the percentage impurity levels in consignments of this chemical is at most 4.
Upper-tail test for the variance: power

Example: 9.5 b) Recall that: power = P(reject H0 | H1 is true)

When do we reject H0?

  RR_{0.1} = {(n − 1)s²/σ0² > χ²_{n−1;0.1}} = {(n − 1)s² > χ²_{19;0.1} σ0² = 27.2 · 4 = 108.8}

Hence the power is:

  power(σ1²) = P(reject H0 | σ² = σ1²)
             = P((n − 1)s² > 108.8 | σ² = σ1²)
             = P((n − 1)s²/σ1² > 108.8/σ1²)
             = P(χ²_{19} > 108.8/σ1²)
             = 1 − F_{χ²_{19}}(108.8/σ1²)

(F_{χ²_{n−1}} is the cdf of χ²_{n−1}.) Hence, power(7) = P(χ²_{19} > 108.8/7) = 0.6874.

[Figure: power(σ1²) plotted against σ1²; the curve rises from power(σ0² = 4) = α = 0.1 towards 1 as σ1² grows]
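The power at σ1² = 7 can be approximated by simulating χ²_{19} draws. A Monte Carlo sketch using the Gamma(k/2, scale 2) representation of χ²_k, so no scipy is needed:

```python
import random

def chi2_tail_prob(df, threshold, reps=200_000, seed=1):
    """Monte Carlo estimate of P(chi2_df > threshold),
    drawing chi2_df variates as Gamma(df/2, scale=2)."""
    rng = random.Random(seed)
    hits = sum(rng.gammavariate(df / 2, 2.0) > threshold for _ in range(reps))
    return hits / reps

power_at_7 = chi2_tail_prob(19, 108.8 / 7)   # close to the exact value 0.6874
```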
Upper-tail test for the variance: sample size calculations

Example: 9.5 c)
From our previous calculations, we know that

  power(σ1²) = P((n − 1)s²/σ1² > χ²_{n−1;0.1} σ0²/σ1²), where (n − 1)s²/σ1² ∼ χ²_{n−1}

We want power(7) ≥ 0.9, that is, P(χ²_{n−1} > χ²_{n−1;0.1} · 4/7) ≥ 0.9. Since we are dealing with a χ²_{n−1} distribution, this requires its upper 0.9-quantile to satisfy χ²_{n−1;0.9} ≥ (4/7) χ²_{n−1;0.1} ≈ 0.571 χ²_{n−1;0.1}.

From the chi-square table: χ²_{43;0.9}/χ²_{43;0.1} = 0.573 > 0.571 ⇒ n − 1 = 43, that is, n = 44.
Another power example: lower-tail test for the mean, normal population, known σ²

[Figure: power(μ) = 1 − β(μ) curves for sample sizes n = 4, 9 and 16, with μ0 = 5 and μ1 ranging over 4.85–5.05; the power increases with n and with the distance of μ1 from μ0]
Note that the power = 1 − P(Type II error) function has the following features (everything else being equal):
▶ The farther the true mean μ1 is from the hypothesized μ0, the greater the power
▶ The smaller the α, the smaller the power; that is, reducing the probability of a Type I error increases the probability of a Type II error
▶ The larger the population variance, the lower the power (we are less likely to detect small departures from μ0 when there is greater variability in the population)
▶ The larger the sample size, the greater the power of the test (the more information from the population, the greater the chance of detecting any departure from the null hypothesis)
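These four features can be verified numerically. For the lower-tail z-test the rejection rule is Z = (X̄ − μ0)/(σ/√n) < −z_α, so power(μ1) = Φ(−z_α + (μ0 − μ1)√n/σ). A sketch, with illustrative numeric values:

```python
from math import sqrt
from statistics import NormalDist

def power_lower_tail(mu1, mu0, sigma, n, alpha):
    """Power of the lower-tail z-test of H0: mu >= mu0 at true mean mu1 < mu0."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    return NormalDist().cdf(-z_alpha + (mu0 - mu1) * sqrt(n) / sigma)

base = power_lower_tail(mu1=4.95, mu0=5.0, sigma=0.1, n=16, alpha=0.05)
assert power_lower_tail(4.90, 5.0, 0.1, 16, 0.05) > base  # mu1 farther from mu0
assert power_lower_tail(4.95, 5.0, 0.1, 16, 0.01) < base  # smaller alpha
assert power_lower_tail(4.95, 5.0, 0.2, 16, 0.05) < base  # larger sigma
assert power_lower_tail(4.95, 5.0, 0.1, 36, 0.05) > base  # larger n
```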