
Australian School of Business

Probability and Statistics


Solutions Week 7
1. Distinction between terms:
(a) The null hypothesis is the hypothesis being tested while the alternative hypothesis is the hypothesis accepted if the null is rejected.
(b) A one-tailed hypothesis is stated as an inequality, such as $\theta > a$ or $\theta < a$, while a
two-tailed hypothesis is of the form $\theta \neq a$.

(c) A simple hypothesis is one where if true, it will completely specify the probability distribution,
otherwise it is called a composite hypothesis.

(d) A Type I error is the mistake committed when the null hypothesis is rejected when it is in fact
true. On the other hand, a Type II error is the mistake committed when the null hypothesis is
accepted when it is in fact false.
2. The probability mass function for a Poisson is:
\[ p_X(x) = \frac{e^{-\lambda}\lambda^{x}}{x!}. \]
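(a) Thus, the best critical region is found by applying the Neyman-Pearson lemma:
\[
\frac{L(x_1,\dots,x_n;\lambda_0)}{L(x_1,\dots,x_n;\lambda_1)}
= \frac{e^{-0.1n}(0.1)^{\sum x_k}/\prod x_k!}{e^{-0.5n}(0.5)^{\sum x_k}/\prod x_k!}
= e^{0.4n}(0.2)^{\sum x_k} \le k
\]
\[ \Longrightarrow \quad (0.2)^{\sum x_k} \le k_1 \quad (= k/e^{0.4n}) \]
\[ \Longrightarrow \quad \sum x_k \log(0.2) \le k_2 \quad (= \log(k_1)) \]
\[ \Longrightarrow \quad \sum x_k \ge k^* \quad (= k_2/\log(0.2)) \]
is the form of the best critical region (note: $\log(0.2) < 0$). Thus the best critical region is of the
form:
\[ C = \left\{ (x_1,\dots,x_n) : \sum x_k \ge k^* \right\}, \]
where $k^*$ is such that $\Pr\left(\sum_{i=1}^{n} x_i \ge k^* \mid H_0\right) = \alpha$.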

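(b) For the specific form of the critical region given in the problem, that is, where we reject the null
$H_0$ when $\sum_{k=1}^{10} x_k \ge 3$, the level of significance is:
\[
\alpha = \Pr\left(\sum_{k=1}^{10} x_k \ge 3 \,\Big|\, \lambda = 0.1\right)
= \sum_{x=3}^{\infty} \frac{e^{-1}}{x!}
= 1 - \left(e^{-1} + e^{-1} + e^{-1}/2\right) = 0.0803.
\]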
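The arithmetic in part (b) can be checked numerically. The following is a minimal sketch using only the Python standard library; the helper name `poisson_upper_tail` is an illustrative assumption and not part of the tutorial.

```python
import math


def poisson_upper_tail(k: int, mean: float) -> float:
    """Return Pr(X >= k) for X ~ Poisson(mean), via the complement of the lower terms."""
    below = sum(math.exp(-mean) * mean**x / math.factorial(x) for x in range(k))
    return 1.0 - below


# Under H0 the sum of n = 10 observations with lambda = 0.1 is Poisson(1).
n, lam0 = 10, 0.1
alpha = poisson_upper_tail(3, n * lam0)
print(round(alpha, 4))  # 0.0803
```

The complement form avoids truncating the infinite sum, since only the three terms for x = 0, 1, 2 are needed.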

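3. The density is actually that of a $N(\mu, 1)$ distribution where the variance is known. The best critical
region can be found by solving:
\[
\frac{L(x_1,\dots,x_n;\mu_0)}{L(x_1,\dots,x_n;\mu_1)}
= \frac{\left(1/\sqrt{2\pi}\right)^{n} \exp\left(-\frac{1}{2}\sum_{k=1}^{n} x_k^2\right)}
       {\left(1/\sqrt{2\pi}\right)^{n} \exp\left(-\frac{1}{2}\sum_{k=1}^{n} (x_k-1)^2\right)}
= \exp\left(-\frac{1}{2}\left(\sum_{k=1}^{n} 2x_k - n\right)\right) \le k.
\]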

© Katja Ignatieva, School of Risk and Actuarial Studies, ASB, UNSW

ACTL2002 & ACTL5101

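Thus, a little manipulation on this will lead us to:
\[ \sum_{k=1}^{n} x_k \ge \underbrace{-\log k + \tfrac{1}{2} n}_{k^*}. \]
Therefore, the best critical region is of the form:
\[ \sum_{k=1}^{n} x_k \ge k^* \]
and to get the form of the constant $k^*$, we solve for:
\[ \Pr(\text{Reject } H_0 \mid \mu = 0) = \alpha, \]
the level of significance. Solving this, we get:
\[
\Pr\left(\sum_{k=1}^{n} x_k \ge k^* \,\Big|\, \mu = 0\right) = \Pr\left(Z \ge \frac{k^*}{\sqrt{n}}\right)
\]
since we know that when $\mu = 0$, $\sum_{k=1}^{n} X_k \sim N(0, n)$. Since the probability is equal to $\alpha$, we have
$k^*/\sqrt{n} = z_{1-\alpha}$. Thus, reject the null whenever:
\[ \sum_{k=1}^{n} x_k \ge \sqrt{n}\, z_{1-\alpha}. \]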
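As a quick numerical illustration of the rejection rule, the sketch below computes the threshold $\sqrt{n}\, z_{1-\alpha}$ and confirms that the test has size $\alpha$ under $H_0$. The values n = 25 and alpha = 0.05 are illustrative assumptions, as the exercise leaves them general; the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def sum_critical_value(n: int, alpha: float) -> float:
    """Threshold for sum(x_k): reject H0 (mu = 0) when the sum is at least sqrt(n) * z_{1-alpha}."""
    return sqrt(n) * NormalDist().inv_cdf(1.0 - alpha)


# Illustrative values only: n and alpha are not fixed by the exercise.
k_star = sum_critical_value(n=25, alpha=0.05)

# Under H0 the sum is N(0, n), so the tail probability beyond k_star should equal alpha.
size = 1.0 - NormalDist(mu=0.0, sigma=sqrt(25)).cdf(k_star)
print(round(k_star, 4), round(size, 4))
```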

4. Testing from a Poisson distribution:


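(a) Using the Neyman-Pearson lemma, the best critical region can be found by solving:
\[
\frac{L(x_1,\dots,x_n;\lambda_0)}{L(x_1,\dots,x_n;\lambda_1)}
= \frac{e^{-n\lambda_0}\lambda_0^{\sum x_k}/\prod x_k!}{e^{-n\lambda_1}\lambda_1^{\sum x_k}/\prod x_k!}
= e^{-n(\lambda_0-\lambda_1)} \left(\frac{\lambda_0}{\lambda_1}\right)^{\sum x_k} \le k,
\]
which after some manipulation will lead us to:
\[ \sum x_k \log\left(\frac{\lambda_0}{\lambda_1}\right) \le \log\left(k e^{n(\lambda_0-\lambda_1)}\right) \]
\[ \sum x_k \ge \frac{\log\left(k e^{n(\lambda_0-\lambda_1)}\right)}{\log\left(\lambda_0/\lambda_1\right)} = k^*, \]
where the inequality is reversed in the last step because:
\[ \lambda_1 > \lambda_0 \Longrightarrow \frac{\lambda_0}{\lambda_1} < 1 \Longrightarrow \log\left(\frac{\lambda_0}{\lambda_1}\right) < 0. \]
Thus, reject $H_0$ whenever $\sum x_k \ge k^*$ where $k^*$ is determined from:
\[ \Pr\left(\sum_{k=1}^{n} x_k \ge k^* \,\Big|\, \lambda_0\right) = \alpha. \]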

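(b) Since the sum of independent Poisson random variables is again Poisson, with parameter equal to the
sum of the individual parameters, the test statistic satisfies $\sum_{k=1}^{n} x_k \sim \text{Poisson}(n\lambda_0)$
under $H_0$. Therefore, $k^*$ is determined from
\[
\operatorname*{argmax}_{k^*} \left\{ \sum_{x=k^*}^{\infty} \frac{e^{-n\lambda_0}(n\lambda_0)^{x}}{x!} \le \alpha \right\},
\]
that is, the $k^*$ whose tail probability is as large as possible without exceeding $\alpha$ (equivalently, the
smallest $k^*$ that satisfies the inequality).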
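Because the Poisson distribution is discrete, $k^*$ has to be found by search rather than from a closed form. The following is a minimal sketch of that search; the function names are illustrative assumptions, and the example values (n = 10, λ0 = 0.1) are borrowed from Question 2 purely for illustration.

```python
import math


def poisson_tail(k: int, mean: float) -> float:
    """Pr(X >= k) for X ~ Poisson(mean)."""
    return 1.0 - sum(math.exp(-mean) * mean**x / math.factorial(x) for x in range(k))


def smallest_critical_value(n: int, lam0: float, alpha: float) -> int:
    """Smallest k with Pr(sum >= k | lam0) <= alpha, where the sum is Poisson(n * lam0)."""
    mean = n * lam0
    k = 0
    # The tail probability decreases in k, so the first k that satisfies
    # the size constraint is the one with the largest admissible size.
    while poisson_tail(k, mean) > alpha:
        k += 1
    return k


k_10pct = smallest_critical_value(n=10, lam0=0.1, alpha=0.10)
k_5pct = smallest_critical_value(n=10, lam0=0.1, alpha=0.05)
print(k_10pct, k_5pct)
```

With these example values the attained size is generally strictly below the nominal α, which is the usual price of testing with a discrete statistic.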


5. (a)


1. Test:
\[ H_0 : \mu_M = \mu_A \quad \text{v.s.} \quad H_1 : \mu_M \neq \mu_A, \]
where $\mu_M$ is the population mean of the number of defectives in the morning and $\mu_A$ is the
population mean of the number of defectives in the afternoon. Note that we are asked to
test whether there is a difference between the means, which implies a two-sided test. The
test statistic uses the difference in means, given an unknown population variance, which is
assumed to be equal in the two samples. Note that the sample sizes are small, thus we do
not approximate the Student-t distribution with the standard normal one. Hence, the test
statistic is:
\[
T = \frac{(\bar{X}_M - \bar{X}_A) - (\mu_M - \mu_A)}{S_p \sqrt{\frac{1}{n_M} + \frac{1}{n_A}}}
= \underbrace{\frac{(\bar{X}_M - \bar{X}_A) - (\mu_M - \mu_A)}{\sigma_p \sqrt{\frac{1}{n_M} + \frac{1}{n_A}}}}_{\sim N(0,1)}
\Bigg/ \sqrt{\underbrace{\frac{n^* S_p^2}{\sigma_p^2} \Big/ n^*}_{\sim \chi^2(n^*)/n^*}}
\overset{*}{=} \frac{\bar{X}_M - \bar{X}_A}{S_p \sqrt{\frac{1}{n_M} + \frac{1}{n_A}}} \sim t_{n_M+n_A-2},
\]
where $n^* = n_A + n_M - 2$.

* using the null hypothesis $\mu_M - \mu_A = 0$; $t_{n_M+n_A-2}$ is a Student-t distribution with $n_M + n_A - 2$
degrees of freedom.
The rejection region is $C = \{(x_1,\dots,x_n) \mid T \in (-\infty, -t_{n_M+n_A-2,\,1-\alpha/2}) \cup (t_{n_M+n_A-2,\,1-\alpha/2}, \infty)\}$.
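From the data we have:
\[ \sum_{i=1}^{n_M} x_i = 212, \qquad \sum_{i=1}^{n_M} x_i^2 = 4056, \qquad n_M = 12 \]
\[ \sum_{i=1}^{n_A} x_i = 161, \qquad \sum_{i=1}^{n_A} x_i^2 = 2811, \qquad n_A = 10 \]
From this we can calculate:
\[
\bar{x}_M = \frac{\sum_{i=1}^{n_M} x_i}{n_M} = 212/12 = 17\tfrac{2}{3}, \qquad
s_M^2 = \frac{1}{n_M - 1}\left(\sum_{i=1}^{n_M} x_i^2 - n_M \bar{x}_M^2\right) = 28.242
\]
\[
\bar{x}_A = \frac{\sum_{i=1}^{n_A} x_i}{n_A} = 16.1, \qquad
s_A^2 = \frac{1}{n_A - 1}\left(\sum_{i=1}^{n_A} x_i^2 - n_A \bar{x}_A^2\right) = 24.322
\]
\[
s_p^2 = \frac{s_M^2 (n_M - 1) + s_A^2 (n_A - 1)}{n_M + n_A - 2}
= \frac{11 \cdot 28.242 + 9 \cdot 24.322}{20} = 26.478.
\]
Thus the value of our test statistic is:
\[
T = \frac{212/12 - 16.1}{\sqrt{26.478 \left(\frac{1}{12} + \frac{1}{10}\right)}} = 0.71.
\]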

Note that the significance level is not given in this exercise. Thus we have to find the p-value
of the test. From Formulae and Tables page 163 we observe that the $1-\alpha$ quantile of the Student-t
distribution with 20 degrees of freedom takes the value 0.6870 for $\alpha = 0.25$ and 0.8600 for
$\alpha = 0.20$. Therefore, the p-value is close to $2 \cdot 0.25 = 0.5$ (somewhat lower). Since this exceeds
the usual significance levels of 0.1, 0.05 or 0.01, we would not reject the null hypothesis at any of
those levels, i.e., there is no statistically significant difference in the number of defectives
in the morning compared with the afternoon.
2. See below a dotchart (note that only the stars are required). The stars represent the observations (lower,
black: morning observations; upper, blue: afternoon observations). The + signs correspond to $\bar{x} - 2s$, $\bar{x} - s$,
$\bar{x}$, $\bar{x} + s$, $\bar{x} + 2s$, in the middle using the pooled sample standard deviation and the upper and
lower ones using the sample standard deviation of the individual (i.e., morning or afternoon)
sample. If the data are normal, then we know that approximately 95% of the observations should lie
in the interval $(\mu - 2\sigma, \mu + 2\sigma)$ and approximately 2/3 of the observations should lie in the
interval $(\mu - \sigma, \mu + \sigma)$.
[Figure: dotchart of the number of defectives, morning (lower, black) and afternoon (upper, blue) samples; horizontal axis from 10 to 30]

From this dotchart we observe that the equal variance assumption seems reasonable and that the
normality assumption for the morning data seems reasonable. For the afternoon data, however, the
normality assumption seems questionable (perhaps due to the small number of observations): the
density does not look hump-shaped (i.e., we do not observe more observations
around the mean) and the probability of large outliers is relatively large (i.e., we observe some
excess kurtosis).
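The pooled two-sample t statistic of part (a)1 can be recomputed directly from the summary sums given in the data. This is a minimal sketch using only the standard library; the function name `pooled_t` and its argument names are illustrative assumptions.

```python
from math import sqrt


def pooled_t(sum1, sumsq1, n1, sum2, sumsq2, n2):
    """Two-sample t statistic with pooled variance, from sums and sums of squares.

    Returns (t statistic for H0: equal means, pooled variance).
    """
    mean1, mean2 = sum1 / n1, sum2 / n2
    var1 = (sumsq1 - n1 * mean1**2) / (n1 - 1)
    var2 = (sumsq2 - n2 * mean2**2) / (n2 - 1)
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
    t = (mean1 - mean2) / sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, pooled_var


# Morning sample: sum 212, sum of squares 4056, n = 12.
# Afternoon sample: sum 161, sum of squares 2811, n = 10.
t_stat, sp2 = pooled_t(212, 4056, 12, 161, 2811, 10)
print(round(t_stat, 2), round(sp2, 3))
```

The p-value itself still has to be read from the t table with 20 degrees of freedom, since the standard library has no Student-t distribution.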
(b)

1. In this question we are interested in proportions, i.e., the probability of a defective screw. We
are testing:
\[ H_0 : p_0 = p_1 \quad \text{v.s.} \quad H_1 : p_0 \neq p_1 \]
where $p_0$ is the (population) probability of a defective on the days with samples of 150 screws and
$p_1$ is the (population) probability of a defective on the days with samples of 100 screws. Note that
we are asked to test whether there is a difference between the proportions, which implies a
two-sided test, and that $n$ and $np$ are large, so we can use the normal approximation. The
test statistic for the difference in proportions is (similar to the difference in means when the
variances are equal under the null):
\[
Z = \frac{\hat{p}_0 - \hat{p}_1}{\sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_0} + \frac{1}{n_1}\right)}} \sim N(0, 1)
\]
Note the difference with the example in the lecture notes in week 7, where the proportion
under the null hypothesis is given. In this case both sample proportions $\hat{p}_0$ and $\hat{p}_1$ are random
variables. Under the null hypothesis of equal proportions, the best estimate of the common proportion,
denoted by $\hat{p}$, is given by the average proportion in the two samples combined.
The rejection region is $C = \{(x_1,\dots,x_n) \mid Z \in (-\infty, -z_{1-\alpha/2}) \cup (z_{1-\alpha/2}, \infty)\}$.
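Hence, we have:
\[ n_0 = 22 \cdot 150 = 3300, \qquad n_1 = 20 \cdot 100 = 2000 \]
\[ \hat{p}_0 = \frac{373}{22 \cdot 150} = 0.11303, \qquad \hat{p}_1 = \frac{232}{20 \cdot 100} = 0.116 \]
\[ \hat{p} = \frac{373 + 232}{22 \cdot 150 + 20 \cdot 100} = \frac{605}{5300} = 0.11415. \]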

Then the value of the test statistic is:
\[
Z = \frac{0.11303 - 0.1160}{\sqrt{0.11415 \cdot 0.88585 \left(\frac{1}{3300} + \frac{1}{2000}\right)}} = -0.330.
\]
Again, no level of significance is given, so we compute the p-value. From Formulae and Tables
page 160 we observe $\Phi(0.33) = 0.62930$. Hence, the p-value is $2(1 - 0.62930) = 0.74140$.
Thus the difference in the proportions is not significant at any level $\alpha < 0.74140$, which covers
all levels used in practice.
2. Now we have the following test:
\[ H_0 : p = \tilde{p} = 0.09 \quad \text{v.s.} \quad H_1 : p = p_1 > \tilde{p} \]
Note that one can also set $H_0 : p = \tilde{p} \le 0.09$, but this complicates the test statistic. It would
lead to the same statistic and critical value.
The test statistic now corresponds with the one in the lecture notes:
\[ Z = \frac{\hat{p}_1 - \tilde{p}}{\sqrt{\tilde{p}(1 - \tilde{p})/n_1}} \sim N(0, 1). \]
The rejection region is $C = \{(x_1,\dots,x_n) \mid Z \in (z_{1-\alpha}, \infty)\}$.
The value of this test statistic is, using that under the null hypothesis (see footnote 1) $p = \tilde{p} = 0.09$:
\[ Z = \frac{0.116 - 0.09}{\sqrt{0.09 \cdot 0.91/2000}} = 4.063. \]
From Formulae and Tables page 161 we observe $\Phi(4.06) = 0.99998$, thus the corresponding
p-value is $1 - \Phi(4.06) = 0.00002$. Hence, for any level of significance higher than 0.00002 (for
example 5%) we reject the null hypothesis that the proportion of defectives is 9%. The data
therefore provide strong statistical evidence that the
proportion is larger than 9%.
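Both proportion tests in part (b) follow the same pattern: a standardised difference compared with the standard normal distribution. The sketch below recomputes the two statistics and their normal-approximation p-values directly from the counts 373 out of 3300 and 232 out of 2000; the function names are illustrative assumptions, and the p-values come from `statistics.NormalDist` rather than from the printed tables, so the last digits may differ from table lookups.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def two_proportion_z(x0: int, n0: int, x1: int, n1: int):
    """Two-sided z test of H0: p0 = p1 using the pooled proportion estimate."""
    p0, p1 = x0 / n0, x1 / n1
    pooled = (x0 + x1) / (n0 + n1)
    z = (p0 - p1) / sqrt(pooled * (1 - pooled) * (1 / n0 + 1 / n1))
    return z, 2 * (1 - STD_NORMAL.cdf(abs(z)))


def one_sided_proportion_z(p_hat: float, p_null: float, n: int):
    """Upper-tailed z test of H0: p = p_null against H1: p > p_null."""
    z = (p_hat - p_null) / sqrt(p_null * (1 - p_null) / n)
    return z, 1 - STD_NORMAL.cdf(z)


# Part 1: defectives on the 150-screw days versus the 100-screw days.
z_two, p_two = two_proportion_z(373, 22 * 150, 232, 20 * 100)

# Part 2: is the proportion on the 100-screw days larger than 9%?
z_one, p_one = one_sided_proportion_z(232 / 2000, 0.09, 2000)

print(round(z_two, 3), round(p_two, 3))
print(round(z_one, 3), p_one)
```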
6. We have the hypothesis:
\[ H_0 : \sigma_1^2 = \sigma_2^2 \quad \text{v.s.} \quad H_1 : \sigma_1^2 \neq \sigma_2^2 \quad \text{with } \alpha = 0.05 \]
The test statistic is given by:
\[
F = \frac{\sigma_2^2}{\sigma_1^2} \cdot \frac{S_1^2}{S_2^2} \sim F(n_1 - 1, n_2 - 1)
\quad \overset{*}{\Longrightarrow} \quad
F = \frac{S_1^2}{S_2^2} \sim F(n_1 - 1, n_2 - 1),
\]
* using that, under the null of equal variances, the ratio of the population variances is equal to one.
The rejection region is $C = \{(x_1,\dots,x_n) \mid F \in (0, 1/F_{1-\alpha/2}(n_2 - 1, n_1 - 1)) \cup (F_{1-\alpha/2}(n_1 - 1, n_2 - 1), \infty)\}$.
The upper critical value is given by $F(24, 29, 0.975) = 2.154$ and the lower is approximated by
$F(24, 29, 0.025) = 1/F(29, 24, 0.975) \approx 1/F(24, 24, 0.975) = 1/2.269 = 0.441$ (see Formulae and Tables page 173). Note that this is a two-sided test, so the critical values are constructed using $1 - \alpha/2$.
The value of the test statistic is:
\[ F = \frac{s_1^2}{s_2^2} = \frac{139.7}{76.6} = 1.82. \]
The observed value 1.82 lies between the two critical values, so it does not fall in the rejection region. Hence, we cannot
reject the null hypothesis of equal variances at a 5% significance level.
7. (a) There does not seem to be a relationship between age and incubation period, either for the individuals
who died or for those who survived. (There seems to be a (positive) relationship between surviving and
the incubation period, but this was not asked in this question.)
(b) For this we use the following dotplots (with the upper dots for the individuals who died (black
stars) and the lower dots for the individuals who survived (blue stars)).
Footnote 1: In case of a composite null hypothesis $H_0 : p \le 0.09$, you should select the $p_1 \in (0, 0.09]$ which leads to the highest
Type I error ($\Pr(\text{Reject } H_0 \mid H_0 \text{ is true})$), which is the highest if $p_1 = 0.09$.


Figure 1: Dotplot for age (horizontal axis: Age, from 20 to 55)

1. The dotplot does not suggest a relationship between survival and age.
Figure 2: Dotplot for incubation period (horizontal axis from 20 to 80)

2. The dotplot suggests a relationship between survival and incubation period, namely the
individuals who survived tended to have a longer incubation period.


(c) We are interested in the difference in means, with unknown population standard deviation. Therefore,
to set up a test statistic we have to assume that the population variance of the incubation period is the
same for the individuals who survived and those who died (from the dotcharts we observe that the variance
for the individuals who died might be higher than for those who survived). Under this assumption, and
using the central limit theorem (which might not be a good approximation because the number of
survivors is 7 and the number who died is 11, i.e., the total sample size is 18) or when both the incubation
period for those who survived and the incubation period for those who died are normally distributed (which
might be a good approximation looking at the dotcharts), we have the following test statistic:
\[
T = \frac{(\bar{Y}_S - \bar{Y}_D) - (\mu_S - \mu_D)}{S_p \sqrt{\frac{1}{n_S} + \frac{1}{n_D}}} \sim t_{n_S+n_D-2},
\]
where $\bar{Y}_S$, $\mu_S$, $n_S$ are the sample mean, population mean, and sample size of the incubation period for
the individuals who survived, $\bar{Y}_D$, $\mu_D$, $n_D$ are the sample mean, population mean, and sample size of the
incubation period for the individuals who died, and $S_p$ is the pooled sample standard deviation. Note that the
sample size is small, thus we have to
use the t-distribution and not the standard normal distribution. We have:
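\[ n_S = 7, \qquad n_D = 11, \qquad \bar{y}_S = 339/7 = 48.429, \qquad \bar{y}_D = 305/11 = 27.727 \]
\[
s_S^2 = \frac{n_S}{n_S - 1}\left(\frac{\sum y_S^2}{n_S} - \left(\frac{\sum y_S}{n_S}\right)^2\right)
= \frac{7}{6}\left(\frac{19665}{7} - \left(\frac{339}{7}\right)^2\right) = 541.2857
\]
\[
s_D^2 = \frac{n_D}{n_D - 1}\left(\frac{\sum y_D^2}{n_D} - \left(\frac{\sum y_D}{n_D}\right)^2\right)
= \frac{11}{10}\left(\frac{10035}{11} - \left(\frac{305}{11}\right)^2\right) = 157.8182
\]
\[
s_p^2 = \frac{(n_S - 1)s_S^2 + (n_D - 1)s_D^2}{n_S + n_D - 2}
= \frac{6 \cdot 541.2857 + 10 \cdot 157.8182}{16} = \frac{4825.8961}{16} = 301.6185
\]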
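The $(1 - \alpha) \cdot 100\%$ confidence interval of the difference in means is given by:
\[
(\bar{y}_S - \bar{y}_D) - t_{1-\alpha/2,\,n_S+n_D-2}\, s_p \sqrt{\frac{1}{n_S} + \frac{1}{n_D}}
< \mu_S - \mu_D <
(\bar{y}_S - \bar{y}_D) + t_{1-\alpha/2,\,n_S+n_D-2}\, s_p \sqrt{\frac{1}{n_S} + \frac{1}{n_D}}
\]
\[
20.702 - t_{1-\alpha/2,\,n_S+n_D-2} \cdot 17.3672 \sqrt{\frac{1}{7} + \frac{1}{11}}
< \mu_S - \mu_D <
20.702 + t_{1-\alpha/2,\,n_S+n_D-2} \cdot 17.3672 \sqrt{\frac{1}{7} + \frac{1}{11}}
\]
Using Formulae and Tables page 163 we observe $t_{0.975,16} = 2.120$ and $t_{0.995,16} = 2.921$. Thus the
95% confidence interval for the difference in means is given by $(2.9, 38.5)$ and the 99% confidence
interval for the difference in means is given by $(-3.8, 45.2)$.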
The 95% confidence interval for the difference in mean does not include zero, hence when testing
the hypothesis of equal mean versus the alternative of a difference in mean (two-sided) with a
significance level of 5% we would reject the null hypothesis.
However, the 99% confidence interval for the difference in mean does include zero, hence when
testing the hypothesis of equal mean versus the alternative of a difference in mean (two-sided)
with a significance level of 1% we cannot reject the null hypothesis.
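The two intervals in part (c) can be reproduced from the summary values 20.702 and 17.3672. This is a minimal sketch; the function name is an illustrative assumption, and the quantiles 2.120 and 2.921 are the standard Student-t table values for 16 degrees of freedom at the 0.975 and 0.995 levels, hard-coded because the standard library has no t distribution.

```python
from math import sqrt


def mean_difference_ci(mean_diff, pooled_sd, n1, n2, t_quantile):
    """Confidence interval for a difference in means with pooled standard deviation."""
    half_width = t_quantile * pooled_sd * sqrt(1 / n1 + 1 / n2)
    return mean_diff - half_width, mean_diff + half_width


# t quantiles for 16 degrees of freedom, taken from a Student-t table.
T_975_16 = 2.120
T_995_16 = 2.921

ci95 = mean_difference_ci(20.702, 17.3672, 7, 11, T_975_16)
ci99 = mean_difference_ci(20.702, 17.3672, 7, 11, T_995_16)

# A two-sided test at level alpha rejects equal means exactly when
# zero lies outside the (1 - alpha) confidence interval.
reject_5pct = not (ci95[0] <= 0 <= ci95[1])
reject_1pct = not (ci99[0] <= 0 <= ci99[1])
print(ci95, reject_5pct)
print(ci99, reject_1pct)
```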
(d)

1. We perform the test:
\[ H_0 : \sigma_S^2 = \sigma_D^2 \quad \text{v.s.} \quad H_1 : \sigma_S^2 \neq \sigma_D^2 \quad \text{with } \alpha = 5\% \]
The test statistic is given by:
\[
F = \frac{\sigma_D^2}{\sigma_S^2} \cdot \frac{S_S^2}{S_D^2} \sim F(n_S - 1, n_D - 1)
\quad \overset{*}{\Longrightarrow} \quad
F = \frac{S_S^2}{S_D^2} \sim F(6, 10),
\]
* using that, under the null of equal variances, the ratio of the population variances is equal to one.
The rejection region is $C = \{(x_1,\dots,x_n) \mid F \in (0, 1/F_{1-\alpha/2}(n_D - 1, n_S - 1)) \cup (F_{1-\alpha/2}(n_S - 1, n_D - 1), \infty)\}$.
The upper critical value is given by $F(6, 10, 0.975) = 4.072$ and the lower is given by
$F(6, 10, 0.025) = 1/F(10, 6, 0.975) = 1/5.461 = 0.18312$ (see Formulae and Tables page 173).
Note that this is a two-sided test, so the critical values are constructed using $1 - \alpha/2$.
The value of the test statistic is:
\[ F = \frac{s_S^2}{s_D^2} = \frac{541.2857}{157.8182} = 3.4298. \]

The observed value 3.4298 lies between the lower and upper critical values, so it does not fall in the
rejection region. Hence, we cannot reject the null hypothesis of equal variances at a 5% significance level.
Note that $F(6, 10, 0.95) = 3.217$ (Formulae and Tables page 172), implying that we can
reject the null hypothesis of equal variances at a 10% significance level, and the p-value is
slightly smaller than 0.1.
2. See answer question c).
Although the dotcharts suggest that there is a difference in variance, when formally testing
the hypothesis we cannot reject the null hypothesis of equal variances. With a sample this small,
either the observed difference in sample variances arose by chance while the population
variances are equal, or the population variances are unequal but the test has low power.
From the dotcharts we observe that the incubation period seems to be normally distributed
for both the sample of survivors and the sample of those who died.
8. (a) See below a dotchart (note that only the stars are required). The stars represent the observations (upper,
black: Company A observations; lower, blue: Company B observations). The + signs correspond
to $\bar{x} - 2s$, $\bar{x} - s$, $\bar{x}$, $\bar{x} + s$, $\bar{x} + 2s$, in the middle using the pooled sample standard deviation and the
upper and lower ones using the sample standard deviation of the individual (i.e., Company A or
Company B) sample. If the data are normal, then we know that approximately 95% of the observations should
lie in the interval $(\mu - 2\sigma, \mu + 2\sigma)$ and approximately 2/3 of the observations should lie in
the interval $(\mu - \sigma, \mu + \sigma)$.

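[Figure: dotchart of the premiums of Company A (upper, black) and Company B (lower, blue); horizontal axis: Premium, from 100 to 350]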

In order to apply the hypothesis test, the sample means of company A and company B should
be normally distributed with the same population variance. To justify normality of the sample means
we cannot use the CLT, because
that only holds for large n, which is not the case here. Therefore, the sample mean is normally
distributed only if the underlying population is normally distributed.
From the dotcharts we observe that approximately 2/3 of the observations of both samples lie
within one sample standard deviation from the sample mean and no observations are smaller/larger
than the sample mean -/+ 2 times the sample standard deviation. There seems to be a concentration
of the observations around the sample mean (i.e., a hump-shaped p.d.f.). Therefore, we cannot
reject the assumption that the premiums of company A and the premiums of
company B are normally distributed.
We observe that the sample variance of company A is larger than the sample variance of company
B, but this might be due to the small sample size. Hence, we cannot reject the assumption of
equal variance from the dotcharts.
(b) Assuming that the premiums are normally distributed, the only remaining test is the test for equal variances.
Hence, the hypothesis is:
\[ H_0 : \sigma_A^2 = \sigma_B^2 \quad \text{v.s.} \quad H_1 : \sigma_A^2 \neq \sigma_B^2 \quad \text{with } \alpha = 0.05 \]
The test statistic is given by:
\[
F = \frac{\sigma_B^2}{\sigma_A^2} \cdot \frac{S_A^2}{S_B^2} \sim F(n_A - 1, n_B - 1)
\quad \overset{*}{\Longrightarrow} \quad
F = \frac{S_A^2}{S_B^2} \sim F(9, 9),
\]
* using that, under the null of equal variances, the ratio of the population variances is equal to one.
The rejection region is $C = \{(x_1,\dots,x_{n_A+n_B}) \mid F \in (0, 1/F_{1-\alpha/2}(n_B - 1, n_A - 1)) \cup (F_{1-\alpha/2}(n_A - 1, n_B - 1), \infty)\}$.
The upper critical value is given by $F(9, 9, 0.975) = 4.026$ and the lower critical value by
$F(9, 9, 0.025) = 1/F(9, 9, 0.975) = 1/4.026 = 0.2484$ (see Formulae and Tables page 173). Note that
this is a two-sided test, so the critical values are constructed using $1 - \alpha/2$.
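The value of the test statistic is:
\[ F = \frac{s_A^2}{s_B^2} = \frac{4303.4}{3461.7} = 1.243, \]
where
\[
s_A^2 = \frac{n_A}{n_A - 1}\left(\frac{\sum A^2}{n_A} - \left(\frac{\sum A}{n_A}\right)^2\right)
= \frac{10}{9}\left(\frac{494126}{10} - \left(\frac{2134}{10}\right)^2\right) = 4303.4
\]
and
\[
s_B^2 = \frac{n_B}{n_B - 1}\left(\frac{\sum B^2}{n_B} - \left(\frac{\sum B}{n_B}\right)^2\right)
= \frac{10}{9}\left(\frac{541463}{10} - \left(\frac{2259}{10}\right)^2\right) = 3461.7.
\]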

The observed value 1.243 lies between the two critical values, so it does not fall in the rejection region. Hence, we
cannot reject the null hypothesis of equal variances at a 5% significance level. Therefore, it is
reasonable to assume that $\sigma_A^2 = \sigma_B^2$.
Note that even $F(9, 9, 0.9) = 2.440$ (Formulae and Tables page 171) exceeds the observed value, so we cannot reject the
null hypothesis of equal variance even at a significance level of 20%.
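The variance-ratio test in part (b) reduces to computing two sample variances from the sums and comparing their ratio with the table critical values. The following is a minimal sketch; the function names are illustrative assumptions, and the critical value 4.026 is the table value quoted in the solution, not something computed here.

```python
def sample_variance(total: float, total_sq: float, n: int) -> float:
    """Unbiased sample variance from the sum and the sum of squares."""
    return (total_sq - total**2 / n) / (n - 1)


def variance_ratio_test(var1: float, var2: float, lower_crit: float, upper_crit: float):
    """Two-sided F test: reject equal variances if the ratio leaves [lower, upper]."""
    f = var1 / var2
    return f, (f < lower_crit or f > upper_crit)


# Company A: sum 2134, sum of squares 494126; Company B: sum 2259, sum of squares 541463.
s2_a = sample_variance(2134, 494126, 10)
s2_b = sample_variance(2259, 541463, 10)

# F(9, 9, 0.975) = 4.026 from the tables; the lower critical value is its reciprocal.
f_stat, reject = variance_ratio_test(s2_a, s2_b, 1 / 4.026, 4.026)
print(round(s2_a, 1), round(s2_b, 1), round(f_stat, 3), reject)
```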
(c) We want to test the hypothesis:
\[ H_0 : \mu_B = \mu_A \quad \text{v.s.} \quad H_1 : \mu_B > \mu_A, \]
or (note, this will result in the same test statistic and the same critical value)
\[ H_0 : \mu_B \le \mu_A \quad \text{v.s.} \quad H_1 : \mu_B > \mu_A. \]
The corresponding test statistic (note that the sample size is small, hence the Student-t distribution
cannot be approximated by the standard normal distribution) is:
\[
T = \frac{(\bar{X}_B - \bar{X}_A) - (\mu_B - \mu_A)}{S_p \sqrt{\frac{1}{n_B} + \frac{1}{n_A}}}
\overset{*}{=} \frac{\bar{X}_B - \bar{X}_A}{S_p \sqrt{\frac{1}{n_B} + \frac{1}{n_A}}} \sim t_{n_B+n_A-2}
\]
* using the null hypothesis (see footnote 2) $\mu_B - \mu_A = 0$; $t_{n_B+n_A-2}$ is a Student-t distribution with
$n_B + n_A - 2 = 18$ degrees of freedom. We reject for large values of the statistic, i.e., the rejection region is
$C = \{(x_1,\dots,x_{n_A+n_B}) \mid T \in (t_{1-\alpha}(n_B + n_A - 2), \infty)\}$.
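From the data and part b) we can calculate:
\[
\bar{x}_A = \frac{\sum_{i=1}^{n_A} A_i}{n_A} = 213.4, \qquad
\bar{x}_B = \frac{\sum_{i=1}^{n_B} B_i}{n_B} = 225.9
\]
\[
s_p^2 = \frac{s_A^2 (n_A - 1) + s_B^2 (n_B - 1)}{n_A + n_B - 2}
= \frac{9 \cdot 4303.4 + 9 \cdot 3461.7}{18} = 3882.5.
\]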
Footnote 2: In case of a composite null hypothesis, you should select the
$\mu_B \in (-\infty, \mu_A]$ which leads to the highest Type I error
($\Pr(\text{Reject } H_0 \mid H_0 \text{ is true})$), which is the highest if $\mu_A = \mu_B$.

Thus the value of our test statistic is:
\[
T = \frac{225.9 - 213.4}{\sqrt{3882.5 \left(\frac{1}{10} + \frac{1}{10}\right)}} = 0.4486.
\]

Note that the significance level is not given in this exercise. Thus we have to find the p-value of the
test. From Formulae and Tables page 163 we observe that the $1-\alpha$ quantile of the Student-t distribution
with 18 degrees of freedom takes the value 0.5338 for $\alpha = 0.3$ and 0.2571 for $\alpha = 0.40$. Therefore,
the p-value is between 0.3 and 0.4 (note: one-sided test). Since this exceeds the usual significance levels of
0.1, 0.05 or 0.01, we would not reject the null hypothesis at any of those levels,
i.e., there is no statistically significant evidence that company B charges a larger premium than company A.
(d) Let $p_A$, $p_B$ be the (population) proportions of premiums that are higher than
200 for Company A and B, respectively. In order to construct the confidence interval for the
difference in proportions we first need the test statistic:
\[
Z = \frac{(\hat{p}_A - \hat{p}_B) - (p_A - p_B)}{\sqrt{\frac{\hat{p}_A(1-\hat{p}_A)}{n_A} + \frac{\hat{p}_B(1-\hat{p}_B)}{n_B}}} \sim N(0, 1),
\]
note that, under the null hypothesis, $n_A p_A = 5$ and $n_B p_B = 5$, which is the minimum requirement
as a rule of thumb for a reasonably good approximation of a Binomial random variable by a normal
random variable, which is used in the test. The corresponding (two-sided) confidence interval is
given by:
\[
(\hat{p}_A - \hat{p}_B) - z_{1-\alpha/2} \sqrt{\frac{\hat{p}_A(1-\hat{p}_A)}{n_A} + \frac{\hat{p}_B(1-\hat{p}_B)}{n_B}}
< (p_A - p_B) <
(\hat{p}_A - \hat{p}_B) + z_{1-\alpha/2} \sqrt{\frac{\hat{p}_A(1-\hat{p}_A)}{n_A} + \frac{\hat{p}_B(1-\hat{p}_B)}{n_B}}
\]
From the data we have $\hat{p}_A = 5/10 = 0.5$ and $\hat{p}_B = 6/10 = 0.6$ and from Formulae and Tables
page 162 $z_{0.975} = 1.96$.
\[
-0.1 - 1.96 \sqrt{0.25/10 + 0.24/10} < (p_A - p_B) < -0.1 + 1.96 \sqrt{0.25/10 + 0.24/10}
\]
\[ -0.53 < (p_A - p_B) < 0.33, \]
thus the 95% confidence interval for the difference in the proportion of premiums charged higher than
200 is given by $(-0.53, 0.33)$.
This confidence interval contains the value zero, hence when testing the null hypothesis of equal
proportions of premiums charged higher than 200 versus a difference in proportions (two-sided
test), we cannot reject the null hypothesis at a 5% significance level.
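The interval in part (d) can be reproduced from the counts 5 out of 10 and 6 out of 10. This is a minimal sketch of the unpooled (Wald) interval used above; the function name is an illustrative assumption, and the normal quantile is computed with `statistics.NormalDist` instead of being read from the tables.

```python
from math import sqrt
from statistics import NormalDist


def proportion_difference_ci(x_a: int, n_a: int, x_b: int, n_b: int, level: float = 0.95):
    """Normal-approximation confidence interval for p_A - p_B with unpooled standard error."""
    p_a, p_b = x_a / n_a, x_b / n_b
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    se = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    diff = p_a - p_b
    return diff - z * se, diff + z * se


low, high = proportion_difference_ci(5, 10, 6, 10)
contains_zero = low <= 0 <= high
print(round(low, 2), round(high, 2), contains_zero)
```

With only ten observations per company the normal approximation is at the edge of the rule of thumb mentioned above, so the interval should be read as a rough guide.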
(e) Now, we have the following hypothesis:
\[ H_0 : \mu_A = 170 \quad \text{v.s.} \quad H_1 : \mu_A > 170 \]
The test statistic is (recall the small sample size $n_A = 10$, so we cannot approximate the Student-t
distribution by a standard normal one):
\[
T = \frac{\bar{X}_A - \mu_A}{s_A/\sqrt{n_A}} \sim t_{n_A-1}
\quad \overset{*}{\Longrightarrow} \quad
T = \frac{\bar{X}_A - 170}{s_A/\sqrt{n_A}} \sim t_{n_A-1}
\]
* assuming that the null hypothesis is true. Reject for large values of the statistic, i.e., the
rejection region is $C = \{(x_1,\dots,x_{n_A}) \mid T \in (t_{1-\alpha}(n_A - 1), \infty)\}$.
The value of the test statistic, given the calculation in b) (i.e., $s_A^2 = 4303.4$) and c) (i.e., $\bar{x}_A =
213.4$), is given by:
\[ T = \frac{213.4 - 170}{\sqrt{4303.4/10}} = 2.092 \]
From Formulae and Tables page 163 we observe $t_{9,0.95} = 1.833$ and $t_{9,0.975} = 2.262$. Therefore,
the p-value lies between 2.5% and 5%, i.e., testing at a 5% significance level would reject the null
hypothesis of no increase in the premium, whereas testing at a 2.5% significance level would not
lead to a rejection of the null hypothesis of no increase in the premium.
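The one-sample test in part (e) can be checked with a few lines. This is a minimal sketch; the function name is an illustrative assumption, and the critical values 1.833 and 2.262 are the table quantiles quoted in the solution for 9 degrees of freedom.

```python
from math import sqrt


def one_sample_t(sample_mean: float, sample_var: float, n: int, mu0: float) -> float:
    """t statistic for H0: mu = mu0, using the sample variance."""
    return (sample_mean - mu0) / sqrt(sample_var / n)


t_stat = one_sample_t(sample_mean=213.4, sample_var=4303.4, n=10, mu0=170)

# Upper-tailed test: compare with the table quantiles t_{9,0.95} and t_{9,0.975}.
reject_at_5pct = t_stat > 1.833
reject_at_2_5pct = t_stat > 2.262
print(round(t_stat, 3), reject_at_5pct, reject_at_2_5pct)
```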
-End of Week 7 Tutorial Solutions-
