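\[ f(t) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\pi\nu}\;\Gamma\left(\frac{\nu}{2}\right)} \left(1 + \frac{t^2}{\nu}\right)^{-\frac{\nu+1}{2}}, \qquad -\infty < t < \infty \]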
The F distribution is positively skewed. The symbol 𝑓𝛼,𝜈1 ,𝜈2 is used to denote the point to the
right of which the area under the F curve with 𝜈1 and 𝜈2 degrees of freedom is 𝛼.
\[ P\left[F > f_{\alpha,\nu_1,\nu_2}\right] = \alpha \]
---------------
Tests of significance concerning mean, standard deviation and proportion are dealt with
in the study below. Simultaneously, interval estimation of the population mean, standard
deviation and proportion are derived. The following terms are used in interval estimation and the
tests of significance.
Sampling Distribution and Standard Error
The statistical measures of the sample are termed as ‘statistics’; the statistical measures of
the population are called ‘parameters’.
The term ‘Sampling Distribution’ is used to refer to the distribution of a sample statistic.
For example, ‘the sampling distribution of mean’ is the distribution of the mean of independent
random samples of size ‘n’ drawn from a given population.
‘Standard Error’ is the standard deviation of the sampling distribution. The standard
error, denoted S.E. of a sample statistic is used in tests of significance of difference and while
giving an interval estimate for the population parameter.
INTERVAL ESTIMATION
Point estimators give a single value as an estimate for the parameter. There is no
indication about the probability of this estimate being acceptable. In other words, there is no idea
of the size of the error in such an estimate. Interval estimation is based on a certain stipulated
‘confidence level’, and as such, indicates the size of the error in the value provided by
estimation. This confidence level is usually either 95% or 99%. We denote this level as 1 – α. If
(𝑎, 𝑏) is the 100(1 – α)% confidence interval for the parameter θ, it means that
$P(a < \theta < b) = 1 - \alpha$.
TESTS OF SIGNIFICANCE
We test whether the observed difference between the sample statistic and the population
parameter is significant, using a test statistic at a stipulated level of significance, denoted as α.
Suppose we want to test the null hypothesis
𝐻0 ∶ 𝜃 = 𝜃0
The alternative can be any one of the following and consequently the nature of the test will be as
below:
The following is the general procedure for testing the significance of an observed difference:
(i) Set up null hypothesis 𝐻0 of no difference and a suitable alternative hypothesis 𝐻1
(ii) Decide whether one-tailed test or two-tailed test is to be used
(iii) Decide on the level of significance α
(iv) Select the appropriate test statistic (this involves the S.E. of the sample statistic under question)
(v) Demarcate critical region in the distribution of the test statistic
(vi) Check if the calculated value of the test statistic falls within or outside the critical region
(vii) Accordingly reject / accept 𝐻0 at the level of significance α
Instead of demarcating the critical region and checking whether the calculated value of the
test statistic falls within or outside it, we may also use the P-value. The P-value is
the lowest level of significance at which the null hypothesis could have been rejected. More simply,
the P-value is the tail area of the distribution of the test statistic beyond its calculated value.
If the P-value is less than the stipulated level of significance α, the null hypothesis is rejected
at the level of significance α.
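The P-value rule can be sketched in Python with the standard library alone. This is an illustrative sketch for a test statistic that follows N(0, 1) under the null hypothesis; the function names `z_p_value` and `reject_h0` are hypothetical and do not come from the text.

```python
from statistics import NormalDist


def z_p_value(z, tail="two"):
    """P-value of an observed z statistic, assuming z ~ N(0, 1) under H0."""
    cdf = NormalDist().cdf
    if tail == "left":      # H1: theta < theta0
        return cdf(z)
    if tail == "right":     # H1: theta > theta0
        return 1.0 - cdf(z)
    # Two-tailed alternative: area in both tails beyond |z|
    return 2.0 * (1.0 - cdf(abs(z)))


def reject_h0(z, alpha, tail="two"):
    """Reject H0 when the P-value is below the level of significance."""
    return z_p_value(z, tail) < alpha


print(round(z_p_value(-2.0, "left"), 4))
print(reject_h0(-2.0, 0.05, "left"))
```

The decision from `reject_h0` agrees with the critical-region approach, because the P-value falls below α exactly when the statistic lies in the critical region.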
INTERVAL ESTIMATION AND TESTS CONCERNING MEAN
I Mean of a single sample
Let a random sample of size n be taken from an infinite population. $\mu$ and $\sigma^2$ denote the
mean and variance of the population. Let $\bar{X}$ be the mean of the sample. The distribution of the
sample mean $\bar{X}$, which is called the ‘sampling distribution of the sample mean’, has mean $\mu$ and
variance $\frac{\sigma^2}{n}$.
$s^2$ is the sample variance given by
\[ s^2 = \frac{\sum (x_i - \bar{x})^2}{n-1} \]
The $100(1-\alpha)\%$ confidence interval for $\mu$ is $\bar{X} \pm z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$ or $\bar{X} \pm z_{\alpha/2}\,\frac{s}{\sqrt{n}}$
\[ E(\bar{X}_1 - \bar{X}_2) = \mu_1 - \mu_2, \qquad \mathrm{Variance}(\bar{X}_1 - \bar{X}_2) = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2} \]
The null hypothesis 𝐻0 : μ1 − μ2 = δ against the alternative which could be any one of
μ1 − μ2 ≠ δ μ1 − μ2 > δ μ1 − μ2 < δ.
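Case 1: Populations known to be normal with known variances $\sigma_1^2$, $\sigma_2^2$. No condition on $n_1$ and $n_2$.
\[ S.E._{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \]
\[ \text{Test statistic} = z = \frac{\bar{X}_1 - \bar{X}_2 - \delta}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} \sim N(0, 1) \quad [\text{under } H_0] \]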
The $100(1-\alpha)\%$ confidence interval for $\mu_1 - \mu_2$ is
\[ \bar{X}_1 - \bar{X}_2 \pm z_{\alpha/2} \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \]
Case 3: Populations normal. $\sigma_1^2$, $\sigma_2^2$ are not known. Either or both of $n_1$ and $n_2$ < 30
Let the samples be $\{X_{1i} : i = 1, 2, \ldots, n_1\}$ and $\{X_{2j} : j = 1, 2, \ldots, n_2\}$
Assume 𝜎1 = 𝜎2 = 𝜎 ; 𝜎 is estimated by pooling the squared deviations from the means of
the two samples. The pooled estimate of 𝜎 is denoted as 𝑆𝑃 .
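\[ S_P^2 = \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2} = \frac{\sum (x_{1i}-\bar{x}_1)^2 + \sum (x_{2j}-\bar{x}_2)^2}{n_1+n_2-2} \]
\[ S.E._{\bar{X}_1 - \bar{X}_2} = S_P \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} \]
\[ \text{Test statistic} = t = \frac{(\bar{X}_1 - \bar{X}_2) - \delta}{S_P \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \]
~ t distribution with $n_1 + n_2 - 2$ degrees of freedom. [under $H_0$]
The $100(1-\alpha)\%$ confidence interval for $\mu_1 - \mu_2$ is
\[ (\bar{X}_1 - \bar{X}_2) \pm t_{\alpha/2,\; n_1+n_2-2} \left[ S_P \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} \right] \]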
1) Test at the 0.05 level of significance whether the mean of a random sample of size n = 16 is
significantly less than 10, if the distribution from which the sample was taken is normal; 𝑥̅ = 8.4, σ =
3.2.
Solution: 𝐻0 : 𝜇 = 10 𝐻1 : 𝜇 < 10 (left-tailed) 𝛼 = 0.05
Population is normal; 𝜎 is known; standard normal variable z is used as the test statistic.
\[ z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} = \frac{8.4 - 10}{3.2/\sqrt{16}} = \frac{(-1.6) \times 4}{3.2} = -2 \]
Note: The claimed value of 140 mm of Hg according to 𝐻0 is within the 95% confidence limits for μ
and it can be inferred that the claim can be accepted.
The 98% confidence limits for μ = $\bar{X} \pm z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$
\[ = 141.8 \pm (2.327)\left(\frac{10.5}{\sqrt{120}}\right) = 141.8 \pm 2.23 = (139.6,\ 144.0) \text{ mm of Hg} \]
3) The security department of a factory wants to know whether the true average time required by the
night guard to walk his round is 30 minutes. If in a random sample of 32 rounds, the night guard
averaged 30.8 minutes with a standard deviation of 2.1 minutes, determine whether this is sufficient
evidence to reject the null hypothesis μ = 30 minutes in favour of the alternative hypothesis μ ≠ 30
minutes at (a) 0.01 (b) 0.05 levels of significance.
Solution: 𝐻0 : 𝜇 = 30 𝐻1 : 𝜇 ≠ 30 (two-tailed) 𝛼 = 0.01, 0.05
Size of the sample is more than 30; standard normal variable z is used as the test statistic. σ is not
known; use the sample standard deviation s in place of σ.
\[ |z| = \frac{|\bar{X} - \mu_0|}{s/\sqrt{n}} = \frac{30.8 - 30}{2.1/\sqrt{32}} = \frac{(0.8) \times \sqrt{32}}{2.1} = 2.155 \]
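The comparison with the critical values at both levels of significance can be sketched as follows. This is a minimal illustration using the figures of Problem 3; the variable names are hypothetical, and the critical values are obtained from the standard library's `NormalDist.inv_cdf` rather than from a table.

```python
import math
from statistics import NormalDist

# Data of Problem 3 (large sample, s used in place of sigma)
n, xbar, s, mu0 = 32, 30.8, 2.1, 30.0

z = (xbar - mu0) / (s / math.sqrt(n))

decisions = {}
for alpha in (0.01, 0.05):
    # Two-tailed test: compare |z| with z_{alpha/2}
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    decisions[alpha] = abs(z) > z_crit
    verdict = "reject H0" if decisions[alpha] else "cannot reject H0"
    print(f"alpha = {alpha}: |z| = {abs(z):.3f}, z_crit = {z_crit:.3f} -> {verdict}")
```

With these figures the statistic exceeds the critical value at the 0.05 level but not at the 0.01 level.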
4) Five measurements of the tar content of a certain kind of cigarette yielded 14.5, 14.2, 14.4, 14.3,
14.6 mg/cigarette. Assume that the data are a random sample from a normal population.
(a) Find if there is reason enough to reject the null hypothesis μ = 14.0 in favour of the alternative μ
≠ 14.0.
(b) What would be the inference if the null hypothesis is μ = 14.3 and the alternative is μ ≠14.3?
(c) What would be the inference if the null hypothesis is μ = 14.2 and the alternative is μ > 14.2?
Use 0.05 level of significance.
Solution: Population normal; σ unknown; n < 30
Sample mean = 𝑥̅ = 14.4
\[ \sum (x_i - \bar{x})^2 = 0.10, \qquad s^2 = \frac{\sum (x_i - \bar{x})^2}{n-1} = \frac{0.10}{4} = 0.025 \]
5) The length of the skulls of 10 fossil skeletons of an extinct species of bird has a mean of 5.68 cm
and a standard deviation of 0.29 cm. Assume that such measurements are normally distributed and
find a 95% confidence interval for the mean length of the skulls of this species of bird.
Solution: Population normal; σ unknown; n < 30
Sample mean = 𝑥̅ = 5.68 cm Sample S.D. = s = 0.29
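95% confidence limits for μ, the mean length of the skulls $= \bar{x} \pm t_{\alpha/2,\; n-1} \left(\frac{s}{\sqrt{n}}\right)$
\[ = \bar{x} \pm t_{0.025,\,9} \left(\frac{s}{\sqrt{n}}\right) = 5.68 \pm (2.262)\,\frac{0.29}{\sqrt{10}} \]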
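The limits can be evaluated numerically as in the sketch below. It assumes the table value $t_{0.025,9} = 2.262$ quoted in the solution; the variable names are illustrative only.

```python
import math

# Data of Problem 5
n, xbar, s = 10, 5.68, 0.29
t_crit = 2.262  # t_{0.025, 9}, table value quoted in the solution

half_width = t_crit * s / math.sqrt(n)
lower, upper = xbar - half_width, xbar + half_width

print(f"95% confidence interval: ({lower:.3f}, {upper:.3f}) cm")
```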
6) In a study of television viewing habits, it is desired to estimate the average number of hours that
teenagers spend watching per week. If it is reasonable to assume that σ = 3.2 hours, how large a
sample is needed so that it will be possible to assert with 95% confidence that the sample mean is off
the true mean by less than 20 minutes?
Solution: σ = 3.2 hours α = 0.05
It is required to find the least value of the sample size n.
The difference between the sample mean 𝑥̅ and the true mean μ should not exceed 20 minutes.
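\[ |\bar{x} - \mu| < \tfrac{1}{3} \text{ hour} \]
Let us work with the assumption that n will turn out to be ≥ 30.
The maximum error (maximum difference between $\bar{x}$ and μ)
\[ = z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} = z_{0.025}\left(\frac{3.2}{\sqrt{n}}\right) = \frac{(1.96)(3.2)}{\sqrt{n}} \]
This term should be less than $\tfrac{1}{3}$ hour:
\[ \frac{(1.96)(3.2)}{\sqrt{n}} < \frac{1}{3} \]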
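Solving the inequality for n can be sketched as follows. The snippet is an illustration under the stated assumption that σ = 3.2 hours; it takes $z_{0.025}$ from `NormalDist.inv_cdf`, and the variable names are hypothetical.

```python
import math
from statistics import NormalDist

sigma = 3.2          # hours, assumed known
max_error = 1 / 3    # 20 minutes expressed in hours
alpha = 0.05

z = NormalDist().inv_cdf(1 - alpha / 2)   # approximately 1.96

# z * sigma / sqrt(n) < max_error  <=>  n > (z * sigma / max_error)^2
bound = (z * sigma / max_error) ** 2
n_min = math.floor(bound) + 1             # smallest integer strictly above the bound

print(f"n must exceed {bound:.2f}; least sample size = {n_min}")
```

The result is well above 30, which is consistent with the large-sample assumption made in the solution.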
7) An experiment is performed to determine whether the average nicotine content of one kind of
cigarette exceeds that of another kind by 0.20 mg. A random sample of size 50 cigarettes of the first
kind had an average nicotine content of 2.61 mg with a standard deviation of 0.12 mg. Another
random sample of size 40 cigarettes of the second kind had an average nicotine content of 2.38 mg
with a standard deviation of 0.14 mg. Test the null hypothesis μ1 – μ2 = 0.20 against the alternative
hypothesis μ1 – μ2 ≠ 0.20 at the 0.05 level of significance. Also check the decision on the P-value
corresponding to the value of the appropriate test statistic.
Solution:
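Populations not known; $\sigma_1^2$, $\sigma_2^2$ unknown; $n_1$, $n_2$ both > 30
H0: μ1 – μ2 = 0.20    H1: μ1 – μ2 ≠ 0.20 (two-tailed)    α = 0.05
\[ \text{Test statistic} = z = \frac{\bar{x}_1 - \bar{x}_2 - \delta}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} \sim N(0, 1) \]
\[ |z| = \frac{2.61 - 2.38 - 0.20}{\sqrt{\frac{(0.12)^2}{50} + \frac{(0.14)^2}{40}}} = 1.08 \]
\[ \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = \sqrt{\frac{(0.12)^2}{50} + \frac{(0.14)^2}{40}} = 0.02789 \]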
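The statistic and its two-tailed P-value for Problem 7 can be computed as in the following sketch. It uses only the standard library; the variable names are illustrative.

```python
import math
from statistics import NormalDist

# Data of Problem 7
n1, xbar1, s1 = 50, 2.61, 0.12
n2, xbar2, s2 = 40, 2.38, 0.14
delta, alpha = 0.20, 0.05

se = math.sqrt(s1**2 / n1 + s2**2 / n2)   # standard error of xbar1 - xbar2
z = (xbar1 - xbar2 - delta) / se

# Two-tailed P-value: area in both tails beyond |z|
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

print(f"S.E. = {se:.5f}, z = {z:.3f}, P-value = {p_value:.3f}")
print("reject H0" if p_value < alpha else "cannot reject H0")
```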
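\[ \text{Test statistic} = z = \frac{\bar{x}_1 - \bar{x}_2 - \delta}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} \sim N(0, 1) \]
\[ z = \frac{(53.8 - 54.5) - (-0.5)}{\sqrt{\frac{(2.4)^2}{400} + \frac{(2.5)^2}{500}}} = -1.22 \]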
10) With reference to Question 9, base your decision on the P-value for the test statistic calculated.
Solution: z = −1.22; P-value corresponding to −1.22 is 0.1112.
Inference: P-value > 0.05. Accept the null hypothesis at a level of significance of 0.05.
11) To compare two kinds of bumper guards, six of each kind were mounted on a certain make of
compact car. Then each car was run into a concrete wall at 5 miles per hour, and the following are
the costs of the repairs (in dollars):
Bumper guard 1: 127 168 143 165 122 139
Bumper guard 2: 154 135 132 171 153 149
Test at a level of significance of 0.01 whether the difference between the means of these two
samples is significant.
Solution: Populations not known; $\sigma_1^2$, $\sigma_2^2$ unknown; Samples independent; $n_1$, $n_2$ both < 30
Assume that the populations are normal and have equal variance $\sigma^2$. The pooled estimate of $\sigma^2$ is
$S_P^2$.
H0: μ1 = μ2    H1: μ1 ≠ μ2 (two-tailed)    α = 0.01
\[ \text{Test statistic} = T = \frac{\bar{X}_1 - \bar{X}_2}{S_P \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \]
~ t distribution with $n_1 + n_2 - 2$ degrees of freedom
\[ S_P^2 = \frac{\sum (x_{1i}-\bar{x}_1)^2 + \sum (x_{2j}-\bar{x}_2)^2}{n_1+n_2-2} \]
Calculations:
Sample 1:
x1i                  127   168   143   165   122   139      $\bar{x}_1 = \frac{864}{6} = 144$
x1i − $\bar{x}_1$          −17    24    −1    21   −22    −5
(x1i − $\bar{x}_1$)²       289   576     1   441   484    25      $\sum (x_{1i} - \bar{x}_1)^2 = 1816$
Sample 2:
x2j                  154   135   132   171   153   149      $\bar{x}_2 = \frac{894}{6} = 149$
x2j − $\bar{x}_2$            5   −14   −17    22     4     0
(x2j − $\bar{x}_2$)²        25   196   289   484    16     0      $\sum (x_{2j} - \bar{x}_2)^2 = 1010$
\[ S_P^2 = \frac{1816 + 1010}{10} = 282.6 \]
\[ |t| = \frac{|\bar{x}_1 - \bar{x}_2|}{S_P \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} = \frac{5}{\sqrt{282.6}\,\sqrt{\frac{1}{6} + \frac{1}{6}}} = 0.515 \]
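The pooled-variance calculation can be checked directly from the raw repair costs, as in the sketch below. The function name `pooled_t` is hypothetical, and the critical value $t_{0.005,10} = 3.169$ is a standard table value that is not quoted in the text.

```python
import math
from statistics import mean

guard1 = [127, 168, 143, 165, 122, 139]
guard2 = [154, 135, 132, 171, 153, 149]


def pooled_t(x1, x2, delta=0.0):
    """Two-sample t statistic assuming normal populations with equal variance."""
    n1, n2 = len(x1), len(x2)
    m1, m2 = mean(x1), mean(x2)
    ss1 = sum((x - m1) ** 2 for x in x1)   # squared deviations, sample 1
    ss2 = sum((x - m2) ** 2 for x in x2)   # squared deviations, sample 2
    df = n1 + n2 - 2
    sp2 = (ss1 + ss2) / df                 # pooled estimate of the variance
    t = (m1 - m2 - delta) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, sp2, df


t, sp2, df = pooled_t(guard1, guard2)
T_CRIT = 3.169  # t_{0.005, 10}; standard table value, assumed here

print(f"S_P^2 = {sp2:.1f}, |t| = {abs(t):.3f}, d.f. = {df}")
print("reject H0" if abs(t) > T_CRIT else "cannot reject H0")
```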
12) The following are the average weekly losses of work-hours due to accidents in 10 industrial
plants before and after a certain safety program was put into operation:
45 and 36 73 and 60 46 and 44 124 and 119 33 and 35
57 and 51 83 and 77 34 and 29 26 and 24 17 and 11
Test whether the safety program is effective at a level of significance of 0.05.
Solution: Populations not known; $\sigma_1^2$, $\sigma_2^2$ unknown; Samples NOT independent; $n_1$, $n_2$ both < 30
Since the samples are not independent, the difference between the population means cannot
be tested using the usual procedure. The random variable under question is: difference between each
pair of observations.
X denotes the difference between each pair of observations.
X = (average weekly losses of work-hours due to accidents before the safety program)
– (average weekly losses of work-hours due to accidents after the safety program)
Let μ denote the mean of the population of X. The null hypothesis is that there is no significant
improvement after the safety program. The alternative hypothesis is that there is a significant decrease
in the average weekly losses of work-hours due to accidents after the safety program. It is now a test
of significance of the difference between the mean of a sample and the mean of the population.
𝐻0 : 𝜇 = 0 𝐻1 : 𝜇 > 0 (right-tailed) 𝛼 = 0.05
Population of X is assumed as normal. Sample size n = 10 (< 30).
\[ \text{Test statistic} = t = \frac{\bar{X} - \mu_0}{s/\sqrt{n}} \]
Calculations:
X      9    13    2    5   −2    6    6    5    2    6      $\sum x = 52$
X²    81   169    4   25    4   36   36   25    4   36      $\sum x^2 = 420$
$\bar{X} = 5.2$
\[ s^2 = \frac{10}{9}\left[\frac{420}{10} - \left(\frac{52}{10}\right)^2\right] = 16.62 \]
\[ t = \frac{5.2 - 0}{\sqrt{16.62/10}} = 4.033 \]
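The paired-difference test can be reproduced from the before/after figures as sketched below. The variable names are illustrative, and the critical value $t_{0.05,9} = 1.833$ is a standard table value that does not appear in the text.

```python
import math
from statistics import mean, variance

before = [45, 73, 46, 124, 33, 57, 83, 34, 26, 17]
after = [36, 60, 44, 119, 35, 51, 77, 29, 24, 11]

# X = before - after for each plant; H0: mu = 0 against H1: mu > 0
diffs = [b - a for b, a in zip(before, after)]
n = len(diffs)

xbar = mean(diffs)
s2 = variance(diffs)              # sample variance with divisor n - 1
t = (xbar - 0) / math.sqrt(s2 / n)

T_CRIT = 1.833  # t_{0.05, 9}; standard table value, assumed here

print(f"mean = {xbar}, s^2 = {s2:.2f}, t = {t:.3f}")
print("reject H0" if t > T_CRIT else "cannot reject H0")
```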
13) A study of two kinds of photocopying equipment shows that 61 failures of the first kind of
equipment took on the average 80.7 minutes to repair with a standard deviation of 19.4 minutes,
while 61 failures of the second kind of equipment took on the average 88.1 minutes to repair with a
standard deviation of 18.8 minutes. Find a 99% confidence interval for the difference between the
true average amounts of time it takes to repair failures of the two kinds of photocopying equipment.
Solution: n1 = 61 n2 = 61
𝑋̅1 = 80.7 𝑋̅2 = 88.1
𝑠1 = 19.4 𝑠2 = 18.8
Populations not known; σ1 and σ2 not known; n1 and n2 > 30
Use 𝑠1 and 𝑠2 in place of σ1 and σ2.
99% confidence interval for (μ1 – μ2) $= (\bar{x}_1 - \bar{x}_2) \pm z_{0.005} \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$
\[ = (80.7 - 88.1) \pm (2.575)\sqrt{\frac{(19.4)^2}{61} + \frac{(18.8)^2}{61}} \]
14) Twelve randomly selected mature citrus trees of one variety have a mean height of 13.8 feet with
a standard deviation of 1.2 feet and fifteen randomly selected mature citrus trees of another variety
have a mean height of 12.9 feet with a standard deviation of 1.5 feet. Assuming that the random
samples were selected from normal populations with equal variances, construct a 95% confidence
interval for the difference between the true average heights of the two kinds of citrus trees.
Solution: n1 =12 n2 = 15
𝑋̅1 = 13.8 𝑋̅2 = 12.9
𝑠1 = 1.2 𝑠2 = 1.5
Populations normal; σ1 =σ2 (value not known); n1 and n2 < 30
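95% confidence interval for (μ1 – μ2) $= (\bar{X}_1 - \bar{X}_2) \pm t_{\alpha/2,\; n_1+n_2-2} \left[ S_P \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} \right]$
\[ = (13.8 - 12.9) \pm t_{0.025,\,25} \left[ (1.376) \sqrt{\frac{1}{12} + \frac{1}{15}} \right] \]
\[ = (0.9) \pm (2.06) \left[ (1.376) \sqrt{\frac{1}{12} + \frac{1}{15}} \right] \]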
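\[ S.E._p = \sqrt{\frac{P_0(1-P_0)}{n}}, \qquad \text{Test statistic} = z = \frac{p - P_0}{\sqrt{\frac{P_0(1-P_0)}{n}}} \sim N(0, 1) \quad [\text{under } H_0] \]
The $100(1-\alpha)\%$ confidence interval for P is $p \pm z_{\alpha/2} \sqrt{\frac{p(1-p)}{n}}$
The $100(1-\alpha)\%$ confidence interval for $P_1 - P_2$ is $(p_1 - p_2) \pm z_{\alpha/2} \sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}$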
15) The manufacturer of a spot remover claims that his product removes 90 percent of all spots. If, in
a random sample, only 174 of 200 spots were removed with the manufacturer’s product, test the null
hypothesis P = 0.90 against the alternative hypothesis P < 0.90 at the 5% level of significance.
Solution: 𝐻0 : 𝑃 = 0.90 𝐻1 : 𝑃 < 0.90 𝛼 = 0.05
\[ p = \frac{174}{200} = 0.87, \qquad n = 200 \]
\[ \text{Test statistic} = z = \frac{p - P_0}{\sqrt{\frac{P_0(1-P_0)}{n}}} = \frac{0.87 - 0.90}{\sqrt{\frac{(0.9)(0.1)}{200}}} = -1.41 \]
16) In random samples, 74 out of 250 persons who watched a certain television program on a small
TV set and 92 out of 250 persons who watched the same program on a large TV set remembered 2
hours later what products were advertised. Test the null hypothesis that there is no difference
between the two populations at a level of significance of 0.01.
Solution: 𝐻0 : 𝑃1 = 𝑃2 𝐻1 : 𝑃1 ≠ 𝑃2 (two-tailed) 𝛼 = 0.01
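\[ n_1 = 250, \quad n_2 = 250, \quad p_1 = \frac{74}{250} = 0.296, \quad p_2 = \frac{92}{250} = 0.368 \]
\[ \text{Test statistic} = z = \frac{p_1 - p_2}{\sqrt{\hat{P}(1-\hat{P})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} \]
\[ \hat{P} = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2} = \frac{74 + 92}{250 + 250} = 0.332 \]
\[ |z| = \frac{|0.296 - 0.368|}{\sqrt{(0.332)(0.668)\left(\frac{1}{250} + \frac{1}{250}\right)}} = 1.71 \]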
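The two-proportion test can be sketched in Python as follows. The snippet is an illustration with hypothetical variable names; the two-tailed critical value is taken from `NormalDist.inv_cdf` rather than from a table.

```python
import math
from statistics import NormalDist

# Data of Problem 16: successes and sample sizes
x1, n1 = 74, 250
x2, n2 = 92, 250
alpha = 0.01

p1, p2 = x1 / n1, x2 / n2
p_pool = (x1 + x2) / (n1 + n2)   # pooled estimate of the common proportion under H0

se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
z = (p1 - p2) / se

z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # two-tailed critical value
reject = abs(z) > z_crit

print(f"pooled p = {p_pool:.3f}, |z| = {abs(z):.2f}, z_crit = {z_crit:.3f}")
print("reject H0" if reject else "cannot reject H0")
```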
17) A private opinion poll is engaged by a politician to estimate what proportion of her constituents
favor a certain proposal. Determine how large a sample the poll will have to take to be at least 95%
confident that the sample proportion is off by less than 0.02.
Solution: The maximum error should be less than 0.02 at α = 0.05
\[ \text{Maximum error} = z_{\alpha/2} \sqrt{\frac{P_0(1-P_0)}{n}} < 0.02, \qquad z_{\alpha/2} = z_{0.025} = 1.96 \]
Maximum value of $P_0(1 - P_0)$ is $\frac{1}{4}$
---------------------------------------------------------------------------------------------------------------
To maximize 𝑥(1 − 𝑥):
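Let $y = x(1 - x)$. Then $\frac{dy}{dx} = (1 - x) - x = 0 \Rightarrow 1 - 2x = 0 \Rightarrow x = \frac{1}{2}$
\[ \frac{d^2y}{dx^2} = -2 < 0. \]
Hence, $x(1 - x)$ is maximum when $x = \frac{1}{2}$
Maximum value of $x(1 - x) = \frac{1}{2}\left(1 - \frac{1}{2}\right) = \frac{1}{4}$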
-----------------------------------------------------------------------------------------------------------------
\[ (1.96)\sqrt{\frac{1/4}{n}} < 0.02 \;\Rightarrow\; \sqrt{n} > \frac{1.96}{0.04} \;\Rightarrow\; \sqrt{n} > 49 \;\Rightarrow\; n > 2401 \]
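The worst-case sample size can be verified with exact rational arithmetic, as in this sketch. It assumes the rounded value $z_{0.025} = 1.96$ used in the solution and the worst case $P_0(1 - P_0) = 1/4$; the variable names are illustrative.

```python
import math
from fractions import Fraction

z = Fraction("1.96")          # z_{0.025}, rounded as in the solution
max_error = Fraction("0.02")
worst_case = Fraction(1, 4)   # maximum of P0 * (1 - P0), attained at P0 = 1/2

# z * sqrt(worst_case / n) < max_error  <=>  n > z^2 * worst_case / max_error^2
bound = z**2 * worst_case / max_error**2
n_min = math.floor(bound) + 1   # smallest integer strictly greater than the bound

print(f"n must exceed {bound}; least sample size = {n_min}")
```

Fractions are used here so that the boundary value is computed exactly and not blurred by floating-point rounding.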
18) A sample survey at a supermarket showed that 204 of 300 shoppers use discount coupons.
Construct a 95% confidence interval for the corresponding true proportion.
Solution: $p = \frac{204}{300} = 0.68$
The 95% confidence interval for P is $p \pm z_{0.025} \sqrt{\frac{p(1-p)}{n}}$
\[ = 0.68 \pm (1.96)\sqrt{\frac{(0.68)(0.32)}{300}} = 0.68 \pm 0.053 = (0.627,\ 0.733) \]
19) Among 500 marriage license applications chosen at random in a given year, there were 48 in
which the woman was at least one year older than the man and among 400 marriage license
applications chosen at random six years later, there were 68 in which the woman was at least one
year older than the man. Construct a 99% confidence interval for the difference between the
corresponding true proportions of marriage license applications in which the woman was at least one
year older than the man.
Solution: $p_1 = \frac{48}{500} = 0.096$, $p_2 = \frac{68}{400} = 0.170$, $\alpha = 0.01$
The $100(1-\alpha)\%$ confidence interval for $P_1 - P_2$ is $(p_1 - p_2) \pm z_{\alpha/2} \sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}$
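\[ H_0 : \sigma^2 = \sigma_0^2 \]
against the alternative which could be any one of
\[ \sigma^2 \ne \sigma_0^2 \qquad \sigma^2 > \sigma_0^2 \qquad \sigma^2 < \sigma_0^2 \]
\[ \text{Test statistic } \chi^2 = \frac{(n-1)s^2}{\sigma_0^2} \sim \chi^2 \text{ distribution with } n - 1 \text{ degrees of freedom} \quad [\text{under } H_0] \]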
I (b) This case is a variation of the previous case. A simpler test can be used when the sample size is
more than 30.
A random sample of size n (≥ 30) is taken from a normal population with the variance
denoted by $\sigma^2$. The variance in the sample is $s^2$. The null hypothesis and the alternative are the same
as in I (a).
\[ \text{Test statistic} = z = \left(\frac{s}{\sigma_0} - 1\right) \sqrt{2(n-1)} \sim N(0, 1) \]
$H_1 : \sigma_2^2 > \sigma_1^2$: reject $H_0$ if $\frac{s_2^2}{s_1^2} \ge F_{\alpha,\; n_2-1,\; n_1-1}$
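The $100(1-\alpha)\%$ confidence interval for $\frac{\sigma_1^2}{\sigma_2^2}$ is
\[ \left[ \frac{s_1^2}{s_2^2} \cdot \frac{1}{F_{\alpha/2,\,\nu_1,\,\nu_2}},\;\; \frac{s_1^2}{s_2^2} \cdot F_{\alpha/2,\,\nu_2,\,\nu_1} \right] \]
where $\nu_1 = n_1 - 1$ and $\nu_2 = n_2 - 1$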
20) In a random sample, the weights of 24 Black Angus steers of a certain age have a standard
deviation of 238 pounds. Assume that the weights constitute a random sample from a normal
population, and test the null hypothesis 𝜎 = 250 pounds against the two-sided alternative 𝜎 ≠ 250
pounds at the level of significance of 0.01.
Solution: $H_0 : \sigma = 250$    $H_1 : \sigma \ne 250$ (two-tailed)    $\alpha = 0.01$
Population normal; n = 24
\[ \text{Test statistic } \chi^2 = \frac{(n-1)s^2}{\sigma_0^2} = \frac{23\,(238)^2}{(250)^2} = 20.84 \]
There is no reason to reject the null hypothesis at the level of significance of 0.01.
21) In a random sample, s = 2.53 minutes for the amount of time that 30 women took to complete the
written test for their driver’s licenses. At the level of significance of 5%, test the null hypothesis that
σ = 2.85 minutes against the alternative that σ < 2.85 minutes.
Solution: $H_0 : \sigma = 2.85$    $H_1 : \sigma < 2.85$ (left-tailed)    $\alpha = 0.05$
Population normal; n = 30
\[ \text{Test statistic } \chi^2 = \frac{(n-1)s^2}{\sigma_0^2} = \frac{29\,(2.53)^2}{(2.85)^2} = 22.85 \]
There is no reason to reject the null hypothesis at the level of significance of 0.05.
Aliter: Since the size of the sample is 30, the sample could be considered large. In this case, the test
is as follows:
\[ \text{Test statistic} = z = \left(\frac{s}{\sigma_0} - 1\right) \sqrt{2(n-1)} = \left(\frac{2.53}{2.85} - 1\right) \sqrt{2(29)} = -0.8551 \]
22) Past data indicate that the standard deviation of measurements made on sheet metal stampings by
experienced inspectors is 0.41 square inch. If a new inspector measures 50 stampings with a standard
deviation of 0.49 square inch, test the null hypothesis that σ = 0.41 square inch against the alternative
that σ > 0.41 square inch at the level of significance of 5%. Check your answer using the P-value of
the statistic.
Solution: $H_0 : \sigma = 0.41$    $H_1 : \sigma > 0.41$ (right-tailed)    $\alpha = 0.05$
Population normal; n = 50
Since the size of the sample is 50, the sample could be considered large.
\[ \text{Test statistic} = z = \left(\frac{s}{\sigma_0} - 1\right) \sqrt{2(n-1)} = \left(\frac{0.49}{0.41} - 1\right) \sqrt{2(49)} = 1.93 \]
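The large-sample statistic and the right-tail P-value requested in Problem 22 can be evaluated as sketched below. Only the standard library is used, and the variable names are illustrative.

```python
import math
from statistics import NormalDist

# Data of Problem 22
n, s, sigma0 = 50, 0.49, 0.41
alpha = 0.05

# Large-sample test for a standard deviation
z = (s / sigma0 - 1) * math.sqrt(2 * (n - 1))

# Right-tailed alternative: P-value is the area to the right of z
p_value = 1 - NormalDist().cdf(z)

print(f"z = {z:.2f}, P-value = {p_value:.4f}")
print("reject H0" if p_value < alpha else "cannot reject H0")
```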
23) To compare two kinds of bumper guards, six of each kind were mounted on a certain make of
compact car. Then each car was run into a concrete wall at 5 miles per hour, and the following are
the costs of the repairs (in dollars):
Bumper guard 1: 127 168 143 165 122 139
Bumper guard 2: 154 135 132 171 153 149
Test at a level of significance of 0.02 whether it is reasonable to assume that the two populations
sampled have equal variances.
Solution: 𝐻0 : 𝜎1 2 = 𝜎2 2 𝐻1 : 𝜎1 2 ≠ 𝜎2 2 (two-tailed) 𝛼 = 0.02
The test statistic is determined after ascertaining which of the sample variances is larger.
Calculations:
Sample 1:
x1i                  127   168   143   165   122   139      $\bar{x}_1 = \frac{864}{6} = 144$
x1i − $\bar{x}_1$          −17    24    −1    21   −22    −5
(x1i − $\bar{x}_1$)²       289   576     1   441   484    25      $\sum (x_{1i} - \bar{x}_1)^2 = 1816$
\[ \text{Sample variance} = s_1^2 = \frac{\sum (x_{1i} - \bar{x}_1)^2}{n_1 - 1} = \frac{1816}{5} = 363.2 \]
Sample 2:
x2j                  154   135   132   171   153   149      $\bar{x}_2 = \frac{894}{6} = 149$
x2j − $\bar{x}_2$            5   −14   −17    22     4     0
(x2j − $\bar{x}_2$)²        25   196   289   484    16     0      $\sum (x_{2j} - \bar{x}_2)^2 = 1010$
\[ \text{Sample variance} = s_2^2 = \frac{\sum (x_{2j} - \bar{x}_2)^2}{n_2 - 1} = \frac{1010}{5} = 202 \]
Since $s_1^2 > s_2^2$, test statistic is $F = \frac{s_1^2}{s_2^2} = \frac{363.2}{202} = 1.80$
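The variance-ratio statistic can be recomputed from the raw data as in the sketch below. The critical value $F_{0.01,5,5} = 10.97$ is a standard table value that is not quoted in the text, and the variable names are illustrative.

```python
from statistics import variance

guard1 = [127, 168, 143, 165, 122, 139]
guard2 = [154, 135, 132, 171, 153, 149]

v1 = variance(guard1)   # sample variance with divisor n - 1
v2 = variance(guard2)

# Put the larger sample variance in the numerator so that F >= 1
f_stat = max(v1, v2) / min(v1, v2)

# Two-tailed test at alpha = 0.02 uses the upper alpha/2 = 0.01 point
F_CRIT = 10.97  # F_{0.01, 5, 5}; standard table value, assumed here

print(f"s1^2 = {v1:.1f}, s2^2 = {v2:.1f}, F = {f_stat:.2f}")
print("reject H0" if f_stat > F_CRIT else "cannot reject H0")
```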
24) In the comparison of two kinds of paint, a consumer testing service finds that four 1-gallon cans
of one brand have a standard deviation of 31 square feet, while four 1-gallon cans of another brand
have a standard deviation of 26 square feet. Assume that the two populations are normal and test the
null hypothesis that 𝜎1 = 𝜎2 against the alternative that 𝜎1 > 𝜎2 at the level of significance of 5%.
Solution: 𝐻0 : 𝜎1 = 𝜎2 𝐻1 : 𝜎1 > 𝜎2 (right-tailed) 𝛼 = 0.05
Populations normal; 𝑛1 = 4; 𝑛2 = 4
𝑠1 = 31; 𝑠2 = 26 𝑠1 2 > 𝑠2 2
\[ \text{Test statistic is } F = \frac{s_1^2}{s_2^2} = \frac{961}{676} = 1.4216 \]
26) Twelve randomly selected mature citrus trees of one variety have a standard deviation of 1.2 feet
and fifteen randomly selected mature citrus trees of another variety have a standard deviation of 1.5
feet. Assuming that the random samples were selected from normal populations, construct a 98%
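confidence interval for the ratio of the variances of the two populations sampled.
Solution: $n_1 = 12$    $n_2 = 15$
$s_1 = 1.2$    $s_2 = 1.5$
Populations normal;
The $100(1-\alpha)\%$ confidence interval for $\frac{\sigma_1^2}{\sigma_2^2}$ is
\[ \left[ \frac{s_1^2}{s_2^2} \cdot \frac{1}{F_{\alpha/2,\,\nu_1,\,\nu_2}},\;\; \frac{s_1^2}{s_2^2} \cdot F_{\alpha/2,\,\nu_2,\,\nu_1} \right] \]
where $\nu_1 = n_1 - 1$ and $\nu_2 = n_2 - 1$
The 98% confidence interval for $\frac{\sigma_1^2}{\sigma_2^2}$ is
\[ \left[ \frac{s_1^2}{s_2^2} \cdot \frac{1}{F_{0.01,\,11,\,14}},\;\; \frac{s_1^2}{s_2^2} \cdot F_{0.01,\,14,\,11} \right] \]
\[ = \left[ \frac{(1.2)^2}{(1.5)^2} \cdot \frac{1}{3.87},\;\; \frac{(1.2)^2}{(1.5)^2}\,(4.30) \right] = [0.165,\ 2.752] \]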
27) The following are the heat-producing capacities of coal from two mines (in millions of calories
per ton):
Mine A: 8500 8330 8480 7960 8030
Mine B: 7710 7890 7920 8270 7860
Assume that the data constitute independent random samples from normal populations and construct
a 90% confidence interval for the ratio of the variances of the two populations sampled.
Solution: Populations normal; 𝑛1 = 5 𝑛2 = 5
The ratio of the variances will not change by scaling the observations using the same scale measure
for both samples. Let us use a scale of 10.
Mine A:
$X_i$                  850   833   848   796   803      $\bar{X} = \frac{4130}{5} = 826$
$X_i - \bar{X}$             24     7    22   −30   −23
$(X_i - \bar{X})^2$        576    49   484   900   529      $s^2 = \frac{\sum (X_i - \bar{X})^2}{n-1} = \frac{2538}{4} = 634.5$
Mine B:
$X_i$                  771   789   792   827   786      $\bar{X} = \frac{3965}{5} = 793$
$X_i - \bar{X}$            −22    −4    −1    34    −7
$(X_i - \bar{X})^2$        484    16     1  1156    49      $s^2 = \frac{\sum (X_i - \bar{X})^2}{n-1} = \frac{1706}{4} = 426.5$
$s_1^2 = 634.5$    $s_2^2 = 426.5$
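The $100(1-\alpha)\%$ confidence interval for $\frac{\sigma_1^2}{\sigma_2^2}$ is
\[ \left[ \frac{s_1^2}{s_2^2} \cdot \frac{1}{F_{\alpha/2,\,\nu_1,\,\nu_2}},\;\; \frac{s_1^2}{s_2^2} \cdot F_{\alpha/2,\,\nu_2,\,\nu_1} \right] \]
where $\nu_1 = n_1 - 1$ and $\nu_2 = n_2 - 1$
The 90% confidence interval for $\frac{\sigma_1^2}{\sigma_2^2}$ is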
\[ \text{Test statistic } t = \frac{r}{\sqrt{1-r^2}}\,\sqrt{n-2} \sim t_{n-2} \quad [\text{under } H_0] \]
(b) A random sample of n pairs of observation is taken from a bivariate normal population with
the correlation co-efficient denoted by ρ. The correlation co-efficient in the sample is 𝑟. The null
hypothesis to be tested is
ρ = 𝜌0
against the alternative
ρ ≠ 𝜌0
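\[ \text{Test statistic } \frac{Z - Z_0}{\sqrt{\frac{1}{n-3}}} \sim N(0, 1) \quad [\text{under } H_0] \]
\[ \text{where } Z = \frac{1}{2} \log_e \left[\frac{1+r}{1-r}\right], \qquad Z_0 = \frac{1}{2} \log_e \left[\frac{1+\rho_0}{1-\rho_0}\right] \]
\[ \text{Test statistic } \frac{Z_1 - Z_2}{\sqrt{\frac{1}{n_1-3} + \frac{1}{n_2-3}}} \sim N(0, 1) \quad [\text{under } H_0] \]
\[ \text{where } Z_1 = \frac{1}{2} \log_e \left[\frac{1+r_1}{1-r_1}\right], \qquad Z_2 = \frac{1}{2} \log_e \left[\frac{1+r_2}{1-r_2}\right] \]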
Important Note: The alternative hypotheses are two-tailed. Compare the calculated value of the test
statistic with 𝑡𝑛−2, 𝛼⁄2 or 𝑧𝛼⁄2 as the case may be for drawing inference.
28) Test the significance of the values of the correlation co-efficient obtained from samples of n
pairs of observation from bivariate normal population at the level of 5%:
(i) 𝑛 = 38; 𝑟 = 0.6 (ii) 𝑛 = 11; 𝑟 = 0.5
Solution:
(i) 𝐻0 : 𝜌 = 0 𝐻1 : 𝜌 ≠ 0 𝛼 = 0.05
\[ t = \frac{r}{\sqrt{1-r^2}}\,\sqrt{n-2} = \frac{0.6}{\sqrt{1-(0.6)^2}}\,\sqrt{38-2} = 4.5 \]
$t \sim t_{n-2}$
$n > 30$; Use normal distribution values.
Inference: $z_{0.025} = 1.96$;    $4.5 > z_{0.025}$
Reject the null hypothesis at the level of 5%. The correlation co-efficient obtained from the sample
is suggestive of correlation between the variables in the population.
(ii) 𝐻0 : 𝜌 = 0 𝐻1 : 𝜌 ≠ 0 𝛼 = 0.05
\[ t = \frac{r}{\sqrt{1-r^2}}\,\sqrt{n-2} = \frac{0.5}{\sqrt{1-(0.5)^2}}\,\sqrt{11-2} = 1.73 \]
$t \sim t_{n-2}$
Inference: $t_{9,\,0.025} = 2.262$;    $1.73 < t_{9,\,0.025}$
We cannot reject the null hypothesis at the level of 5%. The correlation co-efficient obtained from
the sample does not signify correlation between the variables in the population.
29) Find the least value of 𝑟 in a sample of 27 pairs of observations from bivariate normal population
which would be significant of correlation in the population at the level of 5%.
Solution:
\[ \frac{r}{\sqrt{1-r^2}}\,\sqrt{n-2} > t_{27-2,\,0.025} \]
\[ \frac{r}{\sqrt{1-r^2}}\,\sqrt{25} > 2.060 \]
\[ \frac{r}{\sqrt{1-r^2}} > 0.412 \]
\[ \frac{r^2}{1-r^2} > 0.1697 \]
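The last inequality can be solved for r numerically, as in the following sketch. It assumes the table value $t_{25,0.025} = 2.060$ used above; the variable names are illustrative.

```python
import math

n = 27
t_crit = 2.060   # t_{25, 0.025}, table value used in the solution

# r^2 / (1 - r^2) > k, where k = t_crit^2 / (n - 2)
k = t_crit**2 / (n - 2)

# Rearranging: r^2 > k / (1 + k), so the threshold for r is its square root
r_min = math.sqrt(k / (1 + k))

print(f"k = {k:.4f}; r must exceed {r_min:.3f}")
```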
30) The correlation co-efficient in a sample of 18 pairs of observation from bivariate normal
population was found to be 0.5. It is claimed that the correlation co-efficient in the population is 0.7.
Does the sample correlation co-efficient justify this claim at 5% level?
Solution: 𝐻0 : 𝜌 = 0.7 𝐻1 : 𝜌 ≠ 0.7 𝛼 = 0.05
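\[ Z = \frac{1}{2} \log_e \left[\frac{1+r}{1-r}\right] = \frac{1}{2} \log_e \left[\frac{1.5}{0.5}\right] = 0.5493 \]
\[ Z_0 = \frac{1}{2} \log_e \left[\frac{1+\rho}{1-\rho}\right] = \frac{1}{2} \log_e \left[\frac{1.7}{0.3}\right] = 0.8673 \]
\[ \text{Test statistic} = \frac{|Z - Z_0|}{\sqrt{\frac{1}{n-3}}} = \frac{|0.5493 - 0.8673|}{\sqrt{\frac{1}{15}}} = 1.232 \]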
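The transformation and the test statistic of Problem 30 can be checked with a few lines of Python. This is a sketch with a hypothetical helper `fisher_z`; the two-tailed critical value 1.96 is $z_{0.025}$.

```python
import math


def fisher_z(r):
    """Z = (1/2) * log_e[(1 + r) / (1 - r)]."""
    return 0.5 * math.log((1 + r) / (1 - r))


# Data of Problem 30
n, r, rho0 = 18, 0.5, 0.7

z_r = fisher_z(r)
z_0 = fisher_z(rho0)
stat = abs(z_r - z_0) / math.sqrt(1 / (n - 3))

Z_CRIT = 1.96  # z_{0.025} for a two-tailed test at the 5% level

print(f"Z = {z_r:.4f}, Z0 = {z_0:.4f}, statistic = {stat:.3f}")
print("reject H0" if stat > Z_CRIT else "cannot reject H0")
```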
\[ Z_2 = \frac{1}{2} \log_e \left[\frac{1+r_2}{1-r_2}\right] = \frac{1}{2} \log_e \left[\frac{1+0.56}{1-0.56}\right] = 0.6328 \]
32) The following sample data pertain to the shipments received by a large firm from three different
vendors. Test at the 0.01 level of significance whether the three vendors ship products of equal
quality.
             Number rejected    Number imperfect but acceptable    Number perfect    Total
Vendor A           12                        23                          89           124
Vendor B            8                        12                          62            82
Vendor C           21                        30                         119           170
Total              41                        65                         270           376
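A χ² test on this table can be sketched as follows. Expected frequencies are computed as (row total × column total) / grand total; the function name `chi_square_table` is hypothetical, and the critical value $\chi^2_{0.01,4} = 13.277$ is a standard table value that is not quoted in the text.

```python
observed = [
    [12, 23, 89],    # Vendor A: rejected, imperfect but acceptable, perfect
    [8, 12, 62],     # Vendor B
    [21, 30, 119],   # Vendor C
]


def chi_square_table(table):
    """Chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, f in enumerate(row):
            e = row_totals[i] * col_totals[j] / grand_total   # expected frequency
            chi2 += (f - e) ** 2 / e
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


chi2, df = chi_square_table(observed)
CHI2_CRIT = 13.277  # chi-square_{0.01, 4}; standard table value, assumed here

print(f"chi-square = {chi2:.3f}, d.f. = {df}")
print("reject H0" if chi2 > CHI2_CRIT else "cannot reject H0")
```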
33) In 360 tosses of a pair of dice, 74 sevens and 26 elevens are observed. Test the hypothesis that
the dice are fair. Use a level of significance of 5%.
Solution: 𝐻0 : The dice are fair; all the six scores in each die are equally likely.
𝐻1 : The dice are biased.
𝛼 = 0.05
Under 𝐻0
Probability of a score of seven from two fair dice = $P\{(1,6),(2,5),(3,4),(4,3),(5,2),(6,1)\} = \frac{6}{36}$
Expected number of sevens in 360 tosses = $360 \times \frac{6}{36} = 60$
Probability of a score of eleven from two fair dice = $P\{(5,6),(6,5)\} = \frac{2}{36}$
Expected number of elevens in 360 tosses = $360 \times \frac{2}{36} = 20$
Score         7     11
$f_{ij}$         74     26
$e_{ij}$         60     20
\[ \text{Test statistic} = \chi^2 = \sum_i \sum_j \left[\frac{(f_{ij} - e_{ij})^2}{e_{ij}}\right] = \frac{(74 - 60)^2}{60} + \frac{(26 - 20)^2}{20} = 5.067 \]
Inference:
The number of independent observations = 2
We have used one constraint that the dice are tossed 360 times.
Number of degrees of freedom = 2 – 1 = 1
𝜒 21,0.05 = 3.841 5.067 > 𝜒 21,0.05
There is sufficient reason to reject the null hypothesis at the level of significance of 5%. The dice are
suspected to be biased.
34) Over 5 years, in T-20 cricket matches, the team ‘Breezy Butterflies’ played 60 matches and won
35 of them. They took first batting in 24 matches and won in 18 of these. An enthusiast commented
that batting first is lucky for the team. Use the 𝜒 2 test to examine if there is any association between
first batting and winning. Use a level of significance of 5%.
Solution: 𝐻0 : First batting has no influence on winning.
𝐻1 : There is an association between first batting and winning.
𝛼 = 0.05
The contingency table showing the observed frequencies is:
                      No. of games batting first    No. of games batting second    Total
No. of games won                  18                            17                   35
No. of games lost                  6                            19                   25
Total                             24                            36                   60
\[ \text{Test statistic} = \chi^2 = \sum_i \sum_j \left[\frac{(f_{ij} - e_{ij})^2}{e_{ij}}\right] = \frac{(18 - 14)^2}{14} + \frac{(17 - 21)^2}{21} + \frac{(6 - 10)^2}{10} + \frac{(19 - 15)^2}{15} = 4.57 \]
There is sufficient reason to reject the null hypothesis at the level of significance of 5%. There is an
association between first batting and winning. It is seen from the numbers that this is a positive
association. First batting has significantly helped the team in winning the game.
Note: The expected frequencies 𝑒𝑖 ’s will be calculated from theoretical distribution under the null
hypothesis with the condition that ∑𝑖 (𝑒𝑖 ) = ∑𝑖(𝑓𝑖 ). This leads to a further loss of one degree of
freedom.
Important: The test is valid only if none of the 𝑒𝑖 ’s is less than 5. If any 𝑒𝑖 is less than 5, the
particular class would have to be combined with the neighbouring class so as to remedy the snag.
35) Four coins were tossed 160 times and 0, 1, 2, 3 or 4 heads showed, respectively, 19, 54, 58, 23 and 6
times. Use the 5% level of significance to test whether it is reasonable to suppose that the coins are
balanced and randomly tossed.
Solution:
𝐻0 : The coins are balanced and randomly tossed so that the population is binomial with the
probability of heads = ½.
α = 0.05
The following table shows the calculation of the expected frequencies and the test statistic:
i     $e_i = N \times \left(nC_i\, p^i (1-p)^{n-i}\right)$      $f_i$      $\frac{(f_i - e_i)^2}{e_i}$
0     160 × (4C0 (½)⁴) = 10               19       8.100
1     160 × (4C1 (½)⁴) = 40               54       4.900
2     160 × (4C2 (½)⁴) = 60               58       0.067
3     160 × (4C3 (½)⁴) = 40               23       7.225
4     160 × (4C4 (½)⁴) = 10                6       1.600
The value 𝑝 = ½ was taken from the null hypothesis and not the sample. Hence there is no loss of
any degree of freedom in that respect. The total N = 160 was taken from sample data. This constraint
makes one degree of freedom lost.
𝜈 = number of degrees of freedom = no. of classes – 1 = 5 – 1 = 4.
Inference: $\chi^2_{0.05,\,4} = 9.488$;    $21.892 > \chi^2_{0.05,\,4}$
We have no reason to accept the null hypothesis at 5%. It is not reasonable to suppose that the coins
are balanced and randomly tossed.
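The goodness-of-fit calculation for the four-coin data can be sketched as follows, using the statistic $\sum (f_i - e_i)^2 / e_i$ with binomial expected frequencies. The variable names are illustrative, and the critical value 9.488 is the one quoted in the solution.

```python
import math

observed = [19, 54, 58, 23, 6]      # number of tosses showing 0, 1, 2, 3, 4 heads
n_coins, p, n_tosses = 4, 0.5, 160  # p = 1/2 comes from H0, not from the sample

# Expected frequencies e_i = N * C(n, i) * p^i * (1 - p)^(n - i)
expected = [
    n_tosses * math.comb(n_coins, i) * p**i * (1 - p) ** (n_coins - i)
    for i in range(n_coins + 1)
]

chi2 = sum((f - e) ** 2 / e for f, e in zip(observed, expected))
df = len(observed) - 1              # one constraint: the total N is fixed

CHI2_CRIT = 9.488                   # chi-square_{0.05, 4}, as quoted in the solution

print(f"expected = {expected}")
print(f"chi-square = {chi2:.3f}, d.f. = {df}")
print("reject H0" if chi2 > CHI2_CRIT else "cannot reject H0")
```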
36) Each day, Monday through Saturday, a baker bakes three large chocolate cakes. Those not sold
on the same day are given away to a food bank. Use the data shown in the following table to test at
the 5% level of significance whether they may be looked upon as values of a binomial random
variable.
No. of cakes sold 0 1 2 3
No. of days 1 16 55 228
Solution: 𝐻0 : The observed values can be looked upon as values of a binomial random variable.
α = 0.05
The value of the parameter p of the binomial distribution cannot be found using general reasoning.
In such a case, p is found from the sample.
The mean of the binomial distribution is np. Calculate the value of the mean of the sample to find p.
1
Mean = [0 × 1 + 1 × 16 + 2 × 55 + 3 × 228] = 2.7
300
𝑛𝑝 = 3𝑝 = 2.7 ⇒ 𝑝 = 0.9
The following table shows the calculation of the expected frequencies and the test statistic:
i     $e_i = N \times \left[nC_i\, p^i (1-p)^{n-i}\right]$            $f_i$      $\frac{(f_i - e_i)^2}{e_i}$
0     300 × [3C0 (0.9)⁰ (0.1)³] = 0.3             1      (classes 0 and 1 combined)
1     300 × [3C1 (0.9)¹ (0.1)²] = 8.1            16      $\frac{(17 - 8.4)^2}{8.4} = 8.805$
2     300 × [3C2 (0.9)² (0.1)¹] = 72.9           55      $\frac{(55 - 72.9)^2}{72.9} = 4.395$
3     300 × [3C3 (0.9)³ (0.1)⁰] = 218.7         228      $\frac{(228 - 218.7)^2}{218.7} = 0.395$
Frequency of the first class is 1 (< 5). This class has been combined with the next class so as to make
the frequency greater than 5.
Calculation of number of degrees of freedom:
𝑚 = number of classes = 3
𝑡 = number of independent parameters estimated using sample data = 1
𝜈=𝑚−𝑡−1=1
Inference: 𝜒 2 0.05,1 = 3.841 13.595 > 𝜒 2 0.05,1
We have no reason to accept the null hypothesis at 5%. The observed values cannot be looked upon
as values of a binomial random variable.
37) It is desired to test whether the number of gamma rays emitted per second by a certain
radioactive substance is a random variable having the Poisson distribution with λ = 2.4. Use the
following data obtained for 300 1-second intervals to test this null hypothesis at the level of
significance of 0.05.
No. of gamma rays 0 1 2 3 4 5 6 7 or more
Frequency 19 48 66 74 44 35 10 4
Solution: 𝐻0 : The given random variable has Poisson distribution with λ = 2.4.
α = 0.05
The table showing the calculation of the expected frequencies and the test statistic follows later.
Frequency of the last class is 4 (< 5). This class has been combined with the previous class so as to
make the frequency greater than 5.
Calculation of number of degrees of freedom:
𝑚 = number of classes = 7
𝑡 = number of independent parameters estimated using sample data = nil
𝜈=𝑚−𝑡−1=6
Inference: 𝜒 2 0.05,6 = 12.592 29.06 > 𝜒 2 0.05,6
We have no reason to accept the null hypothesis at 5%. The given random variable does not conform
to Poisson distribution with λ = 2.4.
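The expected frequencies and the test statistic for the Poisson fit can be computed as in the sketch below. The probability of the open class "7 or more" is obtained as one minus the sum of the earlier probabilities, and the last two classes are merged as described in the solution. The helper `poisson_pmf` and the variable names are hypothetical.

```python
import math

lam, n_intervals = 2.4, 300
observed = [19, 48, 66, 74, 44, 35, 10, 4]   # counts for 0, 1, ..., 6, "7 or more"


def poisson_pmf(k, lam):
    """P(X = k) for a Poisson random variable with mean lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


probs = [poisson_pmf(k, lam) for k in range(7)]
probs.append(1.0 - sum(probs))               # P(X >= 7)
expected = [n_intervals * q for q in probs]

# Merge the classes "6" and "7 or more", as in the solution
obs_merged = observed[:6] + [observed[6] + observed[7]]
exp_merged = expected[:6] + [expected[6] + expected[7]]

chi2 = sum((f - e) ** 2 / e for f, e in zip(obs_merged, exp_merged))
df = len(obs_merged) - 1                     # lambda is given by H0, so no further loss

CHI2_CRIT = 12.592                           # chi-square_{0.05, 6}, as quoted in the solution

print(f"chi-square = {chi2:.2f}, d.f. = {df}")
print("reject H0" if chi2 > CHI2_CRIT else "cannot reject H0")
```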
Accept 𝐻0 at 5% level. We have no reason to suspect the null hypothesis at 5% level. The theory that
the frequencies are in the proportion 9 : 3 : 3 : 1 is justified by the result of the experiment.
---------------------