Chapter 5 - Sampling Distribution and Hypothesis Testing

Quantitative Skills SAMPLING DISTRIBUTION & HYPOTHESIS TESTING
5 SAMPLING DISTRIBUTION & HYPOTHESIS TESTING

5.0 Introduction
Statistical inference can be defined as the process by which conclusions are drawn about some
measure or attribute of a population based upon analysis of sample data. Statisticians use the word
population to refer not only to people but to all items that have been chosen for study, whereas a
sample is a portion chosen from the population. Statistical inference can
conveniently be divided into two types: estimation and hypothesis testing.
5.1 Sampling distribution
From a population, we draw many samples of n items, and for each sample we find the mean \(\bar{x}\). We
will not obtain the same value for the sample mean each time we draw a sample. Thus, it would
be sensible to arrange our sample means into a frequency distribution. We call this frequency
distribution the sampling distribution of sample means. Such a sampling distribution exists not only for
the mean but for any point estimate, e.g. median, proportion, etc.
5.2 Properties of sampling distribution of the mean
(a) It is very close to being normally distributed, especially when sample sizes are large.
(b) The mean of the sampling distribution is the same as the population mean.
(c) The sampling distribution has a standard deviation which is called the standard error of the
mean. It measures the extent to which we expect the means from the different samples to
vary because of the chance error in the sampling process.
5.3 Standard error
Note that a distribution of sample means that is less spread out (that has a small standard error) is a
better estimator of the population mean than a distribution of sample means that is widely dispersed
and has a larger standard error.
(a) Standard error of the mean, with known standard deviation for the population.
(i) When the population size is infinite,
\[ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \]
Example 1
A bank calculates that its individual saving accounts are normally distributed
with a mean of $2000 and a standard deviation of $600. If the bank takes
a random sample of 100 accounts, what is the probability that the sample
mean will lie between 1900 and 2050?
Given: \(\sigma = 600\), n = 100
\[ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{600}{\sqrt{100}} = 60 \]
Degree Level 1, Asia Pacific University of Technology and Innovation

For \(\bar{x} = 1900\), \(z = \frac{1900 - 2000}{60} = -1.67\)
\(P(-1.67 < z < 0) = 0.4525\)
For \(\bar{x} = 2050\), \(z = \frac{2050 - 2000}{60} = 0.83\)
\(P(0 < z < 0.83) = 0.2967\)
\(P(1900 < \bar{x} < 2050) = 0.2967 + 0.4525 = 0.7492\)
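The calculation in Example 1 can be checked numerically. The following is a minimal sketch using Python's standard library; the function name `prob_sample_mean_between` is an illustrative choice, not part of the notes. It works with unrounded z values, so the result (about 0.75) differs slightly from the table-based 0.7492.

```python
from math import sqrt
from statistics import NormalDist

def prob_sample_mean_between(mu, sigma, n, low, high):
    """P(low < sample mean < high) for samples of size n from an infinite population."""
    se = sigma / sqrt(n)                  # standard error of the mean
    sampling_dist = NormalDist(mu, se)    # sampling distribution of the mean
    return sampling_dist.cdf(high) - sampling_dist.cdf(low)

# Example 1: mean 2000, standard deviation 600, sample of 100 accounts
p = prob_sample_mean_between(mu=2000, sigma=600, n=100, low=1900, high=2050)
print(round(p, 4))
```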
(ii) When the population size is finite,
\[ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \times \sqrt{\frac{N - n}{N - 1}} \]
Where,
N = size of the population
n = size of the sample
\(\sqrt{\frac{N - n}{N - 1}}\) = finite population multiplier
If \(\frac{n}{N}\) is less than 0.05, the finite population multiplier need not be used.
Example 2
The Columbus and Elizabeth City railroads have just hired 60 experienced
carpenters who will be assigned randomly to ten crews of six men each.
The 60 new carpenters averaged 5 years of previous experience with a
standard deviation of 1.6 years. Ed Wilson has been assigned as
foreman of one of the crews. What is the probability that Wilson's
crew will have an average of 4 years' experience or more?
Given: \(\sigma = 1.6\), N = 60, n = 6
\[ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \times \sqrt{\frac{N - n}{N - 1}} = \frac{1.6}{\sqrt{6}} \times \sqrt{\frac{60 - 6}{60 - 1}} = 0.6249 \]
\[ z = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}} = \frac{4 - 5}{0.6249} = -1.60 \]
\(P(z \ge -1.60) = 0.5 + 0.4452 = 0.9452 = 94.52\%\)
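As a cross-check of Example 2, the sketch below applies the finite population multiplier and then evaluates the normal probability. The helper name `standard_error_finite` is hypothetical, and the standard library's `NormalDist` stands in for the z-table.

```python
from math import sqrt
from statistics import NormalDist

def standard_error_finite(sigma, n, N):
    """Standard error of the mean with the finite population multiplier."""
    return (sigma / sqrt(n)) * sqrt((N - n) / (N - 1))

# Example 2: 60 carpenters, crews of 6, mean 5 years, standard deviation 1.6 years
se = standard_error_finite(sigma=1.6, n=6, N=60)
z = (4 - 5) / se
p = 1 - NormalDist().cdf(z)   # P(sample mean >= 4)
print(round(se, 4), round(z, 2), round(p, 4))
```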
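(b) Standard error of the mean, with unknown standard deviation for the population.
The population standard deviation, \(\sigma\), can be estimated from the sample standard deviation s:
\[ \hat{\sigma} = s, \qquad \hat{\sigma}_{\bar{x}} = \frac{\hat{\sigma}}{\sqrt{n}} \]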
Example 3
Fit-n-Trim caters primarily to middle-aged men who wish to lose weight through a
regular program of exercise. They have 194 of these men who have been members
for at least one year. After sampling 40 of these men, they have found that the
average weight loss was 12 pounds and the sample standard deviation was 4
pounds. What is the estimated standard error of this mean?
Given: s = 4, n = 40, N = 194
\(\hat{\sigma} = s = 4\)
\[ \hat{\sigma}_{\bar{x}} = \frac{\hat{\sigma}}{\sqrt{n}} \times \sqrt{\frac{N - n}{N - 1}} = \frac{4}{\sqrt{40}} \times \sqrt{\frac{194 - 40}{194 - 1}} = 0.565 \text{ pounds} \]
(c) Standard error of the proportion
\[ \sigma_p = \sqrt{\frac{pq}{n}} \quad \text{(sample)} \]
\[ \sigma_p = \sqrt{\frac{\pi(1 - \pi)}{n}} \quad \text{(population)} \]
Where,
n = number of trials
p = probability of success
q = (1 - p) = probability of a failure
Note that if the population size is known, the finite population multiplier, \(\sqrt{\frac{N - n}{N - 1}}\), is included as
shown in the example below:
Example 4
Last year a sample was taken for 40 of the 200 active training centres for vocational
rehabilitation of handicapped veterans. It was determined from the sample that 1312
of the 1600 trainees sampled who completed the program were able to locate
jobs. Each of the 200 centres had 40 trainees. Provide an estimate
of the standard error of the proportion of job placement success.
Given: \(p = \frac{1312}{1600} = 0.82\), n = 1600
q = 1 - p = 0.18, N = 8000
\[ \sigma_p = \sqrt{\frac{pq}{n}} \times \sqrt{\frac{N - n}{N - 1}} = \sqrt{\frac{0.82 \times 0.18}{1600}} \times \sqrt{\frac{8000 - 1600}{8000 - 1}} = 0.0086 \]
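The estimated standard errors in Examples 3 and 4 follow the same pattern, which the sketch below makes explicit. The function names and the `N=None` convention for an infinite population are illustrative assumptions rather than anything defined in the notes.

```python
from math import sqrt

def finite_population_multiplier(n, N):
    """sqrt((N - n) / (N - 1)) for a sample of n drawn from a population of N."""
    return sqrt((N - n) / (N - 1))

def se_mean(s, n, N=None):
    """Estimated standard error of the mean; N=None means an infinite population."""
    se = s / sqrt(n)
    return se if N is None else se * finite_population_multiplier(n, N)

def se_proportion(p, n, N=None):
    """Standard error of a proportion; N=None means an infinite population."""
    se = sqrt(p * (1 - p) / n)
    return se if N is None else se * finite_population_multiplier(n, N)

print(round(se_mean(s=4, n=40, N=194), 3))              # Example 3
print(round(se_proportion(p=0.82, n=1600, N=8000), 4))  # Example 4
```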
5.4 Central limit theorem
This theorem states that as the sample size increases, the sampling distribution of the mean
approaches the normal distribution in form, regardless of the form of the population distribution. For
practical purposes, the sampling distribution of the mean can be assumed to be approximately normal,
regardless of the population distribution, whenever the sample size is greater than thirty.
Because the central limit theorem makes it possible to use the normal probability distribution in a wide
variety of decision problems involving an unknown population mean, many statisticians consider it to
be the most important theorem in applied statistics.
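A small simulation can illustrate the theorem. The sketch below is an assumption-laden illustration, not part of the notes: it draws repeated samples of size 40 from a strongly skewed (exponential) population with mean 1 and standard deviation 1, and compares the mean and spread of the sample means with the population mean and \(\sigma/\sqrt{n}\). The seed, sample size and number of samples are arbitrary choices.

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(0)                     # fixed seed so the run is repeatable
n, num_samples = 40, 2000          # sample size and number of samples (arbitrary)

# Exponential population with rate 1: mean = 1, standard deviation = 1
sample_means = [
    mean([random.expovariate(1.0) for _ in range(n)])
    for _ in range(num_samples)
]

print("mean of sample means:", round(mean(sample_means), 3))   # close to 1
print("std of sample means: ", round(stdev(sample_means), 3))  # close to sigma / sqrt(n)
print("sigma / sqrt(n):     ", round(1 / sqrt(n), 3))
```

A histogram of `sample_means` would look roughly bell-shaped even though the population itself is far from normal.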
5.5 Estimation
Estimation is one of the ways of making inferences about characteristics of populations from
information contained in samples. We can make two types of estimates about a population: a point
estimate and an interval estimate. A point estimate is a single number that is used to estimate an
unknown population parameter. An interval estimate is a range of values used to estimate a population
parameter. An estimator is a sample statistic used to estimate a population parameter.
5.5.1 Criteria of a good estimator
(a) Unbiasedness -- A sample mean is an unbiased estimator of a population mean
because the mean of the sampling distribution of sample means taken from
the same population is equal to the population mean itself.
(b) Efficiency -- refers to the size of the standard error of the statistic. If we compare two
statistics from a sample of the same size and try to decide which one is the more
efficient estimator, we would pick the statistic that has the smaller standard error, or
standard deviation of the sampling distribution.
(c) Consistency -- A statistic is consistent if, as the sample size increases, it becomes almost certain that the
value of the statistic comes very close to the value of the population parameter.
(d) Sufficiency -- An estimator is sufficient if it makes so much use of the information in
the sample that no other estimator could extract from the sample additional
information about the population parameter being estimated.
5.5.2 Biased estimator
An estimator is said to be biased if its expected value is not equal to (i.e. greater or less than)
the population parameter. A biased estimator will tend to over-estimate or under-estimate the
true value of the population parameter.
5.6 Confidence level
It applies to the construction of a confidence interval estimate for an unknown population parameter. It
determines the proportion of the sampling distribution within which the estimate lies. In other words, it
indicates how confident we are that the interval estimate will include the population parameter. A
higher probability means more confidence.
A 95% confidence level will mean that with 95% certainty the population mean lies within the range:
sample mean ± 1.96 standard errors.
5.6.1 Confidence interval / limits / bands
It is used to establish a value for the (unknown) population parameter. For example, if the
mean life of light bulbs produced by a company is unknown, we could take a sample and use
the mean life from the sample to estimate the true mean life for all light bulbs produced by the
company.
A 95% confidence interval implies that 95% of all intervals generated from samples of a given
size will contain the true value of the population parameter.
(a) Calculate interval estimates of the mean from large samples (n > 30) of infinite
population.
Example 5
A bank calculates that its individual saving accounts are normally
distributed with a mean of $2000 and a standard deviation of $600. If the
bank takes a random sample of 100 accounts, find the 95% confidence
interval for the mean.
Given: n = 100, \(\bar{x} = 2000\), \(\sigma = 600\)
Standard error of the mean,
\[ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{600}{\sqrt{100}} = 60 \]
The 95% confidence interval for the true mean is
\[ \bar{x} \pm z\,\sigma_{\bar{x}} = 2000 \pm 1.96 \times 60 \]
i.e. 1882.4 to 2117.6
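The interval in Example 5 can be reproduced with a short helper. This is a sketch under the assumption that the z value is taken from the standard library's inverse normal CDF rather than a printed table, so 90% limits computed this way use z = 1.645 instead of the rounded 1.64; the function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def z_confidence_interval(x_bar, se, confidence):
    """Two-sided interval x_bar +/- z * se for a confidence level such as 0.95."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)   # about 1.96 for 95%
    return x_bar - z * se, x_bar + z * se

# Example 5: sample mean 2000, sigma 600, n = 100
low, high = z_confidence_interval(x_bar=2000, se=600 / sqrt(100), confidence=0.95)
print(round(low, 1), round(high, 1))
```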
(b) Calculate interval estimates of the mean from large samples of finite population with
unknown population standard deviation.
Example 6
Refer to Example 3. Find the upper and lower limits of the confidence
interval of average weight loss if the desired confidence level is 90%.
Given: s = 4, n = 40, N = 194, \(\bar{x} = 12\)
Estimated population standard deviation,
\(\hat{\sigma} = s = 4\)
Estimated standard error of the mean,
\[ \hat{\sigma}_{\bar{x}} = \frac{\hat{\sigma}}{\sqrt{n}} \times \sqrt{\frac{N - n}{N - 1}} = \frac{4}{\sqrt{40}} \times \sqrt{\frac{194 - 40}{194 - 1}} = 0.565 \text{ pounds} \]
The 90% confidence interval of average weight loss is
\[ \bar{x} \pm z\,\hat{\sigma}_{\bar{x}} = 12 \pm 1.64 \times 0.565 \]
lower limit = 11.07 pounds
upper limit = 12.93 pounds
(c) Calculate interval estimate of proportion with finite population.
Example 7
Refer to Example 4. Determine the 95% confidence interval for the
proportion of successful job placements.
Given: p = 0.82, q = 1 - p = 0.18, n = 1600, N = 8000
Standard error of proportion,
\[ \sigma_p = \sqrt{\frac{pq}{n}} \times \sqrt{\frac{N - n}{N - 1}} = \sqrt{\frac{0.82 \times 0.18}{1600}} \times \sqrt{\frac{8000 - 1600}{8000 - 1}} = 0.0086 \]
The 95% confidence interval for the proportion of successful job placement
is
\[ p \pm z\,\sigma_p = 0.82 \pm 1.96 \times 0.0086 \]
= 0.803 to 0.837
5.7 Interval estimate using the t-distribution

Use of the t-distribution for estimating is required whenever the sample size is 30 or less and the
population standard deviation is not known.
Note that, when using the t-table, we must specify the degrees of freedom.
Example 8
Given that n = 10, \(\bar{x} = 11400\), s = 700.
Find the 95% confidence interval for the mean.
\(\hat{\sigma} = s = 700\)
\[ \hat{\sigma}_{\bar{x}} = \frac{\hat{\sigma}}{\sqrt{n}} = \frac{700}{\sqrt{10}} = 221.36 \]
Since n < 30 and \(\sigma\) is unknown, the t-distribution is used with df = n - 1 = 9.
The 95% confidence interval for the mean is
\[ \bar{x} \pm t\,\hat{\sigma}_{\bar{x}} = 11400 \pm 2.262 \times 221.36 = 11400 \pm 500.7 \]
= 10899.3 to 11900.7
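The arithmetic of Example 8 can be laid out step by step as below. This is a sketch only: Python's standard library has no t-distribution, so the critical value 2.262 (two-tailed 95%, 9 degrees of freedom) is hard-coded as read from a t-table, and the variable names are illustrative.

```python
from math import sqrt

# Two-tailed 95% critical value for 9 degrees of freedom, read from a t-table
T_CRIT_95_DF9 = 2.262

n, x_bar, s = 10, 11400, 700       # data from Example 8
se = s / sqrt(n)                   # estimated standard error of the mean
margin = T_CRIT_95_DF9 * se        # half-width of the interval
low, high = x_bar - margin, x_bar + margin

print(round(se, 2), round(low, 1), round(high, 1))
```

With these inputs the standard error is about 221.36 and the interval runs from roughly 10,899 to 11,901.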
5.8 Determine the sample size in estimation
(a) Sample size for estimating a mean
Example 9
It was known from past experience that the standard deviation of the annual earnings
of the entire population of these graduates is about 1500. How large a sample size
should the university take in order to estimate the mean annual earnings of last year's
class within 500 at a 95% confidence level?
Given: \(\sigma = 1500\)
\(z\,\sigma_{\bar{x}} = 500\)
Z value for 95% confidence level = 1.96
\(1.96\,\sigma_{\bar{x}} = 500\)
\(\sigma_{\bar{x}} = 255\)
\(\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}\)
\(\sqrt{n} = \frac{1500}{255} = 5.88\)
n = 34.6
The sample size is 35.
(b) Sample size for estimating a proportion
Example 10
We want to find what proportion of the students are in favour of a new grading system.
We want the sample size that will enable us to be 90% certain of estimating the
true proportion in favour of the new system within 0.02.
Given: \(z\,\sigma_p = 0.02\)
Z value for 90% confidence level is 1.64
If p, q values are not given, use 0.5 for p and q.
\(1.64\,\sigma_p = 0.02\)
\(\sigma_p = \frac{0.02}{1.64}\)
\(\sigma_p = \sqrt{\frac{pq}{n}} = \frac{0.02}{1.64}\)
\(\frac{0.5 \times 0.5}{n} = \left(\frac{0.02}{1.64}\right)^2\)
\(n = \frac{0.25}{0.0001487}\)
n = 1681
The sample size is 1681.
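Both sample-size calculations rearrange the standard error formula to solve for n and then round up. The sketch below shows this; the function names are illustrative, and the small `round(..., 6)` before `ceil` is only there to absorb floating-point noise. Because no intermediate value is rounded, the proportion case of Example 10 comes out at 1681.

```python
from math import ceil

def sample_size_for_mean(sigma, margin, z):
    """Smallest n with z * sigma / sqrt(n) <= margin."""
    n = (z * sigma / margin) ** 2
    return ceil(round(n, 6))       # round first to absorb floating-point noise

def sample_size_for_proportion(margin, z, p=0.5):
    """Smallest n with z * sqrt(p * (1 - p) / n) <= margin; p = 0.5 is the worst case."""
    n = z ** 2 * p * (1 - p) / margin ** 2
    return ceil(round(n, 6))

print(sample_size_for_mean(sigma=1500, margin=500, z=1.96))   # Example 9
print(sample_size_for_proportion(margin=0.02, z=1.64))        # Example 10
```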
5.9 Hypothesis Testing
In decision making, we make an assumption, called a hypothesis, then we collect some sample data,
produce sample statistics and use this information to decide how likely it is that our hypothesized
population parameter is correct.
It is distinguished from a confidence interval in two ways:
(a) We usually have a priori information about the value of the population parameter.
(b) The hypothesis test may be used to establish whether or not the value of the population
parameter has changed.
We cannot accept or reject a hypothesis about a population parameter simply by intuition. Instead, we
need to learn how to decide objectively, on the basis of sample information, whether to accept or reject
a hunch.
In hypothesis testing, we must state the assumed or hypothesized value of the population parameter before we
begin sampling. The assumption we wish to test is called the null hypothesis, \(H_0\).
The null hypothesis specifies the value of the population parameter to be tested in a hypothesis test.
The word "null" is used because what is being tested is the assumption that there is "no difference"
between the parameter value specified in the null hypothesis and the actual value of the population
parameter.
The choice of the parameter value designated as the null hypothesis is crucial, as in hypothesis testing
the null hypothesis is always given the benefit of the doubt. In other words, the null hypothesis will be
accepted unless the sample result is clearly inconsistent with it.
If our sample results fail to support the null hypothesis, we must conclude that something else is true.
Whenever we reject the null hypothesis, the conclusion we do accept is called the alternative
hypothesis, \(H_1\).
The alternative hypothesis, on the other hand, is the formulation of the population parameter(s)
contradicting the null hypothesis. It is crucial in determining the critical region of a significance test.
5.9.1 Criteria to decide whether to accept or reject the null hypothesis
To allow for the fact that the sample may or may not be representative, it is usual procedure to
find the confidence interval.
Assuming the hypothesis is correct, the significance level indicates the percentage of
sample means that lie outside the confidence interval.
5.9.2 Significance level
The significance level applies to a hypothesis test on a population parameter and relates to
the probability of committing a Type I error, namely rejecting the null hypothesis when in
reality it is true. The size of the significance level determines the critical z value upon
which the decision regarding the hypothesis is based.
There is no single standard or universal level of significance for testing hypotheses. The higher
the significance level, the higher the probability of rejecting a null hypothesis when it is true.
5.9.3 Type I and type II errors
A type I error is the error of rejecting a null hypothesis when it is true. A type II error is the
error of accepting a null hypothesis when it is actually false.
In order for any tests of hypothesis or rules of decision to be good, they must be designed so
as to minimize errors of decision. This is not a simple matter since, for a given sample size, an
attempt to decrease one type of error is accompanied in general by an increase in the other
type of error. The only way to reduce both types of errors is to increase the sample size,
which may or may not be possible.
5.9.4 One-tailed test and two-tailed test
A two-tailed test of a hypothesis will reject the null hypothesis if the sample mean is
significantly higher or lower than the hypothesized population mean. (There are 2 rejection
regions.)
A one-tailed test is a significance test in which the null hypothesis can be upset by values well
above or below the mean, but not both.
The left-tailed test is used if the hypotheses are
\(H_0: \mu = \mu_{H_0}\)
\(H_1: \mu < \mu_{H_0}\)
A sample mean that is significantly below the hypothesized population mean leads us to
reject the null hypothesis in favour of the alternative hypothesis; that is, the rejection region is in
the lower tail of the distribution of the sample mean.
The right-tailed test is used if the hypotheses are
\(H_0: \mu = \mu_{H_0}\)
\(H_1: \mu > \mu_{H_0}\)
Only values of the sample mean that are significantly above the hypothesized population mean
will cause us to reject the null hypothesis in favour of the alternative hypothesis; that is, the rejection
region is in the upper tail of the distribution of the sample mean.
In examination questions we must first decide which sort of test to apply. If the question uses
words which imply a change in one direction -- words such as 'better', 'worse', 'improved',
'increased', 'reduced', etc. -- then a 'one-tailed test' must be employed.
If the question implies that a change in either direction is important -- perhaps asking 'is there
any difference', 'is there any change' -- then a two-tailed test must be used.
5.9.5 Hypothesis testing for one sample (mean & proportion)
(a) Two-tailed test (mean)
Example 11
Suppose that a company's management accountant has estimated that the average
direct cost of providing a certain service to a customer is $40. A sample has been
taken, consisting of 150 service provisions, and the mean direct cost of each service
in the sample was $45 with a standard deviation of $10.
Given:
\(\hat{\sigma} = s = 10\)
\(H_0: \mu = 40\)
\(H_1: \mu \ne 40\)
significance level = 5%
Standard error,
\[ \hat{\sigma}_{\bar{x}} = \frac{\hat{\sigma}}{\sqrt{n}} = \frac{10}{\sqrt{150}} = 0.8165 \]
Critical Z value: ±1.96
Sample value:
\[ z = \frac{\bar{x} - \mu}{\hat{\sigma}_{\bar{x}}} = \frac{45 - 40}{0.8165} = 6.12 \]
The sample value of z is outside the critical range ±1.96, hence its probability is
(considerably) less than 5% and we reject the null hypothesis. We can conclude
that the true average cost is different from $40 at 5% significance level.
Alternate solution:
Given:
\(\hat{\sigma} = s = 10\)
\(H_0: \mu = 40\)
\(H_1: \mu \ne 40\)
significance level = 5%
Standard error,
\[ \hat{\sigma}_{\bar{x}} = \frac{\hat{\sigma}}{\sqrt{n}} = \frac{10}{\sqrt{150}} = 0.8165 \]
Limits of the acceptance region,
\[ \mu_{H_0} \pm 1.96\,\hat{\sigma}_{\bar{x}} = 40 \pm 1.96 \times 0.8165 \]
= 38.40 to 41.60
As the sample mean, 45, lies outside the acceptance region, the hypothesis
is rejected. We can conclude that the true average cost is different from
$40 at 5% significance level.
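Both solutions to Example 11 can be expressed in a few lines. The sketch below is illustrative (the function name and return layout are assumptions); it computes the sample z value, the critical value for the chosen significance level, and the limits of the acceptance region.

```python
from math import sqrt
from statistics import NormalDist

def two_tailed_z_test(x_bar, mu0, s, n, alpha):
    """Return (z, critical z, acceptance limits, reject H0?) for H0: mu = mu0."""
    se = s / sqrt(n)
    z = (x_bar - mu0) / se
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)       # about 1.96 for alpha = 0.05
    limits = (mu0 - z_crit * se, mu0 + z_crit * se)    # acceptance region for the sample mean
    return z, z_crit, limits, abs(z) > z_crit

# Example 11: hypothesised mean 40, sample mean 45, s = 10, n = 150
z, z_crit, limits, reject = two_tailed_z_test(x_bar=45, mu0=40, s=10, n=150, alpha=0.05)
print(round(z, 2), round(z_crit, 2), [round(v, 2) for v in limits], reject)
```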
(b) Two-tailed test (proportion)
Example 12
Over the years the proportion of faulty goods has been quite steady at 3%
but a recent sample of 50 items had 4% faulty. At 5% significance level, can
we conclude that a change has occurred?
Given: \(H_0: \pi = 0.03\)
\(H_1: \pi \ne 0.03\)
Significance level: 5%
Standard error:
\[ \sigma_p = \sqrt{\frac{\pi(1 - \pi)}{n}} = \sqrt{\frac{0.03 \times 0.97}{50}} = 0.0241 \]
Critical Z values are ±1.96
Sample value:
\[ z = \frac{p - \pi}{\sigma_p} = \frac{0.04 - 0.03}{0.0241} = 0.41 \]
The sample value is between ±1.96, so we cannot reject the null hypothesis. We
can conclude that at 5% significance level, the results are compatible with the true
proportion still being 3%.
(c) One-tailed test (mean)
Example 13
The residents of Mineral City claim that because of the natural fluoride in
their drinking water, their children have fewer cavities. This past year they
have learned that the country-wide average was 1.65 cavities per child between
the ages of 5 and 15. The standard deviation for the country was 0.56
cavities per child. The residents of Mineral City randomly sampled 85 of their
children and found an average of 1.3 cavities per child. Using a significance
level of 0.02, should the citizens of Mineral City conclude that the incidence
of cavities is significantly lower in their community?
Given: \(H_0: \mu = 1.65\)
\(H_1: \mu < 1.65\)
Level of significance = 0.02
Standard error:
\[ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{0.56}{\sqrt{85}} = 0.061 \]
Critical Z value = -2.05
Sample value:
\[ z = \frac{1.3 - 1.65}{0.061} = -5.74 \]
Since the sample value is less than the critical value, the null hypothesis is rejected.
We can conclude that the incidence of cavities is significantly lower in Mineral City.
(d) One-tailed test (proportion)
Example 14
It is believed that 90% of potential customers are familiar with a firm's
trademark but a sample of 100 consumers showed that only 70%
recognised the trademark. Is this significantly less than the expected
proportion at 1% significance level?
Given: \(H_0: \pi = 0.9\)
\(H_1: \pi < 0.9\)
Significance level: 1%
Standard error:
\[ \sigma_p = \sqrt{\frac{\pi(1 - \pi)}{n}} = \sqrt{\frac{0.9 \times 0.1}{100}} = 0.03 \]
Critical Z value = -2.3263
Sample value:
\[ z = \frac{p - \pi}{\sigma_p} = \frac{0.7 - 0.9}{0.03} = -6.67 \]
Since the sample value is less than the critical value, we reject the null
hypothesis. We can conclude that the proportion of consumers familiar
with the trademark is significantly less than 90% at 1% significance level.
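The two left-tailed tests (Examples 13 and 14) share one decision rule: reject the null hypothesis when the sample z value falls below the negative critical value. A minimal sketch follows; the function name is hypothetical, and because the standard error is not rounded, the z value for Example 13 is about -5.76 rather than the -5.74 obtained with 0.061.

```python
from math import sqrt
from statistics import NormalDist

def left_tailed_z_test(estimate, hypothesised, se, alpha):
    """Return (z, critical z, reject H0?) for H1: parameter < hypothesised value."""
    z = (estimate - hypothesised) / se
    z_crit = NormalDist().inv_cdf(alpha)   # negative, e.g. about -2.05 for alpha = 0.02
    return z, z_crit, z < z_crit

# Example 13: mean cavities, sigma = 0.56, n = 85, alpha = 0.02
z13, crit13, reject13 = left_tailed_z_test(1.3, 1.65, 0.56 / sqrt(85), 0.02)

# Example 14: proportion, pi = 0.9, n = 100, alpha = 0.01
z14, crit14, reject14 = left_tailed_z_test(0.7, 0.9, sqrt(0.9 * 0.1 / 100), 0.01)

print(round(z13, 2), round(crit13, 2), reject13)
print(round(z14, 2), round(crit14, 4), reject14)
```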
(e) When the sample size is less than 30 and the population standard deviation is not
known, the t-distribution is used.
Example 15
Over the past few years, Joe Schlumph, the manager of the local Broil-
Burger hamburger stand, has been averaging sales of 800 hamburgers per
day. Recently, medical researchers have discovered a possible link between
fried hamburgers and cancer, but broiled hamburgers are considered safe.
Joe believes that since his hamburgers are broiled, this new medical finding
will result in improved sales. He recorded the sales on 20 days which were
selected at random during the following month. The average sales for the
sample period were 818 hamburgers, and the standard deviation was 54
hamburgers. At 0.05 significance level, can we conclude that hamburger
sales have significantly increased?
Given:
\(\hat{\sigma} = s = 54\)
\(H_0: \mu = 800\)
\(H_1: \mu > 800\)
Level of significance = 0.05
Standard error:
\[ \hat{\sigma}_{\bar{x}} = \frac{\hat{\sigma}}{\sqrt{n}} = \frac{54}{\sqrt{20}} = 12.07 \]
With n - 1 = 19 degrees of freedom and a one-tailed significance level of 0.05
(the 0.10 column of a two-tailed t-table),
Critical t-value = 1.729
Sample value:
\[ t = \frac{818 - 800}{12.07} = 1.49 \]
Since the sample value is less than the critical value, we accept the null hypothesis
and conclude that there is no significant increase in mean daily hamburger
sales.
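The decision in Example 15 can be traced with the short sketch below. It assumes the critical value 1.729 is read from a t-table (one-tailed 5%, 19 degrees of freedom), since the Python standard library does not provide the t-distribution; the variable names are illustrative.

```python
from math import sqrt

# One-tailed 5% critical value for n - 1 = 19 degrees of freedom, read from a t-table
T_CRIT = 1.729

n, x_bar, s, mu0 = 20, 818, 54, 800   # data from Example 15
se = s / sqrt(n)                      # estimated standard error of the mean
t = (x_bar - mu0) / se                # sample t value
reject = t > T_CRIT                   # right-tailed test: reject only for large t

print(round(se, 2), round(t, 2), reject)
```

Here `reject` is False, matching the conclusion that the increase in sales is not significant at the 0.05 level.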
5.10 Chi-Square Distribution
The chi-square test is an important extension of hypothesis testing and is used when it is wished to
compare an actual, observed distribution with a hypothesized, or expected, distribution. It is often
referred to as a 'goodness of fit' test.
It enables us to test whether more than two population proportions can be considered equal. In other
words, we might want to test whether one group of people (or items) might have the same range of
attributes as other groups, or whether there is a significant difference between them.
When a comparison of this kind is made, the data is usually arranged in a table, known as a
contingency table or two-way classification table. (Contingency is defined as both 'close connection'
and a 'chance happening or occurrence of events'.)
A contingency table is a table having R rows and C columns. Each row corresponds to a level of one
variable, each column to a level of another variable. Entries in the body of the table are the
frequencies with which each variable combination occurred.
The formula for the calculation of \(\chi^2\) is as follows:
\[ \chi^2 = \sum \frac{(O - E)^2}{E} \]
where O = the observed frequency of any value
E = expected frequency of any value
The \(\chi^2\) value obtained from the formula is compared with the value from the table of \(\chi^2\) for a given
significance level and the number of degrees of freedom, i.e. the usual hypothesis testing procedures.
If the sets of observed and expected frequencies are nearly alike, we can reason intuitively that we will
accept the null hypothesis. If there is a large difference, then we may intuitively reject the null
hypothesis.
Example 16
A random sample of 400 householders is classified by two characteristics:
whether they own a colour television and by what type of householder (i.e. owner-occupier,
private tenant, council tenant). The results of the investigation are given below:

Actual frequencies   Owner occupier   Council tenant   Private tenant   Total
Colour TV                 150               60               20          230
No colour TV               45               68               57          170
Total                     195              128               77          400

It is required to test, at the 5% level, whether the two classifications are independent.
Solution:
\(H_0\): The two classifications are independent (i.e. no relation between classes of
householder and colour TV ownership)
\(H_1\): The classifications are not independent.
A contingency table can now be drawn up as follows:

               Total   Owner occupier        Council tenant        Private tenant
                       observed  expected    observed  expected    observed  expected
Colour TV       230      150       112          60        74          20        44
No colour TV    170       45        83          68        54          57        33
Total           400      195       195         128       128          77        77

The \(\chi^2\) calculation can now be made:

Observed (O)   Expected (E)   (O - E)   (O - E)^2   (O - E)^2 / E
    150            112          +38       1444         12.89
     45             83          -38       1444         17.40
     60             74          -14        196          2.65
     68             54          +14        196          3.63
     20             44          -24        576         13.09
     57             33          +24        576         17.45
                                          \(\chi^2\) = 67.11

Determine the appropriate \(\chi^2\) value from the table. This is similar to the t-distribution, but with
\(\chi^2\) there is always a one-tailed test of significance. In a \(\chi^2\) goodness of fit test with a
contingency table, the number of degrees of freedom, v = (rows - 1)(columns - 1).
(Please note that the row and column total cells are ignored in calculating the number
of degrees of freedom.)
In this case, v = (2 - 1)(3 - 1)
= 2 degrees of freedom
From the \(\chi^2\) critical point table, the critical point at the 5% level of significance is 5.991.
As the calculated value (67.11) is greater than the table value, we reject the null hypothesis
and accept that there is a connection between the type of householder and colour TV
ownership.
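The expected frequencies and the \(\chi^2\) statistic of Example 16 can be computed directly from the observed table, as in the sketch below. The function name is illustrative, and the critical value 5.991 is taken from a \(\chi^2\) table rather than computed. Because the expected frequencies are not rounded to whole numbers here, the statistic comes out at about 67.3 instead of 67.11; the conclusion is unchanged.

```python
def chi_square_statistic(observed):
    """Chi-square statistic and degrees of freedom for a contingency table (list of rows)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / grand_total   # expected frequency
            chi2 += (o - e) ** 2 / e
    dof = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, dof

# Example 16: rows = colour TV / no colour TV; columns = owner occupier, council, private
observed = [[150, 60, 20],
            [45, 68, 57]]
chi2, dof = chi_square_statistic(observed)
CRITICAL_5PCT_DF2 = 5.991   # from a chi-square table, 5% level, 2 degrees of freedom
print(round(chi2, 2), dof, chi2 > CRITICAL_5PCT_DF2)
```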
5.10.1 Yates' Correction
Apply whenever there is only one degree of freedom in a \(\chi^2\) test.
In the calculation of \(\chi^2\) for a contingency table with 2 rows and 2 columns (i.e. only one degree of
freedom) it is necessary to deduct 0.5 from the absolute value of (O - E) for each item. By
this, we mean that if (O - E) is +4, we reduce it to 3.5 and if (O - E) is -4, we also reduce
it to 3.5, regardless of the + or - sign.
Example 17
In a market survey carried out on behalf of Waring Titfer Ltd, a hat manufacturer, the
following results were obtained.

                      Men   Women
Never wear hats        82     65
Sometimes wear hats    24     25

The marketing manager wants to know whether the survey reveals any difference in the hat-
wearing habits of men & women. Use the 5% level of significance to prepare appropriate
advice.
Solution:
\(H_0: P_M = P_W\)
\(H_1: P_M \ne P_W\)

                      Total   Men                   Women
                              Observed  Expected    Observed  Expected
Never wear hats        147       82       79.5         65       67.5
Sometimes wear hats     49       24       26.5         25       22.5
Total                  196      106      106.0         90       90.0

Now we apply Yates' correction and calculate not (O - E), but the absolute value of (O - E) less 0.5:

O     E      (O - E)   |O - E| - 0.5   (|O - E| - 0.5)^2   (|O - E| - 0.5)^2 / E
82    79.5    +2.5          2                  4                  0.05
24    26.5    -2.5          2                  4                  0.15
65    67.5    -2.5          2                  4                  0.06
25    22.5    +2.5          2                  4                  0.18
                                                       \(\chi^2\) = 0.44

At the 5% level of significance, the critical point in the \(\chi^2\) distribution with one degree of
freedom is 3.84. Since the value of \(\chi^2\) in the test is 0.44 and below the critical point, we
accept the null hypothesis that there is no difference between the hat-wearing habits of men
and women.
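Yates' correction changes only one step of the \(\chi^2\) calculation: 0.5 is subtracted from |O - E| before squaring. A minimal sketch for the 2 x 2 table of Example 17 follows; the function name is illustrative and the critical value 3.84 is taken from a \(\chi^2\) table.

```python
def yates_chi_square(observed):
    """Chi-square statistic with Yates' correction for a 2 x 2 table (list of two rows)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / grand_total   # expected frequency
            chi2 += (abs(o - e) - 0.5) ** 2 / e               # deduct 0.5 from |O - E|
    return chi2

# Example 17: rows = never / sometimes wear hats; columns = men, women
chi2_yates = yates_chi_square([[82, 65],
                               [24, 25]])
CRITICAL_5PCT_DF1 = 3.84   # from a chi-square table, 5% level, 1 degree of freedom
print(round(chi2_yates, 2), chi2_yates > CRITICAL_5PCT_DF1)
```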
5.10.2 Precautions about using the Chi-square Test
(a) To use the \(\chi^2\) test, the sample size must be large enough to guarantee the similarity
between the theoretically correct distribution and our sampling distribution of \(\chi^2\).
When the expected frequencies are too small, the value of \(\chi^2\) will be overestimated
and will result in too many rejections of the null hypothesis.
(b) If, when calculating the expected cell values, the expected cell frequency is less than
5, the \(\chi^2\) test becomes inaccurate. In such circumstances the cell which is less than
5 is merged with an adjoining cell so that the expected frequencies in all resulting
cells are at least 5.
(c) The larger the value of \(\chi^2\), the bigger the differences between actual and expected.
We should reject the null hypothesis when the difference between the observed &
expected frequencies is too large.
(d) If actual results were exactly as expected then (O - E) would equal zero and \(\chi^2\)
would be zero. If the \(\chi^2\) value was zero, we should be careful to question whether
there is absolutely no difference between observed and expected frequencies.