
Exercise IV: Confidence intervals.

Suppose the distribution of some trait X in a population depends on some parameter Q (e.g. the average and/or variance). It is assumed that the form of the density $f(x, Q)$ or of the probability function $P(X = x_k) = p_k(Q)$ is known.
Let E be a vector of observations: $E = (X_1, X_2, \ldots, X_n)$. E is an n-dimensional random variable and depends on the parameter Q.
Let $\underline{U}(E)$, $\overline{U}(E)$ be functions of the random variable E such that $\underline{U}(E) \le \overline{U}(E)$.
Let $\alpha$ be a real number, $0 < \alpha < 1$.
If the following holds:
$$P\left(\underline{U}(E) \le Q \le \overline{U}(E)\right) \ge 1 - \alpha$$
(for continuous random variables $= 1 - \alpha$), the interval $\left(\underline{U}(E), \overline{U}(E)\right)$ is called a confidence interval for Q, and $1 - \alpha$ the level of confidence.
Theoretically, you can build a confidence interval for each parameter of the trait's distribution, but in practice confidence intervals are usually constructed for the average (Q = m) and the variance (Q = Var(X)). Below we show the corresponding confidence intervals for both parameters.
Confidence interval for population mean (Q=m)
Model I.
Suppose a trait in some population has a $N(m, \sigma)$ distribution. Let's assume that m is unknown, $\sigma$ is known, and the sample is small (n < 30).
The estimate of the population average is the sample statistic $\bar{X}$. Given these conditions, $\bar{X}$ has a $N\left(m, \frac{\sigma}{\sqrt{n}}\right)$ distribution.
Thus, the standardized random variable
$$U = \frac{\bar{X} - m}{\sigma}\sqrt{n}$$
has a N(0, 1) distribution.


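[Figure: density f(u) of the N(0, 1) distribution. The central area between $-u_\alpha$ and $u_\alpha$ equals $1 - \alpha$; each tail has area $\alpha/2$.]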
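Hence, the random variable U satisfies:
$$P(-u_\alpha < U < u_\alpha) = 1 - \alpha$$
The value $u_\alpha$ can be read from the tables of the standardized normal distribution for a given $\alpha$:
$$\Phi(u_\alpha) = 1 - \frac{\alpha}{2}$$
Since U has a N(0, 1) distribution and we have $U = \frac{\bar{X} - m}{\sigma}\sqrt{n}$:
$$P\left(-u_\alpha < \frac{\bar{X} - m}{\sigma}\sqrt{n} < u_\alpha\right) = 1 - \alpha,$$
$$P\left(\bar{X} - u_\alpha\frac{\sigma}{\sqrt{n}} < m < \bar{X} + u_\alpha\frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha.$$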
Thus, for example, for $\alpha = 0.05$ the confidence interval is as follows ($u_\alpha$ from the table = 1.96):
$$\left(\bar{X} - 1.96\frac{\sigma}{\sqrt{n}},\ \bar{X} + 1.96\frac{\sigma}{\sqrt{n}}\right)$$
which means that in 95 cases out of 100, the estimated "m" is in this range. In other words, the error in estimation is not greater than $1.96\frac{\sigma}{\sqrt{n}}$ in 95% of cases.
Example. The waiting time for a tram has been studied and the following values obtained (in minutes): 12, 15, 14, 13, 15. Suppose that the waiting time for a tram has a normal distribution with unknown mean value (m) and a known standard deviation $\sigma = 2$.
The construction of the confidence interval for the mean consists of the following steps:
Step 1 Calculate the sample mean: $\bar{x} = 13.8$.
Step 2 Calculate the radius of the confidence interval: $1.96 \cdot \frac{2}{\sqrt{5}} \approx 1.753$.
Step 3 Construct the confidence interval
$$(\bar{x} - 1.753;\ \bar{x} + 1.753) = (12.0469;\ 15.5530).$$
Thus, with a confidence level of $1 - \alpha = 0.95$, the population average lies in the calculated confidence interval.
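The three steps of the Model I example can be sketched in Python as follows. This is only an illustration using the standard library; the function name `z_interval` is a hypothetical helper, not part of the original exercise, and $u_\alpha$ is computed from the normal distribution instead of being read from a table.

```python
from math import sqrt
from statistics import NormalDist, mean


def z_interval(sample, sigma, alpha=0.05):
    """Confidence interval for the mean when sigma is known (Model I)."""
    n = len(sample)
    x_bar = mean(sample)                            # Step 1: sample mean
    u_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = 0.05
    radius = u_alpha * sigma / sqrt(n)              # Step 2: radius of the interval
    return x_bar - radius, x_bar + radius           # Step 3: the interval itself


waiting_times = [12, 15, 14, 13, 15]  # tram waiting times in minutes
low, high = z_interval(waiting_times, sigma=2)
print(f"({low:.4f}; {high:.4f})")
```

Because the quantile is computed rather than rounded to 1.96, the last printed digit may differ slightly from the hand calculation.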
Model II.
The trait X has a $N(m, \sigma)$ distribution in some population, where neither m nor $\sigma$ is known.
To build a confidence interval for m, we will use the t-statistic with n-1 degrees of freedom:
$$t = \frac{\bar{X} - m}{S}\sqrt{n - 1}$$

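[Figure: density f(t) of the Student distribution. The central area between $-t_\alpha$ and $t_\alpha$ equals $1 - \alpha$; each tail has area $\alpha/2$.]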
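The value $t_\alpha$ is read from the tables for the Student distribution with n-1 degrees of freedom:
$$P(-t_\alpha < t < t_\alpha) = 1 - \alpha$$
$$P\left(-t_\alpha < \frac{\bar{X} - m}{S}\sqrt{n - 1} < t_\alpha\right) = 1 - \alpha,$$
$$P\left(\bar{X} - t_\alpha\frac{S}{\sqrt{n - 1}} < m < \bar{X} + t_\alpha\frac{S}{\sqrt{n - 1}}\right) = 1 - \alpha.$$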
Thus, for example, for $\alpha = 0.05$ the confidence interval is as follows ($t_\alpha$ from tables; e.g. for n = 26, i.e. n - 1 = 25 degrees of freedom, this value is 2.060):
$$\left(\bar{X} - 2.060\frac{S}{5};\ \bar{X} + 2.060\frac{S}{5}\right)$$
which means that in 95 cases out of 100, "m" lies in this range. In other words, the error in estimation is not greater than $2.060\frac{S}{5}$ in 95% of cases.
Note: this interval is variable, depending on the value of S.
Example. Suppose that in the previous example on the average waiting time for a tram there was no information on the standard deviation in the population. Calculation of the confidence interval will be carried out in 4 steps.
Step 1 Calculate the sample mean $\bar{x}$. This has not changed and still is 13.8.
Step 2 Calculate the sample standard deviation.
Step 3 Calculate the radius of the confidence interval.
Step 4 Construct the confidence interval
$$(\bar{x} - 1.618931187;\ \bar{x} + 1.618931187) = (12.18107;\ 15.41893).$$
Thus, with a confidence level of $1 - \alpha = 0.95$ the population average lies in the calculated confidence interval.
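A minimal sketch of the four steps for Model II is given below. It assumes that S is the standard deviation computed with divisor n (which is what the factor $\sqrt{n - 1}$ in the t-statistic suggests), and it takes the critical values from a small hand-copied table rather than computing them. `T_TABLE_95` and `t_interval` are illustrative names, not part of the original exercise.

```python
from math import sqrt
from statistics import mean, pstdev

# Two-sided 95% critical values of the Student distribution, copied from a
# standard table (key = degrees of freedom). Extend as needed.
T_TABLE_95 = {4: 2.776, 25: 2.060}


def t_interval(sample, t_table=T_TABLE_95):
    """Confidence interval for the mean when sigma is unknown (Model II)."""
    n = len(sample)
    x_bar = mean(sample)                     # Step 1: sample mean
    s = pstdev(sample)                       # Step 2: S with divisor n (assumption)
    t_alpha = t_table[n - 1]                 # table value for n - 1 degrees of freedom
    radius = t_alpha * s / sqrt(n - 1)       # Step 3: radius of the interval
    return x_bar - radius, x_bar + radius    # Step 4: the interval itself


low, high = t_interval([12, 15, 14, 13, 15])
print(f"({low:.4f}; {high:.4f})")
```

With the table value rounded to three decimals, the result agrees with the interval in the example to about three decimal places.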
Model III.
For large samples (n > 30), the central limit theorem shows that $\bar{X}$ has approximately a $N\left(m, \frac{\sigma}{\sqrt{n}}\right)$ distribution, while the law of large numbers shows that $S \approx \sigma$.
Therefore, by substituting in model I the population standard deviation $\sigma$ by the sample standard deviation s, we get:
$$P\left(\bar{X} - u_\alpha\frac{s}{\sqrt{n}} < m < \bar{X} + u_\alpha\frac{s}{\sqrt{n}}\right) \approx 1 - \alpha.$$
Example. Suppose that in the analysis of the average waiting time for a tram there was no information on the standard deviation in the population, but we managed to gather much more data.
14, 13, 15, 15, 12, 15, 14, 13, 15,
15, 12, 15, 14, 13, 15, 15, 12, 15,
14, 13, 15, 15, 12, 15, 14, 13, 15
Calculation of the confidence interval can be conducted using the "data analysis / descriptive statistics" panel. By entering the data into this previously described panel we get:
Mean                       13.94117647
Standard Error             0.202221074
Median                     14
Mode                       15
Standard Deviation         1.179141354
Sample Variance            1.390374332
Kurtosis                   -1.22463316
Skewness                   -0.58712867
Range                      3
Minimum                    12
Maximum                    15
Sum                        474
Count                      34
Confidence Level (95.0%)   0.411421868
Construct a confidence interval
$$(\bar{x} - 0.41142;\ \bar{x} + 0.41142) = (13.5297;\ 14.3526).$$
Thus, with a confidence level of $1 - \alpha = 0.95$ the population average is included in the calculated confidence interval. Note that, in agreement with our intuition, the length of the interval has significantly decreased as a consequence of the large number of observations; the estimation has become more accurate.
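As a quick check of the summary above, the following sketch recomputes the standard error from the reported standard deviation and count, and compares the Model III radius $u_\alpha \frac{s}{\sqrt{n}}$ with the "Confidence Level" value reported by the panel. Only the numbers printed in the summary are used; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Values copied from the descriptive statistics summary
s = 1.179141354             # Standard Deviation
n = 34                      # Count
panel_radius = 0.411421868  # Confidence Level (95.0%)

std_error = s / sqrt(n)                     # should match "Standard Error"
u_alpha = NormalDist().inv_cdf(0.975)       # about 1.96
normal_radius = u_alpha * std_error         # radius according to Model III
implied_multiplier = panel_radius / std_error

print(f"standard error      = {std_error:.6f}")
print(f"Model III radius    = {normal_radius:.5f}")
print(f"panel multiplier    = {implied_multiplier:.4f}")
```

The multiplier implied by the panel is slightly larger than 1.96, which is consistent with a Student quantile for 33 degrees of freedom being used there instead of $u_\alpha$; for a sample of this size the two radii differ only in the second decimal place.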
Model IV (confidence interval for a proportion).
If we examine a population according to the presence or absence of a certain characteristic (e.g. quality control - products classed as good and bad, non-smokers and smokers, etc.), it can be described by a two-point distribution:
$$P(X = 1) = p, \quad P(X = 0) = 1 - p,$$
where the random variable X takes the value 1 if the feature exists and 0 if not present.
Thus, if a feature is observed m times in an n-element sample, an approximation of p is
$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i = \frac{m}{n}.$$
We find that for 0.05 < p < 0.95 and n > 100:
$$P\left\{\frac{m}{n} - u_\alpha\sqrt{\frac{\frac{m}{n}\left(1 - \frac{m}{n}\right)}{n}} < p < \frac{m}{n} + u_\alpha\sqrt{\frac{\frac{m}{n}\left(1 - \frac{m}{n}\right)}{n}}\right\} \approx 1 - \alpha.$$
Example. Estimation of the percentage of smokers among students. In a 1800-element sample, the number of smokers m = 600. For $1 - \alpha = 0.95$, $u_\alpha = 1.96$, which can be read from the cumulative distribution function (two-sided interval):
$$\frac{m}{n} = \frac{600}{1800} \approx 0.333, \qquad \sqrt{\frac{\frac{m}{n}\left(1 - \frac{m}{n}\right)}{n}} \approx 0.011.$$
The radius of the interval is therefore $1.96 \cdot 0.0111 \approx 0.0218$. Thus, the 95% confidence interval for the fraction of students smoking is:
31.16% < p < 35.51%.
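The Model IV formula applied to the smokers example can be sketched as follows (standard library only; `proportion_interval` is a hypothetical helper name). The radius is $u_\alpha$ times the estimated standard error of m/n.

```python
from math import sqrt
from statistics import NormalDist


def proportion_interval(m, n, alpha=0.05):
    """Large-sample confidence interval for a proportion (Model IV)."""
    p_hat = m / n                                   # point estimate m/n
    u_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = 0.05
    std_error = sqrt(p_hat * (1 - p_hat) / n)       # about 0.011 in the example
    radius = u_alpha * std_error
    return p_hat - radius, p_hat + radius


low, high = proportion_interval(m=600, n=1800)
print(f"{low:.2%} < p < {high:.2%}")
```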
Confidence interval for standard deviation - variance (Q = $\sigma$).
Assumption: The feature X has a normal distribution, or close to normal.
Model V:
The population mean m and population standard deviation $\sigma$ are not known, the sample is large (n > 30). The statistic
$$Z = \frac{nS^2}{\sigma^2}$$
has a $\chi^2$ distribution with n-1 degrees of freedom:


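[Figure: density of the $\chi^2$ distribution. The area to the left of $c_1$ and the area to the right of $c_2$ each equal $\alpha/2$.]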
$$P\left(c_1 < \frac{nS^2}{\sigma^2} < c_2\right) = 1 - \alpha,$$
where: $P\{\chi^2 < c_1\} = P\{\chi^2 \ge c_2\} = \frac{\alpha}{2}$.
Note: Most tables provide the value $P\{\chi^2 \ge a\}$.
We obtain the following confidence interval for the variance:
$$P\left(\frac{nS^2}{c_2} < \sigma^2 < \frac{nS^2}{c_1}\right) = 1 - \alpha.$$
Example. Consider again the data used in model III.
Step I. We calculate the variance of the sample: 1.3903.
Thus, the observed value of the non-standardized $\chi^2$ (the numerator of the equation) amounts to 1.3903 × 34 = 47.2727.
Step II. We now calculate the values of $c_1$ and $c_2$ (for n - 1 = 33 degrees of freedom and $\alpha = 0.05$).
For $c_1$: 19.0466.
And for $c_2$: 50.725.
Step III. Hence the required confidence interval for the variance is:
$$\left(\frac{47.2727}{50.725};\ \frac{47.2727}{19.0466}\right) = (0.9319;\ 2.4819)$$
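The arithmetic of Steps I to III can be reproduced with the short sketch below. The critical values $c_1$ and $c_2$ are simply the table values quoted in the example, and the numerator is taken as the example computes it (sample variance from the model III summary times the count); `variance_interval` is an illustrative name.

```python
from math import sqrt


def variance_interval(numerator, c1, c2):
    """Interval (numerator / c2, numerator / c1) for the variance (Model V)."""
    return numerator / c2, numerator / c1


sample_variance = 1.390374332   # "Sample Variance" from the model III summary
n = 34                          # "Count" from the model III summary
numerator = sample_variance * n # Step I: about 47.2727

c1, c2 = 19.0466, 50.725        # Step II: table values quoted in the example

var_low, var_high = variance_interval(numerator, c1, c2)   # Step III
sd_low, sd_high = sqrt(var_low), sqrt(var_high)            # interval for sigma

print(f"variance: ({var_low:.4f}; {var_high:.4f})")
print(f"std dev:  ({sd_low:.4f}; {sd_high:.4f})")
```

Taking square roots of both ends gives the corresponding interval for the standard deviation, since the square root is an increasing function.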

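Model VI.
The trait of interest has a normal or close to normal distribution, large sample n > 30:
$$P\left\{\frac{s}{1 + \frac{u_\alpha}{\sqrt{2n}}} < \sigma < \frac{s}{1 - \frac{u_\alpha}{\sqrt{2n}}}\right\} \approx 1 - \alpha$$
where: $u_\alpha$ is read from tables for the standard normal distribution, N(0, 1).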
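As an illustration of Model VI, the sketch below applies the formula to the standard deviation and count reported in the model III summary (s = 1.179141354, n = 34). The choice of these inputs and the helper name `sd_interval_large_sample` are assumptions made for the example only.

```python
from math import sqrt
from statistics import NormalDist


def sd_interval_large_sample(s, n, alpha=0.05):
    """Large-sample confidence interval for sigma (Model VI)."""
    u_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = 0.05
    shift = u_alpha / sqrt(2 * n)                   # must stay below 1
    return s / (1 + shift), s / (1 - shift)


low, high = sd_interval_large_sample(s=1.179141354, n=34)
print(f"({low:.4f}; {high:.4f})")
```

Note that the interval is not symmetric around s: the upper end lies further from s than the lower end.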
Sample Size Determination for interval estimates of the average with given confidence level.
Note that in each case we have a required length of interval (the difference between the right and left ends of the confidence interval). This knowledge allows us to determine the necessary sample size, so that one can estimate the required parameter with a predetermined precision (at a specified level of confidence).
Task: what is the required sample size to obtain a confidence interval of given length (accuracy) at the chosen confidence level $1 - \alpha$?
Let 2d be the reference length of the interval.
Model I:
Length of the interval: $2d = 2u_\alpha\frac{\sigma}{\sqrt{n}}$. Thus: $n = \frac{u_\alpha^2\sigma^2}{d^2}$.
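A small sketch of the Model I sample size formula follows. The inputs ($\sigma = 2$ as in the tram example, half-length d = 1 minute) are chosen only for illustration, and the result is rounded up because the sample size must be a whole number.

```python
from math import ceil
from statistics import NormalDist


def sample_size_known_sigma(sigma, d, alpha=0.05):
    """Smallest n with u_alpha * sigma / sqrt(n) <= d (Model I)."""
    u_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    return ceil((u_alpha * sigma / d) ** 2)


n_required = sample_size_known_sigma(sigma=2, d=1)
print(n_required)
```

Halving d roughly quadruples the required sample size, since n grows with $1/d^2$.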
Model II:
Length of the interval: $2d = 2t_\alpha\frac{s}{\sqrt{n - 1}}$. Thus: $n = \frac{t_\alpha^2 s^2}{d^2} + 1$.
It is necessary to draw an initial sample (s is calculated from this sample) of size $n_0$. If it is established that $n > n_0$, an extra $n - n_0$ elements must be drawn.
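The two-stage procedure can be sketched as below, treating the five tram observations as the initial sample. This is an illustration under stated assumptions: s is computed with divisor n (matching the $\sqrt{n - 1}$ in the formula), $t_\alpha = 2.776$ is the table value for $n_0 - 1 = 4$ degrees of freedom, and the half-length d = 1 is arbitrary. `extra_observations` is a hypothetical helper name.

```python
from math import ceil
from statistics import pstdev


def extra_observations(initial_sample, d, t_alpha):
    """Required total sample size and number of extra draws (Model II)."""
    n0 = len(initial_sample)
    s = pstdev(initial_sample)                         # S with divisor n (assumption)
    n_required = ceil(t_alpha ** 2 * s ** 2 / d ** 2 + 1)
    extra = max(0, n_required - n0)                    # nothing to add if n <= n0
    return n_required, extra


n_required, extra = extra_observations([12, 15, 14, 13, 15], d=1.0, t_alpha=2.776)
print(n_required, extra)
```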
Model III:
As in model II.
Model IV:
a) if we know the magnitude of p, the required sample size is $n = \frac{u_\alpha^2\,p(1 - p)}{d^2}$;
b) if we do not know the order of magnitude of p, we use the inequality $p(1 - p) \le \frac{1}{4}$. Thus: $n = \frac{u_\alpha^2}{4d^2}$.
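Both variants of the Model IV sample size rule fit in one short function, sketched here with illustrative inputs (d = 0.03, i.e. three percentage points, and a guessed p = 0.2). When no guess for p is supplied, the bound $p(1 - p) \le \frac{1}{4}$ is used. The function name is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def sample_size_proportion(d, alpha=0.05, p=None):
    """Sample size for estimating a proportion to within d (Model IV)."""
    u_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    spread = 0.25 if p is None else p * (1 - p)   # case b) uses p(1 - p) <= 1/4
    return ceil(u_alpha ** 2 * spread / d ** 2)


n_conservative = sample_size_proportion(d=0.03)        # case b): p unknown
n_with_guess = sample_size_proportion(d=0.03, p=0.2)   # case a): p roughly known
print(n_conservative, n_with_guess)
```

The conservative bound never gives a smaller sample than the formula with a known p, so it is safe but may be wasteful when p is far from 1/2.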
Additional problems.
1) Attendance at statistics lectures is as follows (in %)
72, 70, 58, 62, 67, 58, 90, 91, 56, 68, 68, 70, 71, 52, 69
Assuming this is a random sample of lectures, give a 95% confidence interval for the average percentage attendance.
2) According to the Wall Street Journal, an average of 44 tons of carbon dioxide will be saved per year if new, more efficient lamps are used. Assume that this average is based on a random sample of 25 test runs of the new lamps and the sample standard deviation was 19 tons. Give a 90% confidence interval for the average annual savings.
3) A new optical disc system prototype was tested and it is claimed to be able to record an average of 2.2 hours of HD TV. Assume n = 10 trials and $\sigma$ = 0.2 hour. Give a 90% confidence interval for the mean recording time.
4) In a survey, Fortune rated companies on a 0 to 10 scale. A random sample of 10 firms and their scores are as follows:
FedEx 8.94, Walt Disney 8.76, CHS 8.67, McDonald's 7.82, CVS 6.80, Safeway 6.57, Starbucks 8.09, Sysco 7.42, Staples 6.45, HNI 7.29
Construct a 95% confidence interval for the average rating of a company on Fortune's entire list.
6) Use the following random sample of gasoline prices to construct a 90% confidence interval for the average price of a litre (in PLN):
3.85, 3.95, 4.95, 4.19, 4.50, 4.25, 4.50, 4.32.
7) The percentage of defective pins in a large batch of pins supplied by a vendor is to be estimated to within 1% at a 90% confidence level. The actual percentage of defective pins is guessed to be 4%.
a) What is the minimum sample size?
b) If the actual percentage of defective pins may be anywhere between 3% and 6%, tabulate the minimum sample size required for actual percentages from 3% to 6%.
c) If the cost of sampling and testing n pins is (25+6n) dollars, tabulate the cost for the same range of percentages as in part (b).
8) For all the above problems of interval estimation, determine the size of the required research sample such that the required precision is obtained with a confidence level of not less than 99%.
