
Interval Estimation and

Sample Size Decision


• Point estimation
• Interval estimation for
  ◦ Population Mean
  ◦ Population Proportion
  ◦ Population Variance
• Sample size decision in estimating
  ◦ Population Mean
  ◦ Population Proportion
  ◦ Population Variance

QAM – II by Gaurav Garg (IIM Lucknow)


Statistical Estimation
• We take data from a sample and draw conclusions about the population from which the sample was drawn.
• A sample statistic is used to estimate an unknown parameter.
• There are two types of estimation:
• Point Estimation
  ◦ Calculation of a single value of a sample statistic
• Interval Estimation
  ◦ Calculation of an interval using a sample statistic
  ◦ This interval is calculated at a desired level of confidence, e.g. 95% or 99%; the level cannot be 100%.
  ◦ Sample-to-sample variation (standard error) is also taken into consideration.
Confidence Interval Estimates
• Let θ be the unknown parameter.
• Suppose T is the point estimate of θ and E(T) = θ.
• Fix the confidence level at (1 − α) × 100%.
• α is the probability of "error".
• (1 − α) is called the confidence coefficient.
• Thus, for a 95% confidence level, α = 0.05.
• The confidence interval estimate of θ is [T − h, T + h].
• It means that P(T − h ≤ θ ≤ T + h) = 1 − α.
• Here, h = critical value × standard error.



• The formula for the confidence interval is [T − h, T + h], where
• T = unbiased (point) estimate of the unknown parameter
• h = critical value × standard error of the estimate
• The critical value is obtained using the confidence coefficient (1 − α) (discussed later).
• Lower Confidence Limit = T − h
• Upper Confidence Limit = T + h
• Width of the confidence interval = 2h
[Figure: the point estimate T at the centre of the interval, between the lower and upper confidence limits.]
• Using the Central Limit Theorem, for a large sample,
$$Z = \frac{T - \theta}{SE(T)} \sim N(0,1)$$
• where T is the unbiased point estimate of θ and SE(T) is the standard error of T.
• The confidence coefficient is fixed as (1 − α).
• The critical value $z_{\alpha/2}$ is defined by
$$P(-z_{\alpha/2} < Z < z_{\alpha/2}) = 1 - \alpha, \quad Z \sim N(0,1).$$
• We have $Z = \dfrac{T - \theta}{SE(T)} \sim N(0,1)$, and for $Z \sim N(0,1)$
$$P(-z_{\alpha/2} \le Z \le z_{\alpha/2}) = 1 - \alpha.$$
• This implies
$$P\left(-z_{\alpha/2} \le \frac{T - \theta}{SE(T)} \le z_{\alpha/2}\right) = 1 - \alpha$$
• or
$$P\left(T - z_{\alpha/2}\,SE(T) \le \theta \le T + z_{\alpha/2}\,SE(T)\right) = 1 - \alpha.$$
• Thus the (1 − α) × 100% confidence interval estimate of θ is
$$\left[T - z_{\alpha/2}\,SE(T),\; T + z_{\alpha/2}\,SE(T)\right]$$


Confidence Interval for Population Mean μ
(σ Known)
• When
  ◦ the population standard deviation σ is known, and
  ◦ the population is normally distributed, or
  ◦ the population is not normal but the sample size is large,
• the (1 − α) × 100% confidence interval estimate of μ is given by
$$\left[\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\; \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right]$$
• where $P(-z_{\alpha/2} < Z < z_{\alpha/2}) = 1 - \alpha$, $Z \sim N(0,1)$.
Commonly used confidence levels and corresponding critical values (N(0,1) distribution)
[Figure: standard normal density with central area 1 − α = 0.95 and area α/2 = 0.025 in each tail; critical values −z_{α/2} = −1.96 and z_{α/2} = 1.96.]

Confidence Level   Confidence Coefficient   α       Critical Value z_{α/2}
80%                0.80                     0.20    1.28
90%                0.90                     0.10    1.645
95%                0.95                     0.05    1.96
98%                0.98                     0.02    2.33
99%                0.99                     0.01    2.58
99.8%              0.998                    0.002   3.09
99.9%              0.999                    0.001   3.29
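The critical values in the table can be reproduced from the standard normal quantile function. The following is a minimal sketch using only Python's standard library; the helper name `z_critical` is an illustrative choice, not something defined in the slides.

```python
from statistics import NormalDist


def z_critical(confidence: float) -> float:
    """Two-sided critical value z_{alpha/2} for a confidence coefficient (1 - alpha)."""
    alpha = 1.0 - confidence
    # z_{alpha/2} leaves area alpha/2 in the upper tail of N(0,1)
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


for level in (0.80, 0.90, 0.95, 0.98, 0.99, 0.998, 0.999):
    print(f"confidence {level:.3f}  ->  z = {z_critical(level):.3f}")
```

Printed tables usually round these values to two or three decimals.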
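[Figure: distribution of the sample mean, $N(\mu, \sigma/\sqrt{n})$, centred at $\mu_{\bar{x}} = \mu$ with area α/2 in each tail and 1 − α in the centre. Below it, confidence intervals computed from different samples are drawn against the value of the sample mean; (1 − α) × 100% of such intervals will contain μ.]

• Each interval is of the form
$$\left[\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\; \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right]$$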
• Example:
• A sample of 11 circuits from a large normal population
has a mean resistance of 2.20 ohms.
• We know from past testing that the population standard
deviation is 0.35 ohms.
• Determine a 95% confidence interval for the true mean
resistance of the population.
• Ans.
$$\bar{x} \pm z_{0.025}\frac{\sigma}{\sqrt{n}} = 2.20 \pm 1.96\,(0.35/\sqrt{11}) = 2.20 \pm 0.2068$$
• The 95% confidence interval is (1.9932, 2.4068).
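The circuit example can be checked numerically. This is a small sketch, assuming the σ-known formula x̄ ± z_{α/2} σ/√n; the function name `mean_ci_known_sigma` is hypothetical.

```python
import math
from statistics import NormalDist


def mean_ci_known_sigma(xbar: float, sigma: float, n: int, confidence: float):
    """Confidence interval for the mean when the population sigma is known."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    h = z * sigma / math.sqrt(n)  # critical value x standard error
    return xbar - h, xbar + h


# Values from the example: 11 circuits, mean 2.20 ohms, sigma 0.35 ohms
lower, upper = mean_ci_known_sigma(xbar=2.20, sigma=0.35, n=11, confidence=0.95)
print(f"95% CI for the mean resistance: ({lower:.4f}, {upper:.4f})")
```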
Confidence Interval for Population Mean μ
(σ Unknown)
• Estimate σ by the sample standard deviation
$$s_1 = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$$
  (here $s_1^2$ is the unbiased estimate of σ²).
• Case 1: n is small
  ◦ The value of s_1 varies from sample to sample.
  ◦ This introduces extra variability.
  ◦ The normal distribution cannot be used.
  ◦ We use the t distribution with (n − 1) d.f.
• Case 2: n is large
  ◦ When n is large, the t distribution approaches the normal distribution.
  ◦ We use the N(0,1) distribution.


Case 1: σ is unknown and n is small
• Assumption: Population has normal distribution
• The (1 − α) × 100% confidence interval estimate of μ is given by
$$\left[\bar{x} - t_{\alpha/2}\frac{s_1}{\sqrt{n}},\; \bar{x} + t_{\alpha/2}\frac{s_1}{\sqrt{n}}\right]$$
• where $t_{\alpha/2}$ is such that
• $P(-t_{\alpha/2} < T < t_{\alpha/2}) = 1 - \alpha$, for $T \sim t(n-1)$.


Some critical values of the t(n−1) distribution for given α and d.f. (n−1)
[Figure: t(n−1) density with central area 1 − α and area α/2 in each tail beyond −t_{α/2} and t_{α/2}.]

d.f. (n−1)   Critical Value at α = 0.05   Critical Value at α = 0.10
1            12.706                       6.314
2            4.303                        2.920
3            3.182                        2.353
4            2.776                        2.132
5            2.571                        2.015
6            2.447                        1.943
7            2.365                        1.895
• Consider the same example
• A sample of 11 circuits from a large normal population
has a mean resistance of 2.20 ohms.
• Population standard deviation is not known.
• Sample standard deviation (s1) is 0.35 ohms.
• Determine a 95% confidence interval for the true mean
resistance of the population.
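• Ans. With n − 1 = 10 d.f., $t_{0.025} = 2.22814$, so
$$\bar{x} \pm t_{0.025}\frac{s_1}{\sqrt{n}} = 2.20 \pm 2.22814\,(0.35/\sqrt{11}) = 2.20 \pm 0.2351$$
• The 95% confidence interval is (1.9649, 2.4351).
• If we are given s_2 (the standard deviation computed with divisor n) instead of s_1, we can use the following formula:
$$s_1^2 = \frac{n}{n-1}\,s_2^2$$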
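The same calculation with the t distribution can be sketched as follows. Python's standard library has no t quantile function, so this sketch takes the critical value as an argument and uses the value 2.22814 quoted in the answer; the function name `mean_ci_t` is hypothetical.

```python
import math


def mean_ci_t(xbar: float, s1: float, n: int, t_crit: float):
    """Confidence interval for the mean using a supplied t critical value with n-1 d.f."""
    h = t_crit * s1 / math.sqrt(n)  # critical value x estimated standard error
    return xbar - h, xbar + h


# Values from the example; t_crit is t_{0.025} with 10 d.f., as quoted in the answer
lower, upper = mean_ci_t(xbar=2.20, s1=0.35, n=11, t_crit=2.22814)
print(f"95% CI for the mean resistance: ({lower:.4f}, {upper:.4f})")
```

The interval is wider than the σ-known interval for the same data, reflecting the extra variability from estimating σ.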
Case 2: σ is unknown and n is large
• The population may or may not have a normal distribution.
• The (1 − α) × 100% confidence interval estimate of μ is given by
$$\left[\bar{x} - z_{\alpha/2}\frac{s_1}{\sqrt{n}},\; \bar{x} + z_{\alpha/2}\frac{s_1}{\sqrt{n}}\right]$$
• where $z_{\alpha/2}$ is such that
• for $Z \sim N(0,1)$, $P(-z_{\alpha/2} < Z < z_{\alpha/2}) = 1 - \alpha$.


Confidence Interval Estimate of μ

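• σ known
  ◦ n small, normal distribution: $\left[\bar{x} - z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}},\; \bar{x} + z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}\right]$
  ◦ n large, any distribution: $\left[\bar{x} - z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}},\; \bar{x} + z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}\right]$
• σ unknown
  ◦ n small, normal distribution: $\left[\bar{x} - t_{\alpha/2}\dfrac{s_1}{\sqrt{n}},\; \bar{x} + t_{\alpha/2}\dfrac{s_1}{\sqrt{n}}\right]$
  ◦ n large, any distribution: $\left[\bar{x} - z_{\alpha/2}\dfrac{s_1}{\sqrt{n}},\; \bar{x} + z_{\alpha/2}\dfrac{s_1}{\sqrt{n}}\right]$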
Confidence Intervals for Population Proportion π
Case 1:
• Small Sample: out of scope
Case 2:
• Large Sample
• We know that, for large n,
$$Z = \frac{p - \pi}{\sqrt{\pi(1-\pi)/n}} \sim N(0,1)$$
• For $Z \sim N(0,1)$, we have
$$P(-z_{\alpha/2} \le Z \le z_{\alpha/2}) = 1 - \alpha$$
• or
$$P\left(-z_{\alpha/2} \le \frac{p - \pi}{\sqrt{\pi(1-\pi)/n}} \le z_{\alpha/2}\right) = 1 - \alpha$$
• or
$$P\left(p - z_{\alpha/2}\sqrt{\pi(1-\pi)/n} \le \pi \le p + z_{\alpha/2}\sqrt{\pi(1-\pi)/n}\right) = 1 - \alpha$$
• Thus the (1 − α) × 100% CI estimate of π is given by
$$\left[p - z_{\alpha/2}\sqrt{\pi(1-\pi)/n},\; p + z_{\alpha/2}\sqrt{\pi(1-\pi)/n}\right]$$
• This expression itself contains π, which is unknown.
• So, this CI estimate cannot be computed as it stands.
• We replace π in the standard error by its unbiased estimate p.
• Then, the (1 − α) × 100% CI estimate of π is given by
$$\left[p - z_{\alpha/2}\sqrt{pq/n},\; p + z_{\alpha/2}\sqrt{pq/n}\right]$$
• where q = 1 − p.
• Required assumption: large sample only.
• Example:
• A random sample of 100 people shows that 25
have opened IRA (individual retirement
arrangement) this year.
• Construct a 95% confidence interval for the true
proportion of population who have opened IRA.
• Ans.
$$p \pm z_{0.025}\sqrt{p(1-p)/n} = 25/100 \pm 1.96\sqrt{0.25(0.75)/100} = 0.25 \pm 1.96\,(0.0433)$$
• The 95% confidence interval is (0.1651, 0.3349).
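The IRA example can be reproduced with a short sketch of the large-sample interval p ± z_{α/2}√(pq/n). The function name `proportion_ci` is a hypothetical choice.

```python
import math
from statistics import NormalDist


def proportion_ci(successes: int, n: int, confidence: float):
    """Large-sample confidence interval for a population proportion."""
    p = successes / n
    q = 1.0 - p
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    h = z * math.sqrt(p * q / n)  # critical value x estimated standard error
    return p - h, p + h


# 25 of 100 sampled people opened an IRA
lower, upper = proportion_ci(successes=25, n=100, confidence=0.95)
print(f"95% CI for the proportion: ({lower:.4f}, {upper:.4f})")
```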
Confidence Interval for Population Variance σ²
• Variance is an inverse measure of the group’s
homogeneity.
• Variance is an important indicator of total quality in
standardized products and services.
• Managers improve processes by reducing variance.
• Variance is a measure of financial risk.
• The variance of rates of return helps managers assess
financial and capital investment alternatives.
• Variability is a reality in global markets.
• Productivity, wages, and costs of living vary between
regions and nations.



Confidence Interval for Population Variance σ²
Case 1:
• Small Sample
• Parent Population is Normal
• Let us take a sample $x_1, x_2, \ldots, x_n$ from N(μ, σ).
• Then,
$$\sum_{i=1}^{n}\left(\frac{x_i - \bar{x}}{\sigma}\right)^2 \sim \chi^2_{(n-1)}$$
• We know that
$$s_1^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$$
• So,
$$\chi^2 = \frac{(n-1)\,s_1^2}{\sigma^2} \sim \chi^2_{(n-1)}$$


• Then, the (1 − α) × 100% CI estimate of σ² is given by
$$\frac{(n-1)\,s_1^2}{\chi^2_{\alpha/2}} \le \sigma^2 \le \frac{(n-1)\,s_1^2}{\chi^2_{1-\alpha/2}}$$
• or
$$\left[\frac{(n-1)\,s_1^2}{\chi^2_{\alpha/2}},\; \frac{(n-1)\,s_1^2}{\chi^2_{1-\alpha/2}}\right]$$
• Here, $\chi^2_{\alpha/2}$ and $\chi^2_{1-\alpha/2}$ are critical values (with upper-tail areas α/2 and 1 − α/2 respectively) obtained using the Chi Square distribution with (n − 1) d.f.
[Figure: Chi Square density with df = 7 and α = 0.10; central area 1 − α = 0.90 and area α/2 = 0.05 in each tail; critical values 2.167 and 14.067.]
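The Chi Square interval can be sketched using the two critical values shown for df = 7 and α = 0.10 (2.167 and 14.067). Python's standard library has no Chi Square quantile function, so the critical values are passed in. The sample variance of 4.0 and the function name `variance_ci_chi2` are assumptions made purely for illustration.

```python
def variance_ci_chi2(s1_sq: float, n: int, chi2_upper: float, chi2_lower: float):
    """CI for sigma^2 from a normal sample.

    chi2_upper is the critical value with upper-tail area alpha/2,
    chi2_lower is the critical value with upper-tail area 1 - alpha/2,
    both for (n - 1) d.f.
    """
    numerator = (n - 1) * s1_sq
    # Dividing by the larger critical value gives the lower limit
    return numerator / chi2_upper, numerator / chi2_lower


# Hypothetical sample: n = 8 (so df = 7) with sample variance 4.0
lower, upper = variance_ci_chi2(s1_sq=4.0, n=8, chi2_upper=14.067, chi2_lower=2.167)
print(f"90% CI for the variance: ({lower:.3f}, {upper:.3f})")
```

The interval is not symmetric about the sample variance, because the Chi Square distribution is skewed.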
• Example:
• The cholesterol concentrations in the yolks of a
sample of 18 randomly selected eggs laid by
genetically engineered chickens were found to
have a mean value of 9.38 mg/g of yolk and a
standard deviation of 1.62 mg/g.
• Use this information to construct a confidence
interval estimate of the true variance of the
cholesterol concentration in these egg yolks.


Confidence Interval for Population Variance σ²
Case 2:
• Large Sample
• Parent Population may or may not be Normal
• We know that $E(s_1^2) = \sigma^2$.
• Also, $S.E.(s_1^2) = \sigma^2\sqrt{2/(n-1)}$ (proof is out of scope).
• So, for large samples,
$$\frac{s_1^2 - \sigma^2}{\sigma^2\sqrt{2/(n-1)}} \sim N(0,1)$$
• Using this, the (1 − α) × 100% CI estimate of σ² is given by
$$\left[\frac{s_1^2}{1 + z_{\alpha/2}\sqrt{2/(n-1)}},\; \frac{s_1^2}{1 - z_{\alpha/2}\sqrt{2/(n-1)}}\right]$$
• Example:
• A technologist is developing a new method for processing
a food material.
• For best quality, it is important to control moisture content
in the final product.
• So, as one part of determining the practicality of the new
method, the technologist must estimate the variability of
water content in the resulting product.
• He collects 50 specimens of product from the new
process, and determines the percent water in each.
• These 50 specimens give a sample mean water content of
43.24% and a sample standard deviation of 7.93%.
• Compute a 95% confidence interval estimate of the true
variance of the percentage water for this new process.

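The moisture-content example has no worked answer on the slide. A minimal sketch of the large-sample interval s₁²/(1 ± z_{α/2}√(2/(n−1))) applied to its numbers is given below; the function name `variance_ci_large_sample` is hypothetical.

```python
import math
from statistics import NormalDist


def variance_ci_large_sample(s1: float, n: int, confidence: float):
    """Large-sample CI for sigma^2 based on the normal approximation for s1^2."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    k = z * math.sqrt(2.0 / (n - 1))
    if k >= 1.0:
        raise ValueError("sample too small for this approximation (upper limit undefined)")
    s1_sq = s1 ** 2
    return s1_sq / (1.0 + k), s1_sq / (1.0 - k)


# 50 specimens with sample standard deviation 7.93 (percent water)
lower, upper = variance_ci_large_sample(s1=7.93, n=50, confidence=0.95)
print(f"95% CI for the variance: ({lower:.2f}, {upper:.2f})")
```

The guard on `k` reflects that the upper limit is only defined when z_{α/2}√(2/(n−1)) is less than 1, which is one reason the formula is reserved for large samples.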


Sample Size Decision
(when Estimating μ)
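• We have seen (for sufficiently large n) that
$$\bar{x} \sim N(\mu, \sigma/\sqrt{n}) \quad \text{or} \quad Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \sim N(0,1)$$
• Error of estimation: $e = |\bar{x} - \mu|$
• Fix the confidence level at (1 − α) × 100%.
• Obtain the critical value $z_{\alpha/2}$ using N(0,1) such that $P(-z_{\alpha/2} < Z < z_{\alpha/2}) = 1 - \alpha$.
• Then, we have
$$z_{\alpha/2} = \frac{e}{\sigma/\sqrt{n}} \quad \text{or} \quad n = \left(\frac{\sigma\, z_{\alpha/2}}{e}\right)^2$$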


• Thus the sample size for estimating the population mean μ is
$$n = \left(\frac{\sigma\, z_{\alpha/2}}{e}\right)^2$$
• The critical value $z_{\alpha/2}$ can be taken from the table.
• The estimation error (e) should be fixed by the researcher in advance.
• Clearly, e ≠ 0.
• The population standard deviation σ can be estimated from some other small sample or pilot survey as
• Range/6 or by the sample standard deviation.
• Example:
• In a pilot survey, it is observed that the smallest
observation is 6 and the largest observation is 276.
• What should be the sample size needed to estimate the
population mean within ± 5 with 90% confidence level?
• Ans.
• Estimate of the population standard deviation: $\hat{\sigma} = \dfrac{276 - 6}{6} = 45$
• Estimation error: e = 5
• For a 90% confidence level, the critical value is $z_{0.05} = 1.645$.
• So,
$$n = \left(\frac{\hat{\sigma}\, z_{0.05}}{e}\right)^2 = \left(\frac{45 \times 1.645}{5}\right)^2 = 219.19 \approx 219$$
• Rounding up to 220 instead guarantees the stated margin.
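The pilot-survey example can be sketched in code as below. Rounding up with `math.ceil` is a design choice of this sketch (it guarantees the error bound), whereas the slide rounds to the nearest integer; the function name `sample_size_for_mean` is hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_for_mean(sigma: float, e: float, confidence: float) -> float:
    """Unrounded sample size n = (sigma * z_{alpha/2} / e)^2."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    return (sigma * z / e) ** 2


# Pilot survey: smallest observation 6, largest 276, sigma estimated as Range/6
sigma_hat = (276 - 6) / 6
n_raw = sample_size_for_mean(sigma=sigma_hat, e=5, confidence=0.90)
n_needed = math.ceil(n_raw)  # assumption of this sketch: always round up
print(f"sigma_hat = {sigma_hat}, unrounded n = {n_raw:.2f}, n needed = {n_needed}")
```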


Sample Size Decision
(when Estimating 𝛑)
• Similarly, the sample size for estimating the population proportion π is given by
$$n = \frac{\pi(1-\pi)\,(z_{\alpha/2})^2}{e^2}$$
• For a fixed confidence coefficient (1 − α), the critical value $z_{\alpha/2}$ can be taken from the normal table.
• The estimation error (e = |p − π|) should be fixed by the researcher in advance. Clearly, e ≠ 0.
• The population proportion π can be estimated from some other small sample or pilot survey.
• If no information is available, it can be decided by the researcher using past experience or can be taken as 0.5.


• Example:
• How large a sample would be necessary to
estimate the true proportion defective in a large
population within ±3%, with 95% confidence?
• (Assume a pilot sample yields p = 0.12)
• Ans.
• Estimate of the population proportion: p = 0.12
• Estimation error: e = 3/100 = 0.03
• For a 95% confidence level, the critical value is $z_{0.025} = 1.96$.
• So,
$$n = \frac{pq\,(z_{0.025})^2}{e^2} = \frac{0.12 \times 0.88 \times 1.96 \times 1.96}{0.03 \times 0.03} = 450.75 \approx 451$$
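A short sketch of the proportion sample-size formula, applied to the defective-proportion example, is shown below. The function name `sample_size_for_proportion` is hypothetical, and rounding up is this sketch's choice.

```python
import math
from statistics import NormalDist


def sample_size_for_proportion(p: float, e: float, confidence: float) -> float:
    """Unrounded sample size n = p(1-p) z_{alpha/2}^2 / e^2."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    return p * (1.0 - p) * z ** 2 / e ** 2


# Pilot estimate p = 0.12, error within +/- 3%, 95% confidence
n_raw = sample_size_for_proportion(p=0.12, e=0.03, confidence=0.95)
n_needed = math.ceil(n_raw)
print(f"unrounded n = {n_raw:.2f}, n needed = {n_needed}")

# With no prior information, p = 0.5 gives the largest (most conservative) n
n_conservative = math.ceil(sample_size_for_proportion(p=0.5, e=0.03, confidence=0.95))
print(f"conservative n with p = 0.5: {n_conservative}")
```

The second call illustrates why 0.5 is the fallback: p(1 − p) is maximised at p = 0.5, so the resulting n is sufficient for any true proportion.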


Sample Size Decision
(when Estimating σ²)
• We know that, for large samples,
$$\frac{s_1^2 - \sigma^2}{\sigma^2\sqrt{2/(n-1)}} \sim N(0,1)$$
• Similarly, the sample size for estimating the population variance σ² is given by
$$n = 1 + \frac{2\sigma^4\, z_{\alpha/2}^2}{e^2}$$
• For a fixed confidence coefficient (1 − α), the critical value $z_{\alpha/2}$ can be taken from the normal table.
• The estimation error $e = |s_1^2 - \sigma^2|$ should be fixed by the researcher in advance. Clearly, e ≠ 0.
• The population variance σ² can be estimated from some other small sample or pilot survey.
• If no information is available, it can be decided by the researcher using past experience or can be taken as the square of Range/6.
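The slides give no worked example for this formula, so the following sketch uses made-up inputs: σ is taken as 45 (Range/6 from a pilot with range 270), and the error bound e = 500 on the variance and the 90% level are hypothetical choices. The function name `sample_size_for_variance` is also an assumption.

```python
import math
from statistics import NormalDist


def sample_size_for_variance(sigma_sq: float, e: float, confidence: float) -> float:
    """Unrounded sample size n = 1 + 2 * sigma^4 * z_{alpha/2}^2 / e^2."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    return 1.0 + 2.0 * sigma_sq ** 2 * z ** 2 / e ** 2


# Hypothetical inputs: sigma^2 = 45^2, error bound 500 (in squared units), 90% confidence
n_raw = sample_size_for_variance(sigma_sq=45.0 ** 2, e=500.0, confidence=0.90)
n_needed = math.ceil(n_raw)
print(f"unrounded n = {n_raw:.2f}, n needed = {n_needed}")
```

Note that e is measured in the units of the variance (squared units of the data), so a sensible e is much larger than an error bound on the mean.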
Estimating Total
• In auditing, one is often more interested in an estimate of the population total amount.
• The point estimate of the total is $N\bar{x}$.
• The CI estimate at the (1 − α) × 100% confidence level is given by
$$N\bar{x} \pm N\, t_{\alpha/2}\frac{s_1}{\sqrt{n}} \quad \text{(small sample size, normal distribution)}$$
$$N\bar{x} \pm N\, z_{\alpha/2}\frac{s_1}{\sqrt{n}} \quad \text{(large sample size)}$$
• The finite population correction (fpc) should be used when n/N > 0.05:
$$N\bar{x} \pm N\, t_{\alpha/2}\frac{s_1}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}} \quad \text{(small sample size, normal distribution)}$$
$$N\bar{x} \pm N\, z_{\alpha/2}\frac{s_1}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}} \quad \text{(large sample size)}$$


Example: A firm has a population of 1000 accounts and
wishes to estimate the total population value.
• A sample of 80 accounts is selected with an average
balance of $87.6 and a standard deviation of $22.3.
• Find the 95% confidence interval estimate of the total
balance.
• Ans: N = 1000, n = 80, $\bar{x} = 87.6$, $s_1 = 22.3$. Since n/N = 0.08 > 0.05, the fpc is used:
$$N\bar{x} \pm N\, z_{0.025}\frac{s_1}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}} = (1000)(87.6) \pm (1000)(1.96)\frac{22.3}{\sqrt{80}}\sqrt{\frac{1000-80}{1000-1}} = 87{,}600 \pm 4{,}689.51$$
• The 95% confidence interval is (82,910.49, 92,289.51).
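The total-balance example can be checked with a short sketch of the large-sample formula, applying the fpc only when n/N exceeds 0.05 as the slides prescribe. The function name `total_ci` is hypothetical, and z is computed from the standard normal as written in the formula.

```python
import math
from statistics import NormalDist


def total_ci(N: int, n: int, xbar: float, s1: float, confidence: float):
    """Large-sample CI for a population total; returns (point estimate, margin)."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    margin = N * z * s1 / math.sqrt(n)
    if n / N > 0.05:
        # finite population correction
        margin *= math.sqrt((N - n) / (N - 1))
    return N * xbar, margin


point, margin = total_ci(N=1000, n=80, xbar=87.6, s1=22.3, confidence=0.95)
print(f"total = {point:,.2f} +/- {margin:,.2f}")
print(f"95% CI: ({point - margin:,.2f}, {point + margin:,.2f})")
```

With z of about 1.96 the margin comes out near 4,689.5; a t critical value with 79 d.f. would give a slightly wider interval.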
Estimating Total Difference
• An auditor may wish to estimate the magnitude of
errors
• An error is the difference of the values reached
during audit and the original values recorded.
• A sample of size n items is collected.
• Let D_i denote the error in the ith item (i = 1, 2, …, n).
  ◦ D_i = 0, if the auditor finds that the original value is correct
  ◦ D_i > 0, if the audited value is larger than the original value
  ◦ D_i < 0, if the audited value is smaller than the original value


• Define:
$$\bar{D} = \frac{1}{n}\sum_{i=1}^{n} D_i \quad \text{and} \quad s_D = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(D_i - \bar{D})^2}$$
• The point estimate of the total difference is $N\bar{D}$.
• CI estimate of the total difference:
$$N\bar{D} \pm N\, t_{\alpha/2}\frac{s_D}{\sqrt{n}} \quad \text{(for small samples, normal distribution)}$$
$$N\bar{D} \pm N\, z_{\alpha/2}\frac{s_D}{\sqrt{n}} \quad \text{(for large samples)}$$
• The fpc should be used when n/N > 0.05:
$$N\bar{D} \pm N\, t_{\alpha/2}\frac{s_D}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}} \quad \text{(for small samples, normal distribution)}$$
$$N\bar{D} \pm N\, z_{\alpha/2}\frac{s_D}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}} \quad \text{(for large samples)}$$


• Example:
• Econe Dresses has 1200 inventory items.
• In the past, 15% of the items were incorrectly priced.
• A sample of 120 items was selected.
• Historical cost of each item was compared with
the audited value.
• 15 items differ in their historical costs and
audited values.
• These values are as follows:



Historical Cost   Audited Value   D_i (audited − historical)
261               240             -21
87                105             18
201               276             75
121               110             -11
315               298             -17
411               356             -55
249               211             -38
216               305             89
21                210             189
140               152             12
129               112             -17
340               216             -124
341               402             61
135               97              -38
228               220             -8

• n = 120, N = 1200; the remaining 105 sampled items have D_i = 0.
• $\bar{D} = 0.95833$, $s_D = 25.24482$
• n/N = 120/1200 = 0.1 > 0.05, so we use the fpc.
• The 95% CI is
$$N\bar{D} \pm N\, z_{0.025}\frac{s_D}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}} = 1200\,(0.95833) \pm 1200 \times 1.96 \times \frac{25.24482}{\sqrt{120}}\sqrt{\frac{1200-120}{1200-1}}$$
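The inventory example can be evaluated with the sketch below. It takes D_i as audited value minus historical cost, following the definition of D_i given earlier, and treats the 105 sampled items without discrepancies as D_i = 0. The variable names are illustrative.

```python
import math
from statistics import NormalDist, mean, stdev

# (historical cost, audited value) for the 15 items that differ
pairs = [
    (261, 240), (87, 105), (201, 276), (121, 110), (315, 298),
    (411, 356), (249, 211), (216, 305), (21, 210), (140, 152),
    (129, 112), (340, 216), (341, 402), (135, 97), (228, 220),
]
N, n = 1200, 120

# D_i = audited - historical; the other n - 15 sampled items have D_i = 0
diffs = [audited - historical for historical, audited in pairs]
diffs += [0] * (n - len(pairs))

d_bar = mean(diffs)
s_d = stdev(diffs)  # uses divisor n - 1, matching s_D

z = NormalDist().inv_cdf(0.975)  # 95% confidence
fpc = math.sqrt((N - n) / (N - 1))  # n/N = 0.1 > 0.05, so the fpc applies
point = N * d_bar
margin = N * z * s_d / math.sqrt(n) * fpc

print(f"D_bar = {d_bar:.5f}, s_D = {s_d:.5f}")
print(f"total difference = {point:.2f} +/- {margin:.2f}")
```

The interval contains zero, so on this sample the data do not rule out a total difference of zero at the 95% level.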
SUMMARY (INTERVAL ESTIMATES)
• Population Mean (μ), σ known
  ◦ Small sample (Normal Distribution) or Large sample (Any Distribution): $\bar{x} \pm z_{\alpha/2}\,\dfrac{\sigma}{\sqrt{n}}$
• Population Mean (μ), σ not known
  ◦ Small sample (Normal Distribution): $\bar{x} \pm t_{\alpha/2}\,\dfrac{s_1}{\sqrt{n}}$
  ◦ Large sample (Any Distribution): $\bar{x} \pm z_{\alpha/2}\,\dfrac{s_1}{\sqrt{n}}$
• Population Proportion (π)
  ◦ Small sample: OUT OF SCOPE
  ◦ Large sample (Any Distribution): $p \pm z_{\alpha/2}\sqrt{pq/n}$
• Population Variance (σ²)
  ◦ Small sample (Normal Distribution): $\left[\dfrac{(n-1)s_1^2}{\chi^2_{\alpha/2}},\; \dfrac{(n-1)s_1^2}{\chi^2_{1-\alpha/2}}\right]$
  ◦ Large sample (Any Distribution): $\dfrac{s_1^2}{1 \pm z_{\alpha/2}\sqrt{2/(n-1)}}$
SUMMARY (SAMPLE SIZE DECISION)
• For estimating Population Mean (μ), Large sample (Any Distribution): $n = \left(\dfrac{\sigma\, z_{\alpha/2}}{e}\right)^2$
• For estimating Population Proportion (π), Large sample (Any Distribution): $n = \dfrac{\pi(1-\pi)\,(z_{\alpha/2})^2}{e^2}$
• For estimating Population Variance (σ²), Large sample (Any Distribution): $n = 1 + \dfrac{2\sigma^4\, z_{\alpha/2}^2}{e^2}$
