Table of Contents
1. INTRODUCTION
   1.1 STRATIFIED SAMPLING PROCEDURE
2. PRINCIPLES OF STRATIFICATION
3. NOTATIONS FOR STRATIFICATION
4. ESTIMATION OF POPULATION MEAN, TOTAL AND THEIR VARIANCES
5. ESTIMATION OF POPULATION PROPORTION AND ITS VARIANCE
6. ESTIMATION OF VARIANCE
7. EXAMPLES: ESTIMATION OF MEAN/TOTAL, PROPORTION AND SES
   7.1 ESTIMATION OF MEAN/TOTAL SE AND CI
   7.2 ESTIMATION OF PROPORTION, SE AND CI
8. ALLOCATION OF SAMPLE TO STRATA
   8.1 OPTIMUM ALLOCATION
   8.2 NEYMAN ALLOCATION
   8.3 PROPORTIONAL ALLOCATION
   8.4 SAMPLE REQUIRED AND ALLOCATION WITH A SPECIFIED COST C
   8.5 EXAMPLES: PROPORTIONAL AND NEYMAN ALLOCATIONS
9. REFERENCES
1. INTRODUCTION
We have seen in SRS that

$$V(\bar{y}) = \left(\frac{1}{n} - \frac{1}{N}\right) S^2 = \left(1 - \frac{n}{N}\right)\frac{S^2}{n} = (1 - f)\,\frac{S^2}{n},$$

where $f = n/N$ is the sampling fraction and

$$S^2 = \frac{1}{N-1}\sum_{i=1}^{N}\left(Y_i - \bar{Y}\right)^2 = \frac{1}{N-1}\left[\sum_{i=1}^{N} Y_i^2 - N\bar{Y}^2\right],$$

i.e. the precision of a sample estimate of the population mean depends not only upon the size of the sample ($n$) and the sampling fraction ($f$) but also on the variability or heterogeneity ($S^2$) of the population.

Apart from the size of the sample ($n$), therefore, the only way of increasing the precision of an estimate is to devise sampling procedures which will effectively reduce the heterogeneity ($S^2$). One such procedure is stratified sampling.
Prepared by Dr. V. K. Dwivedi, Department of Statistics, UB for STA 354: Survey Research Methods
Stratification serves many useful purposes. The principal ones are the following:
i. To increase the precision of sample estimates. Stratification, if done correctly, gives more precise (lower-variance) estimates for the whole population, because the variance within each stratum is often lower than the variance in the whole population.
ii. Estimates for each stratum can be obtained separately. Stratification allows estimates for each stratum, say cities/towns, urban villages and rural areas, to be obtained separately. This can be useful to different organizations for development projects.
iii. Flexibility in the choice of sample designs for different strata. Since sampling is done separately in each stratum, it may in some situations be necessary to use a different sampling design in each stratum.
iv. Convenience and reduced costs. Stratification can save time and cost, since it is often more convenient to sample from a stratum than from the entire population. For example, in business surveys a mail interview may be used for large firms and a personal interview for small firms.
v. Guaranteed representation of important domains and special sub-populations. A domain is a subset of the population for which estimates are desired. It may be a stratum, a combination of strata, or an administrative area. Domains can also be demographic sub-populations defined by characteristics such as age, race and sex.
1.1 Stratified sampling procedure
Of all the methods of sampling, stratified sampling is the procedure most commonly used in surveys. In stratified sampling, the population of $N$ units is sub-divided into $H$ sub-populations called strata, the $h$th sub-population having $N_h$ units ($h = 1, 2, \dots, H$). These sub-populations are non-overlapping and together comprise the whole population, so that

$$N_1 + N_2 + \dots + N_H = N.$$

A sample is drawn from each stratum independently, the sample size within the $h$th stratum being $n_h$ ($h = 1, 2, \dots, H$), such that

$$n_1 + n_2 + \dots + n_H = n.$$

The procedure of taking samples in this way is known as stratified sampling. If a sample is taken randomly from each stratum, the procedure is known as stratified random sampling. The theory of stratified sampling deals with the properties of the estimates from a stratified sample and with the best choice of the sample sizes $n_h$ to obtain maximum precision.
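The procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `stratified_random_sample`, the stratum labels and the unit values are hypothetical and do not come from these notes. The sketch draws an independent simple random sample without replacement from each stratum.

```python
import random


def stratified_random_sample(strata, sizes, seed=None):
    """Draw an independent SRS (without replacement) from each stratum.

    strata: dict mapping a stratum label to the list of its N_h units.
    sizes:  dict mapping the same labels to the sample sizes n_h.
    """
    rng = random.Random(seed)
    sample = {}
    for label, units in strata.items():
        n_h = sizes[label]
        if n_h > len(units):
            raise ValueError(f"n_h exceeds N_h in stratum {label!r}")
        # Sampling is done separately, hence independently, in each stratum.
        sample[label] = rng.sample(units, n_h)
    return sample


# Hypothetical population of N = 10 units split into H = 2 strata.
strata = {"urban": [12, 15, 11, 14], "rural": [3, 5, 4, 6, 2, 7]}
sizes = {"urban": 2, "rural": 3}  # n = 5
sample = stratified_random_sample(strata, sizes, seed=1)
print(sample)
```

Passing a `seed` only makes the illustration reproducible; a real survey would draw from a sampling frame for each stratum.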
Remark: The main objective of stratification is to give a better cross-section of the population so as to gain a higher degree of relative precision.
2. PRINCIPLES OF STRATIFICATION
The principles to be followed in stratifying a population are summarized below:
i. The strata should be non-overlapping and should together comprise the whole population.
ii. The stratification of the population should be done in such a way that strata are homogeneous within themselves with respect to the character under study.
iii. In many practical situations, when it is difficult to stratify with respect to the character under study, administrative convenience may be considered as the basis for stratification.
iv. If the limit of precision for certain sub-populations is given, it will be better to treat each such sub-population as a stratum.
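3. NOTATIONS FOR STRATIFICATION
i) $N$ denotes the total population size.
ii) $n$ denotes the total sample size.
iii) $N_h$ is the size of the $h$th stratum, such that $\sum_{h=1}^{H} N_h = N$.
iv) $n_h$ is the sample size in the $h$th stratum, such that $\sum_{h=1}^{H} n_h = n$.
v) $Y_h = \sum_{i=1}^{N_h} y_{hi}$ is the population total for the $h$th stratum.
vi) $y_{hi}$ is the value obtained for the $i$th unit in the $h$th stratum.
vii) $\bar{Y} = \frac{1}{N}\sum_{h=1}^{H}\sum_{i=1}^{N_h} y_{hi} = \frac{1}{N}\sum_{h=1}^{H} N_h \bar{Y}_h = \sum_{h=1}^{H} W_h \bar{Y}_h$ is the population mean per unit.
viii) $W_h = N_h / N$ is the stratum weight, representing the proportion of the population in the $h$th stratum, such that $\sum_{h=1}^{H} W_h = 1$.
ix) $\bar{Y}_h = \frac{1}{N_h}\sum_{i=1}^{N_h} y_{hi}$ is the population mean of the $h$th stratum.
x) $\bar{y}_h = \frac{1}{n_h}\sum_{i=1}^{n_h} y_{hi}$ is the sample mean of the $h$th stratum.
xi) $f_h = n_h / N_h$ is the sampling fraction for the $h$th stratum.
xii) $S_h^2 = \frac{1}{N_h - 1}\sum_{i=1}^{N_h}\left(y_{hi} - \bar{Y}_h\right)^2 = \frac{1}{N_h - 1}\left[\sum_{i=1}^{N_h} y_{hi}^2 - N_h \bar{Y}_h^2\right]$ is the population variance for the $h$th stratum.
xiii) $s_h^2 = \frac{1}{n_h - 1}\sum_{i=1}^{n_h}\left(y_{hi} - \bar{y}_h\right)^2 = \frac{1}{n_h - 1}\left[\sum_{i=1}^{n_h} y_{hi}^2 - n_h \bar{y}_h^2\right]$ is the sample variance for the $h$th stratum.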
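| Stratum | Population units | $N_h$ | $Y_h = \sum_{i=1}^{N_h} y_{hi}$ | $n_h$ | $\bar{y}_h = \frac{1}{n_h}\sum_{i=1}^{n_h} y_{hi}$ |
|---|---|---|---|---|---|
| 1 | $y_{11}, \dots, y_{1N_1}$ | $N_1$ | $Y_1$ | $n_1$ | $\bar{y}_1$ |
| 2 | $y_{21}, \dots, y_{2N_2}$ | $N_2$ | $Y_2$ | $n_2$ | $\bar{y}_2$ |
| ... | ... | ... | ... | ... | ... |
| $h$ | $y_{h1}, \dots, y_{hN_h}$ | $N_h$ | $Y_h$ | $n_h$ | $\bar{y}_h$ |
| ... | ... | ... | ... | ... | ... |
| $H$ | $y_{H1}, \dots, y_{HN_H}$ | $N_H$ | $Y_H$ | $n_H$ | $\bar{y}_H$ |
| Total | | $N$ | $Y$ | $n$ | |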
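4. ESTIMATION OF POPULATION MEAN, TOTAL AND THEIR VARIANCES
For the population mean per unit $\bar{Y}$, the unbiased estimate used in stratified sampling is $\bar{y}_{st}$ (st for stratified), where

$$\bar{y}_{st} = \frac{1}{N}\sum_{h=1}^{H} N_h \bar{y}_h = \sum_{h=1}^{H} W_h \bar{y}_h \qquad (1)$$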
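Equation (1) is a weighted average of the stratum sample means, with the stratum weights $W_h = N_h / N$ as weights. A minimal sketch, assuming hypothetical stratum sizes and sample means (the function name `stratified_mean` is also just an illustration):

```python
def stratified_mean(N_h, ybar_h):
    """Equation (1): weighted mean of stratum sample means, weights W_h = N_h / N."""
    N = sum(N_h)
    return sum((Nh / N) * yb for Nh, yb in zip(N_h, ybar_h))


# Hypothetical inputs: two strata of sizes 60 and 40 with sample means 10 and 20.
N_h = [60, 40]
ybar_h = [10.0, 20.0]
ybar_st = stratified_mean(N_h, ybar_h)  # 0.6 * 10 + 0.4 * 20
print(ybar_st)
```

Note that the weights depend only on the known stratum sizes, not on the sample sizes $n_h$.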
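Theorem 1: If in every stratum the sample estimator $\bar{y}_h$ is unbiased and samples are drawn independently in different strata, then
(i) $\bar{y}_{st}$ is an unbiased estimator of the population mean $\bar{Y}$, and
(ii) its sampling variance is given by

$$V(\bar{y}_{st}) = \sum_{h=1}^{H} W_h^2\, V(\bar{y}_h) \qquad (2a)$$

Proof 1: (i) Since a simple random sample is taken in each stratum, the stratum sample mean $\bar{y}_h$ is an unbiased estimator of the stratum mean $\bar{Y}_h$. Thus,

$$E(\bar{y}_{st}) = E\left\{\sum_{h=1}^{H} W_h \bar{y}_h\right\} = \sum_{h=1}^{H} W_h E(\bar{y}_h) = \sum_{h=1}^{H} W_h \bar{Y}_h = \bar{Y}.$$

This shows that $\bar{y}_{st}$ is an unbiased estimator of the population mean.
(ii) Sampling is done independently in each stratum, so the covariance terms between strata vanish and therefore

$$V(\bar{y}_{st}) = \sum_{h=1}^{H} W_h^2\, V(\bar{y}_h) \qquad (2a)$$

Theorem 2: For stratified random sampling, the variance of $\bar{y}_{st}$ is

$$V(\bar{y}_{st}) = \sum_{h=1}^{H}\left(\frac{1}{n_h} - \frac{1}{N_h}\right) W_h^2 S_h^2 = \sum_{h=1}^{H} (1 - f_h)\, W_h^2\, \frac{S_h^2}{n_h} \qquad (2b)$$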
Proof 2: Sampling is done independently in each stratum, and therefore, by the SRS variance result of Section 1 applied to an individual stratum, we have

$$V(\bar{y}_h) = (1 - f_h)\,\frac{S_h^2}{n_h} \qquad (2c)$$

Substituting equation (2c) in equation (2a) we get

$$V(\bar{y}_{st}) = \sum_{h=1}^{H} (1 - f_h)\, W_h^2\, \frac{S_h^2}{n_h}.$$

COROLLARY 1 (Estimator of population total and its variance)
If $\hat{Y}_{st} = N \bar{y}_{st}$ is the estimator of the population total $Y$, then $\hat{Y}_{st}$ is an unbiased estimator and its sampling variance is given by

$$V(\hat{Y}_{st}) = N^2\, V(\bar{y}_{st}) = N^2 \sum_{h=1}^{H} (1 - f_h)\, W_h^2\, \frac{S_h^2}{n_h} \qquad (3)$$

COROLLARY 2 (Estimator of population mean and its variance under proportional allocation)
Equation (2b) can be written as

$$V(\bar{y}_{st}) = \sum_{h=1}^{H}\left(1 - \frac{n_h}{N_h}\right)\frac{N_h^2}{N^2}\,\frac{S_h^2}{n_h}.$$

If in every stratum $\frac{n_h}{N_h} = \frac{n}{N}$, the variance of $\bar{y}_{st}$ reduces to

$$V(\bar{y}_{st})_{prop} = \sum_{h=1}^{H}\left(1 - \frac{n}{N}\right)\frac{N_h^2}{N^2}\,\frac{N}{n N_h}\, S_h^2 = \left(1 - \frac{n}{N}\right)\frac{1}{n}\sum_{h=1}^{H}\frac{N_h}{N}\, S_h^2,$$

$$V(\bar{y}_{st})_{prop} = \frac{(1 - f)}{n}\sum_{h=1}^{H} W_h S_h^2 \qquad (4)$$

COROLLARY 3 (Estimator of population mean and its variance under proportional allocation and same variance in all strata)
If in every stratum $\frac{n_h}{N_h} = \frac{n}{N}$ and the variances in all strata have the same value $S_w^2$, the variance of $\bar{y}_{st}$ in equation (4) reduces to

$$V(\bar{y}_{st}) = \frac{(N - n)}{N n}\, S_w^2 = (1 - f)\,\frac{S_w^2}{n} \qquad (5)$$

COROLLARY 4 (Estimator of population mean and its variance for stratified random sampling with replacement, SRSWR)

$$V(\bar{y}_{st}) = \sum_{h=1}^{H}\frac{N_h^2}{N^2}\,\frac{S_h^2}{n_h} = \sum_{h=1}^{H} W_h^2\,\frac{S_h^2}{n_h} \qquad (6)$$

5. ESTIMATION OF POPULATION PROPORTION AND ITS VARIANCE
Suppose every unit in the population falls into one of two classes: (i) $C_1$ (having a particular characteristic) and (ii) $C_2$ (not having that characteristic).
For example, a crop field is either irrigated or not irrigated. If we are interested in estimating the proportion $P$ of irrigated fields, a variate $y_i$ can be defined on the $N$ units of the population, taking the value 1 if the field is irrigated and 0 otherwise.

To estimate the proportion of irrigated fields using stratified random sampling, the variate $y_{hi}$ is defined on the $N_h$ units of the $h$th stratum, taking the value 1 if the field is irrigated and 0 otherwise. We write:

Number of units in $C_1$ (irrigated) in the population in the $h$th stratum: $N_{h1}$
Number of units in $C_1$ (irrigated) in the sample in the $h$th stratum: $n_{h1}$
Proportion of units in $C_1$ in the population in the $h$th stratum: $P_h = N_{h1} / N_h$
Proportion of units in $C_1$ in the sample in the $h$th stratum: $p_h = n_{h1} / n_h$
Theorem 3: A sample is drawn from each stratum independently, the sample size within the $h$th stratum being $n_h$ ($h = 1, 2, \dots, H$), such that

$$n_1 + n_2 + \dots + n_H = n.$$

An unbiased estimate of the population proportion $P$ is given by

$$p_{st} = \sum_{h=1}^{H} W_h\, p_h \qquad (7a)$$

with variance

$$V(p_{st}) = \sum_{h=1}^{H} W_h^2\, V(p_h) \qquad (7b)$$
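6. ESTIMATION OF VARIANCE
If a simple random sample is taken within each stratum, an unbiased estimator of $S_h^2$ is given by

$$s_h^2 = \frac{1}{n_h - 1}\sum_{i=1}^{n_h}\left(y_{hi} - \bar{y}_h\right)^2 = \frac{1}{n_h - 1}\left[\sum_{i=1}^{n_h} y_{hi}^2 - n_h \bar{y}_h^2\right] \qquad (8)$$

The unbiased estimator of the sampling variance of the sample mean $\bar{y}_{st}$ is obtained as follows:

$$\text{Est. of } V(\bar{y}_{st}) = v(\bar{y}_{st}) = \sum_{h=1}^{H} (1 - f_h)\, W_h^2\, \frac{s_h^2}{n_h} \qquad (9)$$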
The unbiased estimator of the sampling variance of the estimated total $\hat{Y}_{st} = N \bar{y}_{st}$ is then

$$\text{Est. of } V(\hat{Y}_{st}) = v(\hat{Y}_{st}) = N^2 \sum_{h=1}^{H} (1 - f_h)\, W_h^2\, \frac{s_h^2}{n_h} \qquad (10)$$

6.2 Proportion

The unbiased estimator of the sampling variance of the sample proportion $p_{st}$ is obtained as follows:

$$\text{Est. of } V(p_{st}) = v(p_{st}) = \sum_{h=1}^{H} (1 - f_h)\, W_h^2\, \frac{p_h (1 - p_h)}{n_h - 1} \qquad (11)$$

where $p_h$ is the unbiased sample estimate of the proportion $P_h$ in the $h$th stratum.
7. EXAMPLES: ESTIMATION OF MEAN/TOTAL, PROPORTION AND SEs
7.1 Estimation of mean/total SE and CI
Dairy farms in a certain region are divided into four categories, depending on their total acreage and on whether or not they concentrate exclusively on dairy products. The numbers of farms in the four categories are 72, 37, 50 and 11. In a survey to estimate the total number of milk-producing cows in the region, a stratified random sample of 28 farms is chosen with proportional allocation. The numbers of cows on the selected farms are:

| Category/Stratum | Number of cows on selected farms ($y_{hi}$) |
|---|---|
| 1 | 61, 47, 44, 70, 28, 39, 51, 52, 101, 49, 54, 71 |
| 2 | 160, 148, 89, 139, 142, 93 |
| 3 | 26, 19, 21, 34, 28, 15, 20, 24 |
| 4 | 17, 11 |

Estimate (i) the average number of cows per farm, its SE and 95% CI; (ii) the total number of cows in the region, its SE and 95% CI.
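Solution 7.1: Calculation and Result

Table: Calculation of terms involved in the estimation of the average and total number of cows and their standard errors

| $h$ (1) | $N_h$ (2) | $n_h$ (3) | $W_h = N_h/N$ (4) | $s_h^2$ (5) | $y_h$ (6) | $\bar{y}_h$ (7)=(6)/(3) | $W_h \bar{y}_h$ (8)=(4)x(7) | $W_h^2$ (9)=(4)x(4) | $W_h^2 s_h^2$ (10) | $(1 - f_h) = (1 - n_h/N_h)$ (11) | $(1 - f_h) W_h^2 s_h^2 / n_h$ (12) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 72 | 12 | 0.4235 | 350.99 | 667 | 55.58 | 23.5412 | 0.1794 | 62.9600 | 0.8333 | 4.37222 |
| 2 | 37 | 6 | 0.2176 | 897.10 | 771 | 128.50 | 27.9676 | 0.0474 | 42.4958 | 0.8378 | 5.93410 |
| 3 | 50 | 8 | 0.2941 | 35.41 | 187 | 23.38 | 6.8750 | 0.0865 | 3.0632 | 0.8400 | 0.32164 |
| 4 | 11 | 2 | 0.0647 | 18.00 | 28 | 14.00 | 0.9059 | 0.0042 | 0.0754 | 0.8182 | 0.03083 |
| Total | 170 | 28 | 1.0000 | | 1653 | | 59.2897 | | | | 10.65879 |

Here $y_h$ is the sample total in stratum $h$ (for stratum 1, the twelve listed values sum to 667).

(i) Average number of cows per farm

(a) Estimate of the average number of cows per farm

$$\bar{y}_{st} = \sum_{h=1}^{4} W_h \bar{y}_h = 59.29$$

(b) Estimate of the variance of the estimated average

$$\text{Est. of } V(\bar{y}_{st}) = v(\bar{y}_{st}) = \sum_{h=1}^{4} (1 - f_h)\,\frac{W_h^2 s_h^2}{n_h} = 10.6588$$

Thus, the standard error of the estimated average number of cows per farm is

$$SE(\bar{y}_{st}) = \sqrt{v(\bar{y}_{st})} = \sqrt{10.6588} = 3.26$$

(c) 95% CI for $\bar{Y}$

$$\bar{y}_{st} \pm 1.96\, SE(\bar{y}_{st}) = 59.29 \pm 1.96 \times 3.26, \qquad 52.89 \le \bar{Y} \le 65.69$$

(ii) Total number of cows in the region

(a) Estimate of the total number of cows in the region

$$\hat{Y}_{st} = N \bar{y}_{st} = 170 \times 59.2897 = 10079$$

(b) Estimate of the variance of the estimated total

$$\text{Est. of } V(\hat{Y}_{st}) = v(\hat{Y}_{st}) = N^2\, v(\bar{y}_{st}) = 170 \times 170 \times 10.6588 = 308039$$

Thus, the standard error of the estimated total number of cows in the region is

$$SE(\hat{Y}_{st}) = \sqrt{v(\hat{Y}_{st})} = \sqrt{308039} = 555.01$$

(c) 95% CI for $Y$

$$\hat{Y}_{st} \pm 1.96\, SE(\hat{Y}_{st}) = 10079 \pm 1.96 \times 555.01, \qquad 8991 \le Y \le 11167$$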
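The calculation for Example 7.1 can be reproduced directly from the listed farm data with equations (1), (9) and (10). The sketch below is an illustration rather than part of the original solution; the variable names are hypothetical, and `statistics.variance` is used because it returns the sample variance with divisor $n_h - 1$, as in equation (8).

```python
import math
import statistics

# Stratum sizes and sampled cow counts, as listed in Example 7.1.
N_h = [72, 37, 50, 11]
samples = [
    [61, 47, 44, 70, 28, 39, 51, 52, 101, 49, 54, 71],
    [160, 148, 89, 139, 142, 93],
    [26, 19, 21, 34, 28, 15, 20, 24],
    [17, 11],
]

N = sum(N_h)
ybar_st = 0.0    # equation (1)
v_ybar_st = 0.0  # equation (9)
for Nh, y in zip(N_h, samples):
    nh = len(y)
    W = Nh / N   # stratum weight
    f = nh / Nh  # stratum sampling fraction
    ybar_st += W * statistics.mean(y)
    v_ybar_st += (1 - f) * W**2 * statistics.variance(y) / nh

se_mean = math.sqrt(v_ybar_st)
ci_mean = (ybar_st - 1.96 * se_mean, ybar_st + 1.96 * se_mean)

total_hat = N * ybar_st  # estimated population total
se_total = N * se_mean   # square root of equation (10)
ci_total = (total_hat - 1.96 * se_total, total_hat + 1.96 * se_total)

print(f"mean  = {ybar_st:.2f}, SE = {se_mean:.2f}, 95% CI = {ci_mean}")
print(f"total = {total_hat:.0f}, SE = {se_total:.2f}, 95% CI = {ci_total}")
```

Working from the raw observations, rather than from rounded intermediate columns, avoids accumulating rounding error in the standard errors.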
7.2 Estimation of proportion, SE and CI
A department contains 200 boys and 300 girls. To estimate the proportion ($P$) of students who favour supplementary examinations, a stratified random sample of 10 boys and 15 girls was selected. Each sampled student's response was coded 1 (favour) or 2 (not favour). In the sample, 6 of the 10 boys and 12 of the 15 girls favoured supplementary examinations.

Use the sample to (i) estimate the proportion of students who favour supplementary examinations, and (ii) calculate the standard error of the estimate and a 95% CI for $P$.
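Solution 7.2: Formulae
(i) Estimate of the proportion of students who favour supplementary examinations

$$p_{st} = \sum_{h=1}^{H} W_h\, p_h$$

(ii) Standard error of the estimate

$$SE(p_{st}) = \sqrt{v(p_{st})}, \quad \text{where } v(p_{st}) = \sum_{h=1}^{H} W_h^2\, (1 - f_h)\, \frac{p_h (1 - p_h)}{n_h - 1}$$

Calculation and Results

| Stratum $h$ | $N_h$ | $n_h$ | $n_{h1}$ | $p_h = n_{h1}/n_h$ | $W_h$ | $W_h p_h$ | $W_h^2 (1 - f_h)\, p_h (1 - p_h)/(n_h - 1)$ |
|---|---|---|---|---|---|---|---|
| Boys | 200 | 10 | 6 | 0.60 | 0.4 | 0.24 | 0.004053 |
| Girls | 300 | 15 | 12 | 0.80 | 0.6 | 0.48 | 0.003909 |
| Total | 500 | 25 | 18 | | 1.0 | 0.72 | 0.007962 |

(i) $p_{st} = \sum_{h=1}^{2} W_h\, p_h = 0.72$

(ii) $v(p_{st}) = 0.0080$, so that $SE(p_{st}) = \sqrt{v(p_{st})} = 0.0892$.

The 95% CI for $P$ is $p_{st} \pm 1.96\, SE(p_{st}) = 0.72 \pm 1.96 \times 0.0892$, i.e. $0.545 \le P \le 0.895$.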
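The same result can be checked numerically from the stratum counts using equations (7a) and (11). This is a minimal sketch; the variable names are illustrative only.

```python
import math

# Example 7.2: boys and girls as the two strata.
N_h = [200, 300]   # stratum sizes
n_h = [10, 15]     # sample sizes
n_h1 = [6, 12]     # sampled students in favour

N = sum(N_h)
p_st = 0.0    # equation (7a)
v_p_st = 0.0  # equation (11)
for Nh, nh, nh1 in zip(N_h, n_h, n_h1):
    W = Nh / N
    f = nh / Nh
    p = nh1 / nh
    p_st += W * p
    v_p_st += W**2 * (1 - f) * p * (1 - p) / (nh - 1)

se = math.sqrt(v_p_st)
ci = (p_st - 1.96 * se, p_st + 1.96 * se)
print(f"p_st = {p_st:.2f}, SE = {se:.4f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

The divisor $n_h - 1$, not $n_h$, is what makes the stratum term an unbiased estimator of the variance of $p_h$.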
8. ALLOCATION OF SAMPLE TO STRATA
The allocation of sample size to strata is affected by three factors, viz.
i. The total number of units in each stratum ($N_h$),
ii. The variability of observations within each stratum ($S_h^2$), and
iii. The cost of obtaining an observation from each stratum ($c_h$).

A good allocation is one where maximum precision is obtained with minimum cost; in other words, the criterion for allocation is to minimize the cost for a given variance or to minimize the variance for a given cost. There are three methods of allocating a sample to strata, namely:
i) Optimum allocation
ii) Neyman allocation
iii) Proportional allocation
8.1 Optimum allocation

Consider the linear cost function

$$C = C_0 + \sum_{h=1}^{H} n_h c_h \qquad (12)$$

where the overhead cost $C_0$ is constant and $c_h$ is the average cost of surveying one unit in the $h$th stratum, which may depend upon the nature and size of the units in the stratum. To determine the optimum value of $n_h$ we consider the function

$$\phi = V(\bar{y}_{st}) + \lambda C$$

where $\lambda$ is some unknown constant. Using the calculus method of Lagrange multipliers, we select $n_h$ and the constant $\lambda$ to minimize $\phi$.

Differentiating $\phi$ with respect to $n_h$ and equating to zero, we have

$$\frac{\partial \phi}{\partial n_h} = -\frac{W_h^2 S_h^2}{n_h^2} + \lambda c_h = 0, \quad \text{so that} \quad n_h = \frac{W_h S_h}{\sqrt{\lambda c_h}},$$

i.e. $n_h$ is proportional to $W_h S_h / \sqrt{c_h}$. For a fixed total sample size $n$, the optimum allocation of the sample to the $h$th stratum is therefore

$$n_h = n\,\frac{W_h S_h / \sqrt{c_h}}{\sum_{h=1}^{H} \left(W_h S_h / \sqrt{c_h}\right)} = n\,\frac{N_h S_h / \sqrt{c_h}}{\sum_{h=1}^{H} \left(N_h S_h / \sqrt{c_h}\right)} \qquad (13)$$

where:
$W_h = N_h / N$ is the stratum weight,
$S_h$ is the standard deviation in the stratum, and
$c_h$ is the cost of enumeration per unit.

The above equation leads to the following rules of conduct. In a given stratum we take a larger sample if
1. The stratum accounts for a large part of the population,
2. The variance within the stratum is large (we sample more units to compensate for the heterogeneity), and
3. Sampling is cheaper in the stratum.
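Equation (13) can be evaluated in a couple of lines. The following is a minimal sketch with hypothetical inputs (two strata, made-up sizes, standard deviations and unit costs); the function name `optimum_allocation` is an illustration, not part of the notes. It returns unrounded sample sizes, which would be rounded to whole units in practice.

```python
import math


def optimum_allocation(n, N_h, S_h, c_h):
    """Equation (13): n_h proportional to N_h * S_h / sqrt(c_h), scaled to sum to n."""
    weights = [Nh * Sh / math.sqrt(ch) for Nh, Sh, ch in zip(N_h, S_h, c_h)]
    total = sum(weights)
    return [n * w / total for w in weights]


# Hypothetical inputs: the second stratum is larger but less variable
# and four times as expensive to survey per unit.
alloc = optimum_allocation(60, N_h=[100, 200], S_h=[4.0, 2.0], c_h=[1.0, 4.0])
print(alloc)
```

In this illustration the two strata have the same product $N_h S_h$, so the difference in allocation comes entirely from the cost term $\sqrt{c_h}$.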
8.2 Neyman allocation

Neyman allocation is a special case of optimum allocation and applies when the costs in the strata are equal but the variances are not. Under Neyman allocation, $n_h$ is proportional to $N_h S_h$. When the cost of sampling is the same in all strata (i.e. $c_h = c$), equation (13) reduces to:

$$n_h = n\,\frac{W_h S_h}{\sum_{h=1}^{H} W_h S_h} = n\,\frac{N_h S_h}{\sum_{h=1}^{H} N_h S_h} \qquad (14)$$

Remark: In both optimum and Neyman allocations, the values of the variances will not be known and estimates would have to be obtained from a previous study.

The equation leads to the following rules of conduct. In a given stratum take a larger sample if
(i) The stratum accounts for a large part of the population,
(ii) The variance within the stratum is large (we sample more units to compensate for the heterogeneity).

The variance of the sample mean under Neyman allocation takes the form

$$V(\bar{y}_{st})_{Neyman} = \frac{\left(\sum_{h=1}^{H} W_h S_h\right)^2}{n} - \frac{\sum_{h=1}^{H} W_h S_h^2}{N} \qquad (15)$$
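As an illustration, the Neyman allocation (14) and its variance (15) can be computed as below. The inputs are hypothetical and the function names are assumptions made for this sketch. The last lines cross-check (15) against the general variance formula (2b) evaluated at the Neyman sample sizes.

```python
def neyman_allocation(n, N_h, S_h):
    """Equation (14): n_h proportional to N_h * S_h (equal unit costs)."""
    products = [Nh * Sh for Nh, Sh in zip(N_h, S_h)]
    total = sum(products)
    return [n * p / total for p in products]


def neyman_variance(n, N_h, S_h):
    """Equation (15): variance of the stratified mean under Neyman allocation."""
    N = sum(N_h)
    W_h = [Nh / N for Nh in N_h]
    sum_ws = sum(W * S for W, S in zip(W_h, S_h))
    sum_ws2 = sum(W * S**2 for W, S in zip(W_h, S_h))
    return sum_ws**2 / n - sum_ws2 / N


# Hypothetical population: two strata with different variability.
N_h, S_h, n = [300, 700], [10.0, 5.0], 100
n_h = neyman_allocation(n, N_h, S_h)
v_neyman = neyman_variance(n, N_h, S_h)

# Cross-check with equation (2b) using the (unrounded) Neyman sample sizes.
N = sum(N_h)
v_general = sum((1 / nh - 1 / Nh) * (Nh / N) ** 2 * Sh**2
                for nh, Nh, Sh in zip(n_h, N_h, S_h))
print(n_h, v_neyman, v_general)
```

The agreement between the two variances holds only for unrounded $n_h$; after rounding to whole units the variance from (2b) differs slightly.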
8.3 Proportional allocation
Proportional allocation involves the use of a uniform sampling fraction in all strata. This implies that the number of sampled units in each stratum is proportional to the size of the stratum. For example, in a population of 2400 men and 1600 women, proportional allocation with a 10% sample would mean sampling 240 men and 160 women. Proportional allocation is applicable when (i) the cost of sampling (i.e. $c_h = c$) and (ii) the strata variability (i.e. $S_h = S$) are the same in all strata. Equation (13) then reduces to:

$$n_h = n\, W_h = n\,\frac{N_h}{N} \qquad (16)$$

The equation leads to the following rule of conduct. In a given stratum take a larger sample if
(i) The stratum accounts for a large part of the population.

The variance of the estimate of the population mean, i.e. of the sample mean, under proportional allocation takes the form

$$V(\bar{y}_{st})_{prop} = \left(1 - \frac{n}{N}\right)\frac{\sum_{h=1}^{H} W_h S_h^2}{n} = (1 - f)\,\frac{\sum_{h=1}^{H} W_h S_h^2}{n} \qquad (17)$$
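The men/women illustration above can be reproduced with equation (16). A minimal sketch (the function name `proportional_allocation` is hypothetical):

```python
def proportional_allocation(n, N_h):
    """Equation (16): n_h = n * N_h / N, i.e. the same sampling fraction in every stratum."""
    N = sum(N_h)
    return [n * Nh / N for Nh in N_h]


# 2400 men and 1600 women; a 10% sample means n = 400.
N_h = [2400, 1600]
alloc = proportional_allocation(400, N_h)
fractions = [nh / Nh for nh, Nh in zip(alloc, N_h)]
print(alloc)      # [240.0, 160.0]
print(fractions)  # each stratum is sampled at the overall fraction n / N
```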
Below is the summary table of sections 8.1 to 8.3.

| Allocation | Formula | Take a larger sample if | $V(\bar{y}_{st})$ |
|---|---|---|---|
| Optimum | $n_h = n\,\dfrac{W_h S_h / \sqrt{c_h}}{\sum (W_h S_h / \sqrt{c_h})}$ or $n_h = n\,\dfrac{N_h S_h / \sqrt{c_h}}{\sum (N_h S_h / \sqrt{c_h})}$ | (i) The stratum accounts for a large part of the population, (ii) the variance within the stratum is large (we sample more units to compensate for the heterogeneity), and (iii) sampling is cheaper in the stratum. | |
| Neyman (when $c_h = c$) | $n_h = n\,\dfrac{W_h S_h}{\sum W_h S_h}$ or $n_h = n\,\dfrac{N_h S_h}{\sum N_h S_h}$ | (i) The stratum accounts for a large part of the population, and (ii) the variance within the stratum is large (we sample more units to compensate for the heterogeneity). | $\dfrac{(\sum W_h S_h)^2}{n} - \dfrac{\sum W_h S_h^2}{N}$ |
| Proportional (when $c_h = c$ and $S_h = S$) | $n_h = n\, W_h = n\,\dfrac{N_h}{N}$ | (i) The stratum accounts for a large part of the population. | $\left(1 - \dfrac{n}{N}\right)\dfrac{\sum W_h S_h^2}{n}$ |
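8.4 Sample required and allocation with a specified cost C

Using (12) and (13), the total sample size $n$ required for estimating the population mean with a specified cost $C$ is given by

$$n = \frac{(C - C_0)\sum_{h=1}^{H} \left(W_h S_h / \sqrt{c_h}\right)}{\sum_{h=1}^{H} \left(W_h S_h \sqrt{c_h}\right)} \qquad (18)$$

For the given cost, the allocation of the sample to the strata is

$$n_h = \frac{(C - C_0)\, W_h S_h / \sqrt{c_h}}{\sum_{h=1}^{H} \left(W_h S_h \sqrt{c_h}\right)} \qquad (19)$$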
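A short sketch of equations (18) and (19), using hypothetical weights, standard deviations, unit costs and budget (none of these numbers come from the notes, and the function name is an assumption). It also confirms that the resulting allocation spends exactly the available field budget $C - C_0$.

```python
import math


def allocation_for_budget(C, C0, W_h, S_h, c_h):
    """Equation (19): stratum sample sizes for a total budget C with overhead C0."""
    denom = sum(W * S * math.sqrt(c) for W, S, c in zip(W_h, S_h, c_h))
    return [(C - C0) * W * S / math.sqrt(c) / denom
            for W, S, c in zip(W_h, S_h, c_h)]


# Hypothetical inputs: two equal-sized strata, the second twice as variable
# and four times as costly per unit.
c_h = [1.0, 4.0]
n_h = allocation_for_budget(C=1100, C0=100, W_h=[0.5, 0.5], S_h=[10.0, 20.0], c_h=c_h)
n = sum(n_h)                                        # equation (18)
field_cost = sum(nh * c for nh, c in zip(n_h, c_h))  # should equal C - C0
print(n_h, n, field_cost)
```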
8.5 Examples: Proportional and Neyman allocations

8.5 (i) Proportional and Neyman allocations and relative efficiency
The following data show the stratification of all farms in a region by farm size and the average acres under sorghum per farm in each stratum:

| Stratum | No. of holdings in stratum ($N_h$) | Average acres under sorghum ($\bar{Y}_h$) | Standard deviation ($S_h$) |
|---|---|---|---|
| 1 | 1055 | 11.4 | 11.2 |
| 2 | 915 | 34.3 | 18.6 |
| 3 | 582 | 50.3 | 25.7 |
| 4 | 398 | 60.1 | 36.4 |
| Total | 2950 | | |

For a sample of 300 holdings, compute:
(i) the sample size in each stratum under (a) proportional allocation and (b) Neyman allocation;
(ii) the percent relative efficiency of Neyman allocation over proportional allocation.
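Solution 8.5(i): Formulae
(i) Sample size in each stratum
(a) Proportional allocation: $n_h = n\, W_h$
(b) Neyman allocation: $n_h = n\,\dfrac{W_h S_h}{\sum W_h S_h}$

(ii) Percent relative efficiency of Neyman allocation over proportional allocation

$$\%\ \text{Relative Efficiency} = \frac{V(\bar{y}_{st})_{prop}}{V(\bar{y}_{st})_{Neyman}} \times 100$$

where

$$V(\bar{y}_{st})_{prop} = \left(1 - \frac{n}{N}\right)\frac{\sum W_h S_h^2}{n} = (1 - f)\,\frac{\sum W_h S_h^2}{n}, \quad \text{and} \quad V(\bar{y}_{st})_{Neyman} = \frac{(\sum W_h S_h)^2}{n} - \frac{\sum W_h S_h^2}{N}$$

Calculation and Results

Table: Calculation of terms involved in the proportional and Neyman allocations

| $h$ (1) | $N_h$ (2) | $\bar{Y}_h$ (3) | $S_h$ (4) | $W_h$ (5) | $W_h S_h$ (6) | $W_h S_h^2$ (7) | Proportional allocation $n_h = n W_h$ (8) | Neyman allocation $n_h = n W_h S_h / \sum W_h S_h$ (9) |
|---|---|---|---|---|---|---|---|---|
| 1 | 1055 | 11.4 | 11.2 | 0.3576 | 4.0054 | 44.8607 | 107 | 61 |
| 2 | 915 | 34.3 | 18.6 | 0.3102 | 5.7692 | 107.3062 | 93 | 88 |
| 3 | 582 | 50.3 | 25.7 | 0.1973 | 5.0703 | 130.3068 | 59 | 77 |
| 4 | 398 | 60.1 | 36.4 | 0.1349 | 4.9109 | 178.7573 | 40 | 75 |
| Total | 2950 | | | 1.0000 | 19.7558 | 461.2311 | 300 | 300 |

The entries in columns (8) and (9) are rounded to whole holdings. The unrounded values (107.29, 93.05, 59.19, 40.47 for proportional and 60.82, 87.61, 76.99, 74.57 for Neyman allocation) sum to 300; the rounded entries sum to 299 and 301 respectively, so one holding has to be adjusted in each case to reach exactly 300.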
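(i) The sample of 300 holdings is allocated according to proportional and Neyman allocation in columns (8) and (9) respectively.

(ii) Percent relative efficiency of Neyman allocation over proportional allocation

$$V(\bar{y}_{st})_{prop} = \left(1 - \frac{n}{N}\right)\frac{\sum W_h S_h^2}{n} = \left(1 - \frac{300}{2950}\right) \times \frac{461.2311}{300} = 1.3811$$

$$V(\bar{y}_{st})_{Neyman} = \frac{(\sum W_h S_h)^2}{n} - \frac{\sum W_h S_h^2}{N} = \frac{19.7558 \times 19.7558}{300} - \frac{461.2311}{2950} = 1.1446$$

$$\%\ \text{Relative Efficiency} = \frac{V(\bar{y}_{st})_{prop}}{V(\bar{y}_{st})_{Neyman}} \times 100 = \frac{1.3811}{1.1446} \times 100 = 120.7\%$$

i.e. the gain in precision of Neyman allocation over proportional allocation is

$$\left(\frac{V(\bar{y}_{st})_{prop}}{V(\bar{y}_{st})_{Neyman}} - 1\right) \times 100 = 21\%$$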
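The figures in this example can be reproduced from the stratum sizes and standard deviations alone. The sketch below is only a check of the arithmetic, with illustrative variable names; the allocations are left unrounded.

```python
# Example 8.5 (i): farms stratified by size, n = 300 holdings.
N_h = [1055, 915, 582, 398]
S_h = [11.2, 18.6, 25.7, 36.4]
n = 300

N = sum(N_h)
W_h = [Nh / N for Nh in N_h]
sum_ws = sum(W * S for W, S in zip(W_h, S_h))       # sum of W_h * S_h
sum_ws2 = sum(W * S**2 for W, S in zip(W_h, S_h))   # sum of W_h * S_h^2

prop = [n * W for W in W_h]                                # equation (16)
neyman = [n * W * S / sum_ws for W, S in zip(W_h, S_h)]    # equation (14)

v_prop = (1 - n / N) * sum_ws2 / n        # equation (17)
v_neyman = sum_ws**2 / n - sum_ws2 / N    # equation (15)
rel_eff = 100 * v_prop / v_neyman

print([round(x, 2) for x in prop])
print([round(x, 2) for x in neyman])
print(round(v_prop, 4), round(v_neyman, 4), round(rel_eff, 1))
```

Neyman allocation moves sample away from the large, homogeneous stratum 1 towards the small but highly variable stratum 4, which is where the gain in precision comes from.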
8.5 (ii) Sample required with a specified cost C

Suppose the population is divided into 3 strata. The following information is available for these three strata:

| | Stratum 1 | Stratum 2 | Stratum 3 |
|---|---|---|---|
| Stratum size ($N_h$) | 3500 | 4500 | 3000 |
| Standard deviation ($S_h$) | 22 | 17 | 30 |
| Cost per unit ($c_h$) | 2 | 3 | 5 |

(i) Determine the appropriate optimum allocation when the total budget outlay ($C$) is 1000.
(ii) Estimate the standard error of the sample mean assuming uniform cost.
Solution 8.5 (ii): Formulae:

(i) The sample size in each stratum is determined as

$$n_h = \frac{(C - C_0)\, W_h S_h / \sqrt{c_h}}{\sum W_h S_h \sqrt{c_h}} \qquad (20)$$

(ii) Estimate of the standard error of the sample mean assuming uniform cost

$$SE(\bar{y}_{st})_{Neyman} = \sqrt{V(\bar{y}_{st})_{Neyman}}, \quad \text{where } V(\bar{y}_{st})_{Neyman} = \frac{(\sum W_h S_h)^2}{n} - \frac{\sum W_h S_h^2}{N}$$
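Calculations and Results
Given $C = 1000$ and $C_0 = 600$, we have $C - C_0 = 400$.

Table: Calculation of terms involved in the allocation of samples for a specified cost

| $h$ (1) | $N_h$ (2) | $S_h$ (3) | $c_h$ (4) | $W_h$ (5) | $W_h S_h$ (6) | $W_h S_h / \sqrt{c_h}$ (7) | $W_h S_h \sqrt{c_h}$ (8) | $n_h$ (9) | $W_h S_h^2$ (10) |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 3500 | 22 | 2 | 0.3182 | 7.0000 | 4.9497 | 9.8995 | 49.20 | 154.0000 |
| 2 | 4500 | 17 | 3 | 0.4091 | 6.9545 | 4.0152 | 12.0456 | 39.91 | 118.2273 |
| 3 | 3000 | 30 | 5 | 0.2727 | 8.1818 | 3.6590 | 18.2951 | 36.37 | 245.4545 |
| Total | 11000 | | | 1.0000 | 22.1364 | 12.6240 | 40.2402 | 125.49 | 517.6818 |

(i) The optimum allocation of the sample using equation (20) is given in column (9). Rounded to whole units this is $n_1 = 49$, $n_2 = 40$ and $n_3 = 36$, at a field cost of $49 \times 2 + 40 \times 3 + 36 \times 5 = 398$, which is within $C - C_0 = 400$.

(ii) Estimate of the standard error of the sample mean assuming uniform cost. When the cost is uniform, the optimum allocation becomes the Neyman allocation, so that

$$SE(\bar{y}_{st})_{Neyman} = \sqrt{V(\bar{y}_{st})_{Neyman}}, \quad \text{where } V(\bar{y}_{st})_{Neyman} = \frac{(\sum W_h S_h)^2}{n} - \frac{\sum W_h S_h^2}{N},$$

with $\sum W_h S_h = 22.1364$ from column (6), $\sum W_h S_h^2 = 517.6818$ from column (10) and $N = 11000$.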
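The allocation in part (i) and the standard error in part (ii) can be evaluated numerically as sketched below. One point is an assumption of this sketch rather than something stated in the example: for part (ii) the total sample size $n$ is taken to be the sum of the rounded stratum sizes from part (i). A different choice of $n$ would change the standard error.

```python
import math

# Example 8.5 (ii): three strata with unequal unit costs.
N_h = [3500, 4500, 3000]
S_h = [22, 17, 30]
c_h = [2, 3, 5]
C, C0 = 1000, 600

N = sum(N_h)
W_h = [Nh / N for Nh in N_h]

# Part (i): equation (20), allocation for the available field budget C - C0.
denom = sum(W * S * math.sqrt(c) for W, S, c in zip(W_h, S_h, c_h))
n_h = [(C - C0) * W * S / math.sqrt(c) / denom for W, S, c in zip(W_h, S_h, c_h)]
n_h_rounded = [round(x) for x in n_h]
field_cost = sum(nh * c for nh, c in zip(n_h, c_h))  # unrounded n_h spend C - C0

# Part (ii): equation (15) under uniform cost.
# ASSUMPTION: n is the total of the rounded allocation from part (i).
n = sum(n_h_rounded)
sum_ws = sum(W * S for W, S in zip(W_h, S_h))
sum_ws2 = sum(W * S**2 for W, S in zip(W_h, S_h))
v_neyman = sum_ws**2 / n - sum_ws2 / N
se_neyman = math.sqrt(v_neyman)

print(n_h_rounded, n)
print(round(se_neyman, 3))
```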
9. REFERENCES
Cochran, William G. (1977): Sampling Techniques, John Wiley & Sons, Inc., Canada, p. 89.
Rao, Poduri S.R.S. (2000): Sampling Methodologies with Applications, Chapman & Hall, USA, pp. 83-109.
Kish, L. (1995): Survey Sampling, John Wiley & Sons, Inc., New York.
Yansaneh, Ibrahim (2002): Overview of sample design issues for Household Surveys in Developing and Transition Countries: Design, Implementation and Analysis, United Nations Statistics Division (UNSD) Publication (Chapter 2). This publication can be downloaded free of cost from http://unstats.un.org/unsd/HHsurveys
Lohr, Sharon L. (1999): Sampling: Design and Analysis, Second Edition, Brooks/Cole Publishing Company, USA.