
Cochran's Rule for Simple Random Sampling
Author(s): R. A. Sugden, T. M. F. Smith and R. P. Jones
Source: Journal of the Royal Statistical Society. Series B (Statistical Methodology), Vol. 62, No. 4 (2000), pp. 787-793
Published by: Wiley for the Royal Statistical Society
Stable URL: http://www.jstor.org/stable/2680621
Accessed: 28/01/2013 09:22

This content downloaded on Mon, 28 Jan 2013 09:22:15 AM All use subject to JSTOR Terms and Conditions

J. R. Statist. Soc. B (2000) 62, Part 4, pp. 787-793

Cochran's rule for simple random sampling

R. A. Sugden,
Goldsmiths College, London, UK

T. M. F. Smith
University of Southampton, UK

and R. P. Jones
Electronic Data Systems, Uxbridge, UK

[Received May 1999. Revised February 2000]

Summary. Cochran's rule for the minimum sample size to ensure adequate coverage of nominal 95% confidence intervals is derived by using the Edgeworth expansion for the distribution function of the standardized sample mean. The rule is extended for confidence intervals based on the Studentized sample mean. The performance of the rule and Edgeworth approximations for smaller sample sizes are examined by simulation.

Keywords: Confidence interval; Coverage probability; Edgeworth expansion; Kurtosis; Sampling fraction; Simple random sampling; Skewness

1. Introduction

We consider inferences about finite population parameters, such as means and totals, based on the randomization distribution corresponding to all possible repetitions of a random sampling rule. If the sample size is large an appeal is made to one of the finite population versions of the central limit theorem (CLT) (Madow, 1948; Erdős and Rényi, 1959; Hájek, 1960; Rosén, 1972) and it is asserted that for the finite population the distribution of an unbiased estimator of the population mean is approximately normal with a mean and variance which can be evaluated. An early version of this result for simple random sampling (SRS) is due to Bowley (1926) who also gave a Poisson approximation for binary random variables with low probabilities of occurrence. If the CLT does not apply then there is no agreed framework for inference. Moments can still be evaluated but the form of the distribution of an estimator is unknown. It is important, therefore, that statisticians have some indication whether or not the normal approximation will hold in any particular case.

Problems with the normal approximation for the distribution of the sample mean are well known. Plane and Gordon (1982) pointed out that under SRS, when the population is finite of size $N$ with variate values $Y_1, \ldots, Y_N$, then the distribution of the standardized mean from a sample of size $n$ with values $y_1, \ldots, y_n$ has the same shape as that of a sample of size $N - n$. If
Address for correspondence: R. A. Sugden, Department of Mathematical and Computing Sciences, Goldsmiths College, University of London, Lewisham Way, London, SE14 6NW, UK.
E-mail: maa01ras@gold.ac.uk
© 2000 Royal Statistical Society 1369-7412/00/62787



$$Z_n = \frac{\bar{y}_n - \bar{Y}}{S\sqrt{(1-f)/n}} \qquad (1.1)$$

is the standardized statistic, where $f = n/N$ is the sampling fraction and

$$S^2 = \sum_{i=1}^{N} (Y_i - \bar{Y})^2 / (N-1)$$

is the finite population variance, then it is easy to show that $Z_n = -Z_{N-n}$, since the population mean $\bar{Y}$ is fixed. The selection of the complement of a sample is equivalent to the selection of the sample itself, from the point of view of the distribution of the standardized statistic. Since distributions determine the coverage properties of confidence intervals this implies that for samples that are larger than $N/2$ the coverage properties become worse as the sample size increases despite the fact that the sampling variance is decreasing. For the Studentized statistic

$$U_n = \frac{\bar{y}_n - \bar{Y}}{s\sqrt{(1-f)/n}} \qquad (1.2)$$

where

$$s^2 = \sum_{i=1}^{n} (y_i - \bar{y}_n)^2 / (n-1)$$

is the unbiased sample estimator of variance, it is not true that $U_n = -U_{N-n}$ but coverage probabilities are likely to behave in a way that is similar to those of $Z_n$.

For the Studentized statistic $U_n$ it is easy to construct populations with a single outlier such that confidence intervals from samples that do not contain the outlier never cover the population mean whereas those that do contain the outlier do cover the mean. Since the SRS inclusion probability of any unit is $n/N$ this implies that the coverage probability of the interval estimator is also $n/N$, which can be very small. Outliers in finite populations need to be treated with caution.

Numerous simulation studies demonstrate that the CLT may not work well for specific finite populations, but far more demonstrate that the CLT can give a good approximation. Under what conditions will the normal approximation work?

The proofs of finite population versions of the CLT require that the given finite population be considered as a member of a sequence of finite populations in which both the population size and the sample size tend to $\infty$. This sequence could be completely arbitrary so it is usual to constrain it to have the same moments as the given finite population and for the sampling fraction to remain fixed. Under these conditions it is possible to prove that the CLT will hold asymptotically. How large must the sample size be for the asymptotics to work? What do we mean by work?

The only general result that we know of in this area is Cochran's rule (Cochran (1977), page 42), which states

'There is no safe general rule as to how large n must be for use of the normal approximation in computing confidence limits. For populations in which the principal deviation from normality consists of marked positive skewness, a crude rule that I have occasionally found useful is

$$n > 25 G_1^2 \qquad (1.3)$$

where $G_1$ is Fisher's measure of skewness. The rule is designed so that a 95% confidence probability statement will be wrong not more than 6% of the time. It is derived mathematically by assuming that any disturbance due to moments of the distribution higher than the third is negligible. This rule attempts to control only the total frequency of wrong statements, ignoring the direction of the error of estimate.'
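Cochran's rule (1.3) is simple enough to evaluate directly. The following is a minimal sketch, assuming that Fisher's measure $G_1$ is computed as $\mu_3/\mu_2^{3/2}$ from population moments with divisor $N$; the function names and the example populations are hypothetical and chosen only for illustration.

```python
import math


def fisher_skewness(values):
    """Fisher's skewness G1 = mu_3 / mu_2**1.5 using divisor-N moments."""
    n = len(values)
    mean = sum(values) / n
    mu2 = sum((v - mean) ** 2 for v in values) / n
    mu3 = sum((v - mean) ** 3 for v in values) / n
    return mu3 / mu2 ** 1.5


def cochran_min_n(values):
    """Smallest integer n that satisfies the strict inequality n > 25 * G1**2."""
    g1 = fisher_skewness(values)
    return math.floor(25 * g1 ** 2) + 1


# Hypothetical populations: one symmetric, one binary with a rare value.
symmetric = [1, 2, 3]
rare_binary = [0, 0, 0, 1]
print(cochran_min_n(symmetric), cochran_min_n(rare_binary))
```

For a binary population with proportion $P$ the skewness is $(1-2P)/\sqrt{P(1-P)}$, so rare attributes quickly push the required sample size up.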

Our object in this paper is to explore the derivation of Cochran's rule. In the process we shall define more precisely the meaning of 'works'. We then generalize the rule and explore some of the consequences. In Section 2 we develop the theory underlying Cochran's rule. In Section 3 we apply this to $Z_n$ and give an interpretation of the rule. In Section 4 we extend the approach to the Studentized statistic $U_n$ and generalize the results. In Section 5 we draw some conclusions and make some recommendations for practice.

2. Theory

2.1. Coverage probabilities

In this section we are interested in the coverage probabilities of nominal 95% intervals based on the standardized statistic $Z_n$, given by

$$\bar{y}_n \pm 1.96\, S \sqrt{(1-f)/n} \qquad (2.1)$$

or on the Studentized statistic $U_n$, given by

$$\bar{y}_n \pm 1.96\, s \sqrt{(1-f)/n}. \qquad (2.2)$$
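For a very small population the coverage probabilities of intervals (2.1) and (2.2) can be computed exactly by enumerating every possible simple random sample. The sketch below does this by brute force; the function name `exact_coverage` and the single-outlier population are hypothetical, and the approach is only practical when the number of samples is small.

```python
import math
from itertools import combinations


def exact_coverage(population, n, studentized=True, z=1.96):
    """Exact SRS coverage of the nominal interval, by enumerating all samples.

    studentized=True uses the sample variance s^2 (interval (2.2));
    studentized=False uses the known population variance S^2 (interval (2.1)).
    Requires n >= 2 when studentized is True.
    """
    N = len(population)
    f = n / N
    pop_mean = sum(population) / N
    S2 = sum((y - pop_mean) ** 2 for y in population) / (N - 1)
    covered = 0
    total = 0
    # combinations() works on positions, so duplicate values are distinct units.
    for sample in combinations(population, n):
        ybar = sum(sample) / n
        if studentized:
            var = sum((y - ybar) ** 2 for y in sample) / (n - 1)
        else:
            var = S2
        half_width = z * math.sqrt(var * (1 - f) / n)
        covered += ybar - half_width <= pop_mean <= ybar + half_width
        total += 1
    return covered / total


# Hypothetical population with one outlier: nine zeros and one value of 100.
outlier_population = [0] * 9 + [100]
print(exact_coverage(outlier_population, 3, studentized=True))
print(exact_coverage(outlier_population, 3, studentized=False))
```

In this example the Studentized interval covers the population mean only when the outlier is sampled, so its coverage equals the inclusion probability $n/N = 0.3$, as described in Section 1.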

The exact distributions under SRS of $Z_n$ and $U_n$ are unknown, since they depend on unknown population values. The required coverage probabilities are given by the unknown distribution functions evaluated at the assumed percentiles $\pm 1.96$. Approximations to the true distribution functions and Cochran's restrictions on moments suggest Edgeworth expansions.

Robinson (1978) used a method based on characteristic functions to establish the Edgeworth expansion for the distribution function of $Z_n$, which is valid under a 'non-lattice' condition. He writes

$$\Pr(Z_n \le z) = \Phi(z) + \frac{p_1(z)\phi(z)}{\sqrt{n}} + \frac{p_2(z)\phi(z)}{n} + O(n^{-3/2}) \qquad (2.3)$$

where

$$p_1(z) = -\frac{\gamma_1 (1-2f)}{6\sqrt{1-f}}\,(z^2 - 1),$$

$$p_2(z) = -\left[\frac{\gamma_2\{1 - 6f(1-f)\}}{24(1-f)} - \frac{f}{4}\right](z^3 - 3z) - \frac{\gamma_1^2 (1-2f)^2}{72(1-f)}\,(z^5 - 10z^3 + 15z),$$

$$\gamma_1 = \mu_3/\sigma^3, \qquad \gamma_2 = \mu_4/\sigma^4 - 3, \qquad \sigma^2 = (N-1)S^2/N.$$

Here

$$\mu_r = \sum_{i=1}^{N} (Y_i - \bar{Y})^r / N$$

is the $r$th population moment about the mean, and $\mu_2 = \sigma^2$.

To derive the Edgeworth expansion for $U_n$, which can be expressed as a smooth function of


sample means, the method of Hall (1992) requires expansions to order $1/n$ of the first four cumulants of $U_n$. These cumulants can be approximated using the delta method by expanding the powers of $U_n$ to order of probability $1/n$ and taking expectations. To evaluate these expectations we need expressions for joint moments of these sample means up to order 6. Conditions for the validity of the delta method are discussed in Bai and Rao (1991).

More specifically, following Hall (1992), page 71, and defining $W_i = (Y_i - \bar{Y})/\sigma$ and $V_i = W_i^2 - 1$, $i = 1, \ldots, N$, we can write the sample variance $s^2$, and hence $U_n$, as functions of the sample means $\bar{w}$ and $\bar{v}$ of the variates $W_i$ and $V_i$ respectively. Hence $U_n$ can be expressed as a series in ascending powers of $n^{-1/2}$. Using the results of Sugden and Smith (1997), with a slight correction of their result for exact moments of order 6, the variance of $U_n$ is given by
n2 var(Un,)

2 Y2 -f) 2 f(1I = I +_+ 7(1 4 71 + n + 0

(2

This differs from Thompson (1997), page 71, who has $(2-f)/n$ as the second term. Because of this, the fourth cumulant is also given incorrectly by Thompson.

2.2. The Edgeworth expansion for $U_n$

The terms in the expansions for the cumulants of the previous section are now substituted in the expressions given by Hall (1992), page 48, to give

$$\Pr(U_n \le u) = \Phi(u) + \frac{q_1(u)\phi(u)}{\sqrt{n}} + \frac{q_2(u)\phi(u)}{n} + O(n^{-3/2}), \qquad (2.4)$$

where

ql(u) = '/{I I(1 and q(u= [2


2 i

q5(u) I 1+

f2 1)}u

2-6f3f2 26(f+)

(u2 - 3) - 2t

(2

1+I(U2 -f)
_

_3)}

~/2{l?2f(3

-ft +

2 -f(U2

3)+(I -3?18(1

-f/2)2(U4

10,12 +

15) }

The expression for $q_2(u)$ corrects that given in Sugden and Smith (1997).

3. Cochran's rule for $Z_n$

The statement of Cochran's rule in Section 1 does not mention the finiteness of the population and so we take $f = 0$ as a first approximation for a large population. We also take the meaning of the two phrases regarding deviations from normality due to higher order moments to be the assumption that $\gamma_2 = 0$, although the expression for $\kappa_4$ involves $\gamma_1^2$ as well. However, we shall examine the effect of $\gamma_2$ as well, thus extending the rule.

Cochran's statement about the total frequency of error clearly implies a requirement that the coverage probability of a nominal 95% confidence interval for $\bar{Y}$, based on a normal distribution approximation with the population variance $S^2$ known, is at least 94%, i.e.

$$\Pr(Z_n \le 1.96) - \Pr(Z_n \le -1.96) \ge 0.94. \qquad (3.1)$$


If we neglect the error term in the Edgeworth expansion (2.3) to order $1/n$, then, as $p_1(z)$ is even and $p_2(z)$ is odd, we obtain

$$2\,p_2(1.96)\,\phi(1.96)/n \ge -0.01.$$
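The step from expansion (2.3) to this inequality can be written out explicitly. The following is a sketch of the algebra, using only the parity of the two polynomials and the symmetry of $\phi$; the value 0.95 for $\Phi(1.96) - \Phi(-1.96)$ is the usual rounding.

```latex
\begin{align*}
\Pr(Z_n \le z) - \Pr(Z_n \le -z)
  &= \Phi(z) - \Phi(-z)
     + \frac{\{p_1(z) - p_1(-z)\}\,\phi(z)}{\sqrt{n}}
     + \frac{\{p_2(z) - p_2(-z)\}\,\phi(z)}{n} + O(n^{-3/2}) \\
  &= \Phi(z) - \Phi(-z) + \frac{2\,p_2(z)\,\phi(z)}{n} + O(n^{-3/2}),
\end{align*}
% because p_1(-z) = p_1(z), p_2(-z) = -p_2(z) and \phi(-z) = \phi(z).
% With z = 1.96 the leading term is 0.95, so a coverage of at least 0.94
% requires 2 p_2(1.96) \phi(1.96) / n >= -0.01.
```

The skewness term of order $n^{-1/2}$ therefore cancels from the total coverage and only affects how the error is split between the two tails.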

For infinite populations ($f = 0$), $p_2(1.96) = 0.23569\gamma_1^2 - 0.06873\gamma_2$ and $p_1(1.96) = -0.47360\gamma_1$, so when $p_2(1.96) > 0$ the inequality is satisfied for any sample size $n \ge 2$ under Cochran's implied conditions. The Edgeworth expansion thus suggests overcoverage rather than undercoverage. When $p_2(1.96) < 0$, however, occurring when $\gamma_2$ is sufficiently large, then the inequality gives a rule for a minimum sample size $n > 11.688\,|p_2(1.96)|$ which again does not give Cochran's rule. In the next section, we return to criterion (3.1) for the statistic $U_n$.

Condition (3.1) on the coverage probability of the interval allows overcoverage at one end to compensate for undercoverage at the other, so a considerably stronger condition would be to require

$$\Pr(Z_n \le 1.96) \ge 0.97 \quad \text{and} \quad \Pr(Z_n \le -1.96) \le 0.03, \qquad (3.2)$$

controlling separately both tails of the two-sided interval. Now we obtain two quadratic inequalities in $1/\sqrt{n}$:

$$\frac{p_2(1.96)\,\phi(1.96)}{n} \pm \frac{p_1(1.96)\,\phi(1.96)}{\sqrt{n}} \ge -0.005.$$

These quadratics have no real roots when $\gamma_2$ is sufficiently large and negative, and the inequalities are satisfied for any $n$. Otherwise, taking the smaller positive root of the quadratic, we obtain

$$n > \frac{4\,(0.23569\gamma_1^2 - 0.06873\gamma_2)^2}{\{0.47360\,|\gamma_1| - \sqrt{0.14364\gamma_1^2 + 0.023522\gamma_2}\}^2}.$$
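The two-tail bound can be checked numerically. The sketch below assumes the infinite-population ($f = 0$) forms of $p_1$ and $p_2$ implied by the coefficients quoted in this section, and solves the worst-sign quadratic in $x = 1/\sqrt{n}$; the function names are hypothetical.

```python
import math


def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def p1(z, g1):
    """Skewness polynomial of the expansion for Z_n, assuming f = 0."""
    return -g1 * (z ** 2 - 1) / 6


def p2(z, g1, g2):
    """Second-order polynomial of the expansion for Z_n, assuming f = 0."""
    return -g2 * (z ** 3 - 3 * z) / 24 - g1 ** 2 * (z ** 5 - 10 * z ** 3 + 15 * z) / 72


def min_n_two_tail(g1, g2, z=1.96, tail_error=0.005):
    """Lower bound on n from a*x^2 - b*x + c >= 0 with x = 1/sqrt(n).

    Returns 0.0 when the inequality holds for every n.
    """
    a = p2(z, g1, g2)
    b = abs(p1(z, g1))
    c = tail_error / phi(z)
    if a == 0:
        return (b / c) ** 2  # the inequality is linear in x
    disc = b * b - 4 * a * c
    if disc < 0:
        return 0.0  # no real roots: satisfied for any n
    return 4 * a * a / (b - math.sqrt(disc)) ** 2


print(p1(1.96, 1.0), p2(1.96, 1.0, 0.0), p2(1.96, 0.0, 1.0))
print(min_n_two_tail(1.0, 0.0))
```

With $\gamma_2 = 0$ the bound is proportional to $\gamma_1^2$, and its coefficient rounds to 25 at two significant figures.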

When $\gamma_2 = 0$, Cochran's rule (1.3) is obtained on further rounding to two significant figures. Jones (1996) obtained a preliminary version of this result. We have thus demonstrated that Cochran's rule can be derived by using Edgeworth expansions when the coverage error is controlled in both tails, contrary to the statement in Section 1.

4. A new rule for $U_n$

In deriving a rule for the minimum sample size to validate a normal distribution approximation for the Studentized statistic, we return to Cochran's original condition on the total frequency of error as in inequality (3.1). We require an expansion for $U_n$ such that

$$\Pr(U_n \le 1.96) - \Pr(U_n \le -1.96) \ge 0.94. \qquad (4.1)$$

Substituting from Section 2.2 and using the fact that $q_1(u)$ is even and $q_2(u)$ is odd, we obtain on neglecting the Edgeworth error of $O(n^{-3/2})$

$$2\,q_2(1.96)\,\phi(1.96)/n \ge -0.01. \qquad (4.2)$$

Putting $\gamma_2$ and $f$ to 0, so that $q_2(1.96) = -2.3724 - 2.1170\gamma_1^2$, gives $n > 27.73 + 24.74\gamma_1^2$. On rounding to two significant figures we have


$$n > 28 + 25\gamma_1^2, \qquad (4.3)$$

which imposes a penalty of 28 extra sample units over Cochran's rule, for not knowing the variance (Jones, 1996). This is our recommended modification of Cochran's rule under his original condition of total coverage rather than coverage in both tails.

We now explore the implications of weakening Cochran's strong assumptions on the population size and the kurtosis $\gamma_2$. A more general approach, specifying

(a) $f \ne 0$,
(b) an undercoverage of at most $\epsilon > 0$ and
(c) a general normal percentile $u$,

gives from inequality (4.1) the following quadratic inequality for $n$:
21

[Nq)

F~~~~~~~~~~~~~ 36cE _2
1V)-362

3u I_{2 + 24 H3(tt) + H5(u)] u}+ N2 ?72u

{144u+ 72H3(u) +4H5(u)? + -{72u + 18H3(u)l )-oN{18H3(u)+36u}? N N _0(u) N + (4.4) + [72u+ 18H3(u) - 6-y2 H3(u) ?Y2{72u+ 48 H3(u) + 4 H5(u)}] < 0, -n

where $H_3(u) = u^3 - 3u$ and $H_5(u) = u^5 - 10u^3 + 15u$ are the Hermite polynomials used in the derivation of Edgeworth expansions. The smaller root of the quadratic can be plotted against $\gamma_1^2$ for various $N$ to see the finite population effect, and for various values of $\gamma_2$, as in Fig. 1. The smaller root for $\gamma_1 = \gamma_2 = 0$ is $\phi(u)\{2u + H_3(u)/2\}\epsilon^{-1} = 27.73$ for $u = 1.96$, $\epsilon = 0.01$ and any $N$. When $\gamma_2 = 8$ there is also a value of $\gamma_1$ for which the root is approximately the same for all $N$. This intriguing property has been confirmed for other values of $\gamma_2$.

In using Edgeworth expansions to order $1/n$ we are making approximations and so we tested our results by simulating 100000 samples from each of five widely varying populations. We found that our modified Cochran rule works in the sense that the minimum sample size given by inequality (4.3) gave adequate coverage in all cases. We then examined whether the Edgeworth approximations worked well for smaller sample sizes. Our simulations showed that the expansion to order $1/n$ gave a good approximation to the simulated distribution function at $u = \pm 1.96$.
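The constants 27.73 and 24.74 behind rule (4.3) can be reproduced in a few lines. The sketch below is restricted to $f = 0$ and assumes the standard second-order polynomial for a Studentized mean from Hall (1992), adjusted for the divisor $n - 1$ in $s^2$; that closed form is an assumption here, checked only against the numerical values quoted in the text with $\gamma_2 = 0$. The function names are hypothetical.

```python
import math


def phi(u):
    """Standard normal density."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def hermite3(u):
    """Hermite polynomial H3(u) = u^3 - 3u."""
    return u ** 3 - 3 * u


def q2_infinite(u, g1, g2=0.0):
    """Assumed f = 0 form of q2(u) for the Studentized mean (divisor n - 1)."""
    return u * (g2 * (u ** 2 - 3) / 12
                - g1 ** 2 * (u ** 4 + 2 * u ** 2 - 3) / 18
                - (u ** 2 + 1) / 4)


def min_n_total(g1, g2=0.0, u=1.96, eps=0.01):
    """Bound on n from 2 * q2(u) * phi(u) / n >= -eps; 0.0 if always satisfied."""
    q = q2_infinite(u, g1, g2)
    if q >= 0:
        return 0.0
    return 2 * phi(u) * (-q) / eps


base = min_n_total(0.0)              # intercept of the rule
slope = min_n_total(1.0) - base      # coefficient of gamma_1 squared
closed_form = phi(1.96) * (2 * 1.96 + hermite3(1.96) / 2) / 0.01
print(round(base, 2), round(slope, 2), round(closed_form, 2))
```

Rounding the intercept and slope to two significant figures gives the form $28 + 25\gamma_1^2$ of rule (4.3).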
Fig. 1. Minimum sample sizes given by the smallest root of the quadratic in inequality (4.4), plotted against $\gamma_1^2$, for (a) $\gamma_2 = 0$ and (b) $\gamma_2 = 8$, with curves for $N = 100$, $N = 200$, $N = 500$, $N = 1000$ and $N = 100000$


5. Conclusions
We have shown that Cochran's rule can be derived for a standardized mean under SRS by using Edgeworth expansions. We have extended the rule to the Studentized mean and our modified rule for minimum sample size is that $n > 28 + 25\gamma_1^2$. A further modification is necessary for small population sizes and for positive kurtosis. We gave a general result in inequality (4.4), illustrated in Fig. 1. Our modification of Cochran's rule gives a conservative estimate for the minimum sample size under a wide range of conditions. We have demonstrated empirically that, when $n$ satisfies our rule, nominal 95% intervals have adequate coverage.

We have chosen to use Edgeworth expansions for the cumulative distribution function since they are the basis of Cochran's rule. It may be possible to improve these approximations by using saddlepoint expansions for Studentized means as in Robinson and Skovgaard (1998), and this is a topic for future research. We have not used Cornish-Fisher expansions for the percentiles of the exact distribution since from Thompson (1997), page 72,

'Experience has shown that limits obtained (by this method) do not have particularly good coverage accuracy in small samples, even in the case $f = 0$, $\gamma_1 = \gamma_2 = 0$'

(our italics and notation).
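An empirical check of this kind can be sketched as a small Monte Carlo experiment. The code below is not the authors' simulation: the population (5000 draws from an exponential distribution), the seeds, the number of replicates and the function names are all assumptions made for illustration. It applies the modified rule to choose $n$ and estimates the coverage of the Studentized interval (2.2) under SRS without replacement.

```python
import math
import random


def fisher_skewness(values):
    """Population skewness gamma_1 = mu_3 / mu_2**1.5 (divisor N)."""
    n = len(values)
    mean = sum(values) / n
    mu2 = sum((v - mean) ** 2 for v in values) / n
    mu3 = sum((v - mean) ** 3 for v in values) / n
    return mu3 / mu2 ** 1.5


def modified_rule_n(values):
    """Smallest integer n with n > 28 + 25 * gamma_1**2."""
    return math.floor(28 + 25 * fisher_skewness(values) ** 2) + 1


def simulated_coverage(population, n, reps=4000, seed=1, u=1.96):
    """Monte Carlo coverage of the Studentized interval under SRS."""
    rng = random.Random(seed)
    N = len(population)
    f = n / N
    target = sum(population) / N
    hits = 0
    for _ in range(reps):
        sample = rng.sample(population, n)  # sampling without replacement
        ybar = sum(sample) / n
        s2 = sum((y - ybar) ** 2 for y in sample) / (n - 1)
        half_width = u * math.sqrt(s2 * (1 - f) / n)
        hits += abs(ybar - target) <= half_width
    return hits / reps


# Hypothetical skewed finite population.
pop_rng = random.Random(0)
population = [pop_rng.expovariate(1.0) for _ in range(5000)]
n_rule = modified_rule_n(population)
coverage = simulated_coverage(population, n_rule)
print(n_rule, round(coverage, 3))
```

With a few thousand replicates the Monte Carlo standard error is roughly 0.004, so the estimate can only distinguish coverage near 94% from markedly worse behaviour, not resolve the third decimal place.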

References
Bai, Z. D. and Rao, C. R. (1991) Edgeworth expansion of a function of sample means. Ann. Statist., 19, 1295-1315.
Bowley, A. (1926) Measurement of precision attained in sampling. Bull. Int. Statist. Inst., 22, book 1, 1-45.
Cochran, W. G. (1977) Sampling Techniques, 3rd edn. New York: Wiley.
Erdős, P. and Rényi, A. (1959) On the central limit theorem for samples from a finite population. Publ. Math. Inst. Hung. Acad. Sci., 4, 49-57.
Hájek, J. (1960) Limiting distributions in simple random sampling from a finite population. Publ. Math. Inst. Hung. Acad. Sci., 5, 361-374.
Hall, P. (1992) The Bootstrap and Edgeworth Expansion. New York: Springer.
Jones, R. P. (1996) The central limit theorem for finite populations. MSc Thesis. University of Southampton, Southampton.
Madow, W. G. (1948) On the limiting distributions of estimates based on samples from finite universes. Ann. Math. Statist., 19, 535-545.
Plane, D. R. and Gordon, K. R. (1982) A simple proof of the nonapplicability of the Central Limit Theorem to finite populations. Am. Statistn, 36, 175-176.
Robinson, J. (1978) An asymptotic expansion for samples from a finite population. Ann. Statist., 6, 1005-1011.
Robinson, J. and Skovgaard, I. M. (1998) Bounds for probabilities of small relative errors for empirical saddlepoint and bootstrap tail approximations. Ann. Statist., 26, 2369-2394.
Rosén, B. (1972) Asymptotic theory for successive sampling with varying probabilities without replacement, parts I and II. Ann. Math. Statist., 43, 373-397, 748-776.
Sugden, R. A. and Smith, T. M. F. (1997) Edgeworth approximations to the distribution of the sample mean under simple random sampling. Statist. Probab. Lett., 34, 293-299.
Thompson, M. E. (1997) Theory of Sample Surveys. London: Chapman and Hall.
