
6/6/2016

Determining sample size necessary for bootstrap method / Proposed Method - Cross Validated


Determining sample size necessary for bootstrap method / Proposed Method
I know this is a rather hot topic for which no one can really give a simple answer. Nevertheless, I am wondering whether the following approach could be useful.
The bootstrap method is only useful if your sample follows more or less (read: exactly) the same distribution as the original population. In order to be certain this is the case, you need to make your sample size large enough. But what is large enough?
If my premise is correct, you have the same problem when using the central limit theorem to determine the population mean. Only when your sample size is large enough can you be certain that the population of your sample means is normally distributed (around the population mean). In other words, your samples need to represent your population (distribution) well enough. But again, what is large enough?
In my case (administrative processes: time needed to finish a demand vs. number of demands) I have a population with a multimodal distribution (all the demands that were finished in 2011), of which I am 99% certain that it is even less normally distributed than the population I want to research (all the demands that are finished between the present day and a day in the past; ideally this time span is as small as possible).
My 2011 population consists of enough units to make x samples of sample size n. I choose a value of x, say 10 (x = 10). Now I use trial and error to determine a good sample size. I take n = 50 and check whether my population of sample means is normally distributed by using Kolmogorov-Smirnov. If so, I repeat the same steps with a sample size of 40; if not, I repeat with a sample size of 60 (etc.).
After a while I conclude that n = 45 is the absolute minimum sample size to get a more or less good representation of my 2011 population. Since I know my population of interest (all the demands that are finished between the present day and a day in the past) has less variance, I can safely use a sample size of n = 45 to bootstrap. (Indirectly, n = 45 determines the size of my time span: the time needed to finish 45 demands.)
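
The trial-and-error procedure described above can be sketched roughly as follows. This is only an illustration under assumptions that are not in the question: the population is a made-up bimodal mixture standing in for the 2011 data, the x samples are drawn independently without replacement, and the Kolmogorov-Smirnov statistic is compared with the asymptotic 5% critical value 1.36 / sqrt(x). Because the normal's mean and standard deviation are estimated from the same sample means, that critical value is only a rough guide, and with x = 10 the test has little power to reject normality.

```python
import math
import random
from statistics import NormalDist, mean, stdev

def ks_statistic_normal(values):
    """KS distance between the empirical CDF of `values` and a normal
    distribution whose mean and stdev are estimated from the same values."""
    fitted = NormalDist(mean(values), stdev(values))
    ordered = sorted(values)
    k = len(ordered)
    d = 0.0
    for i, v in enumerate(ordered):
        cdf = fitted.cdf(v)
        # The empirical CDF jumps from i/k to (i+1)/k at v; check both sides.
        d = max(d, (i + 1) / k - cdf, cdf - i / k)
    return d

def sample_means(population, n, x, rng):
    """Means of x random samples of size n, each drawn without replacement."""
    return [mean(rng.sample(population, n)) for _ in range(x)]

rng = random.Random(0)
# Hypothetical bimodal "time to finish a demand" population (not real data).
population = ([rng.gauss(10, 2) for _ in range(8000)]
              + [rng.gauss(40, 5) for _ in range(2000)])

x = 10
for n in (30, 40, 50, 60):
    d = ks_statistic_normal(sample_means(population, n, x, rng))
    # True means "normality not rejected" under the rough critical value.
    print(n, round(d, 3), d < 1.36 / math.sqrt(x))
```

Running it for several values of x gives a feel for how much the verdict depends on the number of sample means rather than on n itself.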
This is, in short, my idea. But since I am not a statistician but an engineer whose statistics lessons took place in the days of yonder, I cannot exclude the possibility that I just generated a lot of rubbish :). What do you guys think? If my premise makes sense, do I need to choose an x larger than 10, or smaller? Depending on your answers (do I need to feel embarrassed or not? :) I'll be posting some more discussion ideas.
Response to the first answer
Thanks for replying. Your answer was very useful to me, especially the book links.
But I am afraid that in my attempt to give information I completely clouded my question. I know that the bootstrap samples take over the distribution of the population sample. I follow you completely, but...
Your original population sample needs to be large enough to be moderately certain that the distribution of your population sample corresponds to (equals) the 'real' distribution of the population.
This is merely an idea on how to determine how large your original sample size needs to be in order to be reasonably certain that the sample distribution corresponds to the population distribution.
Suppose you have a bimodal population distribution and one peak is a lot larger than the other. If your sample size is 5, the chance is large that all 5 units have a value very close to the large peak (the chance of randomly drawing a unit there is the largest). In this case your sample distribution will look unimodal.
With a sample size of a hundred, the chance that your sample distribution is also bimodal is a lot larger! The trouble with bootstrapping is that you only have one sample (and you build further on that sample). If the sample distribution really does not correspond to the population distribution, you are in trouble. This is just an idea to make the chance of having 'a bad sample distribution' as low as possible without having to make your sample size infinitely large.
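
To put a number on the bimodal example: if the two peaks are well separated and a fraction w of the population sits in the small one, the chance that n independent draws all land in the large peak is (1 - w)^n. The sketch below uses w = 0.1, which is an assumed value for illustration and not taken from my data. Note that getting at least one unit from the small peak still does not guarantee the sample looks bimodal, so this only bounds the problem from one side.

```python
def prob_miss_minor_mode(n, minor_weight):
    """Probability that n independent draws all come from the major mode."""
    return (1.0 - minor_weight) ** n

# minor_weight=0.1 is a hypothetical mixture weight chosen for illustration.
for n in (5, 20, 45, 100):
    print(n, prob_miss_minor_mode(n, minor_weight=0.1))
```

With these assumed numbers the small peak is missed entirely in more than half of the samples of size 5, and almost never in samples of size 100.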
Tags: bootstrap, sample-size, methodology
edited Jul 29 '12 at 16:58

asked Jul 29 '12 at 14:02

siegfried

1 Answer

I took an interest in this question because I saw the word bootstrap and I have written books on the bootstrap. Also, people often ask "How many bootstrap samples do I need to get a good Monte Carlo approximation to the bootstrap result?" My suggested answer to that question is to keep increasing the number of bootstrap samples until you get convergence. No one number fits all problems.
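
A minimal sketch of "keep increasing until you get convergence", for the bootstrap standard error of a mean. The stopping rule (double the number of resamples until two successive estimates agree within 2%) and the names `bootstrap_se` and `se_until_stable` are assumptions made for illustration; two estimates can agree by chance, so a stricter rule or a repeat run may be sensible.

```python
import random
from statistics import stdev

def bootstrap_se(sample, b, rng):
    """Bootstrap standard error of the mean, estimated from b resamples."""
    n = len(sample)
    means = [sum(rng.choices(sample, k=n)) / n for _ in range(b)]
    return stdev(means)

def se_until_stable(sample, rng, start=200, rel_tol=0.02, max_b=51200):
    """Double b until two successive estimates differ by at most rel_tol
    (relative), or until max_b is reached. Returns (b, estimate)."""
    b = start
    previous = bootstrap_se(sample, b, rng)
    while b < max_b:
        b *= 2
        current = bootstrap_se(sample, b, rng)
        if abs(current - previous) <= rel_tol * previous:
            return b, current
        previous = current
    return b, previous

rng = random.Random(1)
# Hypothetical skewed sample of size 45 (not real data).
sample = [rng.expovariate(1.0) for _ in range(45)]
print(se_until_stable(sample, rng))
```

The number of resamples at which this settles depends on the statistic and the tolerance, which is the sense in which no one number fits all problems.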
But that is apparently not the question you are asking. You seem to be asking what the original sample size needs to be for the bootstrap to work. First of all, I do not agree with your premise. The basic nonparametric bootstrap assumes that the sample is taken at random from a population. So for any sample size n, the distribution for samples chosen at random is the sampling distribution assumed in bootstrapping. The bootstrap principle says that choosing a random sample of size n from the population can be mimicked by choosing a bootstrap sample of size n from the original sample. Whether or not the bootstrap principle holds does not depend on any individual sample "looking representative of the population". What it does depend on is what you are estimating and some properties of the population distribution (e.g., this works for sampling means with population distributions that have finite variances, but not when they have infinite variances). It will not work for estimating extremes regardless of the population distribution.
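
One way to see the problem with extremes: a bootstrap resample of size n contains the largest observation with probability 1 - (1 - 1/n)^n, which is about 0.63 for moderate n. So the bootstrap distribution of the maximum piles most of its mass on a single value, while the true sampling distribution of the maximum from a continuous population has no such point mass. The sketch below checks this by simulation; the uniform sample and the function name are illustrative assumptions.

```python
import random

def frac_resamples_hitting_max(sample, b, rng):
    """Fraction of b bootstrap resamples whose maximum equals the sample maximum."""
    n = len(sample)
    top = max(sample)
    hits = sum(max(rng.choices(sample, k=n)) == top for _ in range(b))
    return hits / b

rng = random.Random(42)
sample = [rng.random() for _ in range(50)]  # hypothetical continuous data
print(frac_resamples_hitting_max(sample, 5000, rng))  # simulated fraction
print(1 - (1 - 1 / 50) ** 50)                         # exact probability for n = 50
```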
The theory of the bootstrap involves showing consistency of the estimate. So it can be shown in theory that it works for large samples. But it can also work in small samples. I have seen it work particularly well for classification error rate estimation in small sample sizes such as 20 for bivariate data.
Now if the sample size is very small, say 4, the bootstrap may not work just because the set of possible bootstrap samples is not rich enough. In my book or Peter Hall's book this issue of too small a sample size is discussed. But the number of distinct bootstrap samples gets large very quickly. So this is not an issue even for sample sizes as small as 8. You can take a look at these references:
My book: Bootstrap Methods: A Guide for Practitioners and Researchers
Hall's book: The Bootstrap and Edgeworth Expansion
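
For the remark about very small samples: if the n observations are all distinct and the order within a resample is ignored, the number of distinct bootstrap samples is the binomial coefficient C(2n - 1, n). A quick check of how fast that grows (the function name is just for illustration):

```python
from math import comb

def distinct_bootstrap_samples(n):
    """Number of distinct resamples (as multisets) of n distinct observations."""
    return comb(2 * n - 1, n)

for n in (4, 8, 20):
    print(n, distinct_bootstrap_samples(n))
# 4 35
# 8 6435
# 20 68923264410
```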
edited May 13 '14 at 16:50

Gregor

answered Jul 29 '12 at 14:44

Michael Chernick

http://stats.stackexchange.com/questions/33300/determiningsamplesizenecessaryforbootstrapmethodproposedmethod
