Determining sample size necessary for bootstrap method / Proposed Method - Cross Validated
Determining sample size necessary for bootstrap method / Proposed Method
I know this is a rather hot topic for which no one can really give a simple answer. Nevertheless, I am wondering whether the following approach couldn't be useful.
The bootstrap method is only useful if your sample follows more or less (read: exactly) the same distribution as the original population. In order to be certain this is the case, you need to make your sample size large enough. But what is large enough?
If my premise is correct, you have the same problem when using the central limit theorem to determine the population mean. Only when your sample size is large enough can you be certain that the population of your sample means is normally distributed (around the population mean). In other words, your samples need to represent your population (distribution) well enough. But again, what is large enough?
In my case (administrative processes: time needed to finish a demand vs. number of demands) I have a population with a multimodal distribution (all the demands that were finished in 2011), of which I am 99% certain that it is even less normally distributed than the population I want to research (all the demands that are finished between the present day and a day in the past; ideally this time span is as small as possible).
My 2011 population consists of enough units to make x samples of sample size n. I choose a value of x, say 10 (x = 10). Now I use trial and error to determine a good sample size. I take n = 50 and check whether my population of sample means is normally distributed using the Kolmogorov-Smirnov test. If so, I repeat the same steps with a sample size of 40; if not, I repeat with a sample size of 60 (etc.).
After a while I conclude that n = 45 is the absolute minimum sample size to get a more or less good representation of my 2011 population. Since I know my population of interest (all the demands that are finished between the present day and a day in the past) has less variance, I can safely use a sample size of n = 45 to bootstrap. (Indirectly, n = 45 determines the size of my time span: the time needed to finish 45 demands.)
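The trial-and-error procedure described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the poster's actual code: the x samples are drawn without replacement as disjoint chunks of the 2011 data, the normal distribution in the Kolmogorov-Smirnov comparison is fitted to the sample means themselves, and the pass/fail threshold is the rough large-sample 5% critical value 1.36/sqrt(x), which is only approximate for small x and conservative when the mean and standard deviation are estimated from the same data. The search uses a fixed step of 10; a finer step would be needed to home in on a value such as 45. The function names and the synthetic data are hypothetical.

```python
import math
import random
import statistics
from statistics import NormalDist


def ks_statistic_vs_normal(values):
    """KS distance between the empirical CDF of `values` and a normal CDF
    whose mean and sd are estimated from the same values."""
    fitted = NormalDist(statistics.fmean(values), statistics.stdev(values))
    xs = sorted(values)
    m = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = fitted.cdf(x)
        # The empirical CDF jumps from i/m to (i+1)/m at x: check both sides.
        d = max(d, f - i / m, (i + 1) / m - f)
    return d


def means_look_normal(population, n, x, rng):
    """Draw x disjoint samples of size n (without replacement) and apply a
    rough 5% KS check to their means. Requires x * n <= len(population)."""
    units = rng.sample(population, x * n)
    means = [statistics.fmean(units[k * n:(k + 1) * n]) for k in range(x)]
    return ks_statistic_vs_normal(means) <= 1.36 / math.sqrt(x)


def smallest_passing_n(population, x=10, start=50, step=10, seed=0):
    """Step n down while the check keeps passing, or up until it first passes.
    Returns None if no n with x * n <= len(population) passes."""
    rng = random.Random(seed)
    n = start
    if means_look_normal(population, n, x, rng):
        while n - step >= step and means_look_normal(population, n - step, x, rng):
            n -= step
        return n
    while (n + step) * x <= len(population):
        n += step
        if means_look_normal(population, n, x, rng):
            return n
    return None


# Hypothetical stand-in for the 2011 data: a skewed, bimodal mixture.
data_rng = random.Random(1)
data = [data_rng.expovariate(1.0) if data_rng.random() < 0.8 else 5 + data_rng.gauss(0, 1)
        for _ in range(5000)]
print(smallest_passing_n(data))
```

Because x is also the number of points the KS test sees, it controls how sensitive this check is: with x = 10 the test can only detect fairly gross departures from normality.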
This is, in short, my idea. But since I am not a statistician but an engineer whose statistics lessons took place in days of yore, I cannot exclude the possibility that I just generated a lot of rubbish :). What do you guys think? If my premise makes sense, do I need to choose an x larger than 10, or smaller? Depending on your answers (do I need to feel embarrassed or not? :) I'll be posting some more discussion ideas.
Response to first answer
Thanks for replying, your answer was very useful to me, especially the book links.
But I am afraid that in my attempt to give information I completely clouded my question. I know that the bootstrap samples take over the distribution of the population sample. I follow you completely, but...
Your original population sample needs to be large enough to be moderately certain that the distribution of your population sample corresponds with (equals) the 'real' distribution of the population.
This is merely an idea on how to determine how large your original sample needs to be in order to be reasonably certain that the sample distribution corresponds with the population distribution.
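For the narrower goal stated here, making the sample distribution close to the population distribution, one standard distribution-free tool (not proposed by either poster) is the Dvoretzky-Kiefer-Wolfowitz inequality with Massart's constant: P(sup_x |F_n(x) - F(x)| > ε) ≤ 2 exp(-2 n ε²), where F_n is the empirical CDF of a random sample of size n. Solving 2 exp(-2 n ε²) ≤ α for n gives a sample size that guarantees, with probability at least 1 - α, that the empirical CDF stays within ε of the true CDF everywhere. The function name and the ε and α values are illustrative choices.

```python
import math


def dkw_sample_size(eps, alpha):
    """Smallest n with 2 * exp(-2 * n * eps**2) <= alpha (DKW/Massart bound)."""
    return math.ceil(math.log(2 / alpha) / (2 * eps ** 2))


for eps in (0.10, 0.05):
    print(f"eps={eps}: n >= {dkw_sample_size(eps, alpha=0.05)}")
# eps=0.1: n >= 185
# eps=0.05: n >= 738
```

The bound does not depend on the shape of the population distribution, so the same n applies to a bimodal or otherwise non-normal population.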
Suppose you have a bimodal population distribution and one peak is a lot larger than the other one. If your sample size is 5, the chance is high that all 5 units have a value very close to the large peak (the chance to randomly draw a unit there is the largest). In this case your sample distribution will look unimodal.
With a sample size of a hundred, the chance that your sample distribution is also bimodal is a lot larger!! The trouble with bootstrapping is that you only have one sample (and you build further on that sample). If the sample distribution really does not correspond with the population distribution, you are in trouble. This is just an idea to make the chance of having 'a bad sample distribution' as low as possible without having to make your sample size infinitely large.
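The bimodal example can be made concrete with the binomial distribution. Assume, purely for illustration, that the smaller peak holds a fraction w = 0.1 of the population and that a sample "looks bimodal" only if at least k = 3 units come from that peak (both numbers are hypothetical). Treating draws as independent, which is reasonable when the population is much larger than the sample, the number of units from the small peak is Binomial(n, w).

```python
from math import comb


def prob_fewer_than(k, n, w):
    """P(fewer than k of n independent draws land in a peak holding fraction w)."""
    return sum(comb(n, j) * w ** j * (1 - w) ** (n - j) for j in range(k))


for n in (5, 20, 50, 100):
    print(n, round(prob_fewer_than(3, n, 0.1), 4))
```

Under these assumptions the small peak is missed entirely with probability 0.9^5 ≈ 0.59 when n = 5, the sample has fewer than 3 units from it with probability about 0.99, and that probability drops below 0.002 when n = 100.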
bootstrap sample-size methodology
asked Jul 29 '12 at 14:02 by siegfried, edited Jul 29 '12 at 16:58
1 Answer
I took interest in this question because I saw the word bootstrap and I have written books on the bootstrap. Also, people often ask "How many bootstrap samples do I need to get a good Monte Carlo approximation to the bootstrap result?" My suggested answer to that question is to keep increasing the number of bootstrap samples until you get convergence. No one number fits all problems.
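The "keep increasing until convergence" advice can be sketched as a loop that doubles the number of bootstrap resamples B until the bootstrap standard error of the mean changes by less than a chosen relative tolerance. The starting B, the doubling rule, the 1% tolerance, the exponential test data and the function names are illustrative assumptions, not a recommendation from the answer. The mean is used because its exact (infinite-B) bootstrap standard error is known in closed form, the plug-in standard deviation divided by sqrt(n), which lets the result be checked.

```python
import math
import random
import statistics


def bootstrap_se_of_mean(sample, b, rng):
    """Monte Carlo bootstrap standard error of the mean from b resamples."""
    n = len(sample)
    means = [statistics.fmean(rng.choices(sample, k=n)) for _ in range(b)]
    return statistics.stdev(means)


def se_until_converged(sample, start=100, rel_tol=0.01, max_b=51_200, seed=1):
    """Double b until two successive SE estimates agree within rel_tol."""
    rng = random.Random(seed)
    b = start
    prev = bootstrap_se_of_mean(sample, b, rng)
    while b < max_b:
        b *= 2
        cur = bootstrap_se_of_mean(sample, b, rng)
        if abs(cur - prev) <= rel_tol * prev:
            return b, cur
        prev = cur
    return b, prev  # stopped at max_b without meeting the tolerance


data_rng = random.Random(42)
sample = [data_rng.expovariate(1.0) for _ in range(30)]
b, se = se_until_converged(sample)
exact = statistics.pstdev(sample) / math.sqrt(len(sample))
print(f"B = {b}, bootstrap SE = {se:.4f}, exact bootstrap SE = {exact:.4f}")
```

Comparing two successive noisy estimates is only a heuristic stopping rule; for statistics without a closed-form answer there is nothing like `exact` to check against.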
But that is apparently not the question you are asking. You seem to be asking what the original sample size needs to be for the bootstrap to work. First of all, I do not agree with your premise. The basic nonparametric bootstrap assumes that the sample is taken at random from a population. So for any sample size n, the distribution for samples chosen at random is the sampling distribution assumed in bootstrapping. The bootstrap principle says that choosing a random sample of size n from the population can be mimicked by choosing a bootstrap sample of size n from the original sample. Whether or not the bootstrap principle holds does not depend on any individual sample "looking representative of the population". What it does depend on is what you are estimating and some properties of the population distribution (e.g., this works for sample means with population distributions that have finite variances, but not when they have infinite variances). It will not work for estimating extremes regardless of the population distribution.
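The failure for extremes can be seen in a small calculation. When the sample maximum is unique, a bootstrap resample of size n misses it only if all n draws miss it, so the bootstrap maximum equals the observed maximum with probability 1 - (1 - 1/n)^n, which tends to 1 - 1/e ≈ 0.632 as n grows. The bootstrap distribution of the maximum therefore keeps a large point mass at a single value however large the sample is, while the maximum of a sample from a continuous population has no such atom. The sketch checks the formula by simulation; the exponential data and the function names are illustrative.

```python
import math
import random


def prob_boot_max_is_sample_max(n):
    """P(max of a size-n resample equals the (unique) sample maximum)."""
    return 1 - (1 - 1 / n) ** n


def simulated_prob(sample, reps, rng):
    """Fraction of bootstrap resamples whose maximum is the sample maximum."""
    top = max(sample)
    hits = sum(max(rng.choices(sample, k=len(sample))) == top for _ in range(reps))
    return hits / reps


rng = random.Random(7)
for n in (10, 50, 200):
    sample = [rng.expovariate(1.0) for _ in range(n)]  # continuous data: no ties
    print(n, round(prob_boot_max_is_sample_max(n), 4), simulated_prob(sample, 20_000, rng))
print("limit:", round(1 - 1 / math.e, 4))
```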
The theory of the bootstrap involves showing consistency of the estimate. So it can be shown in theory that it works for large samples. But it can also work in small samples. I have seen it work particularly well for classification error rate estimation in small sample sizes, such as 20 for bivariate data.
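As an illustration of bootstrap error-rate estimation at that sample size, the sketch below estimates the error rate of a simple nearest-centroid classifier on 20 bivariate points with an out-of-bootstrap estimate: the classifier is fit on each bootstrap resample and scored on the original points that the resample left out. The classifier, the synthetic Gaussian data, B = 200 and the function names are assumptions for illustration. This estimate is closely related to Efron's e0 estimator, which the .632 estimator combines with the apparent error rate; it is not necessarily the estimator the answer refers to.

```python
import random
import statistics


def fit_centroids(points, labels):
    """Nearest-centroid 'training': the mean point of each class."""
    centroids = {}
    for lab in set(labels):
        members = [p for p, l in zip(points, labels) if l == lab]
        centroids[lab] = (statistics.fmean(p[0] for p in members),
                          statistics.fmean(p[1] for p in members))
    return centroids


def predict(centroids, p):
    """Label of the centroid closest to point p (squared Euclidean distance)."""
    return min(centroids, key=lambda lab: (p[0] - centroids[lab][0]) ** 2
                                          + (p[1] - centroids[lab][1]) ** 2)


def out_of_bootstrap_error(points, labels, b=200, seed=0):
    """Pooled error rate on the points left out of each bootstrap resample."""
    rng = random.Random(seed)
    n = len(points)
    errors = tested = 0
    for _ in range(b):
        idx = [rng.randrange(n) for _ in range(n)]
        if len({labels[i] for i in idx}) < 2:
            continue  # resample lacks a class, so no classifier can be fit
        cents = fit_centroids([points[i] for i in idx], [labels[i] for i in idx])
        for i in set(range(n)) - set(idx):
            tested += 1
            errors += predict(cents, points[i]) != labels[i]
    return errors / tested


rng = random.Random(3)
points = ([(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(10)]
          + [(rng.gauss(1.5, 1), rng.gauss(1.5, 1)) for _ in range(10)])
labels = [0] * 10 + [1] * 10
print(f"out-of-bootstrap error estimate: {out_of_bootstrap_error(points, labels):.3f}")
```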
Now if the sample size is very small, say 4, the bootstrap may not work just because the set of possible bootstrap samples is not rich enough. This issue of too small a sample size is discussed in my book and in Peter Hall's book. But the number of distinct bootstrap samples gets large very quickly, so this is not an issue even for sample sizes as small as 8. You can take a look at these references:
My book: Bootstrap Methods: A Guide for Practitioners and Researchers
Hall's book: The Bootstrap and Edgeworth Expansion
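The remark that the number of distinct bootstrap samples gets large very quickly can be checked directly. If the n original observations are all distinct, a bootstrap sample is determined by how many times each observation is drawn, so the number of distinct samples is the number of multisets of size n from n items, C(2n - 1, n). The function name is illustrative.

```python
from math import comb


def distinct_bootstrap_samples(n):
    """Number of distinct size-n resamples (multisets) of n distinct values."""
    return comb(2 * n - 1, n)


for n in (4, 8, 20):
    print(n, distinct_bootstrap_samples(n))
# 4 35
# 8 6435
# 20 68923264410
```

These counts do not mean the resamples are equally likely: a resample in which observation i appears k_i times has probability n! / (k_1! ... k_n! n^n).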
answered Jul 29 '12 at 14:44 by Michael Chernick, edited May 13 '14 at 16:50 by Gregor
Source: http://stats.stackexchange.com/questions/33300/determiningsamplesizenecessaryforbootstrapmethodproposedmethod (page captured 6/6/2016)