
8/28/2015

Gradient boosting - Wikipedia, the free encyclopedia

Gradient boosting
From Wikipedia, the free encyclopedia

Gradient boosting is a machine learning technique for regression and classification problems, which produces a prediction model in the form of an ensemble of weak prediction models, typically decision trees. It builds the model in a stage-wise fashion like other boosting methods do, and it generalizes them by allowing optimization of an arbitrary differentiable loss function.

The idea of gradient boosting originated in the observation by Leo Breiman[1] that boosting can be interpreted as an optimization algorithm on a suitable cost function. Explicit regression gradient boosting algorithms were subsequently developed by Jerome H. Friedman,[2][3] simultaneously with the more general functional gradient boosting perspective of Llew Mason, Jonathan Baxter, Peter Bartlett and Marcus Frean.[4][5] The latter two papers introduced the abstract view of boosting algorithms as iterative functional gradient descent algorithms: that is, algorithms that optimize a cost functional over function space by iteratively choosing a function (weak hypothesis) that points in the negative gradient direction. This functional gradient view of boosting has led to the development of boosting algorithms in many areas of machine learning and statistics beyond regression and classification.

Contents

1 Informal introduction
2 Algorithm
3 Gradient tree boosting
    3.1 Size of trees
4 Regularization
    4.1 Shrinkage
    4.2 Stochastic gradient boosting
    4.3 Number of observations in leaves
    4.4 Penalize Complexity of Tree
5 Usage
6 Names
7 See also
8 References

Informal introduction

(This section follows the exposition of gradient boosting by Li.[6])

Like other boosting methods, gradient boosting combines weak learners into a single strong learner, in an iterative fashion. It is easiest to explain in the least-squares regression setting, where the goal is to learn a model $F$ that predicts values $\hat{y} = F(x)$, minimizing the mean squared error $(\hat{y} - y)^2$ to the true values $y$ (averaged over some training set).

At each stage $m$ ($1 \le m \le M$) of gradient boosting, it may be assumed that there is some imperfect model $F_m$ (at the outset, a very weak model that just predicts the mean of $y$ in the training set could be used). The gradient boosting algorithm does not change $F_m$ in any way; instead, it improves on it by constructing a new model that adds an estimator $h$ to provide a better model $F_{m+1}(x) = F_m(x) + h(x)$. The question is now, how to find $h$? The gradient boosting solution starts with the observation that a perfect $h$ would imply

$F_{m+1}(x) = F_m(x) + h(x) = y$,

or, equivalently,

$h(x) = y - F_m(x)$.

Therefore, gradient boosting will fit $h$ to the residual $y - F_m(x)$. Like in other boosting variants, each $F_{m+1}$ learns to correct its predecessor $F_m$. A generalization of this idea to loss functions other than squared error (and to classification and ranking problems) follows from the observation that residuals $y - F(x)$ are the negative gradients of the squared error loss function $\frac{1}{2}(y - F(x))^2$. So, gradient boosting is a gradient descent algorithm, and generalizing it entails "plugging in" a different loss and its gradient.
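The residual-fitting loop described above can be sketched in plain Python. This is an illustration only, not a reference implementation: the weak learner is a hand-rolled one-dimensional regression stump, and the names (`fit_stump`, `boost`) are invented for the example.

```python
def fit_stump(x, r):
    """Weak learner: a depth-1 regression tree fit to residuals r by
    exhaustive search over split points, predicting the mean residual
    on each side of the chosen split."""
    best = None
    for s in sorted(set(x))[:-1]:
        left = [ri for xi, ri in zip(x, r) if xi <= s]
        right = [ri for xi, ri in zip(x, r) if xi > s]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = (sum((ri - lm) ** 2 for ri in left)
               + sum((ri - rm) ** 2 for ri in right))
        if best is None or err < best[0]:
            best = (err, s, lm, rm)
    _, s, lm, rm = best
    return lambda xi: lm if xi <= s else rm

def boost(x, y, n_stages):
    """Least-squares gradient boosting: start from the mean of y, then
    repeatedly fit a stump h to the residuals y - F_m(x) and set
    F_{m+1} = F_m + h."""
    f0 = sum(y) / len(y)
    stages = [lambda xi: f0]
    for _ in range(n_stages):
        residuals = [yi - sum(h(xi) for h in stages)
                     for xi, yi in zip(x, y)]
        stages.append(fit_stump(x, residuals))
    return lambda xi: sum(h(xi) for h in stages)

x = [1, 2, 3, 4, 5, 6]
y = [1.0, 1.2, 2.9, 3.1, 5.0, 5.2]
F = boost(x, y, n_stages=20)
mse = sum((F(xi) - yi) ** 2 for xi, yi in zip(x, y)) / len(y)
```

Each stage can only reduce the training error, so after enough stages `mse` is close to zero on this toy set; that tendency to fit the training set exactly is precisely what the regularization techniques later in the article address.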

Algorithm

In many supervised learning problems one has an output variable $y$ and a vector of input variables $x$, connected together via a joint probability distribution $P(x, y)$. Using a training set $\{(x_1, y_1), \ldots, (x_n, y_n)\}$ of known values of $x$ and corresponding values of $y$, the goal is to find an approximation $\hat{F}(x)$ to a function $F^*(x)$ that minimizes the expected value of some specified loss function $L(y, F(x))$:

$F^* = \arg\min_F \, \mathbb{E}_{x,y}[L(y, F(x))]$.

The gradient boosting method assumes a real-valued $y$ and seeks an approximation $\hat{F}(x)$ in the form of a weighted sum of functions $h_i(x)$ from some class $\mathcal{H}$, called base (or weak) learners:

$\hat{F}(x) = \sum_{i=1}^{M} \gamma_i h_i(x) + \mathrm{const}$.

In accordance with the empirical risk minimization principle, the method tries to find an approximation $\hat{F}(x)$ that minimizes the average value of the loss function on the training set. It does so by starting with a model consisting of a constant function $F_0(x)$, and incrementally expanding it in a greedy fashion:

$F_0(x) = \arg\min_{\gamma} \sum_{i=1}^{n} L(y_i, \gamma)$,

$F_m(x) = F_{m-1}(x) + \arg\min_{f \in \mathcal{H}} \sum_{i=1}^{n} L(y_i, F_{m-1}(x_i) + f(x_i))$,

where $f$ is restricted to be a function from the class $\mathcal{H}$ of base learner functions.

However, the problem of choosing at each step the best $f$ for an arbitrary loss function $L$ is a hard optimization problem in general, and so we'll "cheat" by solving a much easier problem instead. The idea is to apply a steepest descent step to this minimization problem. If we only cared about predictions at the points of the training set, and $f$ were unrestricted, we'd update the model per the following equations, where we view $L(y, f)$ not as a functional of $f$, but as a function of the vector of values $f(x_1), \ldots, f(x_n)$:

$F_m(x) = F_{m-1}(x) - \gamma_m \sum_{i=1}^{n} \nabla_{F_{m-1}} L(y_i, F_{m-1}(x_i))$,

$\gamma_m = \arg\min_{\gamma} \sum_{i=1}^{n} L\!\left(y_i, F_{m-1}(x_i) - \gamma \frac{\partial L(y_i, F_{m-1}(x_i))}{\partial F_{m-1}(x_i)}\right)$.

But as $f$ must come from a restricted class of functions (that's what allows us to generalize), we'll just choose the one that most closely approximates the gradient of $L$. Having chosen $f$, the multiplier $\gamma$ is then selected using line search just as shown in the second equation above.

In pseudocode, the generic gradient boosting method is:[2][7]

Input: training set $\{(x_i, y_i)\}_{i=1}^{n}$, a differentiable loss function $L(y, F(x))$, number of iterations $M$.

Algorithm:

1. Initialize model with a constant value:
   $F_0(x) = \arg\min_{\gamma} \sum_{i=1}^{n} L(y_i, \gamma)$.
2. For $m$ = 1 to $M$:
   1. Compute so-called pseudo-residuals:
      $r_{im} = -\left[\dfrac{\partial L(y_i, F(x_i))}{\partial F(x_i)}\right]_{F(x) = F_{m-1}(x)}$ for $i = 1, \ldots, n$.
   2. Fit a base learner $h_m(x)$ to the pseudo-residuals, i.e. train it using the training set $\{(x_i, r_{im})\}_{i=1}^{n}$.
   3. Compute the multiplier $\gamma_m$ by solving the following one-dimensional optimization problem:
      $\gamma_m = \arg\min_{\gamma} \sum_{i=1}^{n} L(y_i, F_{m-1}(x_i) + \gamma h_m(x_i))$.
   4. Update the model:
      $F_m(x) = F_{m-1}(x) + \gamma_m h_m(x)$.
3. Output $F_M(x)$.
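The pseudocode above can be turned into a short, self-contained Python sketch. This is a toy version under stated assumptions: the base learner is a least-squares stump, and both the initial constant and the line search for $\gamma_m$ use coarse grid searches rather than exact solvers. To illustrate the "plug in a different loss" point, it is run with absolute-error loss instead of squared error.

```python
def fit_stump(x, r):
    """Least-squares stump fit to pseudo-residuals r (step 2.2)."""
    best = None
    for s in sorted(set(x))[:-1]:
        left = [ri for xi, ri in zip(x, r) if xi <= s]
        right = [ri for xi, ri in zip(x, r) if xi > s]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = (sum((ri - lm) ** 2 for ri in left)
               + sum((ri - rm) ** 2 for ri in right))
        if best is None or err < best[0]:
            best = (err, s, lm, rm)
    _, s, lm, rm = best
    return lambda xi: lm if xi <= s else rm

def gradient_boost(x, y, loss, loss_grad, M):
    """Generic gradient boosting following the pseudocode: initialize
    with a constant, then repeat (pseudo-residuals, base-learner fit,
    line search, update) for M iterations."""
    n = len(y)
    # Step 1: constant minimizing total loss, found by grid search.
    lo, hi = min(y), max(y)
    grid = [lo + k * (hi - lo) / 200 for k in range(201)]
    f0 = min(grid, key=lambda g: sum(loss(yi, g) for yi in y))
    F = [f0] * n
    stages = [(1.0, lambda xi: f0)]
    for m in range(M):
        # Step 2.1: pseudo-residuals = negative loss gradient at F_{m-1}.
        r = [-loss_grad(yi, Fi) for yi, Fi in zip(y, F)]
        # Step 2.2: fit the base learner to the pseudo-residuals.
        h = fit_stump(x, r)
        hx = [h(xi) for xi in x]
        # Step 2.3: one-dimensional line search for gamma_m
        # (grid includes 0, so the training loss never increases).
        gammas = [k / 50 for k in range(-100, 101)]
        gm = min(gammas, key=lambda g: sum(
            loss(yi, Fi + g * hi) for yi, Fi, hi in zip(y, F, hx)))
        # Step 2.4: update the model.
        F = [Fi + gm * hi for Fi, hi in zip(F, hx)]
        stages.append((gm, h))
    return lambda xi: sum(g * h(xi) for g, h in stages)

# Absolute-error loss "plugged in" in place of squared error.
abs_loss = lambda yi, Fi: abs(yi - Fi)
abs_grad = lambda yi, Fi: (1.0 if Fi > yi else -1.0)  # a subgradient of |y - F|

x = [1, 2, 3, 4, 5, 6]
y = [1.0, 1.2, 2.9, 3.1, 5.0, 5.2]
model = gradient_boost(x, y, abs_loss, abs_grad, M=20)
mae = sum(abs(model(xi) - yi) for xi, yi in zip(x, y)) / len(y)
```

Note how the only loss-specific pieces are `loss` and `loss_grad`; with absolute error, the pseudo-residuals are just the signs of the ordinary residuals.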

Gradient tree boosting

Gradient boosting is typically used with decision trees (especially CART trees) of a fixed size as base learners. For this special case Friedman proposes a modification to the gradient boosting method which improves the quality of fit of each base learner.

Generic gradient boosting at the $m$-th step would fit a decision tree $h_m(x)$ to pseudo-residuals. Let $J$ be the number of its leaves. The tree partitions the input space into $J$ disjoint regions $R_{1m}, \ldots, R_{Jm}$ and predicts a constant value in each region. Using the indicator notation, the output of $h_m(x)$ for input $x$ can be written as the sum:

$h_m(x) = \sum_{j=1}^{J} b_{jm} \mathbf{1}_{R_{jm}}(x)$,

where $b_{jm}$ is the value predicted in the region $R_{jm}$.[8]

Then the coefficients $b_{jm}$ are multiplied by some value $\gamma_m$, chosen using line search so as to minimize the loss function, and the model is updated as follows:

$F_m(x) = F_{m-1}(x) + \gamma_m h_m(x), \quad \gamma_m = \arg\min_{\gamma} \sum_{i=1}^{n} L(y_i, F_{m-1}(x_i) + \gamma h_m(x_i))$.

Friedman proposes to modify this algorithm so that it chooses a separate optimal value $\gamma_{jm}$ for each of the tree's regions, instead of a single $\gamma_m$ for the whole tree. He calls the modified algorithm "TreeBoost". The coefficients $b_{jm}$ from the tree-fitting procedure can then be simply discarded, and the model update rule becomes:

$F_m(x) = F_{m-1}(x) + \sum_{j=1}^{J} \gamma_{jm} \mathbf{1}_{R_{jm}}(x), \quad \gamma_{jm} = \arg\min_{\gamma} \sum_{x_i \in R_{jm}} L(y_i, F_{m-1}(x_i) + \gamma)$.
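The difference between a single line-searched $\gamma_m$ and TreeBoost's per-region values can be illustrated with a short sketch. The helper below is hypothetical: `regions` maps each leaf index to the training points it contains, and a grid search stands in for an exact one-dimensional solver.

```python
def region_gammas(y, F_prev, regions, loss):
    """TreeBoost-style step: solve a separate one-dimensional
    optimization in each leaf region R_jm, instead of finding one
    gamma_m for the whole tree."""
    grid = [k / 100 for k in range(-500, 501)]  # candidate gamma values
    gammas = {}
    for j, idx in regions.items():
        gammas[j] = min(grid, key=lambda g: sum(
            loss(y[i], F_prev[i] + g) for i in idx))
    return gammas

sq = lambda yi, Fi: (yi - Fi) ** 2     # squared-error loss
y = [1.0, 1.2, 5.0, 5.2]
F_prev = [3.0, 3.0, 3.0, 3.0]          # current model's predictions
regions = {0: [0, 1], 1: [2, 3]}       # leaf 0: first two points; leaf 1: rest
g = region_gammas(y, F_prev, regions, sq)
```

For squared error, each per-region optimum is simply the mean residual in that region (here $-1.9$ for leaf 0 and $2.1$ for leaf 1), whereas a single whole-tree $\gamma_m$ would have to compromise between the two leaves.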


Size of trees

$J$, the number of terminal nodes in trees, is the method's parameter which can be adjusted for a data set at hand. It controls the maximum allowed level of interaction between variables in the model. With $J = 2$ (decision stumps), no interaction between variables is allowed. With $J = 3$ the model may include effects of the interaction between up to two variables, and so on.

Hastie et al.[7] comment that typically $4 \le J \le 8$ work well for boosting and results are fairly insensitive to the choice of $J$ in this range, $J = 2$ is insufficient for many applications, and $J > 10$ is unlikely to be required.

Regularization

Fitting the training set too closely can lead to degradation of the model's generalization ability. Several so-called regularization techniques reduce this overfitting effect by constraining the fitting procedure.

One natural regularization parameter is the number of gradient boosting iterations $M$ (i.e. the number of trees in the model when the base learner is a decision tree). Increasing $M$ reduces the error on the training set, but setting it too high may lead to overfitting. An optimal value of $M$ is often selected by monitoring prediction error on a separate validation data set. Besides controlling $M$, several other regularization techniques are used.

Shrinkage

An important part of the gradient boosting method is regularization by shrinkage, which consists in modifying the update rule as follows:

$F_m(x) = F_{m-1}(x) + \nu \cdot \gamma_m h_m(x), \quad 0 < \nu \le 1$,

where the parameter $\nu$ is called the "learning rate".

Empirically it has been found that using small learning rates (such as $\nu < 0.1$) yields dramatic improvements in a model's generalization ability over gradient boosting without shrinking ($\nu = 1$).[7] However, it comes at the price of increased computational time both during training and querying: a lower learning rate requires more iterations.
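A deliberately idealized sketch shows why a lower learning rate requires more iterations. Assume, hypothetically, that each stage's base learner reproduced the residual exactly; the shrunk update $F \leftarrow F + \nu\,(y - F)$ then scales every residual by $(1 - \nu)$, so the stage count needed to reach a fixed training error grows as $\nu$ shrinks.

```python
def stages_needed(nu, tol=1e-3):
    """Count boosting stages until a unit residual falls below tol,
    under the idealized assumption that each stage's learner fits the
    residual exactly and is then shrunk by the learning rate nu."""
    residual, m = 1.0, 0
    while residual > tol:
        residual *= (1.0 - nu)   # effect of F <- F + nu * (y - F)
        m += 1
    return m
```

On this toy model, `stages_needed(0.5)` is 10 while `stages_needed(0.1)` is 66: shrinking the learning rate from 0.5 to 0.1 requires several times more iterations for the same training error, matching the trade-off described above.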

Stochastic gradient boosting

Soon after the introduction of gradient boosting, Friedman proposed a minor modification to the algorithm, motivated by Breiman's bagging method.[3] Specifically, he proposed that at each iteration of the algorithm, a base learner should be fit on a subsample of the training set drawn at random without replacement.[9] Friedman observed a substantial improvement in gradient boosting's accuracy with this modification.

The subsample size is some constant fraction $f$ of the size of the training set. When $f = 1$, the algorithm is deterministic and identical to the one described above. Smaller values of $f$ introduce randomness into the algorithm and help prevent overfitting, acting as a kind of regularization. The algorithm also becomes faster, because regression trees have to be fit to smaller data sets at each iteration. Friedman[3] obtained that $0.5 \le f \le 0.8$ leads to good results for small and moderate sized training sets. Therefore, $f$ is typically set to 0.5, meaning that one half of the training set is used to build each base learner.

Also, like in bagging, subsampling allows one to define an out-of-bag estimate of the prediction performance improvement by evaluating predictions on those observations which were not used in the building of the next base learner. Out-of-bag estimates help avoid the need for an independent validation data set, but often underestimate actual performance improvement and the optimal number of iterations.[10]
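The sampling step itself is small. A sketch using Python's standard library follows; the function name `minibatch` is invented for the example.

```python
import random

def minibatch(n, f, rng):
    """One stochastic-gradient-boosting subsample: a fraction f of the
    n training rows, drawn WITHOUT replacement.  (Bagging, by contrast,
    draws n rows WITH replacement.)"""
    k = max(1, round(f * n))
    return rng.sample(range(n), k)

rng = random.Random(0)
idx = minibatch(10, 0.5, rng)                   # rows used this iteration
oob = [i for i in range(10) if i not in idx]    # out-of-bag rows
```

The base learner for the iteration would then be fit on the pairs $(x_i, r_{im})$ for $i$ in `idx`, while the out-of-bag rows in `oob` can be used to score the improvement, as described above.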

Number of observations in leaves

Gradient tree boosting implementations often also use regularization by limiting the minimum number of observations in trees' terminal nodes (this parameter is called n.minobsinnode in the R gbm package[10]). It is used in the tree building process by ignoring any splits that lead to nodes containing fewer than this number of training set instances.

Imposing this limit helps to reduce variance in predictions at leaves.
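In a split search this constraint amounts to one guard clause. The sketch below is hypothetical (a least-squares stump search, not gbm's actual code), with `n_min` playing the role of n.minobsinnode:

```python
def best_split(x, r, n_min):
    """Return the best least-squares split point, ignoring any split
    that would leave a child node with fewer than n_min instances."""
    best = None
    for s in sorted(set(x))[:-1]:
        left = [ri for xi, ri in zip(x, r) if xi <= s]
        right = [ri for xi, ri in zip(x, r) if xi > s]
        if len(left) < n_min or len(right) < n_min:
            continue  # split rejected: a child node would be too small
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = (sum((ri - lm) ** 2 for ri in left)
               + sum((ri - rm) ** 2 for ri in right))
        if best is None or err < best[0]:
            best = (err, s)
    return best[1] if best else None

x = [1, 2, 3, 4, 5, 6]
r = [0.9, 1.1, 1.0, -1.0, -0.9, -1.1]
s_loose = best_split(x, r, n_min=1)   # unconstrained search
s_tight = best_split(x, r, n_min=4)   # no split leaves 4+ on each side
```

With `n_min=1` the search finds the natural split; with `n_min=4` every candidate split on six points would leave a child too small, so no split is made and the node stays a leaf.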

Penalize Complexity of Tree

Another useful regularization technique for gradient boosted trees is to penalize the complexity of the learned model.[11] The model complexity can be defined as proportional to the number of leaves in the learned trees. The joint optimization of loss and model complexity corresponds to a post-pruning algorithm that removes branches which fail to reduce the loss by a threshold. Other kinds of regularization, such as an $\ell_2$ penalty on the leaf values, can also be added to avoid overfitting.
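Both ideas fit in a few lines. This is a sketch of the objective only, not any particular library's code: `gamma` is a hypothetical per-leaf penalty and `lam` an $\ell_2$ coefficient on leaf values.

```python
def keep_split(sse_parent, sse_left, sse_right, gamma):
    """Complexity-penalized split decision: splitting turns one leaf
    into two, so keep the split only if the loss reduction exceeds the
    per-leaf penalty gamma.  Applied bottom-up over a grown tree, this
    removes branches that fail to reduce the loss by the threshold."""
    return sse_parent - (sse_left + sse_right) > gamma

def leaf_value(residuals, lam):
    """l2-penalized leaf value for squared error: the minimizer of
    sum((r_i - v)^2) + lam * v^2 is sum(r) / (n + lam), i.e. the plain
    mean shrunk toward zero."""
    return sum(residuals) / (len(residuals) + lam)
```

For example, `keep_split(10.0, 3.0, 3.0, gamma=2.0)` keeps the split (reduction 4 exceeds the penalty 2), while `keep_split(10.0, 4.5, 4.5, gamma=2.0)` prunes it; `leaf_value([2, 2, 2], lam=3)` gives 1.0 instead of the unpenalized mean 2.0.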

Usage

Recently, gradient boosting has gained some popularity in the field of learning to rank. The commercial web search engines Yahoo[12] and Yandex[13] use variants of gradient boosting in their machine-learned ranking engines.

Names

The method goes by a variety of names. Friedman introduced his regression technique as a "Gradient Boosting Machine" (GBM).[2] Mason, Baxter et al. described the generalized abstract class of algorithms as "functional gradient boosting".[4][5] A popular open-source implementation[10] for R calls it a "Generalized Boosting Model". Commercial implementations from Salford Systems use the names "Multiple Additive Regression Trees" (MART) and TreeNet, both trademarked.

See also

AdaBoost
Random forest

References

1. Breiman, L. "Arcing The Edge (http://statistics.berkeley.edu/sites/default/files/tech-reports/486.pdf)" (June 1997)
2. Friedman, J. H. "Greedy Function Approximation: A Gradient Boosting Machine. (http://www-stat.stanford.edu/~jhf/ftp/trebst.pdf)" (February 1999)
3. Friedman, J. H. "Stochastic Gradient Boosting. (https://statweb.stanford.edu/~jhf/ftp/stobst.pdf)" (March 1999)
4. Mason, L.; Baxter, J.; Bartlett, P. L.; Frean, Marcus (1999). "Boosting Algorithms as Gradient Descent" (http://papers.nips.cc/paper/1766-boosting-algorithms-as-gradient-descent.pdf) (PDF). In S. A. Solla, T. K. Leen and K. Müller. Advances in Neural Information Processing Systems 12. MIT Press. pp. 512-518.
5. Mason, L.; Baxter, J.; Bartlett, P. L.; Frean, Marcus (May 1999). Boosting Algorithms as Gradient Descent in Function Space (http://maths.dur.ac.uk/~dma6kp/pdf/face_recognition/Boosting/Mason99AnyboostLong.pdf) (PDF).
6. Cheng Li. "A Gentle Introduction to Gradient Boosting" (http://www.ccs.neu.edu/home/vip/teach/MLcourse/4_boosting/slides/gradient_boosting.pdf) (PDF). Northeastern University. Retrieved 19 August 2014.
7. Hastie, T.; Tibshirani, R.; Friedman, J. H. (2009). "10. Boosting and Additive Trees". The Elements of Statistical Learning (http://www-stat.stanford.edu/~tibs/ElemStatLearn/) (2nd ed.). New York: Springer. pp. 337-384. ISBN 0387848576.
8. Note: in the case of usual CART trees, the trees are fitted using least-squares loss, and so the coefficient $b_{jm}$ for the region $R_{jm}$ is equal to just the value of the output variable, averaged over all training instances in $R_{jm}$.
9. Note that this is different from bagging, which samples with replacement because it uses samples of the same size as the training set.
10. Ridgeway, Greg (2007). Generalized Boosted Models: A guide to the gbm package. (http://cran.r-project.org/web/packages/gbm/gbm.pdf)
11. Tianqi Chen. Introduction to Boosted Trees (http://homes.cs.washington.edu/~tqchen/pdf/BoostedTree.pdf)
12. Cossock, David and Zhang, Tong (2008). Statistical Analysis of Bayes Optimal Subset Ranking (http://www.stat.rutgers.edu/~tzhang/papers/it08ranking.pdf), page 14.
13. Yandex corporate blog entry about new ranking model "Snezhinsk" (http://webmaster.ya.ru/replies.xml?item_no=5707&ncrnd=5118) (in Russian)

Retrieved from "https://en.wikipedia.org/w/index.php?title=Gradient_boosting&oldid=678013581"
Categories: Decision trees | Ensemble learning

This page was last modified on 26 August 2015, at 22:37.
Text is available under the Creative Commons Attribution-ShareAlike License; additional terms may apply. By using this site, you agree to the Terms of Use and Privacy Policy. Wikipedia® is a registered trademark of the Wikimedia Foundation, Inc., a non-profit organization.
