
8/28/2015

Gradient boosting - Wikipedia, the free encyclopedia

Gradient boosting
From Wikipedia, the free encyclopedia

Gradient boosting is a machine learning technique for regression and classification problems, which produces a prediction model in the form of an ensemble of weak prediction models, typically decision trees. It builds the model in a stage-wise fashion like other boosting methods do, and it generalizes them by allowing optimization of an arbitrary differentiable loss function.

The idea of gradient boosting originated in the observation by Leo Breiman[1] that boosting can be interpreted as an optimization algorithm on a suitable cost function. Explicit regression gradient boosting algorithms were subsequently developed by Jerome H. Friedman,[2][3] simultaneously with the more general functional gradient boosting perspective of Llew Mason, Jonathan Baxter, Peter Bartlett and Marcus Frean.[4][5] The latter two papers introduced the abstract view of boosting algorithms as iterative functional gradient descent algorithms: that is, algorithms that optimize a cost functional over function space by iteratively choosing a function (weak hypothesis) that points in the negative gradient direction. This functional gradient view of boosting has led to the development of boosting algorithms in many areas of machine learning and statistics beyond regression and classification.

Contents

1 Informal introduction
2 Algorithm
3 Gradient tree boosting
    3.1 Size of trees
4 Regularization
    4.1 Shrinkage
    4.2 Stochastic gradient boosting
    4.3 Number of observations in leaves
    4.4 Penalize Complexity of Tree
5 Usage
6 Names
7 See also
8 References

Informal introduction

(This section follows the exposition of gradient boosting by Li.[6])

Like other boosting methods, gradient boosting combines weak learners into a single strong learner, in an iterative fashion. It is easiest to explain in the least-squares regression setting, where the goal is to learn a model $F$ that predicts values $\hat{y} = F(x)$, minimizing the mean squared error $(\hat{y} - y)^2$ to the true values $y$ (averaged over some training set).

At each stage $m$ ($1 \le m \le M$) of gradient boosting, it may be assumed that there is some imperfect model $F_m$ (at the outset, a very weak model that just predicts the mean of $y$ in the training set could be used). The gradient boosting algorithm does not change $F_m$ in any way; instead, it improves on it by constructing a new model that adds an estimator $h$ to provide a better model $F_{m+1}(x) = F_m(x) + h(x)$. The question is now, how to find $h$? The gradient boosting solution starts with the observation that a perfect $h$ would imply

$F_{m+1}(x) = F_m(x) + h(x) = y$,

or, equivalently,

$h(x) = y - F_m(x)$.

Therefore, gradient boosting will fit $h$ to the residual $y - F_m(x)$. Like in other boosting variants, each $F_{m+1}$ learns to correct its predecessor $F_m$. A generalization of this idea to loss functions other than squared error (and to classification and ranking problems) follows from the observation that residuals $y - F(x)$ are the negative gradients of the squared error loss function $\frac{1}{2}(y - F(x))^2$. So, gradient boosting is a gradient descent algorithm, and generalizing it entails "plugging in" a different loss and its gradient.
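The residual-fitting loop described above can be sketched in plain Python. This is an illustration only, not a reference implementation: the weak learner is a hand-rolled one-dimensional regression stump, and the names (`fit_stump`, `boost`) are invented for the example.

```python
def fit_stump(x, r):
    """Weak learner: a depth-1 regression tree fit to residuals r by
    exhaustive search over split points, predicting the mean residual
    on each side of the chosen split."""
    best = None
    for s in sorted(set(x))[:-1]:
        left = [ri for xi, ri in zip(x, r) if xi <= s]
        right = [ri for xi, ri in zip(x, r) if xi > s]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = (sum((ri - lm) ** 2 for ri in left)
               + sum((ri - rm) ** 2 for ri in right))
        if best is None or err < best[0]:
            best = (err, s, lm, rm)
    _, s, lm, rm = best
    return lambda xi: lm if xi <= s else rm

def boost(x, y, n_stages):
    """Least-squares gradient boosting: start from the mean of y, then
    repeatedly fit a stump h to the residuals y - F_m(x) and set
    F_{m+1} = F_m + h."""
    f0 = sum(y) / len(y)
    stages = [lambda xi: f0]
    for _ in range(n_stages):
        residuals = [yi - sum(h(xi) for h in stages)
                     for xi, yi in zip(x, y)]
        stages.append(fit_stump(x, residuals))
    return lambda xi: sum(h(xi) for h in stages)

x = [1, 2, 3, 4, 5, 6]
y = [1.0, 1.2, 2.9, 3.1, 5.0, 5.2]
F = boost(x, y, n_stages=20)
mse = sum((F(xi) - yi) ** 2 for xi, yi in zip(x, y)) / len(y)
```

Each stage can only reduce the training error, so after enough stages `mse` is close to zero on this toy set; that tendency to fit the training set exactly is precisely what the regularization techniques later in the article address.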

Algorithm

In many supervised learning problems one has an output variable $y$ and a vector of input variables $x$, connected together via a joint probability distribution $P(x, y)$. Using a training set $\{(x_1, y_1), \ldots, (x_n, y_n)\}$ of known values of $x$ and corresponding values of $y$, the goal is to find an approximation $\hat{F}(x)$ to a function $F^*(x)$ that minimizes the expected value of some specified loss function $L(y, F(x))$:

$F^* = \arg\min_F \, \mathbb{E}_{x,y}[L(y, F(x))]$.

The gradient boosting method assumes a real-valued $y$ and seeks an approximation $\hat{F}(x)$ in the form of a weighted sum of functions $h_i(x)$ from some class $\mathcal{H}$, called base (or weak) learners:

$\hat{F}(x) = \sum_{i=1}^{M} \gamma_i h_i(x) + \mathrm{const}$.

In accordance with the empirical risk minimization principle, the method tries to find an approximation $\hat{F}(x)$ that minimizes the average value of the loss function on the training set. It does so by starting with a model consisting of a constant function $F_0(x)$, and incrementally expanding it in a greedy fashion:

$F_0(x) = \arg\min_{\gamma} \sum_{i=1}^{n} L(y_i, \gamma)$,

$F_m(x) = F_{m-1}(x) + \arg\min_{f \in \mathcal{H}} \sum_{i=1}^{n} L(y_i, F_{m-1}(x_i) + f(x_i))$,

where $f$ is restricted to be a function from the class $\mathcal{H}$ of base learner functions.

However, the problem of choosing at each step the best $f$ for an arbitrary loss function $L$ is a hard optimization problem in general, and so we'll "cheat" by solving a much easier problem instead. The idea is to apply a steepest descent step to this minimization problem. If we only cared about predictions at the points of the training set, and $f$ were unrestricted, we'd update the model per the following equations, where we view $L(y, f)$ not as a functional of $f$, but as a function of the vector of values $f(x_1), \ldots, f(x_n)$:

$F_m(x) = F_{m-1}(x) - \gamma_m \sum_{i=1}^{n} \nabla_{F_{m-1}} L(y_i, F_{m-1}(x_i))$,

$\gamma_m = \arg\min_{\gamma} \sum_{i=1}^{n} L\!\left(y_i, F_{m-1}(x_i) - \gamma \frac{\partial L(y_i, F_{m-1}(x_i))}{\partial F_{m-1}(x_i)}\right)$.

But as $f$ must come from a restricted class of functions (that's what allows us to generalize), we'll just choose the one that most closely approximates the gradient of $L$. Having chosen $f$, the multiplier $\gamma$ is then selected using line search just as shown in the second equation above.

In pseudocode, the generic gradient boosting method is:[2][7]

Input: training set $\{(x_i, y_i)\}_{i=1}^{n}$, a differentiable loss function $L(y, F(x))$, number of iterations $M$.

Algorithm:

1. Initialize model with a constant value:
   $F_0(x) = \arg\min_{\gamma} \sum_{i=1}^{n} L(y_i, \gamma)$.
2. For $m$ = 1 to $M$:
   1. Compute so-called pseudo-residuals:
      $r_{im} = -\left[\dfrac{\partial L(y_i, F(x_i))}{\partial F(x_i)}\right]_{F(x) = F_{m-1}(x)}$ for $i = 1, \ldots, n$.
   2. Fit a base learner $h_m(x)$ to the pseudo-residuals, i.e. train it using the training set $\{(x_i, r_{im})\}_{i=1}^{n}$.
   3. Compute the multiplier $\gamma_m$ by solving the following one-dimensional optimization problem:
      $\gamma_m = \arg\min_{\gamma} \sum_{i=1}^{n} L(y_i, F_{m-1}(x_i) + \gamma h_m(x_i))$.
   4. Update the model:
      $F_m(x) = F_{m-1}(x) + \gamma_m h_m(x)$.
3. Output $F_M(x)$.
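The pseudocode above can be turned into a short, self-contained Python sketch. This is a toy version under stated assumptions: the base learner is a least-squares stump, and both the initial constant and the line search for $\gamma_m$ use coarse grid searches rather than exact solvers. To illustrate the "plug in a different loss" point, it is run with absolute-error loss instead of squared error.

```python
def fit_stump(x, r):
    """Least-squares stump fit to pseudo-residuals r (step 2.2)."""
    best = None
    for s in sorted(set(x))[:-1]:
        left = [ri for xi, ri in zip(x, r) if xi <= s]
        right = [ri for xi, ri in zip(x, r) if xi > s]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = (sum((ri - lm) ** 2 for ri in left)
               + sum((ri - rm) ** 2 for ri in right))
        if best is None or err < best[0]:
            best = (err, s, lm, rm)
    _, s, lm, rm = best
    return lambda xi: lm if xi <= s else rm

def gradient_boost(x, y, loss, loss_grad, M):
    """Generic gradient boosting following the pseudocode: initialize
    with a constant, then repeat (pseudo-residuals, base-learner fit,
    line search, update) for M iterations."""
    n = len(y)
    # Step 1: constant minimizing total loss, found by grid search.
    lo, hi = min(y), max(y)
    grid = [lo + k * (hi - lo) / 200 for k in range(201)]
    f0 = min(grid, key=lambda g: sum(loss(yi, g) for yi in y))
    F = [f0] * n
    stages = [(1.0, lambda xi: f0)]
    for m in range(M):
        # Step 2.1: pseudo-residuals = negative loss gradient at F_{m-1}.
        r = [-loss_grad(yi, Fi) for yi, Fi in zip(y, F)]
        # Step 2.2: fit the base learner to the pseudo-residuals.
        h = fit_stump(x, r)
        hx = [h(xi) for xi in x]
        # Step 2.3: one-dimensional line search for gamma_m
        # (grid includes 0, so the training loss never increases).
        gammas = [k / 50 for k in range(-100, 101)]
        gm = min(gammas, key=lambda g: sum(
            loss(yi, Fi + g * hi) for yi, Fi, hi in zip(y, F, hx)))
        # Step 2.4: update the model.
        F = [Fi + gm * hi for Fi, hi in zip(F, hx)]
        stages.append((gm, h))
    return lambda xi: sum(g * h(xi) for g, h in stages)

# Absolute-error loss "plugged in" in place of squared error.
abs_loss = lambda yi, Fi: abs(yi - Fi)
abs_grad = lambda yi, Fi: (1.0 if Fi > yi else -1.0)  # a subgradient of |y - F|

x = [1, 2, 3, 4, 5, 6]
y = [1.0, 1.2, 2.9, 3.1, 5.0, 5.2]
model = gradient_boost(x, y, abs_loss, abs_grad, M=20)
mae = sum(abs(model(xi) - yi) for xi, yi in zip(x, y)) / len(y)
```

Note how the only loss-specific pieces are `loss` and `loss_grad`; with absolute error, the pseudo-residuals are just the signs of the ordinary residuals.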

Gradient tree boosting

Gradient boosting is typically used with decision trees (especially CART trees) of a fixed size as base learners. For this special case Friedman proposes a modification to the gradient boosting method which improves the quality of fit of each base learner.

Generic gradient boosting at the $m$-th step would fit a decision tree $h_m(x)$ to pseudo-residuals. Let $J$ be the number of its leaves. The tree partitions the input space into $J$ disjoint regions $R_{1m}, \ldots, R_{Jm}$ and predicts a constant value in each region. Using the indicator notation, the output of $h_m(x)$ for input $x$ can be written as the sum:

$h_m(x) = \sum_{j=1}^{J} b_{jm} \mathbf{1}_{R_{jm}}(x)$,

where $b_{jm}$ is the value predicted in the region $R_{jm}$.[8]

Then the coefficients $b_{jm}$ are multiplied by some value $\gamma_m$, chosen using line search so as to minimize the loss function, and the model is updated as follows:

$F_m(x) = F_{m-1}(x) + \gamma_m h_m(x), \quad \gamma_m = \arg\min_{\gamma} \sum_{i=1}^{n} L(y_i, F_{m-1}(x_i) + \gamma h_m(x_i))$.

Friedman proposes to modify this algorithm so that it chooses a separate optimal value $\gamma_{jm}$ for each of the tree's regions, instead of a single $\gamma_m$ for the whole tree. He calls the modified algorithm "TreeBoost". The coefficients $b_{jm}$ from the tree-fitting procedure can then be simply discarded, and the model update rule becomes:

$F_m(x) = F_{m-1}(x) + \sum_{j=1}^{J} \gamma_{jm} \mathbf{1}_{R_{jm}}(x), \quad \gamma_{jm} = \arg\min_{\gamma} \sum_{x_i \in R_{jm}} L(y_i, F_{m-1}(x_i) + \gamma)$.
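The difference between a single line-searched $\gamma_m$ and TreeBoost's per-region values can be illustrated with a short sketch. The helper below is hypothetical: `regions` maps each leaf index to the training points it contains, and a grid search stands in for an exact one-dimensional solver.

```python
def region_gammas(y, F_prev, regions, loss):
    """TreeBoost-style step: solve a separate one-dimensional
    optimization in each leaf region R_jm, instead of finding one
    gamma_m for the whole tree."""
    grid = [k / 100 for k in range(-500, 501)]  # candidate gamma values
    gammas = {}
    for j, idx in regions.items():
        gammas[j] = min(grid, key=lambda g: sum(
            loss(y[i], F_prev[i] + g) for i in idx))
    return gammas

sq = lambda yi, Fi: (yi - Fi) ** 2     # squared-error loss
y = [1.0, 1.2, 5.0, 5.2]
F_prev = [3.0, 3.0, 3.0, 3.0]          # current model's predictions
regions = {0: [0, 1], 1: [2, 3]}       # leaf 0: first two points; leaf 1: rest
g = region_gammas(y, F_prev, regions, sq)
```

For squared error, each per-region optimum is simply the mean residual in that region (here $-1.9$ for leaf 0 and $2.1$ for leaf 1), whereas a single whole-tree $\gamma_m$ would have to compromise between the two leaves.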


Size of trees

$J$, the number of terminal nodes in trees, is the method's parameter which can be adjusted for a data set at hand. It controls the maximum allowed level of interaction between variables in the model. With $J = 2$ (decision stumps), no interaction between variables is allowed. With $J = 3$ the model may include effects of the interaction between up to two variables, and so on.

Hastie et al.[7] comment that typically $4 \le J \le 8$ work well for boosting and results are fairly insensitive to the choice of $J$ in this range, $J = 2$ is insufficient for many applications, and $J > 10$ is unlikely to be required.

Regularization

Fitting the training set too closely can lead to degradation of the model's generalization ability. Several so-called regularization techniques reduce this overfitting effect by constraining the fitting procedure.

One natural regularization parameter is the number of gradient boosting iterations $M$ (i.e. the number of trees in the model when the base learner is a decision tree). Increasing $M$ reduces the error on the training set, but setting it too high may lead to overfitting. An optimal value of $M$ is often selected by monitoring prediction error on a separate validation data set. Besides controlling $M$, several other regularization techniques are used.

Shrinkage

An important part of the gradient boosting method is regularization by shrinkage, which consists in modifying the update rule as follows:

$F_m(x) = F_{m-1}(x) + \nu \cdot \gamma_m h_m(x), \quad 0 < \nu \le 1$,

where the parameter $\nu$ is called the "learning rate".

Empirically it has been found that using small learning rates (such as $\nu < 0.1$) yields dramatic improvements in a model's generalization ability over gradient boosting without shrinking ($\nu = 1$).[7] However, it comes at the price of increased computational time both during training and querying: a lower learning rate requires more iterations.
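A deliberately idealized sketch shows why a lower learning rate requires more iterations. Assume, hypothetically, that each stage's base learner reproduced the residual exactly; the shrunk update $F \leftarrow F + \nu\,(y - F)$ then scales every residual by $(1 - \nu)$, so the stage count needed to reach a fixed training error grows as $\nu$ shrinks.

```python
def stages_needed(nu, tol=1e-3):
    """Count boosting stages until a unit residual falls below tol,
    under the idealized assumption that each stage's learner fits the
    residual exactly and is then shrunk by the learning rate nu."""
    residual, m = 1.0, 0
    while residual > tol:
        residual *= (1.0 - nu)   # effect of F <- F + nu * (y - F)
        m += 1
    return m
```

On this toy model, `stages_needed(0.5)` is 10 while `stages_needed(0.1)` is 66: shrinking the learning rate from 0.5 to 0.1 requires several times more iterations for the same training error, matching the trade-off described above.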

Stochastic gradient boosting

Soon after the introduction of gradient boosting, Friedman proposed a minor modification to the algorithm, motivated by Breiman's bagging method.[3] Specifically, he proposed that at each iteration of the algorithm, a base learner should be fit on a subsample of the training set drawn at random without replacement.[9] Friedman observed a substantial improvement in gradient boosting's accuracy with this modification.

The subsample size is some constant fraction $f$ of the size of the training set. When $f = 1$, the algorithm is deterministic and identical to the one described above. Smaller values of $f$ introduce randomness into the algorithm and help prevent overfitting, acting as a kind of regularization. The algorithm also becomes faster, because regression trees have to be fit to smaller data sets at each iteration. Friedman[3] obtained that $0.5 \le f \le 0.8$ leads to good results for small and moderate sized training sets. Therefore, $f$ is typically set to 0.5, meaning that one half of the training set is used to build each base learner.

Also, like in bagging, subsampling allows one to define an out-of-bag estimate of the prediction performance improvement by evaluating predictions on those observations which were not used in the building of the next base learner. Out-of-bag estimates help avoid the need for an independent validation data set, but often underestimate actual performance improvement and the optimal number of iterations.[10]
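The sampling step itself is small. A sketch using Python's standard library follows; the function name `minibatch` is invented for the example.

```python
import random

def minibatch(n, f, rng):
    """One stochastic-gradient-boosting subsample: a fraction f of the
    n training rows, drawn WITHOUT replacement.  (Bagging, by contrast,
    draws n rows WITH replacement.)"""
    k = max(1, round(f * n))
    return rng.sample(range(n), k)

rng = random.Random(0)
idx = minibatch(10, 0.5, rng)                   # rows used this iteration
oob = [i for i in range(10) if i not in idx]    # out-of-bag rows
```

The base learner for the iteration would then be fit on the pairs $(x_i, r_{im})$ for $i$ in `idx`, while the out-of-bag rows in `oob` can be used to score the improvement, as described above.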

Number of observations in leaves

Gradient tree boosting implementations often also use regularization by limiting the minimum number of observations in trees' terminal nodes (this parameter is called n.minobsinnode in the R gbm package[10]). It is used in the tree building process by ignoring any splits that lead to nodes containing fewer than this number of training set instances.

Imposing this limit helps to reduce variance in predictions at leaves.
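In a split search this constraint amounts to one guard clause. The sketch below is hypothetical (a least-squares stump search, not gbm's actual code), with `n_min` playing the role of n.minobsinnode:

```python
def best_split(x, r, n_min):
    """Return the best least-squares split point, ignoring any split
    that would leave a child node with fewer than n_min instances."""
    best = None
    for s in sorted(set(x))[:-1]:
        left = [ri for xi, ri in zip(x, r) if xi <= s]
        right = [ri for xi, ri in zip(x, r) if xi > s]
        if len(left) < n_min or len(right) < n_min:
            continue  # split rejected: a child node would be too small
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = (sum((ri - lm) ** 2 for ri in left)
               + sum((ri - rm) ** 2 for ri in right))
        if best is None or err < best[0]:
            best = (err, s)
    return best[1] if best else None

x = [1, 2, 3, 4, 5, 6]
r = [0.9, 1.1, 1.0, -1.0, -0.9, -1.1]
s_loose = best_split(x, r, n_min=1)   # unconstrained search
s_tight = best_split(x, r, n_min=4)   # no split leaves 4+ on each side
```

With `n_min=1` the search finds the natural split; with `n_min=4` every candidate split on six points would leave a child too small, so no split is made and the node stays a leaf.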

Penalize Complexity of Tree

Another useful regularization technique for gradient boosted trees is to penalize the complexity of the learned model.[11] The model complexity can be defined as proportional to the number of leaves in the learned trees. The joint optimization of loss and model complexity corresponds to a post-pruning algorithm that removes branches which fail to reduce the loss by a threshold. Other kinds of regularization, such as an $\ell_2$ penalty on the leaf values, can also be added to avoid overfitting.
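Both ideas fit in a few lines. This is a sketch of the objective only, not any particular library's code: `gamma` is a hypothetical per-leaf penalty and `lam` an $\ell_2$ coefficient on leaf values.

```python
def keep_split(sse_parent, sse_left, sse_right, gamma):
    """Complexity-penalized split decision: splitting turns one leaf
    into two, so keep the split only if the loss reduction exceeds the
    per-leaf penalty gamma.  Applied bottom-up over a grown tree, this
    removes branches that fail to reduce the loss by the threshold."""
    return sse_parent - (sse_left + sse_right) > gamma

def leaf_value(residuals, lam):
    """l2-penalized leaf value for squared error: the minimizer of
    sum((r_i - v)^2) + lam * v^2 is sum(r) / (n + lam), i.e. the plain
    mean shrunk toward zero."""
    return sum(residuals) / (len(residuals) + lam)
```

For example, `keep_split(10.0, 3.0, 3.0, gamma=2.0)` keeps the split (reduction 4 exceeds the penalty 2), while `keep_split(10.0, 4.5, 4.5, gamma=2.0)` prunes it; `leaf_value([2, 2, 2], lam=3)` gives 1.0 instead of the unpenalized mean 2.0.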

Usage

Recently, gradient boosting has gained some popularity in the field of learning to rank. The commercial web search engines Yahoo[12] and Yandex[13] use variants of gradient boosting in their machine-learned ranking engines.

Names

The method goes by a variety of names. Friedman introduced his regression technique as a "Gradient Boosting Machine" (GBM).[2] Mason, Baxter et al. described the generalized abstract class of algorithms as "functional gradient boosting".[4][5] A popular open-source implementation[10] for R calls it a "Generalized Boosting Model". Commercial implementations from Salford Systems use the names "Multiple Additive Regression Trees" (MART) and TreeNet, both trademarked.

See also

AdaBoost
Random forest

References

1. Breiman, L. "Arcing The Edge (http://statistics.berkeley.edu/sites/default/files/tech-reports/486.pdf)" (June 1997)
2. Friedman, J. H. "Greedy Function Approximation: A Gradient Boosting Machine. (http://www-stat.stanford.edu/~jhf/ftp/trebst.pdf)" (February 1999)
3. Friedman, J. H. "Stochastic Gradient Boosting. (https://statweb.stanford.edu/~jhf/ftp/stobst.pdf)" (March 1999)
4. Mason, L.; Baxter, J.; Bartlett, P. L.; Frean, Marcus (1999). "Boosting Algorithms as Gradient Descent" (http://papers.nips.cc/paper/1766-boosting-algorithms-as-gradient-descent.pdf) (PDF). In S. A. Solla, T. K. Leen and K. Müller. Advances in Neural Information Processing Systems 12. MIT Press. pp. 512-518.
5. Mason, L.; Baxter, J.; Bartlett, P. L.; Frean, Marcus (May 1999). Boosting Algorithms as Gradient Descent in Function Space (http://maths.dur.ac.uk/~dma6kp/pdf/face_recognition/Boosting/Mason99AnyboostLong.pdf) (PDF).
6. Cheng Li. "A Gentle Introduction to Gradient Boosting" (http://www.ccs.neu.edu/home/vip/teach/MLcourse/4_boosting/slides/gradient_boosting.pdf) (PDF). Northeastern University. Retrieved 19 August 2014.
7. Hastie, T.; Tibshirani, R.; Friedman, J. H. (2009). "10. Boosting and Additive Trees". The Elements of Statistical Learning (http://www-stat.stanford.edu/~tibs/ElemStatLearn/) (2nd ed.). New York: Springer. pp. 337-384. ISBN 0387848576.
8. Note: in the case of usual CART trees, the trees are fitted using least-squares loss, and so the coefficient $b_{jm}$ for the region $R_{jm}$ is equal to just the value of the output variable, averaged over all training instances in $R_{jm}$.
9. Note that this is different from bagging, which samples with replacement because it uses samples of the same size as the training set.
10. Ridgeway, Greg (2007). Generalized Boosted Models: A guide to the gbm package. (http://cran.r-project.org/web/packages/gbm/gbm.pdf)
11. Tianqi Chen. Introduction to Boosted Trees (http://homes.cs.washington.edu/~tqchen/pdf/BoostedTree.pdf)
12. Cossock, David and Zhang, Tong (2008). Statistical Analysis of Bayes Optimal Subset Ranking (http://www.stat.rutgers.edu/~tzhang/papers/it08ranking.pdf), page 14.
13. Yandex corporate blog entry about new ranking model "Snezhinsk" (http://webmaster.ya.ru/replies.xml?item_no=5707&ncrnd=5118) (in Russian)

Retrieved from "https://en.wikipedia.org/w/index.php?title=Gradient_boosting&oldid=678013581"
Categories: Decision trees | Ensemble learning

This page was last modified on 26 August 2015, at 22:37.
Text is available under the Creative Commons Attribution-ShareAlike License; additional terms may apply. By using this site, you agree to the Terms of Use and Privacy Policy. Wikipedia® is a registered trademark of the Wikimedia Foundation, Inc., a non-profit organization.
