Complete Guide to Parameter Tuning in Gradient Boosting (GBM) in Python

Introduction
If you have been using GBM as a black box till now, maybe it's time for you to open it and see how it actually works!
This article is inspired by Owen Zhang's (Chief Product Officer at DataRobot and Kaggle Rank 3) approach shared at NYC Data Science Academy. He delivered a ~2-hour talk and I intend to condense it and present the most precious nuggets here.
Boosting algorithms play a crucial role in dealing with the bias-variance tradeoff. Unlike bagging algorithms, which only control for high variance in a model, boosting controls both aspects (bias & variance) and is considered to be more effective. A sincere understanding of GBM here should give you the much-needed confidence to deal with such critical issues.
In this article, I'll explain the science behind GBM in Python and, most importantly, how you can tune its parameters and obtain incredible results.
Special Thanks: Personally, I would like to acknowledge the timeless support provided by Mr. Sudalai Rajkumar, currently AV Rank 2. This article wouldn't be possible without his guidance. I am sure the whole community will benefit from the same.


Table of Contents
1. How Boosting Works
2. Understanding GBM Parameters
3. Tuning Parameters (with Example)

1. How Boosting Works
Boosting is a sequential technique which works on the principle of ensemble. It combines a set of weak learners and delivers improved prediction accuracy. At any instant t, the model outcomes are weighted based on the outcomes of the previous instant t-1. The outcomes predicted correctly are given a lower weight and the ones misclassified are weighted higher. This technique is followed for a classification problem, while a similar technique is used for regression.
Let's understand it visually:

Observations:
1. Box 1: Output of the first weak learner (from the left)
Initially, all points have the same weight (denoted by their size).
The decision boundary predicts 2 +ve and 5 -ve points correctly.
2. Box 2: Output of the second weak learner
The points classified correctly in Box 1 are given a lower weight and vice versa.
The model now focuses on the high-weight points and classifies them correctly. But others are misclassified now.


A similar trend can be seen in Box 3 as well. This continues for many iterations. In the end, all models are given a weight depending on their accuracy and a consolidated result is generated; a minimal sketch of this loop follows.
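Here is a minimal sketch of that reweighting loop, assuming decision stumps as the weak learners and toy data; this is the AdaBoost flavour of boosting (GBM, which we turn to later, replaces the explicit sample weights with gradients):

import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Toy data; labels mapped to {-1, +1} so the weight-update arithmetic works.
X, y = make_classification(n_samples=300, n_features=5, random_state=1)
y = 2 * y - 1

w = np.full(len(y), 1.0 / len(y))       # initially all points weigh the same
learners, alphas = [], []
for _ in range(10):                     # 10 sequential weak learners
    stump = DecisionTreeClassifier(max_depth=1)
    stump.fit(X, y, sample_weight=w)
    pred = stump.predict(X)
    err = w[pred != y].sum()            # weighted error of this learner
    alpha = 0.5 * np.log((1 - err) / (err + 1e-10))  # its vote, based on accuracy
    w *= np.exp(-alpha * y * pred)      # misclassified points get heavier
    w /= w.sum()
    learners.append(stump)
    alphas.append(alpha)

# Consolidated result: sign of the accuracy-weighted vote of all weak learners.
ensemble_pred = np.sign(sum(a * l.predict(X) for a, l in zip(alphas, learners)))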
Did I whet your appetite? Good. Refer to these articles (focus on GBM right now):
Learn Gradient Boosting Algorithm for better predictions (with codes in R)
Quick Introduction to Boosting Algorithms in Machine Learning
Getting smart with Machine Learning - AdaBoost and Gradient Boost

2. GBM Parameters
The overall parameters can be divided into 3 categories:
1. Tree-Specific Parameters: These affect each individual tree in the model.
2. Boosting Parameters: These affect the boosting operation in the model.
3. Miscellaneous Parameters: Other parameters for overall functioning.

I'll start with tree-specific parameters. First, let's look at the general structure of a decision tree:

The parameters used for defining a tree are further explained below. Note that I'm using scikit-learn (Python) specific terminology here, which might be different in other software packages like R, but the idea remains the same.
1. min_samples_split
Defines the minimum number of samples (or observations) required in a node for it to be considered for splitting.
Used to control overfitting. Higher values prevent a model from learning relations which might be highly specific to the particular sample selected for a tree.
Too high values can lead to underfitting; hence, it should be tuned using CV.
2. min_samples_leaf
Defines the minimum number of samples (or observations) required in a terminal node or leaf.
Used to control overfitting, similar to min_samples_split.
Generally, lower values should be chosen for imbalanced class problems because the regions in which the minority class will be in majority will be very small.
3. min_weight_fraction_leaf
Similar to min_samples_leaf but defined as a fraction of the total number of observations instead of an integer.
Only one of #2 and #3 should be defined.
4. max_depth
The maximum depth of a tree.
Used to control overfitting, as higher depth will allow the model to learn relations very specific to a particular sample.
Should be tuned using CV.
5. max_leaf_nodes
The maximum number of terminal nodes or leaves in a tree.
Can be defined in place of max_depth. Since binary trees are created, a depth of n would produce a maximum of 2^n leaves.
If this is defined, GBM will ignore max_depth.
6. max_features
The number of features to consider while searching for the best split. These will be randomly selected.
As a thumb rule, the square root of the total number of features works great, but we should check up to 30-40% of the total number of features.
Higher values can lead to overfitting, but this varies from case to case.
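To make these concrete, here is an illustrative (and deliberately untuned) instantiation in scikit-learn; the dataset and the specific parameter values are assumptions, meant only as plausible starting points:

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

gbm = GradientBoostingClassifier(
    min_samples_split=50,    # ~0.5-1% of the samples; discourages overly specific splits
    min_samples_leaf=20,     # keeps leaves from chasing noise
    max_depth=5,             # shallow trees; tune with CV
    max_features='sqrt',     # thumb rule: square root of the total features
    random_state=42,
)
print(cross_val_score(gbm, X, y, cv=5).mean())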

Before moving on to other parameters, let's see the overall pseudocode of the GBM algorithm for 2 classes:

1. Initialize the outcome
2. Iterate from 1 to the total number of trees
  2.1 Update the weights for targets based on the previous run (higher for the ones misclassified)
  2.2 Fit the model on a selected subsample of data
  2.3 Make predictions on the full set of observations
  2.4 Update the output with the current results, taking into account the learning rate
3. Return the final output.
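To make the pseudocode concrete, here is a minimal regression-flavoured sketch; note that step 2.1's reweighting appears here in its gradient-boosting form, fitting each new tree to the current residuals, so badly predicted points get the largest targets. The toy data and settings are assumptions:

import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=200)

learning_rate, n_estimators, subsample = 0.1, 100, 0.8

F = np.full(len(y), y.mean())             # step 1: initialize the outcome
trees = []
for _ in range(n_estimators):             # step 2: one tree per iteration
    residuals = y - F                     # step 2.1 (gradient form of the reweighting)
    idx = rng.choice(len(y), size=int(subsample * len(y)), replace=False)
    tree = DecisionTreeRegressor(max_depth=3).fit(X[idx], residuals[idx])  # step 2.2
    F += learning_rate * tree.predict(X)  # steps 2.3-2.4: predict on all rows, shrink the update
    trees.append(tree)
# step 3: the final output for new data is y.mean() plus the shrunken sum of tree predictions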

This is an extremely simplified (probably naive) explanation of GBM's working. The parameters which we have considered so far will affect step 2.2, i.e. model building. Let's consider another set of parameters for managing boosting:
1. learning_rate
This determines the impact of each tree on the final outcome (step 2.4). GBM works by starting with an initial estimate which is updated using the output of each tree. The learning parameter controls the magnitude of this change in the estimates.
Lower values are generally preferred as they make the model robust to the specific characteristics of each tree, thus allowing it to generalize well.
Lower values would require a higher number of trees to model all the relations and will be computationally expensive.
2. n_estimators
The number of sequential trees to be modeled (step 2).
Though GBM is fairly robust at a higher number of trees, it can still overfit at a point. Hence, this should be tuned using CV for a particular learning rate.
3. subsample
The fraction of observations to be selected for each tree. Selection is done by random sampling.
Values slightly less than 1 make the model robust by reducing the variance.
Typical values of ~0.8 generally work fine but can be fine-tuned further. A short tuning sketch follows this list.
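Here is the promised sketch: an illustrative cross-validated search over the boosting parameters. The dataset and grid values are assumptions; in practice, fix a low learning rate first and then search for the right number of trees, as discussed above:

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
param_grid = {
    'learning_rate': [0.05, 0.1, 0.2],
    'n_estimators': [100, 200, 400],
    'subsample': [0.8, 1.0],
}
search = GridSearchCV(GradientBoostingClassifier(random_state=42),
                      param_grid, cv=5, scoring='roc_auc')
search.fit(X, y)
print(search.best_params_, search.best_score_)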

Apart from these, there are certain miscellaneous parameters which affect overall functionality:
1. loss
It refers to the loss function to be minimized in each split.
It can have various values for the classification and regression cases. Generally, the default values work fine. Other values should be chosen only if you understand their impact on the model.
2. init
This affects the initialization of the output.
This can be used if we have made another model whose outcome is to be used as the initial estimates for GBM.
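For illustration, a hedged sketch of seeding a GBM this way; the logistic-regression seed and the dataset are assumptions, and any estimator providing fit and predict_proba can be passed as init:

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

# The init estimator is cloned and fitted internally; its predictions
# become the initial estimate that the trees then refine.
gbm = GradientBoostingClassifier(init=LogisticRegression(), random_state=42)
gbm.fit(X, y)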
