Complete Guide to Parameter Tuning in Gradient Boosting (GBM) in Python

Introduction
If you have been using GBM as a black box till now, maybe it's time for you to open it and see how it actually works!
This article is inspired by Owen Zhang's (Chief Product Officer at DataRobot and Kaggle Rank 3) approach shared at NYC Data Science Academy. He delivered a ~2-hour talk and I intend to condense it and present the most precious nuggets here.
Boosting algorithms play a crucial role in dealing with the bias-variance tradeoff. Unlike bagging algorithms, which only control for high variance in a model, boosting controls both aspects (bias & variance) and is considered to be more effective. A sincere understanding of GBM here should give you the much-needed confidence to deal with such critical issues.
In this article, I'll explain the science behind GBM in Python and, most importantly, how you can tune its parameters and obtain incredible results.
Special Thanks: Personally, I would like to acknowledge the timeless support provided by Mr. Sudalai Rajkumar, currently AV Rank 2. This article wouldn't be possible without his guidance. I am sure the whole community will benefit from the same.


Table of Contents
1. How Boosting Works
2. Understanding GBM Parameters
3. Tuning Parameters (with Example)

1. How Boosting Works
Boosting is a sequential technique which works on the principle of ensemble. It combines a set of weak learners and delivers improved prediction accuracy. At any instant t, the model outcomes are weighted based on the outcomes of the previous instant t-1. The outcomes predicted correctly are given a lower weight and the ones misclassified are weighted higher. This technique is followed for a classification problem, while a similar technique is used for regression.
Let's understand it visually:

Observations:
1. Box 1: Output of the first weak learner (from the left)
Initially, all points have the same weight (denoted by their size).
The decision boundary predicts 2 +ve and 5 -ve points correctly.
2. Box 2: Output of the second weak learner
The points classified correctly in Box 1 are given a lower weight and vice versa.
The model now focuses on the high-weight points and classifies them correctly. But others are misclassified now.


A similar trend can be seen in Box 3 as well. This continues for many iterations. In the end, all models are given a weight depending on their accuracy and a consolidated result is generated; a minimal sketch of this loop follows.
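Here is a minimal sketch of that reweighting loop, assuming decision stumps as the weak learners and toy data; this is the AdaBoost flavour of boosting (GBM, which we turn to later, replaces the explicit sample weights with gradients):

import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Toy data; labels mapped to {-1, +1} so the weight-update arithmetic works.
X, y = make_classification(n_samples=300, n_features=5, random_state=1)
y = 2 * y - 1

w = np.full(len(y), 1.0 / len(y))       # initially all points weigh the same
learners, alphas = [], []
for _ in range(10):                     # 10 sequential weak learners
    stump = DecisionTreeClassifier(max_depth=1)
    stump.fit(X, y, sample_weight=w)
    pred = stump.predict(X)
    err = w[pred != y].sum()            # weighted error of this learner
    alpha = 0.5 * np.log((1 - err) / (err + 1e-10))  # its vote, based on accuracy
    w *= np.exp(-alpha * y * pred)      # misclassified points get heavier
    w /= w.sum()
    learners.append(stump)
    alphas.append(alpha)

# Consolidated result: sign of the accuracy-weighted vote of all weak learners.
ensemble_pred = np.sign(sum(a * l.predict(X) for a, l in zip(alphas, learners)))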
Did I whet your appetite? Good. Refer to these articles (focus on GBM right now):
Learn Gradient Boosting Algorithm for better predictions (with codes in R)
Quick Introduction to Boosting Algorithms in Machine Learning
Getting smart with Machine Learning - AdaBoost and Gradient Boost

2. GBM Parameters
The overall parameters can be divided into 3 categories:
1. Tree-Specific Parameters: These affect each individual tree in the model.
2. Boosting Parameters: These affect the boosting operation in the model.
3. Miscellaneous Parameters: Other parameters for overall functioning.

I'll start with tree-specific parameters. First, let's look at the general structure of a decision tree:

The parameters used for defining a tree are further explained below. Note that I'm using scikit-learn (Python) specific terminology here, which might be different in other software packages like R, but the idea remains the same.
1. min_samples_split
Defines the minimum number of samples (or observations) required in a node for it to be considered for splitting.
Used to control overfitting. Higher values prevent a model from learning relations which might be highly specific to the particular sample selected for a tree.
Too high values can lead to underfitting; hence, it should be tuned using CV.
2. min_samples_leaf
Defines the minimum number of samples (or observations) required in a terminal node or leaf.
Used to control overfitting, similar to min_samples_split.
Generally, lower values should be chosen for imbalanced class problems because the regions in which the minority class will be in majority will be very small.
3. min_weight_fraction_leaf
Similar to min_samples_leaf but defined as a fraction of the total number of observations instead of an integer.
Only one of #2 and #3 should be defined.
4. max_depth
The maximum depth of a tree.
Used to control overfitting, as higher depth will allow the model to learn relations very specific to a particular sample.
Should be tuned using CV.
5. max_leaf_nodes
The maximum number of terminal nodes or leaves in a tree.
Can be defined in place of max_depth. Since binary trees are created, a depth of n would produce a maximum of 2^n leaves.
If this is defined, GBM will ignore max_depth.
6. max_features
The number of features to consider while searching for the best split. These will be randomly selected.
As a thumb rule, the square root of the total number of features works great, but we should check up to 30-40% of the total number of features.
Higher values can lead to overfitting, but this varies from case to case.
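To make these concrete, here is an illustrative (and deliberately untuned) instantiation in scikit-learn; the dataset and the specific parameter values are assumptions, meant only as plausible starting points:

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

gbm = GradientBoostingClassifier(
    min_samples_split=50,    # ~0.5-1% of the samples; discourages overly specific splits
    min_samples_leaf=20,     # keeps leaves from chasing noise
    max_depth=5,             # shallow trees; tune with CV
    max_features='sqrt',     # thumb rule: square root of the total features
    random_state=42,
)
print(cross_val_score(gbm, X, y, cv=5).mean())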

Before moving on to other parameters, let's see the overall pseudocode of the GBM algorithm for 2 classes:

1. Initialize the outcome
2. Iterate from 1 to the total number of trees
  2.1 Update the weights for targets based on the previous run (higher for the ones misclassified)
  2.2 Fit the model on a selected subsample of data
  2.3 Make predictions on the full set of observations
  2.4 Update the output with the current results, taking into account the learning rate
3. Return the final output.
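To make the pseudocode concrete, here is a minimal regression-flavoured sketch; note that step 2.1's reweighting appears here in its gradient-boosting form, fitting each new tree to the current residuals, so badly predicted points get the largest targets. The toy data and settings are assumptions:

import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=200)

learning_rate, n_estimators, subsample = 0.1, 100, 0.8

F = np.full(len(y), y.mean())             # step 1: initialize the outcome
trees = []
for _ in range(n_estimators):             # step 2: one tree per iteration
    residuals = y - F                     # step 2.1 (gradient form of the reweighting)
    idx = rng.choice(len(y), size=int(subsample * len(y)), replace=False)
    tree = DecisionTreeRegressor(max_depth=3).fit(X[idx], residuals[idx])  # step 2.2
    F += learning_rate * tree.predict(X)  # steps 2.3-2.4: predict on all rows, shrink the update
    trees.append(tree)
# step 3: the final output for new data is y.mean() plus the shrunken sum of tree predictions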

This is an extremely simplified (probably naive) explanation of GBM's working. The parameters which we have considered so far will affect step 2.2, i.e. model building. Let's consider another set of parameters for managing boosting:
1. learning_rate
This determines the impact of each tree on the final outcome (step 2.4). GBM works by starting with an initial estimate which is updated using the output of each tree. The learning parameter controls the magnitude of this change in the estimates.
Lower values are generally preferred as they make the model robust to the specific characteristics of each tree, thus allowing it to generalize well.
Lower values would require a higher number of trees to model all the relations and will be computationally expensive.
2. n_estimators
The number of sequential trees to be modeled (step 2).
Though GBM is fairly robust at a higher number of trees, it can still overfit at a point. Hence, this should be tuned using CV for a particular learning rate.
3. subsample
The fraction of observations to be selected for each tree. Selection is done by random sampling.
Values slightly less than 1 make the model robust by reducing the variance.
Typical values of ~0.8 generally work fine but can be fine-tuned further. A short tuning sketch follows this list.
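Here is the promised sketch: an illustrative cross-validated search over the boosting parameters. The dataset and grid values are assumptions; in practice, fix a low learning rate first and then search for the right number of trees, as discussed above:

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
param_grid = {
    'learning_rate': [0.05, 0.1, 0.2],
    'n_estimators': [100, 200, 400],
    'subsample': [0.8, 1.0],
}
search = GridSearchCV(GradientBoostingClassifier(random_state=42),
                      param_grid, cv=5, scoring='roc_auc')
search.fit(X, y)
print(search.best_params_, search.best_score_)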

Apart from these, there are certain miscellaneous parameters which affect overall functionality:
1. loss
It refers to the loss function to be minimized in each split.
It can have various values for the classification and regression cases. Generally, the default values work fine. Other values should be chosen only if you understand their impact on the model.
2. init
This affects the initialization of the output.
This can be used if we have made another model whose outcome is to be used as the initial estimates for GBM.
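For illustration, a hedged sketch of seeding a GBM this way; the logistic-regression seed and the dataset are assumptions, and any estimator providing fit and predict_proba can be passed as init:

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

# The init estimator is cloned and fitted internally; its predictions
# become the initial estimate that the trees then refine.
gbm = GradientBoostingClassifier(init=LogisticRegression(), random_state=42)
gbm.fit(X, y)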
