Neural Networks and Ensemble Methods for Classification

NEURAL NETWORKS

Neural Networks
A neural network is a set of connected input/output units (neurons), where each connection has a weight associated with it. During the learning phase, the network learns by adjusting the weights so that it can predict the correct class label of the input samples (the training samples).
Knowledge about the learning task is given in the form of examples.
Interneuron connection strengths (weights) are used to store the acquired information (the training examples).
During the learning process, the weights are modified in order to model the particular learning task correctly on the training examples.
http://aemc.jpl.nasa.gov/activities/bio_regen.cfm
Neural Networks
Network architectures
Three different classes of network architectures:
- single-layer feed-forward
- multilayer feed-forward
- recurrent
In feed-forward networks the neurons are organized in acyclic layers. The architecture of a neural network is linked with the learning algorithm used to train it.

Advantages
- Prediction accuracy is generally high.
- Robust: works when training examples contain errors or noisy data.
- Output may be discrete, real-valued, or a vector of several discrete or real-valued attributes.
- Fast evaluation of the learned target function.

Criticism
- Long training time.
- Parameters are best determined empirically, such as the network topology or structure.
- Difficult to understand the learned function (weights).
- Not easy to incorporate domain knowledge.
[Figure: network architectures. Single-layer feed-forward: an input layer of source nodes projecting directly onto an output layer of neurons. Multilayer feed-forward: an input layer, a hidden layer of neurons, and an output layer.]
The neuron
Neural networks are built out of a densely interconnected set of simple units (neurons).
- Each neuron takes a number of real-valued inputs and produces a single real-valued output.
- Inputs to a neuron may be the outputs of other neurons; a neuron's output may be used as input to many other neurons.

[Figure: model of a neuron. Input signals x_1, x_2, ..., x_m are multiplied by weights w_1, w_2, ..., w_m and combined with the bias b = w_0 to form the local field u; the activation function produces the output y.]

- Adder function (linear combiner), which computes the weighted sum of the inputs: $u = b + \sum_{j=1}^{m} w_j x_j$
- Activation function (squashing function), for limiting the amplitude of the output of the neuron: $y = \varphi(u)$
- Bias: serves to vary the activity of the unit.
The neuron
- Assign a weight to each input link.
- Multiply each weight by the input value (0 or 1).
- Sum all the weighted inputs.
- Apply the squashing function, e.g.:
  If sum > threshold for the neuron, then Output = +1, else Output = -1.
http://wwwcse.uta.edu/~cook/ai1/lectures/figures/neuron.jpg
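To make the weighted-sum-plus-threshold computation concrete, here is a minimal Python sketch of a single neuron with a step activation; the weights, inputs, and threshold are illustrative values, not taken from the slides.

```python
def neuron_output(inputs, weights, bias, threshold):
    """Weighted sum of the inputs plus bias, squashed by a step activation."""
    u = bias + sum(w * x for w, x in zip(weights, inputs))
    return 1 if u > threshold else -1

# Illustrative values (not from the slides): two binary inputs.
print(neuron_output(inputs=[1, 0], weights=[0.6, -0.4], bias=0.1, threshold=0.5))  # 1
```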
Activation functions (see the code sketch after this slide):
- Logistic activation: $\varphi(z) = \dfrac{1}{1 + e^{-z}}$
- Hyperbolic tangent activation: $\varphi(u) = \tanh u = \dfrac{1 - e^{-2u}}{1 + e^{-2u}}$
- Threshold activation: $\varphi(z) = \mathrm{sign}(z) = \begin{cases} 1, & \text{if } z \geq 0 \\ -1, & \text{if } z < 0 \end{cases}$

[Figure: plots of the activation functions; the output is squashed into the range -1 to 1.]

Training a neuron
Initially:
- choose small random weights (w_i)
- set the threshold (step function)
- choose a small learning rate (r)
Apply each member of the training set to the neural net model, using a training rule to adjust the weights. For each unit:
- Compute the net input to the unit as a linear combination of all the inputs to the unit.
- Compute the output value using the activation function.
- Compute the error.
- Update the weights and the bias.
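A minimal sketch of the three activation functions defined above:

```python
import math

def logistic(z):
    # phi(z) = 1 / (1 + e^(-z)), output in (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

def tanh_act(u):
    # phi(u) = (1 - e^(-2u)) / (1 + e^(-2u)) = tanh(u), output in (-1, 1)
    return (1.0 - math.exp(-2 * u)) / (1.0 + math.exp(-2 * u))

def sign_act(z):
    # phi(z) = 1 if z >= 0, else -1
    return 1 if z >= 0 else -1

print(logistic(0.0), tanh_act(0.0), sign_act(-0.3))  # 0.5 0.0 -1
```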
Perceptron
Perceptrons are the simplest form of neural networks.
Modify the weights (w_i) according to the training rule:
$w_i = w_i + r\,(t - a)\,x_i$
where:
- r is the learning rate (e.g. 0.2)
- t = target output
- a = actual output
- x_i = i-th input value
Learning rate: if too small, learning occurs at a slow pace; if too large, it may get stuck in a local minimum in the decision space.
[Figure: a perceptron mapping input variables through input nodes directly to output nodes and output variables.]
Example
Inputs: b = 1 (bias input), x1 = 0, x2 = 1; weights w0 = -0.49, w1 = 0.95, w2 = 0.15; threshold = 0.5; r = 0.05; target output = 1.
- Compute the output for the input: u = 1 × (-0.49) + 0 × 0.95 + 1 × 0.15 = -0.34 < threshold, thus y = 0.
- Compute the error: the actual output is y = 0 and the target output is 1, so error = (1 - 0) = 1; correction factor = error × r = 0.05.
- Compute the new weights:
  w0 = -0.49 + 0.05 × (1 - 0) × 1 = -0.44
  w1 = 0.95 + 0.05 × (1 - 0) × 0 = 0.95
  w2 = 0.15 + 0.05 × (1 - 0) × 1 = 0.20
- Repeat the process with the new weights for a given number of iterations.
The update is reproduced in the code sketch below.
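A minimal sketch of the training rule $w_i = w_i + r(t - a)x_i$, replaying the single update from the example above (the bias is handled as weight w0 with constant input b = 1):

```python
def perceptron_step(weights, inputs, target, rate, threshold):
    """One perceptron update: compute the output, then adjust each
    weight by rate * (target - actual) * input."""
    u = sum(w * x for w, x in zip(weights, inputs))
    actual = 1 if u > threshold else 0
    error = target - actual
    return [w + rate * error * x for w, x in zip(weights, inputs)]

# Values from the worked example: b = 1, x1 = 0, x2 = 1.
weights = [-0.49, 0.95, 0.15]          # w0 (bias), w1, w2
inputs = [1, 0, 1]                     # bias input b, x1, x2
new_w = perceptron_step(weights, inputs, target=1, rate=0.05, threshold=0.5)
print(new_w)                           # [-0.44, 0.95, 0.20] up to float rounding
```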
Multilayer feed-forward networks
[Figure: a multilayer feed-forward network with an input layer, one or more hidden layers, and an output layer.]
Backpropagation algorithm
Phase 1: Propagation
- Forward propagation of a training input through the network.
- Backward propagation of the propagation's output activations.
Phase 2: Weight update
For each weight (synapse):
- Multiply its output delta and input activation to get the gradient of the weight.
- Bring the weight in the opposite direction of the gradient by subtracting a ratio of it from the weight.
- This ratio influences the speed and quality of learning. The sign of the gradient of a weight indicates where the error is increasing; this is why the weight must be updated in the opposite direction.
Repeat phases 1 and 2 until the performance of the network is good enough.

[Figure: feed-forward network with input nodes (input vector x_i), hidden nodes, and output nodes (output vector); O_j denotes the output of unit j.]

Formulas (the bias of unit j is written $\theta_j$, or w_{0j} in the example that follows):
- Net input to unit j: $I_j = \sum_i w_{ij} O_i + \theta_j$
- Output of unit j: $O_j = \dfrac{1}{1 + e^{-I_j}}$
- Error for a node in the output layer: $Err_j = O_j (1 - O_j)(T_j - O_j)$
- Error for a node in the hidden layer: $Err_j = O_j (1 - O_j) \sum_k Err_k\, w_{jk}$
- To update the weights: $w_{ij} = w_{ij} + r\, Err_j\, O_i$
- To update the bias: $\theta_j = \theta_j + r\, Err_j$
The complete iteration is traced in the worked example that follows, and in the code sketch after it.
Example
Input variables x_i = (1, 0, 1), whose class is 1; hidden units 4 and 5; output unit 6.
Randomly assigned weights: w14 = 0.2, w15 = -0.3, w24 = 0.4, w25 = 0.1, w34 = -0.5, w35 = 0.2, w46 = -0.3, w56 = -0.2; biases w04 = -0.4, w05 = 0.2, w06 = 0.1.
Activation function $O_j = 1/(1 + e^{-I_j})$ and learning rate = 0.9.

Propagation:

neuron | net input I_j | output O_j
4 | 0.2×1 + 0.4×0 - 0.5×1 - 0.4 = -0.7 | 1/(1 + e^{0.7}) = 0.332
5 | -0.3×1 + 0.1×0 + 0.2×1 + 0.2 = 0.1 | 1/(1 + e^{-0.1}) = 0.525
6 | -0.3×0.332 - 0.2×0.525 + 0.1 = -0.105 | 1/(1 + e^{0.105}) = 0.474
Updating weights
Error for a node in the output layer: $Err_j = O_j (1 - O_j)(T_j - O_j)$
Error for a node in the hidden layer: $Err_j = O_j (1 - O_j) \sum_k Err_k\, w_{jk}$

neuron | output | error
6 | 0.474 | 0.474×(1 - 0.474)×(1 - 0.474) = 0.1311
5 | 0.525 | 0.525×(1 - 0.525)×(-0.2)×0.1311 = -0.0065
4 | 0.332 | 0.332×(1 - 0.332)×(-0.3)×0.1311 = -0.0087

To update the weights: $w_{ij} = w_{ij} + r\,Err_j\,O_i$; to update the bias: $\theta_j = \theta_j + r\,Err_j$

weight | new value
w46 | -0.3 + 0.9×0.1311×0.332 = -0.261
w56 | -0.2 + 0.9×0.1311×0.525 = -0.138
w14 | 0.2 + 0.9×(-0.0087)×1 = 0.192
w15 | -0.3 + 0.9×(-0.0065)×1 = -0.306
w24 | 0.4 + 0.9×(-0.0087)×0 = 0.4
w25 | 0.1 + 0.9×(-0.0065)×0 = 0.1
w34 | -0.5 + 0.9×(-0.0087)×1 = -0.508
w35 | 0.2 + 0.9×(-0.0065)×1 = 0.194
w06 | 0.1 + 0.9×0.1311 = 0.218
w05 | 0.2 + 0.9×(-0.0065) = 0.194
w04 | -0.4 + 0.9×(-0.0087) = -0.408
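A minimal sketch that replays the whole iteration above for the 3-2-1 network (hidden units 4 and 5, output unit 6), reproducing the propagation, error, and update tables:

```python
import math

def sigmoid(i):
    return 1.0 / (1.0 + math.exp(-i))

x = [1, 0, 1]; target = 1; r = 0.9
w14, w15, w24, w25, w34, w35 = 0.2, -0.3, 0.4, 0.1, -0.5, 0.2
w46, w56 = -0.3, -0.2
b4, b5, b6 = -0.4, 0.2, 0.1   # biases w04, w05, w06

# Phase 1: forward propagation.
o4 = sigmoid(w14 * x[0] + w24 * x[1] + w34 * x[2] + b4)   # 0.332
o5 = sigmoid(w15 * x[0] + w25 * x[1] + w35 * x[2] + b5)   # 0.525
o6 = sigmoid(w46 * o4 + w56 * o5 + b6)                    # 0.474

# Backward propagation of the errors.
err6 = o6 * (1 - o6) * (target - o6)                      # 0.1311
err5 = o5 * (1 - o5) * err6 * w56                         # -0.0065
err4 = o4 * (1 - o4) * err6 * w46                         # -0.0087

# Phase 2: weight and bias updates, w_ij += r * err_j * o_i.
w46 += r * err6 * o4; w56 += r * err6 * o5                # -0.261, -0.138
w14 += r * err4 * x[0]; w24 += r * err4 * x[1]; w34 += r * err4 * x[2]
w15 += r * err5 * x[0]; w25 += r * err5 * x[1]; w35 += r * err5 * x[2]
b6 += r * err6; b5 += r * err5; b4 += r * err4            # 0.218, 0.194, -0.408
print(round(o6, 3), round(err6, 4), round(w46, 3))        # 0.474 0.1311 -0.261
```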
Example
[Figure: the network after the first iteration, with updated weights w14 = 0.192, w15 = -0.306, w24 = 0.4, w25 = 0.1, w34 = -0.508, w35 = 0.194, w46 = -0.261, w56 = -0.138, w04 = -0.408, w05 = 0.194, w06 = 0.218.]
This is the resulting network after the first iteration. We now have to process another training example, until the overall error is low or we run out of examples.

Weakness
- Long training time.
- Requires a number of parameters typically best determined empirically, e.g., the network topology or "structure."
- Poor interpretability: difficult to interpret the symbolic meaning behind the learned weights and of the "hidden units" in the network.
Strength
- High tolerance to noisy data.
- Ability to classify untrained patterns.
- Well suited for continuous-valued inputs and outputs.
- Successful on a wide array of real-world data.
- Algorithms are inherently parallel.
ENSEMBLE METHODS

Ensemble Methods
Aggregation of multiple learned models with the goal of improving accuracy.
Intuition: simulate what we do when we combine an expert panel in a human decision-making process.
Some Comments
- Combining models adds complexity: it is more difficult to characterize and explain predictions.
- The accuracy may increase; diversity comes from differences in input variation and from different feature weightings.
- Violation of Ockham's Razor (simplicity leads to greater accuracy): identifying the best model requires identifying the proper "model complexity".
[Figure: an ensemble in which each classifier sees a different feature subset of the training examples: Classifier A uses Ratings, Classifier B uses Actors, Classifier C uses Genres; their individual predictions are combined into a single prediction.]

Divide up the training data among the models:
[Figure: the training examples are divided among Classifiers A, B, and C, whose predictions are combined.]
Ensemble methods
- Use a combination of models to increase accuracy.
- Combine a series of k learned models, M1, M2, ..., Mk, with the aim of creating an improved model M*.
Algebraic methods (see the sketch after this list):
- Average
- Weighted average
- Sum
- Weighted sum
- Product
- Maximum
- Minimum
- Median
Voting methods:
- Majority voting
- Weighted majority voting
- Borda count (rank candidates in order of preference)
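A minimal sketch of three of the combiners listed above (average, weighted average, and majority vote); the toy predictions are illustrative:

```python
from collections import Counter

def average(scores):
    return sum(scores) / len(scores)

def weighted_average(scores, weights):
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)

def majority_vote(labels):
    # Return the class predicted by the most base classifiers.
    return Counter(labels).most_common(1)[0][0]

# Illustrative outputs of three classifiers M1, M2, M3 for one test sample.
print(average([0.9, 0.6, 0.4]))                      # 0.633...
print(weighted_average([0.9, 0.6, 0.4], [3, 1, 1]))  # 0.74
print(majority_vote(["+", "+", "-"]))                # '+'
```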
Bagging: averaging the prediction over a collection of classifiers.
Analogy: diagnosis based on the majority vote of multiple doctors.
Boosting: weighted vote with a collection of classifiers.
Ensemble: combining a set of heterogeneous classifiers.
Bagging
Training: a classifier model Mi is learned for each bootstrap training set Di.
Classification: to classify an unknown sample X:
- Each classifier Mi returns its class prediction.
- The bagged classifier M* counts the votes and assigns the class with the most votes to X.
Prediction: bagging can be applied to the prediction of continuous values by taking the average value of each prediction for a given test tuple.
A minimal sketch follows.
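A minimal bagging sketch under stated assumptions: learn is any function mapping a training sample to a model, and the toy base learner (a crude 1-D threshold stump split at the sample mean) is an illustration, not from the slides:

```python
import random
from collections import Counter

def bagging_train(data, learn, k):
    """Train k models, each on a bootstrap sample (drawn with
    replacement) of the same size as the original training set D."""
    models = []
    for _ in range(k):
        boot = [random.choice(data) for _ in data]   # D_i
        models.append(learn(boot))                   # M_i
    return models

def bagging_classify(models, x):
    """Each M_i votes; M* returns the class with the most votes."""
    votes = [m(x) for m in models]
    return Counter(votes).most_common(1)[0][0]

# Toy base learner: splits at the mean input, assuming '+' lies above it.
def learn_stump(sample):
    cut = sum(x for x, _ in sample) / len(sample)
    return lambda x: "+" if x > cut else "-"

data = [(0.1, "-"), (0.4, "-"), (0.6, "+"), (0.9, "+")]
models = bagging_train(data, learn_stump, k=25)
print(bagging_classify(models, 0.8))  # likely '+'
```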
Bagging
Accuracy:
- Often significantly better than a single classifier derived from D.
- For noisy data: not considerably worse, more robust.
- Proven improved accuracy in prediction.
Requirement: needs unstable classifier types.
- Unstable means a small change to the training data may lead to major decision changes.
Stability in training:
- Training: construct classifier f from D.
- Stability: small changes on D result in small changes on f.
- Decision trees are a typical unstable classifier.
http://en.wikibooks.org/wiki/File:DTE_Bagging.png
Boosting
Analogy: consult several doctors, based on a combination of weighted diagnoses, where each weight is assigned based on the previous diagnosis accuracy.
Incrementally create models, selectively using training examples based on some distribution.
How does boosting work?
- Weights are assigned to each training example.
- A series of k classifiers is iteratively learned.
- After a classifier Mi is learned, the weights are updated to allow the subsequent classifier, Mi+1, to pay more attention to the training examples that were misclassified by Mi.
Using a Different Data Distribution
- Start with uniform weighting.
- During each step of learning:
  - Increase the weights of the examples which are not correctly learned by the weak learner.
  - Decrease the weights of the examples which are correctly learned by the weak learner.
Idea: focus on the difficult examples which were not correctly classified in the previous steps.
The final M* combines the votes of each individual classifier, where the weight of each classifier's vote is a function of its accuracy.
A reweighting sketch is given below.
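A minimal sketch of the distribution update described above: weights of correctly learned examples are scaled down and the distribution is renormalized, which raises the relative weight of the misclassified examples (the scaling factor beta is an illustrative choice, not from the slides):

```python
def update_distribution(weights, correct, beta=0.5):
    """One boosting round: multiply the weight of each correctly learned
    example by beta (< 1), keep misclassified weights as they are, then
    renormalize so the weights again sum to 1."""
    new_w = [w * beta if ok else w for w, ok in zip(weights, correct)]
    total = sum(new_w)
    return [w / total for w in new_w]

# Uniform weighting over 4 examples; example 2 was misclassified.
w = [0.25, 0.25, 0.25, 0.25]
print(update_distribution(w, correct=[True, False, True, True]))
# -> [0.2, 0.4, 0.2, 0.2]: the hard example gains relative weight.
```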
Weighted Voting
- Construct a strong classifier by weighted voting of the weak classifiers.
- Idea:
  - A better weak classifier gets a larger weight.
  - Iteratively add weak classifiers.
  - Increase the accuracy of the combined classifier through minimization of a cost function.

Boosting
- Differences with bagging:
  - Models are built sequentially, on modified versions of the data.
  - The predictions of the models are combined through a weighted sum/vote.
- The boosting algorithm can be extended for numeric prediction.
- Compared with bagging: boosting tends to achieve greater accuracy, but it also risks overfitting the model to misclassified data.
Adaboost
- Given a set of d class-labeled examples, (X1, y1), ..., (Xd, yd).
- Initially, all the weights of the examples are set to the same value (1/d).
- Generate k classifiers in k rounds. At round i:
  - Tuples from D are sampled (with replacement) to form a training set Di of the same size.
  - Each example's chance of being selected is based on its weight.
  - A classification model Mi is derived from Di, and its error rate is calculated using Di as a test set.
  - If a tuple is misclassified, its weight is increased; otherwise it is decreased.
- Error rate: err(Xj) is the misclassification error of example Xj. Classifier Mi's error rate is the sum of the weights of the misclassified examples.
A sketch of one common variant follows.
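A sketch of the round structure just described, assuming a generic learn(sample) -> model factory and model(x) -> label interface (hypothetical names). In this common variant the error is computed over the full weighted set, correctly classified weights are scaled by err/(1 - err), and each classifier's vote weight is log((1 - err)/err):

```python
import math, random

def adaboost(data, learn, k):
    d = len(data)
    weights = [1.0 / d] * d                     # all weights start at 1/d
    models, alphas = [], []
    for _ in range(k):
        # Sample D_i with replacement; selection chance follows the weights.
        di = random.choices(data, weights=weights, k=d)
        m = learn(di)
        # Error of M_i: sum of the weights of the misclassified examples.
        err = sum(w for w, (x, y) in zip(weights, data) if m(x) != y)
        if err == 0 or err >= 0.5:
            break                               # degenerate round; stop here
        # Decrease weights of correctly classified examples, renormalize.
        weights = [w * (err / (1 - err)) if m(x) == y else w
                   for w, (x, y) in zip(weights, data)]
        total = sum(weights)
        weights = [w / total for w in weights]
        models.append(m)
        alphas.append(math.log((1 - err) / err))  # classifier's vote weight
    return models, alphas

def classify(models, alphas, x, classes=("+", "-")):
    # Weighted majority vote of the learned models.
    score = {c: 0.0 for c in classes}
    for m, a in zip(models, alphas):
        score[m(x)] += a
    return max(score, key=score.get)
```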
Adaboost comments
This distribution update ensures that instances misclassified by the previous classifier are more likely to be included in the training data of the next classifier. Hence, consecutive classifiers' training data are geared towards increasingly hard-to-classify instances.
Unlike bagging, AdaBoost uses a rather undemocratic voting scheme, called weighted majority voting. The idea is an intuitive one: those classifiers that have shown good performance during training are rewarded with higher voting weights than the others.
The algorithm is sequential: classifier C_K is created before classifier C_{K+1}, which in turn requires that C_K and the current distribution D_K be available.
Random Forest: a variation of the bagging algorithm
- Created from individual decision trees whose parameters vary randomly. Such parameters can be bootstrapped replicas of the training data, as in bagging, but they can also be different feature subsets, as in random subspace methods.
- During classification, each tree votes and the most popular class is returned.
Two methods to construct a Random Forest:
- Forest-RI (random input selection): randomly select, at each node, F attributes as candidates for the split at the node. The CART methodology is used to grow the trees to maximum size.
- Forest-RC (random linear combinations): creates new attributes (or features) that are a linear combination of the existing attributes (reduces the correlation between individual classifiers).
Properties:
- Comparable in accuracy to Adaboost, but more robust to errors and outliers.
- Insensitive to the number of attributes selected for consideration at each split, and faster than bagging or boosting.
A Forest-RI-flavored sketch follows.
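A minimal Forest-RI-flavored sketch: each tree is grown on a bootstrap replica of the training data, and at each split only F randomly chosen attributes are considered. For brevity the "tree" is reduced to a single randomized split at the sample mean (a depth-1 stump), so this is an illustration of the idea, not full CART:

```python
import random
from collections import Counter

def grow_stump(sample, n_attrs, f):
    """Pick F random candidate attributes, split on one of them at the
    sample mean, and label each side by its majority class. A real
    Forest-RI would score the F candidates and grow a full tree."""
    attr = random.choice(random.sample(range(n_attrs), f))
    cut = sum(x[attr] for x, _ in sample) / len(sample)
    left = [y for x, y in sample if x[attr] <= cut]
    right = [y for x, y in sample if x[attr] > cut]
    maj = lambda ys: Counter(ys).most_common(1)[0][0] if ys else None
    lo, hi = maj(left), maj(right)
    return lambda x: lo if x[attr] <= cut else hi

def random_forest(data, n_attrs, f=1, n_trees=50):
    trees = []
    for _ in range(n_trees):
        boot = [random.choice(data) for _ in data]   # bootstrap replica
        trees.append(grow_stump(boot, n_attrs, f))
    return trees

def forest_classify(trees, x):
    # Each tree votes; the most popular class is returned.
    votes = [t(x) for t in trees if t(x) is not None]
    return Counter(votes).most_common(1)[0][0]

data = [((0.2, 1.0), "-"), ((0.3, 0.8), "-"), ((0.7, 0.2), "+"), ((0.9, 0.4), "+")]
trees = random_forest(data, n_attrs=2, f=1)
print(forest_classify(trees, (0.8, 0.3)))  # likely '+'
```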