Neural Networks and Ensemble Methods for Classification

NEURAL NETWORKS

Neural Networks
A neural network is a set of connected input/output units (neurons), where each connection has a weight associated with it. During the learning phase, the network learns by adjusting the weights so that it can predict the correct class label of the input samples (the training samples).
Knowledge about the learning task is given in the form of examples.
Interneuron connection strengths (weights) are used to store the acquired information (the training examples).
During the learning process, the weights are modified in order to model the particular learning task correctly on the training examples.
http://aemc.jpl.nasa.gov/activities/bio_regen.cfm
Neural Networks
Network architectures
Three different classes of network architectures:
- single-layer feed-forward
- multilayer feed-forward
- recurrent
In feed-forward networks the neurons are organized in acyclic layers. The architecture of a neural network is linked with the learning algorithm used to train it.

Advantages
- Prediction accuracy is generally high.
- Robust: works when training examples contain errors or noisy data.
- Output may be discrete, real-valued, or a vector of several discrete or real-valued attributes.
- Fast evaluation of the learned target function.

Criticism
- Long training time.
- Parameters are best determined empirically, such as the network topology or structure.
- Difficult to understand the learned function (weights).
- Not easy to incorporate domain knowledge.
[Figure: network architectures. Single-layer feed-forward: an input layer of source nodes projecting directly onto an output layer of neurons. Multilayer feed-forward: an input layer, a hidden layer of neurons, and an output layer.]
The neuron
Neural networks are built out of a densely interconnected set of simple units (neurons).
- Each neuron takes a number of real-valued inputs and produces a single real-valued output.
- Inputs to a neuron may be the outputs of other neurons; a neuron's output may be used as input to many other neurons.

[Figure: model of a neuron. Input signals x_1, x_2, ..., x_m are multiplied by weights w_1, w_2, ..., w_m and combined with the bias b = w_0 to form the local field u; the activation function produces the output y.]

- Adder function (linear combiner), which computes the weighted sum of the inputs: $u = b + \sum_{j=1}^{m} w_j x_j$
- Activation function (squashing function), for limiting the amplitude of the output of the neuron: $y = \varphi(u)$
- Bias: serves to vary the activity of the unit.
The neuron
- Assign a weight to each input link.
- Multiply each weight by the input value (0 or 1).
- Sum all the weighted inputs.
- Apply the squashing function, e.g.:
  If sum > threshold for the neuron, then Output = +1, else Output = -1.
http://wwwcse.uta.edu/~cook/ai1/lectures/figures/neuron.jpg
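To make the weighted-sum-plus-threshold computation concrete, here is a minimal Python sketch of a single neuron with a step activation; the weights, inputs, and threshold are illustrative values, not taken from the slides.

```python
def neuron_output(inputs, weights, bias, threshold):
    """Weighted sum of the inputs plus bias, squashed by a step activation."""
    u = bias + sum(w * x for w, x in zip(weights, inputs))
    return 1 if u > threshold else -1

# Illustrative values (not from the slides): two binary inputs.
print(neuron_output(inputs=[1, 0], weights=[0.6, -0.4], bias=0.1, threshold=0.5))  # 1
```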
Activation functions (see the code sketch after this slide):
- Logistic activation: $\varphi(z) = \dfrac{1}{1 + e^{-z}}$
- Hyperbolic tangent activation: $\varphi(u) = \tanh u = \dfrac{1 - e^{-2u}}{1 + e^{-2u}}$
- Threshold activation: $\varphi(z) = \mathrm{sign}(z) = \begin{cases} 1, & \text{if } z \geq 0 \\ -1, & \text{if } z < 0 \end{cases}$

[Figure: plots of the activation functions; the output is squashed into the range -1 to 1.]

Training a neuron
Initially:
- choose small random weights (w_i)
- set the threshold (step function)
- choose a small learning rate (r)
Apply each member of the training set to the neural net model, using a training rule to adjust the weights. For each unit:
- Compute the net input to the unit as a linear combination of all the inputs to the unit.
- Compute the output value using the activation function.
- Compute the error.
- Update the weights and the bias.
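A minimal sketch of the three activation functions defined above:

```python
import math

def logistic(z):
    # phi(z) = 1 / (1 + e^(-z)), output in (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

def tanh_act(u):
    # phi(u) = (1 - e^(-2u)) / (1 + e^(-2u)) = tanh(u), output in (-1, 1)
    return (1.0 - math.exp(-2 * u)) / (1.0 + math.exp(-2 * u))

def sign_act(z):
    # phi(z) = 1 if z >= 0, else -1
    return 1 if z >= 0 else -1

print(logistic(0.0), tanh_act(0.0), sign_act(-0.3))  # 0.5 0.0 -1
```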
Perceptron
Perceptrons are the simplest form of neural networks.
Modify the weights (w_i) according to the training rule:
$w_i = w_i + r\,(t - a)\,x_i$
where:
- r is the learning rate (e.g. 0.2)
- t = target output
- a = actual output
- x_i = i-th input value
Learning rate: if too small, learning occurs at a slow pace; if too large, it may get stuck in a local minimum in the decision space.
[Figure: a perceptron mapping input variables through input nodes directly to output nodes and output variables.]
Example
Inputs: b = 1 (bias input), x1 = 0, x2 = 1; weights w0 = -0.49, w1 = 0.95, w2 = 0.15; threshold = 0.5; r = 0.05; target output = 1.
- Compute the output for the input: u = 1 × (-0.49) + 0 × 0.95 + 1 × 0.15 = -0.34 < threshold, thus y = 0.
- Compute the error: the actual output is y = 0 and the target output is 1, so error = (1 - 0) = 1; correction factor = error × r = 0.05.
- Compute the new weights:
  w0 = -0.49 + 0.05 × (1 - 0) × 1 = -0.44
  w1 = 0.95 + 0.05 × (1 - 0) × 0 = 0.95
  w2 = 0.15 + 0.05 × (1 - 0) × 1 = 0.20
- Repeat the process with the new weights for a given number of iterations.
The update is reproduced in the code sketch below.
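A minimal sketch of the training rule $w_i = w_i + r(t - a)x_i$, replaying the single update from the example above (the bias is handled as weight w0 with constant input b = 1):

```python
def perceptron_step(weights, inputs, target, rate, threshold):
    """One perceptron update: compute the output, then adjust each
    weight by rate * (target - actual) * input."""
    u = sum(w * x for w, x in zip(weights, inputs))
    actual = 1 if u > threshold else 0
    error = target - actual
    return [w + rate * error * x for w, x in zip(weights, inputs)]

# Values from the worked example: b = 1, x1 = 0, x2 = 1.
weights = [-0.49, 0.95, 0.15]          # w0 (bias), w1, w2
inputs = [1, 0, 1]                     # bias input b, x1, x2
new_w = perceptron_step(weights, inputs, target=1, rate=0.05, threshold=0.5)
print(new_w)                           # [-0.44, 0.95, 0.20] up to float rounding
```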
Multilayer feed-forward networks
[Figure: a multilayer feed-forward network with an input layer, one or more hidden layers, and an output layer.]
Backpropagation algorithm
Phase 1: Propagation
- Forward propagation of a training input through the network.
- Backward propagation of the propagation's output activations.
Phase 2: Weight update
For each weight (synapse):
- Multiply its output delta and input activation to get the gradient of the weight.
- Bring the weight in the opposite direction of the gradient by subtracting a ratio of it from the weight.
- This ratio influences the speed and quality of learning. The sign of the gradient of a weight indicates where the error is increasing; this is why the weight must be updated in the opposite direction.
Repeat phases 1 and 2 until the performance of the network is good enough.

[Figure: feed-forward network with input nodes (input vector x_i), hidden nodes, and output nodes (output vector); O_j denotes the output of unit j.]

Formulas (the bias of unit j is written $\theta_j$, or w_{0j} in the example that follows):
- Net input to unit j: $I_j = \sum_i w_{ij} O_i + \theta_j$
- Output of unit j: $O_j = \dfrac{1}{1 + e^{-I_j}}$
- Error for a node in the output layer: $Err_j = O_j (1 - O_j)(T_j - O_j)$
- Error for a node in the hidden layer: $Err_j = O_j (1 - O_j) \sum_k Err_k\, w_{jk}$
- To update the weights: $w_{ij} = w_{ij} + r\, Err_j\, O_i$
- To update the bias: $\theta_j = \theta_j + r\, Err_j$
The complete iteration is traced in the worked example that follows, and in the code sketch after it.
Example
Input variables x_i = (1, 0, 1), whose class is 1; hidden units 4 and 5; output unit 6.
Randomly assigned weights: w14 = 0.2, w15 = -0.3, w24 = 0.4, w25 = 0.1, w34 = -0.5, w35 = 0.2, w46 = -0.3, w56 = -0.2; biases w04 = -0.4, w05 = 0.2, w06 = 0.1.
Activation function $O_j = 1/(1 + e^{-I_j})$ and learning rate = 0.9.

Propagation:

neuron | net input I_j | output O_j
4 | 0.2×1 + 0.4×0 - 0.5×1 - 0.4 = -0.7 | 1/(1 + e^{0.7}) = 0.332
5 | -0.3×1 + 0.1×0 + 0.2×1 + 0.2 = 0.1 | 1/(1 + e^{-0.1}) = 0.525
6 | -0.3×0.332 - 0.2×0.525 + 0.1 = -0.105 | 1/(1 + e^{0.105}) = 0.474
Updating weights
Error for a node in the output layer: $Err_j = O_j (1 - O_j)(T_j - O_j)$
Error for a node in the hidden layer: $Err_j = O_j (1 - O_j) \sum_k Err_k\, w_{jk}$

neuron | output | error
6 | 0.474 | 0.474×(1 - 0.474)×(1 - 0.474) = 0.1311
5 | 0.525 | 0.525×(1 - 0.525)×(-0.2)×0.1311 = -0.0065
4 | 0.332 | 0.332×(1 - 0.332)×(-0.3)×0.1311 = -0.0087

To update the weights: $w_{ij} = w_{ij} + r\,Err_j\,O_i$; to update the bias: $\theta_j = \theta_j + r\,Err_j$

weight | new value
w46 | -0.3 + 0.9×0.1311×0.332 = -0.261
w56 | -0.2 + 0.9×0.1311×0.525 = -0.138
w14 | 0.2 + 0.9×(-0.0087)×1 = 0.192
w15 | -0.3 + 0.9×(-0.0065)×1 = -0.306
w24 | 0.4 + 0.9×(-0.0087)×0 = 0.4
w25 | 0.1 + 0.9×(-0.0065)×0 = 0.1
w34 | -0.5 + 0.9×(-0.0087)×1 = -0.508
w35 | 0.2 + 0.9×(-0.0065)×1 = 0.194
w06 | 0.1 + 0.9×0.1311 = 0.218
w05 | 0.2 + 0.9×(-0.0065) = 0.194
w04 | -0.4 + 0.9×(-0.0087) = -0.408
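A minimal sketch that replays the whole iteration above for the 3-2-1 network (hidden units 4 and 5, output unit 6), reproducing the propagation, error, and update tables:

```python
import math

def sigmoid(i):
    return 1.0 / (1.0 + math.exp(-i))

x = [1, 0, 1]; target = 1; r = 0.9
w14, w15, w24, w25, w34, w35 = 0.2, -0.3, 0.4, 0.1, -0.5, 0.2
w46, w56 = -0.3, -0.2
b4, b5, b6 = -0.4, 0.2, 0.1   # biases w04, w05, w06

# Phase 1: forward propagation.
o4 = sigmoid(w14 * x[0] + w24 * x[1] + w34 * x[2] + b4)   # 0.332
o5 = sigmoid(w15 * x[0] + w25 * x[1] + w35 * x[2] + b5)   # 0.525
o6 = sigmoid(w46 * o4 + w56 * o5 + b6)                    # 0.474

# Backward propagation of the errors.
err6 = o6 * (1 - o6) * (target - o6)                      # 0.1311
err5 = o5 * (1 - o5) * err6 * w56                         # -0.0065
err4 = o4 * (1 - o4) * err6 * w46                         # -0.0087

# Phase 2: weight and bias updates, w_ij += r * err_j * o_i.
w46 += r * err6 * o4; w56 += r * err6 * o5                # -0.261, -0.138
w14 += r * err4 * x[0]; w24 += r * err4 * x[1]; w34 += r * err4 * x[2]
w15 += r * err5 * x[0]; w25 += r * err5 * x[1]; w35 += r * err5 * x[2]
b6 += r * err6; b5 += r * err5; b4 += r * err4            # 0.218, 0.194, -0.408
print(round(o6, 3), round(err6, 4), round(w46, 3))        # 0.474 0.1311 -0.261
```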
Example
[Figure: the network after the first iteration, with updated weights w14 = 0.192, w15 = -0.306, w24 = 0.4, w25 = 0.1, w34 = -0.508, w35 = 0.194, w46 = -0.261, w56 = -0.138, w04 = -0.408, w05 = 0.194, w06 = 0.218.]
This is the resulting network after the first iteration. We now have to process another training example, until the overall error is low or we run out of examples.

Weakness
- Long training time.
- Requires a number of parameters typically best determined empirically, e.g., the network topology or "structure."
- Poor interpretability: difficult to interpret the symbolic meaning behind the learned weights and of the "hidden units" in the network.
Strength
- High tolerance to noisy data.
- Ability to classify untrained patterns.
- Well suited for continuous-valued inputs and outputs.
- Successful on a wide array of real-world data.
- Algorithms are inherently parallel.
ENSEMBLE METHODS

Ensemble Methods
Aggregation of multiple learned models with the goal of improving accuracy.
Intuition: simulate what we do when we combine an expert panel in a human decision-making process.
Some Comments
- Combining models adds complexity: it is more difficult to characterize and explain predictions.
- The accuracy may increase; diversity comes from differences in input variation and from different feature weightings.
- Violation of Ockham's Razor (simplicity leads to greater accuracy): identifying the best model requires identifying the proper "model complexity".
[Figure: an ensemble in which each classifier sees a different feature subset of the training examples: Classifier A uses Ratings, Classifier B uses Actors, Classifier C uses Genres; their individual predictions are combined into a single prediction.]

Divide up the training data among the models:
[Figure: the training examples are divided among Classifiers A, B, and C, whose predictions are combined.]
Ensemble methods
- Use a combination of models to increase accuracy.
- Combine a series of k learned models, M1, M2, ..., Mk, with the aim of creating an improved model M*.
Algebraic methods (see the sketch after this list):
- Average
- Weighted average
- Sum
- Weighted sum
- Product
- Maximum
- Minimum
- Median
Voting methods:
- Majority voting
- Weighted majority voting
- Borda count (rank candidates in order of preference)
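A minimal sketch of three of the combiners listed above (average, weighted average, and majority vote); the toy predictions are illustrative:

```python
from collections import Counter

def average(scores):
    return sum(scores) / len(scores)

def weighted_average(scores, weights):
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)

def majority_vote(labels):
    # Return the class predicted by the most base classifiers.
    return Counter(labels).most_common(1)[0][0]

# Illustrative outputs of three classifiers M1, M2, M3 for one test sample.
print(average([0.9, 0.6, 0.4]))                      # 0.633...
print(weighted_average([0.9, 0.6, 0.4], [3, 1, 1]))  # 0.74
print(majority_vote(["+", "+", "-"]))                # '+'
```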
Bagging: averaging the prediction over a collection of classifiers.
Analogy: diagnosis based on the majority vote of multiple doctors.
Boosting: weighted vote with a collection of classifiers.
Ensemble: combining a set of heterogeneous classifiers.
Bagging
Training: a classifier model Mi is learned for each bootstrap training set Di.
Classification: to classify an unknown sample X:
- Each classifier Mi returns its class prediction.
- The bagged classifier M* counts the votes and assigns the class with the most votes to X.
Prediction: bagging can be applied to the prediction of continuous values by taking the average value of each prediction for a given test tuple.
A minimal sketch follows.
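A minimal bagging sketch under stated assumptions: learn is any function mapping a training sample to a model, and the toy base learner (a crude 1-D threshold stump split at the sample mean) is an illustration, not from the slides:

```python
import random
from collections import Counter

def bagging_train(data, learn, k):
    """Train k models, each on a bootstrap sample (drawn with
    replacement) of the same size as the original training set D."""
    models = []
    for _ in range(k):
        boot = [random.choice(data) for _ in data]   # D_i
        models.append(learn(boot))                   # M_i
    return models

def bagging_classify(models, x):
    """Each M_i votes; M* returns the class with the most votes."""
    votes = [m(x) for m in models]
    return Counter(votes).most_common(1)[0][0]

# Toy base learner: splits at the mean input, assuming '+' lies above it.
def learn_stump(sample):
    cut = sum(x for x, _ in sample) / len(sample)
    return lambda x: "+" if x > cut else "-"

data = [(0.1, "-"), (0.4, "-"), (0.6, "+"), (0.9, "+")]
models = bagging_train(data, learn_stump, k=25)
print(bagging_classify(models, 0.8))  # likely '+'
```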
Bagging
Accuracy:
- Often significantly better than a single classifier derived from D.
- For noisy data: not considerably worse, more robust.
- Proven improved accuracy in prediction.
Requirement: needs unstable classifier types.
- Unstable means a small change to the training data may lead to major decision changes.
Stability in training:
- Training: construct classifier f from D.
- Stability: small changes on D result in small changes on f.
- Decision trees are a typical unstable classifier.
http://en.wikibooks.org/wiki/File:DTE_Bagging.png
Boosting
Analogy: consult several doctors, based on a combination of weighted diagnoses, where each weight is assigned based on the previous diagnosis accuracy.
Incrementally create models, selectively using training examples based on some distribution.
How does boosting work?
- Weights are assigned to each training example.
- A series of k classifiers is iteratively learned.
- After a classifier Mi is learned, the weights are updated to allow the subsequent classifier, Mi+1, to pay more attention to the training examples that were misclassified by Mi.
Using a Different Data Distribution
- Start with uniform weighting.
- During each step of learning:
  - Increase the weights of the examples which are not correctly learned by the weak learner.
  - Decrease the weights of the examples which are correctly learned by the weak learner.
Idea: focus on the difficult examples which were not correctly classified in the previous steps.
The final M* combines the votes of each individual classifier, where the weight of each classifier's vote is a function of its accuracy.
A reweighting sketch is given below.
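A minimal sketch of the distribution update described above: weights of correctly learned examples are scaled down and the distribution is renormalized, which raises the relative weight of the misclassified examples (the scaling factor beta is an illustrative choice, not from the slides):

```python
def update_distribution(weights, correct, beta=0.5):
    """One boosting round: multiply the weight of each correctly learned
    example by beta (< 1), keep misclassified weights as they are, then
    renormalize so the weights again sum to 1."""
    new_w = [w * beta if ok else w for w, ok in zip(weights, correct)]
    total = sum(new_w)
    return [w / total for w in new_w]

# Uniform weighting over 4 examples; example 2 was misclassified.
w = [0.25, 0.25, 0.25, 0.25]
print(update_distribution(w, correct=[True, False, True, True]))
# -> [0.2, 0.4, 0.2, 0.2]: the hard example gains relative weight.
```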
Weighted Voting
- Construct a strong classifier by weighted voting of the weak classifiers.
- Idea:
  - A better weak classifier gets a larger weight.
  - Iteratively add weak classifiers.
  - Increase the accuracy of the combined classifier through minimization of a cost function.

Boosting
- Differences with bagging:
  - Models are built sequentially, on modified versions of the data.
  - The predictions of the models are combined through a weighted sum/vote.
- The boosting algorithm can be extended for numeric prediction.
- Compared with bagging: boosting tends to achieve greater accuracy, but it also risks overfitting the model to misclassified data.
Adaboost
- Given a set of d class-labeled examples, (X1, y1), ..., (Xd, yd).
- Initially, all the weights of the examples are set to the same value (1/d).
- Generate k classifiers in k rounds. At round i:
  - Tuples from D are sampled (with replacement) to form a training set Di of the same size.
  - Each example's chance of being selected is based on its weight.
  - A classification model Mi is derived from Di, and its error rate is calculated using Di as a test set.
  - If a tuple is misclassified, its weight is increased; otherwise it is decreased.
- Error rate: err(Xj) is the misclassification error of example Xj. Classifier Mi's error rate is the sum of the weights of the misclassified examples.
A sketch of one common variant follows.
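A sketch of the round structure just described, assuming a generic learn(sample) -> model factory and model(x) -> label interface (hypothetical names). In this common variant the error is computed over the full weighted set, correctly classified weights are scaled by err/(1 - err), and each classifier's vote weight is log((1 - err)/err):

```python
import math, random

def adaboost(data, learn, k):
    d = len(data)
    weights = [1.0 / d] * d                     # all weights start at 1/d
    models, alphas = [], []
    for _ in range(k):
        # Sample D_i with replacement; selection chance follows the weights.
        di = random.choices(data, weights=weights, k=d)
        m = learn(di)
        # Error of M_i: sum of the weights of the misclassified examples.
        err = sum(w for w, (x, y) in zip(weights, data) if m(x) != y)
        if err == 0 or err >= 0.5:
            break                               # degenerate round; stop here
        # Decrease weights of correctly classified examples, renormalize.
        weights = [w * (err / (1 - err)) if m(x) == y else w
                   for w, (x, y) in zip(weights, data)]
        total = sum(weights)
        weights = [w / total for w in weights]
        models.append(m)
        alphas.append(math.log((1 - err) / err))  # classifier's vote weight
    return models, alphas

def classify(models, alphas, x, classes=("+", "-")):
    # Weighted majority vote of the learned models.
    score = {c: 0.0 for c in classes}
    for m, a in zip(models, alphas):
        score[m(x)] += a
    return max(score, key=score.get)
```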
Adaboost comments
This distribution update ensures that instances misclassified by the previous classifier are more likely to be included in the training data of the next classifier. Hence, consecutive classifiers' training data are geared towards increasingly hard-to-classify instances.
Unlike bagging, AdaBoost uses a rather undemocratic voting scheme, called weighted majority voting. The idea is an intuitive one: those classifiers that have shown good performance during training are rewarded with higher voting weights than the others.
The algorithm is sequential: classifier C_K is created before classifier C_{K+1}, which in turn requires that C_K and the current distribution D_K be available.
Random Forest: a variation of the bagging algorithm
- Created from individual decision trees whose parameters vary randomly. Such parameters can be bootstrapped replicas of the training data, as in bagging, but they can also be different feature subsets, as in random subspace methods.
- During classification, each tree votes and the most popular class is returned.
Two methods to construct a Random Forest:
- Forest-RI (random input selection): randomly select, at each node, F attributes as candidates for the split at the node. The CART methodology is used to grow the trees to maximum size.
- Forest-RC (random linear combinations): creates new attributes (or features) that are a linear combination of the existing attributes (reduces the correlation between individual classifiers).
Properties:
- Comparable in accuracy to Adaboost, but more robust to errors and outliers.
- Insensitive to the number of attributes selected for consideration at each split, and faster than bagging or boosting.
A Forest-RI-flavored sketch follows.
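A minimal Forest-RI-flavored sketch: each tree is grown on a bootstrap replica of the training data, and at each split only F randomly chosen attributes are considered. For brevity the "tree" is reduced to a single randomized split at the sample mean (a depth-1 stump), so this is an illustration of the idea, not full CART:

```python
import random
from collections import Counter

def grow_stump(sample, n_attrs, f):
    """Pick F random candidate attributes, split on one of them at the
    sample mean, and label each side by its majority class. A real
    Forest-RI would score the F candidates and grow a full tree."""
    attr = random.choice(random.sample(range(n_attrs), f))
    cut = sum(x[attr] for x, _ in sample) / len(sample)
    left = [y for x, y in sample if x[attr] <= cut]
    right = [y for x, y in sample if x[attr] > cut]
    maj = lambda ys: Counter(ys).most_common(1)[0][0] if ys else None
    lo, hi = maj(left), maj(right)
    return lambda x: lo if x[attr] <= cut else hi

def random_forest(data, n_attrs, f=1, n_trees=50):
    trees = []
    for _ in range(n_trees):
        boot = [random.choice(data) for _ in data]   # bootstrap replica
        trees.append(grow_stump(boot, n_attrs, f))
    return trees

def forest_classify(trees, x):
    # Each tree votes; the most popular class is returned.
    votes = [t(x) for t in trees if t(x) is not None]
    return Counter(votes).most_common(1)[0][0]

data = [((0.2, 1.0), "-"), ((0.3, 0.8), "-"), ((0.7, 0.2), "+"), ((0.9, 0.4), "+")]
trees = random_forest(data, n_attrs=2, f=1)
print(forest_classify(trees, (0.8, 0.3)))  # likely '+'
```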