
Neural Networks and Ensemble Methods for Classification

NEURAL NETWORKS

Neural Networks

A neural network is a set of connected input/output units (neurons), where each connection has a weight associated with it.

During the learning phase, the network learns by adjusting the weights so that it can predict the correct class label of the input samples (the training samples).

Knowledge about the learning task is given in the form of examples.

Interneuron connection strengths (weights) are used to store the acquired information (the training examples).

During the learning process the weights are modified in order to model the particular learning task correctly on the training examples.

http://aemc.jpl.nasa.gov/activities/bio_regen.cfm

Neural Networks

Network architectures

Three different classes of network architectures:
single-layer feed-forward
multi-layer feed-forward
recurrent

In feed-forward networks the neurons are organized in acyclic layers.
The architecture of a neural network is linked with the learning algorithm used to train it.

[Figure: a single-layer network has an input layer of source nodes and an output layer of neurons; a multi-layer network has an input layer, one or more hidden layers of neurons, and an output layer.]

Advantages

prediction accuracy is generally high
robust, works when training examples contain errors or noisy data
output may be discrete, real-valued, or a vector of several discrete or real-valued attributes
fast evaluation of the learned target function

Criticism

long training time
parameters are best determined empirically, such as the network topology or structure
difficult to understand the learned function (weights)
not easy to incorporate domain knowledge

The neuron

Neural networks are built out of a densely interconnected set of simple units (neurons).
Each neuron takes a number of real-valued inputs.
Produces a single real-valued output.
Inputs to a neuron may be the outputs of other neurons.
A neuron's output may be used as input to many other neurons.

[Figure: a neuron with input signals x_1, x_2, ..., x_m, connection weights w_1, w_2, ..., w_m, a bias b with weight w_0, a local field u, and an output y.]

Adder function (linear combiner), which computes the weighted sum of the inputs:
u = b w_0 + \sum_{j=1}^{m} w_j x_j

Activation function (squashing function) for limiting the amplitude of the output of the neuron:
y = \varphi(u)

Bias: serves to vary the activity of the unit.

The neuron

How does it work?

Assign weights to each input link.
Multiply each weight by the input value (0 or 1).
Sum all the weighted inputs.
Apply the squashing function, e.g.:
If sum > threshold for the neuron then Output = +1
Else Output = -1

http://wwwcse.uta.edu/~cook/ai1/lectures/figures/neuron.jpg
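A minimal sketch of such a threshold neuron in Python; the weight, bias, and threshold values below are made up for illustration, not taken from the slides:

import numpy as np

def neuron(x, w, w0, b=1.0, threshold=0.0):
    u = b * w0 + np.dot(w, x)          # adder: weighted sum of the inputs plus the bias term
    return 1 if u > threshold else -1  # squashing: threshold activation

# example: u = 1.0*0.1 + 0.5*1.0 + (-0.3)*0.0 = 0.6 > 0, so the neuron outputs +1
print(neuron(x=np.array([1.0, 0.0]), w=np.array([0.5, -0.3]), w0=0.1))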

Popular activation functions

Linear activation: \varphi(z) = z

Logistic activation: \varphi(z) = 1 / (1 + e^{-z})

Hyperbolic tangent activation: \varphi(u) = \tanh(u) = (1 - e^{-2u}) / (1 + e^{-2u})

Threshold activation: \varphi(z) = sign(z) = +1 if z >= 0, -1 if z < 0

(a NumPy sketch of these functions is given below)

How Are Neural Networks Trained?

Initially
choose small random weights (w_i)
set threshold = 1 (step function)
choose a small learning rate (r)

Apply each member of the training set to the neural net model, using a training rule to adjust the weights.

For each unit
Compute the net input to the unit as a linear combination of all the inputs to the unit
Compute the output value using the activation function
Compute the error
Update the weights and the bias
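The four activation functions listed above, written out in NumPy for reference (tanh_act is algebraically the same as np.tanh):

import numpy as np

def linear(z):    return z
def logistic(z):  return 1.0 / (1.0 + np.exp(-z))
def tanh_act(u):  return (1.0 - np.exp(-2.0 * u)) / (1.0 + np.exp(-2.0 * u))
def threshold(z): return np.where(z >= 0, 1.0, -1.0)   # sign activation

z = np.array([-2.0, 0.0, 2.0])
print(logistic(z), tanh_act(z), threshold(z))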

Single Layer Perceptron

The simplest form of neural network: the input variables are connected directly to the output nodes.

[Figure: input variables connected by weighted links to the output nodes, which produce the output variables.]

Single layer perceptron: training rule

Modify the weights (w_i) according to the training rule:
w_i = w_i + r (t - a) x_i
where
r is the learning rate (e.g. 0.2)
t = target output
a = actual output
x_i = i-th input value

Learning rate: if too small, learning occurs at a very small pace; if too large, it may get stuck in a local minimum in the decision space.

Example

b = 1, x1 = 0, x2 = 1
w0 = -0.49, w1 = 0.95, w2 = 0.15
threshold = 0.5, r = 0.05
target output = 1

Compute the output for the input: u = 1×(-0.49) + 0×0.95 + 1×0.15 = -0.34 < threshold, thus y = 0.
Compute the error: actual output y = 0, so error = (1 - 0) = 1 and the correction factor = error × r = 0.05.
Compute the new weights:
w0 = -0.49 + 0.05×(1 - 0)×1 = -0.44
w1 = 0.95 + 0.05×(1 - 0)×0 = 0.95
w2 = 0.15 + 0.05×(1 - 0)×1 = 0.20
Repeat the process with the new weights for a given number of iterations.
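A short sketch of this training step in Python, treating the bias as an extra input with value 1 and using the rule w_i = w_i + r (t - a) x_i with the numbers from the example above:

import numpy as np

x = np.array([1.0, 0.0, 1.0])        # bias input b, x1, x2
w = np.array([-0.49, 0.95, 0.15])    # w0 (bias weight), w1, w2
r, threshold, t = 0.05, 0.5, 1       # learning rate, threshold, target output

u = np.dot(w, x)                     # u = -0.34
a = 1 if u > threshold else 0        # actual output a = 0
w = w + r * (t - a) * x              # new weights: w0 = -0.44, w1 = 0.95, w2 = 0.20
print(u, a, w)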

Multi layer network

input layer
hidden layer (one or more)
output layer

Training multi layer networks

Problem: what is the desired output for a hidden node? => Backpropagation algorithm

Backpropagation algorithm

Phase 1: Propagation
Forward propagation of a training input
Backward propagation of the propagation's output activations

Phase 2: Weight update
For each weight (synapse):
Multiply its output delta and input activation to get the gradient of the weight.
Bring the weight in the opposite direction of the gradient by subtracting a ratio of it from the weight.
This ratio influences the speed and quality of learning. The sign of the gradient of a weight indicates where the error is increasing; this is why the weight must be updated in the opposite direction.

Repeat phase 1 and phase 2 until the performance of the network is good enough.

Multi-Layer network of sigmoid units

[Figure: input vector x_i feeding the input nodes, followed by hidden nodes and output nodes producing the output vector.]

I_j = \sum_i w_{ij} O_i + \theta_j     (net input to unit j)
O_j = 1 / (1 + e^{-I_j})               (output of unit j)

Err_j = O_j (1 - O_j)(T_j - O_j)                 error for a node in the output layer
Err_j = O_j (1 - O_j) \sum_k Err_k w_{jk}        error for a node in the hidden layer

w_{ij} = w_{ij} + (r) Err_j O_i        to update the weights
\theta_j = \theta_j + (r) Err_j        to update the bias

Example

x_i: input variables (1, 0, 1), whose class is 1
w_ij: randomly assigned weights
activation function: O_j = 1 / (1 + e^{-I_j})
learning rate = 0.9

[Figure: inputs x1 = 1, x2 = 0, x3 = 1 feed hidden neurons 4 and 5, which feed output neuron 6, with initial weights
w14 = 0.2, w24 = 0.4, w34 = -0.5, w15 = -0.3, w25 = 0.1, w35 = 0.2, w46 = -0.3, w56 = -0.2
and biases w04 = -0.4, w05 = 0.2, w06 = 0.1.]

Propagation

I_j = \sum_i w_{ij} O_i + \theta_j
O_j = 1 / (1 + e^{-I_j})

neuron 4: input = 0.2×1 + 0.4×0 - 0.5×1 - 0.4 = -0.7, output = 1/(1 + e^{0.7}) = 0.332
neuron 5: input = -0.3×1 + 0.1×0 + 0.2×1 + 0.2 = 0.1, output = 1/(1 + e^{-0.1}) = 0.525
neuron 6: input = -0.3×0.332 - 0.2×0.525 + 0.1 = -0.105, output = 1/(1 + e^{0.105}) = 0.474

Calculation of the neuron error

Err_j = O_j (1 - O_j)(T_j - O_j)                 error for a node in the output layer
Err_j = O_j (1 - O_j) \sum_k Err_k w_{jk}        error for a node in the hidden layer

neuron 6: output 0.474, error = 0.474×(1 - 0.474)×(1 - 0.474) = 0.1311
neuron 5: output 0.525, error = 0.525×(1 - 0.525)×(-0.2)×0.1311 = -0.0065
neuron 4: output 0.332, error = 0.332×(1 - 0.332)×(-0.3)×0.1311 = -0.0087

Updating weights

w_{ij} = w_{ij} + (r) Err_j O_i        to update the weights
\theta_j = \theta_j + (r) Err_j        to update the bias

weight: new value
w46 = -0.3 + 0.9×0.1311×0.332 = -0.261
w56 = -0.2 + 0.9×0.1311×0.525 = -0.138
w14 = 0.2 + 0.9×(-0.0087)×1 = 0.192
w15 = -0.3 + 0.9×(-0.0065)×1 = -0.306
w24 = 0.4 + 0.9×(-0.0087)×0 = 0.4
w25 = 0.1 + 0.9×(-0.0065)×0 = 0.1
w34 = -0.5 + 0.9×(-0.0087)×1 = -0.508
w35 = 0.2 + 0.9×(-0.0065)×1 = 0.194
w06 = 0.1 + 0.9×0.1311 = 0.218
w05 = 0.2 + 0.9×(-0.0065) = 0.194
w04 = -0.4 + 0.9×(-0.0087) = -0.408
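A compact NumPy sketch that reproduces the propagation, error, and weight-update computations of this example (neuron numbering and learning rate as above; the variable names are mine):

import numpy as np

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

x = np.array([1.0, 0.0, 1.0]); target = 1.0; r = 0.9
W_h = np.array([[0.2, 0.4, -0.5],      # weights into neuron 4: w14, w24, w34
                [-0.3, 0.1, 0.2]])     # weights into neuron 5: w15, w25, w35
b_h = np.array([-0.4, 0.2])            # biases of neurons 4 and 5: w04, w05
W_o = np.array([-0.3, -0.2])           # weights into neuron 6: w46, w56
b_o = 0.1                              # bias of neuron 6: w06

# propagation
O_h = sigmoid(W_h @ x + b_h)           # O4 = 0.332, O5 = 0.525
O_o = sigmoid(W_o @ O_h + b_o)         # O6 = 0.474

# neuron errors
err_o = O_o * (1 - O_o) * (target - O_o)   # 0.1311
err_h = O_h * (1 - O_h) * err_o * W_o      # -0.0087 (neuron 4), -0.0065 (neuron 5)

# weight and bias updates
W_o = W_o + r * err_o * O_h;  b_o = b_o + r * err_o          # w46, w56, w06
W_h = W_h + r * np.outer(err_h, x);  b_h = b_h + r * err_h   # w14..w35, w04, w05
print(O_h, O_o, err_h, err_o)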

Example

This is the resulting network after the first iteration. We now have to process another training example, until the overall error is low or we run out of examples.

[Figure: the same network with the updated weights
w14 = 0.192, w24 = 0.4, w34 = -0.508, w15 = -0.306, w25 = 0.1, w35 = 0.194, w46 = -0.261, w56 = -0.138
and biases w04 = -0.408, w05 = 0.194, w06 = 0.218.]

Neural Network as a Classifier

Weakness
Long training time
Requires a number of parameters typically best determined empirically, e.g., the network topology or "structure"
Poor interpretability: difficult to interpret the symbolic meaning behind the learned weights and of "hidden units" in the network

Strength
High tolerance to noisy data
Ability to classify untrained patterns
Well suited for continuous-valued inputs and outputs
Successful on a wide array of real-world data
Algorithms are inherently parallel

ENSEMBLE METHODS

Ensemble Method

Aggregation of multiple learned models with the goal of improving accuracy.

Intuition: simulate what we do when we combine an expert panel in a human decision-making process.

Some Comments

Combining models adds complexity
It is more difficult to characterize and explain predictions
The accuracy may increase
Violation of Ockham's Razor: simplicity leads to greater accuracy
Identifying the best model requires identifying the proper "model complexity"

Methods to Achieve Diversity

Diversity from differences in input variation

Different feature weightings
[Figure: different feature subsets of the training examples (Ratings, Actors, Genres) feed Classifier A, Classifier B, and Classifier C, whose predictions are combined.]

Divide up training data among models
[Figure: the training examples are split among Classifier A, Classifier B, and Classifier C, whose predictions are combined.]

Ensemble Methods: Increasing the Accuracy

Ensemble methods
Use a combination of models to increase accuracy
Combine a series of k learned models, M1, M2, ..., Mk, with the aim of creating an improved model M*

How to combine models

Algebraic methods
Average
Weighted average
Sum
Weighted sum
Product
Maximum
Minimum
Median

Voting methods
Majority voting
Weighted majority voting
Borda count (rank candidates in order of preference)
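A small sketch of some of these combiners, assuming each base model outputs class probabilities (for the algebraic methods) or a class label (for voting); the probabilities and weights below are purely illustrative:

import numpy as np

probs = np.array([[0.9, 0.1],     # classifier 1: P(class 0), P(class 1)
                  [0.6, 0.4],     # classifier 2
                  [0.2, 0.8]])    # classifier 3

print("average:         ", probs.mean(axis=0))
print("weighted average:", np.average(probs, axis=0, weights=[0.5, 0.3, 0.2]))
print("maximum:         ", probs.max(axis=0))
print("median:          ", np.median(probs, axis=0))

labels = probs.argmax(axis=1)                     # each classifier's predicted class
print("majority vote:   ", np.bincount(labels).argmax())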

Popular ensemble methods

Bagging: averaging the prediction over a collection of classifiers
Boosting: weighted vote with a collection of classifiers
Ensemble: combining a set of heterogeneous classifiers

Bagging: Bootstrap AGGregatING

Analogy: diagnosis based on multiple doctors' majority vote

Training
Given a set D of d tuples, at each iteration i, a training set Di of d tuples is sampled with replacement from D (i.e., bootstrap)
A classifier model Mi is learned for each training set Di

Classification: classify an unknown sample X
Each classifier Mi returns its class prediction
The bagged classifier M* counts the votes and assigns the class with the most votes to X

Prediction: bagging can be applied to the prediction of continuous values by taking the average value of each prediction for a given test tuple

Bagging

Accuracy
Often significantly better than a single classifier derived from D
For noisy data: not considerably worse, more robust
Proven improved accuracy in prediction

Requirement: need unstable classifier types
Unstable means a small change to the training data may lead to major decision changes.

Stability in Training
Training: construct classifier f from D
Stability: small changes on D result in small changes on f
Decision trees are a typical unstable classifier

http://en.wikibooks.org/wiki/File:DTE_Bagging.png
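A minimal sketch of the bagging procedure described above, using scikit-learn decision trees as the unstable base classifiers and assuming non-negative integer class labels; the function names are mine, not from the slides:

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagging_fit(X, y, k=10, seed=0):
    rng = np.random.default_rng(seed)
    d = len(X)
    models = []
    for _ in range(k):
        idx = rng.integers(0, d, size=d)                              # bootstrap: sample d tuples with replacement
        models.append(DecisionTreeClassifier().fit(X[idx], y[idx]))   # learn M_i on D_i
    return models

def bagging_predict(models, X):
    votes = np.array([m.predict(X) for m in models]).astype(int)      # each M_i votes
    # M* assigns the class with the most votes to each sample
    return np.array([np.bincount(col).argmax() for col in votes.T])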

Boosting

Analogy: consult several doctors, based on a combination of weighted diagnoses; the weight is assigned based on the previous diagnosis accuracy

How boosting works
Weights are assigned to each training example
A series of k classifiers is iteratively learned
After a classifier Mi is learned, the weights are updated to allow the subsequent classifier, Mi+1, to pay more attention to the training examples that were misclassified by Mi
The final M* combines the votes of each individual classifier, where the weight of each classifier's vote is a function of its accuracy

Boosting: Construct Weak Classifiers

Incrementally create models, selectively using training examples based on some distribution.

Using Different Data Distribution
Start with uniform weighting
During each step of learning
Increase weights of the examples which are not correctly learned by the weak learner
Decrease weights of the examples which are correctly learned by the weak learner

Idea
Focus on difficult examples which are not correctly classified in the previous steps

Boosting: Combine Weak Classifiers

Weighted Voting
Construct a strong classifier by weighted voting of the weak classifiers

Idea
A better weak classifier gets a larger weight
Iteratively add weak classifiers
Increase the accuracy of the combined classifier through minimization of a cost function

Boosting

Differences with Bagging:
Models are built sequentially on modified versions of the data
The predictions of the models are combined through a weighted sum/vote

The boosting algorithm can be extended for numeric prediction

Comparing with bagging: boosting tends to achieve greater accuracy, but it also risks overfitting the model to misclassified data

Adaboost: a popular boosting algorithm (Freund and Schapire, 1997)

Given a set of d class-labeled examples, (X1, y1), ..., (Xd, yd)
Initially, all the weights of the examples are set the same (1/d)
Generate k classifiers in k rounds. At round i,
Tuples from D are sampled (with replacement) to form a training set Di of the same size
Each example's chance of being selected is based on its weight
A classification model Mi is derived from Di and its error rate calculated using Di as a test set
If a tuple is misclassified, its weight is increased, otherwise it is decreased
Error rate: err(Xj) is the misclassification error of example Xj. Classifier Mi's error rate is the sum of the weights of the misclassified examples.

Adaboost comments

This distribution update ensures that instances misclassified by the previous classifier are more likely to be included in the training data of the next classifier.
Hence, consecutive classifiers' training data are geared towards increasingly hard-to-classify instances.
Unlike bagging, AdaBoost uses a rather undemocratic voting scheme, called weighted majority voting. The idea is an intuitive one: those classifiers that have shown good performance during training are rewarded with higher voting weights than the others.
The algorithm is sequential: classifier CK is created before classifier CK+1, which in turn requires that K and the current distribution DK be available.
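A rough sketch of the AdaBoost idea, using the common binary formulation with labels y in {-1, +1} and decision stumps as weak learners (this follows the standard textbook weight/vote formulas, not necessarily the exact variant on these slides):

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, k=10, seed=0):
    rng = np.random.default_rng(seed)
    d = len(X)
    w = np.full(d, 1.0 / d)                    # initially every example weight is 1/d
    models, alphas = [], []
    for _ in range(k):
        idx = rng.choice(d, size=d, replace=True, p=w)      # sample Di according to the weights
        m = DecisionTreeClassifier(max_depth=1).fit(X[idx], y[idx])
        miss = m.predict(X) != y
        err = w[miss].sum()                    # error rate = sum of weights of misclassified examples
        if err == 0 or err >= 0.5:             # skip degenerate rounds in this simple sketch
            continue
        alpha = 0.5 * np.log((1 - err) / err)  # accurate classifiers get larger voting weights
        w = w * np.exp(np.where(miss, alpha, -alpha))        # raise/lower example weights
        w = w / w.sum()
        models.append(m); alphas.append(alpha)
    return models, alphas

def adaboost_predict(models, alphas, X):
    # weighted majority vote of the weak classifiers
    return np.sign(sum(a * m.predict(X) for m, a in zip(models, alphas)))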

Random Forest (Breiman 2001)

Random Forest: a variation of the bagging algorithm

Created from individual decision trees whose parameters vary randomly. Such parameters can be
bootstrapped replicas of the training data, as in bagging, but they can also be
different feature subsets, as in random subspace methods.

During classification, each tree votes and the most popular class is returned.

Random Forest (Breiman 2001)

Two methods to construct a Random Forest:

Forest-RI (random input selection): randomly select, at each node, F attributes as candidates for the split at the node. The CART methodology is used to grow the trees to maximum size.
Forest-RC (random linear combinations): creates new attributes (or features) that are a linear combination of the existing attributes (this reduces the correlation between individual classifiers).

Comparable in accuracy to Adaboost, but more robust to errors and outliers
Insensitive to the number of attributes selected for consideration at each split, and faster than bagging or boosting
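A minimal usage sketch with scikit-learn's RandomForestClassifier, which follows the Forest-RI idea (bootstrap samples plus a random subset of features considered at each split); the dataset here is synthetic and purely illustrative:

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

forest = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=0)
forest.fit(X_tr, y_tr)
print("test accuracy:", forest.score(X_te, y_te))   # each tree votes; the most popular class wins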

References

Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations, Ian H. Witten and Eibe Frank, 1999
Data Mining: Practical Machine Learning Tools and Techniques, second edition, Ian H. Witten and Eibe Frank, 2005
Todd Holloway, 2008, Ensemble Learning: Better Predictions Through Diversity, presentation
Leandro M. Almeida, Sistemas Baseados em Comitês de Classificadores
Cong Li, 2009, Machine Learning Basics 3. Ensemble Learning
R. Polikar, Ensemble based systems in decision making, IEEE Circuits and Systems Magazine, vol. 6, no. 3, pp. 21-45, 2006.
