
Classification

Neural Networks

Jeff Howbert, Introduction to Machine Learning, Winter 2012


Neural networks

Topics:
- Perceptrons
  - structure
  - training
  - expressiveness
- Multilayer networks
  - possible structures
  - activation functions
  - training with gradient descent and backpropagation
  - expressiveness



Connectionist models

Consider humans:
- Neuron switching time ~ 0.001 second
- Number of neurons ~ 10^10
- Connections per neuron ~ 10^4 to 10^5
- Scene recognition time ~ 0.1 second
- 100 inference steps doesn't seem like enough
=> Massively parallel computation



Neural networks

Properties:
- Many neuron-like threshold switching units
- Many weighted interconnections among units
- Highly parallel, distributed processing
- Emphasis on tuning weights automatically



Neural network application

ALVINN: An Autonomous Land Vehicle In a Neural Network
(Carnegie Mellon University Robotics Institute, 1989-1997)

ALVINN is a perception system which learns to control the NAVLAB vehicles by watching a person drive. ALVINN's architecture consists of a single hidden layer back-propagation network. The input layer of the network is a 30x32 unit two-dimensional "retina" which receives input from the vehicle's video camera. Each input unit is fully connected to a layer of five hidden units, which are in turn fully connected to a layer of 30 output units. The output layer is a linear representation of the direction the vehicle should travel in order to keep the vehicle on the road.
Neural network application

ALVINN drives at 70 mph on highways!



Perceptron structure

- Model is an assembly of nodes connected by weighted links.
- Output node sums up its input values according to the weights of their links.
- Output node sum is then compared against some threshold t:

      y = I( Σ_j w_j x_j − t )   or   y = sign( Σ_j w_j x_j − t )
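As a concrete illustration, here is a minimal Python/NumPy sketch of this thresholded weighted sum (the function name and example values are illustrative, not from the slides):

```python
import numpy as np

def perceptron_output(x, w, t):
    """y = I(sum_j w_j * x_j - t > 0): 1 if the weighted input sum exceeds threshold t, else 0."""
    return int(np.dot(w, x) - t > 0)

# Illustrative values (the same weights/threshold appear on a later slide)
w = np.array([0.3, 0.3, 0.3])
x = np.array([1, 0, 1])
print(perceptron_output(x, w, t=0.4))   # 1, since 0.3*1 + 0.3*0 + 0.3*1 = 0.6 > 0.4
```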



Example: modeling a Boolean function

X1 X2 X3 Y
1 0 0 0
1 0 1 1
1 1 0 1
1 1 1 1
0 0 1 0
0 1 0 0
0 1 1 1
0 0 0 0

Output Y is 1 if at least two of the three inputs are equal to 1.



Perceptron model

X1 X2 X3 Y
1 0 0 0
1 0 1 1
1 1 0 1
1 1 1 1
0 0 1 0
0 1 0 0
0 1 1 1
0 0 0 0

y = I( 0.3 x1 + 0.3 x2 + 0.3 x3 > 0.4 )

where I(z) = 1 if z is true, 0 otherwise
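As a sanity check, this model can be evaluated against the truth table above (a small sketch; the inline expression mirrors the formula on this slide):

```python
import numpy as np

w, t = np.array([0.3, 0.3, 0.3]), 0.4
table = [  # rows of (x1, x2, x3, Y) from the truth table
    (1, 0, 0, 0), (1, 0, 1, 1), (1, 1, 0, 1), (1, 1, 1, 1),
    (0, 0, 1, 0), (0, 1, 0, 0), (0, 1, 1, 1), (0, 0, 0, 0),
]
for *x, target in table:
    pred = int(np.dot(w, x) > t)      # y = I(0.3 x1 + 0.3 x2 + 0.3 x3 > 0.4)
    assert pred == target, (x, pred, target)
print("perceptron reproduces the truth table")
```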
Perceptron decision boundary

- Perceptron decision boundaries are linear (hyperplanes in higher dimensions).
- Example: decision surface for the Boolean function on the preceding slides.


Expressiveness of perceptrons

- Can model any function where positive and negative examples are linearly separable.
  - Examples: Boolean AND, OR, NAND, NOR
- Cannot (fully) model functions which are not linearly separable.
  - Example: Boolean XOR



Perceptron training process

1. Initialize weights with random values.
2. Do
   a. Apply perceptron to each training example.
   b. If example is misclassified, modify weights.
3. Until all examples are correctly classified, or process has converged.



Perceptron training process

Two rules for modifying weights during training:

- Perceptron training rule
  - train on thresholded outputs
  - driven by binary differences between correct and predicted outputs
  - modify weights with incremental updates

- Delta rule
  - train on unthresholded outputs
  - driven by continuous differences between correct and predicted outputs
  - modify weights via gradient descent



Perceptron training rule

1. Initialize weights with random values.
2. Do
   a. Apply perceptron to each training sample i.
   b. If sample i is misclassified, modify all weights j:

      w_j ← w_j + η (y_i − ŷ_i) x_ij

      where
        y_i is the target (correct) output for sample i (0 or 1)
        ŷ_i is the thresholded perceptron output (0 or 1)
        η is the learning rate (a small constant)
3. Until all samples are correctly classified.
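A minimal Python/NumPy sketch of this procedure, assuming the threshold is folded in as a bias weight on a constant input of 1 (a common convenience not spelled out on the slide); the learning rate and epoch cap are illustrative:

```python
import numpy as np

def train_perceptron(X, y, eta=0.1, max_epochs=100):
    """Perceptron training rule: w_j <- w_j + eta * (y_i - y_hat_i) * x_ij."""
    X = np.hstack([X, np.ones((len(X), 1))])        # constant input so the last weight plays the role of -t
    w = np.random.uniform(-0.5, 0.5, X.shape[1])    # 1. initialize weights with random values
    for _ in range(max_epochs):                     # 2. do ...
        errors = 0
        for x_i, y_i in zip(X, y):
            y_hat = int(np.dot(w, x_i) > 0)         #    a. thresholded perceptron output
            if y_hat != y_i:                        #    b. misclassified: modify all weights
                w += eta * (y_i - y_hat) * x_i
                errors += 1
        if errors == 0:                             # 3. until all samples are correctly classified
            return w
    return w

# Example: the two-of-three Boolean function from the earlier slides (linearly separable)
X = np.array([[1,0,0],[1,0,1],[1,1,0],[1,1,1],[0,0,1],[0,1,0],[0,1,1],[0,0,0]], dtype=float)
y = np.array([0, 1, 1, 1, 0, 0, 1, 0])
w = train_perceptron(X, y)
print(((np.hstack([X, np.ones((len(X), 1))]) @ w) > 0).astype(int))   # matches y once training converges
```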
Perceptron training rule

If sample i is misclassified, modify all weights j:

   w_j ← w_j + η (y_i − ŷ_i) x_ij

   where
     y_i is the target (correct) output for sample i (0 or 1)
     ŷ_i is the thresholded perceptron output (0 or 1)
     η is the learning rate (a small constant)

Examples:
  y_i = ŷ_i                             →  no update
  y_i − ŷ_i = 1;  x_ij small, positive  →  w_j increased by a small amount
  y_i − ŷ_i = 1;  x_ij large, negative  →  w_j decreased by a large amount
  y_i − ŷ_i = −1; x_ij large, negative  →  w_j increased by a large amount
Perceptron training rule

Example of processing one sample:

  η = 0.1
  y_i − ŷ_i = 1

  η (y_i − ŷ_i) x_i1 = 0.1
  η (y_i − ŷ_i) x_i2 = 0.0
  η (y_i − ŷ_i) x_i3 = 0.1
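Plugging these numbers into the update rule (a sketch; the input vector x_i = (1, 0, 1) is an assumption chosen to be consistent with the three products shown, since the slide's accompanying figure is not reproduced here):

```python
import numpy as np

eta = 0.1
y_i, y_hat_i = 1, 0                  # so y_i - y_hat_i = 1
x_i = np.array([1, 0, 1])            # assumed inputs, consistent with the values above
delta_w = eta * (y_i - y_hat_i) * x_i
print(delta_w)                       # [0.1 0.  0.1] -- each weight change is eta * (y_i - y_hat_i) * x_ij
```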



Delta training rule

- Based on squared error function for weight vector:

      E(w) = (1/2) Σ_i (y_i − ŷ_i)² = (1/2) Σ_i (y_i − w · x_i)²

  Note that the error is the difference between the correct output and the unthresholded sum of inputs, a continuous quantity (rather than the binary difference between the correct output and the thresholded output).

- Weights are modified by descending the gradient of the error function.
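A short sketch of this error function in NumPy (the data and weights below are illustrative; the last input column acts as a bias term):

```python
import numpy as np

def squared_error(w, X, y):
    """E(w) = 1/2 * sum_i (y_i - w . x_i)^2, using unthresholded outputs."""
    y_hat = X @ w                     # continuous (unthresholded) outputs
    return 0.5 * np.sum((y - y_hat) ** 2)

X = np.array([[1., 0., 1.], [1., 1., 1.], [0., 1., 1.]])   # illustrative inputs (last column = bias)
y = np.array([0., 1., 1.])
print(squared_error(np.array([0.2, 0.4, -0.1]), X, y))     # 0.375 for this particular w
```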
Squared error function for weight vector w (figure: error surface over weight space)



Gradient of error function

Gradient:

  ∇E(w) = [ ∂E/∂w_0, ∂E/∂w_1, ..., ∂E/∂w_d ]

Training rule for w:

  Δw = −η ∇E(w)

Training rule for individual weight w_j:

  Δw_j = −η ∂E/∂w_j



Gradient of squared error function

∂E/∂w_j = ∂/∂w_j [ (1/2) Σ_i (y_i − ŷ_i)² ]

        = (1/2) Σ_i ∂/∂w_j (y_i − ŷ_i)²

        = (1/2) Σ_i 2 (y_i − ŷ_i) ∂/∂w_j (y_i − ŷ_i)

        = Σ_i (y_i − ŷ_i) ∂/∂w_j (y_i − w · x_i)

∂E/∂w_j = Σ_i (y_i − ŷ_i)(−x_ij)
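The last line can be checked numerically: on illustrative data, the analytic gradient −Σ_i (y_i − ŷ_i) x_ij agrees with a finite-difference estimate of ∂E/∂w_j (a sketch, not part of the original slides):

```python
import numpy as np

def error(w, X, y):
    return 0.5 * np.sum((y - X @ w) ** 2)

def analytic_grad(w, X, y):
    return -(y - X @ w) @ X           # dE/dw_j = -sum_i (y_i - y_hat_i) * x_ij

X = np.array([[1., 0., 1.], [1., 1., 1.], [0., 1., 1.]])
y = np.array([0., 1., 1.])
w = np.array([0.2, 0.4, -0.1])

eps = 1e-6                            # central finite differences, one coordinate at a time
numeric = np.array([(error(w + eps * np.eye(3)[j], X, y) -
                     error(w - eps * np.eye(3)[j], X, y)) / (2 * eps) for j in range(3)])
print(np.allclose(analytic_grad(w, X, y), numeric))   # True
```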



Delta training rule
1. Initialize weights with random values.
2. Do
   a. Apply perceptron to each training sample i.
   b. If sample i is misclassified, modify all weights j:

      w_j ← w_j + η (y_i − ŷ_i) x_ij

      where
        y_i is the target (correct) output for sample i (0 or 1)
        ŷ_i is the unthresholded perceptron output (continuous value)
        η is the learning rate (a small constant)
3. Until all samples are correctly classified, or process converges.
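A minimal Python/NumPy sketch of the delta rule as incremental gradient descent (the bias handling, learning rate, and fixed epoch count are illustrative choices, not fixed by the slide):

```python
import numpy as np

def train_delta_rule(X, y, eta=0.05, epochs=500):
    """Delta rule: w_j <- w_j + eta * (y_i - y_hat_i) * x_ij, with unthresholded y_hat_i = w . x_i."""
    X = np.hstack([X, np.ones((len(X), 1))])        # bias column in place of an explicit threshold
    w = np.random.uniform(-0.5, 0.5, X.shape[1])    # 1. initialize weights with random values
    for _ in range(epochs):                         # 2./3. iterate until the process (approximately) converges
        for x_i, y_i in zip(X, y):
            y_hat = np.dot(w, x_i)                  # unthresholded (continuous) output
            w += eta * (y_i - y_hat) * x_i          # incremental gradient descent step
    return w

# Example: same two-of-three Boolean data as before; threshold the continuous output at 0.5 to read off labels
X = np.array([[1,0,0],[1,0,1],[1,1,0],[1,1,1],[0,0,1],[0,1,0],[0,1,1],[0,0,0]], dtype=float)
y = np.array([0., 1., 1., 1., 0., 0., 1., 0.])
w = train_delta_rule(X, y)
print(((np.hstack([X, np.ones((len(X), 1))]) @ w) > 0.5).astype(int))   # should match y
```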
Gradient descent: batch vs. incremental

- Incremental mode (illustrated on preceding slides)
  - Compute error and weight updates for a single sample.
  - Apply updates to weights before processing the next sample.

- Batch mode
  - Compute errors and weight updates for a block of samples (maybe all samples).
  - Apply all updates simultaneously to weights.
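A short sketch contrasting the two modes for the squared-error objective, one pass over the same illustrative data (the learning rate and data are arbitrary):

```python
import numpy as np

X = np.array([[1., 0., 1.], [1., 1., 1.], [0., 1., 1.]])   # illustrative inputs (last column = bias)
y = np.array([0., 1., 1.])
eta = 0.1

# Incremental mode: update the weights after every single sample
w_inc = np.zeros(3)
for x_i, y_i in zip(X, y):
    w_inc += eta * (y_i - np.dot(w_inc, x_i)) * x_i        # each step sees the already-updated weights

# Batch mode: accumulate the gradient over the whole block, then apply one update
w_batch = np.zeros(3)
grad = -(y - X @ w_batch) @ X                              # full-batch gradient of E(w)
w_batch -= eta * grad

print(w_inc, w_batch)   # similar direction, but not identical after one pass
```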



Perceptron training rule vs. delta rule

- Perceptron training rule is guaranteed to correctly classify all training samples if:
  - Samples are linearly separable.
  - Learning rate is sufficiently small.

- Delta rule uses gradient descent. Guaranteed to converge to the hypothesis with minimum squared error if:
  - Learning rate is sufficiently small.
  even when:
  - Training data contains noise.
  - Training data is not linearly separable.



Equivalence of perceptron and linear models


y = I( Σ_j w_j x_j − t )

A perceptron is thus a linear model: a weighted sum of the inputs plus a bias (−t), passed through a threshold (step) function.

