Chapter 1
Rosenblatt's Perceptron
Dr. Vincent A. Cassella
Catholic University of America
Material Acknowledgement
Neural Networks and Learning Machines, Third Edition
Simon Haykin
Perceptron
Figure 1.1 Signal-flow graph of the perceptron.
Hard limiter: φ(v) = +1 if v ≥ 0; φ(v) = −1 if v < 0.
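The signal-flow graph above can be sketched in code: the perceptron forms the induced local field v = wᵀx + b and passes it through the hard limiter. Function and variable names here are illustrative, not from the text.

```python
import numpy as np

def perceptron_output(w, b, x):
    """Induced local field v = w.x + b passed through the signum hard limiter."""
    v = np.dot(w, x) + b
    return 1 if v >= 0 else -1

# A 2-input perceptron with weights [1, -1] and zero bias:
print(perceptron_output(np.array([1.0, -1.0]), 0.0, np.array([2.0, 1.0])))  # v = 1 -> +1
print(perceptron_output(np.array([1.0, -1.0]), 0.0, np.array([0.0, 1.0])))  # v = -1 -> -1
```

The set of points with v = 0 is exactly the hyperplane decision boundary of Figure 1.2.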
Perceptron (Hyperplane)
Figure 1.2 Illustration of the hyperplane (in this example, a straight line) as decision boundary for a two-dimensional, two-class pattern-classification problem.
Perceptron: when it works and when it doesn't.
Correct classifications: c11 = cost of bagging a good apple; c22 = cost of trashing a rotten apple.
Wrong classifications: c21 = cost of trashing a good apple; c12 = cost of bagging a rotten apple.
Note: c21 > c11 and c12 > c22.
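Given these four costs and the posterior probabilities of the two classes, the Bayes decision picks whichever action has the lower conditional risk. A minimal sketch (the function name and example numbers are illustrative):

```python
def bayes_decision(c11, c12, c21, c22, p_good, p_rotten):
    """Choose the action (bag = 1, trash = 2) with the lower conditional risk.
    c11: bag a good apple, c22: trash a rotten apple (correct decisions);
    c21: trash a good apple, c12: bag a rotten apple (wrong decisions)."""
    risk_bag = c11 * p_good + c12 * p_rotten    # expected cost of bagging
    risk_trash = c21 * p_good + c22 * p_rotten  # expected cost of trashing
    return 1 if risk_bag <= risk_trash else 2

# With c21 > c11 and c12 > c22, an apple that is very likely good gets bagged:
print(bayes_decision(0, 10, 5, 0, p_good=0.9, p_rotten=0.1))  # risk 1.0 vs 4.5 -> 1 (bag)
```

Swapping the posteriors (p_good = 0.1, p_rotten = 0.9) flips the decision to trash.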
Both correct and wrong classifications contribute to the overall risk. The decision region for Class 1 is defined to minimize the overall risk (cost); the remaining term is a fixed risk that does not depend on the decision boundary.
End of Lecture 2
Figure 1.5 Two equivalent implementations of the Bayes classifier: (a) Likelihood ratio test, (b) Log-likelihood ratio test.
The perceptron provides perfect classification when the classes are linearly separable. However, Gaussian distributions overlap and are not linearly separable. Since classification errors must then occur, the goal is to minimize the risk. Rosenblatt's perceptron is capable of matching the Bayes classifier when the two class covariance matrices are the same.
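The equal-covariance claim can be made concrete: for two Gaussian classes with means m1, m2 and a shared covariance C, the log-likelihood ratio reduces to a linear function wᵀx + b, which is exactly the perceptron form. A minimal sketch assuming equal priors and threshold ξ = 1 (names are illustrative):

```python
import numpy as np

def bayes_linear_classifier(m1, m2, C):
    """For shared covariance C, log-likelihood ratio is linear:
    w = C^{-1}(m1 - m2), b = (m2' C^{-1} m2 - m1' C^{-1} m1) / 2."""
    Cinv = np.linalg.inv(C)
    w = Cinv @ (m1 - m2)
    b = 0.5 * (m2 @ Cinv @ m2 - m1 @ Cinv @ m1)
    return w, b

m1, m2 = np.array([2.0, 0.0]), np.array([-2.0, 0.0])
C = np.eye(2)                      # shared covariance
w, b = bayes_linear_classifier(m1, m2, C)
x = np.array([1.0, 3.0])           # a point nearer to m1
print(1 if w @ x + b >= 0 else 2)  # -> 1
```

With unequal covariances the quadratic terms no longer cancel, the Bayes boundary becomes a quadric, and a perceptron can no longer reproduce it.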
Study Material
What is the structure of a Rosenblatt perceptron, and what is its goal?
Is the decision threshold of a Rosenblatt perceptron always a hyperplane or line?
What type of data is the Rosenblatt perceptron capable of perfectly classifying?
How are perceptron weights adapted using the Perceptron Convergence Theorem (PCT)? Why does the PCT work? How are the weights initialized with the PCT?
What does the Bayes classifier minimize? What are the four costs considered when forming the Bayes classifier?
What is the ultimate test that is formed from the Bayes classifier? How do you construct the Likelihood Ratio Test (LRT), and what are all the variables involved?
Can log(LRT) give a different result than LRT?
What is the special class of Gaussian data that enables a Rosenblatt perceptron to duplicate Bayes classification?
Can a Rosenblatt perceptron perfectly classify Gaussian data? Can any classifier perfectly classify Gaussian data?
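The PCT questions above refer to the error-correction rule: weights change only on misclassified samples, and convergence in finitely many steps is guaranteed for linearly separable data. A minimal sketch, assuming zero initial weights, ±1 labels, and a learning rate η (all illustrative choices):

```python
import numpy as np

def train_perceptron(X, d, eta=1.0, epochs=100):
    """Perceptron convergence procedure: adjust w only on mistakes.
    Weights start at zero; for linearly separable data the loop
    terminates with every sample correctly classified."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])  # augment with 1 for the bias
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        errors = 0
        for x, target in zip(Xb, d):
            y = 1 if w @ x >= 0 else -1
            if y != target:          # error-correction update
                w += eta * target * x
                errors += 1
        if errors == 0:              # converged
            break
    return w

# Linearly separable toy data: class +1 to the right of x = 0, class -1 to the left
X = np.array([[2.0, 1.0], [1.5, -1.0], [-2.0, 0.5], [-1.0, -2.0]])
d = np.array([1, 1, -1, -1])
w = train_perceptron(X, d)
print(all((1 if w @ np.append(x, 1.0) >= 0 else -1) == t for x, t in zip(X, d)))  # True
```

On overlapping Gaussian data this loop never reaches zero errors, which is exactly why the lecture turns to the Bayes classifier for that case.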