1/10/2017

Neural Networks for Pattern Classification


General discussion

Linear separability

Hebb nets

Perceptron


General discussion

Pattern recognition

Patterns: images, personal records, driving habits, etc. Represented as a vector of features (encoded as integers or real numbers in NN)

Pattern classification:

Classify a pattern to one of the given classes

Form pattern classes

Pattern associative recall

Using a pattern to recall a related pattern

Pattern completion: using a partial pattern to recall the whole pattern

Pattern recovery: deals with noise, distortion, missing information


General architecture: single layer

Net input to Y:

    net = b + Σ_{i=1}^{n} x_i w_i

(inputs x_1 … x_n with weights w_1 … w_n feed the single output unit Y)

The bias b is treated as the weight from a special unit with constant output 1.

Threshold θ related to the Y output:

    y = f(net) = 1 if net >= θ, -1 if net < θ

classify (x_1, …, x_n) into one of the two classes


Decision region/boundary: n = 2, b != 0, θ = 0

    w_1 x_1 + w_2 x_2 + b = 0,  or  x_2 = -(w_1 / w_2) x_1 - b / w_2

is a line, called the decision boundary, which partitions the plane into two decision regions.

If a point/pattern (x_1, x_2) is in the positive region, then w_1 x_1 + w_2 x_2 + b >= 0, and the output is one (belongs to class one).

Otherwise, w_1 x_1 + w_2 x_2 + b < 0, and the output is -1 (belongs to class two).

n = 2, b = 0, θ != 0 would result in a similar partition.


If n = 3 (three input units), then the decision boundary is a two dimensional plane in a three dimensional space

In general, a decision boundary b + Σ_{i=1}^{n} x_i w_i = 0 in an n-dimensional space is an (n-1)-dimensional hyperplane, which partitions the space into two decision regions.
This simple network thus can classify a given pattern into one of the two classes, provided one of these two classes is entirely in one decision region (one side of the decision boundary) and the other class is in another region.

The decision boundary is determined completely by the weights W and the bias b (or threshold θ).
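As a minimal sketch (the weights and test points below are illustrative, not taken from the slides), classification is just the sign of the net input relative to the threshold:

```python
def classify(x, w, b, theta=0.0):
    """Output 1 if net = b + sum(x_i * w_i) reaches theta, else -1."""
    net = b + sum(xi * wi for xi, wi in zip(x, w))
    return 1 if net >= theta else -1

# Illustrative weights giving the decision boundary -1 + x1 + x2 = 0
w, b = [1.0, 1.0], -1.0
print(classify([1, 1], w, b))    # point on the positive side -> 1
print(classify([-1, -1], w, b))  # point on the negative side -> -1
```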


Linear Separability Problem

If two classes of patterns can be separated by a decision boundary, represented by the linear equation

    b + Σ_{i=1}^{n} x_i w_i = 0

then they are said to be linearly separable, and the simple network can correctly classify any patterns.

The decision boundary (i.e., W, b or θ) of linearly separable classes can be determined either by some learning procedure or by solving linear equation systems based on representative patterns of each class.

If such a decision boundary does not exist, then the two classes are said to be linearly inseparable.

Linearly inseparable problems cannot be solved by the simple network; a more sophisticated architecture is needed.


Examples of linearly separable classes

- Logical AND function

  patterns (bipolar)       decision boundary
  x1   x2    y             w1 = 1
  -1   -1   -1             w2 = 1
  -1    1   -1             b  = -1
   1   -1   -1             θ  = 0
   1    1    1             -1 + x1 + x2 = 0

- Logical OR function

  patterns (bipolar)       decision boundary
  x1   x2    y             w1 = 1
  -1   -1   -1             w2 = 1
  -1    1    1             b  = 1
   1   -1    1             θ  = 0
   1    1    1             1 + x1 + x2 = 0

x: class I (y = 1)   o: class II (y = -1)
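The AND and OR weight settings in the tables above can be checked directly; a small sketch of the single-layer unit with those weights:

```python
def simple_net(x1, x2, w1, w2, b, theta=0.0):
    """Single-layer unit: output 1 if net >= theta, else -1."""
    net = b + w1 * x1 + w2 * x2
    return 1 if net >= theta else -1

bipolar = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
# AND: w1 = w2 = 1, b = -1 -> fires only for (1, 1)
print([simple_net(x1, x2, 1, 1, -1) for x1, x2 in bipolar])  # [-1, -1, -1, 1]
# OR: w1 = w2 = 1, b = 1  -> fires unless both inputs are -1
print([simple_net(x1, x2, 1, 1, 1) for x1, x2 in bipolar])   # [-1, 1, 1, 1]
```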


Examples of linearly inseparable classes

- Logical XOR (exclusive OR) function

  patterns (bipolar)
  x1   x2    y
  -1   -1   -1
  -1    1    1
   1   -1    1
   1    1   -1

x: class I (y = 1)   o: class II (y = -1)

No line can separate these two classes.


XOR can be solved by a more complex network with hidden units

(network diagram: inputs x1, x2 feed hidden units z1, z2 with weights ±2 and thresholds θ = 1; z1 and z2 feed the output Y with weights -2 and threshold θ = 0)

  (x1, x2)      y
  (-1, -1)     -1
  (-1,  1)      1
  ( 1, -1)      1
  ( 1,  1)     -1


Different non-linearly separable problems

  Structure      Types of Decision Regions
  Single-Layer   Half plane bounded by hyperplane
  Two-Layer      Convex open or closed regions
  Three-Layer    Arbitrary (complexity limited by number of nodes)

(the original table also showed, for each structure, illustrative A/B region diagrams for the Exclusive-OR problem, classes with meshed regions, and the most general region shapes)


Can a single neuron learn a task?


Hebb Nets

Hebb, in his influential book The Organization of Behavior (1949), claimed that behavior changes are primarily due to changes of the synaptic strengths (w_ij) between neurons i and j: w_ij increases only when both i and j are "on" — the Hebbian learning law.

In ANN, the Hebbian law can be stated: w_ij increases only if the outputs of both units x_i and y_j have the same sign.

In our simple network (one output unit and n input units):

    w_ij(new) = w_ij(old) + Δw_ij,  where Δw_ij = α · x_i · y

or, with α = 1:

    w_ij(new) = w_ij(old) + x_i · y


Hebb net (supervised) learning algorithm (p.49)

Step 0. Initialization: b = 0, wi = 0, i = 1 to n
Step 1. For each training sample s:t, do Steps 2-4
        /* s is the input pattern, t the target output of the sample */
Step 2.   xi := si, i = 1 to n           /* set s to input units */
Step 3.   y := t                         /* set y to the target */
Step 4.   wi := wi + xi * y, i = 1 to n  /* update weights */
          b := b + y                     /* update bias (the bias unit's input is constant 1) */

Notes: 1) α = 1, 2) each training sample is used only once.
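The algorithm above can be sketched in a few lines; run on the bipolar AND samples it reproduces the weights derived in the bipolar example that follows:

```python
def hebb_train(samples):
    """Hebb rule (alpha = 1, single pass): w_i += x_i * y, b += y."""
    n = len(samples[0][0])
    w, b = [0.0] * n, 0.0
    for s, t in samples:           # s = input pattern, t = target output
        for i, si in enumerate(s):
            w[i] += si * t         # Step 4: weight update
        b += t                     # bias update (bias unit input is 1)
    return w, b

# Bipolar AND samples
and_samples = [((1, 1), 1), ((1, -1), -1), ((-1, 1), -1), ((-1, -1), -1)]
print(hebb_train(and_samples))  # ([2.0, 2.0], -2.0)
```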

Examples: AND function

Binary units (1, 0), with a bias unit whose input is always 1:

  (x1, x2, 1)   y = t   w1  w2  b
  (1, 1, 1)       1      1   1  1
  (1, 0, 1)       0      1   1  1
  (0, 1, 1)       0      1   1  1
  (0, 0, 1)       0      1   1  1

An incorrect boundary, 1 + x1 + x2 = 0, is learned after using each sample once.


Bipolar units (1, -1):

  (x1, x2, 1)    y = t   w1  w2   b
  (1, 1, 1)        1      1   1   1
  (1, -1, 1)      -1      0   2   0
  (-1, 1, 1)      -1      1   1  -1
  (-1, -1, 1)     -1      2   2  -2

A correct boundary, -1 + x1 + x2 = 0, is successfully learned.

It will fail to learn x1 ^ x2 ^ x3, even though the function is linearly separable. Stronger learning methods are needed:

- Error driven: for each sample s:t, compute y from s based on the current W and b, then compare y and t.
- Use training samples repeatedly, and each time change the weights only slightly (α << 1).
- The learning methods of the Perceptron and Adaline are good examples.


The Perceptron

In 1958, Frank Rosenblatt introduced a training algorithm that

provided the first procedure for training a simple ANN: a perceptron.

The operation of Rosenblatt’s perceptron is based on the

McCulloch and Pitts neuron model. The model consists of a

linear combiner followed by a hard limiter.

The weighted sum of the inputs is applied to the hard limiter, which produces an output equal to +1 if its input is positive and -1 if it is negative.


Single-layer two-input perceptron

(diagram: inputs x1 and x2, with weights w1 and w2, feed a linear combiner; the combiner output, offset by threshold θ, passes through a hard limiter to produce the output Y)


The aim of the perceptron is to classify inputs, x 1 , x 2 , …, x n , into one of two classes, say A 1 and A 2 .

In the case of an elementary perceptron, the n-dimensional space is divided by a hyperplane into two decision regions. The hyperplane is defined by the linearly separable function:

    Σ_{i=1}^{n} x_i w_i - θ = 0


Linear separability in the perceptron

(figure: (a) two-input perceptron, decision boundary x1 w1 + x2 w2 - θ = 0 separating Class A1 from Class A2 in the (x1, x2) plane; (b) three-input perceptron, decision boundary x1 w1 + x2 w2 + x3 w3 - θ = 0)


How does the perceptron learn its classification tasks?

This is done by making small adjustments in the weights to reduce

the difference between the actual and desired outputs of the

perceptron.

The initial weights are randomly assigned, usually in the range [-0.5, 0.5], and then updated to obtain the output consistent with the

training examples.


If at iteration p, the actual output is Y(p) and the desired output is Y_d(p), then the error is given by:

    e(p) = Y_d(p) - Y(p)

where p = 1, 2, 3, …

Iteration p here refers to the p-th training example presented to the perceptron.

If the error, e(p), is positive, we need to increase perceptron

output Y(p), but if it is negative, we need to decrease Y(p).


The perceptron learning rule

    w_i(p + 1) = w_i(p) + α · x_i(p) · e(p)

where p = 1, 2, 3, … and α is the learning rate, a positive constant less than unity.

The perceptron learning rule was first proposed by Rosenblatt in 1960. Using this rule we can derive the perceptron training algorithm for classification tasks.


Perceptron’s training algorithm

Step 1: Initialization
Set initial weights w 1 , w 2 , …, w n and threshold θ to random numbers in the range [-0.5, 0.5].

Step 2: Activation
Activate the perceptron by applying inputs x 1 (p), x 2 (p), …, x n (p) and desired output Y d (p). Calculate the actual output at iteration p = 1:

    Y(p) = step[ Σ_{i=1}^{n} x_i(p) w_i(p) - θ ]


Perceptron’s training algorithm (continued)

where n is the number of the perceptron inputs, and step is a step activation function.

Step 3: Weight training
Update the weights of the perceptron:

    w_i(p + 1) = w_i(p) + Δw_i(p)

where Δw_i(p) is the weight correction at iteration p.

The weight correction is computed by the delta rule:

    Δw_i(p) = α · x_i(p) · e(p)


Perceptron’s training algorithm (continued)

Step 4: Iteration

Increase iteration p by one, go back to Step 2 and repeat the process until convergence.


Example of perceptron learning: the logical operation AND

Threshold: θ = 0.2; learning rate: α = 0.1

  Epoch | Inputs x1 x2 | Desired Yd | Initial w1, w2 | Actual Y | Error e | Final w1, w2
    1   |     0  0     |     0      |   0.3, -0.1    |    0     |    0    |  0.3, -0.1
        |     0  1     |     0      |   0.3, -0.1    |    0     |    0    |  0.3, -0.1
        |     1  0     |     0      |   0.3, -0.1    |    1     |   -1    |  0.2, -0.1
        |     1  1     |     1      |   0.2, -0.1    |    0     |    1    |  0.3,  0.0
    2   |     0  0     |     0      |   0.3,  0.0    |    0     |    0    |  0.3,  0.0
        |     0  1     |     0      |   0.3,  0.0    |    0     |    0    |  0.3,  0.0
        |     1  0     |     0      |   0.3,  0.0    |    1     |   -1    |  0.2,  0.0
        |     1  1     |     1      |   0.2,  0.0    |    1     |    0    |  0.2,  0.0
    3   |     0  0     |     0      |   0.2,  0.0    |    0     |    0    |  0.2,  0.0
        |     0  1     |     0      |   0.2,  0.0    |    0     |    0    |  0.2,  0.0
        |     1  0     |     0      |   0.2,  0.0    |    1     |   -1    |  0.1,  0.0
        |     1  1     |     1      |   0.1,  0.0    |    0     |    1    |  0.2,  0.1
    4   |     0  0     |     0      |   0.2,  0.1    |    0     |    0    |  0.2,  0.1
        |     0  1     |     0      |   0.2,  0.1    |    0     |    0    |  0.2,  0.1
        |     1  0     |     0      |   0.2,  0.1    |    1     |   -1    |  0.1,  0.1
        |     1  1     |     1      |   0.1,  0.1    |    1     |    0    |  0.1,  0.1
    5   |     0  0     |     0      |   0.1,  0.1    |    0     |    0    |  0.1,  0.1
        |     0  1     |     0      |   0.1,  0.1    |    0     |    0    |  0.1,  0.1
        |     1  0     |     0      |   0.1,  0.1    |    0     |    0    |  0.1,  0.1
        |     1  1     |     1      |   0.1,  0.1    |    1     |    0    |  0.1,  0.1
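The trajectory in this example can be reproduced with a short script (a sketch using the same initial weights, threshold and learning rate; net inputs are rounded before thresholding only to dodge floating-point ties at exactly 0):

```python
def step(net):
    return 1 if net >= 0 else 0

def train_perceptron(data, w, theta, alpha, epochs=10):
    """Rosenblatt rule: w_i(p+1) = w_i(p) + alpha * x_i(p) * e(p)."""
    for _ in range(epochs):
        converged = True
        for x, yd in data:
            net = sum(xi * wi for xi, wi in zip(x, w)) - theta
            e = yd - step(round(net, 6))  # round to avoid float ties at 0
            if e != 0:
                converged = False
                w = [wi + alpha * xi * e for xi, wi in zip(x, w)]
        if converged:
            break
    return w

AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w = train_perceptron(AND, w=[0.3, -0.1], theta=0.2, alpha=0.1)
print([step(round(sum(xi * wi for xi, wi in zip(x, w)) - 0.2, 6)) for x, _ in AND])
# learned weights ~ (0.1, 0.1); outputs [0, 0, 0, 1]
```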


Two-dimensional plots of basic logical operations

(figure: decision boundaries in the (x1, x2) plane for (a) AND (x1 ∧ x2), (b) OR (x1 ∨ x2), and (c) Exclusive-OR (x1 ⊕ x2))

A perceptron can learn the operations AND and OR, but not Exclusive-OR.


Multilayer neural networks

A multilayer perceptron is a feedforward neural network with

one or more hidden layers.

The network consists of an input layer of source neurons, at least one middle or hidden layer of computational neurons, and

an output layer of computational neurons.

The input signals are propagated in a forward direction on a layer-by-layer basis.


Multilayer perceptron with two hidden layers

(figure: input signals enter the input layer, pass through the first and second hidden layers, and leave as output signals from the output layer)


What does the middle layer hide?

A hidden layer “hides” its desired output. Neurons in the hidden

layer cannot be observed through the input/output behavior of the

network. There is no obvious way to know what the desired output of the hidden layer should be.

Commercial ANNs incorporate three and sometimes four layers,

including one or two hidden layers. Each layer can contain from

10 to 1000 neurons. Experimental neural networks may have five or even six layers, including three or four hidden layers, and utilize millions of neurons.


Back-propagation neural network

Learning in a multilayer network proceeds the same way as for a perceptron.

A training set of input patterns is presented to the network.

The network computes its output pattern, and if there is an error - or in other words a difference between actual and desired output patterns - the weights are adjusted to reduce this error.

In a back-propagation neural network, the learning algorithm has two phases.

First, a training input pattern is presented to the network input layer.

The network propagates the input pattern from layer to layer until

the output pattern is generated by the output layer.


If this pattern is different from the desired output, an error is calculated and then propagated backwards through the network

from the output layer to the input layer. The weights are

modified as the error is propagated.


Three-layer back-propagation neural network

(figure: input signals x1 … xn enter the input layer; weights w_ij connect input neuron i to hidden neuron j, and weights w_jk connect hidden neuron j to output neuron k; outputs y1 … yl leave the output layer, while error signals propagate backwards from the output layer)


The back-propagation training algorithm

Step 1: Initialization
Set all the weights and threshold levels of the network to random numbers uniformly distributed inside a small range:

    ( -2.4 / F_i , +2.4 / F_i )

where F_i is the total number of inputs of neuron i in the network. The weight initialization is done on a neuron-by-neuron basis.


Step 2: Activation
Activate the back-propagation neural network by applying inputs x 1 (p), x 2 (p), …, x n (p) and desired outputs y d,1 (p), y d,2 (p), …, y d,n (p).

(a) Calculate the actual outputs of the neurons in the hidden layer:

    y_j(p) = sigmoid[ Σ_{i=1}^{n} x_i(p) w_ij(p) - θ_j ]

where n is the number of inputs of neuron j in the hidden layer, and sigmoid is the sigmoid activation function.


Step 2: Activation (continued)

(b) Calculate the actual outputs of the neurons in the output layer:

    y_k(p) = sigmoid[ Σ_{j=1}^{m} x_j(p) w_jk(p) - θ_k ]

where m is the number of inputs of neuron k in the output layer.


Step 3: Weight training
Update the weights in the back-propagation network, propagating backward the errors associated with the output neurons.

(a) Calculate the error gradient for the neurons in the output layer:

    δ_k(p) = y_k(p) · [1 - y_k(p)] · e_k(p)

where

    e_k(p) = y_d,k(p) - y_k(p)

Calculate the weight corrections:

    Δw_jk(p) = α · y_j(p) · δ_k(p)

Update the weights at the output neurons:

    w_jk(p + 1) = w_jk(p) + Δw_jk(p)


Step 3: Weight training (continued)

(b) Calculate the error gradient for the neurons in the hidden layer:

    δ_j(p) = y_j(p) · [1 - y_j(p)] · Σ_{k=1}^{l} δ_k(p) w_jk(p)

Calculate the weight corrections:

    Δw_ij(p) = α · x_i(p) · δ_j(p)

Update the weights at the hidden neurons:

    w_ij(p + 1) = w_ij(p) + Δw_ij(p)


Step 4: Iteration

Increase iteration p by one, go back to Step 2 and repeat the process until the selected error criterion is satisfied.

As an example, we may consider the three-layer back-propagation

network. Suppose that the network is required to perform logical operation Exclusive-OR. Recall that a single-layer perceptron could not do this operation. Now we will apply the three-layer net.


Three-layer network for solving the Exclusive-OR operation

(figure: inputs x1 and x2 in the input layer feed hidden neurons 3 and 4 through weights w13, w23, w14, w24; neurons 3 and 4 feed output neuron 5 through weights w35 and w45; each of neurons 3, 4 and 5 also has a threshold input fixed at -1; the network output is y5)

The effect of the threshold applied to a neuron in the hidden or output layer is represented by its weight, θ, connected to a fixed input equal to -1.

The initial weights and threshold levels are set randomly as follows:
w 13 = 0.5, w 14 = 0.9, w 23 = 0.4, w 24 = 1.0, w 35 = -1.2, w 45 = 1.1, θ 3 = 0.8, θ 4 = -0.1 and θ 5 = 0.3.


We consider a training set where inputs x 1 and x 2 are equal to 1 and desired output y d,5 is 0. The actual outputs of neurons 3 and 4 in the hidden layer are calculated as

    y3 = sigmoid(x1 w13 + x2 w23 - θ3) = 1 / [1 + e^-(1·0.5 + 1·0.4 - 1·0.8)] = 0.5250
    y4 = sigmoid(x1 w14 + x2 w24 - θ4) = 1 / [1 + e^-(1·0.9 + 1·1.0 + 1·0.1)] = 0.8808

Now the actual output of neuron 5 in the output layer is determined as:

    y5 = sigmoid(y3 w35 + y4 w45 - θ5) = 1 / [1 + e^-(-0.5250·1.2 + 0.8808·1.1 - 1·0.3)] = 0.5097

Thus, the following error is obtained:

    e = y_d,5 - y5 = 0 - 0.5097 = -0.5097


The next step is weight training. To update the weights and threshold levels in our network, we propagate the error, e, from the output layer backward to the input layer.

First, we calculate the error gradient for neuron 5 in the output layer:

    δ5 = y5 (1 - y5) e = 0.5097 · (1 - 0.5097) · (-0.5097) = -0.1274

Then we determine the weight corrections assuming that the learning rate parameter, α, is equal to 0.1:

    Δw35 = α · y3 · δ5 = 0.1 · 0.5250 · (-0.1274) = -0.0067
    Δw45 = α · y4 · δ5 = 0.1 · 0.8808 · (-0.1274) = -0.0112
    Δθ5  = α · (-1) · δ5 = 0.1 · (-1) · (-0.1274) = 0.0127


Next we calculate the error gradients for neurons 3 and 4 in the hidden layer:

    δ3 = y3 (1 - y3) · δ5 w35 = 0.5250 · (1 - 0.5250) · (-0.1274) · (-1.2) = 0.0381
    δ4 = y4 (1 - y4) · δ5 w45 = 0.8808 · (1 - 0.8808) · (-0.1274) · 1.1 = -0.0147

We then determine the weight corrections:

    Δw13 = α · x1 · δ3 = 0.1 · 1 · 0.0381 = 0.0038
    Δw23 = α · x2 · δ3 = 0.1 · 1 · 0.0381 = 0.0038
    Δθ3  = α · (-1) · δ3 = 0.1 · (-1) · 0.0381 = -0.0038
    Δw14 = α · x1 · δ4 = 0.1 · 1 · (-0.0147) = -0.0015
    Δw24 = α · x2 · δ4 = 0.1 · 1 · (-0.0147) = -0.0015
    Δθ4  = α · (-1) · δ4 = 0.1 · (-1) · (-0.0147) = 0.0015


At last, we update all weights and thresholds:

    w13 = w13 + Δw13 = 0.5 + 0.0038 = 0.5038
    w14 = w14 + Δw14 = 0.9 - 0.0015 = 0.8985
    w23 = w23 + Δw23 = 0.4 + 0.0038 = 0.4038
    w24 = w24 + Δw24 = 1.0 - 0.0015 = 0.9985
    w35 = w35 + Δw35 = -1.2 - 0.0067 = -1.2067
    w45 = w45 + Δw45 = 1.1 - 0.0112 = 1.0888
    θ3  = θ3 + Δθ3  = 0.8 - 0.0038 = 0.7962
    θ4  = θ4 + Δθ4  = -0.1 + 0.0015 = -0.0985
    θ5  = θ5 + Δθ5  = 0.3 + 0.0127 = 0.3127

The training process is repeated until the sum of squared errors is less than 0.001.
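The single training step worked through above can be checked numerically (a sketch of Steps 2 and 3 for this one sample only, not the full training loop):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Initial weights and thresholds from the example
w13, w14, w23, w24, w35, w45 = 0.5, 0.9, 0.4, 1.0, -1.2, 1.1
t3, t4, t5 = 0.8, -0.1, 0.3
x1, x2, yd5 = 1, 1, 0
alpha = 0.1

# Step 2: forward pass
y3 = sigmoid(x1 * w13 + x2 * w23 - t3)   # ~ 0.5250
y4 = sigmoid(x1 * w14 + x2 * w24 - t4)   # ~ 0.8808
y5 = sigmoid(y3 * w35 + y4 * w45 - t5)   # ~ 0.5097
e = yd5 - y5                             # ~ -0.5097

# Step 3: backward pass (output gradient, then hidden gradients)
d5 = y5 * (1 - y5) * e                   # ~ -0.1274
d3 = y3 * (1 - y3) * d5 * w35            # ~  0.0381
d4 = y4 * (1 - y4) * d5 * w45            # ~ -0.0147

w13 += alpha * x1 * d3                   # ~  0.5038
w35 += alpha * y3 * d5                   # ~ -1.2067
t5 += alpha * (-1) * d5                  # ~  0.3127
print(round(y5, 4), round(d5, 4), round(w13, 4))
```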


Learning curve for operation Exclusive-OR

(figure: sum-squared network error versus training epoch on a log scale, falling from about 1 to below 10^-3 over 224 epochs)


Final results of three-layer network learning

  Inputs x1 x2 | Desired output yd | Actual output y5
       1  1    |        0          |     0.0155
       0  1    |        1          |     0.9849
       1  0    |        1          |     0.9849
       0  0    |        0          |     0.0175

  Sum of squared errors: 0.0010


Network represented by McCulloch-Pitts model for solving the Exclusive-OR operation

(figure: inputs x1 and x2 feed hidden neurons 3 and 4 with weights +1.0; neuron 3 has threshold +1.5 and neuron 4 has threshold +0.5; neuron 3 feeds output neuron 5 with weight -2.0 and neuron 4 with weight +1.0; neuron 5 has threshold +0.5; all threshold inputs are fixed at -1, and the output is y5)


Decision boundaries

(figure: (a) decision boundary constructed by hidden neuron 3: x1 + x2 - 1.5 = 0; (b) decision boundary constructed by hidden neuron 4: x1 + x2 - 0.5 = 0; (c) decision boundaries constructed by the complete three-layer network)


Pattern Association and Associative Memory


Neural networks were designed on analogy with the brain. The brain’s memory, however, works by association.

For example, we can recognize a familiar face even in an

unfamiliar environment within 100-200ms. We can also recall a complete sensory experience, including sounds and scenes, when we hear only a few bars of music. The brain routinely associates one thing with another.

Multilayer neural networks trained with the back-propagation

algorithm are used for pattern recognition problems. However, to emulate the human memory’s associative characteristics we need

a different type of network: a recurrent neural network. A recurrent neural network has feedback loops from its

outputs to its inputs. The presence of such loops has a profound impact on the learning capability of the network.


Associative-Memory Networks

Input: Pattern (often noisy/corrupted)

Output: Corresponding pattern (complete / relatively noise-free)

Process

1. Load input pattern onto core group of highly-interconnected neurons.

2. Run core neurons until they reach a steady state.

3. Read output off of the states of the core neurons.

(example: the noisy input pattern (1 0 1 -1 -1) settles to the stored output pattern (1 -1 1 -1 -1))


Associative Network Types

1. Auto-associative: X = Y
   * Recognize noisy versions of a pattern

2. Hetero-associative Bidirectional: X <> Y
   BAM = Bidirectional Associative Memory
   * Iterative correction of input and output


Associative Network Types (2)

3. Hetero-associative Input Correcting: X <> Y
   * Input clique is auto-associative => repairs input patterns

4. Hetero-associative Output Correcting: X <> Y
   * Output clique is auto-associative => repairs output patterns


Hebb’s Rule

Connection Weights ~ Correlations

``When one cell repeatedly assists in firing another, the axon of the first cell develops synaptic knobs (or enlarges them if they already exist) in contact

with the soma of the second cell.” (Hebb, 1949)

In an associative neural net, if we compare two pattern components (e.g. pixels) within many patterns and find that they are frequently in:

a) the same state, then the arc weight between their NN nodes should be positive
b) different states, then the arc weight between their NN nodes should be negative

Matrix Memory:

The weights must store the average correlations between all pattern components across all patterns. A net presented with a partial pattern can then use the correlations to recreate the entire pattern.


Correlated Field Components

Each component is a small portion of the pattern field (e.g. a pixel).

In the associative neural network, each node represents one field component.

For every pair of components, their values are compared in each of several patterns.

Set weight on arc between the NN nodes for the 2 components ~ avg correlation.

(diagram: nodes a and b connected by an arc whose weight w_ab is set to the average correlation between components a and b)


Quantifying Hebb’s Rule

Compare two nodes to calculate a weight change that reflects the state correlation:

    Auto-association:    Δw_jk = i_pk · i_pj
    Hetero-association:  Δw_jk = i_pk · o_pj

where i = input component and o = output component.

* When the two components are the same (different), increase (decrease) the weight.

Ideally, the weights will record the average correlations across all patterns:

    Auto:    w_jk = Σ_{p=1}^{P} i_pk · i_pj
    Hetero:  w_jk = Σ_{p=1}^{P} i_pk · o_pj

Hebbian Principle: If all the input patterns are known prior to retrieval time, then initialize the weights as:

    Auto:    w_jk = (1/P) Σ_{p=1}^{P} i_pk · i_pj
    Hetero:  w_jk = (1/P) Σ_{p=1}^{P} i_pk · o_pj

Weights = Average Correlations


Matrix Representation

Let X = matrix of input patterns, where each ROW is a pattern; so x_k,i = the ith bit of the kth pattern. Let Y = matrix of output patterns, where each ROW is a pattern; so y_k,j = the jth bit of the kth pattern.

Then the avg correlation between input bit i and output bit j across all patterns is:

    w_i,j = 1/P (x_1,i y_1,j + x_2,i y_2,j + … + x_P,i y_P,j)

To calculate all weights:

    Hetero Assoc:  W = X^T Y
    Auto Assoc:    W = X^T X

(diagram: w_i,j is the dot product of column i of X, i.e. x_1,i … x_P,i, with column j of Y, i.e. y_1,j … y_P,j)


Auto-Associative Memory

1. Auto-associative patterns to remember: 4-unit patterns.
   (node value legend: dark (blue) with x => +1; dark (red) without x => -1; light (green) => 0)

2. Distributed storage of all patterns:
   - 1 node per pattern unit
   - Fully connected: clique
   - Weights = avg correlations across all patterns of the corresponding units

3. Retrieval
   (figure: a probe pattern settles onto one of the stored patterns)


Hetero-Associative Memory

1. Hetero-associative patterns (pairs) to remember: input patterns over units 1, 2, 3 paired with output patterns over units a, b.

2. Distributed storage of all patterns:
   - 1 node per pattern unit for X & Y
   - Full inter-layer connection
   - Weights = avg correlations across all patterns of the corresponding units

3. Retrieval
   (figure: an input pattern over units 1-3 retrieves its associated output pattern over units a-b)


The Hopfield Network

&

Bidirectional Associative Memory


The Hopfield Network

John Hopfield formulated the physical principle of storing information in a dynamically stable network.

Auto-Association Network

Fully-connected (clique) with symmetric weights

State of node = f(inputs)

Weight values based on Hebbian principle

Performance: Must iterate a bit to converge on a pattern, but

generally much less computation than in back-propagation

networks.


Hopfield Networks

(figure: a corrupted input pattern is presented to the network; after many iterations the output settles on the stored pattern)


The Hopfield network uses McCulloch and Pitts neurons with the sign activation function as its computing element:

    Y = +1 if X > 0
        -1 if X < 0
         Y if X = 0   (output unchanged)

The current state of the Hopfield network is determined by the current outputs of all neurons, y 1 , y 2 , …, y n .

Thus, for a single-layer n-neuron network, the state can be defined by the state vector as:

    Y = [y1  y2  …  yn]^T



Retrieval Algorithm

The output update rule for Hopfield auto-associative memory can be expressed in the form

    v_i^(k+1) = sgn( w_i^t v^(k) )

where k is the index of recursion and i is the number of the neuron currently undergoing an update.

The asynchronous update sequence considered here is random. Assuming that recursion starts at v^0 and a random sequence of updating neurons m, p, q, … is chosen, the output vectors obtained are v^1, v^2, v^3, …


Storage Algorithm

In the Hopfield network, synaptic weights between neurons are usually represented in matrix form.

Assume that the bipolar binary prototype vectors that need to be stored are s^(m), for m = 1, 2, …, p. The storage algorithm for calculating the weight matrix is

    W = Σ_{m=1}^{p} s^(m) (s^(m))^t - p I

where p is the number of states to be memorized by the network, I is the n*n identity matrix, and superscript t denotes matrix transposition.
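A sketch of this storage rule together with asynchronous retrieval, using the two opposite states (1, 1, 1) and (-1, -1, -1) memorized in the example that follows:

```python
import numpy as np

# Prototype vectors, one per row
S = np.array([[1, 1, 1], [-1, -1, -1]])
p, n = S.shape
W = S.T @ S - p * np.eye(n)   # W = sum_m s(m) s(m)^t - p I

y = np.array([-1, 1, 1])      # corrupted probe (one flipped bit)
for i in [0, 1, 2, 0, 1, 2]:  # asynchronous updates, fixed sequence here
    net = W[i] @ y
    if net != 0:              # sign rule: keep previous output when net = 0
        y[i] = 1 if net > 0 else -1
print(y)                      # converges to the stored state [1, 1, 1]
```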


Possible states for the three-neuron Hopfield network

(figure: the eight possible states (±1, ±1, ±1) at the vertices of a cube in (y1, y2, y3) space)


Hopfield Network

The stable state-vertex is determined by the weight matrix W, the current input vector X, and the threshold matrix θ. If the input vector is partially incorrect or incomplete, the initial state will converge into the stable state-vertex after a few iterations.

Hopfield Network Example: Suppose, for instance, that our network is required to memorize two opposite states, (1, 1, 1) and (-1, -1, -1). Thus,

    Y1 = [1 1 1]^T,  Y2 = [-1 -1 -1]^T

or, in row form,

    Y1^T = [1 1 1],  Y2^T = [-1 -1 -1]

where Y 1 and Y 2 are three-dimensional vectors.


The 3 × 3 identity matrix I is

    I = [ 1 0 0
          0 1 0
          0 0 1 ]

Thus, we can now determine the weight matrix as follows:

    W = Y1 Y1^T + Y2 Y2^T - 2I = [ 0 2 2
                                   2 0 2
                                   2 2 0 ]