y_i = 1 if i = i*, 0 otherwise
20
Network Activation
The unit with the highest field h_i fires
i* is the winner unit
Geometrically, its weight vector W_{i*} is closest to the current input vector
The winning unit's weight vector W_{i*} is updated to be even closer to the current input vector
21
Learning
Starting with small random weights, at each step:
1. a new input vector is presented to the network
2. all fields are calculated to find a winner i*
3. W_{i*} is updated to be closer to the input
Using the standard competitive learning equation:
ΔW_{i*j} = η (X_j − W_{i*j})
22
Result
Each output unit moves to the center of mass of a cluster of input vectors: clustering
23
Competitive Learning, Cont'd
It is important to break the symmetry in the
initial random weights
Final configuration depends on initialization
A winning unit has a higher chance of winning the next time a similar input is seen
Some output units may never fire
This can be compensated for by also updating the non-winning units, with a smaller update
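The competitive update above, together with the smaller "leaky" update for non-winning units, can be sketched in plain Python (a minimal sketch; the function name and the eta_losers parameter are my own):

```python
def competitive_step(W, x, eta=0.5, eta_losers=0.0):
    """One competitive-learning step: find the winner (the unit whose
    weight vector is closest to x) and move it toward x. A small
    eta_losers > 0 also nudges non-winning units, so that no unit
    is starved of updates ("some outputs may never fire")."""
    d = [sum((wi - xi) ** 2 for wi, xi in zip(w, x)) for w in W]
    winner = min(range(len(W)), key=lambda j: d[j])
    for j in range(len(W)):
        rate = eta if j == winner else eta_losers
        if rate:
            W[j] = [wi + rate * (xi - wi) for wi, xi in zip(W[j], x)]
    return winner, W
```

Calling this repeatedly with inputs drawn from the data moves each unit toward the centre of mass of one cluster.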
24
More about SOM learning
Upon repeated presentations of the training
examples, the weight vectors of the neurons
tend to follow the distribution of the
examples.
This results in a topological ordering of the
neurons, where neurons adjacent to each other
tend to have similar weight vectors.
The input space of patterns is mapped onto a
discrete output space of neurons.
25
SOM Learning Algorithm
1. Randomly initialise all weights
2. Select input vector x = [x_1, x_2, x_3, ..., x_n] from the training set
3. Compare x with the weights w_j of each neuron j
4. Determine the winner: find the unit j with the minimum distance
   d_j = Σ_i (x_i − w_ij)²
5. Update the winner so that it becomes more like x, together with the winner's neighbours for units within the radius, according to
   w_ij(n+1) = w_ij(n) + η(n)[x_i − w_ij(n)]
6. Adjust parameters: learning rate & neighbourhood function
7. Repeat from (2) until ?
Note that the learning rate generally decreases with time:
   0 < η(n) ≤ η(n−1) ≤ 1
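Steps 1-7 can be sketched end-to-end in plain Python; this is a minimal sketch assuming a 1-D lattice, a Gaussian neighbourhood, and illustrative decay schedules (none of these constants come from the slides):

```python
import math
import random

def som_train(data, n_units, n_iter=1000, eta0=0.1, sigma0=2.0, seed=0):
    """Minimal 1-D SOM: random init, pick an input, find the winner by
    minimum distance d_j = sum_i (x_i - w_ij)^2, then update the winner
    and its lattice neighbours, decaying eta and sigma each iteration."""
    dim = len(data[0])
    rng = random.Random(seed)
    W = [[rng.random() for _ in range(dim)] for _ in range(n_units)]   # step 1
    for n in range(n_iter):
        x = rng.choice(data)                                           # step 2
        d = [sum((xi - wi) ** 2 for xi, wi in zip(x, w)) for w in W]   # steps 3-4
        j_star = min(range(n_units), key=lambda j: d[j])
        eta = eta0 * math.exp(-n / n_iter)        # decaying learning rate
        sigma = sigma0 * math.exp(-n / n_iter)    # shrinking neighbourhood
        for j in range(n_units):                                       # step 5
            h = math.exp(-((j - j_star) ** 2) / (2 * sigma ** 2))
            W[j] = [wi + eta * h * (xi - wi) for xi, wi in zip(x, W[j])]
    return W
```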
26
Example
An SOFM network with three inputs and two cluster units is to be trained using the four training vectors:
[0.8 0.7 0.4], [0.6 0.9 0.9], [0.3 0.4 0.1], [0.1 0.1 0.2]
and initial weights
W = | 0.5  0.4 |
    | 0.6  0.2 |
    | 0.8  0.5 |
(the first column, (0.5, 0.6, 0.8), holds the weights to the first cluster unit)
The initial radius is 0 and the learning rate is 0.5. Calculate the weight changes during the first cycle through the data, taking the training vectors in the given order.
27
Solution
The squared Euclidean distance of input vector 1 to cluster unit 1 is:
d_1 = (0.5 − 0.8)² + (0.6 − 0.7)² + (0.8 − 0.4)² = 0.26
The squared Euclidean distance of input vector 1 to cluster unit 2 is:
d_2 = (0.4 − 0.8)² + (0.2 − 0.7)² + (0.5 − 0.4)² = 0.42
Input vector 1 is closest to cluster unit 1, so update the weights to cluster unit 1 using w_ij(n+1) = w_ij(n) + 0.5[x_i − w_ij(n)]:
0.65 = 0.5 + 0.5(0.8 − 0.5)
0.65 = 0.6 + 0.5(0.7 − 0.6)
0.60 = 0.8 + 0.5(0.4 − 0.8)
giving the updated weights
W = | 0.65  0.4 |
    | 0.65  0.2 |
    | 0.60  0.5 |
28
Solution
The squared Euclidean distance of input vector 2 to cluster unit 1 is:
d_1 = (0.65 − 0.6)² + (0.65 − 0.9)² + (0.60 − 0.9)² = 0.155
The squared Euclidean distance of input vector 2 to cluster unit 2 is:
d_2 = (0.4 − 0.6)² + (0.2 − 0.9)² + (0.5 − 0.9)² = 0.69
Input vector 2 is closest to cluster unit 1, so update the weights to cluster unit 1 again, using w_ij(n+1) = w_ij(n) + 0.5[x_i − w_ij(n)]:
0.625 = 0.65 + 0.5(0.6 − 0.65)
0.775 = 0.65 + 0.5(0.9 − 0.65)
0.750 = 0.60 + 0.5(0.9 − 0.60)
giving the updated weights
W = | 0.625  0.4 |
    | 0.775  0.2 |
    | 0.750  0.5 |
Repeat the same update procedure for input vectors 3 and 4.
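The whole first cycle can be checked with a few lines of plain Python (helper names are mine); it reproduces the two updates above, and for training vectors 3 and 4 the winner turns out to be cluster unit 2:

```python
def dist2(w, x):
    # squared Euclidean distance between a weight vector and an input
    return sum((wi - xi) ** 2 for wi, xi in zip(w, x))

def update(w, x, eta=0.5):
    # w(n+1) = w(n) + eta * [x - w(n)]
    return [wi + eta * (xi - wi) for wi, xi in zip(w, x)]

# initial weights to the two cluster units, from the example
w1, w2 = [0.5, 0.6, 0.8], [0.4, 0.2, 0.5]
for x in ([0.8, 0.7, 0.4], [0.6, 0.9, 0.9], [0.3, 0.4, 0.1], [0.1, 0.1, 0.2]):
    if dist2(w1, x) <= dist2(w2, x):
        w1 = update(w1, x)
    else:
        w2 = update(w2, x)
```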
29
Neighborhood Function
Gaussian neighborhood function:
h_{j,i}(d_{ji}) = exp(−d_{ji}² / (2σ²))
d_{ji}: lateral distance of neurons i and j
  in a 1-dimensional lattice: |j − i|
  in a 2-dimensional lattice: ||r_j − r_i||
where r_j is the position of neuron j in the lattice.
30
[Figure: lattice neighborhoods N_13(1) and N_13(2) of neuron 13]
31
Neighborhood Function
σ measures the degree to which excited neurons in the vicinity of the winning neuron cooperate in the learning process.
In the learning algorithm, σ is updated at each iteration during the ordering phase using the following exponential decay update rule, with parameters σ_0 and T_1:
σ(n) = σ_0 exp(−n / T_1)
32
Neighbourhood function
[Figure: two plots of degree of neighbourhood (0 to 1) against distance from the winner (−10 to 10); the Gaussian curve narrows as time increases]
33
UPDATE RULE
w_j(n+1) = w_j(n) + η(n) h_{j,i(x)}(n) [x − w_j(n)]
Exponential decay update of the learning rate:
η(n) = η_0 exp(−n / T_2)
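The update rule and the two decay schedules fit in a few lines of Python; a sketch assuming a 1-D lattice (function names and default constants are mine):

```python
import math

def gaussian_h(d, sigma):
    """Gaussian neighbourhood: h = exp(-d^2 / (2 sigma^2))."""
    return math.exp(-d * d / (2.0 * sigma * sigma))

def decayed(v0, n, T):
    """Exponential decay, used for both eta(n) and sigma(n)."""
    return v0 * math.exp(-n / T)

def som_update(W, x, j_star, n, eta0=0.1, T2=1000.0, sigma0=3.0, T1=1000.0):
    """w_j(n+1) = w_j(n) + eta(n) h_{j,i(x)}(n) (x - w_j(n)),
    with lattice distance |j - j_star| on a 1-D lattice."""
    eta = decayed(eta0, n, T2)
    sigma = decayed(sigma0, n, T1)
    for j, w in enumerate(W):
        h = gaussian_h(abs(j - j_star), sigma)
        W[j] = [wi + eta * h * (xi - wi) for wi, xi in zip(w, x)]
    return W
```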
34
Illustration of learning for Kohonen maps
Inputs: coordinates (x, y) of points drawn from a square
Display neuron j at the position (x_j, y_j) where its output s_j is maximum
[Figure: map unfolding from random initial positions after 100, 200, and 1000 inputs]
35
Two-phase learning approach
Self-organizing or ordering phase. The learning rate
and spread of the Gaussian neighborhood function
are adapted during the execution of SOM, using for
instance the exponential decay update rule.
Convergence phase. The learning rate and Gaussian
spread have small fixed values during the execution
of SOM.
36
Ordering Phase
Self-organizing or ordering phase:
Topological ordering of the weight vectors.
May take 1000 or more iterations of the SOM algorithm.
The choice of parameter values is important. For instance:
η(n): η_0 = 0.1, T_2 = 1000; decrease gradually, keeping η(n) ≥ 0.01
h_{ji(x)}(n): σ_0 big enough, T_1 = 1000 / log(σ_0)
With this parameter setting, initially the neighborhood of the winning neuron includes almost all neurons in the network, then it shrinks slowly with time.
37
Convergence Phase
Convergence phase:
Fine-tune the weight vectors.
The number of iterations must be at least 500 times the number of neurons in the network: thousands or tens of thousands of iterations.
Choice of parameter values:
η(n) maintained on the order of 0.01.
Neighborhood function such that the neighborhood of the winning neuron contains only its nearest neighbors; it eventually reduces to one or zero neighboring neurons.
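The two phases amount to two parameter schedules. A sketch of both (the constants η_0 = 0.1, T_2 = 1000, the 0.01 floor, and T_1 = 1000 / log(σ_0) follow the two slides above; I assume the natural logarithm, which makes σ reach exactly 1, nearest neighbours only, at the end of the ordering phase):

```python
import math

def eta_schedule(n, eta0=0.1, T2=1000.0, floor=0.01):
    """Learning rate: exponential decay during ordering, clamped at the
    small fixed value used during the convergence phase."""
    return max(eta0 * math.exp(-n / T2), floor)

def sigma_schedule(n, sigma0=10.0, n_order=1000):
    """Neighbourhood spread: with T1 = n_order / log(sigma0), sigma
    decays from sigma0 to exactly 1 over the ordering phase."""
    T1 = n_order / math.log(sigma0)
    return sigma0 * math.exp(-n / T1)
```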
38
39
Another Self-Organizing Map
(SOM) Example
From Fausett (1994)
n = 4, m = 2
More typical of SOM application
Smaller number of units in output than in input;
dimensionality reduction
Training samples
i1: (1, 1, 0, 0)
i2: (0, 0, 0, 1)
i3: (1, 0, 0, 0)
i4: (0, 0, 1, 1)
[Figure: network architecture with four input units fully connected to output units 1 and 2]
What should we expect as outputs?
40
What are the Euclidean Distances
Between the Data Samples?
Training samples
i1: (1, 1, 0, 0)
i2: (0, 0, 0, 1)
i3: (1, 0, 0, 0)
i4: (0, 0, 1, 1)
     i1   i2   i3   i4
i1    0
i2    ?    0
i3    ?    ?    0
i4    ?    ?    ?    0
41
Euclidean Distances Between Data
Samples
Training samples
i1: (1, 1, 0, 0)
i2: (0, 0, 0, 1)
i3: (1, 0, 0, 0)
i4: (0, 0, 1, 1)
     i1   i2   i3   i4
i1    0
i2    3    0
i3    1    2    0
i4    4    1    3    0
(entries are squared Euclidean distances)
What might we expect from the SOM?
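The table can be regenerated mechanically; a short plain-Python check (the dict layout is my own):

```python
samples = {
    "i1": (1, 1, 0, 0),
    "i2": (0, 0, 0, 1),
    "i3": (1, 0, 0, 0),
    "i4": (0, 0, 1, 1),
}

def d2(a, b):
    # squared Euclidean distance, the quantity tabulated above
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

table = {(p, q): d2(samples[p], samples[q]) for p in samples for q in samples}
```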
42
Example Details
Training samples
i1: (1, 1, 0, 0)
i2: (0, 0, 0, 1)
i3: (1, 0, 0, 0)
i4: (0, 0, 1, 1)
With only 2 outputs, neighborhood radius = 0
Only update weights associated with the winning output unit (cluster) at each iteration
Learning rate:
η(t) = 0.6, 1 ≤ t ≤ 4
η(t) = 0.5 η(1), 5 ≤ t ≤ 8
η(t) = 0.5 η(5), 9 ≤ t ≤ 12
etc.
Initial weight matrix (random values between 0 and 1):
Unit 1: [.2  .6  .5  .9]
Unit 2: [.8  .4  .7  .3]
d² = Σ_k (i_{l,k} − w_{j,k}(t))²  (squared Euclidean distance), j = 1, 2
Weight update: w_j(t+1) = w_j(t) + η(t)(i_l − w_j(t))
Problem: calculate the weight updates for the first four steps
43
First Weight Update
Training sample: i1 = (1, 1, 0, 0)
Unit 1 weights: d² = (.2−1)² + (.6−1)² + (.5−0)² + (.9−0)² = 1.86
Unit 2 weights: d² = (.8−1)² + (.4−1)² + (.7−0)² + (.3−0)² = .98
Unit 2 wins, so only the weights on the winning unit are updated:
new unit 2 weights = [.8 .4 .7 .3] + 0.6([1 1 0 0] − [.8 .4 .7 .3]) = [.92 .76 .28 .12]
Giving the updated weight matrix:
Unit 1: [.2   .6   .5   .9]
Unit 2: [.92  .76  .28  .12]
44
Second Weight Update
Training sample: i2 = (0, 0, 0, 1)
Unit 1 weights: d² = (.2−0)² + (.6−0)² + (.5−0)² + (.9−1)² = .66
Unit 2 weights: d² = (.92−0)² + (.76−0)² + (.28−0)² + (.12−1)² = 2.28
Unit 1 wins, so the weights on the winning unit are updated:
new unit 1 weights = [.2 .6 .5 .9] + 0.6([0 0 0 1] − [.2 .6 .5 .9]) = [.08 .24 .20 .96]
Giving the updated weight matrix:
Unit 1: [.08  .24  .20  .96]
Unit 2: [.92  .76  .28  .12]
45
Third Weight Update
Training sample: i3 = (1, 0, 0, 0)
Unit 1 weights: d² = (.08−1)² + (.24−0)² + (.2−0)² + (.96−0)² = 1.87
Unit 2 weights: d² = (.92−1)² + (.76−0)² + (.28−0)² + (.12−0)² = 0.68
Unit 2 wins, so the weights on the winning unit are updated:
new unit 2 weights = [.92 .76 .28 .12] + 0.6([1 0 0 0] − [.92 .76 .28 .12]) = [.97 .30 .11 .05]
Giving the updated weight matrix:
Unit 1: [.08  .24  .20  .96]
Unit 2: [.97  .30  .11  .05]
46
Fourth Weight Update
Training sample: i4 = (0, 0, 1, 1)
Unit 1 weights: d² = (.08−0)² + (.24−0)² + (.2−1)² + (.96−1)² = .71
Unit 2 weights: d² = (.97−0)² + (.30−0)² + (.11−1)² + (.05−1)² = 2.74
Unit 1 wins, so the weights on the winning unit are updated:
new unit 1 weights = [.08 .24 .20 .96] + 0.6([0 0 1 1] − [.08 .24 .20 .96]) = [.03 .10 .68 .98]
Giving the updated weight matrix:
Unit 1: [.03  .10  .68  .98]
Unit 2: [.97  .30  .11  .05]
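The four updates above can be replayed in plain Python (helper names are mine); it reproduces the sequence of winners (2, 1, 2, 1) and the final weight matrix up to the two-decimal rounding used on the slides:

```python
def d2(w, x):
    # squared Euclidean distance
    return sum((wi - xi) ** 2 for wi, xi in zip(w, x))

def update(w, x, eta):
    # move the winning unit's weights toward the sample
    return [wi + eta * (xi - wi) for wi, xi in zip(w, x)]

W = [[0.2, 0.6, 0.5, 0.9], [0.8, 0.4, 0.7, 0.3]]    # unit 1, unit 2
data = [(1, 1, 0, 0), (0, 0, 0, 1), (1, 0, 0, 0), (0, 0, 1, 1)]
winners = []
for x in data:
    j = 0 if d2(W[0], x) <= d2(W[1], x) else 1      # winner takes the update
    W[j] = update(W[j], x, eta=0.6)
    winners.append(j + 1)                           # 1-based unit number
```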
47
Applying the SOM Algorithm
time (t)   data sample utilized   winning output unit   D(t)   η(t)
1          i1                     Unit 2                0      0.6
2          i2                     Unit 1                0      0.6
3          i3                     Unit 2                0      0.6
4          i4                     Unit 1                0      0.6
After many iterations (epochs) through the data set, the weights converge to:
Unit 1: [0    0    .5   1.0]
Unit 2: [1.0  .5   0    0]
Did we get the clustering that we expected?
48
What clusters do the
data samples fall into?
Weights:
Unit 1: [0    0    .5   1.0]
Unit 2: [1.0  .5   0    0]
Training samples:
i1: (1, 1, 0, 0)
i2: (0, 0, 0, 1)
i3: (1, 0, 0, 0)
i4: (0, 0, 1, 1)
49
Solution
Sample: i1
Distance from unit 1 weights: (1−0)² + (1−0)² + (0−.5)² + (0−1.0)² = 1 + 1 + .25 + 1 = 3.25
Distance from unit 2 weights: (1−1)² + (1−.5)² + (0−0)² + (0−0)² = 0 + .25 + 0 + 0 = .25 (winner)
Sample: i2
Distance from unit 1 weights: (0−0)² + (0−0)² + (0−.5)² + (1−1.0)² = 0 + 0 + .25 + 0 = .25 (winner)
Distance from unit 2 weights: (0−1)² + (0−.5)² + (0−0)² + (1−0)² = 1 + .25 + 0 + 1 = 2.25
(using d² = Σ_k (i_{l,k} − w_{j,k}(t))², the squared Euclidean distance)
50
Solution
Sample: i3
Distance from unit 1 weights: (1−0)² + (0−0)² + (0−.5)² + (0−1.0)² = 1 + 0 + .25 + 1 = 2.25
Distance from unit 2 weights: (1−1)² + (0−.5)² + (0−0)² + (0−0)² = 0 + .25 + 0 + 0 = .25 (winner)
Sample: i4
Distance from unit 1 weights: (0−0)² + (0−0)² + (1−.5)² + (1−1.0)² = 0 + 0 + .25 + 0 = .25 (winner)
Distance from unit 2 weights: (0−1)² + (0−.5)² + (1−0)² + (1−0)² = 1 + .25 + 1 + 1 = 3.25
So i1 and i3 fall into the cluster of unit 2, and i2 and i4 into the cluster of unit 1.
51
Word categories
52
Examples of Applications
Kohonen (1984): speech recognition, a map of phonemes in the Finnish language
Optical character recognition: clustering of letters of different fonts
Angéniol et al. (1988): travelling salesman problem (an optimization problem)
Kohonen (1990): learning vector quantization (pattern classification problem)
Ritter & Kohonen (1989): semantic maps
53
Summary
Unsupervised learning is very common
Unsupervised learning requires redundancy in the stimuli
Self-organization is a basic property of the brain's computational structure
SOMs are based on
  competition (winner-take-all units)
  cooperation
  synaptic adaptation
SOMs conserve topological relationships between the stimuli
Artificial SOMs have many applications in computational neuroscience
54
End of slides