Demo

Some Algorithms for Unsupervised Clustering
Jussi Tohka
Some Algorithms for Unsupervised Clustering p.1/30
Outline
Unsupervised classification problem
The K-means algorithm
The EM-algorithm
Unsupervised classification problem

Classification procedures that use labeled samples to train the classifier are said to be supervised. Sometimes we do not have the training data. Classification procedures that use only unlabeled samples are said to be unsupervised. An unsupervised procedure must use the unlabeled input data both to estimate the parameter values for the classification problem at hand and to classify the data.
The K-means algorithm

An algorithm to estimate the unknown cluster centres (means) $m_1, \dots, m_K$ based on the data $x_1, \dots, x_n$.

Aims to minimize
$$J = \sum_{i=1}^{n} \| x_i - m_{c(i)} \|^2,$$
where $m_{c(i)}$ is the closest cluster centre to $x_i$.
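The cost above can be evaluated directly for any set of candidate centres. The following is a minimal sketch, assuming the squared Euclidean distance; the function names and the toy data are illustrative and do not come from the slides.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points given as sequences."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def kmeans_cost(points, centres):
    """Sum over all points of the squared distance to the closest centre."""
    return sum(min(squared_distance(x, m) for m in centres) for x in points)


# Hypothetical toy data: two pairs of points, one centre in the middle of each pair.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
centres = [(0.0, 1.0), (10.0, 1.0)]
print(kmeans_cost(points, centres))  # every point is at squared distance 1 from its centre
```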
The K-means algorithm

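1. Initialize the cluster centres $m_1, \dots, m_K$.
2. Classify all the data according to the nearest $m_k$.
3. Recompute each $m_k$ as the mean of the data classified to cluster $k$: $m_k = \frac{1}{n_k} \sum_{x_i \in \text{cluster } k} x_i$, where $n_k$ is the number of data points in cluster $k$.
4. If no change then terminate, otherwise go to step 2.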
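The four steps can be sketched in a few lines of plain Python. This is an illustrative sketch rather than the implementation used for the experiments in these slides: the function names, the `max_iter` safeguard and the rule for empty clusters (keep the old centre) are assumptions.

```python
def squared_distance(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def nearest(x, centres):
    """Index of the centre closest to x (ties go to the lowest index)."""
    return min(range(len(centres)), key=lambda k: squared_distance(x, centres[k]))


def mean(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans(points, initial_centres, max_iter=100):
    centres = list(initial_centres)                # step 1: initialize
    labels = None
    for _ in range(max_iter):
        new_labels = [nearest(x, centres) for x in points]   # step 2: classify
        if new_labels == labels:                   # step 4: no change, terminate
            break
        labels = new_labels
        for k in range(len(centres)):              # step 3: recompute the centres
            members = [x for x, l in zip(points, labels) if l == k]
            if members:                            # assumption: empty cluster keeps its centre
                centres[k] = mean(members)
    return centres, labels


# Hypothetical toy data and initial centres.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
centres, labels = kmeans(points, [(0.0, 0.0), (10.0, 0.0)])
print(centres, labels)
```

With the same data but initial centres `(0.0, 0.0)` and `(0.0, 2.0)`, this sketch stops at centres `(5.0, 0.0)` and `(5.0, 2.0)`, a partition with a much higher cost, which shows how the outcome depends on the initialization.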
Properties of the K-means algorithm

Simple
A local optimization algorithm

The cost function may not be suitable for many problems, e.g. for separating clusters of different size.
A generalization: Fuzzy K-means clustering.
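The remark about clusters of different size can be checked numerically. In the sketch below, which uses made-up one-dimensional data, the true grouping (one wide cluster, one tight cluster) has a higher sum-of-squares cost than a partition that cuts the wide cluster in two, so minimizing the cost would not recover the true grouping.

```python
def cost_of_partition(clusters):
    """Sum-of-squares cost when every cluster is represented by its own mean."""
    total = 0.0
    for cluster in clusters:
        m = sum(cluster) / len(cluster)
        total += sum((x - m) ** 2 for x in cluster)
    return total


# Hypothetical 1-D data: a wide cluster spread over [0, 10] and a tight cluster near 12.
wide = [i / 10 for i in range(101)]
tight = [11.9, 11.95, 12.0, 12.05, 12.1]

true_cost = cost_of_partition([wide, tight])

# Alternative partition: cut the wide cluster at 6 and merge its upper part with the tight cluster.
left = [x for x in wide if x < 6]
right = [x for x in wide if x >= 6] + tight
split_cost = cost_of_partition([left, right])

print(true_cost > split_cost)
```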

The K-means algorithm: An experiment

[Figure: scatter plots of the data used in the experiment; axis ticks at 10, 12 and 14.]
K-means experiment, iteration 1

[Figures: scatter plots showing the classification and the recomputed cluster centres for K-means iterations 1 to 7; axis ticks at 10, 12 and 14.]
Labels did not change
Convergence!
K-means experiment, results

[Figure: two scatter plots with panels "True labels" and "Clustering result".]
K-means experiment, results

[Figure: two scatter plots with panels "True labels" and "Errors".]
Error rate 1 %
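The slides do not say how the errors were counted. Since the cluster indices produced by an unsupervised method are arbitrary, one common way is to take the relabelling of the clusters that agrees best with the true labels and count the remaining mismatches. The sketch below assumes that approach and tries all permutations, which is practical only for a small number of clusters; the label lists are made up.

```python
from itertools import permutations


def error_rate(true_labels, cluster_labels, n_clusters):
    """Smallest fraction of mismatches over all relabellings of the clusters."""
    n = len(true_labels)
    best = n
    for perm in permutations(range(n_clusters)):
        # perm[c] is the class assigned to cluster index c under this relabelling
        errors = sum(1 for t, c in zip(true_labels, cluster_labels) if perm[c] != t)
        best = min(best, errors)
    return best / n


# Hypothetical labels: the clusters are numbered differently from the classes,
# and one sample ends up in the wrong cluster.
true_labels = [0, 0, 0, 1, 1, 1, 2, 2, 2, 2]
cluster_labels = [2, 2, 2, 0, 0, 1, 1, 1, 1, 1]
print(error_rate(true_labels, cluster_labels, 3))
```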
K-means cost function values

[Figure: plot of the K-means cost function values; the vertical axis runs from 0.5 to 5.5 (x 10^4).]
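A cost curve of this kind can be produced by recording the cost after every classification step. Neither the classification step nor the recomputation of the centres can increase the cost, so the recorded values should never go up. The sketch below is for one-dimensional data only, and its function name and data are assumptions.

```python
def kmeans_cost_history(points, centres, max_iter=100):
    """Run K-means on 1-D data and record the cost after every classification step."""
    centres = list(centres)
    history = []
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(len(centres)), key=lambda k: (x - centres[k]) ** 2)
                      for x in points]
        history.append(sum((x - centres[l]) ** 2 for x, l in zip(points, new_labels)))
        if new_labels == labels:      # labels did not change: converged
            break
        labels = new_labels
        for k in range(len(centres)):
            members = [x for x, l in zip(points, labels) if l == k]
            if members:               # assumption: an empty cluster keeps its centre
                centres[k] = sum(members) / len(members)
    return history


# Hypothetical data with a deliberately poor initialization.
points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
history = kmeans_cost_history(points, centres=[1.0, 2.0])
print(history)
```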
Examples where K-means doesn't work

[Figures: two example data sets, each shown as scatter plots with panels "True labels" and "K-means result".]
EM-algorithm for clustering

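Aims to maximize the likelihood of the mixture density given the data, i.e.
$$\prod_{i=1}^{n} \sum_{k=1}^{K} P_k \, p(x_i \mid \mu_k, \Sigma_k),$$
where $p(\cdot \mid \mu_k, \Sigma_k)$ is the normal density with the mean $\mu_k$ and covariance $\Sigma_k$, and $P_k$ is the prior probability of the class $k$.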
Maximization of the above likelihood can be done with an EM-algorithm. This algorithm is also described in the book but it is not called an EM-algorithm there.
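In practice the logarithm of the likelihood is usually evaluated, since it turns the product over the data into a sum and has the same maximizer. The sketch below assumes one-dimensional data, so each class has a scalar variance instead of a covariance matrix; the function names and the numbers are illustrative.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a 1-D normal distribution with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def mixture_log_likelihood(data, priors, means, variances):
    """Logarithm of the mixture likelihood: a sum over the data of log mixture densities."""
    total = 0.0
    for x in data:
        density = sum(P * normal_pdf(x, m, v)
                      for P, m, v in zip(priors, means, variances))
        total += math.log(density)
    return total


# Hypothetical data with two well-separated groups, near 0 and near 5.
data = [0.1, -0.2, 0.0, 5.1, 4.9, 5.0]
good = mixture_log_likelihood(data, [0.5, 0.5], [0.0, 5.0], [1.0, 1.0])
poor = mixture_log_likelihood(data, [0.5, 0.5], [2.0, 3.0], [1.0, 1.0])
print(good > poor)  # parameters that match the groups give the higher likelihood
```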
EM-algorithm for clustering

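1. Initialize the means $\mu_k^{(0)}$, covariances $\Sigma_k^{(0)}$ and prior probabilities $P_k^{(0)}$ for $k = 1, \dots, K$.
2. (E-step) Compute the probabilities $p_{ik}^{(t+1)}$ of $x_i$ belonging to the class $k$ based on the parameter values from the previous iteration.
3. (M-step) Re-compute the parameter values:
$$P_k^{(t+1)} = \frac{1}{n} \sum_{i=1}^{n} p_{ik}^{(t+1)}, \qquad \mu_k^{(t+1)} = \frac{\sum_{i=1}^{n} p_{ik}^{(t+1)} x_i}{\sum_{i=1}^{n} p_{ik}^{(t+1)}},$$
$$\Sigma_k^{(t+1)} = \frac{\sum_{i=1}^{n} p_{ik}^{(t+1)} (x_i - \mu_k^{(t+1)})(x_i - \mu_k^{(t+1)})^T}{\sum_{i=1}^{n} p_{ik}^{(t+1)}}.$$
4. Terminate if converged, else go to step 2.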
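The E-step and M-step can be sketched for one-dimensional data as follows. This is a minimal illustration under several assumptions that are not in the slides: scalar variances replace covariance matrices, convergence is declared when the log-likelihood improves by less than `tol`, and a small floor on the variance guards against a collapsing component.

```python
import math


def normal_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def em_gmm_1d(data, priors, means, variances, max_iter=200, tol=1e-9):
    priors, means, variances = list(priors), list(means), list(variances)  # step 1
    n, K = len(data), len(priors)
    prev_ll = None
    for _ in range(max_iter):
        # E-step: probability of each sample belonging to each class
        resp, ll = [], 0.0
        for x in data:
            w = [priors[k] * normal_pdf(x, means[k], variances[k]) for k in range(K)]
            s = sum(w)
            ll += math.log(s)
            resp.append([wk / s for wk in w])
        # Step 4: terminate once the log-likelihood no longer improves
        if prev_ll is not None and ll - prev_ll < tol:
            break
        prev_ll = ll
        # M-step: re-compute priors, means and variances from the probabilities
        for k in range(K):
            nk = sum(r[k] for r in resp)
            priors[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            variances[k] = max(var, 1e-6)  # assumption: floor to avoid zero variance
    return priors, means, variances


# Hypothetical data: two groups, near 0 and near 5, with rough initial guesses.
data = [0.1, -0.2, 0.0, 0.3, -0.1, 5.1, 4.9, 5.0, 5.2, 4.8]
priors, means, variances = em_gmm_1d(data, [0.5, 0.5], [1.0, 4.0], [1.0, 1.0])
print(priors, means, variances)
```

A hard clustering, comparable to the K-means output, can be obtained afterwards by assigning each sample to the class with the highest probability.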
EM versus K-means
[Figures: three data sets, each shown as scatter plots with panels "True labels", "K-means result" and "EM result".]
Application: MRI-segmentation
[Figures: MRI segmentation results. First set of panels: "Ground truth", "EM, T1+T2", "K-means, T1+T2". Second set of panels: "Ground truth", "EM, T1+T2", "EM, only T1".]
