
Some Algorithms for Unsupervised Clustering

Jussi Tohka

Some Algorithms for Unsupervised Clustering p.1/30

Outline
Unsupervised classification problem
The K-means algorithm
The EM-algorithm


Unsupervised classification problem


Classification procedures that use labeled samples to train the classifier are said to be supervised.
Sometimes we do not have the training data.
Classification procedures that use only unlabeled samples are said to be unsupervised.
An unsupervised procedure must use the unlabeled input data both to estimate the parameter values for the classification problem at hand and to classify the data.


The K-means algorithm


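An algorithm to estimate the unknown cluster centres (means) $m_1, \dots, m_K$ based on the data $x_1, \dots, x_N$.

Aims to minimize

$$J(m_1, \dots, m_K) = \sum_{i=1}^{N} \| x_i - m_{c(i)} \|^2,$$

where $m_{c(i)}$ is the closest cluster centre to $x_i$.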
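
The cost function above can be evaluated directly. The following is a minimal sketch, assuming the data points and centres are plain Python tuples of equal length; the names `squared_distance` and `kmeans_cost` are illustrative and do not come from the slides.

```python
def squared_distance(x, m):
    """Squared Euclidean distance between two points given as sequences."""
    return sum((xj - mj) ** 2 for xj, mj in zip(x, m))


def kmeans_cost(data, centres):
    """Sum over all points of the squared distance to the closest centre."""
    return sum(min(squared_distance(x, m) for m in centres) for x in data)


# Toy data (an assumption for illustration): two pairs of points.
data = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
centres = [(0.0, 1.0), (10.0, 1.0)]
cost = kmeans_cost(data, centres)  # every point is at squared distance 1.0
print(cost)
```

With a single centre placed between the two pairs the cost is much larger, which is the behaviour the minimization exploits.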


The K-means algorithm


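1. Initialize the cluster centres $m_1, \dots, m_K$.

2. Classify all the data $x_i$ according to the nearest $m_k$.

3. Recompute each $m_k$ as the mean of the data classified to cluster $k$:

$$m_k = \frac{1}{N_k} \sum_{i:\, c(i) = k} x_i,$$

where $c(i)$ is the cluster assigned to $x_i$ in step 2 and $N_k$ is the number of points assigned to cluster $k$.

4. If no change, then terminate; otherwise go to step 2.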
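
The four steps above can be sketched in a few lines of plain Python. This is an illustrative sketch, not the implementation used for the experiments in these slides: the initial centres are passed in by the caller (step 1), an empty cluster keeps its old centre (an assumption, since the slides do not say what to do in that case), and all function names are hypothetical.

```python
def squared_distance(x, m):
    return sum((xj - mj) ** 2 for xj, mj in zip(x, m))


def nearest(x, centres):
    """Index of the centre closest to x (ties go to the lowest index)."""
    return min(range(len(centres)), key=lambda k: squared_distance(x, centres[k]))


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(p[j] for p in points) / n for j in range(len(points[0])))


def kmeans(data, initial_centres, max_iter=100):
    centres = list(initial_centres)            # step 1: caller-supplied start
    labels = None
    for _ in range(max_iter):
        new_labels = [nearest(x, centres) for x in data]   # step 2: classify
        if new_labels == labels:               # step 4: labels did not change
            break
        labels = new_labels
        for k in range(len(centres)):          # step 3: recompute each centre
            members = [x for x, lab in zip(data, labels) if lab == k]
            if members:                        # empty cluster keeps its centre
                centres[k] = mean(members)
    return centres, labels


# Toy data (an assumption for illustration): two well-separated groups.
data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centres, labels = kmeans(data, [(0, 0), (1, 0)])
print(labels)
```

Even though both initial centres lie in the same group here, the labels settle on the two groups after a few passes.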


Properties of the K-means algorithm


Simple
A local optimization algorithm
The cost function is not suitable for many problems, e.g. for separating clusters of different size.
A generalization: Fuzzy K-means clustering.
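
The slides only name the fuzzy generalization. As a hedged illustration, the sketch below computes soft memberships in the style commonly used for fuzzy K-means (fuzzy c-means): each point receives a weight per cluster instead of a single label. The function name, the `fuzzifier` parameter and its default of 2 are assumptions, not taken from the slides.

```python
def fuzzy_memberships(x, centres, fuzzifier=2.0):
    """Soft membership of point x in each cluster; the weights sum to 1.

    Uses u_k proportional to (1 / d_k^2) ** (1 / (fuzzifier - 1)),
    where d_k is the distance from x to centre k and fuzzifier > 1.
    """
    d2 = [sum((xj - mj) ** 2 for xj, mj in zip(x, m)) for m in centres]
    if 0.0 in d2:
        # x coincides with at least one centre: share full membership there.
        zeros = [d == 0.0 for d in d2]
        return [z / sum(zeros) for z in zeros]
    power = 1.0 / (fuzzifier - 1.0)
    inv = [(1.0 / d) ** power for d in d2]
    total = sum(inv)
    return [v / total for v in inv]


u = fuzzy_memberships((1.0, 0.0), [(0.0, 0.0), (4.0, 0.0)])
print(u)  # the nearer centre gets the larger weight
```

Ordinary K-means corresponds to the limiting case where each point puts all of its weight on the nearest centre.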



The K-means algorithm: An experiment

[Figure: plots of the experiment data; axis ticks 10, 12, 14.]

K-means experiment, iteration 1

[Figure: plots for iteration 1; axis ticks 10, 12, 14.]

K-means experiment, iteration 2

[Figure: plots for iteration 2; axis ticks 10, 12, 14.]

K-means experiment, iteration 3

[Figure: plots for iteration 3; axis ticks 10, 12, 14.]

K-means experiment, iteration 4

[Figure: plots for iteration 4; axis ticks 10, 12, 14.]

K-means experiment, iteration 5

[Figure: plots for iteration 5; axis ticks 10, 12, 14.]

K-means experiment, iteration 6

[Figure: plots for iteration 6; axis ticks 10, 12, 14.]

K-means experiment, iteration 7

[Figure: plots for iteration 7; axis ticks 10, 12, 14.]

K-means experiment, iteration 8

[Figure: plots for iteration 8; axis ticks 10, 12, 14.]

K-means experiment, iteration 9

[Figure: plots for iteration 9; axis ticks 10, 12, 14.]

K-means experiment, iteration 10

[Figure: plots for iteration 10; axis ticks 10, 12, 14.]

K-means experiment, iteration 11

[Figure: plots for iteration 11; axis ticks 10, 12, 14.]

Labels did not change

Convergence!

K-means experiment, results

[Figure: panels "True labels", "Clustering result", and "Errors, Error rate 1 %"; axis ticks 10, 12, 14.]

K-means cost function values

[Figure: plot of the K-means cost function values; one axis has ticks from 0.5 to 5.5 scaled by 10^4, the other shows ticks 10 and 12.]

Examples where K-means doesn't work

[Figure: example 1; panels "True labels" and "K-means result"; axis ticks 100 to 400 and 4 to 14.]

[Figure: example 2; panels "True labels" and "K-means result"; axis ticks 10, 12.]

EM-algorithm for clustering


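Aims to maximize the likelihood of the mixture density given the data, i.e.

$$\prod_{i=1}^{N} \sum_{k=1}^{K} P_k \, p(x_i \mid m_k, \Sigma_k),$$

where $p(\cdot \mid m_k, \Sigma_k)$ is the normal density with the mean $m_k$ and covariance $\Sigma_k$, and $P_k$ is the prior probability of the class $k$.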

Maximization of the above likelihood can be done with an EM-algorithm. This algorithm is also described in the book but it is not called an EM-algorithm there.
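
To make the objective concrete, the sketch below evaluates the logarithm of the mixture likelihood for one-dimensional data. Restricting to one dimension, where each class has a scalar variance instead of a covariance matrix, is a simplification made here for brevity; the function names are illustrative.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a 1-D normal distribution with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def mixture_log_likelihood(data, priors, means, variances):
    """log of the product over points of sum_k prior_k * normal_pdf(x | k)."""
    total = 0.0
    for x in data:
        total += math.log(sum(p * normal_pdf(x, m, v)
                              for p, m, v in zip(priors, means, variances)))
    return total


# Toy data (an assumption for illustration): two groups near 0 and near 5.
data = [0.0, 0.1, 5.0, 5.1]
good = mixture_log_likelihood(data, [0.5, 0.5], [0.0, 5.0], [1.0, 1.0])
bad = mixture_log_likelihood(data, [0.5, 0.5], [20.0, 30.0], [1.0, 1.0])
print(good > bad)
```

Parameters that place the class means near the data give a higher log-likelihood than parameters that place them far away, which is what the maximization searches for.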


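EM-algorithm for clustering

1. Initialize the means $m_k^{(0)}$, covariances $\Sigma_k^{(0)}$ and priors $P_k^{(0)}$ for $k = 1, \dots, K$.

2. (E-step) Compute the probabilities $p_{ik}^{(t+1)}$ of $x_i$ belonging to the class $k$ based on the parameter values from the previous iteration:

$$p_{ik}^{(t+1)} = \frac{P_k^{(t)} \, p(x_i \mid m_k^{(t)}, \Sigma_k^{(t)})}{\sum_{j=1}^{K} P_j^{(t)} \, p(x_i \mid m_j^{(t)}, \Sigma_j^{(t)})},$$

where $p(\cdot \mid m, \Sigma)$ is the normal density with mean $m$ and covariance $\Sigma$.

3. (M-step) Re-compute the parameter values:

$$P_k^{(t+1)} = \frac{1}{N} \sum_{i=1}^{N} p_{ik}^{(t+1)}, \qquad m_k^{(t+1)} = \frac{\sum_{i=1}^{N} p_{ik}^{(t+1)} x_i}{\sum_{i=1}^{N} p_{ik}^{(t+1)}},$$

$$\Sigma_k^{(t+1)} = \frac{\sum_{i=1}^{N} p_{ik}^{(t+1)} (x_i - m_k^{(t+1)})(x_i - m_k^{(t+1)})^T}{\sum_{i=1}^{N} p_{ik}^{(t+1)}}.$$

4. Terminate if converged, else go to step 2.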
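
The E-step and M-step above can be sketched for one-dimensional data, where each class has a scalar variance instead of a covariance matrix. This is a simplified illustration under that assumption, not the implementation behind the results in these slides; the convergence test on the means, the variance floor and all names are choices made here.

```python
import math


def normal_pdf(x, mean, var):
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def em_step(data, priors, means, variances):
    """One E-step followed by one M-step for a 1-D normal mixture."""
    k = len(priors)
    # E-step: probability of each point belonging to each class.
    resp = []
    for x in data:
        w = [priors[j] * normal_pdf(x, means[j], variances[j]) for j in range(k)]
        s = sum(w)
        resp.append([wj / s for wj in w])
    # M-step: re-compute priors, means and variances from those probabilities.
    new_priors, new_means, new_vars = [], [], []
    for j in range(k):
        nj = sum(r[j] for r in resp)
        mj = sum(r[j] * x for r, x in zip(resp, data)) / nj
        vj = sum(r[j] * (x - mj) ** 2 for r, x in zip(resp, data)) / nj
        new_priors.append(nj / len(data))
        new_means.append(mj)
        new_vars.append(max(vj, 1e-6))  # floor to avoid a collapsing variance
    return new_priors, new_means, new_vars


def fit(data, priors, means, variances, tol=1e-8, max_iter=200):
    """Iterate em_step until the means move by less than tol."""
    for _ in range(max_iter):
        new_priors, new_means, new_vars = em_step(data, priors, means, variances)
        shift = max(abs(a - b) for a, b in zip(new_means, means))
        priors, means, variances = new_priors, new_means, new_vars
        if shift < tol:
            break
    return priors, means, variances


# Toy data (an assumption for illustration): groups around 0 and around 10.
data = [0.0, 0.2, -0.2, 0.1, -0.1, 10.0, 10.2, 9.8, 10.1, 9.9]
priors, means, variances = fit(data, [0.5, 0.5], [1.0, 9.0], [1.0, 1.0])
print(priors, means, variances)
```

Unlike K-means, each class here also carries its own prior and variance, which is what lets the mixture model adapt to clusters of different size and spread.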


EM versus K-means

[Figure: example 1; panels "True labels", "K-means result", and "EM result"; axis ticks 10, 12, 14.]

[Figure: example 2; panels "True labels", "K-means result", and "EM result"; axis ticks 100 to 400 and 4 to 14.]

[Figure: example 3; panels "True labels", "K-means result", and "EM result"; axis ticks 0 to 12.]

Application: MRI-segmentation

[Figure: MRI images.]

[Figure: segmentation results; panels "Ground truth", "EM, T1+T2", "K-means, T1+T2", and "EM, only T1".]

[Figure: three plots; only axis tick values survive, ranging from 0 to 5500, 100 to 900, and 200 to 1800.]