
# Some Algorithms for Unsupervised Clustering

Jussi Tohka

## Some Algorithms for Unsupervised Clustering p.1/30

## Outline

- The unsupervised classification problem
- The K-means algorithm
- The EM-algorithm

## Unsupervised classification problem

Classification procedures that use labeled samples to train the classifier are said to be supervised. Sometimes we do not have training data. Classification procedures that use only unlabeled samples are said to be unsupervised. An unsupervised procedure must use the unlabeled input data both to estimate the parameter values for the classification problem at hand and to classify the data.

## The K-means algorithm

An algorithm to estimate the unknown cluster centres (means) $\mu_1, \ldots, \mu_K$ based on the data $x_1, \ldots, x_n$.

Aims to minimize

$$J = \sum_{j=1}^{K} \sum_{x_i \in C_j} \| x_i - \mu_j \|^2,$$

where $C_j$ is the set of samples currently assigned to cluster $j$.

1. Initialize the means $\mu_1, \ldots, \mu_K$.
2. Classify each sample according to the nearest mean $\mu_j$.
3. Re-compute each mean as the average of the samples assigned to it.
4. Repeat steps 2 and 3 until the classification no longer changes.

## Properties of the K-means algorithm

Simple. A local optimization algorithm, so the result depends on the initialization. The cost function may not be suitable for many problems, e.g. for separating clusters of different sizes. A generalization: fuzzy K-means clustering.
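To illustrate the fuzzy generalization just mentioned: instead of a hard assignment, every sample gets a graded membership in every cluster. A minimal sketch (the fuzzifier `m = 2` and the random membership initialization are conventional choices, not from the slides):

```python
import numpy as np

def fuzzy_kmeans(X, K, m=2.0, n_iter=100, seed=0):
    """Fuzzy K-means: returns cluster centres and an (n, K) membership matrix."""
    rng = np.random.default_rng(seed)
    n = len(X)
    # Random initial memberships; each row is normalized to sum to 1.
    U = rng.random((n, K))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # Centres are membership-weighted means of the data.
        W = U ** m
        centres = (W.T @ X) / W.sum(axis=0)[:, None]
        # Update memberships from distances: u_ij proportional to d_ij^(-2/(m-1)).
        d = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2) + 1e-12
        inv = d ** (-2.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)
    return centres, U
```

For well-separated clusters the memberships approach 0/1 and the centres approach the K-means solution; the soft weighting mainly helps when clusters overlap or differ in size.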


(Figures: a K-means experiment, showing the cluster assignments and mean positions at successive iterations of the algorithm on a two-dimensional example data set.)

Convergence!

## K-means experiment, results

(Figures: the true labels and the K-means clustering result for the example data set.)

## K-means cost function values

(Figure: the value of the K-means cost function at each iteration, on the order of $10^4$.)

(Figures: true labels and the K-means result on two further example data sets.)

## EM-algorithm for clustering

Aims to maximize the likelihood of the mixture density given the data $x_1, \ldots, x_n$, i.e.

$$L(\theta) = \prod_{i=1}^{n} \sum_{j=1}^{K} P(\omega_j)\, p(x_i \mid \omega_j, \theta_j),$$

where $p(x \mid \omega_j, \theta_j)$ is the normal density with mean $\mu_j$ and covariance $\Sigma_j$, and $P(\omega_j)$ is the prior probability of the class $\omega_j$.

Maximization of the above likelihood can be done with an EM-algorithm. This algorithm is also described in the book, although it is not called an EM-algorithm there.
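Concretely, the quantity being maximized can be evaluated as follows (a sketch assuming a Gaussian mixture; the names `mus`, `covs`, and `priors` are illustrative):

```python
import numpy as np

def log_likelihood(X, mus, covs, priors):
    """Log-likelihood of data X (n, d) under a Gaussian mixture model."""
    n, d = X.shape
    mix = np.zeros(n)
    for mu, cov, p in zip(mus, covs, priors):
        diff = X - mu
        q = np.sum(diff @ np.linalg.inv(cov) * diff, axis=1)  # quadratic form
        dens = np.exp(-0.5 * q) / np.sqrt((2 * np.pi) ** d * np.linalg.det(cov))
        mix += p * dens            # sum_j P(omega_j) p(x | omega_j, theta_j)
    return np.log(mix).sum()       # log of the product over the samples
```

In practice one works with the log-likelihood, since the product over samples underflows quickly in floating point.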

1. Initialize the parameter values $\mu_j$, $\Sigma_j$, and $P(\omega_j)$ for $j = 1, \ldots, K$.
2. (E-step) Compute the probabilities $P(\omega_j \mid x_i)$ of each sample belonging to each class $\omega_j$, based on the parameter values from the previous iteration.
3. (M-step) Re-compute the parameter values:

$$P(\omega_j) = \frac{1}{n} \sum_{i=1}^{n} P(\omega_j \mid x_i), \qquad \mu_j = \frac{\sum_{i=1}^{n} P(\omega_j \mid x_i)\, x_i}{\sum_{i=1}^{n} P(\omega_j \mid x_i)},$$

$$\Sigma_j = \frac{\sum_{i=1}^{n} P(\omega_j \mid x_i)\,(x_i - \mu_j)(x_i - \mu_j)^{T}}{\sum_{i=1}^{n} P(\omega_j \mid x_i)}.$$

4. Repeat steps 2 and 3 until convergence.

## EM versus K-means

(Figures: the true labels, the K-means result, and the EM result compared side by side on the example data sets used earlier.)

## Application: MRI segmentation

(Figures: segmentation of brain MR images, comparing the ground truth, the EM result using T1- and T2-weighted images, the K-means result using T1+T2, and the EM result using only T1; followed by plots of the image intensity data.)