Unsupervised Learning: Hierarchical Methods
Road Map
1 Basic Concepts
2 BIRCH
3 ROCK
The Principle
Example
1 AGNES
Clusters C1 and C2 may be merged
if an object in C1 and an object in C2 form the minimum Euclidean
distance between any two objects from different clusters
2 DIANA
A cluster is split according to some principle
e.g., the maximum Euclidean distance between the closest
neighboring objects in the cluster
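
The AGNES merge rule above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names `min_distance` and `agnes_single_link`, the stopping rule (stop at `k` clusters), and the sample points are all assumptions made for the example.

```python
from itertools import combinations
from math import dist


def min_distance(c1, c2):
    """Smallest Euclidean distance between an object of c1 and an object of c2."""
    return min(dist(p, q) for p in c1 for q in c2)


def agnes_single_link(points, k):
    """Bottom-up merging: repeatedly merge the two closest clusters until k remain.

    Assumption for this sketch: the process stops at a fixed number of clusters k.
    """
    clusters = [[p] for p in points]  # every object starts as its own cluster
    while len(clusters) > k:
        # pair of clusters whose closest objects are nearest to each other
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: min_distance(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]  # i < j, so deleting j is safe
        del clusters[j]
    return clusters


# Hypothetical sample data
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (9.0, 9.0)]
result = agnes_single_link(points, 2)
```

With these sample points the two pairs at distance 1 are merged first, and the isolated point (9, 9) then joins the cluster containing (5, 6), which is its nearest neighbor.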
Use cases
1 An algorithm that uses the minimum distance to measure the
distance between clusters is sometimes called a nearest-neighbor
clustering algorithm
2 If the clustering process terminates when the minimum distance
between nearest clusters exceeds an arbitrary threshold, it is
called a single-linkage algorithm
3 An agglomerative algorithm that uses the minimum distance
measure is also called a minimal spanning tree algorithm
Use cases
1 An algorithm that uses the maximum distance to measure the
distance between clusters is sometimes called a farthest-neighbor
clustering algorithm
2 If the clustering process terminates when the maximum distance
between nearest clusters exceeds an arbitrary threshold, it is
called a complete-linkage algorithm
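
To make the difference between the minimum and maximum distance measures concrete, here is a small sketch of threshold-terminated agglomeration with a pluggable cluster distance. The names `d_min`, `d_max`, `agglomerate`, the threshold value, and the one-dimensional sample data are assumptions chosen for illustration.

```python
from itertools import combinations
from math import dist


def d_min(c1, c2):
    """Minimum distance (nearest neighbor) between two clusters."""
    return min(dist(p, q) for p in c1 for q in c2)


def d_max(c1, c2):
    """Maximum distance (farthest neighbor) between two clusters."""
    return max(dist(p, q) for p in c1 for q in c2)


def agglomerate(points, cluster_distance, threshold):
    """Merge the nearest pair of clusters until their distance exceeds threshold."""
    clusters = [[p] for p in points]
    while len(clusters) > 1:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: cluster_distance(clusters[ij[0]], clusters[ij[1]]),
        )
        if cluster_distance(clusters[i], clusters[j]) > threshold:
            break  # the nearest clusters are too far apart: terminate
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical data: a chain of evenly spaced points plus one distant point
points = [(0.0,), (1.0,), (2.0,), (3.0,), (10.0,)]
single = agglomerate(points, d_min, threshold=1.5)
complete = agglomerate(points, d_max, threshold=1.5)
```

On this data the minimum distance chains the four evenly spaced points into one cluster, while the maximum distance stops at pairs, because merging two pairs would create a cluster whose farthest members are 3 apart.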
The minimum and maximum distances are extremes, so both are
overly sensitive to outliers and noisy data
Road Map
1 Basic Concepts
2 BIRCH
3 ROCK
BIRCH
Data Objects: 1, 2, 3, 4, 5, 6 (current object: 2)
Data Objects: 1, 2, 3, 4, 5, 6 (current object: 3)
Data Objects: 1, 2, 3, 4, 5, 6 (current object: 4)
Data Objects: 1, 2, 3, 4, 5, 6 (current object: 5)
Entry 2 is the closest to object 5
Cluster 3 would become too large if object 5 were added, so
should cluster 3 be split?
BUT there is a limit to the number of entries a node can have
Thus, split the node
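
The decision walked through above (absorb into the closest entry, otherwise open a new entry, and split the node when it holds too many entries) can be sketched as follows. This is a simplified illustration under stated assumptions: entries store their raw points instead of clustering features, the size test uses the cluster radius against a hypothetical `threshold`, the node capacity is a hypothetical `max_entries`, and the split seeds are the two entries whose centroids are farthest apart.

```python
from itertools import combinations
from math import dist


def centroid(points):
    """Coordinate-wise mean of a list of points."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))


def radius(points):
    """Root-mean-square distance of the points from their centroid."""
    c = centroid(points)
    return (sum(dist(p, c) ** 2 for p in points) / len(points)) ** 0.5


def insert_into_leaf(leaf, point, threshold, max_entries):
    """Insert point into leaf (a list of entries, each entry a list of points).

    Returns a list with one leaf, or with two leaves if the node was split.
    """
    if leaf:
        closest = min(leaf, key=lambda e: dist(centroid(e), point))
        if radius(closest + [point]) <= threshold:
            closest.append(point)  # the entry stays small enough: absorb
            return [leaf]
    leaf.append([point])  # otherwise the point starts a new entry
    if len(leaf) <= max_entries:
        return [leaf]
    # Too many entries: split the node around the two farthest entries.
    a, b = max(
        combinations(leaf, 2),
        key=lambda ab: dist(centroid(ab[0]), centroid(ab[1])),
    )
    left, right = [], []
    for e in leaf:
        near_a = dist(centroid(e), centroid(a)) <= dist(centroid(e), centroid(b))
        (left if near_a else right).append(e)
    return [left, right]


# Hypothetical example: the point is absorbed by the only entry
leaf_a = [[(0.0, 0.0), (0.0, 1.0)]]
absorbed = insert_into_leaf(leaf_a, (0.0, 0.5), threshold=1.0, max_entries=2)

# Hypothetical example: a third entry exceeds max_entries and forces a split
leaf_b = [[(0.0, 0.0), (0.0, 1.0)], [(10.0, 10.0)]]
halves = insert_into_leaf(leaf_b, (5.0, 5.0), threshold=1.0, max_entries=2)
```

In a full CF tree the split would also propagate upward when the parent node overflows; that step is left out of this sketch.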
Data Objects: 1, 2, 3, 4, 5, 6 (current object: 6)
Clustering Feature
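Clustering Feature (CF): CF = (N, LS, SS), where N is the number
of data points in the cluster, LS is their linear sum, and SS is
their square sum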
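
A short sketch of how a clustering feature can be computed and combined. The class name `CF`, its method names, and the sample points are assumptions for illustration. SS is kept here as a single scalar (the sum of squared norms); some presentations keep one value per dimension instead.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass(frozen=True)
class CF:
    n: int      # N: number of data points
    ls: tuple   # LS: linear sum of the points, one value per dimension
    ss: float   # SS: sum of the squared norms of the points

    @classmethod
    def from_point(cls, point):
        return cls(1, tuple(point), sum(x * x for x in point))

    def __add__(self, other):
        # Additivity: the CF of a merged cluster is the component-wise sum.
        return CF(
            self.n + other.n,
            tuple(a + b for a, b in zip(self.ls, other.ls)),
            self.ss + other.ss,
        )

    def centroid(self):
        return tuple(x / self.n for x in self.ls)

    def radius(self):
        # R^2 = SS/N - ||LS/N||^2, clamped at 0 against rounding error
        c2 = sum(x * x for x in self.centroid())
        return sqrt(max(self.ss / self.n - c2, 0.0))


# Hypothetical sample points
points = [(3, 4), (2, 6), (4, 5), (4, 7), (3, 8)]
cf = CF.from_point(points[0])
for p in points[1:]:
    cf = cf + CF.from_point(p)
# cf is now CF(n=5, ls=(16, 30), ss=244)
```

Because the three components are plain sums, the centroid and radius of a cluster can be derived from its CF alone, without revisiting the individual points.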
Distance Measures
CF Tree
CF Tree Insertion
BIRCH Algorithm
Data
⇓
Phase 1. Load into memory by building a CF tree
⇓
Initial CF tree
⇓
Phase 2 (optional). Condense tree into desirable range by building a smaller CF tree
⇓
Smaller CF tree
⇓
Phase 3. Global Clustering
⇓
Good Clusters
⇓
Phase 4 (optional and offline). Cluster Refining
⇓
Better Clusters
Phase 2
A bridge between phase 1 and phase 3
Builds a smaller CF tree by increasing the threshold
Phase 3
Apply global clustering algorithm to the sub-clusters given by leaf
entries of the CF tree
Improves clustering quality
Phase 4
Scan the entire dataset to label the data points
Outlier handling
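
The flow of the phases can be illustrated with a heavily simplified sketch. It is not the BIRCH algorithm itself: Phase 1 keeps a flat list of sub-clusters instead of a CF tree, absorption is decided by distance to the centroid against a hypothetical `threshold`, Phase 2 is omitted, and Phase 3 uses a simple centroid-based agglomeration as the global clustering step. All function names and the sample data are assumptions.

```python
from itertools import combinations
from math import dist


def centroid(cf):
    """Centroid of a sub-cluster summarized as (N, LS)."""
    n, ls = cf
    return tuple(x / n for x in ls)


def merge(cf_a, cf_b):
    """Summaries are additive, so merging is a component-wise sum."""
    return (cf_a[0] + cf_b[0], tuple(a + b for a, b in zip(cf_a[1], cf_b[1])))


def phase1_subclusters(points, threshold):
    """Phase 1 (simplified): one scan that summarizes the data as sub-clusters."""
    subs = []
    for p in points:
        new = (1, tuple(p))
        if subs:
            i = min(range(len(subs)), key=lambda i: dist(centroid(subs[i]), new[1]))
            if dist(centroid(subs[i]), new[1]) <= threshold:
                subs[i] = merge(subs[i], new)  # absorb into the closest sub-cluster
                continue
        subs.append(new)
    return subs


def phase3_global(subs, k):
    """Phase 3 (simplified): agglomerate the sub-clusters until k clusters remain."""
    clusters = list(subs)
    while len(clusters) > k:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: dist(centroid(clusters[ij[0]]), centroid(clusters[ij[1]])),
        )
        clusters[i] = merge(clusters[i], clusters[j])
        del clusters[j]
    return clusters


def phase4_labels(points, clusters):
    """Phase 4 (simplified): rescan the data and label each point by nearest centroid."""
    return [
        min(range(len(clusters)), key=lambda c: dist(centroid(clusters[c]), tuple(p)))
        for p in points
    ]


# Hypothetical one-dimensional data
points = [(0.0,), (0.4,), (1.0,), (1.3,), (8.0,), (8.3,)]
subs = phase1_subclusters(points, threshold=0.5)
clusters = phase3_global(subs, k=2)
labels = phase4_labels(points, clusters)
```

Note that Phase 3 only ever touches the sub-cluster summaries, never the raw points, which is why the global clustering step stays cheap even when the dataset is large.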
Road Map
1 Basic Concepts
2 BIRCH
3 ROCK
ROCK
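Worst-case time complexity: $O(n^2 + n\,m_m m_a + n^2 \log n)$,
where $n$ is the number of objects, $m_m$ is the maximum number of
neighbors of an object, and $m_a$ is the average number of neighbors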
Summary