Q1. What is a cluster?
A cluster can be defined in several ways:
• A subset of objects that are “similar”.
• A subset of objects such that the distance between any two objects in the cluster is less than the distance between any object in the cluster and any object outside it.
• A connected region of a multidimensional space containing a relatively high density of objects.
Clustering is the process of partitioning a set of data (or objects) into a set of meaningful sub-classes, called clusters. It helps users understand the natural grouping or structure in a data set.
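The distance-based definition of a cluster can be checked directly. This is a minimal sketch, assuming points are coordinate tuples and Euclidean distance is used; the function name `is_well_separated` is hypothetical. A subset qualifies when its largest internal distance is smaller than the smallest distance from one of its members to any object outside it.

```python
import itertools
import math

def is_well_separated(cluster, others):
    """Hypothetical helper: True if every pairwise distance inside `cluster`
    is smaller than every distance from a member to a point in `others`."""
    # Largest distance between two members (0.0 if the cluster has < 2 points)
    max_inside = max(
        (math.dist(a, b) for a, b in itertools.combinations(cluster, 2)),
        default=0.0,
    )
    # Smallest distance from a member to any object outside the cluster
    min_outside = min(
        (math.dist(a, b) for a in cluster for b in others),
        default=math.inf,
    )
    return max_inside < min_outside

print(is_well_separated([(0, 0), (0, 1), (1, 0)], [(5, 5), (6, 5)]))  # True
print(is_well_separated([(0, 0), (4, 0)], [(2, 0)]))  # False
```

In the second call the outside point (2, 0) lies between the two members, so it is closer to them than they are to each other and the definition fails.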
Q2. Types of Clusterings
A clustering is a set of clusters.
The important distinction is between hierarchical and partitional sets of clusters:
• Partitional Clustering
– A division of data objects into non-overlapping subsets (clusters) such that each data object is in exactly one subset
– Construct various partitions and then evaluate them by some criterion (e.g., minimizing the sum of squared errors, as K-means does)
– Nonhierarchical: each instance is placed in exactly one of K non-overlapping clusters.
– Since only one set of clusters is output, the user normally has to input the desired number of clusters K.
• Hierarchical clustering
– A set of nested clusters organized as a hierarchical tree
– Create a hierarchical decomposition of the set of objects using some criterion (we will see an example called BIRCH)
Q3. Clustering Algorithms
• Partitional: K-means
• Hierarchical clustering
K-means Clustering (Partitional):
• Partitional clustering approach
• Each cluster is associated with a centroid (center point)
• Each point is assigned to the cluster with the closest centroid
• Number of clusters, K, must be specified
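• The basic algorithm is very simple:
– Select K points as the initial centroids.
– Repeat: form K clusters by assigning every point to its closest centroid, then recompute the centroid of each cluster.
– Stop when the centroids no longer change.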
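The basic algorithm can be sketched in plain Python as follows. This is a minimal illustration, not a reference implementation: it assumes points are numeric tuples, uses Euclidean distance, picks K random input points as the initial centroids, and lets an empty cluster keep its previous centroid. The function name `kmeans` and its `max_iter` and `seed` parameters are hypothetical.

```python
import math
import random

def kmeans(points, k, max_iter=100, seed=0):
    """Hypothetical basic K-means on a list of numeric tuples."""
    rng = random.Random(seed)
    # Select K input points as the initial centroids
    centroids = [tuple(p) for p in rng.sample(points, k)]
    clusters = []
    for _ in range(max_iter):
        # Assignment: each point joins the cluster with the closest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            dists = [math.dist(p, c) for c in centroids]
            clusters[dists.index(min(dists))].append(p)
        # Update: recompute each centroid as the mean of its cluster
        # (an empty cluster keeps its previous centroid)
        new_centroids = [
            tuple(sum(coord) / len(pts) for coord in zip(*pts)) if pts else c
            for c, pts in zip(centroids, clusters)
        ]
        # Stop when the centroids no longer change
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters

data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, clusters = kmeans(data, k=2)
print(centroids)
print(clusters)
```

Because the result depends on the initial centroids, fixing `seed` makes a run reproducible; different seeds can lead to different final clusterings on less well-separated data.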
[Figure: Two different K-means clusterings of the same set of points (scatter plots, y-axis from 0 to 3)]
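K-means can stop at different clusterings of the same data. A small hypothetical example (four points at the corners of a 4 × 1 rectangle, K = 2) shows why: both the left/right split and the top/bottom split are stable, because reassigning each point to its nearest centroid changes nothing, so K-means may end at either one depending on its initial centroids. The sum of squared errors (SSE) then tells them apart. The helper names `centroid`, `sse` and `is_stable` are illustrative.

```python
import math

def centroid(points):
    """Mean of a list of equal-length coordinate tuples."""
    return tuple(sum(coord) / len(points) for coord in zip(*points))

def sse(clusters):
    """Sum of squared Euclidean distances from each point to its cluster centroid."""
    return sum(math.dist(p, centroid(pts)) ** 2 for pts in clusters for p in pts)

def is_stable(clusters):
    """True if assigning every point to its nearest centroid reproduces
    `clusters`, i.e. K-means would stop at this clustering."""
    cents = [centroid(pts) for pts in clusters]
    for i, pts in enumerate(clusters):
        for p in pts:
            dists = [math.dist(p, c) for c in cents]
            if dists.index(min(dists)) != i:
                return False
    return True

left_right = [[(0, 0), (0, 1)], [(4, 0), (4, 1)]]
top_bottom = [[(0, 0), (4, 0)], [(0, 1), (4, 1)]]

for name, clustering in [("left/right", left_right), ("top/bottom", top_bottom)]:
    print(name, "stable:", is_stable(clustering), "SSE:", sse(clustering))
# left/right stable: True SSE: 1.0
# top/bottom stable: True SSE: 16.0
```

Since both clusterings are valid stopping points, K-means is commonly run several times from different initial centroids, keeping the result with the smallest SSE.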
Hierarchical Clustering
• Produces a set of nested clusters organized as a hierarchical tree
• Can be visualized as a dendrogram
– A tree-like diagram that records the sequence of merges or splits
[Figure: Dendrogram over six points (leaves in order 1, 3, 2, 5, 4, 6; merge heights from 0 to 0.2) alongside the corresponding nested-cluster diagram]
• Does not require assuming any particular number of clusters
– Any desired number of clusters can be obtained by ‘cutting’ the dendrogram at the proper level
• The clusters may correspond to meaningful taxonomies
– Examples in the biological sciences (e.g., the animal kingdom, phylogeny reconstruction, …)
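A minimal sketch of agglomerative (bottom-up) hierarchical clustering that records the sequence of merges a dendrogram draws, plus a "cut" that recovers any desired number of clusters. It assumes single-link (MIN) distance between clusters, which is only one of several possible linkage choices, and uses a naive search over all pairs; the function names `single_link`, `agglomerate` and `cut` are hypothetical.

```python
import math

def single_link(a, b):
    """Single-link (MIN) distance: the closest pair of points across two clusters."""
    return min(math.dist(p, q) for p in a for q in b)

def agglomerate(points):
    """Naive agglomerative clustering.

    Starts with every point in its own cluster and repeatedly merges the two
    closest clusters. Returns the merges in order as (cluster_a, cluster_b,
    height) tuples, where height is the single-link distance of the merge.
    """
    clusters = [(p,) for p in points]  # each cluster is a tuple of points
    merges = []
    while len(clusters) > 1:
        pairs = [(i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))]
        i, j = min(pairs, key=lambda ij: single_link(clusters[ij[0]], clusters[ij[1]]))
        a, b = clusters[i], clusters[j]
        merges.append((a, b, single_link(a, b)))
        del clusters[j]  # delete the higher index first so index i stays valid
        del clusters[i]
        clusters.append(a + b)
    return merges

def cut(points, merges, k):
    """Return the k clusters obtained by replaying the first len(points) - k merges."""
    clusters = [(p,) for p in points]
    for a, b, _ in merges[: len(points) - k]:
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a + b)
    return clusters

pts = [(0, 0), (0, 1), (5, 0), (5, 1.5), (20, 0)]
merges = agglomerate(pts)
print([h for _, _, h in merges])  # [1.0, 1.5, 5.0, 15.0]
print(cut(pts, merges, 3))  # [((20, 0),), ((0, 0), (0, 1)), ((5, 0), (5, 1.5))]
```

Single-link merge heights never decrease, so replaying the first n - k merges gives the same K clusters as cutting the dendrogram at a height between the (n - k)th and (n - k + 1)th merge heights, when those two heights differ.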