Presentation Outline
Introduction to Clustering
Abstract
Existing System
Proposed System
Experimental Design
Experimental Results
Conclusion
July 22, 2013
Clustering
Introduction
Clustering: grouping similar kinds of data. Data clustering concerns how to group a set of objects based on the similarity of their attributes.
Main methods
ABSTRACT
Categorical data clustering methods generate results based on incomplete information, which degrades the quality of the clustering result. This paper presents a new link-based approach for categorical data clustering that improves results by discovering unknown entries through the similarity between clusters.
Existing Methods
Proposed Methods
The link-based approach improves the matrix by discovering its unknown entries.
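The idea of discovering unknown entries can be sketched as follows. This is only an illustration under stated assumptions, not the presented method itself: it assumes that a zero entry for an object and a cluster is replaced by the similarity between that cluster and the cluster the object actually belongs to in the same base clustering. The function name `refine_matrix` and its parameters `binary`, `cluster_sim`, and `clustering_of` are hypothetical.

```python
import numpy as np

def refine_matrix(binary, cluster_sim, clustering_of):
    """Fill the zero entries of a binary object-by-cluster matrix.

    binary        : (n_objects, n_clusters) 0/1 membership matrix
    cluster_sim   : (n_clusters, n_clusters) cluster similarities in [0, 1]
    clustering_of : for each cluster column, the index of the base
                    clustering it came from (hypothetical bookkeeping)
    """
    binary = np.asarray(binary, dtype=float)
    cluster_sim = np.asarray(cluster_sim, dtype=float)
    clustering_of = np.asarray(clustering_of)
    refined = binary.copy()
    n_objects, n_clusters = binary.shape
    for i in range(n_objects):
        for j in range(n_clusters):
            if binary[i, j] == 1:
                continue  # known entry: the object is a member of cluster j
            # Assumption: look up the cluster of the same base clustering
            # that object i does belong to, and use its similarity to j.
            same = (clustering_of == clustering_of[j]) & (binary[i] == 1)
            owners = np.flatnonzero(same)
            if owners.size:
                refined[i, j] = cluster_sim[j, owners[0]]
    return refined

# Two base clusterings with two clusters each (columns 0-1 and 2-3).
example_binary = [[1, 0, 1, 0],
                  [0, 1, 1, 0],
                  [0, 1, 0, 1]]
example_sim = [[1.0, 0.5, 0.0, 0.0],
               [0.5, 1.0, 0.0, 0.0],
               [0.0, 0.0, 1.0, 0.25],
               [0.0, 0.0, 0.25, 1.0]]
example_refined = refine_matrix(example_binary, example_sim, [0, 0, 1, 1])
```

Known entries stay at 1, while every former zero now carries a graded value taken from the supplied cluster similarities.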
Introduction to NLCD
Basic process
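[Figure: Dataset X is passed to the base clusterings Clustering 1, Clustering 2, ..., Clustering M, whose outputs are combined by a consensus function into the final clustering. Matrix representations shown: pairwise-similarity matrix and binary matrix.]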
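The two matrix representations named in the basic process can be illustrated with a small sketch. It assumes each base clustering is given as a vector of integer labels, one per object, that the binary matrix has one column per cluster, and that the pairwise-similarity matrix holds the fraction of base clusterings in which two objects share a cluster. The function names are hypothetical.

```python
import numpy as np

def binary_association_matrix(labelings):
    """One row per object, one 0/1 column per cluster of every base clustering."""
    columns = []
    for labels in labelings:
        labels = np.asarray(labels)
        for cluster_id in np.unique(labels):
            columns.append((labels == cluster_id).astype(int))
    return np.column_stack(columns)

def pairwise_similarity_matrix(labelings):
    """Entry (i, j) is the fraction of base clusterings that put i and j together."""
    labelings = [np.asarray(labels) for labels in labelings]
    n_objects = len(labelings[0])
    together = np.zeros((n_objects, n_objects))
    for labels in labelings:
        together += (labels[:, None] == labels[None, :])
    return together / len(labelings)

# Two base clusterings of three objects.
example_labelings = [[0, 0, 1], [0, 1, 1]]
example_binary = binary_association_matrix(example_labelings)
example_pairwise = pairwise_similarity_matrix(example_labelings)
```

The binary matrix records only definite memberships, so every non-member entry is 0; those zeros are the entries a link-based refinement would try to estimate.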
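$WTQ_{xy}$, the WTQ measure of $C_x$ and $C_y$;
$WTQ_{xy} \leftarrow 0$
For each $c \in \ldots$
    If $c \in \ldots$
        $WTQ_{xy} \leftarrow WTQ_{xy} + \ldots$
Return $WTQ_{xy}$
Following that, the similarity between clusters $C_x$ and $C_y$ can be estimated by
$\mathrm{Sim}(C_x, C_y) = \ldots$
$w_{xy} = \ldots$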
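A minimal sketch of the WTQ accumulation and the resulting cluster similarity follows. It rests on assumptions drawn from the formulation commonly given for link-based cluster ensembles rather than from these slides: the edge weight between two clusters is the Jaccard overlap of their member sets, each shared neighbor c contributes the reciprocal of its total edge weight, and the similarity is the WTQ value divided by the largest WTQ value over all cluster pairs, scaled by a decay factor. All function names and the default `decay=0.9` are hypothetical.

```python
from itertools import combinations

def jaccard_weight(members_x, members_y):
    """Assumed edge weight: shared members divided by all members."""
    union = members_x | members_y
    return len(members_x & members_y) / len(union) if union else 0.0

def build_graph(clusters):
    """clusters maps a cluster name to its set of object ids.
    Returns an adjacency map that keeps only positive-weight edges."""
    graph = {name: {} for name in clusters}
    for x, y in combinations(clusters, 2):
        weight = jaccard_weight(clusters[x], clusters[y])
        if weight > 0:
            graph[x][y] = weight
            graph[y][x] = weight
    return graph

def wtq(graph, x, y):
    """Accumulate one term per neighbor shared by clusters x and y."""
    total = 0.0
    for c in graph[x]:
        if c in graph[y]:
            # Assumed increment: reciprocal of c's total edge weight.
            total += 1.0 / sum(graph[c].values())
    return total

def cluster_similarity(graph, x, y, decay=0.9):
    """Assumed normalization: WTQ over the maximum pairwise WTQ, times decay."""
    wtq_max = max((wtq(graph, p, q) for p, q in combinations(graph, 2)),
                  default=0.0)
    if wtq_max == 0:
        return 0.0
    return wtq(graph, x, y) / wtq_max * decay

# Two base clusterings of objects 1..4: {A, B} and {C, D}.
example_clusters = {"A": {1, 2}, "B": {3, 4}, "C": {1, 2, 3}, "D": {4}}
example_graph = build_graph(example_clusters)
sim_ab = cluster_similarity(example_graph, "A", "B")
sim_cd = cluster_similarity(example_graph, "C", "D")
```

Clusters A and B share no objects, yet they receive a non-zero similarity because both overlap with C; this is the kind of indirect link that a link-based measure is meant to capture.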
Experimental Results
Input parameters:
Memory (M): 5% of data set
Disk space (R): 20% of M
Initial threshold (T): 0.0
Page size (P): 1024 bytes
Experimental Results
KMEANS clustering
No  Time  D     # Scan   DS  Time  D     # Scan
1   43.9  2.09  289      1o  33.8  1.97  197
2   13.2  4.43  51       2o  12.7  4.20  29
3   32.9  3.66  187      3o  36.0  4.35  241
NLCD clustering
No  Time  D     # Scan   DS  Time  D     # Scan
1   11.5  1.87  2        1o  13.6  1.87  2
2   10.7  1.99  2        2o  12.1  1.99  2
3   11.4  3.95  2        3o  12.2  3.99  2
Conclusions
A new link-based clustering method that stores the clustering features in a matrix.
Given a limited amount of main memory, NLCD can minimize the time required for I/O. The problem of constructing the refined matrix is resolved efficiently by using the similarity among categorical clusters.
Future Work
The first direction for future work is an extensive study of the behavior of other link-based similarity measures within this problem context.
The second is to apply the new method to specific domains, including tourism and medical data sets.
Q&A