
k-NEAREST-NEIGHBOR CONSISTENCY IN DATA

CLUSTERING
BY
Patnala Naveen Kumar - 16KA1A0529
Karjala Kiran – 16KA1A0546
Shaik Mukthiyar – 17KA5A0502
Shaik Mohammed Sharif - 17KA5A0508
Malappagari Dinesh - 17KA5A0515

Under the esteemed guidance of


SYED ZAHEER AHAMMED, M.Tech.,
Ad-hoc Assistant Professor

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING


JAWAHARLAL NEHRU TECHNOLOGICAL UNIVERSITY ANANTAPUR
COLLEGE OF ENGINEERING KALIKIRI
Contents:

• Introduction
• Abstract
• Existing Algorithm
• Proposed Algorithm
• System Requirements
• References
Introduction
• With the continuous expansion of data availability in many areas of engineering and
science, identifying patterns in vast amounts of data and identifying members
of a predefined class, a task known as classification, have become critical.
Therefore, classification is a fundamental problem, especially in pattern
recognition and data mining.
• k-Nearest Neighbor (KNN) classifiers are basic classifiers that assign a query object to
the same category as its nearest examples.
• The efficiency of NN classification heavily depends on the type of distance
measure, especially in a large-scale and high-dimensional database. In some
applications, the data structure is so complicated that the corresponding distance
measure is computationally expensive.
• Traditional KNN adopts a fixed k for all query samples, regardless of their
geometric location and local data characteristics.
• To overcome the above limitations of classification algorithms, we propose a new
method called the Natural Neighborhood-Based Classification Algorithm
(NNBCA), built with the help of the previously proposed Natural Neighbor (NaN)
method.
Abstract
• The K-Nearest Neighbors algorithm can be used for classification: an object is classified by a
majority vote of its neighbors and assigned to the class most common among its k nearest
neighbors. It can also be used for regression, where the predicted value is the average of the
values of its k nearest neighbors (a minimal sketch follows below).
• Various K-Nearest Neighbor based classification methods are the basis of many
well-established and high-performance pattern recognition techniques. However, such
methods are vulnerable to the choice of the parameter k. Essentially, the challenge is to
detect the neighborhood of various data sets without prior knowledge of the data characteristics.
• In this work, we introduce a new supervised classification method, the natural neighbor
algorithm, and show that it provides a better classification result without choosing the
neighborhood parameter artificially. Unlike the original K-Nearest Neighbors based method,
which needs a prior k, the natural neighbor method predicts a different k for different
samples. It is therefore able to learn more from flexible neighbor information in both the
training and testing stages, and provides a better classification result.
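To make the averaging in the first point concrete, here is a minimal sketch of KNN regression
in plain NumPy. The names X_train, y_train and query are illustrative only, not part of the project.

```python
import numpy as np

def knn_regress(X_train, y_train, query, k=3):
    """Predict a value for `query` as the mean target of its k nearest training points."""
    # Euclidean distance from the query to every training point
    dists = np.linalg.norm(X_train - query, axis=1)
    # Indices of the k closest training points
    nearest = np.argsort(dists)[:k]
    # The prediction is the average of their target values
    return y_train[nearest].mean()

# Toy usage: a 1-D feature with roughly linear targets
X_train = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y_train = np.array([1.1, 1.9, 3.2, 3.9, 5.1])
print(knn_regress(X_train, y_train, np.array([2.5]), k=2))  # ~2.55
```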
Existing Algorithm
K-Nearest-Neighbors (KNN)

• The KNN algorithm is a robust and versatile classifier that is often used as a benchmark for more
complex classifiers such as Artificial Neural Networks (ANN) and Support Vector Machines
(SVM). Despite its simplicity, KNN can outperform more powerful classifiers and is used in a
variety of applications such as economic forecasting, data compression and genetics. For example,
KNN was leveraged in a 2006 study of functional genomics for the assignment of genes based on
their expression profiles.
• KNN falls in the supervised learning family of algorithms. Informally, this means that we are
given a labeled dataset consisting of training observations (x, y) and would like to capture the
relationship between x and y. More formally, our goal is to learn a function h: X → Y so
that, given an unseen observation x, h(x) can confidently predict the corresponding output y.
The KNN classifier is also a non-parametric and instance-based learning algorithm.
Existing Algorithm Contd.
How does KNN work?

• In the classification setting, the K-nearest neighbor algorithm essentially boils down to
forming a majority vote between the K most similar instances to a given “unseen”
observation. Similarity is defined according to a distance metric between two data points.
• A popular choice is the Euclidean distance, given by
d(x, x′) = √((x₁ − x′₁)² + (x₂ − x′₂)² + … + (xₙ − x′ₙ)²).
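The distance metric and the majority vote above are all a basic KNN classifier needs. The sketch
below is one straightforward rendering in plain NumPy; the names X_train, y_train and query are
illustrative, not taken from the project.

```python
import numpy as np
from collections import Counter

def knn_classify(X_train, y_train, query, k=3):
    """Classify `query` by a majority vote among its k nearest training points."""
    # Euclidean distance d(x, x') to every training point
    dists = np.sqrt(((X_train - query) ** 2).sum(axis=1))
    # Labels of the k closest points
    nearest_labels = y_train[np.argsort(dists)[:k]]
    # Majority vote decides the class
    return Counter(nearest_labels).most_common(1)[0][0]

# Toy usage: two well-separated 2-D classes
X_train = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
y_train = np.array(['A', 'A', 'B', 'B'])
print(knn_classify(X_train, y_train, np.array([0.2, 0.1]), k=3))  # 'A'
```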

Disadvantages of Existing Algorithm


• K-NN is a slow algorithm, since all distance computations are deferred to query time
• Curse of dimensionality
• K-NN needs homogeneous (comparably scaled) features (illustrated in the sketch after this list)
• Choosing the optimal number of neighbors is difficult
• Imbalanced data causes problems
• Sensitivity to outliers
• Missing values need special treatment
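To illustrate the point about homogeneous features, the short sketch below (hypothetical values:
one feature in metres, one in grams) shows how the large-scale feature dominates the Euclidean
distance unless the features are rescaled first.

```python
import numpy as np

# Hypothetical two-feature points: height in metres, weight in grams
a = np.array([1.60, 60000.0])
b = np.array([1.95, 60100.0])

diff_sq = (a - b) ** 2
print(diff_sq)                  # [0.1225, 10000.0] -- the weight gap dominates
print(np.sqrt(diff_sq.sum()))   # ~100.0006 -- the 0.35 m height gap is invisible

# Rescaling each feature (e.g. z-scoring over the training set) before
# computing distances gives both features comparable influence.
```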
Proposed Algorithm

• Various kinds of k-Nearest Neighbor (KNN) based classification methods are the basis of
many well-established and high-performance pattern recognition techniques. However,
such methods are vulnerable to parameter choice. Essentially, the challenge is to detect the
neighborhood of various data sets without prior knowledge of the data characteristics.
• This work introduces a new supervised classification algorithm, the Natural Neighborhood
Based Classification Algorithm (NNBCA).
• Findings indicate that this new algorithm provides a good classification result without
artificially selecting the neighborhood parameter. Unlike the original KNN-based method,
which needs a prior k, NNBCA predicts a different k for different samples.
• Therefore, NNBCA is able to learn more from flexible neighbor information in both the
training and testing stages, and thus provides a better classification result than other
methods (a rough sketch of the underlying natural neighbor search follows below).
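The deck does not include the NNBCA pseudocode, so the following is only a rough sketch of the
natural neighbor (NaN) search that NNBCA builds on, as it is commonly described: the search
round r grows until every point has at least one reverse neighbor, or until the number of
unreached points stops shrinking, and the final r serves as the NaN Eigenvalue (NaNE). Function
and variable names are illustrative, not the authors' implementation.

```python
import numpy as np

def natural_neighbor_search(X):
    """Rough sketch of the NaN search: grow the round r until every point has a
    reverse neighbor (or no further progress is made); return r as the NaN
    Eigenvalue (NaNE) together with the per-point reverse-neighbor counts."""
    n = len(X)
    # Pairwise Euclidean distances; each row lists a point's neighbors by distance
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    order = np.argsort(d, axis=1)[:, 1:]          # drop the point itself

    nb_count = np.zeros(n, dtype=int)             # reverse-neighbor counts
    prev_unreached = n
    for r in range(1, n):
        # In round r, each point "reaches" its r-th nearest neighbor
        for i in range(n):
            nb_count[order[i, r - 1]] += 1
        unreached = int((nb_count == 0).sum())
        # Stop when everyone has been reached, or when no progress is made
        if unreached == 0 or unreached == prev_unreached:
            return r, nb_count
        prev_unreached = unreached
    return n - 1, nb_count
```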
Proposed Algorithm Contd.
Advantages of Proposed Algorithm

• The NaN method can create an applicable neighborhood graph based on the local
characteristics of various data sets. This neighborhood graph can identify the
basic clusters in the data set, especially manifold clusters and noise.
• The method provides a numeric result, the NaN Eigenvalue (NaNE), that replaces the
parameter k of the traditional KNN method; the value of NaNE is determined
dynamically for each data set.
• The NaN number of each point is flexible: it is a dynamic value ranging from 0 to NaNE.
The center point of a cluster has more neighbors, while the neighbor number of a noise
point is 0 (see the sketch after this list).
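Continuing the sketch above, and again only as an illustration of these bullets rather than the
authors' code, the per-point counts can be used to flag noise (count 0) and NaNE can stand in
for a hand-picked k when classifying a query.

```python
import numpy as np
from collections import Counter

# Toy labelled data (illustrative); `natural_neighbor_search` is the function
# from the previous sketch.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(3, 0.3, (20, 2))])
y = np.array(['A'] * 20 + ['B'] * 20)

nane, counts = natural_neighbor_search(X)
noise = counts == 0                     # points nobody reaches -> treated as noise
print("NaNE:", nane, "flagged as noise:", int(noise.sum()))

def classify(query, k):
    """Plain majority-vote KNN, with k supplied by the NaN search instead of by hand."""
    nearest = np.argsort(np.linalg.norm(X - query, axis=1))[:k]
    return Counter(y[nearest]).most_common(1)[0][0]

print(classify(np.array([2.8, 3.1]), k=nane))   # expected: 'B'
```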
References
1. Big Data Mining and Analytics, ISSN 2096-0654, vol. 1, no. 4, pp. 257–265, December 2018.
DOI: 10.26599/BDMA.2018.9020017.
2. Z. H. Zhou, N. V. Chawla, Y. C. Jin, and G. J. Williams, Big data opportunities and
challenges: Discussions from data analytics perspectives, IEEE Comput. Intell. Mag.,
vol. 9, no. 4, pp. 62–74, 2014.
3. Y. T. Zhai, Y. S. Ong, and I. W. Tsang, The emerging “big dimensionality”, IEEE
Comput. Intell. Mag., vol. 9, no. 3, pp. 14–26, 2014.
4. T. Cover and P. Hart, Nearest neighbor pattern classification, IEEE Trans. Inf. Theory,
vol. 13, no. 1, pp. 21–27, 1967.
5. H. Zhang, A. C. Berg, M. Maire, and J. Malik, SVM-KNN: Discriminative nearest
neighbor classification for visual category recognition, in Proc. 2006 IEEE Computer
Society Conf. on Computer Vision and Pattern Recognition, New York, NY, USA, 2006,
pp. 2126–2136.
6. D. Lunga and O. Ersoy, Spherical nearest neighbor classification: Application to
hyperspectral data, in Machine Learning and Data Mining in Pattern Recognition,
P. Perner, ed., Springer, 2011.
Thank you
