
Summary of

“An Imbalanced Data Classification Method Based on Automatic Clustering Under-Sampling”
The authors, Xiaoheng Deng, Weijian Zhong, and Ju Ren, observe that as data volumes grow day
by day, data become noisier and more complicated, and this introduces new challenges. One of
these challenges is imbalanced data. In an imbalanced data set, the number of negative samples
is greater than the number of positive samples, so applying a standard classifier learning
algorithm and evaluation criterion causes positive samples to be ignored or treated as noise.
Imbalanced data poses the following difficulties:
1. Imbalanced class sizes.
2. Overlapping: normal data are easily submerged.
3. Small disjuncts: these complicate the data distribution.
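To illustrate why a standard evaluation criterion hides the problem, here is a minimal sketch in Python. The data set (95 negative, 5 positive samples) and the always-negative classifier are hypothetical and not taken from the paper; the helper name accuracy_and_recall is also an assumption made for this example.

```python
def accuracy_and_recall(y_true, y_pred, positive=1):
    """Return (overall accuracy, recall on the positive class)."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    true_pos = sum(t == positive and p == positive
                   for t, p in zip(y_true, y_pred))
    n_pos = sum(t == positive for t in y_true)
    recall = true_pos / n_pos if n_pos else 0.0
    return correct / len(y_true), recall


# Hypothetical imbalanced data: 95 negative samples (0), 5 positive samples (1).
y_true = [0] * 95 + [1] * 5
# A degenerate classifier that always predicts the majority (negative) class.
y_pred = [0] * 100

acc, rec = accuracy_and_recall(y_true, y_pred)
print(f"accuracy={acc:.2f}, positive recall={rec:.2f}")
# prints: accuracy=0.95, positive recall=0.00
```

The classifier looks accurate overall even though it never finds a single positive sample, which is the sense in which positive samples are "ignored".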
To deal with the difficulties that standard classifiers and evaluation criteria face, the authors
propose an improved algorithm based on Automatic Clustering and Under-Sampling (ACUS). It works
as follows:
1. Select samples from different clusters.
2. Use variance to determine whether a cluster can be divided.
3. Determine the importance of each cluster by its weight, so that important samples can be found.
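The variance test and the cluster weight can be sketched as below. This is only an illustration under assumptions: the summary does not give the exact formulas, so the split rule (total variance above a threshold), the threshold value, and the definition of cluster weight as the sum of sample weights are all hypothetical choices, as are the function names.

```python
from statistics import pvariance


def cluster_variance(points):
    """Sum of the per-feature population variances of a cluster."""
    features = zip(*points)  # transpose: one tuple per feature
    return sum(pvariance(f) for f in features)


def can_split(points, threshold):
    """Assumed rule: a cluster may be divided if its variance exceeds threshold."""
    return len(points) > 1 and cluster_variance(points) > threshold


def cluster_weight(sample_weights):
    """Assumed definition: cluster importance = sum of its sample weights."""
    return sum(sample_weights)


# Two hypothetical clusters of 2-D negative samples.
tight = [(1.0, 1.0), (1.1, 0.9), (0.9, 1.1)]
spread = [(0.0, 0.0), (4.0, 4.0), (8.0, 0.0)]

THRESHOLD = 1.0  # hypothetical value, not from the paper
print(can_split(tight, THRESHOLD), can_split(spread, THRESHOLD))
```

With these values the tight cluster is left alone and the spread-out cluster is marked as divisible.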
The ACUS algorithm is based on the framework of the AdaBoost algorithm, in which the weights of
samples are modified and a classifier is trained. After some iterations, clusters of negative
samples are divided into sub-clusters until the number of clusters is no longer less than the
maximum number of clusters. Before training a new classifier, samples are extracted according to
the weights of the samples in each cluster. Compared to traditional methods, this method can
detect representative samples better without calculating complex distances. ACUS consists of the
following 3 steps:
1. Clustering of samples in the majority class.
2. Sampling from the clusters.
3. Training the ensemble classifier.
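Two ingredients of steps 2 and 3 can be sketched in Python. Both functions are illustrative assumptions rather than the authors' exact procedure: sample_from_clusters gives each cluster a quota proportional to its total weight and draws samples without replacement in proportion to their weights, and adaboost_reweight is the standard discrete AdaBoost update (labels +1 / -1), which may differ from the variant used in ACUS.

```python
import math
import random


def sample_from_clusters(clusters, n_total, seed=0):
    """Under-sample: each cluster is a list of (sample, weight) pairs.

    Assumed scheme: quota per cluster is proportional to the cluster's
    total weight; within a cluster, samples are drawn without replacement
    with probability proportional to their weight.
    """
    rng = random.Random(seed)
    total_w = sum(w for cluster in clusters for _, w in cluster)
    picked = []
    for cluster in clusters:
        cluster_w = sum(w for _, w in cluster)
        quota = min(len(cluster), round(n_total * cluster_w / total_w))
        pool = list(cluster)
        for _ in range(quota):
            idx = rng.choices(range(len(pool)),
                              weights=[w for _, w in pool])[0]
            picked.append(pool.pop(idx)[0])
    return picked


def adaboost_reweight(weights, y_true, y_pred):
    """Standard discrete AdaBoost weight update for labels in {+1, -1}."""
    err = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p)
    err /= sum(weights)
    err = min(max(err, 1e-10), 1 - 1e-10)  # keep the logarithm finite
    alpha = 0.5 * math.log((1 - err) / err)
    new = [w * math.exp(-alpha * t * p)
           for w, t, p in zip(weights, y_true, y_pred)]
    total = sum(new)
    return [w / total for w in new], alpha


# Hypothetical negative-class clusters: (sample id, weight).
clusters = [
    [("a1", 1.0), ("a2", 1.0), ("a3", 1.0), ("a4", 1.0)],
    [("b1", 3.0), ("b2", 3.0)],
]
picked = sample_from_clusters(clusters, n_total=4)

# One boosting round in which the second sample is misclassified.
new_w, alpha = adaboost_reweight([0.25] * 4, [1, 1, -1, -1], [1, -1, -1, -1])
```

After the update the misclassified sample carries half of the total weight, so the next round of sampling and training concentrates on it.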
In its clustering procedure, the time complexity of ACUS is no worse than that of K-Means and
better than that of hierarchical clustering. In addition, ACUS yields significantly better
results. The reported complexities are:
ACUS: O(N_n log(N_n) t), O(N_p log N_n), and O(tB), respectively
K-Means: O(Ntd)
Hierarchical clustering: O(N^2 d log N)
Advantages:
1. ACUS distinguishes negative samples that are close to positive samples better than the
K-Means and hierarchical clustering algorithms.
2. ACUS is good at screening out unimportant negative samples as noise by dividing
them into several separate clusters.
3. ACUS is more efficient and effective in selecting useful samples from both the positive
and negative samples.
Disadvantages:
1. When the degree of sample overlap is high, ACUS ignores most of the overlapping
negative samples.
2. Under-sampling makes ACUS unstable, so the kappa coefficient is computed by running
every algorithm 5 times on the training set, which gives 10 experimental
results.
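For reference, the kappa coefficient and its averaging over repeated runs can be sketched as follows. This assumes the summary means Cohen's kappa; the labels and the three repeated runs are made up for illustration and are not the paper's experimental results.

```python
from statistics import mean


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: agreement between labels, corrected for chance."""
    n = len(y_true)
    labels = set(y_true) | set(y_pred)
    p_observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    p_chance = sum((y_true.count(l) / n) * (y_pred.count(l) / n)
                   for l in labels)
    if p_chance == 1:
        return 0.0  # kappa is undefined here; 0.0 is an arbitrary convention
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical labels: 2 positive and 8 negative samples.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

# Hypothetical predictions from repeated runs of an unstable sampler.
runs = [
    [1, 1, 0, 0, 0, 0, 0, 0, 0, 0],  # perfect run
    [1, 0, 0, 0, 0, 0, 0, 0, 0, 0],  # misses one positive sample
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],  # always predicts negative
]
kappas = [cohen_kappa(y_true, run) for run in runs]
print("mean kappa over runs:", mean(kappas))
```

Unlike plain accuracy, kappa is 0 for the always-negative run, and averaging over runs smooths out the variation that random under-sampling introduces.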
