
SPIS 2015, 16-17 Dec. 2015, Amirkabir University of Technology, Tehran, IRAN


Dynamic Feature Weighting for Imbalanced Data Sets
Maryam Dialameh
Department of Computer Science and Engineering and Information Technology, Shiraz University, Shiraz, Iran
maryam.dialameh@yahoo.com
Mansoor Zolghadri Jahromi
Department of Computer Science and Engineering and Information Technology, Shiraz University, Shiraz, Iran
z.zjahromi@gmail.com
Abstract: Most data mining algorithms, including classifiers, suffer from data sets with a highly imbalanced distribution of the target variable. The problem becomes more serious when the events have different costs. Feature weighting and instance weighting are the two most common ways to tackle this problem. However, none of the current weighting methods take into account the salience of features. To accomplish this, a novel and flexible weighting function is proposed that dynamically assigns a proper weight to each feature. Experimental results show that the proposed weighting function is superior to current methods.
Keywords: imbalanced data; feature weighting; feature salience.
I. INTRODUCTION
Imbalanced data set problems refer to a type of classification problem in which one class has far fewer data samples than the other classes. Learning from imbalanced data sets has been identified as one of the 10 most challenging problems in data mining research [1]. Imbalanced data sets appear in many domains such as medical applications [2], risk management [3], face recognition [4], and so on. While there are multiclass data in which imbalances exist between the various classes, we work on binary-class imbalanced data sets, where there is only one positive and one negative class. The positive and negative classes are considered the minority and majority classes, respectively. There are two general ways to handle the problem of imbalanced data sets: 1) data-oriented methods, which change the distribution of the data to re-balance it, and 2) algorithm-oriented methods, which extend or modify existing algorithms to deal with imbalanced data [5]. The second way is preferable to the first because there is no need to change the distribution of the data. Therefore, in this paper, an algorithm-oriented method is proposed that tries to solve the problem of imbalanced data using a proper feature weighting schema.
There are many works that have tried to counter the effect of imbalanced data using a weighting schema [6], but none of them take into account the salience of features. They suppose that the salience of a feature is constant and is not affected by what it is being compared to [7]. To explain feature salience, a good classification example introduced in [7] is given below.
Suppose that a 1-NN classifier is used to classify queries as retired or non-retired, and there is only one training instance: <man, 50 years old, worker, very healthy, class label = non-retired>. Given three queries to classify, Q1: <man, 25 years old, worker, very healthy>, Q2: <man, 60 years old, worker, very healthy>, and Q3: <man, 70 years old, worker, very healthy>, our intuitive understanding would lead us to sort the queries' similarities to the training instance as Q1 > Q2 > Q3, which seems reasonable. However, the result is different if we use a local/global weight schema embedded in 1-NN: Q2 > Q3 > Q1, which seems unreasonable, because the probability that the first man is non-retired is higher than for the others while the similarities suggest the opposite. Based on our intuitive understanding, we estimate different weights for the feature age depending on the query, while the local/global schema does not take this dependency into account. The reason is that the salience of a feature can change for different queries, and current weighting methods do not have enough expressive capability to reflect such a change in feature salience. The problem of feature salience can be solved by a dynamic weighting schema in which the feature weight of an instance depends on both the instance and the query. This dynamicity needs to be determined at runtime, during the similarity measurement.
We believe that the effect of feature salience may be very important in imbalanced data problems, especially in classification tasks. To explain further, assume that figure 1 represents the class distribution of an imbalanced data set where the shaded region corresponds to the conflict region (i.e. a region around the decision boundary). Even in the best case, most feature weighting methods tend to treat the whole shaded region as the minority class; their false positive rates therefore increase. The problem is that they do not take into account the effect of the query in their decision rules. This problem can be solved using a context-dependent weighting schema, named dynamic feature weighting, so that the query takes effect in the weight assignment process. The experimental results support this assertion.
Fig. 1. Data imbalance ratio within and outside of the margin [4].
K-nearest neighbor (K-NN) algorithms have been identified as among the most influential data mining algorithms [8]. The nearest neighbor (NN) algorithm is the simplest form of K-NN, with K = 1. The NN algorithm has many advantages such as simplicity, effectiveness, and popularity. Also, it has been proved in [9] and [10] that the NN algorithm has an error rate of at most twice the Bayes error rate, independent of the distance metric used. However, it performs very badly on highly imbalanced problems. Therefore, we aim to modify the NN algorithm using the proposed dynamic weighting function in order to assess the quality of the proposed weighting schema.
The main contributions of this work are: i) proposing a novel and flexible weighting function that dynamically assigns feature weights; ii) optimizing the nearest neighbor algorithm for imbalanced data problems using the proposed weighting function.
The paper is organized as follows. Section II describes the most relevant current related work. Section III presents the proposed weighting function and its learning process for the nearest neighbor algorithm. Experimental results are presented in section IV. Section V concludes the paper.
II. RELATED WORK
Over-sampling and under-sampling are two of the most common data-oriented methods. Over-sampling adds data to the minority class, while under-sampling removes data from the majority class. The Synthetic Minority Over-sampling Technique (SMOTE) [11] is a well-known method that over-samples the minority class by generating new synthetic data. Safe-Level SMOTE [12] is an improvement on SMOTE which uses a weight degree, called safe level, to ignore the effect of nearby majority instances.
Overgeneralization is the main drawback of over-sampling methods [13]. ROSE (Random Over Sampling Examples) [14] is another framework, based on a smoothed bootstrap re-sampling technique. ROSE uses both over- and under-sampling to generate a new data set from the given training set. Weighted Distance Nearest Neighbor (WDNN) [15] is a prototype reduction method based on retaining the informative instances and learning their weights to improve the performance of the NN algorithm on training data. However, WDNN cannot work for K-NN where K is greater than 1. WDKNN [16] is an improvement on WDNN which attempts to reduce the time complexity of WDNN and extends it to work with K greater than 1. Class Confidence Weighted (CCW) [5] is a weighting method that tries to improve the performance of K-NN on imbalanced data sets. CCW uses the probability of attribute values given the class labels to weight the K closest neighbors of a test instance. Class Confidence Proportion Decision Tree (CCPDT) [17] is an algorithm-oriented method that modifies standard C4.5 to deal with imbalanced data. CCPDT proposes a metric named class confidence proportion (CCP) to replace ordinary information gain and overcome its bias. The work proposed in [18] tries to solve the imbalanced data problem for mixed-type data (data with both categorical and numerical features) by considering the relationship between the mixed features. Peng et al. [19] proposed an extension of the data gravitation-based classification (DGC) model, namely Imbalanced DGC (IDGC), for imbalanced problems. They used a gravitation coefficient that contains class imbalance information, which strengthens and weakens the gravitational fields of the minority and majority classes, respectively. Weighted Multi-class Least Squares Twin Support Vector Machine (WMLSTSVM), proposed in [20], is an approach that addresses the problem of imbalanced data classification for multiple classes using an appropriate weight setting in the loss function.
An interesting method, called LPD (Learning Prototypes and Distances) [21], is a sample reduction and weighting method that improves the performance of the NN algorithm. LPD uses an objective function related to the NN error rate to simultaneously learn a reduced set of training data (called prototypes) and their associated feature weights. A prototype weighting method is proposed in Ref. [22] to improve the performance of NN on imbalanced data sets. It uses a locally adapted distance to increase the chance of the minority class being the nearest neighbor of a query instance.
None of the previous methods take feature salience into account in imbalanced data problems. To accomplish this, in this paper we propose a dynamic weight function optimized for the NN algorithm to handle imbalanced data problems. Our optimization and learning process for the weight-function parameters is closely related to LPD. Because accuracy is highly sensitive to changes in the data [23], we use an objective function that is an approximation of the geometric mean (G-mean) of the per-class true rates [24]. The G-mean is defined as follows:

G-mean = sqrt(ACC+ × ACC-) = sqrt((TP / (TP + FN)) × (TN / (TN + FP)))    (1)

where ACC stands for the per-class accuracy and the other symbols are defined in the following matrix, named the confusion matrix (i.e. a table that is used to describe the performance of a classifier).
TABLE I. CONFUSION MATRIX.
                   Positive Prediction     Negative Prediction
Positive Class     True Positive (TP)      False Negative (FN)
Negative Class     False Positive (FP)     True Negative (TN)
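As a quick illustration of eq. (1) and Table I, the sketch below computes the per-class accuracies and the G-mean from confusion-matrix counts; the counts used here are made-up numbers for demonstration only.

```python
import math

def g_mean(tp, fn, fp, tn):
    """G-mean of the per-class true rates, as in eq. (1)."""
    acc_pos = tp / (tp + fn)   # true positive rate (minority-class accuracy)
    acc_neg = tn / (tn + fp)   # true negative rate (majority-class accuracy)
    return math.sqrt(acc_pos * acc_neg)

# Example: a classifier that ignores the minority class entirely can still
# reach high plain accuracy but gets a G-mean of zero.
print(g_mean(tp=0, fn=20, fp=0, tn=280))    # 0.0
print(g_mean(tp=16, fn=4, fp=14, tn=266))   # ~0.87
```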
III. PROPOSED METHOD
We are interested in the problem of learning a weight function that enhances the performance of the NN algorithm on imbalanced data sets. First, we define a novel and flexible weight function, and then a learning process is proposed to learn the parameters of the proposed weight function.

A. Proposed weight function


There are many weight functions proposed in the literature [25], but none of them is flexible enough to generate any desired shape, such as strictly increasing/decreasing, first decreasing and then increasing, single-modal, and so on. Here, we propose a weight function constructed by multiplying two different sigmoid functions:

FW_j(z) = S(z; C^1_j, B^1_j) × S(z; C^2_j, B^2_j)    (2)

S(z; C, B) = 1 / (1 + exp(-B (z - C)))    (3)

Since the nature of the features can differ, each feature must have its own weighting function. Therefore, the j index in eq. (2) indicates that FW_j(z) is defined on the jth feature. The S-function defined in eq. (3) is a sigmoid function parameterized by C^k_j (i.e. the center of the kth sigmoid on the jth feature) and B^k_j (i.e. the slope of the kth sigmoid on the jth feature).
Multiplying two sigmoid functions yields some useful properties: i) since the assigned weights are normalized to [0, 1], the weight values have a probabilistic interpretation; and ii) the function is flexible enough to produce all the required shapes, such as increasing, decreasing, and single-modal. Figure 2 shows some possible shapes produced by the multiplication of two different sigmoid functions.
Fig. 2. Some possible shapes generated by the multiplication of two sigmoid functions.
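A minimal sketch of the weight function of eqs. (2)-(3), assuming the parameterization reconstructed above (a center C and a slope B per sigmoid): depending on the signs of the two slopes, the product is increasing, decreasing, or single-modal.

```python
import numpy as np

def sigmoid(z, c, b):
    # S(z; C, B) = 1 / (1 + exp(-B (z - C))), eq. (3)
    return 1.0 / (1.0 + np.exp(-b * (z - c)))

def feature_weight(z, c1, b1, c2, b2):
    # FW(z): product of two sigmoids, eq. (2); values stay in [0, 1]
    return sigmoid(z, c1, b1) * sigmoid(z, c2, b2)

z = np.linspace(0, 1, 5)
print(feature_weight(z, c1=0.5, b1=10,  c2=0.5, b2=10))    # increasing
print(feature_weight(z, c1=0.5, b1=-10, c2=0.5, b2=-10))   # decreasing
print(feature_weight(z, c1=0.3, b1=20,  c2=0.7, b2=-20))   # single-modal (bump)
```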
B. Proposed learning algorithm
The previous subsection proposed a weight function and its parameters. In this subsection, a learning algorithm is proposed to learn the parameters of the proposed weight function in such a way that the G-mean criterion is maximized.
Let X = {x_1, ..., x_M} be a set of N-dimensional training samples with corresponding class labels L = {l_1, ..., l_M}, where x_i ∈ R^N and l_i ∈ {1, ..., G}, 1 ≤ i ≤ M (G is the number of classes). As mentioned, the proposed weight function is optimized for the NN algorithm. Therefore, the NN dissimilarity between a query q and an instance x is defined in Eqs. (4)-(5) as a per-feature distance in which the contribution of the jth feature is scaled by the dynamic weight FW_j evaluated for that query-instance pair.
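The sketch below illustrates one plausible form of such a dynamically weighted dissimilarity. The exact argument passed to FW_j in Eqs. (4)-(5) is not reproduced here; this sketch assumes the weight of feature j is evaluated on the absolute query-instance difference, which is only one possible choice.

```python
import numpy as np

def sigmoid(z, c, b):
    return 1.0 / (1.0 + np.exp(-b * (z - c)))

def feature_weight(z, params_j):
    c1, b1, c2, b2 = params_j
    return sigmoid(z, c1, b1) * sigmoid(z, c2, b2)

def weighted_dissimilarity(query, instance, params):
    """Dynamically weighted NN dissimilarity (sketch of Eqs. (4)-(5)).

    params[j] holds (C1_j, B1_j, C2_j, B2_j) for feature j.  Here the weight
    of feature j is evaluated on |q_j - x_j|; the paper's exact argument may
    differ.
    """
    diff = np.abs(query - instance)
    weights = np.array([feature_weight(d, p) for d, p in zip(diff, params)])
    return float(np.sum(weights * diff ** 2))

# toy usage with 3 features and hypothetical parameters
params = [(0.5, 5.0, 0.5, -5.0)] * 3
q = np.array([0.2, 0.8, 0.5])
x = np.array([0.1, 0.4, 0.9])
print(weighted_dissimilarity(q, x, params))
```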
In this work, the G-mean criterion is used as the objective function. Without losing the optimal solution, ln(G-mean) can be used instead:

J(Θ) = ln(G-mean) = (1/G) Σ_{g=1..G} ln(ACC_g)    (6)

where Θ = {C1, C2, B1, B2} is the set of weight-function parameters and G is the total number of classes (usually G = 2). The ACC function on the gth class, i.e. the per-class accuracy, is defined as:

ACC_g = (1/n_g) Σ_{x: l_x = g} step(r(x))    (7)

r(x) = d(x, p^≠) / d(x, p^=)    (8)

where n_g is the number of samples in the gth class and p^= and p^≠ are, respectively, the same-class and different-class nearest neighbors of x (step(z) equals 1 if z ≥ 1 and 0 otherwise). A gradient ascent procedure is proposed to maximize the J-function. This requires J to be differentiable with respect to all parameters, while the step function in eq. (7) is not differentiable. To handle this, the step function is approximated by a sigmoid function, defined as:
S_b(z) = 1 / (1 + exp(-b (z - 1)))    (9)

Using eq. (9), the ACC and J functions become:

ACC_g ≈ (1/n_g) Σ_{x: l_x = g} S_b(r(x))    (10)

J(Θ) ≈ (1/G) Σ_{g=1..G} ln[(1/n_g) Σ_{x: l_x = g} S_b(r(x))]    (11)

The b-parameter has a smoothing effect; if b is large, then S_b(z) is an accurate approximation of step(z). The derivative of the sigmoid function, which will be needed throughout the paper, is simple and can be expressed in terms of the sigmoid itself:

dS_b(z)/dz = b S_b(z) (1 - S_b(z))    (12)

The derivative of the sigmoid function acts as a windowing function that is maximized at z = 1; the height of this window decreases as the b-parameter decreases (i.e. the derivative of the sigmoid function approaches the Dirac delta function when the b-parameter is large).
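The following sketch shows how the step function of eq. (7) is replaced by the sigmoid of eq. (9) to obtain a differentiable objective, assuming the reconstructed forms of eqs. (10)-(11) above. The distance ratios r(x) are assumed to have been computed already from the same-class and different-class nearest neighbors.

```python
import numpy as np

def s_b(z, b=10.0):
    # Sigmoid approximation of step(z >= 1), eq. (9); larger b = sharper step.
    return 1.0 / (1.0 + np.exp(-b * (z - 1.0)))

def smoothed_objective(ratios, labels, b=10.0):
    """ln(G-mean) with sigmoid-smoothed per-class accuracies (eqs. 10-11).

    ratios[i] = d(x_i, different-class NN) / d(x_i, same-class NN).
    """
    ratios, labels = np.asarray(ratios), np.asarray(labels)
    log_accs = []
    for g in np.unique(labels):
        acc_g = np.mean(s_b(ratios[labels == g], b))   # smoothed ACC_g
        log_accs.append(np.log(acc_g + 1e-12))
    return float(np.mean(log_accs))                    # (1/G) * sum of ln ACC_g

# toy example: ratios > 1 mean the sample would be classified correctly by NN
ratios = [1.8, 0.6, 1.2, 2.5, 0.9, 1.1]
labels = [1, 1, 1, 0, 0, 0]
print(smoothed_objective(ratios, labels))
```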
Correspondingly, the partial derivatives of J with respect to the weight-function parameters can be derived from eq. (6) by applying the chain rule together with eq. (12) (Eqs. (13)-(15)). Similar to eq. (13), the derivative of the J-function with respect to each of the other weight-function parameters can be obtained in the same way.
Based on these parameter derivatives, an iterative leave-one-out gradient ascent procedure is proposed in figure 3 to learn the parameters of the weighting functions. During each iteration, one sample of the training data is selected as test data and the rest as the training set. Then, the parameters are updated using the delta rule:

θ_new = θ_old + η (∂J/∂θ)    (16)

where θ is the desired parameter and η is a learning rate adjusted empirically.
Proposed Learning Algorithm
Input: T: training set; η: learning rates; b: sigmoid slope; a: small constant.
Output: the weight-function parameters {C1_j, B1_j, C2_j, B2_j} for every feature j.
Initialize the parameters.
While the termination condition is not reached:
    For each training sample x in T (leave-one-out):
        Find the same-class and different-class nearest neighbors of x under the current weighted dissimilarity and compute the ratio r(x) of eq. (8).
        For j = 1 to N:
            Update C1_j, C2_j, B1_j and B2_j with the delta rule of eq. (16), using the gradients of eqs. (13)-(15).
Return the learned parameters.
Fig. 3. Proposed learning algorithm.
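A compact sketch of the procedure in Fig. 3, under the same assumptions as the earlier sketches: parameters are updated sample by sample in a leave-one-out fashion with the delta rule of eq. (16). The paper derives closed-form gradients (Eqs. (13)-(15)); for brevity this sketch approximates them numerically on a per-sample surrogate of eq. (11), so it is illustrative rather than an exact reimplementation.

```python
import numpy as np

def sigmoid(z, c, b):
    return 1.0 / (1.0 + np.exp(-b * (z - c)))

def fw(z, p):                       # dynamic feature weight, eqs. (2)-(3)
    return sigmoid(z, p[0], p[1]) * sigmoid(z, p[2], p[3])

def dissim(q, x, P):                # assumed form of the weighted distance
    d = np.abs(q - x)
    w = np.array([fw(d[j], P[j]) for j in range(len(d))])
    return np.sum(w * d ** 2) + 1e-12

def loo_ratio(i, X, y, P):          # r(x_i) = d(x_i, p^!=) / d(x_i, p^=)
    d = np.array([dissim(X[i], X[k], P) if k != i else np.inf
                  for k in range(len(X))])
    same = np.min(d[(y == y[i]) & (np.arange(len(X)) != i)])
    diff = np.min(d[y != y[i]])
    return diff / same

def train(X, y, eta=0.05, b=10.0, epochs=20, h=1e-4):
    P = np.tile([0.5, 5.0, 0.5, -5.0], (X.shape[1], 1))   # initial parameters
    for _ in range(epochs):
        for i in range(len(X)):                 # leave-one-out pass
            # per-sample surrogate of eq. (11): ln S_b(r(x_i))
            obj = lambda P_: np.log(sigmoid(loo_ratio(i, X, y, P_), 1.0, b))
            grad = np.zeros_like(P)
            for j in range(P.shape[0]):         # numerical gradient per parameter
                for k in range(4):
                    Pp = P.copy(); Pp[j, k] += h
                    grad[j, k] = (obj(Pp) - obj(P)) / h
            P += eta * grad                     # delta rule, eq. (16)
    return P

# toy imbalanced data: 4 positives, 16 negatives, 3 features
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.7, 0.1, (4, 3)), rng.normal(0.3, 0.1, (16, 3))])
y = np.array([1] * 4 + [0] * 16)
print(train(X, y).round(3))
```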
IV. EXPERIMENTS
This section presents the experimental results and their analysis on imbalanced data sets. In the first experiment, we used 24 highly imbalanced data sets from the UCI machine learning repository, in which the positive instances make up no more than 10% of the whole data set. The selected data sets differ in number of samples, number of features, and degree of imbalance. The main characteristics of these data sets are summarized in Table II.
TABLE II. UCI IMBALANCED DATA SETS.
Data set          Instances  Features  + Instances  - Instances
Glass5 214 9 9 205
shuttle2vs4 129 9 6 123
abalone918 731 8 49 682
ecoli4 336 7 20 316
Glass4 214 9 13 201
ecoli034vs5 300 7 20 280
ecoli0146vs5 280 7 20 260
ecoli0147vs56 332 7 25 307
Glass2 214 9 17 197
glass0146vs2 205 9 17 188
ecoli01vs5 240 7 20 220
glass06vs5 108 9 9 99
ecoli0147vs2356 336 7 29 307
ecoli067vs5 220 7 20 200
Vowel0 988 13 90 898
ecoli0347vs56 257 7 25 232
ecoli0346vs5 205 7 20 185
glass04vs5 92 9 9 83
ecoli0267vs35 224 7 22 202
ecoli01vs235 244 7 24 220
ecoli046vs5 203 7 20 183
glass015vs2 172 9 17 155
ecoli067vs35 222 7 22 200
yeast2vs4 514 8 51 463
Table III shows the G-mean results obtained by applying the different methods to these data sets. The k-NN classifier is used for the data-oriented methods (the k parameter is adjusted manually for each method). To obtain accurate results, we used 10-times 10-fold cross validation, which is common in the evaluation of classifier performance. Each time, the entire data set is divided into 10 blocks; one block is selected as the test set and the rest as the training set, and this process continues until every block has been used as the test set. The whole procedure is repeated 10 times, so the final results are the average of 100 different experiments. For each data set, the learning rate (η) and the b-parameter are selected empirically from a small set of candidate values: for each possible combination of η and b, we run 10-times 10-fold cross validation and choose the best combination in terms of the G-mean criterion.
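A sketch of this evaluation protocol, assuming scikit-learn and a synthetic imbalanced data set in place of the UCI sets; the G-mean is computed as the geometric mean of the per-class recalls, and the 1-NN baseline stands in for the compared methods.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import recall_score

# synthetic stand-in for one of the highly imbalanced UCI data sets
X, y = make_classification(n_samples=600, n_features=7, weights=[0.93, 0.07],
                           random_state=0)

scores = []
for t in range(10):                                   # 10 times ...
    skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=t)
    for train_idx, test_idx in skf.split(X, y):       # ... 10-fold CV
        clf = KNeighborsClassifier(n_neighbors=1).fit(X[train_idx], y[train_idx])
        y_pred = clf.predict(X[test_idx])
        tpr = recall_score(y[test_idx], y_pred, pos_label=1)  # minority recall
        tnr = recall_score(y[test_idx], y_pred, pos_label=0)  # majority recall
        scores.append(np.sqrt(tpr * tnr))             # G-mean for this fold

print("average G-mean over 100 runs:", np.mean(scores))
```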
Also, the Friedman test, a nonparametric statistical method for testing whether all the algorithms are equivalent over the various data sets, is used to find significant differences among the results obtained by the studied methods [26]. To accomplish this, the average rank is calculated using the Friedman test. For a specific data set, the algorithm that achieves the highest performance measure is ranked first and its rank value is set to one; the algorithm that achieves the second highest value is given a rank value of two, and so forth. Finally, the average rank of each algorithm is computed for comparison.
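The ranking procedure can be sketched as follows; the score matrix here is a small made-up example (rows are data sets, columns are methods, higher is better), not the full set of results from Table III.

```python
import numpy as np
from scipy.stats import friedmanchisquare, rankdata

# hypothetical G-mean scores: 5 data sets (rows) x 4 methods (columns)
scores = np.array([[68.8, 87.2, 71.3, 92.1],
                   [99.6, 83.4, 90.0, 99.9],
                   [62.6, 71.0, 28.1, 69.4],
                   [90.1, 92.3, 84.8, 96.1],
                   [88.5, 81.4, 77.9, 95.2]])

# rank within each data set: the best method gets rank 1
ranks = np.vstack([rankdata(-row) for row in scores])
print("average rank per method:", ranks.mean(axis=0))

# Friedman test: are the methods equivalent across the data sets?
stat, p_value = friedmanchisquare(*scores.T)
print("Friedman statistic = %.3f, p-value = %.4f" % (stat, p_value))
```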
TABLE III. THE G-MEAN RESULTS OBTAINED BY 10-TIMES 10-FOLD CROSS VALIDATION.
Data set          [11]   [12]   [16]   1NN    LPD    [22]   Proposed
Glass5            68.76  87.15  49.75  71.26  53.94  88.04  92.13
shuttle2vs4       99.57  83.38  60.00  90.00  96.44  100    100
abalone918        62.55  71.00  05.00  28.05  14.92  62.57  69.41
ecoli4            90.14  92.26  71.88  84.84  87.47  90.42  96.13
Glass4            88.47  81.40  43.65  77.87  73.57  88.71  95.15
ecoli034vs5       97.07  88.29  90.53  83.05  91.81  97.07  97.71
ecoli0146vs5      87.95  87.38  82.04  83.90  86.01  87.37  93.88
ecoli0147vs56     85.80  89.19  75.14  83.29  85.19  88.13  91.07
Glass2            39.05  51.12  00.00  33.40  06.69  64.06  67.35
glass0146vs2      41.98  55.81  00.00  35.54  06.37  56.04  71.08
ecoli01vs5        84.67  90.79  87.59  84.77  85.65  90.36  91.36
glass06vs5        87.95  79.18  49.49  89.05  59.10  88.94  92.40
ecoli0147vs2356   80.27  85.31  59.79  79.58  82.29  83.15  93.43
ecoli067vs5       84.23  83.19  62.25  79.02  82.37  86.65  89.45
Vowel0            100    94.21  97.66  100    97.95  100    96.97
ecoli0347vs56     83.97  85.20  71.56  83.37  85.26  86.90  90.51
ecoli0346vs5      87.81  84.23  77.74  83.29  85.39  90.02  89.14
glass04vs5        90.00  84.31  68.01  70.57  84.98  99.35  100
ecoli0267vs35     76.08  77.11  65.86  81.57  80.75  85.20  92.17
ecoli01vs235      82.76  84.23  73.12  85.62  77.94  86.77  94.72
ecoli046vs5       88.00  87.22  76.95  85.37  84.52  87.51  91.80
glass015vs2       42.96  44.46  07.07  33.77  05.25  53.15  60.82
ecoli067vs35      68.09  79.99  45.42  78.54  75.04  79.43  82.16
yeast2vs4         85.38  87.46  66.16  81.63  76.67  90.87  91.01
Average Rank      3.93   3.66   6.66   4.95   5.08   2.37   1.31
For further examination, we performed another experiment on several different data sets taken from the UCI repository, StatLib (http://lib.stat.cmu.edu/), and the agnostic vs. prior competition (http://www.agnostic.inf.ethz.ch). As mentioned in Ref. [27], AUC-PR (the area under the precision-recall curve) is preferable to the area under the ROC curve, since a curve dominates in ROC space if and only if it dominates in PR space. Therefore, to obtain a better assessment, we used the more informative AUC-PR metric for the classifier comparisons. Tables IV and V show, respectively, the data sets' information and the results obtained by 10-times 10-fold cross validation, carried out in the same way as in the first experiment.
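AUC-PR can be computed as sketched below with scikit-learn, given a classifier's scores for the positive (minority) class; the labels and scores here are made-up.

```python
from sklearn.metrics import precision_recall_curve, auc, average_precision_score

# made-up ground truth and positive-class scores for an imbalanced test set
y_true  = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_score = [0.1, 0.2, 0.15, 0.05, 0.3, 0.25, 0.4, 0.35, 0.8, 0.45]

precision, recall, _ = precision_recall_curve(y_true, y_score)
print("AUC-PR (trapezoidal):", auc(recall, precision))

# average precision is a closely related summary of the PR curve
print("average precision:", average_precision_score(y_true, y_score))
```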
TABLE IV. DATA SETS INFORMATION.
Data set          Instances  Features  + Instances  - Instances
Ipums 7019 60 57 6962
Arrhythmia 452 263 13 439
BrazilTourism 412 9 16 396
Primary 339 18 14 325
Sylva.agnostic 14395 213 885 13510
Balance 625 5 49 576
Backache 180 33 25 155
TABLE V. THE AUC-PR RESULTS OBTAINED BY 10-TIMES 10-FOLD CROSS VALIDATION.
Data Set [11] [16] CCW CCPDT Proposed
Ipums 0.136 0.170 0.140 0.037 0.183
Arrhythmia 0.083 0.134 0.229 0.346 0.332
BrazilTourism 0.233 0.184 0.241 0.152 0.213
Primary 0.310 0.347 0.279 0.170 0.286
Sylva.agnostic 0.928 0.922 0.925 0.934 0.930
Balance 0.135 0.091 0.149 0.092 0.140
Backache 0.317 0.330 0.328 0.227 0.340
Average Rank 3.28 3.28 2.71 3.71 2.00
The results show that, in general, algorithm-oriented methods yield better performance than data-oriented methods. On average, the best results are achieved by the proposed method, and the Friedman test confirms this assertion. The main drawback of the proposed method is its sensitivity to the learning rate (η) and the b-parameter, which are set empirically and depend on the nature of the data set.
V. CONCLUSION
This paper proposed a dynamic feature weighting schema along with a novel and flexible weighting function that can successfully deal with the challenges posed by learning from imbalanced data, mainly related to the disproportion of instances per class in the training data set. The proposed weighting schema is optimized for the NN algorithm, where each feature has its own weight function. The weight-function parameters are learned using an iterative learning algorithm that tries to maximize a differentiable objective function; the objective function is an approximation of the G-mean criterion, which is common in imbalanced data classification problems.
A number of experiments involving a large collection of standard benchmark imbalanced data sets showed the good performance of the proposed method.
In the future, we plan to extend the idea of the dynamic weighting function to other weighted classification problems.
REFERENCES
[1] Q. Yang and X. Wu, "10 challenging problems in data mining research," Int. J. Inf. Technol. Decis. Mak., vol. 5, no. 4, pp. 597-604, 2006.
[2] M. A. Mazurowski, P. A. Habas, J. M. Zurada, J. Y. Lo, J. A. Baker, and G. D. Tourassi, "Training neural network classifiers for medical decision making: The effects of imbalanced datasets on classification performance," Neural Networks, vol. 21, no. 2, pp. 427-436, 2008.
[3] Y.-M. Huang, C.-M. Hung, and H. C. Jiau, "Evaluation of neural networks and data mining methods on a credit assessment task for class imbalance problem," Nonlinear Anal. Real World Appl., vol. 7, no. 4, pp. 720-747, 2006.
[4] Y.-H. Liu and Y.-T. Chen, "Face recognition using total margin-based adaptive fuzzy support vector machines," IEEE Trans. Neural Networks, vol. 18, no. 1, pp. 178-192, 2007.
[5] W. Liu and S. Chawla, "Class confidence weighted kNN algorithms for imbalanced data sets," in Advances in Knowledge Discovery and Data Mining, Springer, 2011, pp. 345-356.
[6] P. Branco, L. Torgo, and R. Ribeiro, "A survey of predictive modelling under imbalanced distributions," arXiv preprint arXiv:1505.01658, 2015.
[7] X. Tong, P. Ozturk, and M. Gu, "Dynamic feature weighting in nearest neighbor classifiers," in Proceedings of the 2004 International Conference on Machine Learning and Cybernetics, 2004, vol. 4, pp. 2406-2411.
[8] X. Wu, V. Kumar, J. R. Quinlan, J. Ghosh, Q. Yang, H. Motoda, G. J. McLachlan, A. Ng, B. Liu, and S. Y. Philip, "Top 10 algorithms in data mining," Knowl. Inf. Syst., vol. 14, no. 1, pp. 1-37, 2008.
[9] T. Cover and P. Hart, "Nearest neighbor pattern classification," IEEE Trans. Inf. Theory, vol. 13, no. 1, pp. 21-27, 1967.
[10] R. O. Duda and P. E. Hart, Pattern Classification and Scene Analysis. Wiley, New York, 1973.
[11] N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, "SMOTE: Synthetic minority over-sampling technique," J. Artif. Intell. Res., vol. 16, no. 1, pp. 321-357, 2002.
[12] C. Bunkhumpornpat, K. Sinapiromsaran, and C. Lursinsap, "Safe-Level-SMOTE: Safe-level-synthetic minority over-sampling technique for handling the class imbalanced problem," in Advances in Knowledge Discovery and Data Mining, Springer, 2009, pp. 475-482.
[13] T. Maciejewski and J. Stefanowski, "Local neighbourhood extension of SMOTE for mining imbalanced data," in Computational Intelligence and Data Mining (CIDM), 2011 IEEE Symposium on, 2011, pp. 104-111.
[14] G. Menardi and N. Torelli, "Training and assessing classification rules with imbalanced data," Data Min. Knowl. Discov., vol. 28, no. 1, pp. 92-122, 2014.
[15] M. Z. Jahromi, E. Parvinnia, and R. John, "A method of learning weighted similarity function to improve the performance of nearest neighbor," Inf. Sci., vol. 179, no. 17, pp. 2964-2973, 2009.
[16] T. Yang, L. Cao, and C. Zhang, "A novel prototype reduction method for the K-nearest neighbor algorithm with K ≥ 1," in Advances in Knowledge Discovery and Data Mining, Springer, 2010, pp. 89-100.
[17] W. Liu, S. Chawla, D. A. Cieslak, and N. V. Chawla, "A robust decision tree algorithm for imbalanced data sets," in SDM, 2010, vol. 10, pp. 766-777.
[18] C. Liu, L. Cao, and P. S. Yu, "A hybrid coupled k-nearest neighbor algorithm on imbalance data," in Neural Networks (IJCNN), 2014 International Joint Conference on, 2014, pp. 2011-2018.
[19] L. Peng, H. Zhang, B. Yang, and Y. Chen, "A new approach for imbalanced data classification based on data gravitation," Inf. Sci., vol. 288, pp. 347-373, 2014.
[20] D. Tomar and S. Agarwal, "An effective Weighted Multi-class Least Squares Twin Support Vector Machine for imbalanced data classification," Int. J. Comput. Intell. Syst., vol. 8, no. 4, pp. 761-778, 2015.
[21] R. Paredes and E. Vidal, "Learning prototypes and distances: A prototype reduction technique based on nearest neighbor error minimization," Pattern Recognit., vol. 39, no. 2, pp. 180-188, 2006.
[22] Z. Hajizadeh, M. Taheri, and M. Z. Jahromi, "Nearest neighbor classification with locally weighted distance for imbalanced data."
[23] H. He and E. A. Garcia, "Learning from imbalanced data," IEEE Trans. Knowl. Data Eng., vol. 21, no. 9, pp. 1263-1284, 2009.
[24] G. E. Batista, R. C. Prati, and M. C. Monard, "A study of the behavior of several methods for balancing machine learning training data," ACM SIGKDD Explor. Newsl., vol. 6, no. 1, pp. 20-29, 2004.
[25] C. G. Atkeson, A. W. Moore, and S. Schaal, "Locally weighted learning for control," in Lazy Learning, Springer, 1997, pp. 75-113.
[26] M. Friedman, "The use of ranks to avoid the assumption of normality implicit in the analysis of variance," J. Am. Stat. Assoc., vol. 32, no. 200, pp. 675-701, 1937.
[27] J. Davis and M. Goadrich, "The relationship between Precision-Recall and ROC curves," in Proceedings of the 23rd International Conference on Machine Learning, 2006, pp. 233-240.
