
2010 Second International Symposium on Data, Privacy, and E-Commerce

k-NN Numeric Prediction Using Bagging and Instance-relevant Combination


Liang He, Qinbao Song, Junyi Shen
Department of Computer Science and Technology
Xi'an Jiaotong University
Xi'an, China
e-mail: {lhe, qbsong, jyshen}@mail.xjtu.edu.cn
Abstract: Existing k-NN methods use a fixed number of reference neighbors, which leads to erratic prediction errors and low overall accuracy across different unknown instances. To address this problem, a bagging-based k-NN numeric prediction algorithm with attribute selection is proposed. In each training procedure, a set of base k-NN predictors is built iteratively from different bootstrap sampling datasets, and the base predictors then estimate the unknown instance individually. Because the mechanism for combining these individual outcomes determines the performance of the ensemble, an instance-relevant rule is proposed to calculate the composite result: the weight of each base k-NN predictor is updated dynamically according to the distinct features of the current unknown instance. The accuracy obtained with different numbers of base k-NN predictors is also explored. Experimental results on public datasets show a considerable improvement in k-NN numeric prediction.

Keywords: numeric prediction; k-NN algorithm; bagging; instance-relevant combination

I. INTRODUCTION

k-Nearest-Neighbor (k-NN) learning has been widely used for classification and numeric prediction in data mining. Under this scheme, an unknown instance is classified or predicted by the majority vote or weighted average of its k closest neighbors, searched out from the given sample, with the Euclidean distance usually chosen as the proximity measure among instances. Extensive studies on the k-NN method have mainly been concerned with improving classification accuracy and the efficiency of neighbor search, including text categorization with attribute weight adjustment [1], a hybrid approach of simultaneous attribute selection and weighting based on a Tabu search heuristic [2], multiple k-NN classifiers with different distance functions [3], semantic distance measurement among instances [4], estimation of an adequate value for k [5], a reduced method for lower-cost neighbor search in terms of gray relational structure [6], B+-tree indexed data for accelerating neighbor search in high-dimensional space [7], artificial-immune-system-based reduction of the training data dimension [8], and so on. To achieve better accuracy, these works refine the proximity measurement and locate the right neighbors. Although such means have improved the k-NN algorithm to some extent, they still suffer from the same problem incurred by the drawback of a fixed k. As an instance-based lazy learning scheme, k-NN is able to attain higher accuracy through reasonable neighbor search in classification and prediction. However, the estimate made by k-NN relies not only on which instances are selected as the reference neighbors, but also on how many of them are used. The spatial distribution and distinct characteristics of different instances tend to be ignored when the same number of neighbors, i.e., a fixed value of k, is applied. As a result, some instances may be classified or predicted correctly while others are not, and the overall accuracy is likely to be erratic. Numeric prediction is more sensitive to this situation, because the target attribute takes continuous values rather than the countable categories of classification.

If we provide suitable k-NN models to different unknown instances in consideration of their features, it is possible to obtain desirable accuracy for most of these instances. For this purpose, sufficient datasets need to be generated in advance to train diverse k-NN models. Unfortunately, only one original dataset is available most of the time, and the regular training procedure on this dataset builds a single k-NN model instead of a series. We could intuitively divide the original dataset into small partitions and formulate a model on each of them, but this is impractical for the instance-based k-NN method, which is often adopted in applications with small datasets owing to computational cost: an even smaller partitioned reference sample inevitably produces poor performance. It is therefore basically impossible to realize adaptive k-NN prediction for different unknown instances in this way. To address this problem in numeric prediction, a bagging-based ensemble k-NN algorithm is proposed.

The ensemble principle is an effective technique for transforming weak learners into strong ones. On the basis of this conception, Breiman proposed an algorithm-independent method named bagging [9,10]. By establishing various individual models and amalgamating their outputs into a single one, bagging performs ensemble classification or numeric prediction. This method helps most when the weak learning algorithm is unstable, in the sense that small changes in the input data result in quite different estimates. Hence, bagging can hardly improve k-NN prediction directly, on account of its stability. To solve this problem, we introduce attribute selection into the ensemble k-NN prediction to perturb the training sets, so that bagging is able to construct diverse k-NN models.

The rest of the paper is organized as follows. In Section II, we describe the details of the bagging-based k-NN algorithm for numeric prediction. In Section III, we evaluate this novel algorithm, especially the instance-relevant combination rule, on public datasets. Finally, we summarize our work.

II. BAGGING-BASED K-NN NUMERIC PREDICTION

The bagging-based scheme is inspired by the ensemble principle. At the beginning of this section, we introduce the ensemble model for k-NN prediction. Next, the method of training base predictors within this model is explored. After that, we present the most important component, the instance-relevant combination rule. Finally, a complete k-NN prediction algorithm is given.

A. Ensemble Model

Following the bagging method, a series of training sets is first drawn from the original dataset, and one base model is then learned from each training set. The bagging-based k-NN prediction runs in a similar way, as illustrated in Fig. 1. Because of bootstrap sampling with replacement, some instances of the original dataset D will not be included in a given sampling set Di (i = 1…T), whereas others may occur more than once in Di. Bootstrapping is repeated T times to generate T corresponding training sets, where the integer T is predefined prior to training. These T sampling sets are of equal size to the original dataset D.

Figure 1. Ensemble model of k-NN numeric prediction.

Each base model established by the training procedure is a k-NN predictor, and the ensemble model for k-NN prediction consists of these T predictors. Instances in Di ought to derive optimal overall accuracy from the base k-NN predictor hi. The variety of the sampling sets endows the predictors with wide applicability to different sorts of unknown instances, so that accurate prediction of most unknown instances is ensured.

The base k-NN predictors estimate the unknown instance individually and give their separate outputs, which the ensemble model then combines into the final result. Not all of the outputs are close to the actual value of this instance: only the predictors that learned from training sets with characteristics similar to the current unknown instance have the ability to predict it accurately. Hence, they should be assigned heavier weights in the combination. An invariable weight distribution over the base k-NN predictors cannot adapt to the varying characteristics of different unknown instances. For the purpose of dynamic weight adjustment, we present an instance-relevant combination rule in subsection C for the ensemble k-NN prediction.
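As a concrete illustration of the sampling step described above, the following Python sketch draws T bootstrap training sets of size |D| with replacement. The array-based representation and the names X, y and bootstrap_samples are our own choices for illustration, not taken from the paper.

```python
import numpy as np

def bootstrap_samples(X, y, T, seed=0):
    """Draw T bootstrap training sets (with replacement), each of size |D|."""
    rng = np.random.default_rng(seed)
    n = len(X)
    samples = []
    for _ in range(T):
        idx = rng.integers(0, n, size=n)   # sampling with replacement
        samples.append((X[idx], y[idx]))
    return samples

# toy usage: 8 instances with 3 attributes, T = 5 sampling sets
X = np.random.rand(8, 3)
y = np.random.rand(8)
sets = bootstrap_samples(X, y, T=5)
print(len(sets), sets[0][0].shape)   # 5 (8, 3)
```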
B. Training of Base k-NN Predictors

The training procedure on the sampling datasets builds the base k-NN predictors. Every predictor is described by two parameters: a proper k and its attribute subset. The training procedure tries to find them by minimizing the overall error on the specified sampling set. There are a number of measures for error calculation in numeric prediction; since in most practical situations the performance of a prediction algorithm can be evaluated correctly no matter which measure is applied, we choose the usual mean relative error. The parameters of the base k-NN predictors often differ from each other because of the various training sets generated by bootstrap sampling. The mean relative error is evaluated over all the elements of a sampling set, so duplicate instances and those with common characteristics attract more attention during training, and the selected k and attribute subset therefore offer them more accurate prediction. Unknown instances with characteristics similar to these instances may be predicted accurately as well by the same parameters learned on this training set.

A remaining problem is the metric space for neighbor search. For an eager learning algorithm, the base model generally learns from the sampling set by the leave-one-out approach, taking no account of the original dataset. However, this is infeasible for the lazy, instance-based k-NN algorithm. If the training method of eager learning were followed, k-NN prediction would search the neighbors of an instance (x, y) in Di within the set Di minus {(x, y)}. In that case the selected value of k would likely be one, owing to the error-free prediction of the duplicate instances drawn by bootstrap, which is useless in practice because of the loss of generalization capability. To avoid this, the search space is switched to D minus {(x, y)}.

Let dj be the distance between the rth instance (xr, yr) in Di (r = 1…n, n = |Di|) and its jth (j = 1…k) nearest neighbor from D minus {(xr, yr)}, calculated with the given k and attribute subset. Then the predicted value ŷr and its relative error er are measured as follows, where wj and yj denote the weight and the target value of the jth nearest neighbor:

wj = (1/dj) / Σ_{j=1}^{k} (1/dj)                                  (1)

ŷr = Σ_{j=1}^{k} wj · yj                                          (2)

er = |ŷr − yr| / yr                                               (3)

Hence, the mean relative error of the current k and attribute subset on Di is:

ε = (1/n) Σ_{r=1}^{n} er                                          (4)
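As a minimal sketch of how (1)-(3) can be evaluated for a single instance, the snippet below performs inverse-distance weighted k-NN prediction on a chosen attribute subset, assuming the inverse-distance form of the weights in (1). It further assumes the reference set passed in already plays the role of D minus {(xr, yr)}; all names are illustrative rather than the paper's.

```python
import numpy as np

def predict_one(x_r, X_ref, y_ref, k, attrs):
    """Inverse-distance weighted k-NN prediction on a chosen attribute subset.

    X_ref, y_ref: reference set that already excludes the query instance.
    attrs: indices of the selected attribute subset.
    """
    d = np.sqrt(((X_ref[:, attrs] - x_r[attrs]) ** 2).sum(axis=1))  # Euclidean distance
    nn = np.argsort(d)[:k]                                          # k nearest neighbours
    inv = 1.0 / np.maximum(d[nn], 1e-12)                            # guard against zero distance
    w = inv / inv.sum()                                             # weights, Eq. (1)
    return float(np.dot(w, y_ref[nn]))                              # prediction, Eq. (2)

def relative_error(y_hat, y_true):
    return abs(y_hat - y_true) / abs(y_true)                        # Eq. (3)

# toy usage
rng = np.random.default_rng(1)
X, y = rng.random((20, 4)), rng.random(20) + 0.5
y_hat = predict_one(X[0], X[1:], y[1:], k=3, attrs=[0, 2, 3])
print(y_hat, relative_error(y_hat, y[0]))
```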

By comparing the mean relative errors produced by each candidate k and its attribute subset, the pair corresponding to the minimum error is selected as the parameters of training set Di.
There are many kinds of attribute selection techniques, including filters, wrappers, etc. [11]. The former assess general characteristics of the training set; the latter evaluate the subsets iteratively using the very algorithm to be employed for learning. Hence, filter approaches have shorter running time but mediocre performance, whereas wrapper approaches show the opposite. To acquire an appropriate attribute subset, a feasible approach named backward elimination is wrapped into the ensemble k-NN prediction by virtue of its practicability. For a training set with l attributes, the attributes are removed from the dataset one by one to formulate l candidate subsets, and the performance of each subset together with the current k is evaluated on the training set sequentially. The subset producing the greatest performance gain is chosen as the full attribute set for the next iteration, on which the same elimination repeats until no more gains are available. Backward elimination is a standard greedy search procedure and is guaranteed to find a locally optimal subset. The selected subset may differ as the value of k varies, so the base k-NN predictor chooses an attribute subset for each possible k and designates the best pair as its parameters.
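A sketch of the greedy backward elimination wrapped around a k-NN scorer might look as follows; mre_score is a simple stand-in for the evaluation on the sampling set (searching neighbors in a given reference set), and the tie-breaking and stopping details are our assumptions rather than the paper's exact procedure.

```python
import numpy as np

def mre_score(Xs, ys, X_ref, y_ref, k, attrs):
    """Mean relative error of k-NN on (Xs, ys), searching neighbours in (X_ref, y_ref)."""
    errs = []
    for x, y_true in zip(Xs, ys):
        d = np.sqrt(((X_ref[:, attrs] - x[attrs]) ** 2).sum(axis=1))
        nn = np.argsort(d)[:k]
        inv = 1.0 / np.maximum(d[nn], 1e-12)
        y_hat = np.dot(inv / inv.sum(), y_ref[nn])
        errs.append(abs(y_hat - y_true) / abs(y_true))
    return float(np.mean(errs))

def backward_eliminate(Xs, ys, X_ref, y_ref, k):
    """Greedy backward elimination: drop one attribute at a time while the MRE improves."""
    attrs = list(range(Xs.shape[1]))
    best = mre_score(Xs, ys, X_ref, y_ref, k, attrs)          # full attribute set evaluated first
    improved = True
    while improved and len(attrs) > 1:
        improved = False
        candidates = [(mre_score(Xs, ys, X_ref, y_ref, k, attrs[:i] + attrs[i+1:]), i)
                      for i in range(len(attrs))]
        err, i = min(candidates)
        if err <= best:                                        # keep the best single removal
            best, attrs = err, attrs[:i] + attrs[i+1:]
            improved = True
    return attrs, best

# toy usage: score one half of a random dataset against the other half
rng = np.random.default_rng(2)
X, y = rng.random((30, 5)), rng.random(30) + 0.5
print(backward_eliminate(X[:15], y[:15], X[15:], y[15:], k=3))
```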
The ensemble principle has to spend a lot of time on the construction of numerous base models, so an ensemble algorithm generally costs much more than the weak learning scheme. However, this issue is not as serious in bagging-based k-NN prediction as it is in other ensemble learning algorithms, thanks to its concurrency. Because the sampling sets are drawn independently from the original dataset, the subsequent training of the base predictors, and their separate predictions for an unknown instance, can run concurrently and finish in a shorter time. This is impossible in another classical ensemble method, boosting, which generates its sampling sets sequentially because the instance weights are updated iteratively after each training round.
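Because the sampling sets are independent, the base predictors can indeed be built in parallel. A minimal sketch with Python's standard concurrent.futures is shown below; train_base is merely a placeholder for the per-set training routine of Section II.B, not the paper's implementation.

```python
import numpy as np
from concurrent.futures import ProcessPoolExecutor

def train_base(args):
    """Placeholder per-set training: here it just returns the sample mean of the targets."""
    X_t, y_t = args
    return float(np.mean(y_t))

if __name__ == "__main__":
    rng = np.random.default_rng(3)
    X, y = rng.random((50, 4)), rng.random(50)
    n, T = len(X), 10
    samples = [(X[idx], y[idx]) for idx in (rng.integers(0, n, size=n) for _ in range(T))]
    with ProcessPoolExecutor() as pool:      # each sampling set is trained independently
        models = list(pool.map(train_base, samples))
    print(len(models))
```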

C. Instance-relevant Combination

The unknown instance can be estimated as soon as all the base k-NN predictors have been established. A regular way is to take the mean or median of the individual predicted values as the composite result. Some improved methods adopt weighted combination by consulting the performance of each base predictor during the training procedure: the one producing a smaller error receives a heavier weight and plays a more significant role in the combination, whatever the coming unknown instance is like. Although these strategies cost less, they can hardly benefit from the diverse characteristics of the various unknown instances.

To make full use of these characteristics, we present an instance-relevant weighted combination based on the assumption that an unknown instance should obtain a reliable prediction if its neighbors in the training set have been estimated accurately by the same base k-NN predictor. In accordance with this combination, the weight of each predictor is left undetermined during training; once the unknown instance (x, y) is specified, the weights are distributed dynamically.

First, each base k-NN predictor locates the closest neighbor of the unknown instance (x, y) in the original dataset D, yielding a total of T closest neighbors corresponding to the T predictors. Suppose (xt, yt) is the closest neighbor found by the tth predictor ht, t = 1…T. Then the weight of ht is calculated by the following equations:

c(t) = 1 / ((d / |At|) · e)                                       (5)

w(t) = c(t) / Σ_{t=1}^{T} c(t)                                    (6)

In (5), d denotes the distance between (x, y) and (xt, yt). Since the Euclidean distance is measured on the attribute subset rather than the full set, the number of selected attributes deeply affects its value; thus we divide the distance by |At|, the number of attributes in the subset. The symbol e refers to the error of (xt, yt) produced by ht with the leave-one-out approach on D. These errors may be calculated on the fly during combination or stored in advance as soon as the base predictor is established; the latter is preferred, since they are needed in every ensemble prediction. A greater distance or error implies a lower probability of predicting (x, y) accurately. In (6), the weight w(t) is normalized into [0, 1] once all c(t), t = 1…T, are ready. Compared with other regular rules, the instance-relevant combination requires more time, but this is worthwhile for the better accuracy it brings.

An alternative, simplified combination rule takes the individual output of the most promising predictor, i.e., the one with the greatest weight, as the composite result, which in effect ignores the contribution of the other predictors. This scheme inevitably yields slightly lower accuracy than the instance-relevant rule above, which combines all the base k-NN predictors according to their adaptability.
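A short sketch of the instance-relevant weighting is given below, assuming the product form of (5) reconstructed above; the inputs are the per-predictor closest-neighbor distances, their stored leave-one-out errors, and the attribute-subset sizes |At|. The simplified rule mentioned above would instead take only the output of the predictor with the largest weight (np.argmax) rather than the weighted sum.

```python
import numpy as np

def combination_weights(dists, errs, subset_sizes):
    """Instance-relevant weights of T base predictors, following Eqs. (5) and (6).

    dists[t]: distance between the unknown instance and the closest neighbour found by h_t,
              measured on h_t's attribute subset;
    errs[t]:  stored leave-one-out error of that neighbour under h_t;
    subset_sizes[t]: |A_t|, the size of h_t's attribute subset.
    """
    d = np.asarray(dists) / np.asarray(subset_sizes)       # distance scaled by |A_t|
    c = 1.0 / np.maximum(d * np.asarray(errs), 1e-12)      # Eq. (5): larger d or e gives smaller c(t)
    return c / c.sum()                                      # Eq. (6): normalise into [0, 1]

# toy usage: three base predictors
w = combination_weights(dists=[0.2, 0.5, 0.1], errs=[0.05, 0.20, 0.10], subset_sizes=[4, 6, 3])
print(w, w.sum())   # heaviest weight goes to the predictor with small distance and error
```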

D. Bagging-based k-NN Algorithm

So far, the issues of training and combining the base k-NN predictors have been addressed. A complete implementation of the bagging-based ensemble k-NN prediction is given in Fig. 2.
Algorithm: Bagging-based k-NN numeric prediction
Input: dataset D, unknown instance (x, y)
Output: predicted value ŷ
1   normalize each numerical attribute to the interval [0, 1];
2   for t = 1…T                           // build T base k-NN predictors
3       draw a bootstrap sampling set Dt from dataset D, |Dt| = |D| = n;
4       for k = 1…Kmax                    // training for ht
5           predict the instances in Dt with respect to k nearest neighbors, using εp to denote the mean relative error;
6           for u = 1…|U|                 // U refers initially to the full attribute set of Dt
7               remove the uth attribute from U to formulate a subset U′;
8               Dt′ ← Dt(U′), D′ ← D(U′);                 // use the datasets projected on U′
9               predict the instances in Dt′, searching the k neighbors from D′, and calculate the mean error ε(u);
10          end for
11          εm = min(ε(u)), u = 1…|U|;
12          if εm ≤ εp then
13              U ← Umin;                 // Umin is the attribute subset corresponding to εm
14              εp ← εm, go to line 6;    // continue the selection
15          else
16              terminate the attribute selection;        // the error cannot be reduced any more
17          end if
18          err(k) ← εp;
19      end for
20      Kt = argmin_k err(k), k = 1…Kmax; construct ht with Kt and its attribute subset At;
21  end for                               // T base predictors have been built
22  for i = 1…T                           // begin to predict (x, y)
23      calculate the predicted value y(i) of (x, y) with hi;
24      calculate the weight w(i) of hi with (5) and (6);
25  end for
26  ŷ = Σ_{i=1}^{T} w(i) · y(i)

Figure 2. Bagging-based k-NN numeric prediction algorithm.
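To show how the numbered steps of Fig. 2 fit together, here is a compressed, runnable Python sketch of the outer structure (lines 1-3 and 21-26). For brevity it selects each predictor's k by a plain error scan on its sampling set and omits the backward-elimination inner loop and the stored leave-one-out errors, so it is a structural illustration under those simplifying assumptions rather than the full algorithm.

```python
import numpy as np

def knn_predict(x, X_ref, y_ref, k):
    """Inverse-distance weighted k-NN prediction; also returns the closest distance."""
    d = np.sqrt(((X_ref - x) ** 2).sum(axis=1))
    nn = np.argsort(d)[:k]
    inv = 1.0 / np.maximum(d[nn], 1e-12)
    return float(np.dot(inv / inv.sum(), y_ref[nn])), float(d[nn[0]])

def bagged_knn_predict(X, y, x_new, T=10, k_max=5, seed=0):
    rng = np.random.default_rng(seed)
    lo = X.min(axis=0)
    span = np.where(np.ptp(X, axis=0) == 0, 1.0, np.ptp(X, axis=0))
    Xn, xn = (X - lo) / span, (x_new - lo) / span          # line 1: normalise attributes to [0, 1]
    n = len(Xn)
    outputs, weights = [], []
    for t in range(T):                                     # lines 2-21: build T base predictors
        idx = rng.integers(0, n, size=n)                   # line 3: bootstrap sampling set D_t
        best_k, best_err = 1, np.inf
        for k in range(1, k_max + 1):                      # line 4 (attribute selection of lines 5-20 omitted)
            errs = [abs(knn_predict(Xn[i], np.delete(Xn, i, 0), np.delete(y, i), k)[0] - y[i]) / abs(y[i])
                    for i in idx]                          # neighbours searched outside the instance itself
            if np.mean(errs) < best_err:
                best_k, best_err = k, float(np.mean(errs))
        y_t, d_t = knn_predict(xn, Xn, y, best_k)          # line 23: individual output of h_t
        c = 1.0 / max(d_t * best_err, 1e-12)               # line 24: weight in the spirit of (5)-(6),
        outputs.append(y_t)                                # using the training error instead of the
        weights.append(c)                                  # neighbour's stored leave-one-out error
    w = np.array(weights) / sum(weights)
    return float(np.dot(w, np.array(outputs)))             # line 26: combined prediction

rng = np.random.default_rng(4)
X = rng.random((40, 3))
y = X @ np.array([1.0, 2.0, 0.5]) + 1.0
print(bagged_knn_predict(X, y, x_new=np.array([0.5, 0.5, 0.5])))
```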

Because different attributes in a dataset often have different scales, the effects of some attributes may be dwarfed by others with larger scales. Hence, in line 1 of Fig. 2 the numerical attribute values must first be normalized to lie between 0 and 1. From line 2 to line 21, the T base k-NN predictors are built in succession. The variable Kmax in line 4 is a predefined integer declaring the maximum number of reference neighbors. Theoretically, Kmax may reach n−1 at most, where n is the number of instances in dataset D; in practice, however, too large a k always incurs poor overall accuracy, so it is not worth assessing all n−1 values of k. It is necessary to evaluate the performance of the full attribute set before the backward elimination in lines 5-20: if the mean error εp on the full attribute set is smaller than that on any subset, the selection terminates at once. Although this situation rarely happens, it does occur. Finally, the unknown instance (x, y) is predicted in lines 22-26.

Being the dominant part of the cost, the attribute selection, i.e., backward elimination, in bagging-based k-NN prediction has a complexity of O(l²n²), where l is the number of attributes in the full set. Compared with the computational complexity O(n²) of standard k-NN without attribute selection, the complexity of a sequentially implemented ensemble k-NN reaches O(T·Kmax·l²n²). However, it may be further reduced to O(Kmax·l²n²) if the base predictors are trained concurrently.

III. EXPERIMENTAL RESULTS

In this section, experiments are conducted to test the bagging-based k-NN numeric prediction algorithm. First we explain how the performance is evaluated; then the experimental results are examined.

A. Settings

The three public datasets [12] consist of numeric and categorical attributes and involve different fields: automobiles, heart disease and cloud seeding. The size of each dataset is given in Table I. For example, the auto93 dataset comprises 82 instances, and each instance has a total of 23 attributes, including 6 categorical ones, 16 numeric ones and one target attribute. We removed a few instances because of missing values; in fact, k-NN learning can cope with incomplete instances by ignoring the missing attributes in the distance calculation, but this may be unfavorable to attribute selection. A traditional k-NN prediction algorithm is implemented as well to serve as the baseline for evaluation; its only parameter, k, is determined by leave-one-out learning in the training procedure.

TABLE I. SIZE OF PUBLIC DATASETS

Dataset   | Instances | Categorical attributes | Numeric attributes | Total attributes
auto93    | 82        | 6                      | 16                 | 23
Cleveland | 297       |                        |                    | 14
cloud     | 108       |                        |                    | 7

We choose the mean relative error and its standard deviation for the comparison between bagging-based and traditional k-NN prediction, using 3-fold cross validation. The usual 10-fold cross validation is not used because it would leave too few test instances in a fold, especially for instance-based learning on small samples. Considering that the number of base k-NN predictors potentially affects the performance of the ensemble, the experimental results with 5, 10 and 15 base predictors are investigated respectively. On each dataset the experiment is repeated three times to guard against occasional evaluations caused by the randomness of cross validation and bootstrap sampling.
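The evaluation protocol just described can be sketched as follows; predict_fn stands for whichever predictor (ensemble or traditional k-NN) is being measured, and the fold construction with NumPy is our own convenience choice rather than the paper's implementation.

```python
import numpy as np

def cross_validate_mre(X, y, predict_fn, n_folds=3, seed=0):
    """n-fold cross-validation: returns the mean relative error and its standard deviation.

    predict_fn(X_train, y_train, x) -> predicted value for a single test instance x.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))
    folds = np.array_split(order, n_folds)
    rel_errors = []
    for f in range(n_folds):
        test = folds[f]
        train = np.concatenate([folds[g] for g in range(n_folds) if g != f])
        for i in test:
            y_hat = predict_fn(X[train], y[train], X[i])
            rel_errors.append(abs(y_hat - y[i]) / abs(y[i]))
    return float(np.mean(rel_errors)), float(np.std(rel_errors))

# toy usage with a trivial 1-NN predictor standing in for the algorithms under comparison
def one_nn(X_tr, y_tr, x):
    return float(y_tr[np.argmin(((X_tr - x) ** 2).sum(axis=1))])

rng = np.random.default_rng(5)
X = rng.random((60, 4))
y = X.sum(axis=1) + 1.0
print(cross_validate_mre(X, y, one_nn))
```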
B. Results

The experimental results on the auto93 dataset are shown in Table II. We use I-R, Mean and Med to indicate the different combination rules in bagging-based k-NN prediction, i.e., instance-relevant, mean and median, whereas k-NN refers to the traditional algorithm.

Compared with standard k-NN prediction, the mean relative error is reduced effectively by the ensemble. In each experimental round, the error decreases gradually as the value of T increases. This improvement comes from having more candidate base k-NN predictors when a greater T is preset: an unknown instance then stands a good chance of finding appropriate base predictors with matching characteristics. Among the three rules, the instance-relevant (I-R) combination surpasses the other two substantially, as the weight of each base predictor is assigned dynamically in terms of the unknown instance. Although the performance of both the mean and the median combination is inferior to I-R, they do improve on k-NN prediction. In particular, the mean relative error is reduced by 16.47%-25.25% with instance-relevant combination when T=15, achieving the best results on the auto93 dataset.

Besides the mean relative error, its standard deviation is also cut down. A high standard deviation indicates prediction errors spread out over a wide range, i.e., some instances can be predicted accurately whereas others produce very large errors. Since the ensemble k-NN algorithm aims to provide accurate prediction for all instances, not just part of them, the mean error as well as the standard deviation tends to be reduced accordingly.

TABLE II. EXPERIMENTAL RESULTS ON AUTO93 DATASET


Round | Error     | I-R (T=5) | Mean (T=5) | Med (T=5) | I-R (T=10) | Mean (T=10) | Med (T=10) | I-R (T=15) | Mean (T=15) | Med (T=15) | k-NN
1     | MRE (%)   | 17.98     | 18.39      | 19.35     | 16.87      | 18.24       | 18.62      | 16.65      | 18.05       | 18.18      | 19.95
1     | Std. Dev. | 0.1352    | 0.1336     | 0.1394    | 0.1348     | 0.1357      | 0.1383     | 0.1300     | 0.1337      | 0.1350     | 0.1619
2     | MRE (%)   | 18.97     | 19.10      | 20.20     | 17.75      | 18.74       | 19.34      | 17.09      | 18.72       | 19.24      | 20.46
2     | Std. Dev. | 0.1459    | 0.1468     | 0.1561    | 0.1451     | 0.1574      | 0.1535     | 0.1332     | 0.1545      | 0.1505     | 0.1589
3     | MRE (%)   | 17.67     | 17.85      | 18.40     | 15.51      | 16.81       | 17.26      | 15.25      | 17.39       | 17.42      | 20.40
3     | Std. Dev. | 0.1291    | 0.1311     | 0.1427    | 0.1264     | 0.1255      | 0.1357     | 0.1145     | 0.1228      | 0.1360     | 0.1399

MRE: Mean Relative Error; Std. Dev.: Standard Deviation.

In Table III we present the experimental results on the Cleveland dataset, which consists of 297 instances and 14 attributes. The overall accuracy of standard k-NN prediction on this dataset is worse than on auto93, but the ensemble improves it in the same way. As the value of T rises, both the mean relative error and the standard deviation decrease, except in round 3 (T=10, 15), where increasing T has almost no effect, possibly because of similar sampling sets and base predictors; diverse base predictors help to predict more instances precisely. The error of the median rule on this dataset is smaller than that of the mean combination, whereas the former brings a greater standard deviation. On the Cleveland dataset, the mean relative error is reduced by 18.60%-24.80% when instance-relevant combination is applied at T=15.

TABLE III. EXPERIMENTAL RESULTS ON CLEVELAND DATASET


Round | Error     | I-R (T=5) | Mean (T=5) | Med (T=5) | I-R (T=10) | Mean (T=10) | Med (T=10) | I-R (T=15) | Mean (T=15) | Med (T=15) | k-NN
1     | MRE (%)   | 29.13     | 33.35      | 32.30     | 29.09      | 33.02       | 31.26      | 27.88      | 33.19       | 30.79      | 34.25
1     | Std. Dev. | 0.4275    | 0.4320     | 0.4970    | 0.3916     | 0.4288      | 0.4837     | 0.3752     | 0.4209      | 0.4831     | 0.4637
2     | MRE (%)   | 31.19     | 35.78      | 33.93     | 29.01      | 35.60       | 33.67      | 28.78      | 35.22       | 33.47      | 36.27
2     | Std. Dev. | 0.4936    | 0.4937     | 0.5544    | 0.4482     | 0.4876      | 0.5426     | 0.4418     | 0.4642      | 0.5263     | 0.5831
3     | MRE (%)   | 25.91     | 33.43      | 32.06     | 25.80      | 32.89       | 30.54      | 25.81      | 32.85       | 31.31      | 34.32
3     | Std. Dev. | 0.3922    | 0.3931     | 0.4046    | 0.3936     | 0.3799      | 0.4024     | 0.3922     | 0.3692      | 0.4353     | 0.4919

Remarkable improvement is shown in Table IV for the third public dataset, cloud, which contains 108 instances and 7 attributes. Unlike the previous two datasets, there is not much difference in accuracy among the three combination mechanisms; the instance-relevant rule is only slightly better than the mean and the median. Nevertheless, all of these rules improve on the standard k-NN prediction significantly: the reduction of the mean relative error ranges from 42.63% to 46.11% with instance-relevant combination when 15 base k-NN predictors are available.

TABLE IV. EXPERIMENTAL RESULTS ON CLOUD DATASET


Round | Error     | I-R (T=5) | Mean (T=5) | Med (T=5) | I-R (T=10) | Mean (T=10) | Med (T=10) | I-R (T=15) | Mean (T=15) | Med (T=15) | k-NN
1     | MRE (%)   | 12.30     | 12.47      | 12.67     | 11.89      | 12.12       | 12.14      | 11.79      | 12.22       | 12.29      | 20.55
1     | Std. Dev. | 0.1028    | 0.1074     | 0.1087    | 0.0991     | 0.1026      | 0.1051     | 0.0986     | 0.1036      | 0.1068     | 0.1453
2     | MRE (%)   | 11.89     | 12.28      | 12.43     | 11.97      | 12.29       | 12.20      | 11.84      | 12.09       | 12.37      | 21.53
2     | Std. Dev. | 0.1062    | 0.1085     | 0.1124    | 0.1074     | 0.1095      | 0.1119     | 0.1078     | 0.1080      | 0.1126     | 0.1652
3     | MRE (%)   | 12.68     | 12.78      | 13.11     | 12.65      | 12.62       | 12.73      | 12.34      | 12.57       | 12.50      | 22.90
3     | Std. Dev. | 0.1068    | 0.1042     | 0.1065    | 0.1058     | 0.1041      | 0.1062     | 0.1058     | 0.1033      | 0.1071     | 0.1629

Finally, Table V presents the average results on each test dataset over the three experimental rounds. It can be seen that the performance of the instance-relevant combination substantially exceeds both the other two combination rules and the traditional k-NN prediction. Moreover, the relative accuracy of the mean and median combinations changes with the dataset: on the auto93 and cloud datasets the mean combination outperforms the median, whereas on the Cleveland dataset the latter is better. On all three datasets, the standard deviation of the median combination is greater than that of the mean. More base k-NN predictors are preferable when sufficiently accurate prediction is required; as the average results in Table V show clearly, a larger T always brings better performance. Meanwhile, higher computational cost is inevitable, so in practice a trade-off has to be made between performance and cost.

TABLE V. AVERAGE OF THREE ROUNDS


Dataset   | Error     | I-R (T=5) | Mean (T=5) | Med (T=5) | I-R (T=10) | Mean (T=10) | Med (T=10) | I-R (T=15) | Mean (T=15) | Med (T=15) | k-NN
auto93    | MRE (%)   | 18.21     | 18.45      | 19.32     | 16.71      | 17.93       | 18.41      | 16.33      | 18.05       | 18.28      | 20.27
auto93    | Std. Dev. | 0.1367    | 0.1372     | 0.1461    | 0.1354     | 0.1395      | 0.1425     | 0.1259     | 0.1370      | 0.1405     | 0.1536
Cleveland | MRE (%)   | 28.74     | 34.19      | 32.76     | 27.97      | 33.84       | 31.82      | 27.49      | 33.75       | 31.86      | 34.95
Cleveland | Std. Dev. | 0.4378    | 0.4396     | 0.4853    | 0.4111     | 0.4321      | 0.4762     | 0.4031     | 0.4181      | 0.4816     | 0.5129
cloud     | MRE (%)   | 12.29     | 12.51      | 12.74     | 12.17      | 12.34       | 12.36      | 11.99      | 12.29       | 12.39      | 21.66
cloud     | Std. Dev. | 0.1053    | 0.1067     | 0.1092    | 0.1041     | 0.1054      | 0.1077     | 0.1041     | 0.1020      | 0.1089     | 0.1578

IV. CONCLUSIONS

Standard k-NN learning predicts an unknown instance according to a fixed number of nearest neighbors, without considering its distinctive characteristics; hence the overall accuracy is not always satisfactory. To address this problem, we propose a bagging-based k-NN numeric prediction algorithm. Under this scheme, a set of base k-NN predictors is established first, and each predictor then estimates the unknown instance and gives its individual output. An instance-relevant combination rule is presented to replace the regular mean and median calculations, so that the characteristics of both the current unknown instance and the base predictors are fully taken into account: the base k-NN predictors that agree with the current unknown instance are assigned heavy weights. Our ensemble k-NN prediction algorithm is suitable for all kinds of attributes, whether numeric, categorical or mixed. The experimental results on public datasets show the improvement of the bagging-based k-NN prediction over the existing k-NN algorithm.

ACKNOWLEDGMENT

This work was supported by the National Science Foundation of China under Grant No. 90718024 and the Science Foundation of Xi'an Jiaotong University under No. xjj2009052.

REFERENCES

[1] E. H. Han, G. Karypis, and V. Kumar, "Text categorization using weight adjusted k-nearest neighbor classification," Proc. of the 5th Pacific-Asia Conference on Knowledge Discovery and Data Mining, Springer-Verlag, pp. 53-65, 2001.
[2] M. H. Tahir, P. Bouridane, and F. Kurugollu, "Simultaneous feature selection and feature weighting using hybrid Tabu search/k-nearest neighbor classifier," Pattern Recognition Lett., vol. 28(4), pp. 438-446, 2007.
[3] T. Yamada, N. Ishii, and T. Nakashima, "Text classification by combining different distance functions with weights," IEEJ Trans. Electron. Inf. Syst., vol. 127(12), pp. 2077-2085, 2007.
[4] L. Yang, C. Zuo, and Y. G. Wang, "k-nearest neighbor classification based on semantic distance," Chinese J. Software, vol. 16(12), pp. 2054-2062, 2005.
[5] B. Borsato, A. Plastino, and L. Merschmann, "k-NN: estimating an adequate value for parameter k," Proc. of the 10th International Conference on Enterprise Information Systems, Barcelona, Spain, pp. 459-466, 2008.
[6] P. C. Huang, "A novel gray-based reduced NN classification method," Pattern Recogn., vol. 39(11), pp. 1979-1986, 2006.
[7] H. V. Jagadish, B. C. Ooi, and K. L. Tan, "iDistance: an adaptive B+-tree based indexing method for nearest neighbor search," ACM Trans. Database Syst., vol. 30(2), pp. 364-397, 2005.
[8] I. Turkoglu and E. D. Kaymaz, "A hybrid method based on artificial immune system and k-NN algorithm for better prediction of protein cellular localization sites," Appl. Soft Comput. J., vol. 9(2), pp. 497-502, 2009.
[9] L. Breiman, "Bagging predictors," Mach. Learn., vol. 24(2), pp. 123-140, 1996.
[10] L. Breiman, "Bias, variance, and arcing classifiers," Technical Report 460, Department of Statistics, University of California at Berkeley, 1996.
[11] I. Guyon and A. Elisseeff, "An introduction to variable and feature selection," J. Mach. Learn. Res., vol. 3, pp. 1157-1182, 2003.
[12] University of Waikato, "Weka Machine Learning Project," http://www.cs.waikato.ac.nz/ml/weka/index_datasets.html, 2009-06.
