
An Improved k-Nearest Neighbor Algorithm and Its Application to High Resolution Remote Sensing Image Classification

Ying Li, Bo Cheng


Center for Earth Observation and Digital Earth, Chinese Academy of Sciences
Beijing, China

Abstract- K-nearest neighbor (KNN) is a common classification
method in data mining. It has been widely used in many fields
because of its simple implementation, clear theory and excellent
classification performance. However, the classification error rate
of KNN increases when the training samples are unevenly
distributed or when the number of samples differs greatly between
classes. Drawing on the idea of clipping-KNN, this paper adopts an
improved KNN classification algorithm and applies it to
object-oriented classification of high resolution remote sensing
images.

First, image objects are obtained through image segmentation and
serve as sample points. Second, the original KNN, clipping-KNN and
the improved KNN are introduced, and each is used to classify those
sample points. Finally, the classification results are compared.
The experiment shows that, with the same training set and test set,
the improved KNN algorithm achieves higher accuracy in the
classification of high resolution remote sensing images.

Keywords- KNN classification; high resolution remote sensing
image; object-oriented; segmentation

Object-oriented classification is now an inevitable trend in the
processing of high resolution remote sensing images. This paper
uses SPOT5 image data and completes segmentation and feature
extraction with the help of the eCognition software. Then,
addressing the defects of the traditional KNN method and drawing on
the idea of clipping-KNN, the paper applies an improved KNN
algorithm to the classification of the image objects obtained by
segmentation. The experiment shows that the improved KNN algorithm
performs better and that its application to object-oriented
classification of high resolution remote sensing images is feasible
and advisable.

I. SEGMENTATION AND FEATURE EXTRACTION

Image segmentation is a necessary step in object-oriented
classification. As the resolution of remote sensing images
improves, more and more spatial information becomes available. Both
spectral and spatial information must be considered throughout the
segmentation process. In this paper, segmentation is completed with
eCognition.

After strict image registration, the panchromatic and
multi-spectral images are imported into eCognition at the same
time, and the segmentation parameters are shown in Fig.1.

Figure 1. Segmentation parameters

As shown in Fig.1, Layer ni, Layer r, Layer g and Layer swir are
the four bands of the multi-spectral image, and Layer pan is the
panchromatic image. Usually a band that is involved in segmentation
gets a weight of 1, otherwise 0. The scale parameter, which mainly
depends on user needs, image resolution and the complexity of the
geographical entities [1], is usually obtained through a number of
tests and comparisons. The color weight can be set to 0.9 or 0.8
and is generally not less than 0.7. Smoothness and compactness are
the two parameters of the shape feature. Smoothness describes how
smooth the region borders are after segmentation, while compactness
favors more compact regions. For example, a circle has the highest
smoothness and a rectangle has the highest compactness.

After segmentation, some spectral and spatial features are
extracted by eCognition [2] to describe the image objects and are
recorded in a file. Table.1 shows part of the records. Each line
stands for an image object, which is also called a sample point.
The following six features constitute a pattern vector.

TABLE I. FEATURE RECORDS OF IMAGE OBJECTS

II. AN IMPROVED K-NEAREST NEIGHBOR CLASSIFIER

The original KNN algorithm was proposed by Cover and Hart in
1967 [3]. The KNN classifier is a very simple non-parametric
classification method. Despite its simplicity, the algorithm
performs very well and is an important benchmark method [4].
Because of its clear principles and excellent classification
performance, it has found a very wide range of applications.

A. KNN Decision rules

Assumed conditions: there are $C$ classes, marked as
$1, 2, \ldots, C$. Each class has training samples with the same
class label, and $x$ is a test sample.

The KNN method mainly depends on $K$, the number of nearest
neighbors. The decision rules can be described as follows.

When $K = 1$, the KNN method is known as the NN (nearest neighbor)
method. First, calculate the Euclidean distances between the test
sample $x$ and all training samples. Second, find the nearest
neighbor, that is, the training sample nearest to $x$. Finally,
give $x$ the same class label as its nearest neighbor.

When $K \neq 1$, the NN algorithm extends to the KNN algorithm. KNN
looks for the $K$ nearest neighbors of $x$. If, among these $K$
nearest neighbors, the samples belonging to class $i$ are the most
numerous, $x$ is given the class label $i$.

When the training samples are unevenly distributed or the number
of samples differs greatly between classes, KNN is liable to make
wrong decisions. For example, in Fig.2 the test sample in fact
belongs to class A, but it will be assigned to class B when $K$ is
greater than or equal to 9.

Figure 2. KNN decision rules

B. Clipping-KNN algorithm

A classifier is designed from its training samples, so the quality
of the training samples has a direct impact on classifier
performance. Based on this idea, the Clipping-KNN algorithm clips
the training set and gets rid of those outlier samples that are not
conducive to classifier design. It includes two steps: clipping and
classification.

Suppose the training set is $X = [x_1, x_2, \ldots, x_N]$:

Step1: Classify $X$ through the KNN decision rules, then delete the
wrongly classified samples; the remaining samples form a new
training set $X^{*}$.

Step2: Classify the test samples through the new training set
$X^{*}$ and the same decision rules.

This clipping-KNN algorithm was proposed by Devijver and Kittler in
1982 [5]. As the unreliable samples are removed, the classification
error rate can be gradually reduced.

C. Improved KNN algorithm [6]

The KNN decision rules are mainly based on the number of training
samples among the $K$ nearest neighbors of a test sample. This
means that each of these $K$ training samples, whether a reliable
one located in the center of a class or an unreliable one located
on the border, makes the same contribution to the decision.

Clipping-KNN does delete some unreliable border samples. However,
on the one hand, some important border samples, which usually
reflect details, may be lost for good, which reduces the
performance of the classifier. On the other hand, the KNN algorithm
achieves optimal performance only when the number of samples tends
to infinity [2]. With a small training set the training samples are
already scarce, and Clipping-KNN further lowers the ratio of sample
number to sample dimension, which makes classification more
difficult.

To address these problems, this paper introduces an improved KNN
algorithm. The basic idea is as follows. Classify the training
samples by the KNN decision rules, then assign a different weight
to each sample. A sample in full compliance with the decision rules
gets the highest weight, and a sample totally contrary to the rules
gets the lowest weight instead of being deleted. In the
classification of the test set, the same KNN decision rules are
still used, but the decision is based on the weight sum of the
samples from each class among the $K$ nearest neighbors of a test
sample, rather than on the number of samples. Details of the
improved algorithm are as follows.

For a problem with $J$ classes, marked as $1, 2, \ldots, J$, let
$X = [x_1, x_2, \ldots, x_N]$ be the training set, where $N$ is the
number of samples.

First, $X$ is classified by the KNN decision rules. Suppose sample
$x_i$ comes from class $j$ and $n_j$ of its $K$ nearest neighbors
belong to the same class.
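
Both the clipping step of Section II.B and the first step of the improved algorithm apply the KNN decision rule to the training samples themselves. A minimal Python sketch of that shared step is given here, under assumptions that the paper does not state: a training sample is excluded from its own neighbor list, ties in the vote go to the smallest class label, and the function names and the toy data are purely illustrative.

```python
import numpy as np


def knn_indices(train_x, query, k, exclude=None):
    """Indices of the k training samples nearest to `query` (Euclidean)."""
    dist = np.linalg.norm(train_x - query, axis=1)
    if exclude is not None:
        dist[exclude] = np.inf  # assumption: a sample is not its own neighbor
    return np.argsort(dist, kind="stable")[:k]


def knn_predict(train_x, train_y, query, k, exclude=None):
    """Plain KNN decision rule: majority class among the k nearest neighbors."""
    idx = knn_indices(train_x, query, k, exclude)
    labels, counts = np.unique(train_y[idx], return_counts=True)
    return labels[np.argmax(counts)]  # assumption: ties go to the smallest label


def same_class_counts(train_x, train_y, k):
    """For each training sample x_i, count n_j: neighbors sharing its class."""
    counts = np.empty(len(train_x), dtype=int)
    for i in range(len(train_x)):
        idx = knn_indices(train_x, train_x[i], k, exclude=i)
        counts[i] = np.sum(train_y[idx] == train_y[i])
    return counts


def clip_training_set(train_x, train_y, k):
    """Clipping-KNN step 1: drop training samples that KNN misclassifies."""
    keep = np.array([
        knn_predict(train_x, train_y, train_x[i], k, exclude=i) == train_y[i]
        for i in range(len(train_x))
    ])
    return train_x[keep], train_y[keep]


# Toy 1-D data, made up for illustration: two clusters and one stray
# class 2 sample (3.4) that lies inside the class 1 cluster.
X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0],
              [10.0], [11.0], [12.0], [13.0], [3.4]])
y = np.array([1, 1, 1, 1, 1, 2, 2, 2, 2, 2])

n_same = same_class_counts(X, y, k=3)
X_clipped, y_clipped = clip_training_set(X, y, k=3)
```

On this toy data, `n_same` is `[3, 3, 2, 2, 2, 3, 3, 3, 3, 0]`, and clipping drops only the stray sample at 3.4. That is the kind of border sample the improved algorithm keeps with a low weight rather than deleting.
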
The weight of $x_i$ is then defined as the fraction of its $K$
nearest neighbors that belong to its own class $j$:

$$ w_{x_i}^{(j)} = \frac{n_j}{K}, \quad j = 1, 2, \ldots, J \tag{1} $$

To prevent a zero value of $w_{x_i}^{(j)}$, formula (1) can be
modified as

$$ w_{x_i}^{(j)} = \alpha_j + (1 - \alpha_j) \cdot \frac{n_j}{K}, \quad j = 1, 2, \ldots, J \tag{2} $$

In formula (2), $\alpha_j$ is a constant and $0 < \alpha_j < 1$. If
$n_j = 0$, sample $x_i$ gets the lowest weight $\alpha_j$. If
$n_j = K$, sample $x_i$ gets the highest weight 1.

Second, the test set is classified by the KNN decision rules after
the weight of every training sample has been obtained. If, among
the $K$ nearest neighbors of the test sample $x$, there are $n_j$
training samples from class $j$, the discriminant function is given
by

$$ g_j(x) = \sum_{l=1}^{n_j} w_{x_l}^{(j)}, \quad j = 1, 2, \ldots, J \tag{3} $$

If $g_t(x) = \max_j g_j(x)$, $j = 1, 2, \ldots, J$, the class label
of $x$ is $t$.

It is thus clear that the $\alpha_j$ value avoids the loss of
useful information and that the weight reflects the distribution
density of intraclass samples around a given sample. The more
intraclass samples there are, the greater the weight, and vice
versa. In this way, samples located on the border of a class
receive a smaller weight, and samples in the center of the class
receive a greater weight. In the classification of test samples,
the number of samples from each class is replaced by the weight
sum, which uses more prior knowledge of the training set, including
not only the distribution relationship among training samples but
also the relationship between training samples and test samples.

Particularly in the case of a small training set, this algorithm
can effectively compensate for the small number of samples and the
limited available information. The $\alpha$ value can differ from
class to class. When the number of samples per class is unbalanced,
a class with fewer samples can be given a larger $\alpha$ so that
its samples get higher weights, which improves the classification
accuracy of this class.

III. EXPERIMENT AND CONCLUSION

Fig.3 is the original image, which will be divided into five
classes: class ROAD marked as 1, class GREENLAND marked as 2, class
WATER marked as 3, class BARELAND marked as 4 and class SETTLEMENT
marked as 5.

Figure 3. Original image

1). The original image is segmented into image objects through the
method introduced in Section I. Then six features are extracted for
every object. These objects constitute the test set shown in
Table.2, and our purpose is to get the predicted class labels.

TABLE II. A PART OF TEST SET RECORDS

2). The training set shown in Table.3 contains 436 samples in
total: 42 from class ROAD, 166 from class GREENLAND, 72 from class
WATER, 53 from class BARELAND and 103 from class SETTLEMENT.
Because the sample numbers are so unbalanced, the $\alpha$ values
of the five classes are set to 0.5, 0.1, 0.3, 0.4 and 0.2
respectively.

TABLE III. A PART OF TRAINING SET RECORDS

3). Test samples are classified by KNN, Clipping-KNN and the
improved KNN. The classification accuracy for different values of
$K$ is shown in Fig.4.
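
As a rough illustration of formulas (2) and (3), the improved classifier could be sketched in Python as follows. This is not the authors' implementation: the function names, the exclusion of a sample from its own neighbor list, the tie-breaking and the toy data are assumptions made for the example. The two α values, 0.5 and 0.1, are simply borrowed from two of the experimental settings.

```python
import numpy as np


def nearest(train_x, query, k, exclude=None):
    """Indices of the k training samples nearest to `query` (Euclidean)."""
    dist = np.linalg.norm(train_x - query, axis=1)
    if exclude is not None:
        dist[exclude] = np.inf  # assumption: a sample is not its own neighbor
    return np.argsort(dist, kind="stable")[:k]


def sample_weights(train_x, train_y, k, alpha):
    """Formula (2): w = alpha_j + (1 - alpha_j) * n_j / K per training sample."""
    w = np.empty(len(train_x))
    for i in range(len(train_x)):
        idx = nearest(train_x, train_x[i], k, exclude=i)
        n_j = int(np.sum(train_y[idx] == train_y[i]))  # same-class neighbors
        a = alpha[int(train_y[i])]
        w[i] = a + (1.0 - a) * n_j / k
    return w


def improved_knn_predict(train_x, train_y, weights, query, k):
    """Formula (3): class with the largest weight sum among the k neighbors."""
    idx = nearest(train_x, query, k)
    classes = np.unique(train_y)
    g = np.array([weights[idx][train_y[idx] == c].sum() for c in classes])
    return classes[np.argmax(g)]


def plain_knn_predict(train_x, train_y, query, k):
    """Original rule for comparison: majority class among the k neighbors."""
    labels, counts = np.unique(train_y[nearest(train_x, query, k)],
                               return_counts=True)
    return labels[np.argmax(counts)]


# Toy 1-D data, made up for illustration: class 1 is the smaller class,
# and two class 2 samples (4.2, 5.2) sit on its border.
X = np.array([[0.0], [1.0], [2.0], [3.0],
              [4.2], [5.2], [20.0], [21.0], [22.0], [23.0]])
y = np.array([1, 1, 1, 1, 2, 2, 2, 2, 2, 2])
alpha = {1: 0.5, 2: 0.1}  # hypothetical per-class constants alpha_j

w = sample_weights(X, y, k=3, alpha=alpha)
query = np.array([3.8])
label_plain = plain_knn_predict(X, y, query, k=3)
label_improved = improved_knn_predict(X, y, w, query, k=3)
```

Here the three nearest neighbors of the query are one class 1 sample with weight of about 0.83 and two class 2 border samples with weight 0.4 each. Counting therefore gives class 2, while the weight sum (about 0.83 against 0.8) gives class 1. The data are constructed to show that the two rules can disagree, not to reproduce the reported accuracies.
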

Figure 4. Accuracy evaluation

4). When $K$ is 15 or 20, the improved algorithm obtains its
highest accuracy, and the corresponding classification thematic map
is shown in Fig.5.

Figure 5. Classification results with highest accuracy

It can be seen from our experiment that the improved KNN gets its
highest accuracy, 88%, when $K$ equals 15 or 20. Clipping-KNN
obtains its highest accuracy, 84%, when $K$ is equal to 20. Regular
KNN reaches its highest accuracy, 85%, when $K$ is 15. Overall, the
improved KNN gives better classification results: on average, its
accuracy is 3% higher than that of Clipping-KNN and 3.2% higher
than that of regular KNN. So, the conclusion can be summarized as
follows:

The improved KNN algorithm introduced in this paper has better
classification performance, and its application to object-oriented
classification of high resolution remote sensing images is feasible
and advisable.

REFERENCES
[1] ZHAO Yutao, YU Xinxiao and GUANG Wenbin, “Review on landscape heterogeneity”, Chinese Journal of Applied Ecology, 2002, 13(04): 495-500.
[2] eCognition 4.0 User’s Guide.
[3] Cover T. and Hart P., “Nearest neighbor pattern classification”, IEEE Transactions on Information Theory, 1967, 13: 21-27.
[4] J.K. Shah, B.Y. Smolenski, R.E. Yantorno, and A.N. Iyer, “Sequential k-nearest neighbor pattern recognition for usable speech classification”, European Signal Processing Conference, Vienna, Austria, Sept. 2004.
[5] Devijver P.A. and Kittler J., Pattern Recognition: A Statistical Approach, Prentice-Hall Inc., London, 1982.
[6] Zhang Jing, Qi Chun, Experimental Study of Classifier Technique, Master Dissertation, Xi’an: Xi’an Jiaotong University, 2003.