
Classification of handwritten digits using structural similarity index

Aleksandar Pajkanović, Vladimir Risojević


Faculty of Electrical Engineering Banja Luka
aleksandar.pajkanovic@etfbl.net

Abstract
Today there is no reliable way to translate scanned
handwriting into a text file. In this paper a classifier for
handwritten digits based on the structural similarity index is
described. The classifier is implemented in MATLAB.
Results of the performed experiments demonstrate that the
classifier successfully recognizes digits in 86% of cases.

1 Introduction
There is no reliable automatic method for recognizing
handwriting and translating it into a text file. A solution to
this problem would ease and accelerate the digitization of
handwritten documents, as well as their browsing and
searching. A special case of handwriting recognition in which
significant progress has been achieved is the recognition of
handwritten digits [1, 2, 3]. In this paper a classifier of
handwritten digits is developed, based on a signal fidelity
measure, the structural similarity index.
The standard criterion for measuring signal quality and its
difference from the original (signal fidelity) is the mean
square error. This criterion is used to compare different
signal processing methods and to optimize signal processing
systems. Nonetheless, mean square error does not perform
well, especially for signals that represent speech and images
[4]. Since mean square error is increasingly criticized, other
signal fidelity measures have been proposed. One of them is
the structural similarity index. It has been proposed because
neighboring signal samples (pixels) have strong mutual
dependencies, which are ignored by mean square error. These
dependencies carry important information about the structure
of the objects in the image [4].
Even though it was originally intended as a measure of
image quality, the structural similarity index is a measure of
the similarity of images and can therefore be used to classify
them. Since the structural similarity index yields better
results than mean square error in measuring signal fidelity
[4], we expect that introducing it to the problem of
handwritten digit recognition could also lead to
improvements.
In this paper a nearest neighbor classifier for handwritten
digits, based on the structural similarity index, is
described. We experimentally evaluated the classifier on a
subset of handwritten digits from the MNIST database [5] and
discuss the results of the experiment, which show that the
classifier recognized the test digits in 86% of cases. This
paper is similar to papers [6] and [7], with the difference
that we use a nearest neighbor classifier instead of kernel
methods.

ERK'2011, Portorož, B:329-332

The second section of this paper gives a short overview of
the theory of the structural similarity index. The third
section describes the algorithm for classification of images
of handwritten digits. The fourth section presents and
discusses the results of the experiment in detail.

2 Signal fidelity
A universal measure of signal fidelity that would be
appropriate for all areas of signal processing does not exist
[4], because the appropriate measure depends on the area in
which it is used. The main reason is that human perception
is very subjective. Because of this, in this paper a measure
of signal fidelity is considered appropriate only if it is in
accordance with the whole communication system. The
communication system in this case consists of the natural
image (transmitter) and human visual perception (receiver).
In other words, the measure should rate visually similar
images as similar and visually different images as different.
The structural similarity index is based on the observation
that neighboring pixels of natural images are strongly
dependent on each other, and that these dependencies carry
information about the structure of the objects in the image.
It also relies on the fact that the human visual system is
adapted to extract structural information from the scene it
observes. This is why it is important for a measure of signal
fidelity to capture information about the structure of the
image. Equivalently, a fidelity measure can be built by
measuring structural distortion. The human visual system is
very sensitive to structural distortions (additive noise,
blurring, heavy lossy compression, etc.), while it adapts
very well to distortions that do not change the structure of
the image (changes of illumination, brightness or a spatial
shift), so a measure of image fidelity needs to reproduce
both properties [4].
2.1 Structural similarity index
Let two images be compared, and let x be a set of pixels
from the first image and y a set of pixels from the second
image, taken at the same coordinates in their respective
images. The local structural similarity index measures three
properties of these two sets: similarity of brightness
l(x,y), of contrast c(x,y) and of local structure s(x,y).
Combining these local values yields the structural similarity
index:

$$S(x, y) = l(x, y) \cdot c(x, y) \cdot s(x, y) = \frac{2\mu_x\mu_y + C_1}{\mu_x^2 + \mu_y^2 + C_1} \cdot \frac{2\sigma_x\sigma_y + C_2}{\sigma_x^2 + \sigma_y^2 + C_2} \cdot \frac{\sigma_{xy} + C_3}{\sigma_x\sigma_y + C_3}, \qquad (1)$$

where $\mu_x$ and $\mu_y$ are the mean values, $\sigma_x$ and
$\sigma_y$ the local standard deviations of the sets x and y,
respectively, and $\sigma_{xy}$ is their cross-correlation
after removing the means. $C_1$, $C_2$ and $C_3$ are small
positive constants which stabilize each of the three terms,
so that values close to zero do not lead to numerical errors
during calculation [4].
The structural similarity index is a symmetric function,
$S(x, y) = S(y, x)$, so the result of comparing two images
does not depend on their order. The value of the index is
bounded: $-1 < S(x, y) \le 1$, where the maximum is achieved
only if $x = y$. The index is calculated locally, within a
window which slides pixel by pixel over the whole image, and
the obtained values are then averaged [4].
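The calculation described above can be sketched in Python with NumPy. This is only an illustration of equation (1), not the MATLAB code used in the paper: the function name `ssim_index`, the window size of 8 pixels and the values of the constants are assumptions, chosen for images whose intensities are scaled to [0, 1].

```python
import numpy as np

def ssim_index(img_x, img_y, win=8, c1=1e-4, c2=1e-4, c3=1e-4):
    """Mean local structural similarity of two equally sized grayscale images.

    win, c1, c2 and c3 are illustrative choices, not values from the paper.
    """
    x = np.asarray(img_x, dtype=float)
    y = np.asarray(img_y, dtype=float)
    if x.shape != y.shape:
        raise ValueError("images must have the same shape")
    rows, cols = x.shape
    values = []
    # Slide the window pixel by pixel over the whole image.
    for r in range(rows - win + 1):
        for c in range(cols - win + 1):
            px = x[r:r + win, c:c + win]
            py = y[r:r + win, c:c + win]
            mu_x, mu_y = px.mean(), py.mean()
            sd_x, sd_y = px.std(), py.std()
            cov_xy = ((px - mu_x) * (py - mu_y)).mean()
            # The three terms of equation (1).
            lum = (2 * mu_x * mu_y + c1) / (mu_x**2 + mu_y**2 + c1)
            con = (2 * sd_x * sd_y + c2) / (sd_x**2 + sd_y**2 + c2)
            struct = (cov_xy + c3) / (sd_x * sd_y + c3)
            values.append(lum * con * struct)
    # Average the local values over all window positions.
    return float(np.mean(values))
```

Both images have to be at least `win` pixels wide and high, otherwise no window fits and the mean is undefined.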
Even though the index is quite simple, this signal fidelity
measure achieves very good results over a wide range of
distortions. In [4] it is shown that the values of the index
are far more consistent with human perception than those of
mean square error. Distortions that do not change the
structure of the image, such as changes of brightness and
contrast, yield high values of the structural similarity
index, while structural distortions yield low values.
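A small numerical experiment can illustrate this difference. The sketch below is an assumption-laden toy example, not an experiment from the paper: it uses a synthetic low-contrast image, evaluates equation (1) once over the whole image instead of in a sliding window, and compares a brightness shift with additive noise scaled to exactly the same mean square error. All names and parameter values are hypothetical.

```python
import numpy as np

def mse(x, y):
    """Mean square error between two images."""
    return float(np.mean((x - y) ** 2))

def global_ssim(x, y, c1=1e-4, c2=1e-4, c3=1e-4):
    """Equation (1) evaluated once over the whole image (no sliding window)."""
    mu_x, mu_y = x.mean(), y.mean()
    sd_x, sd_y = x.std(), y.std()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    lum = (2 * mu_x * mu_y + c1) / (mu_x**2 + mu_y**2 + c1)
    con = (2 * sd_x * sd_y + c2) / (sd_x**2 + sd_y**2 + c2)
    struct = (cov_xy + c3) / (sd_x * sd_y + c3)
    return float(lum * con * struct)

rng = np.random.default_rng(0)
rows, cols = np.mgrid[0:32, 0:32]
# Synthetic smooth test pattern around a mid-gray level.
original = 128.0 + 10.0 * np.sin(rows / 3.0) * np.cos(cols / 5.0)

# Distortion 1: uniform brightness shift (structure preserved).
brighter = original + 20.0

# Distortion 2: additive noise rescaled to the same mean square error.
noise = rng.standard_normal(original.shape)
noise *= 20.0 / np.sqrt(np.mean(noise ** 2))
noisy = original + noise

print("MSE :", mse(original, brighter), mse(original, noisy))
print("SSIM:", global_ssim(original, brighter), global_ssim(original, noisy))
```

The two distorted images are indistinguishable by mean square error, while the index separates them clearly, with the brightness shift scoring much closer to one than the noisy image.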
2.2 Structural similarity index in the complex wavelet domain
The structural similarity index as a measure of image
similarity has a flaw: it is very sensitive to relative
translation, scaling and rotation of the images. Since small
geometric distortions do not affect the structure of the
image, this sensitivity is undesirable. The flaw is removed
by computing the structural similarity index in the complex
wavelet domain. This is justified by the fact that the local
phase contains more information about the structure of the
image than its
magnitude [8]. In the complex wavelet domain, let
$c_x = \{c_{x,i} \mid i = 1, 2, \ldots, N\}$ and
$c_y = \{c_{y,i} \mid i = 1, 2, \ldots, N\}$ be two sets of
coefficients taken from the same spatial coordinates within
the same wavelet subbands of the two images being compared,
x and y, respectively. Then the structural similarity index
of the images in the complex wavelet domain is given as:

$$\tilde{S}(c_x, c_y) = \tilde{S}_m(c_x, c_y) \cdot \tilde{S}_p(c_x, c_y) = \frac{2\sum_{i=1}^{N} |c_{x,i}||c_{y,i}| + K}{\sum_{i=1}^{N} |c_{x,i}|^2 + \sum_{i=1}^{N} |c_{y,i}|^2 + K} \cdot \frac{2\left|\sum_{i=1}^{N} c_{x,i} c_{y,i}^{*}\right| + K}{2\sum_{i=1}^{N} |c_{x,i} c_{y,i}^{*}| + K}, \qquad (2)$$

where the asterisk (*) denotes the complex conjugate, and
K is a small positive constant which stabilizes the equation
[9]. The first component, $\tilde{S}_m(c_x, c_y)$, is
determined by the magnitudes of the coefficients, and its
maximum value, one, is achieved only if
$|c_{x,i}| = |c_{y,i}|$ for every i. This term is therefore
equivalent to applying the structural similarity index to the
magnitudes of the coefficients. The second component,
$\tilde{S}_p(c_x, c_y)$, is determined by the consistency of
phase changes between $c_x$ and $c_y$. Its maximum value,
one, is achieved only if the phase difference between
$c_{x,i}$ and $c_{y,i}$ is constant for every i. The phase
component conveys information about the structural similarity
of the images correctly, because local image structure is
encoded in the relative phase patterns of the local
frequencies of the image, whereas a constant phase offset of
all coefficients does not change the structure of the local
image pattern. Since $|c_{x,i} c_{y,i}^{*}| = |c_{x,i}||c_{y,i}|$,
the numerator of the first component cancels the denominator
of the second. In this paper the index in the complex wavelet
domain is therefore calculated using the equation given in
[9]:

$$\tilde{S}(c_x, c_y) = \frac{2\left|\sum_{i=1}^{N} c_{x,i} c_{y,i}^{*}\right| + K}{\sum_{i=1}^{N} |c_{x,i}|^2 + \sum_{i=1}^{N} |c_{y,i}|^2 + K}. \qquad (3)$$

The structural similarity index in the complex wavelet domain
is calculated locally for every level of decomposition. The
local values are then averaged over spatial coordinates and
over decomposition levels, which gives the total structural
similarity index between the original and the distorted
image. This approach is more robust than the standard
structural similarity index, against both structural
distortions and distortions that do not affect the structure
[4], so it is used in the experiment in this paper. In the
rest of the paper the structural similarity index is always
calculated in the complex wavelet domain.
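The local computation in equation (3) can be sketched as follows. The sketch assumes that the complex wavelet coefficients of one local window are already available as arrays; it does not perform the pyramidal decomposition or the averaging over positions and levels. The function name `cw_ssim_local`, the value of K and the window of 49 random coefficients are hypothetical.

```python
import numpy as np

def cw_ssim_local(cx, cy, k=0.01):
    """Equation (3) for two sets of complex coefficients from one window.

    k plays the role of the stabilizing constant K; its value is illustrative.
    """
    cx = np.asarray(cx, dtype=complex).ravel()
    cy = np.asarray(cy, dtype=complex).ravel()
    numerator = 2.0 * np.abs(np.sum(cx * np.conj(cy))) + k
    denominator = np.sum(np.abs(cx) ** 2) + np.sum(np.abs(cy) ** 2) + k
    return float(numerator / denominator)

rng = np.random.default_rng(1)
cx = rng.standard_normal(49) + 1j * rng.standard_normal(49)

# Identical coefficients.
same = cw_ssim_local(cx, cx)
# The same constant phase offset applied to every coefficient.
rotated = cw_ssim_local(cx, cx * np.exp(1j * 0.7))
# A different random phase offset for every coefficient.
scrambled = cw_ssim_local(cx, cx * np.exp(1j * rng.uniform(0, 2 * np.pi, 49)))

print(same, rotated, scrambled)
```

A constant phase offset leaves the index at its maximum, whereas inconsistent phase changes lower it, which is the behavior of the phase component described above.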

3 Classification using structural similarity index


Following a paper from the same area [6], we use the MNIST
database of images of handwritten digits, created by Corinna
Cortes and Yann LeCun [5]. This makes it possible to compare
the results obtained by different algorithms for
classification of handwritten digits.
The database consists of two subsets, training and test.
Besides the two files which contain the images of handwritten
digits, the database includes two files which contain the
labels of those images. Details about the database and its
usage are available at the website of its authors [5].
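For illustration, the image and label files can be decoded with a few lines of Python. The sketch relies on the IDX layout documented at [5]: a big-endian header of 32-bit integers (magic number 2051 followed by the image count, rows and columns for image files; magic number 2049 followed by the item count for label files), followed by unsigned bytes. The function names are hypothetical, and this is not the loading code used in the paper.

```python
import struct
import numpy as np

def parse_idx_images(raw):
    """Decode the bytes of an IDX image file into an array (count, rows, cols)."""
    magic, count, rows, cols = struct.unpack(">IIII", raw[:16])
    if magic != 2051:
        raise ValueError("not an IDX image file")
    pixels = np.frombuffer(raw, dtype=np.uint8, count=count * rows * cols, offset=16)
    return pixels.reshape(count, rows, cols)

def parse_idx_labels(raw):
    """Decode the bytes of an IDX label file into a vector of digit labels."""
    magic, count = struct.unpack(">II", raw[:8])
    if magic != 2049:
        raise ValueError("not an IDX label file")
    return np.frombuffer(raw, dtype=np.uint8, count=count, offset=8)
```

The files are distributed in compressed form, so the raw bytes would typically be obtained with something like `gzip.open(path, "rb").read()`, where `path` stands for the location of a downloaded file.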
Since we use the nearest neighbor rule for classification,
there is no training phase, and we refer to the training and
test images as labeled and unlabeled, respectively. In the
experiment in this paper two hundred labeled and one hundred
unlabeled images from the MNIST database are used.
First we examine whether classification is possible by
thresholding the value of the structural similarity index. To
that end, the distribution of the values of the index is
examined for pairs of images from the same class and for
pairs of images from different classes. The experiment
consists of comparing (i.e. calculating the structural
similarity index of) each labeled image with each unlabeled
image. A histogram of the obtained values is then created,
Fig. 1. In Fig. 1 the abscissa represents the similarity
value and the ordinate the (normalized) number of
comparisons that resulted in that value.
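The histogram experiment can be sketched as follows, assuming that the pairwise similarity values have already been computed and stored in a matrix with one row per unlabeled image and one column per labeled image. The function name, the number of bins and the assumption that all values lie in [0, 1] are illustrative choices, not details from the paper.

```python
import numpy as np

def similarity_histograms(sim, labeled_y, unlabeled_y, bins=20):
    """Normalized histograms of similarity for same-class and different-class pairs.

    sim[i, j] is the similarity of unlabeled image i and labeled image j,
    assumed to lie in [0, 1]. Both kinds of pairs must occur at least once.
    """
    sim = np.asarray(sim, dtype=float)
    # Boolean mask: True where the two images of a pair show the same digit.
    same = np.asarray(unlabeled_y)[:, None] == np.asarray(labeled_y)[None, :]
    edges = np.linspace(0.0, 1.0, bins + 1)
    h_same, _ = np.histogram(sim[same], bins=edges)
    h_diff, _ = np.histogram(sim[~same], bins=edges)
    # Normalize each histogram so that its bars sum to one.
    return h_same / h_same.sum(), h_diff / h_diff.sum(), edges
```

With two hundred labeled and one hundred unlabeled images, `sim` would hold 100 x 200 = 20000 comparisons.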

Fig. 1. Histograms of the results of comparisons of different
digits (dashed line) and of the same digits (full line).

Fig. 2. Confusion matrix of the experiment with the
classifier based on the nearest neighbor principle.
The simplest classifier to implement is a classifier based
on a threshold. A threshold can be defined when there is
little or no overlap between the histograms resulting from
comparisons of samples from different classes and from the
same class. As can be seen from Fig. 1, the histograms
overlap almost completely. This means that it is not possible
to define a threshold value which could serve as the basis
for deciding whether two images belong to the same class.
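One way to quantify this argument is sketched below. Given two normalized histograms over the same bins, the first function measures their shared area and the second sweeps all bin edges as candidate thresholds and reports the best achievable balanced accuracy of the decision "same class if the similarity is above the threshold". Both functions are hypothetical helpers and are not part of the analysis in the paper.

```python
import numpy as np

def histogram_overlap(h_same, h_diff):
    """Shared area of two normalized histograms: 0 = separable, 1 = identical."""
    h_same = np.asarray(h_same, dtype=float)
    h_diff = np.asarray(h_diff, dtype=float)
    return float(np.minimum(h_same, h_diff).sum())

def best_threshold_accuracy(h_same, h_diff):
    """Best balanced accuracy over all thresholds placed at bin edges."""
    h_same = np.asarray(h_same, dtype=float)
    h_diff = np.asarray(h_diff, dtype=float)
    # For a threshold at edge k, bins k and above are decided as "same class".
    scores = [0.5 * (h_same[k:].sum() + h_diff[:k].sum())
              for k in range(len(h_same) + 1)]
    return float(max(scores))
```

For histograms that overlap almost completely, the overlap approaches one and the best balanced accuracy approaches 0.5, i.e. a threshold decides no better than chance.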
In accordance with the previous analysis, the classifier is
based on the nearest neighbor principle. Such a classifier
has some attractive features: (1) it allows working with a
large number of classes and adding new classes and labeled
examples, (2) it does not require training and (3) it has no
problems with overfitting.
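The nearest neighbor rule itself takes only a few lines. The sketch below assumes a precomputed similarity matrix with one row per unlabeled image and one column per labeled image; the function name is hypothetical and the code is not the MATLAB implementation of the paper.

```python
import numpy as np

def nearest_neighbor_labels(sim, labeled_y):
    """Assign each unlabeled image the label of its most similar labeled image.

    sim[i, j] is the similarity of unlabeled image i and labeled image j.
    """
    sim = np.asarray(sim, dtype=float)
    labeled_y = np.asarray(labeled_y)
    # The nearest neighbor is the labeled image with the highest similarity.
    return labeled_y[np.argmax(sim, axis=1)]
```

Adding a new class or new labeled examples only appends columns to `sim` and entries to `labeled_y`; nothing has to be retrained.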
The algorithm is implemented in MATLAB, and the code is
available at the website of the first author of this paper
(see footnote 1). The code contains detailed comments. The
pyramidal decomposition of images is computed using functions
of the MatlabPyrTools library [10].

4 Experimental results
During the experiments the number of correct classifications
is counted, which enables later assessment of the quality of
the classifier. The combinations of images for which the
classifier made an error are also recorded. Classification
results are presented in the form of a confusion matrix,
given in Fig. 2. The confusion matrix shows not only how many
times the classifier was wrong, but also in what way, i.e.
which digits were confused with which. It does not, however,
show which exact image of a digit was misclassified. This is
why a third criterion for the assessment of the classifier is
included: the pairs of images for which the classifier made
errors are shown in Fig. 3.

Footnote 1: http://pajkanovic.netne.net/cwssim

In Fig. 2 the diagonal of the matrix is marked in bold
typeface. The diagonal elements are the numbers of correct
classifications. In the ideal case this matrix is diagonal;
since the diagonal values are greater than the other elements
(most of which are zero), the performance of the classifier
can be considered promising. In the described experiment the
classifier correctly classified 86% of the images.
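The relation between the confusion matrix and the reported accuracy can be sketched as follows. The orientation of the matrix (rows for the true digit, columns for the assigned digit) is an assumption, since the paper does not state it, and the function names are hypothetical.

```python
import numpy as np

def confusion_matrix(true_y, predicted_y, n_classes=10):
    """Rows index the true digit, columns the digit chosen by the classifier."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(true_y, predicted_y):
        cm[t, p] += 1
    return cm

def accuracy(cm):
    """Share of correct classifications: sum of the diagonal over all entries."""
    return float(np.trace(cm) / cm.sum())
```

With one hundred unlabeled images, an accuracy of 86% corresponds to a diagonal that sums to 86.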
Fig. 3 shows the exact images of digits which the classifier
misclassified. The first image in each pair is from the
labeled subset, and the second image is from the unlabeled
subset. This means that the second image was wrongly assigned
to the class of the first image.

Fig. 3. The combinations of the images of handwritten digits
which the classifier classified wrongly, pairs (a) to (l).
In most of the shown cases a human would also notice the
similarity between the images within a pair, for example in
cases e, h, i and k. Half of the mistakes consist of
identifying digit four as digit nine and vice versa. For
these pairs (b, h, i, j and k) it is easy to find the reason
for the classifier's decision: both digits are narrow, long
and slanted at the same angle (b, h and k), or the circle of
the digit nine is unfinished or the upper part of the digit
four is unnecessarily round (i and j). Other pairs also show
obvious similarities.
A drawback of the nearest neighbor principle is that the
classifier always assigns an unlabeled image to the class of
one of the labeled images, because a maximum value of the
index always exists, no matter how low it is.

Considering the results of the experiment and the simplicity
of the implemented classifier, the structural similarity
index in the complex wavelet domain gives promising results
in the classification of scanned images of handwritten
digits.

5 Conclusions
In this paper the structural similarity index is presented
in its two forms (sections 2.1 and 2.2), and its mathematical
properties, advantages and flaws as a measure of signal
fidelity are listed. Since it has been shown in the
literature [4], [9] that, in the assessment of the visual
quality of images, the structural similarity index gives the
results most similar to human perception, we decided to use
this index as the basis for a classifier of scanned images of
handwritten digits. The classifier works on the nearest
neighbor principle. The classification algorithm is
implemented in MATLAB.
Assessing the obtained results, we conclude that this
classifier, considering its simplicity, gives promising
results, since its decisions were correct in 86% of the
cases.

Acknowledgments
The authors wish to express their gratitude to Professor
Branimir Reljin, PhD, for his many useful suggestions during
the writing of this paper.

References
[1] Y. LeCun, L. Bottou, Y. Bengio and P. Haffner,
"Gradient-based learning applied to document
recognition", Proceedings of the IEEE, vol. 86, no. 11,
pp. 2278-2324, November 1998.
[2] K. Labusch, E. Barth and T. Martinetz, "Simple Method for
High-Performance Digit Recognition Based on Sparse
Coding", IEEE Transactions on Neural Networks, vol.
19, no. 11, pp. 1985-1989, November 2008.
[3] D. C. Ciresan, U. Meier, L. M. Gambardella and J.
Schmidhuber, "Deep, Big, Simple Neural Nets for
Handwritten Digit Recognition", Neural Computation,
vol. 22, no. 12, pp. 3207-3220, November 2010.
[4] Zhou Wang and Alan C. Bovik, "Mean Squared Error:
Love It or Leave It?", IEEE Signal Processing Magazine,
vol. 26, no. 1, pp. 98-117, January 2009.
[5] http://yann.lecun.com/exdb/mnist/, visited: December
2010.
[6] G. Fan, Z. Wang and J. Wang, "CW-SSIM kernel based
random forest for image classification", Proc. Visual
Communications and Image Processing, Huang Shan,
An Hui, China, July 2010.
[7] Y. Gao, A. Rehman and Z. Wang, "CW-SSIM based
image classification", Proc. of IEEE International
Conference on Image Processing, Brussels, Belgium,
September 2011.
[8] T. Huang, J. Burnett and A. Deczky, "The importance of
phase in image processing filters", IEEE Transactions on
Acoustics, Speech and Signal Processing, vol. 23, no. 6,
pp. 529-542, December 1975.
[9] Mehul P. Sampat, Zhou Wang, Shalini Gupta, Alan
Conrad Bovik and Mia K. Markey, "Complex Wavelet
Structural Similarity: A New Image Similarity Index",
IEEE Transactions on Image Processing, vol. 18, no. 11,
pp. 2385-2401, November 2009.
[10] http://www.cns.nyu.edu/~lcv/index.html, visited:
December 2010.
