
The Role of Size Normalization

on the Recognition Rate of Handwritten Numerals

Chun Lei He, Ping Zhang, Jianxiong Dong, Ching Y. Suen, Tien D. Bui

Centre for Pattern Recognition and Machine Intelligence, Concordia University


Montreal, Quebec, Canada H3G 1M8
{cl_he, pin_zhan, jdong, suen, bui}@cenparmi.concordia.ca

Abstract

Size normalization is an important pre-processing technique in character recognition. Although various effective learning-based methods have been proposed, the role of the original data in a database is often ignored. In this paper, we conduct experiments with neural networks and support vector machines to investigate its effects, and find that the performance of handwritten numeral recognition systems deteriorates dramatically at low size resolution. For the MNIST dataset, this study shows that enlarging the size from 20 * 20 to 26 * 26 by bilinear interpolation improves performance significantly. After constructing a smaller database of difficult original patterns from NIST, we find that normalizing the original data to a size larger than the 20 * 20 used in MNIST increases the recognition rate further.

1. Introduction

Handwriting recognition has been a subject of research for several decades. Generally, a character recognition system includes three main tasks: pre-processing, feature extraction, and classification. In pre-processing, researchers normally perform noise filtering, binarization, thinning [3], skew correction [2], slant normalization [1], etc. to enhance the quality of the images and to correct distortions. In feature extraction, various types of features and extraction techniques are available, such as geometric features, wavelet features [10], etc. In classification, a great number of methods are available, including statistical classifiers, artificial neural networks (ANNs) [9], support vector machines (SVMs) [6], and Multiple Classifier Systems (MCSs) [7][8].

Although correctly selecting a learning algorithm helps to improve the recognition rate, one crucial factor affecting the recognition rate is often ignored. From many observations and experiments, we suspected that the low resolution of the original data could reduce the recognition rates of OCR systems dramatically.

In this paper, we analyze the effect of size normalization on the recognition of handwritten numerals. We normalize images to different sizes, applying the same classifier and features, in order to observe the relationship between size and recognition rate. In pre-processing, we perform only size normalization; gradient features [5] are then extracted from the normalized images. We choose an ANN and an SVM as classifiers for this analysis because they have good learning ability and have exhibited good performance.

We describe pre-processing and feature extraction in Section 2. We then describe recognizing the MNIST test set under different normalization sizes in Section 3. In Section 4, we construct a new but smaller database containing the poorest and most difficult original patterns from NIST, selected based on the images misrecognized at eight different normalization sizes. We compare the error rates of this small database at different sizes and from different sources in Section 5. Finally, we conclude and analyze the effect of size normalization on the recognition of handwritten numerals in Section 6.

2. Pre-processing & Feature Extraction

In size normalization, we keep the aspect ratio of the images and normalize them to bigger sizes. First, we binarize the original image of an MNIST numeral (Figure 1(a)) and cut it to a rectangle with the same height and width as the original pattern (Figure 1(b)).
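As a concrete illustration, this binarize-and-cut step can be sketched in Python with NumPy. This is a hypothetical reconstruction, not the authors' code; in particular, the threshold value of 128 is our own assumption.

```python
import numpy as np

def binarize_and_crop(image, threshold=128):
    """Binarize a grayscale digit image and crop it to the
    bounding box of its foreground (ink) pixels.
    NOTE: the threshold value is an assumption for illustration."""
    binary = (np.asarray(image) >= threshold).astype(np.uint8)
    rows = np.any(binary, axis=1)           # rows containing any ink
    cols = np.any(binary, axis=0)           # columns containing any ink
    top, bottom = np.where(rows)[0][[0, -1]]
    left, right = np.where(cols)[0][[0, -1]]
    return binary[top:bottom + 1, left:right + 1]

# Example: a 28 * 28 canvas with a rectangular blob of "ink"
canvas = np.zeros((28, 28), dtype=np.uint8)
canvas[4:24, 6:20] = 255
cropped = binarize_and_crop(canvas)
print(cropped.shape)  # (20, 14): the tight bounding box
```

The cropped rectangle then keeps the digit's own aspect ratio, which the later enlargement step preserves.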

After that, we enlarge the image to a fixed size (e.g. 26 * 26) using a bilinear interpolation algorithm (Figure 1(c)) [11]. Finally, we place the normalized image at the center of an empty 32 * 32 image (Figure 1(d)), ready for the extraction of gradient features [5]. From each pattern, a feature vector of size 400 (5 horizontal zones, 5 vertical zones, 16 directions) is produced.
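The enlargement and centering steps can be sketched as follows. The bilinear routine below is a generic textbook implementation in pure NumPy, not necessarily the edge-adaptive algorithm of [11], and the sample digit size is illustrative.

```python
import numpy as np

def bilinear_resize(img, new_h, new_w):
    """Enlarge a 2-D image by bilinear interpolation."""
    h, w = img.shape
    ys = np.linspace(0, h - 1, new_h)        # sample positions in the source
    xs = np.linspace(0, w - 1, new_w)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)           # clamp at the border
    x1 = np.minimum(x0 + 1, w - 1)
    fy = (ys - y0)[:, None]                  # fractional offsets
    fx = (xs - x0)[None, :]
    img = img.astype(float)
    top = img[np.ix_(y0, x0)] * (1 - fx) + img[np.ix_(y0, x1)] * fx
    bottom = img[np.ix_(y1, x0)] * (1 - fx) + img[np.ix_(y1, x1)] * fx
    return top * (1 - fy) + bottom * fy

def center_pad(img, size=32):
    """Place an image at the centre of an empty size x size canvas."""
    canvas = np.zeros((size, size))
    h, w = img.shape
    top, left = (size - h) // 2, (size - w) // 2
    canvas[top:top + h, left:left + w] = img
    return canvas

digit = np.ones((20, 14))                    # a cut digit, as in Figure 1(b)
scale = 26 / max(digit.shape)                # keep the aspect ratio
enlarged = bilinear_resize(digit, round(20 * scale), round(14 * scale))
padded = center_pad(enlarged)
print(enlarged.shape, padded.shape)          # (26, 18) (32, 32)
```

Scaling by the longer side keeps the aspect ratio, as the text requires, while the shorter side is padded symmetrically inside the 32 * 32 canvas.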
Figure 1. Sample images in pre-processing: (a) 28 * 28 padded original; (b) 20 * 20 cut image; (c) 26 * 26 enlarged image; (d) 32 * 32 padded image

Normalization size      20*20  22*22  24*24  26*26  28*28  30*30  41*41
Substitution rate (%)   4.56   3.2    2.02   1.15   1.13   1.1    0.94

Figure 2. Substitution rates at different normalization sizes of MNIST with ANN

Normalization size      20*20  22*22  24*24  26*26  28*28  30*30  41*41
Substitution rate (%)   1.02   1      1.01   0.84   0.81   0.79   0.75

Figure 3. Substitution rates at different normalization sizes of MNIST with SVM
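For concreteness, a 400-dimensional zoned directional-gradient feature of the kind described in Section 2 (5 * 5 zones, 16 directions) can be sketched as follows. This is an illustrative reconstruction; the exact gradient operator and binning of [5] may differ.

```python
import numpy as np

def gradient_features(img, grid=5, directions=16):
    """Sketch of a zoned directional-gradient feature vector:
    compute per-pixel gradients, quantize the orientation into
    `directions` bins, and accumulate gradient magnitude over a
    grid x grid zoning of the image."""
    gy, gx = np.gradient(img.astype(float))          # axis-0 then axis-1 gradient
    magnitude = np.hypot(gx, gy)
    angle = np.arctan2(gy, gx)                       # orientation in (-pi, pi]
    bins = (np.floor((angle + np.pi) / (2 * np.pi) * directions)
            .astype(int) % directions)
    h, w = img.shape
    feat = np.zeros((grid, grid, directions))
    zy = np.minimum(np.arange(h) * grid // h, grid - 1)   # zone index per row
    zx = np.minimum(np.arange(w) * grid // w, grid - 1)   # zone index per column
    for i in range(h):
        for j in range(w):
            feat[zy[i], zx[j], bins[i, j]] += magnitude[i, j]
    return feat.ravel()                              # length grid * grid * directions

rng = np.random.default_rng(0)
vec = gradient_features(rng.random((32, 32)))        # a 32 * 32 padded image
print(vec.shape)  # (400,)
```

Applied to the 32 * 32 padded image, this yields the 400-dimensional vector (5 * 5 * 16) used by both classifiers.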
3. Recognizing images in MNIST with different sizes
In this section, we apply two different classifiers, a 3-layer ANN with the back-propagation (BP) algorithm and an SVM, to all the patterns in the MNIST test set, in order to observe the recognition rate at different sizes. The reason for using two classifiers is to ensure that the effect of normalization size on the recognition rate is not an artifact of a particular classifier. In the ANN, the number of nodes in the first layer is 401 (the number of features + 1 bias node); the number of nodes in the hidden layer is 100; and the number of nodes in the output layer is 10, representing the 10 classes. As a result, we find that the recognition rates rise with both the ANN and the SVM when we enlarge the images. The details are shown in Figure 2 and Figure 3.

We first recall some facts about the MNIST database [4], a widely known handwritten digit recognition benchmark. It is a subset of a larger set available from NIST. In MNIST, the digits have been size-normalized and centered in a fixed-size image: the original black and white (bi-level) images from NIST were size-normalized to fit in a 20*20 pixel box while preserving their aspect ratio.

We find that when we increase the normalization size from 20 * 20 to 26 * 26, the substitution rate decreases from 4.56% to 1.15% with the ANN, and from 1.02% to 0.84% with the SVM. When we increase the normalization size from 26 * 26 to 41 * 41, the substitution rate continues to decrease, but the differences are much smaller. Therefore, in our experiments, 26 * 26 is an optimal normalization size. In general, the recognition rate rises when we normalize the images to bigger sizes.

As the images in MNIST have already been normalized once, normalizing them to a bigger size is a second distortion of the originals. Even though we distort (normalize) the images twice, the recognition rates still rise. This suggests that if we normalized the images from the originals directly to a size bigger than 20 * 20, the recognition rate of the entire system would rise further, because the images would then be normalized only once instead of twice.
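The 3-layer network described in this section can be written down directly. In the minimal sketch below, the sigmoid activation and random weight initialization are our own assumptions for illustration (the paper does not specify them), and BP training is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Layer sizes from Section 3: 400 features + 1 bias node -> 100 hidden -> 10 classes.
# The weight scale and activation function are illustrative assumptions.
W1 = rng.normal(0, 0.05, (401, 100))
W2 = rng.normal(0, 0.05, (100, 10))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(features):
    """One forward pass of the 3-layer network (input, hidden, output)."""
    x = np.append(features, 1.0)   # append the bias node -> 401 inputs
    hidden = sigmoid(x @ W1)       # 100 hidden nodes
    scores = hidden @ W2           # 10 output nodes, one per digit class
    return np.argmax(scores)       # predicted digit

pred = forward(np.zeros(400))      # a blank feature vector, just to exercise it
print(0 <= int(pred) <= 9)         # True
```

In the experiments, this network and the SVM [6] are trained once and then applied unchanged to the test images at every normalization size, so any change in substitution rate is attributable to size alone.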

4. Finding the originals to construct a small database

In all, we found 417 substitution (misrecognized) images across the 8 different sizes. In order to create a database with the most difficult cases, we constructed a small database of the 181 images misrecognized at no fewer than 2 different sizes.

The MNIST database was constructed from NIST's Special Database 3 and Special Database 1, which contain binary images of handwritten digits, and NIST Special Database 19 includes both of these databases. Consequently, we should be able to match every normalized image in MNIST to an original image in NIST SD 19. The total number of labeled handwritten characters (digits and letters) in NIST SD 19 is 814,255; its training set contains 344,307 isolated digits, and its test set 58,646.

Since NIST SD 19 is too large to examine image by image manually, we apply template matching with two constraints: one on the number of dissimilar pixels, and one on the aspect ratio, i.e. the ratio between the height and the width of an image.

First, we collected all the images misclassified by the SVM in MNIST at the different sizes and sorted them into the 10 classes (0, 1, ..., 9). Secondly, we loaded one error image of a class. Thirdly, we removed its left, right, bottom and top margins, cutting the image to its real size. Fourthly, we loaded the images in NIST SD 19 and normalized them to the size of the cut image. After that, we matched pairs of images by template matching in order to choose candidate images. Finally, we verified the candidate images using local structures.

Suppose that we have a template g[i, j] and we wish to detect its instances in an image f[i, j]. An obvious approach is to place the template at a location in the image and to detect its presence there by comparing intensity values in the template with the corresponding values in the image. Since it is rare that intensity values match exactly, we require a measure of dissimilarity between them. Here, we take the entire error image as a template and calculate the similarity between error images and original images with formula (4.1):

Σ_{[i,j]∈R} (f − g)^2   (4.1)

where R is the region of the template. In the case of template matching, this measure can be computed indirectly, reducing the computational cost, since it expands as:

Σ_{[i,j]∈R} (f − g)^2 = Σ_{[i,j]∈R} f^2 + Σ_{[i,j]∈R} g^2 − 2 Σ_{[i,j]∈R} f g   (4.2)

Our aim is to find patterns with minimum distance in (4.1), or patterns with distance smaller than a certain threshold. As the terms Σ f^2 and Σ g^2 are fixed, the cross term Σ f g in (4.2) gives a measure of match; thus we only need to find patterns with maximum values of Σ f g.

In the matching procedure, a candidate image has to satisfy two constraints: (i) the number of dissimilar pixels is small, and (ii) the aspect ratios are similar. If an image satisfies (i) and (ii), it is considered a candidate image; otherwise, if no image satisfies both conditions, the bound on K in (i) is enlarged until one or several candidate images are found.

(i) We use K in the following formula to represent the measure of similarity between two images:

K = max Σ f g   (4.3)

Here, K has to satisfy the condition K ≤ (h_substitution * w_substitution) / c1, where h_substitution is the height of the current substituted image, w_substitution is its width, and c1 is a constant. Experimentally, we set c1 to six.

(ii) The difference between the aspect ratio of the original image and that of the current error image should be small:

| r_original − r_template | ≤ c2   (4.4)

where r_original is the aspect ratio of an original image and r_template is the aspect ratio of an error image. Experimentally, we set c2 to 0.1.

When verifying the original images among the candidate images, we consider two situations. If the minimum distance in template matching is very small, we accept the image with the minimum distance as the original. However, if the minimum distance is too big, we need to examine the local structures of all the candidate images in order to find the original image. The reason we keep several candidates instead of a single one is that, during verification, we found two specific situations that occur when the minimum distances are large.

Conditions:
a) If d(x, y) = || Dmin(x, y) − D2nd min(x, y) || ≤ T, we match all the candidate images; otherwise, we assign the image with Dmin(x, y) to the image in MNIST as a matching pair. Here x is the pattern in MNIST, y is a pattern in NIST, Dmin(x, y) is the distance between x and y under template matching, d(x, y) is the distance between Dmin(x, y) and D2nd min(x, y), and T is a constant.
b) If d(x, y1) = d(x, y2) and r(x, y1) ≤ r(x, y2), where r(x, yi) = || R_MNIST(x) − R_NIST(yi) || (i = 1, 2), we assign the image with y1, where R_MNIST(x) is the aspect ratio of pattern x in MNIST and R_NIST(yi) is the aspect ratio of candidate yi in NIST.

If all the images satisfy condition a), which means that the first candidate image is too similar to the second one, we need to compare their local geometric structures to those of the image in MNIST. The patterns in Figure 4 serve as an example: although the first candidate (the far right image from NIST) has the minimum distance, the second candidate (the centre image from NIST) is the real match of the image in MNIST (the left one). Accordingly, considering several candidate images is necessary.

Figure 4. An example where candidate images are considered: the image in MNIST and two candidate images from NIST, compared by original image, local structure, distance (43 vs. 37), and matching result (√ vs. ×)

The other case is the one in which condition b) is satisfied, i.e. the matched image in MNIST has two candidate images at the minimum distance in NIST; the aspect ratio then has to be considered. For the patterns in Figure 5, we determined that the matching image is the middle one because its aspect ratio is the closest to that of the image in MNIST.

Figure 5. An example where the aspect ratios are considered: both candidates are at distance 44; the aspect ratios are 20/13 = 1.54 for the MNIST image and 72/48 = 1.5 (√) and 51/38 = 1.34 (×) for the candidates

5. Comparing the substitution rates of the small database at different sizes from different sources

While keeping their aspect ratios, we normalize the original images to various sizes and recognize the normalized images using the same feature extraction algorithm and classifier as before.

According to Figure 6, enlarging the images again increases the recognition rate. Moreover, normalizing images from the originals performs better than normalizing images from MNIST when the images are normalized to the same sizes.

Figure 6. Number of errors in the small normalized database from different sources: number of errors vs. normalization size (20*20 to 30*30) for images normalized from MNIST and images normalized from the originals, with the curve for the originals lying below the MNIST curve

6. Conclusion

Our experimental results indicate that enlarging the normalization size of MNIST/NIST numeral images from 20 * 20 to bigger sizes (e.g. 26 * 26) improves the recognition rate. Most researchers agree that substitutions are mainly caused by the quality, or distortion, of the images; this study has found that one of the main causes of substitution is the size of the images in MNIST. This can help researchers choose a normalization size when pre-processing for handwritten numeral recognition on MNIST, or even construct a new database from NIST.

Since some MNIST data are not noise-free, they are not good enough to be recognized directly as small-size images. Even though we normalize the images twice, the recognition rates still rise with both the ANN and the SVM. This suggests that if we normalize images to a size larger than 20 * 20, the recognition rate of the entire system will rise as well. In other words, normalizing numeral images to 20 * 20 limits recognition accuracy.

Moreover, we find that normalizing images from the originals yields a higher recognition rate. As it is impossible to find all the original images of the MNIST test set, we used the most difficult images to construct a small database. After retrieving all the original images of this small database (181 images from NIST), we tested them in exactly the same way as described in Section 4. We saw that if we normalized the images from NIST directly to larger sizes, the recognition rate on our small database went up further.

Although normalizing images to a larger size produces a higher recognition rate, enlarging images incurs a higher computational cost in both space and time. In the future, we may consider enlarging only part of the database instead of all of it, e.g. mainly those images on which the classifier does not have high recognition confidence. Moreover, increasing the spatial resolution of the gradient features will be considered in future studies.
7. References
[1] A. Britto-Jr., R. Sabourin, E. Lethelier, F. Bortolozzi, and C. Y. Suen, "Improvement in handwritten numeral string recognition by slant normalization and contextual information," Proceedings of the 7th International Workshop on Frontiers in Handwriting Recognition (IWFHR), Amsterdam, The Netherlands, 2000, pp. 323 – 332.

[2] E. Kavallieratou, N. Fakotakis, and G. Kokkinakis, "Slant estimation algorithm for OCR systems," Pattern Recognition, vol. 34, no. 12, 2001, pp. 2515 – 2522.

[3] T. Y. Zhang and C. Y. Suen, "A fast parallel algorithm for thinning digital patterns," Communications of the ACM, vol. 27, no. 3, March 1984, pp. 236 – 239.

[4] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proceedings of the IEEE, vol. 86, no. 11, November 1998, pp. 2278 – 2324.

[5] M. Shi, Y. Fujisawa, T. Wakabayashi, and F. Kimura, "Handwritten numeral recognition using gradient and curvature of gray scale image," Pattern Recognition, vol. 35, no. 10, 2002, pp. 2051 – 2059.

[6] J. X. Dong, A. Krzyzak, and C. Y. Suen, "A fast SVM training algorithm," International Journal of Pattern Recognition and Artificial Intelligence, vol. 17, no. 3, 2003, pp. 367 – 384.

[7] L. Xu, A. Krzyzak, and C. Y. Suen, "Methods of combining multiple classifiers and their applications to handwriting recognition," IEEE Transactions on Systems, Man and Cybernetics, vol. 22, no. 3, 1992, pp. 418 – 435.

[8] C. L. He and C. Y. Suen, "A hybrid multiple classifier system of unconstrained handwritten numeral recognition," Proceedings of the 7th International Conference on Pattern Recognition and Image Analysis, St. Petersburg, Russia, October 2004, pp. 684 – 687.

[9] L. Yang, C. Y. Suen, T. D. Bui, and P. Zhang, "Discrimination of similar handwritten numerals based on invariant curvature features," Pattern Recognition, vol. 38, no. 7, 2005, pp. 947 – 963.

[10] P. Zhang, T. D. Bui, and C. Y. Suen, "Extraction of hybrid complex wavelet features for the verification of handwritten numerals," Proceedings of the 9th International Workshop on Frontiers in Handwriting Recognition, Tokyo, Japan, 2004, pp. 347 – 352.

[11] S. Battiato, G. Gallo, and F. Stanco, "A new edge-adaptive algorithm for zooming of digital images," Proceedings of IASTED Signal Processing and Communications (SPC 2000), Marbella, Spain, 2000, pp. 144 – 149.
