
International Journal of Scientific Research Engineering & Technology (IJSRET), ISSN 2278 0882

Volume 4, Issue 5, May 2015

Scene Text Detection Using Machine Learning Classifiers


Nafla C.N.¹, Sneha K.², Divya K.P.³
¹(Department of CSE, RCET, Akkikkvu, Thrissur)
²(Department of CSE, RCET, Akkikkvu, Thrissur)
³(Department of CSE, RCET, Akkikkvu, Thrissur)

ABSTRACT
In this paper we present an efficient method of scene text detection using two machine learning classifiers: one for generating candidate word regions and the other for classifying components as text or nontext. First, we extract connected components with the maximally stable extremal region (MSER) algorithm. The resulting components are partitioned into clusters by an AdaBoost classifier trained on their adjacency relationships. We then extract features from the clusters and, using a support vector machine classifier, classify each block as a text or nontext component.
Keywords - Connected component (CC), maximally stable extremal region (MSER), optical character recognition (OCR), support vector machine (SVM).

I. INTRODUCTION

Due to the wide availability of mobile devices equipped with high-quality digital cameras, research areas related to these devices have received growing attention in recent decades. Text detection and extraction is one of the most important and interesting of these areas. Text present in camera-captured images is an important and reliable source of information about the image and about the place or situation where it was captured, so text detection and extraction from images have many valuable applications.
Text present in an image or video can be classified as scene text or caption text. Scene text exists in the image naturally, whereas caption text is added manually by the user. Scene text overlaps with the background, so its detection and extraction are more difficult than the detection of caption text. Compared with scanned document images, text extraction from natural scenes is harder because the text appears in arbitrary orientations and sizes and suffers from background interference. Examples of scene text include street signs, shop display boards, text on vehicles, and advertisement boards. Fig. 1 shows examples of text in natural scene images.

Text string detection and extraction have a variety of useful applications. As people travel to different places, it can be difficult for them to understand the text on display boards in foreign countries; they must rely either on guides or on intelligent handheld devices to translate the information, and text detection is an essential part of such systems. Text detection also plays a crucial role in content-based visual information retrieval and content-based image retrieval, which apply computer vision techniques to the problem of image retrieval in large databases. Another important application of scene text extraction is assisting people with visual disabilities: a computerized system that conveys the text information present on objects and locations would be a great help to them. License plate detection is a further area where text detection plays a central role, for example in traffic monitoring at customs checkpoints and in tracking stolen cars. Other significant applications of scene text detection and extraction include robotic navigation and automatic geocoding.

Fig. 1: Examples of natural images with scene text


OCR is one of the technologies that can extract text characters by identifying their corners, but it works only when the characters are cleanly separated from the background. Background interference and degradation in images reduce OCR performance, so OCR performs comparatively poorly on natural scene images. Texture analysis and topic-based partitioning are other detection methods, but they work reliably only on document images. Text detection and extraction from natural images is not a simple task: the text may lie on a complex background, and the chances of degradation are high. As a result, text extraction from natural images involves many complexities.
The paper is organized as follows. Section II presents a literature survey of existing scene text detection methods. Section III describes the proposed method. Section IV concludes the paper.

II. LITERATURE SURVEY

This section covers existing scene text detection methods, which can be categorized as texture-based methods, connected component based methods, and hybrid methods.
2.1 Texture-based methods
Texture-based methods treat text as a special kind of texture and identify it using properties such as wavelet features, filter responses, and local intensities.
Angadi et al. [1] described a method that uses a high-pass filter in the DCT domain to suppress the background and texture properties such as homogeneity and contrast to detect text. The method consists of five main phases: background removal in the DCT domain, derivation of a feature matrix D, block classification, merging of blocks for text area extraction, and finally refinement of the text region.
Kim et al. [2] described a method that combines CAMSHIFT and an SVM for text detection and extraction. Raw pixel intensities forming the textural pattern are given as input to the SVM, and after texture extraction the text is identified using CAMSHIFT.
Gllavata et al. [3] described a method that uses the distribution of high-frequency wavelet coefficients, obtained by applying a wavelet transform to the image, to separate text and nontext areas. Text area classification is then done by k-means clustering, and text extraction is performed by an OCR engine that takes the segmented binary text image as input.
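The sketch below illustrates the general idea behind this kind of wavelet-plus-clustering approach, not the authors' implementation: block-wise energies of the high-frequency subbands are clustered with k-means, and blocks falling in the high-energy cluster are taken as text candidates. The 'haar' wavelet, the block size, and the file name are illustrative assumptions.

```python
import numpy as np
import cv2
import pywt
from sklearn.cluster import KMeans

def wavelet_text_blocks(gray, block=16):
    """Cluster image blocks by high-frequency wavelet energy (rough sketch)."""
    # Single-level 2-D DWT: cH, cV, cD carry the high-frequency detail.
    _, (cH, cV, cD) = pywt.dwt2(gray.astype(np.float32), 'haar')
    energy = cH ** 2 + cV ** 2 + cD ** 2   # per-pixel detail energy (half resolution)

    h, w = energy.shape
    feats, coords = [], []
    for y in range(0, h - block + 1, block):
        for x in range(0, w - block + 1, block):
            feats.append(energy[y:y + block, x:x + block].mean())
            coords.append((y, x))
    feats = np.array(feats).reshape(-1, 1)

    # Two clusters: the one with the higher mean energy is treated as "text-like".
    km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(feats)
    text_label = int(np.argmax([feats[km.labels_ == k].mean() for k in range(2)]))
    return [c for c, lbl in zip(coords, km.labels_) if lbl == text_label]

gray = cv2.imread('scene.jpg', cv2.IMREAD_GRAYSCALE)  # hypothetical input image
candidate_blocks = wavelet_text_blocks(gray)
```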

2.2 Connected component based methods
In connected component based methods, the image is first segmented and candidate text components are extracted; nontext elements are then eliminated in various ways. These methods rely on geometrical properties and work well on images containing text with many variations, such as changes in orientation and font.
Epshtein et al. [4] describe a method that uses stroke width to extract text components. A stroke is a contiguous part of an image that forms a band of approximately constant width, and constant stroke width is one of the important features separating text from other components of a scene. The method combines a local image operator with geometrical reasoning to find regions of similar stroke width and thereby identify text regions.
Yi et al. [5] describe a method that uses gradient features and the color homogeneity of character components to extract candidate text regions. Character candidates are then grouped to detect text strings, based on structural features of the characters in a string such as differences in character size, distances between neighboring characters, and character alignment.
Gatos et al. [6] described a methodology for text detection in natural scene images based on an efficient binarization and enhancement technique followed by connected component analysis. Starting from the original image, the method produces a binary image and an inverted binary image, and connected components are extracted from these complementary images. Text verification is then conducted at the character and word levels on the candidate connected components. Finally, the text regions localized in the two images are refined and merged in a post-processing step.
2.3 Hybrid methods
Hybrid methods combine texture-based and connected component based techniques.
Yi et al. [7] described a hybrid approach in which a text region detector first generates a text estimation map, which guides the segmentation of text components by local binarization. Nontext components are then filtered out by a conditional random field model, and finally the text components are grouped into text lines by a learning-based energy minimization method.
Liu et al. [8] described a hybrid method based on the assumption that characters have closed contours and that a character string consists of characters lying on a straight line. The method extracts text regions by extracting closed contours and searching their neighborhoods.

III. PROPOSED METHOD

This section describes the techniques used in the proposed method.
3.1 Overview of the proposed method
The block diagram of our system is shown in Fig. 2.

Fig. 2: Overview of the proposed system


The MSER algorithm finds connected components that are brighter or darker than their surroundings. Fig. 4 shows the result of MSER extraction on the input image shown in Fig. 3.

Fig. 3: Input image


As shown in the diagram, the method consists mainly of the following steps: connected component extraction, clustering with the help of an AdaBoost classifier, feature extraction for SVM classification, and classification of clusters into text and nontext components. For CC extraction we use the MSER algorithm. An AdaBoost classifier that operates on the adjacency relationships between the CCs is used for clustering. We then extract features and classify the clusters as text or nontext components using an SVM classifier.
3.2 Connected component extraction
Although there are many CC extraction methods, we use the MSER algorithm because of its low computational cost and high performance. The MSER algorithm extracts the parts of the image where local binarization is stable over a wide range of thresholds. This property helps us extract most of the text components in the image.
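As a minimal sketch of this step, the snippet below uses OpenCV's MSER detector to extract candidate components and their bounding boxes from a grayscale image; the file name and default detector parameters are illustrative, not values prescribed by the method.

```python
import cv2

# Load the scene image in grayscale; MSER operates on intensity values.
gray = cv2.imread('scene.jpg', cv2.IMREAD_GRAYSCALE)

# Create the MSER detector (default parameters; tune delta/area limits as needed).
mser = cv2.MSER_create()

# detectRegions returns the point sets of the extremal regions and their bounding boxes.
regions, bboxes = mser.detectRegions(gray)

# Each bounding box is (x, y, w, h); these become the CCs used in the later stages.
print('extracted %d candidate components' % len(bboxes))
```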

Fig. 4: Result of MSER extraction


3.3 Clustering of CCs
Clustering groups the CCs according to their adjacency relationships with the help of an AdaBoost classifier.
3.3.1 Building the training sets
Our classifier is based on the pairwise adjacency relationships between the connected components extracted using MSER. To build the training set for the classifier, we obtain a collection of CCs by applying MSER extraction to the set of training images. Then, for every pair of extracted CCs, we check whether they are adjacent and whether they belong to the text component set, and we build a set of positive and negative examples.

The positive set contains pairs of CCs that are adjacent and both belong to the text component set. Negative samples are constructed from pairs of CCs in which one CC belongs to the text component set and the other belongs to the nontext set.
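A rough sketch of how such a training set might be assembled is shown below, assuming each extracted CC has already been marked as text or nontext against ground-truth annotations and that a simple bounding-box proximity test stands in for the adjacency check; both the `is_text` flags and the proximity threshold are illustrative assumptions, not part of the published method.

```python
from itertools import combinations

def near(b1, b2, gap_ratio=1.0):
    """Loose proximity test on (x, y, w, h) boxes: horizontal gap smaller than a
    fraction of the smaller height, plus some vertical overlap (illustrative)."""
    x1, y1, w1, h1 = b1
    x2, y2, w2, h2 = b2
    h_gap = max(x1, x2) - min(x1 + w1, x2 + w2)
    v_overlap = min(y1 + h1, y2 + h2) - max(y1, y2)
    return h_gap < gap_ratio * min(h1, h2) and v_overlap > 0

def build_pair_sets(boxes, is_text):
    """Return positive and negative CC pairs for classifier training."""
    positives, negatives = [], []
    for i, j in combinations(range(len(boxes)), 2):
        if is_text[i] and is_text[j] and near(boxes[i], boxes[j]):
            positives.append((i, j))          # adjacent, both text
        elif is_text[i] != is_text[j]:
            negatives.append((i, j))          # one text, one nontext
    return positives, negatives
```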
3.3.2 AdaBoost learning and clustering of CCs
With the collected samples, we train an AdaBoost classifier that tells us whether two given CCs are adjacent or not. For training the classifier we use one color-based property and four geometrical properties of the CCs. First, we construct a bounding box around each CC and denote its height and width by h_i and w_i, respectively. For each pair of CCs i and j, we estimate the vertical overlap, the horizontal overlap, and the horizontal distance between their bounding boxes, denoted vo_ij, ho_ij, and d_ij, respectively, together with the color distance between the two CCs. We compute these features for both the positive and the negative samples and train an AdaBoost classifier on them. The output of the classifier is set to +1 for CCs that are adjacent and -1 for CCs that are not adjacent. We check adjacency for every pair of CCs extracted using MSER and then cluster the CCs with a union-find set algorithm.
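The sketch below shows one plausible way to realize this step with scikit-learn: pairwise bounding-box features (vertical overlap, horizontal overlap, horizontal gap, a height ratio, plus a mean-color distance) are fed to an AdaBoost classifier, and pairs predicted as adjacent are merged with union-find. The exact feature normalization is not specified above, so the formulas here are assumptions for illustration only.

```python
import numpy as np
from itertools import combinations
from sklearn.ensemble import AdaBoostClassifier

def pair_features(b1, b2, c1, c2):
    """Pairwise features for boxes (x, y, w, h) and mean colors c1, c2 (assumed form)."""
    x1, y1, w1, h1 = b1
    x2, y2, w2, h2 = b2
    vo = min(y1 + h1, y2 + h2) - max(y1, y2)          # vertical overlap (pixels)
    ho = min(x1 + w1, x2 + w2) - max(x1, x2)          # horizontal overlap (pixels)
    d = max(x1, x2) - min(x1 + w1, x2 + w2)           # horizontal gap (pixels)
    cd = float(np.linalg.norm(np.asarray(c1) - np.asarray(c2)))  # color distance
    scale = float(min(h1, h2))                        # normalize by the smaller height
    return [vo / scale, ho / scale, d / scale, h1 / h2, cd]

class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))
    def find(self, a):
        while self.parent[a] != a:
            self.parent[a] = self.parent[self.parent[a]]  # path halving
            a = self.parent[a]
        return a
    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)

# X_train / y_train would come from the positive (+1) and negative (-1) pairs above:
# clf = AdaBoostClassifier(n_estimators=100).fit(X_train, y_train)

def cluster_components(boxes, colors, clf):
    """Merge CCs whose pairwise features the classifier labels as adjacent (+1)."""
    uf = UnionFind(len(boxes))
    for i, j in combinations(range(len(boxes)), 2):
        feat = np.array(pair_features(boxes[i], boxes[j], colors[i], colors[j]))
        if clf.predict(feat.reshape(1, -1))[0] == 1:
            uf.union(i, j)
    clusters = {}
    for i in range(len(boxes)):
        clusters.setdefault(uf.find(i), []).append(i)
    return list(clusters.values())
```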

3.4 Feature extraction
After clustering we obtain a set of clusters that includes text as well as nontext regions. For the classification of text and nontext components we use an SVM classifier, which requires features extracted from the clusters. To this end, we divide each cluster into overlapping square blocks and extract features from each block. Each square block is further divided into four vertical and four horizontal sub-blocks, and features are extracted from each. For a horizontal sub-block we compute
a) the number of white pixels,
b) the number of vertical white-to-black transitions, and
c) the number of vertical black-to-white transitions
as features; the features for a vertical sub-block are defined similarly, as sketched below.
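A minimal sketch of these block features, assuming the cluster has already been binarized so that text pixels are nonzero: the square block is split into four horizontal and four vertical strips, and each strip contributes a white-pixel count and the two transition counts. The binarization and the equal-width split are assumptions, not details fixed by the text above.

```python
import numpy as np

def strip_features(strip, axis):
    """White-pixel count plus white->black and black->white transition counts
    along the given axis of a binary (0/255) strip."""
    binary = (strip > 0).astype(np.int8)
    diff = np.diff(binary, axis=axis)
    n_white = int(binary.sum())
    wb = int((diff == -1).sum())   # white -> black transitions
    bw = int((diff == 1).sum())    # black -> white transitions
    return [n_white, wb, bw]

def block_features(block):
    """Concatenate features of 4 horizontal and 4 vertical strips of a square block."""
    n = block.shape[0]
    feats = []
    for k in range(4):                                   # horizontal strips
        strip = block[k * n // 4:(k + 1) * n // 4, :]
        feats += strip_features(strip, axis=0)           # vertical transitions
    for k in range(4):                                   # vertical strips
        strip = block[:, k * n // 4:(k + 1) * n // 4]
        feats += strip_features(strip, axis=1)           # horizontal transitions
    return np.array(feats, dtype=np.float32)
```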
3.5 SVM classification
To train the SVM, we first apply the connected component extraction, clustering, and feature extraction steps described above, and then train a support vector machine classifier to classify each square block as a text or nontext component. For a test image we perform the same steps, and the decisions for all square blocks of a cluster are finally integrated: if the number of square blocks classified as text exceeds the number classified as nontext, the cluster is labeled a text component.
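A brief sketch of this final stage with scikit-learn, assuming the `block_features` helper from the previous sketch and a set of labeled training blocks; the RBF kernel and C value are common defaults, not choices stated in the paper.

```python
import numpy as np
from sklearn.svm import SVC

# Training: feature vectors of square blocks with labels 1 (text) / 0 (nontext).
# X_train = np.array([block_features(b) for b in train_blocks])
# svm = SVC(kernel='rbf', C=1.0).fit(X_train, train_labels)

def classify_cluster(svm, cluster_blocks):
    """Label a cluster as text if the majority of its square blocks are text."""
    X = np.array([block_features(b) for b in cluster_blocks])
    votes = svm.predict(X)
    return int((votes == 1).sum() > (votes == 0).sum())   # 1 = text, 0 = nontext
```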

Fig. 5: Result of clustering on the input image

Fig. 6: Text region detected from the input image

IV. CONCLUSION

Due to complicated backgrounds and unpredictable text appearances, scene text detection remains a challenging problem. In this paper we have presented an improved scene text detection method that uses two machine learning classifiers, one for identifying candidate text components and the other for classifying them as text or nontext.

Our method is designed to work correctly on images in which text strings are arranged horizontally. Our future work will focus on developing an efficient learning-based algorithm that extracts text from complex backgrounds and text of arbitrary orientation.

ACKNOWLEDGEMENTS
Every success stands as a testimony not only to the hardship but also to the hearts behind it. The present work has been undertaken and completed with direct and indirect help from many people, and we would like to acknowledge all of them.


REFERENCES
[1] S. A. Angadi and M. M. Kodabagi, "Text region extraction from low resolution natural scene images using texture features," Proc. 2nd IEEE International Advance Computing Conference, 2010, pp. 121-128.
[2] K. I. Kim, K. Jung, and J. H. Kim, "Texture-based approach for text detection in images using support vector machines and continuously adaptive mean shift algorithm," IEEE Trans. PAMI, vol. 25, no. 12, pp. 1631-1639, 2003.
[3] J. Gllavata, R. Ewerth, and B. Freisleben, "Text detection in images based on unsupervised classification of high-frequency wavelet coefficients," Proc. International Conference on Pattern Recognition (ICPR), Cambridge, UK, 2004, pp. 425-428.
[4] B. Epshtein, E. Ofek, and Y. Wexler, "Detecting text in natural scenes with stroke width transform," Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR), 2010, pp. 2963-2970.
[5] Chucai Yi and YingLi Tian, "Text string detection from natural scenes by structure-based partition and grouping," IEEE Trans. Image Processing, vol. 20, no. 9, pp. 2594-2605, 2011.
[6] B. Gatos, I. Pratikakis, and S. J. Perantonis, "Towards text recognition in natural scene images," Proc. Int. Conf. on Automation and Technology, 2005, pp. 354-359.
[7] Yi-Feng Pan, Xinwen Hou, and Cheng-Lin Liu, "Text localization in natural scene images based on conditional random field," Proc. ICDAR, 2009, pp. 6-10.
[8] Y. Liu, S. Goto, and T. Ikenaga, "A contour-based robust algorithm for text detection in color images," IEICE Trans. Inf. Syst., vol. E89-D, no. 3, pp. 1221-1230, 2006.
[9] H. Koo and D. Kim, "Scene text detection via connected component clustering and non-text filtering," IEEE Trans. Image Processing, vol. 22, no. 6, pp. 2296-2305, 2013.
[10] P. Shivakumara, T. Q. Phan, L. Shijian, and C. L. Tan, "Gradient vector flow and grouping based method for arbitrarily-oriented scene text detection in video images," IEEE Trans. CSVT, pp. 1729-1739, 2013.

