
2017 International Conference on Intelligent Computing, Instrumentation and Control Technologies (ICICICT)

Text Recognition Using Poisson Filtering and Edge-Enhanced Maximally Stable Extremal Regions

Jiji Mol¹, Anisha Mohammed² and Mahesh B S²

¹ P G Scholar, College of Engineering Kallooppara, Pathanamthitta, Kerala, India
² Assistant Professors, College of Engineering Kallooppara, Pathanamthitta, Kerala, India
jijijain91@gmail.com, anishamohammed@gmail.com, maheshbradiyil@gmail.com

Abstract—Text recognition in imagery gives more meaningful information, which makes it a relevant area of interest in different fields like content-based image retrieval, navigation, assistance for blind people, intelligent transportation systems, vehicle testing, etc. Text detection from a scene image is the process by which text zones are segmented from non-textual ones and arranged in accordance with their correct reading order. Diverse text patterns and varying background interference are the challenges that affect the reliability of text character extraction. A novel system for text detection and recognition in images is proposed in this paper. The proposed method uses Fractional Poisson enhancement for removing the Laplacian noise of the input image. Edge-enhanced Maximally Stable Extremal Regions (MSERs) are then obtained from the pre-processed image. Region filtering is used to remove non-text regions, and the remaining text is recognized by an Optical Character Recognition (OCR) system. The algorithm outperforms other existing methods in terms of Peak Signal to Noise Ratio (PSNR) and Structural Similarity (SSIM) measurements. The method is evaluated using the standard ICDAR dataset, which consists mostly of real-world scene images.

Keywords—Text recognition; text detection; fractional Poisson enhancement; maximally stable extremal regions; region filtering; optical character recognition

I. INTRODUCTION

In recent years, text recognition from images and videos has gained popular interest with the advancement of pattern recognition and computer vision technology. The significance of the semantic or high-level text data present in an image is that it can easily describe an image with good clarity and can be extracted using low-level features like color, texture, etc., which in turn vary with language, font, style and background, thus making the task of text extraction a challenging one [1]. Recognition of text is yet another challenge for researchers, as low resolution text with small fonts may be present in an image or video with a complex or textured background [2].

The aim of the proposed method is to detect and recognize the text candidates from a scene-text image. The first stage of this method is text detection and the second stage deals with recognition of the detected text characters. The proposed algorithm begins with pre-processing of the input image using Fractional Poisson enhancement. The MSER regions corresponding to the Laplacian noise filtered image are obtained. The image is then segmented using Canny's edge operator and enhanced using morphological operations. Text-candidate regions are filtered from the non-text regions by using connected component analysis of the edge-enhanced MSER image. Clustering of the extracted text candidates from the region filtered image is done using morphological operations. An OCR is then used for recognizing the characters. The remainder of this paper is organized as follows. A brief literature survey of text extraction algorithms is given in section II, the proposed methodology is detailed in section III, the performance evaluation is given in section IV and finally, the inference and the future scope are concluded in section V.

II. RELATED WORKS

Over the last few decades, detection and recognition of text in images and videos have developed into an active research area. The unique features of text help us to distinguish it from non-textual regions. The most commonly used text extraction methods are summarized in this section.

A. Texture based methods

In texture based methods, text is treated as a special texture which distinguishes it from the background. A machine learning technique is used to train a classifier to detect the presence of text within an image. Zhong et al. [3] analysed gray-scale images using local spatial variations, with the assumption that text regions exhibit a high degree of variance. This approach was limited to the detection of text with only horizontal orientation. Horizontal and vertical frequencies were taken into consideration by Zhong in [4]. They also applied the discrete cosine transform (DCT) for the extraction of text characters. This algorithm was robust but failed to give precise localization.




B. Region-based methods

Region-based methods rely on the pixel difference between a text region and its background. Here the pixels within a region are assumed to have uniform properties, and a threshold is used to group regions with similarities. They are of two types: edge-based and connected component (CC)-based.

Edge-based methods exploit the high contrast between the text and the background. Usually, a Canny or a Sobel edge filter is used for detecting the edges, and then a morphological operator is used for smoothening the edges. In [5], [6], [7], Liu et al. proposed an edge-based algorithm where a Gaussian pyramid was used to create a feature map. A dilation operator was used for text localization.

CC-based methods are based on the principle that pixels in a connected component are connected to their neighboring pixels. The spatial arrangement of connected components is used to merge the text candidates, which are then filtered from non-text regions using their geometrical properties [8]. Gllavata, Ewerth and Freisleben [9] proposed a text extraction algorithm using a color reduction technique for edge detection. Projection profile analysis and geometric filtering of the edge image are then used to localize the candidate text regions.

Chen et al. [10] proposed a CC-based text detection algorithm using Maximally Stable Extremal Regions (MSER) and Stroke Width Distance (SWD) based features. They used the intersection of edge-enhanced MSERs and the Canny edge filtered image for detecting blurred images with small fonts.

In a previous work [11], extraction of text using Fractional Poisson enhancement and morphological operations was performed. The pre-processed image was used for threshold estimation, which in turn was used for obtaining the binary gradient mask. A morphological operator was then applied to the binary gradient mask to obtain the binarized image. Candidate text regions were then localized using blob analysis.

III. PROPOSED METHODOLOGY

The proposed method consists of five steps: pre-processing using Fractional Poisson enhancement, edge-enhanced MSER feature extraction, region filtering, connected component segmentation, and character recognition. Fig. 1 shows the schematic diagram of the proposed method.

Fig. 1: Schematic Diagram of the Proposed Method

A. Pre-processing using Fractional Poisson enhancement

The input image is converted to gray-scale. Laplacian operations are often used to increase image contrast for obtaining good text detection and recognition accuracies. In [12], Roy et al. proposed a Fractional Poisson enhancement method for removing the noise introduced during the Laplacian operations. The low contrast image is enhanced using the neighboring pixels. In the proposed method, the input gray-scale image, Fig. 2(a), is first convolved with a Laplacian kernel to obtain the Laplacian image, Fig. 2(b). The intensity transform used to increase the dynamic range of the gray levels is the fractional mean [11],[12] of the Laplacian image.

Fig. 2: Laplacian noise removal using Poisson method. (a) Laplacian Image, (b) Laplacian Noise filtered image
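A minimal MATLAB sketch of this pre-processing stage is given below. The Laplacian convolution follows the description above; the fractional-mean intensity transform is specified in [11],[12] rather than in this paper, so it is represented here by a hypothetical placeholder function fractionalMean, and the input file name is assumed.

% Pre-processing sketch (Image Processing Toolbox assumed).
% fractionalMean() is a hypothetical placeholder for the fractional-mean
% intensity transform described in [11],[12].
I = imread('scene.jpg');            % input scene-text image (file name assumed)
if size(I,3) == 3
    I = rgb2gray(I);                % convert to gray-scale
end
I = im2double(I);

h   = fspecial('laplacian', 0.2);   % 3x3 Laplacian kernel
lap = imfilter(I, h, 'replicate');  % Laplacian image

% Fractional Poisson enhancement: stretch the dynamic range using the
% fractional mean of the Laplacian image (exact form given in [11],[12]).
enhanced = fractionalMean(I, lap);  % Laplacian-noise-filtered image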
B. Edge-enhanced MSER feature extraction

Maximally Stable Extremal Regions (MSERs) is a feature detector commonly used for extracting co-variant regions within an image; the technique was proposed by Matas et al. [13]. The MSER features of the Laplacian filtered image are plotted as in Fig. 3(a), and the edge image, Fig. 3(b), is obtained using the Canny edge detector. The edge mask obtained after the morphological operation, Fig. 3(c), is further enhanced using morphological filling, which results in the formation of the edge-enhanced mask, Fig. 3(d). The intersection of the Canny edge map with the edge-enhanced mask gives the edge-enhanced MSER mask shown in Fig. 3(e).
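The sketch below, continuing the variables of the previous sketch, illustrates one way to build such a mask in MATLAB (Computer Vision and Image Processing Toolboxes). The MSER area range and structuring-element size are illustrative assumptions, not the paper's settings, and the final step is read here as combining the MSER regions of Fig. 3(a) with the filled edge mask of Fig. 3(d); that is one interpretation of the intersection described above.

% Edge-enhanced MSER mask sketch ('enhanced' is the pre-processed image).
gray8 = im2uint8(enhanced);

% MSER regions returned with a connected-component structure so that they
% can be rasterized into a binary mask (area range assumed).
[~, cc] = detectMSERFeatures(gray8, 'RegionAreaRange', [30 8000]);
mserMask = false(size(gray8));
mserMask(vertcat(cc.PixelIdxList{:})) = true;      % MSER mask, Fig. 3(a)

edgeMap = edge(gray8, 'Canny');                    % edge image, Fig. 3(b)

% Grow the edges and fill enclosed regions to obtain the edge-enhanced mask.
edgeMask = imdilate(edgeMap, strel('square', 3));  % Fig. 3(c)
edgeMask = imfill(edgeMask, 'holes');              % Fig. 3(d)

% Combine MSER regions with the filled edge mask.
eeMSER = mserMask & edgeMask;                      % edge-enhanced MSER mask, Fig. 3(e)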
C. Region Filtering

The foreground CCs are obtained as a binary image when edge-enhanced MSER is used [10]. The regions of interest in an image that are identified by the binary mask are used for region filtering. If the binary mask contains 1's, filtered values are returned for the pixels; if it contains 0's, unfiltered values are returned [14]. Region properties can be used to remove some of the remaining connected components from the edge-enhanced MSER mask. The thresholds used may vary for scene-text images with different fonts, sizes, or languages. An example of a region filtered image is shown in Fig. 4.
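As a sketch of this step, geometric properties of each connected component in the edge-enhanced MSER mask can be measured with regionprops and used to discard non-text blobs. The property set and threshold values below are illustrative assumptions and, as noted above, would need tuning for other fonts, sizes, or languages.

% Region filtering sketch: discard connected components whose geometry is
% unlikely to correspond to text (thresholds are assumed values).
cc2   = bwconncomp(eeMSER);
props = regionprops(cc2, 'BoundingBox', 'Eccentricity', 'Solidity', ...
                    'Extent', 'EulerNumber');
bbox        = vertcat(props.BoundingBox);
aspectRatio = bbox(:,3) ./ bbox(:,4);              % width / height per component

keep = [props.Eccentricity]' < 0.995 & ...
       [props.Solidity]'     > 0.3   & ...
       [props.Extent]'       > 0.2   & [props.Extent]' < 0.9 & ...
       [props.EulerNumber]'  > -4    & ...
       aspectRatio           < 3;

textMask = ismember(labelmatrix(cc2), find(keep)); % region-filtered text mask, Fig. 4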




Fig. 3: Edge-enhanced MSER mask formation. (a) MSER mask, (b) Edge mask, (c) Edge mask after morphological operation, (d) Binary edge mask after morphological filling, (e) Resultant Edge-enhanced MSER mask

Fig. 4: Region Filtered Text Mask

Fig. 5: Sample image from ICDAR dataset showing text detection and recognition results. (a) Morphological Mask, (b) Bounding boxes for text-region, and (c) Recognized text with character annotations.
D. Connected Component Segmentation

The bounding box of a text region can be computed by merging the individual characters into a single connected component. In the proposed method, individual characters are connected to form a text cluster by using morphological closing followed by opening to clean up any outliers [15]. The image region corresponding to the region filtered text mask after morphological masking and the detected text regions with bounding boxes can be observed in Fig. 5(a) and Fig. 5(b), respectively.
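A MATLAB sketch of this grouping step follows, continuing the variables from the earlier sketches: closing followed by opening merges neighbouring characters into text clusters, and the bounding box of each cluster is read off with regionprops. The structuring-element sizes are illustrative assumptions.

% Connected component segmentation sketch: merge individual characters into
% text clusters and extract one bounding box per cluster.
se1 = strel('rectangle', [5 25]);       % wide element to bridge characters (assumed size)
se2 = strel('rectangle', [3 3]);        % small element to clean up outliers (assumed size)

clusterMask = imclose(textMask, se1);   % join characters into words/lines
clusterMask = imopen(clusterMask, se2); % remove small spurious blobs

stats  = regionprops(clusterMask, 'BoundingBox');
bboxes = vertcat(stats.BoundingBox);    % [x y w h] for each detected text region

% Overlay the detected text regions on the input image for inspection.
detected = insertShape(im2uint8(I), 'Rectangle', bboxes, 'LineWidth', 3);
figure, imshow(detected);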
E. Character Recognition

The binary text mask obtained is fed to an optical character recognition (OCR) system to improve the accuracy of recognition. A threshold T is set to find characters with a high character confidence index. Extracted characters with a character confidence index greater than the threshold are said to have a high character confidence value, and their number gives the count of recognized characters with high probability. The resultant image of the OCR with character annotations is given in Fig. 5(c).
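This stage can be sketched with MATLAB's ocr function as below, again continuing the variables from the previous sketches. The confidence threshold T = 0.6 is an assumed illustrative value, not the one used in the paper.

% Character recognition sketch: run OCR on the binary text mask and count
% characters whose confidence exceeds a threshold T (value assumed here).
results = ocr(clusterMask, bboxes);     % one ocrText result per detected region
T = 0.6;                                % assumed character-confidence threshold

for k = 1:numel(results)
    txt  = results(k).Text;
    conf = results(k).CharacterConfidences;
    good = conf > T;                    % NaN entries (whitespace) compare false
    fprintf('Region %d: "%s" (%d of %d characters above T)\n', ...
            k, strtrim(txt), nnz(good), nnz(~isnan(conf)));
end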
IV. EXPERIMENTAL RESULTS

The proposed method has been implemented in MATLAB 2015a using sample images taken from the standard ICDAR 2013 scene dataset with horizontal texts [16].

A. Performance evaluation of detection stage

The text extraction results for different text detection algorithms are displayed in Fig. 6. It is clear from the observation that the proposed method outperforms other text extraction algorithms such as the edge-based algorithm, the edge-enhanced MSER with stroke width based algorithm, and the Poisson method with morphological segmentation.




The performance evaluation of the text character extraction methods is given in Table I. It is done by measuring the average Peak Signal to Noise Ratio (PSNR) and the average Structural Similarity (SSIM) of about 15 images from the ICDAR 2013 scene dataset, comparing each result with its corresponding ground truth. PSNR estimates the quality of the extracted text image compared with its ground truth. PSNR is obtained by calculating the mean squared error (MSE) between the corresponding pixel values of the resultant image (R) and the ground truth image (G) [12], of size M × N. PSNR is a standard way to measure image fidelity and is a single number (in dB) that reflects the quality of the detected text characters. The higher the PSNR, the better the rate of detection. PSNR and MSE are defined in (1) and (2), respectively.

PSNR = 20 log10( 255 / sqrt(MSE) )                        (1)

MSE = (1 / MN) Σ_i Σ_j ( R(i,j) − G(i,j) )²               (2)

The structure of the extracted text image is compared with its ground truth in order to find the structural similarity. The SSIM is defined in (3),

SSIM = [l(x,y)]^α · [c(x,y)]^β · [s(x,y)]^γ               (3)

where l is the luminance comparison function, c is the contrast comparison function, and s is the structure comparison function. The parameters α, β, and γ are used to adjust the relative importance of the three components [12]. A high SSIM value indicates better recognition accuracy.

Fig. 6: Results of text character extraction using various algorithms. (a) input image, (b) ground truth image, (c) edge-based algorithm, (d) CC-based algorithm, (e) edge-enhanced MSER, (f) Poisson method with morphological segmentation, (g) proposed method.

TABLE I. COMPARISON OF TEXT-DETECTION METHODS

Method                            PSNR (dB)    SSIM
Edge-based method [7]             14.78        0.34
CC-based method [9]               15.05        0.34
MSER and SWD based method [10]    17.70        0.48
Poisson method [11]               24.29        0.85
Proposed method                   27.07        0.87
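These detection-stage metrics can be computed directly in MATLAB, for example as in the sketch below, where the region-filtered text mask from the earlier sketches stands in for the extracted result and the ground-truth file name is assumed. The Image Processing Toolbox routines immse, psnr and ssim compute the quantities defined in (1)-(3); ssim uses the standard default weights α = β = γ = 1.

% Detection-stage evaluation sketch: compare the extracted text image with
% its ground truth using MSE, PSNR and SSIM (images must be the same size).
R = im2uint8(textMask);                          % resultant (extracted) text image
G = im2uint8(imread('ground_truth.png') > 0);    % ground-truth mask (file name assumed)

mseVal  = immse(R, G);                           % mean squared error, eq. (2)
psnrVal = psnr(R, G);                            % peak signal-to-noise ratio in dB, eq. (1)
ssimVal = ssim(R, G);                            % structural similarity, eq. (3)

fprintf('MSE = %.2f, PSNR = %.2f dB, SSIM = %.3f\n', mseVal, psnrVal, ssimVal);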
B. Performance evaluation of recognition stage

The performance of the OCR [17] is evaluated in Table II using character accuracy (CA) and character error rate (CER), defined in (4) and (5),

Character accuracy (CA) = (a / n) × 100                   (4)

Character error rate (CER) = 100 − CA                     (5)

where a is the total number of characters in the resultant text document and n is the total number of characters in the input image. Character confidence is a metric indicating the confidence of the recognition result; confidence values range between 0 and 1 and should be interpreted as probabilities. Time of execution is the total time taken for executing the text recognition process on a system with 3 GB RAM, an Intel Core i5 processor, and a 32-bit Windows 7 operating system.

TABLE II. ANALYSIS OF TEXT-RECOGNITION RESULTS

Method                            Character confidence (%)    Character accuracy rate (%)    Error rate (%)    Time of execution (s)
MSER and SWD based method [10]    29.00                       33.39                          66.61             23.67
Poisson method [11]               44.90                       52.96                          47.04             28.06
Proposed method                   90.30                       96.42                          3.58              21.00
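A short sketch of the recognition-stage metrics for a single image follows; it reuses the OCR results from the earlier sketch, and the ground-truth string is an assumed example rather than data from the paper.

% Recognition-stage evaluation sketch: character accuracy (eq. 4) and
% character error rate (eq. 5) for one image (ground-truth string assumed).
recognisedText  = strtrim(results(1).Text);     % characters in the resultant text
groundTruthText = 'EXAMPLE';                    % assumed ground-truth string

a   = numel(recognisedText);                    % characters in the resultant text document
n   = numel(groundTruthText);                   % characters in the input image
CA  = (a / n) * 100;                            % character accuracy, eq. (4)
CER = 100 - CA;                                 % character error rate, eq. (5)

meanConf = mean(results(1).CharacterConfidences, 'omitnan');  % character confidence
fprintf('CA = %.2f%%, CER = %.2f%%, mean confidence = %.2f\n', CA, CER, meanConf);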




V. CONCLUSION

A novel system for detection and recognition of text, based on the removal of Laplacian noise from the image and edge-enhanced MSERs, is proposed. The method gives higher PSNR and SSIM than existing methods for the detection stage. It also gives a very good character confidence of 90.30%, the highest character accuracy of 96.42%, and the least error rate of 3.58%. From the results, it can be concluded that the proposed method is more robust than existing methods for high resolution, complex background images with horizontal and arbitrarily oriented texts. Future work includes improving the results for images with small fonts and distorted text.

REFERENCES

[1] Keechul Jung, Kwang In Kim and Anil K. Jain, "Text information extraction in images and video: a survey," The Journal of the Pattern Recognition Society, 2004.
[2] Victor Wu, Raghavan Manmatha, and Edward M. Riseman, "TextFinder: An Automatic System to Detect and Recognize Text in Images," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 21, no. 11, November 1999.
[3] Y. Zhong, K. Karu, and A. K. Jain, "Locating Text in Complex Color Images," Pattern Recognition, vol. 28, no. 10, pp. 1523-1536, Oct. 1995.
[4] Y. Zhong, H. Zhang, and A. K. Jain, "Automatic caption localization in compressed video," IEEE Trans. Pattern Anal. Mach. Intell., vol. 22, no. 4, pp. 385-392, 2000.
[5] Xiaoqing Liu and Jagath Samarabandu, "An edge-based text region extraction algorithm for indoor mobile robot navigation," Proceedings of the IEEE, July 2005.
[6] Xiaoqing Liu and Jagath Samarabandu, "Multiscale edge-based text extraction from complex images," IEEE, 2006.
[7] Xiaoqing Liu and Jagath Samarabandu, "A simple and fast text localization algorithm for indoor mobile robot navigation," Proceedings of SPIE-IS&T Electronic Imaging, SPIE vol. 5672, 2005.
[8] K. Jung, K. I. Kim, and A. K. Jain, "Text information extraction in images and video: a survey," Pattern Recognition, vol. 37, no. 5, pp. 977-997, 2004.
[9] Julinda Gllavata, Ralph Ewerth and Bernd Freisleben, "A robust algorithm for text detection in images," Proceedings of the 3rd International Symposium on Image and Signal Processing and Analysis, 2003.
[10] H. Chen, S. S. Tsai, G. Schroth, D. M. Chen, R. Grzeszczuk, B. Girod, "Robust text detection in natural scene images with edge-enhanced maximally stable extremal regions," in Proceedings of ICIP, 2011, pp. 2609-2612.
[11] Jiji Mol, Anisha Muhammed, Nikhil G Kurup, "A Novel Method for Text Detection in Imagery," in NCICIS, 2017, in press.
[12] S. Roy, P. Shivakumara, H. A. Jalab, R. W. Ibrahim, U. Pal, T. Lu, "Fractional Poisson enhancement model for text detection and recognition in video frames," Pattern Recognition, vol. 52, pp. 433-447, 2016.
[13] J. Matas, O. Chum, M. Urban, and T. Pajdla, "Robust wide baseline stereo from maximally stable extremal regions," Proc. of British Machine Vision Conference, pp. 384-396, 2002.
[14] http://radio.feld.cvut.cz/matlab/toolbox/images/region.html
[15] https://in.mathworks.com
[16] http://rrc.cvc.uab.es/?ch=4&com=downloads
[17] S. Vijayarani and A. Sakila, "Performance comparison of OCR tools," International Journal of UbiComp (IJU), vol. 6, no. 3, July 2015.

