Nadia Ben Amor (1) and Najoua Essoukri Ben Amara (2)
(1) National Engineering School of Tunis (ENIT), Tunisia; n.benamor@ttnet.tn
(2) Najoua.benamara@enim.rnu.tn
Abstract
Optical Character Recognition (OCR) has been an active subject of research since the early days of computing. Despite the age of the subject, it remains one of the most challenging and exciting areas of research in computer science. In recent years it has grown into a mature discipline, producing a large body of work.
Arabic was one of the last major languages to receive attention in character recognition. This is due, in part, to the cursive nature of the script: even printed Arabic characters are written in cursive form.
This paper describes the performance obtained by combining the Hough transform and Hidden Markov Models (HMMs) in a multifont Arabic OCR system. Experimental tests have been carried out on a set of 85,000 character samples in 5 different fonts, chosen among those most commonly used in Arabic writing. Some promising experimental results are reported.
Keywords: Arabic Optical Character Recognition, Hough Transforms, Hidden Markov Models.

1. Introduction
Arabic belongs to the group of Semitic alphabetic scripts, in which mainly the consonants are represented in writing, while the marking of vowels (using diacritics) is optional.
This language is spoken by almost 250 million people and is the official language of 19 countries[1]. There are two main types of written Arabic: classical Arabic, the language of the Quran and of classical literature, and Modern Standard Arabic, the universal language of the Arabic-speaking world, which is understood by all Arabic speakers. Each Arabic-speaking country or region also has its own variety of colloquial spoken Arabic.
Due to the cursive nature of the script, several characteristics make the recognition of Arabic distinct from the recognition of Latin or Chinese scripts.
The work we present in this paper belongs to the general field of Arabic document recognition and explores the use of multiple sources of information. Indeed, several experiments carried out in our laboratory have shown the importance of combining different types of information at different levels (feature extraction, classification, etc.) in order to overcome the variability of Arabic characters, especially multifont characters[2].
Despite the various studies carried out in the field of Arabic OCR (AOCR), it is not yet possible to evaluate the reported performances objectively, since the tests were not carried out on the same database. Our aim is therefore to develop several single and hybrid approaches and to test them all on the same database of multifont Arabic characters, so that we can identify the most suitable method or combination for Arabic character recognition.
In this paper, we present a multifont Arabic Optical Character Recognition system based on the Hough transform for feature selection and Hidden Markov Models for classification[3].
In the next section, the whole OCR system is presented. The tests carried out and the results obtained so far are discussed in the fourth section.

2. Character Recognition System:
The main process of the AOCR system we developed is shown in the following figure:
Proceedings of the 4th International Symposium on Image and Signal Processing and Analysis (2005) 285
[Figure: main process of the AOCR system. Blocks: Acquisition and preprocessing; Features extraction; Character learning; Character recognition; Models; Recognized characters]

features, which, given a certain classification technique, will produce the most efficient classification results. Obviously, the extraction of suitable features helps the system reach the best recognition rate[6]. In previous work, we used the wavelet transform to extract features and obtained very promising results[7]. In this paper, we present a Hough transform based method for feature extraction.
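This excerpt does not yet say how the Hough transform is turned into features, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes a binary character image (1 = ink pixel) and the standard normal line parametrization ρ = x cos θ + y sin θ: every ink pixel votes, for each sampled angle θ, into the ρ bin it falls in. As a hypothetical feature vector, it then keeps, for each angle, the fraction of ink pixels lying on the strongest line at that orientation. The function names `hough_accumulator` and `hough_features` and the bin counts `n_theta=18`, `n_rho=16` are illustrative assumptions.

```python
import math
from typing import List, Sequence


def hough_accumulator(image: Sequence[Sequence[int]],
                      n_theta: int = 18, n_rho: int = 16) -> List[List[int]]:
    """Standard (theta, rho) line Hough transform of a binary image.

    acc[t][r] counts the ink pixels (x = column, y = row) whose
    rho = x*cos(theta_t) + y*sin(theta_t) falls into rho bin r.
    Bin counts are illustrative assumptions, not values from the paper.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    acc = [[0] * n_rho for _ in range(n_theta)]
    if height == 0 or width == 0:
        return acc
    # |rho| never exceeds the image diagonal, so bins cover [-diag, diag].
    diag = math.hypot(width, height)
    # Sampled angles theta_t = t * pi / n_theta, i.e. theta in [0, pi).
    trig = [(math.cos(t * math.pi / n_theta), math.sin(t * math.pi / n_theta))
            for t in range(n_theta)]
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            if not value:
                continue  # only ink pixels vote
            for t, (c, s) in enumerate(trig):
                rho = x * c + y * s
                r = int((rho + diag) / (2 * diag) * n_rho)
                acc[t][min(max(r, 0), n_rho - 1)] += 1
    return acc


def hough_features(image: Sequence[Sequence[int]],
                   n_theta: int = 18, n_rho: int = 16) -> List[float]:
    """Hypothetical feature vector: for each angle, the share of ink
    pixels that vote for the single most popular rho bin."""
    n_ink = sum(1 for row in image for v in row if v)
    if n_ink == 0:
        return [0.0] * n_theta
    acc = hough_accumulator(image, n_theta, n_rho)
    return [max(votes) / n_ink for votes in acc]


if __name__ == "__main__":
    bar = [[0] * 5 for _ in range(5)]
    bar[2] = [1] * 5  # one horizontal stroke on row 2
    feats = hough_features(bar)
    # theta = 0 spreads the stroke over 5 rho bins; theta = 90 deg gathers it in one.
    print(round(feats[0], 2), round(feats[9], 2))  # prints: 0.2 1.0
```

A horizontal stroke concentrates all its votes at θ = 90°, whereas at θ = 0° each pixel lands in a different ρ bin; this sensitivity to stroke orientation is what makes Hough-based descriptors plausible inputs for a classifier. The excerpt does not specify how such vectors are ordered into observation sequences for the HMMs.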
2.2.1 Hough Transform: