
ISBN: 978 - 15084725 - 51

Proceedings of International Conference on Developments in Engineering Research

Date: 15.2.2015

A COMPLETE WORKFLOW OF MULTILINGUAL INDIAN REGIONAL LANGUAGES

1 L. SURIYA KALA, 2 Dr P. THANGARAJ

1 Research Scholar, Mother Teresa Womens University, Kodaikanal, Tamilnadu, India

2 Computer Science and Engineering, Bannari Amman Institute of Technology, Sathiyamangalam, Tamilnadu, India

I. ABSTRACT: MLOCR is a system for converting document images into editable text. Doing this conversion by hand is tedious
and time consuming, and MLOCR aims to make it easier with a one-click solution. A scanned image is taken and preprocessed,
the OCR engine then recognizes the characters of the selected language, and at the press of a button the image is converted
into text that can be edited in any way. The whole process runs in a GUI-based environment for users of the Windows
operating system, and it behaves much like any other application in the Windows environment.
Key words: MLOCR, Preprocessing, Tesseract.
II. BASE SYSTEM
The base system of MLOCR converts images into text using the optical character recognition (OCR) engine Tesseract; in
particular, it converts scanned images into text. Using the same concept, the program has been developed for the most
common regional languages of India.

Fig. 1. The regional languages of India in MLOCR

The languages used in MLOCR are:

Hindi
Telugu
Kannada
Malayalam
Tamil
English

IAETSD 2015: ALL RIGHTS RESERVED

www.iaetsd.in

MLOCR can convert images in any of these six languages to text, but only one language can be processed at a time.
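The single-language selection described in this section could be wired to Tesseract roughly as follows. This is a minimal sketch, not the authors' implementation: the dictionary name and the helper `build_tesseract_command` are hypothetical, and it assumes the standard Tesseract command-line form `tesseract <image> <outputbase> -l <lang>` together with the usual three-letter language codes.

```python
# Hypothetical mapping from the MLOCR language menu to Tesseract language codes.
TESSERACT_LANG_CODES = {
    "Hindi": "hin",
    "Telugu": "tel",
    "Kannada": "kan",
    "Malayalam": "mal",
    "Tamil": "tam",
    "English": "eng",
}


def build_tesseract_command(image_path, output_base, language):
    """Return the argument list for one single-language Tesseract run."""
    try:
        code = TESSERACT_LANG_CODES[language]
    except KeyError:
        raise ValueError(f"Unsupported language: {language!r}") from None
    # Exactly one language code is passed, matching the one-language-at-a-time rule.
    return ["tesseract", image_path, output_base, "-l", code]
```

The returned list can be handed to `subprocess.run` on a machine where Tesseract and the matching language data are installed.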
III. FUTURE ENHANCEMENT
Some of the planned future enhancements are as follows:

Web-based MLOCR
MLOCR for multiple platforms
Image-to-text followed by text-to-speech conversion
Conversion of multiple languages at the same time
More accurate results
Print and export options
Special character recognition
Language translation, etc.

IV. FLOW OF DATA
The flow of data is the key part of any process. In MLOCR, the basic data is the scanned image, which is selected from
the user's system. The image is preprocessed to fit the visible range and is then recognized by the OCR engine, which
converts it into text in the language that was selected at the beginning.
Everything proceeds in a feed-forward manner: each process begins in turn, based on the output of the earlier one.

Fig. 2. The data flow of MLOCR
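The feed-forward flow described in this section can be sketched as a chain of stages in which each stage consumes the output of the previous one. The stage functions below are hypothetical placeholders (no real image handling or OCR takes place); the sketch only illustrates the ordering and the dependency between stages.

```python
from functools import reduce


def select_image(path):
    # Stage 1: the user picks a scanned image from the local system.
    return {"path": path, "preprocessed": False}


def preprocess(image):
    # Stage 2: placeholder for resizing the image to a workable size.
    return {**image, "preprocessed": True}


def recognize(image):
    # Stage 3: placeholder for the OCR engine call; it depends on stage 2.
    if not image["preprocessed"]:
        raise ValueError("image must be preprocessed before recognition")
    return {**image, "text": "<recognized text>"}


def run_feed_forward(stages, initial):
    """Feed the output of each stage into the next one, in order."""
    return reduce(lambda data, stage: stage(data), stages, initial)


result = run_feed_forward([select_image, preprocess, recognize], "scan.png")
```

Skipping a stage breaks the chain: running `recognize` directly on the output of `select_image` raises an error, which mirrors the rule that each process starts from the output of the earlier one.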

V. WORK FLOW

In MLOCR, the work flow begins with the selection of the language. Once the language is selected, the OCR instance for
that language is also determined. After that, the image must be selected from the system. Once it is selected, the image
is converted (preprocessed) to an ample height and width, irrespective of its original size.
The preprocessed image is then sent to the OCR engine. Based on the language selected earlier, the engine looks for the
characters of that particular language and converts the recognized characters into text. The text is produced in an
editable format, so the user can manipulate it in any way. All of these processes are depicted in Fig. 3.

Fig. 3. The work flow of MLOCR
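The preprocessing step in this work flow resizes every image to an ample height and width regardless of its original size. The paper does not state the target dimensions or whether the aspect ratio is kept, so the sketch below makes two assumptions: a hypothetical 1600 x 1200 target box and aspect-ratio-preserving scaling. The function `fit_to_box` is illustrative only.

```python
def fit_to_box(width, height, box_width=1600, box_height=1200):
    """Scale (width, height) up or down to fit inside the box, keeping the aspect ratio.

    The default box size is an assumed value, not one taken from MLOCR.
    """
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    # The smaller ratio guarantees that neither side exceeds the box.
    scale = min(box_width / width, box_height / height)
    return max(1, round(width * scale)), max(1, round(height * scale))
```

An imaging library would then resample the pixels to the returned size before the image is passed to the OCR engine.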

VI. APPLICATION AREA

Any office or individual who needs to convert a hard copy of a document into an editable soft copy can use MLOCR, for
example for e-publications. In some situations MLOCR can also be used to identify the number plates and the names on
vehicles to help solve critical cases.
A. ACCURACY RANGE

MLOCR is at an early stage of its development, so not all of the languages can yet be converted with the same level of
accuracy. However, the most important languages, such as English and Hindi, can be converted with an accuracy of 95% and
above. The chart below depicts the accuracy level for each language.


Fig. 4. The accuracy level of MLOCR by language
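The paper does not say how the accuracy percentages were measured. One common choice, shown here purely as an assumption, is character-level accuracy: the share of reference characters left after subtracting the edit (Levenshtein) distance between the ground-truth text and the OCR output. Both function names are illustrative.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_accuracy(reference, recognized):
    """Percentage of reference characters recognized correctly (assumed metric)."""
    if not reference:
        raise ValueError("reference text must not be empty")
    errors = edit_distance(reference, recognized)
    return 100.0 * max(0, len(reference) - errors) / len(reference)
```

Python strings are sequences of Unicode code points, so for Indic scripts a consonant and its combining vowel sign count as separate characters under this measure.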

VII. CONCLUSION

MLOCR is a simple example of a bigger picture: using the same technologies and procedures, both the number of supported
languages and the level of accuracy can be expanded, and large-scale conversion is also possible with MLOCR.

ACKNOWLEDGEMENT
I would like to thank my research supervisor, Dr. P. Thangaraj, for his valuable guidance, and I thank Mother Teresa Womens
University, Kodaikanal, for giving me the opportunity to present this paper. I express my gratitude to Don Bosco College,
Dharmapuri, for their support and encouragement.
AUTHORS PROFILE
L. Suriya Kala completed her MCA and M.Phil. and works as an Assistant Professor at Don Bosco College, Dharmapuri, Tamil
Nadu, India. She is a research scholar specializing in digital image processing at Mother Teresa Womens University,
Kodaikanal, Tamil Nadu, India.
Dr P. Thangaraj, Ph.D., is Professor and Head of the Department of Computer Science and Engineering, Bannari Amman
Institute of Technology, Sathiyamangalam, Tamil Nadu, India.
