
VISVESVARAYA TECHNOLOGICAL UNIVERSITY

Jnana Sangama, Santhibastawad Road, Machhe


Belagavi - 590018, Karnataka, India

“Native language to English translator using Image Processing”


Submitted in partial fulfilment of the requirements for the award of the degree of

BACHELOR OF ENGINEERING
IN
INFORMATION SCIENCE AND ENGINEERING
For the Academic Year 2018-2019
Submitted by
Arjun Raja Y [1JS15IS010]
Ashika N B [1JS15IS012]
Chaitra Kulkarni [1JS15IS019]
Aditya Abhishek [1JS14IS001]

Under the Guidance of


Mrs. Sudha P R
Assistant Professor, Dept. of ISE, JSSATE

2018-2019
DEPARTMENT OF INFORMATION SCIENCE AND ENGINEERING
JSS ACADEMY OF TECHNICAL EDUCATION
JSS MAHAVIDYAPEETHA, MYSURU
JSS ACADEMY OF TECHNICAL EDUCATION
JSS Campus, Dr. Vishnuvardhan Road, Bengaluru-560060

DEPARTMENT OF INFORMATION SCIENCE & ENGINEERING

CERTIFICATE

This is to certify that the Project Work Phase II entitled “Native language to English
translator using Image Processing” is a bonafide work carried out by Arjun Raja
Y [1JS15IS010], Ashika N B [1JS15IS012], Chaitra Kulkarni [1JS15IS019] and Aditya
Abhishek [1JS14IS001] in partial fulfilment for the award of the degree of Bachelor of
Engineering in Information Science and Engineering of Visvesvaraya Technological
University, Belagavi, during the year 2018-2019.

Signature of the Guide Signature of the HOD Signature of the Principal

Mrs. Sudha PR Dr. Dayananda P Dr. Mrityunjaya V Latte


Asst. Professor Assoc. Prof. & Head Principal
Dept. of ISE Dept. of ISE JSSATE, Bengaluru
JSSATE, Bengaluru JSSATE, Bengaluru
ACKNOWLEDGEMENT

The satisfaction and euphoria that accompany the successful completion
of any task would be incomplete without the mention of the people who made it
possible. So, with gratitude, we acknowledge all those whose guidance and
encouragement crowned our efforts with success.

First and foremost, we would like to thank His Holiness Jagadguru Sri
Shivarathri Deshikendra Mahaswamiji and Dr. Mrityunjaya V Latte,
Principal, JSSATE, Bangalore, for providing an opportunity to carry out the
Project Phase-I + Seminar (15ISP78) as a part of our curriculum in partial
fulfilment of the degree course.

We express our sincere gratitude to our beloved Head of the Department,
Dr. Dayananda P, for his co-operation and encouragement at all moments
of our approach.

It is our pleasant duty to place on record our deepest sense of gratitude to
our respected guide, Mrs. Sudha P R, Asst. Professor, for the constant
encouragement, valuable help and assistance in every possible way.

We are thankful to the Project Coordinators, Mrs. Sowmya K N, Asst.
Professor, and Mrs. Sudha P R, Asst. Professor, for their continuous
co-operation and support.

We would like to thank all the ISE department teachers and non-teaching
staff for providing us with their valuable guidance and for being there at all
stages of our work.

Arjun Raja Y [1JS15IS010]


Ashika N B [1JS15IS012]
Chaitra Kulkarni [1JS15IS019]
Aditya Abhishek [1JS14IS001]
TABLE OF CONTENTS

1. Introduction
   1.1 Overview
2. Literature Survey
   2.1 Survey Paper 1 (Base Paper)
   2.2 Survey Paper 2 (Base Paper)
   2.3 Survey Paper 3
   2.4 Survey Paper 4
   2.5 Survey Paper 5
   2.6 Survey Paper 6
   2.7 Survey Paper 7
   2.8 Survey Paper 8
   2.9 Survey Paper 9
   2.10 Survey Paper 10
3. Objectives and Problem Statement
   3.1 Objectives
   3.2 Problem Definition
4. Methodology
   4.1 Methodology
      4.1.1 Binarization
      4.1.2 Segmentation
      4.1.3 Feature Extraction
      4.1.4 Classification
5. Requirement Specification
   5.1 Hardware and Software Requirements
   5.2 Low Level Specification
6. Implementation
   6.1 Input Pre-processing
   6.2 Character Recognition
   6.3 Conversion
7. Testing
8. Conclusion
9. References
10. Snapshots

Publication Details

Plagiarism Report
LIST OF FIGURES

FIG 1.1 KANNADA SCRIPT


FIG 2.1 BLOCK DIAGRAM OF AUTOMATIC FORM PROCESS
FIG 2.2 RESULT REPRESENTATION FOR PAPER 2.8
FIG 4.1 PROCESS DIAGRAM FOR THE SYSTEM
FIG 4.2 INPUT IMAGE
FIG 4.3 GRAY SCALE CONVERTED IMAGE
FIG 4.4 IMAGE AFTER APPLYING A THRESHOLD VALUE
FIG 4.5 GRADIENT FEATURE EXTRACTION
FIG 4.6 ARTIFICIAL NEURAL NETWORK
FIG 4.7 ‘CONVOLVED’ INPUT
FIG 6.1.1 AUGMENTATION
FIG 6.3.1 CTC LOSS
FIG 7.1 INPUT NOT IN A RANGE IS SCANNED
FIG 7.2 INPUT NOT IN A RANGE GIVES A DISTORTED OUTPUT
FIG 7.3 INPUT READY FOR ANALYSIS
FIG 7.4 INPUT NOT IN A RANGE IS SUCCESSFULLY SEGMENTED
FIG 7.5 INPUT NOT IN A RANGE IS SUCCESSFULLY TRANSLATED
FIG 10.1
FIG 10.2 LINE SEGMENTATION
FIG 10.3 WORD SEGMENTATION
FIG 10.4 CHARACTER SEGMENTATION
FIG 10.5 CHARACTER AUGMENTATION
FIG 10.6 CHARACTER RECOGNITION
FIG 10.7 CONVERSION
ABSTRACT

Communication has come a long way since ancient times; what remains constant,
though, is the language barrier. If we gain control over this aspect, exchanging information,
be it for educational purposes or for convenience, becomes simpler. As we know, image
processing and its management are very much the present. The proposed system provides automated
processing of images containing Kannada script. An efficient pre-processing strategy is presented
for extracting features from Kannada characters, which are then translated into English using
neural networks. To overcome the barrier of a non-existent standardized script dataset, augmentation
of the available script is employed. The Kannada to English converter application helps those
unfamiliar with either Kannada or English by means of a dictionary that provides the English
meaning of the processed Kannada images. It serves a wide cross-section of the population, from
the less privileged to tourists, easing everyday tasks.
NATIVE LANGUAGE TO ENGLISH TRANSLATOR USING IMAGE PROCESSING

CHAPTER 1

INTRODUCTION

DEPT OF ISE, JSSATEB 1



1. INTRODUCTION

1.1 OVERVIEW

India is getting more digitized day by day. From business transactions to food
deliveries, everything has found its place online, and it is becoming necessary to provide a
channel for these activities to run smoothly. Just as there is an abundance of software for
recognizing handwritten English, we need the same for the native languages as well, so that the
culture can be preserved without hindering routine activities.
Written Indian scripts are difficult for machines to understand because they consist of a
large number of characters, including numbers, vowels, modifiers and consonants. There has been
immense advancement in the recognition systems created for native languages, but these come with
limitations: they have been built either for a subset of characters or for base characters only,
ignoring modifiers of any kind. While these were created with their applicability in mind,
there is no system that can recognize the language as a whole. The idea is to create a
means of converting Kannada to English and to help users learn and use both languages better.

Kannada:

Kannada is the official language of Karnataka, a state in the south of India. This
language is used by about 49 million people in Karnataka and belongs to the Dravidian
family of languages.

Kannada has its own script, derived from the Brahmi script. This script contains a root
set of 52 characters: 16 vowels and 36 consonants. Furthermore, there are certain tokens
used to modify these root characters, known as the vowel and consonant modifiers; the script
has the same count of modifiers as root characters. There is also another set of
characters, known as aksharas, that are built by binding together the tokens, with
consonants, consonant modifiers and vowel modifiers selected according to a set of rules.


FIG 1.1: KANNADA SCRIPT


CHAPTER 2

LITERATURE
SURVEY


2. LITERATURE SURVEY

2.1 PAPER-TITLE:

Review of Automatic Handwritten Kannada Character Recognition


Technique Using Neural Network 2017[1]

ABSTRACT:

In this paper, automatic processing of forms written in the Kannada language is considered. The
authors propose the Principal Component Analysis (PCA) and Histogram of Oriented Gradients
(HoG) methods for feature extraction. The features are then classified using a feed-forward,
back-propagating ANN. Only 57 attributes are taken as unique classes. Performance
based on the two features is compared for multiple classes, and the authors conclude that
HoG has better accuracy than PCA as the number of classes increases. Due to the huge
attribute set of the Kannada script, a reduction in recognition accuracy along with an
increase in computational cost is noticed. The authors propose to reduce this problem
by reducing the symbol set, where the vowel modifiers (kagunitha) and
consonant modifiers (vattakshara) are considered as separate classes. The Devanagari script has
characteristics similar to the Kannada script, such as vowel modifiers and consonant conjuncts; only
small subsets of the compound attributes (upper or lower) are considered for recognition.
Many Arabic letters also share common primary shapes, differing only in the number of
dots and whether the dots are above or below the primary shape. Fourier descriptors, chain codes and
different shape-based features are used for the recognition of handwritten Kannada
characters; recognition is carried out by a Support Vector Machine and an accuracy of 95%
is obtained. A brief survey on offline recognition of the Devanagari script is given in [10].
Classifier-based feature extraction is compared in a survey: gradient and PCA-based features with PCA,
SVM and Neural Network classifiers are found to have better recognition accuracy. The Multi-Layer
Perceptron (MLP) is discussed; MLP is used for the recognition of mixed numerals of three
Indian scripts, namely Devanagari, Bangla and English. PCA is used to reduce the dimension
of the feature vector, and it is found that ridgelet features offer more promising results than PCA.


An accuracy of 87.24% is achieved when SVM is used as the classifier. The literature records few
papers on Kannada HWR. Kannada handwriting recognition and automatic form
processing are considered: HoG and PCA are used for feature extraction, and the performance
of the features is compared for 57 attributes.
Handwritten character recognition of Kannada characters is a very challenging task because
of the large dataset, shape similarity among characters and non-uniqueness in the representation
of diacritics.

METHODOLOGY:
FORM PROCESSING METHODOLOGY:

The automatic form processing system involves image acquisition using a scanner, pre-processing
of the scanned form, extraction of only the handwritten characters, handwritten character segmentation,
character recognition and storage, as shown in Fig. 2.1. The template of the birth certificate is created
with all the required data fields. The applicant is then instructed to fill the form in the Kannada
language, with all the base characters in the upper box and conjuncts in the lower box.

FIG 2.1: BLOCK DIAGRAM OF AUTOMATIC FORM PROCESS


RESULTS:

The segmented characters are used for feature extraction using PCA and HoG. The performance
of these features for different numbers of classes is compared using a neural network with back-
propagation learning. 100 samples per class are used for recognition. Not all characters
in the Kannada script are considered in this paper. Pre-processing helps identify exclusively
handwritten characters. Extraction of features is done by histogram-of-gradients and principal
component analysis.


2.2 PAPER TITLE:

Unrestricted Kannada Online Handwritten Akshara Recognition using


SDTW[2]

ABSTRACT:

This paper gives a solution for recognising online handwritten characters which can then be
applied in practical implementations. It takes into consideration the Indo-Arabic numbers,
punctuation marks and aksharas, as well as special tokens like #, $, etc. from Kannada. About 69
handwriting samples were collected from people belonging to 4 different places to build the
dataset. It was brought to light that if smoothed derivatives are used as features, the performance
of a DTW classifier improves by about 88%, but the classifier turns out to be inefficient.
Providing a solution to this, the authors employ Statistical Dynamic Time Warping
(SDTW), which attains faster classification with better accuracy, fast enough
for real-time implementations. Hence, further improvement can be expected where
domain restrictions like language, post-processing and vocabulary can be utilized.
METHODOLOGY:
DYNAMIC TIME WARPING (DTW):

It is a strategy used to match patterns of equal or unequal length. If the patterns do not have the
same length, the matching is done using DTW. During the matching process, a cost
matrix is calculated that gives the points of reference that match in the considered patterns.
Consider two sequences R = (r1, r2, ..., rJ) and T = (t1, t2, ..., tK), where rj, tk ∈ R^d.
Let the warping path be φ = (φ(1), φ(2), ..., φ(N)) with φ(n) = (j, k), which gives the
details of the alignment of pattern R to T.
1) Warping path:
a) The first and last points of pattern T are matched with the first and last points of
pattern R, respectively, i.e., φ(1) = (1, 1) and φ(N) = (J, K), where N is the total
number of instances in the warping path.
b) For φ(m) = (α, β) and φ(m − 1) = (α′, β′): 0 ≤ (α − α′) ≤ 1 and
0 ≤ (β − β′) ≤ 1. This condition makes sure that the path moves either right by one
step, down by one step, or diagonally down-right by one step.
2) Step-by-step algorithm:
1) A distance matrix of dimension J × K is created, containing the Euclidean distance
of every point of R with every point of T; its elements are d(i, j) = EuclideanDist(ri, tj).
2) A cumulative cost matrix of dimension J × K is generated, whose elements are
calculated as c(i, j) = d(i, j) + min(c(i−1, j), c(i, j−1), c(i−1, j−1)).
3) Warping cost = DTW(R, T) = c(J, K).
4) The warping path can be found using dynamic programming, as is done
in the Viterbi algorithm to find the Viterbi path.
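The cost-matrix recurrence above can be sketched in a few lines. A minimal illustration; the sequence format (lists of (x, y) tuples) is an assumption, not something the paper specifies:

```python
import math

def dtw(R, T):
    """Classic dynamic time warping between two 2-D point sequences,
    following the cost-matrix recurrence described in the text."""
    J, K = len(R), len(T)
    # Distance matrix: Euclidean distance between every pair of points.
    d = [[math.dist(R[i], T[j]) for j in range(K)] for i in range(J)]
    # Cumulative cost matrix c(i, j) = d(i, j) + min of the three predecessors
    # (left, up, diagonal), which enforces the warping-path step condition.
    c = [[0.0] * K for _ in range(J)]
    for i in range(J):
        for j in range(K):
            best = min(
                c[i - 1][j] if i > 0 else math.inf,
                c[i][j - 1] if j > 0 else math.inf,
                c[i - 1][j - 1] if i > 0 and j > 0 else math.inf,
            )
            c[i][j] = d[i][j] + (0.0 if i == 0 and j == 0 else best)
    # The warping cost is the bottom-right cell c(J, K).
    return c[J - 1][K - 1]
```

The full warping path itself could be recovered by backtracking through c, as the Viterbi-style dynamic programming mentioned above suggests.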
3) Restrictions on features:
The Euclidean distance metric that this technique uses is sensitive to the y value of a
point; it does not differentiate between points on rising and falling slopes if the
difference in y values is the same. For the same reason, one large subsection of one
pattern may be matched to one particular point of another pattern. This leads to
unintuitive matching, which is undesirable.
4) First Derivative (FD) as feature, Estimate 1:
In this method, the derivative at the current point is estimated over a window of
half-width 2:
X′(j) = Σ_{i=1..2} i · (x(j+i) − x(j−i)) / (2 · Σ_{i=1..2} i²)
Y′(j) = Σ_{i=1..2} i · (y(j+i) − y(j−i)) / (2 · Σ_{i=1..2} i²)
Since the above formula cannot be applied at the first, second, last and second-last
points of a pattern of L points, those values are copied from the nearest interior
points. The warping cost is evaluated as explained above with the features X′(j), Y′(j).
First Derivative (Estimate 2): Another estimate of the derivatives of x and y at each point
was suggested as a feature in [1]. The estimated derivative at point j of a
pattern is calculated as:
X′(j) = ((x(j) − x(j−1)) + (x(j+1) − x(j−1)) / 2) / 2
Y′(j) = ((y(j) − y(j−1)) + (y(j+1) − y(j−1)) / 2) / 2
Since the above formula cannot be estimated at the first and last points, their values
are assumed to be the same as those of the second and penultimate points, respectively:
X′(1) = X′(2); Y′(1) = Y′(2); X′(L) = X′(L−1); Y′(L) = Y′(L−1), where L is the number of
points in the pattern. The warping cost is evaluated as explained above with the features
X′(j), Y′(j).
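The smoothed-derivative feature (estimate 1) can be sketched as follows for one coordinate. The endpoint handling, copying the nearest interior value, is an assumption mirroring the paper's treatment of endpoints in estimate 2:

```python
def smoothed_derivative(x):
    """First-derivative feature, estimate 1: a regression over a window of
    half-width 2. Requires at least 5 points; the first two and last two
    values are copied from the nearest interior point (an assumption)."""
    L = len(x)
    d = [0.0] * L
    denom = 2 * sum(i * i for i in (1, 2))  # 2 * (1 + 4) = 10
    for j in range(2, L - 2):
        d[j] = sum(i * (x[j + i] - x[j - i]) for i in (1, 2)) / denom
    # Boundary points cannot use the full window; copy nearest interior value.
    d[0] = d[1] = d[2]
    d[L - 1] = d[L - 2] = d[L - 3]
    return d
```

For a uniformly sampled straight stroke the estimate reduces to the constant slope, as expected of a derivative feature.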

RESULTS:

Although using the two derivative estimates in DTW reduces recognition speed by about a
second, precision improves by about one and four percent for the first and the second
estimate, respectively. If SDTW is used, the speed increases forty-six-fold with a negligible
decrease in accuracy.


2.3 PAPER TITLE:

Unconstrained Handwritten Kannada Numeral Recognition.[3]

ABSTRACT:

Handwritten recognition is a procedure for recognizing handwritten data given as
input from sources such as paper, photographs of documents and images. The image of the
handwritten text is scanned offline from a piece of paper. A handwritten recognition
system can be offline or online. There are vowel and consonant modifiers present in
Kannada and all other Indian scripts, and recognition of these scripts can be difficult because of
the presence of Kannada vowels, consonants and modifiers. A General Regression Neural
Network (GRNN) is used for classification, regression and prediction. Optimal scanning is used for
optimal recognition of handwritten input in the form of images, papers and documents.

Input set and processing:

There is no Kannada numeral database available at present. The dataset consists of 100 samples
from different writers for input processing.

METHODOLOGY:

Feature extraction is the technique of deriving a set of values (features) from a dataset
that may be redundant. If the input data is too large to process and is
redundant, it can be transformed into a smaller set of features. A subset of features is
selected; the selected features should be relevant to the input data, and using
them the desired tasks can be completed.
The input numeral image is of size 50x50, and a thinning algorithm is used in this process. The
image is divided into 25 equal parts, giving a matrix of order 10x5 for each box, and
the pixel distance is computed for each grid column, resulting in 10 features for each grid.


Algorithm for Handwritten Recognition:

Handwritten recognition using a box-based system, with distance calculation using vertical
projection.
Input: dataset consisting of numerals written by different writers

Output: classified dataset


1: Apply the thinning algorithm to the dataset.
2: Divide each image into 25 equal parts.
3: Calculate the pixel distance.
4: Calculate the average pixel distance if there is more than one pixel in a grid column.
5: Repeat steps 3 and 4 for every box.
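The box-based feature steps can be sketched as below, minus the thinning step (which would need a skeletonization routine). The paper's box geometry is ambiguous, so 25 boxes of 10x10 pixels are assumed here, and the "pixel distance" is taken as the average row index of foreground pixels per column; both choices are assumptions for illustration:

```python
import numpy as np

def box_features(img):
    """Box-based features for a 50x50 binary numeral image: 25 boxes,
    with the average foreground-pixel distance (vertical projection)
    computed for each of the 10 columns of a box -> 250 features."""
    assert img.shape == (50, 50)
    feats = []
    for by in range(0, 50, 10):          # 5 box rows
        for bx in range(0, 50, 10):      # 5 box columns -> 25 boxes
            box = img[by:by + 10, bx:bx + 10]
            for col in range(10):        # vertical projection per column
                rows = np.nonzero(box[:, col])[0]
                # Average pixel distance when a column holds pixels,
                # 0 when the column is empty.
                feats.append(rows.mean() if rows.size else 0.0)
    return np.asarray(feats)
```

The resulting fixed-length vector is what a classifier such as the GRNN described in this paper would consume.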

RESULTS:

The average recognition of 20% test samples is 98.8%.

The average recognition of 40% test samples is 97.8%.

The average recognition of 60% test samples is 96%.

The average recognition of 80% test samples is 95%.

The average recognition of 100% test samples is 94%.

GRNN is the method used for evaluating the classification and recognition of the
dataset.
The efficiency of recognition of handwritten Kannada numerals was found to be 98.8%.


2.4 PAPER TITLE:

Kannada Character Recognition System[4]

ABSTRACT:

Optical Character Recognition (OCR) is a technique used for the conversion of images of handwritten
text or documents into machine-encoded text. Advanced systems produce a high degree of
accuracy in recognizing the most common fonts.
Research has been carried out on this method and many works have been published on the topic,
but the majority of OCR systems work on the character sets of Chinese, Japanese,
Arabic, etc.
This paper aims at increasing recognition accuracy in Optical Character Recognition
by combining classifiers such as Dynamic Time Warping and Statistical Dynamic Time
Warping.

METHODOLOGY:

Input Pre-processing involves these steps:

Binarization:
Binarization is the first step. It converts the grayscale image into a binary
image by highlighting the text present in the image and removing the background of the
image.
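As a minimal sketch of this step, assuming an 8-bit grayscale array where dark pixels are text; the fixed global threshold of 128 is an assumption, since the paper does not state its thresholding rule (Otsu's method is a common drop-in replacement):

```python
import numpy as np

def binarize(gray, threshold=128):
    """Global-threshold binarization: pixels darker than `threshold`
    become foreground (1), everything else becomes background (0)."""
    return (np.asarray(gray) < threshold).astype(np.uint8)
```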

Skew Detection:
This step also acts as a filter in which unwanted noise is removed after
binarization; it is applied to each word, which is then sent for skew correction.


Skew Correction: The filtered images are sent for skew correction, where the page image
is rotated to correct the detected skew. This is quick for a small dataset but takes more
time for a larger one.

Segmentation:
It is the process of extracting the image region of interest by reducing the dataset into
smaller segments, identifying the required text and converting it to something meaningful
and easier to understand.

Segmentation of Vowel Modifiers:


The segmentation of vowel modifiers includes two parts: the top matra and the right matra.
The top matra is the component that lies above the headline, in the top zone. Owing
to the fact that the headline and baseline of every individual character are
already known, finding the aspect ratio of the segmented character becomes crucial.
The aspect ratio here combines the top zone and the middle zone; when it exceeds
0.95, we check for the presence of the right matra.

CHARACTER CLASSIFICATION

After the extraction of the segmented vowel modifiers and characters, the feature vector of each
character is passed to a character classifier. There are many methods of character
classification, such as nearest neighbour classifiers and neural networks. The dataset used is
divided into two kinds of sets, training and test, for the individual characters.

Nearest Neighbor Classifier:

This is a supervised method of pattern recognition and is very efficient when it
comes to performance. Since the dataset consists of both training and test sets, the training set
consists of positive and negative training samples.
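A minimal 1-nearest-neighbour sketch of the scheme described above. The Euclidean metric and fixed-length numeric feature vectors are assumptions; the paper does not name its distance measure:

```python
import numpy as np

def nearest_neighbour(train_feats, train_labels, sample):
    """Assign `sample` the label of its closest training feature vector
    (Euclidean distance assumed)."""
    dists = np.linalg.norm(np.asarray(train_feats, dtype=float)
                           - np.asarray(sample, dtype=float), axis=1)
    return train_labels[int(np.argmin(dists))]
```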

Gradient calculation using Back Propagation Network:

Back propagation is a methodology employed to compute the gradient that is vital
for updating the weights.

The network being dealt with is a multilayer perceptron having an input
layer, one or more hidden layers and an output layer. The training of the back-
propagation network is done in batch mode using supervised learning and a log-
sigmoid activation function. To meet the requirements of the activation function
before the input is trained, the input is normalised to the range 0 to 1.

Radial Basis function Network (RBF):

The radial basis function network consists of three layers: input, hidden and
output. A radial basis function is centred on each of the training patterns we encounter;
this enables us to keep all the biases in the layer constant, which in turn
depends on the Gaussian spread.

Classification using SVM Classifier:

The SVM classifier is popularly known as a two-class classifier based on
discriminant functions. The discriminant function in this particular SVM is represented by
a surface; this surface acts as the separator between the patterns of the two
individual classes. In OCR applications we train two classes, each of which
has a unique label attached to it. The test sample is assigned the label of the class
that gives the largest positive outcome; in case the test sample throws a negative output,
it is rejected directly.

RESULTS:

On analysing the different aspects of the paper and carefully studying the methodology, we
come to the conclusion that the results obtained in this particular paper were obtained by
scanning text samples from popularly available sources such as textbooks, documents
and magazines. When the text was taken from the same source and found to be
the same, the patterns generated were ensured to be disjoint. For recognition of the text the SVM
classifier was used; this classifier has two classes, and it hence becomes complicated to deal
with problems of more than two classes.
In order to improve accuracy in the recognition of curved strokes, a recognition-based
segmentation methodology is used, which in turn uses windows of dynamic width to provide
points of segmentation that are confirmed by the recognition stage. The resulting
representation gives a very good account of the edges present in an image, having both
directional sensitivity and high anisotropy. This methodology makes it possible to represent
Kannada characters in a form in which they can be classified. Feature extraction is used
to extract the features from the large dataset, and the feature vector is analysed using a
classifier.


2.5 PAPER TITLE:

A font and size-independent OCR system for printed Kannada documents


using support vector machines[5]

ABSTRACT:

This paper presents an OCR application for documents in Kannada, a popular South Indian
language. The input to this application is a scanned text image. The expected output is a file
that is editable and compatible with standard software. The input is passed to the application,
which extracts the readable words from the text image; binarization is then
carried out, followed by segmentation of the image into smaller segments. Feature extraction
is done, where all the features are extracted, and the text is recognized and classified using
SVM classifiers. Recognition of the text is not dependent on the fonts of the text, and
the application is seen to deliver a decent, expected performance.

METHODOLOGY:

The text image is scanned using a scanner at 300 DPI. After scanning, it is
binarised to obtain a binary image from the grayscale image. Binarization is followed by skew
detection, where unwanted noise is filtered out and removed. Skew correction involves rotating
the image in such a way that the text inside it is in an understandable, easier
format. Skew correction is followed by segmentation, in which the words and lines are
separated into three zones based on the horizontal projection of the active word. After
segmentation, the segments obtained are given as input to the recognizer in order to
recognize the word; feature extraction is then done, where the features are extracted and
the feature vector undergoes classification using an SVM classifier. The output is
written to an ASCII file and loaded into other software for further processing.
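The projection-based separation of lines can be sketched as follows; a generic illustration of projection-profile segmentation, one common reading of the description above, assuming a binary image with foreground pixels equal to 1 (words can be separated the same way along the other axis):

```python
import numpy as np

def segment_lines(binary):
    """Split a binary page image into text lines using the horizontal
    projection profile: rows with zero foreground pixels separate lines.
    Returns a list of (start_row, end_row) pairs, end exclusive."""
    profile = binary.sum(axis=1)          # foreground pixels per row
    lines, start = [], None
    for y, count in enumerate(profile):
        if count > 0 and start is None:   # a line begins
            start = y
        elif count == 0 and start is not None:
            lines.append((start, y))      # the line ends
            start = None
    if start is not None:                 # line touching the bottom edge
        lines.append((start, len(profile)))
    return lines
```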


SEGMENTATION PROCESS:
After segmentation into three horizontal zones, the most critical, major portion of
the character is present in the middle zone; therefore it is necessary to segment the
middle zone first.
There may be many vowel modifiers and base consonants inside the middle zone and top
zone, and the main aim is to separate the consonants and vowel modifiers present in the
middle zone. Because the middle zone is the crucial one, some of the consonants are
segmented into two or more parts. In order to achieve the segmentation, an over-
segment-and-merge approach is followed. Three different classifiers are used to classify
the segments of the three zones. The classes obtained from the segments present in the top
zones may involve vowel modifiers, vowels and consonants; hence the total count of
classes is 67. Any sort of segment can be added to the training set, and the
results are obtained with a training set consisting of 2999 patterns.

CLASSIFICATION OF PATTERN USING SVM CLASSIFIER:


Feature extraction is carried out to extract the features, and the feature vector obtained from a
segment after segmentation is associated with a label by a pattern classifier. This can be
achieved using many classifiers, such as nearest neighbour classifiers for prototype
classification. Here, SVM is the classifier that classifies the dataset.

RESULTS:

In order to process a document, a sequence of operations and steps is followed, as
mentioned in the system: a single image of Kannada text is scanned using a flatbed scanner.
The first stage in identifying the text is skew correction, in which the window
whose transform is used has a size of approximately one lakh pixels.
If the pixel occupancy in each scanned line is estimated to be around half, and the text
height in every line is around a hundred pixels, then the window
corresponds to nearly ten lines, which should be enough to obtain a correct
estimate of the skew. Segmentation is based on the separation of the lines and not on
any kind of page layout.


2.6 PAPER TITLE:

A Novel Approach on Offline Kannada and English Handwritten Words [6]

ABSTRACT:

One of the challenging and fascinating areas of research in the field of image processing is the
recognition of image text from different kinds of sources. This not only facilitates the
communication process but also helps many tourists, and it has many other applications, such as
recognizing and converting handwritten texts into electronic form; it can also be a mode of
reading for the blind. This paper focuses on the recognition of handwritten names of Karnataka
districts written in Kannada, using classifiers.

The process starts with scanning the text image for input processing, followed by skew
detection (i.e., noise filtering), then skew correction and segmentation.
Feature extraction is the method of extracting features from the redundant dataset; a
feature vector is extracted so that recognition and classification are improved.

In order to extract the edges, an edge detection algorithm is used, and features are extracted for
better recognition and to train the classifiers.

METHODOLOGY:
The dataset consists of 1200 words in Kannada and English collected from different people. The
words and characters differ in font size and shape across writers.
The database consists of the names of all the districts in Karnataka and around 20 words in
English.
Recognition process involves these steps:

a. Input Processing

b. Extraction of feature from the dataset

c. Classification of the dataset

d. Recognition of handwritten text.


The input pre-processing involves binarisation, skew detection, skew correction,
segmentation and thinning.
EDGE DETECTION PROCESS:

It is the process of detecting the sharp edges and boundaries of the objects inside images.

FEATURE EXTRACTION:

Feature extraction is the process of extracting features from the redundant dataset.
If the features are extracted from a larger dataset, it is simplified into smaller fragments
first and the features are then extracted. Feature extraction increases the recognition accuracy
rate. The feature extraction method is selected on the basis of the application; this paper
focuses on a postal application to recognize the district names of Karnataka. The features
involve loops, matras and some corner points.

CLASSIFICATION PROCESS:

In this stage a label is assigned to text images after feature extraction and establishing a
relationship between the features. The classifiers include neural networks and dynamic time
warping.

RESULTS:

The dataset consists of handwritten text images from 60 writers so that each handwriting style is uniquely represented. There are 30 district names of Karnataka and 20 English words from each of the 60 writers, so the database includes 1200 Kannada and English words. A recognition accuracy of 92% is obtained.


2.7 PAPER TITLE:


Offline Handwritten Character Recognition in South Indian Scripts: A
Broad Visualization[7]

ABSTRACT:
Handwritten character recognition is one of the challenging and fascinating research areas in pattern recognition. There are many applications of handwriting recognition for English, Japanese and other languages, but only a limited number for Indian languages. This paper focuses on handwritten character recognition of South Indian languages such as Kannada, Tamil, Telugu and Malayalam.

METHODOLOGY:

Handwritten character recognition is done in two ways: offline and online. Online character recognition tracks an electronic pen during writing and is less complicated than the offline recognition process; offline recognition, in contrast, does not operate under a real-time constraint. The text image is scanned and processed, and the characters are grouped using the K-means technique. A feature vector is extracted to improve the recognition process, and a component-transition method is used for this; classification then follows. The recognition accuracies were 93% and 88.9% for the training and test sets respectively.
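The K-means grouping step mentioned above can be sketched as follows; the two-dimensional feature vectors and the cluster count used here are illustrative assumptions, not the paper's actual data.

```python
import numpy as np

def kmeans(points, k, iters=20, seed=0):
    """Minimal K-means: assign each point to its nearest centroid,
    then recompute centroids, repeating for a fixed number of iterations."""
    rng = np.random.default_rng(seed)
    centroids = points[rng.choice(len(points), k, replace=False)]
    for _ in range(iters):
        # distance of every point to every centroid
        d = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = points[labels == j].mean(axis=0)
    return labels, centroids

# two well-separated illustrative clusters of character features
pts = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.1, 4.9]])
labels, centroids = kmeans(pts, k=2)
```

With well-separated data as above, the two nearby points always end up in one cluster and the two far points in the other.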

STUDIES ON HANDWRITTEN CHARACTER RECOGNITION OF KANNADA SCRIPT:


Kannada is the official language of the South Indian state of Karnataka. The script consists of a set of vowels and consonants totalling 57 characters: about 17 vowels and 40 consonants. The Kannada aksharas evolved from the Kadamba and Chalukya scripts.


RESULTS:
This paper presents a detailed survey of the handwritten character recognition work developed for Indian scripts, mainly the South Indian languages Kannada, Malayalam, Tamil and Telugu. No complete optical character recognition system has yet been developed for these languages.


2.8 PAPER-TITLE:
Efficient Zone Based Feature Extraction Algorithm For Handwritten
Recognition Of Four Popular South Indian Scripts

ABSTRACT:
Character recognition is one of the most important fields in pattern recognition, and handwritten character recognition has received extensive attention in recent times. There are two types of recognition systems, online and offline; offline handwriting recognition falls under the broad field of optical character recognition. In this paper the authors propose a feature extraction system based on distance metrics from the zone centroid and the image centroid: the character centroid is computed, the image is divided into n equal zones, and the average distance from the character centroid to each pixel in a zone is computed.

METHODOLOGY:

A zone-based hybrid method is proposed for feature extraction. This method is chosen for its robustness to small variations, its simplicity and its impressive recognition rate; it gives good recognition even when pre-processing steps are omitted.

After the character centroid is computed, the image is divided into fifty equal parts. The average distance from the character centroid to each pixel in a zone is computed; then the zone centroid is computed and the average distance from the zone centroid to each pixel in the zone is computed. This procedure is repeated for all the zones, yielding 100 features in total.

For classification and recognition nearest neighbor classifier and feed forward back
propagation neural network classifiers are used.

The nearest-neighbour classifier is used for large-scale pattern matching: the distances from the new vector to all stored vectors are computed, and classification is based on this similarity measurement.
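A minimal sketch of this nearest-neighbour idea, assuming simple two-dimensional feature vectors and illustrative class labels:

```python
import numpy as np

def nearest_neighbour(train_vecs, train_labels, query):
    """Classify a query vector by the label of its closest stored vector
    (Euclidean distance), as described for large-scale pattern matching."""
    dists = np.linalg.norm(train_vecs - query, axis=1)
    return train_labels[int(dists.argmin())]

# illustrative stored feature vectors for two character classes
train_vecs = np.array([[0.0, 1.0], [0.1, 0.9], [1.0, 0.0], [0.9, 0.1]])
train_labels = ['ka', 'ma', 'ma', 'ma']
train_labels = ['ka', 'ka', 'ma', 'ma']
print(nearest_neighbour(train_vecs, train_labels, np.array([0.05, 0.95])))  # → ka
```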


Artificial neural networks have been used extensively for the recognition of both Indian and non-Indian digits.

The recognition rate of the backpropagation model is determined by the network structure and the training algorithm. The network is trained using the feed-forward method; its structure is determined by the number of hidden layers and neurons, with complete neuron connectivity between layers. The network consists of 50 nodes in the input layer (one feature for each of the 50 zones), 80 neurons in the hidden layer, and 10 neurons in the output layer corresponding to the 10 numerals. Underfitting occurs when there are too few hidden neurons: the network fails to recognize the numeral because it has too few adjustable parameters.

Overfitting occurs when there are too many hidden neurons, resulting in the network failing to generalize.

There are several rules of thumb for deciding the number of neurons in the hidden layer.

• Hn < 2*In, where Hn is the number of hidden neurons and In the number of input neurons.

• Finding the minimum number of epochs needed to recognize a character, along with the recognition efficiency on both training and testing samples.

RESULTS

2000 samples were obtained: 1000 are used for training and the remaining 1000 for testing. The results of Kannada numeral recognition are shown below.

FIG 2.2: RESULT REPRESENTATION FOR PAPER 2.8


2.9 PAPER-TITLE:
English Sentence Recognition using Artificial Neural Network through Mouse-based
Gestures

ABSTRACT

Handwriting is one of the most fundamental forms of communication. The basic problem of handwriting recognition has existed for about six decades; what now poses a major challenge is unrestricted recognition of sentences. The main focus here is automated recognition of English handwriting captured through mouse gestures, achieved using an artificial neural network. Backpropagation is used to train the network, making it self-sufficient, and it gave positive outcomes. The technique retrieves the outlines of sentences, compares them with the image, marks the area, and then uses the network to identify the sentence. The method gave successful results with high accuracy.

ENGLISH

English belongs to the Indo-European family of languages; its closest living siblings are Frisian and Scots, with Frisian surviving only in Friesland, some North Sea islands and a few places in Germany. The history of English is commonly divided into Old, Middle and Modern periods. Pure English is hardly spoken anywhere, as the language has been influenced by many foreign languages over the years and has absorbed many of their words, so it is difficult to find a perfect form of the language.

METHODOLOGY
Pre-processing:
Several common methods are used for pre-processing; these are mostly basic and apply to most pre-processing pipelines. Thinning is a very important part of this process: a large pixel picture is condensed into a smaller one. It is applied only to binary images, and the outcome is also a binary image. Unwanted pixels are removed, similar to erosion, and the image is thinned to the required thickness. This aids recognition, as the noise has been removed.
Feature extraction:
In this process only the required features are extracted and used further. Features are first retrieved according to relevance, and each feature is eventually used to identify the sentence. The features retrieved from the raw data are used for classification, and they should produce minimal variation within a class and maximal variation between different classes. At this stage every character is given its own identity, which is its feature vector.

Proposed technique:
The approach in the paper recognizes written English sentences captured through mouse gestures, using a neural network. The images to be recognized, captured from the mouse gestures, are provided as inputs to the network. First the network is trained on inputs consisting of English sentences; once trained, a script is termed genuine only if its features are similar to the training data, and is otherwise considered false. The proposed network consists of three tiers: the first tier takes the input, the second is hidden (used for training) and the third gives the outcome.

RESULTS:

The outcomes proved successful for both continuous and discrete English written with mouse-based gestures. The average success rate of recognizing English sentences using the artificial neural network is about 94% on sample data taken from ten different people, with five samples each.


2.10 PAPER-TITLE:

OCR for printed Kannada text to Machine editable format using Database approach

ABSTRACT:

This paper deals with character recognition of the Kannada script in printed files. The application first scans the printed text, extracts the Kannada characters and then performs segmentation. A database is used to store the printed texts, and an accuracy of up to 99.99% was achieved.

METHODOLOGIES:

Segmentation splits the lines, words and characters out of the document, which enables the application to recognize the characters and helps in the conversion process. This method is used because the Kannada script has many base consonants and vowel modifiers. The image is scanned and converted into a grayscale image, rotated to the horizontal for better processing, and the text image is recognized. The process is repeated for word and character segmentation.

RESULTS:

This method gives 99.99% accuracy, but it needs a lot of storage space and computation.


CHAPTER 3

OBJECTIVES AND PROBLEM STATEMENT


3.1 OBJECTIVES:
These are the goals laid out to be achieved by the end of the project:

● To provide a user interface where supplying the input is easy

● The application should be user-friendly

● The application should be capable of pre-processing the input such that the foreground overpowers the background

● The application should be capable of identifying the text portions in the input

● The application should extract the text contained in the given input image and present it to the end user

3.2 PROBLEM DEFINITION:


● With our native-language-to-English converter we concentrate on producing output with higher accuracy than has already been achieved by previous efforts in the same area

● The character datasets covered previously were kept small to avoid complications in the data processing; we try to incorporate larger datasets (along with the character modifiers) in order to achieve better results and cover a larger portion of the language

● The applications previously made were specific to a particular area, whereas our application covers a major part of the population, including:
✓ Preserving the culture through old scripts
✓ Redefining peer-to-peer learning for rural kids
✓ Helping tourists by translating sign boards
✓ Providing meaningful audio output of the correct English word or sentence for a given Kannada scripture


CHAPTER 4

METHODOLOGY


4. METHODOLOGY

4.1 METHODOLOGY:

FIG 4.1 PROCESS DIAGRAM FOR THE SYSTEM


4.1.1 Binarization:

The conversion of an image from grayscale to a binary (black-and-white) state is known as binarization:

FIG 4.2 INPUT IMAGE

● Consider the above RGB image: a child writing a few lines of Kannada on a blackboard.
● RGB is a standard system for representing all the colours in the visible spectrum by combining red, green and blue in different proportions.
● First the image is converted into grayscale[8]; a grayscale image carries only intensity information.


FIG 4.3 GRAY SCALE CONVERTED IMAGE

● After the conversion, a threshold value is selected to separate the foreground from the background data in the image.

FIG 4.4 IMAGE AFTER APPLYING A THRESHOLD VALUE
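A minimal sketch of the thresholding step, assuming a fixed global threshold (adaptive methods such as Otsu's are also common):

```python
import numpy as np

def binarize(gray, threshold=128):
    """Separate foreground from background: pixels at or above the
    threshold become white (255), the rest black (0)."""
    return np.where(gray >= threshold, 255, 0).astype(np.uint8)

# illustrative 2x3 grayscale patch with intensities in 0-255
gray = np.array([[10, 200, 130], [250, 40, 90]], dtype=np.uint8)
binary = binarize(gray)
```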


4.1.2 Segmentation:

The process of separating an image into different segments is known as segmentation.

Simplification and meaningful representation are the end goals of segmentation; typical uses are boundary and edge detection in images. More precisely, image segmentation assigns a label to each pixel so that similarly labelled pixels share a similar nature.

Region detection and edge detection are closely related: both depend on a sudden contrast in intensity at region boundaries, so edge detection can be seen as another route to segmentation.

Edge detection methods can be applied to the spatial-taxon region in the same manner as they would be applied to a silhouette. This is particularly useful when the disconnected edge is part of an illusory contour.

Character Extraction Algorithm[9]:

 Store information on visited pixels in a list
 Scan each row pixel by pixel
 Every time a black pixel is encountered, check for its presence in the traverse list; if the pixel is new, apply the edge-detecting algorithm, otherwise ignore the pixel and continue
 Once the algorithm returns a value, store it in the traverse list
 Perform recurring analysis
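The traversal above can be sketched as follows; here a simple 4-connected flood fill stands in for the edge-detecting step, whose details the algorithm leaves open:

```python
import numpy as np

def extract_components(binary):
    """Scan row by row; each unvisited black pixel (value 0) seeds a new
    component, whose pixels are collected and added to the visited set."""
    visited = set()
    components = []
    rows, cols = binary.shape
    for r in range(rows):
        for c in range(cols):
            if binary[r, c] == 0 and (r, c) not in visited:
                stack, comp = [(r, c)], []
                while stack:
                    y, x = stack.pop()
                    if (y, x) in visited:
                        continue
                    visited.add((y, x))
                    comp.append((y, x))
                    # visit the 4-connected black neighbours
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < rows and 0 <= nx < cols and binary[ny, nx] == 0:
                            stack.append((ny, nx))
                components.append(comp)
    return components

# two separate "characters" (black strokes) on a white (255) background
img = np.full((5, 7), 255, dtype=np.uint8)
img[1:4, 1] = 0      # first stroke
img[1:4, 5] = 0      # second stroke
comps = extract_components(img)
```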


4.1.3 Feature Extraction:

Different techniques suit different efficiency requirements while training a neural network:

 Character-geometry-based feature extraction
 Gradient-feature-based feature extraction

Character-geometry-based feature extraction:

Character-geometry-based feature extraction estimates positional features and the different types of line formations in a dataset.

Zoning Algorithm:

 The centroid of the input image is computed
 The image is partitioned into n zones of similar size
 The angle from the centroid to each pixel inside a zone is computed
 The average of all angles in the zone is one feature
 The centre of each zone adds two more features
 The above steps are repeated for all zones
 In the end there are three features per zone for classification and recognition
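A minimal sketch of the zoning steps above, assuming a square binary image divided into an n × n grid of zones:

```python
import numpy as np

def zoning_features(binary, n=2):
    """Divide the image into an n x n grid of zones. For each zone emit:
    the average angle from the image centroid to its foreground pixels,
    plus the two coordinates of the zone centre (three features per zone)."""
    ys, xs = np.nonzero(binary)
    cy, cx = ys.mean(), xs.mean()          # image centroid
    h, w = binary.shape
    zh, zw = h // n, w // n
    features = []
    for i in range(n):
        for j in range(n):
            zone = binary[i*zh:(i+1)*zh, j*zw:(j+1)*zw]
            zy, zx = np.nonzero(zone)
            if len(zy):
                angles = np.arctan2(zy + i*zh - cy, zx + j*zw - cx)
                features.append(angles.mean())
            else:
                features.append(0.0)        # empty zone
            features.append(i*zh + zh/2)    # zone centre row
            features.append(j*zw + zw/2)    # zone centre column
    return features

# tiny illustrative character image with two foreground pixels
img = np.zeros((4, 4), dtype=np.uint8)
img[0, 0] = img[3, 3] = 1
feats = zoning_features(img, n=2)
```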

Gradient-feature-based feature extraction:

A vector is a quantity that has both magnitude and direction, and the gradient is a vector quantity. The gradient operator generates a two-dimensional vector field containing, at each point, the direction of steepest increase in intensity, with a magnitude proportional to the rate of change in that direction.
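A minimal sketch of the gradient operator using central differences (practical systems often use Sobel or similar kernels):

```python
import numpy as np

def gradient_map(gray):
    """Central-difference approximation of the image gradient: returns
    magnitude (rate of change) and direction (angle of steepest increase)."""
    g = gray.astype(float)
    gy, gx = np.gradient(g)          # derivatives along rows and columns
    magnitude = np.hypot(gx, gy)
    direction = np.arctan2(gy, gx)
    return magnitude, direction

# a vertical step edge: intensity rises from 0 to 100 left to right
img = np.array([[0, 0, 100, 100]] * 3)
mag, ang = gradient_map(img)
```

At the edge the magnitude peaks and the direction points along the intensity increase (angle 0, i.e. to the right).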


Fig 4.5 GRADIENT FEATURE EXTRACTION

4.1.4 Classification:

Convolutional Neural Network

A neural network is made up of multiple nodes known as neurons; neurons grouped together are referred to as layers, and multiple layers are interconnected. Each perceptron performs a simple calculation whose result is transmitted to all the nodes it is connected to.

Fig 4.6 ARTIFICIAL NEURAL NETWORK


Convolution means “to combine”; a convolutional neural network therefore combines many small attributes into one in each iteration. Such networks are highly popular in computer vision and its applications.

The convolution layer is the first layer. The image input to it is always in matrix format. The software selects a smaller matrix within the original matrix; this is called a neuron (filter, kernel). The filter produces the convolution: each filter value is multiplied by the pixel value beneath it, and the products are summed to obtain a single number. The filter then moves one position to the right and performs the same operation. After the filter has passed through all positions, a smaller matrix is obtained.

Fig 4.7 ‘CONVOLVED’ INPUT
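The sliding multiply-and-sum described above can be sketched as follows (a plain valid-mode operation, without padding or strides):

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide the kernel over the image; at each position multiply the
    overlapping values elementwise and sum them into one output number."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

image = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=float)
kernel = np.ones((2, 2))            # a simple 2x2 summing filter
result = convolve2d(image, kernel)  # a smaller, 'convolved' 2x2 matrix
```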


After each convolution a new nonlinear layer is added; the nonlinearity comes from the activation function, without which the network would not be expressive enough to model the response variable. The nonlinear layer is followed by a pooling layer, which performs downscaling along the width and height, diminishing the image volume. A fully connected layer is attached after the convolutions are completed, and it processes the output of the convolution layers.


CHAPTER 5

REQUIREMENT SPECIFICATION


5. REQUIREMENT SPECIFICATION

5.1 Hardware and Software Requirements:

Windows              Windows 8 or higher, 64-bit, with Anaconda 5.3 and Python 3.7

Processor            Intel Core i5 or higher

RAM                  4 GB

Disk Space           700 MB for the Anaconda package installer and another 2 GB for datasets and libraries

Graphics Adapter     8-bit graphics adapter and display (for 256 simultaneous colours)

CD-ROM Drive         For installation through CD

5.2 Low Level Specification:

 Microsoft Windows supported graphics accelerator card, scanner
 Microsoft Word (Office 2010 or above)
 Internet connection for research and libraries


CHAPTER 6

IMPLEMENTATION


6. IMPLEMENTATION

The implementation is broadly classified into four major modules: input pre-processing, character recognition, conversion and the user interface. The system is developed using Python libraries such as scikit-image (an image processing library that facilitates feature extraction and noise removal), NumPy, OpenCV, Django, SciPy, Keras and Matplotlib.

6.1 Input Pre-processing


Pre-processing converts raw data into clean, readable data. Initially the dataset is collected from different sources; this raw data is not directly useful for analysis, so the pre-processing stage converts it into a specified format to ensure better accuracy.

The greyscale image given to the system is converted into a binary image; this process is called binarization, and a thresholding algorithm is used for it. Image pixels have intensities ranging from 0 to 255, and a greyscale image has a single channel indicating brightness, whereas a binarized image has only the values 0 and 1. The conversion of raw data into clean readable data thus includes stages such as filtering, contrast normalization, skew detection and skew correction. Filtering removes unwanted data so that focus falls only on the part to be recognized; the contrast of the image is adjusted automatically, and the angle of the image is detected and corrected when the system cannot read it.

The text lines may not be perfectly aligned and a few characters might overlap each other, so the Hough transform is applied on inverted images.
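The skew-correction idea can be sketched without a full Hough transform by fitting a straight line to the foreground pixels of a single text line and reading off its angle; this least-squares stand-in is an illustrative assumption, while the implementation itself uses the Hough transform on inverted images:

```python
import numpy as np

def estimate_skew_degrees(binary):
    """Fit a straight line (least squares) through the foreground pixels
    and return its angle in degrees; the page can then be rotated by the
    negative of this angle to correct the skew."""
    ys, xs = np.nonzero(binary)
    slope = np.polyfit(xs, ys, 1)[0]
    return float(np.degrees(np.arctan(slope)))

# a "text line" drifting one pixel down per pixel across: 45 degrees of skew
img = np.zeros((10, 10), dtype=np.uint8)
for i in range(10):
    img[i, i] = 1
angle = estimate_skew_degrees(img)
```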

The following script creates an MNIST-like dataset image in which each row holds the characters of one class. rootdir is the source of the image data used to create the rows, and no_in_line specifies the number of characters to put in each line.

import os
import sys
import glob
from PIL import Image

# rootdir and no_in_line are assumed to be passed on the command line
rootdir = sys.argv[1]
no_in_line = sys.argv[2]

ori = os.getcwd()
len_dirs = len(os.listdir(rootdir))
final_img = Image.new("L", (52*int(no_in_line), 52*len_dirs))
y_offset = 0

for root, dirs, files in os.walk(rootdir, topdown=False):
    dirs.sort(key=lambda f: int(''.join(filter(str.isdigit, f))))
    for name in dirs:
        if not os.path.isdir(os.path.join(rootdir, name)):
            continue
        dir3 = os.path.join(root, name)
        os.chdir(os.path.abspath(dir3))
        flist = glob.glob('*.jpg')
        flist.sort(key=lambda f: int(''.join(filter(str.isdigit, f))))
        flist = flist[:int(no_in_line)]   # keep only no_in_line images per row
        lines = Image.new("L", (52*int(no_in_line), 52))
        x_offset = 0
        for im in flist:
            img = Image.open(im)
            lines.paste(img, (x_offset, 0))
            x_offset += 52
        final_img.paste(lines, (0, y_offset))
        y_offset += 52
        os.chdir(ori)

final_img.save(str(rootdir) + '.jpg')


import os
import sys
import numpy
import scipy.misc
from PIL import Image

image = sys.argv[1]

# get the file name without extension
filename = os.path.splitext(image)[0]

# main formula: http://i.imgur.com/wA7gEks.png
# initialize parameters (values chosen after numerous iterations)
s = 1
lmda = 10
epsilon = 0.0001

# open the image as an array of pixels
X = numpy.array(Image.open(image))

# average pixel value of the original image
X_average = numpy.mean(X)

# centre the pixel values
X = X - X_average

# calculate the overall contrast of the image
contrast = numpy.sqrt(lmda + numpy.mean(X**2))

# normalize
X = s * X / max(contrast, epsilon)

# save the image
scipy.misc.imsave(filename + '_contrast.png', X)


After the conversion of raw data into clean data, the text is divided into rows, words and characters. This process of dividing the text document into lines, words and characters is called segmentation; after the image is pre-processed it is divided into several segments, which drive the character recognition process. Dividing the document into lines is line segmentation, dividing the lines into words is word segmentation, and dividing the words into single characters is character segmentation. The steps are:
 Divide the text into rows
 Divide the rows into words
 Divide the words into characters

To improve accuracy, augmentation is also performed. The augmentation steps considered are:
 Aspect ratio
 Rotation
 Padding
 Noise
 Resizing

Fig 6.1.1 Augmentation
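The line-segmentation step described above can be sketched with a horizontal projection profile; this is an illustrative approach, and the implementation's segment_sentence helper may work differently:

```python
import numpy as np

def segment_lines(binary):
    """Split a binary image (foreground = 1) into lines of text using the
    horizontal projection profile: blank rows mark the line boundaries."""
    profile = binary.sum(axis=1)
    lines, start = [], None
    for r, count in enumerate(profile):
        if count > 0 and start is None:
            start = r                       # a text line begins
        elif count == 0 and start is not None:
            lines.append(binary[start:r])   # a text line ends
            start = None
    if start is not None:
        lines.append(binary[start:])
    return lines

# two "text lines" separated by a blank row
img = np.zeros((5, 6), dtype=np.uint8)
img[0:2, 1:5] = 1
img[3:5, 1:5] = 1
lines = segment_lines(img)
```

The same idea applied to vertical profiles within a line yields word and character segmentation.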


Code:
import os
import ntpath
import cv2
# segment_sentence, segment_word and segment_character are helper
# functions defined elsewhere in the project

def segment(file):
    rootdir = 'web_app/hwrkannada/hwrapp/static/hwrapp/images/Processed_' + \
        os.path.splitext(ntpath.basename(file))[0]
    # Generate directory name to store segmented images
    directory = rootdir + '/Segmented_' + \
        os.path.splitext(ntpath.basename(file))[0]

    # Check if the subfolder already exists. If it doesn't, create it
    if not os.path.exists(directory):
        os.makedirs(directory)

    # Read the image as a numpy array
    image = cv2.imread(file)

    # Get sentences as separate images
    sentences = segment_sentence(image, directory)

    for i in range(0, len(sentences)):
        # Get words as separate images
        words = segment_word(sentences[i], directory, i)

        for j in range(0, len(words)):
            # Get characters as separate images
            characters, ottaksharas = segment_character(words[j], directory)

            for key in characters:
                imageName = str(i+1).zfill(2) + '-' + str(j+1).zfill(2) + \
                    '-' + str(key+1).zfill(2) + '-0' + '.png'


6.2 CHARACTER RECOGNITION

A neural network is built and trained on the datasets, using convolutional and recurrent neural network layers. A neural network is made up of multiple nodes known as neurons; neurons grouped together are referred to as layers, and multiple layers are interconnected. Each perceptron performs a simple calculation whose result is transmitted to all the nodes it is connected to.

The image input is always in matrix format. The software selects a smaller matrix within the original matrix, called a neuron (filter, kernel). The filter produces the convolution: each filter value is multiplied by the pixel value beneath it and the products are summed to obtain a single number; the filter then moves one position to the right and repeats the operation, and after it has passed through all positions a smaller matrix is obtained.

The convolution layers are trained to extract the relevant features; this is the feature extraction step. After each convolution a new nonlinear layer is added, the nonlinearity coming from the activation function, without which the network would not be expressive enough to model the response variable. The nonlinear layer is followed by a pooling layer, which downscales along the width and height and diminishes the image volume. A fully connected layer is attached after the convolutions are completed, and it processes the output of the convolution layers.

Code:

import os
import sys
import cv2
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dropout, Dense
from keras.optimizers import Adam
from keras.callbacks import ModelCheckpoint
# LRN2D (local response normalisation) is not part of core Keras;
# it is a contributed layer defined elsewhere in the project

def get_image_size():
    img = cv2.imread(os.path.join(directory, '1', '1.jpg'), 0)
    return img.shape

def get_num_of_classes():
    return len(os.listdir(directory))

directory = sys.argv[1]
image_x, image_y = get_image_size()

def cnn_model():
    num_of_classes = get_num_of_classes()
    model = Sequential()
    model.add(Conv2D(52, (5, 5), input_shape=(image_x, image_y, 1),
                     activation='tanh'))
    model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
    model.add(LRN2D(alpha=0.1, beta=0.75))
    model.add(Conv2D(64, (5, 5), activation='tanh'))
    model.add(MaxPooling2D(pool_size=(5, 5), strides=(5, 5)))
    model.add(Flatten())
    model.add(Dropout(0.5))
    model.add(Dense(num_of_classes, activation='softmax'))
    model.compile(loss='categorical_crossentropy',
                  optimizer=Adam(lr=0.001, beta_1=0.9, beta_2=0.999,
                                 epsilon=1e-08, decay=0.0),
                  metrics=['accuracy'])
    filepath = "cnn_model.h5"
    checkpoint1 = ModelCheckpoint(filepath, monitor='val_acc', verbose=1,
                                  save_best_only=True, mode='max')
    callbacks_list = [checkpoint1]
    return model, callbacks_list

import pickle
import numpy as np
import matplotlib.pyplot as plt
from keras import backend as K
from keras.utils import np_utils
import load_images   # project module that pickles the image dataset

def train():
    with open("train_images", "rb") as f:
        train_images = np.array(pickle.load(f))
    with open("train_labels", "rb") as f:
        train_labels = np.array(pickle.load(f), dtype=np.int32)
    with open("test_images", "rb") as f:
        test_images = np.array(pickle.load(f))
    with open("test_labels", "rb") as f:
        test_labels = np.array(pickle.load(f), dtype=np.int32)

    train_images = np.reshape(train_images,
                              (train_images.shape[0], image_x, image_y, 1))
    test_images = np.reshape(test_images,
                             (test_images.shape[0], image_x, image_y, 1))
    train_labels = np_utils.to_categorical(train_labels)
    test_labels = np_utils.to_categorical(test_labels)

    model, callbacks_list = cnn_model()
    history = model.fit(train_images, train_labels,
                        validation_data=(test_images, test_labels),
                        epochs=100, batch_size=100, callbacks=callbacks_list)
    scores = model.evaluate(test_images, test_labels, verbose=0)
    print("CNN Error: %.2f%%" % (100 - scores[1]*100))

    # summarize history for accuracy
    plt.plot(history.history['acc'])
    plt.plot(history.history['val_acc'])
    plt.title('model accuracy')
    plt.ylabel('accuracy')
    plt.xlabel('epoch')
    plt.legend(['train', 'test'], loc='upper left')
    plt.savefig('acc.png')
    plt.clf()

    # summarize history for loss
    plt.plot(history.history['loss'])
    plt.plot(history.history['val_loss'])
    plt.title('model loss')
    plt.ylabel('loss')
    plt.xlabel('epoch')
    plt.legend(['train', 'test'], loc='upper left')
    plt.savefig('loss.png')

# create the required dataset type from the images
load_images.create_pickle(directory)
train()
K.clear_session()

6.3 Conversion

After the characters are recognized they have to be converted into English. Each character image has a label attached to it, so images with the same label are grouped together and the grouped image is converted into English. The model has to be trained, so a loss must be calculated: for each line, word or character a corresponding character is specified and the model is trained accordingly. The drawback is that this is time-consuming. Duplicate characters then have to be removed in order to obtain an accurate, proper conversion of the word.

Fig 6.3.1 CTC Loss

The loss calculation is shown in the figure above; a function such as ctc_loss is used in this process.
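The duplicate-removal step corresponds to CTC best-path decoding, which can be sketched as follows (the blank symbol and alphabet here are illustrative):

```python
def ctc_best_path_decode(frame_labels, blank='-'):
    """Collapse a per-frame label sequence the CTC way: merge repeated
    labels, then drop the blank symbol."""
    decoded = []
    prev = None
    for label in frame_labels:
        if label != prev and label != blank:
            decoded.append(label)
        prev = label
    return ''.join(decoded)

# frames predicted for a word: repeats and blanks collapse to "cat"
print(ctc_best_path_decode(['c', 'c', '-', 'a', 'a', '-', 't']))  # → cat
```

Note that a blank between two identical labels preserves genuine doubled characters, as CTC intends.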


Code:
img = cv.imread(r'final.jpg', 0)
#gray = cv.cvtColor(img,cv.COLOR_BGR2GRAY)
# Now we split the image to 50x10 cells, each 52x52 size
cells = [np.hsplit(row, 50) for row in np.vsplit(img, 10)]
# Make it into a Numpy array.
x = np.array(cells)
# Now we prepare train_data and test_data.
train = x[:, :25].reshape(-1, 2704).astype(np.float32)
test = x[:, 25:50].reshape(-1, 2704).astype(np.float32)

# Create labels for train and test data


k = np.arange(10) # 10 because 10 number to classify
train_labels = np.repeat(k, 25)[:, np.newaxis]
test_labels = train_labels.copy()

# Initiate kNN, train the data, then test it with test data for k=1
knn = cv.ml.KNearest_create()
knn.train(train, cv.ml.ROW_SAMPLE, train_labels)

ret, result, neighbours, dist = knn.findNearest(test, k=5)

# Now we check the accuracy of classification


# For that, compare the result with test_labels and check which are wrong

matches = result == test_labels


correct = np.count_nonzero(matches)
accuracy = correct*100.0/result.size
print(accuracy)

DEPT OF ISE, JSSATEB 51


NATIVE LANGUAGE TO ENGLISH TRANSLATOR USING IMAGE PROCESSING

# Store the training arrays for later reuse
np.savez('knn_data.npz', train=train, train_labels=train_labels)

# Load the data back
with np.load('knn_data.npz') as data:
    print(data.files)
    train = data['train']
    train_labels = data['train_labels']
print(train)
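Once the arrays are reloaded, the classifier can be rebuilt without retraining from images. As an illustrative stand-in for `cv.ml.KNearest`, a 1-nearest-neighbour lookup in plain NumPy looks like this (the toy 2-D vectors stand in for the 2704-dimensional cell vectors):

```python
import numpy as np

def nearest_label(sample, train, train_labels):
    """Return the label of the training vector closest (L2 distance) to `sample`."""
    d = np.linalg.norm(train - sample, axis=1)   # distance to every training vector
    return int(train_labels[np.argmin(d), 0])    # label of the closest one

# Toy 2-D vectors standing in for the 2704-dimensional cell vectors.
train = np.array([[0.0, 0.0], [10.0, 10.0]], dtype=np.float32)
train_labels = np.array([[0], [1]])
print(nearest_label(np.array([9.0, 9.5], dtype=np.float32), train, train_labels))  # 1
```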


CHAPTER 7

TESTING


7. SYSTEM TESTING
Once the different units of a system are integrated, the system has to be tested to make sure it is in compliance with the requirements. When different parts of a system are put together, unexpected behaviour can emerge, and this testing is carried out to make sure that does not happen.
System testing generally falls under black-box testing, a type of testing carried out without knowledge of the details of the system's inner workings.

TYPES OF TESTING
There are various types of system testing, a few are mentioned below:
1. Usability Testing – This is concerned with making the system user-friendly and flexible, even for users who do not yet have a clear idea of how it works.

2. Load Testing – The load applied during the initial stages is only enough to check the system against the specified requirements, which is not sufficient. This testing is performed to make sure the system holds up under real-life loads.

3. Regression Testing – Over the software life cycle, many changes are made to the system, and these can affect it in unexpected ways. Regression testing is done to make sure none of these changes have introduced bugs, and to catch old bugs that recur.

4. Recovery Testing – Recovery testing is done to vouch for the reliability and trustworthiness of the system, as well as its ability to recover from future crashes.

5. Migration Testing – Sometimes the early stages of a system are developed on old infrastructure, and by the time development completes, newer versions are available. Migration testing is done to make sure the system works equally well on the newer infrastructure.


6. Functional Testing – This requires going through the whole system to check whether any important functions are missing. Additional functionality may also be added to enhance the system's features.

7. Hardware/Software Testing – The tester has to make sure the software and the hardware work well together. As new components are added, care must be taken that they do not create obstacles to the compatibility between the hardware and the software.

TEST PLAN
We have employed a few test cases to ensure the proper working of our project. They are as follows:
TESTCASE 1
Testcase Description:
The input image to be scanned must stay within a size range; it cannot be smaller than 2 KB.
Testcase Input:
An image smaller than 2 KB is given as input.
Expected Result:
The input should be rejected, as it does not comply with the specified range.
Actual Result:
The system produces distorted images, as the input does not contain enough data to be read.
Remark:
PASS


TESTCASE 2
Testcase Description:
The input image to be scanned must stay within a size range; it cannot be larger than 10 MB.
Testcase Input:
An image larger than 10 MB is given as input.
Expected Result:
The input should be rejected, as it does not comply with the specified range.
Actual Result:
The system produces distorted images, as inputs that are too large prevent the curves of the letters from being identified.
Remark:
PASS

Fig7.1 Input not in specified range is scanned


Fig 7.2 Input not in specified range gives a distorted output

The input produces a distorted output right after the first pre-processing step, which leads to a wrong prediction by the model due to a feature mismatch.

TESTCASE 3
Testcase Description:
Only an image lying within the specified range, 2 KB < image < 10 MB, will be processed further, as the model is trained that way.
Testcase Input:
An image within the specified range of 2 KB to 10 MB is given as input.
Expected Result:
The image should be accepted, segmented, and translated successfully.
Actual Result:
The system segments and translates the image successfully.
Remark: PASS
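The size-range rule exercised by these three test cases can be sketched as a simple guard. The function and constant names below are illustrative, not taken from the project code:

```python
import os
import tempfile

MIN_BYTES = 2 * 1024          # 2 KB lower bound from the test plan
MAX_BYTES = 10 * 1024 * 1024  # 10 MB upper bound

def image_in_range(path):
    """Accept an input image only if its file size lies within [2 KB, 10 MB]."""
    return MIN_BYTES <= os.path.getsize(path) <= MAX_BYTES

# Demo: a 1 KB file (test case 1's situation) is rejected.
with tempfile.NamedTemporaryFile(suffix=".jpg", delete=False) as f:
    f.write(b"\x00" * 1024)
print(image_in_range(f.name))  # False
```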


Fig 7.3 correct input image ready for analysis

Fig 7.4 Input in specified range gets successfully segmented


Fig 7.5 Input in specified range gets successfully translated


CHAPTER 8

CONCLUSION


Conclusion

Communication in this fast-paced world has become as necessary as shelter is for survival. The language barrier only makes it worse: with so many languages in the world, 22 scheduled languages in India alone, it is hard even for a multilingual person to manage. This application helps people who do not know Kannada gain easy access to the language without having to go around asking people, which would again put them in a loop of language barriers. The project uses a Convolutional Neural Network model to efficiently and accurately recognise and classify Kannada script from input images, and translates the recognised words into meaningful English words.

We hope that future work in the field of Kannada OCR will produce an app compatible with all major operating systems, making this application usable on mobile phones, as well as a dynamic translator built on translation APIs. Given the time and money needed to acquire sufficiently large datasets, this application could be extended to recognise any Kannada word in any Kannada script.


CHAPTER 9

REFERENCES


9. REFERENCES
[1] "Review of Automatic Handwritten Kannada Character Recognition Technique Using Neural Network", IEEE, 2017.

[2] Rituraj Kunwar, P. Mohan, K. Shashikiran, A. G. Ramkrishnan, "Unrestricted Kannada Online Handwritten Akshara Recognition using SDTW", International Conference on Signal Processing and Communications (SPCOM), IEEE, 2010.

[3] Basappa B. Kodada, Shivakumar K. M., "Unconstrained Handwritten Kannada Numeral Recognition", International Journal of Information and Electronics Engineering, 2013.

[4] K. Indira, S. Sethu Selvi, "Kannada Character Recognition System", InterJRI Science and Technology, 2010.

[5] T. V. Ashwin, P. S. Sastry, "A font and size-independent OCR system for printed Kannada documents using support vector machines", Department of Electrical Engineering, Indian Institute of Science, Sadhana, 2002.

[6] Kumar B. Y., Keshava Prasanna, Savitha, "A Novel Approach on Offline Kannada and English Handwritten Words", International Journal of Scientific Engineering and Applied Science.

[7] Sunitha Anne, M. O. Chacko, Ansu Joseph, Jeena Joji Anchanattu, Sreelakshmi S., "Offline Handwritten Character Recognition in South Indian Scripts: A Broad Visualization", International Journal of Computer Science and Information Technologies.

[8] www.felixniklas.com/imageprocessing/binarization

[9] https://www.slideshare.net/HWRformat=40/gcyr

[10] https://ieeexplore.ieee.org/document/5699408


CHAPTER 10

SNAPSHOTS


10. SNAPSHOTS

Fig 10.1

Fig 10.2 Line Segmentation


Fig 10.3 Word Segmentation

Fig 10.4 Character Segmentation


Fig 10.5 Character Augmentation

Fig 10.6 Character Recognition


Fig 10.7 Conversion

