BACHELOR OF ENGINEERING
IN
INFORMATION SCIENCE AND ENGINEERING
For the Academic Year 2018-2019
Submitted by
Arjun Raja Y [1JS15IS010]
Ashika N B [1JS15IS012]
Chaitra Kulkarni [1JS15IS019]
Aditya Abhishek [1JS14IS001]
2018-2019
DEPARTMENT OF INFORMATION SCIENCE AND ENGINEERING
JSS ACADEMY OF TECHNICAL EDUCATION
JSS MAHAVIDYAPEETHA, MYSURU
JSS ACADEMY OF TECHNICAL EDUCATION
JSS Campus, Dr. Vishnuvardhan Road, Bengaluru-560060
CERTIFICATE
This is to certify that the Project Work Phase II entitled “Native language to English
translator using Image Processing” is a bona fide work carried out by Arjun Raja
Y [1JS15IS010], Ashika N B [1JS15IS012], Chaitra Kulkarni [1JS15IS019], and Aditya
Abhishek [1JS14IS001] in partial fulfilment of the requirements for the award of the degree
of Bachelor of Engineering in Information Science and Engineering of Visvesvaraya
Technological University, Belagavi, during the year 2018-2019.
First and foremost, we would like to thank His Holiness Jagadguru Sri
Shivarathri Deshikendra Mahaswamiji and Dr. Mrityunjaya V Latte,
Principal, JSSATE, Bangalore, for providing the opportunity to carry out the
Project Phase-I + Seminar (15ISP78) as a part of our curriculum in partial
fulfilment of the degree course.
We express our sincere gratitude to our beloved Head of the Department,
Dr. Dayananda P, for his cooperation and encouragement at every stage of
our work.
We would also like to thank all the teaching and non-teaching staff of the
ISE department for their valuable guidance and for being there at all
stages of our work.
1. Introduction
1.1 Overview
2. Literature Survey
3.1 OBJECTIVES
4. Methodology
4.1 Methodology
4.1.1 Binarization
4.1.2 Segmentation
4.1.4 Classification
5. Requirement Specification
6. Implementation
6.3 Conversion
7. Testing
8. Conclusion
9. References
10. Snapshots
Publication Details
Plagiarism Report
LIST OF FIGURES
Communication has come a long way since ancient times; what remains constant,
though, is the language barrier. If we gain control over this aspect, exchanging information,
be it for educational purposes or for convenience, becomes simpler. As we know, image
processing and its management are the present. The proposed system provides automated
processing of images containing Kannada script. An efficient pre-processing strategy is presented
for extracting features from Kannada characters, which are then translated into English using
neural networks. To overcome the barrier of a non-existent standardized script dataset,
augmentation of the available script samples is employed. The Kannada-to-English converter
application helps those unfamiliar with either Kannada or English with the help of a dictionary
that provides the English meaning for the processed Kannada images. It serves a wide range of
people, from the less privileged to tourists, easing everyday tasks.
NATIVE LANGUAGE TO ENGLISH TRANSLATOR USING IMAGE PROCESSING
CHAPTER 1
INTRODUCTION
1. INTRODUCTION
1.1 OVERVIEW
India is getting more digitized day by day. From business transactions to food
deliveries, everything has found its place online. It is becoming necessary to provide a
channel for these activities to run smoothly. Just as there is an abundance of handwritten
English recognition software, we need the same for the native languages as well, so that the
culture can be preserved without hindering routine activities.
Written Indian scripts are difficult for machines to understand because they consist of
a large number of characters, including numbers, vowels, modifiers and consonants. There has
been immense advancement in the recognition systems created for native languages, but these
come with their limitations: they have either been built for a subset of characters or for base
characters only, ignoring modifiers of any kind. While these were created with their
applicability in mind, there is no system that recognizes the language as a whole. The idea is
to create a means to convert Kannada to English and help people learn and use both better.
Kannada:
Kannada is the official language of Karnataka, a state in the south of India. The
language is used by about 49 million people in Karnataka and is a well-known Dravidian
language.
Kannada has its own script, derived from the Brahmi script. This script contains a
root set of 52 characters, comprising 16 vowels and 36 consonants. Furthermore, there are
certain tokens used to modify these root characters, known as vowel and consonant modifiers;
the script has the same count of modifiers as root characters. There is also another set of
characters, known as aksharas, built by binding tokens together, with the consonants,
consonant modifiers and vowel modifiers selected according to a set of rules.
CHAPTER 2
LITERATURE SURVEY
2. LITERATURE SURVEY
2.1 PAPER-TITLE:
ABSTRACT:
In this paper, automatic processing of forms written in the Kannada language is considered.
The author proposes Principal Component Analysis (PCA) and Histogram of Oriented Gradients
(HoG) methods for feature extraction. The features are then classified using a feed-forward,
back-propagating ANN. Only 57 attributes are taken as unique classes. Performance
based on the two features is compared for multiple classes. The author arrives at the conclusion
that HoG has better accuracy than PCA as the number of classes increases. Due to the huge
attribute set of the Kannada script, a reduction in recognition accuracy along with an
increase in computational cost is noticed. The author proposes a method to reduce this problem
by reducing the symbol set, where the vowel modifiers (kagunitha) and
consonant modifiers (vattakshara) are considered as separate classes. The Devanagari script has
characteristics similar to the Kannada script, such as vowel modifiers and consonant conjuncts.
Only small subsets of the compound attributes (upper or lower) are considered for recognition.
Many Arabic letters also share common primary shapes, which differ only in the number of
dots and whether the dots are above or below the primary shape. Fourier descriptors, chain
codes and different shape-based features are used for the recognition of handwritten Kannada
characters. Recognition is carried out with a Support Vector Machine, and an accuracy of 95%
is obtained. A brief survey on offline recognition of the Devanagari script is given in [10].
Classifier-based feature extraction is compared in a survey: gradient and PCA based features
with PCA, SVM and neural network classifiers are found to have better recognition accuracy.
The multilayer perceptron (MLP) is also discussed; MLP is used for recognition of mixed
numerals for three scripts, namely Devanagari, Bangla and English. PCA is used to reduce the
dimension of the feature vector. It is found that ridgelet features offered more promising
results than PCA; an accuracy of 87.24% is achieved when SVM is used as the classifier. The
literature records few papers on Kannada HWR. Kannada handwriting recognition and
automatic form processing is considered; HoG and PCA are used for feature extraction, and
feature performance is compared for 57 attributes.
Handwritten character recognition of Kannada characters is a very challenging task because
of the large dataset, the shape similarity among characters and the non-uniqueness in the
representation of diacritics.
METHODOLOGY:
FORM PROCESSING METHODOLOGY:
The automatic form processing system involves image acquisition using a scanner,
pre-processing of the scanned form, extraction of only the handwritten characters, handwritten
character segmentation, character recognition and storage. The template of the birth certificate
is created with all the required data fields. The applicant is then instructed to fill the form
in the Kannada language, with all the base characters in the upper box and conjuncts in the
lower box.
RESULTS:
The segmented characters are used for feature extraction using PCA and HoG. The performance
of these features for different numbers of classes is compared using a neural network with
back-propagation learning. 100 samples per class are used for recognition. Not all characters
in the Kannada script are considered in this paper. Pre-processing helps identify exclusively
handwritten characters. Extraction of features is done by histogram of oriented gradients and
principal component analysis.
ABSTRACT:
This paper gives a solution for recognising online handwritten characters which can then be
applied in practical implementations. It takes into consideration the Indo-Arabic numbers,
punctuation, aksharas, as well as special tokens like #, $, etc. from Kannada. About 69
handwriting samples have been collected from people belonging to 4 different places to build
the dataset. It was found that using smoothed derivatives as features improves the performance
of a DTW classifier to about 88%, but the classifier proves too slow in practice.
As a solution to this, Statistical Dynamic Time Warping (SDTW) is employed, attaining
faster classification with comparable accuracy, fast enough for real-time implementations.
Further improvement can be expected where domain restrictions like language, post-processing
and vocabulary can be utilised.
METHODOLOGY:
DYNAMIC TIME WARPING (DTW):
DTW is a strategy used to match patterns of equal or unequal length. If the patterns do not
have the same length, the matching is done using DTW. During the matching process, a cost
matrix is calculated that gives the points of reference that match in the considered patterns.
Consider two sequences R = (r1, r2, r3, ..., rJ) and T = (t1, t2, t3, ..., tK), where
rj, tk ∈ R^d. Let the warping path be φ = (φ(1), φ(2), φ(3), ..., φ(N)) with φ(n) = (j, k),
which gives the details of the alignment of pattern R to T.
1) Warping path:
a) The first and last points of pattern T are matched with the first and last points of
pattern R, respectively, i.e., φ(1) = (1, 1) and φ(N) = (J, K), where N is the total number
of points on the warping path.
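The cost-matrix computation behind DTW can be sketched as follows. This is a minimal illustration, not the paper's exact setup: the local distance is taken as the absolute difference between scalar samples, and the standard three-way recursion is assumed.

```python
def dtw_distance(R, T):
    """Minimal DTW sketch: D[j][k] is the least cumulative cost of
    aligning R[0..j] with T[0..k]; the warping path implicitly starts
    at (1, 1) and ends at (J, K), as the constraints above require."""
    J, K = len(R), len(T)
    INF = float("inf")
    D = [[INF] * K for _ in range(J)]
    for j in range(J):
        for k in range(K):
            cost = abs(R[j] - T[k])  # local distance (scalar samples assumed)
            if j == 0 and k == 0:
                D[j][k] = cost
                continue
            # best predecessor among the three allowed moves
            prev = min(D[j - 1][k] if j else INF,
                       D[j][k - 1] if k else INF,
                       D[j - 1][k - 1] if j and k else INF)
            D[j][k] = cost + prev
    return D[J - 1][K - 1]
```

For example, `dtw_distance([1, 2, 3], [1, 2, 2, 3])` is 0, because the repeated sample is absorbed by the warping path rather than penalised.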
RESULTS:
Although using the two different derivatives in DTW reduces the recognition speed by about
a second, it improves the precision by about one and four percent for the first and the second
estimate, respectively. If SDTW is used, the speed increases by forty-six times with a
negligible decrease in accuracy.
ABSTRACT:
There is no standard Kannada numeral database available at present. The dataset consists of
100 samples from different writers for input processing.
METHODOLOGY:
Feature extraction is the technique of deriving a set of values (features) from a dataset
that may be redundant. If the input data is too large to process and is redundant, it can be
transformed into a smaller set of features. A subset of features is selected, and the selected
features should be relevant with respect to the input data; using this data, the desired tasks
can be completed.
The input numeral image is of size 50x50, and a thinning algorithm is applied to it. The
image is divided into 25 equal parts, each forming a grid box; the pixel distance is computed
for each grid column, resulting in 10 features per grid.
Handwritten recognition uses a box-based system with distance calculation using vertical
projection.
Input: Dataset consisting of numerals written by different writers
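One plausible reading of the box-based vertical-projection scheme is sketched below: for each column of a grid box, the feature is the distance from the top of the box to the first foreground pixel. The exact distance definition used in the paper may differ; this is an illustrative assumption.

```python
def column_distance_features(box):
    # box: 2-D list of 0/1 pixels for one grid cell.  For every column,
    # record the distance from the top edge down to the first foreground
    # pixel; an empty column contributes the full box height.
    height, width = len(box), len(box[0])
    feats = []
    for c in range(width):
        dist = height
        for r in range(height):
            if box[r][c]:
                dist = r
                break
        feats.append(dist)
    return feats
```

A 10-column box therefore yields 10 features, matching the count quoted above.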
RESULTS:
GRNN (General Regression Neural Network) is the method used for the classification and
recognition of the dataset.
The recognition efficiency for handwritten Kannada numerals was found to be 98.8%.
ABSTRACT:
METHODOLOGY:
Binarization:
Binarization is the first process; it converts the grayscale image into a binary image by
highlighting the text present in the image and removing the background of the
image.
Skew Detection:
This method is used as a filtration step where all the unwanted noise is removed after
binarization; it is applied to each word, which is then sent for skew correction.
Skew Correction: The filtered images are sent for skew correction, where the page image is
rotated to correct the detected skew. This is fast for a small dataset but takes more time
for a larger one.
Segmentation:
It is the process of extracting the image of interest by dividing the data into smaller
segments, identifying the required text and converting it into something meaningful and
easier to understand.
CHARACTER CLASSIFICATION
After the extraction of the segmented vowel modifiers and characters, the feature vector of
each character is passed to a character classifier. There are many methods of character
classification, such as nearest-neighbour classifiers and neural networks. The dataset used is
divided into two sets, a training set and a test set, for the individual characters.
This is a supervised method of pattern recognition and is very efficient in terms of
performance. The training set consists of positive and negative training samples.
Back-propagation is a methodology employed to compute the gradient, which is vital for
updating the weights.
The network dealt with is a multilayer network having an input layer, one or more hidden
layers and an output layer. The back-propagation network is trained in batch mode using
supervised learning and the log-sigmoid activation function. To meet the requirements of
the activation function, the input is brought to a normalised range of 0 to 1 before training.
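A minimal sketch of such a network follows, with log-sigmoid activations throughout and weight updates computed by back-propagation. The layer sizes, learning rate and per-sample (rather than true batch) updates are illustrative assumptions, not the report's exact configuration.

```python
import math
import random

def logsig(x):
    # log-sigmoid activation; inputs are expected in a normalised range
    return 1.0 / (1.0 + math.exp(-x))

class MLP:
    """Input layer -> one hidden layer -> output layer."""
    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        self.W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.W2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
                   for _ in range(n_out)]
        self.b2 = [0.0] * n_out

    def forward(self, x):
        h = [logsig(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.W1, self.b1)]
        y = [logsig(sum(w * hi for w, hi in zip(row, h)) + b)
             for row, b in zip(self.W2, self.b2)]
        return h, y

    def train(self, X, Y, lr=0.5, epochs=200):
        for _ in range(epochs):
            for x, t in zip(X, Y):
                h, y = self.forward(x)
                # output-layer deltas for squared error with log-sigmoid
                dy = [(yi - ti) * yi * (1 - yi) for yi, ti in zip(y, t)]
                # hidden-layer deltas, back-propagated through W2
                dh = [hi * (1 - hi) *
                      sum(self.W2[o][j] * dy[o] for o in range(len(dy)))
                      for j, hi in enumerate(h)]
                for o in range(len(dy)):
                    for j in range(len(h)):
                        self.W2[o][j] -= lr * dy[o] * h[j]
                    self.b2[o] -= lr * dy[o]
                for j in range(len(dh)):
                    for i in range(len(x)):
                        self.W1[j][i] -= lr * dh[j] * x[i]
                    self.b1[j] -= lr * dh[j]
```

Training on a handful of normalised feature vectors drives the squared error down over the epochs; this sketch is small enough to verify that behaviour directly.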
The radial basis function network consists of three layers: the input, hidden and output
layers. The radial basis functions are centred on each of the training patterns; this allows
all the biases in the layer to be kept constant, which in turn depends on the Gaussian
spread.
The SVM classifier is popularly known as a two-class classifier based on discriminant
functions. The discriminant function in an SVM is represented by a surface that acts as the
separator between the patterns belonging to the two individual classes. For applications
having to do with OCR, we train two-class classifiers, each of which has a unique label
attached to it. The test sample is assigned to the label whose class gives the largest
positive output; if the test sample yields only negative outputs, it is rejected directly.
RESULTS:
On analysing the different aspects of the paper and carefully studying the methodology, we
come to the conclusion that the results reported in this paper were obtained by scanning
text samples from popularly available sources such as textbooks, documents and magazines.
When the text was taken from the same source and the text was found to be the same, the
patterns generated were ensured to be disjoint. For recognition of the text the SVM
classifier was used; since this classifier has two classes, it becomes complicated to deal
with problems involving more than two classes.
In order to improve the recognition accuracy at the segmentation stage of curved strokes,
a recognition-based segmentation methodology is used, which in turn uses a wide length of
ABSTRACT:
This paper presents an OCR application for documents in Kannada, a popular South Indian
language. The input to this application is a scanned text image. The expected output is a
file that is compatible with and editable by standard software. The input is passed to the
application, which extracts the readable words from the text image; binarization is then
carried out, followed by segmentation of the image into smaller segments. Feature extraction
is done, and the recognized text is classified using SVM classifiers. The recognition of the
text is not dependent on the fonts of the text, and the application is seen to deliver a
decent, expected performance.
METHODOLOGY:
The text image is scanned using a scanner at 300 DPI. After scanning, the image is
binarised to obtain a binary image from the pixel image. Binarization is followed by skew
detection, where unwanted noise is filtered and removed. Skew correction involves rotating
the image in such a way that the text inside it is in an understandable and easier
orientation. Skew correction is followed by segmentation, in which the words and lines are
separated into three vertical zones based on the horizontal projection of the active word.
After segmentation, the segments obtained are given as input to the recognizer in order to
recognize the word; feature extraction is then done, and the feature vector undergoes
classification using an SVM classifier. The output is obtained as an ASCII file and is
uploaded into the software for further processing.
SEGMENTATION PROCESS:
After the horizontal segmentation into three zones, the most critical and major portion of
the character is present in the middle zone; therefore it is necessary to segment the
middle zone first.
There may be many vowel modifiers and base consonants inside the middle and top zones;
the main aim is to separate the consonants and vowel modifiers present in the
middle zone. Because the middle zone is the crucial one, some of the consonants are
segmented into two or more parts. In order to achieve the segmentation, an
over-segment-and-merge approach is followed. Three different classifiers are used to classify
the three zoned segments. The classes obtained from the segments present in the two top
zones may involve vowel modifiers, vowels and consonants; hence the total count of the
number of classes is 67. Any sort of segment can be added to the training set, and the
results are obtained with a training set consisting of 2999 patterns.
RESULTS:
In order to process a document, a sequence of operations and steps is followed: a single
image of Kannada text is scanned using a flatbed scanner. The first stage of identifying the
text begins with skew correction. The window used for the transform has a size of
approximately one lakh pixels. Assuming roughly half-pixel occupancy in each scanned line
and a text height of about a hundred pixels per line, the window would correspond to nearly
ten lines, which should be enough to obtain a correct estimate of the skew. Segmentation is
based on the separation of the lines and not on any kind of page layout.
ABSTRACT:
One of the challenging and fascinating areas of research in the field of image processing is
the recognition of text in images from different kinds of sources. This not only facilitates
the communication process but also helps many tourists, and it has many other applications,
like recognizing and converting handwritten texts into electronic form; it can also be a
mode of reading for the blind. This paper focuses on the recognition of handwritten names
of Karnataka districts written in Kannada using classifiers.
The process starts with scanning the text image for input processing, followed by skew
detection (that is, noise filtering), skew correction and segmentation.
Feature extraction is the method of extracting features from the redundant dataset; a
feature vector is extracted so that recognition and classification are improved.
In order to extract the edges, an edge detection algorithm is used, and features are
extracted for better recognition and to train the classifiers.
METHODOLOGY:
The dataset consists of 1200 words in Kannada and English collected from different people.
The words and characters collected from the writers differ in font size and shape.
The database consists of the names of all the districts in Karnataka and around 20 words in
English.
The recognition process involves these steps:
a. Input Processing
Input pre-processing involves binarisation, skew detection, skew correction, segmentation
and thinning.
EDGE DETECTION PROCESS:
It is the process of detecting the sharp edges and boundaries of the objects inside images.
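The paper does not name a specific edge-detection algorithm, so a common choice, the Sobel operator, is sketched below as an assumption: gradient magnitude above a threshold marks an edge pixel.

```python
def sobel_edges(img, thresh):
    # img: 2-D list of grayscale intensities.  Returns a same-sized 0/1
    # map marking pixels whose Sobel gradient magnitude exceeds thresh
    # (border pixels are left as 0 for simplicity).
    KX = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]   # horizontal derivative
    KY = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]   # vertical derivative
    H, W = len(img), len(img[0])
    out = [[0] * W for _ in range(H)]
    for r in range(1, H - 1):
        for c in range(1, W - 1):
            gx = sum(KX[i][j] * img[r - 1 + i][c - 1 + j]
                     for i in range(3) for j in range(3))
            gy = sum(KY[i][j] * img[r - 1 + i][c - 1 + j]
                     for i in range(3) for j in range(3))
            if (gx * gx + gy * gy) ** 0.5 > thresh:
                out[r][c] = 1
    return out
```

On a simple vertical step in intensity, the edge map lights up exactly along the step, which is the behaviour the recognition pipeline relies on.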
FEATURE EXTRACTION:
Feature extraction is the process of extracting features from the redundant dataset.
If the features are extracted from a larger dataset, it is simplified into smaller fragments
from which the features are extracted. Feature extraction increases the recognition accuracy.
The feature extraction method is selected on the basis of the application; this paper focuses
on a postal application to recognize the district names of Karnataka. The features involve
loops, matras and certain corner points.
CLASSIFICATION PROCESS:
In this stage, a label is assigned to the text images after extracting the features and
establishing a relationship between them. The classifiers include neural networks and
dynamic time warping.
RESULTS:
The dataset consists of handwritten text images from 60 writers, so that each handwriting is
uniquely recognized. There are 30 names of districts of Karnataka and a total of 20
English words from all 60 writers.
So the database includes 1200 English and Kannada words. A recognition accuracy of 92% is
obtained.
ABSTRACT:
Handwritten character recognition is one of the challenging and fascinating research
concepts in pattern recognition. There are many applications for handwriting recognition in
English, Japanese and other languages, but they are limited when it comes to Indian
languages. This paper focuses on handwritten character recognition of South Indian
languages like Kannada, Tamil, Telugu and Malayalam.
METHODOLOGY:
There are two ways of performing handwritten character recognition: offline handwritten
character recognition and online handwritten character recognition. Online character
recognition involves an electronic pen that is tracked for the recognition process, and is
less complicated when compared to the offline recognition process.
Offline recognition, in contrast, is more time-consuming. The text image is scanned and
processed. All the characters are grouped using a technique called K-means. A feature vector
is extracted in order to improve the recognition process, and component transitions are used
for this. This is followed by classification. The recognition accuracies were 93% and 88.9%
for the two datasets, the training and test sets respectively.
RESULTS:
This paper consists of a detailed survey of the handwritten character recognition work
developed for Indian scripts, mainly South Indian languages like Kannada,
Malayalam, Tamil and Telugu. No complete optical character recognition
system has yet been developed for these languages.
2.8 PAPER-TITLE:
Efficient Zone Based Feature Extraction Algorithm For Handwritten
Recognition Of Four Popular South Indian Scripts
ABSTRACT:
In the field of pattern recognition, character recognition is one of the most important
areas. Extensive attention has been paid to handwritten character recognition in recent
times. There are two types of recognition systems: online and offline. Offline handwriting
recognition falls under the broad field of optical character recognition. In this paper the
author proposes a feature extraction system based on distance metrics from the zone centroid
and the image centroid. The character centroid is computed and the image is further divided
into n equal zones. The average distance from the character centroid to each pixel present
in a zone is computed.
METHODOLOGY:
A zone-based hybrid method is proposed for feature extraction. This method is chosen because
of its robustness to small variations, its simplicity and its impressive recognition rate.
It provides good recognition even when preprocessing steps are not included.
The image is divided into fifty equal parts after the character centroid is computed. The
average distance from the character centroid to each pixel present in a zone is computed.
The zone centroid is also computed, along with the average distance from the zone centroid
to each pixel present in the zone. This procedure is repeated for all the zones. Finally,
100 such features are used.
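The two distance features per zone can be sketched as follows. The zone grid shape and the handling of empty zones (two zero features) are illustrative assumptions; image dimensions are assumed divisible by the grid.

```python
def centroid(pixels):
    # mean (row, col) of a non-empty list of pixel coordinates
    rs = [p[0] for p in pixels]
    cs = [p[1] for p in pixels]
    return sum(rs) / len(rs), sum(cs) / len(cs)

def zone_features(img, n_rows, n_cols):
    # img: 2-D 0/1 list.  Splits the image into n_rows*n_cols zones and,
    # per zone, emits two features: the average distance of the zone's
    # foreground pixels from (a) the whole-character centroid and
    # (b) the zone's own centroid.  Empty zones contribute (0, 0).
    H, W = len(img), len(img[0])
    fg = [(r, c) for r in range(H) for c in range(W) if img[r][c]]
    icr, icc = centroid(fg)
    zh, zw = H // n_rows, W // n_cols
    feats = []
    for zr in range(n_rows):
        for zc in range(n_cols):
            zone = [(r, c) for (r, c) in fg
                    if zr * zh <= r < (zr + 1) * zh
                    and zc * zw <= c < (zc + 1) * zw]
            if not zone:
                feats.extend([0.0, 0.0])
                continue
            zcr, zcc = centroid(zone)
            d_img = sum(((r - icr) ** 2 + (c - icc) ** 2) ** 0.5
                        for r, c in zone) / len(zone)
            d_zone = sum(((r - zcr) ** 2 + (c - zcc) ** 2) ** 0.5
                         for r, c in zone) / len(zone)
            feats.extend([d_img, d_zone])
    return feats
```

With fifty zones this yields the 100 features mentioned above (two per zone).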
For classification and recognition, nearest-neighbour and feed-forward back-propagation
neural network classifiers are used.
The nearest-neighbour classifier is used for large-scale pattern matching. Distances from
the new vector to all stored vectors are computed; similarity measurement is the basis of
classification.
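The distance computation described above amounts to, in sketch form:

```python
def nearest_neighbour(stored, labels, query):
    # Compute the distance from the query vector to every stored vector
    # and return the label of the closest one (squared Euclidean is
    # enough, since only the ordering of distances matters).
    best = min(range(len(stored)),
               key=lambda i: sum((a - b) ** 2
                                 for a, b in zip(stored[i], query)))
    return labels[best]
```

The stored vectors would be the zone features of the training samples; the query is the feature vector of the character to classify.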
Artificial Neural Networks have been used extensively both for the recognition of non-
Indian as well as Indian digits.
The structure of the network and the training algorithm determine the recognition rate of
the backpropagation model. The network is trained using the feed-forward method. The network
structure is determined by the number of hidden and visible layers, and complete neuron
connectivity is established. The network consists of 50 nodes in the input layer
(corresponding to one feature from each of the 50 zones) and 80 neurons in the hidden layer.
The output layer has 10 neurons corresponding to the 10 numerals. Underfitting occurs when
there is an insufficient number of hidden neurons; it results in the network not recognising
the numerals because of an insufficient number of adjustable parameters.
Overfitting occurs when there are too many hidden neurons, resulting in the network
failing to generalize.
There are several rules of thumb for deciding the number of neurons in the hidden layer:
• Hn < 2*In, where Hn is the number of hidden neurons and In is the number of input neurons.
• Finding the minimum number of epochs taken to recognize a character, and the recognition
efficiency on training as well as testing samples.
RESULTS
2000 samples were obtained; 1000 samples are used for training and the remaining
1000 samples are used for testing. The results of Kannada numeral recognition are shown below.
2.9 PAPER-TITLE:
English Sentence Recognition using Artificial Neural Network through Mouse-based
Gestures
ABSTRACT
One of the most fundamental forms of communication is handwriting. The basic problem of
handwriting recognition has existed for about six decades now, though what now poses a
major problem is the unrestricted handwriting recognition of sentences. The main focus here
is automated English handwriting recognition using mouse gestures, which is achieved using
an artificial neural network. Back-propagation is used to train this neural network, making
it self-sufficient, and it gave positive outcomes. This technique gives a brilliant way to
retrieve the outlines of sentences, compare them with the image, locate the area, and then
use the network to identify the sentence. This method gave successful results with high
accuracy.
ENGLISH
This language is part of the Indo-European cluster of languages. The closest living
siblings of this language are undoubtedly Frisian and Scots. Friesland, the islands of the
North Sea and a few places in Germany are the only places where Frisian lives on. English
has a rich history that can be broken down into Old, Middle and Modern periods. Pure
English is hardly spoken anywhere, as it has been influenced by a lot of foreign languages
over the years; since English has accepted many of these words as its own, it is difficult
to find a perfect form of this language.
METHODOLOGY
Pre-processing:
Several common methods are used for pre-processing; these are mostly basic and generally
applicable to all pre-processing pipelines. Thinning is a very important part of this
process: a large pixel picture is condensed into a smaller one. It is applied only to binary
images, and the outcome is also a binary image. The unwanted pixels are removed, similar to
erosion, and the image is thinned to the required thickness. This helps in better
recognition of the image, as the noise has been removed.
Feature extraction:
In this process, only the required features are extracted and used further. The features are
first retrieved according to relevancy, and each feature is eventually used to identify the
sentence. The features retrieved from the raw data are further used for classification; they
should be such that they create minimum variation within a class and maximum differences
between different classes. At this stage, every character is given its own identity, which
is actually nothing but its feature vector.
Proposed technique:
The approach used in the paper recognizes written English sentences using mouse gestures,
which is achieved by employing neural networks. The images that need to be recognized,
captured via mouse gestures, are provided as inputs to the network. First, the network is
fed inputs consisting of English sentences to train it. Once it is trained, only when the
features are similar to the data that was fed is the script deemed a true one; otherwise it
is considered false. The network this paper proposes consists of three tiers: the first tier
is used for input, the second is hidden (used for training) and the third gives us the
outcome.
RESULTS:
The outcomes have proved successful for both continuous and discrete written English using
mouse-based gestures. The average success rate of recognizing English sentences using the
artificial neural network is about 94% on sample data taken from ten different people, with
five samples each.
2.10 PAPER-TITLE:
OCR for printed Kannada text to Machine editable format using Database approach
ABSTRACT:
This paper deals with character recognition of the Kannada script in printed files. The
application first scans the printed text, extracts the characters of the Kannada script, and
then performs segmentation. A database is used to store the printed texts, and an accuracy
of up to 99.99% was obtained.
METHODOLOGIES:
Segmentation is the process of separating the lines, words and characters in the dataset,
which enables the application to recognize the characters and helps in the conversion
process. The Kannada script has many base consonants and vowel modifiers, so this method is
implemented.
The image is scanned and converted into a grayscale image. The image is rotated to be
horizontal for better understanding, and the text image is recognized.
The above process is repeated for character segmentation and word segmentation.
RESULTS:
This method gives 99.99% accuracy, but it needs a lot of space and a lot of computation.
CHAPTER 3
OBJECTIVES AND PROBLEM STATEMENT
3.1 OBJECTIVES:
These are the goals laid out to be achieved by the end of the project:
● The application should be user-friendly.
● The application should be capable of pre-processing the input so that the
foreground overpowers the background.
● The application should be capable of identifying the text portions in the input.
● The application should extract the text contained in the given input image and present it
to the end-user.
● The character datasets covered previously were small, so as to avoid complications in
data processing; we try to incorporate larger datasets (along with the character
modifiers) in order to achieve better results and cover a larger portion of the language.
● The applications previously made were specific to a particular area, whereas our
application covers a major part of the population, including:
✓ Preserving the culture through old scripts
✓ Redefining peer-to-peer learning for rural kids
✓ Helping tourists by translating sign boards
✓ Providing a means of meaningful audio output of the correct English word/sentence
for a given Kannada scripture
CHAPTER 4
METHODOLOGY
4. METHODOLOGY
4.1 METHODOLOGY:
4.1.1 Binarization:
The conversion of an image from a multi-valued pixel state to a binary state is known as
binarization:
● Consider the above RGB image, a child writing a few lines of Kannada on a
blackboard.
● RGB is a standard system of representing all the colours in the visible spectrum by
combining red, green and blue in different proportions.
● At first the image is converted into grayscale [8]; only intensity information is carried
in a grayscale image.
● After the conversion, a threshold value is selected to separate the foreground and
background data in the image.
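The grayscale conversion and thresholding steps can be sketched in a few lines of NumPy; the luminosity weights and the fixed threshold of 128 are illustrative assumptions (the actual system may pick the threshold adaptively):

```python
import numpy as np

def to_grayscale(rgb):
    # luminosity method: weighted sum of the R, G and B channels
    return rgb @ np.array([0.299, 0.587, 0.114])

def binarize(gray, threshold=128):
    # pixels brighter than the threshold become foreground (1),
    # the rest become background (0)
    return (gray > threshold).astype(np.uint8)

# a 1x2 toy image: one white (chalk) pixel, one dark (board) pixel
rgb = np.array([[[255, 255, 255], [30, 30, 30]]], dtype=float)
print(binarize(to_grayscale(rgb)))  # → [[1 0]]
```

On a blackboard image the bright chalk strokes end up as foreground; for a dark-on-light paper document the comparison would simply be inverted.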
4.1.2 Segmentation:
Simplification and meaningful representation are the end goals of segmentation. Typical uses
of image segmentation are boundary and edge detection in images. More precisely, image
segmentation assigns a label to each pixel, so that similarly labelled pixels share a similar
nature.
Region detection and edge detection are closely related; both depend on a sudden contrast in
intensity at the region boundaries. Because of this, edge detection can be seen as another way
of performing segmentation.
Edge detection methods can be applied to the spatial-taxon region in the same manner they
would be applied to a silhouette. This method is particularly useful when the disconnected
edge is part of an illusory contour.
We have different techniques for different efficiency requirements while training a neural
network.
To estimate positional features, and also to estimate different types of line formations in a
dataset, we use Character Geometry based Feature Extraction.
Zoning Algorithm:
A vector is defined as a quantity that has both magnitude and direction; a gradient is a vector
quantity.
The gradient operator generates a two-dimensional vector field that gives, at every pixel, the
direction of the steepest increase in intensity; its magnitude is proportional to the rate of
change in that direction.
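As a small illustration of the gradient as a vector quantity, NumPy's `np.gradient` returns the per-axis partial derivatives of a sampled intensity surface (the grid `f` below is hypothetical):

```python
import numpy as np

# a toy intensity surface sampled on a 3x3 grid
f = np.array([[1, 2, 4],
              [1, 3, 6],
              [1, 4, 8]], dtype=float)
gy, gx = np.gradient(f)                 # central differences per axis
magnitude = np.sqrt(gx ** 2 + gy ** 2)  # rate of steepest increase
direction = np.arctan2(gy, gx)          # angle of steepest increase
print(magnitude[1, 1])                  # sqrt(2.5**2 + 1.0**2)
```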
4.1.4 Classification:
A neural network is made up of multiple nodes known as neurons. Neurons grouped together
are referred to as layers, and multiple layers are interconnected.
Each perceptron is tasked with completing a simple calculation, and the result of this
calculation is transmitted to all the nodes that it is connected to.
The convolution layer is the first layer. The image input to it is always in a matrix format.
The software selects a smaller matrix within the original matrix; this is called a neuron
(also filter or kernel). The neuron produces the convolution: each filter value is multiplied
by the corresponding pixel value, and the products are summed to obtain a single number.
The neuron then moves one position to the right and performs the same operation. After the
neuron has passed through all the positions, a smaller matrix is obtained.
CHAPTER 5
REQUIREMENT
SPECIFICATION
5. REQUIREMENT SPECIFICATION
Windows
CHAPTER 6
IMPLEMENTATION
6. IMPLEMENTATION
The implementation is broadly classified into four major modules, i.e., Input Preprocessing,
Character Recognition, Conversion and User Interface. The system is developed using Python
libraries such as scikit-image (an image processing library that facilitates feature extraction
and noise removal), NumPy, OpenCV, Django, SciPy, Keras and Matplotlib.
This script is used to create an MNIST-like dataset image where each row holds the samples
of one character. rootdir is the source of the image data used to create the rows, and
no_in_line specifies the number of characters to put in each line.
import os
import glob
from PIL import Image

# rootdir and no_in_line are set as described above
len_dirs = len(os.listdir(rootdir))
final_img = Image.new('L', (no_in_line * 52, len_dirs * 52), 255)
ori = os.getcwd()
y_offset = 0
for dir3 in os.listdir(rootdir):
    if not os.path.isdir(os.path.join(rootdir, dir3)):
        continue
    os.chdir(os.path.abspath(os.path.join(rootdir, dir3)))
    flist = glob.glob('*.jpg')
    flist = flist[:no_in_line]
    x_offset = 0
    for im in flist:
        img = Image.open(im)
        final_img.paste(img, (x_offset, y_offset))
        x_offset += 52
    y_offset += 52
    os.chdir(ori)
final_img.save(str(rootdir) + '.jpg')
import sys
import numpy
import scipy.misc
from PIL import Image

image = sys.argv[1]
filename = image.rsplit('.', 1)[0]
# initialize parameters to be used (values have been generated
# after numerous iterations)
s = 1
lmda = 10
epsilon = 0.0001
X = numpy.array(Image.open(image))
X_average = numpy.mean(X)
X = X - X_average
# normalize by the global contrast, guarding against division by zero
contrast = numpy.sqrt(lmda + numpy.mean(X ** 2))
X = s * X / max(contrast, epsilon)
scipy.misc.imsave(filename + '_contrast.png', X)
After the conversion of raw data into clean data, the text has to be divided into lines, words
and characters. The process of dividing the text document into lines, words and characters is
called segmentation. After the image is pre-processed, it is divided into several segments.
These segments help in the process of character recognition.
Dividing the document into lines is called line segmentation, dividing the lines into words is
called word segmentation, and dividing the words into single characters is called character
segmentation. The steps to be followed are:
1. Divide the text into lines.
2. Divide the lines into words.
3. Divide the words into characters.
Code:
import os
import ntpath

def segment(file):
    rootdir = 'web_app/hwrkannada/hwrapp/static/hwrapp/images/Processed_' + \
        os.path.splitext(ntpath.basename(file))[0]
    # Generate directory name to store segmented images
    directory = rootdir + '/Segmented_' + \
        os.path.splitext(ntpath.basename(file))[0]
    os.makedirs(directory, exist_ok=True)
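The line/word/character split itself can be sketched with projection profiles: sum the foreground pixels along one axis and cut at the runs of zeros. This is a simplified illustration, not the project's actual routine:

```python
import numpy as np

def split_on_gaps(binary, axis):
    # sum foreground pixels along the given axis; runs of zeros in
    # the resulting profile are the gaps between segments
    profile = binary.sum(axis=axis)
    segments, start = [], None
    for i, v in enumerate(profile):
        if v > 0 and start is None:
            start = i
        elif v == 0 and start is not None:
            segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, len(profile)))
    return segments

# toy binary page: two "lines" of ink separated by blank rows
page = np.zeros((6, 8), dtype=int)
page[1, 1:4] = 1
page[4, 2:6] = 1
print(split_on_gaps(page, axis=1))  # → [(1, 2), (4, 5)]
```

Applying the same helper with `axis=0` inside each detected line splits it into words, and again into characters, mirroring the three steps listed above.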
A neural network built from convolutional and recurrent layers is trained on the datasets. As
described in Section 4.1.4, the network is made up of neurons grouped into interconnected
layers, each perceptron performing a simple calculation whose result is transmitted to all the
nodes it is connected to. The convolution layer slides a small filter matrix over the input
matrix, multiplying and summing at each position, and so produces a smaller feature matrix.
The convolution layers are trained to extract relevant features, which is nothing but feature
extraction. Each of these layers is made up of three operations. The first applies the
convolution. After each convolution a new nonlinear layer is added; the nonlinear property is
brought in by the activation function. Without this property the network would not be
sufficiently expressive and would not be able to model the response variable. The nonlinear
layer is followed by a pooling layer, which performs downscaling along the width and height,
diminishing the size of the image representation. A fully connected layer is attached after the
convolutions are completed; it processes the output of the convolution layers.
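The downscaling done by the pooling layer can be illustrated with a small max-pooling sketch; the 2x2 window is an assumption for illustration:

```python
import numpy as np

def max_pool_2x2(feature_map):
    # keep the maximum of each 2x2 block, halving width and height
    h, w = feature_map.shape
    return feature_map.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

fm = np.array([[1, 3, 2, 0],
               [4, 2, 1, 5],
               [0, 1, 3, 2],
               [2, 6, 1, 1]], dtype=float)
print(max_pool_2x2(fm))  # → [[4. 5.] [6. 3.]]
```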
Code:
import os
import sys
import pickle
import numpy as np
import matplotlib.pyplot as plt
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout
from keras import optimizers
from keras.callbacks import ModelCheckpoint
from keras.utils import np_utils
from keras import backend as K
import load_images  # project helper that pickles the dataset

directory = sys.argv[1]
img_rows, img_cols = 52, 52  # size of each segmented character image

def get_image_size():
    return (img_rows, img_cols, 1)

def get_num_of_classes():
    # one sub-directory per character class
    return len(os.listdir(directory))

def cnn_model():
    num_of_classes = get_num_of_classes()
    model = Sequential()
    # convolution + pooling stack (the custom LRN2D normalisation
    # layer is replaced here with standard layers)
    model.add(Conv2D(32, (3, 3), activation='relu',
                     input_shape=get_image_size()))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Flatten())
    model.add(Dropout(0.5))
    model.add(Dense(num_of_classes, activation='softmax'))
    model.compile(loss='categorical_crossentropy',
                  optimizer=optimizers.Adam(), metrics=['accuracy'])
    filepath = "cnn_model.h5"
    checkpoint1 = ModelCheckpoint(filepath, monitor='val_acc',
                                  save_best_only=True, mode='max')
    callbacks_list = [checkpoint1]
    return model, callbacks_list

def train():
    with open("images_labels.pickle", "rb") as f:  # file name assumed
        train_images = np.array(pickle.load(f))
        train_labels = np.array(pickle.load(f))
        test_images = np.array(pickle.load(f))
        test_labels = np.array(pickle.load(f))
    train_images = np.reshape(train_images, (-1, img_rows, img_cols, 1))
    test_images = np.reshape(test_images, (-1, img_rows, img_cols, 1))
    train_labels = np_utils.to_categorical(train_labels)
    test_labels = np_utils.to_categorical(test_labels)
    model, callbacks_list = cnn_model()
    history = model.fit(train_images, train_labels, epochs=20,
                        validation_data=(test_images, test_labels),
                        callbacks=callbacks_list)
    # plot and save the accuracy and loss curves
    plt.plot(history.history['acc'])
    plt.plot(history.history['val_acc'])
    plt.title('model accuracy')
    plt.ylabel('accuracy')
    plt.xlabel('epoch')
    # plt.show(hold=False)
    plt.savefig('acc.png')
    plt.clf()
    plt.plot(history.history['loss'])
    plt.plot(history.history['val_loss'])
    plt.title('model loss')
    plt.ylabel('loss')
    plt.xlabel('epoch')
    # plt.show(hold=False)
    plt.savefig('loss.png')
    # model.save('cnn_model_keras2.h5')

load_images.create_pickle(directory)
train()
K.clear_session()
6.3 Conversion
After the characters are recognised, they have to be converted into English. Each character
image has a label attached to it, so images with the same label are grouped together, and the
grouped image is converted into English. The data has to be trained, so the loss has to be
calculated. For each line, word or character, a corresponding character is specified, and the
model is trained accordingly. The drawback is that this is time-consuming. If there are
duplicate characters, they have to be removed in order to obtain an accurate conversion of
the word.
The loss calculation is shown in the above figure. Function calls like ctc_loss are used in this
process.
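The duplicate-removal step mentioned above is the collapse rule used in best-path CTC decoding; a minimal sketch, where the '-' blank symbol is an assumption for illustration:

```python
def ctc_collapse(seq, blank='-'):
    # collapse consecutive repeats, then drop the blank symbol
    out, prev = [], None
    for ch in seq:
        if ch != prev and ch != blank:
            out.append(ch)
        prev = ch
    return ''.join(out)

print(ctc_collapse('hh-e-ll-lo'))  # → hello
```

Note that the blank lets genuine double letters survive: 'l-l' collapses to 'll', while 'll' without a blank collapses to a single 'l'.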
Code:
import cv2 as cv
import numpy as np

img = cv.imread(r'final.jpg', 0)  # load the dataset sheet as grayscale
# gray = cv.cvtColor(img, cv.COLOR_BGR2GRAY)
# Now we split the image to 50x10 cells, each 52x52 in size
cells = [np.hsplit(row, 50) for row in np.vsplit(img, 10)]
# Make it into a Numpy array.
x = np.array(cells)
# Now we prepare train_data and test_data.
train = x[:, :25].reshape(-1, 2704).astype(np.float32)
test = x[:, 25:50].reshape(-1, 2704).astype(np.float32)
# Each row of the sheet holds one character class
k = np.arange(10)
train_labels = np.repeat(k, 25)[:, np.newaxis].astype(np.float32)
# Initiate kNN, train the data, then test it with test data for k=1
knn = cv.ml.KNearest_create()
knn.train(train, cv.ml.ROW_SAMPLE, train_labels)
ret, result, neighbours, dist = knn.findNearest(test, k=1)
CHAPTER 7
TESTING
7. SYSTEM TESTING
Once the different units of a system are integrated, the system has to be tested to make sure it
is in compliance with our requirements. When the different parts of a system are put together,
the system might behave unexpectedly, and this testing is carried out to make sure that does
not happen.
System testing generally falls under black box testing, a type of testing that is carried out
without information about the details of the inner workings.
TYPES OF TESTING
There are various types of system testing, a few are mentioned below:
1. Usability Testing – This is concerned with making the system user-friendly and more
flexible, even for those who do not yet have a clear idea of how it works.
2. Load Testing – The load applied during the initial stages is just enough to check the
system against the specified requirements, which is not sufficient. This testing is
performed to make sure the system works in real-life environments.
3. Regression Testing – Over the software cycle, many changes are made to the system,
and these may affect the system in unexpected ways. Regression testing is done to
make sure none of these changes has introduced bugs; it is also done to catch old
recurring bugs.
4. Recovery Testing – Recovery testing is done to vouch for the reliability and
trustworthiness of the system, as well as its ability to recover from future crashes.
5. Migration Testing – Sometimes the initial stages of a system are developed on old
infrastructure, and by the time the system is complete, many newer versions are
available. Migration testing is done to make sure the system works equally well on
the newer infrastructure.
6. Functional Testing – This requires going through the whole system to check whether
any important functions are missing. We could also add more functionalities to
enhance the system features.
7. Hardware/Software Testing – The tester has to make sure the software and the
hardware work well together. As new components are added, care has to be taken
that they do not create any obstacles to the compliance between the hardware and the
software.
TEST PLAN
We have employed a few test cases to ensure that our project works well. They are as
follows:
TESTCASE 1
Testcase Description:
The input image to be scanned must lie within a size range; the image cannot be smaller than
2 KB.
Testcase Input:
We input an image smaller than 2 KB.
Expected Result:
It should fail, as it does not comply with the specified range.
Actual Result:
The system produces distorted images, as the provided input does not contain enough detail
to be read from.
Remark:
PASS
TESTCASE 2
Testcase Description:
The input image to be scanned must lie within a size range; the image cannot be larger than
10 MB.
Testcase Input:
We input an image larger than 10 MB.
Expected Result:
It should fail, as it does not comply with the specified range.
Actual Result:
The system produces distorted images, as the input is too large for the curves of the letters to
be identified.
Remark:
PASS
The input gives a distorted output right after the first pre-processing step, and this leads to a
wrong prediction by the model due to a mismatch of features.
TESTCASE 3
Testcase Description:
Only an image lying in the specified range of 2 KB < image < 10 MB will be processed
further, as the model is trained that way.
Testcase Input:
An image is input in the specified range between 2 KB and 10 MB.
Expected Result:
The image should be accepted and processed further, as it complies with the specified range.
Actual Result:
The system processes the image normally and produces the recognised output.
Remark: PASS
CHAPTER 8
CONCLUSION
Conclusion
Communication in this fast-paced world has become as necessary as shelter is for
survival. The language barrier only makes it worse, and with so many languages in
the world, 22 official ones in India alone, it is hard even for a multilingual person to
manage. This application will help people who do not know Kannada gain easy access
to the language without having to go around asking people, which would again put
them in the loop of the language barrier. This project uses a Convolutional Neural
Network model to efficiently and accurately recognise and classify Kannada script
from input images and translates the recognised words into meaningful English words.
We also hope that future work in the field of Kannada OCR will include developing an
app compatible with all operating systems, which will facilitate the use of this
application on mobile phones, and a dynamic translator built through the use of APIs.
Given the investment of time and money to acquire the necessary large datasets, this
application can be used to recognise any Kannada word from any Kannada script.
CHAPTER 9
REFERENCES
9. REFERENCES
[1] "Review of Automatic Handwritten Kannada Character Recognition Technique Using
Neural Network", IEEE, 2017.
CHAPTER 10
SNAPSHOTS
10. SNAPSHOTS
Fig 10.1