By:
Farhan Ar Rafi Sk. Golam Muhammad Hasnain Humaira Baker
10-17788-3 11-18716-1 11-19613-3
Supervisor:
Dr. Tabin Hasan
Head of Graduate Program
Department Of Computer Science
American International University Bangladesh
December, 2015
We would like to dedicate this thesis to the people who enriched the
Bengali language in the past and to those who will work to enrich it
even more.
Acknowledgments
First, we would like to thank our supervisor, Dr. Tabin Hasan, Head of
Graduate Program, Department of Computer Science, American
International University-Bangladesh, for taking time out of his extremely
busy schedule to help us. Secondly, we would like to thank Dr. Md.
Shahedur Rashid, Geonames Ambassador and Professor, Department
of Geography & Environment, Jahangirnagar University; Dr. A S M
Abu Dayen, Associate Professor, Department of Bangla, Jahangirnagar
University; and Prof. Dr. Saidur Rahaman, Chairman, Department
of Public Administration, Jahangirnagar University, for their complete
support in our thesis. Thirdly, we would like to thank Saida Hossain,
Shamim Ahmed, M. Samin Yasar and Sumaiya Salam Munira,
graduate students of American International University-Bangladesh,
who worked with us to make our research successful. Finally, we
would like to thank our parents for their encouragement, support and
assistance, and to convey our regards to all those who supported us
in any respect during the completion of this thesis.
Abstract
Contents
List of Figures
1 Introduction
2 Literature Review
3 Classification of Bengali Geonames
3.1 Methodology
3.2 Classification Process
4 System Design
4.1 Design
4.2 Programming Language
4.3 User Interface
4.4 Framework
4.5 Structure
4.5.1 Application Server
4.6 Application Programming Interface (API)
4.7 Database
5 Results
5.1 Classification Results
6 Future Work
7 Conclusions
Bibliography
Appendix A
List of Figures
4.1 High Level Diagram of the system, showing how different modules
connect to each other
4.2 High Level Diagram of the system, showing how different modules
connect to each other
Chapter 1
Introduction
Chapter 2
Literature Review
Paolo Rosso have explained the GeoWordnet as a resource for Geographical
Information Retrieval (GIR) [3] [10]. Geographical Information Retrieval (GIR)
has become a new research area within Information Retrieval (IR). Associating
a toponym (i.e., a geographical name) with its actual coordinates on a map is
the main technical challenge in GIR. In Bangladesh there was no such system
that could classify locations according to their geolocation.
Davide Buscaldi, Emilio Sanchis and Paolo Rosso proposed an automatic
method in [3] to expand the geographical terms in queries. They used
the WordNet ontology and another method that expands the terms during the
indexing phase. Unfortunately, WordNet has quite limited coverage of geo-spatial
information and lacks latitude and longitude coordinates [10]. Therefore, it
is essential to look elsewhere if we want an adequate amount of geographical
information. Saida Hossain [5] has classified nine feature classes and 667
subcategorized feature codes of Geonames in the Bengali language. On the basis
of quantity and quality, we chose the Geonames feature classes because they are
less ambiguous and because Geonames contains the largest geo-spatial database.
GeoWordnet is of significant importance in many applications, such
as Geographical Information Retrieval (GIR) for increasing semantic
interoperability, semantic Geographic Information Systems and Geo-spatial Web
Services. This paper aims at developing a system for data collection in the
Bengali language that contains geographical information, such as the longitude,
latitude and geographical coordinates of cities, countries and different places
and locations of Bangladesh. Here we have used the 9 feature classes and the
684 subcategorized feature codes of GeoNames, a geographical database containing
information about different places. This will contribute substantially to
GeoWordnet and to MultiWordnet, a multilingual lexical database that includes
information about words in different languages, as well as to natural language
processing.
Saida Hossain [5] has classified nine feature classes and 684 subcategorized
feature codes of Geonames in the Bengali language. She faced several challenges
while translating the feature codes and developing the classes and entities in
Bengali. Our research builds heavily on this work, as Saida's [5] classifications
were not validated. We updated the classification of several classes so that
they convey the correct sense. We have also validated manually with the help of
Dr. Abu Dayen,
Chapter 3
Classification of Bengali Geonames
3.1 Methodology
During our classification we used two books of geography terminology [11] [12]
to find the accurate senses of the feature codes. We then met with
Dr. Shahedur Rashid to validate our classification. He advised us to go to the
Department of Bangla at Jahangirnagar University to assess the accuracy of
our translation, because the classification is in the Bengali language.
At the Department of Bangla we met with Dr. Aniruddha Kahaly to validate our
translation. Dr. Aniruddha Kahaly issued a letter to Dr. A S M Abu Dayen asking
him to verify our classification. Dr. Abu Dayen told us that to find the
accurate classification we had to study Bengali terminology, and that we should
visit the Bangla Academy library to find the terminology of the feature
classes. In addition, Dr. Abu Dayen gave us an idea of what types of books we
needed to search for.
We visited Bangla Academy several times and improved our work iteratively
under the supervision of Dr. Abu Dayen. We used about 9 books
of terminology [13], [14], [15], [16], [17], [11], [12], [18], [19] to find the
accurate senses of the feature codes in Bengali. For every class in the
classification of Geonames we chose specific books to find the best terminology
as well as
3.2 Classification Process
We updated around 300 terms compared with Saida Hossain's work, using the
terminology we found in the books. For example, first-order administrative
division means division in Bangladesh. From Saida Hossain's work, additional
classes were derived from the existing classes. For example, Beel and Jheel
map to the lake class of Geonames. Some Geonames feature classes do not exist
in our country, so naturally most of them do not have any description at all.
We decided the terms for those words from the context of their parent classes.
For example, territory is a subclass of country, region and state; in this
context it means a place that falls within a specific country. For every class
in the translation of Geonames we chose specific books to find the best
terminology, and we searched other books as well. For example, to find the
senses of the subclasses of city and village we used arts and social
terminology books [15]. The major problems we faced were in finding the Bengali
terminology of the administrative and political subclasses. Some subclasses are
not valid from the perspective of Bangladesh, for example Israeli settlement,
although according to its sense it means refugee camp. We nevertheless managed
to make sense of these subclasses with the help of Dr. Saidur Rahaman.
Chapter 4
System Design
Over the last decade, with the increasing availability of the internet all over
the world, more and more people have been able to access content and
applications through it. According to the Global Internet Report 2014, the
number of internet users reached 1 billion on October 05 and 2 billion on
September 13, and according to the 2015 report it is estimated to have reached
3 billion by May 2015. By the end of September 14, more smartphones were being
sold in developing countries, and by the end of October 2015 the number of
tablets sold had surpassed the number of all PCs sold.
Most of the applications that run on these devices need a strong backend
system to function properly and without errors. Our approach has been to design
such a system so that it is robust and scalable. The system has been designed
to work with the Geonames classification, and the database has been modeled on
the data structure of Geonames. The database has been further modeled to accept
more data and to allow more features to be added to the system.
4.1 Design
For developing Bengali GeoWordNet, we needed a system that can handle all user
inputs and outputs efficiently. For that reason we chose MVC as
the base architecture for our system. The Model-View-Controller design pattern
is very useful for architecting interactive software systems [20]. The MVC
pattern allows systems to be built from modules, which brings benefits to the
system such as security, easy maintenance, easy upgrades, scalability and good
performance. The MVC, or Model-View-Controller, pattern is proven in the field
as a dependable development pattern.
4.4 Framework
A framework provides basic libraries and various functions. We required a
framework that is lightweight and fast. As our application is mainly a backend
server, we did not need an elaborate UI. The PHP framework CodeIgniter matched
all our requirements. In addition, CodeIgniter is free and open source, which
made it a perfect choice for our application.
4.5 Structure
The system has been built using the MVC pattern and uses the following design
at the implementation level.
Model: The model defines the data structure of the different objects in the
system. It controls all data input to and output from the database. The
models provide security checks and data validation and also prevent SQL
injection. Class diagrams for some important models are shown below.
We have designed the system to have a separate model class for each object in
the system, for example a User object and a Geonames object.
Figure 4.1. High Level Diagram of the system, showing how different modules
connect to each other
View: The view defines the user interface of the system. The system has
two types of views: views with a graphical user interface, and views that do
not have a user interface but provide the interface that lets other
applications or systems access the system. The view also provides basic access
to the system for registration and allows registered users to insert data using
a web UI. We plan to provide a validation system based on the web UI
in the future.
Controller: The controller handles all input to and output from the system. All
user inputs are checked and validated in the controller and then sent to the
model for database interaction. All outputs are sent to the view from the
controller. We have designed a controller for each of the APIs.
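The division of responsibilities described above can be illustrated with a minimal sketch. It is written in Python purely for illustration, whereas the system itself is described as using PHP and CodeIgniter, and every name in it (PlaceModel, PlaceController, json_view, the place table and its columns) is hypothetical rather than taken from the actual implementation. It only shows the general idea: the controller validates input, the model talks to the database through parameterized queries, and a view without a graphical interface serialises the output for other systems.

```python
import json
import sqlite3


class PlaceModel:
    """Model: owns the data structure and all database access (hypothetical)."""

    def __init__(self, conn):
        self.conn = conn
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS place ("
            "id INTEGER PRIMARY KEY, name TEXT, feature_class TEXT)"
        )

    def insert(self, name, feature_class):
        # The ? placeholders keep user input out of the SQL text,
        # which is the usual defence against SQL injection.
        cur = self.conn.execute(
            "INSERT INTO place (name, feature_class) VALUES (?, ?)",
            (name, feature_class),
        )
        return cur.lastrowid

    def find_by_class(self, feature_class):
        rows = self.conn.execute(
            "SELECT id, name, feature_class FROM place "
            "WHERE feature_class = ? ORDER BY id",
            (feature_class,),
        ).fetchall()
        return [{"id": r[0], "name": r[1], "feature_class": r[2]} for r in rows]


def json_view(payload, status=200):
    """View without a graphical interface: serialises output for other systems."""
    return json.dumps({"status": status, "data": payload}, ensure_ascii=False)


class PlaceController:
    """Controller: validates input, calls the model, hands output to the view."""

    # The nine single-letter Geonames feature classes.
    FEATURE_CLASSES = set("AHLPRSTUV")

    def __init__(self, model):
        self.model = model

    def create(self, name, feature_class):
        name = (name or "").strip()
        if not name or feature_class not in self.FEATURE_CLASSES:
            return json_view({"error": "invalid input"}, status=400)
        new_id = self.model.insert(name, feature_class)
        return json_view({"id": new_id}, status=201)

    def list_by_class(self, feature_class):
        if feature_class not in self.FEATURE_CLASSES:
            return json_view({"error": "invalid input"}, status=400)
        return json_view(self.model.find_by_class(feature_class))
```

A caller would build the pieces once, for example `PlaceController(PlaceModel(sqlite3.connect(":memory:")))`, and then call `create` and `list_by_class`. Keeping validation in the controller and SQL in the model means either side can be changed without touching the other, which is the maintenance benefit the pattern is chosen for.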
4.7 Database
The database of the system is modeled on the Geonames database and has been
modified according to system requirements. One database stores all
application data, such as user information, non-validated dump data and other
data. A second database stores validated data, which will be used for further
research and application development. All user-provided data is first saved
into the system and later, after validation (automatic checking using data
validation algorithms and manual checking by an expert), is moved to the
second database. The initial database design was provided by Shamim Ahmed and
was extended based on further requirements. Each entry in the vocabulary set
contains 19 distinct data fields, including location, altitude, population and
alternate names of the entry. A very basic database diagram of the relations
between the tables that store user-provided data is shown below. The figure
shows how dump data from users is related to the Bengali feature classes and
how users are mapped to each record they enter. The record is kept for the
validation process in the future.
Figure 4.2. High Level Diagram of the system, showing how different modules
connect to each other
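The two-stage storage described in this section can be sketched as follows. This is a minimal illustration in Python with SQLite, not the actual schema or validation algorithm: the table names (dump_record, place), the reduced set of columns, the expert_approved flag and the coordinate range check are all assumptions made for the example. It shows user submissions landing in a dump store, tied to the submitting user, and moving to a separate validated store only after both an automatic check and a manual approval.

```python
import sqlite3

# Reduced, hypothetical column set; the real entries hold 19 fields.
COLUMNS = "user_id INTEGER, name TEXT, feature_code TEXT, latitude REAL, longitude REAL"


def open_stores():
    """Two separate in-memory databases stand in for the two stores."""
    dump = sqlite3.connect(":memory:")
    validated = sqlite3.connect(":memory:")
    dump.execute(
        f"CREATE TABLE dump_record (id INTEGER PRIMARY KEY, {COLUMNS}, "
        "expert_approved INTEGER NOT NULL DEFAULT 0)"
    )
    validated.execute(f"CREATE TABLE place (id INTEGER PRIMARY KEY, {COLUMNS})")
    return dump, validated


def passes_automatic_checks(name, latitude, longitude):
    """Assumed automatic check: non-empty name and coordinates within range."""
    return (
        bool(name.strip())
        and -90.0 <= latitude <= 90.0
        and -180.0 <= longitude <= 180.0
    )


def submit(dump, user_id, name, feature_code, latitude, longitude):
    """Save a user-provided record in the dump store and return its id."""
    cur = dump.execute(
        "INSERT INTO dump_record (user_id, name, feature_code, latitude, longitude) "
        "VALUES (?, ?, ?, ?, ?)",
        (user_id, name, feature_code, latitude, longitude),
    )
    dump.commit()
    return cur.lastrowid


def approve(dump, record_id):
    """Record that an expert has manually checked the entry."""
    dump.execute("UPDATE dump_record SET expert_approved = 1 WHERE id = ?", (record_id,))
    dump.commit()


def promote(dump, validated, record_id):
    """Move a record to the validated store if both checks pass."""
    row = dump.execute(
        "SELECT user_id, name, feature_code, latitude, longitude, expert_approved "
        "FROM dump_record WHERE id = ?",
        (record_id,),
    ).fetchone()
    if row is None:
        return False
    user_id, name, feature_code, latitude, longitude, approved = row
    if not approved or not passes_automatic_checks(name, latitude, longitude):
        return False
    validated.execute(
        "INSERT INTO place (user_id, name, feature_code, latitude, longitude) "
        "VALUES (?, ?, ?, ?, ?)",
        (user_id, name, feature_code, latitude, longitude),
    )
    validated.commit()
    dump.execute("DELETE FROM dump_record WHERE id = ?", (record_id,))
    dump.commit()
    return True
```

Keeping the user id on every record, in both stores, preserves the mapping between users and the entries they submitted, so a later validation pass can still trace each record to its source.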
Chapter 5
Results
5.1 Classification Results
Table 5.1. Tables containing statistics for the 9 Geonames feature classes are
shown above.
The results in the table show that almost all Geonames features could
be classified in Bengali. Only 8 features could not be matched to any Bengali
features. Figure 8 shows our percentage of performance.
Chapter 6
Future Work
The research results in the creation of Bengali Geonames and leads the way to
the creation of Bengali GeoWordnet by mapping Bengali Geonames to Wordnet. We
manually laid the foundation for all computational work. In the future, this
foundation can support work in Natural Language Processing, Optical Character
Recognition and Automated Sentence Translation.
In the future, once the system is complete, we will have the names of all the
places in Bangladesh. We will be able to search based on geographical features
and places. For example, if someone wants to find where an ocean, a river, a
forest or a mangrove forest is, they will be able to find it correctly.
Geonames includes features that do not exist in Bangladesh. These features
need to be removed from Bengali Geonames. There are also features that
exist in our country but are not included in Geonames, and features that exist
all over the world but have no word in Bengali. Thus a path of
research has also opened in Bengali linguistics for the creation of new words.
In addition, Geonames does not contain any places for undersea features, which
need to be added.
A new path has also been set for the development of applications that use
geographical features in Bengali. These applications can use gamification rules
to create innovative experiences. Future work also includes providing
descriptions in Bengali for easy understanding of the concept classes. In
addition, the terminology for some words was not found anywhere and is presumed
to be nonexistent because these geographical features do not exist in places
inhabited by people who use Bengali. The current system that provides the web
service needs major development and improved features, such as XML support and
support for different output types for users of various devices.
Chapter 7
Conclusions
We classified the Geonames feature classes in Bengali. Many new findings came
out of our research. Our classification process was fully manual,
with some automatic checks when needed. We manually laid the foundation
for all computational work. We used the vocabulary of our collaborators, and we
know how to classify it so that computational logic can be built upon it.
After classifying the feature classes we built the API system. Here we
created a gateway so that other people can work in this research process. Our
research opened up many areas of research and showed the necessity of
developing new words in Bengali.
Bibliography
Information Science, vol. 22, no. 10, pp. 1045-1065, 2008. [Online]. Available:
http://www.tandfonline.com/doi/abs/10.1080/13658810701850547
[11] Vubiddya Porivasha Kosh. Abdul Kadir, Admin officer, Kandrio bangla
Unnayan Board, 10 green road Dhaka-2, 1968.
[12] G. Moin Uddin, Vougolik Porivasha Kosh. Dhaka: Kandrio bangla Unnayan
Board 84, Shantinagor, Dhaka-2, 1996.
[14] Bangla Academy English-Bangla Dictionary, 2nd ed. Dr. Jalal Ahmed,
Director Incharge, Bangla Academy, 2015.
Appendix A