
978-1-61284-848-8/11/$26.00 ©2011 IEEE
Remote Sensing Image Database Based on NOSQL
Database

Zhifeng Xiao, Yimin Liu*
State Key Lab of Information Engineering in Surveying, Mapping and Remote Sensing
Wuhan University
Wuhan, China
*Corresponding author: liuyimin-2007@163.com


Abstract-Relational database technology has played an important
role in the storage, management and analysis of global image data
over the past few decades. However, an RDBMS is not an ideal
platform for modeling complicated spatial data networks, and it is
expensive and hard to maintain, especially as the whole system
grows larger. With the revolution in Web technology known as
Web 2.0, NOSQL stores have become an extremely popular new
paradigm for data presentation and applications. Hadoop offers a
free, open database named HBase that closely emulates most of the
components of Google's BigTable for the management of global mass
GIS data. This paper reviews the capabilities of NOSQL databases
applied to global image systems. We conclude that a NOSQL
database, which handles huge volumes, network partitioning and
replication, offers wider applicability, better scalability and
higher performance than traditional relational ones.
Keywords-NOSQL Database; HBase; Pyramid map; HBGIS
I. INTRODUCTION
A Geographic Information System (GIS) is built for the acquisition,
storage, management and release of spatial data and attributes,
and WebGIS lets users access, manage and share global geographic
information concurrently, wherever they are and whatever platform
they use [1]. In traditional web map services, a single-node system
can meet application requirements when spatial data sets are
small, but it has obvious limitations in performance, scalability
and reliability when processing large data sets.
We adopt NOSQL database storage technology, which is flexible
enough to achieve high performance while meeting the demands of
reliability and scalability [2]. In this paper, we use Hadoop, which
has been shown to be a capable distributed platform with a strong
storage strategy and the computing capacity to support web map
applications [3]. HBase does not support a full relational data
model [4]. Instead, it provides a simple data model that supports
dynamic control over data layout and lets clients reason about
the locality and properties of the data, which makes indexing
possible in HBase. HBase treats data as uninterpreted strings, and
clients can control the locality of their data through careful
choices in their schemas.
In this paper, we use Google Web Toolkit (GWT) to build the Rich
Internet Application (RIA) and HBase as the tile source. In the
key/value store, attribute data such as band and coordinate
information are added as values. We designed a series of columns
and an effective index mechanism to manage and store the pyramid
map. A client can then request the map by sending center
coordinates and zoom parameters via AJAX [5]. In HBGIS, cluster
configuration can enhance the quality and efficiency of the global
image service. As a result, the NOSQL database performs well in a
simulation of thousands of data read and write requests against
different backend databases [6].
The rest of the paper is organized as follows. Section 2 describes
the basic model of NOSQL databases and gives several examples.
Section 3 describes the use of HBase in HBGIS in detail. Section 4
presents some results of our implementation and analyses the
advantages of NOSQL databases over relational databases.
Section 5 discusses when and how to choose a NOSQL database as a
remote sensing image database, and Section 6 concludes and
outlines future work.
II. NOSQL DATABASE MODELS
A. What Is a NOSQL Database
The term NOSQL has many interpretations, but the most popular one
today is "Not Only SQL", which means that a NOSQL database is not
a simple rejection of traditional relational databases but
inherits from and develops past patterns [7]. A NOSQL database aims
to meet the needs of highly concurrent reads and writes, efficient
mass data storage and access, database scalability and high
availability, based on a sparse, distributed, persistent
multidimensional sorted map.
A NOSQL database is usually deployed in a distributed system so
as to provide more concurrent connections and reduce the pressure
on the main server. The biggest difference between NOSQL
databases and relational databases is that the former do not need
a table schema designed in advance but use column families
instead. Such databases maintain data in lexicographic order by
row key. Row keys are arbitrary strings, and every operation under
a single row key is serialized. Column keys are grouped into sets
called column families, which form the unit of access control. A
column family must be created explicitly; after that, data can be
added dynamically.
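The data model described above can be illustrated with a minimal Python sketch. It is an in-memory toy, not the HBase API: the class name SortedSparseTable, its methods and the example keys are hypothetical. It only shows that families are declared up front, qualifiers are added freely, and rows are kept in lexicographic key order so that range scans are cheap.

```python
from bisect import bisect_left, insort


class SortedSparseTable:
    """Toy in-memory model of a column-family table (not the HBase API)."""

    def __init__(self, families):
        self.families = set(families)  # families are fixed at creation time
        self.row_keys = []             # row keys kept in lexicographic order
        self.rows = {}                 # row key -> {"family:qualifier": value}

    def put(self, row_key, column, value):
        family = column.split(":", 1)[0]
        if family not in self.families:
            raise KeyError("unknown column family: " + family)
        if row_key not in self.rows:
            insort(self.row_keys, row_key)
            self.rows[row_key] = {}
        # Only the cells that are written exist, so the table stays sparse.
        self.rows[row_key][column] = value

    def scan(self, start, stop):
        """Yield (row_key, columns) for start <= row_key < stop."""
        i = bisect_left(self.row_keys, start)
        while i < len(self.row_keys) and self.row_keys[i] < stop:
            key = self.row_keys[i]
            yield key, self.rows[key]
            i += 1


table = SortedSparseTable(["tile", "image"])
table.put("tile0002", "tile:width", b"256")
table.put("tile0001", "tile:width", b"256")
table.put("tile0001", "image:type", b"tif")  # new qualifier, no schema change
print([key for key, _ in table.scan("tile0001", "tile0003")])
```

Writing to a family that was never declared raises an error, while a new qualifier inside an existing family needs no schema change.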
B. NOSQL Database Classification
Generally speaking, there are three popular types of NOSQL
databases [8]. The first type is key-value stores. Amazon's SimpleDB
is a Web service that provides the core database functions of
information indexing and querying in the cloud. The second type is
column-oriented databases, which account for the majority of NOSQL
databases. Facebook created the high-performance Cassandra to help
power its website, and the Apache Software Foundation developed
HBase. The third type is document-based stores. 10gen (a cloud
computing company invested in by Twitter) commercially supports
and sponsors the development of MongoDB, an open source document
database built for scalability and ease of use.
C. HBase Architecture Overview
Fig. 1 shows the relationship between the Hadoop platform and the
HBase database, and the general layout of HBase [9].
Figure 1. HBase Overview
1) HBaseMaster
The HBaseMaster is responsible for assigning regions to
HRegionServers. The HBaseMaster also monitors the health
of each HRegionServer.
2) HRegionServer
The HRegionServer is responsible for handling client read
and write requests. It communicates with the HBaseMaster to
get a list of regions to serve and to tell the master that it is alive.
3) HBaseClient
The HBaseClient is responsible for finding the HRegionServers
that are serving the particular row range of interest.
On instantiation, the HBase client communicates with the
HBaseMaster to find the location of the ROOT region.
4) HFile
HFile is the basic storage element of HBase and provides
BigTable's basic function of fast, efficient storage.

Figure 2. HFile elements
As shown in Fig. 2, an HFile contains an index that points to the
offset of each data block, and this index is read into memory.
When we search for a key, we therefore do not have to scan the
whole HFile, which makes data retrieval convenient and fast.
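The benefit of an in-memory block index can be sketched in a few lines of Python. This is an illustration only and does not reproduce the HFile on-disk format: the index contents, the block size of 65536 bytes and the function name find_block_offset are assumptions made for the example. The point is that a binary search over the first key of each block selects the single block that may hold the requested key.

```python
from bisect import bisect_right

# Hypothetical block index: the first key stored in each data block and the
# byte offset at which that block starts in the file.
block_first_keys = ["tile0000", "tile0100", "tile0200", "tile0300"]
block_offsets = [0, 65536, 131072, 196608]


def find_block_offset(key):
    """Return the offset of the only block that can contain key, or None."""
    # The candidate block is the last one whose first key is <= key.
    i = bisect_right(block_first_keys, key) - 1
    if i < 0:
        return None  # key sorts before the first block of the file
    return block_offsets[i]


print(find_block_offset("tile0150"))
```

Only the selected block then needs to be read and searched, instead of the whole file.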
III. THE DESIGN AND IMPLEMENTATION
As current GIS tools are mainly designed to execute sequentially
on a single workstation, a GIS becomes much less efficient and
more expensive when dealing with very large data volumes and
complex computations. This research applies HBase to build a
low-cost, flexible and easy-to-maintain pyramid tile layer
service. We call it HBGIS, meaning a Hadoop-based geographical
information system. HBGIS not only allows browsers to render maps
quickly but also displays and queries raster data on the map
quickly. Moreover, users can easily understand the spatial
distribution of the data.
A. General Layout of HBGIS
When we run massive image applications, the backend databases
are often the performance bottleneck, and existing databases tend
to make the load considerably worse. Another recurring problem is
network traffic. To deal with these problems, we split the big map
into small pieces and give each tile a unique key. As Google Earth
demonstrates, pyramid technology works well for global image
services. AJAX makes the client more responsive because users do
not have to sit idle while waiting for data. At the same time, the
browser cache mechanism improves efficiency and significantly
increases the number of concurrent users [10].

Figure 3. Overview of WebGIS systems based on HBase database
In Fig. 3, users first send Ajax requests to the Web Server through
the browser. The Web Server reads the user's parameters and sends
image requests to the backend database. The DataNode then
searches for the images by row key and returns binary streams to
the HRegionServer. These streams are usually in GeoTIFF format,
which cannot be displayed on screen directly, so we convert the
data to JPEG or PNG through the GeoTools interface. The Map Server
then sends the new image stream to the Ajax engine, which
reassembles the small tiles so that the client finally gets the
map.
In this research, we try to provide an easy way to store, share
and publish geospatial data. This makes it easier to distribute
GIS data and therefore to share resources and collaborate in the
GIS domain.
B. Pyramid Map Set
We organize remote sensing images into data sets according to
sensor type, acquisition sequence, spatial extent and other
dimensions. Each data set has three data structures: the image
data, the spatial index and the image metadata. The metadata
records the data set name, geographic extent, block size, range of
pyramid layers, coordinate information and image projection in
XML format. Image data is stored in HBase, and the spatial index
provides fast access. In general, users are interested in the data
type, acquisition time, spatial coverage, band and similar
properties of remote sensing image data. Therefore, we use
multi-level naming to name the remote sensing tiles with this
information. At present we support only image display, navigation
and mapping, without further analysis, so lossy coding of the
images is feasible.
According to the original image pyramid design, a large map is
separated into several zoom layers, and each layer is split into
small tiles of equal size. Each tile can be stored in the NOSQL
database with its tile ID as the row key and its data as the
value. A high-speed index is set up to accelerate search. Thanks to
the advantages of the column-store design, we can serve massive
global image data quickly and add data continuously.
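One possible way to form such a row key is sketched below in Python. The key layout (data set name, level, row, column joined by underscores) and the function names are assumptions for illustration; the paper does not specify the exact naming scheme. Zero-padding the numeric parts makes the lexicographic order used by the store agree with numeric order, so neighbouring tiles of one layer sort next to each other.

```python
def tile_row_key(dataset, level, row, col):
    """Build a row key whose string order matches (level, row, col) order."""
    # Fixed-width, zero-padded fields: "2" would otherwise sort after "10".
    return "{}_{:02d}_{:06d}_{:06d}".format(dataset, level, row, col)


def parse_tile_row_key(key):
    """Inverse of tile_row_key; the data set name may contain underscores."""
    dataset, level, row, col = key.rsplit("_", 3)
    return dataset, int(level), int(row), int(col)


key = tile_row_key("landsat", 0, 1, 3)
print(key, parse_tile_row_key(key))
```

With this layout, a range scan from one key to another returns a contiguous run of tiles from a single row of a single layer.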

Figure 4. Image Pyramid Architecture
In this paper, we use powers of 2 to build the image pyramid, as
shown in Fig. 4. That is, the scale of each layer is half that of the
layer below it. The pyramid is created as follows. First, take the
original image as the bottom layer of the pyramid, labeled level 0.
Second, create the next upper layer by image re-sampling. Third,
repeat step two until the predefined top layer is reached.
Considering pyramid size and reading speed, we consider an image
repository with four or five layers to be the most effective and
to give the highest performance. We use FWTools, which builds on
GDAL's powerful data processing capabilities, to slice each layer
and convert data from the command line. FWTools also offers a
Python programming interface.
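The power-of-2 rule can be made concrete with a short Python calculation. This is a sketch under stated assumptions: the function name pyramid_levels is hypothetical, the tile size of 256 pixels is assumed, and the actual re-sampling and slicing (done with FWTools in this work) is not shown. The function only computes the pixel size and the tile grid of each layer.

```python
import math


def pyramid_levels(width, height, levels, tile_size=256):
    """Return (level, width, height, tile_cols, tile_rows) for each layer.

    Level 0 is the original image; each higher level halves the scale.
    """
    result = []
    for level in range(levels):
        w = max(1, math.ceil(width / 2 ** level))
        h = max(1, math.ceil(height / 2 ** level))
        cols = math.ceil(w / tile_size)  # partial tiles at the edge count
        rows = math.ceil(h / tile_size)
        result.append((level, w, h, cols, rows))
    return result


for layer in pyramid_levels(4096, 2048, 4):
    print(layer)
```

For a 4096 x 2048 image and four layers, the tile grid shrinks from 16 x 8 tiles at level 0 to 2 x 1 tiles at level 3, so a zoomed-out view needs far fewer tiles.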
The server provides a series of core functions such as creating
databases, getting maps and displaying images. When users issue
image requests at different scales, the server returns the data
from the corresponding pyramid layer, which speeds up display and
relieves network bottlenecks.
C. Data Schema Design
TABLE I. IMAGE TILE DATA SCHEMA
Row Key  Family     Qualifier              Cell Value       Timestamp  Remark
tileID1  tile       tile:name              (illegible).tif  t1         name of the tile
                    tile:width             256              t2         image width
                    tile:height            256              t3         image height
                    tile:level             0                t4         pyramid layer
                    tile:row               1                t5         row no. in layer
                    tile:column            3                t6         column no. in layer
         image      image:type             tif              t7         image format
                    image:size             192k             t8
                    image:data             byte[]           t9         data store
         band       band:count             3                t10        band information
                    band:datatype          GDT_Byte         t11
                    band:colorinterp       Red              t12        one of the bands
                    band:xsize             256              t13        band size
                    band:ysize             256              t14
         attribute  attribute:left         (illegible)      t15        tile bbox
                    attribute:right        (illegible)      t16
                    attribute:top          (illegible)      t17
                    attribute:down         (illegible)      t18
                    attribute:time         (illegible)      t19        picture taken time
         project    project:gcp            null             t20
                    project:GT             GeoTransform     t21        project info
         metadata   metadata:(illegible)   String           t22        meta describes
                    metadata:meta          (illegible)      t23        map metadata
tileID2
...
Cells marked (illegible) could not be recovered from the source.

Table I shows how a single tile is stored in HBase. We first design
the family schema in advance and divide an image tile into six
parts. Tile gives the basic description of the image, including
name, width, height, zoom level, column and row. Image describes
the tile and manages different versions of the data. Band provides
detailed information on each band stored in the image. Attribute
gives the extent of the image, and project holds the projection
datum, which is always either GCPs or a GeoTransformation. Last but
not least, metadata presents the foundation of the tile. Once the
families are defined, data can be added dynamically. A write to a
row takes a lock, while reads are not locked. After data is
updated, timestamps are added automatically. We can also design a
column to store the relationships between tiles, which makes
querying and locating tiles efficient.
HBase is a distributed database similar to BigTable. Both are
sparse, persistent, multidimensional sorted maps. The map is
indexed by row key, column key and timestamp, and each value is an
uninterpreted array of bytes. In other words, the data are untyped
strings. The spatial index in HBase is a two-level structure. The
first-level index is the row key, which leads to the column
families and locates the data set within the huge database. The
second-level index is the column key, through which we get the
exact image from the column family.
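The (row key, column key, timestamp) addressing can be sketched in Python as follows. This is a toy model, not HBase code: the class VersionedCells and its methods are hypothetical, and timestamps are passed in explicitly instead of being assigned by the server. It shows how the same cell can hold several versions and how a read returns the newest one unless an older point in time is requested.

```python
class VersionedCells:
    """Toy map from (row key, column key, timestamp) to an untyped value."""

    def __init__(self):
        self._cells = {}  # (row_key, column_key) -> [(timestamp, value), ...]

    def put(self, row_key, column_key, value, timestamp):
        versions = self._cells.setdefault((row_key, column_key), [])
        versions.append((timestamp, value))
        versions.sort(key=lambda v: v[0], reverse=True)  # newest first

    def get(self, row_key, column_key, max_timestamp=None):
        """Return the newest value, or the newest one not after max_timestamp."""
        for ts, value in self._cells.get((row_key, column_key), []):
            if max_timestamp is None or ts <= max_timestamp:
                return value
        return None


cells = VersionedCells()
cells.put("tileID1", "image:data", b"old-bytes", timestamp=1)
cells.put("tileID1", "image:data", b"new-bytes", timestamp=2)
print(cells.get("tileID1", "image:data"))
print(cells.get("tileID1", "image:data", max_timestamp=1))
```

The row key narrows the search to one tile, the column key selects the cell, and the timestamp distinguishes versions of that cell.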
D. Data Operations in HBase
A client connects to HBase through a common interface, and
connections can be made through the Java API, Thrift, REST or the
HBase shell. A cache is usually maintained at the client; for
example, the location of each HRegion is kept in the client cache.
In a normal write, data is first written to the log and the
MemStore. When the size of the MemStore reaches a threshold, a new
MemStore is created and the old one is placed on the flush list to
be written to a StoreFile. In a read, HBase locates the necessary
data block using the column index and Bloom filter, deserializes
the image and returns the data to the client. Fig. 5 shows the
process of writing and reading data.
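The write and read paths just described can be modelled with a small Python sketch. It is a simplification under stated assumptions and not HBase's implementation: the class ToyStore is hypothetical, the flush threshold counts entries rather than bytes, the flush happens synchronously instead of through a flush list, and no Bloom filter or block index is modelled.

```python
class ToyStore:
    """Toy write path: log first, then MemStore, flushed to sorted files."""

    def __init__(self, flush_threshold=3):
        self.flush_threshold = flush_threshold
        self.log = []          # append-only record of every write
        self.memstore = {}     # recent writes held in memory
        self.store_files = []  # each flush produces one sorted, immutable list

    def put(self, key, value):
        self.log.append((key, value))
        self.memstore[key] = value
        if len(self.memstore) >= self.flush_threshold:
            # Freeze the full MemStore as a sorted file and start a new one.
            self.store_files.append(sorted(self.memstore.items()))
            self.memstore = {}

    def get(self, key):
        if key in self.memstore:
            return self.memstore[key]
        # Search flushed files from newest to oldest.
        for store_file in reversed(self.store_files):
            for k, v in store_file:
                if k == key:
                    return v
        return None


store = ToyStore(flush_threshold=2)
store.put("a", 1)
store.put("b", 2)   # reaches the threshold and triggers a flush
store.put("a", 3)   # newer value stays in the fresh MemStore
print(store.get("a"), store.get("b"), len(store.store_files))
```

Reads check the MemStore before the flushed files, so the most recent write for a key always wins.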

Figure 5. Data operations:(a)data writing, (b) data reading
HBase provides simple operations such as put, get, update and
delete to manage mass data. In other words, elementary operations
are quite simple in HBase through the Java API. We do not have to
implement complex relational operations among tables, because
there are no direct relations between tables.
E. The Client Implementation of HBGIS

Figure 6. Ajax timing diagram
As shown in Fig. 6, we use Google Web Toolkit to build the Rich
Internet Application and HBase as the tile source. First, the user
sends a map request through the browser; this request is
intercepted by the Ajax engine and turned into an asynchronous
XML request. The server accepts and analyses these requests and
returns data to the client, and during the data transfer the user
can do other things, such as moving the map. Ajax technology is
therefore well suited to client map applications. Its obvious
advantage is that it achieves the goal of building a common
geographic information sharing platform in a very natural way.
IV. RESULT AND ANALYSIS
A. Part of The Results
A data manipulation program is indispensable for every backend
database. Fig. 7 shows the management page of HBGIS, built with
FWTools. We can generate an image pyramid through this program.

Figure 7. Backend data management
In our system, we use one special data processing table and many
data serving tables to store and handle the massive images. The
original image information is put into the data processing table
by a program that formats the data. Each row in the data
processing table represents a tile of the physical map, and the
geographical naming of the keys ensures that these separate blocks
form one large map. Multiple column groups ensure that the data is
sparse and that a single storage file does not become too large. A
background program processes this data on a regular basis,
organizes it and loads it into the serving tables, and then clears
the processed raw data. The data serving tables consist of several
data tables and an index table.

Figure 8. A simple client of HBGIS
Fig. 8 shows the client implementation of HBGIS. In this simple
prototype system, the user gives four parameters to control the
image that will be returned. The area parameter selects which
pyramid to search, and the zoom level gives the corresponding
layer in the pyramid. The last two parameters are the center
coordinates of the map to be displayed on screen. The server
transfers the data blocks selected by the user to the client,
where they are joined into a seamless map.
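How center coordinates and a zoom level translate into a set of tiles can be sketched in Python. The sketch makes several assumptions that are not in the paper: the center is given in integer pixel coordinates of the chosen layer, the viewport size is known, tiles are 256 pixels square, and the function name tiles_for_view is hypothetical.

```python
def tiles_for_view(center_x, center_y, view_w, view_h, cols, rows, tile_size=256):
    """Return (row, col) indices of the tiles that cover the viewport.

    cols and rows give the tile grid of the selected pyramid layer.
    """
    left = max(0, center_x - view_w // 2)
    top = max(0, center_y - view_h // 2)
    right = center_x + view_w // 2    # exclusive pixel bound
    bottom = center_y + view_h // 2   # exclusive pixel bound
    first_col = left // tile_size
    first_row = top // tile_size
    last_col = min(cols - 1, (right - 1) // tile_size)
    last_row = min(rows - 1, (bottom - 1) // tile_size)
    return [(r, c)
            for r in range(first_row, last_row + 1)
            for c in range(first_col, last_col + 1)]


print(tiles_for_view(512, 512, 512, 512, cols=8, rows=4))
```

Each returned index pair can be turned into a row key and fetched, and the client then places the tiles side by side to form the seamless map.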
B. Analysis of HBGIS
Because a data block can be located directly in the massive
database from the current resolution and the coordinate range,
server performance does not degrade as the database grows. HBGIS
has solved several data management problems, as follows.
1) Data independence
Although different images have different formats, the
object-oriented design lets us manage data quickly without
considering the image format and type. This characteristic fully
embodies the independence of a database.
2) Spatial inaex of the image
HBGIS built a simple but eIIective index system to get
right tile data in diIIerent version and the index system in
Hadoop will help to acquire high perIormance.
3) Data query
HBGIS supports data query, including the scope oI the
geographic coordinates, the image acquired time, acquisition
mode and projection inIormation, etc.
4) Multi-version data management capabilities
The column-based schema works well for managing multi-version
data. The unique row key and the timestamp distinguish different
tiles as well as different versions of the same tile.
5) Ability to support multi-source data
Because data can be added dynamically and every column can be
extended to store more information, multi-source images can be
loaded into HBase conveniently.
Based on the performance of our prototype system, HBGIS, and the
analysis above, we have achieved the system design goal of
publishing mass remote sensing images on the Web.
V. DISCUSSION
Applications deployed on the Internet are immediately accessible
to a vast population of potential users, and with the arrival of
Web 2.0, users demand more and more data. As a result, relational
database products face serious load problems. Because large
amounts of unstructured information must interact with the
database, we need a distributed and scalable framework. A NOSQL
database is a suitable platform for storing unstructured data.
Another difference between a NOSQL database and a relational
database is that it uses a column-based model rather than a
row-based one. These two features make the storage structure very
loose and the content fields very simple. In many respects NOSQL
databases cope better than relational databases, in particular
with data consistency.
As a new distributed database technology, NOSQL databases have
some disadvantages as well. As Ricky Ho noted, query capability is
the weak point of NOSQL compared with relational databases. At
present, many NOSQL databases are based on the DHT (Distributed
Hash Table) model, so a query is equivalent to a hash table
lookup.
In summary, NOSQL databases face some challenges. Because NOSQL
databases do not work with SQL, they require manual query
programming, which can be fast for simple tasks but
time-consuming for others. In addition, complex query programming
for these databases can be difficult. Relational databases offer a
rich choice of data types and storage options, whereas HBase has
only a single string type, and if the HMaster server crashes, the
whole system becomes unavailable. Nevertheless, NOSQL databases
like HBase have shown their potential for massive data storage
and good extensibility. In short, when and how to use these new
NOSQL databases depends on the application.
VI. CONCLUSION
It is clear that the revolution in Web technology will make a big
difference to GIS, and NOSQL databases have considerable
advantages over relational ones, especially as data volumes
increase day by day. The column-based distributed database model
of BigTable and HBase therefore works more effectively for massive
data storage and is better adapted to the needs of global image
applications. A flexible distributed architecture can use
inexpensive hardware to build a large data warehouse, and cluster
configuration can enhance the quality and efficiency of the global
image service. This work has given us more confidence that NOSQL
databases have great potential in remote sensing image
management. Our next step is to use these NOSQL databases as the
data source for image web systems based on WMTS, which is an OGC
standard.
REFERENCES
[1] P. Bolstad, GIS Fundamentals: A First Textbook on Geographic
Information Systems, 3rd ed., Bookmasters Dist, 2008.
[2] D. Borthakur, "The Hadoop Distributed File System: Architecture and
Design," The Apache Software Foundation,
http://hadoop.apache.org/core/docs/current/hdfs_design.pdf, 2007.
[3] X. Liu, J. Han, Y. Zhong, and C. Han, "Implementing WebGIS on
Hadoop: A case study of improving small file I/O performance on
HDFS," Proc. IEEE Conf. Cluster Computing and Workshops,
CLUSTER '09, pp. 1-8, 2009, doi:10.1109/CLUSTR.2009.5289196.
[4] C. Fay, D. Jeffrey, G. Sanjay, H. Wilson C, and W. Deborah A,
"Bigtable: A distributed storage system for structured data," ACM
Transactions on Computer Systems, vol. 26(2), 2008, pp. 1-26.
[5] R. Francis and L. Kevin, "Rapid Software Prototyping Using Ajax and
Google Map API," Proc. IEEE Conf. Advances in Computer-Human
Interactions, 2009, pp. 317-323, doi:10.1109/achi.2009.68.
[6] A. Stefan, J. Dean, K. Alfons, and S. Michael, "A Comparison of
Flexible Schemas for Software as a Service," Proceedings of the
International Conference on Management of Data and 28th Symposium
on Principles of Database Systems, pp. 881-888, 2009.
[7] J. Ernst, "SQL Databases v. NoSQL Databases," Communications of the
ACM, vol. 53(4), 2010, pp. 10-11.
[8] N. Leavitt, "Will NoSQL Databases Live Up to Their Promise?" IEEE
Computer Society, vol. 43(2), 2010, pp. 12-14.
[9] A. Khetrapal and V. Ganesh, "HBase and Hypertable for large scale
distributed storage systems: A Performance evaluation for Open Source
BigTable Implementations,"
http://www.ankurkhetrapal.com/downloads/HypertableHBaseEval2.pdf,
accessed 2011.
[10] K. Murali, K. Balaji, R. Anand, and S. Sriram, "Implementation and
performance evaluation of a hybrid distributed system for storing and
processing images from the web," Proc. IEEE Conf. Cloud Computing
Technology and Science, pp. 762-767, 2010.
