Abstract: The increase of spatial data has led organizations to upload their data onto third-party service providers. Cloud computing allows data owners to outsource their databases, eliminating the need for costly storage and computational resources. The main challenge is maintaining data confidentiality with respect to untrusted parties as well as providing efficient and accurate query results to the authenticated users. We propose a dual transformation scheme on the spatial database to overcome this problem, while the service provider executes queries and returns results to the users. First, our approach utilizes the space-filling Hilbert curve to map each spatial point in the multidimensional space to a one-dimensional space. This space transformation method is easy to compute and preserves the spatial proximity. Next, the order-preserving encryption algorithm is applied to the clustered data. The user issues spatial range queries to the service provider on the encrypted Hilbert index and then uses a secret key to decrypt the query response returned. This allows data protection and reduces the query communication cost between the user and service provider.

Keywords: Spatial Data, Outsourced Database, Space-Filling Curves, Order-Preserving Encryption, Database Security

I. INTRODUCTION

Cloud computing has gained wide popularity in recent times due to its attractive features. Database outsourcing is a common cloud computing paradigm that allows data owners to take advantage of its on-demand storage and computational resources [1]. For a small cost, organizations with limited resources can outsource their large volumes of data to a third-party service provider and utilize its dynamically scalable storage as well as computational power. However, the fact remains that the data is controlled by an untrusted third party, and this raises critical security issues such as confidentiality and privacy. Data confidentiality requires that data is not disclosed to untrusted users and data privacy assures that data is not altered before being processed by the server.

Recently, mobile devices and navigational systems have become exceedingly common and this has created the need for location-based services (LBSs) of various kinds. This in turn has led to an increase in spatial data which has to be managed and maintained effectively. Numerous users require LBSs on a daily basis and would like to issue spatial queries in an anonymous manner with guaranteed results. Also, the real spatial data owners do not want to reveal the data to the service provider in order to maintain the privacy of the data. A scenario of such an exchange is where a data owner outsources its data to a service provider like Google Maps. In the process, the data owner does not want to expose the sensitive information to the server. The authenticated users send queries to the service provider for relevant information, but do not want to reveal their location to the server.

Thus, one has to consider the following requirements when outsourcing spatial databases in the cloud environment. First, the database content should be kept hidden from the service provider and malicious attackers. Naturally, there exists a straightforward solution to protect the data owner's privacy: the data owner can encrypt all spatial data points and send only encrypted objects to the service provider without any other information. During the query phase, an authorized user retrieves all the encrypted data from the server, decrypts it and searches for the required data points. This would provide perfect security, but clearly it cannot be used in real-time applications as the resulting communication cost will be extremely high, especially if only a small portion of the data is required. Furthermore, the attractive features of the cloud environment such as scalability are ignored.

Another important issue to resolve is the development of efficient query processing techniques for encrypted data at the server. Several specialized encryption techniques have been proposed, such as the order-preserving encryption (OPE) technique by Agrawal et al. [2]. OPE hides the original distance values while allowing simple comparisons to be correctly evaluated at the SP. A relatively new encryption scheme is the Fully Homomorphic Encryption technique [3] proposed by Gentry et al., which enables direct computation on encrypted data stored at the service providers in the cloud. Different types of queries can be processed without decrypting the data. However, all known homomorphic schemes are too inefficient for use in practice and suffer from high performance overhead [4].

Most existing approaches protect the outsourced data using spatial transformation schemes [5]-[8] or conventional cryptographic techniques [9], [10]. However, to the best of our knowledge, with most schemes there is a trade-off between data confidentiality and efficient query processing. To overcome these limitations, we propose a two-layer encoding approach HELP (Hilbert-curve transformed Encrypted List of Packets), where the spatial points are transformed and then an encryption technique is applied to the transformed spatial space. The cloud architecture model comprises three main entities, namely the Data Owner (DO), Service Provider (SP) and Authenticated User (AU).

The DO guarantees security by transforming and encrypting the spatial database before outsourcing to the SP. To
indexes, while sending the data to the SP. The scheme proves to be inefficient as the buckets are not hierarchically structured and this demands a linear search on the number of buckets. Agrawal et al. [2] point out that bucketing is vulnerable to the estimation attack, where the attacker can infer the approximate values of the encrypted data. Motivated by this, they were the first to propose an order-preserving encryption scheme for one-dimensional numeric data.

The OPE scheme in [2] allows the server to execute range queries directly on the encrypted representation. While their technique supports efficient processing of range queries at the SP, it assumes that an attacker cannot stage a known-plaintext attack. Additionally, the first formal study of the security provided by OPE schemes was conducted by Boldyreva et al. in [15]. In contrast, our approach supports the general case where an attacker knows a sample of the data points as well as their encrypted values (i.e. known-plaintext attack). Furthermore, in our transformation scheme HELP, we utilize the more secure OPE protocol proposed by Boldyreva et al. in [16].

III. SPACE TRANSFORMATION AND ENCRYPTION

To preserve the privacy of spatial data, we propose to hide the original spatial data points in two ways. First, we transform the space by converting the 2-D points to 1-D using the Hilbert Space Key [6]. Next, we encrypt the resulting Hilbert indices and data points using an order-preserving encryption technique [16]. Both the transformation key and the encryption key are transmitted by the DO to the trusted AUs over a secure communication channel using SSL without the need for any costly tamper-resistant devices.

B. Hilbert Packet List Construction

The process starts with static spatial data points distributed in the grid. Spatial points are traversed and indexed based on the order in which they are visited by the curve. Next, the points are assigned Hilbert cell index values based on the curve. The grid is spanned according to the curve using the Hilbert Space Key (HSK) [19]. The $HSK = \{x_0, y_0, \theta, n\}$, where $(x_0, y_0)$ is the curve's starting point, $\theta$ is the curve's orientation and $n$ is the curve order. Based on the HSK, it is possible for two or more points to have the same cell index value in the curve.

Fig. 1. Hilbert Space Transformation and Encryption

The Hilbert Packet List (HPL) construction process at the DO is as follows. Each spatial point in the space is assigned a Hilbert cell index in the grid. Next, data points are stored in packets based on their index value. Each packet is represented by three fields: $P_s$, the starting Hilbert index for that packet; $P_e$, the ending Hilbert index; and $P_c$, a list of a fixed number of original spatial points in the packet. $P_s$ and $P_e$ represent the location of the first and last data points in the packet, respectively. The size of $P_c$ is determined by $c$, the number of points stored per packet.

For clarity, Figure 1 illustrates a simple example of the proposed approach where the Hilbert curve order is 3 and the $P_c$ size is 4. For the first packet, the grid is traversed from $P_s = 0$, filling $P_c$ with $c$ points, and $P_e$ is the cell index of the $c$-th object in the packet. The rest of the HPL is filled in a similar manner until all the points have been stored. The DO sends the HSK to the AU for mapping the spatial range query to cell indices.

The key requirements of a secure database outsourcing scheme demand that: 1) data confidentiality is maintained on the server and 2) queries are efficiently processed by the SP and results are returned to the user without any alteration. Since our approach provides a dual layer of security, we analyze the security of the Hilbert space-filling curve based on the brute-force attack and the location approximation attack. Furthermore, the popular known-plaintext attack on the order-preserving encryption is presented.
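To make the packet construction of Section III-B concrete, the following Python sketch maps points to Hilbert cell indices and groups them into packets of $c$ points. It is only an illustration under stated assumptions, not the implementation evaluated in this paper: the names `hilbert_index`, `cell_of`, `build_hpl` and `Packet` are hypothetical, the curve uses a fixed starting point and orientation (so the $x_0$, $y_0$ and orientation parameters of the HSK are not modeled), points are assumed to lie in the unit square, and the order-preserving encryption step is omitted.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]


def hilbert_index(order: int, x: int, y: int) -> int:
    """Hilbert index of grid cell (x, y) on a 2^order x 2^order grid.

    Standard bitwise conversion with a fixed start cell and orientation
    (assumption: the HSK start point and orientation are not modeled).
    """
    side = 1 << order
    d = 0
    s = side >> 1
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve has canonical orientation.
        if ry == 0:
            if rx == 1:
                x = side - 1 - x
                y = side - 1 - y
            x, y = y, x
        s >>= 1
    return d


def cell_of(p: Point, order: int) -> Tuple[int, int]:
    """Grid cell containing a point of the unit square (assumed domain)."""
    side = 1 << order
    return min(int(p[0] * side), side - 1), min(int(p[1] * side), side - 1)


@dataclass
class Packet:
    ps: int           # P_s: starting Hilbert index of the packet
    pe: int           # P_e: Hilbert index of the last point in the packet
    pc: List[Point]   # P_c: up to c original spatial points


def build_hpl(points: List[Point], order: int, c: int) -> List[Packet]:
    """Sort points along the curve and cut them into packets of c points."""
    indexed = sorted((hilbert_index(order, *cell_of(p, order)), p) for p in points)
    packets: List[Packet] = []
    for i in range(0, len(indexed), c):
        chunk = indexed[i:i + c]
        # The first packet starts at index 0, as in the Figure 1 example;
        # later packets start at the index of their first point (assumption).
        ps = 0 if i == 0 else chunk[0][0]
        packets.append(Packet(ps=ps, pe=chunk[-1][0], pc=[p for _, p in chunk]))
    return packets


# Hypothetical data: ten points, curve order 3, packet size c = 4.
example_points = [(0.05, 0.10), (0.20, 0.15), (0.30, 0.40), (0.10, 0.45),
                  (0.15, 0.80), (0.40, 0.90), (0.60, 0.85), (0.90, 0.70),
                  (0.85, 0.30), (0.70, 0.05)]
example_hpl = build_hpl(example_points, order=3, c=4)
```

Sorting by Hilbert index before cutting the list into packets keeps spatially close points in the same packet, which is the proximity property the scheme relies on.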
1) Brute-Force Attack: If an attacker is aware of the space-transformation technique used (i.e. Hilbert curve in our method), as well as a subset of the original spatial data points along with their transformed Hilbert cell values, the attacker can try to determine the key of the transformation technique. The study in [6] suggests that it is infeasible for a malicious adversary to infer the exact transformation key being used.

2) Location Approximation Attack: The spatial data points are stored along with their indices in the HPL. Each packet in the HPL (cf. Figure 1) consists of the starting Hilbert index, ending Hilbert index and the set of points in that range. The encrypted HPL is stored at the SP. The security of the Hilbert space transformation can be exposed if the attacker can gather limited background knowledge about the data without the HSK. We assume that the passive attacker has: 1) a subset of the original spatial points, $S \subset SD$, as well as 2) the corresponding subset of their Hilbert index ranges in the HPL, $S' \subset R$. Thus, the attacker can try to approximate the original locations of the spatial points in $SD \setminus S$ by making use of the Hilbert index ranges in $R \setminus S'$, but without knowing the actual Hilbert index value of each point.

3) Known-Plaintext Attack: The attacker is required to obtain the plaintexts for the encrypted data. Order-preserving encryption schemes are deterministic and ensure that the numerical ordering of plaintext data is preserved in the ciphertext domain. The one-wayness property of the encryption was proven to hold by Boldyreva et al. in [16], where the adversary is unable to invert the encryption without knowledge of the key. However, the adversary may be able to gain information about the order of encrypted values revealed by the OPE scheme and predict the plaintext values (i.e. if $(p_1, c_1)$ and $(p_2, c_2)$ are known for $p_1 < p_2$ and no other known plaintext-ciphertext pairs occur between these two). The more plaintext-ciphertext pairs known by the adversary, the weaker the security provided by OPE.

IV. SPATIAL RANGE QUERY

Our approach deals with two-dimensional spatial range queries due to their popularity. When a query request is initiated by the AU, the range query is converted to a set of 1-D numbers, and this includes all Hilbert cells that partially or completely overlap with the query region. Since some of these cells only partially overlap with the query, the set of indices might retrieve extra data points (i.e. false positives) in the query response. The SP is responsible for processing the query request. The data points in the original space are encoded and their Hilbert cell values are stored in the HPL, which is then encrypted and stored at the SP. Figure 2 shows the range query process on the example given in Figure 1. Given the coordinates of opposite corner points of a range query, the AU converts the query into a set of Hilbert cell values that belong to the region. The mapping of a query is made possible using the HSK provided by the DO, without any knowledge of the original space distribution. Next, the AU encrypts the query request using the OPE key and then sends it to the SP. In Figure 2, the values sent correspond to the last two packets of the HPL. The SP makes simple comparisons over the encrypted HPL given the encrypted Hilbert cell indices and returns the relevant encrypted $P_c$ lists to the AU. Even though eight data points (i j k l m n o p) are returned, only four points (l m n o) are contained in the query area (cf. Figure 1). The AU then decrypts the retrieved data points using the OPE key and generates the actual query response.

Algorithm 1 shows the complete spatial range query procedure. Lines 1-5 list the process at the AU. First, the query region is converted into a set of Hilbert cell values (Line 1). Next, this set of values is encrypted using the OPE key and sent to the SP. Lines 6-13 represent the processing at the SP side. Each Hilbert cell value is checked to see if it falls within the start and end range of the packet. The packet contents are added to the query response if the packet qualifies (Lines 8-9). Lastly, the result is returned to the AU.

Fig. 2. Spatial Range Query Execution

Algorithm 1 Spatial Range Query
Input: $[(cx_0, cy_0), (cx_1, cy_1)]$: opposite corners of Query Region
Output: Result: Spatial data points covered by the query response
1) QR = using $(cx_0, cy_0)$ and $(cx_1, cy_1)$, obtain cells contained in query
2) Hilbert Cell Indices = {}
3) for all ($c \in QR$)
4)     Hilbert Cell Indices = Hilbert Cell Indices $\cup$ (encrypted) computed Hilbert Cell Index of $c$
5) end for
6) for all ($P \in HPL$)
7)     while (Hilbert Cell Indices $\geq P_s$)
8)         if (Hilbert Cell Indices $\in [P_s, P_e]$)
9)             Result = Result $\cup$ $P_c$
10)        end if
11)    end while
12) end for
13) Return Result to AU

V. SECURITY ANALYSIS

Before showing the experimental results, we first analyze the security of our transformation and encryption scheme. Having described the attack model in Section III-D, here we quantify the attacker's background knowledge and empirically measure the security of the Hilbert curve and the order-preserving encryption.

A. Hilbert Packet List

Starting with the brute-force attack, the attacker's entire search space will have $2^b \times 2^b \times 2^b$ elements for $x_0$, $y_0$ and $\theta$. Let us denote the number of possibilities for the curve order $n$ as $N$. Since $N \ll 2^b$, the complexity of the brute-force attack
to find the transformation key is $O(2^{3b})$, where $b$ is the number of bits used to represent each parameter in the HSK. Choosing a large enough value for $b$ will make the Hilbert mapping irreversible. To accurately find the curve's starting point, it should lie on the intersection of two edges. Using $b$ bits for each of $x_0$ and $y_0$, the attacker can generate $2^b$ values on each axis and this will require an exhaustive search over the grid. Likewise, for the curve orientation, the entire continuous 360° space should be discretized to generate $2^b$ values. Assuming that $b$ is chosen to be 16 bits, the resulting complexity of finding the HSK parameters would be $O(2^{3 \times 16}) = O(2^{48})$ for the different possibilities of the curve order $n$.

For the location approximation attack, the estimation will simply reveal the Hilbert index range of the spatial point and not the Hilbert index value itself. The Hilbert start and end ranges in each HPL tuple are determined by the packet size value, $c$. Moreover, in the transformed Hilbert space, the distance between original points and transformed index range values is not proportional. We can measure the privacy disclosure risk (PDR) of the Hilbert curve using the definition provided in [8]. For any given dataset, we compute the PDR as $1/N_T$, where $N_T = totalSpatialPoints / c$ denotes the number of tuples in the HPL. Based on the selected packet size and curve order shown in the experimental analysis (Section VI), the resulting PDR is low, such that $PDR \approx c / totalSpatialPoints$.

In the worst case, if an HPL packet encrypted using OPE is decrypted, the adversary can only determine the range of Hilbert cells the spatial point belongs to and not the exact point location. This shows that the proposed HELP approach is secure against such attackers.

B. Order-Preserving Encryption

When the adversary knows a certain number of plaintext-ciphertext pairs, the scheme splits the plaintext and ciphertext spaces into subspaces. On each subspace, the analysis under each one-wayness definition reduces to that of the random order-preserving function (ROPF) on the domain and range of the subspace. Boldyreva et al. [16] show that the attacker cannot forward-evaluate the ROPF since that would require knowledge of the OPE key. For their analysis to apply to our approach, the ciphertext space must be at least twice the size of the plaintext space. Thus, in the OPE approach adopted by the HELP scheme, the OPE parameters are chosen in such a way that subspaces are unlikely to violate this condition.

VI. EXPERIMENTAL RESULTS

We have conducted several experiments to evaluate the performance of the proposed approach. Moreover, we empirically compare the HELP technique with the Cryptographic Transformation (CRT) method proposed by Yiu et al. [9]. The experiments are implemented in C++ and were performed on an Intel Core i7-3770 CPU @ 3.40 GHz with 16 GB of RAM running the 64-bit Ubuntu operating system.

A. Datasets

Our experiments are performed on two real-world spatial datasets: (1) City of Oldenburg (OL) Road Network, comprising 6,104 points, and (2) North East USA (NE), consisting of 123,593 points. The datasets are obtained from [20]. We normalize the domain of each dataset to the unit square $[0, 1]^2$.

B. Index Construction Time

Fig. 3. Dataset vs. Index Construction Time (s)

Figure 3 shows the amount of time it takes to build the index with the proposed approach for both datasets, together with a comparison with the CRT technique. In HELP, we first build the HPL using the HSK and encrypt it using the OPE key. In CRT, the R*-tree index structure is built and encrypted using AES [13]. For either approach, the construction trend for both datasets is nearly identical. The proposed HELP shows better performance as it can be built in approximately 5% of the time it takes to construct the R*-tree for both OL and NE. However, both the HELP and CRT indexes are built for static data and therefore cannot handle dynamic updates. In case of modifications or additions, the indexes would have to be rebuilt, encrypted and sent again to the service provider.

C. Packet Size

Fig. 4. Packet Size vs. Redundant Data (%)

The value of the packet size ($c$), which is the number of data points per packet, dictates how the HPL is built. An optimal value thus reduces the amount of redundant data points transmitted. Figure 4 shows the effect of the packet size on the data redundancy for different query sizes. The redundant data percentage is given with respect to the total amount of data returned. We vary the packet size values from 5 to 25 for spatial range query sizes of 1%, 2% and 5%, where the query size is measured as a percentage of the domain space extent. Using HELP, the amount of redundant data returned stops fluctuating when the packet size is $c = 20$. The experiments were performed on the NE dataset and a similar trend was observed for the OL dataset.
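The redundancy measured in this section can be reproduced in miniature with the SP-side part of Algorithm 1. The Python sketch below is an illustration under stated assumptions, not the code used in the experiments: `sp_range_query`, `in_region` and `redundancy_percent` are hypothetical names, the packet index ranges and point coordinates are made-up toy values (the indices are not computed from the coordinates), and encryption is omitted because an order-preserving scheme leaves the range comparisons unchanged. The toy data mirrors the Figure 2 example, where eight points are returned and four of them lie inside the query region.

```python
from typing import Iterable, List, Sequence, Tuple

Point = Tuple[float, float]
Packet = Tuple[int, int, List[Point]]  # (P_s, P_e, P_c)


def sp_range_query(hpl: Sequence[Packet], query_indices: Iterable[int]) -> List[Point]:
    """SP side (Algorithm 1, Lines 6-13): return P_c of every packet whose
    [P_s, P_e] range contains at least one queried Hilbert index."""
    wanted = sorted(set(query_indices))
    result: List[Point] = []
    for ps, pe, pc in hpl:
        if any(ps <= h <= pe for h in wanted):
            result.extend(pc)
    return result


def in_region(p: Point, lo: Point, hi: Point) -> bool:
    """True if p lies inside the axis-aligned query rectangle [lo, hi]."""
    return lo[0] <= p[0] <= hi[0] and lo[1] <= p[1] <= hi[1]


def redundancy_percent(returned: Sequence[Point], lo: Point, hi: Point) -> float:
    """Redundant points as a percentage of all points returned."""
    if not returned:
        return 0.0
    extra = sum(1 for p in returned if not in_region(p, lo, hi))
    return 100.0 * extra / len(returned)


# Hypothetical HPL with four packets of c = 4 points each.
toy_hpl: List[Packet] = [
    (0, 15, [(0.05, 0.05), (0.10, 0.20), (0.20, 0.10), (0.20, 0.20)]),
    (16, 31, [(0.10, 0.60), (0.20, 0.70), (0.30, 0.90), (0.40, 0.80)]),
    (32, 47, [(0.60, 0.90), (0.70, 0.80), (0.80, 0.60), (0.90, 0.70)]),
    (48, 63, [(0.90, 0.40), (0.80, 0.30), (0.70, 0.10), (0.90, 0.10)]),
]

# Hypothetical query: rectangle corners and the cell indices it maps to.
query_lo, query_hi = (0.75, 0.25), (1.0, 0.75)
query_indices = [44, 45, 50, 51]

returned = sp_range_query(toy_hpl, query_indices)
redundancy = redundancy_percent(returned, query_lo, query_hi)
```

In this toy case the query touches the last two packets, so all eight of their points are returned and half of them are redundant. A smaller packet size would return fewer extra points per qualifying packet at the cost of more packets to scan.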
D. Range Query Communication Cost

Fig. 5. Query Size vs. Communication Cost (KBytes)

In this experiment, we generate random spatial range queries of varying sizes and measure the amount of data (i.e. communication cost in kilobytes) exchanged between the service provider and the authenticated user. The communication cost covers two steps: 1) the AU transforms the query region into a set of Hilbert index values using the HSK and transmits these to the server, and 2) the SP searches the HPL for the corresponding spatial points and returns the encrypted result. We experiment with query sizes ranging from 0.005% to 0.1%, where each range query is a randomly distributed region in the normalized domain space. We represent each Hilbert index value by 4 bytes and each spatial point in double precision with 16 bytes. Figure 5 shows the resulting average communication cost for the NE dataset over 100 query runs for five query sizes. It is clear that the communication cost increases linearly as the query size increases, due to the increase in the number of points returned. The OL dataset results in a similar trend. We compare our HELP approach with the CRT technique and demonstrate that our method is at least twice as fast. Moreover, for query sizes greater than 0.05%, there is a sharp increase in the communication cost of CRT due to the number of messages transferred between the user and server, which is dominated by the depth of the R*-tree utilized. This is due to the fact that in CRT, the query is processed at both the user and server side, whereas in HELP, the SP processes the query and returns the result to the AU.

VII. CONCLUSION

Database outsourcing is a popular paradigm of cloud computing. Cloud computing virtualizes storage and computing resources at the server and provides data to trusted users. In this work, we aim to achieve a balance between data confidentiality at the server and integrity of the query results returned. We propose to transform the spatial database by applying the Hilbert curve with the Hilbert Space Key. Next, we make it more secure by applying order-preserving encryption to the transformed data. We define several attack models and show that our scheme provides strong security against them. Lastly, we perform experiments and demonstrate that our HELP approach is superior to the existing cryptographic approaches. Thus, we conclude that our dual transformation method not only protects the data but also enables the authenticated users to retrieve spatial range query responses efficiently.

REFERENCES

[1] Z. Xiao and Y. Xiao, "Security and privacy in cloud computing," IEEE Communications Surveys & Tutorials, vol. 15, no. 2, pp. 843-859, 2013.
[2] R. Agrawal, J. Kiernan, R. Srikant, and Y. Xu, "Order preserving encryption for numeric data," in Proceedings of the 2004 ACM SIGMOD International Conference on Management of Data. ACM, 2004, pp. 563-574.
[3] C. Gentry et al., "Fully homomorphic encryption using ideal lattices," in STOC, vol. 9, 2009, pp. 169-178.
[4] B. Hore, S. Mehrotra, M. Canim, and M. Kantarcioglu, "Secure multidimensional range queries over outsourced data," The VLDB Journal: The International Journal on Very Large Data Bases, vol. 21, no. 3, pp. 333-358, 2012.
[5] J. K. Lawder and P. J. H. King, "Querying multi-dimensional data indexed using the Hilbert space-filling curve," ACM SIGMOD Record, vol. 30, no. 1, pp. 19-24, 2001.
[6] A. Khoshgozaran and C. Shahabi, "Blind evaluation of nearest neighbor queries using space transformation to preserve location privacy," in Advances in Spatial and Temporal Databases. Springer, 2007, pp. 239-257.
[7] W.-S. Ku, L. Hu, C. Shahabi, and H. Wang, "A query integrity assurance scheme for accessing outsourced spatial databases," Geoinformatica, vol. 17, no. 1, pp. 97-124, 2013.
[8] F. Tian, X. Gui, P. Yang, X. Zhang, and J. Yang, "Security analysis for Hilbert curve based spatial data privacy-preserving method," in 2013 IEEE 10th International Conference on High Performance Computing and Communications & 2013 IEEE International Conference on Embedded and Ubiquitous Computing. IEEE, 2013, pp. 929-934.
[9] M. L. Yiu, G. Ghinita, C. S. Jensen, and P. Kalnis, "Enabling search services on outsourced private spatial data," The VLDB Journal, vol. 19, no. 3, pp. 363-384, 2010.
[10] P. Wang and C. V. Ravishankar, "Secure and efficient range queries on outsourced databases using R-trees," in 2013 IEEE 29th International Conference on Data Engineering (ICDE). IEEE, 2013, pp. 314-325.
[11] H. Hacigumus, B. Iyer, and S. Mehrotra, "Providing database as a service," in 18th International Conference on Data Engineering, 2002. Proceedings. IEEE, 2002, pp. 29-38.
[12] E. Damiani, S. Vimercati, S. Jajodia, S. Paraboschi, and P. Samarati, "Balancing confidentiality and efficiency in untrusted relational DBMSs," in Proceedings of the 10th ACM Conference on Computer and Communications Security. ACM, 2003, pp. 93-102.
[13] N. F. Pub, "197: Advanced Encryption Standard (AES)," Federal Information Processing Standards Publication, vol. 197, pp. 441-0311, 2001.
[14] A. Khoshgozaran and C. Shahabi, "Private Buddy Search: Enabling private spatial queries in social networks," in International Conference on Computational Science and Engineering, 2009 (CSE'09), vol. 4. IEEE, 2009, pp. 166-173.
[15] A. Boldyreva, N. Chenette, Y. Lee, and A. O'Neill, "Order-preserving symmetric encryption," in Advances in Cryptology - EUROCRYPT 2009. Springer, 2009, pp. 224-241.
[16] A. Boldyreva, N. Chenette, and A. O'Neill, "Order-preserving encryption revisited: Improved security analysis and alternative solutions," in Advances in Cryptology - CRYPTO 2011. Springer, 2011, pp. 578-595.
[17] B. Moon, H. V. Jagadish, C. Faloutsos, and J. H. Saltz, "Analysis of the clustering properties of the Hilbert space-filling curve," IEEE Transactions on Knowledge and Data Engineering, vol. 13, no. 1, pp. 124-141, 2001.
[18] M. F. Mokbel, W. G. Aref, and I. Kamel, "Analysis of multi-dimensional space-filling curves," GeoInformatica, vol. 7, no. 3, pp. 179-209, 2003.
[19] W.-S. Ku, L. Hu, C. Shahabi, and H. Wang, "Query integrity assurance of location-based services accessing outsourced spatial databases," in Advances in Spatial and Temporal Databases. Springer, 2009, pp. 80-97.
[20] "Real spatial datasets," http://www.cs.fsu.edu/~lifeifei/SpatialDataset.htm.