An Overflow Problem in Network Coding For Secure Cloud Storage

This article has been accepted for publication in a future issue of this journal, but has not been
fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TPDS.2018.2870890, IEEE
Transactions on Parallel and Distributed Systems
Yu-Jia Chen, Member, IEEE and Li-Chun Wang, Fellow, IEEE,
National Chiao Tung University, Taiwan
Email: allan920693@g2.nctu.edu.tw and lichun@cc.nctu.edu.tw
Abstract— In this paper, we present the overflow problem of a network coding storage system (NCSS) when the encoding parameters and the storage parameters are mismatched. The overflow problem of the NCSS occurs because the network-coded encryption yields extended coded data, resulting in high storage and processing overhead. To avoid the overflow problem, we propose an overflow-avoidance NCSS scheme that takes account of security and storage requirements in both encoding and storage procedures. We provide the analytical results of the maximum allowable stored encoded data under the perfect secrecy criterion. The design guidelines to achieve high coding efficiency with the lowest storage cost are also presented.
Index Terms— Cloud storage; network coding; data security.
I. INTRODUCTION

Network coding is attractive for its capability of achieving unconditional security. In principle, network coding simply mixes data from different network nodes based on well-designed linear combination rules. As long as part of the network-coded data is protected, an eavesdropper cannot decode the entire plaintext even with infinite computing power and time [1]. Another advantage of network coding is that no bandwidth expansion occurs compared to the cryptographic approaches.

In recent years, network coding has been introduced to enhance the security of cloud storage in which customers outsource their data to multiple clouds [2]. Though offering many advantages, cloud storage inevitably poses security threats to the outsourced data. In [3], the problem of checking the integrity of network-coded data in a secure cloud storage system was investigated. In [4], it was shown that network coding can be used to prevent eavesdropping in distributed cloud storage. However, from the aspect of implementation, the performance issues of network coding for secure cloud storage remain open. This motivates us to explore how to practically and cost-effectively store coded data in multiple clouds.

Figure 1 illustrates the secure cloud storage scenario considered in this paper. A network coding storage system consists of the following three procedures: splitting, encoding, and distributing. In this figure, an original file is split into smaller chunks of symbols. These symbols are encoded by a Vandermonde matrix [5] and then stored to different cloud databases. With network coding protection, a legitimate user can recover the entire original file, but an eavesdropper in only one cloud database cannot decode the original symbols [4].

[Figure 1 flow labels: Source: original file, split, original symbols, encode, encoded data; encoded data, decode, original symbols, assemble, original file.]
Fig. 1. An illustrative example of network coding based secure cloud storage systems.

When encoding parameters (such as the size of the encoding matrix) are not jointly designed with the storage parameters (such as the storage size per node), a secure network coding storage system may encounter the following issue. The stored coded data size in the cloud database can be longer than the original data size. This issue occurs when network-coded data are represented in the format of digits and is called the overflow problem in this paper. It is notable that the above works [2]–[4] on secure cloud storage will face the overflow problem.

Table I is an illustrative example of the overflow problem in the case of binary digits. A is the network encoding matrix in the Galois field GF($2^3$) constructed with the primitive polynomial $P(x) = x^3 + x + 1$, the $b_i$ are the original data, and the $c_i$ are the network-coded data, where i = 1, 2, 3. Assume that $c_1$ (0) and $c_3$ (101) are stored in the first database and $c_2$ (11) is stored in the second database. In this example, the bit length of $c_i$ can exceed that of $b_i$. We consider that the number of bits required to represent a single codeword (i.e., code length) is fixed for all the coded data stored in the same database. In this case, the code length used in a cloud database is the maximum bit length of the stored network-coded data. Hence, a three-bit code length is required to store $c_1$ and $c_3$, and a two-bit code length is required to store $c_2$, in the respective cloud databases. In principle, network coding should not incur more bandwidth, i.e., the encoded bit length should be equal to the plaintext bit length. In this example, the bit length of b is three, while
1045-9219 (c) 2018 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
the bit length of c is extended to six. We call it the overflow problem in this paper.

TABLE I
AN ILLUSTRATIVE EXAMPLE OF THE OVERFLOW PROBLEM IN THE CASE OF BINARY DIGITS

$A = \begin{pmatrix} 1 & 1 & 1 \\ 1 & 2 & 3 \\ 1 & 4 & 5 \end{pmatrix}$, $b = (1, 1, 0)^T$, $c = (0, 3, 5)^T = (0, 11, 101)^T$

In the overflow problem, bandwidth expansion and redundant computation occur if the data representation formats of the original data and the coded data are mismatched. This is mainly because the existing encoding process (e.g., splitting the original data) does not consider i) how to place the coded data among multiple clouds and ii) the risk of being decoded by the eavesdropper. Clearly, the extended coded data would waste storage space and degrade coding efficiency.

To solve this overflow problem, we develop a systematic design method to calculate the appropriate parameters of a network-coded cloud storage system, such as the size of the encoding matrix. The key idea of the NCSS scheme is to adopt a dynamic-length alphabet representation of network-coded data. The original data are regrouped before the encoding process. The complete encoding procedure and the data distribution scheme are jointly designed for secure cloud storage. Our contributions are described as follows.

• Formulate the overflow problem of a network-coded cloud storage system. To the best of our knowledge, the overflow problem for a network coding storage system has not been investigated in the literature yet.
• Propose an overflow-avoidance network coding based secure storage (NCSS) scheme.
• Analyze the minimum storage cost subject to different security levels and derive the upper bound of the amount of encoded data that can be stored in cloud databases to achieve perfect secrecy.
• Provide the design guidelines for the appropriate size of the encoding matrix so that the network coding process can be accelerated.

The rest of this paper is organized as follows. In Section II, we give a literature survey on the related works of network coding storage systems. In Section III, the overflow problem of a network-coded cloud storage system is formulated. In Section IV, we analyze the overflow problem. We present the overflow-avoidance NCSS scheme in Section V. In Sections VI and VII, we analyze the security and storage performances of the proposed scheme, respectively. Section VIII shows the experimental results. Finally, we give our concluding remarks in Section IX.

II. RELATED WORK

Network coding is a generalized store-and-forward network routing principle. Messages from different source nodes are combined and regenerated at the intermediate nodes according to algebraic encoding. In addition to throughput enhancement [6]–[9] and data robustness [10], the other advantages of network coding are reliability and security.

A. Network Coding for Data Recovery

Network coding can make the data recovery process more efficient, especially in distributed storage systems. In contrast to erasure coding [11], the repaired data fragments in network coding are mixed in the intermediate nodes. Hence, network coding can recover data with a smaller amount of information communicated during a repair process. As proved by [12], the data recovery problem of distributed storage systems can be translated to the routing problem of multicasting networks. A new class of storage codes based on network coding, namely regenerating codes, was proposed in [12]. For coding complexity reduction, the authors of [13] proposed a low-complexity regenerating code using a new form of coding matrix with a small field size.

As in the distributed storage systems, recent studies [14]–[16] demonstrated the feasibility of storing coded data to multiple clouds even with multiple node failures. The authors of [14] applied network coding to optimize the reliability performance of frequently accessed data in cloud storage systems. To simplify the repair procedures, network coding with network structure based on the general erasure codes was shown to reduce the repair traffic significantly [15]. A new type of regenerating code that can reconstruct coded data from multiple failures in batches rather than separately was proposed in [16].

B. Network Coding for Data Security

Network coding can prevent data from being eavesdropped in a wiretap network where a wiretapper can access any one of subsets of wiretap channels [17]. The goal of a secure network coding scheme for a wiretap network is to ensure a wiretapper obtains no information about the original message, while all the legitimate receivers can decode the message. A network coding system was built so that a wiretapper cannot obtain any information [18]. The construction of a secure linear network code for a wiretap network was presented in [19].

For network-coded distributed storage systems, the secrecy capacity was used to quantify the secure storage capacity [20]–[23]. The secrecy capacity is defined as the maximum amount of data that can be securely stored under the perfect secrecy condition. Perfect secrecy means that the eavesdropper cannot obtain any information about the source data. In other words, perfect secrecy requires that the entropy of the plaintext is equal to the conditional entropy of the plaintext given the eavesdropped data. A network coding scheme for approaching the storage upper bound under perfect secrecy was proposed in [20]. Different secure regenerating codes to achieve perfect secrecy against eavesdropping were reported in [21]–[23].

For secure storage over multiple clouds, the authors of [4] proposed a security protection scheme to prevent eavesdroppers from decoding any symbol. In [24], a link eavesdropping problem was investigated in a network-coded cloud storage
system in which transmission links between the local datacenter and its remote backup site are eavesdropped. In the considered link eavesdropping problem, the security level is defined as the probability that coded data cannot be decoded correctly. In addition to eavesdropping attacks, some recent works [25], [26] investigated how to detect when the coded data are modified.

C. Performance Issue of Network Coding

Two major challenges for designing a practical network coding system include i) the computational cost of encoding and decoding and ii) the storage cost of coded data. The destination node can decode the received packet if and only if the coefficient matrix of the packet is full rank. To decrease the probability of receiving linearly dependent packets, the coding parameters including the field size and the encoding matrix size are assumed to be large. However, larger values of the coding parameters lead to higher computational cost [27]. In addition, to decode a received codeword, the destination node requires the coding vector, which results in additional packet overhead. In particular, the computation and storage costs would be severe for a huge number of input packets [28].

To overcome the above issues, it is proposed to separate a large file into a number of small chunks to which the network coding is applied [29]. This design is also used in the network coding storage system in which the information bits are divided into groups (chunks) before encoding. However, it is still an open issue to jointly optimize the design of chunked network codes and the chunk transmission scheme [30].

D. Objective of This Paper

In this paper, we focus on the performance issue of network coding when applying network coding in multiple untrusted clouds. The objective of this work is to develop a systematic design methodology of a network-coded cloud storage system. Similar methodology for the joint coding and placement problem can be found in [31]–[34]. The authors of [31] considered the relations among the clouds during the encoding process and proposed an encoding-aware data placement scheme to achieve throughput gains of encoding operations. An adaptive network coding storage scheme was proposed in [32]. The encoding strategy is adjusted according to the transmission conditions (e.g., packet loss rate). However, the storage cost of the coded data is not considered. In [33], the authors proposed to encode chunks using binary addition and bitwise cyclic shift in order to reduce encoding complexity. It is shown that the optimal tradeoff between storage capacity and repair bandwidth can be achieved. The most relevant one to our work is [34]. It investigated how to store data reliably in multiple clouds and provided the optimal amount of data to be stored in the clouds. The storage cost is shown to be highly affected by the potential number of colluding cloud databases. However, the number of colluding cloud databases in [34] is assumed to be known, which is impractical in many applications. Compared with these previous works, our proposed methodology has the following unique features.

• We investigate the overflow problem in chunked network coding. Through analyses, we show that the number of bits to represent a symbol is an important factor related to the overflow problem.
• Different from the previous work [33] considering binary operation, we extend the performance characterization of chunked network codes using a general finite field.
• The encoding process and the data placement are jointly designed in the proposed network coding framework in consideration of the storage cost as well as security requirements. Extending [34], we consider a probabilistic security model of a network-coded cloud storage system. Our results provide a comprehensive understanding for finding the best combination of coding and storage parameters.

III. SYSTEM MODEL AND PROBLEM SETUP

Now we discuss the coding scheme and define the overflow problem.

A. System Model for NCSS

Consider the original base-d data vector $b = (b_1, \ldots, b_n)^T$, where the elements $b_i$ are independent discrete uniformly distributed integers over $\{0, \ldots, d-1\}$. To securely store b to multiple cloud databases, a network coding scheme that encodes symbols by linear transformation is considered in this paper [4].

Let an $n \times n$ Vandermonde matrix A be the encoding matrix, where $[A_{i,j}] = (a_j^{i-1})$ and the $a_i$ are distinct nonzero elements over a finite field $F_q$ for $q = 2^k > n$. Then a cloud user encodes data $c = (c_1, \ldots, c_n)^T = Ab$ and splits the encoded data into p segments. It is assumed that the cloud user can arbitrarily store any piece of the encoded data to any cloud database. Let $\tilde{c}_i$ ($i = 1, \ldots, p$) be the encoded data vector stored in the i-th cloud database. A legitimate user can collect $\tilde{c}_i$ from the cloud databases and obtain the original data by performing $A^{-1}c$.

We consider the security threat from an eavesdropper having infinite computing power and the knowledge of the encoding matrix, but with access to less than half of the cloud databases [4]. The objective of the eavesdropper is to guess the original data. The considered cloud storage system can support different security levels in different databases [35]. Define $P_{e_i}$ as the probability that the i-th cloud database is compromised. Also, the cloud user specifies a security requirement $P_u$, which represents the maximum probability that an eavesdropper can guess the original data. Next, we will show the overflow problem when distributing encoded symbols to multiple cloud databases.

B. Overflow Problem

Although a network coding scheme can prevent eavesdroppers from obtaining the information of the original data [1], the length of encoded data in digital format may become larger than the length of the original data. This phenomenon is called overflow in this paper and is formally defined as follows.
proposed methodology has the following unique features.
Definition 1 (Strictly Non-overflow) Let $l_d(a)$ be the number of digits that represents a in base d. A piece of encoded data $c = (c_1, \ldots, c_n)^T$ is strictly non-overflow if and only if $l_d(c_i) \le l_d(b_i)$ for each i. Note that the length of the encoded data is equal to that of the plaintext for a strictly non-overflow encoding process.

Definition 2 (α-bounded Non-overflow) Let $|\tilde{c}_i|$ denote the number of elements in $\tilde{c}_i$. A piece of encoded data $c = (c_1, \ldots, c_n)^T$ is α-bounded non-overflow if and only if
\[ \sum_{j=1}^{|\tilde{c}_i|} l_d(c_j) \le |\tilde{c}_i| \, \alpha \, l_d(b_i), \]
for $1 \le i \le p$.

Assume the encoded data are stored in cloud databases randomly. The increasing cost of storage or computation resources can be measured by the extension degree $\alpha = \frac{l_d(c_i)}{l_d(b_i)}$. Table II shows an example for the two different overflow cases with d = 2 and p = 2. The extension degree is bounded by 3 in case 2, compared to the strictly non-overflow case 1. Note that all the coding operations in the example are performed in the Galois field GF($2^3$), constructed with the primitive polynomial $P(x) = x^3 + x + 1$. Table III summarizes the notations used in this paper.

IV. OVERFLOW-AVOIDANCE NCSS SYSTEM

A. Overflow Analysis

Now we analyze the conditions that cause the overflow problem of a network-coded cloud storage system. Then we show how the overflow problem can be avoided by selecting the proper data length in the encoding process. We investigate the conditions of distributing coded data for achieving various security levels. Based on the above analysis, we describe the system design methods of the NCSS scheme.

The encoding parameters in NCSS are related to the overflow problem. To avoid the overflow problem, the encoding parameters can be designed according to the following theorems.

Theorem 1 Let $s_i$ be the number of digits in the base-d plaintext $b_i$ and $2^k$ be the Galois field size of encoding matrix A. Then, the NCSS system is strictly non-overflow if $s_i = s = \frac{k}{\log_2 d}$.

Proof: First, we assume that $s_i < \frac{k}{\log_2 d}$. Then, we have
\[ \frac{k}{\log_2 d} = k \log_d 2 = \log_d 2^k. \tag{1} \]
Because the coding process deals with integers, we have $s_i \le \log_d(2^k - 1)$. Since $c_i$ is distributed over $\{0, \ldots, 2^k - 1\}$, the maximum number of digits used to represent an encoded element is $l_d(c_i)_{\max} = \log_d(2^k - 1)$. Furthermore, the number of digits in $b_i$ can be represented as $l_d(b_i)$. Thus, we have
\[ s_i = l_d(b_i) \le \log_d(2^k - 1) = l_d(c_i)_{\max}. \tag{2} \]
As a result, the length of encoded data can be larger than the length of the original data, and the overflow problem occurs.

Secondly, we assume that $s_i > \frac{k}{\log_2 d}$. We take exponentiation with base d on both sides and we have $d^{s_i} > d^{\log_d 2^k} = 2^k$ from (1). Since $b_i = d^{s_i}$ contradicts the fact that the maximum value of $b_i$ is $2^k - 1$, we conclude that $s_i = s = \frac{k}{\log_2 d}$.

Theorem 2 The NCSS system is α-bounded non-overflow if $s_i \ge \frac{1}{\alpha} \log_d(2^k - 1)$ for every i.

Proof: Since $s_i = l_d(b_i)$ and $l_d(c_i)_{\max} = \log_d(2^k - 1)$, we have
\[ \begin{aligned} \sum_{j=1}^{|\tilde{c}_i|} l_d(c_j) &\le |\tilde{c}_i| \log_d(2^k - 1) \\ &= \alpha \cdot \frac{1}{\alpha} |\tilde{c}_i| \log_d(2^k - 1) \\ &\le \alpha |\tilde{c}_i| s_i \\ &= \alpha |\tilde{c}_i| \, l_d(b_i). \end{aligned} \tag{3} \]

Theorems 1 and 2 provide the criteria of selecting the length of the plaintext. Next, we discuss the relation between the security requirement and the amount of encoded stored data.

Theorem 3 The NCSS system satisfies the security requirement $P_u$ if
\[ \sum_{j=1}^{|\tilde{c}_i|} l_d(\tilde{c}_i(j)) \le \sum_{t=1}^{n} l_d(c_t) + \log_d \frac{P_u}{P_{e_i}}, \]
for $1 \le i \le p$.

Proof: Without loss of generality, we consider an eavesdropper that can access only one of the two cloud databases. Thus, the probability that an eavesdropper can guess the original data (denoted by $P_g$) is the product of the intrusion probability of the cloud database and the probability of guessing the remaining encoded digits. It follows that
\[ \begin{aligned} P_g &= P_{e_i} \, d^{-\left( \sum_{t=1}^{n} l_d(c_t) - \sum_{j=1}^{|\tilde{c}_i|} l_d(\tilde{c}_i(j)) \right)} \\ &\le P_{e_i} \, d^{\log_d P_u - \log_d P_{e_i}} \\ &= P_u. \end{aligned} \tag{4} \]

B. Proposed Scheme

Now we present our proposed overflow-avoidance NCSS scheme with the required security level $P_u$. Our proposed scheme is executed in three steps. First, a dynamic-length alphabet representation of network-coded data is adopted based on Theorems 1 and 2. Second, the original data are preprocessed and regrouped. Third, the regrouped data are encoded and distributed to the distributed cloud databases.

Figure 2 shows the system flow of the proposed overflow-avoidance NCSS scheme. Assume that a cloud user wants to store a single-digit data array $b = (b_1, \ldots, b_m)^T$ with base d to the p cloud databases. We first choose a power k for the field characteristics according to the following condition:
TABLE II
EXAMPLE OF THE DEFINITIONS FOR OVERFLOW PROBLEM

In both cases, $A = \begin{pmatrix} 1 & 1 & 1 \\ 1 & 2 & 3 \\ 1 & 4 & 5 \end{pmatrix}$.

Case     b              c                 c̃1 = (c1, c3)    c̃2 = (c2)    Strictly non-overflow    3-bounded non-overflow
Case 1   (1, 0, 0)^T    (1, 1, 1)^T       (1, 1)            (1)           Yes                      Yes
Case 2   (1, 1, 0)^T    (0, 11, 101)^T    (0, 101)          (11)          No                       Yes
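The two rows of Table II can be checked mechanically. The following Python sketch is illustrative only and is not the authors' implementation; all helper names (`gf_mul`, `encode`, `digit_len`, `check_case`, and so on) are hypothetical. It assumes d = 2, counts the value 0 as one binary digit, and takes $l_d(b_i) = 1$ in Definition 2 because every $b_i$ in Table II is a single bit.

```python
# Illustrative sketch, not the authors' code: all names below are hypothetical.
POLY = 0b1011  # primitive polynomial x^3 + x + 1
K = 3          # field GF(2^K)

A = [[1, 1, 1],
     [1, 2, 3],
     [1, 4, 5]]  # encoding matrix of Table II over GF(2^3)


def gf_mul(a, b, poly=POLY, k=K):
    """Multiply two elements of GF(2^k) by shift-and-add with reduction."""
    result = 0
    while b:
        if b & 1:
            result ^= a      # addition in GF(2^k) is XOR
        b >>= 1
        a <<= 1
        if a >> k:           # degree k reached: reduce modulo poly
            a ^= poly
    return result


def encode(matrix, data):
    """Return c = A b with all arithmetic in GF(2^k)."""
    coded = []
    for row in matrix:
        acc = 0
        for a_ij, b_j in zip(row, data):
            acc ^= gf_mul(a_ij, b_j)
        coded.append(acc)
    return coded


def digit_len(a):
    """l_2(a): number of binary digits of a (assumption: 0 uses one digit)."""
    return max(1, a.bit_length())


def strictly_non_overflow(data, coded):
    """Definition 1: l_d(c_i) <= l_d(b_i) for every i."""
    return all(digit_len(c) <= digit_len(b) for b, c in zip(data, coded))


def alpha_bounded(stored, alpha, plain_len=1):
    """Definition 2 for each database; plain_len stands for l_d(b_i)."""
    return all(sum(digit_len(c) for c in part) <= len(part) * alpha * plain_len
               for part in stored)


def check_case(data, alpha=3):
    coded = encode(A, data)
    stored = [[coded[0], coded[2]], [coded[1]]]  # c~1 = (c1, c3), c~2 = (c2)
    return coded, strictly_non_overflow(data, coded), alpha_bounded(stored, alpha)


case1 = check_case([1, 0, 0])  # Case 1 of Table II
case2 = check_case([1, 1, 0])  # Case 2 of Table II
```

Under these assumptions, `case1` and `case2` hold the coded vector followed by the two non-overflow flags, which can be compared against the last two columns of Table II.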
Condition 1
\[ 2^k \ge d. \tag{5} \]
The field size must be larger than the maximal value of the data array element, $d - 1$. Otherwise, some data elements cannot be represented in the field. After that, a proper length of data elements $s_i$ can be decided according to Theorems 1 and 2. This is called dynamic-length alphabet representation. We then regroup b to $b' = (b_1 \ldots b_{s_1}, b_{s_1+1} \ldots b_{s_1+s_2}, \cdots, b_{\hat{s}_{r-1}+1} \ldots b_{\hat{s}_r})$ based on the value of $s_i$, where $\hat{s}_r \triangleq \sum_{i=1}^{r} s_i$. Next, we generate an $n \times n$ encoding matrix A with the following condition:

Condition 2
\[ n < 2^k \tag{6} \]
and
\[ n \le r. \tag{7} \]
Since matrix A is constructed from n distinct elements over the Galois field, we have $n < 2^k$. In addition, the matrix multiplication cannot be operated if the size of the encoding matrix is larger than the size of the regrouped data array. We then encode $b'$ with A and obtain the encoded data array $c = (c_1, \ldots, c_n)^T$. Then, c is regrouped to $\tilde{c}$ by Theorem 3, which specifies the maximum amount of encoded data that can be stored in a cloud database according to the user's security requirement. Finally, the elements of $\tilde{c}$ are distributed to the corresponding p cloud databases.

TABLE III
NOTATIONS IN THIS PAPER

Notation          Description
b                 Original data array
d                 Base of $b_i$
$l_d(a)$          Number of digits that represents a in base d
A                 Encoding matrix
k                 Use Galois field size $2^k$ for A
n                 Matrix size of A
p                 Total number of cloud databases
c                 Encoded data vector
$\tilde{c}_i$     Encoded data vector that is stored in the i-th cloud database
$|\tilde{c}_i|$   Number of elements in $\tilde{c}_i$
$s_i$             Number of digits in $b_i$
$b'$              Regrouped data array
r                 Size of $b'$
$P_{e_i}$         Probability of the i-th cloud database being compromised
$P_g$             Probability that an eavesdropper can guess the original data
$P_u$             Security requirement: maximum probability that an eavesdropper can guess the original data
l                 Amount of encoded data stored at a local machine for every encoding operation
m                 Length of the original message
α                 Number of encoding operations

[Figure 2 flowchart labels: input data b with base d; decide k for GF($2^k$) by Condition 1; dynamic-length alphabet representation (strictly non-overflow scheme: decide $s_i$ by Theorem 1; α-bounded non-overflow scheme: decide $s_i$ by Theorem 2); generate A of dimension n by Condition 2; regroup b and get $b'$; encoding: $c = Ab'$; generate $\tilde{c}$ by Theorem 3 with security requirement $P_u$; distributedly store $\tilde{c}$ to cloud databases.]
Fig. 2. System flow of the overflow-avoidance NCSS scheme.

Table IV shows an example of the proposed overflow-avoidance NCSS scheme in the strictly non-overflow case.
Assume that the original data are b = (0, 0, 1, 0, 1, 1, 1, 0, 1) and the encoded data are stored to two cloud databases with $P_{e_1} = 0.5$, $P_{e_2} = 0.25$, and $P_u = \frac{1}{64}$. According to Theorem 1, we have s = 3. Hence, the original data are regrouped to (001, 011, 101) in the dynamic-length alphabet representation process. The resulting coded data is (111, 011, 001). Next, from Theorem 3, we can calculate that the maximal numbers of digits that can be stored in the first and the second cloud databases are four and five, respectively. As a result, the coded data stored in the first and the second cloud databases are 1110 and 11001, respectively.

V. SECURITY ANALYSIS

In this section, we analyze the proposed overflow-avoidance NCSS scheme in terms of security level and storage cost. First, we discuss the issue of enhancing security level from a system design aspect. Then, we derive the upper bound on data size that can be stored in the cloud with unconditional security.

To begin with, from (4) we know that the lower bound of the security requirement $P_u$ is
\[ P_{e_i} \, d^{-\left( \sum_{t=1}^{n} l_d(c_t) - \sum_{j=1}^{|\tilde{c}_i|} l_d(\tilde{c}_i(j)) \right)} \le P_u. \tag{8} \]
Since $l_d(c_t)$ increases with the size of the Galois field, a larger encoding matrix size n and a larger value of the power k of the field characteristics can result in higher security levels. However, enlarging the encoding parameters causes higher coding complexity. Next, we show that the security level can be enhanced to the unconditional security level by storing a certain amount of encoded data in the local machine. In the considered NCSS with an eavesdropper, unconditional security is equivalent to perfect secrecy, which means that the eavesdropper can get no information about the original message [36].

Definition 3 (Perfect Secrecy Criterion [37]) Denote S as the random variable associated with the secret data fragments and E as the random variable associated with the encoded fragments observed by the eavesdropper. The perfect secrecy requires
\[ H(S|E) = H(S), \]
where H(X) represents the entropy of a random variable X.

In the worst case, an eavesdropper can access the encoded data of all the cloud databases. The following theorem can be applied to specify the maximal amount of encoded data fragments that can be stored in the cloud, while keeping the rest of the data in a local machine to ensure perfect secrecy.

Theorem 4 Assume that w-digit secret information is encoded with (n−w)-digit data b. For both strictly non-overflow and α-bounded non-overflow schemes, a cloud user can store at most $\sum_{j=1}^{n} l_d(c_j) - w$ digits of encoded data to the cloud under the perfect secrecy criterion.

Proof: Let $e^{(h)}$ represent a subset containing any h components of vector e. We denote $e_{i:j}$ as the subvector formed from the i-th to the j-th position of vector e. The set of rows from the i-th to the j-th position of matrix D is represented as $D_{i:j}$. In addition, the $b_i$ are independent random variables uniformly distributed over $F_q$ with entropy $H(b_i) = H(b)$.

For simplicity, without loss of generality, assume that t contiguous components of the encoded data $c_{p+1:p+t}$ are stored to the clouds. Then we can obtain
\[ \begin{aligned} H(b^{(w)}) &= H(b^{(w)} | c_{p+1:p+t}) - H(b^{(w)} | c) \qquad (9) \\ &= I(b^{(w)}; c) - I(b^{(w)}; c_{p+1:p+t}) \\ &= H(c) - H(c_{p+1:p+t}) - H(c | b^{(w)}) + H(c_{p+1:p+t} | b^{(w)}) \\ &\le H(c) - H(c_{p+1:p+t}). \qquad (10) \end{aligned} \]
In the above equations, (9) holds because of the perfect secrecy criterion and due to the fact that the secret information can be reconstructed if the entire codewords are given. In (10), we have $H(c_{p+1:p+t} | b^{(w)}) - H(c | b^{(w)}) \le 0$ since
\[ H(c | b^{(w)}) - H(c_{p+1:p+t} | b^{(w)}) = H(c_{p+t+1:n} | b^{(w)}, c_{p+1:p+t}). \]
Since the $b_i$ are i.i.d. random variables, it follows that
\[ H(b^{(w)}) = H\left( b_{q(1)}, b_{q(2)}, \ldots, b_{q(w)} \right) = wH(b), \tag{11} \]
where q(j) is the j-th element of a random integer sequence ranging from 1 to n. Because the encoded data vector c contains the entire information of b at most, we can obtain
\[ H(c) \le nH(b). \tag{12} \]
Moreover, an $n \times n$ Vandermonde matrix A is nonsingular [5]. Thus, the eavesdropper can apply Gaussian elimination to obtain the reduced row echelon form of the submatrix S, whose elements are $[S_{i,j}] = [A_{i,j}]$ for $p+1 \le i, j \le p+t$. The Eavesdropper Reduced Matrix M can be obtained as
\[ M_{p+1:p+t} = \left( \begin{array}{cc|c|cc} m_p^1 & \cdots & & \cdots & m_p^n \\ \vdots & \vdots & I_t & \vdots & \vdots \\ m_{p+t-1}^1 & \cdots & & \cdots & m_{p+t-1}^n \end{array} \right), \tag{13} \]
where the other elements of M are the same as A. Hence, the eavesdropper has t equations to solve n unknown elements. It implies that
\[ H(c_{p+1:p+t}) = tH(b). \tag{14} \]
Substituting (11), (12), and (14) into (10), we obtain
\[ tH(b) \le nH(b) - wH(b). \tag{15} \]
The above equation shows that we can store at most $n - w$ components of encoded data to the clouds under the perfect secrecy criterion. For the strictly non-overflow scheme, we have only one digit in each component of encoded data. Thus, we can store at most $\sum_{j=1}^{n} l_d(c_j) - w$ digits of encoded data to the clouds, while keeping the remaining w digits in the local machines. However, we may have multiple digits in each component of encoded data for the α-bounded non-overflow scheme. Let $e^{(\tilde{h})}$ represent a subset containing any w
TABLE IV
EXAMPLE OF ADOPTING OVERFLOW-AVOIDANCE NCSS SCHEME IN STORING ENCODED DATA TO TWO CLOUD DATABASES

Parameter     Value
b             (0, 0, 1, 0, 1, 1, 1, 0, 1)
d             2
k             3
s             3
$b'$          (001, 011, 101)
r             3
n             3
A             $\begin{pmatrix} 1 & 1 & 1 \\ 1 & 2 & 3 \\ 1 & 4 & 5 \end{pmatrix}$
c             (111, 011, 001)
$\tilde{c}$   (1110, 11001)
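The entries of Table IV can be reproduced step by step. The Python sketch below is a minimal illustration under stated assumptions, not the authors' implementation: it handles only d = 2 in the regrouping and digit formatting, assumes that $k / \log_2 d$ is an integer, writes each coded symbol with exactly s digits, and fills the databases in order up to the limit given by Theorem 3. All function names are hypothetical.

```python
import math

# Illustrative sketch, not the authors' code: all names below are hypothetical.
POLY = 0b1011  # primitive polynomial x^3 + x + 1 for GF(2^3)


def gf_mul(a, b, k, poly):
    """Multiply two elements of GF(2^k) by shift-and-add with reduction."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a >> k:
            a ^= poly
    return result


def symbol_length(k, d):
    """Theorem 1: s = k / log2(d) (assumed to be an integer here)."""
    return round(k / math.log2(d))


def regroup(bits, s):
    """Group a binary digit array into s-digit symbols (d = 2 only)."""
    return [int("".join(map(str, bits[i:i + s])), 2)
            for i in range(0, len(bits), s)]


def encode(matrix, data, k, poly):
    """Return c = A b' with all arithmetic in GF(2^k)."""
    coded = []
    for row in matrix:
        acc = 0
        for a_ij, b_j in zip(row, data):
            acc ^= gf_mul(a_ij, b_j, k, poly)
        coded.append(acc)
    return coded


def max_digits(total_digits, p_u, p_e, d):
    """Theorem 3: largest number of base-d digits one database may hold."""
    bound = total_digits + math.log(p_u / p_e, d)
    return max(0, math.floor(bound + 1e-9))  # epsilon absorbs rounding error


def distribute(digits, limits):
    """Assumption: fill databases in order, never exceeding each limit."""
    parts, pos = [], 0
    for limit in limits:
        parts.append(digits[pos:pos + limit])
        pos += limit
    if pos < len(digits):
        raise ValueError("limits too small to store all encoded digits")
    return parts


# Parameters of Table IV.
b = [0, 0, 1, 0, 1, 1, 1, 0, 1]
d, k = 2, 3
A = [[1, 1, 1], [1, 2, 3], [1, 4, 5]]
p_e, p_u = (0.5, 0.25), 1 / 64

s = symbol_length(k, d)
b_prime = regroup(b, s)
c = encode(A, b_prime, k, POLY)
digits = "".join(format(x, f"0{s}b") for x in c)
limits = [max_digits(len(digits), p_u, pe, d) for pe in p_e]
stored = distribute(digits, limits)
```

With these inputs, `limits` evaluates to the per-database digit bounds of Theorem 3 and `stored` to the digit strings placed in the two databases, which can be compared with the last column of Table IV.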
fragmentary components of vector e. With at least n unknown digits, knowing $c^{(\tilde{w})}$ cannot help solve b. As a result, it follows that
\[ I\left( c^{(\tilde{w})}; b \right) = 0. \tag{16} \]
Note that we still have t equations to solve n unknown elements. That is,
\[ H(b^{(w)} | c_{p+1:p+t}, c^{(\tilde{w})}) = H(b^{(w)} | c_{p+1:p+t}). \tag{17} \]
Finally, we obtain
\[ I\left( c_{p+1:p+t}, c^{(\tilde{w})}; b^{(w)} \right) = I\left( c_{p+1:p+t}; b^{(w)} \right). \tag{18} \]
Consequently, we can select w digits of encoded data from w different components, i.e., select one digit for each component. These w-digit encoded data can be stored in the local machines, while the remaining $\sum_{j=1}^{n} l_d(c_j) - w$ digits are stored to the clouds.

VI. STORAGE ANALYSIS

We here analyze the amount of stored encoded data with the security requirement in terms of the probability that an eavesdropper can obtain the original data. This is because only a certain amount of encoded data fragments are stored in the local machines to enhance the security level, as shown in the previous section. As the required security level increases, the amount of encoded data stored at the local site increases.

Let a cloud user keep the length-l encoded data in each encoding operation and store the remaining encoded data to p cloud databases as shown in Fig. 3. We assume all the cloud databases have the same probability of being compromised (i.e., $P_{e_i} = P_e$) and the security requirement is $P_u$, which specifies the maximum probability that an eavesdropper can guess the original message. In addition to the encoded data, the encoding matrix is stored at the local site.

Let m and α be the length of the original message and the number of encoding operations, respectively. In addition to the encoded data, the user needs to keep the encoding matrix for decoding. In the case of strictly non-overflow storage, the storage cost at the local site is a function of the encoding matrix size n and the amount of stored encoded data l. As a result, the storage space used to store the encoded data and the encoding matrix at the local site is
\[ f(n, l) = n^2 s + \alpha l. \tag{19} \]

Subject to the security requirement $P_u$, the storage cost minimization problem can be expressed as
\[ \begin{aligned} \min \quad & f(n, l) && \text{(20a)} \\ \text{s.t.} \quad & (1 - P_e)^p d^{-\alpha l} \le P_u && \text{(20b)} \\ & 2 \le n \le 2^k && \text{(20c)} \\ & l \le n && \text{(20d)} \\ & \alpha \times n \times s = m && \text{(20e)} \\ & n, l \in \mathbb{Z}^+, && \text{(20f)} \end{aligned} \]
where s is defined in Theorem 1. An eavesdropper can guess the original message only if he/she can intrude all the cloud databases and guess the encoded data in the local machine. It is observed that the optimization problem is nonconvex even if we relax the nonconvex constraints $n, l \in \mathbb{Z}^+$. The complete algorithm for solving this optimization problem is given in the Appendix.

Figure 4 shows the optimal parameter setting for encoding matrix size n versus the original message length m for d = 2, $P_e = 0.5$, p = 3, and $P_u = 10^{-6}$. As the message length increases, the size of the encoding matrix increases. A smaller encoding matrix size is preferred if the Galois field size is large. Due to the integer constraints in the optimization problem, the encoding matrix size increases in a step-like function.

Figure 5 shows the storage cost f(n, l) versus message length m for d = 2, $P_e = 0.5$, and p = 3. Intuitively, we need more storage for lower $P_u$. However, the storage costs with various $P_u$ are the same when m exceeds a certain threshold. This is because the considered system is in the case of lower bound cost (i.e., l = 1). Notably, a larger k can yield a smaller lower bound when m > 1000. In general, $k \in [8, 16]$ [38]. For m < 1000, it is suggested that k = 8; otherwise, k = 16.

The amount of stored encoded data l is another important design parameter for the proposed NCSS. In practice, the NCSS system with large l requires a large memory to store all the coding coefficients. Fig. 6 shows the required l under different $P_u$. To achieve a higher security requirement, the user needs to store more encoded data in the local site. In addition, l can be reduced by up to 80% if a large file is encoded. It is observed that the file size plays a bigger role in determining l compared to the Galois field size.

Note that we consider a secure network coding system with no redundancy as in [1], i.e., n input symbols are encoded to n coded symbols, and we need all the n coded symbols to recover the data. As shown in [12], network coding can achieve optimal storage-bandwidth tradeoff in erasure coded-distributed storage systems. The proposed scheme can be
applied to those with redundancy, such as erasure codes [31] and regenerating codes [33]. In these cases, n input symbols are encoded to n + z coded symbols, where the amount of redundancy z depends on the required reliability level. This can be achieved by using an (n + z) × n coding matrix A in the proposed network coding scheme.

Fig. 3. An illustrative example of a user keeping a certain amount of encoded data at the local site in order to enhance security protection.

Fig. 6. The amount of stored encoded data l versus security requirement $P_u$ for different message lengths m. (Axes: l in MB versus $P_u$; curves for k = 8 and k = 16 with m = 50 MB and m = 500 MB.)
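To make the storage cost in (19) and the constraints (20c)-(20f) concrete, the following Python sketch searches the feasible (n, l) pairs exhaustively. It is only an illustration under stated assumptions, not the algorithm given in the Appendix: `is_secure` is a hypothetical callable standing in for the security constraint on $P_u$ (which is not reproduced in this section), s is treated as a given positive integer, and α is assumed to be an integer.

```python
from typing import Callable, Optional, Tuple


def storage_cost(n: int, l: int, m: int, s: int) -> int:
    """Local storage cost f(n, l) = n^2 * s + alpha * l from (19).

    alpha is derived from (20e): alpha * n * s = m.
    """
    alpha = m // (n * s)
    return n * n * s + alpha * l


def brute_force_min_cost(
    m: int,
    k: int,
    s: int,
    is_secure: Callable[[int, int], bool],  # hypothetical security check
) -> Optional[Tuple[int, int, int]]:
    """Return (cost, n, l) with minimal cost, or None if nothing is feasible."""
    best: Optional[Tuple[int, int, int]] = None
    for n in range(2, 2 ** k + 1):          # (20c): 2 <= n <= 2^k
        if m % (n * s) != 0:                # (20e) with an integer alpha (assumed)
            continue
        for l in range(1, n + 1):           # (20d), (20f): 1 <= l <= n
            if not is_secure(n, l):
                continue
            cost = storage_cost(n, l, m, s)
            if best is None or cost < best[0]:
                best = (cost, n, l)
            break  # for fixed n the cost grows with l, so the first feasible l is best
    return best
```

For example, `brute_force_min_cost(1200, 4, 8, lambda n, l: l >= 2)` uses a toy security rule that simply requires at least two locally stored symbols. A real use would replace that lambda with the paper's security constraint.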
Fig. 4. Optimal parameter setting for encoding matrix size versus message length under different Galois field sizes $2^k$. (Axes: n versus m in KB; curves for k = 8 and k = 16.)

Fig. 5. Storage cost versus message length for different Galois field sizes $2^k$ and security requirement $P_u$. (Axes: f in bits versus m in bits.)

VII. EXPERIMENTAL RESULTS

Since the encoding process is performed at local machines, processing delay may be the performance bottleneck. Thus, it is important to investigate the impact of the system design parameters of a secure network coding scheme on its delay performance. To implement the user application and cloud storage, we develop the coding layer and storage layer of NCSS. Each original file is associated with metadata that includes the coding information (e.g., encoding coefficients). The goal of our experiments is to explore the encoding performance of the proposed NCSS in terms of the file encoding time and the storage cost. Our experiments are conducted on a commodity computer with an Intel Core i5 processor running at 2.4 GHz, 8 GB of RAM, and a 5,400 RPM Hitachi 500 GB Serial ATA drive with an 8 MB buffer. Table V shows the parameter settings for the experiments. Note that, in our setting, different cloud databases are geographically separated. Hence, the presented results are equivalent to those with p clouds, each having numerous databases.

TABLE V
PARAMETER SETTING

Parameter                                                          Value
Original file size                                                 2 MB
Base of $b_i$ (d)                                                  2
Galois field size (k)                                              8 to 16
Number of cloud databases (p)                                      2 or 3
Probability of the cloud databases being compromised ($P_e$)       0.5
Security requirement ($P_u$)                                       $10^{-6}$
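As a rough illustration of the basic field operation whose cost is measured in this section, the sketch below multiplies two elements of GF($2^k$) by shift-and-add with modular reduction and times a batch of such multiplications. The reduction polynomials (0x11B for k = 8 and 0x1100B for k = 16) and the function names are assumptions made for this example, since the paper does not state which polynomials or implementation it uses. Timings from interpreted Python are not comparable to the measurements reported in the figures.

```python
import timeit

# Assumed reduction polynomials; the paper does not specify its choices.
# 0x11B   = x^8 + x^4 + x^3 + x + 1   (the polynomial used by AES)
# 0x1100B = x^16 + x^12 + x^3 + x + 1
REDUCTION_POLY = {8: 0x11B, 16: 0x1100B}


def gf_mul(a: int, b: int, k: int = 8) -> int:
    """Multiply a and b in GF(2^k) using shift-and-add with reduction."""
    poly = REDUCTION_POLY[k]
    high_bit = 1 << k
    result = 0
    while b:
        if b & 1:           # add (XOR) the current multiple of a
            result ^= a
        b >>= 1
        a <<= 1             # multiply a by x
        if a & high_bit:    # reduce when the degree reaches k
            a ^= poly
    return result


def time_multiplications(k: int, count: int = 100_000) -> float:
    """Return the wall-clock seconds taken by `count` multiplications in GF(2^k)."""
    a, b = (1 << k) - 1, (1 << k) - 2
    return timeit.timeit(lambda: gf_mul(a, b, k), number=count)
```

Calling `time_multiplications(8)` and `time_multiplications(16)` gives a quick, machine-dependent feel for how the field size affects the cost of one multiplication in this particular implementation.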
Fig. 7. Processing time versus the multiplication times for different Galois fields $2^k$. (Axes: processing time in seconds versus number of multiplications in units of $10^5$; curves for GF($2^{10}$) and GF($2^5$).)

Fig. 8. Comparison of processing time between the strictly non-overflow and the α-bounded non-overflow schemes versus matrix size n with p = 2. (Axes: processing time in minutes versus matrix size n; curves for k = 8 and k = 16, with α = 5 for the α-bounded scheme.)

We begin by estimating the cost of the basic field operation. Fig. 7 shows the multiplication processing time of the network coding storage system with different sizes of Galois field. Although the complexity of the network coding is $O(n^2)$ modular multiplications, we find that the field size affects the processing time only slightly, which supports our design methodology of selecting k. Specifically, it indicates that the security level can be enhanced significantly by selecting an appropriate value of k at a small computational cost.

To evaluate the computational efficiency of the proposed NCSS scheme, we conduct an encoding test using the proposed network coding scheme. Fig. 8 shows the processing time of the strictly non-overflow and the α-bounded non-overflow schemes for a 2 MB file with p = 2, where α = 5. The processing time is longer for a smaller n or k since the number of encoding operations increases. As a result, the system spends more time in I/O operations and fetching data between the kernel and user [10]. Compared to the strictly non-overflow scheme, the α-bounded non-overflow scheme requires more computation. The processing time of the α-bounded non-overflow scheme is more than 11 times and 22 times that of the strictly non-overflow scheme when k = 16 and k = 8, respectively. Finally, the best performance is achieved when n > 100 for both non-overflow schemes. Because increasing n results in a larger cost than increasing k, we suggest adjusting k to meet the security requirements under the condition n = 100.

Figure 9 compares the processing time of the strictly non-overflow and the α-bounded non-overflow schemes versus the power of Galois field characteristic k. In the figure, the strictly non-overflow scheme is preferable to the α-bounded non-overflow scheme. Notably, k has a negligible effect on the processing time of the strictly non-overflow scheme, but it does affect the processing time of the α-bounded non-overflow scheme.

We note that it is shown in [33] that k = 8 is preferred from the viewpoint of low computational cost in the case of m = 4 KB with no eavesdropper. This is consistent with our observation from Figs. 4 and 8. On the other hand, if storage cost is the primary concern, k = 16 is recommended based on our results.

Fig. 9. Comparison of processing time between the strictly non-overflow and the α-bounded non-overflow schemes versus power of Galois field characteristic k with p = 2. (Axes: processing time in minutes versus k; curves for n = 15, 31, and 127, with α = 5 for the α-bounded scheme.)

VIII. CONCLUSIONS

In this paper, we investigated the overflow problem in a network coding cloud storage system. The overflow problem requires more storage space and increases encoding time. We developed the overflow-avoidance network coding based secure storage (NCSS) scheme. A systematic approach for the optimal encoding and storage parameters was provided to solve the overflow problem and minimize the storage cost. Furthermore, we derived an analytical upper bound on the maximal allowable stored data in the cloud nodes under the perfect secrecy criterion. We demonstrated that encoding efficiency in terms of processing time can be improved by jointly designing the encoding and the storage system parameters. More importantly, we suggested the design guidelines for NCSS to optimize the performance tradeoff among security requirement, storage cost per node, and encoding processing time. This work can be extended to incorporate user budgets
and file recovery, which is an interesting topic to study further in the future.

REFERENCES

[1] P. F. Oliveira, L. Lima, T. T. V. Vinhoza, J. Barros, and M. Medard, “Trusted storage over untrusted networks,” IEEE Global Communication Conference, 2010.
[2] H. C. H. Chen, Y. Hu, P. P. C. Lee, and Y. Tang, “NCCloud: A network-coding-based storage system in a cloud-of-clouds,” IEEE Transactions on Computers, vol. 63, no. 1, pp. 31–44, 2014.
[3] F. Chen, T. Xiang, Y. Yang, and S. S. M. Chow, “Secure cloud storage meets with secure network coding,” IEEE Transactions on Computers, vol. 65, no. 6, pp. 1936–1948, 2016.
[4] P. F. Oliveira, L. Lima, T. T. Vinhoza, J. Barros, and M. Medard, “Coding for trusted storage in untrusted networks,” IEEE Transactions on Information Forensics and Security, vol. 7, no. 6, pp. 1890–1899, 2012.
[5] A. Klinger, “The Vandermonde matrix,” The American Mathematical Monthly, 1967.
[6] P. Li, S. Guo, S. Yu, and A. V. Vasilakos, “Reliable multicast with pipelined network coding using opportunistic feeding and routing,” IEEE Transactions on Parallel and Distributed Systems, vol. 25, no. 12, pp. 3264–3273, 2014.
[7] W. Qiao, J. Li, and J. Ren, “An efficient error-detection and error-correction scheme for network coding,” IEEE Global Telecommunications Conference, pp. 1–5, 2011.
[8] D. Zeng, S. Guo, Y. Xiang, and H. Jin, “On the throughput of two-way relay networks using network coding,” IEEE Transactions on Parallel and Distributed Systems, vol. 25, no. 1, pp. 191–199, 2014.
[9] Y. Wu and S.-Y. Kung, “Distributed utility maximization for network coding based multicasting: A shortest path approach,” IEEE Journal on Selected Areas in Communications, vol. 24, no. 8, pp. 1475–1488, 2006.
[10] C. Fragouli and J. L. Boudec, “Network coding: An instant primer,” ACM SIGCOMM Computer, vol. 36, no. 1, pp. 63–68, 2006.
[11] Y. Hu, H. Chen, P. Lee, and Y. Tang, “NCCloud: Applying network coding for the storage repair in a cloud-of-clouds,” in Proc. of the 10th USENIX Conf. on File and Storage Tech, vol. 1, 2012.
[12] A. Dimakis, P. Godfrey, Y. Wu, M. Wainwright, and K. Ramchandran, “Network coding for distributed storage systems,” IEEE Transactions on Information Theory, vol. 56, no. 9, pp. 4539–4551, 2010.
[13] S.-J. Lin and W.-H. Chung, “Novel repair-by-transfer codes and systematic exact-MBR codes with lower complexities and smaller field sizes,” IEEE Transactions on Parallel and Distributed Systems, vol. 25, no. 12, pp. 3232–3241, 2014.
[14] Y. Lu, J. Hao, X.-J. Liu, and S.-T. Xia, “Network coding for data-retrieving in cloud storage systems,” International Symposium on Network Coding, pp. 51–55, 2015.
[15] H. Zhang, H. Li, and S.-Y. Li, “Repair tree: Fast repair for single failure in erasure-coded distributed storage systems,” IEEE Transactions on Parallel and Distributed Systems, vol. 28, no. 6, pp. 1728–1739, 2017.
[16] J. Li and B. Li, “Beehive: Erasure codes for fixing multiple failures in distributed storage systems,” IEEE Transactions on Parallel and Distributed Systems, vol. 28, no. 5, pp. 1257–1270, 2017.
[17] L. Ozarow and A. Wyner, “Wire-tap channel II,” Advances in Cryptology, pp. 33–50, 1985.
[18] N. Cai and R. Yeung, “Secure network coding,” in IEEE International Symposium on Information Theory, 2002.
[19] N. Cai and R. W. Yeung, “Secure network coding on a wiretap network,” IEEE Transactions on Information Theory, vol. 57, no. 1, pp. 424–435, 2011.
[20] A. S. Rawat, N. Silberstein, O. O. Koyluoglu, and S. Vishwanath, “Secure distributed storage systems: Local repair with minimum bandwidth regeneration,” International Symposium on Communications, Control and Signal Processing, pp. 5–8, 2014.
[21] R. Tandon, S. Amuru, T. C. Clancy, and R. M. Buehrer, “Toward optimal secure distributed storage systems with exact repair,” IEEE Transactions on Information Theory, vol. 62, no. 6, pp. 3477–3492, 2016.
[22] A. Agarwal and A. Mazumdar, “Security in locally repairable storage,” IEEE Transactions on Information Theory, vol. 62, no. 11, pp. 6204–6217, 2016.
[23] K. Huang, U. Parampalli, and M. Xian, “On secrecy capacity of minimum storage regenerating codes,” IEEE Transactions on Information Theory, vol. 63, no. 3, pp. 1510–1524, 2017.
[24] Y.-J. Chen, L.-C. Wang, and C.-H. Liao, “Eavesdropping prevention for network coding encrypted cloud storage systems,” IEEE Transactions on Parallel and Distributed Systems, vol. 27, pp. 2261–2273, 2016.
[25] H. C. Chen and P. P. Lee, “Enabling data integrity protection in regenerating-coding-based cloud storage: Theory and implementation,” IEEE Transactions on Parallel and Distributed Systems, vol. 25, no. 2, pp. 407–416, 2014.
[26] F. Chen, T. Xiang, Y. Yang, and S. S. Chow, “Secure cloud storage meets with secure network coding,” IEEE Transactions on Computers, vol. 65, no. 6, pp. 1936–1948, 2016.
[27] P. Chau, T. D. Bui, Y. Lee, and J. Shin, “Efficient data uploading based on network coding in LTE-Advanced heterogeneous networks,” IEEE International Conference on Advanced Communication Technology (ICACT), pp. 252–257, 2017.
[28] S. Wunderlich, J. A. Cabrera, F. H. P. Fitzek, and M. Reisslein, “Network coding in heterogeneous multicore IoT nodes with DAG scheduling of parallel matrix block operations,” IEEE Internet of Things Journal, vol. 4, no. 4, pp. 917–933, 2017.
[29] S. Yang and R. W. Yeung, “Batched sparse codes,” IEEE Transactions on Information Theory, vol. 60, no. 9, pp. 5322–5346, 2014.
[30] B. Tang and S. Yang, “An LDPC approach for chunked network codes,” IEEE/ACM Transactions on Networking, vol. 26, no. 1, pp. 605–617, 2018.
[31] R. Li, Y. Hu, and P. P. Lee, “Enabling efficient and reliable transition from replication to erasure coding for clustered file systems,” IEEE Transactions on Parallel and Distributed Systems, vol. PP, no. 99, pp. 1–1, 2017.
[32] J. Li, Y. Liu, Z. Zhang, J. Ren, and N. Zhao, “Towards green IoT networking: Performance optimization of network coding based communication and reliable storage,” IEEE Access, vol. 5, pp. 8780–8791, 2017.
[33] H. Hou, K. W. Shum, M. Chen, and H. Li, “BASIC codes: Low-complexity regenerating codes for distributed storage systems,” IEEE Transactions on Information Theory, vol. 62, no. 6, pp. 3053–3069, 2016.
[34] P. Hu, C. W. Sung, S.-W. Ho, and T. H. Chan, “Optimal coding and allocation for perfect secrecy in multiple clouds,” IEEE Transactions on Information Forensics and Security, vol. 11, no. 2, pp. 388–399, 2016.
[35] M. Barua, X. Liang, R. Lu, and X. Shen, “ESPAC: Enabling security and patient-centric access control for eHealth in cloud computing,” International Journal of Security and Networks, vol. 6, no. 2, pp. 67–76, 2011.
[36] D. Chen, N. Zhang, R. Lu, X. Fang, K. Zhang, Z. Qin, and X. Shen, “An LDPC code based physical layer message authentication scheme with perfect security,” IEEE Journal on Selected Areas in Communications, vol. 36, no. 4, pp. 748–761, 2018.
[37] J. L. Massey, “An introduction to contemporary cryptology,” Proceedings of the IEEE, vol. 76, no. 5, pp. 533–549, 1988.
[38] G. Angelopoulos, M. Médard, and A. P. Chandrakasan, “Energy-aware hardware implementation of network coding,” International Conference on Research in Networking, pp. 137–144, 2011.
Yu-Jia Chen received the B.S. degree and Ph.D. degree in electrical engineering from National Chiao Tung University, Taiwan, in 2010 and 2015, respectively. He is currently a postdoctoral fellow at National Chiao Tung University. His research interests include network coding for secure storage in cloud datacenters, software defined networks (SDN), and 5G cellular networks. Yu-Jia Chen has published 22 conference papers and 6 journal papers. He holds three US patents and three ROC patents.
Li-Chun Wang (M’96 – SM’06 – F’11) received the B.S. degree from National Chiao Tung University, Taiwan, R.O.C. in 1986, the M.S. degree from National Taiwan University in 1988, and the M.Sc. and Ph.D. degrees from the Georgia Institute of Technology, Atlanta, in 1995 and 1996, respectively, all in electrical engineering.
From 1990 to 1992, he was with the Telecommunications Laboratories of Chunghwa Telecom Co. In 1995, he was affiliated with Bell Northern Research of Northern Telecom, Inc., Richardson, TX. From 1996 to 2000, he was with AT&T Laboratories, where he was a Senior Technical Staff Member in the Wireless Communications Research Department. In August 2000, he joined the Department of Electrical and Computer Engineering of National Chiao Tung University in Taiwan, where he is the current Chairman of the department. His current research interests are in the areas of radio resource management and cross-layer optimization techniques for wireless systems, heterogeneous wireless network design, and cloud computing for mobile applications.
Dr. Wang won the Distinguished Research Award of National Science Council, Taiwan in 2012, and was elected to the IEEE Fellow grade in 2011 for his contributions to cellular architectures and radio resource management in wireless networks. He was a co-recipient (with Gordon L. Stuber and Chin-Tau Lea) of the 1997 IEEE Jack Neubauer Best Paper Award for his paper “Architecture Design, Frequency Planning, and Performance Analysis for a Microcell/Macrocell Overlaying System,” IEEE Transactions on Vehicular Technology, vol. 46, no. 4, pp. 836-848, 1997. He has published over 200 journal and international conference papers. He served as an Associate Editor for the IEEE Trans. on Wireless Communications from 2001 to 2005, and as the Guest Editor of the Special Issues on “Mobile Computing and Networking” for IEEE Journal on Selected Areas in Communications in 2005, “Radio Resource Management and Protocol Engineering in Future Broadband Networks” for IEEE Wireless Communications Magazine in 2006, and “Networking Challenges in Cloud Computing Systems and Applications” for IEEE Journal on Selected Areas in Communications in 2013. He holds 10 US patents.
