Documente Academic
Documente Profesional
Documente Cultură
64,
NO. 9,
SEPTEMBER 2015
2691
INTRODUCTION
D. Chakraborty is with Computer Science Department, CINVESTAVIPN, 2508 Av. IPN, San Pedro Zacatenco, Mexico City 07360, Mexico.
E-mail: debrup@delta.cs.cinvestav.mx.
C. Mancillas-L
opez is with Hubert Curien Laboratory, UMR 5516 CNRS,
Universite Jean Monnet, Saint-Etienne, France.
E-mail: cuauhtemoc.mancillas.lopez@univ-stetienne.fr.
P. Sarkar is with Applied Statistics Unit, Indian Statistical Institute, 203
B.T Road, Kolkata 700108, India. E-mail: palash@isical.ac.in.
Manuscript received 17 June 2013; revised 12 Feb. 2014; accepted 1 Oct. 2014.
Date of publication 3 Nov. 2014; date of current version 12 Aug. 2015.
Recommended for acceptance by F. de Dinechin.
For information on obtaining reprints of this article, please send e-mail to:
reprints@ieee.org, and reference the Digital Object Identifier below.
Digital Object Identifier no. 10.1109/TC.2014.2366739
0018-9340 2014 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
2692
VOL. 64,
NO. 9,
SEPTEMBER 2015
CHAKRABORTY ET AL.: STES: A STREAM CIPHER BASED LOW COST SCHEME FOR SECURING STORED DATA
PRELIMINARIES
2693
(1)
(2)
2694
h1
h2
hb
9
M1 K1 M2 K2 M3 K3 M4 K4 >
>
>
>
Mm1 Km1 Mm Km
>
>
>
M1 K3 M2 K4 M3 K5 M4 K6 >
=
(3)
Mm1 Km1 Mm Km2
>
>
>
>
>
>
>
M1 K2b1 M2 K2b
>
;
Mm1 Km2b3 Mm Km2b2
NO. 9,
SEPTEMBER 2015
where
VOL. 64,
CONSTRUCTION OF STES
CHAKRABORTY ET AL.: STES: A STREAM CIPHER BASED LOW COST SCHEME FOR SECURING STORED DATA
2695
Fig. 1. STES: A TES using SC and MLUH. The -bit string fStr is a parameter to the whole construction. The length of the IV of SC is and the data
path of MLUH is d.
2696
VOL. 64,
NO. 9,
SEPTEMBER 2015
and the hash functions. There is one call to the stream cipher
from the main body of the algorithm to generate the hash
keys and the other two calls are part of the function Feistel.
It is to be noted that in real life stream ciphers are quite fast
in generation of the outputs, but when a stream cipher is
called on different initialization vectors then there is a significant time required for initialization. The three calls to
the stream cipher required in STES are all on different initialization vectors. Hence, stream cipher initializations
occupy a significant amount of the time required for STES.
The hash functions MLUH and PD can be implemented very efficiently in hardware with a proper choice
of the data path d. The choice of d dictates the amount of
parallelism possible. Recall that the main goal of the construction is to enable a hardware realization which uses
small amount of hardware resources. A proper choice of
the stream cipher and the data path can help in realizing
a circuit with adequate throughput but with a small
hardware footprint. These issues are discussed in details
in Section 5 where we demonstrate that STES meets the
expected efficiency requirements both in terms of time
and circuit size.
Parallelism. There exists ample scope to exploit parallelism in the construction of STES. In the hardware implementation that we present later, we decided to use two
stream cipher cores, our specific design choices give rise
to an architecture where decryption is a bit faster than
encryption. This characteristic of the design has some
good practical implications, as read speeds in memory
are faster than the write speeds and it is expected that a
typical block would be read many more times than it
would be written.
SECURITY OF STES
CHAKRABORTY ET AL.: STES: A STREAM CIPHER BASED LOW COST SCHEME FOR SECURING STORED DATA
fK
Advprf
) 1 PrAf ) 1:
f A PrA
(4)
Let Advprf
f t; q; s be the supremum of the advantages of
all adversaries running in time t, making q queries and providing a total of s bits in all its queries. The quantity s is
called the query complexity. Note that Advprf
f t; q; s is
always positive even though the quantity defined in (4) can
sometimes be negative. The value of Advprf
f t; q; s is called
the PRF-bound for f.
2697
p
rp
AdvTES
A PrAEK ;DK ) 1 PrAP;P
1
) 1:
(5)
p
f
rp
Define Adv
TES t; q; s to be the supremum of the advantages of all adversaries which run in time t, make a total of q
queries and send a total of s bits in all the queries. Security
of a scheme against an adversary which has access to both
the encryption and the decryption oracles is called security
as a strong pseudo-random permutation.
Suppose now that the oracles EK and DK are replaced by
two oracles which return independent and uniform random
strings on any input. More precisely, if T; X is a query to
EK , then an independent and uniform random string of
length equal to the length of X is returned; if T; Y is a
query to DK , then similarly, an independent and uniform
random string of length equal to the length of Y is returned.
Let PrA$;$ ) 1 be the probability that A outputs 1 after
such interaction. The advantage of A is defined as follows:
EK ;DK
) 1 PrA$;$ ) 1:
Adv
rnd
TES A PrA
(6)
Define Adv
rnd
TES t; q; s to be the supremum of all advantages
of adversaries which run in time t, make a total of q queries
and send a total of s bits in all the queries.
The
rnd and
g
prp advantages are related as follows:
f
p
rp
A Adv
rnd
AdvTES
TES A
q 1
:
2 2
(7)
p
f
rp
0
AdvSTESSC;H;fStr
t; q; s Advprf
SC t ; q; s
9q2
:
21
(8)
2698
TABLE 1
Specific Parameters Used to Implement Trivium, Grain128 and
Mickey128 2.0
Stream cipher
Trivium
Grain128
Mickey128-2.0
jIV j (bits)
80
96
96
jKj(bits)
80
128
128
Field
NO. 9,
SEPTEMBER 2015
TABLE 2
Irreducible Polynomials
Data paths
used (bits)
1; 4; 8; 16; 40
1; 4; 8; 16; 32
1
VOL. 64,
Irreducible Polynomial
GF 2
GF 28
GF 216
GF 220
GF 232
GF 240
4
x4 x 1
x8 x4 x3 x 1
x16 x5 x3 x 1
x20 x3 1
x32 x7 x3 x 1
x40 x5 x4 x3 1
was taken to make the design suitable for STES. Note that in
case of PD each multiplication requires two message and
key blocks which is not the case in MLUH. As the message
and key blocks are generated on the fly hence this design
helps in preventing data stalls in the circuit. This issue is
discussed in more details in Section 5.2.
Target FPGA. We target our designs for Xilinx Spartan 3
and Lattice ICE40 FPGAs. The rationale is that these are
considered suitable for implementing hardware/power
constrained designs. Moreover, they are cheap and one can
consider deploying these FPGAs directly into a commercial
device. In particular, in Spartan III the LUTs within SLICEM
can be used as a 16 1-bit distributed RAM or as a 16-bit
shift register (SRL16 primitive). This functionality of a 16-bit
shift register has been previously exploited to achieve compact implementations of stream ciphers [15]. We follow this
suggestion.
The ICE40 FPGAs does not provide such functionalities.
On the other hand, their architectural design specifically
supports low power implementations. Our experimental
results also suggest that they are much more competitive in
this respect compared to Spartan 3 devices. To the best of
our knowledge, there are no previous work reporting
stream cipher implementation on this class of FPGAs.
9
M1 K1 M2 K2 . . . Mm Km
>
>
=
M1 K2 M2 K3 . . . Mm Km1
>
>
;
M1 Kb M2 Kb1 . . . Mm Kbm1:
(9)
CHAKRABORTY ET AL.: STES: A STREAM CIPHER BASED LOW COST SCHEME FOR SECURING STORED DATA
2699
2700
VOL. 64,
NO. 9,
SEPTEMBER 2015
TABLE 3
Performance of MLUH on Spartan 3 and Lattice ICE40 for Hashing 4,016 bits to 80 bits
Primitive
MLUH-1b
MLUH-4b
MLUH-8b
MLUH-16b
MLUH-40b
SPARTAN 3
Slices
158
247
452
737
1,410
Freq.
(MHz)
215.11
183.76
177.46
175.24
173.89
Pipeline
Stages
0
0
1
2
3
LATTICE ICE40
Thrghpt
(Mbps)
210.90
720.68
1,389.24
2,738.38
6,625.63
Logic
Cells
325
419
772
1,638
2,756
Freq.
(MHz)
189.78
179.50
170.68
172.89
161.49
Pipeline
Stages
0
0
1
2
3
Thrghpt
(Mbps)
186.07
703.98
1,336.16
2,701.66
6,153.17
TABLE 4
Performance of PD on Spartan 3 and Lattice ICE40 for Hashing 4016 bits to 80 bits
Primitive
PD-4b
PD-8b
PD-16b
PD-40b
SPARTAN 3
Slices
266
364
710
922
Freq.
(MHz)
172.65
167.50
169.87
172.18
Pipeline
Stages
0
0
1
2
LATTICE ICE40
Thrghpt
(Mbps)
664.14
1,288.66
2,603.80
6,498.82
Logic
Cells
490
748
1,023
1,832
Freq.
(MHz)
169.03
162.85
165.71
159.29
Pipeline
Stages
0
0
1
2
Thrghpt
(Mbps)
650.22
1,252.88
2,540.04
6,012.30
CHAKRABORTY ET AL.: STES: A STREAM CIPHER BASED LOW COST SCHEME FOR SECURING STORED DATA
2701
2)
3)
4)
The MLUH constructed with 8-bit multipliers as discussed in Section 2.2. In the diagram this component
is labeled MLUH.
Two stream cipher cores labeled SC1 and SC2.
Two 80-bit registers RegH1 and RegH2 which are
used to store the output of MLUH.
Four registers labeled regF1, regF2, regKh and regb.
All these registers are 80 bits long and are formed by
ten registers each of eight bits connected in cascade,
2702
EXPERIMENTAL RESULTS
VOL. 64,
NO. 9,
SEPTEMBER 2015
6.1 Primitives
In Tables 3 and 4 we show the performance results of
MLUH and PD implementations. In the tables, MLUH-db
represents an MLUH with a d-bit data path. And PD-db
represents a PD construction with d-bit data path and muld
tipliers in GF 22 (see the architecture discussed in Section
5.2). For all the cases shown in the tables, we consider
hashing a message of 4,016 bits to 80 bits. We also computed MLUH and PD considering the output to be 96 bits,
these results are similar and hence are not shown. Note
that as we consider only datapaths which divides the output length hence for 80-bit outputs (which are used with
Trivium) we did not implement the hash functions with
32-bit datapath, similarly for 96-bit outputs (which are
used with Grain) we did not implement the versions with
40-bit datapath.
It is clear from the tables that in both Spartan 3 and ICE40
with the increase in data path the throughput increases at
the cost of area. For MLUH-1b, we obtain a very high frequency, as in this case the multiplier is only an AND gate.
As the data path increases, the complexity of the circuit
implementing the multiplication grows which increases the
critical path of the circuit. For 16 and 40-bit implementations
we break the critical path of the multiplier by dividing it
into balanced pipeline stages, the number of pipeline stages
were carefully selected to maintain a high operating frequency. This is the reason why all 8, 16 and 40-bit implementations operate on similar frequencies on both Spartan 3
and ICE40.
In Table 4, the performance of PD is shown. For 4-bit
data paths, the size of PD is a bit bigger than MLUH but
for bigger data paths the size of PD is smaller. This conforms to the argument that we provided in Section 5.2. As
explained in Section 5.2, the number of cycles required to
compute PD is also marginally more than that of MLUH.
Moreover, due to the more complex circuitry of PD it operates lower frequencies and thus achieves lesser throughput than MLUH.
In case of both MLUH and PD we can see that the number of logic cells required for ICE40 FPGA is almost double
than the slices required in Spartan 3. It is to be noted that a
logic cell in ICE40 has much lesser components than in a
Spartan 3 slice, which explains the difference in area in the
two families. Moreover, the ICE40 implementations operate
at a little lower frequencies compared to the Spartan 3
implementations, this can also be explained by the fact that
CHAKRABORTY ET AL.: STES: A STREAM CIPHER BASED LOW COST SCHEME FOR SECURING STORED DATA
2703
TABLE 5
Performance of the Stream Ciphers in Spartan3 and Lattice ICE40
Primitive
Trivium-1b
Trivium-4b
Trivium-8b
Trivium-16b
Trivium-40b
Grain128-1b
Grain128-4b
Grain128-8b
Grain128-16b
Grain128-32b
Mickey128-1b
SPARTAN 3
Slices
49
120
148
203
278
67
175
232
320
490
182
Freq.
(MHz)
201.02
197.38
193.49
189.32
187.83
193.00
182.54
178.80
179.73
173.58
202.80
Setup
cycles
1,232
308
154
77
31
384
96
48
24
12
288
LATTICE ICE40
Thrghpt.
(Mbps)
201.02
789.52
1,547.93
3,029.16
6,010.60
193.00
730.17
1,430.42
2,875.64
5,554.56
202.80
Logic
Cells
313
329
347
398
530
297
360
434
592
997
420
Freq.
(MHz)
190.26
188.54
186.13
176.10
166.73
192.59
162.41
145.73
137.72
136.84
169.74
Setup
cycles
1,232
308
154
77
31
384
96
48
24
12
288
Thrghpt.
(Mbps)
190.26
754.15
1,489.04
2,817.60
6,669.20
192.58
649.64
1,165.84
2,203.53
4,378.88
169.74
2704
VOL. 64,
NO. 9,
SEPTEMBER 2015
TABLE 6
Performance of STES on Spartan 3
Mode
STES[Trv,ML]-1b
STES[Trv,ML]-4b
STES[Trv,PD]-4b
STES[Trv,ML]-8b
STES[Trv,PD]-8b
STES[Trv,ML]-16b
STES[Trv,PD]-16b
STES[Trv,ML]-40b
STES[Trv,PD]-40b
STES[Grn,ML]-1b
STES[Grn,ML]-4b
STES[Grn,PD]-4b
STES[Grn,ML]-8b
STES[Grn,PD]-8b
STES[Grn,ML]-16b
STES[Grn,PD]-16b
STES[Grn,ML]-32b
STES[Grn,PD]-32b
STES[Mky,ML]-1b
Cycles
Area
Freq.
(Slices) (MHz)
528
683
714
1,050
964
1,346
1,249
2,433
2,150
591
834
876
1,249
1,297
1,657
1,598
2,729
2,595
835
160.15
159.60
158.56
160.26
155.14
158.78
154.70
154.34
153.65
159.61
159.26
154.11
159.79
153.87
158.78
153.90
155.82
149.50
152.92
enc
dec
13,601
3,401
3,481
1,705
1,745
859
879
353
361
10,337
2,585
2,681
1,297
1,345
655
679
332
344
9,953
12,369
3,093
3,173
1,551
1,591
782
802
323
331
9,953
2,489
2,585
1,249
1,297
631
655
320
332
9,665
Throughput
(Mbps)
enc
dec
48.18
192.03
186.39
384.62
363.80
756.90
720.17
1,789.12
1,741.65
63.18
252.11
235.21
504.13
468.13
991.95
927.48
1,920.57
1,778.35
62.87
52.98
211.14
204.48
422.81
399.01
831.43
789.32
1,960.14
1,904.11
65.62
261.83
243.95
523.50
485.46
1,029.67
961.46
1,992.59
1,842.63
64.74
TPA
TPP
Static Dyn. Total
(enc)
power power power
(mW) (mW) (mW) (Mbits/J)
enc
dec
22.30
68.71
63.80
89.52
92.23
137.42
140.91
179.71
197.96
26.13
73.87
65.62
102.67
100.70
146.30
141.84
171.98
167.47
18.40
24.52
75.55
69.99
98.41
101.15
150.95
154.44
196.88
216.43
27.13
76.72
68.06
106.61
104.43
151.86
147.03
178.43
173.53
18.95
54
60
61
61
61
62
62
102
102
58
60
60
61
60
62
62
64
63
70
51
57
70
94
93
191
196
256
240
32
41
36
108
96
220
212
360
356
52
105
117
131
155
154
253
258
358
342
90
101
96
169
156
282
274
424
419
132
458.8
1,641.2
1,422.8
2,481.4
2,362.3
2,991.7
2,791.3
4,997.5
5,092.5
702.0
2,496.1
2,450.1
2,983.0
3,000.8
3,517.5
3,384.9
4,529.6
4,244.2
476.2
STES[SC,Hash]-db denotes STES of data path d instantiated with stream cipher SC and hash function Hash. Trv: Trivium, Grn: Grain128, Mky: Mickey128.
1
ML: Multilinear universal hash, PD: Pseudo Dot Product. TPA : Throughput per area, is computed as AreaTime
, where Area is in slices and Time in microseconds for encrypting/decrypting 512 bytes. TPP: Throughput per power, is shown for encryption only.
TABLE 7
Performance of STES in Lattice ICE40
Mode
STES[Trv,ML]-1b
STES[Trv,ML]-4b
STES[Trv,PD]-4b
STES[Trv,ML]-8b
STES[Trv,PD]-8b
STES[Trv,ML]-16b
STES[Trv,PD]-16b
STES[Trv,ML]-40b
STES[Trv,PD]-40b
STES[Grn,ML]-1b
STES[Grn,ML]-4b
STES[Grn,PD]-4b
STES[Grn,ML]-8b
STES[Grn,PD]-8b
STES[Grn,ML]-16b
STES[Grn,PD]-16b
STES[Grn,ML]-32b
STES[Grn,PD]-32b
STES[Mky,ML]-1b
Area Freq.
(Logic (MHz)
Cells)
1,687
1,934
1,991
2,386
2,434
3,015
2,935
6,607
5,121
2,050
2,147
2,390
2,680
2,797
3,788
3,682
5,405
5,215
2,794
149.31
140.10
136.50
141.62
135.80
142.27
136.15
139.74
132.50
145.79
138.84
136.56
135.75
135.80
136.15
136.87
135.29
132.50
143.05
Cycles
enc
dec
13,601
3,401
3,481
1,705
1,745
859
879
353
361
10,337
2,585
2,681
1,297
1,345
655
679
332
344
9,953
12,369
3,093
3,173
1,551
1,591
782
802
323
331
9,953
2,489
2,585
1,249
1,297
631
655
320
332
9,665
Throughput
(Mbps)
enc
dec
44.92
168.56
160.45
339.89
318.44
677.72
633.81
1,619.87
1,501.91
57.71
219.78
208.43
428.29
413.16
850.57
824.84
1,667.49
1,576.13
58.81
49.39
185.35
176.03
373.63
349.27
744.46
694.67
1,774.72
1,642.00
59.94
228.26
216.17
444.74
428.44
882.93
855.07
1,730.02
1,633.10
60.56
TPA
enc
dec
6.51
21.30
19.69
34.81
318.45
54.93
52.77
59.92
71.67
6.88
25.02
21.31
39.05
36.10
54.87
54.75
75.39
73.86
5.14
7.16
23.42
21.61
38.27
349.27
60.34
57.84
65.64
78.36
7.15
25.98
22.10
40.55
37.43
56.96
56.75
78.22
76.53
5.30
32.93
48.57
50.87
57.77
59.75
88.51
85.74
135.54
120.41
54.05
54.38
66.07
59.64
70.89
104.53
107.51
171.20
159.20
44.65
33.89
48.73
50.24
57.93
59.91
88.68
85.90
135.70
120.57
54.21
54.54
66.23
59.80
71.05
104.96
107.67
171.36
159.36
44.81
1,325.4
3,459.0
3,193.6
5,867.2
5,315.3
7,642.3
7,378.4
11,937.1
12,456.7
1,064.5
4,029.7
3,147.0
7,162.0
5,815.0
8,103.7
7,660.8
9,730.9
9,890.3
1,312.4
STES[SC,Hash]-db denotes STES of data path d instantiated with stream cipher SC and hash function Hash. Trv: Trivium, Grn: Grain128, Mky: Mickey128.
1
ML: Multilinear universal hash, PD: Pseudo Dot Product. TPA : Throughput per area, is computed as AreaTime
, where Area is in logic blocks and Time in
microseconds for encrypting/decrypting 512 bytes. TPP: Throughput per power, is shown for encryption only.
CHAKRABORTY ET AL.: STES: A STREAM CIPHER BASED LOW COST SCHEME FOR SECURING STORED DATA
2705
TABLE 8
TES Implemented Using AES-128 and a Polynomial Hash on Different Platforms
Mode
TES-AESs-1s
Spartan 3
TES-AESs-4s
Spartan 3
TES-AESp-4s
Virtex 5
TES-sAESs-1s
Spartan 3 *
Slices
Cycles
Frequency
(MHz)
45.58
Throughput
(Mbps)
480.70
B-RAMS
11
Static power
(mW)
191
Dynamic
power (mW)
182
Total
power (mW)
372
6,170
388
6,389
403
74.07
752.10
11
192
245
437
4,902
111
287.44
10,596.44
1,243
2,276
3,519
2,800
2,694
71.51
108.62
TES-AESs-1s: Sequential AES with one fully parallel 128-bit multiplier, TES-AESs-4s: Sequential AES with one four-stage pipelined 128-bit multiplier, TESAESp-4s: 10-stage pipelined AES encryption core with four-stage pipelined 128-bit multiplier. TES-sAESs-1s (estimation): A small AES (167 slices, three block
RAMs, with latency of 42 cycles per block) and four 32-bit multipliers.
CONCLUSION
2706
VOL. 64,
NO. 9,
SEPTEMBER 2015
TABLE 9
Recommendations Corresponding to Speed Rates of SD Cards and USB Memories of Different Types Which
Are Currently Sold by ADATA
Read Speed
(Mbps)
Class 4
Class 10
Class 10 Premier
Class 10 Premier Pro
UE700 USB 3.0
S102 USB 3.0
C103 USB 3.0
32
160
400
760
1600
800
560
Write Speed
(Mbps)
32
128
264
360
800
400
80
Recommendation
Spartan 3
STES[Trv,ML]-1b
STES[Trv,ML]-4b
STES[Trv,ML]-8b
STES[Trv,PD]-16b
STES[Trv,PD]-40b
STES[Trv,ML]-16b
STES[Trv,PD]-16b
ICE40
STES[Trv,ML]-1b
STES[Trv,ML]-4b
STES[Grn,ML]-8b
STES[Grn,PD]-16b
STES[Trv,PD]-40b
STES[Grn,PD]-16b
STES[Trv,PD]-16b
Our recommendations corresponds to the smallest design which provides the required performance.
ACKNOWLEDGMENTS
The authors thank the reviewers for their careful reading of
the paper and providing useful comments. Debrup Chakraborty acknowledges the support from project 166763 funded
by Consejo Nacional de Ciencia y Technologa (CONACyT),
Mexico.
REFERENCES
[1]
CHAKRABORTY ET AL.: STES: A STREAM CIPHER BASED LOW COST SCHEME FOR SECURING STORED DATA
2707