
The Design of Low-area 32-bit AES Encryption/Decryption System on FPGA



Wattanit Hortrakool, AAI AlHarbiy, Ji Song, Xiao-Yang Ji, Yu-Ou Jiang

Abstract: Many FPGA designs of the Advanced Encryption Standard (AES) Rijndael algorithm focus on high throughput, reaching up to twenty gigabits per second (Gbps). However, few applications need such throughput; for most of them, low cost and low area are more suitable. This paper presents a 32-bit core architecture that occupies only 288 slices in a Spartan-3 device and provides a throughput of up to 195 Mbps.
Index Terms: Advanced Encryption Standard (AES), Field Programmable Gate Array (FPGA), encryption, decryption, low area.

I. INTRODUCTION
The objective of this coursework is to design an encryption and decryption unit using the Advanced Encryption Standard (AES) algorithm and to implement the system on a Field Programmable Gate Array (FPGA) board.
The National Institute of Standards and Technology (NIST) adopted the Rijndael cipher as the AES in 2001. It is the digital encryption standard that replaces the Data Encryption Standard (DES). It is a symmetric-key cryptosystem, which means that encryption and decryption use the same cipher key. The algorithm supports key sizes of 128, 192 and 256 bits on a 128-bit data block, and it is more flexible, secure and effective [1].
Recently, low-area AES implementations have been applied in Wireless Local Area Networks (WLAN), Wireless Personal Area Networks (WPAN), Wireless Sensor Networks (WSN) and other fields [2]. Typically, an AES design with loop unrolling and a 128-bit data path achieves high speed, but its power consumption and area also remain high. Reducing the data path from 128 bits to 32 bits decreases the number of slices occupied, so our design uses a 32-bit data path with a 128-bit key length.
This paper is organized as follows. Section 2 gives an overview of AES. Section 3 presents our 32-bit low-area architecture. The implementation results and a comparison with other works are shown in Section 4. Finally, Section 5 concludes the paper.
II. ADVANCED ENCRYPTION STANDARD
The Advanced Encryption Standard is a round-based symmetric block cipher. AES uses a cipher key of 128, 192 or 256 bits to encrypt or decrypt a data block of 128 bits [3]. The number of rounds Nr depends on the key size and is 10, 12 or 14 respectively. Each round from 1 to Nr-1 consists of four basic operations: SubByte, ShiftRow, MixColumn and AddRoundKey. The final round omits MixColumn, and an initial AddRoundKey precedes the first round. The 128-bit data block is called the state. SubByte is a nonlinear byte substitution that uses a substitution table (SBox) to operate on each byte of the state independently. ShiftRow circularly shifts each row of the state by a different number of bytes. MixColumn mixes the bytes in each column by multiplying the column with a fixed polynomial modulo $x^4 + 1$. Finally, AddRoundKey is an XOR operation that adds a round key from the Key Expansion unit to the state in each iteration. The encryption and decryption flow diagrams are shown in Figure 1.
Figure 1 128-bit AES Encryption/Decryption flow diagram. (a) Encryption: Normaltext, AddRoundKey, rounds of SubByte, ShiftRow, MixColumn and AddRoundKey, then a final SubByte, ShiftRow and AddRoundKey producing CypherText. (b) Decryption: built from AddInvRoundKey, InvSubByte, InvShiftRow and InvMixColumn.
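As an illustration of the round structure described in this section, the following Python sketch lists the operations of one block encryption for each key size. The names `ROUNDS_BY_KEY_BITS` and `encryption_schedule` are hypothetical and introduced here only for illustration; the round counts and the order of operations follow the description above and the encryption flow of Figure 1.

```python
# Number of rounds Nr for each AES key size (in bits).
ROUNDS_BY_KEY_BITS = {128: 10, 192: 12, 256: 14}


def encryption_schedule(key_bits):
    """Return the ordered list of operations for one block encryption."""
    nr = ROUNDS_BY_KEY_BITS[key_bits]
    ops = ["AddRoundKey"]                          # initial key addition
    for _ in range(nr - 1):                        # rounds 1 .. Nr-1
        ops += ["SubByte", "ShiftRow", "MixColumn", "AddRoundKey"]
    ops += ["SubByte", "ShiftRow", "AddRoundKey"]  # final round, no MixColumn
    return ops


if __name__ == "__main__":
    schedule = encryption_schedule(128)
    print(len(schedule), schedule.count("AddRoundKey"))
```

For a 128-bit key the schedule contains 11 AddRoundKey steps, so the key expansion has to supply the original key plus 10 derived round keys.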
III. PROPOSED ARCHITECTURE
In this section, our architecture for a low-area AES system is presented. Our design is implemented on the smallest Xilinx Spartan-3 device (XC3S50). It is a round-based design using a 32-bit datapath with a 128-bit key. However, the system can easily be redesigned for 192-bit and 256-bit keys as well. The high-level architecture of the system is shown in Figure 2.
Figure 2 High-level architecture
A. SubByte and InverseSubByte
In this design, SubByte and InvSubByte (shown in Figure 3) are implemented using two 2048x9 dual-port Block RAMs (RAMB16_S9_S9). These Block RAMs are treated as ROMs that store the look-up tables (LUTs) for the SBox and InvSBox. Each 8-bit output of the Block RAMs provides the SubByte result for one row. The address of the Block RAM is 11 bits wide; the first 8 address bits of each port are connected to the input data from each row, whereas the 9th bit is used to select between the encryption and decryption LUTs. Altogether, these 2 BRAMs can perform the SubByte and InvSubByte operation for 4 bytes, which is 1 column, at a time. The use of these BRAMs helps reduce the slices that would otherwise be occupied by the 4 sets of combinational logic needed to compute 4 SBoxes, and also increases throughput, especially when there is no pipelining in the system.
Figure 3 Implementation of SubByte/InvSubByte (inputs Byte1in to Byte4in and E/D drive the BRAM address ports; outputs Byte1 to Byte4)
B. ShiftRow and InverseShiftRow
The ShiftRow and InvShiftRow design (shown in Figure 4) is implemented using a group of 16-bit shift registers (SRL16) and multiplexers. Eight shift registers are grouped as a byte. Because of the 32-bit datapath, this design has 4 groups of shift registers to handle the data for one column, which means there are 32 shift registers in total. However, because an SRL16 requires only 1 LUT, 2 registers can be placed in a single slice (a CLB slice of Spartan-3 contains 2 LUTs).
The output of the ShiftRow and InvShiftRow operation is produced by controlling the 7-to-1 multiplexers with a 4-state Finite State Machine. With a simple calculation, each multiplexer can be controlled efficiently using only a single-bit control signal. As a result, the ShiftRow and InvShiftRow component occupies only 18 slices. Table 1 shows the input data and the output data of ShiftRow and InvShiftRow.
Figure 4 Design of ShiftRow and InvShiftRow (Input rows 1-4, four groups of 8xSRL16, Output rows 1-4)
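The byte reordering summarized in Table 1 can be sketched in Python as follows. This is a software model only, not the SRL16 and multiplexer hardware described above; the function name `shift_rows` and the column-major byte numbering (byte index = 4 x column + row) are assumptions chosen to match the numbering used in Table 1.

```python
def shift_rows(state, inverse=False):
    """Apply ShiftRow (or InvShiftRow) to a 16-byte state.

    The state is stored column by column: state[4 * col + row].
    Row r is rotated left by r positions for ShiftRow and
    right by r positions for InvShiftRow.
    """
    out = [0] * 16
    for col in range(4):
        for row in range(4):
            if inverse:
                src_col = (col - row) % 4
            else:
                src_col = (col + row) % 4
            out[4 * col + row] = state[4 * src_col + row]
    return out


if __name__ == "__main__":
    numbered = list(range(16))          # bytes numbered as in Table 1(a)
    shifted = shift_rows(numbered)
    for row in range(4):                # print row by row, like Table 1(b)
        print([shifted[4 * col + row] for col in range(4)])
```

Because the state arrives one 32-bit column at a time, each output column needs bytes from four different input columns, which is why the hardware buffers the rows in shift registers before selecting them.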
Table 1 Result of ShiftRow and InvShiftRow

(a) Input        (b) After ShiftRow    (c) After InvShiftRow
 0  4  8 12       0  4  8 12            0  4  8 12
 1  5  9 13       5  9 13  1           13  1  5  9
 2  6 10 14      10 14  2  6           10 14  2  6
 3  7 11 15      15  3  7 11            7 11 15  3

C. MixColumn and InverseMixColumn
Galois Field multiplication is essential for the mix column transformation. In the 32-bit system the columns are represented as polynomials with coefficients in GF($2^8$), so every byte can be calculated by a function that differs between encryption and decryption. The form of the polynomials is
$$a(x) = a_3 x^3 + a_2 x^2 + a_1 x + a_0$$
In the mix column transformation, the column is multiplied modulo $x^4 + 1$ with a constant polynomial $c(x)$:
$$c(x) = \{03\}x^3 + \{01\}x^2 + \{01\}x + \{02\}$$
In decryption, the column is instead multiplied by the fixed inverse polynomial $d(x)$, given by
$$d(x) = c^{-1}(x) = \{0b\}x^3 + \{0d\}x^2 + \{09\}x + \{0e\}$$
This inverse mix column equation is costly to implement directly owing to the complicated multiplications by {0b}, {0d}, {09} and {0e}. It could be implemented as follows:
$$d(x) = c(x) + e(x) + f(x)$$
where
$$e(x) = \{08\}x^3 + \{08\}x^2 + \{08\}x + \{08\}$$
$$f(x) = \{04\}x^2 + \{04\}$$
However, this method of implementing InvMixColumn is very complex and inefficient, which results in a large circuit. In this design, instead of using the method mentioned above, we apply a different method following [4].
Figure 5 Implementation of mix column and inverse mix column
As shown in Figure 5, the MixColumn and InvMixColumn in this system are designed to share logic and resources in order to minimize the number of slices occupied, by using the relations
$$c(x) \cdot d(x) = \{01\}$$
$$c(x) \cdot d^2(x) = d(x)$$
$$d^2(x) = \{04\}x^2 + \{05\}$$
Instead of computing $d(x)$ directly, $d^2(x)$ can be applied after $c(x)$ in order to get the inverse. As can be seen, $d^2(x)$ has only two multiplications, and each of them is straightforward to implement. Moreover, {05} equals {04}+{01}, which means that only one multiplication needs to be calculated.
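The relation between $c(x)$ and $d(x)$ used in this subsection can be checked numerically. The Python sketch below is a software model under stated assumptions, not the shared hardware of Figure 5: the helper names `gmul`, `poly_mul`, `mix_column` and `inv_mix_column` are hypothetical, and polynomials are stored as coefficient lists ordered from $x^0$ to $x^3$.

```python
def gmul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:          # reduce when the product overflows 8 bits
            a ^= 0x11B
    return result


def poly_mul(p, q):
    """Multiply two 4-term polynomials modulo x^4 + 1.

    p and q are lists [k0, k1, k2, k3] holding the coefficients of
    x^0 .. x^3; since x^4 = 1, exponents wrap around modulo 4.
    """
    out = [0, 0, 0, 0]
    for i in range(4):
        for j in range(4):
            out[(i + j) % 4] ^= gmul(p[i], q[j])
    return out


C = [0x02, 0x01, 0x01, 0x03]   # c(x) = {03}x^3 + {01}x^2 + {01}x + {02}
D = [0x0E, 0x09, 0x0D, 0x0B]   # d(x) = {0b}x^3 + {0d}x^2 + {09}x + {0e}
D2 = poly_mul(D, D)            # d^2(x), expected {04}x^2 + {05}


def mix_column(column):
    """MixColumn: multiply the column polynomial by c(x)."""
    return poly_mul(column, C)


def inv_mix_column(column):
    """InvMixColumn computed as c(x) followed by d^2(x)."""
    return poly_mul(poly_mul(column, C), D2)


if __name__ == "__main__":
    print([hex(k) for k in D2])
    print([hex(k) for k in inv_mix_column(mix_column([0xDB, 0x13, 0x53, 0x45]))])
```

The point of the design choice is visible in `inv_mix_column`: the multiplication by $c(x)$ is the same one MixColumn already needs, so decryption only adds the much cheaper $d^2(x)$ stage.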
With this method, the complexity of the system is much reduced. As a result, this component occupies a total of 56 slices for both MixColumn and InvMixColumn.
D. Key Expansion
Generally there are two ways to implement key expansion. The first way is to process the key schedule for every block of encrypted data. In this way the key can be changed very quickly with no delay, but the key schedule is repeated every time, which reduces speed and is inefficient. The other way is to complete the whole key expansion first and store all round keys in block RAM, then read them just before key addition. This second way also helps to share resources, especially the SubByte component.
In order to achieve a low-area design, the second way is used. In this way, the SubByte component can be shared with the Encryption/Decryption unit. We first precompute all round keys and then store them in a 512x9 single-port block RAM (RAMB16_36); the diagram is shown in Figure 6.
Figure 6 Structure of Key Expander (key_in, reset, FSM, RotWord, SBox, Rcon, delay, 3-deep SRL16)
As a new key comes in, the first three columns are stored in a 3-deep shift register after one clock delay. When the fourth column flows in, it passes through RotWord and the S-Box and is then added to Rcon and to the first column read from the shift register. After that, the newly created column is delayed by one clock and added to the second column. This process is repeated until all 10 round keys are computed and stored. The control of Rcon and the selection of the column being calculated are done by a Finite State Machine.
E. Control Unit
The data path used for encryption/decryption is controlled through the multiplexers. The control signals for the multiplexers come from the Finite State Machine, which works as a master controller. This master controller is also used to control the order in which the round keys are read from block RAM. In this design, the controller is a 256-state FSM, which is implemented using an 8-bit counter.
In general, the value of the 8-bit counter corresponds to the address of the block RAM. In this way, the round key can easily be read from block RAM and then added to the data.
To control the datapath, the multiplexers can easily be controlled by a single Encryption/Decryption signal (E/D) using simple logic gates. Moreover, in the case of decryption, the round keys need to be read in reverse order. This can be done by using a simple subtraction circuit to reverse the state of the FSM.
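The precomputed key schedule of subsection D and the reversed read order used for decryption can be sketched together in Python. This is a software model under stated assumptions: the hardware computes one column per step with a 3-deep shift register, whereas the sketch keeps all columns in a list standing in for the block RAM. The helper names are hypothetical, and the address mapping in `round_key_address` (address = 4 x round + column, with the round index subtracted from 10 for decryption) is an assumption for illustration; the text above only states that the counter value corresponds to the block RAM address and that a subtraction reverses the order.

```python
def gmul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
    return result


def make_sbox():
    """Build the AES S-box: multiplicative inverse, then affine transform."""
    sbox = []
    for value in range(256):
        inv = 1
        for _ in range(254):            # value^254 is the inverse; 0 maps to 0
            inv = gmul(inv, value)
        s = inv
        for shift in range(1, 5):       # XOR with four left rotations
            s ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
        sbox.append(s ^ 0x63)
    return sbox


SBOX = make_sbox()


def rot_word(word):
    """Rotate a 32-bit column left by one byte (RotWord)."""
    return ((word << 8) | (word >> 24)) & 0xFFFFFFFF


def sub_word(word):
    """Apply the S-box to each byte of a 32-bit column."""
    return int.from_bytes(bytes(SBOX[b] for b in word.to_bytes(4, "big")), "big")


def expand_key_128(key):
    """Return the 44 columns (11 round keys) for a 16-byte key."""
    words = [int.from_bytes(key[4 * i:4 * i + 4], "big") for i in range(4)]
    rcon = 0x01
    for i in range(4, 44):
        temp = words[i - 1]
        if i % 4 == 0:                  # first column of each new round key
            temp = sub_word(rot_word(temp)) ^ (rcon << 24)
            rcon = gmul(rcon, 0x02)
        words.append(words[i - 4] ^ temp)
    return words


def round_key_address(round_index, column, decrypt=False):
    """Assumed RAM address of one round-key column (hypothetical mapping)."""
    if decrypt:
        round_index = 10 - round_index  # read the round keys in reverse order
    return 4 * round_index + column


if __name__ == "__main__":
    key = bytes.fromhex("2b7e151628aed2a6abf7158809cf4f3c")  # FIPS-197 example key
    ram = expand_key_128(key)
    print(hex(ram[round_key_address(1, 0)]))
```

Only one S-box lookup per new round key is required, which is what allows the key expander to share the SubByte Block RAMs with the datapath.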
IV. RESULT AND COMPARISON
TABLE 2 PERFORMANCE COMPARISON BETWEEN AES IMPLEMENTATIONS

                         Ours          P. Chodowiec    G. Rouvroy      S. McMillan     K. Gaj
                                       et al. [4]      et al. [5]      et al. [6]      et al. [7]
Device used              Spartan3      Spartan2-6      Spartan3        Virtex          Virtex-6
Functionality            Enc. + Dec.   Enc. + Dec.     Enc. + Dec.     Enc. only       Enc. + Dec.
Key length               128-bit       128-bit         128-bit         External key    External key
                                                                       expansion       expansion
CLB slices               288           222             163             240             2902
BRAMs                    3             3               3               8               0
Throughput (Mbps)        195           166             208             250             331.5
Clock (MHz)              130           60              71              136             26
Clock cycles per round   8             4               4               7               1

As shown in Table 2, this design is comparable to the other available designs. The first point of comparison is the device used. Our design uses a Spartan-3 device, a small and low-cost FPGA, which is comparable to the devices used in [4] and [5], whereas [6] and [7] use Virtex FPGAs, which are much more advanced and thus more expensive. In terms of functionality, most of the designs provide both encryption and decryption, except that of [6], which has only an encryption core. In terms of key length, our design, as well as [4] and [5], has an internal key expansion unit that works with 128-bit keys. The designs of [6] and [7] have no internal key expansion, so key expansion from an outside source is required. In terms of area, our design requires 288 slices with 3 block RAMs, which is slightly higher than the other low-area designs [4], [5] and [6]. However, our throughput of 195 Mbps is higher than that of [4], the design on which ours is based. The interesting part of our result is the maximum clock frequency. Our design achieves a maximum clock frequency of 130 MHz, which is much higher than that of the other designs on Spartan-family devices and almost equal to that of the Virtex implementation in [6]. With this design, the Spartan-3 device can run at high speed, which allows a high clock frequency for other circuits implemented in the same FPGA.
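The throughput and clock figures in Table 2 are linked by a simple relation: throughput = block size x clock frequency / clock cycles per block. The Python sketch below applies it to the reported numbers. The function and table names are hypothetical, and the 128-bit block size is the only value used that is not a table entry.

```python
BLOCK_BITS = 128  # AES block size in bits

# (throughput in Mbps, clock in MHz) as reported in Table 2
TABLE2 = {
    "Ours": (195.0, 130.0),
    "[4]": (166.0, 60.0),
    "[5]": (208.0, 71.0),
    "[6]": (250.0, 136.0),
    "[7]": (331.5, 26.0),
}


def throughput_mbps(clock_mhz, cycles_per_block):
    """Throughput in Mbps for a given clock and cycle count per block."""
    return BLOCK_BITS * clock_mhz / cycles_per_block


def implied_cycles_per_block(throughput, clock_mhz):
    """Clock cycles per 128-bit block implied by throughput and clock."""
    return BLOCK_BITS * clock_mhz / throughput


if __name__ == "__main__":
    for name, (mbps, mhz) in TABLE2.items():
        print(name, round(implied_cycles_per_block(mbps, mhz), 1))
```

For this design the reported figures imply roughly 85 clock cycles per block, slightly more than the 80 cycles of ten rounds at 8 cycles per round. The difference presumably covers data loading and the initial key addition, although the text does not state this.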
V. CONCLUSION
In this work, a compact and fast AES solution on FPGA was implemented. The design achieves a throughput per slice that is competitive with the designs in Table 2. The implementation was done on the smallest Spartan-3 FPGA and occupies 288 slices and 3 block RAMs, achieving a throughput of 195 Mbps at a 130 MHz clock frequency. The design can serve a wide range of embedded systems, from latency-sensitive applications that need a high-speed connection, such as video conferencing, down to applications that require low area, such as smart cards.

REFERENCES
[1] S. N. Han and X. J. Li, "Area and Power Optimized serial AES Encrypt/Decrypt Circuit," Microelectronics, vol. 40, Beijing: Chinese Academy of Sciences, 2010.
[2] W.-K. Chen, Linear Networks and Systems. Belmont, CA: Wadsworth, 1993, pp. 123-135.
[3] National Institute of Standards and Technology (NIST), Information Technology Laboratory (ITL), "Advanced Encryption Standard (AES)," Federal Information Processing Standards (FIPS) Publication 197, November 2001.
[4] P. Chodowiec, K. Gaj, P. Bellows and B. Schott, "Experimental Testing of the Gigabit IPSec-Compliant Implementations of Rijndael and Triple DES Using SLAAC-1V FPGA Accelerator Board," Information Security Conference (ISC 2001), Malaga, Spain, 2001.
[5] G. Rouvroy, F.-X. Standaert, J.-J. Quisquater and J.-D. Legat, "Compact and efficient encryption/decryption module for FPGA implementation of the AES Rijndael very well suited for small embedded applications," in Proc. IEEE Int. Conf. on Inf. Tech.: Coding and Computing, vol. 2, pp. 583-587, Las Vegas, NV, USA, April 2004.
[6] S. McMillan and C. Patterson, "JBits Implementations of the Advanced Encryption Standard (Rijndael)," Field-Programmable Logic and Application (FPL 2001), Belfast, Northern Ireland, UK, 2001.
[7] K. Gaj and P. Chodowiec, "Comparison of the hardware performance of the AES candidates using reconfigurable hardware," Third Advanced Encryption Standard (AES3) Candidate Conference, New York, 2000.
