A Low-Cost Implementation of Trivium: Nele Mentens, Jan Genoe, Bart Preneel, Ingrid Verbauwhede

A low-cost implementation of Trivium
Nele Mentens, Jan Genoe, Bart Preneel, Ingrid Verbauwhede

Katholieke Universiteit Leuven ESAT/COSIC, Kasteelpark Arenberg 10, B-3001 Leuven, Belgium, Nele.Mentens@esat.kuleuven.be
Katholieke Hogeschool Limburg, Agoralaan, Campus B, Bus 3, B-3590 Diepenbeek, Belgium, Nele.Mentens@iwt.khlim.be
IMEC, Polymer and Molecular Electronics, Kapeldreef 75, B-3001 Heverlee, Belgium, Jan.Genoe@imec.be
Katholieke Hogeschool Limburg, Agoralaan, Campus B, Bus 3, B-3590 Diepenbeek, Belgium, Jan.Genoe@iwt.khlim.be
Katholieke Universiteit Leuven ESAT/COSIC, Kasteelpark Arenberg 10, B-3001 Heverlee, Belgium, Bart.Preneel@esat.kuleuven.be
Katholieke Universiteit Leuven ESAT/COSIC, Kasteelpark Arenberg 10, B-3001 Heverlee, Belgium, Ingrid.Verbauwhede@esat.kuleuven.be
Abstract. This paper describes the implementation of two Trivium cores on a single chip. The cores are realized in a 5-metal 0.35 µm AMIS technology. The chip is currently being manufactured. The first core on the chip is an automatically placed and routed standard cell core. The second one is a custom design using dynamic logic and C²MOS flip-flops. The goal of this paper is to evaluate and compare the size of the cores based on the lay-out results. The lay-out of the custom design shows a significant size reduction compared to the standard cell design.

Keywords: stream cipher, custom design, Trivium, dynamic logic.
1 Introduction
The most efficient cryptographic algorithms to achieve data confidentiality are block ciphers and stream ciphers. Stream ciphers are used as alternatives to block ciphers when high throughput or low gate count are important requirements. The Profile 2 stream cipher candidates in the ECRYPT eSTREAM project are designed for hardware implementations with restricted resources [2]. In this paper we focus on the hardware implementation of the Profile 2 stream cipher Trivium.

Like most stream cipher implementations, the implementation of Trivium requires a few hundred storage elements or flip-flops to store the internal state, while the combinatorial logic part is rather limited. As a consequence, the die size is mainly determined by the size of the flip-flops. When implemented in a straightforward manner, i.e. using automatic synthesis and standard cell placement and routing, the size of a flip-flop is equivalent to about 8 standard cell NAND gates or 32 transistors. We present a custom design of Trivium that consists of dynamic logic and C²MOS flip-flops. This results in a significant decrease in size, which is shown on the basis of a lay-out comparison to a standard cell design in the same technology. The cores are compared in a 5-metal 0.35 µm AMIS technology¹ [1].

This paper is organized as follows. Section 2 briefly describes the implemented algorithm, Trivium. In Sect. 3, some previously reported hardware implementations of Trivium are listed. Section 4 describes the two cores that are contained in the chip that is currently being manufactured. Finally, Sect. 5 concludes the paper.

The results presented in this paper have been realized by the master students of 2007-2008 in electronic engineering, chip design, at the Katholieke Hogeschool Limburg: Michael Billen, Free Claessens, Dries Cuypers, Jeroen Dreessen, Frederik Gommé, Wim Heedfeld, Jens Homann, Kurt Ilsen, Lukasz Jaszczuk, Jan Jooken, Peter Schreurs, Peter Timmermans and Luc Van Roey.
2 Trivium: The Algorithm
Trivium consists of a 288-bit state register, in which 3 bits are updated based on the result of combinatorial logic and the remaining bits are shifted by one position. The key stream output is the result of an XOR operation on 6 bits in the state register. The schematic representation of the Trivium algorithm is given in Fig. 1. In the initialization phase, the 80-bit key and the 80-bit Initial Value (IV) are loaded into the shift register at positions 1-80 and 94-173, respectively. Then, the state register is updated 4 × 288 = 1152 times according to Fig. 1 without generating key stream bits. After the initialization phase, a maximum of 2^64 key stream bits are generated according to Fig. 1. This description matches the radix-1 version of Trivium, which outputs one key stream bit in every clock cycle. Because it is our goal to design a low-area Trivium core, we only consider Trivium radix-1.
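The radix-1 behaviour described above can be illustrated with a minimal bit-level software sketch. It is not the hardware design of this paper: the tap positions and the three constant 1 bits at positions 286-288 are taken from the published Trivium specification rather than from the text above, and the function names, the `warmup` parameter and the convention that key and IV are passed as lists of bits in position order are assumptions made for illustration.

```python
def trivium_step(s):
    """Advance the 288-bit state list `s` by one clock and return one key stream bit.

    Indices are 0-based, so s[65] is state bit s66 of the specification.
    """
    t1 = s[65] ^ s[92]       # s66 + s93
    t2 = s[161] ^ s[176]     # s162 + s177
    t3 = s[242] ^ s[287]     # s243 + s288
    z = t1 ^ t2 ^ t3         # key stream bit: XOR of 6 state bits

    # The three updated bits: one AND gate and two more XOR inputs each.
    t1 ^= (s[90] & s[91]) ^ s[170]
    t2 ^= (s[174] & s[175]) ^ s[263]
    t3 ^= (s[285] & s[286]) ^ s[68]

    # Shift the three sub-registers by one position and insert the new bits.
    s[0:93] = [t3] + s[0:92]
    s[93:177] = [t1] + s[93:176]
    s[177:288] = [t2] + s[177:287]
    return z


def trivium_init(key_bits, iv_bits, warmup=4 * 288):
    """Load key and IV and run the warm-up updates (4 x 288 by default)."""
    if len(key_bits) != 80 or len(iv_bits) != 80:
        raise ValueError("key and IV must each be 80 bits")
    s = [0] * 288
    s[0:80] = key_bits        # positions 1-80
    s[93:173] = iv_bits       # positions 94-173
    s[285:288] = [1, 1, 1]    # positions 286-288 (from the specification)
    for _ in range(warmup):
        trivium_step(s)       # output is discarded during initialization
    return s


def trivium_keystream(key_bits, iv_bits, n_bits, warmup=4 * 288):
    """Return the first `n_bits` key stream bits for the given key and IV."""
    s = trivium_init(key_bits, iv_bits, warmup)
    return [trivium_step(s) for _ in range(n_bits)]
```

For example, `trivium_keystream([0] * 80, [0] * 80, 16)` returns 16 key stream bits for an all-zero key and IV. Mapping these bit lists to the byte order of official test vectors is deliberately left out, since that convention is not covered here.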
¹ AMIS has been bought by ON Semiconductor on September 13, 2007.
Fig. 1. Schematic representation of the Trivium algorithm.
3 Previous Work
On the eSTREAM Phase 3 website, several hardware implementations of Trivium cores are presented [2]. These implementations are all standard cell designs for which the area and/or the number of equivalent NAND gates is reported. The results are given in Table 1. Our design also contains a standard cell core. However, in addition to this reference core, we present a custom design of Trivium. The details of these cores are given in Sect. 4, which also reports on the area of the cores.
4 Two Trivium Cores

4.1 Standard Cell Core
The standard cell core was designed using automatic standard cell placement and routing in L-Edit, a tool from Tanner for physical lay-out. To limit the number of I/O pins on the chip, the loading of the key and the IV in the initialization phase is done in a bit-serial manner. To accommodate this, three multiplexors are added to the design as depicted in Fig. 2.
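The effect of the three multiplexers can be sketched in software. The exact wiring of Fig. 2 is not reproduced here, so the following is a hypothetical model: it assumes one multiplexer in front of each of the three sub-registers (93, 84 and 111 bits), one serial input per sub-register, and a load phase of 111 clock cycles. The order in which bits are shifted in, the constant bits at positions 286-288 and all function names are likewise assumptions for illustration.

```python
REG_LENGTHS = (93, 84, 111)  # the three sub-registers of the 288-bit state


def mux(load, serial_bit, feedback_bit):
    """2-to-1 multiplexer: serial input while loading, feedback otherwise."""
    return serial_bit if load else feedback_bit


def parallel_load(key_bits, iv_bits):
    """Reference: the initial state as defined for a parallel load."""
    s = [0] * 288
    s[0:80] = key_bits
    s[93:173] = iv_bits
    s[285:288] = [1, 1, 1]
    return s


def serial_load(key_bits, iv_bits):
    """Shift the initial state in bit-serially through the multiplexers."""
    regs = [[0] * n for n in REG_LENGTHS]
    # The first bit shifted in ends up at the far end of a register, so each
    # stream starts with the bits for the highest positions.
    streams = [
        [0] * 13 + key_bits[::-1],   # positions 93..81 are 0, then K80..K1
        [0] * 4 + iv_bits[::-1],     # positions 177..174 are 0, then IV80..IV1
        [1, 1, 1] + [0] * 108,       # positions 288..286 are 1, the rest 0
    ]
    n_cycles = max(REG_LENGTHS)
    # Pad the shorter streams in front; the padding bits fall off the end.
    streams = [[0] * (n_cycles - len(st)) + st for st in streams]
    for cycle in range(n_cycles):
        for reg, stream in zip(regs, streams):
            # Feedback input is irrelevant while load = 1 (tied to 0 here).
            bit_in = mux(1, stream[cycle], 0)
            reg.insert(0, bit_in)    # shift by one position
            reg.pop()
    return regs[0] + regs[1] + regs[2]
```

After the load phase, the multiplexers would select the feedback inputs and the core would run the warm-up updates as usual.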
Table 1. Overview of hardware resources for the implementation of Trivium reported on the eSTREAM Phase 3 website [2]. The equivalent number of NAND gates for the TSMC library was calculated using [3].

Authors              Area (µm²)   Equivalent NAND gates   Technology            Radix
Gürkaynak et al.     144128       -                       0.25 µm CMOS UMC      64
Feldhofer            169950       3090                    0.35 µm               16
Gaj et al.           7428         3068                    90 nm TCBN90G TSMC    1
Gaj et al.           13440        5551                    90 nm TCBN90G TSMC    64
Good and Benaissa    -            2599                    0.13 µm               1
Good and Benaissa    -            2660                    0.13 µm               4
Good and Benaissa    -            2801                    0.13 µm               8
Good and Benaissa    -            3185                    0.13 µm               16
Good and Benaissa    -            3787                    0.13 µm               32
Good and Benaissa    -            4921                    0.13 µm               64
4.2 Dynamic Core Using C²MOS Flip-Flops
The second core has the same architecture as depicted in Fig. 2. However, it uses dynamic instead of static logic for the logic gates in the design. Instead of standard CMOS flip-flops, the dynamic core uses C²MOS flip-flops to store the internal state. The next two paragraphs describe the components of the dynamic core.

Fig. 2. Architecture of the standard cell Trivium design.

Dynamic logic. Dynamic logic gates are constructed with a single Pull-Up Network (PUN) or Pull-Down Network (PDN), whereas static CMOS gates contain both a PUN and a complementary PDN. To enable a dynamic gate to provide a logical 1 as well as a logical 0, two clocked transistors are added. These transistors are fed with a clock signal such that each period of the clock consists of a precharge phase followed by an evaluation phase. In the precharge phase the output is precharged to 0 (1). In the evaluation phase the output either remains 0 (1) or is (dis)charged to 1 (0). Fig. 3 depicts a NAND gate in static CMOS logic and dynamic logic.

If the Boolean equation of a gate is written as an inverted function, the number of transistors in a static CMOS implementation is 2N, where N is the number of literals in the function. An advantage of dynamic logic is that the number of transistors is only N+2. Especially for large gates, this results in a significant decrease in the number of transistors. Another advantage is that the capacitive input load of a dynamic gate is smaller than that of the corresponding static CMOS gate, resulting in a higher speed. The reason is that in dynamic logic only a single nMOS or pMOS transistor needs to be driven per literal in the Boolean function, while in static CMOS both an nMOS and a pMOS transistor need to be driven.

Fig. 3. Transistor schemes of a static CMOS NAND gate (left), a dynamic NAND gate with PDN (middle) and a dynamic NAND gate with PUN (right).

A disadvantage of dynamic logic is that automatic placement and routing is difficult to employ. The reason is that there are timing restrictions that prevent straightforward cascading of logic gates. The following restrictions need to be taken into account:

- Two cascaded dynamic logic gates that evaluate on the same clock level cannot both have a PUN or both have a PDN.
- When the output of a dynamic logic gate that evaluates on a high (low) clock level is connected to a dynamic logic gate that evaluates on a low (high) clock level, a transition block needs to be inserted. This transition block is depicted in Fig. 4 for a transition of a signal from a dynamic logic gate evaluating on a low clock level to a dynamic logic gate evaluating on a high clock level.
Fig. 4. Transistor scheme of a transition block to be inserted in between a dynamic gate with evaluation on a low clock level and a dynamic gate with evaluation on a high clock level.
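The transistor counts and the precharge/evaluate behaviour described above can be made concrete with a small behavioural sketch. It is only a logic-level model, not a circuit simulation. The class and function names are hypothetical, and the model assumes the PDN variant of the dynamic NAND gate, precharging its output to 1 while the clock is low and evaluating while the clock is high.

```python
def static_cmos_transistors(n_literals):
    """Static CMOS: PUN plus complementary PDN, 2N transistors."""
    return 2 * n_literals


def dynamic_transistors(n_literals):
    """Dynamic logic: one network plus two clocked transistors, N + 2."""
    return n_literals + 2


class DynamicNandPDN:
    """Logic-level model of a dynamic 2-input NAND gate with a PDN.

    Assumption: precharge to 1 on a low clock level, evaluate on a high one.
    """

    def __init__(self):
        self.out = 1

    def tick(self, clk, a, b):
        if clk == 0:
            self.out = 1          # precharge phase: output pulled to 1
        elif a and b:
            self.out = 0          # evaluation: the PDN conducts and discharges
        # Otherwise the output node floats and keeps its precharged value.
        return self.out


if __name__ == "__main__":
    for n in (2, 4, 8):
        print(n, static_cmos_transistors(n), dynamic_transistors(n))
```

For a 2-input gate both styles need 4 transistors in this count, while for an 8-literal function the count drops from 16 to 10, which is why the saving matters mainly for large gates.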
Another drawback of dynamic logic is the fact that the output node of a gate is floating when there is no (dis)charge during the evaluation phase. Therefore, the clock frequency not only has an upper bound caused by the delay of the critical path, but also a lower bound that depends on the leakage of the signal on the floating output node. In practical applications, the lower bound on the clock frequency is usually assumed to be around 1 MHz. In our custom design of Trivium, we implement the XOR gates, NAND gates and multiplexors using dynamic logic. The flip-flops are described in the next paragraph.
C²MOS flip-flop. A C²MOS flip-flop consists of two transition blocks that pass the signal on different clock levels. A rising edge C²MOS flip-flop is depicted in Fig. 5. A falling edge C²MOS flip-flop consists of the same two transition blocks in reverse order. Whereas a standard cell flip-flop consists of about 8 equivalent NAND gates or 32 transistors, a C²MOS flip-flop consists of only 8 transistors.

Fig. 5. Transistor scheme of a C²MOS flip-flop.

However, there are some drawbacks to C²MOS flip-flops:

- When the clock level is high in Fig. 5, the node in between the two transition blocks is floating. This poses a lower bound on the clock frequency in the same way as for dynamic logic gates.
- In order to prevent a race condition in which the signal passes through both stages of the C²MOS flip-flop at once, the clock edges need to be steep enough, i.e. the rise and fall times of the clock need to be sufficiently short.

In our custom design of Trivium, we store the internal state using 288 C²MOS flip-flops. This significantly reduces the area of the design, as shown in the next section.

4.3 Comparison of the Area of the Cores
Fig. 6 shows the lay-outs of the standard cell core and the dynamic core in a 0.35 µm AMIS technology. The area of the standard cell core is estimated at 108900 µm², while the custom design has a significantly smaller estimated area of 40425 µm². This corresponds to an equivalent number of NAND gates of 2017 for the standard cell design and 749 for the custom design.

When comparing these core sizes to the results in Table 1, we notice that only the designs of Gaj et al. have a smaller die area. However, this is an unfair comparison, since these designs were realized in a 90 nm technology while our design uses a 0.35 µm technology. A fairer comparison can be made based on the equivalent number of NAND gates. The designs that report this metric all show a larger core size, even in comparison to our standard cell design. The reason for this could be that the key and IV are loaded serially in our design, which takes away the need for multiplexors compared to a design that loads the key and IV in parallel. However, the most useful comparison is the one between our standard cell and custom designs, which shows that the custom design decreases the area by a factor of more than 2.
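The figures quoted above can be cross-checked with a few lines of arithmetic. The sketch below only uses numbers stated in this paper (the two area estimates, the two NAND gate equivalents, 288 flip-flops, and about 32 versus 8 transistors per flip-flop); the constant and function names are chosen here for illustration.

```python
STD_AREA_UM2 = 108900      # estimated area of the standard cell core
CUSTOM_AREA_UM2 = 40425    # estimated area of the custom (dynamic) core
STD_NAND = 2017            # equivalent NAND gates, standard cell core
CUSTOM_NAND = 749          # equivalent NAND gates, custom core
N_FLIPFLOPS = 288          # state bits of Trivium
STD_FF_TRANSISTORS = 32    # approximate, standard cell flip-flop
C2MOS_FF_TRANSISTORS = 8   # C2MOS flip-flop


def area_ratio():
    """Area reduction factor of the custom design."""
    return STD_AREA_UM2 / CUSTOM_AREA_UM2


def area_per_nand(area_um2, nand_equivalents):
    """Area of one equivalent NAND gate implied by the reported figures."""
    return area_um2 / nand_equivalents


def flipflop_transistor_saving():
    """Transistors saved in the state register alone."""
    return N_FLIPFLOPS * (STD_FF_TRANSISTORS - C2MOS_FF_TRANSISTORS)


if __name__ == "__main__":
    print(f"area ratio: {area_ratio():.2f}")
    print(f"NAND ratio: {STD_NAND / CUSTOM_NAND:.2f}")
    print(f"um2 per NAND (std):    {area_per_nand(STD_AREA_UM2, STD_NAND):.1f}")
    print(f"um2 per NAND (custom): {area_per_nand(CUSTOM_AREA_UM2, CUSTOM_NAND):.1f}")
    print(f"flip-flop transistors saved: {flipflop_transistor_saving()}")
```

Both ratios come out at roughly 2.7, consistent with the stated factor of more than 2, and both designs imply the same area of about 54 µm² per equivalent NAND gate.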
Fig. 6. Lay-out of the standard cell core (top) and the dynamic core (bottom) in a 0.35 µm AMIS technology.
5 Conclusions and Future Work
This paper described the implementation of two Trivium cores in different design styles. The cores are currently being manufactured on a single chip. The sizes of the cores are compared based on their lay-outs. The custom design shows a significant decrease in die area compared to the standard cell design. Upon production, the functionality of the chip will be tested. Moreover, the minimal and maximal operating frequency will be determined for both cores, as well as the allowable rise and fall time of the clock.
References
1. 0.35 micron AMIS technology. http://www.amis.com/pdf/standard cell/sc3 fs.pdf, 2008.
2. eSTREAM Phase 3: Trivium. http://www.ecrypt.eu.org/stream/triviump3.html, 2008.
3. TSMC. TSMC standard cell libraries. http://www.cadence.com/datasheets/4456 TSMC SC ds.pdf, 2008.
