Finite Field Polynomial Multiplier With Linear Feedback Shift Register

Tamkang Journal of Science and Engineering, Vol. 10, No. 3, pp. 253-264 (2007)

Che-Wun Chiou¹*, Chiou-Yng Lee² and Jim-Min Lin³
¹ Department of Computer Science and Information Engineering, Ching Yun University, Chung-Li, Taiwan 320, R.O.C.
² Department of Computer Information and Network Engineering, Lung Hwa University of Science & Technology, Taoyuan, Taiwan 333, R.O.C.
³ Department of Information Engineering and Computer Science, Feng Chia University, Taichung, Taiwan 407, R.O.C.
Abstract
We present a one-dimensional polynomial basis array multiplier for performing multiplications in the finite field $GF(2^m)$. A linear feedback shift register is employed in the proposed multiplier to reduce space complexity. Compared with existing two-dimensional polynomial basis multipliers, the proposed linear array multiplier reduces the space complexity from $O(m^2)$ to $O(m)$. A new two-dimensional systolic array version of the proposed array multiplier is also included in this paper. The proposed two-dimensional systolic array multiplier saves about 30% of space complexity and 27% of time complexity compared with other two-dimensional systolic array multipliers.

Key Words: Finite Field, Multiplication, Polynomial Basis, Systolic Array, Cryptography
1. Introduction
Arithmetic operations in a finite field play an increasingly important role in error-correcting codes [1], cryptography [2], digital signal processing [3,4], and pseudorandom number generation [5]. The two principal arithmetic operations over finite fields are addition and multiplication. Addition is simple, whereas multiplication requires more computation time and higher circuit complexity. Many other complex arithmetic operations, such as exponentiation, division, and multiplicative inversion, can be performed by applying multiplication repeatedly. Hence, it is of practical importance to develop fast multiplication algorithms for these operations. In recent years, the realization of multiplication in finite fields has received wide attention, and several approaches have been presented [6-36].

The complexity of implementing multiplication depends on the representation of the field elements. There are three main types of bases over $GF(2^m)$: normal basis (NB), dual basis (DB), and polynomial basis (PB). The major advantage of NB multipliers [6-8] is that the squaring of an element can be computed simply by a cyclic shift of its binary representation. Thus, NB multipliers can be applied very effectively to inversion, squaring, and exponentiation. DB multipliers [9-13] require less chip area than the other two types. However, NB and DB multipliers need basis conversion, while PB multipliers do not [36]. The polynomial basis representation has been widely used and has led to many efficient multiplier implementations. Compared with multipliers in the other two bases, PB multipliers have lower design complexity, and their sizes can easily be extended to meet various applications because of their simple, regular, and modular architecture.

*Corresponding author. E-mail: cwchiou@cyu.edu.tw
Numerous architectures for PB multipliers have been presented [14-35]. The first parallel PB multiplier was suggested by Bartee and Schneider [14]. PB multiplication in $GF(2^m)$ is often accomplished in two steps: polynomial multiplication and modular reduction. In practice, both steps are usually combined for performance reasons. Mastrovito [15,16] first proposed an architecture that performs the combined operation. More recently, several bit-parallel PB multipliers have been proposed for VLSI implementation using specific classes of polynomials, such as trinomials [17-23], all-one polynomials (AOP) and equally spaced polynomials (ESP) [24-26], and composite fields [27,28]. Yet these architectures still have shortcomings for cryptographic applications because of their high circuit complexity and long latency. As the size of the finite field grows, the design of modular multipliers requires much more attention.

To alleviate the long latency problem, most existing PB multipliers employ XOR trees to minimize time complexity. Unfortunately, these circuits are not well suited to VLSI systems because XOR trees are irregular and nonmodular. To overcome this problem, Lee [22] proposed a regular and modular PB multiplier using irreducible trinomials with a space complexity of $O(m^2)$ and a time complexity of $O(m)$. This multiplier can be easily extended and implemented using VLSI technologies.

In this article, we present a linear parallel-in parallel-out PB array multiplier using general irreducible polynomials with a linear feedback shift register. The proposed PB multiplier requires a space complexity of $O(m)$. To demonstrate that the proposed multiplier is superior to existing two-dimensional systolic array multipliers, a new two-dimensional systolic array version of the multiplier is also presented. We show that the proposed two-dimensional systolic array multiplier saves both space and time complexity compared with other existing two-dimensional systolic array multipliers.

The organization of this paper is as follows. In Section 2, we provide some basic definitions and preliminaries. In Section 3, we derive the one-dimensional parallel-in parallel-out PB multiplication algorithm using general irreducible polynomials and a linear feedback shift register. The two-dimensional systolic array version of the proposed algorithm is then described in Section 4. The space and time complexities are discussed in Section 5. Finally, a brief conclusion is given in Section 6.
2. Preliminaries
It is assumed that the reader is familiar with the basic concepts of finite fields, whose properties are covered in detail in [1,2]. They are reviewed here only as far as required in the following paragraphs. The finite field $GF(2^m)$ can be viewed as a vector space of dimension $m$ over $GF(2)$. Suppose that $GF(2^m)$ is generated by the irreducible polynomial

$$P(x) = p_0 + p_1 x + \cdots + p_{m-1} x^{m-1} + x^m$$

of degree $m$ over $GF(2)$, where $p_0 = 1$. Then any element $A$ in $GF(2^m)$ can be represented as $A(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_{m-1} x^{m-1}$, where $x$ is an indeterminate over $GF(2)$. The basis $\{1, x, x^2, \ldots, x^{m-1}\}$ is known as the standard basis and is often referred to as the polynomial basis, conventional basis, or canonical basis. Since $P(x) = 0$, the identity $x^m = p_0 + p_1 x + \cdots + p_{m-1} x^{m-1}$ can be used to reduce any high-order term $x^p$, $p \ge m$, to a polynomial of degree less than $m$. Thus, $xB(x) \bmod P(x)$ can be reduced as follows:

$$xB(x) \bmod P(x) = b_0 x + b_1 x^2 + \cdots + b_{m-1} x^m \bmod P(x) = b_{m-1} p_0 + (b_{m-1} p_1 + b_0) x + \cdots + (b_{m-1} p_{m-1} + b_{m-2}) x^{m-1}.$$

Let

$$B(x)^{(1)} = xB(x) \bmod P(x). \qquad (1)$$

Therefore, $x^i B(x) \bmod P(x)$ can be obtained from the recurrence

$$B(x)^{(i)} = xB(x)^{(i-1)} \bmod P(x).$$

Note that $B(x)^{(0)} = B(x)$. Let the PB representation of $B(x)^{(i)}$ be

$$B(x)^{(i)} = b_{i,0} + b_{i,1} x + b_{i,2} x^2 + b_{i,3} x^3 + \cdots + b_{i,m-1} x^{m-1}, \qquad (2)$$

where $b_{i,j} \in \{0,1\}$ for $0 \le j \le m-1$.

According to $x^m = p_0 + p_1 x + p_2 x^2 + \cdots + p_{m-1} x^{m-1}$, the relation between $B(x)^{(i+1)}$ and $B(x)^{(i)}$ is as follows:

$$\begin{aligned}
B(x)^{(i+1)} &= b_{i+1,0} + b_{i+1,1} x + b_{i+1,2} x^2 + \cdots + b_{i+1,m-1} x^{m-1} \\
&= xB(x)^{(i)} \\
&= x\,(b_{i,0} + b_{i,1} x + b_{i,2} x^2 + b_{i,3} x^3 + \cdots + b_{i,m-1} x^{m-1}) \\
&= b_{i,0} x + b_{i,1} x^2 + b_{i,2} x^3 + b_{i,3} x^4 + \cdots + b_{i,m-2} x^{m-1} + b_{i,m-1} x^m \\
&= b_{i,0} x + b_{i,1} x^2 + b_{i,2} x^3 + b_{i,3} x^4 + \cdots + b_{i,m-2} x^{m-1} + b_{i,m-1}(p_0 + p_1 x + p_2 x^2 + \cdots + p_{m-1} x^{m-1}) \\
&= b_{i,m-1} p_0 + (b_{i,0} + b_{i,m-1} p_1) x + (b_{i,1} + b_{i,m-1} p_2) x^2 + \cdots + (b_{i,m-2} + b_{i,m-1} p_{m-1}) x^{m-1}. \qquad (3)
\end{aligned}$$
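The single multiplication by $x$ in Eq. (3) can be sketched in software as a shift followed by a conditional XOR. The following is a minimal sketch, not part of the original paper: the function name `mul_x` and the convention of packing coefficients into an integer (bit $j$ holds the coefficient of $x^j$, and `p` includes the $x^m$ term) are assumptions made for illustration.

```python
def mul_x(b: int, p: int, m: int) -> int:
    """Return x * B(x) mod P(x) over GF(2), following Eq. (3).

    Assumed encoding (illustrative): bit j of `b` is b_j, and `p` holds
    p_0..p_{m-1} in its low m bits plus the leading x^m term in bit m.
    """
    mask = (1 << m) - 1
    top = (b >> (m - 1)) & 1          # b_{m-1}, the coefficient that becomes x^m
    shifted = (b << 1) & mask         # b_0 x + b_1 x^2 + ... + b_{m-2} x^{m-1}
    # Replace x^m by p_0 + p_1 x + ... + p_{m-1} x^{m-1} when b_{m-1} = 1.
    return shifted ^ ((p & mask) if top else 0)


# P(x) = 1 + x + x^3 + x^4 + x^8, the polynomial used later in Example 1.
P = 0b1_0001_1011
print(hex(mul_x(0x57, P, 8)))  # no reduction needed: plain left shift
print(hex(mul_x(0x80, P, 8)))  # x^7 * x = x^8 is reduced to 1 + x + x^3 + x^4
```

Applying `mul_x` repeatedly yields $B(x)^{(1)}, B(x)^{(2)}, \ldots$ in turn, which is the recurrence that the hardware shift register carries out one clock cycle at a time.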
3. The Proposed Multiplier with Linear Feedback Shift Register

Let $A(x)$ and $B(x)$ be any two elements in $GF(2^m)$, and let $C(x)$ be their product in $GF(2^m)$, i.e., $C(x) = A(x)B(x) \bmod P(x)$. By Horner's rule, the product can be obtained as

$$\begin{aligned}
C(x) &= A(x)B(x) \bmod P(x) \\
&= a_0 B(x) + a_1 xB(x) + a_2 x^2 B(x) + a_3 x^3 B(x) + \cdots + a_{m-1} x^{m-1} B(x) \\
&= a_0 B(x) + a_1 (x(B(x))) + a_2 (x(xB(x))) + a_3 (x(x^2 B(x))) + \cdots + a_{m-1} (x(x^{m-2} B(x))) \\
&= a_0 B(x)^{(0)} + a_1 B(x)^{(1)} + a_2 B(x)^{(2)} + a_3 B(x)^{(3)} + \cdots + a_{m-1} B(x)^{(m-1)}. \qquad (4)
\end{aligned}$$

Let us define a linear feedback shift register $S$ with $m$ bits (in binary representation: $S_{m-1} S_{m-2} \ldots S_1 S_0$) for realizing Eq. (4) as follows:

$S_0(i+1) = S_{m-1}(i)$, and for $1 \le j \le m-1$,
$S_j(i+1) = S_{j-1}(i)$ if $p_j = 0$,
$S_j(i+1) = S_{j-1}(i) \oplus S_{m-1}(i)$ if $p_j = 1$,

where $S_j(i)$ denotes the content of bit $j$ of $S$ (i.e., $S_j$) at clock cycle $i$. Based on Eqs. (2) and (4), each coefficient of $C(x)$ is computed as follows:

$$c_j = \sum_{i=0}^{m-1} a_i b_{i,j} \quad \text{for } 0 \le j \le m-1,$$

or

$$c_j = \sum_{i=0}^{m-1} a_i S_j(i) \quad \text{for } 0 \le j \le m-1.$$

The cell $U_j$ is responsible for accumulating the coefficient $c_j$ ($0 \le j \le m-1$). Another shift register $E$ with $m$ bits, $E_{m-1} E_{m-2} \ldots E_1 E_0$, is used for storing and rotating $A(x)$ and is defined as follows:

$E_{m-1}(i+1) = E_0(i)$, and $E_j(i+1) = E_{j+1}(i)$ for $0 \le j \le m-2$.

The notation $E_j(i)$ denotes the value of bit $E_j$ of the register $E$ at clock cycle $i$. Both registers $S$ and $E$ are initially loaded in parallel with $B(x)$ and $A(x)$ in the following manner:

$$S_j(0) = b_j \text{ and } E_j(0) = a_j, \quad \text{for } 0 \le j \le m-1.$$

The following example is used to describe the hardware implementation of the proposed linear array multiplier structure.

Example 1: The irreducible polynomial $P(x) = 1 + x + x^3 + x^4 + x^8$ is used here to describe the hardware implementation of the proposed array multiplier, which is shown in Figure 1. The 8-bit linear feedback shift register $S$ is shifted right by one bit in each clock cycle. The 8-bit shift register $E$ is rotated clockwise by one bit in each clock cycle. Their functions are defined as follows:

(1) $S$: $S_0(i+1) = S_7(i)$, $S_1(i+1) = S_0(i) \oplus S_7(i)$, $S_2(i+1) = S_1(i)$, $S_3(i+1) = S_2(i) \oplus S_7(i)$, $S_4(i+1) = S_3(i) \oplus S_7(i)$, $S_5(i+1) = S_4(i)$, $S_6(i+1) = S_5(i)$, $S_7(i+1) = S_6(i)$.

(2) $E$: $E_7(i+1) = E_0(i)$, and $E_j(i+1) = E_{j+1}(i)$ for $0 \le j \le 6$.
Figure 1. Hardware implementation of $C(x) = A(x)B(x) \bmod (1 + x + x^3 + x^4 + x^8)$.

The shift registers $S$ and $E$ are initially loaded with $B(x)$ and $A(x)$ as follows: $S_j(0) = b_j$ and $E_j(0) = a_j$ for $0 \le j \le 7$. The detailed circuit of each cell $U_j$ is shown in Figure 2. The cell $U_j$ realizes the following function: $v_{out} = h_{in} v_{1in} \oplus v_{2in}$ and $h_{out} = h_{in}$. That is, in each clock cycle the cell adds (XORs) the product $E_0(i) S_j(i)$ to the value held in its latch. The output $v_{out}$ is computed and the result is then stored in the 1-bit latch in each clock cycle. All 1-bit latches in the $U$ cells are initially reset to 0. The symbol L in Figure 2 represents a 1-bit latch. The detailed circuits for cells $S_j$ and $E_j$ can be found in Figure 3 and Figure 4, respectively. The shift registers $S$ and $E$ can be loaded in parallel. The procedure for computing $C(x) = A(x)B(x)$ in Figure 1 is described in the Appendix.
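The behaviour of the three register groups (the feedback register $S$, the rotating register $E$, and the accumulating cells $U_j$) can be modelled cycle by cycle in software. The sketch below is a behavioural model under stated assumptions, not the authors' circuit: the function names and the little-endian bit-list encoding (index $j$ holds the coefficient of $x^j$) are chosen for illustration only.

```python
def to_bits(value: int, m: int) -> list[int]:
    """Little-endian bit list: element j is the coefficient of x^j."""
    return [(value >> j) & 1 for j in range(m)]


def from_bits(bits: list[int]) -> int:
    return sum(bit << j for j, bit in enumerate(bits))


def lfsr_multiply(a_bits: list[int], b_bits: list[int], p_bits: list[int]) -> list[int]:
    """Model of the linear array: returns the coefficients c_0..c_{m-1}.

    p_bits holds p_0..p_{m-1} (the x^m term is implicit).
    """
    m = len(p_bits)
    S = list(b_bits)   # S_j(0) = b_j
    E = list(a_bits)   # E_j(0) = a_j
    U = [0] * m        # all latches reset to 0
    for _ in range(m):                       # clock cycles 0 .. m-1
        e0 = E[0]                            # E_0(i) = a_i, broadcast to every U_j
        U = [u ^ (e0 & s) for u, s in zip(U, S)]
        fb = S[m - 1]                        # feedback bit S_{m-1}(i)
        # S_0(i+1) = S_{m-1}(i) p_0; S_j(i+1) = S_{j-1}(i) XOR S_{m-1}(i) p_j
        S = [fb & p_bits[0]] + [S[j - 1] ^ (fb & p_bits[j]) for j in range(1, m)]
        E = E[1:] + E[:1]                    # E_j(i+1) = E_{j+1}(i), E_{m-1}(i+1) = E_0(i)
    return U


# Example 1 polynomial: p_0..p_7 of P(x) = 1 + x + x^3 + x^4 + x^8.
p_bits = to_bits(0x1B, 8)
c = lfsr_multiply(to_bits(0x57, 8), to_bits(0x83, 8), p_bits)
print(hex(from_bits(c)))
```

The model uses $m$ clock cycles for one product and $O(m)$ state bits, which mirrors the latency and space figures claimed for the linear array.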
Figure 2. The detailed circuit of the cell Uj.
Figure 3. The detailed circuit of the cell Sj.

Figure 4. The detailed circuit of the cell $E_j$.

4. Implementation with Semi-Systolic Two-Dimensional Array

A semi-systolic two-dimensional array implementation of the proposed array multiplier structure is discussed in this section. The results in Eqs. (2) and (4) are restated as follows:

$$C(x) = a_0 B(x)^{(0)} + a_1 B(x)^{(1)} + a_2 B(x)^{(2)} + a_3 B(x)^{(3)} + \cdots + a_{m-1} B(x)^{(m-1)},$$

and $B(x)^{(i)}$ for $0 \le i \le m-1$ is represented by

$$B(x)^{(i)} = b_{i,0} + b_{i,1} x + b_{i,2} x^2 + b_{i,3} x^3 + \cdots + b_{i,m-1} x^{m-1},$$

where $b_{i,j} \in \{0,1\}$ for $0 \le j \le m-1$. The initial value of $B(x)^{(i)}$ is assigned as follows:

$$B(x)^{(0)} = B(x),$$

and the relation between the coefficients of $B(x)^{(i+1)}$ and $B(x)^{(i)}$ is as follows:

$$b_{i+1,0} = b_{i,m-1} p_0, \qquad b_{i+1,j} = b_{i,j-1} + b_{i,m-1} p_j \quad \text{for } 1 \le j \le m-1. \qquad (5)$$

Each coefficient of the product $C(x) = \sum_{j=0}^{m-1} c_j x^j$ is computed as follows:

$$c_j = \sum_{i=0}^{m-1} a_i b_{i,j} \quad \text{for } 0 \le j \le m-1,$$

or

$$\begin{aligned}
c_j &= a_0 b_{0,j} + a_1 b_{1,j} + a_2 b_{2,j} + a_3 b_{3,j} + a_4 b_{4,j} + \cdots + a_{m-1} b_{m-1,j} \\
&= a_0 b_{0,j} + a_1 (b_{0,j-1} + b_{0,m-1} p_j) + a_2 (b_{1,j-1} + b_{1,m-1} p_j) + a_3 (b_{2,j-1} + b_{2,m-1} p_j) + \cdots + a_{m-1} (b_{m-2,j-1} + b_{m-2,m-1} p_j) \\
&= a_0 b_j + (a_1 b_{j-1} + a_1 b_{0,m-1} p_j) + (a_2 b_{1,j-1} + a_2 b_{1,m-1} p_j) + (a_3 b_{2,j-1} + a_3 b_{2,m-1} p_j) + \cdots + (a_{m-1} b_{m-2,j-1} + a_{m-1} b_{m-2,m-1} p_j). \qquad (6)
\end{aligned}$$

For $j = 0$ the terms $b_{i,-1}$ are taken as 0, in agreement with Eq. (5). The following algorithm can be used for computing the coefficient $c_j$ based on Eq. (6).

Algorithm A: (Using traditional method)
    c_j := 0; b_{-1,j-1} := b_j; b_{-1,m-1} := 0;
    For i = 0 to m-1
    Begin
        c_j := c_j + a_i b_{i-1,j-1};
        c_j := c_j + a_i b_{i-1,m-1} p_j;
    End

If Algorithm A is realized in hardware, a propagation delay of one AND gate and two XOR gates is needed. To shorten this propagation delay, a parallel version of Algorithm A, Algorithm B, is given as follows.

Algorithm B: (Using parallel method)
    z_{-1} := b_j; b_{-1,m-1} := b_{m-1};
    For i = 0 to m-1
    Begin
        Cobegin
            z_i := z_{i-1} + b_{i-1,m-1} p_j;
            c_j := c_j + a_i z_{i-1};
        Coend
    End

Based on Eqs. (5) and (6) and Algorithm B, the semi-systolic two-dimensional array for realizing the product $C(x) = A(x)B(x)$ is shown in Figure 5. The circuit for the processing element $V_{i,j}$ is shown in Figure 6.
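The data flow of the two-dimensional array can be illustrated by first building the full $m \times m$ table of coefficients $b_{i,j}$ with Eq. (5) and then forming each $c_j$ as a column sum, as in Eq. (6). The sketch below is a software model of that data flow only and does not reproduce the cell timing of Figure 5. The names `rows_by_eq5`, `array_multiply`, and `reference_multiply` are illustrative assumptions, and the reference routine uses the conventional two-step method (polynomial multiplication followed by modular reduction) as an independent check.

```python
def rows_by_eq5(b: int, p: int, m: int) -> list[list[int]]:
    """rows[i][j] = b_{i,j}, the coefficients of x^i * B(x) mod P(x), via Eq. (5)."""
    p_bits = [(p >> j) & 1 for j in range(m)]      # p_0 .. p_{m-1}
    row = [(b >> j) & 1 for j in range(m)]         # b_{0,j} = b_j
    rows = [row]
    for _ in range(m - 1):
        top = row[m - 1]                           # b_{i,m-1}
        row = [top & p_bits[0]] + [row[j - 1] ^ (top & p_bits[j]) for j in range(1, m)]
        rows.append(row)
    return rows


def array_multiply(a: int, b: int, p: int, m: int) -> int:
    """c_j = sum over i of a_i * b_{i,j} (mod 2), one column of the array per j."""
    rows = rows_by_eq5(b, p, m)
    c = 0
    for j in range(m):
        c_j = 0
        for i in range(m):
            c_j ^= ((a >> i) & 1) & rows[i][j]
        c |= c_j << j
    return c


def reference_multiply(a: int, b: int, p: int, m: int) -> int:
    """Two-step check: carry-less product, then reduction by P(x) (bit m of p set)."""
    prod = 0
    for i in range(m):
        if (a >> i) & 1:
            prod ^= b << i
    for d in range(2 * m - 2, m - 1, -1):          # clear degrees 2m-2 down to m
        if (prod >> d) & 1:
            prod ^= p << (d - m)
    return prod


P = 0x11B  # 1 + x + x^3 + x^4 + x^8
print(hex(array_multiply(0x57, 0x83, P, 8)), hex(reference_multiply(0x57, 0x83, P, 8)))
```

Storing all $m$ rows at once corresponds to the $O(m^2)$ cells of the two-dimensional array, in contrast to the single row kept by the linear feedback shift register.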
Figure 5. The proposed semi-systolic two-dimensional systolic array over $GF(2^m)$.

Figure 6. The detailed circuit for the cell $V_{i,j}$.

5. Complexity

In CMOS VLSI technology, a 2-input AND gate, a 2-input XOR gate, and a 1-bit latch are composed of 6, 6, and 8 transistors, respectively [37]. Suppose that a 3-input XOR gate and a 4-input XOR gate are constructed from two 2-input XOR gates and three 2-input XOR gates, respectively. Thus, the propagation delays through a 3-input XOR gate and a 4-input XOR gate would be the same. A comparison of space and area-time complexities of various PB bit-parallel multipliers is given in Table 1.

Table 1. Comparison of various PB bit-parallel multipliers

| Multiplier | Generating polynomial | Function | Space complexity: gate count | Space complexity: transistor count | Latency | Area-time complexity (ns) |
|---|---|---|---|---|---|---|
| Yeh et al. [34] | General form | AB + C | #AND2: $2m^2$; #XOR2: $2m^2$; #L: $7m^2$ | $80m^2$ | $3m$ | $7680m^3$ |
| Wang-Lin [31] | General form | AB + C | #AND2: $2m^2$; #XOR3: $m^2$; #L: $7m^2$ | $76m^2$ | $3m$ | $10032m^3$ |
| Wei [33] | General form | AB + C | #AND2: $3m^2$; #XOR2: $m^2$; #XOR3: $m^2$; #L: $4m^2$ | $68m^2$ | $m$ | $2992m^3$ |
| Lee [22] | Trinomials | AB + C | #AND2: $m^2$; #XOR2: $m^2 + m - 1$; #L: $3m^2 - 2m - 2$ | $36m^2 + 24m - 24$ | $2m - 1$ | $2304m^3$ |
| Our proposal in Fig. 1 | General form | AB + C | #AND2: $7m$; #XOR2: $m + k$; #L: $3m$ | $72m + 6k$ | $m$ | $2304m^2$ |
| Our proposal in Fig. 5 | General form | AB + C | #AND2: $2m^2$; #XOR2: $2m^2$; #L: $3m^2$ | $48m^2$ | $m$ | $1536m^3$ |

Suppose that the generating polynomial $P(x)$ has $k$ terms. Most existing PB multipliers using XOR binary trees require a space complexity of $O(m^2)$ and a time complexity of $O(\log_2 m)$ [24,25]. However, such multipliers are irregular and therefore not suitable for VLSI implementation because of their tree structures. To overcome this problem, many systolic array structures, which are regular and modular and thus well suited to VLSI implementation, have been presented. However, most existing systolic array multipliers need a space complexity of $O(m^2)$. Our proposed linear systolic array multiplier in Figure 1, using an irreducible polynomial, requires a space complexity of only $O(m)$. Two-dimensional systolic array multipliers are nevertheless useful when many successive multiplications have to be performed, as in the case of exponentiation. Thus, a two-dimensional systolic array version of the proposed multiplier is shown in Figure 5. Compared with the multiplier proposed by Wei [33], the proposed two-dimensional semi-systolic array multiplier in Figure 5 saves about 30% of space complexity.

Comparisons of the time complexities of various PB bit-parallel multipliers are given in Table 2. Let $T_A$, $T_X$, $T_L$, and $T_{3X}$ represent the gate delays of a 2-input AND gate, a 2-input XOR gate, a 1-bit latch, and a 3-input XOR gate, respectively. We assume that real circuits such as the M74HC86 (STMicroelectronics, XOR gate, $t_{PD}$ = 12 ns (TYP.)) [38], the M74HC08 (STMicroelectronics, AND gate, $t_{PD}$ = 7 ns (TYP.)) [39], and the M74HC279 (STMicroelectronics, latch, $t_{PD}$ = 13 ns (TYP.)) [40] are employed. The proposed multiplication architectures in Figure 1 and Figure 5 save about 27% of time complexity compared with the multiplier in [33]. Although the proposed multiplier in Figure 5 increases the space complexity compared with Lee's multiplier [22] for all trinomials, it saves about 33% of area-time complexity.
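The percentages quoted above follow from simple arithmetic on the leading coefficients of the transistor counts and total delays. The sketch below reproduces that arithmetic as a check. The dictionary layout and the helper `saving` are illustrative assumptions; for Lee's multiplier only the leading terms ($36m^2$ and $64m$) are used, so lower-order terms are ignored.

```python
# Typical gate delays in ns, as assumed in the text.
T_A, T_X, T_L = 7, 12, 13
T_3X = 2 * T_X  # a 3-input XOR built from two 2-input XOR gates

# (transistor-count coefficient of m^2, total-delay coefficient of m in ns)
designs = {
    "Yeh et al. [34]": (80, 3 * (T_A + T_X + T_L)),
    "Wang-Lin [31]": (76, 3 * (T_A + T_3X + T_L)),
    "Wei [33]": (68, T_A + T_3X + T_L),
    "Lee [22] (leading terms)": (36, 2 * (T_A + T_X + T_L)),
    "Proposed, Fig. 5": (48, T_A + T_X + T_L),
}

# Area-time coefficient of m^3 for each two-dimensional design.
area_time = {name: area * delay for name, (area, delay) in designs.items()}


def saving(ours: float, theirs: float) -> float:
    """Relative saving of `ours` with respect to `theirs`."""
    return 1 - ours / theirs


for name, value in area_time.items():
    print(f"{name}: {value} m^3")
print(f"space saving vs Wei: {saving(48, 68):.1%}")
print(f"time saving vs Wei: {saving(designs['Proposed, Fig. 5'][1], designs['Wei [33]'][1]):.1%}")
print(f"area-time saving vs Lee: {saving(area_time['Proposed, Fig. 5'], area_time['Lee [22] (leading terms)']):.1%}")
```

The linear array of Figure 1 is left out of the dictionary because its transistor count, $72m + 6k$, grows with $m$ rather than $m^2$ and so is not directly comparable on the same coefficient.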
Table 2. Comparisons of time complexities of various PB bit-parallel multipliers

| Multiplier | Generating polynomial | Function | Latency (clock cycles) | Throughput | Propagation delay through one cell | Total propagation delay (ns) |
|---|---|---|---|---|---|---|
| Yeh et al. [34] | General form | AB + C | $3m$ | 1 | $T_A + T_X + T_L$ | $3m(T_A + T_X + T_L)$ ($96m$ ns) |
| Wang-Lin [31] | General form | AB + C | $3m$ | 1 | $T_A + T_{3X} + T_L$ | $3m(T_A + T_{3X} + T_L)$ ($132m$ ns) |
| Wei [33] | General form | AB + C | $m$ | 1 | $T_A + T_{3X} + T_L$ | $m(T_A + T_{3X} + T_L)$ ($44m$ ns) |
| Lee [22] | Trinomials | AB + C | $2m - 1$ | 1 | $T_A + T_X + T_L$ | $(2m-1)(T_A + T_X + T_L)$ ($64m$ ns) |
| Our proposal in Fig. 1 | General form | AB + C | $m$ | $1/m$ | $T_A + T_X + T_L$ | $m(T_A + T_X + T_L)$ ($32m$ ns) |
| Our proposal in Fig. 5 | General form | AB + C | $m$ | 1 | $T_A + T_X + T_L$ | $m(T_A + T_X + T_L)$ ($32m$ ns) |

6. Conclusion

In this study, we have presented a one-dimensional array multiplier for performing multiplications in the finite field $GF(2^m)$ with the PB representation. A linear feedback shift register is employed in the proposed multiplier. The proposed linear array multiplier requires only $O(m)$ space complexity, while existing two-dimensional systolic array multipliers need $O(m^2)$ space complexity. Such a low-complexity multiplier is very attractive for mobile platforms such as PDAs and smart phones. A new two-dimensional systolic array version of the proposed multiplier has also been included. The proposed two-dimensional systolic array multiplier saves about 30% of space complexity and 27% of time complexity compared with the existing two-dimensional systolic array multiplier in [33].
Appendix: Procedure-A

Procedure-A:
/* Let C(x) = A(x)B(x) mod P(x), and */
/* C(x) = c0 + c1x + c2x^2 + c3x^3 + c4x^4 + c5x^5 + c6x^6 + c7x^7, */
/* A(x) = a0 + a1x + a2x^2 + a3x^3 + a4x^4 + a5x^5 + a6x^6 + a7x^7, */
/* B(x) = b0 + b1x + b2x^2 + b3x^3 + b4x^4 + b5x^5 + b6x^6 + b7x^7, */
/* P(x) = 1 + x + x^3 + x^4 + x^8. */

Step 0: Initial condition.
(a) B(x) is loaded into the linear feedback shift register S as follows: Si(0) = bi for 0 ≤ i ≤ 7.
(b) A(x) is loaded into the shift register E as follows: Ei(0) = ai for 0 ≤ i ≤ 7.
(c) All 1-bit latches in cells Ui for 0 ≤ i ≤ 7 are initially reset to zero.

Step 1: At clock cycle 0, cells U0~U7 perform the following operations:
U0 = 0 ⊕ E0(0)S0(0) = a0b0,
U1 = 0 ⊕ E0(0)S1(0) = a0b1,
U2 = 0 ⊕ E0(0)S2(0) = a0b2,
U3 = 0 ⊕ E0(0)S3(0) = a0b3,
U4 = 0 ⊕ E0(0)S4(0) = a0b4,
U5 = 0 ⊕ E0(0)S5(0) = a0b5,
U6 = 0 ⊕ E0(0)S6(0) = a0b6,
U7 = 0 ⊕ E0(0)S7(0) = a0b7.

Step 2: At clock cycle 1, cells U0~U7 perform the following operations:
U0 = U0 ⊕ E0(1)S0(1) = a0b0 ⊕ a1b7,
U1 = U1 ⊕ E0(1)S1(1) = a0b1 ⊕ a1(b0⊕b7),
U2 = U2 ⊕ E0(1)S2(1) = a0b2 ⊕ a1b1,
U3 = U3 ⊕ E0(1)S3(1) = a0b3 ⊕ a1(b2⊕b7),
U4 = U4 ⊕ E0(1)S4(1) = a0b4 ⊕ a1(b3⊕b7),
U5 = U5 ⊕ E0(1)S5(1) = a0b5 ⊕ a1b4,
U6 = U6 ⊕ E0(1)S6(1) = a0b6 ⊕ a1b5,
U7 = U7 ⊕ E0(1)S7(1) = a0b7 ⊕ a1b6.

Step 3: At clock cycle 2, cells U0~U7 perform the following operations:
U0 = U0 ⊕ E0(2)S0(2) = a0b0 ⊕ a1b7 ⊕ a2b6,
U1 = U1 ⊕ E0(2)S1(2) = a0b1 ⊕ a1(b0⊕b7) ⊕ a2(b7⊕b6),
U2 = U2 ⊕ E0(2)S2(2) = a0b2 ⊕ a1b1 ⊕ a2(b0⊕b7),
U3 = U3 ⊕ E0(2)S3(2) = a0b3 ⊕ a1(b2⊕b7) ⊕ a2(b1⊕b6),
U4 = U4 ⊕ E0(2)S4(2) = a0b4 ⊕ a1(b3⊕b7) ⊕ a2(b2⊕b7⊕b6),
U5 = U5 ⊕ E0(2)S5(2) = a0b5 ⊕ a1b4 ⊕ a2(b3⊕b7),
U6 = U6 ⊕ E0(2)S6(2) = a0b6 ⊕ a1b5 ⊕ a2b4,
U7 = U7 ⊕ E0(2)S7(2) = a0b7 ⊕ a1b6 ⊕ a2b5.

Step 4: At clock cycle 3, cells U0~U7 perform the following operations:
U0 = U0 ⊕ E0(3)S0(3) = a0b0 ⊕ a1b7 ⊕ a2b6 ⊕ a3b5,
U1 = U1 ⊕ E0(3)S1(3) = a0b1 ⊕ a1(b0⊕b7) ⊕ a2(b7⊕b6) ⊕ a3(b6⊕b5),
U2 = U2 ⊕ E0(3)S2(3) = a0b2 ⊕ a1b1 ⊕ a2(b0⊕b7) ⊕ a3(b7⊕b6),
U3 = U3 ⊕ E0(3)S3(3) = a0b3 ⊕ a1(b2⊕b7) ⊕ a2(b1⊕b6) ⊕ a3(b0⊕b7⊕b5),
U4 = U4 ⊕ E0(3)S4(3) = a0b4 ⊕ a1(b3⊕b7) ⊕ a2(b2⊕b7⊕b6) ⊕ a3(b1⊕b6⊕b5),
U5 = U5 ⊕ E0(3)S5(3) = a0b5 ⊕ a1b4 ⊕ a2(b3⊕b7) ⊕ a3(b2⊕b7⊕b6),
U6 = U6 ⊕ E0(3)S6(3) = a0b6 ⊕ a1b5 ⊕ a2b4 ⊕ a3(b3⊕b7),
U7 = U7 ⊕ E0(3)S7(3) = a0b7 ⊕ a1b6 ⊕ a2b5 ⊕ a3b4.

Step 5: At clock cycle 4, cells U0~U7 perform the following operations:
U0 = U0 ⊕ E0(4)S0(4) = a0b0 ⊕ a1b7 ⊕ a2b6 ⊕ a3b5 ⊕ a4b4,
U1 = U1 ⊕ E0(4)S1(4) = a0b1 ⊕ a1(b0⊕b7) ⊕ a2(b7⊕b6) ⊕ a3(b6⊕b5) ⊕ a4(b5⊕b4),
U2 = U2 ⊕ E0(4)S2(4) = a0b2 ⊕ a1b1 ⊕ a2(b0⊕b7) ⊕ a3(b7⊕b6) ⊕ a4(b6⊕b5),
U3 = U3 ⊕ E0(4)S3(4) = a0b3 ⊕ a1(b2⊕b7) ⊕ a2(b1⊕b6) ⊕ a3(b0⊕b7⊕b5) ⊕ a4(b7⊕b6⊕b4),
U4 = U4 ⊕ E0(4)S4(4) = a0b4 ⊕ a1(b3⊕b7) ⊕ a2(b2⊕b7⊕b6) ⊕ a3(b1⊕b6⊕b5) ⊕ a4(b0⊕b7⊕b5⊕b4),
U5 = U5 ⊕ E0(4)S5(4) = a0b5 ⊕ a1b4 ⊕ a2(b3⊕b7) ⊕ a3(b2⊕b7⊕b6) ⊕ a4(b1⊕b6⊕b5),
U6 = U6 ⊕ E0(4)S6(4) = a0b6 ⊕ a1b5 ⊕ a2b4 ⊕ a3(b3⊕b7) ⊕ a4(b2⊕b7⊕b6),
U7 = U7 ⊕ E0(4)S7(4) = a0b7 ⊕ a1b6 ⊕ a2b5 ⊕ a3b4 ⊕ a4(b3⊕b7).

Step 6: At clock cycle 5, cells U0~U7 perform the following operations:
U0 = U0 ⊕ E0(5)S0(5) = a0b0 ⊕ a1b7 ⊕ a2b6 ⊕ a3b5 ⊕ a4b4 ⊕ a5(b3⊕b7),
U1 = U1 ⊕ E0(5)S1(5) = a0b1 ⊕ a1(b0⊕b7) ⊕ a2(b7⊕b6) ⊕ a3(b6⊕b5) ⊕ a4(b5⊕b4) ⊕ a5(b4⊕b3⊕b7),
U2 = U2 ⊕ E0(5)S2(5) = a0b2 ⊕ a1b1 ⊕ a2(b0⊕b7) ⊕ a3(b7⊕b6) ⊕ a4(b6⊕b5) ⊕ a5(b5⊕b4),
U3 = U3 ⊕ E0(5)S3(5) = a0b3 ⊕ a1(b2⊕b7) ⊕ a2(b1⊕b6) ⊕ a3(b0⊕b7⊕b5) ⊕ a4(b7⊕b6⊕b4) ⊕ a5(b6⊕b5⊕b3⊕b7),
U4 = U4 ⊕ E0(5)S4(5) = a0b4 ⊕ a1(b3⊕b7) ⊕ a2(b2⊕b7⊕b6) ⊕ a3(b1⊕b6⊕b5) ⊕ a4(b0⊕b7⊕b5⊕b4) ⊕ a5(b7⊕b6⊕b4⊕b3⊕b7),
U5 = U5 ⊕ E0(5)S5(5) = a0b5 ⊕ a1b4 ⊕ a2(b3⊕b7) ⊕ a3(b2⊕b7⊕b6) ⊕ a4(b1⊕b6⊕b5) ⊕ a5(b0⊕b7⊕b5⊕b4),
U6 = U6 ⊕ E0(5)S6(5) = a0b6 ⊕ a1b5 ⊕ a2b4 ⊕ a3(b3⊕b7) ⊕ a4(b2⊕b7⊕b6) ⊕ a5(b1⊕b6⊕b5),
U7 = U7 ⊕ E0(5)S7(5) = a0b7 ⊕ a1b6 ⊕ a2b5 ⊕ a3b4 ⊕ a4(b3⊕b7) ⊕ a5(b2⊕b7⊕b6).

Step 7: At clock cycle 6, cells U0~U7 perform the following operations:
U0 = U0 ⊕ E0(6)S0(6) = a0b0 ⊕ a1b7 ⊕ a2b6 ⊕ a3b5 ⊕ a4b4 ⊕ a5(b3⊕b7) ⊕ a6(b2⊕b7⊕b6),
U1 = U1 ⊕ E0(6)S1(6) = a0b1 ⊕ a1(b0⊕b7) ⊕ a2(b7⊕b6) ⊕ a3(b6⊕b5) ⊕ a4(b5⊕b4) ⊕ a5(b4⊕b3⊕b7) ⊕ a6(b3⊕b7⊕b2⊕b7⊕b6),
U2 = U2 ⊕ E0(6)S2(6) = a0b2 ⊕ a1b1 ⊕ a2(b0⊕b7) ⊕ a3(b7⊕b6) ⊕ a4(b6⊕b5) ⊕ a5(b5⊕b4) ⊕ a6(b4⊕b3⊕b7),
U3 = U3 ⊕ E0(6)S3(6) = a0b3 ⊕ a1(b2⊕b7) ⊕ a2(b1⊕b6) ⊕ a3(b0⊕b7⊕b5) ⊕ a4(b7⊕b6⊕b4) ⊕ a5(b6⊕b5⊕b3⊕b7) ⊕ a6(b5⊕b4⊕b2⊕b7⊕b6),
U4 = U4 ⊕ E0(6)S4(6) = a0b4 ⊕ a1(b3⊕b7) ⊕ a2(b2⊕b7⊕b6) ⊕ a3(b1⊕b6⊕b5) ⊕ a4(b0⊕b7⊕b5⊕b4) ⊕ a5(b7⊕b6⊕b4⊕b3⊕b7) ⊕ a6(b6⊕b5⊕b3⊕b7⊕b2⊕b7⊕b6),
U5 = U5 ⊕ E0(6)S5(6) = a0b5 ⊕ a1b4 ⊕ a2(b3⊕b7) ⊕ a3(b2⊕b7⊕b6) ⊕ a4(b1⊕b6⊕b5) ⊕ a5(b0⊕b7⊕b5⊕b4) ⊕ a6(b7⊕b6⊕b4⊕b3⊕b7),
U6 = U6 ⊕ E0(6)S6(6) = a0b6 ⊕ a1b5 ⊕ a2b4 ⊕ a3(b3⊕b7) ⊕ a4(b2⊕b7⊕b6) ⊕ a5(b1⊕b6⊕b5) ⊕ a6(b0⊕b7⊕b5⊕b4),
U7 = U7 ⊕ E0(6)S7(6) = a0b7 ⊕ a1b6 ⊕ a2b5 ⊕ a3b4 ⊕ a4(b3⊕b7) ⊕ a5(b2⊕b7⊕b6) ⊕ a6(b1⊕b6⊕b5).

Step 8: At clock cycle 7, cells U0~U7 perform the following operations:
U0 = U0 ⊕ E0(7)S0(7) = a0b0 ⊕ a1b7 ⊕ a2b6 ⊕ a3b5 ⊕ a4b4 ⊕ a5(b3⊕b7) ⊕ a6(b2⊕b7⊕b6) ⊕ a7(b1⊕b6⊕b5),
U1 = U1 ⊕ E0(7)S1(7) = a0b1 ⊕ a1(b0⊕b7) ⊕ a2(b7⊕b6) ⊕ a3(b6⊕b5) ⊕ a4(b5⊕b4) ⊕ a5(b4⊕b3⊕b7) ⊕ a6(b3⊕b7⊕b2⊕b7⊕b6) ⊕ a7(b2⊕b7⊕b6⊕b1⊕b6⊕b5),
U2 = U2 ⊕ E0(7)S2(7) = a0b2 ⊕ a1b1 ⊕ a2(b0⊕b7) ⊕ a3(b7⊕b6) ⊕ a4(b6⊕b5) ⊕ a5(b5⊕b4) ⊕ a6(b4⊕b3⊕b7) ⊕ a7(b3⊕b7⊕b2⊕b7⊕b6),
U3 = U3 ⊕ E0(7)S3(7) = a0b3 ⊕ a1(b2⊕b7) ⊕ a2(b1⊕b6) ⊕ a3(b0⊕b7⊕b5) ⊕ a4(b7⊕b6⊕b4) ⊕ a5(b6⊕b5⊕b3⊕b7) ⊕ a6(b5⊕b4⊕b2⊕b7⊕b6) ⊕ a7(b4⊕b3⊕b7⊕b1⊕b6⊕b5),
U4 = U4 ⊕ E0(7)S4(7) = a0b4 ⊕ a1(b3⊕b7) ⊕ a2(b2⊕b7⊕b6) ⊕ a3(b1⊕b6⊕b5) ⊕ a4(b0⊕b7⊕b5⊕b4) ⊕ a5(b7⊕b6⊕b4⊕b3⊕b7) ⊕ a6(b6⊕b5⊕b3⊕b7⊕b2⊕b7⊕b6) ⊕ a7(b5⊕b4⊕b2⊕b7⊕b6⊕b1⊕b6⊕b5),
U5 = U5 ⊕ E0(7)S5(7) = a0b5 ⊕ a1b4 ⊕ a2(b3⊕b7) ⊕ a3(b2⊕b7⊕b6) ⊕ a4(b1⊕b6⊕b5) ⊕ a5(b0⊕b7⊕b5⊕b4) ⊕ a6(b7⊕b6⊕b4⊕b3⊕b7) ⊕ a7(b6⊕b5⊕b3⊕b7⊕b2⊕b7⊕b6),
U6 = U6 ⊕ E0(7)S6(7) = a0b6 ⊕ a1b5 ⊕ a2b4 ⊕ a3(b3⊕b7) ⊕ a4(b2⊕b7⊕b6) ⊕ a5(b1⊕b6⊕b5) ⊕ a6(b0⊕b7⊕b5⊕b4) ⊕ a7(b7⊕b6⊕b4⊕b3⊕b7),
U7 = U7 ⊕ E0(7)S7(7) = a0b7 ⊕ a1b6 ⊕ a2b5 ⊕ a3b4 ⊕ a4(b3⊕b7) ⊕ a5(b2⊕b7⊕b6) ⊕ a6(b1⊕b6⊕b5) ⊕ a7(b0⊕b7⊕b5⊕b4).

The final result C(x) is obtained from the outputs of Ui for 0 ≤ i ≤ 7.
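The contents of register S listed in Procedure-A can be regenerated mechanically by tracking, for each bit $S_j$, the set of input bits $b_k$ that are XORed together. The sketch below does this with Python sets, where symmetric difference plays the role of XOR, so repeated terms such as b7⊕b7 cancel automatically. The names `TAPS`, `step`, and `show` are illustrative assumptions and do not come from the paper.

```python
# Positions j >= 1 with p_j = 1 for P(x) = 1 + x + x^3 + x^4 + x^8.
TAPS = {1, 3, 4}


def step(state: list[frozenset]) -> list[frozenset]:
    """One clock of the LFSR S; each entry is the set of indices k of the b_k it XORs."""
    fb = state[-1]                                # S_7(i)
    nxt = [fb]                                    # S_0(i+1) = S_7(i)
    for j in range(1, len(state)):
        # Symmetric difference (^) of index sets models XOR of the b_k terms.
        nxt.append(state[j - 1] ^ fb if j in TAPS else state[j - 1])
    return nxt


def show(bits: frozenset) -> str:
    return " ⊕ ".join(f"b{k}" for k in sorted(bits))


state = [frozenset({j}) for j in range(8)]        # S_j(0) = b_j
for cycle in range(1, 8):
    state = step(state)
    print(f"cycle {cycle}: " + ", ".join(f"S{j} = {show(s)}" for j, s in enumerate(state)))
```

At cycle 5, for instance, the entry for S4 comes out in the reduced form b3 ⊕ b4 ⊕ b6, which is what the unsimplified expression b7⊕b6⊕b4⊕b3⊕b7 in Step 6 evaluates to.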
Acknowledgments

The authors would like to thank the anonymous referees and the editor for carefully reading the paper and for their great help in improving it.

References

[1] MacWilliams, F. J. and Sloane, N. J. A., The Theory of Error-Correcting Codes, Amsterdam: North-Holland (1977).
[2] Lidl, R. and Niederreiter, H., Introduction to Finite Fields and Their Applications, New York: Cambridge Univ. Press, U.S.A. (1994).
[3] Blahut, R. E., Fast Algorithms for Digital Signal Processing, Reading, Mass.: Addison-Wesley (1985).
[4] Reed, I. S. and Truong, T. K., The Use of Finite Fields to Compute Convolutions, IEEE Trans. Information Theory, Vol. IT-21, pp. 208-213 (1975).
[5] Wang, C. C. and Pei, D., A VLSI Design for Computing Exponentiation in $GF(2^m)$ and its Application to Generate Pseudorandom Number Sequences, IEEE Trans. Computers, Vol. 39, pp. 258-262 (1990).
[6] Omura, J. and Massey, J., Computational Method and Apparatus for Finite Field Arithmetic, U.S. Patent Number 4,587,627 (1986).
[7] Wang, C. C., Truong, T. K., Shao, H. M., Deutsch, L. J., Omura, J. K. and Reed, I. S., VLSI Architectures for Computing Multiplications and Inverses in $GF(2^m)$, IEEE Trans. Computers, Vol. C-34, pp. 709-717 (1985).
[8] Reyhani-Masoleh, A. and Hasan, M. A., A New Construction of Massey-Omura Parallel Multiplier Over $GF(2^m)$, IEEE Trans. Computers, Vol. 51, pp. 511-520 (2002).
[9] Berlekamp, E. R., Bit-Serial Reed-Solomon Encoder, IEEE Trans. Inform. Theory, Vol. IT-28, pp. 869-874 (1982).
[10] Wu, H., Hasan, M. A. and Blake, I. F., New Low-Complexity Bit-Parallel Finite Field Multipliers Using Weakly Dual Bases, IEEE Trans. Computers, Vol. 47, pp. 1223-1234 (1998).
[11] Wu, H. and Hasan, M. A., Low Complexity Bit-Parallel Multipliers for a Class of Finite Fields, IEEE Trans. Computers, Vol. 47, pp. 883-887 (1998).
[12] Lee, C. Y., Chiou, C. W. and Lin, J. M., Low-Complexity Bit-Parallel Dual Basis Multipliers Using the Modified Booth's Algorithm, Computers & Electrical Engineering, Vol. 31, pp. 444-459 (2005).
[13] Lee, C. Y. and Chiou, C. W., Efficient Design of Low-Complexity Bit-Parallel Systolic Hankel Multipliers to Implement Multiplication in Normal and Dual Bases of $GF(2^m)$, IEICE Trans. on Fundamentals of Electronics, Communications and Computer Science, Vol. E88-A, pp. 3169-3179 (2005).
[14] Bartee, T. C. and Schneider, D. J., Computation with Finite Fields, Information and Control, Vol. 6, pp. 79-98 (1963).
[15] Mastrovito, E. D., VLSI Architectures for Multiplication Over Finite Field $GF(2^m)$, Applied Algebra, Algebraic Algorithms, and Error-Correcting Codes, Proc. Sixth Int'l Conf., AAECC-6, T. Mora, ed., Rome, pp. 297-309 (1988).
[16] Mastrovito, E. D., VLSI Architectures for Computations in Galois Fields, Ph.D. thesis, Linköping Univ., Dept. of Electrical Eng., Linköping, Sweden (1991).
[17] Koç, Ç. K. and Sunar, B., Low-Complexity Bit-Parallel Canonical and Normal Basis Multipliers for a Class of Finite Fields, IEEE Trans. Computers, Vol. 47, pp. 353-356 (1998).
[18] Sunar, B. and Koç, Ç. K., Mastrovito Multiplier for All Trinomials, IEEE Trans. Computers, Vol. 48, pp. 522-527 (1999).
[19] Wu, H., Bit-Parallel Finite Field Multiplier and Squarer Using Polynomial Basis, IEEE Trans. Computers, Vol. 51, pp. 750-758 (2002).
[20] Elia, M., Leone, M. and Visentin, C., Low Complexity Bit-Parallel Multipliers for $GF(2^m)$ with Generator Polynomial $x^m + x^k + 1$, Electronics Letters, Vol. 35, pp. 551-552 (1999).
[21] Wu, H., Montgomery Multiplier and Squarer for a Class of Finite Fields, IEEE Trans. Computers, Vol. 51, pp. 521-529 (2002).
[22] Lee, C. Y., Low Complexity Bit-Parallel Systolic Multiplier Over $GF(2^m)$ Using Irreducible Trinomials, IEE Proc.-Comput. Digit. Tech., Vol. 150, pp. 39-42 (2003).
[23] Chiou, C. W., Lin, L. C., Chou, F. H. and Shu, S. F., Low Complexity Finite Field Multiplier Using Irreducible Trinomials, IEE Electronics Letters, Vol. 39, pp. 1709-1711 (2003).
[24] Itoh, T. and Tsujii, S., Structure of Parallel Multipliers for a Class of Fields $GF(2^m)$, Information and Computation, Vol. 83, pp. 21-40 (1989).
[25] Hasan, M. A., Wang, M. and Bhargava, V. K., Modular Construction of Low Complexity Parallel Multipliers for a Class of Finite Fields $GF(2^m)$, IEEE Trans. Computers, Vol. 41, pp. 962-971 (1992).
[26] Lee, C. Y., Lu, E. H. and Lee, J. Y., Bit-Parallel Systolic Multipliers for $GF(2^m)$ Fields Defined by All-One and Equally-Spaced Polynomials, IEEE Trans. Computers, Vol. 50, pp. 385-393 (2001).
[27] Paar, C., A New Architecture for a Parallel Finite Field Multiplier with Low Complexity Based on Composite Fields, IEEE Trans. Computers, Vol. 45, pp. 856-861 (1996).
[28] Paar, C., Fleischmann, P. and Roelse, P., Efficient Multiplier Architectures for Galois Fields $GF(2^{4n})$, IEEE Trans. Computers, Vol. 47, pp. 162-170 (1998).
[29] Kim, N.-Y., Kim, H.-S. and Yoo, K.-Y., Computation of $AB^2$ Multiplication in $GF(2^m)$ Using Low-Complexity Systolic Architecture, IEE Proc.-Circuits, Devices and Systems, Vol. 150, pp. 119-123 (2003).
[30] Drolet, G., A New Representation of Elements of Finite Fields $GF(2^m)$ Yielding Small Complexity Arithmetic Circuits, IEEE Trans. Computers, Vol. 47, pp. 938-946 (1998).
[31] Wang, C.-L. and Lin, J.-L., Systolic Array Implementation of Multipliers for Finite Fields $GF(2^m)$, IEEE Trans. Circuits and Systems, Vol. 38, pp. 796-800 (1991).
[32] Wei, S.-W., A Systolic Power-Sum Circuit for $GF(2^m)$, IEEE Trans. Computers, Vol. 43, pp. 226-229 (1994).
[33] Wei, S.-W., VLSI Architectures for Computing Exponentiations, Multiplicative Inverses, and Divisions in $GF(2^m)$, IEEE Trans. Circuits and Systems-II: Analog and Digital Signal Processing, Vol. 44, pp. 847-855 (1997).
[34] Yeh, C.-S., Reed, I. S. and Truong, T. K., Systolic Multipliers for Finite Fields $GF(2^m)$, IEEE Trans. Computers, Vol. C-33, pp. 357-360 (1984).
[35] Wang, C. L. and Guo, J. H., New Systolic Arrays for $C + AB^2$, Inversion, and Division in $GF(2^m)$, IEEE Trans. Computers, Vol. 49, pp. 1120-1125 (2000).
[36] Hsu, I. S., Truong, T. K., Deutsch, L. J. and Reed, I. S., A Comparison of VLSI Architecture of Finite Field Multipliers Using Dual, Normal, or Standard Bases, IEEE Trans. Computers, Vol. 37, pp. 735-739 (1988).
[37] Weste, N. and Eshraghian, K., Principles of CMOS VLSI Design: A System Perspective, Reading, Mass.: Addison-Wesley (1985).
[38] http://www.st.com/stonline/books/pdf/docs/2006.pdf
[39] http://www.st.com/stonline/books/pdf/docs/1885.pdf
[40] Http://www.stm.com/stonline/products/literature/ds/1937/m74hc279.pdf
Manuscript Received: Dec. 1, 2005 Accepted: Oct. 2, 2006
