Mário P. Véstias
INESC-ID/ISEL/IPL
email: mvestias@deetc.isel.ipl.pt
Horácio C. Neto
INESC-ID/IST/UTL
email: hcn@inesc-id.pt
ABSTRACT
[Figure 1: plot of Number of LUTs versus Number of Digits for ShiftAdd and eq 2 to eq 5]
(3)
(4)
(5)
These equations expose more parallelism than the original Horner arrangement.
To determine which representation has the lowest implementation cost,
the architectures that implement them were
described in VHDL and synthesized for a Virtex-4
FPGA (see results in figure 1).
The shift and add approach is by far the worst
in terms of area. The implementations of equations 4 and
5 are the best alternatives, with improvements of 50% to
almost 75% over the shift and add algorithm. A few other rearrangements of the Horner equation
were tested, including other powers of ten, but the
results were worse.
The delays of the implementations follow
a similar relation (see figure 2).
The best solution improves the delay by almost 50% compared to the shift and add solution.
(1)
Because of the multiplications by the powers of ten, a direct implementation of this equation would require multipliers whose size grows significantly with the size of
the numbers to be converted. To overcome this problem, (1)
can be rearranged by applying Horner's rule (see equation (2)).
(2)
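The rearrangement can be illustrated with a minimal Python sketch. It assumes that equation (1) is the usual positional sum of digits times powers of ten and that equation (2) is its nested Horner form; the function names `direct_sum` and `horner_bcd_to_binary` are hypothetical and do not come from the paper.

```python
def direct_sum(digits):
    """Assumed form of equation (1): sum of d_i * 10**i.

    `digits` lists the decimal digits, most significant first.
    """
    n = len(digits)
    return sum(d * 10 ** (n - 1 - i) for i, d in enumerate(digits))


def horner_bcd_to_binary(digits):
    """Assumed form of equation (2): ((d_{n-1}*10 + d_{n-2})*10 + ...)*10 + d_0.

    Each multiplication by ten is written as (x << 3) + (x << 1),
    so only shifts and additions are needed, never a wide multiplier.
    """
    value = 0
    for d in digits:
        if not 0 <= d <= 9:
            raise ValueError("not a decimal digit: %r" % (d,))
        value = (value << 3) + (value << 1) + d  # value * 10 + d
    return value
```

Both functions return the same integer; the Horner form simply trades the growing powers of ten for a chain of constant multiplications by ten.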
[Figure 2: plot of Delay (ns) versus Number of Digits for ShiftAdd and eq 2 to eq 5]
[Fig. 3: converter built from b2TOb1000 units and a 72 x + y module; the input is split into bits (26..17) and (16..0), and the outputs are D2(100), D1(1000) and D0(1000)]
and,
$b_{n-1} 2^{n-1} + b_{n-2} 2^{n-2} + \dots + b_0 2^0$ (6)
$c = b_1 \times 72 + b_0$
$c = (2^6 + 2^3)\, b_1 + b_0 < 2^{18}$ (8)
So the b2TOb1000 unit can already be applied to determine the
least significant digit, $d_0$, and part of the following digit,
$d_1$.
$b_1 \le 762$
$b_0 < 2^{17}$ (11)
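The bounds above can be checked numerically. The sketch below assumes a 27-bit input (enough for 8 decimal digits) split into `b1` (bits 26..17) and `b0` (bits 16..0); because 2^17 = 131072 leaves a remainder of 72 when divided by 1000, the value c = 72 b1 + b0 has the same remainder modulo 1000 as the input. The function names are hypothetical, and the final `% 1000` only stands in for whatever the b2TOb1000 unit computes in hardware.

```python
def split_27bit(n):
    """Split n into b1 (bits 26..17) and b0 (bits 16..0)."""
    if not 0 <= n < 10 ** 8:
        raise ValueError("expected a value with at most 8 decimal digits")
    b1 = n >> 17
    b0 = n & ((1 << 17) - 1)
    return b1, b0


def reduced_value(n):
    """Return c = 72*b1 + b0, computed as (2**6 + 2**3)*b1 + b0."""
    b1, b0 = split_27bit(n)
    return (b1 << 6) + (b1 << 3) + b0


def least_significant_base1000_digit(n):
    """Base-1000 digit d0 of n, obtained from the 18-bit value c."""
    return reduced_value(n) % 1000
```

For the largest 8-digit input, 99999999, the split gives b1 = 762 and b0 = 123135, so c = 177999, which fits in 18 bits and ends in the expected digit 999.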
A hardware implementation of this converter can be designed using a set of adders and the b2TOb1000 unit (see
Fig. 3). Instead of two 24x + y modules and two adders,
this new approach uses only one module that calculates 72x + y
and one adder.
Both architectures were evaluated using operands with
different sizes and compared to a parallel implementation of
the shift and add-3 algorithm (see results in figure 4).
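For reference, the baseline shift and add-3 algorithm can be sketched in software as follows. This is a bit-serial model of the well-known method, not the parallel hardware implementation evaluated here; the function name `shift_add3` and its interface are assumptions made for illustration.

```python
def shift_add3(value, num_digits):
    """Binary to BCD conversion with the shift and add-3 method.

    Returns `num_digits` decimal digits, most significant first.
    """
    if value < 0:
        raise ValueError("value must be non-negative")
    digits = [0] * num_digits  # digits[0] is the most significant digit
    for i in range(max(value.bit_length(), 1) - 1, -1, -1):
        # Correction step: any digit of 5 or more gets 3 added, so that
        # the following doubling carries correctly into the next digit.
        digits = [d + 3 if d >= 5 else d for d in digits]
        # Shift the whole BCD register left by one bit and bring in
        # the next bit of the binary input.
        carry = (value >> i) & 1
        for j in range(num_digits - 1, -1, -1):
            shifted = (digits[j] << 1) | carry
            carry = shifted >> 4
            digits[j] = shifted & 0xF
        if carry:
            raise OverflowError("value does not fit in num_digits digits")
    return digits
```

Each input bit costs one correction and one shift over every digit, which is why the area and delay of this method grow quickly with the number of digits.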
For certain operand sizes, the new approach reduces area by almost 50% compared to the shift and add-3 algorithm.
Compared to the solution presented in [10], the improvement is almost 10% for numbers with 16 digits. We also
$\left\lfloor \frac{10^8 - 1}{2^{17}} \right\rfloor = 762$ (7)
[Figure 4: plot of LUTs versus Number of Digits for ShiftAdd, [10] and the new converter]
[Figure: decimal multiplier datapath with operands A7-0 and B7-0 (32 bits each), two BCDtoBIN units (27-bit outputs), the binary product A×B (54 bits), and a BINtoBCD unit (16 digits)]
[Figure: plot of Delay (ns) versus Number of Digits for ShiftAdd, [10] and the new converter]
[Fig. 8. 16 × 16 decimal multiplier with decimal partial products. The operands are split into 4-digit blocks A3 to A0 and B3 to B0; the partial-product sums, from A0×B0 (8 digits) up to A3×B3, with sums of up to 9 digits, are aligned by weight and added to form the 32-digit result.]
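The block decomposition suggested by Fig. 8 can be sketched as follows. This is only an arithmetic model under stated assumptions: the operands are split into 4-digit blocks, each block product Ai×Bj is taken as an ordinary integer product (in hardware it would presumably be a binary multiplication followed by conversion to decimal), and products of equal weight are summed per column. The names `split_blocks`, `partial_product_columns` and `multiply_by_blocks` are hypothetical.

```python
def split_blocks(x, num_blocks=4, block_digits=4):
    """Split x into blocks of `block_digits` decimal digits, least significant first."""
    base = 10 ** block_digits
    return [(x // base ** i) % base for i in range(num_blocks)]


def partial_product_columns(a, b, num_blocks=4, block_digits=4):
    """Column k holds the sum of all Ai*Bj with i + j == k."""
    a_blocks = split_blocks(a, num_blocks, block_digits)
    b_blocks = split_blocks(b, num_blocks, block_digits)
    columns = [0] * (2 * num_blocks - 1)
    for i, ai in enumerate(a_blocks):
        for j, bj in enumerate(b_blocks):
            columns[i + j] += ai * bj
    return columns


def multiply_by_blocks(a, b, num_blocks=4, block_digits=4):
    """Add the columns at their decimal weights to obtain the full product."""
    base = 10 ** block_digits
    columns = partial_product_columns(a, b, num_blocks, block_digits)
    return sum(col * base ** k for k, col in enumerate(columns))
```

With 4-digit blocks every block operand is below 2^14, and the widest column (four products) stays below 10^9, which matches the 9-digit widths shown in the figure.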
LUTs     3127    2023    2061    1176
DSPs        0       4       0       4
Delay   73 ns   76 ns   45 ns   47 ns
                     8 digits                               16 digits
          [7]    Our with DSP   Our without DSP    [7]    Our with DSP   Our without DSP
LUTs      2609   1176           2061               8729   3005           6493
DSPs      0      4              0                  0      16             0
Delay     34 ns  47 ns          45 ns              54 ns  68 ns          65 ns

6. CONCLUSION
We have implemented 8 × 8 and 16 × 16 decimal multipliers using binary multiplications. The results show that
this approach is better than those considering direct manipulation of decimal operands when implemented in a Virtex-4
FPGA.
An important advantage of the approach proposed herein
is that it can effectively use the embedded binary multipliers available in current FPGAs and in other coarse-grained
reconfigurable architectures.
For future work, we plan to analyze the effects of other
subdivisions of the initial operands on the performance
and the consumed area.
It would also be important to test the designs with other
technologies besides FPGAs, namely coarse-grained reconfigurable architectures with binary arithmetic units of different complexities.

7. ACKNOWLEDGMENT
This work was partially supported by the Portuguese Foundation for Science and Technology (FCT) through the project
"Reconfigurable Hardware using MTJ Memories"
(PTDC/EEA-ELC/72933/2006).

8. REFERENCES
[4] R. D. Kenney, M. J. Schulte, and M. A. Erle, "High-frequency decimal multiplier," in Proceedings IEEE International Conference on Computer Design: VLSI in Computers
and Processors, Oct. 2004, pp. 26-29.
[6] T. Lang and A. Nannarelli, "A radix-10 combinational multiplier," in Proceedings IEEE 40th International Asilomar
Conference on Signals, Systems, and Computers, Oct. 2006,
pp. 313-317.
[7] A. Vázquez, E. Antelo, and P. Montuschi, "A new family of
high-performance parallel decimal multipliers," in Proceedings IEEE 18th Symposium on Computer Arithmetic, June
2007, pp. 195-204.
[8] L. Dadda and A. Nannarelli, "A variant of a radix-10 combinational multiplier," in Proceedings IEEE International Symposium on Circuits and Systems (ISCAS), May 2008, pp.
3370-3373.
[9] P. Alfke and B. New, "Serial code conversion between BCD
and binary," Xilinx application note XAPP 029, Oct. 1997.
[10] H. Neto and M. Véstias, "Decimal multiplier on FPGA using embedded binary multipliers," in Proceedings IEEE Field
Programmable Logic and Applications, 2008, pp. 197-202.