Transactions Briefs
Performing Arithmetic Functions with the Chinese Abacus Approach

Manuscript received September 13, 1999. This work of C. Gang was supported in part by the Italian Foreign Ministry. This paper was recommended by Associate Editor W. Liu.
F. Maloberti is with the Department of Electronics, University of Pavia, Via Ferrata 1, 27100 Pavia, Italy.
C. Gang is with the Department of Electronics, Vocational and Technical College, Hunan Normal University, Yuelushan, 410081 Changsha, China.
Publisher Item Identifier S 1057-7130(99)09933-4.
1057–7130/99$10.00 © 1999 IEEE

I. INTRODUCTION
The Chinese abacus is a very popular and efficient tool for performing arithmetic functions. It has been used for centuries in many parts of the world (mainly in China), and it is still in use in shops and small commercial enterprises. The main feature of the Chinese abacus is its speed of use: a well-trained operator is often capable of competing with an electronic pocket calculator. The time required to input data manually is comparable to that of the electronic approach, and the generation of the result on the Chinese abacus is so straightforward that the total computation time is extremely short.

This observation stimulated us to analyze the basic reason for the abacus's speed and, possibly, to transfer the same features to an electronic circuit. This paper shows that the Chinese-abacus approach actually leads to promising results when using, for example, a 0.35-μm CMOS process. The speed of an 8-bit pipeline full adder is as high as 1.3 GHz, and a parallel 8 × 8 bit multiplier can run at 980 MHz. Moreover, the compactness of the physical layout leads to a relatively small area for the circuits.

II. OPERATION PRINCIPLE
The Chinese abacus is made of a set of unity elements representing the various decades of a decimal number. Each element is made up of five beads having a unity weight and two beads having a weight of 5. The configuration shown in Fig. 1(a) represents the number seven. The coding rule is thermometric. To represent a number lower than five, the same number of unity beads is raised in the main part of the element. For numbers higher than five, one bead with weight 5 is lowered and the remainder is represented by the unity beads. In this way, a basic element is able to represent a decimal number in the range from 0 to 15. The key feature of the Chinese abacus is the use of two beads with weight 5, which allows the operator to minimize the transmission of carries. Moreover, the thermometric code permits a fast implementation of elementary arithmetic functions such as addition and subtraction.

The number representation used in the Chinese abacus refers to the decimal numeric system. We are mostly interested in binary-based coding, so it is more convenient to use a basic element made up of four unity-weight beads and two beads having a weight of four units [Fig. 1(b)]. In practice, we use a base of 2^2 = 4, and the basic element is able to represent numbers in the range from 0 to 12. The configuration shown in Fig. 1(b) represents the number five. As in the decimal case, the given coding is able to represent numbers exceeding the full scale by half of the base of the numeric system. This over-range room is the key to the operation of the method.

III. CHINESE-BEAD BASIC BLOCKS
To design circuits based on the Chinese-abacus approach, some basic functions must be implemented with electronic circuits.

The first of these is the binary-to-Chinese-bead conversion, which we perform in two steps: a binary-to-thermometric (B/T) conversion followed by a thermometric-to-abacus (T/A) coding. Fig. 2 shows the basic block for the B/T conversion with four unity-weight inputs. Similar circuits with binary-weighted inputs can be designed. The solution in Fig. 2 is based on the pass-transistor approach [1] and contains n-channel transistors. The control is given by the inputs x0, x1, x2, x3 and their complements x̄0, x̄1, x̄2, x̄3. The output is a thermometric representation in which each line is either at 0 or in high impedance.
Fig. 2. B/T converter (four unity-weight inputs).
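The two-step binary-to-abacus conversion of Section III can be modeled behaviorally in a few lines. This is a minimal software sketch, not the pass-transistor circuit of Fig. 2. The names `AbacusElement`, `binary_to_thermometric`, and `value_to_abacus` are hypothetical. The T/A rule is an assumption: heavy beads are used greedily, so a heavy bead is lowered as soon as the value reaches its weight. With the bead counts given in Section II, this rule reproduces the examples of Fig. 1: seven in the decimal element, and five in the binary-based element.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AbacusElement:
    """Bead configuration of one abacus element (hypothetical model)."""
    unit_beads: int    # number of unity-weight beads
    heavy_beads: int   # number of heavy beads
    heavy_weight: int  # weight of each heavy bead

    @property
    def max_value(self) -> int:
        return self.unit_beads + self.heavy_beads * self.heavy_weight


DECIMAL = AbacusElement(unit_beads=5, heavy_beads=2, heavy_weight=5)  # 0..15
BINARY = AbacusElement(unit_beads=4, heavy_beads=2, heavy_weight=4)   # 0..12


def binary_to_thermometric(inputs: list[int]) -> list[int]:
    """B/T step: count the asserted unity-weight inputs and emit a
    thermometric code of the same length (ones first, then zeros)."""
    count = sum(1 for x in inputs if x)
    return [1] * count + [0] * (len(inputs) - count)


def thermometric_to_value(code: list[int]) -> int:
    """Check that the code is thermometric and return the value it encodes."""
    count = sum(code)
    if code != [1] * count + [0] * (len(code) - count):
        raise ValueError("not a thermometric code")
    return count


def value_to_abacus(value: int, elem: AbacusElement) -> tuple[int, int]:
    """T/A step under the assumed greedy rule.
    Returns (heavy beads lowered, unity beads raised)."""
    if not 0 <= value <= elem.max_value:
        raise ValueError(f"value {value} outside 0..{elem.max_value}")
    heavy = min(value // elem.heavy_weight, elem.heavy_beads)
    units = value - heavy * elem.heavy_weight  # never exceeds elem.unit_beads
    return heavy, units


def abacus_to_value(heavy: int, units: int, elem: AbacusElement) -> int:
    """Inverse mapping: weighted sum of the active beads."""
    return heavy * elem.heavy_weight + units


if __name__ == "__main__":
    code = binary_to_thermometric([1, 0, 1, 1])
    print(code, thermometric_to_value(code))  # [1, 1, 1, 0] 3
    print(value_to_abacus(7, DECIMAL))        # (1, 2), cf. Fig. 1(a)
    print(value_to_abacus(5, BINARY))         # (1, 1), cf. Fig. 1(b)
```

Running the round trip over every value in 0..15 (decimal element) or 0..12 (binary-based element) confirms that each value in the element's range has a valid bead configuration. That is the over-range room described in Section II.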
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—II: ANALOG AND DIGITAL SIGNAL PROCESSING, VOL. 46, NO. 12, DECEMBER 1999 1513
Fig. 6. B/T converter (two unity-weight inputs and two inputs with weight 2).
where A and B have been defined in (2) and (3). It is well known that the digital representation of $P$ results from the sum of the binary elements in (11), shown below. Elements in the same column have equal binary weight, and this weight increases by a factor of 2 from right to left. The term $a_0 b_0$ represents the LSB. The conventional approach to calculating (11) is a serial "shift-and-add" technique or, for fast applications, a hardware implementation of (11) in parallel or pipeline fashion.

where $i = 1, 3, 5, 7$ and $j = 0, 2, 4, 6$; moreover, for $i = 1$, the last term in (13) must be set equal to zero. We obtain a thermometric representation of the partial sums $P_{i,j}$ with simple logic (for the necessary "and" operations) and a schematic similar to the one in Fig. 6. A subsequent T/A block then represents the result in the abacus format.
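\[
\begin{array}{ccccccccccccccc}
&&&&&&& a_7b_0 & a_6b_0 & a_5b_0 & a_4b_0 & a_3b_0 & a_2b_0 & a_1b_0 & a_0b_0 \\
&&&&&& a_7b_1 & a_6b_1 & a_5b_1 & a_4b_1 & a_3b_1 & a_2b_1 & a_1b_1 & a_0b_1 \\
&&&&& a_7b_2 & a_6b_2 & a_5b_2 & a_4b_2 & a_3b_2 & a_2b_2 & a_1b_2 & a_0b_2 \\
&&&& a_7b_3 & a_6b_3 & a_5b_3 & a_4b_3 & a_3b_3 & a_2b_3 & a_1b_3 & a_0b_3 \\
&&& a_7b_4 & a_6b_4 & a_5b_4 & a_4b_4 & a_3b_4 & a_2b_4 & a_1b_4 & a_0b_4 \\
&& a_7b_5 & a_6b_5 & a_5b_5 & a_4b_5 & a_3b_5 & a_2b_5 & a_1b_5 & a_0b_5 \\
& a_7b_6 & a_6b_6 & a_5b_6 & a_4b_6 & a_3b_6 & a_2b_6 & a_1b_6 & a_0b_6 \\
a_7b_7 & a_6b_7 & a_5b_7 & a_4b_7 & a_3b_7 & a_2b_7 & a_1b_7 & a_0b_7
\end{array}
\tag{11}
\]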
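The Python sketch below makes the structure of (11) concrete. It builds the partial products a_i b_j (each a logic "and") of two 8-bit operands, collects them by column (binary weight 2^(i+j)), and checks that the weighted column counts reproduce the product. For comparison, it also includes the conventional serial shift-and-add loop mentioned above. This is a behavioral model only. The function names are hypothetical, and the sketch does not model the abacus-block hardware mapping.

```python
def bits(x: int, n: int = 8) -> list[int]:
    """Return the n least significant bits of x, LSB first (x_0, x_1, ...)."""
    return [(x >> k) & 1 for k in range(n)]


def partial_product_columns(a: int, b: int, n: int = 8) -> list[list[int]]:
    """Group the partial products a_i * b_j of (11) by column c = i + j.
    Every term in column c has binary weight 2**c."""
    a_bits, b_bits = bits(a, n), bits(b, n)
    columns: list[list[int]] = [[] for _ in range(2 * n - 1)]
    for j, b_j in enumerate(b_bits):          # row j of the array, shifted by j
        for i, a_i in enumerate(a_bits):
            columns[i + j].append(a_i & b_j)  # logic "and" of the two bits
    return columns


def product_from_columns(columns: list[list[int]]) -> int:
    """Count the ones in each column and weight the count by 2**column."""
    return sum(sum(col) << c for c, col in enumerate(columns))


def shift_and_add(a: int, b: int, n: int = 8) -> int:
    """Conventional serial multiplication: add a shifted copy of a for each set bit of b."""
    acc = 0
    for j in range(n):
        if (b >> j) & 1:
            acc += a << j
    return acc


if __name__ == "__main__":
    cols = partial_product_columns(0xB5, 0x3C)
    print([len(c) for c in cols])  # column heights 1, 2, ..., 8, ..., 2, 1
    print(product_from_columns(cols), shift_and_add(0xB5, 0x3C), 0xB5 * 0x3C)
    # 10860 10860 10860
```

The column heights (up to eight terms of equal weight) show why counting many unity-weight bits at once, as a thermometric code does, is attractive for reducing the array.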
Using the same principle followed to compute (11), we can group the terms in (12) as follows:

\[
P = 2^{8} Q_1 + Q_0 \tag{17}
\]

where

\[
\begin{aligned}
Q_1 &= 16 K_{7,7} + K_{7,3} + H_{7,4} \\
Q_0 &= 16 \left( H_{7,0} + H_{3,4} \right) + H_{3,0}.
\end{aligned}
\tag{18}
\]

The approach proposed here is similar to the well-known Wallace tree [2] and Dadda [3], [4] implementations. The basic idea is to obtain the multiplication result through hierarchical operand reduction. However, the method proposed here uses an abacus representation of numbers with a 0–7 range instead of a simpler binary coding. This further reduces the need for carry transfer and lowers the number of hierarchical levels. Moreover, specific architectures can be studied in order to reduce the critical path.

Nevertheless, the proposed method requires the variety of basic blocks discussed in Section III. This is only a partial limitation, because the basic blocks can be implemented with a regular layout and a well-structured floor plan.

The calculation of the $K_{l,m}$ and $H_{i,j}$ terms given by (15) and (16) involves the addition of terms with different weights, some of which are in abacus format. The result can be obtained by a proper use of abacus blocks. Unity-weight bits, such as the lower beads' partial sums $P_{i,j}$ or the outputs of the logic "and," are added by B/T or SU blocks. The results are then transformed into the abacus format by T/A converters. As in the architecture of Fig. 7, we can design parallel computation lines with a minimum (or pipelined) carry path. The strategy was to perform the required operations hierarchically: the various terms are successively collected in groups of three or four, and the results are calculated with architectures made of basic blocks.

Pipeline implementations are also possible, although the technique requires partitioning the architecture into several stages. Each stage provides the input to a "hold block" that serves as the interface to the next pipeline stage.

VI. SIMULATION RESULTS AND IMPLEMENTATION ISSUES
Using the methodology discussed in the previous section, we have designed an 8-bit parallel adder, an 8-bit pipeline adder, and an 8 × 8 pipeline multiplier. The circuits have been simulated with SPICE using a 0.35-μm CMOS process. The simulations account for parasitic capacitances extracted from the layout of the basic blocks and for an estimate of the interconnection capacitances. The results are summarized in Table I. For the pipeline implementations, the pre-charge phase and the I/O delay due to the transfer-gate operation are less than 0.38 ns for the 8-bit adder and 0.51 ns for the 8 × 8 multiplier. Therefore, the maximum possible clock frequency in the nominal case is 1.3 GHz and 980 MHz, respectively, corresponding to a clock period of about twice these values.

TABLE I
FEATURES OF ABACUS ARITHMETIC CIRCUITS

The total number of transistors required by the simulated circuits is limited, ranging from 296 to 3699. These figures are quite acceptable for the implemented functions. Moreover, a custom layout achieves good compactness. The 33 transistors required for a B/T function can be accommodated within a 16 × 19 μm area, leading to an area per transistor as small as 9.5 μm². Assuming that the overhead for block interconnections is 100% of the basic block area, we estimate that the entire 8 × 8 pipeline multiplier can be accommodated in 0.07 mm². This estimate is rough; nevertheless, it gives an idea of the chip area required by the proposed solution.

VII. CONCLUSION
This brief presented a technique for performing arithmetic functions that mimics the Chinese abacus. The key feature of the method is the use of a different data representation. Using abacus basic blocks, it was possible to achieve fast CMOS adders and multipliers operating at a higher clock frequency than their traditional counterparts. The circuit implementation requires a small chip area. Nevertheless, it is difficult to compare our solution with traditional architectures, because the chip area critically depends on the design rules of the specific technology used.

ACKNOWLEDGMENT
The authors would like to thank G. Torelli and S. Cirimelli for numerous helpful discussions.

REFERENCES
[1] R. Zimmermann and W. Fichtner, "Low-power logic styles: CMOS versus pass-transistor logic," IEEE J. Solid-State Circuits, vol. 32, pp. 1079–1090, July 1997.
[2] M. Hanawa, K. Kaneko, et al., "A 4.3 ns 0.3 μm CMOS 54 × 54 multiplier using precharged pass-transistor logic," in Proc. ISSCC'96, pp. 364–365.
[3] S. Naffziger, "A sub-nanosecond 0.5 μm 64 b adder design," in Proc. ISSCC'96, pp. 362–363.
[4] A. Inoue, R. Ohe, et al., "A 4.1 ns compact 54 × 54 multiplier utilizing sign select Booth encoders," in Proc. ISSCC'97, pp. 416–417.