
2010 International Conference on Mechanical and Electrical Technology (ICMET 2010)

The Power Dissipation Comparison of Different ALU Architectures


Junkai Sun
Second Design Department
Beijing Microelectronics Technology Institute
Beijing, China
Email: sunjunkai86@hotmail.com

Anping Jiang
Second Design Department
Beijing Microelectronics Technology Institute
Beijing, China
Email: apjiang@gmail.com

978-1-4244-8102-6/10/$26.00 © 2010 IEEE

Abstract-Low power is a challenging goal in processor design, and one approach is to apply power optimization to every component of the processor. One of the most basic components of a processor is the Arithmetic and Logic Unit (ALU), which performs arithmetic and logic operations. The architecture of the ALU has several implications for power consumption, delay and area. In this paper, different ALU architectures are described and discussed. To find out which ALU architecture provides the best power efficiency, an 8-bit ALU is designed in each of the different architectures. Compared with the other architectures, the power savings of the most power-efficient architecture range from 19.38% to 33.87%, while the corresponding area savings range from 56.37% to 14.92%.
Keywords-ALU architecture; Low power; Operand isolation.
I. INTRODUCTION
Battery-powered, hand-held devices such as laptop computers and cell phones have greatly improved our daily life, and their hardware is required to be fast and multifunctional. For portability, however, battery weight must be minimized and the operating time between recharges, which is determined by power consumption, must be maximized. It is therefore urgent for designers to develop circuits and systems that use less energy without greatly sacrificing performance.
Power optimization of a processor can be carried out at many different levels of the design hierarchy with different methods: for example, Dynamic Voltage Scaling (DVS) at the system level, bus coding at the algorithm level, clock gating and operand isolation at the register transfer level, and transistor sizing and threshold voltage scaling at the circuit and transistor level. Power optimization can also be applied to each component of a processor. One of these components is the Arithmetic and Logic Unit (ALU), which takes its operands from the register file, the data memory or the ALU write-back bus. The basic structure of an ALU is shown in Figure 1. Because it is clocked at the highest speed and kept busy almost 100% of the time, the ALU is one of the most power-hungry components in a processor and is often a likely location of hot spots [1]. Therefore, a low power ALU design can considerably reduce the total power consumption of a processor.
An ALU combines a variety of arithmetic and logic operations into a single unit. For example, a typical ALU might perform addition, subtraction, AND, OR and XOR operations. Since the architecture of the ALU has several implications for power consumption, delay and area, how to organize these operations is an important design question. In this paper, we are mainly concerned with the power consumption of the ALU, so a proper choice of ALU architecture is needed when the design targets low power dissipation.
Figure 1. Basic ALU organization (inputs: operand A, operand B and a MUX control signal)
Previous work indicates that there are three typical ALU architectures: A. the complex structure [2] [3], B. the adder independent structure [4] [5], and C. the tree structure and chain structure [6]. The complex structure combines a variety of arithmetic and logic operations into a single unit and uses several control signals to choose the desired operation. The adder independent structure uses an adder to perform arithmetic operations and a separate module to perform logic operations. The tree structure or chain structure organizes a set of functional components as a tree or a chain, with each component responsible for one type of operation. To explore which architecture is best for low power, an 8-bit ALU that performs arithmetic and logic operations is designed in a 0.18 µm static CMOS process.
The paper is organized as follows. In Section II, the ALU is designed and implemented with the three different architectures. Simulation results and discussions are presented in Section III, and finally the conclusions are presented in Section IV.
II. ALU DESIGN
By studying the instructions of processors, we find that all the instructions the ALU performs can be accomplished through basic operations such as Addition, Subtraction, And, Or, Not, Xor and Clear. In this paper, we concentrate on the effects that ALU architectures have on power consumption. Therefore, we design an 8-bit ALU capable of these basic operations in each of the three different architectures. Table I shows the function of the designed ALU.
TABLE I. FUNCTION TABLE OF THE DESIGNED ALU

Op-code (s3 s2 s1 s0)   Operation                  Function
0000                    Addition                   F = A + B
0001                    Addition with carry        F = A + B + Cin
0010                    Increment                  F = A + 1
0011                    Decrement                  F = A - 1
0100                    Subtraction with borrow    F = A - B - Cin
0101                    Inversion                  F = A'
0110                    A AND B                    F = A & B
0111                    A OR B                     F = A | B
1000                    A XOR B                    F = A ⊕ B
1001                    Clear A                    F = 0
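To make Table I concrete, the following Python sketch is a purely behavioral model of the ten operations. It assumes 8-bit wrap-around results and treats Cin as the carry for addition and as the borrow for subtraction; status flags such as carry-out are not modeled, and the function name `alu` is hypothetical rather than taken from the authors' Verilog code.

```python
MASK = 0xFF  # 8-bit datapath: all results wrap modulo 2**8

def alu(opcode: int, a: int, b: int, cin: int = 0) -> int:
    """Behavioral sketch of Table I (op-code = s3 s2 s1 s0)."""
    ops = {
        0b0000: lambda: a + b,          # Addition
        0b0001: lambda: a + b + cin,    # Addition with carry
        0b0010: lambda: a + 1,          # Increment
        0b0011: lambda: a - 1,          # Decrement
        0b0100: lambda: a - b - cin,    # Subtraction with borrow
        0b0101: lambda: ~a,             # Inversion (A')
        0b0110: lambda: a & b,          # A AND B
        0b0111: lambda: a | b,          # A OR B
        0b1000: lambda: a ^ b,          # A XOR B
        0b1001: lambda: 0,              # Clear
    }
    if opcode not in ops:
        raise ValueError(f"unused op-code {opcode:04b}")
    # Masking maps negative Python ints (from ~a or a - b) to 8-bit two's complement.
    return ops[opcode]() & MASK
```

For example, `alu(0b0011, 0, 0)` wraps around to 255, as an 8-bit decrement of zero would.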
The adder is one of the most basic components of arithmetic circuits and is usually on the critical path [4]; adders have therefore received a lot of attention from researchers. The Ripple Carry Adder (RCA) is the earliest and most fundamental adder; it is an O(n)-time, O(n)-area adder. The Carry Look-ahead Adder (CLA) has become popular due to its speed and modularity; it is an O(log n)-time, O(n log n)-area adder. Moreover, the power consumption of the CLA is lower than that of the CSA, CSL, etc. [7].
As is well known, the relationship between the inputs and outputs of a 1-bit adder can be expressed as follows:

$$G = A \cdot B \tag{1}$$

$$P = A \oplus B \tag{2}$$

$$C_{out}(G, P) = G + P \cdot C_{in} \tag{3}$$

$$S(P, C_{in}) = P \oplus C_{in} \tag{4}$$

Here, $G$ stands for carry generate, $P$ for carry propagate, $C_{out}$ for the carry output, $C_{in}$ for the carry in from the lower bit, and $S$ for the sum output. In an N-bit adder, the carries obey $C_{i+1} = G_i + P_i \cdot C_i$. The CLA speeds up the addition process by eliminating the ripple delay: unrolling this recurrence, the carry expression of the CLA can be developed as:

$$C_{i+1} = G_i + \sum_{j=0}^{i-1} \Big( \prod_{k=j+1}^{i} P_k \Big) G_j + \Big( \prod_{k=0}^{i} P_k \Big) C_0 \tag{5}$$
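As a worked instance, setting i = 0, ..., 3 in the carry recurrence $C_{i+1} = G_i + P_i C_i$ and substituting repeatedly gives the four carries of a 4-bit look-ahead block:

$$
\begin{aligned}
C_1 &= G_0 + P_0 C_0 \\
C_2 &= G_1 + P_1 G_0 + P_1 P_0 C_0 \\
C_3 &= G_2 + P_2 G_1 + P_2 P_1 G_0 + P_2 P_1 P_0 C_0 \\
C_4 &= G_3 + P_3 G_2 + P_3 P_2 G_1 + P_3 P_2 P_1 G_0 + P_3 P_2 P_1 P_0 C_0
\end{aligned}
$$

Each carry depends only on the G and P signals and on $C_0$, so all four can be formed in parallel; the number of product terms and their fan-in grow with the bit position, which is consistent with restricting a single look-ahead block to a few bits.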
Because the architecture described by formula (5) is effective for N ≤ 4 [7], we design the 8-bit adder of the ALU with a scheme that uses carry look-ahead inside each 4-bit block and ripple carry between the blocks.
Figure 2. The 8-bit Carry Look-ahead adder
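The adder organization described above (look-ahead inside 4-bit blocks, ripple carry between them) can be sketched in Python as follows. This is a functional model under our own naming (`cla4` and `add8` are hypothetical), not the gate-level netlist of Figure 2; the look-ahead carries are written out explicitly from equations (1), (2) and (4).

```python
def cla4(a: int, b: int, c0: int) -> tuple[int, int]:
    """4-bit carry look-ahead block: returns (4-bit sum, carry out)."""
    g = [(a >> i) & (b >> i) & 1 for i in range(4)]    # G_i = A_i AND B_i, eq. (1)
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(4)]  # P_i = A_i XOR B_i, eq. (2)
    # Look-ahead: each carry is a sum of products of G, P and c0 only,
    # so none of them waits for the previous carry to settle.
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & c0))
    carries = [c0, c1, c2, c3]
    s = 0
    for i in range(4):
        s |= (p[i] ^ carries[i]) << i                  # S_i = P_i XOR C_i, eq. (4)
    return s, c4

def add8(a: int, b: int, cin: int = 0) -> tuple[int, int]:
    """8-bit adder: two 4-bit CLA blocks with the carry rippling between them."""
    lo, c_mid = cla4(a & 0xF, b & 0xF, cin)
    hi, cout = cla4((a >> 4) & 0xF, (b >> 4) & 0xF, c_mid)
    return (hi << 4) | lo, cout
```

Because the input space is only 2^17 combinations, such a model can be checked exhaustively against ordinary integer addition.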
A. Complex structure
In the complex structure, the logic operation unit can be obtained by modifying the P and G block of the adder. The modified PG block of the complex structure is shown in Figure 3, and a 4-bit ALU of the complex structure is shown in Figure 4.
Figure 3. 1-bit modified PG block
Figure 4. A 4-bit ALU of complex structure
In Figure 3, s0, s1, s2 and s3 are the signals that control the function of the PG block, and signal c decides whether the ALU performs an arithmetic or a logic operation.
Because the 8-bit adder of the ALU uses 4-bit carry look-ahead internally and ripple carry across blocks, the 8-bit ALU is built by cascading two 4-bit ALUs.
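The exact gates of Figure 3 are not reproduced here, so the sketch below only illustrates the general idea under an explicit assumption: the select lines s3 s2 s1 s0 are read as the truth table of P over (A_i, B_i), and c = 1 forces G and the carry-in to 0. With no carries, S_i = P_i ⊕ 0 = P_i, so the adder's sum path outputs the selected logic function directly. The encoding and the names `pg_block` and `complex_alu4` are hypothetical, and for brevity the carry ripples bit by bit instead of using look-ahead.

```python
def pg_block(a_i: int, b_i: int, s: int, c: int) -> tuple[int, int]:
    """Hypothetical 1-bit modified PG block (not the exact gates of Figure 3).
    s = s3 s2 s1 s0 is used as the truth table of P over the pair (a_i, b_i);
    c = 1 selects logic mode, where G is forced to 0."""
    p = (s >> ((a_i << 1) | b_i)) & 1
    g = 0 if c else (a_i & b_i)
    return p, g

def complex_alu4(a: int, b: int, s: int, c: int, cin: int = 0) -> tuple[int, int]:
    """4-bit ALU built from modified PG blocks and the carry rules of eqs. (3)-(4)."""
    carry = 0 if c else cin          # logic mode also blocks the external carry-in
    result = 0
    for i in range(4):
        p, g = pg_block((a >> i) & 1, (b >> i) & 1, s, c)
        result |= (p ^ carry) << i   # S_i = P_i XOR C_i
        carry = g | (p & carry)      # C_{i+1} = G_i + P_i C_i (rippled here)
    return result, carry

# Truth-table encodings under this assumed scheme
XOR, AND, OR, NOT_A, CLEAR = 0b0110, 0b1000, 0b1110, 0b0011, 0b0000
```

For example, s = 0110 (XOR) with c = 0 yields ordinary addition, while s = 1000 with c = 1 yields A AND B.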
B. Adder independent structure
In this structure, a separate block performs logic operations and a CLA adder performs arithmetic operations. The logic operation block is shown in Figure 5, and the adder independent structure is shown in Figure 6.
For low power, we adopt the operand isolation technique: two AND gates are added after the two operands, as shown in the dashed box in Figure 6.
Figure 5. Logic operation block
Figure 6. Adder independent structure
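Operand isolation saves power by holding an idle unit's inputs constant so that its internal nodes do not switch. The Python sketch below counts bit toggles at a unit's operand inputs as a rough proxy for dynamic power; the trace format, the unit names and the toggle metric are assumptions for illustration, not the power model used by PrimePower.

```python
def toggles(words):
    """Total number of bit flips between consecutive input words."""
    return sum(bin(x ^ y).count("1") for x, y in zip(words, words[1:]))

def unit_inputs(trace, unit, isolate):
    """Operand words seen by `unit` ("adder" or "logic") over a trace of
    (active_unit, a, b) tuples. With isolation, AND gates driven by an
    enable signal force an idle unit's operands to 0."""
    words = []
    for active_unit, a, b in trace:
        if isolate and active_unit != unit:
            a, b = 0, 0                 # AND-gated: the idle unit sees constant 0
        words.append((a << 8) | b)      # pack 8-bit A and B into one 16-bit word
    return words

if __name__ == "__main__":
    # Hypothetical trace: three consecutive logic operations.
    trace = [("logic", 0x0F, 0xF0), ("logic", 0xAA, 0x55), ("logic", 0x33, 0xCC)]
    for iso in (False, True):
        label = "isolation" if iso else "no isolation"
        print(label, toggles(unit_inputs(trace, "adder", iso)))
```

In a logic-only trace the isolated adder sees no input toggles at all. Conversely, when the ALU alternates between units, isolated inputs jump between 0 and real data, which can add toggles; in this model isolation pays off mainly when a unit stays idle for long stretches.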
C. Chain structure
According to [6], a chain structure ALU has a smaller area and is potentially faster than a tree structure. Moreover, in the chain structure, placing the functional components in different positions may result in different power consumption. We therefore applied chain structures with different placements of the functional components to the Dhrystone benchmark and selected the most power-efficient one, shown in Figure 7.
Figure 7. Chain structure
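The placement search described above amounts to an exhaustive search over all orderings of the functional components. The authors evaluated each placement by simulating Dhrystone; the Python sketch below replaces that with a toy cost in which the component at position k pays a hypothetical per-use cost `stage_cost[k]`, weighted by how often its operations occur. The component names and frequencies are made up, not measured Dhrystone statistics.

```python
from itertools import permutations

# Hypothetical relative usage frequencies (NOT measured Dhrystone data).
FREQ = {"adder": 0.55, "and_or": 0.25, "xor": 0.15, "inverter": 0.05}
# Hypothetical cost of sitting at chain position 0, 1, 2, 3.
STAGES = [1, 2, 3, 4]

def chain_cost(order, freq, stage_cost):
    """Toy cost: each component pays stage_cost[position] per use."""
    return sum(freq[unit] * stage_cost[k] for k, unit in enumerate(order))

def best_placement(freq, stage_cost):
    """Try every ordering of the components and keep the cheapest one."""
    return min(permutations(freq),
               key=lambda order: chain_cost(order, freq, stage_cost))

if __name__ == "__main__":
    order = best_placement(FREQ, STAGES)
    print(order, round(chain_cost(order, FREQ, STAGES), 3))
```

Under this toy model the optimum simply puts the most frequently used component first (by the rearrangement inequality); a real evaluation would replace `chain_cost` with a power estimate from simulation, and the factorial number of orderings limits exhaustive search to a handful of components.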
For low power, we adopt the operand isolation technique: two de-multiplexers are added after the two operands, as shown in the dashed box in Figure 7.
III. SIMULATION RESULTS AND DISCUSSION
In this section, we compare the power consumption of the different architectures. The ALU is described in Verilog HDL for functional simulation, synthesized by Synopsys Design Compiler with the SMIC 0.18 µm standard cell library, and its power is then calculated by Synopsys PrimePower.
Table II and Figure 8 show the power consumption, delay and area of the three ALU architectures when executing the Dhrystone benchmark at 100 MHz. The table shows that the complex structure is the most power-efficient and has the smallest area.
TABLE II. POWER, DELAY AND AREA OF ALU ARCHITECTURES

Architecture                                    Power (mW)   Delay (ns)   Area (µm²)
Complex structure                               2.057        4.99         42616
Adder independent, without operand isolation    2.752        3.99         48974
Adder independent, with operand isolation       2.640        4.49         55757
Chain structure, without operand isolation      2.643        4.49         50864
Chain structure, with operand isolation         2.456        4.99         66638
Figure 8. Power, Delay and Area Comparisons (power in mW, delay in ns and area in 10^4 µm² for the five configurations of Table II)
Comparing Figure 3 with Figure 5, we find that the modified PG logic in the complex structure has only one more 3-input OR gate than the logic operation block in the adder independent structure. However, the CLA in the adder independent structure needs an AND gate and an XOR gate to generate the P and G signals. In addition, a multiplexer is needed to select the appropriate output. Because this multiplexer is at the end of the critical path, its switching activity is much higher than that of other parts of the ALU. It is therefore no surprise that both the power consumption and the area of the adder independent structure are much higher than those of the complex structure.
Comparing Figure 4 with Figure 6, we find that the adder independent structure has a shorter path than the complex structure; therefore, the adder independent structure is faster than the complex structure.
From Figures 5, 6 and 7, we find that the logic operation units in the adder independent structure and in the chain structure are different; Figure 9 shows the difference between them. It must be noted that the power consumption of the logic operation block is not significant here. To investigate which structure is more power-efficient, we chose three groups of randomly generated test benches and obtained the results shown in Table III and Figure 10.
Figure 9. The difference between the adder independent structure and the chain structure
From Table III, we find that the power consumption of all the ALUs except those with the chain structure shows an observable decline when performing random operations. This is because the percentage of arithmetic operations in Dhrystone is much higher than in the randomly generated test benches, and an arithmetic operation consumes more power than a logic operation does. When executing Dhrystone, the power consumption of the adder independent structure is higher than that of the chain structure, and the order is reversed when executing random instructions. This is because the chain structure was designed following the Dhrystone benchmark, and the test bench has a significant effect on its power consumption.
TABLE III. POWER CONSUMPTION OF DIFFERENT ARCHITECTURES APPLYING DIFFERENT TEST-BENCHES

Architecture                                    Random operations   Random arithmetic   Random logic
                                                (mW)                operations (mW)     operations (mW)
Complex structure                               1.836               2.026               1.243
Adder independent, without operand isolation    2.481               2.654               1.829
Adder independent, with operand isolation       2.370               2.535               1.357
Chain structure, without operand isolation      2.652               2.714               2.171
Chain structure, with operand isolation         2.442               2.579               1.710
Figure 10. Power consumption of different architectures applying different test-benches (random, arithmetic and logic operations, in mW)
From Table II and Table III, we find that when operand isolation is adopted, the power consumption of both the adder independent structure and the chain structure declines. The results are shown in Table IV.
TABLE IV. THE RESULT OF OPERAND ISOLATION TECHNOLOGY

Test bench                      Adder independent   Chain structure
Dhrystone benchmark             4.24%               7.61%
Random operations               4.68%               8.59%
Random arithmetic operations    4.69%               5.23%
Random logic operations         34.78%              26.69%
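Table IV appears to express each saving relative to the power of the design with operand isolation, that is (P_without - P_with) / P_with. Assuming that definition, the Python snippet below recomputes the table from the values in Tables II and III. It reproduces the listed percentages to within rounding, except for the chain structure under random logic operations, where (2.171 - 1.710) / 1.710 gives about 26.96% rather than 26.69%.

```python
# Power in mW taken from Table II (Dhrystone) and Table III (random test benches),
# stored as (without operand isolation, with operand isolation).
POWER = {
    "Dhrystone benchmark":          {"adder": (2.752, 2.640), "chain": (2.643, 2.456)},
    "Random operations":            {"adder": (2.481, 2.370), "chain": (2.652, 2.442)},
    "Random arithmetic operations": {"adder": (2.654, 2.535), "chain": (2.714, 2.579)},
    "Random logic operations":      {"adder": (1.829, 1.357), "chain": (2.171, 1.710)},
}

def isolation_saving(without: float, with_: float) -> float:
    """Saving in percent, assumed to be relative to the isolated design's power."""
    return 100.0 * (without - with_) / with_

for bench, archs in POWER.items():
    adder = isolation_saving(*archs["adder"])
    chain = isolation_saving(*archs["chain"])
    print(f"{bench:30s} {adder:6.2f}% {chain:6.2f}%")
```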
Operand isolation does save power here, but not significantly. This is because the ALU is kept active all the time, and an arithmetic operation consumes more power than a logic operation. When the ALU performs logic operations only, operand isolation has a significant effect on power saving.
IV. CONCLUSIONS
We designed an 8-bit ALU that performs 10 different basic operations with three different architectures. Simulation results show that the complex structure has the smallest area and is the most power-efficient. Compared with the other ALU architectures, the minimum power saving of the complex structure is 19.38%, and the corresponding area saving, 56.37%, is the maximum.
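The percentages in the abstract and conclusions appear to measure how much more power and area the other architectures need, relative to the complex structure's own power and area. Assuming that reading, the Python sketch below recomputes them from Table II. It gives 19.40% power and 56.37% area for the chain structure with operand isolation, and 33.79% power and 14.92% area for the adder independent structure without it; this matches the quoted area figures and is within 0.1 percentage point of the quoted power figures (19.38% and 33.87%).

```python
# Table II (Dhrystone at 100 MHz): (power in mW, area in um^2)
TABLE_II = {
    "complex":            (2.057, 42616),
    "adder_indep_no_iso": (2.752, 48974),
    "adder_indep_iso":    (2.640, 55757),
    "chain_no_iso":       (2.643, 50864),
    "chain_iso":          (2.456, 66638),
}

def excess_over(ref: str, name: str) -> tuple[float, float]:
    """Extra power and area (in %) that `name` needs, relative to `ref`."""
    p_ref, a_ref = TABLE_II[ref]
    p, a = TABLE_II[name]
    return 100.0 * (p - p_ref) / p_ref, 100.0 * (a - a_ref) / a_ref

for name in TABLE_II:
    if name != "complex":
        dp, da = excess_over("complex", name)
        print(f"{name:20s} power +{dp:5.2f}%  area +{da:5.2f}%")
```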
It is important to note that low power design is a systems engineering problem and can be addressed at all levels of the design flow. This work offers some guidance on selecting an appropriate architecture when designing a low power ALU.
REFERENCES
[1] Swaroop Ghosh and Kaushik Roy, "Exploring High-Speed Low-Power Hybrid Arithmetic Units at Scaled Supply and Adaptive Clock-Stretching," 2008 Asia and South Pacific Design Automation Conference, p. 635, 2008.
[2] David A. Patterson and John L. Hennessy, "Computer Organization and Design: The Hardware/Software Interface," 3rd ed., Elsevier Inc., p. B-26, 2005.
[3] Beom Seon Ryu, Jung Sok Yi, Kie Yong Lee and Tae Won Cho, "A design of low power 16-B ALU," IEEE Trans., 1999.
[4] Patanjali Prakash and Saxena A.K., "Design of low power high speed ALU using feedback switching logical," 2009 International Conference on Advances in Recent Technologies in Communication and Computing, pp. 899-902, 2009.
[5] Rajesh Kannan Megalingam, Venkat Krishnan B., Mithun M., Rahul Srikumar and Vineeth Sarma V., "Gating and serializing the data path of CPU for low power consumption," 2009 International Conference on Parallel Processing Workshops, pp. 550-557, 2009.
[6] Yu Zhou and Hui Guo, "Application specific low power ALU design," 2008 IEEE/IFIP International Conference on Embedded and Ubiquitous Computing, pp. 214-220, 2008.
[7] Chetana Nagendra, Mary Jane Irwin and Robert Michael Owens, "Area-Time-Power Tradeoffs in Parallel Adders," IEEE Transactions on Circuits and Systems-II, pp. 689-702, 1996.
