2010 International Conference on Mechanical and Electrical Technology (ICMET 2010)
The Power Dissipation Comparison of Different ALU Architectures
Junkai Sun
Second Design Department
Beijing Microelectronics Technology Institute
Beijing, China
Email: sunjunkai86@hotmail.com

Anping Jiang
Second Design Department
Beijing Microelectronics Technology Institute
Beijing, China
Email: apjiang@gmail.com

Abstract-Low power consumption is a challenging goal in processor design, and applying power optimization to every component of the processor is one option. One of the most basic components of a processor is the Arithmetic and Logic Unit (ALU), which performs arithmetic and logic operations. The architecture of the ALU has several implications for power consumption, delay and area. In this paper, different ALU architectures are described and discussed. To find out which architecture provides the best power efficiency, an 8-bit ALU is designed in each of the different architectures. Compared with the other architectures, the power savings of the most power-effective architecture range from 19.38% to 33.87%, while the corresponding area savings range from 56.37% to 14.92%.

Keywords-ALU architecture; low power; operand isolation

978-1-4244-8102-6/10/$26.00 ©2010 IEEE

I. INTRODUCTION

Battery-powered and hand-held devices such as laptop computers and cell phones have greatly improved our daily lives. Their hardware is required to be fast and multifunctional. For portability, however, battery weight must be minimized and the operating time between battery recharges (which is determined by power consumption) must be maximized. Therefore, designers urgently need circuits and systems that use less energy without greatly sacrificing performance.

Power optimization of a processor can be carried out at many different levels of the design hierarchy with different methods: for example, Dynamic Voltage Scaling (DVS) at the system level, bus coding at the algorithm level, clock gating and operand isolation at the register transfer level, and transistor sizing and threshold voltage scaling at the circuit and transistor level. Power optimization can also be applied to every component of a processor.

The Arithmetic and Logic Unit (ALU), which takes its operands from the register file, the data memory or the ALU write-back bus, is one of these components. The basic structure of the ALU is shown in Figure 1. Because it is clocked at the highest speed and kept busy almost 100% of the time, the ALU is one of the most power-hungry components in a processor and is often a likely location of hot spots [1]. Therefore, low-power design of the ALU can considerably reduce the total power consumption of a processor.

An ALU combines a variety of arithmetic and logic operations into a single unit. For example, a typical ALU might perform addition, subtraction, AND, OR and XOR operations. Since the architecture of the ALU has several implications for power consumption, delay and area, how to organize these operations is a design problem. In this paper, we are mainly concerned with the power consumption of the ALU; hence, a proper choice of ALU architecture is needed when the design targets low power dissipation.

Figure 1. Basic ALU organization (inputs: operand A, operand B and control; the output is selected by a MUX)

Previous work indicates that there are three typical ALU architectures: A. the complex structure [2] [3]; B. the adder independent structure [4] [5]; and C. the tree structure and chain structure [6]. The complex structure combines a variety of arithmetic and logic operations into a single unit and uses several control signals to choose the desired operation. The adder independent structure uses an adder to perform arithmetic operations and a separate module to perform logic operations. The tree or chain structure organizes a set of functional components as a tree or a chain, with each component responsible for one type of operation. To explore which architecture is best for low power, an 8-bit ALU that performs arithmetic and logic operations is designed in each architecture with a 0.18 µm static CMOS process.
The paper is organized as follows. In Section II, the ALU is designed and implemented with the three different architectures. Simulation results and discussion are presented in Section III, and finally the conclusions are presented in Section IV.

II. ALU DESIGN

By studying processor instruction sets, we find that all the instructions the ALU performs can be accomplished through basic operations such as Addition, Subtraction, And, Or, Not, Xor and Clear. In this paper, we concentrate on the effects that ALU architectures have on power consumption. Therefore, we design an 8-bit-wide ALU capable of these basic operations in each of the three different architectures. Table I shows the functions of the designed ALU.

TABLE I. FUNCTIONAL TABLE OF THE DESIGNED ALU

Op-code (s3s2s1s0)   Operation                 Function
0000                 Addition                  F = A + B
0001                 Addition with carry       F = A + B + Cin
0010                 Increment                 F = A + 1
0011                 Decrement                 F = A - 1
0100                 Subtraction with borrow   F = A - B - Cin
0101                 Inversion                 F = A'
0110                 A AND B                   F = A & B
0111                 A OR B                    F = A | B
1000                 A XOR B                   F = A ⊕ B
1001                 Clear A                   F = 0

The adder is one of the most basic components of arithmetic circuits and is usually on the critical path [4]. Therefore, adders have received a lot of attention from researchers. The Ripple Carry Adder (RCA) is the earliest and most fundamental adder; it is an O(n)-time, O(n)-area adder. The Carry Look-ahead Adder (CLA) has become popular due to its speed and modularity; it is an O(log n)-time, O(n log n)-area adder. Moreover, the power consumption of the CLA is lower than that of the CSA, CSL, etc. [7]. The relationship between the inputs and outputs of a 1-bit adder can be expressed as follows (· denotes AND, + denotes OR and ⊕ denotes XOR):

$G = A \cdot B$    (1)
$P = A \oplus B$    (2)
$C_{out}(G, P) = G + P \cdot C_{in}$    (3)
$S(P, C_{in}) = P \oplus C_{in}$    (4)

Here, G stands for carry generate, P for carry propagate, C_out for the carry output, C_in for the carry in from the lower bit, and S for the sum output. In an N-bit adder, the carry into bit i + 1 satisfies $C_{i+1} = G_i + P_i \cdot C_i$.
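The op-codes of Table I and equations (1)-(4) can be checked with a small behavioural model. The sketch below is a minimal Python model, assuming 8-bit wrap-around arithmetic, a separate carry/borrow input `cin`, and subtraction with borrow realized as A + B' + (1 - Cin) on the same adder. The function names and the returned carry-out flag are hypothetical, since the paper does not specify how flags are produced. For clarity it ripples equation (3) bit by bit rather than using the look-ahead adder of the actual design.

```python
from __future__ import annotations

WIDTH = 8                  # the designed ALU is 8 bits wide
MASK = (1 << WIDTH) - 1


def ripple_add(a: int, b: int, cin: int) -> tuple[int, int]:
    """Add two WIDTH-bit words bit by bit with equations (1)-(4)."""
    carry, total = cin, 0
    for i in range(WIDTH):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        g = ai & bi                 # (1) carry generate
        p = ai ^ bi                 # (2) carry propagate
        total |= (p ^ carry) << i   # (4) sum bit uses the carry into bit i
        carry = g | (p & carry)     # (3) carry out of bit i = carry into bit i+1
    return total, carry


def alu(op: int, a: int, b: int, cin: int = 0) -> tuple[int, int]:
    """Behavioural model of Table I; returns (F, carry_out).

    Assumption (hypothetical): carry_out is the adder's carry for arithmetic
    op-codes and 0 for logic op-codes; the paper does not define ALU flags.
    """
    a, b = a & MASK, b & MASK
    if op == 0b0000:                       # Addition
        return ripple_add(a, b, 0)
    if op == 0b0001:                       # Addition with carry
        return ripple_add(a, b, cin)
    if op == 0b0010:                       # Increment: A + 0 + 1
        return ripple_add(a, 0, 1)
    if op == 0b0011:                       # Decrement: A + 11111111 (i.e. A - 1 mod 256)
        return ripple_add(a, MASK, 0)
    if op == 0b0100:                       # Subtraction with borrow, assumed A + B' + (1 - Cin)
        return ripple_add(a, ~b & MASK, 1 - cin)
    logic = {
        0b0101: ~a & MASK,                 # Inversion
        0b0110: a & b,                     # AND
        0b0111: a | b,                     # OR
        0b1000: a ^ b,                     # XOR
        0b1001: 0,                         # Clear
    }
    if op in logic:
        return logic[op], 0
    raise ValueError(f"op-code {op:04b} is not defined in Table I")
```

For example, `alu(0b0000, 200, 100)` returns `(44, 1)`: the sum 300 wraps to 44 with a carry-out, and `alu(0b0100, 5, 7, 1)` returns `(253, 0)`, the 8-bit two's-complement form of -3, where a carry-out of 0 signals the borrow.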
The CLA speeds up the addition process by eliminating the ripple delay. By unrolling the carry recurrence, the carry expression of the CLA can be developed as:

$C_{i+1} = G_i + P_i \cdot G_{i-1} + P_i \cdot P_{i-1} \cdot G_{i-2} + \cdots + P_i \cdots P_1 \cdot G_0 + P_i \cdots P_0 \cdot C_0$    (5)

Because the architecture described by formula (5) is effective for N ≤ 4 [7], we design the 8-bit adder of the ALU with a scheme that uses carry look-ahead inside each 4-bit block and ripples the carry across blocks.

Figure 2. The 8-bit carry look-ahead adder

A. Complex structure

In the complex structure, the logic operation unit can be implemented by modifying the P and G block of the adder. The modified P and G block of the complex structure is shown in Figure 3. A 4-bit ALU of the complex structure is shown in Figure 4.

Figure 3. 1-bit modified PG block (outputs g and p)

Figure 4. A 4-bit ALU of complex structure

In Figure 3, s0, s1, s2 and s3 are the signals that control the function of the PG block. Signal c decides whether the ALU performs an arithmetic or a logic operation. Because the 8-bit adder of the ALU uses carry look-ahead inside 4-bit blocks and ripples the carry across blocks, the 8-bit ALU is built by cascading two 4-bit ALUs.

B. Adder independent structure

In this structure, an individual block performs logic operations and a CLA adder performs arithmetic operations. The logic operation block is shown in Figure 5, and the adder independent structure is shown in Figure 6. To reduce power, we adopt operand isolation: two AND gates are added after the two operands, as shown in the dashed-line frame in Figure 6.

Figure 5. Logic operation block

Figure 6. Adder independent structure

C. Chain structure

According to [6], a chain-structure ALU has a smaller area and is potentially faster than a tree structure. In addition, in a chain structure, placing the functional components in a different order may cause different power consumption. We therefore apply chain structures with different placements of the functional components to the Dhrystone benchmark and select the most power-efficient one, shown in Figure 7.
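The 8-bit adder described at the start of this section (carry look-ahead inside 4-bit blocks, carry rippled between blocks) can be sketched as a behavioural model. This is a minimal Python sketch, assuming two 4-bit blocks in which every internal carry is produced directly from formula (5); the names `cla_block` and `add8` are hypothetical, and the model checks the equations rather than describing the synthesized netlist.

```python
from __future__ import annotations

BLOCK = 4  # bits per look-ahead block; the text limits formula (5) to N <= 4


def cla_block(a: int, b: int, c0: int) -> tuple[int, int]:
    """One 4-bit CLA block: every internal carry is computed from formula (5)."""
    g = [((a >> i) & (b >> i)) & 1 for i in range(BLOCK)]   # (1) generate
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(BLOCK)]   # (2) propagate
    carries = [c0]
    for i in range(BLOCK):
        # C_{i+1} = G_i + P_i G_{i-1} + ... + P_i...P_1 G_0 + P_i...P_0 C_0,
        # evaluated from G, P and C_0 only, so no carry waits on a lower carry.
        c_next, prefix = 0, 1          # prefix holds P_i P_{i-1} ... P_{j+1}
        for j in range(i, -1, -1):
            c_next |= prefix & g[j]
            prefix &= p[j]
        c_next |= prefix & c0          # final term P_i ... P_0 C_0
        carries.append(c_next)
    s = 0
    for i in range(BLOCK):
        s |= (p[i] ^ carries[i]) << i  # (4) sum bits
    return s, carries[BLOCK]


def add8(a: int, b: int, cin: int = 0) -> tuple[int, int]:
    """8-bit adder: two CLA blocks with the carry rippled between them."""
    low, c4 = cla_block(a & 0xF, b & 0xF, cin)
    high, c8 = cla_block((a >> 4) & 0xF, (b >> 4) & 0xF, c4)
    return (high << 4) | low, c8
```

For example, `add8(0xFF, 0x01)` returns `(0, 1)`. Splitting the word into two blocks keeps the widest carry expression (C_4) at five product terms, which bounds the fan-in of the look-ahead logic; the carry between the blocks still ripples, so the number of blocks sets the worst-case carry path.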
Figure 7. Chain structure

To reduce power, we again adopt operand isolation: two de-multiplexers are added after the two operands, as shown in the dashed-line frame in Figure 7.

III. SIMULATION RESULTS AND DISCUSSION

In this section, we compare the power consumption of the different architectures. The ALU is described in Verilog HDL for functional simulation, synthesized by Synopsys Design Compiler with the SMIC 0.18 µm standard cell library, and its power is then calculated by Synopsys PrimePower. Table II and Figure 8 show the power consumption, delay and area of the three ALU architectures when executing the Dhrystone benchmark at 100 MHz. The table shows that the complex structure is the most power-efficient and has the smallest area.

TABLE II. POWER, DELAY AND AREA OF ALU ARCHITECTURES

Architecture                                    Power (mW)   Delay (ns)   Area (µm²)
Complex structure                               2.057        4.99         42616
Adder independent, without operand isolation    2.752        3.99         48974
Adder independent, with operand isolation       2.640        4.49         55757
Chain structure, without operand isolation      2.643        4.49         50864
Chain structure, with operand isolation         2.456        4.99         66638

Figure 8. Power, delay and area comparisons (power in mW, delay in ns and area in 10^4 µm² for each of the five designs in Table II)

Comparing Figure 3 with Figure 5, we find that the modified PG logic in the complex structure has only one more 3-input OR gate than the logic operation block in the adder independent structure. However, the CLA in the adder independent structure needs an additional AND gate and XOR gate to generate the P and G signals. Besides, the adder independent structure needs a multiplexer to select the proper output. Because the multiplexer is at the end of the critical path, its switching activity is much higher than that of the other parts of the ALU. It is therefore no surprise that the power consumption of the adder independent structure is much higher than that of the complex structure, and so is its area.

Comparing Figure 4 with Figure 6, we find that the adder independent structure has a shorter path than the complex structure; therefore, the adder independent structure is faster than the complex structure.

From Figures 5, 6 and 7, we find that the logic operation units in the adder independent structure and in the chain structure are different. Figure 9 shows the difference between them. It must be noted that the power consumption of the logic operation block is not significant here. To investigate which structure is more power-effective, we apply three groups of randomly generated test-benches and obtain the results shown in Table III and Figure 10.

Figure 9. The difference between the adder independent structure and the chain structure (control signals cs[0], s0, s1, s2 and s3)

From Table III, we find that the power consumption of all the ALUs except those with the chain structure shows an observable decline when performing random operations. That is because the percentage of arithmetic operations in Dhrystone is much higher than in the randomly generated test-benches, and an arithmetic operation consumes more power than a logic operation does.
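The percentages quoted in the abstract can be recomputed from Table II. In the Python sketch below, the dictionary layout and the `savings_vs` helper are illustrative, not part of the original work; each saving is the other design's value minus the complex structure's value, divided by the complex structure's value. With this convention the area figures reproduce the quoted 56.37% and 14.92%, and the power figures come out at about 19.40% and 33.79%, close to the quoted 19.38% and 33.87%.

```python
from __future__ import annotations

# (power in mW, delay in ns, area in um^2) for Dhrystone at 100 MHz, from Table II
TABLE_II = {
    "Complex structure": (2.057, 4.99, 42616),
    "Adder independent, without operand isolation": (2.752, 3.99, 48974),
    "Adder independent, with operand isolation": (2.640, 4.49, 55757),
    "Chain structure, without operand isolation": (2.643, 4.49, 50864),
    "Chain structure, with operand isolation": (2.456, 4.99, 66638),
}


def savings_vs(reference: str,
               table: dict[str, tuple[float, float, int]] = TABLE_II,
               ) -> dict[str, tuple[float, float]]:
    """Percent (power, area) by which each other design exceeds the reference,
    normalised to the reference design's own value (hypothetical helper)."""
    ref_power, _, ref_area = table[reference]
    return {
        name: (100.0 * (power - ref_power) / ref_power,
               100.0 * (area - ref_area) / ref_area)
        for name, (power, _, area) in table.items()
        if name != reference
    }


if __name__ == "__main__":
    for name, (d_power, d_area) in savings_vs("Complex structure").items():
        print(f"{name:46s} power {d_power:6.2f}%  area {d_area:6.2f}%")
```

Normalising to the complex structure, rather than to each competing design, is what makes the smallest power saving (against the chain structure with operand isolation) coincide with the largest area saving, as stated in the conclusions.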
When running Dhrystone, the adder independent structure consumes more power than the chain structure, and the order is reversed when running random instructions. That is because the chain structure was designed following the Dhrystone benchmark, so the test-bench has a significant effect on its power consumption.

TABLE III. POWER CONSUMPTION OF DIFFERENT ARCHITECTURES APPLYING DIFFERENT TEST-BENCHES

Architecture                                    Random ops (mW)   Arithmetic ops (mW)   Logic ops (mW)
Complex structure                               1.836             2.026                 1.243
Adder independent, without operand isolation    2.481             2.654                 1.829
Adder independent, with operand isolation       2.370             2.535                 1.357
Chain structure, without operand isolation      2.652             2.714                 2.171
Chain structure, with operand isolation         2.442             2.579                 1.710

Figure 10. Power consumption of different architectures applying different test-benches (series: random, arithmetic and logic operations, in mW)

From Tables II and III, we find that when operand isolation is adopted, the power consumption of both the adder independent structure and the chain structure declines. The results are shown in Table IV.

TABLE IV. THE RESULT OF OPERAND ISOLATION TECHNOLOGY

Test-bench                       Adder independent structure   Chain structure
Dhrystone benchmark              4.24%                         7.61%
Random operations                4.68%                         8.59%
Random arithmetic operations     4.69%                         5.23%
Random logic operations          34.78%                        26.69%

Operand isolation saves some power here, but not significantly. That is because the ALU is kept active all the time, and an arithmetic operation consumes more power than a logic operation. When the ALU performs logic operations only, operand isolation has a significant effect on power saving.

IV. CONCLUSIONS

We design an 8-bit ALU that performs 10 different basic operations in three different architectures.
Simulation results show that the complex structure has the smallest area and is the most power-effective. Compared with the other ALU architectures, the minimum power saving of the complex structure is 19.38%, and the corresponding area saving is 56.37%, which is the maximum. It is important to note that low-power design is a systems-engineering task and can be applied at all levels of the design flow. This work offers guidance on selecting an appropriate architecture when designing a low-power ALU.

REFERENCES

[1] Swaroop Ghosh and Kaushik Roy, "Exploring high-speed low-power hybrid arithmetic units at scaled supply and adaptive clock-stretching," 2008 Asia and South Pacific Design Automation Conference, p. 635, 2008.
[2] David A. Patterson and John L. Hennessy, Computer Organization and Design: The Hardware/Software Interface, 3rd ed., Elsevier Inc., p. B-26, 2005.
[3] Beom Seon Ryu, Jung Sok Yi, Kie Yong Lee and Tae Won Cho, "A design of low power 16-bit ALU," IEEE Trans., 1999.
[4] Patanjali Prakash and A. K. Saxena, "Design of low power high speed ALU using feedback switching logic," 2009 International Conference on Advances in Recent Technologies in Communication and Computing, pp. 899-902, 2009.
[5] Rajesh Kannan Megalingam, Venkat Krishnan B., Mithun M., Rahul Srikumar and Vineeth Sarma V., "Gating and serializing the data path of CPU for low power consumption," 2009 International Conference on Parallel Processing Workshops, pp. 550-557, 2009.
[6] Yu Zhou and Hui Guo, "Application specific low power ALU design," 2008 IEEE/IFIP International Conference on Embedded and Ubiquitous Computing, pp. 214-220, 2008.
[7] Chetana Nagendra, Mary Jane Irwin and Robert Michael Owens, "Area-time-power tradeoffs in parallel adders," IEEE Transactions on Circuits and Systems II, pp. 689-702, 1996.