
IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 15, NO. 2, FEBRUARY 2007


Clock Delayed Domino Logic With Efficient Variable Threshold Voltage Keeper
Amir Amirabadi, Student Member, IEEE, Ali Afzali-Kusha, Senior Member, IEEE, Yousof Mortazavi, Member, IEEE, and Mehrdad Nourani, Senior Member, IEEE

Abstract: In this paper, an efficient clock delayed domino logic with a variable strength keeper is proposed. The variable strength of the keeper is achieved by applying two different body biases to the keeper. The circuits used to generate the body biases are called the capacitive body bias generator and the cross-coupled capacitive body bias generator. Compared to a previous work, the body bias generator circuits presented in this paper are simpler and do not require a double or triple power supply, while consuming less area and power. To show the efficiency of the proposed technique, implementations of a carry generator circuit using the proposed techniques and the previous work are compared. The simulation results for standard CMOS technologies of 0.18 μm and 70 nm show considerable improvements in terms of power and power delay product. In addition, the proposed technique shows much less temperature dependence than the previous work.

Index Terms: CMOS digital integrated circuits, combinational logic circuits, leakage currents, very high-speed integrated circuits, VLSI systems.
Fig. 1. Standard clock delayed domino logic.

I. INTRODUCTION

Domino logic has been widely employed in high-performance digital circuits to reduce the area and enhance the speed [1]. One variation of this style is clock-delayed domino logic, where the cascading of stages is realized using a delayed clock for each stage. The amount of delay for each stage is determined by the evaluation time of the previous stage [2]. In this paper, we refer to clock-delayed domino as domino for simplicity. The domino logic style, however, has a lower noise margin, which becomes more problematic as the technology scales down and the operating frequency increases, making the design of domino logic for deep-submicrometer circuits challenging [1].

A standard domino (SD) logic gate, which consists of the logic (pull-down network or PDN), a precharge transistor, the evaluation transistor, an inverter, and a feedback keeper transistor, is shown in Fig. 1. In the precharge phase, the output node is charged high, while in the evaluation phase, depending on the inputs, the output should either remain high or discharge to low. During the evaluation phase, when the dynamic node should keep its high state, the keeper should hold the state of the dynamic node against coupling noise, charge sharing, and leakage current. Note that the keeper, which is typically sized smaller than the pull-down network, is fully on at the beginning of the evaluation phase. At this point, if the dynamic node should be discharged, the pull-down network and the keeper compete in determining the state of the dynamic node. This degrades the speed and increases the power consumption. From the performance perspective, the keeper should be sized as small as possible, whereas from the noise-immunity viewpoint, it should be sized as large as possible. Therefore, either a tradeoff between reliability and high-speed/energy-efficient operation should be made [1], or the issue should be addressed using variable strength keepers. A variable strength keeper should provide a lower drive current at the beginning of the evaluation phase, for power and delay reduction, and a high drive current during the rest of the period, for enhanced noise immunity [1].

The scheme proposed in [2] employs two keeper transistors, where one is sized small to reduce the competition current (the current drawn by the keeper during the competition), and the other is sized larger to improve the noise immunity [3]. This scheme requires a delay element and a control circuit that increase the power and area overhead [1]. Another scheme, proposed in [4], makes use of a single keeper that is off at the beginning of the evaluation phase. This causes the dynamic node to float at the beginning of the evaluation phase, making it sensitive to noise [1].

Manuscript received April 13, 2005; revised October 30, 2006. This work was supported by the Research Council of the University of Tehran, Tehran, Iran.
A. Amirabadi and A. Afzali-Kusha are with the Nanoelectronics Center of Excellence, School of Electrical and Computer Engineering, University of Tehran, Tehran 14395/515, Iran (e-mail: a.amirabadi@ece.ut.ac.ir; afzali@ut.ac.ir).
Y. Mortazavi was with the Nanoelectronics Center of Excellence, School of Electrical and Computer Engineering, University of Tehran, Tehran 14395/515, Iran. He is now with the Department of Electrical and Computer Engineering, University of Texas, Austin, TX 78712-0240 USA (e-mail: author@lamar.colostate.edu).
M. Nourani is with the Department of Electrical Engineering, University of Texas at Dallas, Richardson, TX 75083-0688 USA (e-mail: nourani@utdallas.edu).
Digital Object Identifier 10.1109/TVLSI.2007.891097

1063-8210/$25.00 © 2007 IEEE


Another technique, proposed in [1], employs a variable threshold voltage keeper to control the competition current. The threshold voltage is dynamically modified by changing the body bias of a p-channel transistor to a voltage greater or less than $V_{DD}$ for increasing or decreasing the threshold voltage, respectively. Here, if needed, the keeper size may be increased without significantly degrading the power and the speed. Two body biasing circuits are given in [1], where one only generates a voltage greater than the circuit supply voltage for increasing the threshold voltage, and the other can generate voltages both greater and less than $V_{DD}$ [1], [5]. The major disadvantage of these circuits is that they require double or triple power supplies.

In this paper, to overcome these problems, the bias generator circuits proposed in [1] are replaced by two efficient circuits: the capacitive body bias generator (CBBG), for increasing the threshold voltage, and the cross-coupled capacitive body bias generator (C³BBG), for increasing and decreasing the threshold voltage.

The structure of this paper is as follows. Section II briefly describes the variable threshold keeper technique. The circuits proposed for body bias generation are presented in Sections III and IV, while the area and power comparisons of the methods are discussed in Section V. The results are discussed in Section VI, and the summary and conclusion are given in Section VII.

II. VARIABLE THRESHOLD VOLTAGE KEEPER TECHNIQUE

As discussed in Section I, high-speed yet energy-efficient operation of domino logic calls for a variable strength keeper, with a lower drive current at the beginning of the evaluation phase, for power and delay reduction, and a high drive current during the rest of the period, for enhanced noise immunity. By biasing the body of a p-channel transistor at a voltage greater than $V_{DD}$, the threshold voltage may be increased due to the body effect, making the keeper transistor weak.
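The body effect invoked above can be made concrete with the textbook first-order threshold equation, $|V_T| = |V_{T0}| + \gamma(\sqrt{2\phi_F + V_{BS}} - \sqrt{2\phi_F})$, where $V_{BS}$ is the body voltage minus the source voltage of the pMOS keeper. The following is only a minimal sketch: the equation is the standard long-channel approximation rather than anything taken from this paper, and the parameter values (`vt0`, `gamma`, `phi2f`) and the sample body voltages are illustrative assumptions, not extracted from the 0.18-μm or 70-nm models used here.

```python
import math

def pmos_threshold_magnitude(v_body, v_source, vt0=0.45, gamma=0.4, phi2f=0.8):
    """Return |V_T| (in volts) of a pMOS transistor from the first-order body-effect equation.

    vt0   : zero-bias threshold magnitude in V (assumed value)
    gamma : body-effect coefficient in V^0.5 (assumed value)
    phi2f : surface potential term 2*phi_F in V (assumed value)
    """
    v_bs = v_body - v_source  # positive = reverse body bias for a pMOS device
    if phi2f + v_bs <= 0:
        raise ValueError("forward bias too large for this simple model")
    return vt0 + gamma * (math.sqrt(phi2f + v_bs) - math.sqrt(phi2f))

if __name__ == "__main__":
    vdd = 1.8  # the keeper source sits at the supply rail
    # Body below, at, and above the supply (illustrative voltages only).
    for v_body in (1.3, 1.8, 3.2):
        vt = pmos_threshold_magnitude(v_body, vdd)
        print(f"body = {v_body:.1f} V -> |V_T| = {vt:.3f} V")
```

With the body above the supply the computed threshold magnitude rises (weaker keeper), and with the body below the supply it falls (stronger keeper), which is the qualitative behavior the technique relies on.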
Toward the end of the evaluation process, when a strong keeper may be needed to hold the high state of the dynamic output node, the keeper body may be biased lower than $V_{DD}$ to lower the threshold voltage compared to the zero-biased state. This would improve the noise immunity of the circuit in a noisy environment. Note that the enhancement is achieved without increasing the keeper size, and, therefore, the gate capacitance does not increase significantly when compared to the case of increasing the width of the keeper transistor. Thus, a variable strength keeper may be implemented if the body bias is dynamically modified during the circuit operation to obtain a varying drive current. A control circuit for dynamically biasing the keeper according to the phases of the domino logic is required for each domino gate. This circuit should be controlled by the clock, which also determines the phases of the domino logic, namely precharge and evaluation.

III. BODY BIAS GENERATOR FOR INCREASING THE KEEPER THRESHOLD VOLTAGE

In this section, first the concepts of the technique proposed in [1] are discussed, and then the body bias generator circuit of this paper is presented.

Fig. 2. DBBG of [1].

Fig. 3. CBBG circuit and its input clock.

A. Dynamic Body Bias Generator (DBBG) With Two Supply Voltages

To increase the threshold voltage of the keeper in [1], the dynamic biasing of the keeper body is controlled with the DBBG circuit shown in Fig. 2. When the clock is high, one group of transistors in the generator is on and the generator output is connected to one of the two supplies, which means low for the noninverting delay gates; after a delay, the body bias takes that level. Similarly, when the clock is low, the complementary group of transistors turns on, making the generator output high, and after a delay the body bias rises to the other supply level. This circuit generates a square wave signal between the two supply voltages, which is delayed through the noninverting delay circuit (repeated for each gate) and is connected to the body of the keeper transistor of each domino gate. This control circuit consumes some power and silicon area and requires a second power supply voltage.

B. CBBG

Similar to the variable threshold technique used in static CMOS circuits, a variable body bias voltage may be generated by varying the voltage (charge) of a floating capacitance [6]. As discussed in [6], the parasitic capacitances between the substrate and the diffusion of the static nodes (the nodes which stay high or low during the evaluation phase) work effectively to stabilize the substrate bias against high-frequency noise. We propose a body bias generator which is based on the voltage doubler technique and makes use of a single supply. Fig. 3


Fig. 4. Waveforms for the CBBG in the 0.18-μm CMOS technology with $V_{DD}$ = 1.8 V.

shows the circuit structure of the so-called CBBG, which consists of a diode-connected transistor and a capacitor. With this structure, both the area and the power overhead are reduced. The operation of the CBBG is controlled by the circuit clock, which determines the different phases of the domino circuit. After several clock cycles the capacitor is charged to $V_{DD}$, so under steady-state operation the capacitor is fully charged. In the steady state, when the delayed clock signal driving the capacitor is low, the diode-connected transistor turns on and the capacitor is recharged to $V_{DD}$ after any slight discharge. This signal is taken from an inverted stage of the clock delay circuit, which is required for the normal operation of the clock-delayed domino logic [7]. On the other hand, when the signal is high, the diode turns off, causing the output node to float. Its voltage is then determined by the sum of the clock high level and the voltage of the capacitor, giving rise to almost $2V_{DD}$ for this node.

Note that the output node is connected to the body capacitance of the keeper transistor and to the top plate of the capacitor, while the clock-side node is connected to the bottom plate of the capacitor and to the output capacitance of the clock delay circuit. During the low-to-high transition of the clock-side node, since the voltage of the output node changes from $V_{DD}$ toward $2V_{DD}$, some charge is transferred from the capacitor to the keeper body capacitance. For a similar reason, the charge transfer is reversed during the high-to-low transition of the clock-side node. In addition, during these transitions, the capacitor has an insignificant loading effect on the clock delay circuit.

The waveform for the CBBG, obtained from HSPICE simulation for the 0.18-μm CMOS technology, is shown in Fig. 4. The maximum voltage of the body bias is lower than $2V_{DD}$, which is due to the charge sharing with the bulk capacitance of the keeper. To minimize the charge sharing effect, therefore, the minimum size of the capacitor is set, for example, to twice the keeper bulk capacitance. The body of the p-channel diode-connected transistor is connected to the output node rather than to $V_{DD}$.
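The charge-sharing limit on the boosted voltage described above can be estimated with a simple capacitive divider. The sketch below is an idealized back-of-the-envelope model, not the authors' analysis: it assumes the floating node starts at exactly $V_{DD}$ (no diode drop), that the keeper bulk capacitance `c_bulk` is referred to a fixed potential, and that there is no leakage. The function name and the capacitance ratios are hypothetical.

```python
def boosted_body_voltage(vdd, c_pump, c_bulk):
    """Ideal peak voltage of the floating body-bias node.

    The bottom plate of the pump capacitor steps from 0 to vdd while the top
    plate (initially at vdd) floats and shares charge with c_bulk.
    Only the ratio c_pump / c_bulk matters, so any consistent unit works.
    """
    if c_pump <= 0 or c_bulk < 0:
        raise ValueError("capacitances must be positive (c_bulk may be zero)")
    # Capacitive divider: the step couples onto the floating node with
    # weight c_pump / (c_pump + c_bulk).
    return vdd * (1.0 + c_pump / (c_pump + c_bulk))

if __name__ == "__main__":
    vdd = 1.8
    for ratio in (1, 2, 5, 10):  # pump capacitor as a multiple of the bulk capacitance
        v = boosted_body_voltage(vdd, c_pump=float(ratio), c_bulk=1.0)
        print(f"C = {ratio:2d} x C_bulk -> ideal peak = {v:.2f} V ({v / vdd:.2f} x VDD)")
```

Under these assumptions a larger pump capacitor moves the peak closer to $2V_{DD}$ at the cost of area, which is the tradeoff behind choosing a minimum capacitor size relative to the keeper bulk capacitance. The simulated waveform will differ from this estimate because of the effects the model ignores.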
The reasons for this body connection can be seen from the waveform: the voltage at the output node is (almost) always higher than or equal to $V_{DD}$, making the output node (almost) always the source of the transistor during the steady state. With the body connected to the source, the subthreshold current is reduced, resulting in a slower discharge of the capacitor. In addition, connecting the body to $V_{DD}$ would make the bulk-source junction forward biased when the voltage of the output node exceeds $V_{DD}$. In this case, the capacitor would discharge to the supply through the bulk of the transistor, which is not desired.

Finally, note that in a regular diode-connected p-channel transistor, a threshold voltage drop should exist across the diode. In this scheme, however, in the transient state when the drain is the terminal connected to the output node, the voltage drop is on the order of several tens of millivolts. There are two reasons for this. First, the bulk-source junction becomes forward biased in the transient state, injecting additional charging current from the bulk of the transistor into the output node. Second, the forward-biased bulk-source junction reduces the threshold voltage, thereby increasing the source-drain current that charges the output node to $V_{DD}$. The maximum discharge of this node in the steady state for the technologies used in this paper (0.18 μm and 70 nm [8]) is around several tens of millivolts.

As evident from Fig. 4, the body bias generator produces a voltage close to $2V_{DD}$ ($2 \times 1.8$ V for the 0.18-μm technology), which remains almost constant for a fixed duration after the beginning of the evaluation phase. This time is taken as the worst-case evaluation delay of the standard domino gate. A higher body bias reduces the noise immunity, whereas a lower one increases it, and, hence, a tradeoff between delay and noise immunity should be made. The reduction of the noise immunity is due to the fact that, in this case, the threshold voltage of the keeper will be larger and, hence, the dynamic node will float for a longer time.

IV. BODY BIAS GENERATOR FOR INCREASING AND DECREASING THE KEEPER THRESHOLD VOLTAGE

In this section, first the technique proposed in [1] is discussed, and then the body bias generator circuit of this paper is presented.

A. DBBG With Three Supply Voltages

A dynamic body biasing technique which can generate voltages both higher and lower than $V_{DD}$ is proposed in [1], [5]. The circuit, shown in Fig. 5, makes use of three voltage supplies.
The circuit operation is similar to that of the circuit shown in Fig. 2. The domino logic circuit is supplied by the nominal supply voltage, while the control circuitry connects the keeper body to a higher supply voltage at the beginning of the evaluation phase, for speed enhancement, and then to a lower supply voltage during the rest of the evaluation phase, for noise immunity. This requires three supply voltages and has the problems associated with multiple supply circuits.

B. C³BBG

In this paper, a C³BBG, depicted in Fig. 6, is proposed. The operation of the circuit, which is based on the circuit given in [10], is explained as follows. During the transient period, when one clock is low (high), the corresponding transistor is on (off), charging its capacitor toward $V_{DD}$, and as the complementary clock becomes low (high), the other transistor becomes on (off), charging the other capacitor toward $V_{DD}$. In the steady state, both capacitors are charged to $V_{DD}$, and when the clock becomes high, the body bias node floats to a voltage higher than $V_{DD}$. Note that for this circuit, the time that both transistors are simultaneously on is proportional to the overlap between the two clock signals. With this structure, the body bias toggles between a level below $V_{DD}$ and a level above $V_{DD}$.


Fig. 7. Body bias voltage when the operating frequency is scaled down to 122 kHz.

Fig. 5. Dynamic body bias generator for generating voltages higher and lower than $V_{DD}$ [1].

Fig. 8. Use of transistor sizing for the fine tuning of the output voltage of C³BBG.

Fig. 6. C³BBG [10].

Here, it should be mentioned that for normal operating frequencies, the voltage of the floating capacitor is not very sensitive to the leakage currents (the reverse-biased bulk pn-junction current and the subthreshold currents of the transistors) during the evaluation phase. To assess the voltage reduction, which is largest at low frequencies, we simulated the circuit at a very low frequency of 122 kHz (1/4096th of the maximum frequency) and show the waveform in Fig. 7. The voltage level of the body drops by around 1 mV from 1.6 V (0.062%) during the evaluation phase, and hence the loss may be ignored.

Another concern regarding this cross-coupled circuit could be the effect of transistor mismatch on the body voltage level generated by the circuit. Note that, for this application, each of the transistors works as a switch, and most of the time, depending on the clock level, one of them is on and the other is off. The mismatch between these transistors mainly changes the transistor on-resistance, which does not affect the body bias voltage in the steady state. Therefore, the mismatch effect is not a prime concern for the proper operation of the circuit.

Fig. 9. Use of delaying the input clocks for the coarse tuning of the output voltage of C³BBG.

To lower the voltage below $V_{DD}$ for forward body biasing, the charging current and the charging time of the capacitor may be reduced through two methods. In one method, the gate voltage of the charging transistor is decreased by using a smaller capacitor on its gate node and increasing the channel length of the transistor that charges this capacitor. This has two effects. First, when the clock is low, the capacitor is charged to a lower voltage due to the increased resistance of the longer transistor, and, second, when the clock makes a transition from low to high, the voltage of the gate node drops more due to charge sharing. Note that here the smaller capacitor is more comparable to the junction capacitance on this node, so the charge sharing effect is more significant. The smaller voltage of this node reduces the driving capability of the charging transistor in the next phase, which charges the other capacitor to a voltage lower than $V_{DD}$, producing a smaller body bias.
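The low-frequency leakage check quoted earlier in this section (a drop of about 1 mV from 1.6 V at 122 kHz) can be reproduced with simple arithmetic. The sketch below only restates those figures; the 50% duty cycle and the 10-fF holding capacitance used to back out an average leakage current are illustrative assumptions, and the helper names are hypothetical.

```python
def relative_droop_percent(delta_v, v_level):
    """Voltage droop as a percentage of the held level."""
    return 100.0 * delta_v / v_level

def implied_leakage_current(c_hold, delta_v, hold_time):
    """Average current (A) that would discharge c_hold by delta_v in hold_time (I = C*dV/dt)."""
    return c_hold * delta_v / hold_time

if __name__ == "__main__":
    f_low = 122e3            # low test frequency quoted in the text (Hz)
    f_max = 4096 * f_low     # the text states f_low is 1/4096 of the maximum frequency
    droop = relative_droop_percent(1e-3, 1.6)

    hold_time = 0.5 / f_low  # assumption: evaluation phase is half the clock period
    c_hold = 10e-15          # assumption: 10 fF floating capacitance
    i_leak = implied_leakage_current(c_hold, 1e-3, hold_time)

    print(f"maximum frequency ~ {f_max / 1e6:.1f} MHz")
    print(f"relative droop    ~ {droop:.4f} %")
    print(f"implied leakage   ~ {i_leak * 1e12:.2f} pA (under the stated assumptions)")
```

Because the hold time shrinks in proportion to the clock period, the same average leakage would produce a proportionally smaller droop at the normal operating frequency, which is why the loss is treated as negligible.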


Fig. 10. 4-bit multiple-output carry generator of a carry look-ahead adder implemented with the capacitive variable threshold voltage keeper [7].

The other method reduces the charging time of the capacitor by delaying one of the input clocks, which shortens the overlap time between the two clocks so that the charging time is less than half of the clock period. Using these methods, a forward body bias of 0.5 V is easily attainable in a 0.18-μm technology without affecting the performance of the body bias generator circuit. The first method is used for fine tuning, while the second method is utilized for coarse tuning. These two effects are shown in Figs. 8 and 9, respectively. Finally, note that in Fig. 8, the channel length of M1 was varied from 0.18 to 0.9 μm while the capacitor was changed from 3 to 10 fF. This also shows that process variation does not considerably affect the accuracy of the technique.

V. AREA AND POWER COMPARISON

For the clock delay circuit required for the CBBG and C³BBG, we make use of the clock delay circuit already available in the clock-delayed domino logic. For the capacitors in the CBBG/C³BBG, we utilize pMOS transistors whose source, drain, and body terminals are connected together as one terminal, with the gate as the other terminal. The size of these transistors (capacitors) depends on the size of the keeper transistor used in the circuit. These features lead to simpler body bias generator circuits for the CBBG and C³BBG, which in turn give rise to a smaller area and power compared to those of the DBBG [1]. In addition, the use of the DBBG requires two or three power supplies, which imposes some area overhead for routing the supply interconnections. In the DBBG circuits of Figs. 2 and 5, the supply-switching transistors of the DBBG are shared by the whole chip, while

each domino gate requires its own noninverting delay gates. Note that any noninverting delay gate requires at least two inverters (four transistors). In contrast, the CBBG and C³BBG require smaller numbers of minimum feature size transistors per domino gate compared to their DBBG counterparts, as the example in Section VI shows. The capacitors are fully charged in the steady state, eliminating any dc path between $V_{DD}$ and ground. In addition, most of the charge is transferred back and forth between the capacitor and the keeper body, and any charge leakage is compensated by the diode, minimizing the loading effect on the clock delay circuit. Based on this discussion, the CBBG and C³BBG are expected to have lower power consumption compared to their DBBG counterparts. The corresponding noise margins should be identical.

VI. RESULTS AND DISCUSSION

To evaluate the performance of the proposed technique, two 4-bit multiple-output domino carry generators (CGs) [9] were implemented using the CBBG and C³BBG techniques. These circuits are shown in Figs. 10 and 11, respectively. For comparison, the SD and DBBG counterparts were implemented based on the sizing presented in [1] for all transistors, in the 0.18-μm standard CMOS technology with a clock frequency of 500 MHz. The worst-case evaluation delay of the CG circuit occurs while discharging the fourth dynamic node (Cn4) through the critical path, with P1-P4 and the input carry high and G1-G4 low. Similar to [1], the keeper size is specified through a parameter called KPR, which is defined as the ratio of the keeper size to


Fig. 11. 4-bit multiple-output carry generator of a carry look-ahead adder implemented with the cross-coupled capacitive variable threshold voltage keeper [9].

the equivalent size of the critical path. The numbers of minimum size transistors used for the CBBG and C³BBG were 4 and 8 per gate, respectively. In the case of the DBBG, the corresponding number of minimum size transistors used for the noninverting delay gates was 16 per gate. The widths of the transistors used in the CG circuit were, for all circuits, again selected based on the same guideline as specified in [1], where the widths are scaled based on the minimum width of the technology. In the case of the 0.18-μm standard CMOS technology, five of the transistors had a width of 1.1 μm, while four others had widths of 0.55, 0.37, 0.28, and 0.22 μm, respectively. Also, one transistor had a width of 5 μm, while for the pull-up transistors 1-4 a width of 0.22 μm was used. The transistor widths in the 70-nm CMOS technology were selected similarly.

The delays of the three circuits as a function of KPR are shown in Fig. 12(a). The delays of CBBG and DBBG are significantly lower than that of SD. The delay reduction of the variable threshold keepers is the same for both body bias generator circuits. Fig. 12(b) compares the power consumption of the three circuits. As evident from the figure, the power consumption is the highest in the case of DBBG and the lowest in the case of CBBG, because the CBBG does not need the noninverting delay stage required by the DBBG circuit. The power consumption difference becomes more significant for larger keeper sizes. The power delay product (PDP) for each circuit is shown

TABLE I. POWER, DELAY, AND PDP WITH KPR = 2.8 FOR A 0.18-μm STANDARD CMOS TECHNOLOGY AT 500-MHz INPUT CLOCK

in Fig. 12(c), which demonstrates the lowest PDP for CBBG for all keeper sizes. Table I contains the results for a KPR of 2.8, revealing 28% and 18% reductions in the power compared to DBBG and SD, respectively. The delays for DBBG and CBBG are the same, both 59% lower than that of SD. The simulation results for the power, delay, and power delay product of the CBBG, DBBG, and SD circuits in a 70-nm standard CMOS technology with a clock frequency of 1 GHz, for a KPR equal to 2.8 (as an example), are given in Table II. Again, a sizing scheme similar to that of [1] was used in the design. The results again reveal lower delay, power, and power delay product for CBBG in this technology. The layout of the CG circuit implemented with the CBBG in the 0.18-μm standard CMOS technology is shown in Fig. 13. The post- and prelayout simulation results for the circuit are given in Table III. As the table shows, compared to the prelayout simulations the delay decreases while the power consumption increases, with the overall


Fig. 13. Layout of CBBG.

TABLE III. PRE- AND POST-LAYOUT SIMULATION RESULTS FOR POWER, DELAY, AND PDP OF CBBG CIRCUIT WITH KPR = 1.4 FOR A 0.18-μm STANDARD CMOS TECHNOLOGY AT 500-MHz INPUT CLOCK

Fig. 12. CBBG circuit. (a) Evaluation delay versus KPR. (b) Power dissipation versus KPR. (c) Power delay product versus KPR.

TABLE II. POWER, DELAY, AND PDP WITH KPR = 2.8 FOR A 70-nm STANDARD CMOS TECHNOLOGY AT 1-GHz INPUT CLOCK

effect being a reduction of the power delay product. We attribute this to the parasitic capacitance, which reduces the clock feedthrough (see Fig. 14).

To assess the temperature dependence of the proposed scheme, the body bias waveforms for C³BBG at two extreme temperatures are plotted in Fig. 15. It shows that the maximum voltage decreases (increases) at the low (high) temperature extreme, leading to a delay increase (decrease). As discussed previously, as the overlap between the two clock signals increases, the simultaneous on time of the two transistors increases.

Fig. 14. Effect of parasitic capacitance on the reduction of the clock feedthrough.

This duration is also enlarged as the threshold voltages of the transistors decrease. Note that at lower temperatures threshold voltage reductions occur, leading to a higher forward bias voltage


Fig. 15. Keeper body bias at two different temperatures.

Fig. 16. Delay of the CG circuit as a function of temperature for KPR = 2.

Fig. 17. Power of the CG circuit as a function of temperature for KPR = 2.

Fig. 18. PDP of the CG circuit as a function of temperature for KPR = 2.

and hence flattening the decrease of the delay for C³BBG. The delay of the two circuits for a KPR of 2 as a function of temperature is shown in Fig. 16, which reveals significantly less temperature dependence for C³BBG over a wide temperature range. For both circuits, the body reverse bias voltage is equal to 3.2 V in the 0.18-μm technology. As is evident from the figure, there is a noticeable increase in the delay of the DBBG circuit at 80 °C. For higher temperatures, our simulations show that the DBBG circuit does not work properly, whereas the C³BBG still functions correctly.

The power consumption of the two CG circuits based on C³BBG and DBBG as a function of temperature for a KPR of 2 is illustrated in Fig. 17. The results show 32% less power consumption for the C³BBG circuit, which is attributed to the fact that this circuit does not need the noninverting delay stage required by the DBBG circuit. Excluding the delay stages, the power of the two methods is comparable. Finally, the power delay product (PDP) as a function of temperature for the two circuits is shown in Fig. 18, which reveals PDP reductions from 20% to 36% in this temperature range. Also, note that the temperature dependence of the PDP is not significant in the case of C³BBG. A summary of the results is given in Table IV at the two temperature extremes of 10 °C and 80 °C. Table V compares the power, delay, and power delay product of the CG circuit designed using DBBG and C³BBG in 70-nm standard CMOS

TABLE IV. POWER, DELAY, AND PDP WITH KPR = 2 FOR A 0.18-μm STANDARD CMOS TECHNOLOGY AT 500-MHz INPUT CLOCK (SIMULATION TEMPERATURES OF 10 °C AND 80 °C)

technology. Again, the power delay product for C³BBG is smaller than those of DBBG and SD. The layout of the CG implemented with the C³BBG is depicted in Fig. 19, and the post- and prelayout simulation results are compared in Table VI. Again, the reduction in the delay is attributed to the clock feedthrough reduction owing to the parasitic capacitance.

The figure of merit for noise immunity is usually considered to be the noise margin. For dynamic logic, however, the classical definition fails, and there is no consensus on the definition of the dynamic noise margin. We use the definition which is illustrated in Fig. 20. After at least one precharge-evaluate cycle of the domino circuit, a ramp voltage of a given width is applied to


TABLE V. POWER, DELAY, AND PDP WITH KPR = 2.8 FOR A 70-nm STANDARD CMOS TECHNOLOGY AT 1-GHz INPUT CLOCK

Fig. 20. Dynamic noise margin calculation method.
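The dynamic noise margin method of Fig. 20 relies on a bisection search over the width of the input ramp until the dynamic node reaches a target voltage. The following is a minimal sketch of such a search, under stated assumptions: in the actual flow each evaluation would be a circuit simulation, whereas here `toy_node_voltage` is a purely hypothetical, monotonically decreasing stand-in, and the target voltage, time constant, and bracket are illustrative values rather than figures from this paper.

```python
import math

def find_ramp_width(node_voltage, target, lo, hi, tol=1e-15, max_iter=200):
    """Bisection search for the ramp width at which node_voltage(width) equals target.

    node_voltage must decrease monotonically with width on [lo, hi], and the
    target must be bracketed: node_voltage(lo) >= target >= node_voltage(hi).
    """
    if node_voltage(lo) < target or node_voltage(hi) > target:
        raise ValueError("target voltage is not bracketed by [lo, hi]")
    for _ in range(max_iter):
        if hi - lo <= tol:
            break
        mid = 0.5 * (lo + hi)
        if node_voltage(mid) > target:
            lo = mid  # node has not dropped enough: a wider ramp is needed
        else:
            hi = mid  # node dropped to or below the target: try a narrower ramp
    return 0.5 * (lo + hi)

def toy_node_voltage(width, vdd=1.8, tau=100e-12):
    """Hypothetical stand-in for a simulation: exponential discharge of the dynamic node."""
    return vdd * math.exp(-width / tau)

if __name__ == "__main__":
    width = find_ramp_width(toy_node_voltage, target=0.9, lo=0.0, hi=1e-9)
    print(f"ramp width for a 0.9-V target: {width * 1e12:.2f} ps")
```

Bisection is a natural fit when each evaluation is expensive and the response is monotonic, since the bracket halves on every iteration regardless of the shape of the curve.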

Fig. 19. Layout of C³BBG with KPR = 1.

TABLE VI. PRE- AND POST-LAYOUT SIMULATION RESULTS FOR POWER, DELAY, AND PDP OF C³BBG CIRCUIT WITH KPR = 1 FOR A 0.18-μm STANDARD CMOS TECHNOLOGY AT 500-MHz INPUT CLOCK (SIMULATION TEMPERATURES OF 10 °C AND 80 °C)

Fig. 21. Noise margin versus the keeper size.

one of the inputs, which causes the pull-down network to discharge the dynamic node down to a target voltage, while the other inputs are properly biased high. Ideally, the ramp width should equal the duration for which the keeper is weak (see Fig. 4). This width, however, may not be enough to create the required drop in the dynamic node. To determine the width, we use the bisection algorithm iteratively, changing the width of the ramp voltage until the dynamic node voltage reaches the target level. This method yields a certain ramp width for the circuits with only reverse body biasing, while for the circuits with both reverse and forward body biasing, the ramp width turns out

to be larger. The need for a larger ramp width in the latter case is expected, because with forward biasing the keeper is stronger, requiring more noise energy to decrease the dynamic node voltage to the target level. The dynamic noise margins of CBBG, DBBG, and SD as a function of KPR are shown in Fig. 21(a). The keeper size, with the minimum of 0.8 and the maximum of 2 used in the simulations, may be determined by the designer based on the required dynamic noise margin. As observed, the dynamic noise margins of CBBG and DBBG are smaller than the dynamic noise margin of SD. The noise margins of C³BBG and DBBG as a function of KPR, obtained at 25 °C and 75 °C, are depicted in


Fig. 21(b), which reveals a higher noise margin at low temperatures compared to that of the DBBG circuit. At higher temperatures, however, the forward bias is lower in our scheme compared to that of DBBG, leading to a smaller noise margin (see Fig. 15).

VII. CONCLUSION

In this paper, a new clock delayed domino logic with a variable strength keeper was proposed. The variable strength of the keeper was achieved by applying two different body biases to the keeper using the CBBG and C³BBG circuits. Compared to a similar work, the body bias generator circuits presented in this paper are simpler and do not require a double or triple power supply, while consuming less area and power. To show the efficiency of the proposed technique, the methods were applied to a carry generator circuit, where the results for standard CMOS technologies of 0.18 μm and 70 nm were compared to those of the previous work. The proposed technique led to lower power and power delay products without much effect on the noise margin and the delay. The effect of temperature on the noise margins of the proposed techniques was assessed by performing simulations at low and high temperatures, showing less temperature dependence when compared to the previous work. As a final note, the results of this paper show that by combining the variable threshold voltage keeper technique and the CBBG, domino logic is taken to a new level of high-speed and low-power operation without degrading the noise immunity. Thus, domino logic may still be employed in sub-100-nm technologies, where noise is becoming an increasingly limiting issue.

REFERENCES
[1] V. Kursun and E. G. Friedman, "Domino logic with variable threshold voltage keeper," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 11, no. 6, pp. 1080-1093, Dec. 2003.
[2] J. Silberman et al., "A 1.0 GHz single-issue 64b PowerPC integer processor," in Proc. IEEE Int. Solid-State Circuits Conf., 1998, pp. 230-231.
[3] A. Alvandpour, R. K. Krishnamurthy, K. Soumyanath, and S. Y. Borkar, "A sub-130-nm conditional keeper technique," in Proc. IEEE Int. Conf. Electron. Circuit Syst., 1999, pp. 209-212.
[4] M. W. Allam, M. H. Anis, and M. I. Elmasry, "High-speed dynamic logic style for scaled-down CMOS and MTCMOS technologies," in Proc. IEEE Int. Symp. Low-Power Electron. Design, 2000, pp. 155-160.
[5] V. Kursun and E. G. Friedman, "Forward body biased keeper for enhanced noise immunity in domino logic circuits," in Proc. IEEE Int. Symp. Circuits Syst. (ISCAS'04), 2004, pp. 917-920.
[6] T. Kuroda, T. Fujita, S. Mita, T. Mori, K. Matsuo, M. Kakumu, and T. Sakurai, "Substrate noise influence on circuit performance in variable threshold-voltage scheme," in Proc. Int. Symp. Low Power Electron. Design (ISLPED), 1996, pp. 309-312.
[7] G. Yee and C. Sechen, "Clock delayed domino for adder and combinational logic design," in Proc. IEEE Int. Conf. Comput. Design (ICCD'96), 1996, pp. 332-337.
[8] Berkeley Predictive Technology Model (PTM) [Online]. Available: http://www.eas.asu.edu/~ptm/
[9] I. S. Hwang and A. L. Fisher, "Ultrafast compact 32-bit CMOS adders in multiple-output domino logic," IEEE J. Solid-State Circuits, vol. 24, no. 2, pp. 358-369, Apr. 1989.
[10] T. B. Cho and P. R. Gray, "A 10 b, 20 Msample/s, 35 mW pipeline A/D converter," IEEE J. Solid-State Circuits, vol. 30, no. 3, pp. 166-172, Mar. 1995.

Amir Amirabadi (S'06) received the B.Sc. degree in electrical engineering from Shahid Beheshti University, Tehran, Iran, in 2002, and the M.Sc. degree in electrical engineering from the University of Tehran, Tehran, Iran, in 2005, where he is currently pursuing the Ph.D. degree in electrical and computer engineering. His current research interests include low-power circuit design, mixed-signal circuit design, and analog signal processing circuit design and test.

Ali Afzali-Kusha (M'04–SM'06) received the B.Sc. degree from Sharif University of Technology, Tehran, Iran, the M.Sc. degree from the University of Pittsburgh, Pittsburgh, and the Ph.D. degree from the University of Michigan, Ann Arbor, all in electrical engineering, in 1988, 1991, and 1994, respectively. From 1994 to 1995, he was a Post-Doctoral Fellow at the University of Michigan. Since 1995, he has been with the University of Tehran, Tehran, Iran, where he is currently an Associate Professor of the School of Electrical and Computer Engineering, the Director of the Low-Power High-Performance Nanosystems Laboratory, the Head of the Electronics Division, and the Head of the Nanoelectronics Center of Excellence. While on research leave from the University of Tehran, he was a Research Fellow at the University of Toronto, Toronto, ON, Canada, and the University of Waterloo, Waterloo, ON, Canada, in 1998 and 1999, respectively. His current research interests include network-on-chip, low-power high-performance design methodologies from the physical design to the system level, and nanoelectronics circuits and systems.

Yousof Mortazavi (S'00–M'05) received the B.S. degree in electrical engineering from the University of Tehran, Tehran, Iran, in 2005. He is currently pursuing the Ph.D. degree at the University of Texas at Austin. Since 2006, he has been a Research Assistant at the Embedded Signal Processing Laboratory, University of Texas at Austin. His current research interests include designing low-complexity digital communications transceivers, especially for multicarrier modulation schemes.

Mehrdad Nourani (S'91–M'94–SM'05) received the B.Sc. and M.Sc. degrees in electrical engineering from the University of Tehran, Tehran, Iran, in 1984 and 1986, respectively, and the Ph.D. degree in computer engineering from Case Western Reserve University, Cleveland, OH, in 1993. He was with the Department of Electrical and Computer Engineering, the University of Tehran, from 1995 to 1998 and the Department of Electrical Engineering and Computer Science, Case Western Reserve University, from 1998 to 1999. Since August 1999, he has been on the faculty of the University of Texas at Dallas, where he is currently an Associate Professor of Electrical Engineering and a Member of the Center for Integrated Circuits and Systems (CICS). His current research interests include design for testability, system-on-chip testing, signal integrity modeling and test, application-specific processor architectures, packet processing devices, high-level synthesis, and low-power design methodologies. He has published over 120 papers in journals and refereed conference proceedings. Dr. Nourani was a recipient of the Texas Telecommunications Consortium Award in 1999, the Clark Foundation Research Initiation Grant in 2001, the National Science Foundation Career Award in 2002, the Cisco Systems Inc. URP Award in 2004, and a Best Paper Award at the 2004 International Conference on Computer Design (ICCD). He is a Member of the IEEE Computer Society and the ACM SIGDA.