Christophe Tretz and Charles Zukowski Department of Electrical Engineering Columbia University in the City of New York, New York NY 10027-6699 (212) 854-8478, tiger@vlsi.columbia.edu
Abstract
In this paper, we revisit three of the well known optimization results in CMOS transistor sizing with the energy-delay product as a new metric. We study the absolute sizes of and the ratio between n-channel and p-channel transistor widths in uniform logic, the optimal distance between repeaters in an RC line, and the optimum number of inverter stages, along with their sizes, needed to drive a large load capacitance. Results, both theoretical and numerical, show that in general the optimum solutions for energy-delay lead to smaller designs than the ones obtained for minimum delay.
energy/operation, or power normalized to unit operating frequency. To evaluate a logic gate or logic chain design for power, we thus consider the energy required to propagate a single transition. Energy cannot generally be considered in isolation, however. As proposed in [1], we use the energy-delay product as our ultimate cost function for two reasons. First, it provides a simple way to consider delay along with energy, since low power is rarely the only goal in logic circuit design. Second, since adjusting the power supply voltage (including its form in adiabatic circuits) can trade speed directly for lower energy, it provides a good metric for energy efficiency in an optimization problem where the power supply is assumed fixed. A circuit with an improved energy-delay product can have its supply voltage adjusted to achieve lower energy for the same delay, or a lower delay for the same energy.
In [2], numerical algorithms that solve driver and wire sizing simultaneously for speed and power optimization are presented. This method, however, uses a weighting relation to optimize energy and delay separately. As a result, the optimum it finds depends on the choice of the weights and differs from the energy-delay product optimum we are presenting in this paper.
With the energy-delay product as a new metric, we revisit three of the well known optimization results in CMOS transistor sizing. First, we consider the sizing of n-channel and p-channel transistor widths, including their ratio. To match rising and falling delays, one generally uses the mobility ratio for a width ratio. To minimize simple pair delay, however, the optimum ratio is the square root of the mobility ratio; the loading effect of the p-channel transistor in every stage is thus balanced against its weaker drive at a logic stage with a rising output. Considering energy as well, there can be an even stronger push towards minimum-size transistors. Second, we consider the optimal distance between repeaters in an RC line, a balance between the delay arising from extra stages and the advantage of cutting a line with a delay that increases quadratically with distance. Energy considerations favor a reduction in the number of repeaters, each of which increases the capacitance that must be charged and discharged.
Third, we consider the optimum number of inverter stages, along with their sizes, needed to drive a large capacitive load. Larger stage ratios must be used near the load when energy effects are considered, because in the conventional minimum-delay solution, most of the energy is consumed by the last stage.
0-8186-7502-0/96 $5.00 © 1996 IEEE
In the second section, we briefly review the first-order CMOS energy and delay models, and cover our basic assumptions. In the next three sections, we explore the effects of changing the cost function from delay to energy-delay product in each of the three situations mentioned above. Finally, we discuss some of the implications in the conclusion.
and with MOSIS 2 micron process parameters [6], we obtain a ratio of 1.6. Considering now the optimum for the energy-delay product, we can use the same method as the one used to find the minimum delay. The expression of this product is given by:
$$E \cdot t \sim (W_n + W_p) \times \left( \frac{1}{\mu_n W_n} + \frac{1}{\mu_p W_p} \right) \left( W_{n,i+1} + W_{p,i+1} \right) \qquad \text{(EQ 6)}$$
Keeping $W_n$ fixed, we can find the optimum of the product by setting its derivative with respect to $W_p$ equal to zero. Doing so, we end up with a quadratic equation in $W_p$ having two real roots, only one of which is positive:
$$\frac{\partial}{\partial W_p}(E \cdot t) = 0 \;\Rightarrow\; 2\,\frac{\mu_p}{\mu_n}\, W_p^2 + W_p W_n - W_n^2 = 0 \qquad \text{(EQ 7)}$$
However, we can see in this derivation that we could have kept $W_p$ fixed instead; doing so, we end up with a completely symmetrical quadratic equation, which we solve this time for $W_n$.
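The two fixed-width optima can be checked numerically. The sketch below solves the quadratic obtained with $W_n$ fixed, $2(\mu_p/\mu_n)W_p^2 + W_p W_n - W_n^2 = 0$, and its mirror image with $W_p$ fixed, $2(\mu_n/\mu_p)W_n^2 + W_n W_p - W_p^2 = 0$. The mobility ratio is an assumption: it is taken as $1.6^2$, inferred from the minimum-delay ratio of 1.6 quoted for the MOSIS 2 micron process, and the function names are illustrative only.

```python
import math


def ratio_min_delay(mu_ratio):
    """W_p/W_n that minimizes pair delay: square root of mu_n/mu_p."""
    return math.sqrt(mu_ratio)


def ratio_ed_fixed_wn(mu_ratio):
    """Positive root r = W_p/W_n of 2*(mu_p/mu_n)*r**2 + r - 1 = 0."""
    a = 2.0 / mu_ratio
    return (-1.0 + math.sqrt(1.0 + 4.0 * a)) / (2.0 * a)


def ratio_ed_fixed_wp(mu_ratio):
    """W_p/W_n from the mirrored quadratic 2*(mu_n/mu_p)*s**2 + s - 1 = 0,
    where s = W_n/W_p is the positive root."""
    a = 2.0 * mu_ratio
    s = (-1.0 + math.sqrt(1.0 + 4.0 * a)) / (2.0 * a)
    return 1.0 / s


# Assumed mobility ratio mu_n/mu_p, inferred from the quoted ratio of 1.6.
MU_RATIO = 1.6 ** 2

r_delay = ratio_min_delay(MU_RATIO)
r_fixed_wn = ratio_ed_fixed_wn(MU_RATIO)
r_fixed_wp = ratio_ed_fixed_wp(MU_RATIO)

print(f"minimum delay:            W_p/W_n = {r_delay:.2f}")
print(f"energy-delay, W_n fixed:  W_p/W_n = {r_fixed_wn:.2f}")
print(f"energy-delay, W_p fixed:  W_p/W_n = {r_fixed_wp:.2f}")
```

With this assumed mobility ratio, the first optimum comes out close to 0.66 and the second close to 2.8; the exact second value depends on the mobility parameters used, so a small difference from the quoted figure is expected.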
where $W_{n,i}$ and $W_{n,i+1}$ represent the transistor width of the driver and load respectively. To include wire parasitics, as well as the fanout off the critical path as we do in section 3, we use $C_{wire}$ (in units of microns of equivalent transistor width) to represent the parasitic capacitance associated with interconnect wiring and fanout. In the special case of a chain of identical inverters, which could represent a path through some random logic, when $C_{wire}$ is included, in general we obtain a pair delay (one stage rising and the other falling) of:
(EQ 4)
This second result is obviously different from the first one: using MOSIS 2 micron parameters, the first expression gives an optimum ratio of 0.66, while the second gives an optimum ratio of 2.85. As a note, both approaches lead to the same expression in the minimum-delay case. We can then plot a surface proportional to the energy-delay product as a function of both widths, as well as a 2D plot showing the level lines of the function, giving us a better view of the optimum values for the minimum energy-delay product (figure 1a and 1b). We can clearly see both optimum ratios. For the sake of completeness, we also show the same plots obtained for the minimum delay (figure 2a and 2b), on which we also show the optimum ratio.
FIGURE 1.
FIGURE 2.
In this case, we can see that we have two minima for the energy-delay product, depending on whether we choose to fix the size of the n-channel or the p-channel transistor. Interestingly, the global minimum with given technological constraints is obtained for minimum size devices. If we also consider the effects of parasitic wiring capacitance, something that is becoming more important for small geometries like submicron or deep submicron processes, we must add a constant term to the expression for load capacitance. This can also model fanout within a critical path. In [1], Horowitz shows that we reach the optimum energy-delay roughly when the contribution of the gate capacitance is equal to the contribution of the wire capacitance. Here, we have a more complete result: when we include the wire capacitances in the model, we can reach a global minimum for the energy-delay product for which the width of the pullup is larger than the width of the pulldown, as we can see from the equation obtained when we add a significant constant $C_{wire}$ in the expression for the load capacitance (EQ 10).
We can see in figure 3 the global optimum for the energy-delay product when we include the extra capacitance in the model. This reflects a wire capacitance that is equivalent to the gate capacitance of a transistor that is 4 times minimum width. For this global minimum, we have a gate with a p-channel transistor wider than the n-channel transistor, but neither of the optimum widths is minimum. We have this relation because the mobility of electrons is higher than the mobility of holes. We added to this figure the curve of the global minima obtained for various values of $C_{wire}$, located between the optimum ratio for delay (1.6) and the one for energy-delay with fixed $W_p$ (2.85). These three optima (fixed $W_n$, fixed $W_p$, or including $C_{wire}$) can be used to optimize standard cell designs using scalable cells.
FIGURE 3. Optimum for the energy-delay product, including parasitics: a) mapping, b) level lines
where $R_L$ and $C_L$ are per unit length parameters of the long resistive line, $W_{min}$ is the minimum transistor width, and $R$ and $C$ are parameters of a minimum size inverter; the time constants $\tau_L$ and $\tau$ are given by $\tau_L = R_L C_L$ and $\tau = R C$.
If we are now minimizing the energy-delay product, we need the expressions of both delay and energy for this problem, which we obtain using the same approach. Using the gradient of the product with respect to both the number of repeaters and the width of these repeaters, we end up with a system of two equations in $N$ and $W_n$:
$$\frac{\partial}{\partial N}(E \cdot t) = 0, \qquad \frac{\partial}{\partial W_n}(E \cdot t) = 0$$
We can however plot the product, and find the optimum values numerically, in order to compare them to the ones obtained for the minimum delay. Figure 4 shows both the surface representing the energy-delay product and the level lines of this surface as a function of $N$ and $W_n$, and figure 5 shows the same curves for the delay alone for comparison. From these figures, we clearly see that in order to achieve the minimum energy-delay product, we need to use fewer repeaters of smaller size than what we would use to achieve the minimum delay.
FIGURE 4.
FIGURE 5.
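The comparison between the minimum-delay and minimum energy-delay repeater designs can be illustrated with a small grid search. The sketch below is only an illustration under stated assumptions: it uses a common first-order repeater model (0.7 and 0.4 coefficients for lumped and distributed RC delay), which may differ from the expressions used in this paper; the repeater size `h` is expressed as a multiple of the minimum inverter rather than as a width $W_n$; and all parameter values are hypothetical normalized numbers.

```python
def line_delay(n, h, r0, c0, rw, cw):
    """Delay of a line split into n segments, each driven by a repeater of size h.

    Assumed first-order model: r0, c0 are the resistance and input capacitance
    of a minimum inverter, rw and cw the total wire resistance and capacitance.
    """
    return (0.7 * r0 * cw / h      # repeater resistance charging the wire
            + 0.7 * n * r0 * c0    # repeater resistance charging the next gate
            + 0.4 * rw * cw / n    # distributed wire RC, quadratic per segment
            + 0.7 * rw * c0 * h)   # wire resistance charging the next gate


def line_energy(n, h, c0, cw):
    """Switched capacitance (energy in units of V^2): wire plus repeater gates."""
    return cw + n * h * c0


def grid_minimum(cost):
    """Exhaustive search over integer n and sizes h = 0.5, 0.6, ..., 30.0."""
    best = None
    for n in range(1, 201):
        for k in range(5, 301):
            h = k / 10.0
            value = cost(n, h)
            if best is None or value < best[0]:
                best = (value, n, h)
    return best


# Hypothetical normalized parameters, chosen only for illustration.
R0, C0, RW, CW = 1.0, 1.0, 10.0, 1000.0

_, n_delay, h_delay = grid_minimum(
    lambda n, h: line_delay(n, h, R0, C0, RW, CW))
_, n_ed, h_ed = grid_minimum(
    lambda n, h: line_delay(n, h, R0, C0, RW, CW) * line_energy(n, h, C0, CW))

print(f"minimum delay:        N = {n_delay}, size = {h_delay}")
print(f"minimum energy-delay: N = {n_ed}, size = {h_ed}")
```

Under this assumed model, the energy-delay optimum uses both fewer and smaller repeaters than the delay optimum, because every added repeater and every increase in size adds switched capacitance.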
problem. In order to find the minimum delay, we were using the fact that we will obtain the minimum delay if each stage has the same delay. We cannot transpose this assumption to the energy-delay problem. An immediate consequence is that we will not have a constant ratio between successive stages. We intuitively already know this result, as we know that the last stage will dominate the energy consumption of the delay-minimized inverter chain, and thus we do not want to have constant ratios between stages, in order to reduce the size of the last stages as much as possible. Qualitatively, we want to have larger ratios at the end of the chain to reduce the energy consumption. This means that from a problem with two unknown parameters ($M$ and $g$), we end up with a problem with ($M+1$) unknown parameters. The expressions for delay, energy and energy-delay product (after simplification) are:
$$E \cdot t = \tau\, C_L V^2 \sum_{i=0}^{M-1} \cdots$$
Knowing that the product of all stage ratios should be equal to the ratio between the driven load $C_L$ and the starting load $C_0$, we can solve this problem:
$$\prod_{i=0}^{M-1} g_i = \frac{C_L}{C_0} \qquad \text{(EQ 15)}$$
We used the Lagrange multipliers method [7] to find the optimum values of the various ratios $g_i$ and of the number of stages. The set of equations used for the optimization process consists of the derivatives with respect to each ratio, together with the product of the ratios used as the constraint imposed on the function to optimize. We thus have the following system to solve:
$$\frac{\partial}{\partial g_k}\left[ E \cdot t + \lambda \left( \prod_{i=0}^{M-1} g_i - \frac{C_L}{C_0} \right) \right] = 0, \quad k = 0, \ldots, M-1, \qquad \prod_{i=0}^{M-1} g_i = \frac{C_L}{C_0} \qquad \text{(EQ 16)}$$
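The effect of the constraint on the stage ratios can be illustrated for a short chain. The sketch below is an illustration under stated assumptions, not the optimization procedure of this paper: it assumes a first-order model in which the delay is proportional to the sum of the stage ratios and the energy is proportional to the total capacitance switched at the stage outputs, it fixes $M = 3$, and it replaces the Lagrange multiplier solution with a plain grid search over $g_1$ and $g_2$, with $g_0$ set by the product constraint. The load ratio of 1000 and all function names are hypothetical.

```python
def chain_delay(ratios):
    """Delay in units of tau: each stage contributes its ratio g_i."""
    return sum(ratios)


def chain_energy(ratios):
    """Switched capacitance at the stage outputs, in units of C_L.

    Walking back from the load, each stage output is smaller than the next
    one by that next stage's ratio.
    """
    total = 0.0
    cap = 1.0  # the load C_L itself
    for g in reversed(ratios):
        total += cap
        cap /= g
    return total


def energy_delay(ratios):
    return chain_energy(ratios) * chain_delay(ratios)


def best_three_stage(load_ratio):
    """Grid search over g1, g2 in steps of 0.1; g0 follows from the constraint."""
    best = None
    for k1 in range(20, 301):
        for k2 in range(20, 301):
            g1, g2 = k1 / 10.0, k2 / 10.0
            g0 = load_ratio / (g1 * g2)
            if g0 < 1.0:
                continue  # skip chains whose first stage would shrink
            ratios = (g0, g1, g2)
            cost = energy_delay(ratios)
            if best is None or cost < best[0]:
                best = (cost, ratios)
    return best


LOAD_RATIO = 1000.0  # hypothetical C_L / C_0

equal_ratios = (10.0, 10.0, 10.0)  # minimum-delay solution for M = 3
equal_cost = energy_delay(equal_ratios)
best_cost, best_ratios = best_three_stage(LOAD_RATIO)

print("minimum delay ratios:       ", equal_ratios, "E*t =", round(equal_cost, 2))
print("minimum energy-delay ratios:", tuple(round(g, 2) for g in best_ratios),
      "E*t =", round(best_cost, 2))
```

Under this assumed model, the search returns a last ratio that is larger than the earlier ones, with a slightly lower energy-delay product than the equal-ratio chain, which matches the qualitative argument that the last stage dominates the energy.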
It is then just a matter of iteration to also find the globally optimum number of stages, by applying the same method to several values of $M$ and comparing the results. As the procedure becomes rather complex and slow if the number of stages becomes significant, we can create a set of curves that allows a designer to easily find a good first set of ratios for a given load capacitance. In figure 6 we show one such set of curves as an example. In this case, we used values of $M$ that match the minimum delay solution, for easy comparison between the two situations. As we can see, while the first ratios are smaller for the energy-delay case, the last ones are a lot larger; as a result, we have smaller inverters in the chain, thus also using less area. The table in figure 6 gives the delay and energy-delay values for various sizes; the first row of each box gives the values obtained for the minimum delay approach, the second row those for the sizing obtained for minimum energy-delay. All values are given in base units (i.e., unit capacitance for the sizes, unit time for the delay, and unit energy-time for the energy-delay product).
FIGURE 6. Inverter sizes, optimum energy-delay
6. Conclusions
In the previous sections, we revisited three well known results in transistor sizing with the energy-delay product as a new metric to minimize instead of the standard delay metric. Contrary to the minimum delay case, the results are not as simple, and numerical solutions are necessary in some cases. However, the obtained results are interesting, as significant energy-delay improvements can be achieved using them. A common result is that in general, smaller is better to improve the energy-delay product, and from the various results, we can see that we do not need to sacrifice much speed to improve the energy-delay product. Other results to minimize delay for particular situations have been derived, and it would be interesting to also find how they change when the energy-delay product is minimized instead. An interesting aspect of these results comes with the optimum ratio to achieve minimum pair energy-delay product presented in section 3. The optimum curves derived can be used to design optimum standard cells in scalable CMOS using the various design constraints imposed by the system. Another interesting result is that most chains of gates will have fewer and smaller stages to achieve optimum energy-delay product when we use the general optimization algorithm of equation 16, meaning that we reduce the overall area, thus increasing the wirability or the available area for additional overhead circuitry used to implement more complex control schemes to further reduce power dissipation in the circuit.
References
[1] Mark Horowitz, Thomas Indermaur & Ricardo Gonzales, Low-Power Digital Design, IEEE Symposium on Low Power Electronics, 1994, pp. 8-11.
[2] J. Cong & C.-K. Koh, Simultaneous Driver and Wire Sizing for Performance and Power Optimization, IEEE Trans. on VLSI Systems, Vol. 2, No. 4, December 1994, pp. 408-425.
[3] A. Chandrakasan & R. Brodersen, Minimizing Power Consumption in Digital CMOS Circuits, Proceedings of the IEEE, Vol. 83, No. 4, April 1995, pp. 498-523.
[4] N. H. E. Weste & K. Eshraghian, Principles of CMOS VLSI Design: A Systems Perspective, Second Edition, Addison-Wesley Publishing Company, 1993.
[5] L. A. Glasser & D. W. Dobberpuhl, The Design and Analysis of VLSI Circuits, Addison-Wesley Publishing Company, 1985.
[6] MOSIS User Manual, Version 4.0, 1995.
[7] George B. Thomas, Calculus and Analytical Geometry, Addison-Wesley Publishing Company, 1972.