
IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 12, NO. 11, NOVEMBER 2004, p. 1221

Characterization and Modeling of Run-Time Techniques for Leakage Power Reduction


Yuh-Fang Tsai, Student Member, IEEE, David E. Duarte, Member, IEEE, N. Vijaykrishnan, Member, IEEE, and Mary Jane Irwin, Fellow, IEEE

Abstract: While some leakage power reduction techniques require modification of the process technology, others are based on circuit-level optimizations and are applied at run-time. We focus our study on the latter and compare three techniques: input vector control, body bias control, and power supply gating. We determine their limits and benefits in terms of potential leakage reduction, performance penalty, and area and power overhead. We also present leakage power savings trends under technology scaling. Because datapath logic and memory structures have different properties, we recommend different implementations for each. Finally, we show how the minimum idle time parameter can serve as a metric for evaluating different leakage control mechanisms.

Index Terms: Data preserving, leakage power, low power, power estimation, run-time leakage reduction, technology scaling, very large scale integration (VLSI) circuits.

Manuscript received November 3, 2003; revised May 26, 2004. This work was supported in part by a MARCO/Defense Advanced Research Projects Agency (DARPA) GSRC Grant, and by the National Science Foundation (NSF) under NSF 0093085, NSF 0130143, and NSF 0202007. Y.-F. Tsai, N. Vijaykrishnan, and M. J. Irwin are with the Department of Computer Science and Engineering, Pennsylvania State University, University Park, PA 16802 USA (e-mail: ytsai@cse.psu.edu; vijay@cse.psu.edu; mji@cse.psu.edu). D. Duarte is with Portland Technology Development, Intel Corporation, Hillsboro, OR 97124 USA (e-mail: david.e.duarte@intel.com). Digital Object Identifier 10.1109/TVLSI.2004.836315

I. INTRODUCTION

As technology scales down, the supply voltage must be reduced so that dynamic power stays at reasonable levels and power delivery still meets the functional requirements. To avoid the resulting performance loss, the threshold voltage ($V_{th}$) must be reduced at the same rate so that a sufficient gate overdrive is maintained [1]. This reduction in threshold voltage causes up to a 5× increase in leakage current per generation [2], which in turn can raise the static power of the device to unacceptable levels. In addition, a large leakage current degrades the stability of 6T SRAM cells and the noise immunity of dynamic circuits. Many sources contribute to the total transistor leakage current. Most of the interest has focused on weak-inversion leakage (the subthreshold current), even though other components, such as gate-oxide tunneling, may become significant in the future. A high die temperature worsens the impact of leakage power, so better cooling techniques are critical for controlling both active and leakage power.

Many techniques have been proposed to reduce leakage power. Some require modification of the process technology and achieve leakage reduction during the fabrication/design stage. Others are based on circuit-level optimization schemes that require architectural support (and, in some cases, technology support as well) but are applied at run-time. Popular technology/fabrication techniques include the following: multiple-threshold CMOS (MTCMOS) [3]–[5]; silicon on insulator (SOI) [6], [22]; strained silicon [7]; FinFET [8], [9]; multigate structures [10]; and high-κ dielectric materials [11].

The MTCMOS technique assigns low-threshold-voltage transistors to the devices in critical paths. It needs extra process steps and masks to generate transistors with multiple threshold voltages. SOI, strained silicon, multigate structures, and FinFETs are intended to improve performance while using a thicker gate oxide, so that gate leakage can be controlled. In addition, the higher performance allows a lower $V_{DD}$ level, which yields less DIBL and, therefore, lower subthreshold leakage. SOI has a significant impact on power because it virtually eliminates the diffusion capacitance and allows steeper subthreshold slopes. In strained silicon, germanium atoms are inserted into the silicon lattice, which stretches it so that the silicon atoms sit slightly farther apart. This reduces the atomic forces that interfere with the movement of electrons through the transistors, and performance improves. Multigate and FinFET structures increase control over the channel, so a thicker gate oxide can be used. The other process solution for leakage reduction is a high-κ dielectric gate oxide. Because of its higher oxide energy barrier, it mitigates gate leakage for the same physical oxide thickness. We focus our study on the techniques that are applied at run-time, so that the potential benefits and limits of each technique can be identified.
Datapath logic and memory structures differ in their properties and design concerns, so we recommend and characterize different implementations for each. Aggressive technology scaling and the growing demand for performance raise concerns about how effective these run-time techniques will be in future technologies. We therefore also evaluate how practical each technique remains as technology scales.

Throughout the evaluation, we emphasize the minimum idle time as a metric for comparing leakage reduction techniques. The minimum idle time is the shortest time a circuit must stay in sleep mode for a technique to save power once its overheads are taken into account. In other words, the unit must stay idle long enough that the energy overhead of moving to the low-leakage state is less than the leakage energy it would consume over the same period without that state. If a technique's minimum idle time is longer than most of the idle periods of the circuit of interest, that technique will not benefit this circuit. The minimum idle time can therefore be used to evaluate the feasibility of a leakage reduction technique.

1063-8210/04$20.00 © 2004 IEEE

The contribution of this paper is a complete comparison of run-time leakage power reduction techniques in terms of leakage savings, feasibility, and scaling trend. We discuss the mechanisms and their implementation in detail, explain the underlying ideas, and point out the concerns involved in applying each technique.

The rest of the paper is organized as follows. In Section II, we briefly review the most commonly used leakage reduction techniques. In Section III, we present the results of our study on datapath designs and correlate them with models and equations that can be used for early estimation of the resulting effects. Section IV presents the results of our study on memory structures. Finally, in Section V, we discuss some conclusions and the implications of technology scaling, and provide a forecast of the use of this work.

II. OVERVIEW OF RUN-TIME TECHNIQUES FOR LEAKAGE REDUCTION

Leakage is the static current flowing through the transistors. The amount of leakage depends on the bias states of the gate, source, drain, and bulk. In [2], it was formulated as

$$I_{\mathrm{leak}} = I_{\mathrm{off}} \cdot 10^{-\left(\Delta V_{G} + \lambda_{d}\,\Delta V_{D} + \gamma\,\Delta V_{BS}\right)/S} \qquad (1)$$

where $I_{\mathrm{off}}$ is the leakage current of a single device of unit width in the off state with $V_{GS} = 0$ and $V_{DS} = V_{DD}$, $S$ is the gate (subthreshold) swing, $\lambda_{d}$ is the DIBL effect factor, $\gamma$ is the body effect factor, and $\Delta V_{G}$, $\Delta V_{D}$, and $\Delta V_{BS}$ are the reductions in the gate, drain, and bulk-to-source voltages, respectively. Lowering the bias of any of these terminals therefore reduces leakage.

A. Input Vector Control

Many researchers have used models and algorithms to estimate the nominal leakage [12], [13] and the minimum and maximum leakage [14], [15] of a given circuit.
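The stacking effect that IVC exploits follows directly from (1). The sketch below evaluates (1) for a single off nMOS device and for two series off devices. It assumes the form of (1) given above, ignores the small-$V_{DS}$ correction that a full subthreshold model would include, and uses hypothetical parameter values ($V_{DD}$ = 1.0 V, $S$ = 0.1 V/decade, $\lambda_d$ = 0.1, $\gamma$ = 0.15) chosen only for illustration. The helper names `leakage` and `two_stack` are likewise hypothetical. Equating the currents of the top and bottom devices gives the intermediate-node voltage $V_x = \lambda_d V_{DD}/(1 + 2\lambda_d + \gamma)$.

```python
import math


def leakage(i_off, s, lambda_d, gamma, dv_g=0.0, dv_d=0.0, dv_bs=0.0):
    """Evaluate (1) in the form given above (an assumption of this sketch).

    Each dv_* is a *reduction* of the gate, drain, or bulk-to-source bias
    relative to the reference off state (V_GS = 0, V_DS = V_DD, V_BS = 0).
    """
    return i_off * 10 ** (-(dv_g + lambda_d * dv_d + gamma * dv_bs) / s)


def two_stack(vdd, s, lambda_d, gamma, i_off=1.0):
    """Return (V_x, stack current, reduction factor) for two off nMOS in series."""
    # Top device: source at V_x, gate and body at 0 V, drain at V_DD,
    # so dV_G = V_x, dV_BS = V_x, and dV_D = V_x.
    # Bottom device: gate, source, and body at 0 V, drain at V_x,
    # so dV_D = V_DD - V_x.
    # Equal series currents => (1 + lambda_d + gamma) * V_x = lambda_d * (V_DD - V_x).
    v_x = lambda_d * vdd / (1 + 2 * lambda_d + gamma)
    i_top = leakage(i_off, s, lambda_d, gamma, dv_g=v_x, dv_d=v_x, dv_bs=v_x)
    i_bottom = leakage(i_off, s, lambda_d, gamma, dv_d=vdd - v_x)
    # Sanity check: both devices carry the same current.
    assert math.isclose(i_top, i_bottom, rel_tol=1e-9)
    return v_x, i_bottom, i_off / i_bottom


if __name__ == "__main__":
    v_x, i_stack, factor = two_stack(vdd=1.0, s=0.1, lambda_d=0.1, gamma=0.15)
    print(f"V_x = {v_x:.3f} V, stack leakage reduced {factor:.1f}x")
```

With these illustrative numbers, the intermediate node settles near 74 mV and the two-device stack leaks roughly 8× less than a single off device. This is the same order as the tenfold reduction reported for the stacked input control logic in Section III-A.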
The studies in [12]–[15] have shown how strongly the input pattern influences circuit leakage. This influence comes from the stacking effect [16], which arises from the bias conditions described above for each transistor. The state of the devices in a stack is set by their inputs, which in turn are set by the unit's input signals. The goal can therefore be expressed as finding the input pattern that maximizes the number of disabled transistors in all stacks across the unit [16]. All of these tools can be used to determine the minimum-leakage vector. They can also be used to exploit the stacking effect further by inserting transistors that reduce the effect of leaky sections of the circuit [17]. Another possibility is to simulate all input patterns at the circuit level and select the pattern with the least leakage. This may be prohibitive for circuits with many inputs and/or a very large logic depth. In [18], the authors use probability theory to reduce the number of trials (simulations) as a function of the desired confidence and error tolerance.

The implementation of the input vector control technique requires minimal architectural support. Most designs may already implement the sleep signal that determines whether the device is active. However, a threshold should still be used so that the vector is applied only when the unit is expected to stay idle for a while. Because compilers have a larger instruction scope, the decision to enter sleep mode can be shared between software and hardware.

B. Increasing the Threshold Voltage

This technique has different implementations, but all of them require some process technology support so that the threshold voltage of some (or all) transistors can differ from the default defined for the technology. Multiple-threshold CMOS provides two types of devices, each with a different threshold voltage. It is usually fabricated using threshold adjustment by ion implantation, two different oxide thicknesses, or different channel lengths. If the threshold voltage is instead adjusted at run-time, triple-well technology is required.

One example of multiple-threshold CMOS is multiple-threshold voltage CMOS (MTCMOS) [19]. In MTCMOS, a high-threshold device is inserted in series with low-threshold circuitry, creating a sleep transistor. This creates virtual supply and ground rails whose voltage levels stay very close to the real power lines, because the inserted high-threshold transistor has a very small on-resistance. In practice, only one virtual rail (usually the virtual ground) is used. Another approach places high-threshold-voltage devices on noncritical paths. This reduces leakage power, while keeping low-threshold devices on the critical paths maintains performance. This technique, called dual-$V_{th}$ CMOS, requires an algorithm that searches for the places where high-threshold-voltage devices can be used [20].

In dynamic threshold MOS (DTMOS), the body and gate of each transistor are tied together. Leakage is therefore low whenever the device is off, while higher drive currents are possible when it is on [21]. The diode built-in potential (approximately 0.6 V) limits the supply voltage. However, this technique may also significantly reduce gate leakage in more aggressive technologies that require low supply voltages: if the gate and substrate are at the same voltage, no electric field exists across the oxide. SOI technology offers a way to remove the low-supply-voltage limitation [22].

Among the techniques that modify the threshold voltage at run-time, the classic example is standby power reduction (SPR), also called variable threshold CMOS (VTCMOS). It raises $V_{th}$ during standby mode by driving the substrate voltage either above $V_{DD}$ (p devices) or below ground (n devices). This requires an additional power supply, which imposes cost limitations for commercial designs. The technique presented in [22] removed this constraint and was successfully applied to a commercial digital-signal processor. The required architectural support can be implemented in hardware or software. Removing the substrate bias so that the circuit can operate normally takes time, which causes a larger performance penalty. We therefore believe that software should control when the scheme is applied. Noise immunity problems have been reported when the substrate voltage is changed. However, because the technique is applied only when the system is idle, normal circuit operation is not affected.

TABLE I: SIMULATED AVERAGE DYNAMIC AND LEAKAGE POWER FOR THE DESIGNS STUDIED. TOOLS: MAXMAGIC MAX AND HSPICE

C. Gating the Supply Voltage

The last approach considered is power-supply gating. This technique can be implemented in many ways, but the basic idea is always the same: shut down the power supply so that idle units do not consume leakage power. One option is sleep transistors [1], with MTCMOS being the most popular approach. It is possible to use one sleep transistor per gate, but coarser granularities, which need fewer but larger transistors, are more common. The technique has two drawbacks. First, performance and noise immunity suffer unless the sleep transistors are designed carefully. Second, leakage is only reduced, not eliminated.

Another implementation uses voltage regulators that can both remove and scale the circuit's supply voltage. The obvious choice would be switching regulators [24], [25], which already support dynamic voltage scaling (DVS). However, they require inductive elements, so their use has been restricted to the chip level until effective ways to build high-quality on-chip inductive elements are developed. Many on-chip voltage regulator designs have been proposed [26]. These designs usually run into limits on process variation and controllability, because most are designed to generate a constant rather than a variable supply voltage. Phase-locked loops (PLLs) have also been proposed as voltage regulators [27]. Unlike techniques that adjust the threshold voltage, this method does not require process technology support.
The above scheme is also suitable for memory structures when the whole array is supply-gated. If a finer granularity is required, supply gating must use sleep transistors, which can also minimize leakage through the bit-lines. The sleep transistors can be pMOS, nMOS, or a combination of both (CMOS). Practical SRAM designs size pMOS devices larger than nMOS devices to balance performance. As a result, a pMOS sleep transistor must be larger to provide comparable drive. For an SRAM cell, an nMOS or CMOS sleep transistor is therefore recommended for larger leakage savings and a smaller area penalty. In [28], the authors pointed out that increased bit-line leakage through the transmission transistors causes slow or incorrect read/write operations. These transmission transistors are n-type, so an nMOS sleep transistor reduces the bit-line leakage and mitigates this problem.

The other concern for memory is data preservation, which correct sizing of the sleep transistors can achieve [29]. The inverter loop in the 6T SRAM cell retains the stored data, and this retention mechanism works as follows. During sleep mode, the virtual ground is pulled up to a level that depends on the voltage transfer characteristic (VTC) of the cell's inverter loop and on the size of the sleep transistor. This raises the source voltage of each nMOS transistor in the circuit, so an off transistor sees a negative gate-to-source voltage. Its threshold voltage also increases, further lowering the leakage current. However, if the supply-to-virtual-ground difference falls below the noise margin of the inverter loop, functionality is lost. Each design therefore has a maximum virtual-ground level and/or a minimum virtual-supply level, and correct sizing of the sleep transistors keeps the circuit within these limits.

TABLE II: SUMMARY OF SIMULATION CONDITIONS

III. CHARACTERIZATION OF LEAKAGE REDUCTION TECHNIQUES FOR DATAPATH LOGIC

From the techniques described above, we chose one per category, each controllable at run-time. We now characterize them, finding the potential energy savings and all overheads incurred (in performance, area, etc.) when each technique is applied, and we formulate estimates whenever possible. We then compare the techniques with one another. Table I defines our reference point: it lists the average dynamic and leakage power obtained from SPICE simulations run under the conditions in Table II. Note that the leakage power values account for short-channel effects (SCE) and drain-induced barrier lowering (DIBL), which are discussed below.

A. Input Vector Control

In our evaluation of input vector control (IVC), we start with functional units that are front-ended by latches for use in pipelined datapaths. The latch is modified to support the input control logic shown in Fig. 1.

Fig. 1. Input control logic for IVC.

In this design, in sleep mode the control_to_1 logic has two stacked nMOS transistors, while the control_to_0 logic has two stacked pMOS transistors in the worst case. Stacking in the input control latch reduces the leakage power of the control logic tenfold.

In [18], 59 random input vectors were shown to give 95% confidence of finding the input vector with the least leakage current. The key to that approach was fitting a Gaussian distribution to the leakage profile of the selected input vectors. In our approach, we generated 180 random input vectors to fit a Gaussian distribution of leakage measurements. We simulated each input vector in HSPICE to find the one with the least leakage. We then added the control unit to the circuit and measured the average obtainable leakage savings.

The signal that enables a unit can also activate the input latches. This gives a recovery time below one clock cycle and, in most cases, no performance overhead. However, if the unit needs added latches for IVC support, the actual power savings are smaller and the area overhead becomes noticeable. In particular, adding the latches increases the area by a factor close to 5 on average.

TABLE III: VARIOUS PERFORMANCE PARAMETERS FOR THE IVC SCHEME. THE PERFORMANCE PENALTY IN ALL CASES IS LESS THAN A CLOCK CYCLE

Table III shows that the actual savings depend strongly on the unit to which the technique is applied, with logic design style and logic depth being the most influential factors. The logic units, which use the complementary CMOS design style, are a clear example. NAND and NOR save more than AND and OR, respectively, because they have one less level of logic depth, so the inputs control more of the stacking. The exception is the XOR design, whose savings are lower because its transmission-gate-based design style does not fully exploit the potential savings from stacking. If the logic depth is too deep, as in the 16 × 16-bit array multiplier, the savings are limited and the required idle time is too long. If area constraints allow, additional control logic can be inserted halfway through the unit's logic depth to gain more control over the state of internal nodes. This is practical for pipelined circuits, where latches already sit at the inputs of each pipeline stage.

The only power overhead occurs when the unit enters sleep mode, as it moves from its current state to the minimum-leakage state. If setting the input to the desired pattern costs more dynamic energy than the unit would leak in its current state over the given idle time, there are no savings. In other words, the unit must stay idle long enough that the dynamic energy used to set the low-leakage input is less than the leakage energy it would consume over the same period without that input. The following analysis estimates the minimum idle time that yields energy savings. As shown in Fig. 2, we consider two power levels: the leakage power wasted when no leakage reduction scheme is used, and the lower leakage power that results after the technique is applied.
Then, we can estimate the energy dissipated with and without the reduction scheme as follows:

(2)

Fig. 2. Representation of the leakage power behavior during scheme application.

Here, all times are defined in Fig. 2. The two transition energies are the energies dissipated by the overhead circuitry during the entry and exit transitions, respectively, and include both dynamic and leakage contributions. In some cases, the transition times are comparable, so we can assume that they are equal. To make the leakage reduction scheme worthwhile, the energy dissipated with the scheme must be lower than the energy dissipated without it (a related rule is used in [1]). Manipulating this inequality, we get

(3)

For input vector control, the transition energy corresponds to the average dynamic power-delay product of the unit. Table III lists the minimum idle time determined for each unit. IVC needs only a small minimum idle time because its transition energy penalty is small, even when its leakage savings are small. This shows why the evaluation should include the minimum idle time rather than the transition energy penalty or the leakage power savings alone. Because the leakage reduction grows with technology scaling, the minimum idle time decreases.

B. BBC

We use VTCMOS as the representative body bias control (BBC) technique. It requires architectural support and does not depend entirely on hardware design choices and placement, so it can be applied at run-time. This property is needed for a fair comparison with the other techniques studied. To provide the substrate bias, we modified the netlists generated from the layouts and manually adjusted the body voltages of the P and N devices, which by default connect to $V_{DD}$ and ground, respectively. We set the substrate bias manually to the optimum level for BBC. For the 70-nm technology, the Berkeley predictive transistor model (BPTM) does not capture how SCE degrades the threshold-voltage shift produced by substrate bias. For our 70-nm simulations, we therefore also adjusted the threshold voltage values in the netlists manually, according to (4). Fig. 3 shows the increase in threshold voltage achievable by changing the substrate bias. For the 70-nm technology, Fig. 3 shows the achievable threshold voltages both with and without SCE, to illustrate the importance of capturing this effect.

(4)

Fig. 3. Achievable threshold voltage by biasing the substrate voltage.

Fig. 4. Normalized leakage current as a function of the normalized body bias voltage.

In (4), the parameters are the oxide thickness, the potential barrier, the substrate bias, and the effective device length. Fig. 4 shows the simulated average leakage current and the optimum substrate bias value. Substrate bias values above the optimum (1 V for 0.25 µm, 0.8 V for 0.18 µm, and 0.4 V for 70 nm) are not recommended. Such large values increase gate/junction leakage, which depends on the voltage difference between the gate/junction and the substrate. The 0.25-µm and 0.18-µm technologies each have an optimum substrate bias level that balances the conflicting subthreshold and gate/junction leakage. For the 70-nm technology, by contrast, overall leakage keeps decreasing as substrate bias increases. This is because, in this technology, subthreshold leakage always exceeds gate/junction leakage at high temperature. We assume the use of high-κ dielectric gate oxide technology to control gate leakage. However, reliability concerns limit the magnitude of the substrate bias. We chose to limit it to the burn-in power supply level, which is typically 1.4 times $V_{DD}$.

Prior research [30] indicates that the optimum substrate bias for leakage reduction depends only on the process technology. Our results, however, show that design style also plays an important role in choosing this value. Complementary CMOS logic and pass-transistor logic illustrate the difference. In pass-transistor logic, the raised threshold voltage of the on nMOS (pMOS) devices, which pass a weak 1 (0), increases the short-circuit current in the gates they drive. This can cancel out the reduction in subthreshold leakage that the higher threshold voltage gives the off transistors. In complementary CMOS logic, raising the threshold voltage does not create a short-circuit problem in subsequent stages. Because this effect diminishes in smaller technologies, the difference in optimum bias voltages between the two styles is smaller at 0.18 µm than at 0.25 µm. At 70 nm, the bias voltage is set by the reliability constraints.

This discussion raises the question of how effective BBC remains. Our results confirm that BBC becomes less effective with technology scaling. The 70-nm curves in Fig. 3 show why: SCE and $V_{th}$ roll-off weaken the control that substrate bias has over the threshold voltage. The growing share of subthreshold leakage gives BBC more room to act, but the results in Table V confirm its limited effectiveness in future technologies. The power overhead comes from the circuitry that adjusts the body bias voltage.
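The BBC overhead analysis that follows applies the same break-even reasoning as (2) and (3), but with a much larger transition energy. The sketch below captures that reasoning numerically. Its energy model is our own simplified assumption, not the exact form of (2) and (3). Without the scheme, the unit leaks at a constant power for the whole idle interval. With the scheme, it spends a lumped transition energy (entry plus exit, including leakage during the transitions) over a total transition time, and then leaks at the reduced power for the rest of the interval. The function name `min_idle_time` and all numbers below are hypothetical.

```python
def min_idle_time(p_leak, p_low, e_transition, t_transition):
    """Break-even idle time in seconds under a simplified energy model (assumption).

    p_leak:       leakage power with no reduction scheme (W)
    p_low:        leakage power once the low-leakage state is reached (W)
    e_transition: total energy of entering and leaving that state (J),
                  including the leakage spent during the transitions
    t_transition: total duration of the transitions (s)
    """
    if p_low >= p_leak:
        raise ValueError("the technique must actually reduce leakage")
    # E_without = p_leak * T
    # E_with    = e_transition + p_low * (T - t_transition)
    # Savings require E_with < E_without, i.e. T > break_even.
    break_even = (e_transition - p_low * t_transition) / (p_leak - p_low)
    # An idle period shorter than the transitions themselves cannot pay off.
    return max(break_even, t_transition)


if __name__ == "__main__":
    # Hypothetical IVC-like case: 10 uW -> 6 uW, 2 pJ and 1 ns of transition.
    t_small = min_idle_time(10e-6, 6e-6, 2e-12, 1e-9)
    # Same leakage levels, but 1000x more transition energy and time (BBC-like).
    t_large = min_idle_time(10e-6, 6e-6, 2e-9, 1e-6)
    print(f"{t_small * 1e9:.1f} ns vs {t_large * 1e6:.1f} us")
```

With these made-up values, the break-even point moves from about half a microsecond to about half a millisecond. This mirrors the qualitative observation that the large substrate-charging energy gives BBC a much longer minimum idle time than IVC.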
The circuit presented in [23] uses a charge pump to change the substrate level to an optimum standby bias and a charge injector to perform the recovery to active mode in reasonable time while trying to keep the area overhead to a minimum. In this implementation, there is a portion of the circuit that continuously draws current from the supply, but its effect can be ignored due to small magnitude (around 1 nA and can be kept small with careful design as technology scales down). The bulk of the power overhead is in the energy required to charge the substrate when the system is entering a sleep mode. Since the transition time for fully charging the substrate is comparatively longer, there are two cases to be considered. Independent of how fast the substrate is charged, the energy required to charge the substrate can be estimated as follows: where the substrate is fully charged to 1) for the optimum substrate bias level (5) where the substrate is partially charged 2) for to a level less than the optimum substrate bias

is the capacitance per unit of area from the substrate to the active regions (P or N). We assume linear relationships between the transition time and transition energy and between the transition time and the obtained reduced leakage power. This assumption results in a pessimistic but safe estimation of minimal idle time. The leakage power consumed during the time , can be estimated as simply the avthe scheme is applied, erage of and for convenience. Note that since the leakage current of the substrate bias control circuitry is small (1 nA for 0.25 m), its leakage power can be neglected. The minimum idle time thus can be formulated as follows: 1) for (7) 2) for (8) is the transition time from sleep mode to active Where mode. The performance overhead happens when changing the substrate level. The transition time of the charging circuits can be estimated as follows: (9) Where is the voltage difference of substrate to be is the substrate capacitance, is charged, is the transistor saturathe width of driving devices, and tion current. To satisfy the feasibility and to match the speed improvement of commercial products, we scale up the size of driving transistors in the charging circuit so that the delay is scaled by 0.7 per generation. The incurred area and power overhead across technologies can be estimated with the other parameters scaled using the scaling factors in [31]. The estimated data in Table IV shows the increasing area overhead. We can see that the minimum idle time is comparably larger than that of the other techniques evaluated due to the large transition energy. However, despite the decreasing effectiveness of BBC, the data in Table V shows the minimum idle time reduces due to the larger percentage of leakage in newer technologies. C. Power Supply Gating (PSG) A technique that has not received much attention is the use of phase-locked loops (PLLs) as voltage regulators for power supply gating (PSG). 
PLLs have been mostly used as frequency synthesizers in microprocessor systems and their popularity is even larger now that variable frequency generators are needed to support DVS. By taking advantage of such a familiar device and making the correct design choices, the PLL can be converted into an efcient voltage regulator, which at the same time can support leakage reduction by PSG (i.e., by setting its output voltage to zero). One of the very attractive characteristics of the PLL is that it provides a way to compensate for the inuence of process variations, although some designs may require self-biasing schemes. Fig. 5 shows one possible scenario in which the PLL can be used. Note that the presence of the voltage follower

(6) Where is the period when the charge pump is charging the is the time for the substrate to be fully charged substrate, to the desired substrate bias level, is the area utilized, and

TSAI et al.: CHARACTERIZATION AND MODELING OF RUN-TIME TECHNIQUES FOR LEAKAGE POWER REDUCTION

1227

TABLE IV PERFORMANCE PENALTY AND DELAY TIME FOR TRANSITIONS BETWEEN SLEEP MODE AND ACTIVE MODE FOR BBC

TABLE V VARIOUS PERFORMANCE PARAMETERS FOR BBC

Fig. 5.

PLL as a voltage regulator.

is optional but highly preferable. In this way, the current fluctuations caused by the unit being powered up are isolated from the reference voltage, which results in improved regulation. Additionally, the design process is simplified, as the PLL and the buffer can be optimized separately, keeping in mind that the buffer's timing behavior should be at least as fast as the response time of the PLL. Two situations are possible in Fig. 5. The sleep signal provides a way to perform global leakage reduction by shutting down the PLL and, consequently, all supply voltages that depend on the reference voltage it generates, while the enable signal in the buffer supports local supply gating of only the units powered by that particular buffer. We now discuss these two possibilities.

1) Global PSG: As with the previous techniques, we need to determine the savings obtained, the performance penalty, and the power and area overhead incurred. The savings are easily determined, as removing the power supply makes the leakage power of the target unit close to zero. While the PLL is turned off, the leakage in the PLL circuitry is small due to the relatively small size of the circuit and is negligible compared to the standby power consumed by the PLL bias circuitry. Nevertheless, by adding a signal that shuts down the bias circuitry (which can easily be accomplished with the same signal that puts the PLL into sleep mode), the total static power overhead becomes almost zero.

The performance penalty is determined by the time required to achieve lock, that is, for the voltage at the loop capacitor to reach the intended supply voltage. Assuming that the buffer is sized such that the response time is determined solely by the PLL, we now focus on defining the PLL timing behavior in terms of the PLL dynamic parameters. We use the methods and equations described in [31] to determine the impact of technology scaling on the wake-up time of the PLL. We limit ourselves here to briefly explaining the nature of the PLL phase acquisition time and refer the reader to [31] for further details. The total lock acquisition time is

(10)

where the lock time is expressed in terms of the external reference frequency to the PLL, the multiplication ratio, and a scaling constant. Two additional terms correct the calculated times for the discrete-time nature of the PLL, which makes the times slightly shorter than they actually are. This is the time required by the PLL to restore the desired voltage, starting from a nonswitching state; in other words, it is the performance penalty incurred by shutting down the power supply. An effective way to reduce this penalty is to increase the reference frequency and decrease the multiplication ratio. Average errors with respect to circuit-level simulation were below 5%.

The area overhead is essentially determined by comparing the area occupied by the PLL with the area occupied by the target unit. The low-pass filter capacitor largely determines the total PLL area, taking as much as 80% of it in certain designs. A good way to achieve the same performance with a small filter capacitor is to use a high operating frequency, which also improves the PLL's response but negatively affects the power overhead, as shown later. Since the PLL is intended for use at a coarse granularity, where multiple functional units and a large area are affected, area overhead may not be a concern.

In contrast to the previous schemes, besides the power overhead incurred while entering and leaving sleep mode, there is also a power overhead while the scheme is not being applied. While the unit is functioning normally, the PLL also operates to provide a stable reference voltage, which translates into continuous power consumption. The PLL can therefore be seen as an additional functional unit that increases the system's dynamic power consumption. In [31], a complete PLL power model was derived and validated, providing estimates within 5% of circuit-level (SPICE) simulation. The total PLL power under lock is estimated as

(11)

where the capacitance term accounts for gate, diffusion, and interconnect capacitances, and the remaining parameters are the number of stages in the VCO, the numbers of transistors in the phase-frequency detector (PFD) and the frequency divider (FDIV), the activity factors of the PFD and FDIV, the multiplication factor, and the external reference frequency. For the contribution of the bias circuitry, the only requirement is an estimate of the total bias current, which we have chosen to be proportional to ( ). In addition to the power consumed in the locked state, power is also used in shutting down the PLL and later turning it back on. The shutdown power can be roughly estimated as 65% (depending on the operating frequency) of the power consumed in the running (locked) state, as long as the bias circuitry is shut down as well. The experiments performed in [31] showed that during the turn-on process, the PLL power is slightly above the locked-state power (4% to 6% above). Thus, the above expression is all we need for the power characterization of the PLL. It should be highlighted that, relatively speaking, a higher operating frequency improves response time more than it harms the power behavior of the PLL. Besides the PLL, the voltage follower causes additional power and area overhead, assuming that it is designed to be as fast as the PLL so that the PLL determines the performance penalty. In the discussion of local PSG below, we take a closer look at the proposed buffer design and study the feasibility of that scheme.

For the global supply gating scheme, the parameters used are those of our base PLL design (5-stage differential VCO), whose characteristics are given in Table VI. In this scheme, the minimum idle time is not determined as formulated earlier, since there is a permanent power overhead and shutting down the PLL for any amount of time provides energy savings. Thus, the decision to gate the supply voltage depends solely on minimizing the effect on the unit's performance, which can be expressed as

(12)

That is, if the idle time is long enough for the PLL to be shut down and enabled again within it, the scheme provides power reduction without harming performance. This results in a moderate minimum idle time.

TABLE VI PLL CHARACTERISTICS AT THE VARIOUS TECHNOLOGIES

2) Local PSG: As mentioned before, additional power is consumed during the normal operation of the units, since the PLL and the buffer must be active to provide the required supply voltage. In some cases, this dynamic power overhead is as large as, or even greater than, the target unit's dynamic power. Under this condition, to mask the PLL dynamic power overhead, one can share the reference voltage generated by the PLL among various units and assign a buffer to each unit, allowing individual shutdown. Under these conditions, the data in Table VII were generated. Observe the drastic reduction in area overhead, while the buffer operating power is just a fraction of the PLL power in most cases, except when the units are significantly large and demand high currents. The enable times given are not the minimum possible; they can be reduced at the expense of additional operating power overhead and reduced regulation.

The buffer design chosen is commonly used in voltage down converters (VDCs) for memory chip applications [26]. The basic cell consists of a differential amplifier and a pMOS driver, which could alternatively be an nMOS driver. Fig. 6 shows the cell and the modifications required to allow complete shutdown of the buffer, which is controlled by a second-level sleep or enable signal. The differential stage isolates the driver from the reference voltage while providing a feedback loop that tracks variations in the driver output voltage, which improves regulation. The driver should be sized to meet the load-current requirements, while the differential stage can be sized to improve regulation performance (i.e., to increase the buffer loop bandwidth) at the expense of a more sluggish enabling process. Note that, to maximize efficiency, the reference voltage should be as close as possible to the real supply voltage, so that the power dissipated by the driver transistor is much less than the power delivered to the load. To drive the output voltage to zero, the input signal must be gated while the additional pMOS device raises the voltage at the driver's gate to the supply voltage, so that complete shutdown is achieved.

In contrast to what was done earlier with the buffers for BBC, the driver is not sized for a constant rise time or a constant delay overhead, but to meet the corresponding unit's average current requirements during normal operation. This makes the response time of the buffer dependent on the unit. For this reason, the results in Table VII show that the area overhead and the buffer enable time (performance penalty) depend on the unit being supply gated. We also observe that the minimum idle times decrease as the incurred performance and power penalties decrease with technology scaling. An alternative is to adjust the differential amplifier bias current to obtain a shorter enabling/disabling time, at the expense of additional power during normal operation and reduced regulation performance.

TABLE VII VARIOUS PERFORMANCE PARAMETERS FOR THE PSG SCHEME. THE LEAKAGE POWER REDUCTION IS VIRTUALLY 100% FOR ALL UNITS

One way to exploit supply gating is to integrate both levels in a system. Consider a system-on-a-chip (SOC) design in which many functional units are present. At a coarse level of granularity, the supply voltage of each core can be shut down whenever that processing unit is not being used, so that its leakage is eliminated. In this case, the dynamic power overhead incurred by the PLL is far below the dynamic power consumed by the assigned core, and, as mentioned earlier, such an infrastructure can either enhance an existing DVS scheme or provide the starting base for one. At a finer level of granularity, the technique can also be applied inside a processing core: the PLL stays enabled, while the individual buffers that power the functional units inside the core are disabled. In this way, additional leakage power savings are achieved by exploiting the idleness of individual components in the core. No significant power overhead is incurred, since the PLL is shared among all the units and its overhead is accounted for at the processing-unit level.
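The global PSG bookkeeping above can be summarized in a short sketch. It assumes that the locked PLL power, the shutdown time, and the lock-acquisition time are already known (for example, from Table VI or from the models behind (10) and (11)). The class and function names are hypothetical, and the energy accounting is our own simplification, not the model of [31]. The sketch encodes the performance condition described for (12): the PLL must shut down and re-lock within the idle window. It also uses the stated power ratios, roughly 65% of locked power during shutdown and 4% to 6% above locked power during turn-on (5% is assumed here).

```python
from dataclasses import dataclass


@dataclass
class PllGatingParams:
    """Hypothetical inputs for global PSG with a PLL-based regulator."""
    p_locked: float          # PLL power while locked (W), e.g. from a power model
    t_shutdown: float        # time to shut the PLL down (s)
    t_lock: float            # lock-acquisition (wake-up) time (s)
    p_unit_leak: float       # leakage power of the gated unit(s) (W)
    shutdown_ratio: float = 0.65  # shutdown power / locked power (text: roughly 65%)
    turn_on_ratio: float = 1.05   # turn-on power / locked power (text: 4%-6% above)


def can_gate(idle_time: float, p: PllGatingParams) -> bool:
    """Performance criterion: the PLL must shut down and re-lock within the idle window."""
    return idle_time >= p.t_shutdown + p.t_lock


def energy_saved(idle_time: float, p: PllGatingParams) -> float:
    """Energy (J) saved by gating versus idling with the PLL locked; 0.0 if gating is not allowed."""
    if not can_gate(idle_time, p):
        return 0.0
    # Baseline: the PLL stays locked and the unit keeps leaking for the whole idle window.
    baseline = (p.p_locked + p.p_unit_leak) * idle_time
    # Gated: PLL transition energy plus unit leakage while the supply is still ramping;
    # the PLL and the unit are assumed to draw ~0 W once fully off.
    transition_time = p.t_shutdown + p.t_lock
    gated = (p.shutdown_ratio * p.p_locked * p.t_shutdown
             + p.turn_on_ratio * p.p_locked * p.t_lock
             + p.p_unit_leak * transition_time)
    return baseline - gated


params = PllGatingParams(p_locked=1e-3, t_shutdown=1e-6, t_lock=10e-6, p_unit_leak=2e-3)
print(can_gate(5e-6, params))  # False: 5 us is shorter than 1 us + 10 us
print(energy_saved(100e-6, params))
```

For local PSG, the same check applies with the buffer enable time from Table VII in place of the PLL shutdown and lock terms, since the shared PLL stays locked.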

Fig. 6. Schematic of the voltage follower used, including the required modifications.
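The efficiency remark made for the buffer, namely that the reference voltage should be kept close to the real supply so that the driver dissipates little, follows from a first-order steady-state model of a series-pass follower like the one in Fig. 6. In this model, the whole load current flows through the driver, which drops the difference between the supply and the regulated output. The sketch below is that textbook approximation, not a characterization of the authors' circuit. The function name and the example voltages and currents are hypothetical, and dropout headroom and dynamic behavior are ignored.

```python
def follower_power(v_supply: float, v_out: float, i_load: float, i_bias: float = 0.0):
    """Return (driver_dissipation_W, load_power_W, efficiency) for a series-pass voltage follower.

    Steady-state sketch: the load current flows from v_supply through the pass
    (driver) transistor, which drops v_supply - v_out; i_bias models the
    differential amplifier's quiescent current drawn from v_supply.
    """
    if not 0.0 < v_out <= v_supply:
        raise ValueError("expected 0 < v_out <= v_supply")
    p_load = v_out * i_load
    p_driver = (v_supply - v_out) * i_load
    p_bias = v_supply * i_bias
    efficiency = p_load / (p_load + p_driver + p_bias)
    return p_driver, p_load, efficiency


# Hypothetical numbers: a 1.0 V regulated rail drawing 100 mA from two different supplies.
for v_supply in (1.2, 1.5):
    p_drv, p_ld, eff = follower_power(v_supply, 1.0, 0.1, i_bias=1e-3)
    print(f"Vsupply={v_supply} V: driver {p_drv * 1e3:.1f} mW, load {p_ld * 1e3:.1f} mW, eff {eff:.1%}")
```

With no bias current, the efficiency reduces to the output-to-supply voltage ratio, which is why a reference close to the real supply keeps the driver's dissipation small.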

IV. CHARACTERIZATION OF LEAKAGE REDUCTION TECHNIQUES FOR MEMORY STRUCTURES

In this section, the techniques evaluated previously are applied to a 128-bit SRAM array. The characterizations, the potential energy savings obtained, and all incurred overheads are discussed. The last row in Table I defines the dynamic and leakage power used as the base information for the experiments that follow.

A. Input Vector Control

Performing leakage reduction on memory structures presents a new set of difficulties not considered so far. Due to the symmetry of memory arrays, no savings are achieved by forcing the cell data to either 0 or 1. Besides, it is especially important to maintain the data stored in memory while a given leakage reduction scheme is applied. In the case of IVC, this condition can be met because the decoders and the memory array can be isolated from each other while the optimum input vector is applied. This, however, only affects the leakage of the decoders and not that of the cell array, which is the main source. Additionally, this technique could only be applied when the entire structure is not being used, which will not be frequent given the increasing role of memory structures in current microprocessor designs. However, another technique, leakage-biased bitlines (LBB) [32], is based on a concept similar to that of IVC and mitigates the bitline leakage flowing through the access transistors. Instead of forcing the bitlines of inactive subbanks to a sleep vector, it simply turns off the high-Vth nMOS precharging transistors and lets the bitlines float. The leakage current from the bit cells automatically biases the bitlines to a mid-rail voltage that minimizes the bitline leakage current. The transition energy penalty occurs when the charge is restored on the bitlines before the memory cells can be used, and the wake-up latency is the precharging time, which is delayed until the subbank needs to be accessed. The simulation results are shown in Table VIII.

TABLE VIII LEAKAGE REDUCTION OF IVC-BASED LEAKAGE-BIASED BITLINES (LBB) FOR A 128-BIT SRAM CELL ARRAY

B. BBC

In the case of BBC, the situation is similar to that of datapath logic at unit-level granularity (i.e., the body voltage is adjusted for the entire array), and comparable numbers for area overhead, performance penalty, and leakage savings could be obtained. A set of simulations of a 128-bit SRAM cell array was performed, and the results are presented in Table IX.

TABLE IX VARIOUS PERFORMANCE PARAMETERS FOR BODY BIAS CONTROL (BBC) FOR A 128-BIT SRAM CELL ARRAY

TABLE X PERFORMANCE PARAMETERS FOR THE MODIFIED GATED-VDD SCHEME FOR A 128-BIT SRAM. THE AREA OF ROUTING FOR SLEEP CONTROL SIGNALS IS NOT INCLUDED

C. PSG

A new challenge appears if leakage reduction is to be achieved by PSG. Caches and other memory structures play an important role in current designs and are rarely left idle for long periods of time. Thus, global and local supply gating as proposed in the previous section are not adequate for memory structures, since the supply voltage should be removed at finer granularities. For instance, if blocks or rows of a cache are found to be idle for some time, these particular blocks or rows can be shut down to obtain leakage savings. Thus, PSG based on sleep transistors is the most adequate technique for leakage reduction in memory structures, since it can also preserve data. We now discuss some of the issues inherent in such a technique, in which only one large sleep transistor per row is needed. HSPICE simulations were done in 0.25-µm, 0.18-µm, and 70-nm technologies under the conditions shown in Table II. Transistors with nominal Vth are used as sleep transistors due to area overhead and performance concerns. For our nMOS sleep transistor implementations, the maximum virtual ground levels that guarantee the functionality of the inverter loop are 2.15 V for 0.25 µm, 1.5 V for 0.18 µm, and 0.88 V for 70-nm technology. The virtual ground level during sleep mode decreases with larger sleep transistors. When this level is equal to or lower than the maximum level, the data stored in the cell are preserved. By correctly sizing the sleep transistor, one can keep the virtual ground level at the maximum data-holding level while maximizing the leakage reduction. Table X shows the leakage reduction, area penalty, and speed penalty of the schemes for each type of sleep transistor. Compared to a conventional gated-Vdd scheme, the use of larger, nominal-Vth sleep transistors benefits performance.
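The sizing rule described above is to make the sleep transistor just wide enough that the virtual ground stays at or below the maximum data-holding level, which maximizes leakage reduction without losing data. That rule is a one-dimensional monotone search. A minimal sketch follows, assuming a model of virtual-ground voltage versus sleep-transistor width is available, for example from HSPICE sweeps. `toy_vgnd`, its constants, and the 1.8 V supply are purely hypothetical, while the 1.5 V limit is the 0.18-µm value quoted above.

```python
from typing import Callable


def min_sleep_width(vgnd_of_width: Callable[[float], float], v_max: float,
                    w_lo: float, w_hi: float, tol: float = 1e-6) -> float:
    """Smallest sleep-transistor width in [w_lo, w_hi] with vgnd_of_width(w) <= v_max.

    Assumes vgnd_of_width is non-increasing in width (a wider sleep transistor
    pulls the virtual ground lower). Returns a width within tol of the boundary
    that still satisfies the data-retention limit.
    """
    if vgnd_of_width(w_hi) > v_max:
        raise ValueError("w_hi cannot hold the virtual ground at or below v_max")
    if vgnd_of_width(w_lo) <= v_max:
        return w_lo
    # Invariant: w_lo violates the limit, w_hi satisfies it.
    while w_hi - w_lo > tol:
        mid = 0.5 * (w_lo + w_hi)
        if vgnd_of_width(mid) <= v_max:
            w_hi = mid
        else:
            w_lo = mid
    return w_hi


def toy_vgnd(width: float, v_dd: float = 1.8, k: float = 2.0) -> float:
    """Purely hypothetical monotone model: virtual-ground voltage vs. normalized width."""
    return v_dd * k / (k + width)


w = min_sleep_width(toy_vgnd, v_max=1.5, w_lo=0.01, w_hi=100.0)
print(round(w, 4))  # analytic answer for this toy model: k * (v_dd / v_max - 1) = 0.4
```

Bisection is used because only monotonicity is assumed; a real flow would replace `toy_vgnd` with interpolated simulation data.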
The pMOS nominal-Vth sleep transistor gating has a negligible performance penalty. However, the nMOS gating and CMOS gating of the modified gated-Vdd scheme achieve comparable performance. Thus, the modified gated-Vdd scheme with nMOS and CMOS gating can achieve high leakage reduction for SRAM with a small area penalty, while performance is only slightly affected. Additionally, our results show that the dynamic power dissipation penalty is negligible, in particular for the 70-nm technology. It must be noted that, because of the large size of the sleep transistor, the coupling between the sleep control signal and the virtual ground node caused by gate-to-drain capacitances may corrupt the stored data. The ramp-up rate of the sleep control signal was therefore also studied with a 128-bit SRAM: data are preserved with a maximum slope of 1.4 V/50 ns. If the SRAM gets larger and a wider sleep transistor is needed, inserting a buffer to reduce the rising/falling slopes of the control signal might be required. Table X shows that, as expected, the effectiveness improves and the minimum idle time decreases with technology scaling. However, since the sleep transistor is sized to preserve data in sleep mode, both the area and performance penalties increase for smaller technologies. Memory elements occupy a significant portion of the chip area in embedded designs and, in turn, consume the dominant fraction of leakage power, so leakage power reduction for memory structures is crucial for controlling power consumption. These results indicate that leakage reduction in cache hierarchies is best exploited by techniques that can be controlled at the cache-line level while preserving data.

V. COMPARISONS AND IMPLICATIONS OF TECHNOLOGY SCALING

It has been shown that, for all the techniques evaluated, leakage power reduction can be achieved at different levels by paying different costs, which include performance penalty, additional area, and power overhead. The results confirmed the intuitive expectation that the techniques follow a hierarchy, but not quite the one suggested by the assumption that techniques with more demanding implementation requirements (i.e., circuit complexity, area overhead) and higher performance penalties provide larger leakage power savings. To better illustrate the observed behavior, Table XI presents average values of the defined performance parameters across all units evaluated for each technique.

TABLE XI COMPARISON AMONG THE EVALUATED TECHNIQUES AT 0.18-µm AND 0.07-µm TECHNOLOGY. *NUMBERS IN THIS ROW DO NOT INCLUDE THAT OF THE MULTIPLIER FOR AVERAGING; **D STANDS FOR DATAPATH LOGIC WHILE M STANDS FOR MEMORY CIRCUITS

TABLE XII COMPARISON OF IMPACTS OF TECHNOLOGY SCALING AMONG THE EVALUATED TECHNIQUES BASED ON FIXED AREA OVERHEAD OVER TECHNOLOGIES

The first interesting observation is that gated-Vdd outperforms BBC in all aspects, including leakage reduction. The reason is that the sleep transistor provides a stacking effect for all the transistors of the same type in the circuit. The effect of transistor stacking on leakage power (which reportedly reduces leakage by at least a factor of 10 [16]) is more significant than that of an increase in the threshold voltage. However, this does not mean that we can completely rule out the applicability of BBC. As expected, the PSG scheme presents the maximum possible leakage reduction at the cost of requiring the largest overheads, which restricts its applicability to the processing-unit level. It was found, however, that a variation of this scheme, which we called local PSG, provides the same level of leakage reduction but with overheads that make it possible to apply the technique at finer levels of granularity. A very desirable side effect obtained in both cases is the possibility of performing dynamic voltage scaling (DVS), since the required support is also provided by using a PLL as a voltage regulator. To facilitate the design and study of the PLL for such applications, time response models for the PLL were formulated and validated, yielding estimates with an average deviation of 5% with respect to SPICE simulation. These models are generic and can be used independently of the PLL application (i.e., power regulator, clock synthesizer, etc.). Clearly, PSG at any level cannot be used if the logic state of the target unit is to be maintained.
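To put the factor-of-10 stacking benefit reported in [16] in perspective, one can ask how large a threshold-voltage increase, such as a body-bias scheme would produce, gives the same reduction under the simple subthreshold model I_sub ∝ exp((V_GS − V_th)/(n·kT/q)). This is a textbook approximation, not the paper's model. The slope factor n = 1.5 and the 300 K temperature are assumed values, and DIBL and other short-channel effects are ignored.

```python
import math

K_B = 1.380649e-23      # Boltzmann constant (J/K)
Q_E = 1.602176634e-19   # elementary charge (C)


def thermal_voltage(temp_k: float = 300.0) -> float:
    """kT/q in volts."""
    return K_B * temp_k / Q_E


def leakage_reduction(delta_vth: float, n: float = 1.5, temp_k: float = 300.0) -> float:
    """Factor by which subthreshold leakage falls when Vth rises by delta_vth volts,
    using I_sub ~ exp((V_GS - V_th) / (n kT/q)) and ignoring DIBL and body-effect details."""
    return math.exp(delta_vth / (n * thermal_voltage(temp_k)))


def delta_vth_for(factor: float, n: float = 1.5, temp_k: float = 300.0) -> float:
    """Threshold increase (V) needed for a given leakage reduction factor."""
    return n * thermal_voltage(temp_k) * math.log(factor)


dv = delta_vth_for(10.0)
print(f"{dv * 1e3:.0f} mV")  # about 89 mV for n = 1.5 at 300 K
```

Under these assumptions, a body-bias-induced Vth shift would need to reach roughly 90 mV just to match the lower bound of the stacking benefit, which is consistent with the observation that gated-Vdd outperforms BBC.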

It is also interesting to observe how these techniques perform with technology scaling. Table XII shows the trends of the evaluated parameters as technology scales, assuming a scaling factor of 0.7 per generation for the cycle time. It should be noted that the effectiveness of BBC decreases as technology scales, while that of the other techniques increases. The decreasing effectiveness of BBC is due to Vth roll-off and increasing short-channel effects (SCE). Note that the declining leakage reduction undesirably increases the idle time needed to gain power savings. To solve this problem, larger driving devices are recommended at the expense of area overhead. Our results show that even though the effectiveness of BBC decreases, the reduction will still be significant for 70 nm on average, and the minimum idle time can be tuned to a desirable value with reasonable area overhead. However, BBC is intrinsically more problematic for reliability, as the high voltage across the oxide decreases the lifetime of devices. Column four in Table XII shows decreasing minimum idle time for all the techniques evaluated, regardless of the trends in effectiveness. This is due to the increasing share of leakage in total power. The decreasing ratio shown is the ratio of the minimum idle time in cycles in 0.18-µm technology to that in 70-nm technology, assuming a cycle-time scaling factor of 0.7 per generation. This trend indicates that future technologies will offer more opportunities to apply leakage mitigation techniques during even shorter periods of functional unit or memory idleness. This summary is intended to provide an overall guideline for understanding the effectiveness of the techniques in newer technologies.
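The minimum-idle-time trend in Table XII combines two effects: the break-even time in seconds, and its conversion to cycles under a cycle time that shrinks by 0.7 per generation. The sketch below uses a generic energy break-even form (transition energy divided by net leakage power saved), which may differ in detail from the formulation used earlier in the paper. All function names and numeric inputs are hypothetical, including the assumption that the newer node is three generations later.

```python
import math


def min_idle_time(e_transition: float, p_leak: float, reduction: float,
                  p_overhead: float = 0.0) -> float:
    """Break-even idle time (s) for a leakage-control mode.

    e_transition: energy to enter and leave the low-leakage mode (J)
    p_leak:       leakage power of the unit when not controlled (W)
    reduction:    fraction of p_leak removed while in the low-leakage mode
    p_overhead:   extra static power of the control circuitry while in that mode (W)
    Returns math.inf if the mode never pays off.
    """
    net_saving = reduction * p_leak - p_overhead
    if net_saving <= 0.0:
        return math.inf
    return e_transition / net_saving


def cycle_time_after(t_cycle_ref: float, generations: int, factor: float = 0.7) -> float:
    """Cycle time after a number of generations, scaling by `factor` per generation."""
    return t_cycle_ref * factor ** generations


def idle_cycles(t_idle: float, t_cycle: float) -> float:
    """Convert an idle time in seconds to clock cycles."""
    return t_idle / t_cycle


# Hypothetical inputs for an older and a newer node.
t_cycle_old = 2e-9
t_cycle_new = cycle_time_after(t_cycle_old, generations=3)
old = idle_cycles(min_idle_time(2e-12, 1e-5, 0.9), t_cycle_old)
new = idle_cycles(min_idle_time(1e-12, 5e-5, 0.95), t_cycle_new)
print(f"old node: {old:.0f} cycles, new node: {new:.0f} cycles, ratio {old / new:.1f}")
```

A ratio above 1 means that, measured in cycles, the newer node needs a shorter idle window to break even, which is the direction reported in column four of Table XII; the magnitude here is only a consequence of the made-up inputs.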
However, more detailed analysis will be necessary to evaluate the exact magnitude of the decrease in minimum idle time with scaling, as it is influenced by the actual circuit under consideration. Further, the minimum idle time should not be expected to decrease linearly with technology scaling, due to the nonlinear effects influencing leakage and leakage reduction techniques. All the evaluated techniques incur only a delay penalty when waking up the units, except gated-Vdd, which also incurs a run-time (active-mode) performance penalty because the sleep transistor is in series. Because of the increasing driving-current requirements, the simulation results show that the run-time performance penalty of gated-Vdd increases with scaling, even though its wake-up time decreases. Also, note that IVC can incur a run-time penalty in active mode if the base version of the functional unit did not require a latch for its inputs. We believe that the results presented here will provide a reference ground for circuit designers and computer architects, so that the appropriate techniques are used for reducing leakage at a target technology node.

REFERENCES
[1] A. Chandrakasan, W. Bowhill, and F. Fox, Design of High Performance Microprocessor Circuits. Piscataway, NJ: IEEE Press, 2000.
[2] S. Borkar, "Design challenges of technology scaling," IEEE Micro, vol. 19, pp. 23–29, July–Aug. 1999.
[3] M. Anis and M. Elmasry, Multi-Threshold CMOS Digital Circuits: Managing Leakage Power. Norwell, MA: Kluwer, 2003.
[4] L. Wei, Z. Chen, K. Roy, M. Johnson, and V. De, "Design and optimization of dual-threshold circuits for low-voltage low-power applications," IEEE Trans. VLSI Syst., vol. 7, pp. 16–24, Mar. 1999.
[5] S. Sirichotiyakul, T. Edwards, O. Chanhee, Z. Jingyan, A. Harchoudhury, R. Panada, and D. Blaauw, "Stand-by power minimization through simultaneous threshold voltage selection and circuit sizing," in Design Automation Conf., June 1999, pp. 436–441.
[6] S. Narendra, J. Tschanz, A. Keshavarzi, S. Borkar, and V. De, "Comparative performance, leakage power and switching power of circuits in 150 nm PD-SOI and bulk technologies including impact of SOI history effect," in Symp. VLSI Circuits, June 2001, pp. 217–218.
[7] K. Kim, R. V. Joshi, and C.-T. Chuang, "Strained-Si devices and circuits for low-power applications," in Int. Symp. Low-Power Electronics and Design, Aug. 2003, pp. 180–183.
[8] B. Yu, L. Chang, S. Ahmed, H. Wang, S. Bell, C.-Y. Yang, C. Tabery, C. Ho, O. Xiang, T.-J. King, J. Bokor, C. Hu, M.-R. Lin, and D. Kyser, "FinFET scaling to 10 nm gate length," in Int. Electron Devices Meeting, Dec. 2002, pp. 251–254.
[9] E. J. Nowak, I. Aller, T. Ludwig, K. Kim, R. V. Joshi, C.-T. Chuang, K. Bernstein, and R. Puri, "Turning silicon on its edge," IEEE Circuits Devices Mag., vol. 20, pp. 20–31, Jan.–Feb. 2004.
[10] T. Fukai, Y. Nakahara, M. Terai, S. Koyama, Y. Morikuni, T. Suzuki, M. Nagase, A. Mineji, T. Matsuda, T. Tamura, F. Koba, T. Onoda, Y. Yamada, M. Komori, Y. Kojima, Y. Yama, M. Ikeda, T. Kudoh, T. Yamamoto, and K. Imai, "A 65 nm-node CMOS technology with highly reliable triple gate oxide suitable for power-considered system-on-a-chip," Tech. Papers VLSI Technol., pp. 83–84, June 2003.
[11] S. Datta, G. Dewey, M. Doczy, B. S. Doyle, B. Jin, J. Kavalieros, R. Kotlyar, M. Metz, N. Zelick, and R. Chau, "High mobility Si/SiGe strained channel MOS transistors with HfO2/TiN gate stack," in IEEE Int. Electron Devices Meeting, Dec. 2003, pp. 28.1.1–28.1.4.
[12] A. Ferre and J. Figueras, "Characterization of leakage power in CMOS technologies," in IEEE Int. Conf. Electronics, Circuits, Systems, vol. 2, Sept. 1998, pp. 185–188.
[13] Z. Cheng, M. Johnson, L. Wei, and K. Roy, "Estimation of standby leakage power in CMOS circuits considering accurate modeling of transistor stacks," in Int. Symp. Low-Power Electronics Design, Aug. 1998, pp. 239–244.
[14] M. Johnson, D. Somasekhar, and K. Roy, "Models and algorithms for bounds in CMOS circuits," IEEE Trans. Computer-Aided Design, vol. 18, pp. 714–725, June 1999.
[15] S. Bobba and I. Hajj, "Maximum leakage power estimation for CMOS circuits," in IEEE Alessandro Volta Memorial Workshop Low-Power Design, Mar. 1999, pp. 116–124.
[16] Y. Ye, S. Borkar, and V. De, "A new technique for standby leakage reduction in high-performance circuits," in Symp. VLSI Circuits, June 1998, pp. 40–41.
[17] M. Johnson, D. Somasekhar, and K. Roy, "Leakage control with efficient use of transistor stacks in single threshold CMOS," in Design Automation Conf., June 1999, pp. 442–445.
[18] J. Halter and F. Najm, "A gate-level leakage power reduction method for ultra low power CMOS circuits," in IEEE Custom Integrated Circuits Conf., May 1997, pp. 475–478.
[19] S. Mutoh, T. Douskei, Y. Matsuya, T. Aoki, S. Shigematsu, and J. Yamada, "1-V power supply high-speed digital circuit technology with multi-threshold voltage CMOS," IEEE J. Solid-State Circuits, vol. 30, pp. 847–854, Aug. 1995.
[20] L. Wei, Z. Chen, M. Johnson, K. Roy, and V. De, "Design and optimization of low-voltage high-performance dual threshold CMOS circuits," in Design Automation Conf., June 1998, pp. 489–494.
[21] F. Assaderaghi, D. Sinitsky, S. A. Parke, J. Bokor, P. K. Ko, and C. Hu, "Dynamic threshold-voltage MOSFET (DTMOS) for ultra-low voltage VLSI," IEEE Trans. Electron Devices, vol. 44, pp. 414–422, Mar. 1997.
[22] L. Wei, K. Roy, and V. De, "Low-voltage low-power CMOS design techniques for deep submicron ICs," in Int. Conf. VLSI Design, Jan. 2000, pp. 24–29.
[23] T. Kuroda et al., "A 0.9 V 150 MHz 10 mW 4 mm² 2-D discrete cosine transform core processor with variable threshold-voltage (VT) scheme," IEEE J. Solid-State Circuits, vol. 31, pp. 1770–1779, Nov. 1996.
[24] A. Dancy and A. Chandrakasan, "Techniques for aggressive supply voltage scaling and efficient regulation," in IEEE Custom Integrated Circuits Conf., May 1997, pp. 579–586.
[25] G. Wei and M. Horowitz, "A fully digital, energy-efficient, adaptive power-supply regulator," IEEE J. Solid-State Circuits, vol. 34, pp. 520–528, Apr. 1999.
[26] S. Jou and T. Chen, "On-chip voltage down converter for low power digital system," IEEE Trans. Circuits Syst. II, vol. 45, pp. 617–625, May 1998.
[27] V. Von Kaenel et al., "A voltage reduction technique for battery-operated systems," IEEE J. Solid-State Circuits, vol. 25, pp. 1136–1140, Oct. 1990.
[28] K. Agawa et al., "A bitline leakage compensation scheme for low voltage SRAMs," IEEE J. Solid-State Circuits, vol. 36, pp. 726–732, May 2001.
[29] A. Agarwal, H. Li, and K. Roy, "DRG-cache: A data retention gated-ground cache for low power," in Design Automation Conf., June 2002, pp. 473–478.
[30] A. Keshavarzi et al., "Technology scaling behavior of optimum reverse body bias for standby leakage power reduction in CMOS ICs," in Int. Symp. Low-Power Electronics Design, Aug. 1999, pp. 252–254.
[31] D. Duarte, "Clock network and phase-locked loop power estimation and experimentation," Ph.D. dissertation, Pennsylvania State University, University Park, PA, 2002.

Yuh-Fang Tsai (S'00) received the B.S. degree in electronics engineering from Chun-Yuan Christine University, Chun-Li, Taiwan, R.O.C., in 1996 and the M.S. degree in computer science and engineering from Pennsylvania State University, University Park, PA, in 2002, where she is currently pursuing the Ph.D. degree. Her research interests include leakage power management and power-aware VLSI circuit and system designs.

David E. Duarte (S'97–M'02) received the electronics engineer degree from the Pontificia Universidad Javeriana, Bogota, Colombia, in 1996 and the M.S. and Ph.D. degrees in electrical engineering from Pennsylvania State University, University Park, PA, in 1999 and 2002, respectively. He is currently with Intel Corporation, Hillsboro, OR, working on analog circuit design for Intel microprocessors. From January 1995 to May 1997, he was a part-time Instructor with the Department of Electronics Engineering, Pontificia Universidad Javeriana, Bogota, Colombia. From April 1996 to August 1997, he was a Research Engineer with ITEC-TELECOM, Bogota, Colombia. His research interests include low-power analog VLSI circuit design. Dr. Duarte received the Universidad Javeriana's first rank in the second class of electronic engineers of 1996. He was also the recipient of the 2003 IEEE Circuits and Systems Society VLSI Transactions Best Paper Award.


N. Vijaykrishnan (S'97–M'98) is an Associate Professor in the Computer Science and Engineering Department, Pennsylvania State University, University Park. His research interests are in the areas of energy-aware reliable systems, embedded Java, nano/VLSI systems, and computer architecture. He has authored more than 150 papers in these areas. Dr. Vijaykrishnan has received several awards, including the IEEE CAS VLSI Transactions Best Paper Award in 2002, the Pennsylvania State CSE Faculty Teaching Award in 2002, the ACM SIGDA Outstanding New Faculty Award in 2000, the Upsilon Pi Epsilon Award for Academic Excellence in 1997, the IEEE Computer Society Richard E. Merwin Award in 1996, and the University of Madras, Chepauk, India, First Rank in Computer Science and Engineering in 1993. He is the co-editor-in-chief of the ACM Journal on Emerging Technologies in Computing and an Associate Editor of the IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION SYSTEMS.

Mary Jane Irwin (S'74–M'77–SM'89–F'95) received the Ph.D. degree in computer science from the University of Illinois, Urbana-Champaign, in 1977. In 1977, she joined Pennsylvania State University, University Park, as a Faculty Member, where she is currently the A. Robert Noll Chair in Engineering in the Department of Computer Science and Engineering. Her research and teaching interests include computer architecture, embedded and mobile computing systems design, power-aware design, and electronic design automation. Dr. Irwin received an Honorary Doctorate from Chalmers University, Sweden, in 1997 and the Pennsylvania State Engineering Society's Premier Research Award in 2001. She was named an ACM Fellow in 1996 and was elected to the National Academy of Engineering in 2003. She is currently serving as a member of the Technical Advisory Board of the Army Research Lab and as the Co-Editor-in-Chief of ACM's Journal of Emerging Technologies in Computing Systems. She has also served as an elected member of the Computing Research Association's Board of Directors, the IEEE Computer Society's Board of Governors, and ACM's Council, and as Vice President of ACM. She has served in leadership roles for several major conferences, including General Chair of the 1996 Federated Computing Conference, General Co-chair of the 1998 CRA Conference, Snowbird, UT, General Chair of the 36th Design Automation Conference, and General Co-chair of the 2002 International Symposium on Low-Power Electronics and Design.
