
This article has been accepted for inclusion in a future issue of this journal.

Content is final as presented, with the exception of pagination.


IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS 1
Unified Capture Scheme for Small Delay Defect
Detection and Aging Prediction
Song Jin, Yinhe Han, Member, IEEE, Huawei Li, Senior Member, IEEE, and Xiaowei Li, Senior Member, IEEE
Abstract: Small delay defects (SDDs) and aging-induced circuit failure are both prominent reliability concerns for nanoscale integrated circuits. Faster-than-at-speed testing is effective for SDD detection in manufacturing testing and is usually implemented by designing a suite of test signal generation circuits on the chip. Meanwhile, the integration of online aging sensors is becoming attractive for monitoring aging-induced delay degradation at runtime. These design requirements, if implemented separately, increase the complexity of a reliable design and consume more die area. In this paper, a unified capture scheme is proposed to generate programmable clock signals for the detection of both SDDs and circuit aging. Our motivation arises from the observation that SDD detection and online aging prediction both need to capture the circuit response ahead of the functional clock. The proposed aging-resistant design method enables the offline test circuit to be reused in online operations. The reversed short channel effect is also exploited to make the underlying circuit resilient to process variations. The proposed scheme is validated by intensive HSPICE simulations. Experimental results demonstrate its effectiveness in terms of low area, power, and performance overheads.
Index Terms: Faster-than-at-speed testing, online aging prediction, reversed short channel effect, unified capture scheme.
I. INTRODUCTION
With the relentless scaling in technology, small delay defects (SDDs), which commonly arise from resistive opens and shorts, gate oxide failure, via voids, etc., have become a serious problem in integrated circuits [1], [2]. SDDs can cause timing failure when they are activated on a longer path during functional operation [3]. Moreover, SDDs are also a threat to circuit reliability because such defects may be magnified by subsequent aging in the field, resulting in permanent device failure [4]. It is therefore essential to detect SDDs in the chip during the fabrication testing stage itself [5]. However, traditional at-speed delay testing based on the transition fault model tends to sensitize the short paths in the circuit. SDDs on these paths remain hidden because of the large timing slack of the short paths under the functional clock, and thereby escape the test stage. One effective way to detect SDDs is faster-than-at-speed testing [6]. Increasing the test clock frequency decreases the timing slack of the short paths, which improves the capability of screening SDDs.

Manuscript received September 9, 2011; revised April 3, 2012; accepted April 11, 2012. This work was supported in part by the National Basic Research Program of China (973) under Grant 2011CB302503 and Grant 2011CB302501, and the National Natural Science Foundation of China under Grant 61076037, Grant 60921002, and Grant 61176040, and the Fundamental Research Funds for the Central Universities under Grant 12MS123.
S. Jin was with the State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China. He is now with the Department of Electronic and Communication Engineering, School of Electrical and Electronic Engineering, North China Electric Power University, Baoding 071003, China.
Y. Han, H. Li, and X. Li are with the State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China (e-mail: yinhes@ict.ac.cn).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TVLSI.2012.2197766
On the other hand, aging effects have become a prominent reliability challenge as the process advances into the nanometer regime [7]. Much work has been done on understanding and modeling the intrinsic mechanisms of aging, such as negative/positive bias temperature instability (NBTI/PBTI), time-dependent dielectric breakdown, hot-carrier injection, electromigration, etc. Given that circuit aging is a gradual process, online aging prediction [8]-[10] is an effective way to protect the system data or state from corruption by aging-induced circuit failure. It captures the circuit response under the normal working mode and generates a warning signal if the aging-induced delay degradation exceeds a specified threshold. Based on this warning information, some redundancy or tuning mechanisms can be enabled to ensure that the system continues to work correctly.
The hardware circuit used in manufacturing testing is usually deactivated or abandoned when the chip works in the field. However, for reliability, the designer still needs to insert specific circuits to monitor online certain faults that may result in functional failure of the chip, such as soft errors, inductive noise, and aging-induced delay degradation. If the hardware circuit designed for offline manufacturing testing can be reused for online fault detection or monitoring, the implementation complexity of defect- or aging-related reliability design can be significantly reduced. Meanwhile, the total area overhead consumed by offline and online circuits can be reduced as well.
We noticed that SDD detection and online aging prediction have a common characteristic, i.e., both of them need to capture the circuit response ahead of the functional clock. This motivates us to propose a unified capture scheme to support both faster-than-at-speed testing and online aging prediction. It reuses the offline test hardware circuit in online monitoring of circuit aging and generates an on-chip programmable clock signal to flexibly capture the circuit response at the designated time.
However, the implementation of a unified capture scheme is confronted with some challenges. The intrinsic aging process will degrade the on-chip circuit. With the continuous reduction in oxide thickness and the increase in operational temperature, the NBTI effect can become a limiting factor in the device lifetime [11]. The PBTI effect emerges as a dominant aging mechanism in sub-32-nm high-k processes [12], [13]. Aging effects can cause a drift in the generated clock signal, so that the interval for capturing the circuit response in online aging prediction gradually deviates from its initial span. This makes the traditional hardware circuit designed for fabrication testing unfit for online operation.

1063-8210/$31.00 © 2012 IEEE
Another important design concern in the proposed clocking scheme is the negative impact of process variation (PV) on the on-chip circuit. PV will cause a skew in the generated clock signal, thus affecting the test efficiency of faster-than-at-speed testing or the prediction accuracy of online aging prediction.
We tackled the above design concerns in the proposed unified capture scheme. Our main contributions are as follows.
1) The proposed unified capture scheme relies on the generation of programmable clock signals to achieve different test frequencies in faster-than-at-speed testing or capture intervals in online aging prediction. Moreover, the capture interval can be dynamically adjusted at runtime to further compensate for its own aging-induced drift.
2) The proposed aging-resistant design method significantly reduces the drift in the generated clock signal under the runtime NBTI effect. The reversed short channel effect (RSCE) is also exploited to determine the optimal transistor channel length for the circuit design. This minimizes the skew of the generated clock signal in the presence of PV.
The rest of this paper is organized as follows. Section II introduces the related work. Details of the unified capture scheme are presented in Section III. The PV-resilient design concern is discussed in Section IV. Section V presents the verification and simulation results. We conclude in Section VI.
II. RELATED WORK
To tackle NBTI-induced reliability problems, some researchers have proposed to model, analyze, and optimize NBTI-induced degradation in the design phase. A long-term NBTI model was formulated in [14]-[16]. NBTI-induced degradation on random logic and SRAM with [17], [18] and without [19], [20] considering PV has been analyzed. Some NBTI-resistant techniques were proposed accordingly [21], [22]. While theoretical analysis of NBTI effects is valuable, the actual NBTI-induced performance degradation strongly depends on the practical workloads executed on the circuit, and hence it is essential to be able to monitor it online.
Online aging prediction has attracted much attention in recent years. Agarwal et al. have proposed the design of an aging sensor in [8] and [9], wherein a warning signal will be triggered if a delayed transition occurs within the predetermined guardband interval. Yan et al. have proposed a unified fault detection scheme called SVFD [10], which can be used to detect soft errors and aging-induced delay degradation at the cost of adding extra clock signals. The designs in [8]-[10] have the same problem: the guardband bound can have a large deviation from the designated value due to PV, which may result in false alarms. Keane et al. have proposed an on-chip NBTI degradation sensor using a delay-locked loop for analyzing pMOS aging under dc and ac stress scenarios [11]. In [23], a circuit failure prediction method was proposed by measuring the delay of the logic block. Although the circuit delay degradation can be accurately measured, the hardware overhead is considerable. A voltage glitch generation and detection circuit that can be used to monitor the circuit aging was proposed in [24]. However, the circuit itself in [24] is not aging-resistant. In [25], Vazquez et al. proposed a programmable aging sensor design that relies on a configurable pull-down nMOS network. However, the actual detection range cannot be easily modified by resizing the nMOSs under the PV-induced uncertainty.
For SDD detection, timing-aware automatic test pattern generation (ATPG) and faster-than-at-speed testing are two commonly used schemes. The former tries to involve timing information in test generation with a transition fault model and expects to sensitize longer paths [4], [26]-[28]. However, as shown in [28], the runtime of the timing-aware ATPG may be 20 times higher than that of a traditional ATPG. For this reason, some authors have proposed faster-than-at-speed testing for SDD detection. This technique tries to reduce the timing slack of the short paths, while masking the output flip-flops (FFs) of longer sensitized paths to avoid unnecessary timing failures [6]. Reducing the slack of the target fault site increases the likelihood of detecting SDDs. However, using external automatic test equipment (ATE) to provide the fast clock signal [29] will increase the test cost considerably. The fast clock signal is therefore commonly obtained from an on-chip clock generation circuit. Ahmed et al. proposed to group the test patterns into subsets and determine an optimal test frequency for each [3]. Since their scheme does not increase the test frequency to an extent that any path exercised at the rated functional frequency may fail, no scan FF masking is needed. Nakamura et al. [30] proposed to use an on-chip clock chopper and gating logic to obtain a faster test clock at the cost of adding extra pins on the chip. In [31], McLaurin et al. proposed a clock control circuit using the phase-locked loop (PLL) clock to perform the faster-than-at-speed testing. Their scheme, however, is rather complex for realistic application because the PLL has to be reset for every pattern. Pei et al. [32] proposed to generate on-chip clocks during faster-than-at-speed testing by using two symmetric delay lines. Tayade et al. proposed an on-chip programmable capture generation circuit that generates a faster capture signal for faster-than-at-speed testing [33].
The above circuit designs for faster-than-at-speed testing are
not aging-resistant, which makes them unsuitable to be reused
in online aging prediction. Moreover, PV can also result in
large skews in the generated clock signals and thus may lead
to test escape or overtesting.
III. UNIFIED CLOCKING SCHEME
A. Top View
Fig. 1. (a) Faster-than-at-speed testing. (b) Online aging prediction. Both need to capture circuit response in advance of the functional clock. (Waveforms shown: FCLK, TCLK, and OUT, annotated with the launch edge, the capture edge, and the capture interval.)

Fig. 2. Framework of the proposed unified capture scheme. (Blocks shown: PCSGM, WMSM, CSM, aging sensors (AS), FFs, and the circuit under test/monitor; signals shown: FCLK, GSEN, SEL, CTRL, xCLK, and D.)

Faster-than-at-speed testing and online aging prediction have a common characteristic: both of them need to capture the circuit response ahead of the functional clock. As shown in Fig. 1(a), in faster-than-at-speed testing, after the test vector is applied to the circuit under test (CUT) by a launch clock, a faster capture clock is used to capture the test response. This clock reduces the timing slack of short paths, thereby improving the capability of SDD detection [6]. Similarly, when online aging prediction is performed, a capture interval is formed before the trigger edge of the functional clock, as shown in Fig. 1(b). It generally spans tens to hundreds of picoseconds. If a transition on the circuit output falls into the capture interval, it is captured by the aging sensor and recognized as an indication that circuit aging has exceeded the designated threshold.
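The effect of a faster capture clock on timing slack can be illustrated with a small calculation. The following is only a sketch under assumed numbers: the path delay, defect size, and clock periods are hypothetical and are not taken from this paper, and the function names are introduced here for illustration.

```python
def slack(clock_period_ps: float, path_delay_ps: float) -> float:
    """Timing slack of a path: time left between signal arrival and capture."""
    return clock_period_ps - path_delay_ps


def sdd_detected(clock_period_ps: float, path_delay_ps: float,
                 defect_delay_ps: float) -> bool:
    """A small delay defect is observed only if its extra delay exceeds the slack."""
    return defect_delay_ps > slack(clock_period_ps, path_delay_ps)


# Hypothetical values (not from the paper): a 400 ps short path that
# carries a 100 ps small delay defect.
FUNCTIONAL_PERIOD_PS = 1000.0
FASTER_PERIOD_PS = 480.0

# Under the functional clock the slack is 600 ps, so the defect stays hidden.
print(sdd_detected(FUNCTIONAL_PERIOD_PS, 400.0, 100.0))
# Under the faster capture clock the slack shrinks to 80 ps.
print(sdd_detected(FASTER_PERIOD_PS, 400.0, 100.0))
```

The same comparison, with the capture edge moved ahead of the functional clock edge, is what the capture interval in online aging prediction relies on.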
Based on the above observations, we propose a unified capture scheme that generates programmable clock signals to support both faster-than-at-speed testing and online aging prediction. The framework of the proposed unified capture scheme, which consists of several modules, is illustrated in Fig. 2. The PCSGM module generates programmable clock signals which can be used as launch and capture clocks (together denoted as xCLK) for faster-than-at-speed testing, as well as the control signal (CTRL) that forms the capture interval for online aging prediction. Two control signals, the global scan enable signal (GSEN) and SEL, are applied to the working mode selection module (WMSM) and the clock selection module (CSM). They are used to switch the working mode between faster-than-at-speed testing and online aging prediction as well as to select the corresponding clock signals. The AS unit is the aging sensor that generates the warning signal if a transition occurs within the specified capture interval.
B. Circuit Design Details
The circuit design details of the proposed unified capture scheme (which we call the capture circuit for short in the rest of this paper) are illustrated in Fig. 3. Some hardware circuits (shaded) can be reused in both the faster-than-at-speed testing and the online aging prediction. As shown in Table I, different combinations of GSEN and SEL determine the working mode in which the capture circuit operates. Generation of GSEN and SEL will be discussed in Section III-B5. Note that after power-on, GSEN and SEL both remain at logic 0, so the capture circuit first enters the idle mode. In this mode, the functional clock is applied to the combinational circuit (circuit under test/monitor in Fig. 2), while the capture circuit is in the aging-resistant state (as discussed in Section III-B4).
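The mode selection described above amounts to a two-bit lookup. A minimal sketch follows, with the entries transcribed from Table I; the names `MODE_TABLE` and `decode_mode` are introduced here for illustration and do not appear in the paper.

```python
from typing import Dict, Tuple

# (GSEN, SEL) -> (working mode, control function), transcribed from Table I.
MODE_TABLE: Dict[Tuple[int, int], Tuple[str, str]] = {
    (1, 1): ("faster-than-at-speed testing", "scan in or scan out"),
    (0, 1): ("faster-than-at-speed testing", "launch, capture"),
    (1, 0): ("aging prediction", "capture in advance"),
    (0, 0): ("idle", "functional clock"),
}


def decode_mode(gsen: int, sel: int) -> Tuple[str, str]:
    """Return the working mode and control function for a (GSEN, SEL) pair."""
    if gsen not in (0, 1) or sel not in (0, 1):
        raise ValueError("GSEN and SEL must each be 0 or 1")
    return MODE_TABLE[(gsen, sel)]


# After power-on both signals are at logic 0, which selects the idle mode.
print(decode_mode(0, 0))
```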
1) Generation of Programmable Clock Signal: Two falling transition signals, upper delay line (UDL) and lower delay line (LDL), are generated by the programmable delay submodule in PCSGM (shaded parts inside PCSGM). Unlike in prior work such as [32] and [33], the programmable delay submodule is asymmetric and operates under a well-designed control mode. This facilitates the transformation of UDL and LDL into the clock signals to be used either in faster-than-at-speed testing or in online aging prediction. The test clock frequency or the capture interval is decided by the delay difference between the opened delay stages in the programmable delay submodule.
The programmable delay submodule is divided into the upper delay part (UDP) and the lower delay part (LDP). Each delay part consists of multiple delay stages ($DU_i$ and $DL_i$). Each delay stage has one or more delay elements (DEs). The number of delay stages in UDP is higher than that in LDP. In any case, the delay of the opened delay stages in UDP is always smaller than that in LDP. Suppose the number of delay stages on UDL and LDL is $m$ and $n$, respectively, and the number of opened delay stages on UDL and LDL is $p$ and $q$, respectively. The propagation delays of a single DE on UDL and LDL are denoted as $TP_U$ and $TP_L$, respectively. The delay difference $D_R$ between UDL and LDL can be expressed as

$$D_R = \sum_{i=1}^{q} TP_{L(i)} - \sum_{j=1}^{p} TP_{U(j)} \quad (1 \le p \le m,\ 1 \le q \le n). \tag{1}$$
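The delay difference in (1) can be evaluated directly once the delays of the opened stages are known. The sketch below assumes that each list entry is the delay contributed by one opened stage; the numeric values are hypothetical and only illustrate the arithmetic, and the function name is introduced here.

```python
from typing import Sequence


def delay_difference(tp_lower: Sequence[float], tp_upper: Sequence[float],
                     q: int, p: int) -> float:
    """Evaluate (1): delay of q opened LDP stages minus delay of p opened UDP stages."""
    n, m = len(tp_lower), len(tp_upper)
    if not (1 <= p <= m and 1 <= q <= n):
        raise ValueError("require 1 <= p <= m and 1 <= q <= n")
    return sum(tp_lower[:q]) - sum(tp_upper[:p])


# Hypothetical stage delays in picoseconds (not from the paper).
tp_upper = [20.0, 20.0, 20.0, 20.0]   # m = 4 stages on UDL
tp_lower = [100.0, 100.0, 100.0]      # n = 3 stages on LDL

# q = 3 opened LDP stages and p = 2 opened UDP stages: 300 - 40 = 260 ps.
print(delay_difference(tp_lower, tp_upper, q=3, p=2))
```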
When the trigger signal IN is at logic 0, the control pMOS
P1 conducts while the middle nMOS in each nMOS stack shuts
down. In this case, UDL and LDL remain at logic 1. When
IN switches from logic 0 to logic 1, P1 shuts down. At
this time, if one of the FFs in UDP and LDP outputs logic
1, the corresponding nMOS stacks connected with these FFs
will conduct entirely. Consequently, UDL and LDL will switch
from logic 1 to logic 0. Obviously, a well-designed control
vector in the FFs in the programmable delay submodule can be
used to control which nMOS stacks in UDP and LDP should
conduct, i.e., it decides the numbers of opened delay stages
and in turn affects the timings of UDL and LDL. Generation
of this control vector will be discussed below.
When the chip is powered on, the unknown state of the scannable FFs in the programmable delay submodule may turn on the connected nMOS stacks, thus resulting in a short-circuit path between VDD and GND. There are two ways to cope with this problem. The first is to force SEL to logic 0 during power-on. Alternatively, all scannable FFs can be synchronously reset during power-on. Such resetting ensures that all of the scannable FFs output logic 0 and hence shut down all the nMOS stacks.
Fig. 3. Schematic diagram of capture circuit.

TABLE I
CONTROL FUNCTION OF GSEN AND SEL

Working mode                   GSEN   SEL   Control function
Faster-than-at-speed testing   1      1     Scan in or scan out
Faster-than-at-speed testing   0      1     Launch, capture
Aging prediction               1      0     Capture in advance
Idle                           0      0     Functional clock

The scannable FFs in the UDP and LDP are organized into one circular shift register (CSR). Before the capture circuit begins the operation of online aging prediction, GSEN and SEL both remain at logic 0. At this time, a well-designed control vector can be formed by setting or resetting the scannable FFs to different states (0 or 1) in the CSR. This control vector is used to control the number of opened delay stages in UDP and LDP. In each of the two segments of the control vector, the one lying in UDP and the one lying in LDP, only one bit is 1 while the other bits are all 0. Depending on the locations of bit 1 in the segments, the number of opened delay stages in UDP and LDP can be equal (in faster-than-at-speed testing) or different (in online aging prediction).
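The structure of the control vector described above can be sketched as two one-hot segments. In this illustration it is assumed that a 1 at position k of a segment corresponds to k opened delay stages and that the UDP segment precedes the LDP segment in the CSR; neither ordering detail is stated in the text, and the function names are hypothetical.

```python
from typing import List


def one_hot(length: int, position: int) -> List[int]:
    """Return a segment of `length` bits with a single 1 at `position` (1-based)."""
    if not 1 <= position <= length:
        raise ValueError("position must lie within the segment")
    return [1 if i == position else 0 for i in range(1, length + 1)]


def control_vector(m: int, n: int, p: int, q: int) -> List[int]:
    """Concatenate the UDP segment (m bits) and the LDP segment (n bits)."""
    return one_hot(m, p) + one_hot(n, q)


# Example with m = 6 UDP stages and n = 4 LDP stages.
# Equal positions, as in faster-than-at-speed testing:
print(control_vector(6, 4, p=3, q=3))
# Different positions, as in online aging prediction:
print(control_vector(6, 4, p=5, q=4))
```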
Note that, although UDL and LDL are both connected to the control pMOS P1, they do not interfere with each other during the operation of the capture circuit. LDL would be affected and switch to logic 0 only if UDL switched to logic 0 and remained in this state for a sufficiently long time or permanently, and vice versa. However, to generate a high clock frequency or a short capture interval, the time difference between the falling transitions of UDL and LDL is very small. Such a short time is not enough to cause interference between UDL and LDL.
2) Performing Faster-Than-at-Speed Testing: During faster-than-at-speed testing, the chip under test alternates between two modes. When GSEN and SEL both are at logic 1, the scan clock SCLK can be applied to the system clock tree to scan in the test vectors (from the scan input, SI) or scan out the test responses. When GSEN switches from logic 1 to logic 0 while SEL remains at logic 1, two falling transitions will be generated on UDL and LDL. These two falling transitions are fed into the faster-than-at-speed testing module circuit (FMC) unit (Fig. 3) and are transformed into the launch and capture clocks. The structure of the FMC unit and the corresponding waveform are shown in Fig. 4.

Fig. 4. FMC and its timing diagram.
3) Performing Online Aging Prediction: During the normal functional operation of the chip, SEL and GSEN both remain at logic 0. At this time, the functional clock FCLK is applied to the system clock tree. To perform online prediction, SEL remains at logic 0 while GSEN switches from logic 0 to logic 1. This triggers a rising transition on the input signal (IN) of the programmable delay submodule and a falling transition on each of the output signals UDL and LDL. UDL is inverted by INV1 (Fig. 3), and the inverted signal replaces FCLK as the input to the system clock tree in the aging prediction mode. LDL is NORed with the inverted UDL to generate a control signal CTRL. CTRL is fed into the aging sensors (AS in Fig. 2) and is used to form the capture interval in online aging prediction.

Fig. 5. Structure of the aging sensor.

Fig. 6. Timing diagram of online aging prediction.
In [10], we proposed a stability checker that can effectively monitor circuit aging online. However, the structure of the stability checker in [10] is complex. In this paper, with the support of PCSGM, we propose an aging sensor with a very simple structure, as shown in Fig. 5. The main body of the proposed aging sensor contains six transistors, compared to the eight-transistor configuration proposed in [8]. A warning signal will be generated if a transition on the combinational circuit output (D in Fig. 2) occurs within the capture interval. Fig. 6 illustrates the timing diagram of the online aging prediction. When LDL remains at logic 1, CTRL is at logic 0 correspondingly. At this time, P1 and P2 (Fig. 5) conduct to hold ALERT at logic 0 regardless of whether D has a transition or not. When LDL switches from logic 1 to logic 0, CTRL switches to logic 1, which shuts down P1 and P2, resulting in the conduction of N3 and N4. ALERT will switch from logic 0 to logic 1 if and only if D has a transition during the time that CTRL remains at logic 1 (i.e., the capture interval).
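The sensor behavior just described can be captured by a purely behavioral model. The sketch below is a discrete-time abstraction introduced here for illustration, not a transistor-level description: D and CTRL are assumed to be sampled at the same instants, and a transition counts as inside the capture interval when CTRL is 1 at both samples around it.

```python
from typing import Sequence


def alert(d_samples: Sequence[int], ctrl_samples: Sequence[int]) -> int:
    """Return 1 if D changes value while CTRL is 1, otherwise 0."""
    if len(d_samples) != len(ctrl_samples):
        raise ValueError("D and CTRL must have the same number of samples")
    for k in range(1, len(d_samples)):
        inside_interval = ctrl_samples[k - 1] == 1 and ctrl_samples[k] == 1
        if inside_interval and d_samples[k] != d_samples[k - 1]:
            return 1
    return 0


# D settles before CTRL rises: no warning.
print(alert([0, 1, 1, 1, 1], [0, 0, 1, 1, 0]))
# D switches late, inside the capture interval: warning.
print(alert([0, 0, 0, 1, 1], [0, 0, 1, 1, 0]))
```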
In the online aging prediction mode, SEL remains at logic
0, which prevents SCLK from applying to the FFs in LDP
while organizing the FFs in UDP into a new CSR (see the two
MUXes in the programmable delay submodule). Therefore, the
segment of the control vector that lies in the FFs in UDP can
be shifted circularly under the control of SCLK. This will alter
the opened delay stages in the UDP and in turn will alter the
capture interval in online aging prediction.
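The adjustment described above, in which only the UDP segment of the control vector is shifted, can be sketched as a circular rotation of a one-hot list. The shift direction and the function name are assumptions made for this illustration; the text only states that the UDP segment is shifted circularly under SCLK while the LDP segment is left unclocked.

```python
from typing import List, Tuple


def shift_udp_segment(udp: List[int], ldp: List[int]) -> Tuple[List[int], List[int]]:
    """Circularly shift the one-hot UDP segment by one position; keep LDP unchanged."""
    if sum(udp) != 1 or sum(ldp) != 1:
        raise ValueError("each segment must contain exactly one 1")
    return udp[-1:] + udp[:-1], list(ldp)


udp_segment = [0, 0, 1, 0, 0, 0]   # hypothetical initial UDP setting
ldp_segment = [0, 0, 0, 1]         # LDP setting, not clocked in this mode

# One SCLK pulse moves the 1 in the UDP segment by one position, which
# changes the opened UDP delay stage and therefore the capture interval.
udp_segment, ldp_segment = shift_udp_segment(udp_segment, ldp_segment)
print(udp_segment, ldp_segment)
```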
4) Aging-Resistant Design Concern: The hardware circuit used in online aging prediction should be aging-resistant to minimize the drift in the capture interval during runtime usage. In this paper, we concentrate on making the proposed capture circuit resistant to the BTI effects, including NBTI and PBTI. NBTI has been widely recognized as a major aging mechanism in poly-gate CMOS circuits, while PBTI is found to be dominant in high-k processes. Of course, there are still some other aging effects that will degrade the proposed capture circuit. However, by making the capture circuit BTI-resistant, we believe it has a good chance of maintaining its operation accuracy at runtime.

Fig. 7. Structure of the DE.

An intuitive way to stop the $V_{th}$ degradation is to prevent pMOS and nMOS transistors from being in the biased (stress) state.
First, let us look at the aging sensor. The NBTI-induced degradation on P1 and P2 (Fig. 5) is not a problem because P1 and P2 have already shut down every time the aging prediction begins. When the capture circuit steps into the idle mode, CTRL remains at logic 0. N3 and N4 therefore receive a logic 0 input and will not suffer from PBTI-induced degradation. N1 and N2 inevitably suffer from PBTI. However, no serious PBTI-induced degradation is expected since the inputs of N1 and N2 depend on the output of the combinational circuit (D in Fig. 3), which usually switches between logic 1 and 0.
The inverter INV1 (Fig. 3) does not suffer from NBTI-
induced degradation when the capture circuit is in idle mode,
because the input of INV1 (i.e., UDL) constantly remains at
logic 1. For NOR1 in Fig. 3, the input of the upper pMOS
(i.e., LDL) in the stacked pMOSs also remains at logic 1.
Due to the stack effect, NOR1 does not suffer from the NBTI
effect regardless of whether the input of the lower pMOS is
at logic 0 or not.
The UDP and LDP consist of multiple delay stages ($DU_i$ and $DL_i$). Each delay stage has one or more DEs. Fig. 7 illustrates the BTI-resistant DE. Two extra pMOSs (CP1 and CP2) and nMOSs (CN1 and CN2) are added to each inverter in the DE and are controlled by a global control signal CNTL. CNTL can be obtained by NORing GSEN and SEL. During the faster-than-at-speed testing or the online aging prediction, CNTL remains at logic 0, which results in the conduction of CP1 and CP2 while shutting down CN1 and CN2. At this time, the DE works as a traditional delay buffer. When the capture circuit is in idle mode, CNTL switches to logic 1, so that CP1 and CP2 shut down while CN1 and CN2 conduct. Consequently, the nodes k and OUT are kept at logic 0. This results in $V_{gs} = 0$ for the pMOSs (CP1, CP2, P1, and P2) and avoids the NBTI-induced degradation on these pMOSs. Similarly, the inputs of nMOSs N1 and N2 remain at logic 0 because IN and the node k remain at logic 0. This also results in $V_{gs} = 0$ for N1 and N2, preventing them from being in the stress state (positively biased). CN1 and CN2 will suffer from PBTI-induced degradation. However, they are already shut down before the capture circuit performs online aging prediction. Hence degradation of CN1 and CN2 will not affect the timing of the DE. From the above discussion, the DE can simultaneously resist NBTI and PBTI effects during the idle period.
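The role of CNTL can be checked against the four working modes with a few lines of logic. This is a behavioral sketch only; the function names and the dictionary layout are introduced here for illustration, while the on/off states follow the description above.

```python
from typing import Dict


def cntl(gsen: int, sel: int) -> int:
    """CNTL is the NOR of GSEN and SEL."""
    return int(not (gsen or sel))


def de_control_state(cntl_value: int) -> Dict[str, str]:
    """Conduction state of the added control transistors in a DE."""
    if cntl_value == 1:
        # Idle mode: CP1/CP2 off, CN1/CN2 on, nodes k and OUT held at logic 0.
        return {"CP1_CP2": "off", "CN1_CN2": "on"}
    # Testing or aging prediction: the DE behaves as a plain delay buffer.
    return {"CP1_CP2": "on", "CN1_CN2": "off"}


for gsen, sel in [(0, 0), (1, 0), (0, 1), (1, 1)]:
    print(gsen, sel, cntl(gsen, sel), de_control_state(cntl(gsen, sel)))
```

Only the idle combination (GSEN = 0, SEL = 0) yields CNTL = 1, so the stress-free state is entered exactly when the capture circuit is not in use.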
In addition to the DE, let us consider the control pMOS P1
in Fig. 3, which also suffers from NBTI-induced degradation.
However, since P1 always shuts down (IN switches to logic
1) during the generation of UDL and LDL, the NBTI-
induced degradation on P1 does not affect the timing of the
generated clock signal either.
Finally, let us look at the nMOS stacks in the programmable delay submodule (Fig. 3). The nMOSs are not affected by NBTI. When the capture circuit steps into the idle mode, IN stays at logic 0. As described above, every DE in the delay submodule also outputs logic 0. In this case, all the middle nMOSs in the nMOS stacks receive a logic 0 input. Each segment of the control vector stored in the scannable FFs in UDP and LDP has only one bit at 1 while the other bits are all 0. This means that the inputs of all three nMOSs in most nMOS stacks remain at logic 0, thus preventing them from being in the positively biased state and avoiding PBTI-induced degradation.
In summary, the proposed capture circuit is BTI-resistant during the long idle period. Of course, it is impossible to completely eliminate aging-induced degradation of the capture circuit, especially when it is functioning. Fortunately, the capture circuit can dynamically adjust the capture interval at runtime. This further improves the aging-resistant capability of the capture circuit because the capture interval can be adjusted back to the predefined value when the capture circuit has aged to some extent (as illustrated in Section V-C).
5) Implementation Problems: Here we discuss the implementation problems related to the practical operation of the capture circuit. One is how to generate and reuse the two control signals GSEN and SEL. The other is the test generation issue for faster-than-at-speed testing.
GSEN and SEL signals can be provided by the external ATE during faster-than-at-speed testing. When the chip starts service in the field, these two signals can be reused to switch the capture circuit between the idle and online aging prediction modes. We propose two simple schemes for generating GSEN and SEL in the field. First, a small off-chip MCU such as an 8051 can be used to generate GSEN and SEL and program their timings. Such an MCU usually costs less than US$1 and occupies a small area of the board. The second way to generate GSEN and SEL is to construct a very simple finite state machine (FSM) on-chip. The FSM provides only three states for the capture circuit, namely, the idle mode, the set/reset mode, and the online aging prediction mode. The set/reset mode means that before starting the operation of online aging prediction, the scannable FFs in the programmable delay submodule are first forced into a certain state (i.e., the control vector is formed correspondingly).
Faster-than-at-speed testing certainly needs test vectors to sensitize the path under test. However, circuit design to produce a fast test clock and test generation to obtain the test vectors are two orthogonal issues and can be implemented relatively independently. This paper mainly concentrates on circuit design to generate a higher test clock frequency. How to generate the test vectors is not our major concern. The reason is that generating a set of test vectors that can be effectively applied in faster-than-at-speed testing requires tackling many problems, such as test path selection, test vector grouping, long path masking, and IR drop reduction. Tackling these problems needs extra effort and can be dealt with in a separate paper. The interested reader can refer to the relevant literature, such as [3].
Fortunately, circuit design for implementing faster-than-at-
speed testing is more or less independent of the test generation.
As long as the designed test circuit can support the common
modes in scan-based delay testing, such as launch-off-capture
(LOC) or launch-off-shift (LOS), it can seamlessly combine
with the generated test vectors and perform faster-than-at-
speed testing.
IV. PV-RESILIENT DESIGN
PV-induced parameter variabilities can manifest themselves across several dies (die-to-die) or within a single die (within-die). Within-die variation can be further divided into systematic and random components. Systematic variation is mainly caused by subwavelength lithography and line-edge roughness, while the random variation arises from the fluctuation in oxide thickness and random dopant fluctuation (RDF). In earlier processes, die-to-die variations dominate the parameter variability. However, with the continuous shrinking in the feature size, within-die variations become more and more prominent.
In this paper, we exploit the RSCE to reduce the delay sensi-
tivity of the circuit to tackle the PV effect. Exploiting RSCE to
resist PV was first proposed in [34]. For nanoscale processes,
the threshold voltage initially increases as the channel length
is made shorter, and then it decreases. Therefore, within the
RSCE range, the threshold voltage decreases with the increase
of channel length.
Assigning the channel length of manufactured transistors
into the RSCE range will improve the PV resilience of the
circuit. For example, the propagation delay $T_p$ of an inverter
can be expressed as
$$T_p \propto \frac{V_{dd}\,C_{gate}}{K_n (V_{dd} - V_{tn})^2 + K_p (V_{dd} - V_{tp})^2} \qquad (2)$$
where $V_{dd}$ is the supply voltage, $C_{gate}$ is the gate capacitance,
$V_{tn}$ and $V_{tp}$ are the threshold voltages of nMOS and pMOS,
respectively, and $K_n$ and $K_p$ are the gain factors of the nMOS
and pMOS, respectively.
When the channel length of the designed transistor falls into
the RSCE range, with the reduction of channel length, the
squared gate overdrive voltages $(V_{dd} - V_{tn})^2$ and $(V_{dd} - V_{tp})^2$
in (2) decrease while the gain factors $K_n$ and $K_p$ increase.
Therefore, $T_p$ is less sensitive to the variation of channel
length under PV because the gate overdrive voltages and the
gain factors always move in opposite directions.
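The compensating effect can be illustrated numerically. The following is a minimal sketch rather than the model used in this paper: it evaluates (2) with identical nMOS and pMOS parameters and assumes a fixed transistor width, so that the gain factor scales as 1/L and the gate capacitance as L. The linear threshold-voltage slope, the supply voltage, and the function names are hypothetical values chosen only for illustration.

```python
def inverter_delay(length, vth_slope, vdd=1.0, vth0=0.3):
    """Relative T_p from Eq. (2) for a normalized channel length.

    Toy assumptions (not from the paper): K_n = K_p = 1/length,
    C_gate = length, V_tn = V_tp = vth0 + vth_slope * (length - 1).
    """
    vth = vth0 + vth_slope * (length - 1.0)
    gain = 1.0 / length
    c_gate = length
    return vdd * c_gate / (2.0 * gain * (vdd - vth) ** 2)


def delay_sensitivity(vth_slope, delta=0.05):
    """Relative delay swing for a +/- delta change in channel length."""
    nominal = inverter_delay(1.0, vth_slope)
    longer = inverter_delay(1.0 + delta, vth_slope)
    shorter = inverter_delay(1.0 - delta, vth_slope)
    return (longer - shorter) / nominal


# Inside the RSCE range, V_th falls as the channel gets longer (negative slope).
inside_rsce = delay_sensitivity(vth_slope=-0.5)
# Outside the range, V_th rises with channel length (positive slope).
outside_rsce = delay_sensitivity(vth_slope=0.5)
print(f"inside RSCE: {inside_rsce:.3f}, outside RSCE: {outside_rsce:.3f}")
```

Under these assumptions, the delay swing for the same change in channel length is several times smaller when the threshold voltage and the gain factor move against each other.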
TABLE II
CONFIGURATION OF PROGRAMMABLE DELAY SUBMODULE

       Delay of each stage (ps)
       First  Second  Third  Fourth  Fifth  Sixth
UDP    30     60      90     120     150    180
TC     120    240     480    960     N/A    N/A
BCI    N/A    N/A     990    960     930    900
CI     N/A    N/A     10     40      70     100
LDP    150    300     570    1080    N/A    N/A

TC: test clock period. BCI: bound of the capture interval. CI: capture interval.
Fig. 8. Simulation waveform for online aging prediction.
V. VERIFICATION AND DISCUSSION
The proposed capture circuit is simulated in HSPICE
under the PTM 65-nm technology node [35] and EPFL-EKV
MOSFET model (for RSCE simulation). The functional clock
frequency is assumed to be 1 GHz.
Table II lists the configuration of the programmable delay
submodule. UDP and LDP have six and four delay stages,
respectively. The test clock period in faster-than-at-speed testing
(TC) is decided by the delay difference between the first
four stages in UDP and LDP, while the capture interval in
online aging prediction (CI) is decided by the delay difference
between stages 3 to 6 in UDP and the last stage in LDP.
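The derived rows of Table II can be reproduced from the UDP and LDP rows. The sketch below is one reading that is consistent with the tabulated values; the variable names are ours, and the relation CI = functional clock period minus BCI is inferred from the table rather than stated explicitly.

```python
FUNCTIONAL_PERIOD_PS = 1000  # 1-GHz functional clock

# Stage delays (ps) as listed in Table II.
UDP_PS = [30, 60, 90, 120, 150, 180]
LDP_PS = [150, 300, 570, 1080]

# Test clock period: LDP minus UDP delay for each of the first four stages.
test_clock_ps = [ldp - udp for udp, ldp in zip(UDP_PS, LDP_PS)]

# Bound of the capture interval: last LDP stage minus UDP stages 3 to 6.
bound_ps = [LDP_PS[-1] - udp for udp in UDP_PS[2:]]

# Capture interval: distance from the bound to the functional clock edge.
capture_interval_ps = [FUNCTIONAL_PERIOD_PS - b for b in bound_ps]

print(test_clock_ps)        # [120, 240, 480, 960]
print(bound_ps)             # [990, 960, 930, 900]
print(capture_interval_ps)  # [10, 40, 70, 100]
```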
As shown in Table II, the minimum achievable test
clock period in faster-than-at-speed testing is
120 ps. The minimum test clock period can be further
reduced by decreasing the delay difference between UDP and
LDP. However, faster-than-at-speed testing generally cannot
be performed at such high test clock frequencies because the
induced test power may cause the chip under test to fail. Therefore,
the high test clock frequencies achieved in the simulation
serve only to demonstrate that the proposed capture circuit
can potentially provide a very high test clock frequency for
SDD detection.
Under the assumption that the timing margin is 10% of the
functional clock period (i.e., 100 ps), four capture
intervals can be achieved in online aging prediction (row CI
in Table II). Note that the value of the preserved timing
margin (5%, 10%, or 15%) does not directly affect the
operation of the capture circuit. It only affects the critical
paths identified by the timing analysis procedure and hence
determines the number of aging sensors to be inserted or the
paths to be tested.
Fig. 9. Simulated waveform for faster-than-at-speed testing.
A. Verification
1) Online Aging Prediction: The proposed unified capture
circuit is first exploited to perform online aging prediction.
Fig. 8 illustrates the corresponding simulation waveform. The
capture interval is set to 70 ps (i.e., the bound of this capture
interval is at 930 ps). As shown in Fig. 8, ALERT, the output
of the aging sensor, remains at logic 0 even though there are
two transitions on D occurring outside the capture interval.
When the third transition occurs inside the capture interval,
the aging sensor generates a rising transition (ALERT 0 → 1)
as the warning signal.
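The behavior observed in Fig. 8 can be summarized by a simple behavioral model. This is an abstraction for illustration only, not the transistor-level sensor: the function name, the time reference (the previous functional clock edge), and the example transition times are assumptions.

```python
def alert_raised(transition_times_ps, bound_ps=930, period_ps=1000):
    """Return True if any transition on D falls inside the capture interval.

    Times are measured from the previous functional clock edge. The capture
    interval is modeled as [bound_ps, period_ps).
    """
    return any(bound_ps <= t < period_ps for t in transition_times_ps)


# Hypothetical transition times: the first two fall outside the interval.
print(alert_raised([400, 850]))       # False
print(alert_raised([400, 850, 950]))  # True
```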
2) Faster-Than-at-Speed Testing: In this paper, we leverage
the last transition generation (LTG) circuit proposed in [36] to
perform faster-than-at-speed testing, which supports both LOC
and LOS schemes [37]. The LTG circuit can be embedded
into a scan chain to provide a fast local scan enable (LSEN)
signal, which obviates the need for a timing-critical global scan
enable signal (GSEN). The details of LTG circuit can be seen
in [36].
Fig. 9 illustrates the HSPICE simulation waveform for
faster-than-at-speed testing using the LOS scheme. The sim-
ulation waveform for the LOC scheme is not presented here
because it is very similar to that of LOS except LSEN can
be de-asserted asynchronously. The test clock period is set to
480 ps. As shown in Fig. 9, during the scan-in phase, SCLK is
applied to the clock tree to shift the test vector. GSEN is then
de-asserted to enable the generation of launch and capture
clocks using the programmable delay submodule. After the
launch operation, LSEN is quickly de-asserted to switch the CUT from
the scan-in mode into the functional mode. After capturing
the circuit response, LSEN is asserted to logic 1 again and
SCLK is applied to the clock tree to scan out the test response.
Fig. 10. RSCE range for 65-nm technology.
To improve the test coverage, a hybrid testing scheme
(LOC + LOS) can be exploited in the faster-than-at-speed
testing. As the results in [38] show, hybrid application of
LOC and LOS schemes can provide high test coverage with
reasonable hardware overhead and test vector magnitude.
B. Resilience to PV
In this subsection, we first identify the optimal transistor
channel length from the RSCE range for the proposed capture
circuit. Then we show the resilience of the capture circuit to
PV in both cases that perform online aging prediction and
faster-than-at-speed testing.
1) RSCE Range Identification: We perform HSPICE simulation
to identify the RSCE ranges for nMOS and pMOS at the
65-nm technology node. Simulation is iteratively performed
for 160 runs. In the simulation, threshold voltages of nMOS
and pMOS with respect to the channel length are evaluated
by varying the channel length from 1× (scale = 1) to 20×
(scale = 20) the baseline length (65 nm) in steps of 0.1.
Because $V_{th}$ remains stable when the scale is larger than 10,
we just show the results for 1× to 16× channel lengths in
Fig. 10. As shown in the figure, the RSCE ranges for both
nMOS and pMOS span from 1.4× to 6× the baseline length.
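Given a sweep of threshold voltage against channel-length scale, the RSCE range is the interval over which the threshold voltage falls as the channel gets longer. A minimal sketch of this extraction step follows; the function name is ours and the sample values are invented for illustration, not read from Fig. 10.

```python
def rsce_range(scales, vth):
    """Return (start, end) scale of the widest run where V_th falls as length grows."""
    best = None
    start = None
    for i in range(1, len(scales) + 1):
        falling = i < len(scales) and vth[i] < vth[i - 1]
        if falling:
            if start is None:
                start = i - 1
        elif start is not None:
            run = (scales[start], scales[i - 1])
            if best is None or run[1] - run[0] > best[1] - best[0]:
                best = run
            start = None
    return best


# Hypothetical sweep: V_th rises up to 1.4x, falls until 6x, then stays flat.
scales = [1.0, 1.4, 2.0, 4.0, 6.0, 8.0]
vth = [0.30, 0.34, 0.33, 0.31, 0.29, 0.29]
print(rsce_range(scales, vth))  # (1.4, 6.0)
```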
To choose the optimal channel length of the transistor from
the RSCE range, we evaluate the delay variations for different
delay elements within the RSCE range using HSPICE
Monte Carlo simulation. Propagation delays of the three kinds
of delay element are 30, 60, and 90 ps, respectively. A total
standard variance of 10% is assumed from the variations in
channel length and oxide thickness under PV. As shown in
Fig. 11, the minimum delay variations of the three kinds of
delay element in the RSCE range are at scales of 5, 1.8, and 2.
However, choosing the 5× baseline channel length for the
30-ps delay element will result in a large area overhead
(assuming W/L = 10). Therefore, under the pessimistic
assumption that the 3σ of parameter variation under PV is
45%, to minimize the area overhead, 130 nm is chosen as the
channel length of the transistor (i.e., 2× the baseline length
in Fig. 10).
The RSCE range of the transistor in a practical process may
be different from our HSPICE simulation results. However, it
does not affect the correctness of the proposed PV-resilient
method. The designer just needs to choose the transistor
channel length from the practical RSCE range provided by
the foundry.
Fig. 11. Delay variations within the RSCE range.
Fig. 12. Distribution of capture interval bound under PV.
2) Restricting Variation in Capture Interval: We realize
two versions of the proposed capture circuit for online aging
prediction by assigning the transistor channel length inside
(130 nm) and outside (65 nm) the RSCE range, respectively.
In both cases, the W/L ratios for pMOS and nMOS are set
to 10 and 5, respectively.
For clarity, we use variation in the capture interval bound
to represent the variation in the capture interval. To obtain
the distribution of capture interval bound under PV, HSPICE
Monte Carlo simulations are performed for 250 runs for each
of these two cases. The capture interval bound is set at 930 ps.
A total standard variance of 20% is assumed from the channel
length, oxide thickness, and RDF. It is divided into systematic
variance of 10% and random variance of 17%. The random
variance is further equally divided into two components (12%
for each one) corresponding to the variations from oxide
thickness and RDF. Unlike the channel length and the oxide
thickness, it is not easy to simulate RDF variation in HSPICE.
Hence we transform the RDF effect into $V_{th}$ variation in
HSPICE because RDF mainly affects $V_{th}$ of the transistor.
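The quoted variance budget is numerically consistent if the components are assumed to be independent and therefore combine as a root sum of squares. The text does not state this combination rule explicitly, so the sketch below should be read as a plausibility check under that assumption.

```python
import math


def combine_rss(*sigmas_percent):
    """Combine independent variation components by root sum of squares."""
    return math.sqrt(sum(s * s for s in sigmas_percent))


random_sigma = combine_rss(12, 12)           # oxide thickness and RDF
total_sigma = combine_rss(10, random_sigma)  # systematic plus random
print(round(random_sigma, 1), round(total_sigma, 1))  # 17.0 19.7
```

The results round to the 17% random and 20% total figures used in the simulation setup.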
As shown in Fig. 12, the variation is significantly restricted
by choosing the channel length of the transistor as 130 nm (inside
the RSCE range). The left histogram and the fitted Gaussian
distribution curve show that most samples of the capture interval
bound lie close to 930 ps. On the contrary, as
shown in the right histogram, the samples of the capture interval
bound in the 65-nm design (outside the RSCE range) have a wide
distribution under PV.
TABLE III
VARIATION IN TEST CLOCK PERIOD

Channel length = 130 nm (inside RSCE range)
Norm. (ps)     960      %      480      %      240      %      120      %
Mean        968.21          474.52          235.15          113.20
1σ           18.28   1.89    13.90   2.93     9.43   4.01     5.82   5.14
AD           14.21   1.47     9.63   2.03     7.67   3.26     4.62   4.08

Channel length = 65 nm (outside RSCE range)
Norm. (ps)     960      %      480      %      240      %      120      %
Mean        918.44          452.79          228.65          109.89
1σ          217.65  23.70    93.26  20.60   164.87  72.10    61.28  55.76
AD           92.14  10.03    43.69   9.65    46.99  20.55    31.68  28.83

Norm: nominal clock period; Mean: mean of sampled clock period; %: percentage of mean; AD: average deviation.
In [8], under the assumption that the 3σ variation under the PV
effect is 30%, the maximum variation in the guardband bound
(i.e., capture interval bound in this paper) reaches 11.4%,
while the maximum variation in the capture interval bound in
this paper is about 7.3% by choosing the 130-nm channel length,
even under the pessimistic assumption that the 3σ of variation
is 60%. Therefore, choosing the transistor
channel length from the RSCE range significantly improves
the resilience of our circuit to PV.
3) Restricting Skew in the Test Clock: By choosing the
channel length of transistor from the RSCE range, skew in the
generated test clock in faster-than-at-speed testing under PV
is also reduced considerably. Table III lists the simulated test
clock periods and their distributions under PV by assigning
the channel lengths of transistor inside (130 nm) and outside
(65 nm) the RSCE range. Parameter variation resulting from
PV effect is assumed to be the same as in the case in
Section V-B2.
As shown in Table III, the 1σ and average deviation of the
test clock period in the 130-nm design are much less than
those in the 65-nm design. In the 130-nm design, the relative
variation decreases as the test clock period increases. This is
because a larger clock period needs more opened delay stages
(i.e., more delay elements). In this case, the random variations
on each delay stage will partially cancel each other. Moreover,
from the row "Mean" we can see that the mean of the simulated
clock period in the 130-nm design has only a small error
compared with the designated value, while the 65-nm design
has a large deviation.
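The percentages discussed above follow directly from the entries of Table III. The sketch below recomputes the relative 1σ variation and the error of the mean against the nominal period; the dictionary layout and function names are ours.

```python
NOMINAL_PS = [960, 480, 240, 120]

# Mean and 1-sigma values (ps) copied from Table III.
TABLE_III = {
    "130 nm": {"mean": [968.21, 474.52, 235.15, 113.20],
               "sigma": [18.28, 13.90, 9.43, 5.82]},
    "65 nm": {"mean": [918.44, 452.79, 228.65, 109.89],
              "sigma": [217.65, 93.26, 164.87, 61.28]},
}


def relative_sigma_percent(design):
    """1-sigma deviation as a percentage of the sampled mean."""
    row = TABLE_III[design]
    return [100.0 * s / m for s, m in zip(row["sigma"], row["mean"])]


def mean_error_percent(design):
    """Signed error of the sampled mean relative to the nominal period."""
    row = TABLE_III[design]
    return [100.0 * (m - n) / n for m, n in zip(row["mean"], NOMINAL_PS)]


for design in TABLE_III:
    print(design,
          [round(v, 2) for v in relative_sigma_percent(design)],
          [round(v, 2) for v in mean_error_percent(design)])
```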
C. Resilience to Intrinsic Aging-Induced Degradation
1) Minimizing Drift in the Capture Interval: While
performing online aging prediction, the capture circuit
still suffers from aging effects. We obtain the drift in the
capture interval bound under the NBTI effect by HSPICE simulation
with its MOSFET model reliability analysis (MOSRA)
module. In fact, it is not necessary to continuously monitor the
change in circuit delay during the whole lifetime because
circuit aging is a gradual process. Hence, as in [8],
the time for performing aging prediction is assumed to be 10%
of the total operational time of the chip. The capture interval
bound is set at 930 ps. The average working temperature
is assumed to be 375 K. Moreover, we perform the same
MOSRA simulation on a modified design for online aging prediction
that uses a traditional delay buffer to construct the asymmetric
delay submodule, i.e., the asymmetric delay submodule in this
case is not NBTI-resistant (we call this design the traditional
design). Drifts in the capture interval under the 10-year NBTI effect
for these two cases are illustrated in Fig. 13.
Fig. 13. Drift in capture interval bound under NBTI effect.
As shown in Fig. 13, after 10 years of usage, drift in the
capture interval bound in the proposed design is about 30 ps
while the traditional design has a drift of about 70 ps. For
the 10-year usage time, the capture interval bound of the
traditional design under NBTI effect is even beyond the trigger
edge of the functional clock and thus will lead to aging sensor
invalidity.
Since HSPICE only supports NBTI simulation, we do not
evaluate PBTI-induced degradation on the capture circuit in
this paper. However, as shown in the results in Fig. 13, the
aging-resistant design of the capture circuit can effectively
reduce NBTI-induced degradation by preventing pMOSs in
the circuit from being in the negatively biased state. Hence
it is reasonable to expect that this aging-resistant design is
also effective in relieving PBTI-induced degradation on the
capture circuit by preventing nMOSs in the circuit from being
in the positively biased state.
2) Compensating Drift in the Capture Interval: Obviously,
under the intrinsic aging effects, prediction accuracy of the
aging sensor proposed in [8] will be affected by the gradual
drift in guardband bound, because the guardband cannot be
adjusted after fabrication. However, the proposed clocking
scheme in this paper can compensate the drift in the capture
interval bound by dynamically adjusting the capture interval
at runtime. For example, as shown in Fig. 13, if the drift in
the capture interval bound reaches 30 ps, the capture interval
bound can be adjusted to 900 ps. This still ensures that the
realistic capture interval bound is at 930 ps. This self-adjusting
capability improves the aging resistance of our scheme and
ensures the prediction accuracy correspondingly.
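The compensation arithmetic can be written down in a few lines. This is a sketch under the assumption that aging shifts the bound later in the cycle by the drift amount, as the results in Fig. 13 suggest; the function names are hypothetical.

```python
PERIOD_PS = 1000  # functional clock period at 1 GHz


def effective_bound(programmed_bound_ps, drift_ps):
    """Bound actually realized after aging shifts it toward the clock edge."""
    return programmed_bound_ps + drift_ps


def compensated_setting(target_bound_ps, drift_ps):
    """Bound to program so that the realized bound equals the target."""
    return target_bound_ps - drift_ps


def sensor_valid(bound_ps, period_ps=PERIOD_PS):
    """The sensor is only useful while the bound precedes the clock edge."""
    return bound_ps < period_ps


setting = compensated_setting(930, 30)
print(setting, effective_bound(setting, 30))   # 900 930
print(sensor_valid(effective_bound(930, 70)))  # False
```

The second check mirrors the traditional design, where an uncompensated drift of about 70 ps pushes the bound to the trigger edge of the functional clock.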
D. Analysis on Capture Resolution
Similar to the traditional FF, setup and hold time constraints
are also important for the proposed capture circuit, especially
for the aging sensor. Violation of the setup and hold time
constraints will degrade the effectiveness of the capture circuit.
If the transition output by the combinational circuit is very
close to either the rising or the falling edge of the capture
interval, a false alarm or missed detection will occur.
Fig. 14. Extending the capture interval to improve the capture resolution.
In this paper, we exploit the countermeasure proposed in [8]
to overcome this problem. As shown in Fig. 14, a small margin
is attached to the original capture interval to form an extended
capture interval. In this case, if a transition is very close to the
bound of the extended capture interval, it may not be detected
by the capture circuit. However, this does not affect the capture
result because this transition originally lies outside the original
capture interval. Contrarily, any transition occurring inside the
original capture interval will be reliably captured. To achieve
this, it is essential to add a proper margin to the original
capture interval. We perform HSPICE simulation to determine
the extra margin. In the simulation, we intentionally generate a
set of transitions at different distances (from near to far)
from the bound of the original capture interval and launch the
capture operation. From the capture results, we found that all
the transitions that occurred 10 to 11 ps before the capture interval
bound could be reliably captured. Consequently, the added
margin to the capture interval is set to 12 ps for the capture
circuit. Compared with the added margin in [8] (20 ps), our
capture circuit needs a smaller extended capture interval and
thus can provide higher capture resolution.
E. Impacts of Temperature Variation and IR Drop
Besides PV and aging effects, temperature fluctuation and
IR drop are both common variation sources at runtime. Delay
variations resulting from these two sources may affect the
circuit under test/monitor and the capture circuit, thus affecting
the capture results. Nevertheless, we argue that treating the
operational environment experienced by the capture circuit as
the same as that of the circuit under test/monitor may be
overly pessimistic. It is reasonable to say that the delay variations
caused by temperature fluctuation or IR drop in the two circuits
should not be equal because they are located far from each other
(except the embedded aging sensor). For these reasons, we next discuss
the impact of temperature fluctuation on online aging
prediction and of IR drop on faster-than-at-speed testing.
Temperature variation will usually affect the timings of
the signals (i.e., UDL and LDL) generated by the capture
circuit. We evaluate the capture intervals formed by the capture
circuit under different temperatures. The target capture interval
is still set to 70 ps. The evaluation results are shown in
Table IV. From the evaluation results, we can see that, when
the temperature increases, the duration of the formed capture
interval decreases. The reason is that the timings of UDL
and LDL both become slower as the temperature increases.
Since the number of delay elements in LDP is larger than
that in UDP (see Table II), variation in LDL is larger than in
UDL. In such a case, the bound of the formed capture interval
moves toward the right (see Fig. 6), thus decreasing the
duration of the capture interval. We argue that a smaller capture
interval is not a serious problem for online
aging prediction. A smaller capture interval just means that
the combinational circuit is permitted to be a little more aged.

TABLE IV
CAPTURE INTERVALS UNDER DIFFERENT TEMPERATURES

Temperature (K)        348    368    388    408
Capture interval (ps)  69.72  61.03  50.89  41.11
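The trend in Table IV can be quantified as a shrink rate per kelvin. The sketch below only post-processes the tabulated values; the function name is ours, and treating the trend as roughly linear over this range is an assumption.

```python
TEMPERATURE_K = [348, 368, 388, 408]
CAPTURE_INTERVAL_PS = [69.72, 61.03, 50.89, 41.11]


def shrink_rates(temps, intervals):
    """Change in capture interval per kelvin between adjacent table entries."""
    return [
        (intervals[i + 1] - intervals[i]) / (temps[i + 1] - temps[i])
        for i in range(len(temps) - 1)
    ]


rates = shrink_rates(TEMPERATURE_K, CAPTURE_INTERVAL_PS)
average_rate = (CAPTURE_INTERVAL_PS[-1] - CAPTURE_INTERVAL_PS[0]) / (
    TEMPERATURE_K[-1] - TEMPERATURE_K[0]
)
print([round(r, 3) for r in rates])
print(round(average_rate, 3))
```

Over the tabulated range the capture interval shrinks by a little under 0.5 ps per kelvin.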
In faster-than-at-speed testing, application of the test vectors
may cause an IR-drop-induced delay variation, thus affecting
the test efficiency. However, effectively solving this problem
mainly relies on the test generation. As in [3], by carefully
grouping the test patterns into multiple subsets with close
path delay distributions and determining an optimal test frequency,
performance degradation due to IR-drop effects in
faster-than-at-speed testing can be mitigated effectively. We think the
scheme proposed in [3] can be combined with our capture
circuit to tackle the power supply noise problem.
F. Overhead Evaluation and Comparison
1) Area Overhead: To evaluate the area overhead, the
proposed unified capture circuit is integrated into several large
ISCAS89 benchmark circuits. A key problem is to decide how
many aging sensors should be embedded into the FFs in the
original circuit. Under the assumption that path delay can be
increased up to 20% under 10-year NBTI effect, statistical
static timing analysis (SSTA) is performed to pick out the FF
if the statistical timing slack (μ + 1σ) in one of its fan-in paths
is less than 20%.
Moreover, for comparison, we also separately realize the
circuit in [33] for faster-than-at-speed testing and the circuit
in [8] for online aging prediction. The achievable test
clock frequency and the capture interval are the same as ours.
Then we combine them and integrate them into the
benchmark circuits as well (which we call [9]+[2] in the rest
of this paper). Depending on whether the delay element in the
aging sensor is shared among multiple stability checkers,
the [9]+[2] circuit is further classified into two cases.
In case 1 ([9]+[2]:1), each delay element corresponds to a
stability checker in the aging sensor. In case 2 ([9]+[2]:2),
one delay element is shared among four stability checkers, as
in [8]. Finally, the area overheads for the two cases in [9]+[2]
and our scheme are evaluated by ABC [39], a synthesis tool
published by U. C. Berkeley.
Column 2 in Table V lists the number of aging sensors
needed to monitor the FFs. Columns 3 to 5
list the corresponding area overheads for the three cases.
For the three benchmark circuits, the area overheads in our
scheme are less than those in the [9]+[2] circuit. Although the
area overhead in [9]+[2]:2 is smaller than that in [9]+[2]:1
owing to the shared delay element, it is still larger than ours.
This demonstrates that reusing the offline testing hardware
in online operation reduces the total area overhead.

TABLE V
AREA OVERHEAD AND POWER CONSUMPTION

                   Area (%)                       Power (%)
Circuit  AS    ours  [9]+[2]:1  [9]+[2]:2    ours  [9]+[2]:1  [9]+[2]:2
s38584   294   1.16  6.83       3.29         1.1   0.8        0.6
s38417   302   2.41  8.91       4.66         1.3   1.1        0.7
s35932   327   2.23  8.65       4.21         1.5   1.3        1

AS: number of aging sensors.
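The area saving relative to each baseline can be read off Table V by simple subtraction. The sketch below does only that; the dictionary layout and function name are ours.

```python
# Area overheads (%) copied from Table V.
AREA_PERCENT = {
    "s38584": {"ours": 1.16, "[9]+[2]:1": 6.83, "[9]+[2]:2": 3.29},
    "s38417": {"ours": 2.41, "[9]+[2]:1": 8.91, "[9]+[2]:2": 4.66},
    "s35932": {"ours": 2.23, "[9]+[2]:1": 8.65, "[9]+[2]:2": 4.21},
}


def area_saving(circuit, baseline):
    """Area overhead saved by the unified scheme, in percentage points."""
    row = AREA_PERCENT[circuit]
    return round(row[baseline] - row["ours"], 2)


for circuit in AREA_PERCENT:
    print(circuit,
          area_saving(circuit, "[9]+[2]:1"),
          area_saving(circuit, "[9]+[2]:2"))
```

Even against the shared-delay-element baseline, the saving is about two percentage points for each of the three circuits.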
2) Power Consumption: We evaluate the increase in average
power consumption during the online aging prediction by
HSPICE simulation. The evaluation is performed by assuming
that all the embedded aging sensors work simultaneously. The
time for performing aging prediction is assumed as 10% of the
total operational time of the chip (10 years). The evaluation
results are listed in columns 6 to 8 of Table V.
The power is mainly induced by the transitions in the delay
submodule. From the evaluation results we can see that the
power consumption in our scheme is a little larger than that in
the [9]+[2] circuit. This is because a transition propagates
through more delay stages in our capture circuit
than in the [9]+[2] circuit. However, the total power
increase during online aging prediction is still negligible.
3) Performance Loss: Performance loss is induced by the
embedded aging sensor, which adds extra load on the path
output. As in [8], we evaluate the performance loss
for a single path in HSPICE simulation by comparing
the path delay before and after inserting the aging
sensor. The measured performance loss is only about 0.5%.
G. Discussion
1) Applicability of the Unified Capture Scheme: Delay
variation continuously worsens with technology scaling
and has a significant impact on the reliability of the
shipped chip. Hence it is essential to detect SDDs during
the fabrication testing stage as well as to monitor runtime
delay variations online. The proposed unified capture scheme
supports both these goals with one suite of hardware.
The implementation complexity in defect- or aging-related
reliability design is significantly reduced. Meanwhile,
the total area overhead consumed by offline and online circuits
is also reduced. With the above merits, we believe that the
proposed unified capture scheme is promising.
The proposed capture scheme is mainly applicable to chips
that rely on edge-triggered FF designs. Although the underlying
capture circuit operates for the positive-edge-triggered FF
design, it can also be applied to the negative-edge-triggered FF
design with a few modifications. For example, if we
replace the NAND gate in the FMC unit (Fig. 4) with an AND
gate, the launch and capture signals will be falling transitions
according to the UDL and LDL signals. Similarly, for online aging
prediction, we just need to move the falling transition on
LDL ahead of the falling edge of ACLK by adjusting the
programmable delay difference (Fig. 6); the aging sensor can
then work as usual. FF-based design is common in practical
ASICs and microprocessors, such as in [31]. Therefore, we think the
proposed capture scheme is practical and appropriate for many
industry designs.
On the other hand, from the experimental evaluation, we
can see that implementation of the proposed capture circuit
does not result in large overheads in either silicon area
or power consumption. The data in Table V show that
the power dissipated by the capture circuit is negligible, while
the area overhead is only a little more than 2% for circuits
with tens of thousands of gates. For larger scale circuits,
only the number of aging sensors will increase. Therefore, it is
reasonable to say that the capture circuit is applicable to
large-scale circuits owing to its low area and power overheads.
The actual production environment is certainly different
from HSPICE simulation. We make the following suggestions
on implementation of the proposed unified capture scheme
in a real IC. First, the RSCE ranges of pMOS and nMOS
in a particular process can be obtained from the foundry.
Second, the achieved delay difference of the programmable
delay submodule may need to be adjusted according to the
results of first-silicon debugging. This will help
the proposed capture circuit operate under the practical
parameter variability. Moreover, identification of critical paths
should also be performed at this phase. This can improve the
accuracy of the timing analysis and help identify the FFs that
are to be embedded with the aging sensor. Finally, test compression
techniques [40], [41] can be combined with our scheme to
reduce the test patterns in SDD detection.
2) Implementation in Traditional Design Flow: Implementation
of the proposed capture scheme needs a small modification
to the traditional IC design flow. We propose a solution as
follows. First, besides the usual standard cells, some specific
cells used to construct the capture circuit should be added
to the technology library. These specific cells include the
aging-resistant delay element, FMC unit, nMOS stacks, and
the hybrid cell consisting of the aging sensor and the FF.
Then, the netlists of the circuit under test/monitor and the
capture circuit can be mapped to the technology library by
using appropriate cells. Note that, in the mapping process, all
the FFs are still the traditional ones (i.e., without the embedded
aging sensor). After the timing analysis, paths under monitor
(PUMs) are identified according to the timing constraint. At
this time, the FFs that are fed by the PUMs are replaced with
the hybrid cell (aging sensor + FF). The subsequent steps in the
IC design flow can remain unchanged. We think that such a
solution will cause minimal perturbation to the traditional IC
design flow and hence is feasible.
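The replacement step after timing analysis can be sketched as a simple pass over the mapped netlist. This is an illustration only: the cell names, the data layout, and the use of a single slack limit as the timing constraint are assumptions, and a real flow would operate on the tool's own netlist and timing-report formats.

```python
HYBRID_CELL = "FF_WITH_AGING_SENSOR"  # hypothetical library cell name


def swap_monitored_ffs(ff_cells, path_slacks_ps, slack_limit_ps):
    """Replace FFs fed by paths under monitor (PUMs) with the hybrid cell.

    ff_cells: dict mapping FF instance name to its mapped cell type.
    path_slacks_ps: iterable of (endpoint FF name, timing slack in ps).
    slack_limit_ps: a path whose slack is below this limit is treated as a PUM.
    """
    monitored = {ff for ff, slack in path_slacks_ps if slack < slack_limit_ps}
    return {
        name: HYBRID_CELL if name in monitored else cell
        for name, cell in ff_cells.items()
    }


# Toy netlist and slack report (illustrative values only).
netlist = {"ff_a": "DFF", "ff_b": "DFF", "ff_c": "DFF"}
slacks = [("ff_a", 250.0), ("ff_b", 80.0), ("ff_c", 60.0), ("ff_c", 400.0)]
print(swap_monitored_ffs(netlist, slacks, slack_limit_ps=100.0))
```

An FF is swapped as soon as any one of its fan-in paths violates the limit, which matches the selection rule used for the area evaluation.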
VI. CONCLUSION
In this paper, we proposed a unified capture scheme to
support both the faster-than-at-speed testing and online aging
prediction. Unlike previous work, the underlying capture cir-
cuit in this paper possesses asymmetric characteristic and
operates under a well-designed control mode. It can generate
programmable clock signals that facilitate the achievement of
different test clock frequencies in faster-than-at-speed testing
as well as capture intervals in online aging prediction. Com-
ponents in the underlying capture circuit are aging-resistant,
which significantly reduces the drift in the capture interval
caused by aging effects. By choosing the channel length of the
transistor from the RSCE range, the proposed capture circuit
is made resilient to PV. The proposed unified capture scheme can be
easily incorporated into the original clock distribution system
in the chip. It just adds multiplexers at clock distribution
nodes to select the clock signals for the different
working modes without changing the closed form of the
original clock distribution system.
One of our future objectives is to implement the proposed
unified capture scheme in a test chip to validate its effectiveness
in an actual production environment.
REFERENCES
[1] S. Menon, A. D. Singh, and V. Agrawal, "Output hazard-free transition
delay fault test generation," in Proc. IEEE VLSI Test Symp., May 2009,
pp. 97–102.
[2] T. Mak, A. Krstic, K. T. Cheng, and L.-C. Wang, "New challenges in
delay testing of nanometer, multigigahertz designs," IEEE Design Test
Comput., vol. 21, no. 3, pp. 241–248, May–Jun. 2004.
[3] N. Ahmed and M. Tehranipoor, "A novel faster-than-at-speed
transition-delay test method considering IR-drop effects," IEEE Trans.
Comput.-Aided Design Integr. Circuits Syst., vol. 28, no. 10,
pp. 1573–1582, Oct. 2009.
[4] N. Ahmed, M. Tehranipoor, and V. Jayaram, "Timing-based delay test
for screening small delay defects," in Proc. 43rd Ann. Design Autom.
Conf., 2006, pp. 320–325.
[5] R. Tayade, S. Sundereswaran, and J. Abraham, "Small-delay defect
detection in the presence of process variations," in Proc. 8th IEEE Int.
Symp. Qual. Electron. Design, Mar. 2007, pp. 711–716.
[6] M. Amodeo and B. Cory, Beyond At-Speed. Springfield, MO: Test
Measurement World, 2005.
[7] International Technology Roadmap for Semiconductors. (2005) [Online].
Available: http://www.itrs.net/
[8] M. Agarwal, B. C. Paul, Z. Ming, and S. Mitra, "Circuit failure
prediction and its application to transistor aging," in Proc. 25th IEEE
VLSI Test Symp., May 2007, pp. 277–286.
[9] M. Agarwal, V. Balakrishnan, A. Bhuyan, K. Kyunglok, B. C. Paul,
W. Wenping, Y. Bo, C. Yu, and S. Mitra, "Optimized circuit failure
prediction for aging: Practicality and promise," in Proc. IEEE Int. Test
Conf., Oct. 2008, pp. 1–10.
[10] G. Yan, Y. Han, and X. Li, "A unified online fault detection scheme via
checking of stability violation," in Proc. Conf. Design Autom. Test Eur.,
2009, pp. 20–24.
[11] J. Keane, T. H. Kim, and C. H. Kim, "An on-chip NBTI sensor for
measuring PMOS threshold voltage degradation," IEEE Trans. Very
Large Scale Integr. (VLSI) Syst., vol. 18, no. 6, pp. 947–956, Jun. 2010.
[12] K. Zhao, J. H. Stathis, B. P. Linder, E. Cartier, and A. Kerber, "PBTI
under dynamic stress: From a single defect point of view," in Proc. IEEE
Int. Rel. Phys. Symp., Apr. 2011, pp. 4A.3.1–4A.3.9.
[13] E. Cartier, B. P. Linder, V. Narayanan, and V. K. Paruchuri,
"Fundamental understanding and optimization of PBTI in nFETs with
SiO2/HfO2 gate-stack," in Proc. IEEE IEDM, Dec. 2006, pp. 1–4.
[14] B. C. Paul, K. Kang, H. Kuuoglu, M. A. Alam, and K. Roy, Tem-
poral performance degradation under NBTI: Estimation and design for
improved reliability of nanoscale circuits, in Proc. Design Autom. Test
Eur., 2006, pp. 16.
[15] Y. Wang, H. Luo, K. He, R. Luo, H. Yang, and Y. Xie, Temperature-
aware NBTI modeling and the impact of input vector control on
performance degradation, in Proc. IEEE Design Autom. Test Eur., Apr.
2007, pp. 16.
[16] W. Wang, Z. Wei, S. Yang, and Y. Cao, "An efficient method to identify
critical gates under circuit aging," in Proc. IEEE Int. Conf. Comput.-Aided
Design, Nov. 2007, pp. 735–740.
[17] W. Wang, V. Reddy, Y. Bo, V. Balakrishnan, S. Krishnan, and C. Yu,
"Statistical prediction of circuit aging under process variations," in Proc.
IEEE Custom Integr. Circuits Conf., Sep. 2008, pp. 13–16.
[18] L. Yinghai, S. Li, Z. Hai, Z. Hengliang, Y. Fan, and Z. Xuan, "Statistical
reliability analysis under process variation and aging effects," in Proc.
Design Autom. Conf., Aug. 2009, pp. 514–519.
[19] W. Wang, S. Yang, S. Bhardwaj, R. Vattikonda, S. Vrudhula, F. Liu,
and Y. Cao, "The impact of NBTI on the performance of combinational
and sequential circuits," in Proc. 44th IEEE Design Autom. Conf., Jun.
2007, pp. 364–369.
[20] S. V. Kumar, K. H. Kim, and S. S. Sapatnekar, "Impact of NBTI on
SRAM read stability and design for reliability," in Proc. 7th Int. Symp.
Qual. Electron. Design, Mar. 2006, pp. 213–218.
[21] S. V. Kumar, C. H. Kim, and S. S. Sapatnekar, "NBTI-aware synthesis of
digital circuits," in Proc. Design Autom. Conf., Jun. 2007, pp. 370–375.
[22] R. Vattikonda, W. Wang, and Y. Cao, "Modeling and minimization of
PMOS NBTI effect for robust nanometer design," in Proc. 43rd IEEE
Design Autom. Conf., Jul. 2006, pp. 1047–1052.
[23] S. Mitra, "Globally optimized robust systems to overcome scaled CMOS
challenges," in Proc. IEEE Design Autom. Test Eur., Mar. 2008,
pp. 941–946.
[24] H. Dadgour and K. Banerjee, "Aging-resilient design of pipelined
architectures using novel detection and correction circuits," in Proc.
Design Autom. Test Eur. Conf. Exhibit., Mar. 2010, pp. 244–249.
[25] J. C. Vazquez, V. Champac, I. C. Teixeira, M. B. Santos, and J. P.
Teixeira, "Programmable aging sensor for automotive safety-critical
applications," in Proc. IEEE Design Autom. Test Eur. Conf. Exhibit.,
Mar. 2010, pp. 618–621.
[26] W. Qiu, W. Jing, D. M. H. Walker, D. Reddy, L. Xiang, L. Zhuo, S.
Weiping, and H. Balachandran, "K longest paths per-gate (KLPG) test
generation for scan-based sequential circuits," in Proc. IEEE Int. Test
Conf., Oct. 2004, pp. 223–231.
[27] Y. Sato, S. Hamada, and T. Maeda, "Invisible delay quality - SDQL
model lights up what could not be seen," in Proc. IEEE Int. Test Conf.,
Nov. 2005, no. 47.1, pp. 1–9.
[28] M. Yilmaz, K. Chakrabarty, and M. Tehranipoor, "Test-pattern grading
and pattern selection for small-delay defects," in Proc. 26th IEEE VLSI
Test Symp., May 2008, pp. 233–239.
[29] H. Yan and A. D. Singh, "Experiments in detecting delay faults using
multiple higher frequency clocks and results from neighboring die," in
Proc. IEEE Int. Test Conf., Oct. 2003, pp. 105–111.
[30] H. Nakamura, A. Shirokane, Y. Nishizaki, A. Uzzaman, V. Chickermane,
B. Keller, T. Ube, and Y. Terauchi, "Low cost testing of nanometer SoCs
using on-chip clocking and test compression," in Proc. 14th IEEE Asian
Test Symp., Dec. 2005, pp. 156–161.
[31] T. McLaurin and F. Fredrick, "The testability features of the MCF5407
containing the 4th generation ColdFire microprocessor core," in Proc. Int.
Test Conf., 2000, pp. 151–159.
[32] S. Pei, H. Li, and X. Li, "An on-chip clock generation scheme for
faster-than-at-speed delay testing," in Proc. IEEE Design Autom. Test Eur.
Conf. Exhibit., Mar. 2010, pp. 1353–1356.
[33] R. Tayade and J. A. Abraham, "On-chip programmable capture for
accurate path delay test and characterization," in Proc. IEEE Int. Test
Conf., Oct. 2008, pp. 1–10.
[34] A. Torkel and S. Roland, "Time delay line with low sensitivity to process
variations," U.S. Patent 0 079 487, Mar. 26, 2009.
[35] W. Zhao and Y. Cao, "New generation of predictive technology model
for sub-45 nm early design explorations," IEEE Trans. Electron Devices,
vol. 53, no. 11, pp. 2816–2823, Nov. 2006.
[36] N. Ahmed, C. P. Ravikumar, M. Tehranipoor, and J. Plusquellic,
"At-speed transition fault testing with low speed scan enable," in Proc. 23rd
IEEE VLSI Test Symp., May 2005, pp. 42–47.
[37] S. Wang, X. Liu, and S. T. Chakradhar, "Hybrid delay scan: A low
hardware overhead scan-based delay test technique for high fault coverage
and compact test sets," in Proc. IEEE Design Autom. Test Eur. Conf.
Exhibit., Feb. 2004, pp. 1296–1301.
[38] G. Xu and A. D. Singh, "Achieving high transition delay fault coverage
with partial DTSFF scan chains," in Proc. IEEE Int. Test Conf., Oct.
2007, pp. 1–9.
[39] ABC: A System for Sequential Synthesis and Verification. Berkeley Logic
Synthesis and Verification Group, Berkeley, CA [Online]. Available:
http://www.eecs.berkeley.edu/~alanmi/abc/
[40] Y. Han, Y. Hu, X. Li, H. Li, and A. Chandra, "Embedded test
decompressor to reduce the required channels and vector memory of tester for
complex processor circuit," IEEE Trans. Very Large Scale Integr. (VLSI)
Syst., vol. 15, no. 5, pp. 531–540, May 2007.
[41] Y. Han, X. Li, H. Li, and A. Chandra, "Test resource partitioning
based on efficient response compaction for test timer and tester channels
reduction," J. Comput. Sci. Technol., vol. 20, no. 2, pp. 201–209, Mar.
2005.
Song Jin received the Ph.D. degree in computer science from the Institute
of Computing Technology, Chinese Academy of Sciences, Beijing, China, in
2011.
He is currently an Assistant Professor with the Department of Electronic
and Communication Engineering, School of Electrical and Electronic Engi-
neering, North China Electric Power University, Baoding, China. His current
research interests include VLSI design and testing and computer architecture,
with emphasis on design for reliability and variation tolerance.
Yinhe Han (M'06) received the B.Eng. degree from the Nanjing University
of Aeronautics and Astronautics, Nanjing, China, in 2001, and the M.Eng.
and Ph.D. degrees in computer science from the Institute of Computing
Technology (ICT), Chinese Academy of Sciences, Beijing, China, in 2003
and 2006, respectively.
He is currently an Associate Professor at ICT. His current research interests
include VLSI design and testing, reliable system, and architecture.
Dr. Han was a recipient of the Test Technology Technical Council Best
Paper Award from the Asian Test Symposium in 2003. He is a member of
the Association for Computing Machinery and the Institute of Electronics,
Information and Communication Engineers.
Huawei Li (M'00–SM'09) received the B.S. degree in computer science
from Xiangtan University, Xiangtan, China, in 1996, and the M.S. and Ph.D.
degrees from the Institute of Computing Technology (ICT), Chinese Academy
of Sciences, Beijing, China, in 1999 and 2001, respectively.
She is currently a Professor at ICT. Her current research interests include
VLSI and SoC design verification and test generation, delay testing, and
dependable computing.
Prof. Li has been serving as Secretary General of the China Computer
Federation Technical Committee on Fault Tolerant Computing since 2008.
She also served as Program Chair of the IEEE Asian Test Symposium in
2007 and the IEEE Workshop on RTL and High Level Testing in 2003.
Xiaowei Li (SM'04) received the B.Eng. and M.Eng. degrees in computer
science from the Hefei University of Technology, Hefei, China, in 1985 and
1988, respectively, and the Ph.D. degree in computer science from the Institute
of Computing Technology (ICT), Chinese Academy of Sciences, Beijing,
China, in 1991.
He was with the Department of Computer Science, Peking University,
Beijing, as an Assistant Professor from 1991 to 2000 and an Associate
Professor since 1993. He joined ICT as a Professor in 2000. He is currently
the Deputy (Executive) Director of the State Key Laboratory of Computer
Architecture, ICT. He has co-authored over 200 papers in published academic
journals and international conference proceedings and holds 34 patents and
35 software copyrights. His current research interests include VLSI testing,
design verification, and dependable computing.
Prof. Li has been serving as Chair of the China Computer Federation
Technical Committee on Fault Tolerant Computing since 2008, the IEEE Asian
Pacific Regional Test Technology Technical Council Vice Chair since 2004,
and the Steering Committee Chair of the IEEE Asian Test Symposium since
2011. In addition, he serves on the Technical Program Committees of several
IEEE and Association for Computing Machinery conferences, including VTS,
DATE, ASP-DAC, and PRDC. He also serves as an Editorial Board Member
of JCST, JETTA, and JOLPE.
