XIJIN TIAN
Int. J. Rel. Qual. Saf. Eng. 2005.12:459-474. Downloaded from www.worldscientific.com
Hewlett-Packard Company
3404 E. Harmony Road, MS B5, Ft. Collins, CO 80528, USA
bill.tian@hp.com
This paper provides a comprehensive study of reliability issues related to DC-DC converter
design. First, some common reliability issues and DC-DC converter design
topologies are discussed. Then the results of a reliability sensitivity study on an
industry-standard VRM are presented, obtained using a prediction-based design-for-reliability
and sensitivity analysis tool created by the author and colleagues at HP. The thermal design
issues and component derating in converter design are discussed. Since electronic component
reliability has been a bottleneck in DC-DC converter reliability, component reliability
issues are discussed and some reliability application guidelines are provided
for DC-DC converter reliability design. In the last part of the paper, an integrated
reliability testing, qualification, and quality control procedure is presented, based on highly
accelerated testing methodology. The use of this procedure by HP for DC-DC converter
reliability and quality control has shown promising results.
Keywords: Power converter; design for reliability; sensitivity analysis; component
application; accelerated testing; qualification.
1. Introduction
Ideally, a reliable product should have no early-life failures (infant mortality fail-
ures), nor any wear-out failures before its expected lifetime, and the failure rate
should be very low during its useful life. However, the reality is that many factors,
such as design, components, manufacturing, or simple mistakes or oversights, can
lead to unexpected failures of a product. Reliability cannot be solely tested-in. It
has to be designed-in and built-in. A reliable product requires a systematic, holistic
approach that runs from product concept design through to its application at the customer.
DC-DC converters include power bricks (fully encapsulated, enclosed modules),
open-frame converters, bare-board converters, and VRMs (Voltage Regulator Modules).
Today's high-availability computer and telecom systems have put great
pressure on the performance and reliability of these power converters. High-speed
microprocessors require fast delivery of enormous supply currents in microseconds,
October 17, 2005 11:25 WSPC/122-IJRQSE SPI-J072 00194
The sharp drop in application voltage (from 5 V to 3.3 V to 1.3 V)
requires DC-DC converters to deliver higher output current, so more and more
DC-DC converter vendors are using synchronous rectifiers (MOSFETs) to replace the
conventional Schottky diodes in the design to obtain high efficiency.
Topology: Flyback converter
  Advantages: simple, low part count
  Disadvantages: low output current, low efficiency, unconstrained drain voltage (Vds = 1.5 Vin(max)), large output Irms
Topology: Single-transistor forward converter
  Advantages: better transformer use
  Disadvantages: more components, unconstrained drain voltage,
When a synchronous rectification topology is used, the main switching MOSFET must
be off before the synchronous rectifier is turned on, and vice versa. Otherwise,
there will be a shoot-through; that is, the input (or output) voltage will have a
direct path to ground, which generates very high losses and potential failures.
Timing of the gate drive to the synchronous rectifier (MOSFET) can cause
cross conduction or reverse conduction. An incorrect delay can result in either
MOSFET body diode conduction, with losses even worse than those of a simple rectifier,
or high shoot-through current transients.
MOSFETs have been found to fail under low-load conditions in some phase-shifted,
zero-voltage, full-bridge converters because the intrinsic body diode of the
MOSFET undergoes dynamic avalanching during its reverse recovery, with an
associated high dv/dt [3].
The forward topology requires a minimum load; a forward converter cannot
operate without load. The inductor must therefore be large enough to ensure that
its peak ripple current is less than the minimum load current. Otherwise the inductor
current will go discontinuous and the output voltage will rise, peak detecting.
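The minimum-load condition for the forward topology can be illustrated with a short calculation. The sketch below is not from the paper: it assumes the usual buck-derived output stage, where the peak-to-peak inductor ripple is Vout(1 - D)/(L f), and it interprets "peak ripple current" as half of that peak-to-peak value. The function name and the example operating point are hypothetical.

```python
def min_inductance_for_ccm(v_out, duty, f_sw, i_load_min):
    """Smallest inductance (H) that keeps the output inductor in continuous conduction.

    Assumes a buck-derived output stage with peak-to-peak ripple
    delta_i = v_out * (1 - duty) / (L * f_sw), and requires the ripple
    amplitude (delta_i / 2) to stay at or below the minimum load current.
    """
    if not 0.0 < duty < 1.0:
        raise ValueError("duty must be between 0 and 1")
    if min(v_out, f_sw, i_load_min) <= 0:
        raise ValueError("v_out, f_sw and i_load_min must be positive")
    return v_out * (1.0 - duty) / (2.0 * f_sw * i_load_min)


# Hypothetical operating point: 3.3 V output, 40% duty, 300 kHz, 0.5 A minimum load
l_min = min_inductance_for_ccm(3.3, 0.4, 300e3, 0.5)
print(f"L_min = {l_min * 1e6:.2f} uH")
```

Halving the minimum load current doubles the required inductance, which is why a guaranteed minimum load is usually specified for this topology.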
and high power applications which result in improved power density and high
efficiency.
Choosing the wrong component in a DC-DC converter design, or misapplying
components, has caused various reliability problems. Some examples are:
failure of MOSFETs at low input voltage because transient stress on the switching
parts exceeds their safe operating area (SOA) during turn-on and turn-off; output
voltage overshoot due to poor output filter selection and poor overvoltage
protection; input capacitor failures due to cracked ceramic capacitors (MLCC,
multilayer ceramic capacitor) or poor-quality electrolytic or tantalum capacitors;
and power MOSFET thermal runaway caused by inadequate thermal design,
because the on-resistance of a power MOSFET increases with
temperature.
To make a DC-DC converter design reliable, designers should keep reliability in
mind from the very early stages of the design; reliability should be designed in from
the beginning. The major reliability factors to bear in mind
during a DC-DC converter design are:
Thermal design and management
Component selection and derating design
Design layout and mechanical design
The following section provides a sensitivity study of these design factors on DC-DC
converter reliability.
gate driver, and input and output capacitors. Using an industry standard VRM
(buck topology design) as an example, the following are the analysis results of
these design factors on the VRM reliability.
The BOM (Bill of Materials) of the VRM includes:
[Figure: failure rate (FIT) by component type. Categories shown: electrolytic capacitors, MOSFETs, ICs, ceramic capacitors, diodes, inductors, resistors.]
where
by ADVANCED SYSTEMS LABORATORY TECHNICAL INFORMATION CENTRE on 10/20/17. For personal use only.
The log-linear thermal stress model is equivalent to the Arrhenius model with
the substitution of E_a/k for m_T:
\[ \pi_T = e^{m_T \left( \frac{1}{T_0} - \frac{1}{T_1} \right)} = e^{\frac{E_a}{k} \left( \frac{1}{T_0} - \frac{1}{T_1} \right)} \]
where
E_a = activation energy
k = Boltzmann constant = 8.62 × 10^-5 eV/K
where
m_S is the electrical stress slope parameter,
P_1 is the applied stress as a percent of rated,
P_0 is the reference stress (50% of rated).
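The two stress factors can be evaluated numerically as follows. This is a minimal sketch under stated assumptions: T0 is taken as the reference temperature and T1 as the operating temperature, both converted to kelvin; the electrical stress equation did not survive in this copy, so the log-linear form exp(m_S (P1 - P0)) is assumed here by analogy with the thermal model. The function names and the example activation energy, temperatures, and slope are hypothetical.

```python
import math

BOLTZMANN_EV_PER_K = 8.62e-5  # Boltzmann constant in eV/K, as given in the text


def temperature_factor(ea_ev, t_ref_c, t_op_c):
    """Arrhenius factor exp((Ea/k) * (1/T0 - 1/T1)), with T0 and T1 in kelvin."""
    t0 = t_ref_c + 273.15
    t1 = t_op_c + 273.15
    return math.exp((ea_ev / BOLTZMANN_EV_PER_K) * (1.0 / t0 - 1.0 / t1))


def stress_factor(m_s, p_applied_pct, p_ref_pct=50.0):
    """Assumed log-linear form exp(m_S * (P1 - P0)), stresses in percent of rated."""
    return math.exp(m_s * (p_applied_pct - p_ref_pct))


# Hypothetical part: Ea = 0.4 eV, reference 40 degC, operating 60 degC, m_S = 0.02
pi_t = temperature_factor(0.4, 40.0, 60.0)
pi_s = stress_factor(0.02, 80.0)
print(f"pi_T = {pi_t:.2f}, pi_S = {pi_s:.2f}")
```

Both factors equal 1 at the reference condition and grow exponentially above it, which is the behavior the sensitivity plots of failure rate against temperature and derating level reflect.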
[Figure: failure rate versus temperature (°C), plotted over 25 to 60 °C.]
[Figure: failure rate (FIT) versus derating level, plotted over 50% to 80%.]
a DC-DC converter. The following are some common reliability issues related to
thermal design and management.
through the electrolytic capacitor's seal. As the electrolyte boils away, the capacitance
decreases and the effective series resistance (ESR) rises, causing increased
power dissipation. If this regenerative process continues, it can cause the capacitor
to exceed its maximum thermal rating [5].
Both the switching and the rectifier MOSFETs can run very hot in a synchronous
rectification converter. It is essential to keep the junction temperatures of these
parts far below their maximum. Packages with low thermal resistance
from junction to case and from case to ambient are essential
for a good thermal design. This is especially important in a bare-board DC-DC
converter design. The commonly used SO-8 package has a thermal resistance
of 20 to 25 °C/W from junction to case and of 20 °C/W from case to ambient.
This is much higher than that of some newly designed MOSFET packages, such
as LFPak, I2PAK, or DirectFET [9].
In a DC-DC converter, the control components are sensitive to high temperature
and these parts account for a big part of the DC-DC converter failure rate. In
layout, the control components should be kept away from heat generating parts,
such as switching MOSFETs and magnetics.
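To see why the package thermal resistance matters, the steady-state junction temperature can be estimated from a simple series thermal model, Tj = Ta + P (R_jc + R_ca). The sketch below uses the SO-8 figures quoted above; the ambient temperature and dissipated power are hypothetical, and the series model itself is a simplifying assumption rather than the author's method.

```python
def junction_temperature(t_ambient_c, power_w, r_jc, r_ca):
    """Steady-state junction temperature (degC) for a series thermal path.

    r_jc and r_ca are the junction-to-case and case-to-ambient thermal
    resistances in degC/W.
    """
    return t_ambient_c + power_w * (r_jc + r_ca)


# SO-8 figures from the text (upper junction-to-case value), with a
# hypothetical 1 W dissipation at 50 degC ambient
tj_so8 = junction_temperature(50.0, 1.0, 25.0, 20.0)
print(f"Estimated junction temperature: {tj_so8:.0f} degC")
```

With a 45 °C/W total path, every additional watt raises the junction by 45 °C, so the margin to the maximum junction temperature disappears quickly on a bare-board converter.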
Derating Design
Derating is the process of limiting electrical, thermal, and mechanical stresses on
electronic parts to levels below their specified ratings. In the component derating of
a DC-DC converter design, the stresses considered must include those arising during
transients, such as ripple current, AC peak voltage, and temperature rise due
to transients. For example, the derated voltage level of a capacitor should include
both applied DC voltage and AC peak voltage, that is
Ripple and pulse currents cause temperature rise in a component and have an
adverse influence on the failure rate of the component. Therefore, the ripple
and pulse currents must be limited during DC-DC converter design.
The voltage derating factor of transistors should be applied to the worst-case
combination of DC, AC, and transient voltages.
For tantalum and aluminum electrolytic capacitors, when the ambient temperature
is greater than the maximum temperature minus 40 °C (i.e., Ta > Tmax − 40 °C),
additional voltage derating has to be applied. In DC-DC converter applications,
attention must be paid to the effect of temperature rise from nearby components.
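The derating rules above lend themselves to a simple design check. The following sketch assumes that the voltage stress of a capacitor is the sum of the applied DC voltage and the AC peak voltage divided by the rated voltage, as the text describes; the function names and the example values are hypothetical.

```python
def voltage_stress_ratio(v_dc, v_ac_peak, v_rated):
    """Applied stress as a fraction of rating, counting DC plus AC peak voltage."""
    if v_rated <= 0:
        raise ValueError("v_rated must be positive")
    return (v_dc + v_ac_peak) / v_rated


def needs_extra_voltage_derating(t_ambient_c, t_max_c):
    """Rule from the text for tantalum and aluminum electrolytics: Ta > Tmax - 40 degC."""
    return t_ambient_c > t_max_c - 40.0


# Hypothetical input capacitor: 12 V DC plus 1.5 V AC peak on a 25 V, 105 degC part
ratio = voltage_stress_ratio(12.0, 1.5, 25.0)
extra = needs_extra_voltage_derating(70.0, 105.0)
print(f"stress = {ratio:.0%}, extra derating needed: {extra}")
```

Counting only the DC voltage in this example would understate the stress (48% instead of 54%), which is the mistake the derating guideline is meant to prevent.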
Aluminum electrolytic capacitors offer large capacitance and high rated voltage
at low cost. However, they typically have
limited operating life, especially at high temperature. Increased temperature
causes the electrolyte to evaporate and boil away, which in turn causes the capacitance
to decrease and the ESR (effective series resistance) to increase. The power loss
of a capacitor is given by Pcap = (Irms)^2 × ESR, where Irms is the ripple
current of the capacitor. For reliable operation, therefore, the aluminum electrolytic
capacitor must operate below its maximum allowable ripple current and
the ESR should be kept low. The power dissipated in the ESR is the main cause of the
capacitor's internal temperature rise.
High temperature dramatically reduces the operating life of an aluminum electrolytic
capacitor. The capacitor will fail either catastrophically (blow open) or
parametrically (too little capacitance, too high ESR, or large leakage current).
To keep the aluminum electrolytic capacitor cool, it should be kept
away from heat-generating components, such as large resistors, catch diodes, and
heat sinks.
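The capacitor loss relation Pcap = (Irms)^2 × ESR can be evaluated directly to show how an ESR increase feeds back into heating. The ripple current and ESR values in this sketch are hypothetical and only illustrate the scaling.

```python
def capacitor_power_loss(i_rms, esr):
    """Power dissipated in a capacitor (W) from ripple current (A rms) and ESR (ohm)."""
    return i_rms ** 2 * esr


# Hypothetical part: 2 A rms ripple current, 50 milliohm ESR when new
p_new = capacitor_power_loss(2.0, 0.05)
# Same ripple current after the ESR has doubled through electrolyte loss
p_aged = capacitor_power_loss(2.0, 0.10)
print(f"new: {p_new:.2f} W, aged: {p_aged:.2f} W")
```

Because the loss is linear in ESR and quadratic in ripple current, a capacitor whose ESR has doubled dissipates twice the power at the same ripple current, which accelerates further electrolyte loss.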
The quality and performance of aluminum electrolytic capacitors can vary significantly
from one vendor to another. The most critical part of an aluminum
electrolytic capacitor is its electrolyte. The electrolyte determines the operating
temperature range of a capacitor and has a major effect on the dissipation factor,
ripple current rating, and operating life of the capacitor [7]. An electrolyte
is composed of an organic solvent and solutes that provide ionic conductivity.
seriously generate hydrogen gas. If the hydrogen gas is not absorbed inside the
capacitor, it will eventually blow off the capacitor seal and cause the capacitor to fail.
Certain chemicals, called depolarizers, can be added to the electrolyte to absorb
the generated gas. However, a faulty depolarizer chemistry
can cause disastrous failures [6]. A newer electrolyte, called
BL:EG (Butyrolactone:Ethylene), developed by NASA, generates much less
hydrogen gas than traditional EG electrolytes. Tests show that the new BL:EG
electrolytic capacitors have better ESR stability, less tendency for gas generation,
and better ability to withstand high temperature and trace chloride under severe
application conditions [7].
Another option for the input filtering capacitor is a tantalum electrolytic capacitor.
Tantalums have substantially better high-frequency performance than aluminum
electrolytics, but cost more and are limited to about 100 V and a few hundred
microfarads [8].
Solid tantalum (MnO2) capacitors should not generally be used for power supply
filtering unless specifically made and tested for the application. This is because
solid tantalum capacitors cannot withstand high surge currents, so the charging and
discharging current must be limited with an external resistor if the source
impedance is less than one ohm. In DC-DC converters, the input side is typically
fed from voltage sources that are unregulated and of nominally
low impedance. This type of application severely stresses the capacitor, and a
higher-than-normal failure rate may be experienced. In our practice, solid tantalum
capacitors are not recommended for use on the input side. If they are used, a
voltage-derating factor of 50% or more is required for low-impedance tantalum
capacitor applications, and a protection resistor of 3 ohm/V or more is needed in
series with the tantalum capacitors to limit the current to 300 mA or less.
Tantalum capacitors must not be operated or charged in reverse mode. Reverse
current dissipates power when passing through a tantalum capacitor
and thereby causes the temperature of the capacitor to rise.
Do not use silver-case tantalum capacitors. This kind of cheap wet tantalum
capacitor has always been problematic. They have almost zero tolerance of
reverse bias because the silver grows dendrites (silver migration) that cause rapid
damage.
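The series-resistor guideline for solid tantalum capacitors (3 ohm/V or more, current limited to 300 mA or less) can be turned into a small sizing helper. This is a sketch under the assumption that the worst-case surge current is the applied voltage divided by the series resistance; the function name and the 12 V example are hypothetical.

```python
def tantalum_series_resistance(v_applied, ohms_per_volt=3.0, i_limit_a=0.3):
    """Series resistance (ohm) that satisfies both the ohm-per-volt rule and the current limit.

    The surge current is assumed to be v_applied / R, so the current limit
    requires R >= v_applied / i_limit_a.
    """
    if v_applied <= 0:
        raise ValueError("v_applied must be positive")
    r_rule = ohms_per_volt * v_applied      # 3 ohm/V guideline
    r_current = v_applied / i_limit_a       # 300 mA surge limit
    return max(r_rule, r_current)


# Hypothetical 12 V input rail
r_series = tantalum_series_resistance(12.0)
print(f"Series resistance: at least {r_series:.0f} ohm")
```

Under this assumption the 300 mA limit is the tighter of the two constraints, since 3 ohm/V alone allows about 333 mA. A resistor of this size in the input path is one reason solid tantalum capacitors are a poor fit for input filtering.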
Today, more and more DC-DC converter designs use ceramic capacitors for
input filtering. Ceramic capacitors offer the advantages of small size, low ESR,
and high Irms capability. However, using ceramic capacitors as input capacitors
has caused various reliability problems in DC-DC converters across the industry.
Multilayer ceramic chip capacitor (MLCC) cracking has been a major reliability
problem across the DC-DC converter industry. The crack in an MLCC could be caused
package which will cause short circuit failure. One solution is to design
MLCC capacitors so that the termination cap does not extend beyond the active
region of the package. This kind of MLCC structure is sometimes called fail-safe
construction, where the active region does not extend to the termination area.
TDK and Kemet have fail-safe MLCC capacitors available.
Input-voltage transients across the input ceramic capacitors in a DC-DC converter
can easily exceed twice the original voltage step. Choosing a correctly
rated ceramic capacitor and using a properly designed transient-snubber circuit
are essential for reliable operation.
Low standoff height of an MLCC can result in high halide ion concentration, which
could cause migration of the silver-glass frit and lead to excessive current leakage.
Short circuits can occur due to silver migration when parts are subjected to THB
(Temperature-humidity-bias).
For switching MOSFETs, the dynamic or switching losses are the predominant
factor, and conduction losses play a secondary role. The dynamic loss is proportional
to the switching frequency. A switching MOSFET should have a gate
charge as low as possible to keep the dynamic losses small.
For synchronous rectifier MOSFETs, the static or conduction losses are the
dominant factor, and the gate charge losses play a secondary role. A synchronous
rectifier MOSFET must have a sufficiently low Rds-on to meet the
efficiency goal.
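The trade-off between dynamic and conduction losses can be made concrete with first-order loss estimates. The formulas below are common textbook approximations (linear voltage and current transitions for the overlap loss) and are not taken from the paper; all device values are hypothetical.

```python
def conduction_loss(i_rms, rds_on):
    """Static loss (W): I_rms^2 * Rds-on."""
    return i_rms ** 2 * rds_on


def gate_charge_loss(q_g, v_gs, f_sw):
    """Gate drive loss (W): Qg * Vgs * f_sw."""
    return q_g * v_gs * f_sw


def switching_loss(v_ds, i_d, t_rise, t_fall, f_sw):
    """Approximate overlap loss (W), assuming linear voltage and current transitions."""
    return 0.5 * v_ds * i_d * (t_rise + t_fall) * f_sw


# Hypothetical 12 V, 10 A converter switching at 500 kHz
p_cond = conduction_loss(10.0, 0.005)                      # 5 milliohm Rds-on
p_gate = gate_charge_loss(20e-9, 5.0, 500e3)               # 20 nC gate charge, 5 V drive
p_sw = switching_loss(12.0, 10.0, 10e-9, 10e-9, 500e3)     # 10 ns edges
print(f"conduction {p_cond:.2f} W, gate {p_gate:.2f} W, switching {p_sw:.2f} W")
```

The gate and switching terms scale linearly with frequency while the conduction term does not, which is why low gate charge matters most for the switching MOSFET and low Rds-on matters most for the synchronous rectifier.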
that are rated for the temperature range they are going to see in applications.
The Safe Operating Area (SOA) of a power MOSFET shrinks with age. Depending
on the stresses on the device, over long-term operation
both current and voltage capability can degrade by up to 50% from their values at time
zero, which are the SOA values MOSFET vendors typically provide in the
specification. The time-dependent rate of SOA shrinkage is a strong function of the
device's prolonged electrical and thermal cycling [2]. To ensure long-term MOSFET
reliability, it is desirable to derate the MOSFET by 50% to take the
aging process into account.
MOSFETs can fail at low input voltage because transient stress on the switching
parts exceeds their safe operating area (SOA) during turn-on and turn-off. It is
reported [3] that the intrinsic body diode of a power MOSFET undergoes dynamic
avalanching during its reverse recovery, with an associated high dv/dt. This
phenomenon results in excessive power loss in the circuit and increased switching
stress for the MOSFET. Even though the forward current is very small, the device
can be destroyed by the large reverse current. Under conditions
of high di/dt, dynamic avalanching is shown to occur even though the reverse
voltage is much smaller than the breakdown voltage [3].
An individual gate resistor is needed for each MOSFET, regardless of whether
the MOSFETs are in parallel. This is because MOSFETs have both capacitance
(gate-source) and inductance (in the leads), which can form an underdamped
resonant tank; paralleled MOSFETs have been observed to oscillate
at 100 MHz [8]. The gate resistor limits the current the driver has to source
or sink to the gate, but its real significance is to damp the oscillations [8].
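The resonant tank formed by lead inductance and gate-source capacitance can be estimated with the standard series RLC relations, f = 1/(2π√(LC)) and R_crit = 2√(L/C). The sketch below treats the gate loop as an ideal series RLC circuit, which is a simplifying assumption; the 5 nH and 500 pF values are hypothetical and were chosen only because they land near the 100 MHz figure quoted above.

```python
import math


def resonant_frequency(l_h, c_f):
    """Natural frequency (Hz) of an ideal LC tank."""
    return 1.0 / (2.0 * math.pi * math.sqrt(l_h * c_f))


def critical_damping_resistance(l_h, c_f):
    """Resistance (ohm) giving critical damping in a series RLC loop: 2 * sqrt(L / C)."""
    return 2.0 * math.sqrt(l_h / c_f)


# Hypothetical gate loop: 5 nH lead inductance, 500 pF gate-source capacitance
f_res = resonant_frequency(5e-9, 500e-12)
r_crit = critical_damping_resistance(5e-9, 500e-12)
print(f"resonance near {f_res / 1e6:.0f} MHz, critical damping at {r_crit:.1f} ohm")
```

A gate resistor of a few ohms per device is therefore enough to damp such a tank, while adding little delay to the gate drive.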
more than 75% of its rated reverse voltage, nor with the junction temperature
above 110 °C.
Do not use two rectifier diodes in parallel. When a diode gets
hotter, its forward voltage drops. The diode of the pair that conducts
more current at the beginning will get hotter and have a lower forward voltage,
and as a result it will conduct even more. This positive feedback process will finally
failure.
A variety of factors can cause a ceramic capacitor to crack. These
factors include manufacturing defects (ring cracks, delamination, and voiding),
thermal shock (wave solder, solder reflow), and handling (depaneling, insertion,
and attachment of heatsinks). Use smaller, thinner capacitors made with dielectric
materials of higher fracture toughness. Do not put a ceramic capacitor near a board edge
or near a hole. Ceramic capacitors with a large thermal mass can also be a problem
because they heat up more slowly than the board does.
HALT Testing
Temperature step tests, which include low-temperature step-stress testing and
high-temperature step-stress testing. These tests are intended to discover any thermal
problems, such as circuit design issues and component failures, and to ensure
that the product has enough temperature margin.
Rapid thermal cycling or thermal shock. The temperature ramp rate should be
no less than 30 °C/min. The test is intended to discover any material incompatibility and
problems caused by different CTEs, such as solder joint failures, cracked components,
and intermittent component failures.
Six-axis step-stress random vibration test. The test starts at a vibration level of
5 Grms and steps up by 3 to 5 Grms until failure. This is to uncover connector
problems, broken leads, shorted components, and solder joint problems.
Combined thermal shock and random vibration. This test uses rapid thermal
cycling with continuous step-stressed random vibration in the background. The
test should run until the unit-under-test (UUT) fails. Failures uncovered in this
test include solder joint problems, interconnect failures, intermittent components,
circuit design issues, etc.
It is essential that the UUT be powered and monitored during all the tests above
to catch any intermittent failures.
A HALT test is different from a margin test or design evaluation test. Although the
temperature and other stress margins are observed during HALT testing, this is
only a by-product. HALT testing must push the UUT into
failure to uncover any potential weaknesses or reliability problems. It is reliability
testing, not margin testing or design evaluation testing. A HALT test should always
focus on failure mechanisms, not on the stresses used.
Theoretically, a HALT test could use any test stresses and stress combinations as
long as the test is efficient at uncovering potential reliability issues and the failure
mechanisms uncovered are practically reasonable. For a HALT test plan to
succeed, it is essential to trace every failure to its root cause and to use
engineering judgment to come up with appropriate corrective actions.
First, all samples go through a mechanical thermal shock test (minimum 300 cycles)
with a temperature ramp rate in the range of 20 to 30 °C/min. This is to uncover
any solder joint and other mechanical failures. All modules are inspected by x-ray
for solder joint integrity after 150 cycles and at 300 cycles.
Also, one random sample is picked at 150 cycles and at 300 cycles to be de-capsulated,
depotted, and visually inspected for solder joint integrity.
After thermal shock, the samples are divided into two groups. The first group
will go through HTOL, and the other one goes to HTB.
HTOL is run at a few degrees below the module's high operating temperature
limit, as discovered in the HALT test, and with maximum load. Power cycling is
also desired during HTOL. HTOL typically lasts for 2,000 hours. Common failures
uncovered during HTOL include poor-quality components, such as electrolytic
capacitors and electro-mechanical devices, poor solder joints, overstressed
components, and circuit design problems.
The Humidity-Temperature-Bias (HTB) test is also called the 85/85 test (that is, 85 °C
plus 85% relative humidity). HTB is important for uncovering corrosion failures,
ionic migration (such as silver migration), dendrite growth, and package failures.
The HTB test typically lasts for 2,000 hours and is run at the minimum load level
to avoid self-heating effects.
After the HTOL and HTB tests, all samples go through a full function test, and the
results are compared with the full function test results from before the tests. Any
performance degradation is analyzed to its root cause.
HASS Test
HASS stands for Highly Accelerated Stress Screening. It is a 100% screening test
in the manufacturing process. HP requires DC-DC converter vendors to perform HASS
testing on their new modules using combined rapid thermal cycling plus random
vibration.
Compared with traditional ESS or burn-in (constant-temperature test), HASS uses
high stresses and combined stresses, and has been shown to be more efficient at catching
process problems and infant mortality failures.
6. Conclusions
Reliability is a complex task that requires integrated teamwork to accomplish.
The reliability work on a DC-DC converter spans topology design, component
selection and qualification, design, testing, and manufacturing, and does not end with
the first customer shipment. Reliability has to be designed in, tested in, and built
in. High Reliability = Failure-free Design + Failure-free Manufacturing. Design for
reliability is essential for a product to have high intrinsic reliability. Rigorous
reliability testing, qualification testing, and manufacturing screening testing are a must
to uncover any potential reliability risks and weaknesses before the product is shipped
to customers. Component reliability and application have been a bottleneck in DC-
References
1. C. Varga, Power converter topology and MOSFET selection for 48-V telecom
applications, Application Note 7004, Fairchild Semiconductor (2001).
2. K. Shenai, Made-to-order power electronics, IEEE Spectrum 7 (2000) 50-55.
3. K. Shenai, P. Singh and S. Rao, Power supply design for performance and reliability, in
IEEE Proc. National Aerospace and Electronics Conference (NAECON 2000) (2000)
(Dayton, OH, USA), pp. 524-531.
4. X. Tian and B. Edson, Design-for-reliability and sensitivity analysis based on
prediction, ASQ Reliability Review 4 (2003) 18-28.
5. B. Huffman, Build reliable power supplies by limiting capacitor dissipation, EDN 3
(1993) 93-98.
6. D. Zogbi, Low-ESR aluminum electrolytic failures linked to Taiwanese raw material
problem, Passive Component Industry Magazine (2002).
7. R. Alwitt and Y. Liu, Electrolytes for high voltage aluminum electrolytic capacitors,
Passive Component Industry Magazine (2000).
8. R. Lenk, Practical Design of Power Supplies (IEEE Press: New York, 1998).
9. G. Prophet, Power FETs find their place, EDN 4 (2003) 43-50.
10. Telcordia Technologies, Reliability prediction procedure for electronic equipment,
SR-332, Issue 1 (2001).