
4

Overview of Estimation
Techniques
Reliability tests are often costly. Whether or not a concept or design prototype
item survives a test, its time on test is a useful information measure that should
not be discarded. But how does one make effective use of this information?
In this chapter we show how this information can be used to derive
parametric point and interval estimates of reliability measures.
4.1 INTRODUCTION
The first step in the analysis of life data is the selection of a candidate distribution
from the set surveyed in Chapter 3. In order to fit a distribution, its parameters
must be estimated. Once this is done, other informative reliability metrics, such as
percentiles of the survival or failure distribution, can be estimated. In this chapter
we survey popular approaches for developing point estimates of parameters and
other reliability metrics and then introduce methods for developing the more
powerful confidence interval estimates of the same. We focus our discussion on the
application of these estimation techniques to the more popular Weibull, exponential,
and normal distributions. The methods work for any distribution, however. Per the
advice given in Chapter 3, the analyst is encouraged to apply a simple log
transformation of his or her data set if he or she wishes to conduct a lognormal
distribution fit. Extreme-value distribution fits can be tested using a Weibull fit
according to the recommendations of Chapter 3. Alternatively, reliability analysts
might want to utilize popular software such as Minitab or Reliasoft Weibull, which
have built-in capabilities for directly estimating the properties of a whole collection
of distributions used in reliability.
In this chapter the exponential distribution is treated as a special case
because the development of point and interval estimates of exponential properties
is straightforward: the exact sampling distribution of the exponential MTTF or
hazard-rate parameter is well known and related to the chi-square sampling
distribution. This is not the case for Weibull or incomplete normal data sets, for
which approximate methods must be used to obtain confidence interval estimates.
This chapter surveys four popular approaches for distribution fitting and
parameter estimation of Weibull and normal distribution metrics:
1. Graphical approaches: Probability plotting is widely used by reliability
   practitioners. Rank estimators of F(t) are plotted on special graph paper,
   constructed so that a good fit is evidenced by data points that closely hug
   a straight line drawn through the data. Parameter estimates may then be
   determined graphically.
2. Rank regression: The raw data used to construct a probability plot are
   analyzed by applying simple linear regression techniques to transformed
   rank statistics and raw observations. Parameter estimates, t-tests for
   significance, confidence intervals, etc. are developed from the analysis of
   variance and the least-squares fit to the data.
3. Maximum likelihood estimation: Maximum likelihood (ML) methods are
   based on formal theory and so are appealing to statisticians. The estimates
   tend to be biased for small samples but are asymptotically correct
   (consistent). In this chapter we make use of statistical computing packages
   such as Minitab and Reliasoft to generate ML properties.
4. Monte Carlo simulation: Monte Carlo (MC) methods constitute a
   simulation strategy wherein we resample from a distribution with known
   parameters. Given the availability of powerful yet inexpensive desktop
   computing resources, such methods are increasing in popularity. We
   demonstrate the use of WinSmith software for MC simulation.
Other estimation methods, including linear estimation methods and the
method of moments (and hybrid methods), are mentioned briefly.
4.2 RANK REGRESSION AND PROBABILITY PLOTTING TECHNIQUES
Probability plotting techniques have enjoyed immense popularity due to
1. The ease with which they can be generated, either manually or with
   computer-based methods.
2. The fact that the analyst can readily assess goodness-of-fit by examining
   how well the data can be fitted with a straight line.
In this section we survey the use of probability plotting techniques along
with the use of more formal rank regression methods for estimation.
4.2.1 Normal Probability Plotting Techniques
Definition. The normal probability plot is a plot of the (Zscore) order
statistic, $Z_i = \Phi^{-1}(\hat{F}(t_i))$, versus a usage characteristic, $t_i$, the
ordered times, on Cartesian paper, with the y-axis relabeled in cumulative
probabilities, $F(Z_i)$.
1. The recorded observations, $t_i,\ i = 1, 2, \ldots, n$, represent the ordered
   raw data in units of time or usage, with $t_1 \le t_2 \le \cdots \le t_i \le t_{i+1} \le \cdots \le t_n$.
   The index i, the order of occurrence or rank of the ordered failure times,
   must be adjusted if some of the observations are censored readings.
   Techniques for making this adjustment were presented in Chapter 2.
2. $\hat{F}(t_i)$ is an order statistic, such as $\hat{F}(t_i) = (i - 3/8)/(n + 1/4)$, the
   default rank estimator used internally by Minitab software in the construction
   of normal probability plots. The index i is the rank, or the adjusted rank if
   the data set contains censored observations.
3. $Z_i = \Phi^{-1}(\hat{F}(t_i))$, which converts the order statistic $\hat{F}(t_i)$ to a
   standard normal statistic, or Zscore, with the application of the standard
   normal inverse operator. By the properties of the normal distribution, 99.73%
   of Zscore statistics will fall in the range $-3$ to $3$. For small sample sets,
   100% of all order statistics will be in this range.
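The plotting positions and Zscores defined above are easy to compute directly. The following is a minimal sketch (Python is assumed here purely for illustration; the text itself works with Minitab, Excel, or plotting paper), using the Minitab-style rank estimator (i − 3/8)/(n + 1/4) and the standard normal inverse:

import numpy as np
from scipy.stats import norm

def normal_plot_points(times):
    """Return (t_i, p_i, Z_i) for an uncensored, ordered sample."""
    t = np.sort(np.asarray(times, dtype=float))
    n = len(t)
    i = np.arange(1, n + 1)
    p = (i - 3.0 / 8.0) / (n + 1.0 / 4.0)   # rank estimator F_hat(t_i)
    z = norm.ppf(p)                          # Zscore order statistic
    return t, p, z

# Example usage with a small hypothetical sample:
t, p, z = normal_plot_points([12.1, 14.3, 15.0, 16.8, 18.2])
# Plotting z (or p on a probability scale) against t and checking for
# linearity is the graphical adequacy check described above.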
The scatter of the points on the normal plot should appear linear if t is
normally distributed. If the fit is deemed adequate, it may be used to obtain
estimates of μ and σ². An eyeball fit of the plotted points is usually sufficient for
assessing the adequacy of the normal fit. If not, rank regression methods can be
used to assess the fit and arrive at formal estimates of the parameters of the
normal distribution. Details for the construction of a rank regression model are
discussed next.
Rank Regression Model
We begin with an expression for the cumulative distribution function of a
normally distributed characteristic and equate it to a rank estimator of $F(t_i)$,
such as $\hat{F}(t_i) = (i - 3/8)/(n + 1/4)$. Formally,
$$\hat{F}(t_i) = \underbrace{\Phi\!\left(\frac{t_i - \hat{\mu}}{\hat{\sigma}}\right)}_{\text{std. normal cdf}} = \underbrace{\frac{i - 3/8}{n + 1/4}}_{\text{rank estimator}} \qquad (4.1)$$
The standard normal inverse operator, $\Phi^{-1}$, is then applied to all of the
terms that appear in Eq. (4.1). The resultant relationship is linear between the
Zscore statistics and the ordered failure times, $t_i$:
$$\underbrace{Z\text{score}_i}_{y} = \Phi^{-1}\!\left(\frac{i - 3/8}{n + 1/4}\right) = \underbrace{-\frac{\hat{\mu}}{\hat{\sigma}}}_{\text{intercept}} + \underbrace{\frac{1}{\hat{\sigma}}}_{\text{slope}}\,\underbrace{t_i}_{x} \qquad (4.2)$$
Thus, we can fit the following regression model:
$$\underbrace{Z\text{score}_i}_{y} = \underbrace{-\frac{\mu}{\sigma}}_{\text{intercept}} + \underbrace{\frac{1}{\sigma}}_{\text{slope}}\,\underbrace{t_i}_{x} + e_i \qquad (4.3)$$
where $e_i$ is an N(0, 1) random variable.
Using Normal Probability Plotting Paper
When a computer is not nearby, the normal plotting paper of Ref. 4-1 in Appendix
4B can be used to visually assess a normal fit. The y-axis is already scaled in
standard normal (z) units; as such, the ordered pairs $(t_i, \hat{F}(t_i))$ can be
plotted directly on the paper.
Graphical Estimation
The best straight-line fit can be made visually. The mean, μ, can be estimated by
identifying the x-axis value corresponding to the 50th percentile. This point maps
to either Zscore = 0.0 in the Cartesian plot representation or to F(Zscore) = 0.50 in
the normal probability paper representation, the 50th percentile. Specifically,
$$\hat{\mu} = t_{0.50} \qquad (4.4)$$
The standard deviation, σ, can be estimated from the slope of the fitted
relationship. However, a simpler method is to take the difference between the
84th percentile of the fit, which corresponds to Z = 1, and the 50th percentile
(the mean, μ), which corresponds to Z = 0. That is,
$$\frac{t_{0.16} - \hat{\mu}}{\hat{\sigma}} = 1.0 = Z_{0.16} \;\Rightarrow\; t_{0.16} = \hat{\mu} + \hat{\sigma} = t_{0.50} + \hat{\sigma} \;\Rightarrow\; \hat{\sigma} = t_{0.16} - t_{0.50} \qquad (4.5)$$
The use of Eqs. (4.4) and (4.5) is illustrated in the worked-out example that
follows (Example 4-1).
Rank Regression
Rather than rely on the analyst's subjectivity, it is often desirable to develop
formal least-squares regression estimates of the parameters of the model
expressed by Eq. (4.2). Abernethy (1996, 1998) contends that it is preferable
to run an inverse regression; that is, a regression model run with time t as the
dependent variable (the x in Eq. (4.2)) and with Zscore as the independent
variable (the y in Eq. (4.2)). The reader is likely to feel a bit uncomfortable
running an inverse regression, since we usually think of y, our dependent variable,
as the order statistic, F̂(t_i), which appears on the y-axis of probability plots. Most
reliability software is set up this way. However, the arguments for the use of an
inverse regression model are compelling, and they follow.
In an inverse regression run, the Zscore values are order statistics. As such,
their values are predetermined by sample size, with the introduction of a potential
random component if random censoring is present. The failure times, however,
are totally random. Thus, it makes more sense to use Zscore as the independent,
regressor variable and the ordered failure times as the dependent variable. In this
case the least squares are minimized in the x-direction instead of the more
familiar y-direction. Through the use of simulation techniques, Abernethy (1998)
argues that there will be less bias in the regression fit if an inverse regression
model is run, that is, that inverse rank regression parameter estimates are more
consistent. Abernethy (1996) uses a simulation approach on 1000 samples
from a given Weibull distribution with θ = 1000 and β = 3.0 to demonstrate
the benefits of using inverse rank regression methods.
The inverse regression of the Zscore order statistic upon t will be of the
form
$$\underbrace{t_i}_{\text{new } y} = \underbrace{\mu}_{\text{intercept}} + \underbrace{\sigma}_{\text{slope}}\,\underbrace{Z\text{score}_i}_{\text{new } x} + e_i \qquad (4.6)$$
Setting Up Rank Regression Models
Accordingly, we suggest the following regression procedure to estimate μ and σ:
1. Create two columns of information:
   a. Ordered failures, t_i
   b. F̂(t_i) using adjusted ranks, if the data set contains censored observations
   Only ordered failures and adjusted ranks are analyzed.
2. Create a column of Zscores, where Zscore = Φ^(-1)(F̂(t_i)). Note that in
   Excel this is accomplished using the function NORMSINV(F̂(t_i)).
3. Run a simple linear regression, with independent variable (x) the Zscore
   and dependent variable (y) the ordered failure times.
4. μ̂ = intercept in the inverse regression model.
5. σ̂ = slope in the inverse regression model.
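The procedure above maps directly onto a few lines of code. The following is a minimal sketch (Python is assumed purely for illustration; the text itself uses Excel and Minitab), applied to the bearing failure times of Example 4-1, which follows. Because the suspensions in that example all occur after the last recorded failure, the adjusted ranks equal the ordinary ranks 1 through 11 with n = 15; the fitted intercept and slope should land close to the μ̂ and σ̂ reported in the example.

import numpy as np
from scipy.stats import norm

# Ordered bearing failure times (hundred-hr) from Table 4-1; n = 15 items
# were tested, so the rank estimator uses n = 15 even though only 11 failed.
blife = np.array([70.1, 72.0, 75.9, 76.2, 82.0, 84.3,
                  86.3, 87.6, 88.3, 89.1, 89.4])
n = 15
ranks = np.arange(1, len(blife) + 1)          # adjusted ranks (no early suspensions)
p = (ranks - 3.0 / 8.0) / (n + 1.0 / 4.0)     # rank estimator F_hat(t_i)
z = norm.ppf(p)                               # Zscore = NORMSINV(F_hat)

# Inverse regression: ordered failure times (y) on Zscore (x), per Eq. (4.6)
slope, intercept = np.polyfit(z, blife, 1)
mu_hat, sigma_hat = intercept, slope
print(f"mu_hat = {mu_hat:.2f} hundred-hr, sigma_hat = {sigma_hat:.2f} hundred-hr")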
Example 4-1: (Right-censored data set)
Data is collected on the wearout of n = 15 ball bearings in hundred-hr. The test is
stopped at 90 hundred-hr, and only 11 failures are recorded. The following
information is summarized in Table 4-1:
a. Ranks
b. Ordered bearing life times (BLife)
c. Rank estimator, $p_i = (i - 3/8)/(n + 1/4)$
d. Zscore = $\Phi^{-1}(p_i)$

TABLE 4-1 Bearing Life Data (in hundred-hr)

i     BLife    (i − 3/8)/(n + 1/4)    Zscore
1      70.1          0.041            −1.74
2      72.0          0.107            −1.24
3      75.9          0.172            −0.94
4      76.2          0.238            −0.71
5      82.0          0.303            −0.51
6      84.3          0.369            −0.33
7      86.3          0.434            −0.16
8      87.6          0.500             0.00
9      88.3          0.566             0.16
10     89.1          0.631             0.33
11     89.4          0.697             0.51
Analysis. A normal probability plot will be constructed using two different
procedures:
1. The Minitab statistical computing package was used to automatically
   output a normal probability plot of the data with superimposed 95%
   confidence bands. The output is displayed in Figure 4-1. All points
   reside within the 95% confidence band, but we do observe points at
   each tail that appear to deviate significantly from the straight line. As
   such, the adequacy of the normal fit must be in doubt. A goodness-of-fit
   test might help to resolve this issue further. Goodness-of-fit tests are
   surveyed in Chapter 5. The parameters μ and σ are estimated as follows:
   $$\hat{\mu} = t_{0.50} \approx 88 \text{ hundred-hr}$$
   $$\hat{\sigma} = t_{0.16} - t_{0.50} \approx 101 - 88 = 13 \text{ hundred-hr}$$

FIGURE 4-1 Normal probability plot of bearing life data with 95% confidence bands (test
stopped at 90 hundred-hr) and annotated graphical estimates of μ and σ.

2. A plot of the Zscore statistic versus bearing life with a superimposed
   best straight-line fit is presented in Figure 4-2. We construct an
   inverse regression model to estimate μ and σ. The fitted relationship is
   expressed as
   $$\text{BLife} = 86.17 + 10.08\,Z\text{score}$$
   Based on Eq. (4.6), the parameters are estimated as
   $$\hat{\mu} = 86.17 \text{ hundred-hr}, \qquad \hat{\sigma} = 10.08 \text{ hundred-hr}$$
FIGURE 4-2 Bearing life (hundred-hr) vs. Zscore with superimposed least-squares fit from
Minitab Stat > Regression > Fitted Line Plot.

4.2.2 Weibull Probability Plotting Techniques
We begin with the Weibull survival function,
$$R(t) = e^{-(t/\theta)^{\beta}}$$
and take the natural log of both sides, which yields
$$\ln R(t) = -\left(\frac{t}{\theta}\right)^{\beta}$$
By taking the natural log of both sides again, we have the linear relation
$$\underbrace{\ln \ln\!\left(\frac{1}{1 - \hat{F}(t)}\right)}_{y} = \underbrace{\beta}_{\text{slope}}\,\underbrace{\ln t}_{x} \;\underbrace{-\;\beta \ln \theta}_{\text{intercept}} \qquad (4.7)$$
Thus, a plot of the rank statistic, the double log of $1/(1 - \hat{F}(t_i))$, ought to
vary linearly with ln(t) when failure times follow a Weibull distribution. Weibull
plotting paper is based on this relationship. It is constructed by scaling the x-axis
in natural log units while scaling the y-axis in units of the double log of the
inverse of $1 - \hat{F}(t)$. It becomes the familiar Weibull plot once the y-axis is
relabeled in cumulative probabilities of F.
Sample Weibull paper is presented in Ref. 4-2 in Appendix 4B. A graphical
protractor for estimating β is annotated on the paper (Ford, 1972).
1. To estimate β, we make use of the protractor in the upper left corner,
   estimating β as the slope of the fitted relationship.
2. To estimate θ, we make use of the relationship
$$F(\theta) = 1 - \exp(-(\theta/\theta)^{\beta}) = 1 - e^{-1} = 0.632 \;\Rightarrow\; \hat{\theta} = t_{0.368}$$
Rank Regression
We parallel the discussion that was used to describe the development of rank
regression estimators of the normal parameters, μ and σ. We use the linear
relationship expressed by Eq. (4.7) to develop rank regression estimates of β and
θ. Rather than regress the double log of the rank estimator on ln t, an inverse
rank regression model is used, where the left-hand side of (4.7) is treated as the
independent variable. The inverse regression is of the form
$$\underbrace{\ln t}_{\text{new } y} = \underbrace{\frac{1}{\beta}}_{\text{slope}}\,\underbrace{\ln(-\ln R(t))}_{\text{new } x} + \underbrace{\ln \theta}_{\text{intercept}} + e \qquad (4.8)$$
We now demonstrate the use of inverse regression techniques for Weibull
analysis.
Example 4.2: Multiply right-censored Weibull data set
A group of n = 20 electric motors was tested for 200 K revolutions. Two motors
were removed from the test for other reasons at 30 K and 35 K revs, respectively.
Two motors were still operational at 200 K revs. Thus, only a total of r = 16
failures were recorded.
The data set is presented in Table 4-2, along with median rank statistics
based on the adjusted ranks of the multiply censored data set.
Analysis. Empirical estimates of F̂(t) were obtained with the use of
Minitab's default estimator, $\hat{F}(t_i) = (\text{adjusted rank}_i - 3/8)/(n + 1/4)$.
A Weibull plot of the data set is shown in Figure 4-3. The data appear to
hug the superimposed straight-line fit, which is indicative of a good fit. We now
illustrate the use of probability plotting and rank regression techniques for
estimating θ and β.
TABLE 4-2 Multiply Censored Test Data on n = 20 Electric Motors

Krevs    InvRank    AdjRank    F̂(t)
20          20        1.00     0.031
25          19        2.00     0.080
30*         18         —         —
35*         17         —         —
41          16        3.12     0.135
53          15        4.24     0.191
60          14        5.35     0.246
75          13        6.47     0.301
80          12        7.59     0.356
84          11        8.71     0.411
95          10        9.82     0.467
128          9       10.94     0.522
130          8       12.06     0.577
139          7       13.18     0.632
152          6       14.29     0.687
176          5       15.41     0.743
176          4       16.53     0.798
180          3       17.65     0.853
200*         2         —         —
200*         1         —         —

Note: * denotes a suspended item.
Graphical Estimation (see Figure 4-3)
$$\hat{\beta} \approx 1.60, \qquad \hat{\theta} \approx 130 \text{ K revs}$$

FIGURE 4-3 Weibull plot of data in Table 4-2.
Inverse Rank Regression
The results of an inverse rank regression of ln t (y-values) versus
doublnR = $\ln \ln[1/(1 - \hat{F}(t))]$ (x-values) are presented in Figure 4-4. Note the
extremely high level of significance of the fit, with p-values equal to 0 to 3
decimal places and an R² of 98.1%. This is to be expected, as the use of order
statistics induces a correlation between time and the median rank statistics.
The fitted relationship is
$$\ln T = 4.899 + 0.599\,\text{doublnR}$$
$$\hat{\beta} = \text{inverse of the slope of the fit} = (0.599)^{-1} = 1.67$$
$$\ln\hat{\theta} = \text{intercept of fit} = 4.899 \;\Rightarrow\; \hat{\theta} = 134.2 \text{ K revs}$$

FIGURE 4-4 Inverse regression analysis of ln t vs. ln ln(1/R) provided by Minitab.
Commencing with version 13, Minitab has introduced a built-in capability
for generating inverse rank regression estimates. The Minitab output is presented
in Figure 4-5. The parameter estimates differ slightly, perhaps due to a difference
in the internal choice of rank estimator used in the routine.
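For readers who prefer to reproduce the inverse rank regression outside of Minitab, the following is a minimal sketch (Python assumed for illustration) that fits Eq. (4.8) to the 16 failure rows of Table 4-2, using the F̂(t) values already tabulated there. The intercept and reciprocal slope should land close to the θ̂ ≈ 134 K revs and β̂ ≈ 1.67 obtained above; small differences can arise from rounding of the tabulated ranks.

import numpy as np

# Failure times (Krevs) and tabulated rank estimates F_hat(t) from Table 4-2
t = np.array([20, 25, 41, 53, 60, 75, 80, 84, 95, 128,
              130, 139, 152, 176, 176, 180], dtype=float)
F = np.array([0.031, 0.080, 0.135, 0.191, 0.246, 0.301, 0.356, 0.411,
              0.467, 0.522, 0.577, 0.632, 0.687, 0.743, 0.798, 0.853])

x = np.log(-np.log(1.0 - F))        # doublnR = ln(-ln R), the new x in Eq. (4.8)
y = np.log(t)                       # ln t, the new y in Eq. (4.8)

slope, intercept = np.polyfit(x, y, 1)
beta_hat = 1.0 / slope              # beta = 1/slope
theta_hat = np.exp(intercept)       # ln(theta) = intercept
print(f"beta_hat = {beta_hat:.2f}, theta_hat = {theta_hat:.1f} Krevs")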
FIGURE 4-5 Inverse rank regression capability of Minitab V13.1 for the multiply censored
data set of Table 4-2.

4.3 MAXIMUM LIKELIHOOD ESTIMATION
4.3.1 Introduction to ML Estimation
We generalize our discussion by considering a multiply right-censored data set
consisting of independent observations, $t_1 \le t_2 \le \cdots \le t_{n-1} \le t_n$, with associated
survival function R(t; θ) and failure density function f(t; θ), where θ is an array
of one or more unknown parameters that are to be estimated (see Table 4-3).
The likelihood function, L(θ), is constructed as follows:
$$L(\theta) = \prod_{i=1}^{n} L_i(\theta) \qquad (4.9)$$
where each likelihood term $L_i(\theta)$ is replaced with
$$L_i(\theta) = \begin{cases} f(t_i;\,\theta) & \text{if } t_i \text{ is a recorded failure} \\ R(t_i;\,\theta) & \text{if } t_i \text{ is a right-censored observation} \end{cases}$$
The maximum likelihood (ML) estimator of θ is just that unique value of
θ, if it should exist, that maximizes L(θ). ML approaches enjoy several advantages
over rank regression methods for parameter estimation. Due to the nature of
least-squares estimation, the regression fit can be excessively influenced by
observations in the tail regions of the distribution. In the left-tail region, this
might be considered a benefit, since it is the region of greatest interest (earliest
failures). However, this would never be the case for the right-tail region! On the
other hand, ML methods constitute a formal framework for estimation which, for
large samples (asymptotically), is unbiased (consistent) and of minimum variance
(asymptotically efficient). For small samples, however, ML estimates are biased
estimators; that is, $E(\hat{\theta}) \ne \theta$.
A more formal overview of likelihood estimation is presented in Chapter 7,
which illustrates the use of Excel procedures for obtaining likelihood estimates
and asymptotic, approximate confidence intervals on reliability metrics.
4.3.2 Development of Likelihood Confidence Intervals
Confidence interval estimation comprises a much more effective strategy for
conveying estimates. When we develop a single point estimate, θ̂, of θ, we do not
associate a level of confidence with it. We recognize that each time a new sample
is collected, a different point estimate will be obtained. For that reason, θ̂ has a
sampling distribution with expected value E[θ̂], which will equal θ if the
estimator is unbiased, and with standard error $se_{\hat{\theta}} = \sqrt{E\big(\hat{\theta} - E[\hat{\theta}]\big)^2}$.
Knowledge of the sampling distribution can then be used to develop a
confidence interval estimate of θ and an associated level of confidence,
C = 1 − α, that the true value of θ lies in the constructed interval.

TABLE 4-3 θ Is an Array of Parameters

Distribution       θ
Exponential        λ or θ
Normal             (μ, σ)
Lognormal          (t_med, σ)
Weibull            (β, θ) or (β, θ, δ)
EVD                (μ, σ) or (ln θ, σ)
Definition. A confidence interval about one or more parameters θ, or a
function of one or more parameters f(θ), has associated with it a level of
confidence, C, on the likelihood that the confidence interval contains the true
value, θ. It will be of one of the following forms:
Two-sided confidence interval on θ:
$$P(\theta_L \le \theta \le \theta_U) \ge C \qquad (4.10)$$
One-sided confidence interval on θ:
$$P(\theta \ge \theta_L) \ge C \quad \text{(one-sided lower confidence interval)}$$
$$\text{or} \quad P(\theta \le \theta_U) \ge C \quad \text{(one-sided upper confidence interval)} \qquad (4.11)$$
The mathematics of constructing confidence intervals is closely related to
hypothesis testing. Given a sample of size n from a population, a confidence
interval consists of all values of θ = θ₀ for which the hypothesis θ = θ₀ is
accepted. For two-sided confidence intervals, we test θ = θ₀ versus θ ≠ θ₀; for
one-sided confidence intervals, we test θ = θ₀ versus θ < θ₀ or θ > θ₀.
However, if the sampling distribution is not known, the confidence interval
must be approximated, and this leads to concerns about the efficacy of such
intervals. For data sets consisting of one or more suspensions, expressions for the
exact confidence limits on normal reliability metrics such as μ, σ, or t_R cannot be
obtained due to the inability to derive expressions for the sampling distributions
of μ̂, σ̂, and t̂_R. For all data sets, suspended or complete, the same is true for the
development of expressions for the sampling distributions of the Weibull metrics
θ̂, β̂, and t̂_R.
Instead, we must take advantage of the asymptotic (large-sample) properties
of likelihood estimators to arrive at approximate confidence intervals on the
distributional parameters or reliability metrics of interest. Two types of approximations
are commonly used. They are
1. Fisher matrix (FM) asymptotic intervals
2. Likelihood ratio (LR) based procedures
These procedures are discussed in Chapter 9. Fisher matrix (FM) intervals
are based on the normal approximation of a sum of n partial derivatives of a log-likelihood
function. (Normality is justified by the central limit theorem.) Likelihood
ratio (LR) procedures are based on the asymptotic distribution of a ratio of
likelihood functions, as used in hypothesis testing. That is, an LR confidence
interval may be viewed as consisting of all values of θ = θ₀ for which the
hypothesis test H₀: θ = θ₀ is not rejected, as explained in Section 9.2.2. These
procedures require the use of unconstrained, nonlinear optimization methods for
Weibull and incomplete normal data sets. The underlying mathematics and
procedures for generating these confidence intervals are deferred to Chapter 9,
wherein we introduce the use of the Excel Tools > Solver or Goal Seek routine
for carrying out the optimization.
Fortunately, with the advent of good statistical computing packages, the
practitioner need not be familiar with the algorithms used to generate these limits.
We will demonstrate this capability with the use of Minitab in worked-out
examples later on.
Efficacy of Approximate Confidence Intervals
We must remember that confidence intervals are usually constructed based on
outcomes from a single sample. If the process of obtaining a random sample is
repeated and the confidence limits are recalculated, the limits will, due to sample
variation, be different each time. That is, both the center and the width of
confidence intervals are affected by sample variation.
Thus, confidence intervals should possess the following desirable properties:
1. They must be of minimum width, using either the width H = θ_U − θ_L
   for two-sided confidence intervals or the half-width H = θ̂ − θ_L or
   H = θ_U − θ̂ for one-sided confidence intervals.
2. Ideally, their coverage must be C·100%, where coverage is defined as
   the long-run proportion of time that the true value of θ will lie in the
   confidence interval if the process of taking a random sample and
   recalculating the confidence limits is repeated a large number of times.
Most of the classical confidence limit expressions for complete samples on
the normal parameters, μ and σ, do possess this ideal property. However, for
incomplete samples, we often make use of the asymptotic (large-sample)
properties of the maximum likelihood estimator to devise approximate confidence
intervals on parameters or reliability metrics of interest. In such cases we must
look at both interval width and coverage when evaluating the efficacy of these
approximations.
Monte Carlo (MC) simulation constitutes an effective methodology for
evaluating the efficacy of confidence interval limits. Here, MC methods are used
to create many samples from a known distribution. The coverage percentage, or
proportion of times that the true value of a parameter is contained in the
confidence interval, is tabulated along with information on the average width
or half-width of the intervals. Tradeoffs in coverage and width must be considered
when evaluating several competing confidence limit expressions. The best
C·100% confidence interval will then be that confidence interval having the
shortest width among all confidence intervals that have exactly C coverage.
Monte Carlo estimation is discussed in Section 4.4 and Appendix 4A.
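The coverage-checking idea described above is easy to prototype. The following is a minimal Monte Carlo sketch (Python assumed for illustration, with hypothetical settings for the sample size and number of replicates) that estimates the coverage and average width of the classical two-sided t-interval on μ for complete normal samples; the same loop structure applies to any competing interval expression.

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
mu_true, sigma_true = 100.0, 10.0       # known distribution used to generate samples
n, C, n_rep = 15, 0.90, 5000            # hypothetical study settings

hits, widths = 0, []
tcrit = stats.t.ppf(1 - (1 - C) / 2, df=n - 1)
for _ in range(n_rep):
    x = rng.normal(mu_true, sigma_true, size=n)
    half = tcrit * x.std(ddof=1) / np.sqrt(n)
    lo, hi = x.mean() - half, x.mean() + half
    hits += (lo <= mu_true <= hi)
    widths.append(hi - lo)

print(f"coverage ~ {hits / n_rep:.3f} (target {C}), mean width ~ {np.mean(widths):.2f}")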
4.3.3 Maximum Likelihood Estimation of Normal Parameters, μ and σ, for Complete Sample Sets
For complete data sets, the ML estimates of μ and σ are well known and
presented here:
$$\hat{\mu} = \bar{x} \qquad (4.12)$$
$$\hat{\sigma}^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n} = \left(\frac{n-1}{n}\right)s^2 \qquad (4.13)$$
Note: σ̂² is a biased estimator of σ², since $E(\hat{\sigma}^2) = [(n-1)/n]\,E(s^2) = [(n-1)/n]\,\sigma^2$.
Confidence Interval Estimates of μ, σ, and Percentiles of the Normal
Distribution for Complete Samples
Knowledge of the sampling distributions of μ̂ and σ̂ is needed to create exact
confidence intervals on μ and σ. When data sets are complete, that is, when every
item on test has failed, standard sampling distributions such as the standard normal,
t, or chi-squared distributions are used to construct these confidence intervals.
This is not the case when samples consist of one or more suspensions, as the
sampling distributions of μ̂ and σ̂ must then be approximated in some way.
The confidence intervals will be of the form
$$P(\mu_L \le \mu \le \mu_U) \ge C \qquad (4.14)$$
and
$$P(\sigma \le \sigma_U) \ge C \qquad (4.15)$$
Only a one-sided confidence interval on σ is shown in Eq. (4.15), as we
recognize that we are interested only in obtaining information relevant to setting
an upper bound on σ.
The familiar confidence interval expressions on μ and σ, cited in every
introductory engineering statistics textbook, are based on the assumption that the
data set is complete. These expressions, repeated here, are based on the use of
normal sampling distributions, t and chi-squared, each with n − 1 degrees of
freedom. These standard distributions are used to model the sampling
distributions of $\bar{t}$ and $s^2$, respectively:
$$\frac{\bar{t} - \mu}{s/\sqrt{n}} \sim T_{n-1}, \qquad \frac{(n-1)s^2}{\sigma^2} \sim \chi^2_{n-1} \qquad (4.16)$$
This leads immediately to the well-known confidence interval expressions
on μ and σ², which are presented in Eqs. (4.17) and (4.18):
$$P\!\left(\underbrace{\hat{\mu} - t_{n-1,(1-C)/2}\sqrt{\frac{s^2}{n}}}_{\mu_L} \;\le\; \mu \;\le\; \underbrace{\hat{\mu} + t_{n-1,(1-C)/2}\sqrt{\frac{s^2}{n}}}_{\mu_U}\right) \ge C \qquad (4.17)$$
$$P\!\left(\sigma^2 \le \underbrace{\frac{(n-1)s^2}{\chi^2_{n-1,C}}}_{\sigma^2_U}\right) \ge C \qquad (4.18)$$
A graphical representation of the right-tailed percentiles, $t_{n-1,\alpha}$ and $\chi^2_{n-1,\alpha}$,
is presented in Figures 4-6 and 4-7, along with information on how to use Excel's
statistical functions to evaluate percentiles of the t- and χ²-distributions,
respectively. (Unlike many textbooks, the author has elected not to reproduce
tables of normal sampling distribution percentiles. They can be easily obtained
with the use of Excel.)
Example 4-3: Confidence intervals on μ and σ², no censoring allowed
Data is collected on a machine characteristic (thread depth). Based on a sample of
size n = 12, average thread depth is 15 ten-thousandths with a sample variance of
s² = 3.7 (ten-thousandths)².
Construct a two-sided confidence interval on the mean depth μ and a one-sided
upper confidence interval on the variance, σ². Use C = 90%.
Solution:
Confidence limits on μ using Eq. (4.17):
$$\mu_L = \hat{\mu} - t_{11,0.05}\,\frac{s}{\sqrt{n}} = 15 - 1.796\sqrt{\frac{3.7}{12}} = 14.002 \text{ ten-thousandths}$$
$$\mu_U = \hat{\mu} + t_{11,0.05}\,\frac{s}{\sqrt{n}} = 15 + 1.796\sqrt{\frac{3.7}{12}} = 15.997 \text{ ten-thousandths}$$
Confidence limit on σ² using Eq. (4.18):
$$\sigma^2_U = \frac{(n-1)s^2}{\chi^2_{11,0.90}} = \frac{(12-1)\times 3.7}{5.578} = 7.296 \text{ (ten-thousandths)}^2$$

FIGURE 4-6 Evaluating percentiles of the t-distribution with the use of Microsoft Excel.

FIGURE 4-7 Evaluating percentiles of the chi-squared distribution with the use of Microsoft Excel.
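As a check on the hand calculation above, the same limits can be produced with a few lines of code. The following is a minimal sketch (Python assumed; the text itself uses Excel's built-in percentile functions) for the thread-depth data of Example 4-3.

from math import sqrt
from scipy import stats

n, xbar, s2, C = 12, 15.0, 3.7, 0.90

# Two-sided limits on mu, Eq. (4.17); t-value with (1-C)/2 in each tail
tcrit = stats.t.ppf(1 - (1 - C) / 2, df=n - 1)        # = t_{11, 0.05} = 1.796
mu_L = xbar - tcrit * sqrt(s2 / n)
mu_U = xbar + tcrit * sqrt(s2 / n)

# One-sided upper limit on sigma^2, Eq. (4.18); chi-square value with
# right-tail area C (i.e., the (1-C) quantile)
chi2_val = stats.chi2.ppf(1 - C, df=n - 1)            # = chi^2_{11, 0.90} = 5.578
var_U = (n - 1) * s2 / chi2_val

print(f"mu in [{mu_L:.3f}, {mu_U:.3f}], sigma^2 <= {var_U:.3f}")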
Confidence Intervals on Percentiles of the Survival Distribution Based
on Complete Sample Information
For complete data sets (that is, a failure time for every item placed on test), the
survival percentile of the normal distribution is calculated using (see Table 3-2)
$$\hat{t}_R = \hat{\mu} + \hat{\sigma}Z_R \qquad (4.19)$$
Alternatively, Excel can be directly employed to evaluate survival percentiles,
as shown earlier.
Exact expressions for obtaining interval estimates of t_R are based on the
finding that the pivotal quantity $(\hat{t}_R - t_R)/s$ is distributed according to a
noncentral t-distribution with noncentrality parameter $Z_R\sqrt{n}$. Statistical computing
packages have built-in functions for generating confidence intervals on t_R.
FIGURE 4-8 Evaluating the 90th percentile of the survival distribution with the use of
Microsoft Excel for a normally distributed characteristic with mean 10 and standard
deviation 2.0.

4.3.4 ML Estimation of Normal Parameters μ and σ² in the Presence of Censoring
In the real world, data sets are rarely complete: suspensions can occur randomly
due to either unforeseen circumstances or competing failure modes. Additionally,
resource limitations on the availability of test fixtures and time constraints due to
product lead-time reduction pressures result in early withdrawal of items under
test. Accordingly, today's practitioners need to be familiar with the variety of
methods available for analyzing censored data. Most practitioners are not well
versed in the complexities involved in the generation of maximum likelihood
estimates of μ and σ in the presence of censoring.
In such instances the resultant relationships for obtaining ML estimates of μ
and σ require the use of a nonlinear search procedure such as the Newton-Raphson
numerical method. For our advanced readers, the mathematical relationships
for deriving ML estimates of μ and σ² are presented in Chapter 7, along
with an introduction to the use of Excel for generating easy, one-step ML
estimates of μ and σ.
Fortunately, many of today's statistical computing packages have built-in
capabilities for obtaining maximum likelihood estimates of μ and σ². We
illustrate the use of Minitab for obtaining ML estimates of μ and σ for the
bearing life data of Table 4-1. The output from Minitab is shown in Figure 4-9. To
develop asymptotically correct confidence intervals on the normal parameters μ
and σ, we make use of Minitab's built-in capability for generating Fisher matrix
confidence intervals. Both point and 95% confidence interval estimates of μ
(location parameter) and σ (scale parameter) are shown in Figure 4-9. The reader
is encouraged to compare the ML parameter estimates, μ̂ = 85.56 and σ̂ = 8.71,
with either of the two rank regression estimates worked out in Section 4.2.1.
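A censored-data ML fit of this kind can also be set up directly from the likelihood of Eq. (4.9): density terms for the recorded failures and survival terms for the suspensions. The following is a minimal sketch (Python with scipy assumed for illustration; it is not the Minitab or Excel procedure the text describes) applied to the bearing data of Table 4-1, with the four unfailed bearings treated as right-censored at 90 hundred-hr. The optimizer should land close to the μ̂ = 85.56 and σ̂ = 8.71 quoted above.

import numpy as np
from scipy import stats, optimize

failures = np.array([70.1, 72.0, 75.9, 76.2, 82.0, 84.3,
                     86.3, 87.6, 88.3, 89.1, 89.4])   # recorded failures
censored = np.full(4, 90.0)                           # 4 items suspended at 90

def neg_log_lik(params):
    """Negative log-likelihood, Eq. (4.9): logpdf for failures, logsf for suspensions."""
    mu, sigma = params
    if sigma <= 0:
        return np.inf
    ll = stats.norm.logpdf(failures, mu, sigma).sum()
    ll += stats.norm.logsf(censored, mu, sigma).sum()
    return -ll

res = optimize.minimize(neg_log_lik, x0=[failures.mean(), failures.std()],
                        method="Nelder-Mead")
mu_hat, sigma_hat = res.x
print(f"mu_hat = {mu_hat:.2f}, sigma_hat = {sigma_hat:.2f} (hundred-hr)")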
4.3.5 ML Estimation of Weibull Parameters θ and β
The likelihood equations for estimating the Weibull parameters, θ and β, are derived
in Chapter 7. For multiply right-censored data sets, the relationships for
developing ML estimates of the Weibull parameters θ and β reduce to
$$\frac{1}{\beta} + \frac{\sum_{i=1}^{n}\delta_i \ln t_i}{r} - \frac{\sum_{i=1}^{n} t_i^{\beta}\ln t_i}{\sum_{i=1}^{n} t_i^{\beta}} = 0 \qquad (4.20)$$
$$\hat{\theta} = \left(\frac{\sum_i t_i^{\beta}}{r}\right)^{1/\beta} \qquad (4.21)$$

FIGURE 4-9 ML estimates and 95% confidence intervals provided by Minitab V13.1
Stat > Reliability/Survival > Parametric Right-Censored procedure.
Note that in Eq. (4.20) we make use of the indicator variable
$$\delta_i = \begin{cases} 1 & \text{if } t_i \text{ is a recorded failure} \\ 0 & \text{if } t_i \text{ is a right-censoring time} \end{cases}$$
The identification of a value of β that satisfies Eq. (4.20) generally requires
the use of nonlinear, gradient search procedures. In Chapter 9 we present the use
of Excel's Tools > Solver or Tools > Goal Seek procedure for deriving β̂. Once β̂
is identified, Eq. (4.21) can then be used to derive θ̂. Here we will make use of the
strong features of Reliasoft Weibull V6.0 (2001) software to develop ML
estimates of θ and β for the Weibull data set of Table 4-2. The Weibull software
has the capability to generate likelihood contours, which are reproduced in Figure
4-10. The contours are seen to be fairly well behaved. As such, most nonlinear
gradient search routines should work well. There are challenges, however, which
are discussed in greater detail in Chapter 9.
ML estimates are readily provided by Reliasoft and reproduced in Figure
4-11. The ML estimates are
$$\hat{\beta} = 1.77848 \quad \text{and} \quad \hat{\theta} = 132.8035$$

FIGURE 4-10 Likelihood contours generated by Reliasoft Weibull V6.0 software (2001).
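Because Eq. (4.20) involves only β, it can also be solved directly with a one-dimensional root finder, after which Eq. (4.21) gives θ̂. The following is a minimal sketch (Python assumed for illustration; the bracketing interval [0.5, 5] is an assumption) for the motor data of Table 4-2. It should reproduce estimates close to the β̂ ≈ 1.78 and θ̂ ≈ 132.8 reported above.

import numpy as np
from scipy.optimize import brentq

# Table 4-2: all 20 observations (Krevs); delta = 1 for failures, 0 for suspensions
t = np.array([20, 25, 30, 35, 41, 53, 60, 75, 80, 84, 95, 128,
              130, 139, 152, 176, 176, 180, 200, 200], dtype=float)
delta = np.array([1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1,
                  1, 1, 1, 1, 1, 1, 0, 0], dtype=float)
r = delta.sum()

def g(beta):
    """Left-hand side of Eq. (4.20); its root is the ML estimate of beta."""
    tb = t ** beta
    return (1.0 / beta
            + (delta * np.log(t)).sum() / r
            - (tb * np.log(t)).sum() / tb.sum())

beta_hat = brentq(g, 0.5, 5.0)                                # assumed bracketing interval
theta_hat = ((t ** beta_hat).sum() / r) ** (1.0 / beta_hat)   # Eq. (4.21)
print(f"beta_hat = {beta_hat:.3f}, theta_hat = {theta_hat:.1f} Krevs")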
To develop asymptotically correct confidence intervals on the Weibull
parameters θ and β, we make use of Minitab's built-in capability for generating
Fisher matrix confidence intervals. We reproduce the output from the Minitab
Stat > Reliability/Survival > Parametric Right Censored procedure in Figure
4-12 for the sample Weibull data set. Point and interval estimates of the Weibull
parameters are summarized there.
Using Monte Carlo simulation, Abernethy (1996) shows that confidence
interval approximations of this type can perform very poorly. For example, for
Weibull data, 90% asymptotic confidence interval approximations have been
shown to have coverage percentages as low as 75% when sample sizes are as
small as n = 20.
4.4 SIMULATION-BASED APPROACHES FOR THE DEVELOPMENT OF NORMAL AND WEIBULL CONFIDENCE INTERVALS
With the advent of widely available, powerful desktop computing resources and
analysis software, we are witnessing a return to simulation-based approaches. As
a prime example of this statement, consider the fact that WinSmith Weibull
Analysis software now includes built-in capabilities for devising Monte Carlo-based
confidence intervals. Monte Carlo (MC) simulation procedures have been
developed for obtaining approximations to the exact confidence intervals about
Weibull and normal parameters and reliability metrics (see Lawless, 1982, pp.
226-232).

FIGURE 4-11 ML estimates of beta and theta from Reliasoft Weibull V6.0 software.

The MC capabilities of WinSmith software were tested on the Weibull data
set of Table 4-2. Monte Carlo (MC) samples are drawn from a hypothetical
Weibull distribution with θ = θ̂ and β = β̂. This is a form of parametric
bootstrap sampling (see Hjorth, 1994, Section 6.1, and Meeker and Escobar, 1998).
The results are shown next.
Example 4.4: Monte Carlo confidence limits for Weibull data set of Table 4-2
MC confidence intervals were generated for the data set presented in Table 4-2.
The output report from WinSmith is presented and summarized in Figure 4-13.
FIGURE 4-12 ML point and interval estimates of β, θ, $t_{0.90}$, and R(100) provided by the
Minitab Stat > Reliability/Survival > Par. Distr. Analysis-Right Censored procedure.

The use of MC methods for the development of approximate confidence
intervals is discussed in greater detail in Appendix 4A.1. A word of caution is in
order:
The use of MC simulation on multiply right-censored observations is still an
open area of research. Neither of the approaches outlined in the appendix has
been proven. The methodology does seem useful, however. The reader
should consult Appendix 4A.1, which provides an overview of strategies for
obtaining MC intervals along with two worked-out examples illustrating the
use of Minitab macros for manually generating MC confidence limits.
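The parametric bootstrap idea is straightforward to sketch in code. The following illustration (Python assumed; it is not the WinSmith procedure, and for simplicity it resamples complete samples of size n rather than reproducing the censoring pattern of Table 4-2, which is one of the open issues noted above) draws samples from a Weibull distribution with the fitted parameters, refits each sample by ML, and reports percentile confidence limits.

import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
beta_hat, theta_hat, n = 1.778, 132.8, 20   # fitted values and sample size (Table 4-2)
B, C = 1000, 0.90                           # bootstrap replicates and confidence level

boot_beta, boot_theta = [], []
for _ in range(B):
    sample = theta_hat * rng.weibull(beta_hat, size=n)    # draw from Weibull(beta, theta)
    c, loc, scale = stats.weibull_min.fit(sample, floc=0) # refit by ML (shape c, scale)
    boot_beta.append(c)
    boot_theta.append(scale)

lo, hi = 100 * (1 - C) / 2, 100 * (1 + C) / 2
print("beta  CI:", np.percentile(boot_beta, [lo, hi]))
print("theta CI:", np.percentile(boot_theta, [lo, hi]))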
FIGURE 4-13 Monte Carlo percentile confidence limits and point estimates for θ, β, and $t_{0.90}$.

4.5 OTHER ESTIMATORS
4.5.1 Best Linear Estimators of μ and σ
Linear estimators were popular in the early days of computing, when computing
resources were not widely available for obtaining ML estimates. Today they are
rarely used: with the advent of inexpensive computing resources, the availability of
software for the development of maximum likelihood estimates has superseded
the need for linear estimators.
Until the 1980s, reliability textbooks contained many pages devoted to the
reproduction of tables in the appendices for constructing linear estimators. For
example, the reader might consult the textbooks by Kapur and Lamberson (1977) or
Mann et al. (1974) for the availability of such tables. The reader is likely to come
upon references to BLUE (best linear unbiased estimator) or BLIE (best linear
invariant estimator) estimators. These estimators are formed from linear combinations
of the ordered failure times, $\hat{\theta} = a_{1,n}t_1 + a_{2,n}t_2 + \cdots + a_{r,n}t_r$, where the
constants $a_{1,n}, a_{2,n}, \ldots, a_{r,n}$ are tabulated or incorporated internally into statistical
computing packages. The constants are chosen so that θ̂ has the minimum
variance (BLUE) or minimum mean-squared error (BLIE) among all possible
linear estimators of this kind. The reference textbooks are often filled with 30 or
more pages of tabulated constants for developing linear estimates for sample sizes
up to n = 25 (see Lawless, 1982, pp. 144-145).
4.6 RECOMMENDATIONS FOR CHOICE OF ESTIMATION PROCEDURES
The uses of rank regression, maximum likelihood, and simulation-based
approaches to estimation have been described. It is now up to the reliability
analyst to decide on an estimation scheme for his or her data. To assist in making
this decision, we provide the following information, as we refer back to the
Weibull data set of Table 4-2.
Differences in the rank regression and maximum likelihood estimates
appear to be significant, but, in fact, their differences are relatively minor
compared to the standard errors of the estimates of θ and β. For example, compare
the results given in Figure 4-5 (Weibull rank regression) to those presented in
Figure 4-12 (Weibull ML estimation). Even with a fairly good-sized data set
(n = 20, r = 16 failures), we see the construction of very wide confidence
intervals on β, from 1.19 to 2.67. This is a much more important issue than
whether or not rank regression methods are preferred over simulation-based or
maximum likelihood procedures. In any case, with respect to arguments for and
against ML-based approaches, we provide the following arguments given by
Abernethy (1996).
For small and moderate sample sizes of less than 100 failures, Abernethy
(1996) reports that ML estimates tend to be biased. The findings are based upon a
Monte Carlo simulation study of 1000 replicates from a known Weibull
distribution, comparing inverse rank regression and maximum likelihood estimates.
For this reason, Abernethy does not recommend the use of likelihood-based
methods. To alleviate the bias, some work has begun on the development of
simple bias correction factors, similar to the n/(n − 1) correction factor used for
removing the bias of the ML estimator of the normal variance, σ² (see
Abernethy, 1999).
However, there are also mathematical objections to the use of least-squares
(rank regression) methods for Weibull parameter estimation. We briefly allude to
some of the difficulties in Section 4.2.2. Observations in the tail region can overly
influence the regression fit and the corresponding parameter estimates. Abernethy
(1996) reports that the lower end (tail) of the fit tends to be overweighted
compared to the upper end, particularly so when data is time- (type I) or failure-
(type II) censored. From an estimation viewpoint, this is not a good property.
Abernethy (1996) also reports that Dr. Suzuki of the University of Tokyo suggests
discarding the lower third of the data before fitting the data.
Based on simulation studies, Abernethy (1996) and Fulton (1999) recommend
the following:
+ For small sample sizes, r < 11 failures, use rank regression for
  estimating θ and β, followed by the use of Monte Carlo simulation for
  developing confidence interval estimates on any reliability metric.
+ For sample sizes of 11 or greater, use rank regression methods for
  estimating β and θ in combination with the use of ML-based Fisher
  matrix methods for generating approximate confidence intervals on the
  Weibull parameters and associated reliability metrics of interest.
+ For sample sizes of 11 or greater, if ML techniques have been used
  to estimate θ and β, use LR methods for the development of confidence
  intervals on the Weibull parameters and associated reliability metrics of
  interest (despite the fact that they are not symmetric!).
4.7 ESTIMATION OF EXPONENTIAL DISTRIBUTION PROPERTIES
The exponential distribution is extremely easy to work with. It has only one
parameter: either a mean time-to-failure parameter, θ, or a hazard-rate parameter,
λ = 1/θ. The exponential distribution is named for its survival function, which is
of the (simple) exponential form:
$$R(t) = 1 - F(t) = \exp(-\lambda t) = \exp(-t/\theta) \quad \text{for } t \ge 0 \qquad (4.22)$$
Here we show how straightforward it is to obtain maximum likelihood
estimates of exponential properties. Because the exact distributions of θ̂ and λ̂ are
related to the chi-square distribution, exact ML, one-sided confidence limits on
exponential metrics are easy to obtain and are reproduced here. The theory behind
the development of these expressions is discussed in greater detail in Section 9.1.
4.7.1 Estimating the Exponential Hazard-Rate Parameter, λ, or MTTF Parameter, θ
For complete data sets, the MTTF parameter, θ = 1/λ, may be estimated using
$$\hat{\theta} = \frac{\sum_{i=1}^{n} t_i}{n} = \hat{\lambda}^{-1} \qquad (4.23)$$
The MTTF parameter, θ, has the characteristic property that F(θ) =
1 − exp(−1) = 0.632. Thus, the 63.2 percentile of the failure distribution, or the
36.8 percentile of the survival distribution, may be used to develop point
estimates of θ and λ:
$$\hat{\theta} = t_{0.368} = 1/\hat{\lambda} \qquad (4.24)$$
Should a data set contain censored observations, it is evident that the use of
Eq. (4.23) would lead to an underestimation of the true mean time-to-failure
parameter, as the potential failure times of any censored items could be far greater
than the times at which they were suspended from the test. Without a formal
estimation procedure that can take censoring into account (for example, likelihood
methods), it is not obvious how Eq. (4.23) might be properly modified to account
for censoring.
We now play a game that we play in the classroom! What kinds of simple
modifications to Eq. (4.23) might one suggest to properly take into account the
effect of censoring mechanisms? To this end, in Table 4-4 we present an
enumerated list of all possible sample average estimators, including the one
shown in Eq. (4.23). Four possible sample average estimators are shown. For the
first three, arguments against their use are obvious and are explained in the table.
However, the fourth one is quite curious, as there are no apparent arguments that
would support its use, yet there are no strong objections against its use either. In
Chapter 9, Section 9.1.1, the fourth estimator is shown to be the maximum
likelihood estimator of θ:
$$\widehat{\text{MTTF}} = \frac{\text{total exposure time of all units on test}}{\text{no. of recorded failures}} = \frac{T}{r} \qquad (4.25)$$
Generalization of λ̂ = r/T
Equation (4.25) applies to any arbitrary censoring situation (e.g., combinations of
left and right censoring and interval censoring). In Table 4-6 we provide formulas
for calculating T, the total unit exposure time, under a wide range of censoring
scenarios.
In the table,
t_i = recorded event (either failure or censored observation)
t_r = time of the rth recorded failure (stopping time for a type II, singly failure-censored test)
t* = time when the test is stopped (for type I, time-censoring)
t_{r+c} = time of the rth recorded failure when c early suspensions occur (stopping time for a type II, multiply censored test)
r = number of failures
c = number of suspended items
k = number of test stands or test-fixturing devices
For example, consider a replacement test, wherein we have k test fixtures
available. Our policy is to keep every test fixture running for t* time units. That
is, we assume that the moment an item fails, it is instantly removed from test and
a new test item is placed on test. Under this assumption, Eq. (4.25) generalizes to
$$T = \text{no. of test fixtures } (k) \times \text{duration of test } (t^*)$$
TABLE 4-4 Possible Estimators of the Exponential MTTF Parameter

(1) $\widehat{\text{MTTF}} = \dfrac{\sum_{i=1}^{n} t_i}{n}$ : MTTF is underestimated if some of the
    observations are censored.
(2) $\widehat{\text{MTTF}} = \dfrac{\sum_{i=1}^{r} t_i}{n}$ : MTTF is severely underestimated, since
    censoring times are ignored in the numerator.
(3) $\widehat{\text{MTTF}} = \dfrac{\sum_{i=1}^{r} t_i}{r}$ : MTTF is underestimated, since censoring
    times are ignored in the numerator.
(4) $\widehat{\text{MTTF}} = \dfrac{\sum_{i=1}^{n} t_i}{r}$ : Least objectionable estimator, but there is
    no obvious rational argument for suggesting it. We later determine that it is an
    unbiased estimator.
TABLE 4-5 Multiply Right-, Failure-Censored Exponential Data Set

10   23   28   30*   66   85   89*   102   119   144   150
156  170  210  272   286  328  367   402   406   494   535

Note: * denotes an item taken off test.
Example 4-5: Use of ML estimation of the exponential hazard-rate parameter
The type II, multiply right-censored data set in Table 4-5 is based on n = 30 items
on test; the test was stopped at 535 hours upon the occurrence of the 20th recorded
failure. Due to unfortunate circumstances, c = 2 items were taken off the test early,
at t = 30 and 89 hours, respectively. Based on Eq. (4.25),
$$\hat{\lambda} = \frac{r}{T} = \frac{20}{\sum_{i=1}^{r+c} t_i + (n - r - c)\,t_{r+c}} = \frac{20}{4472 + (30 - 20 - 2)\times 535} = 0.00229/\text{hr}$$
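The exposure-time bookkeeping of Eq. (4.25) and Table 4-6 is simple to automate. The following is a minimal sketch (Python assumed for illustration) that reproduces the calculation of Example 4-5: the recorded event times, whether failures or suspensions, all contribute to T, and the items still running when the test stops contribute the stopping time.

import numpy as np

# Recorded event times from Table 4-5 (hours); flags mark recorded failures
times = np.array([10, 23, 28, 30, 66, 85, 89, 102, 119, 144, 150,
                  156, 170, 210, 272, 286, 328, 367, 402, 406, 494, 535], float)
is_failure = np.ones_like(times, dtype=bool)
is_failure[[3, 6]] = False          # items removed early at 30 and 89 hr

n, t_stop = 30, 535.0               # items on test; test stopped at 20th failure
r = int(is_failure.sum())           # number of recorded failures (20)

# Total unit exposure time, Table 4-6 (multiply failure-censored case)
T = times.sum() + (n - len(times)) * t_stop
lam_hat = r / T                     # Eq. (4.25)
print(f"T = {T:.0f} unit-hr, lambda_hat = {lam_hat:.5f}/hr, MTTF_hat = {1/lam_hat:.0f} hr")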
4.7.2 Exponential Confidence Intervals
In Section 9.2.1 it is proven that 2λT is distributed according to a chi-square
distribution, for which tables are readily available. Accordingly, exact expressions
for the confidence limits on λ, θ, or other reliability metrics are available. A
C = 1 − α one-sided, upper confidence interval on the hazard parameter λ is given by
$$P\!\left(\lambda \le \underbrace{\frac{\chi^2_{2r,(1-C)}}{2T}}_{\lambda_U}\right) \ge C \qquad (4.26)$$
TABLE 4-6 Calculation of Total Unit Exposure Time (T)

Type of data : Expression for T
Complete sample : $T = \sum_{i=1}^{n} t_i$
Singly time-censored (type I) at t* : $T = \sum_{i=1}^{r} t_i + (n - r)\,t^*$
Singly failure-censored (type II) at t_r : $T = \sum_{i=1}^{r} t_i + (n - r)\,t_r$
Multiply time-censored at t* (time-censored with c suspensions) : $T = \sum_{i=1}^{r+c} t_i + (n - r - c)\,t^*$
Multiply failure-censored (failure-censored with c suspensions) at t_{r+c} : $T = \sum_{i=1}^{r+c} t_i + (n - r - c)\,t_{r+c}$
Replacement test : $T = k\,t^*$
General censoring : $T = \sum_{\text{all items}} (t_{\text{off test}} - t_{\text{on test}})$
The reader will note the absence of two-sided confidence limit expressions
on exponential reliability metrics. This is because the author does not believe in
the use of two-sided confidence limits for metrics whose ideal value is either
smaller-the-better, in the case of expressions related to the exponential hazard-rate
parameter, λ, or larger-the-better, in the case of expressions related to the MTTF
parameter, θ, or percentiles of the exponential distribution.
Equation (4.26) holds exactly for failure-censored data. For type I (time)
censoring, the number of recorded failures is a random quantity. To approximately
account for this effect, it is customary to adjust the degrees of freedom associated
with the chi-square quantities to 2r + 2 degrees of freedom (see Section 9.2.1).
Inversion of the confidence interval expression (4.26) results in a one-sided
lower confidence interval expression on θ, the MTTF parameter:
$$P\!\left(\theta \ge \underbrace{\frac{2T}{\chi^2_{2r,(1-C)}}}_{\theta_L}\right) \ge C \qquad (4.27)$$
To obtain confidence interval limits on other reliability metrics, we begin
with the C = 1 − α limits on θ:
$$P(\theta_L \le \theta) \ge C$$
which, in turn, may be used to develop a one-sided, C·100% lower confidence
limit on R(t):
$$P\!\left(\underbrace{\exp(-t/\theta_L)}_{R_L} \le \underbrace{\exp(-t/\theta)}_{R}\right) \ge C \qquad (4.28)$$
For percentiles of the survival distribution, $t_R = -\lambda^{-1}\ln R = -\theta \ln R$, and
so we multiply the confidence bounds on θ by $-\ln R$:
$$P(\theta_L \le \theta) \ge C \;\Rightarrow\; P\!\left(\underbrace{-\theta_L \ln R}_{t_{R,L}} \le \underbrace{-\theta \ln R}_{t_R}\right) \ge C \qquad (4.29)$$
Example 4-6
We wish to be C = 90% confident of meeting a λ = 0.2%/Khr specification
(MTTF = 500 Khr). We can run the test for 5000 hr, and we agree to allow up to
r = 5 failures. What sample size is needed?
Solution: This is a type I censored test. If we assume that r ≪ n, and we
discount the possibility of a large number of suspensions, then T ≈ 5n Khr. For a
one-sided confidence interval, we make use of Eq. (4.27), adjusted for a type I test:
$$\theta_L \approx \frac{2T}{\chi^2_{2r+2,\,1-C}} = 500 \text{ Khr} \quad \text{with } r = 5 \text{ and } \chi^2_{2\times 5 + 2,\,0.10} = 18.55$$
Accordingly, we need to solve
$$\theta_L \approx \frac{2T}{\chi^2_{2r+2,\,1-C}} = \frac{2 \times 5n \text{ Khr}}{18.55} = 500 \text{ Khr} \;\Rightarrow\; n = 500 \times 18.55/10 = 928$$
This is a very large sample-size requirement. Increasing the risk, 1 − C, or
the test time, or both, can reduce the sample-size requirement. Adjustments in the
allowed number of failures may also be considered, but the effect on both the
numerator and the denominator must be considered.
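The sample-size calculation above is a one-liner once the chi-square percentile is available. A minimal sketch (Python assumed for illustration; the ceiling and the type I adjustment to 2r + 2 degrees of freedom follow the discussion above):

import math
from scipy.stats import chi2

C = 0.90          # required confidence
mttf_req = 500.0  # required MTTF (Khr)
t_star = 5.0      # test duration per unit (Khr)
r = 5             # allowed number of failures

# theta_L ~ 2*T / chi2_{2r+2, 1-C} with T ~ n * t_star; solve for n
chi2_val = chi2.ppf(C, df=2 * r + 2)          # right-tail area 1-C, i.e., the C quantile
n = math.ceil(mttf_req * chi2_val / (2 * t_star))
print(f"chi-square value = {chi2_val:.2f}, required sample size n = {n}")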
4.7.3 Use of Hazard Plots
Method 1: Accumulating Inverse Ranks for Complete Samples
The hazard plot is a plot of the cumulative hazard function, H(t), versus usage, t.
From Table 3-5,
$$H(t) = \lambda t \qquad (4.30)$$
Accordingly, the hazard plot is a very useful tool for judging the adequacy
of an exponential fit. For complete samples, H(t) is just the accumulated sum of
the inverse ranks of the recorded failures:
$$H(t) = \int_0^t \lambda(t')\,dt' \approx \sum_i \lambda(t_i)\,\Delta t_i, \quad \text{where } \Delta t_i = t_i - t_{i-1} \qquad (4.31)$$
But
$$\hat{\lambda}(t) = \frac{1}{(n + 1 - i)\,\Delta t_i} \quad \text{for } t_{i-1} < t \le t_i$$
$$\Rightarrow\; H(t_i) \approx \sum_{j \le i}\left[\frac{1}{(n + 1 - j)\,\Delta t_j}\right]\Delta t_j = \sum_{j \le i}\frac{1}{n + 1 - j}$$
Example 4-7: Hazard plot for complete data set
An illustration of the construction of a hazard plot is presented in Figure 4-14 for
the sample data set of Table 4-7. The cumulative hazard function was formed by
accumulating the inverse ranks.

TABLE 4-7 Sample Exponential Data

Time    Inv. rank    Hazard value
139         8           0.125
271         7           0.268
306         6           0.435
344         5           0.635
553         4           0.885
1020        3           1.218
1380        2           1.718
2708        1           2.718

FIGURE 4-14 Cumulative hazard plot of an exponential data set.

Method 2: Use of the Natural Log of R(t)
For multiply censored data sets, we make use of the empirical relationship
$$\hat{H}(t) = -\ln \hat{R}(t) \qquad (4.32)$$
Equation (4.32) is simply a logarithmic transform of the complement of the
rank estimator of F(t). The adequacy of the linear fit on the hazard plot can be
used to assess whether λ(t) might be increasing or decreasing rather than
constant (see Figure 4-15). A convex fit is indicative of an increasing hazard-rate
function; a concave fit is indicative of a decreasing hazard-rate function.
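Accumulating the inverse ranks is trivial to code. The following is a minimal sketch (Python assumed for illustration) that reproduces the hazard-value column of Table 4-7; plotting the result against time gives the graphical check of the exponential fit described above.

import numpy as np

# Complete sample from Table 4-7 (ordered failure times)
times = np.array([139, 271, 306, 344, 553, 1020, 1380, 2708], dtype=float)
n = len(times)

inv_ranks = n - np.arange(n)                 # 8, 7, ..., 1
H = np.cumsum(1.0 / inv_ranks)               # cumulative hazard, Method 1
for t, h in zip(times, H):
    print(f"t = {t:6.0f}   H(t) = {h:.3f}")
# A straight-line plot of H against t (slope lambda) supports an exponential fit.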
Example 4-8: Worked-out example (multiply failure-censored data)
A test consisting of n = 16 electromechanical components was conducted. The
test was stopped at 1510 cycles after r = 10 failures (see Table 4-8). One test item
was suspended from the test early due to reasons not associated with the study. Its
censoring time is marked by a * next to its entry. The data is presented in
Table 4-8. The adjusted ranks are shown along with the empirical reliability
function, R̂(t), based on the use of the mean rank formula. The calculations for
$\hat{H}(t) = -\ln\hat{R}(t)$ are shown, along with a plot of the cumulative hazard
function, in Figure 4-16.

FIGURE 4-15 Adequacy of linear fit of hazard plot.

TABLE 4-8 Multiply Failure-Censored Data Set (n = 16, r = 10)

Rank    T (cycles)    Rev Rank    Adj. Rank    R̂(t)     Ĥ(t)
1            21          16          1         0.941    0.061
2           188          15          2         0.882    0.125
3           342          14          3         0.824    0.194
4           379          13          4         0.765    0.268
5           488          12          5         0.706    0.348
6           663*         11          —           —        —
7           768          10          6.09      0.642    0.444
8           978           9          7.18      0.578    0.549
9          1186           8          8.27      0.513    0.667
10         1361           7          9.36      0.449    0.800
11         1510           6         10.46      0.385    0.954
Analysis. In Table 4-8 the ranks are adjusted for suspensions. H(t) is
estimated with the use of Eq. (4.32). The hazard plot is presented in Figure 4-16.
The fit appears to be adequate. An inverse, linear regression through the origin
was run. The output from the regression is shown in Figure 4-17. The fitted model
is
$$t = 1662\,\hat{H}(t)$$
Thus, $\hat{\lambda}^{-1} = \text{MTTF} = 1662$ cycles. The model appears to be a very good
fit (p-value of 0.000). Residual analysis should be conducted to verify the
adequacy of the hazard-rate model. The residuals represent unexplained differences
between $t_i$ and $1662\,\hat{H}(t_i)$. A plot of the residuals versus t is also contained
in Figure 4-17. The residuals appear to fluctuate randomly about zero, which is an
indication of an adequate fit.
ML estimates of λ and the MTTF were obtained with the use of Eq. (4.25):
$$\hat{\lambda} = \frac{r}{T} = \frac{10}{21 + 188 + 342 + 379 + 488 + 663 + 768 + 978 + 1186 + 1361 + 1510 + (5 \times 1510)} = \frac{10}{15{,}435} = 0.000648 \text{ cycles}^{-1}$$
$$\Rightarrow\; \hat{\theta} = 1/0.000648 \text{ cycles}^{-1} = 1543.5 \text{ cycles}$$

FIGURE 4-16 Cumulative hazard plot of data in Table 4-8.
A one-sided lower 95% confidence limit on the mean time-to-failure was
calculated based on the use of Eq. (4.27):
$$\theta_L = \frac{2T}{\chi^2_{2r,\,0.05}} = \frac{2(15{,}435)}{31.4} = 983.1 \text{ cycles}$$
This means that we can assign a level of confidence, C = 95%, that our
MTTF is at least 983.1 cycles.

FIGURE 4-17 Regression analysis of hazard plot data (Minitab output).
We also constructed a 95% one-sided, lower confidence limit on the $B_{10}$ life,
or $t_{0.90}$. From Eq. (4.29), the lower confidence limit on $t_{0.90}$ is given by
$$t_{0.90,L} = -\theta_L \ln R = -983.1\,\ln(0.90) = 103.2 \text{ cycles}$$
Thus, we assign a confidence level of 0.95 that our $B_{10}$ life, or $t_{0.90}$, exceeds
103.2 cycles.
4.8 THREE-PARAMETER WEIBULL
ML Estimation
The task of obtaining ML parameter estimates of the three-parameter Weibull
distribution is quite challenging. The difficulties of ML estimation are discussed
in Chapter 9, Section 9.1.5. In general, the ML equations may have two solutions
or none (see Lawless, 1982, p. 192). Additionally, the parameter estimates
for δ-values in the neighborhood of $t_1$ are unstable, particularly for β < 1. For a
wide range of conditions, the choice to set $\hat{\delta} = t_1$, or just slightly less than $t_1$,
makes a lot of sense in that the likelihood function is quite flat around these
values of δ. Another way to handle ML estimation is to find conditional ML
estimates of θ and β for a range of δ-values, and then to choose the δ-value that
results in the largest overall value of the likelihood function. This procedure is
also described in Chapter 7.
Parameter estimation can be readily conducted with the use of Weibull
probability plotting techniques. We know that a practical limitation on δ is that it
not exceed the first-order statistic, $t_1$. Graphically, we determine the value
of δ that provides the best linear fit to the data. Regression techniques can be used
to find a suitable value for δ, by which the data is scaled by subtracting its value
from each time of failure. The resultant data set may then be fitted using
conventional two-parameter Weibull techniques. This approach is illustrated in
the following worked-out example.
TABLE 4-9 Grinding Wheel Life

Wheel number    Pieces per wheel    Adjusted life data (pieces − 19,600)
1                    22,000                 2,400
2                    25,000                 5,400
3                    30,000                10,400
4                    33,000                13,400
5                    35,000                15,400
6                    52,000                32,400
7                    63,000                43,400
8                   104,000                84,400
Example 4.9: Grinding wheel life (Kapur and Lamberson, 1977, p. 313)
A data set from Kapur and Lamberson (1977) is presented in Table 4-9 on
grinding wheel life in pieces. A Weibull plot of the median ranks is displayed as
the uncorrected plot in Figure 4-18. Note the curvature of the fit. To correct for
this, the data was rescaled using a minimum life, δ̂, of 19,600. This value was
approximated as roughly 90% of the first-order statistic, $t_1$ = 22,000 pieces. In
Figure 4-18 the rescaled data is plotted on Weibull paper. Note the improved fit!
The estimated parameters from WinSmith are as follows:
$$\hat{\theta} = 24{,}500 \text{ pieces}, \qquad \hat{\beta} = 0.84, \qquad \hat{\delta} = 19{,}600 \text{ pieces}$$

FIGURE 4-18 Weibull plots of grinding wheel life data both before and after scaling the
data by δ̂ = 19,600 pieces.
FIGURE 4-19 Use of Microsoft Excel to estimate δ̂ using the MSE regression criterion.
Note the use of the SLOPE and INTERCEPT functions with array formulas to develop
rank regression estimates.
A second estimate of δ was obtained using an inverse regression approach:
y = ln(t - δ) was regressed upon x = ln(-ln(1 - F̂(t))). Excel was used to evaluate
the fit for a range of δ-values in [0, 22,000). The mean square error (MSE) was
used to evaluate the fit; it is a smaller-the-better characteristic. The MSE was
calculated from the sum of squared differences between the actual and predicted
values of ln(t_i - δ):

    MSE = Σ_{i=1}^{n} [ln(t_i - δ) - ln θ̂ - (1/β̂) ln(-ln(1 - F̂(t_i)))]² / (n - 2)    (4.33)
The Excel spreadsheet, with an accompanying Cartesian plot of MSE versus δ,
is presented in Figure 4-19. The minimum value of the MSE statistic,
MSE = 0.2052, occurred at δ̂ = 20,500 pieces. The inverse rank regression
estimates of the three Weibull parameters were identified as

    θ̂ = 25,079 pieces
    β̂ = 0.860
    δ̂ = 20,500 pieces
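For readers who wish to reproduce this grid search outside of Excel, a minimal Python sketch is given below. It assumes Benard's median-rank approximation for F̂(t_i) and an ordinary least-squares fit at each trial δ (the 100-piece grid step is an arbitrary choice); the resulting estimates should land close to, but not necessarily exactly on, the spreadsheet values above.

# A sketch of the delta grid search, assuming Benard's median-rank
# approximation for F-hat and an ordinary least-squares rank regression at
# each trial delta.
import numpy as np

# Grinding wheel life data of Table 4-9 (pieces per wheel)
t = np.array([22_000, 25_000, 30_000, 33_000, 35_000,
              52_000, 63_000, 104_000], dtype=float)
n = len(t)
i = np.arange(1, n + 1)
F_hat = (i - 0.3) / (n + 0.4)             # median ranks (Benard's approximation)
x = np.log(-np.log(1.0 - F_hat))          # regressor for the inverse rank regression

best = None
for delta in np.arange(0.0, 22_000.0, 100.0):    # delta must stay below t_1 = 22,000
    y = np.log(t - delta)                        # ln(t_i - delta)
    slope, intercept = np.polyfit(x, y, 1)       # fit y = ln(theta) + (1/beta) x
    mse = np.sum((y - (intercept + slope * x)) ** 2) / (n - 2)   # Eq. (4.33)
    if best is None or mse < best[0]:
        best = (mse, delta, np.exp(intercept), 1.0 / slope)

mse, delta_hat, theta_hat, beta_hat = best
print(f"delta ~ {delta_hat:.0f}, theta ~ {theta_hat:.0f}, "
      f"beta ~ {beta_hat:.3f}, MSE = {mse:.4f}")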
It is also interesting to identify the conditional ML estimates of θ and β
given δ = 20,500 pieces. They are

    θ̂ = 24,597 pieces
    β̂ = 0.9653
The range of reported estimates of β and δ may well perplex the reader. This is
not unexpected, for several reasons:

1. Graphical best straight-line fits are subjective in nature. In particular,
   estimating the slope, β, is very difficult.
2. Regression fits can differ greatly from ML estimates. Regression fits are
   heavily influenced by values in the lower tail region, as explained in §4.2.
3. ML estimates tend to be biased (see Abernethy, 1996).
4.9 EXERCISES

1. Use Excel to generate n = 30 Weibull data values for β = 1.25 and
   θ = 10,000 hr. The data are failure-censored at r = 26. To do this, do the
   following:
   a. Use the RAND( ) function to generate 30 random uniform values, U, in
      the interval (0, 1).
   b. Use the inverse Weibull function, t = θ(-ln(1 - U))^(1/β).
   c. Replace t_27, t_28, t_29, and t_30 by t_26.
2. Develop ML estimates for the data set of Exercise 1.
   a. Create a Weibull plot of the data.
   b. Use Minitab or Reliasoft Weibull software, or a program of your
      choice, to generate point estimates and 95% confidence limits on θ, β,
      and the B10 (t_0.90) life.
3. We are given the following life data:

   Time
   40
   43
   50 (suspended)
   110
   150

   Test the fit against a lognormal distribution by doing the following:
   a. Plot the logged values on normal probability plotting paper.
   b. Estimate the σ and t_med parameters of the lognormal distribution.
   c. Estimate the B10 life.
   d. Generate inverse rank regression values:
      x = Φ⁻¹(F̂(t)) and y = ln(t)
   e. Use simple linear regression to develop point estimates of t_med and σ.
4. Given the following exponential life data set for n = 10 prototype items
   (* denotes a censored reading), estimate the following:

   63  313  752  951  1101  1179  1182  1328  1433  2776

   a. Assuming an exponential distribution, estimate θ, the MTTF parameter.
   b. Develop a lower 90% confidence limit on θ and on the B10 life.
   c. Develop a lower 90% confidence limit on R(1433).
5. Given the Minitab output below:
   a. Find θ̂ and β̂.
   b. Find a 95% confidence interval on β.
   c. Find a 95% confidence interval on t_0.10.
Minitab output:
Distribution Analysis: Weibull
Variable: Weibull
Censoring Information Count
Uncensored value 20
Estimation Method: Maximum Likelihood
Distribution: Weibull
Parameter Estimates
Standard 95.0% Normal CI
Parameter Estimate Error Lower Upper
Shape 0.8958 0.1590 0.6326 1.2686
Scale 740.7 194.7 442.4 1240.1
Log-Likelihood = -152.996
Characteristics of Distribution
                                    Standard         95.0% Normal CI
                        Estimate    Error            Lower        Upper
Mean (MTTF) 781.3562 195.4084 478.6004 1275.631
Standard Deviation 873.8590 279.1305 467.2487 1634.311
Median 492.0013 144.6176 276.5453 875.3188
First Quartile (Q1) 184.3508 76.2000 81.9988 414.4602
Third Quartile (Q3) 1066.596 266.9132 653.1134 1741.852
Interquartile Range (IQR) 882.2450 224.1208 536.2342 1451.523
Table of Percentiles
Standard 95.0% Normal CI
Percent Percentile Error Lower Upper
1 4.3604 4.4691 0.5849 32.5047
2 9.5063 8.4739 1.6568 54.5463
3 15.0334 12.2313 3.0515 74.0636
4 20.8458 15.8136 4.7130 92.2019
5 26.8978 19.2596 6.6104 109.4467
6 33.1624 22.5943 8.7238 126.0630
7 39.6224 25.8350 11.0391 142.2163
8 46.2661 28.9949 13.5461 158.0192
9 53.0851 32.0843 16.2373 173.5533
10 60.0738 35.1113 19.1066 188.8804
20 138.8314 63.0962 56.9682 338.3317
30 234.3477 89.1138 111.2193 493.7887
40 349.9454 115.4453 183.3129 668.0479
50 492.0013 144.6176 276.5453 875.3188
60 671.8435 180.6366 396.6505 1137.963
70 911.2558 231.2580 554.1442 1498.504
80 1259.957 314.8393 772.0699 2056.150
90 1879.245 494.1186 1122.466 3146.254
APPENDIX 4A MONTE CARLO ESTIMATION
Monte Carlo (MC) simulation approaches are useful for approximating stochastic
relationships when no known exact expression is available. MC methods are
computer-intensive, relying on a stream of pseudo-random numbers to simulate
realizations from a distribution. In our application we wish to devise confidence
intervals on parameters and reliability metrics of interest based on multiply
censored data sets. For many (log-)location-scale distributions, it is not possible
to derive closed-form expressions for the sampling distributions of the maximum
likelihood estimates of the parameters, θ and σ, in the presence of censoring. In
such cases, the confidence intervals of interest must be approximated.

Given the power of today's desktop computers, MC simulation is becoming
a popular way to form this approximation. For example, the WinSmith Weibull
analysis software now includes built-in capabilities for devising MC-based
confidence intervals. Assuming a data set of size n consisting of r recorded
failures, MC samples are drawn from a hypothetical (log-)location-scale distribu-
tion with parameters θ = θ̂ and σ = σ̂. This is a form of parametric bootstrap
sampling (see Hjorth, 1994, §6.1, and Meeker and Escobar, 1998).
APPENDIX 4A.1 MONTE CARLO SIMULATION STRATEGY

A generalized procedure for Monte Carlo (MC) simulation is illustrated in Figure
4-20. An effective simulation strategy must mimic as closely as possible the
underlying process that leads to an evolution of failures and censored observa-
tions. A specialized strategy is required for each type of censoring scenario.
Abernethy (1996) accomplishes this rather effectively, but crudely, by omitting all
randomly suspended items from the data and then treating the resultant data
subset as a complete data set. The resultant subset is fit to the distribution of
interest. After each MC sample, the suspended readings are added back.

A summary describing a comprehensive strategy for common censoring
scenarios is presented in Figure 4-20.* If data is time-censored, then any MC
times beyond the censoring time, t*, are reset to t* and labeled as right-censored.
Similarly, if data is failure-censored, then any MC times beyond t_r, the rth
recorded failure, are reset to t_r and labeled as right-censored. If the data set
contains a small number of random suspensions, then the method of Abernethy
(1996) is recommended. Otherwise, if the data set contains a large number of
suspended items, with r ≪ n, then alternate strategies should be pursued. For
competing failure modes, a censoring distribution might be fit to the censored
values, and MC simulation used to generate values from it. Meeker and Escobar
(1998, §4.13.3) provide an efficient algorithm for generating MC values in such
cases.

* With careful thought, the simulation strategy outlined by Figure 4-20 can be applied to a wide variety
of test plans. For example, with sudden-death testing, there are k groups, each run until a single
failure is recorded. In this case the simulation should be run as if there are k independent, failure-
censored samples. The data is then aggregated after the simulation into one larger group for analysis.
The use of pivotal quantities is sometimes helpful for reducing the demands of
Monte Carlo simulation. The latest release of WinSmith provides this capability.
For the Weibull distribution, Lawless (1982, p. 147) recommends the use of the
parameter-free, scaled quantities

    Z_1 = β̂(ln θ̂ - ln θ)    Z_2 = β̂/β    and    Z_3 = β(ln θ̂ - ln θ)
For the (log-)normal distribution, Lawless (1982) recommends the use of
the following scaled quantities:

    Z_1 = (μ̂ - μ)/σ̂    Z_2 = σ̂/σ    and    Z_3 = (μ̂ - μ)/σ
A key advantage of working with pivotal quantities is that one is able to
resample from a standardized distribution (the standard normal, in the normal
case). WinSmith software provides the option of conducting MC simulation on
pivotal quantities. The theory of their use is outlined by Lawless (1974, 1982).
The use of pivotal quantities also allows for direct approximation of the
confidence limits.
FIGURE 4-20 Monte Carlo simulation strategy.
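As a concrete illustration of this idea, the sketch below simulates the pivotal Z_1 for a complete (uncensored) normal sample and inverts its percentiles into a confidence interval for μ. It shows only the mechanics under stated assumptions: the censoring of Example 4-10 is ignored, the 5000 replications and the seed are arbitrary, and the observed estimates plugged in at the end are simply the ML values quoted in that example, used purely for illustration.

# A sketch of pivotal-quantity simulation for a complete normal sample:
# because Z1 = (mu_hat - mu)/sigma_hat is parameter-free, its percentiles
# can be built by resampling from the standard normal alone.
import numpy as np

rng = np.random.default_rng(7)     # arbitrary seed
n, B = 15, 5000                    # sample size and number of MC replications

z1 = np.empty(B)
for b in range(B):
    x = rng.standard_normal(n)                   # sample from N(0, 1)
    mu_hat, sigma_hat = x.mean(), x.std(ddof=0)  # ML estimates for the normal
    z1[b] = (mu_hat - 0.0) / sigma_hat           # pivotal Z1 = (mu_hat - mu)/sigma_hat

lo, hi = np.percentile(z1, [5, 95])

# Invert the pivotal to a 90% interval for mu, using the ML estimates quoted
# in Example 4-10 purely for illustration:
mu_obs, sigma_obs = 85.564, 8.714
ci_mu = (mu_obs - hi * sigma_obs, mu_obs - lo * sigma_obs)
print(ci_mu)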
APPENDIX 4A.2 WORKED-OUT EXAMPLES

Example 4-10: Normal data set of Table 4-1

We now present the details of the development of MC confidence intervals on
the bearing life normal data set of Table 4-1. The ML estimates of μ and σ were
μ̂ = 85.564 and σ̂ = 8.714 (see Figure 4-9).

A Monte Carlo simulation was conducted using these ML estimates.
According to the procedures outlined by Figure 4-20 and Table 4-10, we
repeatedly simulated observations from an N(85.564, 8.714²) distribution based
on a sample of n = 15 points. An Exec Macro in Minitab was used to accomplish
this.*
TABLE 4-10 Monte Carlo Simulation Strategy for Different Censoring Scenarios from the
G(θ̂, σ̂) Distribution

No.   No. of failures   Time-censored at      Failure-censored at    Random censoring
                        t = t* (c_1 time-     t = t_r (c_1 failure-  (c_2 removed early)
                        censored)             censored)
1     n                 No                    No                     No
2     r                 Yes                   No                     No
3     r                 No                    Yes                    No
4     n - c_2           No                    No                     c_2 items
5     n - c_1 - c_2     Yes                   No                     c_2 items
6     r                 No                    Yes                    c_2 items

Simulation strategy, by scenario number:
1. Generate n items from G(θ̂, σ̂).
2. Generate n items from G(θ̂, σ̂); truncate at t = t* and set the censor flag accordingly.
3. Generate n items from G(θ̂, σ̂); truncate all failure times beyond t = t_r and set the
   censor flag accordingly.
4. If c_2 is small (≤ 4), keep the c_2 observations and generate n - c_2 items from G(θ̂, σ̂).
5. If c_2 is small (≤ 4), keep the c_2 observations and generate n - c_2 items from G(θ̂, σ̂);
   truncate at t = t* and set the censor flag accordingly.
6. If c_2 is small (≤ 4), keep the c_2 observations and generate n - c_2 items from G(θ̂, σ̂);
   truncate all failure times beyond t = t_r and set the censor flag accordingly.
Any realization that exceeded t = 90 was truncated to its time-censored
value of 90. The Minitab Stat > Reliability/Survival > Parametric Right
Censored procedure was run on each simulated sample to derive ML estimates
of μ, σ, and t_0.90. The macro was executed 1000 times, and descriptive statistics
on the 1000 realizations of μ̂, σ̂, and t_0.90 were generated.

The Exec Macro is shown in Figure 4-21. Descriptive statistics on the 1000
ordered parameter estimates were used to generate MC-based percentile confi-
dence limits on μ and σ. They are summarized in Table 4-11. Note the increased
width of the MC intervals compared to the asymptotic ML intervals that were
presented in Figure 4-9. In general, ML-based confidence intervals are less
conservative than MC intervals.
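The same resampling loop can also be sketched in a general-purpose language. The sketch below mirrors the procedure of Figure 4-21 in Python under stated assumptions: the censored-normal ML fit is a generic numerical maximization rather than Minitab's routine, the seed is arbitrary, and the simulation settings (n = 15, censoring at t = 90, μ = 85.564, σ = 8.714) are taken from this example. Its percentile summaries should resemble, but not exactly reproduce, Table 4-11.

# A sketch of the Example 4-10 resampling loop in Python rather than a
# Minitab Exec Macro.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def fit_censored_normal(t, censored):
    """ML estimates of (mu, sigma) with right-censored observations."""
    def negloglik(p):
        mu, log_sigma = p
        sigma = np.exp(log_sigma)
        ll = norm.logpdf(t[~censored], mu, sigma).sum()   # exact failure times
        ll += norm.logsf(t[censored], mu, sigma).sum()    # right-censored times
        return -ll
    res = minimize(negloglik, x0=(t.mean(), np.log(t.std() + 1e-6)),
                   method="Nelder-Mead")
    mu_hat, log_sigma_hat = res.x
    return mu_hat, np.exp(log_sigma_hat)

rng = np.random.default_rng(42)                  # arbitrary seed
mu0, sigma0, n, t_star, B = 85.564, 8.714, 15, 90.0, 1000

mu_mc, sigma_mc, t90_mc = [], [], []
for _ in range(B):
    t = rng.normal(mu0, sigma0, size=n)
    censored = t > t_star
    t = np.where(censored, t_star, t)            # truncate at the censoring time
    mu_hat, sigma_hat = fit_censored_normal(t, censored)
    mu_mc.append(mu_hat)
    sigma_mc.append(sigma_hat)
    t90_mc.append(mu_hat + norm.ppf(0.10) * sigma_hat)   # t_0.90: 10th percentile

for name, vals in [("mu", mu_mc), ("sigma", sigma_mc), ("t_0.90", t90_mc)]:
    print(name, np.percentile(vals, [5, 50, 95]))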
FIGURE 4-21 Minitab Exec Macro to generate Monte Carlo interval estimates of para-
meters and reliability metrics associated with the sample time-censored data set.

* The Exec Macro procedure in Minitab constitutes a first-generation macro capability. It is a batch file
of session commands. Minitab has extended this earlier capability immensely, adding formal local
and global macro capabilities.
Example 4-11: Weibull data set of Table 4-2

The data set of Table 4-2 consists of n = 20 observations. It is assumed that the
early observations at t = 30 and t = 35 are suspended readings due to causes that
are beyond the scope of the study. Additionally, there are two suspended readings
at t = 200, the time at which the test was stopped. Therefore, we adopt the
strategy outlined by Table 4-10 with n = 20, c_1 = 2, and c_2 = 2. There are
r = 16 failures in all.

From Figure 4-12 the ML estimates of θ and β are

    θ̂ = 132.8K revs
    β̂ = 1.785

To initialize the MC simulation, we first delete the two early suspensions
from the data set and develop ML estimates of θ and β for the reduced data set.
They are

    θ̂ (reduced) = 131.73K revs
    β̂ (reduced) = 1.751

A global macro in Minitab was developed to sample 1000 times from this
hypothetical distribution.* It is presented in Figure 4-22. Eighteen observations
were randomly simulated from a Weibull(131.73, 1.751) distribution. Any
observations beyond t = 200K revs were treated as time-censored observations
at t = 200K revs. The two suspensions at t = 30 and t = 35 were incorporated into
each simulated sample to bring the sample size back up to n = 20. The Minitab
Stat > Reliability/Survival > Parametric Right Censored procedure was called
during each simulation run, and ML estimates of θ, β, and t_0.90 were stored. The
procedure was called 1000 times, and the 1000 realizations of θ̂, β̂, and t_0.90 were
generated and stored. The output from the macro is summarized in Table 4-12.
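For completeness, the sketch below mimics the global macro's scheme in Python. The censored-Weibull ML fit is again a generic numerical maximization (an assumption, not Minitab's algorithm); the 18 simulated units, the two retained suspensions at 30 and 35, the 200K-rev censoring time, and the seed parameters (131.73, 1.751) follow the description above. Because the random streams differ, its percentile summaries will approximate rather than reproduce Table 4-12.

# A compact sketch of the Example 4-11 scheme in Python (not the Minitab
# global macro).
import numpy as np
from scipy.optimize import minimize

def fit_censored_weibull(t, censored):
    """ML estimates of (theta, beta) with right-censored observations."""
    def negloglik(p):
        theta, beta = np.exp(p)                   # work on the log scale
        z = (t / theta) ** beta
        fail = ~censored
        ll = np.sum(np.log(beta / theta)
                    + (beta - 1.0) * np.log(t[fail] / theta) - z[fail])
        ll += np.sum(-z[censored])                # survivor terms for suspensions
        return -ll
    res = minimize(negloglik, x0=(np.log(t.mean()), 0.0), method="Nelder-Mead")
    return np.exp(res.x)                          # (theta_hat, beta_hat)

rng = np.random.default_rng(3)                    # arbitrary seed
theta0, beta0, t_star, B = 131.73, 1.751, 200.0, 1000
early_susp = np.array([30.0, 35.0])               # the two early suspensions (K revs)

theta_mc, beta_mc, t90_mc = [], [], []
for _ in range(B):
    u = rng.uniform(size=18)                                # 18 simulated units
    t = theta0 * (-np.log(1.0 - u)) ** (1.0 / beta0)        # Weibull(131.73, 1.751)
    censored = t > t_star
    t = np.where(censored, t_star, t)                       # time-censor at 200K revs
    t = np.concatenate([t, early_susp])                     # restore n = 20
    censored = np.concatenate([censored, [True, True]])
    theta_hat, beta_hat = fit_censored_weibull(t, censored)
    theta_mc.append(theta_hat)
    beta_mc.append(beta_hat)
    t90_mc.append(theta_hat * (-np.log(0.90)) ** (1.0 / beta_hat))   # t_0.90

for name, vals in [("theta", theta_mc), ("beta", beta_mc), ("t_0.90", t90_mc)]:
    print(name, np.percentile(vals, [5, 50, 95]))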
TABLE 4-11 Monte Carlo Percentile Confidence and Median
Values for μ, σ, and t_0.90

Parameter    5th percentile    Median    95th percentile
μ            79.737            85.551    89.964
σ            4.008             8.113     12.284
t_0.90       66.552            74.980    79.993
* It should be noted that this global macro is written in a full-featured macro language, which differs
considerably in its capabilities from the Exec Macro illustrated by Figure 4-21. The Exec Macro is
basically a batch file of Minitab session commands.
FIGURE 4-22 Minitab global macro for simulating 1000 MC samples for the Weibull data
set of Table 4-2.
TABLE 4-12 Monte Carlo Percentile Confidence and Median
Values for θ, β, and t_0.90

Parameter    5th percentile    Median    95th percentile
θ            103.46            133.22    167.28
β            1.313             1.842     2.739
t_0.90       22.98             39.38     62.44
APPENDIX 4B REFERENCE TABLES AND CHARTS
Ref 4-1 Normal plotting paper with standard Z units annotated on Y-axis.
Ref 4-2 Sample Weibull plotting paper. (From Ford, 1972.)