The Bootstrap
1 Introduction
The bootstrap is a method for estimating the variance of an estimator and for finding ap-
proximate confidence intervals for parameters. Although the method is nonparametric, it
can also be used for inference about parameters in parametric and nonparametric models.
2 Empirical Distribution
The empirical distribution $P_n$ puts mass $1/n$ at each $X_i$; that is, $P_n(A) = \frac{1}{n}\sum_{i=1}^n I(X_i \in A)$. Recall also that a parameter of the form $\theta = T(P)$ is called a statistical functional and that the plug-in estimator is $\hat\theta_n = T(P_n)$.
A bootstrap sample is an iid draw
$$X_1^*, \ldots, X_n^* \sim P_n.$$
Bootstrap samples play an important role in what follows. Note that drawing an iid sample $X_1^*, \ldots, X_n^*$ from $P_n$ is equivalent to drawing $n$ observations, with replacement, from the original data $\{X_1, \ldots, X_n\}$. Thus, bootstrap sampling is often described as "resampling the data."
3 The Bootstrap
Now we give the bootstrap algorithms for estimating the variance of $\hat\theta_n$ and for constructing confidence intervals. The explanation of why (and when) the bootstrap works is mainly deferred until Section 5. Let $\hat\theta_n = g(X_1, \ldots, X_n)$ denote some estimator.
Note that $\mathrm{Var}_P(\hat\theta_n)$ is some function of $P$ (and $n$), so I have written $\mathrm{Var}_P(\hat\theta_n) = S_n(P)$. If we knew $P$, we could approximate $S_n(P)$ by simulation as follows:
$$\begin{aligned}
&\text{draw } X_1, \ldots, X_n \sim P, \quad \text{compute } \hat\theta_n^{(1)} = g(X_1, \ldots, X_n)\\
&\text{draw } X_1, \ldots, X_n \sim P, \quad \text{compute } \hat\theta_n^{(2)} = g(X_1, \ldots, X_n)\\
&\qquad\vdots\\
&\text{draw } X_1, \ldots, X_n \sim P, \quad \text{compute } \hat\theta_n^{(B)} = g(X_1, \ldots, X_n).
\end{aligned}$$
Let $s^2$ be the sample variance of $\hat\theta_n^{(1)}, \ldots, \hat\theta_n^{(B)}$. So
$$s^2 = \frac{1}{B}\sum_{j=1}^B \left(\hat\theta_n^{(j)}\right)^2 - \left( \frac{1}{B}\sum_{j=1}^B \hat\theta_n^{(j)} \right)^2.$$
Since we can take $B$ as large as we want, we have that $s^2 \approx \mathrm{Var}_P(\hat\theta_n)$. In other words, we can approximate $S_n(P)$ by repeatedly simulating $n$ observations from $P$.
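As an illustration (not from the text), here is a minimal Python sketch of this known-$P$ simulation; the estimator (the sample median), the sample size, and the distribution are hypothetical choices:

```python
import numpy as np

rng = np.random.default_rng(0)

def mc_variance(g, draw, n, B=10_000):
    """Approximate S_n(P) = Var_P(theta_hat_n) by simulating B datasets from P."""
    estimates = np.array([g(draw(n)) for _ in range(B)])
    return estimates.var()

# Hypothetical example: variance of the sample median, n = 50, P = N(0, 1).
var_median = mc_variance(np.median, lambda n: rng.standard_normal(n), n=50)
# For comparison, the asymptotic variance of the median here is pi/(2n) ~ 0.031.
```

Of course, this only works because we simulated from the true $P$; the bootstrap replaces $P$ by $P_n$.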
But we don't know $P$. So we estimate $S_n(P)$ with $S_n(P_n)$ where $P_n$ is the empirical distribution. Since $P_n$ is a consistent estimator, we expect that $S_n(P_n) \approx S_n(P)$; in other words,
$$\widehat{\mathrm{Var}_P(\hat\theta_n)} = \mathrm{Var}_{P_n}(\hat\theta_n).$$
But how do we compute Sn (Pn )? We use the simulation method above, except that we
simulate from Pn instead of P . This leads to the following algorithm:
Bootstrap Variance Estimator

1. Draw a bootstrap sample $X_1^*, \ldots, X_n^* \sim P_n$ and compute $\hat\theta^*_{n,1} = g(X_1^*, \ldots, X_n^*)$.

2. Repeat step 1, $B$ times, yielding estimators $\hat\theta^*_{n,1}, \ldots, \hat\theta^*_{n,B}$.

3. Compute
$$\hat{s} = \sqrt{\frac{1}{B}\sum_{j=1}^B \left(\hat\theta^*_{n,j} - \overline{\theta}\right)^2}$$
where $\overline{\theta} = \frac{1}{B}\sum_{j=1}^B \hat\theta^*_{n,j}$.

4. Output $\hat{s}$.
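The algorithm above can be sketched in Python as follows (a minimal illustration; the function and variable names are my own, not from the text):

```python
import numpy as np

def bootstrap_se(data, g, B=10_000, seed=0):
    """Bootstrap standard error of theta_hat_n = g(data): resample the data
    with replacement (i.e. draw from P_n), recompute g, and take the standard
    deviation of the B replications."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    n = len(data)
    reps = np.array([g(rng.choice(data, size=n, replace=True)) for _ in range(B)])
    return reps.std()

# Hypothetical check: for the sample mean, the answer should be close to
# the usual formula sigma_hat / sqrt(n).
x = np.random.default_rng(1).standard_normal(100)
se_boot = bootstrap_se(x, np.mean, B=2000)
se_formula = x.std() / np.sqrt(len(x))
```

For the mean this is overkill, but the same code works unchanged for estimators with no closed-form variance.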
There are two sources of error in this approximation. The first is due to the fact that $n$ is finite and the second is due to the fact that $B$ is finite. However, we can make $B$ as large as we like. (In practice, it usually suffices to take $B = 10{,}000$.) So we ignore the error due to finite $B$.
Theorem 1 Under appropriate regularity conditions, $\dfrac{\hat{s}^2}{\mathrm{Var}(\hat\theta_n)} \xrightarrow{P} 1$ as $n \to \infty$.
Now we describe the confidence interval algorithm. This will look less intuitive than the variance estimator; I'll explain it in Section 5.
Bootstrap Confidence Interval

1. Draw a bootstrap sample $X_1^*, \ldots, X_n^* \sim P_n$ and compute $\hat\theta^*_{n,1} = g(X_1^*, \ldots, X_n^*)$.

2. Repeat step 1, $B$ times, yielding estimators $\hat\theta^*_{n,1}, \ldots, \hat\theta^*_{n,B}$.

3. Let
$$\hat{F}(t) = \frac{1}{B}\sum_{j=1}^B I\left(\sqrt{n}(\hat\theta^*_{n,j} - \hat\theta_n) \le t\right).$$

4. Let
$$C_n = \left( \hat\theta_n - \frac{t_{1-\alpha/2}}{\sqrt{n}},\; \hat\theta_n - \frac{t_{\alpha/2}}{\sqrt{n}} \right)$$
where $t_{\alpha/2} = \hat{F}^{-1}(\alpha/2)$ and $t_{1-\alpha/2} = \hat{F}^{-1}(1 - \alpha/2)$.

5. Output $C_n$.
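A sketch of this confidence-interval algorithm in Python (illustrative only; the names and example data are my own):

```python
import numpy as np

def bootstrap_ci(data, g, alpha=0.05, B=2000, seed=0):
    """Pivotal bootstrap confidence interval following steps 1-5 above."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    n = len(data)
    theta_hat = g(data)
    # Replications of the root sqrt(n) * (theta*_{n,j} - theta_hat_n).
    roots = np.array([np.sqrt(n) * (g(rng.choice(data, size=n, replace=True)) - theta_hat)
                      for _ in range(B)])
    # t_{alpha/2} and t_{1-alpha/2} are quantiles of F_hat.
    t_lo, t_hi = np.quantile(roots, [alpha / 2, 1 - alpha / 2])
    return theta_hat - t_hi / np.sqrt(n), theta_hat - t_lo / np.sqrt(n)

# Hypothetical example: 95 percent interval for the mean of N(5, 1) data.
x = np.random.default_rng(2).normal(5.0, 1.0, size=200)
lo, hi = bootstrap_ci(x, np.mean)
```

Note the reversal: the upper quantile of the root gives the lower endpoint and vice versa, exactly as in step 4.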
4 Examples
[Figure omitted: scatterplot of the data with the true and estimated curves.]

Figure 1: 50 points drawn from the model $Y_i = -1 + 2X_i - X_i^2 + \epsilon_i$ where $X_i \sim \mathrm{Uniform}(0, 2)$ and $\epsilon_i \sim N(0, .2^2)$. In this case, the maximum of the polynomial occurs at $\theta = 1$. The true and estimated curves are shown in the figure. At the bottom of the plot we show the 95 percent bootstrap confidence interval based on $B = 1{,}000$.
Next we consider the partial correlation
$$\theta = -\frac{\Omega_{12}}{\sqrt{\Omega_{11}\Omega_{22}}}$$
where $\Omega = \Sigma^{-1}$ and $\Sigma$ is the covariance matrix of $W = (X, Y, Z)^T$. The partial correlation measures the linear dependence between $X$ and $Y$ after removing the effect of $Z$. For illustration, suppose we generate the data as follows: we take $Z \sim N(0,1)$, $X = 10Z + \epsilon$ and $Y = 10Z + \delta$ where $\epsilon, \delta \sim N(0,1)$. The correlation between $X$ and $Y$ is very large, but the partial correlation is 0. We generated $n = 100$ data points from this model. The sample correlation was 0.99. However, the estimated partial correlation was $-0.16$, which is much closer to 0. The 95 percent bootstrap confidence interval is $[-.33, .02]$, which includes the true value, namely, 0.
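This example can be reproduced in outline as follows (a hypothetical re-implementation using the standard identity that the partial correlation equals $-\Omega_{12}/\sqrt{\Omega_{11}\Omega_{22}}$; with a different random seed the exact numbers will differ from those quoted in the text):

```python
import numpy as np

def partial_corr(w):
    """Partial correlation of the first two columns given the rest,
    via the precision matrix Omega = Sigma^{-1}."""
    omega = np.linalg.inv(np.cov(w, rowvar=False))
    return -omega[0, 1] / np.sqrt(omega[0, 0] * omega[1, 1])

rng = np.random.default_rng(3)
n = 100
z = rng.standard_normal(n)
x = 10 * z + rng.standard_normal(n)   # X = 10Z + epsilon
y = 10 * z + rng.standard_normal(n)   # Y = 10Z + delta
w = np.column_stack([x, y, z])

rho = np.corrcoef(x, y)[0, 1]    # near 1: X and Y are highly correlated
theta_hat = partial_corr(w)      # near 0: the dependence is through Z
# Pivotal bootstrap interval: resample rows of w with replacement.
reps = np.array([partial_corr(w[rng.integers(0, n, size=n)]) for _ in range(2000)])
lo = 2 * theta_hat - np.quantile(reps, 0.975)
hi = 2 * theta_hat - np.quantile(reps, 0.025)
```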
5 Why the Bootstrap Works

To explain why the bootstrap works, let us begin with a heuristic. Let
$$F_n(t) = \mathbb{P}\left(\sqrt{n}(\hat\theta - \theta) \le t\right).$$
If we knew $F_n$ we could easily construct a confidence interval. Let
$$C_n = \left( \hat\theta - \frac{t_{1-\alpha/2}}{\sqrt{n}},\; \hat\theta - \frac{t_{\alpha/2}}{\sqrt{n}} \right)$$
where $t_\alpha = F_n^{-1}(\alpha)$. Then
$$\begin{aligned}
\mathbb{P}(\theta \in C_n) &= \mathbb{P}\left( \hat\theta - \frac{t_{1-\alpha/2}}{\sqrt{n}} \le \theta \le \hat\theta - \frac{t_{\alpha/2}}{\sqrt{n}} \right)\\
&= \mathbb{P}\left( t_{\alpha/2} \le \sqrt{n}(\hat\theta - \theta) \le t_{1-\alpha/2} \right) = F_n(t_{1-\alpha/2}) - F_n(t_{\alpha/2})\\
&= F_n(F_n^{-1}(1-\alpha/2)) - F_n(F_n^{-1}(\alpha/2)) = 1 - \frac{\alpha}{2} - \frac{\alpha}{2} = 1 - \alpha.
\end{aligned}$$
Usually, $F_n$ will be close to some limiting distribution $L$. Similarly, $\hat{F}_n$ will be close to some limiting distribution $\hat{L}$. Moreover, $L$ and $\hat{L}$ will be close, which implies that $F_n$ and $\hat{F}_n$ are close. In practice, we usually approximate $\hat{F}_n$ by its Monte Carlo version
$$F(t) = \frac{1}{B}\sum_{j=1}^B I\left(\sqrt{n}(\hat\theta^*_{n,j} - \hat\theta_n) \le t\right).$$
Now we will give more detail in a simple, special case. Suppose that X1 , . . . , Xn ∼ P where
Xi has mean µ and variance σ 2 . Suppose we want to construct a confidence interval for µ.
We will show that the bootstrap distribution $\hat{F}_n$ is close to $F_n$.
[Figure omitted: a diagram showing $F_n$ within $O(1/\sqrt{n})$ of $L$, $\hat{F}_n$ within $O_P(1/\sqrt{n})$ of $\hat{L}$, and $F$ within $O(1/\sqrt{B})$ of $\hat{F}_n$.]

Figure 2: The distribution $F_n(t) = \mathbb{P}(\sqrt{n}(\hat\theta_n - \theta) \le t)$ is close to some limit distribution $L$. Similarly, the bootstrap distribution $\hat{F}_n(t) = \mathbb{P}(\sqrt{n}(\hat\theta_n^* - \hat\theta_n) \le t \mid X_1, \ldots, X_n)$ is close to some limit distribution $\hat{L}$. Since $\hat{L}$ and $L$ are close, it follows that $F_n$ and $\hat{F}_n$ are close. In practice, we approximate $\hat{F}_n$ with its Monte Carlo version $F$, which we can make as close to $\hat{F}_n$ as we like by taking $B$ large.
To prove this result, let us recall the Berry-Esseen theorem: if $\mu_3 = \mathbb{E}|X_i - \mu|^3 < \infty$, then
$$\sup_t \left| \mathbb{P}\left( \frac{\sqrt{n}(\overline{X}_n - \mu)}{\sigma} \le t \right) - \Phi(t) \right| \le \frac{33}{4}\,\frac{\mu_3}{\sigma^3\sqrt{n}}.$$
We have
$$\sup_t |\hat{F}_n(t) - F_n(t)| \le \sup_t |F_n(t) - \Phi_\sigma(t)| + \sup_t |\Phi_\sigma(t) - \Phi_{\hat\sigma}(t)| + \sup_t |\hat{F}_n(t) - \Phi_{\hat\sigma}(t)| = \mathrm{I} + \mathrm{II} + \mathrm{III}.$$
Let $Z \sim N(0, 1)$. Then $\sigma Z \sim N(0, \sigma^2)$ and, from the Berry-Esseen theorem,
$$\begin{aligned}
\mathrm{I} = \sup_t |F_n(t) - \Phi_\sigma(t)| &= \sup_t \left| \mathbb{P}\left( \sqrt{n}(\hat\mu_n - \mu) \le t \right) - \mathbb{P}(\sigma Z \le t) \right|\\
&= \sup_t \left| \mathbb{P}\left( \frac{\sqrt{n}(\hat\mu_n - \mu)}{\sigma} \le \frac{t}{\sigma} \right) - \mathbb{P}\left( Z \le \frac{t}{\sigma} \right) \right| \le \frac{33}{4}\,\frac{\mu_3}{\sigma^3\sqrt{n}}.
\end{aligned}$$
The same argument, applied conditionally on the data to the bootstrap distribution, gives $\mathrm{III} \le \frac{33}{4}\,\frac{\hat\mu_3}{\hat\sigma^3\sqrt{n}}$, where $\hat\mu_3 = \frac{1}{n}\sum_{i=1}^n |X_i - \hat\mu_n|^3$ is the empirical third moment. By the strong law of large numbers, $\hat\mu_3$ converges almost surely to $\mu_3$ and $\hat\sigma$ converges almost surely to $\sigma$. So, almost surely, for all large $n$, $\hat\mu_3 \le 2\mu_3$ and $\hat\sigma \ge (1/2)\sigma$, and $\mathrm{III} \le \frac{33 \cdot 4\,\mu_3}{\sigma^3\sqrt{n}}$. From the fact that $\hat\sigma - \sigma = O_P(\sqrt{1/n})$ it may be shown that $\mathrm{II} = \sup_t |\Phi_\sigma(t) - \Phi_{\hat\sigma}(t)| = O_P(\sqrt{1/n})$. (This may be seen by Taylor expanding $\Phi_{\hat\sigma}(t)$ around $\sigma$.) This completes the proof.
We have shown that $\sup_t |\hat{F}_n(t) - F_n(t)| = O_P\left(\frac{1}{\sqrt{n}}\right)$. From this, it may be shown that, for each $0 < \beta < 1$, $t_\beta - z_\beta = O_P\left(\frac{1}{\sqrt{n}}\right)$. From this, one can prove Theorem 2.
So far we have focused on the mean. Similar theorems may be proved for more general
parameters. The details are complex so we will not discuss them here. More information is
in the appendix. See also Chapter 23 of van der Vaart (1998).
6 The Parametric Bootstrap
The bootstrap can also be used for parametric inference. Suppose that $X_1, \ldots, X_n \sim p(x; \theta)$. Let $\hat\theta$ be the mle. Let $\psi = g(\theta)$ and $\hat\psi = g(\hat\theta)$. To estimate the standard error of $\hat\psi$ we could find the Fisher information and then apply the delta method. Alternatively, we simply compute the standard deviation of the bootstrap replications $\hat\psi^*_1, \ldots, \hat\psi^*_B$. The only difference is that now we draw the bootstrap samples from $p(x; \hat\theta)$. In other words:
$$X_1^*, \ldots, X_n^* \sim p(x; \hat\theta).$$
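A minimal sketch of the parametric bootstrap (a hypothetical example, not from the text: an Exponential model with rate $\theta$, so the mle of the rate is $\hat\theta = 1/\overline{X}_n$ and $\psi = g(\theta) = 1/\theta$ is the mean):

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical model: X_1, ..., X_n ~ Exponential with rate theta,
# psi = g(theta) = 1/theta (the mean), so psi_hat = X_bar.
n = 200
x = rng.exponential(scale=2.0, size=n)     # true theta = 0.5, psi = 2
theta_mle = 1.0 / x.mean()
psi_hat = 1.0 / theta_mle

# Parametric bootstrap: draw each bootstrap sample from p(x; theta_mle),
# not from the empirical distribution.
B = 2000
psi_star = np.array([rng.exponential(scale=1.0 / theta_mle, size=n).mean()
                     for _ in range(B)])
se_boot = psi_star.std()
se_delta = psi_hat / np.sqrt(n)   # delta-method answer, for comparison
```

Here the two estimates of the standard error should roughly agree, since the delta method is exact enough for this smooth functional.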
7 Remarks

1. The bootstrap is nonparametric but it does require some assumptions. You can't assume it is always valid. (See the appendix.)
2. The bootstrap is an asymptotic method. Thus the coverage of the confidence interval is $1 - \alpha + r_n$ where, typically, $r_n = C/\sqrt{n}$.
3. There is a related method called the jackknife where the standard error is estimated by
leaving out one observation at a time. However, the bootstrap is valid under weaker
conditions than the jackknife. See Shao and Tu (1995).
5. There are many cases where the bootstrap is not formally justified. This is especially true with discrete structures like trees and graphs. Nonetheless, the bootstrap can be used in an informal way to get some intuition about the variability of the procedure. But keep in mind that the formal guarantees may not apply in these cases. For example, see Holmes (2003) for a discussion of the bootstrap applied to phylogenetic trees.
6. There is a method related to the bootstrap called subsampling. In this case, we draw
samples of size m < n without replacement. Subsampling produces valid confidence
intervals under weaker conditions than the bootstrap. See Politis, Romano and Wolf
(1999).
7. There are many modifications of the bootstrap that lead to more accurate confidence
intervals; see Efron (1996).
8. There is a version of the bootstrap that works in high dimensions. We discuss this in
10/36-702.
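The jackknife mentioned in remark 3 is easy to sketch (an illustration, not from the text): recompute the estimator leaving out one observation at a time and combine the leave-one-out values. For the sample mean, the jackknife standard error agrees exactly with the usual formula $s/\sqrt{n}$.

```python
import numpy as np

def jackknife_se(data, g):
    """Jackknife standard error: recompute g on each leave-one-out sample."""
    data = np.asarray(data)
    n = len(data)
    loo = np.array([g(np.delete(data, i)) for i in range(n)])
    return np.sqrt((n - 1) / n * np.sum((loo - loo.mean()) ** 2))

x = np.random.default_rng(5).standard_normal(100)
se_jack = jackknife_se(x, np.mean)
se_exact = x.std(ddof=1) / np.sqrt(len(x))  # the two agree exactly for the mean
```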
8 Summary
The bootstrap provides nonparametric standard errors and confidence intervals. To draw a bootstrap sample we draw $n$ observations $X_1^*, \ldots, X_n^*$ from the empirical distribution $P_n$. This is equivalent to drawing $n$ observations with replacement from the original data $X_1, \ldots, X_n$. We then compute the estimator $\hat\theta^* = g(X_1^*, \ldots, X_n^*)$. If we repeat this whole process $B$ times we get $\hat\theta^*_1, \ldots, \hat\theta^*_B$. The standard deviation of these values approximates the standard error of $\hat\theta_n = g(X_1, \ldots, X_n)$.
9 References
Efron, Bradley and Tibshirani, Robert. (1994). An Introduction to the Bootstrap. CRC Press.
Appendix
Hadamard Differentiability. The key condition needed for the bootstrap is Hadamard differentiability. Let $D$ and $E$ be normed spaces and let $T : D \to E$. We say that $T$ is Hadamard differentiable at $P \in D$ if there exists a continuous linear map $T'_P : D \to E$ such that
$$\frac{T(P + tQ_t) - T(P)}{t} - T'_P(Q) \to 0$$
whenever $t \downarrow 0$ and $Q_t \to Q$.