Documente Academic
Documente Profesional
Documente Cultură
November, 2004
Abstract
The paper explores prediction of the curve-valued autoregression processes. It develops a novel technique,
the predictive factor decomposition, for the estimation of the autoregression operator. The technique is
designed to be better suited than the principal components method for the prediction purposes. The
consistency of the predictive factor estimates is proved. The new method is illustrated by an analysis of the
dynamics of the term structure of Eurodollar futures rates.
1. Introduction
The statistical analysis of problems from different disciplines increasingly relies on
functional data, where each observation is a curve as opposed to a finite-dimensional vector. An
up to date treatment of the functional data analysis, that includes many interesting examples, is
Ramsay and Silverman (1997). In this paper we consider a situation, in which the data generating
process is a functional autoregression:
(1) f t h f t t h .
1
skarguine@cornerstone.com ;
Cornerstone Research, 599 Lexington Avenue floor 43, New York, NY 10022
2
ao2027@columbia.edu;
Economics Department, Columbia University, 1007 International Affairs Building MC 3308, 420 West
118th Street , New York, NY 10027
3
Appendix A briefly describes the formalism of Hilbert space valued random variables and explains how it
relates to a more familiar language of random processes.
1
forecasting may be erroneous. The new evidence indicates that the best predictors of interest rate
movements are among those factors that do not contribute much to the overall interest rate
variation. This observation motivates much of the general discussion below.
When the data generating process is described by (1), forecasting f t h calls for estimation
of the infinitely-dimensional operator . Since only a finite number of data points is observed, a
dimension reduction technique is needed. In his book on functional autoregression, Bosq (2000)
proposes to estimate operator by reduction to the subspace spanned by the first eigenvectors
of the sample covariance operator4. We argue that this method is essentially the familiar principal
components method and is not well suited for forecasting. The reason is that the largest
eigenvectors of the covariance operator for f t may have nothing to do with the best predictors of
f t h in the same way as the factors explaining most of the interest rate variation may have less
2
that generalized eigenvalue problems often arise in different research areas, our consistency result
has an independent interest.
As an application, we illustrate the method using ten years of data on Eurodollar futures
contracts. Consistently with previous research, we find that the best predictors of future interest
rates are not among the largest principal components but are hidden among the residual
components.
Meant to be an illustration of the predictive factors technique, our empirical analysis has
several limitations. We do not attempt to use non-interest rate macroeconomic variables for
interest rate forecasting. We do not aim to derive implications of the interest rate predictability for
the control of the economy by interest rate targeting. We also do not address the question whether
financial portfolios that correspond to the predictable combinations of interest rates generate
excess returns that cannot be explained by traditional risk factors. Overcoming these limitations
would be a separate research effort.
The rest of the paper is organized as follows. The principal component method of
estimation of the functional autoregression operator is described in section 2. The predictive
factor analysis is in Section 3. The data are described in Section 4. The results of estimation of the
predictive factors for the interest rate curve are in Section 5. And Section 6 concludes.
In this paper, we focus on prediction of curves f t (x ) that belong to the Hilbert space of the
square summable functions of x [0, X ] . We assume that the curve dynamics is governed by a
stationary functional autoregression (1). According to theorem 3.1 of Bosq (2000), the
stationarity is guaranteed by the following
curve f t and 21 be the cross-covariance operator for curves f t and f t h . It is easy to see
that the following useful operator relationship holds:
(2) 12 11 .
To estimate , it is tempting to substitute the covariance and cross-covariance operators
with their estimates in (2) and solve the resulting equation for . Unfortunately, this will not
work. Indeed, the empirical covariance and cross-covariance operators are
3
n
( x) 1 f , x f , 1 nh
11
n i 1
i i 12 ( x) f i , x f ih ,
n h i 1
where , denotes the scalar product in L2 .
projecting on principal components of 11 . The idea is to determine how the operator acts
on those linear combinations of f t that have the largest variation. In more detail, denote the span
~ ~ ~ 1
Note that is 12 11 on H k , and zero on the orthogonal complement to H k . The
n n
claim is that under certain assumptions on the covariance operator, this estimator is consistent.
Here is the precise result.
Assumption 2 All eigenvalues of 11 are positive and distinct.
Assumption 3 The first k n eigenvalues of 11 are positive for any n , almost surely.
1
Let a1 (1 2 ) 1 , and a i max (i 1 i ) , (i i 1 )
1
for i 1 , where i are
Theorem 1. Suppose that assumptions 1, 2, and 3 hold, that process f t has finite fourth
kn
a
1
k n j O n1 / 4 (log n) ,
1
we have:
~n L2
0 a.s.
4
Remark: The conditions of the theorem require that the eigenvalues of the covariance matrix do
not approach zero too fast, and that the eigenvalues be not too close to each other.
Proof: This is a restatement of Theorem 8.7 in Bosq (2000).
While consistent, the principal component estimation method may perform very badly in
small samples if the best predictors of the future evolution have little to do with the largest
principal components. To see why, consider a k -factor version of Vasicek (1977) interest rate
model. Let the spot interest rate, rt , be a sum of k independent factors z it that follow an
Ornstein-Uhlenbeck process:
rt i 1 z it ,
k
As explained by Dybvig (1997), the forward rate curve in such a model will be simply a sum
of the forward rate curves implied by the single-factor models based on z it . Therefore, for the
forward rate curves (net of their means) we have (see formula (29) of Vasicek (1977)):
f t ( x) i 1 z it i 1 e i x ,
k
where x denotes time to maturity of the forward contract.
Since the discrete time sampling of z it follows an autoregression:
z i ,t h i e i h z i ,t i i ,t h ,
i ~ i.i.d .N (0, si2 )
,
1 e 2 i h
s
2
i i
2
2 i
the model lands itself in the functional autoregression framework. We can, for example, define
the Hilbert space H as the space of functions on the positive semi-axis that are square integrable
with respect to exponential density e x so that the norm of an element of H has the following
form:
2
f e x f ( x ) 2 dx .
0
The functional autoregression operator is then equal to the composition of a projection on and
scaling along the subspace spanned by 1 e i x , i 1,..., k , and the strong H -white noise
5
t has a singular covariance operator with eigenvectors corresponding to non-zero eigenvalues
spanning the above subspace.
In this example we will ignore the estimation issues and simply assume that we observe all
the factors and are able to estimate well the parameters of the corresponding Ornstein-Uhlenbeck
processes. However, to illustrate the potential problem with the principal components method we
assume that we can only use r k factors for prediction and set the rest of the factors equal to
their mean. What factors should we use?
Let the loss from predicting f t 1 by ft 1 be
2
E f t 1 ft 1 . Since the factors are
independent, the reduction in the mean squared error due to the forecasting of factor z i must be
equal to the explained portion of variance of z i , Var z i Var i , times the squared norm of
11 : g ( x) i 1Var z i 1 e iu , g (u ) 1 e i x ,
k
Therefore, eigenvectors corresponding to non-zero eigenvalues of 11 are equal to
i2 i
(4) .
i 1 2 i 1
Hence, the principal components method would choose r factors to be used in the prediction
according to the ranking of (4).
The choice of the factors made by the principal components method may in principle be
very different from the optimal choice based on ranking of (3). For example, if factor z i has
huge instantaneous variance i and large mean reversion parameter i , it may well happen
2
6
that the principal components method would rank z i first to include, and the optimal method
would rank it last to include. In such a case, although z i would explain almost all variation in
the forward curve, its predictive power will be miniscule because z i lacks persistence. Factors
that better predict the curve would be hidden among more distant principal components.
Note that the optimal choice of factors depends on the horizon h of our forecasting
problem. When the horizon goes to infinity, the first factor becomes equal to the most persistent
factor. If the most persistent factor has small instantaneous variance then it is unlikely to be
captured by a few largest principal components of the curve variation.
The above example suggests that we might be better off by searching for good predictors
directly without first projecting a curve on the largest principal components. In the next section,
we develop a method that takes this suggestion seriously.
3. Predictive Factors
To start with, note that the principal components method is a particular way to approximate a
full-ranked by a reduced-rank operator. In general, a rank k approximation to has form
Ak Bk' ,
method would not choose the approximation optimally from the forecasting point of view.
'
We would like, therefore, to find Ak and Bk that minimize the mean squared error of the
prediction
2
(5) E f t 1 Ak Bk' f t min ,
elements on the diagonal. Fortier (1966) considers such problem in the static context, when
predictors are not the lagged values of the forecasted series, and calls the corresponding variables
Bk' f t simultaneous linear predictions. In what follows, we will call Bk' f t the first k
first predictive factor5 b1 ' f t and the first predictive factor loading a1 correspond to solution of
(5) for k 1 . The second predictive factor and factor loading are defined as solving the same
X
f ( x ) g ( x ) dx as f ' g and functionals
5
In what follows, we will denote scalar products like
0
7
problem subject to an additional constraint that b2 ' must be orthogonal to b1 ' in the metric
11 , that is b2 ' 11 b1 0 . And so on for the third, fourth, etc. factors and factor loadings.
Let us define an operator , whose properties are essential for the existence of the
predictive factors, as 11
1/ 2
' 111 / 2 . We will make the following assumption:
Assumption 2a All eigenvalues of are positive and distinct.
Appendix B proves the following:
If is compact, Ak Bk 0 as k .
'
iii) L2
Remark: For Ak and Bk to be well defined for a given k , it is enough to require that the
first k eigenvalues of are positive and distinct.
To build intuition let us return to the multifactor Vasicek model example. In that example,
12 : g ( x) i 1 Cov z it , z i ,t h 1 e i u , g (u ) 1 e i x .
k
Cov 2 z it , z i ,t h 2
The non-zero eigenvalues of 2112 11 are equal to 1 e i u , which
Var z it
is exactly equal to the ratio in (3) used to optimally rank the factors.
Theorem 2 relates the problem of finding optimal predictive factors to a generalized
eigenvalue problem. Its significance is twofold. First, it relates the problem of optimal prediction
to a well studied area of generalized eigenvalue problems. Second, it suggests a method for
estimation of the optimal predictive factors that proceeds by solving a regularized version of the
generalized eigenvalue problem.
It seems natural to estimate Ak and Bk by computing the eigenvectors of
21 12 11 and using theorem 2. Unfortunately, similarly to the situation with the canonical
8
variates studied by Leurgans, Moyeed and Silverman (1993), such a method of estimation would
be inconsistent and the corresponding estimators meaningless. It is because the predictive factors
are designed to extract those linear combinations of the data that have small variance relative to
their covariance with the next periods data. Linear combinations with small variance are poorly
estimated and a seemingly strong covariance (in relative terms) with the next periods data may
easily be an artifact of the sample.
Leurgans, Moyeed and Silverman (1993) deal with the problem for the canonical correlation
analysis by introducing a penalty for roughness of the estimated canonical covariates. We use the
same idea to obtain a consistent estimate of the predictive factors.
Let us denote the j -th eigenvalue and eigenvector of the operator pencils
b' 11 b
define j , and g i 81 (i i 1 ) .
1
min
bsp ( b1 ,...,b j ) b' b
Theorem 3 Suppose that assumptions 1 and 2a hold and that process f t has bounded support.
1 g n / log n
k n 1
1 1 / 2
If k n is chosen so that g k n i kn1 0 , then
i 1
ii) sup
j k
bj b j ' 11 bj b j 0 almost surely as n .
n
Remarks:
1) When f t does not have a bounded support but its fourth moment is finite, the theorem
9
2) Of course, what can be consistently estimated is not the eigenvector itself, but the
subspace generated by this eigenvector. For this reason, statement ii) holds for a particular choice
Corollary 1 Suppose that assumptions 1 and 2a hold and that process f t has bounded support.
i) k k 0 almost surely as n .
ii) bk bk ' 11 bk bk 0 almost surely as n .
The above corollary can be used to prove consistency of the estimates of the predictive
factors in the following sense. Suppose that we estimate a predictive factor, b j ' f t , where f t is
chosen at random from its unconditional distribution, by bj ' f t . Conditionally on our estimate
bj , a probability that the difference between the factor and its estimate is greater by absolute
Pr bj ' f t b j ' f t | bj 2Var bj b j ' f t | bj
.
2 bj b j ' 11 bj b j .
According to statement ii) of the corollary, this bound tends to zero almost surely as n .
Statement ii) also implies convergence in probability of our estimates of the predictive factor
12
12 bj 12 bj b j
Lemma 2 from appendix C implies that the first term in the above expression tends in probability
to 0. For the second term we have:
12 bj b j
11 bj b j
b b ' b
j j 11 j
b j 111 / 2 ,
which tends to zero almost surely according to statement ii) of the corollary.
10
In sum, corollary 1 essentially says that by maximizing a regularized Rayleigh criterion we
can consistently estimate the factors having largest predictive power, the corresponding factor
loadings, and the reduction in the mean squared error achievable by using the factors. Hence, the
concept of predictive factors can be effectively used for the data exploration purposes and may
provide researchers with a practically more efficient tool of the finite-dimensional approximation
relative to the principal components.
Moreover, when the number of the observed curves and the number of the predictive factors
estimates used to approximate the autoregressive operator go to infinity simultaneously, the
predictive power of the approximation converges to the theoretical maximum. Although, as
theorem 1 implies, the same is true for the principal components, it is comforting to realize that
the predictive factors technique is not handicapped in this respect.
Theorem 3 can be used to establish a precise result. Suppose that f t is chosen at random
from its unconditional distribution and the task is to forecast f t h , given f t . The best, but
B
infeasible, forecast is f t . We approximate this forecast by A ' f , where
t
B [b 1 ,..., bk n ] and A
B
12
. Appendix D proves the following
Theorem 4 Suppose that assumptions 1 and 2a hold, the process f t has bounded support, and
is compact . If n / log n 1 / 2 , 0 , and k n increases to infinity slowly, so that
k n 1
k n g kn 1 g i 1 n / log n kn1 0 as n , then for any 0 ,
1 / 2
i 1
B
Pr f t A , B
f | A
t
0
almost surely as n .
The need for the regularization of the Rayleigh criterion makes estimation of the predictive
factors a harder problem than the estimation of the principal components. Despite the theoretical
appeal of the predictive factors technique, its practical advantages over the principal components
method should be investigated empirically. It may, for example, happen that with a realistic
amount of data the theoretical advantages are discredited by the estimation problems. In the rest
of the paper, we use the data on the term structure of the Eurodollar futures prices to illustrate the
predictive factors method and to compare its predictive performance with several alternatives.
4. Data
11
We use daily settlement data on the Eurodollar that we obtained from Commodity Research
Bureau. The Eurodollar futures are traded on Chicago Mercantile Exchange. Each contract is an
obligation to deliver a 3-month deposit of $1,000,000 in a bank account outside of the United
States at a specified time. The available contracts has delivery dates that starts from several first
months after the current date and then go each quarter up to 10 years into the future.
The available data start in 1982, however, we use only the data starting in 1994 when the
trading on 10-year contract appeared. We interpolated available data points by cubic splines to
obtain smooth forward rate curves. We restricted the curve to points that are 30 days from each
other to speed up the estimation.6 We also removed datapoints with less than 90 or more than
3480 days to expirations. That left us with 114 points per curve and 2507 valid dates.
The main difference of futures contract from the forward contract is that it settled during the
entire life of the contract, while in the forward contract the payment is made only at the
settlement date. This difference and variability of short-term interest rates make the values of the
forward and futures contracts different. While the difference is small for short maturities, it can be
significant for long maturities. There exists methods to adjust for this difference but for our
illustrative purposes we will simply ignore it.
The rate on the forward contract is approximately the forward rate that we defined above.
Indeed, the buyer of the contract expects to have a negative cash flow (the price of the forward
contract) on the settlement date and a positive cash flow ($1,000,000) 3 months after the
settlement date. He has the following alternative investment: he buys a discount bond that will
pay $1,000,000 three months after the settlement date. This costs $1,000,000 Pt (T ) , where
denotes 3 months. He complements this by selling a discount bond that matures on the
settlement day. If his overall investment is zero, then he is sure that on the settlement day he can
6
This is essentially equivalent to approximating the true data by step functions.
12
Note: The forward curves correspond to Eurodollar rates from January 1994 to December 2003.
The calendar time on the right axis and the time to maturity (in months) is on the left axis.
13
7
-1
-2
2/27/97 2/26/99 2/16/01 2/19/03 2/27/97 2/26/99 2/16/01 2/19/03 2/27/97 2/26/99
Note: The operator is estimated using the daily data on the Eurodollar forward rates. The
estimation is on the rolling basis so it uses all the information available at the time of estimation.
The dashed vertical line on the chart corresponds to the NBERs beginning date of the last
US recession. The coefficients estimates are visibly unstable in between the normal growth
and the recession period. In the rest of the paper, therefore, we restrict our attention to the
subsample corresponding to the normal growth period from January 1994 to the end of
February 2001. We hope that for this period, the functional autoregression describes the term
structure dynamics reasonably well.7 For this subsample, we use the predictive factor method
described in section 3 to estimate model (1).
Figure 3 shows our estimates of the weights of the first predictive factor and the
7
Perhaps a switching regimes functional autoregression would describe the whole sample data better. We
do not investigate this question here.
14
Figure 3. First predictive factor weights and loadings for 0 , 0.1 , and 1 .
-0.4
5000
=0
-0.5
0
-0.6
-5000 -0.7
0 20 40 60 80 100 0 20 40 60 80 100
0.4 0.7
0.2 0.6
=0.1
0 0.5
-0.2 0.4
-0.4 0.3
0 20 40 60 80 100 0 20 40 60 80 100
0.2 0.45
0.1 0.4
=1
0 0.35
-0.1 0.3
-0.2 0.25
0 20 40 60 80 100 0 20 40 60 80 100
The horizontal axis on figure 3 corresponds to time to maturity measured in months, the
longest maturity being about 10 years. As we mentioned before, without the regularization (the
case 0 ) the estimate of the predictive factor is not consistent. The estimated factor has no
sense, which is clearly confirmed by the upper left graph of figure 3. As grows, the weights of
the factor become smoother. For 0.1 , the factor weights are negative for maturities less than
about 9 months, positive for very long maturities (more than 8 years) and wiggling around zero
for other maturities. For 1 , the factor weights look more like a linear function with positive
slope.
15
Below, we will focus on the case 0.1 because we found that its pseudo out of sample
forecast performance (to be described shortly) is better than that for 1 . Table 1 shows the
21 12 11 I .
first 5 eigenvalues of the operator pencil
21
Table 1. Eigenvalues of 12
11 0.1I .
Eigenvalue 0.1,1 0.1, 2 0.1, 3 0.1, 4 0.1, 5
22.03 0.42 0.04 0.01 0.00
Recall that the eigenvalues can be interpreted as estimates of the reductions in the mean
square error of forecasting due to the corresponding predictive factors. We see that the error
reduction due to the first predictive factor is much larger than the reductions corresponding to the
other factors. We have decided to use 3 factors to estimate the autoregressive operator mainly
because of the tradition in the literature. Note, however, that the predictive factors are not
designed to explain the variation in the data and hence the above reference to the tradition is not
substantive. In addition, our restricting attention to 3 factors does not mean that we really believe
that the rank of the autoregressive operator is equal to 3. Instead, we simply think that from the
practical point of view, considering more factors in our estimation procedure would not improve
the predictive value of our estimates much.
Figure 4 shows our estimates of the weights and loadings of the first three predictive factors.
Note that the weights of the first factor do not look like traditional level, slope, or curvature
principal components of variation.
16
Figure 4. The weights and loadings of the first 3 predictive factors, 0.1
0.2 0.6
First factor
0 0.5
-0.2 0.4
-0.4 0.3
0 20 40 60 80 100 0 20 40 60 80 100
0.6 0.2
Second factor
0.4 0.1
0.2 0
0 -0.1
-0.2 -0.2
0 20 40 60 80 100 0 20 40 60 80 100
0.5 0.1
0.05
Third factor
0 0
-0.05
-0.5 -0.1
0 20 40 60 80 100 0 20 40 60 80 100
According to the estimates, an unexpected one percentage point increase in the 3-month
forward rate (our first observation) leads to a quarter percentage point decrease in 1 year forward
rates and to smaller decreases, but above 0.1 percentage points, for other maturities.
To assess the predictive performance of our estimate of model (1) based on 3 predictive
factors, we run the following experiment. We first separate our sample 01/01/94:02/28/01 into
two parts of equal sizes: a subsample 01/01/94:07/25/97 and a subsample 07/28/97:02/28/01.
Then, we estimate our functional autoregression based on the first subsample and forecast the
term structure one year ahead. The next step is to extend the first subsample to include one more
day, re-estimate the functional autoregression, and forecast the term structure one year ahead and
so on until we add the day one year before the end of our second subsample. After that, our
forecasting would correspond to the term structures beyond the second subsample, we would not
be able to compare the forecast with the actual term structure, and therefore we stop the exercise.
Our measure of the predictive performance is the root mean squared error based on the
difference between actual term structure and the forecasted one. This measure will be different for
different maturities. Therefore, we report a whole curve of the root mean squared errors.
17
Our empirical analysis contributes to a long-standing problem of whether the interest rates
are predictable. Some research (Duffee (2002), Ang and Piazzesi (2003)) indicates that it is hard
to predict better than simply assuming random walk evolution. This means that todays interest
rate is the best predictor for tomorrows interest rate, or, for that matter, for the interest rate one
year from now. The subject, however, is torn with controversy. Cochrane and Piazzesi (2002) and
Diebold and Lie (2002) report, on the contrary, that their methods improve over random walk
prediction.
We compare predictive performance of our method with 4 different methods. The first one is
the same functional autoregression but estimated based on the first 3 principal components as
discussed in section 2. The second method is the random walk. The third method is the mean
forecast, when the term structure a year ahead is predicted to be equal to the average term
structure so far. Finally, we consider Diebold-Li forecasting procedure.
The Diebold and Lis (2002) procedure consists of the following steps. First, we regress the
term structure on three deterministic curves, the components of the Nelson and Siegels (1987)
forward rate curve:
f t (T ) 1t 2t e t T 3t t Te tT .
This regression is run for each day in the subsample which the forecast is based upon. 8 Then,
the time series for the coefficients of the regression are modeled as 3 separate autoregressive
processes of order 1 (each of the the current coefficient is regressed on the corresponding
coefficient one year before). A one-year ahead forecast of the coefficients is made, and the
corresponding Nelson-Siegel forward curve is taken as the one-year ahead forecast of the term
structure.
Figure 5 shows the predictive performance of the alternative methods considered.
8
We fix parameter t at 0.0609 as Diebold and Li (2002) do.
18
Figure 5. The Predictive Performance of Different Forecasting Methods
1.4
1.3
1.2
Root mean squared error
1.1
0.9
0.8
0.7
0 20 40 60 80 100 120
Maturity
The thick solid line on the above graph corresponds to our method. The dashed line is for
functional autoregression estimated with principal components. The dotted line is for mean
prediction. The dash-dot line is for Diebold and Li (2002) method and solid thin line is for
random walk. Our method has the best pseudo out of sample forecasting record uniformly across
different maturities. The functional autoregression estimated with principal components is the
second best for relatively short maturities and the third best for long maturities where the mean
prediction works equally well with our preferred method. The worst performance is shown by the
random walk.
It should be mentioned that if we include the recession period in our data the random walk
outperforms the other methods. Our reason for not pooling the normal growth and the recession
data is, as was mentioned above, that the functional autoregression model seems to be unstable
across these two periods. A switching regimes functional autoregression may be a solution to the
problem. We, however, leave this question for future research.
Conclusion
19
We have shown that prediction of function-valued autoregressive processes can benefit from
a novel dimension-reduction technique, the predictive factor decomposition. The technique
differs from the usual principal components method by focusing on the estimation of those linear
combinations of variables that matter most for the prediction, as opposed to those that matter
most for describing the variance. It turns out that the predictive factors can be consistently
estimated using a regularization of a generalized eigenvalue problem. To the extent that such
problems often arise in different research areas, our theoretical results on consistency of the
estimation procedure have an independent interest.
An empirical illustration of the new method to the interest rate curve dynamics demonstrates
that the new method is easy to estimate numerically and performs well. The results of this
illustration show that the predictive factors method not only outperforms the principal
components method but also performs on par with the best of the other prediction methods.
The possible venues for further developing the new method is to investigate how to chose
the optimal regularization parameter and the optimal number of predictive factors, and whether it
can help in making inferences about the autoregressive operator.
20
Appendix A
Consider an abstract real Hilbert space H . Let function f n map a probability space (,A,P) to H
. We call this function an H-valued random variable if the scalar product ( g , f n ) is a standard random
If Ef 0 , one sets C f C f Ef .
C f1 , f 2 C f1 Ef1 , f 2 Ef 2 ,
C f 2 , f1 C f 2 Ef 2 , f1 Ef1 .
1) 0 E n
2
2 , E n 0, Cn do not depend on n and
2) n is orthogonal to m ; n, m Z , n m ; i.e.,
E ( x, n )( y, m ) 0, for any x, y H .
n , n Z is said to be a strong H-white noise if it satisfies 1) and
9
The definitions that follow are slight modifications of those in Chapters 2,3 of Bosq (2000).
21
between two processes be Ef i ( x ) f j (u ) ij ( x, u ) . Assume that with probability 1 the sample paths
of the processes are in L2 [0, T ] . Each stochastic process defines an H-valued random variable with zero
and the cross-covariance operator of f i and f j is the integral operator with kernel ij ( x, u ) .
Appendix B
We first prove the following
normalized eigenvector x i is unique and satisfies equation xi i1111 / 2 ' 111 / 2 xi . Using relationship
equality follows from the constraint B ' 11 B I k imposed on B .11 To see that the third equality holds,
10
Here uniqueness is understood modulo change in sign
11
We omit subscript k on Ak and Bk whenever convenient to make our notations more concise.
22
write tr AA' ei ' AA' ei and tr AB ' 21 ei ' AB ' 21ei , where ei is an
i 1 i 1
arbitrary basis in L2 . Then use the fact that A and B ' 21 are finite-dimensional vectors of functions
from L2 , and apply Parcevals equality.
We will first minimize the transformed objective function with respect to A , taking B as given. A
necessary condition for the optimal A to exist is that the Frchet derivative of the objective function with
respect to A is equal to zero (see, for example, proposition 2 in 7.2 and theorem 1 in 7.4 of Luenberger
(1969)). That is, 212 B 2 A 0 and we have A 12 B in accordance with statement i) of the
theorem.
Substituting A 12 B into the objective function, we get
E f t 1 AB' f t
2
tr 11 tr B ' 2112 B tr 11 tr B ' 11
1/ 2
11
1/ 2
B .
We can, therefore, reformulate problem (5) as
tr B' 111 / 2 111 / 2 B max , subject to a constraint
subject to X ' X I k and a requirement that X ' X is a diagonal matrix with non-increasing elements
along the diagonal (see the proof of the spectral theorem III.5.1 in Gohberg and Gohberg (1981)). The
maximum is equal to the sum of the k largest eigenvalues of , and the solution X consists of the
corresponding normalized eigenvectors. By lemma 1, B 111 / 2 X is well defined and consists of the
~ is another
first k eigenvectors of 2112 11 . It is obviously a unique solution to (5), for if B
solution, then ~
~ because there are no zero eigenvalues of
111 / 2 B B 0 , which implies B B .
Statement ii) of the theorem follows from the facts that, by lemma 1, the eigenvalues of and
2112 11 coincide, the maxima in (B1) and (5) are equal, and the maximum in (B1) is equal to the
sum of the k largest eigenvalues of .
To prove iii) note that since Ker 11 0 , Im111 / 2 L2 , and therefore:
1/ 2
Let x i be the i -th normalized eigenvector of . Note that xi forms an orthonormal basis in L2
and
(B3) Ak Bk' 111 / 2 111 / 2 k ,
23
where
B B x , x
1/ 2 ' 1/ 2 k
k 11 k k 11 i1 i i
and the latter equality follows from lemma 1. Substituting (B3)
Suppose that Ak B does not converge to zero. Then there exists a sequence z k such that
'
k
1/ 2
11
z k is bounded and 111 / 2 I k z k does not converge to zero. Without loss of generality, we
can assume that
(B4) 111 / 2 I k z k 0
for any k.
Note that since, by assumption, is a compact operator and
1/ 2
11 z k is a bounded sequence,
sequence 1/ 2
11
z k must have a converging subsequence. Without loss of generality, let us assume that
(B5) 111 / 2 z k z
for some z L2 .
Since x i are the eigenvectors of 11
1/ 2
' 111 / 2 , the compact operator 111 / 2 has
representation
x , y
1/ 2 1/ 2
1 i1 i i i
, where y i is an orthonormal basis in L2 . Denoting xi , z k as
yi is an orthonormal basis in L2 ,
ik 1i / 2 y i i k i y i ik 1i / 2 y i i 1 i y i
ik i 1
and hence
(B8) i k
ik 1i / 2 y i i k i y i / 2
for any k K 1 .
Let K 2 be so large that
(B9) i k
i yi / 2
for any k K 2 .
Combining (B8) and (B9), we have
24
ik
ik 1i / 2 y i ik
ik 1i / 2 y i i k i y i i k i y i
ik
ik 1i / 2 y i i k i y i ik
i yi
for any k max K 1 , K 2 . But this contradicts (B6). Hence our assumption that Ak Bk' does
not converge to zero is wrong and statement iii) of the theorem is established.
Appendix C
Proof of Theorem 3: We, first, prove an extension of lemma 1 in Leurgans et al (1993). Let us define
, (3n )
11 11 , (2n )
(1n ) 12 12
12 2112 , and n max (in ) . We
21
i 1, 2 , 3
have:
(in ) log n / n
1/ 2
a.s. Now,
21
12 21 12
21
12 21
12 21
12 21 12
21 21
12 12 log n / n 1 / 2
12 21
almost surely, which completes the proof.
j max min (b), j max min (b), and j max min (b).
dim M j bM dim M j bM dim M j bM
sup (b) (b) (1 1 ) 1 n o 1 n 0 almost surely as n .
2
bL
The proof is based on lemma 2 and is essentially the same as that of Proposition 3 in Leurgans et al (1993)
and we omit it here.
25
Using Proposition 1, it is easy to prove part i) of our theorem. Note that since (b) (b) for
any b L2 , we have:
Therefore,
sup j j sup j j /(1 j 1 ) 1 1 1 /(1 kn1 ) 0 .
jk n jk n
sup j j sup max min (b) max min (b) sup (b) (b) 0 a.s.
j kn dim M j bM
j kn dim M j bM 2
bL
Thus,
Let us now turn to part ii) of the theorem. Denote bj b j ' 11 bj b j as d j and bj ' bj
as m j . Below, we are going to find an upper bound on sup
d j m j and show that this bound tends to
jk n
bj i 1 ji bi s j ,
j
where, for any i j , ji bj ' 11 bi , and the residuals s j have the following properties. First,
(C4)
s j ' 2112 s j bj ' 2112
21 12 bj j 1 j1 ... j jj
2 2
Finally, we have:
(C5) s j ' 21 12 s j j 1 s j ' 11 s j .
26
Subtracting (C2) from (C3), rearranging, and using the fact that bj ' 11
b 1 , we
11 j n
obtain:
(C6) d j m j 21 jj 1 n .
Expanding (C5) using (C2) and (C4), and rearranging, we get:
1 jj2 j j 1
1
b ' (11 11 )bj bj ' (2112 21 12 )bj
j 1 j
j j i 1 (i j 1 ) ji2 j 1m j .
j 1
Recalling that bj ' 11 11 j n
b 1 , b '
j 21 12
21 12 bj
1
n , that m j 0 ,
and that, from the proof of statement i),
the sign of the eigenvectors bj and b j . More precisely, the regression coefficient jj must be
positive. This fact implies that 1 jj 1 jj . Combining this inequality with (C6) and (C7), and using
2
and
b j ' 11 bi b '
2
j 11
11 bi bj ' bi 2
(C10)
2bj ' 11 11 bj bi ' 11 11 bi 2bj ' bj bi ' bi
o( 1 n ) 2mi
Using (C9) and (C10), we have:
ji2 bj ' 11 bi b ' b b b ' b
2
j 11 i i j 11 i
2
2b ' b b 2b ' b
2 2
j 11 i i j 11 i
(C11)
2b ' b b b ' b b 2b ' b
2
j 11 j i i 11 i i j 11 i
4 d i mi
27
for large enough n . Substituting (C11) into (C8), rearranging, and using the fact that, for j kn ,
(C12) d j m j 81 j j 1
1
(1 1
1 ) 1 n kn1 i 1 d i mi
j 1
.
It is straightforward to check, that if a sequence of real numbers x j satisfies recursive inequality
x j g j f i 1 xi
j 1
, then x j g j 1 g f . Applying this observation to (C12), we get:
j 1
i 1 i
(C13)
sup d j m j g k n i n1 1 g i (1 11 ) 1 n kn1 ,
k 1
jkn
1
where g i 81 (i i 1 ) . The right hand side of (C13) tends to zero almost surely as n by
lemma 2 and the assumptions of the theorem. This completes our proof of statement ii).
Appendix D
Proof of theorem 4: First, note that
B
f t A ' f a a a a ,
t 1 2 3 4
where a1 AB' f t , a 2 12 B B
B' f , a
t 3 12
B
12
B ' f , and
t
B
a4 12
BB
' f .
t
Let
4
t i 1 i
B ' f | A , B 4 Pr a | A , B . Below we will
. We have: Pr f t A
show that each of the terms in the latter expression converges to zero almost surely.
Pr a 2 | A , B 2 tr 12 B B B' 11 B B B ' 21 2 tr B B ' 2112 B B
2 tr B B ' 111 / 2 111 / 2 B B 1 2 tr B B ' 11 B B
1 2 k n g k n i n1 1 g i (1 11 ) 1 n kn1 0
k 1
a.s.
Pr a3 | A , B 2 tr 12
12 B B' 11 BB ' 21
12 B B ' 21
21 2 tr 12 21
21
2 tr B ' 21
12 B 2 tr B ' B
21 12 21
12 12
2
2 k n 0 n
a.s.
28
Pr a 4 | A
, B 2 tr
12
B B B ' B B B '
11 21
2 tr B B ' B B B '
11
21 12 B
2 tr B B ' B B B '
11 21 12
B tr B B ' B B
2
11
2 k n g k n i 1 1 g i (1 11 ) 1 n kn1 0
k n 1
a.s.
This completes the proof of theorem 4.
References:
A. Ang and M. Piazzesi (2003) A no-arbitrage vector autoregression of term structure dynamics with
macroeconomic and latent variables Journal of Monetary Economics, 50, 745-787
T. W. Anderson (1984) An Introduction to Multivariate Statistical Analysis, 2nd edition, John Wiley
and Sons
D. Bosq (2000) Linear Processes in Function Spaces: Theory and Applications, Springer-Verlag
G.E.P. Box and G.C. Tiao (1977) A canonical analysis of multiple time series Biometrika, 64, 355-
365
J. H. Cochrane and M. Piazzesi (2002) Bond Risk Premia NBER Working paper 9178
F. X. Diebold and C. Li (2002) Forecasting the Term Structure of Government Bond Yields
Working Paper (available at http://www.ssc.upenn.edu/~diebold )
G. R. Duffee (2002) Term Premia and Interest Rate Forecasts in Affine Models Journal of Finance,
57, 405-443
D. Duffie and R. Kan (1996) A Yield-Factor model of Interest Rates Mathematical Finance, 6, 379-
406.
P.H. Dybvig (1997) Bond and Bond Option Pricing Based on the Current Term Structure in
Mathematics of Derivative Securities ed. by M.A.H. Dempster and S.R.Pliska, Cambridge University
Press, 271-293
D. Eschwe and M. Langer (2004) Variational principles for eigenvalues of self-adjoint operator
functions Integral Equations and Operator Theory, 49, 287-321
29
J.J. Fortier (1966) Simultaneous linear prediction Psychometrika, 31, 369-381.
I. Gohberg and S. Gohberg (1981) Basic Operator Theory, Birkhauser, Boston, Basel, Berlin.
H. Hotelling (1936) Relations between two sets of variates Biometrika, 28, 321-377
D. G. Luenberger (1969) Optimization by Vector Space Methods, John Wiley & Sons, Inc. New
York, Chichester, Weiheim, Brisbane, Singapore, Toronto.
M. Piazzesi (2003) Bond Yields and the Federal Reserve Working paper
J.O. Ramsay and B.W. Silverman (1997), Functional data analysis, Springer, New York.
G. Reinsel (1983) Some results on multivariate autoregressive index models Biometrika, 70, 145-
156
30