Euro Curve Forecasting

Curve Forecasting by Functional Autoregression
V. Kargin1 (Cornerstone Research)

A. Onatski2 (Columbia University)
November, 2004
Abstract
The paper explores prediction of the curve-valued autoregression processes. It develops a novel technique,
the predictive factor decomposition, for the estimation of the autoregression operator. The technique is
designed to be better suited than the principal components method for the prediction purposes. The
consistency of the predictive factor estimates is proved. The new method is illustrated by an analysis of the
dynamics of the term structure of Eurodollar futures rates.
1. Introduction
The statistical analysis of problems from different disciplines increasingly relies on
functional data, where each observation is a curve as opposed to a finite-dimensional vector. An
up to date treatment of the functional data analysis, that includes many interesting examples, is
Ramsay and Silverman (1997). In this paper we consider a situation, in which the data generating
process is a functional autoregression:
(1) f t h f t t h .
Here for each t , f t is an element of a Hilbert space H , is a linear bounded operator
on H , t is a strong H-white noise3, and h is the lag length. When H is a finite-dimensional

Hilbert space, the functional autoregression reduces to the familiar vector autoregression
a workhorse of the empirical time series analysis. We study the problem of curve forecasting for
infinite-dimensional autoregression.
Although many interesting research objects are functional in nature, their dynamics is often
approximated by finite-dimensional processes. For example, in economics, it is a widespread
opinion that the yield curve dynamics can be completely described in terms of three factors,
which are often modeled as principal components of the interest rate variation. While it is true
that more than 95% percent of the variation can be decomposed into three principal components,
recent research (Cochrane and Piazzesi (2002)) suggests that using only these factors for
1
skarguine@cornerstone.com ;
Cornerstone Research, 599 Lexington Avenue floor 43, New York, NY 10022
2
ao2027@columbia.edu;
Economics Department, Columbia University, 1007 International Affairs Building MC 3308, 420 West
118th Street , New York, NY 10027
3
Appendix A briefly describes the formalism of Hilbert space valued random variables and explains how it
relates to a more familiar language of random processes.
1
forecasting may be erroneous. The new evidence indicates that the best predictors of interest rate
movements are among those factors that do not contribute much to the overall interest rate
variation. This observation motivates much of the general discussion below.
When the data generating process is described by (1), forecasting f t h calls for estimation
of the infinitely-dimensional operator . Since only a finite number of data points is observed, a
dimension reduction technique is needed. In his book on functional autoregression, Bosq (2000)
proposes to estimate operator by reduction to the subspace spanned by the first eigenvectors
of the sample covariance operator4. We argue that this method is essentially the familiar principal
components method and is not well suited for forecasting. The reason is that the largest
eigenvectors of the covariance operator for f t may have nothing to do with the best predictors of
f t h in the same way as the factors explaining most of the interest rate variation may have less
predictive power than other factors.

If not principal components, then what dimension reduction tool is appropriate for curve
forecasting? One possibility is to use predictive factors, an equivalent of the simultaneous linear
predictions introduced in the static context by Fortier (1966). For the time series data, the method
is closely related to but is better suited for the prediction purposes than the canonical analysis of
Box and Tiao (1977), an extension of the classical canonical correlation analysis originally
developed by Hotelling in early 1930s.
The main idea of the predictive factor method is to focus on estimation of those linear
functionals of the data that can contribute most to the reduction of expected predictive error. This
goal necessitates a subtle balance between the search for large correlations and large variances
hidden in the data.
In the finite-dimensional case, the predictive factor analysis is related to the reduced-rank
autoregression extensively studied by Reinsel (1983). A contribution of this paper is an extension
of the analysis, which accounts for the functional nature of interest rate data. It parallels in many
respects the extension of the classical canonical correlation analysis to the functional data
performed by Leurgans, Moyeed and Silverman (1993).
Our main theoretical results are in Theorems 2 and 3. Theorem 2 relates the predictive
factors to eigenvectors of a certain generalized eigenvalue problem. Since the Courant-Fischer
theorem characterizes the eigenvectors as solutions of a minimax problem, results of Theorem 2
suggest estimating the predictive factors as solutions of a regularized minimax problem. Theorem
3 proves that with a certain choice of the regularization this procedure is consistent. To the extent
4
The covariance operator measures the covariance of different functionals of the random curve. The
covariance of the curves observed at different times is measured by the cross-covariance operator.
2
that generalized eigenvalue problems often arise in different research areas, our consistency result
has an independent interest.
As an application, we illustrate the method using ten years of data on Eurodollar futures
contracts. Consistently with previous research, we find that the best predictors of future interest
rates are not among the largest principal components but are hidden among the residual
components.
Meant to be an illustration of the predictive factors technique, our empirical analysis has
several limitations. We do not attempt to use non-interest rate macroeconomic variables for
interest rate forecasting. We do not aim to derive implications of the interest rate predictability for
the control of the economy by interest rate targeting. We also do not address the question whether
financial portfolios that correspond to the predictable combinations of interest rates generate
excess returns that cannot be explained by traditional risk factors. Overcoming these limitations
would be a separate research effort.
The rest of the paper is organized as follows. The principal component method of
estimation of the functional autoregression operator is described in section 2. The predictive
factor analysis is in Section 3. The data are described in Section 4. The results of estimation of the
predictive factors for the interest rate curve are in Section 5. And Section 6 concludes.
2. The Estimation Problem
In this paper, we focus on prediction of curves f t (x ) that belong to the Hilbert space of the
square summable functions of x [0, X ] . We assume that the curve dynamics is governed by a
stationary functional autoregression (1). According to theorem 3.1 of Bosq (2000), the
stationarity is guaranteed by the following
Assumption 1 There exists an integer j 1 such that 1.

j
L2
Here L2 denotes the operator norm induced by the L2 norm.
To forecast f t h we need to estimate . Let 11 be the covariance operator of random
curve f t and 21 be the cross-covariance operator for curves f t and f t h . It is easy to see
that the following useful operator relationship holds:
(2) 12 11 .
To estimate , it is tempting to substitute the covariance and cross-covariance operators
with their estimates in (2) and solve the resulting equation for . Unfortunately, this will not
work. Indeed, the empirical covariance and cross-covariance operators are
3
n
( x) 1 f , x f , 1 nh
11
n i 1
i i 12 ( x) f i , x f ih ,
n h i 1
where , denotes the scalar product in L2 .
Consequently, the empirical covariance operator 11 is singular and cannot be inverted.

Intuitively, the estimation problem that we try to handle is ill-posed: we estimate a functional
dependence from a discrete set of data. As a consequence, obtaining a consistent estimate of
requires a regularization of the problem.
One possible regularization method was introduced by Bosq (2000) who advocates
projecting on principal components of 11 . The idea is to determine how the operator acts
on those linear combinations of f t that have the largest variation. In more detail, denote the span
of k n eigenvectors of 11 associated with the largest eigenvalues, as H k n , and let k n be the

orthogonal projector on this subspace. Define the regularized covariance and cross-covariance
~ ~
' and
estimates as follows: 11 k n 12 k n 12 k n . These are simply the empirical
'
11 k n
covariance and cross-covariance operators restricted to H k n . Then define

~ ~ 1
~ '
k 12 11 k .n n
~ ~ ~ 1
Note that is 12 11 on H k , and zero on the orthogonal complement to H k . The
n n
claim is that under certain assumptions on the covariance operator, this estimator is consistent.
Here is the precise result.
Assumption 2 All eigenvalues of 11 are positive and distinct.
Assumption 3 The first k n eigenvalues of 11 are positive for any n , almost surely.
1

Let a1 (1 2 ) 1 , and a i max (i 1 i ) , (i i 1 )
1

for i 1 , where i are
eigenvalues of the covariance operator 11 ordered in the decreasing order.
Theorem 1. Suppose that assumptions 1, 2, and 3 hold, that process f t has finite fourth
unconditional moment, and that is Hilbert-Schmidt. Then, if for some 1

kn
a
1
k n j O n1 / 4 (log n) ,
1
we have:
~n L2
0 a.s.
4
Remark: The conditions of the theorem require that the eigenvalues of the covariance matrix do
not approach zero too fast, and that the eigenvalues be not too close to each other.
Proof: This is a restatement of Theorem 8.7 in Bosq (2000).
While consistent, the principal component estimation method may perform very badly in
small samples if the best predictors of the future evolution have little to do with the largest
principal components. To see why, consider a k -factor version of Vasicek (1977) interest rate
model. Let the spot interest rate, rt , be a sum of k independent factors z it that follow an
Ornstein-Uhlenbeck process:
rt i 1 z it ,
k
dz i i ( i z i )dt i dwi , i 1,..., k
As explained by Dybvig (1997), the forward rate curve in such a model will be simply a sum
of the forward rate curves implied by the single-factor models based on z it . Therefore, for the
forward rate curves (net of their means) we have (see formula (29) of Vasicek (1977)):

f t ( x) i 1 z it i 1 e i x ,
k

where x denotes time to maturity of the forward contract.
Since the discrete time sampling of z it follows an autoregression:
z i ,t h i e i h z i ,t i i ,t h ,
i ~ i.i.d .N (0, si2 )
,
1 e 2 i h
s
2
i i
2
2 i
the model lands itself in the functional autoregression framework. We can, for example, define
the Hilbert space H as the space of functions on the positive semi-axis that are square integrable
with respect to exponential density e x so that the norm of an element of H has the following
form:

2
f e x f ( x ) 2 dx .
0
The functional autoregression operator is then equal to the composition of a projection on and
scaling along the subspace spanned by 1 e i x , i 1,..., k , and the strong H -white noise
5
t has a singular covariance operator with eigenvectors corresponding to non-zero eigenvalues
spanning the above subspace.
In this example we will ignore the estimation issues and simply assume that we observe all
the factors and are able to estimate well the parameters of the corresponding Ornstein-Uhlenbeck
processes. However, to illustrate the potential problem with the principal components method we
assume that we can only use r k factors for prediction and set the rest of the factors equal to
their mean. What factors should we use?
Let the loss from predicting f t 1 by ft 1 be
2
E f t 1 ft 1 . Since the factors are
independent, the reduction in the mean squared error due to the forecasting of factor z i must be
equal to the explained portion of variance of z i , Var z i Var i , times the squared norm of
1 e i x . It is straightforward to check that such a reduction is equal to

i2 i e 2 i h
(3) .
i 1 2 i 1
The optimal choice of r factors to be used for forecasting should, therefore, be based on the
ranking of the loss reductions (3). The first factor to be used should correspond to the largest
value of (3), the second one should correspond to the second largest value of (3) and so on.
How would the principal components method choose the r factors? For the above
example, 11 acts as follows

11 : g ( x) i 1Var z i 1 e iu , g (u ) 1 e i x ,
k

Therefore, eigenvectors corresponding to non-zero eigenvalues of 11 are equal to
1 e i x , i 1,..., k and the eigenvalues are equal to Var xi 1 e x

2
i
respectively. The
explicit formula for the eigenvalues is:
i2 i
(4) .
i 1 2 i 1
Hence, the principal components method would choose r factors to be used in the prediction
according to the ranking of (4).
The choice of the factors made by the principal components method may in principle be
very different from the optimal choice based on ranking of (3). For example, if factor z i has
huge instantaneous variance i and large mean reversion parameter i , it may well happen
2
6
that the principal components method would rank z i first to include, and the optimal method
would rank it last to include. In such a case, although z i would explain almost all variation in
the forward curve, its predictive power will be miniscule because z i lacks persistence. Factors
that better predict the curve would be hidden among more distant principal components.
Note that the optimal choice of factors depends on the horizon h of our forecasting
problem. When the horizon goes to infinity, the first factor becomes equal to the most persistent
factor. If the most persistent factor has small instantaneous variance then it is unlikely to be
captured by a few largest principal components of the curve variation.
The above example suggests that we might be better off by searching for good predictors
directly without first projecting a curve on the largest principal components. In the next section,
we develop a method that takes this suggestion seriously.
3. Predictive Factors
To start with, note that the principal components method is a particular way to approximate a
full-ranked by a reduced-rank operator. In general, a rank k approximation to has form
Ak Bk' ,
where Ak : R L and Bk : L R . In section 2 we argued that the principal components

k 2 ' 2 k
method would not choose the approximation optimally from the forecasting point of view.
'
We would like, therefore, to find Ak and Bk that minimize the mean squared error of the
prediction
2
(5) E f t 1 Ak Bk' f t min ,
subject to normalizing constraints Bk 11 Bk I k and Ak Ak is diagonal with non-increasing

' '
elements on the diagonal. Fortier (1966) considers such problem in the static context, when
predictors are not the lagged values of the forecasted series, and calls the corresponding variables
Bk' f t simultaneous linear predictions. In what follows, we will call Bk' f t the first k
predictive factors and Ak the corresponding predictive factor loadings.

Similarly to principal components, the predictive factors can be defined recursively. The
first predictive factor5 b1 ' f t and the first predictive factor loading a1 correspond to solution of
(5) for k 1 . The second predictive factor and factor loading are defined as solving the same
X
f ( x ) g ( x ) dx as f ' g and functionals
5
In what follows, we will denote scalar products like
0
transforming g into f ' g as f ' .
7
problem subject to an additional constraint that b2 ' must be orthogonal to b1 ' in the metric
11 , that is b2 ' 11 b1 0 . And so on for the third, fourth, etc. factors and factor loadings.
Let us define an operator , whose properties are essential for the existence of the
predictive factors, as 11
1/ 2
' 111 / 2 . We will make the following assumption:
Assumption 2a All eigenvalues of are positive and distinct.
Appendix B proves the following:
Theorem 2 Under assumptions 1 and 2a we have:

i) For any integer k 1 , there exist Ak and Bk solving (4).This solution is unique
up to a simultaneous change in sign of Bk and Ak . Vector Bk consists of the eigenvectors of
21 12 11 arranged in the order of declining eigenvalues. Vector Ak is equal to 12 Bk .
ii) The i th eigenvalue of 21 12 11 is equal to the reduction in the mean squared

error of forecasting due to the i-th predictive factor.
If is compact, Ak Bk 0 as k .
'
iii) L2
Remark: For Ak and Bk to be well defined for a given k , it is enough to require that the
first k eigenvalues of are positive and distinct.
To build intuition let us return to the multifactor Vasicek model example. In that example,
The cross-covariance operator 12 acts as follows:

12 : g ( x) i 1 Cov z it , z i ,t h 1 e i u , g (u ) 1 e i x .
k

Cov 2 z it , z i ,t h 2
The non-zero eigenvalues of 2112 11 are equal to 1 e i u , which
Var z it
is exactly equal to the ratio in (3) used to optimally rank the factors.
Theorem 2 relates the problem of finding optimal predictive factors to a generalized
eigenvalue problem. Its significance is twofold. First, it relates the problem of optimal prediction
to a well studied area of generalized eigenvalue problems. Second, it suggests a method for
estimation of the optimal predictive factors that proceeds by solving a regularized version of the
generalized eigenvalue problem.
It seems natural to estimate Ak and Bk by computing the eigenvectors of

21 12 11 and using theorem 2. Unfortunately, similarly to the situation with the canonical
8
variates studied by Leurgans, Moyeed and Silverman (1993), such a method of estimation would
be inconsistent and the corresponding estimators meaningless. It is because the predictive factors
are designed to extract those linear combinations of the data that have small variance relative to
their covariance with the next periods data. Linear combinations with small variance are poorly
estimated and a seemingly strong covariance (in relative terms) with the next periods data may
easily be an artifact of the sample.
Leurgans, Moyeed and Silverman (1993) deal with the problem for the canonical correlation
analysis by introducing a penalty for roughness of the estimated canonical covariates. We use the
same idea to obtain a consistent estimate of the predictive factors.
Let us denote the j -th eigenvalue and eigenvector of the operator pencils
2112 11 , 2112 11 I and

21
12
11 I as j , j , j and
b j , bj , bj respectively. Here 0 is a regularization parameter. We assume that the
eigenvectors are normalized so that
b j ' 11 bi ji , bj ' 11 I bi ji , bj '

11 I bi ji . Further, for any integer j 1 ,
b' 11 b
define j , and g i 81 (i i 1 ) .
1
min
bsp ( b1 ,...,b j ) b' b
Appendix C proves the following
Theorem 3 Suppose that assumptions 1 and 2a hold and that process f t has bounded support.
If n / log n 1 / 2 , 0 , and k n does not increase too fast so that kn1 0 as n ,

then
i) sup j j 0 almost surely as n .

j k
n
1 g n / log n
k n 1
1 1 / 2
If k n is chosen so that g k n i kn1 0 , then
i 1
ii) sup
j k

bj b j ' 11 bj b j 0 almost surely as n .
n
Remarks:
1) When f t does not have a bounded support but its fourth moment is finite, the theorem
remains true if n / log n 1 / 2 is replaced by log n n / log n 1 / 4 for some 1 / 4 .
9
2) Of course, what can be consistently estimated is not the eigenvector itself, but the
subspace generated by this eigenvector. For this reason, statement ii) holds for a particular choice
in the sign of the eigenvectors bj and b j .

3) Accurate estimation of a fixed finite number of the predictive factors seems to have more
practical relevance than the ability to estimate well ever increasing number of factors. Clearly,
theorem 3 can be relaxed to have the following form.
Corollary 1 Suppose that assumptions 1 and 2a hold and that process f t has bounded support.
If n / log n 1 / 2 and 0 as n , then for any integer k 1
i) k k 0 almost surely as n .

ii) bk bk ' 11 bk bk 0 almost surely as n .
The above corollary can be used to prove consistency of the estimates of the predictive
factors in the following sense. Suppose that we estimate a predictive factor, b j ' f t , where f t is
chosen at random from its unconditional distribution, by bj ' f t . Conditionally on our estimate
bj , a probability that the difference between the factor and its estimate is greater by absolute
value than can be bounded as follows:

Pr bj ' f t b j ' f t | bj 2Var bj b j ' f t | bj
.

2 bj b j ' 11 bj b j .
According to statement ii) of the corollary, this bound tends to zero almost surely as n .
Statement ii) also implies convergence in probability of our estimates of the predictive factor
loadings, aj . Indeed, we have:

b b
aj a j 12 j 12 j
12
12 bj 12 bj b j
Lemma 2 from appendix C implies that the first term in the above expression tends in probability
to 0. For the second term we have:

12 bj b j
11 bj b j
b b ' b
j j 11 j
b j 111 / 2 ,
which tends to zero almost surely according to statement ii) of the corollary.
10
In sum, corollary 1 essentially says that by maximizing a regularized Rayleigh criterion we
can consistently estimate the factors having largest predictive power, the corresponding factor
loadings, and the reduction in the mean squared error achievable by using the factors. Hence, the
concept of predictive factors can be effectively used for the data exploration purposes and may
provide researchers with a practically more efficient tool of the finite-dimensional approximation
relative to the principal components.
Moreover, when the number of the observed curves and the number of the predictive factors
estimates used to approximate the autoregressive operator go to infinity simultaneously, the
predictive power of the approximation converges to the theoretical maximum. Although, as
theorem 1 implies, the same is true for the principal components, it is comforting to realize that
the predictive factors technique is not handicapped in this respect.
Theorem 3 can be used to establish a precise result. Suppose that f t is chosen at random
from its unconditional distribution and the task is to forecast f t h , given f t . The best, but
B
infeasible, forecast is f t . We approximate this forecast by A ' f , where
t
B [b 1 ,..., bk n ] and A

B
12
. Appendix D proves the following
Theorem 4 Suppose that assumptions 1 and 2a hold, the process f t has bounded support, and
is compact . If n / log n 1 / 2 , 0 , and k n increases to infinity slowly, so that

k n 1
k n g kn 1 g i 1 n / log n kn1 0 as n , then for any 0 ,
1 / 2
i 1
B
Pr f t A , B
f | A
t
0
almost surely as n .
The need for the regularization of the Rayleigh criterion makes estimation of the predictive
factors a harder problem than the estimation of the principal components. Despite the theoretical
appeal of the predictive factors technique, its practical advantages over the principal components
method should be investigated empirically. It may, for example, happen that with a realistic
amount of data the theoretical advantages are discredited by the estimation problems. In the rest
of the paper, we use the data on the term structure of the Eurodollar futures prices to illustrate the
predictive factors method and to compare its predictive performance with several alternatives.
4. Data
11
We use daily settlement data on the Eurodollar that we obtained from Commodity Research
Bureau. The Eurodollar futures are traded on Chicago Mercantile Exchange. Each contract is an
obligation to deliver a 3-month deposit of $1,000,000 in a bank account outside of the United
States at a specified time. The available contracts has delivery dates that starts from several first
months after the current date and then go each quarter up to 10 years into the future.
The available data start in 1982, however, we use only the data starting in 1994 when the
trading on 10-year contract appeared. We interpolated available data points by cubic splines to
obtain smooth forward rate curves. We restricted the curve to points that are 30 days from each
other to speed up the estimation.6 We also removed datapoints with less than 90 or more than
3480 days to expirations. That left us with 114 points per curve and 2507 valid dates.
The main difference of futures contract from the forward contract is that it settled during the
entire life of the contract, while in the forward contract the payment is made only at the
settlement date. This difference and variability of short-term interest rates make the values of the
forward and futures contracts different. While the difference is small for short maturities, it can be
significant for long maturities. There exists methods to adjust for this difference but for our
illustrative purposes we will simply ignore it.
The rate on the forward contract is approximately the forward rate that we defined above.
Indeed, the buyer of the contract expects to have a negative cash flow (the price of the forward
contract) on the settlement date and a positive cash flow ($1,000,000) 3 months after the
settlement date. He has the following alternative investment: he buys a discount bond that will
pay $1,000,000 three months after the settlement date. This costs $1,000,000 Pt (T ) , where
denotes 3 months. He complements this by selling a discount bond that matures on the
settlement day. If his overall investment is zero, then he is sure that on the settlement day he can
make a payment of at least $1,000,000 Pt (T ) / Pt (T ) . By arbitrage considerations we see

that the price of the forward contract is $1,000,000 times one minus the forward rate.
The next chart illustrates the evolution of Eurodollar forward rate curves.
Figure 1. Forward Curve Evolution
6
This is essentially equivalent to approximating the true data by step functions.
12
Note: The forward curves correspond to Eurodollar rates from January 1994 to December 2003.
The calendar time on the right axis and the time to maturity (in months) is on the left axis.
5. One year ahead prediction of the term structure

We first investigate whether the data can be sensibly represented by the functional
autoregression model (1) with lag length h equal to one year. To this goal, we estimate the
autoregressive operator on a rolling basis using daily data starting from the subsample 3-Jan-
94:10-Mar-95 and increasing it to the full sample. We restrict the estimates to the subspace
spanned by the three principal components of the sample covariance operator.
In the basis formed by the principal components estimates, our estimate of the autoregressive
operator can be represented by a 3 by 3 matrix. We plot the results of the estimation as the
number of data increases in figure 2:
Figure 2. The evolution of matrix entries of the estimate of operator .
13
7
-1
-2
2/27/97 2/26/99 2/16/01 2/19/03 2/27/97 2/26/99 2/16/01 2/19/03 2/27/97 2/26/99
Note: The operator is estimated using the daily data on the Eurodollar forward rates. The
estimation is on the rolling basis so it uses all the information available at the time of estimation.
The dashed vertical line on the chart corresponds to the NBERs beginning date of the last
US recession. The coefficients estimates are visibly unstable in between the normal growth
and the recession period. In the rest of the paper, therefore, we restrict our attention to the
subsample corresponding to the normal growth period from January 1994 to the end of
February 2001. We hope that for this period, the functional autoregression describes the term
structure dynamics reasonably well.7 For this subsample, we use the predictive factor method
described in section 3 to estimate model (1).
Figure 3 shows our estimates of the weights of the first predictive factor and the
corresponding factor loadings ( b1 and a1 respectively, in terminology of section 5) for 0 ,

0.1 , and 1 . In this paper we do not study how the optimal choice of the regularization
parameter should be made. A systematic analysis of the optimal choice is left for future work.
7
Perhaps a switching regimes functional autoregression would describe the whole sample data better. We
do not investigate this question here.
14
Figure 3. First predictive factor weights and loadings for 0 , 0.1 , and 1 .
First predictive factor weights First predictive factor loadings

10000 -0.3
-0.4
5000
=0
-0.5
0
-0.6
-5000 -0.7
0 20 40 60 80 100 0 20 40 60 80 100
0.4 0.7
0.2 0.6
=0.1
0 0.5
-0.2 0.4
-0.4 0.3
0 20 40 60 80 100 0 20 40 60 80 100
0.2 0.45
0.1 0.4
=1
0 0.35
-0.1 0.3
-0.2 0.25
0 20 40 60 80 100 0 20 40 60 80 100
The horizontal axis on figure 3 corresponds to time to maturity measured in months, the
longest maturity being about 10 years. As we mentioned before, without the regularization (the
case 0 ) the estimate of the predictive factor is not consistent. The estimated factor has no
sense, which is clearly confirmed by the upper left graph of figure 3. As grows, the weights of
the factor become smoother. For 0.1 , the factor weights are negative for maturities less than
about 9 months, positive for very long maturities (more than 8 years) and wiggling around zero
for other maturities. For 1 , the factor weights look more like a linear function with positive
slope.
15
Below, we will focus on the case 0.1 because we found that its pseudo out of sample
forecast performance (to be described shortly) is better than that for 1 . Table 1 shows the

21 12 11 I .
first 5 eigenvalues of the operator pencil
21
Table 1. Eigenvalues of 12
11 0.1I .
Eigenvalue 0.1,1 0.1, 2 0.1, 3 0.1, 4 0.1, 5
22.03 0.42 0.04 0.01 0.00
Recall that the eigenvalues can be interpreted as estimates of the reductions in the mean
square error of forecasting due to the corresponding predictive factors. We see that the error
reduction due to the first predictive factor is much larger than the reductions corresponding to the
other factors. We have decided to use 3 factors to estimate the autoregressive operator mainly
because of the tradition in the literature. Note, however, that the predictive factors are not
designed to explain the variation in the data and hence the above reference to the tradition is not
substantive. In addition, our restricting attention to 3 factors does not mean that we really believe
that the rank of the autoregressive operator is equal to 3. Instead, we simply think that from the
practical point of view, considering more factors in our estimation procedure would not improve
the predictive value of our estimates much.
Figure 4 shows our estimates of the weights and loadings of the first three predictive factors.
Note that the weights of the first factor do not look like traditional level, slope, or curvature
principal components of variation.
16
Figure 4. The weights and loadings of the first 3 predictive factors, 0.1
Predictive factor weights Predictive factor loadings

0.4 0.7
0.2 0.6
First factor
0 0.5
-0.2 0.4
-0.4 0.3
0 20 40 60 80 100 0 20 40 60 80 100
0.6 0.2
Second factor
0.4 0.1
0.2 0
0 -0.1
-0.2 -0.2
0 20 40 60 80 100 0 20 40 60 80 100
0.5 0.1
0.05
Third factor
0 0
-0.05
-0.5 -0.1
0 20 40 60 80 100 0 20 40 60 80 100
According to the estimates, an unexpected one percentage point increase in the 3-month
forward rate (our first observation) leads to a quarter percentage point decrease in 1 year forward
rates and to smaller decreases, but above 0.1 percentage points, for other maturities.
To assess the predictive performance of our estimate of model (1) based on 3 predictive
factors, we run the following experiment. We first separate our sample 01/01/94:02/28/01 into
two parts of equal sizes: a subsample 01/01/94:07/25/97 and a subsample 07/28/97:02/28/01.
Then, we estimate our functional autoregression based on the first subsample and forecast the
term structure one year ahead. The next step is to extend the first subsample to include one more
day, re-estimate the functional autoregression, and forecast the term structure one year ahead and
so on until we add the day one year before the end of our second subsample. After that, our
forecasting would correspond to the term structures beyond the second subsample, we would not
be able to compare the forecast with the actual term structure, and therefore we stop the exercise.
Our measure of the predictive performance is the root mean squared error based on the
difference between actual term structure and the forecasted one. This measure will be different for
different maturities. Therefore, we report a whole curve of the root mean squared errors.
17
Our empirical analysis contributes to a long-standing problem of whether the interest rates
are predictable. Some research (Duffee (2002), Ang and Piazzesi (2003)) indicates that it is hard
to predict better than simply assuming random walk evolution. This means that todays interest
rate is the best predictor for tomorrows interest rate, or, for that matter, for the interest rate one
year from now. The subject, however, is torn with controversy. Cochrane and Piazzesi (2002) and
Diebold and Lie (2002) report, on the contrary, that their methods improve over random walk
prediction.
We compare predictive performance of our method with 4 different methods. The first one is
the same functional autoregression but estimated based on the first 3 principal components as
discussed in section 2. The second method is the random walk. The third method is the mean
forecast, when the term structure a year ahead is predicted to be equal to the average term
structure so far. Finally, we consider Diebold-Li forecasting procedure.
The Diebold and Lis (2002) procedure consists of the following steps. First, we regress the
term structure on three deterministic curves, the components of the Nelson and Siegels (1987)
forward rate curve:
f t (T ) 1t 2t e t T 3t t Te tT .
This regression is run for each day in the subsample which the forecast is based upon. 8 Then,
the time series for the coefficients of the regression are modeled as 3 separate autoregressive
processes of order 1 (each of the the current coefficient is regressed on the corresponding
coefficient one year before). A one-year ahead forecast of the coefficients is made, and the
corresponding Nelson-Siegel forward curve is taken as the one-year ahead forecast of the term
structure.
Figure 5 shows the predictive performance of the alternative methods considered.
8
We fix parameter t at 0.0609 as Diebold and Li (2002) do.
18
Figure 5. The Predictive Performance of Different Forecasting Methods
1.4
1.3
1.2
Root mean squared error
1.1
0.9
0.8
0.7
0 20 40 60 80 100 120
Maturity
The thick solid line on the above graph corresponds to our method. The dashed line is for
functional autoregression estimated with principal components. The dotted line is for mean
prediction. The dash-dot line is for Diebold and Li (2002) method and solid thin line is for
random walk. Our method has the best pseudo out of sample forecasting record uniformly across
different maturities. The functional autoregression estimated with principal components is the
second best for relatively short maturities and the third best for long maturities where the mean
prediction works equally well with our preferred method. The worst performance is shown by the
random walk.
It should be mentioned that if we include the recession period in our data the random walk
outperforms the other methods. Our reason for not pooling the normal growth and the recession
data is, as was mentioned above, that the functional autoregression model seems to be unstable
across these two periods. A switching regimes functional autoregression may be a solution to the
problem. We, however, leave this question for future research.
Conclusion
19
We have shown that prediction of function-valued autoregressive processes can benefit from
a novel dimension-reduction technique, the predictive factor decomposition. The technique
differs from the usual principal components method by focusing on the estimation of those linear
combinations of variables that matter most for the prediction, as opposed to those that matter
most for describing the variance. It turns out that the predictive factors can be consistently
estimated using a regularization of a generalized eigenvalue problem. To the extent that such
problems often arise in different research areas, our theoretical results on consistency of the
estimation procedure have an independent interest.
An empirical illustration of the new method to the interest rate curve dynamics demonstrates
that the new method is easy to estimate numerically and performs well. The results of this
illustration show that the predictive factors method not only outperforms the principal
components method but also performs on par with the best of the other prediction methods.
The possible venues for further developing the new method is to investigate how to chose
the optimal regularization parameter and the optimal number of predictive factors, and whether it
can help in making inferences about the autoregressive operator.
20
Appendix A
Consider an abstract real Hilbert space H . Let function f n map a probability space (,A,P) to H
. We call this function an H-valued random variable if the scalar product ( g , f n ) is a standard random
variable for every g from H .9

Definition 1. If E f , then there exists an element of H , denoted as Ef and called
expectation of f , such that
E ( g , f ) ( g , Ef ), for any g H .
and Ef 0 . The
2
Definition 2. Let f be an H-valued random variable, such that E f
covariance operator of f is the bounded linear operator on H , defined by

C f ( g ) E ( g , f ) f , gH .
If Ef 0 , one sets C f C f Ef .
Definition 3. Let f 1 and f 2 be two H-valued random variables, such that
and Ef 1 Ef 2 0 . Then the cross-covariance operators of f 1 and f 2

2 2
E f1 , E f 2
are bounded linear operators on H defined by

C f1 , f 2 ( g ) E ( g , f1 ) f 2 , g H ,
C f 2 , f1 ( g ) E ( g , f 2 ) f1 , g H .
If Ef 1 0 or/and Ef 2 0 , one sets
C f1 , f 2 C f1 Ef1 , f 2 Ef 2 ,
C f 2 , f1 C f 2 Ef 2 , f1 Ef1 .
Definition 4. A sequence n , n Z of H-valued random variables is said to be H-white noise if
1) 0 E n
2
2 , E n 0, Cn do not depend on n and
2) n is orthogonal to m ; n, m Z , n m ; i.e.,
E ( x, n )( y, m ) 0, for any x, y H .
n , n Z is said to be a strong H-white noise if it satisfies 1) and
2) n , n Z is a sequence of i.i.d. H-valued random variables.
Example: Stochastic Processes

Consider a set of stochastic processes, f i (x ) , on interval [0, X ] . Let Ef i ( x ) 0 for each x.
Let the covariance function of each process be Ef i ( x) f i (u ) ii ( x, u ) , and cross-covariance function
9
The definitions that follow are slight modifications of those in Chapters 2,3 of Bosq (2000).
21
between two processes be Ef i ( x ) f j (u ) ij ( x, u ) . Assume that with probability 1 the sample paths
of the processes are in L2 [0, T ] . Each stochastic process defines an H-valued random variable with zero
mean. The covariance operator of f i is the integral operator with kernel ii ( x, u ) :

T
C f i : g ( x) 0
ii ( x, u ) g (u ) du ,
and the cross-covariance operator of f i and f j is the integral operator with kernel ij ( x, u ) .
Appendix B
We first prove the following
Lemma 1 Under assumptions 1 and 2a, i is an eigenvalue of if and only if it is an eigenvalue of
21 12 11 . The corresponding eigenvectors of and 2112 11 , x i and bi
respectively, normalized so that xi 1 and 11 bi 1 are unique10 and such that xi 11 bi .

1/ 2 1/ 2
Proof: Suppose that i is an eigenvalue of . Assumption 2a guarantees that the corresponding
normalized eigenvector x i is unique and satisfies equation xi i1111 / 2 ' 111 / 2 xi . Using relationship
12 11 , it is straightforward to check that i1 ' 11

1/ 2
xi is an eigenvector of 21 12 11
associated with eigenvalue i . Now let i be an eigenvalue of 21 12 11 and bi a
corresponding normalized eigenvector. We have : 11

1/ 2
1/ 2
11 i
b i 111 / 2 bi 0 . Assumption 2a
implies that Ker 11

1/ 2
0 , and , therefore, 111 / 2 bi i 111 / 2 bi 0 , which proves that i is an
eigenvalue of and xi 11 bi is the corresponding normalized eigenvector. Since x i is unique and

1/ 2
Ker 11 0 , the eigenvector bi is unique.

1/ 2
Proof of Theorem 2: Transform the objective function in problem (5) as
tr 11 AB ' 21 12 BA' AB ' 11 BA'

2
E f t 1 AB ' f t
tr 11 tr AB ' 21 12 BA' tr AA'
tr 11 2tr B ' 21 A tr A' A ,
where the first equality follows from the fact that the expectation of the squared norm of an L2 -valued
random variable is equal to the trace of its covariance operator (see Bosq (2000) p.37), and the second
equality follows from the constraint B ' 11 B I k imposed on B .11 To see that the third equality holds,
10
Here uniqueness is understood modulo change in sign
11
We omit subscript k on Ak and Bk whenever convenient to make our notations more concise.
22
write tr AA' ei ' AA' ei and tr AB ' 21 ei ' AB ' 21ei , where ei is an

i 1 i 1
arbitrary basis in L2 . Then use the fact that A and B ' 21 are finite-dimensional vectors of functions
from L2 , and apply Parcevals equality.
We will first minimize the transformed objective function with respect to A , taking B as given. A
necessary condition for the optimal A to exist is that the Frchet derivative of the objective function with
respect to A is equal to zero (see, for example, proposition 2 in 7.2 and theorem 1 in 7.4 of Luenberger
(1969)). That is, 212 B 2 A 0 and we have A 12 B in accordance with statement i) of the
theorem.
Substituting A 12 B into the objective function, we get
E f t 1 AB' f t
2
tr 11 tr B ' 2112 B tr 11 tr B ' 11
1/ 2

11
1/ 2
B .
We can, therefore, reformulate problem (5) as
tr B' 111 / 2 111 / 2 B max , subject to a constraint
B ' 11 11 B I k and a requirement that B' 11

1/ 2 1/ 2 1/ 2
111 / 2 B is a diagonal matrix with non-increasing
elements along the diagonal.
Assumption 2a implies that there exists a unique solution X to the related problem:
(B1) tr X ' X max
subject to X ' X I k and a requirement that X ' X is a diagonal matrix with non-increasing elements
along the diagonal (see the proof of the spectral theorem III.5.1 in Gohberg and Gohberg (1981)). The
maximum is equal to the sum of the k largest eigenvalues of , and the solution X consists of the
corresponding normalized eigenvectors. By lemma 1, B 111 / 2 X is well defined and consists of the
~ is another
first k eigenvectors of 2112 11 . It is obviously a unique solution to (5), for if B
solution, then ~
~ because there are no zero eigenvalues of
111 / 2 B B 0 , which implies B B .
Statement ii) of the theorem follows from the facts that, by lemma 1, the eigenvalues of and
2112 11 coincide, the maxima in (B1) and (5) are equal, and the maximum in (B1) is equal to the
sum of the k largest eigenvalues of .
To prove iii) note that since Ker 11 0 , Im111 / 2 L2 , and therefore:
1/ 2
(B2) Ak Bk' sup Ak Bk' z sup A B

k
'
k
1/ 2
11 x
z 1 1/ 2
11 x 1
Let x i be the i -th normalized eigenvector of . Note that xi forms an orthonormal basis in L2
and
(B3) Ak Bk' 111 / 2 111 / 2 k ,
23
where
B B x , x
1/ 2 ' 1/ 2 k
k 11 k k 11 i1 i i
and the latter equality follows from lemma 1. Substituting (B3)
into (B2), we have: Ak Bk sup

'
111 / 2 I k x .
1/ 2
x 1 11
Suppose that Ak B does not converge to zero. Then there exists a sequence z k such that
'
k
1/ 2
11
z k is bounded and 111 / 2 I k z k does not converge to zero. Without loss of generality, we
can assume that
(B4) 111 / 2 I k z k 0
for any k.
Note that since, by assumption, is a compact operator and
1/ 2
11 z k is a bounded sequence,
sequence 1/ 2
11
z k must have a converging subsequence. Without loss of generality, let us assume that
(B5) 111 / 2 z k z
for some z L2 .
Since x i are the eigenvectors of 11
1/ 2
' 111 / 2 , the compact operator 111 / 2 has
representation
x , y
1/ 2 1/ 2
1 i1 i i i
, where y i is an orthonormal basis in L2 . Denoting xi , z k as
ik and y i , z as i , we therefore can rewrite (B4) and (B5) as:

(B6) ik
ik 1i / 2 y i
and
ik 1i / 2 y i i 1 i y i

(B7) i 1
ik 1i / 2 y i i 1 i y i / 2 for any k K 1 . Since

Let now K 1 be so large that i 1
yi is an orthonormal basis in L2 ,
ik 1i / 2 y i i k i y i ik 1i / 2 y i i 1 i y i

ik i 1
and hence
(B8) i k
ik 1i / 2 y i i k i y i / 2
for any k K 1 .
Let K 2 be so large that
(B9) i k
i yi / 2
for any k K 2 .
Combining (B8) and (B9), we have
24
ik
ik 1i / 2 y i ik
ik 1i / 2 y i i k i y i i k i y i
ik
ik 1i / 2 y i i k i y i ik
i yi
for any k max K 1 , K 2 . But this contradicts (B6). Hence our assumption that Ak Bk' does
not converge to zero is wrong and statement iii) of the theorem is established.
Appendix C
Proof of Theorem 3: We, first, prove an extension of lemma 1 in Leurgans et al (1993). Let us define
, (3n )
11 11 , (2n )
(1n ) 12 12
12 2112 , and n max (in ) . We
21
i 1, 2 , 3

have:
Lemma 2 If assumption 1 holds and f t has bounded support, n log n / n

1/ 2
almost
surely.
Proof: Corollary 4.1 and theorem 4.8 of Bosq (2000) imply that for i 1 and i 2 ,

(in ) log n / n
1/ 2
a.s. Now,
21
12 21 12
21
12 21
12 21
12 21 12
21 21
12 12 log n / n 1 / 2
12 21
almost surely, which completes the proof.
Consider Rayleigh functionals:
b' 21 12 b b' 2112 b b' 21 12 b

(b) , (b) , and (b)
b' 11 b b' 11 I b b' 11 I b
for operator pencils 2112 11 ,
2112 11 I , and 21 12 11 I
respectively. According to the max-min principle (see Eschw and Langer (2004)), the eigenvalues of the
above operator pencils satisfy relationships:
j max min (b), j max min (b), and j max min (b).
dim M j bM dim M j bM dim M j bM
To prove convergence j j , we will explore relationship between these Rayleigh functionals.
Proposition 1. Suppose that 0 , n / log n

1/ 2
as n . Then

sup (b) (b) (1 1 ) 1 n o 1 n 0 almost surely as n .
2
bL
The proof is based on lemma 2 and is essentially the same as that of Proposition 3 in Leurgans et al (1993)
and we omit it here.
25
Using Proposition 1, it is easy to prove part i) of our theorem. Note that since (b) (b) for
any b L2 , we have:
j max min (b) j max min (b) min (b) j /(1 j 1 ) .

dim M j bM dim M j bM bsp b1 ,...,b j
Therefore,

sup j j sup j j /(1 j 1 ) 1 1 1 /(1 kn1 ) 0 .
jk n jk n
Furthermore, Proposition 1 implies that
sup j j sup max min (b) max min (b) sup (b) (b) 0 a.s.
j kn dim M j bM
j kn dim M j bM 2
bL
Thus,
sup j j sup j j sup j j 0 a.s.

j kn j kn j kn
which proves part i) of the theorem.

Let us now turn to part ii) of the theorem. Denote bj b j ' 11 bj b j as d j and bj ' bj
as m j . Below, we are going to find an upper bound on sup
d j m j and show that this bound tends to
jk n
zero almost surely.

on b1 ,..., b j in metric ,
Consider the least squares regression of bj 11
bj i 1 ji bi s j ,
j
where, for any i j , ji bj ' 11 bi , and the residuals s j have the following properties. First,
(C1) s j ' 11 bi i1 s j ' 21 12 bi 0 .
Using (C1) and the normalization bj '

I b , we also get:
11 i ji
(C2)
s j ' 11 s j bj ' 11
11 bj m j j21 ... jj2 1 .
In addition, using (C1), it is straightforward to check that:
(C3) s j ' 11 s j d j j21 ... j2, j 1 (1 jj ) 2 ,

and
(C4)
s j ' 2112 s j bj ' 2112

21 12 bj j 1 j1 ... j jj
2 2
Finally, we have:
(C5) s j ' 21 12 s j j 1 s j ' 11 s j .
It is because, as follows from lemma 1, j 1 max (b).

b 11 sp b1 ,..., b j
26
Subtracting (C2) from (C3), rearranging, and using the fact that bj ' 11
b 1 , we
11 j n
obtain:
(C6) d j m j 21 jj 1 n .
Expanding (C5) using (C2) and (C4), and rearranging, we get:
1 jj2 j j 1
1
b ' (11 11 )bj bj ' (2112 21 12 )bj
j 1 j
j j i 1 (i j 1 ) ji2 j 1m j .
j 1

Recalling that bj ' 11 11 j n
b 1 , b '
j 21 12

21 12 bj
1
n , that m j 0 ,
and that, from the proof of statement i),
j j j j 1 o( j 1 ) (1 1 ) 1 n o( 1 n ) , we have, for large enough n:

(C7) 1 jj2 j j 1
1
(3 1 j 1 ) 1 n 2 j j 1 i 1 (i j 1 ) ji2 .
j 1

Note that, as was mentioned in remark 2 to the theorem, statement ii) holds for a particular choice in
the sign of the eigenvectors bj and b j . More precisely, the regression coefficient jj must be
positive. This fact implies that 1 jj 1 jj . Combining this inequality with (C6) and (C7), and using
2
the fact that 0 j 1 , we get for large enough n:

(C8) d j m j 2 j j 1
1
3(1 ) 1 n 2 j j 1 1 i 1 ji2
1 j 1

Now, we will analyze the behavior of the least squares regression coefficients ji , i j as n .
First, note that the normalization bj '
I b implies:
11 i ji
(C9) bj ' 11 bj bj ' 11

11 b j bj '
b 1 1
11 j n
and
b j ' 11 bi b '
2
j 11
11 bi bj ' bi 2
(C10)
2bj ' 11 11 bj bi ' 11 11 bi 2bj ' bj bi ' bi
o( 1 n ) 2mi
Using (C9) and (C10), we have:

ji2 bj ' 11 bi b ' b b b ' b
2
j 11 i i j 11 i
2
2b ' b b 2b ' b
2 2
j 11 i i j 11 i
(C11)
2b ' b b b ' b b 2b ' b
2
j 11 j i i 11 i i j 11 i
4 d i mi
27
for large enough n . Substituting (C11) into (C8), rearranging, and using the fact that, for j kn ,
j 1 kn1 , we obtain for large enough n :
(C12) d j m j 81 j j 1
1
(1 1
1 ) 1 n kn1 i 1 d i mi
j 1
.
It is straightforward to check, that if a sequence of real numbers x j satisfies recursive inequality

x j g j f i 1 xi
j 1
, then x j g j 1 g f . Applying this observation to (C12), we get:
j 1
i 1 i
(C13)
sup d j m j g k n i n1 1 g i (1 11 ) 1 n kn1 ,
k 1

jkn
1
where g i 81 (i i 1 ) . The right hand side of (C13) tends to zero almost surely as n by
lemma 2 and the assumptions of the theorem. This completes our proof of statement ii).
Appendix D
Proof of theorem 4: First, note that
B
f t A ' f a a a a ,
t 1 2 3 4
where a1 AB' f t , a 2 12 B B
B' f , a
t 3 12
B
12
B ' f , and
t
B
a4 12
BB
' f .
t
Let

4
t i 1 i
B ' f | A , B 4 Pr a | A , B . Below we will
. We have: Pr f t A
show that each of the terms in the latter expression converges to zero almost surely.
Since f t , statement iii) of theorem 2 implies that a1 0 and Pr ai | A , B 0 a.s.

Further, using (C13), we have:

Pr a 2 | A , B 2 tr 12 B B B' 11 B B B ' 21 2 tr B B ' 2112 B B

2 tr B B ' 111 / 2 111 / 2 B B 1 2 tr B B ' 11 B B

1 2 k n g k n i n1 1 g i (1 11 ) 1 n kn1 0
k 1
a.s.
For a 3 and a 4 we have:

Pr a3 | A , B 2 tr 12
12 B B' 11 BB ' 21
12 B B ' 21
21 2 tr 12 21
21

2 tr B ' 21
12 B 2 tr B ' B
21 12 21

12 12
2
2 k n 0 n
a.s.

28

Pr a 4 | A
, B 2 tr
12
B B B ' B B B '
11 21
2 tr B B ' B B B '
11

21 12 B
2 tr B B ' B B B '
11 21 12
B tr B B ' B B
2
11

2 k n g k n i 1 1 g i (1 11 ) 1 n kn1 0
k n 1
a.s.
This completes the proof of theorem 4.
References:
A. Ang and M. Piazzesi (2003) A no-arbitrage vector autoregression of term structure dynamics with
macroeconomic and latent variables Journal of Monetary Economics, 50, 745-787
T. W. Anderson (1984) An Introduction to Multivariate Statistical Analysis, 2nd edition, John Wiley
and Sons
D. Bosq (2000) Linear Processes in Function Spaces: Theory and Applications, Springer-Verlag
G.E.P. Box and G.C. Tiao (1977) A canonical analysis of multiple time series Biometrika, 64, 355-
365
J. H. Cochrane and M. Piazzesi (2002) Bond Risk Premia NBER Working paper 9178
F. X. Diebold and C. Li (2002) Forecasting the Term Structure of Government Bond Yields
Working Paper (available at http://www.ssc.upenn.edu/~diebold )
G. R. Duffee (2002) Term Premia and Interest Rate Forecasts in Affine Models Journal of Finance,
57, 405-443
D. Duffie and R. Kan (1996) A Yield-Factor model of Interest Rates Mathematical Finance, 6, 379-
406.
P.H. Dybvig (1997) Bond and Bond Option Pricing Based on the Current Term Structure in
Mathematics of Derivative Securities ed. by M.A.H. Dempster and S.R.Pliska, Cambridge University
Press, 271-293
D. Eschwe and M. Langer (2004) Variational principles for eigenvalues of self-adjoint operator
functions Integral Equations and Operator Theory, 49, 287-321
29
J.J. Fortier (1966) Simultaneous linear prediction Psychometrika, 31, 369-381.
I. Gohberg and S. Gohberg (1981) Basic Operator Theory, Birkhauser, Boston, Basel, Berlin.
H. Hotelling (1936) Relations between two sets of variates Biometrika, 28, 321-377
S. E. Leurgans, R. A. Moyeed, and B. W. Silverman (1993) Canonical Correlation Analysis when

Data are Curves Journal of Royal Statistical Society B, 55, 725-740
D. G. Luenberger (1969) Optimization by Vector Space Methods, John Wiley & Sons, Inc. New
York, Chichester, Weiheim, Brisbane, Singapore, Toronto.
M. Piazzesi (2003) Bond Yields and the Federal Reserve Working paper
J.O. Ramsay and B.W. Silverman (1997), Functional data analysis, Springer, New York.
G. Reinsel (1983) Some results on multivariate autoregressive index models Biometrika, 70, 145-
156
O. A. Vasicek (1977) An Equilibrium Characterization of the Term Structure Journal of Financial

Economics, 5, 177-188
30

Euro Curve Forecasting

Încărcat de

Informații document

Descriere originală:

Drepturi de autor

Formate disponibile

Partajați acest document

Partajați sau inserați document

Opțiuni de partajare

Vi se pare util acest document?

Este necorespunzător acest conținut?

Drepturi de autor:

Formate disponibile

Euro Curve Forecasting

Încărcat de

Drepturi de autor:

Formate disponibile

Curve Forecasting by Functional Autoregression

V. Kargin1 (Cornerstone Research)

Here for each t , f t is an element of a Hilbert space H , is a linear bounded operator

on H , t is a strong H-white noise3, and h is the lag length. When H is a finite-dimensional

predictive power than other factors.

2. The Estimation Problem

Assumption 1 There exists an integer j 1 such that 1.

Here L2 denotes the operator norm induced by the L2 norm.

To forecast f t h we need to estimate . Let 11 be the covariance operator of random

Consequently, the empirical covariance operator 11 is singular and cannot be inverted.

of k n eigenvectors of 11 associated with the largest eigenvalues, as H k n , and let k n be the

covariance and cross-covariance operators restricted to H k n . Then define

eigenvalues of the covariance operator 11 ordered in the decreasing order.

unconditional moment, and that is Hilbert-Schmidt. Then, if for some 1

dz i i ( i z i )dt i dwi , i 1,..., k

1 e i x . It is straightforward to check that such a reduction is equal to

example, 11 acts as follows

1 e i x , i 1,..., k and the eigenvalues are equal to Var xi 1 e x

explicit formula for the eigenvalues is:

where Ak : R L and Bk : L R . In section 2 we argued that the principal components

subject to normalizing constraints Bk 11 Bk I k and Ak Ak is diagonal with non-increasing

predictive factors and Ak the corresponding predictive factor loadings.

transforming g into f ' g as f ' .

Theorem 2 Under assumptions 1 and 2a we have:

up to a simultaneous change in sign of Bk and Ak . Vector Bk consists of the eigenvectors of

21 12 11 arranged in the order of declining eigenvalues. Vector Ak is equal to 12 Bk .

ii) The i th eigenvalue of 21 12 11 is equal to the reduction in the mean squared

The cross-covariance operator 12 acts as follows:

2112 11 , 2112 11 I and

b j ' 11 bi ji , bj ' 11 I bi ji , bj '

Appendix C proves the following

If n / log n 1 / 2 , 0 , and k n does not increase too fast so that kn1 0 as n ,

i) sup j j 0 almost surely as n .

remains true if n / log n 1 / 2 is replaced by log n n / log n 1 / 4 for some 1 / 4 .

in the sign of the eigenvectors bj and b j .

If n / log n 1 / 2 and 0 as n , then for any integer k 1

value than can be bounded as follows:

loadings, aj . Indeed, we have:

make a payment of at least $1,000,000 Pt (T ) / Pt (T ) . By arbitrage considerations we see

Figure 1. Forward Curve Evolution

5. One year ahead prediction of the term structure

Figure 2. The evolution of matrix entries of the estimate of operator .

corresponding factor loadings ( b1 and a1 respectively, in terminology of section 5) for 0 ,

First predictive factor weights First predictive factor loadings

Predictive factor weights Predictive factor loadings

variable for every g from H .9

covariance operator of f is the bounded linear operator on H , defined by

Definition 3. Let f 1 and f 2 be two H-valued random variables, such that

and Ef 1 Ef 2 0 . Then the cross-covariance operators of f 1 and f 2

are bounded linear operators on H defined by

Definition 4. A sequence n , n Z of H-valued random variables is said to be H-white noise if

2) n , n Z is a sequence of i.i.d. H-valued random variables.

Example: Stochastic Processes

mean. The covariance operator of f i is the integral operator with kernel ii ( x, u ) :

Lemma 1 Under assumptions 1 and 2a, i is an eigenvalue of if and only if it is an eigenvalue of

21 12 11 . The corresponding eigenvectors of and 2112 11 , x i and bi

respectively, normalized so that xi 1 and 11 bi 1 are unique10 and such that xi 11 bi .

Proof: Suppose that i is an eigenvalue of . Assumption 2a guarantees that the corresponding

12 11 , it is straightforward to check that i1 ' 11

associated with eigenvalue i . Now let i be an eigenvalue of 21 12 11 and bi a

corresponding normalized eigenvector. We have : 11

implies that Ker 11

eigenvalue of and xi 11 bi is the corresponding normalized eigenvector. Since x i is unique and