model accuracy. In the model confidence set approach, the number of models trimmed is not fixed exogenously by the econometrician, but is determined by a statistical test comparing model accuracies. In our application to the forecasting of macroeconomic indicators in the US, we employ the often-used approach of averaging the forecasts of many bivariate models,2 and find substantial improvements in forecast combination accuracy after trimming the set of potential models to be combined using both the fixed and MCS schemes, but the gains from using the MCS approach are larger and more robust.

The idea of trimming the set of potential models prior to forecast combination is not novel. Makridakis and Winkler (1983) studied the effects of adding forecasts to a simple combination, and found that the marginal benefit of doing so decreases very rapidly once a relatively small number of forecasts has been included. In the same spirit, Timmermann (2006) argued that the benefit of adding forecasts should be weighed against the cost of the increased parameter estimation error they introduce. He considered three straightforward trimming rules: combining only the top 75%, 50% or 25% of models, based on the models' out-of-sample MSPEs.3 He found aggressive trimming to yield better results; in other words, including fewer models in the combination led to better forecasts. In stock return forecasting, Favero and Aiolfi (2005) also found that aggressive trimming rules based on models' R2 values improved forecasts. In their application, trimming 80% of the forecasts led to the best results. When combining forecasts from various models for inflation in Norway, Bjørnland et al. (2011) argued that a strategy that combines only the 5% best models leads to the best forecast combination.

We find that significant gains for the fixed trimming method are restricted to strategies that aggressively trim 80%–95% of the models. The MCS trimming rule, on the other hand, yields significant accuracy improvements for a wide range of the parameters that govern the confidence level with which the set of best models is identified. Monte Carlo evidence supports the intuition that forecast accuracy gains from trimming models based on their historical out-of-sample performances arise mainly in environments in which some of the models have very little predictive ability relative to others.

The outline of the paper is as follows: Section 2 lays out the trimming schemes, while Section 3 details the results of the Monte Carlo exercise. Section 4 describes our empirical application to the forecasting of US macroeconomic variables. Finally, Section 5 concludes.

2. Trimming rules

Our starting point is a situation in which the forecaster has a toolbox of different models with which to predict a variable of interest y. Each model i implies a forecast ŷi. These models might include naive autoregressions, Bayesian vector autoregressions, factor models, and DSGE models, among others.

We first provide an introduction to the MCS, then detail how we use it as a trimming device to parse models and form conditional forecast combinations. We then contrast the results obtained using the MCS with those obtained using a rule that simply ranks the models according to their past out-of-sample forecasting performances and trims a fixed share of the worst performing models.

2.1. Exogenous fixed trimming

In the fixed-rule trimming scheme, the number of forecasting models to be discarded is fixed exogenously. The analysis below refers to this approach as fixed trimming. We construct the conditional forecast combination by ranking the models according to their past MSPEs, discarding a fixed proportion of models, and using the remaining ones to form the set of best forecasts. It is important to note that while the number of models to be discarded (and hence the number to be combined) is fixed exogenously, there is nothing constraining the procedure to discard the same models in each forecast period. Different models will be trimmed and used according to their respective MSPE ranks in the periods preceding the forecasting period. More formally, let Fτ be a set of i = 1, . . . , n candidate models for forecasting in period τ. We estimate each model i using R periods of data. Fixed trimming requires a training sample of S periods of forecasts from each of the candidate models. Thus, the first period for which we can apply fixed trimming is R + S + 1. Individual models are estimated using data from periods t = τ − R, . . . , τ − 1, and a rolling sample of S previous forecasts is used to compare model performances. The particular rule that we employ discards a fixed proportion of the models in Fτ, such that

Fτ∗ = {i ∈ Fτ : MSEi,τ ≤ Pτ(x)}, (1)

where Pτ(x) is the xth percentile of MSEi,τ. With this trimming rule, the forecaster has to decide on the proportion of models to be trimmed. We perform a systematic analysis to show how the MSPE of the final combination would change for a wide range of different percentiles.

2.2. The model confidence set approach to trimming

An important drawback of the simple trimming rule discussed above is that it does not take into account the statistical significance of differences in the historical performances of the forecasting models. In principle, one might easily conjecture a situation in which the best and worst forecasts have mean squared prediction errors that are not statistically different from each other. We use the model confidence set method of Hansen et al. (2011) to identify the set of best models, then trim the models that are excluded from the MCS prior to forecast combination. From a frequentist perspective, the model confidence set approach is a tool for summarizing the relative performances of an entire set of models by determining which models can be considered statistically superior, and at what level of significance.

2 See for example Faust et al. (2013), Stock and Watson (2004), and Wright (2009).
3 Timmermann (2006) used a recursive weighting scheme based on the MSE. We use a rolling window.
50 J.D. Samuels, R.M. Sekkel / International Journal of Forecasting 33 (2017) 48–60
We keep the notation of Hansen et al. (2011), and refer the reader to the original paper for a more detailed exposition. The MCS aims to identify the set Mτ∗, such that:

Mτ∗ = {i ∈ Mτ0 : ui,j,τ ≤ 0 for all j ∈ Mτ0},

where Mτ0 is the initial set of candidate models. As with fixed trimming, the candidate models are estimated using R observations, and the MCS method requires a training sample of S periods of forecasts from each of the candidate models. These S periods of forecasts are used to define ui,j,τ = E(dij,τ), where dij,t = Li,t − Lj,t is the loss differential between models i and j, defined over the training period. We define the loss in period t as the squared error. That is, given the set of all forecasting models Mτ0 in the initial comparison set, the MCS searches for the set of models at time τ that cannot be rejected as statistically inferior at a chosen level of confidence.

The MCS is based on the following algorithm. Starting from the set of all models Mτ0, repeatedly test the null hypothesis of equal predictive accuracy, H0,M : ui,j = 0 ∀i, j, at significance level α. If the null is rejected, the procedure eliminates a model from Mτ, and this is repeated until the null of no difference between models cannot be rejected at the chosen level of significance. The set M̂τ∗,1−α with the remaining models is denoted as the MCS, Mτ∗.4

We test H0, sequentially until the null is not rejected, by constructing t-statistics based on d̄ij,τ ≡ S⁻¹ Σ_{t=1}^S dij,t, the average loss of model i relative to model j. The pertinent test statistics are

tij,τ = d̄ij,τ / √var̂(d̄ij,τ)

and

TR,Mτ = max_{i,j∈Mτ} |tij,τ|.

The TR,Mτ statistic ensures that the rejection or otherwise of the null of no difference in model performances depends only on the model with the greatest relative loss. The implementation of this statistic is particularly convenient, because the decision rule as to which model to eliminate is given by eR,M = arg max_{i∈M} sup_{j∈M} tij, that is, the model with the largest t-statistic.

The asymptotic distribution of this test statistic is non-standard, as it depends on the cross-sectional correlation of the tij,τ. The MCS procedure addresses these issues by using bootstrap methods to estimate the distribution of the test statistic. We ensure that the estimates of the distribution reflect the persistence in the dij,τ by employing the block bootstrap proposed by Hansen et al. (2011).5 Our choice of the block size depends on the forecast horizon.6

When constructing the model confidence set M̂τ∗, we choose our baseline confidence level α by performing a systematic analysis, keeping models with associated p-values that are greater than or equal to α. We set our baseline α using the results from our Monte Carlo exercise, and conduct a sensitivity analysis to this choice in Section 4.6.

3. Monte Carlo evidence

We shed more light on the benefits of our trimming approach by conducting a Monte Carlo study, in line with Alvarez et al. (2012) and Inoue and Kilian (2008). We posit that there are N = 50 predictors for yt+1. The simulations are based on 500 replications, with T = 150.7

The data generating process (DGP) for the predictors {xit}_{i=1}^N follows a factor structure, and is given by

xt = Ft Λ + ϵt,

where Λ is a column vector of N ones, and Ft is generated from a standard normal distribution.

The idiosyncratic component is assumed to be a zero mean shock, with variance–covariance matrix Σϵ. We introduce correlation into the idiosyncratic components of the N different predictors by assuming that Σϵ is given by the Toeplitz matrix

Σϵ = [ 1      ρ      ρ²     · · ·  ρ^(N−1)
       ρ      1      ρ      · · ·  ρ^(N−2)
       ρ²     ρ      1      · · ·  ρ^(N−3)
       ⋮      ⋮      ⋮      ⋱      ⋮
       ρ^(N−1) ρ^(N−2) ρ^(N−3) · · · 1 ],   (2)

that is, the (i, j) entry of Σϵ is ρ^|i−j|.

We report results for ρ = 0, 0.1, 0.5, and 0.9. That is, we consider a range of simulations, from the case where the idiosyncratic terms of the first two predictors are completely uncorrelated, up to the case where this correlation is 0.9.

Finally, the DGP for yt+1 is given by:

yt+1 = β′xt + εt+1, (3)

where εt+1 ∼ NID(0, 1). Following Inoue and Kilian (2008), we propose five different scenarios for the slope parameter vector:

Design B1. β = c1[1, 1, 1, . . . , 1]′
Design B2. β = c2[50, 49, 48, . . . , 1]′
Design B3. β = c3[1, 1/2, 1/3, . . . , 1/50]′
Design B4. β = c4[1_{1×10}, 0_{1×40}]′
Design B5. β = c5[e⁻¹, e⁻², e⁻³, . . . , e⁻⁵⁰]′.

In design B1, all variables are equally important predictors of yt+1. In such an environment, one would not expect to find any gains from trimming the set of potential predictors based on their past forecasting performances, because all predictors should have equal predictive power,

4 If the null is not rejected in the first round, Mτ∗ = Mτ0.
5 Given the computational intensity of the MCS method, increasing the number of models to thousands or millions would make the approach very costly computationally.
6 For the one-quarter-ahead forecasts, we use a block size of two quarters. For the two- and four-quarter-ahead forecasts, we use block sizes of three and six quarters, respectively.
7 Within each Monte Carlo draw, each forecasting period t requires a bootstrap in order to determine the model confidence set.
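The sequential elimination at the heart of the MCS (Section 2.2) can be sketched as below. This is a deliberately stylized stand-in: a fixed critical value `crit` replaces the block-bootstrap distribution of the max-|t| statistic used by Hansen et al. (2011), so it illustrates the loop, not the full procedure.

```python
import numpy as np

def mcs_sketch(losses, crit=2.0):
    """Stylized MCS elimination loop.

    losses: (S, n) array of per-period losses (e.g. squared forecast
            errors) for n models over a training sample of S forecasts.
    crit:   critical value standing in for the bootstrapped distribution
            of the T_{R,M} = max |t_ij| statistic.
    Returns the indices of the models surviving the elimination loop.
    """
    S, n = losses.shape
    alive = list(range(n))
    while len(alive) > 1:
        # sup_j t_ij for each surviving model i, with
        # t_ij = mean(d_ij) / s.e.(d_ij) and d_ij,t = L_i,t - L_j,t
        sup_t = {}
        for i in alive:
            ts = []
            for j in alive:
                if j == i:
                    continue
                d = losses[:, i] - losses[:, j]
                ts.append(d.mean() / (d.std(ddof=1) / np.sqrt(S)))
            sup_t[i] = max(ts)
        worst = max(alive, key=lambda i: sup_t[i])   # e_{R,M}
        if sup_t[worst] <= crit:   # equal-accuracy null not rejected: stop
            break
        alive.remove(worst)        # eliminate the worst model, re-test
    return alive
```

Because tji = −tij, the largest signed t-statistic coincides with max |tij|, so the stopping rule here matches the TR,M test in the text; the bootstrap's role is precisely to supply the critical value that `crit` fakes.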
on average. In all other designs, the predictive powers of the 50 predictors generated are different. In design B4, a small group of variables (ten) have equal importance, while the majority of predictors have no importance at all (zero loadings). Designs B2, B3 and B5 incorporate smooth decays in the relative importance of each xi. This decay is slow in design B2, but fast (exponential) in design B5, meaning that a few variables will have relatively high predictive power for yt, and the remainder will have basically zero forecasting power. One would expect the gains from trimming the set of predictors to be particularly large in situations like those proxied by design B5. As per Inoue and Kilian (2008), the scaling constants c1, . . . , c5 are chosen such that the R2 values of the forecasting models are the same across all designs. We show the results for R2 values of 25% and 50%.

We compare the performances of untrimmed forecast combination and the fixed and MCS trimmed forecast combinations. To begin, we construct an out-of-sample forecast for yt+1 based on each of the 50 predictors, using a univariate model for each predictor xi:

yt+1 = γi xi,t + εi,t. (4)

Finally, all of these designs exhibit an increase in accuracy gains as the importance of the predictors relative to the errors (R2) increases.

The Monte Carlo results indicate that greater gains from trimming are expected when many of the predictors have weak forecasting power, which is a common situation in macroeconomic forecasting. Hence, assigning weak forecasts a weight of zero reduces the impact of parameter estimation uncertainty, and leads to a better bias–variance trade-off. The bias from under-fitting is more than compensated for by the fall in the estimation uncertainty.

4. Empirical application

We test the benefits of trimming the number of models prior to forecast combination by applying the methods discussed above in a commonly-used setting, averaging bi-variate models' forecasts using a large panel of US macroeconomic data.8
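One draw from the Monte Carlo DGP of Section 3 (the factor structure with the Toeplitz covariance of Eq. (2) and the target equation (3)) can be simulated as follows; design B5 is shown, and for simplicity the scaling constant c is left free rather than calibrated to a target R2 as in the paper.

```python
import numpy as np

def simulate_draw(T=150, N=50, rho=0.5, c=0.1, seed=0):
    """One Monte Carlo draw: x_t = F_t * Lambda + eps_t, with the
    Toeplitz idiosyncratic covariance of Eq. (2), and
    y_{t+1} = beta' x_t + e_{t+1} (Eq. (3)), design B5 slopes."""
    rng = np.random.default_rng(seed)
    F = rng.standard_normal(T)                       # common factor
    # Toeplitz covariance: Sigma[i, j] = rho^|i - j|
    Sigma = rho ** np.abs(np.subtract.outer(np.arange(N), np.arange(N)))
    eps = rng.multivariate_normal(np.zeros(N), Sigma, size=T)
    x = F[:, None] + eps                             # Lambda = vector of ones
    beta = c * np.exp(-np.arange(1, N + 1))          # design B5: c*e^{-i}
    y_next = x @ beta + rng.standard_normal(T)       # y_{t+1}
    return x, y_next
```

The other designs only change the `beta` line (e.g. `c * np.ones(N)` for B1, or ten ones followed by forty zeros for B4).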
Table 1. Variables and transformations in our large dataset. (Columns: Variable, Transf.)
real-time or revised data should not affect the relative forecast accuracy when comparing different approaches.

4.2. Models

The individual forecasts to be combined are based on linear autoregressive models that contain one additional predictor per model. Let t date the predictors, yt be the annualized growth rate of the variable to be forecasted from t − 1 to t, and xt be the additional predictor. We define ȳt+h as the h-quarter-ahead average growth rate to be forecasted, ȳt+h = Σ_{i=1}^h yt+i/h. We estimate the models for h = 1, 2, and 4. For each individual series {xi,t}_{i=1}^107 in our macroeconomic and financial panel, we estimate the following model for each variable of interest:

ȳt+h = αi + Σ_{j=0}^P βi yt−j + γi xi,t + εi,t+h. (5)
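A rolling-window OLS version of Eq. (5) might look like the sketch below. The lag order P is taken as given here (the paper selects it by BIC on a univariate AR of yt), and all names are illustrative:

```python
import numpy as np

def forecast_bivariate(y, x, h=4, P=2, R=40):
    """Forecast ybar_{T+h}, the h-quarter-ahead average growth rate,
    via Eq. (5): regress ybar_{t+h} on a constant, y_t, ..., y_{t-P}
    and x_t over the most recent R usable periods, then project."""
    y, x = np.asarray(y, float), np.asarray(x, float)
    T = len(y)
    rows, targets = [], []
    for t in range(P, T - h):
        rows.append(np.r_[1.0, y[t - P:t + 1][::-1], x[t]])  # regressors at t
        targets.append(y[t + 1:t + 1 + h].mean())            # ybar_{t+h}
    X = np.array(rows[-R:])                                  # rolling window
    z = np.array(targets[-R:])
    coef, *_ = np.linalg.lstsq(X, z, rcond=None)
    latest = np.r_[1.0, y[T - P - 1:][::-1], x[-1]]          # regressors at T
    return float(latest @ coef)
```

Repeating this for each of the 107 predictors xi yields the panel of individual forecasts that the combination schemes below operate on.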
Table 2. Monte Carlo fixed trimming. (Results reported for ρ = 0.90, 0.50, 0.10 and 0, each with R2 = 25 and R2 = 50.)
These models are estimated using rolling samples.10 P is based on the Bayesian information criterion (BIC) of the univariate AR of yt on its lags, and is calculated for each rolling sample, meaning that P can change for each t's forecast, but is held fixed across each predictor-based model i, and each horizon h.

We estimate the parameters of the models using an in-sample rolling sample size of R = 40 quarters. Given the start date of our data (1959Q3), the first one-step-ahead out-of-sample forecast from the individual models will be for 1969Q3. We then use a training sample of S = 20 out-of-sample forecasts to evaluate which models should be trimmed before combination. Hence, our first one-step-ahead forecast combination will be for 1974Q3.

4.3. Forecast combination methods

Let ȳ̂i,t+h denote model i's pseudo out-of-sample forecast, estimated when predictor i becomes available at time t. The combined forecast is constructed as:

ȳ̂t+h = Σ_{i=1}^m wi,t ȳ̂i,t+h, (6)

where ȳ̂t+h is the final combination forecast, and wi,t is the weight assigned to each individual forecast ȳ̂i,t+h at period t.

4.3.1. Equal weights

The simplest, and often the most effective, forecast combination method is the simple mean of the panel of forecasts. With this approach, each model receives the same weight, wi,t = 1/m. (7)

Table 3. Monte Carlo MCS trimming. (Results reported for ρ = 0.90, 0.50, 0.10 and 0, each with R2 = 25 and R2 = 50.)

4.3.2. MSE weights

With S the number of periods in the rolling training sample, the weight for model i used in forecasting period t is

wi,t = MSE⁻¹_{i,(t−1−S,t−1)} / Σ_{i=1}^M MSE⁻¹_{i,(t−1−S,t−1)}. (8)

Thus, the weights will be bounded by zero and one, and sum to one. This approach is an intermediate case between equal weighting and "optimal weighting" because it relies on data for the weights, but limits the parameter estimation by ignoring the covariances of the forecasts. This approach was considered by Bates and Granger (1969).

4.3.3. Mallows model averaging

Our third combination method is Mallows model averaging (MMA), proposed by Hansen (2007, 2008). This combination method is based on the model selection criterion of Mallows (1973). The basic idea of MMA is to obtain the combination weights that minimize the MSE over the set of possible forecast combinations. This method selects the weights W by minimizing the Mallows criterion

Cn(W) = Σ_t (ȳt+h − Σ_{i=1}^m wi ȳ̂i,t+h)² + 2σ̂² Σ_{i=1}^m wi km, (9)

where km is a vector with the number of parameters in each model. In our case, km is the same for every model in the combination, because each model uses the same number of own lags and a single additional predictor.

4.3.4. Bayesian model averaging

The last combination approach that we consider is Bayesian model averaging (BMA), and our implementation follows Faust et al. (2013) closely. We start with n possible models, Mi. The ith model is given by

ȳt+h = αi + Σ_{j=0}^P βi yt−j + γi xi,t + εi,t+h, (10)

where ȳt+h is the variable that we are forecasting at horizon h, xit is the predictor that is specific to model i, and εi,t+h ∼ i.i.d. N(0, σi). All models have the same number of lags P. The model-specific predictor xit is assumed to be orthogonal to the common predictors (a constant and the lags of yt).

Given a prior probability P(Mi) that the ith model is true, and the data D, the posterior probability that the ith model is the true model can be updated according to

P(Mi|D) = P(D|Mi)P(Mi) / Σ_{j=1}^n P(D|Mj)P(Mj), (11)

where P(D|Mi) is the marginal likelihood of the ith model. We assume that all models are equally likely, meaning P(Mi) = 1/n. The parameter priors for αi, βi and σi are uninformative, and proportional to (1/σi) for all i. The prior for γi conditional on σi is the Zellner (1986) g-prior, N(0, φσi²(Xi′Xi)⁻¹), where the hyperparameter φ governs the strength of the prior.

The Bayesian h-period-ahead forecast for each model is

ȳ̃i,t+h = α̂ + Σ_{j=0}^P β̂ yt−j + γ̃i xi,t, (12)

where γ̃i = (φ/(1+φ))γ̂i is the posterior mean of γi, and α̂ and β̂ are the OLS estimators of α and β, respectively.
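For the two simpler schemes above, the weights of Eq. (8) and the combination of Eq. (6) can be sketched directly (the rolling-window bookkeeping is omitted, and the numbers are purely illustrative):

```python
import numpy as np

def inverse_mse_weights(errors):
    """Eq. (8): w_i proportional to 1/MSE_i, computed from the past
    forecast errors of each model over the training window.
    errors: (S, m) array, one column of past errors per model."""
    inv = 1.0 / np.mean(np.asarray(errors, float) ** 2, axis=0)
    return inv / inv.sum()      # weights lie in (0, 1) and sum to one

def combine(forecasts, weights):
    """Eq. (6): weighted combination of the individual forecasts.
    (Equal weights, Section 4.3.1, is the special case w_i = 1/m.)"""
    return float(np.dot(weights, forecasts))

# Three models: model 0 has the smallest past errors, so it receives
# the largest weight in the combination.
errs = np.array([[0.1, 1.0, 0.5],
                 [-0.2, 0.8, -0.4]])
w = inverse_mse_weights(errs)
yhat = combine([2.0, 2.4, 1.8], w)
```

Trimming simply drops the discarded models' columns before the weights are computed, after which the surviving weights are renormalized by construction.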
In this framework, the marginal likelihood of the ith model reduces to

P(D|Mi) ∝ (1/(1+φ))^(1/2) [ (1/(1+φ)) SSR + (φ/(1+φ)) SSRi ]^(−(T−P)/2), (13)

where SSR is the sum of squared residuals from a regression without xi, and SSRi is the sum of squared residuals from model i. The posterior probabilities can then be calculated from Eq. (11), and the final BMA forecast is given by

ȳ̃t+h = Σ_{i=1}^n P(Mi|D) ȳ̃i,t+h. (14)

Hence, the final BMA forecast takes model uncertainty into account by weighting each model by its posterior probability.

As was observed by Faust et al. (2013), we view the forecasting scheme above as a pragmatic approach to the combination of the individual models, and make no claims as to its Bayesian optimality properties. Several of the conditions for strict optimality are not met in typical macro time series studies. First, the regressors are assumed to be strictly exogenous, an assumption that is clearly false in the current application. Second, the errors are assumed to be i.i.d., but the overlapping nature of the h-step-ahead forecasts introduces serial correlation in the forecast errors that are less than h periods apart. Nevertheless, several authors have shown this approach to produce very competitive out-of-sample forecasts in similar applications.11

4.4. Inference

We compare the performances of the forecast combinations, before and after trimming, with the above-mentioned alternative data-rich forecasts by making use of the model confidence set of Hansen et al. (2011) a second time. In this comparison, Mτ0, the initial set of all models, consists of 12 different models: (i) four data-rich forecasts without trimming (EW combination, inverse MSE weights combination, MMA and BMA forecasts); (ii) the same initial set of forecasts after fixed trimming with a baseline cutoff of 90%;12 and finally, (iii) the same initial set of forecasts after MCS trimming with a p-value cutoff of 50%.13 Section 4.6 examines the robustness of our results to different choices of cutoffs.

The MCS indicates the set of best models by attaching p-values to each of these 12 different forecast combinations. The results below give the MSPEs and p-values for the combined models included in Mτ0.

4.5. Main results

We concentrate our analysis on the one-year-ahead forecasts, and provide additional evidence for the one- and two-quarter-ahead horizons in a supplementary appendix (see Appendix A). Table 4 shows the one-year-ahead MSPEs and MCS p-values for the 12 combinations included in Mτ0.

Several results emerge from this exercise. Of the four data-rich forecasts without trimming, we see that either MMA or BMA often has the lowest MSPE for the predicted variables. Wright (2009) also found that BMA forecasts are generally more accurate for US inflation than simple averaging. Schwarzmüller (2015) studied the performances of different pooling methods for Euro area GDP bridge models, and showed that MMA compared favorably to other pooling methods. The MCS-trimmed forecast combinations perform well relative to this set of forecasts. For most of the variables, the trimmed forecasts combined with either EW or MSPE weights are the most accurate.

In addition to showing the results for the full sample, we also display them for an initial sub-sample from 1974Q3 to 1984Q4. This sub-sample precedes the start of the Great Moderation (GM) period. Stock and Watson (2007) and Tulip (2009) argued that the predictable component of macroeconomic series was reduced significantly during the GM, especially in the case of inflation. Hence, there was more information for distinguishing the forecasting performances of the models prior to the GM period. Table 5 shows the results for this pre-GM period. The MSEs of the forecast combinations are considerably higher for this period than for the full sample overall. Nonetheless, the accuracy gains from trimming the worst performing models are significantly higher as well.

Another point worth highlighting is that there are minimal to no gains from using MMA or BMA after trimming. In sharp contrast to the other combination methods (EW and inverse MSE), trimming leads to only very small gains, or even losses, in forecast accuracy with BMA and MMA weighting. We shed further light on this last point by constructing the following two sets of figures. First, Fig. 1 shows the proportion of times in our out-of-sample forecasting exercise that each of the 107 models is selected as belonging to the set of best forecasts using the MCS or the fixed trimming approach for the one-year-ahead forecasts. The models (x-axis) are sorted from lowest to highest selection rates (y-axis). Under both schemes, a subset of models are never included in the set of best forecasts, and this is followed by a larger group with increasing selection rates. Finally, at the other end of the spectrum, there is a small group of models that have significantly higher selection rates. Thus, this evidence points to the existence of some persistence in the out-of-sample forecasting performances of the best models, but also shows a considerable degree of instability in the remaining ones. Next, Fig. 2 shows the average BMA and MMA weights for each of the 107 models over the out-of-sample forecasting period. The results bear a close resemblance to the previous figure. On average, the vast majority of models get essentially zero weight. A very small

11 See for example Faust et al. (2013), and Wright (2008, 2009).
12 Namely, we trim the worst 90% of models.
13 We keep only models that receive p-values of 50% or higher associated with the null hypothesis that the model belongs to the set of best models. The choice of 50% is based on the MC evidence.
Table 4
Final model comparisons: full sample one-year-ahead forecasts.
GDP IP EMP HSTARTS GDP deflator
MSPE p-value MSPE p-value MSPE p-value MSPE p-value MSPE p-value
Equal weights 4.38 0.38** 26.05 0.37** 2.54 0.16* 472 0.24* 1.59 0.46**
MSE weights 4.31 0.49** 25.79 0.37** 2.49 0.18* 465 0.24* 1.57 0.46**
BMA 3.56 0.62** 23.02 0.52** 2.40 0.39** 434 0.24* 1.40 0.46**
MMA 3.56 0.68** 22.23 0.81** 2.25 0.39** 409 0.24* 1.40 0.46**
Fixed trimmed: EW 3.85 0.57** 22.24 0.68** 1.99 0.90** 412 0.24* 1.40 0.46**
Fixed trimmed: MSE 3.66 0.68** 22.11 0.88** 1.98 1.00** 404 0.39** 1.39 0.46**
Fixed trimmed: BMA 3.37 0.68** 21.02 1.00** 2.21 0.39** 406 0.24* 1.40 0.46**
Fixed trimmed: MMA 3.52 0.68** 21.16 0.99** 2.17 0.39** 365 1.00** 1.43 0.46**
MCS trimmed: EW 3.00 0.68** 21.02 1.00** 2.37 0.37** 384 0.57** 1.36 0.73**
MCS trimmed: MSE 2.96 1.00** 21.01 1.00** 2.34 0.39** 382 0.57** 1.35 0.85**
MCS trimmed: BMA 3.55 0.68** 23.04 0.52** 2.47 0.16* 396 0.39** 1.32 1.00**
MCS trimmed: MMA 3.45 0.68** 22.03 0.68** 2.34 0.39** 386 0.39** 1.34 0.85**
Note: This table gives MSPEs and p-values for each forecasting scheme under the null that each scheme has the same relative loss. Fixed trims the worst
90% of models based on their MSPEs, MCS trims models with p-values of less than 50%.
* The model is in the set of best models at the 10% level.
** The model is in the set of best models at the 25% level.
Table 5
Final model comparisons: pre-Great Moderation one-year-ahead forecasts.
GDP IP EMP HSTARTS GDP deflator
MSPE p-value MSPE p-value MSPE p-value MSPE p-value MSPE p-value
Equal weights 9.58 0.26** 57.96 0.40** 5.95 0.03 961 0.31** 4.59 0.04
MSE weights 9.33 0.30** 57.09 0.40** 5.83 0.03 940 0.31** 4.51 0.18*
BMA 6.25 0.47** 45.66 0.41** 5.11 0.40** 757 0.31** 3.82 0.56**
MMA 6.48 0.35** 42.81 0.89** 4.46 0.78** 691 0.65** 3.87 0.56**
Fixed trimmed: EW 7.81 0.33** 46.17 0.41** 4.31 0.78** 749 0.39** 3.80 0.56**
Fixed trimmed: MSE 7.08 0.47** 45.30 0.53** 4.25 1.00** 728 0.65** 3.77 0.56**
Fixed trimmed: BMA 5.74 0.47** 41.34 0.95** 4.56 0.42** 667 0.65** 3.84 0.18*
Fixed trimmed: MMA 6.15 0.35** 40.99 0.95** 4.29 0.92** 619 1.00** 4.04 0.04
MCS trimmed: EW 4.63 0.47** 38.76 0.95** 5.32 0.03 678 0.65** 3.63 0.87**
MCS trimmed: MSE 4.50 1.00** 38.62 1.00** 5.22 0.04 671 0.65** 3.61 0.93**
MCS trimmed: BMA 6.49 0.47** 46.56 0.41** 5.43 0.04 640 0.65** 3.55 1.00**
MCS trimmed: MMA 5.95 0.47** 42.13 0.95** 4.84 0.15* 653 0.65** 3.62 0.93**
Note: This table gives MSPEs and p-values for each forecasting scheme under the null that each scheme has the same relative loss. Fixed trims the worst
90% of models based on their MSPEs, MCS trims models with p-values of less than 50%.
* The model is in the set of best models at the 10% level.
** The model is in the set of best models at the 25% level.
number of models (usually fewer than 10) get most of the weight. The fact that a few models are usually assigned most or all of the weight is a well-known feature of BMA, but less so, to the best of our knowledge, for MMA. Thus, the lack of improvement in the BMA and MMA forecasts is explained by the fact that these pooling methods are essentially already trimming the worst performing models. On the other hand, our evidence also shows that once the pool of models has been trimmed with the MCS, applying weights other than equal weights to the remaining models has very little benefit, if any, for the resulting combined forecast.

4.6. Robustness

In this section, we analyze the sensitivity of our results to the choice of the cutoff. Because the implementation of the MCS and fixed trimming schemes depends on the cutoff choice, we conduct a careful analysis in order to determine how the results vary with different selections of the cutoff. The models are combined using equal weights. We analyze the year-ahead forecasts here, but provide the same figures for the one- and two-quarter-ahead results in a supplementary appendix (see Appendix A).

Fig. 3 provides evidence on the relative performances of the two trimming methods for various cutoff options by showing the ratio of the trimmed forecast combination's MSPE to the non-trimmed forecast combination's MSPE over a wide range of cutoff choices. A ratio that is smaller than one means that the trimmed forecast combination has an MSPE smaller than the non-trimmed one. For fixed trimming, the x-axis represents the proportion of models that are excluded from the combination. We start by keeping all models in the combination, and hence the ratios start at one. We then trim all but the 2% best performing models. For MCS trimming, the x-axis shows the p-values based on which the set of best models is being identified. Low p-values result in fewer models being trimmed, whereas high p-values induce more models to be trimmed. We plot the relative accuracy of the MCS trimmed forecast using p-values that vary from 1% to 98%.

Fig. 1. Full sample selection rate of each model in the fixed and MCS sets of best models for one-year-ahead forecasts. Notes: This figure displays the full sample selection rate of the models in the set of best models selected by the MCS with a p-value of 50% and by fixed trimming with a cutoff of 10%, sorted from the lowest to highest selection rates.

Fig. 3 indicates that very aggressive trimming is required in order for fixed trimming to provide forecasts that improve on those of the simple average combination scheme. For the variables forecast in this paper, a fixed rule that trims around 90% of the models, and therefore combines fewer than 10% of the models, provides the most accurate forecast combination. With this level of trimming, one can achieve sizable MSPE reductions of around 25% over combining all models' predictions. As was discussed above, other papers have also found that aggressive trimming rules tend to be superior, as per Bjørnland et al. (2011) and Favero and Aiolfi (2005).

For MCS trimming, the highest gains from trimming are achieved with p-values of between 30% and 60%. Starting with a p-value of 1%, only the very strongly statistically inferior forecasts with p-values between zero and 1% are discarded. Hence, the differences between the MCS-trimmed and non-trimmed combinations are small, as
Fig. 2. Average BMA and MMA weights for the one-year-ahead out-of-sample forecasts. Notes: This figure displays the average weight attached to each
model by BMA and MMA over the out-of-sample forecasting period, ranked from lowest to highest weights.
is evidenced by the fact that most of the ratios start at approximately one. As we increase the level of significance required to include a forecast in the set of best forecasts, more forecasts are trimmed and the gains from MCS trimming increase. Above the 60% level, increasing the p-value cutoff leads to worse forecasts for all of the variables that we analyze. Importantly, MCS trimming exhibits large and robust gains in forecasting performance for a wide range of p-value cutoffs.

When comparing fixed and MCS trimming, it is clear that MCS trimming performs better for most of the cutoff space. As was discussed earlier, fixed trimming only provides significant forecast accuracy gains when we discard a very high share of the models. On the other hand, the MCS trimming results are relatively unchanged for a wide range of p-value cutoffs. By taking into account the significance of the statistical differences between the forecasts, one is able to select more carefully which models should be trimmed.
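MCS trimming can be sketched similarly, assuming the per-model MCS p-values have already been computed with the bootstrap procedure of Hansen, Lunde and Nason (2011); the function below only applies the cutoff and averages the surviving forecasts, and its name and inputs are illustrative rather than the authors' implementation.

```python
import numpy as np

def mcs_trim_combination(forecasts, mcs_pvalues, alpha=0.5):
    """Equal-weight combination over the model confidence set.

    Models whose MCS p-value falls below `alpha` are deemed statistically
    inferior and discarded; the survivors are averaged. The p-values are
    taken as given (e.g. from the Hansen-Lunde-Nason bootstrap test).
    """
    keep = mcs_pvalues >= alpha
    if not keep.any():                       # never trim every model
        keep[np.argmax(mcs_pvalues)] = True  # retain the single best one
    return forecasts[keep].mean()
```

Unlike fixed trimming, the number of models retained here is data-driven: it varies with how statistically distinguishable the models' accuracies are at the chosen cutoff.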
Fig. 3. MSPE ratio of fixed and MCS trimmed to untrimmed forecast combination for one-year-ahead forecasts. Notes: This figure shows the MSPE ratio of
trimmed to non-trimmed forecast combinations with equal weights (y-axis) for the full sample. A ratio smaller than one means that the trimmed forecast
combination has a smaller MSPE than the combination with the full set of models. For fixed trimming, the x-axis represents the proportion of models being
excluded from the combination. For MCS trimming, the x-axis shows the p-values with which the set of best models is being identified.
Acknowledgments

We would like to thank Natsuki Arai, Peter Christophersen, Jon Faust, Domenico Giannone, Cheng Hsiao, Maral Kichian, Eva Ortega, Gabriel Perez-Quiros, Tatevik Sekhposyan and Jonathan Wright for useful discussions and suggestions, as well as seminar participants at Johns Hopkins University, Bank of Spain, BlackRock, Bank of Canada, Instituto de Pesquisa e Ensino, DePaul University, 2012 CEF, 2012 LAMES, 1st Vienna Workshop on High Dimensional Time Series, 2013 Canadian Economic Association, 2013 International Symposium on Forecasting, 2013 NASM, and the 2013 NBER-NSF Time Series Conference (poster). Jon Samuels thanks the SRC for the Robert M. Burger Fellowship, and Rodrigo Sekkel thanks Capes/Fulbright and the Campbell Fellowship for financial support during our graduate studies. The views expressed in this paper are solely those of the authors and not necessarily those of the US Bureau of Economic Analysis, the US Department of Commerce or the Bank of Canada.

Appendix A. Supplementary data

Supplementary material related to this article can be found online at http://dx.doi.org/10.1016/j.ijforecast.2016.07.004.

References

Aiolfi, M., & Timmermann, A. (2006). Persistence in forecasting performance and conditional combination strategies. Journal of Econometrics, 135, 31–53.
Alvarez, R., Camacho, M., & Perez-Quiros, G. (2012). Finite sample performance of small versus large scale dynamic factor models.
Bates, J., & Granger, C. (1969). The combination of forecasts. OR, 20, 451–468.
Bernanke, B., & Boivin, J. (2003). Monetary policy in a data-rich environment. Journal of Monetary Economics, 50, 525–546.
Bjørnland, H., Gerdrup, K., Jore, A., Smith, C., & Thorsrud, L. (2011). Does forecast combination improve Norges Bank inflation forecasts? Oxford Bulletin of Economics and Statistics, 74, 163–179.
Capistrán, C., Timmermann, A., & Aiolfi, M. (2010). Forecast combinations. Working paper.
Clemen, R. (1989). Combining forecasts: A review and annotated bibliography. International Journal of Forecasting, 5, 559–583.
Clemen, R., & Winkler, R. (1986). Combining economic forecasts. Journal of Business & Economic Statistics, 39–46.
Faust, J., Gilchrist, S., Wright, J. H., & Zakrajšek, E. (2013). Credit spreads as predictors of real-time economic activity: a Bayesian model-averaging approach. Review of Economics and Statistics, 95, 1501–1519.
Faust, J., & Wright, J. (2009). Comparing Greenbook and reduced form forecasts using a large realtime dataset. Journal of Business and Economic Statistics, 27, 468–479.
Favero, C., & Aiolfi, M. (2005). Model uncertainty, thick modelling and the predictability of stock returns. Journal of Forecasting, 24, 233–254.
Hansen, B. E. (2007). Least squares model averaging. Econometrica, 75, 1175–1189.
Hansen, B. E. (2008). Least-squares forecast averaging. Journal of Econometrics, 146, 342–350.
Hansen, P., Lunde, A., & Nason, J. (2011). The model confidence set. Econometrica, 79, 453–497.
Hendry, D., & Clements, M. (2004). Pooling of forecasts. The Econometrics Journal, 7, 1–31.
Inoue, A., & Kilian, L. (2008). How useful is bagging in forecasting economic time series? A case study of US consumer price inflation. Journal of the American Statistical Association, 103, 511–522.
Makridakis, S., & Winkler, R. (1983). Averages of forecasts: Some empirical results. Management Science, 987–996.
Mallows, C. L. (1973). Some comments on Cp. Technometrics, 15, 661–675.
Schwarzmüller, T. (2015). Model pooling and changes in the informational content of predictors: An empirical investigation for the euro area. Tech. rep., Kiel Working Paper.
Stock, J., & Watson, M. (2002). Macroeconomic forecasting using diffusion indexes. Journal of Business and Economic Statistics, 20, 147–162.
Stock, J., & Watson, M. (2004). Combination forecasts of output growth in a seven-country data set. Journal of Forecasting, 23, 405–430.
Stock, J., & Watson, M. (2007). Why has US inflation become harder to forecast? Journal of Money, Credit and Banking, 39, 3–33.
Timmermann, A. (2006). Forecast combinations. In Handbook of economic forecasting, Vol. 1 (pp. 135–196).
Tulip, P. (2009). Has the economy become more predictable? Changes in Greenbook forecast accuracy. Journal of Money, Credit and Banking, 41, 1217–1231.
Wright, J. (2008). Bayesian model averaging and exchange rate forecasts. Journal of Econometrics, 146, 329–341.
Wright, J. (2009). Forecasting US inflation by Bayesian model averaging. Journal of Forecasting, 28, 131–144.
Zellner, A. (1986). On assessing prior distributions and Bayesian regression analysis with g-prior distributions. In Bayesian inference and decision techniques: Essays in honor of Bruno de Finetti, Vol. 6 (pp. 233–243).

Jon D. Samuels is a research economist at the Bureau of Economic Analysis at the US Department of Commerce. He obtained his Ph.D. in economics from Johns Hopkins University.

Rodrigo M. Sekkel is a senior analyst at the Canadian Economic Analysis department of the Bank of Canada. He obtained his Ph.D. in economics from Johns Hopkins University.