
International Journal of Forecasting 33 (2017) 48–60


Model Confidence Sets and forecast combination


Jon D. Samuels a, Rodrigo M. Sekkel b,*
a BEA, United States
b Bank of Canada, Canada
* Corresponding author. E-mail address: rsekkel@bankofcanada.ca (R.M. Sekkel).

Keywords: Model combination; Performance-based weighting; Trimming

Abstract: A longstanding finding in the forecasting literature is that averaging the forecasts from a range of models often improves upon forecasts based on a single model, with equal weight averaging working particularly well. This paper analyzes the effects of trimming the set of models prior to averaging. We compare different trimming schemes and propose a new approach based on Model Confidence Sets that takes into account the statistical significance of the out-of-sample forecasting performance. In an empirical application to the forecasting of U.S. macroeconomic indicators, we find significant gains in out-of-sample forecast accuracy from using the proposed trimming method.

© 2016 International Institute of Forecasters. Published by Elsevier B.V. All rights reserved.

1. Introduction

Since the original work of Bates and Granger (1969), a myriad of papers have argued that combining predictions from alternative models often improves upon forecasts based on a single best model.1 In an environment in which individual models are subject to structural breaks and misspecified to varying degrees, a strategy that pools information from many models typically performs better than methods that try to select the best forecasting model. When using this strategy, the forecaster faces two basic choices: which models to include in the model pool, and how to combine the model predictions. With the present easy access to large panel data sets, a vast body of research has investigated optimal model combination, but found repeatedly that a simple average of the forecasts produced by individual models is a difficult benchmark to beat, and commonly outperforms more sophisticated weighting schemes that rely on the estimation of theoretically optimal weights. This is known as the forecast combination puzzle.

1 See Clemen (1989), Clemen and Winkler (1986), Hendry and Clements (2004), Makridakis and Winkler (1983), Stock and Watson (2004), and Timmermann (2006), among many others.

While a large body of literature has examined model combination weights, Capistrán et al. (2010) pointed out that there has been little research focusing on how to choose the models to combine, given a pool of potential models. Theoretically, a potential model should be used for forecasting if it has any useful information. Nevertheless, in small samples, where parameter estimation error is often pervasive, it may be that discarding predictions, that is, assigning them a zero weight, will lead to better final forecast combinations. As Aiolfi and Timmermann (2006) argued, the problem of parameter estimation error is particularly acute when the number of models is large relative to the sample size, as is often the case with large macroeconomic datasets. In such cases, trimming models could lead to better estimates of each model's weight in the combined forecast. Hence, the benefits of adding one additional forecast to the combination should be weighed against the cost of estimating additional parameters.
This paper uses a novel approach to select the models to be included in the forecast combination. In particular, we use the concept of the model confidence set (Hansen et al., 2011) to determine the statistically superior set of best models, conditional on the models' past out-of-sample performance. We compare this method with the commonly-used approach of fixing the proportion of models to keep and discarding the remaining models, without regard for the statistical significance of differences in model accuracy.

In the model confidence set approach, the number of models trimmed is not fixed exogenously by the econometrician, but is determined by a statistical test comparing model accuracies. In our application to the forecasting of macroeconomic indicators in the US, we employ the often-used approach of averaging the forecasts of many bivariate models,2 and find substantial improvements in forecast combination accuracy after trimming the set of potential models to be combined using both the fixed and MCS schemes, but the gains from using the MCS approach are larger and more robust.

2 See for example Faust et al. (2013), Stock and Watson (2004), and Wright (2009).

The idea of trimming the set of potential models prior to forecast combination is not novel. Makridakis and Winkler (1983) studied the effects of adding forecasts to a simple combination, and found the marginal benefit of adding forecasts to decrease very rapidly once a relatively small number of forecasts has been included. In the same spirit, Timmermann (2006) argued that the benefit of adding forecasts should be weighed against the cost of introducing an increased parameter estimation error. He considered three straightforward trimming rules: combining only the top 75%, 50% or 25% of models, based on the models' out-of-sample MSPEs.3 The author found aggressive trimming to yield better results; in other words, including fewer models in the combination led to better forecasts. In stock return forecasting, Favero and Aiolfi (2005) also found that aggressive trimming rules based on models' R2 values improved forecasts. In their application, trimming 80% of the forecasts led to the best results. When combining forecasts from various models for inflation in Norway, Bjørnland et al. (2011) argued that a strategy that combines only the 5% best models leads to the best forecast combination.

3 Timmermann (2006) used a recursive weighting scheme based on the MSE. We use a rolling window.

We find that significant gains for the fixed trimming method are restricted to strategies that aggressively trim 80%–95% of the models. On the other hand, the MCS trimming rule results in significant accuracy improvements for a wide range of parameters that govern the confidence level with which the set of best models is identified. Monte Carlo evidence informs the intuition that forecast accuracy gains from trimming models based on their historical out-of-sample performances arise mainly in environments in which some of the models have very little predictive ability relative to others.
The outline of the paper is as follows: Section 2 lays out the trimming schemes, while Section 3 details the results of the Monte Carlo exercise. Section 4 describes our empirical application to the forecasting of US macroeconomic variables. Finally, Section 5 concludes.

2. Trimming rules

Our starting point is a situation in which the forecaster has a toolbox of different models with which to predict a variable of interest y. Each model i implies a forecast ŷi. These models might include naive autoregressions, Bayesian vector autoregressions, factor models, and DSGE models, among others.
We first provide an introduction to the MCS, then detail how we use it as a trimming device to parse models and form conditional forecast combinations. We then contrast the results obtained using the MCS with those obtained using a rule that simply ranks the models according to their past out-of-sample forecasting performances and trims a fixed share of the worst performing models.

2.1. Exogenous fixed trimming

In the fixed-rule trimming scheme, the number of forecasting models to be discarded is fixed exogenously. The analysis below refers to this approach as fixed trimming. We construct the conditional forecast combination by ranking the models according to their past MSPEs, discarding a fixed proportion of models, and using the remaining ones to form the set of best forecasts. It is important to note that while the number of models to be discarded (and hence the number to be combined) is fixed exogenously, there is nothing constraining the procedure to discard the same models in each forecast period. Different models will be trimmed and used according to their respective MSPE ranks in the periods preceding the forecasting period. More formally, let Fτ be a set of i = 1, . . . , n candidate models for forecasting in period τ. We estimate each model i using R periods of data. Fixed trimming requires a training sample of S periods of forecasts from each of the candidate models. Thus, the first period for which we can apply fixed trimming is R + S + 1. Individual models are estimated using data from periods t = τ − R, . . . , τ − 1, and a rolling sample of S previous forecasts is used to compare model performances. The particular rule that we employ discards a fixed proportion of the models in Fτ, such that

Fτ* = {i ∈ Fτ : MSE_i,τ ≤ Pτ(x)},   (1)

where Pτ(x) is the xth percentile of MSE_i,τ. With this trimming rule, the forecaster has to decide on the proportion of models to be trimmed. We perform a systematic analysis to show how the MSPE of the final combination would change for a wide range of different percentiles.

2.2. The model confidence set approach to trimming

An important drawback of the simple trimming rule discussed above is that it does not take into account the statistical significance of differences in the historical performances of the forecasting models. In principle, one might easily conjecture a situation where the best and worst forecasts have mean squared prediction errors that are not statistically different from each other. We use the model confidence set method of Hansen et al. (2011) to identify the set of best models, then trim the models that are excluded from the MCS prior to forecast combination. From a frequentist perspective, the model confidence set approach is a tool for summarizing the relative performances of an entire set of models by determining which models can be considered to be statistically superior, and at what level of significance.
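The fixed trimming rule in Eq. (1) is straightforward to implement. As a minimal sketch (our own illustration, not code from the paper; the names are hypothetical), assuming a matrix of squared forecast errors over the S training periods that precede the forecast period:

```python
import numpy as np

def fixed_trim_combine(train_sq_errors, forecasts, trim_share):
    """Equal-weight combination after fixed trimming, as in Eq. (1).

    train_sq_errors : (S, n) squared forecast errors of the n models over the
                      S training periods preceding the forecast period tau.
    forecasts       : (n,) candidate forecasts for period tau.
    trim_share      : share of models to discard, e.g. 0.9 keeps the best 10%.
    """
    mse = train_sq_errors.mean(axis=0)                    # MSE_i,tau over the training window
    cutoff = np.percentile(mse, 100 * (1 - trim_share))   # P_tau(x), the xth percentile
    keep = mse <= cutoff                                  # F*_tau: models at or below the cutoff
    return forecasts[keep].mean()                         # simple average of the survivors

# Example: 50 models, a 20-period training window, trim the worst 90%.
rng = np.random.default_rng(0)
print(fixed_trim_combine(rng.normal(size=(20, 50)) ** 2, rng.normal(size=50), 0.9))
```

The same skeleton is reused at every forecast period, so the identity of the discarded models changes with the rolling training window even though the trimmed share is fixed.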

We keep the notation of Hansen et al. (2011), and refer the reader to the original paper for a more detailed exposition. The MCS aims to identify the set Mτ*, such that:

Mτ* = {i ∈ Mτ0 : u_i,j,τ ≤ 0 for all j ∈ Mτ0},

where Mτ0 is the initial set of candidate models. As with the fixed trimming, the candidate models are estimated using R observations, and the MCS method requires a training sample of S periods of forecasts from each of the candidate models. These S periods of forecasts are used to define u_i,j,τ: u_i,j,τ = E(d_ij,t), where d_ij,t = L_i,t − L_j,t is the model loss differential defined over the training period. We define the loss in period t as the squared error. That is, given the set of all forecasting models Mτ0 in the initial comparison set, the MCS searches for the set of models at time τ that cannot be rejected as statistically inferior at a chosen level of confidence.
The MCS is based on the following algorithm. Starting from the set of all models Mτ0, repeatedly test the null hypothesis of equal predictive accuracy, H0,M : u_i,j = 0 for all i, j, at significance level α. If the null is rejected, the procedure eliminates a model from Mτ, and this is repeated until the null of no difference between models cannot be rejected at the chosen level of significance. The set M̂τ*,1−α with the remaining models is denoted as the MCS, Mτ*.4
We test H0, which is done sequentially until we reach the case in which the null is not rejected, by constructing t-statistics based on d̄_ij,τ ≡ S⁻¹ Σ_{t=1}^{S} d_ij,t, the average loss of model i relative to model j. The pertinent test statistics are

t_ij,τ = d̄_ij,τ / sqrt(var̂(d̄_ij,τ))

and

T_R,Mτ = max_{i,j ∈ Mτ} |t_ij,τ|.

The T_R,Mτ statistic ensures that the rejection or otherwise of the null of no difference in model performances depends only on the model with the greatest relative loss. The implementation of this statistic is particularly convenient because the decision rule as to which model to eliminate is given by e_R,M = arg max_{i∈M} sup_{j∈M} t_ij, that is, the model with the largest t-statistic.
The asymptotic distribution of this test statistic is non-standard, as it depends on the cross-sectional correlation of the t_ij,τ. The MCS procedure addresses these issues by using bootstrap methods to estimate the distribution of the test statistic. We ensure that the estimates of the distribution reflect the persistence in the d_i,j,τ by employing the block bootstrap proposed by Hansen et al. (2011).5 Our choice of the block size depends on the forecast horizon.6

4 If the null is not rejected in the first round, Mτ* = Mτ0.
5 Given the computational intensity of the MCS method, increasing the number of models to thousands or millions would make the approach very costly computationally.
6 For the one-quarter-ahead forecasts, we use a block size of two quarters. For the two- and four-quarter-ahead forecasts, we use block sizes of three and six quarters, respectively.

When constructing the model confidence set M̂τ*, we choose our baseline confidence level α by performing a systematic analysis, then keep models with associated p-values that are greater than or equal to α. We set our baseline α using the results from our Monte Carlo exercise, and conduct a sensitivity analysis to this choice in Section 4.6.

3. Monte Carlo evidence

We shed more light on the benefits of our trimming approach by conducting a Monte Carlo study, in line with Alvarez et al. (2012) and Inoue and Kilian (2008). We posit that there are N = 50 predictors for y_t+1. The simulations are based on 500 replications, with T = 150.7

7 Within each Monte Carlo draw, each forecasting period t requires a bootstrap in order to determine the model confidence set.

The data generating process (DGP) for the predictors {x_it}, i = 1, . . . , N, follows a factor structure, and is given by

x_t = F_t Λ + ϵ_t,

where Λ is a column vector of N ones, and F_t is generated from a standard normal distribution.
The idiosyncratic component is assumed to be a zero mean shock, with variance–covariance matrix Σϵ. We introduce correlation into the idiosyncratic components of the N different predictors by assuming that Σϵ is given by the N × N Toeplitz matrix whose (i, j) entry equals ρ^|i−j|:

Σϵ =
| 1         ρ         ρ²        · · ·   ρ^(N−1) |
| ρ         1         ρ         · · ·   ρ^(N−2) |
| ρ²        ρ         1         · · ·   ρ^(N−3) |
| ⋮         ⋮         ⋮         ⋱       ⋮       |
| ρ^(N−1)   ρ^(N−2)   ρ^(N−3)   · · ·   1       |.   (2)

We report results for ρ = 0, 0.1, 0.5, and 0.9. That is, we consider a range of simulations, from the case where the idiosyncratic terms between the first two predictors are completely uncorrelated, up to the case where this correlation is 0.9.
Finally, the DGP for y_t+1 is given by:

y_t+1 = β′ x_t + ε_t+1,   (3)

where ε_t+1 ∼ NID(0, 1). Following Inoue and Kilian (2008), we propose five different scenarios for the slope parameter vector:

Design B1. β = c1 [1, 1, 1, . . . , 1]′
Design B2. β = c2 [50, 49, 48, . . . , 1]′
Design B3. β = c3 [1, 1/2, 1/3, . . . , 1/50]′
Design B4. β = c4 [1_(1×10), 0_(1×40)]′
Design B5. β = c5 [e^−1, e^−2, e^−3, . . . , e^−50]′.
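A minimal sketch of one draw from this DGP (our own illustration of Eqs. (2)–(3); the scaling constants are left at one rather than calibrated to a target R² as in the designs above):

```python
import numpy as np

def simulate_dgp(design, N=50, T=150, rho=0.5, seed=0):
    """One Monte Carlo draw: x_t = F_t * Lambda + eps_t and y_{t+1} = beta' x_t + e_{t+1}."""
    rng = np.random.default_rng(seed)
    # Toeplitz covariance of the idiosyncratic shocks: entry (i, j) equals rho^{|i-j|}.
    sigma_eps = float(rho) ** np.abs(np.subtract.outer(np.arange(N), np.arange(N)))
    factors = rng.standard_normal(T)                         # F_t ~ N(0, 1)
    eps = rng.multivariate_normal(np.zeros(N), sigma_eps, size=T)
    x = factors[:, None] * np.ones(N) + eps                  # Lambda is a vector of ones

    i = np.arange(1, N + 1, dtype=float)
    betas = {                                                # designs B1-B5, scaling constants set to 1
        "B1": np.ones(N),
        "B2": np.arange(N, 0, -1, dtype=float),
        "B3": 1.0 / i,
        "B4": np.concatenate([np.ones(10), np.zeros(N - 10)]),
        "B5": np.exp(-i),
    }
    y_next = x @ betas[design] + rng.standard_normal(T)      # y_{t+1}, aligned with x_t
    return x, y_next

x, y = simulate_dgp("B5", rho=0.9)
print(x.shape, y.shape)   # (150, 50) (150,)
```

In the experiments of this section, each simulated sample is then split into a rolling estimation window of R = 40 observations and a training sample of S = 20 out-of-sample forecasts, as described below.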

In design B1, all variables are equally important predictors of y_t+1. In such an environment, one would not expect to find any gains from trimming the set of potential predictors based on their past forecasting performances, because all predictors should have equal predictive power, on average. In all other designs, the predictive powers of the 50 predictors generated are different. In design B4, a small group of variables (ten) have equal importance, while the majority of predictors have no importance at all (zero loadings). Designs B2, B3 and B5 incorporate smooth decays in the relative importance of each x_i. This decay is slow in design B2, but fast in design B5 (exponential), meaning that a few variables will have relatively high predictive power for y_t, and the remainder will have basically zero forecasting power. One would expect that the gains from trimming the set of predictors should be particularly large in situations like those proxied by design B5. As in Inoue and Kilian (2008), the scaling constants c1, . . . , c5 are chosen such that the R2 values of the forecasting models are the same across all designs. We show the results for R2 values of 25% and 50%.
We compare the performances of the untrimmed forecast combination and the fixed and MCS trimmed forecast combinations. To begin, we construct an out-of-sample forecast for y_t+1 based on each of the 50 predictors, using a univariate model for each predictor x_i:

y_t+1 = γ_i x_i,t + ε_i,t,   (4)

where γ_i is estimated using a rolling period of R = 40 observations for in-sample estimation, and a rolling period of S = 20 out-of-sample observations as a training sample, to which we apply the trimming methods. The first forecast combination in the simulations is for period R + S + 1 = 61. Specifically, we have fifty out-of-sample forecasts in period 61, one based on each individual model. We use the 20 out-of-sample predictions from periods 41 to 60 as means of distinguishing each model's performance over the training sample. For the fixed trimming, we examine the performance with three different cutoff choices: trimming 50%, 75% and 90% of the worst performing models exogenously when choosing the models for forecasting period 61. For the MCS trimming, we identify the sets of best models with p-values of 10%, 25% and 50%. For both methods, the higher the cutoff, the more models are trimmed from the combinations. The final combined forecast for period 61 is formed by taking a simple average of the predictions from the models that have not been trimmed. This process is iterated forward to T = 150.
Table 2 shows the Monte Carlo results for fixed trimming, while the results for the MCS trimming are shown in Table 3. A ratio smaller than one means that the MSPE of the trimmed forecast combination is smaller than that of the untrimmed one. A few observations stand out. For both trimming methods, the gains from trimming are higher for stricter trimming cutoffs. For the MCS, trimming with a p-value of 50% leads to larger gains than the smaller cutoffs. Similarly, for the fixed trimming, excluding 90% of the worst performing models also generally results in the highest gains from trimming. A second result that emerges from the Monte Carlo exercise is that the predictor designs B3 to B5 are associated with larger accuracy gains. These designs have the common characteristic that the predictive power diminishes rapidly across predictors. The Monte Carlo also indicates that the gains from trimming are higher with stronger correlations between the idiosyncratic components of the predictors. Finally, all of these designs exhibit an increase in accuracy gains as the importance of the predictors relative to the errors (R2) increases.
The Monte Carlo results indicate that greater gains from trimming are expected when many of the predictors have weak forecasting power, which is a common situation in macroeconomic forecasting. Hence, assigning weak forecasts a weight of zero reduces the impact of parameter estimation uncertainty, and leads to a better bias–variance trade-off: the bias from under-fitting is more than compensated for by the fall in estimation uncertainty.

4. Empirical application

We test the benefits of trimming the number of models prior to forecast combination by applying the methods discussed above in a commonly-used setting, averaging bivariate models' forecasts using a large panel of US macroeconomic data.8

8 See, for example, Faust et al. (2013), Stock and Watson (2004), and Wright (2009), among many others.

4.1. Data

The macroeconomic dataset of potential predictors that we use consists of 107 economic and financial variables. This large panel contains both aggregate and disaggregate macroeconomic data, surveys, and financial indicators. Table 1 details the series contained in the panel. The panel starts in 1959Q3 and ends in 2013Q2. Table 1 also details the transformation applied to each series to eliminate trends. The panel closely resembles that of Stock and Watson (2002).
We use this dataset to predict various measures of economic activity and inflation, namely: gross domestic product (GDP), nonfarm payroll (EMP), industrial production (IP), housing starts (HST) and the gross domestic product deflator (DEF).9 As a baseline exercise, we report results for a forecasting exercise using real-time data for these series. The dataset was obtained from the Real-Time Dataset for Macroeconomists at the Federal Reserve Bank of Philadelphia. We measure the actual realized growth rates of these series using the data as recorded in the real-time dataset of the Federal Reserve Bank of Philadelphia two quarters after the quarter to which the data refer. This typically corresponds to the data recorded in the second revision of the national income and product accounts. The macroeconomic panel was gleaned from a number of data sources, but mostly from the St. Louis Fed FRED database.

9 We also examined the robustness of the results to other measures of inflation, namely CPI and PCE, and found similar results across all measures.

Table 1
Variables and transformations in our large dataset.
Variable Transf. Variable Transf.

Moody’s AAA Bond Yield 2 Civilian Unemployed: 15 to 26 Weeks 5


Moody’s AAA Bond Spread 2 Civilian Unemployed: 5 to 14 Weeks 5
Avg.Hourly earnings: Construction 5 Civilian Unemployed: Less than 5 Weeks 5
Avg.Hourly earnings: Manufacturing 5 Civilian Unemployed: 27 Weeks and over 5
Avg.Weekly Hours: Manufacturing 1 Civilian Labor force 5
Avg.Weekly Overtime Hours: Manufacturing 2 Avg. Duration of Unemployment 5
Moody’s BAA Bond Yield 2 Exchange Rate: Switzerland 5
Moody’s BAA Bond Spread 2 Exchange Rate: Japan 5
ISM Manufacturing PIM Composite Index 1 Exchange Rate: UK 5
ISM Manufacturing Employment Index 1 Exchange Rate: Canada 5
ISM Manufacturing Inventory Index 1 S&P Earning Price Ratio 5
ISM Manufacturing New Orders Index 1 Real Compensation per Hour 5
ISM Manufacturing Production Index 1 Corporate Profits after tax 5
ISM Manufacturing Prices Index 1 Real Disposable personal income 5
Avg.Weekly Hours: Nondurable Goods 1 Real Exports 5
Avg.Weekly Hours: Durable Goods 1 Real Final Sales Domestic Products 5
S&P Returns 1 Real Government Expenditures: Federal Government 5
Fama–French Factor: RmRf 1 Real Government Expenditures: State and Local 5
Fama–French Factor: SMB 1 Real Imports 5
Fama–French Factor: HML 1 Real Compensation per hour: Business Sector 5
Fed Funds Rate 2 Unit Labor Cost: Nonfarm Business 5
1-Year Yield 2 Real Personal Consumption Expenditures: Services 5
5-Year Yield 2 Real Personal Consumption Expenditures: Durables 5
10-Year Yield 2 Real Personal Consumption Expenditures: Nondurables 5
3-Month Treasury Bill 2 Real Investment: Intellectual Properties 5
6-Month Treasury Bill 2 Real Investment: Equipment and Software 5
6-Month minus 3-Month Spread 1 Real Investment: Nonresidential Structures 5
1-Year minus 3-Month Spread 1 Real Investment: Residential Structures 5
10-Year minus 3-Month Spread 1 Nonfarm Business: hours all persons 5
Personal Saving Rate 2 Nonfarm Business: output per hour all persons 5
Unemployment Rate 2 Commercial and Industrial Loans 6
Housing Starts: Midwest 4 M1 - Money Stock 6
Housing Starts: Northeast 4 M2 - Money Stock 6
Housing Starts: South 4 St.Louis Adjusted Monetary Base 6
Housing Starts: West 4 PPI: All Commodities 6
All Employees: Durable Goods 5 PPI: Crude Materials 6
All Employees: Manufacturing 5 PPI: Finished Foods 6
All Employees: Nondurable Goods 5 PPI: Industrial Commodities 6
All Employees: Services 5 PPI: Intermediate Materials 6
All Employees: Construction 5 CPI: All items 6
All Employees: Government 5 CPI: Core 6
All Employees: Mining 5 PCE: Excluding Food and Energy 6
All Employees: Retail Trade 5 PCE: Durables 6
All Employees: Wholesale Trade 5 PCE: Nondurables 6
All Employees: Finance 5 PCE: Services 6
All Employees: Trade, Transp. and Utilities 5 Price of Investment: Structures 6
Industrial Production: Business Equipment 5 Price of Investment: Equipment 6
Industrial Production: Consumer Goods 5 Price of Investment: Residential Structures 6
Industrial Production: Durable Consumer Goods 5 Price of Exports 6
Industrial Production: Final Goods 5 Price of Imports 6
Industrial Production: Materials 5 Price of Federal Government Expenditures 6
Industrial Production: Nondurable Goods 5 Price of State and Local Government Expenditures 6
Industrial Production: Durable Materials 5
Industrial Production: Nondurable Materials 5
Civilian Unemployed: 15 Weeks and over 5
Note: This table shows our dataset, together with the transformation applied to each of the series: 1: No change; 2: log; 3: first difference; 4: second
difference; 5: first difference of logs; 6: second difference of logs.
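As an illustration of the transformation codes in the note above, a small helper of our own (not from the paper) might look as follows, taking a raw quarterly series and the code 1–6:

```python
import numpy as np

def transform(series, code):
    """Apply the transformation codes from Table 1 to a 1-D array of quarterly data."""
    x = np.asarray(series, dtype=float)
    if code == 1:                      # no change
        return x
    if code == 2:                      # log
        return np.log(x)
    if code == 3:                      # first difference
        return np.diff(x)
    if code == 4:                      # second difference
        return np.diff(x, n=2)
    if code == 5:                      # first difference of logs
        return np.diff(np.log(x))
    if code == 6:                      # second difference of logs
        return np.diff(np.log(x), n=2)
    raise ValueError("unknown transformation code")
```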

Unfortunately, there is no real-time data set of macroeconomic variables that covers all of our predictors over our whole sample to allow us to perform the exercise fully in real time. Nevertheless, as was shown by Bernanke and Boivin (2003) and Faust and Wright (2009), the use of real-time or revised data should not affect the relative forecast accuracy when comparing different approaches.

4.2. Models

The individual forecasts to be combined are based on linear autoregressive models that contain one additional predictor per model. Let t date the predictors, y_t be the annualized growth rate of the variable to be forecasted from t − 1 to t, and x_t be the additional predictor. We define ȳ_t+h as the h-quarter-ahead average growth rate to be forecasted, ȳ_t+h = Σ_{i=1}^{h} y_t+i / h. We estimate the models for h = 1, 2, and 4. For each individual series {x_i,t}, i = 1, . . . , 107, in our macroeconomic and financial panel, we estimate the following model for each variable of interest:

ȳ_t+h = α_i + Σ_{j=0}^{P} β_i,j y_t−j + γ_i x_i,t + ε_i,t+h.   (5)

Table 2
Monte Carlo fixed trimming.
ρ = 0.90 ρ = 0.50 ρ = 0.10 ρ=0
R2 = 25 R2 = 50 R2 = 25 R2 = 50 R2 = 25 R2 = 50 R2 = 25 R2 = 50

Fixed trimming 90%


B1 1.04 1.10 0.99 0.99 0.99 0.99 0.98 0.98
B2 1.03 1.04 0.99 0.99 0.98 0.98 0.98 0.98
B3 0.99 0.91 0.98 0.95 0.98 0.96 0.98 0.97
B4 0.92 0.73 0.97 0.91 0.98 0.95 0.98 0.97
B5 0.92 0.73 0.95 0.83 0.96 0.88 0.97 0.90
Fixed trimming 75%
B1 1.01 1.03 0.98 0.97 0.98 0.96 0.97 0.96
B2 1.00 0.98 0.98 0.96 0.98 0.96 0.97 0.96
B3 0.97 0.89 0.97 0.94 0.97 0.95 0.97 0.95
B4 0.92 0.74 0.96 0.91 0.97 0.94 0.97 0.95
B5 0.94 0.82 0.96 0.90 0.97 0.92 0.97 0.93
Fixed trimming 50%
B1 1.00 1.00 0.98 0.97 0.98 0.97 0.98 0.97
B2 0.99 0.96 0.98 0.97 0.98 0.97 0.98 0.97
B3 0.97 0.92 0.98 0.96 0.98 0.96 0.98 0.97
B4 0.95 0.86 0.98 0.95 0.98 0.96 0.98 0.97
B5 0.97 0.91 0.98 0.95 0.98 0.96 0.98 0.96
Note: This table shows the ratio of the MSPEs of the trimmed combinations to those of the untrimmed combinations with equal weights, for different choices of cutoffs: 50%, 75% and 90% of the worst performing models. The value of ρ determines the cross-correlation of the idiosyncratic shocks between all series. B1–B5 represent different designs for the predictors. The R2 values control for the relevance of the predictors.

These models are estimated using rolling samples.10 P is based on the Bayesian information criterion (BIC) of the univariate AR of y_t on its lags, and is calculated for each rolling sample, meaning that P can change for each t's forecast, but is held fixed across each predictor-based model i and each horizon h.

10 Hansen et al. (2011) recommend the use of a rolling window to guard against non-stationarity in the models' loss differentials, which is a requirement when comparing loss functions over time using the model confidence set approach.

We estimate the parameters of the models using an in-sample rolling sample size of R = 40 quarters. Given the start date of our data (1959Q3), the first one-step-ahead out-of-sample forecast from the individual models will be for 1969Q3. We then use a training sample of S = 20 out-of-sample forecasts to evaluate which models should be trimmed before combination. Hence, our first one-step-ahead forecast combination will be for 1974Q3.

4.3. Forecast combination methods

After estimating the individual models, we weight the predictions in order to produce the final combined forecast. We form these combinations both before and after applying trimming, to enable us to analyze the gains from the trimming methods described above. We employ four commonly-used forecast combination techniques to combine the forecasts based on all of the individual models, as well as on the sets of best models chosen by our trimming methods. These forecast combination methods are described briefly here.
The methods that we use to combine forecasts from individual models are weighted averages of each of the individual forecasts. Let ȳ̂_i,t+h denote the ith individual pseudo out-of-sample forecast, estimated when predictor i becomes available at time t. The combined forecast is constructed as:

ȳ̂_t+h = Σ_{i=1}^{m} w_i,t ȳ̂_i,t+h,   (6)

where ȳ̂_t+h is the final combination forecast, and w_i,t is the weight assigned to each individual forecast ȳ̂_i,t+h at period t.

4.3.1. Equal weights

The simplest, and often the most effective, forecast combination method is the simple mean of the panel of forecasts. With this approach,

w_i,t = 1/M,   (7)

where M is the total number of models. When forecasting output growth in the G7, Stock and Watson (2004) found the equal weights combination for output forecasts to produce forecasts that outperformed those from a collection of more elaborate weighting schemes.

4.3.2. Inverse MSE weights

We combine forecasting models by weighting according to the inverse of each model's MSPE. With this method, models that have lower mean squared prediction errors get higher weights in the combined forecast. Because we want to consider the out-of-sample performances of the models, we use a rolling training sample of S = 20 quarters to calculate the out-of-sample MSPEs for the individual predictor-based models. The sample gets rolled forward as each additional out-of-sample forecast is produced.

Table 3
Monte Carlo MCS trimming.
ρ = 0.90 ρ = 0.50 ρ = 0.10 ρ=0
R2 = 25 R2 = 50 R2 = 25 R2 = 50 R2 = 25 R2 = 50 R2 = 25 R2 = 50

MCS trimming 50%


B1 1.01 1.03 0.99 0.99 0.99 0.99 0.99 0.98
B2 1.00 0.99 0.99 0.99 0.99 0.99 0.99 0.98
B3 0.98 0.92 0.99 0.97 0.99 0.98 0.99 0.98
B4 0.94 0.78 0.98 0.95 0.99 0.97 0.99 0.98
B5 0.94 0.78 0.97 0.87 0.98 0.91 0.98 0.93
MCS trimming 25%
B1 1.00 1.01 0.99 0.98 0.99 0.98 0.99 0.98
B2 1.00 0.98 0.99 0.98 0.99 0.98 0.99 0.98
B3 0.98 0.93 0.99 0.97 0.99 0.98 0.99 0.98
B4 0.95 0.82 0.98 0.95 0.99 0.97 0.99 0.98
B5 0.96 0.83 0.98 0.91 0.98 0.94 0.99 0.96
MCS trimming 10%
B1 1.00 1.00 0.99 0.99 0.99 0.99 0.99 0.99
B2 0.99 0.98 0.99 0.99 0.99 0.99 0.99 0.99
B3 0.99 0.95 0.99 0.98 0.99 0.99 0.99 0.98
B4 0.97 0.87 0.99 0.97 0.99 0.98 0.99 0.99
B5 0.98 0.89 0.99 0.95 0.99 0.96 0.99 0.98
Note: This table shows the ratio of the MSPEs of the trimmed combinations to those of the untrimmed combinations with equal weights, for different choices of p-values: 10%, 25% and 50%. The value of ρ determines the cross-correlation of the idiosyncratic shocks between all series. B1–B5 represent different designs for the predictors. The R2 values control for the relevance of the predictors.

Calling S the number of periods in the rolling training sample, the weight for model i used in forecasting period t is

w_i,t = MSE⁻¹_i,(t−1−S,t−1) / Σ_{j=1}^{M} MSE⁻¹_j,(t−1−S,t−1).   (8)

Thus, the weights will be bounded by zero and one, and sum to one. This approach is an intermediate case between equal weighting and ''optimal weighting'' because it relies on data for the weights, but limits the parameter estimation by ignoring the covariances of the forecasts. This approach was considered by Bates and Granger (1969).

4.3.3. Mallows model averaging

Our third combination method is Mallows model averaging (MMA), proposed by Hansen (2007, 2008). This combination method is based on the model selection criterion of Mallows (1973). The basic idea of MMA is to obtain the combination weights that minimize the MSE over the set of possible forecast combinations. This method selects the weights W by minimizing the Mallows criterion

C_n(W) = ( ȳ_t+h − Σ_{i=1}^{m} w_i ȳ̂_i,t+h )² + 2σ̂² Σ_{i=1}^{m} w_i k_m,   (9)

where k_m is a vector with the number of parameters in each model. In our case, k_m is the same for every model in the combination because each model uses the same number of own lags and a single additional predictor.

4.3.4. Bayesian model averaging

The last combination approach that we consider is Bayesian model averaging (BMA), and our implementation follows Faust et al. (2013) closely. We start with n possible models, M_i. The ith model is given by

ȳ_t+h = α_i + Σ_{j=0}^{P} β_i,j y_t−j + γ_i x_i,t + ε_i,t+h,   (10)

where ȳ_t+h is the variable that we are forecasting at horizon h, x_i,t is the predictor that is specific to model i, and ε_i,t+h ∼ i.i.d. N(0, σ_i). All models have the same number of lags P. The model-specific predictor x_i,t is assumed to be orthogonal to the common predictors (a constant and the lags of y_t).
Given a prior probability that the ith model is true, P(M_i), and the data D, the posterior probability that the ith model is the true model can be updated according to

P(M_i | D) = P(D | M_i) P(M_i) / Σ_{j=1}^{n} P(D | M_j) P(M_j),   (11)

where P(D | M_i) is the marginal likelihood of the ith model. We assume that all models are equally likely, meaning P(M_i) = 1/n. The parameter priors for α_i, β_i and σ_i are uninformative, and proportional to 1/σ_i for all i. The prior for γ_i conditional on σ_i is the Zellner (1986) g-prior, N(0, φσ_i²(X_i′X_i)⁻¹), where the hyperparameter φ governs the strength of the prior.
The Bayesian h-period-ahead forecast for each model is

ȳ̃_i,t+h = α̂ + Σ_{j=0}^{P} β̂_j y_t−j + γ̃_i x_i,t,   (12)

where γ̃_i = (φ/(1 + φ)) γ̂_i represents the posterior mean of γ_i, and α̂ and β̂ are the OLS estimators of α and β, respectively.

In this framework, the marginal likelihood of the ith model reduces to

P(D | M_i) ∝ (1/(1 + φ))^(1/2) [ (1/(1 + φ)) SSR + (φ/(1 + φ)) SSR_i ]^(−(T−P)/2),   (13)

where SSR is the sum of squared residuals from a regression without x_i, and SSR_i is the sum of squared residuals from model i. The posterior probabilities can be calculated from Eq. (11), and the final BMA forecast is given by

ȳ̃_t+h = Σ_{i=1}^{M} P(M_i | D) ȳ̃_i,t+h.   (14)

Hence, the final BMA forecast takes model uncertainty into account by weighting each model by its posterior probability.
As was observed by Faust et al. (2013), we view the forecasting scheme above as a pragmatic approach to the combination of the individual models, and make no claims as to its Bayesian optimality properties. Several of the conditions for strict optimality are not met in typical macro time series studies. First, the regressors are assumed to be strictly exogenous, an assumption that is clearly false in the current application. Second, the errors are assumed to be i.i.d., but the overlapping nature of the h-step-ahead forecasts introduces serial correlation in the forecast errors that are less than h periods apart. Nevertheless, several authors have shown this approach to produce very competitive out-of-sample forecasts in similar applications.11

11 See for example Faust et al. (2013), and Wright (2008, 2009).

4.4. Inference

We compare the performances of the forecast combinations, before and after trimming, with the above-mentioned alternative data-rich forecasts by making use of the model confidence set of Hansen et al. (2011) a second time. In this comparison, Mτ0, the initial set of all models, consists of 12 different forecasts: (i) four data-rich forecasts without trimming (the EW combination, the inverse MSE weights combination, and the MMA and BMA forecasts), (ii) the same initial set of forecasts after fixed trimming with a baseline cutoff of 90%,12 and finally, (iii) the same initial set of forecasts after MCS trimming with a p-value cutoff of 50%.13 Section 4.6 examines the robustness of our results to different choices of cutoffs.
The MCS indicates the set of best models by attaching p-values to each of these 12 different forecast combinations. The results below give the MSPEs and p-values for the combined models included in Mτ0.

12 Namely, we trim the worst 90% of models.
13 We keep only models that receive p-values of 50% or higher associated with the null hypothesis that the model belongs to the set of best models. The choice of 50% is based on the MC evidence.

4.5. Main results

We concentrate our analysis on the one-year-ahead forecasts, and provide additional evidence for the one- and two-quarter-ahead horizons in a supplementary appendix (see Appendix A). Table 4 shows the one-year-ahead MSPEs and MCS p-values for the 12 combinations included in Mτ0.
Several results emerge from this exercise. Of the four data-rich forecasts without trimming, we see that either MMA or BMA often has the lowest MSPE for the predicted variables. Wright (2009) also found that BMA forecasts are generally more accurate for US inflation than simple averaging. Schwarzmüller (2015) studied the performances of different pooling methods for Euro area GDP bridge models, and showed that MMA compared favorably to other pooling methods. The MCS-trimmed forecast combinations perform well relative to this set of forecasts. For most of the variables, the trimmed forecasts combined with either EW or MSPE weights are the most accurate forecasts.
In addition to showing the results for the full sample, we also display them for an initial sub-sample from 1974Q3 to 1984Q4. This sub-sample precedes the start of the Great Moderation (GM) period. Stock and Watson (2007) and Tulip (2009) argued that the predictable component of macroeconomic series was reduced significantly during the GM, especially in the case of inflation. Hence, there was more information for distinguishing the forecasting performances of the models prior to the GM period. Table 5 shows the results for this pre-GM period. The MSPEs of the forecast combinations are considerably higher for this period than for the full sample overall. Nonetheless, the accuracy gains from trimming the worst performing models are significantly higher as well.
Another point that is worth highlighting is the fact that there are minimal to no gains from using MMA or BMA after trimming. In sharp contrast to the other combination methods (EW and inverse MSE), trimming leads to only very small gains, or even losses, in forecast accuracy with BMA and MMA weighting. We shed further light on this last point by constructing the following two sets of figures. First, Fig. 1 shows the proportion of times in our out-of-sample forecasting exercise that each of the 107 models is selected as belonging to the set of best forecasts using the MCS or the fixed trimming approach for the one-year-ahead forecasts. The models (x-axis) are sorted from lowest to highest selection rates (y-axis). Under both schemes, a subset of models are never included in the set of best forecasts, and this is followed by a larger group with increasing selection rates. Finally, on the other end of the spectrum, there is a small group of models that have significantly higher selection rates. Thus, this evidence points to the existence of some persistence in the out-of-sample forecasting performances of the best models, but also shows a considerable degree of instability in the remaining ones. Next, Fig. 2 shows the average BMA and MMA weights for each of the 107 models over the out-of-sample forecasting period. The results bear a close resemblance to the previous figure. On average, the vast majority of models get essentially zero weight.
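Tables 4 and 5 below report, for each of the 12 schemes, its MSPE and the p-value with which it remains in this second-stage model confidence set. As an illustration of the sequential elimination that produces such p-values, a stylized sketch is given below. It is our own simplification (using the range statistic and a moving-block bootstrap, with hypothetical names); a production implementation should follow Hansen et al. (2011) or an established package rather than this sketch.

```python
import numpy as np

def mcs_pvalues(losses, n_boot=500, block=2, seed=0):
    """Stylized model confidence set: sequential elimination with the max-|t| statistic.

    losses : (S, n) array of losses (here, squared forecast errors) for n forecasts.
    Returns {model index: p-value at which the model leaves the set}; the surviving
    model carries a p-value of one, as in Hansen et al. (2011).
    """
    rng = np.random.default_rng(seed)
    S, n = losses.shape
    alive, pvals, p_running = list(range(n)), {}, 0.0

    while len(alive) > 1:
        sub = losses[:, alive]
        d = sub[:, :, None] - sub[:, None, :]                   # d_{ij,t} = L_{i,t} - L_{j,t}
        dbar = d.mean(axis=0)
        # moving-block bootstrap of the mean loss differentials
        starts = rng.integers(0, S - block + 1, size=(n_boot, S // block + 1))
        idx = (starts[:, :, None] + np.arange(block)).reshape(n_boot, -1)[:, :S]
        boot_means = d[idx].mean(axis=1)                        # bootstrap draws of dbar_{ij}
        boot_var = boot_means.var(axis=0) + 1e-12
        t_stat = np.abs(dbar) / np.sqrt(boot_var)               # |t_{ij}|
        t_boot = np.abs(boot_means - dbar) / np.sqrt(boot_var)
        p = float((t_boot.max(axis=(1, 2)) >= t_stat.max()).mean())

        p_running = max(p_running, p)                           # MCS p-values are made monotone
        worst = int(np.unravel_index(np.argmax(dbar / np.sqrt(boot_var)),
                                     dbar.shape)[0])            # model with the largest relative loss
        pvals[alive[worst]] = p_running
        alive.pop(worst)

    pvals[alive[0]] = 1.0
    return pvals
```

Trimming keeps every forecast whose p-value is at or above the chosen cutoff (50% in our baseline); the stars in Tables 4 and 5 flag membership of the set of best models at the 10% and 25% levels.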

Table 4
Final model comparisons: full sample one-year-ahead forecasts.
GDP IP EMP HSTARTS GDP deflator
MSPE p-value MSPE p-value MSPE p-value MSPE p-value MSPE p-value

Equal weights 4.38 0.38** 26.05 0.37** 2.54 0.16* 472 0.24* 1.59 0.46**
MSE weights 4.31 0.49** 25.79 0.37** 2.49 0.18* 465 0.24* 1.57 0.46**
BMA 3.56 0.62** 23.02 0.52** 2.40 0.39** 434 0.24* 1.40 0.46**
MMA 3.56 0.68** 22.23 0.81** 2.25 0.39** 409 0.24* 1.40 0.46**
Fixed trimmed: EW 3.85 0.57** 22.24 0.68** 1.99 0.90** 412 0.24* 1.40 0.46**
Fixed trimmed: MSE 3.66 0.68** 22.11 0.88** 1.98 1.00** 404 0.39** 1.39 0.46**
Fixed trimmed: BMA 3.37 0.68** 21.02 1.00** 2.21 0.39** 406 0.24* 1.40 0.46**
Fixed trimmed: MMA 3.52 0.68** 21.16 0.99** 2.17 0.39** 365 1.00** 1.43 0.46**
MCS trimmed: EW 3.00 0.68** 21.02 1.00** 2.37 0.37** 384 0.57** 1.36 0.73**
MCS trimmed: MSE 2.96 1.00** 21.01 1.00** 2.34 0.39** 382 0.57** 1.35 0.85**
MCS trimmed: BMA 3.55 0.68** 23.04 0.52** 2.47 0.16* 396 0.39** 1.32 1.00**
MCS trimmed: MMA 3.45 0.68** 22.03 0.68** 2.34 0.39** 386 0.39** 1.34 0.85**
Note: This table gives MSPEs and p-values for each forecasting scheme under the null that each scheme has the same relative loss. Fixed trims the worst
90% of models based on their MSPEs, MCS trims models with p-values of less than 50%.
*
The model is in the set of best models at the 10% level.
**
The model is in the set of best models at the 25% level.

Table 5
Final model comparisons: pre-Great Moderation one-year-ahead forecasts.
GDP IP EMP HSTARTS GDP deflator
MSPE p-value MSPE p-value MSPE p-value MSPE p-value MSPE p-value
** ** **
Equal weights 9.58 0.26 57.96 0.40 5.95 0.03 961 0.31 4.59 0.04
MSE weights 9.33 0.30** 57.09 0.40** 5.83 0.03 940 0.31** 4.51 0.18*
BMA 6.25 0.47** 45.66 0.41** 5.11 0.40** 757 0.31** 3.82 0.56**
MMA 6.48 0.35** 42.81 0.89** 4.46 0.78** 691 0.65** 3.87 0.56**
Fixed trimmed: EW 7.81 0.33** 46.17 0.41** 4.31 0.78** 749 0.39** 3.80 0.56**
Fixed trimmed: MSE 7.08 0.47** 45.30 0.53** 4.25 1.00** 728 0.65** 3.77 0.56**
Fixed trimmed: BMA 5.74 0.47** 41.34 0.95** 4.56 0.42** 667 0.65** 3.84 0.18*
Fixed trimmed: MMA 6.15 0.35** 40.99 0.95** 4.29 0.92** 619 1.00** 4.04 0.04
MCS trimmed: EW 4.63 0.47** 38.76 0.95** 5.32 0.03 678 0.65** 3.63 0.87**
MCS trimmed: MSE 4.50 1.00** 38.62 1.00** 5.22 0.04 671 0.65** 3.61 0.93**
MCS trimmed: BMA 6.49 0.47** 46.56 0.41** 5.43 0.04 640 0.65** 3.55 1.00**
MCS trimmed: MMA 5.95 0.47** 42.13 0.95** 4.84 0.15* 653 0.65** 3.62 0.93**
Note: This table gives MSPEs and p-values for each forecasting scheme under the null that each scheme has the same relative loss. Fixed trims the worst
90% of models based on their MSPEs, MCS trims models with p-values of less than 50%.
*
The model is in the set of best models at the 10% level.
**
The model is in the set of best models at the 25% level.

A very small number of models (usually fewer than 10) get most of the weight. The fact that a few models are usually assigned most or all of the weight is a well-known feature of BMA, but less so, to the best of our knowledge, for MMA. Thus, the lack of improvement in the BMA and MMA forecasts is explained by the fact that these pooling methods are essentially already trimming the worst performing models. On the other hand, our evidence also shows that once the pool of models has been trimmed with the MCS, applying weights other than equal weights to the remaining models has very little benefit, if any, for the resulting combined forecast.

4.6. Robustness

In this section, we analyze the sensitivity of our results to the choice of the cutoff. Because the implementation of the MCS and fixed trimming schemes depends on the cutoff choice, we conduct a careful analysis in order to determine how the results vary with different selections of the cutoff. The models are combined using equal weights. We analyze the year-ahead forecasts here, but provide the same figures for the one- and two-quarter-ahead results in a supplementary appendix (see Appendix A).
Fig. 3 provides evidence on the relative performances of the two trimming methods for various cutoff options by showing the ratio of the trimmed forecast combination's MSPE to that of the non-trimmed forecast combination for a wide range of choices of cutoffs. A ratio that is smaller than one means that the trimmed forecast combination has an MSPE smaller than the non-trimmed one. For fixed trimming, the x-axis represents the proportion of models that are excluded from the combination. We start by keeping all models in the combination, and hence the ratios start at one. We then trim all but the 2% best performing models. For MCS trimming, the x-axis shows the p-values based on which the set of best models is being identified. Low p-values result in fewer models being trimmed, whereas high p-values induce more models to be trimmed. We plot the relative accuracy of the MCS trimmed forecast using p-values that vary from 1% to 98%.

Fig. 1. Full sample selection rate of each model in the fixed and MCS sets of best models for one-year-ahead forecasts. Notes: This figure displays the full
sample selection rate of the models in the set of best models selected by the MCS with a p-value of 50% and by fixed trimming with a cutoff of 10%, sorted
from the lowest to highest selection rates.

Fig. 3 indicates that very aggressive trimming is required in order for fixed trimming to provide forecasts that improve on those of the simple average combination scheme. For the variables forecast in this paper, a fixed rule that trims around 90% of the models, and therefore combines fewer than 10% of the models, provides the most accurate forecast combination. With this level of trimming, one can achieve sizable MSPE reductions of around 25% over combining all models' predictions. As was discussed above, other papers have also found that aggressive trimming rules tend to be superior, as per Bjørnland et al. (2011) and Favero and Aiolfi (2005).
For MCS trimming, the highest gains from trimming are achieved with p-values of between 30% and 60%. Starting with a p-value of 1%, only the very strongly statistically inferior forecasts with p-values between zero and 1% are discarded.

Fig. 2. Average BMA and MMA weights for the one-year-ahead out-of-sample forecasts. Notes: This figure displays the average weight attached to each
model by BMA and MMA over the out-of-sample forecasting period, ranked from lowest to highest weights.

Hence, the differences between the MCS-trimmed and non-trimmed combinations are small, as is evidenced by the fact that most of the ratios start at approximately one. As we increase the level of significance required to include a forecast in the set of best forecasts, more forecasts are trimmed and the gains from MCS trimming increase. Above the 60% level, increasing the p-value cutoff leads to worse forecasts for all of the variables that we analyze. Importantly, MCS trimming exhibits large and robust gains in forecasting performance for a wide range of p-value cutoffs.
When comparing fixed and MCS trimming, it is clear that MCS trimming performs better for most of the cutoff space. As was discussed earlier, fixed trimming only provides significant forecast accuracy gains when we discard a very high share of the models. On the other hand, the MCS trimming results are relatively unchanged for a wide range of p-value cutoffs. By taking into account the significance of the statistical differences between the forecasts, one is able to select more carefully which models should be trimmed.

Fig. 3. MSPE ratio of fixed and MCS trimmed to untrimmed forecast combinations for one-year-ahead forecasts. Notes: This figure shows the MSPE ratio of trimmed to non-trimmed forecast combinations with equal weights (y-axis) for the full sample. A ratio smaller than one means that the trimmed forecast combination has a smaller MSPE than the combination with the full set of models. For fixed trimming, the x-axis represents the proportion of models being excluded from the combination. For MCS trimming, the x-axis shows the p-values with which the set of best models is being identified.
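The sweep behind Fig. 3 simply re-runs the equal-weight combination for a grid of cutoffs. A schematic of our own (assuming a precomputed (T, n) matrix of individual out-of-sample forecasts, the corresponding squared errors, and the realized values; the names are hypothetical):

```python
import numpy as np

def fixed_trim_mspe_ratios(forecasts, sq_errors, actuals, trim_shares, S=20):
    """MSPE of fixed-trimmed combinations relative to the untrimmed equal-weight combination."""
    untrimmed = ((forecasts[S:].mean(axis=1) - actuals[S:]) ** 2).mean()
    ratios = {}
    for share in trim_shares:
        combo = []
        for t in range(S, len(actuals)):
            mse = sq_errors[t - S:t].mean(axis=0)                # rolling training-sample MSPEs
            keep = mse <= np.percentile(mse, 100 * (1 - share))  # fixed trimming rule of Eq. (1)
            combo.append(forecasts[t, keep].mean())
        ratios[share] = ((np.asarray(combo) - actuals[S:]) ** 2).mean() / untrimmed
    return ratios

# e.g. ratios for trimming 0%, 25%, 50%, 75%, 90%, 95% and 98% of the models:
# fixed_trim_mspe_ratios(forecasts, sq_errors, actuals, [0.0, 0.25, 0.5, 0.75, 0.9, 0.95, 0.98])
```

The MCS curve in Fig. 3 is produced in the same way, except that the keep rule is replaced by membership of the model confidence set at each p-value cutoff.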

5. Conclusion

This paper has proposed the use of model confidence sets for forming conditional forecast combination strategies. In an environment in which the econometrician has access to a large number of models, we have compared the performance of this proposed method to the more common approach of ranking the models and choosing a fixed fraction to discard, without considering the statistical significance of the differences between the models.
We show that substantial gains in forecast accuracy can be achieved by discarding the worst performing models before combining the forecasts. We argue that the model confidence set approach offers a more robust procedure for selecting the forecasting models based on their past out-of-sample performances.

Acknowledgments

We would like to thank Natsuki Arai, Peter Christophersen, Jon Faust, Domenico Giannone, Cheng Hsiao, Maral Kichian, Eva Ortega, Gabriel Perez-Quiros, Tatevik Sekhposyan and Jonathan Wright for useful discussions and suggestions, as well as seminar participants at Johns Hopkins University, Bank of Spain, BlackRock, Bank of Canada, Instituto de Pesquisa e Ensino, DePaul University, 2012 CEF, 2012 LAMES, the 1st Vienna Workshop on High Dimensional Time Series, the 2013 Canadian Economic Association, the 2013 International Symposium on Forecasting, 2013 NASM, and the 2013 NBER-NSF Time Series Conference (poster). Jon Samuels thanks the SRC for the Robert M. Burger Fellowship, and Rodrigo Sekkel thanks Capes/Fulbright and the Campbell Fellowship, for financial support during our graduate studies. The views expressed in this paper are solely those of the authors and not necessarily those of the US Bureau of Economic Analysis, the US Department of Commerce or the Bank of Canada.

Appendix A. Supplementary data

Supplementary material related to this article can be found online at http://dx.doi.org/10.1016/j.ijforecast.2016.07.004.

References

Aiolfi, M., & Timmermann, A. (2006). Persistence in forecasting performance and conditional combination strategies. Journal of Econometrics, 135, 31–53.
Alvarez, R., Camacho, M., & Perez-Quiros, G. (2012). Finite sample performance of small versus large scale dynamic factor models.
Bates, J., & Granger, C. (1969). The combination of forecasts. OR, 20, 451–468.
Bernanke, B., & Boivin, J. (2003). Monetary policy in a data-rich environment. Journal of Monetary Economics, 50, 525–546.
Bjørnland, H., Gerdrup, K., Jore, A., Smith, C., & Thorsrud, L. (2011). Does forecast combination improve Norges Bank inflation forecasts? Oxford Bulletin of Economics and Statistics, 74, 163–179.
Capistrán, C., Timmermann, A., & Aiolfi, M. (2010). Forecast combinations. Working Papers.
Clemen, R. (1989). Combining forecasts: A review and annotated bibliography. International Journal of Forecasting, 5, 559–583.
Clemen, R., & Winkler, R. (1986). Combining economic forecasts. Journal of Business & Economic Statistics, 39–46.
Faust, J., Gilchrist, S., Wright, J. H., & Zakrajšek, E. (2013). Credit spreads as predictors of real-time economic activity: a Bayesian model-averaging approach. Review of Economics and Statistics, 95, 1501–1519.
Faust, J., & Wright, J. (2009). Comparing Greenbook and reduced form forecasts using a large realtime dataset. Journal of Business and Economic Statistics, 27, 468–479.
Favero, C., & Aiolfi, M. (2005). Model uncertainty, thick modelling and the predictability of stock returns. Journal of Forecasting, 24, 233–254.
Hansen, B. E. (2007). Least squares model averaging. Econometrica, 75, 1175–1189.
Hansen, B. E. (2008). Least-squares forecast averaging. Journal of Econometrics, 146, 342–350.
Hansen, P., Lunde, A., & Nason, J. (2011). The model confidence set. Econometrica, 79, 453–497.
Hendry, D., & Clements, M. (2004). Pooling of forecasts. The Econometrics Journal, 7, 1–31.
Inoue, A., & Kilian, L. (2008). How useful is bagging in forecasting economic time series? A case study of US consumer price inflation. Journal of the American Statistical Association, 103, 511–522.
Makridakis, S., & Winkler, R. (1983). Averages of forecasts: Some empirical results. Management Science, 987–996.
Mallows, C. L. (1973). Some comments on Cp. Technometrics, 15, 661–675.
Schwarzmüller, T. (2015). Model pooling and changes in the informational content of predictors: An empirical investigation for the euro area. Tech. rep., Kiel Working Paper.
Stock, J., & Watson, M. (2002). Macroeconomic forecasting using diffusion indexes. Journal of Business and Economic Statistics, 20, 147–162.
Stock, J., & Watson, M. (2004). Combination forecasts of output growth in a seven-country data set. Journal of Forecasting, 23, 405–430.
Stock, J., & Watson, M. (2007). Why has US inflation become harder to forecast? Journal of Money, Credit and Banking, 39, 3–33.
Timmermann, A. (2006). Forecast combinations. In Handbook of economic forecasting, Vol. 1 (pp. 135–196).
Tulip, P. (2009). Has the economy become more predictable? Changes in Greenbook forecast accuracy. Journal of Money, Credit and Banking, 41, 1217–1231.
Wright, J. (2008). Bayesian model averaging and exchange rate forecasts. Journal of Econometrics, 146, 329–341.
Wright, J. (2009). Forecasting US inflation by Bayesian model averaging. Journal of Forecasting, 28, 131–144.
Zellner, A. (1986). On assessing prior distributions and Bayesian regression analysis with g-prior distributions. In Bayesian inference and decision techniques: Essays in honor of Bruno de Finetti, Vol. 6 (pp. 233–243).

Jon D. Samuels is a research economist at the Bureau of Economic Analysis at the US Department of Commerce. He obtained his Ph.D. in economics from Johns Hopkins University.

Rodrigo M. Sekkel is a senior analyst at the Canadian Economic Analysis department of the Bank of Canada. He obtained his Ph.D. in economics from Johns Hopkins University.
