
Journal of Travel Research
http://jtr.sagepub.com/

Evaluating Time-Series Models to Forecast the Demand for Tourism in Singapore: Comparing Within-Sample and Postsample Results
Chi-Ok Oh and Bernard J. Morzuch
Journal of Travel Research 2005 43: 404
DOI: 10.1177/0047287505274653

The online version of this article can be found at:


http://jtr.sagepub.com/content/43/4/404

Published by:

http://www.sagepublications.com

On behalf of:

Travel and Tourism Research Association


>> Version of Record - Apr 5, 2005


Downloaded from jtr.sagepub.com at Universiti Teknologi MARA (UiTM) on October 4, 2012



Evaluating Time-Series Models to Forecast
the Demand for Tourism in Singapore:
Comparing Within-Sample and Postsample Results

CHI-OK OH AND BERNARD J. MORZUCH

The authors look at eight models to forecast inbound tourist arrivals to Singapore, six of which were analyzed by Chan and by Chu. The authors explore model performance from a different perspective than either of these authors and arrive at different conclusions. Major suggestions are as follows: (1) a complete comparison among competing models during the estimation phase and a battery of performance statistics when evaluating these models sheds light on several top-performing models; (2) when evaluating the forecasting performance of competing models, different performance statistics may lead to different model selections; (3) among competing models, a model that performs best during the within-sample period does not necessarily perform best in the postsample period; (4) changing the length of the forecast horizon can have an effect on the choice of the best model; and (5) a combined model may be the one that provides the best forecasting performance.

Keywords: time-series model; within-sample and postsample performance; forecast horizon changes; combined model

Chi-Ok Oh is a PhD candidate in the Department of Recreation, Park and Tourism Sciences at Texas A&M University in College Station. Bernard J. Morzuch, PhD, is a professor in the Department of Resource Economics at the University of Massachusetts in Amherst.

Journal of Travel Research, Vol. 43, May 2005, 404-413
DOI: 10.1177/0047287505274653
© 2005 Sage Publications

Tourism is an important sector in established economies and perhaps one of the most important in developing economies. It has the potential to promote economic growth and improve the international balance of payments. Tourism behaves in much the same manner as an economy's export sector. Effectively, outsiders travel to a region or nation to purchase tourism services. These payments represent foreign monetary inflows into the economy. It is in a country's interest to develop a tourism industry that generates new capital but creates minimal environmental disruption.

According to the World Travel and Tourism Council (WTTC; 2002a), travel and tourism are expected to contribute 3.6% to worldwide gross domestic product (GDP) and an estimated 198,098,000 jobs, or 7.8% of total employment on a yearly basis. A developing economy like Singapore does not have significant natural resources and depends heavily on international trade to promote its economic well-being. For countries like this, international tourism becomes a source of reducing balance of payments deficits and helps to diversify the structure of a country's economy and lessen the negative impact of regional economic imbalances (Uysal and Crompton 1985). The WTTC (2002b) reports that the tourism industry in Singapore is expected to contribute 6.4% of total exports (or $9.4 billion) and result in 198,010 jobs, or 9.6% of total employment.

Expenditures in a tourist-generating country offer a robust source of income; they generally have both immediate and delayed monetary impacts on the economy. The initial income generated by visitors works its way through the economy as a direct source of income and jobs for people involved in providing tourism services. Indirectly, this income then supports other sectors of the economy through induced consumption expenditures in the region or country. This process of primary and secondary income flows is the multiplier effect of tourism expenditures (Archer 1977; Eadington and Redman 1991; Frechtling and Horvath 1999; Milne 1987).

Because tourism can have a huge effect on an economy, it becomes useful to estimate its potential impact. Reasonably accurate forecasts of the number of tourists to be served, the time of year of their visits, and their service needs are essential for planning future infrastructure and superstructure, accommodations, transportation, attractions, promotion, and other important services (Uysal and Crompton 1985). Accurate and timely forecasts of tourist demand can assist a government with policy decisions and help the private sector with decisions relating to sizing, location selection, and operations (Calantone, Di Benedetto, and Bojanic 1987). On the other hand, the absence of accurate and timely tourism forecasts does not contribute to the information base that is necessary for intelligent decisions. For all of these reasons, the development of forecast models for the travel and tourism industry has received an increasing amount of attention during the past decade.

In light of the forecasting models presented in the tourism literature, a fair amount of attention has focused on model development and choice among competing models after they are applied to postsample data. In general, the choice among models is made on the basis of some well-established performance statistic. Regarding model development, the choice and comparison are usually between univariate and causal


models. Assessment has usually been based on applying a specific performance statistic like mean absolute deviation (MAD) or root mean square error (RMSE), for example, to a fitted model that makes use of a postsample data set. The model providing the most favorable outcome in terms of the performance statistic is deemed to be the best.

In many tourist demand studies, minimal attention has been devoted to comparing the within-sample performance of competing models as the basis for model selection and to comparing the postsample performance of a model across different forecast horizons. As examples, Chan (1993) proposed a sine-wave regression model to forecast tourism in Singapore. Chu (1998a) demonstrated that his autoregressive integrated moving average (ARIMA) (3,1,0)(0,1,0)12 model outperformed Chan's model. In each study, there were no within-sample performance comparisons among competing models. These authors moved directly to analyzing postsample performance and making comparisons among models using mean absolute percentage error (MAPE). Our study is intended to complement both of these studies by paying attention to the within-sample performance of the competing models. Doing so may provide more insight into the choice of a final model for use in forecasting.

Specifically, our objectives are as follows: (1) Define a within-sample period and a postsample period using available Singapore tourist data. Construct alternative models of the Singapore tourism industry to be used on the within-sample data set. Determine which model, if any, captures the data-generating process. (2) Calculate within-sample performance statistics for each model that we developed in the previous step. Use this result to provide guidance in choosing the correct model to be applied to the postsample data. (3) Apply each model to the postsample data and calculate the same performance measures for each model in the postsample that we calculated in the within-sample period. Will the model that performed best during the within-sample period perform best during the postsample period? This distinction is frequently disregarded in many forecasting studies. (4) Examine the forecasting performance of a particular model for different forecast horizons to determine whether a forecasting model shows consistent forecasting power. As indicated by Diebold (1998), the best forecasting model will often change with the forecast horizon. (5) Suggest a strategy for choosing a model for final use when there appear to be several individual good model choices.

The major theme that we address in this research is that all forecasting models are approximations to the underlying dynamic patterns in the series that we forecast. There is no reason that the model that provided the best fit during the estimation period should remain the best during the forecast period. Scant attention seems to be paid to this issue in a good number of tourism-forecasting studies. As subtle as this concern may be, it puts into perspective the fragility of a single forecasting model that the researcher thinks is best.

LITERATURE REVIEW

One use of forecasting models in tourism is predicting the number of arrivals to countries. According to Uysal and Crompton (1985), the univariate time series methodology is useful for relatively short-term forecasts. The historical values of a single time series form the basis for projecting future values of the series by extrapolating the movement of the series through time. Martin and Witt (1989) tested the forecasting performance for outbound tourism from five countries and showed that the naïve no-change model generally provides the most accurate 1-year-ahead tourism-arrivals forecasts based on MAPE and root mean square percentage error (RMSPE).

Generating monthly forecasts of visitor arrivals in Las Vegas, Nevada, Witt, Newbould, and Watkins (1992) showed that Winters's exponential smoothing and naïve II models outperform the naïve I model in terms of MAPE. Sheldon (1993) determined that the naïve no-change model and the double exponential smoothing model are superior to four other proposed models in terms of MAPE when predicting aggregate annual expenditures in the United States by tourists from six countries (i.e., from Canada, Japan, United Kingdom, West Germany, France, and Italy). Dharmaratne (1995) used data on tourist arrivals in Barbados, and Chu (1998b) gathered monthly international tourist arrivals data in Taiwan, Japan, Hong Kong, South Korea, Singapore, the Philippines, Indonesia, Thailand, New Zealand, and Australia to show that an ARIMA model outperforms other models in terms of MAPE. Witt and Witt (1995) summarized numerous published articles dealing with tourism forecasting. Basically, these studies used one or two performance measures and focused on evaluating the postsample periods.

Chan (1993) developed five models (i.e., the naïve I model, the naïve II model, a simple linear regression time-trend model, a sine-wave time-trend regression model, and an ARIMA model), and Chu (1998a) developed another ARIMA model to forecast visitor arrivals to Singapore. Chan (1993) concluded that a sine-wave time-series regression model provided the most accurate forecasts with the smallest MAPE compared to the other proposed models. However, Chu (1998a) pointed out that Chan ignored the seasonal component for his ARIMA model and that when seasonal components are properly specified, the ARIMA model outperformed the sine-wave time-series regression model.

Kulendran and Wilson (2000) made use of MAPE and RMSPE when evaluating Australian tourism demand forecast models for a single time horizon. Turner and Witt (2001) developed various tourism demand models for New Zealand and evaluated forecasting performance using MAPE for different time horizons. Cho (2003) used three different univariate approaches to predict Hong Kong tourism demand; evaluation criteria were RMSE and MAPE. Kulendran and Shan (2002) and Louvieris (2002) used ARIMA models to forecast China's and Greece's inbound travel demand, respectively, for a single horizon; each used both MAPE and RMSPE as evaluation criteria.

Kulendran and Wilson (2000) and Turner and Witt (2001) used multivariate procedures to construct forecasting models; they made postsample evaluations using the traditional accuracy measures. Kulendran and King (1997) made use of MAPE, RMSPE, and RMSE for four different time horizons when using error correction models. Kulendran and Witt (2001) compared the forecasting performance of models estimated by cointegration techniques and least squares regression using MAPE for four different time horizons. Greenidge (2001) used a structural time series approach to forecast tourist demand for Barbados. He used MAPE to evaluate his one-step-ahead forecasts. Finally, Kulendran and Witt (2003a, 2003b) used MAPE and RMSPE to

evaluate the performance of their ARIMA and error correction models over multiple time horizons.

In summary, the past 10 years have witnessed an explosion of modeling activity. Few studies, however, have thoroughly examined forecasting performance using a battery of performance measures. Furthermore, no study has compared the within-sample performance of various models against their performance in the postsample.

In this study, we apply our approach to the Singapore tourist demand data using the six models proposed by Chan (1993) and Chu (1998a), along with two of our own models. We compare and rank the within-sample performance of all eight models using six familiar performance statistics. We repeat the comparisons using postsample data. We change the forecast horizon and repeat the process. We suggest a final model based on all of these comparisons.

TIME SERIES MODELS

The models proposed by either Chan (1993) or by Chu (1998a) include naïve no-change (naïve I), naïve change (naïve II), simple linear time trend, Winters's triple exponential smoothing, ARIMA, and sine-wave time-series regression. We provide no explanation for the first five because they are fairly standard. Naïve I, naïve II, and linear time trend are discussed in Witt and Witt (1992) or Newbold and Bos (1994). Newbold and Bos likewise provide a fine discussion of Winters's triple exponential smoothing and ARIMA.

Sine-wave time-series regression is perhaps the most unique of the group. Basically, and according to Chan (1993), this model consists of a linear trend component and a component to capture the periodic trend. The second component is interpreted as the magnitude of the deviation from the linear trend model at time t.

The model is as follows:

Yt = a1 + a2t + a3 sin(a4 + a5t) + εt    (1)

where Yt is the actual value of the time series variable in time period t; εt is a random disturbance in time period t, assumed to abide by the usual classical assumptions; a1, a2, a3, a4, and a5 are parameters to be estimated; and sin is the sine function. More specifically, and as pointed out by Chan (1993), a1 = intercept of the linear model; a2 = slope of the linear model; a3 = amplitude of the sine function; a4 = phase of the angle of the sine function; and a5 = frequency of the sine function.

MODEL PERFORMANCE CRITERIA

We use six measures to evaluate our forecasts. The first is concerned with determining whether the forecast is unbiased. The remaining five quantify loss due to forecast error. The test for unbiasedness is implemented as follows. Generate a series of 1-step-ahead forecasts (Ŷt+1) using any of the models under consideration. Match each forecast (Ŷt+1) with its actual value (Yt+1). Regress the series of one-step-ahead actual values on the series of one-step-ahead forecasts, as indicated below:

Yt+1 = β1 + β2Ŷt+1 + εt+1    (2)

Unbiasedness suggests that β1 = 0 and β2 = 1 and that the forecast errors (εt+1) should have a mean of zero. Conduct the joint hypothesis test that β1 = 0 and β2 = 1. If the forecasts are unbiased, an F test or a Wald test will permit imposition of the restrictions. The estimated error series should behave as a white-noise process. This procedure is explained fully in Enders (2004, pp. 83-85) and applied in Witt, Song, and Louvieris (2003).

The five remaining performance criteria are loss measures and include mean absolute error (MAE), MAPE, RMSE, Akaike's Information Criterion (AIC), and Schwartz's Bayesian Criterion (SBC). MAE, MAPE, and RMSE are straightforward. AIC and SBC deserve a brief explanation. Essentially, they are used with relatively sophisticated models that are subject to having a number of parameters estimated (e.g., ARIMA models). Each imposes a penalty for estimating additional parameters.

The Akaike Information Criterion (AIC) is calculated as

AIC = n ln(residual sum of squares) + 2k    (3)

where n is the number of observations, ln is the natural logarithm, and k is the number of parameters being estimated. The Schwartz Bayesian Criterion (SBC) is calculated as

SBC = n ln(residual sum of squares) + k ln(n).    (4)

In a typical ARIMA model, k = the number of autoregressive parameters + the number of moving average parameters + a possible intercept term. When AIC is computed for two competing models, the model with the smaller AIC is selected. The same applies to SBC.

DATA

We used monthly observations from July 1977 to July 1990 on travelers' arrivals to Singapore. We obtained the data from the Singapore Tourist Promotion Board (various years). Our intention was to replicate the studies by Chan (1993) and Chu (1998a). We used the same data set as far as possible. Our estimation or within-sample period was July 1977 to December 1988, inclusive. Our out-of-sample or postsample period was January 1989 to July 1990, inclusive. These matched Chan (1993) and Chu (1998a). Figure 1 indicates an upward movement in the series with seasonal variation.

Observations for the period July 1977 to December 1988, inclusive, were used to construct each of our eight forecast models. Each model was then used to generate 1-month-ahead forecasts for the entire postsample period, which consisted of 19 months. This amounts to a set of 19 monthly forecasts. We repeated the process to generate 1-month-ahead forecasts for shorter forecast horizons (e.g., 3 months and 15 months).

EMPIRICAL RESULTS

Our starting point is Chan's (1993) sine-wave regression model and Chu's (1998a) ARIMA (3,1,0)(0,1,0)12 model. We were able to replicate their estimation results using the

FIGURE 1
TIME PLOT OF INBOUND TRAVELERS TO SINGAPORE
[Line chart of monthly arrivals, July 1977 to July 1990; vertical axis: number of arrivals, 0 to 500,000.]

TABLE 1
SINE-WAVE TIME-SERIES REGRESSION RESULTS

Parameter   Coefficient Estimate   Approximate Standard Error   Approximate t-Value   Approximate p Value
a1          75,744.6               30,418.4                     2.49                  .014
a2          2,425.7                448.3                        5.41                  .00005
a3          64,338.3               20,897.6                     3.08                  .004
a4          0.8467                 0.37                         2.29                  .02
a5          0.0345                 0.006                        5.75                  .00003

Note: See Chan (1993), Table 1, page 59, for matching estimation results.

data set described in the previous section. Table 1 presents our estimation results for Chan's model. There are slight discrepancies between this author's original results and our reestimations. As indicated previously, we obtained the data set from the Singapore Tourism Bureau. The bureau pointed out to us that slight adjustments were made to the data set originally used by Chan and by Chu.

In Table 2, we report our estimation results for Chu's (1998a) ARIMA (3,1,0)(0,1,0)12 model. We also report estimation results for another ARIMA model that we thought might provide better results than Chu's. This was an ARIMA (0,1,1)(1,1,0)12 model. We chose to estimate this model because our visual interpretation of the autocorrelation function (ACF) for this series led us to a different conclusion than Chu. Also, it was slightly more parsimonious than the ARIMA model estimated by Chu; that is, it contained one less parameter. We adopted conventional forecasting wisdom and hypothesized that a more parsimonious ARIMA model might result in better postsample forecast performance. Within the family of ARIMA models, we expected simpler to be better.

Regarding our ARIMA model, we see that it fits the data well. Like Chu's (1998a) model, parameter estimates are statistically significant. We performed Dickey-Fuller tests to test for stationarity. For each coefficient, we were able to reject the null hypothesis of a unit root. For each ARIMA model presented in Table 2, we report the AIC and SBC. Values for these statistics are slightly smaller for our model than for Chu's model, suggesting that our model might be slightly better than Chu's.

For both ARIMA models, we also report the Ljung-Box Q statistic for four lag structures of the residuals. This statistic is used to test whether the residuals have a mean of zero, constant variance, and are serially uncorrelated. We test for serial correlation at lag lengths 1 to 6, 1 to 12, 1 to 24, and 1 to 32. The null hypothesis in this setting is that the residuals are not autocorrelated. The p value for each lag structure is very high, suggesting that we do not have evidence to reject the null hypothesis that the model's residuals are white noise.

Finally, we estimated a Winters's three-parameter exponential smoothing model. Winters's method employs a smoothing process three times to estimate the level, trend, and seasonal components in a series. We used an optimization routine in SAS to obtain estimates. These smoothing weights are determined so as to minimize the sum-of-squared one-step-ahead prediction errors. (See Frechtling [2001] for a straightforward text explanation of deriving the smoothing constants.) The estimated weights for level, trend, and seasonal components were .3365, .1204, and .2626,

TABLE 2
ARIMA MODEL ESTIMATION RESULTS

ARIMA (3,1,0)(0,1,0)12 (a)
Variable    Coefficient Estimate   Standard Error   p Value
AR lag 1    0.494                  .089             .00005
AR lag 2    0.231                  .097             .019
AR lag 3    0.197                  .089             .029
AIC 2,288.4   SBC 2,296.8
Ljung-Box Q (p values): Lag 6 .9902; Lag 12 .1212; Lag 24 .1181; Lag 32 .0551

ARIMA (0,1,1)(1,1,0)12
Variable             Coefficient Estimate   Standard Error   p Value
MA lag 1             0.499                  .079             .00001
AR seasonal lag 12   0.276                  .09              .003
AIC 2,279.0   SBC 2,284.7
Ljung-Box Q (p values): Lag 6 .9401; Lag 12 .3805; Lag 24 .2623; Lag 32 .2919

Note: ARIMA = autoregressive integrated moving average.
a. See Chu (1998a), Table 1, page 82, for matching estimation results.
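The AIC and SBC values in Table 2 follow directly from equations (3) and (4). As a quick illustration of how the penalty terms work, the sketch below uses made-up residual sums of squares and the within-sample length of 138 monthly observations, not the actual Table 2 estimates:

```python
import math

def aic(n, rss, k):
    # Equation (3): AIC = n * ln(residual sum of squares) + 2k
    return n * math.log(rss) + 2 * k

def sbc(n, rss, k):
    # Equation (4): SBC = n * ln(residual sum of squares) + k * ln(n)
    return n * math.log(rss) + k * math.log(n)

# Hypothetical comparison: model A fits slightly better (smaller RSS)
# but carries one more estimated parameter than model B.
n = 138                   # monthly observations, July 1977 to December 1988
rss_a, k_a = 9.0e9, 3     # e.g., three AR parameters
rss_b, k_b = 9.2e9, 2     # one less parameter

aic_a, aic_b = aic(n, rss_a, k_a), aic(n, rss_b, k_b)
better = "A" if aic_a < aic_b else "B"   # smaller AIC is preferred
```

Because ln(n) exceeds 2 for any series longer than about eight observations, SBC penalizes extra parameters more heavily than AIC, which is consistent with the authors' preference for the more parsimonious ARIMA (0,1,1)(1,1,0)12.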

TABLE 3
WITHIN-SAMPLE PERFORMANCE, INDIVIDUAL MODELS

Model                     Unbiasedness   MAPE        MAE            RMSE           AIC           SBC
Naïve I                   4.59           7.24 (6)    17,393.7 (5)   21,820.8 (5)
Naïve II                  7.17*          11.73 (8)   28,363.5 (8)   35,214.0 (8)
Linear regression         45.17*         7.12 (5)    17,651.4 (6)   22,926.7 (6)
Winters's model           0.09           2.46 (1)    5,979.9 (1)    7,707.3 (1)
ARIMA (2,1,2)             3.15           10.47 (7)   25,885.3 (7)   32,920.6 (7)   2,858.1       2,869.8
ARIMA (3,1,0)(0,1,0)12    0.15           2.95 (3)    7,224.0 (3)    9,222.9 (3)    2,288.4 (2)   2,296.8 (2)
ARIMA (0,1,1)(1,1,0)12    0.86           2.84 (2)    6,967.9 (2)    8,957.0 (2)    2,279.0 (1)   2,284.7 (1)
Sine-wave regression      0.01           5.56 (4)    13,348.5 (4)   16,956.1 (4)

Note: ARIMA = autoregressive integrated moving average; MAPE = mean absolute percentage error; MAE = mean absolute error; RMSE = root mean square error; AIC = Akaike's Information Criterion; SBC = Schwartz's Bayesian Criterion. Boldface text indicates the best performance using the criteria. A number in parentheses next to each performance statistic represents a model's rank for that particular statistic.
*p < .05.
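The MAPE, MAE, and RMSE columns in Table 3 are standard loss measures computed from the one-step-ahead forecast errors. A minimal sketch of the three calculations (the toy numbers are placeholders, not the Singapore arrivals data):

```python
import math

def loss_measures(actual, forecast):
    """Return (MAPE in percent, MAE, RMSE) for paired actuals and forecasts."""
    errors = [a - f for a, f in zip(actual, forecast)]
    n = len(errors)
    mae = sum(abs(e) for e in errors) / n                               # absolute loss
    mape = 100.0 * sum(abs(e / a) for e, a in zip(errors, actual)) / n  # scale-free
    rmse = math.sqrt(sum(e * e for e in errors) / n)                    # quadratic loss
    return mape, mae, rmse

# Toy example: two 1-month-ahead forecasts against their actual values.
mape, mae, rmse = loss_measures([100.0, 200.0], [110.0, 190.0])
# mape = 7.5, mae = 10.0, rmse = 10.0
```

Because MAPE scales each error by its actual value while RMSE weights large errors quadratically, the measures can rank two models differently, which is why the text reports several of them rather than relying on one.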

respectively. Their p values were .0001, .003, and .0002, respectively. These estimates are not presented in any table.

Within-Sample Performance

At this juncture, we pretend that the postsample data set does not exist. If this truly were the case, the appropriate question is how to make use of what we have so far to determine which of our models best captures the data-generating process. The answer is to calculate the battery of performance statistics for each of the eight models estimated during the within-sample period; that is, from July 1977 through December 1988, inclusive. Table 3 provides a summary of our performance statistics.

Regarding the choice of models, Chan (1993) analyzed the naïve I, naïve II, linear regression, ARIMA (2,1,2), and sine-wave regression models. Chu (1998a) analyzed the ARIMA (3,1,0)(0,1,0)12 model. The performance statistics reported in Table 3 are based upon our reestimations of these models. In Table 3, we also present these statistics for our more parsimonious ARIMA (0,1,1)(1,1,0)12 model and for our Winters's model.

Relative to the majority of studies previously cited, the novelty of Table 3 is the inclusion of six performance statistics: the unbiasedness statistic plus three to five measures that quantify loss due to forecast error for each estimated model. The test for unbiasedness, MAPE, MAE, and RMSE are computed for all eight models. AIC and SBC are computed for the three ARIMA models. Neither Chan (1993) nor Chu (1998a) conducted this type of analysis on their within-sample models. They emphasized MAPE only, and the MAPE that they did report applied only to results in the postsample. Also, Chu did report AIC and SBC in a table as part of his estimation results.

There are important reasons for reporting multiple performance statistics. Evaluating forecast performance begins with the ability to measure loss. Ideally, if the researcher is so familiar with the phenomenon being analyzed that he can specify a function that captures all implicit and explicit costs resulting from each period's forecast error (commonly referred to as a cost of error function), measuring loss becomes precise. When the evaluator is a passive observer of the phenomenon, the best that can be expected when

measuring loss is to report and compare performance statis- models, we see that ours ranks above Chus (1998a), and
tics that at least capture the different aspects of the cost func- Chus ranks above Chans (1993).
tion itself. Any performance statistic is not all-encompass- In summary, to this point, and on the basis of these
ing. For example, MSE and its transform RMSE capture within-sample results, we have evidence that Winterss
quadratic loss; MAE captures absolute loss. Since a given three-parameter model provides unbiased forecasts and per-
performance statistic captures only one dimension of cost, a forms the best not only in terms of the smallest MAPE but
suggested approach is to report several as a way of judging also in terms of the remaining performance statistics. Next in
consistency of the model over different loss metrics. line is our ARIMA model, followed by Chus (1998a)
Witt and Witt (1992), for example, echo the sentiment ARIMA model. Chan (1993) and Chu do not make use of
that knowledge of the tourism industrys cost function is nec-
this phase of the forecasting sequence. Rather, they apply all
essary when attempting to determine loss due to different
forecasting methods. Lack of information about this func- models to the postsample data set directly and then select the
tion, particularly as it relates to the diverse sectors of the tour- model that yields the smallest MAPE.
ism industry, suggests the need to make comparisons among
alternative performance measures (rather than relying on a Postsample Performance
single measure) as the fallback position for not knowing the
cost function itself. The previous section provided clear evidence of model
Column 2 of Table 3 presents the results of the forecast performance during the within-sample period. This, in turn,
unbiasedness tests for each model listed in Column 1. The leads to reasonable expectations as to how these models
test statistic used is Walds. It is distributed as 2 with J might perform during the postsample period. That is, barring
degrees of freedom, where J is the number of restrictions a structural change during the postsample period, it seems
placed on the parameters in equation 2. In this particular reasonable to expect that these within-sample rankings will
case, J = 2 because we are imposing the simultaneous restric- carry over to the postsample period.
tions that 1 = 0 and 2 = 1. We are forced to reject the null To see what happens, we apply each model to the
hypothesis of forecast unbiasedness for the nave II and trend postsample data set. More specifically, we consider the entire
models at the .05 level of significance. We cannot reject the holdout sample of 19 observations as the period for which we
null hypothesis of unbiasedness for the remaining models. must provide forecasts. So our forecast horizon is 19 months.
Thus, if unbiasedness is an important component of the deci- We calculate a series of one-step-ahead forecasts for the
sion makers loss function, the nave II and linear regression holdout sample. We compare each forecast to its matching
models would not be acceptable as candidates because they actual value for that month. We then calculate the same per-
fail the unbiasedness test. formance statistics for the postsample period that we devel-
Consider now the remaining performance measures.
oped for the within-sample period. This requires that we con-
While it may seem reasonable to expect models to be ranked
…in the same order when using these different performance statistics, this is not necessarily the case. For example, column 3 in Table 3 shows that the MAPE calculations for the naïve I and linear regression models are 7.24 and 7.12, respectively. On the basis of MAPE alone, the linear regression model performs marginally better than the naïve I model because its MAPE is smaller. When using MAE as the performance measure, we get 17,393.7 and 17,651.4 for the same two models, respectively. Here, the naïve I model outperforms the linear regression model. A cursory review of Table 3 reveals that switching in this fashion is not a big problem. We have cited the only instance where this has occurred. In other settings, the difference in rankings can be quite large.

With all of this as background, we now use the results of Table 3 to assist with the selection of the best within-sample model. Not only do test results indicate that Winters's model provides unbiased forecasts, but MAPE, MAE, and RMSE are smallest for Winters's multiplicative model relative to this model's seven remaining competitors. This indicates that Winters's model outperforms the other models during the within-sample estimation period. These values are 2.46 for MAPE, 5,979.7 for MAE, and 7,707.3 for RMSE. They are highlighted in Table 3. Adjacent to each highlighted measure is the rank of 1 for this model when using this particular performance measure. Using these same three performance measures, we see that our ARIMA model ranks second, Chu's ARIMA model ranks third, and Chan's sine-wave regression model ranks fourth. Finally, when using AIC and SBC as performance measures among the three ARIMA models, … construct a new table in the format of Table 3, but this time use the postsample values. In turn, we get immediate feedback regarding our rankings during the postsample period. The postsample performance statistics and model rankings by performance statistic are presented in Table 4.

Comparison Between Within-Sample and Postsample Rankings

Regarding unbiasedness, we see that the linear regression model provides biased forecasts during the postsample phase; test results indicate that naïve II does not provide biased forecasts as it did during the within-sample period. For the other performance statistics, model rankings during the within-sample phase differ from the rankings during the postsample phase; there is simply no consistency in model rankings between within-sample and postsample periods.

Regarding the top-performing models, notice that Winters's model consistently ranked first among the competitors during the model construction phase (Table 3) but ranked third during the postsample period when using MAPE, MAE, and RMSE (see Table 4). Chu's (1998a) model consistently ranked third during the model construction phase and inconsistently ranked first during the postsample phase. With MAPE and MAE as performance statistics, Chu's ARIMA model ranked first, and our ARIMA model ranked second. When using RMSE, AIC, and SBC, our ARIMA ranked first, and Chu's ARIMA ranked second.
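The rank switching described above is easy to reproduce. The sketch below (Python, with hypothetical toy numbers rather than the Singapore arrivals series) implements MAPE, MAE, and RMSE and shows that the preferred model can change with the statistic: a model whose large errors fall on large months looks better on MAPE but worse on MAE and RMSE.

```python
import math

def mape(actual, forecast):
    # Mean absolute percentage error, in percent.
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)

def mae(actual, forecast):
    # Mean absolute error, in the units of the series.
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)

def rmse(actual, forecast):
    # Root mean square error; penalizes large errors more heavily than MAE.
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))

# Toy monthly arrivals (hypothetical, not the Singapore data).
actual = [100.0, 200.0, 400.0]
model_a = [90.0, 190.0, 360.0]   # large absolute error on the big month
model_b = [80.0, 185.0, 390.0]   # large absolute error on the small month

for name, stat in [("MAPE", mape), ("MAE", mae), ("RMSE", rmse)]:
    ra, rb = stat(actual, model_a), stat(actual, model_b)
    print(name, "prefers", "A" if ra < rb else "B")
```

Here MAPE prefers model A while MAE and RMSE prefer model B, mirroring the naïve I versus linear regression switch noted in the text.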

TABLE 4
POSTSAMPLE FORECASTING PERFORMANCE, INDIVIDUAL MODELS: 19-MONTH HORIZON

Model                     Unbiasedness   MAPE        MAE            RMSE           AIC        SBC
Naïve I                   3.68           5.53 (5)    23,165.6 (5)   30,046.6 (5)
Naïve II                  3.97           5.54 (6)    23,297.4 (6)   30,082.7 (6)
Linear regression         189.43*        17.65 (8)   74,761.9 (8)   79,759.5 (8)
Winters's model           2.80           2.70 (3)    11,130.4 (3)   13,062.7 (3)
ARIMA (2,1,2)             1.48           8.61 (7)    36,719.4 (7)   46,038.7 (7)   416.0      419.7
ARIMA (3,1,0)(0,1,0)12    1.60           1.79 (1)    7,429.7 (1)    9,168.9 (2)    352.7 (2)  355.5 (2)
ARIMA (0,1,1)(1,1,0)12    0.63           1.82 (2)    7,504.5 (2)    8,997.4 (1)    349.9 (1)  351.7 (1)
Sine-wave regression      0.81           4.78 (4)    19,573.6 (4)   23,632.4 (4)

Note: ARIMA = autoregressive integrated moving average; MAPE = mean absolute percentage error; MAE = mean absolute error; RMSE = root mean square error; AIC = Akaike's Information Criterion; SBC = Schwarz's Bayesian Criterion. Boldface text indicates the best performance using the criteria. A number in parentheses next to each performance statistic represents a model's rank for that particular statistic.
*p < .05.
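The unbiasedness column in Table 4 reports a test statistic, with the asterisk flagging rejection at the 5% level. The article's exact procedure is not shown in this excerpt; one standard check is a t-test that the mean forecast error is zero, sketched below in Python with hypothetical errors (the 19-month length matches the postsample horizon, but the numbers are invented).

```python
import math

def bias_t_stat(actual, forecast):
    # t statistic for H0: the mean forecast error is zero.
    # A standard unbiasedness check; the article's exact test may differ.
    errors = [a - f for a, f in zip(actual, forecast)]
    n = len(errors)
    mean = sum(errors) / n
    var = sum((e - mean) ** 2 for e in errors) / (n - 1)  # sample variance
    return mean / math.sqrt(var / n)

# Hypothetical 19-month postsample series (not the Singapore arrivals data).
actual = [110, 120, 125, 118, 130, 128, 122, 135, 140, 138,
          132, 145, 150, 148, 142, 155, 160, 158, 152]

# A forecast that runs systematically about 5 units low:
biased = [a - 5 + (-1) ** i for i, a in enumerate(actual)]
# A forecast whose errors merely alternate around zero:
unbiased = [a + (-1) ** i for i, a in enumerate(actual)]

print(round(bias_t_stat(actual, biased), 2))    # large -> reject H0 (biased)
print(round(bias_t_stat(actual, unbiased), 2))  # near zero -> fail to reject
```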

TABLE 5
POSTSAMPLE FORECASTING PERFORMANCE, INDIVIDUAL MODELS: 3-MONTH HORIZON

Model                     MAPE        MAE            RMSE           AIC       SBC
Naïve I                   7.86 (6)    30,287.0 (6)   34,005.1 (6)
Naïve II                  11.68 (7)   45,612.6 (7)   52,288.9 (7)
Linear regression         13.03 (8)   50,458.4 (8)   54,400.3 (8)
Winters's model           2.25 (2)    8,902.8 (2)    11,308.5 (3)
ARIMA (2,1,2)             6.74 (5)    25,863.8 (5)   28,481.5 (5)   69.5      65.9
ARIMA (3,1,0)(0,1,0)12    1.96 (1)    7,414.3 (1)    8,386.2 (1)    60.2 (2)  57.5 (1)
ARIMA (0,1,1)(1,1,0)12    2.34 (3)    8,939.5 (3)    10,521.6 (2)   59.5 (1)  57.7 (2)
Sine-wave regression      4.73 (4)    17,887.7 (4)   19,422.2 (4)

Note: ARIMA = autoregressive integrated moving average; MAPE = mean absolute percentage error; MAE = mean absolute error; RMSE = root mean square error; AIC = Akaike's Information Criterion; SBC = Schwarz's Bayesian Criterion. Boldface text indicates the best performance using the criteria. A number in parentheses next to each performance statistic represents a model's rank for that particular statistic.
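Re-ranking the same models over a shorter horizon, as Table 5 does relative to Table 4, amounts to truncating the postsample evaluation window. The Python sketch below uses hypothetical forecasts in which model X is accurate early and model Y late (echoing the behavior of the two ARIMA models), so the ranking flips as the horizon grows.

```python
def mae(actual, forecast):
    # Mean absolute error over the paired observations.
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)

def rank_by_mae(actual, forecasts, horizon):
    # Rank models by MAE over only the first `horizon` postsample months.
    scores = {name: mae(actual[:horizon], f[:horizon]) for name, f in forecasts.items()}
    return sorted(scores, key=scores.get)

# Hypothetical 19-month postsample period (invented numbers).
actual = list(range(100, 119))
fx = [a + (1 if i < 3 else 6) for i, a in enumerate(actual)]  # accurate early
fy = [a + (4 if i < 3 else 2) for i, a in enumerate(actual)]  # accurate late
forecasts = {"X": fx, "Y": fy}

print(rank_by_mae(actual, forecasts, 3))    # short horizon favors X
print(rank_by_mae(actual, forecasts, 19))   # full horizon favors Y
```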

Model Ranking When Forecast Horizon Changes

We changed the forecast horizon to 15 months to determine the effect that a small decrease in the forecast horizon would have on model performance. The rankings remained the same as with the 19-month horizon. (We do not present a separate table showing the new performance statistics corresponding to a 15-month horizon.) We repeated the process for a 3-month horizon and present the results in Table 5. Here, Chu's (1998a) ARIMA model ranked first for four of the five performance statistics. Winters's model ranked second when using MAPE and MAE and third when using RMSE. Our ARIMA model ranked third when using MAPE and MAE, second when using RMSE and SBC, and first when using AIC.

In summary, Chu's (1998a) ARIMA model is particularly strong for the low forecast horizon. For the higher forecast horizons, it remains strong but starts to give way to our ARIMA model at 15 and 19 months. Winters's model enjoyed a rank of two for the low forecast horizon but dropped to third for the higher forecast horizons.

DISCUSSION

We have emphasized the judicious comparison of within-sample performance statistics with their postsample counterparts as part of the model selection process. Whether the researcher uses this approach will have no effect on the choice of the single best postsample model. But doing so puts into perspective the limitations of any forecasting model and at a minimum suggests a subset of best-performing models.

We introduced the ARIMA (0,1,1)(1,1,0)12 model to this study as an alternative to Chu's (1998a) ARIMA (3,1,0)(0,1,0)12 specification. Returning to Table 3, we see that Winters's model was the clear winner with respect to within-sample fit over all of its competitors and using all of the performance measures. When comparing only the two ARIMA models during the estimation period, our ARIMA (0,1,1)(1,1,0)12 model performed better than Chu's ARIMA (3,1,0)(0,1,0)12 model.

As presented in Table 4, when applied to the postsample data, the ARIMA (3,1,0)(0,1,0)12 model performed best among all eight specifications for the 19-month forecast horizon when using MAPE and MAE. However, the ARIMA

(0,1,1)(1,1,0)12 model provided the best results when RMSE, AIC, and SBC were used to judge forecast performance. Winters's model dropped out of the picture as the best performer.

Generalizing these results, we see that the best-fitting model during the estimation period is not necessarily the model that provides the best postsample forecasts. Likewise, ranking model performance can be sensitive to the performance statistic that is used.

Table 5 shows that when applied to the 3-month forecast horizon rather than to the 19-month forecast horizon, these two ARIMA models exchanged ranks in terms of RMSE and SBC. This time, Chu's (1998a) ARIMA was on top and ours was second. This suggests that the model that provides the best postsample forecasts for a given forecast horizon will not necessarily continue to provide the best forecasts as the forecast horizon changes.

Results like these are typical when it comes to modeling any time series. Evidence is usually neither conclusive nor consistent regarding one clear winner. Different performance measures and estimation periods will lead to varied results. So where does all of this leave the researcher who ultimately seeks the best forecast? Is there one best forecasting model? What kind of information can be gleaned from those models that provide respectable performance?

COMBINING MODELS

One strategy is to formulate a forecast that is some combination of forecasts obtained from several of the competing models. The result is commonly referred to as a combined forecast. There are a number of rule-based methods for weighting the components of the combined forecast (Armstrong 2001; Clemen 1989). Perhaps the most straightforward combined forecast is the simple average of forecasts obtained from a subset of the best-performing models. Simple averaging is attractive because it is simple, it works, and it receives support in the literature (Armstrong 2001; Clemen 1989; Makridakis and Winkler 1983).

One example of a combined forecast would start with the individual forecasts from Winters's model, the ARIMA (3,1,0)(0,1,0)12 model, and the sine-wave regression model. The combined forecast for this selection of models for a particular month would be the average of the forecasts obtained from these three individual models for that month.

To illustrate, we concocted six different model combinations to give six sets of combined forecasts. Table 6 presents a list of the six combinations. Adjacent to each combination are the MAPE, MAE, and RMSE corresponding to its forecasts. The first combination listed in Table 6 is the one described directly above. Five other formulations are presented. Each formulation is a different combination of two to four of the top-performing individual models that we obtained from either the within-sample or postsample analysis.

One attraction of a combined forecast is that it will never perform more poorly than the poorest-performing component forecast in the combination. For example, consider again the component models for the first combination in Table 6. These are Winters's model, the ARIMA (3,1,0)(0,1,0)12 model, and the sine-wave regression model. From Table 4, the postsample MAPEs for these individual models are 2.70, 1.79, and 4.78, respectively. From Table 6, the postsample MAPE for the combination having these three components is 2.52. Notice that MAPE for the combination (2.52) is not worse than MAPE for the poorest-performing component in the combination, which is sine-wave regression with a MAPE of 4.78.

In addition, a combination may even perform better than the single best-performing model (Makridakis and Winkler 1983). For example, the combined forecast obtained from the ARIMA (0,1,1)(1,1,0)12 and ARIMA (3,1,0)(0,1,0)12 models performed marginally better than either of these models alone when using MAE and RMSE. As shown in Table 6, for this combination, these measures are 7,408.4 and 8,988.2, respectively. As shown in Table 4, for the individual models, these measures are 7,504.5 and 8,997.4, respectively, for ARIMA (0,1,1)(1,1,0)12 and 7,429.7 and 9,168.9, respectively, for ARIMA (3,1,0)(0,1,0)12.

There is a rational explanation for why combining works. From Table 5, we see that there is a tendency for the ARIMA (3,1,0)(0,1,0)12 model to work better across shorter forecast horizons. This suggests that the forecast errors are larger when using ARIMA (0,1,1)(1,1,0)12 for the short term. As the forecast horizon lengthens, Table 4 shows that ARIMA (0,1,1)(1,1,0)12 begins to take over in terms of better performance. This likewise suggests that the forecast errors are larger when using ARIMA (3,1,0)(0,1,0)12 for the longer term. When forming a combined forecast, averaging cancels out large forecast errors for any particular month and therefore reduces RMSE. A combined forecast spreads the risks of poor performance associated with any one particular model.

CONCLUSIONS

The performance of several proposed forecasting models has been assessed in the context of the flow of travelers into Singapore. Most existing studies yield the conclusion that one forecasting model outperforms other models in terms of a single performance statistic. Chan (1993), for example, concluded that the sine-wave regression model provided the best performance among naïve I, naïve II, linear regression, and ARIMA (2,1,2) models when using MAPE. Chu (1998a) pointed out that Chan neglected the seasonal component in the data and that the ARIMA (3,1,0)(0,1,0)12 model performed better than the sine-wave regression model, likewise when using MAPE.

However, in this study, an ARIMA (0,1,1)(1,1,0)12 model takes turns with Chu's (1998a) ARIMA (3,1,0)(0,1,0)12 model when it comes to providing the best forecasting performance during the postsample period. We have seen that rank exchange between these two models is sensitive to the performance statistic used and to the length of the forecast horizon. If one were to rely on only one or two performance measures to assist with model selection, our results suggest that quite a different conclusion could be reached if different performance criteria were used.

Also, the model that performs best during the estimation phase may not necessarily provide the best forecasts, even though it ranks highest when using all performance measures. This is reasonable because of the strong possibility of a structural change during the postsample period. The within-sample model would simply not pick this up. We have

TABLE 6
POSTSAMPLE FORECASTING PERFORMANCE, COMBINED MODELS: 19-MONTH HORIZON

Combined Model                                                            MAPE       MAE           RMSE
Winters's model + ARIMA (3,1,0)(0,1,0)12 + Sine wave                      2.527 (6)  10,225.7 (6)  12,288.4 (6)
ARIMA (0,1,1)(1,1,0)12 + ARIMA (3,1,0)(0,1,0)12 + Winters's               1.970 (2)  8,130.9 (2)   9,676.5 (2)
ARIMA (0,1,1)(1,1,0)12 + Winters's                                        2.185 (4)  9,000.2 (4)   10,428.0 (4)
ARIMA (3,1,0)(0,1,0)12 + Winters's                                        2.092 (3)  8,646.7 (3)   10,249.7 (3)
ARIMA (0,1,1)(1,1,0)12 + ARIMA (3,1,0)(0,1,0)12                           1.796 (1)  7,408.4 (1)   8,988.2 (1)
ARIMA (0,1,1)(1,1,0)12 + ARIMA (3,1,0)(0,1,0)12 + Sine wave + Winters's   2.306 (5)  9,346.3 (5)   11,036.2 (5)

Note: ARIMA = autoregressive integrated moving average; MAPE = mean absolute percentage error; MAE = mean absolute error; RMSE = root mean square error. Boldface text indicates the best performance using the criteria. A number in parentheses next to each performance statistic represents a model's rank for that particular statistic.
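The equal-weight averaging that produces the combinations in Table 6 can be sketched in a few lines of Python. The numbers below are hypothetical stand-ins, chosen so that one model overshoots while the other undershoots; averaging then cancels much of each month's error, and the combination beats both components, as the text describes for the two-ARIMA combination.

```python
def combine(*forecast_series):
    # Equal-weight combined forecast: month-by-month simple average.
    return [sum(vals) / len(vals) for vals in zip(*forecast_series)]

def mae(actual, forecast):
    # Mean absolute error over the paired observations.
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)

# Hypothetical four-month postsample values (not the arrivals data).
actual  = [100.0, 110.0, 120.0, 130.0]
model_a = [104.0, 113.0, 125.0, 133.0]   # runs high
model_b = [97.0, 108.0, 117.0, 128.0]    # runs low
combined = combine(model_a, model_b)

print(mae(actual, model_a), mae(actual, model_b), mae(actual, combined))
```

Here the combined MAE is below that of either component, illustrating why the simple average is an attractive hedge against betting on a single model.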

seen that Winters's multiplicative model provided the best fit during the estimation phase but slid in rank during any of the postsample time horizons.

Obviously, the goal of the practitioner is to select the model that provides the best forecasting results across a variety of forecast horizons. The issues presented above indicate the difficulties involved in achieving this goal. Indeed, a suggested and realistic alternative approach that minimizes the risk involved with betting on just one model is to derive forecasts based on a simple averaging of forecasts obtained from a subset of models that perform well. Because the two ARIMA models provide consistent and reliable forecasting performance across different forecast horizons, combining forecasts from these two is a viable option.

The zealous researcher may pretend that the postsample observations are truly unavailable and decide upon a combined model based upon the best-performing within-sample models. For our particular example, the result would be a combined model consisting of ARIMA (0,1,1)(1,1,0)12 + ARIMA (3,1,0)(0,1,0)12 + Winters's. This combined model provides excellent performance statistics and is developed on the premise that the postsample data cannot be seen. This is the same setting faced when doing ex ante forecasts.

Our intent was to emphasize a comprehensive methodology for evaluating a forecasting model. We chose to evaluate models that were predominantly of the univariate variety to make comparisons with Chan's (1993) and Chu's (1998a) univariate approaches for modeling inbound tourist arrivals to Singapore.

This approach is similarly transferable to structural econometric models that might be used for forecasting. One very important limitation of univariate approaches is their ad hoc nature relative to structural approaches that, by their very nature, emphasize causal relationships among variables (Song and Witt 2000; Song and Wong 2003; Webber 2001). In this context, univariate methods shed little light on issues relating to policy. However, when it comes to forecasting, structural econometric models often provide worse forecasts than univariate methods when any of the causal or predictor variables themselves need to be forecasted. Univariate methods frequently lead to better forecasts than do structural models because there are fewer sources of forecast error.

REFERENCES

Archer, B. H. (1977). Tourism Multipliers: The State of the Art. Bangor Occasional Papers in Economics, No. 11. Bangor: University of Wales Press.
Armstrong, J. Scott (2001). "Combining Forecasts." In Principles of Forecasting: A Handbook for Researchers and Practitioners, edited by J. Scott Armstrong. Boston: Kluwer, pp. 417-39.
Calantone, R. J., C. A. Di Benedetto, and D. Bojanic (1987). "A Comprehensive Review of the Tourism Forecasting Literature." Journal of Travel Research, 26 (2): 28-39.
Chan, Y. M. (1993). "Forecasting Tourism: A Sine Wave Time Series Regression Approach." Journal of Travel Research, 32: 58-60.
Cho, V. (2003). "A Comparison of Three Different Approaches to Tourist Arrival Forecasting." Tourism Management, 23: 323-30.
Chu, F. L. (1998a). "Forecasting Tourist Arrivals: Nonlinear Sine Wave or ARIMA?" Journal of Travel Research, 36: 79-84.
Chu, F. L. (1998b). "Forecasting Tourism Demand in Asian-Pacific Countries." Annals of Tourism Research, 25: 597-615.
Clemen, R. T. (1989). "Combining Forecasts: A Review and Annotated Bibliography." International Journal of Forecasting, 5: 559-83.
Dharmaratne, C. S. (1995). "Forecasting Tourist Arrivals in Barbados." Annals of Tourism Research, 22 (4): 804-18.
Diebold, F. X. (1998). Elements of Forecasting. Cincinnati, OH: South-Western College.
Eadington, W., and M. Redman (1991). "Economics and Tourism." Annals of Tourism Research, 18: 41-56.
Enders, W. (2004). Applied Econometric Time Series. New York: John Wiley.
Frechtling, Douglas C. (2001). Forecasting Tourism Demand: Methods and Strategies. Oxford, UK: Butterworth Heinemann.
Frechtling, D. C., and E. Horvath (1999). "Estimating the Multiplier Effects of Tourism Expenditures on a Local Economy through a Regional Input-Output Model." Journal of Travel Research, 37: 324-32.
Greenidge, K. (2001). "Forecasting Tourism Demand: An STM Approach." Annals of Tourism Research, 28: 98-112.
Kulendran, N., and M. L. King (1997). "Forecasting International Quarterly Tourist Flows Using Error Correction and Time-Series Models." International Journal of Forecasting, 13: 319-27.
Kulendran, N., and J. Shan (2002). "Forecasting China's Monthly Inbound Travel Demand." In Tourism Forecasting Marketing, edited by K. K. F. Wong and H. Song. New York: Haworth, pp. 5-19.
Kulendran, N., and K. Wilson (2000). "Modeling Business Travel." Tourism Economics, 6 (1): 47-59.
Kulendran, N., and S. F. Witt (2001). "Cointegration Versus Least Squares Regression." Annals of Tourism Research, 28 (2): 291-311.
Kulendran, N., and S. F. Witt (2003a). "Leading Indicator Tourism Forecasts." Tourism Management, 24: 503-10.
Kulendran, N., and S. F. Witt (2003b). "Forecasting the Demand for International Business Tourism." Journal of Travel Research, 41: 265-71.
Louvieris, P. (2002). "Forecasting International Tourism Demand for Greece: A Contingency Approach." In Tourism Forecasting Marketing, edited by K. K. F. Wong and H. Song. New York: Haworth, pp. 21-41.
Makridakis, S., and R. Winkler (1983). "Averages of Forecasts: Some Empirical Results." Management Science, 29: 987-96.
Martin, C. A., and S. F. Witt (1989). "Accuracy of Econometric Forecasts of Tourism." Annals of Tourism Research, 16: 407-28.
Milne, S. S. (1987). "Differential Multipliers." Annals of Tourism Research, 14 (4): 499-515.


Newbold, P., and T. Bos (1994). Introductory Business and Economic Forecasting. Cincinnati, OH: South-Western College.
Sheldon, J. S. (1993). "Forecasting Tourism: Expenditures versus Arrivals." Journal of Travel Research, 32: 13-20.
Singapore Tourist Promotion Board (various years). Annual Statistical Report on Visitor Arrivals to Singapore.
Song, H., and S. F. Witt (2000). Tourism Demand Modeling and Forecasting: Modern Econometric Approaches. Oxford, UK: Elsevier Science.
Song, H., and K. K. F. Wong (2003). "Tourism Demand Modeling: A Time-Varying Parameter Approach." Journal of Travel Research, 42: 57-64.
Turner, L. W., and S. F. Witt (2001). "Forecasting Tourism Using Univariate and Multivariate Structural Time Series Models." Tourism Economics, 7 (2): 135-47.
Uysal, M., and J. Crompton (1985). "An Overview of Approaches Used to Forecast Tourism Demand." Journal of Travel Research, 23: 7-15.
Webber, A. G. (2001). "Exchange Rate Volatility and Cointegration in Tourism Demand." Journal of Travel Research, 39: 398-405.
Witt, S. F., G. D. Newbould, and A. J. Watkins (1992). "Forecasting Domestic Tourism Demand: Application to Las Vegas Arrivals Data." Journal of Travel Research, 31: 36-41.
Witt, S. F., H. Song, and P. Louvieris (2003). "Statistical Testing in Forecasting Model Selection." Journal of Travel Research, 42: 151-58.
Witt, S. F., and C. A. Witt (1992). Modeling and Forecasting Demand in Tourism. San Diego, CA: Academic.
Witt, S. F., and C. A. Witt (1995). "Forecasting Tourism Demand: A Review of Empirical Research." International Journal of Forecasting, 11: 447-75.
World Travel and Tourism Council (2002a). The Travel and Tourism Economic Research: World. August 22. Available at http://www.wttc.org/measure/PDF/World.pdf.
World Travel and Tourism Council (2002b). The Travel and Tourism Economic Research: Singapore. August 22. Available at http://www.wttc.org/measure/PDF/Singapore.pdf.
