
Arthik Sarokar Volume 1 Year 2010 (Jan-June)

Modelling and Forecasting the Inbound Tourism: The Case of Nepal

Shashi K. Chaudhary

Abstract

This paper uses the monthly time series of inbound tourist arrivals to Nepal from January
1990 to December 2008 for modelling and forecasting with an ARIMA model. Allowing for the
seasonal effect, the best-fit model on the Akaike information criterion is
SARIMA(2,1,1)(1,1,1)12. The annual growth rate of inbound tourists has averaged about 4.88
percent, although it has varied widely from year to year. The highest growth rate is
37.19 percent in FY 2007/08, while the lowest is negative 23.74 percent in FY 2002/03.
Moreover, in the light of Nepal Tourism Year 2011, the available infrastructure is not sufficient
to withstand the pressure of increasing tourist numbers. In this situation, the paper is expected
to contribute by forecasting monthly and annual inbound tourist arrivals to Nepal, and by giving
planners and investors an idea of the arrangements and investments needed to develop
infrastructure in this sector to serve tourists.

Keywords: Box-Jenkins methodology, forecasting, inbound tourism, SARIMA, stationarity

1. INTRODUCTION

The tourism sector has remained an integral part of the national economy, though it has
seen uneven patterns of growth and decline owing to political instability in the country and
increasing terrorism in the world. The average annual growth in tourist inflow to Nepal has been
found to be 4.88 percent. Arrivals fell by 22.09 percent in fiscal year 2001/02 and by a further
23.74 percent in fiscal year 2002/03. The year 2001/02 can be remembered for the WTC
attack in the USA, while the year 2002/03 can be remembered for the Royal Massacre of Nepal. Thus,
major events in the country or around the world have had a direct impact on the tourism sector
of Nepal. Such events, both those that have occurred and those that may well occur in the
current national scenario, have discouraged many tourists from visiting Nepal. The blame
once again goes to none other than internal conflict and the problem of unionism in the hotel
industry.
Recently, the government of Nepal announced that it would organize ‘Nepal Tourism Year
(NTY) 2011’, but the available figures show that the country does not have sufficient
infrastructure to serve tourists. A total of 739 tourist hotels offered 28878 beds in 1998,
as against 636 hotels with 25357 beds in 2008 (Dahal, 2009:business page).
In 1998, a total of 463684 tourists visited Nepal (MTCA, 2008:10); suppose for a moment
that the available beds were enough to serve that number of tourists. For
2011, the government has a tentative plan of attracting one million tourists, which is more than
twice the 1998 figure, yet fewer beds are available. This puts a question mark over the success
of NTY 2011. Although the government has pledged to adopt more investor-friendly policies to
increase the number of hotels in various tourist spots across the country, no such initiatives
have been reported, owing to uncertainty about the future.
In this context, forecasting the number of tourists helps both the public and private
sectors to improve the allocation of scarce resources and is essential for efficient planning by
tourism-related businesses, particularly given the perishable nature of tourism products.
Tourism investment, especially in infrastructure, requires huge financial commitments from
both the public and private sectors. The costs can be very high if the investment projects fail to
fulfill their designed capacities. More accurate forecasts provide better estimates of expected
return on investments, which help guide investment decisions. The forecast of tourism volume in
the form of arrivals is also of special importance for policy makers because government
macroeconomic policies largely depend on the relative importance of individual sectors within a
destination. Hence, accurate forecasts can help the governments in formulating and
implementing appropriate tourism strategies (Brida & Garrido, 2009:para. 2-4).

2. METHODOLOGY

This paper uses the seasonal autoregressive integrated moving average
(SARIMA) process to forecast inbound tourism in Nepal through the Box-Jenkins methodology.
The methodological process is organized as follows:
In the first step, the data are examined graphically and statistically. In the second
step, a test of stationarity is carried out to determine the order of integration of the time series.
Once the data have been rendered stationary, the appropriate ARMA model is identified and
estimated. In practice, after the data have been made stationary, a simple model is estimated
first in order to decide on the ARIMA term(s). Finally, diagnostic checks are run on the
chosen ARIMA model and inbound tourism is forecast.

2.1 The Data


Altogether 228 monthly observations on the number of tourist arrivals to Nepal {X},
from January 1990 to December 2008, have been taken from ‘Nepal
Tourism Statistics 2008’. The time series shows an upward trend as well as dramatic
swings. A graphical analysis of the data suggests that tourist arrivals to Nepal may be
nonstationary in level (Fig. 1(a)). Since the variance of the series seems to increase with time,
the data have been transformed into natural logarithms (LNX) and differenced. After first
differencing, the series ‘DLNX’ became stationary (Fig. 1(b)). The results of the ADF unit
root test (Table 1) confirm these conclusions.
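
The transformations described above can be sketched in a few lines of Python. This is only an illustration: the arrivals series below is made up (the actual data come from Nepal Tourism Statistics 2008 and are not reproduced here), and the helper names `log_transform` and `difference` are hypothetical. The unit root test itself is not shown.

```python
import math

def log_transform(x):
    """Natural logarithm of each observation (the LNX series)."""
    return [math.log(v) for v in x]

def difference(series, lag=1):
    """Lag-`lag` difference: series[t] - series[t - lag]."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]

# Hypothetical monthly arrivals: two years with a mild trend and a seasonal swing.
x = [20000 + 150 * t + 5000 * math.sin(2 * math.pi * t / 12) for t in range(24)]

lnx = log_transform(x)            # LNX
dlnx = difference(lnx, lag=1)     # DLNX: first difference of the log series
d12lnx = difference(lnx, lag=12)  # D12LNX, assumed here to be LNX(t) - LNX(t-12)

print(len(lnx), len(dlnx), len(d12lnx))
```

Each difference shortens the series by the lag used, so 228 monthly observations would leave 227 values of DLNX and 216 values of D12LNX.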

Fig 1(a): Monthly Tourists arrival in Nepal (number of tourists, 1990-2008)

Fig 1(b): Differenced Monthly Tourists arrival to Nepal after log transformation (DLNX, 1990-2008)

Table 1: Results of Unit Root Tests

Series    ADF statistic (level)    ADF statistic (first difference)    Degree of integration
LNX       -2.707                   -4.394*                             I(1)
D12LNX    -4.56*                   ---                                 I(0)
* indicates significance at the 0.01 level based on MacKinnon’s critical values.
All regressions in level and first difference include a trend and an intercept.

Further, the correlogram of ‘DLNX’ shows significant seasonal spikes in the ACF
at lags 12, 24, 36 and so on (Appendix A). This implies the presence of a seasonal
effect, so the series may need seasonal adjustment. The adjustment has been made by taking the
12th-order difference, since the data are monthly. The seasonally adjusted series ‘D12LNX’ was
found to be stationary: its ADF statistic is -4.56, which is significant at the one percent level.

2.2 Identification of SARIMA Model


The autocorrelation plot of DLNX shows an alternating pattern of positive and negative
spikes. It also shows a pattern that repeats every 12 lags, which indicates a seasonal effect, so
seasonal terms need to be included when fitting a Box-Jenkins model. A seasonal difference of
order 12 has been taken and the autocorrelation plot of the seasonally differenced data has been
generated (Appendix B). This plot shows a mixture of exponential decay and a
damped sinusoidal pattern, indicating that an AR model of order greater than one may be
appropriate. The partial autocorrelation plot suggests that an AR(2) model might be appropriate,
since the partial autocorrelation is close to zero after the second lag. Lag 12 is also significant,
indicating some remaining seasonality.
From several trial models (Appendix C), SARIMA(2,1,1)(1,1,1)12 has been selected as
the best fit to the data on the basis of the minimum Akaike information criterion (AIC), together
with a minimum standard error of regression (SEE) and a comparably high adjusted R2. The
residuals of the selected model have also been tested and found to be a white noise process.
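
The AIC-based selection can be reproduced from the values printed in Appendix C. The sketch below only ranks the reported figures and does not re-estimate any model; the variable names are illustrative.

```python
# (model, BIC, AIC, adjusted R2, SEE) as printed in Appendix C
trials = [
    ("(2,1,1)(0,1,1)12", -1.2785, -1.3571, 0.706, 0.121),
    ("(2,1,2)(0,1,1)12", -1.2646, -1.3589, 0.708, 0.121),
    ("(2,1,3)(0,1,1)12", -1.2457, -1.3558, 0.709, 0.121),
    ("(2,1,1)(1,1,1)12", -1.2745, -1.3728, 0.707, 0.120),
    ("(2,1,2)(1,1,1)12", -1.2467, -1.3613, 0.705, 0.120),
    ("(2,1,3)(1,1,1)12", -1.2250, -1.3560, 0.705, 0.120),
    ("(3,1,1)(1,1,1)12", -1.2338, -1.3489, 0.702, 0.121),
]

# Lower AIC is better, so the preferred model is the row with the minimum AIC.
best_by_aic = min(trials, key=lambda row: row[2])
print("Minimum-AIC model: SARIMA" + best_by_aic[0], "AIC =", best_by_aic[2])
```

The other columns can be ranked in the same way by changing the index in the `key` function.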

2.3 Estimation of the Model and Diagnostic Checks


The estimation output for the SARIMA(2,1,1)(1,1,1)12 model is presented in
Table 2. The coefficients of all the SARIMA terms are significant (t-statistic) at the five percent
level, except that of AR(2). The D-W and F-statistics indicate a good fit of the model. The
adjusted coefficient of determination (Adj-R2) indicates that about 71 percent of the variation is
explained by the variables. Hence, the model corresponds to the following
specification:
\[
\begin{aligned}
\mathrm{D12LNX}(t) ={}& 0.02 + 0.97\,\mathrm{D12LNX}(t-1) - 0.06\,\mathrm{D12LNX}(t-2) + 0.16\,\mathrm{D12LNX}(t-12) \\
& - 0.36\,\varepsilon(t-1) - 0.93\,\varepsilon(t-12)
\end{aligned}
\tag{2.1}
\]
where the ε’s are the residuals of the estimated model in the corresponding time periods.

Table 2: Output Estimates of D12LNX under SARIMA(2,1,1)(1,1,1)12 Model

Variable    Coefficient    Std. Error    t-Statistic    Prob.
C            0.015328      0.014809       1.035073      0.3019
AR(1)        0.967040      0.138328       6.990913      0.0000
AR(2)       -0.057662      0.123390      -0.467312      0.6408
SAR(12)      0.155226      0.074955       2.070936      0.0397
MA(1)       -0.365210      0.122363      -2.984650      0.0032
SMA(12)     -0.929525      0.016061     -57.87586       0.0000
Adj. R2 = 0.707    F-statistic = 98.16*    D-W statistic = 1.99
* indicates significance at the one percent level.

When the 12th-order difference is undone and like terms are collected, equation
(2.1) takes the following form:

\[
\begin{aligned}
\mathrm{LNX}(t) ={}& 0.02 + 0.97\,\mathrm{LNX}(t-1) - 0.06\,\mathrm{LNX}(t-2) + 1.16\,\mathrm{LNX}(t-12) - 0.97\,\mathrm{LNX}(t-13) \\
& + 0.06\,\mathrm{LNX}(t-14) - 0.16\,\mathrm{LNX}(t-24) - 0.36\,\varepsilon(t-1) - 0.93\,\varepsilon(t-12)
\end{aligned}
\tag{2.2}
\]

Equation (2.2) constitutes the main model for forecasting inbound tourism in Nepal. The root
mean square error (RMSE) and mean absolute error (MAE) of the model are found to be 0.22
and 0.17 respectively. Together, they support the validity of the model.
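
The rearrangement behind equation (2.2) can be written out step by step. The sketch below assumes that D12LNX(t) denotes the seasonal difference LNX(t) − LNX(t−12), which is the reading consistent with the printed coefficients, and it uses the coefficients rounded to two decimals.

```latex
\begin{align*}
\mathrm{LNX}(t) - \mathrm{LNX}(t-12)
  &= 0.02
   + 0.97\,[\mathrm{LNX}(t-1) - \mathrm{LNX}(t-13)]
   - 0.06\,[\mathrm{LNX}(t-2) - \mathrm{LNX}(t-14)] \\
  &\quad + 0.16\,[\mathrm{LNX}(t-12) - \mathrm{LNX}(t-24)]
   - 0.36\,\varepsilon(t-1) - 0.93\,\varepsilon(t-12) \\[4pt]
\mathrm{LNX}(t)
  &= 0.02 + 0.97\,\mathrm{LNX}(t-1) - 0.06\,\mathrm{LNX}(t-2)
   + (1 + 0.16)\,\mathrm{LNX}(t-12) \\
  &\quad - 0.97\,\mathrm{LNX}(t-13) + 0.06\,\mathrm{LNX}(t-14)
   - 0.16\,\mathrm{LNX}(t-24)
   - 0.36\,\varepsilon(t-1) - 0.93\,\varepsilon(t-12)
\end{align*}
```

The coefficient 1.16 on LNX(t−12) is therefore the sum of the 1 carried over from the left-hand side and the seasonal AR coefficient 0.16. The lag-13 and lag-14 terms carry the AR(1) and AR(2) coefficients with their signs reversed.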
The diagnostic checks for the model have been carried out through inspection of the roots,
the impulse responses (Appendix D) and the residual tests. All inverse AR roots lie inside the
unit circle. Further, the impulse responses die out towards zero, confirming the stationarity of the
SARMA model. The residuals of the specified model are also a white noise process, with no
autocorrelation term falling outside the confidence bounds. Therefore, there is no need to
re-specify the model, and the estimated model is validated.
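
The root condition can be checked by hand from the coefficients in Table 2. The sketch below assumes that the non-seasonal and seasonal polynomials enter as separate factors, so that each factor can be examined on its own; the function names are hypothetical and the plot in Appendix D is not reproduced.

```python
import cmath

# Coefficients as printed in Table 2
ar1, ar2 = 0.967040, -0.057662
sar12 = 0.155226
ma1, sma12 = -0.365210, -0.929525

def inverse_roots_ar2(phi1, phi2):
    """Inverse roots of 1 - phi1*L - phi2*L**2, i.e. roots of z**2 - phi1*z - phi2 = 0."""
    disc = cmath.sqrt(phi1 ** 2 + 4 * phi2)
    return [(phi1 + disc) / 2, (phi1 - disc) / 2]

def seasonal_modulus(coef, period=12):
    """Common modulus of the `period` inverse roots of a polynomial 1 -/+ coef*L**period."""
    return abs(coef) ** (1.0 / period)

moduli = {
    "AR(1), AR(2)": [abs(r) for r in inverse_roots_ar2(ar1, ar2)],
    "SAR(12)": [seasonal_modulus(sar12)],
    "MA(1)": [abs(ma1)],
    "SMA(12)": [seasonal_modulus(sma12)],
}

for name, values in moduli.items():
    inside = all(m < 1 for m in values)
    print(name, [round(m, 3) for m in values], "inside unit circle:", inside)
```

Every modulus is below one, which agrees with the paper's reading of the roots plot. The seasonal MA roots sit closest to the unit circle because the SMA(12) coefficient is close to -1.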

2.4 Measuring Forecast Accuracy: In-sample Forecasts and MSPE


It is important to remember that the data are monthly. Hence, if
January is represented by 1, February by 2 and so on, then from equation (2.2) the
forecast for January 2005 (for example) can be written as

\[
\begin{aligned}
\mathrm{LNX}_{2005:01} ={}& 0.02 + 0.97\,\mathrm{LNX}_{2004:12} - 0.06\,\mathrm{LNX}_{2004:11} + 1.16\,\mathrm{LNX}_{2004:01} - 0.97\,\mathrm{LNX}_{2003:12} \\
& + 0.06\,\mathrm{LNX}_{2003:11} - 0.16\,\mathrm{LNX}_{2003:01} - 0.36\,\varepsilon_{2004:12} - 0.93\,\varepsilon_{2004:01}
\end{aligned}
\tag{2.3}
\]

On the basis of this equation, in-sample forecasts can be made on a monthly as well
as an annual basis. Here, the annual in-sample forecasts, each the sum of the corresponding
monthly forecasts, are presented to check the fit of the model to the data.
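
The forecasting rule in equation (2.2) translates directly into a small function. This is a minimal sketch: the function name `forecast_lnx` and the flat input series are hypothetical, and converting back to arrivals with `exp` is simply the inverse of the log transform, a step the paper does not spell out.

```python
import math

def forecast_lnx(lnx, resid, t):
    """One-step forecast of LNX at index t from equation (2.2).

    lnx and resid are lists indexed by month; t must be at least 24
    so that the lag-24 term is available.
    """
    return (0.02
            + 0.97 * lnx[t - 1] - 0.06 * lnx[t - 2]
            + 1.16 * lnx[t - 12] - 0.97 * lnx[t - 13]
            + 0.06 * lnx[t - 14] - 0.16 * lnx[t - 24]
            - 0.36 * resid[t - 1] - 0.93 * resid[t - 12])

# Hypothetical check: a flat log series with zero residuals.
lnx = [10.0] * 24
resid = [0.0] * 24

pred = forecast_lnx(lnx, resid, 24)   # forecast for the 25th month
arrivals = math.exp(pred)             # back-transform from logs to a tourist count
print(round(pred, 4), round(arrivals))
```

With a flat series the lag coefficients sum to one (0.97 − 0.06 + 1.16 − 0.97 + 0.06 − 0.16), so the forecast is the previous level plus the constant 0.02. Annual forecasts would then be obtained by summing the twelve back-transformed monthly values.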

Table 3: In-sample Forecasts of the Model

Year       Forecasts    Actual Values    Forecasting Error
1994/95    332857       326531            0.019
1995/96    377634       363395            0.039
1996/97    391477       393613           -0.005
1997/98    422864       421857            0.002
1998/99    461309       463684           -0.009
1999/00    484335       491504           -0.015
2000/01    459370       463646           -0.009
2001/02    392382       361237            0.086
2002/03    283661       275468            0.030
2003/04    333949       338132           -0.012
2004/05    409503       385297            0.063
2005/06    391227       375398            0.042
2006/07    397438       383926            0.035
2007/08    518909       526705           -0.015
2008/09    516999       500277            0.019
A negative sign indicates that the forecast underestimates the actual value.

The annual forecasts are close to the actual values in many cases. The correlation
coefficient between forecast and actual values is 0.986, which corresponds to about 97 percent
of the variation in the actual values being explained by the model (0.986 squared is about
0.972). Nevertheless, there are sizeable deviations in 2001/02 and
2004/05, indicating the need for structural adjustment in the model for better forecasts. In
aggregate, forecast and actual values differ by only about 2 percent, which is a very
strong point for the model. Further, the mean square prediction error (MSPE) is very small (0.037),
so the model can be concluded to be reliable.
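
The summary figures quoted above can be approximated from Table 3. The sketch below recomputes the relative error of each year from the printed forecast and actual values, so individual entries may differ slightly from the printed error column. Treating the reported MSPE as a root mean square of these relative errors is an assumption, since the paper does not give the formula.

```python
import math

# (fiscal year, forecast, actual) as printed in Table 3
table3 = [
    ("1994/95", 332857, 326531), ("1995/96", 377634, 363395),
    ("1996/97", 391477, 393613), ("1997/98", 422864, 421857),
    ("1998/99", 461309, 463684), ("1999/00", 484335, 491504),
    ("2000/01", 459370, 463646), ("2001/02", 392382, 361237),
    ("2002/03", 283661, 275468), ("2003/04", 333949, 338132),
    ("2004/05", 409503, 385297), ("2005/06", 391227, 375398),
    ("2006/07", 397438, 383926), ("2007/08", 518909, 526705),
    ("2008/09", 516999, 500277),
]

# Relative forecasting error: positive means the forecast overshoots.
errors = [(forecast - actual) / actual for _, forecast, actual in table3]

mean_error = sum(errors) / len(errors)
rms_error = math.sqrt(sum(e * e for e in errors) / len(errors))

print("mean relative error:", round(mean_error, 3))
print("root mean square relative error:", round(rms_error, 3))
```

On these inputs the mean relative error comes out at roughly 2 percent, in line with the aggregate figure in the text, and the root mean square error is of the same order as the reported 0.037.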

2.5 Expansion of the Model: Out-sample Forecasts


Equation (2.2) can be extended to make out-sample forecasts for the period
considered. For example, the out-sample forecast for January 2009 can be made with
the following specification:

\[
\begin{aligned}
\mathrm{LNX}_{2009:01} ={}& 0.02 + 0.97\,\mathrm{LNX}_{2008:12} - 0.06\,\mathrm{LNX}_{2008:11} + 1.16\,\mathrm{LNX}_{2008:01} - 0.97\,\mathrm{LNX}_{2007:12} \\
& + 0.06\,\mathrm{LNX}_{2007:11} - 0.16\,\mathrm{LNX}_{2007:01} - 0.36\,\varepsilon_{2008:12} - 0.93\,\varepsilon_{2008:01}
\end{aligned}
\tag{2.4}
\]

The residual values for the different months can be obtained from the residual series of the
model specified in equation (2.1). Generating the forecasts for the months
of 2009 with the help of equation (2.4), we get:

Table 4: Out-sample Forecasts of the Model

Months, 2009    Forecasts    Actual number through airway only*
January         34258        26064
February        45056        25181
March           51183        33005
April           37671        37819
May             38463        25129
June            32846        23222
July            34989        23266
August          45921        27676
September       40289        34281
October         63788        56009
November        41949        39784
December        39180        26576**
Total =         505593       378012
* Nepal Tourism Board (NTB). Source: Arrival Statistics through www.welcomenepal.com retrieved on
22nd December, 2009. **Adjusted by the author himself as total of 378012 tourist arrival through TIA
was reported by Kantipur newspaper dated 2nd January, 2010. p.13.

The model forecasts 505593 tourist arrivals to Nepal in 2009. These out-sample
forecasts cannot be validated statistically in this case, since the available actual figures cover
arrivals by air only. The correlation coefficient between the out-sample forecasts and arrivals
by air is 0.76. This is not as strong as for the in-sample forecasts and corresponds to about
58 percent of the variation explained (0.76 squared is about 0.58), yet the fit can be regarded as
satisfactory.
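
The correlation quoted for Table 4 can be recomputed from the printed monthly values. This sketch uses a plain Pearson correlation, which is assumed to be the measure the paper reports; the December actual figure is the author-adjusted value from the table.

```python
import math

# Monthly values for 2009 as printed in Table 4 (January to December)
forecast = [34258, 45056, 51183, 37671, 38463, 32846,
            34989, 45921, 40289, 63788, 41949, 39180]
actual_air = [26064, 25181, 33005, 37819, 25129, 23222,
              23266, 27676, 34281, 56009, 39784, 26576]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

r = pearson(forecast, actual_air)
print("annual totals:", sum(forecast), sum(actual_air))
print("correlation:", round(r, 2), "squared:", round(r * r, 2))
```

The share of variation explained is the square of the correlation coefficient, not the coefficient itself, which is why the two numbers are printed side by side.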

3. CONCLUSION

This paper has used the monthly time series of inbound tourist arrivals from January 1990 to
December 2008 for modelling and forecasting. The data analysis showed strong
seasonality that needed to be adjusted for, which was done by taking the
12th-order difference of the natural logarithm of the main data series. Allowing for the seasonal
effect, the best-fit model found on the Akaike information criterion is SARIMA(2,1,1)(1,1,1)12.
With this specification, the model was checked against the annual in-sample forecasts. The
forecast values were very close to the actual values in many cases. The correlation coefficient
was found to be 0.986. In aggregate, forecast and actual values differed by only about 2 percent.
With the model thus validated, it was used to forecast the out-sample values for 2009, giving
a forecast of 505593 tourist arrivals to Nepal. Moreover,
in the light of Nepal Tourism Year 2011, this paper is expected to contribute by forecasting
monthly and annual inbound tourist arrivals to Nepal, and by giving planners and
investors an idea of the arrangements and investments needed to develop infrastructure
in this sector to serve tourists.

REFERENCES

1. Enders, W. (2008). Applied Econometric Time Series. New Delhi: John Wiley & Sons.
2. Brida, J.B., & Garrido, N. (2009). Tourism Forecasting using SARIMA Models in Chilean
Regions. Retrieved December 16, 2009. Website:
papers.ssrn.com/sol3/papers.cfm?abstract_id=1457984
3. Dahal, B. (2009, November 24). National Tourism Council meet, Need to implement
resolutions accordingly. The Rising Nepal. Retrieved December 21, 2009. Website:
http://www.gorkhapatra.org.np/gopa.detail.php?article_id=27156&cat_id=27
4. Dobre, I., & Alexandru, A.A. (2008). Modelling Unemployment Rate using Box Jenkins
Procedure. Journal of Applied Quantitative Methods, 3(2): p.156-166.
5. Franses, P.H., & Taylor, A.M.R. (2000). Determining the order of differencing in seasonal
time series processes. Econometrics Journal, 3: p.250-264.
6. Gujarati, D.N. (2004). Basic Econometrics. New Delhi: Tata McGraw-Hill.
7. MTCA (2008). Nepal Tourism Statistics 2008. Kathmandu: G/N.

APPENDIX

A. Autocorrelation of DLNX

lag    AC        PAC       Q-Stat    Prob
1      -0.046    -0.046    0.4925    0.483
2      -0.094    -0.096    2.5228    0.283
3      -0.446    -0.460    48.633    0.000
4      -0.229    -0.390    60.907    0.000
5       0.252     0.076    75.801    0.000
6       0.127    -0.133    79.601    0.000
12      0.732     0.406    280.65    0.000
24      0.669     0.141    532.79    0.000
36      0.647     0.049    773.90    0.000
48      0.571     0.048    974.87    0.000
60      0.516     0.002    1170.4    0.000

B. Autocorrelation of D12LNX

lag    AC        PAC       Q-Stat    Prob
1       0.718     0.718    112.80    0.000
2       0.606     0.187    193.58    0.000
3       0.504     0.035    249.82    0.000
4       0.405    -0.025    286.31    0.000
5       0.348     0.034    313.34    0.000
6       0.277    -0.026    330.58    0.000
12     -0.190    -0.215    352.63    0.000
24     -0.202    -0.231    388.34    0.000
36      0.137    -0.052    436.73    0.000
48     -0.028    -0.061    471.54    0.000
60     -0.142    -0.121    485.51    0.000

C. Trial and Errors of best fit SARIMA Model Selection

SARIMA model          BIC        AIC        Adj. R2    SEE      Residual Correlogram
(2, 1, 1)(0,1,1)12    -1.2785    -1.3571    0.706      0.121    White noise Process
(2, 1, 2)(0,1,1)12    -1.2646    -1.3589    0.708      0.121    White noise Process
(2, 1, 3)(0,1,1)12    -1.2457    -1.3558    0.709      0.121    White noise Process
(2, 1, 1)(1,1,1)12    -1.2745    -1.3728    0.707      0.120    White noise Process
(2, 1, 2)(1,1,1)12    -1.2467    -1.3613    0.705      0.120    White noise Process
(2, 1, 3)(1,1,1)12    -1.2250    -1.3560    0.705      0.120    White noise Process
(3, 1, 1)(1,1,1)12    -1.2338    -1.3489    0.702      0.121    White noise Process

D. SARMA Equation Diagnostics

i. Roots: Inverse Roots of AR/MA Polynomial(s) (figure showing AR roots and MA roots; both axes from -1.5 to 1.5)

ii. Impulse Responses: Impulse Response ± 2 S.E. and Accumulated Response ± 2 S.E. (figures; horizontal axis from 5 to 60 periods)
