
Table of Contents

1. Objective

2. Data Source

3. Methodology

4. Stationarity

5. Seasonality

6. ACF & PACF

7. Auto Regressive Model(p)

8. Moving Average Model(q)

9. Conclusion

10. Appendix

1. Objective

The objective of this analysis is to build a time series model of Darwin sea level pressure that
can be used to forecast climate patterns.

2. Data Source

The data set consists of monthly values of Darwin sea level pressure from 1882 to 1998. It is a
univariate time series, meaning that it records a single variable over time. The series has
enough observations to support a forecasting model.

A time series is a set of observations of the values that a variable takes at different times. Time
series are used in statistics, econometrics, mathematical finance, weather forecasting, earthquake
prediction and many other applications. Such data can be collected at regular intervals, for
example monthly (e.g. CPI), weekly (e.g. money supply), quarterly (e.g. GDP) or annually (e.g.
government budget).

3. Methodology

The data is tested for stationarity using the Augmented Dickey-Fuller (ADF) test and the KPSS
test, together with the Chow test for structural change. The ACF of the raw data shows
seasonality, so the seasonal component is removed using the ‘decompose’ function in R and the
adjusted data is used to build the model. We calculated the PACF and ACF of the adjusted data
and found AR order p = 3 and MA order q = 10. After the seasonal adjustment the data was
stationary, so no differencing was needed and d = 0. We started building ARIMA models with p
and q both equal to 0 and iterated over different values. To choose the best model, the AIC
values were compared and the model with the lowest AIC value was selected.

4. Stationarity

A) Augmented Dickey-Fuller (ADF) test


The formal statistical test for stationarity used here is the augmented Dickey-Fuller (ADF)
test. The null hypothesis is that the series is non-stationary (it has a unit root). The test
checks whether the change in Y can be explained by the lagged value of Y, allowing for a
linear trend. If the series is non-stationary, the lagged value contributes nothing
significant to the change in Y and the null hypothesis is not rejected. If the lagged value
is significant, the null hypothesis is rejected in favour of stationarity.

> adf.test(DSLP$`sea pressure`)

        Augmented Dickey-Fuller Test

data:  DSLP$`sea pressure`
Dickey-Fuller = -6.0074, Lag order = 11, p-value = 0.01
alternative hypothesis: stationary

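Since the p-value is less than the significance level (0.05), we reject the null hypothesis and
accept the alternative hypothesis that the data is stationary. Note that adf.test reports 0.01 as
the smallest p-value it tabulates, so the actual p-value may be smaller.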
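
To make the idea behind the test concrete, the following is a minimal sketch of the basic Dickey-Fuller regression (constant only, no lagged differences and no trend term) in base R. It uses simulated series rather than the DSLP data, and the helper name df_tstat is hypothetical. It is not a replacement for adf.test, which adds lagged differences and a trend.

```r
# Dickey-Fuller regression by hand (constant only, no lagged differences).
# Simulated series; this is NOT the DSLP data.
set.seed(1)
n <- 500

# A stationary AR(1) series with coefficient 0.5, and a random walk (unit root)
stationary  <- as.numeric(arima.sim(model = list(ar = 0.5), n = n))
random_walk <- cumsum(rnorm(n))

# Hypothetical helper: regress the change in y on the lagged level of y
df_tstat <- function(y) {
  dy    <- diff(y)             # change in Y: y[t] - y[t-1]
  y_lag <- y[-length(y)]       # lagged level: y[t-1]
  fit   <- lm(dy ~ y_lag)
  # t statistic of the coefficient on the lagged level
  summary(fit)$coefficients["y_lag", "t value"]
}

t_stationary  <- df_tstat(stationary)
t_random_walk <- df_tstat(random_walk)
print(c(stationary = t_stationary, random_walk = t_random_walk))
```

A strongly negative statistic points towards stationarity. The statistic has to be compared with Dickey-Fuller critical values rather than with the usual t distribution, which is what adf.test does internally.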

B) KPSS
The Kwiatkowski–Phillips–Schmidt–Shin (KPSS) test is used to test the null hypothesis that
an observable time series is stationary around a deterministic trend (i.e. trend-stationary)
against the alternative of a unit root. With the "Level" option, the null hypothesis is
stationarity around a constant level instead.

> kpss.test(DSLP$`sea pressure`,"Trend")

        KPSS Test for Trend Stationarity

data:  DSLP$`sea pressure`
KPSS Trend = 0.11721, Truncation lag parameter = 8, p-value = 0.1

> kpss.test(DSLP$`sea pressure`,"Level")

        KPSS Test for Level Stationarity

data:  DSLP$`sea pressure`
KPSS Level = 0.1807, Truncation lag parameter = 8, p-value = 0.1

In both the trend and the level test, the p-value (0.1) is greater than the significance level
(0.05), and therefore we cannot reject the null hypothesis that the data is stationary.

C) Chow
The Chow test is carried out to determine whether the true coefficients in two different
linear regressions on two different data sets are equal. For a time series, this amounts to
checking for a structural break at a known point. The test was proposed by the
econometrician Gregory Chow in 1960.
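
As an illustration only, the Chow statistic can be computed by hand from three ordinary regressions. The sketch below uses simulated data with a slope change at an assumed, known break point; the variable names and the break position are hypothetical and nothing here is taken from the DSLP analysis.

```r
# Chow test by hand on simulated data with a known break point.
set.seed(42)
n        <- 100
x        <- 1:n
break_at <- 50                       # assumed (known) break position
# the slope changes from 0.5 to 1.5 after the break; the line stays continuous
y <- ifelse(x <= break_at, 1 + 0.5 * x, -49 + 1.5 * x) + rnorm(n)

d      <- data.frame(x = x, y = y)
before <- d[d$x <= break_at, ]
after  <- d[d$x >  break_at, ]

rss <- function(fit) sum(residuals(fit)^2)   # residual sum of squares

pooled <- lm(y ~ x, data = d)        # one regression for all the data
first  <- lm(y ~ x, data = before)   # regression before the break
second <- lm(y ~ x, data = after)    # regression after the break

k          <- 2                      # parameters per regression (intercept, slope)
rss_pooled <- rss(pooled)
rss_split  <- rss(first) + rss(second)

# F statistic: gain from splitting, relative to the split-model error variance
chow_F  <- ((rss_pooled - rss_split) / k) / (rss_split / (n - 2 * k))
p_value <- pf(chow_F, df1 = k, df2 = n - 2 * k, lower.tail = FALSE)
print(c(F = chow_F, p.value = p_value))
```

A small p-value indicates that the two sub-samples are better described by different coefficients, i.e. that there is a structural break at the chosen point.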

5. Seasonality

The ACF of the raw data shows clear seasonality, as can be seen in the graph below.

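The following code was used to remove the seasonality before building the model.
# monthly series starting in January 1882
ts_DSLP = ts(DSLP$`sea pressure`, frequency = 12, start = 1882)
# additive decomposition into trend, seasonal and random components
decompose_DSLP = decompose(ts_DSLP, "additive")
# subtract the seasonal component to obtain the seasonally adjusted series
adjust_DSLP = ts_DSLP - decompose_DSLP$seasonal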
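
The effect of this adjustment can be sketched on a simulated monthly series (not the DSLP data). The level, the sine-shaped seasonal pattern, the noise level and the start year below are all arbitrary assumptions chosen for illustration.

```r
# Additive seasonal adjustment on a simulated monthly series (not the DSLP data).
set.seed(7)
n_years <- 10
month   <- rep(1:12, times = n_years)

# level 10, a 12-month sine-wave seasonal pattern, and random noise
x <- ts(10 + 2 * sin(2 * pi * month / 12) + rnorm(12 * n_years, sd = 0.5),
        frequency = 12, start = 2000)

parts    <- decompose(x, "additive")
adjusted <- x - parts$seasonal       # seasonally adjusted series

# decompose() centres the 12 estimated seasonal effects so that they sum to zero
print(round(parts$figure, 2))
print(sum(parts$figure))

# removing the seasonal component reduces the variance of the series
print(c(raw = var(as.numeric(x)), adjusted = var(as.numeric(adjusted))))
```

Because the seasonal component repeats the same 12 values every year, subtracting it leaves the trend and the random component in the adjusted series.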

6. ACF & PACF

After removing the seasonality, the PACF and ACF of the adjusted Darwin sea level pressure series
were calculated.

Autocorrelation refers to the way the observations in a time series are related to each other. It
is measured by the simple correlation between the current observation (Y_t) and the observation
k periods before it (Y_{t-k}).

Partial autocorrelation measures the degree of association between Y_t and Y_{t-k} when the
effects of the intermediate lags 1, 2, 3, ..., (k-1) are removed.
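
As a small check of the definition, the sample autocorrelation at a given lag can be computed by hand and compared with R's acf function. The sketch below uses a simulated AR(1) series, not the DSLP data, and the helper name manual_acf is hypothetical.

```r
# Lag-k autocorrelation by hand, checked against acf(). Simulated data.
set.seed(123)
y <- as.numeric(arima.sim(model = list(ar = 0.6), n = 200))

# Hypothetical helper: sample autocorrelation between Y_t and Y_{t-k}
manual_acf <- function(y, k) {
  n     <- length(y)
  y_bar <- mean(y)
  # products of deviations for the pairs (Y_t, Y_{t-k}),
  # scaled by the total sum of squared deviations
  sum((y[(k + 1):n] - y_bar) * (y[1:(n - k)] - y_bar)) / sum((y - y_bar)^2)
}

r_manual <- sapply(1:5, function(k) manual_acf(y, k))
r_acf    <- acf(y, lag.max = 5, plot = FALSE)$acf[2:6]    # drop lag 0
r_pacf   <- pacf(y, lag.max = 5, plot = FALSE)$acf[1:5]   # pacf starts at lag 1

print(round(cbind(lag = 1:5, manual = r_manual, acf = r_acf, pacf = r_pacf), 4))
```

At lag 1 the partial autocorrelation equals the autocorrelation, because there are no intermediate lags to remove; the two differ from lag 2 onwards.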

7. Auto Regressive Model(p)

An AR model is one in which Y_t depends only on its own past values Y_{t-1}, Y_{t-2}, Y_{t-3},
etc. and a random error term e_t. Thus, Y_t = f(Y_{t-1}, Y_{t-2}, Y_{t-3}, ..., e_t). An
autoregressive model that depends on p of its past values is called an AR(p) model. It is
represented as
Y_t = β_0 + β_1 Y_{t-1} + β_2 Y_{t-2} + ... + β_p Y_{t-p} + e_t.

8. Moving Average Model(q)

A Moving Average model is one in which Y_t depends only on random error terms that follow
a white noise process. A moving average model that depends on the current error and q past
error terms is called an MA(q) model. It is represented as
Y_t = μ + e_t + θ_1 e_{t-1} + θ_2 e_{t-2} + ... + θ_q e_{t-q}.

The values of p and q were read from the graphs. The PACF plot shows significant lags up to
the fourth lag, so to find the ARIMA model we vary the value of ‘p’ from 1 to 4. Similarly, the
ACF plot shows significant lags up to the 10th lag, so we vary the value of ‘q’ from 0 to 10.
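
The reasoning behind reading p from the PACF and q from the ACF can be illustrated with the theoretical correlations of two simple models, computed with ARMAacf from the stats package. The coefficients 0.5 and 0.6 are arbitrary illustration values and are not estimates from the DSLP data.

```r
# Theoretical ACF and PACF of simple models, using ARMAacf() from the stats package.
# The coefficients 0.5 and 0.6 are arbitrary illustration values.

# AR(1): the ACF decays geometrically, the PACF is zero after lag 1
ar1_acf  <- ARMAacf(ar = 0.5, lag.max = 4)                # lags 0..4
ar1_pacf <- ARMAacf(ar = 0.5, lag.max = 4, pacf = TRUE)   # lags 1..4

# MA(1): the ACF is zero after lag 1, the PACF decays gradually
ma1_acf  <- ARMAacf(ma = 0.6, lag.max = 4)                # lags 0..4
ma1_pacf <- ARMAacf(ma = 0.6, lag.max = 4, pacf = TRUE)   # lags 1..4

print(round(ar1_acf, 4))
print(round(ar1_pacf, 4))
print(round(ma1_acf, 4))
print(round(ma1_pacf, 4))
```

For a pure AR(p) process the PACF cuts off after lag p, and for a pure MA(q) process the ACF cuts off after lag q. With real data and mixed ARMA behaviour the cut-offs are rarely this clean, which is why several candidate orders are compared.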

We start with a p value of 1 and a q value of 0, build the ARIMA model and find its AIC value.

The Akaike Information Criterion (AIC) is a widely used measure for comparing statistical
models. It balances the goodness of fit against the simplicity of the model; among candidate
models, the one with the lowest AIC is preferred.
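
The AIC can be reproduced from the log likelihood with AIC = 2k - 2 log(L), where k is the number of estimated parameters. The short sketch below applies this to the log likelihood values that R reports for the two fits shown in this section; the helper name aic_from_loglik is hypothetical, and the parameter counts assume that the innovation variance sigma^2 is counted alongside the coefficients and the intercept.

```r
# AIC = 2k - 2 * logLik, where k counts every estimated parameter
# (AR and MA coefficients, the intercept, and the innovation variance sigma^2).
aic_from_loglik <- function(loglik, k) 2 * k - 2 * loglik

# ARIMA(1,0,0): ar1, intercept, sigma^2                 -> k = 3
aic_100 <- aic_from_loglik(-1846.03, k = 3)
# ARIMA(2,0,2): ar1, ar2, ma1, ma2, intercept, sigma^2  -> k = 6
aic_202 <- aic_from_loglik(-1775.55, k = 6)

print(c(ARIMA_1_0_0 = aic_100, ARIMA_2_0_2 = aic_202))
```

The results, 3698.06 and 3563.10, agree with the aic values in the arima output. The ARIMA(2,0,2) model pays a penalty for three extra parameters but gains much more in log likelihood.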

> model<-arima(adjust_DSLP,order= c(1,0,0))


> model
Call:
arima(x = adjust_DSLP, order = c(1, 0, 0))

Coefficients:
         ar1  intercept
      0.5331     9.8886
s.e.  0.0226     0.0517

sigma^2 estimated as 0.818:  log likelihood = -1846.03,  aic = 3698.06

> model<-arima(adjust_DSLP,order= c(2,0,2))
> model

Call:
arima(x = adjust_DSLP, order = c(2, 0, 2))

Coefficients:
         ar1      ar2      ma1     ma2  intercept
      1.8229  -0.8451  -1.4904  0.5496     9.8839
s.e.  0.0426   0.0369   0.0489  0.0364     0.0614

sigma^2 estimated as 0.7395:  log likelihood = -1775.55,  aic = 3563.1

On creating ARIMA models for different values of p and q, the lowest AIC value is obtained when
the p value and the q value are both equal to 2. This is also corroborated by the auto.arima
function.

> auto.arima(adjust_DSLP)
Series: adjust_DSLP
ARIMA(2,0,2) with non-zero mean

Coefficients:
         ar1      ar2      ma1     ma2    mean
      1.8229  -0.8451  -1.4904  0.5496  9.8839
s.e.  0.0426   0.0369   0.0489  0.0364  0.0614

sigma^2 estimated as 0.7421:  log likelihood=-1775.55
AIC=3563.1   AICc=3563.16   BIC=3594.56
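
The manual search over p and q can also be written as a loop that records the AIC of every candidate. The following is a minimal sketch on a simulated ARMA(1,1) series, not the DSLP data; the grid limits and the simulation coefficients are arbitrary assumptions, and the order that wins on a simulated series need not be the true one.

```r
# Grid search over (p, q) by AIC on a simulated ARMA(1,1) series (not the DSLP data).
set.seed(2024)
y <- arima.sim(model = list(ar = 0.7, ma = 0.4), n = 300)

grid     <- expand.grid(p = 0:2, q = 0:2)   # candidate orders, d fixed at 0
grid$aic <- NA_real_

for (i in seq_len(nrow(grid))) {
  # some orders may fail to fit; keep NA for those instead of stopping the loop
  fit <- tryCatch(arima(y, order = c(grid$p[i], 0, grid$q[i])),
                  error = function(e) NULL)
  if (!is.null(fit)) grid$aic[i] <- fit$aic
}

best <- grid[which.min(grid$aic), ]         # which.min() ignores NA values
print(grid[order(grid$aic), ])
print(best)
```

Keeping all AIC values in one table makes it easy to see how close the runner-up models are, which a single "best" order hides.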

9. Conclusion

The ARIMA model built with a p value of 2 and a q value of 2 had the lowest AIC value.
Therefore, this model was taken to be the best-fit model for the given data set. This is further
confirmed by the ACF of the model residuals, which shows no significant autocorrelation, as
shown in the graph.

Additionally, the Ljung-Box test gave a p-value of 0.9037. At this p-value the null hypothesis,
which states that the residuals are independently distributed, cannot be rejected. This supports
the adequacy of the model.
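
For reference, the Ljung-Box statistic can be computed by hand and compared with Box.test. The sketch below uses simulated white noise in place of model residuals, and the choice of 10 lags is an arbitrary assumption (Box.test itself defaults to a single lag when no lag argument is given).

```r
# Ljung-Box statistic by hand, checked against Box.test(). Simulated white noise.
set.seed(99)
e <- rnorm(200)          # stands in for model residuals
h <- 10                  # number of lags tested (an arbitrary choice here)

n <- length(e)
r <- acf(e, lag.max = h, plot = FALSE)$acf[2:(h + 1)]   # autocorrelations at lags 1..h
Q <- n * (n + 2) * sum(r^2 / (n - 1:h))                 # Ljung-Box Q statistic
p_manual <- pchisq(Q, df = h, lower.tail = FALSE)       # chi-squared with h degrees of freedom

bt <- Box.test(e, lag = h, type = "Ljung-Box")
print(c(Q_manual = Q, Q_Box.test = unname(bt$statistic)))
print(c(p_manual = p_manual, p_Box.test = unname(bt$p.value)))
```

A large p-value means the residual autocorrelations are jointly no larger than expected from white noise, so independence is not rejected. When the test is applied to residuals of a fitted ARMA model, the fitdf argument of Box.test can be used to reduce the degrees of freedom by the number of fitted coefficients.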

10. Appendix

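# read the data; check.names = FALSE keeps the column name `sea pressure` unchanged
DSLP = read.csv("DSLP.csv", check.names = FALSE)
plot(DSLP)
summary(DSLP)
# install the required packages (only needed once)
install.packages("forecast")
install.packages("fpp")
install.packages("tseries")
install.packages("strucchange")
# load the packages
library(tseries)
library(strucchange)
library(fpp)
library(forecast)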

plotForecastErrors <- function(forecasterrors)
{
  # histogram of the forecast errors with a normal curve (mean 0, same sd) on top
  mybinsize <- IQR(forecasterrors) / 4
  mysd <- sd(forecasterrors)
  # generate normally distributed data with mean 0 and standard deviation mysd
  mynorm <- rnorm(10000, mean = 0, sd = mysd)
  # choose bins that cover both the forecast errors and the simulated normal data
  mymin <- min(forecasterrors, mynorm)
  mymax <- max(forecasterrors, mynorm)
  mybins <- seq(mymin, mymax + mybinsize, mybinsize)
  # make a red histogram of the forecast errors;
  # freq=FALSE ensures the area under the histogram = 1
  hist(forecasterrors, col = "red", freq = FALSE, breaks = mybins)
  myhist <- hist(mynorm, plot = FALSE, breaks = mybins)
  # plot the normal curve as a blue line on top of the histogram of forecast errors:
  points(myhist$mids, myhist$density, type = "l", col = "blue", lwd = 2)
}
kpss.test(DSLP$`sea pressure`,"Trend")
kpss.test(DSLP$`sea pressure`,"Level")
adf.test(DSLP$`sea pressure`)
Acf(DSLP$`sea pressure`)

# monthly series starting in January 1882
ts_DSLP = ts(DSLP$`sea pressure`, frequency = 12, start = 1882)
# remove the additive seasonal component
decompose_DSLP = decompose(ts_DSLP, "additive")
adjust_DSLP = ts_DSLP - decompose_DSLP$seasonal
plot(adjust_DSLP)
Acf(adjust_DSLP)
Pacf(adjust_DSLP)
# fit each candidate order (p = 1..4 with q = 0..3, then p = 5 with q = 0)
# and print the fitted model, including its AIC
orders <- rbind(expand.grid(p = 1:4, q = 0:3), data.frame(p = 5, q = 0))
for (i in seq_len(nrow(orders))) {
  cat("order:", orders$p[i], 0, orders$q[i], "\n")
  model <- arima(adjust_DSLP, order = c(orders$p[i], 0, orders$q[i]))
  print(model)
}
model<-arima(adjust_DSLP,order=c(2,0,2))
auto.arima(adjust_DSLP)
Acf(residuals(model))
Box.test(residuals(model), type="Ljung-Box")
plot.ts(residuals(model))
plotForecastErrors(residuals(model))
