April 2014
Question 1
A pension fund analyst is investigating pension packages for people working in three sectors:
Education, Government, and Industry. The study is conducted in five geographic regions:
Atlantic, Quebec, Ontario, Prairie, and BC. Four sample values are selected for each sector-
region combination.
Factor 1: Sector
Factor 2: Region
Total number of observations: 3 × 5 × 4 = 60 (3 sectors × 5 regions × 4 values per cell)
3. What type of statistical model is most appropriate for this study?
Source        DF      SS        MS       F      P
Region         4   24583    6145.8   16.15  0.000
Sector         2   76471   38235.5  100.50  0.000
Interaction    8    5232     654.0    1.72  0.120
Error         45   17120     380.5
Total         59  123407
4. Test the hypothesis that there is a significant interaction between Region and Sector.
Ho: No interaction
H1: Interaction
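The interaction test can be reproduced from the ANOVA table: each F is the effect's MS divided by MSE and is referred to an F(df, 45) distribution. The sketch below uses only the Python standard library. The helper names `f_pdf` and `f_pvalue` are hypothetical, and the p-value comes from Simpson's-rule integration of the F density, used here as a stand-in for a library routine such as SciPy's `scipy.stats.f.sf`. The results should match the rounded Minitab output to about three decimals.

```python
import math


def f_pdf(x: float, d1: int, d2: int) -> float:
    """Density of the F(d1, d2) distribution; this sketch assumes d1 >= 2."""
    if x <= 0.0:
        # At x = 0 the density equals 1 when d1 == 2 and 0 when d1 > 2.
        return 1.0 if d1 == 2 else 0.0
    log_beta = math.lgamma(d1 / 2) + math.lgamma(d2 / 2) - math.lgamma((d1 + d2) / 2)
    log_dens = ((d1 / 2) * math.log(d1) + (d2 / 2) * math.log(d2)
                + (d1 / 2 - 1) * math.log(x)
                - ((d1 + d2) / 2) * math.log(d1 * x + d2) - log_beta)
    return math.exp(log_dens)


def f_pvalue(f_stat: float, d1: int, d2: int, steps: int = 4000) -> float:
    """P(F > f_stat): one minus the CDF, integrated with Simpson's rule (steps must be even)."""
    h = f_stat / steps
    total = f_pdf(0.0, d1, d2) + f_pdf(f_stat, d1, d2)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f_pdf(i * h, d1, d2)
    return max(0.0, 1.0 - total * h / 3)


# (DF, SS) for each effect, copied from the ANOVA table.
effects = {"Region": (4, 24583), "Sector": (2, 76471), "Interaction": (8, 5232)}
df_error, ss_error = 45, 17120
mse = ss_error / df_error

results = {}
for name, (df, ss) in effects.items():
    f_stat = (ss / df) / mse
    p = f_pvalue(f_stat, df, df_error)
    results[name] = (f_stat, p)
    print(f"{name:<12} F = {f_stat:7.2f}   p = {p:.3f}")

alpha = 0.05
f_int, p_int = results["Interaction"]
if p_int < alpha:
    print("Reject H0: there is a significant Region x Sector interaction")
else:
    print("Do not reject H0: no significant interaction at the 5% level")
```

With p ≈ 0.120 > 0.05 for the interaction, H0 (no interaction) is not rejected, while both main effects have p-values far below 0.001.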
Ho: µE = µG = µI
We see that Industry has the highest values, Government is second, and Education has the
lowest values.
6. Although the interaction is not statistically significant at the 5% level, the analyst has decided
to inspect the interaction plot and obtained the following result.
[Interaction plot: x-axis Region (Atlantic, BC, Ontario, Prairie, Quebec); y-axis from about 150 to 250]
Would you agree that there is no significant interaction? Explain. Identify any aspect of the
graph that might indicate some weak interaction.
Ontario Prairie
Based on the scenario of Question 1, suppose the following data were collected:
Source  DF     SS    MS     F      P
Factor   4  17158  4289  3.79  0.040
Error   10  11312  1131
Total   14  28470
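The one-way table can be checked by hand. The sketch below assumes a balanced design with 5 regions and 3 observations per region (consistent with the DF column) and reuses the reported p-value rather than computing it.

```python
# One-way ANOVA arithmetic for the Region data.
k, n_per_group = 5, 3          # assumption: balanced design, as the DF column implies
n_total = k * n_per_group      # 15 observations

df_factor = k - 1              # 4
df_error = n_total - k         # 10
ss_factor, ss_error = 17158, 11312

ms_factor = ss_factor / df_factor   # about 4289
mse = ss_error / df_error           # about 1131
f_stat = ms_factor / mse            # about 3.79

p_value = 0.040                     # taken from the ANOVA output, not computed here
alpha = 0.05
print(f"F = {f_stat:.2f} on ({df_factor}, {df_error}) df")
print("Reject H0: the region means are not all equal" if p_value < alpha else "Do not reject H0")
```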
From the individual 95% CIs it appears that the only significant difference is between the mean
values for Quebec and BC.
Here is part of the Tukey post-hoc output:
Explain how you can tell that the mean value for Quebec is less than the mean value for BC.
The Tukey CI for µBC − µQ is (11.37, 191.96). Because the whole interval lies above zero, the mean for BC
is greater than the mean for Quebec by at least 11.37 and at most 191.96.
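$$
\begin{aligned}
\mu_{BC}-\mu_{Q} &= (\bar{x}_{BC}-\bar{x}_{Q}) \pm t_{.01;10}\, s\sqrt{\frac{1}{n_{BC}}+\frac{1}{n_{Q}}} \\
&= (270.33-168.67) \pm 2.764\,(33.63)\sqrt{\frac{1}{3}+\frac{1}{3}} \\
&= 101.66 \pm 75.90 \\
25.76 \le \mu_{BC}-\mu_{Q} &\le 177.56
\end{aligned}
$$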
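The interval arithmetic above can be reproduced in a few lines. This sketch hard-codes the t value 2.764 (t.01;10 from a table), because the Python standard library has no t quantile function; the variable names are illustrative.

```python
import math

# Values taken from the worked interval.
xbar_bc, xbar_q = 270.33, 168.67
n_bc = n_q = 3
mse = 1131              # MSE from the one-way ANOVA table
t_crit = 2.764          # t_{.01;10}, read from a t table (not computed here)

s = math.sqrt(mse)      # pooled standard deviation, about 33.63
diff = xbar_bc - xbar_q
half_width = t_crit * s * math.sqrt(1 / n_bc + 1 / n_q)
lower, upper = diff - half_width, diff + half_width
print(f"{diff:.2f} +/- {half_width:.2f}  ->  ({lower:.2f}, {upper:.2f})")
```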
Construct a Bonferroni CI for µ Quebec with a family error rate of 10%.
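$$
\begin{aligned}
\mu_{Q} &= 168.67 \pm t_{\alpha/2k;\,n_T-p}\sqrt{\frac{MSE}{3}} \\
&= 168.67 \pm t_{.01;10}\sqrt{\frac{1131}{3}} \\
&= 168.67 \pm 2.764\sqrt{377} \\
&= 168.67 \pm 53.67
\end{aligned}
$$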
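A similar sketch for the Bonferroni interval. It assumes the family consists of k = 5 intervals (one per regional mean), which is what gives α/2k = 0.10/10 = 0.01 and matches t.01;10; the t value is again hard-coded from a table.

```python
import math

family_alpha = 0.10
k = 5                                    # assumption: one interval per regional mean
alpha_per_tail = family_alpha / (2 * k)  # 0.01, matching t_{.01;10}
t_crit = 2.764                           # t_{.01;10} from a t table (not computed here)

mse, n_q, xbar_q = 1131, 3, 168.67
half_width = t_crit * math.sqrt(mse / n_q)   # 2.764 * sqrt(377)
lower, upper = xbar_q - half_width, xbar_q + half_width
print(f"alpha/2k = {alpha_per_tail:.3f}; mu_Q in ({lower:.2f}, {upper:.2f})")
```

The resulting interval is roughly (115.00, 222.34).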
The following table shows an extract of average monthly exchange rates from US to Canadian dollars
from January 2006 to December 2009:
[Figure: Time Series Plot of USD_CDN (y-axis: USD_CDN, about 0.95 to 1.30; x-axis: Index)]
[Figure: Trend Analysis Plot for USD_CDN, Linear Trend Model; series: Actual and Fits; x-axis: Index]
Linear Trend Model: Yt = 1.1095 - 0.000214*t
Accuracy Measures
MAPE 5.77722
MAD 0.06345
MSD 0.00581
Comment on trend.
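To put the slope of the linear trend model in perspective, the sketch below evaluates the fitted line at the first and last months (t = 1 and t = 48). Over the four-year window the fitted value falls by only about 0.01, less than 1% of its level, so the fitted linear trend is nearly flat. The helper name `trend` is illustrative.

```python
def trend(t: int) -> float:
    """Fitted linear trend from the Trend Analysis output (t = 1 is January 2006)."""
    return 1.1095 - 0.000214 * t


start, end = trend(1), trend(48)
change = end - start
print(f"Fitted value: {start:.4f} (t=1) -> {end:.4f} (t=48); change {change:+.4f}")
print(f"Change as a share of the starting level: {change / start:.2%}")
```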
Seasonal Indices
Period Index
1 1.03452
2 1.03519
3 1.03885
4 1.01790
5 0.98301
6 0.97400
7 0.97778
8 0.97726
9 0.97427
10 0.98507
11 0.99257
12 1.00959
Accuracy Measures
MAPE 5.30149
MAD 0.05773
MSD 0.00507
Estimate forecasts for Jan 2010 and June 2010 using trend and seasonal effects only. Assume t =
1 in January 2006.
In January 2010, t = 49. Therefore, we calculate forecast values for t = 49 (January 2010) and
t = 54 (June 2010).
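A sketch of the trend-times-seasonal forecast. It assumes a multiplicative decomposition and uses the Trend Analysis line Yt = 1.1095 - 0.000214*t as the trend component; the decomposition's own trend equation is not shown, so forecasts based on it may differ slightly. The helper names are illustrative.

```python
# Seasonal indices by month (period 1 = January), from the Seasonal Indices table.
seasonal_index = {
    1: 1.03452, 2: 1.03519, 3: 1.03885, 4: 1.01790, 5: 0.98301, 6: 0.97400,
    7: 0.97778, 8: 0.97726, 9: 0.97427, 10: 0.98507, 11: 0.99257, 12: 1.00959,
}


def trend(t: int) -> float:
    # Assumption: the linear trend model is used as the trend component.
    return 1.1095 - 0.000214 * t


def forecast(t: int) -> float:
    period = (t - 1) % 12 + 1   # t = 1 is January 2006, so t = 49 is period 1
    return trend(t) * seasonal_index[period]


for label, t in [("Jan 2010", 49), ("Jun 2010", 54)]:
    print(f"{label} (t={t}): {forecast(t):.4f}")
```

Under these assumptions the forecasts are about 1.1370 for January 2010 and 1.0694 for June 2010.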
The ACF and PACF graphs for the USD -> CDN exchange rates are shown below:
[Figures: Autocorrelation (ACF) and Partial Autocorrelation (PACF) functions for USD_CDN, plotted against Lag (ticks 1 to 35)]
Comment on seasonality in the data.
There is not much evidence of seasonal variation, because there are no spikes at the seasonal
lags (12, 24). The PACF has significant spikes at lags 1 and 2. The spike at lag 5 is probably due
to a random shock, since it is not at a seasonal lag and there is no obvious reason to expect lag 5
to exert a major influence on the exchange rate data.
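For reference, a common large-sample approximation treats a partial autocorrelation as significant when it falls outside ±2/√n. The sketch computes that bound for n = 48 and lists the seasonal lags for monthly data within the plotted range (assumed here to run to about lag 35, judging from the axis ticks). Minitab's ACF limits widen with lag, so this is an approximation rather than the plotted limits.

```python
import math

n = 48                        # number of monthly observations
bound = 2 / math.sqrt(n)      # approximate 5% limit for a PACF spike, about 0.289
max_lag = 35                  # assumption: largest lag shown on the plots
seasonal_lags = [lag for lag in range(1, max_lag + 1) if lag % 12 == 0]
print(f"Approximate PACF limits: +/-{bound:.3f}")
print(f"Seasonal lags to inspect for monthly data: {seasonal_lags}")
```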
Number of observations: 48
Residuals: SS = 0.0342418 (backforecasts excluded)
MS = 0.0007609 DF = 45
$$Y_t = \phi_0 + \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + e_t$$
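The residual degrees of freedom in this output are consistent with an AR(2) model with a constant: three estimated parameters (φ0, φ1, φ2) fitted to 48 observations leave 45 DF, and MS = SS/DF. A quick arithmetic check:

```python
n_obs = 48
n_params = 3                  # phi_0, phi_1, phi_2 in the AR(2) model
ss_resid = 0.0342418          # residual SS (backforecasts excluded)

df_resid = n_obs - n_params   # 45
ms_resid = ss_resid / df_resid
print(f"DF = {df_resid}, MS = {ms_resid:.7f}")
```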
Question 4
$$(1 - B^{4})(1 - \phi_1 B)\,Y_t = \phi_0 + e_t$$
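Expanding the backshift operators (B Y_t = Y_{t-1}, so B^k Y_t = Y_{t-k}) turns this model into an explicit recursion for Y_t. The derivation below is an added illustration, not part of the original solution:

$$
\begin{aligned}
(1 - B^{4})(1 - \phi_1 B) &= 1 - \phi_1 B - B^{4} + \phi_1 B^{5} \\
Y_t - \phi_1 Y_{t-1} - Y_{t-4} + \phi_1 Y_{t-5} &= \phi_0 + e_t \\
Y_t &= \phi_0 + \phi_1 Y_{t-1} + Y_{t-4} - \phi_1 Y_{t-5} + e_t
\end{aligned}
$$

Each forecast therefore uses the values 1, 4 and 5 periods back, with the lag-4 term entering with coefficient 1 because of the seasonal difference.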
The following annual time series data is to be analyzed to develop a suitable forecasting model.
t Y
1 8.7776
2 21.2374
3 13.9845
4 20.3498
5 13.2213
6 21.9456
7 16.8978
8 18.6708
9 15.9082
10 21.0304
11 22.6019
12 22.3881
13 20.2390
14 26.3421
15 26.7765
16 25.9876
17 21.0126
18 22.5638
19 28.7220
20 27.7834
[Figure: Time series plot of Y against Index (t = 1 to 20)]
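Is the data set stationary?
No. The series drifts upward over time (values lie between about 9 and 22 for t = 1 to 10 and
between about 20 and 29 for t = 11 to 20), so its mean is not constant.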
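One quick numerical check of non-stationarity in the mean, as a sketch: compare the averages of the first and second halves of the series. The clear shift (about 17.2 versus 24.4) is consistent with the upward drift visible in the plot.

```python
from statistics import mean

# Annual series Y for t = 1..20, copied from the Question 4 table.
y = [8.7776, 21.2374, 13.9845, 20.3498, 13.2213, 21.9456, 16.8978, 18.6708,
     15.9082, 21.0304, 22.6019, 22.3881, 20.2390, 26.3421, 26.7765, 25.9876,
     21.0126, 22.5638, 28.7220, 27.7834]

first_half, second_half = mean(y[:10]), mean(y[10:])
print(f"Mean for t = 1..10:  {first_half:.2f}")
print(f"Mean for t = 11..20: {second_half:.2f}")
```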
[Figure: Time series plot of the first differences D1 against Index]
Explain why this time series appears to be stationary with regard to the mean.
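The differenced series D1 shows no trend: it fluctuates around a roughly constant level throughout the series.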
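The same half-and-half comparison applied to the first differences D1 = Yt - Yt-1 (a sketch using Python's `statistics.mean`). Both halves average about 1 (roughly 1.36 and 0.68), which is small relative to individual changes ranging from about -7.3 to 12.5, in line with a constant mean.

```python
from statistics import mean

# Annual series Y for t = 1..20, copied from the Question 4 table.
y = [8.7776, 21.2374, 13.9845, 20.3498, 13.2213, 21.9456, 16.8978, 18.6708,
     15.9082, 21.0304, 22.6019, 22.3881, 20.2390, 26.3421, 26.7765, 25.9876,
     21.0126, 22.5638, 28.7220, 27.7834]

d1 = [b - a for a, b in zip(y, y[1:])]   # D1 at t = 2..20
print(f"Mean of D1 over t = 2..10:  {mean(d1[:9]):.2f}")
print(f"Mean of D1 over t = 11..20: {mean(d1[9:]):.2f}")
print(f"Range of D1: {min(d1):.2f} to {max(d1):.2f}")
```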
Analysis of Variance
Source          DF      SS      MS      F      P
Regression       1  307.58  307.58  27.63  0.000
Residual Error  18  200.40   11.13
Total           19  507.98
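The regression ANOVA (Y regressed on t) can be reproduced from the data with ordinary least squares. A sketch using only the standard library; it also computes the adjusted R² (0.584) quoted in the discussion of the model.

```python
from statistics import mean

# Annual series Y for t = 1..20, copied from the Question 4 table.
y = [8.7776, 21.2374, 13.9845, 20.3498, 13.2213, 21.9456, 16.8978, 18.6708,
     15.9082, 21.0304, 22.6019, 22.3881, 20.2390, 26.3421, 26.7765, 25.9876,
     21.0126, 22.5638, 28.7220, 27.7834]
n = len(y)
t = list(range(1, n + 1))

t_bar, y_bar = mean(t), mean(y)
s_tt = sum((ti - t_bar) ** 2 for ti in t)
s_ty = sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, y))
b1 = s_ty / s_tt                  # slope
b0 = y_bar - b1 * t_bar           # intercept

sst = sum((yi - y_bar) ** 2 for yi in y)   # total SS, n - 1 = 19 DF
ssr = b1 * s_ty                            # regression SS, 1 DF
sse = sst - ssr                            # residual SS, n - 2 = 18 DF
mse = sse / (n - 2)
f_stat = ssr / mse
adj_r2 = 1 - mse / (sst / (n - 1))

print(f"Y-hat = {b0:.4f} + {b1:.4f} t")
print(f"SSR = {ssr:.2f}  SSE = {sse:.2f}  MSE = {mse:.2f}  F = {f_stat:.2f}  adj R^2 = {adj_r2:.3f}")
```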
Unusual Observations
[Figure: four-in-one residual plots for the regression (normal probability plot, Residual vs. Fitted Value, histogram of residuals, Residual vs. Observation Order)]
Comment on the assumptions of the regression model. If you think any of the assumptions are
not satisfied, explain your reasoning.
The four-in-one plot shows that the assumptions of normality and homoscedasticity appear
to be satisfied. However, the “versus order” plot appears to have a pattern indicating
first-order autocorrelation. We check this suspicion by examining the Durbin-Watson
statistic, DW = 2.65415.
Looking up the 5% critical values of the DW statistic for n = 20 and k = 1, we get DL,.05 = 1.20 and
DU,.05 = 1.41. Because DW = 2.65415 is greater than 2, we compare it with the upper-tail values
4 – 1.41 = 2.59 and 4 – 1.20 = 2.80. Since DW lies between 2.59 and 2.80, it is in the inconclusive
region; therefore we cannot make any claim about the presence or absence of first-order negative
autocorrelation.
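The Durbin-Watson statistic and the bounds decision can be sketched as follows. The residuals come from the least-squares line of Y on t, and the bounds dL = 1.20 and dU = 1.41 are the 5% table values quoted in the text; `dw_decision` is an illustrative helper.

```python
# Annual series Y for t = 1..20, copied from the Question 4 table.
y = [8.7776, 21.2374, 13.9845, 20.3498, 13.2213, 21.9456, 16.8978, 18.6708,
     15.9082, 21.0304, 22.6019, 22.3881, 20.2390, 26.3421, 26.7765, 25.9876,
     21.0126, 22.5638, 28.7220, 27.7834]
n = len(y)
t = list(range(1, n + 1))

# Least-squares fit of Y on t, then residuals e_t.
t_bar, y_bar = sum(t) / n, sum(y) / n
b1 = (sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, y))
      / sum((ti - t_bar) ** 2 for ti in t))
b0 = y_bar - b1 * t_bar
resid = [yi - (b0 + b1 * ti) for ti, yi in zip(t, y)]

# Durbin-Watson: sum of squared successive differences over sum of squared residuals.
dw = sum((resid[i] - resid[i - 1]) ** 2 for i in range(1, n)) / sum(e * e for e in resid)


def dw_decision(d: float, d_l: float = 1.20, d_u: float = 1.41) -> str:
    """Reads the DW bounds in both tails (defaults: 5% bounds for n = 20, k = 1)."""
    if d < d_l:
        return "positive autocorrelation"
    if d < d_u:
        return "inconclusive"
    if d <= 4 - d_u:
        return "no evidence of autocorrelation"
    if d <= 4 - d_l:
        return "inconclusive"
    return "negative autocorrelation"


print(f"DW = {dw:.5f}: {dw_decision(dw)}")
```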
Thus, the simple regression model above may be an acceptable forecasting model. We note
that the adjusted R² = 0.584 and MSE = 11.13.
Let us now investigate some ARIMA models. Output for three models is shown. Discuss the
models and recommend one, justifying your conclusion.
[Figures: Autocorrelation and Partial Autocorrelation functions, lags 1 to 6]
ARIMA(1,1,0)
This model has a significant p-value for the lag 1 term (Yt-1), but its residual MS is 13.33
(compared with 11.13 for the SLR model).
ARIMA(0,1,1)
Here we see that both the AR(1) and MA(1) terms are significant, and the residual MS has
dropped substantially to 8.863.