MGSC 372 Final Review
April 2014
Question 1
A pension fund analyst is investigating pension packages for people working in three sectors:
Education, Government, and Industry. The study is conducted in five geographic regions:
Atlantic, Quebec, Ontario, Prairie, and BC. Four sample values are selected for each sector-
region combination.
1. What are the factors for this study?
Factor 1: Sector
Factor 2: Region
Specify the levels of each factor.
Factor 1: Level 1 = Education
          Level 2 = Government
          Level 3 = Industry
Factor 2: Level 1 = Atlantic
          Level 2 = Quebec
          Level 3 = Ontario
          Level 4 = Prairie
          Level 5 = BC
2. What is the total sample size?
3 sectors x 5 regions x 4 observations per combination = 60
3. What type of statistical model is most appropriate for this study?
Two-way ANOVA with replication
Minitab output is as follows:
Two-way ANOVA: Value versus Region, Sector
Source        DF      SS       MS       F      P
Region         4   24583   6145.8   16.15  0.000
Sector         2   76471  38235.5  100.50  0.000
Interaction    8    5232    654.0    1.72  0.120
Error         45   17120    380.5
Total         59  123407
4. Test the hypothesis that there is a significant interaction between Region and Sector.
Ho: No interaction between Region and Sector
H1: Interaction is present
TS: F = 654/380.5 = 1.72
CV: F.05;8,45 ≈ 2.18
Conclusion: Since 1.72 < 2.18 (P = 0.120 > 0.05), do not reject Ho, i.e. no significant interaction

5. Test the hypothesis that the main effect Sector is significant.
Ho: µE = µG = µI
H1: Not all µ are equal
TS: F = 38235.5/380.5 = 100.5
CV: F.05;2,45 ≈ 3.2
Conclusion: Since 100.5 > 3.2 (P = 0.000), reject Ho => Not all Sector means are equal
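The two F tests above can be re-derived from the SS and DF columns of the ANOVA table. The sketch below is only a check of the arithmetic: the critical values are the table values quoted in the answers, not computed, and the variable names are illustrative.

```python
# Rebuild MS and F for the two-way ANOVA table from its SS and DF columns.
anova = {
    "Region": (4, 24583),
    "Sector": (2, 76471),
    "Interaction": (8, 5232),
}
error_df, error_ss = 45, 17120
mse = error_ss / error_df  # mean square error

# Critical values as quoted in the answers (alpha = 0.05), not computed here.
critical = {"Sector": 3.2, "Interaction": 2.18}

f_stats = {}
for source, (df, ss) in anova.items():
    ms = ss / df
    f_stats[source] = ms / mse
    print(f"{source:12s} MS = {ms:9.1f}  F = {f_stats[source]:7.2f}")

for source, cv in critical.items():
    decision = "reject Ho" if f_stats[source] > cv else "do not reject Ho"
    print(f"{source}: F = {f_stats[source]:.2f} vs CV = {cv} -> {decision}")
```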
The Minitab output for Individual 95% CIs is shown below:
Individual 95% CIs For Mean Based on Pooled StDev
Region     Mean
Atlantic   250.167
BC         268.500
Ontario    225.750
Prairie    245.000
Quebec     209.917
[Interval plot omitted; axis ticks at 200, 225, 250, 275]

Sector      Mean
Education   197.0
Government  238.2
Industry    284.4
[Interval plot omitted; axis ticks at 210, 240, 270, 300]
We see that Industry has the highest mean value, Government is second, and Education has
the lowest.
6. Although the interaction is not statistically significant at the 5% level, the analyst has decided
to inspect the interaction plot, and obtained the following result.
[Figure: Interaction Plot for Value (Data Means). x-axis: Region (Atlantic, BC, Ontario, Prairie, Quebec); y-axis: Mean (150 to 350); one line per Sector (Education, Government, Industry).]
Would you agree that there is no significant interaction? Explain. Identify any aspect of the
graph that might indicate some weak interaction.
Overall patterns are very similar, with the following exception:
From Ontario to Prairie:
- Govt and Industry increase
- Education decreases
Question 2
Based on the scenario of Question 1, suppose the following data were collected:
A one-way ANOVA is conducted with the following results:
One-way ANOVA: Atlantic, Quebec, Ontario, Prairie, BC
Source  DF     SS    MS     F      P
Factor   4  17158  4289  3.79  0.040
Error   10  11312  1131
Total   14  28470
S = 33.63   R-Sq = 60.27%   R-Sq(adj) = 44.37%

Individual 95% CIs For Mean Based on Pooled StDev
Level     N    Mean  StDev
Atlantic  3  240.00  32.79
Quebec    3  168.67   3.51
Ontario   3  210.00  36.06
Prairie   3  231.67  45.37
BC        3  270.33  34.79
[Interval plot omitted; axis ticks at 150, 200, 250, 300]
What can you conclude from the above output?
Since P = 0.040 < 0.05, reject Ho and conclude that not all 5 means are equal.
From the individual 95% CIs it appears that the only significant difference is between the mean
values for Quebec and BC.
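The summary statistics in the one-way ANOVA output (F, S, R-Sq, R-Sq(adj)) all follow from the SS and DF columns. A minimal sketch, using only the numbers printed in the table; the variable names are illustrative:

```python
import math

# SS and DF columns of the one-way ANOVA table.
ss_factor, df_factor = 17158, 4
ss_error, df_error = 11312, 10
ss_total, df_total = 28470, 14

ms_factor = ss_factor / df_factor
ms_error = ss_error / df_error          # MSE
f_stat = ms_factor / ms_error           # F = MS(Factor) / MSE
s = math.sqrt(ms_error)                 # pooled standard deviation
r_sq = ss_factor / ss_total             # share of variation explained
r_sq_adj = 1 - ms_error / (ss_total / df_total)

print(f"F = {f_stat:.2f}, S = {s:.2f}")
print(f"R-Sq = {100 * r_sq:.2f}%, R-Sq(adj) = {100 * r_sq_adj:.2f}%")
```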
Here is part of the Tukey post-hoc output:
Tukey 95% Simultaneous Confidence Intervals
All Pairwise Comparisons
Individual confidence level = 99.18%

Quebec subtracted from:
          Lower  Center   Upper
Ontario  -48.96   41.33  131.63
Prairie  -27.29   63.00  153.29
BC        11.37  101.67  191.96
[Interval plot omitted; axis ticks at -100, 0, 100, 200]
Explain how you can tell that the mean value for Quebec is less than the mean value for BC.
The Tukey CI for µBC - µQ is (11.37, 191.96). Because the entire interval lies above zero, the
mean for BC is greater than the mean for Quebec by at least 11.37 and at most 191.96.
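The reading rule used here (a difference is significant when its simultaneous interval excludes zero) can be applied to all three intervals in the Tukey output. A small sketch with the printed bounds; the dictionary layout is just for illustration:

```python
# Tukey intervals for (region mean - Quebec mean), as printed in the output.
tukey = {
    "Ontario": (-48.96, 131.63),
    "Prairie": (-27.29, 153.29),
    "BC": (11.37, 191.96),
}

# A pair differs significantly when zero is outside the interval.
significant = {
    region: not (lower <= 0 <= upper)
    for region, (lower, upper) in tukey.items()
}

for region, flag in significant.items():
    verdict = "differs from Quebec" if flag else "no significant difference"
    print(f"{region}: {tukey[region]} -> {verdict}")
```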
Construct a 98% CI for µ BC - µ Q
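$$\mu_{BC} - \mu_Q = (\bar{x}_{BC} - \bar{x}_Q) \pm t_{.01;10}\, s \sqrt{\frac{1}{n_{BC}} + \frac{1}{n_Q}}$$
$$= (270.33 - 168.67) \pm 2.764\,(33.63)\sqrt{\frac{1}{3} + \frac{1}{3}}$$
$$= 101.66 \pm 75.90$$
$$25.76 \le \mu_{BC} - \mu_Q \le 177.56$$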
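The 98% interval can be checked numerically. In this sketch the critical value t.01;10 = 2.764 is taken from a t table rather than computed, and `diff_ci` is a hypothetical helper name:

```python
import math

def diff_ci(mean_a, mean_b, n_a, n_b, s_pooled, t_crit):
    """CI for mean_a - mean_b using the pooled standard deviation."""
    diff = mean_a - mean_b
    half_width = t_crit * s_pooled * math.sqrt(1 / n_a + 1 / n_b)
    return diff - half_width, diff + half_width

# BC versus Quebec: S = 33.63 from the ANOVA output, t.01;10 = 2.764 (table value).
lower, upper = diff_ci(270.33, 168.67, 3, 3, 33.63, 2.764)
print(f"{lower:.2f} <= mu_BC - mu_Q <= {upper:.2f}")
```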
Construct a Bonferroni CI for µ Quebec with a family error rate of 10%.
$$\mu_Q = 168.67 \pm t_{\alpha/2k;\,n_T-p} \sqrt{\frac{MSE}{3}}$$
$$= 168.67 \pm t_{.01;10} \sqrt{\frac{1131}{3}}$$
$$= 168.67 \pm 2.764\sqrt{377}$$
$$= 168.67 \pm 53.67$$
The Bonferroni CI with family confidence = 90% is 115 ≤ µQ ≤ 222.34
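The Bonferroni adjustment splits the family error rate across the intervals in the family. The sketch below assumes k = 5 (one interval per region), which is what the use of t.01;10 in the solution implies; the t value itself is a table lookup, not computed.

```python
import math

family_alpha = 0.10
k = 5                           # assumption: one interval per region mean
tail = family_alpha / (2 * k)   # upper-tail area per interval

t_crit = 2.764                  # t.01;10 from a t table
mse, n_q, mean_q = 1131, 3, 168.67

half_width = t_crit * math.sqrt(mse / n_q)
lower, upper = mean_q - half_width, mean_q + half_width
print(f"tail area = {tail:.2f}, half-width = {half_width:.2f}")
print(f"{lower:.2f} <= mu_Q <= {upper:.2f}")
```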

Question 3
The following table shows an extract of average monthly exchange rates from US to Canadian dollars
from Jan 2006 to December 2009:
[Figure: Time Series Plot of USD_CDN. x-axis: Index (monthly, starting at 1); y-axis: USD_CDN (0.95 to 1.30).]
Trend Analysis Plot for USD_CDN
Linear Trend Model
Yt = 1.1095 - 0.000214*t
Accuracy Measures
MAPE 5.77722
MAD 0.06345
MSD 0.00581
[Figure: actual and fitted USD_CDN versus Index; y-axis: USD_CDN (0.95 to 1.30).]
Comment on trend.
Very little trend. The slope is -0.000214 per month.
Comment on stationarity with regard to the mean.
Since the trend is negligible, we can consider the series to be stationary with regard to the mean.
Seasonal Indices
Period Index
1 1.03452
2 1.03519
3 1.03885
4 1.01790
5 0.98301
6 0.97400
7 0.97778
8 0.97726
9 0.97427
10 0.98507
11 0.99257
12 1.00959
Accuracy Measures
MAPE 5.30149
MAD 0.05773
MSD 0.00507
Estimate forecasts for Jan 2010 and June 2010 using trend and seasonal effects only. Assume t =
1 in January 2006.
In Jan 2010, t = 49. Therefore, we are calculating forecast values for t = 49 (Jan 2010) and t
= 54 (June 2010)
Jan 2010: t = 49 => T49 = 1.1095 - .000214(49) = 1.099014
S1 = 1.03452
F49 = T49 x S1 = 1.099014 x 1.03452 = 1.13695
June 2010: t = 54 => T54 = 1.1095 - .000214(54) = 1.097944
S6 = 0.974
F54 = T54 x S6 = 1.097944 x 0.974 = 1.0694
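The two forecasts follow the multiplicative rule F = T x S. A minimal sketch, using the fitted trend line and the twelve seasonal indices from the output; the function names are illustrative, and the month lookup relies on t = 1 being January 2006:

```python
# Seasonal indices for periods 1..12 (January..December).
SEASONAL = [1.03452, 1.03519, 1.03885, 1.01790, 0.98301, 0.97400,
            0.97778, 0.97726, 0.97427, 0.98507, 0.99257, 1.00959]

def trend(t):
    """Fitted linear trend Yt = 1.1095 - 0.000214*t."""
    return 1.1095 - 0.000214 * t

def forecast(t):
    """Trend times seasonal index; t = 1 is January 2006."""
    month = (t - 1) % 12        # 0 = January, ..., 11 = December
    return trend(t) * SEASONAL[month]

print(f"Jan 2010  (t = 49): {forecast(49):.5f}")
print(f"June 2010 (t = 54): {forecast(54):.4f}")
```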
Calculate a deseasonalized exchange rate for April 2007
April 2007 corresponds to t = 16.
Deseasonalized value for April 2007 = Y16/S4 = 1.13425/1.0179 = 1.1143
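The time index and the deseasonalized value can be checked the same way. The helper `time_index` is hypothetical; the observed value Y16 = 1.13425 and the index S4 = 1.0179 are the ones used in the solution.

```python
def time_index(year, month, start_year=2006):
    """Month counter with t = 1 in January of start_year."""
    return 12 * (year - start_year) + month

t = time_index(2007, 4)         # April 2007
y16 = 1.13425                   # observed rate used in the solution
s4 = 1.0179                     # seasonal index for period 4 (April)

deseasonalized = y16 / s4       # remove the seasonal effect by division
print(f"t = {t}, deseasonalized value = {deseasonalized:.4f}")
```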
The ACF and PACF graphs for the USD -> CDN exchange rates are shown below:
[Figure: Autocorrelation Function and Partial Autocorrelation Function for USD_CDN, each with 5% significance limits. x-axis: Lag (1 to 35); y-axis: (partial) autocorrelation (-1.0 to 1.0).]
Comment on seasonality in the data.
Not much evidence of seasonal variation. (Because there are no spikes at seasonal lags)
Specify a potential ARIMA model.
Exponential decay in ACF
Significant spikes in PACF at lags 1 and 2. The spike at lag 5 is probably due to a random
shock since it is not at a seasonal lag and there is no obvious reason to expect that lag 5
would exert a major influence on the exchange rate data.
Therefore we suggest an ARIMA(2,0,0) model.
The ARIMA(2,0,0) model is shown below:
Final Estimates of Parameters
Type          Coef   SE Coef      T      P
AR 1        1.2983    0.1372   9.47  0.000
AR 2       -0.3988    0.1373  -2.90  0.006
Constant  0.111048  0.003992  27.82  0.000
Mean       1.10474   0.03971
Number of observations: 48
Residuals: SS = 0.0342418 (backforecasts excluded)
           MS = 0.0007609 DF = 45
Write out the theoretical model.
$$Y_t = \phi_0 + \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + e_t$$
Write out the estimated model.
$$\hat{Y}_t = 0.111048 + 1.2983\,Y_{t-1} - 0.3988\,Y_{t-2}$$

The last four months of data appear as follows:
Sep 2009  1.08176190  0.92441784
Oct 2009  1.05485238  0.94799995
Nov 2009  1.05957500  0.94377463
Dec 2009  1.05440000  0.94840668
Use this model to forecast values for Jan 2010.
Jan 2010: Ŷ49 = .111048 + 1.2983Y48 - .3988Y47
= .111048 + 1.2983(1.0544) - .3988(1.059575)
= 1.057417
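The one-step forecast is a direct substitution into the estimated AR(2) equation. The sketch below repeats it and also checks that the reported constant and mean are consistent, using the AR identity constant = mean x (1 - φ1 - φ2); the small gap comes from rounding of the printed coefficients. `ar2_forecast` is a hypothetical helper name.

```python
def ar2_forecast(const, phi1, phi2, y_prev1, y_prev2):
    """One-step AR(2) forecast from the two most recent observations."""
    return const + phi1 * y_prev1 + phi2 * y_prev2

const, phi1, phi2 = 0.111048, 1.2983, -0.3988
y48, y47 = 1.05440000, 1.05957500     # Dec 2009 and Nov 2009

y49 = ar2_forecast(const, phi1, phi2, y48, y47)
print(f"Forecast for Jan 2010: {y49:.6f}")

# Consistency check against the reported Mean of 1.10474.
implied_mean = const / (1 - phi1 - phi2)
print(f"Implied mean: {implied_mean:.4f}")
```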
Question 4
Consider the ARIMA model ARIMA(2,1,1)
a) Express this model using the Backshift Operator.
$$(1 - B)(1 - \phi_1 B - \phi_2 B^2)\,Y_t = \phi_0 + (1 - \theta_1 B)\,e_t$$
b) Consider the model ARIMA(1,0,0)(0,1,0)4.
1) Express this in backshift notation.
$$(1 - B^4)(1 - \phi_1 B)\,Y_t = \phi_0 + e_t$$
2) Express this in a form to forecast Yt based on lagged values of Yt.
$$Y_t = \phi_0 + \phi_1 Y_{t-1} + Y_{t-4} - \phi_1 Y_{t-5} + e_t$$
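Going from backshift notation to the forecasting form is polynomial multiplication in B. The sketch below expands the left-hand side of the seasonal model for an arbitrary illustrative value φ1 = 0.5 (a hypothetical number, not an estimate from any data) and reads off the lag coefficients.

```python
def poly_mul(a, b):
    """Multiply two polynomials in B given as coefficient lists (index = power)."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, a_i in enumerate(a):
        for j, b_j in enumerate(b):
            out[i + j] += a_i * b_j
    return out

phi1 = 0.5                          # hypothetical value, for illustration only
seasonal_diff = [1, 0, 0, 0, -1]    # 1 - B^4
ar1 = [1, -phi1]                    # 1 - phi1*B

lhs = poly_mul(seasonal_diff, ar1)  # coefficients of B^0 .. B^5 applied to Y_t

# Moving every lagged term to the right-hand side flips its sign.
lag_coefs = {k: -c for k, c in enumerate(lhs) if k > 0 and c != 0}
print(lhs)
print(lag_coefs)
```

With φ1 = 0.5 the lag coefficients are +φ1 at lag 1, +1 at lag 4 and -φ1 at lag 5, matching the forecasting form in part 2.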

Question 5
The following annual time series data is to be analyzed to develop a suitable forecasting model.
t Y
1 8.7776
2 21.2374
3 13.9845
4 20.3498
5 13.2213
6 21.9456
7 16.8978
8 18.6708
9 15.9082
10 21.0304
11 22.6019
12 22.3881
13 20.2390
14 26.3421
15 26.7765
16 25.9876
17 21.0126
18 22.5638
19 28.7220
20 27.7834
The time series plot follows:
[Figure: Time Series Plot of Y. x-axis: Index (1 to 20); y-axis: Y (10 to 30).]
Is the data set stationary?
No. The series trends upward over time, so its mean is not constant.
How would you make it stationary?
Take a first difference: D1 = Yt – Yt-1
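First differencing is a one-line transformation of the data table. A minimal sketch using the 20 observations listed above (the list names are illustrative):

```python
# Annual series Y for t = 1..20, copied from the data table.
y = [8.7776, 21.2374, 13.9845, 20.3498, 13.2213, 21.9456, 16.8978,
     18.6708, 15.9082, 21.0304, 22.6019, 22.3881, 20.2390, 26.3421,
     26.7765, 25.9876, 21.0126, 22.5638, 28.7220, 27.7834]

# D1 at time t is Y[t] - Y[t-1]; the first observation has no predecessor,
# so the differenced series has one fewer value.
d1 = [y[t] - y[t - 1] for t in range(1, len(y))]

print(len(d1))
print([round(d, 4) for d in d1[:3]])
```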
The time series plot of the first difference D1 is as follows:
[Figure: Time Series Plot of D1. x-axis: Index (1 to 20); y-axis: D1 (-10 to 15).]
Explain why this time series appears to be stationary with regard to the mean.
No trend: the differenced series fluctuates around a roughly constant level.
A simple linear regression of Y on time (t) appears as follows:
Regression Analysis: Y versus t
The regression equation is
Y = 13.7 + 0.680 t

Predictor    Coef  SE Coef     T      P
Constant   13.681    1.550  8.83  0.000
t          0.6801   0.1294  5.26  0.000

S = 3.33663   R-Sq = 60.6%   R-Sq(adj) = 58.4%

Analysis of Variance
Source          DF      SS      MS      F      P
Regression       1  307.58  307.58  27.63  0.000
Residual Error  18  200.40   11.13
Total           19  507.98

Unusual Observations
Obs    t       Y     Fit  SE Fit  Residual  St Resid
  2  2.0  21.237  15.041   1.329     6.196     2.02R
R denotes an observation with a large standardized residual.

Durbin-Watson statistic = 2.65415
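The regression coefficients, residual sum of squares and Durbin-Watson statistic in this output can be reproduced from the data table with the usual least-squares formulas. This is a plain-Python sketch for checking the numbers, not a reproduction of how Minitab computes them:

```python
y = [8.7776, 21.2374, 13.9845, 20.3498, 13.2213, 21.9456, 16.8978,
     18.6708, 15.9082, 21.0304, 22.6019, 22.3881, 20.2390, 26.3421,
     26.7765, 25.9876, 21.0126, 22.5638, 28.7220, 27.7834]
n = len(y)
t = list(range(1, n + 1))

t_bar = sum(t) / n
y_bar = sum(y) / n
sxx = sum((ti - t_bar) ** 2 for ti in t)
sxy = sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, y))

slope = sxy / sxx
intercept = y_bar - slope * t_bar

# Residuals, SSE, and the Durbin-Watson statistic:
# DW = sum of squared successive residual differences / SSE.
resid = [yi - (intercept + slope * ti) for ti, yi in zip(t, y)]
sse = sum(e * e for e in resid)
dw = sum((resid[i] - resid[i - 1]) ** 2 for i in range(1, n)) / sse

print(f"Y = {intercept:.3f} + {slope:.4f} t")
print(f"SSE = {sse:.2f}, MSE = {sse / (n - 2):.2f}, DW = {dw:.5f}")
```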
[Figure: Residual Plots for Y (four-in-one): Normal Probability Plot (Percent versus Residual), Versus Fits (Residual versus Fitted Value), Histogram (Frequency versus Residual), Versus Order (Residual versus Observation Order).]
Comment on the assumptions of the regression model. If you think any of the assumptions are
not satisfied, explain your reasoning.
The four-in-one plot shows that the assumptions of normality and homoscedasticity appear
to be satisfied. However, the “versus order” plot looks as if it may have a pattern
indicating first order autocorrelation. We check this suspicion by examining the
Durbin-Watson statistic of 2.65415.
Looking up the critical values of the DW statistic for n = 20 and k = 1 we get DL,.05 = 1.20 and
DU,.05 = 1.41. Since DW = 2.65415 is greater than 2, we must look at the upper tail values
4 – 1.41 = 2.59 and 4 – 1.20 = 2.80. Since DW = 2.65415 lies between 2.59 and 2.80 it is in the
inconclusive region, therefore we cannot make any claim about the presence or absence of
first order negative autocorrelation.
Thus, the simple regression model above may be an acceptable forecasting model. We note
that the adjusted R2 = .584 and MSE = 11.13.
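The Durbin-Watson decision rule used in this answer can be written out as a short function. The sketch assumes the standard five-region rule with bounds DL and DU read from a table; the function name and the wording of the labels are illustrative.

```python
def durbin_watson_decision(dw, d_lower, d_upper):
    """Classify a DW statistic using table bounds DL and DU."""
    if dw < d_lower:
        return "positive autocorrelation"
    if dw <= d_upper:
        return "inconclusive (positive side)"
    if dw < 4 - d_upper:
        return "no autocorrelation"
    if dw <= 4 - d_lower:
        return "inconclusive (negative side)"
    return "negative autocorrelation"

# n = 20, k = 1, alpha = .05: DL = 1.20, DU = 1.41 (table values from the text).
decision = durbin_watson_decision(2.65415, 1.20, 1.41)
print(decision)
```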
Let us now investigate some ARIMA models. Output for three models is shown. Discuss the
models and recommend one, justifying your conclusion.
The ACF and PACF functions for D1 are shown below:
[Figure: Autocorrelation Function and Partial Autocorrelation Function for D1, each with 5% significance limits. x-axis: Lag (1 to 6); y-axis: (partial) autocorrelation (-1.0 to 1.0).]
We see that these graphs have a significant spike at lag 1, so an ARIMA(1,1,0) or
ARIMA(0,1,1) would be plausible models. Let’s investigate further.
ARIMA(1,1,0)
Type         Coef  SE Coef      T      P
AR 1      -0.8386   0.1462  -5.74  0.000
Constant   1.3981   0.8380   1.67  0.114
Differencing: 1 regular difference
Number of observations: Original series 20, after differencing 19
MS = 13.330 DF = 17
In this model the AR(1) term is significant (P = 0.000), but the constant is not (P = 0.114)
and the MS value is 13.33 (compared with 11.13 for the SLR model).
ARIMA(0,1,1)
Type        Coef  SE Coef     T      P
MA 1      0.9458   0.2839  3.33  0.004
Constant  0.7029   0.1413  4.97  0.000
MS = 10.529 DF = 17
This model also has a significant lag 1 term, here the MA(1) term on et-1 (P = 0.004), and the
MS value has been reduced to 10.529, better than the SLR model value of 11.13.
Finally, we will look at the mixed model ARIMA(1,1,1):
Type         Coef  SE Coef      T      P
AR 1      -0.4981   0.2103  -2.37  0.031
MA 1       0.9302   0.2233   4.17  0.001
Constant  0.98249  0.08066  12.18  0.000
MS = 8.863 DF = 16
Here we see that both the AR(1) term and the MA(1) term are significant, and the MS value
has dropped substantially to 8.863, the lowest of the four models.
This is the model I recommend!
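The comparison above uses two checks: each term's T ratio (Coef / SE Coef) and the residual MS of each model. A small sketch of both, following the text's own criterion of preferring the smallest MS; the dictionary layout is illustrative.

```python
# Residual mean squares reported for each candidate model.
ms_by_model = {
    "SLR": 11.13,
    "ARIMA(1,1,0)": 13.330,
    "ARIMA(0,1,1)": 10.529,
    "ARIMA(1,1,1)": 8.863,
}
best = min(ms_by_model, key=ms_by_model.get)

# T ratios for the ARIMA(1,1,1) terms: coefficient divided by its standard error.
terms = {"AR 1": (-0.4981, 0.2103), "MA 1": (0.9302, 0.2233)}
t_ratios = {name: coef / se for name, (coef, se) in terms.items()}

print(f"Smallest MS: {best} ({ms_by_model[best]})")
for name, t_ratio in t_ratios.items():
    print(f"{name}: T = {t_ratio:.2f}")
```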
