Forecasting for Business and Economics
Ch5. Regression models
OTexts.org/fpp2/
Outline
Multiple regression and forecasting
autoplot(uschange[,c("Consumption","Income")]) +
ylab("% change") + xlab("Year")
[Figure: quarterly % change in US Consumption and Income, plotted against Year]
Example: US consumption expenditure
[Figures: scatterplot of Consumption (quarterly % change), and time plots of quarterly % changes in Production, Savings, and Unemployment, 1970-2010]
Example: US consumption expenditure
[Figure: scatterplot matrix of the five uschange variables, with pairwise correlations:]

              Income   Production   Savings   Unemployment
Consumption    0.399        0.547    -0.24         -0.541
Income                      0.278     0.714        -0.23
Production                          -0.0687        -0.787
Savings                                             0.11
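The matrix above can be reproduced with GGally; a minimal sketch, assuming the uschange data from the fpp2 package:

library(fpp2)
library(GGally)
uschange %>% as.data.frame() %>% ggpairs()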
Example: US consumption expenditure
fit.consMR <- tslm(
Consumption ~ Income + Production + Unemployment + Savings,
data=uschange)
summary(fit.consMR)
##
## Call:
## tslm(formula = Consumption ~ Income + Production + Unemployment +
## Savings, data = uschange)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.88296 -0.17638 -0.03679 0.15251 1.20553
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.26729 0.03721 7.184 1.68e-11 ***
## Income 0.71449 0.04219 16.934 < 2e-16 ***
## Production 0.04589 0.02588 1.773 0.0778 .
## Unemployment -0.20477 0.10550 -1.941 0.0538 .
## Savings -0.04527 0.00278 -16.287 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3286 on 182 degrees of freedom
## Multiple R-squared: 0.754, Adjusted R-squared: 0.7486
## F-statistic: 139.5 on 4 and 182 DF, p-value: < 2.2e-16
Example: US consumption expenditure
[Figure: Percentage change in US consumption expenditure — actual Data and Fitted values plotted over time]
[Figure: Data (actual values) plotted against Fitted (predicted values)]
Example: US consumption expenditure
[Figure: Residuals from Linear regression model — residual time plot, ACF, and histogram]
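These diagnostics can be produced in one call with checkresiduals() from the forecast package, which also reports the Breusch-Godfrey test discussed below:

checkresiduals(fit.consMR)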
Outline
Multiple regression and forecasting
Residual plots
Residual patterns
Breusch-Godfrey test
OLS regression:
yt = β0 + β1 xt,1 + · · · + βk xt,k + ut
Auxiliary regression:
ût = β0 + β1 xt,1 + · · · + βk xt,k + ρ1 ût−1 + · · · + ρp ût−p + εt
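In practice checkresiduals(fit.consMR) computes this test automatically; purely to illustrate the auxiliary regression, a hand-rolled sketch (initial lagged residuals padded with zeros):

u <- as.numeric(residuals(fit.consMR))
p <- 8; n <- length(u)
ulags <- sapply(seq_len(p), function(i) c(rep(0, i), u[seq_len(n - i)]))
aux <- lm(u ~ uschange[, "Income"] + uschange[, "Production"] +
              uschange[, "Unemployment"] + uschange[, "Savings"] + ulags)
LM <- n * summary(aux)$r.squared          # LM statistic, asymptotically chi-squared(p)
pchisq(LM, df = p, lower.tail = FALSE)    # p-value, compare with the output below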
##
## Breusch-Godfrey test for serial correlation of order up to 8
##
## data: Residuals from Linear regression model
## LM test = 14.874, df = 8, p-value = 0.06163
Outline
Trend
Linear trend: use x_t = t as a predictor, for t = 1, 2, …, T.
This makes the strong assumption that the trend will continue.
Dummy variables
If a categorical variable takes only two values (e.g., ‘Yes’ or ‘No’), then an equivalent numerical variable can be constructed taking the value 1 if yes and 0 if no. This is called a dummy variable.
Dummy variables
If there are more than two categories, then the variable can be coded using several dummy variables (one fewer than the total number of categories).
Beware of the dummy variable trap!
Using one dummy per category alongside an intercept makes the regressors perfectly collinear, so the model cannot be estimated.
Uses of dummy variables
Seasonal dummies
For quarterly data: use 3 dummies.
For monthly data: use 11 dummies.
For daily data: use 6 dummies.
What to do with weekly data? (52 dummies is too many — Fourier terms, discussed below, work better.)
Outliers
If there is an outlier, you can use a dummy variable (taking value 1 for that observation and 0 elsewhere) to remove its effect.
Public holidays
For daily data: dummy = 1 if the day is a public holiday, and 0 otherwise.
Beer production revisited
[Figure: Australian quarterly beer production (megalitres), showing a strong seasonal pattern]
Regression model
yt = β0 + β1 t + β2 d2,t + β3 d3,t + β4 d4,t + εt
where di,t = 1 if t is in quarter i and 0 otherwise.
Beer production revisited
fit.beer <- tslm(beer ~ trend + season)
summary(fit.beer)
##
## Call:
## tslm(formula = beer ~ trend + season)
##
## Residuals:
## Min 1Q Median 3Q Max
## -42.903 -7.599 -0.459 7.991 21.789
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 441.80044 3.73353 118.333 < 2e-16 ***
## trend -0.34027 0.06657 -5.111 2.73e-06 ***
## season2 -34.65973 3.96832 -8.734 9.10e-13 ***
## season3 -17.82164 4.02249 -4.430 3.45e-05 ***
## season4 72.79641 4.02305 18.095 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 12.23 on 69 degrees of freedom
## Multiple R-squared: 0.9243, Adjusted R-squared: 0.9199
## F-statistic: 210.7 on 4 and 69 DF, p-value: < 2.2e-16
Beer production revisited
[Figure: Quarterly Beer Production (megalitres) — Data and Fitted values over time]
[Figure: actual vs fitted values, coloured by Quarter (1-4)]
[Figure: residual diagnostics — time plot, ACF, and histogram]
Beer production revisited
[Figure: Forecasts from Linear regression model for beer production, with 80% and 95% prediction intervals, 1995-2010]
Fourier series
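Pairs of sin and cos terms of increasing frequency can replace seasonal dummies, which is especially useful for weekly data where 52 dummies would be required. A sketch using the fourier() function from the forecast package, assuming the beer series used earlier (with K = 2, the maximum for quarterly data, this is equivalent to the seasonal dummy model):

fit.fourier <- tslm(beer ~ trend + fourier(beer, K = 2))
summary(fit.fourier)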
Intervention variables
Spikes
Equivalent to a dummy variable for handling an outlier.
Steps
Variable takes value 0 before the intervention and 1 afterwards.
Change of slope
Variable takes value 0 before the intervention and values 1, 2, 3, . . . afterwards, so the trend changes slope.
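A sketch of all three interventions in R, for a hypothetical annual series y (so time(y) takes integer values) with an intervention at a hypothetical time tau:

spike <- as.numeric(time(y) == tau)   # 1 only at the intervention
step  <- as.numeric(time(y) >= tau)   # 0 before, 1 afterwards
slope <- pmax(0, time(y) - tau)       # 0 before, then 1, 2, 3, ...
fit <- tslm(y ~ trend + spike + step + slope)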
Holidays
Trading days
z1 = # Mondays in month
z2 = # Tuesdays in month
...
z7 = # Sundays in month
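A small base-R helper (hypothetical, not from the slides) for computing these counts:

weekday_counts <- function(year, month) {
  start <- as.Date(sprintf("%d-%02d-01", year, month))
  days <- seq(start, by = "day", length.out = 31)
  days <- days[format(days, "%m") == sprintf("%02d", month)]  # drop spillover
  # "%u" gives 1 = Monday, ..., 7 = Sunday
  table(factor(format(days, "%u"), levels = as.character(1:7)))
}
weekday_counts(2019, 3)  # z1, ..., z7 for March 2019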
Distributed lags
Lagged values of a predictor are useful when its effect is spread over time, e.g. advertising: x1 = advertising in the previous month, x2 = advertising two months previously, and so on.
Nonlinear trend
A quadratic or higher-order trend (adding t², t³, . . . as predictors) can be fitted, but it extrapolates wildly outside the sample: NOT RECOMMENDED!
Example: Boston marathon winning times
autoplot(marathon) +
xlab("Year") + ylab("Winning times in minutes")
[Figure: Boston marathon winning times in minutes, by year]
autoplot(residuals(fit.lin)) +
xlab("Year") + ylab("Residuals from a linear trend")
[Figure: Residuals from a linear trend fitted to the marathon data]
Example: Boston marathon winning times
# Linear trend
fit.lin <- tslm(marathon ~ trend)
fcasts.lin <- forecast(fit.lin, h=10)
# Exponential trend
fit.exp <- tslm(marathon ~ trend, lambda = 0)
fcasts.exp <- forecast(fit.exp, h=10)
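The piecewise forecasts in the next figure need linear spline terms; a sketch along the lines of the book's code, assuming knots at 1940 and 1980:

# Piecewise linear trend with knots at 1940 and 1980
t <- time(marathon)
tb1 <- ts(pmax(0, t - 1940), start = start(marathon))
tb2 <- ts(pmax(0, t - 1980), start = start(marathon))
fit.pw <- tslm(marathon ~ t + tb1 + tb2)
h <- 10
t.new <- t[length(t)] + seq(h)
newdata <- data.frame(t = t.new, tb1 = pmax(0, t.new - 1940),
                      tb2 = pmax(0, t.new - 1980))
fcasts.pw <- forecast(fit.pw, newdata = newdata)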
Example: Boston marathon winning times
[Figure: Boston Marathon winning times with Exponential, Linear, and Piecewise trend forecasts]
[Figure: residual diagnostics for the trend models — time plot, ACF, and histogram]
Interpolating splines
General linear regression splines
Knots c1 < c2 < ⋯ < c_{k−1} divide the time axis into segments; use predictors
x1 = t, x2 = (t − c1)+, …, xk = (t − c_{k−1})+
where (u)+ = u if u > 0 and 0 otherwise, giving a piecewise linear trend.
General cubic splines
x1 = t, x2 = t², x3 = t³, x4 = (t − c1)³+, …, xk = (t − c_{k−3})³+
Cubic splines give a smoother fit than piecewise linear trends.
Example: Boston marathon winning times
# Spline trend
library(splines)
t <- time(marathon)
fit.splines <- lm(marathon ~ ns(t, df=6))
summary(fit.splines)
##
## Call:
## lm(formula = marathon ~ ns(t, df = 6))
##
## Residuals:
## Min 1Q Median 3Q Max
## -13.0028 -2.5722 0.0122 2.1242 21.5681
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 168.447 2.086 80.743 < 2e-16 ***
## ns(t, df = 6)1 -6.948 2.688 -2.584 0.011 *
## ns(t, df = 6)2 -28.856 3.416 -8.448 1.16e-13 ***
## ns(t, df = 6)3 -35.081 3.045 -11.522 < 2e-16 ***
## ns(t, df = 6)4 -32.563 2.652 -12.279 < 2e-16 ***
## ns(t, df = 6)5 -64.847 5.322 -12.184 < 2e-16 ***
## ns(t, df = 6)6 -21.002 2.403 -8.741 2.46e-14 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.834 on 113 degrees of freedom
## Multiple R-squared: 0.8418, Adjusted R-squared: 0.8334
## F-statistic: 100.2 on 6 and 113 DF, p-value: < 2.2e-16
Example: Boston marathon winning times
[Figure: Boston Marathon winning times with Cubic spline and Piecewise linear fits]
splinef
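splinef() in the forecast package fits a cubic smoothing spline and returns local linear forecasts with prediction intervals; a minimal sketch:

fc <- splinef(marathon, h = 10)
autoplot(fc)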
Outline
Selecting predictors
Comparing regression models
However . . .
R² does not allow for “degrees of freedom”: adding any variable tends to increase the value of R², even if that variable is irrelevant.
To overcome this problem, we can use the adjusted R²:
R̄² = 1 − (1 − R²)(T − 1)/(T − k − 1)
where k = no. of predictors and T = no. of observations.
Maximizing R̄² is equivalent to minimizing σ̂², where
σ̂² = (1/(T − k − 1)) Σ_{t=1}^{T} ε_t²
Cross-validation
Traditional evaluation
[Diagram: the series split into training data followed by test data over time]
Leave-one-out cross-validation
[Diagram: each of the T observations is held out in turn as the test observation, with the model trained on the rest]
Cross-validation
Five-fold cross-validation
20 observations; 4 test observations per fold.
[Diagram: five folds, each holding out a different block of 4 observations]
Ten-fold cross-validation
20 observations; 2 test observations per fold.
[Diagram: ten folds, each holding out a different pair of observations]
Cross-validation
Ten-fold cross-validation
Randomly split the data into 10 parts.
Select one part as the test set, and use the remaining parts as the training set. Compute accuracy measures on the test observations.
Repeat for each of the 10 parts.
Average over all the measures.
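A minimal sketch of this procedure for the consumption model, using random folds and MSE as the accuracy measure (illustrative, not from the slides):

df <- as.data.frame(uschange)
set.seed(1)
fold <- sample(rep(1:10, length.out = nrow(df)))
mse <- sapply(1:10, function(k) {
  fit <- lm(Consumption ~ Income + Production + Unemployment + Savings,
            data = df[fold != k, ])
  mean((df$Consumption[fold == k] -
          predict(fit, newdata = df[fold == k, ]))^2)
})
mean(mse)

Note that random folds ignore the time ordering; for genuine time-series evaluation, forecast::tsCV() keeps the temporal ordering intact.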
Akaike’s Information Criterion
AIC = T log(SSE/T) + 2(k + 2)
where SSE is the sum of squared errors and k + 2 counts the parameters (k predictor coefficients, the intercept, and the error variance). For large T, minimizing the AIC is equivalent to minimizing the CV statistic.
Bayesian Information Criterion
BIC = T log(SSE/T) + (k + 2) log(T)
The BIC penalizes additional predictors more heavily than the AIC.
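All of these measures, together with the leave-one-out CV statistic and the adjusted R², are available in one call via CV() from the forecast package:

CV(fit.consMR)  # returns CV, AIC, AICc, BIC and AdjR2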
Choosing regression variables
Ex-ante versus ex-post forecasts
Scenario-based forecasting
Building a predictive regression model
Beer production
beer2 <- window(ausbeer, start=1992)
fit.beer <- tslm(beer2 ~ trend + season)
fcast <- forecast(fit.beer)
autoplot(fcast) +
ggtitle("Forecasts of beer production using regression") +
xlab("Year") + ylab("megalitres")
[Figure: Forecasts of beer production using regression — 80% and 95% prediction intervals, 1995-2010]
US Consumption
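The fcast.up and fcast.down objects used below are built by forecasting under assumed predictor scenarios; a sketch with illustrative scenario values (the exact values on the original slide may differ):

fit.consBest <- tslm(Consumption ~ Income + Savings + Unemployment,
                     data = uschange)
h <- 4
fcast.up <- forecast(fit.consBest, newdata = data.frame(
  Income = rep(1, h), Savings = rep(0.5, h), Unemployment = rep(0, h)))
fcast.down <- forecast(fit.consBest, newdata = data.frame(
  Income = rep(-1, h), Savings = rep(-0.5, h), Unemployment = rep(0, h)))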
US Consumption
autoplot(uschange[, 1]) +
ylab("% change in US consumption") +
autolayer(fcast.up, PI = TRUE, series = "increase") +
autolayer(fcast.down, PI = TRUE, series = "decrease") +
guides(colour = guide_legend(title = "Scenario"))
[Figure: % change in US consumption, with forecasts under the increase and decrease scenarios]
Matrix formulation
The regression model yt = β0 + β1 xt,1 + · · · + βk xt,k + εt can be written in matrix form: let y = (y1, …, yT)′, ε = (ε1, …, εT)′, β = (β0, β1, …, βk)′, and let X be the T × (k + 1) matrix whose t-th row is (1, xt,1, …, xt,k). Then
y = Xβ + ε.
Matrix formulation
Least squares estimation gives
β̂ = (X′X)⁻¹X′y
(the “normal equation”), and the error variance is estimated by
σ̂² = (y − Xβ̂)′(y − Xβ̂) / (T − k − 1)
Note: If you fall for the dummy variable trap, X′X is a singular matrix.
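The normal equation can be checked numerically against the tslm output; a sketch assuming fit.consMR from earlier:

X <- model.matrix(fit.consMR)
y <- as.numeric(uschange[, "Consumption"])
betahat <- solve(t(X) %*% X, t(X) %*% y)   # (X'X)^{-1} X'y
cbind(betahat, coef(fit.consMR))           # the two columns should agree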
Likelihood
If the errors are independent and N(0, σ²), then maximizing the likelihood is equivalent to minimizing the sum of squared errors; hence the maximum likelihood estimate of β is the least squares estimate β̂.
Multiple regression forecasts
Optimal forecasts
ŷ* = E(y*|y, X, x*) = x*β̂ = x*(X′X)⁻¹X′y
where x* is a row vector containing the values of the predictors for the forecasts (in the same format as X).
Forecast variance
Var(y*|X, x*) = σ²[1 + x*(X′X)⁻¹(x*)′]
Fitted values
ŷ = Xβ̂ = X(X′X)⁻¹X′y = Hy
where H = X(X′X)⁻¹X′ is the “hat matrix”.
Leave-one-out residuals
Let h1, …, hT be the diagonal values of H. Then the cross-validation statistic is
CV = (1/T) Σ_{t=1}^{T} [et/(1 − ht)]²
where et is the residual obtained from fitting the model to all T observations.
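So the leave-one-out CV statistic can be computed from a single fit, without refitting T models; a sketch for the consumption model:

e <- residuals(fit.consMR)
h <- hatvalues(fit.consMR)   # diagonal of the hat matrix
mean((e / (1 - h))^2)        # equals CV(fit.consMR)["CV"]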
Outline
Correlation is not causation
Multicollinearity
Multicollinearity occurs when two predictors are highly correlated with each other, or when a linear combination of predictors is highly correlated with another linear combination of predictors.
If multicollinearity exists . . .
the numerical estimates of the coefficients may be wrong (worse in Excel than in a statistics package);
don’t rely on the p-values to determine significance;
there is no problem with model predictions, provided the predictors used for forecasting are within the range used for fitting;
omitting variables can help;
combining variables can help.
Outliers and influential observations