BOOKFE Kalot14

Financial Econometrics
PART II
Lecturer: Dr Elena Kalotychou

e.kalotychou@city.ac.uk
Cass Business School, City University

FINANCIAL ECONOMETRICS, PART II
Lecturer: Dr Elena Kalotychou
EDUCATIONAL AIM
The main objective of the course is to introduce students to the fundamental concepts of
econometrics and its usefulness in analysing financial data. The module is designed to
give students an understanding of why econometrics is necessary, to provide for them a
working ability with basic econometric tools and illustrate their application in finance.
More broadly, it is concerned with estimating financial relationships, testing assumptions
involving financial behaviour and forecasting financial variables. The course is highly
participative; there will be exercises on the development of empirical models that are
coherent with financial data and on the explanation of estimation results.
EDUCATIONAL OBJECTIVES
Introduce students to the basic econometric tools for empirical modelling
Accustom students with applying these tools to estimation, statistical inference, and
forecasting in financial markets.
Develop the necessary skills to critically interpret the results of econometric analyses.
LEARNING OUTCOMES
Have a solid knowledge of the econometrics required to formulate and test
appropriate financial models.
Have a comprehensive understanding of how information from observed financial
data could be processed.
Be able to draw conclusions regarding the financial behaviour and to assess critically
the extant empirical research in Finance.
Appreciate the range of more advanced techniques that exist and have the foundations
for further study of econometrics.
TEACHING FORMAT
The course will comprise 11 lectures of 2 hours contact time each. Students are also
expected to attend regularly scheduled workshops. The latter are intended to demonstrate
the use of the econometric package EViews for the practical implementation of the
theoretical material covered during the lectures using a data set provided by the lecturer.
LECTURES
The aforementioned aims and the intended learning outcomes will be addressed in a
series of lectures. The lectures will embody activities such as formal lecturing,
participative discussions, and exercises.
Lecture 1
The Simple Linear Regression Model
Lecture 2
Inference in Simple Regression: Interval Estimation and Hypothesis Testing
Lecture 3
The Multiple Regression Model
Lecture 4 and 5
Heteroskedasticity and Autocorrelation: Causes, Consequences, Remedies
Lecture 6
Misspecification Problems: Multicollinearity, Functional Form, Normality,
Omitted Variables
Lecture 7
Seasonality with Financial Data
Lecture 8
Time Series Analysis and Stationarity
Lecture 9
ARMA Models
Lecture 10
Limited Dependent Variable Models
Lecture 11
Revision
LECTURE 1
What is Econometrics? What is Regression Analysis? The Basic Econometric Model,
Introducing the Error Term, Assumptions of Simple Linear Regression Model, Parameter
Interpretation, Point Estimation, The Method of Ordinary Least Squares, Properties and
Precision of the Least Squares Estimators.
LECTURE 2
Interval Estimation: Confidence Intervals for Regression Parameters, Hypothesis Testing,
Testing a Hypothesis Involving a Linear Combination of Parameters, Test Statistics,
Critical Region, p Value, Scaling and Units of Measurement, Functional Form.
LECTURE 3
The Multiple Regression Model, Assumptions of Multiple Regression Model,
Interpretation of Multiple Regression Equation, The Meaning of the Regression
Coefficients, Single Hypothesis Tests, Goodness of Fit, Joint Hypotheses Tests, The Ftest, Testing the Overall Significance of a Model.
LECTURE 4
The Nature of the Heteroskedasticity Problem, Causes of Heteroskedastic Errors,
Consequences of Ignoring Heteroskedasticity, Detecting Heteroskedasticity: Graphical
Method and Tests, Treatment of Heteroskedasticity.
LECTURE 5
The Nature of the Autocorrelation Problem, First-Order Autocorrelated Errors, Causes of
Autocorrelation, Consequences of Ignoring Autocorrelation, Detecting Autocorrelation:
Graphical Method and Tests, Treatment of Autocorrelation.
LECTURE 6
Other Misspecification Issues, Multicollinearity, Non-normality of Errors, Nonlinearity,
Functional Form, The RESET Misspecification Test, Omitted Variable Bias, Irrelevant
Variables.
LECTURE 7
The Use of Intercept Dummy Variables, The Use of Interaction (Slope) Dummy
Variables, Controlling for Time: Estimating Seasonal Effects, Testing for Structural
Stability.
LECTURE 8
White Noise, Non-Stationary vs. Stationary Processes, Stochastic and Deterministic
Trend, Random Walk, Stationarity, Unit Root Tests.
LECTURE 9
Time Series Regression, ARMA Models, The Autocorrelation
Autocorrelation Functions, Identification, Estimation, Diagnostic Tests.
and
Partial
LECTURE 10
Discrete Variable Models, Linear Probability Model, Probit Model, Logit Model,
Parameter Interpretation, Goodness of Fit.
LECTURE 11
Review of course.
ASSESSMENT
There will be one piece of group coursework, and a written examination that will be
weighted as 30% and 70%, respectively. The coursework is highly empirical, and the
students will have to apply their theoretical and quantitative skills to investigate a given
problem in finance. The coursework should demonstrate a sufficient understanding of the
issues analyzed during the course.
READING LIST
Brooks, C., Introductory Econometrics for Finance, Cambridge University

Press.
Gujarati, D., Basic Econometrics, McGraw-Hill.
Ramanathan, R., Introductory Econometrics with Applications, Southwest.
Hill, R., W. Griffiths and G. Judge, Undergraduate Econometrics, John Wiley.
Wooldridge, J. M., Introductory Econometrics, Thomson
Welcome and Enjoy the Ride!
Dr Elena Kalotychou
Brief Outline of Material in this Booklet
Topic
p 1
Topic 2
Topic 3
Topic 4
Topic 5
Topic 6
Topic 7
Simple
p linear regression
g
assumptions,
p
, OLS estimation
Hypothesis testing single and multiple hypotheses
Multiple regression Cross section and Time series
Goodness of fit statistics
Violations of the assumptions - causes, consequences, remedies
Non-stationarity, unit root tests; ARMA models
Administrative Preliminaries
This module comprises 22 hours of lectures plus EViews sessions in the
computer lab.
Attend all the lectures/labs and dont switch groups.
Textbook: Brooks, but others are also good.
Assessment:
Comprising 30% group coursework and 70% individual final exam.
Motivations for this module:

Finance is quantitative
Most jobs have a quantitative element
Introduction:
The Nature and Purpose of Econometrics
What is Econometrics?
Literal meaning is measurement in economics.
Definition of financial econometrics:

The application of statistical and mathematical techniques to problems in
finance.
Examples of the kinds of problems that

may be solved by an Econometrician
1. Testing whether financial markets are weak-form informationally efficient.

2. Testing whether the CAPM or APT represent superior models for the
determination of returns on risky assets.
3. Measuring and forecasting the volatility of bond returns.
4. Explaining the determinants of bond credit ratings used by the ratings agencies.
5. Modelling long-term relationships between prices and exchange rates.
Examples of the kinds of problems that

may be solved by an Econometrician (contd)
6. Determiningg the optimal

p
hedge
g ratio for a spot
p pposition in oil.
7. Testing technical trading rules to determine which makes the most money.
8. Testing the hypothesis that earnings or dividend announcements have no effect
on stock prices.
9. Testing whether spot or futures markets react more rapidly to news.
10. Forecasting the correlation between the returns to the stock indices of
two countries.
6
What are the Special Characteristics

of Financial Data?
Frequency & quantity of data

Stock market prices are measured every time there is a trade or somebody
posts a new quote.
Quality
Recorded asset prices are usually those at which the transaction took place.
No possibility for measurement error but financial data are noisy.
Types of Data and Notation

There are 3 types of data which econometricians might use for analysis:
1. Time series data
2. Cross-sectional data
3. Panel data, a combination of 1 & 2
The data may be quantitative (e.g. exchange rates, stock prices, number of
shares outstanding), or qualitative (e.g. day of the week).
Examples of time series data
Series
GNP or unemployment
government budget deficit
money supply
value of a stock market index
Frequency
monthly, or quarterly
annually
weekly
as transactions occur
8
Time-Series Versus Cross-Sectional Data

Examples of Problems that Could be Tackled Using a Time Series Regression
- How the value of a countrys
y stock index has varied with that countrys
y
macroeconomic fundamentals.
- How the value of a companys stock price has varied when it announced the
value of its dividend payment.
- The effect on a countrys currency of an increase in its interest rate.
Cross-sectional data are data on one or more variables collected at a single
point in time, e.g.
- A poll of usage of internet stock broking services .
- Cross-section of stock returns on the New York Stock Exchange.
- A sample of bond credit ratings for UK banks.
9
Cross-Sectional Versus Panel Data

Examples of Problems that Could be Tackled Using a Cross-Sectional
Regression
- The relationship between company size and the return to investing in its
shares
- The relationship between a countrys GDP level and the probability that the
government will default on its sovereign debt.
Panel Data has the dimensions of both time series and cross-sections, e.g. the
daily prices of a number of blue chip stocks over two years.
It is common to denote each observation by the letter t and the total number of
observations by T for time series data, and to to denote each observation by the
letter i and the total number of observations by N for cross-sectional data.
10
Returns in Financial Modelling
It is preferable not to work directly with asset prices, so we usually convert the
raw prices into a series of returns. There are two ways to do this:
Simple returns
or
log returns
Rt
pt pt 1
100%
pt 1
p
rt ln t 100 %
pt 1
where, Rt and rt denote the return at time t

pt denotes the asset price at time t
ln denotes the natural logarithm
We also assume that the price series have been already adjusted to account for
dividends.
11
Log Returns
The returns are also known as log price relatives.
We will use the log-returns.
g
There are a number of reasons for this:
1. They have the nice property that they can be interpreted as continuously
compounded returns.
2. Can add them up, e.g. if we want a weekly return and we have calculated
daily log returns:
r1 = ln p1/p0 = ln p1 - ln p0
r2 = ln p2/p1 = ln p2 - ln p1
r3 = ln p3/p2 = ln p3 - ln p2
r4 = ln p4/p3 = ln p4 - ln p3
r5 = ln p5/p4 = ln p5 - ln p4
ln p5 - ln p0 = ln p5/p0
12
Steps involved in the formulation of

econometric models
Economic or Financial Theory (Previous Studies)
Formulation of an Estimable Theoretical Model
Collection of Data
Model Estimation
Is the Model Statistically Adequate?
No
Yes
Reformulate Model
Interpret Model
Use for Analysis
13
Regression
Regression
g
is pprobably
y the single
g most important
p
tool at the econometricians
disposal.
But what is regression analysis?
It is concerned with describing and evaluating the relationship between a given
variable (usually called the dependent variable) and one or more other
variables (usually known as the independent variable(s)).
variable(s))
14
Some Notation
Denote the dependent variable by y and the independent variable(s) by
x1, x2, ... , xk where there are k independent variables.
Some alternative names for the y and x variables:
y
x
dependent variable
independent variables
regressand
regressors
effect variable
causal variables
p
variable
explanatory
p
y variable
explained
Note that there can be many x variables but we will limit ourselves to the case
where there is only one x variable to start with. In our set-up, there is only one
y variable.
15
Regression is different from Correlation

If we say y and x are correlated, it means that we are treating y and x in a
completely symmetrical way.
In regression, we treat the dependent variable (y) and the independent
variable(s) (xs) very differently.
The y variable is assumed to be random or stochastic in some way, i.e. to
have a probability distribution.
The x variables are, however, assumed to have fixed (non-stochastic) values
in repeated samples.
16
Simple Regression
For simplicity, suppose that k 1. This is the situation where y depends on only
one x variable.
Examples of the kind of relationship that may be of interest include:
How asset returns vary with their level of market risk
Measuring the long-term relationship between spot prices and dividends.
Constructing an optimal hedge ratio
17
Simple Regression: An Example

Suppose that we have the following data on the excess returns on a fund
managers portfolio (fund XXX) together with the excess returns on a market
i d
index:
Year t
1
2
3
4
5
Excess return = rXXXt rft

17.8
39.0
12.8
24.2
.
17.2
Excess return on market

index = rmt rf t
13.7
23.2
6.9
16.8
6.8
12.3
We have some intuition that the beta on this fund is positive, and we therefore
want to find whether there appears to be a relationship between x and y given
the data that we have. The first stage would be to form a scatter plot of the two
variables.
18
Graph (Scatter Diagram)
Exce
ess return on fund XXX
45
40
35
30
25
20
15
10
5
0
0
10
15
20
25
Excess return on market portfolio

19
Finding a Line of Best Fit
We can use the ggeneral equation

q
for a straight
g line,,
y = a + bx
to get the line that best fits the data.
However, this equation (y = a + bx) is completely deterministic.
Is this realistic? No. So what we do is to add a random disturbance term, u into
the equation.
yt = + xt + ut
where t = 1,2,3,4,5
20
Why do we include a Disturbance term?
The disturbance term can capture a number of features:

- We always leave out some determinants of yt
- There may be errors in the measurement of yt that cannot be
modelled.
- Random outside influences on yt which we cannot model
21
Determining the Regression Coefficients

So how do we determine and ?
Choose and so that the ((vertical)) distances from the data ppoints to the
fitted lines are minimised (so that the line fits the data as closely as possible):
y
22
Ordinary Least Squares
The most common method used to fit a line to the data is known as
OLS (ordinary least squares).
squares)
What we actually do is take each distance and square it (i.e. take the
area of each of the squares in the diagram) and minimise the total sum
of the squares (hence least squares).
Tightening up the notation, let
yt denote the actual data point t
y t denote the fitted value from the regression line
ut denote the residual, yt - yt
23
Actual and Fitted Value
yt
u t
y t
xt
x
24
How OLS Works
2
2
2
2
2
2
So min. u1 u2 u3 u4 u5 , or minimise ut . This is known as the
t 1
residual sum of squares.
But what was ut ?

the line, yt - y t .
It was the difference between the actual point and
So minimising yt y t is equivalent to minimising

with respect to and .
2
ut2
25
Deriving the OLS Estimator
But
y t x t , so let
L ( y t y t ) 2 ( y t xt ) 2
t
Want to minimise L with respect to (w.r.t.) and , so differentiate L w.r.t.

and
L
(1)
2 ( yt xt ) 0

t
L
2 xt ( yt xt ) 0 (2)
t
From (1),
yt xt 0 yt T xt 0
t
But
yt T y
and
xt
Tx .
26
Deriving the OLS Estimator: 2

So we can write Ty T Tx 0
y x 0
or
xt ( y t xt ) 0
From (2),
(3)
(4)
y x
From (3),
(5)
Substitute into (4) for from (5),

(5)
xt ( yt y x xt ) 0
t
xt yt y xt x xt xt
xt yt Tyx Tx 2 xt
27
Deriving the OLS Estimator: 3
Rearranging for ,
(Tx 2 xt2 ) Tyx xt yt
So overall we have
xt yt Tx y
xt2 Tx 2
y x
This method of finding the optimum is known as ordinary least squares.

28
What do we use and for?

In the CAPM example used above, plugging the 5 observations in to make up
the formulae given above would lead to the estimates
1.64.
and
We would write the fitted line as:
1.74
y t 1.74 1.64 xt
Question: If an analyst tells you that she expects the market to yield a return
20% higher than the risk-free rate next year, what would you expect the return
on fund XXX to be?
Solution: We can say that the expected value of y = -1.74 + 1.64 value of x,
so plug x = 20 into the equation to get the expected value for y:
y t 1.74 1.64 20 31.06

29
Accuracy of Intercept Estimate

Care needs to be exercised when considering the intercept estimate,
particularly if there are no or few observations close to the y-axis:
y
30
The Population and the Sample

The population is the total collection of all objects or people to be studied, for
example,
Interested in
predicting outcome
of an election
Population of interest
the entire electorate
A sample is a selection of just some items from the population.

A random sample is a sample in which each individual item in the population is
equally likely to be drawn.
31
The DGP and the PRF

The population regression function (PRF) is a description of the model that
is thought to be generating the actual data and the true relationship
b t
between
th variables
the
i bl (i.e.
(i the
th true
t
values
l
off andd ).
)
The PRF is
yt xt ut
The SRF is
y t x t
and we also know that
ut yt y t
We use the SRF to infer likely values of the PRF.

We also want to know how good our estimates of and are.
32
Linearity
In order to use OLS, we need a model which is linear in the parameters (
and ). It does not necessarily have to be linear in the variables (y and x).
Linear in the parameters means that the parameters are not multiplied
together, divided, squared or cubed etc.
Some models can be transformed to linear ones by a suitable substitution
or manipulation. For example, if theory suggests that y and x should be
inversely related:
yt ut
xt
then the regression can be estimated using OLS by substituting
1
zt
xt
33
Linear and Non-Linear Models
But some models are intrinsically non-linear, e.g.

yt xt ut
Some models look as if they cannot be estimated using OLS. But they can by
taking logs:
Y t e X t e u t ln Yt ln X t u t
Then let yt=ln Yt and xt=ln Xt
yt xt ut
This is known as the exponential regression model. Here, the coefficients can
be interpreted as elasticities.
34
Estimator or Estimate?
Estimators are the formulae used to calculate the coefficients
Estimates are the actual numerical values for the coefficients.
35
The Assumptions Underlying the

Classical Linear Regression Model (CLRM)
The model which we have used is known as the classical linear regression
model.
We observe data for xt, but since yt also depends on ut, we must be specific
about how the ut are generated.
We usually make the following set of assumptions about the uts (the
unobservable error terms):
Interpretation
Technical Notation
1. E(ut) = 0
The errors have zero mean
2. Var (ut) = 2
The variance of the errors is constant and
finite over all values of xt
3. Cov (ui,uj)=0
The errors are statistically independent of
one another
No relationship between the error and
4. Cov (ut,xt)=0
corresponding x variate
36
The Fifth Assumption: Normality
An alternative assumption to 4, which is slightly stronger, is that the xts are

non-stochastic or fixed in repeated samples.
samples
A fifth assumption is required if we want to make inferences about the
population parameters (the actual and ) from the sample parameters
( and )
Additional Assumption
5. ut is normally distributed
37
Properties of the OLS Estimator
If assumptions
p
1 through
g 4 hold,, then the estimators and determined by
y
OLS are known as Best Linear Unbiased Estimators (BLUE).
What does the acronym stand for?
Estimator
Linear
Unbiased
Best
- is an estimator of the true value of .

- is a linear estimator
- On average, the actual value of the and s will be equal to
th true
the
t
values.
l
- means that the OLS estimator has minimum variance among
the class of linear unbiased estimators. The Gauss-Markov
theorem proves that the OLS estimator is best.
38
Consistency/Unbiasedness/Efficiency
Consistent
The least squares estimators and are consistent. That is, the estimates will
converge to their true values as the sample size increases to infinity. We need the
assumptions E(xtut)=0 and Var(ut)=2 < to prove this. Consistency implies that
lim Pr 0 0
T
Unbiased
The least squares estimates of and are unbiased. That is E( ) = and
E( ) =
Thus on average the estimated value will be equal to the true values. To prove this
also
l requires
i
th assumption
the
ti that
th t E(u
E( t)=0.
) 0 Unbiasedness
U bi d
i a stronger
is
t
condition
diti
than consistency.
Efficiency
An estimator of parameter is said to be efficient if it is unbiased and no other
unbiased estimator has a smaller variance. If the estimator is efficient, we are
minimising the probability that it is a long way off from the true value of .
39
Precision and Standard Errors

Any set of regression estimates of and are specific to the sample used in
their estimation.
Recall that the estimators of and from the sample parameters ( and ) are
given by
x y Tx y
t 2 t
and y x
xt T x 2
What we need is some measure of the reliability or precision of the estimators
( and ). The precision of the estimate is given by its standard error. Given
assumptions 1 - 4 above, then the standard errors can be shown to be given by
SE ( ) s
xt2
T ( xt x ) 2
SE ( ) s
1
s
( xt x ) 2
xt
T xt2 T 2 x 2
xt2
1
Tx 2
where s is the estimated standard deviation of the residuals.
40
Estimating the Variance of the Disturbance Term

The variance of the random variable ui is given by
( t) = E[(u
[( t))-E(u
( t)]2
Var(u
which reduces to
Var(ut) = E(ut2)
We could estimate this using the average of ut2 :
1
ut2
T
Unfortunately, this is not workable since ut is not observable. We can use the
sample counterpart to ut , which is ut :
1
s 2 ut2
T
But this estimator is a biased estimator of 2.
s2
41
Estimating the Variance of the Disturbance Term

(contd)
An unbiased estimator of is given by
s
where
ut2
ut2
T 2
is the residual sum of squares and T is the sample size.
Some Comments on the Standard Error Estimators

1. Both SE( ) and SE( ) depend on s2 (or s). The greater the variance s2,
then the more dispersed the errors are about their mean value and therefore
th more dispersed
the
di
d y will
ill be
b about
b t its
it mean value.
l
2. The sum of the squares of x about their mean appears in both formulae.
The larger the sum of squares, the smaller the coefficient variances.
42
Example: How to Calculate Parameter Estimates
Assume we have the following data calculated from a regression of y on a

single variable x and a constant over 22 observations.
Data:
xt yt 830102
x 416.5
xt2 3919654
y 86.65
Calculations:
T 22
RSS 130.6
830102 22 416.5 86.65

3919654 22 416.52
0.35
86.65 0.35 416.5 59.12

y t x t
We write
y t 59.12 0.35 xt
43
Example: How to Get Standard Error Estimates

SE(regression),
ut2
T 2
130.6
2.55
20
SE ( ) 2.55
3919654
3.35
22 3919654 22 416.5 2
SE ( ) 2.55
1
0.0079
3919654 22 416.5 2
We now write the results as

y t 59.12 0.35 xt
(3.35) (0.0079)
44
An Introduction to Statistical Inference
We want to make inferences about the likely population values from the
regression parameters.
Example: Suppose we have the following regression results:
y t 20 . 3 0 . 5091 x t
(14 . 38 ) ( 0 . 2561 )
0.5091 is a single (point) estimate of the unknown population

parameter, .
How reliable is this estimate?
The reliability of the point estimate is measured by the coefficients standard
error.
45
Hypothesis Testing: Some Concepts

We can use the information in the sample to make inferences about the
population.
We will always have two hypotheses that go together, the null hypothesis
(denoted H0) and the alternative hypothesis (denoted H1).
The null hypothesis is the statement or the statistical hypothesis that is actually
being tested. The alternative hypothesis represents the remaining outcomes of
interest.
For example, suppose given the regression results above, we are interested in
the hypothesis
yp
that the true value of is in fact 0.5. We would use the notation
H0 : = 0.5
H1 : 0.5
This would be known as a two sided test.
46
One-Sided Hypothesis Tests

Sometimes we may have some prior information that, for example, we would
expect > 0.5 rather than < 0.5. In this case, we would do a one-sided test:
H0 : = 0.5
H1 : > 0.5
or we could have had
H0 : = 0.5
H1 : < 0.5
There are two ways to conduct a hypothesis test: via the test of significance
approach or via the confidence interval approach.
47
The Probability Distribution of the

Least Squares Estimators
We assume that ut N(0,2)
Since the least squares estimators are linear combinations of the random
variables
wt yt
i.e.
The weighted sum of normal random variables is also normally distributed, so
N(, Var())
N(, Var())
What if the disturbances are not normally distributed? Will the parameter
estimates still be normally distributed?
Yes, if the other assumptions of the CLRM hold, and the sample size is
sufficiently large.
48
The Probability Distribution of the

Least Squares Estimators (contd)
Standard normal variates can be constructed from and :

~ N 0,1
var
and

~ N 0,1
var
But var() and var() are unknown, so

~ tT 2
SE ( )
andd

~ tT 2
SE ( )
49
Testing Hypotheses:
The Test of Significance Approach
Assume that , for t=1,2,...,T , the regression equation is given by
yt xt ut
The steps involved in doing a test of significance are:

1. Estimate , and SE ( ) , SE ( ) in the usual way
2. Calculate the test statistic. This is given by the formula

test statistic
SE
where is the value of under the null hypothesis.

50
The Test of Significance Approach
3. We need some tabulated distribution with which to compare

p
the estimated
test statistics. Test statistics derived in this way can be shown to follow a
t-distribution with T-2 degrees of freedom.
As the number of degrees of freedom increases, we need to be less cautious in
our approach since we can be more sure that our results are robust.
4. We need to choose a significance level, often denoted . This is also
sometimes called the size of the test and it determines the region where we
will reject or not reject the null hypothesis that we are testing. It is
conventional to use a significance level of 5%
5%.
Intuitive explanation is that we would only expect a result as extreme as this
or more extreme 5% of the time as a consequence of chance alone.
Conventional to use a 5% size of test, but 10% and 1% are also commonly
used.
51
Determining the Rejection Region for A Test of Significance
5. Given a significance level, we can determine a rejection region and nonrejection region.
region For a 2-sided test:
f(x)
2.5%
rejection region
95% non-rejection
i
2.5%
rejection region
52
The Rejection Region for a 1-Sided Test (Upper Tail)
f(x)
95% non-rejection
5% rejection region
53
The Rejection region for a 1-Sided Test (Lower Tail)
f(x)
95% non-rejection
j
region
g
5% rejection region
54
The Test of Significance Approach: Drawing Conclusions
66. Use the t-tables

tables to obtain a critical value or values with which to compare
the test statistic.
7. Finally perform the test. If the test statistic lies in the rejection region then
reject the null hypothesis (H0), else do not reject H0.
55
A Note on the t and the Normal Distribution
You should all be familiar with the normal distribution and its characteristic
bell
bell shape.
We can scale a normal variate to have zero mean and unit variance by
subtracting its mean and dividing by its standard deviation.
There is, however, a specific relationship between the t- and the standard
normal distribution. Both are symmetrical and centred on zero.
But the t-distribution has another parameter - its degrees of freedom. We will
always know this (for the time being from the number of observations - 2).
56
What Does the t-Distribution Look Like?
normal distribution
t-distribution
57
Comparing the t and the Normal Distribution

In the limit, a t-distribution with an infinite number of degrees of freedom is a
standard normal, i.e.
t N 0,1
Examples from statistical tables:
N(0,1) t(40)
Significance level
50%
0
0
5%
1.64
1.68
2.5%
1.96
2.02
0.5%
2.57
2.70
t(4)
0
2.13
2.78
4.60
The reason for using the t-distribution rather than the standard normal is that
we had to estimate 2, the variance of the disturbances.
58
The Confidence Interval Approach

to Hypothesis Testing
An example of its usage: We estimate a parameter, say to be 0.93, and a 95%

confidence interval to be (0.77,1.09). This means that we are 95% confident
that the interval containing the true (but unknown) value of .
Confidence intervals are almost invariably two-sided, although in theory a onesided

id d interval
i t
l can be
b constructed.
t t d
59
How to Carry out a Hypothesis Test

Using Confidence Intervals
1. Calculate , and SE ( ) , SE ( ) as before.
2. Choose a significance level, , (again the convention is 5%). This is
equivalent to choosing a (1-)100% confidence interval, i.e. 5% significance
level = 95% confidence interval
3. Use the t-tables to find the appropriate critical value, which will again have
T-2 degrees of freedom.
4 The
4.
Th confidence
fid
i t
interval
l is
i given
i
b t crit SE , t crit SE
by
5. Perform the test: If the hypothesised value of (*) lies outside the
confidence interval, then reject the null hypothesis that = *, otherwise do
not reject the null.
60
Confidence Intervals Versus Tests of Significance

Note that the Test of Significance and Confidence Interval approaches always
give the same answer.
Under the test of significance approach, we would not reject H0 that = * if
the test statistic lies within the non-rejection region, i.e. if
t crit

t crit
SE
Rearranging, we would not reject if
t crit SE t crit SE
t crit SE t crit SE
But this is just the rule under the confidence interval approach.
61
Conducting Tests of Significance and Constructing

Confidence Intervals: An Example
Using
g the regression
g
results above,,
y t 20.3 0.5091xt
(14.38) (0.2561)
T=22
Using both the test of significance and confidence interval approaches, test the
hypothesis that =1 against a two-sided alternative.
The first step is to obtain the critical value. We want tcritit = t20;5%
20 5%
62
Reading the t-tables
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
35
40
45
50
60
70
80
90
100
120
150
200
300
0.4
0.3249
0.2887
0 2767
0.2767
0.2707
0.2672
0.2648
0.2632
0.2619
0.2610
0.2602
0.2596
0.2590
0.2586
0.2582
0.2579
0.2576
0.2573
0.2571
0.2569
0.2567
0.2566
0.2564
0.2563
0.2562
0.2561
0 2560
0.2560
0.2559
0.2558
0.2557
0.2556
0.2553
0.2550
0.2549
0.2547
0.2545
0.2543
0.2542
0.2541
0.2540
0.2539
0.2538
0.2537
0.2536
0.2533
0.25
0.15
1.0000
0.8165
0 7649
0.7649
0.7407
0.7267
0.7176
0.7111
0.7064
0.7027
0.6998
0.6974
0.6955
0.6938
0.6924
0.6912
0.6901
0.6892
0.6884
0.6876
0.6870
0.6864
0.6858
0.6853
0.6848
0.6844
0 6840
0.6840
0.6837
0.6834
0.6830
0.6828
0.6816
0.6807
0.6800
0.6794
0.6786
0.6780
0.6776
0.6772
0.6770
0.6765
0.6761
0.6757
0.6753
0.6745
1.9626
1.3862
1 2498
1.2498
1.1896
1.1558
1.1342
1.1192
1.1081
1.0997
1.0931
1.0877
1.0832
1.0795
1.0763
1.0735
1.0711
1.0690
1.0672
1.0655
1.0640
1.0627
1.0614
1.0603
1.0593
1.0584
1 0575
1.0575
1.0567
1.0560
1.0553
1.0547
1.0520
1.0500
1.0485
1.0473
1.0455
1.0442
1.0432
1.0424
1.0418
1.0409
1.0400
1.0391
1.0382
1.0364
0.1
3.0777
1.8856
1 6377
1.6377
1.5332
1.4759
1.4398
1.4149
1.3968
1.3830
1.3722
1.3634
1.3562
1.3502
1.3450
1.3406
1.3368
1.3334
1.3304
1.3277
1.3253
1.3232
1.3212
1.3195
1.3178
1.3163
1 3150
1.3150
1.3137
1.3125
1.3114
1.3104
1.3062
1.3031
1.3006
1.2987
1.2958
1.2938
1.2922
1.2910
1.2901
1.2886
1.2872
1.2858
1.2844
1.2816
0.05
6.3138
2.9200
2 3534
2.3534
2.1318
2.0150
1.9432
1.8946
1.8595
1.8331
1.8125
1.7959
1.7823
1.7709
1.7613
1.7531
1.7459
1.7396
1.7341
1.7291
1.7247
1.7207
1.7171
1.7139
1.7109
1.7081
1 7056
1.7056
1.7033
1.7011
1.6991
1.6973
1.6896
1.6839
1.6794
1.6759
1.6706
1.6669
1.6641
1.6620
1.6602
1.6577
1.6551
1.6525
1.6499
1.6449
0.025
0.01
12.7062
4.3027
3 1824
3.1824
2.7764
2.5706
2.4469
2.3646
2.3060
2.2622
2.2281
2.2010
2.1788
2.1604
2.1448
2.1314
2.1199
2.1098
2.1009
2.0930
2.0860
2.0796
2.0739
2.0687
2.0639
2.0595
2 0555
2.0555
2.0518
2.0484
2.0452
2.0423
2.0301
2.0211
2.0141
2.0086
2.0003
1.9944
1.9901
1.9867
1.9840
1.9799
1.9759
1.9719
1.9679
1.9600
31.8205
6.9646
4 5407
4.5407
3.7469
3.3649
3.1427
2.9980
2.8965
2.8214
2.7638
2.7181
2.6810
2.6503
2.6245
2.6025
2.5835
2.5669
2.5524
2.5395
2.5280
2.5176
2.5083
2.4999
2.4922
2.4851
2 4786
2.4786
2.4727
2.4671
2.4620
2.4573
2.4377
2.4233
2.4121
2.4033
2.3901
2.3808
2.3739
2.3685
2.3642
2.3578
2.3515
2.3451
2.3388
2.3263
0.005
0.001
0.0005
63.6567 318.3087 636.6189

9.9248 22.3271 31.5991
5 8409 10.2145
5.8409
10 2145 12.9240
12 9240
4.6041 7.1732 8.6103
4.0321 5.8934 6.8688
3.7074 5.2076 5.9588
3.4995 4.7853 5.4079
3.3554 4.5008 5.0413
3.2498 4.2968 4.7809
3.1693 4.1437 4.5869
3.1058 4.0247 4.4370
3.0545 3.9296 4.3178
3.0123 3.8520 4.2208
2.9768 3.7874 4.1405
2.9467 3.7328 4.0728
2.9208 3.6862 4.0150
2.8982 3.6458 3.9651
2.8784 3.6105 3.9216
2.8609 3.5794 3.8834
2.8453 3.5518 3.8495
2.8314 3.5272 3.8193
2.8188 3.5050 3.7921
2.8073 3.4850 3.7676
2.7969 3.4668 3.7454
2.7874 3.4502 3.7251
2 7787 3.4350
2.7787
3 4350 3.7066
3 7066
2.7707 3.4210 3.6896
2.7633 3.4082 3.6739
2.7564 3.3962 3.6594
2.7500 3.3852 3.6460
2.7238 3.3400 3.5911
2.7045 3.3069 3.5510
2.6896 3.2815 3.5203
2.6778 3.2614 3.4960
2.6603 3.2317 3.4602
2.6479 3.2108 3.4350
2.6387 3.1953 3.4163
2.6316 3.1833 3.4019
2.6259 3.1737 3.3905
2.6174 3.1595 3.3735
2.6090 3.1455 3.3566
2.6006 3.1315 3.3398
2.5923 3.1176 3.3233
2.5758 3.0902 3.2905
Source: Biometrika Tables for Statisticians (1966), Volume 1, 3rd Edition. Reprinted with
permission of Oxford University Press.
63
Determining the Rejection Region
f(x)
2.5% rejection region
-2.086
+2.086
64
Performing the Test

The hypotheses are:
H0 : = 1
H1 : 1
Test of significance
approach
test stat
Confidence interval
approach
t crit SE

SE
0.5091 1
1.917
0.2561
Do not reject H0 since

test stat lies within
non-rejection region
0.5091 2.086 0.2561

0.0251,1.0433
Since 1 lies within the

confidence interval,
do not reject H0
65
Testing other Hypotheses
What if we wanted to test H0 : = 0 or H0 : = 2?

Note that we can test these with the confidence interval approach.
For interest, test
H0 : = 0
vs. H1 : 0
vs.
H0 : = 2
H1 : 2
66
Changing the Size of the Test

But note that we looked at only a 5% size of test. In marginal cases
(e.g. H0 : = 1), we may get a completely different answer if we use a different
size of test. This is where the test of significance approach is better than a
confidence interval.
For example, say we wanted to use a 10% size of test. Using the test of
significance approach,
test stat

SE
0.5091 1
1.917
0.2561
as above. The only thing that changes is the critical t-value.

67
Changing the Size of the Test: The New Rejection Regions
f(x)
5% rejection region
-1.725
5% rejection region
+1.725
68
Changing the Size of the Test: The Conclusion
t20;10% = 1.725. So now, as the test statistic lies in the rejection region, we
would reject H0.
Caution should therefore be used when placing emphasis on or making

decisions in marginal cases (i.e. in cases where we only just reject or not
reject)
reject).
69
Some More Terminology
If we reject the null hypothesis at the 5% level, we say that the result of the test
is statistically significant.
Note that a statistically significant result may be of no practical significance.

E.g. if a shipment of cans of beans is expected to weigh 450g per tin, but the
actual mean weight of some tins is 449g,
449g the result may be highly statistically
significant but presumably nobody would care about 1g of beans.
70
The Errors That We Can Make

Using Hypothesis Tests
We usually reject H0 if the test statistic is statistically significant at a
chosen significance level.
level
There are two possible errors we could make:
1. Rejecting H0 when it was really true. This is called a type I error.
2. Not rejecting H0 when it was in fact false. This is called a type II error.
Result of
Test
Significant
(reject H0)
Insignificant
( do not
reject H0)
Reality
H0 is true
Type I error
=
H0 is false
Type II error
=
71
The Trade-Off Between Type I and Type II Errors
The probability of a type I error is just , the significance level or size of test we
chose.
h
T see this,
To
hi recall
ll what
h we said
id significance
i ifi
at the
h 5% level
l l meant: it
i is
i only
l
5% likely that a result as or more extreme as this could have occurred purely by
chance.
Note that there is no chance for a free lunch here! What happens if we reduce the size
of the test (e.g. from a 5% test to a 1% test)? We reduce the chances of making a type
I error ... but we also reduce the probability that we will reject the null hypothesis at
all, so we increase the probability of a type II error:
less likely
to falsely reject
Reduce size
of test
more strict
criterion for
rejection
reject null
hypothesis
less often
more likely to
incorrectly not
reject
So there is always a trade off between type I and type II errors when choosing a
significance level. The only way we can reduce the chances of both is to increase the
sample size.
72
The Exact Significance Level or p-value
This is equivalent to choosing an infinite number of critical t-values from

tables. It gives us the marginal significance level where we would be
indifferent between rejecting and not rejecting the null hypothesis.
If the test statistic is large in absolute value, the p-value will be small, and
vice versa. The p-value gives the plausibility of the null hypothesis.
e.g. a test statistic is distributed as a t62 = 1.47.
The p-value = 0.12.
Do we reject at the 5% level?...........................No
Do we reject at the 10% level?.........................No
Do we reject at the 20% level?.........................Yes
73
Generalising the Simple Model to

Multiple Linear Regression
Before, we have used the model
t = 1,2,...,T
12 T
yt xt ut
But what if our dependent (y) variable depends on more than one
independent variable?
For example the number of cars sold might plausibly depend on
1. the price of cars
2. the price of public transport
3. the price of petrol
4 the
4.
th extent
t t off the
th publics
bli concern about
b t global
l b l warming
i
Similarly, stock returns might depend on several factors.
Having just one independent variable is no good in this case - we want to
have more than one x variable. It is very easy to generalise the simple
model to one with k-1 regressors (independent variables).
74
Multiple Regression in Matrix Form

Now we write yt 1 2 x2t 3 x3t ... k xkt ut
where t=1,2,...,T
We can write this in matrix form y = X +u

where
y is T 1
X is T k
is k 1
u is T 1
e.g. if k is 2, we have 2 regressors
y1 1 x21
u1
y 1 x

22 1 u 2
2
2
y 1 x
u
T
T
2T
T 1
T2
21 T1
75
How Do We Calculate the Parameters (the )

in this Generalised Case?
Previously, we took the residual sum of squares, and minimised it
w r t and .
w.r.t.
In the matrix notation, we have
u1
u
u 2

uT
The RSS would be given by
u ' u u1 u 2
u1
u
uT 2 u12 u22 ... uT2 ut2

uT
76
The OLS Estimator for Multiple Regression Models
In order to obtain the parameter estimates, 1, 2,..., k, we would

minimise the RSS with respect to all the s.
It can be shown that
2 X ' X 1 X ' y

k
77
Calculating the Standard Errors in the Multiple Regression

Model
Check the dimensions: is k 1 as required.
But how do we calculate the standard errors of the coefficient estimates?
ut2 .
Previously, to estimate the variance of the errors, 2, we used s 2
T 2
Now using the matrix notation, we use
s2
u 'u
T k
where k = number of regressors. It can be proved that the OLS estimator of

1
2
the variance of is given by the diagonal elements of s X ' X , so that
the variance of 1 is the first element, the variance of 2 is the second
element, and , and the variance of k is the kth diagonal element.
78
Calculating Parameter and Standard Error Estimates for

Multiple Regression Models: An Example
Example: The following model with k=3 is estimated over 15 observations:
y t 1 2 x 2 t 3 x3 t u t
and the following data have been calculated from the original Xs.
2.0 3.5 1.0
3.0
1
X ' X 3.5 1.0 6.5 X ' y 2.2

u ' u 10.96
1.0 6.5 4.3
0.6
Calculate the coefficient estimates and their standard errors.
To calculate the coefficients, just multiply the matrix by the vector to obtain
X ' X 1 X ' y
To calculate the standard errors, we need an estimate of 2.
s2
RSS 10.96
0.91
T k 15 3
79
Calculating Standard Errors for Multiple Regression

Models: An Example
The variance-covariance matrix of is given by
1.83 3.20 0.91
1
1
2
s X ' X 0.91 X ' X 3.20 0.91 5.94
0.91 5.94 3.93

The variances are on the leading diagonal:

Var 3 3.93
Var 1 1.83
Var 2 0.91

SE 3 1.98
SE 1 1.35
SE 2 0.96
We write:
y t 1.10 4.40 x2 t 19 .88 x3t
1.35 0.96 1.98
80
A Special Type of Hypothesis Test: The t-ratio
Recall that the formula for a test of significance approach to hypothesis

testing using a tt-test
test was
test statistic
i i
SE i
H0 : i = 0
H1 : i 0
i.e. a test that the population coefficient is zero against a two-sided
alternative this is known as a t-ratio
alternative,
t ratio test:
If the test is
Since i* = 0,
test statistic
i
SE i
The ratio of the coefficient to its SE is known as the t-ratio or t-statistic.

81
The t-ratio: An Example

In the last example above:
Coefficient
Coe
ce t
SE
t-ratio
1.10
. 0
1.35
0.81
Compare this with a tcrit with 15-3

(2% in each tail for a 5% test)
Do we reject H0:
H0 :
H0 :
1 = 0?
2 = 0?
3 = 0?
-4.40
. 0
0.96
-4.63
=
=
=
12 d.f.
2.179
3.055
19.88
9.88
1.98
10.04
5%
1%
(No)
(Yes)
(Yes)
82
What does the t-ratio Tell Us?

If we reject H0, we say that the result is significant. If the coefficient is not
significant (e.g.
(e g the intercept coefficient in the last regression above),
above) then
significant
it means that the variable is not helping to explain variations in y. Variables
that are not significant are usually removed from the regression model.
In practice there are good statistical reasons for always having a constant
even if it is not significant. Look at what happens if no intercept is included:
yt
83
xt
Data Mining
Data
ata miningg iss searching
sea c g many
a y series
se es for
o statistical
stat st ca relationships
e at o s ps without
w t out
theoretical justification.
For example, suppose we generate one dependent variable and twenty
explanatory variables completely randomly and independently of each other.
If we regress the dependent variable separately on each independent variable,
on average one slope coefficient will be significant at 5%.
5%
If data mining occurs, the true significance level will be greater than the
nominal significance level.
84
An Example of the Use of a Simple t-test to Test a Theory in

Finance
Testingg for the ppresence and significance

g
of abnormal returns ((Jensens
alpha - Jensen, 1968).
The Data: Annual Returns on the portfolios of 115 mutual funds from 19451964.
The model:
R jt R ft j j Rmt R ft u jt
We are interested in the significance of j.

The null hypothesis is H0: j = 0 .
85
Frequency Distribution of t-ratios of Mutual Fund Alphas

(gross of transactions costs)
Source Jensen (1968). Reprinted with the permission of Blackwell publishers.

86
Frequency Distribution of t-ratios of Mutual Fund

Alphas (net of transactions costs)
Source Jensen (1968). Reprinted with the permission of Blackwell publishers.
87
Can UK Unit Trust Managers Beat the Market?

We now perform a variant on Jensens test in the context of the UK market,
considering monthly returns on 76 equity unit trusts. The data cover the
period January 1979 May 2000 (257 observations for each fund). Some
summary statistics for the funds are:
Mean Minimum Maximum Median
Average monthly return, 1979-2000
1.0%
0.6%
1.4% 1.0%
Standard deviation of returns over time 5.1%
4.3%
6.9% 5.0%
Jensen Regression Results for UK Unit Trust Returns,
Returns January 1979-May
1979 May
2000
R jt R ft j j Rmt R ft u jt
88
Can UK Unit Trust Managers Beat the Market?

Results
Estimates of
t-ratio on
Mean
-0.02%
0.91
-0.07
Minimum
-0.54%
0.56
-2.44
Maximum
0.33%
1.09
3.11
Median
-0.03%
0.91
-0.25
In fact, gross of transactions costs, 9 funds of the sample of 76 were able to

significantly out-perform the market by providing a significant positive alpha,
while 7 funds yielded significant negative alphas.
89
Calculating the F-test Statistic
The test statistic is given by

test statistic
RRSS URSS T k
URSS
m
where URSS = RSS from unrestricted regression

RRSS = RSS from restricted regression
m = number of restrictions
T
= number of observations
k
= number of regressors in unrestricted regression including a
constant (or total number of parameters to be estimated in unrestricted
regression).
90
The F-Distribution
The test statistic follows the F-distribution, which has 2 d.f. parameters.
The value of the degrees of freedom parameters are m and (T-k) respectively
(the order of the d.f. parameters is important).
The appropriate critical value will be in column m, row (T-k).
The F-distribution has only positive values and is not symmetrical. We
therefore only reject the null if the test statistic > critical F-value.
91
The F-test: Restricted and Unrestricted Regressions
Example
Th generall regression
The
i is
i
yt = 1 + 2x2t + 3x3t + 4x4t + ut
(1)
We want to test the restriction that 3+4 = 1 (we have some hypothesis from
theory which suggests that this would be an interesting hypothesis to study).
The unrestricted regression is (1) above, but what is the restricted regression?
yt = 1 + 2x2t + 3x3t + 4x4t + ut s.t. 3+4 = 1
We substitute the restriction ( 3+4 = 1) into the regression so that it is
automatically imposed on the data.
3+4 = 1 4 = 1- 3
92
The F-test: Forming the Restricted Regression
yt = 1 + 2x2t + 3x3t + (1
(1-3)x4t + ut
yt = 1 + 2x2t + 3x3t + x4t - 3x4t + ut
Gather terms in s together and rearrange
(yt - x4t) = 1 + 2x2t + 3(x3t - x4t) + ut
This is the restricted regression. We actually estimate it by creating two
new variables, call them, say, Pt and Qt.
Pt = yt - x4t
Qt = x3t - x4t
so
Pt = 1 + 2x2t + 3Qt + ut is the restricted regression we actually estimate.
93
Determining the Number of Restrictions in an F-test
Examples :
H0: hypothesis
h th i
N off restrictions,
No.
t i ti
m
1 + 2 = 2
1
2 = 1 and 3 = -1
2
2 = 0, 3 = 0 and 4 = 0
3
If the model is yt = 1 + 2x2t + 3x3t + 4x4t + ut,
then the null hypothesis
H0: 2 = 0,
0 and 3 = 0 and 4 = 0 is tested by the regression F
F-statistic
statistic. It
tests the null hypothesis that all of the coefficients except the intercept
coefficient are zero.
Note the form of the alternative hypothesis for all tests when more than one
restriction is involved: H1: 2 0, or 3 0 or 4 0
94
Hypotheses that we Cannot Test with Either an F or a t-test
We cannot test using this framework hypotheses which are not linear
or which are multiplicative,
e.g.
H0 : 2 3 = 2
or
H0 : 2 2 = 1
cannot be tested.
95
F-test Example
Question: Suppose a researcher wants to test whether the returns on a
company stock (y) show unit sensitivity to two factors (factor x2 and factor x3)
among three
h considered.
id d The
Th regression
i is
i carried
i d out on 144 monthly
hl
observations. The regression is yt = 1 + 2x2t + 3x3t + 4x4t+ ut
- What are the restricted and unrestricted regressions?
- If the two RSS are 436.1 and 397.2 respectively, perform the test.
Solution:
Unit sensitivity implies H0:2=1 and 3=1. The unrestricted regression is the
one in the question. The restricted regression is (yt-x2t-x3t)= 1+ 4x4t+ut or
letting
zt=yt-x2t-x3t, the restricted regression is zt= 1+ 4x4t+ut
In the F-test formula, T=144, k=4, m=2, RRSS=436.1, URSS=397.2
F-test statistic = 6.68. Critical value is an F(2,140) = 3.07 (5%) and 4.79 (1%).
Conclusion: Reject H0.
96
Reading the F-tables
Degre
es of
Freedo
m for
denom
inator
(T-k)
Degrees of freedom for numerator (m)

5
6
7
8
9
10
12
15
20
24
30
1 161
200 216
2 18.5
19.0 19.2
19.5 19.5 19.5 19.5
3 10.1 9.55 9.28
4 7.71 6.94 6.59
5 6.61 5.79 5.41
225
19.2
19.5
9.12
6.39
5.19
230
19.3
234
19.3
237
19.4
239
19.4
241
19.4
242 244
19.4 19.4
246
19.4
248
19.4
249
19.5
250
9.01
6.26
5.05
8.94 8.89
6.16 6.09
4.95 4.88
8.85
6.04
4.82
8.81
6.00
4.77
8.79
5.96
4.74
8.74
5.91
4.68
8.70
5.86
4.62
8.66
5.80
4.56
8.64 8.62
5.77 5.75
4.53 4.50
6
7
8
9
10
5.99
5.59
5.32
5.12
4.96
5.14
4.74
4.46
4.26
4.10
4.76
4.35
4.07
3.86
3.71
4.53
4.12
3.84
3.63
3.48
4.39
3.97
3.69
3.48
3.33
4.28
3.87
3.58
3.37
3.22
4.21
3.79
3.50
3.29
3.14
4.15
3.73
3.44
3.23
3.07
4.10
3.68
3.39
3.18
3.02
4.06
3.64
3.35
3.14
2.98
4.00
3.57
3.28
3.07
2.91
3.94
3.51
3.22
3.01
2.85
3.87
3.44
3.15
2.94
2.77
3.84
3.41
3.12
2.90
2.74
3.81
3.38
3.08
2.86
2.70
11
12
13
14
15
4.84
4.75
4.67
4.60
4.54
3.98
3.89
3.81
3.74
3.68
3.59
3.49
3.41
3.34
3.29
3.36
3.26
3.18
3.11
3.06
3.20
3.11
3.03
2.96
2.90
3.09
3.00
2.92
2.85
2.79
3.01
2.91
2.83
2.76
2.71
2.95
2.85
2.77
2.70
2.64
2.90
2.80
2.71
2.65
2.59
2.85
2.75
2.67
2.60
2.54
2.79
2.69
2.60
2.53
2.48
2.72
2.62
2.53
2.46
2.40
2.65
2.54
2.46
2.39
2:33
2.61
2.51
2.42
2.35
2.29
2.57
2.47
2.38
2.31
2.25
16
17
18
19
20
4.49
4.45
4.41
4.38
4.35
3.63
3.59
3.55
3.52
3.49
3.24
3.20
3.16
3.13
3.10
3.01
2.96
2.93
2.90
2.87
2.85
2.81
2.77
2.74
2.71
2.74 2.66
2.70 2.61
2.66 2.58
2.63 2.54
2.60 2.51
2.59
2.55
2.51
2.48
2.45
2.54 2.49
2.49 2.45
2.46 2.41
2.42 2.38
2.39 2.35
2.42 2.35
2.38 2.31
2.34 2.27
2.31 2.23
2.28 2.20
2.28
2.23
2.19
2.16
2.12
2.24 2.19
2.19 2.15
2.15 2.11
2.11 2.07
2.08 2.04
21
22
23
24
25
4.32
4.30
4.28
4.26
4.24
3.47
3.44
3.42
3.40
3.39
3.07
3.05
3.03
3.01
2.99
2.84
2.82
2.80
2.78
2.76
2.68
2.66
2.64
2.62
2.60
2.57
2.55
2.53
2.51
2.49
2.49
2.46
2.44
2.42
2.40
2.42
2.40
2.37
2.36
2.34
2.37
2.34
2.32
2.30
2.28
2.32
2.30
2.27
2.25
2.24
2.25
2.23
2.20
2.18
2.16
2.18
2.15
2.13
2.11
2.09
2.10
2.07
2.05
2.03
2.01
2.05
2.03
2.01
1.98
1.96
2.01
1.98
1.96
1.94
1.92
30
40
60
120
4.17
4.08
4.00
3.92
3.84
3.32
3.23
3.15
3.07
3.00
2.92
2.84
2.76
2.68
2.60
2.69
2.61
2.53
2.45
2.37
2.53
2.45
2.37
2.29
2.21
2.42
2.34
2.25
2.18
2.10
2.33
2.25
2.17
2.09
2.01
2.27
2.18
2.10
2.02
1.94
2.21
2.12
2.04
1.96
1.88
2.16
2.08.
1.99
1.91
1.83
2.09
2.00
1.92
1.83
1.75
2.01
1.92
1.84
1.75
1.67
1.93
1.84
1.75
1.66
1.57
1.89
1.79
1.70
1.61
1.52
1.84
1.74
1.65
1.55
1.46
Source: Biometrika Tables for Statisticians (1966), Volume 1, 3rd Edition. Reprinted with permission
of Oxford University Press.
97
Goodness of Fit Statistics
We would like some measure of how well our regression model actually fits
the
h data.
d
We have goodness of fit statistics to test this: i.e. how well the sample
regression function (srf) fits the data.
The most common goodness of fit statistic is known as R2. One way to define
R2 is to say that it is the square of the correlation coefficient between y and y .
For another explanation, recall that what we are interested in doing is
explaining the variability of y about its mean value, , i.e. the total sum of
squares,
q
, TSS:
TSS yt y 2
t
We can split the TSS into two parts, the part which we have explained (known
as the explained sum of squares, ESS) and the part which we did not explain
using the model (the RSS).
98
Defining R2
That is,
TSS =
ESS
yt y
+ RSS
y t y 2 ut2
t
Our goodness of fit statistic is

ESS
TSS
But since TSS = ESS + RSS, we can also write
R2
R2
ESS TSS RSS

RSS
1
TSS
TSS
TSS
R2 must always lie between zero and one. To understand this, consider two
extremes
RSS = TSS i.e.
ESS = 0 so
R2 = ESS/TSS = 0
ESS = TSS i.e.
RSS = 0 so
R2 = ESS/TSS = 1
99
The Limit Cases: R2 = 0 and R2 = 1
yt
yt
xt
xt
100
Problems with R2 as a Goodness of Fit Measure

There are a number of them:
1. R2 is defined in terms of variation about the mean of y so that if a model is
reparameterised (rearranged) and the dependent variable changes, R2 will
change.
2. R2 never falls if more regressors are added. to the regression, e.g. consider:
Regression 1: yt = 1 + 2x2t + 3x3t + ut
Regression 2: y = 1 + 2x2t + 3x3t + 4x4t + ut
R2 will always be at least as high for regression 2 relative to regression 1.
3. R2 quite often takes on values of 0.9 or higher for time series regressions.
101
Adjusted R2
In order to get around these problems, a modification is often made which
takes into account the loss of degrees of freedom associated with adding extra
2
variables. This is known as R , or adjusted R2:
T 1
R 2 1
1 R2
T k
So if we add an extra regressor, k increases and unless R2 increases by a more

than offsetting amount, R 2 will actually fall.
There are still problems with the criterion:
1. A soft rule
2. No distribution for R 2or R2
102
A Regression Example:
Hedonic House Pricing Models
Hedonic models are used to value real assets, especially housing, and view
the asset as representing a bundle of characteristics.
Des
D R
Rosiers
i and
d Th
Thrialt
i lt (1996) consider
id th
the effect
ff t off various
i
amenities
iti on
rental values for buildings and apartments 5 sub-markets in the Quebec area
of Canada.
The rental value in Canadian Dollars per month (the dependent variable) is a
function of 9 to 14 variables (depending on the area under consideration). The
paper employs 1990 data, and for the Quebec City region, there are 13,378
observations, and the 12 explanatory variables are:
LnAGE
- log of the apparent age of the property
NBROOMS - number of bedrooms
AREABYRM - area per room (in square metres)
ELEVATOR - a dummy variable = 1 if the building has an elevator; 0
otherwise
BASEMENT - a dummy variable = 1 if the unit is located in a basement; 0
otherwise
103
Hedonic House Pricing Models:

Variable Definitions
OUTPARK - number of outdoor parking spaces
INDPARK - number of indoor parking spaces
NOLEASE - a dummy variable = 1 if the unit has no lease attached to it; 0
otherwise
LnDISTCBD - log of the distance in kilometres to the central business district
SINGLPAR - percentage of single parent families in the area where the
building stands
DSHOPCNTR- distance in kilometres to the nearest shopping centre
VACDIFF1 - vacancy difference between the building and the census figure
Examine the signs and sizes of the coefficients.
The coefficient estimates themselves show the Canadian dollar rental
price per month of each feature of the dwelling.
104
Hedonic House Price Results

Dependent Variable: Canadian Dollars per Month
Variable
Intercept
LnAGE
NBROOMS
AREABYRM
ELEVATOR
BASEMENT
OUTPARK
INDPARK
NOLEASE
LnDISTCBD
SINGLPAR
DSHOPCNTR
VACDIFF1
Coefficient
282.21
-53.10
48.47
3.97
88.51
-15.90
7.17
73.76
-16.99
5 84
5.84
-4.27
-10.04
0.29
t-ratio
56.09
-59.71
104.81
29.99
45.04
-11.32
7.07
31.25
-7.62
4 60
4.60
-38.88
-5.97
5.98
A priori sign expected

+
+
+
+
+
+
-
Notes: Adjusted R2 = 0.65l; regression F-statistic = 2082.27. Source: Des Rosiers and
Thrialt
(1996). Reprinted with permission of the American Real Estate Society.
105
Violation of the Assumptions of the CLRM
Recall that we assumed of the CLRM disturbance terms:

1. E(ut) = 0
2. Var(ut) = 2 <
3. Cov (ui,uj) = 0
4. The X matrix is non-stochastic or fixed in repeated samples
5. ut N(0,2)
106
Investigating Violations of the Assumptions of the CLRM

We will now study these assumptions further, and in particular look at:
- How we test for violations
- Causes
- Consequences
in general we could encounter any combination of 3 problems:
the coefficient estimates are wrong
the associated standard errors are wrong
the distribution that we assumed for the
test statistics
stat st cs will
w be inappropriate
app op ate
- Solutions
the assumptions are no longer violated
we work around the problem so that we
use alternative techniques which are still valid
107
Statistical Distributions for Diagnostic Tests
Often, an F- and a 2- version of the test are available.

The F-test version involves estimating a restricted and an unrestricted
version of a test regression and comparing the RSS.
The 2- version is sometimes called an LM test, and only has one degree
of freedom parameter: the number of restrictions being tested, m.
Asymptotically
Asymptotically, the 2 tests are equivalent since the 2 is a special case of the
F-distribution:
2 m
F m, T k as T k
m
For small samples, the F-version is preferable.
108
Assumption 1: E(ut) = 0
Assumption that the mean of the disturbances is zero.

For all diagnostic tests, we cannot observe the disturbances and so perform the
tests of the residuals.
The mean of the residuals will always be zero provided that there is a constant
t
term
in
i the
th regression.
i
109
Assumption 2: Var(ut) = 2 <

We have so far assumed that the variance of the errors is constant, 2 - this is
known as homoscedasticity.
homoscedasticity If the errors do not have a constant variance,
variance we
say that they are heteroscedastic e.g. say we estimate a regression and
calculate the residuals, ut .
u +
t
x2t
110
-
Causes of Heteroscedastic errors

a) CROSS-SECTION DATA
Regression of consumption expenditure (y) and income (x) for 20 families
(thousands of $)
y=0.847+0.899x+
yt
(0.703) (0.0253)
xt
(points more widely scattered around regression line for larger x)
111
Causes of Heteroscedastic errors (contd)

because of scale or size effects (countries, companies)
yy=1+ 2 x+ 3 z+
y=R&D, x=sales volume, z=real interest rate
b) TIME SERIES DATA
Flow of new information or news is time-varying (shocks)
Rt 1 2 RtM t
var( t ) var( Rt | RtM ) not constant
112
Detection of Heteroscedasticity
Graphical methods
Formal
F
l tests:
t t
One of the best is Whites general test for heteroscedasticity.
The test is carried out as follows:
1. Assume that the regression we carried out is as follows
yt = 1 + 2x2t + 3x3t + ut
And we want to test Var(ut) = 2. We estimate the model,
model obtaining the
residuals, ut
2. Then run the auxiliary regression
ut2 1 2 x2t 3 x3t 4 x22t 5 x32t 6 x2t x3t vt
113
Whites Test for Heteroscedasticity
33. Obtain
Obt i R2 from
f
the
th auxiliary
ili
regression
i andd multiply
lti l it by
b the
th number
b off
observations, T. It can be shown that
T R2 2 (m)
where m is the number of regressors in the auxiliary regression excluding the
constant term.
4. If the 2 test statistic from step 3 is greater than the corresponding value
from the statistical table then reject the null hypothesis that the disturbances are
homoscedastic.
114
ARCH type heteroskedasticity
Suppose yt=1+ 2 xt+ t

y=price of stock A, x=dividend flow
conditional mean:
conditional variance:
E(yt | xt ) 1 2 xt
var(yt | xt ) var( t ) E ( t 2 ) t2
AutoRegressive Conditional Heteroskedasticity (ARCH) is defined as:

current volatility,
volatility or conditional variance
depends on the magnitude of past shocks
115
Engles test for ARCH type heteroskedasticity
- Heteroskedasticity of ARCH type

of order 1 or ARCH(1): t2 0 1t21
ARCH(p): 2 2 ... 2
t
1 t 1
p t p
- How does Engles LM test work?

1) Estimate yt=1+ 2 xt+ t and compute residuals
2) Estimate 2 2 ... 2 e
t
1 t 1
p t p
and test for H0:1= 2= = p=0 (no ARCH)

- Test statistic: LM= TR2 ~ 2p distribution
116
Consequences of Using OLS in the Presence of

Heteroscedasticity
OLS estimation still gives unbiased coefficient estimates,

estimates but they are no
longer BLUE.
This implies that if we still use OLS in the presence of heteroscedasticity, our
standard errors could be inappropriate and hence any inferences we make could
be misleading.
Wh
Whether
th the
th standard
t d d error estimates
ti t calculated
l l t d using
i the
th usuall formulae
f
l are
too big or too small will depend upon the form of the heteroscedasticity.
117
How Do we Deal with Heteroscedasticity?
If the form (i.e. the cause) of the heteroscedasticity is known, then we can use
an estimation method which takes this into account (called generalised least
squares, GLS).
A simple illustration of GLS is as follows: Suppose that the error variance is
related to another variable zt by varut 2 zt2
To remove the heteroscedasticity, divide the regression equation by zt
yt
x
x
1
1 2 2t 3 3t vt
zt
zt
zt
zt
u
where vt t is an error term.
zt
u varut 2 zt2
Now varvt var t
2 2 for known zt.
2
z
z
zt
t
t
So the disturbances from the new regression equation will be homoscedastic.
118
Other Remedies for Heteroscedasticity
Other solutions include:

1. Transforming the variables into logs or reducing by some other measure
of size.
2. Use Whites heteroscedasticity consistent standard error estimates.
The effect of using Whites correction is that in general the standard errors
for the slope coefficients are increased relative to the usual OLS standard
errors
errors.
This makes us more conservative in hypothesis testing, so that we
would need more evidence against the null hypothesis before we would
reject it.
119
Autocorrelation: Some Background The Concept of a

Lagged Value
Background: The Concept of a Lagged Value
t
yt
yt-1
1989M09
0.8
1989M10
1.3
0.8
1989M11
-0.9
1.3
1989M12
0.2
-0.9
1990M01
-1.7
0.2
1990M02
2.3
-1.7
1990M03
0.1
2.3
1990M04
0.0
0.1
.
.
.
.
.
.
.
.
.
yt
1.3-0.8=0.5
-0.9-1.3=-2.2
0.2--0.9=1.1
-1.7-0.2=-1.9
2.3--1.7=4.0
0.1-2.3=-2.2
0.0-0.1=-0.1
.
.
.
120
Autocorrelation
We assumed of the CLRMs errors that Cov (ui , uj) = 0 for ij, i.e.
y g there is no p
pattern in the disturbances.
This is essentiallyy the same as saying
In a regression, if there are patterns in the residuals from a model, we say that
they are autocorrelated.
Obviously we never have the actual us, so we use their sample counterpart, the
residuals (the ut ).
Some stereotypical patterns we may find in the residuals are given on the next
three slides.
121
Positive Autocorrelation
+
u t
u t
u t 1
Time
Positive Autocorrelation is indicated by a cyclical residual plot over time.
122
Negative Autocorrelation
ut
+
u t
u t 1
Time
Negative autocorrelation is indicated by an alternating pattern where the residuals

cross the time axis more frequently than if they were distributed randomly
123
No pattern in residuals No Autocorrelation
ut
u t
u t 1
No pattern in residuals at all: this is what we would like to see

124
Causes of Correlated errors
a) CROSS-SECTION DATA spatial correlation

omitted variables or shocks common to all groups (countries,
(countries companies,
companies
households)
b) TIME SERIES DATA serial correlation
Business cycle inertia causes positive autocorrelation in macroeconomic
time series
Overlapping effect of shocks: the effect of a time t shock persists over t+1,
t+2, ...
Model misspecification: omitted relevant variables which are correlated
across time (inertia)
125
Detecting Autocorrelation:
The Durbin-Watson Test
The Durbin Watson (DW) is a test for first order autocorrelation - i.e. it
assumes that
th t the
th relationship
l ti hi is
i between
b t
an error and
d the
th previous
i
one
ut = ut-1 + vt
(1)
where vt N(0, v2).
The DW test statistic actually tests
H0 : =0 and H1 : 0
The test statistic is calculated by
T
ut ut 1
DW t 2
ut2
t 2
126
The Durbin-Watson Test Critical Values
We can also write
DW 21
(2)
where is the estimated correlation coefficient. Since is a correlation, it
implies that 1 1 .
Rearranging for DW from (2) would give 0DW4.
If = 0, DW = 2. So roughly speaking, do not reject the null hypothesis if DW
is near 2 i.e. there is little evidence of autocorrelation
Unfortunately, DW has 2 critical values, an upper critical value (du) and a
lower critical value (dL), and there is also an intermediate region where we can
neither reject nor not reject H0.
127
Reading Durbin Watson Critical Value Tables
k=1
k=2
k=4
k=5
dU
dL
dU
dL
dU
dL
dU
0.81
0.84
0.87
0.90
0.93
0.95
1.07
1.09
1.10
1.12
1.13
1.15
0.70
0.74
0.77
0.80
0.83
0.86
1.25
1.25
1.25
1.26
1.26
1.27
0.59
0.63
0.67
0.71
0.74
0.77
1.46
1.44
1.43
1.42
1.41
1.41
0.49
0.53
0.57
0.61
0.65
0.68
1.70
1.66
1.63
1.60
1.58
1.57
0.39
0.44
0.48
0.52
0.56
0.60
1.96
1.90
1.85
1.80
1.77
1.74
21
22
23
24
25
0.97
1.00
1.02
1.04
1.05
1.16
1.17
1.19
1.20
1.21
0.89
0.91
0.94
0.96
0.98
1.27
1.28
1.29
1.30
1.30
0.80
0.83
0.86
0.88
0.90
1.41
1.40
1.40
1.41
1.41
0.72
0.75
0.77
0.80
0.83
1.55
1.54
1.53
1.53
1.52
0.63
0.66
0.70
0.72
0.75
1.71
1.69
1.67
1.66
1.65
26
27
28
29
30
1.07
1.09
1.10
1.12
1.13
1.22
1.23
1.24
1.25
1.26
1.00
1.02
1.04
1.05
1.07
1.31
1.32
1.32
1.33
1.34
0.93
0.95
0.97
0.99
1.01
1.41
1.41
1.41
1.42
1.42
0.85
0.88
0.90
0.92
0.94
1.52
1.51
1.51
1.51
1.51
0.78
0.81
0.83
0.85
0.88
1.64
1.63
1.62
1.61
1.61
31
32
33
34
35
1.15
1.16
1.17
1.18
1.19
1.27
1.28
1.29
1.30
1.31
1.08
1.10
1.11
1.13
1.14
1.34
1.35
1.36
1.36
1.37
1.02
1.04
1.05
1.07
1.08
1.42
1.43
1.43
1.43
1.44
0.96
0.98
1.00
1.01
1.03
1.51
1.51
1.51
1.51
1.51
0.90
0.92
0.94
0.95
0.97
1.60
1.60
1.59
1.59
1.59
36
37
38
39
40
1.21
1.22
.
1.23
1.24
1.25
1.32
1.32
.3
1.33
1.34
1.34
1.15
1.16
. 6
1.18
1.19
1.20
1.38
1.38
.38
1.39
1.39
1.40
1.10
1.11
.
1.12
1.14
1.15
1.44
1.45
. 5
1.45
1.45
1.46
1.04
1.06
.06
1.07
1.09
1.10
1.51
1.51
.5
1.52
1.52
1.52
0.99
1.00
.00
1.02
1.03
1.05
1.59
1.59
.59
1.58
1.58
1.58
45
50
55
60
65
70
1.29
1.32
1.36
1.38
1.41
1.43
1.38
1.40
1.43
1.45
1.47
1.49
1.24
1.28
1.32
1.35
1.38
1.40
1.42
1.45
1.47
1.48
1.50
1.52
1.20
1.24
1.28
1.32
1.35
1.37
1.48
1.49
1.51
1.52
1.53
1.55
1.16
1.20
1.25
1.28
1.31
1.34
1.53
1.54
1.55
1.56
1.57
1.58
1.11
1.16
1.21
1.25
1.28
1.31
1.58
1.59
1.59
1.60
1.61
1.61
75
1.45
1.50
1.42
1.53
1.39
1.56
1.37
1.59
1.34
80
1.47
1.52
1.44
1.54
1.42
1.57
1.39
1.60
1.36
85
1.48
1.53
1.46
1.55
1.43
1.58
1.41
1.60
1.39
90
1.50
1.54
1.47
1.56
1.45
1.59
1.43
1.61
1.41
95
1.51
1:55
1.49
1.57
1.47
1.60
1.45
1.62
1.42
100
1.52
1.56
1.50
1.58
1.48
1.60
1.46
1.63
1.44
T, number of observations; k, number of explanatory variables (excluding a constant term).
Source: Econometrica, 48, 1554. Reprinted with the permission of the Econometric Society.
1.62
1.62
1.63
1.64
1.64
1.65
dL
dU
k=3
dL
15
16
17
18
19
20
128
The Durbin-Watson Test: Interpreting the Results
Conditions which Must be Fulfilled for DW to be a Valid Test

1. Constant term in regression
2. Regressors are non-stochastic
3. No lags of dependent variable
129
Another Test for Autocorrelation:

The Breusch-Godfrey Test
It is a more general test for rth order autocorrelation:
ut 1ut 1 2 ut 2 3ut 3 ..... r ut r vt ,
vt N(0, v2)
The null and alternative hypotheses are:
H0 : 1 = 0 and 2 = 0 and ... and r = 0
H1 : 1 0 or 2 0 or ... or r 0
The test is carried out as follows:
1. Estimate the linear regression using OLS and obtain the residuals, ut .
2 Regress ut on all of the regressors from stage 1 (the xs) plus ut 1, ut 2, ...., ut r
2.
Obtain R2 from this regression.
3. It can be shown that (T-r)R2 2(r)
If the test statistic exceeds the critical value from the statistical tables, reject
the null hypothesis of no autocorrelation.
130
Consequences of Ignoring Autocorrelation

if it is Present
The coefficient estimates derived using OLS are still unbiased, but they are
inefficient, i.e. they are not BLUE, even in large sample sizes.
Thus if the standard error estimates are inappropriate, there exists the
possibility that we could make the wrong inferences.
R2 is
i likely
lik l to
t be
b inflated
i fl t d relative
l ti to
t its
it correct
t value
l for
f positively
iti l correlated
l t d
residuals.
131
Remedies for Autocorrelation

If the form of the autocorrelation is known, we could use a GLS procedure
i.e. an approach that allows for autocorrelated residuals e.g., Cochrane-Orcutt.
But such procedures that correct for autocorrelation require assumptions
about the form of the autocorrelation.
If these assumptions are invalid, the cure would be more dangerous than the
disease! - see Hendry and Mizon (1978).
However, it is unlikely to be the case that the form of the autocorrelation is
known, and a more modern view is that residual autocorrelation presents an
opportunity to modify the regression.
132
Dynamic Models
All of the models we have considered so far have been static,, e.g.
g
yt = 1 + 2x2t + ... + kxkt + ut
But we can easily extend this analysis to the case where the current value of yt
depends on previous values of y or one of the xs, e.g.
yt = 1 + 2x2t + ... + kxkt + 1yt-1 + 2x2t-1 + + kxkt-1+ ut
We could extend the model even further by adding extra lags, e.g. x2t-2 , yt-3 .
133
Problems with Adding Lagged Regressors

to Cure Autocorrelation
Inclusion of lagged
gg values of the dependent
p
variable violates the assumption
p
that the RHS variables are non-stochastic.
What does an equation with a large number of lags actually mean?
Note that if there is still autocorrelation in the residuals of a model including
lags, then the OLS estimators will not even be consistent.
134
Multicollinearity
This problem occurs when the explanatory variables are very highly correlated
with each other.
Perfect multicollinearity
Cannot estimate all the coefficients
- e.g. suppose x3t = 2x2t
and the model is yt = 0 + 2x2t + 3x3t + 4x4t + ut
Problems if Near Multicollinearity is Present but Ignored
- R2 will be high but the individual coefficients will have high standard errors.
- The regression becomes very sensitive to small changes in the specification.
- Thus confidence intervals for the parameters will be very wide, and
significance tests might therefore give inappropriate conclusions.
135
Measuring Multicollinearity
The easiest way to measure the extent of multicollinearity is simply to

look at the matrix of correlations between the individual variables.
variables e.g.
eg
Corr
x2t
x3t
x4t
x2t
0.2
0.8
x3t
0.2
0.3
x4t
0.8
0.3
-
But another problem: if 3 or more variables are linear

- e.g. x2t + x3t = x4t
Note that high correlation between yt and one of the xt variables is not
muticollinearity.
136
Solutions to the Problem of Multicollinearity
Traditional approaches, such as ridge regression or principal

components. But
B these
h
usually
ll bring
b i more problems
bl
than
h they
h solve.
l
Some econometricians argue that if the model is otherwise OK, just
ignore it
The easiest ways to cure the problems are
- drop one of the collinear variables
- transform
f
the
h highly
hi hl correlated
l d variables
i bl into
i
a ratio
i
- go out and collect more data e.g.
- a longer run of data
- switch to a higher frequency
137
Adopting the Wrong Functional Form
We have previously assumed that the appropriate functional form is linear.

This
Thi may nott always
l
b true.
be
t
We can formally test this using Ramseys RESET test, which is a general test
for mis-specification of functional form.
Essentially the method works by adding higher order terms of the fitted values
(e.g. y t2 , y t3 etc.) into an auxiliary regression:
Regress ut on powers of the fitted values:
ut 0 1 y t2 2 y t3 .... p 1 y tp vt
Obtain R2 from this regression. The test statistic is given by TR2 and is
distributed as a 2 p 1 .
2
So if the value of the test statistic is greater than a p 1 then reject the null
hypothesis that the functional form was correct.
138
But what do we do if this is the case?

The RESET test gives us no guide as to what a better specification might be.
One possible cause of rejection of the test is if the true model is
In this case the remedy is obvious.
yt 1 2 x2t 3 x22t 4 x23t ut
Another possibility is to transform the data into logarithms. This will linearise
many previously multiplicative models into additive ones:
yt Axt eut ln yt ln xt ut
139
Testing the Normality Assumption
Why did we need to assume normality for hypothesis testing?

Testing for Departures from Normality
The Bera Jarque normality test
A normal distribution is not skewed and is defined to have a coefficient of
kurtosis of 3.
The kurtosis of the normal distribution is 3 so its excess kurtosis (b2-3) is zero.
Skewness and kurtosis are the (standardised) third and fourth moments of a
distribution.
140
Normal versus Skewed Distributions
f(x)
f(x)
A normal distribution
A skewed distribution
141
Leptokurtic versus Normal Distribution
0.5
0.4
0.3
0.2
0.1
0.0
-5.4
-3.6
-1.8
-0.0
1.8
3.6
5.4
142
Testing for Normality
Bera and Jarque formalise this by testing the residuals for normality by
testing whether the coefficient of skewness and the coefficient of excess
kurtosis are jointly zero.
It can be proved that the coefficients of skewness and kurtosis can be
expressed respectively as:
b1
E[u 3 ]
2 32
and
b2
E[u 4 ]
2 2
The Bera Jarque test statistic is given by
b 2 b 32
2
W T 1 2
~ 2
6
24
We estimate b1 and b2 using the residuals from the OLS regression, u .

143
What do we do if we find evidence of Non-Normality?
It is not obvious what we should do!

Could use a method which does not assume normality, but difficult and what
are its properties?
Often the case that one or two very extreme residuals causes us to reject the
normality assumption.
An alternative is to use dummy variables.
e.g. say we estimate a monthly model of asset returns from 1980-1990, and
we plot the residuals, and find a particularly large outlier for October 1987:
144
What do we do if we find evidence

of Non-Normality? (contd)
ut
+
Oct
1987
Time
Create a new variable:

D87M10t = 1 during October 1987 and zero otherwise.
This effectively knocks out that observation. But we need a theoretical
reason for adding dummy variables.
145
Seasonality in Financial Markets

If we have quarterly or monthly or even daily data, these may have patterns in.
Seasonal effects in financial markets have been widely observed and are often
termed calendar anomalies.
Examples include day-of-the-week effects, open- or close-of-market effect,
January effects, or bank holiday effects.
Th
These result
l in
i statistically
i i ll significantly
i ifi
l different
diff
b h i
behaviour
d i
during
some
seasons compared with others.
Their existence is not necessarily inconsistent with the EMH.
146
Constructing Dummy Variables for Seasonality

One way to cope with this is the inclusion of dummy variables- e.g. for
quarterly data, we could have 4 dummy variables:
D1t
D2t
D3t
D4t
= 1 in Q1 and zero otherwise

How many dummy variables do we need? We need one less than the
seasonality of the data. e.g. for quarterly series, consider what happens if we
use all 4 dummies
147
Constructing Quarterly Dummy Variables
1986 Q1
Q2
Q3
Q4
1987 Q1
Q2
Q3
D1t
1
0
0
0
1
0
0
D2t
0
1
0
0
0
1
0
etc.
D3t
0
0
1
0
0
0
1
D4t
0
0
0
1
0
0
0
Sumt
1
1
1
1
1
1
1
Problem of multicollinearity so (X'X)-1 does not exist.

Solution is to just use 3 dummy variables plus the constant or 4 dummies and
no constant.
148
How Does the Dummy Variable Work?

It works by changing the intercept.
Consider the followingg regression:
g
yt
yt = 1 + 1D1t + 2D2t + 3D3t + 2x2t +... + ut

3
So we have as the constant

1 1 in the first quarter
1 2 in the second quarter
1 3 in the third quarter
1 in the fourth quarter
xt
Q3
Q2
Q1
Q0
149
Seasonalities in South East Asian

Stock Returns
Brooks and Persand (2001) examine the evidence for a day-of-the-week effect
in five Southeast Asian stock markets: South Korea, Malaysia, the Philippines,
Taiwan and Thailand.
The data, are on a daily close-to-close basis for all weekdays (Mondays to
Fridays) falling in the period 31 December 1989 to 19 January 1996 (a total of
1581 observations).
They use daily dummy variables for the day of the week effects in the
regression:
rt = 1D1t + 2D2t + 3D3t + 4D4t + 5D5t + ut
Then the coefficients can be interpreted as the average return on each day of
the week.
150
Values and Significances of Day of the Week Effects in South

East Asian Stock Markets
Monday
Tuesday
Wednesday
Thursday
Friday
South Korea
0.49E-3
(0.6740)
-0.45E-3
(-0.3692)
-0.37E-3
-0.5005)
0.40E-3
(0 5468)
(0.5468)
-0.31E-3
(-0.3998)
Thailand
0.00322
(3.9804)**
-0.00179
(-1.6834)
-0.00160
(-1.5912)
0.00100
(1 0379)
(1.0379)
0.52E-3
(0.5036)
Malaysia
0.00185
(2.9304)**
-0.00175
(-2.1258)**
0.31E-3
(0.4786)
0.00159
(2 2886)**
(2.2886)**
0.40E-4
(0.0536)
Taiwan
0.56E-3
(0.4321)
0.00104
(0.5955)
-0.00264
(-2.107)**
-0.00159
( 1 2724)
(-1.2724)
0.43E-3
(0.3123)
Philippines
0.00119
(1.4369)
-0.97E-4
(-0.0916)
-0.49E-3
(-0.5637)
0.92E-3
(0 8908)
(0.8908)
0.00151
(1.7123)
Notes: Coefficients are given in each cell followed by t-ratios in parentheses; * and ** denote significance at the
5% and 1% levels respectively. Source: Brooks and Persand (2001).
151
Slope Dummy Variables
As well as or instead of intercept dummies, we could also use slope dummies:

For
F example,
l this
thi diagram
di
d i t the
depicts
th use off one dummy
d
e.g., for
f bi-annual
bi
l
(twice yearly) or open and close data.
y
In the latter case, we could
y x D x u
define Dt = 1 for open observations
and Dt=0 for close.
y x u
Such dummies change the slope
but leave the intercept
p unchanged.
g
We could use more slope dummies
or both intercept and slope dummies.
t
xt
152
Seasonalities in South East Asian

Stock Returns Revisited
It is possible that the different returns on different days of the week could be a
result of different levels of risk on different days.
To allow for this, Brooks and Persand re-estimate the model allowing for
different betas on different days of the week using slope dummies:
5
rt = ( iDit + i DitRWMt) + ut
i 1
where Dit is the ith dummy variable taking the value 1 for day t=i and zero
otherwise and RWMt is the return on the world market index
otherwise,
Now both risk and return are allowed to vary across the days of the week.
153
Values and Significances of Day of the Week Effects in South

East Asian Stock Markets
allowing for Time-Varying risks
Monday
Tuesday
Wednesday
Thursday
Friday
Beta-Monday
Beta-Tuesdayy
Beta-Wednesday
Beta-Thursday
Beta-Friday
Thailand
0.00322
(3.3571)**
(3.3571)
-0.00114
(-1.1545)
-0.00164
(-1.6926)
0.00104
(1.0913)
0.31E-4
(0.03214)
0.3573
(2.1987)*
1.0254
(8.0035)**
0.6040
(3.7147)**
0.6662
(3.9313)**
0.9124
(5.8301)**
Malaysia
0.00185
(2.8025)**
(2.8025)
-0.00122
(-1.8172)
0.25E-3
(0.3711)
0.00157
(2.3515)*
-0.3752
(-0.5680)
0.5494
(4.9284)**
0.9822
(11.2708)**
0.5753
(5.1870)**
0.8163
(6.9846)**
0.8059
(7.4493)**
Taiwan
0.544E-3
(0.3945)
0.00140
(1.0163)
-0.00263
(-1.9188)
-0.00166
(-1.2116)
-0.13E-3
(-0.0976)
0.6330
(2.7464)**
0.6572
(3.7078)**
0.3444
(1.4856)
0.6055
(2.5146)*
1.0906
(4.9294)**
Notes: Coefficients are given in each cell followed by t-ratios in parentheses; * and ** denote significance at the
5% and 1% levels respectively. Source: Brooks and Persand (2001).
154
Omission of an Important Variable or

Inclusion of an Irrelevant Variable
Omission of an Important Variable

Consequence: The estimated coefficients on all the other variables will be
biased and inconsistent unless the excluded variable is uncorrelated with
all the included variables.
Even if this condition is satisfied, the estimate of the coefficient on the
constant term will be biased.
The standard errors will also be biased.
Inclusion of an Irrelevant Variable
Coefficient estimates will still be consistent and unbiased, but the
estimators will be inefficient.
155
Parameter Stability Tests

So far, we have estimated regressions such as yt 1 2 x2t 3 x3t ut
We have implicitly assumed that the parameters (1, 2 and 3) are constant
for the entire sample period.
We can test this implicit assumption using parameter stability tests. The idea
is essentially to split the data into sub-periods and then to estimate up to three
models, for each of the sub-parts and for all the data and then to compare
the RSS of the models.
models
We can look at the Chow test (analysis of variance test)
156
The Chow Test
The steps involved are:

1. Split the data into two sub-periods. Estimate the regression over the whole
period and then for the two sub-periods separately (3 regressions). Obtain the
RSS for each regression.
2. The restricted regression is now the regression for the whole period while
the unrestricted regression comes in two parts: for each of the sub-samples.
We can thus form an F-test which is the difference between the RSSs.
The statistic is
RSS RSS1 RSS 2 T 2k
RSS1 RSS 2
k
157
The Chow Test (contd)

where:
p
RSS = RSS for whole sample
RSS1 = RSS for sub-sample 1
RSS2 = RSS for sub-sample 2
T = number of observations
2k = number of regressors in the unrestricted regression (since it comes in
two parts)
k = number of regressors in (each part of the) unrestricted regression
3. Perform the test. If the value of the test statistic is greater than the critical
value from the F-distribution, which is an F(k, T-2k), then reject the null
hypothesis that the parameters are stable over time.
158
A Chow Test Example

Consider the following regression for the CAPM (again) for the returns on
Glaxo.
Say that we are interested in estimating Beta for monthly data from 19811992. The model for each sub-period is
1981M1 - 1987M10
0.24 + 1.2RMt
1987M11 - 1992M12
0.68 + 1.53RMt
1981M1 - 1992M12
0.39 + 1.37RMt
T = 82
RSS1 = 0.03555
T = 62
RSS2 = 0.00336
T = 144
RSS = 0.0434
159
A Chow Test Example - Results
The null hypothesis is
H 0 : 1 2
and
1 2
The unrestricted model is the model where this restriction is not imposed
Test statistic
0.0434 0.0355 0.00336 144 4
0.0355 0.00336
2
= 7.698
Compare with 5% F(2,140) = 3.06
We reject H0 at the 5% level and say that we reject the restriction that the
coefficients are the same in the two periods.
160
How do we decide the sub-parts to use?

As a rule of thumb, we could use all or some of the following:
- Plot the dependent variable over time and split the data accordingly to any
obvious
b i
structural
t t l changes
h
i the
in
th series,
i e.g.
1400
Value of Series (yt)
1200
1000
800
600
400
200
443
417
391
365
339
313
287
261
235
209
183
157
131
79
105
53
27
0
Sample Period
- Split the data according to any known important historical events

(e.g. stock market crash, new government elected)
161
Two Approaches to Building Econometric Models

There are 2 popular philosophies of building econometric models: the
specific-to-general and general-to-specific approaches.
Specific-to-general was used almost universally until the mid 1980s, and
involved starting with the simplest model and gradually adding to it.
Little, if any, diagnostic testing was undertaken. But this meant that all
inferences were potentially invalid.
A
An alternative
l
i and
d more modern
d
approachh to model
d l building
b ildi is
i the
h LSE or
Hendry general-to-specific methodology.
The advantages of this approach are that it is statistically sensible and also the
theory on which the models are based usually has nothing to say about the lag
structure of a model.
162
The General-to-Specific Approach

First step is to form a large model with lots of variables on the right hand
side
This is known as a GUM (generalised unrestricted model)
At this stage, we want to make sure that the model satisfies all of the
assumptions of the CLRM
If the assumptions are violated, we need to take appropriate actions to remedy
this, e.g.
- taking logs
- adding lags
- dummy
d
variables
i bl
We need to do this before testing hypotheses
Once we have a model which satisfies the assumptions, it could be very big
with lots of lags & independent variables
163
The General-to-Specific Approach:

Reparameterising the Model
The next stage is to reparameterise the model by
oc g out ve
very
y insignificant
s g ca t regressors
eg esso s
- knocking
- some coefficients may be insignificantly different from each other,
so we can combine them.
At each stage, we need to check the assumptions are still OK.
Hopefully at this stage, we have a statistically adequate empirical model which
we can use for
- testing underlying financial theories
- forecasting future values of the dependent variable
- formulating policies, etc.
164
A Strategy for Building Econometric Models
Our Objective:
To build a statistically adequate empirical model which
- satisfies the assumptions of the CLRM
- is parsimonious
- has the appropriate theoretical interpretation
- has the right shape - i.e.
- all signs on coefficients are correct
- all
ll sizes
i
off coefficients
ffi i
are correct
- is capable of explaining the results of all competing models
165
Sample Plots for various Stochastic Processes:

A White Noise Process
4
3
2
1
0
-1 1
-2
-3
-4
40 79 118 157 196 235 274 313 352 391 430 469
166
White Noise
A WN process is a sequence of random disturbances (or shocks) t

t=1,2, drawn independently from the same probability distribution
with zero mean and variance 2
A WN process is a completely random and unpredictable sequence of
random disturbances t t=1,2,
167
Stationary Process
A process xt is stationary if it is in a particular state of statistical

equilibrium
A stationary process xt is also called mean-reverting because there is a
long-run equilibrium x to which it reverts after being shocked
Shocks to a stationary process have transitory effects (finite memory)
168
Stationary Process Illustration

Intuition
Intuition:
Shock hits
long-run
xt
Formally: a stationary process has constant mean and variance and

(auto)covariances do not depend on t but on the time lag k
(a ) mean( xt ) E(xt ) for all t
(b) var( xt ) E ( xt 2 ) 2 for all t
(c) cov( xt , xt k ) k for all k
Is a WHITE NOISE process xt a stationary process?

Yes, a white noise process is a particular type of stationary process but not the
only one!
169
A Random Walk and a Random Walk with Drift
70
60
Random Walk
Random Walk with Drift
50
40
30
20
10
0
1
19 37 55 73 91 109 127 145 163 181 199 217 235 253 271 289 307 325 343 361 379 397 415 433 451 469 487
-10
-20
170
A Deterministic Trend Process
30
25
20
15
10
5
0
-5 1
40 79 118 157 196 235 274 313 352 391 430 469
171
Why do we need to test for Non-Stationarity?
The stationarity
y or otherwise of a series can strongly
g y influence its behaviour
and properties - e.g. persistence of shocks will be infinite for nonstationary
series
Spurious regressions. If two variables are trending over time, a regression of
one on the other could have a high R2 even if the two are totally unrelated
If the variables in the regression model are not stationary,
stationary then it can be proved
that the standard assumptions for asymptotic analysis will not be valid. In other
words, the usual t-ratios will not follow a t-distribution, so we cannot validly
undertake hypothesis tests about the regression parameters.
172
Value of R2 for 1000 Sets of Regressions of a

Non-stationary Variable on another Independent
Non-stationary Variable
173
Value of t-ratio on Slope Coefficient for 1000 Sets of

Regressions of a Non-stationary Variable on another
Independent Non-stationary Variable
174
Two types of Non-Stationarity
Various definitions of non-stationarityy exist.

There are two models which have been frequently used to characterise nonstationarity: the random walk model with drift:
(1)
yt = + yt-1 + ut
and the deterministic trend process:
yt = + t + ut
(2)
where ut is iid in both cases.
175
Stochastic Non-stationarity
Note that the model ((1)) could be ggeneralised to the case where yt is an
explosive process:
yt = + yt-1 + ut
where > 1.
Typically, the explosive case is ignored and we use = 1 to characterise the
non-stationarity because
> 1 does
d
nott describe
d
ib many data
d t series
i in
i economics
i andd finance.
fi
> 1 has an intuitively unappealing property: shocks to the system are not
only persistent through time, they are propagated so that a given shock will
have an increasingly large influence.
176
The Impact of Shocks for Stationary and Non-stationary

Series
We have 3 cases:
1 <1 T0
1.
0 as T
T
So the shocks to the system gradually die away.
2. =1 T =1 T
So shocks persist in the system and never die away. We obtain:
yt y0 ut
as T
i 0
So just an infinite sum of past shocks plus some starting value of y0.
3. >1. Now given shocks become more influential as time goes on,
since if >1, 3>2> etc.
177
Detrending a Non-stationary Series

Going back to our 2 characterisations of non-stationarity, the r.w. with drift:
yt = + yt-1 + ut
(1)
and
d the
th trend-stationary
t d t ti
process
yt = + t + ut
(2)
The two will require different treatments to induce stationarity. The second
case is known as deterministic non-stationarity and what is required is
detrending.
The first case is known as stochastic non-stationarity. If we let
yt = yt - yt-1
and
L yt = yt-1
t1
so
(1-L) yt = yt - L yt = yt - yt-1
If we take (1) and subtract yt-1 from both sides:
yt - yt-1 = + ut
yt
= + ut
We say that we have induced stationarity by differencing once.
178
Definition of Stochastic Non-Stationarity

Consider again the simplest stochastic trend model:
yt = yt-1 + ut
or
yt= ut
We can generalise this concept to consider the case where the series contains
more than one unit root. That is, we would need to apply the first
difference operator, , more than once to induce stationarity.
Definition
If a non
non-stationary
stationary series, yt must be differenced d times before it becomes
stationary, then it is said to be integrated of order d. We write yt I(d).
So if yt I(d) then dyt I(0).
An I(0) series is a stationary series
An I(1) series contains one unit root,
e.g. yt = yt-1 + ut
179
How do we test for a unit root?

The early and pioneering work on testing for a unit root in time series was
done by Dickey and Fuller (Dickey and Fuller 1979, Fuller 1976). The basic
objective of the test is to test the null hypothesis that =1 in:
yt = yt-1 + ut
against the one-sided alternative <1. So we have
H0: series contains a unit root
vs. H1: series is stationary.
We
W usually
ll use the
h regression:
i
yt = yt-1 + ut
so that a test of =1 is equivalent to a test of =0 (since -1=).
180
Computing the Dickey-Fuller Test Statistic
For the null hypothesis, we can write

yt=ut
where yt = yt- yt-1, and the alternative may be expressed as
yt = yt-1+ut
with =-1. In each case, the tests are based on the t-ratio on the yt-1 term in the
estimated regression of yt on
yt-1, plus a constant in case ii) and a constant and trend in case iii). The test statistics are
defined as
test statistic =
SE
The test statistic does not follow the usual t-distribution under the null, since the null is
one of non-stationarity, but rather follows a non-standard distribution. Critical values
are derived from Monte Carlo experiments in, for example, Fuller (1976). Relevant
examples of the distribution are shown in table 4.1 below
181
Critical Values for the Dickey-Fuller Test
Significance level 10%

5%
C.V. for constant -2.57
-2.86
but no trend
C.V. for constant -3.12
-3.41
and trend
Table 4.1: Critical Values for DF and
1976, p373).
1%
-3.43
-3.96
ADF Tests (Fuller,
The null hypothesis of a unit root is rejected in favour of the stationary alternative
in each case if the test statistic is more negative than the critical value.
182
The Augmented Dickey-Fuller (ADF) Test
The tests above are only valid if ut is white noise. In particular, ut will be
autocorrelated if there was autocorrelation in the dependent variable of the
regression (yt) which we have not modelled. The solution is to augment
the test using p lags of the dependent variable. The alternative model in
case (i) is now written:
p
yt yt 1 i yt i ut
i 1
The same critical values from the DF tables are used as before. A problem
now arises in determining the optimal number of lags of the dependent
variable.
i bl
There are 2 ways
- use the frequency of the data to decide
- use information criteria
183
Criticism of Dickey-Fuller Tests

Main criticism is that the power of the tests is low if the process is stationary
but with a root close to the non-stationary boundary.
e.g. the tests are poor at deciding if
=1 or =0.95,
especially with small sample sizes.
If the true data generating process (dgp) is
yt = 0.95yt-1 + ut
then the null hypothesis of a unit root should be rejected.
One way to get around this is to use a stationarity test as well as the unit root
tests we have looked at.
184
Univariate Time Series Models

Where we attempt to predict returns (or any other series) using only information
contained in their past values.
S
Some
N t ti and
Notation
d Concepts
C
t
A Strictly Stationary Process
A strictly stationary process is one where
P{yt1 b1,..., ytn bn } P{yt1 m b1,..., ytn m bn }
i.e. the probability measure for the sequence {yt} is the same as that for {yt+m} m.
A Weakly Stationary Process
If a series satisfies the next three equations,
equations it is said to be weakly or covariance
stationary
t = 1,2,...,
1. E(yt) = ,
2. E ( yt1 )( yt 2 ) t 2 t1 t1 , t2
3. E ( yt )( yt ) 2
185
Autocovariances
So if the process is covariance stationary, all the variances are the same and all the
covariances depend on the difference between t1 and t2. The moments
E ( yt E ( yt ))( yt s E ( yt s )) s , s = (-,+)
are known as the covariance function.
The covariances, s, are known as autocovariances.
However, the autocovariances depend on the units of measurement of yt.
It is
i thus
h more convenient
i to use the
h autocorrelation
l i which
hi h are the
h autocovariances
i
normalised by dividing by the variance:
s = (-,+)
,
s s
0
If we plot s against s=0,1,2,... then we obtain the autocorrelation function or
correlogram.
186
A White Noise Process
A white noise process is one with (virtually) no discernible structure. A definition of a

white noise process is
E ( yt )
Var ( yt ) 2
2 if t r
t r
0
otherwise
Thus the autocorrelation function will be zero apart from a single peak of 1 at s = 0.
s approximately N(0,1/T) where T = sample size
We can use this to do significance tests for the autocorrelation coefficients by
constructing a confidence interval.
interval
For example, a 95% confidence interval would be given by 1.96/T.
If the sample autocorrelation coefficient,s , falls outside this region for any value of s,
then we reject the null hypothesis that the true value of the coefficient at lag s is
zero.
187
Joint Hypothesis Tests
We can also test the joint hypothesis that all m of the k correlation coefficients
are simultaneously equal to zero using the Q-statistic
Q statistic developed by Box and
m
Pierce:
2
Q T k
k 1
where T = sample size, m = maximum lag length

The Q-statistic is asymptotically distributed as a m2.
However, the Box Pierce test has poor small sample properties, so a variant
has been developed,
developed called the Ljung-Box
Ljung Box statistic:
m
Q T T 2
k2
k 1 T k
~ m2
This statistic is very useful as a test of linear dependence in time series. It can
also be used on residuals.
188
An ACF Example
Question:
S
Suppose
th
thatt a researcher
h had
h d estimated
ti t d the
th first
fi t 5 autocorrelation
t
l ti coefficients
ffi i t
using a series of length 100 observations, and found them to be (from 1 to 5):
0.207, -0.013, 0.086, 0.005, -0.022.
Test each of the individual coefficient for significance, and use both the BoxPierce and Ljung-Box tests to establish whether they are jointly significant.
Solution:
A coefficient would be significant if it lies outside ((-0.196,+0.196)
0.196, 0.196) at the 5%
level, so only the first autocorrelation coefficient is significant.
Q=5.09 and Q*=5.26
Compared with a tabulated 2(5)=11.1 at the 5% level, so the 5 coefficients
are jointly insignificant.
189
Autoregressive Moving Average (ARMA) Processes
These models are an alternative to the structural models that we have

examined so far in this module.
module
They are easy to set up, requiring only one data series (y)
They are atheroretical
y are sometimes useful for forecastingg
They
We will examine these models now, starting with the component parts
Dont worry about the algebra!
190
Moving Average Processes
L
Lett ut (t=1,2,3,...)
(t 1 2 3 ) be
b a sequence off independently
i d
d tl andd identically
id ti ll
distributed (iid) random variables with E(ut)=0 and Var(ut)=2, then
yt = + ut + 1ut-1 + 2ut-2 + ... + qut-q
th
is a q order moving average model MA(q).
Its p
properties
p
are
Constant mean
Constant variance
Autocovariances are zero beyond lag q
191
ACF Plot
Thus the acf plot will appear as follows:

1.2
1
0.8
0.6
acf
0.4
0.2
0
0
-0.2
-0.4
-0.6
192
Autoregressive Processes
An autoregressive model of order p, an AR(p) can be expressed as
y t 1 y t 1 2 y t 2 ... p y t p u t
Or using the lag operator notation:
Lyt = yt-1
Liyt = yt-i
p
y t i y t i u t
i 1
or
y t i Li y t u t
i 1
or ( L) y t u t
where
( L) 1 (1 L 2 L2 ... p Lp ) .
193
Wolds Decomposition Theorem
States that any stationary series can be decomposed into the sum of two
unrelated
l t d processes, a purely
l deterministic
d t
i i ti partt andd a purely
l stochastic
t h ti
part,
For an AR(p), Wolds decomposition theorem essentially means that the
model can be written as an MA().
An MA(q) can also be written as an AR()
194
The Partial Autocorrelation Function

Measures the correlation between an observation k periods ago and the
current observation,
observation after controlling for observations at intermediate lags
(i.e. all lags < k).
So the pacf measures the correlation between yt and yt-k after removing the
effects of yt-k+1 , yt-k+2 , , yt-1 .
At lag 1, the acf = pacf always
At lag 2, pacf = (2-12) / (1-12)
For lags 3+, the formulae are more complex.
195
The Partial Autocorrelation Function (contd)
The pacf is useful for telling the difference between an AR process and an
ARMA process.
In the case of an AR(p), there are direct connections between yt and yt-s only
for s p.
So for an AR(p), the theoretical pacf will be zero after lag p.
In the case of an MA(q), this can be written as an AR(), so there are direct
connections between yt and all its previous values.
For an MA(q), the theoretical pacf will be geometrically declining.
196
ARMA Processes
By combining the AR(p) and MA(q) models, we can obtain an ARMA(p,q)

model:
d l
( L) y t ( L)u t
where ( L) 1 1 L 2 L2 ... p Lp
and ( L) 1 1 L 2 L2 ... q Lq
or
yt 1 y t 1 2 y t 2 ... p yt p 1u t 1 2 u t 2 ... q u t q u t
2
2
with E (u t ) 0; E (u t ) ; E (u t u s ) 0, t s
197
ARMA Processes
Let x denote a financial variable. It is said to follow an ARMA process if its

current value is determined by a linear combination of its past values and also
current and past shocks.
Hence fitting an ARMA(p,q) model (e.g. to uncover anomalies which can be
exploited for making profits, for forecasting purposes) to a financial time series
amounts to estimating the AR and MA parameters
198
Summary of the Behaviour of the acf for

AR and MA Processes
An autoregressive process has

a geometrically decaying acf
number of spikes of pacf = AR order
A moving average process has
Number of spikes of acf = MA order
a geometrically decaying pacf
199
Some sample acf and pacf plots

for standard processes
The acf and pacf are not produced analytically from the relevant formulae for a model of that
type, but rather are estimated using 100,000 simulated observations with disturbances drawn
from a normal distribution.
ACF and PACF for an MA(1) Model: yt = 0.5ut-1 + ut
0.05
0
1
10
-0.05
acf a
and pacf
-0.1
-0.15
-0.2
-0.25
-0.3
acf
-0.35
pacf
-0.4
-0.45
Lag
200
ACF and PACF for an MA(2) Model:

yt = 0.5ut-1 - 0.25ut-2 + ut
0.4
acf
0.3
pacf
0.2
acf and pacf
0.1
0
1
10
-0.1
-0.2
-0.3
-0.4
Lags
201
ACF and PACF for a slowly decaying AR(1) Model:

yt = 0.9yt-1 + ut
1
0.9
acf
pacf
0.8
0.7
acf and pacf
0.6
0.5
0.4
0.3
0.2
0.1
0
1
10
-0.1
Lags
202
ACF and PACF for a more rapidly decaying AR(1)

Model: yt = 0.5yt-1 + ut
0.6
0.5
acf
pacf
acf and pacf
0.4
0.3
0.2
01
0.1
0
1
10
-0.1
Lags
203
ACF and PACF for a more rapidly decaying AR(1)

Model with Negative Coefficient: yt = -0.5yt-1 + ut
0.3
0.2
0.1
acf and pacf
0
1
10
-0.1
-0.2
-0.3
-0.4
acf
pacf
-0.5
-0.6
Lags
204
ACF and PACF for a Non-stationary Model

(i.e. a unit coefficient): yt = yt-1 + ut
1
0.9
acf
pacf
0.8
acf and pacf
0.7
0.6
0.5
0.4
0.3
0.2
0.1
0
1
10
Lags
205
ACF and PACF for an ARMA(1,1):

yt = 0.5yt-1 + 0.5ut-1 + ut
0.8
0.6
acf
pacf
acf and pacf
0.4
0.2
0
1
10
-0.2
-0.4
Lags
206
Building ARMA Models

- The Box Jenkins Approach
Box and Jenkins (1970) were the first to approach the task of estimating an
ARMA model
d l iin a systematic
i manner. There
Th are 3 steps to their
h i approach:
h
1. Identification
2. Estimation
3. Model diagnostic checking
Step 1:
- Involves determining the order of the model.
- Use of graphical procedures
- A better procedure is now available
207
Building ARMA Models

- The Box Jenkins Approach (contd)
Step 2:
- Estimation of the parameters
- Can be done using least squares or maximum likelihood depending
on the model.
Step 3:
- Model checking
Box and Jenkins suggest 2 methods:
- deliberate overfitting
- residual diagnostics
208
Some More Recent Developments in

ARMA Modelling
Identification would typically not be done using acfs.

We
W wantt to
t form
f
a parsimonious
i
i
model.
d l
Reasons:
- variance of estimators is inversely proportional to the number of degrees of
freedom.
- models which are profligate might be inclined to fit to data specific features
This gives motivation for using information criteria, which embody 2 factors
- a term which is a function of the RSS
- some penalty for adding extra parameters
The object is to choose the number of parameters which minimises the
information criterion.
209
Information Criteria for Model Selection
The information criteria vary according to how stiff the penalty term is.
The
Th th
three mostt popular
l criteria
it i are Akaikes
Ak ik (1974) information
i f
ti criterion
it i
(AIC), Schwarzs (1978) Bayesian information criterion (SBIC), and the
Hannan-Quinn criterion (HQIC).
AIC ln( 2 ) 2 k / T
k
ln T
T
2
k
ln(ln(T ))
HQIC ln( 2 )
T
where k = p + q + 1,
1 T = sample size.
size So we min.
min IC s.t.
s t p p, q q
SBIC embodies a stiffer penalty term than AIC.
Which IC should be preferred if they suggest different model orders?
SBIC is strongly consistent but (inefficient).
AIC is not consistent, and will typically pick bigger models.
SBIC ln( 2 )
210
ARIMA Models
As distinct from ARMA models. The I stands for integrated.

An integrated autoregressive process is one with a characteristic root on the
unit circle.
Typically researchers difference the variable as necessary and then build an
ARMA model on those differenced variables.
An ARMA(p,q) model in the variable differenced d times is equivalent to an
ARIMA(p,d,q) model on the original data.
211
ARIMA Examples from Mills (p22-30)

Example 1: Modelling Interest Rate Spreads
Data: Difference between UK 20 yyear ggilts and 3 month T-bills qquarterly
y from 19521988 = 148 observations (we assume).
The SACFs & SPACFs are given in the following table:
lag
k
kk
1
0.829
0.829
2
0.672
-0.048
3
0.547
0.009
4
0.435
-0.034
5
0.346
0.005
6
0.279
0.012
7
0.189
-0.116
8
0.154
0.114
9
0.145
0.047
10
0.164
0.100
11
0.185
0.028
12
0.207
0.038
212
ARIMA Examples from Mills (p22-30) (Contd)

Using the 1.961/T rule of thumb for assessing significance, all the SACFs to lag 7
are significant.
g
Only the first SPACF is significant.
Conclusion: looks like an AR(1). We can estimate it using OLS
.
xt 0176
0.856 xt 1
(0.098)
(0.045)
tt-ratios:
ratios:
1.796
19.022
213
ARIMA Examples from Mills (p22-30) (Contd)

Is the model adequate?
Can use the LB-Q* test:
Estimated Q*(12) = 12.26 c.f. tabulated 2(12)5% = 21.0
( )10% = 18.5
No evidence of autocorrelation in the residuals.
The other Box-Jenkins adequacy test is deliberate over-fitting.
( )
An AR(2):
t-ratios:
xt
0197
.
( 0101
. )
1.950
0.927 xt 1 0.079 xt 2
(0.084)
11.036
(0.084)
-0.940
xt 0.213 0.831xt 1 0.092ut 1

An ARMA(1,1):
(0104
. )
(0.051)
( 0.095)
t-ratios:
2.048
16.294
-0.968
Conclusion: original AR(1) specification is adequate.
214
Example 2: Modelling the Returns on the FTA-All Share
Data are 312 monthly observations on the FTA returns 1965-1990.

LB-Q*(12) on raw data = 19.3 with p-value 0.08.
The SACF and SPACF are
lag
1
2
3
4
5
6
7
8
9
10
11
12
kk
0.148
-0.061
0.117
0.067
-0.082
0 082
0.013
0.041
-0.011
0.087
0.021
-0.008
0.026
0.148
0.085
0.143
0.020
-0.079
0 079
0.034
0.008
0.002
0.102
-0.030
0.012
0.010
215

(Contd)
1.961/T = 0.111.
The first and the third autocorrelation coefficients are significant.
The first and third partial autocorrelation coefficients are significant.
Could be an ARMA mixed process but difficult to tell, so useful to let the IC decide.
q
3.685
3.694
3.704
3.698
3.688
3.696
3.699
3.707
3.709
3.730
3.752
3.758
3.724
3.744
3.759
3.779
p
0
1
2
3
3.701
3.689
3.691
3.683
0
1
2
3
3.701
3.701
3.715
3.719
AIC
3.684
3.685
3.695
3.693
SBIC
3.696
3.709
3.731
3.741
216

(Contd)
AIC minimised at ARMA(3,0)

SBIC minimised at ARMA(0,1)
The two estimated models are

- AR(3):
. xt 1 0.108 xt 2
xt 4.41 0173
(0.62)
( 0.057)
( 0.057)
t-ratios:
( )
- MA(1):
t-ratios:
7.11
xt
3.04
-1.89
LB-Q*(12)=6.07
.
558
0.186ut 1
(0.35)
(0.056)
15.94
0.144 xt 3
(0.057)
2.526
3.00
LB-Q*(12)=15.30
217
There are numerous examples of instances where this may arise, for example
where
h we wantt to
t model:
d l
Why firms choose to list their shares on the NASDAQ rather than the NYSE
Why some stocks pay dividends while others do not
What factors affect whether countries default on their sovereign debt
Why some firms choose to issue new stock to finance an expansion while
others issue bonds
Why
y some firms choose to engage
g g in stock splits
p while others do not.
It is fairly easy to see in all these cases that the appropriate form for the
dependent variable would be a 0-1 dummy variable since there are only two
possible outcomes. There are, of course, also situations where it would be
more useful to allow the dependent variable to take on other values, but these
will be considered later.
234
The Linear Probability Model
We will first examine a simple and obvious, but unfortunately flawed,

method for dealing with binary dependent variables
variables, known as the linear
probability model.
it is based on an assumption that the probability of an event occurring, Pi, is
linearly related to a set of explanatory variables
Pi p ( yi 1) 1 2 x2i 3 x3i k xki ui
The actual probabilities cannot be observed, so we would estimate a model

where the outcomes, yi (the series of zeros and ones), would be the dependent
variable.
This is then a linear regression model and would be estimated by OLS.
The set of explanatory variables could include either quantitative variables or
dummies or both.
The fitted values from this regression are the estimated probabilities for yi =1
for each observation i.
234
The Linear Probability Model
The slope estimates for the linear probability model can be interpreted as the
change in the probability that the dependent variable will equal 1 for a oneone
unit change in a given explanatory variable, holding the effect of all other
explanatory variables fixed.
Suppose, for example, that we wanted to model the probability that a firm i
will pay a dividend p(yi = 1) as a function of its market capitalisation (x2i,
measured in millions of US dollars), and we fit the following line:
Pi 0.3 0.012 x2i
where Pi denotes the fitted or estimated probability for firm i.

This model suggests that for every $1m increase in size, the probability that
the firm will pay a dividend increases by 0.012 (or 1.2%).
A firm whose stock is valued at $50m will have a -0.3+0.01250=0.3 (or
30%) probability of making a dividend payment.
235
The Fatal Flaw of the Linear Probability Model
Graphically, the situation we have is
236
Disadvantages of the Linear Probability Model
While the linear probability model is simple to estimate and intuitive to

i
interpret,
the
h diagram
di
on the
h previous
i
slide
lid should
h ld immediately
i
di l signal
i la
problem with this setup.
For any firm whose value is less than $25m, the model-predicted probability
of dividend payment is negative, while for any firm worth more than $88m,
the probability is greater than one.
Clearly, such predictions cannot be allowed to stand,

Clearly
stand since the probabilities
should lie within the range (0,1).
An obvious solution is to truncate the probabilities at 0 or 1, so that a

probability of -0.3, say, would be set to zero, and a probability of, say, 1.2,
would be set to 1.
237
Disadvantages of the Linear Probability Model 2
However, there are at least two reasons why this is still not adequate.
The process of truncation will result in too many observations for which the
estimated probabilities are exactly zero or one.
More importantly, it is simply not plausible to suggest that the firm's

probability of paying a dividend is either exactly zero or exactly one. Are we
really certain that very small firms will definitely never pay a dividend and
that large firms will always make a payout?
Probably not, and so a different kind of model is usually used for binary
dependent variables either a logit or a probit specification.
238
Disadvantages of the Linear Probability Model 3
The LPM also suffers from a couple of more standard econometric problems
that
h we hhave examined
i d in
i previous
i
chapters.
h
Since the dependent variable only takes one or two values, for given (fixed in
repeated samples) values of the explanatory variables, the disturbance term
will also only take on one of two values.
Hence the error term cannot plausibly be assumed to be normally distributed.
Since the disturbance term changes systematically with the explanatory

variables, the former will also be heteroscedastic.
It is therefore essential that heteroscedasticity-robust standard errors are always
used in the context of limited dependent variable models.
239
Logit and Probit: Better Approaches
Both the logit and probit model

approaches
h are able
bl to overcome
the limitation of the LPM that it
can produce estimated
probabilities that are negative or
greater than one.
They do this by using a function
that effectively transforms the
regression model so that the fitted
values
l
are bounded
b
d d within
ithi the
th
(0,1) interval.
Visually, the fitted regression
model will appear as an S-shape
rather than a straight line, as was
the case for the LPM.
240
The Logit Model
The logit
g model is so-called because it uses a the cumulative logistic
g
distribution to transform the model so that the probabilities follow the Sshape given on the previous slide.
With the logistic model, 0 and 1 are asymptotes to the function and thus the
probabilities will never actually fall to exactly zero or rise to one, although
they may come infinitesimally close.
The logit model is not linear (and cannot be made linear by

b a transformation)
and thus is not estimable using OLS.
Instead, maximum likelihood is usually used to estimate the parameters of

the model.
241
Using a Logit to Test the Pecking Order Hypothesis
The theory
y of firm financingg suggests
gg
that corporations
p
should use the
cheapest methods of financing their activities first (i.e. the sources of funds
that require payment of the lowest rates of return to investors) and then only
switch to more expensive methods when the cheaper sources have been
exhausted.
This is known as the pecking order hypothesis.
Differences in the relative cost of the various sources of funds are argued to
arise largely from information asymmetries since the firm's senior managers
will know the true riskiness of the business,, whereas potential
p
outside
investors will not.
Hence, all else equal, firms will prefer internal finance and then, if further
(external) funding is necessary, the firm's riskiness will determine the type of
funding sought.
242
Data
Helwege and Liang (1996) examine the pecking order hypothesis in the
context off a set off US fi
firms that
h hhad
d bbeen newly
l li
listedd on the
h stockk market
k in
i
1983, with their additional funding decisions being tracked over the 1984 1992 period.
Such newly listed firms are argued to experience higher rates of growth, and
are more likely to require additional external funding than firms which have
been stock market listed for many years.
Theyy are also more likely
y to exhibit information asymmetries
y
due to their lack of
a track record.
The list of initial public offerings (IPOs) was obtained from the Securities
Data Corporation and the Securities and Exchange Commission with data
obtained from Compustat.
243
Aims of the Study and the Model
A core objective of the paper is to determine the factors that affect the
probability
b bili off raising
i i externall financing.
fi
i
As such, the dependent variable will be binary -- that is, a column of 1's (firm
raises funds externally) and 0's (firm does not raise any external funds).
Thus OLS would not be appropriate and hence a logit model is used.
The explanatory
Th
l
variables
i bl are a set that
h aims
i to capture the
h relative
l i degree
d
off
information asymmetry and degree of riskiness of the firm.
If the pecking order hypothesis is supported by the data, then firms should be
more likely to raise external funding the less internal cash they hold.
244
Variables used in the Model
The variable deficit measures (capital expenditures + acquisitions + dividends earnings).

i )
Positive deficit is a variable identical to deficit but with any negative deficits (i.e.
surpluses) set to zero
Surplus is equal to the negative of deficit for firms where deficit is negative
Positive deficit operating income is an interaction term where the two variables
are multiplied together to capture cases where firms have strong investment
opportunities but limited access to internal funds
Assets is used as a measure of firm size
Industry asset growth is the average rate of growth of assets in that firm's
industry over the 1983-1992 period
Firm's growth of sales is the growth rate of sales averaged over the previous 5
years
Previous financing is a dummy variable equal to one for firms that obtained
external financing in the previous year.
245
Results from Logit Estimation
Source: Helwege and Liang (1996)
246
Analysis of Results
The key variable, deficit has a parameter that is not statistically significant
and
d hence
h
the
h probability
b bili off obtaining
b i i externall fi
financing
i does
d
not depend
d
d on
the size of a firm's cash deficit.
Or an alternative explanation, as with a similar result in the context of a
standard regression model, is that the probability varies widely across firms
with the size of the cash deficit so that the standard errors are large relative to
the point estimate.
The parameter on the surplus variable has the correct negative sign,
indicating that the larger a firm's surplus, the less likely it is to seek external
financing, which provides some limited support for the pecking order
hypothesis.
Larger firms (with larger total assets) are more likely to use the capital
markets, as are firms that have already obtained external financing during the
previous year.
247
The Probit Model
Instead of using the cumulative logistic function to transform the model, the
cumulative normal distribution is sometimes used instead.
This gives rise to the probit model.
As for the logistic approach, this function provides a transformation to ensure

that the fitted probabilities will lie between zero and one.
248
Logit or Probit?
For the majority of the applications, the logit and probit models will give very
similar characterisations of the data because the densities are very similar.
similar
That is, the fitted regression plots will be virtually indistinguishable, and the
implied relationships between the explanatory variables and the probability
that yi =1 will also be very similar.
Both approaches are much preferred to the linear probability model. The only
instance where the models may give non-negligibility different results occurs
when the split of the yi between 0 and 1 is very unbalanced - for example,
when yi =1 occurs only 10% of the time.
Stock and Watson (2006) suggest that the logistic approach was traditionally
preferred since the function does not require the evaluation of an integral and
thus the model parameters could be estimated faster.
However, this argument is no longer relevant given the computational speeds
now achievable and the choice of one specification rather than the other is now
usually arbitrary.
249
Parameter Interpretation for Logit and Probit Models
Standard errors and t-ratios will automatically be calculated by the

econometric software package used,
used and hypothesis tests can be conducted in
the usual fashion.
However, interpretation of the coefficients needs slight care.
It is tempting, but incorrect, to state that a 1-unit increase in x2i, for example,
causes a 2 % increase in the probability that the outcome corresponding to yi
=1 will be realised.
This would have been the correct interpretation for the linear probability
model.
However, for logit or probit models, this interpretation would be incorrect
because the form of the function is Pi = 1 + 2 x2i + ui, for example, but rather
Pi = F(x2i) where F represents the (non-linear) logistic or cumulative normal
function.
250
Parameter Interpretation for Logit and Probit Models
To obtain the required relationship between changes in x2i and Pi, we would
need to differentiate F with respect to x2i and it turns out that this derivative is
2F(x2i) .
So in fact, a 1-unit increase in x2i will cause a 2F(x2i) increase in probability.
Usually, these impacts of incremental changes in an explanatory variable are
evaluated by setting each of them to their mean values.
These estimates are sometimes known as the marginal effects.
There is also another way of interpreting discrete choice models known as the
random utility model.
The idea is that we can view the value of y that is chosen by individual i
(either 0 or 1) as giving that person a particular level of utility, and the choice
that is made will obviously be the one that generates the highest level of
utility.
This interpretation is particularly useful in the situation where the person faces
a choice between more than 2 possibilities see a later slide.
251
Goodness of Fit for Probit and Logit Models
While it would be possible to calculate the values of the standard goodness of

fit measures such as RSS,
RSS R2 these cease to have any real meaning.
meaning
2
R , if calculated in the usual fashion, will be misleading because the fitted
values from the model can take on any value but the actual values will only be
either 0 and 1.
Thus if yi =1 and Pi = 0.8, the model has effectively made the correct
prediction, whereas R2 and will not give it full credit for this.
Two goodness of fit measures that are commonly reported for limited
dependent variable models are
The percentage of yi values correctly predicted
A measure known as pseudo-R2 (also known as McFadden's R2), defined
as one minus the ratio of the LLF for the logit or probit model to the LLF
for a model with only an intercept.
252
The End!
Good luck in the exams

Dont fail
Hope you enjoyed and learnt
Have a nice break
253
CASS BUSINESS SCHOOL
City University
Exercise 1: Simple Linear Regression and Hypothesis Testing
1. What are the five assumptions usually made about the unobservable disturbance
terms? Briefly explain the meaning of each. Why do we need to make these
assumptions?
2. Which of the following models can be estimated (following a suitable
rearrangement) using ordinary least squares?
(Hint: the models need to be linear in the parameters).
where x, y, z are variables and are parameters to be estimated.
1. y t xt u t
2. y t e xt e ut
3. y t xt u t
4. ln( y t ) ln( xt ) u t
5. y t xt z t u t
3. The capital asset pricing model (CAPM) model can be written as
E ( Ri ) R f [ E ( Rm ) R f ] .
using the standard notation.

The first step in using the CAPM is to estimate the stocks beta using the market
model. The market model can be written as:
Rit i i Rmt uit
(1)
Where Rit is the return for security i at time t, Rmt is the return on a proxy for the
market portfolio at time t, and ut is an iid random disturbance term.
The coefficient beta in this case is also the CAPM beta for security i.
Suppose that you had estimated equation (1) and found that the estimated value of
beta, was 1.147. The standard error associated with this coefficient SE ( ) is
estimated to be 0.0548.
A City Analyst has told you that this security closely follows the market, but that it
is no more risky, on average, than the market. This can be tested by the null
hypothesis that the value of beta is one. The model is estimated over 62 daily
observations. Test this hypothesis against a one-sided alternative that the security is
more risky than the market, at the 5% level. Write down the null and alternative
hypothesis. What do you conclude? Are the Analysts claims empirically verified?
4. The Analyst also tells you that shares in Chris Mining PLC have no systematic
risk, in other words the returns on its shares are completely un-related to
movements in the market. The value of beta and its standard error are calculated to
be 0.214 and 0.186 respectively. The model is estimated over 38 quarterly
observations. Write down the null and alternative hypothesis. Test this null
hypothesis against a two-sided alternative.
5. Form and interpret a 95% and a 99% confidence interval for beta using the
figures given in question 4.
CITY UNIVERSITY
Solutions to Exercise 1: Simple Linear Regression and Hypothesis
Testing
1. A list of the assumptions of the classical linear regression models disturbance
terms is given in the lecture notes handout.
We need to make the first four assumptions in order to prove that the ordinary least
squares estimators of and are best, that is to prove that they have minimum
variance among the class of linear unbiased estimators. The theorem that proves
that OLS estimators are BLUE (provided the assumptions are fulfilled) is known as
the Gauss-Markov theorem. If these assumptions are violated (which we will look
at in detail later in the course), then it may be that OLS estimators are no longer
unbiased or efficient. That is, they may be inaccurate or subject to fluctuations
between samples.
We needed to make the fifth assumption, that the disturbances are normally
distributed, in order to make statistical inferences about the population parameters
from the sample data, i.e. to test hypotheses about the coefficients. Making this
assumption (provided that the other assumptions also hold) implies that test
statistics will follow a t-distribution.
2. If the models are linear in the parameters, we can use OLS.
(1) Yes, can use OLS since the model is the usual linear model we have been
dealing with.
(2) Yes. The model can be linearised by taking logarithms of both sides and by
rearranging. Although this is a very specific case, it has sound theoretical
foundations (e.g. the Cobb-Douglas production function in economics), and it is the
case that many relationships can be approximately linearised by taking logs of the
variables. The effect of taking logs is to reduce the effect of extreme values on the
regression function, and it may be possible to turn multiplicative models into
additive ones which we can easily estimate.
(3) Yes and no! We can estimate this model using OLS, but we would not be able
to obtain the values of both and , but we would obtain the value of these two
coefficients multiplied together.
(4) Yes, we can use OLS, since this model is linear in the logarithms. For those
who have done some economics, models of this kind which are linear in the
logarithms have the interesting property that the coefficients ( and ) can be
interpreted as elasticities.
(5) Yes, in fact we can still use OLS since it is linear in the parameters. If we make
a substitution, say qt = xtzt, then we can run the regression:
yt = +qt + ut as usual.
So, in fact, we can estimate a fairly wide range of model types using these simple
tools.
3. The null hypothesis is that the true (but unknown) value of beta is equal to one,
against a one sided alternative that it is greater than one:
H0 : = 1
H1 : >1
* 1.147 1
test stat
2.682
0.0548
SE ( )
We want to compare this with a value from the t-table with T-2 degrees of freedom,
where T is the sample size and T-2 =60. We want a value with 5% all in one tail
since we are doing a 1-sided test. The critical t-value from the t-table is 1.671:
f(x)
5% rejection
region
+1.671
The value of the test statistic is in the rejection region and hence we can reject the
null hypothesis. We have statistically significant evidence that this security has a
beta greater than one, i.e. it is significantly more risky than the market as a whole.
4. We want to use a two-sided test to test the null hypothesis that shares in Chris
Mining are completely un-related to movements in the market as a whole. In other
words, the value of beta in the regression model would be zero so that whatever
happens to the value of the market proxy, Chris Mining would be completely
unaffected by it.
The null and alternative hypotheses are therefore:
H0 : = 0
H1 : 0
The test statistic has the same format as before, and is given by:
4
test stat
* 0.214 0
1.150
0.186
SE ( )
We want to find a value from the t-tables for a variable with 38-2=36 degrees of
freedom, and we want to look up the value that puts 2.5% of the distribution in each
tail since we are doing a two-sided test and we want to have a 5% size of test over
all. The critical t-value is 2.03:
-2.03
+2.03
Since the test statistic is not within the rejection region, we do not reject the null
hypothesis. We therefore conclude that we have no statistically significant evidence
that Chris Mining has any systematic risk. In other words, we have no evidence that
changes in the companys value are driven by movements in the market.
5. A confidence interval for beta is given by the formula:
( SE ( ) t crit , SE ( ) t crit )
Confidence intervals are almost invariably 2-sided, unless we are told otherwise
(which we are not here), so we want to look up the values which put 2.5% in the
upper tail and 0.5% in the upper tail for the 95% and 99% confidence intervals
respectively.
The 0.5% critical values given as follows for a t-distribution with T-2=38-2=36:
degrees of freedom.
-2.72
+2.72
The confidence interval in each case is thus given by

(0.2140.186*2.03) for a 95% confidence interval, which solves to (-0.164,0.592)
and
(0.2140.186*2.72) for a 99% confidence interval, which solves to (-0.292,0.720)
There are a couple of points worth noting.
First, an intuitive interpretation of an X% confidence interval is that we are X%
sure that the true value of the population parameter lies within the interval. So we
are 95% sure that the true value of beta lies within the interval (-0.164,0.592) and
we are 99% sure that the true population value of beta lies within (-0.292,0.720).
Thus in order to be more sure that we have the true vale of beta contained in the
interval, i.e. as we move from 95% to 99% confidence, the interval must become
wider.
The second point to note is that we can test an infinite number of hypotheses about
beta once we have formed the interval. For example, we would not reject the null
hypothesis contained in the last question (i.e. that beta = 0), since that value of beta
lies within the 95% and 99% confidence intervals. Would we reject or not reject a
null hypothesis that the true value of beta was 0.6? At the 5% level, we should have
enough evidence against the null hypothesis to reject it, since 0.6 is not contained
within the 95% confidence interval. But at the 1% level, we would no longer have
sufficient evidence to reject the null hypothesis, since 0.6 is now contained within
the interval. Therefore we should always if possible conduct some sort of
sensitivity analysis to see if our conclusions are altered by (sensible) changes in the
level of significance used.
City University
Exercise 2: Multiple Linear Regression F-tests and Goodness of Fit
Statistics
1. By using examples from the relevant statistical tables, explain the relationship
between the t and the F-distributions.
For questions 2-7 assume that our econometric model is of the form
yt = 1 + 2x2t + 3x3t + 4x4t + 5x5t + ut
(1)
2. Do we test hypotheses about the actual values of the coefficients (i.e. ) or their
estimated values (i.e. ) and why?
3. Which of the following hypotheses about the coefficients can be tested using a ttest? Which of them can be tested using an F-test? In each case, state the number of
restrictions.
i)
H0 : 2 = 2
ii)
H0 : 2 + 3 = 1
iii)
H0 : 2 + 3 = 1 and 4 = 1
iv)
H0 : 2=0 and 3 =0 and 4 = 0 and 5 = 0
v)
H0 : 2 3 = 1
4. Which of the above null hypotheses constitutes the regression F-statistic? Why
are we always interested in this null hypothesis whatever the regression relationship
under study? What exactly would constitute the alternative hypothesis in this case?
5. Which would we expect to be bigger - the unrestricted residual sum of squares or
the restricted residual sum of squares and why?
6. You decide to investigate the relationship given in the null hypothesis of
question 3 part (iii). What would constitute the restricted regression? The
regressions are carried out on a sample of 96 quarterly observations, and the
residual sum of squares for the restricted and unrestricted regressions are 102.87
and 91.41 respectively. Perform the test. What is your conclusion?
7. You estimate a regression of the form given by the equation below in order to
evaluate the effect of various firm-specific factors on the firms return series. You
run a cross-sectional regression with 200 firms
ri = 1 + 2Si + 3MBi + 4PEi + 5BETAi + ui
where ri is the percentage annual return for the stock
Si is the size of firm i measured in terms of sales revenue
MBi is the market to book ratio of the firm
PEi is the price-earnings ratio of the firm
BETAi is the stocks CAPM beta coefficient
You obtain the following results (with standard errors in parentheses):
7
ri = 0.080 + 0.801Si + 0.321MBi + 0.164PEi - 0.084BETAi

(0.064) (0.147) (0.136)
(0.420)
(0.120)
Calculate the t-ratios. What do you conclude about the effect of each variable on
the returns of a security? On the basis of your results what variables would you
consider deleting from the regression? If the stocks beta increased from 1 to 1.2,
what would be the effect on the stocks return? Is the sign on beta what you would
have expected?
8.Why is it desirable to remove insignificant variables from a regression?
City University
Solutions to Exercise 2: Multiple Linear Regression F-tests and
Goodness of Fit Statistics
1. It can be proved that a t-distribution is just a special case of the more general Fdistribution. The square of a t-distribution with T-k degrees of freedom will be
identical to an F-distribution with (1,T-k) degrees of freedom. But remember that if
we use a 5% size of test, we will look up a 5% value for the F-distribution because
the test is 2-sided even though we only look in one tail of the distribution. We look
up a 2.5% value for the t-distribution since the test is 2-tailed.
Examples at the 5% level from tables
T-k
20
40
60
120
F critical value
4.35
4.08
4.00
3.92
t critical value
2.09
2.02
2.00
1.98
2. We test hypotheses about the actual coefficients, not the estimated values. We
want to make inferences about the likely values of the population parameters (i.e. to
test hypotheses about them). We do not need to test hypotheses about the estimated
values since we know exactly what our estimates are because we calculated them!
3.i)
H0 : 2 = 2
We could use an F- or a t- test for this one since it is a single hypothesis involving
only one coefficient. We would probably in practice use a t-test since it is
computationally simpler and we only have to estimate one regression. There is one
restriction.
ii)
H0 : 2 + 3 = 1
Since this involves more than one coefficient, we should use an F-test. There is one
restriction.
iii)
H0 : 2 + 3 = 1 and 4 = 1
Since we are testing more than one hypothesis simultaneously, we would use an Ftest. There are 2 restrictions.
iv)
H0 : 2 =0 and 3 = 0 and 4 = 0 and 5 = 0
As for iii), we are testing multiple hypotheses so we cannot use a t-test. We have 4
restrictions.
v)
H0 : 2 3 = 1
Although there is only one restriction, it is a multiplicative restriction. We therefore
cannot use a t-test or an F-test to test it. In fact we cannot test it at all using the
methodology that we have learned.
4. THE regression F-statistic would be given by the test statistic associated with
hypothesis iv) above. We are always interested in testing this hypothesis since it
tests whether all of the coefficients in the regression (except the constant) are
jointly insignificant. If they are then we have a completely useless regression,

where none of the variables that we have said influence y actually do. So we would
need to go back to the drawing board and start again with new data or a new model!
The alternative hypothesis is:
H1 : 2 0 or 3 0 or 4 0 or 5 0
Note the form of the alternative hypothesis: or indicates that only one of the
components of the null hypothesis would have to be rejected for us to reject the null
hypothesis as a whole.
5. The restricted residual sum of squares will always be at least as big as the
unrestricted residual sum of squares i.e.
RRSS URSS
To see this, think about what we were doing when we determined what the
regression parameters should be: we chose the values that minimised the residual
sum of squares. We said that OLS would provide the best parameter values given
the actual sample data. Now when we impose some restrictions on the model, so
that they cannot all be freely determined, then the model should not fit as well as it
did before, and hence the residual sum of squares must be higher once we have
imposed the restrictions, otherwise the parameter values that OLS chose originally
without the restrictions could not be the best.
In the extreme case (very unlikely in practice), the two sets of residual sum of
squares could be identical if the restrictions were already present in the data, so that
imposing them on the model would yield no penalty in terms of loss of fit.
6. The null hypothesis is: H0 : 2 + 3 = 1 and 4 = 1
The first step is to impose this on the regression model:
yt = 1 + 2x2t + 3x3t + 4x4t + 5x5t + ut subject to 2 + 3 = 1 and 4 = 1.
We can rewrite the first part of the restriction as 3 = 1 - 2
Then rewrite the regression with the restriction imposed
yt = 1 + 2x2t + (1-2)x3t + x4t + 5x5t + ut
which can be re-written
yt = 1 + 2x2t + x3t - 2x3t + x4t + 5x5t + ut
and rearranging
(yt - x3t - x4t ) = 1 + 2x2t - 2x3t + 5x5t + ut
(yt - x3t - x4t) = 0 + 2(x2t -x3t) + 5x5t + ut
Now create two new variables, call them P and Q:
pt = (yt - x3t - x4t)
qt = (x2t -x3t)
We can then run the linear regression:
pt = 0 + 2qt+ 5x5t + ut
which constitutes the restricted regression model.
The test statistic is calculated as ((RRSS-URSS)/URSS)*(T-k)/m
In this case, m=2, T=96, k=5 so the test statistic = 5.704. Compare this to an Fdistribution with (2,91) degrees of freedom which is approximately 3.10. Hence we
10
reject the null hypothesis that the restrictions are valid. We cannot impose these
restrictions on the data without a substantial increase in the residual sum of squares.
7.
ri = 0.080 + 0.801Si + 0.321MBi + 0.164PEi - 0.084BETAi

(0.064) (0.147) (0.136)
(0.420)
(0.120)
1.25
5.45
2.36
0.390
- 0.700
The t-ratios are given in the final row above, and are in italics. They are calculated
by dividing the coefficient estimate by its standard error.
The relevant value from the t-tables is for a 2-sided test with 5% rejection overall.
T-k = 195; tcrit = 1.97. The null hypothesis is rejected at the 5% level if the absolute
value of the test statistic is greater than the critical value.
We would conclude based on this evidence that only firm size and market to book
value have a significant effect on stock returns.
If a stocks beta increases from 1 to 1.2, then we would expect the return on the
stock to FALL by (1.2-1)*0.084 = 0.0168 = 1.68%
This is not the sign we would have expected on beta, beta would be expected to be
positively related to return, since investors would require higher returns as
compensation for bearing higher market risk.
We would thus consider deleting the price/earnings and beta variables from the
regression since these are not significant in the regression - i.e. they are not helping
much to explain variations in y. We would not delete the constant term from the
regression even though it is insignificant since there are good statistical reasons for
its inclusion.
8. We need the concept of a parsimonious model - one that describes the most
important features of the data but using as few parameters as possible. We do want
to form a model that fits the data as well as possible, but in most financial series,
there is a substantial amount of noise. This can be interpreted as a random event
that is unlikely to be repeated in any forecastable way. We want to fit a model to
the data that will be able to generalise. In other words, we want a model that fits
to features of the data that will be replicated in future; we do not want to fit to
sample-specific noise.
This is why we need the concept of parsimony - fitting the smallest possible
mode to the data. Otherwise we may get a great fit to the data in sample, but any
use of the model for forecasts could yield terrible results.
Another important point is that the larger the number of estimated parameters (i.e.
the more variables we have), then the smaller will be the number of degrees of
freedom, and this will imply that standard errors will be larger than they would
otherwise have been. This could lead to a loss of power in hypothesis tests, and
variables which would have been significant are now insignificant.
11
City University
Exercise 3: Goodness of Fit Statistics and the Assumptions of the
CLRM
1. A researcher estimates the following econometric models including a lagged
dependent variable
yt 1 2 x2t 3 x3t 4 yt 1 ut
yt 1 2 x2t 3 x3t 4 yt 1 vt
where ut and vt are iid disturbances.
Will these models have the same value of
(i) the residual sum of squares (RSS)
(ii) R2
(iii)Adjusted R2 ?
2. A researcher estimates the following two econometric models
yt 1 2 x2t 3 x3t ut
(1)
yt 1 2 x2t 3 x3t 4 x4t vt

where ut and vt are iid disturbances and x3t is an irrelevant variable which does not
enter into the data generating process for yt.
Will the value of (i) R2 (ii)Adjusted R2 be higher for the second model?
3. Do we make assumptions about the unobservable error terms (ut) or about their
sample counterparts, the estimated residuals ( ut )?
4. What pattern(s) would we like to see in a residual plot?
12
City University
Solutions to Exercise 3: Goodness of Fit Statistics and the
Assumptions of the CLRM
1.
yt 1 2 x2t 3 x3t 4 yt 1 ut
yt 1 2 x2t 3 x3t 4 yt 1 vt
Note that we have not changed anything substantial between these models in the
sense that the second model is just a re-parameterisation (rearrangement) of the
first, where we have subtracted yt-1 from both sides of the equation.
(i) Remember that the residual sum of squares is the sum of each of the squared
residuals. So lets consider what the residuals will be in each case. For the first
model in the level of y
ut yt y t yt 1 2 x2t 3 x3t 4 yt 1
Now for the second model, the dependent variable is now the change in y:
ut yt y t yt 1 2 x2t 3 x3t 4 yt 1
where y is the fitted value in each case (note that we do not need at this stage to
assume they are the same).
Rearranging this second model would give
ut yt yt 1 1 2 x2t 3 x3t 4 yt 1
yt 1 2 x2t 3 x3t (4 1) yt 1
If we compare this formulation with the one we calculated for the first model, we
can see that the residuals are exactly the same for the two models, with 4 4 1 .
Hence if the residuals are the same, the residual sum of squares must also be the
same. In fact the two models are really identical, since one is just a rearrangement
of the other.
(ii) Now considering R2, if you recall how we calculate it:

RSS
R2 1
for the first model and
( yi y ) 2
RSS
in the second case. Therefore since the total sum of squares
(yi y ) 2
(the denominator) has changed, then the value of R2 must have also changed as a
consequence of changing the dependent variable.
R2 1
(iii) By the same logic, since the value of the adjusted R2 is just an algebraic
modification of R2 itself, the value of the adjusted R2 must also change.
13
2. A researcher estimates the following two econometric models

(1)
yt 1 2 x2t 3 x3t ut
yt 1 2 x2t 3 x3t 4 x4t vt
(2)
2
(i) The value of R will almost always be higher for the second model since it has
another variable added to the regression. The value of R2 would only be identical
for the two models in the very, very unlikely event that the estimated coefficient on
the X3t variable was exactly zero. Otherwise, the R2 must be higher for the second
model than the first.
(ii) The value of the adjusted R2 could fall as we add another variable. The reason
for this is that the adjusted version of R2 has a correction for the loss of degrees of
freedom associated with adding another regressor into a regression. This implies a
penalty term, so that the value of the adjusted R2 will only rise if the increase in this
penalty is more than outweighed by the rise in the value of R2.
3. In the same way as we make assumptions about the true value of beta and not the
estimated values, we make assumptions about the true unobservable disturbance
terms rather than their estimated counterparts, the residuals.
We know the exact value of the residuals, since they are defined by u t y t y t . So
we do not need to make any assumptions about the residuals since we already know
their value. We make assumptions about the unobservable error terms since it is
always the true value of the population disturbances that we are really interested in,
although we never actually know what these are.
4. We would like to see no pattern in the residual plot! If there is a pattern in the
residual plot, this is an indication that there is still some action or variability left
in yt that has not been explained by our model. This indicates that potentially it may
be possible to form a better model, perhaps using additional or completely different
explanatory variables, or by using lags of either the dependent or of one or more of
the explanatory variables. Recall that the two plots I showed in the lectures, where
the residuals followed a cyclical pattern, and when they followed an alternating
pattern are used as indications that the residuals are positively and negatively
autocorrelated respectively.
Another problem if there is a pattern in the residuals is that, if it does indicate the
presence of autocorrelation, then this may suggest that our standard estimates for
the coefficients could be wrong and hence any inferences we make about the
coefficients could be misleading.
14
City University
Exercise 4: Assumptions of the CLRM 2
1. A researcher estimates the following model for stock market returns, but thinks
that there may be a problem with it. By calculating the t-ratios, and considering
their significance or otherwise, suggest what the problem might be.
0.638 + 0.402 x2t - 0.891 x3t
R 2 0.96, R 2 0.89
(0.436) (0.291)
(0.763)
How might you go about solving the perceived problem?
y t =
2.
(i) State in algebraic notation and explain the assumption about the CLRMs
error terms that is referred to by the term homoscedasticity.
(ii) What would the consequence be for a regression model if the errors are
not homoscedastic?
(iii) How might you proceed if you found that (ii) was actually the case?
3.
(i) What do you understand by the term autocorrelation?

(ii) An econometrician suspects that the residuals of her model might be
autocorrelated. Explain the steps involved in testing this theory using the Durbin
Watson test.
(iii) The econometrician follows your guidance (!!!) in part (ii) and
calculates a value for the Durbin Watson statistic of 0.95. The regression has 60
quarterly observations and three explanatory variables (plus a constant term).
Perform the test. What is your conclusion?
(iv) In order to allow for autocorrelation, the econometrician decides to use
a model in first differences with a constant:
yt 1 2 x2t 3x3t 4 x4t ut
By attempting to calculate the long run solution to this model, what might be a
problem with estimating models entirely in first differences?
(v) The econometrician finally settles on a model with both first differences
and lagged levels terms of the variables:
yt 1 2 x2t 3x3t 4 x4t 5 x2t 1
6 x3t 1 6 x4t 1 vt
Can the Durbin Watson test still validly be used in this case?
4. Calculate the long run static equilibrium solution to the following dynamic
econometric model:
yt 1 2 x2t 3x3t 4 yt 1 5 x2t 1
6 x3t 1 7 x3t 4 ut
15
City University
Solutions to Exercise 4: Assumptions of the CLRM 2
1. The t-ratios for the coefficients in this model are given in the third row below
after the standard errors. They are calculated by dividing the individual coefficients
by their standard errors.
0.638 + 0.402 x2t - 0.891 x3t
R 2 0.96, R 2 0.89
(0.436) (0.291)
(0.763)
t-ratio 1.46
1.38
1.17
The problem appears to be that the regression parameters are all individually
insignificant (i.e. not significantly different from zero), although the value of R2
and its adjusted version are both very high, so that the regression taken as a whole
seems to indicate a good fit. This looks like a classic example of what we term near
multicollinearity. This is where the individual regressors are very closely related, so
that it becomes difficult to disentangle the effect of each individual variable upon
the dependent variable.
y t =
The solution to near multicollinearity that is usually suggested is that since the
problem is really one of insufficient information in the sample to determine each of
the coefficients, then one should go out and get more data. In other words, we
should switch to a higher frequency of data for analysis (e.g. weekly instead of
monthly, monthly instead of quarterly etc.). An alternative is also to get more data
by using a longer sample period (i.e. one going further back in time), or to combine
the two independent variables in a ratio (e.g. x2t / x3t ).
Other, more ad hoc methods for dealing with the possible existence of near
multicollinearity were discussed in the lectures:
- Ignore it: if the model is otherwise adequate, i.e. statistically and in terms of
each coefficient being of a plausible magnitude and having an appropriate sign.
Sometimes, the existence of multicollinearity does not reduce the t-ratios on
variables that would have been significant without the multicollinearity
sufficiently to make them insignificant. It is worth stating that the presence of
near multicollinearity does not affect the BLUE properties of the OLS estimator
i.e. it will still be consistent, unbiased and efficient since the presence of near
multicollinearity does not violate any of the CLRM assumptions 1-4. However,
in the presence of near multicollinearity, it will be hard to obtain small standard
errors. This will not matter if the aim of the model-building exercise is to
produce forecasts from the estimated model, since the forecasts will be
unaffected by the presence of near multicollinearity so long as this relationship
between the explanatory variables continues to hold over the forecasted sample.
- Drop one of the collinear variables - so that the problem disappears. However,
this may be unacceptable to the researcher if there were strong a priori
theoretical reasons for including both variables in the model. Also, if the
16
removed variable was relevant in the data generating process for y, an omitted
variable bias would result.
Transform the highly correlated variables into a ratio and include only the ratio
and not the individual variables in the regression. Again, this may be
unacceptable if financial theory suggests that changes in the dependent variable
should occur following changes in the individual explanatory variables, and not
a ratio of them.
2.
(i) The assumption of homoscedasticity is that the variance of the errors is
constant and finite over time. Technically, we write Var (u t ) u2 .
(ii) The coefficient estimates would still be the correct ones (assuming
that the other assumptions for OLS optimality are not violated), but the problem
would be that the standard errors could be wrong. Hence if we were trying to test
hypotheses about the true parameter values, we could end up drawing the wrong
conclusions. In fact, for all of the variables except the constant, the standard errors
would typically be too small, so that we would end up rejecting the null hypothesis
too many times.
(iii) There are a number of ways to proceed in practice, including
- Using heteroscedasticity robust standard errors which correct for the problem by
enlarging the standard errors relative to what they would have been for the situation
where the error variance is positively related to one of the explanatory variables.
- Transforming the data into logs, which has the effect of reducing the effect of
large errors relative to small ones.
3.
(i) This is where there is a relationship between the ith and jth residuals.
Recall that one of the assumptions of the CLRM was that such a relationship did
not exist. We want our residuals to be random, and if there is evidence of
autocorrelation in the residuals, then it implies that we could predict the sign of the
next residual and get the right answer more than half the time on average!
(ii) The Durbin Watson test is a test for first order autocorrelation. The test
is calculated as follows. You would run whatever regression you were interested in,
and obtain the residuals. Then calculate the statistic
ut ut 1
T
DW t 2
ut
t 2
You would then need to look up the two critical values from the Durbin Watson
tables, and these would depend on how many variables and how many observations
and how many regressors (excluding the constant this time) you had in the model.
The rejection / non-rejection rule would be given by selecting the appropriate
region from the following diagram:
17
(iii) We have 60 observations, and the number of regressors excluding the

constant term is 3. The appropriate lower and upper limits are 1.48 and 1.69
respectively, so the Durbin Watson is lower than the lower limit. It is thus clear that
we reject the null hypothesis of no autocorrelation. So it looks like the residuals are
positively autocorrelated.
(iv) yt 1 2 x2t 3x3t 4 x4t ut
The problem with a model entirely in first differences, is that once we calculate the
long run solution, all the first difference terms drop out (as in the long run we
assume that the values of all variables have converged on their own long run values
so that yt = yt-1 etc.) Thus when we try to calculate the long run solution to this
model, we cannot do it because there isnt a long run solution to this model!
(v) yt 1 2 x2t 3x3t 4 x4t 5 x2t 1 6 x3t 1 7 x4t 1 vt
The answer is yes, there is no reason why we cannot use Durbin Watson in this
case. You may have said no here because there are lagged values of the regressor
(X) variables in the regression. In fact this would be wrong since there are no lags
of the DEPENDENT (y) variable and hence DW can still be used.
4. yt 1 2 x2t 3x3t 4 yt 1 5 x2t 1 6 x3t 1 7 x3t 4 ut
The major steps involved in calculating the long run solution are to
- set the disturbance term equal to its expected value of zero
- drop the time subscripts
- remove all difference terms altogether since these will all be zero by the definition
of the long run in this context.
Following these steps, we obtain
0 1 4 y 5 x 2 6 x3 7 x3
We now want to rearrange this to have all the terms in x2 together and so that y is
the subject of the formula:
4 y 1 5 x 2 6 x3 7 x3
4 y 1 5 x 2 ( 6 7 ) x3
( 4 )
y 1 5 x2 6
x3
4 4
4
The last equation above is the long run solution.
18
City University
Exercise 5: Assumptions of the CLRM and Structural Stability
1. What might we use Ramseys RESET test for? What could we do if we find that
we have failed the RESET test?
2.
(i) Why do we need to assume that the disturbances of a regression model
are normally distributed?
(ii) In a practical econometric modelling situation, how might we get around
the problem of residuals that are not normal?
3. A researcher is attempting to form an econometric model to explain daily
movements of stock returns. A colleague suggests that she might want to see
whether her data are influenced by daily seasonality.
(i) How might she go about doing this?
(ii) The researcher estimates a model with the dependent variable as the daily
returns on a given share traded on the London stock exchange, and various
macroeconomic variables and accounting rations as independent variables. She
attempts to estimate this model, together with five daily dummy variables (one for
each day of the week), and a constant term, using EViews. EViews then tells her
that it cannot estimate the parameters of the model. Explain what has probably
happened, and how she can fix it.
(iii) The final model for asset returns, rt is as follows (with standard errors in
parentheses):
rt = 0.0034 - 0.0183 D1t + 0.01554 D2t -0.0007 D3t - 0.0272 D4t+ other variables
(0.0146) (0.0068)
(0.0231)
(0.0179)
(0.0193)
The model is estimated using 500 observations. Is there significant evidence of any
day of the week effects? Assume that there are 3 other variables.
(Continued)
19
4.(a) What do you understand by the term parameter structural stability?

(b) A financial econometrician thinks that the stock market crash of October 1987
fundamentally changed the risk-return relationship given by the CAPM-type
equation. He decides to test this hypothesis using a Chow test. The model is
estimated using monthly data from January 1980 - December 1995, and then two
separate regressions are run for the sub-periods corresponding to data before and
after the crash. The model is
rt = + Rmt + ut
so that the return on security i at time t is regressed upon the return on the market at
time t. The results for the 3 models estimated for shares in British Telecom are as
follows
1981M1-1995M12
RSS=0.189 T=180
rt = 0.0215 + 1.491 Rmt
1981M1-1987M10
RSS=0.079 T=82
rt = 0.0163 + 1.308 Rmt
1987M11-1995M12
RSS=0.082 T=98
rt = 0.0360 + 1.613 Rmt
(i) What are the null and alternative hypotheses that are being tested here, in terms
of and ?
(ii) Perform the test. What is your conclusion?
5. Another way to test whether the regression parameters are structurally stable is to
perform a predictive failure test.
For the same model as above, and given the following results, do a forward and
backward predictive failure test.
1981M1-1995M12
RSS=0.189 T=180
rt = 0.0215 + 1.491 Rmt
1981M1-1994M12
RSS=0.148 T=168
rt = 0.0212 + 1.478 Rmt
1982M1-1995M12
RSS=0.182 T=168
rt = 0.0217 + 1.523 Rmt
What are your conclusions?
20
City University
Solutions to Exercise 5: Assumptions of the CLRM and Structural
Stability
1. Ramseys RESET test is a test of whether the functional form of the regression
is appropriate. In other words, we test whether the relationship between the
dependent variable and the independent variables really should be linear or whether
a non-linear form would be more appropriate. The test works by adding powers of
the fitted values from the regression into a second regression. If the appropriate
model was a linear one, then the powers of the fitted values would not be
significant in this second regression.
If we fail Ramseys RESET test, then the easiest solution is probably to transform
all of the variables into logarithms. This has the effect of turning a multiplicative
model into an additive one.
If this still fails, then we really have to admit that the relationship between the
dependent variable and the independent variables was probably not linear after all
so that we have to either estimate a non-linear model for the data (which is beyond
the scope of this course) or we have to go back to the drawing board and run a
different regression containing different variables.
2.
(i) It is important to note that we did not need to assume normality in order
to derive the sample estimates of and or in calculating their standard errors. We
needed the normality assumption at the later stage when we come to test hypotheses
about the regression coefficients, either singly or jointly, so that the test statistics
we calculate would indeed have the distribution (t or F) that we said they would.
(ii) One solution would be to use a technique for estimation and inference
which did not require normality. But these techniques are often highly complex and
also their properties are not so well understood, so we do not know with such
certainty how well the methods will perform in different circumstances. Nonnormality is only really a problem when the sample size is small.
One pragmatic response to failing the normality test is to plot the estimated
residuals of the model, and look for one or more very extreme outliers. These
would be residuals that are much bigger (either very big and positive, or very big
and negative) than the rest. It is, fortunately for us, often the case that one or two
very extreme outliers will cause a violation of the normality assumption. The
reason that one or two extreme outliers can cause a violation of the normality
assumption is that they would lead the skewness and/or kurtosis of the residuals to
be very large.
Once we spot a few extreme residuals, we should look at the dates when these
outliers occurred. If we have a good theoretical reason for doing so, we can add in
separate dummy variables for big outliers caused by, for example, wars, changes of
government, stock market crashes, changes in market microstructure (e.g. the big
21
bang in 1986). The effect of the dummy variable is exactly the same as if we had
removed the observation from the sample altogether and estimated the regression
on the remainder. If we only remove observations in this way, then we make sure
that we do not lose any useful pieces of information represented by sample points.
3.
(i) The researcher could construct four dummy variables, which take the value 1 for
the day of the week they correspond to, and zero elsewhere. For example, if the
sample starts on a Tuesday, and D1t, D2t, D3t, and D4t represent the dummies for
Monday - Thursday respectively, then the additional variables to be put into the
regression would be
Day
Value of
D2t
D3t
D4t
D1t
Tuesday
0
1
0
0
Wednesday 0
0
1
0
Thursday
0
0
0
1
Friday
0
0
0
0
Monday
1
0
0
0
Tuesday
0
1
0
0
Wednesday 0
0
1
0
Thursday
0
0
0
1
(ii) The problem is probably one of perfect multicollinearity between the five daily
dummy variables and the constant term. The reason is that when we add the five
dummy variables together, they will sum to one in every time period. Thus the sum
is exactly the same as the constant term in the regression. Thus there is a perfect
positive correlation between the dummy variables and the constant term, which in
not allowed!
Technically, the problem would be that (XX) will not be of full rank and hence its
inverse will not exist. Hence we cannot calculate any of the parameter estimates
which are computed using (XX)-1.
Fortunately, the problem is easy to fix. What we would do it to just have four
dummy variables and a constant, or all five of the dummy variables, but no
constant, in the regression. Thus the multicollinearity problem would never arise.
The convention is to use Monday - Friday dummy variables and to leave out the
constant, or to use a constant plus the first four dummy variables, although as far as
I am aware there is no theoretical reason for doing this.
(iii) The thing to do to test whether there are significant day of the week effects
is of course to calculate the t-ratios and to therefore see if the coefficients are
significantly different from zero. The t-ratios are given in the third line under the
standard errors. The coefficients that are significant at the 5% level are indicated by
an asterisk (*):
rt = 0.0034 - 0.0183D1t + 0.01554D2t - 0.0007D3t - 0.0272D4t + other variables
(0.0146) (0.0068)
(0.0231)
(0.0179)
(0.0193)
0.233 -2.691*
0.673
-0.0391
-1.409
So there is evidence that there is a significant Monday effect. Since the sign on
the Monday dummy variable is negative, then it implies that everything else being
22
equal, returns will be significantly lower on Mondays relative to other days of the
week. There is no statistically significant evidence of any other seasonalities,
however, according to these results.
4.(a) Parameter structural stability refers to whether the coefficient estimates for a
regression equation are stable over time. If the regression is not structurally stable,
it implies that the coefficient estimates would be different for some sub-samples of
the data compared to others. This is clearly not what we want to find since when we
estimate a regression, we are implicitly assuming that the regression parameters are
constant over the entire sample period under consideration.
(b)
1981M1-1995M12
RSS=0.189 T=180
rt = 0.0215 + 1.491 Rmt
1981M1-1987M10
RSS=0.079 T=82
rt = 0.0163 + 1.308 Rmt
1987M11-1995M12
RSS=0.082 T=98
rt = 0.0360 + 1.613 Rmt
(i) If we define the coefficient estimates for the first and second halves of the
sample as 1 and 1, and 2 and 2 respectively, then the null and alternative
hypotheses are
H0 : 1 = 2 and 1 = 2
and
H1 : 1 2 or 1 2
(ii) The test statistic is calculated as

Test stat. =
RSS ( RSS1 RSS 2 ) (T 2k ) 0.189 (0.079 0.082) 180 4
*
*
15.304
0.079 0.082
2
RSS1 RSS 2
k
This follows an F distribution with (k,T-2k) degrees of freedom. F(2,176) = 3.05 at
the 5% level. Clearly we reject the null hypothesis that the coefficients are equal in
the two sub-periods.
5.The data we have is
1981M1-1995M12
RSS=0.189 T=180
rt = 0.0215 + 1.491 Rmt
1981M1-1994M12
RSS=0.148 T=168
rt = 0.0212 + 1.478 Rmt
1982M1-1995M12
RSS=0.182 T=168
rt = 0.0217 + 1.523 Rmt
First, the forward predictive failure test - i.e. we are trying to see if the model for
1981M1-1994M12 can predict 1995M1-1995M12.
RSS RSS1 T1 k 0.189 0.148 168 2
*
*
3.832
RSS1
T2
0.148
12
Where T1 is the number of observations in the first period (i.e. the period that we
actually estimate the model over), and T2 is the number of observations we are
trying to predict. The test statistic follows an F-distribution with (T2, T1-k)
23
degrees of freedom. F(12, 166) = 1.81 at the 5% level. So we reject the null
hypothesis that the model can predict the observations for 1995. We would
conclude that our model is no use for predicting this period, and from a practical
point of view, we would have to consider whether this failure is a result of a-typical
behaviour of the series out-of-sample (i.e. during 1995), or whether it results from a
genuine deficiency in the model.
The backward predictive failure test is a little more difficult to understand, although
no more difficult to implement. The test statistic is given by
RSS RSS1 T1 k 0.189 0.182 168 2
*
*
0.532
RSS1
T2
0.182
12
Now we need to be a little careful in our interpretation of what exactly are the
first and second sample periods. It would be possible to define T1 as always
being the first sample period. But I think it easier to say that T1 is always the sample
over which we estimate the model (even though it now comes after the hold-outsample). Thus T2 is still the sample that we are trying to predict, even though it
comes first. You can use either notation, but you need to be clear and consistent. If
you wanted to choose the other way to the one I suggest, then you would need to
change the subscript 1 everywhere in the formula above so that it was 2, and change
every 2 so that it was a 1.
Either way, we conclude that there is little evidence against the null hypothesis.
Thus our model is able to adequately back-cast the first 12 observations of the
sample.
24
City University
Exercise 6: Testing for Unit Roots
1. Why is it in general important to test for non-stationarity in time series data
before attempting to build an empirical model?
2. A researcher wants to test the order of integration of some time series data. He
decides to use the DF test. He estimates a regression of the form
yt = + yt-1 + ut
and obtains the estimate =-0.02 with standard error = 0.31.
(i) What are the null and alternative hypotheses for this test?
(ii) Given the data, and a critical value of -2.88, perform the test.
(iii) What do we conclude from this test what should be the next step?
(iv) Why can we not compare the estimated test statistic with the corresponding
critical value from a t-distribution, even though the test statistic takes the form of
the usual t-ratio?
3. Using the same regression as above, but on a different set of data, the researcher
now obtains the estimate =-0.52 with standard error = 0.16.
(i) Perform the test.
(ii) What do we conclude, and what should be the next step?
(iii) Another researcher suggests that there may be a problem with this
methodology since it assumes that the disturbances (ut) are white noise. Suggest a
possible source of difficulty and how we might in practice get around it.
25
City University
Solutions to Exercise 6: Testing for Unit Roots
1. Non-stationarity can be an important determinant of the properties of a series.
Also, if two series are non-stationary, we may experience the problem of spurious
regression. This occurs when we regress one non-stationary variable on a
completely unrelated non-stationary variable, but yield a reasonably high value of
R2, apparently indicating that the model fits well.
Most importantly therefore, we are not able to perform any hypothesis tests in
models which inappropriately use non-stationary data since the test statistics will no
longer follow the distributions which we assumed they would (e.g. a t or F), so any
inferences we make are likely to be invalid.
2. (i)The null hypothesis is of a unit root against a one sided stationary alternative,
i.e. we have
H0 : yt I(1)
H1 : yt I(0)
which is also equivalent to
H0 : = 0
H1 : < 0
(ii) The test statistic is given by / SE ( ) which equals -0.02 / 0.31 = -0.06
Since this is not more negative than the appropriate critical value, we do not reject
the null hypothesis.
(iii) We therefore conclude that there is at least one unit root in the series (there
could be 1, 2, 3 or more). What we would do now is to regress 2yt on yt-1 and test
if there is a further unit root. The null and alternative hypotheses would now be
H0 : yt I(1) i.e. yt I(2)
H1 : yt I(0) i.e. yt I(1)
If we rejected the null hypothesis, we would therefore conclude that the first
differences are stationary, and hence the original series was I(1). If we did not reject
at this stage, we would conclude that yt must be at least I(2), and we would have to
test again until we rejected.
(iv) We cannot compare the test statistic with that from a t-distribution since we
have non-stationarity under the null hypothesis and hence the test statistic will no
longer follow a t-distribution.
3. Using the same regression as above, but on a different set of data, the researcher
now obtains the estimate =-0.52 with standard error = 0.16.
(i) The test statistic is calculated as above. The value of the test statistic = -0.52
/0.16 = -3.25. We therefore reject the null hypothesis since the test statistic is
smaller (more negative) than the critical value.
26
(ii) We conclude that the series is stationary since we reject the unit root
null hypothesis. We need do no further tests since we have already rejected.
(iii) The researcher is correct. One possible source of non-whiteness is when the
errors are autocorrelated. This will occur if there is autocorrelation in the original
dependent variable in the regression (yt). In practice, we can easily get around
this by augmenting the test with lags of the dependent variable to soak up the
autocorrelation. The appropriate number of lags can be determined using the
information criteria.
27
City University
Exercise 7: Linear Univariate Time Series Modelling
1. Why are ARMA models particularly useful for financial time series?
2. Consider the following three models which a researcher suggests might be a
reasonable model of stock market prices:
(1)
yt = yt-1 + ut
(2)
yt = 0.5 yt-1 + ut
yt = 0.8 ut-1 + ut
(3)
(i) What classes of model are these examples of?
(ii) What would the autocorrelation and partial autocorrelation functions for each of
these processes look like? (You do not need to calculate the acf, simply consider
what shape it might have given the class of model from which it is drawn).
(iii) Which model is more likely to represent stock market prices from a theoretical
perspective, and why? If any of the three models truly represented the way stock
market prices move, which could we potentially use to make money by forecasting
future values of the series?
(iv) By making a series of successive substitutions or from your knowledge of the
behaviour of these types of processes, consider the extent of persistence of shocks
to the series in each case.
3.
(i) Describe the steps that Box and Jenkins (1970) suggested should be
involved in constructing an ARMA model.
(ii) What particular aspect of this methodology has been the subject of
criticism and why?
(iii) Describe an alternative procedure that could be used for this aspect.
4. A researcher is trying to determine the appropriate order of an ARMA model to
describe some actual data, with 200 observations available. She has the following
figures for the log of the estimated residual variance (i.e. log ( 2 )) for various
candidate models. She has assumed that an order greater than (3,3) should not be
necessary to model the dynamics of the data. What is the optimal model order?
log (2)
0.932
0.864
0.902
0.836
0.801
0.821
0.789
0.773
0.782
0.764
ARMA (p,q) model order

(0,0)
(1,0)
(0,1)
(1,1)
(2,1)
(1,2)
(2,2)
(3,2)
(2,3)
(3,3)
28
5. How could you determine if the order you suggested for question 5 was in fact
appropriate?
6. Given that the objective of any econometric modelling exercise is to find the
model that most closely fits the data, then adding more lags to an ARMA model
will almost invariably lead to a better fit. Therefore a large model is best because it
will fit the data more closely.
Comment on the validity (or otherwise) of this statement.
7. (a) You obtain the following sample autocorrelations and partial autocorrelations
for a sample of 100 observations from actual data:
Lag
1
2
3
4
5
6
7
8
acf
0.420
0.104
0.032 -0.206 -0.138
0.042 -0.018
0.074
pacf
0.632
0.381
0.268
0.199
0.205
0.101
0.096
0.082
Can you identify the most appropriate time series process for this data?
(b) Use the Ljung-Box Q* test to determine whether the first three autocorrelation
coefficients taken together are jointly significantly different from zero.
8. Explain what stylised shapes would be expected for the autocorrelation and
partial autocorrelation functions for the following stochastic processes:
- white noise
- An AR(2)
- An MA(1)
- An ARMA (2,1)
9. (a) Briefly explain any difference you perceive between the characteristics of
macroeconomic and financial data. Which of these features suggest the use of
different econometric tools for each class of data?
(b) Consider the following autocorrelation and partial autocorrelation
coefficients estimated using 500 observations for a weakly stationary series, yt:
Lag
1
2
3
4
5
Acf
0.307
-0.013
0.086
0.031
-0.197
pacf
0.307
0.264
0.147
0.086
0.049
Using a simple rule of thumb, determine which, if any, of the acf and pacf
coefficients are significant at the 5% level. Use both the Box-Pierce and LjungBox statistics to test the joint null hypothesis that the first 5 autocorrelation
coefficients are jointly zero.
(c) What process would you tentatively suggest could represent the most
appropriate model for the series in part (b)? Explain your answer.
29
(d) Outline two methods proposed by Box and Jenkins (1970) for determining
the adequacy of the models proposed in part (c).
30
City University
Solutions to Exercise 7: Linear Univariate Time Series Modelling
1. ARMA models are of particular use for financial series due to their flexibility.
They are fairly simple to estimate, can often produce reasonable forecasts, and most
importantly, they require no knowledge of any structural variables that might be
required for more traditional econometric analysis. When the data are available at
high frequencies, we can still use ARMA models while exogenous explanatory
variables (e.g. macroeconomic variables, accounting ratios) may be unobservable at
any more than monthly intervals at best.
2.
yt = yt-1 + ut
yt = 0.5 yt-1 + ut
yt = 0.8 ut-1 + ut
(1)
(2)
(3)
(i) The first two models are AR(1) models (although the first one is the very special
case of a unit root or random walk), while the last is an MA(1). Strictly, since the
first model is a random walk, it should be called an ARIMA(0,1,0) model, but it
could still be viewed as a special case of an autoregressive model.
(ii) We know that the theoretical acf of an MA(q) process will be zero after q lags,
so the acf of the MA(1) will be zero at all lags after one. For an autoregressive
process, the acf dies away gradually. It will die away fairly quickly for case (2),
with each successive autocorrelation coefficient taking on a value equal to half that
of the previous lag. For the first case, however, the acf will never die away, and in
theory will always take on a value of one, whatever the lag.
Turning now to the pacf, the pacf for the first two models would have a large
positive spike at lag 1, and no statistically significant pacfs at other lags. Again,
the unit root process of (1) would have a pacf the same as that of a stationary AR
process. The pacf for (3), the MA(1), will decline geometrically.
(iii) Clearly the first equation (the random walk) is more likely to represent stock
prices in practice. The discounted dividend model of share prices states that the
current value of a share will be simply the discounted sum of all expected future
dividends. If we assume that investors form their expectations about dividend
payments rationally, then the current share price should embody all information that
is known about the future of dividend payments, and hence todays price should
only differ from yesterdays by the amount of unexpected news which influences
dividend payments.
Thus stock prices should follow a random walk. Note that we could apply a similar
rational expectations and random walk model to many other kinds of financial
series.
If the stock market really followed the process described by equations (2) or (3),
then we could potentially make useful forecasts of the series using our model. In
the latter case of the MA(1), we could only make one-step ahead forecasts since the
31
memory of the model is only that length. In the case of equation (2), we could
potentially make a lot of money by forming multiple step ahead forecasts and
trading on the basis of these.
Hence after a period, it is likely that other investors would spot this potential
opportunity and hence the model would no longer be a useful description of the
data.
(iv) See the handouts for the algebra. This part of the question is really an extension
of the others. Analysing the simplest case first, the MA(1), the memory of the
process will only be one period, and therefore a given shock or innovation, ut,
will only persist in the series (i.e. be reflected in yt) for one period. After that, the
effect of a given shock would have completely worked through.
For the case of the AR(1) given in equation (2), a given shock, ut, will persist
indefinitely and will therefore influence the properties of yt for ever, but its effect
upon yt will diminish geometrically as time goes on.
In the first case, the series yt could be written as an infinite sum of past shocks, and
therefore the effect of a given shock will persist indefinitely, and its effect will not
diminish over time.
3.
(i) Box and Jenkins were the first to consider ARMA modelling in this
logical and coherent fashion. Their methodology consists of 3 steps:
Identification - determining the appropriate order of the model using graphical
procedures (e.g. plots of autocorrelation functions).
Estimation - of the parameters of the model of size given in the first stage. This can
be done using least squares or maximum likelihood depending on the model.
Diagnostic checking - this step is to ensure that the model actually estimated is
adequate. B & J suggest two methods for achieving this:
- overfitting, which involves deliberately fitting a model larger than that
suggested in step 1 and testing the hypothesis that all the additional coefficients can
jointly be set to zero.
- Residual diagnostics. If the model estimated is a good description of the
data, there should be no further linear dependence in the residuals of the estimated
model. Therefore, we could calculate the residuals from the estimated model, and
use the Ljung-Box test on them, or calculate their acf. If either of these reveal
evidence of additional structure, then we assume that the estimated model is not an
adequate description of the data.
If the model appears to be adequate, then it can be used for policy analysis and for
constructing forecasts. If it is not adequate, then we must go back to stage 1 and
start again!
(ii) The main problem with the B & J methodology is the inexactness of the
identification stage. Autocorrelation functions and partial autocorrelations for
actual data are very difficult to interpret accurately, rendering the whole procedure
often little more than educated guesswork. A further problem concerns the
32
diagnostic checking stage, which will only indicate when the proposed model is
too small and would not inform the researcher of when it is too large.
(iii) We could use Akaikes or Schwarzs Bayesian information criteria. Our
objective would then be to fit the model order that minimises these.
We can calculate the value of Akaikes (AIC) and Schwarzs (SBIC) Bayesian
information criteria using the following respective formulae
AIC = ln ( 2 ) + 2k/T
SBIC = ln ( 2 ) + k ln(T)/T
The information criteria trade off an increase in the number of parameters and
therefore an increase in the penalty term against a fall in the RSS, implying a closer
fit of the model to the data.
4. Using the formulae above, we end up with the following values for each criterion
and for each model order (with an asterisk denoting the smallest value of the
information criterion in each case).
ARMA (p,q) model order
(0,0)
(1,0)
(0,1)
(1,1)
(2,1)
(1,2)
(2,2)
(3,2)
(2,3)
(3,3)
log ( 2 )
0.932
0.864
0.902
0.836
0.801
0.821
0.789
0.773
0.782
0.764
AIC
0.942
0.884
0.922
0.866
0.841
0.861
0.839
0.833*
0.842
0.834
SBIC
0.944
0.887
0.925
0.870
0.847
0.867
0.846
0.842*
0.851
0.844
The result is pretty clear: both SBIC and AIC say that the appropriate model is an
ARMA(3,2).
5. We could still perform the Ljung-Box test on the residuals of the estimated
models to see if there was any linear dependence left unaccounted for by our
postulated models.
Another test of the models adequacy that we could use is to leave out some of the
observations at the identification and estimation stage, and attempt to construct out
of sample forecasts for these. For example, if we have 2000 observations, we may
use only 1800 of them to identify and estimate the models, and leave the remaining
200 for construction of forecasts. We would then prefer the model that gave the
most accurate forecasts.
6. This is not true in general. Yes, we do want to form a model which fits the data
as well as possible. But in most financial series, there is a substantial amount of
noise. This can be interpreted as a number of random events that are unlikely to
be repeated in any forecastable way. We want to fit a model to the data that will be
33
able to generalise. In other words, we want a model that fits to features of the
data that will be replicated in future; we do not want to fit to sample-specific noise.
This is why we need the concept of parsimony - fitting the smallest possible
model to the data. Otherwise we may get a great fit to the data in sample, but any
use of the model for forecasts could yield terrible results.
Another important point is that the larger the number of estimated parameters (i.e.
the more variables we have), then the smaller will be the number of degrees of
freedom, and this will imply that coefficient standard errors will be larger than they
would otherwise have been. This could lead to a loss of power in hypothesis tests,
and variables that would otherwise have been significant are now insignificant.
7. (a) We class an autocorrelation coefficient or partial autocorrelation coefficient
1
= 0.196. Under this rule, the sample
as significant if it exceeds 1.96 *
T
autocorrelation coefficients (sacfs) at lag 1 and 4 are significant, and the spacfs at
lag 1, 2, 3, 4 and 5 are all significant.
This clearly looks like the data are consistent with a first order moving average
process since all but the first acfs are not significant (the significant lag 4 acf is a
typical wrinkle that one might expect with real data and should probably be
ignored), and the pacf has a slowly declining structure.
(b) The formula for the Ljung-Box Q* test is given by
m
Q* T (T 2)
k 1
k2
T k
m2
using the standard notation.

In this case, T=100, and m=3. The null hypothesis is H0: 1 = 0 and 2 = 0 and
3 = 0. The test statistic is calculated as
0.420 2 0.104 2 0.032 2
Q* 100 102
19.41.
100 1 100 2 100 3
The 5% and 1% critical values for a 2 distribution with 3 degrees of freedom are
7.81 and 11.3 respectively. Clearly, then, we would reject the null hypothesis that
the first three autocorrelation coefficients are jointly not significantly different from
zero.
8. The shapes of the acf and pacf are perhaps best summarised in a table:
Process
acf
pacf
White noise No significant coefficients
No significant coefficients
AR(2)
Geometrically declining or damped First 2 pacf coefficients significant,
sinusoid acf
all others insignificant
MA(1)
First acf coefficient significant, all Geometrically declining or damped
others insignificant
sinusoid pacf
ARMA(2,1) Geometrically declining or damped Geometrically declining or damped
sinusoid acf
sinusoid pacf
34
A couple of further points are worth noting. First, it is not possible to tell what the
signs of the coefficients for the acf or pacf would be for the last three processes,
since that would depend on the signs of the coefficients of the processes. Second,
for mixed processes, the AR part dominates from the point of view of acf
calculation, while the MA part dominates for pacf calculation.
9. (a) Some of the stylised differences between the typical characteristics of

macroeconomic and financial data were presented in Section 1.2 of Chapter 1 of the
textbook. In particular, one important difference is the frequency with which
financial asset return time series and other quantities in finance can be recorded.
This is of particular relevance for the models discussed in Chapter 5, since it is
usually a requirement that all of the time-series data series used in estimating a
given model must be of the same frequency. Thus, if, for example, we wanted to
build a model for forecasting hourly changes in exchange rates, it would be difficult
to set up a structural model containing macroeconomic explanatory variables since
the macroeconomic variables are likely to be measured on a quarterly or at best
monthly basis. This gives a motivation for using pure time-series approaches (e.g.
ARMA models), rather than structural formulations with separate explanatory
variables.
It is also often of particular interest to produce forecasts of financial variables in
real time. Producing forecasts from pure time-series models is usually simply an
exercise in iterating with conditional expectations. But producing forecasts from
structural models is considerably more difficult, and would usually require the
production of forecasts for the structural variables as well.
(b) A simple rule of thumb for determining whether autocorrelation coefficients
and partial autocorrelation coefficients are statistically significant is to classify
1
, where T is the
them as significant at the 5% level if they lie outside of 1.96 *
T
sample size. In this case, T = 500, so a particular coefficient would be deemed
significant if it is larger than 0.088 or smaller than 0.088. On this basis, the
autocorrelation coefficients at lags 1 and 5 and the partial autocorrelation
coefficients at lags 1, 2, and 3 would be classed as significant. The formulae for the
Box-Pierce and the Ljung-Box test statistics are respectively
m
Q T k2
and
k 1
Q* T (T 2)
k 1
k2
T k
In this instance, the statistics would be calculated respectively as

Q 500 [0.307 2 (0.013 2 ) 0.086 2 0.0312 (0.197 2 )] 70.79
and
0.307 2 (0.013 2 ) 0.086 2
0.0312 (0.197 2 )
Q* 500 502
71.39 .
500 2
500 3 500 4
500 5
500 1
The test statistics will both follow a 2 distribution with 5 degrees of freedom (the
number of autocorrelation coefficients being used in the test). The critical values
are 11.07 and 15.09 at 5% and 1% respectively. Clearly, the null hypothesis that the
first 5 autocorrelation coefficients are jointly zero is resoundingly rejected.
35
(c) Setting aside the lag 5 autocorrelation coefficient, the pattern in the table is for
the autocorrelation coefficient to only be significant at lag 1 and then to fall rapidly
to values close to zero, while the partial autocorrelation coefficients appear to fall
much more slowly as the lag length increases. These characteristics would lead us
to think that an appropriate model for this series is an MA(1). Of course, the
autocorrelation coefficient at lag 5 is an anomaly that does not fit in with the pattern
of the rest of the coefficients. But such a result would be typical of a real data series
(as opposed to a simulated data series that would have a much cleaner structure).
This serves to illustrate that when econometrics is used for the analysis of real data,
the data generating process was almost certainly not any of the models in the
ARMA family. So all we are trying to do is to find a model that best describes the
features of the data to hand. As one econometrician put it, all models are wrong, but
some are useful!
(d) The methods are overfitting and residual diagnostics. Overfitting involves
selecting a deliberately larger model than the proposed one, and examining the
statistical significances of the additional parameters. If the additional parameters are
statistically insignificant, then the originally postulated model is deemed acceptable.
The larger model would usually involve the addition of one extra MA term and one
extra AR term. Thus it would be sensible to try an ARMA(1,2) in the context of
Model A, and an ARMA(3,1) in the context of Model B. Residual diagnostics
would involve examining the acf and pacf of the residuals from the estimated
model. If the residuals showed any action, that is, if any of the acf or pacf
coefficients showed statistical significance, this would suggest that the original
model was inadequate. Residual diagnostics in the Box-Jenkins sense of the term
involved only examining the acf and pacf, rather than the array of diagnostics. It is
worth noting that these two model evaluation procedures would only indicate a
model that was too small. If the model were too large, i.e. it had superfluous terms,
these procedures would deem the model adequate.
36
EVIEWS TUTORIALS 1 & 2

1
INTRO TO EVIEWS ........................................................................................ 2
1.1
Prelude ___________________________________________________ 2
1.2
What is EViews? ___________________________________________ 2
1.3
Getting Started: ____________________________________________ 2
1.4
Importing data: ____________________________________________ 2
1.5
Simple Data Analysis _______________________________________ 5
1.6
Running a Simple Regression ________________________________ 6
1.7
Estimating a Cross-Section Regression _________________________ 9
ESTIMATING AND EVALUATING REGRESSIONS IN EVIEWS .......... 14
2.1
Import Data ______________________________________________ 14
2.2
Transforming the Variables _________________________________ 14
2.3
Estimating the Model ______________________________________ 15
2.4
Evaluating the Model ______________________________________ 17
2.5
Looking at the Diagnostics of the Model_______________________ 18
2.6
Estimating a Cross-Section Regression ________________________ 21
2.7
Estimating a Hedge Ratio___________________________________ 21
2.8
Multiple Regression Model _________________________________ 23
1 INTRO TO EVIEWS
1.1
Prelude
This guide is compiled following the textbook by Chris Brooks. Some of the
material contained within is for you to do, while some is for reading only. Instructions
where you actually have to do something (e.g. type, point or click) are given in boldfaced type.
1.2
What is EViews?
EViews is a simple to use, interactive econometric software package. It can be used

for data analysis and evaluation through univariate and multivariate regressions,
forecasting, and simulation. It has a wide variety of statistical tests and EViews
command and batch processing language allows you to store commands in programs
that document your research project for later execution.
1.3
Getting Started:
To load Eviews, go to the START - Programs - Departmental Software - Cass - EViews

6. You will now see the blank main window of EViews.
1.4
Importing data:
There are several different ways to input data into EViews. The simplest way to input
(very small amounts of) data is manually, but since theres no real practical reason to
do this, we will start by importing data from an Excel Spreadsheet or CSV file.
File Open Foreign data as workfile
We will use data on the stock of British Petroleum and the FTSE All Shares Index.
The data are from Datastream and come in a CSV format. The data is in
T:\Eviews\BP.csv
You will see the following windows
Click through the screens
In Step 3 you can edit the column names by selecting each one.
Once done click Finish, you will now have a new Eviews workfile:
Right-click on each variable to rename/delete/open etc. (resid and c are always

generated automatically).CTRL + select and then right-click allows you to select more
than 1 variable at a time and gives more options:
1.5
Simple Data Analysis
For now lets double-click on BP to open it in a series window. You will see a
button bar on the top. On that button bar, click on the view button to see a series of
options. These options include:
View:
Spreadsheet
Graph (Line Area - Bar Stack etc.)
Descriptive Statistics
Histogram and Stats
Tests for Descriptive Stats
Simple Hypothesis Tests
To see some descriptive statistics for the variables, select the option View; Descriptive
Statistics; Histogram and Stats. The histogram of the variable and amongst others the
mean, the minimum the maximum, the median and the standard deviation of the series
will now be shown.
To see the spreadsheet again, click on View; Spreadsheet.
To plot variable BP, we simply chose the view graph - line graph option, and in the
same window, the graph of variable BP will appear.
It is also possible to create other types of graph. For example a scatter plot. Close the
BP series window, select 2 variables (using ctrl) and select Open as group.
Now, View Graph Scatter Simple Scatter will give you a scatter plot.
If you click on Name and label the scatter plot (e.g. Scatter), in the workfile window,
there will now be a new object that is named scatter. If you double click on this object,
a scatter plot of the two variables will now appear. NOTE that saving the plot in this
way doesnt actually freeze the output, it saves the group of 2 variables you selected.
You can still select View Spreadsheet and return to just the numbers. To save just the
output, e.g. just the graph, select Freeze, then in the new window that appears select
Name and name your new Graph object. Double-click this object and you will see that
the View menu has changed.
1.6
Running a Simple Regression
Now we will look at how to run a simple OLS regression in EViews. Dont worry
about whether it makes sense to run this particular regression. At this stage we are
simply trying to familiarize ourselves with the package and the concepts.
There are 2 ways to do this:
Either:
Object - New Object. In the new window select Equation and name the equation
REG1. Click on OK.
In order to specify the regression equation we can use two alternative methods. In the
new window, type EITHER
BP=c(1)+c(2)*FTAS
OR
BP C FTAS
Both expressions are equivalent and it is obvious that we are instructing EViews to
run a regression of BP on a constant and FTAS. It is possible to choose another
estimation method, but for now use LS - Least Squares (NLS and ARMA).
ALTERNATIVELY, you can select BP and FTAS together, right-click, and open as
equation object. This will get you to the same point that the procedure outlined above
does.
We could also select not to include all our observations in the regression by
specifying a subset of observations to be used. This would allow some data points to be
left out and used for other purposes (e.g. forecasting). For now, all the available
observations will be used and so the sample is left at 1 370.
Click on OK and the screen will display the regression results. However complex
the regression equation, the results screen will always have the same format. The
results specify the dependent variable, the estimation method, the sample and the
observations used, the coefficients names; values; standard errors; t-statistics and pvalues of the t-test. Below this, some regression statistics are shown, including the Rsquared; the adjusted R-squared and the-value of the F-test of the regression.
Dependent Variable: BP
Method: Least Squares
Sample: 1 370
Included observations: 370
Variable
FTAS
C
R-squared
Adjusted R-squared
S.E. of regression
Sum squared resid
Log likelihood
Durbin-Watson stat
Std.
Error
t-Statistic
Prob.
0.223737 0.021684
-100.1781 64.80856
10.31805
-1.545754
0.0000
0.1230
Coefficient
Mean dependent
0.224385 var
0.222277 S.D. dependent var
Akaike info
42.84394 criterion
675501.9 Schwarz criterion
-1914.303 F-statistic
0.079089 Prob(F-statistic)
568.1250
48.58218
10.35840
10.37955
106.4622
0.000000
Note that the p-values are given so that one does not have to look up the t-values in a t
statistical table. Recall that the t-test examines whether the value of an individual
coefficient is statistically significant or not. This test in effect examines whether the
value of the coefficient is equal to zero or it is significantly different from zero. Under
this setting, the test takes the form:
H0: = 0
H1: 0
for the intercept coefficient, denoted , and for a single slope coefficient denoted ,
H0: = 0
H1: 0
The value of the estimated t-statistic is then compared with the value of the
tabulated t-statistic for a number of degrees of freedom and the confidence interval and
a conclusion can be reached. An alternative way to test the hypothesis is by looking
at the p-value. The p-value is the probability by chance alone of drawing a tstatistic as extreme as the one actually observed. This probability is the marginal
significance level and given a p-value, it can be concluded immediately whether to
reject or not reject the null hypothesis that the true coefficient is zero. If
conducting a one-sided test, the probability is one-half that reported by EViews. In the
simple regression above, the slope coefficient is statistically significant (i.e.
significantly different from zero), since the p-value of the t-test is 0.0000 and thus the
null hypothesis is rejected. On the other hand, the intercept coefficient is statistically
insignificant and the null hypothesis that its true value is zero cannot be rejected.
For statistical reasons, the intercept coefficient should never be excluded from a
regression equation.
We now turn our attention to the F-Test statistic. This test examines whether all the
slope coefficients are statistically insignificant or not. This is a joint test of significance
that, assuming the slope coefficients are denoted 1, 2, 3, , k, takes the form:
H0: 1 = 0 and 2 = 0 and and k = 0
H1: 1 0 or 2 0 or or k 0
In this regression, there is only one slope coefficient and so the t- and the F-statistic
provide the same information.
The R-squared value shows how much of the variability of the dependent variable is
explained by the independent variable(s).
It is possible to copy these results into the clipboard and then to paste them into a
document by simply selecting the result table and then copying them formatted or
unformatted into the clipboard. To copy the results, Ctrl-A (selects non-empty cells)
then Ctrl+C or Edit - Copy and Ctrl-V (or Edit Paste) wherever you want to paste
the table. To save regression results into an equation object use Name as we did with
the scatter plot. In order to change the estimation method or the regression equation,
click on the estimate button in the equation window.
To save just the table of output use Freeze and Name (again as with the scatter plot).
Let us turn our attention now to the residuals of the regression. We can view the
table of the actual and fitted values and the residuals by clicking on the view button in
the equation window and selecting Actual, Fitted, Residuals; Actual, Fitted,
Residuals Table. This will give a table of the actual and fitted values and of the
residuals with a plot of the residuals. To obtain a graph of the residuals, select Actual,
Fitted, Residuals; Residual Graph.
In order to save your work, go to File; Save and then provide a name for your file.
EViews will save the entire workfile with all the objects created in it. This means that
all series, equations and graphs will be saved in this workfile in a .wf1 format that can
be accessed only from EViews.
This file can be accessed in the future by simply opening EViews and then clicking on
File Open EViews workfile and selecting the file you have created.
This guide has so far covered how to import data, how to view various descriptive
statistics, how to plot the variables, how to run a regression, how to interpret simple
regression statistics and tests, how to view the residuals, and how to save a workfile.
We will now continue with some real data and a financial application.
1.7
Estimating a Cross-Section Regression
For the population of chief executive officers, let yt be annual salary (salary) in
thousands of $. Thus, a value y = 1256.3 indicates an annual salary of $1 256 300. Let
xt denote the average return on equity (roe) for the CEOs firm for the previous three
years. (Return on equity is defined in terms of net income as a percentage of common
equity). For example, if roe = 19, then average return on equity is 19%. The file
CEOSAL1 contains information on 209 CEOs for the year 1990; these data were
obtained from Business Week 5/6/91 (Wooldridge, 2003).
(a) What is the average annual salary in this sample? And the average return on
equity?
In order to find the average annual salary and the average roe we open the
corresponding series in the workfile and we click View/Descriptive Stats and
Tests/Stats Table.
As we can see the average annual salary is $ 1281.120 and the roe is 17.18%.
The alternative solution is to run a simple regression on a constant with salary and the
roe as dependent variables in order to find the average salary and roe, respectively.
(b) What is the variance of the average annual salary in this sample? And the
average return on equity?
The variance is equal to the squared standard deviation. Thus, the variance of the
annual salary is (1372.345)2 = 1883619 and the variance of roe is equal to (8.5185)2 =
72.564
(c) Graph the data in a scatterplot and interpret it.

16,000
14,000
12,000
SALARY
10,000
8,000
6,000
4,000
2,000
0
0
10
20
30
40
ROE
50
60
There is no strong relationship between roe and salary.
(d) Estimate the following regression model (by LS) relating salary to roe.
salaryi = 1 + 2*roei + t
What estimates do you obtain for the model parameters? Which standard error
each of them has? Interpret the latter.
Dependent Variable: SALARY
Date: 08/28/12 Time: 13:09
Sample: 1 209
Variable
Coefficient
Std. Error
t-Statistic
Prob.
C
ROE
963.1913
18.50119
213.2403
11.12325
4.516930
1.663290
0.0000
0.0978
R-squared
Adjusted R-squared
S.E. of regression
Sum squared resid
Log likelihood
F-statistic
Prob(F-statistic)
0.013189
0.008421
1366.555
3.87E+08
-1804.543
2.766532
0.097768
Mean dependent var

S.D. dependent var
Akaike info criterion
Schwarz criterion
Hannan-Quinn criter.
Durbin-Watson stat
1281.120
1372.345
17.28750
17.31948
17.30043
2.104990
(e) If the return on equity is zero, what is the expected (or predicted) salary?
If the roe = 0, then E(salary) = $ 963 191.3.
(f) If the return on equity increases by 1 percentage point, by how much is salary
expected to increase?
If the roe increases by 1%, then the salary is going to increase by $ 18.50 .
(g) Use your estimation results to compare predicted salaries at different values
of roe, for instance roe = 30% and roe = 15%.
If for 1% increase of roe the salary increases by $ 18.50, then for 30% of roe the salary
is going to increase by 30 * $ 18.50 =$ 555 and 15 * $ 18.50 = $ 277.5.
10
(h) On the basis of the current sample and your estimation results, can we claim
that roe plays a significant role in explaining CEOs salary? (Hint: conduct a
hypothesis test)
Hypothesis
H0: b2 = 0
HA: b2 0
t-test
t-stat = 18.50/ 11.12 = 1.66
The critical value is t2.5%, 207 = 1.97 or t5%, 207 = 1.65
For 5% level of significance roe doesnt play a significant role. However, for 10%
level roe is significant.
Alternatively, we can see the p-value of the relevant t-statistic. As we can see the pvalue is 9.7% which is greater than 5% level of significance and thus we dont reject
the H0 that b2=0. Thus, b2 is insignificant at the 5% level.
However it is significant at the 10% level since p-value = 9.7% is smaller than 10%.
(i) Construct a two-sided 95% confidence interval on the marginal effect of roe
on CEOs salary and answer the previous question using this interval. Do you
get the same answer as in (h)?
The confidence interval is
b2 tcrit*SE(b2) = 18.50 t2.5%, 207*11.12= 18.50 1.97*11.12 = (-3.40, 40.40)
As we can see the estimated b2 is inside the confidence interval and thus we dont
reject the null hypothesis that b2=0. Thus b2 is insignificant at the 5% level of
significance.
Same answer was obtained in (h). The hypothesis testing with the confidence intervals
and test of significance approach leads to the same results.
(j) Test the hypothesis that if the roe increases by 1 percentage point the CEOs
salary is expected to change by more than $15,000.
H0 : b1=15
HA : b1> 15
t-stat= (18.50-15)/ 11.12 = 0.31
t5%, 207 = 1.65
Since we have a one-side test the t-critical is 1.65 and is greater than the t-statistic,
which means that we do not reject the null that b1= 15.000. Hence, we reject the
11
alternative hypothesis that if roe increases by 1% the CEOs salary is going to change
by more than $ 15.000.
(k) Plot the predicted salaries (according to this model) and the actual salaries.
Discuss.
15,000
12,500
10,000
7,500
5,000
2,500
16,000
0
12,000
8,000
4,000
0
-4,000
25
50
75
100
Residual
125
Actual
150
175
200
Fitted
From the plot we can see that there is no big difference between the actual and the
fitted (predicted) salaries.
(l) Plot the estimated errors or residuals. Discuss.

RESID
14,000
12,000
10,000
8,000
6,000
4,000
2,000
0
-2,000
25
50
75
100
125
12
150
175
200
The errors are equal to the actual salaries minus the predicted salaries. As we can see
with the exception of three cases our model predicts well the actual salaries of the 209
CEOs.
(m) Report and interpret the marginal impact of roe on CEOs salary.
If roe changes by 1%, the CEOs salary will increase by $ 18.50. But the impact is
only significant at the 10% level.
(n) Which percentage of the variability in CEOs salaries is explained by the

returns on equity?
The proportion of variability in CEOs salary explained by roe is very small, R2 =
1.3%.
13
2 ESTIMATING AND EVALUATING REGRESSIONS IN EVIEWS

2.1
Import Data
We will continue to use the BP workfile.

2.2
Transforming the Variables
These series are daily closing prices of the FTSE All Shares Index and BP stock. In
order to estimate a CAPM equation, these series should first be transformed into
returns and then calculate excess returns over the risk free rate. To transform the series,
click on the Generate button (Genr) on the workfile window. In the new window, type:
RFTAS=LOG(FTAS/FTAS(-1))
This will create a new series named RFTAS that will contain the returns of the
FTAS. The operator (-1) is used to instruct EViews to use the one period lagged
observation of the series.
To estimate returns on the BP Stock, press the (Genr) button again and type
RBP=LOG(BP/BP(-1))
This will yield a new series named RBP that will contain the returns of the BP
Stock.
EViews allows various different kinds of transformations to the series. For example:
X2=X/2
XSQ=X^2
LX=LOG(X)
LAGX=X(-1) lags X by 1 period
LAGX2=X(-2) lags X by two periods etc.
Other functions include:
d(x)
first difference
d(x,n)
n-th order difference
d(x,n,s)
n-th order difference with a seasonal difference at s
dlog(x)
first difference of the logarithm
dlog(x,n)
n-th order difference of the logarithm
dlog(x,n,s) n-th order difference of the logarithm with a seasonal difference at s
14
abs(y)
absolute value of y
Note that if in the transformation the new series is given the same name as the old
series, the old series will be overwritten. (best to avoid doing this to your original
series)
Using to the above functions it is possible to generate new series by simply typing in
the Genr window:
RFTAS=DLOG(FTAS)
In the same way, we will transform the returns into excess returns.
Click the Genr button again and type in:
RFTAS=RFTAS-UKTBILL
Now transform the series of returns for BP to excess returns.
Note that this has now overwritten the returns series with excess returns.
We will now run a CAPM regression to identify the beta and the alpha for BP stock.
2.3
Estimating the Model
Now that we have the excess returns of the series, we can proceed to run the CAPM
regression. Before running the regression, plot the data to examine visually whether the
series move together. To do this, create a new object by clicking on the Objects; New
Object menu on the menu bar. Select Graph, provide a name (call the graph graph1)
and then in the new window provide the names of the series to plot. In this new
window, type:
RFTAS RBP
In the new window press OK. What does the graph imply about the beta of BP
Stock?
Close the window of the graph and return to the workfile window. Select RBP and
RFTAS, open group and select the scatter plot. It should appear as below.
15
To estimate the CAPM equation we click on Objects; New Objects. In the new
window, select Equation and Name the equation CAPM. Click on OK. In the new
window, specify the regression equation. The regression equation takes the form:
(RBP-rf)t = + (RM-rf)t + ut
Since the data has already been transformed, in order to specify this regression
equation, type in the equation window:
RBP c RFTAS
To use all the observations in the sample and to estimate the regression using LS
Least Squares, click on OK. The results screen appears and has the same format as the
screen of the previous section.
Dependent Variable: RBP

Included observations: 369 after adjustments
Variable
Coefficient
Std. Error
t-Statistic
Prob.
C
RFTAS
0.000720
0.697092
0.001105
0.107925
0.651537
6.459017
0.5151
0.0000
R-squared
Adjusted R-squared
S.E. of regression
Sum squared resid
Log likelihood
Durbin-Watson stat
0.102072
0.099626
0.021235
0.165489
898.8419
1.898081
Mean dependent var

S.D. dependent var
Schwarz criterion
F-statistic
Prob(F-statistic)
16
0.000761
0.022379
-4.860932
-4.839735
41.71890
0.000000
2.4
Evaluating the Model
Take a couple of minutes to examine the results of the regression. What do the
results tell us? What is the slope coefficient estimate and what does it signify? Is this
coefficient statistically significant?
We can see that the beta coefficient (the slope coefficient / the coefficient of
RFTAS) is equal to 0.697092. The p-value of the t-ratio is 0.0000 signifying that the
return on the market proxy is able to explain the variability of the returns of BP.
What does the constant coefficient mean? Is it statistically significant?
The F test shows that the regression slope coefficient is significantly different from
zero, which in this case is the same result as the t-test for the beta coefficient (note that
we only have one slope coefficient).
Finally, by examining the R2 and the adjusted R2, it can be seen that the excess
returns of the market proxy are able to explain a relatively small proportion of the
variability of the excess returns on BP stock.
It is of interest to test whether the beta coefficient is statistically different from 1. To
do this, click on the View button in the regression window and choose Coefficient
tests; Wald-Coefficient Restrictions. In the new window type:
C(2)=1
This tells EViews to test whether the slope coefficient (i.e. the coefficient on the
second variable, since the intercept is c(1)) is equal to 1. Click on OK. In the new test
result screen you will see:
Wald Test:
Equation: CAPM
Test Statistic
F-statistic
Chi-square
Value
7.877243
7.877243
df
Probability
(1, 367)
1
0.0053
0.0050
Value
Std. Err.
Null Hypothesis Summary:

Normalized Restriction (= 0)
-1 + C(2)
-0.302908
0.107925
Restrictions are linear in coefficients.
There are two versions of the test given: an F-version and a 2-version. The F-version
is adjusted for small sample bias and should be used when the regression is estimated
using a small sample. Both statistics asymptotically yield the same result, and hence in
a sample of this size, the p-values are very similar. The beta of BP is significantly
different from 1. We thus conclude that, in the CAPM world, BP returns on average
17
fluctuate less than those of the market as a whole, as we might expect of a basic
commodity supplier.
2.5
Looking at the Diagnostics of the Model
We will now examine the residuals of the regression. To examine whether there is
autocorrelation and heteroscedasticity it is important to look at the residuals. Plot the
residuals by selecting View; Actual, Fitted, Residuals; Residual Graph.
If the residuals of the regression have systematically changing variability over the
sample, that is sign of heteroscedasticity. In that case, any inferences that are made
regarding the coefficient estimates may be wrong since although the coefficient
estimates are still unbiased in the presence of heteroscedasticity, they are no longer
BLUE.
To test for heteroscedasticity using the White heteroscedasticity test, click on the
View button in the regression window and select Residual Tests; White
Heteroscedasticity (no cross terms). The results of the test are:
White Heteroskedasticity Test:

F-statistic
Obs*R-squared
0.712712
1.431533
Probability
Probability
0.490992
0.488817
Test Equation:
Dependent Variable: RESID^2
Variable
C
RFTAS
RFTAS^2
R-squared
Adjusted R-squared
S.E. of regression
Sum squared resid
Log likelihood
Durbin-Watson stat
Coefficient Std. Error

0.000419
0.003739
0.280350
0.003879
-0.001564
0.000839
0.000258
2091.485
1.775284
t-Statistic
Prob.
5.36E-05
7.813221
0.004320
0.865445
0.295599
0.948413
Mean dependent var
S.D. dependent var
Schwarz criterion
F-statistic
Prob(F-statistic)
0.0000
0.3874
0.3435
0.000448
0.000839
-11.31970
-11.28791
0.712712
0.490992
What do we conclude from the results? Remember the null is well specified i.e. no
heteroskedasticity.
We will now examine whether there is autocorrelation in the residuals. If
autocorrelation is present, the coefficient estimates of the regression are still unbiased
but they are inefficient.
There are several ways to test for autocorrelation. The easiest (but not very
accurate) is by examining the Durbin Watson statistic for first order autocorrelation.
This statistic was given in the general results screen shown above. To view the results
screen again, click on the View button in the regression window and select Estimation
output. The DW statistic is found at the bottom of the table. What does the DW
statistic tell us in this case?
18
0>DW>4
DW around 2 = no autocorrelation
DW << 2 positive a/c
DW>>2 negative a/c
To examine whether the residuals contain any higher order autocorrelation, we
could plot them over time, although this is likely to be difficult to interpret. A
statistical approach would be to use the Breusch-Godfrey Serial Correlation LM Test.
This test can be conducted by selecting View; Residual Tests; Serial Correlation
LM Tests. In the new window, type the number of lagged residuals you want to
include in the test and click on OK. Assuming that you selected to employ 10 lags in
the test, the results would be:
Breusch-Godfrey Serial Correlation LM Test:
F-statistic
Obs*R-squared
1.799298
17.70543
Probability
Probability
0.059338
0.060141
Test Equation:
Dependent Variable: RESID
Presample missing value lagged residuals set to zero.
Variable
C
RFTAS
RESID(-1)
RESID(-2)
RESID(-3)
RESID(-4)
RESID(-5)
RESID(-6)
RESID(-7)
RESID(-8)
RESID(-9)
RESID(-10)
-3.65E-06
-0.016466
0.039652
-0.118282
-0.123380
-0.025917
-0.087858
0.022433
0.038984
-0.051403
0.029604
-0.080451
R-squared
Adjusted R-squared
S.E. of regression
Sum squared resid
Log likelihood
Durbin-Watson stat
0.047982
0.018648
0.021007
0.157548
907.9140
1.992850
0.001094
0.107771
0.052849
0.052990
0.053359
0.053765
0.054094
0.054170
0.054162
0.053886
0.053578
0.053732
t-Statistic
Prob.
-0.003335
-0.152785
0.750292
-2.232153
-2.312255
-0.482043
-1.624178
0.414124
0.719768
-0.953924
0.552542
-1.497262
0.9973
0.8787
0.4536
0.0262
0.0213
0.6301
0.1052
0.6790
0.4721
0.3408
0.5809
0.1352
Mean dependent var -3.39E-18

S.D. dependent var
0.021206
Akaike info criterion -4.855903
Schwarz criterion
-4.728722
F-statistic
1.635726
Prob(F-statistic)
0.086976
The test is an F-test of serial correlation and if the p-value of the F-statistic is
smaller than 0.05 we reject the null of no serial correlation.
Another assumption of the CLRM is that the disturbances follow a normal
distribution. If the residuals do not follow a normal distribution then we cannot make
correct inferences about the true coefficients from the coefficient estimates.
To test for normality, the Jarque-Bera test is used. This test can be viewed by
selecting View; Residual Tests; Histogram-Normality Test. The Jarque-Bera
19
statistic has a 2 distribution with two degrees of freedom under the null hypothesis of
normally distributed errors. If the residuals are normally distributed, the histogram
should be bell-shaped and the Jarque-Bera statistic would not be significant. This
means that the p-value given at the bottom of the normality test screen should be
bigger than 0.05 to not reject the null of normality at the 5% level. In this case, the
screen would appear as:
In this case, the null hypothesis for residual normality is rejected, implying that the
inferences we make about the coefficient estimates could be wrong, although the
sample is probably sufficiently large to not give great cause for concern.
Let us now shift our attention to the functional form of the regression. A simple test
for the functional form of the model is the Ramsey Reset Test found in the View menu
of the regression window under Stability tests; Ramsey RESET test. It examines
whether the coefficients of the equation are nonlinear and should be examined to find
misspecification problems. You are asked for the number of fitted terms, equivalent
to the number of powers of the fitted value to be used in the regression; type 1 to
consider only the square of the fitted values. The Ramsey RESET test for this
regression is in effect testing whether the relationship between the stock excess returns
and the market proxy excess returns is linear or not. The results of this test for one
fitted term are:
Ramsey RESET Test:

F-statistic
Log likelihood ratio
1.368964
1.377610
Probability
Probability
0.242751
0.240509
Test Equation:
Dependent Variable: RBP
Variable
20
t-Statistic
Prob.
C
RFTAS
FITTED^2
-0.000208
0.699135
17.99653
R-squared
Adjusted R-squared
S.E. of regression
Sum squared resid
Log likelihood
Durbin-Watson stat
0.105418
0.100530
0.021224
0.164872
899.5307
1.893274
0.001360
0.107885
15.38130
-0.152761
6.480356
1.170027
Mean dependent var

S.D. dependent var
Schwarz criterion
F-statistic
Prob(F-statistic)
0.8787
0.0000
0.2428
0.000761
0.022379
-4.859245
-4.827450
21.56490
0.000000
We can see that there is no apparent nonlinearity in the regression equation and we
thus conclude that the linear model in the returns is appropriate.
Taking the results as a whole, what are the implications for the validity and
testability of the estimated model?
Heteroskedasticity? Autocorrelation? Normality? Non-linearity?
2.6
Estimating a Cross-Section Regression
For the population of chief executive officers, let y_{t} be annual salary (salary) in
thousands of $. Thus, a value y=1256.3 indicates an annual salary of $1256,300. Let
x_{t}..
2.7
Estimating a Hedge Ratio
Another interesting application of the simple, one variable model is to the

estimation of the hedge ratio in a hedge of a spot position with a futures contract. In
order to estimate this model, data on spot prices of a security and the price of the
appropriate future are required. In this case, monthly data on coffee spot and the 2month futures prices, taken from Datastream International, will be used. The
commodity used is the COFFEE, BRAZILIAN (NY) CENTS/LB spot prices and the
futures contract is the LIFFE-COFFEE FUT. 2ND.POSN. U$/MT both traded in the
US.
The data can be found in t:\Eviews\hedge.csv
Import the data into a new workfile as before. This is monthly data starting 1991:4
and ending 2000:9
Transform the series into returns and then plot the returns series in a scatter plot.
The scatter plot that should be obtained is shown below.
21
What does the graph tell us?

A simple regression will now be conducted in order to calculate the hedge ratio. The
regression equation to estimate is of the following form (with time subscripts
suppressed):
Return(Spot) = + * Return(Future) + u
This equation will be estimated using Least Squares.
Estimate the equation and examine the results, which should appear as shown below:
Dependent Variable: RSPOT
Variable
Coefficient
Std. Error
t-Statistic
Prob.
C
RFUTURE
0.001895
0.936129
0.008300
0.083072
0.228254
11.26887
0.8199
0.0000
R-squared
Adjusted R-squared
S.E. of regression
Sum squared resid
Log likelihood
Durbin-Watson stat
0.533589
0.529387
0.088202
0.863536
115.0470
2.002975
Mean dependent var

S.D. dependent var
Schwarz criterion
F-statistic
Prob(F-statistic)
-0.000578
0.128572
-2.000832
-1.952560
126.9875
0.000000
What does the slope coefficient tell us? Does it have the correct sign? What does the
R tell us?
2
22
2.8
Multiple Regression Model
Yt 1 X 1t 2 X 2t 3 X 3t .... t
OLS assumptions and properties of OLS estimators are the same as in the simple
regression case.
A Practical Example CAPM to a 3 Factor Model
Data:
We will use the data in the file MRM that can be found at:
T:\Eviews\MRM
This data comes from Professor Kenneth Frenchs website:
http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html
We have monthly data from 1980:01 to 2007:08
SMB (Small Minus Big) is the average return on the three small portfolios minus
the average return on the three big portfolios
HML (High Minus Low) is the average return on the two value portfolios minus the
average return on the two growth portfolios
Rm-Rf, the excess return on the market, is the value-weight return on all NYSE,
AMEX, and NASDAQ stocks (from CRSP) minus the one-month Treasury bill rate
(from Ibbotson Associates)
LRG_GRW: average monthly returns on large, growth portfolio
SM_VAL: average monthly returns on small, value portfolio
MID: neutral
Import the data into a new workfile as before.
Transform returns into excess returns.
23
We can first estimate a 1 factor model (CAPM) as before, for example using the large
growth portfolio. The regression would be:
ERLG = c(1) * MKT_RF + c(2)
Or equivalently
ERLG MKT_RF c
Dependent Variable: ERLG

Date: 10/22/07 Time: 20:04
Sample: 1980M01 2007M08
Variable
Coefficie
nt
Std. Error
t-Statistic
MKT_RF
1.204779
0.021592
55.79650
0.0000
-0.093009
0.095212
-0.976863
0.3294
Prob.
R-squared
0.904160
Mean dependent var
0.667018
Adjusted R-squared
0.903870
S.D. dependent var
5.537812
S.E. of regression
1.716991
3.925029
Sum squared resid
972.8590
Schwarz criterion
3.947952
F-statistic
3113.250
Prob(F-statistic)
0.000000
Log likelihood
Durbin-Watson stat
-649.5549
2.076069
This output is of course interpreted as before. What would the constant tell us in this
case? How about the coefficient on the market excess return?
Now we want to expand on CAPM by adding size and value factors in addition to the
market risk factor. The Fama-French 3 factor model considers the fact that value (high
book to market) and small cap stocks historically outperform markets. Accounting for
this observation and adjusting for it should give us a better model for expected (and
24
therefore excess) returns. The 3 factor model adjusts downward for small cap and
value outperformance.
Now the regression to estimate is:
R Rf 1 ( Rm Rf ) 2 SMB 3 HML
Dependent Variable: ERLG
Date: 10/22/07 Time: 19:22
Sample: 1980M01 2007M08
Variable
Coefficient
Std. Error
t-Statistic
Prob.
MKT_RF
1.104110
0.021004
52.56632
0.0000
SMB
0.122345
0.027086
4.516952
0.0000
HML
-0.249939
0.031572
-7.916555
0.0000
0.119039
0.084372
1.410880
0.1592
R-squared
0.930598
Mean dependent var
0.667018
Adjusted R-squared
0.929963
S.D. dependent var
5.537812
S.E. of regression
1.465553
3.614318
Sum squared resid
704.4935
Schwarz criterion
3.660163
F-statistic
1466.027
Prob(F-statistic)
0.000000
Log likelihood
Durbin-Watson stat
-595.9767
1.673780
How do we interpret this model? What extra considerations must we make when we have
more than one regressor?
Try the same exercise with the small value and neutral portfolio.
25
Interpretation (in the general case):

Again, a slope parameter measures the ratio of the change in the dependent variable
(Y) to the change in the independent variable (X):
Y
X
BUT
Since we have more than 1 independent variable in the MRM we write:
Y
X 1
Y
X 2
Y
X 3
These partial derivatives are called the partial regression coefficients and they measure
the isolated effect of each variable onto Y:
1 : The change in Y due to a unit change in X1, keeping X2 and X3 fixed. (In this
case where we have 3 independent variables)
Therefore it is important that we have no perfect collinearity (i.e. correlation)
between regressors X1, X2, X3
e.g. X1 and X2 can be correlated (and probably will be) BUT they must not be a
perfect linear function of each other.
Intuitively, it would be like having one variable only in the model and the 2 effects
cannot be separated.
NOTE:
If there is high (but not perfect) correlation between variables, the coefficients can still
be estimated but the high correlation will affect their reliability, i.e. they will have too
large a variance and as a consequence the estimated t-values will be low.Therefore it
will be more likely to find a variable non-significant when it is actually significant.
R 2 : now called multiple coefficient of determination
R 2 : important as variables are added to the model
26
F-statistic: tests null hypothesis that all coefficients except the intercept are zero. Now
we have more than 1 regressor this statistic can give us different information than the tstatistic in the SRM.
Specification Bias
Inclusion of an irrelevant variable does not alter the results of our estimation (except an
increase in R2). This is called model overfitting and is not a serious problem as
coefficient estimates remain consistent and unbiased, but are inefficient.
Model underfitting, i.e. omitting an important variable makes coefficient estimates
biased and inconsistent, and variances of both the regression and the estimators will be
wrongly calculated. NO inferences can be made based on the usual tests.
27
EVIEWS TUTORIAL 3
3 BUILDING AN EMPIRICAL MODEL OF RETURNS FOR GE...................2
3.1
Introduction..............................................................................................2
3.2
Preliminary Regression ...........................................................................3
3.3
Heteroskedasticity-robust Standard Errors..........................................8
3.4
Multicollinearity.....................................................................................10
3.5
Normality, Outliers and Breakpoints...................................................11
3.6
Dummy Variables ..................................................................................13
3.7
Stability Tests .........................................................................................15
3 BUILDING AN EMPIRICAL MODEL OF RETURNS FOR GE

3.1 Introduction
The previous sessions have considered how to build and estimate simple econometric
models with one explanatory variable. The intention of these previous sessions was to
familiarize you with the interface of EViews, and to learn how to test for violations of
the CLRM assumptions. This section will attempt to build a full econometric model to
explain the returns on the shares of General Electric Company. This section will also
examine possible remedial action for perceived problems with the regressions.
Monthly price data of GE in the UK will be used to try to identify the variables that
help to explain the time series variability of returns. The data is comprised of the
following series:
P
DIVY
APS
EI
ROI
FTSE
GDP
RPI
LY
SY
EER
Price of GEC
Dividend Yield
Assets per Share
Earnings Index
Return on Investment
Value of FTSE 100 Index
Index Level of GDP
Retail Price Index
Redemption Yield on Long Gilts
3 Month T-Bill Rate
Sterling Effective Exchange Rate
Some of the variables are general macroeconomic variables while others are
company-specific accounting variables. All of these variables could, a priori, be
expected to affect the returns on the share of GE. There are a total of 11 variables.
We have monthly data from Jan. 1980 to Mar. 1999: 1980:1 - 1999:3
Import the data from:
T:\EViews\GEC
You should have 11 new variables in the workfile, each named as in the table above.
Plot the price of GE against the FTSE in a line graph.
Note that the two series take significantly different values (scale).
Do the series appear to move together?
3.2 Preliminary Regression

Before running the regression, first generate series of returns or growth rates for
each of the existing variables.
RP=DLOG(p)
RFTSE=DLOG(FTSE)
LDIVY=DLOG(DIVY)
LAPS=DLOG(APS)
LEI=DLOG(EI)
LROI=DLOG(ROI)
LGDP=DLOG(GDP)
LRPI=DLOG(RPI)
LEER=DLOG(EER)
In addition to the above we need to calculate the term spread:
TS=LY-SY
Note that it is not possible to transform the term spread into logarithms because the
term spread can be negative. In this case, only the first difference of the term spread
will be taken:
DTS=TS-TS(-1)
It is now possible to determine whether the above variables affect the returns of GE
shares. Initially, run the CAPM regression
RP C RFTSE
Examine the significance of the coefficients and perform a test for the slope
coefficient to examine whether it is equal to one.
Examine the Diagnostics of this regression by following the same steps as in
section 2 above.
Check for residual autocorrelation, heteroscedasticity, non-normality, and
functional form misspecification.
Is there any apparent violation of the CLRM assumptions?
The results for the main regression and for some of the diagnostics are given below.
CAPM Regression:
Dependent Variable: RP
Date: 11/14/07 Time: 16:55
Sample (adjusted): 1980M02 1999M03
Variable
C
RFTSE
R-squared
Adjusted R-squared
S.E. of regression
Sum squared resid
Log likelihood
Durbin-Watson stat

-0.001349
0.879839
0.394127
0.391470
0.054761
0.683723
342.7465
1.943777
0.003688
0.072245
t-Statistic
Prob.
-0.365769
12.17854
0.7149
0.0000
Mean dependent var

S.D. dependent var
Schwarz criterion
F-statistic
Prob(F-statistic)
Wald Test:
Equation: CAPM
Test Statistic
F-statistic
Chi-square
Value
2.766384
2.766384
df
Probability
(1, 228)
1
0.0976
0.0963
0.007791
0.070199
-2.963013
-2.933117
148.3167
0.000000
Breusch-Godfrey Serial Correlation LM Test:

F-statistic
Obs*R-squared
0.692363
8.519170
Probability
Probability
0.758055
0.743358
Test Equation:
Dependent Variable: RESID
Date: 10/10/07 Time: 23:37
Presample missing value lagged residuals set to zero.
Variable
C
RFTSE
RESID(-1)
RESID(-2)
RESID(-3)
RESID(-4)
RESID(-5)
RESID(-6)
RESID(-7)
RESID(-8)
RESID(-9)
RESID(-10)
RESID(-11)
RESID(-12)
R-squared
Adjusted R-squared
S.E. of regression
Sum squared resid
Log likelihood
Durbin-Watson stat
t-Statistic
Prob.
0.082807
-0.362828
0.433083
0.578822
0.312816
0.797125
-0.764335
1.045118
1.431019
0.369498
-0.756973
0.210279
-1.422079
0.609690
0.9341
0.7171
0.6654
0.5633
0.7547
0.4263
0.4455
0.2971
0.1539
0.7121
0.4499
0.8336
0.1564
0.5427
0.000308
-0.027081
0.029441
0.039321
0.021543
0.054740
-0.052836
0.072001
0.099430
0.025695
-0.052712
0.014645
-0.099172
0.043024
0.003724
0.074639
0.067980
0.067932
0.068869
0.068672
0.069127
0.068893
0.069482
0.069541
0.069636
0.069646
0.069737
0.070567
0.037040
-0.020916
0.055210
0.658398
347.0870
1.997872
Mean dependent var

S.D. dependent var
Schwarz criterion
F-statistic
Prob(F-statistic)
1.36E-18
0.054641
-2.896409
-2.687134
0.639104
0.819499
White Heteroskedasticity Test:

F-statistic
Obs*R-squared
0.617161
1.243871
Probability
Probability
0.540376
0.536904
Test Equation:
Dependent Variable: RESID^2
Date: 10/10/07 Time: 23:39
Sample: 1980M02 1999M03
Variable
C
RFTSE
RFTSE^2
R-squared
Adjusted R-squared
S.E. of regression
Sum squared resid
Log likelihood
Durbin-Watson stat
t-Statistic
Prob.
7.937081
0.902845
-0.286822
0.0000
0.3676
0.7745
0.002945
0.006337
-0.014673
0.000371
0.007019
0.051158
0.005408
-0.003355
0.004978
0.005624
894.7962
2.027455
Mean dependent var

S.D. dependent var
Schwarz criterion
F-statistic
Prob(F-statistic)
0.002973
0.004969
-7.754750
-7.709905
0.617161
0.540376
Dont forget a normality test and test of functional form the results are not
included here but you should still run the tests.
What other misspecification issues might we face?
As we can see, the returns of the market proxy are able to explain only a small
percentage of the variability of the returns of GE and the model does not appear to
suffer from any violations of the CLRM.
However, are there any other variables that might affect the returns on GE Stock?
To identify other variables, we will run regressions that are more complex.
Now run two separate regressions: one including only macroeconomic variables
and the market proxy returns and one using only accounting variables and the
market returns.
The results you should obtain are as follows:
Macroeconomic variables and market proxy returns:

Date: 11/15/07 Time: 11:52
Variable
C
RFTSE
LGDP
DTS
LRPI
LEER
R-squared
Adjusted R-squared
S.E. of regression
Sum squared resid
Log likelihood
Durbin-Watson stat

-0.006194
0.887173
0.192159
0.009940
1.113573
0.200556
0.410141
0.396974
0.054513
0.665652
345.8269
1.909868
0.004651
0.072014
0.302943
0.008463
0.671929
0.200599
t-Statistic
Prob.
-1.331952
12.31951
0.634309
1.174484
1.657277
0.999785
0.1842
0.0000
0.5265
0.2414
0.0989
0.3185
Mean dependent var

S.D. dependent var
Schwarz criterion
F-statistic
Prob(F-statistic)
0.007791
0.070199
-2.955017
-2.865327
31.15033
0.000000
Company specific regression:

Variable
C
RFTSE
LDIVY
LAPS
LROI
LEI
R-squared
Adjusted R-squared
S.E. of regression
Sum squared resid
Log likelihood
Durbin-Watson stat

-0.003378
0.059660
-0.070017
0.201698
0.875756
-0.107015
0.939278
0.937922
0.017490
0.068525
607.2878
2.370655
0.001248
0.029927
0.033471
0.026445
0.036151
0.037254
t-Statistic
Prob.
-2.705825
1.993494
-2.091854
7.627164
24.22464
-2.872610
0.0073
0.0474
0.0376
0.0000
0.0000
0.0045
Mean dependent var

S.D. dependent var
Schwarz criterion
F-statistic
Prob(F-statistic)
0.007791
0.070199
-5.228590
-5.138901
692.9854
0.000000
Run all the diagnostics checks. There are some problems with both sets of
regressions: suggest remedies for these.
3.3 Heteroskedasticity-robust Standard Errors
When the form of heteroskedasticity is unknown, it is usually not possible to obtain
efficient coefficient estimates of the parameters using weighted least squares. OLS
provides consistent parameter estimates in the presence of heteroskedasticity but the
usual OLS standard errors will be incorrect and should not be used for inference. In
order to allow for this problem, heteroskedasticity-robust standard errors are
constructed. White (1980) has derived a heteroskedasticity consistent covariance
matrix estimator which provides correct estimates of the coefficient covariances in the
presence of heteroskedasticity of unknown form. The White covariance matrix
assumes that the residuals of the estimated equation are serially uncorrelated. Newey
and West (1987) have proposed a more general covariance estimator that is consistent
in the presence of both heteroskedasticity and autocorrelation of unknown form.
Note that using the White heteroskedasticity consistent or the Newey-West HAC
covariance estimates will not change the estimated coefficient values, but only their
estimated standard errors.
In order to estimate the regression with heteroskedasticity robust standard errors,
select this option from the option button in the regression entry window. If we click
on the Estimate button in the regression window we will come to the screen where
we input the regression equation. By clicking on the Options button, a new window
will open that will allow the selection of the required methodology:
Macro Model Correction:

Date: 11/15/07 Time: 11:58
White Heteroskedasticity-Consistent Standard Errors & Covariance
Variable
C
RFTSE
LGDP
DTS
LRPI
LEER
R-squared
Adjusted R-squared
S.E. of regression
Sum squared resid
Log likelihood
Durbin-Watson stat

-0.006194
0.887173
0.192159
0.009940
1.113573
0.200556
0.410141
0.396974
0.054513
0.665652
345.8269
1.909868
0.004812
0.059768
0.473926
0.007385
0.577138
0.211973
t-Statistic
Prob.
-1.287331
14.84350
0.405462
1.345988
1.929474
0.946138
0.1993
0.0000
0.6855
0.1797
0.0549
0.3451
Mean dependent var

S.D. dependent var
Schwarz criterion
F-statistic
Prob(F-statistic)
0.007791
0.070199
-2.955017
-2.865327
31.15033
0.000000
Company Specific Model Correction:

Date: 11/15/07 Time: 12:01
Variable
C
RFTSE
LDIVY
LAPS
LROI
LEI
R-squared
Adjusted R-squared
S.E. of regression
Sum squared resid
Log likelihood
Durbin-Watson stat
Coefficient
Std. Error
t-Statistic
Prob.
-0.003378
0.059660
-0.070017
0.201698
0.875756
-0.107015
0.001279
0.023409
0.039532
0.130585
0.040083
0.098188
-2.640086
2.548646
-1.771120
1.544580
21.84847
-1.089904
0.0089
0.0115
0.0779
0.1239
0.0000
0.2769
0.939278
0.937922
0.017490
0.068525
607.2878
2.370655
Mean dependent var

S.D. dependent var
Schwarz criterion
F-statistic
Prob(F-statistic)
0.007791
0.070199
-5.228590
-5.138901
692.9854
0.000000
Compare the results of estimation using heteroskedasticity-consistent and the usual

standard errors. We should find that the standard error on the intercept coefficient
falls, while those on the slope coefficients rise when heteroskedasticity-robust
standard errors are used.
Now that the problems highlighted by the diagnostic tests have been addressed,
examine the significance of all the included explanatory variables.
Why is it desirable to remove variables with insignificant coefficients from a
regression equation?
Why dont we start with just one variable and add more variables to the model to
improve it?
Given that the dependent variable is GE returns, measured as continuously
compounded monthly returns, determine the effect that a 1 unit increase in each of the
independent variables would have on the dependent variable.
3.4 Multicollinearity
Until this session we were running regressions with only one independent variable. In
the most recent set of regressions, there were multiple independent variables included
in the equations. One of the assumptions of the CLRM is that there is no
multicollinearity of the independent variables. This implies that none of the
independent variables is a linear combination of another independent variable. We can
examine whether there is multicollinearity present by viewing the correlation matrix
of the independent variables. This can be achieved by typing in the command
window the command
Cor RFTSE LDIVY LAPS LROI LEI
[Alternatively select all these variables, open as group, and select VIEW
CORRELATION COMMON SAMPLE]
A new window will be displayed that contains the correlation matrix of the series:
RFTSE
LDIVY
LAPS
LROI
LEI
RFTSE
1.000000
-0.582813
0.062844
0.626814
0.070636
LDIVY
-0.582813
1.000000
-0.089244
-0.872911
-0.001587
LAPS
0.062844
-0.089244
1.000000
0.071339
0.283713
LROY
0.626814
-0.872911
0.071339
1.000000
-0.028627
LEI
0.070636
-0.001587
0.283713
-0.028627
1.000000
Do the results indicate any significant correlations between the independent variables?
(The log-differences of the return on investment and of the dividend yield have
correlation 0.87, which indicates that they are on average closely related but moving
in opposite directions).
Now repeat this step for the macroeconomic variables. Overall, which model do
you think better explains the variability of returns of GE stock?
10
Due to the fact that the company specific model seems to be a better fitting model, we
will concentrate on this one. From the regression output, it is evident that three of the
variables are statistically insignificant. We remove the insignificant variables starting
with the variable that is showing the highest level of insignificance. As a result, we
get the following (note that after removing LEI and LAPS, LDIVY becomes
significant):
Date: 11/15/07 Time: 12:12
Variable
C
RFTSE
LDIVY
LROI
R-squared
Adjusted R-squared
S.E. of regression
Sum squared resid
Log likelihood
Durbin-Watson stat
3.5
Coefficient
Std. Error
t-Statistic
Prob.
-0.003060
0.054179
-0.079813
0.880422
0.001337
0.022717
0.037850
0.040137
-2.289359
2.384958
-2.108668
21.93556
0.0230
0.0179
0.0361
0.0000
0.923361
0.922344
0.019562
0.086486
580.5169
2.316229
Mean dependent var

S.D. dependent var
Schwarz criterion
F-statistic
Prob(F-statistic)
0.007791
0.070199
-5.013190
-4.953398
907.6348
0.000000
Normality, Outliers and Breakpoints
Note that in the case of the company specific regression the normality test suggests
that the residuals do not follow a normal distribution. This might be caused by an
outlier or a breakpoint in the regression residuals. In order to check whether this is the
case or not, we will examine two tests.
First, we will examine whether there is a breakpoint in the regression relationship.
This will inform us of any changes in the regression equation caused by a specific
event. We can identify such an event by plotting the actual values, the fitted values
and the residuals of the regression. This can be achieved by selecting View; Actual,
Fitted, Residual; Actual, Fitted, Residual Graph. The plot should look as follows:
11
From the graph, we can see that some time in late 1996 there is a big residual outlier
that is probably disrupting the model. In order to identify the exact date that this
outlier was realized, we use the shading option by right clicking on the graph and
selecting the add shading option. In the new window, input 1996M10 as the ending
date of the shade.
12
We can see that October 1996 is the probable day of the outlier. Another approach to
determining the exact date of the break would be to view the residuals in a Table
(again from the View button). The large negative residual is indeed observed in
October 1996 this represents a mini-crash in the markets. We need the exact date of
the outlier in order to adjust our model to correct for it.
3.6 Dummy Variables

One way to cure the problem of big outliers in the data is by using dummy
variables. To create a dummy variable, first generate a series called dummy that will
initially contain only zeros. Generate this series and then double click on the new
object to open the spreadsheet. Turn on the editing mode and input a single 1 in the
cell that corresponds to October 1996. Leave all other cell entries as zeros.
Once the dummy variable has been created, rerun the company specific regression
including this dummy variable. This can most easily be achieved by clicking on the
Estimate button and adding the dummy variable to the end of the variable list.
The results of this regression are:
13
Date: 11/15/07 Time: 13:03
Variable
C
RFTSE
LDIVY
LROI
OCTDUMMY
R-squared
Adjusted R-squared
S.E. of regression
Sum squared resid
Log likelihood
Durbin-Watson stat
Coefficient
Std. Error
t-Statistic
Prob.
-0.002042
0.055328
-0.087297
0.871601
-0.212753
0.000864
0.022499
0.037562
0.039670
0.000880
-2.362356
2.459101
-2.324069
21.97137
-241.7862
0.0190
0.0147
0.0210
0.0000
0.0000
0.963280
0.962627
0.013571
0.041438
665.1324
2.621413
Mean dependent var

S.D. dependent var
Schwarz criterion
F-statistic
Prob(F-statistic)
0.007791
0.070199
-5.740281
-5.665541
1475.623
0.000000
Note that the dummy variable is highly significant.

An alternative (and unrelated) use of dummy variables is in the identification of
seasonal effects in the data. In fact, seasonal effects that have not been accounted for
can cause autocorrelation in the residuals of the regression. Furthermore, if there is a
seasonal pattern, it is useful to be able to identify it for example, there is evidence in
the academic finance literature that supports the existence of calendar effects. The
most commonly observed calendar effect in monthly data is a January effect. In order
to examine whether there is indeed a January effect in this regression, create a dummy
variable that will take the value 1 only in the months of January. This is most easily
achieved by creating a new dummy variable called JANDUMMY containing zeros
everywhere, and then editing the variable entries manually, changing all of the zeros
for January months to ones. Create this variable using the methodology described
above, and run the regression again including this new dummy variable.
14
The results of this regression are:

Date: 11/15/07 Time: 13:05
Variable
C
RFTSE
LDIVY
LROI
OCTDUMMY
JANDUMMY
R-squared
Adjusted R-squared
S.E. of regression
Sum squared resid
Log likelihood
Durbin-Watson stat
Coefficient
Std. Error
t-Statistic
Prob.
-0.001596
0.057380
-0.089424
0.871254
-0.213223
-0.005578
0.000872
0.022565
0.037555
0.039882
0.000883
0.003954
-1.830720
2.542887
-2.381145
21.84575
-241.4054
-1.410659
0.0685
0.0117
0.0181
0.0000
0.0000
0.1597
0.963752
0.962943
0.013514
0.040906
666.6191
2.620576
Mean dependent var

S.D. dependent var
Schwarz criterion
F-statistic
Prob(F-statistic)
0.007791
0.070199
-5.744514
-5.654825
1191.127
0.000000
3.7 Stability Tests

The stability of the model is another important consideration. If there was one (or
more) structural shifts during our sample, we would not expect to have stable
parameters over the whole period. The stability of the model can be examined using
the Chow Breakpoint test. The idea of this test is to fit the equation separately for
each sub sample and to see whether there are significant differences in the estimated
equations. A significant difference indicates a structural change in the relationship.
As we know, a weakness of the Chow test is that we must input the date when we
think there was a structural break. For example, March 1989 appears to indicate the
start of a period of lower volatility. We can see this on a graph, where the volatility of
the data seems lower after this period. We can test this theory using the Chow test.
It is important to note that in order to carry out a Chow test the dummy variable(s)
need to be excluded from the regression. This is due to the fact that the dummy and
the constant of the regression are perfectly correlated. To access the Chow test you
must click on the View; Stability Tests; Chow Breakpoint Test in the regression
window of the company specific regression. In the new window that appears, enter
the date at which it is believed that a breakpoint occurred.
15
Chow Breakpoint Test: 1989M03

F-statistic
Log likelihood ratio
2.173931
8.837132
Prob. F(4,222)
Prob. Chi-Square(4)
0.072854
0.065302
The result indicates that there was no structural break in the data. If there was a
structural break, how would we account for it in our model?
Finally, before you exit, save the workfile as an EViews file.
16
EVIEWS TUTORIALS 4 & 5

4
NON-STATIONARITY ........................................................................ 2
4.1
Stationarity _______________________________________________2
Constructing ARMA models in EViews .............................................. 7

5.1
Getting Started ____________________________________________7
5.2
Estimating the autocorrelation coefficients of up to 25 lags ________8
5.3
Using Information Criteria to Decide on Model Orders__________12
4 NON-STATIONARITY
A Test that should always be examined before running a regression is a nonstationarity test of the included variables. If the series included in a regression are not
stationary then the mean and variance of a series are not well defined, making any
inferences about their coefficients unreliable. To examine the variables for a unit root
we perform a Dickey Fuller/Augmented Dickey Fuller test.
Monthly price data for the S&P 500 Index and the GBP/USD monthly exchange rate
for the same period will be used.
Import the file: t:\Eviews\stationary.xls.
Transform both series into logarithms. Call the new variables LNSP500 and
LNFX.
Plot the series in a graph. Do they look stationary?

We can now test the 2 series formally for stationarity.
4.1
Stationarity
To test for stationarity, double click on the series and then in the View menu,
select Unit Root Test.
EViews gives you the options to select from various nonstationarity tests (Augmented
Dickey Fuller, Phillips-Perron, KPSS etc.) and to select whether to:
1. Test the levels or the first or second differences of the series
2. Include an intercept, an intercept and a trend or neither of the two
3. Select a number of lagged differences to be included
Run an ADF test with an intercept and a trend on the levels of the series leaving
the lag length selection to automatic. The lagged differences to be included can be
selected based on the data frequency or chosen using an information criterion such as
AIC or Schwarz.
The results of the Test for the S&P 500 index are:
Null Hypothesis: LNSP500 has a unit root
Exogenous: Constant, Linear Trend
Lag Length: 0 (Automatic based on SIC, MAXLAG=14)
Augmented Dickey-Fuller test statistic

Test critical values:
1% level
5% level
10% level
t-Statistic
Prob.*
-1.535907
-3.997083
-3.428819
-3.137851
0.8146
*MacKinnon (1996) one-sided p-values.
Augmented Dickey-Fuller Test Equation

Dependent Variable: D(LNSP500)
Date: 10/26/07 Time: 19:17
Variable
Coefficient
Std. Error
t-Statistic
Prob.
LNSP500(-1)
C
@TREND(1987M11)
-0.018202
0.115889
9.78E-05
0.011851
0.067318
9.73E-05
-1.535907
1.721508
1.005046
0.1259
0.0865
0.3159
R-squared
Adjusted R-squared
S.E. of regression
Sum squared resid
Log likelihood
Durbin-Watson stat
0.014945
0.006561
0.039104
0.359339
435.2880
2.055505
Mean dependent var

S.D. dependent var
Schwarz criterion
F-statistic
Prob(F-statistic)
0.007948
0.039233
-3.632672
-3.588904
1.782663
0.170456
Is the series Stationary? No, since the test statistic is bigger than (i.e. not as negative
as) the critical values. Can also simply look at the p-value.
Run the same test for the FX Series.

The results are:
Null Hypothesis: LNFX has a unit root

1% level
5% level
10% level
t-Statistic
Prob.*
-2.088622
-3.997587
-3.429063
-3.137995
0.5490
The results clearly show that both series are non stationary. Try performing the test
regressions with different numbers of lags and without a trend does this make
any difference to the conclusion? (No.)
In this case, the first differences of the series must also be examined for nonstationarity, to determine the order of integration.
We can test the first differences from the Stationarity test window by selecting the
Option Test for Unit Root in 1st difference. Note that this could also have been
achieved by using GENR to construct the two series of first differences and then
testing for a unit root in the levels of these already differenced series.
The results are:
Null Hypothesis: D(LNSP500) has a unit root

1% level
5% level
10% level
t-Statistic
Prob.*
-16.01127
-3.997250
-3.428900
-3.137898
0.0000
Null Hypothesis: D(LNFX) has a unit root


1% level
5% level
10% level
t-Statistic
Prob.*
-11.55602
-3.997418
-3.428981
-3.137946
0.0000
In both cases, the test statistic is now more negative than the critical values, and so we
reject the null hypotheses that the differenced series contain a unit root. Hence the
results indicate that both of the original log-levels series are I(1).
Plotting the series in levels and then in first differenced form (returns) illustrates this
result:
.8
.7
.6
.5
.4
.3
88
90
92
94
96
98
00
02
04
06
LNFX
.08
.04
.00
-.04
-.08
-.12
88
90
92
94
96
98
00
DLNFX
02
04
06
5 Constructing ARMA models in EViews

This example uses data comprising daily prices for British Telecom shares for the
period 02/01/95 30/12/99 and monthly FTSE100 Dividend Yields for the period
January 1986 January 1999, obtained from Datastream. Thus, there are 1304
observations for British Telecom and 168 observations for FTSE100 Dividend Yield.
5.1
Getting Started
Using the instructions discussed previously, open EViews and import the BT data as
before. (Remember a constant term (c) and a residual term (resid) will be added
automatically).
T:/Eviews/BT
Save the workfile in the directory you prefer with the name ARMA.WF1.
Repeat the above procedure for the FTSE100 Dividend Yield (start date 1986:1 and
end date 1999:12)
T:/Eviews/FTDY
Construct sets of log-price changes for the two series. In the case of the BT shares, the
logs of the differences are interpreted as continuously compounded returns, while the
log-differences of the dividend yield are simply the continuously compounded
changes in the dividend yield.
The objective of this exercise is to build an ARMA model for both the British
Telecom returns and the FTSE100 Dividend Yield. Recall that there are three stages
involved: identification, estimation, and diagnostic checking. The first stage is carried
out by looking at the autocorrelation coefficients to identify any structure in the data.
5.2
Estimating the autocorrelation coefficients of up to 25 lags
In the BT workfile, click on View and choose Correlogram.

In the Correlogram Specification window, choose Level and 25 lags. Click on OK.
Repeat this exercise for:
BT returns
FTSE100 dividend yield in levels
Log-differences of the FTSE100 dividend yield
The graphs, together with the relevant test statistics, are given below:
BT:
Date: 10/26/07 Time: 20:28
Sample: 1/02/1995 12/30/1999
Autocorrelation
|********
|********
|********
|********
|*******|
|*******|
|*******|
|*******|
|*******|
|*******|
|*******|
|*******|
|*******|
|*******|
|*******|
|*******|
|*******|
|*******|
|*******|
|*******|
|*******|
|*******|
|*******|
|*******|
|*******|
Partial Correlation
|********
|
|
|*
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
*|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
AC
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
0.994
0.988
0.983
0.978
0.974
0.969
0.965
0.960
0.956
0.952
0.948
0.945
0.940
0.935
0.930
0.926
0.921
0.916
0.911
0.906
0.901
0.897
0.893
0.889
0.885
PAC
Q-Stat
Prob
0.994
-0.007
0.071
0.019
0.026
-0.010
0.024
-0.004
0.019
0.002
0.056
-0.028
-0.050
-0.026
-0.014
0.004
0.039
-0.066
-0.042
0.035
-0.013
0.049
0.059
-0.018
0.014
1291.7
2569.1
3834.4
5088.4
6331.7
7564.1
8786.2
9998.2
11200.
12393.
13578.
14754.
15919.
17074.
18218.
19351.
20474.
21586.
22686.
23775.
24853.
25921.
26982.
28034.
29078.
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
FTDY
Date: 10/26/07 Time: 20:33
Sample: 1986M01 1999M12
Autocorrelation
.|*******|
.|*******|
.|*******|
.|****** |
.|****** |
.|****** |
.|***** |
.|***** |
.|***** |
.|***** |
.|**** |
.|**** |
.|**** |
.|**** |
.|**** |
.|*** |
.|*** |
.|*** |
.|*** |
.|*** |
.|*** |
.|** |
.|** |
.|** |
.|** |
Partial Correlation
.|*******|
*|.
|
.|*
|
.|.
|
.|.
|
.|.
|
.|.
|
.|.
|
.|*
|
.|.
|
.|.
|
.|.
|
.|.
|
*|.
|
.|.
|
.|.
|
.|.
|
.|.
|
.|.
|
.|.
|
.|.
|
.|.
|
*|.
|
.|.
|
.|.
|
AC
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
0.950
0.896
0.854
0.818
0.781
0.739
0.700
0.665
0.637
0.607
0.575
0.541
0.517
0.488
0.463
0.439
0.412
0.391
0.372
0.354
0.339
0.325
0.305
0.285
0.266
PAC
Q-Stat
Prob
0.950
-0.060
0.092
0.037
-0.032
-0.056
0.019
-0.009
0.068
-0.043
-0.022
-0.024
0.057
-0.072
0.050
-0.010
-0.036
0.035
0.013
-0.012
0.039
0.002
-0.076
0.003
-0.012
154.22
292.30
418.48
535.14
642.04
738.27
825.29
904.16
977.12
1043.8
1103.8
1157.5
1206.7
1250.8
1290.7
1326.9
1359.1
1388.2
1414.8
1439.0
1461.3
1482.0
1500.3
1516.4
1530.5
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
Note that the output is slightly different here to that which appears on the screen. It is
clearly evident from the first columns that the series are both very persistent, with
autocorrelation functions that only die away very slowly. Only the first partial
autocorrelation coefficient appears strongly significant. The numerical values of the
autocorrelation and partial autocorrelation coefficients at lags 1 to 25 are given in
columns 4 and 5 of the output, with the lag length given in column 3. Again, for both
of the raw data series, the slow decay of the acf is evident, especially for the BT share
price series. Even at lag 25, the autocorrelation coefficient is still 0.885.
The penultimate column of output gives the statistic resulting from a Ljung-Box test
with number of lags in the sum equal to the row number (i.e. the number in column
3). The test statistics will follow a 2(1) for the first row, a 2(2) for the second row,
and so on. p-values associated with these test statistics are given in the last column.
Since the raw data are likely to be non-stationary, an application of this test is not
valid. For this and various other reasons, it is usual practice to work with the logs of
9
the changes of a series rather than the series itself. The non-stationary feature of the
raw data will otherwise swamp all others.
The autocorrelation and partial autocorrelation functions for the BT returns and the
continuously compounded changes in the dividend yield are:
BT Return Series:
Date: 10/26/07 Time: 20:37
Sample: 1/02/1995 12/30/1999
Autocorrelation
|*
*|
*|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Partial Correlation
|*
*|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
AC
PAC
Q-Stat
Prob
0.091
-0.087
-0.073
-0.050
-0.046
-0.021
-0.036
0.002
0.065
-0.011
0.000
0.027
0.015
-0.026
-0.021
-0.000
0.018
0.050
-0.007
0.035
0.023
-0.025
-0.057
-0.023
-0.017
0.091
-0.096
-0.057
-0.046
-0.050
-0.025
-0.048
-0.003
0.051
-0.032
0.009
0.026
0.012
-0.021
-0.010
0.007
0.014
0.044
-0.011
0.047
0.018
-0.019
-0.038
-0.011
-0.020
10.916
20.734
27.773
30.989
33.756
34.329
36.066
36.074
41.675
41.841
41.841
42.812
43.091
43.979
44.553
44.553
44.980
48.328
48.395
49.987
50.680
51.537
55.867
56.564
56.968
0.001
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
10
FTDY Log-differences
Date: 10/26/07 Time: 20:41
Sample: 1986M01 1999M12
Autocorrelation
.|.
*|.
*|.
.|.
.|.
.|.
*|.
*|.
.|*
.|.
.|.
.|.
.|*
.|.
.|.
.|.
.|.
.|.
.|.
*|.
.|.
.|.
.|*
.|.
.|.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Partial Correlation
.|.
*|.
*|.
.|.
.|.
.|.
*|.
*|.
.|.
.|.
.|.
.|.
.|*
.|.
.|.
.|.
.|.
.|.
.|.
*|.
.|.
.|.
.|*
.|.
.|.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
AC
PAC
Q-Stat
Prob
0.050
-0.083
-0.091
0.022
-0.005
-0.037
-0.102
-0.090
0.071
0.036
0.028
-0.045
0.091
0.013
-0.018
-0.033
-0.044
-0.034
-0.040
-0.066
0.025
0.009
0.119
0.028
-0.003
0.050
-0.086
-0.083
0.024
-0.022
-0.041
-0.099
-0.093
0.058
-0.001
0.024
-0.036
0.092
-0.011
-0.024
-0.008
-0.035
-0.034
-0.048
-0.073
0.039
-0.030
0.109
0.005
0.005
0.4233
1.5977
3.0165
3.1021
3.1066
3.3505
5.2028
6.6500
7.5523
7.7810
7.9237
8.2938
9.8164
9.8481
9.9054
10.111
10.480
10.701
11.010
11.850
11.967
11.985
14.748
14.900
14.902
0.515
0.450
0.389
0.541
0.684
0.764
0.635
0.575
0.580
0.650
0.720
0.762
0.709
0.773
0.826
0.861
0.882
0.907
0.923
0.921
0.940
0.958
0.903
0.924
0.944
Remember that as a rule of thumb, a given autocorrelation coefficient is classed as

significant if it is outside a 1.96 * 1 (T ) 1 2 band, where T is the number of
observations. In this case, it would imply a correlation coefficient is classed as
significant if it is bigger than approximately 0.054 or smaller than -0.054 for the BT
returns, while the band for the dividend yield would be (-0.15,0.15). The band for the
latter is of course wider since the sampling frequency is monthly rather than daily and
consequently there are fewer observations.
It can be deduced that, for the BT returns series, the first three autocorrelation
coefficients and the first three partial autocorrelation coefficients are significant under
this rule. Since the first acf coefficient is highly significant, the Ljung-Box joint test
11
statistic rejects the null hypothesis of no autocorrelation at the 1% level for all
numbers of lags considered.
In the case of the dividend yield log change series, the second and third
autocorrelation and partial autocorrelation coefficients are significant, although the
first acf and pacf coefficients are not. The Ljung-Box test statistic is never significant
for this series.
In the BT case, it could be concluded that a mixed ARMA process could be
appropriate, although it is hard to precisely determine the appropriate order given this
output.
For the dividend yield series, on the other hand, it seems that there is little structure in
the data that could be captured by a linear time series model. In order to investigate
this issue further, the information criteria are now employed.
5.3
Using Information Criteria to Decide on Model Orders
As demonstrated above, deciding on the appropriate model orders from

autocorrelation functions could be very difficult in practice. An easier way is to
choose the model order that minimizes the value of an information criterion.
An important point to note is that books and statistical packages often differ in their
construction of the test statistic. For example, the formulae for Akaikes and
Schwarzs Information Criteria:
AIC log( 2 )
2k
T
SBIC log( 2 )
k
(log T )
T
where, 2 is the estimator of the variance of regressions disturbances u t k is the

number of parameters and T is the sample size.
When using the criterion based on the estimated standard errors, the model with the
lowest value of AIC and SBIC should be chosen.
EViews uses a formulation of the test statistic based on the log-likelihood function
value using maximum likelihood estimation. The corresponding EViews formulae are
AIC 2 / T
2k
T
SBIC 2 / T
k
(log T )
T
12
where, l
T
1 log(2 ) log(u u / T )
2
Unfortunately, this modification is not benign, since it affects the relative strength of
the penalty term compared with the error variance, sometimes leading different
packages to select different model orders for the same data and criterion!
Suppose that it is thought that ARMA models from order (0,0) to (5,5) are plausible
for these two returns series. This would entail considering 36 models (ARMA(0,0),
ARMA(1,0), ARMA(2,0), ARMA(5,5)), i.e. up to 5 lags in both the autoregressive
and moving average terms.
In EViews, this can be done by separately estimating each of the models and noting
down the value of the information criteria in each case. This can be done in the
following way:
On the EViews main menu, click on Quick and choose Estimate Equation. EViews
will open an Equation Specification window. In the Equation Specification editor,
type, for example
rbt c ar(1) ma(1)
For the estimation settings, select LS Least Squares (NLS and ARMA), select the
whole sample, and click OK this will specify an ARMA(1,1). The output is given
below:
Dependent Variable: RBT
Date: 10/26/07 Time: 21:00
Sample (adjusted): 1/04/1995 12/30/1999
Convergence achieved after 33 iterations
Backcast: 1/03/1995
Variable
Coefficient
Std. Error
t-Statistic
Prob.
C
AR(1)
MA(1)
0.001065
-0.305276
0.412712
0.000556
0.214965
0.205631
1.913614
-1.420121
2.007055
0.0559
0.1558
0.0450
R-squared
Adjusted R-squared
S.E. of regression
Sum squared resid
Log likelihood
Durbin-Watson stat
Inverted AR Roots
Inverted MA Roots
0.012580
0.011059
0.018554
0.447170
3345.227
2.016086
Mean dependent var

S.D. dependent var
Schwarz criterion
F-statistic
Prob(F-statistic)
-.31
-.41
13
0.001064
0.018657
-5.133989
-5.122073
8.274493
0.000269
Note that the header for the EViews output for ARMA models states the number of
iterations that have been used in the model estimation process. This shows that, in
fact, an iterative numerical optimisation procedure has been employed to estimate the
coefficients.
Repeating these steps for the other ARMA models would give all of the required
values for the information criteria.
To give just one more example, in the case of an ARMA(5,5), the following would be
typed in the Equation specification editor box:
rbt c ar(1) ar(2) ar(3) ar(4) ar(5) ma(1) ma(2) ma(3) ma(4) ma(5)
The values of all of the information criteria, calculated using EViews, are as
follows:
Information Criteria for British Telecom Stock Return ARMA Models
AIC
p/q
-5.125
-5.134
-5.138
-5.143
-5.144
-5.144
-5.131
-5.134
-5.146
-5.145
-5.143
-5.142
-5.138
-5.146
-5.144
-5.143
-5.141
-5.140
-5.139
-5.144
-5.143
-5.142
-5.144
-5.143
-5.140
-5.143
-5.143
-5.145
-5.141
-5.141
-5.140
-5.141
-5.140
-5.139
-5.142
-5.141
p/q
-5.121
-5.126
-5.126
-5.127
-5.124
-5.120
-5.123
-5.122
-5.130
-5.125
-5.119
-5.114
-5.126
-5.130
-5.125
-5.119
-5.113
-5.108
-5.123
-5.125
-5.119
-5.114
-5.112
-5.107
-5.120
-5.119
-5.115
-5.113
-5.105
-5.102
-5.117
-5.113
-5.108
-5.103
-5.102
-5.097
SBIC
So what model actually minimises the two information criteria? Both the AIC and
SBIC are minimised at p=1 and q=2 or p=2 and q=1 for British Telecom returns. For
the FTSE100 dividend yield log-changes, both information criteria are minimised at
p=2 and q=3 (not shown). Interestingly, both criteria suggest the same model order for
the BT series and for the dividend yield series, although this is usually not the case.
14

BOOKFE Kalot14

Încărcat de

Informații document

Titlu original

Drepturi de autor

Formate disponibile

Partajați acest document

Partajați sau inserați document

Opțiuni de partajare

Vi se pare util acest document?

Este necorespunzător acest conținut?

Drepturi de autor:

Formate disponibile

BOOKFE Kalot14

Încărcat de

Drepturi de autor:

Formate disponibile

Financial Econometrics

Lecturer: Dr Elena Kalotychou

Cass Business School, City University

Brooks, C., Introductory Econometrics for Finance, Cambridge University

Welcome and Enjoy the Ride!

Brief Outline of Material in this Booklet

Motivations for this module:

Literal meaning is measurement in economics.

Definition of financial econometrics:

Examples of the kinds of problems that

1. Testing whether financial markets are weak-form informationally efficient.

Examples of the kinds of problems that

6. Determiningg the optimal

What are the Special Characteristics

Frequency & quantity of data

Types of Data and Notation

Time-Series Versus Cross-Sectional Data

Cross-Sectional Versus Panel Data

Returns in Financial Modelling

where, Rt and rt denote the return at time t

Steps involved in the formulation of

Regression is different from Correlation

Simple Regression: An Example

Excess return = rXXXt rft

Excess return on market

Graph (Scatter Diagram)

Excess return on market portfolio

Finding a Line of Best Fit

We can use the ggeneral equation

Why do we include a Disturbance term?

The disturbance term can capture a number of features:

Determining the Regression Coefficients

Ordinary Least Squares

Actual and Fitted Value

How OLS Works

But what was ut ?

It was the difference between the actual point and

So minimising yt y t is equivalent to minimising

Deriving the OLS Estimator

Want to minimise L with respect to (w.r.t.) and , so differentiate L w.r.t.

Deriving the OLS Estimator: 2

Substitute into (4) for from (5),

Deriving the OLS Estimator: 3

(Tx 2 xt2 ) Tyx xt yt

This method of finding the optimum is known as ordinary least squares.

What do we use and for?

y t 1.74 1.64 20 31.06

Accuracy of Intercept Estimate

The Population and the Sample

A sample is a selection of just some items from the population.

The DGP and the PRF

and we also know that

We use the SRF to infer likely values of the PRF.

Linear and Non-Linear Models

But some models are intrinsically non-linear, e.g.

Estimators are the formulae used to calculate the coefficients

Estimates are the actual numerical values for the coefficients.

The Assumptions Underlying the

The Fifth Assumption: Normality

An alternative assumption to 4, which is slightly stronger, is that the xts are

Properties of the OLS Estimator

- is an estimator of the true value of .

Precision and Standard Errors