
Lecture Notes in Finance 1 (MiQE/F, MSc course at UNISG)

Paul Söderlind1

20 December 2010

1 University of St. Gallen. Address: s/bf-HSG, Rosenbergstrasse 52, CH-9000 St. Gallen,
Switzerland. E-mail: Paul.Soderlind@unisg.ch. Document name: Fin1MiQEFAll.TeX
Contents

1 Mean-Variance Frontier
1.1 Portfolio Return: Mean, Variance, and the Effect of Diversification
1.2 Mean-Variance Frontier of Risky Assets
1.3 Mean-Variance Frontier of Riskfree and Risky Assets
1.4 Examples of Portfolio Weights from MV Calculations

A A Primer in Matrix Algebra

B A Primer in Optimization

2 Index Models
2.1 The Inputs to a MV Analysis
2.2 Single-Index Models
2.3 Estimating Beta
2.4 Multi-Index Models
2.5 Principal Component Analysis
2.6 Estimating Expected Returns
2.7 Estimation on Subsamples
2.8 Robust Estimation

3 Risk Measures
3.1 Symmetric Dispersion Measures
3.2 Downside Risk
3.3 Empirical Return Distributions
3.4 Threshold Exceedance

4 CAPM
4.1 Portfolio Choice with Mean-Variance Utility
4.2 Beta Representation of Expected Returns
4.3 Market Equilibrium
4.4 An Application of MV Portfolio Choice: International Assets

5 Utility-Based Portfolio Choice
5.1 Utility Functions and Risky Investments
5.2 Utility Optimization and the Two-Fund Theorem
5.3 Application of Normal Returns: Value at Risk, ES, Lpm and the Telser Criterion
5.4 Behavioural Finance

6 CAPM Extensions
6.1 Nonmarketable Assets
6.2 Heterogenous Investors
6.3 CAPM without a Riskfree Rate
6.4 Multi-Factor Models and APT
6.5 Joint Portfolio and Savings Choice

7 Testing CAPM and Multifactor Models
7.1 Market Model
7.2 Several Factors
7.3 Fama-MacBeth

A Statistical Tables

8 Investment for the Long Run
8.1 Time Diversification: Approximate Case
8.2 Time Diversification and the Growth-Optimal Portfolio: Lognormal Returns
8.3 More General Utility Functions and Rebalancing

9 Performance Analysis
9.1 Performance Evaluation
9.2 Performance Attribution
9.3 Style Analysis

10 Predicting Asset Returns
10.1 Asset Prices, Random Walks, and the Efficient Market Hypothesis
10.2 Autocorrelations
10.3 Other Predictors and Methods
10.4 Security Analysts
10.5 Technical Analysis
10.6 Spurious Regressions and In-Sample Overfit
10.7 Empirical U.S. Evidence on Stock Return Predictability

11 Event Studies
11.1 Basic Structure of Event Studies
11.2 Models of Normal Returns
11.3 Testing the Abnormal Return
11.4 Quantitative Events

1 Mean-Variance Frontier
Reference: Elton, Gruber, Brown, and Goetzmann (2010) 4–6; Fabozzi, Focardi, and
Kolm (2006) 4

1.1 Portfolio Return: Mean, Variance, and the Effect of Diversification

Many portfolio choice models center around two moments of the chosen portfolio: the
expected return and the variance. This section is therefore devoted to discussing how
these moments of the portfolio are related to the corresponding moments of the underlying
assets.

1.1.1 Portfolio Returns: Expected Value and Variance

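Remark 1.1 (Expected value and variance of a linear combination) Recall that

\begin{align*}
\operatorname{E}(aR_1 + bR_2) &= a\operatorname{E}(R_1) + b\operatorname{E}(R_2), \text{ and} \\
\operatorname{Var}(aR_1 + bR_2) &= a^2\sigma_{11} + b^2\sigma_{22} + 2ab\sigma_{12},
\end{align*}

where $\sigma_{ij} = \operatorname{Cov}(R_i, R_j)$, and $\sigma_{ii} = \operatorname{Cov}(R_i, R_i) = \operatorname{Var}(R_i)$.

Remark 1.2 (On the notation in these lecture notes) Mean returns are denoted $\operatorname{E}(R_i)$
or $\mu_i$. Variances are denoted $\sigma_i^2$ or $\sigma_{ii}$ and the standard deviations $\sigma_i$. Covariances are
denoted $\sigma_{ij}$.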

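The return on a portfolio with the portfolio weights $w_1, w_2, \ldots, w_n$ ($\sum_{i=1}^n w_i = 1$) is

\begin{align}
R_p &= w_1R_1 + w_2R_2 \text{ (with } n = 2) \tag{1.1} \\
    &= \sum_{i=1}^n w_iR_i \text{ (more generally)}, \tag{1.2}
\end{align}

and the expected return is

\begin{align}
\operatorname{E}(R_p) &= w_1\operatorname{E}(R_1) + w_2\operatorname{E}(R_2) \text{ (with } n = 2) \tag{1.3} \\
    &= \sum_{i=1}^n w_i\operatorname{E}(R_i) \text{ (more generally)}. \tag{1.4}
\end{align}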

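Let $\sigma_{ij} = \operatorname{Cov}(R_i, R_j)$, and $\sigma_{ii} = \operatorname{Cov}(R_i, R_i) = \operatorname{Var}(R_i)$. The variance of a portfolio
return is then

\begin{align}
\sigma_p^2 &= w_1^2\sigma_{11} + w_2^2\sigma_{22} + 2w_1w_2\sigma_{12} \text{ (with } n = 2) \tag{1.5} \\
    &= \sum_{i=1}^n w_i^2\sigma_{ii} + \sum_{i=1}^n \sum_{j=1, j\neq i}^n w_iw_j\sigma_{ij} \text{ (more generally)}. \tag{1.6}
\end{align}

In matrix form we have

\begin{align}
\operatorname{E}(R_p) &= w'\operatorname{E}(R) \text{ and} \tag{1.7} \\
\sigma_p^2 &= w'\Sigma w. \tag{1.8}
\end{align}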

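Remark 1.3 (Details on the matrix form) With two assets, we have the following:

\[
w = \begin{bmatrix} w_1 \\ w_2 \end{bmatrix}, \quad
\operatorname{E}(R) = \begin{bmatrix} \operatorname{E}(R_1) \\ \operatorname{E}(R_2) \end{bmatrix}, \text{ and }
\Sigma = \begin{bmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{12} & \sigma_{22} \end{bmatrix}.
\]

\begin{align*}
\operatorname{E}(R_p) &= w'\operatorname{E}(R) \\
    &= \begin{bmatrix} w_1 & w_2 \end{bmatrix} \begin{bmatrix} \operatorname{E}(R_1) \\ \operatorname{E}(R_2) \end{bmatrix} \\
    &= w_1\operatorname{E}(R_1) + w_2\operatorname{E}(R_2).
\end{align*}

\begin{align*}
\sigma_p^2 &= w'\Sigma w \\
    &= \begin{bmatrix} w_1 & w_2 \end{bmatrix}
       \begin{bmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{12} & \sigma_{22} \end{bmatrix}
       \begin{bmatrix} w_1 \\ w_2 \end{bmatrix} \\
    &= \begin{bmatrix} w_1\sigma_{11} + w_2\sigma_{12} & w_1\sigma_{12} + w_2\sigma_{22} \end{bmatrix}
       \begin{bmatrix} w_1 \\ w_2 \end{bmatrix} \\
    &= w_1^2\sigma_{11} + w_2w_1\sigma_{12} + w_1w_2\sigma_{12} + w_2^2\sigma_{22}.
\end{align*}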
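The matrix expressions (1.7) and (1.8) map directly to array operations. The following is a minimal sketch using NumPy; the weights, means and covariance matrix are hypothetical numbers chosen only for illustration.

```python
import numpy as np

# Hypothetical inputs for two assets (not taken from the notes)
w = np.array([0.6, 0.4])             # portfolio weights, sum to one
mu = np.array([0.08, 0.05])          # expected returns E(R)
Sigma = np.array([[0.04, 0.006],
                  [0.006, 0.01]])    # covariance matrix

mean_p = w @ mu                      # E(R_p) = w'E(R), as in (1.7)
var_p = w @ Sigma @ w                # sigma_p^2 = w' Sigma w, as in (1.8)

# The same variance written out term by term, as in (1.5)
var_p_sum = (w[0]**2 * Sigma[0, 0] + w[1]**2 * Sigma[1, 1]
             + 2 * w[0] * w[1] * Sigma[0, 1])

print(mean_p, var_p, var_p_sum)
```

Both ways of computing the variance should agree up to rounding error, which is a convenient check when coding the matrix form.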

1.1.2 The Effect of Diversification

First, assume that the returns are uncorrelated ($\sigma_{ij} = 0$ if $i \neq j$). This is clearly not
realistic, but provides a good starting point for illustrating the effect of diversification.
We will consider equally weighted portfolios of $n$ assets ($w_i = 1/n$). There are other
portfolios with lower variance (and the same expected return), but it provides a simple
analytical case.
The variance of an equally weighted portfolio is (when all covariances are zero)

\begin{align}
\sigma_p^2 &= \sum_{i=1}^n \frac{1}{n^2}\sigma_{ii} = \frac{1}{n}\sum_{i=1}^n \frac{\sigma_{ii}}{n} \tag{1.9} \\
    &= \frac{1}{n}\bar{\sigma}_{ii} \text{ (if } \sigma_{ij} = 0). \tag{1.10}
\end{align}

In this expression, $\bar{\sigma}_{ii}$ is the average variance of an individual return. This number could
be treated as a constant (that is, not depend on $n$) if we form portfolios by randomly picking
assets. In any case, (1.10) shows that the portfolio variance goes to zero as the number
of assets (included in the portfolio) goes to infinity. Also a portfolio with a large but finite
number of assets will typically have a low variance (unless we have systematically picked
the very most volatile assets).
Second, we now allow for correlations of the returns. The variance of the equally
weighted portfolio is then

\[
\sigma_p^2 = \frac{1}{n}\left(\bar{\sigma}_{ii} - \bar{\sigma}_{ij}\right) + \bar{\sigma}_{ij}, \tag{1.11}
\]

where $\bar{\sigma}_{ij}$ is the average covariance of two returns (which, again, can be treated as a
constant if we pick assets randomly). Realistically, $\bar{\sigma}_{ij}$ is positive. When the portfolio
includes many assets, then the average covariance dominates. In the limit (as $n$ goes to
infinity), only this non-diversifiable risk matters.
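To see how quickly the diversifiable part of the risk disappears, (1.11) can be evaluated for a few portfolio sizes. This is a small sketch; the average variance and average covariance are hypothetical values, not estimates from the data behind Figure 1.1.

```python
def equal_weight_variance(n, avg_var, avg_cov):
    """Variance of an equally weighted portfolio of n assets, as in (1.11)."""
    return (avg_var - avg_cov) / n + avg_cov

# Hypothetical average variance and average covariance
avg_var, avg_cov = 0.04, 0.01

for n in (1, 2, 10, 100, 1000):
    print(n, round(equal_weight_variance(n, avg_var, avg_cov), 5))
```

With one asset the result is the average variance, and as n grows the value approaches the average covariance, which is the non-diversifiable part.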

Figure 1.1: Effect of diversification. Variance of (randomly picked) equally weighted portfolios
against the number of assets in the portfolio, shown together with the average covariance.
Based on 10 US industry portfolios, 1947:1−2009:12.

See Figure 1.1 for an example.


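Proof. (of (1.11)) The portfolio variance is

\begin{align*}
\sigma_p^2 &= \sum_{i=1}^n \frac{1}{n^2}\sigma_{ii} + \sum_{i=1}^n \sum_{j=1, j\neq i}^n \frac{1}{n^2}\sigma_{ij} \\
    &= \frac{1}{n}\sum_{i=1}^n \frac{\sigma_{ii}}{n} + \frac{n-1}{n}\sum_{i=1}^n \sum_{j=1, j\neq i}^n \frac{\sigma_{ij}}{n(n-1)} \\
    &= \frac{1}{n}\bar{\sigma}_{ii} + \frac{n-1}{n}\bar{\sigma}_{ij},
\end{align*}

which can be rearranged as (1.11).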

Remark 1.4 (On negative covariances in (1.11)) Formally, it can be shown that $\bar{\sigma}_{ij}$
must be non-negative as $n \to \infty$. It is simply not possible to construct a very large
number of random variables (asset returns or whatever other random variable) that are,
on average, negatively correlated with each other. In (1.11) this manifests itself in that
$\bar{\sigma}_{ij} < 0$ would give a negative portfolio variance as $n$ increases.

A (NoDur)
B (Durbl)
C (Manuf)
D (Enrgy)
E (HiTec)
F (Telcm)
G (Shops)
H (Hlth)
I (Utils)
J (Other)

Table 1.1: Industries

1.1.3 Some Practical Remarks: Annualizing, Portfolio Weights

Remark 1.5 (Annualizing the MV figures) Suppose we have weekly net returns $R_t = P_t/P_{t-1} - 1$.
The standard way of annualizing the mean and the standard deviation
is to first estimate means and the covariance matrix on weekly returns, do all the MV
calculations, and then (when showing the results) multiply the mean weekly return by 52
and the standard deviation of the weekly return by $\sqrt{52}$. To see why, notice that an annual
return would be

\begin{align*}
P_t/P_{t-52} - 1 &= (P_t/P_{t-1})(P_{t-1}/P_{t-2}) \cdots (P_{t-51}/P_{t-52}) - 1 \\
    &= (R_t + 1)(R_{t-1} + 1) \cdots (R_{t-51} + 1) - 1 \\
    &\approx R_t + R_{t-1} + \ldots + R_{t-51}.
\end{align*}

To a first approximation, the mean annual return would therefore be

\[
\operatorname{E}(R_t + R_{t-1} + \ldots + R_{t-51}) = 52\operatorname{E}(R_t),
\]

and if returns are iid (in particular, same variance and uncorrelated across time)

\begin{align*}
\operatorname{Var}(R_t + R_{t-1} + \ldots + R_{t-51}) &= 52\operatorname{Var}(R_t) \Rightarrow \\
\operatorname{Std}(R_t + R_{t-1} + \ldots + R_{t-51}) &= \sqrt{52}\operatorname{Std}(R_t).
\end{align*}
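The scaling rules in Remark 1.5 are easy to apply in code. The sketch below uses hypothetical weekly figures and also compounds a constant weekly return exactly, to give a rough sense of the size of the approximation error in the mean.

```python
import math

# Hypothetical weekly estimates (not from the notes)
weekly_mean = 0.002    # mean weekly return, 0.2%
weekly_std = 0.02      # weekly standard deviation, 2%

annual_mean = 52 * weekly_mean             # the mean scales with the horizon
annual_std = math.sqrt(52) * weekly_std    # the std scales with its square root (iid returns)

# Exact compounding of a constant weekly return, for comparison with 52 * mean
compounded = (1 + weekly_mean) ** 52 - 1

print(annual_mean, annual_std, compounded)
```

The compounded figure is somewhat higher than 52 times the weekly mean, since the approximation drops the cross-product terms.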

Remark 1.6 (Portfolio weights) If your total portfolio is worth $W$, and you have bought
$\alpha_i$ shares of firm $i$ at the price $P_i$ each, then the portfolio weight of that firm is clearly
$w_i = \alpha_i P_i / W$.

1.2 Mean-Variance Frontier of Risky Assets

To calculate a point on the mean-variance frontier, we have to find the portfolio that
minimizes the portfolio variance, $\sigma_p^2$, for a given expected return, $\mu^*$. The problem is
thus

\[
\min_{w_i} \sigma_p^2 \text{ subject to } \operatorname{E}(R_p) = \sum_{i=1}^n w_i\mu_i = \mu^* \text{ and } \sum_{i=1}^n w_i = 1. \tag{1.12}
\]

Let $\Sigma$ be the covariance matrix of the asset returns. The portfolio variance is then calculated as

\[
\sigma_p^2 = \operatorname{Var}\left(\sum_{i=1}^n w_iR_i\right) = w'\Sigma w. \tag{1.13}
\]

The whole mean-variance frontier is generated by solving this problem for different values
of the expected return ($\mu^*$). The results are typically shown in a figure with the standard
deviation on the horizontal axis and the required return on the vertical axis. The efficient
frontier is the upper leg of the curve. Reasonably, a portfolio on the lower leg is dominated
by one on the upper leg at the same volatility (since the latter has a higher expected return). See
Figure 1.2 for an example.

Remark 1.7 (Only two assets.) In the (empirically uninteresting) case of only two assets,
the MV frontier can be calculated by simply calculating the mean and variance

\begin{align*}
\operatorname{E}(R_p) &= w\mu_1 + (1 - w)\mu_2 \\
\sigma_p^2 &= w^2\sigma_{11} + (1 - w)^2\sigma_{22} + 2w(1 - w)\sigma_{12}
\end{align*}

at a set of different portfolio weights (for instance, $w = (0, 0.25, 0.5, 0.75, 1)$). The
reason is that, with only two assets, both assets are on the MV frontier, so no explicit
minimization is needed. See Figures 1.3–1.4 for examples.
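The two-asset calculation amounts to a short loop over portfolio weights. The sketch below uses hypothetical means, standard deviations and correlation; they are not the inputs behind Figures 1.3–1.4, which are not reported in the text.

```python
import math

# Hypothetical inputs for assets 1 and 2
mu1, mu2 = 0.05, 0.08          # mean returns
s11, s22 = 0.09**2, 0.16**2    # variances
rho = 0.25                     # correlation
s12 = rho * math.sqrt(s11 * s22)

frontier = []
for w in (0, 0.25, 0.5, 0.75, 1):
    mean_p = w * mu1 + (1 - w) * mu2
    var_p = w**2 * s11 + (1 - w)**2 * s22 + 2 * w * (1 - w) * s12
    frontier.append((w, mean_p, math.sqrt(var_p)))

for w, mean_p, std_p in frontier:
    print(f"w = {w:4.2f}  mean = {mean_p:.4f}  std = {std_p:.4f}")
```

Plotting the standard deviations against the means gives the kind of curve shown in the two-asset figures.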

It is (relatively) straightforward to calculate the mean-variance frontier if there are no
other constraints: it just takes some linear algebra (see Section 1.2.2). See Figure 1.6 for
an example.
There are sometimes additional restrictions, for instance,

\[
\text{no short sales: } w_i \geq 0. \tag{1.14}
\]

Figure 1.2: Mean-variance frontiers (3 risky assets), with no restrictions and with no short sales.
Axes: Std, % (horizontal) and Mean, % (vertical). The three assets have
E(R) = (12.50, 10.50, 6.00) and Std = (12.90, 9.00, 4.80), and the correlation matrix

1.00 0.33 0.45
0.33 1.00 0.05
0.45 0.05 1.00

We then have to apply some explicit numerical minimization algorithm to find portfolio
weights. Algorithms that solve quadratic problems are best suited (this is a quadratic
problem, see (1.13)). See Figure 1.2 for an example. Other commonly used restrictions
are that the new weights should not deviate too much from the old (when rebalancing), in
an effort to reduce trading costs,

\[
|w_i^{new} - w_i^{old}| < U_i, \tag{1.15}
\]

or that the portfolio weights must be between some boundaries

\[
L_i \leq w_i \leq U_i. \tag{1.16}
\]

Such constraints are typically easy to implement numerically.
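As a rough illustration of a restricted problem, the sketch below finds a no-short-sales frontier point for the three assets whose inputs are printed in Figure 1.2. It is not a quadratic programming routine: with three assets and two equality constraints only one weight is free, so a plain grid search over that weight is used instead. The function name `min_var_no_short`, the grid size and the required return of 9% are all assumptions made for this example.

```python
import numpy as np

# Inputs as printed in Figure 1.2 (in %)
mu = np.array([12.5, 10.5, 6.0])
std = np.array([12.9, 9.0, 4.8])
corr = np.array([[1.00, 0.33, 0.45],
                 [0.33, 1.00, 0.05],
                 [0.45, 0.05, 1.00]])
Sigma = np.outer(std, std) * corr            # covariance matrix

def min_var_no_short(mu_star, n_grid=2001):
    """Grid search over w3; w1 and w2 follow from the two equality constraints."""
    M = np.array([[1.0, 1.0], [mu[0], mu[1]]])
    best_w, best_var = None, np.inf
    for w3 in np.linspace(0.0, 1.0, n_grid):
        # Solve w1 + w2 = 1 - w3 and mu1*w1 + mu2*w2 = mu_star - mu3*w3
        rhs = np.array([1.0 - w3, mu_star - mu[2] * w3])
        w1, w2 = np.linalg.solve(M, rhs)
        w = np.array([w1, w2, w3])
        if np.all(w >= -1e-12):              # no short sales, as in (1.14)
            var = w @ Sigma @ w
            if var < best_var:
                best_w, best_var = w, var
    return best_w, best_var

w_ns, var_ns = min_var_no_short(9.0)
print(w_ns, np.sqrt(var_ns))
```

For more assets or more general restrictions such as (1.15) and (1.16), a proper quadratic programming solver is the natural tool; the grid search only works here because the problem has a single free dimension.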


Consider what happens when we add assets to the investment opportunity set. The old
mean-variance frontier is, of course, still obtainable: we can always put zero weights on
the new assets. In most cases, we can do better than that so the mean-variance frontier
is moved to the left (lower volatility at the same expected return). See Figure 1.5 for an
example.

Figure 1.3: Mean-variance frontiers for two risky assets. A point (x,y) means a portfolio with
x% in asset A and y% in asset B; the marked portfolios are (0,100), (25,75), (50,50), (75,25)
and (100,0). Axes: Std, % (horizontal) and Mean, % (vertical).

1.2.1 The Shape of the MV Frontier of Risky Assets

This section discusses how the shape of the MV frontier depends on the correlation of the
assets. For simplicity, only two assets are used but the general findings hold also when
there are more assets.
With intermediate correlations ($-1 < \rho < 1$) the mean-variance frontier is a hyperbola
(see Figures 1.7 and 1.8). Notice that the mean–volatility trade-off improves as the correlation
decreases: a lower correlation means that we get a lower portfolio standard deviation
(at the same expected return). In fact, the case of a perfect (positive) correlation is a limiting
case: a combination of two assets can never have higher standard deviation than the
line connecting them in the $\sigma \times \operatorname{E}(R)$ space.
When the assets are perfectly correlated ($\rho = 1$), then the MV frontier is a pair of two
straight lines (see Figure 1.8). The efficient frontier is clearly the upper leg. However,
if short sales are ruled out then the MV frontier is just a straight line connecting the two
assets (see the circles in Figure 1.8). The intuition is that a perfect correlation means that
the second asset is a linear transformation of the first ($R_2 = a + bR_1$), so changing the
portfolio weights essentially means forming just another linear combination of the first
asset. In particular, there are no diversification benefits.

Figure 1.4: Mean-variance frontiers for two risky assets, different correlations (corr = 0 and
corr = 0.75). Axes: Std, % (horizontal) and Mean, % (vertical).

Also when the assets are perfectly negatively correlated ($\rho = -1$), then the MV
frontier is a pair of straight lines, see Figure 1.8. In contrast to the case with a perfect
positive correlation, this is true also when short sales are ruled out. This means, for
instance, that we can combine the two assets (with positive weights) to get a riskfree
portfolio.
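Proof. (of the MV shapes with 2 assets) With a perfect correlation ($\rho = 1$) the
standard deviation can be rearranged. Suppose the portfolio weights are positive (no short
sales). Then we get

\begin{align*}
\sigma_p &= \left[w_1^2\sigma_{11} + (1 - w_1)^2\sigma_{22} + 2w_1(1 - w_1)\sigma_1\sigma_2\right]^{1/2} \\
    &= \left\{[w_1\sigma_1 + (1 - w_1)\sigma_2]^2\right\}^{1/2} \\
    &= w_1\sigma_1 + (1 - w_1)\sigma_2.
\end{align*}

We can rearrange this expression as $w_1 = (\sigma_p - \sigma_2)/(\sigma_1 - \sigma_2)$ which we can use in the
expression for the expected return to get

\[
\operatorname{E}(R_p) = \frac{\sigma_p - \sigma_2}{\sigma_1 - \sigma_2}\left[\operatorname{E}(R_1) - \operatorname{E}(R_2)\right] + \operatorname{E}(R_2).
\]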

Figure 1.5: Mean-variance frontiers with 3 assets (the original assets) and with 4 assets (after
adding a new asset). Axes: Std, % (horizontal) and Mean, % (vertical).

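This shows that the mean-variance frontier is just a straight line (if there are no short
sales).
With a perfectly negative correlation ($\rho = -1$) the standard deviation can be rearranged
as follows (assuming positive weights)

\begin{align*}
\sigma_p &= \left[w_1^2\sigma_{11} + (1 - w_1)^2\sigma_{22} - 2w_1(1 - w_1)\sigma_1\sigma_2\right]^{1/2} \\
    &= \begin{cases}
       \left\{[w_1\sigma_1 - (1 - w_1)\sigma_2]^2\right\}^{1/2} = w_1\sigma_1 - (1 - w_1)\sigma_2 & \text{if } [\cdot] \geq 0 \\
       \left\{[-w_1\sigma_1 + (1 - w_1)\sigma_2]^2\right\}^{1/2} = -w_1\sigma_1 + (1 - w_1)\sigma_2 & \text{if } [\cdot] \geq 0.
       \end{cases}
\end{align*}

The second expression is $-1$ times the first expression, so only one of them can be positive at a
time. Both have the same form as in the case with $\rho = 1$, so both generate a linear relation,
$\operatorname{E}(R_p) = a + b\sigma_p$, but with different slopes. We get a riskfree portfolio ($\sigma_p = 0$) if
$w_1 = \sigma_2/(\sigma_1 + \sigma_2)$.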
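The claim that two perfectly negatively correlated assets can be combined into a riskfree portfolio is simple to check numerically. The standard deviations below are hypothetical.

```python
# Hypothetical standard deviations of the two assets
s1, s2 = 0.09, 0.16
rho = -1.0                     # perfectly negative correlation

w1 = s2 / (s1 + s2)            # weight on asset 1 that should give zero risk
var_p = (w1**2 * s1**2 + (1 - w1)**2 * s2**2
         + 2 * w1 * (1 - w1) * rho * s1 * s2)

print(w1, var_p)
```

The weight is between zero and one, so no short sales are needed, and the portfolio variance is zero up to rounding error.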

1.2.2 Calculating the MV Frontier of Risky Assets: No Restrictions

When there are no restrictions on the portfolio weights, then there are two ways of finding
a point on the mean-variance frontier: let a numerical optimization routine do the work or
use some simple matrix algebra. This section demonstrates the second approach.

Figure 1.6: M-V frontier from US industry indices (US industry portfolios, 1947:1−2009:12). The
points are labelled A to J (see Table 1.1). Axes: Std, % (horizontal) and Mean, % (vertical).

To simplify the following equations, define the scalars $A$, $B$ and $C$ as

\[
A = \mu'\Sigma^{-1}\mu, \quad B = \mu'\Sigma^{-1}\mathbf{1}, \text{ and } C = \mathbf{1}'\Sigma^{-1}\mathbf{1}, \tag{1.17}
\]

where $\mathbf{1}$ is a (column) vector of ones and $\mu'$ is the transpose of the column vector $\mu$. Then,
calculate the scalars (for a given required return $\mu^*$)

\[
\lambda = \frac{C\mu^* - B}{AC - B^2} \text{ and } \delta = \frac{A - B\mu^*}{AC - B^2}. \tag{1.18}
\]

The weights for a portfolio on the MV frontier of risky assets (at a given required return
$\mu^*$) are then

\[
w = \Sigma^{-1}(\mu\lambda + \mathbf{1}\delta). \tag{1.19}
\]

Using this in (1.13) gives the variance (take the square root to get the standard deviation).
We can trace out the entire MV frontier by repeating these calculations for different values
of the required return and then connecting the dots. In the std $\times$ mean space, the efficient
frontier (the upper part) is concave. See Figure 1.2 for an example.
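The calculations in (1.17)–(1.19) take only a few lines with NumPy. The sketch below uses the three-asset inputs printed in Figure 1.2; the function name `frontier_weights` and the grid of required returns are choices made for this example.

```python
import numpy as np

# Inputs as printed in Figure 1.2 (in %)
mu = np.array([12.5, 10.5, 6.0])
std = np.array([12.9, 9.0, 4.8])
corr = np.array([[1.00, 0.33, 0.45],
                 [0.33, 1.00, 0.05],
                 [0.45, 0.05, 1.00]])
Sigma = np.outer(std, std) * corr
ones = np.ones(3)
Sigma_inv = np.linalg.inv(Sigma)

# The scalars in (1.17)
A = mu @ Sigma_inv @ mu
B = mu @ Sigma_inv @ ones
C = ones @ Sigma_inv @ ones

def frontier_weights(mu_star):
    """Weights on the MV frontier of risky assets, as in (1.18)-(1.19)."""
    lam = (C * mu_star - B) / (A * C - B**2)
    delta = (A - B * mu_star) / (A * C - B**2)
    return Sigma_inv @ (mu * lam + ones * delta)

w = frontier_weights(9.0)
std_p = np.sqrt(w @ Sigma @ w)

# Trace out the frontier for a range of required returns
frontier = [(m, np.sqrt(frontier_weights(m) @ Sigma @ frontier_weights(m)))
            for m in np.linspace(4.0, 15.0, 12)]
print(w, std_p)
```

A useful sanity check is that the weights sum to one and reproduce the required return, which follows directly from how the two multipliers are defined.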

Figure 1.7: Mean-variance frontiers for normal and high correlations (the original correlations
and correlations that are 0.4 higher). Axes: Std, % (horizontal) and Mean, % (vertical).

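Example 1.8 (Transpose of a matrix) Consider the following examples

\[
\begin{bmatrix} 1 \\ 3 \\ 5 \end{bmatrix}' = \begin{bmatrix} 1 & 3 & 5 \end{bmatrix}, \quad
\begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{bmatrix}' = \begin{bmatrix} 1 & 3 & 5 \\ 2 & 4 & 6 \end{bmatrix}, \text{ and }
\begin{bmatrix} 1 & 2 \\ 2 & 4 \end{bmatrix}' = \begin{bmatrix} 1 & 2 \\ 2 & 4 \end{bmatrix}.
\]

Transposing a symmetric matrix does nothing, that is, if $A$ is symmetric, then $A' = A$.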

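Proof. (of (1.17)–(1.19)) We set up this as a Lagrangian problem

\[
L = (w_1^2\sigma_{11} + w_2^2\sigma_{22} + 2w_1w_2\sigma_{12})/2 + \lambda(\mu^* - w_1\mu_1 - w_2\mu_2) + \delta(1 - w_1 - w_2).
\]

The first order condition with respect to $w_i$ is $\partial L/\partial w_i = 0$, that is,

\begin{align*}
\text{for } w_1: \quad & w_1\sigma_{11} + w_2\sigma_{12} - \lambda\mu_1 - \delta = 0, \\
\text{for } w_2: \quad & w_1\sigma_{12} + w_2\sigma_{22} - \lambda\mu_2 - \delta = 0.
\end{align*}

In matrix notation these first order conditions are

\[
\begin{bmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{12} & \sigma_{22} \end{bmatrix}
\begin{bmatrix} w_1 \\ w_2 \end{bmatrix}
- \lambda \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}
- \delta \begin{bmatrix} 1 \\ 1 \end{bmatrix}
= \begin{bmatrix} 0 \\ 0 \end{bmatrix}.
\]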

Figure 1.8: Mean-variance frontiers for two risky assets: different correlations. The two
assets are indicated by circles. Points between the two assets can be generated with positive
portfolio weights (no short sales). Panels: high correlations (corr = 1 and corr = 1/2), low
correlations (corr = 1/2 and corr = 0), negative correlations (corr = 0 and corr = −1/2), and
very negative correlations (corr = −1/2 and corr = −1). Axes: Std, % (horizontal) and
Mean, % (vertical).

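Figure 1.9: Mean-variance frontiers for two risky assets at high correlations (corr = 1, 0.995,
0.99 and 0.98). Axes: Std, % (horizontal) and Mean, % (vertical).

We can solve these equations for $w_1$ and $w_2$ as

\begin{align*}
\begin{bmatrix} w_1 \\ w_2 \end{bmatrix}
    &= \frac{1}{\sigma_{11}\sigma_{22} - \sigma_{12}^2}
       \begin{bmatrix} \sigma_{22} & -\sigma_{12} \\ -\sigma_{12} & \sigma_{11} \end{bmatrix}
       \left( \lambda \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix} + \delta \begin{bmatrix} 1 \\ 1 \end{bmatrix} \right) \\
    &= \begin{bmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{12} & \sigma_{22} \end{bmatrix}^{-1}
       \left( \lambda \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix} + \delta \begin{bmatrix} 1 \\ 1 \end{bmatrix} \right) \\
w   &= \Sigma^{-1}(\lambda\mu + \delta\mathbf{1}),
\end{align*}

where $\mathbf{1}$ is a column vector of ones. The first order conditions for the Lagrange multipliers
are (of course)

\begin{align*}
\text{for } \lambda: \quad & \mu^* - w_1\mu_1 - w_2\mu_2 = 0, \\
\text{for } \delta: \quad & 1 - w_1 - w_2 = 0.
\end{align*}

In matrix notation, these conditions are

\[
\mu^* = \mu'w \text{ and } 1 = \mathbf{1}'w.
\]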

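Stack these into a $2 \times 1$ vector and substitute for $w$

\begin{align*}
\begin{bmatrix} \mu^* \\ 1 \end{bmatrix}
    &= \begin{bmatrix} \mu' \\ \mathbf{1}' \end{bmatrix} w \\
    &= \begin{bmatrix} \mu' \\ \mathbf{1}' \end{bmatrix} \Sigma^{-1}(\lambda\mu + \delta\mathbf{1}) \\
    &= \begin{bmatrix} \mu'\Sigma^{-1}\mu & \mu'\Sigma^{-1}\mathbf{1} \\ \mathbf{1}'\Sigma^{-1}\mu & \mathbf{1}'\Sigma^{-1}\mathbf{1} \end{bmatrix}
       \begin{bmatrix} \lambda \\ \delta \end{bmatrix} \\
    &= \begin{bmatrix} A & B \\ B & C \end{bmatrix} \begin{bmatrix} \lambda \\ \delta \end{bmatrix}.
\end{align*}

Solve for $\lambda$ and $\delta$ as

\[
\lambda = \frac{C\mu^* - B}{AC - B^2} \text{ and } \delta = \frac{A - B\mu^*}{AC - B^2}.
\]

Use this in the expression for $w$ above.

Figure 1.10: Mean-variance frontiers for risky assets and for risky+riskfree assets, with the
original assets and the tangency portfolio marked. Axes: Std, % (horizontal) and Mean, % (vertical).
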
1.3 Mean-Variance Frontier of Riskfree and Risky Assets

We now add a riskfree asset with return $R_f$. With two risky assets, the portfolio return is

\begin{align}
R_p &= w_1R_1 + w_2R_2 + (1 - w_1 - w_2)R_f \notag \\
    &= w_1(R_1 - R_f) + w_2(R_2 - R_f) + R_f \notag \\
    &= w_1R_1^e + w_2R_2^e + R_f, \tag{1.20}
\end{align}

where $R_i^e$ is the excess return of asset $i$. We denote the corresponding expected excess
return by $\mu_i^e$ (so $\mu_i^e = \operatorname{E}(R_i^e)$).
The minimization problem is now

\[
\min_{w_1, w_2} (w_1^2\sigma_{11} + w_2^2\sigma_{22} + 2w_1w_2\sigma_{12})/2 \text{ subject to } w_1\mu_1^e + w_2\mu_2^e + R_f = \mu^*. \tag{1.21}
\]

Notice that we don’t need any restrictions on the sum of weights: the investment in the
riskfree rate automatically makes the overall sum equal to unity.
With more assets, the minimization problem is

\[
\min_{w_i} \sigma_p^2 \text{ subject to } \operatorname{E}(R_p) = \sum_{i=1}^n w_i\mu_i^e + R_f = \mu^*, \tag{1.22}
\]

where the portfolio variance is calculated as usual

\[
\sigma_p^2 = \operatorname{Var}\left(\sum_{i=1}^n w_iR_i\right) = w'\Sigma w. \tag{1.23}
\]

When there are no additional constraints, then we can find an explicit solution in terms
of some matrices and vectors—see Section 1.3.1. In all other cases, we need to apply an
explicit numerical minimization algorithm (preferably for quadratic models).

1.3.1 Calculating the MV Frontier of Riskfree and Risky Assets: No Restrictions

The weights (of the risky assets) for a portfolio on the MV frontier (at a given required
return $\mu^*$) are

\[
w = \frac{\mu^* - R_f}{(\mu^e)'\Sigma^{-1}\mu^e}\Sigma^{-1}\mu^e, \tag{1.24}
\]

where $R_f$ is the riskfree rate and $\mu^e$ the vector of mean excess returns ($\mu - R_f$). The
weight on the riskfree asset is $1 - \mathbf{1}'w$.

Figure 1.11: M-V frontier from US industry indices (US industry portfolios, 1947:1−2009:12). The
points are labelled A to J. Axes: Std, % (horizontal) and Mean, % (vertical).

Using this in (1.13) gives the variance (take the square root to get the standard deviation).
We can trace out the entire MV frontier by repeating these calculations for different
values of the required return and then connecting the dots. In the std $\times$ mean space, the
efficient frontier (the upper part) is just a line. See Figure 1.10 for an example.
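The weights in (1.24) can be computed as follows. The sketch uses the three risky assets whose inputs are printed in Figure 1.2; the riskfree rate of 3% and the function name `weights_with_riskfree` are assumptions made for this example.

```python
import numpy as np

# Risky assets as printed in Figure 1.2 (in %)
mu = np.array([12.5, 10.5, 6.0])
std = np.array([12.9, 9.0, 4.8])
corr = np.array([[1.00, 0.33, 0.45],
                 [0.33, 1.00, 0.05],
                 [0.45, 0.05, 1.00]])
Sigma = np.outer(std, std) * corr

Rf = 3.0                                   # hypothetical riskfree rate (in %)
mu_e = mu - Rf                             # mean excess returns
z = np.linalg.solve(Sigma, mu_e)           # Sigma^{-1} mu^e

def weights_with_riskfree(mu_star):
    """Weights on the risky assets, as in (1.24)."""
    lam = (mu_star - Rf) / (mu_e @ z)
    return lam * z

w = weights_with_riskfree(9.0)
w_riskfree = 1 - w.sum()                   # weight on the riskfree asset
mean_p = w @ mu_e + Rf                     # should equal the required return
std_p = np.sqrt(w @ Sigma @ w)
print(w, w_riskfree, mean_p, std_p)
```

Since the weights are proportional to the required excess return, doubling it doubles both the risky weights and the standard deviation, which is why the frontier is a straight line in the std $\times$ mean space.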
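Proof. (of (1.24)) Define the Lagrangian problem

\[
L = (w_1^2\sigma_{11} + w_2^2\sigma_{22} + 2w_1w_2\sigma_{12})/2 + \lambda(\mu^* - w_1\mu_1^e - w_2\mu_2^e - R_f).
\]

The first order condition with respect to $w_i$ is $\partial L/\partial w_i = 0$, so

\begin{align*}
\text{for } w_1: \quad & w_1\sigma_{11} + w_2\sigma_{12} - \lambda\mu_1^e = 0, \\
\text{for } w_2: \quad & w_1\sigma_{12} + w_2\sigma_{22} - \lambda\mu_2^e = 0.
\end{align*}

It is then immediate that we can write them in matrix form as

\begin{align*}
\begin{bmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{12} & \sigma_{22} \end{bmatrix}
\begin{bmatrix} w_1 \\ w_2 \end{bmatrix}
- \lambda \begin{bmatrix} \mu_1^e \\ \mu_2^e \end{bmatrix}
    &= \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \text{ so} \\
\begin{bmatrix} w_1 \\ w_2 \end{bmatrix}
    &= \lambda \begin{bmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{12} & \sigma_{22} \end{bmatrix}^{-1}
       \begin{bmatrix} \mu_1^e \\ \mu_2^e \end{bmatrix}, \text{ or} \\
w   &= \lambda\Sigma^{-1}\mu^e.
\end{align*}

The first order condition for the Lagrange multiplier is (in matrix form)

\[
\mu^* = w'\mu^e + R_f.
\]

Combine to get

\begin{align*}
\mu^* &= \lambda(\mu^e)'\Sigma^{-1}\mu^e + R_f, \text{ so} \\
\lambda &= \frac{\mu^* - R_f}{(\mu^e)'\Sigma^{-1}\mu^e}.
\end{align*}

Use in the above expression for $w$.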

1.3.2 Tangency Portfolio

The MV frontier for risky assets and the frontier for risky+riskfree assets are tangent at
one point, called the tangency portfolio. In this case the portfolio weights (1.19) and
(1.24) coincide. Therefore, the portfolio weights (1.24) must sum to unity (so the weight
on the riskfree asset is zero). This helps us to understand what the expected excess return
on the tangency portfolio is, which if used in (1.24) gives the portfolio weights of the
tangency portfolio

\[
w = \frac{\Sigma^{-1}\mu^e}{\mathbf{1}'\Sigma^{-1}\mu^e}. \tag{1.25}
\]

Proof. (of (1.25)) Put the sum of the portfolio weights in (1.24) equal to one

\[
\mathbf{1}'w = \frac{\mu^* - R_f}{(\mu^e)'\Sigma^{-1}\mu^e}\mathbf{1}'\Sigma^{-1}\mu^e = 1,
\]

which only happens if

\[
\mu^* - R_f = \frac{(\mu^e)'\Sigma^{-1}\mu^e}{\mathbf{1}'\Sigma^{-1}\mu^e}.
\]

Using in (1.24) gives (1.25).
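The tangency portfolio in (1.25) can be computed and cross-checked against (1.24). As before, the risky assets are those printed in Figure 1.2, and the riskfree rate of 3% is a hypothetical value chosen for this sketch.

```python
import numpy as np

# Risky assets as printed in Figure 1.2 (in %)
mu = np.array([12.5, 10.5, 6.0])
std = np.array([12.9, 9.0, 4.8])
corr = np.array([[1.00, 0.33, 0.45],
                 [0.33, 1.00, 0.05],
                 [0.45, 0.05, 1.00]])
Sigma = np.outer(std, std) * corr

Rf = 3.0                              # hypothetical riskfree rate (in %)
mu_e = mu - Rf
z = np.linalg.solve(Sigma, mu_e)      # Sigma^{-1} mu^e

w_tan = z / z.sum()                   # tangency weights, as in (1.25)
mu_tan = w_tan @ mu                   # expected return on the tangency portfolio

# Cross-check: (1.24) evaluated at the tangency portfolio's expected return
lam = (mu_tan - Rf) / (mu_e @ z)
w_check = lam * z
print(w_tan, mu_tan)
```

The two sets of weights coincide, and the weights sum to one, so nothing is invested in the riskfree asset at this point.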

1.4 Examples of Portfolio Weights from MV Calculations

With 2 risky assets and 1 riskfree asset the portfolio weights satisfy (1.24). We can write
this as

\[
w = \frac{\lambda}{\sigma_{11}\sigma_{22} - \sigma_{12}^2}
    \begin{bmatrix} \sigma_{22}\mu_1^e - \sigma_{12}\mu_2^e \\ \sigma_{11}\mu_2^e - \sigma_{12}\mu_1^e \end{bmatrix}, \tag{1.26}
\]

where $\lambda > 0$ if we limit our attention to the efficient part where $\mu^* > R_f$. (This follows
from the fact that $(\mu^e)'\Sigma^{-1}\mu^e > 0$ since $\Sigma^{-1}$ is positive definite, because $\Sigma$ is). We can
then discuss some general properties of all portfolios in the efficient set.

Simple Case 1: Uncorrelated Assets ($\sigma_{12} = 0$)

From (1.26) we then get

\[
\begin{bmatrix} w_1 \\ w_2 \end{bmatrix} = \lambda \begin{bmatrix} \mu_1^e/\sigma_{11} \\ \mu_2^e/\sigma_{22} \end{bmatrix}. \tag{1.27}
\]

Suppose that $\lambda > 0$ (efficient part of the MV frontier) and that both excess returns are
positive. In that case we have the following.
First, both weights are positive. The intuition is that uncorrelated assets make it efficient
to diversify (to get the same expected return, but at a lower variance).
Second, the asset with the highest $\mu_i^e/\sigma_{ii}$ ratio has the highest portfolio weight. The
intuition is that an asset with a high excess return and/or low volatility is an efficient way
to achieve a low volatility at a given mean return.
Notice that increasing $\mu_i^e/\sigma_{ii}$ does not guarantee that the actual weight on asset $i$
increases (because $\lambda$ changes too). For instance, an increase in the expected return of an
asset may allow us to shift assets towards the riskfree asset (and still get the same expected
portfolio return, but lower variance).

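Example 1.9 (Portfolio weights with uncorrelated assets) When $(\mu_1^e, \mu_2^e) = (0.07, 0.07)$,
the correlation is zero, $(\sigma_{11}, \sigma_{22}) = (1, 1)$, and $\mu^* - R_f = 0.09$, then (1.27) gives

\[
\begin{bmatrix} w_1 \\ w_2 \end{bmatrix} = 9.18 \begin{bmatrix} 0.07 \\ 0.07 \end{bmatrix} = \begin{bmatrix} 0.64 \\ 0.64 \end{bmatrix}.
\]

If we change to $(\mu_1^e, \mu_2^e) = (0.09, 0.07)$, then

\[
\begin{bmatrix} w_1 \\ w_2 \end{bmatrix} = 6.92 \begin{bmatrix} 0.09 \\ 0.07 \end{bmatrix} = \begin{bmatrix} 0.62 \\ 0.48 \end{bmatrix}.
\]

If we instead change to $(\sigma_{11}, \sigma_{22}) = (1/2, 1)$, then

\[
\begin{bmatrix} w_1 \\ w_2 \end{bmatrix} = 6.12 \begin{bmatrix} 0.14 \\ 0.07 \end{bmatrix} = \begin{bmatrix} 0.86 \\ 0.43 \end{bmatrix}.
\]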

Simple Case 2: Same Variances (but Correlation)

Let $\sigma_{11} = \sigma_{22} = 1$ (as a normalization), so the covariance becomes the correlation
$\sigma_{12} = \rho$ where $-1 < \rho < 1$.
From (1.26) we then get

\[
\begin{bmatrix} w_1 \\ w_2 \end{bmatrix} = \frac{\lambda}{1 - \rho^2} \begin{bmatrix} \mu_1^e - \rho\mu_2^e \\ \mu_2^e - \rho\mu_1^e \end{bmatrix}. \tag{1.28}
\]

Suppose that $\lambda > 0$ (efficient part of the MV frontier) and that both excess returns are
positive. In that case, we have the following.
First, both weights are positive if the returns are negatively correlated ($\rho < 0$). The
intuition is that a negative correlation means that the assets “hedge” each other (even
better than diversification), so the investor would like to hold both of them to reduce the
overall risk.
Second, if $\rho > 0$ and $\mu_1^e$ is considerably higher than $\mu_2^e$ (so $\mu_2^e < \rho\mu_1^e$, which also
implies $\mu_1^e > \rho\mu_2^e$), then $w_1 > 0$ but $w_2 < 0$. The intuition is that a positive correlation
reduces the gain from holding both assets (they don’t hedge each other, and there is relatively
little diversification to be gained if the correlation is high). On top of this, asset
1 gives a higher expected return, so it is optimal to sell asset 2 short (essentially a risky
“loan” which allows the investor to buy more of asset 1).

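Example 1.10 (Portfolio weights with correlated assets) When $(\mu_1^e, \mu_2^e) = (0.07, 0.07)$,
$\rho = 0.8$, and $\mu^* - R_f = 0.09$, then (1.28) gives

\[
\begin{bmatrix} w_1 \\ w_2 \end{bmatrix} = 16.53 \begin{bmatrix} 0.039 \\ 0.039 \end{bmatrix} = \begin{bmatrix} 0.64 \\ 0.64 \end{bmatrix}.
\]

This is the same as in the previous example. If we change to $(\mu_1^e, \mu_2^e) = (0.09, 0.07)$, then
we get

\[
\begin{bmatrix} w_1 \\ w_2 \end{bmatrix} = 11.10 \begin{bmatrix} 0.094 \\ -0.006 \end{bmatrix} = \begin{bmatrix} 1.05 \\ -0.06 \end{bmatrix}.
\]

If we also change to $\rho = -0.8$, then we get

\[
\begin{bmatrix} w_1 \\ w_2 \end{bmatrix} = 1.40 \begin{bmatrix} 0.406 \\ 0.394 \end{bmatrix} = \begin{bmatrix} 0.57 \\ 0.55 \end{bmatrix}.
\]

These two last solutions are very different from the previous example.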
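The numbers in Examples 1.9 and 1.10 can be reproduced from (1.24) and (1.26) with a small function. The function name `mv_weights` and its argument names are choices made for this sketch.

```python
def mv_weights(mu_e1, mu_e2, s11, s22, s12, target_excess):
    """Multiplier and weights on two risky assets, following (1.24) and (1.26).

    target_excess is the required return minus the riskfree rate.
    """
    det = s11 * s22 - s12**2
    z1 = (s22 * mu_e1 - s12 * mu_e2) / det     # elements of Sigma^{-1} mu^e
    z2 = (s11 * mu_e2 - s12 * mu_e1) / det
    lam = target_excess / (mu_e1 * z1 + mu_e2 * z2)
    return lam, lam * z1, lam * z2

# Example 1.9: uncorrelated assets with unit variances
ex19 = mv_weights(0.07, 0.07, 1.0, 1.0, 0.0, 0.09)

# Example 1.10: unit variances, correlation 0.8 and then -0.8
ex110_pos = mv_weights(0.09, 0.07, 1.0, 1.0, 0.8, 0.09)
ex110_neg = mv_weights(0.09, 0.07, 1.0, 1.0, -0.8, 0.09)

for result in (ex19, ex110_pos, ex110_neg):
    print([round(v, 2) for v in result])
```

With a correlation of 0.8 the second weight is slightly negative (a short position in asset 2), while with a correlation of −0.8 both weights are positive and of similar size.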

A A Primer in Matrix Algebra


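Let $c$ be a scalar and define the matrices

\[
x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}, \quad
z = \begin{bmatrix} z_1 \\ z_2 \end{bmatrix}, \quad
A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}, \text{ and }
B = \begin{bmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{bmatrix}.
\]

Adding/subtracting a scalar to a matrix or multiplying a matrix by a scalar are both
element by element

\begin{align*}
\begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix} + c
    &= \begin{bmatrix} A_{11} + c & A_{12} + c \\ A_{21} + c & A_{22} + c \end{bmatrix} \\
\begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix} c
    &= \begin{bmatrix} A_{11}c & A_{12}c \\ A_{21}c & A_{22}c \end{bmatrix}.
\end{align*}

Example A.1

\begin{align*}
\begin{bmatrix} 1 & 3 \\ 3 & 4 \end{bmatrix} + 10 &= \begin{bmatrix} 11 & 13 \\ 13 & 14 \end{bmatrix} \\
\begin{bmatrix} 1 & 3 \\ 3 & 4 \end{bmatrix} 10 &= \begin{bmatrix} 10 & 30 \\ 30 & 40 \end{bmatrix}.
\end{align*}

Matrix addition (or subtraction) is element by element

\[
A + B = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix} + \begin{bmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{bmatrix}
      = \begin{bmatrix} A_{11} + B_{11} & A_{12} + B_{12} \\ A_{21} + B_{21} & A_{22} + B_{22} \end{bmatrix}.
\]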

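Example A.2 (Matrix addition and subtraction)

\begin{align*}
\begin{bmatrix} 10 \\ 11 \end{bmatrix} - \begin{bmatrix} 2 \\ 5 \end{bmatrix} &= \begin{bmatrix} 8 \\ 6 \end{bmatrix} \\
\begin{bmatrix} 1 & 3 \\ 3 & 4 \end{bmatrix} + \begin{bmatrix} 1 & 2 \\ 3 & -2 \end{bmatrix} &= \begin{bmatrix} 2 & 5 \\ 6 & 2 \end{bmatrix}
\end{align*}

To turn a column into a row vector, use the transpose operator like in $x'$

\[
x' = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}' = \begin{bmatrix} x_1 & x_2 \end{bmatrix}.
\]

Similarly, transposing a matrix is like flipping it around the main diagonal

\[
A' = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}' = \begin{bmatrix} A_{11} & A_{21} \\ A_{12} & A_{22} \end{bmatrix}.
\]

Example A.3 (Matrix transpose)

\begin{align*}
\begin{bmatrix} 10 \\ 11 \end{bmatrix}' &= \begin{bmatrix} 10 & 11 \end{bmatrix} \\
\begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix}' &= \begin{bmatrix} 1 & 4 \\ 2 & 5 \\ 3 & 6 \end{bmatrix}
\end{align*}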

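Matrix multiplication requires the two matrices to be conformable: the first matrix
has as many columns as the second matrix has rows. Element $ij$ of the result is the
multiplication of the $i$th row of the first matrix with the $j$th column of the second matrix

\[
AB = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix} \begin{bmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{bmatrix}
   = \begin{bmatrix} A_{11}B_{11} + A_{12}B_{21} & A_{11}B_{12} + A_{12}B_{22} \\ A_{21}B_{11} + A_{22}B_{21} & A_{21}B_{12} + A_{22}B_{22} \end{bmatrix}.
\]

Multiplying a square matrix $A$ with a column vector $z$ gives a column vector

\[
Az = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix} \begin{bmatrix} z_1 \\ z_2 \end{bmatrix}
   = \begin{bmatrix} A_{11}z_1 + A_{12}z_2 \\ A_{21}z_1 + A_{22}z_2 \end{bmatrix}.
\]

Example A.4 (Matrix multiplication)

\begin{align*}
\begin{bmatrix} 1 & 3 \\ 3 & 4 \end{bmatrix} \begin{bmatrix} 1 & 2 \\ 3 & -2 \end{bmatrix} &= \begin{bmatrix} 10 & -4 \\ 15 & -2 \end{bmatrix} \\
\begin{bmatrix} 1 & 3 \\ 3 & 4 \end{bmatrix} \begin{bmatrix} 2 \\ 5 \end{bmatrix} &= \begin{bmatrix} 17 \\ 26 \end{bmatrix}
\end{align*}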

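For two column vectors $x$ and $z$, the product $x'z$ is called the inner product

\[
x'z = \begin{bmatrix} x_1 & x_2 \end{bmatrix} \begin{bmatrix} z_1 \\ z_2 \end{bmatrix} = x_1z_1 + x_2z_2,
\]

and $xz'$ the outer product

\[
xz' = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \begin{bmatrix} z_1 & z_2 \end{bmatrix}
    = \begin{bmatrix} x_1z_1 & x_1z_2 \\ x_2z_1 & x_2z_2 \end{bmatrix}.
\]

(Notice that $xz$ does not work). If $x$ is a column vector and $A$ a square matrix, then the
product $x'Ax$ is a quadratic form.

Example A.5 (Inner product, outer product and quadratic form)

\begin{align*}
\begin{bmatrix} 10 \\ 11 \end{bmatrix}' \begin{bmatrix} 2 \\ 5 \end{bmatrix}
    &= \begin{bmatrix} 10 & 11 \end{bmatrix} \begin{bmatrix} 2 \\ 5 \end{bmatrix} = 75 \\
\begin{bmatrix} 10 \\ 11 \end{bmatrix} \begin{bmatrix} 2 \\ 5 \end{bmatrix}'
    &= \begin{bmatrix} 10 \\ 11 \end{bmatrix} \begin{bmatrix} 2 & 5 \end{bmatrix}
     = \begin{bmatrix} 20 & 50 \\ 22 & 55 \end{bmatrix} \\
\begin{bmatrix} 10 \\ 11 \end{bmatrix}' \begin{bmatrix} 1 & 3 \\ 3 & 4 \end{bmatrix} \begin{bmatrix} 10 \\ 11 \end{bmatrix}
    &= 1244.
\end{align*}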

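A matrix inverse is the closest we get to “dividing” by a matrix. The inverse of a
matrix $A$, denoted $A^{-1}$, is such that

\[
AA^{-1} = I \text{ and } A^{-1}A = I,
\]

where $I$ is the identity matrix (ones along the diagonal, and zeroes elsewhere). The matrix
inverse is useful for solving systems of linear equations, $y = Ax$ as $x = A^{-1}y$.

Example A.6 (Matrix inverse) We have

\begin{align*}
\begin{bmatrix} -4/5 & 3/5 \\ 3/5 & -1/5 \end{bmatrix} \begin{bmatrix} 1 & 3 \\ 3 & 4 \end{bmatrix}
    &= \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \text{ so} \\
\begin{bmatrix} 1 & 3 \\ 3 & 4 \end{bmatrix}^{-1}
    &= \begin{bmatrix} -4/5 & 3/5 \\ 3/5 & -1/5 \end{bmatrix}.
\end{align*}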
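The operations in Examples A.1–A.6 can be reproduced with NumPy, which is a quick way to check hand calculations. The sketch below uses the matrices from those examples, with the second matrix taken to be [1 2; 3 −2]; the variable names are choices made here.

```python
import numpy as np

A = np.array([[1, 3], [3, 4]])
B = np.array([[1, 2], [3, -2]])
x = np.array([10, 11])
z = np.array([2, 5])

A_plus_10 = A + 10             # scalar addition, element by element
A_times_10 = A * 10            # scalar multiplication
A_plus_B = A + B               # matrix addition
AB = A @ B                     # matrix multiplication
Az = A @ z                     # matrix times vector
inner = x @ z                  # inner product x'z
outer = np.outer(x, z)         # outer product xz'
quad = x @ A @ x               # quadratic form x'Ax
A_inv = np.linalg.inv(A)       # matrix inverse

print(AB, inner, quad, A_inv, sep="\n")
```

Note that `*` is element-by-element multiplication in NumPy, while `@` is matrix multiplication, so the two should not be confused when translating formulas.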

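Let $z$ and $x$ be $n \times 1$ vectors. The derivative of the inner product is $\partial(z'x)/\partial z = x$.

Example A.7 (Derivative of an inner product) With $n = 2$

\[
z'x = z_1x_1 + z_2x_2, \text{ so }
\frac{\partial(z'x)}{\partial z}
= \begin{bmatrix} \partial(z_1x_1 + z_2x_2)/\partial z_1 \\ \partial(z_1x_1 + z_2x_2)/\partial z_2 \end{bmatrix}
= \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}.
\]

Let $x$ be $n \times 1$ and $A$ a symmetric $n \times n$ matrix. The derivative of the quadratic form
is $\partial(x'Ax)/\partial x = 2Ax$.

Example A.8 (Derivative of a quadratic form) With $n = 2$, the quadratic form is

\[
x'Ax = \begin{bmatrix} x_1 & x_2 \end{bmatrix} \begin{bmatrix} A_{11} & A_{12} \\ A_{12} & A_{22} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}
     = x_1^2A_{11} + x_2^2A_{22} + 2x_1x_2A_{12}.
\]

The derivatives with respect to $x_1$ and $x_2$ are

\begin{align*}
\frac{\partial(x'Ax)}{\partial x_1} &= 2x_1A_{11} + 2x_2A_{12} \text{ and }
\frac{\partial(x'Ax)}{\partial x_2} = 2x_2A_{22} + 2x_1A_{12}, \text{ or} \\
\begin{bmatrix} \partial(x'Ax)/\partial x_1 \\ \partial(x'Ax)/\partial x_2 \end{bmatrix}
    &= 2 \begin{bmatrix} A_{11} & A_{12} \\ A_{12} & A_{22} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}.
\end{align*}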

B A Primer in Optimization
Suppose you want to choose $x$ and $y$ to minimize

\[
L = (x - 2)^2 + (4y + 3)^2.
\]

We then have to find the values of $x$ and $y$ that satisfy the first order conditions
$\partial L/\partial x = \partial L/\partial y = 0$. These conditions are

\begin{align*}
0 &= \partial L/\partial x = 2(x - 2) \\
0 &= \partial L/\partial y = 8(4y + 3),
\end{align*}

which clearly requires $x = 2$ and $y = -3/4$. In this particular case, the first order
condition with respect to $x$ does not depend on $y$, but that is not a general property. In
this case, this is the unique solution, but in more complicated problems, the first order
conditions could be satisfied at different values of $x$ and $y$.
See Figure B.1 for an illustration.

Figure B.1: Minimization problem. Panels: the objective function as a surface over $x$ and $y$; its
contours; its contours together with the restriction $x + 2y = 3$, that is, $y = (3 - x)/2$; and the
value of the objective function along the restriction $x + 2y = 3$ (plotted against $x$).

28
If you want to add a restriction to the minimization problem, say

x C 2y D 3;

then we can proceed in two ways. The first is to simply substitute for x D 3 2y in L to
get
L D .1 2y/2 C .4y C 3/2 ;

with first order condition

0 D @L=@y D 4.1 2y/ C 8.4y C 3/ D 40y C 20;

which requires y D 1=2. (We could equally well have substituted for y). This is also
the unique solution.
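The second method is to use a Lagrangian. The problem is then to choose $x$, $y$, and $\lambda$ to minimize
\[
L = (x-2)^2 + (4y+3)^2 + \lambda (3 - x - 2y).
\]
The term multiplying $\lambda$ is the restriction. The first order conditions are now
\begin{align*}
0 &= \partial L/\partial x = 2(x-2) - \lambda \\
0 &= \partial L/\partial y = 8(4y+3) - 2\lambda \\
0 &= \partial L/\partial \lambda = 3 - x - 2y.
\end{align*}
The first two conditions say
\begin{align*}
x &= \lambda/2 + 2 \\
y &= \lambda/16 - 3/4,
\end{align*}
so we need to find $\lambda$. To do that, use these latest expressions for $x$ and $y$ in the third first order condition (to substitute for $x$ and $y$)
\[
3 = \lambda/2 + 2 + 2(\lambda/16 - 3/4) = 5\lambda/8 + 1/2, \text{ so } \lambda = 4.
\]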

Finally, use this to calculate $x$ and $y$ as
\[
x = 4 \text{ and } y = -1/2.
\]
Notice that this is the same solution as before ($y = -1/2$) and that the restriction holds ($4 + 2(-1/2) = 3$). This second method is clearly a lot clumsier in my example, but it pays off when the restriction(s) become complicated.
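Since the three first order conditions of the Lagrangian are linear in $(x, y, \lambda)$, they can also be solved as a linear system. The following is a small sketch, assuming NumPy is available; the variable names are only illustrative.

```python
import numpy as np

# First order conditions of L = (x-2)^2 + (4y+3)^2 + lam*(3 - x - 2y),
# rearranged as a linear system M @ [x, y, lam] = rhs:
#   2x        -  lam =   4    (from 2(x-2) - lam = 0)
#        32y  - 2lam = -24    (from 8(4y+3) - 2lam = 0)
#   x  +  2y         =   3    (the restriction)
M = np.array([[2.0,  0.0, -1.0],
              [0.0, 32.0, -2.0],
              [1.0,  2.0,  0.0]])
rhs = np.array([4.0, -24.0, 3.0])

x, y, lam = np.linalg.solve(M, rhs)
print(x, y, lam)
```

The solution reproduces the values derived by hand: $x = 4$, $y = -1/2$ and $\lambda = 4$.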

Bibliography
Elton, E. J., M. J. Gruber, S. J. Brown, and W. N. Goetzmann, 2010, Modern portfolio
theory and investment analysis, John Wiley and Sons, 8th edn.

Fabozzi, F. J., S. M. Focardi, and P. N. Kolm, 2006, Financial modeling of the equity
market, Wiley Finance.

2 Index Models
Reference: Elton, Gruber, Brown, and Goetzmann (2010) 7–8, 11

2.1 The Inputs to a MV Analysis

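To calculate the mean-variance frontier we need to calculate both the expected return and variance of different portfolios (based on $n$ assets). With two assets ($n = 2$) the expected return and the variance of the portfolio are
\begin{align*}
E(R_p) &= \begin{bmatrix} w_1 & w_2 \end{bmatrix} \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix} \\
\sigma_P^2 &= \begin{bmatrix} w_1 & w_2 \end{bmatrix} \begin{bmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{12} & \sigma_2^2 \end{bmatrix} \begin{bmatrix} w_1 \\ w_2 \end{bmatrix}. \tag{2.1}
\end{align*}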

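In this case we need information on 2 mean returns and 3 elements of the covariance matrix. Clearly, the covariance matrix can alternatively be expressed as
\[
\begin{bmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{12} & \sigma_2^2 \end{bmatrix} = \begin{bmatrix} \sigma_1^2 & \rho_{12}\sigma_1\sigma_2 \\ \rho_{12}\sigma_1\sigma_2 & \sigma_2^2 \end{bmatrix}, \tag{2.2}
\]
which involves two variances and one correlation (3 elements as before).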


There are two main problems in estimating these parameters: the number of parameters increases very quickly as the number of assets increases, and historical estimates have proved to be somewhat unreliable for future periods.
To illustrate the first problem, notice that with $n$ assets we need the following number of parameters

                Required number of estimates    With 100 assets
$\mu_i$                    $n$                        100
$\sigma_{ii}$              $n$                        100
$\sigma_{ij}$           $n(n-1)/2$                   4950

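The counts in the table follow directly from the dimensions of the mean vector and the covariance matrix. A minimal sketch in plain Python (the function name `n_mv_inputs` is hypothetical):

```python
def n_mv_inputs(n):
    """Number of inputs needed for a mean-variance analysis with n assets."""
    return {
        "means": n,                       # one mean per asset
        "variances": n,                   # diagonal of the covariance matrix
        "covariances": n * (n - 1) // 2,  # distinct off-diagonal elements
    }

for n in (2, 10, 100):
    print(n, n_mv_inputs(n))
```

With `n = 100` this gives 100 means, 100 variances and 4950 covariances, as in the table.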
The numerics are not the problem, since it takes only seconds to estimate a covariance matrix of 100 return series. Instead, the problem is that most portfolio analysis uses lots of judgemental “estimates.” These are necessary since there might be new assets
(no historical returns series are available) or there might be good reasons to believe that
old estimates are not valid anymore. To cut down on the number of parameters, it is
often assumed that returns follow some simple model. These notes will discuss so-called
single- and multi-index models.
The second problem comes from the empirical observations that estimates from his-
torical data are sometimes poor “forecasts” of future periods (which is what matters for
portfolio choice). As an example, the correlation between two asset returns tends to be
more “average” than the historical estimate would suggest.
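A simple (and often used) way to deal with this is to replace the historical correlation with an average historical correlation. For instance, suppose there are three assets. Then, estimate $\rho_{ij}$ on historical data, but use the average estimate as the “forecast” of all correlations:
\[
\text{estimate } \begin{bmatrix} 1 & \rho_{12} & \rho_{13} \\ & 1 & \rho_{23} \\ & & 1 \end{bmatrix}, \text{ calculate } \bar{\rho} = (\hat{\rho}_{12} + \hat{\rho}_{13} + \hat{\rho}_{23})/3, \text{ and use } \begin{bmatrix} 1 & \bar{\rho} & \bar{\rho} \\ & 1 & \bar{\rho} \\ & & 1 \end{bmatrix}.
\]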
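The averaging step can be sketched as follows, assuming NumPy and a $T \times n$ array of returns. The function name and the simulated data are hypothetical and serve only as an illustration.

```python
import numpy as np

def average_correlation_matrix(returns):
    """Replace all off-diagonal correlations by their cross-sectional average.

    returns: T x n array of asset returns (rows are periods).
    """
    corr = np.corrcoef(returns, rowvar=False)     # historical correlations
    n = corr.shape[0]
    rho_bar = corr[np.triu_indices(n, k=1)].mean()  # average of the rho_ij, i < j
    smoothed = np.full((n, n), rho_bar)
    np.fill_diagonal(smoothed, 1.0)
    return smoothed, rho_bar

# Hypothetical data: 200 periods, 3 assets
rng = np.random.default_rng(0)
data = rng.normal(size=(200, 3))
smoothed, rho_bar = average_correlation_matrix(data)
print(rho_bar)
print(smoothed)
```

To turn the smoothed correlation matrix into a covariance matrix, each element would be multiplied by the two relevant standard deviations, as in (2.2).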

2.2 Single-Index Models

The single-index model is a way to cut down on the number of parameters that we need to estimate in order to construct the covariance matrix of assets. The model assumes that the co-movement between assets is due to a single common influence (here denoted $R_m$)
\[
R_i = \alpha_i + \beta_i R_m + e_i, \text{ where } E(e_i) = 0,\ \operatorname{Cov}(e_i, R_m) = 0, \text{ and } \operatorname{Cov}(e_i, e_j) = 0. \tag{2.3}
\]

The first two assumptions are the standard assumptions for using Least Squares: the resid-
ual has a zero mean and is uncorrelated with the non-constant regressor. (Together they
imply that the residuals are orthogonal to both regressors, which is the standard assump-
tion in econometrics.) Hence, these two properties will be automatically satisfied if (2.3)
is estimated by Least Squares.
See Figures 2.1 – 2.3 for illustrations.
The key point of the model, however, is the third assumption: the residuals for different assets are uncorrelated. This means that all comovements of two assets ($R_i$ and $R_j$, say) are due to movements in the common “index” $R_m$. This is not at all guaranteed by running LS regressions; it is just an assumption. It is likely to be false, but may be a reasonable approximation in many cases. In any case, it simplifies the construction of the covariance matrix of the assets enormously, as demonstrated below.

[Figure 2.1: CAPM regression, $R_i - R_f = \alpha_i + \beta_i (R_m - R_f) + e_i$. Scatter of the excess return on asset $i$ (%) against the market excess return (%), showing the data points and the regression line; intercept ($\alpha_i$) 2.0 and slope ($\beta_i$) 1.3.]

Remark 2.1 (The market model) The market model is (2.3) without the assumption that $\operatorname{Cov}(e_i, e_j) = 0$. This model does not simplify the calculation of a portfolio variance, but will turn out to be important when we want to test CAPM.

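If (2.3) is true, then the variance of asset $i$ and the covariance of assets $i$ and $j$ are
\begin{align*}
\sigma_{ii} &= \beta_i^2 \operatorname{Var}(R_m) + \operatorname{Var}(e_i) \tag{2.4} \\
\sigma_{ij} &= \beta_i \beta_j \operatorname{Var}(R_m). \tag{2.5}
\end{align*}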

Together, these equations show that we can calculate the whole covariance matrix by having just the variance of the index (to get $\operatorname{Var}(R_m)$) and the output from $n$ regressions (to get $\beta_i$ and $\operatorname{Var}(e_i)$ for each asset). This is, in many cases, much easier to obtain than direct estimates of the covariance matrix. For instance, a new asset does not have a return history, but it may be possible to make intelligent guesses about its beta and residual variance (for instance, from knowing the industry and size of the firm).

[Figure 2.2: Scatter plot against market return. Two panels with the excess return (%) on HiTec and on Utils against the market excess return (%), US data 1970:1-2009:12. HiTec: $\alpha = -0.13$, $\beta = 1.28$; Utils: $\alpha = 0.23$, $\beta = 0.53$.]
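This gives the covariance matrix (for two assets)
\begin{align*}
\operatorname{Cov}\left( \begin{bmatrix} R_i \\ R_j \end{bmatrix} \right) &= \begin{bmatrix} \beta_i^2 & \beta_i \beta_j \\ \beta_i \beta_j & \beta_j^2 \end{bmatrix} \operatorname{Var}(R_m) + \begin{bmatrix} \operatorname{Var}(e_i) & 0 \\ 0 & \operatorname{Var}(e_j) \end{bmatrix}, \text{ or} \tag{2.6} \\
&= \begin{bmatrix} \beta_i \\ \beta_j \end{bmatrix} \begin{bmatrix} \beta_i & \beta_j \end{bmatrix} \operatorname{Var}(R_m) + \begin{bmatrix} \operatorname{Var}(e_i) & 0 \\ 0 & \operatorname{Var}(e_j) \end{bmatrix}. \tag{2.7}
\end{align*}
More generally, with $n$ assets we can define $\beta$ to be an $n \times 1$ vector of all the betas and $\Sigma$ to be an $n \times n$ matrix with the variances of the residuals along the diagonal. We can then write the covariance matrix of the $n \times 1$ vector of the returns as
\[
\operatorname{Cov}(R) = \beta \beta' \operatorname{Var}(R_m) + \Sigma. \tag{2.8}
\]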

See Figure 2.4 for an example based on the Fama-French portfolios detailed in Table
2.2.
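As an illustration of (2.8), the sketch below estimates the betas and residual variances by OLS and then builds the model-implied covariance matrix. It assumes NumPy; the function names and the simulated returns are hypothetical (the simulated betas only loosely mimic the HiTec and Utils estimates).

```python
import numpy as np

def single_index_cov(beta, var_m, resid_var):
    """Covariance matrix implied by the single-index model, as in (2.8)."""
    beta = np.asarray(beta, dtype=float).reshape(-1, 1)
    return beta @ beta.T * var_m + np.diag(np.asarray(resid_var, dtype=float))

def estimate_single_index(R, Rm):
    """OLS of each column of R (T x n) on a constant and the index Rm (T,)."""
    T = len(Rm)
    X = np.column_stack([np.ones(T), Rm])
    coef, *_ = np.linalg.lstsq(X, R, rcond=None)   # 2 x n: intercepts, slopes
    resid = R - X @ coef
    return coef[1], Rm.var(ddof=1), resid.var(axis=0, ddof=2)

# Hypothetical simulated data that satisfy the model assumptions
rng = np.random.default_rng(0)
T = 5000
Rm = rng.normal(scale=4.0, size=T)
e = rng.normal(size=(T, 2)) * np.array([2.0, 3.0])
R = 0.1 + np.outer(Rm, [1.3, 0.5]) + e

beta, var_m, resid_var = estimate_single_index(R, Rm)
model_cov = single_index_cov(beta, var_m, resid_var)
sample_cov = np.cov(R, rowvar=False)
print(model_cov)
print(sample_cov)
```

On data that satisfy the assumptions in (2.3), the two matrices should be close; on real data the off-diagonal differences indicate how restrictive the assumption of uncorrelated residuals is.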

Remark 2.2 (Fama-French portfolios) The portfolios in Table 2.2 are calculated by annual rebalancing (June/July). The US stock market is divided into $5 \times 5$ portfolios as follows. First, split up the stock market into 5 groups based on the book value/market value: put the lowest 20% in the first group, the next 20% in the second group etc. Second, split up the stock market into 5 groups based on size: put the smallest 20% in the first group etc. Then, form portfolios based on the intersections of these groups. For instance, in Table 2.2 the portfolio in row 2, column 3 (portfolio 8) belongs to the second size group (firms between the 20th and 40th size percentiles) and the third book value/market value group (firms between the 40th and 60th percentiles).

                   HiTec      Utils
constant           -0.13       0.23
                  (-0.83)     (1.42)
market return       1.28       0.53
                  (31.43)    (12.21)
R2                  0.74       0.34
obs               480.00     480.00
Autocorr (t)        0.73       0.85
White               7.34      19.93
All slopes        356.08     165.13

Table 2.1: CAPM regressions, monthly returns, %, US data 1970:1-2009:12. Numbers in parentheses are t-stats. Autocorr is a N(0,1) test statistic (autocorrelation); White is a chi-square test statistic (heteroskedasticity), df = K(K+1)/2 - 1; All slopes is a chi-square test statistic (of all slope coeffs), df = K-1.

               Book value/Market value
             1     2     3     4     5
Size   1     1     2     3     4     5
       2     6     7     8     9    10
       3    11    12    13    14    15
       4    16    17    18    19    20
       5    21    22    23    24    25

Table 2.2: Numbering of the FF indices in the figures.

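Proof. (of (2.4)–(2.5)) By using (2.3) and recalling that $\operatorname{Cov}(R_m, e_i) = 0$, direct calculations give
\begin{align*}
\sigma_{ii} &= \operatorname{Var}(R_i) \\
&= \operatorname{Var}(\alpha_i + \beta_i R_m + e_i) \\
&= \operatorname{Var}(\beta_i R_m) + \operatorname{Var}(e_i) + 2 \times 0 \\
&= \beta_i^2 \operatorname{Var}(R_m) + \operatorname{Var}(e_i).
\end{align*}
Similarly, the covariance of assets $i$ and $j$ is (recalling also that $\operatorname{Cov}(e_i, e_j) = 0$)
\begin{align*}
\sigma_{ij} &= \operatorname{Cov}(R_i, R_j) \\
&= \operatorname{Cov}(\alpha_i + \beta_i R_m + e_i,\ \alpha_j + \beta_j R_m + e_j) \\
&= \beta_i \beta_j \operatorname{Var}(R_m) + 0 \\
&= \beta_i \beta_j \operatorname{Var}(R_m).
\end{align*}

[Figure 2.3: $\beta$s of US industry portfolios. US industry portfolios, $\beta$ against the market, 1970:1-2009:12, for NoDur, Durbl, Manuf, Enrgy, HiTec, Telcm, Shops, Hlth, Utils and Other.]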

[Figure 2.4: Correlations of US portfolios. Panels: correlations in the data, and the difference in correlations (data minus model). 25 FF US portfolios, 1957:1-2009:12. Index (factor): US market.]

2.3 Estimating Beta

2.3.1 Estimating Historical Beta: OLS and Other Approaches

Least Squares (LS) is typically used to estimate $\alpha_i$, $\beta_i$ and $\operatorname{Std}(e_i)$ in (2.3), and the $R^2$ is used to assess the quality of the regression.

Remark 2.3 ($R^2$ of market model) $R^2$ of (2.3) measures the fraction of the variance (of $R_i$) that is due to the systematic part of the regression, that is, the relative importance of market risk as compared to idiosyncratic noise ($1 - R^2$ is the fraction due to the idiosyncratic noise)
\[
R^2 = \frac{\operatorname{Var}(\alpha_i + \beta_i R_m)}{\operatorname{Var}(R_i)} = \frac{\beta_i^2 \sigma_m^2}{\beta_i^2 \sigma_m^2 + \sigma_{ei}^2}.
\]

To assess the accuracy of historical betas, Blume and others estimate betas for non-
overlapping samples (periods)—and then compare the betas across samples. They find
that the correlation of betas across samples is moderate for individual assets, but relatively
high for diversified portfolios. It is also found that betas tend to “regress” towards one: an
extreme historical beta is likely to be followed by a beta that is closer to one. There are
several suggestions for how to deal with this problem.

To use Blume’s ad-hoc technique, let $\hat{\beta}_{i1}$ be the estimate of $\beta_i$ from an early sample, and $\hat{\beta}_{i2}$ the estimate from a later sample. Then regress
\[
\hat{\beta}_{i2} = \gamma_0 + \gamma_1 \hat{\beta}_{i1} + \upsilon_i \tag{2.9}
\]
and use it for forecasting the beta for yet another sample. Blume found $(\hat{\gamma}_0, \hat{\gamma}_1) = (0.343, 0.677)$ in his sample.
Other authors have suggested averaging the OLS estimate ($\hat{\beta}_{i1}$) with some average beta. For instance, $(\hat{\beta}_{i1} + 1)/2$ (since the average beta must be unity) or $(\hat{\beta}_{i1} + \Sigma_{i=1}^{n} \hat{\beta}_{i1}/n)/2$ (which will typically be similar since $\Sigma_{i=1}^{n} \hat{\beta}_{i1}/n$ is likely to be close to one).
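The two adjustments can be written as one-line functions. This is a minimal sketch in plain Python; the function names are hypothetical, and the default coefficients are the values Blume found in his sample, so they need not be appropriate for other data.

```python
def blume_adjust(beta_hist, gamma0=0.343, gamma1=0.677):
    """Forecast of beta from (2.9), with Blume's estimates as defaults."""
    return gamma0 + gamma1 * beta_hist

def average_with_one(beta_hist):
    """Simple average of the historical beta and one."""
    return (beta_hist + 1.0) / 2.0

for b in (0.5, 1.0, 1.5):
    print(b, blume_adjust(b), average_with_one(b))
```

Both rules pull an extreme historical beta towards one: a historical beta of 1.5 is adjusted to about 1.36 by the Blume regression and to 1.25 by the simple average.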
The Bayesian approach is another (more formal) way of adjusting the OLS estimate. It also uses a weighted average of the OLS estimate, $\hat{\beta}_{i1}$, and some other number, $\beta_0$, $(1 - F)\hat{\beta}_{i1} + F\beta_0$, where $F$ depends on the precision of the OLS estimator. The general idea of a Bayesian approach (Greene (2003) 16) is to treat both $R_i$ and $\beta_i$ as random. In this case a Bayesian analysis could go as follows. First, suppose our prior beliefs (before having data) about $\beta_i$ are that it is normally distributed, $N(\beta_0, \sigma_0^2)$, where ($\beta_0, \sigma_0^2$) are some numbers. Second, run a LS regression of (2.3). If the residuals are normally distributed, so is the estimator: it is $N(\hat{\beta}_{i1}, \sigma_{\beta 1}^2)$, where we have taken the point estimate to be the mean. If we treat the variance of the LS estimator ($\sigma_{\beta 1}^2$) as known, then the Bayesian estimator of beta is
\begin{align*}
b &= (1 - F)\hat{\beta}_{i1} + F\beta_0, \text{ where} \\
F &= \frac{1/\sigma_0^2}{1/\sigma_0^2 + 1/\sigma_{\beta 1}^2} = \frac{\sigma_{\beta 1}^2}{\sigma_0^2 + \sigma_{\beta 1}^2}. \tag{2.10}
\end{align*}
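The shrinkage in (2.10) is easy to compute once the prior and the variance of the LS estimator are given. A minimal sketch in plain Python follows; the function name and the numerical inputs are hypothetical.

```python
def bayes_beta(beta_ols, var_ols, beta_prior, var_prior):
    """Bayesian estimator of beta as in (2.10).

    beta_ols, var_ols:     LS point estimate and its (assumed known) variance
    beta_prior, var_prior: mean and variance of the normal prior
    """
    F = var_ols / (var_prior + var_ols)   # weight on the prior mean
    return (1 - F) * beta_ols + F * beta_prior, F

# Hypothetical numbers: imprecise versus precise LS estimate, prior N(1, 0.04)
print(bayes_beta(1.5, 0.04, 1.0, 0.04))    # equal precision: halfway to the prior
print(bayes_beta(1.5, 0.0004, 1.0, 0.04))  # precise LS estimate: little shrinkage
```

The weight $F$ on the prior mean grows with the variance of the LS estimator, so imprecisely estimated betas are pulled more strongly towards $\beta_0$.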

2.3.2 Fundamental Betas

Another way to improve the forecasts of the beta over a future period is to bring in infor-
mation about fundamental firm variables. This is particularly useful when there is little
historical data on returns (for instance, because the asset was not traded before).
It is often found that betas are related to fundamental variables as follows (with signs
in parentheses indicating the effect on the beta): Dividend payout (-), Asset growth (+),
Leverage (+), Liquidity (-), Asset size (-), Earning variability (+), Earnings Beta (slope in
earnings regressed on economy-wide earnings) (+). Such relations can be used to make an educated guess about the beta of an asset without historical data on the returns, but with data on (at least some of) these fundamental variables.

2.4 Multi-Index Models

2.4.1 Overview

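The multi-index model is just a multivariate extension of the single-index model (2.3)
\begin{gather*}
R_i = a_i^* + b_{i1}^* I_1^* + b_{i2}^* I_2^* + \ldots + b_{ik}^* I_k^* + e_i, \text{ where} \tag{2.11} \\
E(e_i) = 0,\ \operatorname{Cov}(e_i, I_k^*) = 0, \text{ and } \operatorname{Cov}(e_i, e_j) = 0.
\end{gather*}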


As an example, there could be two indices: the stock market return and an interest rate.
An ad-hoc approach is to first try a single-index model and then test if the residuals are
approximately uncorrelated. If not, then adding a second index might give an acceptable
approximation.
It is often found that it takes several indices to get a reasonable approximation—but
that a single-index model is equally good (or better) at “forecasting” the covariance over
a future period. This is much like the classical trade-off between in-sample fit (requires a
large model) and forecasting (often better with a small model).
The types of indices vary, but one common set captures the “business cycle” and
includes things like the market return, interest rate (or some measure of the yield curve
slope), GDP growth, inflation, and so forth. Another common set of indices are industry
indices.
It turns out (see below) that the calculations of the covariance matrix are much simpler if the indices are transformed to be uncorrelated so we get the model
\begin{gather*}
R_i = a_i + b_{i1} I_1 + b_{i2} I_2 + \ldots + b_{ik} I_k + e_i, \text{ where} \tag{2.12} \\
E(e_i) = 0,\ \operatorname{Cov}(e_i, I_j) = 0,\ \operatorname{Cov}(e_i, e_j) = 0 \text{ (unless } i = j\text{), and} \\
\operatorname{Cov}(I_j, I_h) = 0 \text{ (unless } j = h\text{).}
\end{gather*}
If this transformation of the indices is linear (and non-singular, so it can be reversed if we want to), then the fit of the regression is unchanged.

2.4.2 “Rotating” the Indices

There are several ways of transforming the indices to make them uncorrelated, but the fol-
lowing regression approach is perhaps the simplest and may also give the best possibility
of interpreting the results:

1. Let the first transformed index equal the original index, $I_1 = I_1^*$ (possibly demeaned). This would often be the market return.

2. Regress the second original index on the first transformed index, $I_2^* = \delta_0 + \delta_1 I_1 + \varepsilon_2$. Then, let the second transformed index be the fitted residual, $I_2 = \hat{\varepsilon}_2$.

3. Regress the third original index on the first two transformed indices, $I_3^* = \theta_0 + \theta_1 I_1 + \theta_2 I_2 + \varepsilon_3$. Then, let $I_3 = \hat{\varepsilon}_3$. Follow the same idea for all subsequent indices.

Recall that the fitted residual (from Least Squares) is always uncorrelated with the regressor (by construction). In this case, this means that $I_2$ is uncorrelated with $I_1$ (step 2) and that $I_3$ is uncorrelated with both $I_1$ and $I_2$ (step 3). The correlation matrix of the first three rotated indices is therefore
\[
\operatorname{Corr}\left( \begin{bmatrix} I_1 \\ I_2 \\ I_3 \end{bmatrix} \right) = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}. \tag{2.13}
\]

This recursive approach also helps in interpreting the transformed indices. Suppose the first index is the market return and that the second original index is an interest rate. The first transformed index ($I_1$) is then clearly the market return. The second transformed index ($I_2$) can then be interpreted as the interest rate minus the interest rate expected at the current stock market return, that is, the part of the interest rate that cannot be explained by the stock market return.
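More generally, let the $j$th index ($j = 1, 2, \ldots, k$) be
\begin{align*}
I_j &= \hat{\varepsilon}_j, \text{ where } \hat{\varepsilon}_j \text{ is the fitted residual from the regression} \tag{2.14} \\
I_j^* &= \delta_{j1} + \gamma_{j1} I_1 + \ldots + \gamma_{j,j-1} I_{j-1} + \varepsilon_j, \tag{2.15}
\end{align*}
where $I_j^*$ denotes the $j$th original index. Notice that for the first index ($j = 1$), the regression is only $I_1^* = \delta_{11} + \varepsilon_1$, so $I_1$ equals the demeaned $I_1^*$.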
2.4.3 Multi-Index Model after “Rotating” the Indices

To see why the transformed indices are very convenient for calculating the covariance matrix, consider a two-index model. Then, (2.12) implies that the variance of asset $i$ is
\begin{align*}
\sigma_{ii} &= \operatorname{Var}(a_i + b_{i1} I_1 + b_{i2} I_2 + e_i) \\
&= b_{i1}^2 \operatorname{Var}(I_1) + b_{i2}^2 \operatorname{Var}(I_2) + \operatorname{Var}(e_i). \tag{2.16}
\end{align*}
Similarly, the covariance of assets $i$ and $j$ is
\begin{align*}
\sigma_{ij} &= \operatorname{Cov}(a_i + b_{i1} I_1 + b_{i2} I_2 + e_i,\ a_j + b_{j1} I_1 + b_{j2} I_2 + e_j) \\
&= b_{i1} b_{j1} \operatorname{Var}(I_1) + b_{i2} b_{j2} \operatorname{Var}(I_2). \tag{2.17}
\end{align*}
More generally, with $n$ assets and $k$ indices we can define $b_1$ to be an $n \times 1$ vector of the slope coefficients for the first index ($b_{i1}, b_{j1}$) and $b_2$ the vector of slope coefficients for the second index and so on. Also, let $\Sigma$ be an $n \times n$ matrix with the variances of the residuals along the diagonal. The covariance matrix of the returns is then
\[
\operatorname{Cov}(R) = b_1 b_1' \operatorname{Var}(I_1) + b_2 b_2' \operatorname{Var}(I_2) + \ldots + b_k b_k' \operatorname{Var}(I_k) + \Sigma. \tag{2.18}
\]

See Figure 2.5 for an example.

2.4.4 Multi-Index Model as Method in Portfolio Choice

The factor loadings (betas) can be used for more than just constructing the covariance ma-
trix. In fact, the factor loadings are often used directly in portfolio choice. The reason is
simple: the betas summarize how different assets are exposed to the big risk factors/return
drivers. The betas therefore provide a way to understand the broad features of even complicated portfolios. Combine this with the fact that many analysts and investors have fairly little direct information about individual assets, but are often willing to form opinions about the future relative performance of different asset classes (small vs large firms, equity vs bonds, etc), and the role for factor loadings becomes clear.
See Figures 2.6–2.7 for an illustration.

[Figure 2.5: Correlations of US portfolios. Panels: correlations in the data, and the difference in correlations (data minus model). 25 FF US portfolios, 1957:1-2009:12. Indices (factors): US market, SMB, HML.]

2.5 Principal Component Analysis

Principal component analysis (PCA) can help us determine how many factors are needed to explain a cross-section of asset returns.
Let $z_t = R_t - \bar{R}_t$ be an $n \times 1$ vector of demeaned returns with covariance matrix $\Sigma$. The first principal component ($pc_{1t}$) is the (normalized) linear combination of $z_t$ that accounts for as much of the variability as possible, and its variance is denoted $\lambda_1$. The $j$th ($j \geq 2$) principal component ($pc_{jt}$) is similar (and its variance is denoted $\lambda_j$), except that it must be uncorrelated with all lower principal components. Remark 2.4 gives a formal definition.

Remark 2.4 (Principal component analysis) Consider the zero mean $n \times 1$ vector $z_t$ with covariance matrix $\Sigma$. The first (sample) principal component is $pc_{1t} = w_1' z_t$, where $w_1$ is the eigenvector associated with the largest eigenvalue ($\lambda_1$) of $\Sigma$. This value of $w_1$ solves the problem $\max_w w' \Sigma w$ subject to the normalization $w'w = 1$. The eigenvalue $\lambda_1$ equals $\operatorname{Var}(pc_{1t}) = w_1' \Sigma w_1$. The $j$th principal component solves the same problem, but under the additional restriction that $w_i' w_j = 0$ for all $i < j$. The solution is the eigenvector associated with the $j$th largest eigenvalue $\lambda_j$ (which equals $\operatorname{Var}(pc_{jt}) = w_j' \Sigma w_j$).

[Figure 2.6: Loading (betas) of rotated factors. Panels: US portfolios, $\beta$ on the market (1957:1-2009:12), on SMBres and on HMLres, plotted against the portfolio number (1 to 25).]

Let the $i$th eigenvector be the $i$th column of the $n \times n$ matrix
\[
W = \begin{bmatrix} w_1 & \cdots & w_n \end{bmatrix}. \tag{2.19}
\]
We can then calculate the $n \times 1$ vector of principal components as
\[
pc_t = W' z_t. \tag{2.20}
\]
Since the eigenvectors are orthogonal it can be shown that $W' = W^{-1}$, so the expression can be inverted as
\[
z_t = W pc_t. \tag{2.21}
\]
This shows that the $i$th eigenvector (the $i$th column of $W$) can be interpreted as the effect of the $i$th principal component on each of the elements in $z_t$. However, the sign of column $j$ of $W$ can be changed without any effects (except that $pc_{jt}$ also changes sign), so we can always reinterpret a negative coefficient as a positive exposure (to $-pc_{jt}$).

[Figure 2.7: Absolute loading (betas) of rotated factors. Panels: factor exposure of small growth stocks and of large value stocks to the market, SMB (res) and HML (res). The factor exposure is measured as abs($\beta$). The factors are rotated to become uncorrelated.]

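Example 2.5 (PCA with 2 series) With two series we have
\begin{gather*}
pc_{1t} = \begin{bmatrix} w_{11} \\ w_{21} \end{bmatrix}' \begin{bmatrix} z_{1t} \\ z_{2t} \end{bmatrix} \text{ and } pc_{2t} = \begin{bmatrix} w_{12} \\ w_{22} \end{bmatrix}' \begin{bmatrix} z_{1t} \\ z_{2t} \end{bmatrix}, \text{ or} \\
\begin{bmatrix} pc_{1t} \\ pc_{2t} \end{bmatrix} = \begin{bmatrix} w_{11} & w_{12} \\ w_{21} & w_{22} \end{bmatrix}' \begin{bmatrix} z_{1t} \\ z_{2t} \end{bmatrix} \text{ and} \\
\begin{bmatrix} z_{1t} \\ z_{2t} \end{bmatrix} = \begin{bmatrix} w_{11} & w_{12} \\ w_{21} & w_{22} \end{bmatrix} \begin{bmatrix} pc_{1t} \\ pc_{2t} \end{bmatrix}.
\end{gather*}
For instance, $w_{12}$ shows how $pc_{2t}$ affects $z_{1t}$, while $w_{22}$ shows how $pc_{2t}$ affects $z_{2t}$.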

Remark 2.6 (Data in matrices) Transpose (2.20) to get $pc_t' = z_t' W$, where the dimensions are $1 \times n$, $1 \times n$ and $n \times n$ respectively. If we form a $T \times n$ matrix of data $Z$ by putting $z_t'$ in row $t$, then the $T \times n$ matrix of principal components can be calculated as $PC = ZW$.

[Figure 2.8: Eigenvectors for US portfolio returns. 25 FF US portfolios; the first three eigenvectors account for 83.5%, 7.0% and 3.7% of the total variation.]

Notice that (2.21) shows that all $n$ data series in $z_t$ can be written in terms of the $n$ principal components. Since the principal components are uncorrelated ($\operatorname{Cov}(pc_{it}, pc_{jt}) = 0$ for $i \neq j$), we can think of the sum of their variances ($\Sigma_{i=1}^{n} \lambda_i$) as the “total variation” of the series in $z_t$. In practice, it is common to report the relative importance of principal component $j$ as
\[
\text{relative importance of } pc_j = \lambda_j / \Sigma_{i=1}^{n} \lambda_i. \tag{2.22}
\]
For instance, if it is found that the first two principal components account for 75% of the total variation among many asset returns, then a two-factor model is likely to be a good approximation.
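The steps in (2.19)–(2.22) can be sketched with an eigenvalue decomposition of the sample covariance matrix. The code assumes NumPy; the function name `pca` and the simulated one-factor returns are hypothetical.

```python
import numpy as np

def pca(returns):
    """Principal components of a T x n array of returns.

    Returns the eigenvalues (largest first), the matrix W of eigenvectors,
    the T x n matrix of principal components PC = ZW, and the relative
    importance of each component as in (2.22).
    """
    Z = returns - returns.mean(axis=0)         # demeaned returns, z_t' in row t
    Sigma = np.cov(Z, rowvar=False)
    eigval, eigvec = np.linalg.eigh(Sigma)     # eigh returns ascending eigenvalues
    order = np.argsort(eigval)[::-1]           # sort from largest to smallest
    lam, W = eigval[order], eigvec[:, order]
    PC = Z @ W
    return lam, W, PC, lam / lam.sum()

# Hypothetical data: 4 return series driven by one common factor plus noise
rng = np.random.default_rng(0)
factor = rng.normal(size=(300, 1))
returns = factor @ np.array([[1.0, 0.8, 1.2, 0.9]]) + 0.5 * rng.normal(size=(300, 4))

lam, W, PC, share = pca(returns)
print(share.round(3))
```

In this simulated example the first component should dominate, since a single factor drives all four series. Recall that the sign of each column of `W` is arbitrary.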

2.6 Estimating Expected Returns

The starting point for forming estimates of future mean excess returns is typically histor-
ical excess returns. Excess returns are preferred to returns, since this avoids blurring the
risk compensation (expected excess return) with long-run movements in inflation (and
therefore interest rates). The expected excess return for the future period is typically
formed as a judgmental adjustment of the historical excess return. Evidence suggests that the adjustments are hard to make.

It is typically hard to predict movements (around the mean) of asset returns, but a few
variables seem to have some predictive power, for instance, the slope of the yield curve,
the earnings/price yield, and the book value–market value ratio. Still, the predictive power
is typically low.
Makridakis, Wheelwright, and Hyndman (1998) 10.1 show that there is little evidence
that the average stock analyst beats (on average) the market (a passive index portfolio).
In fact, less than half of the analysts beat the market. However, there are analysts who seem to outperform the market for some time, but the autocorrelation in over-performance
is weak. The evidence from mutual funds is similar. For them it is typically also found
that their portfolio weights do not anticipate price movements.
It should be remembered that many analysts also are sales persons: either of a stock
(for instance, since the bank is underwriting an offering) or of trading services. It could
well be that their objective function is quite different from minimizing the squared forecast
errors—or whatever we typically use in order to evaluate their performance. (The number
of litigations in the US after the technology boom/bust should serve as a strong reminder
of this.)

2.7 Estimation on Subsamples

To capture time-variation in the regression coefficients, it is fairly common to run the regression
\[
y_t = x_t' b + \varepsilon_t \tag{2.23}
\]
on a longer and longer data set (“recursive estimation”). In the standard recursive estimation, the first estimation is done on the sample $t = 1, 2, \ldots, \tau$; while the second estimation is done on $t = 1, 2, \ldots, \tau, \tau + 1$; and so forth until we use the entire sample $t = 1, \ldots, T$. In the “backwards recursive estimate” we instead keep the end-point fixed and use more and more of old data. That is, the first sample could be $T - \tau, \ldots, T$; the second $T - \tau - 1, \ldots, T$; and so forth.
Alternatively, a moving data window (“rolling samples”) could be used. In this case, the first sample is $t = 1, 2, \ldots, \tau$; but the second is on $t = 2, \ldots, \tau, \tau + 1$, that is, by dropping one observation at the start when the sample is extended at the end. See Figure 2.9 for an illustration.
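The recursive and rolling schemes differ only in which observations enter each regression. A minimal sketch, assuming NumPy; the function names, the window length `tau` and the simulated data are hypothetical.

```python
import numpy as np

def ols(y, X):
    """Least squares coefficients of y on X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def recursive_estimates(y, X, tau):
    """Estimates on the samples 1..tau, 1..tau+1, ..., 1..T."""
    T = len(y)
    return np.array([ols(y[:end], X[:end]) for end in range(tau, T + 1)])

def rolling_estimates(y, X, tau):
    """Estimates on moving windows of length tau: 1..tau, 2..tau+1, ..."""
    T = len(y)
    return np.array([ols(y[end - tau:end], X[end - tau:end])
                     for end in range(tau, T + 1)])

# Hypothetical data: 60 periods, a constant and one regressor
rng = np.random.default_rng(1)
T = 60
X = np.column_stack([np.ones(T), rng.normal(size=T)])
y = X @ np.array([0.5, 1.2]) + rng.normal(size=T)

rec = recursive_estimates(y, X, tau=24)
rol = rolling_estimates(y, X, tau=24)
print(rec[-1], rol[-1])
```

Each row of `rec` and `rol` holds the coefficients for one sample end-point, so the slope column can be plotted against time to gauge how much the beta has moved.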
An alternative is to apply an exponentially weighted moving average (EMA) estimator, which uses all data points since the beginning of the sample, but where recent observations carry larger weights. The weight for data in period $t$ is $\lambda^{T-t}$ where $T$ is the latest observation and $0 < \lambda < 1$, where a smaller value of $\lambda$ means that old data carries low weights. In practice, this means that we define
\[
\tilde{x}_t = x_t \lambda^{T-t} \text{ and } \tilde{y}_t = y_t \lambda^{T-t} \tag{2.24}
\]
and then estimate
\[
\tilde{y}_t = \tilde{x}_t' b + \varepsilon_t. \tag{2.25}
\]
Notice that also the constant (in $x_t$) should be scaled in the same way. (Clearly, this method is strongly related to the GLS approach used when residuals are heteroskedastic. Also, the idea of down weighting old data is commonly used to estimate time-varying volatility of returns as in the RiskMetrics method.)
Estimation on subsamples is not only a way of getting a more recent/modern estimate,
but also a way to gauge the historical range and volatility in the betas—which may be
important for putting some discipline on judgemental forecasts.
See Figures 2.9–2.10 for an illustration.
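The EMA estimator in (2.24)–(2.25) amounts to an ordinary regression on scaled data. A minimal sketch, assuming NumPy; the function name `ema_ols` and the simulated data are hypothetical.

```python
import numpy as np

def ema_ols(y, X, lam):
    """LS on data scaled by lam**(T-t), as in (2.24)-(2.25).

    y: (T,) array, X: T x k array (including the constant), 0 < lam <= 1.
    """
    T = len(y)
    t = np.arange(1, T + 1)
    w = lam ** (T - t)                 # weight 1 for the latest observation
    coef, *_ = np.linalg.lstsq(X * w[:, None], y * w, rcond=None)
    return coef

# Hypothetical data
rng = np.random.default_rng(2)
T = 50
X = np.column_stack([np.ones(T), rng.normal(size=T)])
y = X @ np.array([0.2, 0.9]) + rng.normal(size=T)

for lam in (1.0, 0.98, 0.9):
    print(lam, ema_ols(y, X, lam))
```

Setting `lam = 1` gives all observations the same weight and therefore reproduces ordinary least squares; the constant is scaled together with the other regressors because it is a column of `X`.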

2.8 Robust Estimation

2.8.1 Robust Means, Variances and Correlations

Outliers and other extreme observations can have very decisive influence on the estimates
of the key statistics needed for financial analysis, including mean returns, variances, co-
variances and also regression coefficients.
Perhaps the best way to solve these problems is to carefully analyse the data, and then decide which data points to exclude. Alternatively, robust estimators can be applied instead of the traditional ones.
To estimate the mean, the sample average can be replaced by the median or a trimmed
mean (where the x% lowest and highest observations are excluded).
Similarly, to estimate the variance, the sample standard deviation can be replaced by the interquartile range (the difference between the 75th and the 25th percentiles), divided by 1.35
\[
\operatorname{Std}_{\text{Robust}} = [\text{quantile}(0.75) - \text{quantile}(0.25)]/1.35, \tag{2.26}
\]
or by the median absolute deviation
\[
\operatorname{Std}_{\text{Robust}} = \operatorname{median}(|x_t - \mu|)/0.675. \tag{2.27}
\]
Both these would coincide with the standard deviation if data was indeed drawn from a normal distribution without outliers.

[Figure 2.9: Betas of US industry portfolios. Panels: $\beta$ of the HiTech sector from recursive estimation (against the end of the sample), from backwards recursive estimation (against the start of the sample), from a 5-year data window (against the end of the 5-year sample), and from the EWMA estimate (against $\lambda$).]
A robust covariance can be calculated by using the identity
\[
\operatorname{Cov}(x, y) = [\operatorname{Var}(x + y) - \operatorname{Var}(x - y)]/4 \tag{2.28}
\]
and using a robust estimator of the variances, like the square of (2.26). A robust correlation is then created by dividing the robust covariance by the two robust standard deviations.
See Figures 2.11–2.12 for empirical examples.
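The robust estimators in (2.26)–(2.28) can be sketched as follows, assuming NumPy. The function names and the simulated data are hypothetical; in `std_mad` the deviations are measured from the median, which is one possible choice for the location $\mu$ in (2.27) and an assumption of this sketch.

```python
import numpy as np

def std_iqr(x):
    """Robust standard deviation from the interquartile range, as in (2.26)."""
    q75, q25 = np.percentile(x, [75, 25])
    return (q75 - q25) / 1.35

def std_mad(x):
    """Robust standard deviation from the median absolute deviation, as in (2.27).
    Deviations are taken from the median (an assumption of this sketch)."""
    return np.median(np.abs(x - np.median(x))) / 0.675

def robust_cov(x, y):
    """Robust covariance from the identity (2.28), with IQR-based variances."""
    return (std_iqr(x + y) ** 2 - std_iqr(x - y) ** 2) / 4

def robust_corr(x, y):
    """Robust covariance divided by the two robust standard deviations."""
    return robust_cov(x, y) / (std_iqr(x) * std_iqr(y))

# Hypothetical normal data: std of x is 2, correlation with y is 0.6
rng = np.random.default_rng(3)
x = rng.normal(scale=2.0, size=200_000)
y = 0.3 * x + 0.8 * rng.normal(size=200_000)

x_out = x.copy()
x_out[0] = 1e6                          # one extreme outlier
print(std_iqr(x), std_mad(x), robust_corr(x, y))
print(x_out.std(), std_iqr(x_out))      # the sample std explodes, the robust one barely moves
```

On normal data without outliers the robust measures are close to the usual ones, while a single extreme observation changes the sample standard deviation dramatically but leaves the quantile-based measure almost untouched.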

[Figure 2.10: Distribution of betas of US industry portfolios (estimated on 5-year data windows). Betas estimated on moving 5-year data windows, monthly data 1947:1-2009:12, for NoDur, Durbl, Manuf, Enrgy, HiTec, Telcm, Shops, Hlth, Utils and Other.]

2.8.2 Robust Regression Coefficients

Reference: Amemiya (1985) 4.6


The least absolute deviations (LAD) estimator minimizes the sum of absolute residuals (rather than the squared residuals)
\[
\hat{\beta}_{LAD} = \arg\min_b \sum_{t=1}^{T} \left| y_t - x_t' b \right|. \tag{2.29}
\]
This estimator involves non-linearities, but a simple iteration works nicely. It is typically less sensitive to outliers. (There are also other ways to estimate robust regression coefficients.) This is illustrated in Figure 2.13.
See Figure 2.14 for an empirical example.
If we assume that the median of the true residual, $u_t$, is zero, then we (typically) have
\[
\sqrt{T}(\hat{\beta}_{LAD} - \beta_0) \to^d N\left(0, f(0)^{-2} \Sigma_{xx}^{-1}/4\right), \text{ where } \Sigma_{xx} = \operatorname{plim} \sum_{t=1}^{T} x_t x_t'/T, \tag{2.30}
\]
where $f(0)$ is the value of the pdf of the residual at zero. Unless we know this density function (or else we would probably have used MLE instead of LAD), we need to estimate it, for instance with a kernel density method.

[Figure 2.11: Mean excess returns of US industry portfolios. Mean and median of the excess returns, monthly data 1947:1-2009:12, portfolios A to J.]

Example 2.7 ($N(0, \sigma^2)$) When $u_t \sim N(0, \sigma^2)$, then $f(0) = 1/\sqrt{2\pi\sigma^2}$, so the covariance matrix in (2.30) becomes $\pi\sigma^2 \Sigma_{xx}^{-1}/2$. This is $\pi/2$ times larger than when using LS.

Remark 2.8 (Algorithm for LAD) The LAD estimator can be written
\[
\hat{\beta}_{LAD} = \arg\min_b \sum_{t=1}^{T} w_t \hat{u}_t(b)^2,\ w_t = 1/|\hat{u}_t(b)|, \text{ with } \hat{u}_t(b) = y_t - x_t' b,
\]
so it is a weighted least squares problem with the weights $w_t = 1/|\hat{u}_t(b)|$ on the squared residuals (equivalently, both $y_t$ and $x_t$ are multiplied by $\sqrt{w_t}$). It can be shown that iterating on LS with the weights given by $1/|\hat{u}_t(b)|$, where the residuals are from the previous iteration, converges very quickly to the LAD estimator.
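The iteration described in Remark 2.8 can be sketched as below, assuming NumPy. The function name `lad_irls`, the floor `eps` on the absolute residuals (to avoid division by zero) and the stopping rule are choices of this sketch, not part of the remark. The data are the four points shown in Figure 2.13.

```python
import numpy as np

def lad_irls(y, X, n_iter=500, eps=1e-8, tol=1e-10):
    """LAD by iterating on weighted LS with weights 1/|residual|."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)        # start from the LS estimate
    for _ in range(n_iter):
        u = y - X @ b                                # residuals from previous iteration
        w = 1.0 / np.maximum(np.abs(u), eps)         # weights on the squared residuals
        sw = np.sqrt(w)                              # scale y and X by sqrt(w)
        b_new, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
        if np.max(np.abs(b_new - b)) < tol:
            b = b_new
            break
        b = b_new
    return b

# Data from Figure 2.13: y = 0.75*x + u, with one outlier at x = 1
x = np.array([-1.5, -1.0, 1.0, 1.5])
y = np.array([-1.125, -0.75, 1.75, 1.125])
X = np.column_stack([np.ones_like(x), x])

b_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
b_lad = lad_irls(y, X)
print(b_ols, b_lad)
```

For these data the LS line is pulled towards the outlier (intercept 0.25, slope about 0.90), while the LAD estimates are close to (0, 0.75), in line with the figure.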

Some alternatives to LAD: least median squares (LMS), and least trimmed squares (LTS) estimators which solve
\begin{align*}
\hat{\beta}_{LMS} &= \arg\min_b \left[ \operatorname{median}\left( \hat{u}_t^2 \right) \right], \text{ with } \hat{u}_t = y_t - x_t' b \tag{2.31} \\
\hat{\beta}_{LTS} &= \arg\min_b \sum_{i=1}^{h} \hat{u}_i^2,\ \hat{u}_1^2 \leq \hat{u}_2^2 \leq \ldots \text{ and } h \leq T. \tag{2.32}
\end{align*}
Note that the LTS estimator in (2.32) minimizes the sum of the $h$ smallest squared residuals.

[Figure 2.12: Volatility of US industry portfolios. Standard deviation (std) and iqr/1.35 of returns, monthly data 1947:1-2009:12, portfolios A to J.]

[Figure 2.13: Data and regression line from OLS and LAD. OLS vs LAD of $y = 0.75x + u$, with data $y$: -1.125, -0.750, 1.750, 1.125 and $x$: -1.500, -1.000, 1.000, 1.500. Estimated (intercept, slope): OLS (0.25, 0.90), LAD (0.00, 0.75).]

[Figure 2.14: Betas of US industry portfolios. OLS and LAD estimates of $\beta$, monthly data 1947:1-2009:12, portfolios A to J.]

Bibliography
Amemiya, T., 1985, Advanced econometrics, Harvard University Press, Cambridge, Massachusetts.

Elton, E. J., M. J. Gruber, S. J. Brown, and W. N. Goetzmann, 2010, Modern portfolio theory and investment analysis, John Wiley and Sons, 8th edn.

Greene, W. H., 2003, Econometric analysis, Prentice-Hall, Upper Saddle River, New Jersey, 5th edn.

Makridakis, S., S. C. Wheelwright, and R. J. Hyndman, 1998, Forecasting: methods and applications, Wiley, New York, 3rd edn.

3 Risk Measures
Reference: Hull (2006) 18; McDonald (2006) 25; Fabozzi, Focardi, and Kolm (2006)
4–5; McNeil, Frey, and Embrechts (2005)

3.1 Symmetric Dispersion Measures

3.1.1 Mean Absolute Deviation

The variance (and standard deviation) is very sensitive to the tails of the distribution. For instance, even if the standard normal distribution and a Student-t distribution with 4 degrees of freedom look fairly similar, the latter has a variance that is twice as large (recall: the variance of a $t_n$ distribution is $n/(n-2)$ for $n > 2$). This may or may not be what the investor cares about. If not, the mean absolute deviation is an alternative. Let $\mu$ be the mean, then the definition is
\[
\text{mean absolute deviation} = E|R - \mu|. \tag{3.1}
\]

This measure of dispersion is much less sensitive to the tails—essentially because it does
not involve squaring the variable.
Notice, however, that for a normally distributed return the mean absolute deviation
is proportional to the standard deviation—see Remark 3.1. Both measures will therefore
lead to the same portfolio choice (for a given mean return). In other cases, the portfolio
choice will be different (and perhaps complicated to perform since it is typically not easy
to calculate the mean absolute deviation of a portfolio).

Remark 3.1 (Mean absolute deviation of $N(\mu,\sigma^2)$ and $t_n$) If $R \sim N(\mu,\sigma^2)$, then

$$\operatorname{E}|R - \mu| = \sqrt{2/\pi}\,\sigma \approx 0.8\sigma.$$

If $R \sim t_n$, then $\operatorname{E}|R| = 2\sqrt{n}/[(n-1)B(n/2, 0.5)]$, where $B$ is the beta function. For
$n = 4$, $\operatorname{E}|R| = 1$ which is just 25% higher than for a $N(0,1)$ distribution. In contrast,
the standard deviation is $\sqrt{2}$, which is 41% higher than for the $N(0,1)$.
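
The closed-form expressions in Remark 3.1 are easy to check numerically. The following is a minimal sketch using only the Python standard library; the function names are hypothetical helpers introduced for illustration, and the beta function is computed from gamma functions.

```python
import math


def beta_function(a, b):
    """Beta function B(a, b), computed from gamma functions."""
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)


def mad_normal(sigma):
    """E|R - mu| for R ~ N(mu, sigma^2)."""
    return math.sqrt(2.0 / math.pi) * sigma


def mad_t(n):
    """E|R| for R ~ t_n (requires n > 1)."""
    return 2.0 * math.sqrt(n) / ((n - 1) * beta_function(n / 2.0, 0.5))


def std_t(n):
    """Standard deviation of a t_n variable (requires n > 2)."""
    return math.sqrt(n / (n - 2.0))


# Ratios of the t_4 measures to the N(0,1) measures
mad_ratio = mad_t(4) / mad_normal(1.0)
std_ratio = std_t(4) / 1.0
print(round(mad_ratio, 2), round(std_ratio, 2))
```

The two ratios correspond to the "25% higher" and "41% higher" statements in the remark.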

See Table 3.1 for an empirical illustration.

                 Small growth    Large value
Std                   8.4             5.0
VaR (95%)            12.4             7.5
ES (95%)             17.3            10.3
SemiStd               5.5             3.2
Drawdown             77.9            49.4

Table 3.1: Risk measures of monthly returns of two stock indices (%), US data 1957:1-2009:12.

3.1.2 Index Tracking Errors

[Figure: "Value at risk and density of returns"; density of the return R with −VaR95% marked on the horizontal axis, where VaR95% = −(the 5% quantile).]

Figure 3.1: Value at risk

Suppose instead that our task, as fund managers, say, is to track a benchmark portfolio
(returns $R_b$ and portfolio weights $w_b$), but we are allowed to make some deviations. For
instance, we are perhaps asked to track a certain index. The deviations, typically measured
in terms of the variance of the tracking errors for the returns, can be motivated by practical
considerations and by concerns about trading costs. If our portfolio has the weights $w$,
then the portfolio return is $R_p = w'R$, where $R$ is the vector of returns on the original
assets. Similarly, the benchmark portfolio (index) has the return $R_b = w_b'R$. If the
variance of the tracking error should be less than $U$, then we have the restriction

$$\operatorname{Var}(R_p - R_b) = (w - w_b)'\Sigma(w - w_b) \le U, \tag{3.2}$$

where $\Sigma$ is the covariance matrix of the original assets. This type of restriction is fairly
easy to implement numerically in the portfolio choice model (the optimization problem).
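
As an illustration of how the restriction (3.2) can be evaluated, here is a minimal Python sketch with plain lists. The function names and the two-asset numbers are hypothetical and chosen for illustration only; in an actual portfolio choice problem the restriction would be passed to a numerical optimizer.

```python
def tracking_error_variance(w, w_b, cov):
    """Var(R_p - R_b) = (w - w_b)' Sigma (w - w_b) for weight lists w and w_b."""
    d = [wi - bi for wi, bi in zip(w, w_b)]
    n = len(d)
    return sum(d[i] * cov[i][j] * d[j] for i in range(n) for j in range(n))


def satisfies_restriction(w, w_b, cov, upper_bound):
    """Check the restriction (3.2): tracking error variance <= U."""
    return tracking_error_variance(w, w_b, cov) <= upper_bound


# Hypothetical two-asset example (numbers chosen for illustration only)
cov = [[0.0256, 0.0], [0.0, 0.0144]]
w_b = [0.5, 0.5]   # benchmark weights
w = [0.6, 0.4]     # our portfolio weights
te_var = tracking_error_variance(w, w_b, cov)
```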

3.2 Downside Risk

3.2.1 Value at Risk

The mean-variance framework is often criticized for failing to distinguish between down-
side (considered to be risk) and upside (considered to be potential).
The 95% Value at Risk ($\text{VaR}_{95\%}$) says that there is only a 5% chance that the return
($R$) will be less than $-\text{VaR}_{95\%}$

$$\Pr(R \le -\text{VaR}_\alpha) = 1 - \alpha. \tag{3.3}$$

See Figure 3.1.

Example 3.2 (Quantile of a distribution) The 0.05 quantile is the value such that there is
only a 5% probability of a lower number, $\Pr(R \le \text{quantile}_{0.05}) = 0.05$.

We can solve this expression for the $\text{VaR}_\alpha$ as

$$\text{VaR}_\alpha = -\text{cdf}_R^{-1}(1 - \alpha), \tag{3.4}$$

where $\text{cdf}_R^{-1}(\cdot)$ is the inverse cumulative distribution function of the returns, so
$\text{cdf}_R^{-1}(1 - \alpha)$ is the $1 - \alpha$ quantile (or “critical value”) of the return distribution. For instance,
$\text{VaR}_{95\%}$ is the negative of the 0.05 quantile of the return distribution. Notice that the return
distribution depends on the investment horizon, so a value at risk measure is typically
calculated for a stated investment period (for instance, one day).
If the return is normally distributed, $R \sim N(\mu,\sigma^2)$ and $c_{1-\alpha}$ is the $1 - \alpha$ quantile of
a N(0,1) distribution (for instance, $-1.64$ for $1 - \alpha = 1 - 0.95$), then

$$\text{VaR}_\alpha = -(\mu + c_{1-\alpha}\sigma). \tag{3.5}$$

This is illustrated in Figure 3.2.
Notice that the value at risk for a normally distributed return is a strictly increasing
function of the standard deviation (and the variance). Minimizing the VaR at a given mean
return therefore gives the same solution (portfolio weights) as minimizing the variance at
the same given mean return. In other cases, the portfolio choice will be different (and
perhaps complicated to perform).

Remark 3.3 (Critical values of $N(\mu,\sigma^2)$) If $R \sim N(\mu,\sigma^2)$, then there is a 5% probability
that $R \le \mu - 1.64\sigma$, a 2.5% probability that $R \le \mu - 1.96\sigma$, and a 1% probability
that $R \le \mu - 2.33\sigma$.

[Figure, four panels: (i) density of N(0,1), where the 5% quantile is c = −1.64; (ii) density of N(8,16²), where the 5% quantile is µ + c*σ = −18; (iii) cdf of N(8,16²); (iv) inverse of cdf of N(8,16²).]

Figure 3.2: Finding critical value of N(µ,σ²) distribution

Example 3.4 (VaR with $R \sim N(\mu,\sigma^2)$) If daily returns have $\mu = 8\%$ and $\sigma = 16\%$,
then the 1-day $\text{VaR}_{95\%} = -(0.08 - 1.64 \times 0.16) \approx 0.18$; we are 95% sure that we will not
lose more than 18% of the investment over one day, that is, $\text{VaR}_{95\%} = 0.18$. Similarly,
$\text{VaR}_{97.5\%} = -(0.08 - 1.96 \times 0.16) \approx 0.23$.
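
The calculation in Example 3.4 can be sketched in a few lines of Python, using `statistics.NormalDist` from the standard library for the N(0,1) quantile. The function name `var_normal` is a hypothetical helper, not something defined in these notes.

```python
from statistics import NormalDist


def var_normal(mu, sigma, alpha):
    """VaR_alpha = -(mu + c*sigma), with c the (1 - alpha) quantile of N(0,1)."""
    c = NormalDist().inv_cdf(1.0 - alpha)
    return -(mu + c * sigma)


# Inputs from Example 3.4: mu = 8%, sigma = 16%
var_95 = var_normal(0.08, 0.16, 0.95)
var_975 = var_normal(0.08, 0.16, 0.975)
print(round(var_95, 2), round(var_975, 2))
```

The code uses the exact normal quantiles rather than the rounded 1.64 and 1.96, so the results differ from hand calculations only in the third decimal.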

Example 3.5 (VaR and regulation of bank capital) Bank regulations have used 3 times
the 99% VaR for 10-day returns as the required bank capital.

[Figure, three panels for the S&P 500, daily data 1954:1−2010:9: (i) GARCH std, %; (ii) Value at Risk95% (one day), %; (iii) distribution of returns, VaR, with an estimated unconditional normal distribution.]

Figure 3.3: Conditional volatility and VaR

Remark 3.6 (Multi-period VaR) If the returns are iid, then a q-period return has the
mean $q\mu$ and variance $q\sigma^2$, where $\mu$ and $\sigma^2$ are the mean and variance of the one-period
returns respectively. If the mean is zero, then the q-day VaR is $\sqrt{q}$ times the one-day VaR.
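
A small sketch of the scaling rule in Remark 3.6, under the additional assumption (made here for illustration) that the one-period returns are normally distributed, so that (3.5) applies to the q-period return. The function name and the numbers are hypothetical.

```python
import math
from statistics import NormalDist


def var_normal_multi_period(mu, sigma, alpha, q):
    """VaR of a q-period return when one-period returns are iid N(mu, sigma^2)."""
    c = NormalDist().inv_cdf(1.0 - alpha)
    # q-period mean is q*mu, q-period standard deviation is sqrt(q)*sigma
    return -(q * mu + c * math.sqrt(q) * sigma)


# With zero mean, the 10-day VaR is sqrt(10) times the 1-day VaR
one_day = var_normal_multi_period(0.0, 0.01, 0.99, 1)
ten_day = var_normal_multi_period(0.0, 0.01, 0.99, 10)
```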

Remark 3.7 (VaR from t-distribution) The assumption of normally distributed returns
rules out thick tails. As an alternative, suppose the normalized return has a t-distribution
with $v$ degrees of freedom

$$\frac{R - \mu}{s} \sim t_v.$$

Notice that $s^2$ is not the variance of $R$, since $\operatorname{Var}(R) = vs^2/(v-2)$ (assuming $v > 2$,
so the variance is defined). In this case, (3.5) still holds (with $s$ in place of $\sigma$), but with
$c_{1-\alpha}$ calculated as the $1 - \alpha$ quantile of a $t_v$ distribution. In practice, for a given value
of $\operatorname{Var}(R)$ and a moderate confidence level such as 95%, the t distribution gives a smaller
value of the VaR than the normal distribution (far out in the tail, the ranking can reverse).
The reason is that the variance of a t-distribution is very high for low degrees of freedom,
so $s$ must be well below $\sigma$ to match a given variance.

The VaR concept has been criticized for having poor aggregation properties. In par-
ticular, the VaR for a portfolio is not necessarily (weakly) lower than the portfolio of the
VaRs, which contradicts the notion of diversification benefits. (To get this unfortunate
property, the return distributions must be heavily skewed.)
Figures 3.3–3.4 illustrate the VaR calculated from a time series model (to be precise,
an AR(1)+GARCH(1,1) model) for daily S&P returns.

Remark 3.8 (Backtesting VaR) Figure 3.4 shows the results from “backtesting” a VaR
model (in particular, a model where the volatility is time-varying according to a GARCH model).
For instance, we first find the $\text{VaR}_{95\%}$ and then calculate what fraction of returns is
actually below $-\text{VaR}_{95\%}$. If the model is correct it should be 5%. We then repeat this
for $\text{VaR}_{96\%}$ (only 4% of the returns should be below $-\text{VaR}_{96\%}$).
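
The backtesting idea in Remark 3.8 amounts to counting violations. A minimal Python sketch follows; it takes the VaR forecasts as given (how they are produced, for instance by a GARCH model, is outside the sketch), and the function name and data are made up for illustration.

```python
def var_violation_rate(returns, var_forecasts):
    """Fraction of periods where the return falls below minus the VaR forecast."""
    if len(returns) != len(var_forecasts):
        raise ValueError("returns and var_forecasts must have the same length")
    hits = sum(1 for r, v in zip(returns, var_forecasts) if r < -v)
    return hits / len(returns)


# Tiny made-up sample (illustration only): one violation in four days
returns = [-0.03, 0.01, -0.01, 0.02]
var_forecasts = [0.02, 0.02, 0.02, 0.02]
rate = var_violation_rate(returns, var_forecasts)
```

For a correctly specified $\text{VaR}_{95\%}$ model and a long sample, the violation rate should be close to 0.05.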

3.2.2 Expected Shortfall

The expected shortfall (also called conditional VaR) is the expected loss when the return
actually is below $-\text{VaR}_\alpha$, that is,

$$\text{ES}_\alpha = -\operatorname{E}(R \mid R \le -\text{VaR}_\alpha). \tag{3.6}$$

This might be more informative than the $\text{VaR}_\alpha$, which is the minimum loss that will happen
with a $1 - \alpha$ probability.
For a normally distributed return $R \sim N(\mu,\sigma^2)$ we have

$$\text{ES}_\alpha = -\mu + \sigma\frac{\phi(c_{1-\alpha})}{1 - \alpha}, \tag{3.7}$$

where $\phi(\cdot)$ is the pdf of a $N(0,1)$ variable and where $c_{1-\alpha}$ is the $1 - \alpha$ quantile of a N(0,1)
distribution (for instance, $-1.64$ for $1 - \alpha = 0.05$).
Proof. (of (3.7)) If $x \sim N(\mu,\sigma^2)$, then $\operatorname{E}(x \mid x \le b) = \mu - \sigma\phi(b_0)/\Phi(b_0)$ where
$b_0 = (b - \mu)/\sigma$ and where $\phi(\cdot)$ and $\Phi(\cdot)$ are the pdf and cdf of a $N(0,1)$ variable
respectively. To apply this, use $b = -\text{VaR}_\alpha$ so $b_0 = c_{1-\alpha}$. Clearly, $\Phi(c_{1-\alpha}) = 1 - \alpha$ (by
definition of the $1 - \alpha$ quantile). Multiply by $-1$.

[Figure: "Backtesting VaR from GARCH(1,1), daily S&P 500 returns"; empirical Prob(R<VaR) plotted against theoretical Prob(R<VaR), both from 0 to 0.1; daily S&P 500 returns, 1954:1−2010:9.]

Figure 3.4: Backtesting VaR from a GARCH model, assuming normally distributed shocks

Example 3.9 (ES) If $\mu = 8\%$ and $\sigma = 16\%$, the 95% expected shortfall is $\text{ES}_{95\%} =
-0.08 + 0.16\phi(-1.64)/0.05 \approx 0.25$ and the 97.5% expected shortfall is $\text{ES}_{97.5\%} =
-0.08 + 0.16\phi(-1.96)/0.025 \approx 0.29$.

Notice that the expected shortfall for a normally distributed return (3.7) is a strictly
increasing function of the standard deviation (and the variance). Minimizing the expected
shortfall at a given mean return therefore gives the same solution (portfolio weights) as
minimizing the variance at the same given mean return. In other cases, the portfolio
choice will be different (and perhaps complicated to perform).
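
The expected shortfall formula (3.7) and the numbers in Example 3.9 can be sketched as follows, again with `statistics.NormalDist` from the Python standard library. The function name `es_normal` is a hypothetical helper.

```python
from statistics import NormalDist


def es_normal(mu, sigma, alpha):
    """ES_alpha = -mu + sigma*phi(c)/(1 - alpha), c = (1 - alpha) quantile of N(0,1)."""
    std_normal = NormalDist()
    c = std_normal.inv_cdf(1.0 - alpha)
    return -mu + sigma * std_normal.pdf(c) / (1.0 - alpha)


# Inputs from Example 3.9: mu = 8%, sigma = 16%
es_95 = es_normal(0.08, 0.16, 0.95)
es_975 = es_normal(0.08, 0.16, 0.975)
```

Since `es_normal` is linear and increasing in `sigma` for a given `mu`, the sketch also shows why minimizing the expected shortfall at a given mean return is equivalent to minimizing the variance when returns are normally distributed.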

3.2.3 Target Semivariance (Lower Partial 2nd Moment) and Max Drawdown

Reference: Bawa and Lindenberg (1977) and Nantell and Price (1979)

[Figure: "Value at risk and expected shortfall"; density of the return R (in %) with −ES95% and −VaR95% marked in the left tail.]

Figure 3.5: Value at risk and expected shortfall

Using the variance (or standard deviation) as a measure of portfolio risk (as a mean-variance
investor does) fails to distinguish between the downside and upside. As an alternative,
one could consider using a target semivariance (lower partial 2nd moment) instead.
It is defined as

$$\lambda_p(h) = \operatorname{E}[\min(R_p - h, 0)^2], \tag{3.8}$$

where $h$ is a “target level” chosen by the investor. In the subsequent analysis it will be set
equal to the riskfree rate. (It can clearly also be written $\lambda_p(h) = \int_{-\infty}^{h}(R_p - h)^2 f(R_p)\,dR_p$,
where $f(\cdot)$ is the pdf of the portfolio return.)
In comparison with a variance

$$\sigma_p^2 = \operatorname{E}(R_p - \mu_p)^2, \tag{3.9}$$

the target semivariance differs on two accounts: (i) it uses the target level $h$ as a reference
point instead of the mean $\mu_p$; and (ii) only negative deviations from the reference point
are given any weight. See Figure 3.6 for an illustration (based on a normally distributed
variable).
For a normally distributed variable, the target semivariance $\lambda_p(h)$ is increasing in the
standard deviation (for a given mean); see Remark 3.10. See also Figure 3.6 for an
illustration. This means that minimizing $\lambda_p(h)$ at a given mean return gives the same
solution (portfolio weights) as minimizing $\sigma_p$ (or $\sigma_p^2$) at the same given mean return. As
a result, with normally distributed returns, an investor who wants to minimize the lower
partial 2nd moment (at a given mean return) is behaving just like a mean-variance investor.
In other cases, the portfolio choice will be different (and perhaps complicated to perform).
See Figure 3.7 for an illustration.

[Figure, four panels for a N(µ,σ²) variable with µ = 0.08 and σ = 0.16: (i) probability density function (pdf); (ii) contribution to variance, pdf(x)(x−µ)², where Var(x) = area; (iii) contribution to target semivariance, pdf(x)min(x−0.02,0)², where target semivariance(x) = area; (iv) target semivariance as function of σ² for N(0.08,σ²) with target level 0.02.]

Figure 3.6: Target semivariance as a function of mean and standard deviation for a
N(µ,σ²) variable
An alternative measure is the (percentage) maximum drawdown over a given horizon,
for instance, 5 years. This is the largest loss from peak to bottom within the given
horizon; see Figure 3.8. This is a useful measure when the investor does not know exactly
when he/she has to exit the investment, since it indicates the worst (peak to bottom)
outcome over the sample.
See Figures 3.9–3.10 for an illustration of max drawdown.
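
A minimal Python sketch of the maximum drawdown calculation, assuming a list of strictly positive price (or return index) levels. For simplicity it uses the whole sample as the horizon rather than a rolling 5-year window; the function name and the price path are made up for illustration.

```python
def max_drawdown(prices):
    """Largest percentage fall from a running peak to a later value."""
    if not prices:
        raise ValueError("prices must be non-empty")
    peak = prices[0]
    worst = 0.0
    for p in prices:
        peak = max(peak, p)                    # highest level seen so far
        worst = max(worst, (peak - p) / peak)  # fall relative to that peak
    return worst


# Made-up price path: the peak of 120 is followed by a bottom of 80
prices = [100.0, 120.0, 90.0, 110.0, 80.0, 130.0]
mdd = max_drawdown(prices)
```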

[Figure: "Std and mean"; mean (%) against std (%) for four frontiers: MV (risky), MV (risky&riskfree), target sv (risky), and target sv (risky&riskfree). The markers for target semivariance (sv) indicate the std of the portfolio that minimizes the target semivariance at the given mean return.]

Figure 3.7: Standard deviation and expected returns

[Figure: price plotted against time, with the max drawdown marked as the largest fall from peak to bottom.]

Figure 3.8: Max drawdown

[Figure, two panels for small growth stocks and large value stocks, 1960 to 2000: (i) level of return index; (ii) drawdown compared to earlier peak (in 5−year window), %.]

Figure 3.9: Max drawdown

Remark 3.10 (Target semivariance calculation for normally distributed variable) For
an $N(\mu,\sigma^2)$ variable, target semivariance around the target level $h$ is

$$\lambda_p(h) = \sigma^2 a\phi(a) + \sigma^2(a^2 + 1)\Phi(a), \text{ where } a = (h - \mu)/\sigma,$$

where $\phi(\cdot)$ and $\Phi(\cdot)$ are the pdf and cdf of a $N(0,1)$ variable respectively. Notice that
$\lambda_p(h) = \sigma^2/2$ for $h = \mu$. See Figure 3.6 for a numerical illustration. It is straightforward
(but a bit tedious) to show that

$$\frac{\partial\lambda_p(h)}{\partial\sigma} = 2\sigma\Phi(a),$$

so the target semivariance is a strictly increasing function of the standard deviation.
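
The closed form in Remark 3.10 can be compared with a direct numerical integration of the definition (3.8). The sketch below uses only the Python standard library; the midpoint rule, the lower integration limit of ten standard deviations below the mean, and the function names are choices made for this illustration.

```python
from statistics import NormalDist


def target_semivariance_normal(mu, sigma, h):
    """Closed form for a N(mu, sigma^2) variable, with a = (h - mu)/sigma."""
    a = (h - mu) / sigma
    std_normal = NormalDist()
    return sigma ** 2 * a * std_normal.pdf(a) + sigma ** 2 * (a ** 2 + 1) * std_normal.cdf(a)


def target_semivariance_numeric(mu, sigma, h, n=20000):
    """Midpoint-rule integral of (x - h)^2 f(x) from far in the left tail up to h."""
    dist = NormalDist(mu, sigma)
    lower = mu - 10.0 * sigma          # truncation point, assumed below h
    step = (h - lower) / n
    total = 0.0
    for i in range(n):
        x = lower + (i + 0.5) * step
        total += (x - h) ** 2 * dist.pdf(x) * step
    return total


# Parameters as in Figure 3.6: mu = 0.08, sigma = 0.16, target level 0.02
closed_form = target_semivariance_normal(0.08, 0.16, 0.02)
numeric = target_semivariance_numeric(0.08, 0.16, 0.02)
```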

See Table 3.2 for an empirical comparison of the different risk measures.

             Std    VaR (95%)   ES (95%)   SemiStd   Drawdown
Std          1.00     0.96        0.97       0.96      0.67
VaR (95%)    0.96     1.00        0.98       0.99      0.64
ES (95%)     0.97     0.98        1.00       0.99      0.64
SemiStd      0.96     0.99        0.99       1.00      0.66
Drawdown     0.67     0.64        0.64       0.66      1.00

Table 3.2: Correlation of risk measures across monthly returns of the 25 FF portfolios
(%), US data 1957:1-2009:12.

[Figure, two panels for MSCI world, CT hedge funds, and global govt bonds, 1995 to 2010: (i) level of return index; (ii) drawdown compared to earlier peak (since start of sample), %.]

Figure 3.10: Max drawdown

3.3 Empirical Return Distributions

Are returns normally distributed? Mostly not, but it depends on the asset type and on the
data frequency. Options returns typically have very non-normal distributions (in particular,
since the return is −100% on many expiration days). Stock returns are typically
distinctly non-normal at short horizons, but can look somewhat normal at longer horizons.
To assess the normality of returns, the usual econometric techniques (Bera–Jarque
and Kolmogorov-Smirnov tests) are useful, but a visual inspection of the histogram and a
QQ-plot also give useful clues. See Figures 3.11–3.13 for illustrations.

[Figure, three histogram panels: (i) daily returns, full; (ii) daily returns, zoomed in vertically; (iii) daily returns, zoomed in horizontally. Axes: number of days against daily excess return, %. Daily S&P 500 returns, 1957:1−2010:9. The solid line is an estimated normal distribution.]

Figure 3.11: Distribution of daily S&P returns

Remark 3.11 (Reading a QQ plot) A QQ plot is a way to assess if the empirical distribution
conforms reasonably well to a prespecified theoretical distribution, for instance,
a normal distribution where the mean and variance have been estimated from the data.
Each point in the QQ plot shows a specific percentile (quantile) according to the empirical
as well as according to the theoretical distribution. For instance, if the 2nd percentile
(0.02 quantile) is at -10 in the empirical distribution, but at only -3 in the theoretical
distribution, then this indicates that the two distributions have fairly different left tails.
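
The points of a QQ plot against a fitted normal distribution can be computed as in the following Python sketch. The empirical quantile rule (an order statistic, without interpolation), the function name, and the data are assumptions made for illustration.

```python
import math
from statistics import NormalDist, fmean, stdev


def qq_points(data, probs):
    """Pairs (theoretical quantile, empirical quantile) for a fitted normal."""
    xs = sorted(data)
    n = len(xs)
    fitted = NormalDist(fmean(xs), stdev(xs))
    points = []
    for p in probs:
        # Empirical quantile: the ceil(p*n)-th smallest observation
        idx = min(n - 1, max(0, math.ceil(p * n) - 1))
        points.append((fitted.inv_cdf(p), xs[idx]))
    return points


# Made-up data with one large negative outlier (illustration only)
data = [-10.0, -1.0, -0.5, 0.0, 0.3, 0.5, 0.8, 1.0, 1.2, 1.5]
points = qq_points(data, [0.1, 0.5, 0.9])
```

In this made-up sample the empirical 0.1 quantile lies well below the theoretical one, which is the pattern of a heavy left tail described in the remark.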

There is one caveat to this way of studying data: it only provides evidence on the
unconditional distribution. For instance, nothing rules out the possibility that we could
estimate a model for time-varying volatility (for instance, a GARCH model) of the returns
and thus generate a description for how the VaR changes over time. However, data with
time varying volatility will typically not have an unconditional normal distribution.

[Figure: "QQ plot of daily S&P 500 returns"; empirical quantiles against quantiles from estimated N(µ,σ²), %, for the 0.1th to 99.9th percentiles. Daily S&P 500 returns, 1957:1−2010:9.]

Figure 3.12: Quantiles of daily S&P returns

3.4 Threshold Exceedance

Reference: McNeil, Frey, and Embrechts (2005) 7


In risk control, the focus is the distribution of losses beyond some threshold level.
This has three direct implications. First, the object under study is the loss

$$X = -R, \tag{3.10}$$

that is, the negative of the return. Second, the attention is on what the distribution looks
like beyond a threshold and also on the probability of exceeding this threshold. In contrast,
the exact shape of the distribution below that point is typically disregarded. Third,
modelling the tail of the distribution is best done by using a distribution that allows for a
much heavier tail than suggested by a normal distribution. The generalized Pareto (GP)
distribution is often used. See Figure 3.14 for an illustration.

[Figure, three QQ plot panels for daily, weekly, and monthly returns: empirical quantiles against quantiles from N(µ,σ²), %. Circles denote 0.1th to 99.9th percentiles. Daily S&P 500 returns, 1957:1−2010:9.]

Figure 3.13: Distribution of S&P returns (different horizons)

[Figure: density of the loss with a threshold u; below u: "90% probability mass, unknown shape"; above u: "generalized Pareto dist".]

Figure 3.14: Loss distribution

Remark 3.12 (Cdf and pdf of the generalized Pareto distribution) The generalized Pareto
distribution is described by a scale parameter ($\beta > 0$) and a shape parameter ($\xi$). The
cdf ($\Pr(Z \le z)$, where $Z$ is the random variable and $z$ is a value) is

$$G(z) = \begin{cases} 1 - (1 + \xi z/\beta)^{-1/\xi} & \text{if } \xi \ne 0 \\ 1 - \exp(-z/\beta) & \xi = 0, \end{cases}$$

for $0 \le z$ if $\xi \ge 0$ and $z \le -\beta/\xi$ in case $\xi < 0$. The pdf is therefore

$$g(z) = \begin{cases} \frac{1}{\beta}(1 + \xi z/\beta)^{-1/\xi - 1} & \text{if } \xi \ne 0 \\ \frac{1}{\beta}\exp(-z/\beta) & \xi = 0. \end{cases}$$

The mean is defined (finite) if $\xi < 1$ and is then $\operatorname{E}(z) = \beta/(1 - \xi)$. Similarly, the variance
is finite if $\xi < 1/2$ and is then $\operatorname{Var}(z) = \beta^2/[(1 - \xi)^2(1 - 2\xi)]$. See Figure 3.15 for an
illustration.
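
The cdf and pdf in Remark 3.12 translate directly into code. The following Python sketch (standard library only, hypothetical function names) handles the three cases $\xi > 0$, $\xi = 0$ and $\xi < 0$; the treatment of values outside the support is an implementation choice made here.

```python
import math


def gpd_cdf(z, xi, beta):
    """Cdf G(z) of the generalized Pareto distribution with scale beta > 0."""
    if z <= 0.0:
        return 0.0
    if xi == 0.0:
        return 1.0 - math.exp(-z / beta)
    if xi < 0.0 and z >= -beta / xi:
        return 1.0  # beyond the upper end of the support when xi < 0
    return 1.0 - (1.0 + xi * z / beta) ** (-1.0 / xi)


def gpd_pdf(z, xi, beta):
    """Pdf g(z) of the generalized Pareto distribution with scale beta > 0."""
    if z < 0.0:
        return 0.0
    if xi == 0.0:
        return math.exp(-z / beta) / beta
    if xi < 0.0 and z >= -beta / xi:
        return 0.0
    return (1.0 + xi * z / beta) ** (-1.0 / xi - 1.0) / beta


# Values at z = 0.1 for beta = 0.15 and xi = 0.25 (parameters as in Figure 3.15)
cdf_value = gpd_cdf(0.1, 0.25, 0.15)
pdf_value = gpd_pdf(0.1, 0.25, 0.15)
```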

Remark 3.13 (Random number from a generalized Pareto distribution) By inverting the
cdf, we can notice that if $u$ is uniformly distributed on $(0,1]$, then we can construct random
variables with a GPD by

$$z = \begin{cases} \frac{\beta}{\xi}\left[(1 - u)^{-\xi} - 1\right] & \text{if } \xi \ne 0 \\ -\ln(1 - u)\beta & \xi = 0. \end{cases}$$
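
A minimal Python sketch of the inverse-cdf method in Remark 3.13. It draws uniform numbers with `random.Random`, whose `random()` method returns values in [0, 1), so that $1 - u$ stays strictly positive; the function names, the seed, and the parameter values are assumptions made for illustration.

```python
import math
import random


def gpd_inverse_cdf(u, xi, beta):
    """Value z such that G(z) = u, for 0 <= u < 1."""
    if xi == 0.0:
        return -math.log(1.0 - u) * beta
    return beta * ((1.0 - u) ** (-xi) - 1.0) / xi


def gpd_draws(n, xi, beta, seed=0):
    """n pseudo-random draws from a generalized Pareto distribution."""
    rng = random.Random(seed)
    # rng.random() lies in [0, 1), so 1 - u is never zero
    return [gpd_inverse_cdf(rng.random(), xi, beta) for _ in range(n)]


draws = gpd_draws(1000, 0.25, 0.15)
```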

Consider the loss $X$ (the negative of the return) and let $u$ be a threshold. Assume
that the threshold exceedance ($X - u$) has a generalized Pareto distribution. Let $P_u$ be
the probability of $X \le u$. Then, the cdf of the loss for values greater than the threshold
($\Pr(X \le x)$ for $x > u$) can be written

$$F(x) = P_u + G(x - u)(1 - P_u), \text{ for } x > u, \tag{3.11}$$

where $G(z)$ is the cdf of the generalized Pareto distribution. Notice that the cdf value is
$P_u$ at $x = u$ (or just slightly above $u$), and that it becomes one as $x$ goes to infinity.
Clearly, the pdf is

$$f(x) = g(x - u)(1 - P_u), \text{ for } x > u, \tag{3.12}$$

where $g(z)$ is the pdf of the generalized Pareto distribution. Notice that integrating the
pdf from $x = u$ to infinity shows that the probability mass of $X$ above $u$ is $1 - P_u$. Since
the probability mass below $u$ is $P_u$, it adds up to unity (as it should). See Figure 3.17 for
an illustration.

[Figure: "Pdf of generalized Pareto distribution (β = 0.15)"; pdf for ξ = 0, ξ = 0.25, and ξ = 0.45.]

Figure 3.15: Generalized Pareto distributions
It is often useful to calculate the tail probability $\Pr(X > x)$, which in the case of the cdf in
(3.11) is

$$1 - F(x) = (1 - P_u)[1 - G(x - u)], \tag{3.13}$$

where $G(z)$ is the cdf of the generalized Pareto distribution.


The $\text{VaR}_\alpha$ (say, $\alpha = 0.95$) is the $\alpha$-th quantile of the loss distribution

$$\text{VaR}_\alpha = \text{cdf}_X^{-1}(\alpha), \tag{3.14}$$

where $\text{cdf}_X^{-1}(\cdot)$ is the inverse cumulative distribution function of the losses, so $\text{cdf}_X^{-1}(\alpha)$
is the $\alpha$ quantile of the loss distribution. For instance, $\text{VaR}_{95\%}$ is the 0.95 quantile of the
loss distribution. This clearly means that the probability of the loss to be less than $\text{VaR}_\alpha$
equals $\alpha$

$$\Pr(X \le \text{VaR}_\alpha) = \alpha. \tag{3.15}$$

(Equivalently, $\Pr(X > \text{VaR}_\alpha) = 1 - \alpha$.)


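Assuming $\alpha$ is higher than $P_u$ (so $\text{VaR}_\alpha \ge u$), the cdf (3.11) together with the form
of the generalized Pareto distribution give

$$\text{VaR}_\alpha = \begin{cases} u + \frac{\beta}{\xi}\left[\left(\frac{1 - \alpha}{1 - P_u}\right)^{-\xi} - 1\right] & \text{if } \xi \ne 0 \\ u - \beta\ln\left(\frac{1 - \alpha}{1 - P_u}\right) & \xi = 0 \end{cases}, \text{ for } \alpha \ge P_u. \tag{3.16}$$

[Figure: "Loss distributions, Pr(loss>12) = 10%"; tail densities against loss (−R), %, for N(0.08,0.16²) and a generalized Pareto (ξ=0.22,β=0.16). Reported in the figure: normal dist VaR 18.2 and ES 25.3; GP dist VaR 24.5 and ES 48.4.]

Figure 3.16: Comparison of a normal and a generalized Pareto distribution for the tail of
losses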

Proof. (of (3.16)) Set $F(x) = \alpha$ in (3.11) and use $z = x - u$ in the cdf from Remark
3.12 and solve for $x$.
If we assume $\xi < 1$ (to make sure that the mean is finite), then straightforward integration
using (3.12) shows that the expected shortfall is

$$\text{ES}_\alpha = \operatorname{E}(X \mid X \ge \text{VaR}_\alpha) = \frac{\text{VaR}_\alpha}{1 - \xi} + \frac{\beta - \xi u}{1 - \xi}, \text{ for } \alpha > P_u \text{ and } \xi < 1. \tag{3.17}$$
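
The formulas (3.11), (3.16) and (3.17) can be combined in a short Python sketch. To keep it brief it covers only $\xi \ge 0$; the function names and the parameter values (with losses measured in percent) are assumptions made for illustration and are not estimates from these notes.

```python
import math


def gpd_cdf(z, xi, beta):
    """Generalized Pareto cdf for z >= 0 (case xi >= 0 only, to keep the sketch short)."""
    if xi == 0.0:
        return 1.0 - math.exp(-z / beta)
    return 1.0 - (1.0 + xi * z / beta) ** (-1.0 / xi)


def loss_cdf(x, u, p_u, xi, beta):
    """F(x) = P_u + G(x - u)(1 - P_u) for x > u, as in (3.11)."""
    return p_u + gpd_cdf(x - u, xi, beta) * (1.0 - p_u)


def var_gpd(alpha, u, p_u, xi, beta):
    """VaR_alpha from (3.16), valid for alpha >= P_u."""
    ratio = (1.0 - alpha) / (1.0 - p_u)
    if xi == 0.0:
        return u - beta * math.log(ratio)
    return u + beta / xi * (ratio ** (-xi) - 1.0)


def es_gpd(alpha, u, p_u, xi, beta):
    """ES_alpha from (3.17), valid for alpha > P_u and xi < 1."""
    var_alpha = var_gpd(alpha, u, p_u, xi, beta)
    return var_alpha / (1.0 - xi) + (beta - xi * u) / (1.0 - xi)


# Illustrative (assumed) parameters, losses measured in percent
u, p_u, xi, beta = 12.0, 0.90, 0.22, 16.0
var_95 = var_gpd(0.95, u, p_u, xi, beta)
es_95 = es_gpd(0.95, u, p_u, xi, beta)
```

A simple consistency check is that `loss_cdf(var_95, u, p_u, xi, beta)` returns 0.95, that is, the VaR is indeed the 0.95 quantile of the loss distribution.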
Let $\nu = \text{VaR}_\alpha$ and then subtract $\nu$ from both sides of the expected shortfall to get the
expected exceedance of the loss over another threshold $\nu > u$

$$e(\nu) = \operatorname{E}(X - \nu \mid X > \nu) = \frac{\xi\nu}{1 - \xi} + \frac{\beta - \xi u}{1 - \xi}, \text{ for } \nu > u \text{ and } \xi < 1. \tag{3.18}$$

The expected exceedance of a generalized Pareto distribution (with $\xi > 0$) is increasing
with the threshold level $\nu$. This indicates that the tail of the distribution is very long. In
contrast, a normal distribution would typically show a negative relation (see Figure 3.17
for an illustration). This provides a way of assessing which distribution best fits the
tail of the historical histogram.

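Remark 3.14 (Expected exceedance from a normal distribution) If $X \sim N(\mu,\sigma^2)$, then

$$\operatorname{E}(X - \nu \mid X > \nu) = \mu + \sigma\frac{\phi(\nu_0)}{1 - \Phi(\nu_0)} - \nu, \text{ with } \nu_0 = (\nu - \mu)/\sigma,$$

where $\phi(\cdot)$ and $\Phi$ are the pdf and cdf of a $N(0,1)$ variable respectively.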

The expected exceedance over $\nu$ is often compared with an empirical estimate of the
same thing: the mean of $X_t - \nu$ for those observations where $X_t > \nu$

$$\hat{e}(\nu) = \frac{\sum_{t=1}^{T}(X_t - \nu)\delta(X_t > \nu)}{\sum_{t=1}^{T}\delta(X_t > \nu)}, \text{ where } \delta(q) = \begin{cases} 1 & \text{if } q \text{ is true} \\ 0 & \text{else.} \end{cases} \tag{3.19}$$

If it is found that $\hat{e}(\nu)$ is increasing (more or less) linearly with the threshold level ($\nu$),
then it is reasonable to model the tail of the distribution from that point as a generalized
Pareto distribution.
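
The empirical expected exceedance (3.19) is just a conditional sample mean, as in the following Python sketch. The function name and the loss data are made up for illustration.

```python
def mean_excess(losses, threshold):
    """Empirical expected exceedance: mean of (X_t - threshold) over X_t > threshold."""
    excesses = [x - threshold for x in losses if x > threshold]
    if not excesses:
        raise ValueError("no observations above the threshold")
    return sum(excesses) / len(excesses)


# Made-up losses, evaluated at three threshold levels
losses = [0.5, 1.0, 2.0, 3.0, 4.0]
e_hat = [mean_excess(losses, v) for v in (1.0, 2.0, 3.0)]
```

Plotting such values against the threshold levels gives the kind of diagram used to judge whether the relation looks linear.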
The estimation of the parameters of the distribution ($\xi$ and $\beta$) is typically done by
maximum likelihood. Alternatively, a comparison of the empirical exceedance (3.19)
with the theoretical (3.18) can help. Suppose we calculate the empirical exceedance for
different values of the threshold level (denoted $\nu_i$, all large enough so the relation looks
linear), then we can estimate (by LS)

$$\hat{e}(\nu_i) = a + b\nu_i + \varepsilon_i. \tag{3.20}$$

Then, the theoretical exceedance (3.18) for a given starting point of the GPD $u$ is related
to this regression according to

$$a = \frac{\beta - \xi u}{1 - \xi} \text{ and } b = \frac{\xi}{1 - \xi}, \text{ or } \xi = \frac{b}{1 + b} \text{ and } \beta = a(1 - \xi) + \xi u. \tag{3.21}$$
See Figure 3.18 for an illustration.
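
The mapping from the LS regression (3.20) to the GPD parameters in (3.21) can be sketched in Python as follows. The function name is hypothetical, and the check uses exact theoretical exceedances from (3.18) with assumed parameter values rather than real data.

```python
def gpd_from_mean_excess(thresholds, e_hat, u):
    """LS fit of e_hat = a + b*threshold, then (xi, beta) from (3.21)."""
    n = len(thresholds)
    mean_v = sum(thresholds) / n
    mean_e = sum(e_hat) / n
    sxx = sum((v - mean_v) ** 2 for v in thresholds)
    sxy = sum((v - mean_v) * (e - mean_e) for v, e in zip(thresholds, e_hat))
    b = sxy / sxx                  # LS slope
    a = mean_e - b * mean_v        # LS intercept
    xi = b / (1.0 + b)
    beta = a * (1.0 - xi) + xi * u
    return xi, beta


# Check on exact theoretical values from (3.18), with assumed parameters
xi_true, beta_true, u = 0.25, 0.5, 1.3
thresholds = [1.5, 2.0, 2.5, 3.0]
e_theory = [(xi_true * v + beta_true - xi_true * u) / (1.0 - xi_true) for v in thresholds]
xi_hat, beta_hat = gpd_from_mean_excess(thresholds, e_theory, u)
```

With real data, `e_theory` would be replaced by the empirical exceedances from (3.19), and the estimates would of course not be exact.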

[Figure: "Expected exceedance (loss minus threshold, v)" against threshold v, %, for N(0.08,0.16²) and a generalized Pareto (ξ=0.22,β=0.16,u=12).]

Figure 3.17: Expected exceedance, normal and generalized Pareto distribution

Remark 3.15 (Log likelihood function of the loss distribution) Since we have assumed
that the threshold exceedance ($X - u$) has a generalized Pareto distribution, Remark 3.12
shows that the log likelihood for the observation of the loss above the threshold ($X_t > u$)
is

$$\ln L = \sum_{t \text{ s.t. } X_t > u} \ln L_t$$

$$\ln L_t = \begin{cases} -\ln\beta - (1/\xi + 1)\ln[1 + \xi(X_t - u)/\beta] & \text{if } \xi \ne 0 \\ -\ln\beta - (X_t - u)/\beta & \xi = 0. \end{cases}$$

This allows us to estimate $\xi$ and $\beta$ by maximum likelihood. Typically, $u$ is not estimated,
but imposed a priori (based on the expected exceedance).
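
The log likelihood in Remark 3.15 can be coded as in the Python sketch below, which could then be handed to a numerical optimizer (not shown). The function name, the treatment of parameter values outside the support (returning minus infinity), and the data are assumptions made for illustration.

```python
import math


def gpd_loglik(losses, u, xi, beta):
    """Sum of ln L_t over the observations with X_t > u."""
    total = 0.0
    for x in losses:
        if x <= u:
            continue  # only losses above the threshold contribute
        z = x - u
        if xi == 0.0:
            total += -math.log(beta) - z / beta
        else:
            arg = 1.0 + xi * z / beta
            if arg <= 0.0:
                return -math.inf  # observation outside the support
            total += -math.log(beta) - (1.0 / xi + 1.0) * math.log(arg)
    return total


# Made-up losses; only 2.0 and 3.0 exceed the threshold u = 1.0
losses = [0.5, 2.0, 3.0]
loglik_exponential = gpd_loglik(losses, 1.0, 0.0, 1.0)
```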

[Figure, three panels for daily S&P 500 returns, 1957:1−2010:4: (i) expected exceedance (loss minus threshold, v) against threshold v, %, for the 50th to 99th percentiles, annotated u = 1.3, ξ = 0.29, β = 0.51; (ii) estimated loss distribution against loss, %, annotated u = 1.3, Pr(loss>u) = 6.6%, ξ = 0.24, β = 0.56; (iii) QQ plot, empirical quantiles against quantiles from estimated GPD, %, for the 94th to 99th percentiles.]

Figure 3.18: Results from S&P 500 data

Example 3.16 (Estimation of the generalized Pareto distribution on S&P daily returns)
Figure 3.18 (upper left panel) shows that it may be reasonable to fit a GP distribution
with a threshold $u = 1.3$. The upper right panel illustrates the estimated distribution,
while the lower left panel shows that the highest quantiles are well captured by the estimated
distribution.

Bibliography
Bawa, V. S., and E. B. Lindenberg, 1977, “Capital market equilibrium in a mean-lower
partial moment framework,” Journal of Financial Economics, 5, 189–200.

Fabozzi, F. J., S. M. Focardi, and P. N. Kolm, 2006, Financial modeling of the equity
market, Wiley Finance.

Hull, J. C., 2006, Options, futures, and other derivatives, Prentice-Hall, Upper Saddle
River, NJ, 6th edn.

McDonald, R. L., 2006, Derivatives markets, Addison-Wesley, 2nd edn.

McNeil, A. J., R. Frey, and P. Embrechts, 2005, Quantitative risk management, Princeton
University Press.

Nantell, T. J., and B. Price, 1979, “An analytical comparison of variance and semivariance
capital market theories,” Journal of Financial and Quantitative Analysis, 14, 221–242.

4 CAPM
Reference: Elton, Gruber, Brown, and Goetzmann (2010) 10 and 13
Additional references: Danthine and Donaldson (2002) 6
More advanced material is denoted by a star (*). It is not required reading.

4.1 Portfolio Choice with Mean-Variance Utility

It is well known that mean-variance preferences (and several other cases) imply that the
optimal portfolio is a mix of the riskfree asset and the tangency portfolio (a portfolio of
risky assets only) that is located at the point where the ray from the riskfree rate is tangent
to the mean-variance frontier of risky assets only. See Figure 4.1 for an example. The
purpose of this section is to derive a formula for the tangency portfolio.

[Figure: "Utility contours, E(Rp) − (k/2)Var(Rp)"; iso-utility curves in mean against std space for k = 5, k = 7, and k = 9. Covariance matrix: (0.0256, 0.0000; 0.0000, 0.0144).]

Figure 4.1: Iso-utility curves, mean-variance utility
4.1.1 A Risky Asset and a Riskfree Asset

The simplest case with one risky asset (stock market index, say) and one riskfree (T-bill,
say) can be used to demonstrate the importance of expected returns, volatility, and risk
aversion.
Suppose there is one risky asset and a riskfree asset. An investor with initial wealth
equal (to simplify the notation) to unity chooses the portfolio weight $v$ (of the risky asset)
to maximize

$$\operatorname{E}U(R_p) = \operatorname{E}(R_p) - \frac{k}{2}\operatorname{Var}(R_p), \text{ where} \tag{4.1}$$

$$R_p = vR_1 + (1 - v)R_f = vR_1^e + R_f. \tag{4.2}$$

The returns $R_i$ could be gross returns (for instance, 1.05) or net returns (for instance,
0.05).
Use the budget constraint in the objective function to get (using the fact that $R_f$ is
known)

$$\operatorname{E}U(R_p) = \operatorname{E}(vR_1^e + R_f) - \frac{k}{2}\operatorname{Var}(vR_1^e + R_f) = v\mu_1^e + R_f - \frac{k}{2}v^2\sigma_{11}, \tag{4.3}$$

where $\sigma_{11}$ denotes the variance of the risky asset.
The first order condition for an optimum is

$$0 = \partial\operatorname{E}U(R_p)/\partial v = \mu_1^e - kv\sigma_{11}, \tag{4.4}$$

so the optimal portfolio weight of the risky asset is

$$v = \frac{1}{k}\frac{\mu_1^e}{\sigma_{11}}. \tag{4.5}$$
The weight on the risky asset is increasing in the expected excess return of the risky asset,
but decreasing in the risk aversion and variance.
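
A small numerical illustration of (4.3) and (4.5) in Python. The function names and the input values (excess return 8%, standard deviation 16%, riskfree rate 1%, risk aversion 15) are assumptions made for this sketch.

```python
def optimal_risky_weight(mu_e, variance, k):
    """Optimal weight on the risky asset, v = (1/k) * mu_e / sigma_11, as in (4.5)."""
    return mu_e / (k * variance)


def expected_utility(v, mu_e, variance, k, rf):
    """Objective function (4.3): v*mu_e + Rf - (k/2)*v^2*sigma_11."""
    return v * mu_e + rf - 0.5 * k * v ** 2 * variance


# Assumed inputs (illustration only)
mu_e, variance, k, rf = 0.08, 0.16 ** 2, 15.0, 0.01
v_opt = optimal_risky_weight(mu_e, variance, k)
u_opt = expected_utility(v_opt, mu_e, variance, k, rf)
```

Evaluating `expected_utility` at weights slightly above or below `v_opt` gives lower values, in line with the first order condition (4.4).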

4.1.2 Two Risky Assets and a Riskfree Asset

With two risky assets, we can analyze the effect of correlations of returns.
We now go through the same steps for the case with two risky assets and a riskfree
asset. An investor (with initial wealth equal to unity) chooses the portfolio weights ($v_i$)
to maximize

$$\operatorname{E}U(R_p) = \operatorname{E}(R_p) - \frac{k}{2}\operatorname{Var}(R_p), \text{ where} \tag{4.6}$$

$$R_p = v_1R_1 + v_2R_2 + (1 - v_1 - v_2)R_f = v_1R_1^e + v_2R_2^e + R_f. \tag{4.7}$$

Combining gives

$$\operatorname{E}U(R_p) = \operatorname{E}(v_1R_1^e + v_2R_2^e + R_f) - \frac{k}{2}\operatorname{Var}(v_1R_1^e + v_2R_2^e + R_f)
= v_1\mu_1^e + v_2\mu_2^e + R_f - \frac{k}{2}\left(v_1^2\sigma_{11} + v_2^2\sigma_{22} + 2v_1v_2\sigma_{12}\right), \tag{4.8}$$

where $\sigma_{12}$ denotes the covariance of asset 1 and 2.
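The first order conditions (for $v_1$ and $v_2$) are that the partial derivatives equal zero

$$0 = \partial\operatorname{E}U(R_p)/\partial v_1 = \mu_1^e - \frac{k}{2}(2v_1\sigma_{11} + 2v_2\sigma_{12}) \tag{4.9}$$

$$0 = \partial\operatorname{E}U(R_p)/\partial v_2 = \mu_2^e - \frac{k}{2}(2v_2\sigma_{22} + 2v_1\sigma_{12}), \text{ or} \tag{4.10}$$

$$\begin{bmatrix} 0 \\ 0 \end{bmatrix} = \begin{bmatrix} \mu_1^e \\ \mu_2^e \end{bmatrix} - k\begin{bmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{12} & \sigma_{22} \end{bmatrix}\begin{bmatrix} v_1 \\ v_2 \end{bmatrix}, \tag{4.11}$$

$$0_{2\times1} = \mu^e - k\Sigma v. \tag{4.12}$$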

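We can solve this linear system of equations as

$$\begin{bmatrix} v_1 \\ v_2 \end{bmatrix} = \frac{1}{k}\frac{1}{\sigma_{11}\sigma_{22} - \sigma_{12}^2}\begin{bmatrix} \sigma_{22}\mu_1^e - \sigma_{12}\mu_2^e \\ -\sigma_{12}\mu_1^e + \sigma_{11}\mu_2^e \end{bmatrix} \tag{4.13}$$

$$= \frac{1}{k}\frac{1}{\sigma_{11}\sigma_{22} - \sigma_{12}^2}\begin{bmatrix} \sigma_{22} & -\sigma_{12} \\ -\sigma_{12} & \sigma_{11} \end{bmatrix}\begin{bmatrix} \mu_1^e \\ \mu_2^e \end{bmatrix} \tag{4.14}$$

$$= \frac{1}{k}\Sigma^{-1}\mu^e, \tag{4.15}$$

where $\Sigma$ is the covariance matrix and $\mu^e$ the vector of excess returns.

[Figure, two panels plus a table: (i) MV utility, 2 risky assets, as a function of the weights v1 and v2; (ii) MV frontier, mean against std. Riskfree rate: 0.01. Mean returns: 0.09 0.06. Covariance matrix: (0.026, 0.000; 0.000, 0.014). Weights on risky assets and riskfree: optimal with k=15: 0.21 0.23 0.56; tangency portfolio: 0.47 0.53 0.00.]

Figure 4.2: Choice of portfolio weights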


Notice that the denominator ($\sigma_{11}\sigma_{22} - \sigma_{12}^2$) is positive, since correlations are between
$-1$ and 1. Since $k > 0$, we have

$$v_1 > 0 \text{ if } \sigma_{22}\mu_1^e > \sigma_{12}\mu_2^e. \tag{4.16}$$

Use the fact that $\sigma_{12} = \rho\sigma_1\sigma_2$ where $\rho$ is the correlation coefficient to rewrite as

$$v_1 > 0 \text{ if } \mu_1^e/\sigma_1 > \rho\mu_2^e/\sigma_2, \text{ and} \tag{4.17}$$

$$v_2 > 0 \text{ if } \mu_2^e/\sigma_2 > \rho\mu_1^e/\sigma_1. \tag{4.18}$$

This provides a simple way to assess if an asset should be held (in positive amounts): if its
Sharpe ratio exceeds the correlation times the Sharpe ratio of the other asset. For instance,
both portfolio weights are positive if the correlation is zero and both excess returns are
positive.
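
The solution (4.13) is easy to evaluate directly. The Python sketch below uses the inputs reported in Figure 4.2 (riskfree rate 0.01, mean returns 0.09 and 0.06, $k = 15$), with the covariance matrix assumed to have the unrounded values $0.16^2$ and $0.12^2$ on the diagonal; the function name is a hypothetical helper.

```python
def optimal_weights_two_assets(mu_e, cov, k):
    """v = (1/k) Sigma^{-1} mu^e for two risky assets, written out as in (4.13)."""
    (s11, s12), (_, s22) = cov
    det = s11 * s22 - s12 ** 2
    v1 = (s22 * mu_e[0] - s12 * mu_e[1]) / (k * det)
    v2 = (-s12 * mu_e[0] + s11 * mu_e[1]) / (k * det)
    return v1, v2


# Excess returns = mean returns minus the riskfree rate
mu_e = (0.09 - 0.01, 0.06 - 0.01)
cov = ((0.0256, 0.0), (0.0, 0.0144))
v1, v2 = optimal_weights_two_assets(mu_e, cov, k=15.0)
riskfree_weight = 1.0 - v1 - v2
```

Rounded to two decimals, the weights agree with the "optimal with k=15" row of Figure 4.2.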

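If $v_1 + v_2 \ne 0$, then we can define the weights in the “subportfolio” of risky assets
only as $w_i = v_i/(v_1 + v_2)$. This gives

$$\begin{bmatrix} w_1 \\ w_2 \end{bmatrix} = \begin{bmatrix} \sigma_{22}\mu_1^e - \sigma_{12}\mu_2^e \\ -\sigma_{12}\mu_1^e + \sigma_{11}\mu_2^e \end{bmatrix}\frac{1}{\sigma_{22}\mu_1^e + \sigma_{11}\mu_2^e - (\mu_2^e + \mu_1^e)\sigma_{12}} \tag{4.19}$$

$$= \begin{bmatrix} \sigma_{22} & -\sigma_{12} \\ -\sigma_{12} & \sigma_{11} \end{bmatrix}\begin{bmatrix} \mu_1^e \\ \mu_2^e \end{bmatrix}\frac{1}{\sigma_{22}\mu_1^e + \sigma_{11}\mu_2^e - (\mu_2^e + \mu_1^e)\sigma_{12}} \tag{4.20}$$

$$= \Sigma^{-1}\mu^e/\mathbf{1}'\Sigma^{-1}\mu^e, \tag{4.21}$$

where $\mathbf{1}$ is a vector of ones.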


This is the tangency portfolio (where the ray from $R_f$ in the $\sigma_p \times \operatorname{E}R_p$ space is tangent
to the minimum-variance set). It has the highest Sharpe ratio, $\mu_p^e/\sigma_p$, of all portfolios on
the minimum-variance set. Note that all investors (different $k$, but same expectations)
hold a mix of this portfolio and the riskfree asset (this follows from the fact that the
weights in (4.13) are scaled by $1/k$). This two-fund separation theorem is very useful.
Consider the simple case when the assets are uncorrelated ($\sigma_{12} = 0$), then (4.19)
becomes

$$\begin{bmatrix} w_1 \\ w_2 \end{bmatrix} = \begin{bmatrix} \sigma_{22}\mu_1^e \\ \sigma_{11}\mu_2^e \end{bmatrix}\frac{1}{\sigma_{22}\mu_1^e + \sigma_{11}\mu_2^e}. \tag{4.22}$$

Results: (i) if both excess returns are positive, then the weight on asset 1 increases if
$\mu_1^e$ increases or $\sigma_{11}$ decreases; (ii) both weights are positive if the excess returns are.
Both results are quite intuitive since the investor likes high expected returns, but dislikes
variance.

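Example 4.1 (Tangency portfolio, numerical) When $(\mu_1^e, \mu_2^e) = (0.08, 0.05)$, the correlation
is zero, and $(\sigma_{11}, \sigma_{22}) = (0.16^2, 0.12^2)$, then (4.22) gives

$$\begin{bmatrix} w_1 \\ w_2 \end{bmatrix} = \begin{bmatrix} 0.47 \\ 0.53 \end{bmatrix}.$$

When $\mu_1^e$ increases from 0.08 to 0.12, then we get

$$\begin{bmatrix} w_1 \\ w_2 \end{bmatrix} = \begin{bmatrix} 0.57 \\ 0.43 \end{bmatrix}.$$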

Now, consider another simple case, where both variances are the same, but the correlation
is non-zero ($\sigma_{11} = \sigma_{22} = 1$ as a normalization, $\sigma_{12} = \rho$). Then (4.19) becomes

$$\begin{bmatrix} w_1 \\ w_2 \end{bmatrix} = \begin{bmatrix} \mu_1^e - \rho\mu_2^e \\ \mu_2^e - \rho\mu_1^e \end{bmatrix}\frac{1}{(\mu_1^e + \mu_2^e)(1 - \rho)}. \tag{4.23}$$

Results: (i) both weights are positive if the returns are negatively correlated ($\rho < 0$)
and both excess returns are positive; (ii) $w_2 < 0$ if $\rho > 0$ and $\mu_1^e$ is considerably higher
than $\mu_2^e$ (so $\mu_2^e < \rho\mu_1^e$). The intuition for the first result is that a negative correlation
means that the assets “hedge” each other (even better than diversification), so the investor
would like to hold both of them to reduce the overall risk. (Unfortunately, most assets
tend to be positively correlated.) The intuition for the second result is that a positive
correlation reduces the gain from holding both assets (they don’t hedge each other, and
there is relatively little diversification to be gained if the correlation is high). On top of
this, asset 1 gives a higher expected return, so it is optimal to sell asset 2 short (essentially
a risky “loan” which allows the investor to buy more of asset 1).

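Example 4.2 (Tangency portfolio, numerical) When $(\mu_1^e, \mu_2^e) = (0.08, 0.05)$, and $\rho =
-0.8$ we get

$$\begin{bmatrix} w_1 \\ w_2 \end{bmatrix} = \begin{bmatrix} 0.51 \\ 0.49 \end{bmatrix}.$$

If, instead, $\rho = 0.8$, then we get

$$\begin{bmatrix} w_1 \\ w_2 \end{bmatrix} = \begin{bmatrix} 1.54 \\ -0.54 \end{bmatrix}.$$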
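
The tangency weights in (4.19) can be computed with a few lines of Python. The sketch below reproduces the first case of Example 4.1 and both cases of Example 4.2; the function name is a hypothetical helper.

```python
def tangency_weights_two_assets(mu_e, cov):
    """Tangency portfolio weights for two risky assets, as in (4.19)."""
    (s11, s12), (_, s22) = cov
    x1 = s22 * mu_e[0] - s12 * mu_e[1]
    x2 = -s12 * mu_e[0] + s11 * mu_e[1]
    total = x1 + x2
    if total == 0.0:
        raise ValueError("weights are not defined when v1 + v2 = 0")
    return x1 / total, x2 / total


mu_e = (0.08, 0.05)
# Example 4.1: uncorrelated assets with variances 0.16^2 and 0.12^2
w_uncorrelated = tangency_weights_two_assets(mu_e, ((0.16 ** 2, 0.0), (0.0, 0.12 ** 2)))
# Example 4.2: unit variances and correlation -0.8 or 0.8
w_negative = tangency_weights_two_assets(mu_e, ((1.0, -0.8), (-0.8, 1.0)))
w_positive = tangency_weights_two_assets(mu_e, ((1.0, 0.8), (0.8, 1.0)))
```

With the high positive correlation, the second weight is negative, which is the short position discussed in the text.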

4.1.3 N Risky Assets and a Riskfree Asset

In the general case with $N$ risky assets and a riskfree asset, the portfolio weights of the
risky assets are

$$v = \frac{1}{k}\Sigma^{-1}\mu^e, \tag{4.24}$$

while the weight on the riskfree asset is $1 - \mathbf{1}'v$. The weights of the tangency portfolio
are therefore

$$w = \Sigma^{-1}\mu^e/\mathbf{1}'\Sigma^{-1}\mu^e. \tag{4.25}$$

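Proof. (of 4.25) The portfolio has the return $R_p = v'R + (1 - \mathbf{1}'v)R_f = v'(R - R_f) + R_f$.
The mean and variance are

$$\operatorname{E}R_p = v'\mu^e + R_f \text{ and } \operatorname{Var}(R_p) = v'\Sigma v.$$

The optimization problem is

$$\max_v\; v'\mu^e + R_f - \frac{k}{2}v'\Sigma v,$$

with first order conditions (see Appendix for matrix calculus)

$$0_{N\times1} = \mu^e - k\Sigma v, \text{ so } v = \frac{1}{k}\Sigma^{-1}\mu^e.$$

Dividing $v$ by the sum of its elements, $\mathbf{1}'v$, gives (4.25).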

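Remark 4.3 (Properties of tangency portfolio) The expected excess return and the variance
of the tangency portfolio are $\mu_T^e = \mu^{e\prime}\Sigma^{-1}\mu^e/\mathbf{1}'\Sigma^{-1}\mu^e$ and $\operatorname{Var}(R_T^e) = \mu^{e\prime}\Sigma^{-1}\mu^e/(\mathbf{1}'\Sigma^{-1}\mu^e)^2$.
The square of the Sharpe ratio is therefore $(\mu_T^e/\sigma_T)^2 = \mu^{e\prime}\Sigma^{-1}\mu^e$.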
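
For $N$ assets, (4.24)–(4.25) require solving a linear system. The following Python sketch avoids external libraries by using a small Gaussian elimination routine; in practice a linear algebra library would normally be used instead. All function names and the three-asset inputs are assumptions made for illustration.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0.0:
            raise ValueError("matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def mv_weights(mu_e, cov, k):
    """Return (v, w): optimal weights (4.24) and tangency weights (4.25)."""
    z = solve_linear(cov, mu_e)      # z = Sigma^{-1} mu^e
    v = [zi / k for zi in z]
    total = sum(z)                   # 1' Sigma^{-1} mu^e
    w = [zi / total for zi in z]
    return v, w


def tangency_sharpe_squared(mu_e, cov):
    """Squared Sharpe ratio of the tangency portfolio: mu^e' Sigma^{-1} mu^e."""
    z = solve_linear(cov, mu_e)
    return sum(m * zi for m, zi in zip(mu_e, z))


# Hypothetical three-asset inputs (illustration only)
mu_e = [0.08, 0.05, 0.03]
cov = [[0.0256, 0.004, 0.0], [0.004, 0.0144, 0.002], [0.0, 0.002, 0.01]]
v, w = mv_weights(mu_e, cov, 15.0)
```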
Figures 4.3–4.4 illustrate mean returns and standard deviations, estimated by exponentially
weighted moving averages (as by RiskMetrics). Figures 4.5–4.6 show how the optimal
portfolio weights (based on mean-variance preferences) change over time. It is clear that the
portfolio weights change very dramatically, perhaps too much to be realistic.

4.1.4 A Risky Asset and a Riskfree Asset Revisited

Once we have the tangency portfolio (with weights $w$ as in (4.25)), we can actually use
that as the risky asset in the case with only one risky asset (and a riskfree). That is, we
can treat $w'R^e$ as $R_1^e$ in (4.2). After all, the portfolio choice is really about mixing the
tangency portfolio with the riskfree asset.
The result is that the weight on the tangency portfolio is (a scalar)

$$v^* = \frac{1}{k}\mathbf{1}'\Sigma^{-1}\mu^e, \tag{4.26}$$

and $1 - v^*$ on the riskfree asset.
Proof. (of (4.26)) From (4.24)–(4.25) we directly get
$$v = \underbrace{\frac{1}{k}\mathbf{1}'\Sigma^{-1}\mu^e}_{v^*}\, w,$$
which is just $v^*$ in (4.26) times the tangency portfolio $w$ from (4.25). To see that this fits
with (4.5) when $w'R^e$ is substituted for $R_1^e$, notice that
$$\frac{E(w'R^e)}{Var(w'R)} = \mathbf{1}'\Sigma^{-1}\mu^e,$$
so (4.5) could be written just like (4.26).

Figure 4.3: Dynamically updated estimates, 5 U.S. industries (mean excess returns, annualized; panels for Cnsmr and Manuf, HiTec and Hlth, and Other)

4.1.5 Portfolio Choice with Short Sale Constraints

The previous analysis assumes that there are no restrictions on the portfolio weights.
However, many investors (for instance, mutual funds) cannot have short positions. In this
case, the objective function is still (4.6), but with the additional restriction
$$0 \le v_i \le 1. \tag{4.27}$$

Figure 4.4: Dynamically updated estimates, 5 U.S. industries (standard deviations, annualized; panels for Cnsmr and Manuf, HiTec and Hlth, and Other)

See Figures 4.7–4.8 for an illustration.

4.2 Beta Representation of Expected Returns

For any portfolio, the expected excess return ($\mu_p^e$) is linearly related to the expected excess
return on the tangency portfolio ($\mu_T^e$) according to
$$\mu_p^e = \beta_p\mu_T^e, \text{ where } \beta_p = \frac{Cov(R_p, R_T)}{Var(R_T)}. \tag{4.28}$$
This result follows directly from manipulating the definition of the tangency portfolio
(4.25).

Example 4.4 (Effect of $\beta$) Suppose the tangency portfolio has an expected excess return
of 8% (which happens to be close to the value for the US market return since WWII). An
asset with a beta of 0.8 should then have an expected excess return of 6.4%, and an asset
with a beta of 1.2 should have an expected excess return of 9.6%.

Figure 4.5: Dynamically updated portfolio weights, T-bill and 5 U.S. industries (panels: Cnsmr, Manuf, HiTec, Hlth; series: fixed mean, fixed cov)

Most stock indices (based on the standard characteristics like industry, size, value/growth)
have betas around unity, but there are variations. For instance, building companies and
manufacturers of investment goods and cars are typically very procyclical (high betas),
whereas food and drugs are not (low betas).
Proof. (of (4.28)) To derive (4.28), consider asset 1 in the two asset case. We have
$$Cov(R_1, R_T) = Cov(R_1, w_1R_1 + w_2R_2) = w_1\sigma_{11} + w_2\sigma_{12}.$$

Figure 4.6: Dynamically updated portfolio weights, T-bill and 5 U.S. industries (panels: Other, riskfree; series: fixed mean, fixed cov)

Figure 4.7: MV frontier, 3 asset classes, 2002:12–2010:8 (mean, %, against std, %; A: MSCI world, B: Global govt bonds, C: Commodities; MV frontier with and without short sales)

The expression for asset 2 is similar. Use the definition $v_i = w_i(v_1 + v_2)$ and the result
above in the first order conditions (4.9)–(4.10)
$$\begin{aligned}
\mu_1^e &= (v_1\sigma_{11} + v_2\sigma_{12})k \\
&= \Big(\underbrace{\frac{v_1}{v_1+v_2}}_{w_1}\sigma_{11} + \underbrace{\frac{v_2}{v_1+v_2}}_{w_2}\sigma_{12}\Big)k(v_1+v_2) \\
&= Cov(R_1, w_1R_1 + w_2R_2)\,k(v_1+v_2) \\
&= Cov(R_1, R_T)\,k(v_1+v_2).
\end{aligned}$$
Figure 4.8: Portfolio choice (3 asset classes) with no short sales (portfolio weights on MSCI world, bonds, and commodities against risk aversion; MV preferences, 2002:12–2010:8)

The expression for asset two is similar, so we collect the results as
$$\begin{aligned}
\mu_1^e &= Cov(R_1, R_T)\,k(v_1+v_2), \\
\mu_2^e &= Cov(R_2, R_T)\,k(v_1+v_2).
\end{aligned}$$
Solve for the covariances as
$$\begin{aligned}
Cov(R_1, R_T) &= \mu_1^e A \\
Cov(R_2, R_T) &= \mu_2^e A,
\end{aligned}$$
where $A = 1/[k(v_1+v_2)]$. These expressions will soon prove to be useful. Notice that
the variance of the tangency portfolio is
$$Var(R_T) = Cov(w_1R_1 + w_2R_2, R_T) = w_1Cov(R_1, R_T) + w_2Cov(R_2, R_T),$$
which we can rewrite by using the expressions for the covariances above
$$\begin{aligned}
Var(R_T) &= (w_1\mu_1^e + w_2\mu_2^e)A \\
&= \mu_T^e A.
\end{aligned}$$

Consider asset 1. Divide $Cov(R_1, R_T)$ by $Var(R_T)$
$$\frac{Cov(R_1, R_T)}{Var(R_T)} = \frac{\mu_1^e A}{\mu_T^e A},$$
which can be rearranged as (4.28).

Remark 4.5 (Why is Risk = $\beta$? Short version) Because $\beta$ measures the covariance with
the market (and the idiosyncratic risk can be diversified away).

Remark 4.6 (Why is Risk = $\beta$? Longer version) Start by investing 100% in the market
portfolio, then increase the position in asset $i$ by a small amount ($\delta$, 2% or so) by borrowing
at the riskfree rate. The portfolio return is then
$$R_p = R_m + \delta R_i^e.$$
The expected portfolio return is
$$E R_p = E R_m + \underbrace{\delta E R_i^e}_{\text{incremental risk premium}}$$
and the portfolio variance is
$$\sigma_p^2 = \sigma_m^2 + \underbrace{\delta^2\sigma_i^2 + 2\delta Cov(R_i, R_m)}_{\text{incremental risk, but } \delta^2\sigma_i^2 \approx 0}.$$
(For instance, if $\delta = 2\%$, then $\delta^2 = 0.0004$ and $2\delta = 0.04$.) Notice: risk = covariance
with the market. The marginal compensation for more risk is
$$\frac{\text{incremental risk premium}}{\text{incremental risk}} = \frac{E R_i^e}{2Cov(R_i, R_m)}.$$
In equilibrium, the marginal compensation for more risk must be equal across assets
$$\frac{E R_i^e}{2Cov(R_i, R_m)} = \frac{E R_j^e}{2Cov(R_j, R_m)} = \ldots = \frac{E R_m^e}{2\sigma_m^2},$$
since $Cov(R_m, R_m) = \sigma_m^2$. Rearrange as the CAPM expression.

4.2.1 Beta of a Long-Short Position

Consider a zero cost portfolio consisting of one unit of asset $i$ and minus one unit of asset
$j$. The beta representation is clearly
$$\mu_i^e - \mu_j^e = E(R_i - R_j) = (\beta_i - \beta_j)\mu_T^e. \tag{4.29}$$

If the two assets have the same betas, then this portfolio is not exposed to the tangency
portfolio (and ought to carry a zero risk premium, at least according to theory). Such a
long-short portfolio is a common way to isolate the investment from certain types of risk
(here the systematic risk with respect to the tangency portfolio).
Proof. (of (4.29)) Notice that
$$\frac{Cov(R_i - R_j, R_T)}{Var(R_T)} = \frac{Cov(R_i, R_T)}{Var(R_T)} - \frac{Cov(R_j, R_T)}{Var(R_T)} = \beta_i - \beta_j.$$

4.3 Market Equilibrium

Suppose all agents have the same expectations about the payoff of the assets and (for
simplicity) also the same risk aversions. They will then all choose portfolios on the efficient
frontier. (An alternative interpretation is that we allow investors to have different risk
aversions, and that the portfolio weights discussed below are the average weights across
investors.)
To determine the equilibrium asset prices (and therefore expected returns) we have to
equate demand (the mean variance portfolios) with supply (exogenous). Since we assume
a fixed and exogenous supply (say, 2000 shares of asset 1 and 407 shares of asset 2),
prices (and therefore returns) are completely driven by demand.
In equilibrium, net supply of the riskfree asset is zero, which implies that the optimal
portfolio weights (4.13) must be such that the weights on the risky assets sum to unity
($v_1 + v_2 = 1$). Notice that $v_1$ and $v_2$ then define the tangency portfolio, which coincides
with the market portfolio, and that we can interpret the portfolio weights as the relative
market capitalization of the assets.

4.3.1 Finding the Equilibrium

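We can solve for $\mu_1^e$ and $\mu_2^e$ from the expressions for the optimal portfolio weights (4.13)
as
$$\begin{bmatrix} \mu_1^e \\ \mu_2^e \end{bmatrix} = k\begin{bmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{12} & \sigma_{22} \end{bmatrix}\begin{bmatrix} v_1 \\ v_2 \end{bmatrix} \tag{4.30}$$
(or $\mu^e = k\Sigma v$ in matrix notation). Form the tangency portfolio of the left hand side to get
$\mu_T^e = v_1\mu_1^e + v_2\mu_2^e$. Forming the same portfolio of the right hand side gives $k\,Var(R_T)$.
Combining gives
$$\mu_T^e = k\sigma^2(R_T), \text{ or} \tag{4.31}$$
$$SR_T = \frac{\mu_T^e}{\sigma(R_T)} = k\sigma(R_T). \tag{4.32}$$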
If the tangency portfolio is the market portfolio, then this expression shows how the risk
premium on the market is determined. The Sharpe ratio (4.32) is often called the “market
price of risk.” Having derived an expression for the risk premium, the asset prices can be
calculated (not done here, since it is of little importance for our purposes).
Combining with the beta representation (4.28) we get
$$\mu_i^e = \beta_i k\sigma^2(R_T) = \beta_i SR_T\,\sigma(R_T). \tag{4.33}$$
This shows that the expected excess return (risk premium) on asset $i$ can be thought of as
a product of three components: $\beta_i$ which captures the covariance with the market, $SR_T$
which is the price of market risk (risk compensation per unit of standard deviation of the
market return), and $\sigma(R_T)$ which measures the amount of market risk.
An important feature of (4.33) is that the only movements in the return of asset i
that matter for pricing are those movements that are correlated with the market (tangency
portfolio) returns. In particular, if asset i and j have the same betas, then they have the
same expected returns—even if one of them has a lot more uncertainty.

4.3.2 Back to Prices (Gordon Model)

The gross return, $1 + R_{t+1}$, is defined as
$$1 + R_{t+1} = \frac{D_{t+1} + P_{t+1}}{P_t}, \tag{4.34}$$
where $P_t$ is the asset price and $D_{t+1}$ the dividend it gives at the beginning of the next
period.
Rearranging gives
$$P_t = \frac{D_{t+1}}{1 + R_{t+1}} + \frac{P_{t+1}}{1 + R_{t+1}}. \tag{4.35}$$
Use the same equation but with all time subscripts advanced one period
($P_{t+1} = \frac{D_{t+2}}{1+R_{t+2}} + \frac{P_{t+2}}{1+R_{t+2}}$) to substitute for $P_{t+1}$
$$P_t = \frac{D_{t+1}}{1 + R_{t+1}} + \frac{1}{1 + R_{t+1}}\left(\frac{D_{t+2}}{1 + R_{t+2}} + \frac{P_{t+2}}{1 + R_{t+2}}\right). \tag{4.36}$$
Now, substitute for $P_{t+2}$ and then for $P_{t+3}$ and so on. Finally, we have
$$P_t = \frac{D_{t+1}}{1 + R_{t+1}} + \frac{D_{t+2}}{(1 + R_{t+1})(1 + R_{t+2})} + \frac{D_{t+3}}{(1 + R_{t+1})(1 + R_{t+2})(1 + R_{t+3})} + \ldots \tag{4.37}$$
$$= \sum_{j=1}^{\infty}\frac{D_{t+j}}{\prod_{s=1}^{j}(1 + R_{t+s})}. \tag{4.38}$$

We now make three simplifying assumptions. First, we can approximate the expectation
of a ratio with the ratio of expectations ($E(x/y) \approx E x / E y$). Second, that the
expected $j$-period returns are $(1 + \mu)^j$
$$E_t\prod_{s=1}^{j}(1 + R_{t+s}) \approx (1 + \mu)^j. \tag{4.39}$$
Third, that the expected dividends are constant, $E_t D_{t+j} = D$, and $E_t R_{t+j} = \mu$ for all
$j \ge 1$. We can then write (4.38) as
$$P_t \approx \sum_{j=1}^{\infty}\frac{D}{(1 + \mu)^j} = \frac{D}{\mu}, \tag{4.40}$$
which is clearly the Gordon model for an asset price.
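
The geometric sum in (4.40) can be illustrated numerically. The sketch below uses hypothetical values for the dividend and the required return, and the function names are made up for the illustration; it just shows how the truncated sum approaches $D/\mu$ as more periods are included.

```python
def gordon_price(dividend, mu):
    """Gordon model (4.40): constant expected dividend discounted at rate mu."""
    return dividend / mu


def truncated_price(dividend, mu, horizon):
    """Sum of the first `horizon` discounted dividends in (4.40)."""
    return sum(dividend / (1 + mu) ** j for j in range(1, horizon + 1))


D, mu = 5.0, 0.08   # hypothetical dividend and required return
for horizon in (10, 50, 200):
    print(horizon, round(truncated_price(D, mu, horizon), 2))
print("limit D/mu:", gordon_price(D, mu))

# A higher required return (with unchanged dividends) lowers the price.
print("price at mu = 0.10:", gordon_price(D, 0.10))
```

The last line anticipates the comparative statics of the model: the price falls when the required return rises while expected dividends are unchanged.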


If expected dividends increase, but expected returns do not (for instance, because the
$\beta$ of the asset is unchanged), then this is immediately capitalized in today's price (which
increases). In contrast, if expected dividends are unchanged, but the expected (required)
return increases, then today's asset price decreases. According to CAPM ((4.28) and
(4.31)), the expected return is
$$\mu = R_f + \beta\mu_m^e = R_f + \beta k\sigma^2(R_m). \tag{4.41}$$
This expected return increases when (i) the riskfree rate increases; (ii) the market risk
premium increases because of higher risk aversion or higher (beliefs about) market
uncertainty; or (iii) when (beliefs about) beta increases.

4.3.3 CML and SML

According to CAPM, all optimal portfolios (denoted $opt$) are on the capital market line
$$\mu_{opt} = R_f + \frac{\mu_m^e}{\sigma_m}\sigma_{opt}, \tag{4.42}$$
where $\mu_m^e$ and $\sigma_m$ are the expected value and the standard deviation of the excess return
of the market portfolio. This is clearly the same as the upper leg of the MV frontier (with
risky assets and riskfree asset). See Figure 4.9 for an example.
Proof. (of (4.42)) $R_{opt} = aR_m + (1 - a)R_f$, so $R_{opt}^e = aR_m^e$. We then have
$\mu_{opt}^e = a\mu_m^e$ and $\sigma_{opt} = a\sigma_m$ (since $a \ge 0$). Solve for $a$ from the latter ($a = \sigma_{opt}/\sigma_m$)
and use in the former.


CAPM also implies that the beta representation (4.28) holds (where the tangency portfolio
equals the market portfolio). That is, for any asset or portfolio, the expected excess return
is linearly related to the expected excess return on the market portfolio ($\mu_m^e$). Rewriting
in terms of expected returns, we have
$$\mu_i = R_f + \beta_i(\mu_m - R_f). \tag{4.43}$$
The plot of $\mu_i$ against $\beta_i$ (for different assets, $i$) is called the security market line. See
Figure 4.9 for an example.

Figure 4.9: CML and SML (left panel: capital market line, mean (%) against std (%), CML: ER = Rf + σ(ERm − Rf)/σm, location of efficient portfolios; right panel: security market line, mean (%) against β, SML: ER = Rf + β(ERm − Rf), location of all assets)

4.4 An Application of MV Portfolio Choice: International Assets

4.4.1 Foreign Investments

Let the exchange rate, S , be defined as units of domestic currency per unit of foreign cur-
rency, that is the price (measured in domestic currency) of foreign currency. For instance,
if we take Switzerland to be the domestic economy, then we have around 1.5 CHF per
EUR. Notice that a higher S means a weaker home currency (depreciation) and a lower
S means a stronger home currency (appreciation).
To be really concrete, suppose we bought a foreign asset in $t$ at the price $P_t^*$, measured
in foreign currency; the cost in domestic currency was then $S_tP_t^*$. One period later (in
$t + 1$), the value of the asset (in foreign currency) is $P_{t+1}^*$ (think of this as the total
value, including dividends or whatever); the value in domestic currency is thus $S_{t+1}P_{t+1}^*$.
Clearly, the net return in domestic currency (unhedged), $R_u$, satisfies
$$1 + R_u = \frac{S_{t+1}P_{t+1}^*}{S_tP_t^*} \tag{4.44}$$
$$= \frac{S_{t+1}}{S_t}\frac{P_{t+1}^*}{P_t^*}$$
$$= (1 + R_S)(1 + R^*), \tag{4.45}$$
where $R_S$ is the return on the currency investment (buying foreign currency in $t$, selling
it in $t + 1$) and $R^*$ is just the "local" return of the foreign asset (the return measured in
foreign currency).
Clearly, we can rewrite the net return as
$$R_u = R_S + R^* + R_SR^* \tag{4.46}$$
$$\approx R_S + R^*, \tag{4.47}$$
where the approximation follows from the fact that the product of two net returns is
typically very small (for instance, $0.05 \times 0.03 = 0.0015$). If we instead use log returns (the
log of the gross return), then there is no approximation error at all.
The approximation is used throughout this section (since it simplifies many expressions
considerably). The expected return and the variance (in domestic currency) are then
$$E R_u \approx E R_S + E R^*, \text{ and} \tag{4.48}$$
$$Var(R_u) \approx Var(R_S) + Var(R^*) + 2Cov(R_S, R^*). \tag{4.49}$$
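
As a quick check of (4.45)–(4.47), the following sketch compares the exact unhedged return with the additive approximation, using the 5% and 3% returns from the text. The function name is made up for the illustration.

```python
import math


def unhedged_return(r_s, r_local):
    """Exact net return in domestic currency, from (1 + R_S)(1 + R*) - 1."""
    return (1 + r_s) * (1 + r_local) - 1


r_s, r_local = 0.05, 0.03            # currency return and local return
exact = unhedged_return(r_s, r_local)
approx = r_s + r_local               # the approximation in (4.47)
print("exact:", exact, "approx:", approx, "error:", exact - approx)

# With log returns the decomposition is exact: the log gross returns add up.
log_exact = math.log(1 + exact)
log_sum = math.log(1 + r_s) + math.log(1 + r_local)
print("log return:", log_exact, "sum of log returns:", log_sum)
```

The approximation error equals the cross product $R_SR^*$, while the log returns add up without any error.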

To apply the CAPM analysis to the problem of whether to invest internationally or
not, suppose we have only two risky assets: a risky foreign equity index (with domestic
currency return $R_w$) and a risky domestic equity index (denoted $d$). Then, according to
(4.17) we should invest internationally if $\mu_w^e/\sigma_w > \rho\mu_d^e/\sigma_d$. This says that a high Sharpe
ratio of the foreign asset (measured in domestic currency) or a low correlation with the
domestic return both lead to investing internationally.

4.4.2 Exchange Rate Hedging

The return on the foreign investment has two components: the return on the currency
(exchange rate change) and the local foreign return on the asset (equity, say). It may often
be useful to hedge the exchange rate component. For instance, the investor may have
good knowledge about the properties of the equity return, but not about the exchange rate
movements.
In practice, it is not straightforward to hedge the exchange rate component
entirely. The reason is that the future local foreign return is not known with certainty,
that is, we don't know how many units of currency we need to hedge. In terms of (4.44),
this is just saying that the investor does not know (in $t$) what $P_{t+1}^*$ will be.
There are several ways to deal with this, but they are all just approximations. One
possibility is to just hedge the initial investment, that is, $S_tP_t^*$ in (4.44). Another
possibility is to hedge a bit more to incorporate the expected local return. To simplify the
subsequent analysis I will assume that we somehow could create a perfect hedge.

Figure 4.10: Timing convention of forward contract (in t: write contract, agree on F; in t+1: pay F, get asset)

An exchange rate hedge clearly affects both the expected return and the volatility, and
it is not obvious in which way. We therefore need to dig into the details a bit. Hedging
means entering a forward (or futures) contract which guarantees us a known exchange
rate in the future, $F_t$. See Figure 4.10 for the structure of a forward contract. (A forward
contract is typically a private contract between two investors. A futures contract is similar,
but is typically traded on an exchange. Futures and forwards have very similar, but not
identical, prices.)
In terms of (4.44)–(4.47), we then have the return in domestic currency (hedged)
$$1 + R_h = \frac{F_tP_{t+1}^*}{S_tP_t^*}, \text{ or} \tag{4.50}$$
$$R_h \approx R_{SF} + R^*, \tag{4.51}$$
where $R_{SF}$ is the return on the currency hedging part: buy foreign currency in $t$ (at the
price $S_t$), sell foreign currency in $t + 1$ (at the pre-agreed forward price $F_t$). To understand
this we have to analyze the pricing of forward contracts.
The expected return and the variance are therefore
$$E R_h \approx R_{SF} + E R^* \text{ and} \tag{4.52}$$
$$Var(R_h) \approx Var(R^*), \tag{4.53}$$
since $R_{SF}$ is known at the time of investment.

The difference in the expected return is
$$E R_u - E R_h \approx E R_S - R_{SF}, \tag{4.54}$$

which is the difference between the expected return on an uncovered and covered in-
vestment in the foreign currency. If uncovered interest rate parity (UIP) holds, then this
difference is zero. Although it is unclear if UIP actually is a reasonable approximation,
the deviations from it show few systematic patterns. There may be a chance of saying
more about this difference—for a given time and market—but that requires a thorough
knowledge of the FX market and the monetary policy setting.
The difference in variance is
$$Var(R_u) - Var(R_h) \approx Var(R_S) + 2Cov(R_S, R^*), \tag{4.55}$$

which can have either sign. Although the first term must be positive, the second term can
easily be negative—maybe so negative that the whole difference is negative: the foreign
investment without a currency hedge can be a safer investment. To make the covariance in
(4.55) negative, the exchange rate S must have a tendency to decrease (foreign currency
becomes cheaper) at the same time as the local foreign return is positive. As an example
of how this could happen, let Switzerland be the domestic economy and the Euro zone
the foreign economy. If the European Central Bank takes steps that decrease the value of
the euro, then it might be the case that European firms become more competitive.
See Figure 4.11 for an example.
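
To get a feeling for when the unhedged investment is the safer one, the sketch below evaluates the right hand side of (4.55) for a few correlations between the currency return and the local return. The standard deviations (10% for the currency, 20% for the local equity return) are hypothetical, and the function name is made up for the illustration.

```python
def variance_gap(sigma_s, sigma_local, rho):
    """Approximate Var(R_u) - Var(R_h) from (4.55).

    sigma_s is the std of the currency return, sigma_local the std of the
    local return, and rho their correlation.
    """
    return sigma_s ** 2 + 2 * rho * sigma_s * sigma_local


sigma_s, sigma_local = 0.10, 0.20    # hypothetical standard deviations
for rho in (0.3, 0.0, -0.25, -0.5):
    gap = variance_gap(sigma_s, sigma_local, rho)
    print(f"rho = {rho:+.2f}: Var(R_u) - Var(R_h) = {gap:+.4f}")
```

With these numbers the difference changes sign at a correlation of $-\sigma_S/(2\sigma^*) = -0.25$: for more negative correlations, hedging increases the variance.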

Remark 4.7 (Forward-Spot Parity) The forward-spot parity for any asset without
intermediate dividends is

Forward price = Spot price × (1 + interest rate).

The intuition for this expression is that a forward contract is like buying the asset today,
but on credit.

Remark 4.8 (Covered Interest Rate Parity, CIP) CIP is just the forward-spot parity
applied to exchange rates. To derive it, let $i$ and $i^*$ be the domestic and foreign interest
rates respectively. The spot price (in $t$) of getting one unit of foreign currency in $t + 1$
is $S_t/(1 + i^*)$, since we get one unit of foreign currency in $t + 1$ if we buy $1/(1 + i^*)$ units
of foreign currency in $t$ (this is the same as buying one foreign short-term bill). Plugging
into the forward-spot parity gives
$$F_t = \frac{S_t}{1 + i^*}(1 + i).$$
Watch out for the length of the period: interest rates are quoted on an annual basis. The
previous equations have implicitly assumed that the period between $t$ and $t + 1$ is one
year.

Remark 4.9 (Return from the currency position) The return $R_{SF}$ from the currency hedging
part in (4.51) equals $F_t/S_t - 1$. From covered interest rate parity, we get that this
equals $(1 + i)/(1 + i^*) - 1 \approx i - i^*$. Comparing with the unhedged currency return in
(4.45), $R_S = S_{t+1}/S_t - 1$, we see that the unhedged position gives a higher average return
than the hedged position if the home currency depreciates more than predicted by UIP
(the interest rate differential).
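
Remarks 4.8–4.9 can be illustrated with a small calculation. The sketch below uses the exchange rate of 1.5 CHF per EUR mentioned earlier, together with hypothetical one-year interest rates (1% in CHF, 3% in EUR); the function names are made up for the illustration.

```python
def forward_rate(spot, i_domestic, i_foreign):
    """Covered interest rate parity: F_t = S_t (1 + i) / (1 + i*), one-year horizon."""
    return spot * (1 + i_domestic) / (1 + i_foreign)


def hedged_currency_return(spot, forward):
    """R_SF = F_t / S_t - 1, known at the time of investment."""
    return forward / spot - 1


spot = 1.5                    # CHF per EUR
i_chf, i_eur = 0.01, 0.03     # hypothetical one-year interest rates
forward = forward_rate(spot, i_chf, i_eur)
r_sf = hedged_currency_return(spot, forward)
print("forward rate:", round(forward, 4))
print("R_SF exact:", round(r_sf, 4), "approx i - i*:", round(i_chf - i_eur, 4))

# The unhedged currency return R_S = S_{t+1}/S_t - 1 depends on the future spot rate.
for future_spot in (1.45, 1.50):
    r_s = future_spot / spot - 1
    print(future_spot, "unhedged beats hedged:", r_s > r_sf)
```

Since the foreign interest rate is higher in this example, the forward rate is below the spot rate and the hedged currency return is negative; the unhedged position does better only if the euro loses less than the interest rate differential.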

4.4.3 Invest in Foreign Stocks? Rule-of-Thumb

The result in (4.18) provides a simple rule of thumb for whether we should invest in
foreign assets or not. Let asset 1 represent a domestic market index, and asset 2 a foreign
market index. The rule is then: invest in the foreign market if its Sharpe ratio is higher
than the Sharpe ratio of the domestic market times the correlation of the two markets (that
is, if $\mu_2^e/\sigma_2 > \rho\mu_1^e/\sigma_1$). Clearly, the returns should be measured in the same currency
(but the currency risk may be hedged or not).
See Figure 4.12 for an example.
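
The rule of thumb is simple enough to write down as a one-line function. The sketch below is only an illustration; the function name and the Sharpe ratios and correlations used are hypothetical, not estimates from the data in the figures.

```python
def invest_abroad(sr_foreign, sr_home, corr):
    """Rule of thumb: hold the foreign index if SR(foreign) > corr x SR(home)."""
    return sr_foreign > corr * sr_home


sr_home = 0.40       # hypothetical Sharpe ratio of the domestic index
sr_foreign = 0.25    # hypothetical Sharpe ratio of the foreign index
for corr in (0.5, 0.8):
    print(f"corr = {corr}: invest abroad = {invest_abroad(sr_foreign, sr_home, corr)}")
```

The example shows that a foreign market with a lower Sharpe ratio than the home market can still be worth holding, provided the correlation is low enough.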

Bibliography
Danthine, J.-P., and J. B. Donaldson, 2002, Intermediate financial theory, Prentice Hall.

Elton, E. J., M. J. Gruber, S. J. Brown, and W. N. Goetzmann, 2010, Modern portfolio
theory and investment analysis, John Wiley and Sons, 8th edn.

Figure 4.11: International stock indices (index values in USD, 1990–2010, unhedged and hedged). Returns (unhedged, hedged): US mean 9.9 and 9.9, std 16.0 and 16.0; UK mean 10.9 and 7.5, std 19.4 and 15.6; FR mean 9.0 and 8.4, std 23.1 and 19.3; GE mean 9.6 and 9.5, std 25.0 and 21.9; JP mean −2.6 and 1.0, std 23.9 and 20.2.

Figure 4.12: International stock indices (investing in foreign equity: SR(foreign) compared with Corr(foreign,home) × SR(home) for US, UK, FR, GE, and JP; returns are measured in USD, home market is US, sample 1989:1–2010:9)

5 Utility-Based Portfolio Choice
Reference: Elton, Gruber, Brown, and Goetzmann (2010) 12 and 18
Additional references: Danthine and Donaldson (2002) 5–6; Huang and Litzenberger
(1988) 4–5; Cochrane (2001) 9 (5); Ingersoll (1987) 3–5 (6)
Material with a star (*) is not required reading.

5.1 Utility Functions and Risky Investments

Any model of portfolio choice must embody a notion of “what is best?” In finance, that
often means a portfolio that strikes a good balance between expected return and its vari-
ance. However, in order to make sense of that idea—and to be able to go beyond it—we
must go back to basic economic utility theory.

5.1.1 Specification of Utility Functions, $U(W)$

In theoretical micro the utility function $U(x)$ is just an ordering without any meaning of
the numerical values: $U(x) > U(y)$ only means that the bundle of goods $x$ is preferred to
$y$ (but not by how much). In applied microeconomics we must typically be more specific
than that: we need to specify the functional form of $U(x)$. As an example, to generate
demand curves for two goods ($x_1$ and $x_2$), we may choose to specify the utility function
as $U(x) = x_1^{\alpha}x_2^{1-\alpha}$ (a Cobb-Douglas specification).
In finance (and quite a bit of microeconomics that incorporates uncertainty), the key
features of the utility functions that we use are as follows.
First, utility is a function of a scalar argument, $U(x)$. This argument ($x$) can be end-of-
period wealth, consumption or the portfolio return. In particular, we don't care about the
composition of the consumption basket. In one-period investment problems, the choice
of $x$ is irrelevant since consumption equals wealth, which in turn is proportional to the
portfolio return.
Second, uncertainty is incorporated by letting investors maximize expected utility,
$E U(x)$. Since returns (and therefore wealth and consumption) are uncertain, we need
some way to rank portfolios at the time of investment. Almost always that means that we
use expected utility (see Section 5.1.2). As an example, suppose there are two states of the
world: $W$ (wealth) will be either 1 or 2 with probabilities 1/3 and 2/3. If $U(W) = \ln W$,
then $E U(W) = 1/3 \times \ln 1 + 2/3 \times \ln 2$.
Third, the functional form of the utility function is such that more is better and uncer-
tainty is bad (investors are risk averse).

5.1.2 Expected Utility Theorem


Expected utility, $E U(W)$, is the right thing to maximize if the investors' preferences
$U(W)$ are

1. complete: can rank all possible outcomes;

2. transitive: if $A$ is better than $B$ and $B$ is better than $C$, then $A$ is better than $C$
(often violated in experiments);

3. independent: if $X$ and $Y$ are equally preferred, and $Z$ is some other outcome, then
the following gambles are equally preferred

$X$ with prob $P$ and $Z$ with prob $1 - P$
$Y$ with prob $P$ and $Z$ with prob $1 - P$

(this is the key assumption); and

4. such that every gamble has a certainty equivalent (a non-random outcome that gives
the same utility, fairly trivial).

5.1.3 Basic Properties of Utility Functions: (1) More is Better

The idea that more is better (nonsatiation) is almost trivial. If $U(W)$ is differentiable, then
this is the same as that marginal utility is positive, $U'(W) > 0$.

Example 5.1 (Logarithmic utility) $U(W) = \ln W$ so $U'(W) = 1/W$ (assuming $W > 0$).

5.1.4 Basic Properties of Utility Functions: (2) Risk aversion

With a utility function, risk aversion (uncertainty is considered to be bad) is captured by
the concavity of the function.
As an example, consider Figure 5.1. It shows a case where the portfolio (or wealth, or
consumption,...) of an investor will be worth $Z_-$ or $Z_+$, each with a probability of a half.
This utility function shows risk aversion since the utility of getting the expected payoff
for sure is higher than the expected utility from owning the uncertain asset
$$U[E(Z)] > 0.5U(Z_-) + 0.5U(Z_+) = E U(Z). \tag{5.1}$$
Rearranging gives
$$U[E(Z)] - U(Z_-) > U(Z_+) - U[E(Z)], \tag{5.2}$$
which says that a loss (left hand side) counts for more than a gain of the same amount.
Another way to phrase the same thing is that a poor man appreciates an extra dollar more
than a rich man. This is a key property of a concave utility function, and it has an
immediate effect on risk premia.
The (lowest) price ($P$) the investor is willing to sell this portfolio for is the certain
amount of money which gives the same utility as $E U(Z)$, that is, the value of $P$ that
solves the equation
$$U(P) = E U(Z). \tag{5.3}$$
This price is also called the certainty equivalent of the portfolio. From (5.1) we know that
this utility is lower than the utility from the expected payoff, $U(P) < U[E(Z)]$, and we
also know that the utility function is an increasing function. It follows directly that the
price is lower than the expected payoff
$$P < E(Z) = 0.5Z_- + 0.5Z_+. \tag{5.4}$$

See Figure 5.1 for an illustration.

Figure 5.1: Example of a utility function (concave utility function; two outcomes, Z− or Z+, with equal probabilities; EZ = 0.5Z− + 0.5Z+; P is the certainty equivalent, which solves U(P) = EU(Z); risk aversion implies that P < EZ)

Example 5.2 (Certainty equivalent) Suppose you have a CRRA utility function and own
an asset that gives either 85 or 115 with equal probability. What is the certainty equivalent
(that is, the lowest price you would sell this asset for)? The answer is the $P$ that solves
$$\frac{P^{1-k}}{1-k} = 0.5\frac{85^{1-k}}{1-k} + 0.5\frac{115^{1-k}}{1-k}.$$
(The answer is $P = (0.5 \times 85^{1-k} + 0.5 \times 115^{1-k})^{1/(1-k)}$.) For instance, with $k = 0$, 2,
5, 10, and 25 we have $P \approx 100$, 97.75, 94.69, 91.16, and 87.49. Note that if we scale the
asset payoffs (here 85 and 115) with some factor, then the price is scaled with the same
factor. This is a typical feature of the CRRA utility function.
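
The certainty equivalents in Example 5.2 can be computed directly from the closed-form answer. The sketch below is one way of doing it; the function name is made up, and the case $k = 1$ (logarithmic utility, where the power formula does not apply) is handled separately as an added convenience.

```python
import math


def crra_certainty_equivalent(payoffs, probs, k):
    """Certainty equivalent P that solves U(P) = E U(Z) for CRRA utility."""
    if k == 1:
        # Limiting case of CRRA utility: U(W) = ln W
        return math.exp(sum(p * math.log(z) for z, p in zip(payoffs, probs)))
    power = 1 - k
    expected = sum(p * z ** power for z, p in zip(payoffs, probs))
    return expected ** (1 / power)


payoffs, probs = [85, 115], [0.5, 0.5]
for k in (0, 2, 5, 10, 25):
    print(f"k = {k:2d}: P = {crra_certainty_equivalent(payoffs, probs, k):.2f}")

# Scaling the payoffs by a factor scales the certainty equivalent by the same factor.
scaled = crra_certainty_equivalent([170, 230], probs, 5)
print("scaled payoffs, k = 5:", round(scaled, 2))
```

The loop should reproduce the values listed in the example, and the last line illustrates the scaling property of the CRRA utility function.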

This means that the expected gross return on the risky portfolio that the investor demands
is
$$E R_Z = E(Z)/P > 1, \tag{5.5}$$
which is greater than unity. This "required return" is higher if the investor is very risk
averse (very concave utility function). On the other hand, it goes towards unity as the
investor becomes less and less risk averse (the utility function becomes more and more
linear). In the limit (a risk neutral investor), the required return is unity. Loosely speaking,
we can think of $E R_Z - 1$ as a risk premium (more generally, the risk premium is $E R_Z$
minus a riskfree rate). Notice that this analysis applies to the portfolio (or wealth, or
consumption,...) that is the argument of the utility function, not to any individual asset.
To analyse an individual asset, we need to study how it changes the argument of the utility
function, so the covariance with the argument plays a key role.

Example 5.3 (Utility and two states) Suppose the utility function is logarithmic and that
$(Z_-, Z_+) = (1, 2)$. Then, expected utility in (5.1) is
$$E U(Z) = 0.5\ln 1 + 0.5\ln 2 \approx 0.35,$$
so the price must be such that
$$\ln P \approx 0.35, \text{ that is, } P \approx e^{0.35} \approx 1.41.$$
The expected return (5.5) is
$$(0.5 \times 1 + 0.5 \times 2)/1.41 \approx 1.06.$$
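
The steps of Example 5.3 can be traced in code. This is a minimal sketch that works with unrounded intermediate values (so the price is $\sqrt{2}$ rather than $e^{0.35}$); the variable names are made up for the illustration.

```python
import math

z_low, z_high = 1.0, 2.0                # the two equally likely outcomes
expected_utility = 0.5 * math.log(z_low) + 0.5 * math.log(z_high)
price = math.exp(expected_utility)      # certainty equivalent: ln P = E U(Z)
expected_payoff = 0.5 * z_low + 0.5 * z_high
required_return = expected_payoff / price   # expected gross return as in (5.5)

print("E U(Z)          :", round(expected_utility, 2))
print("price P         :", round(price, 2))
print("expected return :", round(required_return, 2))
```

The price is below the expected payoff of 1.5, so the required gross return exceeds one, as implied by risk aversion.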

5.1.5 Is Risk Aversion Related to the Level of Wealth?

We now take a closer look at what the functional form of the utility function implies for
investment choices. In particular, we study if risk aversion (and portfolio weights) will be
related to the wealth level.
To make this simple, consider choosing between a risky asset and a riskfree asset. It
can be shown that the amount invested in the risky asset changes with the wealth level
according to the change of the absolute risk aversion
$$A(W) = \frac{-U''(W)}{U'(W)}, \tag{5.6}$$
where $U'(W)$ is the first derivative and $U''(W)$ the second derivative. If the absolute risk
aversion increases, the amount of risky assets decreases.
Similarly, the portfolio weight on the risky asset changes with the wealth level according
to the change of the relative risk aversion
$$R(W) = WA(W) = \frac{-WU''(W)}{U'(W)}. \tag{5.7}$$
If the relative risk aversion increases, the fraction of risky assets decreases.

Figure 5.2 demonstrates a number of commonly used utility functions, and the fol-
lowing discussion outlines their main properties.
The quadratic utility function, $U(W) = W - (k/2)W^2$, is very easy to handle (since
the first order conditions are linear), but it is important to make sure that only the upward
sloping part is used, since this is where the utility function shows nonsatiation ("more is
better"). Since $U'(W) = 1 - kW$, the upward sloping part is for $W < 1/k$. However,
an odd feature of this utility function is that a poor investor invests a larger amount (and
certainly a larger fraction) in the risky asset than a rich investor. See the following remark
for the algebra.

Remark 5.4 (Risk aversion in quadratic utility function) $U(W) = W - (k/2)W^2$ gives
$U'(W) = 1 - kW$ and $U''(W) = -k$, so $A(W) = k/(1 - kW)$. The denominator is
positive (to be on the upward sloping part), but becomes smaller as $W$ increases. Hence,
the absolute risk aversion is increasing in the wealth level.

The CARA utility function (constant absolute risk aversion), $U(W) = -e^{-kW}$, is also
quite simple to use (in particular when returns are normally distributed, see below), but
has the unappealing feature that the amount invested in the risky asset (in a risky/riskfree
trade-off) is constant across (initial) wealth levels. This means, of course, that wealthy
investors have a lower portfolio weight on risky assets. See the following remark for the
algebra.

Remark 5.5 (Risk aversion in CARA utility function) $U(W) = -e^{-kW}$ gives $U'(W) = ke^{-kW}$
and $U''(W) = -k^2e^{-kW}$, so we have $A(W) = k$. The absolute risk aversion
does not change with the wealth level, so the amount invested in the risky asset remains
unchanged as the initial wealth changes. Clearly, this means an increasing relative risk
aversion, $R(W) = Wk$, so a poor investor has a larger portfolio weight on the risky
asset than a rich investor.

The CRRA utility function (constant relative risk aversion) is often harder to work
with, but has the nice property that the portfolio weights are unaffected by the initial
wealth (once again, see the following remark for the algebra). Most evidence suggests
that the CRRA utility function fits data best. For instance, historical data show no trends
in portfolio weights or risk premia—in spite of investors having become much richer over
time.

Figure 5.2: Examples of utility functions (panels: quadratic utility, W − (k/2)W²; CARA, −exp(−kW); CRRA, W^(1−k)/(1 − k); each for k = 2 and k = 5)

Remark 5.6 (Risk aversion in CRRA utility function) $U(W) = W^{1-k}/(1 - k)$ gives
$U'(W) = W^{-k}$ and $U''(W) = -kW^{-k-1}$, so we have $A(W) = k/W$ and $R(W) = k$.
The absolute risk aversion decreases with the wealth level in such a way that the relative
risk aversion is constant. In this case, a poor investor has the same portfolio weight on
the risky asset as a rich investor.
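
The algebra in Remarks 5.4–5.6 can be cross-checked numerically. The sketch below approximates the derivatives in (5.6)–(5.7) with central finite differences; the function names, the step size, and the parameter values (including $k = 0.1$ for the quadratic case, chosen so that the wealth levels used stay on the upward sloping part) are assumptions made for the illustration.

```python
import math


def absolute_risk_aversion(utility, wealth, h=1e-3):
    """A(W) = -U''(W)/U'(W) as in (5.6), using central finite differences."""
    d1 = (utility(wealth + h) - utility(wealth - h)) / (2 * h)
    d2 = (utility(wealth + h) - 2 * utility(wealth) + utility(wealth - h)) / h ** 2
    return -d2 / d1


def relative_risk_aversion(utility, wealth, h=1e-3):
    """R(W) = W A(W) as in (5.7)."""
    return wealth * absolute_risk_aversion(utility, wealth, h)


k = 2.0
utilities = {
    "quadratic (k = 0.1)": lambda w: w - (0.1 / 2) * w ** 2,  # upward sloping for W < 10
    "CARA (k = 2)": lambda w: -math.exp(-k * w),
    "CRRA (k = 2)": lambda w: w ** (1 - k) / (1 - k),
}
for name, u in utilities.items():
    for wealth in (1.0, 2.0, 4.0):
        a = absolute_risk_aversion(u, wealth)
        r = relative_risk_aversion(u, wealth)
        print(f"{name:20s} W = {wealth}: A(W) = {a:.3f}, R(W) = {r:.3f}")
```

The output should show an increasing $A(W)$ for quadratic utility, a constant $A(W)$ for CARA, and a constant $R(W)$ for CRRA, in line with the three remarks.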

5.2 Utility Optimization and the Two-Fund Theorem

This section demonstrates that if the utility function can be (re-)written in terms of the
expected value and the variance of wealth, then the investor will hold a mix of only
2 portfolios (funds): the riskfree asset and the tangency portfolio (from mean-variance
analysis).

5.2.1 General Utility-Based Portfolio Choice

For simplicity, assume that consumption equals wealth, which we normalize to unity. The
optimization problem with a general utility function, two risky and a riskfree asset is then

$$\max_{v_1, v_2} \mathrm{E}\,U(R_p), \text{ where} \tag{5.8}$$

\begin{align}
R_p &= v_1 R_1 + v_2 R_2 + (1 - v_1 - v_2) R_f \tag{5.9} \\
    &= v_1 R_1^e + v_2 R_2^e + R_f, \tag{5.10}
\end{align}

where $R_i^e$ is the excess return on asset $i$ and $R_f$ is a riskfree rate.


The first order conditions for the portfolio weights are

$$\frac{\partial\, \mathrm{E}\,U(R_p)}{\partial v_1} = 0 \text{ and } \frac{\partial\, \mathrm{E}\,U(R_p)}{\partial v_2} = 0, \tag{5.11}$$

which define two equations in two unknowns: $v_1$ and $v_2$ (use (5.10) to substitute for $R_p$).
Suppose we have chosen some utility function and that we know the distribution of
the returns—it should then be possible to solve (5.11) for the portfolio weights. Unfor-
tunately, that can be fairly complicated. For instance, utility might be highly non-linear
so the calculation of its expected value involves difficult integrations (possibly requiring
numerical methods since there is no analytical solution). With many assets there are many
first order conditions, so the system of equations can be large.

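Remark 5.7 (Alternative way of writing the first order condition) In some treatments of
advanced finance topics, you may find the first order condition written as
$\mathrm{E}[\partial U(R_p)/\partial R_p \times R_1^e] = 0$ instead. It is straightforward to rewrite (5.11) in this form. First, notice that
$\partial\, \mathrm{E}\,U(R_p)/\partial v_1 = \mathrm{E}[\partial U(R_p)/\partial v_1]$. To see why, assume 2 possible outcomes $R_p^-$ and $R_p^+$
with a probability $\pi$ of the former. Then

$$\frac{\partial\, \mathrm{E}\,U(R_p)}{\partial v_1} = \frac{\partial\,[\pi U(R_p^-) + (1-\pi) U(R_p^+)]}{\partial v_1} = \pi \frac{\partial U(R_p^-)}{\partial v_1} + (1-\pi)\frac{\partial U(R_p^+)}{\partial v_1} = \mathrm{E}\,\frac{\partial U(R_p)}{\partial v_1}.$$

Second, use the chain rule to write $\mathrm{E}[\partial U(R_p)/\partial v_1]$ as $\mathrm{E}[\partial U(R_p)/\partial R_p \times \partial R_p/\partial v_1]$ and
notice that $\partial R_p/\partial v_1 = R_1^e$.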

Example 5.8 (Portfolio choice with log utility and two states) Suppose $U(R_p) = \ln R_p$,
and that there is only one risky asset. The excess return on the risky asset $R^e$ is either
a low value $R^{e-}$ (with probability $\pi$) or a high value $R^{e+}$ (with probability $1-\pi$). The
optimization problem is then

$$\max_v \mathrm{E}\,U(R_p) \text{ where } \mathrm{E}\,U(R_p) = \pi \ln(vR^{e-} + R_f) + (1-\pi)\ln(vR^{e+} + R_f).$$

The first order condition ($\partial\, \mathrm{E}\,U(R_p)/\partial v = 0$) is

$$\pi \frac{R^{e-}}{vR^{e-} + R_f} + (1-\pi)\frac{R^{e+}}{vR^{e+} + R_f} = 0,$$

so we can solve for the portfolio weight as

$$v = -R_f \frac{\pi R^{e-} + (1-\pi) R^{e+}}{R^{e-} R^{e+}}.$$

For instance, with $R_f = 1.1$, $R^{e-} = -0.3$, $R^{e+} = 0.4$, and $\pi = 0.5$, we get

$$v = -1.1 \times \frac{0.5 \times (-0.3) + (1 - 0.5) \times 0.4}{(-0.3) \times 0.4} \approx 0.46.$$

See Figure 5.3 for an illustration.

Figure 5.3: Example of portfolio choice with a log utility function. The figure plots expected utility, $\mathrm{E}\ln(R_p)$, against the weight on the risky asset for $R_f = 1.1$ and $R^e = -0.3$ or $0.4$ with equal probability.
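The closed-form weight in Example 5.8 can be cross-checked against a brute-force search over the expected log utility. This is only a sketch: the function names and the grid (weights from -1 to 1 in steps of 0.001, mirroring the range on the horizontal axis of Figure 5.3) are illustrative assumptions.

```python
import math

def optimal_weight(Rf, Re_low, Re_high, prob_low):
    """Closed-form weight on the risky asset from the first order condition."""
    mean_excess = prob_low * Re_low + (1 - prob_low) * Re_high
    return -Rf * mean_excess / (Re_low * Re_high)

def expected_log_utility(v, Rf, Re_low, Re_high, prob_low):
    """E ln(Rp) with Rp = v*Re + Rf and two possible states for Re."""
    return (prob_low * math.log(v * Re_low + Rf)
            + (1 - prob_low) * math.log(v * Re_high + Rf))

# Numbers from Example 5.8
Rf, Re_low, Re_high, prob_low = 1.1, -0.3, 0.4, 0.5

v_star = optimal_weight(Rf, Re_low, Re_high, prob_low)

# Independent check: crude grid search (the grid itself is an assumption)
grid = [i / 1000 for i in range(-1000, 1001)]
v_grid = max(grid, key=lambda v: expected_log_utility(v, Rf, Re_low, Re_high, prob_low))

print(f"closed form: {v_star:.4f}, grid search: {v_grid:.3f}")
```

The two numbers should agree up to the grid spacing, which confirms that the first order condition picks out the maximum of the (concave) expected utility.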

Figure 5.4: Iso-utility curves, mean-variance utility with different risk aversions. The contours of $\mathrm{E}(R_p) - (k/2)\mathrm{Var}(R_p)$ are plotted with Std on the horizontal axis and Mean on the vertical axis, for $k = 5$, $7$ and $9$. The covariance matrix is diagonal with variances 0.0256 and 0.0144.

5.2.2 When is the Optimal Portfolio on the Minimum-Variance Frontier?

There are important cases where we can side-step most of the problems with solving
(5.11)—since it can be shown that the portfolio choice will actually be such that a portfo-
lio on the minimum-variance frontier (upper MV frontier) will be chosen.
The optimal portfolio must be on the minimum-variance frontier when expected utility
can be (re-)written as a function in terms of the expected return (increasing) and the
variance (decreasing) only, that is

$$\mathrm{E}\,U(R_p) = V(\mu_p, \sigma_p^2), \tag{5.12}$$

with $\partial V(\mu_p, \sigma_p^2)/\partial \mu_p > 0$ and $\partial V(\mu_p, \sigma_p^2)/\partial \sigma_p^2 < 0$.

For an illustration, see Figure 5.4 which shows the isoutility curves (curves with equal
utility) from a mean-variance utility function ($\mathrm{E}\,U(R_p) = \mu_p - (k/2)\sigma_p^2$). Whenever
expected utility obeys (5.12) (not just for the mean-variance utility function) the isoutility
curves will look similar, so the optimum is on the minimum-variance frontier. The
intuition behind (5.12) is that an investor wants to move as far to the north-west as possible
in Figure 5.4, but that he/she is willing to trade off lower expected returns for lower
volatility, that is, has isoutility functions as in the figure. What is possible is clearly given
by the mean-variance frontier, so the solution is a point on the upper frontier. (This can
also be shown algebraically, but it is slightly messy.) Conditions for (5.12) are discussed
below.
In the case with both a riskfree and risky assets, this means that all investors (provided
they have the same beliefs) will pick some mix of the riskfree asset and the tangency
portfolio (where the ray from the riskfree rate is tangent to the mean-variance frontier of
risky assets). This is the two-fund theorem. Notice that all this says is that the optimal
portfolio is somewhere on the mean variance frontier. We cannot tell exactly where unless
we are more precise about the exact form of the preferences.
See Figures 5.5–5.6 for examples of cases when we do not get a mean-variance port-
folio.

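Remark 5.9 (Taylor expansion of the utility function) Make a Taylor series expansion of
the utility function around the expected portfolio return

\begin{align}
U(R_p) &= U(\mathrm{E} R_p) + U'(\mathrm{E} R_p)(R_p - \mathrm{E} R_p) + \frac{1}{2} U''(\mathrm{E} R_p)(R_p - \mathrm{E} R_p)^2 \notag \\
       &\quad + \frac{1}{6} U'''(\mathrm{E} R_p)(R_p - \mathrm{E} R_p)^3 + H4. \notag
\end{align}

Take expectations to get

$$\mathrm{E}\,U(R_p) = U(\mathrm{E} R_p) + \frac{1}{2} U''(\mathrm{E} R_p)\,\mathrm{Var}(R_p) + \frac{1}{6} U'''(\mathrm{E} R_p)\,\mathrm{Skew}(R_p) + \mathrm{E}\,H4,$$

where $\mathrm{Skew}(R_p)$ is the third central moment, $\mathrm{E}(R_p - \mathrm{E} R_p)^3$. For a CRRA utility function,
$(1 + R_p)^{1-\gamma}/(1-\gamma)$, we have

$$U''(\mathrm{E} R_p) = -\gamma(1 + \mathrm{E} R_p)^{-\gamma-1} < 0 \text{ and } U'''(\mathrm{E} R_p) = \gamma(1+\gamma)(1 + \mathrm{E} R_p)^{-\gamma-2} > 0,$$

so variance is bad, but skewness is good. For a normal distribution, the skewness is zero.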

5.2.3 The Equilibrium Effect of the Two-Fund Theorem

If all investors hold the tangency portfolio, then it must be the market portfolio, since the
riskfree asset is in zero net supply: the average (or aggregate) investor holds no riskfree
assets. Note that this observation is about what the equilibrium must look like, not about
how we get there.

Figure 5.5: Example of when the optimal portfolio is (very slightly) off the MV frontier. Panels: expected utility as a function of the weights $v_1$ and $v_2$ (surface and contours), and the MV frontiers ($\sigma$ against $\mu$). Utility function: $R^{1-\gamma}/(1-\gamma)$, $\gamma = 5$. Two risky assets (A and B) and one riskfree asset; three states with equal probability:

           A       B       Rf
State 1    0.970   0.960   1.065
State 2    1.080   1.220   1.065
State 3    1.200   1.150   1.065
There are several important implications of this. First, all optimal portfolios (denoted
$opt$) are on the capital market line

$$\mu_{opt} = R_f + \frac{\mu_m^e}{\sigma_m}\sigma_{opt}, \tag{5.13}$$

where $\mu_m^e$ and $\sigma_m$ are the expected value and the standard deviation of the excess return
of the market portfolio. This is clearly the same as the upper leg of the MV frontier (with
risky assets and riskfree asset). See Figure 5.7 for an example.
Proof. (of (5.13)) $R_{opt} = aR_m + (1-a)R_f$, so $R_{opt}^e = aR_m^e$. We then have
$\mu_{opt}^e = a\mu_m^e$ and $\sigma_{opt} = a\sigma_m$ (since $a \geq 0$). Solve for $a$ from the latter ($a = \sigma_{opt}/\sigma_m$)
and use in the former.

Figure 5.6: Example of when the optimal portfolio is (very slightly) off the MV frontier. Panels: expected utility as a function of the weights $v_1$ and $v_2$ (surface and contours), and the MV frontiers ($\sigma$ against $\mu$). Utility function: $\mathrm{E}(R) - (k/2)\mathrm{Var}(R) + (l/3)\mathrm{Skew}(R)$, $k = 3.6$, $l = 0.15$. Two risky assets (A and B) and one riskfree asset; three states with equal probability:

           A       B       Rf
State 1    0.970   0.960   1.065
State 2    1.080   1.220   1.065
State 3    1.200   1.150   1.065
Second, we get a beta representation (see lecture notes on CAPM). For any asset or portfolio $i$,
the expected excess return ($\mu_i^e$) is linearly related to the expected excess return on the
market portfolio ($\mu_m^e$) according to

$$\mu_i^e = \beta_i \mu_m^e, \text{ where } \beta_i = \frac{\mathrm{Cov}(R_i, R_m)}{\mathrm{Var}(R_m)}. \tag{5.14}$$

The plot of $\mu_i$ ($= \mu_i^e + R_f$) against $\beta_i$ (for different assets, $i$) is called the security market
line. See Figure 5.7 for an example.

Figure 5.7: Capital market line and security market line. Left panel: capital market line (Std, % against Mean, %), CML: $\mathrm{E}R = R_f + \sigma(\mathrm{E}R_m - R_f)/\sigma_m$, the location of efficient portfolios. Right panel: security market line ($\beta$ against Mean, %), SML: $\mathrm{E}R = R_f + \beta(\mathrm{E}R_m - R_f)$, the location of all assets.

Remark 5.10 (Minimum variance portfolios of risky assets only) Any portfolio in the
set of minimum variance portfolios solves the problem $\min \sigma^2(R_p)$ subject to the
restrictions that the portfolio mean is $\mu^*$ ($\mathrm{E} R_p = \mu^*$) and that the weights sum to unity
($\Sigma_{i=1}^n w_i = 1$). We can retrace the entire set by combining any two portfolios in this
set. For instance, we can use $w_e = \lambda w_g + (1-\lambda) w_T$, where $w_g = \Sigma^{-1} 1_n / (1_n' \Sigma^{-1} 1_n)$
is the global minimum variance portfolio and $w_T = \Sigma^{-1} \mu^e / (1_n' \Sigma^{-1} \mu^e)$ is the tangency
portfolio. The mean net return can be calculated as $w_e'\mu$ and the variance as $w_e' \Sigma w_e$.
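The formulas in Remark 5.10 are easy to evaluate directly. The sketch below is an illustration under stated assumptions: the means, standard deviations and correlations are the ones listed in Figure 5.11 (read as percentages), the riskfree rate of 3% is a made-up number, and the helper names (`solve`, `normalize`, `mean_var`, `mix`) are hypothetical. A small Gauss-Jordan solver is used instead of a matrix library only to keep the example self-contained.

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

# Inputs taken from Figure 5.11, interpreted as percentages
mu = [0.125, 0.105, 0.060]
std = [0.129, 0.090, 0.048]
corr = [[1.00, 0.33, 0.45],
        [0.33, 1.00, 0.05],
        [0.45, 0.05, 1.00]]
n = len(mu)
Sigma = [[corr[i][j] * std[i] * std[j] for j in range(n)] for i in range(n)]
Rf = 0.03  # assumption: not given in the notes

def normalize(x):
    """Scale a vector so that its elements sum to one."""
    s = sum(x)
    return [xi / s for xi in x]

w_g = normalize(solve(Sigma, [1.0] * n))               # global minimum variance
w_T = normalize(solve(Sigma, [m - Rf for m in mu]))    # tangency portfolio

def mean_var(w):
    """Mean and variance of the portfolio with weights w."""
    m = sum(wi * mi for wi, mi in zip(w, mu))
    v = sum(w[i] * Sigma[i][j] * w[j] for i in range(n) for j in range(n))
    return m, v

def mix(lam):
    """Weights lam*w_g + (1-lam)*w_T, another minimum variance portfolio."""
    return [lam * g + (1 - lam) * t for g, t in zip(w_g, w_T)]

for name, w in [("w_g", w_g), ("w_T", w_T), ("mix(0.5)", mix(0.5))]:
    m, v = mean_var(w)
    print(name, [round(x, 3) for x in w], round(m, 4), round(v ** 0.5, 4))
```

Varying `lam` in `mix` traces out the minimum variance set; `w_g` should have the lowest variance of all and `w_T` the highest ratio of mean excess return to standard deviation.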

Remark 5.11 (Minimum variance portfolios with risky and riskfree assets ) Adding a
riskfree asset with gross return Rf transforms the minimum-variance set to two straight
lines. The upper one (typically) is a ray that starts at Rf and goes through the tangency
portfolio.

5.2.4 Special Cases

This section outlines special cases when the utility-based portfolio choice problem can be
rewritten as in (5.12) (in terms of mean and variance only), so that the optimal portfolio
belongs to the minimum-variance set. (Recall that with a riskfree asset this minimum-
variance set is a ray that starts at Rf and goes through the tangency portfolio.)

Case 1: Mean-Variance Utility

We know that if the investor maximizes $\mathrm{E}(R_p) - \sigma^2(R_p)k/2$, then the optimal portfolio
is on the mean-variance frontier. Clearly, this is the same as assuming that the utility
function is $U(R_p) = R_p - [R_p - \mathrm{E}(R_p)]^2 k/2$ (evaluate $\mathrm{E}\,U(R_p)$ to see this).

Case 2: Quadratic Utility

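If utility is quadratic in the return (or equivalently, in wealth)

$$U(R_p) = R_p - bR_p^2/2, \tag{5.15}$$

then expected utility can be written

\begin{align}
\mathrm{E}\,U(R_p) &= \mathrm{E} R_p - b\,\mathrm{E}(R_p^2)/2 \notag \\
                   &= \mathrm{E} R_p - b[\sigma^2(R_p) + (\mathrm{E} R_p)^2]/2 \tag{5.16}
\end{align}

since $\sigma^2(R_p) = \mathrm{E}(R_p^2) - (\mathrm{E} R_p)^2$. (We assume that all these moments are finite.) For
$b > 0$ this function is decreasing in the variance, and increasing in the mean return if we
consider low wealth levels (the derivative with respect to $\mathrm{E} R_p$ is $1 - b\,\mathrm{E} R_p$, which is
positive as long as $\mathrm{E} R_p < 1/b$). The optimal portfolio is therefore on the minimum-variance
frontier. See Figure 5.9 for an example.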
The main drawback with this utility function is that we have to make sure that we are
on the portion of the curve where utility is increasing in wealth. Moreover, we already
know that the quadratic utility function has the strange property that the amount invested
in risky assets decreases as wealth increases (increasing absolute risk aversion).

Case 3: Normally Distributed Returns

When the distribution of any portfolio return is fully described by the mean and variance,
then maximizing E U.Rp / will result in a mean variance portfolio—under some extra
assumptions about the utility function discussed below. A normal distribution (among a
few other distributions) is completely described by its mean and variance. Moreover, any
portfolio return would be normally distributed if the returns on the individual assets have
a multivariate normal distribution (recall: x C y is normally distributed if x and y are).
The extra assumptions needed are that utility is strictly increasing in wealth ($U'(R_p) > 0$),
displays risk aversion ($U''(R_p) < 0$), and is defined for all possible outcomes.
The latter sounds trivial, but it is not. For instance, the logarithmic utility function
$U(R_p) = \ln(R_p)$ cannot be combined with returns (end of period wealth) that can take
negative values (for instance, $\ln(-1) = \pi i$, which is not a real number, while a utility
function must be real-valued).
The algebra required to show this is a bit messy, but the idea is essentially that the
mean and variance fully describe the normal distribution. Since increasing concave util-
ity functions are increasing in the mean and decreasing in the variance (of the portfolio
return), the result is quite intuitive.
Normally distributed returns should be considered as an approximation for three reasons.
First, limited liability means that the gross return can never be negative (the asset
price cannot be negative), that is, the simple net return can never be less than -100%. A
normal distribution cannot rule out this possibility (although it may have a very low prob-
ability). Second, option returns have distributions which are clearly different from normal
distributions: a lot of probability mass at exactly -100% (no exercise) and then a contin-
uous distribution for higher returns. Third, empirical evidence suggests that most asset
returns have distributions with fatter tails and more skewness than implied by a normal
distribution, especially when the returns are measured over short horizons.
As an illustration, suppose the investor maximizes a utility function with constant
absolute risk aversion $k > 0$

$$U(R_p) = -\exp(-R_p k). \tag{5.17}$$

(It is straightforward to verify that this utility function satisfies the extra conditions.) With
normally distributed returns, maximizing the expected value of this utility function is the
same as maximizing MV preferences.
Figure 5.8: Transforming expected utility. Left panel: $-\ln(-z)/k$ plotted against $z$ for $k = 1$ and $k = 5$. Right panel: $\ln[z(1-\gamma)]/(1-\gamma)$ plotted against $z$ for $\gamma = 3$ and $\gamma = 5$.

Proof. (of that CARA gives MV behaviour) First, recall that if $x \sim N(\mu, \sigma^2)$, then
$\mathrm{E}\,e^x = e^{\mu + \sigma^2/2}$. Therefore, rewrite expected utility as

$$\mathrm{E}\,U(R_p) = \mathrm{E}[-\exp(-R_p k)] = -\exp[-\mathrm{E}(R_p)k + \sigma^2(R_p)k^2/2].$$

Notice that the assumption of normally distributed returns is crucial for this result. Second,
recall that if $x$ maximizes (minimizes) $f(x)$, then it also maximizes (minimizes)
$g[f(x)]$ if $g$ is a strictly increasing function. The function $-\ln(-z)/k$ is defined for
$z < 0$ and it is increasing in $z$, see Figure 5.8. We can apply this function by letting $z$ be
the right hand side of the previous equation to get

$$-\ln(-z)/k = \mathrm{E}(R_p) - \sigma^2(R_p)k/2.$$

Therefore, maximizing the expected CARA utility or MV preferences gives the same
solution.
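The equivalence can also be illustrated numerically. The sketch below assumes a single risky asset with a normally distributed excess return and a riskfree asset, so $R_p = vR^e + R_f$; all parameter values, the function names and the grid of weights are made up for the illustration. Expected CARA utility is computed by numerical (trapezoidal) integration rather than from the closed form, so the comparison with the mean-variance objective is not circular.

```python
import math
from statistics import NormalDist

def expected_cara_utility(v, k, Rf, mu_e, sigma, n=2001, width=8.0):
    """E[-exp(-k*Rp)] with Rp = v*Re + Rf and Re ~ N(mu_e, sigma^2).

    Trapezoidal integration over mu_e +/- width*sigma (an approximation).
    """
    dist = NormalDist(mu_e, sigma)
    lo = mu_e - width * sigma
    step = 2 * width * sigma / (n - 1)
    total = 0.0
    for i in range(n):
        x = lo + i * step
        weight = 0.5 if i in (0, n - 1) else 1.0
        total += weight * -math.exp(-k * (v * x + Rf)) * dist.pdf(x)
    return total * step

def mv_objective(v, k, Rf, mu_e, sigma):
    """E(Rp) - (k/2)*Var(Rp) for the same portfolio."""
    return v * mu_e + Rf - 0.5 * k * (v * sigma) ** 2

# Hypothetical parameter values
k, Rf, mu_e, sigma = 5.0, 0.03, 0.05, 0.16

grid = [i / 100 for i in range(0, 101)]
v_cara = max(grid, key=lambda v: expected_cara_utility(v, k, Rf, mu_e, sigma))
v_mv = max(grid, key=lambda v: mv_objective(v, k, Rf, mu_e, sigma))

print("CARA optimum:", v_cara, " MV optimum:", v_mv,
      " analytical MV optimum:", mu_e / (k * sigma ** 2))
```

Both searches should pick the same grid point, close to the analytical mean-variance solution $\mu^e/(k\sigma^2)$ for this one-asset case.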

Case 4: CRRA Utility and Lognormally Distributed Portfolio Returns

There are certainly other cases where maximizing E U.W / or E U.Rp / gives a mean
variance portfolio—or something very close to it.
In particular, consider a CRRA utility function, $(1 + R_p)^{1-\gamma}/(1-\gamma)$, and suppose all
log portfolio returns, $r_p = \ln(1 + R_p)$, happen to be normally distributed. (This should
be thought of as an approximation since $1 + R_p = \alpha(1 + R_1) + (1-\alpha)(1 + R_2)$ is not
lognormally distributed even if both $1 + R_1$ and $1 + R_2$ are.) The solution is then, once again, on
the minimum-variance frontier. This result is especially useful in analysis of multi-period
investments.
See Figure 5.9 for an example.
Proof. (of that CRRA utility and lognormal portfolio returns give MV behaviour).
Notice that

$$\frac{\mathrm{E}(1 + R_p)^{1-\gamma}}{1-\gamma} = \frac{\mathrm{E}\exp[(1-\gamma)r_p]}{1-\gamma}, \text{ where } r_p = \ln(1 + R_p).$$

Since $r_p$ is normally distributed, the expectation is (recall that if $x \sim N(\mu, \sigma^2)$, then
$\mathrm{E}\,e^x = e^{\mu + \sigma^2/2}$)

$$\frac{1}{1-\gamma}\mathrm{E}\exp[(1-\gamma)r_p] = \frac{1}{1-\gamma}\exp[(1-\gamma)\mathrm{E}\,r_p + (1-\gamma)^2\sigma^2(r_p)/2].$$

Assume that $\gamma > 1$. The function $\ln[z(1-\gamma)]/(1-\gamma)$ is then defined for $z < 0$ and it is
increasing in $z$, see Figure 5.8.b. Let $z$ be the right hand side of the previous equation
and apply the transformation to get

$$\mathrm{E}\,r_p + (1-\gamma)\sigma^2(r_p)/2,$$

which is increasing in the expected log return and decreasing in the variance of the log
return (since we assumed $1 - \gamma < 0$). To express this in terms of the mean and variance
of the return instead of the log return we use the following fact: if $\ln y \sim N(\mu, \sigma^2)$, then
$\mathrm{E}\,y = \exp(\mu + \sigma^2/2)$ and $\mathrm{Std}(y)/\mathrm{E}\,y = \sqrt{\exp(\sigma^2) - 1}$. Using this fact on the previous
expression gives

$$\ln \mathrm{E}(1 + R_p) - \gamma \ln[\sigma^2(R_p)/(1 + \mathrm{E} R_p)^2 + 1]/2,$$

which is increasing in $\mathrm{E} R_p$ and decreasing in $\sigma^2(R_p)$. We therefore get a mean-variance
portfolio.

Figure 5.9: Contours with same utility level when returns are normally or lognormally
distributed. The means and standard deviations (on the axes) are for the net returns (not
log returns). Panels: utility contours for CAPM (mean-variance) preferences and for quadratic utility with normal returns, and for CRRA utility with $\gamma = 7$ and $\gamma = 11$ with lognormal returns; each panel plots Std against mean net return.
5.3 Application of Normal Returns: Value at Risk, ES, Lpm and the
Telser Criterion

The mean-variance framework is often criticised for failing to distinguish between down-
side (considered to be risk) and upside (considered to be potential). This section illus-
trates that normally distributed returns often lead to minimum variance portfolios even
if the portfolio selection model seems to be far from the standard mean-variance utility
function.

5.3.1 Value at Risk and the Telser Criterion

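If the return is normally distributed, $R \sim N(\mu, \sigma^2)$, then the $\alpha$ value at risk, $\mathrm{VaR}_\alpha$, is

$$\mathrm{VaR}_\alpha = -(\mu + c_{1-\alpha}\sigma), \tag{5.18}$$

where $c_{1-\alpha}$ is the $1-\alpha$ quantile of a $N(0,1)$ distribution, for instance, $-1.64$ for $1 - \alpha = 5\%$.

Example 5.12 (VaR with $R \sim N(\mu, \sigma^2)$) If $\mu = 8\%$ and $\sigma = 16\%$, then
$\mathrm{VaR}_{95\%} = -(0.08 - 1.64 \times 0.16) \approx 0.18$; we are 95% sure that we will not lose more than 18% of
the investment.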

Suppose we abandon MV preferences and instead choose to minimize the Value at
Risk for a given mean return. With normally distributed returns, the value at risk (5.18)
is a strictly increasing function of the standard deviation (and the variance). Hence, min-
imizing the value at risk gives the same solution (portfolio weights) as minimizing the
variance. (However, it should be noted that the VaR approach is often used when data is
thought to be strongly non-normal.)

Another portfolio choice approach is to use the value at risk as a restriction. For
instance, the Telser criterion says that we should maximize the expected portfolio return
subject to the restriction that the value at risk (at some given probability level) does not
exceed a given level.
The restriction could be that the $\mathrm{VaR}_{95\%}$ should be less than 10% of the investment.
With a normal distribution, (5.18) says that the portfolio must be such that the mean and
standard deviation satisfy

\begin{align}
-(\mu_p - 1.64\sigma_p) &< 0.1, \text{ or} \notag \\
\mu_p &> -0.1 + 1.64\sigma_p. \tag{5.19}
\end{align}

The portfolio choice problem according to the Telser criterion is then to choose the
portfolio weights ($v_i$) to

$$\max_{v_i} \mu_p \text{ subject to } \mu_p > -0.1 + 1.64\sigma_p \text{ and } \Sigma_{i=1}^n v_i = 1. \tag{5.20}$$

More generally, the Telser criterion is

$$\max_{v_i} \mu_p \text{ subject to } \mu_p > -\mathrm{VaR}_\alpha - c_{1-\alpha}\sigma_p \text{ and } \Sigma_{i=1}^n v_i = 1, \tag{5.21}$$

where $c_{1-\alpha}$ is the $1-\alpha$ quantile of a $N(0,1)$ distribution.


This problem is illustrated in Figure 5.10, for different VaR restrictions. Any point
above a line satisfies the respective restriction, and the issue is to pick the one with the
highest possible expected return—among those available. In particular, there are no port-
folios above the minimum-variance frontier (with or without a riskfree asset). A lower
VaR is, of course, a tougher restriction.
If the restriction intersects the minimum-variance frontier, the solution is the highest
intersection point. This is indeed a point on the minimum-variance frontier, which shows
that the Telser criterion applied to normally distributed returns leads us to a minimum-
variance portfolio. If the restriction doesn’t intersect, then there is no solution to the
problem (the restriction is too demanding, the VaR too low).
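As a worked illustration of the intersection argument, consider the case with a riskfree asset, where the upper frontier is the straight line $\mu_p = R_f + S\sigma_p$ and $S$ denotes its slope. The sketch below solves for the point where this line meets the restriction $\mu_p = -\mathrm{VaR} + 1.64\sigma_p$. The function name, the symbol $S$ and all numbers (net riskfree rate 3%, slope 0.5) are assumptions for the example and are not taken from the notes.

```python
def telser_on_frontier_line(Rf, slope, var_limit, c=1.64):
    """Telser solution when the frontier is the line mu = Rf + slope*sigma.

    The VaR restriction is mu >= -var_limit + c*sigma. Returns (mu, sigma)
    at the intersection, or None if the frontier line is at least as steep
    as the restriction (then the restriction never binds on the upper
    frontier and expected return is unbounded in this stylized setting).
    """
    if slope >= c:
        return None
    sigma = (Rf + var_limit) / (c - slope)   # where the two lines cross
    mu = Rf + slope * sigma
    return mu, sigma

# Hypothetical inputs: 3% net riskfree rate, frontier slope 0.5, VaR limit 10%
mu_star, sigma_star = telser_on_frontier_line(Rf=0.03, slope=0.5, var_limit=0.10)
print(f"mean = {mu_star:.4f}, std = {sigma_star:.4f}")
```

To the left of the intersection the frontier line lies above the restriction, so every frontier portfolio there is admissible; the intersection is the admissible portfolio with the highest mean, which is why the Telser solution ends up on the frontier.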

Figure 5.10: Telser criterion and VaR. The figure plots $\sigma$ (std) against $\mu$ (mean) and shows the MV frontier of risky assets, the MV frontier of risky and riskfree assets, and the restriction line $-0.10 + 1.64\sigma$. The problem is to maximize the expected return subject to VaR < 0.1; the shaded area shows where VaR < 0.1.

5.3.2 Expected Shortfall

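The expected shortfall is the expected loss when the loss actually exceeds the $\mathrm{VaR}_\alpha$ (that is,
when the return is below $-\mathrm{VaR}_\alpha$). For normally distributed returns, $R \sim N(\mu, \sigma^2)$, it can be shown that

$$\mathrm{ES}_\alpha = -\mu + \sigma\frac{\phi(c_{1-\alpha})}{1-\alpha}, \tag{5.22}$$

where $\phi()$ is the pdf of a $N(0,1)$ variable.

Example 5.13 If $\mu = 8\%$ and $\sigma = 16\%$, the 95% expected shortfall is
$\mathrm{ES}_{95\%} = -0.08 + 0.16\,\phi(-1.64)/0.05 \approx 0.25$.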

Notice that the expected shortfall for a normally distributed return (5.22) is a strictly
increasing function of the standard deviation (and the variance). As for the VaR, this
means that minimizing expected shortfall at a given mean return therefore gives the same
solution (portfolio weights) as minimizing the variance at the same given mean return.
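Both risk measures are straightforward to compute for normally distributed returns. The sketch below uses Python's standard-library `statistics.NormalDist` for the quantile and the pdf of a $N(0,1)$ variable; the function names `var_normal` and `es_normal` are illustrative. It uses the exact 5% quantile (about -1.645) instead of the rounded -1.64, so results differ slightly in the third decimal.

```python
from statistics import NormalDist

def var_normal(mu, sigma, alpha):
    """Value at risk at level alpha for R ~ N(mu, sigma^2), as in (5.18)."""
    c = NormalDist().inv_cdf(1 - alpha)      # 1-alpha quantile of N(0,1)
    return -(mu + c * sigma)

def es_normal(mu, sigma, alpha):
    """Expected shortfall at level alpha for R ~ N(mu, sigma^2), as in (5.22)."""
    std_normal = NormalDist()
    c = std_normal.inv_cdf(1 - alpha)
    return -mu + sigma * std_normal.pdf(c) / (1 - alpha)

# Numbers from Examples 5.12 and 5.13
mu, sigma, alpha = 0.08, 0.16, 0.95
print(f"VaR = {var_normal(mu, sigma, alpha):.3f}, ES = {es_normal(mu, sigma, alpha):.3f}")

# Both measures increase with sigma at a given mean
for s in (0.10, 0.16, 0.22):
    print(s, round(var_normal(mu, s, alpha), 3), round(es_normal(mu, s, alpha), 3))
```

The loop at the end illustrates the key point of this section: at a given mean, both VaR and ES are increasing in the standard deviation, so minimizing either gives the minimum-variance portfolio.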

5.3.3 Lower Partial 2nd Moment

Reference: Bawa and Lindenberg (1977) and Nantell and Price (1979)

Using the variance (or standard deviation) as a measure of portfolio risk (as a mean-variance
investor does) fails to distinguish between the downside and upside. As an alternative,
one could consider using a lower partial 2nd moment instead. It is defined as

$$\lambda_p(h) = \mathrm{E}[\min(R_p - h, 0)^2], \tag{5.23}$$

where $h$ is a “target level” chosen by the investor. In the subsequent analysis it will be set
equal to the riskfree rate.
Suppose investors' preferences are such that they like high expected returns and dislike
the lower partial second moment with a target level equal to the riskfree rate (denoted $\lambda_p$
to keep the notation brief), that is, their expected utility can be written as

$$\mathrm{E}\,U(R_p) = V(\mu_p, \lambda_p), \text{ with } \partial V(\mu_p, \lambda_p)/\partial \mu_p > 0 \text{ and } \partial V(\mu_p, \lambda_p)/\partial \lambda_p < 0. \tag{5.24}$$


The results in Bawa and Lindenberg (1977) and Nantell and Price (1979) demonstrate
several important things. First, there is still a two-fund theorem: all investors hold a
combination of a market portfolio and the riskfree asset, so there is a capital market line
as in (5.13). See Figure 5.11 for an illustration (based on normally distributed returns,
which is not necessary). Second, there is still a beta representation as in (5.14), but where
the beta coefficient is different.
Third, in case the returns are normally distributed (or t-distributed), then the optimal
portfolios are also on the mean-variance frontier, and all the usual MV results hold. See
Figure 5.12 for a numerical illustration.
The basic reason is that $\lambda_p(h)$ is increasing in the standard deviation (for a given
mean). This means that minimizing $\lambda_p(h)$ at a given mean return gives exactly the same
solution (portfolio weights) as minimizing $\sigma_p$ (or $\sigma_p^2$) at the same given mean return.
As a result, with normally distributed returns, an investor who wants to minimize the
lower partial 2nd moment (at a given mean return) is behaving just like a mean-variance
investor.

Figure 5.11: Lower partial 2nd moment and expected returns. The figure plots the mean-target semivariance frontier (Target semivariance, % against Mean, %) for risky assets only and for risky and riskfree assets, with normally distributed returns. The risky assets have the following properties:

E(R)      12.50   10.50   6.00
Std(R)    12.90    9.00   4.80

Correlation matrix:
1.00   0.33   0.45
0.33   1.00   0.05
0.45   0.05   1.00

Figure 5.12: Standard deviation and expected returns. The figure plots Std, % against Mean, % for the MV frontier (risky assets, and risky and riskfree assets) and for the target semivariance (sv) portfolios (risky assets, and risky and riskfree assets). The markers for target semivariance indicate the std of the portfolio that minimizes the target semivariance at the given mean return.

Remark 5.14 (Lpm calculation for normally distributed variable) For an $N(\mu, \sigma^2)$ variable,
the lower partial 2nd moment around the target level $h$ is

$$\lambda_p(h) = \sigma^2 a\phi(a) + \sigma^2(a^2 + 1)\Phi(a), \text{ where } a = (h - \mu)/\sigma,$$

while $\phi()$ and $\Phi()$ are the pdf and cdf of a $N(0,1)$ variable respectively. Notice that
$\lambda_p(h) = \sigma^2/2$ for $h = \mu$. It is straightforward to show that

$$\frac{\partial \lambda_p(h)}{\partial \sigma} = 2\sigma\Phi(a),$$

so the lower partial moment is a strictly increasing function of the standard deviation.
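The formula for the lower partial 2nd moment of a normally distributed variable, $\sigma^2[a\phi(a) + (a^2+1)\Phi(a)]$ with $a = (h-\mu)/\sigma$ where $\phi$ and $\Phi$ are the standard normal pdf and cdf, and its derivative $2\sigma\Phi(a)$ with respect to $\sigma$, can be checked with a few lines of code. This is a sketch only: the function name `lpm2_normal` and the parameter values (mean 8%, std 16%, target 3%) are illustrative assumptions.

```python
from statistics import NormalDist

def lpm2_normal(mu, sigma, h):
    """Lower partial 2nd moment E[min(R - h, 0)^2] for R ~ N(mu, sigma^2)."""
    z = NormalDist()
    a = (h - mu) / sigma
    return sigma ** 2 * (a * z.pdf(a) + (a ** 2 + 1) * z.cdf(a))

mu, sigma, h = 0.08, 0.16, 0.03   # hypothetical mean, std and target level

# Special case: with the target at the mean we get half the variance
print(lpm2_normal(mu, sigma, mu), sigma ** 2 / 2)

# Numerical derivative with respect to sigma versus 2*sigma*Phi(a)
eps = 1e-6
numerical = (lpm2_normal(mu, sigma + eps, h) - lpm2_normal(mu, sigma - eps, h)) / (2 * eps)
analytical = 2 * sigma * NormalDist().cdf((h - mu) / sigma)
print(numerical, analytical)
```

Since the derivative is positive for any target level, the lower partial 2nd moment rises with the standard deviation at a given mean, which is the reason for the equivalence with mean-variance behaviour under normality.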

5.4 Behavioural Finance

Reference: Elton, Gruber, Brown, and Goetzmann (2010) 18; Forbes (2009); Shefrin
(2005)
There is relatively little direct evidence on investors' preferences (utility). For obvious
reasons, we can't know for sure what people really like. The evidence we do have is
from two sources: “laboratory” experiments designed to elicit information about the test
subjects' preferences for risk, and a lot of indirect information.
The laboratory experiments are typically organized at university campuses (mostly by
psychologists and economists) and involve only small compensations, so the test subjects
are those students who really need the monetary compensation for taking part or
those who are interested in this type of psychological experiment. The results vary quite
a bit, but a main theme is that the main assumptions in utility-based portfolio choice might
be reasonable, but there are some important systematic deviations from these assumptions.
For instance, investors seem to be unwilling to realize losses, that is, to sell off assets
which they have made a loss on (often called the “disposition effect”). They also seem
to treat the investment problem much more on an asset-by-asset basis than suggested by
mean-variance analysis which pays a lot of attention to the covariance of assets (sometimes
called mental accounting). Discounting appears to be non-linear in the sense that
discounting is higher when comparing two dates in the near future (today versus tomorrow)
than in the distant future (one year from now versus one year and a day from now).
Finally, the results seem to move towards tougher play as the experiments are repeated
and/or as more competition is introduced, although the experiments seldom converge to
ultra tough/egoistic behaviour (as typically assumed by utility theory).
Prospect theory (developed by Kahneman and Tversky) tries to explain several
of these things by postulating that the utility function is concave above some reference
point (which may shift), but convex below it. This means that gains are treated in a risk
averse way, but losses in a risk loving way. For instance, after a loss (so we are below
the reference point) an asset looks less risky than after a gain, which might explain why
investors hold on to losing investments. Clearly, an alternative explanation is that investors
believe in mean-reversion (losing positions will recover, winning positions will fall back).
In general, it is hard to make a clear distinction between non-classical preferences and
(potentially distorted) beliefs.
In laboratory experiments (and studies of the properties of forecasts made by analysts),
several interesting results emerge on how investors seem to form expectations. First,
complex situations are often approached by treating them as a simplified representative
problem, even against better knowledge (often called “representativeness”). This stands
in contrast to the idea of Bayesian learning where investors update and learn from their
mistakes. Second (and fairly similar), difficult problems are often handled as if they were
similar to some old/easy problem—and all that is required is a small modification of
the logic (called “anchoring”). Third, recent events/data are given much higher weight
than they typically warrant (often called “recency bias” or “availability”). Finally, most
forecasters seem to be overconfident: they overstate the precision of their own forecasts.
The indirect evidence is broadly in line with the implications of utility-based theory—
especially since the costs for holding well diversified portfolios have decreased (mutual
funds). However, there are clearly some systematic deviations from the theoretical im-
plications. For instance, many investors seem to be too little diversified. In particular,
many investors hold assets in companies/countries that are very strongly correlated to
their labour income (local bias). Moreover, diversification is often done in a naive fashion
and depends on the “menu” of choices. For instance, many pension savers seem to
diversify by putting the fraction $1/n$ in each of the $n$ funds offered by the firm/bank,
irrespective of what kind of funds they are.
There are, of course, also large chunks of wealth invested for control reasons rather
than for a pure portfolio investment reason. In these cases, it is typically difficult to
disentangle (distorted) beliefs from non-traditional preferences. For instance, the aversion
to selling off bad investments may equally well depend on a belief that past losers will
recover.

Bibliography
Bawa, V. S., and E. B. Lindenberg, 1977, “Capital market equilibrium in a mean-lower
partial moment framework,” Journal of Financial Economics, 5, 189–200.

Cochrane, J. H., 2001, Asset pricing, Princeton University Press, Princeton, New Jersey.

Danthine, J.-P., and J. B. Donaldson, 2002, Intermediate financial theory, Prentice Hall.

Elton, E. J., M. J. Gruber, S. J. Brown, and W. N. Goetzmann, 2010, Modern portfolio
theory and investment analysis, John Wiley and Sons, 8th edn.

Forbes, W., 2009, Behavioural finance, Wiley.

Huang, C.-F., and R. H. Litzenberger, 1988, Foundations for financial economics, Elsevier
Science Publishing, New York.

Ingersoll, J. E., 1987, Theory of financial decision making, Rowman and Littlefield.

Nantell, T. J., and B. Price, 1979, “An analytical comparison of variance and semivariance
capital market theories,” Journal of Financial and Quantitative Analysis, 14, 221–242.

Shefrin, H., 2005, A behavioral approach to asset pricing, Elsevier Academic Press,
Burlington, MA.

6 CAPM Extensions
Reference: Elton, Gruber, Brown, and Goetzmann (2010) 14 and 16

6.1 Nonmarketable Assets

This section discusses the portfolio problem when there are nonmarketable “assets.” For
instance, it often makes sense to treat labour income, social security payments and perhaps
also real estate as (more or less) nonmarketable.
The existence of nonmarketable assets will typically affect the choice among mar-
ketable assets and therefore also their prices—at least as long as the nonmarketable assets
are correlated with some marketable assets. The intuition is that the marketable assets
will (to some extent) be used to hedge against the risk of the nonmarketable assets.

6.1.1 Portfolio Choice with Nonmarketable Assets

To build a simple example, consider a mean-variance investor who can choose between a
riskfree asset (with return $R_f$) and equity (with return $R_1$), and who has an endowment
of a nonmarketable asset (with return $R_H$) as well. The investor's portfolio problem is to
maximize

\begin{align}
\mathrm{E}\,U(R_p) &= \mathrm{E}(R_p) - \frac{k}{2}\mathrm{Var}(R_p), \text{ where} \tag{6.1} \\
R_p &= vR_1 + \phi R_H + (1 - v - \phi)R_f \notag \\
    &= vR_1^e + \phi R_H^e + R_f. \tag{6.2}
\end{align}

Note that $\phi$ is the portfolio weight of the nonmarketable asset and $1 - \phi$ is the weight of
the financial portfolio (riskfree plus equity).
Use the budget constraint in the objective function to get (using the fact that $R_f$ is
known)

$$\mathrm{E}\,U(R_p) = v\mu_1^e + \phi\mu_H^e + R_f - \frac{k}{2}\left(v^2\sigma_{11} + \phi^2\sigma_{HH} + 2v\phi\sigma_{1H}\right), \tag{6.3}$$

where $\sigma_{11}$ and $\sigma_{HH}$ are the variances of equity and the nonmarketable asset respectively,
and $\sigma_{1H}$ is their covariance.
The first order condition for the weight on equity, $v$, is $\partial\, \mathrm{E}\,U(R_p)/\partial v = 0$, that is,

$$0 = \mu_1^e - \frac{k}{2}\left(2v\sigma_{11} + 2\phi\sigma_{1H}\right), \text{ so } v = \frac{\mu_1^e/k - \phi\sigma_{1H}}{\sigma_{11}}. \tag{6.4}$$

First, consider the case when the covariance is zero ($\sigma_{1H} = 0$). Then, $v$ is unaffected
by the amount of the nonmarketable assets, $\phi$. In contrast, the weight on the riskfree asset,
$1 - v - \phi$, would be decreasing in the amount of nonmarketable assets. In terms of the
portfolio weights of the financial (marketable) portfolio, equity has the weight $v/(1-\phi)$

$$\frac{v}{1-\phi} = \frac{\mu_1^e/k}{\sigma_{11}(1-\phi)} \text{ if } \sigma_{1H} = 0. \tag{6.5}$$

This fraction is clearly increasing in the amount of nonmarketable assets, $\phi$. The intuition
is that a zero covariance means that the nonmarketable asset is quite similar to a bond:
having more of a bond-like asset means that the financial portfolio is tilted away from
actual bonds. Formally, this shows up in the fact that the combined weight on the riskfree
and the nonmarketable asset, $1 - v$, is unaffected by the value of $\phi$: an increase in the
amount of nonmarketable assets is matched by a corresponding decrease in the bond
holdings (and vice versa).
Second, consider a positive correlation of the nonmarketable asset and equity ($\sigma_{1H} > 0$).
This will make the investor hold relatively more bonds instead. For instance, consider
the extreme case when the correlation is unity and the nonmarketable asset has the same
volatility as equity (to simplify the algebra a bit). Then, (6.4) simplifies to

$$v + \phi = \frac{\mu_1^e/k}{\sigma_{11}} \text{ if } \mathrm{Corr}(R_H, R_1) = 1 \text{ and } \sigma_{11} = \sigma_{HH}. \tag{6.6}$$

This shows that the combined weight of the risky asset and the nonmarketable asset is
unaffected by the endowment of the nonmarketable asset ($\phi$): they are perfect substitutes.

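Example 6.1 (Portfolio choice with human capital) Suppose $k = 3$, $\mu_1^e = 0.08$ and
$\sigma_{11} = 0.2^2$, then (6.4) is
\[
\begin{array}{llll}
 & v & v/(1-\phi) & \text{when} \\
\text{Case A} & \frac{0.08/3}{0.04} = 2/3 & \frac{0.08/3}{0.04} = 2/3 & \phi = 0 \\
\text{Case B} & \frac{0.08/3 - 0.5 \times 0}{0.04} = 2/3 & \frac{2/3}{0.5} = 4/3 & \phi = 0.5 \text{ and } \sigma_{1H} = 0 \\
\text{Case C} & \frac{0.08/3 - 0.5 \times 0.01}{0.04} \approx 0.54 & 0.54/0.5 = 1.08 & \phi = 0.5 \text{ and } \sigma_{1H} = 0.01
\end{array}
\]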

Comparing cases A and B, we see that adding nonmarketable assets that are uncorrelated
with equity tilts the financial portfolio towards equity. Comparing cases B and C, we see
that this effect is less pronounced if the nonmarketable asset is positively correlated with
equity (it could also be completely overturned).
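
The three cases of Example 6.1 can be checked with a few lines of code. This is only a minimal sketch in Python; the function name `equity_weight` and its argument names are ours (chosen for illustration), while the formula is (6.4).

```python
def equity_weight(mu_e, k, sigma_11, phi=0.0, sigma_1H=0.0):
    """Weight on equity from (6.4): v = (mu_e/k - phi*sigma_1H) / sigma_11."""
    return (mu_e / k - phi * sigma_1H) / sigma_11

# The three cases of Example 6.1 (k = 3, mu_e = 0.08, sigma_11 = 0.2^2)
cases = {
    "A": dict(phi=0.0, sigma_1H=0.0),
    "B": dict(phi=0.5, sigma_1H=0.0),
    "C": dict(phi=0.5, sigma_1H=0.01),
}

results = {}
for name, c in cases.items():
    v = equity_weight(0.08, 3, 0.2**2, **c)
    # Store v and the share of equity in the financial portfolio, v/(1-phi)
    results[name] = (v, v / (1 - c["phi"]))
    print(f"Case {name}: v = {v:.3f}, v/(1-phi) = {results[name][1]:.3f}")
```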
Example 6.2 (Portfolio choice of young and old) Consider the common portfolio advice
that young investors (with labour income) should invest relatively more in stocks than
old investors (without labour income). In this case, the nonmarketable asset is “human
capital,” that is, the present value of future labour income—and current labour income
can loosely be interpreted as its return. The analysis in the previous section suggests that
a low correlation of stock returns and wages means that the young investor is endowed
with a bond-like asset. His financial portfolio will therefore be tilted towards the risky
asset—compared to the old investor. (This intuition is strengthened by the fact that labour
income is typically a lot less volatile than equity returns.)
With several risky marketable assets we get
\[
v = \Sigma^{-1}\left(\mu^e/k - \phi S_H\right), \tag{6.7}
\]
where $\Sigma$ is the covariance matrix of all marketable assets and $S_H$ is a vector of
covariances of the marketable assets with the nonmarketable asset.
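Proof. (of (6.7)) The investor solves
\[
\max_v \; v'\mu^e + \phi\mu_H^e + R_f - \frac{k}{2}\left(v'\Sigma v + \phi^2\sigma_{HH} + 2\phi v'S_H\right),
\]
with first order conditions
\[
0 = \mu^e - k\left(\Sigma v + \phi S_H\right), \text{ so}
\]
\[
v = \Sigma^{-1}\left(\mu^e/k - \phi S_H\right).
\]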
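Remark 6.3 (Portfolio choice, two traded assets and one non-traded asset) With two
risky traded assets the investor maximizes $\operatorname{E}(R_p) - \frac{k}{2}\operatorname{Var}(R_p)$, where
$R_p = v_1R_1^e + v_2R_2^e + \phi R_H^e + R_f$, that is
\[
\max_{v_1,v_2} \; v_1\mu_1^e + v_2\mu_2^e + \phi\mu_H^e + R_f - \frac{k}{2}\left(v_1^2\sigma_{11} + v_2^2\sigma_{22} + \phi^2\sigma_{HH} + 2v_1v_2\sigma_{12} + 2v_1\phi\sigma_{1H} + 2v_2\phi\sigma_{2H}\right).
\]
The first order conditions are
\[
\begin{aligned}
0 &= \mu_1^e - k\left[v_1\sigma_{11} + v_2\sigma_{12} + \phi\sigma_{1H}\right] \\
0 &= \mu_2^e - k\left[v_2\sigma_{22} + v_1\sigma_{12} + \phi\sigma_{2H}\right],
\end{aligned}
\]
or
\[
\begin{bmatrix} \mu_1^e \\ \mu_2^e \end{bmatrix}
= k \begin{bmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{12} & \sigma_{22} \end{bmatrix}
\begin{bmatrix} v_1 \\ v_2 \end{bmatrix}
+ k\phi \begin{bmatrix} \sigma_{1H} \\ \sigma_{2H} \end{bmatrix}.
\]
The solution is
\[
\begin{bmatrix} v_1 \\ v_2 \end{bmatrix}
= \frac{1}{\sigma_{11}\sigma_{22} - \sigma_{12}^2}
\begin{bmatrix} \sigma_{22} & -\sigma_{12} \\ -\sigma_{12} & \sigma_{11} \end{bmatrix}
\left( \frac{1}{k}\begin{bmatrix} \mu_1^e \\ \mu_2^e \end{bmatrix} - \phi \begin{bmatrix} \sigma_{1H} \\ \sigma_{2H} \end{bmatrix} \right).
\]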

Example 6.4 (Portfolio choice of a pharmaceutical engineer) In the previous remark,
suppose asset 1 is an index of pharmaceutical stocks, and asset 2 is the rest of the equity
market. Consider a person working as a pharmaceutical engineer: the covariance of her
labour income with asset 1 is likely to be high, while the covariance with asset 2 might be
fairly small. This person should therefore tilt her financial portfolio away from pharmaceutical
stocks: the market portfolio is not the best for everyone.
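
To see the tilt numerically, (6.7) can be evaluated with and without labour income. The sketch below uses Python with NumPy; all the numbers (expected excess returns, covariance matrix, covariances with labour income) are hypothetical values chosen for illustration, and the function name `risky_weights` is ours.

```python
import numpy as np

def risky_weights(mu_e, Sigma, k, phi, S_H):
    """Solve (6.7): v = Sigma^{-1} (mu_e/k - phi*S_H)."""
    return np.linalg.solve(Sigma, mu_e / k - phi * S_H)

# Hypothetical inputs (not from the text)
mu_e = np.array([0.08, 0.06])      # asset 1: pharma index, asset 2: rest of the market
Sigma = np.array([[0.04, 0.02],
                  [0.02, 0.03]])   # covariance matrix of the two traded assets
S_H = np.array([0.02, 0.0])        # labour income covaries with asset 1 only
k = 3.0

v_no_labour = risky_weights(mu_e, Sigma, k, phi=0.0, S_H=S_H)
v_engineer = risky_weights(mu_e, Sigma, k, phi=0.5, S_H=S_H)
print("no nonmarketable asset:", v_no_labour)
print("engineer (phi = 0.5):  ", v_engineer)
```

With these assumed numbers the weight on asset 1 falls when the nonmarketable asset is added, while the weight on asset 2 rises (asset 2 is positively correlated with asset 1 and partly takes its place).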

6.1.2 Asset Pricing Implications of Nonmarketable Assets

The beta representation of expected returns is also affected by the existence of a
nonmarketable asset. Let $R_m$ denote the market portfolio of the marketable assets (whose weights
are proportional to (6.7)). We then have
\[
\mu_i^e = \tilde{\beta}_i \mu_m^e, \text{ where } \tilde{\beta}_i = \frac{\sigma_{im} + \phi(\sigma_{iH} - \sigma_{im})}{\sigma_{mm} + \phi(\sigma_{mH} - \sigma_{mm})}. \tag{6.8}
\]
This coincides with the standard case when $\phi = 0$ (no nonmarketable asset) or when
both asset $i$ and the market are uncorrelated with the nonmarketable asset. This expression
suggests one reason for why the traditional beta (against the market portfolio only) could
be biased. For instance, if the market is positively correlated with $R_H$, but asset $i$ is
negatively correlated with $R_H$, then $\tilde{\beta}_i$ is lower than the traditional beta.
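Proof. (of (6.8)) Divide the portfolio weights in (6.7) by $1-\phi$ to get the weights of
the (financial) market portfolio, $w_m$. For any portfolio with portfolio weights $w_p$ we have
the covariance with the market
\[
\begin{aligned}
\sigma_{pm} &= w_p'\Sigma w_m \\
&= w_p'\Sigma\Sigma^{-1}\left(\mu^e/k - \phi S_H\right)/(1-\phi) \\
&= \mu_p^e/\left[k(1-\phi)\right] - \phi\sigma_{pH}/(1-\phi).
\end{aligned}
\]
Apply this equation to the market return itself to get
\[
\sigma_{mm} = \mu_m^e/\left[k(1-\phi)\right] - \phi\sigma_{mH}/(1-\phi).
\]
Combine these two equations as
\[
\frac{\sigma_{pm} + \phi\sigma_{pH}/(1-\phi)}{\sigma_{mm} + \phi\sigma_{mH}/(1-\phi)} = \frac{\mu_p^e}{\mu_m^e},
\]
which can be rearranged as (6.8).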


Notice that a standard CAPM regression of
\[
R_i^e = \alpha_i + b_i R_m^e + \varepsilon_i, \tag{6.9}
\]
would produce (in a very large sample) the traditional beta ($b_i = \beta_i = \sigma_{im}/\sigma_{mm}$) and a
non-zero intercept equal to
\[
\alpha_i = (\tilde{\beta}_i - \beta_i)\mu_m^e. \tag{6.10}
\]

A rejection of the null that the intercept is zero (a rejection of CAPM) could then be due to
the existence of nonmarketable assets. (There are clearly several other possible reasons.)
Proof. (of (6.10)) Take expectations of (6.9) to get $\mu_i^e = \alpha_i + \beta_i\mu_m^e$. From (6.8) we
then have $\tilde{\beta}_i\mu_m^e = \alpha_i + \beta_i\mu_m^e$ which gives (6.10).

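Example 6.5 (Different betas) Suppose $\sigma_{im} = 0.8$, $\sigma_{mm} = 1$, $\sigma_{iH} = -0.5$, and
$\sigma_{mH} = 0.5$. Then (6.8) gives
\[
\tilde{\beta}_i =
\begin{cases}
\dfrac{0.8}{1} = 0.8 & \text{if } \phi = 0 \\[2ex]
\dfrac{0.8 + 0.3(-0.5 - 0.8)}{1 + 0.3(0.5 - 1)} = \dfrac{0.41}{0.85} \approx 0.48 & \text{if } \phi = 0.3.
\end{cases}
\]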
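
Evaluating (6.8) term by term is easy to get wrong by hand, so a short check may help. The sketch below (Python; the function name `adjusted_beta` is ours) uses the covariances of Example 6.5. The market risk premium of 6% used to illustrate the intercept in (6.10) is a hypothetical number, not taken from the text.

```python
def adjusted_beta(s_im, s_mm, s_iH, s_mH, phi):
    """Adjusted beta from (6.8)."""
    num = s_im + phi * (s_iH - s_im)
    den = s_mm + phi * (s_mH - s_mm)
    return num / den

# Covariances from Example 6.5
s_im, s_mm, s_iH, s_mH = 0.8, 1.0, -0.5, 0.5

beta_trad = s_im / s_mm                                     # traditional beta
beta_no_H = adjusted_beta(s_im, s_mm, s_iH, s_mH, phi=0.0)
beta_with_H = adjusted_beta(s_im, s_mm, s_iH, s_mH, phi=0.3)

# Intercept of the CAPM regression according to (6.10),
# for a hypothetical market risk premium of 6%
mu_m_e = 0.06
alpha_i = (beta_with_H - beta_trad) * mu_m_e
print(beta_no_H, beta_with_H, alpha_i)
```

The adjusted beta with $\phi = 0.3$ is the ratio 0.41/0.85, which is below the traditional beta of 0.8, so the implied intercept is negative.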
There is also another way to express the expected excess return of asset $i$: as a
multi-factor model (or multi-beta model)
\[
\mu_i^e = \beta_{im}\mu_m^e + \beta_{iH}\mu_H^e. \tag{6.11}
\]

In this case, the expected excess return on asset i depends on how it is related to both the
(financial) market and the nonmarketable asset. The key implication of (6.11) is that there
are two risk factors that influence the required risk premium of asset i : both the market
and the nonmarketable asset matter. The investor’s portfolio choice will typically depend
on the nonmarketable asset, which in turn will affect asset prices (and returns).
It may seem as if we now have a paradox: both the “adjusted” single-beta representation
(6.8) and the multiple-beta representation (6.11) are supposedly true. Can that really
be the case, and how should we then test the model? Well, both expressions are true, but
there is a key difference: the betas in (6.11) could be estimated by a multiple regression,
whereas $\tilde{\beta}_i$ in (6.8) could not.
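Proof. (of (6.11)) The first equation of the Proof of (6.8) can be written
\[
\begin{aligned}
\mu_p^e/k &= (1-\phi)\sigma_{pm} + \phi\sigma_{pH} \qquad (*) \\
&= \begin{bmatrix} 1-\phi & \phi \end{bmatrix}
\begin{bmatrix} \sigma_{pm} \\ \sigma_{pH} \end{bmatrix} \\
&= \begin{bmatrix} 1-\phi & \phi \end{bmatrix}
\begin{bmatrix} \sigma_{mm} & \sigma_{mH} \\ \sigma_{mH} & \sigma_{HH} \end{bmatrix}
\begin{bmatrix} \sigma_{mm} & \sigma_{mH} \\ \sigma_{mH} & \sigma_{HH} \end{bmatrix}^{-1}
\begin{bmatrix} \sigma_{pm} \\ \sigma_{pH} \end{bmatrix} \\
&= \begin{bmatrix} 1-\phi & \phi \end{bmatrix}
\begin{bmatrix} \sigma_{mm} & \sigma_{mH} \\ \sigma_{mH} & \sigma_{HH} \end{bmatrix}
\begin{bmatrix} \beta_{pm} \\ \beta_{pH} \end{bmatrix} \\
&= \begin{bmatrix} (1-\phi)\sigma_{mm} + \phi\sigma_{mH} & (1-\phi)\sigma_{mH} + \phi\sigma_{HH} \end{bmatrix}
\begin{bmatrix} \beta_{pm} \\ \beta_{pH} \end{bmatrix}. \qquad (**)
\end{aligned}
\]
The third line just multiplies and divides by the covariance matrix. The fourth line follows
from the usual definition of regression coefficients, $\beta = \operatorname{Var}(x)^{-1}\operatorname{Cov}(x, y)$.
Apply the first equation (*) on the market return and an asset with the same return
as $R_H$ (this is a short cut: it would be more precise to use a “factor mimicking”
portfolio, which is just a bit more complicated). We then get
\[
\begin{aligned}
\mu_m^e/k &= (1-\phi)\sigma_{mm} + \phi\sigma_{mH} \text{ and} \\
\mu_H^e/k &= (1-\phi)\sigma_{mH} + \phi\sigma_{HH}.
\end{aligned}
\]
Use these to substitute for the row vector in (**) to get
\[
\mu_p^e/k = \begin{bmatrix} \mu_m^e/k & \mu_H^e/k \end{bmatrix}
\begin{bmatrix} \beta_{pm} \\ \beta_{pH} \end{bmatrix},
\]
which is the same as (6.11).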

6.2 Heterogenous Investors

This section gives a simple example of a model where the investors have different beliefs.
Recall the simple MV problem where investor $i$ solves
\[
\max_{\alpha} \; \operatorname{E}_i R_p - \operatorname{Var}_i(R_p)k_i/2, \text{ subject to} \tag{6.12}
\]
\[
R_p = \alpha R_m^e + R_f. \tag{6.13}
\]
In these expressions, the expectations, variance, and the risk aversion parameter all carry
the subscript $i$ to indicate that they may differ between investors. The solution is that the
weight on the risky asset is
\[
\alpha_i = \frac{1}{k_i}\frac{\operatorname{E}_i R_m^e}{\operatorname{Var}_i(R_m^e)}, \tag{6.14}
\]
where $\operatorname{E}_i R_m^e$ is the investor’s expectation of the excess return of the risky asset and
$\operatorname{Var}_i(R_m^e)$ the investor’s perceived variance.

If all investors have the same initial wealth, then the average (across investors) $\alpha_i$ must
be unity, since the riskfree asset is in zero net supply. Suppose there are $N$ investors,
then the average of (6.14) is
\[
1 = \frac{1}{N}\sum_{i=1}^{N}\frac{1}{k_i}\frac{\operatorname{E}_i R_m^e}{\operatorname{Var}_i(R_m^e)}. \tag{6.15}
\]

This is an equilibrium condition that must hold. We consider a few illustrative special
cases.
First, suppose all investors have the same expectations and assessments of the variance,
but different risk aversions, $k_i$. Then, (6.15) can be rearranged as
\[
\operatorname{E} R_m^e = \tilde{k}\operatorname{Var}(R_m^e), \text{ where } \tilde{k} = \frac{1}{\frac{1}{N}\sum_{i=1}^{N}\frac{1}{k_i}}. \tag{6.16}
\]
This shows that the risk premium on the market is increasing in the volatility and $\tilde{k}$. The
latter is not the average risk aversion, but closely related to it. For instance, if all $k_i$ are
scaled up by a factor $b$, so is $\tilde{k}$ (and therefore the risk premium).

Example 6.6 (“Average” risk aversion) If half of the investors have $k = 2$ and the other
half has $k = 3$, then $\tilde{k} = 2.4$.
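
The aggregate risk aversion in (6.16) is the harmonic mean of the individual risk aversions. A minimal sketch in Python with NumPy follows; the function name `effective_risk_aversion` is ours, chosen for illustration.

```python
import numpy as np

def effective_risk_aversion(k):
    """k-tilde in (6.16): the harmonic mean of the individual risk aversions."""
    k = np.asarray(k, dtype=float)
    return 1.0 / np.mean(1.0 / k)

k_tilde = effective_risk_aversion([2, 3])         # the two groups of Example 6.6
k_tilde_scaled = effective_risk_aversion([4, 6])  # all k_i scaled up by b = 2
print(k_tilde, k_tilde_scaled, np.mean([2, 3]))
```

The harmonic mean is never above the arithmetic mean, so $\tilde{k}$ is (weakly) below the average risk aversion, and it scales one-for-one when all $k_i$ are scaled.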

Second, suppose now that only the expected excess return is the same for all investors.
Then, (6.15) can be rearranged as
\[
\operatorname{E} R_m^e = \frac{1}{\frac{1}{N}\sum_{i=1}^{N}\frac{1}{k_i\operatorname{Var}_i(R_m^e)}}. \tag{6.17}
\]

The market risk premium is now increasing in a complicated expression that is closely
related to a weighted average of the perceived market variances—where the weights are
increasing in the risk aversion. If all variances or risk aversions are scaled up by a factor
b so is the risk premium.
Third, suppose only the expected excess returns differ. Then, (6.15) can be rearranged
as
\[
\frac{1}{N}\sum_{i=1}^{N}\operatorname{E}_i R_m^e = k\operatorname{Var}(R_m^e). \tag{6.18}
\]

Clearly, the average expected excess return is increasing in the risk aversion and variance.
To interpret this a bit more, let the return be the capital gain (assuming no dividend in the
next period), $R_m = P_{t+1}/P_t$ where the current period is $t$
\[
\frac{1}{N}\sum_{i=1}^{N}\operatorname{E}_i\left(\frac{P_{t+1}}{P_t}\right) - R_f = k\operatorname{Var}(R_m^e) \text{ or} \tag{6.19}
\]
\[
P_t = \frac{1}{k\operatorname{Var}(R_m^e) + R_f}\,\frac{1}{N}\sum_{i=1}^{N}\operatorname{E}_i(P_{t+1}). \tag{6.20}
\]
This shows that today’s market price, $P_t$, is simply the average expected future price,
scaled down by the risk aversion, volatility and the riskfree rate (to create a capital gain
to compensate for the risk and the alternative return).

These special cases suggest that, although the general expression (6.15) is complicated,
we are unlikely to commit serious errors by sticking to the formulation
\[
\operatorname{E} R_m^e = k\operatorname{Var}(R_m^e), \tag{6.21}
\]
as long as we interpret the components as (close to) averages across investors.

6.3 CAPM without a Riskfree Rate

This section states the main result for CAPM when there is no riskfree asset. It uses two
basic ingredients.
First, suppose investors behave as if they had mean-variance preferences, so they
choose portfolios on the mean-variance frontier (of risky assets only). Different investors
may have different portfolios, but they are all on the mean-variance frontier. The market
portfolio is a weighted average of these individual portfolios, and therefore itself on the
mean-variance frontier. (Linear combinations of efficient portfolios are also efficient.)
Second, consider the market portfolio. We know that we can find some other efficient
portfolio (denote it $R_z$) that has a zero covariance (beta) with the market portfolio,
$\operatorname{Cov}(R_m, R_z) = 0$. (Such a portfolio can actually be found for any efficient portfolio, not
just the market portfolio.) Let $v_m$ be the portfolio weights of the market portfolio, and $\Sigma$
the variance-covariance matrix of all assets. Then, the portfolio weights $v_z$ that generate
$R_z$ must satisfy $v_m'\Sigma v_z = 0$ and $v_z'\mathbf{1} = 1$ (sum to unity). The intuition for the
portfolio weights of $R_z$ is that some of the weights have the same sign as in the
market portfolio (contributing to a positive covariance) and some others have the opposite
sign compared to the market portfolio (contributing to a negative covariance). Together,
this gives a zero covariance.
See Figure 6.1 for an illustration.
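
With only two risky assets (with different expected returns) every portfolio is on the mean-variance frontier, so the two restrictions $v_m'\Sigma v_z = 0$ and $v_z'\mathbf{1} = 1$ pin down the zero-beta portfolio. The sketch below (Python with NumPy) solves this small linear system. The covariance matrix and market weights are hypothetical numbers, not those of Figure 6.1, and the function name is ours.

```python
import numpy as np

def zero_beta_weights_two_assets(v_m, Sigma):
    """Weights v_z with v_m' Sigma v_z = 0 and sum(v_z) = 1 (two-asset case only)."""
    a = v_m @ Sigma                      # the row vector v_m' Sigma
    A = np.vstack([a, np.ones(2)])       # stack the two linear restrictions
    return np.linalg.solve(A, np.array([0.0, 1.0]))

# Hypothetical inputs (not from the text)
Sigma = np.array([[0.04, 0.01],
                  [0.01, 0.02]])
v_m = np.array([0.4, 0.6])

v_z = zero_beta_weights_two_assets(v_m, Sigma)
cov_mz = v_m @ Sigma @ v_z               # covariance of R_m and R_z
print(v_z, cov_mz)
```

As in the discussion above, the solution combines a negative weight on one asset with a positive weight on the other. With more than two assets the two restrictions no longer identify a unique portfolio, and one would also have to require that $v_z$ is on the frontier.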
The main result is then the “zero-beta” CAPM
\[
\operatorname{E}(R_i - R_z) = \beta_i\operatorname{E}(R_m - R_z). \tag{6.22}
\]

Suppose we run the CAPM regression (6.9). We then get (in a very large sample)
\[
\alpha_i = (1 - \beta_i)\mu_z^e, \tag{6.23}
\]
where $\mu_z^e$ is the expected excess return of the zero-beta portfolio (over the riskfree rate
used in the CAPM regression). This suggests that a rejection of CAPM might be due to the fact
that investors cannot borrow and lend freely at a riskfree rate.

Figure 6.1: Zero-beta model. (MV frontier and zero beta model, mean against standard deviation,
with $R_m$, $R_z$ and $\operatorname{E}(R_z)$ marked. Means: 0.09 and 0.06. Covariance matrix: variances 0.026
and 0.014, covariance 0.000. Weights of $R_m$: 0.47 and 0.53. Weights of $R_z$: -1.67 and 2.67.)

Proof. (of (6.23)) Subtract $R_f$ from both sides of (6.22), then add and subtract
$(1 - \beta_i)R_f$ on the right hand side. Rearrange to get (6.23).
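Proof. (of (6.22)) An investor (with initial wealth equal to unity) chooses the portfolio
weights ($v_i$) to maximize
\[
\operatorname{E} U(R_p) = \operatorname{E}(R_p) - \frac{k}{2}\operatorname{Var}(R_p), \text{ where}
\]
\[
R_p = v_1R_1 + v_2R_2 \text{ and } v_1 + v_2 = 1,
\]
where we assume two risky assets. Combining gives the Lagrangian
\[
L = v_1\mu_1 + v_2\mu_2 - \frac{k}{2}\left(v_1^2\sigma_{11} + v_2^2\sigma_{22} + 2v_1v_2\sigma_{12}\right) + \lambda(1 - v_1 - v_2).
\]
The first order conditions (for $v_1$ and $v_2$) are that the partial derivatives equal zero
\[
\begin{aligned}
0 &= \partial L/\partial v_1 = \mu_1 - k(v_1\sigma_{11} + v_2\sigma_{12}) - \lambda \\
0 &= \partial L/\partial v_2 = \mu_2 - k(v_2\sigma_{22} + v_1\sigma_{12}) - \lambda \\
0 &= \partial L/\partial \lambda = 1 - v_1 - v_2.
\end{aligned}
\]
Notice that
\[
\sigma_{1m} = \operatorname{Cov}(R_1, \underbrace{v_1R_1 + v_2R_2}_{R_m}) = v_1\sigma_{11} + v_2\sigma_{12},
\]
and similarly for $\sigma_{2m}$. We can then rewrite the first order conditions as
\[
\begin{aligned}
0 &= \mu_1 - k\sigma_{1m} - \lambda \qquad \text{(a)} \\
0 &= \mu_2 - k\sigma_{2m} - \lambda \\
0 &= 1 - v_1 - v_2.
\end{aligned}
\]
Take a weighted average of the first two equations with the weights $v_1$ and $v_2$ respectively
\[
\begin{aligned}
v_1\mu_1 + v_2\mu_2 - \lambda &= k(v_1\sigma_{1m} + v_2\sigma_{2m}) \\
\mu_m - \lambda &= k\sigma_{mm}, \qquad \text{(b)}
\end{aligned}
\]
which follows from the fact that
\[
\begin{aligned}
v_1\sigma_{1m} + v_2\sigma_{2m} &= v_1\operatorname{Cov}(R_1, v_1R_1 + v_2R_2) + v_2\operatorname{Cov}(R_2, v_1R_1 + v_2R_2) \\
&= \operatorname{Cov}(v_1R_1 + v_2R_2, v_1R_1 + v_2R_2) \\
&= \operatorname{Var}(R_m).
\end{aligned}
\]
Divide (a) by (b)
\[
\frac{\mu_1 - \lambda}{\mu_m - \lambda} = \frac{k\sigma_{1m}}{k\sigma_{mm}} \text{ or}
\]
\[
\mu_1 - \lambda = \beta_1(\mu_m - \lambda).
\]
Applying this equation on a return $R_z$ with a zero beta (against the market) gives
\[
\mu_z - \lambda = 0 \cdot (\mu_m - \lambda), \text{ so we notice that } \lambda = \mu_z.
\]
Combining the last two equations gives (6.22).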

6.4 Multi-Factor Models and APT

6.4.1 Multi-Factor Models

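A multi-factor model extends the market model by allowing more factors to explain the
return on an asset. In terms of excess returns it could be
\[
R_i^e = \beta_{im}R_m^e + \beta_{iF}R_F^e + \varepsilon_i, \text{ where } \operatorname{E}(\varepsilon_i) = 0, \; \operatorname{Cov}(R_m^e, \varepsilon_i) = 0, \; \operatorname{Cov}(R_F^e, \varepsilon_i) = 0. \tag{6.24}
\]
The pricing implication is a multi-beta model
\[
\mu_i^e = \beta_{im}\mu_m^e + \beta_{iF}\mu_F^e. \tag{6.25}
\]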

Remark 6.7 (When factors are not excess returns) This formulation assumes that the
factor can be expressed as an excess return—but that is not necessary. For instance, it
could be that the second factor is a macro variable like inflation surprises. Then there are
two possible ways to proceed. First, find that portfolio which mimics the movements in the
inflation surprises best and use the excess return of that (factor mimicking) portfolio in
(6.24) and (6.25). Second, we could instead reformulate the model by adding an intercept
in (6.25) and let $R_F^e$ denote whatever the factor is (not necessarily an excess return) and
then estimate the factor risk premium, corresponding to $\mu_F^e$ in (6.25), by using a
cross-section of different assets ($i = 1, 2, \ldots$).

We have already seen one theoretical multi-factor model: the “CAPM with nonmar-
ketable assets” in (6.11). The consumption-based model (discussed later on) gives another
example. There are also several empirically motivated multi-factor models, that is, em-
pirical models that have been found to work well (even if the theoretical foundation might
be a bit weak).
Fama and French (1993) estimate a multi-factor model and show that it performs much
better than CAPM. The three factors are: the market return, the return on a portfolio of
small stocks minus the return on a portfolio of big stocks, and the return on a portfolio
with a high ratio of book value to market value minus the return on a portfolio with a low
ratio. He and Ng (1994) try to relate these factors to macroeconomic series.

The multi-factor model by MSCI Barra is widely used in the financial industry. It
uses a set of firm characteristics (rather than macro variables) as factors, for instance,
size, volatility, price momentum, and industry/country (see Stefek (2002)). This model is
often used to value firms without a price history (for instance, before an IPO) or to find
mispriced assets.
The APT model (see below) is another motivation for why a multi-factor model may
make sense. Finally, consumption-based models typically also suggest multi-factor mod-
els (in terms of macro variables).

6.4.2 The Arbitrage Pricing Model

The first assumption of the Arbitrage Pricing Theory (APT) is that the return of asset $i$
can be described as
\[
R_{it} = a_i + \beta_i f_t + \varepsilon_{it}, \text{ where } \operatorname{E}\varepsilon_{it} = 0, \; \operatorname{Cov}(\varepsilon_{it}, f_t) = \operatorname{Cov}(\varepsilon_{it}, \varepsilon_{jt}) = 0. \tag{6.26}
\]
In this particular formulation there is only one factor, $f_t$, but the APT allows for more
factors. Notice that (6.26) assumes that any correlation of two assets ($i$ and $j$) is due to
movements in $f_t$: the residuals are assumed to be uncorrelated. This is clearly an index
model (here a single index).
The second assumption of APT is that financial markets are very well developed,
so well developed that it is possible to form portfolios that “insure” against almost all
possible outcomes. To be precise, the assumption is that it is possible to form a zero
cost portfolio (buy some, sell some) that has a zero sensitivity to the factor and also
(almost) no idiosyncratic risk. In essence, this assumes that we can form a (non-trivial)
zero-cost portfolio of the risky assets that is riskfree. In formal terms, the assumption is
that there is a non-trivial portfolio (with the value $v_j$ of the position in asset $j$) such that
$\sum_{i=1}^{N} v_i = \sum_{i=1}^{N} v_i\beta_i = 0$ and $\sum_{i=1}^{N} v_i^2\operatorname{Var}(\varepsilon_{it}) \approx 0$. The requirement that the portfolio
is non-trivial means that at least some $v_j \neq 0$.
Together, these assumptions imply that (the proof isn’t all that simple) for well
diversified portfolios we have
\[
\operatorname{E} R_{it} = R_f + \beta_i\lambda, \tag{6.27}
\]
where $\lambda$ is (typically) an unknown constant. The important feature is that there is a linear
relation between the risk premium (expected excess return) of an asset and its beta. This
expression generalizes to the multi-factor case.

Example 6.8 (APT with three assets) Suppose there are three well-diversified portfolios
(that is, with no residual) with the following factor models
\[
\begin{aligned}
R_{1,t} &= 0.01 + 1f_t \\
R_{2,t} &= 0.01 + 0.25f_t, \text{ and} \\
R_{3,t} &= 0.01 + 2f_t.
\end{aligned}
\]
APT then holds if there is a portfolio with $v_i$ invested in asset $i$, so that the cost of the
portfolio is zero (which implies that the weights must be of the form $v_1$, $v_2$, and $-v_1 - v_2$
respectively) such that the portfolio has zero sensitivity to $f_t$, that is
\[
\begin{aligned}
0 &= v_1 \times 1 + v_2 \times 0.25 + (-v_1 - v_2) \times 2 \\
&= v_1 \times (1 - 2) + v_2 \times (0.25 - 2) \\
&= -v_1 - v_2 \times 1.75.
\end{aligned}
\]
There is clearly an infinite number of such weights but they all obey the relation
$v_1 = -v_2 \times 1.75$. Notice that the requirement that there is no idiosyncratic volatility is
(here) satisfied by assuming that none of the three portfolios have any idiosyncratic noise.

Example 6.9 (APT with two assets) Example 6.8 would not work if we only had the first
two assets. To see that, the portfolio would then have to be of the form ($v_1, -v_1$) and it is
clear that $v_1 \times 1 - v_1 \times 0.25 = v_1(1 - 0.25) \neq 0$ for any non-trivial portfolio (that is,
with $v_1 \neq 0$).
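
The two examples can be verified numerically. The following sketch (Python with NumPy) builds a zero-cost portfolio from the relation $v_1 = -1.75 v_2$ of Example 6.8 and checks its factor sensitivity; the function name `zero_cost_portfolio` and the choice $v_2 = 1$ are ours.

```python
import numpy as np

betas = np.array([1.0, 0.25, 2.0])       # factor loadings from Example 6.8

def zero_cost_portfolio(v2):
    """Positions (v1, v2, -v1-v2) with v1 = -1.75*v2, as in Example 6.8."""
    v1 = -1.75 * v2
    return np.array([v1, v2, -v1 - v2])

v = zero_cost_portfolio(1.0)
cost = v.sum()                           # zero by construction
sensitivity = v @ betas                  # exposure to the factor f_t

# Example 6.9: with only the first two assets the zero-cost portfolio is (v1, -v1)
sensitivity_two_assets = np.array([1.0, -1.0]) @ betas[:2]
print(v, cost, sensitivity, sensitivity_two_assets)
```

With three assets both the cost and the factor sensitivity are zero, while with two assets the sensitivity is $0.75 v_1$, which is non-zero for any non-trivial position.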

One of the main drawbacks with APT is that it is silent about both the number of
factors and their definition. In many empirical applications, the factors (or the factor
mimicking portfolios) are found by some kind of statistical method. The idea is (typically)
to find that combination of some given assets that explains most of the covariance
of the same assets. Then, we find the next combination of the same assets that is
uncorrelated with the first combination but also explains as much as possible of the (remaining)
covariance, and so forth. A few such factors are often enough to account for most of
the covariance. Still, the factors have no particular economic interpretation, and it is not
possible to guess what the betas ought to be. To do that, we have to get back to the
multi-factor model. For instance, CAPM gives the same type of implication as (6.27), except
that CAPM identifies $\lambda$ as the expected excess return on the market.

6.5 Joint Portfolio and Savings Choice

6.5.1 Two-Period Problem

The basic consumption-based multi-period problem postulates that the investor derives
utility from consumption in every period and that the utility in one period is additively
separable from the utility in other periods. For instance, if the investor plans for 2 periods
(labelled 1 and 2), then he/she chooses the amount invested in different assets to maximize
expected utility
\[
\max \; u(C_1) + \delta\operatorname{E}_1 u(C_2), \text{ subject to} \tag{6.28}
\]
\[
C_1 + I_1 = W_1 \tag{6.29}
\]
\[
C_2 + I_2 = \left(1 + v_1R_1^e + v_2R_2^e + R_f\right)I_1. \tag{6.30}
\]

In equation (6.28) $C_t$ is consumption in period $t$. The current period (when the portfolio
is chosen) is period 1, so all expectations are made on the basis of the information
available in period 1. The constant $\delta$ is the time discount factor, with $0 < \delta < 1$
indicating impatience. (In equilibrium without risk, we will get a positive real interest rate if
investors are impatient.)
Equation (6.29) is the budget constraint for period 1: an initial wealth at the beginning
of period 1, $W_1$, is split between consumption, $C_1$, and investment, $I_1$. Equation (6.30)
is the budget constraint for period 2: consumption plus investment must equal the wealth
at the beginning of period 2. It is clear that $I_2 = 0$ since investing in period 2 is the
same as wasting resources. The wealth at the beginning of period 2 equals the investment
in period 1, $I_1$, times the gross portfolio return, which in turn depends on the portfolio
weights chosen in period 1 ($v_1$ and $v_2$) as well as on the returns on the assets (from holding
them from period 1 to period 2).
Use the budget constraints and $I_2 = 0$ to substitute for $C_1$ and $C_2$ in (6.28) to get
\[
\max \; u(W_1 - I_1) + \delta\operatorname{E}_1 u\left[\left(1 + v_1R_1^e + v_2R_2^e + R_f\right)I_1\right]. \tag{6.31}
\]
The decision variables in period 1 are how much to invest, $I_1$ (which implicitly defines
how much we consume in period 1), and the portfolio weights $v_1$ and $v_2$.
The first order condition for $I_1$ is
\[
-u'(C_1) + \delta\operatorname{E}_1\left[u'(C_2)\left(1 + v_1R_1^e + v_2R_2^e + R_f\right)\right] = 0, \tag{6.32}
\]
where $u'(C_t)$ is the marginal utility in period $t$. (In this expression, the consumption
levels are substituted back in order to facilitate the interpretation.) This says that
consumption should be planned so that the marginal loss of utility from decreasing $C_1$ equals
the discounted expected marginal gain of utility from increasing $C_2$ by the gross return of
the money saved.
The first order conditions for $v_1$ and $v_2$ are
\[
\operatorname{E}_1\left[u'(C_2)R_1^e\right] = 0 \text{ and} \tag{6.33}
\]
\[
\operatorname{E}_1\left[u'(C_2)R_2^e\right] = 0, \tag{6.34}
\]
which say that both excess returns should be orthogonal to marginal utility. To solve for
the decision variables ($I_1, v_1, v_2$) we should use the budget restrictions (6.29) and (6.30)
to substitute for $C_1$ and $C_2$ in (6.32), (6.33) and (6.34), and then solve the three equations
for the three unknowns. There are typically no explicit solutions, so numerical solutions
are the best we can hope for.
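
As an illustration of such a numerical solution, the sketch below maximizes (6.31) by a simple grid search in Python with NumPy. Everything in the setup is an assumption made for the example: log utility, a single risky asset, two equally likely states for the excess return, and the particular parameter values. A grid search is only one (crude) way to solve the problem. Log utility is convenient as a check, since the objective then separates into $\ln(W_1 - I_1) + \delta\ln I_1$ plus a term in $v$ only, so the optimal investment is $I_1 = \delta W_1/(1+\delta)$.

```python
import numpy as np

# Hypothetical setup (not from the text): log utility, one risky asset,
# two equally likely states for the excess return.
delta, W1, Rf = 0.95, 1.0, 0.02
Re = np.array([0.25, -0.15])
prob = np.array([0.5, 0.5])

I_grid = np.linspace(0.01, 0.99, 981)   # candidate investment levels I1
v_grid = np.linspace(0.0, 3.0, 301)     # candidate weights on the risky asset

# Gross portfolio return 1 + v*Re + Rf for each (v, state)
gross = 1 + v_grid[:, None] * Re[None, :] + Rf
# C2 = gross * I1 for each (I1, v, state), as in (6.30) with I2 = 0
C2 = gross[None, :, :] * I_grid[:, None, None]
# Objective (6.31) on the grid: u(W1 - I1) + delta * E u(C2)
EU = np.log(W1 - I_grid)[:, None] + delta * (np.log(C2) @ prob)

i_best, j_best = np.unravel_index(np.argmax(EU), EU.shape)
I1_opt, v_opt = I_grid[i_best], v_grid[j_best]

# First order condition (6.33) at the grid optimum: E[u'(C2) * Re], with u'(C) = 1/C
foc = prob @ (Re / C2[i_best, j_best])
print(I1_opt, v_opt, foc)
```

The grid optimum for $I_1$ should be close to $\delta W_1/(1+\delta)$, and the first order condition (6.33) should hold (up to the grid step) at the chosen portfolio weight.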
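The first order conditions still contain some useful information. In particular, recall
that, by definition, $\operatorname{Cov}(x, y) = \operatorname{E}(xy) - \operatorname{E}(x)\operatorname{E}(y)$, so (6.33) can be written
\[
\operatorname{Cov}\left[u'(C_2), R_1^e\right] + \operatorname{E}\left[u'(C_2)\right]\operatorname{E}(R_1^e) = 0 \text{ or}
\]
\[
\operatorname{E}(R_1^e) = -\frac{\operatorname{Cov}\left[u'(C_2), R_1^e\right]}{\operatorname{E}\left[u'(C_2)\right]}. \tag{6.35}
\]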
This says that asset 1 will have a high risk premium (expected excess return) if it is
negatively correlated with marginal utility, that is, if it tends to have a high return when the
need is low. Since marginal utility is decreasing in consumption (concave utility function),
this is the same as saying that assets that tend to have high returns when consumption
is high (and vice versa) will be considered risky assets—and therefore carry large risk
premia.
Although these results were derived from a two-period problem, it can be shown that a
problem with more periods gives the same first-order conditions. In this case, the objective
function is
\[
u(C_1) + \delta\operatorname{E}_1 u(C_2) + \delta^2\operatorname{E}_1 u(C_3) + \ldots + \delta^{T-1}\operatorname{E}_1 u(C_T). \tag{6.36}
\]

Figure 6.2: Utility function. (Left panel: utility function with tangents. Right panel: marginal
utility. Both are plotted against consumption.)

6.5.2 From a Consumption-Based Model to CAPM

Suppose marginal utility is an affine function of the market excess return
\[
u'(C_2) = a - bR_m^e, \text{ with } b > 0. \tag{6.37}
\]
This would, for instance, be the case in a Lucas model where consumption equals the
market return and the utility function is quadratic, but it could be true in other cases as
well. We can then write (6.35) as
\[
\operatorname{E}(R_1^e) = b\,\frac{\operatorname{Cov}(R_m^e, R_1^e)}{\operatorname{E}(a - bR_m^e)}. \tag{6.38}
\]
We can, of course, apply this expression to the market excess return (instead of asset 1) to
get
\[
\operatorname{E}(R_m^e) = b\,\frac{\operatorname{Var}(R_m^e)}{\operatorname{E}(a - bR_m^e)}. \tag{6.39}
\]
Use (6.39) in (6.38) to substitute $\operatorname{E}(R_m^e)/\operatorname{Var}(R_m^e)$ for $b/\operatorname{E}(a - bR_m^e)$
\[
\operatorname{E}(R_1^e) = \frac{\operatorname{Cov}(R_m^e, R_1^e)}{\operatorname{Var}(R_m^e)}\operatorname{E}(R_m^e), \tag{6.40}
\]
which is the beta representation of CAPM.

6.5.3 From a Consumption-Based Model to a Multi-Factor Model

The consumption-based model may not look like a factor model, but it could easily be
written as one. The idea is to assume that marginal utility is a linear function of some key
macroeconomic variables, for instance, output ($y$) and interest rates ($i$)
\[
u'(C_2) = ay + bi. \tag{6.41}
\]
Such a formulation makes a lot of sense in most macro models, at least as an approximation.
It is then possible to write (6.35) as
\[
\operatorname{E}(R_1^e) = -\frac{a\operatorname{Cov}(y, R_1^e) + b\operatorname{Cov}(i, R_1^e)}{\operatorname{E}(ay + bi)}. \tag{6.42}
\]
This, in turn, is easily put in the form of (6.25), where the risk premium on asset 1 depends
on the betas against GDP and the interest rate. (See the proof of (6.11) for an idea of how
to construct this beta representation.)

Bibliography
Elton, E. J., M. J. Gruber, S. J. Brown, and W. N. Goetzmann, 2010, Modern portfolio
theory and investment analysis, John Wiley and Sons, 8th edn.

Fama, E. F., and K. R. French, 1993, “Common risk factors in the returns on stocks and
bonds,” Journal of Financial Economics, 33, 3–56.

He, J., and L. Ng, 1994, “Economic forces and the stock market,” Journal of Business, 4,
599–609.

Stefek, D., 2002, “The Barra integrated model,” Barra Research Insight.

7 Testing CAPM and Multifactor Models
Reference: Elton, Gruber, Brown, and Goetzmann (2010) 15
More advanced material is denoted by a star (*). It is not required reading.

7.1 Market Model

The basic implication of CAPM is that the expected excess return of an asset ($\mu_i^e$) is
linearly related to the expected excess return on the market portfolio ($\mu_m^e$) according to
\[
\mu_i^e = \beta_i\mu_m^e, \text{ where } \beta_i = \frac{\operatorname{Cov}(R_i, R_m)}{\operatorname{Var}(R_m)}. \tag{7.1}
\]
Let $R_{it}^e = R_{it} - R_{ft}$ be the excess return on asset $i$ over the riskfree asset,
and let $R_{mt}^e$ be the excess return on the market portfolio. CAPM with a riskfree return
says that $\alpha_i = 0$ in
\[
R_{it}^e = \alpha_i + b_iR_{mt}^e + \varepsilon_{it}, \text{ where } \operatorname{E}\varepsilon_{it} = 0 \text{ and } \operatorname{Cov}(R_{mt}^e, \varepsilon_{it}) = 0. \tag{7.2}
\]
The two last conditions are automatically imposed by LS. Take expectations to get
\[
\operatorname{E}(R_{it}^e) = \alpha_i + b_i\operatorname{E}(R_{mt}^e). \tag{7.3}
\]
Notice that the LS estimate of $b_i$ is the sample analogue to $\beta_i$ in (7.1). It is then clear that
CAPM implies that $\alpha_i = 0$, which is also what empirical tests of CAPM focus on.
This test of CAPM can be given two interpretations. If we assume that $R_{mt}$ is the
correct benchmark (the tangency portfolio for which (7.1) is true by definition), then it
is a test of whether asset $R_{it}$ is correctly priced. This is typically the perspective in
performance analysis of mutual funds. Alternatively, if we assume that $R_{it}$ is correctly
priced, then it is a test of the mean-variance efficiency of $R_{mt}$. This is the perspective of
CAPM tests.
The t-test of the null hypothesis that $\alpha_i = 0$ uses the fact that, under fairly mild
conditions, the t-statistic has an asymptotically normal distribution, that is
\[
\frac{\hat{\alpha}_i}{\operatorname{Std}(\hat{\alpha}_i)} \xrightarrow{d} N(0, 1) \text{ under } H_0: \alpha_i = 0. \tag{7.4}
\]
Note that this is the distribution under the null hypothesis that the true value of the
intercept is zero, that is, that CAPM is correct (in this respect, at least).
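
The mechanics of the test can be sketched as follows (Python with NumPy). The data are simulated, with all parameter values being hypothetical, and the standard error uses the classical LS formula, which assumes iid residuals. Both choices are assumptions made for the illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 600                                              # number of observations
Rm_e = 0.005 + 0.045 * rng.standard_normal(T)        # simulated market excess return
Ri_e = 1.2 * Rm_e + 0.03 * rng.standard_normal(T)    # asset i: true alpha = 0, true beta = 1.2

# LS estimation of (7.2): regress Ri_e on a constant and Rm_e
X = np.column_stack([np.ones(T), Rm_e])
coef, *_ = np.linalg.lstsq(X, Ri_e, rcond=None)
alpha_hat, b_hat = coef
resid = Ri_e - X @ coef

# Classical covariance matrix of the LS estimates (iid residuals assumed)
s2 = resid @ resid / (T - 2)
cov_coef = s2 * np.linalg.inv(X.T @ X)

# t-statistic in (7.4), compared with the 5% two-sided critical value of N(0,1)
t_alpha = alpha_hat / np.sqrt(cov_coef[0, 0])
reject_5pct = abs(t_alpha) > 1.96
print(alpha_hat, b_hat, t_alpha, reject_5pct)
```

The LS residuals have a zero sample mean and a zero sample covariance with the market excess return, which is the sense in which the two conditions in (7.2) are automatically imposed.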
The test assets are typically portfolios of firms with similar characteristics, for in-
stance, small size or having their main operations in the retail industry. There are two
main reasons for testing the model on such portfolios: individual stocks are extremely
volatile and firms can change substantially over time (so the beta changes). Moreover,
it is of interest to see how the deviations from CAPM are related to firm characteristics
(size, industry, etc), since that can possibly suggest how the model needs to be changed.
The results from such tests vary with the test assets used. For US portfolios, CAPM
seems to work reasonably well for some types of portfolios (for instance, portfolios based
on firm size or industry), but much worse for other types of portfolios (for instance, port-
folios based on firm dividend yield or book value/market value ratio). Figure 7.1 shows
some results for US industry portfolios.

7.1.1 Interpretation of the CAPM Test

Instead of a t-test, we can use the equivalent chi-square test
\[
\frac{\hat{\alpha}_i^2}{\operatorname{Var}(\hat{\alpha}_i)} \xrightarrow{d} \chi_1^2 \text{ under } H_0: \alpha_i = 0. \tag{7.5}
\]
Tables (A.2)–(A.1) list critical values for t- and chi-square tests.
It is quite straightforward to use the properties of minimum-variance frontiers (see
Gibbons, Ross, and Shanken (1989), and also MacKinlay (1995)) to show that the test
statistic in (7.5) can be written
\[
\frac{\hat{\alpha}_i^2}{\operatorname{Var}(\hat{\alpha}_i)} = \frac{(SR_c)^2 - (SR_m)^2}{\left[1 + (SR_m)^2\right]/T}, \tag{7.6}
\]
where $SR_m$ is the Sharpe ratio of the market portfolio (as before) and $SR_c$ is the Sharpe
ratio of the tangency portfolio when investment in both the market return and asset $i$ is
possible. (Recall that the tangency portfolio is the portfolio with the highest possible
Sharpe ratio.) If the market portfolio has the same (squared) Sharpe ratio as the tangency
portfolio of the mean-variance frontier of $R_{it}$ and $R_{mt}$ (so the market portfolio is
mean-variance efficient also when we take $R_{it}$ into account) then the test statistic,
$\hat{\alpha}_i^2/\operatorname{Var}(\hat{\alpha}_i)$, is zero, and CAPM is not rejected.

Figure 7.1: CAPM regressions on US industry indices. (US industry portfolios, 1970:1-2009:12.
Left panel: mean excess return against β. Right panel: mean excess return against the predicted
mean excess return (with α=0). Excess market return: 5.2%. Factor: US market. α and StdErr
are in annualized %.)

              alpha   pval   StdErr
all             NaN   0.09      NaN
A (NoDur)      3.45   0.02     8.97
B (Durbl)     -1.29   0.53    13.29
C (Manuf)      0.71   0.49     6.41
D (Enrgy)      4.41   0.06    14.75
E (HiTec)     -1.61   0.41    12.27
F (Telcm)      1.32   0.47    11.39
G (Shops)      1.17   0.46     9.93
H (Hlth )      2.10   0.25    11.76
I (Utils)      2.71   0.16    11.87
J (Other)     -0.42   0.72     7.24
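Proof. (of (7.6)) From the CAPM regression (7.2) we have
\[
\operatorname{Cov}\begin{bmatrix} R_{it}^e \\ R_{mt}^e \end{bmatrix}
= \begin{bmatrix} \beta_i^2\sigma_m^2 + \operatorname{Var}(\varepsilon_{it}) & \beta_i\sigma_m^2 \\ \beta_i\sigma_m^2 & \sigma_m^2 \end{bmatrix},
\text{ and }
\begin{bmatrix} \mu_i^e \\ \mu_m^e \end{bmatrix}
= \begin{bmatrix} \alpha_i + \beta_i\mu_m^e \\ \mu_m^e \end{bmatrix}.
\]
Suppose we use this information to construct a mean-variance frontier for both $R_{it}$ and
$R_{mt}$, and we find the tangency portfolio, with excess return $R_{ct}^e$. It is straightforward to
show that the square of the Sharpe ratio of the tangency portfolio is $\mu^{e\prime}\Sigma^{-1}\mu^e$, where
$\mu^e$ is the vector of expected excess returns and $\Sigma$ is the covariance matrix. By using the
covariance matrix and mean vector above, we get that the squared Sharpe ratio for the
tangency portfolio, $\mu^{e\prime}\Sigma^{-1}\mu^e$, (using both $R_{it}$ and $R_{mt}$) is
\[
\left(\frac{\mu_c^e}{\sigma_c}\right)^2 = \frac{\alpha_i^2}{\operatorname{Var}(\varepsilon_{it})} + \left(\frac{\mu_m^e}{\sigma_m}\right)^2,
\]
which we can write as
\[
(SR_c)^2 = \frac{\alpha_i^2}{\operatorname{Var}(\varepsilon_{it})} + (SR_m)^2.
\]
Combine this with (7.8) which shows that $\operatorname{Var}(\hat{\alpha}_i) = \left[1 + (SR_m)^2\right]\operatorname{Var}(\varepsilon_{it})/T$.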
This is illustrated in Figure 7.2 which shows the effect of adding an asset to the invest-
ment opportunity set. In this case, the new asset has a zero beta (since it is uncorrelated
with all original assets), but the same type of result holds for any new asset. The basic
point is that the market model tests if the new asset moves the location of the tangency
portfolio. In general, we would expect that adding an asset to the investment opportunity
set would expand the mean-variance frontier (and it does) and that the tangency portfolio
changes accordingly. However, the tangency portfolio is not changed by adding an asset
with a zero intercept. The intuition is that such an asset has neutral performance com-
pared to the market portfolio (obeys the beta representation), so investors should stick to
the market portfolio.

7.1.2 Econometric Properties of the CAPM Test

A common finding from Monte Carlo simulations is that these tests tend to reject a true
null hypothesis too often when the critical values from the asymptotic distribution are
used: the actual small sample size of the test is thus larger than the asymptotic (or “nominal”)
size (see Campbell, Lo, and MacKinlay (1997) Table 5.1). The practical consequence
is that we should either use adjusted critical values (from Monte Carlo or bootstrap
simulations) or, more pragmatically, that we should only believe in strong rejections
of the null hypothesis.
To study the power of the test (the frequency of rejections of a false null hypothesis)
we have to specify an alternative data generating process (for instance, how much extra
return in excess of that motivated by CAPM) and the size of the test (the critical value to
use). Once that is done, it is typically found that these tests require a substantial deviation
from CAPM and/or a long sample to get good power. The basic reason for this is that asset
returns are very volatile. For instance, suppose that the standard OLS assumptions (iid

Figure 7.2: Effect on MV frontier of adding assets. Three panels plot the MV frontiers (mean against std) before and after adding a new asset, for α=0, α=0.05 and α=−0.04. Solid curves: 2 assets. Dashed curves: 3 assets. The new asset has the abnormal return α compared to the market (of 2 assets).

Means        0.0800  0.0500  α + β(ERm−Rf)
Cov matrix   0.0256  0.0000  0.0000
             0.0000  0.0144  0.0000
             0.0000  0.0000  0.0144

Tangency portfolio weights
  N=2    α=0    α=0.05  α=−0.04
  0.47   0.47   0.31     0.82
  0.53   0.53   0.34     0.91
  NaN    0.00   0.34    −0.73

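residuals that are independent of the market return) are correct. Then, it is straightforward
to show that the variance of Jensen’s alpha is
\begin{align}
\mathrm{Var}(\hat{\alpha}_i) &= \left[1 + \frac{(\mu_m^e)^2}{\mathrm{Var}(R_m^e)}\right]\mathrm{Var}(\varepsilon_{it})/T \tag{7.7}\\
&= [1 + (SR_m)^2]\,\mathrm{Var}(\varepsilon_{it})/T, \tag{7.8}
\end{align}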

where SRm is the Sharpe ratio of the market portfolio. We see that the uncertainty about
the alpha is high when the residual is volatile and when the sample is short, but also when
the Sharpe ratio of the market is high. Note that a large market Sharpe ratio means that
the market asks for a high compensation for taking on risk. A bit of uncertainty about how
risky asset i is then translates into a large uncertainty about what the risk-adjusted return
should be.

Example 7.1 Suppose we have monthly data with $\hat{\alpha}_i = 0.2\%$ (that is, $0.2\% \times 12 = 2.4\%$
per year), $\mathrm{Std}(\varepsilon_{it}) = 3\%$ (that is, $3\% \times \sqrt{12} \approx 10\%$ per year) and a market Sharpe ratio
of 0.15 (that is, $0.15 \times \sqrt{12} \approx 0.5$ per year). (This corresponds well to US CAPM
regressions for industry portfolios.) A significance level of 10% requires a t-statistic (7.4)
of at least 1.65, so
\[
\frac{0.2}{\sqrt{1 + 0.15^2}\;3/\sqrt{T}} \geq 1.65 \text{ or } T \geq 626.
\]
We need a sample of at least 626 months (52 years)! With a sample of only 26 years (312
months), the alpha needs to be almost 0.3% per month (3.6% per year) or the standard
deviation of the residual just 2% (7% per year). Notice that cumulating a 0.3% return
over 25 years means almost 2.5 times the initial value.
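The arithmetic in Example 7.1 can be reproduced with a few lines of Python. This is only a sketch under the assumptions of the example (iid residuals, so that (7.8) applies); the function names `alpha_t_stat` and `required_sample` are illustrative and not part of the notes.

```python
import math

def alpha_t_stat(alpha, resid_std, sharpe_m, T):
    """t-statistic of Jensen's alpha, with Var(alpha_hat) taken from (7.8)."""
    std_alpha = math.sqrt(1.0 + sharpe_m ** 2) * resid_std / math.sqrt(T)
    return alpha / std_alpha

def required_sample(alpha, resid_std, sharpe_m, t_crit=1.65):
    """Sample length T (as a real number) at which the t-statistic equals t_crit."""
    return (t_crit * math.sqrt(1.0 + sharpe_m ** 2) * resid_std / alpha) ** 2

# Numbers from Example 7.1 (monthly data, in %)
T_needed = required_sample(alpha=0.2, resid_std=3.0, sharpe_m=0.15)
print("months needed:", round(T_needed, 1))

# Alpha needed for a t-statistic of 1.65 with only 312 months of data
alpha_needed = 1.65 * math.sqrt(1.0 + 0.15 ** 2) * 3.0 / math.sqrt(312)
print("alpha needed with T = 312:", round(alpha_needed, 2))
```

Changing `resid_std` or `T` in the calls shows how quickly the required alpha grows when residuals are volatile or the sample is short.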

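Proof. (Proof of (7.8)) Consider the regression equation $y_t = x_t' b + \varepsilon_t$. With iid
errors that are independent of all regressors (also across observations), the LS estimator,
$\hat{b}_{LS}$, is asymptotically distributed as
\[
\sqrt{T}(\hat{b}_{LS} - b) \overset{d}{\to} N(0, \sigma^2\Sigma_{xx}^{-1}) \text{, where } \sigma^2 = \mathrm{Var}(\varepsilon_t) \text{ and } \Sigma_{xx} = \mathrm{plim}\sum_{t=1}^{T} x_t x_t'/T.
\]
When the regressors are just a constant (equal to one) and one variable regressor, $f_t$, so
$x_t = [1, f_t]'$, then we have
\[
\Sigma_{xx} = \mathrm{E}\sum_{t=1}^{T} x_t x_t'/T
= \mathrm{E}\,\frac{1}{T}\sum_{t=1}^{T}\begin{bmatrix} 1 & f_t \\ f_t & f_t^2 \end{bmatrix}
= \begin{bmatrix} 1 & \mathrm{E} f_t \\ \mathrm{E} f_t & \mathrm{E} f_t^2 \end{bmatrix} \text{, so}
\]
\[
\sigma^2\Sigma_{xx}^{-1}
= \frac{\sigma^2}{\mathrm{E} f_t^2 - (\mathrm{E} f_t)^2}\begin{bmatrix} \mathrm{E} f_t^2 & -\mathrm{E} f_t \\ -\mathrm{E} f_t & 1 \end{bmatrix}
= \frac{\sigma^2}{\mathrm{Var}(f_t)}\begin{bmatrix} \mathrm{Var}(f_t) + (\mathrm{E} f_t)^2 & -\mathrm{E} f_t \\ -\mathrm{E} f_t & 1 \end{bmatrix}.
\]
(In the last line we use $\mathrm{Var}(f_t) = \mathrm{E} f_t^2 - (\mathrm{E} f_t)^2$.) The upper left element is
$\sigma^2[1 + (\mathrm{E} f_t)^2/\mathrm{Var}(f_t)]$; dividing by $T$ and setting $f_t = R_{mt}^e$ gives (7.7).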

7.1.3 Several Assets

In most cases there are several (n) test assets, and we actually want to test if all the ˛i (for
i D 1; 2; :::; n) are zero. Ideally we then want to take into account the correlation of the
different alphas.
While it is straightforward to construct such a test, it is also a bit messy. As a quick
way out, the following will work fairly well. First, test each asset individually. Second,
form a few different portfolios of the test assets (equally weighted, value weighted) and
test these portfolios. Although this does not deliver one single test statistic, it provides
plenty of information to base a judgement on. For a more formal approach, see Section
7.1.4.
A quite different approach to study a cross-section of assets is to first perform a CAPM
regression (7.2) and then the following cross-sectional regression
\[
\sum_{t=1}^{T} R_{it}^e/T = \gamma + \lambda\hat{\beta}_i + u_i, \tag{7.9}
\]
where $\sum_{t=1}^{T} R_{it}^e/T$ is the (sample) average excess return on asset $i$. Notice that the
estimated betas are used as regressors and that there are as many data points as there are
assets ($n$).
There are severe econometric problems with this regression equation since the regres-
sor contains measurement errors (it is only an uncertain estimate), which typically tend
to bias the slope coefficient towards zero. To get the intuition for this bias, consider an
extremely noisy measurement of the regressor: it would be virtually uncorrelated with the
dependent variable (noise isn’t correlated with anything), so the estimated slope coeffi-
cient would be close to zero.
If we could overcome this bias (and we can by being careful), then the testable
implications of CAPM are that $\gamma = 0$ and that $\lambda$ equals the average market excess return.
We also want (7.9) to have a high $R^2$, since it should be unity in a very large sample (if
CAPM holds).

7.1.4 Several Assets: SURE Approach

This section outlines how we can set up a formal test of CAPM when there are several
test assets.
For simplicity, suppose we have two test assets. Stacking (7.2) for the two assets gives
\begin{align}
R_{1t}^e &= \alpha_1 + b_1 R_{mt}^e + \varepsilon_{1t}, \tag{7.10}\\
R_{2t}^e &= \alpha_2 + b_2 R_{mt}^e + \varepsilon_{2t}, \tag{7.11}
\end{align}
where $\mathrm{E}\,\varepsilon_{it} = 0$ and $\mathrm{Cov}(R_{mt}^e, \varepsilon_{it}) = 0$. This is a system of seemingly unrelated
regressions (SURE) with the same regressor (see, for instance, Wooldridge (2002) 7.7). In
this case, the efficient estimator (GLS) is LS on each equation separately. Moreover, the
covariance matrix of the coefficients is particularly simple.
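To see what the covariances of the coefficients are, write the regression equation for
asset 1 (7.10) on a traditional form
\[
R_{1t}^e = x_t'\beta_1 + \varepsilon_{1t} \text{, where } x_t = \begin{bmatrix} 1 \\ R_{mt}^e \end{bmatrix},\;
\beta_1 = \begin{bmatrix} \alpha_1 \\ b_1 \end{bmatrix}, \tag{7.12}
\]
and similarly for the second asset (and any further assets).
Define
\[
\hat{\Sigma}_{xx} = \sum_{t=1}^{T} x_t x_t'/T \text{, and } \hat{\sigma}_{ij} = \sum_{t=1}^{T}\hat{\varepsilon}_{it}\hat{\varepsilon}_{jt}/T, \tag{7.13}
\]
where $\hat{\varepsilon}_{it}$ is the fitted residual of asset $i$. The key result is then that the (estimated)
asymptotic covariance matrix of the vectors $\hat{\beta}_i$ and $\hat{\beta}_j$ (for assets $i$ and $j$) is
\[
\mathrm{Cov}(\hat{\beta}_i, \hat{\beta}_j) = \hat{\sigma}_{ij}\hat{\Sigma}_{xx}^{-1}/T. \tag{7.14}
\]
(In many text books, this is written $\hat{\sigma}_{ij}(X'X)^{-1}$.)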

The null hypothesis in our two-asset case is
\[
H_0: \alpha_1 = 0 \text{ and } \alpha_2 = 0. \tag{7.15}
\]
In a large sample, the estimator is normally distributed (this follows from the fact that
the LS estimator is a form of sample average, so we can apply a central limit theorem).
Therefore, under the null hypothesis we have the following result. From (7.8) we know
that the upper left element of $\Sigma_{xx}^{-1}/T$ equals $[1 + (SR_m)^2]/T$. Then
\[
\begin{bmatrix} \hat{\alpha}_1 \\ \hat{\alpha}_2 \end{bmatrix} \sim
N\left(\begin{bmatrix} 0 \\ 0 \end{bmatrix},
\begin{bmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{12} & \sigma_{22} \end{bmatrix}[1 + (SR_m)^2]/T\right)
\text{ (asymptotically).} \tag{7.16}
\]

In practice we use the sample moments for the covariance matrix. Notice that the zero
means in (7.16) come from the null hypothesis: the distribution is (as usual) constructed
by pretending that the null hypothesis is true.
We can now construct a chi-square test by using the following fact.

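Remark 7.2 If the $n \times 1$ vector $y \sim N(0, \Omega)$, then $y'\Omega^{-1}y \sim \chi^2_n$.

To apply this, form the test statistic
\[
T[1 + (SR_m)^2]^{-1}
\begin{bmatrix} \hat{\alpha}_1 \\ \hat{\alpha}_2 \end{bmatrix}'
\begin{bmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{12} & \sigma_{22} \end{bmatrix}^{-1}
\begin{bmatrix} \hat{\alpha}_1 \\ \hat{\alpha}_2 \end{bmatrix} \sim \chi^2_2. \tag{7.17}
\]
This can also be transformed into an F test, which might have better small sample
properties.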
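As an illustration of (7.17), the following sketch simulates two test assets that satisfy CAPM (zero alphas) and computes the joint test statistic. All numbers (betas, volatilities, sample length, seed) are hypothetical choices for the simulation, and the 5% critical value 5.99 is the chi-square value for two degrees of freedom from Table A.2.

```python
import random
import statistics as st

random.seed(1)
T = 600
# Simulated monthly excess returns (%): market and two test assets with zero alphas
rm = [random.gauss(0.5, 4.0) for _ in range(T)]
r1 = [0.9 * m + random.gauss(0.0, 3.0) for m in rm]
r2 = [1.2 * m + random.gauss(0.0, 3.0) for m in rm]

def ols(y, x):
    """LS of y on a constant and x; returns (alpha, beta, residuals)."""
    mx, my = st.fmean(x), st.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    beta = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    alpha = my - beta * mx
    resid = [yi - alpha - beta * xi for xi, yi in zip(x, y)]
    return alpha, beta, resid

a1, b1, e1 = ols(r1, rm)
a2, b2, e2 = ols(r2, rm)

# Residual covariances sigma_ij, as in (7.13)
s11 = sum(u * u for u in e1) / T
s22 = sum(u * u for u in e2) / T
s12 = sum(u * v for u, v in zip(e1, e2)) / T

# Squared Sharpe ratio of the market (sample moments, dividing by T)
sr2 = st.fmean(rm) ** 2 / st.pvariance(rm)

# Quadratic form alpha' S^{-1} alpha for the 2x2 case, then the statistic (7.17)
det = s11 * s22 - s12 ** 2
quad = (a1 * a1 * s22 - 2.0 * a1 * a2 * s12 + a2 * a2 * s11) / det
stat = T * quad / (1.0 + sr2)

print(f"alphas: {a1:.3f}, {a2:.3f}; test statistic: {stat:.2f}")
print("reject at 5%" if stat > 5.99 else "do not reject at 5%")
```

Since the simulated alphas are zero, the statistic should exceed 5.99 in roughly 5% of the simulations if the asymptotic distribution is a good approximation; rerunning with different seeds gives a feel for this.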

7.1.5 Representative Results of the CAPM Test

One of the more interesting studies is Fama and French (1993) (see also Fama and French
(1996)). They construct 25 stock portfolios according to two characteristics of the firm:
the size (by market capitalization) and the book-value-to-market-value ratio (BE/ME). In
June each year, they sort the stocks according to size and BE/ME. They then form a $5 \times 5$
matrix of portfolios, where portfolio $ij$ belongs to the $i$th size quintile and the $j$th BE/ME
quintile. Tables 7.1–7.2 summarize some basic properties of these portfolios.

            Book value/Market value
            1      2      3      4      5
Size 1     5.0   10.7   11.4   13.2   16.0
     2     4.4    8.3   10.8   10.6   12.1
     3     4.8    8.4    8.7   10.4   12.2
     4     6.0    6.6    8.5    9.5   10.1
     5     5.0    6.6    6.6    6.4    8.4

Table 7.1: Mean excess returns (annualised %), US data 1957:1–2009:12. Size 1: smallest
20% of the stocks, Size 5: largest 20% of the stocks. B/M 1: the 20% of the stocks with
the smallest ratio of book to market value (growth stocks). B/M 5: the 20% of the stocks
with the highest ratio of book to market value (value stocks).

They run a traditional CAPM regression on each of the 25 portfolios (monthly data
1963–1991) and then study if the expected excess returns are related to the betas as they
should according to CAPM (recall that CAPM implies $\mathrm{E} R_{it}^e = \beta_i\lambda$ where $\lambda$ is the risk
premium (excess return) on the market portfolio).
However, it is found that there is almost no relation between $\mathrm{E} R_{it}^e$ and $\beta_i$ (there is
a cloud in the $\beta_i \times \mathrm{E} R_{it}^e$ space, see Cochrane (2001) 20.2, Figure 20.9). This is due
to the combination of two features of the data. First, within a BE/ME quintile, there is
            Book value/Market value
            1      2      3      4      5
Size 1     1.4    1.2    1.1    1.0    1.0
     2     1.4    1.2    1.1    1.0    1.2
     3     1.4    1.1    1.0    1.0    1.1
     4     1.3    1.1    1.0    1.0    1.1
     5     1.1    1.0    1.0    0.9    0.9

Table 7.2: Beta against the market portfolio, US data 1957:1–2009:12. Size 1: smallest
20% of the stocks, Size 5: largest 20% of the stocks. B/M 1: the 20% of the stocks with
the smallest ratio of book to market value (growth stocks). B/M 5: the 20% of the stocks
with the highest ratio of book to market value (value stocks).

Two histograms of monthly excess returns (%): small growth stocks (mean 0.42, std 8.43) and large value stocks (mean 0.70, std 4.96). Monthly data on two U.S. indices, 1957:1−2009:12. Sample size: 636.

Figure 7.3: Comparison of small growth stock and large value stocks

a positive relation (across size quantiles) between $\mathrm{E} R_{it}^e$ and $\beta_i$, as predicted by CAPM
(see Cochrane (2001) 20.2, Figure 20.10). Second, within a size quintile there is a negative
relation (across BE/ME quantiles) between $\mathrm{E} R_{it}^e$ and $\beta_i$, in stark contrast to CAPM (see
Cochrane (2001) 20.2, Figure 20.11).
Figure 7.1 shows some results for US industry portfolios and Figures 7.4–7.6 for US
size/book-to-market portfolios.

Scatter plot of mean excess return (%) against predicted mean excess return (CAPM, %). US data 1957:1−2009:12, 25 FF portfolios (B/M and size). p-value for test of model: 0.00.

Figure 7.4: CAPM, FF portfolios

7.1.6 Representative Results on Mutual Fund Performance

Mutual fund evaluations (estimated $\alpha_i$) typically find (i) on average neutral performance
(or less: trading costs and fees); (ii) large funds might be worse; (iii) perhaps better
performance on less liquid (less efficient?) markets; and (iv) there is very little persistence in
performance: $\alpha_i$ for one sample does not predict $\alpha_i$ for subsequent samples (except for
bad funds).

7.2 Several Factors

In multifactor models, (7.2) is still valid, provided we reinterpret $b_i$ and $R_{mt}^e$ as vectors,
so $b_i R_{mt}^e$ stands for $b_{io}R_{ot}^e + b_{ip}R_{pt}^e + \ldots$
\[
R_{it}^e = \alpha + b_{io}R_{ot}^e + b_{ip}R_{pt}^e + \ldots + \varepsilon_{it}. \tag{7.18}
\]

Mean excess return (%) against predicted mean excess return (CAPM, %). Lines connect portfolios of the same size, from 1 (small) to 5 (large).

Figure 7.5: CAPM, FF portfolios

In this case, (7.2) is a multiple regression, but the test (7.4) still has the same form (the
standard deviation of the intercept will be different, though).
Fama and French (1993) also try a multi-factor model. They find that a three-factor
model fits the 25 stock portfolios fairly well (two more factors are needed to also fit the
seven bond portfolios that they use). The three factors are: the market return, the return
on a portfolio of small stocks minus the return on a portfolio of big stocks (SMB), and
the return on a portfolio with high BE/ME minus the return on a portfolio with low BE/ME
(HML). This three-factor model is rejected at traditional significance levels, but it can
still capture a fair amount of the variation of expected returns (see Cochrane (2001) 20.2,
Figures 20.12–13).
Chen, Roll, and Ross (1986) use a number of macro variables as factors—along with
traditional market indices. They find that industrial production and inflation surprises are
priced factors, while the market index might not be.
Figure 7.7 shows some results for the Fama-French model on US industry portfolios
and Figures 7.8–7.10 on the 25 Fama-French portfolios.

Mean excess return (%) against predicted mean excess return (CAPM, %). Lines connect portfolios with the same B/M, from 1 (low) to 5 (high).

Figure 7.6: CAPM, FF portfolios

Scatter plot of mean excess return against predicted mean excess return, US industry portfolios, 1970:1−2009:12.

             alpha   pval   StdErr
all           NaN    0.00    NaN
A (NoDur)     2.32   0.10    8.71
B (Durbl)    −5.17   0.01   12.03
C (Manuf)    −0.47   0.63    6.12
D (Enrgy)     3.38   0.15   14.26
E (HiTec)     2.02   0.22   10.22
F (Telcm)     1.00   0.59   11.16
G (Shops)     0.47   0.77    9.82
H (Hlth )     4.55   0.01   11.01
I (Utils)    −0.16   0.93   10.58
J (Other)    −2.84   0.01    6.19

Fama−French model
Factors: US market, SMB (size), and HML (book−to−market)
α and StdErr are in annualized %

Figure 7.7: Fama-French regressions on US industry indices

Scatter plot of mean excess return (%) against predicted mean excess return (FF, %). US data 1957:1−2009:12, 25 FF portfolios (B/M and size). p-value for test of model: 0.00.

Figure 7.8: FF, FF portfolios

7.3 Fama-MacBeth

Reference: Cochrane (2001) 12.3; Campbell, Lo, and MacKinlay (1997) 5.8; Fama and
MacBeth (1973)
The Fama and MacBeth (1973) approach is a bit different from the regression
approaches discussed so far. The method has three steps, described below.

• First, estimate the betas $\beta_i$ ($i = 1, \ldots, n$) from (7.2) (this is a time-series
regression). This is often done on the whole sample, assuming the betas are constant.
Sometimes, the betas are estimated separately for different sub samples (so we
could let $\hat{\beta}_i$ carry a time subscript in the equations below).

• Second, run a cross sectional regression for every $t$. That is, for period $t$, estimate
$\lambda_t$ from the cross section (across the assets $i = 1, \ldots, n$) regression
\[
R_{it}^e = \lambda_t'\hat{\beta}_i + \varepsilon_{it}, \tag{7.19}
\]

Mean excess return (%) against predicted mean excess return (FF, %). Lines connect portfolios of the same size, from 1 (small) to 5 (large).

Figure 7.9: FF, FF portfolios

where $\hat{\beta}_i$ are the regressors. (Note the difference to the traditional cross-sectional
approach discussed in (7.9), where the second stage regression regressed $\mathrm{E} R_{it}^e$ on
$\hat{\beta}_i$, while the Fama-MacBeth approach runs one regression for every time period.)

• Third, estimate the time averages
\begin{align}
\hat{\varepsilon}_i &= \frac{1}{T}\sum_{t=1}^{T}\hat{\varepsilon}_{it} \text{ for } i = 1, \ldots, n \text{ (for every asset)}, \tag{7.20}\\
\hat{\lambda} &= \frac{1}{T}\sum_{t=1}^{T}\hat{\lambda}_t. \tag{7.21}
\end{align}

The second step, using $\hat{\beta}_i$ as regressors, creates an errors-in-variables problem since
$\hat{\beta}_i$ are estimated, that is, measured with an error. The effect of this is typically to bias the
estimator of $\lambda_t$ towards zero (and any intercept, or mean of the residual, is biased upward).
One way to minimize this problem, used by Fama and MacBeth (1973), is to let the assets
be portfolios of assets, for which we can expect some of the individual noise in the first-step
regressions to average out, and thereby make the measurement error in $\hat{\beta}_i$ smaller.
If CAPM is true, then the return of an asset is a linear function of the market return and an
error which should be uncorrelated with the errors of other assets (otherwise some factor
is missing). If the portfolio consists of 20 assets with equal error variance in a CAPM
regression, then we should expect the portfolio to have an error variance which is 1/20th
as large.

Figure 7.10: FF, FF portfolios. Mean excess return (%) against predicted mean excess return (FF, %). Lines connect portfolios with the same B/M, from 1 (low) to 5 (high).
We clearly want portfolios which have different betas, or else the second step regres-
sion (7.19) does not work. Fama and MacBeth (1973) choose to construct portfolios
according to some initial estimate of asset specific betas. Another way to deal with the
errors-in-variables problem is to adjust the tests.
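We can test the model by studying if $\varepsilon_i = 0$ (recall from (7.20) that $\varepsilon_i$ is the time
average of the residual for asset $i$, $\varepsilon_{it}$), by forming a t-test $\hat{\varepsilon}_i/\mathrm{Std}(\hat{\varepsilon}_i)$. Fama and MacBeth
(1973) suggest that the standard deviation should be found by studying the time-variation
in $\hat{\varepsilon}_{it}$. In particular, they suggest that the variance of $\hat{\varepsilon}_{it}$ (not $\hat{\varepsilon}_i$) can be estimated by the
(average) squared variation around its mean
\[
\mathrm{Var}(\hat{\varepsilon}_{it}) = \frac{1}{T}\sum_{t=1}^{T}(\hat{\varepsilon}_{it} - \hat{\varepsilon}_i)^2. \tag{7.22}
\]
Since $\hat{\varepsilon}_i$ is the sample average of $\hat{\varepsilon}_{it}$, the variance of the former is the variance of the latter
divided by $T$ (the sample size), provided $\hat{\varepsilon}_{it}$ is iid. That is,
\[
\mathrm{Var}(\hat{\varepsilon}_i) = \frac{1}{T}\mathrm{Var}(\hat{\varepsilon}_{it}) = \frac{1}{T^2}\sum_{t=1}^{T}(\hat{\varepsilon}_{it} - \hat{\varepsilon}_i)^2. \tag{7.23}
\]
A similar argument leads to the variance of $\hat{\lambda}$
\[
\mathrm{Var}(\hat{\lambda}) = \frac{1}{T^2}\sum_{t=1}^{T}(\hat{\lambda}_t - \hat{\lambda})^2. \tag{7.24}
\]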

Fama and MacBeth (1973) found, among other things, that the squared beta is not
significant in the second step regression, nor is a measure of non-systematic risk.
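The three Fama-MacBeth steps can be sketched on simulated data as follows. This is only an illustration, not the original authors' code: the data generating process (ten assets, a single factor, zero alphas), the parameter values and the seed are all hypothetical, and the cross-sectional regression is run without an intercept, as in (7.19).

```python
import math
import random
import statistics as st

random.seed(0)
T, n = 600, 10
true_betas = [0.5 + 0.1 * i for i in range(n)]      # hypothetical betas 0.5, ..., 1.4
rm = [random.gauss(0.5, 4.0) for _ in range(T)]      # market excess return, % per month
# R[i][t]: excess return of asset i in period t (CAPM holds, zero alphas)
R = [[b * m + random.gauss(0.0, 2.0) for m in rm] for b in true_betas]

def ts_beta(y, x):
    """Step 1: slope from a time-series LS regression of y on a constant and x."""
    mx, my = st.fmean(x), st.fmean(y)
    return (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
            / sum((xi - mx) ** 2 for xi in x))

beta_hat = [ts_beta(R[i], rm) for i in range(n)]

# Step 2: for each t, cross-sectional LS of R_it on beta_hat_i (no intercept)
sum_b2 = sum(b * b for b in beta_hat)
lam_t = [sum(beta_hat[i] * R[i][t] for i in range(n)) / sum_b2 for t in range(T)]

# Step 3: time average (7.21) and its variance (7.24)
lam_hat = st.fmean(lam_t)
var_lam = sum((l - lam_hat) ** 2 for l in lam_t) / T ** 2
t_stat = lam_hat / math.sqrt(var_lam)

print(f"lambda_hat = {lam_hat:.3f}, market mean = {st.fmean(rm):.3f}, t-stat = {t_stat:.2f}")
```

In this setting the estimated risk premium should be close to the sample average of the market excess return, which is what CAPM implies for the cross-sectional slope.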

A Statistical Tables

n        Critical values
         10%     5%      1%
10       1.81    2.23    3.17
20       1.72    2.09    2.85
30       1.70    2.04    2.75
40       1.68    2.02    2.70
50       1.68    2.01    2.68
60       1.67    2.00    2.66
70       1.67    1.99    2.65
80       1.66    1.99    2.64
90       1.66    1.99    2.63
100      1.66    1.98    2.63
Normal   1.64    1.96    2.58

Table A.1: Critical values (two-sided test) of t distribution (different degrees of freedom)
and normal distribution.

n     Critical values
      10%      5%       1%
1      2.71     3.84     6.63
2      4.61     5.99     9.21
3      6.25     7.81    11.34
4      7.78     9.49    13.28
5      9.24    11.07    15.09
6     10.64    12.59    16.81
7     12.02    14.07    18.48
8     13.36    15.51    20.09
9     14.68    16.92    21.67
10    15.99    18.31    23.21

Table A.2: Critical values of chisquare distribution (different degrees of freedom, n).

Bibliography
Campbell, J. Y., A. W. Lo, and A. C. MacKinlay, 1997, The econometrics of financial
markets, Princeton University Press, Princeton, New Jersey.

Chen, N.-F., R. Roll, and S. A. Ross, 1986, “Economic forces and the stock market,”
Journal of Business, 59, 383–403.

Cochrane, J. H., 2001, Asset pricing, Princeton University Press, Princeton, New Jersey.

Elton, E. J., M. J. Gruber, S. J. Brown, and W. N. Goetzmann, 2010, Modern portfolio
theory and investment analysis, John Wiley and Sons, 8th edn.

Fama, E., and J. MacBeth, 1973, “Risk, return, and equilibrium: empirical tests,” Journal
of Political Economy, 81, 607–636.

Fama, E. F., and K. R. French, 1993, “Common risk factors in the returns on stocks and
bonds,” Journal of Financial Economics, 33, 3–56.

Fama, E. F., and K. R. French, 1996, “Multifactor explanations of asset pricing anoma-
lies,” Journal of Finance, 51, 55–84.

Gibbons, M., S. Ross, and J. Shanken, 1989, “A test of the efficiency of a given portfolio,”
Econometrica, 57, 1121–1152.

MacKinlay, C., 1995, “Multifactor models do not explain deviations from the CAPM,”
Journal of Financial Economics, 38, 3–28.

Wooldridge, J. M., 2002, Econometric analysis of cross section and panel data, MIT
Press.

8 Investment for the Long Run
Reference: Campbell and Viceira (2002), Elton, Gruber, Brown, and Goetzmann (2010)
12

8.1 Time Diversification: Approximate Case

This section discusses the notion of “time diversification,” which essentially amounts to
claiming that equity is safer for long run investors than for short run investors. The argu-
ment comes in two flavours: that Sharpe ratios are increasing with the investment horizon,
and that the probability that equity returns will outperform bond returns increases with the
horizon. This is illustrated in Figure 8.2. The results presented in this section are approx-
imate, since we work with simple returns (and disregard compounding). This has clear
disadvantages, but also the advantage of delivering simple results.

8.1.1 Increasing Sharpe Ratios

With iid returns, the expected return and variance both grow linearly with the horizon,
so Sharpe ratios (expected excess return divided by the standard deviation) increase with

Two panels: Sharpe ratio and Prob(excess return>0), both against the investment horizon (1m, 1y, 3y, 6y, 9y). US stock returns 1927:7−2010:4.

Figure 8.1: SR and probability of excess return>0

Two panels: Sharpe ratio and probability of excess return>0, both against the investment horizon (0 to 20 years). Assumes annual excess return has mean 0.08 and std 0.16, and is iid N.

Figure 8.2: SR and probability of excess return>0, iid returns

the square root of the horizon. However, this does not mean that risky assets are better for
long horizons, at least not if we believe in mean variance preferences and unpredictable
returns. Something other than iid returns is needed for that.
Let $Z_q$ be the net return on a $q$-period investment. If returns are iid, the Sharpe ratio
of $Z_q$ is approximately
\[
SR(Z_q) \approx \sqrt{q}\,\frac{\mathrm{E} R^e}{\mathrm{Std}(R)}, \tag{8.1}
\]
where $\mathrm{E} R^e$ is the mean one-period excess return and $\mathrm{Std}(R)$ is the standard deviation of
the one-period return. (Time subscripts are suppressed to keep the notation simple.) This
Sharpe ratio is clearly increasing with the horizon, $q$.
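Proof. (of (8.1)) The $q$-period net return is
\begin{align*}
Z_q &= (R_1 + 1)(R_2 + 1)\ldots(R_q + 1) - 1\\
&\approx R_1 + R_2 + \ldots + R_q.
\end{align*}
If returns are iid, then the mean and variance of the $q$-period return are approximately
\begin{align*}
\mathrm{E}(Z_q) &\approx q\,\mathrm{E}(R),\\
\mathrm{Var}(Z_q) &\approx q\,\mathrm{Var}(R).
\end{align*}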

Example 8.1 (The quality of the approximation of the q-period return) If $R_1 = 0.9$ and
$R_2 = -0.9$, then the two-period net return is
\[
Z_2 = (1 + 0.9)(1 - 0.9) - 1 = -0.81.
\]
With the approximation we instead have
\[
Z_2 \approx R_1 + R_2 = 0.
\]
The difference in net returns is dramatic. If the two net returns instead are $R_1 = 0.09$
and $R_2 = -0.09$, then
\[
Z_2 = (1 + 0.09)(1 - 0.09) - 1 = -0.0081 \approx -0.01
\]
and the approximation is still zero: the difference is much smaller.

Example 8.2 (The danger of arithmetic mean return). Consider two portfolios with the
following returns

          Portfolio A   Portfolio B
Year 1         5%           20%
Year 2        −5%          −35%
Year 3         5%           25%

Just adding these returns gives 5% and 10% respectively, but the total returns over the
three periods are actually 4.7% and −2.5% respectively.
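The numbers in Examples 8.1 and 8.2 are easy to check by comparing the exact compounded return with the sum of the one-period returns. The sketch below assumes the signs implied by the stated totals (a second-period return of −0.9 and −0.09 in Example 8.1, and year-2 returns of −5% and −35% in Example 8.2); the function names are illustrative.

```python
from math import prod

def compounded(returns):
    """Exact multi-period net return: (1+R1)(1+R2)...(1+Rq) - 1."""
    return prod(1.0 + r for r in returns) - 1.0

def summed(returns):
    """Approximation used in this section: R1 + R2 + ... + Rq."""
    return sum(returns)

# Example 8.1: large versus small one-period returns
for rets in ([0.9, -0.9], [0.09, -0.09]):
    print(rets, "exact:", round(compounded(rets), 4), "approx:", round(summed(rets), 4))

# Example 8.2: sum of returns versus total return, in %
portfolio_a = [0.05, -0.05, 0.05]
portfolio_b = [0.20, -0.35, 0.25]
for name, rets in (("A", portfolio_a), ("B", portfolio_b)):
    print(name, "sum:", round(100 * summed(rets), 1), "total:", round(100 * compounded(rets), 1))
```

The approximation error grows with the size of the one-period returns, which is why it is harmless for small returns but misleading for volatile portfolios such as B.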

8.1.2 Probability of Outperforming a Riskfree Asset

Since the Sharpe ratio is increasing with the investment horizon, the probability of beating
a riskfree asset is (typically) also increasing. To simplify, assume that the returns are
normally distributed. Then, we have
\[
\Pr(Z_q^e > 0) = \Phi[SR(Z_q)], \tag{8.2}
\]
where $Z_q^e$ is the excess return on a $q$-period investment and $\Phi(\cdot)$ is the cumulative
distribution function of a standard normal variable, $N(0, 1)$. The argument of an increasing
probability of a positive excess return is therefore the same argument as the increasing
Sharpe ratio. See Figure 8.2 for an illustration.

Four panels: pdf of the net return for 1-year and 10-year horizons; the same pdf conditional on a negative return; probability of a negative return against the investment horizon (years); expected return, conditional on a negative return, against the investment horizon (years). Excess returns are iid N(0.08, 0.16²).

Figure 8.3: Time diversification, normally distributed returns

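Proof. (of (8.2)) By standard manipulations we have
\begin{align*}
\Pr(Z_q^e > 0) &= 1 - \Pr(Z_q^e \leq 0)\\
&= 1 - \Pr\left(\frac{Z_q^e - \mathrm{E} Z_q^e}{\mathrm{Std}(Z_q^e)} \leq \frac{-\mathrm{E} Z_q^e}{\mathrm{Std}(Z_q^e)}\right)\\
&= 1 - \Phi\left(\frac{-\mathrm{E} Z_q^e}{\mathrm{Std}(Z_q^e)}\right)\\
&= \Phi\left(\frac{\mathrm{E} Z_q^e}{\mathrm{Std}(Z_q^e)}\right),
\end{align*}
where the last line follows from $\Phi(x) + \Phi(-x) = 1$ since the standard normal distribution
is symmetric around zero.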
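To get a feel for (8.1) and (8.2), the sketch below evaluates the approximate Sharpe ratio and the implied probability of a positive excess return at a few horizons. It assumes the setting of Figure 8.2 (annual excess returns that are iid normal with mean 0.08 and standard deviation 0.16); the helper names are illustrative.

```python
import math

def norm_cdf(x):
    """Standard normal cdf, written with the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def sharpe_ratio(q, mean_excess=0.08, std=0.16):
    """Approximate Sharpe ratio of a q-period investment, as in (8.1)."""
    return math.sqrt(q) * mean_excess / std

for q in (1, 5, 10, 20):
    sr = sharpe_ratio(q)
    print(f"horizon {q:2d} years: SR = {sr:.2f}, Pr(excess return > 0) = {norm_cdf(sr):.3f}")
```

Both columns increase with the horizon, which is the whole content of the "time diversification" argument in this approximate setting.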
Although the increasing Sharpe ratios mean that the probability of beating a riskfree

asset is increasing with the investment horizon, that does not mean that the risky asset
is safer for a long-run investor. The reason is, of course, that we also have to take into
account the size of the loss—in case the portfolio underperforms. With a longer horizon
(and therefore higher dispersion), really bad outcomes are more likely. See Figure 8.3 for
an illustration.
To say more about how the investment horizon affects the portfolio weights, we need
to be more precise about the preferences. As a benchmark, consider a mean-variance
investor who will choose a portfolio for $q$ periods. With one risky asset (the tangency
portfolio) and a riskfree asset, the optimization problem is
\[
\max_v\; v\,\mathrm{E} Z_q^e + qR_f - \frac{k}{2}v^2\,\mathrm{Var}(Z_q), \tag{8.3}
\]
where $R_f$ is the per-period riskfree rate. With iid returns, both the mean and the variance
scale linearly with the investment horizon, so we can equally well write the optimization
problem as
\[
\max_v\; vq\,\mathrm{E} Z_1^e + qR_f - \frac{k}{2}v^2 q\,\mathrm{Var}(Z_1) \text{, if iid returns.} \tag{8.4}
\]
Clearly, scaling this objective function by $1/q$ will not change anything: the horizon is
irrelevant.
To be more precise, the solution of (8.3) is
\[
v = \frac{1}{k}\frac{\mathrm{E} Z_q^e}{\mathrm{Var}(Z_q)}. \tag{8.5}
\]
If returns are iid, we get the following portfolio weights for investment horizons of one
and two periods
\begin{align}
v(1) &= \frac{1}{k}\frac{\mathrm{E} R^e}{\mathrm{Var}(R)}, \tag{8.6}\\
v(2) &= \frac{1}{k}\frac{2\,\mathrm{E} R^e}{2\,\mathrm{Var}(R)}, \tag{8.7}
\end{align}
which are the same. With MV behaviour, non-iid returns are required to generate a
horizon effect on the portfolio choice. The key point is that the portfolio weight is not
determined by the Sharpe ratio, but the Sharpe ratio divided by the standard deviation.
Or to put it another way, comparing Sharpe ratios across investment horizons is not very
informative.

Proof. (of (8.5)) The first order condition of (8.3) is
\[
0 = \mathrm{E} Z_q^e - kv\,\mathrm{Var}(Z_q) \text{ or } v = \frac{1}{k}\frac{\mathrm{E} Z_q^e}{\mathrm{Var}(Z_q)}.
\]

Example 8.3 (US long-run stock market) For the period 1947–2001, the US stock market
had an average excess return of 8% (per year) and a standard deviation of 16%. From
(8.5), the weight on the risky asset is then $v = (0.08/0.16^2)/k = 3.125/k$.

With autocorrelated returns two things change: returns are predictable so the expected
return is time-varying, and the variance of the two-period return includes a covariance
term. The portfolio weights (chosen in period 0) are then
\begin{align}
v(1) &= \frac{1}{k}\frac{\mathrm{E}_0 R_1^e}{\mathrm{Var}_0(R_1)}, \tag{8.8}\\
v(2) &= \frac{1}{k}\frac{\mathrm{E}_0(R_1^e + R_2^e)}{\mathrm{Var}_0(R_1) + \mathrm{Var}_0(R_2) + 2\,\mathrm{Cov}_0(R_1, R_2)}, \tag{8.9}
\end{align}
where all moments carry a time subscript to indicate that they are conditional moments.
A key aspect of these formulas is that mean reversion in prices makes the covariance (of
returns) negative. This will tend to make the weight for the two-period horizon larger.
The intuition is simple: with mean reversion in prices, long-run investments are less risky
than short-run investments since extreme movements will be partially “averaged out” over
time. Empirically, there is some evidence of mean-reversion on the business cycle fre-
quencies (a couple of years). The effect is not strong, however, so mean reversion is
probably a poor argument for horizon effects.

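Example 8.4 (AR(1) process for returns) Suppose the excess returns follow an AR(1)
process
\[
R_{t+1}^e = \mu(1 - \rho) + \rho R_t^e + \varepsilon_{t+1} \text{ with } \sigma^2 = \mathrm{Var}(\varepsilon_{t+1}).
\]
The conditional moments are then
\begin{align*}
\mathrm{E}_0 R_1^e &= \mu(1 - \rho) + \rho R_0^e,\\
\mathrm{E}_0 R_2^e &= \mu(1 - \rho^2) + \rho^2 R_0^e,\\
\mathrm{Var}_0(R_1) &= \sigma^2,\\
\mathrm{Var}_0(R_2) &= (1 + \rho^2)\sigma^2,\\
\mathrm{Cov}_0(R_1, R_2) &= \rho\sigma^2.
\end{align*}
If the initial return is at the mean, $R_0^e = \mu$, then the forecasted return is $\mu$ across all
horizons, which gives the portfolio weights
\begin{align*}
v(1) &= \frac{1}{k}\frac{\mu}{\sigma^2},\\
v(2) &= \frac{1}{k}\frac{\mu}{\sigma^2}\,\frac{2}{2 + \rho^2 + 2\rho}.
\end{align*}
With $\rho = (-0.5, 0, 0.5)$ the last term is around $(1.6, 1, 0.6)$. With $\rho = (-0.1, 0, 0.1)$, the
last term is around $(1.1, 1, 0.9)$.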
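The horizon effect in the AR(1) example can be tabulated directly from the weights above. The sketch below combines the one-period mean-variance weight (8.5), using the numbers from Example 8.3, with the factor $2/(2 + \rho^2 + 2\rho)$ that turns $v(1)$ into $v(2)$; the risk aversion `k = 3.0` is a hypothetical value chosen only for illustration.

```python
def mv_weight(mean_excess, variance, k):
    """One-period mean-variance weight on the risky asset, as in (8.5)."""
    return mean_excess / (k * variance)

def mv_weight_ratio(rho):
    """v(2)/v(1) when excess returns follow an AR(1) and the initial return is at the mean."""
    return 2.0 / (2.0 + rho ** 2 + 2.0 * rho)

k = 3.0  # hypothetical risk aversion
v1 = mv_weight(0.08, 0.16 ** 2, k)
for rho in (-0.5, -0.1, 0.0, 0.1, 0.5):
    ratio = mv_weight_ratio(rho)
    print(f"rho = {rho:+.1f}: v(2)/v(1) = {ratio:.2f}, v(1) = {v1:.3f}, v(2) = {v1 * ratio:.3f}")
```

With `rho = 0` the two weights coincide (the iid case), while negative autocorrelation (mean reversion in prices) raises the two-period weight.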

8.2 Time Diversification and the Growth-Optimal Portfolio: Lognormal Returns

This section revisits the issue of time diversification—this time in a setting where log
portfolio returns are normally distributed. This allows us to get more precise results,
since we can avoid approximating the cumulative returns.

8.2.1 Time Diversification with Lognormal Returns

The gross return on a $q$-period investment can be written
\[
1 + Z_q = (1 + R_1)(1 + R_2)\ldots(1 + R_q), \tag{8.10}
\]
where $R_t$ is the net portfolio return in period $t$. Taking logs (and using lower case letters
to denote them), we have the log $q$-period return
\[
z_q = r_1 + r_2 + \ldots + r_q, \tag{8.11}
\]
where $z_q = \ln(1 + Z_q)$ and $r_t = \ln(1 + R_t)$.

Remark 8.5 ($\ln(1 + x) \approx x$...) If $x$ is small, $\ln(1 + x) \approx x$, so assuming that $x$ is
normally distributed is fairly similar to assuming that $\ln(1 + x)$ is normally distributed.

Remark 8.6 (Lognormal distribution) If $x \sim N(\mu, \sigma^2)$ and $y = \exp(x)$, then the
probability density function of $y$ is
\[
\mathrm{pdf}(y) = \frac{1}{y\sqrt{2\pi\sigma^2}}\exp\left[-\frac{1}{2}\left(\frac{\ln y - \mu}{\sigma}\right)^2\right] \text{, } y > 0.
\]
The $r$th moment of $y$ is $\mathrm{E}\,y^r = \exp(r\mu + r^2\sigma^2/2)$.

To simplify the analysis, assume that the log returns of portfolio $y$, $r_{yt}$, are iid $N(\mu_y, \sigma_y^2)$.
(This is a convenient assumption since it carries over to multi-period returns.) The “Sharpe
ratio” of the log $q$-period return, $z_{qy}$, is
\[
SR(z_{qy}) = \sqrt{q}\,\frac{\mu_y - r_f}{\sigma_y}, \tag{8.12}
\]
where $r_f$ is the continuously compounded interest rate.


If log returns are normally distributed, the probability of the $q$-period return of
portfolio $y$ (denoted $Z_{qy}$) being higher than the $q$-period return of portfolio $x$ ($Z_{qx}$) is
\[
\Pr(Z_{qy} > Z_{qx}) = \Phi\left(\sqrt{q}\,\frac{\mu_y - \mu_x}{\sigma(r_{yt} - r_{xt})}\right), \tag{8.13}
\]
where $\Phi$ is the cumulative distribution function of a standard normal variable, $N(0, 1)$,
$\mu_y$ the expected log return on portfolio $y$, and $\sigma(r_{yt} - r_{xt})$ is the standard deviation of
the difference in log returns. (The portfolios are constant over time, since the returns
are iid.) In particular, if the $x$ portfolio is a riskfree asset with log return $r_f$, then the
probability is
\[
\Pr(Z_{qy}^e > 0) = \Phi[SR(z_{qy})], \tag{8.14}
\]
which is a function of the Sharpe ratio for the log returns. This probability is clearly
increasing with the investment horizon, $q$. On the other hand, with a longer horizon (and
therefore higher dispersion), really bad outcomes are more likely.
See Figure 8.4 for an illustration.

Four panels: pdf of the net return for 1-year and 10-year horizons; the same pdf conditional on a negative return; probability of a negative return against the investment horizon (years); expected return, conditional on a negative return, against the investment horizon (years). Log returns are iid N(0.04, 0.1²).

Figure 8.4: Time diversification, lognormally distributed returns

Proof. (of (8.12)) Consider (8.11). If log returns are iid with mean $\mu$ and variance $\sigma^2$,
then the mean and variance of the $q$-period return are
\begin{align*}
\mathrm{E}(z_q) &= q\mu,\\
\mathrm{Var}(z_q) &= q\sigma^2.
\end{align*}

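Proof. (of (8.13)) By standard manipulations we have
\begin{align*}
\Pr\left[\exp\left(\sum_{t=1}^{q} r_{ty}\right) > \exp\left(\sum_{t=1}^{q} r_{tx}\right)\right]
&= 1 - \Pr\left[\exp\left(\sum_{t=1}^{q} r_{ty}\right) \leq \exp\left(\sum_{t=1}^{q} r_{tx}\right)\right]\\
&= 1 - \Pr\left[\sum_{t=1}^{q} r_{ty} - \sum_{t=1}^{q} r_{tx} \leq 0\right]\\
&= 1 - \Pr\left[\frac{\sum_{t=1}^{q}(r_{ty} - r_{tx}) - q(\mu_y - \mu_x)}{\sqrt{q}\,\sigma(r_{yt} - r_{xt})} \leq \frac{-q(\mu_y - \mu_x)}{\sqrt{q}\,\sigma(r_{yt} - r_{xt})}\right]\\
&= 1 - \Phi\left[-\sqrt{q}\,\frac{\mu_y - \mu_x}{\sigma(r_{yt} - r_{xt})}\right]\\
&= \Phi\left[\sqrt{q}\,\frac{\mu_y - \mu_x}{\sigma(r_{yt} - r_{xt})}\right],
\end{align*}
where the last line follows from $\Phi(z) + \Phi(-z) = 1$ since the standard normal distribution
is symmetric around zero.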
To demonstrate that, with iid log returns, optimal portfolio weights are indeed unaf-
fected by the investment horizon, consider the simple case of a logarithmic utility func-
tion, where we find a portfolio that solves

\[
\max_v E \ln(1 + R_q) = \max_v E(r_1 + r_2 + \ldots + r_q), \tag{8.15}
\]

where $r_t$ is the log portfolio return in period $t$ (which clearly depends on the chosen portfolio weights $v$). We here assume that the portfolio weights are chosen at the beginning (time $t = 0$) of the investment period and then kept unchanged. With iid log returns, we can clearly write (8.15) as
\[
\max_v\, q E r_1, \tag{8.16}
\]
which demonstrates that the investment horizon does not matter for the optimal portfolio choice. It doesn't matter that the Sharpe ratio is increasing in the horizon.

Example 8.7 (Portfolio choice with logarithmic utility function) It is typically hard to find explicit expressions for what the portfolio weights should be with log utility, so one typically has to resort to numerical methods. This example shows a case where we can find an explicit solution, because of a very simple setting. Suppose there are two states (1 and 2) and that asset A has the gross return $R_A(1)$ in state 1 and $R_A(2)$ in state 2, and similarly for asset B. The portfolio return is $R_p = vR^e + R_B$, where $R^e = R_A - R_B$. If $\pi$ is the probability of state 1, then the expected log portfolio return is
\[
E \ln(R_p) = \pi \ln[vR^e(1) + R_B(1)] + (1 - \pi) \ln[vR^e(2) + R_B(2)].
\]
The first order condition for $v$ is
\[
0 = \frac{\pi R^e(1)}{vR^e(1) + R_B(1)} + \frac{(1 - \pi) R^e(2)}{vR^e(2) + R_B(2)}
\]
and the solution is
\[
v = -\frac{\pi R^e(1) R_B(2) + (1 - \pi) R^e(2) R_B(1)}{R^e(1) R^e(2)}.
\]

[Figure 8.5 plot: expected log portfolio gross return (log gross return × 100) against the weight on asset A. Two states with probabilities 1/3 and 2/3; gross return of asset A: 1.05 in state 1 and 1.1 in state 2; gross return of asset B: 1.083 in both states.]

Figure 8.5: Example of portfolio choice with log utility

See Figure 8.5 for an illustration.
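
The closed-form weight in Example 8.7 can be checked numerically. The sketch below is only an illustration: it uses the state probabilities and gross returns listed in Figure 8.5, while the function names and the crude grid search are assumptions made here, not part of the notes.

```python
import math

def expected_log_return(v, pi, RA, RB):
    """E ln(Rp) with Rp = v*(RA - RB) + RB in each of the two states."""
    return sum(
        prob * math.log(v * (ra - rb) + rb)
        for prob, ra, rb in zip((pi, 1 - pi), RA, RB)
    )

def optimal_weight(pi, RA, RB):
    """Solution of the first order condition in Example 8.7."""
    Re1, Re2 = RA[0] - RB[0], RA[1] - RB[1]
    return -(pi * Re1 * RB[1] + (1 - pi) * Re2 * RB[0]) / (Re1 * Re2)

# Inputs as listed in Figure 8.5
pi = 1 / 3
RA = (1.05, 1.10)      # gross return of asset A in states 1 and 2
RB = (1.083, 1.083)    # gross return of asset B in states 1 and 2

v_star = optimal_weight(pi, RA, RB)

# Independent check: grid search over v in [-0.5, 1.5] with step 0.001
grid = [-0.5 + 0.001 * k for k in range(2001)]
v_grid = max(grid, key=lambda v: expected_log_return(v, pi, RA, RB))

print(f"closed form v = {v_star:.4f}, grid search v = {v_grid:.3f}")
```

Both approaches should put the optimum somewhat above 0.6, in line with the peak of the curve in Figure 8.5.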

8.2.2 The Growth-Optimal Portfolio and Log Utility

The portfolio that comes out from maximizing the log return has some interesting properties. If portfolio $y$ has the highest expected log return, then (8.13) shows that the probability that it beats any other portfolio is increasing with the investment horizon, and goes to unity as the horizon goes to infinity. This portfolio is called the growth-optimal portfolio.

[Figure 8.6 plot: probability of $R_y > R_x$ against the investment horizon (years, 0 to 20), for µe/σ = 0.2 and µe/σ = 0.4.]

Figure 8.6: The probability of outperforming another portfolio

See Figure 8.6 for an illustration.
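
A small numerical sketch of (8.13) may help to see how fast the probability approaches one. The ratios 0.2 and 0.4 are taken from the legend of Figure 8.6; treating them as $(\mu_y - \mu_x)/\sigma(r_{yt} - r_{xt})$ per year is an assumption made here, and the function names are hypothetical.

```python
import math

def norm_cdf(z):
    """Standard normal cdf, written in terms of the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def prob_outperform(q, ratio):
    """Pr(Z_qy > Z_qx) from (8.13); ratio = (mu_y - mu_x)/sigma(r_yt - r_xt)."""
    return norm_cdf(math.sqrt(q) * ratio)

horizons = [1, 5, 10, 20]   # investment horizons in years
for ratio in (0.2, 0.4):
    probs = [prob_outperform(q, ratio) for q in horizons]
    print(f"ratio {ratio}: " + ", ".join(f"q={q}: {p:.3f}" for q, p in zip(horizons, probs)))
```

Since the argument of $\Phi$ grows with $\sqrt{q}$, quadrupling the horizon has the same effect as doubling the ratio.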
This portfolio is commonly advocated to be the best for any long-run investor. That
argument is clearly flawed. In particular, for an investor with a relative risk aversion
different from one, the growth-optimal portfolio is not optimal: a higher risk aversion
would give a more conservative portfolio. (It can be shown that the logarithmic utility
function is a CRRA utility function with a relative risk aversion of one.) The intuition is
that the occasional lower return of the growth-optimal portfolio is considered very risky,
so the investor prefers a less volatile portfolio.
Notice that, for a given $q < \infty$, the growth-optimal portfolio does not necessarily
maximize the probability of beating other portfolios. While the growth-optimal portfolio
has the highest expected log return so it maximizes the numerator in (8.13), it may well
have a very high volatility. It is only in the limit that the growth-optimal portfolio is a sure
winner.

8.2.3 Maximizing the Geometric Mean Return

The growth-optimal portfolio is often said to maximize the geometric mean return. That
is true, but may need a clarification.

Remark 8.8 (Geometric mean) Suppose the random variable $x$ can take the values $x(1), x(2), \ldots, x(S)$ with probabilities $\pi(1), \pi(2), \ldots, \pi(S)$, where $\sum_{j=1}^{S} \pi(j) = 1$. The arithmetic mean (expected value) is $\sum_{j=1}^{S} \pi(j) x(j)$ and the geometric mean is $\prod_{j=1}^{S} x(j)^{\pi(j)}$. Taking the log of the definition of a geometric mean gives
\[
\sum_{j=1}^{S} \pi(j) \ln x(j) = E \ln x,
\]
which is the expected value of the log of $x$.

Remark 8.9 (Sample geometric mean) With the sample $z_1, z_2, \ldots, z_T$, the sample arithmetic mean is $\sum_{t=1}^{T} z_t / T$ and the sample geometric mean is $\prod_{t=1}^{T} z_t^{1/T}$.

It follows directly from these remarks that a portfolio that maximizes the geometric mean of the portfolio gross return $1 + R_p$ also maximizes the expected log return of it, $E \ln(1 + R_p)$.
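
The link between the geometric mean and the expected log in Remark 8.8 is easy to verify numerically. The gross returns and probabilities below are hypothetical values chosen only for illustration.

```python
import math

def arithmetic_mean(values, probs):
    """Expected value: sum of pi(j) * x(j)."""
    return sum(p * x for x, p in zip(values, probs))

def geometric_mean(values, probs):
    """Product of x(j) ** pi(j)."""
    return math.prod(x ** p for x, p in zip(values, probs))

def expected_log(values, probs):
    """E ln x: sum of pi(j) * ln x(j)."""
    return sum(p * math.log(x) for x, p in zip(values, probs))

# Hypothetical gross returns in three states and their probabilities
values = [0.90, 1.05, 1.20]
probs = [0.25, 0.50, 0.25]

gm = geometric_mean(values, probs)
print("arithmetic mean:", arithmetic_mean(values, probs))
print("geometric mean: ", gm)
print("ln(geometric mean) - E ln x:", math.log(gm) - expected_log(values, probs))
```

The last line should print a number that is zero up to rounding error, which is why maximizing the geometric mean and maximizing the expected log return pick the same portfolio.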


An intuitive way of motivating this portfolio is as follows. The gross return on the
q-period investment in (8.10) is, of course, random, but in a very large sample (long
investment horizon), the histogram of the returns should start to converge to the true
distribution. With iid returns, this is the same distribution that defined the geometric mean
(which we have maximized). Hence, with a very long investment period, the portfolio
(that maximizes the geometric mean) should give the highest return over the investment
period. Of course, this is virtually the same argument as in (8.13), which showed that the
growth-optimal portfolio will outperform all other portfolios with probability one as the
investment horizon goes to infinity. (The only difference is that the current argument does
not rely on the normal distribution of the log returns.)

8.3 More General Utility Functions and Rebalancing

We will now take a look at more general optimization problems. Assume that the objective is to maximize
\[
E_0 u(W_q), \tag{8.17}
\]
where $W_q$ is the wealth (in real terms) at time $q$ (the investment horizon) and $E_0$ denotes the expectations formed in period 0 (the initial period). What can be said about how the investment horizon affects the portfolio weights?
If the investor is not allowed (or it is too costly) to rebalance the portfolio—and the
utility function/distribution of returns are such that the investor picks a mean-variance
portfolio (quadratic utility function or normally distributed returns), then the results in
Section 8.1.1 go through: non-iid returns are required to generate a horizon effect on the
portfolio choice.
If, more realistically, the investor is allowed to rebalance the portfolio, then the anal-
ysis is more difficult. We summarize some known results below.

8.3.1 CRRA Utility Function and iid Returns

Suppose the utility function has constant relative risk aversion, so the objective in period 0 is
\[
\max E_0 W_q^{1-\gamma}/(1-\gamma). \tag{8.18}
\]
In period one, the objective is $\max E_1 W_q^{1-\gamma}/(1-\gamma)$, which may differ in terms of what we know about the distribution of future returns (incorporated into the expectations operator) and also in terms of the current wealth level (due to the return in period 1).
With CRRA utility, relative portfolio weights are independent of the wealth of the investor (fairly straightforward to show). If we combine this with iid returns, then the only difference between an investor in $t$ and the same investor in $t+1$ is that he may be poorer or wealthier. This investor will therefore choose the same portfolio weights in every period. Analogously, a short run investor and a long run investor choose the same portfolio weights (you can think of the investor in $t+1$ as a short run investor). Therefore, with a CRRA utility function and iid returns there are no horizon effects on the portfolio choice. In addition, the portfolio weights will stay constant over time. The intuition is that all periods look the same.
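
The claim that CRRA weights do not depend on wealth can be sketched for a single period. This is only an outline under assumptions introduced here: $R_p(v)$ denotes the portfolio return for weights $v$, end-of-period wealth is $W_1 = W_0[1 + R_p(v)]$, and $\gamma \neq 1$.

\[
\max_v E_0 \frac{\{W_0[1 + R_p(v)]\}^{1-\gamma}}{1-\gamma}
= W_0^{1-\gamma} \max_v E_0 \frac{[1 + R_p(v)]^{1-\gamma}}{1-\gamma}.
\]

Since $W_0^{1-\gamma} > 0$ is known in period 0, it only scales the objective, so the maximizing $v$ is the same for every level of initial wealth.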
However, with non-iid returns (predictability or variations in volatility) there will be
horizon effects (and changes in weights over time). This would give rise to intertemporal
hedging, where the choice of today’s portfolio is affected by the likely changes of the
investment opportunities tomorrow.
The same result holds if the objective function instead is to maximize the utility from a stream of consumption, provided the utility function is CRRA and time separable. In this case, the objective is
\[
\max C_0^{1-\gamma}/(1-\gamma) + \delta E_0 C_1^{1-\gamma}/(1-\gamma) + \ldots + \delta^q E_0 C_q^{1-\gamma}/(1-\gamma). \tag{8.19}
\]
The basic mechanism is that the optimal consumption/wealth ratio turns out to be constant.

8.3.2 Logarithmic Utility Function

In the special case where the relative risk aversion (in a CRRA utility function) is one, the utility function becomes logarithmic.
The objective in period 0 is then
\[
\max E_0 \ln W_q = \max(\ln W_0 + E_0 r_1 + E_0 r_2 + \ldots + E_0 r_q), \tag{8.20}
\]
where $r_t$ is the log return, $r_t = \ln(1 + R_t)$ where $R_t$ is a net return.

Since the returns in the different periods enter separably, the best an investor can do in period 0 is to choose a portfolio that maximizes $E_0 r_1$, that is, to choose the one-period growth-optimal portfolio. But a short run investor who solves $\max E_0 \ln[W_0(1 + R_1)] = \max(\ln W_0 + E_0 r_1)$ will choose the same portfolio. There is then no horizon effect. However, the portfolio choice may change over time, if the distribution of the returns does.
The same result holds if the objective function instead is to maximize the utility from a stream of consumption as in (8.19), but with a logarithmic utility function.

Bibliography
Campbell, J. Y., and L. M. Viceira, 2002, Strategic asset allocation: portfolio choice of long-term investors, Oxford University Press.

Elton, E. J., M. J. Gruber, S. J. Brown, and W. N. Goetzmann, 2010, Modern portfolio theory and investment analysis, John Wiley and Sons, 8th edn.

9 Performance Analysis
Reference: Elton, Gruber, Brown, and Goetzmann (2010) 25
More advanced material is denoted by a star (*). It is not required reading.

9.1 Performance Evaluation

Reference: Elton, Gruber, Brown, and Goetzmann (2010) 25

9.1.1 The Idea behind Performance Evaluation

Traditional performance analysis tries to answer the following question: “should we include an asset in our portfolio, assuming that future returns will have the same distribution as in a historical sample?” Since returns are random variables (although with different
means, variances, etc) and investors are risk averse, this means that performance analy-
sis will typically not rank the fund with the highest return (in a historical sample) first.
Although that high return certainly was good for the old investors, it is more interesting
to understand what kind of distribution of future returns this investment strategy might
entail. In short, the high return will be compared with the risk of the strategy.
Most performance measures are based on mean-variance analysis, but the full MV
portfolio choice problem is not solved. Instead, the performance measures can be seen
as different approximations of the MV problem, where the issue is whether we should
invest in fund p or in fund q. (We don’t allow a mix of them.) Although the analysis
is based on the MV model, it is not assumed that all assets (portfolios) obey CAPM’s
beta representation—or that the market portfolio must be the optimal portfolio for every
investor. One motivation of this approach could be that the investor (who is doing the
performance evaluation) is a MV investor, but that the market is influenced by non-MV
investors.
Of course, the analysis is also based on the assumption that historical data are good
forecasters of the future.

There are several popular performance measures, corresponding to different situa-
tions: is this an investment of your entire wealth, or just a small increment? However, all
these measures are (increasing) functions of Jensen’s alpha, the intercept in the CAPM
regression

\[
R^e_{it} = \alpha_i + b_i R^e_{mt} + \varepsilon_{it}, \text{ where } E \varepsilon_{it} = 0 \text{ and } Cov(R^e_{mt}, \varepsilon_{it}) = 0. \tag{9.1}
\]

Example 9.1 (Statistics for example of performance evaluations) We have the following information about portfolios m (the market), p, and q

      $\alpha$   $\beta$   Std($\varepsilon$)   $\mu^e$   $\sigma$
m     0.000      1.000     0.000                0.100     0.180
p     0.010      0.900     0.140                0.100     0.214
q     0.050      1.300     0.030                0.180     0.236

Table 9.1: Basic facts about the market and two other portfolios. $\alpha$, $\beta$, and Std($\varepsilon$) are from the CAPM regression $R^e_{it} = \alpha + \beta R^e_{mt} + \varepsilon_{it}$

9.1.2 Sharpe Ratio and $M^2$: Evaluating the Overall Portfolio

Suppose we want to know if fund p is better than fund q to place all our savings in. (We don’t allow a mix of them.) The answer is that p is better if it has a higher Sharpe ratio, defined as
\[
SR_p = \mu^e_p / \sigma_p. \tag{9.2}
\]
The reason is that MV behaviour (MV preferences or normally distributed returns) implies that we should maximize the Sharpe ratio (selecting the tangency portfolio). Intuitively, for a given volatility, we then get the highest expected return.

Example 9.2 (Performance measure) From Example 9.1 we get the following performance measures

      SR      $M^2$    AR      Treynor   $T^2$
m     0.556   0.000            0.100     0.000
p     0.467   -0.016   0.071   0.111     0.011
q     0.763   0.037    1.667   0.138     0.038

Table 9.2: Performance Measures

A version of the Sharpe ratio, called $M^2$ (after some of the early proponents of the measure: Modigliani and Modigliani) is
\[
M^2_p = \mu^e_{p^*} - \mu^e_m \quad (\text{or } \mu_{p^*} - \mu_m), \tag{9.3}
\]
where $\mu^e_{p^*}$ is the expected excess return on a mix of portfolio $p$ and the riskfree asset such that the volatility is the same as for the market return.
\[
R_{p^*} = aR_p + (1 - a)R_f, \text{ with } a = \sigma_m / \sigma_p. \tag{9.4}
\]
This gives the mean and standard deviation of portfolio $p^*$
\[
\mu^e_{p^*} = a\mu^e_p = \mu^e_p \sigma_m / \sigma_p \tag{9.5}
\]
\[
\sigma_{p^*} = a\sigma_p = \sigma_m. \tag{9.6}
\]

[Figure 9.1 plot: portfolios m, p, q and the rescaled portfolios p* and q* plotted against $\sigma$ (0 to 0.25), together with the lines CML $= R_f + \sigma \mu^e_m / \sigma_m$ (slope is $SR_m$) and CAL(x) $= R_f + \sigma \mu^e_x / \sigma_x$ (slope is $SR_x$). Data on m, p, q: SR: 0.56, 0.47, 0.76; $M^2$ in %: 0.00, -1.59, 3.73.]

Figure 9.1: Sharpe ratio and $M^2$

The latter shows that $R_{p^*}$ indeed has the same volatility as the market. See Example 9.2 and Figure 9.1 for an illustration.
$M^2$ has the advantage of being easily interpreted: it is just a comparison of two returns. It shows how much better (or worse) this asset is compared to the capital market line (which is the location of efficient portfolios provided the market is MV efficient). However, it is just a scaling of the Sharpe ratio.
To see that, use (9.2) to write
\[
\begin{aligned}
M^2_p &= SR_{p^*} \sigma_{p^*} - SR_m \sigma_m \\
&= (SR_p - SR_m) \sigma_m.
\end{aligned} \tag{9.7}
\]
The second line uses the facts that $R_{p^*}$ has the same Sharpe ratio as $R_p$ (see (9.5)–(9.6)) and that $R_{p^*}$ has the same volatility as the market. Clearly, the portfolio with the highest Sharpe ratio has the highest $M^2$.
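
The Sharpe ratio and $M^2$ columns of Table 9.2 can be reproduced from the $\mu^e$ and $\sigma$ columns of Table 9.1. The following is a minimal sketch; the function and variable names are chosen here for illustration.

```python
# (mu_e, sigma) for the market m and the funds p and q, from Table 9.1
data = {"m": (0.100, 0.180), "p": (0.100, 0.214), "q": (0.180, 0.236)}

def sharpe_ratio(mu_e, sigma):
    """SR = mu_e / sigma, as in (9.2)."""
    return mu_e / sigma

def m_squared(mu_e, sigma, mu_e_m, sigma_m):
    """M2 as in (9.3)-(9.5): rescale to market volatility, subtract mu_e of the market."""
    a = sigma_m / sigma          # weight on the fund, (9.4)
    return a * mu_e - mu_e_m     # mu_e of the rescaled fund minus mu_e of the market

mu_e_m, sigma_m = data["m"]
results = {}
for name, (mu_e, sigma) in data.items():
    results[name] = (sharpe_ratio(mu_e, sigma), m_squared(mu_e, sigma, mu_e_m, sigma_m))
    print(f"{name}: SR = {results[name][0]:.3f}, M2 = {results[name][1]:.3f}")
```

The same $M^2$ values follow from (9.7), that is, from the difference in Sharpe ratios times the market volatility.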

9.1.3 Appraisal Ratio: Which Portfolio to Combine with the Market Portfolio?

If the issue is “should I add fund p or fund q to my holding of the market portfolio?,” then
the appraisal ratio provides an answer. The appraisal ratio (also called the information
ratio) of fund p is
\[
AR_p = \alpha_p / Std(\varepsilon_{pt}), \tag{9.8}
\]
where $\alpha_p$ is the intercept and $Std(\varepsilon_{pt})$ the volatility of the residual of a CAPM regression (9.1). (The residual is often called the tracking error.) A higher appraisal ratio is better.
The motivation is that if we take the market portfolio and portfolio $p$ to be the available assets, and then find the optimal (assuming MV preferences) combination of them, then the squared Sharpe ratio of the optimal portfolio (that is, the tangency portfolio) is
\[
SR_c^2 = \left( \frac{\alpha_p}{Std(\varepsilon_{pt})} \right)^2 + SR_m^2. \tag{9.9}
\]


If the alpha is positive, a higher appraisal ratio gives a higher Sharpe ratio—which is the
objective if we have MV preferences. See Example 9.2 for an illustration.
If the alpha is negative, and we rule out short sales, then (9.9) is less relevant. In this
case, the optimal portfolio weight on an asset with a negative alpha is (very likely to be)
zero—so those assets are uninteresting.
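
As a numerical check of (9.9), the sketch below computes the squared Sharpe ratio of the tangency portfolio of a fund and the market directly as $\mu^{e\prime} \Sigma^{-1} \mu^e$ and compares it with $AR_p^2 + SR_m^2$. The inputs are from Table 9.1; the explicit 2×2 inverse and the function names are choices made here for illustration.

```python
import math

# Market: expected excess return and volatility (Table 9.1)
mu_e_m, sigma_m = 0.100, 0.180
# Funds: (alpha, beta, Std(eps)) from Table 9.1
funds = {"p": (0.010, 0.900, 0.140), "q": (0.050, 1.300, 0.030)}

def appraisal_ratio(alpha, std_eps):
    """AR = alpha / Std(eps), as in (9.8)."""
    return alpha / std_eps

def tangency_sr_squared(alpha, beta, std_eps, mu_e_m, sigma_m):
    """mu' Sigma^{-1} mu for the pair (fund, market), using the 2x2 inverse."""
    mu_i = alpha + beta * mu_e_m                 # expected excess return of the fund
    s11 = beta**2 * sigma_m**2 + std_eps**2      # Var of the fund
    s12 = beta * sigma_m**2                      # Cov(fund, market)
    s22 = sigma_m**2                             # Var of the market
    det = s11 * s22 - s12**2
    return (s22 * mu_i**2 - 2 * s12 * mu_i * mu_e_m + s11 * mu_e_m**2) / det

checks = {}
for name, (alpha, beta, std_eps) in funds.items():
    ar = appraisal_ratio(alpha, std_eps)
    direct = tangency_sr_squared(alpha, beta, std_eps, mu_e_m, sigma_m)
    formula = ar**2 + (mu_e_m / sigma_m) ** 2    # right hand side of (9.9)
    checks[name] = (ar, direct, formula)
    print(f"{name}: AR = {ar:.3f}, SR_c = {math.sqrt(direct):.3f}, (9.9) gives {math.sqrt(formula):.3f}")
```

The two routes should agree up to rounding error, and fund q (with the higher appraisal ratio) gives the higher combined Sharpe ratio.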

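Proof. From the CAPM regression (9.1) we have
\[
Cov \begin{bmatrix} R^e_{it} \\ R^e_{mt} \end{bmatrix}
= \begin{bmatrix} \beta_i^2 \sigma_m^2 + Var(\varepsilon_{it}) & \beta_i \sigma_m^2 \\ \beta_i \sigma_m^2 & \sigma_m^2 \end{bmatrix},
\text{ and }
\begin{bmatrix} \mu^e_i \\ \mu^e_m \end{bmatrix}
= \begin{bmatrix} \alpha_i + \beta_i \mu^e_m \\ \mu^e_m \end{bmatrix}.
\]
Suppose we use this information to construct a mean-variance frontier for both $R_{it}$ and $R_{mt}$, and we find the tangency portfolio, with excess return $R^e_{ct}$. We assume that there are no restrictions on the portfolio weights. Recall that the square of the Sharpe ratio of the tangency portfolio is $\mu^{e\prime} \Sigma^{-1} \mu^e$, where $\mu^e$ is the vector of expected excess returns and $\Sigma$ is the covariance matrix. By using the covariance matrix and mean vector above, we get that the squared Sharpe ratio for the tangency portfolio (using both $R_{it}$ and $R_{mt}$) is
\[
\left( \frac{\mu^e_c}{\sigma_c} \right)^2 = \frac{\alpha_i^2}{Var(\varepsilon_{it})} + \left( \frac{\mu^e_m}{\sigma_m} \right)^2.
\]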

9.1.4 Treynor’s Ratio and $T^2$: Portfolio is a Small Part of the Overall Portfolio

Suppose instead that the issue is if we should add a small amount of fund p or fund q to an already well diversified portfolio (not the market portfolio). In this case, Treynor’s ratio might be useful
\[
TR_p = \mu^e_p / \beta_p. \tag{9.10}
\]
A higher Treynor’s ratio is better.

If we mix p and q with the riskfree rate to get the same $\beta$ for both portfolios (here 1 to make it comparable with the market), the one with the highest Treynor’s ratio has the highest expected return. To show this consider the portfolio $p^*$
\[
R_{p^*} = aR_p + (1 - a)R_f, \text{ with } a = 1/\beta_p. \tag{9.11}
\]
This gives the mean and the beta of portfolio $p^*$
\[
\mu^e_{p^*} = a\mu^e_p = \mu^e_p / \beta_p \tag{9.12}
\]
\[
\beta_{p^*} = a\beta_p = 1, \tag{9.13}
\]
so the beta is one. The $T^2$ measure is then
\[
T^2_p = \mu^e_{p^*} - \mu^e_m = \mu^e_p / \beta_p - \mu^e_m. \tag{9.14}
\]
See Example 9.2 and Figure 9.2 for an illustration.
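
The Treynor and $T^2$ columns of Table 9.2 follow from the $\mu^e$ and $\beta$ columns of Table 9.1. A minimal sketch, with names chosen here for illustration:

```python
# (mu_e, beta) for the market m and the funds p and q, from Table 9.1
funds = {"m": (0.100, 1.000), "p": (0.100, 0.900), "q": (0.180, 1.300)}
mu_e_m = funds["m"][0]

def treynor_ratio(mu_e, beta):
    """TR = mu_e / beta, as in (9.10)."""
    return mu_e / beta

def t_squared(mu_e, beta, mu_e_m):
    """T2 = mu_e / beta - mu_e of the market, as in (9.14)."""
    return mu_e / beta - mu_e_m

measures = {
    name: (treynor_ratio(mu_e, beta), t_squared(mu_e, beta, mu_e_m))
    for name, (mu_e, beta) in funds.items()
}
for name, (tr, t2) in measures.items():
    print(f"{name}: Treynor = {tr:.3f}, T2 = {t2:.3f}")
```

Fund p now ranks above the market, whereas it ranked below the market on the Sharpe ratio: its idiosyncratic risk is ignored by Treynor's ratio.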


The basic intuition is that with a diversified portfolio and small investment, idiosyncratic risk doesn’t matter, only systematic risk ($\beta$) does. Compare with the setting of the Appraisal Ratio, where we also have a well diversified portfolio (the market), but the investment could be large.

Example 9.3 (Additional portfolio risk) We hold a well diversified portfolio ($d$) and buy a fraction 0.05 of asset $i$ (financed by borrowing), so the return is $R = R_d + 0.05(R_i - R_f)$. Suppose $\sigma_d^2 = \sigma_i^2 = 1$ and that the correlation of $d$ and $i$ is 0.25. The variance of $R$ is then
\[
\sigma_d^2 + \delta^2 \sigma_i^2 + 2\delta\sigma_{id} = 1 + 0.05^2 + 2 \times 0.05 \times 0.25 = 1 + 0.0025 + 0.025,
\]
so the importance of the covariance is 10 times larger than the importance of the variance of asset $i$.
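
The arithmetic in Example 9.3 can be wrapped in a small function to see how the relative importance of the covariance term changes with the size of the position. The function name and the alternative value of $\delta$ are assumptions made here for illustration.

```python
def variance_terms(delta, var_d, var_i, corr):
    """Split Var(R_d + delta*(R_i - R_f)) into its three terms."""
    cov_id = corr * (var_d * var_i) ** 0.5   # covariance of d and i
    own_term = delta**2 * var_i              # from the variance of asset i
    cov_term = 2 * delta * cov_id            # from the covariance with d
    return var_d, own_term, cov_term

# Numbers from Example 9.3
base, own_term, cov_term = variance_terms(0.05, 1.0, 1.0, 0.25)
print("own variance term:", own_term, " covariance term:", cov_term)
print("ratio:", cov_term / own_term)

# A larger (hypothetical) position makes the own variance relatively more important
_, own_big, cov_big = variance_terms(0.5, 1.0, 1.0, 0.25)
print("ratio with delta = 0.5:", cov_big / own_big)
```

The ratio of the two terms is $2\sigma_{id}/(\delta\sigma_i^2)$, so it shrinks as the position $\delta$ grows.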

Proof. (Version 1: Based on the beta representation.) The derivation of the beta representation shows that for all assets $\mu^e_i = Cov(R_i, R_m) A$, where $A$ is some constant. Rearrange as $\mu^e_i / \beta_i = A\sigma_m^2$. A higher ratio than this is to be considered as a positive “abnormal” return and should prompt a higher investment.
Proof. (Version 2: From first principles, kind of a proof...) Suppose we initially hold a well diversified portfolio ($d$) and we increase the position in asset $i$ with the fraction $\delta$ by borrowing at the riskfree rate to get the return
\[
R = R_d + \delta(R_i - R_f).
\]
The incremental (compared to holding portfolio $d$) expected excess return is $\delta\mu^e_i$ and the incremental variance is $\delta^2\sigma_i^2 + 2\delta\sigma_{id} \approx 2\delta\sigma_{id}$, since $\delta^2$ is very small. (The variance of $R$ is $\sigma_d^2 + \delta^2\sigma_i^2 + 2\delta\sigma_{id}$.) To a first-order approximation, the change in utility ($E R_p - Var(R_p)k/2$) is therefore $\delta\mu^e_i - k\delta\sigma_{id}$, so a high value of $\mu^e_i / \sigma_{id}$ will increase utility. This suggests $\mu^e_i / \sigma_{id}$ as a performance measure. However, if portfolio $d$ is indeed well diversified, then $\sigma_{id} \approx \sigma_{im}$. We could therefore use $\mu^e_i / \sigma_{im}$ or (by multiplying by $\sigma_{mm}$), $\mu^e_i / \beta_i$ as a performance measure.

[Figure 9.2 plot: portfolios m, p, q and the rescaled portfolios p* and q* plotted against $\beta$ (0 to 1.4), together with the lines SML $= R_f + \beta\mu^e_m$ and TreynorLine(x) $= R_f + \beta\mu^e_x / \beta_x$ (slope is $TR_x$). Data on m, p, q: TR: 0.10, 0.11, 0.14; $T^2$ in %: 0.00, 1.11, 3.85.]

Figure 9.2: Treynor’s ratio


9.1.5 Relationships among the Various Performance Measures

The different measures can give different answers when comparing portfolios, but they all share one thing: they are increasing in Jensen’s alpha. By using the expected values from the CAPM regression ($\mu^e_p = \alpha_p + \beta_p \mu^e_m$), simple rearrangements give
\[
\begin{aligned}
SR_p &= \frac{\alpha_p}{\sigma_p} + Corr(R_p, R_m) SR_m \\
AR_p &= \frac{\alpha_p}{Std(\varepsilon_{pt})} \\
TR_p &= \frac{\alpha_p}{\beta_p} + \mu^e_m,
\end{aligned} \tag{9.15}
\]
and $M^2$ is just a scaling of the Sharpe ratio. Notice that these expressions do not assume that CAPM is the right pricing model: we just use the definition of the intercept and slope in the CAPM regression.
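
The decompositions in (9.15) can be checked against the numbers in Table 9.1. In the sketch below, the correlation with the market is computed as $\beta_p \sigma_m / \sigma_p$ (which follows from the definition of beta); the variable names are chosen here for illustration.

```python
# Market values and fund inputs (alpha, beta, mu_e, sigma) from Table 9.1
mu_e_m, sigma_m = 0.100, 0.180
funds = {"p": (0.010, 0.900, 0.100, 0.214), "q": (0.050, 1.300, 0.180, 0.236)}

sr_m = mu_e_m / sigma_m
decomp = {}
for name, (alpha, beta, mu_e, sigma) in funds.items():
    corr = beta * sigma_m / sigma                 # Corr(R_p, R_m) implied by beta
    sr_direct = mu_e / sigma                      # definition (9.2)
    sr_alpha = alpha / sigma + corr * sr_m        # first line of (9.15)
    tr_direct = mu_e / beta                       # definition (9.10)
    tr_alpha = alpha / beta + mu_e_m              # third line of (9.15)
    decomp[name] = (sr_direct, sr_alpha, tr_direct, tr_alpha)
    print(f"{name}: SR {sr_direct:.3f} vs {sr_alpha:.3f}, TR {tr_direct:.3f} vs {tr_alpha:.3f}")
```

For a given correlation, volatility and beta, each measure moves one-for-one with a rescaled alpha, which is the sense in which alpha drives all of them.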

Since Jensen’s alpha is the driving force in all these measurements, it is often used as
performance measure in itself. In a sense, we are then studying how “mispriced” a fund
is—compared to what it should be according to CAPM. That is, the alpha measures the
“abnormal” return.
Proof. (of (9.15)) Taking expectations of the CAPM regression (9.1) gives $\mu^e_p = \alpha_p + \beta_p \mu^e_m$, where $\beta_p = Cov(R_p, R_m) / \sigma_m^2$. The Sharpe ratio is therefore
\[
SR_p = \frac{\mu^e_p}{\sigma_p} = \frac{\alpha_p}{\sigma_p} + \frac{\beta_p}{\sigma_p} \mu^e_m,
\]
which can be written as in (9.15) since
\[
\frac{\beta_p}{\sigma_p} \mu^e_m = \frac{Cov(R_p, R_m)}{\sigma_m \sigma_p} \frac{\mu^e_m}{\sigma_m}.
\]
The $AR_p$ in (9.15) is just a definition. The $TR_p$ measure can be written
\[
TR_p = \frac{\mu^e_p}{\beta_p} = \frac{\alpha_p}{\beta_p} + \mu^e_m,
\]
where the second equality uses the expression for $\mu^e_p$ from above.

           $\alpha$   SR      $M^2$    AR       Treynor   $T^2$
Market     0.000      0.238   0.000             4.714     0.000
Putnam     -2.744     0.056   -3.604   -0.308   1.274     -3.440
Vanguard   -0.511     0.160   -1.549   -0.071   3.788     -0.926

Table 9.3: Performance Measures of Putnam Asset Allocation: Growth A and Vanguard Wellington, weekly data 1996:1-2010:4

9.1.6 Performance Measurement with More Sophisticated Benchmarks

Reference: Dahlquist and Söderlind (1999)


Traditional performance tests typically rely on the alpha from a CAPM regression.
The benchmark for the evaluation is then effectively a fixed portfolio consisting of assets
that are correctly priced by the CAPM (obeys the beta representation). It often makes
sense to use a more demanding benchmark. There are several popular alternatives.

If there are predictable movements in the market excess return, then it makes sense to add a “market timing” factor to the CAPM regression. For instance, Treynor and Mazuy (1966) argue that market timing is similar to having a beta that is linear in the market excess return
\[
\beta_i = b_i + c_i R^e_{mt}. \tag{9.16}
\]
Using this in a traditional market model (CAPM) regression, $R^e_{it} = a_i + \beta_i R^e_{mt} + \varepsilon_{it}$, gives
\[
R^e_{it} = a_i + b_i R^e_{mt} + c_i (R^e_{mt})^2 + \varepsilon_{it}, \tag{9.17}
\]
where $c_i$ captures the ability to “time” the market. That is, if the investor systematically gets out of the market (maybe investing in a riskfree asset) before low returns and vice versa, then the slope coefficient $c_i$ is positive. The interpretation is not clear cut, however. If we still regard the market portfolio (or another fixed portfolio that obeys the beta representation) as the benchmark, then $a_i + c_i (R^e_{mt})^2$ should be counted as performance. In contrast, if we think that this sort of market timing is straightforward to implement, that is, if the benchmark is the market plus market timing, then only $a_i$ should be counted as performance.
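
To illustrate how (9.17) is estimated, the sketch below simulates excess returns from a fund with a known timing coefficient and recovers the coefficients by least squares. Everything here is an assumption made for illustration: the data are simulated, the parameter values are hypothetical, and the small `ols` helper (normal equations solved by Gauss-Jordan elimination) stands in for whatever regression routine is normally used.

```python
import random

def ols(X, y):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(k)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(k)
    ]
    for col in range(k):
        # Partial pivoting, then eliminate this column from all other rows
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

rng = random.Random(0)                      # fixed seed, simulated data only
T = 5000
a_true, b_true, c_true = 0.001, 0.9, 1.0    # hypothetical parameter values

Rm = [rng.gauss(0.002, 0.05) for _ in range(T)]              # market excess return
Ri = [a_true + b_true * r + c_true * r**2 + rng.gauss(0.0, 0.005) for r in Rm]

X = [[1.0, r, r**2] for r in Rm]            # constant, market, squared market
a_hat, b_hat, c_hat = ols(X, Ri)
print(f"a = {a_hat:.4f}, b = {b_hat:.3f}, c = {c_hat:.3f}")
```

A positive estimate of $c$ is what the market timing interpretation looks for; whether it counts as performance depends on the choice of benchmark discussed above.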
In other cases (especially when we think that CAPM gives systematic pricing errors),
then the performance is measured by the intercept of a multifactor model like the Fama-
French model.
A recent way to merge the ideas of market timing and multi-factor models is to allow the coefficients to be time-varying. In practice, the coefficients in period $t$ are only allowed to be linear (or affine) functions of some information variables in an earlier period, $z_{t-1}$. To illustrate this, suppose $z_{t-1}$ is a single variable, so the time-varying (or “conditional”) CAPM regression is
\[
\begin{aligned}
R^e_{it} &= (a_i + \gamma_i z_{t-1}) + (b_i + \delta_i z_{t-1}) R^e_{mt} + \varepsilon_{it} \\
&= \theta_{i1} + \theta_{i2} z_{t-1} + \theta_{i3} R^e_{mt} + \theta_{i4} z_{t-1} R^e_{mt} + \varepsilon_{it}.
\end{aligned} \tag{9.18}
\]
Similar to the market timing regression, there are two possible interpretations of the results: if we still regard the market portfolio as the benchmark, then the other three terms should be counted as performance. In contrast, if the benchmark is a dynamic strategy in the market portfolio (where $z_{t-1}$ is allowed to affect the choice between the market portfolio and the riskfree asset), then only the first two terms are performance. In either case, the performance is time-varying.

9.2 Performance Attribution

The performance of a fund is in many cases due to decisions taken on several levels. In
order to get a better understanding of how the performance was generated, a performance
attribution calculation can be very useful. It uses information on portfolio weights (for
instance, in-house information) to decompose overall performance according to a number
of criteria (typically related to different levels of decision making).
For instance, it could be to decompose the return (as a rough measure of the perfor-
mance) into the effects of (a) allocation to asset classes (equities, bonds, bills); and (b)
security choice within each asset class. Alternatively, for a pure equity portfolio, it could
be the effects of (a) allocation to industries; and (b) security choice within each industry.
Consider portfolios $p$ and $B$ (for benchmark) from the same set of assets. Let $n$ be the number of asset classes (or industries). Returns are
\[
R_p = \sum_{i=1}^{n} w_i R_{Pi} \text{ and } R_B = \sum_{i=1}^{n} v_i R_{Bi}, \tag{9.19}
\]
where $w_i$ is the weight on asset class $i$ (for instance, long T-bonds) in portfolio $p$, and $v_i$ is the corresponding weight in the benchmark $B$. Analogously, $R_{Pi}$ is the return that the portfolio earns on asset class $i$, and $R_{Bi}$ is the return the benchmark earns. In practice, the benchmark returns are typically taken from well established indices.
Form the difference and rearrange to get
\[
\begin{aligned}
R_p - R_B &= \sum_{i=1}^{n} (w_i R_{Pi} - v_i R_{Bi}) \\
&= \sum_{i=1}^{n} (w_i - v_i) R_{Bi} + \sum_{i=1}^{n} w_i (R_{Pi} - R_{Bi}).
\end{aligned} \tag{9.20}
\]
The first term, $(w_i - v_i) R_{Bi}$, is the contribution from asset class (or industry) $i$. It uses the benchmark return for that asset class (as if you had invested in that index), and simply measures the contribution from investing more/less in that asset class than the benchmark. If decisions on allocation to different asset classes are taken by senior management (or a board), then this is the contribution of that level. The second term, $w_i (R_{Pi} - R_{Bi})$, is the contribution of the security choice (within an asset class) since it measures the difference in returns (within that asset class) of the portfolio and the benchmark.
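
The decomposition in (9.20) is straightforward to compute once weights and returns per asset class are available. The weights and returns below are hypothetical numbers chosen only to illustrate the calculation.

```python
# Hypothetical portfolio (w) and benchmark (v) weights per asset class
w = {"equities": 0.70, "bonds": 0.20, "bills": 0.10}
v = {"equities": 0.60, "bonds": 0.30, "bills": 0.10}
# Hypothetical returns earned by the portfolio (R_P) and the benchmark (R_B)
R_P = {"equities": 0.080, "bonds": 0.030, "bills": 0.010}
R_B = {"equities": 0.070, "bonds": 0.035, "bills": 0.010}

# First term of (9.20): allocation across asset classes
allocation = {i: (w[i] - v[i]) * R_B[i] for i in w}
# Second term of (9.20): security choice within each asset class
selection = {i: w[i] * (R_P[i] - R_B[i]) for i in w}

R_p_total = sum(w[i] * R_P[i] for i in w)
R_B_total = sum(v[i] * R_B[i] for i in v)

for i in w:
    print(f"{i:9s} allocation {allocation[i]: .4f}  selection {selection[i]: .4f}")
print(f"total excess return {R_p_total - R_B_total:.4f}")
print(f"sum of the two effects {sum(allocation.values()) + sum(selection.values()):.4f}")
```

The two effects add up to the difference between the portfolio and benchmark returns, so the attribution leaves no unexplained remainder.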

9.3 Style Analysis

Reference: Sharpe (1992)


Style analysis is a way to use econometric tools to find out the portfolio composition from a series of the returns, at least in broad terms.
The basic idea is to identify a number (5 to 10 perhaps) of return indices that are expected to account for the brunt of the portfolio’s returns, and then run a regression to find the portfolio “weights.” It is essentially a multi-factor regression without any intercept and where the coefficients are constrained to sum to unity and to be positive
\[
R^e_{pt} = \sum_{j=1}^{K} b_j R^e_{jt} + \varepsilon_{pt}, \text{ with } \sum_{j=1}^{K} b_j = 1 \text{ and } b_j \ge 0 \text{ for all } j. \tag{9.21}
\]
The coefficients are typically estimated by minimizing the sum of squared residuals. This is a nonlinear estimation problem, but there are very efficient methods for it (since it is a quadratic problem). Clearly, the restrictions could be changed to $L_j \le b_j \le U_j$, which could allow for short positions.
A pseudo-$R^2$ (the squared correlation of the fitted and actual values) is sometimes used to gauge how well the regression captures the returns of the portfolio. The residuals can be thought of as the effect of stock selection, or possibly changing portfolio weights more generally. One way to get a handle on the latter is to run the regression on a moving data sample. The time-varying weights are often compared with the returns on the indices to see if the weights were moved in the right direction.
See Figure 9.3 and Figure 9.5 for examples.
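
The constrained regression (9.21) can be illustrated on simulated data. The sketch below is deliberately crude: instead of the efficient quadratic programming methods mentioned above, it searches a grid of weights that are nonnegative and sum to one, which is only practical for a handful of indices. The three index series, the true weights and the noise level are all hypothetical.

```python
import random

def style_weights(fund, indices, step=0.05):
    """Grid search over three weights (>= 0, summing to 1) minimizing the SSE."""
    n = round(1 / step)
    best_w, best_sse = None, float("inf")
    for i in range(n + 1):
        for j in range(n + 1 - i):
            w = (i * step, j * step, (n - i - j) * step)
            sse = sum(
                (r - sum(wk * idx[t] for wk, idx in zip(w, indices))) ** 2
                for t, r in enumerate(fund)
            )
            if sse < best_sse:
                best_w, best_sse = w, sse
    return best_w

rng = random.Random(1)                      # fixed seed, simulated data only
T = 200
true_w = (0.5, 0.3, 0.2)                    # hypothetical "style" of the fund
indices = [[rng.gauss(0.001, 0.02) for _ in range(T)] for _ in range(3)]
fund = [
    sum(wk * idx[t] for wk, idx in zip(true_w, indices)) + rng.gauss(0.0, 0.001)
    for t in range(T)
]

est_w = style_weights(fund, indices)
print("estimated weights:", [round(x, 2) for x in est_w])
```

Running the same function on a moving window of the data would give time-varying weights of the kind shown in the rolling-window figures.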

[Figure 9.3 plot: Putnam Asset Allocation: Growth A, style analysis on moving data window, 1996 to 2010. Static weights: Equity: Int. (ex US), Developed 0.41; Equity: US, LargeCap, Value 0.21; Fixed Income: US, Bills 0.13. R2=0.90.]

Figure 9.3: Example of style analysis, rolling data window

[Figure 9.4 plot: Vanguard Wellington, style analysis on moving data window, 1996 to 2010. Static weights: Equity: US, LargeCap, Value 0.52; Fixed Income: US, Corp. Bonds 0.26; Fixed Income: US, Gov. Bonds 0.12. R2=0.89.]

Figure 9.4: Example of style analysis, rolling data window

[Figure 9.5 panels: Vanguard Wellington, weight and relative return on the index (index return minus SP500 return), 1996 to 2010, for Equity: US, LargeCap, Value; Fixed Income: US, Corp. Bonds; and Fixed Income: US, Gov. Bonds.]

Figure 9.5: Style analysis and returns

Bibliography
Dahlquist, M., and P. Söderlind, 1999, “Evaluating portfolio performance with stochastic discount factors,” Journal of Business, 72, 347–383.

Elton, E. J., M. J. Gruber, S. J. Brown, and W. N. Goetzmann, 2010, Modern portfolio theory and investment analysis, John Wiley and Sons, 8th edn.

Sharpe, W. F., 1992, “Asset allocation: management style and performance measurement,” Journal of Portfolio Management, 39, 119–138.

10 Predicting Asset Returns
Reference (medium): Elton, Gruber, Brown, and Goetzmann (2010) 17 (efficient markets)
and 26 (earnings estimation)
Additional references: Campbell, Lo, and MacKinlay (1997) 2 and 7; Cochrane (2001)
20.1
More advanced material is denoted by a star (*). It is not required reading.

10.1 Asset Prices, Random Walks, and the Efficient Market Hypothesis

Let $P_t$ be the price of an asset at the end of period $t$, after any dividend in $t$ has been paid (an ex-dividend price). The gross return ($1 + R_{t+1}$, like 1.05) of holding an asset with dividends (per current share), $D_{t+1}$, between $t$ and $t+1$ is then defined as
\[
1 + R_{t+1} = \frac{P_{t+1} + D_{t+1}}{P_t}. \tag{10.1}
\]
The dividend can, of course, be zero in a particular period, so this formulation encom-
passes the case of daily stock prices with annual dividend payment.

Remark 10.1 (Conditional expectations) The expected value of the random variable $y_{t+1}$ conditional on the information set in $t$, $E_t y_{t+1}$, is the best guess of $y_{t+1}$ using the information in $t$. Example: suppose $y_{t+1}$ equals $x_t + \varepsilon_{t+1}$, where $x_t$ is known in $t$, but all we know about $\varepsilon_{t+1}$ in $t$ is that it is a random variable with a zero mean and some (finite) variance. In this case, the best guess of $y_{t+1}$ based on what we know in $t$ is equal to $x_t$.

Take expectations of (10.1) based on the information set in $t$
\[
1 + E_t R_{t+1} = \frac{E_t P_{t+1} + E_t D_{t+1}}{P_t} \text{ or} \tag{10.2}
\]
\[
P_t = \frac{E_t P_{t+1} + E_t D_{t+1}}{1 + E_t R_{t+1}}. \tag{10.3}
\]

This formulation is only a definition, but it will help us organize the discussion of how
asset prices are determined.
This expected return, $E_t R_{t+1}$, is likely to be greater than a riskfree interest rate if the
asset has positive systematic (non-diversifiable) risk. For instance, in a CAPM model this
would manifest itself in a positive “beta.” In an equilibrium setting, we can think of this
as a “required return” needed for investors to hold this asset.

10.1.1 Different Versions of the Efficient Market Hypothesis

The efficient market hypothesis casts a long shadow on every attempt to forecast asset
prices. In its simplest form it says that it is not possible to forecast asset prices, but there
are several other forms with different implications. Before attempting to forecast financial
markets, it is useful to take a look at the logic of the efficient market hypothesis. This will
help us to organize the effort and to interpret the results.
A modern interpretation of the efficient market hypothesis (EMH) is that the informa-
tion set used in forming the market expectations in (10.2) includes all public information.
(This is the semi-strong form of the EMH since it says all public information; the strong
form says all public and private information; and the weak form says all information in
price and trading volume data.) The implication is that simple stock picking techniques
are not likely to improve the portfolio performance, that is, to generate abnormal returns. Instead,
advanced (costly?) techniques are called for in order to gather more detailed information
than that used in the market’s assessment of the asset. Clearly, with a better forecast of the
future return than that of the market there is plenty of scope for dynamic trading strate-
gies. Note that this modern interpretation of the efficient market hypothesis does not rule
out the possibility of forecastable prices or returns. It does rule out that abnormal returns
can be achieved by stock picking techniques which rely on public information.
There are several different traditional interpretations of the EMH. Like the modern
interpretation, they do not rule out the possibility of achieving abnormal returns by using
better information than the rest of the market. However, they make stronger assumptions
about whether prices or returns are forecastable. Typically one of the following is as-
sumed to be unforecastable: price changes, returns, or returns in excess of a riskfree rate
(interest rate). By unforecastable, it is meant that the best forecast (expected value conditional
on available information) is a constant. Conversely, if it is found that there is some
information in $t$ that can predict returns $R_{t+1}$, then the market cannot price the asset as
if $\mathrm{E}_t R_{t+1}$ is a constant, at least not if the market forms expectations rationally. We will
now analyze the logic of each of the traditional interpretations.
If price changes are unforecastable, then $\mathrm{E}_t P_{t+1} - P_t$ equals a constant. Typically,
this constant is taken to be zero so $P_t$ is a martingale. Use $\mathrm{E}_t P_{t+1} = P_t$ in (10.2)
$$\mathrm{E}_t R_{t+1} = \frac{\mathrm{E}_t D_{t+1}}{P_t}. \tag{10.4}$$
This says that the expected net return on the asset is the expected dividend divided by the
current price. This is clearly implausible for daily data since it means that the expected
return is zero for all days except those days when the asset pays a dividend (or rather, the
day the asset goes ex dividend)—and then there is an enormous expected return for the one
day when the dividend is paid. As a first step, we should probably refine the interpretation
of the efficient market hypothesis to include the dividend so that $\mathrm{E}_t(P_{t+1} + D_{t+1}) = P_t$.
Using that in (10.2) gives $1 + \mathrm{E}_t R_{t+1} = 1$, which can only be satisfied if $\mathrm{E}_t R_{t+1} = 0$,
which seems very implausible for long investment horizons—although it is probably a
reasonable approximation for short horizons (a week or less).
If returns are unforecastable, so $\mathrm{E}_t R_{t+1} = R$ (a constant), then (10.3) gives
$$P_t = \frac{\mathrm{E}_t P_{t+1} + \mathrm{E}_t D_{t+1}}{1 + R}. \tag{10.5}$$
The main problem with this interpretation is that it looks at every asset separately and
that outside options are not taken into account. For instance, if the nominal interest rate
changes from 5% to 10%, why should the expected (required) return on a stock be unchanged?
In fact, most asset pricing models suggest that the expected return $\mathrm{E}_t R_{t+1}$
equals the riskfree rate plus compensation for risk.
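
As a rough numerical illustration of (10.5) and of the interest rate argument, the sketch below prices the same expected payoff at two different required returns. The payoff figures (an expected price of 100 and an expected dividend of 5) are made-up numbers, not taken from these notes.

```python
def price(expected_price, expected_dividend, required_return):
    """Current price as in (10.3)/(10.5): expected payoff discounted by the required return."""
    return (expected_price + expected_dividend) / (1 + required_return)

# Hypothetical expected payoff next period: price 100 plus dividend 5
p_low = price(100.0, 5.0, 0.05)    # required return of 5%
p_high = price(100.0, 5.0, 0.10)   # required return of 10%
print(round(p_low, 2), round(p_high, 2))
```

With an unchanged expected payoff, the higher required return lowers today's price from about 100 to about 95.45. If the required return moves with the interest rate, a constant $R$ in (10.5) is therefore hard to defend.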
If excess returns are unforecastable, then the compensation (over the riskfree rate)
for risk is constant. The risk compensation is, of course, already reflected in the current
price $P_t$, so the issue is then if there is some information in $t$ which is correlated with
the risk compensation in $P_{t+1}$. Note that such forecastability does not necessarily imply
an inefficient market or presence of uninformed traders—it could equally well be due to
movements in risk compensation driven by movements in uncertainty (option prices sug-
gest that there are plenty of movements in uncertainty). If so, the forecastability cannot be
used to generate abnormal returns (over riskfree rate plus risk compensation). However,
it could also be due to exploitable market inefficiencies. Alternatively, you may argue
that the market compensates for risk which you happen to be immune to—so you are
interested in the return rather than the risk adjusted return.
This discussion of the traditional efficient market hypothesis suggests that the most
interesting hypotheses to test are if returns or excess returns are forecastable. In practice,
the results for them are fairly similar since the movements in most asset returns are much
greater than the movements in interest rates.

10.1.2 Martingales and Random Walks

Further reading: Cuthbertson (1996) 5.3


The accumulated wealth in a sequence of fair bets is expected to be unchanged. It is
then said to be a martingale.
The time series $x$ is a martingale with respect to an information set $\Omega_t$ if the expected
value of $x_{t+s}$ ($s \geq 1$) conditional on the information set $\Omega_t$ equals $x_t$. (The information
set $\Omega_t$ is often taken to be just the history of $x$: $x_t, x_{t-1}, \ldots$)
The time series $x$ is a random walk if $x_{t+1} = x_t + \varepsilon_{t+1}$, where $\varepsilon_t$ and $\varepsilon_{t+s}$ are
uncorrelated for all $s \neq 0$, and $\mathrm{E}\,\varepsilon_t = 0$. (There are other definitions which require that
$\varepsilon_t$ and $\varepsilon_{t+s}$ have the same distribution.) A random walk is a martingale; the converse is
not necessarily true.

Remark 10.2 (A martingale, but not a random walk). Suppose $y_{t+1} = y_t u_{t+1}$, where
$u_t$ and $u_{t+s}$ are uncorrelated for all $s \neq 0$, and $\mathrm{E}_t u_{t+1} = 1$. This is a martingale, but
not a random walk.

In any case, the martingale property implies that $x_{t+s} = x_t + \varepsilon_{t+s}$, where the expected
value of $\varepsilon_{t+s}$ based on $\Omega_t$ is zero. This is close enough to the random walk to motivate
the random walk idea in most cases.
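
The process in Remark 10.2 can be checked by direct enumeration. The sketch below assumes a two-point distribution for $u_{t+1}$ (1/2 or 3/2 with equal probability), which is a hypothetical choice made only so that $\mathrm{E}_t u_{t+1} = 1$. It shows that the conditional mean of $y_{t+1}$ equals $y_t$ (the martingale property), while the conditional variance of the increment depends on $y_t$, so the increments do not share one distribution as the stricter random walk definitions require.

```python
from fractions import Fraction

# Assumed (hypothetical) distribution of u_{t+1}: 1/2 or 3/2 with equal probability,
# so that E_t u_{t+1} = 1 as in Remark 10.2. Entries are (value, probability).
U_DIST = [(Fraction(1, 2), Fraction(1, 2)), (Fraction(3, 2), Fraction(1, 2))]

def conditional_mean(y_t):
    """E_t y_{t+1} for the process y_{t+1} = y_t * u_{t+1}."""
    return sum(prob * y_t * u for u, prob in U_DIST)

def conditional_increment_variance(y_t):
    """Var_t(y_{t+1} - y_t); the increment has conditional mean zero."""
    return sum(prob * (y_t * u - y_t) ** 2 for u, prob in U_DIST)

for y_t in (Fraction(1), Fraction(2)):
    print(y_t, conditional_mean(y_t), conditional_increment_variance(y_t))
```

The conditional variance of the increment scales with $y_t^2$ here, which is the sense in which the process is not a random walk with identically distributed increments.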

10.2 Autocorrelations

10.2.1 Autocorrelation Coefficients and the Box-Pierce Test

The autocovariances of the $y_t$ process can be estimated as

$$\hat{\gamma}_s = \frac{1}{T} \sum_{t=1+s}^{T} (y_t - \bar{y})(y_{t-s} - \bar{y}), \tag{10.6}$$
$$\text{with } \bar{y} = \frac{1}{T} \sum_{t=1}^{T} y_t. \tag{10.7}$$

(We typically divide by $T$ in (10.6) even if we have only $T - s$ full observations to estimate
$\gamma_s$ from.) Autocorrelations are then estimated as

$$\hat{\rho}_s = \hat{\gamma}_s / \hat{\gamma}_0. \tag{10.8}$$
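
A minimal sketch of the estimators in (10.6) to (10.8) follows, using plain Python lists. The function names and the short data series are illustrative choices, not part of the notes.

```python
def autocovariance(y, s):
    """Estimate gamma_s as in (10.6): divide by T, sum over t = 1+s, ..., T."""
    T = len(y)
    ybar = sum(y) / T                      # sample mean as in (10.7)
    return sum((y[t] - ybar) * (y[t - s] - ybar) for t in range(s, T)) / T

def autocorrelation(y, s):
    """Estimate rho_s = gamma_s / gamma_0 as in (10.8)."""
    return autocovariance(y, s) / autocovariance(y, 0)

y = [1.0, 2.0, 3.0, 4.0, 5.0]              # made-up data, for illustration only
print(autocovariance(y, 0), autocovariance(y, 1), autocorrelation(y, 1))
```

Dividing by $T$ rather than $T - s$ matches the convention stated above.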

The sampling properties of $\hat{\rho}_s$ are complicated, but there are several useful large sample
results for Gaussian processes (these results typically carry over to processes which
are similar to the Gaussian; a homoskedastic process with finite 6th moment is typically
enough, see Priestley (1981) 5.3 or Brockwell and Davis (1991) 7.2-7.3). When the true
autocorrelations are all zero (not $\rho_0$, of course), then for any $i$ and $j$ different from zero

$$\sqrt{T} \begin{bmatrix} \hat{\rho}_i \\ \hat{\rho}_j \end{bmatrix} \to^d N\left( \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \right). \tag{10.9}$$

This result can be used to construct tests for both single autocorrelations (t-test or $\chi^2$ test)
and several autocorrelations at once ($\chi^2$ test).

Example 10.3 (t-test) We want to test the hypothesis that $\rho_1 = 0$. Since the $N(0,1)$
distribution has 5% of the probability mass below -1.65 and another 5% above 1.65, we
can reject the null hypothesis at the 10% level if $\sqrt{T}|\hat{\rho}_1| > 1.65$. With $T = 100$, we
therefore need $|\hat{\rho}_1| > 1.65/\sqrt{100} = 0.165$ for rejection, and with $T = 1000$ we need
$|\hat{\rho}_1| > 1.65/\sqrt{1000} \approx 0.052$.
The Box-Pierce test follows directly from the result in (10.9), since it shows that $\sqrt{T}\hat{\rho}_i$
and $\sqrt{T}\hat{\rho}_j$ are iid $N(0,1)$ variables. Therefore, the sum of their squares is distributed
as a $\chi^2$ variable. The test statistic typically used is
$$Q_L = T \sum_{s=1}^{L} \hat{\rho}_s^2 \to^d \chi^2_L. \tag{10.10}$$

Example 10.4 (Box-Pierce) Let $\hat{\rho}_1 = 0.165$, and $T = 100$, so $Q_1 = 100 \times 0.165^2 =
2.72$. The 10% critical value of the $\chi^2_1$ distribution is 2.71, so the null hypothesis of no
autocorrelation is rejected.

The choice of lag order in (10.10), $L$, should be guided by theoretical considerations,
but it may also be wise to try different values. There is clearly a trade-off: too few lags may
miss a significant high-order autocorrelation, but too many lags can destroy the power of
the test (as the test statistic is not affected much by increasing $L$, but the critical values
increase).
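
The calculation in Example 10.4 can be sketched as below. The helper name `box_pierce` is an illustrative choice. For one lag, the $\chi^2_1$ critical value is obtained by squaring a standard normal quantile, which only works for $L = 1$; other lag orders need a $\chi^2_L$ table or a statistics library.

```python
from statistics import NormalDist

def box_pierce(rhos, T):
    """Q_L = T * sum of squared autocorrelations, as in (10.10)."""
    return T * sum(r * r for r in rhos)

q1 = box_pierce([0.165], 100)

# A chi-square(1) variable is a squared N(0,1) variable, so its upper 10% critical
# value is the square of the 95% quantile of N(0,1). Valid for L = 1 only.
crit_10pct = NormalDist().inv_cdf(0.95) ** 2

print(round(q1, 2), round(crit_10pct, 2), q1 > crit_10pct)
```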

10.2.2 Autoregressions

An alternative way of testing autocorrelations is to estimate an AR model

$$y_t = c + a_1 y_{t-1} + a_2 y_{t-2} + \ldots + a_p y_{t-p} + \varepsilon_t, \tag{10.11}$$

and then test if all slope coefficients ($a_1, a_2, \ldots, a_p$) are zero with a $\chi^2$ or $F$ test. This
approach is somewhat less general than the Box-Pierce test, but most stationary time
series processes can be well approximated by an AR of relatively low order.
See Figure 10.4 for an illustration.
Inference of the slope coefficient in autoregressions on returns for longer data horizons
than the data frequency (for instance, analysis of weekly returns in a data set consisting
of daily observations) must be done with care. If only non-overlapping returns are used
(use the weekly return for a particular weekday only, say Wednesdays), the standard LS
expression for the standard deviation of the autoregressive parameter is likely to be reasonable.
This is not the case if overlapping returns (all daily data on weekly returns) are
used.

Remark 10.5 (Overlapping returns) Consider an AR(1) for the two-period return, $y_{t-1} + y_t$,

$$y_{t+1} + y_{t+2} = a + b_2 (y_{t-1} + y_t) + \varepsilon_{t+2}.$$

Two successive observations with non-overlapping returns are then

$$\begin{aligned}
y_{t+1} + y_{t+2} &= a + b_2 (y_{t-1} + y_t) + \varepsilon_{t+2} \\
y_{t+3} + y_{t+4} &= a + b_2 (y_{t+1} + y_{t+2}) + \varepsilon_{t+4}.
\end{aligned}$$

Suppose that $y_t$ is not autocorrelated, so the slope coefficient $b_2 = 0$. We can then write
the residuals as

$$\begin{aligned}
\varepsilon_{t+2} &= -a + y_{t+1} + y_{t+2} \\
\varepsilon_{t+4} &= -a + y_{t+3} + y_{t+4},
\end{aligned}$$

which are uncorrelated. Compare this to the case where we use overlapping data. Two
successive observations are then

$$\begin{aligned}
y_{t+1} + y_{t+2} &= a + b_2 (y_{t-1} + y_t) + \varepsilon_{t+2} \\
y_{t+2} + y_{t+3} &= a + b_2 (y_t + y_{t+1}) + \varepsilon_{t+3}.
\end{aligned}$$

As before, $b_2 = 0$ if $y_t$ has no autocorrelation, so the residuals become

$$\begin{aligned}
\varepsilon_{t+2} &= -a + y_{t+1} + y_{t+2} \\
\varepsilon_{t+3} &= -a + y_{t+2} + y_{t+3},
\end{aligned}$$

which are correlated since $y_{t+2}$ shows up in both. This demonstrates that overlapping
return data introduces autocorrelation of the residuals, which has to be handled in order
to make correct inference.
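
The point of Remark 10.5 can be illustrated with a small simulation. The sketch below assumes iid standard normal one-period returns (so $b_2 = 0$ and the residuals equal the two-period returns up to a constant); the seed and sample size are arbitrary choices. For iid returns, two successive overlapping two-period returns share one of their two terms, so their correlation should be close to 0.5, while non-overlapping ones should be close to uncorrelated.

```python
import random

def corr(a, b):
    """Sample correlation of two equally long lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (z - mb) for x, z in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((z - mb) ** 2 for z in b)
    return cov / (va * vb) ** 0.5

random.seed(0)                                   # arbitrary seed, for reproducibility
y = [random.gauss(0.0, 1.0) for _ in range(20000)]

# Two-period returns sampled every period (overlapping)
two_period = [y[t] + y[t + 1] for t in range(len(y) - 1)]
overlap_corr = corr(two_period[:-1], two_period[1:])

# Two-period returns sampled every second period (non-overlapping)
non_overlapping = two_period[::2]
nonoverlap_corr = corr(non_overlapping[:-1], non_overlapping[1:])

print(round(overlap_corr, 2), round(nonoverlap_corr, 2))
```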

10.2.3 Autoregressions versus Autocorrelations

Figure 10.1: Time series properties of SMI. Panels: SMI and bill portfolio; SMI daily excess
returns, %; both plotted against Year. Daily SMI data, 1988:7−2010:9. 1st order autocorrelation
of returns (daily, weekly, monthly): 0.02, −0.08, 0.04. 1st order autocorrelation of absolute
returns (daily, weekly, monthly): 0.28, 0.28, 0.17.

It is straightforward to see the relation between autocorrelations and the AR model when
the AR model is the true process. This relation is given by the Yule-Walker equations.
For an AR(1), the autoregression coefficient is simply the first autocorrelation coefficient.
For an AR(2), $y_t = a_1 y_{t-1} + a_2 y_{t-2} + \varepsilon_t$, we have

$$\begin{bmatrix} \mathrm{Cov}(y_t, y_t) \\ \mathrm{Cov}(y_{t-1}, y_t) \\ \mathrm{Cov}(y_{t-2}, y_t) \end{bmatrix}
= \begin{bmatrix} \mathrm{Cov}(y_t, a_1 y_{t-1} + a_2 y_{t-2} + \varepsilon_t) \\ \mathrm{Cov}(y_{t-1}, a_1 y_{t-1} + a_2 y_{t-2} + \varepsilon_t) \\ \mathrm{Cov}(y_{t-2}, a_1 y_{t-1} + a_2 y_{t-2} + \varepsilon_t) \end{bmatrix}$$
$$= \begin{bmatrix} a_1 \mathrm{Cov}(y_t, y_{t-1}) + a_2 \mathrm{Cov}(y_t, y_{t-2}) + \mathrm{Cov}(y_t, \varepsilon_t) \\ a_1 \mathrm{Cov}(y_{t-1}, y_{t-1}) + a_2 \mathrm{Cov}(y_{t-1}, y_{t-2}) \\ a_1 \mathrm{Cov}(y_{t-2}, y_{t-1}) + a_2 \mathrm{Cov}(y_{t-2}, y_{t-2}) \end{bmatrix}, \text{ or}$$
$$\begin{bmatrix} \gamma_0 \\ \gamma_1 \\ \gamma_2 \end{bmatrix}
= \begin{bmatrix} a_1 \gamma_1 + a_2 \gamma_2 + \mathrm{Var}(\varepsilon_t) \\ a_1 \gamma_0 + a_2 \gamma_1 \\ a_1 \gamma_1 + a_2 \gamma_0 \end{bmatrix}. \tag{10.12}$$

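To transform to autocorrelation, divide by $\gamma_0$. The last two equations are then

$$\begin{bmatrix} \rho_1 \\ \rho_2 \end{bmatrix} = \begin{bmatrix} a_1 + a_2 \rho_1 \\ a_1 \rho_1 + a_2 \end{bmatrix}
\text{ or }
\begin{bmatrix} \rho_1 \\ \rho_2 \end{bmatrix} = \begin{bmatrix} a_1 / (1 - a_2) \\ a_1^2 / (1 - a_2) + a_2 \end{bmatrix}. \tag{10.13}$$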

Figure 10.2: Predictability of US stock returns. Panels: autocorrelations of daily and weekly
excess returns, and of daily and weekly absolute excess returns, for lags 1 to 5 (days or weeks),
with 90% confidence band around 0. S&P 500, 1979:1−2010:9.

If we know the parameters of the AR(2) model ($a_1$, $a_2$, and $\mathrm{Var}(\varepsilon_t)$), then we can
solve for the autocorrelations. Alternatively, if we know the autocorrelations, then we can
solve for the autoregression coefficients. This demonstrates that testing if all the autocorrelations
are zero is essentially the same as testing if all the autoregressive coefficients are
zero. Note, however, that the transformation is non-linear, which may make a difference
in small samples.
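
The two-way mapping in (10.13) can be sketched as follows. The function names and the parameter values $a_1 = 0.5$, $a_2 = 0.2$ are illustrative assumptions. The inverse mapping is obtained by solving the two equations in (10.13) for $a_1$ and $a_2$.

```python
def ar2_to_autocorr(a1, a2):
    """First two autocorrelations implied by an AR(2), from (10.13)."""
    rho1 = a1 / (1 - a2)
    rho2 = a1 * rho1 + a2
    return rho1, rho2

def autocorr_to_ar2(rho1, rho2):
    """Invert (10.13): solve rho1 = a1 + a2*rho1 and rho2 = a1*rho1 + a2."""
    a2 = (rho2 - rho1 ** 2) / (1 - rho1 ** 2)
    a1 = rho1 * (1 - a2)
    return a1, a2

rho1, rho2 = ar2_to_autocorr(0.5, 0.2)     # hypothetical AR(2) coefficients
a1_back, a2_back = autocorr_to_ar2(rho1, rho2)
print(rho1, rho2, a1_back, a2_back)
```

Both mappings are non-linear in their arguments, which is the source of the small-sample caveat mentioned above.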

10.2.4 Variance Ratios

A variance ratio is another way to measure predictability. It is defined as the variance of
a $q$-period return divided by $q$ times the variance of a 1-period return
$$\mathrm{VR}_q = \frac{\mathrm{Var}\left( \sum_{s=0}^{q-1} y_{t-s} \right)}{q \, \mathrm{Var}(y_t)}. \tag{10.14}$$

Figure 10.3: Predictability of US stock returns, size deciles. Panels: autocorrelations of excess
returns for the smallest, 5th, and largest size deciles, lags 1 to 5 (days), with 90% confidence
band around 0. US daily data 1979:1−2009:12.

To see that this is related to predictability, consider the 2-period variance ratio.

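\begin{align}
\mathrm{VR}_2 &= \frac{\mathrm{Var}(y_t + y_{t-1})}{2 \, \mathrm{Var}(y_t)} \tag{10.15} \\
&= \frac{\mathrm{Var}(y_t) + \mathrm{Var}(y_{t-1}) + 2 \, \mathrm{Cov}(y_t, y_{t-1})}{2 \, \mathrm{Var}(y_t)} \notag \\
&= 1 + \frac{\mathrm{Cov}(y_t, y_{t-1})}{\mathrm{Var}(y_t)} \notag \\
&= 1 + \rho_1. \tag{10.16}
\end{align}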

It is clear from (10.16) that if $y_t$ is not serially correlated, then the variance ratio is unity;
a value above one indicates positive serial correlation and a value below one indicates
negative serial correlation. The same applies to longer horizons.
Figure 10.4: Predictability of US stock returns. Panels: slope (with 90% confidence band) and
R2 from the regression Return = a + b*lagged Return; slope and R2 from the regression
Return = a + b*E/P; all plotted against the return horizon (months, 0 to 60). US stock returns
1926:1−2010:4.

The estimation of $\mathrm{VR}_q$ is typically not done by replacing the population variances in
(10.14) with the sample variances, since this would require using non-overlapping long
returns, which wastes a lot of data points. For instance, if we have 24 years of data and
we want to study the variance ratio for the 5-year horizon, then 4 years of data are wasted.
Instead, we typically rely on a transformation of (10.14)
\begin{align}
\mathrm{VR}_q &= \frac{\mathrm{Var}\left( \sum_{s=0}^{q-1} y_{t-s} \right)}{q \, \mathrm{Var}(y_t)} \notag \\
&= \sum_{s=-(q-1)}^{q-1} \left( 1 - \frac{|s|}{q} \right) \rho_s \text{ or} \notag \\
&= 1 + 2 \sum_{s=1}^{q-1} \left( 1 - \frac{s}{q} \right) \rho_s. \tag{10.17}
\end{align}

Figure 10.5: Variance ratios, US excess stock returns. Panels: variance ratio (VR with 90%
confidence band) against the return horizon (months, 0 to 60), for the samples starting in 1926
and in 1957. US stock returns 1926:1−2010:4. Confidence bands use asymptotic sampling
distribution of VR.

To estimate $\mathrm{VR}_q$, we first estimate the autocorrelation coefficients (using all available data
points for each estimation) and then calculate (10.17).
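
The second step, going from estimated autocorrelations to the variance ratio in (10.17), can be sketched as below. The function name and the autocorrelation values are hypothetical and only illustrate the weighting scheme.

```python
def variance_ratio(rhos, q):
    """VR_q = 1 + 2 * sum_{s=1}^{q-1} (1 - s/q) * rho_s, as in (10.17).

    rhos[s - 1] holds the (estimated) autocorrelation at lag s, for s = 1, ..., q-1.
    """
    return 1 + 2 * sum((1 - s / q) * rhos[s - 1] for s in range(1, q))

vr2 = variance_ratio([0.1], 2)            # equals 1 + rho_1, as in (10.16)
vr3 = variance_ratio([0.1, 0.05], 3)      # made-up autocorrelations
print(vr2, vr3)
```

The weights $(1 - s/q)$ decline linearly in the lag, so low-order autocorrelations matter most for a given horizon $q$.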

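Remark 10.6 (Sampling distribution of $\widehat{\mathrm{VR}}_q$) Under the null hypothesis that there is no
autocorrelation, (10.9) and (10.17) give

$$\sqrt{T} \left( \widehat{\mathrm{VR}}_q - 1 \right) \to^d N\left[ 0, 4 \sum_{s=1}^{q-1} \left( 1 - \frac{s}{q} \right)^2 \right].$$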

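Example 10.7 (Sampling distributions of $\widehat{\mathrm{VR}}_2$ and $\widehat{\mathrm{VR}}_3$)

$$\sqrt{T} \left( \widehat{\mathrm{VR}}_2 - 1 \right) \to^d N(0, 1) \text{ or } \widehat{\mathrm{VR}}_2 \to^d N(1, 1/T)$$
$$\text{and } \sqrt{T} \left( \widehat{\mathrm{VR}}_3 - 1 \right) \to^d N(0, 20/9) \text{ or } \widehat{\mathrm{VR}}_3 \to^d N\left[ 1, (20/9)/T \right].$$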
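
The asymptotic variances in Remark 10.6 and Example 10.7 can be reproduced with exact fractions. The sketch below also builds an approximate 90% band around one under the null of no autocorrelation; the function names, the sample size $T = 100$, and the use of 1.65 as critical value (as in Example 10.3) are illustrative choices.

```python
from fractions import Fraction
import math

def vr_asymptotic_variance(q):
    """Variance of sqrt(T)*(VR_q - 1) under no autocorrelation (Remark 10.6)."""
    return 4 * sum((1 - Fraction(s, q)) ** 2 for s in range(1, q))

def vr_band_90(q, T):
    """Approximate 90% band for VR_q under the null: 1 +/- 1.65 * sqrt(variance / T)."""
    half_width = 1.65 * math.sqrt(float(vr_asymptotic_variance(q)) / T)
    return 1 - half_width, 1 + half_width

print(vr_asymptotic_variance(2), vr_asymptotic_variance(3))
print(vr_band_90(2, 100))
```

For $q = 2$ and $q = 3$ the function returns 1 and 20/9, matching Example 10.7.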
The results in CLM Table 2.5 and 2.6 (weekly CRSP stock index returns, early 1960s
to mid 1990s) show variance ratios above one and increasing with the number of lags, q.
The results for individual stocks in CLM Table 2.7 show variance ratios close to, or even
below, unity. Cochrane Tables 20.5–6 report weak evidence for more mean reversion in
multi-year returns (annual NYSE stock index, 1926 to mid 1990s).
See Figure 10.5 for an illustration.

10.3 Other Predictors and Methods

There are many other possible predictors of future stock returns. For instance, both the
dividend-price ratio and nominal interest rates have been used to predict long-run returns,
and lagged short-run returns on other assets have been used to predict short-run returns.

10.3.1 Lead-Lags

Stock indices have more positive autocorrelation than (most) individual stocks: there
should therefore be fairly strong cross-autocorrelations across individual stocks. (See
Campbell, Lo, and MacKinlay (1997) Tables 2.7 and 2.8.) Indeed, this is also what is
found in US data where weekly returns of large size stocks forecast weekly returns of
small size stocks.
See Figures 10.6–10.7 for an illustration.

Figure 10.6: Cross-correlation across size deciles. Panels: correlation of the largest, 5th, and
smallest decile with lags (1 to 5 days) of the smallest, 5th, and largest deciles. US size deciles,
US daily data 1979:1−2009:12.

Figure 10.7: Coefficients from multiple prediction regressions. Panels: regression of the largest,
5th, and smallest decile on lags (1 to 5 days) of self and of the largest decile. Multiple regression
with lagged return on self and largest deciles as regressors. The figures show regression
coefficients. US size deciles, US daily data 1979:1−2009:12.

10.3.2 Dividend-Price Ratio as a Predictor

One of the most successful attempts to forecast long-run returns is a regression of future
returns on the current dividend-price ratio (here in logs)
$$\sum_{s=1}^{q} r_{t+s} = \alpha + \beta_q (d_t - p_t) + \varepsilon_{t+q}. \tag{10.18}$$

For instance, CLM Table 7.1 reports $R^2$ values from this regression which are close to
zero for monthly returns, but they increase to 0.4 for 4-year returns (US, value weighted
index, mid 1920s to mid 1990s).
See Figure 10.4 for an illustration.

(Auto−)correlation matrix
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25

1 0.21 0.22 0.21 0.21 0.19 0.26 0.24 0.25 0.22 0.22 0.27 0.27 0.24 0.25 0.20 0.27 0.24 0.24 0.22 0.21 0.24 0.23 0.24 0.21 0.19

2 0.22 0.22 0.23 0.23 0.21 0.28 0.27 0.27 0.26 0.25 0.28 0.30 0.27 0.28 0.23 0.28 0.27 0.27 0.25 0.24 0.25 0.25 0.26 0.24 0.23

3 0.23 0.24 0.24 0.24 0.23 0.28 0.28 0.28 0.27 0.26 0.29 0.31 0.29 0.30 0.25 0.29 0.29 0.29 0.27 0.26 0.26 0.27 0.28 0.28 0.24

4 0.24 0.25 0.26 0.26 0.25 0.29 0.29 0.30 0.29 0.28 0.30 0.32 0.31 0.31 0.27 0.30 0.31 0.31 0.29 0.27 0.28 0.28 0.29 0.28 0.25

5 0.27 0.28 0.30 0.30 0.29 0.32 0.32 0.33 0.33 0.33 0.33 0.35 0.34 0.35 0.31 0.33 0.34 0.35 0.33 0.31 0.30 0.31 0.33 0.32 0.30

6 0.12 0.12 0.12 0.11 0.10 0.17 0.16 0.16 0.14 0.13 0.19 0.19 0.17 0.18 0.12 0.20 0.19 0.17 0.16 0.15 0.19 0.18 0.18 0.17 0.14

7 0.13 0.14 0.15 0.14 0.13 0.18 0.18 0.18 0.18 0.17 0.20 0.22 0.20 0.21 0.17 0.21 0.22 0.21 0.19 0.17 0.19 0.19 0.21 0.21 0.18

8 0.12 0.13 0.14 0.13 0.12 0.17 0.17 0.17 0.17 0.16 0.19 0.21 0.20 0.21 0.17 0.20 0.22 0.22 0.20 0.17 0.19 0.20 0.22 0.21 0.19

9 0.12 0.13 0.14 0.14 0.13 0.17 0.17 0.17 0.18 0.17 0.18 0.21 0.20 0.21 0.18 0.19 0.21 0.22 0.20 0.18 0.18 0.19 0.22 0.22 0.20

10 0.13 0.14 0.15 0.15 0.15 0.18 0.18 0.18 0.18 0.19 0.18 0.21 0.21 0.22 0.20 0.19 0.21 0.23 0.21 0.19 0.19 0.20 0.22 0.23 0.22

11 0.06 0.06 0.08 0.07 0.06 0.12 0.11 0.12 0.10 0.09 0.14 0.16 0.13 0.15 0.09 0.16 0.15 0.15 0.13 0.12 0.15 0.15 0.15 0.15 0.12

12 0.10 0.11 0.12 0.11 0.10 0.15 0.15 0.15 0.15 0.14 0.17 0.19 0.18 0.18 0.14 0.18 0.19 0.19 0.17 0.14 0.17 0.18 0.19 0.19 0.16

13 0.11 0.11 0.13 0.12 0.12 0.15 0.15 0.16 0.16 0.15 0.16 0.19 0.18 0.19 0.16 0.18 0.19 0.20 0.18 0.15 0.17 0.17 0.19 0.19 0.18

14 0.10 0.11 0.13 0.12 0.12 0.15 0.16 0.16 0.16 0.16 0.16 0.19 0.18 0.19 0.17 0.18 0.20 0.21 0.18 0.17 0.18 0.18 0.20 0.21 0.19

15 0.10 0.11 0.12 0.12 0.11 0.14 0.15 0.15 0.14 0.15 0.15 0.18 0.18 0.18 0.16 0.16 0.17 0.18 0.17 0.15 0.16 0.16 0.18 0.18 0.17

16 0.07 0.07 0.08 0.07 0.06 0.11 0.10 0.10 0.09 0.07 0.13 0.14 0.11 0.12 0.06 0.15 0.13 0.12 0.10 0.08 0.13 0.12 0.12 0.11 0.09

17 0.10 0.11 0.12 0.11 0.10 0.15 0.15 0.14 0.14 0.13 0.16 0.18 0.17 0.17 0.13 0.17 0.17 0.17 0.15 0.13 0.16 0.16 0.16 0.16 0.14

18 0.09 0.09 0.11 0.10 0.10 0.12 0.13 0.14 0.14 0.14 0.14 0.17 0.16 0.17 0.13 0.15 0.17 0.18 0.15 0.13 0.15 0.15 0.16 0.17 0.14

19 0.09 0.10 0.11 0.10 0.10 0.12 0.12 0.13 0.13 0.12 0.13 0.16 0.15 0.16 0.13 0.15 0.15 0.17 0.14 0.13 0.14 0.13 0.16 0.16 0.14

20 0.09 0.10 0.12 0.11 0.12 0.13 0.14 0.15 0.15 0.15 0.14 0.17 0.17 0.17 0.15 0.15 0.16 0.17 0.16 0.15 0.15 0.15 0.17 0.18 0.17

21 0.05 0.05 0.07 0.06 0.05 0.09 0.10 0.09 0.08 0.07 0.11 0.12 0.10 0.11 0.05 0.12 0.11 0.11 0.08 0.08 0.11 0.10 0.10 0.10 0.07

22 0.05 0.06 0.08 0.07 0.06 0.10 0.10 0.10 0.10 0.09 0.11 0.14 0.12 0.13 0.09 0.12 0.12 0.13 0.10 0.09 0.12 0.11 0.12 0.12 0.10

23 0.04 0.04 0.06 0.05 0.05 0.07 0.08 0.08 0.08 0.07 0.09 0.11 0.10 0.10 0.07 0.10 0.11 0.11 0.08 0.08 0.09 0.09 0.11 0.10 0.09

24 0.05 0.06 0.07 0.07 0.07 0.08 0.09 0.09 0.09 0.09 0.10 0.13 0.11 0.11 0.10 0.11 0.12 0.14 0.10 0.09 0.11 0.10 0.12 0.12 0.11

25 0.08 0.08 0.09 0.09 0.08 0.12 0.13 0.12 0.12 0.12 0.13 0.15 0.15 0.14 0.13 0.14 0.14 0.14 0.13 0.11 0.14 0.13 0.14 0.15 0.13

Figure 10.8: Illustration of the cross-autocorrelations, $\mathrm{Corr}(R_t, R_{t-k})$, monthly FF data.
Dark colors indicate high correlations, light colors indicate low correlations.

10.3.3 Predictability but No Autocorrelation

The evidence for US stock returns is that long-run returns may perhaps be predicted by the
dividend-price ratio or interest rates, but that the long-run autocorrelations are weak (long-
run US stock returns appear to be “weak-form efficient” but not “semi-strong efficient”).
This should remind us of the fact that predictability and autocorrelation need not be the
same thing: although autocorrelation implies predictability, we can have predictability
without autocorrelation.

10.3.4 Trading Strategies

Another way to measure predictability and to illustrate its economic importance is to
calculate the return of a dynamic trading strategy, and then measure the “performance”
of this strategy in relation to some benchmark portfolios. The trading strategy should, of
course, be based on the variable that is supposed to forecast returns.
(Auto−)correlation matrix, daily FF returns
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
1 0.25 0.23 0.21 0.20 0.20 0.24 0.20 0.18 0.16 0.16 0.24 0.20 0.16 0.16 0.15 0.23 0.18 0.16 0.13 0.13 0.21 0.17 0.15 0.13 0.12
2 0.22 0.20 0.20 0.19 0.19 0.23 0.19 0.17 0.16 0.16 0.22 0.19 0.16 0.16 0.14 0.21 0.18 0.16 0.14 0.13 0.20 0.17 0.15 0.13 0.13
3 0.20 0.20 0.17 0.19 0.19 0.21 0.18 0.16 0.15 0.15 0.20 0.18 0.15 0.15 0.14 0.20 0.17 0.15 0.13 0.13 0.19 0.16 0.15 0.13 0.13
4 0.21 0.21 0.20 0.18 0.20 0.22 0.19 0.17 0.17 0.17 0.21 0.19 0.17 0.17 0.16 0.20 0.19 0.16 0.15 0.15 0.20 0.17 0.16 0.15 0.15
5 0.27 0.27 0.26 0.26 0.26 0.25 0.23 0.22 0.21 0.23 0.24 0.23 0.21 0.21 0.21 0.23 0.22 0.21 0.19 0.20 0.21 0.20 0.19 0.18 0.18
6 0.09 0.09 0.08 0.07 0.06 0.14 0.11 0.08 0.07 0.07 0.15 0.12 0.08 0.08 0.07 0.16 0.11 0.09 0.07 0.07 0.16 0.12 0.10 0.08 0.08
7 0.05 0.06 0.05 0.05 0.04 0.09 0.06 0.05 0.04 0.05 0.09 0.08 0.05 0.05 0.04 0.10 0.08 0.06 0.04 0.04 0.10 0.08 0.07 0.05 0.06
8 0.03 0.04 0.04 0.04 0.03 0.06 0.05 0.03 0.03 0.04 0.07 0.06 0.04 0.04 0.04 0.07 0.07 0.05 0.03 0.04 0.08 0.07 0.06 0.04 0.05
9 0.03 0.04 0.03 0.03 0.03 0.06 0.04 0.03 0.02 0.04 0.06 0.06 0.03 0.04 0.03 0.06 0.06 0.05 0.03 0.03 0.07 0.06 0.05 0.04 0.05
10 0.04 0.05 0.05 0.05 0.05 0.06 0.05 0.04 0.04 0.06 0.07 0.06 0.04 0.05 0.05 0.07 0.07 0.05 0.04 0.05 0.07 0.06 0.05 0.05 0.06
11 0.07 0.07 0.06 0.06 0.04 0.12 0.10 0.08 0.07 0.07 0.13 0.11 0.08 0.08 0.07 0.14 0.11 0.09 0.07 0.07 0.15 0.12 0.10 0.08 0.08
12 0.04 0.06 0.05 0.05 0.04 0.09 0.08 0.07 0.06 0.07 0.10 0.09 0.07 0.08 0.06 0.11 0.10 0.08 0.07 0.06 0.12 0.11 0.09 0.08 0.08
13 0.04 0.05 0.05 0.05 0.04 0.08 0.07 0.06 0.06 0.07 0.08 0.08 0.06 0.07 0.06 0.09 0.09 0.08 0.06 0.06 0.10 0.09 0.09 0.08 0.08
14 0.04 0.05 0.04 0.05 0.04 0.06 0.06 0.06 0.06 0.07 0.07 0.07 0.06 0.07 0.06 0.07 0.08 0.07 0.06 0.06 0.08 0.08 0.07 0.07 0.07
15 0.03 0.05 0.04 0.04 0.03 0.06 0.06 0.06 0.05 0.07 0.07 0.07 0.06 0.07 0.06 0.07 0.08 0.07 0.06 0.06 0.08 0.08 0.07 0.07 0.09
16 0.03 0.04 0.03 0.03 0.01 0.08 0.07 0.06 0.05 0.05 0.09 0.08 0.06 0.06 0.05 0.11 0.08 0.06 0.04 0.04 0.12 0.09 0.08 0.06 0.06
17 0.04 0.05 0.04 0.04 0.03 0.08 0.08 0.07 0.07 0.07 0.09 0.09 0.07 0.08 0.07 0.10 0.10 0.08 0.06 0.06 0.12 0.10 0.09 0.08 0.08
18 0.04 0.06 0.05 0.05 0.04 0.08 0.08 0.08 0.07 0.08 0.09 0.09 0.07 0.08 0.07 0.09 0.10 0.08 0.06 0.07 0.11 0.10 0.09 0.09 0.09
19 0.04 0.05 0.05 0.05 0.04 0.07 0.08 0.07 0.07 0.08 0.08 0.08 0.07 0.08 0.07 0.08 0.09 0.08 0.07 0.07 0.09 0.09 0.08 0.08 0.09
20 0.04 0.05 0.05 0.05 0.05 0.07 0.07 0.07 0.07 0.08 0.07 0.08 0.07 0.08 0.07 0.08 0.08 0.08 0.07 0.07 0.09 0.08 0.08 0.08 0.09
21 −0.03 −0.02 −0.03 −0.03 −0.04 0.00 0.00 −0.01 −0.01 0.00 0.02 0.01 −0.01 −0.00 −0.01 0.02 0.01 −0.00 −0.02 −0.02 0.04 0.02 0.01 −0.00 0.00
22 −0.02 −0.01 −0.01 −0.02 −0.02 0.02 0.02 0.01 0.01 0.02 0.03 0.02 0.01 0.02 0.00 0.03 0.03 0.01 −0.01 −0.00 0.04 0.03 0.02 0.01 0.02
23 −0.00 0.01 0.00 0.00 −0.00 0.03 0.03 0.02 0.02 0.03 0.04 0.04 0.02 0.03 0.02 0.04 0.04 0.03 0.02 0.02 0.05 0.04 0.04 0.03 0.04
24 −0.01 0.00 −0.00 −0.00 −0.01 0.01 0.02 0.02 0.02 0.03 0.02 0.02 0.01 0.02 0.01 0.02 0.03 0.02 0.01 0.01 0.03 0.03 0.02 0.03 0.03
25 −0.03 −0.01 −0.02 −0.02 −0.02 0.00 0.01 0.01 0.01 0.02 0.01 0.01 0.00 0.01 0.01 0.00 0.01 0.00 0.00 0.01 0.01 0.00 0.01 0.01 0.02

Figure 10.9: Illustration of the cross-autocorrelations, $\mathrm{Corr}(R_t, R_{t-k})$, daily FF data.
Dark colors indicate high correlations, light colors indicate low correlations.

A common way (since Jensen, updated in Huberman and Kandel (1987)) is to study
the performance of a portfolio by running the following regression

$$R_{1t} - R_{ft} = \alpha + \beta (R_{mt} - R_{ft}) + \varepsilon_t, \quad \mathrm{E}\,\varepsilon_t = 0 \text{ and } \mathrm{Cov}(R_{mt} - R_{ft}, \varepsilon_t) = 0, \tag{10.19}$$

where $R_{1t} - R_{ft}$ is the excess return on the portfolio being studied and $R_{mt} - R_{ft}$ the
excess returns of a vector of benchmark portfolios (for instance, only the market portfolio
if we want to rely on CAPM; returns times conditional information if we want to allow
for time-variation in expected benchmark returns). Neutral performance (mean-variance
intersection, that is, that the tangency portfolio is unchanged and the two MV frontiers
intersect there) requires $\alpha = 0$, which can be tested with a t test.
See Figure 10.10 for an illustration.
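
A minimal sketch of the performance regression (10.19) with a single benchmark is given below. It uses textbook least squares with classical (homoskedastic, no autocorrelation) standard errors, which is an assumption of this sketch rather than something the notes prescribe; the function name and the four data points are made up for illustration.

```python
import math

def alpha_beta_tstat(excess_portfolio, excess_benchmark):
    """LS estimates of alpha and beta in (10.19) and the t-statistic for alpha = 0.

    Assumes one benchmark and classical standard errors.
    """
    n = len(excess_benchmark)
    xbar = sum(excess_benchmark) / n
    ybar = sum(excess_portfolio) / n
    sxx = sum((x - xbar) ** 2 for x in excess_benchmark)
    sxy = sum((x - xbar) * (y - ybar)
              for x, y in zip(excess_benchmark, excess_portfolio))
    beta = sxy / sxx
    alpha = ybar - beta * xbar
    resid = [y - alpha - beta * x
             for x, y in zip(excess_benchmark, excess_portfolio)]
    s2 = sum(e * e for e in resid) / (n - 2)            # residual variance
    se_alpha = math.sqrt(s2 * (1 / n + xbar ** 2 / sxx))
    return alpha, beta, alpha / se_alpha

# Made-up excess returns (in %), for illustration only
benchmark = [1.0, 2.0, 3.0, 4.0]
portfolio = [1.1, 1.9, 3.2, 3.8]
alpha, beta, t_alpha = alpha_beta_tstat(portfolio, benchmark)
print(round(alpha, 3), round(beta, 3), round(t_alpha, 3))
```

In this toy data set the t-statistic for $\alpha$ is well below 1.65 in absolute value, so neutral performance would not be rejected at the 10% level. With only four observations the normal approximation is of course not reliable.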

10.4 Security Analysts

Reference: Makridakis, Wheelwright, and Hyndman (1998) 10.1 and Elton, Gruber,
Brown, and Goetzmann (2010) 26

Figure 10.10: Predictability of US stock returns, momentum strategy. Buy winners and sell
losers: excess return and alpha plotted against the evaluation horizon (months, 0 to 12). Buy
(sell) the 5 assets with highest (lowest) return over the last month. Monthly US data
1957:1−2009:12, 25 FF portfolios (B/M and size).

10.4.1 Evidence on Analysts’ Performance

Makridakis, Wheelwright, and Hyndman (1998) 10.1 show that there is little evidence
that the average stock analyst beats (on average) the market (a passive index portfolio).
In fact, less than half of the analysts beat the market. However, there are analysts who
seem to outperform the market for some time, but the autocorrelation in over-performance
is weak. The evidence from mutual funds is similar. For them it is typically also found
that their portfolio weights do not anticipate price movements.
It should be remembered that many analysts also are sales persons: either of a stock
(for instance, since the bank is underwriting an offering) or of trading services. It could
well be that their objective function is quite different from minimizing the squared forecast
errors—or whatever we typically use in order to evaluate their performance. (The number
of litigations in the US after the technology boom/bust should serve as a strong reminder
of this.)

10.4.2 Do Security Analysts Overreact?

The paper by Bondt and Thaler (1990) compares the (semi-annual) forecasts (one- and
two-year time horizons) with actual changes in earnings per share (1976-1984) for several
hundred companies. The paper has regressions like

Actual change $= \alpha + \beta$ (forecasted change) + residual,

and then studies the estimates of the $\alpha$ and $\beta$ coefficients. With rational expectations (and
a long enough sample), we should have $\alpha = 0$ (no constant bias in forecasts) and $\beta = 1$
(proportionality, for instance no exaggeration).
The main result is that $0 < \beta < 1$, so that the
forecasted change tends to be too wild in a systematic way: a forecasted change of 1% is
(on average) followed by a less than 1% actual change in the same direction. This means
that analysts in this sample tended to be too extreme, that is, to exaggerate both positive and
negative news.

10.4.3 High-Frequency Trading Based on Recommendations from Stock Analysts

Barber, Lehavy, McNichols, and Trueman (2001) give a somewhat different picture.
They focus on the profitability of a trading strategy based on analysts’ recommendations.
They use a huge data set (some 360,000 recommendations, US stocks) for the period
1985-1996. They sort stocks into five portfolios depending on the consensus (average)
recommendation, and redo the sorting every day (if a new recommendation is published).
They find that such a daily trading strategy gives an annual 4% abnormal return on the
portfolio of the most highly recommended stocks, and an annual -5% abnormal return on
the least favourably recommended stocks.
This strategy requires a lot of trading (a turnover of 400% annually), so trading costs
would typically reduce the abnormal return on the best portfolio to almost zero. A less
frequent rebalancing (weekly, monthly) gives a very small abnormal return for the best
stocks, but still a negative abnormal return for the worst stocks. Chance and Hemler
(2001) obtain similar results when studying the investment advice by 30 professional
“market timers.”

10.4.4 Economic Experts

Several papers, for instance, Bondt (1991) and Söderlind (2010), have studied whether
economic experts can predict the broad stock markets. The results suggest that they
cannot. For instance, Söderlind (2010) shows that the economic experts that participate in
the semi-annual Livingston survey (mostly bank economists) forecast the S&P worse
than the historical average (recursively estimated), and that their forecasts are strongly
correlated with recent market data (which, in itself, cannot predict future returns).

10.4.5 The Characteristics of Individual Analysts’ Forecasts in Europe

Bolliger (2001) studies the forecast accuracy (earnings per share) of European (13 coun-
tries) analysts for the period 1988–1999. In all, some 100,000 forecasts are studied. It
is found that the forecast accuracy is positively related to how many times an analyst has
forecasted that firm and also (surprisingly) to how many firms he/she forecasts. The ac-
curacy is negatively related to the number of countries an analyst forecasts and also to the
size of the brokerage house he/she works for.

10.4.6 Bond Rating Agencies versus Stock Analysts

Ederington and Goh (1998) use data on all corporate bond rating changes by Moody’s
between 1984 and 1990 and the corresponding earnings forecasts (by various stock ana-
lysts).
The idea of the paper by Ederington and Goh (1998) is to see if bond ratings drive
earnings forecasts (or vice versa), and if they affect stock returns (prices).

1. To see if stock returns are affected by rating changes, they first construct a “normal”
return by a market model:

normal stock return$_t$ $= \alpha + \beta \times$ return on stock index$_t$,

where $\alpha$ and $\beta$ are estimated on a normal time period (not including the rating
change). The abnormal return is then calculated as the actual return minus the
normal return. They then study how such abnormal returns behave, on average,
around the dates of rating changes. Note that “time” is then measured, individually
for each stock, as the distance from the day of rating change. The result is that there
are significant negative abnormal returns following downgrades, but zero abnormal
returns following upgrades.

2. They next turn to the question of whether bond ratings drive earnings forecasts or
vice versa. To do that, they first note that there are some predictable patterns in
revisions of earnings forecasts. They therefore fit a simple autoregressive model
of earnings forecasts, and construct a measure of earnings forecast revisions (sur-
prises) from the model. They then relate this surprise variable to the bond ratings.
In short, the results are the following:

(a) both earnings forecasts and ratings react to the same information, but there is
also a direct effect of rating changes, which differs between downgrades and
upgrades.
(b) downgrades: the ratings have a strong negative direct effect on the earnings
forecasts; the returns react even quicker than analysts
(c) upgrades: the ratings have a small positive direct effect on the earnings
forecasts; there is no effect on the returns

A possible reason for why bond ratings could drive earnings forecasts and prices is
that bond rating firms typically have access to more inside information about firms than
stock analysts and investors.
A possible reason for the observed asymmetric response of returns to ratings is that
firms are quite happy to release positive news, but perhaps more reluctant to release bad
news. If so, then the information advantage of bond rating firms may be particularly large
after bad news. A downgrading would then reveal more new information than an upgrade.
The different reactions of the earnings forecasts and the returns are hard to reconcile.

10.4.7 International Differences in Analyst Forecast Properties

Ang and Ciccone (2001) study earnings forecasts for many firms in 42 countries over the
period 1988 to 1997. Some differences are found across countries: forecasters disagree
more and the forecast errors are larger in countries with low GDP growth, less accounting
disclosure, and less transparent family ownership structure.
However, the most robust finding is that forecasts for firms with losses are special:
forecasters disagree more, are more uncertain, and are more overoptimistic about such
firms.

10.4.8 Analysts and Industries

Boni and Womack (2006) study data on some 170,000 recommendations for a very
large number of U.S. companies for the period 1996–2002. Focusing on revisions of
recommendations, the paper shows that analysts are better at ranking firms within an
industry than at ranking industries.

10.5 Technical Analysis

Main reference: Bodie, Kane, and Marcus (2002) 12.2; Neely (1997) (overview, foreign
exchange market)
Further reading: Murphy (1999) (practical, a believer’s view); The Economist (1993)
(overview, the perspective of the early 1990s); Brock, Lakonishok, and LeBaron (1992)
(empirical, stock market); Lo, Mamaysky, and Wang (2000) (academic article on return
distributions for “technical portfolios”)

10.5.1 General Idea of Technical Analysis

Technical analysis is typically a data mining exercise which looks for local trends or
systematic non-linear patterns. The basic idea is that markets are not instantaneously
efficient: prices react somewhat slowly and predictably to news. The logic is essentially
that an observed price move must be due to some news (exactly which one is not very
important) and that old patterns can tell us where the price will move in the near future.
This is an attempt to gather more detailed information than that used by the market as a
whole. In practice, the technical analysis amounts to plotting different transformations
(for instance, a moving average) of prices—and to spot known patterns. This section
summarizes some simple trading rules that are used.

10.5.2 Technical Analysis and Local Trends

Many trading rules rely on some kind of local trend which can be thought of as positive
autocorrelation in price movements (also called momentum¹).
A moving average rule is to buy if a short moving average (equally weighted or
exponentially weighted) goes above a long moving average. The idea is that this event signals
a new upward trend. Let $S$ ($L$) be the lag order of a short (long) moving average, with
$S < L$, and let $b$ be a bandwidth (perhaps 0.01). Then, a MA rule for period $t$ could be

\[
\begin{bmatrix}
\text{buy in } t & \text{if } MA_{t-1}(S) > MA_{t-1}(L)(1+b) \\
\text{sell in } t & \text{if } MA_{t-1}(S) < MA_{t-1}(L)(1-b) \\
\text{no change} & \text{otherwise}
\end{bmatrix}, \text{ where} \tag{10.20}
\]
\[
MA_{t-1}(S) = (p_{t-1} + \ldots + p_{t-S})/S.
\]

¹ In physics, momentum equals the mass times speed.
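The moving average rule in (10.20) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `ma_rule_signals`, the equally weighted averages, the default values (S = 3, L = 25, b = 0.01) and the coding of the signal as +1/−1/0 are choices made here, not part of the notes.

```python
import numpy as np

def ma_rule_signals(p, S=3, L=25, b=0.01):
    """Signal for each period t: +1 (buy), -1 (sell), 0 (no change).

    Uses equally weighted moving averages of prices up to and including t-1,
    as in (10.20). Periods t < L get 0 because the long average is undefined.
    """
    p = np.asarray(p, dtype=float)
    signals = np.zeros(len(p), dtype=int)
    for t in range(L, len(p)):
        ma_short = p[t - S:t].mean()   # MA_{t-1}(S) = (p_{t-1} + ... + p_{t-S}) / S
        ma_long = p[t - L:t].mean()    # MA_{t-1}(L)
        if ma_short > ma_long * (1 + b):
            signals[t] = 1
        elif ma_short < ma_long * (1 - b):
            signals[t] = -1
    return signals

# Hypothetical, steadily rising prices (not real data)
prices = np.linspace(100.0, 130.0, 60)
print(ma_rule_signals(prices)[-5:])
```

Reversing the sign of the signal would give the "inverted" version of the rule used for a mean reversion strategy.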

The difference between the two moving averages is called an oscillator (or sometimes,
moving average convergence divergence²). A version of the moving average oscillator is
the relative strength index³, which is the ratio of the average price level on “up” days to the
average price on “down” days during the last $z$ (14 perhaps) days.
The trading range break-out rule typically amounts to buying when the price rises
above a previous peak (local maximum). The idea is that a previous peak is a resistance
level in the sense that some investors are willing to sell when the price reaches that value
(perhaps because they believe that prices cannot pass this level; clear risk of circular
reasoning or self-fulfilling prophecies; round numbers often play the role as resistance
levels). Once this artificial resistance level has been broken, the price can possibly rise
substantially. On the downside, a support level plays the same role: some investors are
willing to buy when the price reaches that value. To implement this, it is common to let
the resistance/support levels be proxied by minimum and maximum values over a data
window of length $S$. With a bandwidth $b$ (perhaps 0.01), the rule for period $t$ could be

\[
\begin{bmatrix}
\text{buy in } t & \text{if } p_t > M_{t-1}(1+b) \\
\text{sell in } t & \text{if } p_t < m_{t-1}(1-b) \\
\text{no change} & \text{otherwise}
\end{bmatrix}, \text{ where} \tag{10.21}
\]
\[
M_{t-1} = \max(p_{t-1}, \ldots, p_{t-S}), \qquad
m_{t-1} = \min(p_{t-1}, \ldots, p_{t-S}).
\]
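A minimal Python sketch of the trading range break-out rule in (10.21) follows. The function name `breakout_signals`, the window length of 5 periods and the +1/−1/0 coding are assumptions made for this illustration only.

```python
import numpy as np

def breakout_signals(p, window=5, b=0.01):
    """Signal for each period t: +1 (buy), -1 (sell), 0 (no change).

    The resistance (support) level is proxied by the maximum (minimum)
    price over the `window` periods ending in t-1, as in (10.21).
    """
    p = np.asarray(p, dtype=float)
    signals = np.zeros(len(p), dtype=int)
    for t in range(window, len(p)):
        past = p[t - window:t]             # prices for t-window, ..., t-1
        if p[t] > past.max() * (1 + b):    # break above the previous peak
            signals[t] = 1
        elif p[t] < past.min() * (1 - b):  # break below the previous low
            signals[t] = -1
    return signals

# Hypothetical prices (not real data)
prices = [100, 101, 100, 101, 100, 103, 100, 97]
print(breakout_signals(prices))
```

In this made-up series the price first breaks above the previous peak of 101 and later falls below the previous low of 100, in both cases by more than the 1% bandwidth.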

When the price is already trending up, then the trading range break-out rule may be
replaced by a channel rule, which works as follows. First, draw a trend line through
previous lows and a channel line through previous peaks. Extend these lines. If the price
moves above the channel (band) defined by these lines, then buy. A version of this is to
define the channel by a Bollinger band, which is ±2 standard deviations from a moving
data window around a moving average.

² Yes, the rumour is true: the tribe of chartists is on the verge of developing their very own language.
³ Not to be confused with relative strength, which typically refers to the ratio of two different asset prices
(for instance, an equity compared to the market).
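The Bollinger band mentioned above can be computed as follows. This is a sketch under assumptions: the window of 20 periods, the use of the sample standard deviation (`ddof=1`) and the inclusion of the current price in the window are conventions chosen here, since the notes do not specify them.

```python
import numpy as np

def bollinger_band(p, window=20, k=2.0):
    """Return (lower, middle, upper) bands; entries before a full window are NaN.

    middle = moving average over the last `window` prices (including period t),
    upper/lower = middle +/- k standard deviations from the same window.
    """
    p = np.asarray(p, dtype=float)
    n = len(p)
    lower = np.full(n, np.nan)
    middle = np.full(n, np.nan)
    upper = np.full(n, np.nan)
    for t in range(window - 1, n):
        w = p[t - window + 1:t + 1]
        sd = w.std(ddof=1)
        middle[t] = w.mean()
        lower[t] = middle[t] - k * sd
        upper[t] = middle[t] + k * sd
    return lower, middle, upper

# Hypothetical prices (not real data): a buy signal would be p[t] > upper[t-1]
prices = 100 + np.cumsum(np.sin(np.arange(60) / 3.0))
lower, middle, upper = bollinger_band(prices)
print(lower[-1], middle[-1], upper[-1])
```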
A head and shoulder pattern is a sequence of three peaks (left shoulder, head, right
shoulder), where the middle one (the head) is the highest, with two local lows in between
on approximately the same level (neck line). (Easier to draw than to explain in a thousand
words.) If the price subsequently goes below the neckline, then it is thought that a negative
trend has been initiated. (An inverse head and shoulder has the inverse pattern.)
Clearly, we can replace “buy” in the previous rules with something more aggressive,
for instance, replace a short position with a long.
The trading volume is also often taken into account. If the trading volume of assets
with declining prices is high relative to the trading volume of assets with increasing prices,
then this is interpreted as a market with selling pressure. (The basic problem with this
interpretation is that there is a buyer for every seller, so we could equally well interpret
the situations as if there is a buying pressure.)

10.5.3 Technical Analysis and Mean Reversion

If we instead believe in mean reversion of the prices, then we can essentially reverse
the previous trading rules: we would typically sell when the price is high. See Figures
10.11–10.12.
Some investors argue that markets show periods of mean reversion and then periods
with trends, and that both can be exploited. Clearly, the concept of support and resistance
levels (or more generally, a channel) is based on mean reversion between these points. A
new trend is then supposed to be initiated when the price breaks out of this band.

10.6 Spurious Regressions and In-Sample Overfit

References: Ferson, Sarkissian, and Simin (2003), Goyal and Welch (2008), and Camp-
bell and Thompson (2008)

[Figure 10.11: Examples of trading rules. Inverted MA rule, S&P 500, January–April 1999; MA(3) and MA(25), bandwidth 0.01. Series shown: Long MA (−), Long MA (+), Short MA.]

10.6.1 Spurious Regressions

Ferson, Sarkissian, and Simin (2003) argue that many prediction equations suffer from
“spurious regression” features, and that data mining tends to make things even worse.
Their simulation experiment is based on a simple model where the return predictions
are
\[
r_{t+1} = \alpha + \delta Z_t + v_{t+1}, \tag{10.22}
\]
where $Z_t$ is a regressor (predictor). The true model is that returns follow the process
\[
r_{t+1} = \mu + Z^*_t + u_{t+1}, \tag{10.23}
\]
where the residual is white noise. In this equation, $Z^*_t$ represents movements in expected
returns. The predictors follow a diagonal VAR(1)
\[
\begin{bmatrix} Z_t \\ Z^*_t \end{bmatrix}
= \begin{bmatrix} \rho & 0 \\ 0 & \rho^* \end{bmatrix}
\begin{bmatrix} Z_{t-1} \\ Z^*_{t-1} \end{bmatrix}
+ \begin{bmatrix} \varepsilon_t \\ \varepsilon^*_t \end{bmatrix},
\text{ with } \operatorname{Cov}\left( \begin{bmatrix} \varepsilon_t \\ \varepsilon^*_t \end{bmatrix} \right) = \Sigma. \tag{10.24}
\]

[Figure 10.12: Examples of trading rules. Histograms of returns, daily S&P 500 data 1990:1–2010:9. Panels: distribution of all returns (mean 0.03, std 1.18); inverted MA rule, after buy signal (mean 0.06, std 1.72); after neutral signal (mean 0.04, std 0.93); after sell signal (mean 0.00, std 0.92).]

In the case of a “pure spurious regression,” the innovations to the predictors are
uncorrelated ($\Sigma$ is diagonal). In this case, $\delta$ ought to be zero, and their simulations show that
the estimates are almost unbiased. Instead, there is a problem with the standard deviation
of $\hat{\delta}$. If $\rho^*$ is high, then the returns will be autocorrelated.
Under the null hypothesis of $\delta = 0$, this autocorrelation is loaded onto the residuals.
For that reason, the simulations use a Newey-West estimator of the covariance matrix
(with an automatic choice of lag order). This should, ideally, solve the problem with the
inference, but the simulations show that it doesn’t: when $Z^*_t$ is very autocorrelated (0.95
or higher) and reasonably important (so an $R^2$ from running (10.23), if we could, would
be 0.05 or higher), then the 5% critical value (for a t-test of the hypothesis $\delta = 0$) would
be 2.7 (to be compared with the nominal value of 1.96). Since the point estimates are
almost unbiased, the interpretation is that the standard deviations are underestimated. In
contrast, with low autocorrelation and/or low importance of $Z^*_t$, the standard deviations
are much more in line with nominal values.
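The mechanism can be illustrated with a small Monte Carlo experiment in Python. This is a simplified sketch, not a replication of Ferson, Sarkissian, and Simin (2003): it uses the usual LS standard errors instead of a Newey-West estimator, sets the two autocorrelation parameters equal, and the sample length, number of simulations and function names are assumptions made here.

```python
import numpy as np

def simulate_ar1(rho, T, rng):
    """Simulate a zero-mean AR(1) process with unit unconditional variance."""
    x = np.empty(T)
    x[0] = rng.standard_normal()  # start from the stationary distribution
    shocks = np.sqrt(1.0 - rho**2) * rng.standard_normal(T)
    for t in range(1, T):
        x[t] = rho * x[t - 1] + shocks[t]
    return x

def ols_slope_tstat(y, x):
    """t-statistic of the slope in y = a + b*x + e, using the usual LS formula."""
    X = np.column_stack([np.ones(len(x)), x])
    coef = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ coef
    s2 = resid @ resid / (len(y) - 2)
    cov = s2 * np.linalg.inv(X.T @ X)
    return coef[1] / np.sqrt(cov[1, 1])

def rejection_rate(rho, r2_true, T=200, n_sim=1000, seed=0):
    """Share of simulations with |t| > 1.96 although the true slope is zero."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_sim):
        z = simulate_ar1(rho, T, rng)       # regressor, unrelated to returns
        z_star = simulate_ar1(rho, T, rng)  # true (unobserved) expected return
        u = rng.standard_normal(T)          # white-noise residual
        # r[k] plays the role of r_{k+1}; r2_true is the share of Var(r) due to z_star
        r = np.sqrt(r2_true) * z_star + np.sqrt(1.0 - r2_true) * u
        if abs(ols_slope_tstat(r, z)) > 1.96:
            rejections += 1
    return rejections / n_sim

print("persistent predictors:", rejection_rate(rho=0.95, r2_true=0.05))
print("iid predictors:       ", rejection_rate(rho=0.0, r2_true=0.05))
```

With a correctly sized test both numbers would be close to 0.05; the sketch is meant to show that the first one is noticeably larger.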
[Figure 10.13: Examples of trading rules. Daily SMI data, 1990–2010; series: SMI and Rule. Panels: hold index if MA(3) > MA(25); hold index if P_t > max(P_{t−1}, ..., P_{t−5}); hold index if P_t/P_{t−7} > 1. Weekly rebalancing: hold index or riskfree.]

See Figures 10.14–10.15 for an illustration. They show that we need a combination of
an autocorrelated residual and an autocorrelated regressor to create a problem for the usual
LS formula for the standard deviation of a slope coefficient. When the autocorrelation is
very high, even the Newey-West estimator is likely to underestimate the true uncertainty.
[Figure 10.14: Autocorrelation of $x_t u_t$ when $u_t$ has autocorrelation $\rho$. Model: $y_t = 0.9 x_t + \varepsilon_t$, where $\varepsilon_t = \rho \varepsilon_{t-1} + u_t$, $u_t$ is iid N(0,h) such that Std($\varepsilon_t$) = 1, and $x_t = \kappa x_{t-1} + \eta_t$; $b_{LS}$ is the LS estimate of $b$ in $y_t = a + b x_t + u_t$. Curves for $\kappa = -0.9$, $0$ and $0.9$, plotted against $\rho$.]

[Figure 10.15: Variance of OLS estimator, autocorrelated errors. Panels: Std of LS under autocorrelation for $\kappa = -0.9$, $\kappa = 0$ and $\kappa = 0.9$, plotted against $\rho$. Series: $\sigma^2 (X'X)^{-1}$, Newey-West, Simulated.]

To study the interaction between spurious regressions and data mining, Ferson, Sarkissian,
and Simin (2003) let $Z_t$ be chosen from a vector of $L$ possible predictors, which all are
generated by a diagonal VAR(1) system as in (10.24) with uncorrelated errors. It is
assumed that the researcher chooses $Z_t$ by running $L$ regressions, and then picks the one
with the highest $R^2$. When $\rho = 0.15$ and the researcher chooses between $L = 10$
predictors, the simulated 5% critical value is 3.5. Since this does not depend on the importance
of $Z^*_t$, it is interpreted as a typical feature of “data mining,” which is bad enough. When
the autocorrelation is 0.95, the importance of $Z^*_t$ starts to matter: “spurious
regressions” interact with the data mining to create extremely high simulated critical values.
A possible explanation is that the data mining exercise is likely to pick out the most
autocorrelated predictor, and that a highly autocorrelated predictor exacerbates the spurious
regression problem.
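The pure data mining effect can be sketched in Python as follows. This is a stylized illustration, not the experiment of Ferson, Sarkissian, and Simin (2003): returns and predictors are iid (no autocorrelation at all), the usual LS t-statistic is used, and the function name and simulation settings are assumptions made here.

```python
import numpy as np

def mined_critical_value(n_predictors=10, T=200, n_sim=2000, seed=0):
    """Simulated 5% critical value of |t| for the predictor with the highest R2.

    Returns are pure noise and all predictors are useless, so any
    "significance" comes from picking the best of n_predictors regressions.
    """
    rng = np.random.default_rng(seed)
    picked_t = np.empty(n_sim)
    for i in range(n_sim):
        r = rng.standard_normal(T)
        Z = rng.standard_normal((T, n_predictors))
        rc = r - r.mean()
        Zc = Z - Z.mean(axis=0)
        # correlation between returns and each predictor
        corr = (Zc.T @ rc) / np.sqrt((Zc**2).sum(axis=0) * (rc @ rc))
        best = corr[np.argmax(corr**2)]  # highest R2 (= squared correlation)
        # slope t-stat in a simple regression, expressed via the correlation
        picked_t[i] = abs(best) * np.sqrt((T - 2) / (1.0 - best**2))
    return np.quantile(picked_t, 0.95)

print("one predictor: ", mined_critical_value(n_predictors=1))
print("best of ten:   ", mined_critical_value(n_predictors=10))
```

With a single predictor the simulated critical value should be close to the nominal 1.96, while picking the best of ten pushes it well above that even without any autocorrelation.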

10.6.2 In-Sample versus Out-of-Sample Forecasting

Goyal and Welch (2008) find that the evidence of predictability of equity returns dis-
appears when out-of-sample forecasts are considered. Campbell and Thompson (2008)
claim that there is still some out-of-sample predictability, provided we put restrictions on
the estimated models.
Campbell and Thompson (2008) first report that only a few variables (earnings price
ratio, T-bill rate and the inflation rate) have significant predictive power for one-month
stock returns in the full sample (1871–2003 or early 1920s–2003, depending on predictor).
To gauge the out-of-sample predictability, they estimate the prediction equation using
data up to and including $t-1$, and then make a forecast for period $t$. The forecasting
performance of the equation is then compared with using the historical average as the
predictor. Notice that this historical average is also estimated on data up to and including
$t-1$, so it changes over time. Effectively, they are comparing the forecast performance
of two models estimated in a recursive way (long and longer sample): one model has just
an intercept, the other also has a predictor. The comparison is done in terms of the RMSE
and an “out-of-sample $R^2$”
\[
R^2_{OS} = 1 - \sum_{t=s}^{T} (r_t - \hat{r}_t)^2 \Big/ \sum_{t=s}^{T} (r_t - \bar{r}_t)^2, \tag{10.25}
\]
where $s$ is the first period with an out-of-sample forecast, $\hat{r}_t$ is the forecast based on the
prediction model (estimated on data up to and including $t-1$) and $\bar{r}_t$ is the historical
average (also estimated on data up to and including $t-1$).
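A minimal Python sketch of the recursive comparison behind (10.25) is given below. The function name `out_of_sample_r2`, the single-predictor regression and the timing convention (the predictor observed in period t−1 forecasts the return in period t) are assumptions made for this illustration.

```python
import numpy as np

def out_of_sample_r2(r, x, s):
    """Out-of-sample R2 as in (10.25), with recursive (expanding window) estimation.

    r[t] is the return in period t and x[t-1] is the predictor used to forecast it.
    For each t >= s, both the regression and the historical average use only
    data up to and including t-1. Requires s >= 3.
    """
    r = np.asarray(r, dtype=float)
    x = np.asarray(x, dtype=float)
    sse_model = 0.0
    sse_mean = 0.0
    for t in range(s, len(r)):
        # estimation sample: pairs (x[k-1], r[k]) for k = 1, ..., t-1
        X = np.column_stack([np.ones(t - 1), x[:t - 1]])
        coef = np.linalg.lstsq(X, r[1:t], rcond=None)[0]
        forecast = coef[0] + coef[1] * x[t - 1]
        hist_mean = r[:t].mean()
        sse_model += (r[t] - forecast) ** 2
        sse_mean += (r[t] - hist_mean) ** 2
    return 1.0 - sse_model / sse_mean

# Hypothetical data (not real returns): a predictor with no forecasting power
rng = np.random.default_rng(0)
r = rng.standard_normal(200)
x = rng.standard_normal(200)
print(out_of_sample_r2(r, x, s=20))
```

A negative value means that the regression forecasts had a larger squared error than the historical average over the evaluation period.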
The evidence shows that the out-of-sample forecasting performance is very weak—as
claimed by Goyal and Welch (2008).
It is argued that forecasting equations can easily give strange results when they are
estimated on a small data set (as they are early in the sample). They therefore try different
restrictions: setting the slope coefficient to zero whenever the sign is “wrong,” setting
the prediction (or the historical average) to zero whenever the value is negative. This
improves the results a bit—although the predictive performance is still weak.

See Figure 10.16 for an illustration.

[Figure 10.16: Predictability of US stock returns, in-sample and out-of-sample. US stock 1-year returns 1926:1–2010:4; predictions made for 1957:1–2010:4. Panels: RMSE, E/P regression vs MA; RMSE, max(E/P regression, 0) vs MA; horizontal axis: length of data window, months. Estimation is done on a moving data window; forecasts are made out of sample. In-sample RMSE: 0.17.]

10.7 Empirical U.S. Evidence on Stock Return Predictability

The two most common methods for investigating the predictability of stock returns are
to calculate autocorrelations and to construct simple dynamic portfolios and see if they
outperform passive portfolios. The dynamic portfolio could, for instance, be a simple
filter rule that calls for rebalancing once a month by buying (selling) assets which have
increased (decreased) by more than x% the last month. If this portfolio outperforms a
passive portfolio, then this is evidence of some positive autocorrelation (“momentum”)
on a one-month horizon. The following points summarize some evidence which seems to
hold for both returns and returns in excess of a riskfree rate (an interest rate).

1. The empirical evidence suggests some, but weak, positive autocorrelation in short
horizon returns (one day up to a month) — probably too little to trade on. The
autocorrelation is stronger for small than for large firms (perhaps no autocorrela-
tion at all for weekly or longer returns in large firms). This implies that equally
weighted stock indices have higher autocorrelations than value-weighted indices.
(See Campbell, Lo, and MacKinlay (1997) Table 2.4.)

2. Stock indices have more positive autocorrelation than (most) individual stocks:
there must be fairly strong cross-autocorrelations across individual stocks. (See
Campbell, Lo, and MacKinlay (1997) Tables 2.7 and 2.8.)

3. There seems to be negative autocorrelation of multi-year stock returns, for instance
in 5-year US returns for 1926-1985. It is unclear what drives this result, however.
It could well be an artifact of just a few extreme episodes (Great Depression).
Moreover, the estimates are very uncertain as there are very few (non-overlapping)
multi-year returns even in a long sample, so the results could be just a fluke.

4. The aggregate stock market return, that is, the return on a value-weighted stock
index, seems to be forecastable on the medium horizon by various information
variables. In particular, future stock returns seem to be predictable by the current
dividend-price ratio and earnings-price ratio (positively, one to several years), or
by the interest rate changes (negatively, up to a year). For instance, the coefficient
of determination (usually denoted $R^2$, but not to be confused with the return used
above) for predicting the two-year return on the US stock market by the current
dividend-price ratio is around 0.3 for the 1952-1994 sample. (See Campbell, Lo,
and MacKinlay (1997) Tables 7.1-2.) This evidence suggests that expected returns
may very well be time-varying and correlated with the business cycle.

5. Even if short-run returns, $R_{t+1}$, are fairly hard to forecast, it is often fairly easy
to forecast volatility as measured by $|R_{t+1}|$ or $R^2_{t+1}$ (for instance, using ARCH
or GARCH models). For an example, see Bodie, Kane, and Marcus (2002)
Figure 13.7. This could possibly be used for dynamic trading strategies on options
(which directly price volatility). For instance, buying both a call and a put option (a
“straddle” or a “strangle”) is a bet on a large price movement (in any direction).

6. It is sometimes found that stock prices behave differently in periods with high
volatility than in more normal periods. Granger (1992) reports that the forecast-
ing performance is sometimes improved by using different forecasting models for
these two regimes. A simple and straightforward way to estimate a model for peri-
ods of normal volatility is to simply throw out data for volatile periods (and other
exceptional events).

7. It is important to assess forecasting models in terms of their out-of-sample forecast-
ing performance. Too many models seem to fit data in-sample, but most of them
fail in out-of-sample tests. Forecasting models are of no use if they cannot forecast.

8. There are also a number of strange patterns (“anomalies”) like the small-firms-in-
January effect (high returns on these in the first part of January) and the book-
to-market effect (high returns on firms with high book/market value of the firm’s
equity).

Bibliography
Ang, J. S., and S. J. Ciccone, 2001, “International differences in analyst forecast proper-
ties,” mimeo, Florida State University.

Barber, B., R. Lehavy, M. McNichols, and B. Trueman, 2001, “Can investors profit from
the prophets? Security analyst recommendations and stock returns,” Journal of Fi-
nance, 56, 531–563.

Bodie, Z., A. Kane, and A. J. Marcus, 2002, Investments, McGraw-Hill/Irwin, Boston,
5th edn.

Bolliger, G., 2001, “The characteristics of individual analysts’ forecasts in Europe,”
mimeo, University of Neuchatel.

Bondt, W. F. M. D., 1991, “What do economists know about the stock market?,” Journal
of Portfolio Management, 17, 84–91.

Bondt, W. F. M. D., and R. H. Thaler, 1990, “Do security analysts overreact?,” American
Economic Review, 80, 52–57.

Boni, L., and K. L. Womack, 2006, “Analysts, industries, and price momentum,” Journal
of Financial and Quantitative Analysis, 41, 85–109.

Brock, W., J. Lakonishok, and B. LeBaron, 1992, “Simple technical trading rules and the
stochastic properties of stock returns,” Journal of Finance, 47, 1731–1764.

Brockwell, P. J., and R. A. Davis, 1991, Time series: theory and methods, Springer Verlag,
New York, second edn.

Campbell, J. Y., A. W. Lo, and A. C. MacKinlay, 1997, The econometrics of financial
markets, Princeton University Press, Princeton, New Jersey.

Campbell, J. Y., and S. B. Thompson, 2008, “Predicting the equity premium out of sam-
ple: can anything beat the historical average,” Review of Financial Studies, 21, 1509–
1531.

Chance, D. M., and M. L. Hemler, 2001, “The performance of professional market timers:
daily evidence from executed strategies,” Journal of Financial Economics, 62, 377–
411.

Cochrane, J. H., 2001, Asset pricing, Princeton University Press, Princeton, New Jersey.

Cuthbertson, K., 1996, Quantitative financial economics, Wiley, Chichester, England.

Ederington, L. H., and J. C. Goh, 1998, “Bond rating agencies and stock analysts: who
knows what when?,” Journal of Financial and Quantitative Analysis, 33, 569–585.

Elton, E. J., M. J. Gruber, S. J. Brown, and W. N. Goetzmann, 2010, Modern portfolio
theory and investment analysis, John Wiley and Sons, 8th edn.

Ferson, W. E., S. Sarkissian, and T. T. Simin, 2003, “Spurious regressions in financial
economics,” Journal of Finance, 57, 1393–1413.

Goyal, A., and I. Welch, 2008, “A comprehensive look at the empirical performance of
equity premium prediction,” Review of Financial Studies, 21, 1455–1508.

Granger, C. W. J., 1992, “Forecasting stock market prices: lessons for forecasters,” Inter-
national Journal of Forecasting, 8, 3–13.

Huberman, G., and S. Kandel, 1987, “Mean-variance spanning,” Journal of Finance, 42,
873–888.

Lo, A. W., H. Mamaysky, and J. Wang, 2000, “Foundations of technical analysis: com-
putational algorithms, statistical inference, and empirical implementation,” Journal of
Finance, 55, 1705–1765.

Makridakis, S., S. C. Wheelwright, and R. J. Hyndman, 1998, Forecasting: methods and
applications, Wiley, New York, 3rd edn.

Murphy, J. J., 1999, Technical analysis of the financial markets, New York Institute of
Finance.

Neely, C. J., 1997, “Technical analysis in the foreign exchange market: a layman’s guide,”
Federal Reserve Bank of St. Louis Review.

Priestley, M. B., 1981, Spectral analysis and time series, Academic Press.

Söderlind, P., 2010, “Predicting stock price movements: regressions versus economists,”
Applied Economics Letters, 17, 869–874.

The Economist, 1993, “Frontiers of finance,” pp. 5–20.

11 Event Studies
Reference: Bodie, Kane, and Marcus (2005) 12.3 or Copeland, Weston, and Shastri
(2005) 11
Reference (advanced): Campbell, Lo, and MacKinlay (1997) 4
More advanced material is denoted by a star ($^*$). It is not required reading.

11.1 Basic Structure of Event Studies

The idea of an event study is to study the effect (on stock prices or returns) of a special
event by using a cross-section of such events. For instance, what is the effect of a stock
split announcement on the share price? Other events could be debt issues, mergers and
acquisitions, earnings announcements, or monetary policy moves.
The event is typically assumed to be a discrete variable. For instance, it could be whether
there was a merger or not, or whether the monetary policy surprise was positive (a lower
interest rate than expected) or not. The basic approach is then to study what happens to
the returns of those assets that have such an event.
Only news should move the asset price, so it is often necessary to explicitly model
the previous expectations to define the event. For earnings, the event is typically taken to
be the earnings announcement minus (some average of) analysts’ forecast. Similarly, for
monetary policy moves, the event could be specified as the interest rate decision minus
previous forward rates (as a measure of previous expectations).
The abnormal return of asset $i$ in period $t$ is
\[
u_{i,t} = R_{i,t} - R^{normal}_{i,t}, \tag{11.1}
\]
where $R_{i,t}$ is the actual return and the last term is the normal return (which may differ
across assets and time). The definition of the normal return is discussed in detail in Section
11.2. These returns could be nominal returns, but more likely (at least for slightly longer
horizons) real returns or excess returns.
Suppose we have a sample of $n$ such events (“assets”). To keep the notation
(reasonably) simple, we “normalize” the time so period 0 is the time of the event. Clearly the
actual calendar times of the events for assets $i$ and $j$ are likely to differ, but we shift the
time line for each asset individually so the time of the event is normalized to zero for
every asset. See Figure 11.1 for an illustration.

[Figure 11.1: Event days and windows. Two time lines (firm 1 and firm 2), each marked at −1, 0 and 1 around its own event day.]

To control for information leakage and slow price adjustment, the abnormal return is
often calculated for some time before and after the event: the “event window” (often ±20
days or so). For day $s$ (that is, $s$ days after the event time 0), the cross sectional average
abnormal return is
\[
\bar{u}_s = \sum_{i=1}^{n} u_{i,s}/n. \tag{11.2}
\]
For instance, $\bar{u}_2$ is the average abnormal return two days after the event, and $\bar{u}_{-1}$ is for
one day before the event.
The cumulative abnormal return (CAR) of asset $i$ is simply the sum of the abnormal
return in (11.1) over some period around the event. It is often calculated from the
beginning of the event window. For instance, if the event window starts at $-w$, then the
$q$-period (day?) car for firm $i$ is
\[
\mathrm{car}_{i,q} = u_{i,-w} + u_{i,-w+1} + \ldots + u_{i,-w+q-1}. \tag{11.3}
\]
The cross sectional average of the $q$-period car is
\[
\overline{\mathrm{car}}_q = \sum_{i=1}^{n} \mathrm{car}_{i,q}/n. \tag{11.4}
\]
See Figure 11.2 for an empirical example.

[Figure 11.2: Event study of IPOs in Shanghai 2001–2004. (Data from Nou Lai.) Cumulative excess return (average) with 90% confidence band, in %, plotted against days after IPO (0 to 25). Sample: 196 IPOs on the Shanghai Stock Exchange, 2001–2004.]

Example 11.1 (Abnormal returns for ±1 day around event, two firms) Suppose there are
two firms and the event window contains ±1 day around the event day, and that the
abnormal returns (in percent) are

Time   Firm 1   Firm 2   Cross-sectional Average
 -1     0.2     -0.1      0.05
  0     1.0      2.0      1.5
  1     0.1      0.3      0.2

We have the following cumulative returns

Time   Firm 1   Firm 2   Cross-sectional Average
 -1     0.2     -0.1      0.05
  0     1.2      1.9      1.55
  1     1.3      2.2      1.75
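The numbers in Example 11.1 can be reproduced with a few lines of Python, following (11.2)–(11.4). The array layout (rows are event days −1, 0, 1; columns are firms) is a choice made here, and the day −1 abnormal return of firm 2 is taken to be −0.1, which is the value consistent with the averages and cumulative returns in the example.

```python
import numpy as np

# Abnormal returns in percent: rows = event days -1, 0, 1; columns = firm 1, firm 2
u = np.array([[0.2, -0.1],
              [1.0,  2.0],
              [0.1,  0.3]])

avg_abnormal = u.mean(axis=1)  # cross-sectional average per event day, (11.2)
car = u.cumsum(axis=0)         # cumulative abnormal return per firm, (11.3)
avg_car = car.mean(axis=1)     # cross-sectional average of the cars, (11.4)

print(avg_abnormal)
print(car)
print(avg_car)
```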

11.2 Models of Normal Returns

This section summarizes the most common ways of calculating the normal return in
(11.1). The parameters in these models are typically estimated on a recent sample, the
“estimation window,” that ends before the event window. See Figure 11.3 for an
illustration. (When there is no return data before the event window (for instance, when the event
is an IPO), then the estimation window can be after the event window.)
In this way, the estimated behaviour of the normal return should be unaffected by the
event. It is almost always assumed that the event is exogenous in the sense that it is not
due to the movements in the asset price during either the estimation window or the event
window. This allows us to get a clean estimate of the normal return.
The constant mean return model assumes that the return of asset $i$ fluctuates randomly
around some mean $\mu_i$
\[
R_{i,t} = \mu_i + \xi_{i,t} \text{ with } \operatorname{E}(\xi_{i,t}) = \operatorname{Cov}(\xi_{i,t}, \xi_{i,t-s}) = 0. \tag{11.5}
\]
This mean is estimated by the sample average (during the estimation window). The
normal return in (11.1) is then the estimated mean, $\hat{\mu}_i$, so the abnormal return becomes $\hat{\xi}_{i,t}$.
The market model is a linear regression of the return of asset $i$ on the market return
\[
R_{i,t} = \alpha_i + \beta_i R_{m,t} + \varepsilon_{i,t} \text{ with } \operatorname{E}(\varepsilon_{i,t}) = \operatorname{Cov}(\varepsilon_{i,t}, \varepsilon_{i,t-s}) = \operatorname{Cov}(\varepsilon_{i,t}, R_{m,t}) = 0. \tag{11.6}
\]
Notice that we typically do not impose the CAPM restrictions on the intercept in (11.6).
The normal return in (11.1) is then calculated by combining the regression coefficients
with the actual market return as $\hat{\alpha}_i + \hat{\beta}_i R_{m,t}$, so the abnormal return becomes $\hat{\varepsilon}_{i,t}$.
When we restrict $\alpha_i = 0$ and $\beta_i = 1$, then this approach is called the
market-adjusted-return model. This is a particularly useful approach when there is no return data before
the event, for instance, with an IPO.
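As an illustration of the market model approach, the following Python sketch estimates the intercept and slope on an estimation window and then computes abnormal returns in the event window. The function name, the argument names and the numbers in the usage example are hypothetical.

```python
import numpy as np

def market_model_abnormal_returns(r_est, rm_est, r_evt, rm_evt):
    """Abnormal returns from a market model estimated by LS.

    r_est, rm_est: asset and market returns in the estimation window
    r_evt, rm_evt: asset and market returns in the event window
    Returns (abnormal returns, alpha estimate, beta estimate).
    """
    rm_est = np.asarray(rm_est, dtype=float)
    X = np.column_stack([np.ones(len(rm_est)), rm_est])
    coef = np.linalg.lstsq(X, np.asarray(r_est, dtype=float), rcond=None)[0]
    alpha, beta = coef
    normal = alpha + beta * np.asarray(rm_evt, dtype=float)  # normal return
    abnormal = np.asarray(r_evt, dtype=float) - normal       # actual minus normal
    return abnormal, alpha, beta

# Hypothetical returns (not real data)
rm_est = np.array([0.010, -0.020, 0.030, 0.000, 0.015])
r_est = np.array([0.020, -0.015, 0.060, 0.005, 0.035])
rm_evt = np.array([0.020, -0.010, 0.000])
r_evt = np.array([0.040, 0.045, 0.010])
abnormal, alpha, beta = market_model_abnormal_returns(r_est, rm_est, r_evt, rm_evt)
print(alpha, beta, abnormal)
```

Setting `alpha = 0` and `beta = 1` instead of estimating them would give the market-adjusted-return model.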
Recently, the market model has increasingly been replaced by a multi-factor model
which uses several regressors instead of only the market return. For instance, Fama and
French (1993) argue that (11.6) needs to be augmented by a portfolio that captures the
different returns of small and large firms and also by a portfolio that captures the different
returns of firms with high and low book-to-market ratios.
Finally, another approach is to construct a normal return as the actual return on assets
which are very similar to the asset with an event. For instance, if asset i is a small man-
ufacturing firm (with an event), then the normal return could be calculated as the actual
return for other small manufacturing firms (without events). In this case, the abnormal
return becomes the difference between the actual return and the return on the matching
portfolio. This type of matching portfolio is becoming increasingly popular.

[Figure 11.3: Event and estimation windows. A time line showing the estimation window (for normal return) followed by the event window.]

All the methods discussed here try to take into account the risk premium on the asset.
It is captured by the mean in the constant mean model, the beta in the market model, and
by the way the matching portfolio is constructed. However, sometimes there is no data in
the estimation window. The typical approach is then to use the actual market return as the
normal return, that is, to use (11.6) but assuming that $\alpha_i = 0$ and $\beta_i = 1$. Clearly, this
does not account for the risk premium on asset $i$, and is therefore a fairly rough guide.
Apart from accounting for the risk premium, does the choice of the model of the
normal return matter a lot? Yes, but only if the model produces a higher coefficient of
determination ($R^2$) than competing models. In that case, the variance of the abnormal
return is smaller for that model, which makes the test more precise (see Section 11.3 for
a discussion of how the variance of the abnormal return affects the variance of the test
statistic). To illustrate this, consider the market model (11.6). Under the null hypothesis
that the event has no effect on the return, the abnormal return would be just the residual
in the regression (11.6). It has the variance (assuming we know the model parameters)
\[
\operatorname{Var}(u_{i,t}) = \operatorname{Var}(\varepsilon_{i,t}) = (1 - R^2) \operatorname{Var}(R_{i,t}), \tag{11.7}
\]
where $R^2$ is the coefficient of determination of the regression (11.6).


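Proof. (of (11.7)) From (11.6) we have
\[
\operatorname{Var}(R_{i,t}) = \beta_i^2 \operatorname{Var}(R_{m,t}) + \operatorname{Var}(\varepsilon_{i,t}).
\]
We therefore get
\[
\begin{aligned}
\operatorname{Var}(\varepsilon_{i,t}) &= \operatorname{Var}(R_{i,t}) - \beta_i^2 \operatorname{Var}(R_{m,t}) \\
&= \operatorname{Var}(R_{i,t}) - \operatorname{Cov}(R_{i,t}, R_{m,t})^2 / \operatorname{Var}(R_{m,t}) \\
&= \operatorname{Var}(R_{i,t}) - \operatorname{Corr}(R_{i,t}, R_{m,t})^2 \operatorname{Var}(R_{i,t}) \\
&= (1 - R^2) \operatorname{Var}(R_{i,t}).
\end{aligned}
\]
The second equality follows from the fact that $\beta_i = \operatorname{Cov}(R_{i,t}, R_{m,t}) / \operatorname{Var}(R_{m,t})$, the
third equality from multiplying and dividing the last term by $\operatorname{Var}(R_{i,t})$ and using the
definition of the correlation, and the fourth equality from the fact that the coefficient
of determination in a simple regression equals the squared correlation of the dependent
variable and the regressor.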
This variance is crucial for testing the hypothesis of no abnormal returns: the smaller
the variance, the easier it is to reject a false null hypothesis (see Section 11.3). The
constant mean model has $R^2 = 0$, so the market model could potentially give a much
smaller variance. If the market model has $R^2 = 0.75$, then the standard deviation of
the abnormal return is only half that of the constant mean model (since $\sqrt{1 - 0.75} = 0.5$).
More realistically, $R^2$ might be 0.43 (or less), so the market model gives a 25% decrease
in the standard deviation (since $\sqrt{1 - 0.43} \approx 0.75$), which is not a whole lot.
Experience with multi-factor models also suggests that
they give relatively small improvements of the $R^2$ compared to the market model. For
these reasons, and for reasons of convenience, the market model is still the dominating
model of normal returns.
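The two numbers in the paragraph above follow directly from (11.7): relative to the constant mean model, the standard deviation of the abnormal return is scaled by $\sqrt{1 - R^2}$. A small sketch (the function name `std_ratio` is just an illustrative choice):

```python
import math

def std_ratio(r2):
    """Std of the abnormal return relative to the constant mean model (R2 = 0), from (11.7)."""
    return math.sqrt(1.0 - r2)

for r2 in (0.0, 0.43, 0.75):
    print(f"R2 = {r2:.2f}: std ratio = {std_ratio(r2):.3f}")
```

This gives ratios of 1.000, 0.755 and 0.500, that is, decreases of 0%, roughly 25% and 50% in the standard deviation.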
High frequency data can be very helpful, provided the time of the event is known.
High frequency data effectively allows us to decrease the volatility of the abnormal return
since it filters out irrelevant (for the event study) shocks to the return while still capturing
the effect of the event.

11.3 Testing the Abnormal Return

In testing if the abnormal return is different from zero, there are two sources of sampling
uncertainty. First, the parameters of the normal return are uncertain. Second, even if
we knew the normal return for sure, the actual returns are random variables—and they
will always deviate from their population mean in any finite sample. The first source
of uncertainty is likely to be much smaller than the second, provided the estimation
window is much longer than the event window. This is the typical situation, so the rest of
the discussion will focus on the second source of uncertainty.
It is typically assumed that the abnormal returns are uncorrelated across time and
across assets. The first assumption is motivated by the very low autocorrelation of returns.
The second assumption makes a lot of sense if the events are not overlapping in time, so
that the events of assets $i$ and $j$ happen at different (calendar) times. It can also be argued
that the model for the normal return (for instance, a market model) should capture all
common movements through the regressors, leaving the abnormal returns (the residuals)
uncorrelated across firms. In contrast, if the events happen at the same time, the
cross-correlation must be handled somehow. This is, for instance, the case if the events are
macroeconomic announcements or monetary policy moves. An easy way to handle such
synchronized (clustered) events is to form portfolios of those assets that share the event
time—and then only use portfolios with non-overlapping events in the cross-sectional
study. For the rest of this section we assume no autocorrelation or cross correlation.
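The portfolio approach to clustered events can be sketched as follows. This is only an illustration under simplifying assumptions: events are grouped by identical event dates, the portfolios are equally weighted, and the dates and abnormal returns are made up. Handling event windows that overlap without coinciding would need additional logic.

```python
from collections import defaultdict

def portfolio_abnormal_returns(event_dates, abnormal_returns):
    """Average the abnormal returns of all assets that share the same event date."""
    groups = defaultdict(list)
    for date, u in zip(event_dates, abnormal_returns):
        groups[date].append(u)
    # one equally weighted portfolio per distinct event date
    return {date: sum(us) / len(us) for date, us in groups.items()}

# hypothetical data: the first two assets share an event date
dates = ["2010-03-01", "2010-03-01", "2010-06-15"]
u0 = [0.02, 0.04, -0.01]   # abnormal returns at the event time
ports = portfolio_abnormal_returns(dates, u0)
print(ports)
```

The resulting portfolios (here two instead of three assets) are then treated as the cross-sectional units in the tests that follow.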
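Let $\sigma_i^2 = \operatorname{Var}(u_{i,t})$ be the variance of the abnormal return of asset $i$. The variance of
the cross-sectional (across the $n$ assets) average, $\bar{u}_s$ in (11.2), is then

\[ \operatorname{Var}(\bar{u}_s) = \left(\sigma_1^2 + \sigma_2^2 + \ldots + \sigma_n^2\right)/n^2 = \sum_{i=1}^{n} \sigma_i^2 / n^2, \tag{11.8} \]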

since all covariances are assumed to be zero. In a large sample (where the asymptotic
normality of a sample average starts to kick in), we can therefore use a t-test since

\[ \bar{u}_s / \operatorname{Std}(\bar{u}_s) \to^d N(0, 1). \tag{11.9} \]
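As a minimal sketch, the test statistic in (11.9) with the variance from (11.8) can be computed as below. The function name and the three-asset inputs are hypothetical, and the $\sigma_i$ are treated as known.

```python
import numpy as np

def avg_abnormal_return_tstat(u, sigma):
    """t-stat of the cross-sectional average abnormal return at one event time.

    u:     abnormal returns of the n assets at that event time
    sigma: standard deviations of the abnormal returns (assumed known)
    """
    u = np.asarray(u, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    n = u.size
    var_avg = np.sum(sigma**2) / n**2    # (11.8), all covariances assumed zero
    return u.mean() / np.sqrt(var_avg)   # (11.9)

# hypothetical numbers for three assets
t_stat = avg_abnormal_return_tstat([0.02, -0.01, 0.03], [0.02, 0.03, 0.025])
print(round(t_stat, 2))
```

In a large sample, the statistic is compared with critical values from the standard normal distribution (for instance, 1.96 in a two-sided test at the 5% level).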

The cumulative abnormal return over $q$ periods, $\mathrm{car}_{i,q}$, can also be tested with a t-test.
Since the returns are assumed to have no autocorrelation, the variance of $\mathrm{car}_{i,q}$ is

\[ \operatorname{Var}(\mathrm{car}_{i,q}) = q\sigma_i^2. \tag{11.10} \]

This variance is increasing in q since we are considering cumulative returns (not the time
average of returns).
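The variance of the cross-sectional average of $\mathrm{car}_{i,q}$, denoted $\overline{\mathrm{car}}_q$, is then (similarly to (11.8))

\[ \operatorname{Var}(\overline{\mathrm{car}}_q) = \left(q\sigma_1^2 + q\sigma_2^2 + \ldots + q\sigma_n^2\right)/n^2 = q \sum_{i=1}^{n} \sigma_i^2 / n^2, \tag{11.11} \]

if the abnormal returns are uncorrelated across time and assets.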
Figures 4.2a–b in Campbell, Lo, and MacKinlay (1997) provide a nice example of an
event study (based on the effect of earnings announcements).

Example 11.2 (Variances of abnormal returns) If the standard deviations of the daily
abnormal returns of the two firms in Example 11.1 are $\sigma_1 = 0.1$ and $\sigma_2 = 0.2$, then
we have the following variances for the abnormal returns at different days

Time   Firm 1    Firm 2    Cross-sectional Average
 -1    $0.1^2$   $0.2^2$   $(0.1^2 + 0.2^2)/4$
  0    $0.1^2$   $0.2^2$   $(0.1^2 + 0.2^2)/4$
  1    $0.1^2$   $0.2^2$   $(0.1^2 + 0.2^2)/4$

Similarly, the variances for the cumulative abnormal returns are

Time   Firm 1             Firm 2             Cross-sectional Average
 -1    $0.1^2$            $0.2^2$            $(0.1^2 + 0.2^2)/4$
  0    $2 \times 0.1^2$   $2 \times 0.2^2$   $2 \times (0.1^2 + 0.2^2)/4$
  1    $3 \times 0.1^2$   $3 \times 0.2^2$   $3 \times (0.1^2 + 0.2^2)/4$




Example 11.3 (Tests of abnormal returns) By dividing the numbers in Example 11.1 by
the square root of the numbers in Example 11.2 (that is, the standard deviations) we get
the test statistics for the abnormal returns

Time   Firm 1   Firm 2   Cross-sectional Average
 -1    2        -0.5     0.4
  0    10       10       13.4
  1    1        1.5      1.8

Similarly, for the cumulative abnormal returns we have the test statistics

Time   Firm 1   Firm 2   Cross-sectional Average
 -1    2        -0.5     0.4
  0    8.5      6.7      9.8
  1    7.5      6.4      9.0
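The arithmetic in Examples 11.2 and 11.3 can be retraced with a short script. Example 11.1 is not reproduced in this section, so the abnormal returns in `u` below are an assumption: they are back-solved from the test statistics and the standard deviations, with a negative value for Firm 2 at time -1 (the only sign consistent with a cross-sectional test statistic of 0.4).

```python
import numpy as np

sigma = np.array([0.1, 0.2])   # standard deviations from Example 11.2

# rows: event times -1, 0, 1; columns: Firm 1, Firm 2
# (assumed values, back-solved from the test statistics in Example 11.3)
u = np.array([[0.2, -0.1],
              [1.0,  2.0],
              [0.1,  0.3]])
n = sigma.size
q = np.arange(1, u.shape[0] + 1)   # number of cumulated periods: 1, 2, 3

# tests of the abnormal returns, per firm and for the average, (11.8)-(11.9)
t_firm = u / sigma
t_avg = u.mean(axis=1) / np.sqrt(np.sum(sigma**2) / n**2)

# tests of the cumulative abnormal returns, (11.10)-(11.11)
car = np.cumsum(u, axis=0)
t_car_firm = car / np.sqrt(q[:, None] * sigma**2)
t_car_avg = car.mean(axis=1) / np.sqrt(q * np.sum(sigma**2) / n**2)

t_table = np.round(np.column_stack([t_firm, t_avg]), 1)
t_car_table = np.round(np.column_stack([t_car_firm, t_car_avg]), 1)
print(t_table)
print(t_car_table)
```

Rounded to one decimal, the two printed arrays match the two tables of test statistics in Example 11.3.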

11.4 Quantitative Events

Some events are not easily classified as discrete variables. For instance, the effect of
a positive earnings surprise is likely to depend on how large the surprise is, not just on whether
there was a positive surprise. This can be studied by regressing the abnormal return (typically
the cumulative abnormal return) on the value of the event ($x_i$)

\[ \mathrm{car}_{i,q} = a + b x_i + \xi_i. \tag{11.12} \]

The slope coefficient is then a measure of how much the cumulative abnormal return
reacts to a change of one unit of $x_i$.
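A regression like (11.12) can be estimated with ordinary least squares. The sketch below uses simulated data: the sample size, the "true" coefficients ($a = 0.5$, $b = 2$) and the normal distributions are assumptions made for illustration, and the standard errors are the classical ones (homoskedastic, uncorrelated residuals).

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x = rng.normal(0.0, 1.0, n)                     # hypothetical event values (e.g. earnings surprises)
car = 0.5 + 2.0 * x + rng.normal(0.0, 1.0, n)   # simulated cumulative abnormal returns

# OLS of car on a constant and x, as in (11.12)
X = np.column_stack([np.ones(n), x])
coef, *_ = np.linalg.lstsq(X, car, rcond=None)
a_hat, b_hat = coef

# classical OLS standard errors
resid = car - X @ coef
s2 = resid @ resid / (n - 2)
se = np.sqrt(s2 * np.diag(np.linalg.inv(X.T @ X)))

print(f"a = {a_hat:.3f}, b = {b_hat:.3f}, t-stat of b = {b_hat / se[1]:.2f}")
```

The estimate `b_hat` is the reaction of the cumulative abnormal return to a one-unit change in $x_i$, and its t-stat tests whether the size of the event matters.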

Bibliography
Bodie, Z., A. Kane, and A. J. Marcus, 2005, Investments, McGraw-Hill, Boston, 6th edn.

Campbell, J. Y., A. W. Lo, and A. C. MacKinlay, 1997, The econometrics of financial
markets, Princeton University Press, Princeton, New Jersey.

Copeland, T. E., J. F. Weston, and K. Shastri, 2005, Financial theory and corporate policy,
Pearson Education, 4th edn.

Fama, E. F., and K. R. French, 1993, “Common risk factors in the returns on stocks and
bonds,” Journal of Financial Economics, 33, 3–56.
