
Quantitative Discovery

Notes on Implementing White's Reality Check

Eola Investments, LLC

info@eolainvestments.com

December 2011

All rights reserved

Eola Investments LLC All Rights Reserved Page 1 of 6



INTRODUCTION

White's (2000) paper on the Reality Check (RC) method is influential among researchers and practitioners concerned with data-snooping risk in developing forecasting models. Eola Investments has received inquiries about how to implement White's RC method or Hansen's SPA method. Despite the unexplained paradox that I described in January 2011, neither method is difficult to implement. Since Hansen's SPA is an extension of RC obtained by 1) studentizing the test statistic and 2) discarding obviously inferior models, its implementation is straightforward once RC is understood, so this short note concentrates on helping readers understand how to implement RC. For a detailed description of the RC method, read White (2000). A simplified version can also be found in Eola Investments (2011), “A Reality Check and Test for SPA for z-Model”.

SYMBOLS AND NOTATIONS

Subscript k stands for the k-th model, k = 1, 2, ..., l. Let k = 0 be the benchmark model.

Subscript i stands for the i-th bootstrap, i = 1, 2, ..., N. Let i = 0 be the real historical data.

Subscript t stands for the t-th period, t = R, R+1, R+2, ..., T. R and T are the start and end days of forecasting, respectively, and n = T − R + 1.

h k ,t 1 is a measure of prediction error of k-th model at t+1. For White's Example 2.1, it is the

NEGATIVE prediction squared error of k-th model at t+1; for Example 2.2, it is the daily return of k-

th model at t+1; for Example 2.3, it is the predictive log-likelihood of k-th model at t+1.

f k ,t 1=h k , t1− h 0,t 1 means the difference of the prediction error between the k-th model and

the benchmark model at time t+1.


$$\bar{f}_k = \frac{1}{n}\sum_{t=R}^{T} f_{k,t+1}$$

is the average difference in prediction error between the k-th model and the benchmark model over the period from R to T. A simple derivation can make programming easier:

Eola Investments LLC All Rights Reserved Page 2 of 6


Quantitative Discovery

$$\bar{f}_k = \frac{1}{n}\sum_{t=R}^{T} f_{k,t+1} = \frac{1}{n}\sum_{t=R}^{T}\left(h_{k,t+1} - h_{0,t+1}\right) = \frac{1}{n}\sum_{t=R}^{T} h_{k,t+1} - \frac{1}{n}\sum_{t=R}^{T} h_{0,t+1} = \bar{h}_k - \bar{h}_0$$

where $\bar{h}_k$ is the average prediction error of the k-th model over the period from R to T. Specifically, for Example 2.1, it is the NEGATIVE Prediction Mean Squared Error (PMSE) of the k-th model. The t-subscript is dropped due to averaging in the time domain.
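A quick numerical check of this identity, sketched in Python with NumPy; the per-period performance series here are made-up stand-ins, not real data:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 250                                   # number of forecast periods (hypothetical)
h_k = rng.normal(size=n)                  # h_{k,t+1} for one candidate model
h_0 = rng.normal(size=n)                  # h_{0,t+1} for the benchmark

f_bar_direct = np.mean(h_k - h_0)         # (1/n) * sum over t of f_{k,t+1}
f_bar_split = h_k.mean() - h_0.mean()     # h_bar_k - h_bar_0

assert np.isclose(f_bar_direct, f_bar_split)
```

Computing the two model averages first and differencing them, rather than differencing period by period, is the simplification referred to above.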

Vk =max  n1 /2 f1, n1 /2 f2, ... , n1 / 2 fk  , k =1,2,. .. , l

V*k , i=max  n1 / 2  f*1− f1 , n 1/ 2  f*2− f2  , ... , n 1/ 2  f*k − fk  , k =1,2,. .. , l ; i=1,2,. .. , N.

where the star stands for the bootstrapped value and i is the i-th bootstrap. Later on I will switch

position for i and k in matrix format, letting i be row index and k be column index, just for personal

habit.

IMPLEMENTATION

Now I use Example 2.1 (see also Section 4) of White (2000) as the example for implementing RC.

Step 1. Use the real historical data to calculate the PMSE for each model, including the benchmark. Organize the l+1 PMSEs into a row vector, with the benchmark PMSE in the first place:

$$\left\{PMSE_1,\; PMSE_2,\; \ldots,\; PMSE_k,\; \ldots,\; PMSE_{l+1}\right\}_{1\times(l+1)}$$

Step 2. Bootstrap the historical data once to obtain a new data set, then calculate the PMSE for each model, including the benchmark, and stack the two row vectors into a (2×(l+1)) matrix:

$$\begin{bmatrix} PMSE_{1,1} & \ldots & PMSE_{1,k} & \ldots & PMSE_{1,l+1} \\ PMSE^*_{2,1} & \ldots & PMSE^*_{2,k} & \ldots & PMSE^*_{2,l+1} \end{bmatrix}_{2\times(l+1)}$$


Step 3. Continue bootstrapping up to N times and complete the ((N+1)×(l+1)) PMSE matrix:

$$PMSE = \begin{bmatrix} PMSE_{1,1} & \ldots & PMSE_{1,k} & \ldots & PMSE_{1,l+1} \\ PMSE^*_{2,1} & \ldots & PMSE^*_{2,k} & \ldots & PMSE^*_{2,l+1} \\ \vdots & \ddots & \vdots & \ddots & \vdots \\ PMSE^*_{i,1} & \ldots & PMSE^*_{i,k} & \ldots & PMSE^*_{i,l+1} \\ \vdots & \ddots & \vdots & \ddots & \vdots \\ PMSE^*_{N+1,1} & \ldots & PMSE^*_{N+1,k} & \ldots & PMSE^*_{N+1,l+1} \end{bmatrix}_{(N+1)\times(l+1)}$$

Note that in the PMSE matrix the first row comes from the real historical data, and the first column comes from the benchmark model.
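Steps 2 and 3 can be sketched as follows. White's RC relies on the Politis–Romano stationary bootstrap; the per-period forecast errors, the smoothing parameter q, and all sizes below are hypothetical stand-ins:

```python
import numpy as np

def stationary_bootstrap_indices(n, q, rng):
    """Politis-Romano stationary bootstrap: block lengths are geometric with mean 1/q."""
    idx = np.empty(n, dtype=int)
    idx[0] = rng.integers(n)
    for t in range(1, n):
        if rng.random() < q:
            idx[t] = rng.integers(n)       # start a new block at a random point
        else:
            idx[t] = (idx[t - 1] + 1) % n  # continue the current block, wrapping around
    return idx

rng = np.random.default_rng(2)
n, l, N = 250, 5, 500                      # periods, models, bootstrap replications
errors = rng.normal(size=(n, l + 1))       # hypothetical per-period forecast errors

pmse = np.empty((N + 1, l + 1))
pmse[0] = -np.mean(errors ** 2, axis=0)    # row 1: the real historical data
for i in range(1, N + 1):
    idx = stationary_bootstrap_indices(n, q=0.1, rng=rng)
    pmse[i] = -np.mean(errors[idx] ** 2, axis=0)

assert pmse.shape == (N + 1, l + 1)
```

Note that each bootstrap replication resamples whole time indices, so all l+1 models are evaluated on the same resampled periods, which is what preserves the cross-model dependence.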

Step 4. In the PMSE matrix, subtract the first column from each column, then remove the first column, resulting in the $\bar{f}$ matrix:

$$\bar{f} = \begin{bmatrix} \bar{f}_{1,1} & \ldots & \bar{f}_{1,k} & \ldots & \bar{f}_{1,l} \\ \bar{f}^*_{2,1} & \ldots & \bar{f}^*_{2,k} & \ldots & \bar{f}^*_{2,l} \\ \vdots & \ddots & \vdots & \ddots & \vdots \\ \bar{f}^*_{i,1} & \ldots & \bar{f}^*_{i,k} & \ldots & \bar{f}^*_{i,l} \\ \vdots & \ddots & \vdots & \ddots & \vdots \\ \bar{f}^*_{N+1,1} & \ldots & \bar{f}^*_{N+1,k} & \ldots & \bar{f}^*_{N+1,l} \end{bmatrix}_{(N+1)\times l}$$

Step 5. In the f matrix, subtract each row expect the first row by the first row. Do not change

the first row. Call the new matrix diff

[ ]
f 1,1 ... f 1, k ... f1,l
* * *
f − f1,1
2,1 ... f 2,k − f 1,k ... f 2,l − f1,l
⋮ ⋱ ⋮ ⋱ ⋮
diff =
f *i , 1− f 1,1 ...  *
f i , k − f 1, k ...  *
f i , l− f1, l
⋮ ⋱ ⋮ ⋱ ⋮
f *N1,1 − f1,1 *
... f N 1,k − f 1,k
 *
... f N 1,l − f1,l

 N 1×l 


Step 6. This step is the essence of White's RC.

6a. If only the overall p-value is needed, just take the maximum of each row and multiply by $n^{1/2}$ to form a $\bar{V}$ vector:

$$\bar{V} = n^{1/2}\begin{Bmatrix} \max_{k=1,\ldots,l}\bar{f}_{1,k} \\ \vdots \\ \max_{k=1,\ldots,l}\left(\bar{f}^*_{i,k} - \bar{f}_{1,k}\right) \\ \vdots \\ \max_{k=1,\ldots,l}\left(\bar{f}^*_{N+1,k} - \bar{f}_{1,k}\right) \end{Bmatrix}_{(N+1)\times 1}$$

Then count how many elements from the second row to the (N+1)-th row are larger than the first element. Denote this number as M. The RC p-value is simply M/N. (White counts the number of elements SMALLER than the first element, so in his paper the p-value is 1 − M/N.)
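Step 6a then reduces to a row-wise maximum and a count. The diff matrix below is a random stand-in; with real data, its first row comes from the historical sample:

```python
import numpy as np

rng = np.random.default_rng(5)
N, l, n = 500, 5, 250
diff = rng.normal(size=(N + 1, l))        # stand-in for the diff matrix from Step 5

v = np.sqrt(n) * diff.max(axis=1)         # V-bar vector: one entry per row
M = np.sum(v[1:] > v[0])                  # bootstrap maxima exceeding the real one
p_value = M / N                           # RC p-value

assert 0.0 <= p_value <= 1.0
```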

6b. If it is desired to study how the p-value evolves as models are added one by one, just like Figure 2 in White (2000), then modify the diff matrix by replacing each element with the maximum of that element and its nearest neighbor on the left, starting from the second column. Of course, do not change the first column. The result is a $\bar{V}$ matrix:

$$\bar{V} = \begin{bmatrix} \bar{V}_{1,1} & \ldots & \bar{V}_{1,k} & \ldots & \bar{V}_{1,l} \\ \bar{V}^*_{2,1} & \ldots & \bar{V}^*_{2,k} & \ldots & \bar{V}^*_{2,l} \\ \vdots & \ddots & \vdots & \ddots & \vdots \\ \bar{V}^*_{i,1} & \ldots & \bar{V}^*_{i,k} & \ldots & \bar{V}^*_{i,l} \\ \vdots & \ddots & \vdots & \ddots & \vdots \\ \bar{V}^*_{N+1,1} & \ldots & \bar{V}^*_{N+1,k} & \ldots & \bar{V}^*_{N+1,l} \end{bmatrix}_{(N+1)\times l}$$

where

$$\bar{V}_{1,k} = \max\left(n^{1/2}\bar{f}_{1,1},\; n^{1/2}\bar{f}_{1,2},\; \ldots,\; n^{1/2}\bar{f}_{1,k}\right),\quad k = 1, 2, \ldots, l$$

$$\bar{V}^*_{i,k} = \max\left(n^{1/2}(\bar{f}^*_{i,1} - \bar{f}_{1,1}),\; n^{1/2}(\bar{f}^*_{i,2} - \bar{f}_{1,2}),\; \ldots,\; n^{1/2}(\bar{f}^*_{i,k} - \bar{f}_{1,k})\right),\quad i = 2, 3, \ldots, N+1;\; k = 1, 2, \ldots, l.$$

Now, for each column k, find the number of elements from the second row to the (N+1)-th row that are larger than the first element, say M; the p-value for the subset of models 1 to k is then simply $p_k = M/N$.
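Step 6b replaces the row-wise maximum of 6a with a running (cumulative) maximum along each row, giving one p-value per model subset. Again, the diff matrix is a random stand-in:

```python
import numpy as np

rng = np.random.default_rng(6)
N, l, n = 500, 5, 250
diff = rng.normal(size=(N + 1, l))        # stand-in for the diff matrix from Step 5

v = np.sqrt(n) * np.maximum.accumulate(diff, axis=1)  # running max along each row
p_k = np.mean(v[1:] > v[0], axis=0)       # p-value for the subset of models 1..k

assert p_k.shape == (l,)
```

The last entry of p_k coincides with the single RC p-value from Step 6a, since the running maximum over all l columns equals the row-wise maximum.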


Note 1: Figure 2 of White (2000) is the plot of the first row of the diff matrix and of $p_k$ against k, k = 1, ..., l.

Note 2: the constant $n^{1/2}$ has only theoretical significance. It does not change the p-value, so it can be safely omitted when calculating the RC p-value.

REFERENCES

White, H. (2000), “A Reality Check for Data Snooping”, Econometrica, 68, pp. 1097-1126.

Eola Investments (2011), “A Reality Check and Test for SPA for z-Model”, www.eolainvestments.com

