Implementing White's Reality Check Method

Quantitative Discovery
Notes on Implementing White's Reality Check
Eola Investments, LLC
info@eolainvestments.com
December 2011
All rights reserved
Eola Investments LLC All Rights Reserved Page 1 of 6

INTRODUCTION
White's (2000) paper on the Reality Check (RC) method is influential among researchers and practitioners who are concerned with the risk of data snooping when developing forecasting models. Eola Investments has received inquiries about how to implement White's RC method or Hansen's SPA method. Despite the unexplained paradox that I described in January 2011, neither method is difficult to implement. Hansen's SPA extends RC in two ways: 1) it studentizes the test statistic, and 2) it discards obviously inferior models. Its implementation is therefore straightforward once RC is understood, so this short note concentrates on helping readers implement RC.
For a detailed description of the RC method, read White (2000). A simplified version can also be found in Eola Investments (2011), “A Reality Check and Test for SPA for z-Model”.
SYMBOLS AND NOTATIONS
Subscript k denotes the k-th model, k = 1, 2, …, l; k = 0 denotes the benchmark model.
Subscript i denotes the i-th bootstrap sample, i = 1, 2, ..., N; i = 0 denotes the real historical data.
Subscript t denotes the t-th period, t = R, R+1, R+2, ..., T. R and T are the first and last days of the forecasting period, respectively, and n = T − R + 1 is the number of forecasts.
h k ,t 1 is a measure of prediction error of k-th model at t+1. For White's Example 2.1, it is the
NEGATIVE prediction squared error of k-th model at t+1; for Example 2.2, it is the daily return of k-
th model at t+1; for Example 2.3, it is the predictive log-likelihood of k-th model at t+1.
f k ,t 1=h k , t1− h 0,t 1 means the difference of the prediction error between the k-th model and
the benchmark model at time t+1.

$$\bar f_k = \frac{1}{n}\sum_{t=R}^{T} f_{k,t+1}$$
is the average performance difference between the k-th model and the benchmark model over the period from R to T. A simple derivation makes programming easier:
$$\bar f_k = \frac{1}{n}\sum_{t=R}^{T} f_{k,t+1} = \frac{1}{n}\sum_{t=R}^{T}\left(h_{k,t+1} - h_{0,t+1}\right) = \frac{1}{n}\sum_{t=R}^{T} h_{k,t+1} - \frac{1}{n}\sum_{t=R}^{T} h_{0,t+1} = \bar h_k - \bar h_0$$
Here $\bar h_k$ is the average performance of the k-th model over the period from R to T. For Example 2.1, $\bar h_k$ is the NEGATIVE Prediction Mean Squared Error (PMSE) of the k-th model. The t subscript drops out because of the averaging over time.
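The Python sketch below checks the identity $\bar f_k = \bar h_k - \bar h_0$ numerically for Example 2.1. It uses four periods of made-up squared errors, which are hypothetical numbers and not taken from White (2000). It computes $\bar f_k$ both ways. For Example 2.1 the shortcut is simply the benchmark PMSE minus the model PMSE.

```python
def mean(xs):
    """Arithmetic mean of a non-empty list."""
    return sum(xs) / len(xs)

# Hypothetical squared prediction errors for t = R, ..., T (n = 4).
sq_err_bench = [1.0, 4.0, 0.25, 2.25]   # benchmark model, k = 0
sq_err_model = [0.25, 1.0, 1.0, 0.25]   # model k

# Example 2.1: h is the NEGATIVE squared error, so larger is better.
h0 = [-e for e in sq_err_bench]
hk = [-e for e in sq_err_model]

# Average of the per-period differences f_{k,t+1} = h_{k,t+1} - h_{0,t+1}.
f_bar_direct = mean([a - b for a, b in zip(hk, h0)])
# Difference of the two averages, h_bar_k - h_bar_0 = PMSE_0 - PMSE_k.
f_bar_shortcut = mean(hk) - mean(h0)

print(f_bar_direct, f_bar_shortcut)  # 1.25 1.25
```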
Vk =max  n1 /2 f1, n1 /2 f2, ... , n1 / 2 fk  , k =1,2,. .. , l
V*k , i=max  n1 / 2  f*1− f1 , n 1/ 2  f*2− f2  , ... , n 1/ 2  f*k − fk  , k =1,2,. .. , l ; i=1,2,. .. , N.
where the star stands for the bootstrapped value and i is the i-th bootstrap. Later on I will switch
position for i and k in matrix format, letting i be row index and k be column index, just for personal
habit.
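The two definitions translate directly into Python. In the sketch below, the function names `v_bar` and `v_star` and all numbers are hypothetical. Models are numbered 1..l in the text but stored at Python positions 0..l-1.

```python
import math

def v_bar(f_bar, n, k):
    """V_bar_k = max over j = 1..k of sqrt(n) * f_bar_j."""
    return max(math.sqrt(n) * f for f in f_bar[:k])

def v_star(f_star, f_bar, n, k):
    """V*_{k,i} for one bootstrap i: max over j = 1..k of sqrt(n) * (f*_{j,i} - f_bar_j)."""
    return max(math.sqrt(n) * (fs - f) for fs, f in zip(f_star[:k], f_bar[:k]))

n = 100                       # hypothetical number of forecasts, n = T - R + 1
f_bar = [0.02, -0.01, 0.05]   # hypothetical f_bar_1, f_bar_2, f_bar_3 (l = 3)
f_star = [0.03, 0.00, 0.01]   # hypothetical f*_{1,i}, f*_{2,i}, f*_{3,i} for one bootstrap i

print([v_bar(f_bar, n, k) for k in (1, 2, 3)])
print([v_star(f_star, f_bar, n, k) for k in (1, 2, 3)])
```

The bootstrap maximum is taken over the centered differences $\bar f^*_{j,i} - \bar f_j$, never over the raw bootstrapped values.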
IMPLEMENTATION
I now use Example 2.1 (or Section 4) of White (2000) to illustrate how to implement RC.
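Step 1. Use the real historical data to calculate the PMSE of each model, including the benchmark. In Example 2.1, $h_{k,t+1}$ is the NEGATIVE squared error, so every PMSE entry in Steps 1 to 3 should carry a minus sign (that is, store -PMSE). Otherwise the differences formed in Step 4 have the opposite sign to $\bar f_k = \bar h_k - \bar h_0$. Organize the l+1 values into a row vector, with the benchmark in the first place:
$$\left\{\mathrm{PMSE}_1,\; \mathrm{PMSE}_2,\; \ldots,\; \mathrm{PMSE}_k,\; \ldots,\; \mathrm{PMSE}_{l+1}\right\}_{1\times(l+1)}$$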
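Step 1 can be sketched in Python as follows. The sketch assumes that each model's out-of-sample forecasts are already available as lists aligned with the realized values. The helper names and the data are hypothetical. The entries are stored as -PMSE, consistent with $h$ being the negative squared error in Example 2.1.

```python
def pmse(actual, forecast):
    """Prediction mean squared error over the forecast window t = R, ..., T."""
    return sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)

def neg_pmse_row(actual, forecasts):
    """1 x (l+1) row of -PMSE values; forecasts[0] must be the benchmark."""
    return [-pmse(actual, fc) for fc in forecasts]

# Hypothetical realized values and three forecast series (n = 4).
actual = [1.0, 2.0, 3.0, 4.0]
forecasts = [
    [0.0, 0.0, 0.0, 0.0],   # benchmark (k = 0), here a naive zero forecast
    [1.5, 2.0, 2.5, 4.0],   # model 1
    [1.0, 3.0, 3.0, 5.0],   # model 2
]
row = neg_pmse_row(actual, forecasts)
print(row)  # [-7.5, -0.125, -0.5]
```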
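Step 2. Bootstrap the historical data once to obtain a new data set. Calculate the PMSE of each model, including the benchmark, on that data set. Use the same bootstrap resample for every model, so that the dependence between the models' errors is preserved. Then stack the two row vectors into a 2×(l+1) matrix:
$$
\begin{bmatrix}
\mathrm{PMSE}_{1,1} & \cdots & \mathrm{PMSE}_{1,k} & \cdots & \mathrm{PMSE}_{1,l+1} \\
\mathrm{PMSE}^*_{2,1} & \cdots & \mathrm{PMSE}^*_{2,k} & \cdots & \mathrm{PMSE}^*_{2,l+1}
\end{bmatrix}_{2\times(l+1)}
$$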
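The note does not say which bootstrap to use. White (2000) uses the stationary bootstrap, which resamples blocks of random, geometrically distributed length and so keeps some of the time dependence in the data. The Python sketch below assumes that "bootstrapping the historical data" means resampling the forecast periods of already-computed squared errors, with one set of resampled periods shared by all models. The smoothing parameter q = 0.5, the function names, and the data are hypothetical choices for illustration.

```python
import random

def stationary_bootstrap_indices(n, q, rng):
    """One stationary-bootstrap resample of the period indices 0..n-1.

    Each step starts a new block at a uniformly random period with
    probability q; otherwise it continues with the next period (wrapping
    around at the end). Blocks therefore have mean length 1/q.
    """
    idx = [rng.randrange(n)]
    for _ in range(n - 1):
        if rng.random() < q:
            idx.append(rng.randrange(n))
        else:
            idx.append((idx[-1] + 1) % n)
    return idx

def neg_pmse(sq_errors, periods):
    """-PMSE of one model over the given (possibly repeated) periods."""
    periods = list(periods)
    return -sum(sq_errors[t] for t in periods) / len(periods)

# Hypothetical squared errors: rows are models (benchmark first), columns are periods.
sq_errors = [
    [1.0, 4.0, 0.25, 2.25, 1.0],
    [0.25, 1.0, 1.0, 0.25, 0.5],
    [0.5, 2.0, 0.75, 1.0, 2.0],
]
n = len(sq_errors[0])
rng = random.Random(2011)

real_row = [neg_pmse(e, range(n)) for e in sq_errors]
idx = stationary_bootstrap_indices(n, q=0.5, rng=rng)   # one resample for ALL models
boot_row = [neg_pmse(e, idx) for e in sq_errors]
step2_matrix = [real_row, boot_row]                     # the 2 x (l+1) matrix
print(step2_matrix)
```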

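Step 3. Repeat the bootstrap until there are N bootstrap rows. This completes the (N+1)×(l+1) PMSE matrix:
$$
\mathrm{PMSE} = \begin{bmatrix}
\mathrm{PMSE}_{1,1} & \cdots & \mathrm{PMSE}_{1,k} & \cdots & \mathrm{PMSE}_{1,l+1} \\
\mathrm{PMSE}^*_{2,1} & \cdots & \mathrm{PMSE}^*_{2,k} & \cdots & \mathrm{PMSE}^*_{2,l+1} \\
\vdots & \ddots & \vdots & \ddots & \vdots \\
\mathrm{PMSE}^*_{i,1} & \cdots & \mathrm{PMSE}^*_{i,k} & \cdots & \mathrm{PMSE}^*_{i,l+1} \\
\vdots & \ddots & \vdots & \ddots & \vdots \\
\mathrm{PMSE}^*_{N+1,1} & \cdots & \mathrm{PMSE}^*_{N+1,k} & \cdots & \mathrm{PMSE}^*_{N+1,l+1}
\end{bmatrix}_{(N+1)\times(l+1)}
$$
In the PMSE matrix, the first row comes from the real historical data and the first column comes from the benchmark model.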
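Steps 2 and 3 together amount to a loop over N resamples. The Python sketch below builds the whole (N+1)×(l+1) matrix under these assumptions:
- a stationary bootstrap over the forecast periods (the scheme White (2000) uses), with one resample shared by every model in a row;
- entries stored as -PMSE;
- hypothetical settings (N = 500, q = 0.1) and simulated squared errors.

Python rows and columns start at 0, so the note's row 1 and column 1 are index 0 here.

```python
import random

def stationary_bootstrap_indices(n, q, rng):
    """Stationary bootstrap of period indices 0..n-1 (mean block length 1/q)."""
    idx = [rng.randrange(n)]
    for _ in range(n - 1):
        idx.append(rng.randrange(n) if rng.random() < q else (idx[-1] + 1) % n)
    return idx

def pmse_matrix(sq_errors, N, q, seed=0):
    """(N+1) x (l+1) matrix of -PMSE values.

    sq_errors[k][t] is the squared error of model k in period t, with k = 0
    the benchmark. Row 0 uses the real data; rows 1..N use bootstrap
    resamples, each shared by all models.
    """
    rng = random.Random(seed)
    n = len(sq_errors[0])
    rows = [[-sum(e) / n for e in sq_errors]]
    for _ in range(N):
        idx = stationary_bootstrap_indices(n, q, rng)
        rows.append([-sum(e[t] for t in idx) / n for e in sq_errors])
    return rows

# Simulated squared errors for a benchmark and l = 3 models over n = 250 periods.
sim = random.Random(42)
sq_errors = [[sim.gauss(0.0, 1.0) ** 2 for _ in range(250)] for _ in range(4)]
PMSE = pmse_matrix(sq_errors, N=500, q=0.1)
print(len(PMSE), len(PMSE[0]))  # 501 4
```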
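Step 4. In the PMSE matrix, subtract the first column from each column, so that $f_{i,k} = \mathrm{PMSE}_{i,k+1} - \mathrm{PMSE}_{i,1}$. Then remove the first column. The result is the f matrix:
$$
f = \begin{bmatrix}
f_{1,1} & \cdots & f_{1,k} & \cdots & f_{1,l} \\
f^*_{2,1} & \cdots & f^*_{2,k} & \cdots & f^*_{2,l} \\
\vdots & \ddots & \vdots & \ddots & \vdots \\
f^*_{i,1} & \cdots & f^*_{i,k} & \cdots & f^*_{i,l} \\
\vdots & \ddots & \vdots & \ddots & \vdots \\
f^*_{N+1,1} & \cdots & f^*_{N+1,k} & \cdots & f^*_{N+1,l}
\end{bmatrix}_{(N+1)\times l}
$$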
Step 5. In the f matrix, subtract each row expect the first row by the first row. Do not change
the first row. Call the new matrix diff
[ ]
f 1,1 ... f 1, k ... f1,l
* * *
f − f1,1
2,1 ... f 2,k − f 1,k ... f 2,l − f1,l
⋮ ⋱ ⋮ ⋱ ⋮
diff =
f *i , 1− f 1,1 ...  *
f i , k − f 1, k ...  *
f i , l− f1, l
⋮ ⋱ ⋮ ⋱ ⋮
f *N1,1 − f1,1 *
... f N 1,k − f 1,k
 *
... f N 1,l − f1,l

 N 1×l 

Step 6. This step is the essence of White's RC.
6a. If only the overall p-value is needed, take the maximum of each row of diff and multiply it by $n^{1/2}$ to form the vector $\bar V$:
$$
\bar V = n^{1/2} \begin{Bmatrix}
\max_{k=1,\ldots,l} f_{1,k} \\
\vdots \\
\max_{k=1,\ldots,l} \left(f^*_{i,k} - f_{1,k}\right) \\
\vdots \\
\max_{k=1,\ldots,l} \left(f^*_{N+1,k} - f_{1,k}\right)
\end{Bmatrix}_{(N+1)\times 1}
$$
Then count how many elements, from the second row to the last ((N+1)-th) row, are larger than the first element. Denote this number by M. The RC p-value is simply M/N. White instead counts the elements that are NOT larger than the first element. Calling that count $M_W = N - M$, his paper reports the same p-value as $1 - M_W/N$.
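A minimal Python sketch of 6a, taking the diff matrix as a list of rows with row 0 from the real data. The function name, the optional `n` argument, and the 5×2 example are hypothetical. The example has only N = 4 bootstrap rows, far fewer than a real application would use.

```python
import math

def rc_pvalue(diff, n=None):
    """Overall RC p-value M / N from the (N+1) x l diff matrix.

    Row 0 holds f_{1,k} from the real data; rows 1..N hold f*_{i,k} - f_{1,k}.
    If n is given, the row maxima are scaled by sqrt(n); this never changes M.
    """
    scale = math.sqrt(n) if n is not None else 1.0
    v = [scale * max(row) for row in diff]    # the (N+1) x 1 vector V_bar
    N = len(v) - 1
    M = sum(1 for x in v[1:] if x > v[0])     # bootstrap values larger than the first element
    return M / N

diff = [
    [0.5, -0.5],     # real data
    [0.0, 0.75],     # bootstrap rows
    [-0.25, 0.0],
    [0.25, 0.125],
    [0.625, 0.0],
]
print(rc_pvalue(diff), rc_pvalue(diff, n=100))  # 0.5 0.5
```

In this example the first element is 0.5, and two of the four bootstrap maxima (0.75 and 0.625) exceed it. So M = 2 and the p-value is 2/4. Counting as White does, the two maxima that are not larger give 1 - 2/4, the same value.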
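6b. To study how the p-value evolves as models are added one by one, as in Figure 2 of White (2000), modify the diff matrix instead:
- Starting from the second column and moving left to right, replace each element with the maximum of itself and its (already updated) neighbor on the left.
- Leave the first column unchanged.
- Multiply the result by $n^{1/2}$.

Each row then holds the running maximum across columns, and the result is the $\bar V$ matrix:
$$
\bar V = \begin{bmatrix}
\bar V_{1,1} & \cdots & \bar V_{1,k} & \cdots & \bar V_{1,l} \\
\bar V^*_{2,1} & \cdots & \bar V^*_{2,k} & \cdots & \bar V^*_{2,l} \\
\vdots & \ddots & \vdots & \ddots & \vdots \\
\bar V^*_{i,1} & \cdots & \bar V^*_{i,k} & \cdots & \bar V^*_{i,l} \\
\vdots & \ddots & \vdots & \ddots & \vdots \\
\bar V^*_{N+1,1} & \cdots & \bar V^*_{N+1,k} & \cdots & \bar V^*_{N+1,l}
\end{bmatrix}_{(N+1)\times l}
$$
where
$$\bar V_{1,k} = \max\left(n^{1/2} f_{1,1},\; n^{1/2} f_{1,2},\; \ldots,\; n^{1/2} f_{1,k}\right), \quad k = 1, 2, \ldots, l$$
$$\bar V^*_{i,k} = \max\left(n^{1/2}\left(f^*_{i,1} - f_{1,1}\right),\; n^{1/2}\left(f^*_{i,2} - f_{1,2}\right),\; \ldots,\; n^{1/2}\left(f^*_{i,k} - f_{1,k}\right)\right), \quad i = 2, 3, \ldots, N+1;\; k = 1, 2, \ldots, l.$$
Now, for each column k, count the elements in rows 2 to N+1 that are larger than the first-row element, and call this count $M_k$. The p-value for the subset of models 1 to k is then simply $p_k = M_k/N$. The last column reproduces the overall p-value of 6a.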
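6b can be sketched in Python with a running maximum along each row. Python's `itertools.accumulate` with `max` as the combining function computes exactly this. The $n^{1/2}$ factor is omitted because scaling by a positive constant does not change any p-value. The function names and the example matrix are hypothetical.

```python
from itertools import accumulate

def v_matrix(diff):
    """Running maximum across columns, row by row (the sqrt(n) factor is omitted)."""
    return [list(accumulate(row, max)) for row in diff]

def rc_pvalue_path(diff):
    """p_k for the nested model sets {1, ..., k}, k = 1, ..., l."""
    v = v_matrix(diff)
    N = len(v) - 1
    l = len(v[0])
    return [sum(1 for r in v[1:] if r[k] > v[0][k]) / N for k in range(l)]

diff = [
    [0.5, -0.5],     # real data
    [0.0, 0.75],     # bootstrap rows
    [-0.25, 0.0],
    [0.25, 0.125],
    [0.625, 0.0],
]
print(v_matrix(diff))
print(rc_pvalue_path(diff))  # [0.25, 0.5]
```

The last entry equals the overall p-value that 6a gives for the same matrix, because the running maximum in the last column is the row maximum.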

Note 1. Figure 2 of White (2000) plots the first row of the diff matrix and $p_k$ against k, k = 1, ..., l.
Note 2. The constant $n^{1/2}$ has only theoretical significance. Multiplying every element of $\bar V$ by the same positive constant does not change any of the comparisons that define M, so it does not change the p-value. It can therefore be safely omitted when calculating the RC p-value.
REFERENCES
White, H. (2000), “A Reality Check for Data Snooping”, Econometrica, 68, pp. 1097-1126.
Eola Investments (2011), “A Reality Check and Test for SPA for z-Model”, www.eolainvestments.com
