CAMPBELL R. HARVEY AND YAN LIU
CAMPBELL R. HARVEY
is a professor at Duke University in Durham, NC, and a fellow at the National Bureau of Economic Research in Cambridge, MA.
cam.harvey@duke.edu

YAN LIU
is an assistant professor at Texas A&M University in College Station, TX.
y-liu@mays.tamu.edu

We provide some new tools to evaluate trading strategies. When it is known that many strategies and combinations of strategies have been tried, we need to adjust our evaluation method for these multiple tests. Sharpe ratios and other statistics will be overstated. Our methods are simple to implement and allow for the real-time evaluation of candidate trading strategies.

Consider the following trading strategy detailed in Exhibit 1.1 Although there is a minor drawdown in the first year, the strategy is consistently profitable through 2014. Indeed, the drawdowns throughout the history are minimal. The strategy even does well during the financial crisis. Overall, this strategy appears very attractive and many investment managers would pursue this strategy.

Our research (see Harvey and Liu [2014a] and Harvey et al. [2014]) offers some tools to evaluate strategies such as the one presented in Exhibit 1. It turns out that simply looking at average profitability, consistency, and size of drawdowns is not sufficient to give a strategy a passing grade.

whether there is anything we can learn in finance from other scientific fields. Although the advent of machine learning is relatively new to investment management, similar situations involving a large number of tests have been around for many years in other sciences. It makes sense that there may be some insights outside of finance that are relevant to finance.

Our first example is the widely heralded discovery of the Higgs Boson in 2012. The particle was first theorized in 1964, the same year as William Sharpe’s paper on the capital asset pricing model (CAPM) was published.2 The first tests of the CAPM were published eight years later3 and Sharpe was awarded a Nobel Prize in 1990. For Peter Higgs, it was a much longer road. It took years to complete the Large Hadron Collider (LHC) at a cost of about $5 billion.4 The Higgs Boson was declared “discovered” on July 4, 2012, and Nobel Prizes were awarded in 2013.5

So why is this relevant for finance? It has to do with the testing method. Scientists knew that the particle was rare and that it decays very quickly. The idea of the LHC is to have beams of particles collide. Theo-
Copyright © 2014
problem is that each of the so-called decay signatures can also be produced by normal events from known processes.

To declare a discovery, scientists agreed to what appeared to be a very tough standard. The observed occurrences of the candidate particle (Higgs Boson) had to be five standard deviations different from a world where there was no new particle. Five standard deviations is generally considered a tough standard. Yet in finance, we routinely accept discoveries where the t-statistic exceeds two, not five. Indeed, there is a hedge fund called Two Sigma.

Particle physics is not alone in having a tougher hurdle to clear. Consider the research done in biogenetics. In genetic association studies, researchers try to link a certain disease to human genes and they do this by testing the causal effect between the disease and a gene. Given that there are more than 20,000 human genes that are expressive, multiple testing is a real issue. To make it even more challenging, a disease is often not caused by a single gene but the interactions among several genes. Counting all the possibilities, the total number of tests can easily exceed a million. Given this large number of tests, a tougher standard must be applied. With the conventional thresholds, a large percentage of studies that document significant associations are not replicable.7

To give an example, a recent study in Nature claims to find two genetic linkages for Parkinson’s disease.8 About a half a million genetic sequences are tested for the potential association with the disease. Given this large number of tests, tens of thousands of genetic sequences will appear to affect the disease under conventional standards. We need a tougher standard to lower the possibility of false discoveries. Indeed, the identified gene loci from the tests have t-statistics that exceed 5.3.

There are many more examples such as the search for exoplanets. However, there is a common theme in these examples. A higher threshold is required because the number of tests is large. For the Higgs Boson, there were potentially trillions of tests. For research in bioge-
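The thresholds mentioned in this passage can be put side by side with a few lines of arithmetic. The sketch below is illustrative only: it assumes test statistics are approximately standard normal and uses two-sided p-values, and the function names are hypothetical. The Bonferroni threshold is shown purely for comparison; the text does not say which correction the cited study applied.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def two_sided_p_value(z):
    """Probability that a standard normal draw lies at least |z| from zero."""
    return 2.0 * STD_NORMAL.cdf(-abs(z))


def expected_false_discoveries(num_tests, p_cutoff):
    """Expected number of tests passing the cutoff when no effect is real."""
    return num_tests * p_cutoff


def bonferroni_z_threshold(num_tests, alpha=0.05):
    """z-score required when a two-sided cutoff alpha is split across num_tests."""
    per_test_cutoff = alpha / num_tests
    return -STD_NORMAL.inv_cdf(per_test_cutoff / 2.0)


p_two_sigma = two_sided_p_value(2.0)    # the hurdle common in finance
p_five_sigma = two_sided_p_value(5.0)   # the hurdle used for the Higgs Boson
false_hits = expected_false_discoveries(500_000, 0.05)
z_needed = bonferroni_z_threshold(500_000)

print(f"two-sided p at 2 sigma:  {p_two_sigma:.4f}")
print(f"two-sided p at 5 sigma:  {p_five_sigma:.2e}")
print(f"expected false hits in 500,000 tests at 5%: {false_hits:,.0f}")
print(f"Bonferroni z threshold for 500,000 tests:   {z_needed:.2f}")
```

Under these assumptions, half a million tests at the conventional 5% cutoff produce about 25,000 false hits by chance, which is the "tens of thousands" the text refers to, and a Bonferroni-style threshold lands a little above 5.3.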
EXHIBIT 2
200 Randomly Generated Trading Strategies
EXHIBIT 3
False Trading Strategies, True Trading Strategies
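The exhibit title above refers to 200 randomly generated strategies. A minimal sketch of that kind of experiment follows. The return process (normally distributed monthly returns with zero mean and 15% annualized volatility over 120 months) and the seed are assumptions chosen for illustration, not details taken from the exhibit.

```python
import math
import random
import statistics

NUM_STRATEGIES = 200   # matches the exhibit title
NUM_MONTHS = 120       # assumption: ten years of monthly returns
ANNUAL_VOL = 0.15      # assumption: 15% annualized volatility
SEED = 42              # assumption: arbitrary seed for repeatability


def annualized_sharpe(monthly_returns):
    """Annualized Sharpe ratio computed from monthly excess returns."""
    mean = statistics.mean(monthly_returns)
    stdev = statistics.stdev(monthly_returns)
    return mean / stdev * math.sqrt(12)


def simulate_sharpes(num_strategies, num_months, annual_vol, seed):
    """Generate pure-noise strategies (true mean zero) and return their Sharpe ratios."""
    rng = random.Random(seed)
    monthly_vol = annual_vol / math.sqrt(12)
    sharpes = []
    for _ in range(num_strategies):
        returns = [rng.gauss(0.0, monthly_vol) for _ in range(num_months)]
        sharpes.append(annualized_sharpe(returns))
    return sharpes


sharpes = simulate_sharpes(NUM_STRATEGIES, NUM_MONTHS, ANNUAL_VOL, SEED)
best_sharpe = max(sharpes)
average_sharpe = statistics.mean(sharpes)

print(f"best Sharpe ratio among {NUM_STRATEGIES} noise strategies: {best_sharpe:.2f}")
print(f"average Sharpe ratio: {average_sharpe:.2f}")
```

Every strategy here has a true Sharpe ratio of zero, so the average sits near zero, yet the best of the 200 looks respectable. Reporting only the best one is the selection effect that a multiple-testing adjustment is meant to undo.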
explore many strategies and often choose to present the one with the largest Sharpe ratio. But the Sharpe ratio for this strategy no longer represents its true expected profitability. With a large number of tests, it is very likely that the selected strategy will appear to be highly profitable just by chance. To take this into account, we need to haircut the reported Sharpe ratio. In addition, the haircut needs to be larger if there are more tests tried.

Take the candidate strategy in Exhibit 1 as an example. It has a Sharpe ratio of 0.92 and a corresponding t-statistic of 2.91. The p-value is 0.4%; hence, if there were only one test, the strategy would look very attractive because there is only a 0.4% chance it is a fluke. However, with 200 tests tried, the story is completely different. Using the Bonferroni multiple-testing method, we need to adjust the p-value cutoff to 0.05/200 = 0.00025. Hence, we would need to observe a t-statistic of at least 3.66 to declare the strategy a true discovery with 95% confidence. The observed t-statistic, 2.91, is well below 3.66; hence, we would pass on this strategy.

There is an equivalent way of looking at the Bonferroni test. To declare a strategy true, its p-value must be less than some predetermined threshold such as 5% (or 95% confidence that the identified strategy is not false):

p-value of test < threshold

Bonferroni divides the threshold (0.05) by the number of tests, in our case 200:

p-value of test < 0.05/200

Equivalently, we could multiply the p-value of the individual test by 200 and check each test to identify which ones are less than 0.05, i.e.,

(p-value of test) × 200 < 0.05

In our case, the original p-value is 0.004 and, when multiplied by 200, the adjusted p-value is 0.80 and the corresponding t-statistic is 0.25. This high p-value is significantly greater than the threshold, 0.05. Our method asks how large the Sharpe ratio should be in order to generate a t-statistic of 0.25. The answer is 0.08. Therefore, knowing that 200 tests have been tried and under Bonferroni’s test, we successfully declare the candidate strategy with the original Sharpe ratio of 0.92 as insignificant: the Sharpe ratio that adjusts for multiple tests
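The Bonferroni arithmetic in the passage above can be retraced in a short script. This is a sketch under stated assumptions rather than the authors' implementation: it uses a normal approximation for the t-statistic, converts between Sharpe ratio and t-statistic as t = SR × sqrt(years), and assumes a ten-year sample (a length chosen because it is consistent with a Sharpe ratio of 0.92 mapping to a t-statistic of 2.91). All function names are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def two_sided_p_value(t_stat):
    """Two-sided p-value of a t-statistic under the normal approximation."""
    return 2.0 * STD_NORMAL.cdf(-abs(t_stat))


def t_stat_from_p_value(p_value):
    """Invert a two-sided p-value back to a (positive) t-statistic."""
    return -STD_NORMAL.inv_cdf(p_value / 2.0)


def bonferroni_adjusted_p(p_value, num_tests):
    """Bonferroni adjustment: multiply by the number of tests, capped at 1."""
    return min(1.0, p_value * num_tests)


SHARPE = 0.92      # from the text
NUM_TESTS = 200    # from the text
YEARS = 10.0       # assumption: sample length in years

t_stat = SHARPE * math.sqrt(YEARS)             # roughly 2.91
p_exact = two_sided_p_value(t_stat)            # unrounded p-value
p_reported = 0.004                             # rounded value used in the text

t_threshold = t_stat_from_p_value(0.05 / NUM_TESTS)
adjusted_p = bonferroni_adjusted_p(p_reported, NUM_TESTS)
adjusted_t = t_stat_from_p_value(adjusted_p)
haircut_sharpe = adjusted_t / math.sqrt(YEARS)

print(f"t-statistic:               {t_stat:.2f}")
print(f"unrounded p-value:         {p_exact:.4f}")
print(f"required t-statistic:      {t_threshold:.2f}")
print(f"adjusted p-value:          {adjusted_p:.2f}")
print(f"adjusted t-statistic:      {adjusted_t:.2f}")
print(f"haircut Sharpe ratio:      {haircut_sharpe:.2f}")
```

Starting from the rounded p-value of 0.004, the sketch reproduces the 3.66 hurdle, the adjusted p-value of 0.80, the adjusted t-statistic of 0.25, and the haircut Sharpe ratio of 0.08 quoted in the text. Starting from the unrounded p-value instead would give a somewhat different haircut, which shows how sensitive the result is to rounding in the inputs.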