performance over the data representing the training period. How might one go about labeling the training data?

One approach to labeling the training data is to have an expert label the data with optimal trading points. While one difficulty with this is that experts will not necessarily agree on what the optimum buy and sell points are, a more profound concern is the interpretation of a neural network trained on such expert-assigned data. The expert will undoubtedly use rules of thumb in assigning the training labels, and any network trained on such data will merely result in a neurally encoded (approximate) representation of the decision strategy used by the expert.

To avoid these problems we automate the labeling of data by assigning an example a buy target if a decision to buy on the day represented by that example would have resulted in a profit. Similarly, an example is assigned a sell target if a decision to sell would have resulted in a profit.¹ In other words, all points in a downward trend are labeled as sells, and all points in an upward trend are labeled as buys. Thus, assuming a one-point trading strategy, trades will take place at every peak and trough. The justification for this labeling procedure is that it results in the best possible return that could have been achieved.

¹ In practice, one should factor trading costs into the labelling procedure. That is, the relative increase or decrease in the value of the stock may not be sufficient to offset the cost of the transaction. We ignore transaction costs in the labelling procedure on the grounds that the network will not be able to model the training data perfectly, and that small increases/decreases are not as important as larger ones. That is, it is the larger increases/decreases that are more important in regards to performance on future data.

The wealth at the end of an N-day trading period is given by

    W_N = W_0 + Σ_{t=1}^{N} { [ δ_{t−1} r_{m,t} + (1 − δ_{t−1}) r_f ] × [ 1 − δ′_{t−1} c ] } × W_{t−1}

where W_t is the total wealth at day t, r_f is the return rate of the fixed-interest security calculated daily, r_{m,t} = (I_t − I_{t−1}) / I_{t−1} is the market return at day t, where I_t is the share price at time t, δ_{t−1} is a delta function which equals 1 if capital is invested in shares at the completion of trading day t−1 and 0 otherwise, c is the commission rate on a trade, and δ′_{t−1} is a delta function which equals 1 if a trade occurs at the end of day t−1 and 0 otherwise. Thus, the first factor appearing in the summation represents the daily return rate that is applicable for the current day (i.e. the market return rate or the fixed-interest return rate), the second factor provides an adjustment for the cost of transactions, and
the third factor is the wealth at the close of the previous day's trading. The return at the end of the trading period is just the change in wealth divided by the initial wealth, i.e. r = (W_N − W_0) / W_0, and it is this quantity which we wish to maximize.

Genetic search proceeds as follows. A population of individuals, each representing a distinct neural network, is generated. Each of these networks is evaluated by following its trading predictions over the period represented by a set of historical training data and determining the return at the end of this period. The fitness of an individual is measured directly as the return that it is able to achieve. Reproduction, crossover and mutation operators are then applied to produce a new generation, with fitter individuals having a greater likelihood of contributing offspring to the next generation. This procedure is allowed to proceed until either a predetermined number of generations has been reached, or until there is no further increase in fitness. At the completion of search, the best network is used to make buy/sell decisions over a test period.

5. Empirical Results

[…] corresponding to a buy-and-hold strategy² are respectively 14.9% and −5.2%.

² This presumes the existence of an investment product (e.g. a unit trust) which purchases according to the make-up of the index (i.e. the investment fund manager invests in all thirty companies representing the index).

Testing the traders over a longer period can be achieved using a moving windows approach in which the training/testing set windows are advanced by N days after each training/testing cycle, where N is the number of days in each test period. The results of applying this approach over an 880 day test period (i.e. 22 training/testing cycles) are shown in Figure 2 (the results in Figure 1 correspond to the last of these 22 cycles).

[Figure 2: cumulative return (%) versus trading days for the GA-trained network, the backpropagation-trained network, the best possible strategy, and buy-and-hold.]

Figure 2. Performance of traders over an 880 day forecast period using 22 training/testing cycles. The GA-trained network achieves a total return of 128% over the test period (26.6% per annum average return). The backpropagation-trained network achieves a return of 48.0% (11.9% per annum average return).
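As a concrete illustration of how a candidate network's fitness, the return r = (W_N − W_0) / W_0, can be computed from its trading decisions, the wealth recurrence above might be implemented as follows. This is a minimal sketch: the function names are ours, and the default fixed-interest and commission rates are placeholder values, not figures from the paper.

```python
def final_wealth(prices, delta, w0=1.0, rf_daily=0.0002, c=0.001):
    """Evaluate W_N = W_0 + sum_t [d_{t-1} r_{m,t} + (1 - d_{t-1}) r_f](1 - d'_{t-1} c) W_{t-1}.

    prices[t] is the index value I_t; delta[t] is 1 if capital is held in
    shares at the close of day t, 0 if it is in the fixed-interest
    security. A trade (d'_{t-1} = 1) occurs whenever delta changes.
    """
    w = w0
    for t in range(1, len(prices)):
        r_m = (prices[t] - prices[t - 1]) / prices[t - 1]             # market return r_{m,t}
        d = delta[t - 1]                                              # invested at close of day t-1?
        traded = 1 if t >= 2 and delta[t - 1] != delta[t - 2] else 0  # d'_{t-1}
        w += (d * r_m + (1 - d) * rf_daily) * (1 - traded * c) * w    # increment on W_{t-1}
    return w


def trading_return(prices, delta, w0=1.0, rf_daily=0.0002, c=0.001):
    """The quantity maximized by the genetic search: r = (W_N - W_0) / W_0."""
    return (final_wealth(prices, delta, w0, rf_daily, c) - w0) / w0
```

Under this reading of the formula, each day's wealth increment is the applicable return rate, discounted by the commission factor when a trade occurred, applied to the previous day's wealth, so wealth compounds daily.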
Over the 880 day period, the total return from following the decisions made by the GA-trained agent is 127.6% (26.6% per annum average return), while the return for the backpropagation-trained network is only 48.0% (11.9% per annum average return). The buy-and-hold return over the 880 day period is 27.2% (7.1% per annum average return). The numbers of trades made by the two networks over the 880 day period are almost identical: 52 for the GA-trained network, and 54 for the backpropagation-trained network.

The relatively poor performance of the backpropagation-trained network can be attributed to its poor ability to model the training data presented to it. This poor ability is reflected in the relatively high mean square error on the training data (approximately 0.25 on most 250 day training windows), and is due to the high frequency of trend reversals, many of which occur after only a marginal rise or fall in the value of the index.

There are at least two ways in which a better model for the training data could have been achieved: (i) using a more complex network architecture; (ii) filtering out the high frequency, low amplitude fluctuations. While increasing the complexity of the network (i.e. increasing the number of hidden layer units) would result in a network better able to model the training data, this would most likely result in over-fitting and hence degraded generalization ability. While filtering out the smaller high frequency fluctuations in the training data would also result in a better model, the problem here is that there does not appear to be any objective means for deciding on a suitable filtering procedure, and the approach would be ad hoc.

Despite the fact that the backpropagation-trained network models the training data poorly, it still appears to make some good trading decisions when projected onto future data, resulting in an overall return slightly better than that of a buy-and-hold strategy. However, this return is significantly lower than that achieved by the GA-trained trader.

5.1 Performance on Random Walk Data

Discovery of a neural network trader is based on the assumption that patterns that have occurred in the past provide an indication of future movements in the value of the index. Thus, the technique implicitly assumes that changes in the value of the index are not entirely random. How might we determine whether the predictability that we have observed in the Dow Jones Index data is real or anomalous?

One way of testing this is to compare the performance on real data with performance on one or more sets of random walk data. If performance on the random data is found to be comparable with that on real data, then we cannot claim to have discovered any real predictability.

Random walk data for the Dow Jones Index can be simulated by taking the sequence of relative daily changes in the index value, and randomizing this sequence temporally. That is, the same set of relative daily changes is maintained, but the order in which these changes occur is randomized. This means that the increase in the index over the entire period will be identical for each time series produced in this manner, thus reducing problems with biases that can arise from long-term upward or downward trends.
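The temporal randomization just described can be sketched as follows; this is a minimal illustration (the function name and seeding convention are ours, not from the paper):

```python
import random


def random_walk_series(index_values, seed=None):
    """Shuffle the relative daily changes of an index series in time.

    The multiset of daily changes is preserved, so the final value of
    every series generated this way equals that of the original data.
    """
    changes = [
        (index_values[t] - index_values[t - 1]) / index_values[t - 1]
        for t in range(1, len(index_values))
    ]
    random.Random(seed).shuffle(changes)     # temporal randomization
    series = [index_values[0]]
    for r in changes:
        series.append(series[-1] * (1 + r))  # rebuild an index path
    return series
```

Because the reconstructed series multiplies the same (1 + r_t) factors in a different order, the overall increase across the whole period is unchanged by the shuffle, which is exactly the bias-reducing property noted above.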
[Figure 3: four panels of index value (5000–13000) versus trading days (0–800).]

Figure 3. (a) Dow Jones Industrial Average index values, (b)–(d) Random walk time series produced by temporally randomizing the daily relative changes in the Dow Jones Industrial Average index.
Figure 3 shows the original Dow Jones Industrial Index data (Figure 3a) and three time series (Figures 3b, c, d) generated by randomizing the sequence of relative daily changes as described above. Table 1 shows the average yearly returns achieved by GA-trained and backpropagation-trained neural network traders on each of these datasets.

Table 1. Average yearly returns of network traders on the four datasets

                Average yearly return
                GA       Backprop.   Buy & Hold
    Dow Jones   26.6%    11.9%        7.1%
    Random_1     3.1%    -4.5%        7.1%
    Random_2     4.6%     0.6%        7.1%
    Random_3    12.3%     9.4%        7.1%

Each of the network traders performs better than a buy-and-hold strategy on the third randomized dataset, but worse than a buy-and-hold strategy on the other random datasets. The GA-trained trader performs better than the backpropagation-trained trader on each dataset.

Thus, while the network traders (in particular the GA-trained trader) may be able to find a trading strategy that operates well on a set of (randomized) training data, this trading strategy is not able to satisfactorily predict good buy and sell points when projected into the future. We can thus draw the weak conclusion that the predictability that we have observed in the daily movements of the Dow Jones Index is not anomalous, but indicates some regularity in the time series which can be exploited to make future decisions.

6. Conclusions

The results reported in this paper illustrate that a genetic algorithm-trained neural network trader performs significantly better than a backpropagation-trained trader (with identical architecture) in deciding on good buy and sell points on test financial data. Comparison of the performance of the traders on real data with performance on random walk data lends support to the claim that the predictability observed on the Dow Jones Index is real and not anomalous.