
Dow Stock Selection Using an Artificial Neural Network

JOHN JASON AUVENSHINE*

ABSTRACT

A Feed-Forward/Back Propagation Artificial Neural Network is used as the core of a software system to manage an investment portfolio of stocks selected from the Dow Jones 30 Industrials over the period November 1971 to October 1998. Monthly price and dividend data for each stock, plus the short-term T-Bill rate and the level of the Dow Jones Industrial Average, are used as input factors. Multiple trading systems based on network outputs are tested. Trading systems which use long term (18 month) and mixed term (1, 6, and 18 month) neural network predictions are found to outperform the Dow on a statistically significant basis. Trading systems which use short term (1 month) neural network predictions afford no statistically significant improvement over a buy-and-hold Dow strategy. This article assumes a good understanding of the structure and operation of Feed-Forward/Back Propagation Neural Networks.¹

Copyright 2000 John Jason Auvenshine.
KEY WORDS artificial neural networks; stock selection; trading systems

INTRODUCTION

The allocation of assets within a common stock investment portfolio is a task for which there is widespread disagreement regarding the correct approaches, methods, and goals. A truly optimal asset allocation most fully accomplishes the goals of the portfolio owner (investor) given the available asset choices in the marketplace. With perfect knowledge of the future, asset allocation would simply be a matter of choosing the asset providing the maximum total return over the investor's time horizon. Because the future cannot be known with certainty, risk must be added to the decision process. The only two goals considered in this study are total return and risk; however, the investor may have other implicit or explicit goals. Some other possible goals are favoritism or aversion to a particular industry or company, a desire
_____________
* Correspondence to: John Jason Auvenshine, International Business Machines, PO Box 18076, Tucson, AZ 85731-8076, USA. E-mail: auvenj@yahoo.com
¹ For those unfamiliar with multilayer Feed-Forward/Back Propagation Neural Networks, the introduction in Elaine Rich and Kevin Knight's Artificial Intelligence, 2nd Edition, pages 500-507, is highly recommended.

to reduce or eliminate taxes, a desire to follow others by allocating assets based on their recommendations and opinions, or a desire to follow a policy of socially responsible screening for all potential investments.

One of the largest areas of disagreement in portfolio allocation is over whether or not it is possible to beat the market consistently, adjusted for risk. The Efficient Market Hypothesis (EMH) holds that market prices fully reflect all available information, and future price movements are essentially random (Malkiel, 1985). However, there is significant evidence, both anecdotal and from analysis of past returns, suggesting that price movements are not entirely random and that better-than-market profits can be achieved (Vaga, 1994). This study rests upon the presumption that the latter is true: market-beating profits are possible if sufficiently capable analysis and/or trading strategies are driving the asset allocation.

The search for a stock selection system capable of consistently outperforming the overall stock market has probably been going on for as long as the stock market has existed. Some of the earliest attempts, still in wide use today, involve simple technical and fundamental heuristics. Data on the long-term effectiveness of these simple strategies is often anecdotal rather than scientific. The primary interest in simple heuristics for this study is their illumination of possible input factors to a neural network based system.

Technical heuristics attempt to select stocks that have, or will soon have, more buyers than sellers in the market, a condition that causes the stock's price to rise. The simplest technical heuristics focus solely on the price of the stock being considered for trade. One type of technical heuristic is a moving average rule.
An investor following a 200-day moving average rule would consider a stock a buy when its current price moves from significantly below to significantly above the 200-day moving average of its price, and a sell when the reverse happens (Mamis, 1977). The idea behind the moving average strategy is to profit from the fact that stocks tend to follow trends: a stock which is going up will continue to go up for a while, and a stock which is going down will continue to go down for a while. Other researchers, including Gencay and Stengos (1998), have confirmed the usefulness of moving averages in stock index prediction.

Another technical heuristic, which often results in nearly opposite trading patterns from the moving average heuristic, is the "buy low, sell high" heuristic. An investor following this heuristic would consider a stock a buy when its current price is a certain percentage below its all-time high, and a sell when its current price rises a certain percentage from where it was purchased (Mahoney, 1978). The idea behind this strategy is that trends eventually reverse, and if the buy and sell percentages are chosen close to common reversal points, the investor can profit from the reversal.
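The moving average rule described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name, the `band` parameter (a stand-in for what "significantly" above or below might mean), and the sample prices are hypothetical and do not come from the cited sources.

```python
def moving_average_signals(prices, window=200, band=0.02):
    """Return (index, 'buy' | 'sell') events for a moving average crossover rule.

    `band` is a hypothetical threshold: the price must be more than `band`
    (as a fraction) above or below the average to count as "significantly"
    above or below it.
    """
    signals = []
    state = None  # 'above' or 'below' once the price has cleared the band
    for i in range(window - 1, len(prices)):
        avg = sum(prices[i - window + 1 : i + 1]) / window
        if prices[i] > avg * (1 + band):
            new_state = "above"
        elif prices[i] < avg * (1 - band):
            new_state = "below"
        else:
            new_state = state  # inside the band: keep the previous state
        if state == "below" and new_state == "above":
            signals.append((i, "buy"))
        elif state == "above" and new_state == "below":
            signals.append((i, "sell"))
        state = new_state
    return signals


# Toy series with a short window so the crossovers are easy to follow.
toy_prices = [10.0] * 5 + [8.0] * 3 + [12.0] * 5 + [7.0] * 3
print(moving_average_signals(toy_prices, window=3, band=0.05))
```

The band keeps the rule from flipping on every small wiggle around the average; how wide it should be is a tuning choice the text leaves open.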

Fundamental heuristics attempt to identify stocks currently trading at a price below their long-term economic value. Fundamental heuristics are rarely as simple as the technical heuristics described above, primarily due to the multitude of factors that can legitimately be said to impact the true economic value of a stock. However, one popular and relatively simple heuristic is the Dogs of the Dow strategy. This strategy involves purchasing in equal dollar amounts, on January first of each year, the 10 highest yielding (dividend rate divided by price) stocks in the Dow Jones Industrial Average, holding them for 12 months, and then selling them (dogsofthedow.com, 1999). Depending on the time period studied, the Dogs of the Dow strategy returns 2-3 percent more than the Dow index itself (Pratt, 1998).

Lawrence S. Pratt, under the auspices of the American Institute for Economic Research (AIER), developed one very well researched and documented fundamental stock selection heuristic. This heuristic was based upon the Dogs of the Dow heuristic described above. The AIER study examined the impact of the following variations to the Dogs of the Dow heuristic:

1. Variation of the number of stocks purchased, between 1 (only the highest yielding stock) and 30 (all of the Dow stocks).
2. Variation of the holding period for stocks purchased, between 1 month and 36 months.
3. Incremental purchases: purchasing a fraction of the portfolio equal to 1/(holding period) each month rather than reinvesting the whole portfolio on January 1st of each year.

The variation of holding period and number of stocks purchased was tested over 33 years of monthly Dow data for the period 1963 to 1996. The best risk-adjusted heuristic was found to be the purchase of the top 4 highest yielding stocks and holding them for 18 months. This relatively simple heuristic generated a return that was 5.3% higher than the overall Dow, and 2.8% higher than the Dogs of the Dow heuristic (Pratt, 1998).
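The selection step shared by the Dogs of the Dow and the AIER variation (rank by dividend yield, buy the top n in equal dollar amounts) might be sketched as follows. The tickers, prices, and dividends are made-up illustrative values, and the function names are hypothetical.

```python
def highest_yielders(quotes, n=10):
    """Return the n tickers with the highest dividend yield.

    quotes maps ticker -> (price, annual_dividend); yield is dividend / price.
    Ties are broken alphabetically so the result is deterministic.
    """
    yields = {t: div / price for t, (price, div) in quotes.items()}
    return sorted(yields, key=lambda t: (-yields[t], t))[:n]


def equal_dollar_shares(cash, tickers, quotes):
    """Split cash equally across tickers and return ticker -> share count."""
    per_stock = cash / len(tickers)
    return {t: per_stock / quotes[t][0] for t in tickers}


# Hypothetical quotes: (price, annual dividend per share).
sample_quotes = {
    "AAA": (50.0, 2.0),   # 4.0% yield
    "BBB": (100.0, 6.0),  # 6.0% yield
    "CCC": (20.0, 0.5),   # 2.5% yield
    "DDD": (40.0, 2.0),   # 5.0% yield
}
picks = highest_yielders(sample_quotes, n=2)
print(picks, equal_dollar_shares(10_000.0, picks, sample_quotes))
```

Setting `n=10` with a 12-month hold corresponds to the standard Dogs of the Dow rule, while `n=4` with an 18-month hold corresponds to the AIER result quoted above; the holding-period logic itself is not shown here.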
The key to both the standard Dogs of the Dow heuristic and the AIER enhanced version of it is the purchase of stocks with high dividend yields from the Dow. The choice of the Dow index itself should not be overlooked as a key component to these strategies. The Dow stocks are all household names: large companies with diversified businesses and significant financial resources (Pratt, 1998). The implication is that when one of the Dow companies goes through a business downturn, it will have the resources to climb back rather than go out of business. Furthermore, due to the size and popularity of these companies, liquidity should not be a problem, and surprises and bizarre financial developments should be few and far between (Pratt, 1998). Most Dow stocks also have significant operations in foreign markets, giving the investor exposure to opportunities abroad without direct exposure to the political and currency risks associated with foreign equity ownership.

There are a number of reasons for choosing the dividend yield as a fundamental indicator to purchase a Dow stock. The board of directors sets the dividend payment for the company. Let us make the reasonable assumption that the board of directors has an accurate view of the business's long-term outlook, and that they wish to pay a stable dividend over time. It is then reasonable to infer that a relatively large dividend indicates well-founded confidence in the business on the part of its board of directors. Furthermore, when a stock paying a relatively high dividend is owned, the dividend is collected and added to the return of the strategy. This means that even if two stocks appreciate the same amount in terms of price, the one paying the higher dividend gives the better total return. Finally, a high dividend yield can occur due to a depressed stock price. Because of the favorable stability characteristics of Dow stocks discussed in the previous paragraph, it is more likely that a Dow stock with a depressed price will eventually return to favor, returning significant capital gains, than that it will continue to fall and eventually go out of business (Pratt, 1998).

Heuristics like those described above have the advantage of being relatively simple. However, in the never-ending quest for greater risk-adjusted returns there is the question of whether such techniques could be improved upon. Could the computing power now widely and inexpensively available be applied in such a way as to increase investment returns?

The application of computational techniques to financial forecasting and portfolio allocation, often called Computational Finance, has been of interest to both academics and private investors for several years. Numerous academic papers were found on the topic. The majority of the academic studies used some form of Artificial Neural Network as the core software technology. There was a large variety of applications, network architectures, and results achieved.
The effectiveness of Neural Networks for forecasting and prediction was analyzed in a meta-study performed by Adya and Collopy (1998). They evaluated 48 studies of Neural Networks done between 1988 and 1994, and found 22 of them to be effectively validated, that is, containing results that were compared with other alternatives. Of these 22, 18 supported the potential of Neural Networks for forecasting and prediction. The applicability of Neural Networks to financial markets is evidenced by the variety of Neural Network based studies achieving favorable performance trading in liquid markets including currencies, commodities, stocks and stock indexes. Numerous studies on currency trading and exchange rate prediction produced favorable results, including US/Australian dollar exchange rate (Flower, Cripps, Jabri, and White, 1995), and spot rates for the British pound, Deutsche mark, French franc, Japanese yen, and the Swiss franc (Gencay, 1999). Pi and Rognvaldsson (1995) used a Neural Network to successfully trade cotton futures.

Studies of stock and stock index forecasting and trading also produced generally favorable results using Neural Networks. Bengio (1996) used a Neural Network based stock trading system on 35 Canadian stocks over a period of 6 years (3/89 to 2/95) and was able to beat a buy-and-hold strategy on those same stocks by 7.4%. Atiya (1996) tested a system on 100 S&P 500 companies over the period July 1989 to October 1994 and achieved an astonishing 33.7% per year return versus 7.9% for the benchmark S&P 500 index. Steiner and Wittkemper (1997) demonstrate a system that beat a market portfolio by 60% on 30 German stocks over 1991-1994. These studies give grounds for optimism in the application of Neural Networks to stock trading. Additionally, Gencay and Stengos (1998) used a Neural Network on the Dow Jones Industrial Average and found some evidence of non-linear predictability in stock market returns.

A variety of network architectures were used in studies of financial prediction and allocation. Many studies used the straightforward 3 or 4 layer feed forward back propagation architecture, including Pi and Rognvaldsson (1995), Choey and Weigend (1996), and Atiya (1996). One study (Zimmermann and Weigend, 1996) extends the Neural Network architecture to 6 layers. Other than the number of layers, some of the more interesting architectural variations include:

- The use of a recurrent architecture whereby the price predictions for one period are fed back as inputs for the next period (Bassi, 1995). In this way, past price information (actually predictions for earlier time periods) can be stored in network states and used according to trained weights, as are other input factors, for later time period predictions.
- The use of multiple Neural Networks to form a committee (Gutjahr, 1996). With this architecture, multiple standard networks are built using different input parameters but the same objective function. Each network gives an independent answer and the results are combined using a majority rule.
- The use of a hierarchical 2-Network structure where one network is an evaluator and one is a critic (Klenin, 1996). The evaluator network outputs are the inputs to the critic network.

Input factors used in the neural network based studies varied widely. At the minimalist end of the spectrum, Chandra and Reeb (1999) used only historical pricing data, and Atiya (1996) used only time as inputs. Many studies used a number of technical and fundamental factors as input:

- Gencay and Stengos (1998): 2 factors, price and volume history.
- Bengio (1996): 5 factors, 2 of which represent macro-economic variables which are known to influence the business cycle, and 3 of which are micro-economic variables representing the profitability of the company and previous price changes of the stock.
- Choey and Weigend (1996): 19 factors including stock, bond, and gold price indexes and foreign exchange rates.
- Gutjahr (1996): 4 classes of factors spread across multiple neural networks. Specific inputs were not given, but the classes were charts, statistics, psychology, and fundamentals.

Many applications of neural networks in capital markets merely try to predict prices or the direction thereof. For these, the choice of output functions is usually dictated by the prediction goal of the system. However, when the goal is to generate a portfolio allocation rather than predictions per se, the question of what the neural network outputs should be and how to translate those outputs into a portfolio allocation is not so straightforward. Intuitively, it would seem that a good approach might be to attempt to predict the price, then buy the stocks with the greatest predicted price increase. Surprisingly, no studies were found suggesting that the intuitive approach worked well for stocks.

Choey and Weigend (1996), Moody, Wu, Liao, and Saffell (1998), and Bengio (1996) all suggest that when attempting to generate a portfolio allocation, a neural network should be trained on an economic objective (the Sharpe ratio, profit, or wealth) rather than a price prediction objective. This is most directly useful when comparing two alternatives, for example a stock index (risky asset) versus a money market fund (riskless asset). The system simply shifts into the asset it predicts will best meet the economic objective. Steiner and Wittkemper (1997) attempt to predict excess return (return greater than the index) for each stock within the index, rank them according to that prediction, then build a portfolio with the top ranked stocks. Pi and Rognvaldsson (1995) attempt to predict the slope (trend) of price at 3 future times, and generate a buy or sell trading signal if at least 2 out of 3 of the output nodes forecast a definite trend. Atiya's (1996) system generates a stop loss and profit objective, generating a sale whenever the price falls below the stop loss or rises above the profit objective.

The right amount of training (number of epochs) is critical in any neural network application, and financial applications are no exception.
Determining the optimum number of epochs when training neural networks on financial data is especially difficult. Such data is typically both highly noisy and non-stationary: market conditions change over time, and therefore the optimum network weights for one section of data are likely to be quite different from the optimum network weights for another section of data. Burgess (1995) suggests that by limiting the number of hidden nodes (to two in his study), overtraining is not a problem and the network can simply be trained until convergence. Zimmermann and Weigend (1996) found that the use of a 6-layer architecture dramatically reduces the problem of overtraining. Lawrence, Giles, and Tsoi (1996) set "an appropriate stopping point a priori" from experiments with a separate segment of data that was not part of the subsequent training and test data sets.

Lawrence, Giles, and Tsoi (1996) also make the tradeoff between non-stationarity and noise resistance explicit: If the training set is too small, noise makes it harder to find true patterns in the data. If the training set is too large, the non-stationarity of the data means that more data with statistics that are less relevant for the task at hand is used when creating the estimator. Regarding non-stationarity, some studies suggest using a rolling data window ending at the most recent observation, including Pi and Rognvaldsson (1995), and Flower, Cripps, Jabri, and White (1995).

RESEARCH APPROACH

Goals, assets considered, and historical data period: Because of the favorable characteristics of the stocks which make up the Dow Jones Industrial Average, we choose to allocate a portfolio among only those stocks and a cash account. Funds in the cash account are presumed to earn the 91-Day Treasury Bill (T-Bill) interest rate, and the cash account is also presumed to be without risk to principal. We also assume a fairly long-term time horizon: the shortest holding period for any stock, even with our short-term trading approaches, will be one month. We are also considering pretax returns; no effort is made to reduce or even calculate taxes. All of these assumptions reflect the situation of an investor with an IRA or other self-managed tax-deferred or tax-free account, where the goal is a long-term increase in the total account balance. The assumptions certainly do not reflect the situation or goals of the currently fashionable day traders and momentum players.

The historical time period of the data set is quite long, from January 1961 through October 1998. As discussed below, not all of this data is actually used to trade stock; a certain percentage will constitute the training set for the first trade decision.

[Block diagram; the following labels are recovered from the original figure]
Raw Factors: Price; Dividend Payment; T-Bill Rate; Buy-and-hold Dow Index
Mathematical/Time Series Transformations
Network Input Factors: Price % change vs. last; Price % change vs. 6 month MA; Price % below 12 month high; Price % above 12 month low; Dividend Yield; Dividend Rank; T-Bill Rate; T-Bill Rate % change vs. last; T-Bill Rate % change vs. 6 month; Dow % change vs. last; Dow % change vs. 6 month; Price % change/Dow % change 1 month; Price % change/Dow % change 6 month
Six Feed Forward/Back Propagation Artificial Neural Networks, trained to predict:
* Total return over short, medium, and long term
* Performance relative to the Dow over short, medium, and long term
(Predictions)
Trading System 1, Trading System 2, ..., Trading System n
(Trades)
Account 1 (stock, $), Account 2 (stock, $), ..., Account n (stock, $)
(Reference to raw data for valuation purposes)

Figure 1: System Block Diagram

Network and system architecture: A standard multi-layer feed forward back propagation neural network was chosen as the core software technology for this study. The architecture of this type of network is well known and documented (Rich and Knight, 1991). Furthermore, the applicability of this type of network to the task of portfolio allocation is evidenced by the favorable results of studies in other financial applications. The number of hidden layers, hidden nodes, learning rate, momentum factor, and training epochs will be adjusted as necessary to achieve convergence. A block diagram for the system appears in Figure 1.

Input factors: The input data begins with the raw factors required for the Dogs of the Dow and American Institute for Economic Research heuristics. This data consists of monthly measures of each of the following:

- The price of each stock in the Dow.
- The dividend payment of each stock in the Dow.

The above two raw factors are manipulated to produce the actual input factors for the neural network. The dividend yield of each stock can be calculated by projecting the most recent dividend payment over a one-year period and dividing by the current price. The stocks can then be ranked by dividend yield to produce a dividend rank. The dividend yield and dividend rank are both used as input factors. Network input factors derived from price are the percentage change in price versus last month, the percentage above or below the moving average of the last 6 months, the percentage below the 12-month high, and the percentage above the 12-month low.
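The price- and dividend-derived input factors just described could be computed along these lines. This is a sketch under assumptions the text does not settle: the moving average and the 12-month high and low are taken to include the current month, dividends are assumed to be paid quarterly (`payments_per_year=4`), and all function and key names are hypothetical.

```python
def price_factors(prices, t):
    """Price-derived factors for month t from a list of monthly prices.

    Requires t >= 11 so that a full 12-month window is available.
    Assumption: the 6- and 12-month windows include month t itself.
    """
    last6 = prices[t - 5 : t + 1]
    last12 = prices[t - 11 : t + 1]
    ma6 = sum(last6) / 6
    high12, low12 = max(last12), min(last12)
    return {
        "pct_change_vs_last": 100 * (prices[t] / prices[t - 1] - 1),
        "pct_vs_6mo_ma": 100 * (prices[t] / ma6 - 1),
        "pct_below_12mo_high": 100 * (1 - prices[t] / high12),
        "pct_above_12mo_low": 100 * (prices[t] / low12 - 1),
    }


def dividend_yield(last_payment, price, payments_per_year=4):
    """Project the most recent payment over a year and divide by price.

    payments_per_year=4 is an assumption (quarterly dividends).
    """
    return last_payment * payments_per_year / price


def dividend_ranks(yields):
    """Rank tickers by yield: 1 = highest yield. Ties broken alphabetically."""
    ordered = sorted(yields, key=lambda k: (-yields[k], k))
    return {ticker: rank for rank, ticker in enumerate(ordered, start=1)}


monthly = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0, 18.0, 19.0, 20.0, 25.0]
print(price_factors(monthly, 11))
print(dividend_ranks({"A": 0.03, "B": 0.05, "C": 0.04}))
```

Because the 12-month factors need a full year of history, the first usable month is the twelfth one in each series.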

Additionally, the 91-Day T-Bill rate is used as a raw input factor, a network input factor, and in calculation of interest added to the cash account. Derived input factors from this rate are the change since last month and the change since six months ago. Finally, a composite index reflecting a buy-and-hold strategy for all 30 Dow stocks is used as a raw input factor and a benchmark. Derived input factors from the Dow index are the change since last month, the change since six months ago, and the percentage change in the stock price versus the percentage change in the 30-Dow Composite over the preceding one-month and six-month time frames. The latter factors are an attempt to capture how each stock has performed recently, compared to the market.

Network outputs: A total of six network output configurations are tested: two prediction strategies over three time frames. For both of the prediction strategies, short term is defined to be one month in the future, medium term is defined to be six months in the future, and long term is defined to be 18 months in the future. The two prediction strategies are:

1. Prediction of total return over short, medium, and long periods in the future. The total return includes dividends, the change in stock price (capital gains), and the value of any spin-offs given to holders of a particular stock. The total return is smoothed over a uniform range of 0.1 to 0.9.
2. Prediction of total return vs. market total return over short, medium, and long periods. The return vs. the market is smoothed over a uniform range of 0.1 to 0.9.

A separate neural network is used for each combination of time frame and prediction strategy, resulting in six completely independent networks, each with its own parameters such as learning rate and training epochs.

Training the network and determination of the test period: Each stock's time series is pre-labeled with short, medium, and long term actual results at each time period for which such results are available.
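Pre-labeling a series with realized total returns might look like the sketch below. It rests on simplifying assumptions that are not from the paper: dividends are summed without reinvestment, spin-offs are ignored, and the names `HORIZONS`, `forward_total_return`, and `label_series` are hypothetical.

```python
HORIZONS = {"short": 1, "medium": 6, "long": 18}  # months, as defined in the text


def forward_total_return(prices, dividends, t, horizon):
    """Total return from month t to month t + horizon.

    Returns None when the future price is not yet in the data, which is
    what "for which such results are available" amounts to here.
    Assumptions: dividends are not reinvested and spin-offs are ignored.
    """
    if t + horizon >= len(prices):
        return None
    income = sum(dividends[t + 1 : t + horizon + 1])
    return (prices[t + horizon] + income) / prices[t] - 1


def label_series(prices, dividends):
    """Attach short, medium, and long term realized returns to every month."""
    return [
        {name: forward_total_return(prices, dividends, t, h)
         for name, h in HORIZONS.items()}
        for t in range(len(prices))
    ]


# Toy data: price rises by 1 each month, dividend of 1 paid every month.
toy_prices = [100.0 + i for i in range(20)]
toy_dividends = [1.0] * 20
labels = label_series(toy_prices, toy_dividends)
print(labels[0])
print(labels[-1])
```

The last 18 months of any series carry no long-term label, the last 6 no medium-term label, and the final month no short-term label.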
A rolling data window is used to train the network. This means that only a certain number of recent data points are used in training, and the network is retrained prior to each period's portfolio allocation. Because we retrain the network prior to each decision point, the question arises of which time period should be the latest one used for training. To accurately simulate the real world situation, the actual returns must never be used in any way as training data sooner than they would have been available to a real world system. We have defined the long-term output nodes to reflect the return over an 18-month period of time. Thus, to train the long-term networks accurately, we must end training 18 months prior to the period under test. Similarly, we must end training six months prior to the period under test for the medium-term networks, and one month prior to the period under test for the short-term networks.
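The look-ahead constraint above translates into simple index arithmetic. In this sketch, `w` stands for the training window size and `n` for the period under test, both counted in months from the start of the data; the function names are hypothetical, and whether the window endpoints are inclusive is an assumption the text does not settle.

```python
HORIZON_MONTHS = {"short": 1, "medium": 6, "long": 18}


def training_window(n, w, term):
    """Return (begin, end) period indices for training before testing period n.

    Training ends `h` months before n, where h is the prediction horizon,
    so that every label used was already observable at decision time.
    Endpoints are returned as stated; inclusivity is left to the caller.
    """
    h = HORIZON_MONTHS[term]
    return (n - w) - h, n - h


def first_test_period(w, longest_horizon=18):
    """Earliest period (0 = first month of data) with a full long-term window."""
    return w + longest_horizon


for term in HORIZON_MONTHS:
    print(term, training_window(200, 100, term))
print(first_test_period(100))
```

Shifting the whole window back by the horizon, rather than only truncating its end, keeps the amount of training data the same for all three time frames.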

To accomplish this, the training window varies for each network. Essentially this means we set the starting point of training based on the window size and the time frame of the network we are training. It is postulated that this approach may be superior to a fixed training window because nearer term predictions may be more sensitive to non-stationarity. If the window size is w and the period under test is n, the network is trained according to the following:

Output Node     Training Window Begin     Training Window End
Short-term      (n-w)-1                   n-1
Medium-term     (n-w)-6                   n-6
Long-term       (n-w)-18                  n-18

Table 1: Training Window Definition

The first test period begins at the start of available data (January 1961) + the training data window size (w) + 18 months. The test period ends at the last available data point (October 1998).

Trading systems: Once we have a neural network prediction of total return or return versus the market for all 30 Dow stocks over short, medium, and long time frames, we must translate this output into buy and sell orders in order to test the value of these predictions. There is a nearly infinite number of ways to do so. We tested a number of different approaches in order to gain a realistic analysis of the value of the network predictions. A trading approach consists of a set of rules for when and what to buy and sell. Each trading approach has one or more parameters defining specific values at which the rules are applied. The combination of a trading approach with a particular set of parameters constitutes a trading system. Each trading system uses the same network output or outputs, and manages a virtual account (portfolio) completely separate from all other trading systems. This allows a side-by-side comparison of the various trading systems to be made.

The rules common to all trading systems are:

- Each account starts with $100,000 at the beginning of the test period.
- All accounts are tax sheltered: tax-deferred or tax-free like an IRA, 401(k), etc.
- Each trade (buy or sell) costs a $12 flat rate commission.
- Trades can occur only one time per account per month.
- Buy trades which would result in a single stock comprising more than 25% of the portfolio's value will be reduced such that the result of the trade places no more than 25% of the portfolio's value in that stock. A single issue may rise above 25% of the portfolio's value due to market fluctuations without generating sell trades.
- Trades will only be made if transaction costs for the trade are less than 1% of the dollar amount to be traded. At $12 per trade this translates into a smallest trade size of $1,200.
- Cash not invested in stocks is presumed to earn the T-Bill rate.

The parameters for the trading systems are:

Parameter      Description                                                                              Range
NUMSTOCKS      The number of stocks to consider buying or selling at one time                           1-10
LTHOLD         Minimum holding period (months) for stocks purchased by long term trading approaches     6-18
MTHOLD         Minimum holding period (months) for stocks purchased by mixed term trading approaches    1-18
STHOLD         Minimum holding period (months) for stocks purchased by short term trading approaches    1-6
STOPLOSS%      Percent off purchase price to place stop losses                                          2%-32% in 4% increments
PRICETARGET%   Percent above purchase price which constitutes the sale target price                     10%-100% in 20% increments

Table 2: Trading System Parameters

The trading approaches, and their associated rules for trading, are as follows:

- Long Term Trader Fully Invested: Use all available cash to buy equal dollar amounts of the top NUMSTOCKS stocks based on long-term predicted performance. Sell a stock when it has been held for at least LTHOLD months and its current long-term performance prediction ranks below the top NUMSTOCKS. The last part of the sell rule avoids selling a stock that is immediately going to be repurchased.
- Short Term Trader Fully Invested: Use all available cash to buy equal dollar amounts of the top NUMSTOCKS stocks based on short-term predicted performance. Sell a stock when it has been held for at least STHOLD months and its short-term performance prediction ranks below the top NUMSTOCKS.
- Long Term Trader and Market Timer: Buy equal dollar amounts of the top NUMSTOCKS stocks based on long-term predicted performance. However, if the long-term predicted performance for any of the top NUMSTOCKS stocks is less than the current T-Bill rate, hold cash instead of that stock. Sell a stock when it has been held for at least LTHOLD months and its current long-term prediction ranks below the top NUMSTOCKS.
- Short Term Trader and Market Timer: Buy equal dollar amounts of the top NUMSTOCKS stocks based on short-term predicted performance. However, if the short-term predicted performance for any of the top NUMSTOCKS stocks is less than the current T-Bill rate, hold cash instead of that stock. Sell a stock when it has been held for at least STHOLD months and its short-term prediction ranks below the top NUMSTOCKS.
- Long Term Price Target Trader: Buy equal dollar amounts of the top NUMSTOCKS stocks based on long-term predicted performance. Sell a stock when the price falls below STOPLOSS% of the purchase price, when the price rises above PRICETARGET% of the purchase price, or when the stock has been held for at least LTHOLD months and its current long-term prediction ranks below the top NUMSTOCKS.
- Short Term Price Target Trader: Buy equal dollar amounts of the top NUMSTOCKS stocks based on short-term predicted performance. Sell a stock when the price falls below STOPLOSS% of the purchase price, when the price rises above PRICETARGET% of the purchase price, or when the stock has been held for at least STHOLD months and its short-term prediction ranks below the top NUMSTOCKS.
- Mixed Term Trader: Buy equal dollar amounts of the top NUMSTOCKS stocks based on the average of short, medium, and long-term predicted performance. Sell a stock when it has been held for at least MTHOLD months and the average of its short, medium, and long-term predicted performance ranks below the top NUMSTOCKS.
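To make the common rules concrete, the sketch below sizes buy orders for a fully invested trader and applies the hold-and-rank sell rule. It is one possible reading, not the study's implementation: splitting cash equally across the top NUMSTOCKS stocks (including ones already held), reserving the commission out of each slice, and treating a trade of exactly $1,200 as allowed are all assumptions, as are the function names.

```python
COMMISSION = 12.0             # flat commission per trade
MAX_WEIGHT = 0.25             # a buy may not lift a stock above 25% of the portfolio
MIN_TRADE = COMMISSION * 100  # the stated smallest trade size of $1,200


def top_ranked(predictions, numstocks):
    """Tickers of the top `numstocks` predictions (ties broken alphabetically)."""
    return sorted(predictions, key=lambda t: (-predictions[t], t))[:numstocks]


def plan_buys(cash, holdings, predictions, numstocks):
    """Return ticker -> dollars to buy.

    holdings maps ticker -> current dollar value held.
    Each buy is capped so the position stays within MAX_WEIGHT of the
    portfolio, and dropped entirely if it falls below MIN_TRADE.
    """
    portfolio_value = cash + sum(holdings.values())
    top = top_ranked(predictions, numstocks)
    budget = cash / len(top) - COMMISSION  # leave room for the commission
    orders = {}
    for ticker in top:
        room = MAX_WEIGHT * portfolio_value - holdings.get(ticker, 0.0)
        amount = min(budget, room)
        if amount >= MIN_TRADE:
            orders[ticker] = amount
    return orders


def should_sell(ticker, months_held, predictions, numstocks, min_hold):
    """Sell once the minimum hold has passed and the stock left the top ranks."""
    return months_held >= min_hold and ticker not in top_ranked(predictions, numstocks)


preds = {"A": 0.9, "B": 0.8, "C": 0.7, "D": 0.1}
print(plan_buys(100_000.0, {}, preds, numstocks=2))
print(should_sell("D", 6, preds, numstocks=2, min_hold=6))
```

With NUMSTOCKS below 4, the 25% cap leaves part of a new account in cash, which then earns the T-Bill rate under the common rules.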
RESULTS

Network Convergence: Getting the network to converge was initially very difficult. To have an additional measure of network convergence, the initial convergence testing was done using the performance vs. market data. That way, we were able to plot both mean squared error (MSE) and prediction error percentage. Prediction error percentage denotes how often the network was wrong in predicting whether a given stock would over- or under-perform the market. To achieve convergence, the following steps were required:

Use multiple networks with only one output node each. The original plan was to have short, medium, and long term output nodes all on the same network. It was decided to split this up, so that each network had only a single output node and multiple independent networks were used.

Smooth the training inputs and outputs to a uniform range. Originally, the input nodes were not smoothed at all, and the output nodes ranged from -1 to 1. It was decided to run the data through a range transformation on both the input and output nodes, transforming to a range of 0.1 to 0.9.

Experiment with a wider range of learning rates, as low as 0.01.

Set the training window (w) to 100 periods. Initially, this was to be determined experimentally starting at 2/3 of the data, but training on 2/3 of the data (about 300 periods) took a very long time for each test and introduced large amounts of non-stationarity into the training range.
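The range transformation mentioned in the steps above can be sketched as follows. The paper does not state the exact transformation used, so a simple linear (min-max) rescaling is assumed here, and the function name scale_to_range is hypothetical.

```python
def scale_to_range(values, lo=0.1, hi=0.9):
    """Linearly rescale values so the minimum maps to lo and the maximum to hi.

    Assumption: a min-max transformation; the original software may have
    used a different mapping onto the 0.1 to 0.9 range.
    """
    vmin, vmax = min(values), max(values)
    if vmax == vmin:
        # A constant series carries no information; place it mid-range.
        return [(lo + hi) / 2 for _ in values]
    span = vmax - vmin
    return [lo + (v - vmin) * (hi - lo) / span for v in values]


# Example: targets that originally ranged from -1 to 1.
scaled = scale_to_range([-1.0, -0.5, 0.0, 1.0])
print(scaled)
```

Keeping targets away from 0 and 1 avoids asking a sigmoid output node to reach its asymptotes, which is the usual motivation for a 0.1 to 0.9 target range.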

Figure 2: MSE plot for training/test data sample run (y-axis: MSE, 0 to 0.006; x-axis: Epoch X 10; series: Training, Test)


Figure 3: Prediction Error plot for training/test data sample run (y-axis: Prediction Error, 0 to 60; x-axis: Epoch X 10; series: Training, All Test, Best 50% Test)

By taking the aforementioned actions, satisfactory convergence was obtained on the initial test set. Figure 2 shows a plot of mean squared error (MSE) for one of the sample runs. The run uses a learning rate of 0.02 in a 3 layer network with 13 input nodes, 8 hidden nodes, and one output node predicting the long term performance vs. market for 30 Dow stocks, with 100 periods of training data and 50 periods of test data. Figure 3 shows a plot of prediction error for the same network and training/testing data set. Note that there are three lines in Figure 3: the training set, the entire testing set, and just the 50% of the testing set where the network's predictions were farthest from market performance. In other words, "best 50%" is the 50% of predictions where the network gave the clearest indication of over/under performance vs. the market.

Figures 2 and 3 show a network which converges at around 25,000 epochs, with a prediction accuracy at convergence far exceeding random chance (an error rate of around 35%). Of course, we would like to see near 100% accuracy at convergence, but 65% accuracy in predicting over/under performance of a stock vs. the entire market is a respectable showing nonetheless.

Once we had an idea of the training values that would produce convergence, we proceeded to determine the learning rate and training epochs required for convergence in the other 5 networks. Table 3 summarizes the values that yielded the best convergence found for each network.
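The two test-set error measures described for Figure 3 can be sketched as below. This is an illustration under stated assumptions, not the author's code: predictions and outcomes are taken to be signed returns relative to the market, with zero meaning market performance, and the function names are hypothetical. With outputs scaled to the 0.1 to 0.9 range, the neutral point would be the scaled value of zero instead.

```python
def prediction_error_pct(predicted, actual):
    """Percentage of cases where prediction and outcome disagree on over/under."""
    wrong = sum(1 for p, a in zip(predicted, actual) if (p > 0) != (a > 0))
    return 100.0 * wrong / len(predicted)


def best_half_error_pct(predicted, actual):
    """Error restricted to the 50% of predictions farthest from market performance."""
    # Rank prediction/outcome pairs by how far the prediction is from zero.
    ranked = sorted(zip(predicted, actual), key=lambda pa: abs(pa[0]), reverse=True)
    top = ranked[: max(1, len(ranked) // 2)]
    return prediction_error_pct([p for p, _ in top], [a for _, a in top])


# Hypothetical relative returns for four stocks.
predicted = [0.30, -0.25, 0.02, -0.01]
actual = [0.10, -0.05, -0.03, -0.02]
print(prediction_error_pct(predicted, actual))  # 25.0
print(best_half_error_pct(predicted, actual))   # 0.0
```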

Network                                     Learning Rate    Training Epochs
Short term return prediction                0.16             6000
Medium term return prediction               0.16             6000
Long term return prediction                 0.16             6000
Short term return vs. market prediction     0.16             10000
Medium term return vs. market prediction    0.04             20000
Long term return vs. market prediction      0.02             25000
Table 3: Experimentally Determined Network Parameters

Prediction Runs: As mentioned, 100 periods of trading data were used in determining the network parameters above, as this gave satisfactory convergence. 12 data periods are needed prior to the start of training to ensure an accurate assessment of the 12-month high and low price for each stock. 18 data periods are needed after the training set to ensure that no training data contaminates the test data set for the long-term predictions. Therefore, the first period available for test is period 130, which corresponds to November 1971. The last period available for test is the last period in the data set, period 453, which corresponds to October 1998. This gives us a net of 323 periods upon which to run testable predictions.

For each test period, each of the six networks was trained on 100 periods of data with the experimentally determined learning rate for the experimentally determined number of training epochs. As described above, each network was re-trained for each single test period prediction. The most recent available 100 periods of data that did not contain any future results were always used for training each network. Table 4 summarizes the training periods used for the first test period, as an example.
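The training ranges listed in Table 4 are consistent with a simple rule: the training range ends one prediction horizon before the test period and begins 100 periods earlier. The sketch below reproduces the Table 4 endpoints under that reading. The function name is hypothetical, and the 1, 6, and 18 month horizons are taken from the abstract.

```python
WINDOW = 100  # training window length in periods, as stated in the text

# Prediction horizons in months (1, 6, and 18, per the abstract).
HORIZONS = {"short": 1, "medium": 6, "long": 18}


def training_range(test_period, horizon, window=WINDOW):
    """Return (begin, end) of the training range for one test period.

    The last usable training period lies `horizon` periods before the test
    period, so that its target outcome is already known at prediction time.
    """
    end = test_period - horizon
    begin = end - window
    return begin, end


for name, horizon in HORIZONS.items():
    print(name, training_range(130, horizon))
# short (29, 129)
# medium (24, 124)
# long (12, 112)
```

The same rule gives the earliest possible test period: 12 periods of price history, plus the 100 period window, plus the 18 period long-term horizon yields period 130.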

Network                                     Training begin   Training end
Short term return prediction                29               129
Medium term return prediction               24               124
Long term return prediction                 12               112
Short term return vs. market prediction     29               129
Medium term return vs. market prediction    24               124
Long term return vs. market prediction      12               112

Table 4: Example of training ranges used for test period 130

Trading Systems: The trading systems were coded to make use of the test predictions in simulated trading. The best result from each trading system is summarized below, followed by result statistics for those parameters, and finally by result statistics averaged across that combination of trading approach, prediction approach, and stocks-at-a-time parameter. The result statistics are given in terms of total return over the test period, the total return rendered as an annualized yield, the total return versus the buy-and-hold Dow index total return, and the standard deviation of 12 month returns as a quantitative measure of portfolio volatility. For comparison purposes, the statistics for the buy-and-hold Dow index portfolio over the same period of study are:

Total Return=2769.14%
Annualized Return=13.2812%
Standard Deviation of 12 month returns=0.155832

Results of each trading approach follow:

Long Term Trader Fully Invested:
Results for total return vs. market prediction, 2 stocks at a time:
Longs held for at least 8 months:
Tot. Return=8256.44%, Annualized=17.8708%, 5487.3% return vs. DOW, StdDev=0.228657
* Average total return: 5172.42%
* Average total return vs. Dow: 2403.29%
* Average annualized total return: 15.8712%

Short Term Trader Fully Invested:
Results for total return vs. market prediction, 7 stocks at a time:
Longs held for at least 2 months:
Tot. Return=3373.66%, Annualized=14.0888%, 604.521% return vs. DOW, StdDev=0.202818
* Average total return: 2569.54%
* Average total return vs. Dow: -199.593%
* Average annualized total return: 12.9782%

Mixed Term Trader Fully Invested:
Results for total return prediction, 1 stock at a time:
Longs held for at least 11 months:
Tot. Return=8707.37%, Annualized=18.1012%, 5938.24% return vs. DOW, StdDev=0.259385
* Average total return: 3180.55%
* Average total return vs. Dow: 411.413%
* Average annualized total return: 13.8466%

Long Term Trader Market Timer:
Results for total return vs. market prediction, 2 stocks at a time:
Longs held for at least 8 months:
Tot. Return=9515.04%, Annualized=18.4868%, 6745.91% return vs. DOW, StdDev=0.233792
* Average total return: 5391.05%
* Average total return vs. Dow: 2621.91%
* Average annualized total return: 16.0463%

Short Term Trader Market Timer:
Results for total return prediction, 10 stocks at a time:
Longs held for at least 6 months:
Tot. Return=2974.78%, Annualized=13.5729%, 205.643% return vs. DOW, StdDev=0.159123
* Average total return: 2274.02%
* Average total return vs. Dow: -495.118%
* Average annualized total return: 12.4868%

Long Term Price Target Trader:
Results for total return prediction, 3 stocks at a time:
Stop Loss at 2% below purchase price:
Price Target at 90% above purchase price:
Longs held for at least 13 months:
Tot. Return=11276.4%, Annualized=19.2296%, 8507.31% return vs. DOW, StdDev=0.271124
* Average total return: 3483.17%
* Average total return vs. Dow: 714.037%
* Average annualized total return: 14.2204%

Short Term Price Target Trader:
Results for total return vs. market prediction, 6 stocks at a time:
Stop Loss at 10% below purchase price:
Price Target at 10% above purchase price:
Longs held for at least 4 months:
Tot. Return=4027.77%, Annualized=14.8224%, 1258.64% return vs. DOW, StdDev=0.19401
* Average total return: 2593.8%
* Average total return vs. Dow: -175.333%
* Average annualized total return: 13.0162%
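The annualized figures reported above appear consistent with compounding each total return over the 323 monthly test periods, and the "return vs. DOW" figures with a simple difference of total returns. The sketch below shows that arithmetic. The function names are hypothetical, and the compounding convention is inferred from the reported numbers rather than stated in the paper.

```python
TEST_MONTHS = 323           # number of monthly test periods reported in the text
DOW_TOTAL_RETURN = 2769.14  # buy-and-hold Dow total return, in percent


def annualized_pct(total_return_pct, months=TEST_MONTHS):
    """Convert a total return (percent) over `months` into an annualized percent."""
    growth = 1.0 + total_return_pct / 100.0
    return (growth ** (12.0 / months) - 1.0) * 100.0


def excess_vs_dow(total_return_pct, dow_pct=DOW_TOTAL_RETURN):
    """Total return minus the buy-and-hold Dow total return, in percentage points."""
    return total_return_pct - dow_pct


print(round(annualized_pct(DOW_TOTAL_RETURN), 2))  # 13.28
print(round(annualized_pct(8256.44), 2))           # 17.87
print(round(excess_vs_dow(8256.44), 2))            # 5487.3
```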

Statistical Analysis: In order to establish the significance of the results, a statistical analysis of the monthly returns was performed on the best trading system found for each trading approach. The results of this analysis are given below. For each test, Dow is the buy-and-hold Dow index and the subject trading approach is denoted by its initials (for instance, LTTFI is Long Term Trader, Fully Invested). The final test establishes the statistical difference between the best long-term approach and the simpler LTTFI approach.
Paired T-Test and CI: Dow, LTTFI
Paired T for Dow - LTTFI

              N     Mean      StDev    SE Mean
Dow           323   1.146     4.516    0.251
LTTFI         323   1.535     5.662    0.315
Difference    323   -0.389    3.098    0.172

95% CI for mean difference: (-0.728, -0.050)
T-Test of mean difference = 0 (vs not = 0): T-Value = -2.26  P-Value = 0.025

Paired T-Test and CI: Dow, STTFI
Paired T for Dow - STTFI

              N     Mean      StDev    SE Mean
Dow           323   1.146     4.516    0.251
STTFI         323   1.234     5.098    0.284
Difference    323   -0.089    2.219    0.123

95% CI for mean difference: (-0.331, 0.154)
T-Test of mean difference = 0 (vs not = 0): T-Value = -0.72  P-Value = 0.474

Paired T-Test and CI: Dow, MTTFI
Paired T for Dow - MTTFI

              N     Mean      StDev    SE Mean
Dow           323   1.146     4.516    0.251
MTTFI         323   1.560     5.911    0.329
Difference    323   -0.414    3.782    0.210

95% CI for mean difference: (-0.828, -0.000)
T-Test of mean difference = 0 (vs not = 0): T-Value = -1.97  P-Value = 0.050

Paired T-Test and CI: Dow, LTTMT
Paired T for Dow - LTTMT

              N     Mean      StDev    SE Mean
Dow           323   1.146     4.516    0.251
LTTMT         323   1.583     5.753    0.320
Difference    323   -0.437    3.240    0.180

95% CI for mean difference: (-0.792, -0.082)
T-Test of mean difference = 0 (vs not = 0): T-Value = -2.42  P-Value = 0.016

Paired T-Test and CI: Dow, STTMT
Paired T for Dow - STTMT

              N     Mean      StDev    SE Mean
Dow           323   1.146     4.516    0.251
STTMT         323   1.176     4.708    0.262
Difference    323   -0.0304   1.5072   0.0839

95% CI for mean difference: (-0.1954, 0.1346)
T-Test of mean difference = 0 (vs not = 0): T-Value = -0.36  P-Value = 0.717

Paired T-Test and CI: Dow, LTPTT
Paired T for Dow - LTPTT

              N     Mean      StDev    SE Mean
Dow           323   1.146     4.516    0.251
LTPTT         323   1.650     6.131    0.341
Difference    323   -0.504    3.816    0.212

95% CI for mean difference: (-0.922, -0.086)
T-Test of mean difference = 0 (vs not = 0): T-Value = -2.37  P-Value = 0.018

Paired T-Test and CI: Dow, STPTT
Paired T for Dow - STPTT

              N     Mean      StDev    SE Mean
Dow           323   1.146     4.516    0.251
STPTT         323   1.287     5.085    0.283
Difference    323   -0.141    2.175    0.121

95% CI for mean difference: (-0.379, 0.097)
T-Test of mean difference = 0 (vs not = 0): T-Value = -1.17  P-Value = 0.245

Paired T-Test and CI: LTTFI, LTPTT
Paired T for LTTFI - LTPTT

              N     Mean      StDev    SE Mean
LTTFI         323   1.535     5.662    0.315
LTPTT         323   1.650     6.131    0.341
Difference    323   -0.115    3.447    0.192

95% CI for mean difference: (-0.492, 0.262)
T-Test of mean difference = 0 (vs not = 0): T-Value = -0.60  P-Value = 0.549
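For readers who wish to check these figures, a paired t statistic is the mean of the monthly differences divided by its standard error. The sketch below recomputes the Dow vs. LTTFI test from the reported summary values. The function names are hypothetical. The p-value here uses a normal approximation, which is an assumption made to keep the example within the Python standard library; with 322 degrees of freedom it is close to, but not identical to, the t-distribution value reported above.

```python
import math
from statistics import NormalDist, mean, stdev


def paired_t_from_summary(mean_diff, sd_diff, n):
    """Return (standard error, t statistic, approximate two-sided p-value)."""
    se = sd_diff / math.sqrt(n)
    t = mean_diff / se
    # Large-sample normal approximation to the two-sided p-value.
    p_approx = 2.0 * (1.0 - NormalDist().cdf(abs(t)))
    return se, t, p_approx


def paired_t(a, b):
    """Paired test on two equal-length series of monthly returns."""
    diffs = [x - y for x, y in zip(a, b)]
    return paired_t_from_summary(mean(diffs), stdev(diffs), len(diffs))


# Reported summary for Dow - LTTFI: mean -0.389, StDev 3.098, N 323.
se, t, p = paired_t_from_summary(-0.389, 3.098, 323)
print(round(se, 3), round(t, 2))  # 0.172 -2.26
```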

CONCLUSIONS

A few notes are in order regarding network convergence. Due to the extremely chaotic nature of stock data, network convergence was particularly subtle and difficult to obtain. At first, two networks with three output nodes each (long, medium, and short) were tried. Convergence was not obtained with this architecture, which is why we eventually split each time frame into its own network for each of the two prediction strategies, yielding a total of six networks. Prior to this architecture change, increasing the number of hidden nodes, increasing the number of layers, and adding a momentum term were all tried without successful convergence. Furthermore, the initial configuration did not smooth input and output nodes to a uniform range. In order to achieve convergence it was necessary to transform all input and output variables to a range of 0.1 to 0.9. This range was recommended (Rich and Knight, 1991) in order to capture the optimal range of output discrimination. Finally, we had to try much smaller learning rates, as small as 0.01, before the learning curves became smooth enough to determine if and when convergence was occurring. Even with the above actions, convergence was not always clear and dramatic, but it was at least discernible in most cases.

As mentioned in the results section, the training window size of 100 periods was settled on without a great degree of experimentation. Each convergence test required several hours to run, and convergence was difficult to obtain in the first place. Because a window size of 100 periods was shown to produce convergence and also left a large time period to test (over 26 years), it was decided that further experimentation with the window size would not be beneficial.

A final caveat is in order regarding the poor short-term results and their implications for the efficacy of the short-term network predictions. Most of the initial network architecture and tuning work was done with long term predictions. It was initially believed that the network would have more difficulty making long-term predictions than short-term predictions. Therefore, the long-term network was architected and tuned to convergence first. The resultant architecture of the long-term network was then used as a starting point in tuning the short and medium term networks. It is possible that, had the short-term predictions been used as a starting point in convergence testing, the short-term results would have been better.

Turning to the trading systems output, some patterns are very clear from the quantitative results. Trading approaches which used the long term (18 month) predictions as the sole or partial basis for trading outperformed the buy-and-hold Dow index on a statistically significant basis, with p-values ranging from 0.016 to 0.025 for the three long-term systems and 0.050 for the Mixed Term Trader. Trading approaches that used short-term (1-month) predictions did not outperform the Dow on a statistically significant basis. Frequent trading (1-5 months) was particularly devastating to returns. See Appendix D for examples of short term trading systems with their associated poor results.

The increased returns of the long term trading systems did carry a measure of increased volatility, as evidenced by increased standard deviation of both annual and monthly returns versus the buy-and-hold Dow index. Thus, in the final analysis there is still a trade-off to be made between using this system and simply buying and holding all Dow stocks. An investor must decide if the increased volatility is justified by the increased returns.

The "best" trading system found was the Long Term Price Target Trader, using total return predictions with a 2% stop loss and 90% price target, purchasing 3 stocks at a time. This system produced an annualized return about 6 percentage points higher than the Dow (19.23% versus 13.28%). It is notable that the stop loss and price target values found mirror the classic trader's adage, "cut your losers and let your winners run."

The results of this study on the efficacy of the two prediction strategies tested are inconclusive.
While the trading system yielding the highest return was based upon the total return prediction strategy, there is no statistically significant difference between this system and other systems using the total return vs. market total return prediction strategy. The p-value for the mean difference between the Long Term Price Target Trader using the total return prediction strategy and the Long Term Trader Fully Invested using the total return vs. market prediction strategy is 0.549, which indicates no statistically significant difference.

Even though the dataset used was the same as that used for the AIER heuristic, the date range available for testing in this study was different from that of the AIER study. This is due to the omission of a percentage of the data from test, which this method requires to be set aside for establishing initial values and training the network for its first predictions. Therefore, exact statistical comparisons cannot reasonably be made with the nominal performance of the Dogs of the Dow and AIER heuristics. However, a reasonable comparison can be made in terms of return surplus: how much each system outperformed the underlying index (the Dow) over an extended test period. As cited in the literature review, depending on the time period studied, the Dogs of the Dow strategy returns 2-3 percent more than the Dow index itself (Pratt, 1998). The AIER heuristic generated a return that was 5.3% higher than the overall Dow, and 2.8% higher than the Dogs of the Dow heuristic (Pratt, 1998). The best trading system found by this study yielded a return a little under 6% higher than the buy-and-hold Dow strategy did. Thus, the results of this study appear to be at least on par with, if not slightly better than, the AIER heuristic. The results are clearly superior to the Dogs of the Dow strategy and the buy-and-hold Dow index itself.

To summarize, the key findings of this study are:

The efficacy of normalizing the input and output nodes of a neural network to a uniform range of 0.1 to 0.9.

The efficacy of separating different prediction time frames (short, medium, and long) into entirely separate neural networks.

The achievement of 65% accuracy at predicting the over or under performance of stocks in the Dow vs. the entire Dow index over an 18 month time frame, across a 50 month (4 years 2 months) test period.

The statistically significant superiority of long-term prediction based trading approaches over the buy-and-hold Dow index, and the achievement of annualized returns approximately 6% greater than the buy-and-hold Dow index over a 26 year test period.

The lack of superiority exhibited by short-term prediction based trading systems when compared to the buy-and-hold Dow index system.
The determination of the trading system giving the highest return: the Long Term Price Target Trader, using the total return prediction strategy with a stop-loss at 2% below the purchase price, a price target at 90% above the purchase price, purchasing 3 stocks at a time and holding for at least 13 months.

ACKNOWLEDGEMENT

I am grateful to Dr. Hsinchun Chen of the University of Arizona for his guidance and assistance with this research project.

REFERENCES

Adya, Monica and Collopy, Fred. 1998. How Effective are Neural Networks at Forecasting and Prediction? A Review and Evaluation. Journal of Forecasting 17, 481-495 (1998).

Atiya, Amir. 1996. Design of Time-Variable Stop Losses and Profit Objectives Using Neural Networks. In Decision Technologies for Financial Engineering: Proceedings of the Fourth International Conference on Neural Networks in the Capital Markets (NNCM 96). Pasadena, California, USA: World Scientific.

Bassi, Danilo F. 1995. Stock price prediction by recurrent multilayer neural network architectures. In Neural Networks in Financial Engineering: Proceedings of the Third International Conference on Neural Networks in the Capital Markets. London, England: World Scientific.

Bengio, Yoshua. 1996. Training a Neural Network with a Financial Criterion rather than a Prediction Criterion. In Decision Technologies for Financial Engineering: Proceedings of the Fourth International Conference on Neural Networks in the Capital Markets (NNCM 96). Pasadena, California, USA: World Scientific.

Burgess, A. N. 1995. Statistical yield curve arbitrage in Eurodollar futures using neural networks. In Neural Networks in Financial Engineering: Proceedings of the Third International Conference on Neural Networks in the Capital Markets. London, England: World Scientific.

Chandra, Nidhi and Reeb, David M. 1999. Neural networks in a market efficiency context. American Business Review. v17n1 (Jan 1999) 39-44.

Choey, Mark and Weigend, Andreas S. 1996. Nonlinear trading models through Sharpe ratio maximization. In Decision Technologies for Financial Engineering: Proceedings of the Fourth International Conference on Neural Networks in the Capital Markets (NNCM 96). Pasadena, California, USA: World Scientific.

Flower, Barry and Cripps, Tony and Jabri, Marwan and White, Andrew. 1995. An Artificial Neural Network Based Trade Forecasting System for Capital Markets. In Neural Networks in Financial Engineering: Proceedings of the Third International Conference on Neural Networks in the Capital Markets. London, England: World Scientific.

Gencay, Ramazan and Stengos, Thanasis. 1998. Moving Average Rules, Volume, and the Predictability of Security Returns with Feedforward Networks. Journal of Forecasting 17, 401-414 (1998).

Gencay, Ramazan. 1999. Linear, non-linear and essential foreign exchange rate prediction with simple technical trading rules. Journal of International Economics 47 (1999), 92-107.

Gutjahr, Steffen. 1996. Improving neural prediction systems by building independent committees. In Decision Technologies for Financial Engineering: Proceedings of the Fourth International Conference on Neural Networks in the Capital Markets (NNCM 96). Pasadena, California, USA: World Scientific.

http://www.dogsofthedow.com/dogsteps. 1999. Dogs of the Dow: Dog Steps. Dogsofthedow.com.

Lawrence, Steve and Giles, C. Lee and Tsoi, Ah Chung. 1996. Symbolic conversion, grammatical inference and rule extraction for foreign exchange rate prediction. In Decision Technologies for Financial Engineering: Proceedings of the Fourth International Conference on Neural Networks in the Capital Markets (NNCM 96). Pasadena, California, USA: World Scientific.

Klenin, Marjorie. 1996. Neural Networks for Risk Analysis in Stock Price Forecasts. In Decision Technologies for Financial Engineering: Proceedings of the Fourth International Conference on Neural Networks in the Capital Markets (NNCM 96). Pasadena, California, USA: World Scientific.

Mahoney, John E. 1978. Buy low sell high: anyone can make money in the market the formula way. Toronto: Pagurian Press.

Malkiel, Burton G. 1985. A Random Walk Down Wall Street, Fourth Edition. New York: W. W. Norton & Company.

Mamis, J. and Mamis, R. 1977. When to sell: inside strategies for stockmarket profits. New York: Farrar, Straus and Giroux.

Moody, John and Wu, Lizhong and Liao, Yuansong and Saffell, Matthew. 1998. Performance Functions and Reinforcement Learning for Trading Systems and Portfolios. Journal of Forecasting 17, 441-470 (1998).

Pi, Hong and Rognvaldsson, Thorsteinn S. 1995. A Neural Network Approach to Futures Trading. In Neural Networks in Financial Engineering: Proceedings of the Third International Conference on Neural Networks in the Capital Markets. London, England: World Scientific.

Pratt, Lawrence S. 1998. How to Invest Wisely with Toward an Optimal Stock Selection Strategy. Great Barrington, Massachusetts: American Institute for Economic Research.

Rich, Elaine and Knight, Kevin. 1991. Artificial Intelligence, 2nd edition. McGraw-Hill, Inc.

Steiner, Manfred and Wittkemper, Hans-Georg. 1997. Portfolio optimization with a neural network implementation of the coherent market hypothesis. European Journal of Operational Research 100 (1997) 27-40.

Vaga, T. 1994. Profiting from Chaos: Using Chaos Theory for Market Timing, Stock Selection, and Option Valuation. New York: McGraw-Hill, Inc.

Zimmermann, Hans-Georg and Weigend, Andreas S. 1996. Representing dynamical systems in feed-forward networks: A six layer architecture. In Decision Technologies for Financial Engineering: Proceedings of the Fourth International Conference on Neural Networks in the Capital Markets (NNCM 96). Pasadena, California, USA: World Scientific.
