
Stat Comput

DOI 10.1007/s11222-013-9375-7

A novel Hybrid RBF Neural Networks model as a forecaster


Oguz Akbilgic · Hamparsum Bozdogan · M. Erdal Balaban

Received: 11 October 2011 / Accepted: 3 January 2013
© Springer Science+Business Media New York 2013

Abstract We introduce a novel predictive statistical modeling technique called Hybrid Radial Basis Function Neural
Networks (HRBF-NN) as a forecaster. HRBF-NN is a flexible forecasting technique that integrates regression trees and ridge regression with radial basis function (RBF) neural networks (NN). We develop a new computational procedure using model selection based on information-theoretic principles as the fitness function within the genetic algorithm (GA) to carry out subset selection of the best predictors. As is well known, due to the dynamic and chaotic nature of the underlying stock market process, the task of generating economically useful stock market forecasts is difficult, if not
impossible. HRBF-NN is well suited for modeling complex
non-linear relationships and dependencies between the stock
indices. We propose HRBF-NN as our forecaster and a predictive modeling tool to study the daily movements of stock
indices. We show numerical examples to determine a predictive relationship between the Istanbul Stock Exchange
National 100 Index (ISE100) and seven other international
stock market indices. We select the best subset of predictors
by minimizing the information complexity (ICOMP) criterion as the fitness function within the GA. Using the best
subset of variables we construct out-of-sample forecasts for the ISE100 index to determine the daily directional movements. Our results demonstrate the utility and the flexibility of HRBF-NN as a clever predictive modeling tool for highly dependent and nonlinear data.

Keywords Forecasting · Stock markets · Neural networks · Variable selection · Radial basis functions

O. Akbilgic (✉) · M.E. Balaban
Istanbul University School of Business Administration, Istanbul, Turkey
e-mail: oguzakbilgic@gmail.com

M.E. Balaban
e-mail: balaban@istanbul.edu.tr

H. Bozdogan
Statistics, Operations, and Management Science, and Center for Intelligent Systems and Machine Learning (CISML), The University of Tennessee, Knoxville 37996, USA
e-mail: bozdogan@utk.edu

1 Introduction
We introduce a novel predictive statistical modeling approach called Hybrid Radial Basis Function Neural Networks (HRBF-NN) as a forecaster. HRBF-NN is a flexible
forecasting technique that integrates regression trees and ridge regression with radial basis function (RBF) neural
networks (NN). We develop a new computational technique
based on information complexity (ICOMP) criterion (Bozdogan 1994, 2000, 2004) as the fitness function within the
genetic algorithm (GA) to carry out subset selection of best
predictors.
New-generation research argues that one needs to look at financial market data on stock indices from an evolutionary, or more adaptive biological, point of view rather than from a physical-systems point of view. Traditional modeling techniques for the financial markets, such as random walk models (i.e., Brownian motion ideas) and rigid stochastic processes, do not adapt or lend themselves to the ever-changing dynamics of financial data (Lo and MacKinlay 1988). They are inflexible and do not capture some of the behavioral and psychological issues that financial portfolio managers and investors face.
Because of this, and due to the dynamic and chaotic nature of the underlying stock market process, the task of generating economically useful stock market forecasts is difficult, if not impossible, even in the presence of high multicollinearity between the explanatory variables.
Akbilgic and Bozdogan (2011) studied the predictive
performance of HRBF-NN predictive model using a large
scale simulation under different and complicated structures
and compared the results to the classical regression model.
Based on the encouraging and very positive results of these
authors, in this paper, we propose HRBF-NN predictive
model specifically as an application to forecast the ISE100
index and to determine its daily directional movements. In
what follows, we provide our numerical example on the
ISE100 index data set between January 5, 2009 and February 22, 2011 as our response (or output) variable and other
international stock indices as our explanatory (or input) variables. We derive the information complexity (ICOMP) criterion (Bozdogan 1994, 2000, 2004) as the fitness function
using the genetic algorithm (GA) to choose the best subset
of predictors. Using the best subset of explanatory variables
we construct out-of-sample forecasts for the ISE100 index
to determine its daily directional movements.
The paper is organized as follows. In Sect. 2, we present the current state of the literature on the interaction of the international markets as a whole and the usage of Radial Basis Function (RBF) Neural Networks (NN). In Sect. 3, we present Radial Basis Function Neural Networks (RBF-NN); the estimation of the optimal network weights using data-adaptive ridge regression; the combination of regression trees and RBF-NN; and how to transform the tree nodes into RBFs to construct the HRBF-NN predictive model. In Sect. 4, we show the derived forms of the information complexity (ICOMP) criterion under the HRBF-NN predictive model. We present the two forms of ICOMP: one when the model is correctly specified, and the other when the model is misspecified, using the robust covariance estimator of the HRBF-NN model. Further, in that section, we present the use of the genetic algorithm (GA) for subset selection of the best predictors and a summary of its implementation. In Sect. 5, we present and discuss our numerical example on the ISE100 index data set. We select the best subset of predictors by minimizing the information complexity (ICOMP) criterion as the fitness function within the GA. Using the best subset of variables we construct out-of-sample forecasts for the ISE100 index to determine the daily directional movements. Section 6 concludes the paper with some discussion.

2 Current state of the literature


Due to the globalization of the financial markets, forecasting stock market movements has become difficult. Because of the inter-dependency of stock markets with each other, one should take this dependency into consideration in order to make accurate forecasts. This interdependency has an especially large impact on stock market indices in developing countries, such as Turkey.
In reviewing the literature, we note that there are several
studies which show the influence of international markets on
the Istanbul Stock Exchange (ISE) and predict the direction
of movements in the ISE100 index. For example, Korkmaz et al. (2011), in their causality study, showed that the ISE100 index is affected by the US markets. In another study, Ozun (2007) showed the influence of volatility in advanced markets on stock markets in developing countries, in particular the Turkish and Brazilian stock markets. In his study, Ozun (2007) also showed that the US markets have an influence in the positive direction, i.e., when the US markets go up, the ISE100 index goes up. Further, Vuran (2000) showed that the ISE100 index is co-integrated with the stock markets of the United Kingdom (FTSE), Brazil (BOVESPA), and Germany (DAX). In their study, Boyacioglu and Avci (2010) used the BOVESPA, Dow Jones Industrials (DJI), and DAX indices along with other macro- and micro-economic variables as predictors to forecast the ISE100 index. Cinko and Avci
(2007) compared artificial neural networks and regression
models in forecasting the daily values of the ISE100. Their
results show that neural network models perform better
than the classical regression model. In their study, they used
only the lagged series of ISE100 as explanatory variables.
On the other hand, Ozdemir et al. (2011) used the MSCI
Emerging Markets (EM), MSCI European (EU), and S&P
500 (SP) indices along with other macroeconomic indicators to forecast the direction of movement of the ISE100 index.
Further, we also note that Radial Basis Function (RBF)
Neural Networks (NN) (RBF-NN) have been used as an alternative method in forecasting problems in stock markets.
The RBF-NN (Broomhead and Lowe 1988) model is a
special type of feed-forward neural network with one input, one hidden, and one linear output layer. In RBF-NN
the number of parameters is smaller than in Multilayer Neural Networks (ML-NN) because the inputs are directly connected to the hidden layer without weights (Haykin 1999). The adjustable parameters of an RBF-NN are its centers, widths, and weights; the center and width parameters belong to the hidden layer, while the weight parameters are the connection weights between the hidden and output layers. Traditional RBF-NN learning algorithms determine the best parameters using iterative techniques such as gradient descent and forward selection procedures.
There are several methods proposed to automatically
construct an RBF-NN model. In their study, Sun et al. (2005)
combined an optimal partition algorithm with RBF-NN in order to determine the center and width parameters. Further, they showed the efficiency of their method by applying it to the prediction of the S&P 500 (SP) and Shanghai Stock Exchange indices. Rivas et al. (2004) used an evolutionary algorithm to automatically determine the center and width parameters of the RBF-NN model and applied their results to forecast the exchange rate between the British pound and the US dollar. Short-term electricity price forecasting has also been handled by a self-adaptive RBF-NN, in which the RBF parameters are determined by fuzzy c-means and differential evolution algorithms (Meng et al. 2008).
Although RBF-NN has been used by other researchers in the literature above for forecasting stock market indices, what makes our approach in this paper different is the introduction of the genetic algorithm (GA) for subset selection of the best predictors in the HRBF-NN model, which prunes redundant predictors using an objective entropic model selection procedure, the information complexity (ICOMP) criterion (Bozdogan 1994, 2000, 2004), as our fitness function to choose the best subset of variables to be used in forecasting. In this process, our proposed HRBF-NN method is also based on the idea of using regression trees to determine the number of radial basis functions in the hidden layer. We estimate the center and width parameters data-adaptively, based on the work of Kubat (1998) and Orr (2000). We address the problem of singularity of the model matrix by using adaptive ridge regression (Tikhonov and Arsenin 1977) in estimating the weight parameters. Moreover, we choose the best fitting RBF by scoring and minimizing the ICOMP values among a portfolio of candidate RBFs.
Next, we present the detailed step-by-step explanation of
HRBF-NN predictive model.

3 Hybrid Radial Basis Function Neural Networks:


HRBF-NN model
In this section, we introduce the individual components of the Hybrid Radial Basis Function Neural Networks (HRBF-NN) model which we utilize in this paper.

3.1 Radial Basis Function Neural Networks

RBF-NNs are a type of general linear model where the input data are transferred to a feature space by non-linear transformations using radial basis functions. In this sense, the RBF-NN model can be written as a general linear model with the dependent variable y and independent (or explanatory) variables x1, x2, . . . , xm in the following form

y = f(w, x) = Σ_{j=1}^{m} w_j h_j(x) = w_1 h_1 + w_2 h_2 + · · · + w_m h_m,   (1)

where the regressors {h_j(x)}_{j=1}^{m} are fixed radial basis functions of the explanatory variables x ∈ R^n, and {w_j}_{j=1}^{m} are the unknown adaptable coefficients (weights) (Howlett and Jain 2001).

In model (1), we can use several different RBFs, which provides the flexibility to handle non-linear relations. In this paper, we consider the Gaussian, Cauchy, Multiquadratic, and Inverse Multiquadratic RBFs, given as follows (Howlett and Jain 2001).

Gaussian RBF (GS):

h_j(x) = exp( −Σ_{k=1}^{p} (x_k − c_jk)² / r_jk² )   (2)

Cauchy RBF (CH):

h_j(x) = 1 / ( 1 + exp( Σ_{k=1}^{p} (x_k − c_jk)² / r_jk² ) )   (3)

Multiquadratic RBF (MQ):

h_j(x) = sqrt( 1 + exp( Σ_{k=1}^{p} (x_k − c_jk)² / r_jk² ) )   (4)

Inverse Multiquadratic RBF (IMQ):

h_j(x) = 1 / sqrt( 1 + exp( Σ_{k=1}^{p} (x_k − c_jk)² / r_jk² ) )   (5)

In short, the RBF-NN introduces a mapping or transformation of the n-dimensional inputs non-linearly to an m-dimensional space and then estimates a model using linear regression. The non-linear transformation is achieved using m basis functions, each characterized by its center c_j in the (original) input space and a width or radius vector r_j, j ∈ {1, 2, . . . , m} (Orr 2000). Poggio and Girosi (1990) have shown that RBF-NNs possess the property of best approximation.
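To make the mapping in (1)-(2) concrete, the following is a minimal NumPy sketch (not the authors' MATLAB implementation) of how a design matrix H of Gaussian RBF outputs can be built; the function name gaussian_rbf_design is hypothetical, and the centers C and radii R are assumed to be supplied, e.g., from the regression tree of Sect. 3.3.

    import numpy as np

    def gaussian_rbf_design(X, C, R):
        """Build the p x m design matrix H with H[i, j] = h_j(x_i) as in Eq. (2).

        X : (p, n) array of input patterns
        C : (m, n) array of RBF centers c_j
        R : (m, n) array of RBF radii r_j (one width per dimension)
        """
        # squared, radius-scaled distances for every (pattern, basis) pair
        D = ((X[:, None, :] - C[None, :, :]) / R[None, :, :]) ** 2   # (p, m, n)
        return np.exp(-D.sum(axis=2))                                # (p, m)

    # toy usage: 5 patterns, 2 inputs, 3 basis functions
    X = np.random.rand(5, 2)
    C = np.random.rand(3, 2)
    R = np.full((3, 2), 0.5)
    H = gaussian_rbf_design(X, C, R)   # shape (5, 3)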

3.2 Estimation of weights: ridge regression

Given a network (or model) in (1) consisting of m RBFs with centers {c_j}_{j=1}^{m} and radii {r_j}_{j=1}^{m}, and a training set with p patterns {(x_i, y_i)}_{i=1}^{p}, the optimal network weights can be found by minimizing the sum of squared errors

SSE = Σ_{i=1}^{p} ( f(x_i) − y_i )²   (6)

and is given by

ŵ = (H'H)⁻¹ H'y,   (7)

the so-called least squares estimation (Liu and Bozdogan 2004). Here H is the design or model matrix, with elements H_ij = h_j(x_i), and y = (y_1, y_2, . . . , y_p)' is the p-dimensional vector of the training-set output or response values.
When least squares estimation is used, singularity of the model matrix is a common problem. In this case, the use of global ridge regression (Tikhonov and Arsenin 1977; Bishop 1991) is the way to resolve and avoid singularity. In global ridge regression, a roughness penalty term is added to the sum of squared errors to produce a cost function that counters the effects of overfitting,

C(w, λ) = Σ_{i=1}^{p} ( f(x_i) − y_i )² + λ Σ_{j=1}^{m} w_j² = ε'ε + λ w'w,   (8)

which is minimized to find a weight vector that is more robust to noise in the training set. The optimal weight vector for global ridge regression is given by

ŵ = (H'H + λ I_m)⁻¹ H'y,   (9)

where I_m is the m-dimensional identity matrix (Tikhonov and Arsenin 1977).
Although there are different methods to estimate the best ridge parameter λ, we use the data-adaptive approach proposed by Hoerl, Kennard and Baldwin (HKB) (Hoerl et al. 1975), given by

λ_HKB = m s² / ( ŵ_LS' ŵ_LS ).   (10)

Here, m = k is the number of predictors not including the intercept term, n is the number of observations, and s² is the estimated error variance using k predictors, so that

s² = (y − H ŵ_LS)'(y − H ŵ_LS) / (n − k + 1),   (11)

and ŵ_LS is the estimated coefficient vector obtained from the no-constant model given by

ŵ_LS = (H'H)⁻¹ H'y.   (12)
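As an illustration of (9)-(12), here is a hedged NumPy sketch of the global ridge weights with the data-adaptive HKB choice of λ; hkb_ridge_weights is a hypothetical helper, and the divisor in (11) follows the reconstruction above.

    import numpy as np

    def hkb_ridge_weights(H, y):
        """Ridge weights, Eq. (9), with the HKB ridge parameter of Eqs. (10)-(12).

        H : (n, m) model matrix of RBF outputs, y : (n,) response vector.
        """
        n, m = H.shape
        w_ls = np.linalg.lstsq(H, y, rcond=None)[0]          # least-squares fit, Eq. (12)
        resid = y - H @ w_ls
        s2 = resid @ resid / (n - m + 1)                     # error variance, Eq. (11)
        lam = m * s2 / (w_ls @ w_ls)                         # HKB lambda, Eq. (10)
        w = np.linalg.solve(H.T @ H + lam * np.eye(m), H.T @ y)   # Eq. (9)
        return w, lam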

3.3 Combining regression trees and RBF-NN


3.3.1 Regression trees
The basic idea of regression trees (RT) is to partition the input space recursively into two, and approximate the function in each half by the average output value of the samples (Breiman et al. 1984). Each split is parallel to one of the axes and can be expressed as an inequality involving one of the input components (e.g., x_k > b). The input space is divided into hyper-rectangles organized into a binary tree, where each branch is determined by the dimension (k) and boundary (b) which together minimize the residual error between the model and the data (Orr 2000). The root node of the regression tree is the smallest hyper-rectangle that includes all of the training data {x_i}_{i=1}^{p}. Its size s_k (half-width) and center c_k in each dimension k are

s_k = (1/2) ( max_{i∈S} x_ik − min_{i∈S} x_ik ),   (13)

c_k = (1/2) ( max_{i∈S} x_ik + min_{i∈S} x_ik ),   (14)

where k ∈ K is the set of predictor indices, and S = {1, 2, . . . , p} is the set of training set indices. A split of the root node divides the training samples into left and right subsets, S_L and S_R, on either side of a boundary b in one of the dimensions k such that

S_L = { i : x_ik ≤ b },   (15)

S_R = { i : x_ik > b }.   (16)

The mean output value on either side of the bifurcation is

ȳ_L = (1/p_L) Σ_{i∈S_L} y_i,   (17)

ȳ_R = (1/p_R) Σ_{i∈S_R} y_i,   (18)

where p_L and p_R are the numbers of samples in each subset. The mean square error (MSE) is then

MSE(k, b) = (1/p) [ Σ_{i∈S_L} (y_i − ȳ_L)² + Σ_{i∈S_R} (y_i − ȳ_R)² ].   (19)

The split which minimizes MSE(k, b) over all possible


choices of k and b is used to create the children of the
root node and is found by a simple discrete search over m dimensions and p observations. The children of the root node
are split recursively in the same manner; the process terminates when every remaining split creates children containing fewer than pmin samples, which is a parameter of the
method. The children are shifted with respect to their parent
nodes and their sizes reduced in the k-th dimension.
RT can both estimate a model and indicate which components of the input vector are most relevant to the modelled
relationship. Dimensions which carry the most information
about the output tend to split earliest and most often (Orr
2000).
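The split search in (15)-(19) can be sketched as an exhaustive search over dimensions and candidate boundaries; best_split below is an illustrative helper only, not the authors' implementation, and library CART routines are far more efficient.

    import numpy as np

    def best_split(X, y):
        """Find the pair (k, b) minimising MSE(k, b) of Eq. (19) by discrete search."""
        p, n_dims = X.shape
        best = (None, None, np.inf)
        for k in range(n_dims):
            for b in np.unique(X[:, k])[:-1]:            # candidate boundaries
                left, right = X[:, k] <= b, X[:, k] > b
                if left.sum() == 0 or right.sum() == 0:
                    continue
                mse = (((y[left] - y[left].mean()) ** 2).sum()
                       + ((y[right] - y[right].mean()) ** 2).sum()) / p
                if mse < best[2]:
                    best = (k, b, mse)
        return best   # (dimension, boundary, MSE)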


3.3.2 Transforming tree nodes into RBFs


A regression tree contains a root node, some non-terminal
nodes (having children) and some terminal nodes (having no
children). Each node is associated with a hyper rectangle of
input space having a center c and size s as described above.
The node corresponding to the largest hyper rectangle is the
root node and that is divided up into smaller and smaller
pieces progressing down the tree. To transform the hyper
rectangles into different basis kernel RBFs, we use the center c of a node as the RBF center and its size s, scaled by a parameter α, as the RBF radius, given by

r = α s.   (20)

The scalar α has the same value for all nodes (Kubat 1998) and is another parameter of the method. One can use α = 1/(2K), where K is Kubat's parameter (Kubat 1998; Orr 2000).
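Continuing the sketch, a tree node with center c and half-width s is mapped to an RBF by (20); node_to_rbf and the particular value of the Kubat-style parameter K used here are illustrative assumptions only.

    import numpy as np

    def node_to_rbf(center, size, alpha):
        """Map a tree node (center c, half-width s) to an RBF (c, r) via r = alpha * s, Eq. (20)."""
        return np.asarray(center), alpha * np.asarray(size)

    # e.g. with a Kubat-style scaling alpha = 1/(2K) for some K > 0 (assumption)
    c_j, r_j = node_to_rbf(center=[0.3, 0.7], size=[0.2, 0.1], alpha=1 / (2 * 2.0))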

4 Information complexity ICOMP criterion as a fitness


function and the genetic algorithm
4.1 Information complexity ICOMP criterion
The approach we take in this paper is based on cost functions which measure the goodness-of-fit or performance of a fitted model for a given stock price index. The risk, that is, the expected cost of choosing the best fitting model, will be measured in terms of a new entropic or information-based criterion which is based on a different characterization of good models, combining penalties for lack of fit, lack of parsimony, and the profusion of complexity.

In our case, we base the fitness function on information-theoretic model selection criteria. We derive the information-theoretic measure of complexity (ICOMP) criterion of Bozdogan (1988, 1994, 2000, 2004) to choose the best fitting basis kernel RBFs and the best subset of predictors with the GA hybridized with regularized regression trees and RBF networks.

The complexity of a non-parametric regression model increases with the number of independent and adjustable parameters, also termed the effective degrees of freedom of the model. According to the qualitative principle of Occam's Razor, the simplest model that fits the observed data is the best model. Following this principle, we provide a trade-off between how well the model fits the data and the model complexity in one criterion function.

We use the information criteria to evaluate and compare the different horizontal and vertical subsets generated in the genetic algorithm (GA) for the regularized regression trees and RBF networks model given by (1), under the assumption that the random error term follows a multivariate Gaussian distribution, i.e., ε ~ N(0, σ²I).

One of the general forms of ICOMP is an approximation to the sum of two Kullback-Leibler (KL) (Kullback and Leibler 1951) distances. For a general multivariate normal linear or non-linear structural model, this general form of ICOMP is given by

ICOMP(IFIM) = −2 log L(θ̂) + 2 C₁( F̂⁻¹(θ̂) ),   (21)

where −2 log L(θ̂) is the maximized log-likelihood (i.e., the lack-of-fit component), and C₁(F̂⁻¹(θ̂)) is the maximal information-theoretic complexity of the estimated inverse Fisher information matrix (IFIM), given by

C₁( F̂⁻¹(θ̂) ) = (s/2) log[ tr( F̂⁻¹(θ̂) ) / s ] − (1/2) log | F̂⁻¹(θ̂) |,   (22)

where s = dim(F̂⁻¹) = rank(F̂⁻¹).
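The entropic complexity C₁ in (22) is straightforward to compute for any estimated covariance matrix; the following small sketch (with a hypothetical c1_complexity helper) illustrates it on a generic positive-definite matrix standing in for F̂⁻¹(θ̂).

    import numpy as np

    def c1_complexity(cov):
        """Maximal information complexity C1 of Eq. (22) for a covariance matrix."""
        s = cov.shape[0]
        sign, logdet = np.linalg.slogdet(cov)
        return 0.5 * s * np.log(np.trace(cov) / s) - 0.5 * logdet

    # toy check on a positive-definite matrix
    cov = np.array([[2.0, 0.3], [0.3, 1.0]])
    print(c1_complexity(cov))   # equals 0 only if all eigenvalues are equal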


For the regression trees and RBF networks, the estimated inverse Fisher information matrix (IFIM) is given by

Cov(ŵ, σ̂²) = F̂⁻¹ = [ σ̂²(H'H)⁻¹      0
                      0              2σ̂⁴/n ],   (23)

where

σ̂² = (y − Hŵ)'(y − Hŵ) / n   (24)

is the estimated error variance.


The computational form of ICOMP(IFIM) in (21) for the HRBF-NN predictive model becomes

ICOMP(IFIM) = n log(2π) + n log σ̂² + n + 2 C₁( F̂⁻¹(θ̂) ),   (25)

where the entropic complexity is

C₁( F̂⁻¹ ) = ((m + 1)/2) log[ ( tr( σ̂²(H'H)⁻¹ ) + 2σ̂⁴/n ) / (m + 1) ] − (1/2) log | σ̂²(H'H)⁻¹ | − (1/2) log( 2σ̂⁴/n ),   (26)

as our performance measure for choosing the best fitting model when the model is correctly specified. That is, when the random error term follows a Gaussian distribution.
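Putting (23)-(26) together, the following is a hedged NumPy sketch of ICOMP(IFIM) for a fitted HRBF-NN under correct specification; icomp_ifim is a hypothetical helper, and the 2σ̂⁴/n block follows the reconstruction of (23) above.

    import numpy as np

    def icomp_ifim(H, y, w):
        """ICOMP(IFIM) of Eq. (25) for weights w fitted on the model matrix H."""
        n, m = H.shape
        resid = y - H @ w
        sigma2 = resid @ resid / n                      # error variance, Eq. (24)
        cov = np.zeros((m + 1, m + 1))                  # block-diagonal IFIM, Eq. (23)
        cov[:m, :m] = sigma2 * np.linalg.inv(H.T @ H)
        cov[m, m] = 2.0 * sigma2 ** 2 / n
        s = m + 1
        c1 = 0.5 * s * np.log(np.trace(cov) / s) \
             - 0.5 * np.linalg.slogdet(cov)[1]          # Eq. (22) / (26)
        return n * np.log(2 * np.pi) + n * np.log(sigma2) + n + 2.0 * c1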
On the other hand, when the model is misspecified, we define ICOMP(IFIM) for misspecified models as

ICOMP(IFIM)_Misspec = −2 log L(θ̂) + 2 C₁( Cov(θ̂)_Misspec ),   (27)

where

Cov(θ̂)_Misspec = F̂⁻¹ R̂ F̂⁻¹.   (28)

In (28), F̂⁻¹ is the inner-product (or Hessian) form of the estimated inverse Fisher information matrix (IFIM), and R̂ is the outer-product form of the Fisher information matrix. Cov(θ̂)_Misspec is often called the sandwich covariance or robust covariance estimator, since it is a correct covariance estimator regardless of whether the assumed model is correct or not. When the model is correct we have F̂ = R̂, and formula (28) reduces to the usual estimated inverse Fisher information matrix F̂⁻¹ (White 1982).
For the misspecified HRBF-NN predictive model, the outer-product form of IFIM, R̂, as in Bozdogan (2004), is given by

R̂ = [ (1/σ̂⁴) H'D̂²H            (Ŝk/(2σ̂³)) H'1
      (Ŝk/(2σ̂³)) 1'H           (n − q)(K̂t − 1)/(4σ̂⁴) ],   (29)

where

D̂² = diag(ε̂_1², . . . , ε̂_n²) is the diagonal matrix of squared residuals,
H is the (n × q) matrix of regressors or model matrix,
Ŝk is the estimated residual skewness,
K̂t is the estimated residual kurtosis, and
1 is an (n × 1) vector of ones.
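A hedged NumPy sketch of the sandwich covariance in (28), with R̂ assembled from the squared residuals and the residual skewness and kurtosis, follows; the exact scaling of R̂ mirrors the reconstruction of (29) above and should be checked against Bozdogan (2004) before use.

    import numpy as np

    def sandwich_cov(H, y, w):
        """Robust covariance F^{-1} R F^{-1} of Eq. (28) with R as in Eq. (29)."""
        n, q = H.shape
        resid = y - H @ w
        sigma2 = resid @ resid / n
        sk = ((resid / np.sqrt(sigma2)) ** 3).mean()     # residual skewness
        kt = ((resid / np.sqrt(sigma2)) ** 4).mean()     # residual kurtosis
        ones = np.ones(n)

        # inner-product (Hessian) form of the inverse Fisher information, Eq. (23)
        F_inv = np.zeros((q + 1, q + 1))
        F_inv[:q, :q] = sigma2 * np.linalg.inv(H.T @ H)
        F_inv[q, q] = 2.0 * sigma2 ** 2 / n

        # outer-product form R-hat, Eq. (29), as reconstructed here
        R = np.zeros((q + 1, q + 1))
        R[:q, :q] = H.T @ np.diag(resid ** 2) @ H / sigma2 ** 2
        R[:q, q] = R[q, :q] = sk * (H.T @ ones) / (2.0 * sigma2 ** 1.5)
        R[q, q] = (n - q) * (kt - 1.0) / (4.0 * sigma2 ** 2)

        return F_inv @ R @ F_inv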

We note that using the sandwich covariance matrix in (28) we penalize the presence of skewness and kurtosis in the variables as we fit the HRBF-NN model, which is not possible using AIC-type criteria. See, e.g., Bozdogan (2004). Under misspecification, the computational form of ICOMP(IFIM)_Misspec is given by

ICOMP(IFIM)_Misspec = n log(2π) + n log σ̂² + n + 2 C₁( Cov(θ̂)_Misspec ).   (30)

4.2 The genetic algorithm for subset selection of best predictors

The genetic algorithm (GA) is a stochastic (or probabilistic) search algorithm that employs natural selection and genetic operators. A GA treats information as a series of codes on a binary string, where each string represents a different solution to a given problem. It follows the principle of survival of the fittest, introduced by Charles Darwin. The algorithm searches within a defined search space to solve a problem, and it has shown outstanding performance in finding optimal solutions to problems in many different fields (Eiben and Smith 2010).

The genetic algorithm has been used as an optimization tool to find optimal parameters in many statistical methods such as robust regression (Burns 1992), experimental design (Hamada et al. 2001), and Bayesian sampling (Liang and Wong 2001). There are several applications of the GA in a variety of fields, including econometrics (Routledge 1999), finance (Neely et al. 1997), and image processing (Bhandarkar et al. 1994). Additionally, successful applications of the GA to variable selection have also been reported in the literature (Bozdogan and Howe 2012; Howe and Bozdogan 2010).

Recall that for the regularized regression tree and RBF networks model given in (1), the GA is used to find the best or nearly best subset of predictors from the data. The summary of the implementation of the GA is as follows.

1. Implementing a genetic coding scheme: The first step of the GA is to represent each subset model as a binary string. A binary code of 1 indicates presence, and a 0 indicates absence, of the relevant predictor. Every string is of the same length, but contains a different combination of predictor variables. For a data set with k = 6 predictors and a constant, the following string represents a variable subset including the constant and the input variables x2, x3, and x6.

   1  0  1  1  0  0  1
   x0 x1 x2 x3 x4 x5 x6

2. Generating an initial population of the models: We


choose an initial population of size N consisting of randomly selected models from all possible models.
3. Using a fitness function to evaluate the performance
of the models in the population: A fitness function provides a way of evaluating the performance of the models.
For our fitness function, we use ICOMP under misspecification defined in (30), calculated using the in-sample
predictions obtained by HRBF-NN for each solution.
Variable selection with the GA is hence a minimization
problem (Bozdogan 2004). Note that there is no lower
bound for ICOMP values, just like with AIC-Type model
selection criteria (Akaike 1973).
4. Selecting the parent models from the current population: This step is to choose the models to be used in the next step to generate a new population. Selection of parent models is based on natural selection; models with better fitness values have a greater chance of being selected as parents. We calculate the difference of the fitness function as

   ΔICOMP(i) = ICOMP_Max − ICOMP_i   (31)

   for i = 1, 2, . . . , N, where N is the population size. Here we write ICOMP instead of ICOMP(IFIM)_Misspec for simplicity. Next, we average these differences by computing

   ΔICOMP_mean = (1/N) Σ_{i=1}^{N} ΔICOMP(i).   (32)

   Then the ratio of each model's difference value to the mean difference value is calculated. That is, we compute

   ICOMP_Ratio(i) = ΔICOMP(i) / ΔICOMP_mean.   (33)

   The chance of a model being mated is proportional to this ratio. The process of selecting mates to produce offspring models continues until the number of offspring equals the initial population size.
5. Producing offspring models by crossover and mutation: The selected parents are then used to generate offspring by performing crossover and/or mutation operations on them. Both the crossover and mutation probabilities are determined by the analyst. A higher crossover probability introduces more new models into the population in each generation, while removing more of the good models from the previous generation. Mutation is a random search operator that helps to jump around the search space within the solution's scope. Lin and Lee (1996) state that mutation should be used sparingly, since with a high mutation probability the algorithm becomes little more than a random search. There are several different ways of performing crossover: single-point crossover, two-point crossover, uniform crossover, etc. (Eiben and Smith 2010). In this study, we use single-point crossover to carry out the variable selection via the genetic algorithm. In the GA literature, there are no theoretical results available on how to choose the crossover and mutation parameters of the GA, since they depend on the application area and the problem at hand. However, based on established empirical results, almost all researchers agree that the crossover probability should be greater than 0.3 while the mutation probability should be smaller than 0.1. See, e.g., Fouskakis and Draper (2002). Although the crossover and mutation probabilities are typically set in the intervals (0.5, 1.0) and (0.005, 0.01), Srinivas and Patnaik (1994) and Zhang et al. (2007), in their studies, fix the crossover and mutation probabilities at 0.6 and 0.01, respectively. Furthermore, De Jong (1975), in his simulation study, found that the best values of the crossover and mutation probabilities are 0.6 and 0.001, respectively. Based on this, De Jong and Spears (1989) later fixed the crossover probability at 0.6 in yet another study. In the absence of theoretical results, in this paper we use a statistically based experimental study, suggested by one of the referees, to find the best value(s) of the crossover and mutation probabilities. These results are given in Sect. 5.2.
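Steps 1-5 can be summarized in a schematic GA sketch for binary subset selection; ga_subset_select is a hypothetical interface in which fitness(mask) stands for the ICOMP(IFIM)_Misspec score of the HRBF-NN fitted on the selected columns, and it is not the authors' implementation.

    import numpy as np

    def ga_subset_select(fitness, n_vars, pop_size=25, n_gen=30,
                         p_cross=0.65, p_mut=0.01, rng=np.random.default_rng(0)):
        """Minimise fitness(mask) over binary masks with a simple elitist GA."""
        pop = rng.integers(0, 2, size=(pop_size, n_vars))
        for _ in range(n_gen):
            scores = np.array([fitness(ind) for ind in pop])
            elite = pop[scores.argmin()].copy()                # elitism: keep the best
            # selection probabilities from the ICOMP differences, Eqs. (31)-(33)
            diff = scores.max() - scores
            prob = diff / diff.sum() if diff.sum() > 0 else np.full(pop_size, 1 / pop_size)
            parents = pop[rng.choice(pop_size, size=pop_size, p=prob)]
            # single-point crossover and bit-flip mutation
            children = parents.copy()
            for i in range(0, pop_size - 1, 2):
                if rng.random() < p_cross:
                    cut = rng.integers(1, n_vars)
                    children[i, cut:], children[i + 1, cut:] = \
                        parents[i + 1, cut:].copy(), parents[i, cut:].copy()
            children ^= (rng.random(children.shape) < p_mut)
            children[0] = elite                                # carry the elite over
            pop = children
        scores = np.array([fitness(ind) for ind in pop])
        return pop[scores.argmin()], scores.min()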

Table 1 Abbreviations list for stock market indices

Variable name    Variable explanation
ISE100           Istanbul Stock Exchange National 100 Index
SP               Standard & Poor's 500 return index
DAX              Stock market return index of Germany
FTSE             Stock market return index of UK
NIK              Stock market return index of Japan
BVSP             Stock market return index of Brazil
EU               MSCI European index
EM               MSCI emerging markets index

5 Numerical example: forecasting the direction of


movement of ISE100 index
In this section, we give our numerical example: variable selection with HRBF-NN to determine the optimal lags of international indicators for predicting the direction of movement of the ISE100 index. Based on the literature, the list of the different stock market indices considered in our numerical analysis is given in Table 1.
The organization and the preparation of our data set to
determine the training sample size, the best fitting radial basis function, the best subset selection with GA, forecasting
the movement of the direction of ISE100 are all explained
in detail as follows.
5.1 Preparing the data set for the analysis
We obtained daily price data for the indices listed in Table 1 from http://finance.yahoo.com and http://imkb.gov.tr and converted the prices to returns. We then have n = 536 daily returns between January 5, 2009 and February 22, 2011. We excluded the days on which the Turkish stock exchange was closed. In the case of missing data for the other indices, the previous day's value was used. After constructing the first and second lags for all indices, the usable
number of observations is 534. These lags are taken into account as the explanatory variables and they are indicated by
adding 1 and 2 to the end of the variable names. For example, first and second lags for DAX are named as DAX1 and
DAX2, respectively. For comparing the ISE100 lags against
the other indices, the time differences between Turkey and
the other countries were also taken into account. Our data is
available from the link: http://www.akbilgic.com/.
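A hedged pandas sketch of the data preparation described above follows: prices are forward-filled on days with missing foreign quotes, converted to returns, and one- and two-day lags are built as candidate predictors. The file name, column names, and lag-naming convention are placeholders rather than the actual data layout.

    import pandas as pd

    # hypothetical CSV of daily closing prices indexed by date,
    # with one column per index in Table 1 (ISE100, SP, DAX, ...)
    prices = pd.read_csv("indices.csv", index_col="date", parse_dates=True)

    prices = prices.ffill()                 # missing foreign quotes: previous day's value
    returns = prices.pct_change().dropna()  # convert prices to simple returns

    # one- and two-day lags of every index as candidate predictors
    lags = pd.concat({f"{col}{k}": returns[col].shift(k)
                      for col in returns.columns for k in (1, 2)}, axis=1)
    data = pd.concat([returns["ISE100"], lags], axis=1).dropna()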
5.2 Variable selection for Istanbul stock exchange
We used the HRBF-NN model to determine the best subset
of one- and two-days lags of all eight indices that most effectively and efficiently predicts the ISE100 directional movement. We have 16 predictors in total; first eight predictors

are the one-day lags of the indices listed in Table 1, while the other eight predictors are the two-day lags of the same indices. As previously stated, the ICOMP(IFIM)_Misspec criterion is used to score the subset models as the fitness function within the genetic algorithm (GA). We used a population size of N = 25, which is larger than the number of predictors, 16. Although, in our experiments, there was generally no improvement in the fitness function after the 18th generation, we let the GA run for 30 iterations. We ran the genetic algorithm for different values of the crossover and mutation probabilities, attempting to minimize the ICOMP score. In our experimental study, we set the crossover and mutation probabilities from the sets {0.50, 0.55, 0.60, 0.65, 0.70, 0.75, 0.80, 0.85, 0.90} and {0.005, 0.01, 0.05, 0.1}, respectively.

Table 2 Variable selection with the genetic algorithm

Xover pr.  Mut. pr.  RBF     ICOMP     Variable subset
0.65       0.010     Gauss   -2773.4   0111111000101101
0.85       0.010     Gauss   -2773.1   1111111000101111
0.65       0.010     IMQ     -2772.9   0110111111101010
0.65       0.005     IMQ     -2771.4   0010010111101000
0.65       0.010     Cauchy  -2771.4   0110011011101000
0.60       0.050     Gauss   -2770.8   0010000111101011
0.50       0.100     Gauss   -2770.7   0110011111101000
0.80       0.005     Gauss   -2770.6   0110010111101011
0.75       0.010     Gauss   -2770.6   0110010111101011
0.60       0.100     Gauss   -2769.6   0010111011101001
During our experimental study, we ran the GA for variable selection for different mutation and crossover probabilities and RBF combinations, 160 times in total. We ran the GA for each of the four RBFs: Gaussian, Cauchy, Multiquadratic, and Inverse Multiquadratic. The results for the best ten out of the 160 runs for different parameter combinations are listed in Table 2. During each run of the GA, the elitism rule was applied in order to keep the best subset found so far in each generation of the GA; this ensures monotonic improvement of the fitness function.
Table 2 shows that the fitness function, ICOMP, is minimized at the GA parameters Crossover Probability = 0.65 and Mutation Probability = 0.01, with the Gaussian RBF. Therefore, we chose the Gaussian RBF as the best fitting RBF for forecasting the ISE100 index. The best fitting model with our approach is given by the first variable subset in Table 2. The predictors of the best subset, represented in binary code, correspond to the variable subset {SP1, DAX1, FTSE1, NIK1, BVSP1, EU1, DAX2, NIK2, BVSP2, EM2}. Note that the
ISE1 and ISE2 variables are not in the best subset with minimum ICOMP value. This means that the ISE100 is strongly
affected by the lags of the international markets, not by its
own lags. This is an interesting as well as an important result

in the sense that what is currently used in practice is solely


based on the lag variables of the ISE100 itself. Our results
suggest that the ISE100 index does not require its lags to
build a predictive model.
5.3 Forecasting the direction of the movement of ISE 100
index
Using the best subset model identified by the GA and
HRBF-NN, we generate out-of-sample forecast returns for
ISE100. We then translated each forecast into a directional sign, +1 or −1. A forecast of +1 indicates that the ISE100 index should appreciate in value that day; −1 indicates a drop in value. These forecasts can then be interpreted as buy and sell signals, respectively. We use the percentage of correctly forecasted (PCF) days as a criterion to validate our results. Although PCF may not be a universally accepted criterion, it allows us to interpret our results in simple and logical terms. For example, if PCF is 65 %, it means that the forecast direction of movement of the ISE100 was correct for 65 out of 100 days. We also use a second criterion called Dollar 100 (D100), which gives us the theoretical future value of $100 invested at the beginning of the forecast term and traded according to the forecast signals. If the forecast is +1 and we do not have an outstanding position in the ISE100, the decision is buy. In contrast, if the forecast is −1 and we do have a position, the decision is sell. No action is taken in the remaining cases.
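A minimal sketch of the two evaluation criteria follows, under the simplified reading that the portfolio is long the index exactly on days with a +1 signal and out of the market otherwise; transaction costs are ignored and the function names are illustrative.

    import numpy as np

    def pcf(actual_returns, signals):
        """Percentage of days on which the forecast sign matches the realised sign."""
        return 100.0 * np.mean(np.sign(actual_returns) == signals)

    def dollar100(actual_returns, signals):
        """Value of $100 invested only on days carrying a +1 (buy/keep) signal."""
        value = 100.0
        for r, s in zip(actual_returns, signals):
            if s > 0:                 # holding the index on this day
                value *= (1.0 + r)
        return value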
Throughout our analysis, we used a rolling training sample size of 250 days, which was used to forecast the next
20 days. For example, our first forecast was 05.05.2009. On
day 21, the training set was updated by dropping off the earliest 20 days and adding on the latest 20. This periodic renewal of the training set allows our model and parameters to
accommodate the dynamic structure of the capital markets.
This rolling-forward process was run ten times, resulting in
a total of 200 forecast days, in groups of 20. In Table 3, we
show D100 results summarized by each group. For comparison, the right column shows the same metric for a simple
buy-and-hold strategy.
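The rolling scheme can be sketched as follows, with fit and predict as placeholders for fitting the HRBF-NN on the selected predictors and forecasting the next block; the window and block sizes match those described above.

    import numpy as np

    def rolling_forecast(X, y, fit, predict, train_size=250, step=20, n_blocks=10):
        """Refit every `step` days on the latest `train_size` days and forecast ahead."""
        signs = []
        for b in range(n_blocks):
            start = b * step
            tr = slice(start, start + train_size)                  # rolling training window
            te = slice(start + train_size, start + train_size + step)
            model = fit(X[tr], y[tr])
            signs.append(np.sign(predict(model, X[te])))           # directional forecasts
        return np.concatenate(signs)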
Looking at Table 3, a $100 investment grows to $202 over 200 trading days when the investment is managed according to forecasts made using our best fitting HRBF-NN model.
Figure 1 shows that making Buy-Sell decisions according to
the HRBF-NN model would have been better than just investing and holding the position, over this period. Out of all
200 forecasts, 136 correctly identified the direction of the
ISE100 index, for a PCF of 68 %.
The results shown in Table 3 are summarized by forecast period. For the first period, the buy-sell signals are shown in Table 4, along with other relevant details. Here we see that if we had invested $100, our investment would have become $117.84 by making buy-sell decisions according to the best fitting HRBF-NN model forecasts. Taking transaction costs into consideration, our investment would have become $116.90. If we had just bought and held the index, the investment would have declined to $91.27.

Fig. 1 Movement of invested $100 in 200 days

Table 3 Movement of invested $100 for 200 days, split into 10 periods

Forecast terms       Buy-sell decisions by HRBF-NN    No buy-sell
Beginning (Dollar)   100                               100
Period 1             117.83                            91.27
Period 2             127.24                            90.62
Period 3             145.85                            104.51
Period 4             153.34                            99.72
Period 5             171.96                            114.43
Period 6             185.07                            131.00
Period 7             162.70                            113.36
Period 8             176.53                            112.08
Period 9             186.17                            108.94
Period 10            202.06                            105.52

Table 4 Detailed movement of invested $100 for the first 20-day period

Day   Decision   HRBF-NN    No buy-sell
1     Sell       100.0000   97.2299
2     Buy        97.5386    94.8367
3     Sell       97.5386    87.8112
4     Buy        107.8638   97.1066
5     Sell       107.8638   96.0809
6     Keep       107.8638   99.1060
7     Buy        108.5335   99.7213
8     Sell       108.5335   95.1962
9     Keep       108.5335   94.8693
10    Buy        110.8596   96.9026
11    Sell       110.8596   90.0334
12    Keep       110.8596   89.3095
13    Buy        112.5537   90.6742
14    Sell       112.5537   85.6230
15    Buy        117.5910   89.4550
16    Keep       119.3674   90.8064
17    Keep       121.7044   92.5842
18    Keep       118.6782   90.2821
19    Keep       117.8346   89.6404
20    Sell       117.8346   91.2655

6 Conclusions
In this paper, we studied a variable selection and forecasting
problem in stock markets by focusing on the ISE100 index.
We have identified a model for the effects of international
stock markets on the ISE100 index. We carried out a variable
subset selection using the GA along with ICOMP criterion
to determine which indices have important effects on the direction of the movement of the ISE100 index. Our variable
subset selection results selected the first and second lags of NIK, DAX, and BVSP, only the first lags of SP, FTSE, and EU, and only the second lag of EM as explanatory variables. It is interesting to see that none of the lags of ISE100
were selected as explanatory variables. Our results suggest
that the ISE100 index does not require its lag variables to
build a predictive model.


Further, our results have shown that the HRBF-NN model is a highly flexible and clever data mining technique that can handle relationships in highly nonlinear data structures. Although forecasting in stock markets is a very difficult task, the forecasts made by our model demonstrate high performance, with an accuracy rate of approximately 65 %.

One caveat of our approach is that it is made under the assumption that the random noise follows a Gaussian distribution. Even with this assumption, the HRBF-NN model is able to adapt itself, due to its overall non-parametric nature.
In a future study, we intend to relax the Gaussian assumption and consider a more general distributional assumption on the random noise, such as the Power Exponential (PE) distribution. The PE distribution includes the Gaussian, Laplace, and other distributions as a subfamily. Our results will be reported in a subsequent paper elsewhere.

All our computations have been carried out using newly developed scripts in MATLAB. Since there is still ongoing research being done by the first two authors on HRBF-NN, these scripts are presently not freely available.

Acknowledgements This work was supported by the Scientific Research Projects Coordination Unit of Istanbul University under project number 17708. We further acknowledge the valuable comments of the three anonymous referees and the Associate Editor, which resulted in a much improved paper.

References
Akaike, H.: Information theory and an extension of the maximum likelihood principle. In: Petrov, B.N., Csaki, F. (eds.) Second International Symposium on Information Theory, pp. 267–281. Akadémiai Kiadó, Budapest (1973)
Akbilgic, O., Bozdogan, H.: Predictive subset selection using regression trees and RBF neural networks hybridized with the genetic algorithm. Eur. J. Pure Appl. Math. 4(4), 467–485 (2011)
Bhandarkar, S., Zhang, Y., Potter, W.: An edge detection technique using genetic algorithm-based optimization. Pattern Recognit. 27, 1159–1180 (1994)
Bishop, C.: Improving the generalization properties of radial basis function neural networks. Neural Comput. 3(4), 579–588 (1991)
Boyacioglu, M., Avci, D.: An adaptive network-based fuzzy inference system (ANFIS) for the prediction of stock market return: the case of the Istanbul Stock Exchange. Expert Syst. Appl. 37, 7902–7912 (2010)
Bozdogan, H.: ICOMP: a new model-selection criterion. In: Bock, H.H. (ed.) Classification and Related Methods of Data Analysis. North-Holland, Amsterdam (1988)
Bozdogan, H.: Mixture-model cluster analysis using a new informational complexity and model selection criteria. In: Proceedings of the First US/Japan Conference on the Frontiers of Statistical Modeling: An Informational Approach. Multivariate Statistical Modeling, vol. 2, pp. 69–113. Kluwer Academic, Norwell (1994)
Bozdogan, H.: Akaike's information criterion and recent developments in informational complexity. J. Math. Psychol. 44, 62–91 (2000)
Bozdogan, H.: Intelligent statistical data mining with information complexity and genetic algorithms. In: Bozdogan, H. (ed.) Statistical Data Mining and Knowledge Discovery, pp. 15–56. Chapman & Hall, London (2004)
Bozdogan, H., Howe, J.A.: Misspecified multivariate regression models using the genetic algorithm and information complexity as the fitness function. Eur. J. Pure Appl. Math. 5(2), 211–249 (2012)
Breiman, L., Friedman, J., Olshen, R.A., Stone, C.J.: Classification and Regression Trees. Chapman & Hall, London (1984)
Broomhead, D.S., Lowe, D.: Multi-variable functional interpolation and adaptive networks. Complex Syst. 2, 321–355 (1988)
Burns, P.: A genetic algorithm for robust regression estimation. Technical report, Statistical Sciences, Inc. (1992)
Cinko, M., Avci, E.: A comparison of neural network and linear regression forecasts of the ISE100 index. Öneri 7(28), 301–307 (2007)
De Jong, K.A.: An analysis of the behavior of a class of genetic adaptive systems. Ph.D. Dissertation, University of Michigan (1975)
De Jong, K.A., Spears, W.M.: Using genetic algorithms to solve NP-complete problems. In: Schaffer, J.D. (ed.) Third Conference on Genetic Algorithms, pp. 124–132. Morgan Kaufmann, San Mateo (1989)
Eiben, A.E., Smith, J.E.: Introduction to Evolutionary Computing. Springer, Berlin (2010)
Fouskakis, D., Draper, D.: Stochastic optimization: a review. Int. Stat. Rev. 70(2), 315–349 (2002)
Hamada, M., Martz, H., Reese, C., Wilson, A.: Finding near-optimal Bayesian experimental designs via genetic algorithms. Am. Stat. 55(3), 175–181 (2001)
Haykin, S.: Neural Networks: A Comprehensive Foundation. Prentice Hall, New Jersey (1999)
Hoerl, A.E., Kennard, R.W., Baldwin, K.F.: Ridge regression: some simulations. Commun. Stat. 4, 105–123 (1975)
Howe, J.A., Bozdogan, H.: Predictive subset VAR modeling using the genetic algorithm and information complexity. Eur. J. Pure Appl. Math. 3(3), 382–405 (2010)
Howlett, R.J., Jain, L.C.: Radial Basis Function Networks 1: Recent Developments in Theory and Applications. Physica-Verlag, New York (2001)
Korkmaz, T., Cevik, E., Birkan, E., Ozatac, N.: Causality in mean and variance between ISE100 and S&P 500: Turkcell case. Afr. J. Bus. Manag. 5(5), 1673–1683 (2011)
Kubat, M.: Decision trees can initialize radial basis function networks. IEEE Trans. Neural Netw. 9(5), 813–821 (1998)
Kullback, S., Leibler, R.A.: On information and sufficiency. Ann. Math. Stat. 22, 79–86 (1951)
Liang, F., Wong, W.: Real-parameter evolutionary Monte Carlo with applications to Bayesian mixture models. J. Am. Stat. Assoc. 96(454), 653–666 (2001)
Lin, C.-T., Lee, C.S.G.: Neural Fuzzy Systems: A Neuro-Fuzzy Synergism to Intelligent Systems. Prentice Hall, New York (1996)
Liu, Z., Bozdogan, H.: Improving the performance of radial basis function classification using information criteria. In: Bozdogan, H. (ed.) Statistical Data Mining and Knowledge Discovery, pp. 193–216. Chapman & Hall, London (2004)
Lo, A.W., MacKinlay, C.: Stock market prices do not follow random walks: evidence from a simple specification test. Rev. Financ. Stud. 1, 41–66 (1988)
Meng, K., Dong, Z.Y., Wong, K.P.: Self-adaptive radial basis function neural networks for short-term electricity price forecasting. IEE Proc., Gener. Transm. Distrib. 3(4), 325–335 (2008)
Neely, C., Weller, P., Dittmar, R.: Is technical analysis in the foreign exchange market profitable? A genetic programming approach. J. Financ. Quant. Anal. 32(4), 405–426 (1997)
Orr, M.: Combining regression trees and RBFs. Int. J. Neural Syst. 10(6), 453–465 (2000)
Ozdemir, A.K., Tolun, S., Demirci, E.: Endeks getirisi yonunun ikili siniflandirma yontemiyle tahmin edilmesi: IMKB100 endeksi ornegi. Nigde Univ. IIBF Derg. 4(2), 45–59 (2011)
Ozun, A.: Are the reactions of emerging equity markets to the volatility in advanced markets similar? Comparative evidence from Brazil and Turkey. Int. Res. J. Finance Econ. 9, 220–230 (2007)
Poggio, T., Girosi, F.: Regularization algorithms for learning that are equivalent to multilayer networks. Science 247(4945), 978–982 (1990)
Rivas, V.M., Merelo, J.J., Castillo, P.A., Arenas, M.G., Castellano, J.G.: Evolving RBF neural networks for time-series forecasting with EvRBF. Inf. Sci. 165, 207–220 (2004)
Routledge, B.: Adaptive learning in financial markets. Rev. Financ. Stud. 12(5), 1165–1202 (1999)
Srinivas, M., Patnaik, L.M.: Adaptive probabilities of crossover and mutation in genetic algorithms. IEEE Trans. Syst. Man Cybern. 24(4), 656–667 (1994)
Sun, Y.F., Liang, Y.C., Zhang, W.L., Lee, H.P., Lin, W.Z., Cao, L.J.: Optimal partition algorithm of the RBF neural network and its application to financial time series forecasting. Neural Comput. Appl. 14, 35–44 (2005)
Tikhonov, A.N., Arsenin, V.Y.: Solutions of Ill-Posed Problems. Wiley, New York (1977)
Vuran, B.: The determination of the long-run relationship between ISE100 and international equity indices using cointegration analysis. Istanb. Univ. J. Sch. Bus. Adm. 39(1), 154–168 (2000)
White, H.: Maximum likelihood estimation of misspecified models. Econometrica 50, 1–25 (1982)
Zhang, J., Chung, H.S., Lo, W.: Clustering-based adaptive crossover and mutation probabilities for genetic algorithms. IEEE Trans. Evol. Comput. 11(3), 326–335 (2007)
