
Stat Comput

DOI 10.1007/s11222-013-9375-7

A novel Hybrid RBF Neural Networks model as a forecaster


Oguz Akbilgic · Hamparsum Bozdogan · M. Erdal Balaban

Received: 11 October 2011 / Accepted: 3 January 2013
© Springer Science+Business Media New York 2013

Abstract We introduce a novel predictive statistical modeling technique called Hybrid Radial Basis Function Neural
Networks (HRBF-NN) as a forecaster. HRBF-NN is a flexible forecasting technique that integrates regression trees and ridge regression with radial basis function (RBF) neural networks (NN). We develop a new computational procedure using model selection based on information-theoretic principles as the fitness function within the genetic algorithm (GA) to carry out subset selection of the best predictors. As is well known, due to the dynamic and chaotic nature of the underlying stock market process, the task of generating economically useful stock market forecasts is difficult, if not
impossible. HRBF-NN is well suited for modeling complex
non-linear relationships and dependencies between the stock
indices. We propose HRBF-NN as our forecaster and a predictive modeling tool to study the daily movements of stock
indices. We show numerical examples to determine a predictive relationship between the Istanbul Stock Exchange
National 100 Index (ISE100) and seven other international
stock market indices. We select the best subset of predictors
by minimizing the information complexity (ICOMP) criterion as the fitness function within the GA. Using the best
subset of variables we construct out-of-sample forecasts for the ISE100 index to determine the daily directional movements. Our results demonstrate the utility and the flexibility of HRBF-NN as a clever predictive modeling tool for highly dependent and nonlinear data.

Keywords Forecasting · Stock markets · Neural networks · Variable selection · Radial basis functions

O. Akbilgic (✉) · M.E. Balaban
Istanbul University School of Business Administration, Istanbul, Turkey
e-mail: oguzakbilgic@gmail.com

M.E. Balaban
e-mail: balaban@istanbul.edu.tr

H. Bozdogan
Statistics, Operations, and Management Science, and Center for Intelligent Systems and Machine Learning (CISML), The University of Tennessee, Knoxville 37996, USA
e-mail: bozdogan@utk.edu

1 Introduction
We introduce a novel predictive statistical modeling approach called Hybrid Radial Basis Function Neural Networks (HRBF-NN) as a forecaster. HRBF-NN is a flexible
forecasting technique that integrates regression trees and ridge regression with radial basis function (RBF) neural
networks (NN). We develop a new computational technique
based on information complexity (ICOMP) criterion (Bozdogan 1994, 2000, 2004) as the fitness function within the
genetic algorithm (GA) to carry out subset selection of best
predictors.
New-generation research argues that one needs to look at financial market data on stock indices from an evolutionary, or more adaptive biological, point of view rather than from a physical-systems point of view. Traditional modeling techniques for the financial markets, such as random walk models (i.e., Brownian motion ideas) and rigid stochastic processes, do not adapt or lend themselves to the ever-changing dynamics of financial data (Lo and MacKinlay 1988). They are inflexible and do not capture some of the behavioral and psychological issues that financial portfolio managers and investors face.
Because of this, and due to the dynamic and chaotic nature of the underlying stock market process, the task of generating economically useful stock market forecasts is difficult, if not impossible, even in the presence of high multicollinearity between the explanatory variables.
Akbilgic and Bozdogan (2011) studied the predictive
performance of HRBF-NN predictive model using a large
scale simulation under different and complicated structures
and compared the results to the classical regression model.
Based on the encouraging and very positive results of these
authors, in this paper, we propose HRBF-NN predictive
model specifically as an application to forecast the ISE100
index and to determine its daily directional movements. In
what follows, we provide our numerical example on the
ISE100 index data set between January 5, 2009 and February 22, 2011 as our response (or output) variable and other
international stock indices as our explanatory (or input) variables. We derive the information complexity (ICOMP) criterion (Bozdogan 1994, 2000, 2004) as the fitness function
using the genetic algorithm (GA) to choose the best subset
of predictors. Using the best subset of explanatory variables
we construct out-of-sample forecasts for the ISE100 index
to determine its daily directional movements.
The paper is organized as follows. In Sect. 2, we present the current state of the literature on the interaction of the international markets as a whole and the usage of Radial Basis Function (RBF) Neural Networks (NN). In Sect. 3, we present Radial Basis Function Neural Networks (RBF-NN); the estimation of the optimal network weights using data-adaptive ridge regression; the combination of regression trees and RBF-NN; and how to transform the tree nodes into RBFs to construct the HRBF-NN predictive model. In Sect. 4, we show the derived forms of the information complexity (ICOMP) criterion under the HRBF-NN predictive model. We present the two forms of ICOMP: one when the model is correctly specified, and the other when the model is misspecified, using the robust covariance estimator of the HRBF-NN model. Further, in that section, we present the use of the genetic algorithm (GA) for subset selection of the best predictors and a summary of its implementation. In Sect. 5, we present and discuss our numerical example on the ISE100 index data set. We select the best subset of predictors by minimizing the information complexity (ICOMP) criterion as the fitness function within the GA. Using the best subset of variables we construct out-of-sample forecasts for the ISE100 index to determine the daily directional movements. Section 6 concludes the paper with some discussion.

2 Current state of the literature


Due to the globalization of the financial markets, forecasting stock market movements has become difficult. Because of the inter-dependency of stock markets with each other, one should take this dependency into consideration in order to make accurate forecasts. This interdependency has an especially large impact on stock market indices in developing countries, such as Turkey.
In reviewing the literature, we note that there are several
studies which show the influence of international markets on
the Istanbul Stock Exchange (ISE) and predict the direction
of movements in the ISE100 index. For example, Korkmaz et al. (2011), in their causality study, showed that the ISE100 index is affected by the US markets. In another study, Ozun (2007) showed the influence of volatility in advanced markets on stock markets in developing countries, in particular the Turkish and Brazilian stock markets. In his study, Ozun (2007) also showed that the US markets have an influence in the positive direction, i.e., when the US markets go up, the ISE100 index goes up. Further, Vuran (2000) showed that the ISE100 index is co-integrated with the stock markets of the United Kingdom (FTSE), Brazil (BOVESPA), and Germany (DAX). In their study, Boyacioglu and Avci (2010) used the BOVESPA, Dow Jones Industrials (DJI), and DAX indices along with other macro- and micro-economic variables as predictors to forecast the ISE100 index. Cinko and Avci
(2007) compared artificial neural networks and regression
models in forecasting the daily values of the ISE100. Their
results show that neural network models perform better
than the classical regression model. In their study, they used
only the lagged series of ISE100 as explanatory variables.
On the other hand, Ozdemir et al. (2011) used the MSCI
Emerging Markets (EM), MSCI European (EU), and S&P
500 (SP) indices along with other macroeconomic indicators to forecast the direction of movement of the ISE100 index.
Further, we also note that Radial Basis Function (RBF)
Neural Networks (NN) (RBF-NN) have been used as an alternative method in forecasting problems in stock markets.
The RBF-NN (Broomhead and Lowe 1988) model is a
special type of feed-forward neural network with one input, one hidden, and one linear output layer. In RBF-NN
the number of parameters is smaller than in Multilayer Neural Networks (ML-NN) because the inputs are directly connected to the hidden layer without weights (Haykin 1999). The adjustable parameters of an RBF-NN are its centers, widths, and weights; the center and width parameters belong to the hidden layer, while the weight parameters are the connection weights between the hidden and output layers. Traditional RBF-NN learning algorithms determine the best parameters using iterative techniques such as gradient descent and forward selection procedures.
There are several methods proposed to automatically
construct an RBF-NN model. In their study, Sun et al. (2005)
combined an optimal partition algorithm with RBF-NN in order to determine the center and width parameters. Further, they showed the efficiency of their method by applying it to the prediction of the S&P 500 (SP) and Shanghai Stock Exchange indices. Rivas et al. (2004) used an evolutionary algorithm to automatically determine the center and width parameters of the RBF-NN model and applied their results to forecast the exchange rate between the British pound and the US dollar. Short-term electricity price forecasting has also been handled by a self-adaptive RBF-NN, in which the RBF parameters are determined by fuzzy c-means and differential evolution algorithms (Meng et al. 2008).
Although RBF-NN has been used by other researchers in the literature above for forecasting stock market indices, what makes our approach in this paper different is the introduction of the genetic algorithm (GA) for subset selection of the best predictors in the HRBF-NN model, which prunes redundant predictors using an objective entropic model selection procedure, the information complexity (ICOMP) criterion (Bozdogan 1994, 2000, 2004), as our fitness function to choose the best subset of variables to be used in forecasting. In this process, our proposed HRBF-NN method is also based on the idea of using regression trees to determine the number of radial basis functions in the hidden layer. We estimate the center and width parameters data-adaptively, based on the work of Kubat (1998) and Orr (2000). We address the problem of singularity of the model matrix by using adaptive ridge regression (Tikhonov and Arsenin 1977) in estimating the weight parameters. Moreover, we choose the best fitting RBF by scoring and minimizing the ICOMP values among a portfolio of candidate RBFs.
Next, we present the detailed step-by-step explanation of
HRBF-NN predictive model.

3 Hybrid Radial Basis Function Neural Networks:


HRBF-NN model
In this section, we introduce the individual components of the Hybrid Radial Basis Function Neural Networks (HRBF-NN) model which we utilize in this paper.

3.1 Radial Basis Function Neural Networks

RBF-NNs are a type of general linear model where the input data are transferred to a feature space by non-linear transformations using radial basis functions. In this sense, the RBF-NN model can be written as a general linear model with the dependent variable y and independent (or explanatory) variables x1, x2, . . . , xm in the following form

y = f(w, x) = Σ_{j=1}^{m} w_j h_j(x) = w_1 h_1 + w_2 h_2 + · · · + w_m h_m,   (1)

where the regressors {h_j(x)}_{j=1}^{m} are fixed radial basis functions of the explanatory variables x ∈ R^n, and {w_j}_{j=1}^{m} are the unknown adaptable coefficients (weights) (Howlett and Jain 2001).

In model (1), we can use several different RBFs, which provides the flexibility to handle non-linear relations. In this paper, we consider the Gaussian, Cauchy, Multiquadratic, and Inverse Multiquadratic RBFs, given as follows (Howlett and Jain 2001).

Gaussian RBF (GS):

h_j(x) = exp( −Σ_{k=1}^{p} (x_k − c_jk)² / r_jk² )   (2)

Cauchy RBF (CH):

h_j(x) = 1 / ( 1 + exp( Σ_{k=1}^{p} (x_k − c_jk)² / r_jk² ) )   (3)

Multiquadratic RBF (MQ):

h_j(x) = sqrt( 1 + exp( Σ_{k=1}^{p} (x_k − c_jk)² / r_jk² ) )   (4)

Inverse Multiquadratic RBF (IMQ):

h_j(x) = 1 / sqrt( 1 + exp( Σ_{k=1}^{p} (x_k − c_jk)² / r_jk² ) )   (5)

In short, the RBF-NN introduces a mapping or transformation of the n-dimensional inputs non-linearly to an m-dimensional space and then estimates a model using linear regression. The non-linear transformation is achieved using m basis functions, each characterized by its center c_j in the (original) input space and a width or radius vector r_j, j ∈ {1, 2, . . . , m} (Orr 2000). Poggio and Girosi (1990) have shown that RBF-NNs possess the property of best approximation.
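To make the mapping in (1)-(2) concrete, the following is a minimal NumPy sketch (not the authors' MATLAB implementation) of how a design matrix H of Gaussian RBF outputs can be built; the function name gaussian_rbf_design is hypothetical, and the centers C and radii R are assumed to be supplied, e.g., from the regression tree of Sect. 3.3.

    import numpy as np

    def gaussian_rbf_design(X, C, R):
        """Build the p x m design matrix H with H[i, j] = h_j(x_i) as in Eq. (2).

        X : (p, n) array of input patterns
        C : (m, n) array of RBF centers c_j
        R : (m, n) array of RBF radii r_j (one width per dimension)
        """
        # squared, radius-scaled distances for every (pattern, basis) pair
        D = ((X[:, None, :] - C[None, :, :]) / R[None, :, :]) ** 2   # (p, m, n)
        return np.exp(-D.sum(axis=2))                                # (p, m)

    # toy usage: 5 patterns, 2 inputs, 3 basis functions
    X = np.random.rand(5, 2)
    C = np.random.rand(3, 2)
    R = np.full((3, 2), 0.5)
    H = gaussian_rbf_design(X, C, R)   # shape (5, 3)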

3.2 Estimation of weights: ridge regression

Given a network (or model) in (1) consisting of m RBFs with centers {c_j}_{j=1}^{m} and radii {r_j}_{j=1}^{m}, and a training set with p patterns {(x_i, y_i)}_{i=1}^{p}, the optimal network weights can be found by minimizing the sum of squared errors

SSE = Σ_{i=1}^{p} ( f(x_i) − y_i )²   (6)

and is given by

ŵ = (H'H)⁻¹ H'y,   (7)

the so-called least squares estimation (Liu and Bozdogan 2004). Here H is the design or model matrix, with elements H_ij = h_j(x_i), and y = (y_1, y_2, . . . , y_p)' is the p-dimensional vector of the training-set output or response values.
When least squares estimation is used, singularity of the model matrix is a common problem. In this case, the use of global ridge regression (Tikhonov and Arsenin 1977; Bishop 1991) is the way to resolve and avoid singularity. In global ridge regression, a roughness penalty term is added to the sum of squared errors to produce a cost function that counters the effects of overfitting,

C(w, λ) = Σ_{i=1}^{p} ( f(x_i) − y_i )² + λ Σ_{j=1}^{m} w_j² = ε'ε + λ w'w,   (8)

which is minimized to find a weight vector that is more robust to noise in the training set. The optimal weight vector for global ridge regression is given by

ŵ = (H'H + λ I_m)⁻¹ H'y,   (9)

where I_m is the m-dimensional identity matrix (Tikhonov and Arsenin 1977).
Although there are different methods to estimate the best ridge parameter λ, we use the data-adaptive approach proposed by Hoerl, Kennard and Baldwin (HKB) (Hoerl et al. 1975), given by

λ_HKB = m s² / ( ŵ_LS' ŵ_LS ).   (10)

Here, m = k is the number of predictors not including the intercept term, n is the number of observations, and s² is the estimated error variance using k predictors, so that

s² = (y − H ŵ_LS)'(y − H ŵ_LS) / (n − k + 1),   (11)

and ŵ_LS is the estimated coefficient vector obtained from the no-constant model given by

ŵ_LS = (H'H)⁻¹ H'y.   (12)
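As an illustration of (9)-(12), here is a hedged NumPy sketch of the global ridge weights with the data-adaptive HKB choice of λ; hkb_ridge_weights is a hypothetical helper, and the divisor in (11) follows the reconstruction above.

    import numpy as np

    def hkb_ridge_weights(H, y):
        """Ridge weights, Eq. (9), with the HKB ridge parameter of Eqs. (10)-(12).

        H : (n, m) model matrix of RBF outputs, y : (n,) response vector.
        """
        n, m = H.shape
        w_ls = np.linalg.lstsq(H, y, rcond=None)[0]          # least-squares fit, Eq. (12)
        resid = y - H @ w_ls
        s2 = resid @ resid / (n - m + 1)                     # error variance, Eq. (11)
        lam = m * s2 / (w_ls @ w_ls)                         # HKB lambda, Eq. (10)
        w = np.linalg.solve(H.T @ H + lam * np.eye(m), H.T @ y)   # Eq. (9)
        return w, lam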

3.3 Combining regression trees and RBF-NN


3.3.1 Regression trees
The basic idea of regression trees (RT) is to partition the input space recursively into two, and approximate the function in each half by the average output value of the samples (Breiman et al. 1984). Each split is parallel to one of the axes and can be expressed as an inequality involving one of the input components (e.g., x_k > b). The input space is divided into hyper-rectangles organized into a binary tree, where each branch is determined by the dimension (k) and boundary (b) which together minimize the residual error between the model and the data (Orr 2000). The root node of the regression tree is the smallest hyper-rectangle that includes all of the training data {x_i}_{i=1}^{p}. Its size s_k (half-width) and center c_k in each dimension k are

s_k = (1/2) ( max_{i∈S} x_ik − min_{i∈S} x_ik ),   (13)

c_k = (1/2) ( max_{i∈S} x_ik + min_{i∈S} x_ik ),   (14)

where k ∈ K is the set of predictor indices, and S = {1, 2, . . . , p} is the set of training set indices. A split of the root node divides the training samples into left and right subsets, S_L and S_R, on either side of a boundary b in one of the dimensions k such that

S_L = { i : x_ik ≤ b },   (15)

S_R = { i : x_ik > b }.   (16)

The mean output value on either side of the bifurcation is

ȳ_L = (1/p_L) Σ_{i∈S_L} y_i,   (17)

ȳ_R = (1/p_R) Σ_{i∈S_R} y_i,   (18)

where p_L and p_R are the numbers of samples in each subset. The mean square error (MSE) is then

MSE(k, b) = (1/p) [ Σ_{i∈S_L} (y_i − ȳ_L)² + Σ_{i∈S_R} (y_i − ȳ_R)² ].   (19)

The split which minimizes MSE(k, b) over all possible


choices of k and b is used to create the children of the
root node and is found by a simple discrete search over m dimensions and p observations. The children of the root node
are split recursively in the same manner; the process terminates when every remaining split creates children containing fewer than pmin samples, which is a parameter of the
method. The children are shifted with respect to their parent
nodes and their sizes reduced in the k-th dimension.
RT can both estimate a model and indicate which components of the input vector are most relevant to the modelled
relationship. Dimensions which carry the most information
about the output tend to split earliest and most often (Orr
2000).
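The split search in (15)-(19) can be sketched as an exhaustive search over dimensions and candidate boundaries; best_split below is an illustrative helper only, not the authors' implementation, and library CART routines are far more efficient.

    import numpy as np

    def best_split(X, y):
        """Find the pair (k, b) minimising MSE(k, b) of Eq. (19) by discrete search."""
        p, n_dims = X.shape
        best = (None, None, np.inf)
        for k in range(n_dims):
            for b in np.unique(X[:, k])[:-1]:            # candidate boundaries
                left, right = X[:, k] <= b, X[:, k] > b
                if left.sum() == 0 or right.sum() == 0:
                    continue
                mse = (((y[left] - y[left].mean()) ** 2).sum()
                       + ((y[right] - y[right].mean()) ** 2).sum()) / p
                if mse < best[2]:
                    best = (k, b, mse)
        return best   # (dimension, boundary, MSE)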


3.3.2 Transforming tree nodes into RBFs


A regression tree contains a root node, some non-terminal
nodes (having children) and some terminal nodes (having no
children). Each node is associated with a hyper rectangle of
input space having a center c and size s as described above.
The node corresponding to the largest hyper rectangle is the
root node and that is divided up into smaller and smaller
pieces progressing down the tree. To transform the hyper
rectangles into different basis kernel RBFs, we use the center c of a node as the RBF center and its size s, scaled by a parameter α, as the RBF radius, given by

r = α s.   (20)

The scalar α has the same value for all nodes (Kubat 1998) and is another parameter of the method. One can use α = 1/(2K), where K is Kubat's parameter (Kubat 1998; Orr 2000).
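Continuing the sketch, a tree node with center c and half-width s is mapped to an RBF by (20); node_to_rbf and the particular value of the Kubat-style parameter K used here are illustrative assumptions only.

    import numpy as np

    def node_to_rbf(center, size, alpha):
        """Map a tree node (center c, half-width s) to an RBF (c, r) via r = alpha * s, Eq. (20)."""
        return np.asarray(center), alpha * np.asarray(size)

    # e.g. with a Kubat-style scaling alpha = 1/(2K) for some K > 0 (assumption)
    c_j, r_j = node_to_rbf(center=[0.3, 0.7], size=[0.2, 0.1], alpha=1 / (2 * 2.0))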

4 Information complexity ICOMP criterion as a fitness


function and the genetic algorithm
4.1 Information complexity ICOMP criterion
The approach we take in this paper is based on cost functions which measure the goodness-of-fit or performance of a fitted model for a given stock price index. The risk, that is, the expected cost of choosing the best fitting model, will be measured in terms of a new entropic or information-based criterion which is based on a different characterization of good models, combining penalties for lack of fit, lack of parsimony, and the profusion of complexity.

In our case, we base the fitness function on information-theoretic model selection criteria. We derive the information-theoretic measure of complexity (ICOMP) criterion of Bozdogan (1988, 1994, 2000, 2004) to choose the best fitting basis kernel RBFs and the best subset of predictors with the GA hybridized with regularized regression trees and RBF networks.

The complexity of a non-parametric regression model increases with the number of independent and adjustable parameters, also termed the effective degrees of freedom of the model. According to the qualitative principle of Occam's Razor, the simplest model that fits the observed data is the best model. Following this principle, we provide a trade-off between how well the model fits the data and the model complexity in one criterion function.

We use the information criteria to evaluate and compare the different horizontal and vertical subsets generated in the genetic algorithm (GA) for the regularized regression trees and RBF networks model given by (1), under the assumption that the random error term follows a multivariate Gaussian distribution, i.e., ε ~ N(0, σ²I).

One of the general forms of ICOMP is an approximation to the sum of two Kullback-Leibler (KL) (Kullback and Leibler 1951) distances. For a general multivariate normal linear or non-linear structural model, this general form of ICOMP is given by

ICOMP(IFIM) = −2 log L(θ̂) + 2 C₁( F̂⁻¹(θ̂) ),   (21)

where −2 log L(θ̂) is the maximized log-likelihood (i.e., the lack-of-fit component), and C₁(F̂⁻¹(θ̂)) is the maximal information-theoretic complexity of the estimated inverse Fisher information matrix (IFIM), given by

C₁( F̂⁻¹(θ̂) ) = (s/2) log[ tr( F̂⁻¹(θ̂) ) / s ] − (1/2) log | F̂⁻¹(θ̂) |,   (22)

where s = dim(F̂⁻¹) = rank(F̂⁻¹).
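The entropic complexity C₁ in (22) is straightforward to compute for any estimated covariance matrix; the following small sketch (with a hypothetical c1_complexity helper) illustrates it on a generic positive-definite matrix standing in for F̂⁻¹(θ̂).

    import numpy as np

    def c1_complexity(cov):
        """Maximal information complexity C1 of Eq. (22) for a covariance matrix."""
        s = cov.shape[0]
        sign, logdet = np.linalg.slogdet(cov)
        return 0.5 * s * np.log(np.trace(cov) / s) - 0.5 * logdet

    # toy check on a positive-definite matrix
    cov = np.array([[2.0, 0.3], [0.3, 1.0]])
    print(c1_complexity(cov))   # equals 0 only if all eigenvalues are equal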


For the regression trees and RBF networks, the estimated inverse Fisher information matrix (IFIM) is given by

Cov(ŵ, σ̂²) = F̂⁻¹ = [ σ̂²(H'H)⁻¹      0
                      0              2σ̂⁴/n ],   (23)

where

σ̂² = (y − Hŵ)'(y − Hŵ) / n   (24)

is the estimated error variance.


The computational form of ICOMP(IFIM) in (21) for the HRBF-NN predictive model becomes

ICOMP(IFIM) = n log(2π) + n log σ̂² + n + 2 C₁( F̂⁻¹(θ̂) ),   (25)

where the entropic complexity is

C₁( F̂⁻¹ ) = ((m + 1)/2) log[ ( tr( σ̂²(H'H)⁻¹ ) + 2σ̂⁴/n ) / (m + 1) ] − (1/2) log | σ̂²(H'H)⁻¹ | − (1/2) log( 2σ̂⁴/n ),   (26)

as our performance measure for choosing the best fitting model when the model is correctly specified. That is, when the random error term follows a Gaussian distribution.
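Putting (23)-(26) together, the following is a hedged NumPy sketch of ICOMP(IFIM) for a fitted HRBF-NN under correct specification; icomp_ifim is a hypothetical helper, and the 2σ̂⁴/n block follows the reconstruction of (23) above.

    import numpy as np

    def icomp_ifim(H, y, w):
        """ICOMP(IFIM) of Eq. (25) for weights w fitted on the model matrix H."""
        n, m = H.shape
        resid = y - H @ w
        sigma2 = resid @ resid / n                      # error variance, Eq. (24)
        cov = np.zeros((m + 1, m + 1))                  # block-diagonal IFIM, Eq. (23)
        cov[:m, :m] = sigma2 * np.linalg.inv(H.T @ H)
        cov[m, m] = 2.0 * sigma2 ** 2 / n
        s = m + 1
        c1 = 0.5 * s * np.log(np.trace(cov) / s) \
             - 0.5 * np.linalg.slogdet(cov)[1]          # Eq. (22) / (26)
        return n * np.log(2 * np.pi) + n * np.log(sigma2) + n + 2.0 * c1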
On the other hand, when the model is misspecified, we define ICOMP(IFIM) for misspecified models as

ICOMP(IFIM)_Misspec = −2 log L(θ̂) + 2 C₁( Cov(θ̂)_Misspec ),   (27)

where

Cov(θ̂)_Misspec = F̂⁻¹ R̂ F̂⁻¹.   (28)

In (28), F̂⁻¹ is the inner-product (or Hessian) form of the estimated inverse Fisher information matrix (IFIM), and R̂ is the outer-product form of the Fisher information matrix. Cov(θ̂)_Misspec is often called the sandwich covariance or robust covariance estimator, since it is a correct covariance estimator regardless of whether the assumed model is correct or not. When the model is correct we have F̂ = R̂, and formula (28) reduces to the usual estimated inverse Fisher information matrix F̂⁻¹ (White 1982).
For the misspecified HRBF-NN predictive model, the outer-product form of IFIM, R̂, as in Bozdogan (2004), is given by

R̂ = [ (1/σ̂⁴) H'D̂²H            (Ŝk/(2σ̂³)) H'1
      (Ŝk/(2σ̂³)) 1'H           (n − q)(K̂t − 1)/(4σ̂⁴) ],   (29)

where

D̂² = diag(ε̂_1², . . . , ε̂_n²) is the diagonal matrix of squared residuals,
H is the (n × q) matrix of regressors or model matrix,
Ŝk is the estimated residual skewness,
K̂t is the estimated residual kurtosis, and
1 is an (n × 1) vector of ones.
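A hedged NumPy sketch of the sandwich covariance in (28), with R̂ assembled from the squared residuals and the residual skewness and kurtosis, follows; the exact scaling of R̂ mirrors the reconstruction of (29) above and should be checked against Bozdogan (2004) before use.

    import numpy as np

    def sandwich_cov(H, y, w):
        """Robust covariance F^{-1} R F^{-1} of Eq. (28) with R as in Eq. (29)."""
        n, q = H.shape
        resid = y - H @ w
        sigma2 = resid @ resid / n
        sk = ((resid / np.sqrt(sigma2)) ** 3).mean()     # residual skewness
        kt = ((resid / np.sqrt(sigma2)) ** 4).mean()     # residual kurtosis
        ones = np.ones(n)

        # inner-product (Hessian) form of the inverse Fisher information, Eq. (23)
        F_inv = np.zeros((q + 1, q + 1))
        F_inv[:q, :q] = sigma2 * np.linalg.inv(H.T @ H)
        F_inv[q, q] = 2.0 * sigma2 ** 2 / n

        # outer-product form R-hat, Eq. (29), as reconstructed here
        R = np.zeros((q + 1, q + 1))
        R[:q, :q] = H.T @ np.diag(resid ** 2) @ H / sigma2 ** 2
        R[:q, q] = R[q, :q] = sk * (H.T @ ones) / (2.0 * sigma2 ** 1.5)
        R[q, q] = (n - q) * (kt - 1.0) / (4.0 * sigma2 ** 2)

        return F_inv @ R @ F_inv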

We note that using the sandwich covariance matrix in (28) we penalize the presence of skewness and kurtosis in the variables as we fit the HRBF-NN model, which is not possible using AIC-type criteria. See, e.g., Bozdogan (2004). Under misspecification, the computational form of ICOMP(IFIM)_Misspec is given by

ICOMP(IFIM)_Misspec = n log(2π) + n log σ̂² + n + 2 C₁( Cov(θ̂)_Misspec ).   (30)

4.2 The genetic algorithm for subset selection of best predictors

The genetic algorithm (GA) is a stochastic (or probabilistic) search algorithm that employs natural selection and genetic operators. A GA treats information as a series of codes on a binary string, where each string represents a different solution to a given problem. It follows the principle of survival of the fittest, introduced by Charles Darwin. The algorithm searches within a defined search space to solve a problem, and it has shown outstanding performance in finding optimal solutions to problems in many different fields (Eiben and Smith 2010).

The genetic algorithm has been used as an optimization tool to find optimal parameters in many statistical methods such as robust regression (Burns 1992), experimental design (Hamada et al. 2001), and Bayesian sampling (Liang and Wong 2001). There are several applications of the GA in a variety of fields, including econometrics (Routledge 1999), finance (Neely et al. 1997), and image processing (Bhandarkar et al. 1994). Additionally, successful applications of the GA to variable selection have also been reported in the literature (Bozdogan and Howe 2012; Howe and Bozdogan 2010).

Recall that for the regularized regression tree and RBF networks model given in (1), the GA is used to find the best or nearly best subset of predictors from the data. The summary of the implementation of the GA is as follows.

1. Implementing a genetic coding scheme: The first step of the GA is to represent each subset model as a binary string. A binary code of 1 indicates presence, and a 0 indicates absence, of the relevant predictor. Every string is of the same length, but contains a different combination of predictor variables. For a data set with k = 6 predictors and a constant, the following string represents a variable subset including the constant and the input variables x2, x3, and x6.

   1  0  1  1  0  0  1
   x0 x1 x2 x3 x4 x5 x6

2. Generating an initial population of the models: We


choose an initial population of size N consisting of randomly selected models from all possible models.
3. Using a fitness function to evaluate the performance
of the models in the population: A fitness function provides a way of evaluating the performance of the models.
For our fitness function, we use ICOMP under misspecification defined in (30), calculated using the in-sample
predictions obtained by HRBF-NN for each solution.
Variable selection with the GA is hence a minimization
problem (Bozdogan 2004). Note that there is no lower
bound for ICOMP values, just like with AIC-Type model
selection criteria (Akaike 1973).
4. Selecting the parent models from the current population: This step is to choose the models to be used in the next step to generate a new population. Selection of parent models is based on natural selection; models with better fitness values have a greater chance of being selected as parents. We calculate the difference of the fitness function as

   ΔICOMP(i) = ICOMP_Max − ICOMP_i   (31)

   for i = 1, 2, . . . , N, where N is the population size. Here we write ICOMP instead of ICOMP(IFIM)_Misspec for simplicity. Next, we average these differences by computing

   ΔICOMP_mean = (1/N) Σ_{i=1}^{N} ΔICOMP(i).   (32)

   Then the ratio of each model's difference value to the mean difference value is calculated. That is, we compute

   ICOMP_Ratio(i) = ΔICOMP(i) / ΔICOMP_mean.   (33)

   The chance of a model being mated is proportional to this ratio. The process of selecting mates to produce offspring models continues until the number of offspring equals the initial population size.
5. Producing offspring models by crossover and mutation: The selected parents are then used to generate offspring by performing crossover and/or mutation operations on them. Both the crossover and mutation probabilities are determined by the analyst. A higher crossover probability introduces more new models into the population in each generation, while removing more of the good models from the previous generation. Mutation is a random search operator that helps to jump around the search space within the solution's scope. Lin and Lee (1996) state that mutation should be used sparingly, since with a high mutation probability the algorithm becomes little more than a random search. There are several different ways of performing crossover: single-point crossover, two-point crossover, uniform crossover, etc. (Eiben and Smith 2010). In this study, we use single-point crossover to carry out the variable selection via the genetic algorithm. In the GA literature, there are no theoretical results available on how to choose the crossover and mutation parameters of the GA, since they depend on the application area and the problem at hand. However, based on established empirical results, almost all researchers agree that the crossover probability should be greater than 0.3 while the mutation probability should be smaller than 0.1. See, e.g., Fouskakis and Draper (2002). Although the crossover and mutation probabilities are typically set in the intervals (0.5, 1.0) and (0.005, 0.01), Srinivas and Patnaik (1994) and Zhang et al. (2007), in their studies, fix the crossover and mutation probabilities at 0.6 and 0.01, respectively. Furthermore, De Jong (1975), in his simulation study, found that the best values of the crossover and mutation probabilities are 0.6 and 0.001, respectively. Based on this, De Jong and Spears (1989) later fixed the crossover probability at 0.6 in yet another study. In the absence of theoretical results, in this paper we use a statistically based experimental study, suggested by one of the referees, to find the best value(s) of the crossover and mutation probabilities. These results are given in Sect. 5.2.
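Steps 1-5 can be summarized in a schematic GA sketch for binary subset selection; ga_subset_select is a hypothetical interface in which fitness(mask) stands for the ICOMP(IFIM)_Misspec score of the HRBF-NN fitted on the selected columns, and it is not the authors' implementation.

    import numpy as np

    def ga_subset_select(fitness, n_vars, pop_size=25, n_gen=30,
                         p_cross=0.65, p_mut=0.01, rng=np.random.default_rng(0)):
        """Minimise fitness(mask) over binary masks with a simple elitist GA."""
        pop = rng.integers(0, 2, size=(pop_size, n_vars))
        for _ in range(n_gen):
            scores = np.array([fitness(ind) for ind in pop])
            elite = pop[scores.argmin()].copy()                # elitism: keep the best
            # selection probabilities from the ICOMP differences, Eqs. (31)-(33)
            diff = scores.max() - scores
            prob = diff / diff.sum() if diff.sum() > 0 else np.full(pop_size, 1 / pop_size)
            parents = pop[rng.choice(pop_size, size=pop_size, p=prob)]
            # single-point crossover and bit-flip mutation
            children = parents.copy()
            for i in range(0, pop_size - 1, 2):
                if rng.random() < p_cross:
                    cut = rng.integers(1, n_vars)
                    children[i, cut:], children[i + 1, cut:] = \
                        parents[i + 1, cut:].copy(), parents[i, cut:].copy()
            children ^= (rng.random(children.shape) < p_mut)
            children[0] = elite                                # carry the elite over
            pop = children
        scores = np.array([fitness(ind) for ind in pop])
        return pop[scores.argmin()], scores.min()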

Table 1 Abbreviations list for stock market indices

Variable name    Variable explanation
ISE100           Istanbul Stock Exchange National 100 Index
SP               Standard & Poor's 500 return index
DAX              Stock market return index of Germany
FTSE             Stock market return index of UK
NIK              Stock market return index of Japan
BVSP             Stock market return index of Brazil
EU               MSCI European index
EM               MSCI emerging markets index

5 Numerical example: forecasting the direction of


movement of ISE100 index
In this section, we give our numerical example: variable selection with HRBF-NN to determine the optimal lags of international indicators for predicting the direction of movement of the ISE100 index. Based on the literature, the list of the different stock market indices considered in our numerical analysis is given in Table 1.
The organization and the preparation of our data set to
determine the training sample size, the best fitting radial basis function, the best subset selection with GA, forecasting
the movement of the direction of ISE100 are all explained
in detail as follows.
5.1 Preparing the data set for the analysis
We obtained daily price data for the indices listed in Table 1 from http://finance.yahoo.com and http://imkb.gov.tr and converted the prices to returns. We then have n = 536 daily returns between January 5, 2009 and February 22, 2011. We excluded the days on which the Turkish stock exchange was closed. In the case of missing data for the other indices, the previous day's value was used. After constructing the first and second lags for all indices, the usable
number of observations is 534. These lags are taken into account as the explanatory variables and they are indicated by
adding 1 and 2 to the end of the variable names. For example, first and second lags for DAX are named as DAX1 and
DAX2, respectively. For comparing the ISE100 lags against
the other indices, the time differences between Turkey and
the other countries were also taken into account. Our data is
available from the link: http://www.akbilgic.com/.
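A hedged pandas sketch of the data preparation described above follows: prices are forward-filled on days with missing foreign quotes, converted to returns, and one- and two-day lags are built as candidate predictors. The file name, column names, and lag-naming convention are placeholders rather than the actual data layout.

    import pandas as pd

    # hypothetical CSV of daily closing prices indexed by date,
    # with one column per index in Table 1 (ISE100, SP, DAX, ...)
    prices = pd.read_csv("indices.csv", index_col="date", parse_dates=True)

    prices = prices.ffill()                 # missing foreign quotes: previous day's value
    returns = prices.pct_change().dropna()  # convert prices to simple returns

    # one- and two-day lags of every index as candidate predictors
    lags = pd.concat({f"{col}{k}": returns[col].shift(k)
                      for col in returns.columns for k in (1, 2)}, axis=1)
    data = pd.concat([returns["ISE100"], lags], axis=1).dropna()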
5.2 Variable selection for Istanbul stock exchange
We used the HRBF-NN model to determine the best subset
of one- and two-days lags of all eight indices that most effectively and efficiently predicts the ISE100 directional movement. We have 16 predictors in total; first eight predictors

are the one-day lags of the indices listed in Table 1, while the other eight predictors are the two-day lags of the same indices. As previously stated, the ICOMP(IFIM)_Misspec criterion is used to score the subset models as the fitness function within the genetic algorithm (GA). We used a population size of N = 25, which is larger than the number of predictors, 16. Although, in our experiments, there was generally no improvement in the fitness function after the 18th generation, we let the GA run for 30 iterations. We ran the genetic algorithm for different values of the crossover and mutation probabilities, attempting to minimize the ICOMP score. In our experimental study, we set the crossover and mutation probabilities from the sets {0.50, 0.55, 0.60, 0.65, 0.70, 0.75, 0.80, 0.85, 0.90} and {0.005, 0.01, 0.05, 0.1}, respectively.

Table 2 Variable selection with the genetic algorithm

Xover pr.  Mut. pr.  RBF     ICOMP     Variable subset
0.65       0.010     Gauss   -2773.4   0111111000101101
0.85       0.010     Gauss   -2773.1   1111111000101111
0.65       0.010     IMQ     -2772.9   0110111111101010
0.65       0.005     IMQ     -2771.4   0010010111101000
0.65       0.010     Cauchy  -2771.4   0110011011101000
0.60       0.050     Gauss   -2770.8   0010000111101011
0.50       0.100     Gauss   -2770.7   0110011111101000
0.80       0.005     Gauss   -2770.6   0110010111101011
0.75       0.010     Gauss   -2770.6   0110010111101011
0.60       0.100     Gauss   -2769.6   0010111011101001
During our experimental study, we ran the GA for variable selection for different mutation and crossover probabilities and RBF combinations, 160 times in total. We ran the GA for each of the four RBFs: Gaussian, Cauchy, Multiquadratic, and Inverse Multiquadratic. The results for the best ten out of the 160 runs for different parameter combinations are listed in Table 2. During each run of the GA, the elitism rule was applied in order to keep the best subset found so far in each generation of the GA; this ensures monotonic improvement of the fitness function.
Table 2 shows that the fitness function, ICOMP, is minimized at the GA parameters Crossover Probability = 0.65 and Mutation Probability = 0.01, with the Gaussian RBF. Therefore, we chose the Gaussian RBF as the best fitting RBF for forecasting the ISE100 index. The best fitting model with our approach is given by the first variable subset in Table 2. The predictors of the best subset, represented in binary code, correspond to the variable subset {SP1, DAX1, FTSE1, NIK1, BVSP1, EU1, DAX2, NIK2, BVSP2, EM2}. Note that the
ISE1 and ISE2 variables are not in the best subset with minimum ICOMP value. This means that the ISE100 is strongly
affected by the lags of the international markets, not by its
own lags. This is an interesting as well as an important result

in the sense that what is currently used in practice is solely


based on the lag variables of the ISE100 itself. Our results
suggest that the ISE100 index does not require its lags to
build a predictive model.
5.3 Forecasting the direction of the movement of ISE 100
index
Using the best subset model identified by the GA and
HRBF-NN, we generate out-of-sample forecast returns for
ISE100. We then translated each forecast into a directional sign, +1 or −1. A forecast of +1 indicates that the ISE100 index should appreciate in value that day; −1 indicates a drop in value. These forecasts can then be interpreted as buy and sell signals, respectively. We use the percentage of correctly forecasted (PCF) days as a criterion to validate our results. Although PCF may not be a universally accepted criterion, it allows us to interpret our results in simple and logical terms. For example, if PCF is 65 %, it means that the forecast direction of movement of the ISE100 was correct for 65 out of 100 days. We also use a second criterion called Dollar 100 (D100), which gives us the theoretical future value of $100 invested at the beginning of the forecast term and traded according to the forecast signals. If the forecast is +1 and we do not have an outstanding position in the ISE100, the decision is buy. In contrast, if the forecast is −1 and we do have a position, the decision is sell. No action is taken in the remaining cases.
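A minimal sketch of the two evaluation criteria follows, under the simplified reading that the portfolio is long the index exactly on days with a +1 signal and out of the market otherwise; transaction costs are ignored and the function names are illustrative.

    import numpy as np

    def pcf(actual_returns, signals):
        """Percentage of days on which the forecast sign matches the realised sign."""
        return 100.0 * np.mean(np.sign(actual_returns) == signals)

    def dollar100(actual_returns, signals):
        """Value of $100 invested only on days carrying a +1 (buy/keep) signal."""
        value = 100.0
        for r, s in zip(actual_returns, signals):
            if s > 0:                 # holding the index on this day
                value *= (1.0 + r)
        return value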
Throughout our analysis, we used a rolling training sample size of 250 days, which was used to forecast the next
20 days. For example, our first forecast was 05.05.2009. On
day 21, the training set was updated by dropping off the earliest 20 days and adding on the latest 20. This periodic renewal of the training set allows our model and parameters to
accommodate the dynamic structure of the capital markets.
This rolling-forward process was run ten times, resulting in
a total of 200 forecast days, in groups of 20. In Table 3, we
show D100 results summarized by each group. For comparison, the right column shows the same metric for a simple
buy-and-hold strategy.
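The rolling scheme can be sketched as follows, with fit and predict as placeholders for fitting the HRBF-NN on the selected predictors and forecasting the next block; the window and block sizes match those described above.

    import numpy as np

    def rolling_forecast(X, y, fit, predict, train_size=250, step=20, n_blocks=10):
        """Refit every `step` days on the latest `train_size` days and forecast ahead."""
        signs = []
        for b in range(n_blocks):
            start = b * step
            tr = slice(start, start + train_size)                  # rolling training window
            te = slice(start + train_size, start + train_size + step)
            model = fit(X[tr], y[tr])
            signs.append(np.sign(predict(model, X[te])))           # directional forecasts
        return np.concatenate(signs)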
Looking at Table 3, a $100 investment grows to $202 over 200 trading days when the investment is managed according to forecasts made using our best fitting HRBF-NN model.
Figure 1 shows that making Buy-Sell decisions according to
the HRBF-NN model would have been better than just investing and holding the position, over this period. Out of all
200 forecasts, 136 correctly identified the direction of the
ISE100 index, for a PCF of 68 %.
The results shown in Table 3 are summarized by forecast period. For the first period, the buy-sell signals are shown in Table 4, along with other relevant details. Here we see that if we had invested $100, our investment would have become $117.84 by making buy-sell decisions according to the best fitting HRBF-NN model forecasts. Taking transaction costs into consideration, our investment would have become $116.90. If we had just bought and held the index, the investment would have declined to $91.27.

Fig. 1 Movement of invested $100 in 200 days

Table 3 Movement of invested $100 for 200 days, split into 10 periods

Forecast terms       Buy-sell decisions by HRBF-NN    No buy-sell
Beginning (Dollar)   100                               100
Period 1             117.83                            91.27
Period 2             127.24                            90.62
Period 3             145.85                            104.51
Period 4             153.34                            99.72
Period 5             171.96                            114.43
Period 6             185.07                            131.00
Period 7             162.70                            113.36
Period 8             176.53                            112.08
Period 9             186.17                            108.94
Period 10            202.06                            105.52

Table 4 Detailed movement of invested $100 for the first 20-day period

Day   Decision   HRBF-NN    No buy-sell
1     Sell       100.0000   97.2299
2     Buy        97.5386    94.8367
3     Sell       97.5386    87.8112
4     Buy        107.8638   97.1066
5     Sell       107.8638   96.0809
6     Keep       107.8638   99.1060
7     Buy        108.5335   99.7213
8     Sell       108.5335   95.1962
9     Keep       108.5335   94.8693
10    Buy        110.8596   96.9026
11    Sell       110.8596   90.0334
12    Keep       110.8596   89.3095
13    Buy        112.5537   90.6742
14    Sell       112.5537   85.6230
15    Buy        117.5910   89.4550
16    Keep       119.3674   90.8064
17    Keep       121.7044   92.5842
18    Keep       118.6782   90.2821
19    Keep       117.8346   89.6404
20    Sell       117.8346   91.2655

6 Conclusions
In this paper, we studied a variable selection and forecasting
problem in stock markets by focusing on the ISE100 index.
We have identified a model for the effects of international
stock markets on the ISE100 index. We carried out a variable
subset selection using the GA along with ICOMP criterion
to determine which indices have important effects on the direction of the movement of the ISE100 index. Our variable
subset selection results selected the first and second lags of NIK, DAX, and BVSP, only the first lags of SP, FTSE, and EU, and only the second lag of EM as explanatory variables. It is interesting to see that none of the lags of ISE100
were selected as explanatory variables. Our results suggest
that the ISE100 index does not require its lag variables to
build a predictive model.


Further, our results have shown that the HRBF-NN model is a highly flexible and clever data mining technique that can handle relationships in highly nonlinear data structures. Although forecasting in stock markets is a very difficult task, the forecasts made by our model demonstrate high performance, with an accuracy rate of approximately 65 %.

One caveat of our approach is that it is made under the assumption that the random noise follows a Gaussian distribution. Even with this assumption, the HRBF-NN model is able to adapt itself, due to its overall non-parametric nature.
In a future study, we intend to relax the Gaussian assumption and consider a more general distributional assumption on the random noise, such as the Power Exponential (PE) distribution. The PE distribution includes the Gaussian, Laplace, and other distributions as a subfamily. Our results will be reported in a subsequent paper elsewhere.

All our computations have been carried out using newly developed scripts in MATLAB. Since there is still ongoing research being done by the first two authors on HRBF-NN, these scripts are presently not freely available.

Acknowledgements This work was supported by the Scientific Research Projects Coordination Unit of Istanbul University under project number 17708. We further acknowledge the valuable comments of the three anonymous referees and the Associate Editor, which resulted in a much improved paper.

References
Akaike, H.: Information theory and an extension of the maximum likelihood principle. In: Petrov, B.N., Csaki, F. (eds.) Second International Symposium on Information Theory, pp. 267–281. Akadémiai Kiadó, Budapest (1973)
Akbilgic, O., Bozdogan, H.: Predictive subset selection using regression trees and RBF neural networks hybridized with the genetic algorithm. Eur. J. Pure Appl. Math. 4(4), 467–485 (2011)
Bhandarkar, S., Zhang, Y., Potter, W.: An edge detection technique using genetic algorithm-based optimization. Pattern Recognit. 27, 1159–1180 (1994)
Bishop, C.: Improving the generalization properties of radial basis function neural networks. Neural Comput. 3(4), 579–588 (1991)
Boyacioglu, M., Avci, D.: An adaptive network-based fuzzy inference system (ANFIS) for the prediction of stock market return: the case of the Istanbul Stock Exchange. Expert Syst. Appl. 37, 7902–7912 (2010)
Bozdogan, H.: ICOMP: a new model-selection criterion. In: Bock, H.H. (ed.) Classification and Related Methods of Data Analysis. North-Holland, Amsterdam (1988)
Bozdogan, H.: Mixture-model cluster analysis using a new informational complexity and model selection criteria. In: Proceedings of the First US/Japan Conference on the Frontiers of Statistical Modeling: An Informational Approach. Multivariate Statistical Modeling, vol. 2, pp. 69–113. Kluwer Academic, Norwell (1994)
Bozdogan, H.: Akaike's information criterion and recent developments in informational complexity. J. Math. Psychol. 44, 62–91 (2000)
Bozdogan, H.: Intelligent statistical data mining with information complexity and genetic algorithms. In: Bozdogan, H. (ed.) Statistical Data Mining and Knowledge Discovery, pp. 15–56. Chapman & Hall, London (2004)
Bozdogan, H., Howe, J.A.: Misspecified multivariate regression models using the genetic algorithm and information complexity as the fitness function. Eur. J. Pure Appl. Math. 5(2), 211–249 (2012)
Breiman, L., Friedman, J., Olshen, R.A., Stone, C.J.: Classification and Regression Trees. Chapman & Hall, London (1984)
Broomhead, D.S., Lowe, D.: Multi-variable functional interpolation and adaptive networks. Complex Syst. 2, 321–355 (1988)
Burns, P.: A genetic algorithm for robust regression estimation. Technical report, Statistical Sciences, Inc. (1992)
Cinko, M., Avci, E.: A comparison of neural network and linear regression forecasts of the ISE100 index. Öneri 7(28), 301–307 (2007)
De Jong, K.A.: An analysis of the behavior of a class of genetic adaptive systems. Ph.D. Dissertation, University of Michigan (1975)
De Jong, K.A., Spears, W.M.: Using genetic algorithms to solve NP-complete problems. In: Schaffer, J.D. (ed.) Third Conference on Genetic Algorithms, pp. 124–132. Morgan Kaufmann, San Mateo (1989)
Eiben, A.E., Smith, J.E.: Introduction to Evolutionary Computing. Springer, Berlin (2010)
Fouskakis, D., Draper, D.: Stochastic optimization: a review. Int. Stat. Rev. 70(2), 315–349 (2002)
Hamada, M., Martz, H., Reese, C., Wilson, A.: Finding near-optimal Bayesian experimental designs via genetic algorithms. Am. Stat. 55(3), 175–181 (2001)
Haykin, S.: Neural Networks: A Comprehensive Foundation. Prentice Hall, New Jersey (1999)
Hoerl, A.E., Kennard, R.W., Baldwin, K.F.: Ridge regression: some simulations. Commun. Stat. 4, 105–123 (1975)
Howe, J.A., Bozdogan, H.: Predictive subset VAR modeling using the genetic algorithm and information complexity. Eur. J. Pure Appl. Math. 3(3), 382–405 (2010)
Howlett, R.J., Jain, L.C.: Radial Basis Function Networks 1: Recent Developments in Theory and Applications. Physica-Verlag, New York (2001)
Korkmaz, T., Cevik, E., Birkan, E., Ozatac, N.: Causality in mean and variance between ISE100 and S&P 500: Turkcell case. Afr. J. Bus. Manag. 5(5), 1673–1683 (2011)
Kubat, M.: Decision trees can initialize radial basis function networks. IEEE Trans. Neural Netw. 9(5), 813–821 (1998)
Kullback, S., Leibler, R.A.: On information and sufficiency. Ann. Math. Stat. 22, 79–86 (1951)
Liang, F., Wong, W.: Real-parameter evolutionary Monte Carlo with applications to Bayesian mixture models. J. Am. Stat. Assoc. 96(454), 653–666 (2001)
Lin, C.-T., Lee, C.S.G.: Neural Fuzzy Systems: A Neuro-Fuzzy Synergism to Intelligent Systems. Prentice Hall, New York (1996)
Liu, Z., Bozdogan, H.: Improving the performance of radial basis function classification using information criteria. In: Bozdogan, H. (ed.) Statistical Data Mining and Knowledge Discovery, pp. 193–216. Chapman & Hall, London (2004)
Lo, A.W., MacKinlay, C.: Stock market prices do not follow random walks: evidence from a simple specification test. Rev. Financ. Stud. 1, 41–66 (1988)
Meng, K., Dong, Z.Y., Wong, K.P.: Self-adaptive radial basis function neural networks for short-term electricity price forecasting. IEE Proc., Gener. Transm. Distrib. 3(4), 325–335 (2008)
Neely, C., Weller, P., Dittmar, R.: Is technical analysis in the foreign exchange market profitable? A genetic programming approach. J. Financ. Quant. Anal. 32(4), 405–426 (1997)
Orr, M.: Combining regression trees and RBFs. Int. J. Neural Syst. 10(6), 453–465 (2000)
Ozdemir, A.K., Tolun, S., Demirci, E.: Endeks getirisi yonunun ikili siniflandirma yontemiyle tahmin edilmesi: IMKB100 endeksi ornegi. Nigde Univ. IIBF Derg. 4(2), 45–59 (2011)
Ozun, A.: Are the reactions of emerging equity markets to the volatility in advanced markets similar? Comparative evidence from Brazil and Turkey. Int. Res. J. Finance Econ. 9, 220–230 (2007)
Poggio, T., Girosi, F.: Regularization algorithms for learning that are equivalent to multilayer networks. Science 247(4945), 978–982 (1990)
Rivas, V.M., Merelo, J.J., Castillo, P.A., Arenas, M.G., Castellano, J.G.: Evolving RBF neural networks for time-series forecasting with EvRBF. Inf. Sci. 165, 207–220 (2004)
Routledge, B.: Adaptive learning in financial markets. Rev. Financ. Stud. 12(5), 1165–1202 (1999)
Srinivas, M., Patnaik, L.M.: Adaptive probabilities of crossover and mutation in genetic algorithms. IEEE Trans. Syst. Man Cybern. 24(4), 656–667 (1994)
Sun, Y.F., Liang, Y.C., Zhang, W.L., Lee, H.P., Lin, W.Z., Cao, L.J.: Optimal partition algorithm of the RBF neural network and its application to financial time series forecasting. Neural Comput. Appl. 14, 35–44 (2005)
Tikhonov, A.N., Arsenin, V.Y.: Solutions of Ill-Posed Problems. Wiley, New York (1977)
Vuran, B.: The determination of the long-run relationship between ISE100 and international equity indices using cointegration analysis. Istanb. Univ. J. Sch. Bus. Adm. 39(1), 154–168 (2000)
White, H.: Maximum likelihood estimation of misspecified models. Econometrica 50, 1–25 (1982)
Zhang, J., Chung, H.S., Lo, W.: Clustering-based adaptive crossover and mutation probabilities for genetic algorithms. IEEE Trans. Evol. Comput. 11(3), 326–335 (2007)
