Rohitash Chandra
School of Science and Technology
The University of Fiji
rohitashc@unifiji.ac.fj
Christian W. Omlin
The University of the Western Cape
Recurrent neural networks maintain information about their past states for the computation of future states using feedback connections. They are composed of an input layer, a context layer which provides state information, a hidden layer and an output layer, as shown in Figure 1. Each layer contains one or more processing units called neurons which propagate information from one layer to the next by computing a non-linear function of their weighted sum of inputs. Popular architectures of recurrent neural networks include first-order recurrent networks [12], second-order recurrent networks [13], NARX networks [14] and LSTM recurrent networks [15]. A detailed study of the vast variety of recurrent neural networks is beyond the scope of this paper; however, we will discuss the dynamics of first-order recurrent neural networks as given in Equation 1:

S_i(t) = g\left( \sum_{k=1}^{K} V_{ik} S_k(t-1) + \sum_{j=1}^{J} W_{ij} I_j(t-1) \right)    (1)

where S_k(t) and I_j(t) represent the outputs of the state neurons and input neurons, respectively, and V_{ik} and W_{ij} represent their corresponding weights; g(.) is a sigmoidal discriminant function. We will use this architecture to construct the hybrid architecture of recurrent neural networks inspired by hidden Markov models and show that it can learn deterministic finite automata.

2.2 Hidden Markov Models

A hidden Markov model (HMM) describes a process which goes through a finite number of non-observable states whilst generating either a discrete or continuous output signal. In a first-order Markov model, the state at time t+1 depends only on the state at time t, regardless of the states at previous times [16]. Figure 2 shows an example of a Markov model containing three states in a stochastic automaton.

Figure 2: A Markov model. \Pi_i is the probability that the system will start in state S_i and a_{ij} is the probability that the system will move from state S_i to state S_j.
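The first-order dynamics of Equation 1 can be sketched in a few lines of plain Python. This is a minimal illustration only: the layer sizes, the weight values and the choice of the logistic function for g(.) are assumptions made for the example, not values from the paper.

```python
import math

def sigmoid(x):
    # Logistic function standing in for the sigmoidal discriminant g(.)
    return 1.0 / (1.0 + math.exp(-x))

def state_update(S_prev, I_prev, V, W):
    """One step of Equation 1:
    S_i(t) = g( sum_k V[i][k]*S_k(t-1) + sum_j W[i][j]*I_j(t-1) )."""
    K = len(S_prev)   # number of state (context) neurons
    J = len(I_prev)   # number of input neurons
    S_next = []
    for i in range(K):
        net = sum(V[i][k] * S_prev[k] for k in range(K)) \
            + sum(W[i][j] * I_prev[j] for j in range(J))
        S_next.append(sigmoid(net))
    return S_next

# Illustrative example: K = 2 state neurons, J = 1 input neuron
V = [[0.5, -0.3], [0.2, 0.4]]   # recurrent weights V_ik (assumed values)
W = [[0.1], [-0.6]]             # input weights W_ij (assumed values)
S = [0.0, 0.0]                  # initial state
for symbol in [1.0, 0.0, 1.0]:  # feed a short binary string, one symbol per step
    S = state_update(S, [symbol], V, W)
```

The state vector S is fed back at each step, which is how the network retains information about the string seen so far.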
The model probabilistically links the observed signal to
the state transitions in the system. The theory provides a
means by which:
x_j(t) = f\left( \sum_{i=1}^{N} x_i(t-1) w_{ij} \right), \quad 1 \le j \le N    (3)

S_i(t) = f\left( \sum_{k=1}^{K} V_{ik} S_k(t-1) + \sum_{j=1}^{J} W_{ij} I_j(t-1) \right) \cdot b_{t-1}(O)    (4)

where b_{t-1}(O) is the Gaussian distribution. Note that the subscript of b_{t-1}(O), i.e. time t in Equation 4, differs from the subscript of the Gaussian distribution in Equation 2. The dynamics of hidden Markov models and recurrent networks vary in this context; however, we can adjust the parameter for time t as shown in Equation 4 in order to map hidden Markov models into recurrent neural networks. For a single input, the univariate Gaussian distribution is given by Equation 5:

b_t(O) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{1}{2} \frac{(O-\mu)^2}{\sigma^2} \right)    (5)

where O is the observation at time t, \mu is the mean and \sigma^2 is the variance. For multiple inputs to the hybrid recurrent network, the multivariate Gaussian for d dimensions is given by Equation 6:

b_t(O) = \frac{1}{(2\pi)^{d/2} |\Sigma|^{1/2}} \exp\left( -\frac{1}{2} (O-\mu)^T \Sigma^{-1} (O-\mu) \right)    (6)

where O is a d-component column vector, \mu is a d-component mean vector and \Sigma is a d-by-d covariance matrix.

Figure 4: The architecture of hybrid recurrent neural networks. The dashed lines indicate that the architecture can represent more neurons in each layer if required.

Figure 4 shows how the Gaussian distribution of the hidden Markov model is used to build hybrid recurrent neural networks. The output of the univariate Gaussian function depends solely on two input parameters, the mean and the variance. These parameters will also be represented in the chromosomes together with the weights and biases and will be trained by the genetic algorithm.

4. Empirical Results and Discussion

4.1 Training Hybrid Recurrent Neural Networks on DFAs

In the hybrid recurrent neural network architecture, the neurons in the hidden layer compute the weighted sum of their inputs, which is then multiplied by the output of the corresponding Gaussian function, which takes its inputs from the input layer. The product of the neuron output and the Gaussian output is then propagated from the hidden layer to the output layer, as shown in Figure 4.
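This hybrid update can be sketched by combining Equation 4 with the univariate Gaussian of Equation 5. All weight, mean and variance values below are illustrative assumptions, and the time-index bookkeeping of b_{t-1}(O) is simplified to use the current observation; in the paper these Gaussian parameters sit in the chromosome alongside the weights and are trained by the genetic algorithm.

```python
import math

def gaussian(o, mu, var):
    # Equation 5: univariate Gaussian b_t(O) evaluated at observation o
    return math.exp(-0.5 * (o - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def hybrid_state_update(S_prev, I_prev, V, W, mu, var):
    """Equation 4: the usual recurrent weighted sum is passed through f
    and the result is multiplied by the Gaussian output b(O) computed
    from the observation on the input layer (single-input case)."""
    K, J = len(S_prev), len(I_prev)
    b = gaussian(I_prev[0], mu, var)   # Gaussian modulation factor
    S_next = []
    for i in range(K):
        net = sum(V[i][k] * S_prev[k] for k in range(K)) \
            + sum(W[i][j] * I_prev[j] for j in range(J))
        S_next.append(sigmoid(net) * b)
    return S_next

# Illustrative parameters (assumed, not taken from the paper)
V = [[0.5, -0.3], [0.2, 0.4]]   # recurrent weights
W = [[0.1], [-0.6]]             # input weights
mu, var = 0.5, 1.0              # Gaussian mean and variance
S = hybrid_state_update([0.0, 0.0], [1.0], V, W, mu, var)
```

For multiple inputs, the call to `gaussian` would be replaced by the multivariate density of Equation 6 over the whole input vector.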
The recorded classification performance of the DFAs extracted with increasing string length L is shown in Table 4. We note that the DFAs extracted from lengths L=2 and L=3 show 0% accuracy, as these string lengths were too small to represent the deterministic finite automaton. The string classification accuracy jumps to 90.02% for L=4 and remains at 100% for all prefix trees with depth greater than L=5. The extracted deterministic finite automaton was identical to the automaton used for training the hybrid recurrent neural network architecture, as shown in Figure 3.

We also ran experiments where the training set consisted of 50%, 30% and 10% of all strings up to length 10, i.e. the training data itself no longer embodied all the knowledge about the output values necessary in order to induce DFAs. In this case, the trained network had to rely on its generalization capability in order to assign the output values missing in the prefix tree. Our experiments show that, even when "holes" are present in the training set, it is possible to extract the ideal deterministic finite acceptor by making use of the hybrid recurrent network for the missing output membership values.

5. Conclusions

We have successfully combined the strengths of hidden Markov models and recurrent neural networks to construct the hybrid recurrent neural network architecture. The structural similarities between hidden Markov models and recurrent neural networks have been the basis for the successful mapping in the hybrid architecture. We have used genetic algorithms to train the hybrid system of recurrent neural networks and hidden Markov models.

References

[2] C.L. Giles, S. Lawrence and A.C. Tsoi, "Rule inference for financial prediction using recurrent neural networks", Proc. of the IEEE/IAFE Computational Intelligence for Financial Engineering, New York City, USA, 1997, pp. 253-259.

[3] K. Murakami and H. Taguchi, "Gesture recognition using recurrent neural networks", Proc. of the SIGCHI Conference on Human Factors in Computing Systems: Reaching Through Technology, Louisiana, USA, 1991, pp. 237-242.

[4] M.J.F. Gales, "Maximum likelihood linear transformations for HMM-based speech recognition", Computer Speech and Language, vol. 12, 1998, pp. 75-98.

[5] T. Kobayashi and S. Haruyama, "Partly-Hidden Markov Model and its Application to Gesture Recognition", Proc. of the IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 4, 1997, p. 3081.

[6] C. Lee Giles, C.W. Omlin and K. Thornber, "Equivalence in Knowledge Representation: Automata, Recurrent Neural Networks, and Dynamical Systems", Proc. of the IEEE, vol. 87, no. 9, 1999, pp. 1623-1640.

[7] R. Chandra and C.W. Omlin, "Evolutionary training of hybrid systems of recurrent neural networks and hidden Markov models", Transactions on Engineering, Computing and Technology, vol. 15, October 2006, pp. 58-63.

[8] C. Kim Wing Ku, M. Wai Mak and W. Chi Siu, "Adding learning to cellular genetic algorithms for training recurrent neural networks", IEEE Transactions on Neural Networks, vol. 10, no. 2, 1999, pp. 239-252.

[9] H. Jacobsson, "Rule extraction from recurrent neural networks: A taxonomy and review", Neural Computation, vol. 17, no. 6, 2005, pp. 1223-1263.

[10] S. Das and R. Das, "Induction of discrete state-machine by stabilizing a continuous recurrent neural network using clustering", Journal of Computer Science and Informatics, vol. 2, no. 2, 1991, pp. 35-40.

[14] T. Lin, B.G. Horne, P. Tino and C.L. Giles, "Learning long-term dependencies in NARX recurrent neural networks", IEEE Transactions on Neural Networks, vol. 7, no. 6, 1996, pp. 1329-1338.