Abstract
The effects of various training strategies are investigated on a
Simple Recurrent Neural Network (SRNN) that learned to
emulate an 8-Digit up/down/resettable counter.
Introduction
In (Pérez-Uribe & Sanchez, 1998), the interpretation of results produced an initial learned strategy (for Blackjack) of "stand if the score is greater than 11, hit otherwise."
However, it must be noted that the system did not in fact keep score, as this was done by the external environment surrounding the SARSA reinforcement learning algorithm. Moreover, being essentially a table-lookup process, each score was associated with a separate rule (action-value), rather than a generic "greater than 11" rule.
A learned strategy is often an external guess at the system's behavior. A training strategy comprises the variety of mechanisms needed to cause the system to exhibit specific behaviors. This paper will discuss a variety of training strategies and will attempt to analyze what has been learned, but it must be re-emphasized that what has been learned can only be guessed at.
Can a network learn to count and keep score? Only then can it be presumed to have such generic counting or scoring rules. An initial attempt at doing so was based on the following SRNN. The inputs represented various card numbers (1-5 for simplicity). The single output to be learned was normally -1, but became +1 when the count achieved a fixed value (11 for simplicity). Ideally, the system would have learned to count by different magnitudes up to 11. In practice the system did poorly. When the sequence of random numbers producing the fixed-value count approached the average sequence length, the system performed better, but in no way could it achieve a level of counting that would be deemed acceptable (even with fewer count magnitudes and a lower count threshold).
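To make this setup concrete, the following is a minimal Python sketch of how such training sequences could be generated. It is a reconstruction, not the original implementation: the function name is ours, and the choice to fire the +1 target on the step where the running count reaches or passes 11 is an assumption, since the exact stopping rule is not specified.

import random

def make_counting_sequence(threshold=11, max_card=5):
    # Hypothetical reconstruction: random "card" inputs in 1..max_card,
    # with a target of -1 until the running count reaches the
    # threshold, and +1 on the step that reaches it.
    inputs, targets, count = [], [], 0
    while count < threshold:
        card = random.randint(1, max_card)
        count += card
        inputs.append(card)
        targets.append(1.0 if count >= threshold else -1.0)
    return inputs, targets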
The problem, of course, is that SRNNs are better suited for temporal patterns; here the goal is too far in the future, and the task is better suited to a reinforcement learning algorithm such as SARSA. Unfortunately, as we indicated for the Blackjack game, such algorithms cannot be thought of as having something akin to a counter under their skin.
However, emulating an 8-Digit counter does enable an SRNN to learn to count, as it provides immediate feedback. This will highlight the many temporal issues in SRNNs and the training strategies needed to address them.
8-Digit Counter
An 8-Digit counter can be considered as having 8 output values representing the numbers 0 through 7.
Its only inputs are Up, Down, and Reset signals. The counter must maintain an internal state representing the current count value in order to produce the next count value. An SRNN does maintain a copy of the hidden nodes and can be thought of as maintaining such a state, albeit in its own internal representation.
The Up input signal will cause the counter to count up by
one, wrapping back to 0 after 7.
The Down input signal will cause the counter to count
down by one, wrapping back to 7 after 0. It is the presence
of both an Up and Down signal that highlights temporal
issues with the various training strategies.
The Reset input signal sets the counter to zero. Resetting the counter is essential for training purposes, as otherwise there can be no correlation between a supposedly internal count state and the predicted next count value. It is this internal current-count state that we want the SRNN to emulate.
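The target behavior is easy to state as code. The sketch below (Python; the class and method names are ours, not the paper's) captures the Up/Down/Reset semantics just described and serves as the ground truth the SRNN is trained against.

class Counter8:
    # Ground-truth 8-value counter (states 0..7) that the SRNN
    # is trained to emulate.
    def __init__(self):
        self.state = 0

    def step(self, signal):
        if signal == "r":        # Reset: back to zero
            self.state = 0
        elif signal == "u":      # Up: +1, wrapping 7 -> 0
            self.state = (self.state + 1) % 8
        elif signal == "d":      # Down: -1, wrapping 0 -> 7
            self.state = (self.state - 1) % 8
        else:
            raise ValueError("signal must be 'r', 'u', or 'd'")
        return self.state

For example, the signal sequence r, u, u, d yields the count values 0, 1, 2, 1.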
Training
The SRNN was incrementally trained in epochs consisting
of a sequence of 1000 runs using a variety of training
strategies. The effectiveness of these strategies is the main
subject of this paper. Each individual run started with a
Reset, and was then followed by a set of mini Up/Down
sequences that followed some strategic pattern.
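As an illustration, a run generator might look like the following sketch. It is an assumption-laden reconstruction: the number of mini sequences per run and the example up-only pattern (anticipating Strategy#1 below) are ours; the paper only states that each run begins with a Reset.

import random

def make_run(mini_sequence_fn, n_minis=(1, 5)):
    # One run: a Reset followed by several strategy-dependent
    # mini Up/Down sequences.  The range of minis per run is assumed.
    signals = ["r"]
    for _ in range(random.randint(*n_minis)):
        signals.extend(mini_sequence_fn())
    return signals

def up_only():
    # Strategy#1-style mini sequence: count up 1 to 17 times.
    return ["u"] * random.randint(1, 17)

# An epoch is a sequence of 1000 such runs.
epoch = [make_run(up_only) for _ in range(1000)]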
Performance Measures
Strategy#0 was a random choice of strategies 2 through 5 and was used mainly for performance measurement. After each training epoch, a sequence of 100 runs using the trained strategy, and another 100 runs using strategy#0, was tested against the SRNN and simple error rates were calculated.
It should be pointed out that, due to the stateful nature of the SRNN, a single prediction mistake in the middle of a run could cause all subsequent predictions to err until the next Reset. A confusion matrix would have been too confusing when state is involved; instead, the simple error rate was used.
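Concretely, the error rate can be computed against the ground-truth counter as in the sketch below; predict stands in for the trained SRNN's stateful next-count prediction and is a hypothetical interface, not the paper's code.

def error_rate(runs, predict):
    # Fraction of per-step predictions that disagree with the true
    # counter.  Because the SRNN is stateful, one mid-run mistake
    # tends to cascade until the next Reset, inflating this rate.
    errors = total = 0
    for run in runs:
        truth = Counter8()      # reference counter sketched above
        for signal in run:
            if predict(signal) != truth.step(signal):
                errors += 1
            total += 1
    return errors / total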
Output
The prediction outputs, shown for strategies #2 and #3, had four columns.
Column I was the input signal: r for Reset, u for Up, and d for Down.
Column O was the correctly counted output value (after processing the input). It had a value from 0 through 7.
Column PO was the predicted output value. If it did not match the O column, an error was counted.
Column Error was the internal error measure, which for single-output SRNNs can be inversely likened to a confidence measure.
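A trace in this format could be printed as in the following sketch; the column widths and error formatting are assumptions.

def print_trace(signals, predicted, errors):
    # Four-column trace: I (input), O (true count), PO (predicted
    # count), Error (internal error measure).
    truth = Counter8()
    print("I  O  PO  Error")
    for sig, po, err in zip(signals, predicted, errors):
        o = truth.step(sig)
        print(f"{sig}  {o}  {po}   {err:.4f}")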
SRNN Details
The first strategy was used simply to understand the limits
of some of the SRNN variables: learning rate, momentum,
number of hidden nodes, and representations.
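For reference, the following is a minimal Elman-style SRNN sketch in the spirit of the network used here. The weight initialization, tanh activations, and one-step backprop with momentum are illustrative assumptions; the paper varies the learning rate, momentum, and hidden-layer size but does not list its implementation.

import numpy as np

class ElmanSRNN:
    # Simple recurrent network: a copy of the previous hidden layer
    # (the "context") is fed back as extra input on each step.
    def __init__(self, n_in, n_hidden, n_out, lr=0.1, momentum=0.9):
        rng = np.random.default_rng(0)
        self.W_xh = rng.normal(0.0, 0.1, (n_hidden, n_in))
        self.W_hh = rng.normal(0.0, 0.1, (n_hidden, n_hidden))
        self.W_hy = rng.normal(0.0, 0.1, (n_out, n_hidden))
        self.lr, self.momentum = lr, momentum
        self.vel = [np.zeros_like(w)
                    for w in (self.W_xh, self.W_hh, self.W_hy)]
        self.h = np.zeros(n_hidden)    # context state

    def forward(self, x):
        self.h_prev = self.h.copy()
        self.h = np.tanh(self.W_xh @ x + self.W_hh @ self.h_prev)
        return np.tanh(self.W_hy @ self.h)

    def backward(self, x, y, target):
        # Elman-style training: the context is treated as a fixed
        # input, so no gradient flows back through time.
        dy = (y - target) * (1.0 - y ** 2)
        dh = (self.W_hy.T @ dy) * (1.0 - self.h ** 2)
        grads = (np.outer(dh, x), np.outer(dh, self.h_prev),
                 np.outer(dy, self.h))
        for w, v, g in zip((self.W_xh, self.W_hh, self.W_hy),
                           self.vel, grads):
            v *= self.momentum
            v -= self.lr * g
            w += v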
Strategy#1 Counting Up
In this strategy, a run would only count up a random number of times between 1 and 17. This was used to test the limits of the hidden layer.
Reset Confidence The confidence value for Reset consistently improved the quickest, for this and all other strategies (see the outputs for strategies #2 and #3). This is important, as the network's understanding of a Reset is essential to learning the state-dependent patterns (counting). The output due to the Reset signal is independent of the current state; a simple Backprop network could learn it, which is why it is learned so quickly.
It is by design that a Reset signal was included. Whether
an SRNN can learn to count without one has not been
investigated.
Hidden Node Analysis The effect of the number of hidden/recurrent nodes is now analyzed. The error rates here are only measured against the same strategy, as all other strategies would count down as well and cause horrendous failures!
With 2 hidden nodes the SRNN could only count from 0 to 3. This could still bring the inconsistent error rates down to 51% by the 7th epoch at best.
1 For a real paper, I would perform at least 10, if not more, trials.
Conclusion
SRNNs are good at finding temporal patterns, but when there are several (Up or Down) patterns that interfere via overlapping representations (4 down to 3, and 4 up to 5), learning becomes considerably more difficult.
Acknowledgments
Thanks to Terry Stewart for allowing me to pursue this research.
References
Pérez-Uribe, A., & Sanchez, E. (1998). Blackjack as a test bed for learning strategies in neural networks. In Proceedings of the IEEE International Joint Conference on Neural Networks (IJCNN) (pp. 2022-2027).