
Training Strategies in an SRNN

(A CGSC 5001 Final Paper)


David Pierre Leibovitz (dpleibovitz@ieee.org)
267 Iroquois Road, Ottawa, Ontario
K2A 3M5, CANADA
613-728-0099
(Student# 100251310)

Abstract
The effects of various training strategies are investigated on a
Simple Recurrent Neural Network (SRNN) that learned to
emulate an 8-Digit up/down/resettable counter.

Introduction
In (Pérez-Uribe & Sanchez, 1998), the interpretation of
results produced an initial learned strategy (for Blackjack)
of "stand if the score is greater than 11, hit otherwise".
However, it must be noted that the system did not in fact
keep score, as this was done by the external environment
surrounding the SARSA reinforcement learning algorithm.
Moreover, because the process is essentially a table lookup,
each score was associated with a separate rule (action-value),
rather than a generic "greater than 11" rule.
A learned strategy is often an external guess at the
system's behavior. A training strategy comprises the variety
of mechanisms needed to cause the system to exhibit
specific behaviors. This paper will discuss a variety of
training strategies and will attempt to analyze what has been
learned, but it must be re-emphasized that what has been
learned can only be guessed at.
Can a network learn to count and keep score? Only then
can it be presumed to have such generic counting or scoring
rules. An initial attempt at doing so was based on the
following SRNN. The inputs represented various card
numbers (1-5 for simplicity). The single output to be learned
was normally -1, but became +1 when the count reached a
fixed value (11 for simplicity). Ideally, the system would
have learned to count by different magnitudes up to 11. In
practice the system did poorly. When the sequence of
random numbers producing the fixed-value count
approached the average sequence length, the system
performed better. But in no way could it achieve a level of
counting that would be deemed acceptable (even with fewer
count magnitudes and a lower count threshold).
The problem, of course, is that SRNNs are better suited
to temporal patterns; here the goal is too far in the future,
and is better suited to a reinforcement learning algorithm
such as SARSA. Unfortunately, as indicated for the
Blackjack game, such algorithms cannot be thought of as
having something akin to a counter under their skin.
However, emulating an 8-digit counter does enable an
SRNN to learn to count, as it provides immediate feedback.
Highlighting the many temporal issues in SRNNs, and how
to combat them via varied training strategies, is the thrust
of this paper.

8-Digit Counter
An 8-digit counter can be considered as having 8 output
values representing the numbers 0 through 7.
Its only inputs are an Up, Down and Reset signal. The
counter must maintain an internal state representing the
current count value in order to produce the next count value.
An SRNN does maintain a copy of the hidden nodes and
can be thought of as maintaining such a state, albeit in its
own internal representation.
The Up input signal will cause the counter to count up by
one, wrapping back to 0 after 7.
The Down input signal will cause the counter to count
down by one, wrapping back to 7 after 0. It is the presence
of both an Up and Down signal that highlights temporal
issues with the various training strategies.
The Reset input signal sets the counter to zero. Resetting
the counter is essential for training purposes, as otherwise
there can be no correlation between a supposed internal
count state and the predicted next count value. It is this
internal current-count state that we want the SRNN to
emulate.
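
To make the target behavior concrete, here is a minimal reference sketch in Python (written for this summary; the original work did not present code):

class Counter8:
    """Reference 8-digit up/down/resettable counter: the behavior to emulate."""
    def __init__(self):
        self.count = 0
    def step(self, signal):
        if signal == 'r':                      # Reset: back to zero
            self.count = 0
        elif signal == 'u':                    # Up: +1, wrapping 7 -> 0
            self.count = (self.count + 1) % 8
        elif signal == 'd':                    # Down: -1, wrapping 0 -> 7
            self.count = (self.count - 1) % 8
        return self.count

Each training target at a time step is simply step(signal) applied to the run's signal sequence.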

Training
The SRNN was incrementally trained in epochs consisting
of a sequence of 1000 runs using a variety of training
strategies. The effectiveness of these strategies is the main
subject of this paper. Each individual run started with a
Reset, and was then followed by a set of mini Up/Down
sequences following some strategic pattern.

Performance Measures
Strategy#0 was a random choice of the strategies 2 through 5
and was used mainly for performance measurement. After
each training epoch, a sequence of 100 runs using the
trained strategy, and another 100 runs using strategy#0, was
tested against the SRNN and simple error rates were calculated.
It should be pointed out that due to the stateful nature of the
SRNN, a single prediction mistake in the middle of a run
could cause all subsequent predictions to err until the next
Reset. A confusion matrix would have been too confusing
when state is involved. Instead, a simple error rate was
calculated as the number of bad predictions out of the
total number of predictions: crude but sufficient.
For example, an error rate of 10%/15% implies that for
strategy#x, it had a 10% error rate when tested against
strategy#x, and a 15% error rate when tested against a
sequence produced by strategy#0. If only one error rate is
shown, it applies to both strategies.
As well, the first 30 predictions, along with their internal
confidence measures, could be printed for analysis.
Where consistency was measured, up to 5 distinct trials
were run.¹
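
As a sketch of this measurement, assuming hypothetical generate_run and SRNN prediction helpers (the names here are illustrative, not the original code):

def error_rate(srnn, strategy, n_runs=100):
    """Fraction of bad predictions over all predictions across n_runs runs."""
    bad = total = 0
    for _ in range(n_runs):
        srnn.clear_state()                             # hypothetical: forget recurrent context
        for signal, target in generate_run(strategy):  # hypothetical run generator
            if srnn.predict(signal) != target:         # hypothetical forward pass
                bad += 1
            total += 1
    return bad / total                                 # e.g. 0.10 for a "10%" error rate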

Output
The prediction outputs, shown in strategies #2 and #3, had
four columns.
Column I was the input signal: r for Reset, u for
Up, and d for Down.
Column O was the correctly counted output value (after
processing the input). It had a value from 0 through 7.
Column PO was the predicted output value. If it did not
match the O column, an error was counted.
Column Error was the internal error measure, which, for
single-output SRNNs, can be inversely likened to a
confidence measure.

SRNN Details
The first strategy was used simply to understand the limits
of some of the SRNN variables: learning rate, momentum,
number of hidden nodes, and representations.
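
For reference, the forward pass of a generic SRNN (an Elman network) can be sketched as below. This is a minimal illustration, assuming tanh activations and the usual copy of hidden activations into context units; it is not necessarily the exact network used here.

import numpy as np

class ElmanSRNN:
    """Minimal Elman-style SRNN forward pass (illustrative sketch only)."""
    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = np.random.default_rng(seed)
        self.W_ih = rng.normal(0.0, 0.1, (n_hidden, n_in))      # input -> hidden
        self.W_ch = rng.normal(0.0, 0.1, (n_hidden, n_hidden))  # context -> hidden
        self.W_ho = rng.normal(0.0, 0.1, (n_out, n_hidden))     # hidden -> output
        self.context = np.zeros(n_hidden)  # copy of the previous hidden activations

    def forward(self, x):
        h = np.tanh(self.W_ih @ x + self.W_ch @ self.context)
        self.context = h.copy()            # this copy is the "state" discussed above
        return np.tanh(self.W_ho @ h)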

Strategy#1 Counting Up
In this strategy, a run would only count up, a random number
of times between 1 and 17. This was used to test the limits
of the hidden network layer.
Reset Confidence The confidence value for Reset
consistently improved the quickest, for this and all other
strategies (see the outputs for strategies #2 and #3). This is
important, as the network's understanding of a Reset is
essential to learning the state-dependent patterns (counting).
The output due to the Reset signal is independent of the
current state; a simple Backprop network could learn it,
which is why it is learned so quickly.
It is by design that a Reset signal was included. Whether
an SRNN can learn to count without one has not been
investigated.
Hidden Node Analysis The effect of the number of
hidden/recurrent nodes is now analyzed. The error rates here
are measured only against the same strategy, as all other
strategies would count down as well and cause horrendous
failures!
With 2 hidden nodes the SRNN could only count from 0
to 3. This could still bring the inconsistent error rates down
to 51% @ 7th epoch at best.
¹ For a real paper, I would perform at least 10, if not more, trials.

With 3 hidden nodes the SRNN could only count from 0
to 6 at best. Inconsistent error rates were at best 36% @ 11th
epoch.
With 4 hidden nodes the SRNN could count up to 7 but
would often not wrap back to 0. Learning was not consistent,
but the best results were 0% @ 2nd epoch and 0% @ 11th
epoch.
5 hidden nodes produced the most consistent results.
Error rates were 0% @ 3rd, 5th, 8th, 13th and 15th epoch,
although the solution once settled into a local minimum.
A perfect resettable up counter is possible via SRNNs!
Perhaps generic learning rules can be made possible as well.
6 hidden nodes gave poor results. Inconsistent error rates
were 0% @ 2nd and 3rd epochs, 13% @ 17th epoch, 22% @
13th epoch, and 24% @ 10th epoch. Clearly, this variance is
due to too many hidden nodes.
Output Representation Analysis It is well known that the
input representation can make a huge difference. Here we
investigate changes in the output representation.
The above hidden node analysis was made with 8 separate
output nodes representing the 8 different digit values of 0
through 7 (normalized between -1 and +1).
If the outputs were changed to a binary encoding of
the 8 digit values, only 3 bits would be required (normalized
between -1 and +1). When this was done, 9 hidden nodes
were required for fairly consistent results. Error rates were
0% @ 1st, 2nd, 5th and 7th epoch. It was stuck 1 out of 5 times
at a local minimum of 4%. The implication may be that the
back-propagation algorithm needs to untangle the output
errors into a larger set of hidden nodes.
The standard binary notation can cause 1, 2 or 3 bits to
change during the counting procedure. There is an alternate
3-bit binary notation (the reflected-binary, or Gray, code)
that, for counting, involves only a one-bit change, even with
wrapping. It is 000, 001, 011, 010, 110, 111, 101, 100. With
this output representation (normalized between -1 and +1)
only 7 hidden nodes were needed. The fairly consistent error
rates were 0% @ 1st, 2nd and 11th epoch. It was stuck 2 out
of 5 times at a local minimum. The implication here is
twofold. As before, more hidden nodes are required for
untangling the output errors. But because SRNNs have a
temporal learning element, less untangling is required, as
there is a more direct relationship between a change in
output errors and a change in inputs. This is a novel
representation result.
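
For concreteness, the three output representations can be sketched as follows (illustrative Python; the ±1 normalization follows the descriptions above):

import numpy as np

def to_pm1(bits):                # map {0,1} bits onto {-1,+1} node targets
    return np.array([2.0 * b - 1.0 for b in bits])

def one_hot8(d):                 # 8 separate output nodes (used hereafter)
    v = -np.ones(8)
    v[d] = 1.0
    return v

def binary3(d):                  # standard 3-bit binary encoding of 0..7
    return to_pm1([(d >> i) & 1 for i in (2, 1, 0)])

GRAY = [0b000, 0b001, 0b011, 0b010, 0b110, 0b111, 0b101, 0b100]

def gray3(d):                    # 3-bit Gray code: one bit flips per count, even on wrap
    return to_pm1([(GRAY[d] >> i) & 1 for i in (2, 1, 0)])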
For all further analysis, the 8 output node representation
will be used. For one thing, this requires the fewest total
nodes (hidden nodes are duplicated in the recurrent part of
the network). For another, it is fairly standard practice to
keep input and output values as separable as possible.
Learning Rates The learning rate throughout this paper has
a value of 0.01, except for strategy#1 where it is 0.1. In this
section only, the effect of other learning rates is investigated
for the standard 5 hidden node solution used in strategy#1.
With a learning rate of 0.01, the error rates for this strategy
were inconsistent, being 0% @ 15th epoch and getting stuck 4
out of 5 times in a local minimum. A learning rate of 0.01
worked better with 7 hidden nodes, giving error rates of 0%
@ 5th, 7th and 7th epoch and getting stuck 2 out of 5 times at
a local minimum. Because strategy#1 has such a simple
repeating pattern, the more aggressive learning rate (0.1) is
appropriate. However, for all other strategies, the typical
value of 0.01 was found to be ideal.
For example, with a learning rate of 0.2 the SRNN got
stuck 5 out of 5 times at a local minimum (or possibly
thrashed too much).
Although there may be learning rate interactions with
training strategy selection, the focus of this paper is on these
high level strategies rather than the SRNN details, so the
value was retained at 0.01 throughout.
Momentum The momentum throughout this paper has a
value of 0. In this section only, the effect of other
momentum values is investigated for the standard 5 hidden
node solution used in strategy#1.
With a momentum value of 0.1, the SRNN got stuck 5 out
of 5 times.
Momentum only helps in special cases², and this
application is not one of them. It was thus left at 0
everywhere else. As with learning rates, the interaction of
momentum with the training strategies was not further
investigated.

Training Strategy Selection
The main focus of this paper is the effect of training
strategies on SRNN learning behavior.

Strategy#2 Up Only or Down Only
This and all subsequent training strategies count both
upwards and downwards. This ability will highlight
important learning effects. Strategy#2 is similar to
Strategy#1. A run would count in a single random direction,
a random number of times between 1 and 17.
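
A minimal sketch of this run generator (using Python's standard random module; the function name is illustrative):

import random

def strategy2_run():
    """One run: a Reset, then 1-17 steps in a single, randomly chosen direction."""
    direction = random.choice(['u', 'd'])
    return ['r'] + [direction] * random.randint(1, 17)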
While 5 hidden nodes were needed for strategy#1,
strategy#2 required 10 hidden nodes, and produced an
inconsistent error rate that would at times go as low as
9%/51% @ 34th epoch.
Hidden Node Interpretation If one believes that the
hidden/recurrent nodes emulate the memory of the current
count value, in some internal representation, simply adding
another count direction should not require a doubling of
such nodes. Adding one bit of information (up or down)
would, in digital terms, require one more bit internally.
However, an alternative explanation is that the internal
nodes hold the logic (or pattern) for addition, and the logic
must be doubled in order to handle subtraction (or another
pattern). In fact, the internal nodes must do both: maintain
a state of the current value and figure out the next state. So
does the SRNN internally hold one counter or two? It's
anybody's guess.

² List cases or insert a reference.

Output Interpretation A look at the output is more
revealing. Here is an annotated sample:
---- Epoch: 34
I O PO Error - against specific strategy
r 0 0 0.01
u 1 1 0.05
u 2 2 0.03
u 3 3 0.01
u 4 4 0.02
r 0 0 0.01
d 7 7 0.03
d 6 6 0.02
d 5 5 0.03
d 4 4 0.04
d 3 5 2.00 1)
d 2 2 0.13
d 1 1 0.16
d 0 0 0.05
d 7 7 0.10
d 6 6 0.06
d 5 5 0.01
d 4 4 0.03
d 3 5 2.00
d 2 2 0.08
d 1 1 0.10
d 0 0 0.03
Error rate= 9
I O PO Error - Against random strategy
r 0 0 0.01
u 1 1 0.05
u 2 2 0.03
d 1 5 2.83 2)
d 0 4 2.82
d 7 5 2.00 3)
d 6 2 2.77
d 5 1 2.76
u 6 2 2.83
u 7 3 2.80
d 6 5 3.23
d 5 4 2.80
d 4 0 2.00
u 5 1 2.63
r 0 0 0.01
u 1 1 0.05
u 2 2 0.03
d 1 5 2.83
d 0 4 2.82
d 7 5 2.00
u 0 1 1.97
u 1 6 2.48
u 2 3 2.80
u 3 4 2.83
r 0 0 0.01
d 7 7 0.03
d 6 6 0.02
d 5 5 0.03
d 4 4 0.04
d 3 5 2.00
Error rate= 51

In error 1), it is seen that the SRNN has trouble counting
down completely. Normally it learns to count in one
(dominant) direction perfectly, and the other direction causes
problems. This can be interpreted as follows. After a Reset,
the Up or Down signal is used to start the pattern. However,
once started, knowing that the previous value was a 1 is the
best prediction that the next value is a 2, regardless of Up or
Down signals. Similarly, knowing the previous value was a
7 is the best prediction that the next value will be a 6. In
some sense, the internal nodes have potentially infinite
memory, seeing as they form an endless recurring feedback
cycle, so knowing the 2nd previous value was a 1 also
predicts that the next one is a 3; knowing that the 3rd
previous value was a 1 predicts that the next one is a 4, and
so on, and likewise for the opposing direction. So for error
1), the up pattern was dominant; the down pattern worked
well halfway, until it collided with the up pattern. The down
signal was simply ignored and the SRNN counted back up.
The SRNN is best suited to learning temporal patterns, and
it is the appropriate use of training strategies that can
overcome this deficiency.
Error 2) comes about because the network was trained only
on Up or Down patterns separated by Resets, but tested on
a pattern that had both, not so separated. This further shows
that once the count has started, the Up or Down signal is
mostly ignored. In 3), the predicted output is appropriate if
one ignores the Up or Down signal and simply counts up.

Strategy#3 Random Mini Up or Down Sequences
This training strategy had more Ups and Downs, in an
attempt to force the SRNN to understand them as a more
significant signal and to lower the error rates. A run consists
of a Reset followed by a random number (1-7) of
mini-sequences. Each of these is a random number (1-5) of
repetitions of either an Up or a Down (randomly chosen).
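
Continuing the sketch above, strategy#3's generator might look like the following (an illustration of the description, not the original code):

def strategy3_run():
    """One run: a Reset, then 1-7 mini-sequences of 1-5 repeats of one direction."""
    signals = ['r']
    for _ in range(random.randint(1, 7)):
        signals += [random.choice(['u', 'd'])] * random.randint(1, 5)
    return signals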
This performed better than strategy#2, but even with 12
hidden nodes it produced inconsistent error rates of 3%/5%
@ 264th epoch, 4%/5% @ 305th epoch, and 52%/59% @
144th epoch.
Analysis of the output showed that most errors occurred at
the extreme end of the count.
Slow Learning Interpretation It could take 300 epochs to
learn, vs. 45 for strategy#2. This is partly because more
hidden nodes are required, and the better error rates take
longer to achieve. It is also suspected that there is a battle to
overcome going either only up or only down. The 2 extra
hidden nodes could be for a controller deciding which of the
internal counters is chosen.

Strategy#4 Random Up then Down at a Random Point
Strategy#4 attempted to ensure that each count position was
as equal as any other in terms of having an Up or Down
change. In this strategy, a random count (0-7) was chosen.
After a Reset, a sequence of either Ups or Downs was made
to bring the count to the chosen value. Then a sequence of
length 9 was made in the opposite direction, and then
another sequence of length 9 was made in the direction
opposite to the previous. In some sense this strategy was a
balanced version of strategies #2 and #3.
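
A sketch of this generator, assuming the direction used to reach the chosen value is itself chosen at random (the description above leaves this open):

def strategy4_run():
    """One run: Reset, count to a random value, then two length-9 reversals."""
    target = random.randint(0, 7)
    first = random.choice(['u', 'd'])
    steps = target if first == 'u' else (8 - target) % 8  # steps needed to reach target
    second = 'd' if first == 'u' else 'u'
    return ['r'] + [first] * steps + [second] * 9 + [first] * 9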
With 12 hidden nodes, more consistent and lower error
rates were achieved: 0%/5% @ 187th epoch, 0%/10% @
113th epoch, 4%/13% @ 627th epoch, 23%/27% @ 224th
epoch, and 1%/10% @ 143rd epoch.

Strategy#5 Random Ups or Downs
In this training strategy, the intent is to cause the network to
learn the significance of the Up or Down signal faster. A
run consisted of a Reset followed by random Up or Down
signals, for a random length of 9 to 25.
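
Its generator is the simplest of all, with each signal chosen independently (again an illustrative sketch):

def strategy5_run():
    """One run: a Reset, then 9-25 independently random Up/Down signals."""
    return ['r'] + [random.choice(['u', 'd']) for _ in range(random.randint(9, 25))]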
With 10 hidden nodes this strategy had fairly consistent
low error rates, such as 0%/0% @ 77th, 111th and 214th
epochs, 4%/10% @ 52nd epoch and 7%/18% @ 522nd epoch,
but with a large variance in epochs and, in fact, high epoch
counts in general. A perfect 8-digit Up/Down/Resettable
counter is possible!
Output Interpretation A simple analysis of the output
explains the slow performance.
---- Epoch: 77
I O PO Error - Against random strategy
r 0 0 0.01
u 1 1 0.04
u 2 2 0.05 2)
u 3 3 0.03
u 4 4 0.08
u 5 5 0.09
u 6 6 0.19
u 7 7 1.88 1)
u 0 0 0.73
u 1 1 0.02
u 2 2 0.08 2)
u 3 3 0.05
u 4 4 0.09
u 5 5 0.09
u 6 6 0.19
u 7 7 1.87
Error rate= 0

A purely random set of Ups or Downs rarely leads to a long
sequence of Ups or of Downs. Therefore confidence goes
down (error up) with increasing length of the sequence, as in
1). The output still indicates temporal effects. At 2), the
confidence is worse because, even though the step follows
an Up to 1, it is preceded by an Up to 7 and an Up to 0
rather than by a Reset to 0. For this network, the
Up/Down/Reset signals are significant, so this temporal
change will impact confidence.
12 Hidden Node Interpretation for Strategies #3 and #4
Strategies #3 and #4 required 12 hidden nodes, while
strategy#5 needed only 10. One possibility is that both
earlier strategies had several patterns within patterns: each
random generator still has its own average and hence
produces a statistical effect. The 2 extra hidden nodes could
be the SRNN attempting to predict the output due to these
patterns-within-patterns, analogous to a Fourier transform
finding the constituent frequencies. In strategy#5, there was
really only one source of randomness, so fewer hidden
nodes were required.

Conclusion
SRNNs are good at finding temporal patterns, but when
there are several (Up or Down) patterns that interfere with
overlapping representations (4 down to 3, and 4 up to 5),
then training strategies can have significant effects,
including on how many hidden nodes are required. Finding
a strategy that puts emphasis on the input signals that can
distinguish the two patterns is crucial, but this can still take
many epochs.
The recurrent and hidden nodes of an SRNN can also be
thought of as holding state, such as the current count state
needed for a counter. Training an SRNN to model an
Up/Down/Resettable counter is possible with the right
strategy. At this point it might be possible to consider that
other parts of the network are learning generic scoring rules,
for example. However, it is not known whether, in playing a
game such as Backgammon, the right strategy would have
presented itself and such rules been learnt. After all, a human
should be able to learn the counting capability within 100
patterns or fewer, while the SRNN takes 77 thousand at best
(77 epochs of 1000 runs each): a whole different type of
learning is going on!

Acknowledgments
Thanks to Terry Stewart for allowing me this research.

References
Pérez-Uribe, A. & Sanchez, E. (1998). Blackjack as a test
bed for learning strategies in neural networks. Proceedings
of the IEEE International Joint Conference on Neural
Networks (IJCNN'98) (pp. 2022-2027).
