
Simulated Robot Football

David Goodyear, Jonathan Tanton Brown, Pete Wood


Department of Computer Science, University of Kent, Canterbury
dag6@kent.ac.uk, jdt4@kent.ac.uk, paw4@kent.ac.uk

Abstract

In recent years there has been growing interest in robot football, with a number of annual tournaments being held and an international standard set of rules being devised. This set of rules has largely been created by RoboCup, an international joint project to promote AI, robotics, and related fields. The overall goal of RoboCup is:
"By 2050, develop a team of fully autonomous humanoid robots that can win against the human world champion team in soccer."
This paper discusses our design and implementation of a set of programs for the simulation version of robotic football (i.e. where the robots are simulated on computers rather than in the real world). It begins with a short introduction to robot football and machine learning, details our main aims and objectives, the decisions made during the development cycle and how these were implemented, and finally draws conclusions about how well those aims and objectives were achieved.

1. Introduction

This project was chosen because of the strengths in the team and our keen interest in football. As well as a good knowledge of the Java programming language (the language used throughout this project), we all enjoy and follow football, making this an attractive and potentially challenging assignment. There were a number of different routes that could be taken as far as simulating football players in programs was concerned. We could have developed systems for tactical and strategic coordination between players, or focused more on visual processing techniques. However, we decided to focus our efforts on learning algorithms such as neural networks or genetic algorithms.

We found the concept of programs that "learn" highly appealing, as it was something that none of us had done before. The idea that we could write a computer program that could learn how to respond to different situations was fascinating. At a very early stage in the project it was decided unanimously that we would design and develop a team of simulated robot football players using neural networks that would recognise certain patterns of play and respond accordingly.

2. Background

2.1. What is RoboCup?

The idea of RoboCup is to provide a standard problem for researchers exploring artificial intelligence and robotics, one where a wide range of technologies can be examined and integrated. The concept of a robotic world cup was first introduced in 1993, and after a two-year feasibility study the first international conferences and football games were announced. In July 1997 the first official conference and games were held in Nagoya, Japan; they have been held every year since at different venues around the world, covered by the international media. There are five main leagues in RoboCup: a simulation league, in which teams compete on a simulated playing field on computers, a small-sized robot league, a middle-sized robot league, a four-legged robot league and a humanoid league. Due to time and resource constraints the simulation league was the only option available to us.

2.2. The Soccer Server

Games of football are played on computers using the RoboCup simulator, called the soccer server. The RoboCup soccer monitor (see figure 1) allows visualisations of the game to be displayed on the user's screen using the X Window System. The server is written to allow multiple virtual soccer players to connect in an unspecified computing environment with real-time demands as well as semi-structured conditions. This allows researchers to focus their efforts on cooperation and learning techniques rather than the details of connecting.

Figure 1. The soccer monitor screen.

Matches are carried out in a server/client fashion. Each client controls the actions of one player, while the server provides a virtual field, simulates the movements of all the players and the ball, sends messages to the clients and makes sure the game is played within the rules. Each client must connect to the server through a port, and communication is carried out via UDP/IP. The server sends messages to each client/player telling it what its player can see, hear and sense. The client sends messages back in response to this data, telling the player what to do; the client program thus acts just like the player's brain.

3. Aims

Our primary aim for the project was to produce a team of players capable of beating all other teams being developed by students at the University of Kent. Our players would implement a neural network in order to learn about the opposition, so that the more games they played, the more our team would improve. It would also allow the individual players to learn during a game, thus improving the chances of winning.

However, as will be explained further in this report, it was not possible for the neural network to be fully implemented, and we therefore had to choose another approach. Our decision was to create a rule-based system that would use genetic algorithms to learn, with our overall aim remaining the same, i.e. to create a team of players capable of beating all other teams being produced by students at the University of Kent.

4. Preliminary Investigations

4.1. Understanding the Server

Before detailing the steps we took to implement the neural network, it is important to explain the soccer server and the messages it sends, including their timing and content. It was vital that this was understood in some detail, as the project would depend on successfully parsing this information and making sure the right values were sent to the neural network. We therefore consulted as much documentation about these messages as possible. This proved to be quite difficult, as the available documentation was somewhat dated and several different protocols were in use. The only way to find out exactly what we were being sent was to connect to the server and examine the messages it sent us. Below is an example message:

(see 43 ((line t) 6.2 -17) ((ball) 16.8 15))

The first piece of information is the type of message, in this case a "see" message. This is followed by the cycle time, "43", and then the different things the player can see. In this example the player can see the top line, "line t", at a distance of 6.2m and a direction of -17°, and the ball at a distance of 16.8m and a direction of 15°. This is a fairly typical message, although messages are generally much longer, as the player can usually see more things.
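
To make the format concrete, the fragment below shows how the ball's distance and direction might be pulled out of such a message in Java. This is a minimal sketch rather than our actual parser, and the class and variable names are purely illustrative:

    // Sketch: extract the ball's distance and direction from a "see" message.
    public class SeeExample {
        public static void main(String[] args) {
            String msg = "(see 43 ((line t) 6.2 -17) ((ball) 16.8 15))";
            int start = msg.indexOf("(ball)") + "(ball)".length();
            int end = msg.indexOf(')', start);               // bracket closing the ball term
            String[] parts = msg.substring(start, end).trim().split(" ");
            double distance = Double.parseDouble(parts[0]);  // 16.8 (metres)
            double direction = Double.parseDouble(parts[1]); // 15 (degrees)
            System.out.println("ball: " + distance + "m at " + direction + " degrees");
        }
    }
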
Each half is split into 3000 cycles, each cycle lasting 100ms, so each half of a game lasts five minutes. A player is allowed to send up to three messages per cycle, but usually sends only one (only one kick, dash, turn or turn_neck can be sent each cycle). The server sends "see" messages and "sense" messages at different intervals, with "hear" messages being sent at any time: "see" messages are sent every 150ms and "sense" messages every 100ms. It was important that we understood these timings correctly, as we required our players to make decisions using the most up-to-date information available to them, and we also had to make all the decisions and send the message within one cycle.

4.2. Protocol Versions

Because RoboCup has been around for several years, we encountered problems with the different protocols available.
The protocol we initially began to use sent the same data, but the "see" messages were formatted slightly differently. In version 9 of the protocol the server (using the previous example) would send:

(see 43 ((line t) 6.2 -17) ((ball) 16.8 15))

However, we changed to version 7, as this was the version the RoboCup manual used. In this version the messages were sent slightly differently:

(see 43 ((l t) 6.2 -17) ((b) 16.8 15))

The words "line" and "ball" had been replaced with the letters "l" and "b". Although this was not a major problem, it became a hindrance, because some of the first code we wrote was based on version 9 of the protocol, and when we changed to version 7 our code fell over.

4.3. Choosing a Learning Algorithm

There were a few options open to us when it came to choosing a learning algorithm, and we narrowed the choice down to two: neural networks and genetic algorithms.

4.3.1. Neural Networks. A neural network consists of a number of interconnected neurons called nodes. It works by entering data into the input nodes and processing it through the network to the output nodes. The number and strength (the weight) of the connections between the nodes determine the final output. The theory is that the neural network works in the same way as an extremely primitive biological brain. A human brain, however, contains billions of neurons and connections, so even if scientists fully understood the dynamics of the human brain, it would still be beyond our computing capacity to simulate it.

A neural network typically has three layers: the input layer, the hidden layer and the output layer:

Figure 2. A simple neural network

In this simple example there are just two input nodes and two output nodes, with four hidden layer nodes. Input values are entered at the input nodes and processed through the network. As the values pass through each connection they are multiplied by the associated weight, a number between 0 and 1. The sums of these weighted values are worked out at each node, and if a threshold value is reached they are passed on to the next layer, until eventually there is a value in the output nodes (see section 3.1.1. in the corpus of material).
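
As an illustration, a single node of this kind can be computed as follows. This sketch uses the simple threshold model described above; real networks more commonly apply a smooth activation function such as the sigmoid instead:

    // Sketch of one node in a threshold network: sum the weighted inputs
    // and pass the value on only if the threshold is reached.
    static double nodeOutput(double[] inputs, double[] weights, double threshold) {
        double sum = 0.0;
        for (int i = 0; i < inputs.length; i++) {
            sum += inputs[i] * weights[i];  // scale each value by its connection weight
        }
        return sum >= threshold ? sum : 0.0;
    }
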
Neural networks can learn through a number of techniques, all of which involve changing the weights associated with the interconnections between the nodes. They are used for pattern recognition amongst other things. They have three main advantages:
1. They have the ability to learn through example
2. They are more fault tolerant than most other learning algorithms
3. They are more suited to real-time operations due to their high computational rates.

4.3.2. Genetic Algorithms. Genetic algorithms are inspired by Darwin's theory of evolution. The algorithm begins with a set of solutions to a problem, called the population. Solutions from one population are taken and used to form new populations, in the hope that each new population will be better at solving the problem than the old one. The solutions chosen to "reproduce" are those that solve the problem best, much like survival of the fittest in biology.
A basic genetic algorithm has the following steps (a sketch in code is given after the list):
1. Generate a random population of n suitable solutions
2. Evaluate the fitness of each solution in the population
3. Create a new population by selecting the two best performing solutions and combining them to produce offspring. Then mutate the offspring slightly and place them in the new population.
4. Use the new population and run the algorithm again.
5. If you are satisfied with the results, stop; else go to step 2.
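
The following self-contained sketch shows the shape of this loop for a population of two-element solution vectors. The fitness function, which simply rewards closeness to a fixed target vector, and all the numeric parameters are illustrative assumptions, not part of our project:

    import java.util.Arrays;
    import java.util.Random;

    public class GaSketch {
        static final Random rnd = new Random();
        static final double[] TARGET = {30.0, 80.0};  // hypothetical ideal solution

        // Fitness: higher is better, so negate the distance to the target.
        static double fitness(double[] s) {
            return -(Math.abs(s[0] - TARGET[0]) + Math.abs(s[1] - TARGET[1]));
        }

        public static void main(String[] args) {
            int n = 20;
            double[][] pop = new double[n][2];
            for (double[] s : pop)                       // step 1: random population
                for (int j = 0; j < 2; j++) s[j] = rnd.nextDouble() * 100;

            for (int gen = 0; gen < 200; gen++) {        // steps 4 and 5: iterate
                double[] best = pop[0], second = pop[1]; // step 2: find the two fittest
                for (double[] s : pop) {
                    if (fitness(s) > fitness(best)) { second = best; best = s; }
                    else if (s != best && fitness(s) > fitness(second)) second = s;
                }
                double[][] next = new double[n][2];      // step 3: breed a new population
                for (int i = 0; i < n; i++)
                    for (int j = 0; j < 2; j++)
                        next[i][j] = (rnd.nextBoolean() ? best : second)[j] // crossover
                                     + rnd.nextGaussian() * 0.5;            // slight mutation
                pop = next;
            }
            System.out.println(Arrays.toString(pop[0])); // ends up close to TARGET
        }
    }
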

Genetic algorithms also have three main advantages:
1. They converge from a random population to a number of solutions
2. They do not rely on any assumptions or prior knowledge of the search space
3. A genetic algorithm search runs on probabilistic rules instead of deterministic ones.

4.3.3. Our Decision. We decided to implement a neural network as the learning algorithm our players would use. Other groups working on the same kind of project at the University of Kent had chosen a genetic algorithms approach, and we thought we would stand out by choosing differently. One of the advantages of a neural network is that it is suited to real-time operations, and the decisions our players make would indeed be in real time. The fact that neural networks are computationally fast was also a factor in our choice, as we knew decisions would have to be made quickly, cycles lasting only 100ms. Additionally, none of us had any prior knowledge of neural networks, and we liked the fact that it would be a challenge.

5. Technical Implementation

5.1. Message Parsing

The first thing we needed to consider was how to handle the messages we were sent from the server. As we were being sent three different types of message (see, hear and sense), we decided that it would make more sense to have three separate classes to deal with them, since we received them at different times. We created a SeeBrain for see messages, a HearBrain for hear messages and a FeelBrain for sense messages. A further class, SensoryInput, was written to receive the messages and pass them to the correct brain. Once the individual brains have the message, it is their job to take the relevant information and pass it on to the main PlayerBrain, where the neural network would be implemented (see figure 3).

The three brains all work in generally the same way. They take the string given to them by the SensoryInput class and break it down into substrings by looking at where the brackets are located. These substrings are broken down further until eventually they yield the values for the different objects the player can see. These are then fed into the neural network, as each possible object seen is an input node.
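
The dispatch step in SensoryInput is simple: it looks at the message type at the start of the string and hands the message to the matching brain. The sketch below illustrates the idea; the parse() method and the constructor signature are simplifications, not our exact interfaces:

    // Sketch of SensoryInput routing each server message to the right brain.
    class SensoryInput {
        private final SeeBrain seeBrain;
        private final HearBrain hearBrain;
        private final FeelBrain feelBrain;

        SensoryInput(SeeBrain s, HearBrain h, FeelBrain f) {
            seeBrain = s; hearBrain = h; feelBrain = f;
        }

        void dispatch(String msg) {
            if (msg.startsWith("(see")) {
                seeBrain.parse(msg);          // visual information
            } else if (msg.startsWith("(hear")) {
                hearBrain.parse(msg);         // aural information
            } else if (msg.startsWith("(sense_body")) {
                feelBrain.parse(msg);         // body state ("sense") information
            }
        }
    }
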
Figure 3. The structure for message parsing

5.2. Implementing the Neural Network

As with most neural networks designed in software, we decided that we would have three layers: an input layer, one hidden layer and an output layer. Initially we produced a Neuron class that represented each neuron in the network and a Synapse class that represented each connection between the neurons. The Neuron class would take in all the values passed to it, sum them, and check whether the sum was greater than the threshold value; if it was, the output of this summation could be passed to the next layer. Each Synapse object would simply take one input value and multiply it by its associated weight to give an output. However, after some testing we found that this implementation was much too slow. Even on a much simplified network that had just 18 input nodes, seven hidden nodes and four output nodes, it was taking well over 100ms just to create the objects and process the data through. After more testing we found that it was the creation of all the objects that was taking the most time. With 18 input nodes, 7 hidden nodes and 4 output nodes there were 18 x 7 x 4 = 504 Synapse objects being created; with the 29 node objects as well, we were creating 533 objects before we had even made any calculations. With the size of the network only going to increase, we needed to think of another implementation.

We then decided to model the different layers of the neural network as arrays. This would eliminate the large overhead of creating hundreds of Java objects. We also decided to make the network even simpler, with only two input neurons, two hidden layer neurons and two output neurons. We hoped that with this simple network we could learn a couple of situations and grow the network from there. The two input values we chose were the distance and direction of the ball, and the two values we wished to output were the turn value (which direction the player turns) and the dash value (how fast the player runs):

Figure 4. The 2x2x2 neural network

To create this small network in Java we created a number of arrays. The first was called neuronTypes, and this was used to identify which input neuron the values for the ball's distance and direction should be loaded into. We then created neuronOutputTypes to extract the correct output values. These arrays were of String type, but we also required arrays to hold the actual values, so we created the output layer, a one-dimensional array called outputLayer, and two two-dimensional arrays called inputLayer and hiddenLayer. The reason inputLayer and hiddenLayer were two-dimensional was so that we could store the value and the weights associated with each neuron in the same array. This enabled us to save valuable time during the processing stage.
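
The sketch below shows one way such an array layout can drive the 2x2x2 forward pass: for each neuron, element [i][0] holds its current value and the remaining elements hold its outgoing weights. The weights shown are placeholders, and the activation function is omitted for brevity:

    // Sketch: the 2x2x2 network as arrays rather than objects.
    double[][] inputLayer  = { {0.0, 0.3, 0.7},   // value, weight to hidden 0, weight to hidden 1
                               {0.0, 0.5, 0.2} };
    double[][] hiddenLayer = { {0.0, 0.6, 0.4},   // value, weight to output 0, weight to output 1
                               {0.0, 0.1, 0.9} };
    double[]   outputLayer = new double[2];       // turn value, dash value

    void process(double ballDistance, double ballDirection) {
        inputLayer[0][0] = ballDistance;
        inputLayer[1][0] = ballDirection;
        for (int h = 0; h < 2; h++) {             // input layer -> hidden layer
            hiddenLayer[h][0] = inputLayer[0][0] * inputLayer[0][h + 1]
                              + inputLayer[1][0] * inputLayer[1][h + 1];
        }
        for (int o = 0; o < 2; o++) {             // hidden layer -> output layer
            outputLayer[o] = hiddenLayer[0][0] * hiddenLayer[0][o + 1]
                           + hiddenLayer[1][0] * hiddenLayer[1][o + 1];
        }
    }
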
Instead of taking well over 100ms to process data through the network, it now took nanoseconds. We then tested how long it took to parse the messages and process the network together. We found it took around 2 or 3 milliseconds almost every time, but occasionally this would jump to 30 or 40. We discovered the reason for this was that Java ran its garbage collection automatically once a certain amount of memory was in use, so every now and then it would take time out to clear this. We decided to trigger garbage collection manually every time the messages were processed through the network. This increased the average time to parse and process the messages, but it meant that the whole process never took more than about 10ms.

5.3. Backpropagation

The backpropagation algorithm starts by initialising all the weights in the neural network to random values, usually in the range -0.5 to 0.5. The data is then fed through the network to give some output. The algorithm works by assuming you know what you want at the output nodes, so what you actually get will have an error: if you wanted an output of 30 and you got an output of 50, you have an error of 20. It is this error that leads to the next phase, which gives the algorithm its name: the errors are fed back through the network and changes are made to the weights depending on how much each node contributes to the error at the output. The algorithm repeats this process until the outputs produced for the training data are sufficiently close to the desired outputs, in other words until the error is sufficiently small (further explanation is given in section 3.1.1. in the corpus of material).

For the backpropagation algorithm to work it must first be trained using training data. Since we only had a small network to begin with, we decided that we would train it to recognise the following simple situation (a sketch of the corresponding target function follows the list):
- If the ball is more than 20m away, turn towards it but don't run
- If the ball is less than 20m away, turn and run towards it
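
These two rules translate directly into desired (turn, dash) outputs for any ball position, which is what the training pairs encode. A sketch, in which the dash power of 100 used when running is an illustrative choice:

    // Sketch: desired outputs for a given ball position, encoding the two rules.
    static double[] desiredOutput(double ballDistance, double ballDirection) {
        double turn = ballDirection;                     // always turn towards the ball
        double dash = ballDistance > 20.0 ? 0.0 : 100.0; // only run if the ball is within 20m
        return new double[] { turn, dash };
    }
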
After implementing the mathematics of the algorithm we began to train the network, somewhat unsuccessfully. We had a number of problems, one of which was to do with normalising our inputs. We were advised to use input values between 0 and 1, but the distance of the ball could be anything up to 160m, and the direction of the ball could be anywhere from -180° to 180°. We therefore decided to normalise the inputs by dividing them by the maximum possible error we could receive at the output nodes: 160 for ball distance and 360 for ball direction.

Another problem was to do with the sigmoid function required by the backpropagation algorithm: it never produces negative values. There were a couple of ways around this. We could have used a version of the function that allows negative values, the tan sigmoid, but eventually we just worked around the problem by assuming we would never use negative numbers; for example, if we had to turn -10° we would simply turn 190°, as this amounted to the same thing.

We eventually managed to get the neural network to learn this one situation after around 500 cycles through the backpropagation algorithm. Now we knew the mathematics was working, we attempted to learn every combination of ball distance and direction - all 57600 (160 x 360) of them. Here, however, the algorithm did not work as we had hoped.
Instead of learning the two rules we had given it, the network seemed to just take an average of the maximum errors we were giving: Dash converged to 80 (maximum error of 160) and Turn converged to 180 (maximum error of 360).

We consulted a neural network expert, Dr. Konstantinos Sirlantzis, who showed us that there is more than one way to teach a neural network using backpropagation: an online method and a batch method. Online means you backpropagate every time a pattern is processed through the network, whereas batch means you run all the training patterns, sum the errors to find an overall average error, and then backpropagate using this average. We tried them both, but still the network failed to learn, more often than not converging to the average of the maximum error.
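
The difference between the two methods is only where the weight update happens. In outline, with forwardPass() and updateWeights() standing in for the actual network code (both are illustrative names, each training pattern holding its inputs and its targets, and two output nodes as in our network):

    // Online: update the weights after every training pattern.
    for (double[][] pattern : trainingSet) {
        double[] error = forwardPass(pattern[0], pattern[1]);
        updateWeights(error);                    // backpropagate immediately
    }

    // Batch: accumulate the errors over all patterns, then update once.
    double[] avgError = new double[2];           // one entry per output node
    for (double[][] pattern : trainingSet) {
        double[] error = forwardPass(pattern[0], pattern[1]);
        for (int i = 0; i < 2; i++) avgError[i] += error[i];
    }
    for (int i = 0; i < 2; i++) avgError[i] /= trainingSet.size();
    updateWeights(avgError);                     // backpropagate the average error
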
After this failure we assumed there must be something wrong with the code we had written, so we went through it line by line and calculated each weight update manually to make sure it was doing exactly what we wanted. A couple of mistakes were found: we were applying the sigmoid function twice to the same value, and there were brackets in the wrong place. But other than these small slips the mathematics worked as we planned, and even after fixing the mistakes the network still was not learning the patterns we presented.

We were now faced with the following problem: should we continue with the neural network, which was the backbone of the client programs we had written, or should we consider a different learning algorithm? We decided to give ourselves a deadline on the neural network; if it still wasn't working by this date we would be forced to consider some other implementation, most likely some kind of genetic algorithm.

In the remaining weeks of using the neural network we went through our code with Dr. Sirlantzis, but he could see nothing wrong with how it was implemented. The backpropagation algorithm has many parameters that can be varied. The learning rate, for example, is one such parameter, and we altered it from values as small as 0.001 up to 2000, but this did not seem to have any effect. We then tried increasing the number of hidden layer neurons from two to higher numbers, because we thought that maybe there were not enough routes through the network for all the combinations of patterns to be learnt.

After weeks of tweaking and testing, the network simply was not working, and we decided we had to change our approach dramatically and implement a rule-based client that used a genetic algorithm to learn.

5.4. Synchronisation Issues

While we were developing the neural network, not much attention was paid to the synchronisation of the threads we created. We always knew this would have to be done, but we thought it would be better to clean up the code and synchronise it once the neural network was working. Since the network never did work properly, we decided to implement the synchronisation before we continued with the genetic algorithm, to save further work in the future.

Two threads are created by our program, one of them being the PlayerBrain and the other containing the smaller brains and the SensoryInput class. The motivations behind synchronising the threads were twofold: firstly, we wanted to make sure we only sent a maximum of three commands per cycle (the maximum number allowed by the server), and secondly, we did not want to be reading and writing the same arrays at the same time. We created a semaphore to ensure that decide() - the method used to make the decisions - was blocked until told to run. Since sense body messages are sent every 100ms (the same time as one cycle), we agreed these would be used to call decide(). However, there was another problem: see messages are sent every 150ms, which meant that every third cycle we would receive two messages. Since we wanted the see messages to be loaded into the arrays first, we had to call decide(), sleep for 15ms, and use a second semaphore to block until all the see parameters were loaded. This ensured that decisions were not being made on the previous cycle's information.
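
In outline, the gating can be built from two java.util.concurrent.Semaphore objects. This is a sketch of the scheme rather than our exact code, and seeExpectedThisCycle() is a hypothetical helper:

    import java.util.concurrent.Semaphore;

    // Sketch: decide() is released by each sense body message, and in cycles
    // where a see message is also due, it waits until the see arrays are loaded.
    class DecisionGate {
        private final Semaphore senseArrived = new Semaphore(0);
        private final Semaphore seeLoaded = new Semaphore(0);

        void onSenseBody() { senseArrived.release(); }   // called by the parsing thread
        void onSeeLoaded() { seeLoaded.release(); }      // called by the parsing thread

        void decide() throws InterruptedException {
            senseArrived.acquire();        // wait for this cycle's sense body message
            Thread.sleep(15);              // give a simultaneous see message time to arrive
            if (seeExpectedThisCycle()) {
                seeLoaded.acquire();       // block until the see parameters are loaded
            }
            // ... make the decision using up-to-date arrays ...
        }

        private boolean seeExpectedThisCycle() { return true; } // placeholder
    }
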
5.5. Implementing Genetic Algorithms

The approach we took with the genetic algorithm was to design the player so that he could recognise different situations. In each situation he would have some actions, but the genetic algorithm would be responsible for learning what value each action should take. For example, if the player is in a situation where he must turn and then run, the genetic algorithm does not change the fact that the player will turn and run, but it determines how fast and in what direction the player will do so.

The first step therefore was to write down all the different situations we wanted our players to recognise, and then code them. This was different for each type of player - midfielder, defender, etc. - and proved quite a hard task in itself.
We spent a lot of time testing the code to make sure each situation was being recognised correctly. It was here that we noticed a problem. The player primarily uses what he can see to work out the situation he is in, but when the information is loaded into the arrays, anything that cannot currently be seen is not updated, so the arrays could hold stale information. We solved this problem by setting a default value of -999 for all the elements; the player then knew he could not see an object if the value stored for it was -999.

The second step was to implement a mechanism to ensure the players only sent a maximum of three commands per cycle. This was done by creating a linked list of length three and adding commands to it. The first command would be taken and sent to the server, and in the next cycle the second command would be sent. This ensured only one action command was sent per cycle. The only time the player made a decision was when the list of commands was empty, thus making sure we only ever queued a maximum of three messages (there can only be one kick, turn, dash or turn_neck command per cycle, but say messages are allowed to be sent alongside any of these). Another linked list of length three was created for say messages. This meant that both action commands and say commands could be sent in the same cycle without exceeding the number of messages per cycle the server allowed.
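
A sketch of this queueing mechanism, using java.util.LinkedList with commands represented as plain strings and send() standing in for the actual network write:

    import java.util.LinkedList;

    // Sketch: at most three queued commands, one action sent per cycle,
    // and a new decision made only once the action queue is empty.
    LinkedList<String> actionQueue = new LinkedList<>();
    LinkedList<String> sayQueue = new LinkedList<>();

    void onCycle() {
        if (actionQueue.isEmpty()) {
            decide();                          // refills the queue with up to three commands
        }
        if (!actionQueue.isEmpty()) {
            send(actionQueue.removeFirst());   // one kick/dash/turn/turn_neck per cycle
        }
        if (!sayQueue.isEmpty()) {
            send(sayQueue.removeFirst());      // a say command may accompany an action
        }
    }
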
The next stage in implementing the genetic algorithm was to create an array where each element refers to a specific dash, turn, kick or turn_neck value. In accordance with the process outlined in section 4.3.2. of this document, we would set all the values in this array to valid, random numbers.

Once this was complete we needed to implement a scoring system. We planned to do this with a kind of "fantasy football" scoring system, in which the coach would rate the players depending on how well they played; for example, ten points would be awarded to a player who scores a goal, two points for a completed pass, and so on. In this way we could rate the fitness of each player and select the two best players ready to produce the next generation of players.
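
In code, such a fitness function could be as simple as the sketch below; the goal and pass values come from the example above, while the third event and its point value are purely illustrative:

    // Sketch of the planned "fantasy football" fitness score for one player.
    static int fitness(int goals, int completedPasses, int tackles) {
        return 10 * goals           // ten points per goal scored
             + 2 * completedPasses  // two points per completed pass
             + 1 * tackles;         // illustrative: one point per tackle
    }
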
However, since the failure of the neural network, time was always against us. After we had coded the players to recognise the different situations they were in, we made the decision to stop working on the genetic algorithm, as we knew we would not have time to implement, test and then train it. This was disappointing, because we had made good progress in the two months we worked on it, and felt that with more time we would have done a good job of it. However, we felt the time would be put to better use by tidying up and documenting the code we already had and beginning the project documentation.

5.6. Optimisation Issues

With cycle times being just 100ms, it was obviously important that our program executed everything it required and sent its messages to the server within this time. Even though we changed the learning algorithm, the overall structure of our program remained the same, with the smaller brains parsing the information and loading the values into the arrays used to make decisions. This meant that optimising the brains, especially the see brain, which was used most often, would save processing time whether we used a neural network or genetic algorithms.

The main operation the see brain performed was breaking the message up into substrings (see section 5.1. of this document). However, this is time consuming, because a lot of comparisons are being made. The optimised version was almost rewritten from scratch: instead of breaking the string down into smaller and smaller substrings, it went through the message once, indexing where specific brackets were. From where these brackets were, and how many parameters they contained, we could identify which objects the player could see and extract the necessary information.
This indexing of the see message, instead of breaking it down, meant that the time taken to process one message went from an average of about 10ms down to an average of about 3ms.

We also went through the PlayerBrain to optimise the various loops it required, and tidied up the code to make it as efficient as possible. After all these changes were made, we managed to decrease the total processing time - from initially receiving the messages to sending our own messages to the server - from 30ms to less than 1ms.

6. Conclusions

Obviously we did not achieve our major objective for this project, which was to implement a neural network so that our players could learn about the opposition and progressively improve their performance. We were very disappointed about this, because we spent such a long time on the network but were unable to fully diagnose why it failed without risking missing the deadline.
Neither our supervisor nor Dr. Sirlantzis could find any problems, so it remains a mystery. Maybe we missed some vitally important detail, or didn't present the training patterns in a way that allowed the network to learn. However, we did choose a challenging project, and one of the risks we identified was not being able to implement the neural network, which unfortunately proved to be a reality.

Since we spent such a long time on the neural network, we had left ourselves a tight schedule for learning, implementing and training a genetic algorithm. As we progressed with the algorithm, we began to accept that it was not going to be fully implemented. This was frustrating, because we quickly learned how genetic algorithms worked and strongly believed that had we chosen this approach over the neural network we would have made a really good job of it. As it turned out we had too little time, and only managed to implement the players recognising the situations they found themselves in.

Although we did not meet our aims for the project, we do not feel that it was a wasted effort. We all now understand neural networks and genetic algorithms in some detail, and we all learnt from the experience. If we were to begin the project again, we would seriously consider implementing a genetic algorithm from the start. The message parsing and handling would be kept the same, but having taken both learning algorithms as far as we could, we believe we would have done a better job with the genetic algorithm. We also agreed that we should have paid more attention to the synchronisation of the threads before we began working on the neural network. A lot of time was spent on this after the failure of the neural network, and the time we would have saved by completing that task before starting on the network would have been valuable as we approached the deadline.

As far as future work is concerned, there are various routes we could take. Finishing the genetic algorithm would obviously be the first thing to do, but we would also have liked to implement a more useful coach. The coach we actually managed to implement was quite primitive, but eventually one would be necessary to record how well the players were doing and so be able to choose the players that could win the game (natural selection) in order to produce offspring.

7. Acknowledgements

We would like firstly to thank our project supervisor, Colin Johnson, for his continued input and support. Colin helped in numerous ways, including suggesting how we might implement the neural network using arrays instead of objects, as well as providing us with the backpropagation algorithm. He was always available for meetings and discussed our problems until he was sure we understood how to overcome them. Additionally, he was a tremendous help with respect to the genetic algorithm and the write-up of the documentation, including this paper.

We would also like to thank Dr. Konstantinos Sirlantzis for his much appreciated help while we struggled with the neural network. Kostas never hesitated when we asked to meet with him to discuss our implementation of the network, was always available for further meetings, and gave us invaluable insights into the workings of neural networks. Without his help we would not have understood our problems with the network as well as we did, and he was genuinely disappointed when we had to abandon the network for a different approach.

