
Genetic Algorithms: New Tools for the Programmer's Toolbox
Lon Riesberg
April 16, 2003

Table of Contents

1. Introduction
   1.1 What are Genetic Algorithms?
   1.2 History
   1.3 Current Significance
2. How Genetic Algorithms Work
   2.1 Overview
   2.2 Implementation
      2.2.1 Basic Recipe
      2.2.2 Considerations
         2.2.2.1 Encoding
         2.2.2.2 Selection Strategies
         2.2.2.3 Reproduction Strategies
         2.2.2.4 Elitism
         2.2.2.5 Mutation
         2.2.2.6 Population Size
3. Case Studies
   3.1 Traveling Salesman Problems
   3.2 Environmental Modeling and Simulation
   3.3 Machine Learning
4. Conclusion / Future Directions
   4.1 Self Adapting Systems
   4.2 Parallel Models
   4.3 Incorporating New Ideas From Genetics
   4.4 Open-Ended Encodings
   4.5 Innovation
Appendix A: Review Questions
Appendix B: Internet Resources
Appendix C: Works Cited

1. INTRODUCTION

1.1 What are Genetic Algorithms?

In nature, evolution works by favoring the individuals who are most fit for their particular environmental problems. In a population of rabbits, for instance, some rabbits will be fast and smart. Others will be slow, and some, kind of dumb. The fast and smart ones will consistently be able to escape predators and will have more offspring than the slow, dumb ones. Some slow, dumb rabbits will survive by luck, however, and will also reproduce, keeping their traits alive in the population. Every once in a while, nature throws in a mutation that may ultimately prove beneficial as well. The result of all this is a population that's faster and smarter on average than the initial population, because being fast and smart were good solutions to the rabbits' problems. This is the model that Genetic Algorithms are built from. Essentially, when a Genetic Algorithm (G.A.) is applied to a problem, the solutions are evolved [7].

1.2 History

As far back as the 1950s, there were computer scientists who studied the possibilities of evolving solutions to engineering problems. At the time, computers were new and not very powerful compared to what we have today, and these early computer scientists were motivated to make the best use of their underpowered machines. To do that, they imported the simple process that nature uses to solve complex problems: evolution [8]. Initially, these computer scientists developed new strategies for evolutionary programming. In the early days of computer science, programming concepts and data structures weren't well developed either. With evolutionary programming, the programs themselves evolved. Although related, evolutionary programming is not the same as Genetic Algorithms. With Genetic Algorithms, it's the solutions that evolve [8].

Genetic Algorithms were invented by John Holland at the University of Michigan in the 1960s. Unlike his predecessors, Holland wasn't interested in solving engineering problems at all. His interest was to study adaptation as it occurs in nature. His 1975 book Adaptation in Natural and Artificial Systems essentially presented G.A.s as a tool for environmental modeling. This landmark book introduced the basic structures of Genetic Algorithms that are still used today [8].

1.3 Current Significance

For a variety of reasons, the use of Genetic Algorithms has recently been gaining in popularity [4, 5, 7, 8]. Computers are increasingly powerful and increasingly well suited for G.A.s, while at the same time these powerful computers are less and less expensive. In addition, the problems facing computer scientists have become increasingly complex, and Genetic Algorithms generally work well on complex problems. Some of the types of problems where G.A.s are proving useful include complex pattern recognition tasks such as financial market analysis, artificial vision, data mining, and protein engineering. In these cases, Genetic Algorithms are used to search for solutions amongst a huge number of possibilities. Other important applications involve systems that need to evolve, including ecosystem simulation problems, games, and machine learning. Other problems require machines to be adaptive, such as robots that need to work in changing environments. And finally, Genetic Algorithms are proving useful in applications that are just too difficult to program by hand, such as artificial intelligence problems. In these cases, the algorithms allow the intelligence itself to evolve [8].

2. HOW GENETIC ALGORITHMS WORK

2.1 Overview

Genetic Algorithms operate on populations of candidate solutions. Generally, candidate solutions are binary bit strings, but they can also be alphanumeric strings, matrices, or other objects. Because of this variety of options, and to stay consistent with the terms used in biology, candidate solutions are commonly referred to as chromosomes. Each chromosome represents some underlying set of parameters for a given problem. The chromosomes that are evaluated to represent the best solutions are chosen to share information in a reproduction process that results in new chromosomes. Since the model comes from the natural world, it's easy to view the chromosomes as simplistic representations of DNA strands. The new, better chromosomes replace those with poorer solutions. In this way, each new generation brings the entire population closer to the desired solution. This continues for multiple generations, perhaps hundreds or even thousands, depending on the problem. Mutations and different combining strategies ensure that a wide variety of solutions are evaluated [1, 7, 8, 12].

None of this is complex. The operations on the chromosomes involve nothing more than random number generation, string copying, and basic arithmetic. However, the problem solving ability of this strategy is impressive. Considering that this is essentially how natural selection has evolved a huge variety of complex life forms, it shouldn't be surprising that such simple concepts can be very powerful [1, 8, 12].

Here's a simple example. Begin with some problem that has no obvious approach to efficiently finding a worthwhile answer. The Traveling Salesman Problem discussed in Section 3.1 is an excellent example to work with because, conceptually, this problem is very simple and yet there are an enormous number of possibilities. In this problem, a traveling salesman must visit every city in his territory exactly once and then return to his starting point. Because of the cost of travel, he must cover the least distance possible. This problem gets progressively more difficult with increasing numbers of cities. Since the number of possible solutions is n!, where n is the number of cities, a problem involving just a few thousand cities would take considerable computer time to evaluate every possibility [7].

The possible solutions to the problem exist in what's called the problem's search space. This gives us a place to start. Each point in the search space represents one possible solution. Searching for an answer begins by creating a somewhat random population of possibilities and then evaluating them. The population may consist of hundreds, or even thousands, of chromosomes, each representing a single possibility in the search space. In this case, evaluation is simply calculating the total distance traveled through a given sequence of cities. Once evaluated, the chromosomes that represent the best solutions are chosen to reproduce because of their fitness for solving the problem. Reproducing involves various methods of exchanging information between chromosomes to create new chromosomes, called the offspring. Random mutations are applied next and ensure that the algorithm doesn't get stuck on a particular solution. Finally, the mutated offspring get tested for their fitness to the problem and the process continues. With each successive generation, the solutions that the chromosomes represent become better and better. Very good solutions can be found this way in just a fraction of the time required to examine every possibility. Granted, very good isn't perfect, but in many applications the computer time and associated costs required to find that perfect answer aren't necessary or practical [11].

2.2 Implementation

2.2.1 Basic Recipe

Although a thorough understanding of how Genetic Algorithms work may be required for complex applications, G.A.s are generally simple enough to be used by most programmers. As Charles L. Karr and L. Michael Freeman point out in Industrial Applications of Genetic Algorithms, G.A.s have made their way into the practicing engineer's toolbox. Indeed, their text is a presentation of 17 industrial-scale G.A. projects by first-year graduate students. The projects include titles such as "Data Mining Using Genetic Algorithms," "Optimized Non-Coplanar Orbital Transfers Using Genetic Algorithms," "Simulation of an Artificial Ecosystem," "Learning Classifier Systems," and "Software Test Data Generation From a Genetic Algorithm." These students were able to implement their projects during the course of a single semester. The lesson? Implementing Genetic Algorithms is not difficult [2].

Here's a basic structure for working with a Genetic Algorithm [1]:

    // start with an initial time
    t := 0;

    // initialize a random population of individuals
    initpopulation P (t);

    // evaluate fitness of all initial individuals
    evaluate P (t);

    // test for termination criterion (time, fitness, etc.)
    while not done do

        // increase the time counter
        t := t + 1;

        // select a sub-population for offspring
        P' := selectparents P (t);

        // recombine the "genes" of selected parents
        recombine P' (t);

        // perturb the mated population
        mutate P' (t);

        // evaluate its new fitness
        evaluate P' (t);

        // select the survivors based on actual fitness
        P := survive P,P' (t);
    od
    end GA.
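The pseudocode above maps almost directly onto working code. As a rough illustration only, here is a minimal Python sketch of the same loop applied to a toy problem (maximizing the number of 1 bits in a chromosome); the fitness function, chromosome length, mutation rate, and population size below are placeholder assumptions, not values taken from the sources cited here:

    import random

    # Toy fitness: count of 1-bits (a placeholder -- substitute your own problem).
    def fitness(chromosome):
        return sum(chromosome)

    def random_chromosome(length):
        return [random.randint(0, 1) for _ in range(length)]

    def select_parent(population, tournament_size=3):
        # Tournament selection: pick a few at random, keep the fittest.
        contenders = random.sample(population, tournament_size)
        return max(contenders, key=fitness)

    def crossover(parent_a, parent_b):
        # Single-point crossover.
        point = random.randint(1, len(parent_a) - 1)
        return parent_a[:point] + parent_b[point:]

    def mutate(chromosome, rate=0.01):
        # Flip each bit with a small probability.
        return [bit ^ 1 if random.random() < rate else bit for bit in chromosome]

    def genetic_algorithm(length=32, population_size=50, generations=100):
        population = [random_chromosome(length) for _ in range(population_size)]
        for _ in range(generations):
            offspring = []
            for _ in range(population_size):
                parent_a = select_parent(population)
                parent_b = select_parent(population)
                offspring.append(mutate(crossover(parent_a, parent_b)))
            # Elitism: carry the single best individual into the next generation.
            best = max(population, key=fitness)
            population = offspring[:-1] + [best]
        return max(population, key=fitness)

    if __name__ == "__main__":
        best = genetic_algorithm()
        print("best fitness:", fitness(best))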
2.2.2 Considerations

The basic structure outlined above is just that: a basic structure. There are many things that can be implemented differently, depending on the problem. For instance, chromosome encoding, selection strategies, reproduction strategies, and even population size all need to be considered and thoughtfully implemented. Furthermore, the possible variations can be complex if necessary. For instance, with the Traveling Salesman Problem, there are dozens of web sites, a few textbooks, and hundreds of articles that focus specifically on solving this problem using various G.A. strategies [11]. That's not intended to make Genetic Algorithms intimidating, however. The Traveling Salesman Problem is somewhat of a classic and is commonly used for studying G.A.s in general. As mentioned previously, the basic structure of Genetic Algorithms is fairly easy to grasp and implement. Understanding the complexities involved in working with G.A.s, however, offers a deeper appreciation for both their power and flexibility.

2.2.2.1 Encoding

The chromosomes should be encoded in a way that offers information about the solutions they represent. Bit strings are the most commonly used type of chromosome because of their simplicity. Additionally, as Melanie Mitchell points out, the original G.A. research conducted by Holland and his students focused on bit strings, and subsequent research has tended to follow that lead [8]. This is an important observation because most of the existing research applies to fixed-length binary encodings, and research on other types of encodings is not well developed. That's in spite of the fact that bit strings don't offer the variety of options that some problems may require. For some problems, it may be more natural to build chromosomes from an assortment of characters and decimal digits. Other problems may be best suited to matrices or even objects that contain multiple data members.

Here are a few of the most common encoding schemes used for solving the Traveling Salesman Problem. The first example uses a typical binary bit string representing n cities. In this representation, each city is encoded as a block of ⌈log2 n⌉ bits, so a string of n cities is n⌈log2 n⌉ bits long [11]. A sequence of 6 cities [1 2 3 4 5 6] could be represented like this: [001 010 011 100 101 110]. Each city is represented in the bit string in its corresponding location in the sequence. For instance, cities 1 and 2 are next to each other, both in the sequence and in the bit string. Note that in this case there are two 3-bit patterns that do not correspond to any city: 000 and 111. This becomes important when performing reproduction and mutation, because offspring may potentially not represent a valid solution at all, so some kind of check or repair mechanism will be required at a later step [11].

The same problem can also be represented using a decimal string. For example, the sequence [2 1 3 6 4 5] would become the string 213645. This is a more natural way to consider the sequence, simply because decimal digits are generally more intuitive for people. However, using decimal digits requires more complicated techniques for selection and reproduction than binary strings [11].

Another possible encoding strategy uses a matrix. There has been research on a few different matrix encodings, but here's a simple one:

    0 0 0 1
    1 0 1 1
    1 0 0 1
    0 0 0 0

In this example, the element in row i and column j is a 1 if and only if city i is visited before city j in the sequence. Thus, the matrix above represents the sequence 2 3 1 4. Obviously, this technique isn't nearly as intuitive as the other examples mentioned. Furthermore, selection, reproduction, and mutation strategies are also more complex. However, the Traveling Salesman Problem is commonly used for general research on Genetic Algorithms, and matrix encoding may prove worthwhile in other applications [11].

Yet another strategy would be implemented in an object-oriented programming language, such as C++. In this strategy, start with a group of City objects. These objects could include data members that offer information about each object's physical location as well as a pointer to the next city in the sequence. For instance, data members might include latitude, longitude, and a nextCity pointer.
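To make the object idea concrete, here is a minimal sketch of that kind of encoding. The City class, its fields, and the four example cities are illustrative assumptions rather than anything from the cited sources, and instead of chaining cities with a nextCity pointer, the chromosome is stored as an ordered list, which expresses the same tour. Its fitness is simply the total tour length:

    import math
    from dataclasses import dataclass

    # A hypothetical City object of the kind described above.
    @dataclass
    class City:
        name: str
        latitude: float
        longitude: float

    def distance(a, b):
        # Plain Euclidean distance as a stand-in for true geographic distance.
        return math.hypot(a.latitude - b.latitude, a.longitude - b.longitude)

    def tour_length(tour):
        # A chromosome is an ordering (permutation) of City objects. The salesman
        # returns to his starting city, hence the wrap-around index.
        return sum(distance(tour[i], tour[(i + 1) % len(tour)])
                   for i in range(len(tour)))

    cities = [City("A", 0, 0), City("B", 0, 3), City("C", 4, 3), City("D", 4, 0)]
    print(tour_length(cities))   # 14.0 for this square of cities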

These are just simple examples. There are dozens, if not hundreds, of significant encoding possibilities for tackling the Traveling Salesman Problem alone. What's the best way to decide on a strategy? Significant research has attempted to answer this question with simple rules; generally, however, that has been unsuccessful. Lawrence Davis, a researcher whose expertise is applying G.A.s to real-world problems, advocates simplicity. Davis' approach is to use whatever encoding seems most natural for a particular problem and then devise a G.A. that can work with that encoding [8]. Furthermore, as Melanie Mitchell points out, most research is currently done by guessing at an appropriate encoding and then trying out a particular version of the G.A. on it [8]. This tends to work because, fortunately, G.A.s are very forgiving. While one technique may provide more efficiency than another, choosing the less efficient strategy doesn't necessarily mean the G.A. won't work. As long as the encoding can be related back to the problem, and the solution it represents correctly analyzed, the encoding should be functional [1, 8].

2.2.2.2 Selection Strategies

Choosing the chromosomes that will either survive into the next generation, recombine to create offspring, or be discarded is one of the most important functions of a Genetic Algorithm. Without a good selection strategy, the G.A. may wander away from its target problem and create random chromosomes that have no bearing on the problem whatsoever. Equally inadequate is a population that converges on a candidate solution too quickly and misses possibilities that would work better [5]. There are three broad classifications of commonly used selection strategies [5]:

1) Objective, where an evaluation function determines a chromosome's fitness.
2) Subjective, which allows for human interaction and lets humans make the final call.
3) Co-evolved, where multiple populations compete with each other in a predator/prey type of relationship.

Each of these three strategies has unique goals and applications.

2.2.2.2.1 Objective Selection

Objective selection strategies are generally the most commonly used. The goal of objective selection is to create the majority of offspring with information from the chromosomes that are evaluated to be the

best. There are three significant objective selection strategies: roulette wheels, tournaments, and truncation [1, 4, 8, 12]. Roulette wheel selection can best be understood by imagining a wheel with slots that represent chromosomes with various solutions. The slots are sized according to how good those solutions are. Better solutions get bigger slots:

[Figure: a roulette wheel divided into slots for Chromosomes 1 through 4, with slot size proportional to fitness.]

Imagine a marble being thrown into the roulette wheel. The chromosome where the marble stops is selected. Clearly, chromosomes with bigger fitness values will be selected more often [12].

Tournament selection takes a certain number of chromosomes, typically two or three, and simply chooses the chromosome with the best solution from each tournament. The tournaments continue until an entire population has been evaluated [5].

Truncation selection makes copies of the chromosomes with the best solutions by making n copies of each member of the best 1/n of the population. For instance, if n is chosen to be 2, the best 1/2 of the population has 2 copies made of each member. If n is chosen to be 3, the best 1/3 of the population has 3 copies made. If n is chosen to be 4, the best 1/4 of the population has 4 copies made. In this way, more copies are made of the chromosomes with better solutions [5].
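As a rough sketch of how these three objective strategies look in code (assuming chromosomes are kept in a list alongside nonnegative fitness values where higher is better; none of this is taken from the cited sources):

    import random

    def roulette_select(population, fitnesses):
        # Spin the wheel: each chromosome owns a slice proportional to its fitness.
        spin = random.uniform(0, sum(fitnesses))
        running = 0.0
        for chromosome, fit in zip(population, fitnesses):
            running += fit
            if running >= spin:
                return chromosome
        return population[-1]   # guard against floating-point round-off

    def tournament_select(population, fitnesses, size=2):
        # Pick `size` chromosomes at random and keep the fittest of them.
        picks = random.sample(range(len(population)), size)
        return population[max(picks, key=lambda i: fitnesses[i])]

    def truncation_select(population, fitnesses, n=2):
        # Keep the best 1/n of the population and copy each survivor n times.
        ranked = sorted(range(len(population)),
                        key=lambda i: fitnesses[i], reverse=True)
        survivors = [population[i] for i in ranked[:len(population) // n]]
        return [chromosome for chromosome in survivors for _ in range(n)]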

2.2.2.2.2 Subjective Selection

Subjective selection allows humans to make the selections. This technique offers some very interesting possibilities for certain problem types.

For instance, New Mexico State University has been experimenting with a type of artist rendering program that relies specifically on G.A.s utilizing subjective selection. In their Faceprints program, a witness to a crime is presented with a population of faces. In the program, each face is represented as a bit string. Five bits represent 32 different forehead types. Another five bits represent 32 different chins. Other bits represent cheekbones, eyebrows, noses, ears, beards, and so on. The witness chooses the individuals in the population that look most similar to the suspect. The G.A. then uses those individuals to create its offspring population, and the witness is again asked to choose the best candidates. This continues for several generations. In this way, witnesses have been able to compile much more accurate sketches of suspects, and in much less time, than with traditional methods that use an artist or Mylar overlays [5]. Why? Quite simply because people are able to recognize facial features much better than they're able to describe those features to an artist [5]. Research using the Faceprints program has already been used to examine the sociology of beauty [6]. In the future, this type of selection strategy may very well be used to help create art and music.

2.2.2.2.3 Co-evolved Selection

In this arrangement, predator chromosomes determine which prey chromosomes will survive. This works particularly well in modeling and simulation problems where competing populations actually exist in the world being modeled. Predator chromosomes may also be invented for an application simply as a mechanism to select out the least fit chromosomes of a given problem [8].

2.2.2.3 Reproduction Strategies

While important, selection operations on their own will not result in good solutions. A population that is subjected to repeated selection and nothing else would eventually consist of identical chromosomes. The population won't improve because selection operations don't offer mechanisms for creating anything new. Selection strictly ensures that the fittest chromosomes survive. What's needed now is a mechanism to create new chromosomes using the information from those that have been selected [5, 8]. A simple strategy is to mate the selected chromosomes randomly. Combining them through some reproduction process results in a sharing of information during the creation of offspring solutions. Like DNA strands, strings are very well suited for this. Crossover is the name given to this process in biology and, like many other terms, has been adopted for use in Genetic Algorithms [1, 8, 12]. The information exchange happens at a place determined by the programmer, called the crossover point. In the example below, the crossover point falls after the fifth bit of a binary bit string. Offspring 1 gets its first 5 bits from String 1 and the rest of its bits from the corresponding bits in String 2. Similarly, Offspring 2 gets its first 5 bits from String 2 and the rest from String 1 [12]:

    String 1:    10001 | 00100100110
    String 2:    01010 | 11000011110

    Offspring 1: 10001 | 11000011110
    Offspring 2: 01010 | 00100100110

This type of crossover strategy is called single-point crossover. Although simple, this technique has some shortcomings. For starters, long strings will inevitably lose vital information from the ends. Also, single-point crossover tends to keep short sections of the strings intact and leads to the preservation of hitchhikers. These are bits that aren't part of the desired schema but, because they're close on the string, hitchhike along with the desired schema as it reproduces [8]. Thus, for many situations, two-point crossover may offer better results. With two-point crossover, information exchange happens across two points rather than one. Often, the points are chosen randomly to help prevent important information from being lost on the ends [8]. Unfortunately, both of these techniques suffer from an inability to combine strings in ways that test all desired possibilities. For instance, the strings [11*****1] and [****11**] can't be combined using single-point crossover to create the string [11**11*1]. Similar situations occur using two-point crossover with long string lengths. Because of this, another technique has been developed: uniform crossover [8]. Uniform crossover randomly copies bits from either the first or second parent into the offspring. This ensures that all possible schemas could be tested as offspring. Additionally, hitchhikers can't develop.
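As a quick sketch of what these operators look like when chromosomes are stored as Python lists of 0/1 values (with mutation included for completeness; this is an illustration, not code from any cited source):

    import random

    def single_point_crossover(a, b):
        # Swap the tails of the two parents after one randomly chosen point.
        point = random.randint(1, len(a) - 1)
        return a[:point] + b[point:], b[:point] + a[point:]

    def two_point_crossover(a, b):
        # Swap only the middle segment between two randomly chosen points.
        i, j = sorted(random.sample(range(1, len(a)), 2))
        return a[:i] + b[i:j] + a[j:], b[:i] + a[i:j] + b[j:]

    def uniform_crossover(a, b):
        # Each bit of the offspring comes from either parent with equal probability.
        return [x if random.random() < 0.5 else y for x, y in zip(a, b)]

    def mutate(chromosome, rate=0.01):
        # Flip each bit independently with a small probability.
        return [bit ^ 1 if random.random() < rate else bit for bit in chromosome]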

The problem with uniform crossover, however, is that valuable schemas can't be preserved. They're often broken up during the random copying of bits [8]. Given all this, how do you choose the most effective crossover strategy? Again, there are no concrete answers. As Melanie Mitchell puts it, the success or failure of a particular crossover strategy depends in complicated ways on the interaction of the encoding, selection strategy, crossover, and other details of the G.A. [8]. As with some of the other parameters, perhaps the most interesting possibilities will result from the G.A. evolving its own crossover strategy.

2.2.2.4 Elitism

A challenge with any crossover strategy is that the selected chromosomes are always replaced by their offspring. This creates the undesired possibility that the best solution may be lost. A simple technique to avoid this is called elitism. Elitism automatically allows the best few solutions of a given generation to survive into the next generation. In this way, the best solution is never lost unless replaced by one that's even better [1, 8, 12].

2.2.2.5 Mutation

Random mutations are then applied to ensure that the population doesn't get so focused that it misses possible solutions. In nature, this has proven to be a very important process and results in possibilities that probably wouldn't be considered otherwise. In Genetic Algorithms, mutations are simply random alterations of a few pieces of data in the chromosomes. In the example below, for instance, two bits were chosen randomly and simply flipped [12]:

    Offspring 1:         0101000100100110

    Mutated offspring 1: 0101000100000111

Next, these mutated offspring get tested for their fitness to the problem and the process continues.

2.2.2.6 Population Size

Although not as obvious as some of the other parameters, population size is important as well. If the population is too small, the G.A. may converge on possible solutions too quickly. If it's too large, the G.A. may waste computer resources and the wait for improvement may take too long. As with the other parameters, there has been significant research on the dynamics of altering population sizes. And, like the other parameters, there are no set answers. Again, what works for a given implementation depends in complicated ways on the interaction of all the G.A. parameters. Often, the best approach is to start with a generic population size of 50 to 100 and then adjust things from there, depending on the G.A.'s performance [8]. Some of the more interesting research involves a population that changes in size. In some cases, the research has focused on evolving an optimum population size within the G.A. itself. Other research has focused on using one G.A. specifically to evolve the parameters of another. A Genetic Algorithm that evolves its own parameters is on the cutting edge of G.A. research and is certain to offer interesting possibilities [8].

3. CASE STUDIES

3.1 Traveling Salesman Problems

The Traveling Salesman Problem (TSP) discussed earlier is a classic problem in computer science and is referred to by some as the mother of all problems [7]. The problem looms large in computer science: there are dozens of web sites, a few textbooks, and hundreds of articles that focus specifically on TSPs. Not surprisingly, many are working with Genetic Algorithms [3, 7, 11]. The Traveling Salesman Problem is considered to be an NP-hard problem. NP-hard problems are characterized as difficult problems that cannot be solved efficiently in traditional ways. NP stands for nondeterministic polynomial, and it means that it is possible to guess a solution and then check it. If a computer could be programmed to guess well, a reasonable solution could be found in a reasonable amount of time. Genetic Algorithms are very well suited for this [12].

There are numerous applications that the TSP can be directly applied to. Transportation routing problems are obvious. Other problems have more subtle similarities but are equally well suited to TSP-type solutions. For instance, the scheduling of a machine to drill holes in a circuit board can use a TSP solution where the holes to be drilled are the cities. In this case, the cost of travel is the time it takes to move the drill head from one hole to the next [3]. An even better example is IC semiconductor fabrication. VLSI semiconductors may require as many as 1.2 million points that need to be connected with as little distance between them as possible in order to maximize the chip's efficiency [3, 7]. Yet another application is protein engineering. In this case, a G.A. can be used to construct complicated protein structures given a set of rules about the interactions of amino acids [8].

There are other important reasons that G.A.s have been widely used for solving TSPs as well. To begin with, the problem is easy to understand and a variety of encoding strategies are easy to intuit. This simplifies the overall workings of the G.A., which helps the programmer experiment with the other parameters. Additionally, the search space for a TSP is potentially enormous, and unlike many other strategies, G.A.s work very well at finding increasingly better solutions within a large search space. And, since the TSP has been studied extensively, it has become a natural test problem for evaluating and creating new G.A. implementations. In the following example, for instance, a novel TSP strategy results in an interesting G.A. implementation with new powers and flexibilities. In this example, a G.A. uses a typical encoding strategy that considers all possible city sequences as its search space. The chromosome length is hard-wired by the program, depending on the number of cities. Even though this is a typical approach, it is inefficient: city pairs that aren't even close to each other geographically are part of the same chromosomes to begin with, and thus can be considered as adjacent possibilities in the search space.

The problem with this approach is that the G.A. is attempting to find the final solution all at once. A different approach is to break the problem into subproblems. The geographical area could be broken up into regions, with each region representing a different search space. Once each region has been optimized, another G.A. could optimize the integration of the regions. Coordinating the edges would be an issue, because cities close to the edges of adjacent regions may be best connected with each other rather than confined to their own particular regions. Research on this type of TSP strategy has shown that these problems can be fixed with simple repair mechanisms [15]. Extrapolating downward, this idea could start with very small regions, and the final chromosomes would be constructed by combining very small building blocks. Essentially, this is what people do when solving this problem: solve a small region at a time. Narrow the regions down to a size that includes just a few cities and the whole thing could be implemented as a recursive G.A. [15]. In this case, the final length of the chromosomes depends on the number of building blocks that the G.A. itself discovers when solving the problem, not on the predefined decisions of a programmer. Thus, in an effort to solve the TSP more efficiently, a new G.A. strategy has also been born.

There are a few good websites that graphically demonstrate Genetic Algorithms at work on a Traveling Salesman Problem. One of the better ones is http://www.coyotegulch.com/evojava/Traveller.html. This Java example offers a visual glimpse into the power of using G.A.s by allowing the user to alter various factors such as the number of cities, population size, mutation rate, and crossover strategy.

3.2 Environmental Modeling and Simulation

Given that John Holland initially invented Genetic Algorithms to study natural evolution through a computer model, it shouldn't be surprising that another major application for G.A.s is environmental modeling and simulation. Especially with the growth of research involving co-evolutionary systems, modeling and simulation applications are a rapidly growing use of G.A.s [8]. Traditionally, there have been four major approaches for studying evolution in natural ecosystems [8]:

1) Examine the fossil record.
2) Study existing systems in their natural habitats.
3) Perform laboratory experiments on simple species where evolution happens quickly and can be studied and controlled (e.g. fruit flies).
4) Study evolution at the molecular level by looking at how DNA changes over time.

Each of these approaches has obvious benefits and limitations. For instance, the fossil record offers a glimpse into the past, but it isn't complete and is often hard to interpret. Studying systems in their natural habitats can be useful for understanding those systems, but doing controlled experiments on them can be difficult, if not impossible. Fruit fly studies have been very helpful for understanding the workings of DNA, but the questions that can be asked of fruit fly research are limited. Similarly, research on DNA is often restricted by the current understanding of DNA encodings and interactions [8].

While not a panacea, computer simulation can be a worthwhile supplement to traditional methods of studying natural systems. Computer simulations make it possible to study the evolution of populations over millions of generations relatively quickly. Simulation allows experiments to be tightly controlled. Simulations can be repeated to see how changing the parameters affects the outcome. And finally, simulations based on co-evolutionary G.A.s offer the ability to study the interactions between multiple populations. In co-evolutionary G.A.s, multiple populations develop simultaneously in the same workspace. The selection criteria for one population may depend on the state of another population and vice versa. There may be two populations or many. These populations may compete with each other, depend on each other, or even parasitize each other. Obviously, this added complexity is important for more accurate and flexible modeling [7, 8].

One of the more ambitious implementations of an environmental simulation is a program called Echo. Fittingly, Echo's creator was John Holland and, while it was first implemented in 1975, it was re-implemented using newer technologies in the mid-90s [8]. Echo isn't intended to simulate any particular ecosystem but rather to explore general properties that are common to all ecosystems. Its world consists of resources and several species that interact and even learn from each other. Individuals of each species can interact in three different ways: combat, trade, and mating (Melanie Mitchell points out that these are also the three essential elements of a good marriage! [8]). What each individual decides to do depends on its own internal rules, the outward appearance of the other, and the information learned from previous encounters. Rules regarding resource availability, reproduction, and death are based on natural systems to help make the results meaningful.

Although simplistic, experiments with Echo have revealed insights into the development of food webs, conditions for extinction, and the evolution of symbiotic communities [8]. In addition to the insights offered by Echo, simulations have also been able to shed light on other evolutionary questions. For instance, how does learning affect the evolution of a species? How does sexual selection affect evolution? How do the interactions of different species in a system affect their population densities over time? How do changes in a system affect the availability of resources? These types of insights are valuable for understanding evolutionary challenges in the real world and may help to enable more intelligent decisions about the management of real world problems [8].

3.3 Machine Learning

Machine learning depends on computer programs that are able to construct new knowledge and build upon knowledge already gained. Genetic Algorithms are proving to be very useful for a variety of these types of applications. For instance, G.A.s have been used for robot navigation systems, weather prediction, detecting credit card fraud, data mining, game playing, and even the evolution of new learning systems [4, 9, 10]. The key ingredient for using G.A.s in machine learning is the development of classifier systems. These allow the learning of simple rules that guide the machine in an arbitrary environment [1, 4]. A classifier system gets its name from its ability to learn to classify messages from the environment into general sets. Classifier systems are similar to control systems in many respects. While a control system uses feedback to control or adapt its output for an environment, a classifier system uses feedback to "teach" or "adapt" its classifiers for an environment [13]. There are four main components to a classifier system [1, 7]:

1) A rule system
2) A message system
3) An evaluation system
4) A Genetic Algorithm for reproduction and evolution

Generally, the rules used by classifier systems are encoded in strings. For instance, a simple bit string will have rules that correspond to the positions of various bits. A 1 in bit position 3 may refer to a

rule that directs the computer to respond in a certain way, given situation 3. Although 0s and 1s on their own don't offer much flexibility, the rules could just as easily be represented by 3, 4, 5, or more bits, depending on what's needed by the particular application. Five bits, for instance, would allow the computer to respond to a given input in 32 different ways. A message system converts incoming information from the environment into a code that the computer can use. Generally, the information comes through various kinds of sensors. An evaluation system assigns weights to the messages and corresponding classifier strings. As messages are received from the environment, all of the classifiers that correspond with at least one of the messages compete by submitting a bid in an auction to determine a victorious classifier [7]. The victorious one gets to affect the environment. This approach is one of the most common ways of evaluating classifiers and is commonly referred to as the Michigan approach because it was developed at the University of Michigan. The victorious classifier's effect on its environment is noted as either beneficial or detrimental. With this feedback, a credit system uses reinforcement or punishment to appropriately increase or decrease the strength of the classifier that caused the modifications [13].
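To make the matching, bidding, and credit ideas concrete, here is a heavily simplified sketch. The rule format ('0', '1', and '#' for "don't care"), the bid fraction, and the action names are illustrative assumptions, not details drawn from the cited classifier-system work:

    # A toy classifier: a condition over message bits ('0', '1', or '#' for
    # "don't care"), an action code, and a strength used for bidding.
    class Classifier:
        def __init__(self, condition, action, strength=10.0):
            self.condition = condition
            self.action = action
            self.strength = strength

        def matches(self, message):
            return all(c == '#' or c == m for c, m in zip(self.condition, message))

    def auction(classifiers, message, bid_fraction=0.1):
        # Every matching classifier bids a fraction of its strength; the highest
        # bidder wins the right to act on the environment and pays its bid.
        matching = [c for c in classifiers if c.matches(message)]
        if not matching:
            return None
        winner = max(matching, key=lambda c: bid_fraction * c.strength)
        winner.strength -= bid_fraction * winner.strength
        return winner

    def credit(winner, reward):
        # Reinforce or punish the classifier that acted, based on feedback.
        winner.strength += reward

    rules = [Classifier("1#0", "turn_left"), Classifier("11#", "turn_right")]
    acting = auction(rules, "110")        # both rules match; the stronger one wins
    if acting is not None:
        credit(acting, reward=5.0)        # its action turned out to be beneficial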

Next, a Genetic Algorithm is applied to the classifier population to evolve it to be more effective given the recent developments. This is the key component for learning. Without the ability to evolve the population, the classifier system is nothing more than a simple control mechanism, such as a thermostat [1, 13].

Where is all this going? According to Tom Mitchell, one of the leading experts on machine learning, there are many interesting developments we can expect. In the past decade alone, machine learning has evolved from a field of simple laboratory demonstrations to a field of significant commercial value [10]. In the future, we can expect computers that learn from medical records to discover emerging trends in the spread and treatment of new diseases, houses that learn from experience to optimize energy costs based on the usage patterns of their occupants, personal software assistants that learn the evolving interests of their users, speech recognition systems that learn to recognize subtle differences between various people, facial recognition systems, and perhaps most interesting of all, lifelong learning systems that would enable a computer to apply seemingly unrelated information learned previously to new problems [10]. Granted, this level of application has been envisioned since the beginning of evolutionary computing in the 1950s and is still a ways off. However, with the ongoing advances in computer power and algorithms, it's certain that development of these applications will become faster and faster as computers themselves are used to evolve their own successors.

4. CONCLUSION / FUTURE DIRECTIONS

There are a large variety of problems that Genetic Algorithms are well suited for. And, since the problems facing computer scientists have become increasingly complex while at the same time computers are increasingly powerful and well suited for G.A.s, it's certain that Genetic Algorithms will continue to gain significance. Even though Genetic Algorithms have been in existence since the mid-70s, by no means is their development near an end. On the contrary, Genetic Algorithms are still far from being fully understood. Even so, as Karr and Freeman point out, G.A.s have already become an important part of the engineer's toolbox [2]. Because of that, there is a lot of interesting and exciting research being explored. Here are a few of the most interesting areas with open questions.

4.1 Self Adapting Systems

As previously shown, deciding how to construct a G.A.'s parameters is not always straightforward. String encoding, crossover strategies, selection strategies, and even population sizes all use fixed representations that need to be thoughtfully implemented. Furthermore, there aren't fixed rules to help make that happen. A very appealing idea is to make use of the evolutionary nature of G.A.s and


have the parameters themselves adapt so the G.A. can make better use of them [5, 8]. The TSP example in Section 3.1, for instance, demonstrates one possibility for the beginnings of a G.A. that can evolve its own parameters. In this case, the G.A. discovers for itself the appropriate length of a chromosome. Furthermore, the model suggests that additional regions could be added and the chromosome length would continue to evolve. After all, this is what nature did with DNA. In DNA strands, not only did the encoding evolve, but the length of the strands themselves evolved as well. As David Goldberg puts it, nature did not start with strings of length 5.9 x 10^9 (an estimate of the number of pairs of DNA nucleotides in the human genome) and try to make man. Instead, simple life forms gave way to more complex life forms, with the building blocks learned at earlier times used and reused to good effect along the way [4]. This type of research has yet to be taken on with any significance for all of the G.A. parameters. Certainly, however, there are interesting possibilities here for the future of Genetic Algorithms. As Michalewicz points out, one of the most promising research areas is based on the inclusion of self-adapting mechanisms within the system itself; after all, the power of evolutionary algorithms lies in their adaptiveness [7].

4.2 Parallel Models

Genetic Algorithms are very well suited to parallel processing, but research so far has been limited. Massively parallel G.A.s could allow each individual in a population to have its own processor. This would create numerous possibilities for different crossover and selection strategies, co-evolutionary models, self-adapting systems, and parallel islands where several subpopulations are allowed to evolve simultaneously. Although some work has been done with parallel G.A.s, parallel models are generally not well understood [7, 8].

4.3 Incorporating New Ideas From Genetics

Current G.A. models use only the bare bones of what's known about natural genetics. Concepts such as dominance, sexual differentiation, and inversion likely play important roles in nature and have just started to be studied in G.A. implementations. Even more significant is the recent work with genetic regulation. Genetic regulation is the ability of genes to turn each other off and on so that only the appropriate genes for a given situation are expressed. It is these types of abilities that make natural genes complex and extremely adaptive. Enabling G.A.s with these types of adaptive possibilities will become more and more important as G.A.s are used for increasingly complex problems [8].

4.4 Open-Ended Encodings

In nature, evolution acts not only on the fitnesses of organisms, but on their genetic encoding schemes as well. This gives DNA strands several interesting features for us to learn from. For instance, DNA is open-ended, which allows it to increase in size and complexity. DNA utilizes encapsulation, which protects an important part of an encoding from genetic alterations. And DNA uses an organizational structure that facilitates internal control of these mechanisms. Since Genetic Algorithms are built on the basic processes of DNA, these will become interesting and certainly powerful new additions to the evolution of G.A.s [8].

4.5 Innovation

In his book The Design of Innovation, David Goldberg makes an interesting analogy between Genetic Algorithms and human innovation. Goldberg defines human innovation as the ability to combine good ideas in unobvious ways. Furthermore, the resulting ideas must be recognized and selected for having some value. In many respects, this is how Genetic Algorithms operate: select the best possibilities, combine them, and then evaluate them again [5]. Indeed, a recent article in Scientific American testifies to the innovative potential of computers. In "Evolving Inventions," computer algorithms similar to Genetic Algorithms are reportedly creating new inventions that are worthy of patents. So far, these inventions have largely been restricted to electronic circuits, but the concept of computer innovation is clearly viable [14]. For this type of computational creativity to further develop, more research is needed that targets innovation strategies specifically.


Appendix A: Review Questions


1) Name four of the five important parameters that need to be thoughtfully implemented when working with a Genetic Algorithm.
Answer - any four of the following: selection, mutation, population size, crossover, encoding.

2) Describe a Traveling Salesman Problem. Why is this type of problem well suited for a G.A.?
Answer - There are several important reasons that G.A.s have been widely used for solving TSPs: The search space for a TSP is potentially enormous and, unlike many other strategies, G.A.s work very well at finding increasingly better solutions within a large search space. The TSP has been studied extensively and has become a natural test problem for evaluating the functioning of new G.A. strategies. TSPs are considered to be NP-hard, which means it's possible to guess a solution and then check it; if a computer could be programmed to guess well, a reasonable solution could be found in a reasonable amount of time, and Genetic Algorithms are very well suited for this. Finally, the problem is easy to understand and a variety of encoding strategies are easy to intuit.

3) Describe three other types of applications that G.A.s are well suited for.
Answer - any three of the following: Complex pattern recognition such as financial market analysis, artificial vision, data mining, and protein engineering, in which case Genetic Algorithms are used to search for solutions amongst a huge number of possibilities. Ecosystem simulation problems, games, and machine learning, where G.A.s help systems to evolve. Adaptive applications, such as robots that need to work in changing environments. Artificial intelligence problems, where the algorithms allow the intelligence itself to evolve.

4) Describe the steps that a Genetic Algorithm makes when evolving a solution to a problem.
Answer - Create an initial population, then select, reproduce/recombine, mutate, and select again from the new population. Continue until some termination criterion has been met, such as a good enough answer or a time limit.

5) During the reproduction process, the fittest chromosomes are broken up when combined with others. Thus, the best solution may be lost. Explain a strategy to avoid this.
Answer - Elitism solves this by ensuring that the most fit chromosomes survive into the next generation.

6) Extra credit... Nature chose to mate pairs. Why? Why not mate 3 individuals? Or 4? Or 10? How does this relate to Genetic Algorithms?
Answer - Can you imagine having to find 2 or 3 or 9 mates before you could reproduce? It's hard enough just finding one! Nature chose pairs because it's efficient for reproduction. Having to find multiple partners would slow down reproduction, and thus natural selection, considerably. In their quest to copy nature's successes, computer scientists have adopted the entire evolutionary model into G.A.s. Eventually, that may prove to be unwise, because computer models have different constraints and possibilities than the social models that work in nature. In a computer, it may very well be more efficient to mate 10 fit individuals instead of just 2. (Obviously, this isn't discussed in the paper, nor is it something found in the literature, but it's an obvious thought experiment.)


Appendix B: Internet Resources

Information Gateways for Genetic Algorithm Research
http://www-illigal.ge.uiuc.edu/index.php3 - Illinois Genetic Algorithms Laboratory homepage. Lots of resources.
http://www.aic.nrl.navy.mil/galist/ - The Navy Center for Applied Research in Artificial Intelligence. Extensive.
http://www-xdiv.lanl.gov/XCM/research/genalg/ga.html - Los Alamos Genetic Algorithms Niche homepage.

Worthwhile Java Examples http://www.coyotegulch.com/evojava/Traveller.html - good visual example of the Traveling Salesman Problem. http://www.rennard.org/alife/english/gavintrgb.html - the Genetic Algorithm Viewer demonstrates how genetic algorithms can be used for finding patterns. http://www.aridolan.com/ga/gaa/gaa.html#TspDemo - The GA Playground. A general Genetic Algorithm toolkit implemented in Java, for experimenting with genetic algorithms and handling optimization problems.

C++ Implementations Class Library Download: http://lancet.mit.edu/ga/ - sponsored by MIT with thorough documentation for implementing the library. http://www.mathtools.net/C++/Genetic_algorithms/ - resources for using Genetic Algorithms to solve engineering problems in C++

Online Courses Genetic Algorithm introductory semester course hosted by the University of Illinois Genetic Algorithm Laboratory. Taught by David E. Goldberg, a leading expert on Genetic Algorithms. <http://online.engr.uiuc.edu/webcourses/ge485/index.html?intro.html&2>


Appendix C: Works Cited


1. Beasley, David and Joerg Heitkoetter. The Hitch-Hiker's Guide to Evolutionary Computation. Carnegie Mellon University School of Computer Science. March, 1995. <http://www-2.cs.cmu.edu/Groups/AI/html/faqs/ai/genetic/part1/faq-doc-0.html>.

2. Karr, Charles L. and L. Michael Freeman. Industrial Applications of Genetic Algorithms. Washington D.C.: CRC Press, 1999.

3. Cook, William, David Applegate, Robert Bixby, and Vašek Chvátal. Solving Traveling Salesman Problems Homepage. Princeton University. May, 2002. <http://www.math.princeton.edu/tsp/apps/index.html>.

4. Goldberg, David E. Genetic Algorithms in Search, Optimization and Machine Learning. Boston, MA: Addison-Wesley Longman Publishing Co., Inc., 1989.

5. Goldberg, David E. The Design of Innovation: Lessons From and for Competent Genetic Algorithms. Boston, MA: Kluwer Academic Publishers, 2002.

6. Johnston, Victor. FacePrints Homepage. New Mexico State University. April, 2003. <http://www-psych.nmsu.edu/~vic/faceprints/>.

7. Michalewicz, Zbigniew. Genetic Algorithms + Data Structures = Evolution Programs. Berlin, Germany: Springer-Verlag, 1996.

8. Mitchell, Melanie. An Introduction to Genetic Algorithms. Cambridge, MA: MIT Press, 1997.

9. Mitchell, Tom M. Machine Learning. Boston: McGraw-Hill, 1997.

10. Mitchell, Tom M. "Does Machine Learning Really Work?" AI Magazine. Fall, 1997.

11. Larrañaga, P., C.M.H. Kuijpers, R.H. Murga, I. Inza, and S. Dizdarevic. "Genetic Algorithms for the Travelling Salesman Problem: A Review of Representations and Operators." Artificial Intelligence Review 13.2 (April, 1999): 129-170.

12. Obitko, Marek. An Introduction to Genetic Algorithms with Java Applets Home Page. University of Applied Sciences, Dresden. 1998. <http://cs.felk.cvut.cz/~xobitko/ga/>.

13. Richards, Robert A. Zeroth-Order Shape Optimization Utilizing a Learning Classifier System. Stanford University. 1995. <http://www.stanford.edu/~buc/SPHINcsX/book.htm#RTFToC0>.

14. Streeter, Matthew J., John R. Koza, and Martin A. Keane. "Evolving Inventions." Scientific American. February, 2003: 52-59.

15. Valenzuela, Christine L. and Antonia J. Jones. "Evolutionary Divide and Conquer: A Novel Genetic Approach to the TSP." Evolutionary Computation 1(4): 313-333, 1994.

