
Global search and genetic algorithms

By MARTIN L. SMITH, JOHN A. SCALES, and TERRI L. FISCHER



Amoco Research Center
Tulsa, Oklahoma

A very general and flexible way to formulate inverse problems is as problems in Bayesian inference. This approach permits us to consolidate a number of different types of information about a particular inverse problem into a single calculation. We can combine a priori information about which model features are plausible and which are not, information about the observed data and their errors, and information about numerical and theoretical errors in the calculation that predicts the data that would be produced by a given model. We end up with a scalar-valued function, called the posterior probability, over the space of all models. The values of this function over a set of models express the probability that those models could have led to a particular set of observed data. The goal of the inverse calculation, then, is to find all of the models whose posterior probability is greater than some threshold of acceptability set in advance by the user. (The function that optimization calculations seek to extremize is often called the objective function; the posterior probability is used as an objective function when we try to solve inference problems.)
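To make the combination concrete, here is a small sketch (ours, not from the article) of a posterior-probability objective for a one-parameter toy problem; the forward calculation, noise level, prior width, and acceptance threshold are all invented for illustration:

```python
def log_posterior(m, d_obs, predict, sigma_d=0.1, m0=0.0, sigma_m=1.0):
    """Unnormalized log posterior: log prior + log likelihood.

    m           : trial model (one real parameter in this sketch)
    d_obs       : observed data values
    predict     : forward calculation mapping a model to predicted data
    sigma_d     : assumed size of data + theory errors
    m0, sigma_m : Gaussian a priori information about the model
    """
    log_prior = -0.5 * ((m - m0) / sigma_m) ** 2
    misfit = sum((o - p) ** 2 for o, p in zip(d_obs, predict(m)))
    return log_prior - 0.5 * misfit / sigma_d ** 2

# Invented toy: the data are [m, m**2] and the true model is m = 0.5.
predict = lambda m: [m, m * m]
d_obs = [0.5, 0.25]

# "Acceptable" models are those whose posterior exceeds a user-set threshold.
accepted = [x / 100.0 for x in range(-300, 301)
            if log_posterior(x / 100.0, d_obs, predict) > -1.0]
```

The set `accepted` plays the role of the models above the user's threshold of acceptability.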

Hill climbing. If we know the objective function has only one maximum, then all uphill trails lead to that maximum, and we can find its location (and therefore the best-fitting model) by simply stepping uphill again and again from any convenient starting point. There are many ways of going up the surface of the objective function to a maximum; they generally rely on estimates of the derivatives of the objective function in the neighborhood of the starting point. These techniques are collectively called "hill-climbing" and are usually quite straightforward: follow the gradient vector uphill until the gradient is zero or as close to zero as you are willing to tolerate. If there is only one local maximum, then the resulting model is the best solution to the inverse problem.

Unfortunately, many geophysically interesting inverse problems have objective functions which are highly complicated and possess many local maxima. As a consequence, the result of an iterative, gradient-following, or hill-climbing, calculation may depend strongly on the starting model: the algorithm may very well converge to a local maximum which is not the globally highest peak.

An example of this complexity is shown in Figure 1. The computation from which this was taken is a synthetic model of surface-consistent reflection statics with 55 model parameters (50 statics delays and 5 auxiliary parameters) and is due to Christof Stork. The figure shows the cross-correlation-based "goodness" function as a function of two of the variable model parameters when the remaining 53 were set to their exactly correct values. (The central and largest peak is the global maximum for this problem.) The figure suggests that the final answer we would get from a hill-climbing method will depend strongly on where the climb begins (that is, on our choice of starting model). If we choose a starting point at random from the domain shown in Figure 1, hill-climbing would almost always lead to an inferior solution (we would not climb the highest peak).

Figure 1. The "goodness" function for a model surface-consistent statics estimation involving 55 model statics parameters; the greater the value of this function, the better the model's parameters remove statics shifts in the data set (which is not shown). For this figure, 53 of the 55 parameters were set to the correct value and the remaining two were varied.

Hill-climbing methods are local optimization methods. At each step, the algorithm only examines the immediate vicinity of the current point in order to choose its next step; the algorithm makes no attempt to reconnoiter distant portions of model space. This limitation has two consequences for optimization:

• A local method only explores the single peak upon whose flanks it began
• Local methods can be very efficient in reaching the hill's top

Where local methods are appropriate, they can be made to perform very well indeed. Where local methods are inappropriate, they are likely to efficiently climb the wrong hill.
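The starting-point dependence is easy to demonstrate. The following sketch (illustrative only; the two-peak objective, step rate, and tolerance are invented) implements gradient-following hill climbing and climbs whichever peak it starts on:

```python
from math import exp

def objective(x):
    # Two peaks: a local maximum near x = -1 and the global one near x = 2.
    return exp(-(x + 1) ** 2) + 2 * exp(-(x - 2) ** 2)

def hill_climb(f, x, rate=0.1, tol=1e-4, max_steps=10000):
    """Step uphill along a finite-difference estimate of the gradient."""
    for _ in range(max_steps):
        grad = (f(x + 1e-6) - f(x - 1e-6)) / 2e-6   # local slope estimate
        if abs(grad) < tol:                          # gradient ~ zero: a peak
            break
        x += rate * grad                             # uphill step
    return x

# The answer depends entirely on where the climb begins:
left = hill_climb(objective, -1.5)    # converges on the inferior local peak
right = hill_climb(objective, 1.5)    # converges on the global peak
```

Started on the flanks of the lower peak, the climber never sees the higher one.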

GEOPHYSICS: THE LEADING EDGE OF EXPLORATION, JANUARY 1992


Global optimization. In contrast to local methods, global methods are optimization techniques which make some explicit effort to search more widely in model space. Three such methods are:

• Exhaustive search
• Simulated annealing
• Genetic algorithms

We mention a few properties of the first two of these classes and then devote the remainder of this note to a discussion of the third class: genetic algorithms.

Exhaustive search. This straightforward scheme consists of simply evaluating the objective function for every distinct model and reporting the model that produces the largest value. The only effort involved in formulating such a calculation is deciding what constitutes "distinctness" between models. All global methods require some such decision, and it usually amounts to just converting continuous parameters to quantized ones.

Exhaustive search has the interesting properties that it is very simple to implement, is guaranteed to work, and is almost useless in practice. The problem with exhaustive search in almost every case of interest is that the searcher becomes exhausted long before the model space. For example, in the reflection statics example in Figure 1, the model space consists of 55 parameters, each of which was allowed to range over about 40 values. This fairly simple example has a model space with 40^55 ≈ 10^88 distinct models. If we could evaluate 10^9 models per second (one gigamod per second?), it would take us about 3 × 10^72 years to search all of model space.

Simulated annealing. This class of techniques is based upon a close analogy between optimization and the growth of long-range order (such as large crystals) in a slowly cooling melt. Simulated annealing has been the subject of much interest since it was invented in 1983. In the last two years, major advances have been made in reliability and efficiency.

Simulated annealing is usually implemented as a sort of biased "drunkard's walk" in model space. The drunkard's path through model space is the result of a competition between two tendencies: one is to walk uphill to the nearest peak, and the other is to take a step in a randomly chosen direction. When the drunkard is really plastered, practically all of his steps are chosen at random; when the drunkard is relatively sober, practically all of his steps are uphill. During the course of a simulated annealing calculation, the drunkard's state is varied slowly from plastered to sober. The algorithm functions as a kind of hill-climbing in which the climber is able to change hills from time to time.

Simulated annealing is a kind of quasi-local algorithm: it relies on sampling the space of models by following along a path from nearest neighbor to nearest neighbor. As a result, the computational complexity of the method is dominated by a grid size or step length that is essentially unrelated to the underlying structure of the objective function. This leads to a phenomenon known as "critical slowing down" or "the curse of dimensionality" as the size of the problem grows.

Genetic algorithms. This "new" class of global optimization algorithms, often referred to as GAs, actually predates simulated annealing by more than a decade. The concepts involved were discovered in the late 1960s and early 1970s by John Holland and his students.

Simulated annealing is based upon a close analogy between function optimization and a physical system composed of particles that interact in a relatively simple way but are enormous in number. Genetic algorithms proceed from a loose analogy between function optimization and a biological system composed of organisms that interact in a relatively complex way and are comparatively few in number. A genetic algorithm tries to find an optimal answer by evolving a population of trial answers in a way that mimics biological evolution. If simulated annealing "cooks" an answer, then genetic algorithms "breed" one.

Inside a GA. To be a little more specific, suppose that we have a scalar-valued function which is defined over some domain. Since our mental model is that of a geophysical inverse problem, we refer to points in the domain as "models" and the value of the function at some model as that model's "fitness." Our goal is to find a model, or possibly a set of several models, that maximizes the fitness function: we want to find the best ("fittest") models in the space of all possible models. (In practice, we'll often settle for finding some "pretty fit" models.)

Figure 2. A schematic display of a model population in an application where a model is specified by a single real number in the range [0,1]. The population consists of 15 randomly chosen points from model space. Also shown is the objective (or "fitness") function as well as the value achieved by each model in the population.

Figure 3. Each model in the current population gets a portion of a roulette wheel according to its rank in a population sorted by fitness. We select parents by spinning the wheel once for each parent (but we require the two parents to be distinct). The proportions shown here are not intended to directly reflect the situation in Figure 2.

The essence of a GA is that it's a set of operations which we apply to a population of models that lead us to a new population in such a way that the members of the new population will have a greater expected average fitness than their predecessors.



Suppose that we are executing a genetic algorithm that has a constant population size of 15 (there are many variations and this is a plausible one). Then at some point in the calculation, we have a set of 15 models and we have computed the fitness for each of these. Figure 2 shows a population with 15 members for an application in which a model consists of a single real number, x, in the interval 0 < x < 1. Also shown is the fitness of each model, as a vertical line from the abscissa of the curve to the fitness function. These models were selected at random (which is how GAs are usually started); Figure 5, to which we shall turn later, shows the population when the search has progressed farther.

We then do our "GA-thing" on the population and are led to a new population of models (many of which may be identical to members of the predecessor population). The transition processes which tell us how to evolve the new population from the old one are designed so that (among other things) the expected average fitness of the new population is at least as large as the average fitness of the old one. We simply apply these processes repetitively until we have found a satisfactory answer (or we have given up).

The details of these transition processes (which are the heart of GAs) vary widely, but all share a basic tripartite structure. The elements of that structure are:

• Selection: designating which members of the current population will be allowed an opportunity to pass on their characteristics to the next generation
• Recombination: constructing new child models by combining model features copied from the set of selected parent models
• Mutation: randomly perturbing the model parameters of an occasional child model (for the purpose of adding diversity to the population)

The field abounds with a lush and bewildering variety of genetic algorithms. We must content ourselves here with describing the particular algorithm we actually used (which is of a fairly common sort) in the example to be described.

Selection. We first select two parent models from the current population. The parent models will be allowed to pass on some of their characteristics to a child model in the next generation. We try to select the parents in a way that favors better models (in the sense of fitness) over poorer models, but which still affords all models a reasonable opportunity to reproduce; we will return to this point later. In our algorithm, the selected parents remain in the population (that is, reproduction is not fatal).

A commonly used form of selection is illustrated in Figure 3. We assign a slice of a roulette wheel to each model. The size of each model's slice is somehow related to that model's relative fitness. In the example we discuss later, "wheel space" is allocated to each member of the population as a function solely of that member's position in the population's rank order; that is, the position in the population once it has been sorted in order of fitness. We use this scheme because it is independent of many decisions about objective function scaling, but many other choices are in use.

Recombination. The next step is the creation of a "child" model from the selected parents. This step is in some sense the inner magic of GAs because it is mainly here that searching extends into new regions of model space. Recombination, often called crossover, constructs a child model by splicing together pieces copied from two parent models. The notion of splicing bits of models together makes sense because we explicitly regard a model as composed of a list of values. This list is called a "chromosome."

The details of this representation are extremely important to producing an effective algorithm, and the selection of an efficient representation is a major part of the art of GAs. In the example shown in Figure 2, where the "real" model was a single, real-valued number, a good representation might be made to use a five-bit binary number. In this case, our chromosome would consist of five values, each of which was 0 or 1.

The simplest type of recombination, called one-point crossover, is illustrated in Figure 4. Here two parent models, namely 10000 and 11111, are recombined to form two child models, 11000 and 10111. The recombination process shown here simply requires us to pick a random intermediate point in the chromosomes and construct each child by selecting the first portion of its chromosome from one parent and the second portion from the other parent.

Figure 4. Each model is coded as a binary string of length 5. The two upper models (10000 and 11111) are the parents (and are already present in the population) and the two lower models (11000 and 10111) are the offspring produced by recombination. After a crossover point is chosen randomly from the chromosome's interior, each child is constructed by taking the first portion of its chromosome from one parent and the second portion from the other.

In our algorithm, one of the offspring is returned to the population to compete. At that moment, since the parents are still present, there is an extra member in the population. We then discard the least fit of these. In this scheme, a child must be at least as fit as the least fit preexisting member in order to survive.

The five-bit chromosome of binary digits can represent 2^5 = 32 different states. There is no intrinsic reason these had to be equally spaced, or even ordered, points across the model domain. We are free to use mappings that are more natural to a particular problem, such as nonequally spaced points or different binary representation schemes such as (as is often done) Gray coding. Further, individual chromosome values do not have to be binary; many applications use larger domains (as does one we discuss later). The selection of useful representations is one of the most challenging and central issues in making GAs useful optimizers.

Note also that recombination results in a search in model space in which newly sampled regions need not be adjacent to previously sampled regions. (Simulated annealing, on the other hand, proceeds by a series of neighbor-to-neighbor steps.) This nonlinear, nonproximate search process results in a surprisingly efficient reconnaissance of model space.

Mutation. Finally, most GAs also incorporate a low-probability, randomizing process called mutation. Mutation acts to randomly perturb a randomly chosen element in an occasional (randomly selected) child. In the absence of mutation, no child could ever acquire a chromosome gene value which was not already present in the population.

Search. The result of these three operations (selection, crossover, and mutation) is a new population of models the same size as the old one. Figure 5 represents an intermediate stage in the hypothetical evolution of a GA for the optimization problem illustrated in Figure 2. We can see that the members of the population have begun to cluster around the maxima.

Figure 5. A stage in the hypothetical evolution of a GA maximization calculation associated with the model shown in Figure 2. The elements of the population are clustering near peaks of the objective function.

The selection process is crucial to a GA's effectiveness because it determines the balance between exploration and exploitation. A selection algorithm which gives too little weight to fitness will not lead to convergence near maxima; with no weight at all, the process simply becomes a kind of unbiased random search. The other extreme, an algorithm that overemphasizes fitness, tends to converge too quickly around the one or two fittest initial models; this extreme does not search long enough to get the lay of the land.

There are many other issues involved in selecting a GA. Do we allow duplicate models in the population? How do we choose an initial population? How do we know when to stop? We do not have the space, the experience, or the insight to discuss any of these comprehensively. It is our experience, however, that these design issues are still open questions, and anyone who would apply these algorithms must be prepared to spend some effort in experimenting with the algorithm.

Schemata. In Adaptation in Natural and Artificial Systems (University of Michigan Press, 1975), Holland provided a deep theoretical result that sheds light on the nature of a GA's search. Holland's insight derives from considering the effects of selection, crossover, and mutation on the probability of occurrence of schemata in the population.

A schema is a regular expression in model space. A schema is constructed by replacing zero or more of the parameters in a model with "don't care" symbols (see Figure 6). A schema always matches one or more actual models.

Figure 6. A schema represents an equivalence class of models. The asterisk is a wild card or "don't care" symbol. It can take on any value in the alphabet (in this case 0 or 1). The four models shown (10001, 10011, 10101, and 10111) are all associated with the schema 10**1.

Schemata play the central role in Holland's analysis of the inner workings of GAs because schemata, unlike the models themselves, have significant and calculable chances of surviving reproduction even when the child model is different from both of its parents. Holland analyzed a simple (but representative) form of GA in terms of the change in the number of instances of a particular schema in the population. Define the fitness of a schema to be the average fitness of all the models that are represented by that schema. Holland showed that the expected number of instances of a particular schema will grow more or less exponentially with an exponent that reflects the ratio of the fitness of that schema to the average fitness of all schemata.

Holland's insight suggests that GAs are searching for globally distributed information about the behavior of the function we seek to optimize, but that the form this information takes is subtle. It also points out that the "state" of a GA is carried by the entire population of models, as opposed to simulated annealing in which the state is carried by one point. Thus, in some sense, a GA has greater potential because it has a larger and more complex form of "memory" than simulated annealing.

Example: array optimization. In Oceanographic experiment design by simulated annealing (Journal of Physical Oceanography, September 1990), Norman Barth and Carl Wunsch proposed using optimization techniques to aid in designing acoustic tomography experiments. They applied simulated annealing to a simple yet interesting and illustrative design example. We'll apply a GA to a reduced version of their calculation.

Suppose that we wish to design a traveltime tomography experiment that is, in a sense that we describe shortly, maximally efficient. To be more specific, suppose that the model we wish to determine consists of a 3 × 3 array of homogeneous blocks and we wish to determine the acoustic slowness of each block. Suppose further that we are given a set of potential observations; that is, a set of potential raypaths from which we choose the (smaller) set of actual raypaths along which we may observe traveltimes.

Figure 7. The set of 216 rays made available to a fictitious tomographic experiment. The model is a 3 × 3 block model. Designing the experiment requires the selection of nine rays from this set such that the most poorly resolved model parameter is as well resolved as possible. Not all rays are separately resolved in the figure.
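The scoring used for such a design, described below, is the smallest singular value of the candidate ray set's sensitivity matrix. A sketch of it (ours, not the authors' code; the tiny 3-parameter matrices are invented stand-ins for the real 9 × 9 case, and NumPy is assumed to be available):

```python
import numpy as np

def design_goodness(S):
    """Goodness of a candidate experiment: the smallest singular value of
    its sensitivity matrix S (traveltimes T = S u for slownesses u)."""
    return np.linalg.svd(S, compute_uv=False).min()

# Invented stand-in sensitivity matrix: each row holds the path length of
# one ray in each block of the model.
good = np.array([[1.0, 0.0, 0.5],
                 [0.0, 1.0, 0.5],
                 [0.5, 0.5, 1.0]])

# Duplicating a ray repeats a row, so S is singular and the goodness is 0.
degenerate = np.array([[1.0, 0.0, 0.5],
                       [1.0, 0.0, 0.5],
                       [0.5, 0.5, 1.0]])
```

The zero goodness of duplicated rays is exactly the degeneracy the authors note having had to handle in their crossover operation.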




Finally, suppose that we can ignore ray-bending so that all rays are straight lines. We will be allowed to make nine actual observations, the smallest number that could possibly resolve all nine model parameters. Our problem is to select the nine actual rays from the set of potential rays such that the wave speeds in the model's blocks are determined as accurately as possible.

We follow Barth and Wunsch in defining "as accurately as possible" in terms of the singular values of the system's sensitivity matrix. Suppose that we have chosen a set of actual rays. Then we can compute a 9 × 9 matrix, S, which tells us how to compute the traveltimes for this set of rays, T, given the values of the model's slowness parameters, u: T = S u.

The singular values of S are a measure of the dependence of errors in the model's parameters, u, upon errors in the traveltime data, T. The larger the singular values of S are, the smaller the errors in u. Our measure of goodness will be the value of the smallest singular value of S. (Since S is a function of the actual rays being observed, then obviously the singular values are also a function of the actual rays being observed.)

We specify the calculation by specifying the size of the model (3 × 3 in this case) and a set of potential rays from which the final rays must be selected. We chose a set of 216 potential rays which are shown in Figure 7. (The number of geometrically feasible rays is larger than this; we reject rays that, for example, have both endpoints on the same side, or give rise to the same matrix coefficients.) Each trial solution of our optimization problem is a set of nine actual rays chosen from the set of 216 potential rays. For those nine rays, we compute a sensitivity matrix, S, and its singular value decomposition. The goodness of the chosen set of nine rays is the value of the smallest singular value of S.

The function we are trying to optimize is very complicated and probably impervious to assaults by local optimization techniques. Think of each trial solution as being a point in a nine-dimensional solution space; each coordinate is integer-valued and can take on any value in the range 1-216 (representing the selection of one ray from the set of potential rays). If we ponder a bit on whether or not the quantity we are trying to optimize is unimodal (has only one extremum) or not, we will be forced to the conclusion that in fact this function has no natural shape. Because we are free to map the integer values 1-216 to the set of basis rays in any fashion we want, and because there is no "natural" mapping, we can make the shape of the function into virtually anything we wish. More to the point, we have no idea how to choose a mapping which will make this calculation amenable to local optimization.

We fed this problem to a fairly straightforward GA and allowed it to chug along for 2.5 × 10^5 trials. It came up with the ray set shown in Figure 8. We don't know if there is a better solution but this seems to be a pretty good one. Its singular values are shown in Figure 9.

Figure 8. Model calculated by the GA which maximizes the smallest eigenvalue of the forward operator.

Figure 9. Singular value spectrum for the ray set shown in the previous figure. There are nine singular values and the smallest is 0.37.

The GA found this result after examining 3.5 × 10^5 ray sets. This sounds like a lot of work (and it is); but the model space contained 8 × 10^20 models, so from that point of view, this calculation was astoundingly efficient.

Exeunt. We don't want to leave the impression that GAs are a magic combination of a black box and silver bullet; they're not. Casting a particular problem into a GA requires making crucial decisions about how to represent the problem domain and how to implement the basic reproductive operations. In our example here, we had to decide how to deal with the permutational degeneracy of our goodness function: all of the permutations of a particular set of actual rays have the same goodness. We also had to consider whether or not the GA crossover operation would be allowed to give rise to a child structure in which some specific ray appeared more than once: the smallest singular value for any set of actual rays with duplications is exactly zero. The point here is not how we actually dealt with these issues, but rather that the issues themselves are extremely important to the efficiency of GAs and can easily make the difference between success and failure.

On the other hand, we believe that global optimization techniques in general, and GAs in particular, will see increasing use. This popularity will reflect the increasing power of computing hardware (which makes large numerical efforts ever more practical), the increasing complexity of geophysical inverse calculations (particularly in the application of a priori information), and the fascinating possibilities raised by the existence of global optimization algorithms such as these.

Suggestions for further reading. An extensive and enjoyable discussion of the formulation of inverse problems as problems in Bayesian inference is contained in Inverse Problem Theory by Albert Tarantola (Elsevier, 1987). The invention of simulated annealing is the article Optimization by simulated annealing by S. Kirkpatrick et al. (Science, 1983). Major advances in this technique have been made in the last two years. They are summarized in the paper Global search methods for multimodal inverse problems by Scales et al. (Journal of Computational Physics, 1991).

Genetic algorithms is a very active field of research. Useful texts are Handbook of Genetic Algorithms (Van Nostrand Reinhold, 1991) and proceedings volumes from international conferences on this subject held in 1987, 1989, and 1991. A good introductory text is Genetic Algorithms in Search, Optimization and Machine Learning by David Goldberg (Addison-Wesley, 1989).

