
Journal of Computer-Aided Molecular Design, 11 (1997) 209-228. © 1997 Kluwer Academic Publishers. Printed in The Netherlands.


J CAMD ]95

A comparison of heuristic search algorithms for molecular docking


David R. Westhead*, David E. Clark** and Christopher W. Murray***
Proteus Molecular Design Ltd., Proteus House, Lyme Green Business Park, Macclesfield, Cheshire SK11 0JL, U.K.

Keywords: Ligand-protein docking; Molecular recognition; Evolutionary algorithms; Simulated annealing; Tabu search

Summary
This paper describes the implementation and comparison of four heuristic search algorithms (genetic algorithm, evolutionary programming, simulated annealing and tabu search) and a random search procedure for flexible molecular docking. To our knowledge, this is the first application of the tabu search algorithm in this area. The algorithms are compared using a recently described fast molecular recognition potential function and a diverse set of five protein-ligand systems. Statistical analysis of the results indicates that overall the genetic algorithm performs best in terms of the median energy of the solutions located. However, tabu search shows a better performance in terms of locating solutions close to the crystallographic ligand conformation. These results suggest that a hybrid search algorithm may give superior results to any of the algorithms alone.

*Present address: EMBL Outstation, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, U.K.
**Present address: Dagenham Research Centre, Rhône-Poulenc Rorer Ltd., Rainham Road South, Dagenham, Essex RM10 7XS, U.K.
***To whom correspondence should be addressed.
Abbreviations: GA, genetic algorithm; EP, evolutionary programming; SA, simulated annealing; TS, tabu search; RS, random search; COM, centre of mass; rms, root-mean-square; ICM, internal coordinate modelling; CPU, central processing unit; DHFR, dihydrofolate reductase; MTX, methotrexate; DANA, 2,3-dehydro-2-deoxy-N-acetylneuraminic acid; NAPAP, Nα-(2-naphthyl-sulphonyl-glycyl)-DL-p-amidinophenylalanyl-piperidine; FIFO, first-in, first-out.

Introduction

The safe and effective action of a pharmaceutical agent within the human body depends upon the selective recognition of the drug molecule by the appropriate target receptor. This molecular recognition is governed by the interplay of a number of factors such as steric, electrostatic and hydrophobic interactions. The sum of the free energy of these interactions is termed the binding affinity of the molecule for the receptor and is governed, in part, by the geometry of the ligand-receptor complex. The early stages of drug discovery, usually termed lead generation, could be significantly expedited if there existed a method whereby the geometry of the ligand-receptor complex and the binding affinity of a given molecule for a receptor of known structure could be reliably estimated without resorting to the experimental techniques of synthesis, co-crystallisation and assay.

In computer-aided molecular design (CAMD), the search for methods for the ab initio prediction by computer of the binding geometry and binding affinity of two molecules is termed the 'docking problem'. Because of its potential application in CAMD, the docking problem has received much attention over the years and the progress made has been reviewed in a number of recent articles [1-6]. The earliest docking programs considered only the translational and orientational degrees of freedom of the ligand with respect to the receptor, e.g. Ref. 7. However, more recently, with advances in the power of the available computer hardware and increasingly sophisticated software algorithms, it has become possible to take into account routinely the internal conformational flexibility of the ligand ('flexible' docking) [8-23]. Limited conformational flexibility of the receptor is also being permitted in some approaches [...]. Clearly, as the number of degrees of freedom being explored in the docking problem increases, the size of the search space rapidly becomes enormous; Gehlhaar et al. [14] estimate that, in one of their examples, the search space comprised at least 10^[...] solutions. Faced with such a situation, it is obvious that

fast and effective search algorithms are of crucial importance if the docking problem is to be tackled successfully.

The purpose of this paper is to compare the performance of four heuristic search algorithms when applied to molecular docking. These four are simulated annealing (SA), a genetic algorithm (GA), evolutionary programming (EP) and tabu search (TS). While SA, GA and EP have each been applied to the docking problem in the existing literature, to our knowledge there has been no previous attempt to use the TS algorithm. This work thus serves to introduce this algorithm to the field and to compare it with algorithms already in use. The algorithms are compared, over a number of test cases, according to their ability to locate optima of an objective energy function designed for use in the docking problem.

In developing a successful docking method, considerations regarding the energy function are at least as important as those pertaining to the search algorithm. The minimum value(s) of this function should correspond to the preferred binding mode(s) of the ligand, and the ultimate goal should be a correlation between the values of the function and the binding affinity of the ligand. For this work, since our aim is the comparison of search algorithms, we have studied a single energy function. This was chosen to be the function recently described by Gehlhaar et al. [14,15], because it is very fast to evaluate computationally, and because it has been demonstrated to be successful in a number of docking applications. The choice to compare algorithms using a single energy function simplifies our study considerably and, since potential functions used in docking tend to share similar characteristics, there are good reasons to believe that conclusions drawn about algorithmic performance using this energy function will probably apply when the algorithms are used with different functions. A study comparing various docking energy functions in terms of their ability to predict binding mode and affinity would be of great interest and is planned for the future.

Comparison of heuristic algorithms is difficult for two main reasons. First, each category of algorithm covers many different possibilities for implementation, each of which will perform differently for a given optimisation problem. For instance, within the GA category it is possible to implement a one-point crossover, a two-point crossover, or something more complicated, and there are many other possible sources of variation. Second, the performance of each algorithm depends on a set of adjustable operational parameters, and the quality of the results depends on the extent to which these are optimal for a given test case.

Our approach to the first of these problems has simply been to seek to implement all the algorithms without bias and with no preconceived idea as to which algorithm should be determined the 'best'. We have also sought to implement each of the algorithms in a fairly 'standard' manner. For instance, maintaining the GA example, our algorithm employs just a simple one-point crossover and random mutations. It is possible, indeed likely, that a more sophisticated GA would perform better than our simple one. However, the more sophisticated the algorithms, the more difficult the comparison, since sophistication brings with it more adjustable parameters to optimise and a difficult choice of which of the many possibilities for increased sophistication should be chosen. It is our belief that the best algorithm for the docking problem is probably a hybrid of various types of algorithm. It is hoped that by comparing fairly simple implementations of each algorithm, our study will point to desirable algorithmic characteristics for use in hybrid algorithms.

The second problem, that of optimising operational parameters (the so-called 'meta-optimisation' problem), is also not straightforward. It is almost impossible to guarantee that any given set of parameters chosen is truly 'optimal', particularly if the parameters are coupled in some way. In this work, a set of parameters for each algorithm was sought which performed well on all the test cases, and extensive 'tuning' experiments were carried out to this end. Our experience has shown that it is important to tune parameters over a number of test cases because it is possible to over-optimise the performance on one test case at the expense of the others if only one example is used. Clearly, it is a very desirable characteristic of a docking algorithm that parameters be transferable between test cases, and algorithms for which this is not true will be penalised in this type of study.

TABLE 1
TEST CASES USED IN THE PRESENT STUDY

Enzyme                          Ligand         No. of rotatable bonds(a)   PDB code   Reference
Dihydrofolate reductase         Methotrexate   7                           3DFR       57
Influenza virus neuraminidase   DANA           4                           1NSD       58
HIV-1 protease                  XK263          8                           1HVR       [...]
Thrombin                        NAPAP          6                           1ETS       [...]
Thrombin                        Argatroban     7                           1ETR       [...]

(a) The structures of the ligands and the definition of their rotatable bonds [...].

Our comparison of algorithms is carried out over five test cases specified in Table 1. This number was judged to be sufficient to allow general conclusions about algorithmic performance to be drawn, while retaining the possibility of a detailed discussion of each test case. The

test cases were chosen to be relevant to problems in CAMD, to be of varying difficulty as docking problems, and to reflect different aspects of molecular recognition. Some of the problems emphasise the formation of hydrogen bonds in molecular recognition, others emphasise steric fit to the active site, and others contain a more equal mixture of the two.

In addition to the comparison of algorithms, a method is described whereby the docking problem can be set up in a very general way, including rigid body and rotatable bond degrees of freedom in the ligand, rotatable bonds in the active site of the receptor, and the variable presence or absence of crystallographic water molecules. In this paper, however, only the ligand degrees of freedom are used, the others being left for a later publication. It is also illustrated how, within this method, each of the search algorithms can be implemented with minimal effort, and with much important code in common. The resulting software will be referred to as PRO_LEADS (ligand evaluation by automatic docking studies) and forms part of our Prometheus system for molecular design and simulation.

Methods

Problem representation

Degrees of freedom
In order for solution of the docking problem, as stated in the Introduction section, to be feasible using currently available methods and computational resources, it is necessary to make certain simplifications. First, neither receptor nor ligand can be considered to be fully flexible. The receptor must be considered rigid except perhaps for some limited flexibility in the active site. Since the ligand is typically much smaller than the receptor, more flexibility can be considered, although most methods only allow ligand flexibility through rotations about rotatable bonds.
Second, an active site for the receptor must be defined in order to restrict the region of space in which solutions are sought. Only with these simplifications is the solution space of the problem sufficiently small that a good heuristic algorithm could be expected to find optimal solutions with a reasonable success rate.

The docking methods implemented in PRO_LEADS allow the following degrees of freedom:
(1) Ligand translation - the ligand is free to move within a user-specified box defining the active site; if the centroid of the ligand moves outside the box, a user-specified penalty is added to the energy value.
(2) Ligand orientation - the ligand has full orientational freedom.
(3) Ligand flexibility - the ligand is considered flexible through a list of rotatable bonds. The rotatable bonds can be specified by the user or assigned automatically.
(4) Receptor active site residues - these can be considered flexible through their rotatable bonds. The user can choose the degree of flexibility. The available options are all rotatable bonds in the side chain or just the terminal rotatable bond.
(5) Crystallographic water molecules - if water molecules are present in the X-ray structure of the receptor, these can be defined to be 'variable'. In this case the docking algorithms search for solutions in which the water molecules may be either present or absent.

In this article we explain how our approach deals with all the above degrees of freedom. For the results presented, only the first three degrees of freedom on the above list were considered variable, investigation of the others being left until a later publication.

Docking variables
The ICM tree [24] provides a complete internal coordinate description of an assembly of molecules, their internal conformations and relative positions and orientations. This makes it an ideal basis for the choice of the variables for use in docking, and thus a very similar scheme has been implemented in this work. In PRO_LEADS, the docking variables representing the relative position of ligand and receptor and their internal conformations are a subset of the variables from the internal coordinate tree. This is illustrated in Fig. 1. The receptor is considered fixed in space, and the relative positions of the receptor and ligand are governed by the rigid body variables attached to the ligand. With the notation of Fig. 1 these are {B1,V1,T1,V2,T2,T3}, and can be interpreted as bond lengths, and valence and torsion angles, with respect to the fixed triplet of virtual atoms at the root of the tree. Variables representing flexibility of ligand and receptor are torsion angles taken from the internal coordinate tree.

The docking variables, as manipulated by docking algorithms, are stored as a string of real numbers. The first six are the rigid body variables, and after these follow the variables for ligand and receptor flexibility. At the end of the variable string are stored variables controlling the presence or absence of variable crystallographic waters using a method similar to that suggested independently by Read et al. [25].

Most of the energy functions for use in docking require Cartesian coordinates for the ligand and receptor molecules. It is necessary therefore to carry out interconversions between the docking variable string and Cartesian coordinates. This is accomplished by a single software module of PRO_LEADS which is called the coding module. The interface of this module to the outside world comprises three routines: Initialcode, Code and Decode. Initialcode is called once for each ligand/receptor system; this sets up an initial variable string consistent with Cartesian coordinates of the molecules and user options related to flexibility. It also sets up private module data

Fig. 1. Derivation of the docking variable string from the internal coordinate tree. B denotes a bond length, V a valence angle and T a torsion angle. The string starts with B1, V1, T1, V2, T2, T3, followed by T4 and the other variables; variables denoting the presence or absence of variable crystallographic waters are appended.

containing the mapping [...] for ligand and receptor. After the call to Initialcode, Code and Decode convert ligand and receptor Cartesian coordinates to docking variables and vice versa. [...]

The energy function chosen for this work is that due to Gehlhaar et al. [14,15], which is specifically designed for fast docking applications. The function comprises four [...]. Each term is a pairwise sum over ligand and protein heavy atoms, taking the form shown in Fig. 2 [...]. The type of each pairwise interaction depends upon the atom typing [...] (Table 2) [...] bounding box [...] a proportion of the radius of the encompassing molecule.

In general, it has been found that adding half the radius creates a grid of sufficient size to encompass the movement of the molecule during docking while remaining within reasonable memory limits. In all experiments reported in this paper, a grid resolution of 0.2 Å was used.

An additional option available with this energy function is to scale down repulsive components in the interaction energy by a user-specified factor; many docking algorithms make use of this option in order to facilitate searching (by allowing the ligand to penetrate the receptor) in the early stages of a docking run, but it was not used in any of the examples quoted in this paper. This was because, for two of the algorithms (TS and RS), it would have been difficult to implement the scaling in a meaningful way. However, it is likely that, for certain cases, such scaling may prove beneficial to the docking process. Recently, Verkhivker et al. [26] have suggested that better searching characteristics are obtained when the F value of the energy function is set equal to 4.0; however, all the results in this paper were generated using the function in its original formulation.

The internal energy of the ligand is the sum of torsional and internal clash terms. The latter term is a penalty of 10 000 when the distance between two non-bonded atoms becomes less than 2.35 Å. The former term has the form

$E = A(1 - \cos(n\theta - \theta_0))$    (1)

where $\theta$ is the torsion angle; for sp3-sp3 bonds A = 3.0, n = 3, $\theta_0 = \pi$, and for sp2-sp3 bonds A = 1.5, n = 6, $\theta_0 = 0$. Other types of bonds cannot be considered rotatable.

The final term in the energy function is a penalty for leaving the box defining the active site. Two options are available: the first attaches the penalty to solutions with the centroid of the ligand outside the box, and the second attaches the penalty to each ligand atom which falls outside the box. In this work, the former option has been preferred because it constrains position while allowing full orientational freedom.

Fig. 2. The non-bonded interaction function of the docking energy function of Gehlhaar et al. [14].

TABLE 2
PAIRWISE INTERACTIONS IN THE DOCKING POTENTIAL FUNCTION

             Donor   Acceptor   Both   Non-polar
Donor        S       HB         HB     S
Acceptor     HB      S          HB     S
Both         HB      HB         HB     S
Non-polar    S       S          S      S

HB: hydrogen bond interaction; S: steric interaction. Protein atom types are listed horizontally and ligand atom types vertically.

Search algorithms
The different search algorithms used are briefly described below and Figs 3-6 give the schemes used for SA, EP, TS and GA, respectively. In the case of tabu search, more description is given since this is a novel approach to the docking problem. For each search algorithm, the parameters have been chosen so that the total number of energy evaluations per docking run was approximately 200 000.

Initial conformations
For the purposes of the test cases used in this paper, the starting position and orientation of the ligand were randomised within the box defining the active site. All torsion angle docking variables were also randomised. Some algorithms require only one such starting position (SA and TS), others (the GA and EP) require a population of them. The software also provides the option of a user-specified starting conformation.

Simulated annealing
The simulated annealing algorithm [27,28] follows the scheme illustrated in Fig. 3. Within our implementation, the perturbations required to generate new solutions are random numbers drawn from either the Gaussian or Cauchy distribution, at the choice of the user. The use of the Cauchy distribution was motivated by the fast simulated annealing algorithm of Szu and Hartley [29]. Both options were tried for a number of test cases and, since no particular advantage was found for the Cauchy distribution, the Gaussian distribution was used for the examples cited in this paper. Perturbations to angular variables were forced to lie in an appropriate domain ($(-\pi, \pi]$ for torsion angles and $[0, \pi]$ for valence angles) by translating any out-of-domain values through multiples of the domain size. The size of the perturbations generated depends on the width of the generation distribution (standard deviation for the Gaussian, semi-interquartile range for the Cauchy). Within the algorithm this is set as a proportion of the size of the allowed domain for each variable (e.g. a proportion of $2\pi$ for torsion angles in the ligand), and this proportion is the same for every variable. For the variables B1, T1 and

V1, the width was set with respect to a maximum allowed translation equal to the length of the longest side of the bounding box, rather than the size of the allowed domain, using scaling for the angular variables as described in the Docking variables section. The initial width of the generation distribution is a user input (a value of 0.05 was used for the examples cited in this paper). An option exists to scale this width down linearly with temperature, resulting in smaller perturbations being used at lower temperatures; however, this was found to provide no significant advantage, and so was not used.

The algorithm is driven by a user-specified cooling schedule comprising a set of monotonically decreasing temperatures and a number of trials at each temperature. The effectiveness of the algorithm in optimising the system energy is strongly dependent on the cooling schedule. The examples cited in this paper all used the same cooling schedule. This was {$T_1 = 40966$; $T_i = 0.8801\,T_{i-1}$, $i = 2, \ldots, 20$}, with 10 000 trials at each temperature. The value of the Boltzmann constant used was 0.001986 kcal/(mol K). The temperatures could only be considered 'real' if the unit of the energy function were kcal/mol, which is not the case for the energy function used in this paper.

1. Generate starting point, either the same as the input solution or by randomising the docking variables
2. Begin loop over the number of temperatures
   (a) Set current solution to the best from previous temperature
   (b) Scale down width of generation distribution according to temperature if option is chosen
   (c) Begin loop over the number of trials
       (i) Generate new solution by perturbing current solution
       (ii) Evaluate energy of new solution
       (iii) Decide whether or not to accept the new solution using Metropolis criterion
       (iv) If accepted, set current solution to new solution
       (v) Update best solution at this temperature if necessary
   (d) Update best solution overall if necessary
3. Output best solution found
Fig. 3. Simulated annealing algorithm.

Evolutionary programming
The EP algorithm follows the framework given in Fig. 4. Each individual in the population is represented by a pair of real-valued vectors. One vector stores the docking variables described in the earlier section and the other holds parameters guiding self-adaptive mutation (vide infra). In EP, offspring are created from parents by mutation. Traditionally, a mutation operator based upon Gaussian random numbers has been used, but recent work by Yao and Liu [31] suggests that more rapid convergence can be obtained by using Cauchy random numbers instead. Furthermore, good results have been found using self-adaptive mutation parameters [32], which allows the mutation to mould itself to the search as it proceeds [30]. In PRO_LEADS, self-adaptive mutation is always used, and Cauchy random numbers have been investigated as an alternative to the traditional Gaussian operator.

1. Create an initial population of solutions
2. Evaluate the fitness of each population member using the energy function
3. Create offspring from all parents without selection using mutation operator
4. Evaluate fitness of offspring
5. Loop over all offspring for tournament selection
   (a) Randomly choose N other offspring as opponents for tournament
   (b) Score a win each time the chosen offspring is more fit than its opponent
   (c) Rank this offspring by number of wins in its tournament
6. Select top-ranking solutions as new population
7. If user-defined number of generations is exceeded, stop. Else go to 3
Fig. 4. Evolutionary programming algorithm.

Following Saravanan et al. [32], self-adaptive mutation of a parent $(x, \sigma)$ to an offspring $(x', \sigma')$ can be formulated thus:

$\sigma_i' = \sigma_i \exp(\tau' N(0,1) + \tau N_i(0,1))$    (2)

and

$x_i' = x_i + \sigma_i' N_i(0,1)$    (3)

where $x$ is the vector of docking variables and $\sigma$ is the associated vector of mutation parameters. $N(0,1)$ is a normally distributed random number with a mean of 0 and a standard deviation of 1. The parameters $\tau$ and $\tau'$ are commonly set to $(\sqrt{2\sqrt{n}})^{-1}$ and $(\sqrt{2n})^{-1}$, respectively, where $n$ is the length of vector $x$. For fast evolutionary programming, Yao and Liu [31] suggest that Eq. 3 be replaced by

$x_i' = x_i + \sigma_i' \delta_i$    (4)

where $\delta_i$ is a Cauchy random number variable. Following these authors, the Gaussian perturbation of the mutation parameters has been maintained. It may be that there are better schemes for use with Cauchy mutation; this is being investigated [31].

For all the EP experiments described in this paper, a population size of 2000 individuals was used and the evolutionary search took place over 50 generations. In each generation, every parent gave rise to two children and the number of competitors in the selection tournaments was set to five. Initial tests showed that using Cauchy rather than Gaussian random numbers for mutation gave superior results, and so all EP runs used the former distribution. The initial value of the mutation parameter $\sigma$ was set at 0.075 for all the docking variables except for two of the rigid body variables, for which it was scaled as described in the Docking variables section.

Tabu search
The modern form of tabu (or taboo) search is due to Glover [33,34] and was originally applied to problems in the field of operations research. More recently, however, tabu search has begun to attract attention as an effective heuristic search procedure for combinatorial optimisation problems in molecular design, such as the evaluation of the chemical distance between two molecules [35]. Other workers in the molecular design field have employed related concepts [36-38], but, to our knowledge, this paper reports the first application of tabu search to the docking problem.

As its name suggests, tabu search is concerned with imposing restrictions to enable a search process to negotiate otherwise difficult regions [33]. These restrictions take the form primarily of a tabu list, which stores a number of previously visited solutions or regions of space. By preventing the search from revisiting these regions (except under special conditions, vide infra), the exploration of the search space can be encouraged. Our implementation of tabu search for molecular docking is presented in Fig. 5.

Tabu search maintains only one current solution during the course of a search, and the initial solution is chosen (vide supra) at the start of the run. From this current solution, a user-defined number of 'moves' are generated by a mutation-like procedure in which Gaussian or Cauchy random variables are added to each of the docking variables in the current solution. Each of these moves is then scored using the energy function, and they are then ranked in order, with the best move at the head of the list. The moves are examined in rank order. Moves are considered 'tabu' if they generate solutions which are not sufficiently different from those solutions in the tabu list. The threshold measure used in this work to determine the tabu status or otherwise of potential moves is a root-mean-square (rms) deviation (measured over heavy atoms) of 0.75 Å or less between the two solutions being compared. The highest ranking move (tabu or not) is always accepted if its energy is lower than the lowest energy so far. Otherwise the algorithm chooses the best non-tabu move. If neither of these criteria can be met, the algorithm terminates.

If a new current solution can be found, it is added to the tabu list. Early in the search, solutions are simply added to the end of the list until it is full. Thereafter, the current solution must replace an existing solution stored in the tabu list. In PRO_LEADS, the tabu list is managed in a first-in, first-out (FIFO) manner with the current solution replacing the tabu solution having the longest residence in the list. We have also experimented with an energy-based updating criterion in which the current solution replaces the solution of lowest energy in the tabu list, but tests have shown that it offers no particular advantage over the traditional FIFO updating procedure. Once the new current solution has been identified and stored, a new set of moves is generated from it and the search procedure continues with the next iteration.
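As an illustration only, one iteration of the procedure described above might be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the PRO_LEADS implementation: the names `rms_distance`, `is_tabu`, `tabu_step` and `tabu_search` are hypothetical, Gaussian moves are used, and the rms is taken directly over the variable vector as a stand-in for the heavy-atom rms between two docked poses.

```python
import math
import random
from collections import deque


def rms_distance(a, b):
    """Root-mean-square difference between two equal-length vectors.

    Assumption: a stand-in for the heavy-atom rms between two docked poses.
    """
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def is_tabu(move, tabu_list, threshold):
    """A move is tabu if it lies within `threshold` (rms) of any stored solution."""
    return any(rms_distance(move, t) <= threshold for t in tabu_list)


def tabu_step(current, best_energy, tabu_list, energy, rng,
              n_moves=100, sigma=0.075, threshold=0.75):
    """Run one iteration; return (energy, solution) of the accepted move, or None."""
    # Generate moves by adding a Gaussian random number to every variable.
    moves = [[x + rng.gauss(0.0, sigma) for x in current] for _ in range(n_moves)]
    # Rank in ascending order of energy (best move first).
    ranked = sorted(((energy(m), m) for m in moves), key=lambda pair: pair[0])
    # The top-ranked move is accepted, tabu or not, if it beats the best so far.
    if ranked[0][0] < best_energy:
        return ranked[0]
    # Otherwise take the best move that is not tabu.
    for e, m in ranked:
        if not is_tabu(m, tabu_list, threshold):
            return e, m
    return None  # no acceptable move: the algorithm terminates


def tabu_search(energy, start, rng, iterations=200, list_length=25, **step_options):
    """Repeat tabu_step, keeping a FIFO tabu list and the best solution found."""
    current = list(start)
    best, best_energy = current, energy(current)
    tabu_list = deque(maxlen=list_length)  # a full deque drops its oldest entry
    for _ in range(iterations):
        tabu_list.append(current)
        step = tabu_step(current, best_energy, tabu_list, energy, rng, **step_options)
        if step is None:
            break
        e, current = step
        if e < best_energy:
            best, best_energy = current, e
    return best, best_energy
```

Because the rms here is measured in variable space rather than over atomic coordinates, a toy run needs a much smaller threshold than 0.75, for example `tabu_search(lambda v: sum(x * x for x in v), [2.0, -1.5, 1.0], random.Random(1), n_moves=20, threshold=0.01)`. The bounded `deque` gives the FIFO replacement of the longest-resident entry without extra bookkeeping.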
A further mechanism which helps search exploration has also been implemented: if, after a number of iterations of the above procedure, it is observed that the best solution has not changed, then the tabu search is randomly restarted at a new position in the search space. While this multiple restart procedure is not part of the classical tabu search, it has been shown to help the search escape from local minima in our studies. The tabu search continues for a user-defined number of iterations. At the end of this time, it terminates and returns the best solution found during the search.

In all the tabu search experiments described in this paper, the search was allowed to proceed for 2000 iterations. At each iteration, 100 moves were generated using Cauchy mutation with a fixed $\sigma$ value of 0.075. The length of the tabu list was 25 and the random restart was initiated if the best solution had not changed after 100 iterations.

1. Create initial solution as specified or at random. Make this the current solution
2. Evaluate current solution. If the current solution is the best so far, record it
3. Update tabu list
   (a) If tabu list is not full, add current solution to list
   (b) Else, replace oldest member of list with current solution
4. Generate and evaluate N possible moves from the current solution
5. Rank N possible moves in ascending order of energy
6. Examine the moves in rank order
   (a) If move has lower energy than best so far, accept it and go to 7
   (b) If move is not tabu, accept it and go to 7
   (c) If no acceptable moves are located, terminate algorithm
7. If the iteration limit has been reached, exit with the best solution found. If the best solution so far has not changed for a given number of iterations, restart the whole procedure (go to 1). Otherwise, go to 2
Fig. 5. Tabu search algorithm.

1. Generate an initial population of solutions
2. [...]
   (a) [...]
   (b) If population has converged or maximum number of genetic operations has been exceeded, stop
   (c) Select two parent solutions by roulette wheel procedure
   (d) If (Crossover), produce two children by one-point crossover
       (i) Choose random position in docking variables
       (ii) Divide parents at this point
       (iii) Obtain children by combining first piece of one parent with second piece of other parent
       Otherwise, copy parents to children
   (e) Loop over children and if (Mutate) apply random mutation
       (i) Choose docking variable at random
       (ii) Add random number from Gaussian distribution onto docking variable. Width of distribution is 0.1 of the domain size for the variable
   (f) Replace least fit population member if child's energy is lower
   Go to 2(a)
Fig. 6. Genetic algorithm. The conditions (Crossover) and (Mutate) [...].
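The genetic operators in the scheme of Fig. 6 can be illustrated with a short Python sketch. It is hedged throughout: the function names are hypothetical, the 'width' of the Gaussian is assumed to be its standard deviation, parent selection is left to the caller, and the wrapping of angular variables back into their domain is omitted for brevity. Mutation is forced when no crossover took place, as the text states for PRO_LEADS.

```python
import random


def one_point_crossover(p1, p2, rng):
    """Swap the tails of two parents at a random cut point (needs length >= 2)."""
    cut = rng.randrange(1, len(p1))  # cut strictly inside the string
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]


def mutate(child, domains, rng, width=0.1):
    """Add a Gaussian number to one randomly chosen variable.

    Assumption: `width` times the domain size is used as the standard deviation.
    """
    i = rng.randrange(len(child))
    lo, hi = domains[i]
    out = list(child)
    out[i] += rng.gauss(0.0, width * (hi - lo))
    return out


def ga_step(population, energies, parents, energy, domains, rng,
            p_cross=0.5, p_mut=0.5):
    """Apply steps (d)-(f) of the scheme to one pair of parents, in place."""
    p1, p2 = parents
    crossed = rng.random() < p_cross
    if crossed:
        children = one_point_crossover(p1, p2, rng)
    else:
        children = (list(p1), list(p2))  # copy parents to children
    for child in children:
        # Mutation always occurs if there has been no crossover.
        if not crossed or rng.random() < p_mut:
            child = mutate(child, domains, rng)
        e = energy(child)
        # Replace the least fit (highest energy) member if the child is better.
        worst = max(range(len(population)), key=energies.__getitem__)
        if e < energies[worst]:
            population[worst], energies[worst] = child, e
```

Replacing only the worst member makes this a steady-state scheme: the highest energy in the population can never increase from one call to the next.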

Genetic algorithm
A detailed account of genetic algorithms can be found in [...]. The general framework for our genetic algorithm is illustrated in Fig. 6. The algorithm is implemented in 'steady-state' form, i.e. the same population of solutions is continually updated, and there is no concept of a generation. While many genetic algorithms are implemented with a binary encoding (i.e. the variables are encoded in a bit string), it was decided to allow the genetic operators to act directly upon the string of real docking variables described in the Docking variables section. The algorithm makes use of two operators: crossover and mutation. Crossover acts upon two parent solutions and produces two new solutions called children. The mutation operator acts upon a single solution. The probability of each operator occurring is controlled by the user. For the applications in this paper, the crossover probability is 0.5 and the mutation probability is 0.5. However, in PRO_LEADS, mutation will always occur if there has been no crossover so as to maintain the genetic diversity of the population.

Selection of parents at each step follows the roulette wheel procedure [39]. Each population member is assigned a raw fitness value, F_r, which is given by the difference in energy between the solution energy and the solution of maximum energy within the population. This raw value is scaled linearly, F_s = aF_r + b, so that the average fitness is preserved and the maximum fitness is MaxScaleParam times greater than the average. If this scheme ever results in negative scaled fitness values, the latter criterion is dropped and the lowest fitness is set to zero. When the scaled values have been calculated, each solution is assigned a section of a roulette wheel of size proportional to its fitness, and this wheel is spun to select parents. The point of scaling the fitness values is to vary the selection pressure used by the algorithm. With a large value of MaxScaleParam, the selection pressure is very strong and the fittest individuals have a very high probability of selection. Since this tends to lead to low genetic diversity of the population and subsequent trapping in a local minimum, moderate values of MaxScaleParam are usually used (1.2 is used for the applications in this paper).

Random search
The random search (RS) procedure simply generates a random conformation and orientation of the ligand subject to the constraint that the ligand's centre of mass lies within the bounding box specified by the user. In each of the RS docking runs, this was repeated 200 000 times and the algorithm terminated returning the lowest energy solution found during the search.

Local minimisation
A local minimiser which uses the Powell algorithm [40] has been implemented in PRO_LEADS. This is a non-derivative algorithm designed to move the solution to the nearest local minimum. It can be used optionally as a final stage minimisation of the lowest energy conformation found after the operation of any of the search algorithms.

Basis of comparison between algorithms
In this work, the primary concern is with examining the relative performance of the various algorithms. All

the heuristic algorithms contain a stochastic element, and so produce different results depending on the starting value of a random number seed. It is therefore necessary to assess performance statistically over a sufficiently large number of independent trials. To ensure a fair comparison between algorithms, each one was limited to a maximum of 200 000 (±1%) function evaluations per docking. This number was chosen to be large enough for most algorithms to achieve a reasonable success rate, while leading to a CPU time requirement short enough to permit many independent trials to be made.

The first and most straightforward criteria we use in the comparison of the algorithms are the characteristics of the energy distribution of the results generated from the above trials. These characteristics are the average energy of a solution, and the width of the distribution around this average value. For a simple case, in which the energy surface has a single minimum which is much deeper than any other minimum, an ideal algorithm would be expected to produce an average energy close to the value of this minimum with a narrow distribution of results around this value, reflecting the fact that most trials lead to this deep minimum being located.

The energy surface is rarely as simple as the ideal case outlined above, and frequently there are a number of deep minima of very similar energy ('competing minima'). In such a case, while the characteristics of the energy distribution are still useful quantities, they do not always reveal all the differences between the algorithms which are present in the results and can sometimes even be misleading. When there are competing minima, it is useful to classify solutions produced by the algorithms according to the minimum to which they correspond, and to study how solutions are distributed amongst the various minima for each of the algorithms. A useful quantity to aid this analysis is the rms distance of the docked ligand conformation from the conformation it adopts in the crystal structure
(i.e. the distance from the 'correct' answer). A scatter plot of rms against energy for all the solutions produced usually reveals a number of clusters of solutions, each cluster corresponding to a given binding mode of the ligand and identified with a broad minimum in the energy function. An examination of such scatter plots often reveals interesting algorithmic characteristics which would not be apparent from a study of the energy distribution alone, as will become clear in the Results section.

For many cases, using the energy function chosen for this study, we find that the deepest minimum located by any of the algorithms is one corresponding to the crystal structure. With this in mind, we also compare algorithms according to their 'success rate', that is, following Gehlhaar et al. [4,15], the proportion of the trials which find a solution within 1.5 Å rms (heavy atoms only) of the crystallographic ligand conformation.

A preliminary test was carried out in order to decide on appropriate statistical methods with which to assess the results. This revealed that, in general, the distribution of results deviates significantly from the normal distribution, as might have been predicted from the expected form of the energy surface. With this in mind, the median and semi-interquartile range were preferred as descriptive statistics to the more common mean and standard deviation. When comparing the heuristic algorithms, the main quantity considered was the median energy of the distribution of best energies obtained over 500 independent trials. This number of trials was chosen because it was found to provide a very good estimate of the median energy and a good estimate of the more variable semi-interquartile range. The minimum energy solution found over the 500 trials is also reported for each of the heuristic algorithms.
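The descriptive statistics just described can be computed along the following lines. This is an illustrative sketch only: the function names and the representation of each trial as an (energy, rms) pair are assumptions, and the quartiles use the default method of Python's `statistics.quantiles`, which may differ slightly from the quartile convention used in the original study.

```python
import statistics


def semi_interquartile_range(values):
    """Half the distance between the first and third quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return (q3 - q1) / 2.0


def summarise_trials(trials, rms_cutoff=1.5):
    """Summarise independent docking trials given as (best_energy, rms) pairs."""
    energies = [energy for energy, _ in trials]
    successes = sum(1 for _, rms in trials if rms <= rms_cutoff)
    return {
        "minimum_energy": min(energies),
        "median_energy": statistics.median(energies),
        "semi_interquartile_range": semi_interquartile_range(energies),
        "success_rate_percent": 100.0 * successes / len(trials),
    }
```

A trial counts as a success when its final solution lies within the rms cutoff of the crystallographic conformation, regardless of its energy, which is why the success rate and the median energy need not agree on which algorithm is best.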
Note that because the RS procedure is only intended as a control, statistics for this algorithm were gathered over 100 independent trials. In view of the deviations from the normal distribution, a non-parametric method was chosen to assess the statistical significance of these comparisons. The method of Gardner and Altman [41] was used to compute a 95% confidence interval for the difference between the two medians, and a significant difference between two algorithms was supposed to exist if zero fell outside this interval. As with all significance tests, this simply tells us if an observed difference may be considered 'real', i.e. unlikely to have occurred by chance. Of course, it is possible for small differences to be statistically significant and yet be of no practical significance.

In general, the local minimiser is used to refine the best solution at the end of each docking run. However, for one of the test cases (1HVR), it was decided to examine the relative performances of the algorithms both with and without this final refinement. This enables an assessment of the benefit to each algorithm of the final stage of local minimisation.

Test cases
The test cases chosen for the paper are given in Table 1. All are topical test cases in CAMD, for which inhibitors of the associated enzyme are on the market or in clinical trials as therapeutic agents for important diseases. Dihydrofolate reductase-methotrexate (DHFR-MTX) was chosen because in recent years it has become a standard test case for docking algorithms. It thus serves as a useful benchmark for our results. Influenza virus neuraminidase-DANA was chosen as a test case in which electrostatic/hydrogen bond effects are thought to dominate recognition, and HIV-1 protease-XK263 was chosen because in this case lipophilic interactions are particularly important for good binding (steric fit). The two thrombin examples were chosen because the inhibitors make both strong lipophilic and strong electrostatic interactions in different,

but similarly-sized, binding pockets. Thus, the thrombin examples provide a good test of the ability of the docking potential function to differentiate between the two types of interaction.

All of the test cases were prepared in a similar fashion. The crystal structures were extracted from the Brookhaven Databank [42] and hydrogen atoms were added using the InsightII/Discover software [43]. The ligand structures were minimised prior to docking using the CVFF force field [43]. Since the potential function used in this work does not require accurate hydrogen atom positions, no minimisation of the receptor was performed. In all cases, a bounding box defining the active site was specified by permitting the ligand's COM to move up to 2.0 Å along each axis from its crystallographic location. The total volume of the bounding box constraining the COM was thus 64 Å³.

The treatment of crystallographic water molecules in docking and other molecular design studies is not a simple problem and has recently been the subject of a detailed computational study [44-46]. One approach is to remove crystallographic water molecules before attempting to dock the ligand; see for example Refs. 13, 16 and 18. These groups reported that they were still able to obtain correct docked conformations in their test cases, although other workers have found that removal of the crystallographic waters has necessitated the inclusion of a continuum solvation model before good results could be obtained [7,44]. In our test cases, we have found that the removal of crystallographic waters has a significantly deleterious effect upon the ability of the docking algorithm to locate the crystallographic binding mode for 1NSD, 1ETR and 3DFR. For instance, with a 'dry' active site for 3DFR, the best success rate observed was about 50%; on inclusion of all the water molecules, this rose to over 80%. Clearly, water molecules
in the active site help to stabilise the crystallographic conformation by the imposition of additional steric and electrostatic constraints. In the light of these findings, and in view of the nature of the work in this paper, which is primarily concerned with investigating algorithmic performance, we have retained all the crystallographic waters present in our test cases.
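The bounding box used in preparing the test cases, together with the random search control that samples inside it, can be illustrated with a short sketch. The layout of a solution (three COM coordinates, three orientation angles, then one angle per rotatable bond) is an assumption made for this example and is not taken from PRO_LEADS; sampling three angles uniformly is also a simplification that does not give uniformly distributed rotations. `energy_fn` is a hypothetical stand-in for the docking potential.

```python
import math
import random


def bounding_box_volume(max_shift=2.0):
    """Volume of a cube whose centre may move max_shift along each axis."""
    return (2.0 * max_shift) ** 3


def random_solution(com_crystal, n_torsions, rng, max_shift=2.0):
    """Random COM inside the box, random orientation and random torsions."""
    com = [c + rng.uniform(-max_shift, max_shift) for c in com_crystal]
    orientation = [rng.uniform(-math.pi, math.pi) for _ in range(3)]
    torsions = [rng.uniform(-math.pi, math.pi) for _ in range(n_torsions)]
    return com + orientation + torsions


def random_search(energy_fn, com_crystal, n_torsions, rng, n_samples=200_000):
    """Keep the lowest-energy solution among n_samples random solutions."""
    best, best_e = None, math.inf
    for _ in range(n_samples):
        candidate = random_solution(com_crystal, n_torsions, rng)
        energy = energy_fn(candidate)
        if energy < best_e:
            best, best_e = candidate, energy
    return best, best_e
```

A shift of up to 2.0 Å in either direction along each axis gives a cube of side 4.0 Å, which is where the 64 Å³ quoted above comes from.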


Results
Figure 7 shows the structures of the ligands used in the study together with the bonds which are considered rotatable. Table 3 gives the results for the different search algorithms on the test cases. The most obvious result is that, of all the algorithms, RS performs very poorly. The high median energy and low success rate produced by RS reflect the fact that RS is ineffectual in a search space of this size, and indicate the advantage of the more 'intelli-

Fig. 7. Ligands used in docking studies in this paper with rotatable bonds indicated: (a) methotrexate; (b) DANA; (c) XK263; (d) NAPAP; (e) argatroban.

TABLE 3
DOCKING RESULTS FOR THE TEST CASES GIVEN IN TABLE 1

PDB code: 3DFR 3DFR 3DFR 3DFR 3DFR 1NSD 1NSD 1NSD 1NSD 1NSD 1HVR 1HVR 1HVR 1HVR 1HVR 1HVR(a) 1HVR(a) 1HVR(a) 1HVR(a) 1HVR(a) 1ETS 1ETS 1ETS 1ETS 1ETS 1ETR 1ETR 1ETR 1ETR 1ETR

Algorithm: SA EP TS GA RS SA EP TS GA RS SA EP TS GA RS SA EP TS GA RS SA EP TS GA RS SA EP TS GA RS

Minimum energy: -163.15 -164.61 -164.69 -167.64 -137.4 -103.88 -105.60 -104.71 -105.43 42.75 -177.31 -175.24 -176.52 -154.86 -159.58 -168.55 -151.66 -63.50 -118.01 -139.76 -139.4 -144.00 -112.68 -138.46 -140.52 -138.68 -140.85 -101.45

Median energy: -151.62 -152.13 -150.13 -151.96 -82.15 -93.04 -98.35 46.78 -98.75 -71.18 -158.40 -155.02 -156.56 -156.44 -75.41 -137.40 -143.99 -135.74 -152.93 21.92 -115.76 -117.07 -120.13 -118.39 -71.53 -88.86 -87.52 -88.60 -52.38

Semi-interquartile range: 5.94 8.08 6.73 7.54 22.63 3.12 2.37 2.56 1.84 6.95 5.59 8.82 7.66 9.43 24.77 5.05 9.31 1.19 10.06 26.09 5.41 5.21 4.72 17.11 16.97 8.98 14.01 8.90 9.23

Success rate (%): 90 16 93 9 40 64 88 51 6 65 54 58 59 2 61 48 50 51 0 1 9 8 11 2 30 21 39 13 3

Energies are in arbitrary units and statistics are derived from 500 independent docking attempts for each algorithm, except RS (100 attempts). For comparison, the energies of the minimised crystal conformations are -141.76 (3DFR), -100.04 (1NSD), -149.55 (1HVR), -132.48 (1ETS) and -98.89 (1ETR).
(a) These results were obtained without using local minimisation of the best solution.

gent' algorithms. The results for the other algorithms on the individual test cases are considered below.

Dihydrofolate reductase-methotrexate
The results in Table 3 show that the best performance for 3DFR in terms of median energy is produced by the GA. The differences in median energy between EP, SA and TS are not statistically significant. The success rate is not well correlated with the median energy; the most successful algorithm in terms of energy (the GA) is actually the joint worst of the four 'intelligent' algorithms in terms of success rate. This may indicate that the GA is more prone to becoming trapped in local energy minima, which represent conformations more than 1.5 Å rms from the crystal conformation, than SA or TS, both of which perform very well on this test case. It is worth noting that this test case was used in parameterisation of the energy function [4], and that therefore it might be expected that

this function should have a single deep minimum near the crystal structure and yield good success rates. The scatter plot of energy versus rms from the crystal structure (Fig. 8) for the GA adds some weight to this hypothesis, showing that the majority of solutions found by the GA with energy less than -150 are indeed close to the crystal structure. Nonetheless, the plot clearly shows some solutions with rms values of more than 2.5 Å representing suboptimal minima on the energy surface.

Neuraminidase-DANA
The results for 1NSD show that the GA again performs best in terms of median energy, although the difference between EP and the GA is not statistically significant. The success rates are somewhat lower than those observed with DHFR-MTX, except for TS which still continues to perform very well. This points to a more complicated energy surface for this case, possibly with

more than one minimum of similar depth to that corresponding to the crystal structure. This suspicion is confirmed by the scatter plot shown in Fig. 9, in which it can be clearly seen that there is no simple linear relationship between energy and rms from the crystal structure. The figure shows that TS finds two dominant clusters of low-energy solutions, one close to the crystal structure having energies in the range -87 to -105 units, and the other with rms values in the range 4.5-5.0 Å with energies ranging from -88 to -99 units. The lower success rates can be attributed in part to the existence of this latter minimum in which each algorithm becomes entrapped in some proportion of its docking attempts. The results in this case seem to suggest that TS is less susceptible to this entrapment and, thus, may be carrying out a more effective global search than the other algorithms.

As an aside, it is interesting to consider the two observed binding modes in more detail. In the crystallographic conformation, the carboxylate group of DANA forms a salt bridge with Arg373 and also hydrogen bonds with Arg115 and Arg291, clearly a very strong and specific interaction. The carbonyl of DANA's acetylamino group forms a hydrogen bond with Arg149 and the methyl moiety of the acetylamino group makes hydrophobic contact with Trp176 and Arg222. The alternative binding mode discovered by the docking algorithms is almost inverted with respect to the crystallographic conformation. The salt bridge is not formed; instead, the carbonyl of DANA's acetylamino group forms a hydrogen bond with Arg115. The majority of the remainder of the interactions in this mode are hydrophobic. It is our belief that this latter binding mode is in fact an artefact of the potential function, which seems to be biased in favour of steric fit and appears not to favour specific interactions like the carboxylate-arginine salt bridge sufficiently. The result is that the energy separation of the two minima is very small (about six units); this leads to a greater tendency for some of the algorithms to be trapped in the higher energy minimum than would exist if the separation were made greater by altering the potential function so that salt bridges were more highly rewarded.

HIV-1 protease-XK263
A point to note concerning this system is that the complex as deposited in the PDB contains no water molecules; the active site water molecule which mediates contact between peptidomimetic HIV-1 protease inhibitors and the enzyme is displaced by the carbonyl group of the urea, which interacts directly with the active site residues Ile50 and Ile50'.

SA performs best according to the median energy criterion, and the difference between it and the second-placed algorithm (the GA) is statistically significant, although the differences in median energies between the GA, TS and EP are not. In this test case, in contrast to the two previous examples, a better correlation is observed between success rate and median energy, SA performing best and EP performing worst on both counts. The scatter plot shown in Fig. 10 for the results produced by SA shows this correlation well. It is also noticeable from this figure that a number of low-energy solutions lie

Fig. 8. Scatter plot of heavy atom rms versus energy for 500 docks of methotrexate into DHFR using the GA. Horizontal axis: energy (arbitrary units).

Fig. 9. Scatter plot of heavy atom rms versus energy for 500 docks of DANA into influenza virus neuraminidase using TS. Horizontal axis: energy (arbitrary units).

between 1.5 and 2.0 Å rms. This explains why the success rates for this test case are somewhat lower than those found in DHFR-MTX. The greater spread of good solutions around the crystallographic conformation reflects the less directional nature of binding of parts of this large molecule in an active site dominated by lipophilic contacts. For example, the exact orientation of the naphthyl and phenyl rings has greater effect on the calculated rms than on the value of the energy function.

The performance of the algorithms for 1HVR when the local minimiser is not employed is also given in Table 3. First, in terms of median energy, it can be seen that the results from each algorithm are improved by local minimisation, the size of the improvement varying as RS (97.53) > SA (21.00) > TS (20.82) > EP (11.03) > GA (3.51). This variation can be interpreted as an indication of the effectiveness of the local searching performed by the algorithms. Clearly, the population-based evolutionary algorithms are more effective as local searchers than our implementations of SA or TS. It should be noted that these improvements are both significant and computationally inexpensive (typically the local minimisation requires only a few hundred energy evaluations, compared with the 200 000 permitted for the main algorithms), making the local minimiser a useful adjunct to the heuristic search algorithms. Second, it is interesting to note that in terms of success rate, the performance of the four algorithms is little affected by the absence of the local minimisation procedure. This again indicates that the crystallographic minimum is broad, with a wide range of energies being possible within the 1.5 Å rms cutoff.

Thrombin-NAPAP
TS and the GA produce the lowest median energies for 1ETS, the difference between them not being statistically significant. In terms of successful docking, this example produces the lowest rates of our test set. However, these figures are somewhat misleading since the lowest energy solutions are not considered successful by our criterion, as will now be explained.

The scatter plot of energy versus rms for the TS results is shown in Fig. 11. The figure indicates that there are at least three major clusters of solutions produced by the algorithms. The first cluster is close to the crystallographic minimum, which can be characterised by the piperidine, naphthyl and benzamidine moieties of NAPAP interacting with (respectively) the lipophilic 'P' pocket, the lipophilic 'D' pocket and an aspartate residue at the bottom of the S1 subsite of thrombin. A second cluster of solutions occurs at about 3.5 Å rms from the crystal structure and contains some solutions of slightly lower energy than the first cluster. From the standpoint of minimising the global energy of our scoring function, this is probably the correct solution, and this explains why the tabu search does relatively poorly at docking NAPAP 'successfully'. Tabu search locates the second cluster 16% of the time, more often than all the other algorithms. (The corresponding percentage rates for the GA, EP and SA are 6%, 7% and 9%, respectively.) The second cluster is characterised by the 'incorrect' positioning of the naphthyl moiety, which points into solvent and makes little contribution to the score. This positioning is favourable

[...] since the 'incorrect' positioning [...] intramolecular interaction [...] is given no weight in the [...]

Thrombin-argatroban
[...] test case there is some correlation between [...]

Fig. 10. Scatter plot of heavy atom rms versus energy for 500 docks of XK263 into HIV-1 protease using SA. Horizontal axis: energy (arbitrary units).

Fig. 11. Scatter plot of heavy atom rms versus energy for 500 docks of NAPAP into thrombin using TS. Horizontal axis: energy (arbitrary units).

Argatroban is a moderately difficult test case for the algorithms because, in spite of there being a good low-energy minimum, it is quite difficult to locate (with the number of function evaluations chosen for this study). For this reason, it was decided to rerun the jobs with all the algorithms, increasing the number of rotatable bonds by one each time. Because of the large numbers of experiments involved in this study, only 100 docking attempts were used to derive the statistics. The order in which the rotatable bonds were activated is indicated in Fig. 13a. This order was chosen so as to approximately guarantee the best performance of the GA for a given number of rotatable bonds. The resulting median energies and success rates are shown in Figs. 13b and c. The introduction of the first five rotatable bonds can be seen to have little effect on the performance of the algorithms, except for the random search, which is seriously compromised as additional rotatable bonds are activated. This indicates that although the number of rotatable bonds is a reasonable indicator of the size of the search space for the test case, it is a poor indicator of its difficulty. The difficulty is primarily controlled by the presence and character of the competing low-energy minima on the potential energy surface. The addition of bonds 6 and 7 obviously introduces and consolidates at least one more competing minimum. In general, the success rates follow the pattern TS > SA > EP > GA. The relative performance of the algorithms using the median energy criterion varies considerably as the test case changes from an easy test to a difficult one (i.e. on the introduction of competing low-energy minima). The GA does best in terms of median energy

when the test case is easy, but not so well when the sixth and seventh rotatable bonds are introduced. We ascribe this to good local searching capabilities but somewhat poorer global searching. EP shows similar but less pronounced characteristics. TS is a poorer local searcher but produces the best median energies for seven rotatable bonds, presumably because of its increased ability to sample the global energy minimum relative to the other algorithms. It could be argued that SA shows a similar though less clear-cut effect. The fact that the success rates and median energies at seven rotatable bonds are not exactly the same as the results for 500 runs given in Table 3 underlines the need to carry out large numbers of runs for comparisons of this type.

Discussion
The clearest conclusion that can be drawn from our results is that RS is always out-performed by the other more 'intelligent' algorithms. This was of course to be expected given the size of the search spaces involved; nonetheless, RS does provide a good 'control' for our results. Drawing conclusions about the other four algorithms is more difficult. An immediate observation that can be made is that all benefited, albeit to differing degrees, from hybridisation with the Powell local optimisation algorithm.

Turning to the comparison of median energies, in three of the five test cases the GA was in (joint) first place in terms of its median energy and it was never worse than joint second. Thus, on the basis of this criterion, the GA may perhaps be judged the 'best' algorithm. If points are awarded according to ranking by median energy (from 4 for first place to 1 for fourth place), then, based on overall performance across the five test cases, [...] as follows: GA [...]
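The points scheme described above (4 points for first place down to 1 for fourth, summed over test cases) can be sketched as follows. The handling of ties, where tied algorithms share the better rank, is an assumption made for this sketch, as are the function names; any numbers passed in are the caller's own and none are taken from Table 3.

```python
def award_points(values, lower_is_better=True, max_points=4):
    """Points per algorithm for one test case; tied values share the better rank."""
    ordered = sorted(values.values(), reverse=not lower_is_better)
    return {name: max_points - ordered.index(value) for name, value in values.items()}


def total_points(cases, lower_is_better=True):
    """Sum the points awarded over a sequence of test cases."""
    totals = {}
    for values in cases:
        for name, points in award_points(values, lower_is_better).items():
            totals[name] = totals.get(name, 0) + points
    return totals
```

For median energies, lower values rank higher, so `lower_is_better` stays at its default; for success rates the same functions apply with `lower_is_better=False`.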


[...] NAPAP example are not included because, as discussed earlier, the global minimum on the energy surface is not [...] using our [...] criterion. Interestingly, the best result attained by the GA using this criterion is only second place behind SA (1HVR-XK263). It is likely that this result reflects a predisposition on the part of our GA for entrapment in local minima more than 1.5 Å rms from the crystal conformation. Conversely, the success of TS, which produces the best success rate in the remaining three test cases, suggests that it is able to escape from such minima and locate the 'correct' minimum more often. This

Fig. 12. Scatter plots of heavy atom rms versus energy for 500 docks of argatroban into thrombin using (a) the GA and (b) TS. Horizontal axes: energy (arbitrary units).

Fig. 13. (a) Argatroban, showing the order in which the rotatable bonds are activated. (b) Effect on median energy of incrementally increasing the number of rotatable bonds in argatroban. (c) Effect on success rate of incrementally increasing the number of rotatable bonds in argatroban. Horizontal axes in (b) and (c): number of rotatable bonds.
capability is probably a result of both the 'random restart' feature of our TS algorithm and the use of a high [...] threshold which drives the search into new areas of space.

To summarise, using the same points scoring scheme as for the median energy and excluding the thrombin-NAPAP example, the ranking of the algorithms according to success rate is TS (14) > SA (11) > GA = EP (8).

In passing, it is also interesting to consider the relative efficiencies of the docking algorithms in terms of CPU time per docking attempt as measured on one node of a Convex Exemplar (HP 735 chip). The use of the grid-based energy evaluation means that the CPU time per energy evaluation varies in a linear fashion with the number of heavy atoms in the ligand. Thus, for TS the fastest docks are with DANA (22? s) and the slowest are with XK263 (535 s). Using this slowest case as a basis for comparison, the ranking from fastest to slowest is SA (530 s) < TS (535 s) < EP (584 s) < GA (6?9 s). It is clear that the population-based methods suffer some loss from the overheads required for book-keeping and manipulating a population of a few thousand solutions. However, the slowness of our GA implementation is probably due to a lack of code optimisation rather than any inherent feature of the algorithm.

It is important not to lose sight of wider issues in the midst of all this discussion about the relative qualities of algorithms. We have considered in detail how the algorithms behave on very difficult energy surfaces with competing minima of similar energy. Solution of the docking problem of course depends only in part on the algorithm; the other major component is the energy function. The lack of correlation of docking success rate with the median energy values in some test cases suggests that the energy function used in this work has deficiencies: for each of neuraminidase-DANA, thrombin-argatroban and thrombin-NAPAP, there exists at least one competing minimum of very similar depth to that corresponding to the crystal structure. We have suggested that the relative depth of this minimum reflects the fact that the energy function does not attach sufficient weight to salt bridge formation and is too biased towards steric fitting. If the energy function were improved in some way to make the competing minimum much less favourable, then any algorithm could be expected to produce a much higher docking success rate. Deficiencies in the details of the energy function are also indicated by the fact that the lowest energies emerging from successful docking runs are generally much better than that obtained for the original crystallographic conformation. This is mainly due to the torsional component of the ligand's internal energy, indicating either that the crystal structures contain some errors or, more likely, that the torsional energy term is too large. The results with NAPAP also indicate that the failure of the energy function to reward intramolecular hydrophobic interactions leads to a global energy minimum conformation which is not realistic. The original authors have had similar experience and this has led them to implement a much more complete internal energy representation, in the form of the DREIDING force field [48]. This improvement has yielded better solutions [49].

Our ultimate aim is to use PRO_LEADS to dock ligands for which the bound conformation at the receptor is unknown. In such a situation, one obviously cannot rely on an rms criterion to judge the success of a docking [50]. For such an application, it is essential to have confidence that the global minimum of the energy function corresponds to the bound conformation of the ligand. We have shown that the potential function in use in this work is reliable in most of the cases, but still requires some development before it can be used for 'de novo' docking on a variety of test cases.

Another point to come out of this work relates to what features make a docking test case difficult. It is our experience that increasing the bounding box size degrades the performance of the algorithms. Thus, one must strike a compromise between allowing exploration of the active site and obtaining reliable results. Another important factor is the number of rotatable bonds in the ligand. It is worth noting at this point that all of the intelligent algorithms are capable of virtually 100% successful rigid docking (based on the crystallographic conformation of the ligand) in all the reported test cases and even random search can obtain the correct answer in a non-trivial percentage of attempts! A simple count of the number of rotatable bonds, however, proves to be a rather naive measure of difficulty. It is the situation of the rotatable bonds within the molecule that is crucial, as evidenced by the experiments with thrombin-argatroban: a rotatable bond in a 'hinge' position in a molecule clearly will make for a more difficult search problem than one in a more terminal position. Mostly, the difficulty of our test cases is governed predominantly by the form of the energy surface. Our 'easiest' test case (DHFR-MTX) has seven rotatable bonds, but the energy surface seems to have just one deep minimum corresponding to the crystal structure, which the algorithms find with comparative ease. The more difficult test cases (neuraminidase-DANA, thrombin-argatroban and thrombin-NAPAP) have four, seven and six rotatable bonds, respectively, but have energy surfaces complicated by the existence of competing minima.

We have sought to make an unbiased comparison of four heuristic search algorithms with a random search procedure acting as a 'control'. The difficulties of carrying out such a comparison in terms of algorithm implementation and parameter optimisation are manifest. Nonetheless, we have carried out our development and experiments with an open mind and with no particular preconceptions of what the results should be. We have also made every effort to ensure that the adjustable parameters in each of the algorithms have been sufficiently
