Burger's Medicinal Chemistry and Drug Discovery
Sixth Edition
Volume 1: Drug Discovery
Edited by
Donald J. Abraham
Department of Medicinal Chemistry
School of Pharmacy
Virginia Commonwealth University
PREFACE
The Editors, Editorial Board Members, and John Wiley and Sons have worked for three and a half years to update the fifth edition of Burger's Medicinal Chemistry and Drug Discovery. The sixth edition has several new and unique features. For the first time, there will be an online version of this major reference work. The online version will permit updating and easy access. For the first time, all volumes are structured entirely according to content and published simultaneously. Our intention was to provide a spectrum of fields that would provide new or experienced medicinal chemists, biologists, pharmacologists and molecular biologists entry to their subjects of interest as well as provide a current and global perspective of drug design, and drug development.

Our hope was to make this edition of Burger the most comprehensive and useful published to date. To accomplish this goal, we expanded the content from 69 chapters (5 volumes) by approximately 50% (to over 100 chapters in 6 volumes). We are greatly in debt to the authors and editorial board members participating in this revision of the major reference work in our field. Several new subject areas have emerged since the fifth edition appeared. Proteomics, genomics, bioinformatics, combinatorial chemistry, high-throughput screening, blood substitutes, allosteric effectors as potential drugs, COX inhibitors, the statins, and high-throughput pharmacology are only a few. In addition to the new areas, we have filled in gaps in the fifth edition by including topics that were not covered. In the sixth edition, we devote an entire subsection of Volume 4 to cancer research; we have also reviewed the major published Medicinal Chemistry and Pharmacology texts to ensure that we did not omit any major therapeutic classes of drugs. An editorial board was constituted for the first time to also review and suggest topics for inclusion. Their help was greatly appreciated. The newest innovation in this series will be the publication of an academic, "textbook-like" version titled, "Burger's Fundamentals of Medicinal Chemistry." The academic text is to be published about a year after this reference work appears. It will also appear with soft cover. Appropriate and key information will be extracted from the major reference.

There are numerous colleagues, friends, and associates to thank for their assistance. First and foremost is Assistant Editor Dr. John Andrako, Professor emeritus, Virginia Commonwealth University, School of Pharmacy. John and I met almost every Tuesday for over three years to map out and execute the game plan for the sixth edition. His contribution to the sixth edition cannot be understated. Ms. Susanne Steitz, Editorial Program Coordinator at Wiley, tirelessly and meticulously kept us on schedule. Her contribution was also key in helping encourage authors to return manuscripts and revisions so we could publish the entire set at once. I would also like to especially thank colleagues who attended the QSAR Gordon Conference in 1999 for very helpful suggestions, especially Roy Vaz, John Mason, Yvonne Martin, John Block, and Hugo Kubinyi. The editors are greatly indebted to Professor Peter Ruenitz for preparing a template chapter as a guide for all authors. My secretary, Michelle Craighead, deserves special thanks for helping contact authors and reading the several thousand e-mails generated during the project. I also thank the computer center at Virginia Commonwealth University for suspending rules on storage and e-mail so that we might safely store all the versions of the author's manuscripts where they could be backed up daily. Last but not least, I want to thank each and every author, some of whom tackled two chapters. Their contributions have provided our field with a sound foundation of information to build for the future. We thank the many reviewers of manuscripts whose critiques have greatly enhanced the presentation and content for the sixth edition. Special thanks to Professors Richard Glennon, William Soine, Richard Westkaemper, Umesh Desai, Glen Kellogg, Brad Windle, Lemont Kier, Malgorzata Dukat, Martin Safo, Jason Rife, Kevin Reynolds, and John Andrako in our Department of Medicinal Chemistry, School of Pharmacy, Virginia Commonwealth University for suggestions and special assistance in reviewing manuscripts and text. Graduate student Derek Cashman took able charge of our web site, http://www.burgersmedchem.com, another first for this reference work. I would especially like to thank my dean, Victor Yanchick, and Virginia Commonwealth University for their support and encouragement. Finally, I thank my wife Nancy who understood the magnitude of this project and provided insight on how to set up our home office as well as provide John Andrako and me lunchtime menus where we often dreamed of getting chapters completed in all areas we selected. To everyone involved, many, many thanks.

DONALD J. ABRAHAM
Midlothian, Virginia
Dr. Alfred Burger
Photograph of Professor Burger followed by his comments to the American Chemical Society 26th Medicinal
Chemistry Symposium on June 14, 1998. This was his last public appearance at a meeting of medicinal
chemists. As general chair of the 1998 ACS Medicinal Chemistry Symposium, the editor invited Professor
Burger to open the meeting. He was concerned that the young chemists would not know who he was and he
might have an attack due to his battle with Parkinson's disease. These fears never were realized and his
comments to the more than five hundred attendees drew a sustained standing ovation. The Professor was 93,
and it was Mrs. Burger's 91st birthday.
Opening Remarks
It has been 46 years since the third Medicinal Chemistry Symposium met at the University of
Virginia in Charlottesville in 1952. Today, the Virginia Commonwealth University welcomes
you and joins all of you in looking forward to an exciting program.
So many aspects of medicinal chemistry have changed in that half century that most of the
new data to be presented this week would have been unexpected and unbelievable had they
been mentioned in 1952. The upsurge in biochemical understandings of drug transport and
drug action has made rational drug design a reality in many therapeutic areas and has made
medicinal chemistry an independent science. We have our own journal, the best in the world,
whose articles comprise all the innovations of medicinal researches. And if you look at the
announcements of job opportunities in the pharmaceutical industry as they appear in
Chemical & Engineering News, you will find in every issue more openings in medicinal
chemistry than in other fields of chemistry. Thus, we can feel the excitement of being part of
this medicinal tidal wave, which has also been fed by the expansion of the needed research
training provided by increasing numbers of universities.
The ultimate beneficiary of scientific advances in discovering new and better therapeutic
agents and understanding their modes of action is the patient. Physicians now can safely look
forward to new methods of treatment of hitherto untreatable conditions. To the medicinal
scientist all this has increased the pride of belonging to a profession which can offer predictable
intellectual rewards. Our symposium will be an integral part of these developments.
History of Quantitative
Structure-Activity Relationships
C. D. SELASSIE
Chemistry Department
Pomona College
Claremont, California
Contents
1 Introduction
1.1 Historical Development of QSAR
1.2 Development of Receptor Theory
2 Tools and Techniques of QSAR
2.1 Biological Parameters
2.2 Statistical Methods: Linear Regression Analysis
2.3 Compound Selection
3 Parameters Used in QSAR
3.1 Electronic Parameters
3.2 Hydrophobicity Parameters
3.2.1 Determination of Hydrophobicity by Chromatography
3.2.2 Calculation Methods
3.3 Steric Parameters
3.4 Other Variables and Variable Selection
3.5 Molecular Structure Descriptors
4 Quantitative Models
4.1 Linear Models
4.1.1 Penetration of ROH into Phosphatidylcholine Monolayers (184)
4.1.2 Changes in EPR Signal of Labeled Ghost Membranes by ROH (185)
4.1.3 Induction of Narcosis in Rabbits by ROH (184)
4.1.4 Inhibition of Bacterial Luminescence by ROH (185)
4.1.5 Inhibition of Growth of Tetrahymena pyriformis by ROH (76, 186)
4.2 Nonlinear Models
4.2.1 Narcotic Action of ROH on Tadpoles
4.2.2 Induction of Ataxia in Rats by ROH
4.3 Free-Wilson Approach
4.4 Other QSAR Approaches
5 Applications of QSAR
5.1 Isolated Receptor Interactions

Burger's Medicinal Chemistry and Drug Discovery, Sixth Edition, Volume 1: Drug Discovery
Edited by Donald J. Abraham
ISBN 0-471-27090-3 © 2003 John Wiley & Sons, Inc.
Rigorous analysis and fine-tuning of independent variables has led to an expansion in development of molecular and atom-based descriptors, as well as descriptors derived from quantum chemical calculations and spectroscopy (2). The improvement in high-throughput screening procedures allows for rapid screening of large numbers of compounds under similar test conditions and thus minimizes the risk of combining variable test data from many sources.

The formulation of thousands of equations using QSAR methodology attests to a validation of its concepts and its utility in the elucidation of the mechanism of action of drugs at the molecular level and a more complete understanding of physicochemical phenomena such as hydrophobicity. It is now possible not only to develop a model for a system but also to compare models from a biological database and to draw analogies with models from a physical organic database (3). This process is dubbed model mining and it provides a sophisticated approach to the study of chemical-biological interactions. QSAR has clearly matured, although it still has a way to go. The previous review by Kubinyi has relevant sections covering portions of this chapter as well as an extensive bibliography recommended for a more complete overview (4).

1.1 Historical Development of QSAR

More than a century ago, Crum-Brown and Fraser expressed the idea that the physiological action of a substance was a function of its chemical composition and constitution (5). A few decades later, in 1893, Richet showed that the cytotoxicities of a diverse set of simple organic molecules were inversely related to their corresponding water solubilities (6). At the turn of the 20th century, Meyer and Overton independently suggested that the narcotic (depressant) action of a group of organic compounds paralleled their olive oil/water partition coefficients (7, 8). In 1939 Ferguson introduced a thermodynamic generalization to the correlation of depressant action with the relative saturation of volatile compounds in the vehicle in which they were administered (9). The extensive work of Albert, and Bell and Roblin established the importance of ionization of bases and weak acids in bacteriostatic activity (10-12). Meanwhile on the physical organic front, great strides were being made in the delineation of substituent effects on organic reactions, led by the seminal work of Hammett, which gave rise to the "sigma-rho" culture (13, 14). Taft devised a way for separating polar, steric, and resonance effects and introducing the first steric parameter, $E_s$ (15). The contributions of Hammett and Taft together laid the mechanistic basis for the development of the QSAR paradigm by Hansch and Fujita. In 1962 Hansch and Muir published their brilliant study on the structure-activity relationships of plant growth regulators and their dependency on Hammett constants and hydrophobicity (16). Using the octanol/water system, a whole series of partition coefficients were measured, and thus a new hydrophobic scale was introduced (17). The parameter $\pi$, which is the relative hydrophobicity of a substituent, was defined in a manner analogous to the definition of sigma (18).

$$\pi_X = \log P_X - \log P_H \qquad (1.1)$$

$P_X$ and $P_H$ represent the partition coefficients of a derivative and the parent molecule, respectively. Fujita and Hansch then combined these hydrophobic constants with Hammett's electronic constants to yield the linear Hansch equation and its many extended forms (19).

$$\log 1/C = a\sigma + b\pi + k \qquad (1.2)$$

Hundreds of equations later, the failure of linear equations in cases with extended hydrophobicity ranges led to the development of the Hansch parabolic equation (20):

$$\log 1/C = a \log P - b(\log P)^2 + c\sigma + k \qquad (1.3)$$

The delineation of these models led to explosive development in QSAR analysis and related approaches. The Kubinyi bilinear model is a refinement of the parabolic model and, in many cases, it has proved to be superior (21).
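As a rough numerical illustration of the substituent constant and the parabolic model described above, the following sketch evaluates Equations 1.1 and 1.3 in Python. The function names, the log P inputs, and the coefficients a, b, c, and k are all hypothetical values chosen for illustration; none of them is a fitted result from the chapter.

```python
def pi_constant(log_p_derivative: float, log_p_parent: float) -> float:
    """Relative hydrophobicity of a substituent: pi_X = log P_X - log P_H."""
    return log_p_derivative - log_p_parent


def hansch_parabolic(log_p: float, sigma: float,
                     a: float, b: float, c: float, k: float) -> float:
    """Parabolic Hansch model: log 1/C = a*logP - b*(logP)^2 + c*sigma + k."""
    return a * log_p - b * log_p ** 2 + c * sigma + k


def optimum_log_p(a: float, b: float) -> float:
    """log P at the top of the parabola, from d(log 1/C)/d(log P) = a - 2*b*logP = 0."""
    return a / (2.0 * b)


if __name__ == "__main__":
    # Illustrative log P values for a substituted compound and its parent.
    print("pi:", round(pi_constant(2.84, 2.13), 2))

    # Hypothetical coefficients (assumed, not taken from any published QSAR).
    a, b, c, k = 1.0, 0.25, 0.5, 2.0
    print("optimum log P:", optimum_log_p(a, b))
    for log_p in (1.0, 2.0, 3.0):
        print(log_p, hansch_parabolic(log_p, 0.0, a, b, c, k))
```

With these assumed coefficients, activity rises up to the optimum log P and falls again beyond it. That is the behavior the linear equation cannot reproduce over an extended hydrophobicity range.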
$$\log 1/C = a \log P - b \log(\beta P + 1) + k \qquad (1.4)$$

Besides the Hansch approach, other methodologies were also developed to tackle structure-activity questions. The Free-Wilson approach addresses structure-activity studies in a congeneric series as described in Equation 1.5 (22).

$$BA = \sum a_i x_i + u \qquad (1.5)$$

BA is the biological activity, $u$ is the average contribution of the parent molecule, and $a_i$ is the contribution of each structural feature; $x_i$ denotes the presence ($x_i = 1$) or absence ($x_i = 0$) of a particular structural fragment. Limitations in this approach led to the more sophisticated Fujita-Ban equation that used the logarithm of activity, which brought the activity parameter in line with other free energy-related terms (23).

$$\log BA = \sum G_i x_i + u \qquad (1.6)$$

In Equation 1.6, $u$ is defined as the calculated biological activity value of the unsubstituted parent compound of a particular series. $G_i$ represents the biological activity contribution of the substituents, whereas $x_i$ is ascribed with a value of one when the substituent is present or zero when it is absent. Variations on this activity-based approach have been extended by Klopman et al. (24) and Enslein et al. (25). Topological methods have also been used to address the relationships between molecular structure and physical/biological activity. The minimum topological difference (MTD) method of Simon and the extensive studies on molecular connectivity by Kier and Hall have contributed to the development of quantitative structure property/activity relationships (26, 27). Connectivity indices based on hydrogen-suppressed molecular structures are rich in information on branching, 3-atom fragments, the degree of substitution, proximity of substituents and length, and heteroatom of substituted rings. A method in its embryonic state of development uses both graph bond distances and Euclidean distances among atoms to calculate E-state values for each atom in a molecule that is sensitive to conformational structure. Recently, these electrotopological indices that encode significant structured information on the topological state of atoms and fragments as well as their valence electron content have been applied to biological and toxicity data (28). Other recent developments in QSAR include approaches such as HQSAR, Inverse QSAR, and Binary QSAR (29-32). Improved statistical tools such as partial least square (PLS) can handle situations where the number of variables overwhelms the number of molecules in a data set, which may have collinear X-variables (33).

1.2 Development of Receptor Theory

The central theme of molecular pharmacology, and the underlying basis of SAR studies, has focused on the elucidation of the structure and function of drug receptors. It is an endeavor that proceeds with unparalleled vigor, fueled by the developments in genomics. It is generally accepted that endogenous and exogenous chemicals interact with a binding site on a specific macromolecular receptor. This interaction, which is determined by intermolecular forces, may or may not elicit a pharmacological response depending on its eventual site of action.

The idea that drugs interacted with specific receptors began with Langley, who studied the mutually antagonistic action of the alkaloids, pilocarpine and atropine. He realized that both these chemicals interacted with some receptive substance in the nerve endings of the gland cells (34). Paul Ehrlich defined the receptor as the "binding group of the protoplasmic molecule to which a foreign newly introduced group binds" (35). In 1905 Langley's studies on the effects of curare on muscular contraction led to the first delineation of critical characteristics of a receptor: recognition capacity for certain ligands and an amplification component that results in a pharmacological response (36).

Receptors are mostly integral proteins embedded in the phospholipid bilayer of cell membranes. Rigorous treatment with detergents is needed to dissociate the proteins from the membrane, which often results in loss of
integrity and activity. Pure proteins such as enzymes also act as drug receptors. Their relative ease of isolation and amplification have made enzymes desirable targets in structure-based ligand design and QSAR studies. Nucleic acids comprise an important category of drug receptors. Nucleic acid receptors (aptamers), which interact with a diverse number of small organic molecules, have been isolated by in vitro selection techniques and studied (37). Recent binary complexes provide insight into the molecular recognition process in these biopolymers and also establish the importance of the architecture of tertiary motifs in nucleic acid folding (38). Groove-binding ligands such as lexitropsins hold promise as potential drugs and are thus suitable subjects for focused QSAR studies (39).

Over the last 20 years, extensive QSAR studies on ligand-receptor interactions have been carried out with most of them focusing on enzymes. Two recent developments have augmented QSAR studies and established an attractive approach to the elucidation of the mechanistic underpinnings of ligand-receptor interactions: the advent of molecular graphics and the ready availability of X-ray crystallography coordinates of various binary and ternary complexes of enzymes with diverse ligands and cofactors. Early studies with serine and thiol proteases (chymotrypsin, trypsin, and papain), alcohol dehydrogenase, and numerous dihydrofolate reductases (DHFR) not only established molecular modeling as a powerful tool, but also helped clarify the extent of the role of hydrophobicity in enzyme-ligand interactions (40-44). Empirical evidence indicated that the coefficients with the hydrophobic term could be related to the degree of desolvation of the ligand by critical amino acid residues in the binding site of an enzyme. Total desolvation, as characterized by binding in a deep crevice/pocket, resulted in coefficients of approximately 1.0 (0.9-1.1) (44). An extension of this agreement between the mathematical expression and structure as determined by X-ray crystallography led to the expectation that the binding of a set of substituents on the surface of an enzyme would yield a coefficient of about 0.5 (0.4-0.6) in the regression equation, indicative of partial desolvation.

Probing of various enzymes by different ligands also aided in dispelling the notion of Fischer's rigid lock-and-key concept, in which the ligand (key) fits precisely into a receptor (lock). Thus, a "negative" impression of the substrate was considered to exist on the enzyme surface (geometric complementarity). Unfortunately, this rigid model fails to account for the effects of allosteric ligands, and this encouraged the evolution of the induced-fit model. Thus, "deformable" lock-and-key models have gained acceptance on the basis of structural studies, especially NMR (45).

It is now possible to isolate membrane-bound receptors, although it is still a challenge to delineate their chemistry, given that separation from the membrane usually ensures loss of reactivity. Nevertheless, great advances have been made in this arena, and the three-dimensional structures of some membrane-bound proteins have recently been elucidated. To gain an appreciation for mechanisms of ligand-receptor interactions, it is necessary to consider the intermolecular forces at play. Considering the low concentration of drugs and receptors in the human body, the law of mass action cannot account for the ability of a minute amount of a drug to elicit a pronounced pharmacological effect. The driving force for such an interaction may be attributed to the low energy state of the drug-receptor complex: $K_D$ = [Drug][Receptor]/[Drug-Receptor Complex]. Thus, the biological activity of a drug is determined by its affinity for the receptor, which is measured by its $K_D$, the dissociation constant at equilibrium. A smaller $K_D$ implies a large concentration of the drug-receptor complex and thus a greater affinity of the drug for the receptor. The latter property is promoted and stabilized by mostly noncovalent interactions sometimes augmented by a few covalent bonds. The spontaneous formation of a bond between atoms results in a decrease in free energy; that is, $\Delta G$ is negative. The change in free energy $\Delta G$ is related to the equilibrium constant $K_{eq}$.

$$\Delta G^{\circ} = -2.303\,RT \log K_{eq} \qquad (1.7)$$

Thus, small changes in $\Delta G^{\circ}$ can have a profound effect on equilibrium constants.
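The sensitivity of an equilibrium constant to free energy can be made concrete with a short calculation. The sketch below uses the standard thermodynamic relation between free energy and the equilibrium constant (in its natural-log form, which is equivalent to the 2.303 RT log form). The function names, the choice of 310 K, and the example values are assumptions made for illustration.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def delta_g_from_k(k_eq: float, temp_k: float = 310.0) -> float:
    """Standard free energy change (kcal/mol) for an equilibrium constant."""
    return -R_KCAL * temp_k * math.log(k_eq)


def k_from_delta_g(delta_g: float, temp_k: float = 310.0) -> float:
    """Equilibrium constant corresponding to a free energy change (kcal/mol)."""
    return math.exp(-delta_g / (R_KCAL * temp_k))


def fold_change_in_k(delta_delta_g: float, temp_k: float = 310.0) -> float:
    """Factor by which K grows when delta G becomes more negative by delta_delta_g."""
    return math.exp(delta_delta_g / (R_KCAL * temp_k))


if __name__ == "__main__":
    # At an assumed 310 K, about 1.4 kcal/mol corresponds to a tenfold change in K.
    for ddg in (0.5, 1.42, 2.84):
        print(f"{ddg:5.2f} kcal/mol -> {fold_change_in_k(ddg):7.1f}-fold change in K")
```

Under these assumptions, a shift of roughly 1.4 kcal/mol, which is within the range of a single hydrogen bond, changes the equilibrium constant by about an order of magnitude.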
[Table 1.1: list of bond types; only the entries "3. Hydrogen" and "6. Hydrophobic" survive]

In the broadest sense, these "bonds" would include covalent, ionic, hydrogen, dipole-dipole, van der Waals, and hydrophobic interactions. Most drug-receptor interactions constitute a combination of the bond types listed in Table 1.1, most of which are reversible under physiological conditions.

Covalent bonds are not as important in drug-receptor binding as noncovalent interactions. Alkylating agents in chemotherapy tend to react and form an immonium ion, which then alkylates proteins, preventing their normal participation in cell divisions. Baker's concept of active site directed irreversible inhibitors was well established by covalent formation of Baker's antifolate and dihydrofolate reductase (46).

Ionic (electrostatic) interactions are formed between ions of opposite charge with energies that are nominal and that tend to fall off with distance. They are ubiquitous and because they act across long distances, they play a prominent role in the actions of ionizable drugs. The strength of an electrostatic force is directly dependent on the charge of each ion and inversely dependent on the dielectric constant of the solvent and the distance between the charges.

Hydrogen bonds are ubiquitous in nature: their multiple presence contributes to the stability of the α-helix and base-pairing in DNA. Hydrogen bonding is based on an electrostatic interaction between the nonbonding electrons of a heteroatom (e.g., N, O, S) and the electron-deficient hydrogen atom of an -OH, SH, or NH group. Hydrogen bonds are strongly directional, highly dependent on the net degree of solvation, and rather weak, having energies ranging from 1 to 10 kcal/mol (47, 48). Bonds with this type of strength are of critical importance because they are stable enough to provide significant binding energy but weak enough to allow for quick dissociation. The greater electronegativity of atoms such as oxygen, nitrogen, sulfur, and halogen, compared to that of carbon, causes bonds between these atoms to have an asymmetric distribution of electrons, which results in the generation of electronic dipoles. Given that so many functional groups have dipole moments, ion-dipole and dipole-dipole interactions are frequent. The energy of dipole-dipole interactions can be described by Equation 1.8, where $\mu$ is the dipole moment, $\theta$ is the angle between the two poles of the dipole, $D$ is the dielectric constant of the medium and $r$ is the distance between the charges involved in the dipole.
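Equation 1.8 itself is not reproduced in the text above. As a hedged reference point only, one standard textbook expression for the interaction energy of two point dipoles in a medium of dielectric constant $D$ is sketched below. The angles $\theta_1$ and $\theta_2$ (the inclination of each dipole to the line joining their centers) and $\phi$ (the dihedral angle between the two dipoles) are notation assumed here, and the chapter's own Equation 1.8 may be a simplified special case of this form.

```latex
E = -\frac{\mu_1 \mu_2}{D\,r^{3}}
    \left( 2\cos\theta_1 \cos\theta_2 - \sin\theta_1 \sin\theta_2 \cos\phi \right)
```

In this form the energy falls off as $1/r^3$, is reduced by a larger dielectric constant, and is most favorable when the two dipoles lie head to tail along the line joining them.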
Although electrostatic interactions are generally restricted to polar molecules, there are also strong interactions between nonpolar molecules over small intermolecular distances. Dispersion or London/van der Waals forces are the universal attractive forces between atoms that hold nonpolar molecules together in the liquid phase. They are based on polarizability and these fluctuating dipoles or shifts in electron clouds of the atoms tend to induce opposite dipoles in adjacent molecules, resulting in a net overall attraction. The energy of this interaction decreases very rapidly in proportion to $1/r^6$, where $r$ is the distance separating the two molecules. These van der Waals forces operate at a distance of about 0.4-0.6 nm and exert an attraction force of less than 0.5 kcal/mol. Yet, although individual van der Waals forces make a low energy contribution to an event, they become significant and additive when summed up over a large area with close surface contact of the atoms.

Hydrophobicity refers to the tendency of nonpolar compounds to transfer from an aqueous phase to an organic phase (49, 50). When a nonpolar molecule is placed in water, it gets solvated by a "sweater" of water molecules ordered in a somewhat icelike manner. This increased order in the water molecules surrounding the solute results in a loss of entropy. Association of hydrocarbon molecules leads to a "squeezing out" of the structured water molecules. The displaced water becomes bulk water, less ordered, resulting in a gain in entropy, which provides the driving force for what has been referred to as a hydrophobic bond. Although this is a generally accepted view of hydrophobicity, the hydration of apolar molecules and the noncovalent interactions between these molecules in water are still poorly understood and thus the source of continued examination (51-53).

Because noncovalent interactions are generally weak, cooperativity by several types of interactions is essential for overall activity. Enthalpy terms will be additive, but once the first interaction occurs, translational entropy is lost. This results in a reduced entropy loss in the second interaction. The net result is that eventually several weak interactions combine to produce a strong interaction. One can safely state that it is the involvement of myriad interactions that contribute to the overall selectivity of drug-receptor interactions.

2 TOOLS AND TECHNIQUES OF QSAR

2.1 Biological Parameters

In QSAR analysis, it is imperative that the biological data be both accurate and precise to develop a meaningful model. It must be realized that any resulting QSAR model that is developed is only as valid statistically as the data that led to its development. The equilibrium constants and rate constants that are used extensively in physical organic chemistry and medicinal chemistry are related to free energy values $\Delta G$. Thus for use in QSAR, standard biological equilibrium constants such as $K_i$ or $K_m$ should be used in QSAR studies. Likewise only standard rate constants should be deemed appropriate for a QSAR analysis. Percentage activities (e.g., % inhibition of growth at certain concentrations) are not appropriate biological endpoints because of the nonlinear characteristic of dose-response relationships. These types of endpoints may be transformed to equieffective molar doses. Only equilibrium and rate constants pass muster in terms of the free-energy relationships or influence on QSAR studies. Biological data are usually expressed on a logarithmic scale because of the linear relationship between response and log dose in the midregion of the log dose-response curve. Inverse logarithms for activity (log 1/C) are used so that higher values are obtained for more effective analogs. Various types of biological data have been used in QSAR analysis. A few common endpoints are outlined in Table 1.2.

Biological data should pertain to an aspect of biological/biochemical function that can be measured. The events could be occurring in enzymes, isolated or bound receptors, in cellular systems, or whole animals. Because there is considerable variation in biological responses, test samples should be run in duplicate or preferably triplicate, except in whole animal studies where assay conditions (e.g., plasma concentrations of a drug) preclude such measurements.
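The inverse-logarithm convention for activity described above can be sketched in a few lines. The helper names and the example concentrations below are hypothetical; the only assumption carried over from the text is that C is an equieffective molar concentration.

```python
import math


def log_inverse_c(concentration_molar: float) -> float:
    """Return log10(1/C) for an equieffective concentration C given in mol/L."""
    if concentration_molar <= 0:
        raise ValueError("concentration must be positive")
    return -math.log10(concentration_molar)


def log_inverse_c_from_micromolar(concentration_um: float) -> float:
    """Convenience wrapper for concentrations reported in micromolar units."""
    return log_inverse_c(concentration_um * 1e-6)


if __name__ == "__main__":
    # Hypothetical analogs: the more potent one needs a lower concentration
    # for the same effect and therefore gets the higher log 1/C value.
    for name, c_um in (("analog A", 1.0), ("analog B", 0.01)):
        print(name, round(log_inverse_c_from_micromolar(c_um), 2))
```

Because the transformation is applied to molar concentrations, activities of compounds with different molecular weights are compared on an equimolar basis, and more effective analogs receive larger values.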
Table 1.2 Types of Biological Data Utilized in QSAR Analysis

Source of Activity              Biological Parameters
1. Isolated receptors
   Rate constants               Log k (subscripted forms illegible in source)
   Michaelis-Menten constants   Log 1/K_m
   Inhibition constants         Log 1/K_i
   Affinity data                pA2; pA1
2. Cellular systems
   Inhibition constants         Log 1/IC50
   Cross resistance             Log CR
   In vitro biological data     Log 1/C
   Mutagenicity states          Log TA98
3. "In vivo" systems
   Bioconcentration factor      Log BCF
   In vivo reaction rates       Log I (Induction)
   Pharmacodynamic rates        Log T (total clearance)

It is also important to design a set of molecules that will yield a range of values in terms of biological activities. It is understandable that most medicinal chemists are reluctant to synthesize molecules with poor activity, even though these data points are important in developing a meaningful QSAR. Generally, the larger the range (>2 log units) in activity, the easier it is to generate a predictive QSAR. This kind of equation is more forgiving in terms of errors of measurement. A narrow range in biological activity is less forgiving in terms of accuracy of data. Another factor that merits consideration is the time structure. Should a particular reading be taken after 48 or 72 h? Knowledge of cell cycles in cellular systems or biorhythms in animals would be advantageous.

Each single step of drug transport, binding, and metabolism involves some form of partitioning between an aqueous compartment and a nonaqueous phase, which could be a membrane, serum protein, receptor, or enzyme. In the case of isolated receptors, the endpoint is clear-cut and the critical step is evident. But in more complex systems, such as cellular systems or whole animals, many localized steps could be involved in the random-walk process and the eventual interaction with a target. Usually the observed biological activity is reflective of the slow step or the rate-determining step.

To determine a defined biological response (e.g., IC50), a dose-response curve is first established. Usually six to eight concentrations are tested to yield percentages of activity or inhibition between 20 and 80%, the linear portion of the curve. Using the curves, the dose responsible for an established effect can easily be determined. This procedure is meaningful if, at the time the response is measured, the system is at equilibrium, or at least under steady-state conditions.

Other approaches have been used to apply the additivity concept and ascertain the binding energy contributions of various substituent (R) groups. Fersht et al. have measured the binding energies of various alkyl groups to aminoacyl-tRNA synthetases (54). Thus the $\Delta G$ values for methyl, ethyl, isopropyl, and thio substituents were determined to be 3.2, 6.5, 9.6, and 5.4 kcal/mol, respectively.

An alternative, generalized approach to determining the energies of various drug-receptor interactions was developed by Andrews et al. (55), who statistically examined the drug-receptor interactions of a diverse set of molecules in aqueous solution. Using Equation 1.9, a relationship was established between $\Delta G$ and $E_X$ (intrinsic binding energy), $E_{DOF}$ (energy of average entropy loss), and the $\Delta S_{rt}$ (energy of rotational and translational entropy loss).

$$\Delta G = T\Delta S_{rt} + n_{DOF}E_{DOF} + \sum n_X E_X \qquad (1.9)$$

$\sum E_X$ denotes the sum of the intrinsic binding energy of each functional group of which $n_X$ are present in each drug in the set. Using Equation 1.9, the average binding energies for various functional groups were calculated. These energies followed a particular trend with charged groups showing stronger interactions and nonpolar entities, such as sp2, sp3 carbons, contributing very little. The applicability of this approach to specific drug-receptor interactions remains to be seen.

2.2 Statistical Methods: Linear Regression Analysis

The most widely used mathematical technique in QSAR analysis is multiple regression
analysis (MRA). We will consider some of the basic tenets of this approach to gain a firm understanding of the statistical procedures that define a QSAR. Regression analysis is a powerful means for establishing a correlation

$$\sum_{i=1}^{n} E_i^2 = \sum \Delta^2 = SS = \sum_{i=1}^{n} (Y_{obs} - Y_{calc})^2$$

Thus,

$$SS = \sum_{i=1}^{n} (Y_{obs} - aX_i - b)^2 \qquad (1.15)$$

Expanding Equation 1.15, we obtain

$$SS = \sum_{i=1}^{n} \left( Y_{obs}^2 - 2aX_iY_{obs} - 2bY_{obs} + a^2X_i^2 + 2abX_i + b^2 \right)$$

The solution of these simultaneous equations yields a and b. More thorough analyses of these procedures have been examined in detail (19, 58-60). The following simple example, illustrated by Table 1.3, will illustrate the nuances of a linear regression analysis.
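As a minimal sketch of the least-squares procedure described here, the following Python function minimizes SS in Equation 1.15 for a single descriptor. It assumes the usual closed-form solution of the two simultaneous equations obtained by setting the partial derivatives of SS with respect to a and b to zero. The function name `fit_line` and the data points are hypothetical and are not taken from Table 1.3.

```python
def fit_line(x, y):
    """Least-squares fit of y = a*x + b. Returns (a, b, SS)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two paired observations")
    n = len(x)
    sum_x = sum(x)
    sum_y = sum(y)
    sum_xx = sum(xi * xi for xi in x)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    denom = n * sum_xx - sum_x ** 2
    if denom == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    # Closed-form solution of the two simultaneous (normal) equations.
    a = (n * sum_xy - sum_x * sum_y) / denom
    b = (sum_y - a * sum_x) / n
    # Residual sum of squares, as in Equation 1.15.
    ss = sum((yi - a * xi - b) ** 2 for xi, yi in zip(x, y))
    return a, b, ss


# Hypothetical descriptor values (x) and activities (y), for illustration only.
x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 5.0, 7.0]
a, b, ss = fit_line(x, y)
print(a, b, ss)
```

With real data, SS is greater than zero, and its size relative to the total variance of the observed activities indicates how well the line fits.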
History of Quantitative Structure-Activity Relationships
may be attributed to inaccuracies in the testing procedure (usually dilution errors) or unusual behavior. They often provide valuable information in terms of the mechanistic interpretation of a QSAR model. They could be participating in some intermolecular interaction that is not available to other members of the data set or have a drastic change in mechanism.

2.3 Compound Selection

In setting up to run a QSAR analysis, compound selection is an important angle that needs to be addressed. One of the earliest manual methods was an approach devised by Craig, which involves two-dimensional plots of important physicochemical properties. Care is taken to select substituents from all four quadrants of the plot (63). The Topliss operational scheme allows one to start with two compounds and construct a potency tree that grows branches as the substituent set is expanded in a stepwise fashion (64). Topliss later proposed a batchwise scheme including certain substituents such as the 3,4-Cl2, 4-Cl, 4-CH3, 4-OCH3, and 4-H analogs (65). Other methods of manual substituent selection include the Fibonacci search method, sequential simplex strategy, and parameter focusing by Magee (66-68).

One of the earliest computer-based and statistical selection methods, cluster analysis was devised by Hansch to accelerate the process and diversity of the substituents (1). Newer methodologies include D-optimal designs, which focus on the use of det (X'X), the variance-covariance matrix. The determinant of this matrix yields a single number, which is maximized for compounds expressing maximum variance and minimum covariance (69-71). A combination of fractional factorial design in tandem with a principal property approach has proven useful in QSAR (72). Extensions of this approach using multivariate design have shown promise in environmental QSAR with nonspecific responses, where the clusters overlap and a cluster-based design approach has to be used (73). With strongly clustered data containing several classes of compounds, a new strategy involving local multivariate designs within each cluster is described. The chosen compounds from the local designs are grouped together in the overall training set that is representative of all clusters (74).

3 PARAMETERS USED IN QSAR

3.1 Electronic Parameters

Parameters are of critical importance in determining the types of intermolecular forces that underlie drug-receptor interactions. The three major types of parameters that were initially suggested and still hold sway are electronic, hydrophobic, and steric in nature (20, 75). Extensive studies using electronic parameters reveal that electronic attributes of molecules are intimately related to their chemical reactivities and biological activities. A search of a computerized QSAR database reveals the following: the common Hammett constants (σ, σ+, σ−) account for 7000/8500 equations in the Physical organic chemistry (PHYS) database and nearly 1600/8000 in the Biology (BIO) database, whereas quantum chemical indices such as HOMO, LUMO, BDE, and polarizability appear in 100 equations in the BIO database (76).

The extent to which a given reaction responds to electronic perturbation constitutes a measure of the electronic demands of that reaction, which is determined by its mechanism. The introduction of substituent groups into the framework and the subsequent alteration of reaction rates helps delineate the overall mechanism of reaction. Early work examining the electronic role of substituents on rate constants was first tackled by Burckhardt and firmly established by Hammett (13, 14, 77, 78). Hammett employed, as a model reaction, the ionization in water of substituted benzoic acids and determined their equilibrium constants Ka. See Equation 1.28. This led to an operational definition of σ, the substituent constant. It is a measure of the size of the electronic effect for a given substituent and represents a measure of electronic charge distribution in the benzene nucleus.

Electron-withdrawing substituents are thus
characterized by positive values, whereas electron-donating ones have negative values. In an extension of this approach, the ionization of substituted phenylacetic acids was measured.

The effect of the 4-Cl substituent on the ionization of 4-Cl phenylacetic acid (PA) was found to be proportional to its effect on the ionization of 4-Cl benzoic acid (BA).

$$\log\frac{k_X}{k_H} = \rho\,\log\frac{K_X}{K_H} = \rho\sigma \qquad (1.32)$$

Although this expression is empirical in nature, it has been validated by the sheer volume of positive results. It is remarkable because four different energy states must be related. A correlation of this type is clearly meaningful; it suggests that changes in structure produce proportional changes in the activation energy ΔG‡ for such reactions. Hence, the derivation of the name for which the Hammett equation is universally known: linear free energy relationship (LFER). Equation 1.32 has become known as the Hammett equation and has been applied to thousands of reactions that take place at or near the benzene ring bearing substituents at the meta and para positions. Because of proximity and steric effects, ortho-substituted molecules do not always follow this maxim and are subject to different parameterizations. Thus, an expanded approach was established by Charton (79) and Fujita and Nishioka (80). Charton partitioned the ortho electronic effect into its inductive, resonance, and steric contributions; the factors α, β, and X are susceptibility or reaction constants and h is the intercept.

$$\log k = \alpha\sigma_I + \beta\sigma_R + X r_v + h \qquad (1.33)$$
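The way Equation 1.32 is used in practice can be sketched in a few lines of Python. This is only an illustration under stated assumptions: σ is taken as log(K_X/K_H) for the benzoic acid reference reaction, which equals pKa(H) − pKa(X), and ρ for another reaction series is estimated by least squares through the origin, since the unsubstituted compound must give log(k_X/k_H) = 0. The function names and all numerical values are hypothetical.

```python
def sigma_from_pka(pka_parent, pka_substituted):
    """Substituent constant from the reference ionization:
    sigma = log(K_X / K_H) = pKa(H) - pKa(X)."""
    return pka_parent - pka_substituted


def hammett_log_ratio(rho, sigma):
    """Predicted log(k_X / k_H) from Equation 1.32."""
    return rho * sigma


def fit_rho(sigmas, log_ratios):
    """Least-squares estimate of rho for a line forced through the origin."""
    if len(sigmas) != len(log_ratios) or not sigmas:
        raise ValueError("need paired, non-empty data")
    denom = sum(s * s for s in sigmas)
    if denom == 0:
        raise ValueError("all sigma values are zero; rho is undefined")
    return sum(s * r for s, r in zip(sigmas, log_ratios)) / denom


# Hypothetical data for one reaction series (not taken from the text).
sigmas = [-0.2, 0.0, 0.3, 0.8]
log_ratios = [-0.5, 0.0, 0.75, 2.0]
rho = fit_rho(sigmas, log_ratios)
print(rho, hammett_log_ratio(rho, 0.5))
```

A positive ρ from such a fit indicates that the reaction is accelerated by electron-withdrawing substituents (positive σ), and a negative ρ indicates the opposite.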
para substituents and the vacant p-orbital in the transition state, which led to deviations in the Hammett plot (85). They defined a modified LFER applicable to this situation.

$$\log\frac{k_Y}{k_H} = (\rho^+)(\sigma^+)$$

σ+ was a new substituent constant that expressed enhanced resonance attributes. A similar situation was noticed when a strong donor center was present as a reactant or formed as a product (e.g., phenols and anilines). In this case, strong resonance interactions were possible with electron-withdrawing groups (e.g., NO2 or CN). A scale for such substituents was constructed such that

One shortcoming of the benzoic acid system is the extent of coupling between the carboxyl group and certain lone-pair donors. Insertion of a methylene group between the core (benzene ring) and the functional group (COOH moiety) leads to phenylacetic acids and the establishment of the σ0 scale from the ionization of X-phenylacetic acids. A flexible method of dealing with the variability of the resonance contribution to the overall electronic demand of a reaction is embodied in the Yukawa-Tsuno equation (86). It includes normal and enhanced resonance contributions to

$$\log\frac{k_X}{k_H} = \rho\,[\sigma + r(\sigma^+ - \sigma)] \qquad (1.37)$$

where r is a measure of the degree of enhanced resonance interaction in relation to benzoic acid dissociations (r = 0) and cumyl chloride hydrolysis (r = 1).

Most of the Hammett-type constants pertain to aromatic systems. In evaluating an electronic parameter for use in aliphatic systems, Taft used the relative acid and base hydrolysis rates for esters. He developed equation 1.38 as a measure of the inductive effect (σ*) of a substituent R' in the ester R'COOR, where B and A refer to basic and acidic hydrolysis, respectively.

The factor of 2.48 was used to make σ* equiscalar with Hammett σ values. Later, a σI scale derived from the ionization of 4-X-bicyclo[2.2.2]octane-1-carboxylic acids was shown to be related to σ* (87, 88). It is now more widely used than σ*.

Ionization is a function of the electronic structure of an organic drug molecule. Albert was the first to clearly delineate the relationship between ionization and biological activity (89). Now, pKa values are widely used as the independent variable in physical organic reactions and in biological systems, particularly when dealing with transport phenomena. However, caution must be exercised in interpreting the dependency of biological activity on pKa values because pKa values are inherently composites of electronic factors that are used directly in QSAR analysis.

In recent years, there has been a rapid growth in the application of quantum chemical methodology to QSAR, by direct derivation of electronic descriptors from the molecular wave functions (90). The two most popular methods used for the calculation of quantum chemical descriptors are ab initio (Hartree-Fock) and semiempirical methods. As in other electronic parameters, QSAR models incorporating quantum chemical descriptors will include information on the nature of the intermolecular forces involved in the biological response. Unlike other electronic descriptors, there is no statistical error in quantum chemical computations. The errors are usually made in the assumptions that are established to facilitate calculation (91). Quantum chemical descriptors such as net atomic charges, highest occupied molecular orbital/lowest unoccupied molecular orbital (HOMO-LUMO) energies, frontier orbital electron densities, and superdelocalizabilities have been shown
to correlate well with various biological activities (92). A mixed approach using frontier orbital theory and topological parameters has been used to calculate Hammett-like substituent constants (93).

In Equation 1.40, ΔN represents the extent of electron transfer between interacting acid-base systems; ΔE is the energy decrease in bimolecular systems underlying electron transfer; D X D H (EAH/EAx) corresponds to electron affinity and distance terms; and OS, factors the electrotopological state index, whereas E a is the number of all a-electrons in the functional group. Observed principal component analysis (PCA) clustering of 66 descriptors derived from AM1 calculations was similar to that previously reported for monosubstituted benzenes (94, 95). The advantages of quantum chemical descriptors are that they have definite meaning and are useful in the elucidation of intra- and intermolecular interactions and can easily be derived from the theoretical structure of the molecule.

3.2 Hydrophobicity Parameters

More than a hundred years ago, Meyer and Overton made their seminal discovery on the correlation between oil/water partition coefficients and the narcotic potencies of small organic molecules (7, 8). Ferguson extended this analysis by placing the relationship between depressant action and hydrophobicity in a thermodynamic context; the relative saturation of the depressant in the biophase was a critical determinant of its narcotic potency (9). At this time, the success of the Hammett equation began to permeate structure-activity studies and hydrophobicity as a determinant was relegated to the background. In a landmark study, Hansch and his colleagues devised and used a multiparameter approach that included both electronic and hydrophobic terms, to establish a QSAR for a series of plant growth regulators (16). This study laid the basis for the development of the QSAR paradigm and also firmly established the importance of lipophilicity in biosystems. Over the last 40 years, no other parameter used in QSAR has generated more interest, excitement, and controversy than hydrophobicity (96). Hydrophobic interactions are of critical importance in many areas of chemistry. These include enzyme-ligand interactions, the assembly of lipids in biomembranes, aggregation of surfactants, coagulation, and detergency (97-100). The integrity of biomembranes and the tertiary structure of proteins in solution are determined by apolar-type interactions.

Molecular recognition depends strongly on hydrophobic interactions between ligands and receptors. Excellent treatises on this subject have been written by Taylor (101) and Blokzijl and Engberts (51). Despite extensive usage of the term hydrophobic bond, it is well known that there is no strong attractive force between apolar molecules (102). Frank and Evans were the first to apply a thermodynamic treatment to the solvation of apolar molecules in water at room temperature (103). Their "iceberg" model suggested that a large entropic loss ensued after the dissolution of apolar compounds and the increased structure of water molecules in the surrounding apolar solute. The quantitation of this model led to the development of the "flickering" cluster model of Némethy and Scheraga, which emphasized the formation of hydrogen bonds in liquid water (104). The classical model for hydrophobic interactions was delineated by Kauzmann to describe the van der Waals attractions between the nonpolar parts of two molecules immersed in water. Given that van der Waals forces operate over short distances, the water molecules are squeezed out in the vicinity of the mutually bound apolar surfaces (49). The driving force for this behavior is not that alkanes "hate" water, but rather water that "hates" alkanes (105, 106). Thus, the gain in entropy appears as the critical driving force for hydrophobic interactions that are primarily governed by the repulsion of hydrophobic
solutes from the solvent water and the limited but important capacity of water to maintain its network of hydrogen bonds.

Hydrophobicities of solutes can readily be determined by measuring partition coefficients designated as P. Partition coefficients deal with neutral species, whereas distribution ratios incorporate concentrations of charged and/or polymeric species as well. By convention, P is defined as the ratio of concentration of the solute in octanol to its concentration in water.

It was fortuitous that octanol was chosen as the solvent most likely to mimic the biomembrane. Extensive studies over the last 35 years (40,000 experimental P-values in 400 different solvent systems) have failed to dislodge octanol from its secure perch (107, 108).

Octanol is a suitable solvent for the measurement of partition coefficients for many reasons (109, 110). It is cheap, relatively nontoxic, and chemically unreactive. The hydroxyl group has both hydrogen bond acceptor and hydrogen bond donor features capable of interacting with a large variety of polar groups. Despite its hydrophobic attributes, it is able to dissolve many more organic compounds than can alkanes, cycloalkanes, or aromatic hydrocarbons. It is UV transparent over a large range and has a vapor pressure low enough to allow for reproducible measurements. It is also elevated enough to allow for its removal under mild conditions. In addition, water saturated with octanol contains only 10^-3 M octanol at equilibrium, whereas octanol saturated with water contains 2.3 M of water. Thus, polar groups need not be totally dehydrated in transfer from the aqueous phase to the organic phase. Likewise, hydrophobic solutes are not appreciably solvated by the 10^-3 M octanol in the water phase unless their intrinsic log P is above 6.0. Octanol begins to absorb light below 220 nm and thus solute concentration determinations can be monitored by UV spectroscopy. More important, octanol acts as an excellent mimic for biomembranes because it shares the traits of amphiphilicity and hydrogen-bonding capability with phospholipids and proteins found in biological membranes.

The choice of the octanol/water partitioning system as a standard reference for assessing the compartmental distribution of molecules of biological interest was recently investigated by molecular dynamics simulations (111). It was determined that pure 1-octanol contains a mix of hydrogen-bonded "polymeric" species, mostly four-, five-, and six-membered ring clusters at 40°C. These small ring clusters form a central hydroxyl core from which their corresponding alkyl chains radiate outward. On the other hand, water-saturated octanol tends to form well-defined, inverted, micellar aggregates. Long hydrogen-bonded chains are absent and water molecules congregate around the octanol hydroxyls. "Hydrophilic channels" are formed by cylindrical formation of water and octanol hydroxyls with the alkyl chains extending outward. Thus, water-saturated octanol has centralized polar cores where polar solutes can localize. Hydrophobic solutes would migrate to the alkyl-rich regions. This is an elegant study that provides insight into the partitioning of benzene and phenol by analyzing the structure of the octanol/water solvation shell and delineating octanol's capability to serve as a surrogate for biomembranes.

The shake-flask method, so-called, is most commonly used to measure partition coefficients with great accuracy and precision and with a log P range that extends from -3 to +6 (112, 113). The procedure calls for the use of pure, distilled, deionized water, high-purity octanol, and pure solutes. At least three concentration levels of solute should be analyzed and the volumes of octanol and water should be varied according to a rough estimate of the log P value. Care should be exercised to ensure that the eventual amounts of the solute in each phase are about the same after equilibrium. Standard concentration curves using three to four known concentrations in water saturated with octanol are usually established. Generally, most methods employ a UV-based procedure, although GC and HPLC may also be used to quantitate the concentration of the solute.
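The arithmetic behind a shake-flask determination can be sketched as follows. This is a minimal Python sketch that assumes P is the ratio of the equilibrium solute concentration in octanol to that in water, as defined above. The second function, which infers the octanol concentration from the loss of solute in the aqueous phase by mass balance, is an assumption added for illustration and ignores losses such as adsorption to glass. All names and numbers are hypothetical.

```python
import math


def log_p(conc_octanol, conc_water):
    """log P from equilibrium concentrations measured in both phases
    (same units in both phases)."""
    if conc_octanol <= 0 or conc_water <= 0:
        raise ValueError("concentrations must be positive")
    return math.log10(conc_octanol / conc_water)


def log_p_from_depletion(conc_water_initial, conc_water_final,
                         vol_water, vol_octanol):
    """log P when only the aqueous phase is analyzed.

    The amount of solute that left the water is assumed to be entirely
    in the octanol phase (simple mass balance)."""
    if vol_water <= 0 or vol_octanol <= 0:
        raise ValueError("volumes must be positive")
    amount_in_octanol = (conc_water_initial - conc_water_final) * vol_water
    conc_octanol = amount_in_octanol / vol_octanol
    return log_p(conc_octanol, conc_water_final)


# Hypothetical measurements, for illustration only.
print(log_p(100.0, 1.0))
print(log_p_from_depletion(1.0, 0.5, vol_water=10.0, vol_octanol=1.0))
```

The depletion variant shows why the phase volumes are adjusted to the expected log P: a tenfold excess of water over octanol leaves comparable amounts of a solute with P = 10 in each phase, which keeps both measurements within a reliable range.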
Generally, 110-mL stopped centrifuge tubes or 200-mL centrifuge bottles are used. They are inverted gently for 2-3 min and then centrifuged at 1000-2000 g for 20 min before the phases are analyzed. Analysis of both phases is highly recommended, to minimize errors incurred by adsorption to glass walls at low solute concentration. For highly hydrophobic compounds, the slow stirring procedure of de Bruijn and Hermens is recommended (114). The filler probe extractor system of Tomlinson et al. is a modified, automated, shake-flask method, which is efficient, fast, reliable, and flexible (115).

Partition coefficients from different solvent systems can also be compared and converted to the octanol/water scale, as was suggested by Collander (116). He stressed the importance of the following linear relationship: log P2 = a log P1 + b. This type of rela-

donor, and proton acceptor, and they were represented by alkanes, octanol, chloroform, and propylene glycol dipelargonate (PGDP), respectively. The demands of measuring four partition coefficients for each solute have slowed progress in this particular area.

3.2.1 Determination of Hydrophobicity by Chromatography. Chromatography provides an alternate tool for the estimation of hydrophobicity parameters. R_M values derived from thin-layer chromatography provide a simple, rapid, and easy way to ascertain approximate values of hydrophobicity (122, 123).

Other recent developments in chromatography techniques have led to the development
neutral and weakly acidic and basic drugs, revealed an excellent correlation between log P_oct and log Kw values (129). Log P_oct values determined in this system are referred to as Elog P_oct. They were expressed in terms of solvation parameters.

In this equation, R2 is the excess molar refraction; π2^H is the dipolarity/polarizability; Σα2^H and Σβ2^O are the summation of hydrogen bond acidity and basicity values, respectively; and Vx is McGowan's volume.

3.2.2 Calculation Methods. Partition coefficients are additive-constitutive, free energy-related properties. Log P represents the overall hydrophobicity of a molecule, which includes the sum of the hydrophobic contributions of the "parent" molecule and its substituent. Thus, the π value for a substituent may be defined as

beginning that not all hydrogens on aromatic systems could be substituted without correction factors because of strong electronic interactions. It became necessary to determine π values in various electron-rich and -deficient systems (e.g., X-phenols and X-nitrobenzenes). Correction factors were introduced for special features such as unsaturation, branching, and ring fusion. The proliferation of π-scales made it difficult to ascertain which system was more appropriate for usage, particularly with complex structures.

The shortcomings of this approach provided the impetus for Nys and Rekker to design the fragmental method, a "reductionist" approach, which was based on the statistical analysis of a large number of measured partition coefficients and the subsequent assignment of appropriate values for particular molecular fragments (118, 134). Hansch and Leo took a "constructionist" approach and developed a fragmental system that included correction factors for bonds and proximity effects (1, 135). Labor-intensive efforts and inconsistency in manual calculations were eliminated with the debut of the automated system CLOGP and its powerful SMILES notation (136-138). Recent analysis of the accuracy of CLOGP yielded Equation 1.48 (139).

MLOGP = 0.959 CLOGP + 0.08 (1.48)
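The two relationships in this subsection lend themselves to a short numerical sketch in Python. It assumes the usual Hansch-Fujita definition of the substituent constant, π_X = log P(R-X) − log P(R-H), and applies Equation 1.48 exactly as printed. The function names are hypothetical. The benzene and chlorobenzene log P values are commonly quoted figures used only for illustration.

```python
def pi_value(log_p_substituted, log_p_parent):
    """Hydrophobic substituent constant: pi_X = log P(R-X) - log P(R-H)."""
    return log_p_substituted - log_p_parent


def mlogp_from_clogp(clogp):
    """Measured log P estimated from a calculated value via Equation 1.48."""
    return 0.959 * clogp + 0.08


# Commonly quoted octanol/water log P values, for illustration only.
log_p_benzene = 2.13
log_p_chlorobenzene = 2.84

pi_cl = pi_value(log_p_chlorobenzene, log_p_benzene)
print(round(pi_cl, 2))

# Additivity: estimate log P of another parent bearing the same substituent.
hypothetical_parent_log_p = 1.50
print(round(hypothetical_parent_log_p + pi_cl, 2))

print(round(mlogp_from_clogp(2.0), 3))
```

The additivity step is exactly where the correction factors discussed above become necessary: a π value taken from the benzene system transfers poorly to electron-rich or electron-deficient parents.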
methods are based on molecular fragments, atomic contributions, or computer-identified fragments (1, 106, 107, 144-147). Whole-molecule approaches use molecular properties or spatial properties to predict log P values (148-150). They run on different platforms (e.g., Mac, PC, Unix, VAX, etc.) and use different calculation procedures. An extensive, recent review by Mannhold and van de Waterbeemd addresses the advantages and limitations of the various approaches (143). Statistical parameters yield some insight as to the effectiveness of such programs.

Recent attempts to compute log P calculations have resulted in the development of solvatochromic parameters (151, 152). This approach was proposed by Kamlet et al. and focused on molecular properties. In its simplest form it can be expressed as follows:

hydrogen bond donor strength, respectively; and e is the intercept. An extension of this model has been formulated by Abraham and used by researchers to refine molecular descriptors and characterize hydrophobicity scales (153-156).

3.3 Steric Parameters

The quantitation of steric effects is complex at best and challenging in all other situations, particularly at the molecular level. An added level of confusion comes into play when attempts are made to delineate size and shape. Nevertheless, sterics are of overwhelming importance in ligand-receptor interactions as well as in transport phenomena in cellular systems. The first steric parameter to be quantified and used in QSAR studies was Taft's Es constant (157). Es is defined as
respectively. To correct for hyperconjugation in the α-hydrogens of the acetate moiety, Hancock devised a correction on Es such that

In Equation 1.51, n represents the number of α-hydrogens and 0.306 is a constant derived from molecular orbital calculations (158). Unfortunately, the limited availability of Es and Es^c values for a great number of substituents precludes their usage in QSAR studies. Charton demonstrated a strong correlation between Es and van der Waals radii, which led to his development of the upsilon parameter υ_X (159).

where r_X and r_H are the minimum van der Waals radii of the substituent and hydrogen, respectively. Extension of this approach from symmetrical substituents to nonsymmetrical substituents must be handled with caution.

One of the most widely used steric parameters is molar refraction (MR), which has been aptly described as a "chameleon" parameter by Tute (160). Although it is generally considered to be a crude measure of overall bulk, it does incorporate a polarizability component that may describe cohesion and is related to London dispersion forces as follows: MR = 4πNα/3, where N is Avogadro's number and α is the polarizability of the molecule. It contains no information on shape. MR is also defined by the Lorentz-Lorenz equation:

MR is generally scaled by 0.1 and used in biological QSAR, where intermolecular effects are of primary importance. The refractive index of the molecule is represented by n. With alkyl substituents, there is a high degree of collinearity with hydrophobicity; hence, care must be taken in the QSAR analysis of such derivatives. The MR descriptor does not distinguish shape; thus the MR value for amyl (-CH2CH2CH2CH2CH3) is the same as that for [-C(Et)(CH3)2]: 2.42. The coefficients with MR terms challenge interpretation, although extensive experience with this parameter suggests that a negative coefficient implies steric hindrance at that site and a positive coefficient attests to either dipolar interactions in that vicinity or anchoring of a ligand in an opportune position for interaction (161).

The failure of the MR descriptor to adequately address three-dimensional shape issues led to Verloop's development of STERIMOL parameters (162), which define the steric constraints of a given substituent along several fixed axes. Five parameters were deemed necessary to define shape: L, B1, B2, B3, and B4. L represents the length of a substituent along the axis of a bond between the parent molecule and the substituent; B1 to B4 represent four different width parameters. However, the high degree of collinearity between B1, B2, and B3 and the large number of training set members needed to establish the statistical validity of this group of parameters led to their demise in QSAR studies. Verloop subsequently established the adequacy of just three parameters for QSAR analysis: a slightly modified length L, a minimum width B1, and a maximum width B5 that is orthogonal to L (163). The use of these insightful parameters has done much to enhance correlations with biological activities. Recent analysis in our laboratory has established that in many cases, B1 alone is superior to Taft's Es and a combination of B1 and B5 can adequately replace Es (164).

Molecular weight (MW) terms have also been used as descriptors, particularly in cellular systems, or in distribution/transport studies where diffusion is the mode of operation. According to the Einstein-Sutherland equation, molecular weight affects the diffusion rate. The Log MW term has been used extensively in some studies (159-161) and an example of such usage is given below. In correlating permeability (Perm) of nonelectrolytes through Chara cells, Lien et al. obtained the following QSAR (168):
Log 1/C

bital and I_a is an indicator variable that signifies the presence of an acenthrylene ring in the mutagens. I_L is also an indicator variable that pertains to the number of fused rings in the data set. It acquires a value of 1 for all congeners containing three or more fused rings and a value of zero for those containing one or two fused rings (e.g., naphthalene, benzene). Thus, the greater the number of fused rings, the greater the mutagenicity of the nitro congeners. The E_LUMO term indicates that the lower the energy of the LUMO, the more potent the mutagen. In this QSAR the combination of indicator variables affords a mixed blessing. One variable helps to enhance activity, whereas the other leads to a decrease in mutagenicity of the acenthrylene congeners. In both these QSAR, Kubinyi's bilinear model is used (21). See Section 4.2 for a description of this approach.

3.5 Molecular Structure Descriptors

These are truly structural descriptors because they are based only on the two-dimensional representation of a chemical structure. The most widely known descriptors are those that were originally proposed by Randic (173) and extensively developed by Kier and Hall (27). The strength of this approach is that the required information is embedded in the hydrogen-suppressed framework and thus no experimental measurements are needed to define molecular connectivity indices. For each bond the Ck term is calculated. The summation of these terms then leads to the derivation of χ, the molecular connectivity index for the molecule.

S is the count of formally bonded carbons and h is the number of bonds to hydrogen atoms.

1χ is the first bond order because it considers only individual bonds. Higher molecular connectivity indices encode more complex attributes of molecular structure by considering longer paths. Thus, 2χ and 3χ account for all two-bond paths and three-bond paths, respectively, in a molecule. To correct for differences in valence, Kier and Hall proposed a valence delta (δ^v) term to calculate valence connectivity indices (175).

Molecular connectivity indices have been shown to be closely related to many physicochemical parameters such as boiling points, molar refraction, polarizability, and partition coefficients (174, 176). Ten years ago, the E-State index was developed to define an atom- or group-centered numerical code to represent molecular structure (28). The E-State was established as a composite index encoding both electronic and steric properties of atoms in molecules. It reflects an atom's electronegativity, the electronegativity of proximal and distal atoms, and topological state. Extensions of this method include the HE-State, atom-type E-State, and the polarity index Q. Log P showed a strong correlation with the Q index of a small set (n = 21) of miscellaneous compounds (28). Various models using electrotopological indices have been developed to delineate a variety of biological responses (177-179). Some criticism has been leveled at this approach (180, 181). Chance correlations are always a problem when dealing with such a wide array of descriptors. The physicochemical interpretation of the meaning of these descriptors is not transparent, although attempts have been made to address this issue (27).

4 QUANTITATIVE MODELS

4.1 Linear Models

The correlation of biological activity with physicochemical properties is often termed an extrathermodynamic relationship. Because it follows in the line of Hammett and Taft equations that correlate thermodynamic and related parameters, it is appropriately labeled. The Hammett equation represents relationships between the logarithms of rate or equilibrium constants and substituent constants. The linearity of many of these relationships led to their designation as linear free energy relationships. The Hansch approach represents an extension of the Hammett equation from physical organic systems to a biological milieu. It should be noted that the simplicity
of the approach belies the tremendous complexity of the intermolecular interactions at play in the overall biological response.

Biological systems are a complex mix of heterogeneous phases. Drug molecules usually traverse many of these phases to get from the site of administration to the eventual site of action. Along this random-walk process, they perturb many other cellular components such as organelles, lipids, proteins, and so forth. These interactions are complex and vastly different from organic reactions in test tubes, even though the eventual interaction with a receptor may be chemical or physicochemical in nature. Thus, depending on the biological system involved (isolated receptor, cell, or whole animal), one expects the response to be multifactorial and complex. The overall process, particularly in vitro or in vivo, studies a mix of equilibrium and rate processes, a situation that defies easy separation and delineation.

Meyer and Overton were the first to attempt to get a grasp on biological responses by noting the relationship between oil/water partition coefficients and their narcotic activity. Ferguson recognized that equitoxic concentrations of small organic molecules were markedly influenced by their phase distribution between the biophase and exobiophase. This concept was generalized in the form of Equation 1.60 and extended by Fujita to Equation 1.61 (182, 183). Model systems have been devised to elucidate the mode of interactions of chemicals with biological entities. Examples of linear models pertaining to nonspecific toxicity are described. The effects of a series of alcohols (ROH) have been routinely studied in many model and biological systems. See QSAR 1.63-1.67.

4.1.1 Penetration of ROH into Phosphatidylcholine Monolayers (184)

$\log 1/C = 0.87(\pm 0.01)\log P + 0.66(\pm 0.01)$    (1.63)

4.1.2 Changes in EPR Signal of Labeled Ghost Membranes by ROH (185)

$\log 1/C = 0.93(\pm 0.09)\log P$

4.1.3 Induction of Narcosis in Rabbits by ROH (184)

$\log 1/C = 0.72(\pm 0.16)\log P$

In all cases, there is a strong dependency on
History of Quantitative Structure-Activity Relationships
log P, because all these processes involve transport of alcohols through membranes. The low intercepts speak to the nonspecific nature of the alcohol-mediated toxic interaction. An equilibrium-pseudoequilibrium modeled by log P can be defined as shown in Fig. 1.1.

[Figure 1.1. Log P_octanol mirrors Log P_bio. Labels: water phase, aqueous phase.]

The Hammett-type relationship for this conceptual idea of distribution is

$\log P_{\mathrm{bio}} = a \log P_{\mathrm{octanol}} + b$    (1.68)

This postulate assumes that steric, hydrophobic, electronic, and hydrogen bonding factors that affect partitioning in the biophase are handled by the octanol/water system. Given that the biological response (log 1/C) is proportional to log P_bio, then it follows that

$\log 1/C = a \log P_{\mathrm{octanol}} + \mathrm{constant}$    (1.69)

+ constant    (1.70)

In the random-walk process, the compounds partition in and out of various compartments and interact with myriad biological components in the process. To deal with this conundrum, Hansch proposed a general, comprehensive equation for QSAR 1.71 (188).

$\log 1/C = -a(\log P)^2 + b \log P + \rho\sigma + \delta E_s + \mathrm{constant}$    (1.71)

The optimum value of log P for a given system is log Po and it is highly influenced by the number of hydrophobic barriers a drug encounters in its walk to its site of action. Hansch and Clayton formulated the following parabolic model to elucidate the narcotic action of alcohols on tadpoles (189).

4.2.1 Narcotic Action of ROH on Tadpoles
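The parabolic dependence on log P in Equation 1.71 implies an optimum hydrophobicity, log Po, at the point where the slope with respect to log P vanishes. The following is a minimal sketch of that arithmetic, keeping only the hydrophobic terms of the equation. The function names and the coefficient values are hypothetical and chosen for illustration; they are not taken from any QSAR in this chapter.

```python
def parabolic_log_inv_c(log_p, a, b, c):
    """Parabolic model: log 1/C = -a*(log P)^2 + b*log P + c."""
    return -a * log_p ** 2 + b * log_p + c


def parabolic_optimum(a, b):
    """Return log Po, where d(log 1/C)/d(log P) = -2a*log P + b = 0."""
    if a <= 0:
        raise ValueError("a must be positive for the parabola to have a maximum")
    return b / (2.0 * a)


# Hypothetical coefficients, for illustration only.
a, b, c = 0.25, 1.0, 2.0
log_p_opt = parabolic_optimum(a, b)
peak = parabolic_log_inv_c(log_p_opt, a, b, c)
print(f"log Po = {log_p_opt:.2f}, log 1/C at optimum = {peak:.2f}")
```

Compounds on either side of log Po are predicted to be less potent, which is the qualitative behavior the parabolic model is meant to capture.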
model consisting of a linear ascending part and a parabolic part (190). See Equations 1.73 and 1.74.

$\log 1/C = a \log P + c$  (if log P < log P_x)    (1.73)

$\log 1/C = -a(\log P)^2 + b \log P + c$  (if log P > log P_x)    (1.74)

The binding of drugs to proteins is linearly dependent on hydrophobicity up to a limited value, log P_x, after which steric hindrance causes the linear dependency to alter to a nonlinear one. The major limitation of this approach involves the inclusion of highly hydrophobic congeners that tend to cause systematic deviations between experimental and predicted values.

Another cutoff model, which deals with nonlinearity in biological systems, is one defined by McFarland (191). It attempts to elucidate the dependency of drug transport on hydrophobicity in multicompartment models. McFarland addressed the probability of drug molecules traversing several aqueous lipid barriers from the first aqueous compartment to a distant, final aqueous compartment. The probability P_{0,n} of a drug molecule to access the final compartment n of a biological system was used to define the drug concentration in this compartment.

$\log C_R = a \log P - 2a \log(P + 1) + \mathrm{constant}$    (1.75)

The ascending and descending slopes are equal (= 1) and linear. However, a major drawback of this model is that it forces the activity curves to maximize at log P = 0. These studies were extended by Kubinyi, who developed the elegant and powerful bilinear model, which is superior to the parabolic model and is extensively used in QSAR studies (192).

$\log 1/C = a \log P - b \log(\beta P + 1) + \mathrm{constant}$    (1.76)

where β is the ratio of the volumes of the organic phase and the aqueous phase. An important feature of this model lies in the symmetry of the curves. For aqueous phases of this model system, symmetrical curves with linear ascending and descending sides (like a teepee) and a limited parabolic section around the hydrophobicity optimum are generated. Unsymmetrical curves arise for the lipid phases. It is highly compatible with the linear model and allows for quick comparisons of the ascending slopes. It can also be used with other parameters such as MR and σ, where it appears to pinpoint a change in mechanism similar to the breaks in linearity of the Hammett equation. The following example of the bilinear model reveals the symmetrical nature of the curve.

4.2.2 Induction of Ataxia in Rats by ROH

$\log 1/C = 0.77(\pm 0.10)\log P$

s = 0.165, log Po = 2.0

The bilinear model has been used to model biological interactions in isolated receptor systems and in adsorption, metabolism, elimination, and toxicity studies, although it has a few limitations. These include the need for at least 15 data points (because of the presence of the additional disposable parameter β) and data points beyond optimum log P. If the range in values for the dependent variable is limited, unreasonable slopes are obtained.

4.3 Free-Wilson Approach

The Free-Wilson approach is truly a structure-activity-based methodology because it incorporates the contributions made by various structural fragments to the overall biological activity (22, 193, 194). It is represented by Equation 1.78.

Indicator variables are used to denote the presence or absence of a particular structure feature.
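To make the role of indicator variables concrete, the sketch below fits additive substituent contributions by ordinary least squares. It is an illustration under stated assumptions, not the authors' procedure. It uses a reference-compound coding (the unsubstituted parent has all indicators set to 0, so the intercept is the activity of the parent) rather than the classical zero-sum constraint. The six compounds, their substituents, and their activities are hypothetical, and the small Gaussian-elimination solver is included only to keep the example free of third-party libraries.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular design: substituents are not varied independently")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def fit_indicator_model(X, y):
    """Least-squares fit of y = mu + sum_j a_j * X_j via the normal equations."""
    rows = [[1.0] + [float(v) for v in r] for r in X]  # prepend intercept column
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve_linear(xtx, xty)


# Hypothetical series: indicator columns are [R1=Cl, R1=CH3, R2=OH];
# the parent compound (R1=H, R2=H) is the all-zero row.
X = [
    [0, 0, 0],  # H,   H
    [1, 0, 0],  # Cl,  H
    [0, 1, 0],  # CH3, H
    [0, 0, 1],  # H,   OH
    [1, 0, 1],  # Cl,  OH
    [0, 1, 1],  # CH3, OH
]
y = [5.00, 5.50, 5.25, 4.25, 4.75, 4.50]  # hypothetical log activities

mu, a_cl, a_ch3, a_oh = fit_indicator_model(X, y)
print(f"mu={mu:.2f}  Cl={a_cl:+.2f}  CH3={a_ch3:+.2f}  OH={a_oh:+.2f}")
```

Because each substituent appears more than once and in different combinations with substituents at the other position, the design is solvable; dropping rows until a substituent appears only once, or always paired with the same partner, makes the contributions impossible to separate.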
Like classical QSAR, this de novo approach assumes that substituent effects are additive and constant. BA is the biological activity; X_j is the jth substituent, which carries a value 1 if present, 0 if absent. The term a_j represents the contribution of the jth substituent to biological activity and μ is the overall average activity. The summation of all activity contributions at each position must equal zero. The series of linear equations that are formulated are solved by linear regression analysis. It is necessary for each substituent to appear more than once at a position in different combinations with substituents at other positions.

There are certain advantages to the Free-Wilson method that have been addressed (193-195). Any type of quantitative biological data can be subject to such analysis. There is no need for any physicochemical constants. The molecules of a series may be structurally dissected in any way and multiple sites of substitution are necessary and easily accommodated (196). Limitations include the large number of molecules with varying substituent combinations that are needed for this analysis and the inability of the system to handle nonlinearity of the dependency of activity on substituent properties. Intramolecular interactions between the substituents are not handled very well, although special treatments can be used to accommodate proximal effects. Extrapolation outside of the substituents used in the study is not feasible. Another problem inherent with this approach is that usually a large number of variables is required to describe a smaller number of compounds, which creates a statistical faux pas. Fujita and Ban modified this approach in two important ways (23). They expressed the biological activity on a logarithmic scale, to bring it into line with the extrathermodynamic approach, as seen in the following equation:

$\log BA = \sum_i a_i X_i + \mu$    (1.79)

Recent analyses of a Free-Wilson type have included the in vitro inhibitory activity of a series of heterocyclic compounds against K. pneumoniae (197). Other applications of the Free-Wilson approach have included studies on the antimycobacterial activity of 4-alkylthiobenzanilides, the antibacterial activity of fluoronaphthyridines, and the benzodiazepine receptor-binding ability of some non-benzodiazepine compounds such as 3-X-imidazo[1,2-b]pyridazines, 2-phenylimidazo[1,2-a]pyridines, 2-(alkoxycarbonyl)imidazo[2,1-b]benzothiazoles, and 2-arylquinolones (198-200).

4.4 Other QSAR Approaches

The similarity in approaches of Hansch analysis and Free-Wilson analysis allows them to be used within the same framework. This is based on their theoretical consistency and the numerical equivalencies of activity contributions. This development has been called the mixed approach and can be represented by the following equation:

$\log 1/C = \sum a_i + \sum c_j \Phi_j + \mathrm{constant}$    (1.80)

The term a_i denotes the contribution for each ith substituent, whereas Φ_j is any physicochemical property of a substituent X_j. For a thorough review of the relationship between Hansch and Free-Wilson analyses, see the excellent reviews by Kubinyi (58, 195). A recent study of the P-glycoprotein inhibitory activity of 48 propafenone-type modulators of multidrug resistance, using a combined Hansch/Free-Wilson approach, was deemed to have higher predictive ability than that of a stand-alone Free-Wilson analysis (201). Molar refractivity, which has a high collinearity with molecular weight, was a significant determinant of modulating ability. It is of interest to note that molecular weight has been shown to be an omnipresent parameter in cross-resistance profiles in multidrug-resistance phenomena (167).
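Because the enzyme and cell QSAR that follow report bilinear fits through an optimum and a log β value, it may help to see how those quantities relate in the bilinear model of Section 4.2, log 1/C = a log P - b log(βP + 1) + constant. The sketch below evaluates the model and locates its optimum analytically: setting the derivative with respect to log P to zero gives βP = a/(b - a), so log Po = log[a/(b - a)] - log β, which requires b > a > 0. The function names and the coefficients are hypothetical and chosen only so that the curve is symmetrical.

```python
import math


def bilinear(log_p, a, b, log_beta, c):
    """Bilinear model: log 1/C = a*log P - b*log10(beta*P + 1) + c."""
    p = 10.0 ** log_p
    beta = 10.0 ** log_beta
    return a * log_p - b * math.log10(beta * p + 1.0) + c


def bilinear_optimum(a, b, log_beta):
    """Return log Po from a - b*beta*P/(beta*P + 1) = 0 (requires b > a > 0)."""
    if not (b > a > 0):
        raise ValueError("an optimum exists only when b > a > 0")
    return math.log10(a / (b - a)) - log_beta


# Hypothetical coefficients: ascending slope a = 1, descending slope a - b = -1.
a, b, log_beta, c = 1.0, 2.0, -2.0, 3.0
log_p_opt = bilinear_optimum(a, b, log_beta)

# Limiting slopes far below and far above the optimum.
slope_low = bilinear(-5.0, a, b, log_beta, c) - bilinear(-6.0, a, b, log_beta, c)
slope_high = bilinear(8.0, a, b, log_beta, c) - bilinear(7.0, a, b, log_beta, c)
print(f"log Po = {log_p_opt:.2f}, slopes: {slope_low:+.3f} / {slope_high:+.3f}")
```

With these values the two limbs have slopes of equal magnitude and opposite sign, which corresponds to the symmetrical, teepee-like curve described for the bilinear model; choosing b different from 2a gives an unsymmetrical curve.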
5.1.1 Inhibition of Crude Pigeon Liver DHFR by Triazines (202)

$\log 1/IC_{50} = 2.21(\pm 1.00)\pi - 0.28(\pm 0.17)\pi^2 + 0.84(\pm 0.76)D + 2.58(\pm 1.30)$

In this example, the R group on the 2-nitrogen was restricted to a (3-X-phenyl) aromatic ring (205). Accurate Ki values were obtained from highly purified DHFR isolated from chicken liver. In most cases, π' represented the hydrophobicity of the substituent except in certain instances where X = -OR or -CH2ZC6H4-Y. It was ascertained that alkoxy substituents were not making direct hydrophobic contact with the enzyme, given that their inhibitory activities were essentially constant from the methoxy to the nonyloxy substituent. In the bridged substituents where Z = O, NH, S, Se, the Y substituent again did not contact the enzyme surface. Variation in Y led to the same, constant biological activity. The coefficient with π' suggests that the substituent is engulfed in a hydrophobic pocket that has an optimal π'0 of 2. This value is consistent with that seen in the crude pigeon liver DHFR corrected for the presence of the phenyl group (4.0 - 2.0 = 2). The 0.86 ρ value (coefficient with σ) suggests that there could be a dipolar interaction between the electron deficient phenyl ring and a region of positively charged electrostatic potential in the enzyme, perhaps an arginine, lysine, or histidine residue. Hathaway et al. developed a QSAR for the inhibition of human DHFR by 3-X-triazines and obtained Equation 1.83 (208).

5.1.3 Inhibition of Human DHFR by 3-X-Triazines (208)

Log 1/Ki

5.1.4 Inhibition of L1210 DHFR by 3-X-Triazines (209)

Log 1/Ki

π'0 = 1.76(±0.28), log β = -0.979

The consistency in these models versus prokaryotic DHFR is established by the coefficient with the hydrophobic term, the optimum π' value, and the rho value. These numerical coefficients can be contrasted sharply with those obtained from fungal and protozoal DHFR. Inhibition constants were determined for 3-X-triazines versus Pneumocystis carinii DHFR (210).

5.1.5 Inhibition of P. carinii DHFR by 3-X-Triazines (210)

Log 1/Ki
bilinear equation is much steeper (1.36 - 0.73 = 0.63) than that seen with the mammalian and avian enzymes.

A similar model is obtained vs. the bifunctional protozoal DHFR from Leishmania major, which is coupled to thymidylate synthase (211).

5.1.6 Inhibition of L. major DHFR by 3-X-Triazines (211)

Log 1/Ki

QSAR analysis on a limited set of 3-X-triazines assayed by Chio and Queener versus Toxoplasma gondii led to the formulation of Equation 1.87 (202, 212).

5.1.7 Inhibition of T. gondii DHFR by 3-X-Triazines

$\log 1/IC_{50} = 0.39(\pm 0.20)\pi' - 0.43(\pm 0.19)MR_Y + 6.65(\pm 0.30)$    (1.87)

A quick comparison of QSAR 1.82-1.84 reveals the strong similarity between the avian and mammalian models. In fact, because of its increased stability, chicken liver DHFR has often been used as a surrogate for human DHFR in enzyme-inhibition studies. The intercepts, coefficients with π', and optimum π'0 for avian (6.33, 1.01, 1.91), human (6.07, 1.07, 2.0), and mouse leukemia (6.12, 0.98, 1.76) can be compared to the corresponding values for P. carinii (6.48, 0.73, 3.99) and Leishmania major (5.05, 0.65, 4.54). QSAR 1.81 and 1.87 are not included in the comparison because crude pigeon enzyme was used in the former and the testing for QSAR 1.87 was conducted under different assay conditions; Ki values were not determined. A noteworthy difference between these models is the wide disparity in π'0 values. The binding site of the protozoal and fungal species comprises an extensive hydrophobic surface unlike the abbreviated pockets in the mammalian and avian enzymes. The positive coefficients with the MR_Y terms suggest that added bulk on the bridged phenyl ring enhances inhibitory potency. The study versus T. gondii DHFR (QSAR 1.87) included a number of mostly small, polar substituents (NH2, NO2, CONMe2) on the bridged phenyl and their activities were considerably lower than the unsubstituted analog. Comparative QSAR can be useful, particularly if the biological data are consistent (tested under the same assay conditions, excellent purity of enzymes, substrates, inhibitors, buffers), and the choice of substituents is appropriate.

One of the major problems that arises with some QSAR studies is extrapolation from beyond spanned space. Predictive ability is sound when one has probed an adequate range in electronic, hydrophobic, and steric space. At the onset of the study, the training set should address these concerns. Lack of adequate attention to such issues can result in QSAR models that are misleading. When examined on its own, such a model may appear to withstand statistical rigor and apparent transparency but, on being subjected to lateral validation, loopholes emerge. A brief study to illustrate this phenomenon is outlined below.

Four different QSAR were derived for the inhibition of DHFR from rat liver, human leukemia, mouse L1210, and bovine liver by 2,4-diamino, 5-Y, 6-Z-quinazolines (Fig. 1.3) (202, 213-215). A comparison of their QSAR presents an interesting study on the importance of spanned space in delineating enzyme-receptor interactions.

Figure 1.3. 2,4-Diamino, 5-Y, 6-Z-quinazolines.
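The warning about extrapolating beyond spanned space can be turned into a simple screening step before a QSAR is used for prediction. The sketch below is one crude way to do this, assuming each compound is described by a dictionary of descriptor values: it records the minimum and maximum of every descriptor in the training set and flags any query compound that falls outside that box. The descriptor names and values are hypothetical, and a per-descriptor range check is only a first approximation, since it ignores gaps and correlations inside the box.

```python
def spanned_ranges(training):
    """Return {descriptor: (min, max)} over a list of descriptor dictionaries."""
    names = list(training[0].keys())
    return {
        name: (min(c[name] for c in training), max(c[name] for c in training))
        for name in names
    }


def outside_spanned_space(query, ranges):
    """List the descriptors for which the query lies outside the training range."""
    return [
        name for name, (lo, hi) in ranges.items()
        if not lo <= query[name] <= hi
    ]


# Hypothetical training set of substituent descriptors (illustrative values).
training = [
    {"pi": 0.00, "sigma": -0.17, "MR": 0.10},
    {"pi": 0.71, "sigma": 0.23, "MR": 0.60},
    {"pi": 1.98, "sigma": -0.20, "MR": 1.96},
    {"pi": 0.56, "sigma": -0.17, "MR": 0.57},
]
ranges = spanned_ranges(training)

inside = outside_spanned_space({"pi": 1.0, "sigma": 0.0, "MR": 1.0}, ranges)
bulky = outside_spanned_space({"pi": 1.0, "sigma": 0.0, "MR": 6.0}, ranges)
print(inside, bulky)
```

A query flagged on a steric descriptor, as in the second call, is a case where a linear dependence seen in the training set says nothing reliable about the compound, because the training set never probed that region.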
5.1.8 lnhibition of Rat Liver DHFR by 2,4- 5.1.1 1 lnhibition of Bovine Liver DHFR by
Diamino, 5-Y, 6-Z-quinazolines (21 3) 2,4-Diamino, 5-Y, 6-Z-quinazolines (21 5)
Log 1/IC50
= 0.78(+0.12).rr5
+0.81(20.12)~~,
- 0 . 0 ~ ~ 2 0 . 0 2 ~ ~ ~(1.88)
~ ~ These QSAR vary in size and the number of
variables used to define inhibitory activity.
- 0.73(rt0.49)11- 2.15(?0.38)12 Selassie and Klein have described a more thor-
ough comparative analysis of these QSAR
- 0.54(?0.21)13- 1.40(+0.41)14
(202).A brief focus on the MR, term reveals
+ 0.78(t0.37)16 that its coefficients vary remarkably in all four
sets. QSAR 1.88 is a parabola with an opti-
- O.2O(tO.l2)M& . I mum of 6.4. Because it is parabolic in nature,
the coefficient of the ascending slope cannot be
+ 4.92(t0.23) compared with the linear slopes in QSAR
n = 101, r 2 = 0.924, s = 0.441, 1.89-1.91. Figure 1.4 illustrates the problems
with QSAR 1.89-1.91, which failed to test an-
M&,g = 6.4(+0.8) alogs across the available space.
Figure 1.4 reveals that QSAR 1.89 and 1.90
were sampled in the suboptimal MR, range;
5.1.9 lnhibition of Human Liver DHFR by
thus, the negative dependency on MR,. On the
2,4-Diamino, 5-Y, 6-Z-quinazolines (214)
other hand, QSAR 1.91 was focused on the
ascending portion of the curve and thus only
Log l / K i molecules in the 0.1-3.4 range were tested.
Thus, with a limited set of compounds, one
= -2.87(?0.16)11 gets a misleading picture of the biological
interactions.
.
Enzymatic reactions in nonaqueous sol-
vents have generated a great deal of interest,
fueled in part by the commercial application of
enzymes as catalysts in specialty synthesis.
The increasing demand for enantiopure phar-
maceuticals has accelerated the study of enzy-
matic reactions in organic solvents containing
[Figure 1.4. Plot of Log 1/IC50 versus MR6.]
little or no water (216). To investigate the sub- 5.1.1 4 Binding of X-Phenyl, KBenzoyh-
strate specificity of a-chymotrypsin in penta- alaninates in Aqueous Phosphate Buffer (218)
nol, a series of X-phenyl esters of N-benzoyl-L-
alanine (Fig. 1.5) were synthesized and their
binding constants were evaluated in buffer
and in pentanol (203). The following QSAR
1.92 and 1.93 were derived in phosphate
buffer and pentanol.
outlier: n = 5, X = 4-t-Bu
binding site on the enzyme. Note that the larger intercept in QSAR 1.98 versus QSAR 1.97 suggests that hydrophobicity is more important in this area.

5.2 Interactions at the Cellular Level

QSAR analysis of studies at the cellular level allows us to get a handle on the physicochemical parameters critical to pharmacokinetic processes, mostly transport. Cell culture systems offer an ideal way to determine the optimum hydrophobicity of a system that is more complex than an isolated receptor. Extensive QSAR have been developed on the toxicity of 3-X-triazines to many mammalian and bacterial cell lines (202, 209). A comparison of the cytotoxicities of these analogs vs. sensitive murine leukemia cells (L1210/S) and methotrexate-resistant murine leukemia cells (L1210/R) reveals some startling differences.

5.2.1 Inhibition of Growth of L1210/S by 3-X-Triazines (209)

Log 1/IC50

π0 = 1.45(±0.93), log β = -0.274

5.2.2 Inhibition of Growth of L1210/R by 3-X-Triazines (209)

Log 1/IC50

DHFR and it can be posited that the cytotoxicity in the sensitive cell line results from the inhibition of the enzyme. The intercepts suggest that slight interference with folate metabolism significantly affects growth. A comparison of the sensitive and resistant QSAR reveals a substantial difference in the coefficients with π. The lack of many variables in QSAR 1.100 and its overall simplicity suggest that inhibition of the enzyme is not the critical step, but rather transport to the site of action in these resistant cells may be of utmost importance. This particular cell line was resistant to methotrexate by virtue of elevated levels of DHFR and also overexpression of glycoprotein, GP-170 (209). Thus, modified transport through the dysfunctional membrane would severely curtail the partitioning process, resulting in a coefficient with π that is only one-half (0.42) of what is normally seen. The negative coefficient with the MR term indicates that size plays a role, albeit a negative one, in passage through the GP-170-fortified membrane and to the site of action.

The QSAR paradigm has been shown to be particularly useful in environmental toxicology, especially in acute toxicity determinations of xenobiotics (223). There has recently been an emphasis on "transparent, mechanistically comprehensive QSAR for toxicity," a move that is welcomed by many researchers in the field (224, 225). Cronin and Schultz developed QSAR 1.101 to describe the polar, narcotic toxicity of a large set of substituted phenols. A number of phenols with ionizable or reactive groups (e.g., -COOH, -NO2, -NO, -NH2, or -NHCOCH3) were omitted from the final analysis (226).

5.2.3 Inhibition of Growth of Tetrahymena pyriformis (40 h)

Log 1/C
questered into two subsets containing elec- 5.2.7 lnhibition of Growth of T. pyriformis
tron-releasing and electron-attracting sub- by Aromatic Compounds (229)
stituents, respectively (227).
5.2.4 lnhibition of Growth of T. pyriformis
by Phenols (using a)(227)
Log 1/C
5.3.2 Nonrenal Clearance of @Adrenore- steric effects and there was no dependency
ceptor Antagonists on electronic terms. Careful analysis of the
initial data revealed that it had a limited
Log k = 1.94(?0.6l)Clog P range in hydrophobicity and steric at-
tributes. The lack of other QSAR to validate
the findings in QSAR 1.108 made it statisti-
cally significant, a t that time, but mechanis-
tically weak. Most weaknesses in QSAR for-
mulations usually violate the compound-to-
parameter ratio rule (232, 233).
ClogPo = 2.6 + 1.5 log P = -0.813
outlier: oxprenolol 6 COMPARATIVE QSAR
X-phenols-Enzyme Systems
1 Horseradish peroxidase
2 Ladoperoxidase
certed effort not only to develop high-quality 6.2.2 lnhibition of DNA Synthesis in CHO
regressions but also to create models that res- Cells by X-Phenols (236)
onate with those drawn from mechanistic or-
ganic chemistry. A comprehensive, integrated Log 1IC = -0.74(t0.34)u+
database C-QSAR allows us to do so; it con-
- 1.02(?0.41)CMR (1.110)
tains over 16,000 examples drawn from all fac-
ets of chemistry and biology. An example on
the toxicity of X-phenols will illustrate the use-
fulness of this database (164, 228, 235-238).
Recently, increasing numbers of QSAR for
phenols have been based on Brown's a+term, These Brown p+ values were in line with those
an electronic term that was first designed to obtained from chemical and biological systems
(228) see Table 1.5.
rationalize electronic effects of substituents
Cytotoxicity studies of X-phenols versus
on electrophilicaromatic substitution. Studies
L1210 cells in culture led to an unusual result,
conducted at EPA gave early indications that which was b a n g but reminiscent of Hammett
embryologic defects of rat embryos in vitro plots related to changes in mechanism (228).
could be correlated by σ+, as seen in QSAR
1.109 (239).
6.2.3 lnhibition of Growth of 11210 by X-
Phenols
6.2.1 Incidence of Tail Defects of Embryos
(235) Log 1IIC50
= -0.83(t0.18)ut
2. D. J. Livingstone, J. Chem. Znf. Comput. Sci., 27. L. H. Hall and L. B. Kier, J. Pharm. Sci., 66,
40,195 (2000). 642 (1977).
3. C. Hansch, A. Kurup, R. Garg, and H. Gao, 28. L. B. Kier and L. H. Hall, Molecular Structure
Chem. Rev., 101,619 (2001). Description. The Electrotopological State, Aca-
4. H. Kubinyi in M. Wolff, Ed., Burger's Medici- demic Press, San Diego, CA, 1999.
nal Chemistry and Drug Discovery, Volume 1: 29. W. Tong, D. R. Lowis, R. Perkins, Y. Chen,
Principles and Practice, John Wiley & Sons, W. J. Welsh, D. W. Goddette, T. W. Heritage,
New York, 1995, p. 497. and D.M. Sleehan, J. Chem. Inf. Comput Sci.,
5. A. Crum-Brown and T. R. Fraser, Trans. R. 38, 669 (1998).
Soc. Edinburgh, 25, 151 (1868). 30. S. J. Cho, W. Zheng, and A. Tropsha, Pac.
6. C. Richet and C. R. Seancs, Soc. Biol. Ses. Fil., Symp. Biocomput., 305 (1998).
9,775 (1893). 31. H. Gao and J. Bajorath, J. Mol. Diversity, 4,
7. H. Meyer, Arch. Exp. Pathol. Pharmakol., 42, 115 (1999).
109 (1899). 32. H. Gao, C. Williams, P. Labute, and J. Bajo-
rath, J. Chem. Znf. Comput. Sci., 39, 164
8. E. Overton, Studien Uber die Narkose, Fischer,
Jena, Germany, 1901. (1999).
33. W. J. DunnIII, S. Wold, U. Edlund, S. Hellberg,
9. J. Ferguson, Proc. R. Soc. London Ser. B , 127,
and J. Gasteeger, Quant. Struct.-Act. Relat., 3,
387 (1939).
131 (1984).
10. A. Albert, S. Rubbo, R. Goldacre, M. Darcy, and
34. J. Langley, J. Physiol., 1, 367 (1878).
J. Stove, Br. J. Exp. Pathol., 26, 160 (1945).
35. P. Ehrlich, Klin. Jahr., 6, 299 (1897).
11. A. Albert, Selective Toxicity: The Physicochem-
36. J. N. Langley, J. Physiol., 33,374 (1905).
ical Bases of Therapy, 7th ed., Chapman and
Hall, London, 1985, p. 33. 37. M. Famulok, Curr. Opin. Struct. Biol., 9, 324
(1999).
12. P. H. Bell and R. 0. Roblin, Jr.J. Am. Chem.
38. K. Y. Wang, S. Swaminathan, and P. H. Bolton,
SOC.,64,2905 (1942).
Biochemistry, 33, 7617 (1994).
13. L. P. Hammett, Chem. Rev., 17,125 (1935). 39. J. W. Lown in S. Neidle and M.-J. Waring, Eds.,
14. L. P. Hammett, Physical Organic Chemistry, Molecular Aspects ofhticancer Drug-DNA Zn-
2nd ed., McGraw-Hill, New York, 1970. teractions, Macmillan, Basinstoke, UK, 1993,
15. R. W. Taft, J. Am. Chem. Soc., 74,3120 (1952). p. 322.
40. L. Morgenstern, M. Recanatini, T. E. Klein, W.
.
16. C. Hansch, P. P. Maloney, T. Fujita, and R. M.
Muir, Nature, 194, 178 (1962). Steinmetz, C. Z. Yang, R. Langridge, and C.
17. R. Nelson Smith, C. Hansch, and M. M. Ames, Hansch, J. Biol. Chem., 262, 10767 (1987).
J. Pharm. Sci., 64,599 (1975). 41. R. N. Smith, C. Hansch, K. H. Kim, B. Omiya,
G. Fukumura, C. D. Selassie, P. Y. C. Jow, J. M.
18. T. Fujita, J. Iwasa, and C. Hansch, J. Am.
Blaney, and R. Langridge, Arch. Biochem. Bio-
Chem. Soc., 86, 5175 (1964).
phys., 215,319 (1982).
19. C. Hansch and A. Leo in S. R. Heller, Ed., Ex- 42. C. Hansch, T. Klein, J. McClarin, R. Lang-
ploring QSAR. Fundamentals and Applica- ridge, and N. W. Cornell, J. Med. Chem., 29,
tions in Chemistry and Biology, American 615 (1986).
Chemical Society, Washington, DC, 1995.
43. C. D. Selassie, Z. X. Fang, R. Li, C. Hansch, T.
20. C. Hansch, Acc. Chem. Res., 2,232 (1969). Klein, R. Langridge, and B. T. Kaufman,
21. H. Kubinyi,Arzneim.-Forsch., 26,1991 (1976). J. Med. Chem., 29,621 (1986).
22. S. M. Free and J. W. Wilson, J. Med. Chem., 7, 44. J. M. Blaney and C. Hansch in C. A. Ramsden,
395 (1964). Ed., Comprehensive Medicinal Chemistry. The
23. T. Fujita and T. Ban, J. Med. Chem., 14, 148 Rational Design, Mechanistic Study and Ther-
(1971). apeutic Application of Chemical Compounds,
Vol. 4, Quantitative Drug Design, Pergamon,
24. G. Klopman, J. Am. Chem. Soc., 106, 7315 Elmsford, NY,1990, p. 459.
(1984).
45. G. C. K. Roberts, Pharmacochem. Libr., 6, 91
25. B. W. Blake, K. Enslein, V. K. Gombar, and (1983).
H. H. Borgstedt, Mutat. Res., 241,261 (1990). 46. A. A. Kumar, J . H. Mangum, D. T. Blanken-
26. Z. Simon, Angew. Chem. Znt. Ed. Eng., 13,719 ship, and J. H. Freisheim, J. Biol. Chem., 266,
(1974). 8970 (1981).
47. G. D. Rose and R. Wolfenden,Annu. Rev. Bio- 71. M. Baroni, S. Clernenti, G. Cruciani, N. Ket-
phys. Biomol. Struct., 22,381 (1993). taneh-Wold, and S. Wold, Quant. Struct.-Act.
48. A. T . Hagler, P. Dauber, and S. Lifson, J. Am. Relat., 12, 225 (1993).
Chem. Soc., 101,5131 (1979). 72. M. Sjostrom and L. Eriksson in H. van de
49. W . Kauzmann, Adv. Protein Chem., 14, 1 Waterbeemd, Ed., Chemometric Methods in
(1959). Molecular Design,VCH, Weinheim, Germany,
50. A. Ben-Naim, Pure Appl. Chem., 69, 2239 1995, p. 63.
(1997). 73. L. Eriksson, E. Johansson, M . Muller, and S.
51. W. Blokzijl and J . B. F. N. Engberts, Angew. Wold, Quant. Struct.-Act. Relat., 16, 383
Chem. Znt. Ed. Engl., 32, 1545 (1993). (1997).
52. N . Muller, Acc. Chem. Res., 23,23 (1990). 74. L. Eriksson, E. Johansson, M . Muller, and S.
Wold, J. Chemom., 14,599 (2000).
53. F. Eisenhaber, Perspect. Drug Discov. Des., 17,
27 (1999). 75. C. Hansch and T . Fujita, J. Am. Chem.. Soc.,
86, 1616 (1964).
54. A. R. Fersht, J. S. Shindler, and W . C. Tsui,
Biochemistry, 19,5520 (1980). 76. C-QSAR Database, BioByte Corp., Claremont,
55. P. R. Andrews, D. J. Craik, and J . L. Matin, CA.
J.Med. Chem., 27,1648 (1984). 77. G. N. Burckhardt, W . G. K.Ford, and E. Sin-
56. N. R. Draper and H . Smith, Applied Regression gelton, J. Chem. Soc., 17 (1936).
Analysis, 2nd ed., John Wiley & Sons, New 78. L. P. Hammett, J. Chem. Ed., 43,464 (1966).
York, 1981. 79. M. Charton, Prog. Phys. Org. Chem., 8, 235
57. Y . Martin in G. Grunewald, Ed., Quantitative (1971).
Drug Design, Marcel Dekker, New York, 1978, 80. T . Fujita and T . Nishioka, Prog. Phys. Org.
p. 167. Chem., 12,49 (1976).
58. H. Kubinyi in R. Mannhold, P. Krogsgaard- 81. P. D. Bolton, K. A. Fleming, and F. M . Hall,
Larsen, and H. Timmerman, Eds., QSAR: J. Am. Chem. Soc., 94,1033 (1972).
Hansch Analysis and Related Approaches,
82. K. Kalfus, J. Kroupa, M . Vecera, and 0. Exner,
VCH, New York, 1993, p. 91.
Collect. Czech. Chem. Commun., 40, 3009
59. R. Franke in W . Th. Nauta and R. F. Rekker, (1975).
Eds., Theoretical Drug Design Methods,
83. M. Bergon and J. P. Calmon, Tetrahedron
Elsevier Science, A m s t e r d d e w York, 1983,
Lett., 22, 937 (1981).
p. 395.
60. C. Hansch in C. J. Cavallito, Ed., Structure Ac-
84. J . Schreck, J. Chem. Ed., 48, 103 (1971). -
tivity Relationships,Vol. 1, Pergamon, Oxford, 85. H. C. Brown and Y . Okarnoto, J. Am. Chem.
CHAPTER TWO
Recent Trends in Quantitative Structure-Activity Relationships
Contents
1 Introduction
1.1 A Unified Concept of QSAR
1.2 The Taxonomy of QSAR Approaches
2 Multiple Descriptors of Molecular Structure
2.1 Topological Descriptors
2.2 3D Descriptors
3 QSAR Modeling Approaches
3.1 3D-QSAR
3.2 The Descriptor Pharmacophore Concept and Variable Selection QSAR
3.2.1 Linear Models
3.2.2 Nonlinear Models
4 Validation of QSAR Models
4.1 Beware of q²
4.2 Rational Selection of Training and Test Sets
4.3 Guiding Principles of Safe QSAR
5 QSAR Models as Virtual Screening Tools
5.1 Data Mining and SAR Analysis
5.2 Virtual Screening
5.3 Rational Library Design by use of QSAR
6 Conclusions
1 INTRODUCTION
second column)], and calculated values of molecular descriptors in all remaining columns (sometimes, experimentally determined physical properties of compounds can be used as descriptors as well).

The differences in various QSAR methodologies can be understood in terms of types of target property values, types of descriptors, and differences in optimization algorithms used to relate descriptors to the target properties. The target property values can be defined as activity classes [i.e., active or inactive, frequently encoded numerically for the purpose of the subsequent analysis as one (for active) or zero (for inactive)] or as a continuous range of values; the corresponding methods of data analysis are referred to as classification or continuous property QSAR, respectively. Descriptors can be generated from various representations of molecules (e.g., 2D chemical graphs or 3D molecular geometries), giving rise to the terms 2D- or 3D-QSAR, respectively. Finally, the types of optimization algorithms used in the QSAR model development lead to the definitions of linear versus nonlinear QSAR methods.

In some cases, the types of biological data, the choice of descriptors, and the class of optimization methods are closely related and mutually inclusive. For instance, multiple linear regression (MLR) can be applied only when a relatively small number of molecular descriptors are used (at least five to six times smaller than the total number of compounds) and the target property is characterized by a continuous range of values. The use of a large number of descriptors makes it impossible to use MLR because of a high chance of spurious correlation (16) and requires the use of partial least squares or nonlinear optimization techniques. However, in general, for any given data set a user could choose between various types of descriptors and various optimization schemes, combining them in a practically mix-and-match mode, to arrive at statistically significant QSAR models in a variety of ways. This situation is in essence analogous to molecular mechanics calculations (17), where different force fields and differently derived parameters are developed by different groups, although the common goal is to compute (unique) optimized geometries of molecules from their chemical composition and coordinates of all atoms. Thus, in general, all QSAR models can be universally compared in terms of their statistical significance and, most important, their ability to predict accurately biological activities (or other target properties) of molecules not included in the training set (cf. molecular mechanics, where different methods are ultimately compared by their ability to reproduce experimental molecular geometries). This concept of statistical robustness and the predictive ability as universal characteristics of any QSAR model independent of the particulars of individual approaches should be kept in mind as we consider examples of QSAR tools, their applications, and pitfalls in the subsequent sections of this chapter.

1.2 The Taxonomy of QSAR Approaches

Many different approaches to QSAR have been developed since Hansch's seminal work. As briefly discussed above, the major differences between these methods can be analyzed from two viewpoints: (1) the types of structural parameters that are used to characterize molecular identities, starting from different representations of molecules, from simple chemical formulas to three-dimensional conformations; and (2) the mathematical procedure that is employed to obtain the quantitative relationship between these structural parameters and biological activity.

On the basis of the origin of molecular descriptors used in calculations, QSAR methods can be divided into three groups. One group is based on a relatively small number (usually many times smaller than the number of compounds in a data set) of physicochemical properties and parameters describing, for example, hydrophobic, steric, and electrostatic effects. Usually, these descriptors are used as independent variables in multiple regression approaches (18). In the literature, these methods are typically referred to as Hansch analysis (8). These types of descriptors and corresponding linear optimization methods used in traditional QSAR analyses are discussed extensively in the chapter by Selassie (7) and therefore are not reviewed here.

More recent methods are based on quantitative characteristics of molecular graphs (molecular topological descriptors). Because
molecular graphs or structural formulas are "two-dimensional," these methods are referred to as 2D-QSAR. Most of the 2D-QSAR methods are based on graph theoretical indices, which have been extensively studied by Randic (19) and Kier and Hall (20-22). They include, for example, molecular connectivity indices (19, 20), molecular shape indices (23, 24), topological (25) and electrotopological state indices (26-29), and atom-pair descriptors (30, 31). Sometimes, topological descriptors are also combined with physicochemical properties of molecules. Although these structural indices represent different aspects of molecular structures, and, what is important for QSAR, different structures provide numerically different values of indices, their physicochemical meaning is frequently unclear. The successful applications of topological indices combined with multiple linear regression (MLR) analysis have been summarized by Kier and Hall (20, 21, 28).

The third group of methods is based on descriptors derived from spatial (three-dimensional) representation of molecular structures. Correspondingly, these methods are referred to as three-dimensional or 3D-QSAR; they have become increasingly popular with the development of fast and accurate computational methods for generating 3D conformations and alignments of chemical structures. The early examples of 3D-QSAR include molecular shape analysis (MSA) (32), distance geometry (33, 34), and Voronoi techniques (35). The first method uses shape descriptors and multiple linear regression analysis, whereas the latter methods apply atomic refractivity as structural descriptors and the solution of mathematical inequalities to obtain the quantitative relationships. These two methods have been applied to the study of structure-activity relationships of many data sets by Hopfinger (e.g., Refs. 36, 37) and Crippen (e.g., Refs. 38, 39), respectively.

Perhaps the most popular example of 3D-QSAR is the comparative molecular field analysis (CoMFA), developed by Cramer et al. (40), which has elegantly combined the power of 3D molecular modeling and partial least-square (PLS) optimization technique (41, 42) and found wide applications in medicinal chemistry and toxicity analysis (see below). Most of 3D-QSAR methods require 3D alignment of all molecules according to a pharmacophore model or based on ligand docking to a receptor-binding site. Descriptors in the case of CoMFA (40, 43) and CoMFA-like methods such as COMBINE (44), CoMSIA (45), and QSiAR (46) represent electrostatic, steric, and hydrophobic field values (to name but a few examples) in the grid points surrounding molecules.

Finally, QSAR methods can also be classified by the type of the correlation methods used in model development. Linear methods include linear regression or MLR, PLS (41, 42, 47), or principal component regression (PCR), whereas nonlinear methods can be exemplified, for example, by k-Nearest Neighbors (kNN) (48, 49) and artificial neural networks (50) methods. An example of the linear methods is provided by the ADAPT system, which employs topological indices as well as other calculable structural parameters (e.g., steric and quantum mechanical parameters), and the MLR method for QSAR analysis. It has been extensively applied to QSAR/QSPR studies in analytical chemistry, toxicity analysis, and other biological activity prediction (51-54). Parameters derived from various experiments through chemometric methods have also been used in the study of peptide QSAR (55), where PLS analysis was employed. The latter technique has been used almost exclusively in 3D-QSAR, where the number of descriptors characterizing molecular fields may exceed the number of compounds by orders of magnitude.

There has been a great deal of interest, especially more recently, in the use of data mining methods to extract the information from large and/or chemically inhomogeneous data sets. Examples of these methods include pattern recognition (56, 57), automated structure evaluation (58, 59), neural network (60-62), and machine learning (63-65). Recent trends in QSAR studies also include developing optimal QSAR models through variable selection, that is, by selecting a subset of available descriptors in either MLR, PLS, or nonlinear classification or artificial neural networks (ANN) analysis as applied either in 2D- (66-72) or in 3D-QSAR (73). These methods employ either generalized simulated annealing
(67), or genetic algorithms (68), or evolutionary algorithms (69-72) as optimization tools. The effectiveness and convergence of these algorithms are strongly affected by the choice of a fitting function, which drives the optimization process (70-72). It has been demonstrated that optimization combined with variable selection effectively improves QSAR models as compared to those without variable selection. For example, GOLPE (74) was developed through the use of chemometric principles and q²-GRS (75) was developed on the basis of independent CoMFA analysis of small areas of CoMFA descriptor space, to address the issue of region selection. Both of these methods have been shown to improve QSAR models compared to the original CoMFA technique.

Different QSAR methods have their own strengths and weaknesses. For example, 3D-QSAR methods generally result in the diagrams of important molecular fields that can be easily interpreted in terms of specific steric and electrostatic interactions important for the ligand binding to their receptor. However, the need to align structures in 3D, which is time-consuming and subjective, precludes the use of 3D-QSAR techniques for the analysis of large data sets. On the other hand, 2D-QSAR methods are much faster and more amenable to automation because they require no conformational search and structural alignment. Thus, 2D methods are best suited for the analysis of large numbers of compounds and computational screening of molecular databases; however, the interpretation of the resulting models in familiar chemical terms is frequently difficult, if not impossible.

The generality of the QSAR modeling approach as a drug discovery tool, irrespective of descriptor types or optimization algorithms, can be best demonstrated in the context of inverse QSAR, which can be defined as designing or discovering molecular structures with a desired property on the basis of QSAR models (76-78). In practical terms, inverse QSAR also includes searching for molecules with a desired target property in chemical databases or virtual chemical libraries. These considerations emphasize the universal importance of establishing QSAR model robustness and predictive ability as opposed to concentrating on explanatory power, which has been a characteristic feature of many traditional QSAR approaches.

2 MULTIPLE DESCRIPTORS OF MOLECULAR STRUCTURE

It has been said frequently that there are three keys to the success of any QSAR model building exercise: descriptors, descriptors, and descriptors. Many different molecular representations have been proposed, exemplified by Hansch-type parameters (2), topological indices (19, 79), quantum mechanical descriptors (80), molecular shapes (32, 81), molecular fields (40), atomic counts (82), 2D fragments (83-85), 3D fragments (86-88), molecular eigenvalues (89), molecular multipole moments (90), E-state fields (28), molecular fragment-based hash codes (91, 92), and molecular holograms (93). A recent review by Livingstone provides an excellent survey of various 2D and 3D descriptors, along with some associated diversity and similarity functions (9). Various physicochemical parameters such as the partition coefficient, molar refractivity, and quantum mechanical quantities such as highest occupied molecular orbital (HOMO) and lowest unoccupied molecular orbital (LUMO) energies have been used to represent molecular identities in early QSAR studies by the use of linear and multiple linear regression. However, these descriptors are not suited for the analysis of large numbers of molecules, either because of the lack of physicochemical parameters for compounds yet to be synthesized or because of the computational expenses required by quantum mechanical methods. Recent years have seen the application of various topological descriptors that are usually derived from either 2D or 3D molecular structural information based on the graph theory or molecular topology (20-22, 94). These descriptors are generated on the basis of the molecular connectivity, 3D molecular topography, and molecular field properties.

2.1 Topological Descriptors

Two widely applied examples of 2D molecular descriptors are molecular connectivity indices
(MCI) and atom-pair (AP) descriptors. Molecular connectivity indices, χ, were first formulated by Randic (19) and subsequently generalized and extended by Kier and Hall (20-22). The fundamentals and applications of molecular connectivity indices have been thoroughly reviewed (22, 28). The popular MolConnZ software (95) affords the computation of a wide range of topological indices of molecular structure. These indices include (but are not limited to) the following descriptors: simple and valence path, cluster, path/cluster and chain molecular connectivity indices, kappa molecular shape indices, topological and electrotopological state indices, differential connectivity indices, the graph's radius and diameter, Wiener and Platt indices, Shannon and Bonchev-Trinajstic information indices, counts of different vertices, and counts of paths and edges between different kinds of vertices (19, 20, 96-100).

Overall, MolConnZ (95) produces over 400 different descriptors. Most of these descriptors characterize chemical structure, but several depend on the arbitrary numbering of atoms in a molecule and are introduced solely for bookkeeping purposes. In a typical QSAR study, only about one-half of all possible MolConnZ descriptors are eventually used, after deleting descriptors with zero value or zero variance. Figure 2.3 provides a summary of these molecular descriptors and presents some algorithms used in their derivation.

The idea of using atom pairs as molecular features in structure-activity studies was first proposed by Carhart et al. (84). AP descriptors are defined by their atom types and topological distance bins. An AP is a substructure defined by two atom types and the shortest path separation (or graph distance) between the atoms. The graph distance is defined as the smallest number of atoms along the path connecting two atoms in a molecular structure. The general form of an atom-pair descriptor is as follows:

atom type i - (distance) - atom type j

where atom chemical types are typically defined by the user. For example, 15 atom types can be defined by use of the SYBYL mol2 format (101) as follows: (1) negative charge center (NCC); (2) positive charge center (PCC); (3) hydrogen bond acceptor (HA); (4) hydrogen bond donor (HD); (5) aromatic ring center (ARC); (6) nitrogen atoms (N); (7) oxygen atoms (O); (8) sulfur atoms (S); (9) phosphorus atoms (P); (10) fluorine atoms (FL); (11) chlorine, bromine, iodine atoms (HAL); (12) carbon atoms (C); (13) all other elements (OE); (14) triple bond center (TBC); and (15) double bond center (DBC). The total number of pairwise combinations of all 15 atom types is therefore 120 (15 × 16/2, counting pairs of identical types). Furthermore, distance bins should be defined to discriminate between identical atom pairs separated by different graph distances and therefore representing different molecular substructures. Thus, 15 distance bins can be introduced in the interval from graph distance zero (i.e., zero atoms separating an atom pair) to 14 and greater. In this format a total of 1800 (120 × 15) AP descriptors can be generated for any molecular structure. An example of an atom-pair descriptor is shown in Fig. 2.4. Frequently, as applied to particular data sets, many of the theoretically possible AP descriptors have zero value (implying that certain atom types or atom pairs are absent in molecular structures). For instance, in our recent studies of 48 anticonvulsant agents, only 273 descriptors with nonzero value and nonzero variance were generated (102).

2.2 3D Descriptors

The rapid increase in structural three-dimensional (3D) information of bioorganic molecules (103, 104), coupled with the development of fast methods for 3D structure generation [e.g., CONCORD (105, 106) and CORINA (107)] and alignment [e.g., Active Analog Approach (43, 108)], has led to the development of 3D structural descriptors and associated 3D-QSAR methods. Many 3D-QSAR methods (considered below) make use of so-called molecular field descriptors. To calculate these descriptors, steric and electrostatic fields of all molecules are sampled with a probe atom, usually an sp³ carbon bearing a +1 charge, on a rectangular grid that encompasses structurally aligned molecules. The values of both van der Waals and electrostatic interactions between the probe atom and all
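The atom-pair scheme described in Section 2.1 (15 atom types, 15 graph-distance bins, 1800 possible descriptors) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' software: atom typing is taken as given, the molecule is supplied as a hypothetical adjacency dictionary, and the helper names `descriptor_names`, `shortest_paths`, and `atom_pair_counts` are introduced here only for the example. The graph distance is taken as the number of atoms separating the pair (bonds minus one), following the wording of the text.

```python
from collections import Counter, deque
from itertools import combinations_with_replacement

# Atom-type labels as listed in the text; how atoms are typed is assumed given.
ATOM_TYPES = ["NCC", "PCC", "HA", "HD", "ARC", "N", "O", "S", "P",
              "FL", "HAL", "C", "OE", "TBC", "DBC"]
N_BINS = 15  # separations 0..13 get their own bin; 14 and greater share the last


def descriptor_names():
    """Enumerate every (type_i, type_j, bin) triple: 120 unordered pairs x 15 bins."""
    pairs = combinations_with_replacement(ATOM_TYPES, 2)
    return [(a, b, d) for a, b in pairs for d in range(N_BINS)]


def shortest_paths(adjacency, start):
    """Breadth-first search: number of bonds from `start` to every reachable atom."""
    bonds = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adjacency[u]:
            if v not in bonds:
                bonds[v] = bonds[u] + 1
                queue.append(v)
    return bonds


def atom_pair_counts(types, adjacency):
    """Count atom pairs keyed by (type_i, type_j, distance bin)."""
    counts = Counter()
    for i in range(len(types)):
        bonds = shortest_paths(adjacency, i)
        for j in range(i + 1, len(types)):
            if j not in bonds:
                continue  # atoms in disconnected fragments form no pair
            separating = bonds[j] - 1  # atoms lying between i and j
            a, b = sorted((types[i], types[j]), key=ATOM_TYPES.index)
            counts[(a, b, min(separating, N_BINS - 1))] += 1
    return counts


# Hypothetical three-atom chain C-C-O, typed simply as C, C, O.
toy_counts = atom_pair_counts(["C", "C", "O"], {0: [1], 1: [0, 2], 2: [1]})
```

In this sketch most of the 1800 possible keys never appear for a given molecule, which mirrors the remark in the text that many theoretically possible AP descriptors have zero value for a particular data set.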
[Figure residue (descriptor summary; caption not recovered): b_i is the sum of vertex degrees connected to vertex i. Connectivity indices are summed over all edges; exponent f = -0.5 gives the molecular connectivity indices (χ) and f = 1 gives the Zagreb group indices (M).]
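The edge-based indices named in the figure fragment above (exponent f = -0.5 for the molecular connectivity index, f = 1 for the Zagreb group index) can be illustrated with a short Python sketch. It assumes a hydrogen-suppressed graph given as a list of edges and uses simple vertex degrees; the function name `edge_index` and the isobutane example are introduced here for illustration only, and the valence-corrected variants mentioned in the text are not covered.

```python
import math


def edge_index(edges, f):
    """Sum (degree_i * degree_j) ** f over all edges of a hydrogen-suppressed graph."""
    degree = {}
    for i, j in edges:
        degree[i] = degree.get(i, 0) + 1
        degree[j] = degree.get(j, 0) + 1
    return sum((degree[i] * degree[j]) ** f for i, j in edges)


# Hypothetical example: carbon skeleton of isobutane (central atom 1, degree 3).
isobutane = [(0, 1), (1, 2), (1, 3)]
chi1 = edge_index(isobutane, -0.5)  # three edges of weight 1/sqrt(3), i.e. sqrt(3)
zagreb = edge_index(isobutane, 1)   # three edges of weight 1 * 3, i.e. 9
```

Changing only the exponent switches between the two families, which is the point made by the figure: both are sums of the same edge term raised to different powers.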
Figure 2.5. Process of steric and electrostatic descriptor generation in CoMFA. Note that this process results in a familiar QSAR table (cf. Fig. 2.2). PLS is used as a standard analytical technique in CoMFA.
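The descriptor-generation process summarized in the Figure 2.5 caption can be sketched in Python. This is a rough illustration rather than the CoMFA implementation: the 12-6 Lennard-Jones steric term, the approximate Coulomb constant of 332 kcal Å/(mol e²), the 30 kcal/mol truncation, the atom parameters, and the function names `grid_points` and `field_row` are all assumptions made for this example.

```python
import itertools
import math


def grid_points(lo, hi, step):
    """Regular rectangular lattice covering the cube [lo, hi] in x, y and z."""
    n = int(round((hi - lo) / step)) + 1
    axis = [lo + k * step for k in range(n)]
    return list(itertools.product(axis, repeat=3))


def field_row(atoms, points, cutoff=30.0):
    """Build one QSAR-table row: steric values at every grid point, then electrostatic.

    atoms: list of (x, y, z, partial_charge, epsilon, r_min); illustrative parameters.
    The probe is assumed to carry a +1 charge.
    """
    steric, electro = [], []
    for p in points:
        e_vdw = e_coul = 0.0
        for x, y, z, q, eps, r_min in atoms:
            r = max(math.dist(p, (x, y, z)), 1e-3)  # guard against r = 0
            ratio = (r_min / r) ** 6
            e_vdw += eps * (ratio * ratio - 2.0 * ratio)  # 12-6 Lennard-Jones
            e_coul += 332.0 * q / r                       # Coulomb, probe charge +1
        steric.append(min(e_vdw, cutoff))                 # truncate steric clashes
        electro.append(max(-cutoff, min(e_coul, cutoff)))
    return steric + electro


# Hypothetical one-atom "molecule" on a coarse 3 x 3 x 3 lattice.
points = grid_points(-2.0, 2.0, 2.0)
row = field_row([(0.0, 0.0, 0.0, -0.5, 0.1, 3.4)], points)
```

Each aligned molecule contributes one such row, so the table has twice as many field columns as there are grid points. With a realistic lattice this quickly reaches thousands of columns, which is why the text turns to PLS rather than ordinary multiple regression.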
One of the most attractive features of the CoMFA and CoMFA-like methods is that, because of the nature of molecular field descriptors, these approaches yield models that are relatively easy to interpret in chemical terms. Famous CoMFA contour plots, which are obtained as a result of any successful CoMFA study, tell chemists in rather plain terms how the change in the compounds' size or charge distribution as a result of chemical modification correlates with the binding constant or activity. These observations may immediately suggest to a chemist possible ways to modify molecules to increase their potencies. However, as demonstrated in the next section, these predictions should be taken with caution, and only after sufficient work has been done to prove the statistical significance and predictive ability of the models.

By analogy with 2D atom-pair descriptors (Fig. 2.4), 3D AP descriptors can also be defined through the use of similar atom types and atom pairs and 3D molecular topography; in this case, a physical distance between atom types is used in place of chemical graph distance. The distance between two "atoms" is measured and then assigned into one or two distance bins. Typically, the width of each distance bin is chosen as 1.0 Å. Because it is also designed to let the adjacent bins have 10% overlap with each other, the actual length of each distance bin is 1.2 Å. Any distance located in the overlap region is assigned to both bins. This "fuzzy distance" concept is adopted to alleviate the possible unfavorable boundary effects of the distance bins. For example, with strict boundary conditions, a distance of 2.05 Å will be assigned only to bin No. 2, but it can be reasonably argued that it is almost as close to the upper half of bin No. 1 as to bin No. 2. With fuzzy boundary conditions, 2.05 Å belongs to both bin No. 1 and bin No. 2, allowing
a possible match to either. All the distances greater than 20 Å are assigned into the last bin.

3 QSAR MODELING APPROACHES

3.1 3D-QSAR

Two original 3D-QSAR methods, CoMFA (40) and GRID (110), were developed almost simultaneously in the mid- to late-1980s (9). Since its introduction, the CoMFA approach has rapidly become one of the most popular methods of QSAR. Over the years, this approach has been applied to a wide variety of receptor and enzyme ligands [many reviews appeared in a recent monograph (10)]. Undoubtedly, the further development of this and related methods is of great importance and interest to many scientists working in the area of rational drug design.

CoMFA methodology is based on the assumption that because, in most cases, the drug-receptor interactions are noncovalent, the changes in the biological activities or binding affinities of sample compounds correlate with changes in the steric and electrostatic fields of these molecules. In a standard CoMFA procedure, all molecules under investigation are first structurally aligned, and the steric and electrostatic fields around them are sampled with probe atoms, usually sp³ carbon with a +1 charge, on a rectangular grid that encompasses aligned molecules. The results of the field evaluation in every grid point for every molecule in the data set are placed in the CoMFA QSAR table, which therefore contains thousands of columns (Fig. 2.5). The analysis of this table by the means of standard multiple regression is practically impossible; however, the application of special multivariate statistical analysis routines, such as PLS analysis and leave-one-out (LOO) cross-validation, ensures the statistical significance of the final CoMFA equation. The outcome from this procedure is a cross-validated correlation coefficient R² (q²), which is calculated according to the formula

$$ q^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2} \qquad (2.1) $$

where $y_i$, $\hat{y}_i$, and $\bar{y}$ are the actual, estimated, and averaged (over the entire data set) activities, respectively. The summations in Equation 2.1 are performed over all compounds, which are used to build a model for the training set. The statistical meaning of the q² is different from that of the conventional r²: a q² value greater than 0.3 is often considered significant (111).

Despite obviously successful and growing application of CoMFA in molecular design, several problems intrinsic to this methodology have persisted. Studies revealed that CoMFA results can be extremely sensitive to a number of factors, such as alignment rules, overall orientation, lattice placement, step size, and probe atom type (40, 75, 112-114). The problem of three-dimensional alignment has been the most notorious among others. Even with the development of automated or semiautomated alignment protocols such as the Active Analog Approach (108, 115) or DISCO (116) and the opportunity to use, in some cases, the structural information about the target receptor (112, 117) to align molecules, in general there is no standard recipe as to how to align all molecules under consideration in a unique and unambiguous fashion. A QSAR analysis of 60 acetylcholinesterase inhibitors (117) is particularly illustrative with respect to this point. In that study, the combination of structure-based alignment and CoMFA was employed to obtain a QSAR model for 60 chemically diverse inhibitors of acetylcholinesterase (AChE). The great structural diversity of the AChE inhibitors, ranging from choline to decamethonium, made it practically impossible to structurally align all the inhibitors in any unbiased way and generate a unique three-dimensional pharmacophore. X-ray crystallographic analysis of AChE from Torpedo californica (EC 3.1.1.7) (118), followed by X-ray determination of the complexes of the enzyme with three structurally diverse inhibitors, tacrine, edrophonium, and decamethonium (119), provided crucial information with respect to the orientation of these inhibitors in the active site of the enzyme. The crystallographic data indicated that each of the three inhibitors had a unique binding orientation in the active site of the enzyme (Fig. 2.6). Their natural structural alignment would probably never have been predicted by any of the existing automated algorithms for ligand align-
for biological activity. Indeed, the deficiencies of the conventional CoMFA routine mentioned earlier may be effectively dealt with by eliminating from the analyses those areas of three-dimensional space where changes in steric and electrostatic fields do not correlate with changes in biological activity. The q²-GRS routine was devised (75) to eliminate those areas from the analysis based on the (low) value of the q² obtained for such regions individually. The major feature of this routine is that it optimizes the region selection for the final PLS analysis. In this regard, it is intellectually analogous to the GOLPE approach (74).
3D-QSAR remains an active area of research and method development. Several recent approaches such as CoMSIA (45), QSiAR (46), and GRIND (122) address the most notorious CoMFA problems dealing with the grid artifacts. However, it should be kept in mind that 3D-QSAR modeling is a difficult process. It is reasonably successful when underlying molecules are relatively rigid and similar, so that the identification of the 3D pharmacophore is straightforward. With the increased complexity and flexibility of molecules and a possibility of multiple mechanisms of binding with the receptor, the derivation of an unambiguous pharmacophore and a unique alignment is sometimes practically impossible (as shown above in the case of AChE inhibitors), and extreme care is important in trying to obtain reproducible and validated QSAR models.
3.2 The Descriptor Pharmacophore Concept and Variable Selection QSAR
The term pharmacophore, introduced by Ehrlich in the early 1900s (123), originally referred to the molecular framework that carries (phoros) the essential features responsible for a drug's (pharmacon) activity. Nowadays, this term has almost the opposite meaning as applied to three-dimensional (3D) molecular structure. A 3D pharmacophore is defined as a collection of particular chemical features (functional groups) and their spatial arrangement, which define pharmacological specificity of a series of compounds (124). The pharmacophore concept assumes that structurally diverse molecules bind to their receptor site in a similar way, with their pharmacophoric elements interacting with the same functional groups of the receptor.
The pharmacophore concept plays a very important role in guiding the drug discovery process. Pharmacophore models help medicinal chemists gain an insight into the key interactions between ligand and receptor when the receptor structure has not been determined experimentally. A pharmacophore can be used as a basis for the alignment rules in 3D-QSAR analysis for the lead compound optimization (125). Furthermore, a pharmacophore can be directly used as the search query for 3D database mining, which is a common and efficient approach for discovery of lead compounds (126).
Pharmacophore identification refers to the computational way of identifying the essential 3D structural features and configurations that are responsible for the biological activity of a series of compounds. It is computationally intensive, requiring searching two huge spaces: the available conformations for each compound and the possible correspondence (alignment) between different compounds. A number of approaches and computer programs have been specifically developed for pharmacophore identification including, for example, Active Analog Approach, AAA (108, 127, 128), Ensemble distance geometry (129), DISCO (116), Chem-X (130), Catalyst/Hypo (131, 132), Catalyst/HipHop (133, 134), and Apex-3D (135).
An obvious parallel can be established between the identification of descriptors contributing the most to the correlation with biological activity, and search for pharmacophoric elements, which are mainly responsible for the specificity of drug action. Indeed, individual pharmacophoric elements are typically identified in the course of experimental structure-activity studies. Considering molecules as a collection of substructures, pharmacophoric elements can also be viewed as specific chemical features selected from all chemical fragments present in a molecular data set. Thus, the selection of specific pharmacophoric features responsible for biological activity is directly analogous to the selection of specific chemical descriptors contributing to the most explanatory QSAR model. Frequently, the
QSAR modeling that involves descriptor (feature) selection is referred to as variable selection QSAR.
This consideration emphasizes the analogy between pharmacophore identification and variable selection QSAR. On the basis of this analogy, we now expand the notion of chemical pharmacophore to that of the more general descriptor pharmacophore. We shall define descriptor pharmacophore as a special subset of molecular descriptors (of any nature, not only chemical functional groups) optimized in the process of variable selection QSAR, to achieve the most significant correlation between descriptor values and biological activity.
Similar to the common areas of application of chemical pharmacophores, descriptor pharmacophores can be applied for database mining. First, a preconstructed QSAR model can be used as a means of screening compounds from existing databases (or virtual libraries) for high predicted biological activity. Alternatively, variables selected by QSAR optimization can be used for similarity searches to improve the performance of the rational library design or database mining methods. The advantage of this approach for database mining is that it affords not only the compound selection but also the quantitative prediction of their activity.
3.2.1 Linear Models. Variable selection approaches can be applied in combination with both linear and nonlinear optimization algorithms. Exhaustive analysis of all possible combinations of descriptor subsets to find a specific subset of variables that affords the best correlation with the target property is practically impossible because of the combinatorial nature of this problem. Thus, stochastic sampling approaches such as genetic or evolutionary algorithms (GA or EA) or simulated annealing (SA) are employed. To illustrate one such application we shall consider the GA-PLS method, which was implemented as follows (136).
Step 1. Multiple descriptors such as molecular connectivity indices or atom pair descriptors (cf. Section 2.1) are generated initially for every compound in a data set.
Step 2. An initial population of 100 different random combinations of subsets of these descriptors (parents) is generated as follows. Each parent is described by a string of random binary numbers (i.e., one or zero), with the length (total number of digits) equal to the total number of descriptors selected for each data set. The value of one in each string implies that the corresponding descriptor is included for the parent, and the value of zero implies that the descriptor is excluded.
Step 3. For every random combination of descriptors (i.e., every parent), a QSAR equation is generated for the training data set by use of the PLS algorithm (41). Thus, for each parent a q² value is obtained, and some function of q² is used as a fitness function to guide GA.
Step 4. Two parents are selected randomly and subjected to a crossover (i.e., the exchange of the equal length substrings), which produces two offspring. Each offspring is subjected to a random single-point mutation, that is, a randomly selected one (or zero) is changed to zero (or one), and the fitness of each offspring is evaluated as described above (cf. Step 3).
Step 5. If the resulting offspring are characterized by a higher value of the fitness function, then they replace the parents; otherwise, the parents are kept.
Step 6. Steps 3-5 are repeated until a predefined convergence criterion is achieved. For the convergence criterion one can use the difference between the maximum and minimum values of the fitness function. Calculations are terminated when this difference falls below a certain threshold (e.g., 0.02).
In summary, each parent in this method represents a QSAR equation with randomly chosen variables, and the purpose of the calculation is to evolve from the initial population of the QSAR equations to the population with the highest average value of the fitness function. In the course of the GA-PLS process, the initial number of members of the population (100) is maintained while the average value of the fitness function for the whole population converges to a high number. The best model is characterized by the highest value of the fitness function as well as by specific descriptor selection (descriptor pharmacophore) that affords such a model.
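The six GA-PLS steps above can be illustrated with a minimal sketch. It is not the implementation of Ref. 136: the function name `ga_select`, the iteration cap `max_iter`, and above all the toy fitness function are assumptions made for illustration. In a real GA-PLS run the fitness would be some function of the q² of a PLS model built from the selected descriptors; here a stand-in fitness simply rewards agreement with a hidden "true" descriptor mask so that the example stays self-contained.

```python
import random

def ga_select(n_desc, fitness, pop_size=100, tol=0.02, max_iter=20000, seed=0):
    """Evolve binary descriptor masks (1 = descriptor included)."""
    rng = random.Random(seed)
    # Step 2: initial population of random binary strings (the parents).
    pop = [[rng.randint(0, 1) for _ in range(n_desc)] for _ in range(pop_size)]
    # Step 3: score every parent (stand-in for a function of q2 from PLS).
    fit = [fitness(p) for p in pop]
    for _ in range(max_iter):
        # Step 6: stop when max and min fitness differ by less than tol.
        if max(fit) - min(fit) < tol:
            break
        # Step 4: pick two parents at random and exchange equal-length tails.
        i, j = rng.sample(range(pop_size), 2)
        cut = rng.randrange(1, n_desc)
        kids = [pop[i][:cut] + pop[j][cut:], pop[j][:cut] + pop[i][cut:]]
        for kid in kids:
            pos = rng.randrange(n_desc)   # single-point mutation
            kid[pos] = 1 - kid[pos]
        # Step 5: an offspring replaces its parent only if it is fitter.
        for idx, kid in zip((i, j), kids):
            f = fitness(kid)
            if f > fit[idx]:
                pop[idx], fit[idx] = kid, f
    best = max(range(pop_size), key=fit.__getitem__)
    return pop[best], fit[best], pop

# Hypothetical stand-in for the PLS-based fitness: fraction of positions
# that agree with a hidden "true" descriptor selection.
TRUE_MASK = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0]

def toy_fitness(mask):
    return sum(a == b for a, b in zip(mask, TRUE_MASK)) / len(TRUE_MASK)

best_mask, best_score, final_pop = ga_select(len(TRUE_MASK), toy_fitness, seed=1)
```

Because an offspring replaces a parent only when it is fitter, the population size stays at 100 and the best fitness in the population can never decrease, which mirrors the behavior described in the summary paragraph above.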
Recent Trends in Quantitative Structure-Activity Relationships
3.2.2 Nonlinear Models. Most of the QSAR approaches assume the existence of a linear relationship between a biological activity and molecular descriptors. However, the fast collection of structural and biological data, as a consequence of the recent development of combinatorial chemistry and high throughput screening technologies, has challenged traditional QSAR techniques. First, 3D methods may be computationally too expensive for the analysis of a large volume of data; and in some cases, an automated and unambiguous alignment of molecular structures is not achievable. Second, although existing 2D techniques are computationally efficient, the assumption of linearity in the SAR may not hold true, especially when a large number of structurally diverse molecules are included in the analysis.
These considerations provide an impetus for the development of fast, nonlinear, variable selection QSAR methods that can avoid the aforementioned problems of linear QSAR. Several nonlinear QSAR methods have been proposed in recent years. Most of these methods are based on either artificial neural network (ANN) (50, 61, 137-142) or machine learning techniques (65, 143-145). Given that optimization of many parameters is involved in these techniques, the speed of the analysis is relatively slow. More recently, Hirst reported a simple and fast nonlinear QSAR method (146), in which the activity surface was generated from the activities of training set compounds based on some predefined mathematical function.
For illustration, we shall consider here one of the nonlinear variable selection methods that adopts a k-Nearest Neighbor (kNN) principle to QSAR [kNN-QSAR (49)]. Formally, this method implements the active analog principle that lies in the foundation of the modern medicinal chemistry. The kNN-QSAR method employs multiple topological (2D) or topographical (3D) descriptors of chemical structures and predicts biological activity of any compound as the average activity of k most similar molecules. This method can be used to analyze the structure-activity relationships (SAR) of a large number of compounds where a nonlinear SAR may predominate.
In principle, the kNN technique is a conceptually simple, nonlinear approach to pattern-recognition problems (147). In this method, an unknown pattern is classified according to the majority of the class labels of its k nearest neighbors of the training set in the descriptor space. Many variations of the kNN method have been proposed in the past and new and fast algorithms have continued to appear in recent years (148, 149). The applications of the kNN principle in chemistry have been summarized by Strouf (150). In the area of biology, Raymer et al. have successfully applied a kNN pattern-recognition technique with simultaneous feature selection and classification in the analysis of water distribution in protein structures (151). In the area of QSPR, Basak et al. have applied this principle, combined with principal component analysis and graph theoretical indices, in the estimation of physicochemical properties of organic compounds (152-155).
The assumptions underlying the kNN-QSAR method are as follows. First, structurally similar compounds should have similar biological activities, and the activity of a compound can be predicted (or estimated) simply as the average of the activities of similar compounds. Second, the perception of structural similarity is relative and should always be considered in the context of a particular biological target. Given that the physicochemical characteristics of the receptor-binding site vary from one target to another, the structural features that can best explain the observed biological similarities between compounds are different for different biological endpoints. These critical structural features can be defined as the descriptor pharmacophore (DP) for the underlying biological activity. Thus, one of the tasks of building a kNN-QSAR model is to identify the best DP. This is achieved by the "bioactivity-driven" variable selection, that is, by selecting a subset of molecular descriptors that afford a highly predictive kNN-QSAR model. Because the number of all possible combinations of descriptors is huge, an exhaustive search of these combinations is not possible. Thus, a stochastic optimization algorithm (i.e., simulated annealing) has been adopted for an efficient sampling of the combinatorial space. Figure 2.7 shows the overall flowchart of the kNN-QSAR method, which involves the following steps.
1. Select a subset of n descriptors randomly (n is a number between 1 and the total number of available descriptors) as a hypothetical descriptor pharmacophore (HDP).
2. Validate this HDP by a standard cross-validation procedure, which generates the cross-validated R² (or q²) value for the kNN-QSAR model built by use of this HDP. The standard leave-one-out procedure has been implemented as follows: (i) Eliminate a compound from the training set. (ii) Calculate the activity of the eliminated compound, which is treated as an unknown, as the average activity of the k most similar compounds found in the remaining molecules (k is set to 1 initially). The similarities between compounds are calculated using only the selected descriptors (i.e., the current trial HDP) instead of the whole set of descriptors. (iii) Repeat this procedure until every compound in the training set has been eliminated and predicted once. (iv) Calculate the cross-validated R² (or q²) value (cf. Equation 2.1). (v) Repeat calculations for k = 2, 3, 4, . . . , n. The upper limit of k is the total number of compounds in the data set; however, the best value is found empirically between 1 and 5. The k that leads to the best q² value is chosen for the current kNN-QSAR model.
3. Repeat steps 1 and 2, the procedure of generating trial HDPs and calculating corresponding q² values. The goal is to find the best HDP that maximizes the q² value of the corresponding kNN-QSAR model. This process is driven by a generalized simulated annealing by use of q² as the objective function.
4 VALIDATION OF QSAR MODELS
One of the most important characteristics of QSAR models is their predictive power. The latter can be defined as the ability of a model to predict accurately the target property (e.g., biological activity) of compounds that were not used for model development. The typical problem of QSAR modeling is that at the time of
Figure 2.8. Beware of q²! External R² (for the test set) presents no correlation with the "predictive" LOO q² (for the training set). (Adopted from Ref. 163.)
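To make the leave-one-out q² of a kNN-QSAR model concrete, the following minimal sketch evaluates one trial descriptor subset and scans k from 1 to 5. It is an illustration under stated assumptions rather than the published code: the names `knn_loo_q2` and `best_k` are hypothetical, Euclidean distance is assumed as the similarity measure, distance ties are broken by compound index, and the simulated annealing loop that proposes new descriptor subsets is omitted. The toy data are invented so that descriptor 0 tracks activity and descriptor 1 is noise.

```python
import math

def knn_loo_q2(X, y, selected, k):
    """Leave-one-out q2 of a kNN model that uses only the `selected` columns."""
    n = len(y)
    y_mean = sum(y) / n
    press = 0.0
    for i in range(n):
        xi = [X[i][c] for c in selected]
        # Distances from the left-out compound to all remaining compounds,
        # computed in the subspace of the trial descriptor subset only.
        dists = sorted(
            (math.dist(xi, [X[j][c] for c in selected]), j)
            for j in range(n) if j != i
        )
        # Predicted activity = average activity of the k nearest neighbors.
        pred = sum(y[j] for _, j in dists[:k]) / k
        press += (y[i] - pred) ** 2
    total = sum((v - y_mean) ** 2 for v in y)
    return 1.0 - press / total

def best_k(X, y, selected, k_max=5):
    """Return (q2, k) for the k in 1..k_max that gives the highest q2."""
    return max((knn_loo_q2(X, y, selected, k), k) for k in range(1, k_max + 1))

# Invented toy data: six compounds, two descriptors.
X = [[0.0, 5.0], [1.0, 1.0], [2.0, 4.0], [3.0, 0.0], [4.0, 3.0], [5.0, 2.0]]
y = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]

q2_informative = knn_loo_q2(X, y, selected=[0], k=1)
q2_noise = knn_loo_q2(X, y, selected=[1], k=1)
```

On these toy data the informative descriptor gives a much higher q² than the noise descriptor, which is the signal a variable selection procedure exploits. As the figure above warns, a high LOO q² on the training set does not by itself establish predictive power for an external test set.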
should be divided into the training and test sets. Ideally, this division must be performed such that points representing both training and test set are distributed within the whole descriptor space occupied by the entire data set, and each point of the test set is close to at least one point of the training set. This approach ensures that the similarity principle can be employed for the activity prediction of the test set. Unfortunately, as we shall see below, this condition cannot always be satisfied.
Many authors use external test sets for validation of QSAR models, but do not provide any rationale as to how and why certain compounds were chosen for the test set (164, 165). One of the most widely used methods for dividing a data set into training and test sets is a mere random selection (166, 167). Some authors assign whole structural subgroups of molecules to the training set or the test set (168, 169). Another frequently used approach is based on the activity sampling. The whole range of activities is divided into bins, and compounds belonging to each bin are randomly (or in some regular way) assigned to the training set or test set (170, 171). These methods (166, 170, 171) cannot guarantee that the training set compounds represent the entire descriptor space of the original data set, and that each compound point of the test set is close to at least one point of the training set.
In several publications, the division of a data set into training and test sets is performed by use of the Kohonen's Self-Organizing Map (SOM) (172). Representative points falling into the same areas of the SOM are randomly selected for the training and test sets (173, 174). SOM preserves the closeness between points (points that are close to each other in the multidimensional descriptor space are close to each other on the map). Therefore, it is anticipated that the training and test sets must be scattered within the whole area occupied by representative points in the original descriptor space, and that each point of the test set is close to at least one point of the training set. The drawback of this method is that the quantitative methods of prediction use exact values of distances between representative points; because SOM is a nonlinear projection method, the distances between points in the map are distorted.
The division of a data set into the training and test sets can be performed by the use of various clustering techniques. In Burden and Winkler (175) and Burden et al. (176) the K-means clustering algorithm (177) was used, and from each cluster one compound for the training set was randomly selected. In Potter and Matter (178), to select a representative subset from a data set, hierarchical clustering and the maximum dissimilarity method (179-181) were used. The authors showed that both methods choose representative subsets of compounds much better than the random selection. Compounds selected through use of the maximum dissimilarity method were used as training sets in 3D-QSAR studies, with all remaining compounds composing the test set. In Wu et al. (166) the Kennard-Stone (182-184) method, which is similar to the maximum dissimilarity method, was applied to the classification of NIR spectra and QSAR analysis. The drawbacks of clustering methods are that different clusters contain different numbers of points and have different densities of representative points. Therefore, the closeness of each point of the test set to at least one point of the training set is not guaranteed. The maximum dissimilarity and Kennard-Stone methods guarantee that the points of the training set are distributed more or less evenly within the whole area occupied by representative points, and the condition of closeness of the test set points to the training set points is satisfied. The maximum distance between training and test set points in these methods does not exceed the radius of the probe sphere.
To select a representative subset of samples from the whole data set, factorial designs (185, 186) and D-optimal designs (187) were used (166, 173, 188). Factorial designs presume that different sample properties (such as substituent groups at certain positions) are divided into groups. The training set includes one representative for each combination of properties. For a diverse data set this approach is impractical, and fractional factorial designs are used, in which only a part of all combinations is included into the training set. Generally, this approach does not guarantee the closeness of the test set points to the training set points in the descriptor space. D-optimal design algorithms select samples that
structure-activity data sets, besides the traditional linear regression methods. Most of them are nonlinear and nonparametric and need no statistical assumptions to apply them. Decision tree and rule induction methods, such as ID3 (200), CART (201), and FIRM (202-204), usually use univariate splits to generate a model in the form of a tree or propositional logic. The inferred model is easy to comprehend, but the approximation power may be significantly restricted by a particular tree or rule representation. Inductive logic programming methods, such as GOLEM (64) and PROGOL (65), are designed to induce a model from the more flexible representation of first-order predicate logic. However, this generality comes at the price of significant computational demands. Nonlinear regression and classification methods, such as various neural networks (60-62), train a model by fitting linear and nonlinear combinations of basis functions to the combinations of the input variables. They may be powerful in terms of approximation, but they are statistically poorly characterized, slow (205), and difficult to interpret in chemical terms. Example-based methods, such as nearest-neighbor methods (147), use representative examples from the database as an approximate model and predict new samples on the basis of the properties of the most similar examples in the model. They are asymptotically powerful for approximating properties, but also difficult to interpret. Furthermore, their performance is strongly dependent on a well-defined distance metric to evaluate distances between data points.
Data mining of chemical databases is still at its very early stage. Nevertheless, as a result of the data explosion in pharmaceutical industry, it is expected that data mining techniques will play an increasingly important role in the drug discovery process. Future studies may include, for example, the definition of chemical space, the validation of various algorithms (206), and the representation of extremely large virtual databases (207).
5.2 Virtual Screening
Although combinatorial chemistry and HTS have offered medicinal chemists a much broader range of possibilities for lead discovery and optimization, the number of chemical compounds that can be reasonably synthesized, which is sometimes called "virtual chemistry space," is still far beyond today's capability of chemical synthesis and biological assay. Therefore, medicinal chemists continue to face the same problem as before: Which compounds should be chosen for the next round of synthesis and testing? For chemoinformaticians, the task is to develop and utilize various computer programs to evaluate a very large number of chemical compounds and recommend the most promising ones for bench medicinal chemists. This process can be called virtual screening (208) or chemical database searching. A large number of computational methods exist for virtual screening, but which one is chosen will depend on the information available and the task at hand in practice.
A substructure search will typically be undertaken if a lead compound has been found. The search query will retrieve all the structures in a database that contain the substructures present in the lead compound that are believed to be important for activity (209). According to graph theory, it is equivalent to searching a series of topological graphs for the existence of a subgraph isomorphism with a specified query graph. Subgraph isomorphism is an NP-complete problem (210), which means that for it, there are no known algorithms whose worst-case time requirements do not rise exponentially with the size of the input. However, various backtracking algorithms (211-213) and partitioning algorithms (214-217) have been developed since the 1950s, to reduce the average time required for chemical substructure searching. Today, almost all the chemical database software includes the function of substructure searching.
A similarity search provides a way forward by retrieving the structures that are similar, but not identical, to a lead compound (94). Therefore, it overcomes some limitations of substructure search, for example, not requiring specific knowledge about the substructures responsible for activity, and being able to rank the output structures according to the overall similarity. The search query usually involves a set of descriptors that collectively specify the whole structure of the lead compound. This set of descriptors is compared with the corresponding set of descriptors for
each compound in the database, and then a measure of similarity is calculated between them. There are a wide variety of molecular descriptors for similarity searching (cf. Section 2). Not a single set of molecular descriptors has been found as the best choice in all the cases. The present trend in descriptor selection is to use combined descriptors with many different types. The similarity coefficients that are often used for measuring the similarity between two structures include Manhattan distance, Euclidean distance, Soergel distance, Tanimoto coefficient, Dice coefficient, Cosine coefficient, and so forth (218), and again no clear-cut winner has been found among them (219). Virtual screening based on QSAR models can serve as a powerful approach to the design of targeted chemical libraries, as illustrated in the following section.
5.3 Rational Library Design by use of QSAR
As discussed earlier, combinatorial chemical synthesis and high throughput screening have significantly increased the speed of the drug discovery process (220-222). However, it remains impossible to synthesize all of the library compounds in a reasonably short period of time. For instance, 3000³ (2.7 × 10¹⁰) compounds can be synthesized from a molecular scaffold with three different substitution positions when each of the positions has 3000 different substituents. If a chemist could synthesize 1000 compounds per week, 27 million weeks (~0.5 million years) would be required to synthesize all these compounds. Furthermore, many of these compounds can be structurally similar to each other, thus making redundant the chemical information contained in the library. There is a need for rational library design (i.e., rational selection of a subset of available building blocks for combinatorial chemical synthesis), so that a maximum amount of information can be obtained while a minimum number of compounds are synthesized and tested. Similarly, there is a closely related task in computational database mining, that is, rational selection of a subset of compounds from commercially available or proprietary databases for biological testing.
Thus, in many practical cases, the exhaustive synthesis and evaluation of combinatorial libraries is prohibitively expensive, time-consuming, or redundant (223). Modern rational approaches to the design of combinatorial libraries have been explored in a recent monograph (224). Theoretical analysis of available experimental information about the biological target or pharmacological compounds capable of interacting with the target can significantly enhance the rational design of targeted chemical libraries. In many cases, the number of compounds with known biological activity is sufficiently large to develop viable QSAR models for such data sets. These models can be used as a means of selecting virtual library compounds (or actual compounds from existing databases) with (high) predicted biological activity. Alternatively, if a variable selection method has been employed in developing a QSAR model, the use of only selected variables can improve the performance of the rational library design or database mining methods on the basis of the similarity to a probe. This procedure of use of only selected variables in a similarity search in the descriptor space is analogous to more traditional use of conventional chemical pharmacophores in database mining.
QSAR models can be employed for rational design of targeted chemical libraries and database mining by predicting biologically active structures in virtual or actual chemical libraries (225-227). To illustrate this approach, we consider the design of a pentapeptide combinatorial library with the bradykinin activity by use of a QSAR model derived for a small bradykinin peptide data set. Figure 2.9 shows the schematic diagram illustrating the targeted pentapeptide combinatorial library design by use of the FOCUS-2D method (225, 226). The algorithm includes the description, evaluation, and optimization steps.
To identify potentially active compounds in the virtual library, FOCUS-2D employs stochastic optimization methods such as SA (228, 229) and GA (230-232). The latter algorithm was used for targeted pentapeptide library design as follows. Initially, a population of 100 peptides is randomly generated and encoded by use of topological indices or amino acid-dependent physicochemical descriptors. The fitness of each peptide is evaluated by its biological activity predicted from a preconstructed QSAR equation (see below). Two parent peptides are chosen by use of the roulette wheel selection method (i.e., high fitting parents are more likely to be selected). Two offspring peptides are generated by a crossover (i.e., two randomly chosen peptides exchange their fragments) and mutations (i.e., a randomly chosen amino acid in an offspring is changed to any of 19 remaining amino acids). The fitness of the offspring peptides is then evaluated and compared with that of the parent peptides, and the two lowest scoring peptides are eliminated. This process is repeated 2000 times to evolve the population.
[Figure 2.9 (schematic): building blocks Ba, Bb, Bc, Bd, ..., Bf; steps: Generate and Encode; Select; Analyze.]
Design of a Targeted Library with Bradykinin (BK) Potentiating Activity. The results obtained with the FOCUS-2D and a QSAR-based prediction are shown in Figure 2.10. The position-dependent frequency distributions of amino acids in the highest scoring pentapeptides are shown before (Fig. 2.10a) and after (Fig. 2.10b,c) FOCUS-2D. To evaluate the efficiency of stochastic sampling, the entire pentapeptide library (which includes as many as 3.2 million molecules) was also generated and subjected to evaluation by use of the same QSAR model, and the results are shown in Fig. 2.10c. Apparently, the results after FOCUS-2D and the exhaustive search were very similar to each other. FOCUS-2D selected the following amino acids: E, I, K, L, M, Q, R, V, and W. Interestingly, these selected amino acids included most of those found in the two experimentally most active pentapeptides, VEWAK and VKWAP (excluded from the training set for the QSAR model development). Furthermore, the actual spatial positions of these amino acids were correctly identified: the first and fourth positions for V; the second and fifth positions for E; the third position for W; and the second and fifth positions for K. More detailed analysis of these results (cf. Fig. 2.10b,c) may suggest which residues should be preferably chosen for each position in the pentapeptide to achieve a limited size library with high predicted bradykinin activity.
6 CONCLUSIONS
In this chapter, we have reviewed recent and developing trends in the field of QSAR. We have provided common terminology and presented a unified concept of the QSAR approach. We have emphasized that, regardless of the origin of molecular descriptors, any QSAR modeling exercise starts from constructing a two-dimensional data array (Fig. 2.2), which lists molecular IDs, values of the target (or dependent) property of each compound, and values of descriptors (independent variables) for each compound. We have considered various protocols employed by QSAR practitioners to develop quantitative models of biological activity by the use of chemical descriptors and linear or nonlinear optimiza-
Recent Trends in Quantitative Structure-Activity Relationships
A C D E F G H I K L M N P Q R S T V W Y
Amino acid
A C D E F G H I K L M N P Q R S T V W Y
Amino acid
(c)
-
0 u3
120
100
& $ 80 4th AA
R K
E a , 60
2 k E4 3rd AA
40 2nd AA
E 0 20 1st AA
0
A C D E F G H I K L M N P Q R S T V W Y
Amino acid
Figure 2.10. Ratonal selection of building blocks for library design by use of FOCUS-2D and a QSAR
model for activity prediction: (a) initial population; (b)final population after FOCUS-2D; and (c)final
population after the exhaustive search.
tion techniques. We have particularly empha- 1. Establish an SAR database through the use
sized that the true power of any QSAR model of reliable quantitative measurements of
comes from its statistical significance and the the target property and a preferred set of
model's ability to predict accurately biological molecular descriptors.
properties of chemical compounds both in the 2. Divide the underlying data set into training
training and, most important, in the test sets. and test sets through the use of diversity
One of the important research challenges in sampling algorithms.
the QSPR modeling remains finding descrip-
tor types, correlation approaches, and ade- 3. Develop training set models through the
quate statistical characteristics of the training use of available QSAR methods or commer-
set only, which may ensure high predictive cial software. Characterize these models
power of the models. with internal validation parameters, as dis-
In conclusion, we strongly advocate rigor- cussed in this chapter, and define the appli-
ous validation of QSAR models before their cability domain for each model.
practical application or interpretation. The 4. Validate training set models through the
practical guidelines for the development of use of an external test set and calculate the
statistically robust and predictive QSAR mod- external validation parameters, as dis-
els can be summarized as follows: cussed in this chapter. Ideally, repeat the
References
procedure of training and test selection and 12. D. J. Livingstone, J. Chem. Znf. Comput. Sci.,
external validation several times to iden- 40,195-209(2000).
tify the QSAR model for the smallest train- 13. Chemical Abstracts Service (CAS), Columbus,
ing set that affords adequate prediction OH. May be accessed a t http://www.cas.org
power for the biggest test set. 14. D. S. Tan, M. A. Foley, M. D. Shair, and S. L.
5. Finally, explore and exploit validated Schreiber, J. Am. Chem. Soc., 120,8565-8566
QSAR models for possible mechanistic in- (1998).
terpretation and prediction. 15. J. Drews, Science, 287,1960-1964(2000).
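The five guidelines above can be illustrated with a minimal Python sketch. It is only a toy under stated assumptions, not the protocol of any particular QSAR package: the data set is invented, the "model" is a one-descriptor least-squares line, diversity sampling is approximated by a simple max-min selection, internal validation uses leave-one-out q2, external validation uses a predictive R2 referenced to the training-set mean, and the applicability domain is taken to be the descriptor range of the training set. All function and variable names are hypothetical.

```python
# Step 1: a toy SAR table with one descriptor x and one activity y per compound.
# Both columns are invented for illustration only.
X = [float(i) for i in range(10)]
NOISE = [0.1, -0.2, 0.05, 0.15, -0.1, 0.0, 0.2, -0.15, 0.1, -0.05]
Y = [2.0 * x + 1.0 + e for x, e in zip(X, NOISE)]


def fit_line(xs, ys):
    """Least-squares fit of y = a*x + b (a one-descriptor stand-in for a QSAR model)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


def predict(model, x):
    a, b = model
    return a * x + b


def maxmin_split(xs, n_train):
    """Step 2: diversity sampling. Seed with the two extreme compounds, then
    keep adding the compound farthest from everything already selected."""
    idx = list(range(len(xs)))
    train = [min(idx, key=lambda i: xs[i]), max(idx, key=lambda i: xs[i])]
    while len(train) < n_train:
        rest = [i for i in idx if i not in train]
        train.append(max(rest, key=lambda i: min(abs(xs[i] - xs[j]) for j in train)))
    return sorted(train), [i for i in idx if i not in train]


def q2_loo(xs, ys):
    """Step 3: leave-one-out cross-validated q2 on the training set."""
    mean_y = sum(ys) / len(ys)
    press = 0.0
    for k in range(len(xs)):
        model = fit_line(xs[:k] + xs[k + 1:], ys[:k] + ys[k + 1:])
        press += (ys[k] - predict(model, xs[k])) ** 2
    return 1.0 - press / sum((y - mean_y) ** 2 for y in ys)


def r2_external(model, xs, ys, train_mean):
    """Step 4: predictive R2 on the external test set."""
    press = sum((y - predict(model, x)) ** 2 for x, y in zip(xs, ys))
    return 1.0 - press / sum((y - train_mean) ** 2 for y in ys)


train_idx, test_idx = maxmin_split(X, n_train=6)
x_tr, y_tr = [X[i] for i in train_idx], [Y[i] for i in train_idx]
x_te, y_te = [X[i] for i in test_idx], [Y[i] for i in test_idx]

model = fit_line(x_tr, y_tr)
q2 = q2_loo(x_tr, y_tr)
# Applicability domain (assumed here): the descriptor range of the training set.
domain = (min(x_tr), max(x_tr))
in_domain = [domain[0] <= x <= domain[1] for x in x_te]
r2_ext = r2_external(model, x_te, y_te, sum(y_tr) / len(y_tr))

# Step 5 would use the validated model for interpretation and prediction.
print(f"train={train_idx} test={test_idx}")
print(f"q2(LOO)={q2:.3f}  R2(external)={r2_ext:.3f}  all in domain={all(in_domain)}")
```

In practice the split and validation would be repeated for several training-set sizes, as guideline 4 recommends, and only models that pass both the internal and the external check would be carried forward.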
In the modern age of medicinal chemistry, QSAR modeling remains one of the most important instruments of computer-aided drug design. Skillful application of various methodologies discussed in this chapter will afford validated QSAR models, which should continue to enrich and facilitate the experimental process of drug discovery and development.
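As a closing illustration, the genetic algorithm described in this chapter for targeted pentapeptide library design (roulette-wheel selection of two parents, crossover, mutation, and elimination of the two lowest scoring peptides) can be sketched in Python. This is a minimal sketch under stated assumptions and not the FOCUS-2D implementation: the fitness function is a toy stand-in for the preconstructed QSAR equation, crossover is assumed to be single-point, peptides are kept as plain strings rather than encoded by descriptors, and all names are hypothetical.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 natural amino acids
TOY_TARGET = "VEWAK"  # used only by the stand-in fitness below


def fitness(peptide):
    """Stand-in for the QSAR-predicted activity (NOT a real QSAR equation):
    1 plus the number of positions matching TOY_TARGET, so it is always > 0."""
    return 1 + sum(a == b for a, b in zip(peptide, TOY_TARGET))


def random_population(size, rng, length=5):
    """Randomly generate the initial peptides."""
    return ["".join(rng.choice(AMINO_ACIDS) for _ in range(length)) for _ in range(size)]


def roulette_pair(population, rng):
    """Roulette-wheel selection of two distinct parents: the chance of being
    picked is proportional to fitness."""
    idx = list(range(len(population)))
    i = rng.choices(idx, weights=[fitness(population[k]) for k in idx])[0]
    others = [k for k in idx if k != i]
    j = rng.choices(others, weights=[fitness(population[k]) for k in others])[0]
    return i, j


def crossover(p1, p2, rng):
    """Single-point crossover (an assumption): the parents exchange their tails."""
    cut = rng.randrange(1, len(p1))
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]


def mutate(peptide, rng):
    """Change one randomly chosen residue to one of the 19 other amino acids."""
    pos = rng.randrange(len(peptide))
    new = rng.choice([a for a in AMINO_ACIDS if a != peptide[pos]])
    return peptide[:pos] + new + peptide[pos + 1:]


def evolve(population, steps, rng):
    """Repeat selection, crossover, and mutation; keep the two best of each family."""
    population = list(population)
    for _ in range(steps):
        i, j = roulette_pair(population, rng)
        children = [mutate(c, rng) for c in crossover(population[i], population[j], rng)]
        family = sorted([population[i], population[j]] + children, key=fitness, reverse=True)
        # The two lowest scoring of the four (parents plus offspring) are eliminated.
        population[i], population[j] = family[0], family[1]
    return population


rng = random.Random(0)
start = random_population(100, rng)  # population of 100 peptides
final = evolve(start, 2000, rng)     # 2000 repetitions
print("mean fitness before:", sum(map(fitness, start)) / len(start))
print("mean fitness after: ", sum(map(fitness, final)) / len(final))
```

Because the two survivors of each family are never worse than the two parents they replace, the total fitness of the population cannot decrease from one step to the next; replacing the toy fitness with a real QSAR prediction would leave the loop unchanged.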
DENISE D. BEUSEN
Tripos, Inc.
St. Louis, Missouri
Contents
1 Introduction
2 Background and Methods
2.1 Molecular Mechanics
2.1.1 Force Fields
2.1.2 Electrostatics
2.1.2.1 The Dielectric Problem and Solvation
2.1.2.2 The "Hydrophobic" Effect
2.1.2.3 Polarizability
2.1.3 The Potential Surface
2.1.3.1 Optimization
2.1.3.2 Potential Smoothing
2.1.3.3 Genetic Algorithm
2.1.4 Systematic Search and Conformational Analysis
2.1.4.1 Rigid Geometry Approximation
2.1.4.2 Combinatorial Nature of the Problem
2.1.4.3 Pruning the Combinatorial Tree
2.1.4.4 Rigid Body Rotations
2.1.4.5 The Concept and Exploitation of Rings
2.1.4.6 Conformational Clustering and Families
2.1.4.7 Conformational Analysis
2.1.4.8 Other Implementations of Systematic Search
2.1.5 Statistical Mechanics Foundation
2.1.6 Molecular Dynamics
2.1.6.1 Integration
2.1.6.2 Temperature

Burger's Medicinal Chemistry and Drug Discovery
Sixth Edition, Volume 1: Drug Discovery
Edited by Donald J. Abraham
ISBN 0-471-27090-3 © 2003 John Wiley & Sons, Inc.
Molecular Modeling in Drug Design
available to guide in modeling. Nevertheless, the distance between the atoms. It is balanced
useful information to guide the design and by a repulsion between the electronic clouds as
synthesis of potential novel therapeutics can the atoms come close and this interaction has
be developed from an analysis of structure- been represented empirically by a variety of
activity data in the three-dimensional frame- functional forms: exponential, 12th power, or
work provided by current molecular modeling 9th power of the distance between the atoms.
techniques. Although most of the techniques The coefficients for these two interactions are
and approaches described have broader appli- parameterized for atom types, usually by ele-
cation than shown, the examples chosen ment, so that the minimum of the combined
should be sufficient to illustrate their use. A functions corresponds to the sum of the exper-
number of reviews (12-18) of computer-aided imental van der Waals radii for the two atoms.
drug design have relevant sections covering In addition, bonded atoms are considered
portions of this chapter with different per- as a special case, with a "spring constant" de-
spectives and are recommended for a more termining the energy of deformation from ex-
complete overview. perimental bond lengths. Atoms directly
bonded to the same atom (one-three interac-
tions) are eliminated from the van der Waals
2 BACKGROUND AND METHODS
list and have a special energetic term relating
the deviation from an ideal bond angle. Atoms
2.1 Molecular Mechanics
having a one-four interaction define a tor-
Molecular mechanics (19) treats a molecule as sional relation that is usually parameterized
a collection of atoms whose interactions can be based on the types of the four connected atoms
described by Newtonian mechanics. Because defining the torsion angle. The numerous
the mass of the nuclei is much greater than combinations of atom types require an enor-
the mass of the electrons, one can separate mous number of parameters to be determined
(the Born-Oppenheimer approximation) the from either theoretical (quantum mechanics)
Schrodinger equation into a product of two and/or experimental data. Simplified force
functions: one for electrons, one for nuclei. fields in which the torsional parameters de-
For the purposes of molecular mechanics, the pend only on the atoms at the end of a bond
electronic function, initially developed to in- have been developed, to give approximate ge- .
terpret spectroscopic data, is ignored; that is, ometries for further refinement by quantum
the charge distribution is assumed to remain mechanics.
constant during changes in the position of the
nuclei. Because molecular mechanics is based 2.1.1 Force Fields. The basic assumption
on classical physics, it cannot provide informa- underlying molecular mechanics is that classi-
tion about the electronic properties of mole- cal physical concepts can be used to represent
cules under study that are generally assumed the forces between atoms. In other words, one
fmed during the parameterization of the force can approximate the potential energy surface
field with experimental data. by the summation of a set of equations repre-
A few words about the basics of molecular senting pairwise and multibody interactions.
mechanics (19, 20) may provide the elements These equations represent forces between at-
of understanding for what follows. This is not oms related to bonded and nonbonded interac-
meant to be comprehensive, but rather a sim- tions. Pairwise interactions are often repre-
ple overview, to remind the reader of a few sented by a harmonic potential [YzKb(b- bJ2]
crucial points. For a comprehensive overview that obeys Hooke's law (derived for a spring)
of molecular modeling, the reader is referred for bonded atoms, restoring the bond distance
to the excellent text by Leach (21). The inter- to an equilibrium value b,, and a van der
actions between atoms are divided into Wads potential [C,,(i, j)/rG12- CJi, j)/rG6]for
bonded and nonbonded classes. Nonbonded nonbonded atoms. Similarly, distortion from
forces between atoms are based on an attrac- an equilibrium valence angle (8,) describing
tive interaction that has a firm theoretical ba- the angle between three bonded atoms shar-
sis and varies as the inverse of the 6th power of ing a common atom is also penalized [YzKe(8-
Molecular Modeling in Drug Design
00)2].A third class of interaction dependent on - bJ(0 - O,)], dihedral angles and bond an-
the dihedral angle C#J between four bonded at- gles, and so forth. Because of the lack of ade-
oms is the torsional potential {KJl + cod+ - quate parameterization of the more complex
S)]} used to account for orbital delocalization force fields that are usually specialized to one
and to compensate for other deficiencies in the kind of molecule (e.g., proteins or nucleic ac-
force field. A harmonic term [$\tfrac{1}{2}K_\delta(\delta - \delta_0)^2$] is often introduced for dihedral angles $\delta$ that are relatively fixed, such as those in aromatic rings. Coulomb's law [$q_i q_j/(4\pi\varepsilon_0\varepsilon r_{ij})$] is the simplest approach to the contribution of electrostatics to the potential V:
A central issue is the number of different atom types that are used in a particular force field. There is always a compromise between increasing the number to allow for the inclusion of more environmental effects (i.e., local electronic interactions) and the increase in the number of parameters to be determined to adequately represent a new atom type. In general, the more subtypes of atoms (how many different kinds of nitrogen, for example), the less likely that the parameters for a particular application will be available in the force field. The extreme, of course, would be a special atom type for each kind of atomic environment in which the parameters were chosen, so that the calculated properties of each molecule would simply reproduce the experimental observations. One major assumption, therefore, is that the force constants (parameters) and equilibrium values of the equations are functions of a limited number of atom types and can be transferred from one molecular environment to another. This assumption holds reasonably well where one may be primarily interested in geometric issues, but is not so valid in molecular spectroscopy. This has led to the introduction of additional equations, the so-called "cross-terms," which allow additional parameters to account for correlations between bond lengths and bond angles [K,,(b
ids), more simplified force fields have gained some popularity because of their general applicability, despite limited accuracy. Examples are the Tripos force field (22), the COSMIC force field (23), and that of White and Bovill (24), which uses only two atom types, those at the ends of the bond, to parameterize the torsional potential rather than the four types of the atoms used to define the torsional angle. One has only to consider the number of combinations of 20 atom subtypes taken four at a time (160,000) versus two at a time (400) to understand the explosion of parameters that occurs with increased atom subtypes. The simplifying assumption in parameterization of the torsional potential reduces to some extent the quality of the results (25), but allows the use of the simplified force fields (22) in many situations where other force fields would lack appropriate parameters. The situation can become complicated, however. For example, the amide bond is normally represented by one set of parameters, whether the configuration is cis or trans. Experimental data are quite compelling that the electronic state is different between the two configurations, and different parameter sets should be used for accurate results (Fig. 3.1). Only AMBER/OPLS currently distinguishes between these two conformational states (26). Certainly, the limited parameterization of simplified force fields would not allow accurate prediction of spectra, which are more reflective of the dynamic behavior of the molecule.
Accurate estimates of energy may require accurate representation of the dynamics of molecules and justify derivation of the larger number of parameters. The new version (27) of the Allinger force field, MM3, has the objective of reproducing spectral data more accurately than MM2. Much of the chemistry remains to be incorporated into appropriate force fields. Only recently have adequate modifications been made to the force fields developed for organic molecules to include some metals (28-31). Carlsson (32, 33) recently de-
2 Background and Methods
[Figure: trans-amide and cis-amide configurations]
veloped a functional form that allows electronic d-orbitals of metals to be reasonably represented within molecular mechanics.
Because different force fields may use different mathematical representations of the forces between atoms, and the details of their parameterization will in general differ also, it is unwise to use parameters derived for one force field to replace missing parameters in another. One often hears of a "balanced" parameter set that reproduces well the phenomena under consideration, but which is inadequate for other applications. A comparison by Burkert and Allinger (19) shows the different van der Waals (VDW) potentials used in several of the popular force fields, and the situation has not improved significantly in the intervening years. Because of other differences in parameters and functional forms of the equations used in the rest of the individual force fields, these quite different approaches to the VDW potential give excellent results when used in the correct combination. Indiscriminate combination of one part of a force field with another derived independently would lead to considerable divergence in the calculated results from those obtained by experimental observation.
The most extreme difference between force fields arises in the method by which the hydrogen bond is included. Because atoms involved in a hydrogen bond are often closer than the sum of their VDW radii, they must be handled in a special manner. Several force fields have special functional forms with angular dependency that not only have special VDW parameters, to ensure that the close approach of the atoms involved is calculated correctly, but also ensure that the angular distribution observed for hydrogen bonds is reproduced. Hagler et al. (34) used an amide hydrogen with a zero VDW radius for hydrogen bonding and a slightly greater nitrogen radius to give a correct amide hydrogen bond distance. The charges on the atoms involved (including the amide hydrogen) are adjusted to give an appropriate balance of VDW repulsion and dipole attraction. Clearly, the method for handling the electrostatic interaction is an integral part of each force field and cannot be modified independently.
2.1.2 Electrostatics. The most difficult aspect of molecular mechanics is electrostatics (35-38). In most force fields, the electronic distribution surrounding each atom is treated as a monopole with a simple coulombic term for the interaction. The effect of the surrounding medium is generally treated with a continuum model by use of a dielectric constant. More
Molecular Modeling in Drug Design
detailed approaches with distributed multipole representations of the electron distribution (39, 40) and/or efforts to deal with dielectric inhomogeneity through solution of the Poisson equation are clear improvements and have become routine in many studies. Other difficulties arise in dealing with macromolecular systems, given that the electrostatic interaction is long ranged ($1/r$) and the interactions cannot be arbitrarily terminated with distance. Electrostatic interactions range from those operating only at very short distances that are nonspecific (dispersive interactions, $r^{-6}$ dependency) to those operating at very long distances with a high degree of specificity (charge-charge interactions, $r^{-1}$ dependency).
Dispersive Interactions ($r^{-6}$). These are attributed to interaction of induced dipoles within the electron clouds as molecules come in proximity and are responsible for the attractive part of the nonbonded van der Waals interaction.
Dipole-Dipole Interactions ($r^{-3}$). Because of the nonsymmetrical distribution of electrons between atoms of different size and electronegativity, bonds have associated permanent dipoles. The interaction energy between two of these dipoles depends on their relative orientation. This is basically the interaction underlying the phenomenon of the hydrogen bond. Although some force field authors use a special hydrogen bonding potential with an orientation dependency, simple partial charge representations combined with appropriate VDW parameters can reproduce the effect as well (34).
Charge-Dipole Interactions ($r^{-2}$). A charge interacting with a permanent dipole can be handled simply by considering the charge interacting with the two charges at the poles of the dipole. Alternatively, if the distance between the poles of the dipole is small compared with that between the centers of the ion and the dipole, then the potential energy $\Phi$ can be approximated as
$\Phi = e\mu\cos\theta / r^2$
where $e$ is the charge of the ion, $\mu$ is the dipole moment, $\theta$ is the angle between the vector connecting the center of the dipole with the charge and the dipole orientation, and $r$ is the distance between the center of the ion and the center of the dipole.
Charge-Charge Interactions ($r^{-1}$). The energy of interaction between two charges $q_i$ and $q_j$ is given by Coulomb's law:
where $r_{ij}$ is the distance separating charges and $\varepsilon$ is the dielectric constant of the medium. To evaluate atom-atom interactions using Coulomb's law, the concept of net atomic charge is invoked. This amounts to representing charge as a point, a monopole, and is an artificial construct. Nevertheless, this is the common method. Recent improvements in calculating an appropriate set of point charges, to accurately reproduce the molecular electrostatic potential derived by quantum calculations, have been reported (41).
In an effort to increase the quality of electrostatic representations, dipole and higher multipole moments have been used. There are advantages in these more accurate representations, with a relatively small computational increase attributed to the reductions in distances over which the higher moments have to be summed, although they do require additional effort in the derivation of the parameters for the higher moments themselves. A good example is the distributed multipole model of electrostatics derived for peptides. A review by Williams (42) discusses the problems of deriving a distributed multipole expansion of charge representation that accurately reproduces the molecular electrostatic potential derived from quantum calculations. Comparisons were made between atomic multipoles, bond dipole, and restricted bond dipole models. Williams finds that a model for the electrostatic potential based on bond dipoles supplemented with monopoles (for ions) and atomic dipoles (for lone pairs) is most useful. Dipole-dipole energy converges much faster than monopole-monopole energy. Molecular charge at any desired position in a molecule is not a physically measurable quantity; one can only calculate a delocalized electron probabil-
ity distribution from quantum theory. Clearly, the more complex the representation, the more accurately one can approximate the quantum mechanical results, and the more realistic should be the results obtained. One complexity of electrostatics is the long distances over which interactions occur. Appropriate means of truncating the long-range forces to maintain the accuracy of simulations are necessary (43-45) and progress in better approximations has been reported (46). The difficulties with cutoff schemes were demonstrated (47, 48) by significant variations in the behavior of a 17-residue helical peptide simulated with explicit waters, using various electrostatic schemes, and by studies (49) of a pentapeptide in aqueous ionic solution (50). In both cases, the Ewald approximation in which periodicity is assumed (which allows summation over much longer distances) gave superior results (47-49).
2.1.2.1 The Dielectric Problem and Solvation. Although methods of localizing charge just described may give reasonable results, the use of Coulomb's law with a dielectric constant, a scaling factor related to the polarizability of the medium between the charges, is clearly of concern. The dielectric at the molecular level is neither homogeneous nor continuous, nor even well defined, and thus violates the basic assumption of Coulomb's law. Although the use of a low, uniform dielectric is more nearly correct in dynamical simulations where all solute and solvent atoms are explicitly included, a variety of comparisons of experimental data with the results of calculation by use of a simplified solvent model have led to the realization that much better approaches are needed. Initial efforts (51) led to the proposal of a variable dielectric (1/R or 1/4R). More recently, approaches that model the inhomogeneity of the dielectric at the interface between the solute and solvent by use of the Poisson-Boltzmann equation have shown considerable promise (52, 53). An alternative approach that uses the mirror charge approximation has been described by Schaefer and Froemmel (54). Excellent reviews (35-38) of the electrostatic problem have appeared, to which the reader is referred.
Much effort has been given to simple continuum models of solvation to explain the origin of solvent effects on conformational equilibria and reaction rates. The current status of such efforts, as well as simulations to rationalize solvation effects, has been reviewed by Richards et al. (55). There are two general approaches to the continuum models. The first is reaction field theory (Bell, Kirkwood, Onsager) that follows the classical treatment of Debye-Hückel. The solvent is considered in terms of charge distribution, polarizability, and dielectric constant. The solvation energy is determined simply by considering the solute as a point dipole that interacts with the induced charge distribution in the solvent (Onsager reaction field). An extension by Sinanoglu in the 1960s partitioned solvation energy into cavity formation, solvent-solute interaction, and the "free volume" of the solute. The logical extension of this approach is scaled-particle theory (56), in which the free energy of formation of a hard-sphere cavity of diameter $a_2$ in a hard-sphere solvent of diameter $a$ and number density $\rho$ is scaled to the exact solution for small cavity sizes. Alternatively, the virtual charge approach used a system of effective and virtual charges interacting in the gas phase. The Hamiltonian of the system is modified to include an imaginary particle, a "solvaton," with an opposite charge for each of the solute atoms and solved by the SCF procedure. These continuum models have met with limited success (trends and relative effects of solvation can be predicted), although highly specific molecular interactions, such as those involving hydrogen-bonding groups, cannot be accommodated.
In the equation for calculating affinity of a drug for a receptor, the ligand is solvated either by the receptor or by the solvent. This competition means that accurate determination of the free energy of solvation is important in understanding differences in affinities. Solvation free energy ($G_{sol}$) can be approximated by three terms: $G_{cav}$, the formation of a cavity in the solvent to hold the solute; $G_{vdw}$ and $G_{pol}$, the interaction between solute and solvent divided between VDW and electrostatic forces, respectively:
$G_{sol} = G_{cav} + G_{vdw} + G_{pol}$
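As a side illustration of the electrostatic treatments discussed in Section 2.1.2, the following Python sketch compares Coulomb's law with a constant dielectric against a distance-dependent dielectric, and checks the charge-dipole approximation against an explicit two-charge dipole. It is a minimal sketch under stated assumptions, not code from any force field: all function names are hypothetical, the conversion constant of about 332.06 kcal Å/(mol e^2) is the customary value for charges in units of e and distances in Å, and reading the "variable dielectric" as a dielectric proportional to distance (equal to r or 4r) is an assumption.

```python
import math

# Approximate e^2/(4*pi*eps0) in kcal*Angstrom/(mol*e^2); assumed unit convention.
COULOMB = 332.06


def coulomb(q_i, q_j, r, eps=1.0):
    """Coulomb energy (kcal/mol) of two point charges (units of e) at distance r (Angstrom)."""
    return COULOMB * q_i * q_j / (eps * r)


def coulomb_distance_dependent(q_i, q_j, r, scale=1.0):
    """Same energy with an assumed dielectric eps = scale * r (scale = 1 or 4)."""
    return coulomb(q_i, q_j, r, eps=scale * r)


def charge_dipole_exact(e, q, d, r, theta):
    """Ion of charge e at distance r from the centre of a dipole built from +q and -q
    separated by d. theta (radians) is the angle between the centre-to-ion vector
    and the dipole axis, taken to point from -q to +q."""
    r_plus = math.sqrt(r * r + d * d / 4.0 - r * d * math.cos(theta))
    r_minus = math.sqrt(r * r + d * d / 4.0 + r * d * math.cos(theta))
    return coulomb(e, q, r_plus) + coulomb(e, -q, r_minus)


def charge_dipole_approx(e, mu, r, theta):
    """Point-dipole approximation e * mu * cos(theta) / r^2, valid when d << r."""
    return COULOMB * e * mu * math.cos(theta) / (r * r)


if __name__ == "__main__":
    for r in (2.0, 4.0, 8.0):
        print(r, coulomb(1, -1, r), coulomb_distance_dependent(1, -1, r, scale=4.0))
    q, d = 0.5, 0.2  # hypothetical dipole: mu = q * d
    print(charge_dipole_exact(1.0, q, d, 20.0, 0.0),
          charge_dipole_approx(1.0, q * d, 20.0, 0.0))
```

With a constant dielectric the energy halves when the distance doubles, whereas with a dielectric proportional to distance it drops to one quarter, which is the sense in which such a scheme crudely damps long-range interactions.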
izability, for example, by inclusion of induced dipoles, or distributed polarizability (66) in the electrostatic representation of the model. Kuwajima and Warshel (67) recently examined the effects of this refinement in modeling crystal structures of polymorphs of ice. Such models including polarizability have been previously shown useful for predicting the properties of crystalline polymorphs of polymers by Sorensen et al. (68). Caldwell et al. (69) included implicit nonadditive polarization energies in water-ion outcomes, resulting in improved accuracy. At the semiempirical level of quantum theory, Cramer and Truhlar (70-73) added solvation and solvent effects on polarizability to AM1, with impressive agreement between experimental and calculated solvation energies (60). Rauhut et al. (74) also introduced an arbitrarily shaped cavity model by use of standard AM1 theory.
2.1.2.2 The "Hydrophobic" Effect. Water has been the nemesis of solvation modeling because of its rather unique thermodynamic properties, as reviewed by Frank (75) and Stillinger (76). The biochemical literature discusses at length "hydrophobic effects" (77). This effect is not "hydrophobic" at all because the enthalpic interaction of nonpolar solutes with water is favorable. This, however, is counterbalanced by an unfavorable entropic interaction that is interpreted as an induced structuring of the water by the nonpolar solute. Water interacts less well with the nonpolar solute than it does with itself because of the lack of hydrogen-bonding groups on the solute. This creates an interface similar to the air-water interface, with a resulting surface tension attributed to the organization of the hydrogen-bonded patterns available. This is the so-called iceberg formation around nonpolar solutes in water, first suggested by Frank and Evans. Studies by both molecular dynamics (78-80) and Monte Carlo simulations (81) support this interpretation (76), although there is still considerable controversy in interpretation of experimental data (82).
2.1.2.3 Polarizability. The traditional approaches in molecular mechanics have excluded the effects of charge on induced dipoles and multibody effects. This approximation becomes a serious limitation when dealing with charged systems and molecules like water that are highly polar. A recent paper (83) from the Kollman group described nonadditive many-body potential models to calculate ion solvation in polarizable water with good agreement with experimental observation. It was necessary to include a three-body potential (ion-water-water) in the molecular dynamics simulation of the ionic solution to obtain quantitative agreement with solvation enthalpies and coordination numbers. Inclusion of a bond-dipole model with polarizability in molecular dynamics simulations has given excellent agreement in predicting physical properties of polymers by Sorensen et al. (68).
A novel approach based on the concept of charge equilibration has been suggested by Rappe and Goddard (84) that allows the inclusion of polarizabilities in molecular dynamics calculations.
2.1.3 The Potential Surface. The set of equations that describe the sum of interactions between the ensemble of atoms under consideration is an analytical representation of the Born-Oppenheimer surface, which describes the energy of the molecule as a function of the atomic positions. Many important properties of the molecule can be derived by evaluation of this function and its derivatives. For example, setting the value of the first derivative to zero and solving for the coordinates of the atoms leads one to minima, maxima, and saddle points. Evaluation of the sign of the second derivative can determine which of the above have been found. It is a straightforward procedure to calculate the vibrational frequencies from the force constants by evaluation of the eigenvalues of the secular determinant (the mass-weighted matrix; see textbook on vibrational spectroscopy). Gradient methods for the location of energy minima and transition states are an essential part of any molecular modeling package. It is essential to remember, however, that minimization is an iterative method of geometrical optimization that is dependent on starting geometry, unless the potential surface contains only one minimum (a condition not found for any system of sufficient complexity to be of real interest).
The ability to locate both minima and transition points enables one to determine the minimum energy reaction path between any
two minima. In the case of flexible molecules, these minima could correspond to conformers and the reaction path would correspond to the most likely reaction coordinate. One could estimate the rate of transition by determination of the height of the transition states (the activation energy) between the minima. Elber (85) developed a new protocol for the location of minima and transition states and applied it to the determination of reaction paths for the conformational transition of a tetrapeptide (86). Huston and Marshall (87) used this approach to map the reaction coordinates of the α- to $3_{10}$-helical transition in model peptides.
Despite the limitations that curtail exact quantitative applications, molecular mechanics can provide three-dimensional insight as the geometric relations between molecules are adequately represented. Electrical field potentials can be calculated and compared to give a qualitative basis for rationalizing differences in activity. Molecular modeling and its graphical representation allow the medicinal chemist to explore the three-dimensional aspects of molecular recognition and to generate hypotheses that lead to design and synthesis of new ligands. The more accurate the representation of the potential surface of the molecular system under investigation, the more likely that the modeling studies will provide qualitatively correct solutions.
2.1.3.1 Optimization. The search for the optimal solution to a complex problem is common to many areas in science and engineering and does not have a general solution. Numerous approaches to this problem, which is generally referred to as optimization, have been used in chemistry: most commonly, distance geometry, molecular dynamics, stochastic methods such as Monte Carlo sampling, and systematic, or grid, search. Most rely on minimization, often combined with a stochastic search. Minimization algorithms have been thoroughly characterized with regard to their convergence properties but, in general, only locate the local minimum closest to the starting geometry of the system. A stochastic approach to starting geometries can be combined with minimization to find a subset of minima in the hope that the global minimum is contained within the subset and can readily be identified by its potential value compared with that of the other minima.
2.1.3.2 Potential Smoothing. One approach to global optimization that has shown promise is potential smoothing (88). This approach uses a mathematical transformation to smooth the multidimensional potential energy surface of a molecule, reducing the high frequency complexity of the surface and making it much easier to search for minimum energy conformations. This concept was first used to deform the conformational potential energy surface in the diffusion equation method (DEM) of Piela and coworkers (89). Search procedures will not confront multiple local minima on the deformed surface. If the procedure is reversed iteratively, then one can trace the path back into a region that lies near the global minimum of the undeformed potential surface. Ponder et al. (88, 90) improved the procedure for tracing back from one partially deformed surface to the next by including a local search procedure to limit detection of false minima.
One of the best known benchmark problems for conformational search involves the determination of the low energy conformations of the highly flexible cycloheptadecane (91, 92). This system continues to serve as a test for newly developed search methods (93). Although not a particularly large molecule, this system is a challenge because of its flexibility and the close energy spacing of the lower lying minima. Extensive analysis through a variety of search methods has located exactly 263 minima within 3.0 kcal/mole of the purported global minimum. The potential smoothing search (PSS) (88) was dramatically effective at locating many of the lowest energy structures for cycloheptadecane. Although the global minimum for cycloheptadecane was not located, the second lowest energy structure was located and differed by only 0.01 kcal/mole. Based on its MM2 vibrational frequencies, the global minimum is entropically disfavored relative to all of the minima located by the smoothing procedure. The PSS method was also applied to obtain the minimum energy conformation of the TM helix dimer of glycophorin A (GpA) (94), previously solved by solution NMR spectroscopy (95).
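The deform-then-trace-back idea behind potential smoothing can be sketched on a one-dimensional toy surface. The Python example below is only an illustration under stated assumptions, not the DEM or PSS implementation: the test function f(x) = x^2/10 + sin(5x), the smoothing schedule, and all function names are hypothetical. For this f, the solution of the diffusion equation dF/dt = d^2F/dx^2 with F(x, 0) = f(x) is known in closed form, so no numerical smoothing is needed: the ripple decays as exp(-25t) while the parabola only gains a constant.

```python
import math


def deformed_potential(x, t):
    """Toy surface x^2/10 + sin(5x) after smoothing for 'diffusion time' t."""
    return (x * x + 2.0 * t) / 10.0 + math.exp(-25.0 * t) * math.sin(5.0 * x)


def surface(grid, t):
    """Evaluate the deformed surface on every grid point."""
    return [deformed_potential(x, t) for x in grid]


def count_minima(values):
    """Count strict interior local minima of a sampled curve."""
    return sum(1 for a, b, c in zip(values, values[1:], values[2:]) if b < a and b < c)


def descend(values, i):
    """Walk downhill on the grid from index i to the nearest local minimum."""
    while True:
        if i > 0 and values[i - 1] < values[i]:
            i -= 1
        elif i < len(values) - 1 and values[i + 1] < values[i]:
            i += 1
        else:
            return i


grid = [-4.0 + 0.001 * k for k in range(8001)]
schedule = [0.5, 0.2, 0.1, 0.05, 0.02, 0.0]  # progressively undo the deformation

# Start at the single minimum of the heavily smoothed surface ...
smooth = surface(grid, schedule[0])
index = min(range(len(grid)), key=smooth.__getitem__)
# ... then follow it back, re-minimizing locally on each less-deformed surface.
for t in schedule[1:]:
    index = descend(surface(grid, t), index)
x_found = grid[index]

if __name__ == "__main__":
    print("minima on undeformed surface:", count_minima(surface(grid, 0.0)))
    print("minima on smoothed surface:  ", count_minima(surface(grid, schedule[0])))
    print("minimum reached after trace-back: x =", x_found)
```

On this toy surface the trace-back ends in the lowest well, but, as the cycloheptadecane result above shows, that outcome is not guaranteed for real molecular surfaces.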
2.1.3.3 Genetic Algorithm. Another approach to global optimization is the genetic algorithm. This approach is based on biological evolution and is analogous to natural selection (96-98). In applications to computational chemistry, evolution on the computer has been shown to be an efficient approach to global optimization, although because of sampling issues, there is no guarantee that the global optimum has been found in any particular application (99).
2.1.3.3.1 Characteristics of the Genetic Algorithm. In analogy to natural selection, the parameters to be optimized are encoded in a bit string and strung together in a "chromosome." Each chromosome in the population represents a particular genotype or solution to the problem under consideration (i.e., a specific set of values for the parameters that determine the configuration of the system under study). The values of the parameters have to be decoded for the "fitness" of a particular genotype to be evaluated. Once the fitness of each chromosome in the population has been evaluated, then the more "fit" members are allowed to reproduce, mutate, or cross over with other members of the parent population to generate a new daughter population. This process is repeated until the fitness of the population converges, or until the available computer cycles are consumed.
2.1.3.3.2 Example of Conformational Analysis. The simplifying assumption of rigid geometry is used to reduce the computational complexity of the model problem of conformational analysis. The elimination of variables is rationalized based on the high energy cost associated with bond length distortions and the ability to accommodate bond angle deformations by a reduced set of van der Waals radii. To represent the conformation of a molecule, one needs only to specify the values of the torsional angles associated with rotatable bonds. One can assign a set number N of bits, 6 for example, to represent $2^N$ values for the torsional angles. Each set of 6 bits can be considered a "gene" and crossover allowed only at gene boundaries, if desired. Thus, the conformation of a molecule can be encoded as a set of torsional genes. The actual coordinates of the molecule corresponding to each genotype must be generated for the fitness function F, in this case internal energy, to be numerically evaluated by molecular mechanics. Each chromosome in the population is evaluated for its internal energy and a subset of the more fit selected for reproduction. The degree of limitation on reproductive fitness is analogous to the selective pressure brought to bear on a population (i.e., selection of the fittest). This is a parameter that can be varied in most GA programs and one must balance selective pressure against maintaining some variation in the population for evolution to occur (to avoid being trapped in a local minimum). The set of chromosomes to be reproduced can be based on some arbitrary criterion (the top 50%), can include all those with fitness at least half that of the most fit chromosome detected, or the fitness can be scaled in some way and chromosomes reproduced in proportion to their scaled fitness.
Given a subset of chromosomes to reproduce, several operations analogous to evolution are invoked. First is mutation, where a certain number of randomly selected bits are mutated from 0 to 1 or vice versa in the daughter chromosome. This would allow for changes in the settings of one or more torsional angles. A certain number of pairs of chromosomes are also selected for crossover and one or more locations between genes (if specified) are randomly selected and the two pieces derived from each parent chromosome swapped, to generate two or more novel chromosomes. This would allow for different subsets of conformations to be combined; this provides a mechanism for concerted changes or jumps over barriers to find minima that would be difficult to sample by mutation alone. This would appear to be the feature that provides the analogous behavior to simulated annealing in efficient searching of parameter space. In this case, however, the search is more directed by the selective pressure of increasing the "fitness" or facing elimination from the population. In other words, each new generation should have eliminated a significant portion of the less fit members of the previous generation and propagated those torsional values that generate good local conformational states.
2.1.3.3.3 Schema and the Building Block Hypothesis. Once a population of good local substates has been established, then crossover
can probe the combination of these subconformations that have positive interactions leading to more fit progeny. In the jargon of computer science, the subpattern of 1's and 0's giving a preferred subconformation would be a schema (or building block). According to the most accepted theory, the building block hypothesis, the genetic algorithm initially detects biases toward fitness in lower order (fewer identical bits) schemas and converges on this part of search space (the entire set of bit strings). By combining information from lower order schemas through crossovers, biases in higher order schemas are detected and propagated.
The strong convergence property of the genetic algorithm is a major attraction. Given sufficient members of the population and sufficient evolutionary time (number of generations), then one can expect convergence if the fitness function is based on the optimal combination of locally optimized substructures. Some fitness functions are termed "deceptive," in that low order schemas are not present in higher order schemas and their propagation slows detection of the more fit higher order schemas. Another problem arises when the population size is too small or the selection factor too high. Then, the genetic algorithm can magnify a small sampling error and prematurely converge in a local optimum.
2.1.3.3.4 Mutations and Encoding. There are different ways to encode binary numbers by bit strings and these can have some influence on the impact of mutation. Traditional binary encoding requires that all bits be changed for some cases if the digital value is to be simply incremented. This causes erratic behavior near an optimum, with mutations in higher order bits having more effect than those in lower order bits.
2.1.3.3.5 Crossovers and Encoding. In our example, we indicated that one might want to separate the bit string into genes corresponding to torsional angles because the gene has a coherent meaning in the context of the problem. If one restricts crossovers to the junctions between genes, then the coherence of the conformation of molecular fragments is preserved and one is more likely to make a successful crossover producing more fit offspring. There are methods such as random-key encoding (97) to generalize the process of crossovers without requiring customized crossover operators that are problem specific, although this is beyond the scope of this chapter.
2.1.3.3.6 Examples of Applications to Biochemical Problems. McGarrah and Judson (100) explored the impact of different parameter settings on the ability of the GA to explore the conformational space of cyclo(Gly$_6$). Each residue was represented by four angles, each with a string of four bits (1/16 of range). A selection fraction of 50% was used, which eliminated the lower half in fitness from reproduction. Population sizes of 10, 50, and 100 were tested. Each group was divided into four niche populations with communication between groups. Local minimization was performed for each chromosome before evaluation. They concluded that it was of little use to examine a population size of less than 100 members for the 24 variables examined. As soon as convergence in the average is detected in a population, it should be cross-fertilized from another niche or GA evolution should terminate. It is a clear example of a hybrid approach, in which the GA does a rough search for minima and local minimization finds the closest local minimum.
Judson et al. (101) examined the use of a genetic algorithm to find low energy conformers of 72 small to medium organic molecules (1-12 rotatable bonds) whose crystal structures were known. They used the elitist strategy, in which the best individual from each generation is propagated without modification. A population size of 10 times the number of the nonring dihedral angles being varied was chosen. Each molecule was allowed to run for 10,000 energy evaluations, or until the population was bit converged. In a few cases, conformers with lower energies than those observed in the crystal structure were found. A comparison with CSEARCH in SYBYL (Tripos, Inc.) was made, but the differences in efficiencies found were not compelling. In only 9 of the 72 cases examined did the GA's best conformer have an energy greater than that of the crystal structure, with the largest deviation being only 0.8 kcal/mol.
The GA approach has also been applied to the docking problem with dihydrofolate reductase, arabinose binding protein, and sialidase
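The genetic algorithm described in Sections 2.1.3.3.1 through 2.1.3.3.5 can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the procedure of any cited study: the torsional energy function is a hypothetical stand-in for a molecular mechanics calculation, and the function names, population size, mutation rate, and random seed are illustrative choices. It uses 6-bit torsional genes, a 50% selection fraction, crossover restricted to gene boundaries, bit-flip mutation, and the elitist strategy.

```python
import math
import random

BITS_PER_GENE = 6  # 2**6 = 64 settings per torsion, as in the text's example


def decode(chromosome):
    """Decode a flat list of bits into torsion angles in degrees (-180 <= angle < 180)."""
    step = 360.0 / 2 ** BITS_PER_GENE
    angles = []
    for start in range(0, len(chromosome), BITS_PER_GENE):
        gene = chromosome[start:start + BITS_PER_GENE]
        value = int("".join(str(bit) for bit in gene), 2)
        angles.append(value * step - 180.0)
    return angles


def toy_energy(angles):
    """Hypothetical torsional energy: threefold term plus a term favouring trans (180 deg)."""
    return sum((1.0 + math.cos(math.radians(3.0 * a)))
               + 0.5 * (1.0 + math.cos(math.radians(a))) for a in angles)


def fitness_key(chromosome):
    """Lower energy means more fit."""
    return toy_energy(decode(chromosome))


def evolve(n_torsions=4, pop_size=40, generations=60, mutation_rate=0.02, seed=0):
    """Run the GA; n_torsions must be at least 2 so a gene boundary exists for crossover."""
    rng = random.Random(seed)
    length = n_torsions * BITS_PER_GENE
    population = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    history = []  # best energy at the start of each generation
    for _ in range(generations):
        population.sort(key=fitness_key)
        history.append(fitness_key(population[0]))
        parents = population[:pop_size // 2]      # selection fraction of 50%
        children = [population[0][:]]             # elitist strategy: keep the best unchanged
        while len(children) < pop_size:
            mother, father = rng.sample(parents, 2)
            cut = BITS_PER_GENE * rng.randint(1, n_torsions - 1)  # gene boundary only
            child = mother[:cut] + father[cut:]
            child = [bit ^ 1 if rng.random() < mutation_rate else bit for bit in child]
            children.append(child)
        population = children
    return min(population, key=fitness_key), history


if __name__ == "__main__":
    best, history = evolve()
    print("best torsions:", decode(best))
    print("best energy:  ", fitness_key(best))
```

Because the best chromosome is copied unchanged into each new generation, the best energy can never rise from one generation to the next; whether the population reaches the global minimum still depends on population size and selective pressure, as discussed above.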
oms and atoms bonded to the same atom from the check, which is necessary) and checked against the allowed sum of VDW radii for the two atoms involved. The number of VDW comparisons V is given by
$$V = \frac{N(N-1)}{2} \times (\text{number of conformations})$$
It should be clear that the VDW comparisons are the rate-limiting step by their sheer number, and any algorithmic improvement that reduces the number of such checks or enhances the efficiency of performing such checks is of value.
2.1.4.3 Pruning the Combinatorial Tree. From this simplified analysis, a systematic search of other than the smallest molecules at a coarse increment would appear daunting. A hybrid approach with a coarse grid search followed by minimization has been successfully used to locate minima. There are a number of algorithmic improvements over the "brute force" approach that enhance the applicability of the systematic search itself. To understand these improvements, some concepts need to be defined. First is the concept (110) of aggregate, a set of atoms whose relative positions are invariant to rotation of the T rotational degrees of freedom. n-Butane is divided into aggregates as an illustration (Fig. 3.4).
In this simple example, the atoms in an aggregate are all either directly bonded or have a 1-3 relationship (i.e., are related by a bond angle). Because of the rigid geometry approximation, their relative positions are fixed. Atoms contained within the same aggregate do not, therefore, have to be included in the set of those that undergo VDW checks for each conformation. For linear molecules, there are n - 1 bonds and the number of 1-3 interactions depends on the valence of the atom. This simplification leads to a reduction of the number of VDW checks by the factor N(N - 1)/2, which is multiplied by the number of conformations.
How can one reduce the number of conformations that have to be checked? Here the concept of construction becomes useful. One constructs the conformations in a stepwise fashion, starting with an initial aggregate and adding a second aggregate at a given torsional increment for the torsional variable T that is applied to the rotatable bond connecting the two. If any pair of atoms overlaps for that increment, then one can terminate the construction because no addition operation will relieve that steric overlap. In effect, one has truncated the combinatorial possibilities that would have included that subconformation; that is, one has pruned the combinatorial tree.
2.1.4.4 Rigid Body Rotations. If one constructs the molecule stepwise by the addition of aggregates, then one has two sets of atoms to consider. First are those in the partial mol-
Figure 3.4. Decomposition of n-butane molecule into aggregates.
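The pruning idea of Section 2.1.4.3 can be sketched in Python as a depth-first grid search that abandons a partial conformation as soon as it clashes. This is a toy illustration only: `toy_clash` is a hypothetical stand-in for a real VDW overlap test on 3D coordinates, and none of the names come from a published search program.

```python
def systematic_search(n_torsions, increment_deg, clashes):
    """Depth-first grid search over torsions, pruning clashing partial structures."""
    angles = range(0, 360, increment_deg)
    accepted = []
    checked = 0

    def extend(partial):
        nonlocal checked
        if len(partial) == n_torsions:
            accepted.append(partial)
            return
        for angle in angles:
            candidate = partial + (angle,)
            checked += 1
            if clashes(candidate):
                # Prune: adding further aggregates cannot relieve this overlap,
                # so the whole subtree below this candidate is skipped.
                continue
            extend(candidate)

    extend(())
    return accepted, checked


def toy_clash(partial):
    """Hypothetical clash rule: two consecutive torsions below 60 degrees overlap."""
    return len(partial) >= 2 and partial[-1] < 60 and partial[-2] < 60


conformers, n_checks = systematic_search(3, 120, toy_clash)
print(len(conformers), n_checks)
```

In this toy case the pruned search tests fewer tree nodes than the unpruned 3 + 9 + 27, because the subtree below the clashing pair of torsions is never built. The saving grows rapidly with the number of torsions.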
Comparisons of a variety of methods were made on cycloheptadecane by Saunders et al. (91) and it was concluded that the stochastic method was most efficient. In one of the few independent comparisons of the effectiveness of these procedures, Boehm et al. (122) studied the sampling properties on the model system caprylolactam, a nine-membered ring, and concluded that systematic search was both inefficient and ineffective at finding the minima found by the other methods when the number of conformers examined was limited.
2.1.4.8 Other Implementations of Systematic Search. Numerous other implementations of systematic, or grid, search programs exist in the literature and those with protein applications have been reviewed by Howard and Kollman (123), whereas those for small or medium sized molecules are included in the reviews by Burt and Greer (114) and by Leach (113). One of the more widely used programs in organic chemistry, MACROMODEL, has a search module (124) coupled to energy minimization for conformational analysis. MACROSEARCH has been developed by Beusen et al. (125) to generate the set of conformers consistent with experimental NMR data and used to determine the conformation of a 15-residue peptide antibiotic.
2.1.5 Statistical Mechanics Foundation (126). To understand the relationships between the simulation methods and the desired thermodynamic quantities, a short review of the major concepts of statistical mechanics may be in order. This is not meant to be comprehensive, but rather to remind the reader of the relevant ideas.
The set of configurations generated by the Monte Carlo simulation generates what J. Willard Gibbs would call an "ensemble," assuming that the number of molecules in the simulation was large and the number of configurations was also large. This ensures that the possible arrangements of molecules that are energetically reasonable have been adequately sampled. One is often interested in the statistical weight W of a particular observable. For example, a particular conformation of a solute molecule, say, the staggered rotamer of ethane, could be compared with another conformer, the eclipsed rotamer, in a simulation with solvent. If more configurations of the surrounding solvent molecules of equivalent energy were available to the staggered than to the eclipsed, then the staggered would have a higher statistical weight. From the inscription on Boltzmann's tomb, we all recall that S = k ln W, where S is the entropy and k is Boltzmann's constant. Thus, we have a link between statistics and thermodynamics. W in this case would be the number of configurations associated with the particular conformation of ethane under consideration divided by the total number of configurations sampled. This would have to be weighted by their energy, of course, unless the distribution was already Boltzmann weighted, as happens when one uses the Metropolis algorithm (127).
Another way of stating this is that the probability P_i of a particular configuration N_i is proportional to its Boltzmann probability divided by the Boltzmann probability of all the other configurations or states:
$$P_i = \frac{e^{-E_i/kT}}{\sum_j e^{-E_j/kT}}$$
The denominator in this equation has been given a special name, partition function, often symbolized by Z, which is derived from the German Zustandsumme (sum over states). The successive terms in the partition function describe the partition of the configurations among the respective states available. One can express the thermodynamic state functions of an ideal gas in terms of the molecular partition function Z as follows:
$$S = \frac{U}{T} + Nk \ln\frac{Z}{N} + Nk$$
where N is the number of molecules and U is the internal energy. From this and the assumption of an ideal gas, pV = NkT, the Gibbs free energy G = U - TS + pV leads to
$$G = -NkT \ln\frac{Z}{N}$$
and similarly, the Helmholtz free energy A = U - TS leads to the expression
$$A = -NkT \ln\frac{Z}{N} - NkT$$
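As a small numerical illustration of the Boltzmann probability and partition function discussed above, the following Python sketch normalizes Boltzmann factors for a list of state energies. The function name, the per-mole value of the constant, and the 2.9 kcal/mol energy gap in the usage example are illustrative assumptions, not values given in the text.

```python
import math

K_B = 0.0019872041  # Boltzmann constant per mole (gas constant), kcal/(mol K)


def boltzmann_probabilities(energies, temperature):
    """Return P_i = exp(-E_i/kT) / Z for a list of energies in kcal/mol."""
    e_min = min(energies)
    # Shifting all energies by e_min leaves every P_i unchanged but keeps
    # the exponentials well within floating-point range.
    weights = [math.exp(-(e - e_min) / (K_B * temperature)) for e in energies]
    z = sum(weights)  # partition function of the shifted energies
    return [w / z for w in weights]


# Two states separated by an assumed 2.9 kcal/mol at 298.15 K.
p_low, p_high = boltzmann_probabilities([0.0, 2.9], 298.15)
print(p_low, p_high)
```

At room temperature kT is roughly 0.6 kcal/mol, so a state a few kcal/mol higher in energy already carries a very small statistical weight.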
ics. In such a system, we can represent the total energy E_tot as the sum of kinetic energy E_kin and potential energy V_pot:
$$E_{tot}(t) = E_{kin}(t) + V_{pot}(t)$$
where the potential energy is a function of the coordinates, V_pot = f(r_i) for atoms i to N and r_i represents cartesian coordinates of atom i; and the kinetic energy depends on the motion of the atoms:
v_i(t + Δt/2) · Δt, to the original position r_i(t). By staggering the evaluation of the velocity and force calculations by Δt/2, an improvement in the simulation performance is obtained.
2.1.6.2 Temperature. For simulations that can be compared with experimental results, one must be able to control the temperature of the simulation. The temperature of a system is a function of the kinetic energy, E_kin(t):
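The staggered (leapfrog) update of velocities and positions, and the link between kinetic energy and temperature, can be sketched in Python as follows. This is a one-particle, one-dimensional illustration in reduced units. The function names are hypothetical, and the temperature expression assumes the standard equipartition result with 3N unconstrained degrees of freedom, which may differ from the exact form used in the original text.

```python
import math


def leapfrog(position, half_step_velocity, force, mass, dt, n_steps):
    """Advance one particle; velocity lives at half steps, position at whole steps."""
    trajectory = [position]
    for _ in range(n_steps):
        # v(t + dt/2) = v(t - dt/2) + a(t) * dt
        half_step_velocity += force(position) / mass * dt
        # r(t + dt) = r(t) + v(t + dt/2) * dt
        position += half_step_velocity * dt
        trajectory.append(position)
    return trajectory, half_step_velocity


def instantaneous_temperature(masses, velocities, k_b=1.0):
    """T = 2 E_kin / (3 N k), assuming 3N unconstrained degrees of freedom."""
    e_kin = sum(0.5 * m * sum(c * c for c in v)
                for m, v in zip(masses, velocities))
    return 2.0 * e_kin / (3 * len(masses) * k_b)


# Harmonic oscillator with unit mass and force constant: x(t) = cos(t).
dt = 0.01
traj, v_half = leapfrog(1.0, math.sin(dt / 2), lambda x: -x, 1.0, dt, 628)
print(traj[-1], math.cos(628 * dt))
```

Starting the velocity half a step behind the position is what makes the scheme time-centered; the usage example compares the final position with the analytical solution after roughly one period.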
Figure 3.9. Schematic diagram of simulation with periodic boundary conditions in which adjacent
cells are generated by simple translations of coordinates.
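The periodic images of Fig. 3.9, together with the rule that pairwise interactions are limited to less than half the box length, can be sketched in Python as below. A cubic box is assumed and the function names are hypothetical.

```python
import math


def minimum_image_distance(r_i, r_j, box_length):
    """Distance between two atoms using the nearest periodic image (cubic box)."""
    total = 0.0
    for a, b in zip(r_i, r_j):
        d = a - b
        # Translate by whole box lengths until the separation lies in [-L/2, L/2].
        d -= box_length * round(d / box_length)
        total += d * d
    return math.sqrt(total)


def interacts(r_i, r_j, box_length):
    """Half-box cutoff: an atom sees a molecule or its image, but never both."""
    return minimum_image_distance(r_i, r_j, box_length) < 0.5 * box_length


# Atoms near opposite faces of a 10-unit box are neighbors through the boundary.
print(minimum_image_distance((0.5, 0.0, 0.0), (9.5, 0.0, 0.0), 10.0))
```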
where N is the number of configurations, E_i is the energy of configuration i, k is Boltzmann's constant, and T is temperature.
If we have sufficiently sampled the possible arrangements of molecules in the simulation and have an accurate method to calculate their energy E, then the above formula will give a Boltzmann weighted average of the property X.
In practice, one must compromise on the number of molecules in the simulation and/or the number of configurations calculated to conserve computer cycles. Two essential techniques that are utilized are periodic boundary conditions and sampling algorithms, which we discuss separately.
Although it is important to minimize the number of molecules in either Monte Carlo or molecular dynamics simulations for computational convenience, surface effects at the interface between the simulated solvent and the surrounding vacuum could seriously distort the results. To approximate an "infinite" liquid, one can surround the box of molecules by simple translations to generate periodic images. Each atom in the central box has a set of related molecules in the virtual boxes surrounding the central one (Fig. 3.9). The energy calculations for pairwise interactions consider only the interaction of a molecule, or its "ghost," with any other molecule, but not both. In practice, this is accomplished by limiting pairwise interactions to distances less than one-half the length of the side of the box. Real concerns often arise regarding convergence of electrostatic terms because of the linear dependency on distance.
For any large nontrivial system, the total number of possible configurations is beyond comprehension. Consider a set of protons in a magnetic field: the magnetic moments can be either aligned with or opposed to the magnetic field. For only 50 protons, there are $2^{50}$ combinations, which is a large number. For a
Molecular Modeling in Drug Design
small cyclic pentapeptide, there are potentially $36^{10}$ conformations if one considers a 10° scan of the torsional variables Φ, Ψ. Clearly, some of these are energetically unreasonable because the conformation requires overlap of two or more atoms in the structure. Monte Carlo simulations are successfully performed by sampling only a limited set of the energetically feasible conformations, say, $10^{6}$ out of $10^{100}$ theoretical possibilities. The reason for this success is that the Monte Carlo schemes sample those states that are statistically most important. One could sample all states, calculate the energy of each, and then Boltzmann-weight its contribution to the average. Alternatively, one can ignore those states that are energetically high so that they contribute little, if any, weight to the average, and concentrate on those of low energy. In other words, we look only where there are reasonable answers energetically. This is called importance sampling, which is the key to the Monte Carlo procedure.
One aspect shared by Monte Carlo methods and molecular dynamics is the ability to cross barriers. In the case of Monte Carlo, barrier crossing occurs both by random selection of variables and by acceptance of higher energy states on occasion. Both methods require an equilibration period to eliminate bias associated with the starting configuration. When one considers randomly filling a box with molecules with arbitrary choices for position and orientation, it should be obvious that most examples would result in high energy, especially if the density of such a simulation is made to resemble that of a liquid in which adjacent molecules are often in VDW contact. High energy configurations contribute very little to the properties we are trying to evaluate because they are Boltzmann weighted. It is, therefore, extremely inefficient to randomly calculate configurations. One needs procedures, often referred to as importance sampling, that selectively calculate configurations that will be representative of allowed states. In fact, if one can guarantee that the energy of the configurations actually has a Boltzmann distribution, then one can simply average the properties. In practice, this has been accomplished by an algorithm suggested by Metropolis et al. (127). One essentially uses a Markov process in which the current configuration becomes the basis for generating the next.
1. A molecule in the current configuration is chosen at random and its degrees of freedom randomly varied by small increments.
2. The energy of the new configuration is evaluated and compared with that of the starting configuration.
3. If the new energy is lower, the new configuration is accepted and becomes the basis for the next random perturbation.
4. If the energy is higher, E(new) > E(old), then a random number between 0 and 1 is generated and compared with exp{-[E(new) - E(old)]/kT}. If the number is less, then the configuration is accepted and the process continues by generating a new configuration. If the number is greater, then the configuration is rejected and the process resumes with the old configuration.
In this way, configurations of lower energy are accepted and the system eventually "minimizes" to sample the higher populated lower energy configurations; at the same time, higher energy configurations are included but only in proportion to their Boltzmann distribution, which is clearly a function of temperature of the simulation. Because the configurations occur with a probability depending on their energy and proportional to the Boltzmann distribution, one can simply average thermodynamic properties over this distribution of configurations,
$$\langle X \rangle = \frac{1}{N} \sum_{i=1}^{N} X_i$$
where the sum covers the N configurations generated. Because one often does not know an appropriate starting configuration, the initial part of the run may be used to "minimize," or equilibrate the system, and only
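The four Metropolis steps listed above can be sketched in Python for a single one-dimensional coordinate. This is a minimal illustration, not the implementation of Metropolis et al.; the function name, the uniform trial move, and the harmonic test energy are assumptions chosen to keep the example short.

```python
import math
import random


def metropolis(energy, x0, kT, n_steps, max_step=0.5, rng=random):
    """Sample a 1-D coordinate with the Metropolis acceptance rule."""
    x, e = x0, energy(x0)
    samples = []
    for _ in range(n_steps):
        # Step 1: vary the degree of freedom by a small random increment.
        x_new = x + rng.uniform(-max_step, max_step)
        # Step 2: evaluate the energy of the new configuration.
        e_new = energy(x_new)
        # Steps 3 and 4: always accept a lower energy; accept a higher energy
        # only if a random number in [0, 1) is below exp(-(E_new - E_old)/kT).
        if e_new <= e or rng.random() < math.exp(-(e_new - e) / kT):
            x, e = x_new, e_new
        # A rejected move counts the old configuration again.
        samples.append(x)
    return samples


# Harmonic well E = x^2/2 at kT = 1; discard the start as equilibration.
run = metropolis(lambda x: 0.5 * x * x, 0.0, 1.0, 50000, max_step=2.0,
                 rng=random.Random(1))
production = run[1000:]
print(sum(x * x for x in production) / len(production))
```

Because the accepted configurations are already Boltzmann distributed, the property of interest (here the mean of x squared) is obtained as a plain average over the production part of the run, with no further energy weighting.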
characterization of molecular electrostatic potentials, and (3) parameter development for molecular mechanics.
2.2.1 Parameterization of Charge. Estimates of charges in molecular mechanics can be derived, in general, by application of one of the many different quantum chemical approaches, either ab initio or semiempirically. Quantum mechanical methods are available for calculating the electron probability distributions for all the electrons in a molecule and then partitioning those distributions to yield representations for the net atomic charges of atoms in the molecule, either as atom-centered charges or as more complex distributed multipole models (39, 42) (Fig. 3.12).
2.2.1.1 Atom-Centered Point Charges. In the Mulliken population analysis, all the one-center charge on an atom is assigned to that atom, whereas the two-center charge is divided equally between the two atoms in the
overlap (even if the electronegativities of the two atoms are quite dissimilar). The sum is the gross atomic population, and the net atomic charge is simply this plus the nuclear charge. The result is very sensitive to the basis set (the number of atomic orbitals) used. Despite poor fit of the molecular electrostatic potential derived with point charges to the ab initio electrostatic potential, or that derived from a distributed multipole analysis (150), widespread use continues because they do reflect chemical trends and are reportedly compatible with known electronegativities. In addition, this option is commonly available in software packages. Unfortunately, poor representation of the electric field surrounding the molecule results from use of atom-centered monopole models (42), even when more careful methods are used to distribute the charge.
2.2.1.2 Methods to Reproduce the Molecular Electrostatic Potential (MEP). The electrostatic potential surrounding the molecule that is created by the nuclear and electronic charge distribution of the molecule is a dominant feature in molecular recognition. Williams reviews (42) methods to calculate charge models to accurately represent the MEP as calculated by ab initio methods by use of large basis sets. The choice between models (monopole, dipole, quadrupole, bond dipole, etc., Fig. 3.12) depends on the accuracy with which one desires to reproduce the MEP. This desire has to be balanced by the increased complexity of the model and its resulting computational costs when implemented in molecular mechanics.
The first problem is to select points where the MEP is to be evaluated and eventually fitted, the position of the shell outside the VDW radii of the atoms in the molecule, and the spacing of grid points on that shell. Sampling too close to the nuclei gives rise to anomalies because the potential around nuclei is always positive. Singh and Kollman (151) report the use of four surfaces at 1.4, 1.6, 1.8, and 2.0 times the VDW radii, with a density of one to five points per Å². This paradigm was reported to give an adequate sampling to which the fitted charges were fairly insensitive, at least at the higher values. An improved procedure, the restrained electrostatic potential fit (RESP), was developed by Bayly et al. (41) to enhance transferability of the resulting point charges. Williams (42) developed a procedure to derive the best fit to a given MEP with a defined set of monopoles, dipoles, and so forth.
Typically, fragments of molecules of interest are analyzed by ab initio techniques to generate their MEPs that are the reference for parameterization of charge. Besler et al. (152) reported fitting of atomic charges to the electrostatic potentials calculated by the semiempirical methods AM1 and MINDO. The MINDO charges derived by fitting the MEP can be linearly scaled to agree with results derived from ab initio calculations. Among the motivations for semiempirical methods are the facts that semiempirical methods using high quality basis sets often yield better results than ab initio techniques employing minimal basis sets, and the significant reduction in computational time in moving from ab initio to semiempirical calculations. Rauhut and Clark (153) used the AM1 wave function to develop a multicenter point-charge model in which each hybrid natural atomic orbital is represented by two charges located at the centroid of each lobe. Thus, up to nine charges (4 orbitals and 1 core charge) are used to represent heavy atoms. Results using this approach affirm the observations that distributed charges are more successful than atom-centered charges in reproducing intermolecular interactions (154, 155).
2.2.2 Parameter Derivation for Force Fields. Because molecular mechanics is empirical, parameters are derived by iterative evaluation of computational results, such as molecular geometry (bond lengths, bond angles, dihedrals) and heats of formation, compared with experimental values (20). Lifson has coined the expression "consistent" for force fields in which structures, energies of formation, and vibrational spectra have all been used in parameterization by least-squares optimization. In the case of bond lengths, bond angles, and VDW parameters, crystallography has provided most of the essential experimental database. Major efforts (156) to derive general sets of parameters from quantum mechanical calculation have been made, especially for systems for which adequate experimental data are unavailable. Although quantum mechanics is certainly adequate for initial approximations of parameters and essential for charge approximations, a detailed analysis indicates that in vacuo calculations neglect many-body effects and can be misleading. A major effort by Hehre (personal communication) to derive parameters for water from extensive ab initio calculations with large basis sets failed even to give a parameter set that reproduced the radial distribution for bulk water. Parameters derived from relevant experimental data in condensed phase (especially if available in the solvent of theoretical interest) are generally more capable of accurately predicting results because the many-body effects are implicitly included in the parameterization. The basic assumption is that these "effective" two-body potentials implicitly incorporate many-body interaction energies.
Jorgensen has parameterized by fitting properties of bulk liquids to Monte Carlo simulations to give the AMBER/OPLS force field (26, 157, 158). Conceptually, one is attracted by the use of liquids and their observable properties as constraints during the derivation of a force field that is destined to study the properties of solvated molecules.
2.2.3 Modeling Chemical Reactions and Design of Transition-State Inhibitors. In cases, such as enzyme reactions, where chemical transformations occur, quantum chemical methods must be used to deal with electronic changes in hybridization and bond cleavage (159, 160). Hybrid applications (161-163) in which the reaction core is modeled quantum mechanically and the rest by molecular mechanics would appear a viable option. Alternatively, the geometry of the transition state has been modeled by molecular mechanics, with force constants derived from ab initio calculations that predict with amazing accuracy the relative selectivity of reactions. Andrews and coworkers (164) pioneered modeling of transition states (165) of enzymatic reactions to design transition-state inhibitors.
3 KNOWN RECEPTORS
A significant challenge is the design of novel ligands for therapeutic targets in which the three-dimensional structure has been determined by either X-ray crystallography or NMR (12, 13, 166). The availability of the coordinates of all the atoms of the target suggests use of modeling of the site and interaction with prospective ligands. Qualitative information can be discerned by simple examination of complexes by the use of molecular graphics and improvement of known ligands made by searching for accessory binding interactions through ligand modification. This approach was pioneered by groups at Wellcome Research Laboratories (167-169) in designing analogs of 2,3-diphosphorylglycerate (Fig. 3.13), to modulate oxygen binding to hemoglobin, and at Burroughs-Wellcome (170), to enhance affinity of dihydrofolate reductase (DHFR) antagonists. When used in an iterative fashion, novel compounds with improved affinity result (166, 171, 172). Quantification of interactions and design of novel ligands require application of molecular and statistical mechanics to quantify the enthalpy and entropy of binding. In other words, experimental measurements reflect free energies of binding and both enthalpic and entropic contributions must be estimated for prediction of affinities as part of the design process. When combined with combinatorial chemistry and high throughput screening, rapid identification of therapeutic candidates is feasible, as witnessed in the case of factor Xa antagonists (173) or TAR RNA inhibitors as possible HIV drugs (174).
3.1 Definition of Site
The availability of three-dimensional structural information on a potential therapeutic target does not guarantee identification of the site of action of the substrate, or inhibitor, unless the structure of a relevant complex has been determined. In fact, conformational changes often occur during binding of ligands to enzymes that are not reflected in the three-dimensional structure of the enzyme alone. Illustrative examples are the major conformational changes seen (175, 176) in HIV protease on binding the inhibitor MVT-101 (Fig. 3.14) and the changes in domain orientation observed (177) in the complex of an anti-HIV peptide antibody with the peptide. Until the two β-strand flaps have been folded in, to complete the active site of HIV protease, many of
Figure 3.13. Diphosphoglycerate (a) and analogs (b-d) designed to optimize interactions bound in schematic model of hemoglobin. Used with permission (169).
Figure 3.14. Ribbon diagram of HIV-1 protease in the absence of inhibitor (a) and when bound to the inhibitor MVT-101 (b). Diagrams based on crystal structures as reported by Miller et al. (175, 176).
mimination. The active site has had no evolutionary pressure to optimize binding per se, but rather rates of interaction and discrimination among the limited repertoire of the biological milieu. One classic example (181) of difficulty in interpretation of binding as a result of ligand modification occurred when an analog designed to bind to a specific site on hemoglobin actually found a more appropriate site within the packed side chains of the protein molecule (Fig. 3.16). This example emphasizes the importance of protein dynamics. Alternate conformations of the protein that are easily accessible at room temperature may be difficult to characterize experimentally because of relatively low abundance and/or lack of resolution of the experimental techniques used. Computationally, they are problematic as well because of the complexity of the energy surface for a macromolecule.
Figure 3.15. Bound conformation of cyclosporin (a) as determined by NMR compared with solution conformation (b) (178). Residues involved with interaction with cyclophilin are indicated on (a) in bold.
3.2 Characterization of Site
3.2.1 Volume and Shape. Most substrate-enzyme or receptor-ligand interactions occur within pockets, or cavities, buried within proteins. Inside these invaginations, a microenvironment is established that favors desolvation and binding of the ligand, despite the entropic cost of fixing the relative geometries of the two molecules. Knowledge of the three-dimensional structure of such cavities can assist the study of binding interactions and the design of novel ligands as potential therapeutics. Several algorithms to find, display, and characterize cavity-like regions of proteins as potential binding sites have been developed. Kuntz et al. (13, 183) described a program, DOCK, to explore the steric complementarity between ligands and receptors of known three-dimensional structure. Using the molecular surface of a receptor, a volumetric representation of the chosen binding cavity is approximated by use of a set of spheres of various sizes that have been mathematically "packed" within it (Fig. 3.17). The set of distances between the centers of the spheres serves as a compact representation of the shape of the cavity. The use of the relative distance paradigm allows comparison without the need for orientation of one shape with respect to the other. Potential ligands are characterized in a similar fashion by generating a set of spheres that mimic the shape of the ligand. Matching the distance matrix of the cavity with that of a potential ligand provides an efficient screen for selection of complementary shapes. Voorintholt et al. (184) used three-dimensional lattices to calculate density maps of proteins. In these maps, lattice points were assigned as a function of the distance to the nearest atom. This technique is effective in delineating regions of low density where channels and cavities exist. Ho and Marshall (185) implemented a search function in CAVITY to allow the investigator to isolate a single cavity of interest by specifying a seed point. From this seed point, the algorithm systematically explored the entire volume of the cavity, following its borders and effectively filling every crevice within it; that
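The idea of isolating one cavity by growing outward from a seed point can be illustrated with a breadth-first flood fill on a three-dimensional lattice. The Python sketch below is a generic illustration under simple assumptions (a cubic lattice, six-connected neighbors, a precomputed set of occupied points); it is not the algorithm implemented in CAVITY, and all names are hypothetical.

```python
from collections import deque

NEIGHBORS = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))


def fill_cavity(occupied, seed, shape):
    """Return every empty lattice point connected to the seed through empty points."""
    if seed in occupied:
        raise ValueError("seed point lies inside the protein")
    nx, ny, nz = shape
    cavity = {seed}
    queue = deque([seed])
    while queue:
        x, y, z = queue.popleft()
        for dx, dy, dz in NEIGHBORS:
            p = (x + dx, y + dy, z + dz)
            inside = 0 <= p[0] < nx and 0 <= p[1] < ny and 0 <= p[2] < nz
            # Spread only into empty points that have not been visited yet.
            if inside and p not in occupied and p not in cavity:
                cavity.add(p)
                queue.append(p)
    return cavity


# Toy 5 x 5 x 5 "protein": solid except a two-point pocket and one isolated void.
empty = {(2, 2, 2), (2, 2, 3), (0, 0, 0)}
occupied = {(x, y, z) for x in range(5) for y in range(5) for z in range(5)} - empty
print(sorted(fill_cavity(occupied, (2, 2, 2), (5, 5, 5))))
```

Only the points reachable from the seed are returned, so the unconnected void in the toy example is left out, which mirrors the goal of isolating a single cavity of interest.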
3 Known Receptors
tions of parameters and essential for charge mined by either X-ray crystallography or
approximations, a detailed analysis indicates NMR (12, 13, 166). The availability of the co-
that in vacuo calculations neglect many-body ordinates of all the atoms of the target sug-
effects and can be misleading. A major effort gests use of modeling of the site and interac-
by Hehre (personal communication) to derive tion with prospective ligands. Qualitative
parameters for water from extensive ab initio information can be discerned by simple exam-
calculations with large basis sets failed even to ination of complexes by the use of molecular
give a parameter set that reproduced the ra- graphics and improvement of known ligands
dial distribution for bulk water. Parameters made by searching for accessory binding inter-
derived from relevant experimental data in actions through ligand modification. This ap-
condensed phase (especially if available in the proach was pioneered by groups at Wellcome
solvent of theoretical interest) are generally Research Laboratories (167-169) in designing
more capable of accurately predicting results analogs of 2,3-diphosphorylglycerate (Fig.
because the many-body effects are implicitly 3.131, to modulate oxygen binding to hemoglo-
included in the parameterization. The basic bin, and at Burroughs-Wellcome (170), to en-
assumption is that these "effective" two-body hance affinity of dihydrofolate reductase
potentials implicitly incorporate many-body (DHFR) antagonists. When used in an itera-
interaction energies. tive fashion, novel compounds with improved
Jorgensen has parameterized by fitting affinity result (166, 171, 172). Quantification
properties of bulk liquids to Monte Carlo sim- of interactions and design of novel ligands re-
ulations to give the AMBERIOPLS force field quire application of molecular and statistical
(26, 157, 158).Conceptually, one is attracted mechanics to quantify the enthalpy and en-
by the use of liquids and their observable prop- tropy of binding. In other words, experimental
erties as constraints during the derivation of a measurements reflect free energies of binding
force field that is destined to study the proper- and both enthalpic and entropic contributions
ties of solvated molecules. must be estimated for prediction of affinities
as part of the design process. When combined
2.2.3 Modeling Chemical Reactions and De- with combinatorial chemistry and high
sign of Transition-State Inhibitors. In cases, throughput screening, rapid identification of
such as enzyme reactions, where chemical therapeutic candidates is feasible, as wit-
transformations occur, quantum chemical nessed in the case of factor Xa antagonists
methods must be used to deal with electronic (173) or TAR RNA inhibitors as possible HIV
changes in hybridization and bond cleavage drugs (174).
(159, 160). Hybrid applications (161-163) in 3.1 Definition of Site
which the reaction core is modeled quantum
mechanically and the rest by molecular me- The availability of three-dimensional struc-
chanics would appear a viable option. Alterna- tural information on a potential therapeutic
tively, the geometry of the transition state has target does not guarantee identification of the
been modeled by molecular mechanics, with site of action of the substrate, or inhibitor, un-
force constants derived from ab initio calcula- less the structure of a relevant complex has
tions that predict with amazing accuracy the been determined. In fact, conformational
relative selectivity of reactions. Andrews and changes often occur during binding of ligands
coworkers (164) pioneered modeling of transi- to enzymes that are not r'eflected in the three-
tion states (165) of enzymatic reactions to de- dimensional structure of the enzyme alone. 11-
sign transition-state inhibitors. lustrative examples are the major conforma-
tional changes seen (175,176) in HIV protease
on binding the inhibitor MVT-101 (Fig. 3.14)
3 KNOWN RECEPTORS and the changes in domain orientation ob-
served (177) in the complex of an anti-HIV
A significant challenge is the design of novel peptide antibody with the peptide. Until the
ligands for therapeutic targets in which the two P-strand flaps have been folded in, to com-
three-dimensional structure has been deter- plete the active site of HIV protease, many of
Molecular Modeling in Drug Design
overlap (even if the electronegativities of the two atoms are quite dissimilar). The sum is the gross atomic population, and the net atomic charge is simply this plus the nuclear charge. The result is very sensitive to the basis set (the number of atomic orbitals) used. Despite poor fit of the molecular electrostatic potential derived with point charges to the ab initio electrostatic potential, or that derived from a distributed multipole analysis (150), widespread use continues because they do reflect chemical trends and are reportedly compatible with known electronegativities. In addition, this option is commonly available in software packages. Unfortunately, poor representation of the electric field surrounding the molecule results from use of atom-centered monopole models (42), even when more careful methods are used to distribute the charge.

2.2.1.2 Methods to Reproduce the Molecular Electrostatic Potential (MEP). The electrostatic potential surrounding the molecule that is created by the nuclear and electronic charge distribution of the molecule is a dominant feature in molecular recognition. Williams reviews (42) methods to calculate charge models to accurately represent the MEP as calculated by ab initio methods by use of large basis sets. The choice between models (monopole, dipole, quadrupole, bond dipole, etc., Fig. 3.12) depends on the accuracy with which one desires to reproduce the MEP. This desire has to be balanced by the increased complexity of the model and its resulting computational costs when implemented in molecular mechanics.

The first problem is to select points where the MEP is to be evaluated and eventually fitted, the position of the shell outside the VDW radii of the atoms in the molecule, and the spacing of grid points on that shell. Sampling too close to the nuclei gives rise to anomalies because the potential around nuclei is always positive. Singh and Kollman (151) report the use of four surfaces at 1.4, 1.6, 1.8, and 2.0 times the VDW radii, with a density of one to five points per Å². This paradigm was reported to give an adequate sampling to which the fitted charges were fairly insensitive, at least at the higher values. An improved procedure, the restrained electrostatic potential fit (RESP), was developed by Bayly et al. (41) to enhance transferability of the resulting point charges. Williams (42) derived a procedure to derive the best fit to a given MEP with a defined set of monopoles, dipoles, and so forth.

Typically, fragments of molecules of interest are analyzed by ab initio techniques to generate their MEPs that are the reference for parameterization of charge. Besler et al. (152) reported fitting of atomic charges to the electrostatic potentials calculated by the semiempirical methods AM1 and MINDO. The MINDO charges derived by fitting the MEP can be linearly scaled to agree with results derived from ab initio calculations. Among the motivations for semiempirical methods are the facts that semiempirical methods using high quality basis sets often yield better results than ab initio techniques employing minimal basis sets, and the significant reduction in computational time in moving from ab initio to semiempirical calculations. Rauhut and Clark (153) used the AM1 wave function to develop a multicenter point-charge model in which each hybrid natural atomic orbital is represented by two charges located at the centroid of each lobe. Thus, up to nine charges (4 orbitals and 1 core charge) are used to represent heavy atoms. Results using this approach affirm the observations that distributed charges are more successful than atom-centered charges in reproducing intermolecular interactions (154, 155).

2.2.2 Parameter Derivation for Force Fields. Because molecular mechanics is empirical, parameters are derived by iterative evaluation of computational results, such as molecular geometry (bond lengths, bond angles, dihedrals) and heats of formation, compared with experimental values (20). Lifson has coined the expression "consistent" for force fields in which structures, energies of formation, and vibrational spectra have all been used in parameterization by least-squares optimization. In the case of bond lengths, bond angles, and VDW parameters, crystallography has provided most of the essential experimental database. Major efforts (156) to derive general sets of parameters from quantum mechanical calculation have been made, especially for systems for which adequate experimental data are unavailable. Although quantum mechanics is certainly adequate for initial approxima-
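The fitting of atom-centered charges to a sampled MEP, as described in Section 2.2.1.2, can be illustrated with a minimal sketch. It is not the procedure of any cited author: it assumes a plain linear least-squares fit of monopoles with a Lagrange multiplier that fixes the total charge, works in atomic units, omits the RESP restraint, and samples on simple spheres around the origin rather than on scaled VDW surfaces. All function names are hypothetical.

```python
import math

def coulomb_potential(charges, sites, point):
    """Potential at `point` from point charges (atomic units)."""
    return sum(q / math.dist(s, point) for q, s in zip(charges, sites))

def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def fit_monopoles(sites, points, mep, total_charge=0.0):
    """Least-squares monopoles reproducing `mep`, constrained to sum to total_charge."""
    n = len(sites)
    # Design matrix: inverse distance from every charge site to every sample point.
    inv_r = [[1.0 / math.dist(s, p) for s in sites] for p in points]
    a = [[0.0] * (n + 1) for _ in range(n + 1)]
    b = [0.0] * (n + 1)
    for j in range(n):
        for k in range(n):
            a[j][k] = sum(row[j] * row[k] for row in inv_r)
        b[j] = sum(row[j] * v for row, v in zip(inv_r, mep))
        a[j][n] = a[n][j] = 1.0  # Lagrange multiplier row/column (assumed constraint)
    b[n] = total_charge
    return solve_linear(a, b)[:n]

def shell_points(radius):
    """26 sample points on a sphere of `radius` about the origin (toy sampling)."""
    pts = []
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            for dz in (-1, 0, 1):
                norm = math.sqrt(dx * dx + dy * dy + dz * dz)
                if norm:
                    pts.append((radius * dx / norm, radius * dy / norm, radius * dz / norm))
    return pts
```

In use, one would evaluate a reference MEP at the sample points (here it could be generated with `coulomb_potential` from known charges) and pass it to `fit_monopoles`; sampling several shells, in the spirit of the four surfaces mentioned above, makes the fit less sensitive to any single shell.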
and receptor (185). At every cavity-pocket interface point, the electrostatic potential of both the atoms forming the cavity and those of the binding ligand are calculated. A rough approximation of complementarity is computed by multiplying these potentials together. A favorable electrostatic interaction is produced when the electrostatic potentials are opposite in sign. Therefore, favorable interactions are indicated when the product of these values is a negative number. Likewise, unfavorable interactions are indicated when the product of these values is a positive number and the potential of the cavity and that of the binding ligand have the same sign. These products are then normalized, assigned a color, and displayed.

In a similar way, an estimate of the hydrophobic character of a segment of the surface can be quantitated and indicated through color coding. The ability to rapidly switch between these hydrophobic and electrostatic surface representations, to visually integrate the optimal complementarity between site and potential ligand to be designed, is helpful.

3.3 Design of Ligands

3.3.1 Visually Assisted Design. In the process of optimization of a lead, one needs to ascertain where modification is feasible. Although visualization of the excess space available in the active-site cavity by directly examining ligands is useful for locating selected regions where ligand modifications may be made, it is not well suited for fully characterizing the void that exists between the ligand and the receptor, the ligand-receptor gap region; information concerning the relative dimensions of free space is difficult to discern. To facilitate the display of this information, Ho and Marshall (185) developed another algorithm to color-code the cavity display by the ligand-receptor nearest atom gap distance. The actual VDW, surface-to-surface distance (not center to center) between the ligand and enzyme atoms is calculated. When the ligand-receptor distances have been calculated at all cavity-pocket interface lattice points, a user-defined color-coding scale is implemented to generate the displays. This highlights those areas that are less well packed and available for ligand modification.

3.3.2 Three-Dimensional Databases. Medicinal chemists have recognized the potential of searching three-dimensional chemical databases to aid in the process of designing drugs for known, or hypothetical, receptor sites. Several databases are well known, such as the Cambridge Crystallographic Database (194) (CSD). The crystal coordinates of proteins and other large macromolecules are deposited into the Brookhaven Protein Databank (195). The conformations present in crystallographic databases reflect low energy conformers that should be readily attainable in solution and in the receptor complex. The three-dimensional orientation of the key regions of the drug that are crucial for molecular recognition and binding are termed the pharmacophore. The investigator searches the three-dimensional database through a query for fragments that contain the pharmacophoric functional groups in the proper three-dimensional orientation. Using these fragments as "building blocks," completely novel structures may be constructed through assembly and pruning (196). Receptor sites are complex both in geometrical features and in their potential energy fields, and many diverse compounds can bind to the same protein by occupying various combinations of subsites. Noncrystallographic databases have been developed as well. One example is the three-dimensional database of structures from Chemical Abstracts generated through CONCORD (197-199) that contains over 700,000 entries. The use of such databases is most applicable when the binding of a particular ligand and its receptor is well understood in terms of functional group recognition, and a crystal structure of the complex is known (200). One approach to ligand design is to develop novel chemical architectures (i.e., scaffolds) that position the pharmacophoric groups, or their bioisosteres, in the correct three-dimensional arrangement.

Gund conceived the first prototypic program designed to search for molecules that match three-dimensional pharmacophoric patterns (201, 202). This program, MOLPAT, performed atom-by-atom searches to verify comparable interatomic distances between
pattern and candidate structures. Although rigorous, this approach was tedious and required optimization. Lesk (203) devised a method that used the geometric attributes of the query to screen potential candidates. Similarly, Jakes and Willett (204) proposed that screens based on interatomic distances and atom types could considerably augment search efficiency. Furthermore, Jakes et al. (205) showed that methods widely used in two-dimensional structure retrieval could be applied to three-dimensional searches, to remove the vast majority of compounds before more rigorous comparisons. This was validated in test searches against a subset of the CSD. This concept was furthered by Sheridan et al. (200), who included screens based on aromaticity, hybridization, connectivity, charge, position of lone pairs, and centers of mass of rings. To contain this wealth of information, an inverted bit map [the presence or absence of a feature is encoded as a 1 or 0 (bit) at a particular location in a "keyword"] was employed for highly efficient screening, hundreds of thousands of compounds in minutes.

Similar database searching methods have been incorporated into a number of current database searching systems. Programs such as CAVEAT (206), ALADDIN (Abbott) (207), 3DSEARCH (Lederle) (208), MACCS-3D (209), CHEM-X (210), UNITY (211), and others contain considerable functionality useful for such an approach. CAVEAT (206) is designed to assist a chemist in identifying cyclic structures that could serve as the foundation for novel compounds. In particular, it allows an investigator to rapidly search structural databases for compounds containing substituent bonds that satisfy a specific geometric relationship. ALADDIN (207), 3DSEARCH (208), MACCS-3D (209), and CHEM-X (210) are similar, in that geometric relationships between various user-defined atomic components can be used as a query to retrieve matching structures. Features have been included to allow the user to delineate molecular characteristics (atom type, bond angles, torsional constraints, etc.) to ensure the retrieval of relevant compounds. Additional constraints have been incorporated into 3DSEARCH (208) and ALADDIN (207), including the consideration of retrieved ligand-receptor volume complementarity. Furthermore, CHEM-X (210) performs a rule-based conformational search on each structure in the database to account for conformational flexibility. For a comprehensive review of three-dimensional chemical database searching, see Martin et al. (212, 213).

Pharmaceutical companies have developed three-dimensional databases for their compound files to help prioritize candidates for screening (210, 214). An essential component in such a system is a method for assessing similarity (212, 215). Because most compound databases were entered as two-dimensional structures, this has required conversion to a three-dimensional format. Programs have proved (197-199, 216) useful in generating plausible three-dimensional structures from the connectivity data, as reviewed by Sadowski and Gasteiger (217). Because of the inherent flexibility in most compounds, the use of a single conformation to represent the three-dimensional potential for interaction of a molecule is a clear limitation. Development of three-dimensional databases with a compact, coded representation of the conformational states available to each compound is a logical next step. Efficient use of such a database requires methods for evaluating three-dimensional similarities. In addition to identification of compounds that can present an appropriate three-dimensional pattern, compounds must also fit within the receptor cavity. Based on a shape-matching algorithm, Sheridan et al. (200) screened candidate compounds to select those whose volumes would fit within the combined volumes of known active compounds. Previously, this group used (218) the same algorithm to help identify potential ligands for papain and carbonic anhydrase, by screening compounds from the CSD. Screening of the active site of HIV protease identified (219) haloperidol (Fig. 3.20) as an inhibitor of the enzyme and provided a novel chemical lead for further investigation. Burt and Richards (220) introduced flexible fitting of molecules to a target structure, with assessment of molecular similarity as a means of dealing with the conformational problem.

The use of preliminary screens can eliminate the vast majority of compounds before more rigorous, and computationally demanding, pattern-matching comparisons (212, 213).
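The idea of a bit-map prescreen, in which cheap feature tests discard most compounds before any geometric comparison, can be sketched as follows. This is an illustration only and not the implementation of Sheridan et al.: the feature names are hypothetical, and the "inverted" layout is assumed to mean one bit vector per feature, with bit i set when compound i carries that feature.

```python
def build_inverted_bitmap(compounds):
    """Map each feature to an integer whose bit i is set when compound i has it."""
    bitmap = {}
    for i, features in enumerate(compounds):
        for feature in features:
            bitmap[feature] = bitmap.get(feature, 0) | (1 << i)
    return bitmap

def screen(bitmap, n_compounds, query_features):
    """Return indices of compounds that carry every feature in the query."""
    hits = (1 << n_compounds) - 1          # start with every compound selected
    for feature in query_features:
        hits &= bitmap.get(feature, 0)     # a feature nobody has clears all hits
    return [i for i in range(n_compounds) if (hits >> i) & 1]

# Hypothetical three-compound database described by feature sets.
compounds = [
    {"aromatic_ring", "hbond_donor"},
    {"aromatic_ring", "hbond_donor", "cation"},
    {"sp3_carbon", "hbond_acceptor"},
]
bitmap = build_inverted_bitmap(compounds)
survivors = screen(bitmap, len(compounds), ["aromatic_ring", "hbond_donor"])
```

Because each query feature costs a single bitwise AND over the whole database, only the survivors need to be passed to the slower atom-by-atom distance matching.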
consisting of the coordinates of atoms and/or bonds. All possible structures that contain any combination of a user-specified minimum number of matching atoms and/or bonds are retrieved. Combinations of hits can be generated automatically by a companion program (104), SPLICE, which trims molecules found from the database to fit within the active site and then logically combines them by overlapping bonds to maximize their interactions with the site (Fig. 3.21). The addition of bridging fragments to those recovered from the database allows generation of many novel ligands for further evaluation.

3.3.3 De Novo Design. Design of novel chemical structures that are capable of interacting with a receptor of known structure uses methodology that is much more robust, given that the geometric foundations of molecular sciences are much firmer than the thermodynamic ones. Techniques for the design of novel structures to interact with a known receptor site are becoming more available and show promise (227-229). It has become quite evident that much of a molecule acts simply as a scaffold to align the appropriate groups in the three-dimensional arrangement that is crucial for molecular recognition. By understanding the pattern for a particular receptor, one can transcend a given chemical series by replacing one scaffold with another of geometric equivalence. This offers a logical way to dramatically change the side-effect profile of the drug as well as its physical and metabolic attributes. Various software tools are already under development to assist the chemist in this design objective. Lewis and Dean described their approaches to molecular templates in a series of papers (230, 231). An alternative approach, BRIDGE (Dammkoehler et al., unpublished), is based on geometric generation of possible cyclic compounds as scaffolds, given constraints derived from the types of chemistry the chemist is willing to consider. Nishibata and Itai (232, 233) published a Monte Carlo approach to generating novel structures that fit a receptor cavity. Pearlman and Murko (234) combined a similar approach with molecular dynamics with illustrative applications to HIV protease and FK506 binding protein. CAVEAT is a program developed by Bartlett to find cyclic scaffolds (207) by searching the CSD (195) for the correct vectorial arrangement of appended groups.

All of these approaches attempt to help the chemist discover novel compounds that will be recognized at a given receptor. Van Drie et al. (207) described a program, ALADDIN, for the design or recognition of compounds that meet geometric, steric, or substructural criteria, and Bures et al. (235) described its successful application to the discovery of novel auxin transport inhibitors. As our knowledge base of receptors grows, such tools will prove increasingly useful. The ability to transcend the chemical structure of lead compounds, while retaining the desired activity, should dramatically improve the ability to design away undesirable side effects. Bohm developed the program LUDI (221, 222) to construct ligands for active sites with an empirical scoring function to evaluate their construction.

3.3.4 Docking. The search for the global minimum, or the complete set of low energy minima, on the free energy surface when two molecules come in contact is commonly referred to as the "docking" problem [(236); see also Leach (21)]. Any useful molecular docking program must be computationally efficient in determining the most favorable binding mode, sufficiently sensitive in its scoring function to discriminate between alternate binding modes and the correct mode, and robust enough to allow various ligand-receptor systems to be studied.

3.3.4.1 Docking Methods. In the case of two proteins of known structure that can be approximated as rigid bodies, there are 6 degrees of freedom, the relative position (x, y, and z coordinates), and relative orientation (roll, pitch, and yaw to use the aeronautical expressions) to be explored. Several very intelligent approaches to this problem have been developed. The first and most well known approach is the DOCK program (http://www.cmpharm.ucsf.edu/kuntz/dock.html) (183) that was developed to solve the ligand-receptor problem. This program uses abstract representations (a set of spheres) of the convex shape on the receptor to be filled and the concave ligand and matches them to generate plausible binding modes with complementary
Figure 3.21. Combination by SPLICE (104) of fragments that bind to different subsites of NADP binding site of DHFR to generate a more optimal ligand.
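The six rigid-body degrees of freedom mentioned under Docking Methods (three translations plus roll, pitch, and yaw) can be made concrete with a short sketch. It is not taken from DOCK or any cited program; the rotation convention R = Rz(yaw) Ry(pitch) Rx(roll) and the function names are assumptions chosen for illustration.

```python
import math

def rotation_matrix(roll, pitch, yaw):
    """Rotation R = Rz(yaw) * Ry(pitch) * Rx(roll); angles in radians (assumed convention)."""
    cr, sr = math.cos(roll), math.sin(roll)
    cp, sp = math.cos(pitch), math.sin(pitch)
    cy, sy = math.cos(yaw), math.sin(yaw)
    return [
        [cy * cp, cy * sp * sr - sy * cr, cy * sp * cr + sy * sr],
        [sy * cp, sy * sp * sr + cy * cr, sy * sp * cr - cy * sr],
        [-sp, cp * sr, cp * cr],
    ]

def place(coords, pose):
    """Apply a 6-DOF pose (x, y, z, roll, pitch, yaw) to rigid-body coordinates."""
    tx, ty, tz, roll, pitch, yaw = pose
    r = rotation_matrix(roll, pitch, yaw)
    shift = (tx, ty, tz)
    return [
        tuple(sum(r[i][j] * p[j] for j in range(3)) + shift[i] for i in range(3))
        for p in coords
    ]
```

A rigid-body search then amounts to scoring `place(ligand, pose)` over many candidate poses; internal distances of the ligand are unchanged by construction, so only the six pose parameters are explored.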
2. Krystek et al. (261) analyzed 19 protein-ligand complexes in an update of the Novotny approach (262).

3. VALIDATE is a hybrid approach to predict the binding affinity of novel ligands for a receptor of known three-dimensional structure based on the calculation of several physicochemical properties of the ligand itself as well as a molecular mechanics analysis of the receptor-ligand complex (263). The properties of a diverse training set (-log K_i range = 2.47-14.00) of 51 crystalline complexes were analyzed by partial least squares (PLS) statistical methodology and neural network analysis to select a statistical model from a variety of parameters with the following properties:

S (press) = 1.29 (1.75 kcal/mol)

The true measure of any model rests in its ability to predict the affinity of new compounds. This would include the prediction of unique ligands bound to receptors that exist in the base set as well as the affinities of unique ligand/receptor complexes. Three separate test sets were compiled for this purpose. The first set consisted of 14 inhibitors that were obtained from crystalline receptor/ligand complexes. Neither ligands nor their receptor classes were included in this training set. Included were 2 DHFR, 2 penicillopepsin, 3 carboxypeptidase, 2 alpha-thrombin, and 2 trypsinogen inhibitors as well as 3 DNA-binding molecules. Prediction of binding affinities gave a PLS predictive r² = 0.786, with an absolute average error of 0.693 log units. The second test set consisted of 13 HIV protease inhibitors whose initial conformation and alignment were derived from the CoMFA analysis done by Waller et al. (264). The selection of the inhibitors was based on maintaining a good range of activity as well as using several inhibitors from the published test set. The PLS predictive r² value was 0.565, with an absolute average error of 0.694. The predictive r² value is considerably lower than that of the first test set, although this is attributed to the smaller range and distribution of activity in this set. The absolute average error is almost identical.

Although shape complementarity is an important consideration and shows correlation with the energy of interaction, it does not consider the electrostatics of the system (the relative positioning of hydrogen-bond donors and acceptors, etc.). More sophisticated energetic functions are often used to refine the candidate binding modes found by DOCK, or in the docking process itself. The assumption of rigid geometry for the receptor allows a preprocessing of the energetic contribution of the receptor to each grid point of a lattice constructed within the active site cavity (131, 265, 266). This allows a simple estimation of the energy of interaction of each atom in the ligand by finding the energy of the lattice points that are closest followed by interpolation. By increasing the efficiency of the scoring function, more candidate binding modes can be evaluated and, thus, one resembling the global minimum is more likely to be found. This assumes that the scoring function used is sufficiently accurate to discriminate between the correct binding mode and others, and the problem is simply one of sampling. Most scoring functions used, however, deal almost essentially with the enthalpy of binding and ignore the entropy of binding. It should not be surprising, therefore, that the agreement between the predicted binding modes and those observed experimentally is not always perfect. As one is attempting to discriminate between alternate binding modes of the same complex, difficulties in estimating entropy and desolvation are minimal because many of the terms (solvation and entropy of isolated ligand and receptor) in the comparison cancel.

3.3.4.3 Search for the Correct Binding Mode (267-283). Just as there are many different approaches to the global minimization problem, most, if not all, have been applied to the docking problem. These include molecular dynamics, Monte Carlo sampling, systematic
search (284), the genetic algorithm (101, 102, 105, 285, 286), and straight derivative optimization with multiple starting geometries. A combination of MD/MC has been shown (287, 288) to be a fairly efficient method for determining the free energy surface in smaller host-guest systems (289). The combination of molecular dynamics to locally sample with Monte Carlo that allows for conformational transitions provides adequate sampling if sufficient computational resources are available.

Wasserman and Hodge (290) used molecular dynamics to dock thermolysin inhibitors to an approximate model of the enzyme, with flexibility in the active site (38 of 314 residues) and ligand and with the rest of the enzyme represented by a grid approximation. A solvation model was used to compensate for desolvation in complex formation. To get 22 of 25 runs to orient the hydroxamate function correctly, the hydroxamate oxygens of the starting conformation were initialized within 4 Å of the zinc. If they were allowed to vary to 8 Å, then only 3 of 24 runs placed the ligand correctly. Obviously, there is a serious sampling problem.

Desmet et al. (291) used a truncated (dead-end elimination) search procedure to bind flexible peptides to the MHC I receptor. The translation/rotational space covered 6636 relative orientations and each nonglycine/proline residue of the peptide had 47 main-chain conformers. Side chains had threefold rotations about their chi angles and 28 side chains of the receptor were allowed to rotate. Seventy-four low energy structures were obtained with an average rmsd of 1 Å. The lowest energy structure had an rmsd of 0.56 Å. Peptides up to 20 residues were docked with this procedure.

King et al. (292) used an empirical binding free-energy function when docking MVT-101 to HIV protease. Forty-nine translation/rotations were examined with the Ponder/Richards rotamer library. Only a limited number of rotamers for each amino acid were examined: Thr(2), Ile(3), Nle(3), Nle(3), Gln(6), and Arg(5). According to the authors, 2.24 x 10'' discrete states were examined. Sixty-four low energy structures with an average rmsd of 1.36 Å were found. If the CHARMM potential was used with the same protocol, then the average rmsd was increased to 1.68 Å.

The genetic algorithm has been used by several groups (101, 102, 105, 285, 286, 293) to optimize the scoring function used. Encoding of the conformation of the ligand by torsional degrees of freedom and generating increasingly more fit sets of progeny by mutation and crossover have proved to be an effective search strategy. In one example (285), a Gray-coded binary string was used for the three translations, three rotations, and bond rotations that specified the binding mode, and a two-point crossover operator was used in the GA algorithm. In the four examples of complexes with known crystal structures, the results of rigid-body docking with a straightforward application of the GA were not encouraging, in that the correct binding mode was identified in only two of the four test cases. Restraining the GA to search subdomains (different binding hypotheses) in a systematic manner corrected this problem. Only the ligand was allowed flexibility and the GA procedure was repeated. Several binding modes similar to that seen in the experimental complex were found in each example, but ones with the lowest energy did not necessarily have the lowest rms from the experimental, pointing out deficiencies in the AMBER-like scoring function used.

Generally, no single scoring function can accurately predict the binding affinities for all types of ligands with all types of receptors. Consensus scoring (294, 295) is the simultaneous use of multiple different scoring functions to make virtual screening more predictive. CScore (Tripos, Inc.) is a consensus-scoring program that integrates several well-known scoring functions from the scientific literature. Each individual scoring function is used to predict the affinity of ligands in candidate complexes. CScore also creates a consensus column, containing integers that range from 0 to the total number of scoring functions. Each complex whose score exceeds the threshold for a particular function adds 1 to the value of the consensus; configurations below the threshold contribute a zero. Consensus columns can also be calculated from any combination of externally supplied indicators, so that key aspects of binding (e.g., the presence of a specific hydrogen bond) can be used to discriminate good configurations from bad ones. CScore can be used to rank multiple con-
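The consensus column described above is simple enough to sketch directly. The code below is an illustration of the counting rule only, not CScore itself: the scoring-function names `f1` to `f3`, the thresholds, and the convention that a higher score is better are all assumptions.

```python
def consensus_column(scores, thresholds):
    """Count, for each complex, how many scoring functions exceed their threshold.

    scores:     {complex_name: {function_name: score}}
    thresholds: {function_name: cutoff}; higher scores are assumed to be better.
    """
    return {
        name: sum(1 for fn, cutoff in thresholds.items() if by_function[fn] > cutoff)
        for name, by_function in scores.items()
    }

def rank_by_consensus(scores, thresholds):
    """Complex names ordered from highest to lowest consensus value."""
    consensus = consensus_column(scores, thresholds)
    return sorted(consensus, key=lambda name: consensus[name], reverse=True)

# Hypothetical scores for two candidate binding modes under three functions.
scores = {
    "pose_a": {"f1": 7.2, "f2": 55.0, "f3": 0.9},
    "pose_b": {"f1": 4.1, "f2": 60.0, "f3": 0.2},
}
thresholds = {"f1": 5.0, "f2": 50.0, "f3": 0.5}
consensus = consensus_column(scores, thresholds)
```

An externally supplied indicator, such as the presence of a specific hydrogen bond, could be folded in by treating it as one more function that returns 1 or 0 against a threshold of 0.5.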
proved to be a reliable indicator. The reasons behind this difficulty become more obvious if one dichotomizes the free energy of binding into a logical set of components.

For example, Williams (311-314) used a vancomycin-peptide complex (Fig. 3.23) as an experimental system in which to evaluate the various contributions to binding affinity. A similar analysis for antibody mutants was attempted by Novotny (262).

$$\Delta G = \Delta G_{\mathrm{trans+rot}} + \Delta G_{\mathrm{rotors}} + \Delta G_{\mathrm{conform}} + \sum \Delta G_{i} + \Delta G_{\mathrm{vdw}} + \Delta G_{H}$$

where $\Delta G_{\mathrm{trans+rot}}$ is the free energy associated with translational and rotational freedom of the ligand. This has an adverse effect on binding of 50-70 kJ/mol (12-17 kcal/mol) at room temperature for ligands of 100-300 Da, assuming complete loss of relative translational and rotational freedom. $\Delta G_{\mathrm{rotors}}$ is the free energy associated with the number of rotational degrees of freedom frozen. This is 5-6 kJ/mol (1.2-1.6 kcal/mol) per rotatable bond, assuming complete loss of rotational freedom. $\Delta G_{\mathrm{conform}}$ is the strain energy introduced by complex formation (deformation in bond lengths, bond angles, torsional angles, etc. from solution states); $\sum \Delta G_{i}$ is the sum of interaction free energies between polar groups; $\Delta G_{\mathrm{vdw}}$ is the energy derived from enhanced van der Waals interactions in complex; and $\Delta G_{H}$ is the free energy attributed to the hydrophobic effect (0.125 kJ/mol per Å² of hydrocarbon surface removed from solvent by complex formation).

Through use of this analysis on the dipeptide-vancomycin system, estimates of the contribution of the hydrogen bonds to binding were made (312) that were considerably higher (-24 kJ/mol, -6 kcal/mol) than those derived experimentally. The most likely source of error is the assumption of complete loss of relative and internal entropy upon binding. In retrospect, Searle and Williams (313) examined the thermodynamics of sublimation of organic compounds without internal rotors, and showed that only 40-70% of theoretical entropy loss occurs on crystallization. This provides an estimate of the entropy loss to be expected on drug-ligand interaction. Applying this correction to the peptide-vancomycin system led (314) to a more conventional view of the hydrogen bond of between -2 and -8 kJ/mol (0.5-2.0 kcal/mol). Because several of the components in the binding energy estimate are directly related to the degree of order of the system (entropy), simulations in solvent may be necessary to quantitate the degree by which the relative motions of the ligand and
protein are quenched and the restriction on Data Bank, drawing on hundreds or thou-
rotational degrees of freedom upon complex- sands of examples of each interaction type.
ation. Aqvist (316, 317) developed the linear Grzybowski et al. (321) combined a knowl-
interaction energy (LIE) method for calculat- edge-based potential with a Monte Carlo
ing the ligand-binding free energies from mo- growth algorithm that generated a very potent
lecular dynamics simulations. Verkhivker et inhibitor of human carbonic anhydrase (322).
al. (318) developed a hierarchical computa- The resulting equation for all the atom-pair
tional approach to structure and affinity pre- interactions in a protein-ligand complex can
diction in which dynamics is combined with a yield free energies directly, given that solva-
simplified, knowledge-based energy function. tion and entropic terms are treated implicitly.
Despite the focus on short peptides interacting
with the SH2 domain with exhaustive calori- 3.4.4 Simulations and the ~hermodynamic
metric determination of binding entropy, en- Cycle. Given a known structure of a drug-re-
thalpy, and heat capacity changes, the overall ceptor complex with a measured affinity of the
correlation between computed and experimen- ligand, the thermodynamic cycle paradigm al-
td binding affinity remained rather modest.

3.4.2 Binding Energetics and Comparisons. Because of the difficulties in calculating binding free energies (see below), attempts to use ΔH as a means of correlation with binding affinities have often appeared in the literature, sometimes meeting with considerable success. These successes, however, are fortuitous and depend on simplifying assumptions as well as the well-known correlation (319) between ΔH and ΔG, which has been suggested as an unusual property of the solvent water. A similar correlation has been observed in nonaqueous systems and relates to higher entropy loss associated with stronger enthalpic interactions (313). It is a common assumption with congeneric series that the desolvation energies and entropic effects will be approximately the same across members of the series. This, often tacit, assumption may hold for most of the series, but complex formation is dependent on the total energetics of the complex, and what may appear a relatively innocuous change in a substituent may trigger a different binding mode in which the ligand has reoriented. This will likely have an impact on desolvation as well as entropic effects, in that the interactions of the majority of the ligand have changed environment.

3.4.3 Atom-Pair Interaction Potentials. Affinities can be calculated based on ligand-receptor atom-pair interaction potentials that are statistical in nature rather than empirical. Muegge and Martin (320) derived these potentials from crystallographic data in the Protein

... calculation of the difference in affinity (ΔΔG) with a novel ligand. Bash et al. (136) successfully calculated the effect of changing a phosphoramidate group (P-NH) to a phosphate ester (P-O) in transition-state analog inhibitors of thermolysin (Fig. 3.24). The difference in free energy between a benzenesulfonamide and its p-chloro derivative as an inhibitor of carbonic anhydrase has been calculated (323) as well. This is similar to the original application to enzyme-ligand work on benzamidine inhibitors of trypsin, in which the mutation of a proton to a fluorine was calculated (324). Hansen and Kollman (325) calculated differences in the free energy of binding of an inhibitor of adenosine deaminase as one changes a proton to a hydroxyl group by use of a model of the active site. Other examples (326-328) looked at the difference in binding of two stereoisomers of a transition-state inhibitor of HIV protease (Fig. 3.25) and the affinity of DHFR for methotrexate analogs (329). One obvious conclusion can be drawn: successful applications in the literature deal with relatively minor perturbations to a structure where there is less chance that the binding mode might be altered.

There is at least one example in the literature (330) in which the calculated affinity difference did not agree with the experimental data [binding of an antiviral agent to human rhinovirus HRV-14 and to a mutant virus in which a valine was mutated to a leucine (Fig. 3.26)]. Here a β-branched amino acid (Val) was converted into Leu, which lacks the isopropyl side chain adjacent to the peptide backbone besides the addition of a methyl group.
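The relative affinity calculations cited above are usually set up through a thermodynamic cycle: instead of simulating binding directly, one ligand is "mutated" into the other both in the complex and free in solution, and ΔΔG of binding is the difference between the two mutation free energies. The sketch below only illustrates that bookkeeping and the conversion of ΔΔG into a ratio of dissociation constants; the function names and the input numbers are hypothetical and are not taken from any of the cited studies.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)


def relative_binding_free_energy(dg_mut_bound, dg_mut_free):
    """ddG_bind (kcal/mol) for changing ligand 1 into ligand 2.

    Thermodynamic cycle: ddG_bind = dG_bind(2) - dG_bind(1)
                                  = dG_mut(in complex) - dG_mut(in solvent).
    """
    return dg_mut_bound - dg_mut_free


def dissociation_constant_ratio(ddg_bind, temperature=298.15):
    """Kd(ligand 2) / Kd(ligand 1) implied by ddG_bind at the given temperature."""
    return math.exp(ddg_bind / (R_KCAL * temperature))


# Hypothetical mutation free energies, chosen only to exercise the functions.
ddg = relative_binding_free_energy(dg_mut_bound=-3.2, dg_mut_free=-2.7)
ratio = dissociation_constant_ratio(ddg)
print(f"ddG_bind = {ddg:+.2f} kcal/mol, Kd2/Kd1 = {ratio:.2f}")
```

A negative ΔΔG corresponds to a ratio below one, that is, tighter binding of the second ligand. Because RT is only about 0.6 kcal/mol at room temperature, an error of 1 kcal/mol in either leg of the cycle changes the predicted ratio by roughly a factor of five, which is one reason the successful examples involve small perturbations.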
3 Known Receptors
The differences between calculation and experimental data may be related to rotational isomerism of the side chains that can be explicitly included (331). Despite the successful examples of this approach that appear in the literature, there exists a growing healthy skepticism regarding its general application. In a discussion (332) of the application of simulations to prediction of the changes in protein stability attributed to amino acid mutation, problems in adequate sampling, particularly of the unfolded state, as well as difficulties with electrostatics were cited. A review of applications by Kollman (134) cites numerous other examples.

Figure 3.26. Calculated (330) relative affinity of a Sterling-Winthrop antiviral that binds to rhinovirus coat protein (HRV-14) and to the V188L mutant. Biological data indicate that V188L mutation drastically diminishes activity of the antiviral. [Structure drawing not reproduced; panel labels: Valine-188 (HRV-14), Leucine-188 (HRV-14), ΔΔG* = -0.5 kcal/mol.]

3.4.5 Multiple Binding Modes. Realistically, congeneric series that can be a useful construct exist only in the mind of the medicinal chemist. The orientation of the drug in the active site depends on a multitude of interactions, and a minor perturbation in structure can destabilize the predominant binding mode in favor of another. As examples, detailed analyses of the multiple binding modes shown with thyroxine analogs (334) by transthyretin, a transport protein, and enkephalin analogs (335) by an FAB fragment have been made through crystallography. For this reason, the probability of correct answers with thermodynamic integration studies is directly related to the similarity in structure between the ligand of interest and the reference compound. All three-dimensional methods for predicting affinity require a fundamental assumption about the binding mode (in other words, an orientation rule for aligning compounds in the model). Examination of series of ligands binding to the same site usually includes examples of similar compounds that have different binding modes [e.g., the change in orientation (Fig. 3.25) of the C-terminal portion of the Roche HIV protease inhibitor compared with JG-365] (333). Molecular modeling is currently capable of distinguishing correctly in many cases between alternate binding modes of the same ligand. Many components (desolvation, entropy of binding, etc. of the ligand), which cloud the issue of direct calculation of affinities, are constant when comparing binding modes of the same compound and, therefore, do not have to be evaluated. The computational cost of exploring possible binding modes within the active site is nontrivial, however, especially when the protein is capable of reorganizing to expose alternative sites, as was the case for a series of ligands for hemoglobin (181).

In a similar fashion, it is generally assumed from the competitive behavior for binding shown by many agonists and antagonists that they bind at the same site on the receptor (certainly, the simplest hypothesis). Recent studies on G-protein-coupled receptors indicate that agonists and antagonists often have different binding sites, given that mutations in the receptor can affect the binding of one and not the other. An example of such a study on the angiotensin II receptor has been published (336). This story is only beginning to unfold, but appears to be a general phenomenon in G-protein receptors (337, 338). Examples of this phenomenon have been reported with antagonists derived from screening where the structure of antagonist and agonist differ dramatically, but also where the antagonists were obtained by minor structural modification of the natural agonist.

3.5 Protein Structure Prediction

Prediction methods for generating the 3D structure of a protein based on its sequence alone fall into several categories. There are hierarchical methods that predict secondary structures and then attempt to fold those elements together. There are simulation methods that attempt to fold the protein through the use of models of reduced complexity and then refine the prediction by using them to constrain all-atom models. Additionally, there are hybrids of these approaches that rely heavily on heuristics. These methods have been successful in limited cases in the hands of their authors, but have generally been found lacking when tested by others in a more thorough and objective manner. Nevertheless, partial successes indicate that signal has begun to emerge from the smoke and mirrors.
3.5.1 Homology Modeling. Often, the crystal structure of the therapeutic target is not available, but the three-dimensional structure of a homologous protein will have been determined. Depending on the degree of homology between the two proteins, it may be useful to model-build the structure of the unknown protein based on the known structure. Many models (339-341) of the various G-protein-coupled receptors have been built based on homology with bacteriorhodopsin. Models of the three-dimensional structures of human renin (342) and HIV protease (343, 344) were built from crystal structures of homologous aspartyl proteinases as aids to drug design. The known structures of serine proteases have served as templates for models of phospholipase A2 (345) and convertases or subtilases (346). The crystal structure of the MHC class I receptor served to generate a hypothetical model of the foreign antigen-binding site of Class II histocompatibility molecules (347). Models of human cytochrome P450s have been built by homology as well (348).

One of the major difficulties facing construction of such models is the alignment problem, which is compounded by multiple insertions and/or deletions. As the number of known homologous sequences increases, the alignment problem is lessened by consensus criteria. Although the interior core of the proteins is often quite similar, significant alterations can occur on surface loops, and much effort has been expended to fold these loops (123, 349). With regard to the utility of such models in drug design, one can expect that they will prove useful conceptually, but that the molecular details required for optimizing specificity, for example, would be deficient. One tries to exploit the often subtle differences that arise from sequence changes, which are reflected in the three-dimensional structure. Models built by homology would be expected to be weakest in those areas in which sequence differences were greatest.

3.5.2 Inverse Folding and Threading (350-353). This is the ultimate in motif recognition. One makes use of the ever-increasing database of known three-dimensional structures to generate a set of 3D folding motifs for proteins. The sequence of an unknown structure is systematically forced to adopt the coordinates of overlapping segments of the 3D motif and its energy evaluated. In essence, the local multibodied interactions induced by the 3D constraints are evaluated with an empirical pseudopotential that has been calibrated on the PDB database (354, 355) and that is capable of returning a low energy for native sequences compared with scrambled sequences or proteins with other 3D structures. If one cannot discriminate native structures from other folding motifs, then there is little chance that an unknown sequence, which folds in a similar 3D pattern, would be discriminated. The basic assumption is that 3D homology exists between the test sequence and some sequence represented in the motif database. This is not necessarily true, given that as many as 40% of the new structures determined by crystallography have no known 3D homologs. In fact, in an analysis of the genomes of several sequenced microorganisms (356), no more than 12% of the deduced proteins had detectable homology with proteins of known structure. In the CASP competition, however, the most predictive success has been with this approach when a 3D homology existed.

One interesting question that arises is an estimate of the number of protein motifs that exist. One way to approximate this is to assume random sampling of protein motif space and then analyze the frequency of new motifs in new crystal structures, which leads to a number of approximately 1500 folds (357). Of course, such an estimate is always biased by size of protein, ease of crystallization, abundance, and so forth. Lattice approaches give a maximal estimate of 4000 folds (358). Over 1000 protein structures are known with approximately 120 folds (351).

At a more local level, proteins are generated from a set of architectural building blocks: helices, sheets, turns, and so forth. If one can accurately determine the location of these structural elements within a sequence, then the assembly of these components is significantly easier because the degrees of freedom have been drastically reduced. Unfortunately, our ability to accurately determine these elements of secondary structure seems to have peaked at the 75% accuracy level (359, 360).
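The native-versus-scrambled test described for threading potentials can be sketched in a few lines. The example below is only an illustration under stated assumptions: the pair potential is a toy hydrophobic/polar rule invented for this sketch (it is not one of the PDB-calibrated pseudopotentials cited above), and the contact list stands in for a 3D motif. The idea is that a usable potential should score the native sequence on its own motif well below the distribution obtained from shuffled sequences, which is commonly summarized as a Z-score.

```python
import random
import statistics

HYDROPHOBIC = set("AVLIMFWC")


def pair_energy(a, b):
    """Toy pseudopotential (an assumption, not a calibrated PDB potential)."""
    a_h, b_h = a in HYDROPHOBIC, b in HYDROPHOBIC
    if a_h and b_h:
        return -1.0  # hydrophobic-hydrophobic contact is favorable
    if a_h != b_h:
        return 0.5   # hydrophobic-polar contact is penalized
    return 0.0       # polar-polar contact is neutral


def threading_energy(sequence, contacts):
    """Energy of a sequence forced onto a motif given as residue-index contacts."""
    return sum(pair_energy(sequence[i], sequence[j]) for i, j in contacts)


def z_score(sequence, contacts, n_shuffles=500, seed=0):
    """Native energy relative to the energies of scrambled sequences."""
    rng = random.Random(seed)
    native = threading_energy(sequence, contacts)
    residues = list(sequence)
    scrambled = []
    for _ in range(n_shuffles):
        rng.shuffle(residues)  # same composition, random order
        scrambled.append(threading_energy(residues, contacts))
    spread = statistics.pstdev(scrambled)
    return (native - statistics.fmean(scrambled)) / spread if spread > 0 else 0.0


# Hypothetical 12-residue sequence and motif: three leucines form a buried triangle.
sequence = "LSKDLSKDLSKD"
motif_contacts = [(0, 4), (4, 8), (0, 8), (2, 6), (3, 7)]
native_energy = threading_energy(sequence, motif_contacts)
z = z_score(sequence, motif_contacts)
print(native_energy, round(z, 2))
```

A strongly negative Z-score means the potential separates the native sequence from scrambled ones on this motif; a value near zero would indicate, as the text warns, that an unknown sequence adopting the same fold is unlikely to be recognized.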
Molecular Modeling in Drug Design
LINUS. LINUS (Local Independent Nucleating Units of Structure) (361) is an implementation of a hierarchical folding model in which protein sequences are subdivided into overlapping 50-residue fragments to assess the effectiveness of the algorithm in predicting short- and medium-range interactions as well as to limit computational complexity. The algorithms accumulate favorable structures within a sequence window, and repeat the process as the window is allowed to grow over the sequence. Obviously, this is an embodiment of the principle of hierarchical condensation of local initiation of folding. At the beginning, the segment length is six and the starting conformation is set to all extended backbone. Starting at the N-terminus of the segment, three-residue subfragments are perturbed with backbone torsional values from a library to give a trial conformation. If two atoms overlap, the trial conformation is rejected. Otherwise, the energy is evaluated and selection depends on the Metropolis criterion. For each interaction cycle, 6000 iterations of this procedure are performed, 1000 iterations for equilibration and 5000 samples. Conformations of chain segments that give a high frequency in the sample are frozen and the segment size increased. Backbone atoms and highly simplified side chains are used in the simulations. The simplified energy function has a vdW term, a hydrogen-bonding term, and a backbone torsional term.

Given the arbitrary fragmentation of the protein for computational efficiency, the predicted secondary structures were surprisingly accurate for the five cases examined, with helical and sheet boundaries within two residues of their corresponding native structures. Nevertheless, the rms differences were rather large, from 3 to 9 Å. Certainly, these results are quite encouraging and confirm the ideas from studies on lattices by Dill (362) and others that much of the secondary structure is encoded into local patterns of hydrophobic and polar residues.

GEOCORE (363). Amino acids are represented at the united atom level with explicit polar hydrogens with slightly reduced vdW radii. The approach uses a discrete set of φ, ψ values for each residue type: Gly has six, Pro has three, and most others have four or five values. A contact between nonpolar atoms (carbon or sulfur) is worth -0.7 kcal/mol at closest contact and scaled down from there. Buried non-hydrogen-bonding groups get a penalty of 1.5 kcal/mol. Polar conflicts in which two donors or two acceptors are in contact are given a similar penalty. Constraint-based exhaustive search is used (systematic search with limits such that no steric overlap is allowed and that a compact structure is generated), a branch-and-bound method that guarantees that all globally or near-globally optimal conformations will be found, while neglecting less important conformations. The compact structure is guaranteed by a volume constraint about 60% higher than the volume of a native protein of the same size. Side chains are introduced in their most populated rotameric state from the PDB and only changed to an alternate rotamer to avoid a vdW contact. Four proteins were used to test the approach: avian pancreatic polypeptide (1PPT), crambin (1CRN), melittin (2MLT), and apamin (18 residues). Some 190 million conformations were generated for 1PPT, with 8217 having an energy not more than 16 kcal/mol above the optimum found. The conformation with the lowest rms to the native structure was within the 100 lowest energy conformations found, but the true native structure had a lower energy, by use of the same energy function, than that of any conformer found by 3-10%. This implies that the major problem was conformational sampling, not just an oversimplified potential function.

Genetic Algorithm. Le Grand and Merz (364) applied the genetic algorithm to a model of proteins using a rotamer library and the AMBER potential function. In a second study, they used a fragment library and a knowledge-based potential function. Sun (365) used a fragment library consisting of di- to pentapeptides and the Sippl potential. He predicted the structures of melittin, avian pancreatic polypeptide, and apamin (both fragments from apamin and APP were included in the library, so it is not so surprising that the rms agreement for these two was around 1.5 Å). Bowie and Eisenberg (366) used the genetic algorithm with a fragment library of from 9 to 25 residues and their own knowledge-based potential. The fragment most similar to that
of the sequence based on 3D profiles (367) was chosen. They were able to fold 50-residue fragments to within 4.0 Å based on the error in the distance matrix. This avoids the problem of embedding and generating the wrong chirality, which reduces the error estimate.

3.5.3 Contact Matrix. Instead of searching the three-dimensional coordinate space, one can reduce dimensionality by focusing on generating an optimal contact map in 2D (368). The 3D coordinates of a correct contact map can be generated within 1 Å rms for the carbon alphas by distance geometry (369) or other methods (370). By use of the powers of the contact matrix as constraints that limit the contact matrices to compact structures, exploration of various potential interactions between secondary structural elements can be done efficiently. Because of the limited predictive ability of current secondary structure prediction paradigms, a set of plausible inputs to this procedure needs to be generated, and the best structures that are derived evaluated further. This may be an efficient low resolution model builder and have some of the computational advantages of the hydrophobic core constraints used by Dill and coworkers. This approach based on geometrical constraints was originally proposed by Kuntz et al. in 1976 (371). The matrices of residue-residue contacts provide, at the very least, a significant partial solution to the prediction of long-range intersegmental contacts through a formalism explicitly describing the structure and some structure-related properties of a protein globule in terms of matrices of residue-residue contacts without explicit knowledge of secondary structure predictions, although they can be a useful source of constraints. In many ways, the success of this approach verifies the conclusions based on lattice models that secondary structures are implicit in the pattern of hydrophobic and hydrophilic residues and the requirements of compactness. The residue-residue contact matrices have some special properties as mathematical objects that can encode the geometrical requirements of compactness; the knowledge of these allows their treatment, starting with the sequence to generate a contact matrix that is consistent with a compact structure. This is done within the framework of a simple and readily formalized geometric model.

The system of intraglobular residue-residue contacts of a protein of N residues may be represented as an N × N matrix of the carbon-alphas, whose elements are ones (contact) or zeros (lack of contact). Any reasonable definition of contact provides ones in the positions (i, i + 1) that correspond to a peptide bond between two adjacent residues in the sequence. The same is true for the residues corresponding to the pair of cysteines forming a disulfide bond (these data may not be available as input and may be used as a test of correct prediction). This set of contacts describes the sequential covalent topology and is a constant part of the contact matrix, which does not depend on the spatial structure of the polypeptide chain; however, any additional information on existing intraglobular contacts (e.g., from NMR data or disulfide linkage) can easily be introduced in the constant part $A^{0}$ of the contact matrix A:

$$A^{0} = \text{const.} \tag{3.1}$$

The number of contacts involving a given residue, $n_i$ (the coordination number of the ith residue),

$$n_i = \sum_{j} a_{ij} \tag{3.2}$$

are assumed to be approximate constants and are determined by a separate algorithm based on residue type and position in the sequence as well as predicted secondary structure.

A very important condition of spatial consistency of any given contact system is defined by the relation

$$\left(A^{2}\right)_{ij} \geq c \quad \text{wherever } a_{ij} \neq 0 \tag{3.3}$$

In other words, the squared matrix of A should have its elements not less than c at any position where there is a nonzero element in matrix A. More generally, there exists a set of specific constraints regulating the relationships of A with its powers $A^{2}$, $A^{3}$, and so forth. These relations are entirely analogous with those known from graph theory for connectivity (adjacency) matrices. The elements of the squared matrix represent the number of paths of length two, the cubed matrix, the number of paths of length three, and so forth. Finally, an obvious property of matrix A is its symmetry (for all contact definitions considered so far, if the ith residue is in contact with the jth, the jth residue is in contact with the ith, also):

$$a_{ij} = a_{ji} \tag{3.4}$$

Thus, conditions 3.1-3.4 define the set of matrices A that correspond to spatially consistent, compact structures of protein chains. Besides these general conditions, mainly of geometrical origin, any matrix A describing the structure of a real protein molecule should also possess several more specific properties that may be derived from studies of the general properties of protein structures as exemplified in the Brookhaven Protein Databank. The central idea of the approach is to use both the general and specific properties of the contact matrix and its powers for the design of a gain (energy, penalty) function, $\Phi(A)$, so that the task of determining an appropriate intraglobular contact matrix might be formulated as a problem of maximization of $\Phi(A)$,

$$\Phi(A) \rightarrow \max_{A} \tag{3.5}$$

with respect to A under conditions 3.1-3.4. In the simplest and clearest form, $\Phi(A)$ may be expressed in terms of the probabilities of contact between the residues of different types (or groups), $q_{ij}$. The solution of the problem provides the most probable residue-residue contact matrix A,

$$\Phi(A) = \prod_{\text{all contacts}} q_{ij} \rightarrow \max_{A} \tag{3.6}$$

in the sense of the maximum likelihood principle. This condition may be rewritten in the form

$$\Phi'(A) = \sum_{\text{all contacts}} \ln q_{ij},$$

It is clear that proper formulation and parameterization of this problem need the analysis of the voluminous experimental data on protein structure to derive the specific properties to be emulated.

This methodology has been used to predict the structure of loops of helical-bundle proteins, given the positions of the connection to the helices (372). Because of the uncertainties in secondary structure predictions that are used as inputs to constrain the search, any single prediction of the method must be viewed with skepticism. Development of scoring functions that discriminate between alternative models at the Cα level of resolution would complement this approach.

Distance Geometry. Aszodi et al. (373-375) explored the use of distance geometry as the metric for comparative modeling of structures. In the CASP2 target set, the methods generated an overall Cα rmsd of 1.85 Å for glutathione transferase based on close homologs with known structure. It had more difficulty with PNS1 and built models based on two different proteins. The correct fold was not obvious based on the CHARMM energy values for the two models.

Neural Networks. PROBE (376) is an integrated suite of neural network modules that predicts folding motif, secondary structure per residue, location of disulfide bonds, and surface accessibility of each residue. No critical assessment of the accuracy of the results from this package was given in the description, but the package is available for evaluation.

Discrimination Between Folds. Because of the inherent error in potential functions, secondary structure prediction methods, limited sampling, and so forth, one can anticipate that prediction of a variety of alternative structures (perhaps, by several methods) would be more likely to generate a correctly folded structure than any single prediction. The problem then becomes one of discriminating between the correct structure and alternatives that may be very similar in overall quality of fold. Park et al. (377) evaluated the ability of 18 low and medium resolution energy functions to discriminate correct from incorrect folds. Functions that were effective in protein threading were not competitive in discriminating the X-ray structure from ensembles of plausible structures, and vice versa. Obviously, these empirical functions have been derived to optimize their discriminatory abilities for a given problem class and the training (selection) sets were different. In other words, the true physics has not been captured by any of the methods. Crippen (378) also raised serious doubts concerning the ability of "empirical" energy functions to identify correctly folded structures based on studies with simple lattice models. Thomas and Dill (379) described an iterative approach, ENERGI, to generate pairwise residue "energy" scores from the PDB. This is one alternative to the Boltzmann-based pairing frequency analysis used by others (380). The assumption that pairing frequencies are independent is not true based on lattice simulation and, therefore, the underlying assumption of the Boltzmann approach is flawed. The study that used two different sets of proteins to thread was able to classify 88% of 121 proteins having less than 25% homology and no homologs in the training set. The method appears to separate interactive free energies from chain configurational entropies and thus give a more realistic estimate.

4 UNKNOWN RECEPTORS

molecular dynamics and the Monte Carlo method, are not possible. One can only attempt to deduce an operational model of the receptor that gives a consistent explanation of the known data and, ideally, provides predictive value when considering new compounds for synthesis and biological testing. The utility of such an approach has been demonstrated by Bures et al. (235), who used the pharmacophoric pattern derived for the plant hormone auxin to find four novel classes of active compounds by searching a corporate three-dimensional database of structures. In many ways, the approach that has evolved is analogous to the American parlor game of 20 questions, in which the medicinal chemist poses the questions in terms of novel three-dimensional chemical structures and attempts to interpret the response of the receptor in a consistent manner. The underlying hypothesis is a structural complementarity between the receptor and compounds that bind. In the same way that the receptor's existence could be deduced based on pharmacological data, some low resolution three-dimensional schematic of the receptor, at least with regard to the active site or binding pocket, can be deduced by analysis of structure-activity data. It is the purpose of this section to summarize the current approaches in use for receptors of unknown three-dimensional structure and evaluate their utility. For purposes of this section, receptor is often used in a completely generic sense, including enzymes and DNA, for example, as the macromolecular component (i.e., binding site) of recognition of biologically active small molecules.
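Searching a three-dimensional database with a pharmacophoric pattern, as in the auxin example above, amounts to asking whether some conformer of a compound presents the required functional groups at the required mutual distances. The following is a minimal sketch of that test, not the procedure of the cited work: the feature labels, distances, tolerance, and the two "database" entries are all hypothetical, and each compound is represented by a single pre-generated conformer.

```python
from itertools import product
from math import dist


def matches(pharmacophore, features, tol=0.5):
    """True if one feature per pharmacophore point reproduces every distance.

    pharmacophore: {(label_a, label_b): required distance in angstroms}
    features:      [(label, (x, y, z)), ...] for one conformer
    """
    labels = sorted({label for pair in pharmacophore for label in pair})
    candidates = [[xyz for kind, xyz in features if kind == label] for label in labels]
    for combo in product(*candidates):  # try every assignment of atoms to points
        placed = dict(zip(labels, combo))
        if all(abs(dist(placed[a], placed[b]) - d) <= tol
               for (a, b), d in pharmacophore.items()):
            return True
    return False


# Hypothetical three-point query (distances in angstroms).
query = {
    ("acid", "aromatic"): 5.0,
    ("aromatic", "hydrophobe"): 3.0,
    ("acid", "hydrophobe"): 8.0,
}

# Hypothetical database: one conformer per compound.
database = {
    "compound_1": [("acid", (0.1, 0.0, 0.0)),
                   ("aromatic", (5.0, 0.2, 0.0)),
                   ("hydrophobe", (8.1, 0.0, 0.0))],
    "compound_2": [("acid", (0.0, 0.0, 0.0)),
                   ("aromatic", (2.0, 0.0, 0.0)),
                   ("hydrophobe", (8.0, 0.0, 0.0))],
}

hits = [name for name, feats in database.items() if matches(query, feats)]
print(hits)
```

A practical search would also loop over many conformers per compound and use richer feature definitions; the distance-only test shown here cannot distinguish mirror-image arrangements of four or more points.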
Figure 3.27. (a) Pharmacophore hypothesis with correspondence of functional groups in drugs, A = A', B = B', C = C'. (b) Binding-site hypothesis by use of drugs with hypothetical binding sites attached (X, Y, and Z overlap).

tion, by chemical modification and biological testing, of the relative importance of different functional groups in the drug to receptor recognition. This can give some indication of the nature of the functional groups in the receptor that are responsible for binding of the set of drugs. Second, a hypothesis is proposed (Fig. 3.27) concerning correspondence, either between functional groups (pharmacophore) in different congeneric series of the drug or between recognition site points postulated to exist within the receptor (binding-site model).

The intellectual framework for use of structure-activity data to extrapolate information regarding the ligand's partner, the receptor, is the concept of the pharmacophore. The pharmacophore, a concept introduced by Ehrlich at the turn of the 20th century, is the critical three-dimensional arrangement of molecular fragments (or distribution of electron density) that is recognized by the receptor and, in the case of agonists, that causes subsequent activation of the receptor upon binding. In other words, some parts of the molecule are essential for interaction, and they must be capable of assuming a particular three-dimensional pattern that is complementary to the receptor to interact favorably. One corollary of the pharmacophoric concept is the ability to replace the chemical scaffold holding the pharmacophoric groups with retention of activity. This is the basis of the current activity (381, 382) in peptidomimetics, in which the amide backbone of peptides has been replaced by sugar rings, steroids (383, 384), benzodiazepines (385), or carbocycles (386, 387) (Fig. 3.28). In the pharmacophoric hypothesis, physical overlap of similar functional groups is assumed; that is, the carboxyl group from compound A physically overlaps with the corresponding carboxyl group from compound B and with the bioisosteric tetrazole ring of compound C.

Figure 3.28. Peptidomimetics that have been designed based on iterative introduction of constraints into parent peptide and hypotheses concerning receptor-bound conformation. Enkephalin mimetic (388), RGD platelet GPIIb/IIIa receptor antagonists (384, 385), thyroliberin [TRH (387)], and somatostatin (383, 389). For an overview of recent approaches to peptidomimetic design, see the review by Bursavich and Rich (382). [Structure drawings not reproduced; parent peptides shown: enkephalin, Tyr-Gly-Gly-Phe-Leu-OH; somatostatin, H2N-Ala-Gly-Cys-Lys-Asn-Phe-Phe-Trp-Lys-Thr-Phe-Thr-Ser-Cys-OH.]

One caveat that must be remembered is the probability of alternate, or multiple, binding modes. The interaction of a ligand with a binding site depends on the free energy of binding, a complex interaction with both entropic and enthalpic components. Simple modifications in structure may favor one of several nearly energetically equivalent modes of interaction with the receptor, and change the correspondence between functional groups that has previously been assumed and supported by experimental data. Changes in binding mode of an antibody FAB fragment to progesterone and its analogs have been shown by crystallography (390, 391) of the complexes. For this reason, analysis of agonists as a class is usually preferred, given that the necessity to both bind and trigger a subsequent transduction event is more restrictive than the simple requirement for binding shared by antagonists (336). Compounds that clearly are inconsistent with models derived from large amounts of structure-activity data may be indicative of such changes in binding mode, and may require a separate structure-activity study to characterize their interaction. Despite its limitations, the pharmacophore approach is often the most appropriate because of lack of detailed information regarding the receptor and can yield useful insights, as seen in the case of clinical success with tyrosine kinase inhibitors (392, 393) and other recent examples (394).

4.1.2 Binding-Site Models. One major deficiency in the approach described above is the requirement for overlap of functional groups in accord with the pharmacophoric hypothesis. Although it is true that molecules having functional groups that show three-dimensional correspondence can interact with the same site, it is also true that a particular geometry associated with one site is capable of interacting with equal affinity with a variety of orientations of the same functional groups. One has only to consider the cone of nearly equal energetic arrangements of a hydrogen-bond donor and acceptor to realize the problem. Sufficient examples from crystal structures of drug-enzyme complexes and from theoretical simulation of binding compel the realization that the pharmacophore is a limiting assumption. Clearly, the observed binding mode in a complex represents the optimal position of the ligand in an asymmetric force field created by the receptor that is subject to perturbation from solvation and entropic considerations. Less restrictive is the assumption that the receptor-binding site remains relatively fixed in geometry when binding the series of compounds under study. Experimental support for such a hypothesis can be found in crystal structures of enzyme-inhibitor complexes, where the enzyme presents essentially the same conformation, despite large variations in inhibitor structures; studies of HIV-1 protease complexed with diverse inhibitors support this view (171).

In recent years, therefore, there has been an increasing effort to focus on the groups of the receptor that interact with ligands as being the common features for recognition of a set of analogs. When pharmacophore and binding-site hypotheses are compared, the binding-site model is physicochemically more plausible, in that overlap of functional groups in binding to a receptor is more restrictive than assuming the site remains relatively fixed when binding different ligands. However, the number of degrees of freedom in binding-site hypotheses, represented by the necessary addition of virtual bonds between groups A and X, B and Y, and C and Z in Fig. 3.27, is greater. Additional degrees of freedom complicate subsequent conformational analyses and may preclude any conclusions unless a sufficiently diverse set of compounds is available.

Other approaches to this problem have emphasized comparison of molecular properties rather than atom correspondences. Kato et al. (395) developed a program that allows construction of a receptor cavity around a molecule emphasizing the electrostatic and hydrogen-bonding capabilities. Other molecules can then be fit within the cavity to align them. This is similar in concept to the field-fit techniques available in the CoMFA module of SYBYL, in which the molecular field (electrostatic and steric) surrounding a selected molecule becomes the objective criterion for alignment of subsequent molecules for analysis. An example emphasizing molecular properties in pharmacophoric analysis was given by Moos et al. (396) on inhibitors of cAMP phosphodiesterase II.

4.1.3 Molecular Extensions. If we assume the binding-site points remain fixed and can augment our drug with appropriate molecular extensions that include the binding site (i.e., a hydrogen-bond donor correctly positioned next to an acceptor), we can then examine the set of possible geometrical orientations of site points to see whether one is capable of binding all the ligands. Here, the basic assumption of rigid site points is more reasonable, at least for enzymes that have evolved to catalyze reactions and must, therefore, position critical groups in a specific three-dimensional arrangement to create the correct electronic environment for catalysis. The program checks
4 Unknown Receptors
Figure 3.31. Compounds from different chemical classes of ACE inhibitors used in active-site analysis. Used with permission (397).
2. The compound must be capable of assuming a conformation that will present the pharmacophoric or binding-site pattern complementary to that of the receptor.
3. The compound must not compete with the receptor for space while presenting the pharmacophoric or binding-site pattern.
Once these conditions are met, we can attempt to deal with the potency, or binding affinity. This belongs to the domain of three-dimensional quantitative structure-activity relationships (3D-QSARs) (400) and we illustrate the use of a particular variant, CoMFA (187, 401), on ACE inhibitors at the end of this chapter. Condition 3 allows us to utilize compounds capable of presenting the pharmacophoric pattern, but incapable of binding, to help determine the location of receptor-occupied space in relation to the pharmacophore (receptor-mapping) (402). This allows a crude, low resolution map of the position of the receptor relative to the pharmacophoric elements and indicates in which directions chemical modifications may be productive.
The number and diversity of compounds
Molecular Modeling in Drug Design
Figure 3.32. Change in OMAP (projection of three of the five dimensions) as new compounds were
introduced to analysis of ACE inhibitors (397). Left is original OMAP of compound 1 (Fig. 3.30). Right
is OMAP after completion of analysis.
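The narrowing of an orientation map (OMAP) as compounds are added can be pictured as a set intersection over distance patterns. The following is a minimal sketch under stated assumptions, not the published implementation: each OMAP point is taken to be a tuple of distances between postulated pharmacophoric groups, binned at a fixed resolution so that nearby patterns compare equal. The function names, the 0.5 Å bin width, and the distance values are hypothetical and chosen only for illustration.

```python
from typing import Iterable, List, Set, Tuple

Pattern = Tuple[int, ...]


def bin_pattern(distances: Iterable[float], width: float = 0.5) -> Pattern:
    """Convert a tuple of distances (angstroms) into integer bin indices."""
    return tuple(int(d // width) for d in distances)


def build_omap(conformer_distances: Iterable[Iterable[float]],
               width: float = 0.5) -> Set[Pattern]:
    """Collect the binned distance patterns of all allowed conformers."""
    return {bin_pattern(d, width) for d in conformer_distances}


def intersect_omaps(omaps: List[Set[Pattern]]) -> Set[Pattern]:
    """Return the patterns that every molecule is able to present."""
    if not omaps:
        return set()
    common = set(omaps[0])
    for omap in omaps[1:]:
        common &= omap
    return common


# Hypothetical distance triples for the allowed conformers of two molecules.
omap_1 = build_omap([(4.1, 5.2, 6.8), (3.0, 5.3, 7.4), (4.9, 6.1, 6.6)])
omap_2 = build_omap([(4.3, 5.4, 6.6), (2.2, 4.0, 5.0)])
common = intersect_omaps([omap_1, omap_2])
print(common)
```

Each additional molecule can only keep or shrink the common set, which is the behavior the two panels of the figure illustrate. Fixed-width binning is a simplification: two patterns that straddle a bin edge are treated as different even when they are close.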
available for analysis determine the methodology to be used. If there is a limited data set, then the pharmacophoric approach should be assessed first because of its fewer degrees of freedom. If no pharmacophoric patterns are consistent with the set of analogs, then introduction of logical molecular extensions to enable the active-site approach is warranted. Operationally, one first determines the set of potential pharmacophoric patterns consistent with the set of active analogs [leading to its name of Active Analog Approach (398)]. If there are sufficient data, then a unique pharmacophore, or active-site model, may be identifiable. The basic assumption behind efforts to infer properties of the receptor from a study of structure-activity relations of drugs that bind is the idea of complementarity. It follows that the stronger the binding affinity, the more likely that the drug fits the receptor cavity and aligns those functional groups that have specific interactions in a way complementary to those of the receptor itself. Certainly, our understanding of intermolecular interactions from studies of known complexes does not dissuade us from this notion, but may make us somewhat skeptical of the naive models that often result from such efforts. Andrews et al. (403) reviewed efforts of this type with regard to CNS drugs.
Clearly, the key to insight relies on chemical modification to determine the relative importance of functional groups for molecular recognition. Often more subtle effects than the simple presence or absence of a group are important and then comparison of molecular properties becomes of interest. A major impediment to analysis is the definition of a common frame of reference by which to align molecules for comparison. This is equivalent to solving the three-dimensional pharmacophoric pattern, and implies that one has distinguished those properties of the molecules under consideration in a manner similar to the receptor. Initial efforts to rationalize structure-activity relationships (SARs) among noncongeneric systems were hampered by an "RMS mentality," that is, a point of view that required atomic centers to align rather than overlap of sterically and electronically similar groupings of atoms. An example would be requiring the six atoms of aromatic benzene rings to overlap at each of the six atoms of the ring vertices rather than simple requirements for coincidence and coplanarity that would recognize the torus of electron density that the rings share in common (Fig. 3.33). In congeneric series, the difficulty in assignment of correspondence is less (nonexistent by definition). This allows a variety of approaches, including those based on molecular graph theory (404-407), to detect similarities between molecules that can form the basis of a correlation analysis. Extrapolation outside of the group of congenerically related compounds on which the analysis was based would appear difficult, if not impossible.

Figure 3.33. Torus of electron density representing benzene ring. Atom-to-atom correspondences of ring atoms used in normal fitting routines lead to overconstrained fits.

Although it is simpler to start an analysis with a congeneric series to identify the recognition elements, diversity in chemical structures implies more information regarding the conformational requirements of the system. A congeneric series requires that the basic chemical framework of the molecule remains constant and that groups on the periphery are either modified (e.g., aromatic substitution) or substituted (e.g., tetrazole for carboxyl functional group). Implicit in this concept is the notion that the compounds bind to the receptor in a similar fashion and, therefore, the changes are localized and comparable for each position of modification. Introduction of degrees of freedom in the substituents as well as consideration of differences in properties that are conformationally dependent, such as the electric field, require conformational analysis in an effort to determine the relevant conformation for comparison.
The problem can be divided into two: what are the aspects of the molecules that are in common and that may provide the basis for molecular recognition, and which conformation for each molecule is appropriate to consider. For the first problem, studies on a congeneric series can often yield valuable insight. For determination of the three-dimensional arrangement of the crucial recognition elements, diversity in the chemical scaffolds imposes different constraints on possible three-dimensional patterns and generates an opportunity for determining a unique solution.

4.2 Searching for Similarity

4.2.1 Simple Comparisons. To gain insight into molecular recognition, subtle differences in molecules must be perceived. Comparisons can be divided into two categories: those that are independent of the orientation and position of the molecule and those that depend on a known frame of reference. Simple comparisons deal with properties independent of a reference frame. For example, the magnitude of the dipole moment is frame independent, but the dipole itself is a vectorial quantity dependent on the orientation and conformation of the molecule. Similarly, the bond lengths, valence angles and torsion angles, and interatomic distances are independent of orientation. The distance matrix, composed of the set of interatomic distances (Fig. 3.34), is a convenient representation of molecular structure that is invariant to rotation and translation of the molecule, but which reflects changes in internal degrees of freedom. The distance range matrix is an extension (Fig. 3.34) that has two values for each interatomic distance
Figure 3.35. Distance range matrices used for illustration of analysis of muscarinic receptors (398). Used with permission.
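The invariance of the distance matrix to rotation and translation, and its sensitivity to internal changes, can be checked numerically. The sketch below is illustrative only: the coordinates are invented, the rotation is restricted to the z axis for brevity, and the distance range matrix is assumed here to hold the lower and upper value of each interatomic distance over a set of conformers.

```python
import math


def distance_matrix(coords):
    """Full matrix of interatomic distances for a list of (x, y, z) tuples."""
    n = len(coords)
    return [[math.dist(coords[i], coords[j]) for j in range(n)]
            for i in range(n)]


def rotate_z_and_translate(coords, angle_deg, shift):
    """Rigid-body move: rotate about the z axis, then translate."""
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    return [(c * x - s * y + shift[0], s * x + c * y + shift[1], z + shift[2])
            for x, y, z in coords]


def matrices_equal(m1, m2, tol=1e-9):
    """Element-wise comparison within a tolerance."""
    return all(abs(a - b) <= tol
               for row1, row2 in zip(m1, m2)
               for a, b in zip(row1, row2))


def distance_range_matrix(conformers):
    """(lower, upper) value of each interatomic distance over conformers."""
    mats = [distance_matrix(c) for c in conformers]
    n = len(mats[0])
    return [[(min(m[i][j] for m in mats), max(m[i][j] for m in mats))
             for j in range(n)] for i in range(n)]


# Hypothetical four-atom fragment (angstroms).
conformer_a = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0),
               (2.0, 1.4, 0.0), (3.5, 1.4, 0.5)]
# Same conformation, different position and orientation.
moved = rotate_z_and_translate(conformer_a, 40.0, (3.0, -2.0, 1.0))
# Different conformation: only the last atom has been displaced.
conformer_b = conformer_a[:3] + [(3.2, 2.0, -0.8)]

same_after_move = matrices_equal(distance_matrix(conformer_a),
                                 distance_matrix(moved))
same_after_change = matrices_equal(distance_matrix(conformer_a),
                                   distance_matrix(conformer_b))
ranges = distance_range_matrix([conformer_a, conformer_b])
print(same_after_move, same_after_change, ranges[0][3])
```

The rigid-body move leaves every entry unchanged, whereas displacing one atom changes the entries in its row and column, which then appear as a nonzero interval in the range matrix.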
molecules has been the sheer volume of information produced. The traditional means of displaying such large amounts of data has been to display the electrostatic potential around a molecule as a two-dimensional contour map. The advent of computer graphics techniques has improved the situation by allowing three-dimensional contour maps to be displayed in color on the graphics screen and manipulated in real time along with a display of the molecule itself. An alternative mode for displaying molecular electrostatic potentials is to employ a dotted surface representation, with the dots taking on an appropriate color according to the electrostatic potential value at the relevant location. Such techniques were

In a similar procedure to that described for the display of electrostatic potential, Cohen and colleagues developed a technique whereby the steric field surrounding a molecule can be displayed on a graphics screen as a three-dimensional isopotential contour map (415). The map is generated by calculating the VDW interaction energy between the molecule and a probe atom or molecule placed at varying points around the molecule of interest. This interaction energy is then contoured at specific levels to give the most stable VDW contour lines around the molecule, that is, the contour that represents the most favorable steric position for the probe as it is moved around the target.
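The grid calculation behind such a steric map can be sketched in a few lines. This is a minimal illustration, not the method of reference 415: it assumes a 12-6 Lennard-Jones form for the VDW term, and the atom radii, well depths, probe parameters, and grid spacing are all hypothetical values.

```python
import math


def lj_energy(r, r_min, eps):
    """12-6 Lennard-Jones energy with its minimum (-eps) at r = r_min."""
    x = (r_min / r) ** 6
    return eps * (x * x - 2.0 * x)


def probe_energy(point, atoms, probe_radius=1.7, probe_eps=0.1):
    """Sum of probe-atom VDW energies at one grid point.

    atoms: list of (x, y, z, vdw_radius, well_depth) tuples.
    """
    total = 0.0
    for x, y, z, radius, eps in atoms:
        r = math.dist(point, (x, y, z))
        if r < 1e-6:
            return math.inf  # probe sits on top of an atom
        total += lj_energy(r, radius + probe_radius,
                           math.sqrt(eps * probe_eps))
    return total


def steric_field(atoms, lo, hi, step):
    """Probe energy at every point of a cubic grid from lo to hi."""
    n = int(round((hi - lo) / step)) + 1
    axis = [lo + i * step for i in range(n)]
    return {(x, y, z): probe_energy((x, y, z), atoms)
            for x in axis for y in axis for z in axis}


# Hypothetical two-atom "molecule" (coordinates and radii in angstroms).
atoms = [(0.0, 0.0, 0.0, 1.7, 0.1), (1.5, 0.0, 0.0, 1.7, 0.1)]
field = steric_field(atoms, -6.0, 6.0, 1.0)

# Most favorable probe position, and the points inside a chosen contour level.
best_point = min(field, key=field.get)
contour_level = 0.5 * field[best_point]
favorable = [p for p, e in field.items() if e <= contour_level]
print(best_point, len(favorable))
```

Contouring in a graphics program would interpolate between grid points; here the "contour" is reduced to the list of grid points whose energy lies at or below the chosen level.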
Figure 3.37. Calculation of electrostatic and VDW fields surrounding a series of molecules in defined orientations are used as a basis for 3D QSAR correlations in CoMFA (187, 401). Used with permission.
Figure labels: Lattice; PLS; Equation.
$\mathrm{Bio} = y + a \times S_{001} + b \times S_{002} + \cdots + m \times S_{998} + n \times E_{001}$
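The equation in the figure is a linear model over field values sampled at lattice points. The sketch below shows only how such an equation is evaluated once coefficients are available; it does not perform the PLS fit, and the field values, coefficients, and intercept are invented for illustration. The column naming (S001, E001, and so on) follows the figure; the helper names are hypothetical.

```python
def descriptor_row(steric, electrostatic):
    """Name the field values at each lattice point: S001.., then E001.."""
    row = {}
    for i, value in enumerate(steric, start=1):
        row[f"S{i:03d}"] = value
    for i, value in enumerate(electrostatic, start=1):
        row[f"E{i:03d}"] = value
    return row


def predict_bio(row, intercept, coefficients):
    """Evaluate Bio = intercept + sum(coefficient * field value)."""
    return intercept + sum(coefficients.get(name, 0.0) * value
                           for name, value in row.items())


# Hypothetical field values at three lattice points for one molecule.
row = descriptor_row(steric=[0.2, -0.1, 0.0],
                     electrostatic=[1.0, -0.5, 0.3])
# Hypothetical coefficients, standing in for the output of a PLS analysis.
coefficients = {"S001": 2.0, "S002": -1.0, "E001": 0.5}
predicted = predict_bio(row, intercept=5.0, coefficients=coefficients)
print(predicted)
```

In practice there are far more lattice points than compounds, which is why a latent-variable method such as PLS, rather than ordinary least squares, is used to obtain the coefficients.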
Figure 3.38. Construction of dummy vector perpendicular to plane of aromatic ring at centroid that allows superposition and coincidence of aromatic rings by fitting endpoints (Du) of dummy vector without requiring superposition of ring atoms.

Atom          Temperature factor   Atomic number
Carbon        60                   25
Nitrogen      55                   25
Oxygen        50                   25
Sulfur        67                   35
Phosphorus    70                   35
Hydrogen      40                   15
Bromine       65                   50
Chlorine      60                   35
Fluorine
Iodine
Sodium
Potassium
Calcium
Lithium
Aluminum
Silicon

Figure 3.39. Set of parameters to generate pseudo-electron density maps of molecules that can be contoured to approximately represent VDW surface (Ho and Marshall, unpublished).

along coordinate axes can be used, or the molecules can be successively fit to one that is used as the standard orientation. Danziger and Dean (422) described an approach that will find geometric similarities in positions of hydrogen-bonded atoms between two molecules. Least-squares-fitting procedures for designated atoms allow selectivity in orienting the molecules with predetermined conformations in the most appropriate manner. Kearsley (423) described an efficient method for fitting a series of molecules when atom-atom associations have been previously defined between members of the series. In some cases, the use of dummy atoms allows geometric superposition of groups such as aromatic rings without requiring superposition of the atoms composing the ring. By defining the centroid of the ring and erecting a normal to the plane of the ring, the dummy atom at the end of the normal and the centroid dummy atom can be used to superimpose the ring on another ring with similar dummy atoms (Fig. 3.38). This method leads to coincidence and coplanarity of the two ring systems without requiring the atoms composing the rings to be coincident. In other words, the rings can be viewed as two toruses of electron density without overemphasizing the positions of the atomic nuclei. In numerous studies [see review by Andrews et al. (403)] of biogenic amine ligands, this method of comparison of the aromatic ring components is essential to allow alignment of the nitrogens.

4.3.1 Volume Mapping. One method of displaying molecular surfaces that retains the ability to transform the display interactively has been developed by Marshall and Barry (424). The procedure involves computing a molecular pseudo-electron density map on a three-dimensional grid that surrounds the molecule whose atoms are replaced by dummy Gaussian atoms. Atom types are characterized by a half-width and an integrated density, chosen so that the Gaussians have a fixed value at a distance equal to the VDW radius (Fig. 3.39). Such density maps may be contoured in three dimensions to provide a chicken wire-like envelope around the molecule that corresponds to the van der Waals surface.
A concomitant benefit of this technique is that estimates of the molecular surface area and volume are generated as by-products of the contouring routines, whether the surface is being drawn around one or several molecules. Additionally, the generated surfaces and volumes are readily susceptible to logical operations, such as union, intersection, or subtraction, enabling the rapid determination of, for example, union or difference volumes among a series of molecules.
Once one has fixed the molecules in a common frame of reference, then comparison by a variety of techniques becomes feasible. As an example, difference in volume may be important in understanding the lack of seen activity in compounds that appear to possess all the
prerequisites for activity seen in others in the series. In a congeneric series, a significant portion of the molecular structure is common to the molecules under comparison. This common volume that is shared logically should not contribute to differences in activity. By subtraction of the volume shared by two molecules, one obtains a difference map in which the volume occupied by one molecule and not the other remains (398). Correlations between the shared volume and the biological activity of a congeneric series of inhibitors of DHFR have been shown by Hopfinger (425). Simon and his colleagues (426) emphasized the use of both overlapping volume and nonoverlapping volume in QSAR studies in a quantitative methodology, the minimal steric difference, or MTD method. This approach has been enhanced to allow comparison of low energy conformers of each molecule and use of those that are sterically most similar. An application to substrates of acetylcholinesterase illustrates this facility (427).

4.3.2 Field Effects. Once the frame of reference has been established, other properties of molecules, such as the electrostatic field, can be compared as well. Because the electrostatic properties can be sampled on a grid, differences between the values of two molecules can be calculated and a difference map contoured. Such difference maps (428) highlight more clearly the similarities and differences between molecules. Hopfinger (429) integrated the difference between potential fields and showed this parameter to be useful in QSAR studies.
An approach to statistically quantifying the similarity between two molecular electrostatic potential surfaces was developed by Dean and coworkers (430, 431) and by Richards and coworkers (215). Here, the previously determined molecular electrostatic potential surfaces are projected outward onto surrounding spheres that provide a common surface of reference, and then statistical analyses are performed over the points on this common surface in an attempt to quantify the similarities or differences between the two molecules under consideration. Burt and Richards (432) introduced flexibility in the comparison of molecules based on their electrostatic potential fields.

4.3.3 Directionality. If one is comparing molecules that share interaction at a common site on a biological macromolecule, it is logical to assume that they may do so by interacting with similar sites in the receptor with optimal interaction shown by molecules with correctly oriented functional groups. If one does not have a three-dimensional model of the receptor from which to deduce potential interactive sites, then one can only attempt to deduce the potential interactive receptor-subsites by examination of the molecules that interact with them. Systematically, one can vary the conformation of a molecule and record the relative orientation of groups postulated, or shown experimentally, to play a dominant role in intermolecular interactions. In this way, one can map out the directionality of interactions of each functional group of the ligand in a common frame of reference. Comparison of these maps can often lead to hypotheses regarding pharmacophoric groups and their correspondence between molecules.

4.3.4 Locus Maps. One can generate a locus plot in coordinate space showing all the potential locations of one group relative to another by fixing one group in a particular orientation as a frame of reference and recording all possible coordinates of the other. An example would be the relative positions of the basic nitrogen to the aromatic ring in compounds such as dopamine interacting with biogenic amine receptors. One must choose the common fragment (in the example, the aromatic ring) of each molecule and its orientation to generate a similar frame of reference, so that the locus of positions of the atom (the basic nitrogen) leads to a meaningful comparison across a series of molecules (Fig. 3.40).

4.3.5 Vector Maps and Conformational Mimicry. Often, one is more interested in assessing the directionality of potential interaction rather than simply looking for overlap of atoms such as the basic nitrogen. In this case, for example, one is interested in determining both the locus of the lone pair of the nitrogen
rigid and mobile domains. In general, the difficulties with most methods are similar to those seen with minimization procedures. If one is in the area of the global minimum, then one is likely to converge to that solution. Otherwise, one will be trapped in some local minimum. In contrast, systematic search methods are algorithmic, so that all sterically allowed conformations are generated at the selected torsional grid parameters. Systematic search methods, therefore, do not have problems in sampling and are path independent, but are combinatorial in complexity, which may limit the fineness of the sample grid and thus compromise the results. Only in small systems such as cycloalkane rings (121) and small peptides (90, 436) have the potential energy hypersurfaces been mapped.

4.4.1 Constrained Minimization. In cases where one has internal degrees of freedom, besides the six associated with position and orientation, the use of constrained minimization procedures becomes a useful technique. Often the standard molecule for comparison has a fixed conformation and the molecule to be fitted has internal degrees of freedom. Several groups have published methods for dealing with this problem. In case one has simultaneous degrees of freedom in both the molecule to be fitted and the target, a different approach with simultaneous minimization of all variables is recommended (Fig. 3.43).
The combination of molecular mechanics with flexible minimization routines allows penalty functions to be assigned to force geometrical correspondence of groups, whereas individual molecules have their internal energy evaluated, but are invisible to the other molecules under consideration. A program has been described (437) with this capability and its use illustrated on histamine antagonists by Naruto et al. (438). Template forcing allows one molecule to be set up as a template and another molecule to be constrained to overlap in a specified manner. The strain energy involved in forcing correspondence gives an upper-bound estimate of the distortion energy required, given that the results depend on the initial-problem definition.
An alternative approach uses the distance geometry paradigm, in which all the constraints are combined to form the distance matrix from which energetically feasible conformations of the set of molecules are sought mathematically. Sheridan et al. (439) demonstrated this approach on acetylcholine analogs that are muscarinic agonists. Both of these approaches ask the same question and suffer from the same limitations, and differ only in computational technique. Each suffers from the local minima problem, in that each uses a
minimization technique, and the results will be dependent on the starting geometries of the initial set of molecules. Both have the advantage that the unique constraints imposed by particular molecules enter consideration at an early stage and minimize comparison of conformations.
Another variant recently reported by Hodgkin et al. (440) uses a Monte Carlo search procedure to generate candidate pharmacophoric patterns. A reduced force-field parameter set is used initially to lower energy barriers between conformations to ensure greater configurational sampling. Candidate pharmacophores are then refined to produce low energy conformations of molecules overlaid in a common binding mode. Application to antagonists of the human platelet-activating factor led to a consistent binding model for a set of five diverse structures when active-site hydrogen-bonding groups were postulated. Barakat and Dean (441, 442) utilized simulated annealing to optimize structure matching by minimizing the difference matrix between the two molecules. A somewhat similar approach is that of Perkins and Dean (443), who used simulated annealing to search conformational space followed by cluster analysis for each molecule, with subsequent comparison of a small number of diverse conformers between different molecules.

4.4.2 Systematic Search and the Active Analog Approach. Once the existence of a common pattern has been determined, then the issue of uniqueness needs to be addressed. The Active Analog Approach (398) uses a systematic search to generate the set of sterically allowed conformations based on a grid search of the torsional variables at a given angular increment. For each sterically allowed conformation, a set of distances between the postulated pharmacophoric groups is measured. The set of distances, each of which represents a unique pharmacophoric pattern, constitutes an OMAP. Each point of the OMAP is simply a submatrix of the distance matrix and, as such, is invariant to global translation and rotation of the molecule. If the initial assumption is valid, that the same binding mode of interaction, or pharmacophoric pattern, is common to the set of molecules under consideration, then the OMAP for each active molecule must contain the pattern encrypted in the set of distances. By logically intersecting the set of OMAPs, one can determine which patterns are common to all molecules (444). In other words, all potential pharmacophoric patterns consistent with the activity of the set of molecules can be found by this simple manipulation of OMAPs, and the question of uniqueness addressed directly (Fig. 3.44).

Figure 3.44. OMAPs generated for two molecules can be logically intersected to determine which three-dimensional patterns are common. (Figure labels: Molecule 1; Molecule 2; 3 potential pharmacophoric areas.)

A good example is the work of Nelson et al. (445) on the receptor-bound conformation of morphiceptin. Based on structure-activity data, the tyramine portion and phenyl ring of residue three of morphiceptin, Tyr-Pro-Phe-Pro-NH2, were postulated to be the pharmacophoric groups responsible for recognition and activation of the opioid μ-receptor. It was assumed further that the aromatic rings bound to the receptor in the different analogs were coincident and coplanar. A series of active analogs with a variety of conformationally constrained amino acid analogs in positions two and three were analyzed. A unique conformation was found for the two most constrained analogs that allowed overlap of the Phe and Tyr portions of the molecules (Fig. 3.45). In this case, a five-dimensional orientation map with distances between the nitrogen and normals to the two aromatic rings was used in the analysis.

Figure 3.45. Conformations of two constrained analogs of morphiceptin in which aromatic rings of Tyr1 and Phe3 are overlapped (445).

The Active Analog Approach (Fig. 3.46) is appropriate for the unknown receptor problem, given that no objective criteria function, such as potential energy, can be used a priori in the absence of information regarding the receptor. Adequate sampling of the potential surface to ensure that the complete set of local minima is found is still problematic because of the phenomenon known as "grid tyranny." This relates to the fact that the combinatorial explosion that results by decreasing the increment of the torsion angles scanned limits one to a finite increment for a given problem, say, 10° for a seven-rotatable bond problem. Because the energetics of the system is very sensitive to interatomic distances, a conformation generated at the 10° increment may be sterically disallowed, but very close to a minimum. Relaxation of the structure might find the relevant conformation, for example, by allowing a torsional angle to vary by 1°. Improvements in algorithms described in the following section have helped to overcome this problem.

4.4.3 Strategic Reductions of Computational Complexity. Logically, the Active Analog Approach can be conceived as sequentially determining all the sterically allowed conformations for each molecule under consideration, generation of an OMAP from those conformations, and logical intersection of the OMAPs to determine the common pharmacophoric patterns. A simple analysis will easily convince one that this is not feasible because of the computational complexity of the problem. For example, the set of 28 ACE inhibitors (Fig. 3.31), analyzed by Mayer et al. (397), have a total of 163 torsional degrees of freedom that have to be explored to find a common pattern, as seen in Table 3.1. If we were to determine all possible conformations for each molecule at 10° torsional scan, the scan parameter (s) = 10° and the number of torsional increments r = 360°/s, or 36. For each molecule with n torsional degrees of freedom, there are $r^n$ possibilities to be examined. For the set of molecules there are $(6 \times 36^3) + (7 \times 36^5) + (3 \times 36^6) + (5 \times 36^7) + (6 \times 36^8) + (1 \times 36^9)$ possible conformations to be generated and examined. If one compares each conformation of each molecule with all the conformations of the other molecules to find possible correspondences, the combinatorials of the problem explode and one reaches the same level of complexity as a complete conformational search of a peptide of 30 residues at a 10° scan (not currently feasible).
One is not interested in the conformational
Figure 3.46. The flow of information in the Active Analog Approach (111, 399). Sterically allowed conformations (represented by filled circles on the φ1, φ2 torsional grid) of a molecule are determined and the distances (d1, d2, etc.) between pharmacophore elements are recorded for each. The resulting OMAP is used to constrain the next molecule in the series. Ideally, once all of the molecules have been evaluated, only a single point or cluster of points remains in the OMAP.
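The size of the exhaustive search quoted in Section 4.4.3 for the 28 ACE inhibitors can be checked directly from the counts in Table 3.1. The short calculation below is only a worked example of that arithmetic; the variable names are illustrative.

```python
# Number of molecules for each count of torsional degrees of freedom (Table 3.1).
molecules_by_dof = {3: 6, 5: 7, 6: 3, 7: 5, 8: 6, 9: 1}

scan_increment_deg = 10
r = 360 // scan_increment_deg  # torsional increments per rotatable bond

n_molecules = sum(molecules_by_dof.values())
total_dof = sum(n * count for n, count in molecules_by_dof.items())

# Each molecule with n rotatable bonds contributes r**n conformations.
total_conformations = sum(count * r ** n
                          for n, count in molecules_by_dof.items())

print(n_molecules, total_dof, f"{total_conformations:.2e}")
```

The total is on the order of 10^14 conformations before any pairwise comparison between molecules, and it is dominated by the single molecule with nine rotatable bonds. This is why constraining each search with the distance bounds of the OMAP found so far matters more than any constant-factor speedup.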
hyperspace of the set of the inhibitors, but rather the three-dimensional patterns common to the total set of inhibitors. Many conformations of a molecule often map into one three-dimensional pattern. Transformation of the multidimensional conformational hyperspace into a smaller-dimensioned OMAP space reduces the number of objects for comparison. If one starts with the most constrained inhibitor (fewest torsional degrees of freedom) and determines an OMAP for it, then one can use the upper and lower distance bounds as constraints for searches for the next molecule. In other words, one looks only where there are possible solutions to the problem. A more advanced approach simply examines each candidate solution from the initial OMAP to see whether all the other molecules are capable of presenting the same pattern. By changing the focus to the hypothesis of a common three-dimensional pattern, a more efficient approach has been devised (Fig. 3.46) (399). Clearly, the algorithms that one chooses to do the problem are important.

Table 3.1 Degrees of Torsional Freedom to Specify ACE Active Site Geometry
Degrees of Freedom (n)   Number of Molecules   Total
3                        6                     18
5                        7                     35
6                        3                     18
7                        5                     35
8                        6                     48
9                        1                     9
Totals                   28                    163

4.4.4 Alternative Approaches. A conceptually similar approach to receptor mapping has been taken by Ghose and Crippen (446-449), who used the distance geometry method to analyze site points and drug interactions. A site model was postulated with some initial estimates of force constants between the appropriate portion of the ligand and the site point. The binding energy for a particular binding mode can be calculated:

where E_c is the conformational energy, c is a coefficient to be fit, x is the interaction of a site point i with the bound ligand point m, which depends on their types. The novel aspect of
4 Unknown Receptors
this approach was the use of distance geometry to generate a variety of conformers binding within the postulated site and then finding a set of force constants between the postulated site points and ligand points that will predict the affinities of the compounds in the data set when bound in their optimal manner. With a site model of 11 attractive site points and 5 repulsive ones for DHFR, Ghose and Crippen (447) were able to derive force constants that fit 62 molecules, with an R² = 0.90, and predict the activity of 33 molecules, with an R² = 0.71. The compounds, however, are essentially an extended congeneric series because the core recognition portion of the inhibitor, the pyrimidine ring, is common to all the compounds.

Linschoten et al. (450) extended Crippen's method by use of lipophilicity to describe the binding of parts of the ligand to lipophilic areas of the receptor. Through the use of only a nine-point model of the turkey erythrocyte β-receptor and six energy parameters, they successfully modeled 58 compounds. Distance geometry approaches to receptor-site modeling have been reviewed (449, 451).

Simon and his coworkers have developed (426) a quantitative 3D-QSAR approach, the minimal steric (topologic) difference (MTD) approach. Oprea et al. (452) compared MTD and CoMFA on affinity of steroids for their binding proteins and found similar results. Snyder and colleagues (453) developed an automated method for pharmacophore extraction that can provide a clear-cut distinction between agonist and antagonist pharmacophores. Klopman (404, 454) developed a procedure for the automatic detection of common molecular structural features present in a training set of compounds. This has been used to produce candidate pharmacophores for a set of antiulcer compounds (404). Extensions (454) of this approach allow differentiation between substructures responsible for activity and those that modulate the activity.

Bersuker and Dimoglo (455) described a matrix-based approach that combines geometric and electronic features of a molecule, the electron-topological approach. For each molecule, an electron-topological matrix of congruity (ETMC) is constructed based on a conformer selected by conformational analysis. The ETMC is essentially an interatomic distance matrix (Fig. 3.47), with the diagonal elements containing an electronic structural parameter (atomic charge, polarizability, HOMO energy, etc.). Off-diagonal elements for two atoms that are chemically bonded are used to store information regarding the bond (bond order, polarizability, etc.). Matrices for active compounds in a series are then searched for common features that are not shared by inactive compounds. The successful examples cited are predominantly for small, relatively rigid structures where the conformational parameter does not confuse the analysis.

Martin et al. (456) developed a strategy for determining both the bioactive conformation and a superposition rule for each active molecule in a data set. In DISCO, a set of low-energy conformers for each molecule is processed to locate atoms within the molecule and extensions for binding-site points for superposition. A clique-finding algorithm then finds superpositions containing at least one conformation of each molecule and a user-specified minimum number of site points.

Unlike methods that are limited to a precomputed set of rigid conformers, GASP (Genetic Algorithm Similarity Program) (457) allows full conformational flexibility of ligands. GASP employs a genetic algorithm for determining the correspondence between functional groups in different molecules and the alignment of these groups in a common geometry for receptor binding. For a set of ligands, GASP automatically identifies rotatable bonds and pharmacophore elements such as rings and potential hydrogen-bonding sites. A population of chromosomes is randomly constructed, where each chromosome represents a possible alignment of all the molecules. Chromosomes encode the torsion settings for rotatable bonds as well as the intermolecular mapping of elements. The fitness score of a particular alignment is the weighted sum of three terms: the number and similarity of overlaid elements, the common volume of all the molecules, and the internal van der Waals energy of each molecule. Using a mutation or crossover operator, child chromosomes are produced. Those with improved fitness scores replace the least-fit members of the existing population. The calculation terminates when
Molecular Modeling in Drug Design
Figure 3.47. The electron-topological matrix of congruity (ETMC)for a 17-atom fragment proposed
by Bersuker and Dimoglo (455) to encode geometrical and electronic features of molecules.
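The layout of such a matrix can be illustrated with a minimal sketch. It assumes atomic charge as the diagonal parameter and bond order as the bonded off-diagonal entry, two of the options named in the text; the function name `build_etmc`, the three-atom fragment, and all numerical values are hypothetical and are not taken from Bersuker and Dimoglo (455).

```python
import numpy as np

def build_etmc(coords, charges, bonds):
    """Assemble an ETMC-like matrix for one conformer.

    coords  : list of (x, y, z) atomic positions
    charges : per-atom electronic parameter placed on the diagonal
    bonds   : dict mapping bonded index pairs (i, j) to a bond parameter
    """
    coords = np.asarray(coords, dtype=float)
    n = len(coords)
    # Start from the full interatomic distance matrix.
    diff = coords[:, None, :] - coords[None, :, :]
    etmc = np.sqrt((diff ** 2).sum(axis=-1))
    # Bonded pairs store the bond parameter instead of the distance.
    for (i, j), order in bonds.items():
        etmc[i, j] = etmc[j, i] = order
    # Diagonal elements store the atomic electronic parameter.
    etmc[np.diag_indices(n)] = charges
    return etmc

# Hypothetical three-atom fragment (illustrative numbers only).
coords = [(0.0, 0.0, 0.0), (1.2, 0.0, 0.0), (1.2, 1.5, 0.0)]
charges = [-0.4, 0.3, 0.1]
bonds = {(0, 1): 2.0, (1, 2): 1.0}
etmc = build_etmc(coords, charges, bonds)
print(etmc)
```

In this sketch the only true distance left in the matrix is the entry for the nonbonded pair (0, 2); a search for features shared by active compounds would compare such submatrices across molecules within some tolerance.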
the fitness of the population fails to improve by a specified amount, or when the preset number of genetic operations is completed. GASP produces several sets of alignments and their associated pharmacophore elements.

4.4.5 Receptor Mapping. One can attempt to decipher physical properties of the receptor by use of data from both active and inactive analogs. Interpretation of results requires some understanding of the interactions between ligand and receptor that underlie molecular recognition. Oprea and Kurunczi (458) reviewed these interactions in the context of receptor mapping. A basic assumption is that a compound that contains the correct pharmacophoric elements and has the capability of positioning them correctly should be active. Compounds with these attributes that are inactive must be incapable of binding to the receptor in the correct orientation; that is, steric overlap with the receptor must occur. By calculating the combined volume of the active analogs superimposed in the correct orientation, one has mapped space that cannot be occupied by the receptor and that must be available for binding. Inactive compounds mentioned above should possess novel volume requirements, some portion of which is likely to overlap with that occupied by the receptor. As an example of receptor mapping, Sufrin et al. (402) showed with amino acid analogs of methionine, which inhibited the enzyme, methionine:adenosyl transferase, that the data for a set of rigid amino acid inhibitors required the postulation of competition between the inactive analogs and the enzyme for a particular volume of space (Fig. 3.48). Summation of the volume requirements for the set of compounds, when oriented on the amino acid framework, yielded a minimum space from which the receptor could be excluded. Each amino acid had the necessary binding elements, but several were inactive. Each of the inactive analogs required extra volume not required by the active analogs and shared a small common unique volume whose occupancy by the enzyme would be sufficient to rationalize their inactivity.

Klunk et al. (459) used separate receptor
Figure 3.48. Example of receptor mapping of a set of enzyme inhibitors that can be aligned on a common amino acid framework. The set of inactive compounds all require a common novel volume when compared with active compounds (402). Panels: Active analogs; Inactive analogs. Used with permission.
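The volume bookkeeping behind this kind of receptor mapping can be sketched on a lattice. The example below is only an illustration under stated assumptions: molecules are reduced to atom centers with a single hypothetical van der Waals radius of 1.7 Å, the coordinates are invented rather than taken from Sufrin et al. (402), and the helper names `make_grid` and `occupancy` are not from any published program.

```python
import numpy as np

def make_grid(lo, hi, step):
    """Return an (N, 3) array of lattice points covering a cube."""
    axis = np.arange(lo, hi + step / 2, step)
    gx, gy, gz = np.meshgrid(axis, axis, axis, indexing="ij")
    return np.stack([gx.ravel(), gy.ravel(), gz.ravel()], axis=1)

def occupancy(atoms, grid, radius=1.7):
    """Boolean mask of lattice points within `radius` of any atom center."""
    atoms = np.asarray(atoms, dtype=float)
    dist = np.linalg.norm(grid[:, None, :] - atoms[None, :, :], axis=-1)
    return (dist <= radius).any(axis=1)

grid = make_grid(-3.0, 5.0, 0.5)

# Hypothetical pre-aligned molecules (atom centers only).
actives = [
    [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)],
]
inactives = [
    [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (0.0, 3.0, 0.0)],
    [(0.0, 0.0, 0.0), (0.0, 3.0, 0.0), (0.0, 3.5, 1.0)],
]

# Union of the active volumes: space the receptor cannot occupy.
active_union = np.logical_or.reduce([occupancy(m, grid) for m in actives])

# Extra volume required by each inactive analog, then the part common to all.
extra = [occupancy(m, grid) & ~active_union for m in inactives]
common_extra = np.logical_and.reduce(extra)

print("lattice points in active union:", int(active_union.sum()))
print("lattice points in common extra volume:", int(common_extra.sum()))
```

The common extra volume in this toy case sits around the atom that both inactive analogs place at (0, 3, 0), which is the kind of region one would postulate to be occupied by the receptor.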
mapping of two different chemical classes of ligands to support the hypothesis that they bound to the same site. Calder et al. (460) argued that a successful correlative CoMFA model for 36 compounds of six chemical classes of GABA inhibitors indicated that the alignments used were significant. In some cases, comparison of volume maps for two receptors has allowed optimization of activity at one receptor with respect to the other. The work of Hibert et al. (461, 462), through the use of receptor mapping to increase the selectivity of a lead compound for the 5-HT1A receptor over the α1-adrenoreceptor, has resulted in clinical trials for a novel chemical class. This steric-mapping approach has become relatively popular, and numerous examples appear in current journals (463) on a regular basis.

Although there are several feasible algorithms to deal with unions of molecular volumes, the use of pseudoelectron density functions calibrated to reproduce VDW radii (424) with three-dimensional contouring to represent the surface has allowed mathematical manipulation of the density associated with each lattice point to allow for union, intersection, and subtraction of volumes. Analytical representation of molecular volumes by Connolly (464, 465) and solvent-accessible surfaces by Kundrot et al. (466) may be an alternative that would allow optimization of volume overlap, for example, by minimizing the difference in volume between two structures. The solvent-accessible surface area can be used to approximate the free energy of hydration, and a rapid, numerical procedure for its calculation has been reported (467).

4.4.6 Model Receptor Sites. One of the first visualizations of a receptor model is that of Beckett and Casey (468) for the opiate receptor published in 1954. Because morphine and many other compounds active at this receptor are essentially rigid, the model did not have to address the interaction of myriad numbers of flexible, naturally occurring opioid ligands, such as endorphins and enkephalin, which were only subsequently discovered. The model receptor had an anionic site to bind the charged nitrogen, a hydrophobic flat surface with a cleft to bind the phenyl ring, and a hydrophobic hydrocarbon bridge seen in morphine. Kier (469) published a number of papers attempting to define the pharmacophore based on semiempirical molecular orbital calculations of in vacuo minimum-energy conformations. Although his basic concepts were valid, his emphasis on the global minima in vacuo limited his scope of applicability.

Humber et al. (470) used semirigid antipsychotic drugs, the so-called neuroleptics, which antagonize CNS dopamine transmission and displace dopamine from its receptor, to formulate a geometrical arrangement of receptor groups to rationalize their activity. Olson et al. (471) used this model to design a novel stereospecific dopamine antagonist and successfully predicted its stereochemistry.

Because we are reasonably convinced the receptor is a protein, construction of hypothetical sites from amino acid fragments and calculation of affinity for these sites should correlate with observed affinity, assuming that the type of interactions and their geometry are represented by the site in some reasonable manner. An individual fragment such as an indole ring from tryptophan does a good job of simulating a flat hydrophobic surface. Höltje and Tintelnot (472) constructed a site for chloramphenicol from arginine and histidine by varying the distances of the amino acid from its postulated binding position and finding the optimal distance for correlation with observed affinity for the ribosome. Peptidic pseudoreceptors have been constructed (453) that correctly rank-order glutamate NMDA agonists and antagonists (Fig. 3.49).

An intermediate between unknown receptors and ones where the three-dimensional structure is known are models based on homology. For the medicinal chemist, the G-protein-coupled receptors have been of intense interest and numerous models (339, 340, 461, 473) of the various receptor types have been developed based on their presumed three-dimensional homology with bacteriorhodopsin (474). Mechanisms of signal transduction (475) and differences between agonists and antagonists (476) have even been rationalized based on such models. Nordvall and Hacksell (341) recently combined the construction of such a model for the muscarinic m1 receptor with constraints derived from steric mapping of muscarinic agonists. By adding the experimental constraints from ligand binding, a qualitative model was derived that was able to reproduce experimentally derived stereoselectivities.
4.4.7 Assessment of Model Predictability. Because it is unlikely that there will be sufficient structure-activity data to uniquely define a model at atomic resolution in competition with crystallography, justification for model building must come from its potential predictive power and possible insight into the receptor-drug interaction before detailed three-dimensional information is available from either crystal structure or NMR studies. Certainly, the questions regarding the ability of a proposed drug to bind to the active site without steric conflict with the receptor can be addressed by the methods outlined above in a qualitative manner. The resolution of our receptor models is too crude, however, to subject them to molecular mechanics estimates of affinities. There are alternative paradigms, however, based on pattern recognition techniques in which a set of analogs and their activities are used, along with their physicochemical parameters, to generate a mathematical model that relates the values of the physicochemical parameters for a given analog with its activity. One such paradigm is comparative molecular field analysis (CoMFA), which combines the three-dimensional electrostatic and steric fields surrounding the analogs with powerful statistical techniques, partial least squares (PLS) (477) and cross-validation, to generate predictive models if a set of orientation rules is available for aligning the molecules for comparison and prediction. Alternative methods for assessing similarity and their use in QSAR schemes have been compared (215) with CoMFA. Another approach is the use of neural nets that learn to "see" patterns in much the same way as our own nervous system processes information. Examples of the use of this pattern-recognition approach include classification of mechanism of action for cancer chemotherapy (478) and QSAR studies of DHFR inhibitors (479, 480) and carboquinones (481). Machine learning has also been applied (482) to the QSAR problem. Trimethoprim analogs were successfully analyzed for their inhibition of DHFR and similar results to the original Hansch results were obtained. It is not clear that this paradigm could be applied to noncongeneric series, at least as outlined.

What appears crucial to such studies is the choice of training set, which encompasses as much of parameter space as one is likely to use in the predictive mode, as well as tests of the predictive ability of resulting models. Given that one is dealing with a situation in which the number of variables is larger (often several times) than the number of observations, linear regression models are not applicable because chance correlations are highly probable. The use of cross-validation allows selection of correlations that are predictive in a self-consistent manner within the training set. This does not mean to imply that such internally self-consistent models have predictive power outside of the training set, or extremely close congeners.

DePriest et al. (483, 484) applied the CoMFA methodology to a series of 68 ACE (angiotensin-converting enzyme) inhibitors representing 28 different chemical classes. Through use of the binding-site geometry determined by Mayer et al. (397), a CoMFA model with a statistically significant cross-validated R² and considerable predictive ability for inhibitors outside of the training set was derived. Because the geometry of the ACE inhibitors was determined computationally by an active-site analysis rather than experimentally, a comparison of the results of the ACE series against thermolysin inhibitors, for which there were crystallographic data to explicitly define the binding-site geometry and the resulting alignment rules, was made, given that thermolysin is also a zinc-containing metallopeptidase with numerous similarities to ACE. Their results give strong support to both the Active Analog Approach (398) used to define the alignment rule for the ACE series and the CoMFA methodology itself. In the absence of an experimentally known active-site geometry, correlations were derived that explain as much as 84% of the variance in activities among a set of 68 diverse ACE inhibitors by use of CoMFA steric and electrostatic potentials plus a zinc indicator variable (Fig. 3.50). If the set of 68 ACE inhibitors is divided into three classes and correlations are derived for each class, CoMFA parameters alone explain 79-99% of the variance in activities. It was notable that statistically significant correlations were found, in spite of the fact that CoMFA does not explicitly consider hydrophobicity or solvation. In further support of the active-site paradigm, the cross-validated results of the ACE series were equivalent to those of the thermolysin series (cross-validated R² = 0.65 to 0.70), for which the alignment rule was defined by crystallographic data.

The predictions for molecules outside the training sets are a valid test of the predictive ability of the model, rather than just a confirmation of self-consistency of the derived model. In other words, statistical analysis alone does not answer the question of a chance correlation (485) for the training set. One must investigate lateral correlations such as predictability. The predictive correlations presented by DePriest et al. (483, 484) represent a total of 66 diverse inhibitors that were not chosen as analogs of compounds present in the training set, but by selecting published papers on three different chemical classes and testing all compounds in the papers [predictive r² = 0.46 for the set of 66 compounds predicted, which had not been included in the training set for the ACE model with a zinc indicator of 10 (Fig. 3.51)]. The "predictive" r² was based only on molecules not included in the training set and was defined as

predictive r² = (SD - "press")/SD

where SD is the sum of the squared deviations between the affinities of molecules in the test set and the mean affinity of the training set molecules, and "press" is the sum of the squared deviations between predicted and actual affinity values for every molecule in the test set. It should be obvious from the equation that prediction of the mean value of the training set for each member of the test set would yield a predictive r² = 0. Of the 66 predicted molecules, 35 had residuals less than one log value, with a predictive r² value for the collective set of these 35 test molecules of 0.90. Of the 31 inhibitors with residuals greater than 1.0, 8 were carboxylates, 12 were phosphates, and 11 were thiols. Clearly, no single class of inhibitors dominated the distribution of residuals. Considering both the composition and the method of selection of the test data sets (range of activities over 7 log units), the fact that more than 50% of the molecules were predicted with correlations greater than r² = 0.90 lends strong support to the use of CoMFA as a tool for QSAR development.

Use of CoMFA as a predictive tool for receptors of known three-dimensional structure has also been explored. Klebe and Abraham
Figure 3.51. Plot of experimental (actual pIC50) versus predicted inhibition constants for 35 ACE inhibitors not used in derivation of CoMFA model (484); symbols distinguish thiols, carboxylates, and phosphates. This plot indicates the predictability of the model. Used with permission.
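The "predictive" r² used for such test sets, (SD - "press")/SD, is simple to compute. The short sketch below is an illustration only: the function name `predictive_r2` and the three affinity values are hypothetical and do not come from the ACE data of DePriest et al.

```python
def predictive_r2(actual, predicted, training_mean):
    """Predictive r^2 = (SD - PRESS) / SD for an external test set.

    SD    : squared deviations of test-set affinities from the
            mean affinity of the training set
    PRESS : squared deviations between predicted and actual
            affinities of the test-set molecules
    """
    sd = sum((a - training_mean) ** 2 for a in actual)
    press = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return (sd - press) / sd

# Hypothetical test-set pIC50 values and a hypothetical training-set mean.
actual = [5.0, 7.0, 9.0]
training_mean = 7.0

good_model = predictive_r2(actual, [5.5, 7.0, 8.5], training_mean)
mean_only = predictive_r2(actual, [training_mean] * 3, training_mean)
print(good_model, mean_only)
```

Predicting the training-set mean for every test molecule makes "press" equal to SD and so gives a predictive r² of zero, while predictions worse than the mean give negative values.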
(486) used two enzymes (thermolysin and renin) as well as antiviral activity against human rhinovirus, where the coat-protein receptor is known, to calibrate CoMFA methodology. They concluded that only enthalpies of binding and not binding affinities were predicted by CoMFA. Waller et al. (264) developed a predictive CoMFA model for the binding affinities of HIV-protease inhibitors based on crystal structures of complexes. Initial analysis of the 59 molecules in the training set representing five structurally diverse classes (hydroxyethylamine, statine, norstatine, ketoamide, and dihydroxyethylene) of transition-state protease inhibitors yielded a correlation with a cross-validated r² value of 0.786. To evaluate the predictive ability of this model, a test set of 18 additional inhibitors (487) was used that represented another class of transition-state isostere, hydroxyethylurea. The model expressed good predictive ability for the test set of hydroxyethylurea compounds (predictive r² = 0.624) with all compounds predicted within 1.06 log units (1.4 kcal/mol in binding affinity) of their actual activities, with an average absolute error of 0.58 log units (0.8 kcal/mol) over a range of 3.03 log units (Fig. 3.52). Predictions from this CoMFA model of HIV protease are being used to prioritize synthesis of de novo-designed HIV-protease inhibitors not included in development of the model.

Crippen developed a method (488) to objectively model the binding of small ligands to receptors, given the experimentally determined affinities of a set of ligands. The procedure, Vorom, used Voronoi polyhedra to generate the simplest geometrical model of the binding site. In a recent application to DHFR inhibitors (489), only eight analogs were used in the training set to derive the model and the affinities of 23/39 of the test set molecules were correctly predicted, with an average relative error of 0.83 kcal/mol for the remaining compounds.

5 CONCLUSIONS

Rapid advances in molecular and structural biology have provided ample therapeutic targets characterized in three dimensions. Tools to exploit this information are being rapidly developed and several strategies for de novo design of ligands, given an active site, are under investigation. It is already clear, however, that iterative approaches are necessary because of the lack of precision in predicting affinities for bound ligands. Molecular mechanics and computer graphics are essential components for design of novel ligands, and rapid progress in evolving a useful set of tools is apparent.

The ultimate goal in comparison of molecules with respect to their biological activity is insight into the receptor and its requirements for recognition and activation. Conjecture regarding the receptor is often a necessary part of rationalizing a set of structure-activity data. Although the problem of characterizing the active site of an unknown macromolecule indirectly is certainly challenging, the analysis of structure-activity data of a set of ligands, especially if their structural variety is wide, allows useful models of active sites to be developed. There are numerous caveats that must be acknowledged, however, such as flexibility of the receptor, multiple binding modes for ligands, and lack of uniqueness of most models because of limited experimental observations. Success in using these methods would appear to be increasing. This reflects both technological advances as well as insight into the problem and algorithmic improvements in our analytical approaches.

The game of 20 questions with receptors has progressed with experience. Ambiguity in interpretation of results and multiple models clearly reflect the uncertainties inherent in this indirect approach. Nevertheless, the absence of direct experimental data in many biological systems of intense therapeutic interest makes this the only game available for many. It is hoped that the next decade will see further progress in our ability to extract three-dimensional information from structure-activity studies on unknown receptors.

This perspective has examined the approaches to molecular modeling and drug design and emphasized their limitations. The reader should be aware, however, that these tools are used daily on many problems of therapeutic interest with increasing success. This is clearly witnessed by publications of such studies in almost every issue of current major journals. For specific application areas, such as RNA (490, 491), DNA (492-496), membrane (497-507), or peptidomimetic modeling (382, 508-513), the reader is referred to the literature. The prediction of molecular properties, such as log P and correlation between substructures and metabolism, has led to a dramatic increase in efforts to correlate absorption, distribution (514), metabolism (515-517), and elimination (ADME) with chemical
References
30. A. Vedani and D.W . Huhta, J. Am. Chem. Soc., 55. W . G. Richards, P. M. King, and C. A. Reynolds,
112,4759-4767 (1990). Protein Eng., 2, 319-327 (1987).
31. V . S. Allured, C. M. Kelly, and C. R. Landis, 56. R. A. Pierotti, Chem. Rev., 76,717-726 (1976).
J. Am Chem. Soc., 113, 1-12 (1991). 57. G. L. Pollack, Science, 251, 1323-1330 (1991).
32. A. E. Carlsson, Phys. Rev. Lett., 81, 477-480 58. R. J. Zauhar and R. S. Morgan, J. Comput.
(1998). Chem., 9,171-187 (1988).
33. A. E. Carlsson and S. Zapata, Biophys. J., 81, 59. J. Tomasi, R. Bonaccorsi, R. Cammi, et al.,
1-10 (2001). Theochem. J. Mol. Struct., 80,401-424 (1991).
34. A. T . Hagler, E. Hugler, and S. Lifson, J. Am. 60. D. A. Liotard, G. D. Hawkins, G. C. Lynch, C. J.
Chem. Soc., 96,5319 (1974). Cramer, and D. G. Truhlar, J. Comput. Chem.,
35. S. C. Harvey, Proteins, 5, 78-92 (1989). 16,422-440 (1995).
36. M. E. Davis and J. A. McCammon, Chem. Rev., 61. K. Sharp, J. Comput. Chem., 12, 454-468
90,509-521 (1990). (1991).
37. W . F. van Gunsteren and H. J. C. Berendsen, 62. W . C. Still, A. Tempczyk, R. C. Hawley, and T .
Angew. Chem. Znt. Ed. Engl., 29, 992-1023 Hendrickson, Chem. Soc., 112, 6127-6129
(1990). (1990).
38. C. E. Dykstra, Chem. Rev., 93, 2339-2353 63. C. A. Schiffer,J. W . Caldwell, P. A. Kollman,
(1993). and R. M. Stroud, Mol. Simul., 10, 121-149
39. A. J. Stone and M. Alderton, Mol. Phys., 56, (1993).
1047-1064 (1985). 64. P. F. W . Stouten, C. Frommel, H. Nakamura,
40. M. J. Dudek and J. W . Ponder, J. Comput. and C. Sander, Mol. Simul., 10,97-120 (1993).
Chem., 16,791-816 (1995). 65. R. J. Zauhar, J. Comput. Chem., 12, 575-583
41. C. I. Bayly, P. Cieplak,W . D. Cornell, and P. A. (1991).
Kollman, J. Phys. Chem., 97, 10269-10280 66. A. J. Stone, Mol. Phys., 56, 1065-1082 (1985).
(1993). 67. S. Kuwajima and A. Warshel, J. Phys. Chem.,
42. D. E. Williams in K. B. Lipkowitz and D. B. 94,460-466 (1990).
Boyd, Eds., Revisions in Computational Chem- 68. R. A. Sorensen, W . B. Liau, L. Kesner, and
istry, VCH, New York, 1991, pp. 219-271. R. H. Boyd, Macromolecules, 21, 200-208
43. R. J. Loncharich and B. R. Brooks, Proteins, 6, (1988).
32-45 (1989). 69. J. Caldwell, L. X . Dang, and P. A. Kollman,
44. J. Guenot and P. A. Kollman, J. Comput. J. Am. Chem. Soc., 112,9144-9147 (1990):
Chem., 14,295-311 (1993). 70. C. J. Cramer and D. G. Truhlar, J. Am. Chem.
45. K. Tasaki, S. McDonald, and J. W . Brady, SOC.,113,8305-8311 (1991).
J. Comput. Chem., 14,278-284 (1993). 71. C. J. Cramer, J. Am. Chem. Soc., 113, 8552-
46. J. Shimada, H. Kaneko, and T . Takada, 8554 (1991).
J. Comput. Chem., 14,867-878 (1993). 72. C. J. Cramer and D. G. Truhlar, Science, 256,
47. H. Schreiber and 0. Steinhauser, Chem. Phys., 213-217 (1992).
168, 75-89 (1992). 73. C. J. Cramer and D. G. Truhlar, J. Comput.
48. H. Schreiber and 0. Steinhauser, Biochemis- Chem., 13,1089-1097 (1992).
t ~31,5856-5860
, (1992). 74. G. Rauhut, T . Clark, and T . Steinke, J. Am.
49. P. E. Smith and B. M. Pettit, J. Chem. Phys., Chem. Soc., 115,9174-9181 (1993).
95,8430-8441 (1991). 75. F. Franks i n F. Franks, Ed., Water, A Compre-
50. G. E. Marlow, J. S. Perkyns, and B. M. Pettit, hensive Treatise, Vol. 1, Plenum Press, New
Chem. Rev., 93,2503-2521 (1993). York, 1975.
51. M. Whitlow and M. M. Teeter, J. Am. Chem. 76. F. H. Stillinger, Science, 209,451-457 (1980).
SOC.,108,7163-7172 (1986). 77. L. R. Pratt, Ann. Rev. Phys. Chem., 36, 433-
52. M. K. Gilson, K. A. Sharp, and B. H. Honig, 449 (1985).
J. Comput. Chem., 9,327435 (1987). 78. J. P. M. Postma, H. J. C. Berendsen, and J . R.
53. A. Nicholls and B. Honig, J. Comput. Chem., Haak, Faraday Symp. Chem. Soc., 17, 55-67
12,435-445 (1991). (1982).
54. M. Schaefer and C. Froemmel, J. Mol. Biol., 79. B. G. Rao and U . C. Singh, J. Am. Chem. Soc.,
216,1045-1066 (1990). 111,31253133 (1989).
80. I. Ohmine and H. Tanaka, Chem. Rev., 93, 104. C. M. W. Ho and G. R. Marshall, J. Cornput.-
2545-2566 (1993). Aided Mol. Des., 7,623-647 (1993).
81. W. L. Jorgensen, J. Gao, and C. Ravimohan, J. 105. A. W. R. Payne and R. C. Glen, J. Mol. Graph-
Phys. Chem., 89,34703473 (1985). ics, 11, 74-91 (1993).
82. N. Muller, Trends Biochem. Sci., 17,459-463 106. H. A. Scheraga in K. B. Lipkowitz and D. B.
(1992). Boyd, Eds., Revisions in Computational Chem-
83. L. X. Dang, J. E. Rice, J. Caldwell, and P. A. istry, VCH, New York, 1992, pp. 73-142.
Kollman, J. Am. Chem. Soc., 113, 2481-2486 107. T. Schlick in K. B. Lipkowitz and D. B. Boyd,
(1991). Eds., Revisions in Computational Chemistry,
84. A. K. Rappe and W. A. Goddard 111, J. Phys. VCH, New York, 1992, pp. 1-71.
Chem., 95,3358-3363 (1991). 108. D. D. Beusen, E. F. B. Shands, S. F. Karasek,
85. R. Czerminski and R. Elber, Int. J. Quantum G. R. Marshall, and R. A. Dammkoehler,
Chem. Quantum Chem. Symp., 24, 167-186 THEOCHEM, 370, 157-171 (1996).
(1990). 109. H. Iijima, J. B. Dunbar, Jr., and G. R. Marshall,
86. C. Choi and R. Elber, J. Chem. Phys., 94,751- Proteins, 2 , 3 3 0 3 3 9 (1987).
760 (1991). 110. I. Motoc, R. A. Dammkoehler, and G. R. Mar-
87. S. E. Huston and G. R. Marshall, Biopolymers, shall in N. Trinajstic, Ed., Mathematic and
34, 74-90 (1994). Computational Concepts in Chemistry, Ellis
88. R. V. Pappu, R. K. Hart, and J. W. Ponder, J. Honvood, Chichester, UK, 1986, pp. 222-251.
Phys. Chem. B, 102,9725-9742 (1998). 111. R. A. Dammkoehler, S. F. Karasek, E. F. B.
89. L. Piela, Collect. Czech. Chem. Commun., 63, Shands, and G. R. Marshall, J. Cornput.-Aided
1368-1380 (1998). Mol. Des., 3, 3-21 (1989).
90. R. K. Hart, R. V. Pappu, and J. W. Ponder, 112. N. Go and H. A. Scheraga, Macromolecules, 3,
J. Comput. Chem., 21,531-552 (2000). 178-187 (1970).
91. M. Saunders, K. N. Houk, Y.-D. Wu, W. C. Still, 113. A. R. Leach in K. B. Lipkowitz and D. B. Boyd,
M. Lipton, G. Chang, and W. C. Guida, J. Am. Eds., Revisions in Computational Chemistry,
Chem. Soc., 112,1419-1427 (1990). VCH, New York, 1991, pp. 1-55.
92. M. Saunders, J. Am. Chem. Soc., 109, 3150- 114. S. K. Burt and J. Greer, Ann. Rep. Med. Chem.,
3152 (1987). 23,285-294 (1988).
93. J. T. Ngo and M. Karplus, J. Am. Chem. Soc., 115. D. M. Ferguson and D. J. Raber, J. Am. Chem.
119,56575667 (1997). SOC., 111,4371-4378 (1989).
94. R. V. Pappu, G. R. Marshall, and J. W. Ponder, 116. M. Saunders, J. Comput. Chem., 10, 203-208
Nut. Struct. Biol., 6 , 5 0 6 5 (1999). (1989).
95. K. R. Mackenzie, J. H. Prestegard, and D. M. 117. M. Saunders, J. Comput. Chem., 12, 645-663
Engelman, Science, 276, 131-133 (1997). (1991).
96. J. H. Holland, Sci. Am., July,66-72 (1992). 118. M. Saunders and H. A. Jimenez-Vazquez,
97. S. Forrest, Science, 261,872-878 (1993).
J. Comput. Chem., 14,330-348 (1993).
98. P. Willett, Trends Biotechnol., 13, 516-521 119. M. Saunders and N. Krause, J. Am. Chem.
(1995). SOC.,112,1791-1795 (1990).
99. J. E. Devillers, Genetic Algorithms in Molecu- 120. A. V. Shah and D. P. Dolata, J. Cornput.-Aided
lar Modeling, Academic Press, New York, Mol. Des., 7, 103-124 (1993).
1996. 121. I. Kolossvary and W. C. Guida, J. Am. Chem.
100. D. B. McGarrah and R. S. Judson, J. Comput. SOC.,115,2107-2119 (1993).
Chem., 14,1385-1395 (1993). 122. H A . Boehm, G. Klebe, T. Lorenz, T. Mietzner,
101. R. S. Judson, Y. T. Tan, E. Mori, C. Melius, and L. Siggel, J. Comput. Chem., 11, 1021-
E. P. Jaeger, A. M. Treasurywala, and A. Ma- 1028 (1990).
thiowetz, J. Comput. Chem., 16, 1405-1419 123. A. E. Howard and P. A. Kollman, J. Med.
(1995). Chem., 31,1669-1675 (1988).
102. R. P. Meadows and P. J. Hajduk, J. Biomol. 124. M. Lipton and W. C. Still, J . Comput. Chem., 9,
NMR, 5,41-47 (1995). 343-355 (1988).
103. B. Waszkowycz, D. E. Clark, D. Frenkel, J. Li, 125. D. D. Beusen, R. D. Head, J . D. Clark, W. C.
C. W. Murray, B. Robson, and D. R. Westhead, Hutton, U. Slomczynska, J. Zabrocki, M. T.
J. Med. Chem., 37,3994-4002 (1994). Leplawy, and G. R. Marshall in C. H. Schnei-
der and A. N. Eberle, Eds., The Solution NMR Eds., Advances in Biomolecular Simulations,
Structures of Emerimicins III and N Deter- American Institute of Physics Conference Pro-
mined Using the New Program, MACROSE- ceedings No. 239, Obernai, France, 1991, pp.
ARCH, ESCOM Scientific, Leiden, Nether- 174-199.
lands, 1993, pp. 79-80. 147. M. L. Smythe, S. E. Huston, and G. R. Mar-
126. M. P. Allen and D. J. Tildesley, Computer Sim- shall, J. Am. Chem. Soc., 115, 11594-11595
ulation of Liquids, Oxford Science Publica- (1993).
tions, Oxford, UK, 1989, p. 385. 148. M. L. Smythe, S. E. Huston, and G. R. Mar-
127. N. Metropolis, A. W. Rosenbluth, M. N. Rosen- shall, J. Am. Chem. Soc., 117, 5445-5452
bluth, A. H. Teller, and E. Teller, J. Chem. (1995).
Phys., 21, 1087 (1953). 149. G. H. Loew and S. K. Burt in C. A. Ramsden,
128. J. A. McCammon and S. C. Harvey, Dynamics Ed., Quantitative Drug Design, Pergamon
of Protein and Nucleic Acids, Cambridge Uni- Press, Oxford, UK, 1990, pp. 105-123.
versity Press, Cambridge, UK, 1987, p. 234. 150. S. L. Price and N. G. J. Richards, J. Cornput.-
129. G. Zhang and T. Schlick, J. Comput. Chem., Aided Drug Des., 5,41-54 (1991).
14,1212-1233 (1993). 151. U. C. Singh and P. A. Kollman, J. Comput.
130. T. Schlick and W. K. Olson, Science, 257, Chem., 5, 129 (1984).
1110-1115 (1992).
152. B. H. Besler, K. M. Merz, Jr., and P. A. Koll-
131. D. S. Goodsell and A. J. Olson, Proteins, 8,195- man, J. Comput. Chem., 11,431-439 (1990).
202 (1990).
153. G. Rauhut and T. Clark, J. Comput. Chem., 14,
132. W. L. Jorgensen, Acc. Chem. Res., 22,184-189 503-509 (1993).
(1989).
154. J. G. Vinter and M. R. Saunders in D. J. Chad-
133. D. L. Beveridge and F. M. DiCapua in W. van
wick and K. Widdows, Eds., Host-Guest Molec-
Gunsteren and P. K. Weiner, Eds., Computer
ular Interactions: From Chemistry to Biology,
Simulation of Biomolecular Systems, ESCOM
John Wiley & Sons, Chichester, UK, 1991, pp.
Science, Leiden, Netherlands, 1989, pp. 1-26.
249-265.
134. P. Kollman, Chem. Rev., 93,2395-2417 (1993).
155. C. A. Hunter and J. K. M. Sanders, J. Am.
135. W. L. Jorgensen, J. Phys. Chem., 87, 5304- Chem. Soc., 112,5525-5534 (1990).
5314 (1983).
156. U. Dinur and A. T. Hagler in K. B. Lipkowitz
136. P. A. Bash, U. C. Singh, F. K. Brown, R. Lan-
and D. B. Boyd, Eds., Revisions in Computa-
gridge, and P. A. Kollman, Science, 235,574-
tional Chemistry, VCH, New York, 1991, fip.
576 (1987).
99-164.
137. P. A. Kollman and K. M. Merz, Acc. Chem.
Res., 23, 246-252 (1990). 157. J. Pranata, S. G. Wierschke, and W. I. Jor-
gensen, J. Am. Chem. Soc., 113, 2810-2819
138. T. P. Lybrand, J. A. McCammon, and G. Wipff, (1991).
Proc. Natl. Acad. Sci. USA, 83, 833-835
(1986). 158. J. Tirado-Rives and W. L. Jorgensen, J. Am.
Chem. Soc., 112,2773-2781 (1990).
139. J. Hermans, R. H. Yun, and A. G. Anderson,
J. Comput. Chem., 13,429-442 (1992). 159. A. Alex and T. Clark, J. Comput. Chem., 13,
140. J. Hermans, Curr. Opin. Struct. Biol., 3, 270- 704-717 (1992).
276 (1993). 160. J. Aqvist and A. Warshel, Chem. Rev., 93,
141. R. Elber and M. Karplus, J. Am. Chem. Soc., 2523-2544 (1993).
112,9161-9175 (1990). 161. M. J. Field, P. A. Bash, and M. Karplus,
142. D. J. Tobias, J. E. Mertz, and C. L. Brooks 111, J. Comput. Chem., 11,700-783 (1990).
Biochemistry, 30,6054-6058 (1991). 162. A. Warshel, Computer Modeling of Chemical
143. D. J. Tobias and C. L. Brooks 111,Biochemistry, Reactions in Enzymes and Solutions, John
30,6059-6070 (1991). Wiley & Sons, New York, 1991, p. 236.
144. D. J. Tobias, S. F. Sneddon, and C. L. Brooks 163. V. Daggett, S. Schroder, and P. Kollman,
111, J. Mol. Biol., 216, 783-796 (1990). J. Am. Chem. Soc., 113,8926-8935 (1991).
145. S. F. Sneddon, D. J. Tobias, and C. L. Brooks 164. P. R. Andrews and D. A. Winkler in G. Jolles
111, J. Mol. Biol., 209, 817-820 (1989). and K. R. H. Wooldridge, Eds., Drug Design:
146. D. J. Tobias, S. F. Sneddon, and C. L. Brooks Fact or Fantasy?, Academic Press, New York,
I11 in R. Lavery, J.-L. Rivail, and J. Smith, 1984, pp. 145-174.
References
165. J. E. Eksterowicz and K. N. Houk, Chem. Rev., 179. G.Otting, Cum. Opin. Struct. Biol., 3,760-768
93,2439-2461(1993). (1993).
166. K.Appelt, R. J. Bacquet, C. A. Bartlett, C. L. J. 180. S. 0. Smith, Curr. Opin. Struct. Biol., 3, 755-
Booth, S. T. Freer, M. A. M. Fuhry, M. R. Geh- 759(1993).
ring, S. H. Herrmann, E. F. Howland, C. A. 181. M. F. Perutz, G. Fermi, D. J:Abraham, C. Po-
Janson, T. R. Jones, C.-C. Kan, V. Kathard- yart, and E. Bursa-, J. Am. Chem. Soc., 108,
ekar, K. K. Lewis, G. P. Marzoni, D. A. 1064-1078 (1986).
Mathews, C. Mohr, E. W. Moomaw, C. A. 182. A.S. Mehanna and D. J. Abraham, Biochemis-
Morse, S. J. Oatley, R. C. Ogden, M. R. Reddy, try, 29,3944-3954(1990).
S. H. Reich, W. S. Schoettin, W. W. Smith,
M. D. Varney, J. E. Villafranca, R. W. Ward, S.
183. I. D. Kuntz, J. M. Blaney, S. J. Oatley, R. Lan-
gridge, and T. E. Ferrin, J. Mol. Biol., 161,269
Webber, S. E. Webber, K. M. Welsh, and J.
(1982).
White, J. Med. Chem., 34, 1925-1934 (1991).
184. R. Voorintholt, M. T. Kosters, G. Vegter, G.
167. P. J. Goodford, J. Med. Chem., 27, 557-564 Vriend, and W. G. J. Hol, J. Mol. Graphics, 7,
(1984). 243-245(1989).
168. C. R. Beddell, Chem. Soc. Rev., 13, 279-319 185. C. M. W. Ho and G. R. Marshall, J. Cornput.-
(1984). Aided Mol. Des., 4,337454(1990).
169. R. Wootton in C. R. Beddell, Ed., The Design of 186. P. J. Goodford, J. Am. Chem. Soc., 28, 849-
Drugs to Macromolecular Targets, John Wiley 856(1985).
& Sons, New York, 1992,pp. 49-83.
187. R. D. Cramer 111, D. E. Patterson, and J. D.
170. L.F. Kuyper, B. Roth, D. P. Baccanari, R. Fer- Bunce, J. Am. Chem. Soc., 110, 5959-5967
one, C. R. Beddell, J. N. Champness, D. K. (1988).
Stammers, J. G. Dann, F. E. Norrington, D. J.
188. R. D. Cramer I11 and M. Milne, The Lattice
Baker, and P. J. Goodford, J. Med. Chem., 28,
Model: A General Paradigm for Shape-Related
303-311 (1985).
Structure/Activity Correlation, in Proceedings
171. K. Appelt, J. Cornput.-Aided Mol. Des., 1, of the 19th National Meeting of the American
23-48(1993). Chemical Society, American Chemical Society,
172. M.von Itzstein, W.-Y. Wu, G. B. Kok, M. S. Washington, DC, 1979.
Pegg, J. C. Dyason, B. Jin, T. V. Phan, M. L. 189. A. Miranker and M. Karplus, Proteins, 11,
Smythe, H. E. White, S. W. Oliver, P. M. Col- 29-34(1991).
man, J. N. Varghese, D. M. Ryan, J. M. Woods,
R. C. Bethell, V. J. Hotham, J. M. Cameron,
190. A. CafIisch, A. Miranker, and M. Karplus, .
J. Med. Chem., 36,2142-2167 (1993).
and C. R. Penn, Nature, 363,418-423 (1993).
191. P. K.Weiner, C. Landridge, J. M. Blaney, R.
173. J. W. Liebeschuetz, S. D. Jones, P. J. Morgan, Schaefer, and P. A. Kollman, Proc. Natl. Acad.
C. W. Murray, A. D. Rimmer, J. M. Roscoe, B. Sci. USA, 79,3754-3758(1982).
Waszkowycz, P. M. Welsh, W. A. Wylie, S. C. 192. S. J. Weiner, P. A. Kollman, D. A. Case, U.C.
Young, H. Martin, J. Mahler, L. Brady, and K. Singh, C. Ghio, G. Alagona, J. S. Profeta, and
Wilkinson, J. Med. Chem., 45, 1221-1232 P. Weiner, J. Am. Chem. Soc., 106, 765-784
(2002). (1984).
174. K.E. Lind, Z. Du, K. Fujinaga, B. M. Peterlin, 193. S. J. Weiner, P. A. Kollman, D. T. Nguyen, and
and T. L. James, Chem. Biol., 9, 185-193 D. A. Case, J. Comput. Chem., 7, 230-252
(2002). (1986).
175. M. Miller, M. Jaskolski, J. K. M. Rao, J. Leis, 194. F. H.Allen, J. E. Davies, J. J. Galloy, 0.John-
and A. Wlodawer, Nature, 337, 576-579 son, 0. Kennard, C. F. Macrea, E. M. Mitchell,
(1989). G. F. Mitchell, J. M. Smith, and D. G. Watson,
176. M. Miller, B.K. Sathyanarayana, A. Wlodawer, J. Chem. Znf. Comput. Sci., 31,187-204(1991).
M. V. Toth, G. R. Marshall, L. Clawson, L. 195. E. E. Abola, F. C. Bernstein, and T. F. Koetzle
Selk, J. Schneider, and S. B. H. Kent, Science, in P. S. Glaeser, Ed., The Role of Data in Sci-
246,1149-1152(1989). entific Progress, Elsevier, New York, 1985.
177. R. L. Stanfield, M. Takimoto-Kamimura, J. M. 196. P. R. Andrews, E. J. Lloyd, J. L. Martin, and
Rini, A. T. Profy, and I. A. Wilson, Structure, 1, S. L. A. Munro, J. Mol. Graphics, 4, 41-45
83-93(1993). (1986).
178. S. W. Fesik, J. Med. Chem., 34, 2938-2945 197. R. S. Pearlman, Chem. Des. Auto. News, 2,1
(1991). (1987).
Molecular Modeling in Drug Design
285. K. P. Clark and Ajay, J. Comput. Chem., 16, 308. F. M. Menger and M. J. Sherrod, J.Am. Chem.
1210-1226 (1995). SOC., 112,8071-8075 (1990).
286. G. M. Verkhivker, P. A. Rejto, D. K. Gehlhaar, 309. D. P. Riley, P. J. Lennon, W. L. Neumann, and
and S. T. Freer, Proteins, 25,342353 (1996). R. H. Weiss, J. Am. Chem. Soc., 119, 6522-
287. D. Q. McDonald and W. C. Still, J. Am. Chem. 6528 (1997).
Soc., 116,11550-11553 (1994). 310. K. Aston, N. Rath, A. Naik, U. Slomczynska,
288. F. Guarnieri and W. C. Still, J.Comput. Chem., 0.F. Schall, and D. P. Riley, Inorg. Chem., 40,
15,1302-1310 (1994). 1779-1789 (2001).
289. D. Q. McDonald and W. C. Still, J. Am. Chem. 311. D. H. Williams, Aldrichimica Acta, 24, 71-80
Soc., 118,2073-2077 (1996). (1991).
290. Z. R. Wasserman and C. N. Hodge, Proteins, 312. A. J. Doig and D. H. Williams, J. Am. Chem.
24,227-237 (1996). SOC., 114,338-343 (1992).
291. J. Desmet, I. A. Wilson, M. Joniau, M. De- 313. M. S. Searle and D. H. Williams, J. Am. Chem.
maeyer, and I. Lasters, FASEB J.,ll,164-172 SOC., 114,10690-10697 (1992).
(1997). 314. M. S. Searle, D. H. Williams, and U. Gerhard,
292. B. L. King, S. Vajda, and C. Delisi, FEBS Lett., J. Am. Chem. Soc., 114,10697-10704 (1992).
384,87-91(1996). 315. D. H. Williams and B. Bardsley, Perspect. Drug
293. D. S. Goodsell, H. Lauble, C. D. Stout, and A. J. Discov. Des., 17,43-59 (1999).
Olson, Proteins, 17, 3-10 (1993). 316. M. Graffner-Nordberg, K. Kolmodin, J.Aqvist,
294. R. X. Wangand S. M. Wang, J.Chem. In$ Com- S. F. Queener, and A. Hallberg, J. Med. Chem.,
put. Sci., 41,1422-1426 (2001). 44,2391-2402 (2001).
295. P. S. Charifson, J. J. Corkery, M. A. Murcko, 317. J. Aqvist, V. B. Luzhkov, and B. 0. Brandsdal,
and W. P. Walters, J. Med. Chem., 42, 5100- Acc. Chem. Rev., 35,358-365 (2002).
5109 (1999). 318. G. M. Verkhivker, D. Bouzida, D. K. Gehlhaar,
296. N. L. Allinger, Z.-q. S. Zhu, and K. Chen, P. A. Rejto, L. Schaffer, S. Arthurs, A. B. Col-
J. Am. Chem. Soc., 114,6120-6133 (1992). son, S. T. Freer, V. Larson, B. A. Luty, T. Mar-
rone, and P. W.Rose, J.Med. Chem., 45,72-89
297. B. R. Brooks, R. E. Bruccoleri, B. D. Olafson, (2002).
D. J. States, S. Swaminathan, and M. Karplus,
J. Comput. Chem., 4, 187-217 (1983). 319. R. Lumry and S. Rajender, Biopolymers, 9,
1125-1227 (1970).
298. F. A. Momany and R. Rone, J. Comput. Chem.,
13,888-900 (1992). 320. I. Muegge and Y. C. Martin, J.Med. hem., 22,
791-804 (1999).
299. G. Nemethy, M. S. Pottle, and H. A. Scheraga,
321. B. A. Grzybowski, A. V. Ishcheno, J. Shimada,
J. Phys. Chem., 87,1883-1887 (1983).
and E. I. Shakhnovich, Acc. Chem. Res., 35,
300. T. A. Halgren, J. Am. Chem. Soc., 114, 7827- 261-269 (2002).
7843 (1992). 322. B. A. Grzybowski, A. V. Ishcheno, C.-Y. Kim,
301. P. S. Charifson, R. G. Hiskey, L. G. Pedersen, G. Topolov, R. Chapman, D. W. Christianson,
and L. F. Kuyper, J. Comput. Chem., 12,899- G. M. Whitesides, and E. I. Shakhnovich, Proc.
908 (1991). Natl. Acad. Sci. USA, 99,1270-1273 (2002).
302. S. C. Hoops, K. W. Anderson, and K. M. Merz, 323. K. M. Merz, Jr., M. A. Murcko, and P. A. Koll-
Jr., J.Am. Chem. Soc., 113,8262-8270 (1991). man, J. Am. Chem. Soc., 113, 4484-4490
303. C. J. Casewit, K. S. Colwell, and A. K. Rappe, (1991).
J. Am. Chem. Soc., 114,10035-10046 (1992). 324. C. F. Wong and J. A. McCammon, J. Am.
304. C. J. Casewit, K. S. Colwell, and A. K. Rappe, Chem. Soc., 108,3830-3832 (1986).
J. Am. Chem. Soc., 114,10046-10053 (1992). 325. L. M. Hansen and P. A. Kollman, J. Comput.
305. A. K. Rappe, C. J. Casewit, K. S. Colwell, W. A. Chem., 11,994-1002 (1990).
Goddard 111, and W. M. Skiff, J. Am. Chem. 326. B. G. Rao, R. F. Tilton, and U. C. Singh, J.Am.
SOC., 114,10024-10035 (1992). Chem. Soc., 114,4447-4452 (1992).
306. Y.-D. Wu and K. N. Houk, J. Am. Chem. Soc., 327. W. E. Harte, Jr. and D. L. Beveridge, J. Am.
114,1656-1661 (1992). Chem. Soc., 115,3883-3886 (1993).
307. K. Houk, J. A. Tucker, and A. Dorigo, Acc. 328. D. M. Ferguson, R. J. Radmer, and P. A. Koll-
Chem. Res., 23,107-113 (1990). man, J. Med. Chem., 34,2654-2659 (1991).
J. J. McDonald and C. L. Brooks 111, J. Am. 348. L. M. H. Koyrnans, N. P. E. Vermeulen, A.
Chem. Soc., 114,2062-2072 (1992). Baarslag, and G. M. Donne-op den Kelder,
T . P. Lybrand and J. A. McCammon, J. Com- J. Cornput.-Aided Mol. Des., 7,281-289 (1993).
put.-Aided Mol. Des., 2,259-266 (1988). 349. R. E. Bruccoleri and M. Karplus, Biopolymers,
W . R. Cannon, J. D. Madura, R. P. Thummel, 26,137-168 (1987).
and J . A. McCammon, J. Am. Chem. Soc., 115, 350. D. Jones and J. Thornton, J. Cornput.-Aided
879-884 (1993). Mol. Des., 7,439-456 (1993).
S. Yun-yu,A. E. Mark, W . Cun-Xin, H. Fuhua, 351. J. U. Bowie and D. Eisenberg, Curr. Opin.
J. C. Berendsen, and W . F. van Gunsteren, Struct. Biol., 3,437-444 (1993).
Protein Eng., 6, 289-295 (1993). 352. S. J. Wodak and M. J. Rooman, Curr. Opin.
D. H . Rich, C.-Q. Sun, J. V . N. Vara Prasad, Struct. Biol., 3, 247-259 (1993).
M . V . Toth, G. R. Marshall, P. Ahammadunny,
353. M. J. Sippl, J. Cornput.-Aided Mol. Des., 7,
M . D. Clare, R. D. Mueller, and K. Houseman,
473-501 (1993).
J. Med. Chem., 34,1222-1225 (1991).
P. De La Paz, J. M. Burridge, S. J. Oatley, and 354. S. Miyazawa and R. C. Jernigan, Macromole-
C. C. F. Blake i n C. R. Beddell, Ed., The Design cules, 18, 534-552 (1985).
of Drugs to Macromolecular Targets, John 355. S. H. Bryant and C. E. Lawrence, Proteins, 16,
Wiley & Sons, New York, 1992, pp. 119-172. 92-112 (1993).
A. B. Edmundson, J. N. Herron, K. R. Ely, 356. D. Frishman and H. W . Mewes, Nut. Struct.
X.-M. He, D. L. Harris, and E. W . Voss, Jr., Biol., 4,626-628 (1997).
Philos. Trans. R. Soc. Lond. Biol., 323, 495- 357. C. Chothia, Nature, 357,543-544 (1992).
509 (1989).
358. P.-A. Lindgard and H. Bohr in H. Bohr and S.
C. Bihoreau, C. Monnot, E. Davies, B. Teutsch, Bunak, Eds., Protein Folds, CRC Press, Boca
K. E. Bernstein, P. Corvol, and E. Clauser, Raton, FL, 1996, pp. 98-102.
Proc. Natl. Acad. Sci. USA, 90, 5133-5137
(1993). 359. P. E. Boscott, G. J. Barton, and W . G. Richards,
Protein Eng., 6, 261-266 (1993).
T . M . Fong, R. R. C. Huang, and C. D. Strader,
J. Biol. Chem., 267,25664-25667 (1992). 360. D. Frishman and P. Argos, Proteins, 27, 329-
335 (1997).
U. Gether, T . E. Johansen, R. M. Snider, I.
Lowe, J . A., S. Nakanishi, and T . W . Schwartz, 361. R. Srinivasan and G. D. Rose, Proteins, 22,
Nature, 362,345-348 (1993).
M . F. Hibert, S. Trumpp-Kallmeyer, A. Bruin-
81-99 (1995).
362. K. A. Dill, H. S. Chan, and K. Yue, Macromol.
.
vels, and J. Hoflack,Mol. Pharmacol., 40,8-15 Symp., 98,615-617 (1995).
(1991). 363. K. Y u e and K. A. Dill, Protein Sci., 5,254-261
M . F. Hibert, S. Trumpp-Kallmeyer, J. (1996).
Hoflack,and A. Bruinvels, Trends Pharmacol. 364. S. M. Le Grand and K. L. Merz, Jr. in S. M. Le
Sci., 14, 7-12 (1993). Grand and K. L. Merz, Jr., Eds., The Protein
G. Nordvall and U. Hacksell, J. Med. Chem., Folding Problem and Tertiary Structure Pre-
36,967-976 (1993). diction, Birkhauser, Boston, 1994, pp.
T . L. Blundell, B. L. Sibanda, M. J. E. Stern- 109-124.
berg, and J. M. Thornton, Nature, 326, 347- 365. S. Sun, Protein Sci., 2, 762-785 (1993).
352 (1987). 366. J. U. Bowie and D. Eisenberg, Proc. Natl. Acad.
, L. H. Pearl and W . R. Taylor, Nature, 329, Sci. USA, 91,4436-4440 (1994).
351-354 (1987). 367. J . U. Bowie, K. Zhang, M. Wilmanns, and D.
, I. T. Weber, Proteins, 7, 172-184 (1990). Eisenberg, Methods Enzymol., 266, 598-616
. L. M. Balbes and F. I. Carroll, Med. Chem. (1996).
Res., 1, 283-288 (1991). 368. S. G. Galaktionov and G. Marshall, Molecular
, R. J. Siezen,W . M. de Vos, J. A. M. Leunissen, Graphics and Drug Design: 27th Hawaii Znter-
and B. W . Dijkstra, Protein Eng., 4, 719-737 national Conference on System Sciences, IEEE
(1991). Computer Society Press, Washington, DC,
, J. H. Brown, T . Jardetzky, M. A. Saper, B.
1994.
Samraoui, P. J. Bjorkman, and D. C. Wiley, 369. S. Saitoh, T . Nakai, and K. Nishikawa, Pro-
Nature, 332,845-850 (1988). teins, 15, 191-204 (1993).
Molecular Modeling in Drug Design
370. M. Vendruscolo, E. Kussell, and E. Domany, 387. G. L. Olson, D. R. Bolin, M. P. Bonner, M. Bos,
Fold. Des., 2,295-306 (1997). C. M. Cook, D. C. Fry, B. J. Graves, M. Hatada,
371. I. D. Kuntz, G. M. Crippen, P. A. Kolman, and D. E. Hill, M. Kahn, V. S. Madison, V. K.
D. Kimelman, J. Mol. Biol., 106, 983-994 Rusiecki, R. Sarabu, J. Sepinwall, G. P. Vin-
(1976). cent, and M. E. Voss, J.Med. Chem., 36,3039-
372. S. Galaktionov, G. V. Nikiforovich, and G. R. 3049 (1993).
Marshall, Biopolymers, 60, 153-168 (2001). 388. P. C. Belanger and C. Dufresne, Can. J. Chem.,
373. A. Aszodi and W. R. Taylor, Fold. Des., 1,325- 64,1514-1520 (1986).
334 (1996). 389. R. Hirschmann, K. C. Nicolaou, S. Pietranico,
374. A. Aszodi, R. E. J. Munro, and W. R. Taylor, E. M. Leahy, J. Salvino, B. Arison, M. A. Cichy,
Fold. Des., 2, S3-S6 (1997). P. G. Spoors, W. C. Shakespeare, P. A. Spren-
geler, P. Hamley, A. B. Smith 111, T. Reisine, K.
375. A. Aszodi and W. R. Taylor, Comput. Chern.,
Raynor, L. Maechler, C. Donaldson, W. Vale,
21, 13-23 (1997).
R. M. Friedinger, M. R. Cascieri, and C. D.
376. S. R. Holbrook, I. Dubchak, and S.-H. Kim, Strader, J. Am. Chem. Soc., 115,12550-12568
Biotechniques, 14, 984-989 (1993). (1993).
377. B. H. Park, E. S. Huang, and M. Levitt, J.Mol. 390. J. H. Arevalo, E. A. Stura, M. J. Taussig, and
Biol., 266, 831-846 (1997). I. A. Wilson,J.Mol. Biol., 231,103-118 (1993).
378. G. M. Crippen and V. N. Maiorov in H. Bohr 391. J. H. Arevalo, M. J. Taussig, and I. A. Wilson,
and S. Bunak, Eds., Protein Folds, CRC Press, Nature, 365,859-863 (1993).
Boca Raton, FL, 1996, pp. 189-201. 392. P. Traxler, J. Green, H. Mett, U. Sequin, and
379. P. D. Thomas and K. A. Dill, J. Mol. Biol., 257, P. Furet, J. Med. Chem., 42, 1018-1026
457-469 (1996). (1999).
380. S. Miyazawa and R. L. Jernigan, J. Mod. Biol., 393. P. Traxler, G. Bold, E. Buchdunger, G. Cara-
256,623-644 (1996). vatti, P. Furet, P. Manley, T. O'Reilly, J. Wood,
381. V. J. Hruby, W. Qui, T. Okayama, and V. A. and J . Zimmermann, Med. Res. Rev., 21,499-
Soloshonok, Methods Enzymol., 343, 91-123 512 (2001).
(2002). 394. R. Bureau, C. Daveu, J. C. Lancelot, and S.
382. M. G . Bursavich and D. H. Rich, J. Med. Rault, J. Chem. Znf. Comput. Sci., 42,429-436
Chem., 45,541-558 (2002). (2002).
383. R. Hirschmann, K. C. Niwlaou, S. Pietranico, 395. Y. Kato, A. Itai, and Y. Iitaka, Tetrahedron
J. Salvino, E. M. Leahy, P. A. Sprengeler, G. Lett., 43,5229-5236 (1987). .
Furst, and A. B. Smith 111,J. Am. Chem. Soc., 396. W. H. Moos, C. C. Humblet, I. Sircar, C. Rith-
114,9217-9218 (1992). ner, R. E. Weishaar, J. A. Bristol, and A. T.
384. R. Hirschmann, P. A. Sprengeler, T. Ka- McPhail, J. Med. Chem., 30, 1963-1972
wasaki, J. W. Leahy, W. C. Shakespeare, and (1987).
A. B. Smith 111,J.Am. Chem. Soc., 114,9699- 397. D. Mayer, C. B. Naylor, I. Motoc, and G. R.
9701 (1992). Marshall, J. Cornput.-Aided Mol. Des., 1,3-16
385. T. W. Ku, F. E. Ali, L. S. Barton, J. W. Bean, (1987).
W. E. Bondinell, J. L. Burgess, J. F. Callahan, 398. G. R. Marshall, C. D. Barry, H. E. Bosshard,
R. R. Calvo, L. Chen, D. S. Eggelston, J. S. R. A. Dammkoehler, and D. A. Dunn in E. C.
Gleason, W. F. Huffman, S. M. Hwang, D. R. Olsen and R. E. Christoffersen, Eds., Com-
Jakas, C. B. Karash, R. M. Keenan, K. D. Kop- puter-Assisted Drug Design, American Chem-
ple, W. H. Miller, K. A. Newlander, A. Nichols, ical Society, Washington, DC, 1979, pp. 205-
M. F. Parker, C. E. Peishoff, J. M. Samanen, I. 226.
Uzinskas, and J. W. Venslavsky, J. Am. Chem. 399. R. A. Dammkoehler, S. F. Karasek, E. F. B.
Soc., 115,8861-8862 (1993). Shands, and G. R. Marshall, Constrained
386. G. L. Olson, H.-C. Cheung, M. E. Voss, D. E. Search of Conformational Hyperspace: Seg-
Hill, M. Kahn, V. S. Madison, C. M. Cook, J. mentation and Parallelism, Abstr. 204th ACS
Sepinwall, and G. Vincent, Concepts a n d National Meeting, American Chemical Society,
Progress in the Design of Peptide Mimetics: Washington, DC, 1992.
Beta Turns and Thyrotropin Releasing Hor- 400. G. R. Marshall and R. D. Cramer 111, Trends
mone (Biotechnology USA 1989), Conference Pharmacol. Sci., 9,285-289 (1988).
Management Corporation, Norwald, CT, 1989, 401. R. D. Cramer 111and S. B. Wold, Comp. Mol.
pp. 348-360. Field Anal. (CoMFA), 5,388 (editorial) (1991).
J. R. Sufrin, D. A. Dunn, and G. R. Marshall, 423. S. K. Kearsley, J. Comput. Chem., 11, 1187-
Mol. Pharmacol., 19, 307313 (1981). 1192 (1990).
P. R. Andrews, E. J . Lloyd, J. L. Martin, S. L. 424. G. R. Marshall and C. D. Barry, Functional
Munro, M. Sadek, and M. G. Wong i n A. S. V . Representation of Molecular Volume for Com-
Burgen, G. C. K. Roberts, andM. S. Tute, Eds., puter-Aided Drug Design, Abstr. Amer. Cryst.
Molecular Graphics and Drug Design, Elsevier Assoc., Honolulu, HI, 1979.
Science, Amsterdam, 1986, pp. 216-255. 425. A. J. Hopfinger, J. Med. Chem., 2, 7196-7206
G. Klopman and S. Srivastava, Mol. Pharma- (1980).
col., 37,958-965 (1989). 426. Z. Simon, A. Chiriac, S. Holban, D. Ciubotariu,
G. Klopman and M. L. Dimayuga, J. Cornput.- and G. I. Mihalas, Minimum Steric Difference,
Aided Mol. Des., 4, 117-130 (1990). Research Studies Press, Letchworth, UK,
G. Rum and W . C. Herndon, J. Am. Chem. Soc., 1984.
113,9055-9060 (1991). 427. D. Ciubotariu, E. Deretey, T . I. Oprea, T . I.
Sulea, Z. Simon, L. Kurunczi, and A. Chiriac,
C. Silipo and A.Vittoria in C. A. Ramsden, Ed.,
Quant. Struct.-Act. Relat., 12,367-372 (1993).
Quantitative Drug Design, Pergamon Press,
Oxford,UK, 1990, pp. 153-204. 428. H.-D. Holtje and S. Marrer, J. Cornput.-Aided
Mol. Des., 1,23-30 (1987).
G. M. Crippen in D. Bawden, Ed., Distance Ge-
ometry and Conformational Calculations (Che- 429. A. J. Hopfinger, J. Med. Chem., 26, 990-996
mometrics Research Studies), Vol. 1, John (1983).
Wiley & Sons, Chichester, UK, 1981. 430. S. Namasivayam and P. M. Dean, J. Mol.
D. E. Clark, P. Willett, and P. W . Kenny, J. Graphics, 4,46 (1986).
Mol. Graphics, 10, 194-204 (1992). 431. P. L. Chau and P. M. Dean, J. Mol. Graphics, 5,
97 (1987).
C. A. Pepperrell and P. Willett, J. Cornput.-
Aided Mol. Des., 5,455-474 (1991). 432. C. Burt and W . G. Richards, J. Cornput.-Aided
Mol. Des., 4,231-238 (1990).
A. R. Poirette, P. Willett, and F. H. Allen, J.
Mol. Graphics, 11,2-14 (1993). 433. J. Zabrocki, G. D. Smith, J. B. Dunbar, Jr., H.
Iijima, and G. R. Marshall, J. Am. Chem. Soc.,
G. R. Marshall and C. B. Naylor in C. A. Rams-
110,5875-5880 (1988).
den, Ed., Quantitative Drug Design, Pergamon
Press, Oxford,U K , 1990, pp. 431-458. 434. J. B. Ball, R. A. Hughes, P. F. Alewood, and
P. R. Andrews, Tetrahedron, 49, 34673478
A. Davis, B. H. Warrington, and J. G. Vinter, (1993).
J. Cornput.-Aided Mol. Des., 1,97-120 (1987).
435. J. B. Ball and P. F. Alewood, J. Mol. Recognit.,'
H. Weinstein, R. Osman, S. Topiol, and J. P. 3,55-64 (1990).
Green, Ann. N. Y. h a d . Sci., 367, 434-448
436. G. V . Nikiforovich, K. E. Kover, W . J. Zhang,
(1981).
and G. R. Marshall, J. Am. Chem. Soc., 122,
N. C. Cohen in B. Testa, Ed., Advances in Drug 32623273 (2000).
Research, Academic Press, New York, 1985, 437. J. Labanowski, I. Motoc, C. B. Naylor, D.
pp. 40-144. Mayer, and R. A. Dammkoehler, Quant.
R. C. Wade, K. J. Clark, and P. J. Goodford, Struct.-Act. Relat., 5, 138-152 (1986).
J. Med. Chem., 36,140-147 (1993). 438. S. Naruto, I. Motoc, and G. R. Marshall, Eur.
N. Marchand-Geneste, K. A.Watson, B. K. Als- J. Med. Chem., 20,529-532 (1985).
berg, and R. D. King, J. Med. Chem., 45,399- 439. R. P. Sheridan and R. Venkataraghavan,
409 (2002). J. Cornput.-Aided Mol. Des., 1,243-256 (1987).
, C. Hansch, J. Mcclarin, T . Klein, and R. Lan-
440. E. E. Hodgkin, A. Miller, and M. Whittaker,
gridge, Mol. Pharmacol., 27, 493-498 (1995). J. Cornput.-AidedMol. Des., 7,515-534 (1993).
C. Hansch, T. Klein, J. McClarin, R. Lan- 441. M. T . Barakat and P. M. Dean, J. Cornput.-
gridge, and N. W . Cornell, J. Med. Chem., 29, Aided Mol. Des., 4,295-316 (1990).
615-620 (1986). 442. M. T . Barakat and P. M. Dean, J. Cornput.-
, G. E. Kellogg, S. F. Semus, and D. J. Abraham, Aided Mol. Des., 4,317-330 (1990).
J. Cornput.-AidedMol. Des., 5,545-552 (1991). 443. T . D. J. Perkins and P. M. Dean, J. Cornput.-
. G. E. Kellogg and D. J. Abraham, J. Mol. Aided Mol. Des., 7,173-182 (1993).
Graphics, 10,212-217 (1992). 444. I. Motoc, J. Labanowski, C. B. Naylor, D.
, D. J . Danziger and P. M. Dean, J. Theor. Biol., Mayer, and R. A. Dammkoehler, Quant.
116,215-224 (1985). Struct.-Act. Relat., 5, 99-105 (1986).
Molecular Modeling in Drug Design
445. R. D. Nelson, D. I. Gottlieb, T. M. Balasubra- 463. A. W. Schmidt and S. J. Peroutka, Mol. Phar-
manian, and G. R. Marshall in R. S. Rapaka, G. macol., 36,505-511 (1989).
Barnett, and R. L. Hawks, Eds., Opioid Pep- 464. M. L. Connolly, Science, 221,709-713(1983).
tides: Medicinal Chemistry, NIDA Office of 465. M. L. Connolly, J. Appl. Crystallogr., 16, 548-
Science, Rockville, MD, 1986,pp. 204-230. 558(1983).
446. A. K. Ghose and G. M. Crippen, J.Med. Chem., 466. C. E. Kundrot, J. W. Ponder, and F. M. Rich-
27,901-914(1984). ards, J. Comput. Chem., 12,402-409(1991).
447. A. K. Ghose and G. M. Crippen, J.Med. Chem., 467. S. M. Le Grand and K. M. Merz, Jr., J. Comput.
28,333-346(1985). Chem., 14,349-352(1993).
448. A. K. Ghose and G. M. Crippen in C. A. Rams- 468. A. H. Beckett and A. F. Casey, J. Pharm. Phar-
den, Ed., Quantitative Drug Design, Pergamon macol., 6,986-999(1954).
Press, Oxford, UK,1990,pp. 716-733.
469. L. B. Kier and H. S. Aldrich, J. Theor.Biol.,46,
449. A. K. Ghose and G. M. Chippen, Mol. Pharma- 529-541(1974).
col., 37,725-734(1990).
470. L. G. Humber, F. T. Bruderlin, A. H. Philipp,
450. M.R. Linschoten, T. Bultsma, A. P. IJzerman, M. Gotz, and K. Voith, J. Med. Chem., 22,761-
and H. Timmerman, J. Med. Chem., 29,278- 767(1979).
286(1986).
471. G. L. Olson, H. C. Cheung, K. D. Morgan, J. F.
451. G. M. Donne-op den Kelder, J. Cornput.-Aided Blount, L. Todaro, L. Berger, A. B. Davidson,
Mol. Des., 1,257-264(1987). and E. Boff, J. Med. Chem., 24, 1026-1034
452. T. I. Oprea, D. Ciubotariu, T. I. Sulea, and Z. (1981).
Simon, Quant. Struct.-Act. Relat., 12, 21-26 472. H.-D. Holtje and M. Tintelnot, Quant. Strut.-
(1993). Act. Relat., 3,6-9(1984).
453. J. P. Snyder, S. N. Rao, K. F. Koehler, A. Ve- 473. W. C. Probst, L. A. Snyder, D. J. Schuster, J.
dani, and R. Pellicciari in C. G. Wermuth, Ed., Brosius, and S. C. Sealfon, DNA Cell Biol., 11,
Trends in QSAR and Molecular Modelling 92, 1-20(1992).
ESCOM Scientific, Leiden, Netherlands, 1993,
pp. 44-51. 474. S. Trumpp-Kallmeyer, J. Hoflack, A. Bruin-
vels, and M. Hibert, J. Med. Chem., 35,3448-
454. G. Klopman, Quant. Struct.-Act. Relat., 11, 3462(1992).
176-185(1992).
475. D. Timms, A. J. Wilkinson, D. R. Kelly, K. J.
455. I. B. Bersuker and A. S. Dimogo in K. B. Lip- Broadley, and R. H. Davies, Znt. J. Quantum
kowitz and D. B. Boyd, Eds., Revisions in Com- Chem. Quantum Biol. Symp., 19, 197-215
putational Chemistry, VCH, New York, 1991, (1992).
pp. 423-460.
476. D. Zhang and H. Weinstein, J. Med. Chem.,36,
456. Y. C. Martin, M. G. Bures, E. A. Danaher, J. 934-938(1993).
DeLazzer, I. Lico, and P. Pavlik, A., J. Com-
put.-Aided Mol. Des., 7,83-102(1993). 477. B. L. Bush and R. B. Nachbar, Jr., J. Cornput.-
Aided Mol. Des., 7,587-619(1993).
457. G. Jones, P. Willett, and R. C. Glen, J. Mol.
Biol., 245,4343(1995). 478. J. N. Weinstein, K. W. Kohn, M. R. Grever,
V. N. Viswanadhan, L. V. Rubeinstein, A. P.
458. T. I. Oprea and L. Kurunczi in N. Voiculetz, I. Monks, D. A. Scudiero, L. Welch, A. D. Kout-
Motoc, and Z. Simon, Eds., Specific Znterac- soukos, A. J. Chiausa, and K. D. Paull, Science,
tions and Biological Recognition Processes, 258,447-451(1992).
CRC Press, BocaRaton, FL, 1993,pp. 295-326.
479. T. A. Andrea and H. Kalayeh, J. Med. Chem.,
459. W. E. Klunk, B. L. Kalman, J. A. Ferrendelli,
34,2824-2836(1991).
andD. F. Covey, Mol. Pharmacol., 23,511-518
(1982). 480. S.-S. So and W. G. Richards, J. Med. Chem., 35,
460. J. A. Calder, J. A. Wyatt, D. A. Frenkel, and 32014207 (1992).
J. E. Casida, J. Cornput.-Aided Mol. Des., 7, 481. I. V. Tetko, A. I. Luik, and G. I. Poda, J. Med.
45-60 (1993). Chem., 36,811-814(1993).
461. M. F. Hibert, R. Hoffmann, R. C. Miller, and 482. R. D. King, S. Muggleton, R. A. Lewis, and
A. A. Cam, J. Med. Chem., 33, 1594-1600 M. J. E. Sternberg, Proc. Natl. Acad. Sci. USA,
(1990). 89,11322-11326(1992).
462. M. F. Hibert, M. W. Gittos, D. N. Middlemiss, 483. S. A. DePriest, E. F. B. Shands, R. A. Damm-
A. K. Mir, and J. R. Fozard, J. Med. Chem., 31, koehler, and G. R. Marshall in C. Silipo and A.
1087-1093(1988). Vittoria, Eds., QSAR: Rational Approaches to
the Design of Bioactive Compounds, Elsevier R. P. Mason, D. G. Rhodes, and L. G. Herbette,
Science, Amsterdam, 1991, pp. 405-414. J. Med. Chem., 34,869-877 (1991).
S. A. DePriest, D. Mayer, C. B. Naylor, and L. G. Herbette in C. G. Wermuth, Ed., Trends
G. R. Marshall, J. Am. Chem. Soc., 115,5372- in QSAR and Molecular Modelling 92, ES-
5384 (1993). COM Scientific, Leiden, Netherlands, 1993,
C. Hansch, Acc. Chem. Res., 26, 147-153 pp. 76-85.
(1993). H. Heller, M. Schaeffer, and K. Schulten, J.
G. Klebe and U . Abraham, J. Med. Chem., 36, Phys. Chem., 97,8343-8360 (1993).
70-80 (1993).
W . Im and B. R o n , J. Mol. Biol., 319, 1177-
D. P. Getman, G. A. DeCrescenzo, R. M. 1197 (2002).
Heintz, K. L. Reed, J. J. Talley, M. L. Bryant,
M. Clare, K. A. Houseman, J. J. Marr, R. A. T . Kataoka, D. D. Beusen, J. D. Clark, M. Yodo,
Mueller, M. L. Vazquez, H.-S. Shieh, W . C. and G. R. Marshall, Biopolymers, 32, 1519-
Stallings, and R. A. Stegeman, J. Med. Chem., 1533 (1992).
36,288-291 (1993). G. R. Marshall, Tetrahedron, 49, 3547-3558
G. M. Crippen, J. Comput. Chem., 8,943-955 (1993).
(1987). G. V . Nikiforovich and G. R. Marshall, Bio-
M. P. Bradley and G. M . Crippen, J. Med. chem. Biophys. Res. Commun., 195, 222-228
Chem., 36,3171-3177 (1993). (1993).
F. Major, M. Turcotte, D. Gautheret, G. Lap- G. V. Nikiforovich and V . J. Hruby, Biochem.
CHAPTER FOUR
DAVID A. CASE
The Scripps Research Institute
Department of Molecular Biology
La Jolla, California
Contents
1 Introduction
2 Energy Components for Intermolecular Noncovalent Interactions
2.1 Electrostatic Energy
2.2 Exchange Repulsion Energy
2.3 Polarization Energy
2.4 Charge Transfer Energy
2.5 Dispersion Attraction
2.6 Summary
3 Molecular Mechanics Force Fields
3.1 Biochemical Force Fields
3.2 Force Field Models for Simple Liquids
3.3 Nonadditive and More Complex Models
3.4 Long Range Electrostatic Effects
4 Thermodynamics of Association
4.1 Gas Phase Association
4.2 Solvation Effects
4.3 An Illustrative Example: Protonation of Amines
5 Calculating Free Energies
6 Examples of Drug-Receptor Interactions
6.1 Biotin-Avidin
6.2 Dihydrofolate Reductase-Trimethoprim
6.3 Nucleotide Intercalator
7 Summary
Burger's Medicinal Chemistry and Drug Discovery
Sixth Edition, Volume 1: Drug Discovery
Edited by Donald J. Abraham
ISBN 0-471-27090-3 © 2003 John Wiley & Sons, Inc.
Drug-Target Binding Forces: Advances in Force Field Approaches
[Figure 4.1: potential energy plotted against atom-atom distance (Å), in two panels.]
Figure 4.1. Potential energy curves for atom-atom interactions in O2, N+, and the O···O interaction in a water dimer. Note the different energy scales on the left and right.
[...] dominated by the first nonvanishing multipole moment $M_n$ of the charge distribution,

$$M_n = \sum_{i=1}^{\text{no. charges}} q_i r_i^n$$

where $q_i$ are the individual charges and $r_i$ is the vector from the origin of the coordinate system to the $i$th charge (5,6). Molecules that are charged have a nonzero zeroth moment $M_0$. Ionic crystals such as Na+Cl- are held together predominantly by electrostatic attraction between oppositely charged ions. Crystals of ice I are mainly held together by dipolar electrostatic forces, where $M_0 = 0$ and $M_1 \neq 0$, because there are virtually no ions in these crystals. It should be noted here that "hydrogen bonding" is not a separate energy component; typically hydrogen bonds contain important energy contributions from all five energy components, although the electrostatic component is usually the largest contributor to this interaction (7).

Of the intermolecular energy components, the electrostatic is the longest range (i.e., it dies off most slowly with distance as the two molecules separate). Ion-ion interactions die off as $1/R$; ion-dipole as $1/R^2$; dipole-dipole as $1/R^3$, etc. In general, if two molecules have as their first nonvanishing multipole moments $M_n$ and $M_m$, the electrostatic interaction energy between them dies off as $1/R^{n+m+1}$. The electrostatic interaction energy between water, a dipolar molecule ($n = 1$), and benzene, whose first nonvanishing moment is a quadrupole ($m = 2$), dies off as $1/R^4$.

2.2 Exchange Repulsion Energy

The Pauli principle keeps electrons with the same spin spatially apart. This principle applies whether one is dealing with electrons on the same molecule or on different molecules and is the predominant repulsive force (6) that keeps electrons of different molecules from interpenetrating when noncovalent complexes are formed. This repulsive term is often represented by an analytical function of the form

$$E_{\mathrm{rep}} = \frac{A}{R^{12}}$$

where $R$ is the distance between molecules or nonbonded atoms and $A$ is a constant that depends on the atom types. However, the best available quantum mechanical calculations suggest that this repulsion should diminish with an exponential dependence on the distance between the atoms (6). This difference is only important for very precise calculations: the key point is that the repulsive energy rises very quickly once the electrons from two different atoms overlap significantly. Roughly speaking, this happens when the distance between two atoms is less than the sum of their van der Waals radii. Table 4.2 gives some typical radii for atoms commonly found in organic molecules.

Table 4.2 Selected Atomic van der Waals Radii (in Å)
Element        r_VDW
Hydrogen
Carbon
Nitrogen
Oxygen
Fluorine
Phosphorus
Sulfur
Chlorine
Bromine
Values from A. Bondi, J. Phys. Chem. 68, 441 (1964).

2.3 Polarization Energy

When two molecules approach each other, there is charge redistribution within each molecule, leading to an additional attraction between the molecules. The energy associated with this charge redistribution is invariably attractive and is called the polarization energy. For example, if a molecule with polarizability $\alpha$ is placed in an electric field, $E$, the polarization energy is

$$E_{\mathrm{pol}} = -\tfrac{1}{2}\alpha E^2$$

If the electric field is caused by an ion, then $\mathbf{E} = q\,\mathbf{i}/R^2$, where $q$ is the ionic charge, $\mathbf{i}$ is the unit vector along the ion-molecule direction, and $R$ the ion-molecule distance, which leads to $E_{\mathrm{pol}} = -\tfrac{1}{2}\alpha q^2/R^4$ for this ion-induced dipole interaction. The corresponding formula for dipole-induced dipole interaction between two molecules is

$$E_{\mathrm{pol}} = -\frac{\alpha_1 \mu_2^2 + \alpha_2 \mu_1^2}{R^6}$$

where the $\mu$'s are the dipole moments of the molecules, the $\alpha$'s are their polarizabilities, and $R$ is the distance between molecules. The polarizability of a molecule can be broken down into atomic contributions [atomic polarizabilities are additive to a good approximation (8)]; it is roughly proportional to the number of valence electrons and also depends on how tightly these valence electrons are bound to the nuclei. Umeyama and Morokuma (9) have calculated the ion-induced dipole contribution to the proton affinities of the simple alkyl amines. They attributed the order of gas phase proton affinities in the alkyl amines [NH3 < CH3NH2 < (CH3)2NH < (CH3)3N] to the greater polarizability of a methyl group than a hydrogen. A simple estimate using the above empirical equation for an ion-induced dipole interaction with $q = +1$, a difference in polarizabilities of a methyl and a hydrogen of $\Delta\alpha$ = 4 cm3, a proton-methyl distance of 2.0 Å, and a proton-proton distance of 1.6 Å, leads to an expected increase of ~20 kcal/mol of proton affinity for every methyl group added to NH3. This very qualitative estimate is of the right magnitude but about two to three times too large (see below).

2.4 Charge Transfer Energy

When two molecules interact, there is often a small amount of electron flow from one to the other. For example, in the equilibrium geometry of the linear water dimer HO-H···OH2, the water molecule that is the proton acceptor has transferred about 0.05 e- to the proton donor water (9, 10). The attractive energy associated with this charge transfer is the charge transfer energy and can be thought of as a mixing of an ionic resonance structure HO(-)···H-OH2(+) into the overall wave function. Although the charge transfer energy is an important contributor to the interaction energy of most noncovalent complexes, the presence of a "charge transfer" electronic transition in the visible spectrum does not mean that the charge transfer energy is the predominant force holding the complex together in its ground state. For example, the complex between benzene and I2, earlier thought to be a prototype "charge transfer" complex, seems to be held together predominantly by electrostatic, polarization, and dispersion energies in its ground electronic state (11).
174 Drug-Target Binding Forces: Advances in Force Field Approaches
molecules with thousands of atoms. Over the On the other hand, biochemists, guided by
past quarter century, many interesting ap- an interest in proteins and nucleic acids, have
plications of such molecular mechanical more generally followed a "bottom up" ap-
methods to complex molecules have been proach (16,19,20).This approach focuses first
carried out (17). on the atomic charges q,. The most general
The ideas that are outlined in a qualitative method to derive the atomic charges is to fit
way above can also be cast into a useful math- them to quantum mechanically calculated
ematical form for computer calculation. The electrostatic potentials on appropriately cho-
basic idea is to write down a (fairly simple and sen molecules or fragments. In early attempt
approximate) function that gives the energy of to do this, computational limitations in quan-
the system as a function of the positions (or tum mechanical calculations led to the use of a
coordinates) of its atoms. Because the deriva- minimal basis set STO-3G to derive the q i(16).
tive (or gradient) of this function yields the More recent efforts have used a 6-31G* or
forces for Newton's equations, such a function larger basis set (19). The 6-31G* basis set has
is often called a "force field"; and because mol- the fortunate property in that it leads to
ecules are viewed as being made up of balls charges (dipole moments) that are enhanced
and springs (so that quantum effects are ig- over accurate gas phase experimental values,
nored), the term "molecular mechanics" is and thus, implicitly builds in "polarization"
used to represent a concrete, mechanical pic- effects characteristic of polar molecules in
ture of molecular motions and energies. condensed phases. The fact that this basis set
enhances the polarity just about the same
3.1 Biochemical Force Fields amount as the popular water models TIP3P
Equation 4.1 represents about the simplest (21) and SPC (22), (where the charges are em-
functional form of a force field that preserves the pirically adjusted to reproduce the water en-
essentialnature of molecules in condensed phases. thalpy of vaporization) is a fortunate fact and
+ C Kd 0 - eeq)'
angle
angles
+ C Vn
T (I+"
dhedrals
atoms
atoms
+C% electrostatic
L <J
ERV
The earliest force fields, which attempted is key in leading to balanced solvent-solvent
describe the structure and strain of small and solvent-solute interactions.
rganic molecules, focused considerable atten- van der Wads parameters are generally
on on more elaborate functions of the first dominated by the inner closed shell of elec-
terms, as well as cross terms (18),repre- trons and thus are fortunately far more trans-
ing a "top down" philosophy. ferable than atomic charges. Therefore, gener-
176 Drug-Target Binding Forces: Advances in Force Field Approaches
ally only one set of van der Waals parameters drocarbons, N-methyl acetamide, and di-
(radius and well depth) per atom type need be methyl sulfide, as well as the liquid structure
employed, with the important exception of hy- and energy of methanol and N-methyl acet-
drogen (23). Unfortunately, it is harder to de- amide, show good agreement with experi-
rive van der Wads parameters than charges ment, with little or no adjustment of parame-
using a b initio quantum mechanics (6, 24). ters. For example, Fox and Kollman (25) have
The alternative that has emerged as a general shown that this approach leads to a density
model is to empirically calibrate results to fit and enthalpy of vaporization of liquid di-
experimental liquid structures and enthalpies methyl sulfoxide (DMSO) within 2% of exper-
(25). iment, using restrained electrostatic potential
Continuing with the "bottom up" develop- charges (RESP) and van der Wads parame-
ment of a force field, we come to the torsion ters taken without modification from the cor-
energy term, where the V, and y either come responding values in proteins. Similar results
from experiment or quantum mechanical cal- have been obtained for other organic liquids.
culations on small molecule models. Whereas
"top down" force fields often use many terms 3.3 Nonadditive and More Complex Models
in the Fourier series for rotation around a
given bond type and attempt to reproduce the What are the most important weaknesses in
conformational energy for a collection of mol- the above-described parameterizational ap-
ecules, most "biochemical" force fields take a proach and the use of Equation (4.1)? In our
minimalist approach (16,19,20).For example, opinion, the main ones are the use of an effec-
we would have only a single V3 torsional term tive two-body potential and the use of only
around an X-C-C-Y bond except when X or Y atom-centered charges.
are electronegative, where another term can
be rationalized from electronic effects and can 1
atom
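
The terms that Section 3.1 attributes to Equation 4.1 (harmonic bonds and angles, a cosine torsion series, and a pairwise nonbonded term) can be written as a few small functions. This is a sketch under stated assumptions, not the implementation of any particular program: it assumes the common 12-6 plus Coulomb form for the nonbonded term, takes the "radius" as the separation at the energy minimum, and uses hypothetical function names and parameter values throughout.

```python
import math

# Sketch of a minimal additive force field; names and numbers are illustrative.
COULOMB_KCAL = 332.06  # converts e^2/Angstrom to kcal/mol


def bond_energy(k_r: float, r: float, r_eq: float) -> float:
    """Harmonic bond stretch: K_r * (r - r_eq)**2."""
    return k_r * (r - r_eq) ** 2


def angle_energy(k_theta: float, theta: float, theta_eq: float) -> float:
    """Harmonic angle bend: K_theta * (theta - theta_eq)**2 (radians)."""
    return k_theta * (theta - theta_eq) ** 2


def torsion_energy(v_n: float, n: int, phi: float, gamma: float) -> float:
    """One Fourier term: (V_n / 2) * [1 + cos(n*phi - gamma)]."""
    return 0.5 * v_n * (1.0 + math.cos(n * phi - gamma))


def lj_coefficients(r_min: float, well_depth: float) -> tuple[float, float]:
    """Convert a radius (minimum-energy separation) and well depth into
    A and B so that A/R**12 - B/R**6 equals -well_depth at R = r_min."""
    return well_depth * r_min ** 12, 2.0 * well_depth * r_min ** 6


def nonbonded_energy(a_ij: float, b_ij: float, q_i: float, q_j: float,
                     r_ij: float, eps: float = 1.0) -> float:
    """Pair term: A/R**12 - B/R**6 + q_i*q_j/(eps*R), in kcal/mol."""
    return (a_ij / r_ij ** 12 - b_ij / r_ij ** 6
            + COULOMB_KCAL * q_i * q_j / (eps * r_ij))


if __name__ == "__main__":
    a, b = lj_coefficients(r_min=3.4, well_depth=0.1)  # hypothetical atom pair
    for r in (3.0, 3.4, 4.0, 6.0):
        print(f"R = {r:.1f} A: E_vdw = {nonbonded_energy(a, b, 0.0, 0.0, r):+.3f} kcal/mol")
```

Summing these functions over all bonds, angles, dihedrals, and atom pairs gives the total energy; its gradient with respect to the coordinates gives the forces used in Newton's equations.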
[...] shows that they can often be important in leading to a very accurate description of H bond directionality (30).

3.4 Long Range Electrostatic Effects

To accurately describe the energy and structure of complex systems, not only are the functional form and parameters of molecular models described by Equations 4.1 and 4.2 important, but also the manner in which the long range electrostatic effects are represented. The standard approach is to use a nonbonded cutoff for both electrostatic and van der Waals interactions, which seems to be a reasonable method for proteins but a poor method for highly charged molecules such as nucleic acids. For periodic systems, Ewald methods (which are too complex to be described here) have been known for a long time to remove most of the artefacts arising from cutoffs, and impressive efficiency and accuracy of a variant called particle-mesh Ewald (PME) has been demonstrated for protein crystals (31) [0.3 Å rms deviation from the observed crystal structure for bovine pancreatic trypsin inhibitor (BPTI) in a 1-ns simulation with an increase in computer time of only ~50% over standard cutoff methods]; the PME method also leads to accurate simulations of proteins, DNA, and RNA in solution (32).

4 THERMODYNAMICS OF ASSOCIATION

We have focused mainly on the energy of association between molecules; in any drug-receptor interaction, we typically want to know the equilibrium constant for association $K_a$ and the free energy of association $\Delta G^\circ$. The difference between the free energy ($\Delta G^\circ$) and energy ($\Delta E^\circ$) of association is given by $\Delta G^\circ = \Delta H^\circ - T\Delta S^\circ$, and $\Delta H^\circ = \Delta E^\circ + \Delta(PV)$. For gas phase associations, $\Delta(PV)$ is approximately $-RT$, which is -0.6 kcal/mol at room temperature. Thus, this term, when added to $\Delta E$, favors association (the more negative $\Delta G$, the greater the tendency for association). However, $\Delta S$, the entropy of association, is typically large and negative. The reason is that one is reducing the "floppy" degrees of freedom, which have large translational and rotational entropies, by six (six translations and six rotations in the free molecules, three of each in the complex) during complex formation, and replacing these with vibrations, which have lower entropies (33).

4.1 Gas Phase Association

For example, at 300 K, two CH4 molecules have a translational entropy of 69 eu [entropy unit, or cal/(mol K)] and a rotational entropy of 31 eu, whereas (CH4)2 has a translational entropy of 37 eu and a rotational entropy (assuming a C···C distance of 4 Å) of 22 eu. Thus, one can see that the translational and rotational entropy contribution to the reaction 2CH4 → (CH4)2 is -41 eu. These six degrees of freedom become vibrations in the complex (CH4)2, and as such, might contribute a vibrational entropy of about 20-30 eu. Thus, for the dimerization of CH4 in the gas phase, we expect $T\Delta S^\circ$ of about -3 to -6 kcal/mol at 300 K.

As stressed in the second law of thermodynamics, the tendency for a chemical process to occur is governed both by the energy released (exothermicity) in the process and the entropy gained (the tendency of the reaction to go to a more random, disordered state). In the case of gas phase association, the energy term is invariably exothermic if the reactants approach each other in an appropriate orientation, and the entropy term is always negative, opposing association. Table 4.3 gives an example of the thermodynamics of association of water molecules in the gas phase. As one can see, the entropy ($\Delta S^\circ$) contribution to association of water molecules in the gas phase is substantial and negative; thus, there is little tendency for water molecules to associate in the gas phase at room temperature and 1 atm pressure, even though the hydrogen bond energy is about 5 kcal/mol.

4.2 Solvation Effects

The thermodynamic cycle (Fig. 4.2) illustrates the problems we face in transferring our knowledge of gas phase intermolecular interactions to solution phase phenomena.

Our real interest is in $\Delta G_s$, the solution phase free energy of association. Until now, our discussion has focused on the energy ($\Delta E_g$), enthalpy ($\Delta H_g$), and free energy ($\Delta G_g$)
of association in the gas phase. To be able to calculate $\Delta G_s$, we need to know $\Delta G_{\mathrm{solv}}(\mathrm{DR})$, the solvation free energy of the drug-receptor complex; $\Delta G_{\mathrm{solv}}(\mathrm{D})$, the solvation free energy of the drug; and $\Delta G_{\mathrm{solv}}(\mathrm{R})$, the solvation free energy of the receptor. These solvation free energies are the free energies gained (or lost) by taking the molecule from a standard concentration in the gas phase to a corresponding concentration in solution. Using the thermodynamic cycle in Fig. 4.2, it follows that

$$\Delta G_s = \Delta G_g + \Delta G_{\mathrm{solv}}(\mathrm{DR}) - \Delta G_{\mathrm{solv}}(\mathrm{D}) - \Delta G_{\mathrm{solv}}(\mathrm{R}) \qquad (4.3)$$

Similar relationships hold for $\Delta H_s$ and $\Delta S_s$. There is no reason to expect $\Delta G_s$ and $\Delta G_g$ to be similar, so we face the problem of estimating $\Delta G_{\mathrm{solv}}(\mathrm{DR})$, $\Delta G_{\mathrm{solv}}(\mathrm{D})$, and $\Delta G_{\mathrm{solv}}(\mathrm{R})$. We cannot measure $\Delta G_{\mathrm{solv}}(\mathrm{R})$ or $\Delta G_{\mathrm{solv}}(\mathrm{DR})$, because this would require us to vaporize a measurable amount of a receptor or drug-receptor complex. For most polar and ionic drugs, $\Delta G_{\mathrm{solv}}(\mathrm{D})$ is not measurable either. Therefore, one resorts to measuring the free energy of transfer from octanol to water, $\Delta G_w(\mathrm{oct})$, rather than the free energy of transfer from the gas phase to water, $\Delta G_{\mathrm{solv}}(\mathrm{D})$. This situation underlies the postulate of the Hansch approach (34), which suggests that the differences in $\Delta G_w(\mathrm{oct})$ [$\Delta\Delta G_w(\mathrm{oct})$] may be related to the biological activity of drugs, and in many cases this desolvation (water → octanol) does indeed seem to be related to drug binding and/or biological activity.

[Figure 4.2: diagram of the thermodynamic cycle.]
Figure 4.2. A schematic representation of the thermodynamic cycle for molecular association in the gas phase and in solution.

Table 4.3 Thermodynamic Functions for Gas Phase Association of Water Molecules: 2H2O → (H2O)2
Thermodynamic Function          Value for H2O Dimerization (kcal/mol)
$\Delta E^\circ$ (0 K)^a        -6.2
$\Delta E^\circ$ (300 K)^a      -4.2
$\Delta H^\circ$ (300 K)^a      -5.2
$T\Delta S^\circ$ (300 K)^b     -9.0
$\Delta G^\circ$ (300 K)        +3.8
^a See Joesten and Schaad (13).
^b Estimated using the vibration frequencies employed by Joesten and Schaad (14).

Because the individual free energies in Equation 4.3 are so hard to measure, one is led to smaller model systems to analyze the major driving force for drug-receptor association, a step taken by Kauzmann (35) in his classic paper on the forces that affect protein stability and structure. He examined the thermodynamics of association and solution of small nonpolar molecules in aqueous solution. The associations were characterized by a large positive entropy term and the solution by a large negative entropy, with the enthalpy terms less important. Thus, the well-known lack of solubility of hydrocarbons in water was not caused by a net loss of hydrogen bonds; the hydrocarbons cause the water molecules to become more ordered (thus to lose entropy) so that they can still find a good hydrogen bond partner ($\Delta H$ of solution of these hydrocarbons is often negative, but much smaller in magnitude than the $T\Delta S$ of solution). By coming together in aqueous solution, these hydrocarbons "release" some H2O's, and this favorable $T\Delta S$ of association is the driving force for this association. It is generally agreed that this "hydrophobic" effect of hydrocarbon groups is a key feature in many drug-receptor associations. A lucid description of hydrophobic forces is given by Jencks (36) and Dill (37).

Computer simulation approaches have proven very useful in enabling calculation of the association of molecules. For example, the association of two methane molecules in the gas phase would lead to a $\Delta E^\circ$ (0 K) of about -1 kcal/mol, and by analogy with water dimer (Table 4.3), a very positive $\Delta G^\circ$ (300 K) and thus no tendency for association. In aqueous solution, one can calculate, using modern statistical mechanical simulation methods, the potential of mean force for association of two molecules, which is the free energy as a function of molecular separation in solution. Although there is some controversy about [...]
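
The entropy bookkeeping for methane dimerization and the free energy of water dimerization quoted in Section 4.1 and Table 4.3 can be reproduced with a few lines of arithmetic. The sketch below uses only numbers stated in the text, except for the inputs to the thermodynamic-cycle function in the last usage line, which are hypothetical values chosen purely to show the sign conventions; all function names are illustrative.

```python
# Worked arithmetic for Sections 4.1 and 4.2; function names are illustrative.

def association_entropy(s_free_eu: float, s_complex_eu: float,
                        s_vib_eu: float) -> float:
    """Entropy change (eu) on association: translational + rotational entropy
    of the complex, plus the new vibrational entropy, minus that of the
    free molecules."""
    return s_complex_eu + s_vib_eu - s_free_eu


def t_delta_s_kcal(delta_s_eu: float, temperature_k: float) -> float:
    """T*dS in kcal/mol for dS given in cal/(mol K)."""
    return temperature_k * delta_s_eu / 1000.0


def free_energy(delta_h: float, t_delta_s: float) -> float:
    """dG = dH - T*dS, all in kcal/mol."""
    return delta_h - t_delta_s


def solution_free_energy(dg_gas: float, dg_solv_complex: float,
                         dg_solv_drug: float, dg_solv_receptor: float) -> float:
    """Thermodynamic cycle: dG_s = dG_g + dG_solv(DR) - dG_solv(D) - dG_solv(R)."""
    return dg_gas + dg_solv_complex - dg_solv_drug - dg_solv_receptor


if __name__ == "__main__":
    s_free = 69 + 31      # two CH4: translational + rotational entropy, eu
    s_complex = 37 + 22   # (CH4)2: translational + rotational entropy, eu
    for s_vib in (20, 30):
        ds = association_entropy(s_free, s_complex, s_vib)
        print(f"S_vib = {s_vib} eu: dS = {ds} eu, "
              f"T*dS = {t_delta_s_kcal(ds, 300):.1f} kcal/mol")
    # Water dimer at 300 K, values from Table 4.3
    print(f"dG(water dimer) = {free_energy(-5.2, -9.0):+.1f} kcal/mol")
    # Hypothetical cycle inputs (kcal/mol), for illustration only
    print(f"dG_s = {solution_free_energy(-10.0, -50.0, -20.0, -40.0):+.1f} kcal/mol")
```

With a vibrational entropy of 20 to 30 eu, the net entropy change is -21 to -11 eu, which corresponds to the -6 to -3 kcal/mol range for $T\Delta S^\circ$ given in the text.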
Table 4.4 Free Energies in Cycle (Fig. 4.2) for Protonation of Alkyl Amines (kcal/mol)^a
                 Calc.                                                                                                   Exp^b
Perturbation     $\Delta G_{\mathrm{solv}}$     $\Delta G_{\mathrm{prot}}$     $\Delta G_{\mathrm{prot}} - \Delta G_{\mathrm{solv}}$     $\Delta G_{\mathrm{bind2}} - \Delta G_{\mathrm{bind1}}$
[...] positively charged groups into DNA) in which there might be an important polar or electrostatic driving force for binding. Again, it is difficult to ascertain whether these polar contributions come from "freeing up" water or from direct interactions, but they seem to contribute in a significant fashion to the driving force for association as well as being important in determining biological specificity. The lessons for the medicinal chemist attempting to design a drug to maximize the drug-receptor association include the following:

1. Conformational flexibility can decrease the association constants in a straightforwardly predictable way.
2. Hydrophobic effects usually contribute significantly to drug-receptor association, but one must also consider possible specific polar and ionic interactions.
3. Preorganization of the receptor or ligand is a key to obtaining optimal electrostatic or van der Waals interactions.

We have tried to provide examples in this chapter both of the qualitative arguments that are important for understanding ligand-protein or ligand-DNA interactions and of some typical numerical results arising from computer experiments. Understanding these interactions is key to the rational design of inhibitors, and a computer-aided approach is increasingly being used to screen libraries of potential inhibitors and to suggest improvements to lead compounds (61). As force fields and sampling methods improve and as computers become ever-more powerful, the practical use of methods like these should improve as well.

AUTHOR'S NOTE:

Peter Kollman died unexpectedly in May, 2001. He had authored an article on "Drug-Target Binding Forces" for the Fifth Edition of this series. This revision and extension for the Sixth Edition is based primarily on Peter's writings, and is dedicated to his memory.

REFERENCES
1. P. Atkins, Physical Chemistry, 4th ed., W. H. Freeman, New York, 1990.
2. W. Muller and D. Crothers, J. Mol. Biol., 35, 251 (1968).
3. K. Kitaura and K. Morokuma, Int. J. Quant. Chem., 10, 325 (1976).
4. J. C. G. M. van Duijnevelt-van der Rijdt and F. B. van Duijneveldt, J. Am. Chem. Soc., 93, 5644 (1971).
5. J. Hirschfelder, C. Curtiss, and R. Bird, Molecular Theory of Gases and Liquids, Wiley, New York, 1954.
6. R. H. Margenau and N. Kestner, Theory of Intermolecular Forces, 2nd ed., Pergamon Press, Oxford, 1971.
7. H. Umeyama and K. Morokuma, J. Am. Chem. Soc., 99, 1316 (1977).
8. R. Lefevre, Adv. Phys. Org. Chem., 3, 1 (1965).
9. H. Umeyama and K. Morokuma, J. Am. Chem. Soc., 98, 4400 (1976).
10. P. Kollman and L. C. Allen, Chem. Rev., 72, 283 (1972).
11. M. Hanna, J. Am. Chem. Soc., 90, 285 (1968); R. Lefevre, D. V. Radford, and P. Stiles, J. Chem. Soc. B, 31, 1297 (1968).
12. M. Karplus and R. Porter, Atoms and Molecules, Benjamin, Menlo Park, CA, 1971.
13. K. C. Janda, J. C. Hemminger, J. W. Winna, S. E. Novick, S. J. Harris, and W. Klemperer, J. Chem. Phys., 63, 1419 (1975); M. Joesten and L. Schaad, Hydrogen Bonding, Dekker, New York, 1974; K. Morokuma, S. Iwata, and W. Lathan in R. Daubel and B. Pullman, Eds., The World of Quantum Chemistry, D. Reidel, Dordrecht, Holland, 1974, p. 277.
14. P. Kollman, J. Am. Chem. Soc., 99, 4875 (1977).
15. G. E. Bacon, N. A. Curry, and S. A. Wilson, Proc. R. Soc. Ser. A, 279, 98 (1964).
16. S. J. Weiner, P. A. Kollman, D. A. Case, U. C. Singh, C. Ghio, G. Alagona, S. Profeta, and P. Weiner, J. Am. Chem. Soc., 106, 765 (1984).
17. A. McCammon and S. Harvey, Molecular Dynamics of Proteins and Nucleic Acids, Cambridge University Press, Cambridge, UK, 1987.
18. U. Burkert and N. L. Allinger, Molecular Mechanics, American Chemical Society, Washington, DC, 1982.
19. W. D. Cornell, P. Cieplak, C. I. Bayly, I. R. Gould, K. M. Merz Jr., D. M. Ferguson, D. C. Spellmeyer, T. Fox, J. W. Caldwell, and P. A. Kollman, J. Am. Chem. Soc., 117, 5179 (1995).
20. A. D. MacKerell Jr., D. Bashford, M. Bellott, R. L. Dunbrack Jr., J. D. Evanseck, M. J. Field, S. Fischer, J. Gao, H. Guo, S. Ha, D. Joseph-McCarthy, L. Kuchnir, K. Kuczera, F. T. K. Lau, C. Mattos, S. Michnick, T. Ngo, D. T. Nguyen, B. Prodhom, W. E. Reiher III, B. Roux, M. Schlenkrich, J. C. Smith, R. Stote, J. Straub, M. Watanabe, J. Wiorkiewicz-Kuczera, D. Yin, and M. Karplus, J. Phys. Chem. B, 102, 3586 (1998).
21. W. L. Jorgensen, J. Chandrasekhar, J. Madura, R. W. Impey, and M. L. Klein, J. Chem. Phys., 79, 926 (1983).
22. H. J. C. Berendsen, J. R. Grigera, and T. P. Straatsma, J. Phys. Chem., 91, 6269 (1987).
23. D. L. Veenstra, D. M. Ferguson, and P. A. Kollman, J. Comput. Chem., 8, 971 (1992).
24. J. Pirssette and E. Kochanski, J. Am. Chem. Soc., 100, 6609 (1978).
25. W. L. Jorgensen and J. Tirado-Rives, J. Am. Chem. Soc., 110, 1657 (1988); W. L. Jorgensen, D. S. Maxwell, and J. Tirado-Rives, J. Am. Chem. Soc., 118, 11225 (1996); G. Kaminski and W. L. Jorgensen, J. Phys. Chem., 100, 18010 (1996); T. Fox and P. A. Kollman, J. Phys. Chem. B, 102, 8070 (1998).
26. E. C. Meng, P. Cieplak, J. W. Caldwell, and P. A. Kollman, J. Am. Chem. Soc., 116, 12061 (1994).
27. J. W. Caldwell and P. A. Kollman, J. Am. Chem. Soc., 117, 4177 (1995).
28. J. W. Caldwell and P. A. Kollman, J. Phys. Chem., 99, 6208 (1995).
29. Y. Sun, J. W. Caldwell, and P. A. Kollman, J. Phys. Chem., 99, 10081 (1995).
30. R. W. Dixon and P. A. Kollman, J. Comput. Chem., 18, 1632 (1997).
31. D. M. York, A. Wlodawer, L. Petersen, and T. A. Darden, Proc. Natl. Acad. Sci. USA, 91, 8715 (1994).
32. T. E. Cheatham III, J. L. Miller, T. Fox, T. A. Darden, and P. A. Kollman, J. Am. Chem. Soc., 117, 4193 (1995).
33. N. Davidson, Statistical Mechanics, McGraw-Hill, New York, 1962; M. I. Page and W. P. Jencks, Proc. Natl. Acad. Sci. USA, 68, 1678 (1971).
34. C. Hansch, Biological Correlations-The Hansch Approach, ACS, Washington, DC, 1973.
35. W. Kauzmann, Adv. Protein Chem., 14, 1 (1975); C. Tanford, The Hydrophobic Effect, Wiley, New York, 1973.
36. W. Jencks, Catalysis in Chemistry and Enzymology, McGraw-Hill, New York, 1969.
37. K. Dill, Biochemistry, 29, 7133 (1990).
38. W. Jorgensen, J. K. Buckner, S. Boudon, and J. Tirado-Rives, J. Chem. Phys., 89, 3742 (1988).
39. L. X. Dang, J. Rice, and P. Kollman, J. Chem. Phys., 93, 7528 (1990).
40. W. Jorgensen, J. Am. Chem. Soc., 111, 3770 (1989).
41. J. Timko, S. Moore, D. Walba, P. Hiberty, and D. Cram, J. Am. Chem. Soc., 99, 4207 (1977).
42. D. Aue, H. Webb, and M. Bowers, J. Am. Chem. Soc., 31, 318 (1976).
43. J. Kirkwood, J. Chem. Phys., 3, 300 (1935); R. Zwanzig, J. Chem. Phys., 22, 1420 (1954).
44. P. M. Postma, H. J. C. Berendsen, and J. R. Houk, Faraday Symp. Chem. Soc., 17, 55 (1982).
45. W. Jorgensen and C. Ravimohan, J. Chem. Phys., 83, 3050 (1985).
46. B. L. Tembe and J. A. McCammon, J. Comput. Chem., 8, 281 (1984).
47. A. Warshel, J. Phys. Chem., 86, 2218 (1982).
48. D. L. Beveridge and M. Mezei, Annu. Rev. Biophys. Chem., 18, 431 (1989).
49. P. A. Kollman, Chem. Rev., 93, 2395 (1993).
50. N. Green, Biochem. J., 101, 774 (1966).
51. P. C. Weber, J. J. Ohlendorf, and F. R. Salemne, Science, 243, 85 (1989).
52. Y. Sun, D. Spellmeyer, D. Pearlman, and P. Kollman, J. Am. Chem. Soc., 114, 6798 (1992).
53. B. C. Rao and U. C. Singh, J. Am. Chem. Soc., 111, 3125 (1989); B. C. Rao and U. C. Singh, J. Am. Chem. Soc., 112, 3803 (1990).
54. S. Miyamoto and P. Kollman, Proc. Natl. Acad. Sci. USA, 8402 (1993); S. Miyamoto and P. Kollman, Proteins, 16, 226 (1993).
55. S. B. Dixit and C. Chipot, J. Phys. Chem. A, 105, 9795 (2001); B. Kuhn and P. A. Kollman, J. Am. Chem. Soc., 122, 3909 (2000).
56. D. Matthews, J. Bolin, J. Burridge, D. Filman, K. Volz, B. Kaufman, C. Beddell, J. Champness, D. Stammers, and J. Kraut, J. Biol. Chem., 260, 381 (1985).
57. L. Kuyper in C. Bugg and S. Ealick, Eds., Crystallographic and Molecular Modeling in Drug Design, Springer-Verlag, NY, 1989, pp. 56-79.
58. S. Fleischman and C. L. Brooks, Proteins, 7, 52 (1990); C. L. Brooks and S. Fleischman, J. Am. Chem. Soc., 112, 3307 (1990).
59. J. J. McDonald and C. L. Brooks, J. Am. Chem. Soc., 113, 2295 (1991); J. J. McDonald and C. L. Brooks, J. Am. Chem. Soc., 114, 2062 (1992).
60. F. Quadrifoglio and V. Crescenzi, Biophys. Chem., 1, 319 (1974); F. Quadrifoglio and V. Crescenzi, Biophys. Chem., 2, 64 (1974).
61. T. J. A. Ewing, S. Makino, A. G. Skillman, and I. D. Kuntz, J. Comput. Aided Mol. Des., 15, 411 (2001).
CHAPTER FIVE
STEPHEN D. PICKETT
GlaxoSmithKline Research
Stevenage, United Kingdom
Contents
1 Introduction
1.1 Scope
1.2 Molecular Similarity/Diversity
1.3 Combinatorial Library Design
1.4 Subset Selection and Screening Set Enrichment
2 Molecular Similarity/Diversity
2.1 Descriptors
2.1.1 2D Substructural and Topological Descriptors
2.1.2 Atomic/Molecular Properties and 2D/3D Structural Descriptors
2.1.2.1 Physicochemical
2.1.2.2 2/3D Structural
2.1.3 3D Properties
2.1.3.1 3D Pharmacophores
2.1.3.2 Shape
2.1.3.3 Field-Based
2.1.4 Analysis
2.1.4.1 Descriptor Transformations
2.1.4.2 Similarity and Distance Measures
2.2 Analysis and Selection Methods
2.2.1 Cell-Based Partitioning Methods
2.2.1.1 Diverse Solutions
2.2.1.2 Pharmacophore Fingerprints
2.2.2 Cluster-Based Methods
2.2.3 ...ty-Based Methods
2.2.4 Biasing to Desired/Desirable Properties
Combinatorial Library Design, Molecular Similarity, and Diversity Applications
Figure 5.1. A simple illustration of bit-string encoding of chemical structure (7). (a) A fragment dictionary-based approach. (b) Illustration of a hashing scheme using a path-based decomposition of the structure. The asterisk denotes an element in the bit string where a collision has resulted from [...]
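
The hashing scheme that the caption of Figure 5.1 describes can be illustrated with a toy example. This is a minimal sketch under stated assumptions, not the algorithm of any particular fingerprint package: the molecule is a hand-written list of atom labels and bonds, paths are limited to three atoms, the bit string has 16 positions, and `zlib.crc32` stands in for whatever hash function a real implementation would use.

```python
import zlib

# Toy path-based hashed fingerprint; all names and sizes are illustrative.

def linear_paths(atoms, bonds, max_atoms=3):
    """Enumerate the label strings of all linear paths of up to max_atoms atoms.
    A path and its reverse are treated as the same fragment."""
    neighbors = {i: set() for i in range(len(atoms))}
    for a, b in bonds:
        neighbors[a].add(b)
        neighbors[b].add(a)

    paths = set()

    def walk(path):
        forward = "-".join(atoms[i] for i in path)
        backward = "-".join(atoms[i] for i in reversed(path))
        paths.add(min(forward, backward))  # canonical orientation
        if len(path) < max_atoms:
            for nxt in neighbors[path[-1]]:
                if nxt not in path:
                    walk(path + [nxt])

    for start in range(len(atoms)):
        walk([start])
    return paths


def hashed_fingerprint(paths, n_bits=16):
    """Hash each path to a bit position; count paths that land on a bit
    already set by an earlier path (collisions)."""
    bits = [0] * n_bits
    collisions = 0
    for path in sorted(paths):
        index = zlib.crc32(path.encode("ascii")) % n_bits
        if bits[index]:
            collisions += 1
        bits[index] = 1
    return bits, collisions


if __name__ == "__main__":
    # Heavy atoms of ethanol: C-C-O
    atoms = ["C", "C", "O"]
    bonds = [(0, 1), (1, 2)]
    paths = linear_paths(atoms, bonds)
    bits, collisions = hashed_fingerprint(paths)
    print(sorted(paths))
    print("".join(str(b) for b in bits), "collisions:", collisions)
```

A dictionary-based fingerprint would instead assign each bit to one predefined fragment, so no collisions occur, but fragments absent from the dictionary are not encoded at all.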
[...] pharmacophoric features) as a string of bits (indicating either the presence or absence of a particular characteristic; see section 2.1.1 and Fig. 5.1), optionally including a count of the number of times the characteristic is exhibited. A wide variety of descriptors is available to evaluate the potential similarity or diversity between structures (2). These range from one-dimensional (1D) descriptors based on molecular properties such as molecular weight, which can be derived from the molecular formula; two-dimensional (2D) substructural fingerprints, topological methods, and atomic/molecular properties [e.g., physicochemical properties such as calculated log P (c log P)] that require knowledge of the "flat" or 2D structure, which represents the bonds between the atoms; to three-dimensional (3D) properties (e.g., pharmacophoric fingerprints), requiring knowledge of the full 3D conformational space available to a molecule. A 3D pharmacophoric fingerprint marks the presence or absence of potential pharmacophores [combinations of different features and distances between them, often for three- or four-point pharmacophore fingerprints (i.e., triplets/triangles or quartets/tetrahedra)] within a molecule.

[...] and protein structure-based universes. The pharmacophore fingerprints also represent a simplified approach to the goal of providing molecular descriptors with 3D shape and property content, while obviating the need for molecular superposition or refined pharmacophore hypothesis generation.

Partitioning methods are widely used. The compounds are grouped using either a cell-based approach, in which each dimension of the chemical space is subdivided or "binned," or by a clustering approach, in which islands of similar compounds are formed. Alternatively, the distance between pairs of molecules can be calculated, and this distance minimized (for similarity) or maximized (for diversity). For diversity the goal is normally not to identify a diverse compound in isolation, but to explore a range of diversity through selection of a diverse subset of compounds. Cell-based methods provide the advantage of a common frame of reference in terms of the multidimensional cell positions. It is possible with a cell-based method to evaluate both what is there and what is missing (in terms of empty cells); clustering, by contrast, is based on exploring what is there. The same method/descriptor may
Three-dimensional -properties
- such as the thus be used to evaluate both similarity and
pharmacophore fingerprints can also be calcu- "diversity." In practice, "dissimilarity" ap-
lated for the target protein binding site, being proaches often provide a more acceptable ap-
derived from site points complementary to the proach to diversity, ensuring that compounds
functional groups in the protein backbone and are not too similar, but avoiding a potential
side chains, thus bridging the ligand-based pitfall of exploring too frequently the ex-
tremes of chemical space. Methods and descriptors are discussed for each of these categories.
1.3 Combinatorial Library Design
Combinatorial library design is an important application of molecular similarity and diversity principles and methods. Combinatorial chemistry approaches can exploit automation and robotics to enable the rapid production of large numbers of compounds. Libraries are synthesized for both lead identification and lead optimization purposes. The resultant libraries consist of products formed by combining "reactants" (reagents, monomers) with each other or with a "scaffold" (template, core). The most efficient use of reactants and automation/robotics would use a strictly combinatorial combination of reactants/scaffold, but other constraints, including the issue of generating products that have suitable properties for biological screening and as potential drugs, often lead to sparse arrays. Parallel synthesis, in which multiple analogs are synthesized at a time, is now a standard part of the drug discovery process.
Many molecular diversity and similarity approaches are brought together in the combinatorial library design process. Either the properties of the reactants/scaffolds are used (reactant-based design) or the properties of the resultant enumerated products are used in selecting appropriate reactants (product-based reactant selection). The latter approach requires much greater computational resources, and a preselection of potential reactants may need to be made to control the total size of the "virtual" (potentially synthesizable) library to be analyzed. Regardless of the method, the required deliverable is sets of reactants/scaffolds to be combined. When working with the properties of the products, the constraint that reactants are to be used as efficiently as possible presents a major optimization problem.
Virtual screening, with experimental verification by biological screening, has provided a validation of many of the molecular similarity/diversity methods used for combinatorial library design, and some ligand-based approaches and examples are discussed in Section 3.
1.4 Subset Selection and Screening Set Enrichment
A related task to combinatorial library design that uses molecular diversity/similarity methods is subset selection of compound screening sets. Initial efforts were focused on small "diverse" or "representative" sets of large corporate compound collections. The increased capabilities of high throughput screening have changed the demand for such sets, and there is a renewed demand for "focused" and "representative" screening subsets of varying sizes; this includes target class (gene family) focus and the identification of "interesting" (e.g., novel) compounds in a large set. Newer biophysical screening methods [e.g., NMR-based screening (3)] still have capacity issues and a need for smaller representative and focused sets. Diverse subset selection can be used to generate sets of compounds to probe a biological assay or to select a subset of reactants to probe the scope of a chemical reaction scheme or screen. However, such methods have a tendency to select compounds at the extremes of chemical space; that is, the selected compounds tend to be less suitable as drug candidates, and hence the approach is less favored for general screening sets. Rather, diversity methods are used to ensure that a random subset of a screening set contains compounds that are representative of the whole, or, in conjunction with a focused method, to ensure a representative sampling of biologically relevant chemical space.
Compound subsets focused/biased to particular target classes (gene families) have become of greater importance, with application to both lead identification and de-orphaning of new targets from genomics studies. Properties important for the target class of interest are identified, using descriptors used for molecular similarity/diversity. A focused subset can then be selected using a combination of all the possible hypotheses for activity for that target class, including the use of one or more molecular similarity approaches to select compounds similar to any known active compounds. For targets that have structural information available, docking methods (one widely used method for virtual screening) can be used to select compounds that are comple-
mentary to the binding site(s). Applications encompass both high throughput screening (HTS) and therapeutic area screening where only smaller numbers of compounds can be screened. For HTS, smaller thematic studies using these enriched focused sets enable the rapid prosecution of a set of related targets, and make the use of duplicate runs for all compounds feasible. This enables selectivity to be addressed up front, and the duplicate runs provide potentially higher quality information, with the potential for the identification of hits that might otherwise be missed.
General enrichment of the available screening compound set for lead identification is a major application for both combinatorial library design/synthesis and compound acquisition. The goal of in silico (i.e., computer-based) studies in compound acquisition is to evaluate the interest of compounds that could be purchased to add to the screening file, and to select a subset that meets the same type of physicochemical/"druglikeness" criteria discussed for combinatorial libraries. The "interest" of a compound or compound set is evaluated as in combinatorial library design: diversity relative to existing compound, target, target-class focus, and so forth.
2 MOLECULAR SIMILARITY/DIVERSITY
The field of medicinal chemistry is based on the hypothesis that similar compounds will display similar, but probably not identical, activities in some biological screen, and that potency, selectivity, and properties can thus be modulated by analog synthesis. The challenge facing the computational chemist is how to represent compounds in a computer in such a way that "similar" compounds in the in silico world are "similar" in the biological world. It is evident that the biological process that is being modeled will influence the nature of the chosen representation. For example, c log P is a useful descriptor for modeling processes involving cell penetration, whereas a pharmacophoric representation would be more appropriate for selecting compounds for screening against a particular protein active site. In this section we review the wide range of representations that have been developed and describe the various methods for applying these representations to real-world problems. The reader is referred to a number of reviews on various aspects covered by this chapter (2, 4-9). A diverse set of perspectives/reminiscences on computational aspects of molecular diversity has been assembled by Martin (10).
2.1 Descriptors
The problem lies in finding a representation of chemical structure that allows a mapping between the chemical structure and its response in a biological or physical process. The representation must be general enough to be applicable to a range of chemical structures but specific enough to capture the differences between structures that account for differences in response. Once found, this representation or set of descriptors can be said to define a chemistry space (11) for the population of compounds of interest. The similarity between two compounds is their distance within this space. Unfortunately, this simple statement hides a number of difficulties. Many descriptors of choice are correlated and it can be difficult to combine categorical (e.g., acid, base, neutral) and real-valued (charge, dipole, c log P) variables. The issue of how to analyze compounds within the chemistry space is covered in Section 2.2.
Methods for describing chemical structures fall into two broad classes. Two-dimensional (2D) methods can be calculated from the 2D graph in which atoms are nodes in the graph and the bonds are the connections between the nodes. Three-dimensional (3D) methods require the generation of a 3D structure (x, y, z coordinates) for a structure. Because a molecule does not exist in a single low energy conformer, the issue of conformer generation also requires addressing with this latter method. Combining the various descriptors, particularly 2D and 3D, is an area of active research. The potential advantage of 3D descriptors (ligand-protein binding is a 3D spatial/electronic property that can be described only in part using 2D descriptors) (5c) has led many groups to identify 3D descriptors that can handle large numbers of compounds and multiple potential models, and do not require a superimposition in 3D coordinate space (e.g., for re-
view, see Ref. 12). The pharmacophore fingerprints described in Section 2.1.3 are an example of this.
2.1.1 2D Substructural and Topological Descriptors. The principle behind substructural keys or fingerprints is shown in Figure 5.1. A molecule is encoded by the presence or absence of a set of predefined atoms, atom types, and fragments (e.g., S, aromatic nitrogen, CO2H). The most widely used set of keys is the publicly available ISIS (MACCS) key set provided by MDL (13a). An alternative to the use of predefined fragments is provided by software packages such as Daylight (13b) and UNITY (13c). In this approach, all possible bond paths in a molecule from zero (the atoms) to a specified number of bonds (usually 7) are identified. A hashing procedure is used to store the paths in a bit string of fixed length. Each path will set several bits in the bit string (giving them the value of 1) and there is the possibility of different paths setting some of the same bits. As a result, individual bits lose any meaning.
The origin of the 2D substructural representation lies in the first chemical registration systems where some means was required to enhance the speed of compound retrieval. Thus, if the query molecule contains a particular combination of features, the whole database can be screened very rapidly using the keys to identify compounds that are likely to contain those features before a more exhaustive graph matching is performed to ensure an exact match with the query. The features represented in the keys (ISIS) or the fingerprint length and density (Daylight) were selected to optimize the process of compound retrieval. Nevertheless, they have proved very useful for a variety of similarity-based tasks (14). Despite these successes, issues surrounding their use in diversity-based approaches have been highlighted (15).
Molecular connectivity indices were first proposed by Randić in 1975 (16) as a means of estimating physical properties of alkanes. This formalism was quickly extended to other types of molecules (17) and, since then, a wide range of indices has been proposed, as reviewed by Hall and Kier (18) and Randić (19). The indices are derived from a graph theoretical representation of the structure where bonds are represented by the edges between nodes (atoms). They provide a direct representation of the topological structure of a molecule encoding information such as the degree of branching (¹χ) and the adjacency of the branch points (³χ), flexibility, and shape (20a). The superscript describes the number of bonds in the path between atoms used to calculate the index. The software package MOLCONN-Z (20b) was developed specifically for generating these descriptors. A number of authors have included topological indices or variants thereof in their description of molecules for describing compound collections (21) or large combinatorial libraries, often allied to a dimensionality-reduction algorithm such as principal components analysis (PCA) (6a, 23).
Carhart et al. (24) introduced the concept of atom-pairs, where the topological distance (number of bonds) between atoms of specified element type is encoded in a bit string. This was extended to the topological torsion (25), where elements on all paths of length four are encoded. Kearsley et al. (26) extended this approach to use more generic atom-type properties in place of element type. They termed these types binding property classes because they represent key features of intermolecular interactions (positive and negative charge; hydrogen bond donor, hydrogen bond acceptor, and groups that are both of these, such as hydroxyl; hydrophobic atoms; and all others). These descriptors have been used widely for similarity- and diversity-related tasks. The CATS descriptors of Schneider et al. (27a) are a variant on this approach. All topological distances (number of bonds) between a pair of binding property classes (e.g., acid-base) in a molecule are recorded with count information in a correlation vector; that is, how often that topological distance occurs between a specified pair of features in the molecule of interest. Similarity is calculated as the Euclidean distance between the correlation vectors. These CATS descriptors were shown to be useful in scaffold-hopping, identifying actives with a structural type distinct from that of the initial lead structure, and have also been used as the basis for a de novo design program, TOPAS (27b).
Functional diversity requirements of com-
pound libraries have been reviewed (28), for which molecular descriptors that relate to both structure and properties are needed, as well as their evaluation in terms of biological relevance.
2.1.2 Atomic/Molecular Properties and 2D/3D Structural Descriptors
2.1.2.1 Physicochemical. The descriptors in the previous section focus largely on the structure of the compound. The binding property classes generalize this to some extent by replacing relationships between elements or atom types with a broader definition, still within the framework of an atoms-and-bonds description of the molecule. An alternative approach would be to describe compounds by whole molecule properties, such as molecular weight and log P. Indeed such properties have been related to important pharmacological and physical properties such as absorption across cell membranes, distribution, and solubility. These properties are represented, in part, by the well-known Lipinski Rule-of-5 based on molecular weight, calculated log P, and hydrogen bond donor and acceptor counts (29). Thus, such properties have an important role in drug design, and in general assessments of "druggability." However, their use as descriptors for tasks related to similarity or diversity in the context of receptor affinity is less clear and has been questioned (11b). A primary concern is that such properties do not reflect sufficient information regarding chemical structure to enable their use for lead follow-up or similar purposes. For example, a steroid and a benzodiazepine can have identical log P values but are clearly dissimilar from a medicinal chemistry perspective. Another major problem is that many properties (e.g., log P, molecular weight, surface area, volume, molar refractivity, molecular polarizability) are correlated, making it difficult to find a reasonable set of orthogonal descriptors for the calculation of meaningful distances or for cell-partitioning (see Section 2.2.1). Such whole molecule properties are best used as constraints on a design, to define boundaries of a pharmacologically relevant chemical space or to define a distribution to match. The challenge is then how to combine the measures of diversity while simultaneously maintaining suitable physicochemical properties. This is addressed in later sections. Such properties can also be used to identify particular combinations that are preferred for different gene families, and these are used to focus a design.
2.1.2.2 2D/3D Structural. The issues with whole molecule descriptors mentioned above led Pearlman (11) and colleagues to look at an alternative representation ("BCUT" descriptors/metrics) based on atomic properties and on how atoms are connected. The approach stems from original work of Burden (30) to derive a unique signature for a molecule. Pearlman extended the concept to develop the BCUT descriptors suitable for diversity- and similarity-related tasks. Each molecule is described by a series of square matrices with atom labels defining the rows and columns. In a given matrix, the diagonal represents an atomic property such as charge, hydrogen bonding ability (donor/acceptor), or polarizability, with optional weighting by accessible surface area; the off-diagonal terms represent topological or Cartesian interatomic distance or other such property. Molecular descriptors are generated from the lowest and highest eigenvalues of these matrices and describe the molecular surface distributions of positive or negative charge, H-bond donors, H-bond acceptors, and high or low polarizability.
A number of such matrices can be calculated based on the nature of the diagonal and off-diagonal properties and the scaling between them. An "auto-choose" algorithm [see the DiverseSolutions (DVS) program below] typically finds a 5D or 6D orthogonal chemistry space that best represents the diversity of a given population. This ability to identify relevant (to drug-receptor interactions and reflecting molecular substructure) and orthogonal (noncorrelated) descriptors is critical for the effective use of both distance-based and cell-based methods. Three-dimensional properties may be included by the use of a single conformer to represent atom-atom distances or the inclusion of quantum mechanical properties (bond order or overlap-squared) from semiempirical molecular orbital (MO) calculation. However, the inclusion of 3D/MO information significantly slows down descriptor calculation and does not appear to offer any
Figure 5.2. Illustration of the creation of a pharmacophore key. As the conformation of a molecule
changes, so do the distances between the pharmacophoric groups, shown as spheres. The two differ-
ent three-point pharmacophores shown each set their own particular bit in the pharmacophore key.
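The construction in Figure 5.2 can be sketched in code. This is a minimal illustration rather than any published implementation: the feature vocabulary, the bin edges (in Å), the function names, and the two toy "conformers" are all assumptions made for the example, and each bit is stored as a labelled tuple in a set instead of a position in a fixed-length bit string.

```python
import math
from itertools import combinations, permutations

# Assumed feature vocabulary and distance-bin upper edges (angstroms);
# neither is taken from the chapter.
FEATURES = ("donor", "acceptor", "acid", "base", "aromatic", "hydrophobe")
BIN_UPPER_EDGES = (2.0, 3.0, 5.0, 8.0, 12.0, 16.0, 20.0)


def distance_bin(d):
    """Index of the first bin whose upper edge is >= d, or None if d is too long."""
    for index, edge in enumerate(BIN_UPPER_EDGES):
        if d <= edge:
            return index
    return None


def triplet_bit(p, q, r):
    """Canonical (f1, f2, f3, b12, b13, b23) label for three (feature, xyz) points."""
    labels = []
    for a, b, c in permutations((p, q, r)):
        bins = (distance_bin(math.dist(a[1], b[1])),
                distance_bin(math.dist(a[1], c[1])),
                distance_bin(math.dist(b[1], c[1])))
        if None in bins:
            return None  # at least one distance exceeds the last bin
        feats = tuple(FEATURES.index(x[0]) for x in (a, b, c))
        labels.append(feats + bins)
    return min(labels)  # the same triangle always maps to the same label


def pharmacophore_key(conformers):
    """Union of the triplet labels (the "on" bits) found over all conformers."""
    key = set()
    for conformer in conformers:
        for trio in combinations(conformer, 3):
            bit = triplet_bit(*trio)
            if bit is not None:
                key.add(bit)
    return key


# Two conformers of a hypothetical molecule: only the acceptor moves.
conf_a = [("donor", (0.0, 0.0, 0.0)), ("acceptor", (4.0, 0.0, 0.0)),
          ("aromatic", (0.0, 6.0, 0.0))]
conf_b = [("donor", (0.0, 0.0, 0.0)), ("acceptor", (2.5, 0.0, 0.0)),
          ("aromatic", (0.0, 6.0, 0.0))]
print(sorted(pharmacophore_key([conf_a, conf_b])))
# [(0, 1, 4, 1, 3, 3), (0, 1, 4, 2, 3, 3)]
```

Moving the acceptor shifts the donor-acceptor distance from the 3-5 Å bin to the 2-3 Å bin, so the second conformer sets a different bit, and the key accumulated over both conformers contains two three-point pharmacophores.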
tasks. The diversity-related use was based on the hypothesis that sampling over all potential pharmacophores leads to diversity in a biologically relevant space, in contrast to some other methods that focus on chemical diversity. The descriptor thus generated identifies in a systematic way all the potential pharmacophores that a molecule could exhibit. Triplet (three-point) and quartet (four-point) pharmacophore representations have been extensively used (in addition to two-point/2D approaches), with a variety of features sampled at each point and interfeature distances considered in a discrete set of ranges ("bins") (see Fig. 5.2). The ability of pharmacophores to divorce the three-dimensional structural requirements for biological activity from the two-dimensional chemical makeup of a ligand has been highlighted in a recent review (34).
In an initial implementation from the authors (35), a set of 5916 three-point pharmacophore queries was generated and used to search a database. Compounds were characterized by the pharmacophores that they matched. This method was powerful because it gave precise control over the queries that were generated and ensured that the compounds matched the query, as opposed to satisfying a set of distance constraints; however, it was slow in execution. The Chem-X/ChemDiverse implementation (36) generates a pharmacophore fingerprint during the course of a single systematic conformational search, with a bump-check and/or rules to eliminate high energy conformers. The details of the conformational search and the definitions of the pharmacophoric features are key components of the system and this methodology has been used extensively for a range of library design and both diversity- and similarity-based tasks (e.g., see Ref. 37). The use of 3D pharmacophores in drug design applications has recently been reviewed (12, 34).
To perform the necessary analyses to generate the pharmacophore fingerprint, relevant features in a molecule need to be identified.
electronic properties to be included. This can give a much better performance in similarity searching. It also increases enormously the number of potential pharmacophores that need to be considered. To analyze pharmacophoric patterns in molecules, the distances between pharmacophoric features are divided into a finite number of ranges using a predefined binning scheme (e.g., 0-2, 2-3, 3-5, 5-8 Å, etc.), up to a maximum distance normally between 15 and 20 Å [a nonuniform binning is often used because this mirrors the tolerances (e.g., ±20%) used in 3D database searching that can be more appropriate than fixed increments, given the limited conformational sampling that is possible]. The additional pharmacophoric combinations created in moving from a three- to four-point description provide additional shape information, thus increasing molecular separation in similarity and diversity studies.
Separation has a central role in determining the final result of such calculations, with too little separation resulting in a noisy descriptor and too many molecules being defined as similar, whereas when too large a separation exists, trivial differences can have a disproportionately negative effect on the similarity value. Conformational sampling is necessary, and the granularity of this affects the useful resolution that can be used, as defined by the number and size of the distance bins. The sampling is generally performed by torsional sampling of rotatable bonds.
Thus fewer ranges are generally considered with four-point pharmacophores while concomitantly maintaining or improving on the performance of three-point pharmacophore methods. For example, by the use of 32 distances for three-point pharmacophores with seven different features possible for each of the points, there are about one million possibilities (35). Expanding to four-point pharmacophores, just 15 distance bins generate about [...] million geometrically valid possibilities. Therefore for pragmatic reasons of both memory and disk space, and the limited resolution of the conformational sampling that is normally applied, seven or 10 distance ranges for four-point pharmacophore fingerprints have been used by Mason et al. (37) and recommended for combinatorial library design and virtual screening applications. Around 2-10 million different potential pharmacophores are resolved in such a fingerprint. A limited sampling of conformations has generally been used to achieve reasonable times (in seconds) for descriptor calculation. For example, Mason et al. (37) use two (conjugated), three (single bonds), or four (sp2-sp3 and some conjugated) increments with large data sets, using a systematic analysis for less flexible molecules and random sampling for flexible molecules. See Fig. 5.4 for a comparison of three- and four-point fingerprints. Software companies such as Accelrys, Tripos, the Chemical Computing Group (MOE, http://www.chemcomp.com), and Treweren Consultants (THINK) are developing their versions of pharmacophore fingerprinting methods, with three-point pharmacophore fingerprints already implemented. The automatic assignment of pharmacophore features such as hydrophobes, acids and bases, conformational sampling, and other key options discussed above for the Chem-X software (now no longer supported; owned by Accelrys) such as nonuniform binning are challenges that have variable levels of current implementation; other options and extensions such as overlapping bins are becoming available.
Others have developed similar approaches for library design (38, 39). Horvath (40) generates an autocorrelogram of feature-feature distances for conformers and calculates a dissimilarity score that takes into account separate weightings for each feature and allows fuzziness between the distance bins. These 3D pharmacophoric descriptors were termed fuzzy bipolar pharmacophore autocorrelograms (FBPAs), and the use of fuzzy logic to build up and compare the fingerprints avoids the "all-or-nothing" bitwise match of bit-string representations in which sampling artifacts can cause significant differences. The method has been shown useful in library design and for analyzing selectivity profiles in terms of pharmacophore similarity (41).
It is possible to represent not only a ligand by the potential pharmacophores it possesses but also a protein target. In this case the pharmacophore points are identified by the positions where a ligand atom of a particular type (donor, acceptor, acid, base, hydrophobic, aro-
[Figure 5.4 labels: feature types are H-bond donors, H-bond acceptors, acid, base, aromatic ring, and hydrophobe (lipophile). All combinations of 6 features and 7 distance ranges: 9,000 three-point potential pharmacophores; 10 distance ranges: 33,000.]
Figure 5.4. Three- and four-point (triplet/quartet) pharmacophore fingerprint creation. Assignment is often binary (on or off), although a count can be kept, and has been used in more recent studies. The large difference in bin numbers between three- and four-point pharmacophores provides additional shape information, thus increasing molecular separation in similarity and diversity studies.
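The size of a three-point fingerprint can be estimated by direct enumeration. The sketch below counts feature/distance-bin triangles that remain distinct after relabelling the three points, and checks the result against the closed form from Burnside's lemma, (f^3 b^3 + 3 f^2 b^2 + 2 f b)/6 for f features and b bins. The function names and the optional bin edges are assumptions made for this example. The unfiltered counts are upper bounds, since some bin combinations cannot form a triangle; the 9,000 and 33,000 quoted in the figure presumably reflect a geometric filter whose exact bin edges are not given here.

```python
from itertools import permutations, product

# For each relabelling of the three points, the positions in (d01, d02, d12)
# from which the relabelled distances are read.
_PAIRS = ((0, 1), (0, 2), (1, 2))
_EDGE_INDEX = {frozenset(pair): i for i, pair in enumerate(_PAIRS)}
_RELABELLINGS = [
    (p, tuple(_EDGE_INDEX[frozenset((p[i], p[j]))] for i, j in _PAIRS))
    for p in permutations(range(3))
]


def canonical(features, bins):
    """Smallest label over all six relabellings of the points."""
    return min(
        tuple(features[v] for v in verts) + tuple(bins[e] for e in edges)
        for verts, edges in _RELABELLINGS
    )


def is_geometrically_possible(bins, bin_edges):
    """Can some (possibly degenerate) triangle have each side inside its bin?"""
    lower = [bin_edges[b] for b in bins]
    upper = [bin_edges[b + 1] for b in bins]
    return all(lower[i] <= upper[(i + 1) % 3] + upper[(i + 2) % 3] for i in range(3))


def count_triplets(n_features, n_bins, bin_edges=None):
    """Count distinct three-point pharmacophores; filter by geometry if edges are given."""
    seen = set()
    for bins in product(range(n_bins), repeat=3):
        if bin_edges is not None and not is_geometrically_possible(bins, bin_edges):
            continue
        for features in product(range(n_features), repeat=3):
            seen.add(canonical(features, bins))
    return len(seen)


def burnside_count(n_features, n_bins):
    """Closed form for the unfiltered count (Burnside's lemma over S3)."""
    f, b = n_features, n_bins
    return (f**3 * b**3 + 3 * f**2 * b**2 + 2 * f * b) // 6


print(count_triplets(6, 7), burnside_count(6, 10))  # 13244 37820
```

Passing hypothetical edges such as `bin_edges=(0, 2, 3, 5, 8, 12, 16, 20)` to `count_triplets(6, 7, ...)` removes combinations that violate the triangle inequality and so gives a smaller number; how close that comes to the published figure depends entirely on the bin edges chosen.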
matic centroid) is likely to bind and so provide a complementary interaction with the adjacent protein residue side chain. The pharmacophore fingerprints are thus generated from these complementary site points. The site points can be positioned in the active site using methods such as GRID (42), in which an energetic survey of the site is made using a variety of functional groups. Figure 5.5 illustrates the favorable energy contours for a variety of pharmacophoric probes for the Factor Xa serine protease active site. Atoms (with associated pharmacophore features) are then added in the positions for the most favorable interaction (also shown in Fig. 5.5).
The resultant ensemble of atoms represents a hypothetical molecule that interacts at all favorable positions in the binding site, and a pharmacophore fingerprint is calculated from this. This fingerprint represents a form of "protein structure-based diversity," quantifying the range of different pharmacophoric shapes complementary to a target protein binding site. For example, for the Factor Xa serine protease active site, 13 complementary site points generated a fingerprint of 2103 four-point pharmacophore shapes, of which 354 were the same as the 2062 found for the serine protease thrombin, generated from 13 site points. Only 11 significant complementary site points were found for the serine protease trypsin, which has a less defined S4 pocket. Of the 1233 total pharmacophore shapes, 363 were in common with Factor Xa, with 120 in common for all three serine proteases. It is thus possible to identify ensembles of pharmacophores that can be used to both differentiate the sites (selectivity) and identify common features. Comparison of these protein-derived pharmacophore fingerprints with known ligands, using four-point fingerprints, shows that they can be used for searching for novel ligands within a database and that they are specific enough to capture ligand selectivity between similar proteins such as the serine proteases thrombin, Factor Xa, and trypsin (37). With three-point fingerprints, the comparison of ligand- and site-derived fingerprints could identify common binding motifs, although selectivity was not captured (37b).
Pharmacophore fingerprints are relatively slow to calculate, however. Thus, their application to very large virtual libraries requires a great deal of computer power. Researchers at Chiron (12, 43) have developed a pharmacophore-based methodology applicable to reactants, OSPPREYS (Oriented-Substituent Pharmacophore PRopErtY Space). In this approach, reactant pharmacophores are calculated with respect to the reactant attachment atom and combinations of up to nine pharmacophore centers are considered (see Section [...]8). In the Gridding and Partitioning (Gap) approach, developed at GlaxoWellcome (44), reactants are aligned such that the bond between the attachment atom and the first nonhydrogen atom is along the x-axis with the attachment atom at the origin. A conformational analysis is then performed and the pharmacophore features are mapped to a 1-Å grid (Fig. 5.6). Cells occupied by a particular feature are recorded in a bit string. This descriptor is ideally suited to monomer acquisition and reactant diversity.
Topomer shape similarity, developed by Cramer (45) at Tripos, has been used for similarity searching and targeted library design (using Tripos' proprietary software, "ChemSpace"), building on earlier work on steric fields of single "topomeric" conformers, clustering reactants by their 3D steric fields into "bioisosteric" clusters. The descriptor was considered to be useful in describing variations about a fixed molecular core, defining a single, unambiguous, aligned conformation for any nonchiral molecule.
Approaches such as the Gap program that exploit 3D descriptors for monomer selection address a need for an easily accessible set of in-house monomers available for library generation. Such monomers need to be diverse in nature and able to probe regions of space through attachment to known leads, while producing compounds with druglike properties. More detailed conformational searching paradigms can be used for the smaller monomer compounds, and approaches such as Gap and OSPPREYS exploit this opportunity.
For the selection of diverse compound subsets, studies (46a) have compared three-point pharmacophore descriptors and 2D fingerprints. These have highlighted benefits of the different approaches, and the improved performance of some combined descriptors. The use of clustering for the rational selection of compounds for acquisition and for in-house compound collections used for screening has also been investigated (46b), with comparable results obtained with 3D pharmacophore-derived fingerprints to the typically used 2D fingerprints.
2.1.3.2 Shape. Pharmacophores capture the key features of intermolecular interactions. However, they do not explicitly capture the shape and volume of the ligand, even if this is crudely implied by the largest four-point pharmacophore exhibited, and the totality of potential pharmacophores exhibited across a range of conformations encodes shape fragments. Hahn (47) has described a method for three-dimensional shape-based searching implemented in the Catalyst program. Seven
[Figure 5.6 panel labels: attachment group at origin; free x-axis rotation about attachment bond; track locations of pharmacophores within regular grid; pharmacophore key (000110001...).]
Figure 5.6. Overview of the Gridding and Partitioning (Gap) procedure as applied to monomers,
exemplified using phenylalanine as a potential primary amine. This molecule thus contains two
pharmacophoric groups (the aromatic ring and the carboxylic acid). During the conformational
analysis the locations of these pharmacophoric groups are tracked within a regular grid. See color
insert. [Reproduced from A. R. Leach and M. M. Hann, Drug Discovery Today, 5, 326-336 (2000),
with permission of Elsevier Science.]
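
The gridding step summarized in Figure 5.6 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the published Gap implementation: the function names, the 2.0 cell size, the rotation step, and the example coordinates are all hypothetical, and the conformers are assumed to be already aligned with the attachment atom at the origin and the attachment bond along the x-axis.

```python
import math

def rotate_about_x(point, angle_deg):
    """Rotate a 3D point about the x-axis (the attachment bond after alignment)."""
    x, y, z = point
    a = math.radians(angle_deg)
    return (x, y * math.cos(a) - z * math.sin(a), y * math.sin(a) + z * math.cos(a))

def grid_cell(point, cell_size):
    """Index of the regular grid cell that contains the point."""
    return tuple(math.floor(c / cell_size) for c in point)

def gap_key(conformers, cell_size=2.0, rotation_step=30):
    """Collect every (feature type, grid cell) pair visited by any conformer.

    conformers: list of conformers, each a list of (feature_type, (x, y, z)).
    cell_size and rotation_step are illustrative values (assumptions).
    """
    occupied = set()
    for features in conformers:
        for angle in range(0, 360, rotation_step):
            for feature_type, xyz in features:
                cell = grid_cell(rotate_about_x(xyz, angle), cell_size)
                occupied.add((feature_type, cell))
    return occupied

def to_bit_string(occupied, all_cells):
    """Encode the occupied set as a bit string over a fixed, ordered list of cells."""
    return "".join("1" if cell in occupied else "0" for cell in all_cells)

# One conformer of a hypothetical monomer with two pharmacophoric groups.
monomer = [[("aromatic", (3.0, 1.0, 0.5)), ("acid", (5.0, -2.5, 1.0))]]
key = gap_key(monomer, cell_size=2.0, rotation_step=180)
```

Two monomers can then be compared by the overlap of their keys, and a monomer collection can be searched for grid cells that no current monomer reaches.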
2 Molecular Similarity/Diversity 201
shape indices, positive and negative extents along the three principal axes from the molecular centroid, and the volume of that conformer are computed and stored in a database. These indices can then be used for rapid comparison with a query shape derived from active structures. Conformers passing this filter are then aligned with the query and the similarity is assessed from the volume overlap. Shape-based searching can be used independently, in which case it will complement a 2D similarity search. The method can also be employed in conjunction with a 3D pharmacophore search; however, it is not clear that results are improved in this case (48).

2.1.3.3 Field-Based. A receptor site recognizes the surface properties of a molecule. These can be represented by different types of molecular fields (electrostatic, steric, and hydrophobic) that can be calculated from the atomic composition of the molecule and compared using a measure such as the Carbo index (49). A gaussian representation of the field allows for a more rapid alignment of the molecules (50). Willett's group has developed a program FBSS (51), which uses a genetic algorithm for the alignment of the molecular fields. They have compared the performance of this method with a 2D structural fingerprint (UNITY software (13c)) in searching the WDI, a collection of drug molecules and compounds in development, and the BIOSTER database, a database of functional groups that have been used to replace other groups and retain biological function (e.g., a carboxylic acid and a tetrazole). Although the 2D measure will tend to find more bioactive molecules, the 3D measure gives a greater structural diversity in the hits (52). This seems to be the case for most 3D methods. In these examples conformational flexibility can be considered during the alignment stage but will slow the search down considerably and may lead to the algorithm becoming stuck in local minima.

An alternative to using the molecule composition in calculating the fields is to use molecular fragments as probes to represent protein side chains. The interaction energy between the probe and the molecule is calculated on a grid surrounding the molecule. These grid fields can then be used in conjunction with PLS as in the CoMFA (comparative molecular field analysis) 3D-QSAR methodology (53). More recently, these fields have been further transformed to generate 3D molecular descriptors. The VolSurf program (54) calculates a wide range of descriptors from the grid energies [calculated with the program GRID (42)]. These have been shown to correlate to a range of properties such as membrane penetration and solubility (55). The Almond program (56) uses a transform known as the Maximum Auto-Cross Correlation (MACC) between pairs of grid nodes, to give a type of two-point pharmacophoric representation of the fields. Such descriptors have been useful in QSAR studies because they are alignment free; that is, they are independent of the position within the defining grid, and have also been used in reactant selection (Pickett, unpublished results, 1999). However, the limitations of the lack of conformational flexibility have so far precluded their use in more general database searching and diversity applications.

2.1.4 Analysis

2.1.4.1 Descriptor Transformations. A large number of potential descriptors are available and this presents a number of issues. Many descriptors will tend to be correlated with one another to a greater or lesser degree. There is the question of the scale of the descriptors and also the difficulty of combining, say, a fingerprint with a calculated property. Thus the descriptors must first be transformed in some way. A key study in this regard was the work of the Chiron group (57). Groups of similar descriptors were combined using principal components analysis (PCA) and multidimensional scaling (MDS), to give a total of 16 composite descriptors. D-optimal design was then used to further analyze a data set. Also of interest was the use of a "flower plot" to visualize the results. In the DPD (diverse-property derived) methodology (21a), the search was for six noncorrelated descriptors. The selection of relevant BCUT descriptors using a χ² test is mentioned below.

202 Combinatorial Library Design, Molecular Similarity, and Diversity Applications

2.1.4.2 Similarity and Distance Measures. A variety of measures exist for assessing the similarity or distance between molecules in a given descriptor space (2a), as described above. Similarity measures give a direct measure of similarity between molecules in some property space and give values in the range of 0 to 1, with 1 being identical. Typical examples are the Tanimoto coefficient and the Cosine coefficient. For real-valued properties the Tanimoto is defined as

\[
\mathrm{Tanimoto} = \frac{\sum_{i=1}^{N} x_{iA}\, x_{iB}}{\sum_{i=1}^{N} x_{iA}^{2} + \sum_{i=1}^{N} x_{iB}^{2} - \sum_{i=1}^{N} x_{iA}\, x_{iB}}
\]

where x_{iA} and x_{iB} are the values of property i for molecules A and B, and N is the number of properties.

expressly include the absence of a feature (or low values for real-valued properties) in the measure of similarity. This has led to the suggestion (58) that, in the chemical domain at least, such measures are best for relative similarity; that is, ranking the similarity of two molecules to a target, as opposed to measuring the absolute similarity of molecules for which similarity measures, are preferred.

Similarity and distance measures form the basis for most of the analysis and selection
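
As an illustration only, the Tanimoto coefficient for real-valued properties can be sketched in Python as follows. It is computed as the sum of products of the two property vectors divided by the sum of their two sums of squares minus that sum of products. The function name `tanimoto` and the convention of returning 1.0 for two all-zero vectors are assumptions made for this sketch.

```python
def tanimoto(a, b):
    """Tanimoto coefficient between two equal-length vectors of real-valued properties."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    ab = sum(x * y for x, y in zip(a, b))  # sum of products
    aa = sum(x * x for x in a)             # sum of squares for molecule A
    bb = sum(y * y for y in b)             # sum of squares for molecule B
    denominator = aa + bb - ab
    if denominator == 0:
        # Both vectors are all zeros; treating them as identical is an assumption.
        return 1.0
    return ab / denominator

# Binary fingerprints are a special case: two shared bits out of four set in total.
example = tanimoto([1, 0, 1, 1], [1, 1, 0, 1])
```

For non-negative property values the result lies between 0 and 1, in line with the range quoted above for similarity measures.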
2.2.1 Cell-Based Partitioning Methods. Partitioning methods divide chemistry space into hyperdimensional "cells" by "binning" the axes (descriptors) that define the chemistry vector space, just as the eight divisions on the x- and y-axes of a two-dimensional checker board divide the board into 64 squares. A chemical compound occupies a position in chemistry space determined by the descriptors (coordinates) computed based on its structure. Once the compounds have been partitioned, selecting diverse or representative sets of compounds involves selecting a small number of compounds from each occupied cell, either in proportion to the number of compounds in the cell or a specified number from each occupied cell. For focused sets, compounds are sampled from cells neighboring the population of actives. The real advantage of partitioning methods, however, lies in their ability to readily identify underpopulated regions of property space. Selections can then be made from a second population of molecules, a virtual library for instance, to increase the occupancy of underpopulated cells. Usually, such methods require a low dimensional representation of the space, although the pharmacophore methods are a notable exception to this. The low dimensional space may be the result of a dimensionality-reduction algorithm, as described earlier. Alternatively, a small number of descriptors may be judiciously selected. This latter approach was taken by Lewis et al. in their DPD methodology (21a), which is a good example of partition-based selection. The aim was to select a representative set of compounds based on molecular and physicochemical properties for screening. Six properties were chosen from among 49, based on their low pairwise correlation: number of H-bond acceptors, number of H-bond donors, molecular flexibility, an electrotopological state index, c log P, and a measure of aromatic density. Each descriptor (axis) was divided into two to four partitions, to give a total of 576 bins. A major issue was in identifying six relevant and reasonably noncorrelated (orthogonal) descriptors, leading to the definition of a new descriptor. The chosen ranges covered more than 85% of a 150,000 subset of the corporate collection and approximately three compounds were taken from each bin. Follow-up of initial hits involves the screening of additional compounds from the cells containing hit molecules. Several leads were identified using this approach (7).

2.2.1.1 Diverse Solutions. DiverseSolutions (DVS) is software developed by Pearlman et al. (11, 31) to generate and use the BCUT descriptors in addition to other DVS-computed or user-provided low dimensional descriptors. (DiverseSolutions is also designed to work with high dimensional metrics such as 2D fingerprints, and includes some novel algorithms for such distance-based work.) DVS uses a χ²-based "auto-choose" algorithm (11c) to identify the combination of low-D descriptors, which are mutually orthogonal and which most uniformly distribute a given large population of compounds among the cells of the resulting chemistry space. Originally, the binning was performed in a uniform manner along each axis, with a given percentage of outliers to avoid sampling the extremes of space. This could be useful for large sets of diverse compounds where the extremes tend to be undesirable compounds. However, for large (virtual) libraries initial filtering can remove these before the analysis, and thus a nonuniform binning scheme was suggested (59), so that acceptable compounds are not lost as outliers, and is now the preferred option.

Often, the large population of compounds used as the basis for defining a chemistry space is the entire compound collection available to a pharmaceutical company for its drug discovery efforts, together optionally with structures from commercial databases of biologically active compounds. The resulting chemistry space can be regarded as the "corporate standard chemistry space" and provides an ideal basis for comparing large sets of compounds such as alternative commercially available compound collections or alternative combinatorial libraries. It is also a good basis for comparing small sets of compounds such as compounds with reasonable affinity for various bioreceptors.

The axes of a corporate standard chemistry space are intended to represent all aspects of molecular structure. Thus, all axes of the corporate chemistry space must be considered for purposes such as general diverse subset selection or rational compound acquisition. However, not all aspects of molecular structure may be important for understanding structure-activity relationships (SARs) for a particular receptor. This led Pearlman and Smith (11d) to introduce the concept of a receptor-relevant subspace (RRSS) of a full chemistry space. For example, starting with a chemistry space of six dimensions, defined to best represent the diversity of all druglike compounds in the MDDR (MDL Drug Data Report) database (13a), they showed how to perceive the three-dimensional subspace that conveys information that is particularly relevant for affinity to the ACE (angiotensin converting enzyme) receptor. ACE inhibitors of diverse structure were tightly clustered with respect to the receptor-relevant metrics, thereby providing an obvious near-neighbor strategy for lead follow-up. They (11d) also emphasized the importance of not considering metrics that are not "receptor-relevant" when computing distances for such near-neighbor-based discovery efforts. This also enables diversity in these other dimensions to be explored (e.g., with combinatorial libraries), to obtain compounds with a modified profile for other properties such as bioavailability.

Work on the design and diversity analysis of large combinatorial libraries at Pharmacopeia using BCUT metrics and DiverseSolutions was reported by Schnur (32). A cell-based analysis of synthon-derived libraries was performed, using full product libraries, including library comparisons. Active molecules in these libraries, which involved multiple scaffolds, were found to cluster in various three-dimensional subspaces of the diversity spaces. The utility of a simple property-based reactant/synthon selection tool was also described, targeted at the synthetic chemists, with reactants binned according to patterns based on the ranges of a set of user-selected properties that form a diversity hypothesis.

Chemistry space metrics have been used at Rhône-Poulenc Rorer for diversity analysis, library design, and compound selection (59, 80) using DiverseSolutions to generate a "universal" chemistry space for use as a standard for profiling structural sets of interest. The complementarity of three different diversity measures for comparing and profiling compound collections (a corporate database, combinatorial libraries, and the MDDR drugs database) was also shown. The methods used were a 2D structural characterization (Daylight fingerprints), DiverseSolutions, and 3D pharmacophore fingerprints. A combinatorial library of 100,000 structures appeared structurally different from the other databases by the Daylight fingerprint clustering, yet the bulk of its compounds overlapped with druglike compounds (MDDR) in DiverseSolutions BCUT chemistry space and 3D pharmacophore space ("cells" in fingerprints). It was shown and "quantified" that new diversity relative to the company database was explored, with much of this new diversity in desirable areas occupied by MDDR compounds. The nonuniform binning scheme was developed to enable the use of chemistry spaces scaled to include all structures within a set, while maintaining a reasonable distribution of compounds within cells. The method was used to select a subset for initial screening of a large set of combinatorial libraries designed for 7-TM GPCR targets.

2.2.1.2 Pharmacophore Fingerprints. Pharmacophore fingerprints can also be considered as a high dimensional partitioning of the compound space (35). Underrepresented pharmacophores within a population can be identified and act as a possible focus for library design or compound acquisition. Using six feature types (hydrogen bond acceptor, donor, acid, base, hydrophobe, and aromatic ring centroid) with four-point pharmacophores and 7-10 binned distance ranges, it is possible to resolve about 2-10 million different pharmacophoric shapes. Different databases can be compared using this fingerprint, and differences identified. For example, by comparing a corporate screening file (100,000 structures) with the MDDR database (62,000 structures) of biologically active compounds (as discussed above for DiverseSolutions, Refs. 62, 80) "holes" could be identified, in terms of about 1 million 3D pharmacophores exhibited only by MDDR compounds (about 2.7 million were in common and 0.2 million unique to the corporate set). This provides a design space for which combinatorial libraries were designed and synthesized. A total of 100,000 combinatorial library compounds were able to match about 40% (0.4 million) of the pharmacophore "holes" (i.e.,
[Figure 5.7 bar labels: MDDR (62 K); Corporate (100 K); Libraries (100 K); MDDR random subset (14 K); Corporate random subset (14 K); Libraries, single chemistry (14 K each); Total of sets (from a theoretical 9.7 M).]
Figure 5.7. Comparisons of the 3D four-point pharmacophore fingerprints exhibited by several sets
[MDDR database of 62,000 biologically active compounds, a corporate registry database of 100,000
compounds used for screening, 100,000 compounds from combinatorial libraries (from a four-component
Ugi condensation reaction), and 14,000 compound random subsets (MDDR, corporate) or
individual libraries]. The four-point potential pharmacophores were calculated using 10 distance
range bins and the standard six pharmacophore features.
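
The kind of comparison shown in Figure 5.7 reduces to set operations once each collection's fingerprint is held as a set of pharmacophore keys. The sketch below is a minimal illustration with toy integer keys; the function names and the example data are hypothetical and do not reproduce the published analysis.

```python
def compare_fingerprints(reference, corporate):
    """Split two ensemble fingerprints into shared and unique pharmacophores."""
    return {
        "common": reference & corporate,
        "holes": reference - corporate,           # exhibited only by the reference set
        "corporate_only": corporate - reference,  # exhibited only by the corporate set
    }

def hole_coverage(holes, library):
    """Fraction of the holes that a candidate library manages to fill."""
    if not holes:
        return 0.0
    return len(holes & library) / len(holes)

# Toy data: each integer stands for one binned four-point pharmacophore.
reference_set = {1, 2, 3, 4, 5}
corporate_set = {4, 5, 6}
library_set = {1, 2, 7}

result = compare_fingerprints(reference_set, corporate_set)
coverage = hole_coverage(result["holes"], library_set)
new_pharmacophores = library_set - reference_set - corporate_set
```

In this toy case the library fills two of the three holes and contributes one pharmacophore seen in neither existing set.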
MDDR pharmacophores not in corporate set), and additionally explore about 0.3 million new pharmacophores. Figure 5.7 illustrates the number of pharmacophores found in these sets, together with those for the ACD (Available Chemicals Directory), random 14,000 subsets of the database sets and some of the combinatorial libraries (~14,000 each, from a four-component Ugi condensation reaction, 12 x 12 x 12 x 8 reactants). The relative richness and diversity of the MDDR database, which includes structures from a large number of companies, is clear from the comparisons. The contributions, and eventual diminishing return, of successive libraries using the same chemistry are discussed in Section 5.1.2 (see Fig. 5.24 below).

An example of the use of 3D pharmacophore fingerprints for the design of GPCR libraries (37a) using "relative" fingerprints focused around privileged substructures is described in Section 5.1.2. An approach that combines an optimization of a four-point pharmacophore fingerprint and BCUT chemistry space diversity, using simulated annealing, has been described (37d; see Section 4.7). Simulated annealing is a widely used optimization methodology whereby the "temperature" of the system is used to control the degree of sampling of solution space. The "temperature" is cooled or annealed as the run progresses so that the system moves into a minimum for the function at low "temperature." In the classical sense, temperature controls the kinetic energy of the system; in a more general sense, the "temperature" has no physical meaning and is a parameter to control the sampling of solution space. Diversity was the goal (function to be optimized) of the studies reported, but the approach can equally be applied to optimize to a desired distribution of properties (e.g., from sets of biologically active compounds). The power of this pharmacophoric approach has been exemplified by Leach et al. in their Gap protocol for monomer acquisition (44).

Pharmacophore fingerprints derived from complementary site points to a target binding site have been used as a quantification of "biological diversity"/structure-based diversity
(37), defining a measure of the intersection between chemical and biological space. They can be compared to the pharmacophore fingerprints calculated from ligands, and the pharmacophore fingerprints of different target binding sites can also be compared to identify similarities (e.g., common binding motifs) and differences (e.g., for selectivity). The four-point pharmacophore fingerprint of a serine protease binding site was used to quantify all the possible binding modes. An example was given of how a combinatorial library could be designed to match as many as possible of these site pharmacophores, with the idea that the biological screening of the resultant library would provide information as to which hypotheses lead to (the best) binding. The site points can be generated by both geometric methods (as implemented in Chem-X/ChemProtein; see Ref. 133) or through energetic surveys of the site [e.g., by using a variety of probe atoms (as implemented and used for pharmacophore fingerprint generation) (37); see Section 2.1.3.1].

The pharmacophore fingerprinting method thus provides a novel method to measure similarity when comparing ligands to their binding site targets, with applications such as virtual screening and structure-based combinatorial library design, as well as to compare binding sites themselves. Flexibility of the binding site can also be explicitly accounted for by using a composite fingerprint generated from several different binding site conformations.

2.2.2 Cluster-Based Methods. Clustering methods have a long history of application in chemical information (60). Any set of descriptors can be used in the clustering, but most typically some form of structural fingerprint is used in conjunction with a similarity measure such as the Tanimoto coefficient (see Section 2.1.4.1). The methods fall into two broad classes, hierarchical and nonhierarchical.

Nonhierarchical methods such as that described by Jarvis and Patrick (61) have been widely used for compound selection from large databases (62). The principle behind the Jarvis-Patrick method is to group together compounds that have a large number of nearest neighbors in common. However, the method requires the user to specify the number of clusters desired, and tends to be prone to singletons (clusters of one) and/or a small number of very large clusters. The cascade clustering methodology (59b) was developed to address some of these issues. Parameters were selected to produce an acceptable size distribution for the largest clusters and the small clusters were then reclustered. Doman et al. (63) have developed a fuzzy clustering technique, also based around the Jarvis-Patrick algorithm but which has no user-defined parameters and allows a compound to belong to more than one cluster.

Hierarchical methods can be further subdivided into agglomerative and divisive methods. Agglomerative methods start with each compound in a separate cluster and iteratively join the closest clusters together. Divisive methods start with a single cluster and iteratively subdivide until each compound is a singleton. Hierarchical clustering methods generate a dendrogram showing the relationship between the compounds, the issue being the level at which to cut the hierarchy (i.e., how many clusters to generate). Although heuristics exist, there is no automated method. Such algorithms, however, at best scale to order N² in time, where N is the number of compounds, and so are limited in application to a few hundred thousand compounds at most (64). Nevertheless, they have been shown to be superior to nonhierarchical methods for clustering of chemical compounds (65). Ward's method was shown (5) to be the most effective at separating active from inactive compounds by clustering bit strings that describe the presence or absence of 153 small generic and specific fragments (ISIS structural key descriptors). Even better performance was obtained with the inclusion of pharmacophore distances between site points complementary to hydrogen bonding and charged groups combined with distances between centers of aromatic rings and attachment points for hydrophobic groups.

2.2.3 Dissimilarity-Based Methods. The methods for compound selection described above essentially group compounds either by partitioning into cells or by clustering. Dissimilarity-based methods (66) avoid this step.
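
The Jarvis-Patrick principle described in Section 2.2.2, grouping compounds that share many nearest neighbors, can be sketched as follows. This is a simplified variant written for illustration, not the procedure of the cited references: it assumes a precomputed similarity matrix, excludes each compound from its own neighbor list, and uses two hypothetical parameters, `k` (neighbor list length) and `k_min` (shared neighbors required).

```python
def jarvis_patrick(similarity, k, k_min):
    """Cluster items that are mutual near neighbors sharing at least k_min neighbors.

    similarity: square matrix (list of lists) of pairwise similarities.
    """
    n = len(similarity)
    # k nearest neighbors of each item, excluding the item itself.
    neighbors = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        others.sort(key=lambda j: similarity[i][j], reverse=True)
        neighbors.append(set(others[:k]))

    # Union-find structure used to merge linked items into clusters.
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            mutual = j in neighbors[i] and i in neighbors[j]
            if mutual and len(neighbors[i] & neighbors[j]) >= k_min:
                parent[find(i)] = find(j)

    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())

# Toy similarity matrix: items 0-2 resemble one another, as do items 3-4.
toy_similarity = [
    [1.00, 0.90, 0.80, 0.20, 0.10],
    [0.90, 1.00, 0.85, 0.15, 0.12],
    [0.80, 0.85, 1.00, 0.30, 0.25],
    [0.20, 0.15, 0.30, 1.00, 0.95],
    [0.10, 0.12, 0.25, 0.95, 1.00],
]
loose = jarvis_patrick(toy_similarity, k=2, k_min=1)
strict = jarvis_patrick(toy_similarity, k=2, k_min=2)
```

Raising `k_min` from 1 to 2 breaks both groups into singletons in this toy case, which illustrates the sensitivity to parameters and the tendency toward singletons noted above.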
[Figure 5.8 label: substructure, feature/motif.]
Figure 5.8. Example of privileged four-point pharmacophores, either created from a ligand using a
particular feature (e.g., the centroid of a "privileged" substructure) or complementary to a protein
site using a site point or attachment point of a docked scaffold. Only pharmacophores that include
this special feature are included in the fingerprint, thus providing a relative measure of
diversity/similarity with respect to the privileged feature.
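
The idea in Figure 5.8, keeping only those four-point pharmacophores that include a special feature, can be sketched as below for a single conformation. The key encoding is a deliberate simplification and an assumption of this sketch: feature types and binned distances are sorted independently, so chirality and the pairing of distances with features are lost. The function names, bin edges, and coordinates are hypothetical.

```python
import math
from itertools import combinations

def bin_index(distance, edges):
    """Index of the first distance bin whose upper edge exceeds the distance."""
    for i, upper in enumerate(edges):
        if distance < upper:
            return i
    return len(edges)

def four_point_fingerprint(points, edges, privileged=None):
    """Set of binned four-point pharmacophore keys for one conformation.

    points: list of (feature_type, (x, y, z)).
    privileged: index of the special point; if given, only quartets that
    contain it are kept, which yields the "relative" fingerprint.
    """
    fingerprint = set()
    for quartet in combinations(range(len(points)), 4):
        if privileged is not None and privileged not in quartet:
            continue
        types = tuple(sorted(points[i][0] for i in quartet))
        bins = tuple(sorted(
            bin_index(math.dist(points[i][1], points[j][1]), edges)
            for i, j in combinations(quartet, 2)
        ))
        fingerprint.add((types, bins))
    return fingerprint

# Five pharmacophore points; index 4 plays the role of the privileged centroid.
example_points = [
    ("acceptor", (0.0, 0.0, 0.0)),
    ("donor", (3.0, 0.0, 0.0)),
    ("acid", (0.0, 4.0, 0.0)),
    ("base", (0.0, 0.0, 5.0)),
    ("aromatic", (2.0, 2.0, 2.0)),
]
bin_edges = [2.0, 4.0, 6.0, 8.0]
full = four_point_fingerprint(example_points, bin_edges)
relative = four_point_fingerprint(example_points, bin_edges, privileged=4)
```

The relative fingerprint is always a subset of the full one; a production implementation would also loop over conformers and use a canonical key that preserves chirality.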
drugs. Such an approach is most widely used as an additional constraint in library design algorithms (78) and is further discussed below.

An interesting example of biasing in compound selection is provided by Grassy et al. (79). Lead compounds were used to derive a range of acceptable values for topological indices and other molecular descriptors. These were used to filter a large virtual library and led to an active compound being synthesized.

. 5 Relative Diversity/Similarity. This describes an approach that measures "relative" similarity and diversity between chemical objects, in contrast to the use of the concept of a total or "absolute" reference space (80). The ability of 3D pharmacophoric fingerprint descriptors to separate ligand-binding properties from chemical structure has enabled a useful modification to the way the descriptor is evaluated (37). It is possible to identify one of the points of a pharmacophoric description such as a triplet or quartet with a special feature, such as a "privileged" substructure considered important for binding or a pharmacophore group. A fingerprint can be generated that describes the possible pharmacophoric shapes from the viewpoint of that special point/substructure (see Fig. 5.8). This creates a "relative" or "internally referenced" measure of diversity, enabling new design and analysis methods. The technique has been extensively used to design combinatorial libraries that contain "privileged" substructures focused on GPCRs (37a), and this is described further in Section 5.1.2. The use of "receptor-relevant" BCUT chemistry spaces from DiverseSolutions provides a different approach to a focused similarity/diversity measure (11d, 32e-h).

3 VIRTUAL SCREENING BY MOLECULAR SIMILARITY

The use of molecular similarity to analyze large databases of structures using information derived from one or several ligands provides a powerful ligand-based virtual screening method (protein structure-based virtual screening methods are by comparison based on docking structures into a binding site). Virtual screening requires that a set of structures is ranked, with the goal of identifying new structures that have similar biological activity, with top-scoring compounds sent for evaluation in a biological assay. Usually, the requirement is to provide a small subset of compounds (10-1000) from a large set (100,000-1,000,000) of possible compounds for screening that is enriched in actives (i.e., contains a greater proportion of actives than that of the full compound set). In this context, enrichment involves identifying the highest number of new chemotypes as opposed to analogs of the query structure(s). Pharmacophoric methods have been found to be particularly effective for this, building on the successful use of 3D database searching for lead generation. Other similarity methods such as the use of 2D descriptors (Section
2.1.1) are also commonly used to identify structures for screening based on the structure of a known ligand. The use of similarity searching in chemical databases has been reviewed by Willett et al. (2a), comparing newer types of similarity measure with existing approaches. In this section the focus is on the use of the 3D pharmacophoric methods, which have been shown to provide a ligand-based virtual screening method that yields new chemotypes.

3.1 Use of Geometric Atom-Pair Descriptors

The topological atom-pair descriptors (24) have been extended by Sheridan and coworkers to geometric atom pairs (26), and shown to be effective at generating hit lists enriched in active molecules of different chemotypes. A set of precalculated conformations (~10-25) is used for each molecule, and each atom is assigned two different atom types: (1) a binding property (donor, acceptor, acid, base, hydrophobic, polar, and other); (2) a combination of element type, number of neighbors and π-electron count. All combinations of atom pairs are analyzed, for each conformation, and resultant histograms of each probe and database molecule conformation are compared. The technique was compared with its topological equivalent (counting bond connections between atoms to estimate interatomic "distance"). This demonstrated that, although both methods were able to significantly enrich the highest ranking structures with other active molecules for the same target (~20- to 30-fold enhancement over random in the top 300 compounds), the 3D structure-derived descriptors were able to show their advantage by picking out active chemotypes with greater structural variation relative to those from the 2D searches. The analysis used about 30,000 structures from the Derwent Standard Drug File (SDF; version 6, developed and distributed by Derwent Information Ltd., London, England, 1991, now known as the World Drug Index) using probe molecules with known activity against a particular target to rank the database. Sheridan et al. (26c) have also shown how a single combined atom-pair descriptor from a set of molecules can be used in a single fast search to provide results similar to those from the slower process of individual molecule-by-molecule searches. This provides the ability to search mixtures, which some companies use for high throughput screening, in that both the search query and/or the database being searched can be mixtures of structures.

3.2 Use of 3D Pharmacophore Fingerprints (Three- and Four-Point)

Some research groups have extended the atom-pair descriptors to three-point (triplets) and four-point (quartets) pharmacophore descriptors (35, 37, 76, 81) as described in Section 2. These descriptors have a potentially superior descriptive power, and a perceived advantage over atom pairs is the increased "shape" information (intrapharmacophore distance relationships) content of the individual descriptors (37a). The quartet (tetrahedral) four-point descriptors offer further potential 3D content by including information on volume and chirality (37a, 82), compared with the triplets that are components of the quartets and represent planes or "slices" through the 3D shapes.

The fingerprints can be precalculated for database compounds, with conformational sampling, and stored in an efficient format (e.g., four-point pharmacophore fingerprints, where one line of encoded information uses about 11 kilobytes of space for 1000 pharmacophores). Probe fingerprints from one or more structures can be rapidly compared against such databases at speeds of >100,000 compounds/min, even for large four-point pharmacophore fingerprints, representing about 10 million different pharmacophoric shapes. Similarity is measured using potential pharmacophore overlap and similarity indices such as the modified Tanimoto index (37a). The relative merits of two-, three- and four-point pharmacophore descriptors for different applications is an area of ongoing study (37, 83). Figure 5.9 shows some structurally diverse endothelin antagonists that exhibit low 2D similarity, but maintain significant overlap of their four-point pharmacophore fingerprints (37a).

3.3 Validation Studies

The validation issue for 1D, 2D, and 3D descriptors for similarity searching and virtual
3 Virtual Screening by Molecular Similarity
Figure 5.9. Structurally diverse endothelin antagonists (SB 209670 and L-746,072) exhibiting low 2D similarity while maintaining common pharmacophoric elements crucial to activity.
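As a rough illustration of fingerprint-based similarity ranking of the kind discussed here, the sketch below computes the standard Tanimoto coefficient between fingerprints stored as sets of "on" bit indices and ranks a small database against a probe. The fingerprints, molecule names, and the function names `tanimoto` and `rank_by_similarity` are hypothetical, and the plain Tanimoto index is used as a stand-in for the modified index of ref. 37a, whose exact form is not given in this text.

```python
def tanimoto(fp_a, fp_b):
    """Standard Tanimoto coefficient for two fingerprints given as sets of set-bit indices."""
    if not fp_a and not fp_b:
        return 0.0  # convention chosen here for two empty fingerprints
    common = len(fp_a & fp_b)
    return common / (len(fp_a) + len(fp_b) - common)


def rank_by_similarity(probe, database):
    """Return (name, score) pairs sorted from most to least similar to the probe."""
    scores = [(name, tanimoto(probe, fp)) for name, fp in database.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical pharmacophore fingerprints: indices of pharmacophores a molecule can display.
probe = {1, 4, 7, 9, 12}
database = {
    "mol_A": {1, 4, 7, 9, 30},   # shares 4 bits with the probe, 6 bits in the union
    "mol_B": {2, 3, 5},          # shares no bits with the probe
    "mol_C": {1, 4, 7, 9, 12},   # identical fingerprint
}
ranking = rank_by_similarity(probe, database)
```

Storing only the indices of set bits keeps sparse fingerprints compact, which is one plausible way to handle very long pharmacophore keys; real implementations may use packed bit strings instead.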
screening has been addressed in several publications (5d, 14a, 45, 72, 84, 85). Conflicting results have been reported, probably because of the way the different descriptors were used and biases in the test sets. Two primary concepts have been applied to the analysis of biological data. The concept of "neighborhood behavior" (84) as a measure of descriptor utility has been promoted, based on the idea that if a descriptor is able to cluster molecules with a particular biological activity, the descriptor encodes information regarding the requirements for that activity, and by extension is a useful measure for molecular similarity/diversity. Comparisons of 2D fingerprints with pharmacophore fingerprints using this approach led to the conclusion that 2D descriptors performed better than their 1D and 3D counterparts (14a, 45). However, issues with the studies undertaken have been raised (85). These relate to bias in the data sets arising from the presence of closely related analogs, which by their nature have high 2D substructural similarities, and the way the 3D pharmacophoric descriptors were generated (single conformation only) and used (bin setting, Tanimoto index).

Some comparative studies of ligand-based virtual screening methods have been undertaken within Bristol-Myers Squibb (85) using better-optimized settings for pharmacophore fingerprint generation [four-point pharmacophores, 7 distance bins, and full conformational analysis (37a)], which gave quite different results. An example using melatonin as a probe molecule to search against a database of about 150,000 compounds containing about 250 known melatonin antagonists is shown in Fig. 5.10. The graph shows the hit rates obtained by similarity ranking in terms of the
212 Combinatorial Library Design, Molecular Similarity, and Diversity Applications
[Figure legend: Daylight; Isis/MACCS; atom pairs; 4-point pharmacophores; 4-point pharmacophores + atom pairs]
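Hit rates from similarity ranking are usually judged against random selection. For the melatonin example in the text, about 250 actives in about 150,000 compounds corresponds to a random hit rate near 0.17%. The sketch below shows one common way to compute a hit rate and an enrichment factor for the top of a ranked list. The function names and the toy ranked list are assumptions for illustration and are not data from the study.

```python
def hit_rate(ranked_is_active, top_n):
    """Fraction of actives among the first top_n entries of a ranked list of booleans."""
    if top_n <= 0:
        raise ValueError("top_n must be positive")
    top = ranked_is_active[:top_n]
    return sum(top) / len(top)


def enrichment_factor(ranked_is_active, top_n):
    """Hit rate in the top_n divided by the hit rate expected from random selection."""
    overall = sum(ranked_is_active) / len(ranked_is_active)
    return hit_rate(ranked_is_active, top_n) / overall


# Hypothetical ranked database: 1000 compounds, 10 actives, 5 of them in the top 20.
ranked = [True] * 5 + [False] * 15 + [True] * 5 + [False] * 975
ef_top20 = enrichment_factor(ranked, 20)   # 0.25 / 0.01
```

An enrichment factor of 20 to 30 in the top few hundred compounds is the scale of improvement quoted earlier for atom-pair descriptors.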
Figure 5.13. (a) A virtual library of 634,721 allowed combinatorial AB products (after filtering out products that failed Lipinski's Rule of 5 "druglike" criteria) shown in a BCUT chemistry space specifically chosen to best represent the diversity of the virtual library. (b) The maximally diverse 9600-compound subset of the virtual library, illustrating the results of purely product-based "library design." Although providing the maximal diversity, synthesis of these 9600 AB products would require the use of 347 A's and 1024 B's, clearly unacceptable from the perspective of synthetic economy (numbers of reactants and robotic control). (c) The 9600-compound library resulting from the traditional, purely reactant-based library design strategy of selecting the 80 most diverse A's and the 120 most diverse B's. Although providing user-selected synthetic economy, the diversity of these 9600 AB products is clearly quite poor. (d) The 9600-compound library resulting from the reactant-biased, product-based (RBPB) algorithm developed by Pearlman and Smith (see Refs. 31, 87c and text). The algorithm selected a different set of 80 A's and a different set of 120 B's, thus providing the same level of user-selected synthetic economy, while also providing substantially greater diversity than could be achieved using a purely reactant-based library design strategy. See color insert.
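The trade-off in Figure 5.13 can be checked with simple counting. An 80 x 120 array yields 80 x 120 = 9600 products from 200 reactants, whereas the cherry-picked set needs 347 + 1024 = 1371 reactants, which could in principle form 347 x 1024 = 355,328 products, so fewer than 3% of the possible combinations would actually be made. The sketch below counts the reactants needed by a product selection and the number of occupied cells in a partitioned chemistry space. The helper names, the toy coordinates, and the equal-width binning are assumptions and do not reproduce the BCUT cell-based software.

```python
from itertools import product


def reactants_needed(products):
    """Count distinct A and B reactants required to make a list of (a, b) products."""
    a_used = {a for a, _ in products}
    b_used = {b for _, b in products}
    return len(a_used), len(b_used)


def occupied_cells(products, coords, bins, lo=0.0, hi=1.0):
    """Number of distinct cells occupied when each axis is cut into `bins` equal bins."""
    width = (hi - lo) / bins
    cells = set()
    for p in products:
        cell = tuple(min(int((x - lo) / width), bins - 1) for x in coords[p])
        cells.add(cell)
    return len(cells)


# Full reactant array: every selected A combined with every selected B.
array_products = list(product(range(80), range(120)))
n_a, n_b = reactants_needed(array_products)

# Cherry-picked products can need as many reactants as there are products.
cherry_picked = [(i, i) for i in range(4)]
cherry_a, cherry_b = reactants_needed(cherry_picked)

# Hypothetical 2D chemistry-space coordinates for four products.
coords = {(0, 0): (0.05, 0.05), (0, 1): (0.10, 0.12), (1, 0): (0.55, 0.60), (1, 1): (0.95, 0.90)}
n_cells = occupied_cells(list(coords), coords, bins=2)
```

Counting occupied cells gives a diversity measure that does not require pairwise comparisons, which is the practical appeal of cell-based selection.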
best represent the diversity of that virtual library. Figure 5.13b illustrates an optimally diverse "library" of 9600 products selected by using cell-based diverse subset selection to cherry pick the 9600 most diverse products without regard for synthetic economy. Although the diversity of these products is clearly optimal, the fact that 347 A's and 1024 B's would be required to make the 9600 AB products provides an equally clear indication of why purely product-based methods are unsatisfactory from an economical perspec-
thesis of a library of compounds with a high degree of control over associated properties. Thus, the combinatorial library design process brings together many of the methods already described for molecular similarity and molecular diversity coupled to synthetic feasibility considerations. Diversity-based and structure-based approaches to the design of virtual libraries have been reviewed (7, 91a). Both ligand-based and protein structure-based virtual screening methods can be used, with the combinatorial nature of the virtual compounds being exploited to increase the speed of the analysis. Some properties of the products can be estimated rapidly on the fly from the reactants, and products can be generated in the active site. The CombiDOCK approach, which can rapidly analyze very large virtual databases in a binding site by connecting reactants to scaffolds docked in multiple orientations, is discussed in Section 4.10. A genetic algorithm-based method for the combinatorial docking of reactants has been described by Jones et al. (92), with the application of a ligand-docking genetic algorithm to screening combinatorial libraries.

A challenge in the design of small- and medium-sized focused combinatorial libraries is to harness for use in library design the experience and knowledge gained in generating structure-activity relationships (91b). Screening libraries biased for pharmaceutical discovery are often designed to augment the structural diversity of a chemical library. The approach used in the LASSOO algorithm (93) is based on the identification of compounds from a virtual library that are most different from those already present in a screening set and from a reference set of undesirable compounds, while being simultaneously most similar to a set of compounds with desirable characteristics. An illustration of the method using bit-string structure descriptors is given.

Combinatorial library design approaches have been discussed (94), with the design of library subsets that simultaneously optimize the diversity or similarity of a library to a target, properties (such as druglikeness) of the library members, properties (such as cost or availability) of the reactants required to make them, and the efficiency for array synthesis. They showed that libraries can be designed to contain molecules constrained to certain druglike properties with only a small trade-off in terms of the maximum possible diversity.

The design of leadlike combinatorial libraries is an approach of more recent interest. A lower molecular weight starting point is advantageous, in that bulk can be added for potency/selectivity/properties without exceeding "rule of 5" parameters for orally absorbed drugs; otherwise a more labor-intensive step may be needed to identify a smaller active part of the hit. The properties required of library compounds intended to provide leads suitable for further optimization, which may be rather different from those of final optimized leads, have been reviewed (95).

Thus, library design is a complex optimization problem with often competing constraints, including requirements to have combinatorial efficiency and/or several specified product properties (both desired and nondesired). Methods such as genetic algorithms, simulated annealing, and Monte Carlo optimization have been used, and iterative cyclic approaches applied. The next section describes the application of these methods within the context of library design but the reader should note that some of these methods are applicable only for the design of diverse libraries.

4.3 Optimization Approaches

The most basic product-based selection process used in library design is an order-dependent analysis of products, selecting a compound if it exhibits sufficient "diversity" to products already selected. This approach was used in the Chem-X/ChemDiverse software with three- and four-point pharmacophore fingerprints. A compound was selected if the overlap with the ensemble fingerprint of already selected compounds was less than a user-defined amount; that is, the molecule contains a significant number of pharmacophores not already exhibited in selected compounds. This cherry-picking process is an efficient method for ensuring a high diversity library, but can be a combinatorially inefficient selection for synthesis, with no explicit reference to the constituent reactants being made (see Section 4.2 above for further examples). A preferred selection for combinatorial efficiency is arrays of reactants, in which all
reactants from one component of a combinatorial library are reacted with all the reactants in the other components, or sparse arrays, in which subsets of reactants are combined. Additional constraints such as physicochemical properties and flexibility are addressed implicitly by assigning upper and lower bounds for given properties, or controlling the order in which molecules are processed.

To address the issue of using pharmacophore fingerprints in a way that enabled a combinatorially efficient selection of reactants, and the explicit inclusion of additional molecular properties such as a balance of druglike physicochemical properties and shape descriptors, the HARPick program (78a,b) was created. A stochastic optimization technique [Monte Carlo simulated annealing (96)] was used to enable selections in reactant space, whereas diversity is still calculated in product space. User-defined flexibility for the reactant array sizes was possible, and additional descriptors could be used (e.g., to address the selection of non-drug-like compounds). The pharmacophore fingerprint (three-point, triplets) was used in a nonbinary mode (the frequency of occurrence of each pharmacophore was calculated), and the HARPick diversity measure was tuned to include a term (Conscore) to force molecules to occupy relative rather than absolute voids in pharmacophore space. This avoids the problem of saturation of the fingerprint with large databases in a binary mode, particularly a problem with the three-point pharmacophore descriptors. It was thus possible to design combinatorial libraries that exhibited pharmacophores that were poorly represented in a reference set of compounds. The Conscore constraint score sums the product of the number of times pharmacophore i has been hit for molecules selected from the current data set with the score associated with pharmacophore i for the constraining library. The Conscore term can be inverted, enabling focused designs, in which the selection of products that occupy the more highly occupied bins (e.g., from a set of active compounds) is desired. The flexibility and success of this kind of stochastic optimization methodology has led to its use by many other researchers for library design (5c, 6b, 23d, 78c, 97c,d). Simulated annealing has been used to perform reactant selection for combinatorial libraries based on three-point pharmacophores (78a,b), as described above, and other metrics (6b, 23d, 97c,d).

Genetic algorithms (GA) are another class of optimization techniques widely used within chemistry (98) that have been explored for library design. A GA is an attempt to utilize the Darwinian process of evolution in an optimization procedure. A solution is represented by a string of fixed length, the chromosome, and is evaluated according to some criterion to give the fitness score, for example, the pharmacophore coverage of the solution (78b). The GA maintains a number of chromosomes (potential solutions) that are ranked on their fitness and are then modified according to operators including mutation, where one element of the string is changed, and crossover, where the string is cut at some position and swapped with equivalent portions of another solution. These new solutions are evaluated and the process is repeated for a defined number of iterations or until all (or most) solutions converge on one result. For library design, the string represents the selected monomers at each variable position of the library. Evaluation involves enumerating the sublibrary defined by the solution and calculating the score associated with the products. The stochastic nature of the process means that the GA is run several times to ensure good convergence.

A GA was used by Sheridan and Kearsley (99) to design peptoid libraries focused to cholecystokinin by scoring on similarity to two peptide leads. Biological activity, rather than a computed fitness, has been used as the score in a directed combinatorial synthesis program (100). Brown and Martin developed GALOPED (101) as a way to design combinatorial mixtures. The SELECT program (78c) combines measures of diversity and the physical properties of the designed library. The library can be designed to be both internally diverse and diverse with respect to a reference population. Physical properties are optimized by comparing to a user-defined profile for the property of interest, ClogP for example. As for the HARPick approach (78a,b), however, it is necessary to define a weighting scheme between the different elements of the score, which leads to a number of difficulties. Selec-
4 Combinatorial Library Design 219
Figure 5.14. (a) Results from multiple SELECT runs with alternative weightings for molecular weight vs. diversity. Filled triangles, 1.0×Div and 1.0×MW; filled circles, 1.0×Div and 0.5×MW; filled squares, 10.0×Div and 1.0×MW. (b) As in a, with results of a single MOGA run shown as crosses. [Reproduced from V. Gillet, et al., J. Chem. Inf. Comput. Sci., 42, 375-385 (2002) with permission of the American Chemical Society.]
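The genetic algorithm described in the text, combined with a weighted-sum fitness of the kind varied in the SELECT runs of Figure 5.14, can be sketched as follows. This is a minimal illustration and not the SELECT or MOGA code: the monomer pools, feature sets, molecular weights, the form of the molecular weight penalty, and all function names are invented for the example, and crossover is restricted to the boundary between the A and B segments so that no monomer is selected twice.

```python
import random

# Hypothetical monomers: (molecular weight, set of feature ids). Not real data.
POOL_A = [(100 + 10 * i, {i % 5, (i * 3) % 7}) for i in range(12)]
POOL_B = [(80 + 15 * j, {(j * 2) % 7, 7 + j % 4}) for j in range(12)]
N_A, N_B = 3, 3  # monomers selected at each variable position


def fitness(chrom, w_div=1.0, w_mw=1.0, target_mw=300.0):
    """Weighted sum: feature coverage of the enumerated sublibrary minus a MW penalty."""
    covered, mw_dev = set(), 0.0
    for a in chrom[:N_A]:
        for b in chrom[N_A:]:  # enumerate the sublibrary defined by the chromosome
            covered |= POOL_A[a][1] | POOL_B[b][1]
            mw_dev += abs(POOL_A[a][0] + POOL_B[b][0] - target_mw)
    return w_div * len(covered) - w_mw * (mw_dev / (N_A * N_B)) / 100.0


def random_chrom(rng):
    return rng.sample(range(len(POOL_A)), N_A) + rng.sample(range(len(POOL_B)), N_B)


def mutate(chrom, rng):
    """Change one element of the string to an unused monomer from the same pool."""
    child = list(chrom)
    pos = rng.randrange(len(child))
    if pos < N_A:
        pool_size, used = len(POOL_A), set(child[:N_A])
    else:
        pool_size, used = len(POOL_B), set(child[N_A:])
    child[pos] = rng.choice([m for m in range(pool_size) if m not in used])
    return child


def crossover(p1, p2):
    """Cut both strings between the A and B segments and swap the tails."""
    return p1[:N_A] + p2[N_A:], p2[:N_A] + p1[N_A:]


def run_ga(generations=40, pop_size=20, seed=0, **weights):
    rng = random.Random(seed)
    pop = [random_chrom(rng) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda c: fitness(c, **weights), reverse=True)
        parents = pop[: pop_size // 2]  # the fitter half survives unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            c1, c2 = crossover(*rng.sample(parents, 2))
            children += [mutate(c1, rng), mutate(c2, rng)]
        pop = parents + children[: pop_size - len(parents)]
    return max(pop, key=lambda c: fitness(c, **weights))
```

Calling `run_ga(w_div=10.0, w_mw=1.0)` and `run_ga(w_div=1.0, w_mw=0.5)` mimics running with alternative weightings. Because the result depends on the weights chosen, each weighting explores a different compromise, which is the difficulty a multiobjective method is intended to address.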
thesized compounds, is an important component of lead identification because this allows a weak hit from primary screening to be rapidly expanded into a more potent lead. Existing chemical database systems can be used or readily modified to benefit from the combinatorial nature of libraries (64a) but they do not overcome the fundamental issues.

Downs and Barnard (105a) have proposed an elegant solution to these problems using the Markush representation commonly used in chemical patents. The key component of [...] descriptor calculation [...] analysis can be performed without the need for full enumeration of the products. In other words, both storage and calculation will tend to scale as the sum of the number of building blocks in the library rather than the product as in techniques [...]. The method has been [...] into a software suite and released commercially as the LibEngine module of the Cerius2 suite for combinatorial library analysis and design (105d).

The background and theory behind the approach have been published (105b). In summary, the algorithm relies on identifying a [...] associated R-groups that define the [...] may or may not be directly related to the manner of synthesis. For example, imagine a tripeptide library synthesized from [...]0 amino acids. The algorithm defines the tripeptide backbone as the core and [...] acid side chains as the R-groups. [...] fingerprints are calculated [...] fragment basis representing the [...] and R-groups taking full account of the [...] the core and the possibility that a particular path may extend between two R-groups. [...] fingerprints are then combined [...] full fingerprint, a relatively fast [...] approach is a couple of orders of magnitude faster than calculating fingerprints from fully enumerated products. Additional properties such as molecular [...]-bond donor and acceptor [...], and logP can be calculated in a similar manner as well as topological indices. Fingerprint or property data can also be calculated on demand for use with clustering algorithms, avoiding the overhead of storing and retrieving them. There is also interest in extending the approach to 3D property calculation (105c).

An alternative approach has been taken by Agrafiotis and colleagues. In a conference presentation (106) they show how a neural network can be trained on a small sample of enumerated combinatorial products to reproduce 2D molecular descriptors and properties for all library members without the need to construct their connection tables.

A method for rapid similarity searching in large combinatorial spaces using a new algorithm Ftrees-FS was published by Rarey and Stahl (135). The similarity search is based on the feature tree similarity measure representing molecules by tree structures. Combinatorial chemistry spaces are handled as a whole rather than looking at subsets of enumerated compounds. A set of 17,000 fragments of known drugs was used, which could be combined to 10^18 compounds of reasonable size. A novel ChemSpace approach (45a) for searching large virtual libraries that does not require enumeration has also been developed by Tripos, using shape descriptors (topomeric fingerprints) on the monomers, and has been used for targeted library design (45b).

4.5 Library Comparisons

In the previous sections we described the design of libraries based on a number of user-defined criteria, whether they were focused or whether they were of a more general nature. So far, these designs have been undertaken, treating the library in isolation, with the inclusion of property profiles in methods such as HARPick and SELECT to ensure that the synthesized compounds are of a suitable physical nature. In this sense, the designed library can be said to be internally diverse; that is, the selected compounds are diverse within the limited chemistry space of all virtual products. Even for very large virtual libraries, the chemistry space is still small with respect to the possible chemistry universe. It is very difficult, a priori, to address how "diverse" a designed library is compared to a library generated with another set of reactions without having to go through the computationally expensive process of computing all pairwise similarities between members of the libraries.
Nevertheless, questions such as "How diverse is the library compared to the screening collection?" or "Which of the following chemistries should I choose for a library?" are often posed and methods are required to answer them. Distance-based methods such as clustering can be and have been used but suffer from a number of drawbacks both in terms of speed and the fact that the exercise needs to be repeated for every additional library (i.e., there is no common frame of reference). In addition, all pairwise comparisons would need to be performed. Thus, Shemetulskis et al. (107) used clustering methods to compare the Parke-Davis corporate collection (117,000 compounds) with external compounds from Chemical Abstracts Service (380,000) and Maybridge (42,000). Even today, clustering half a million compounds is a daunting task and interpreting the results is not straightforward. The Jarvis-Patrick method employed by Shemetulskis et al. has several input parameters, including the need to predefine the number of clusters. Voigt et al. (108) compared the National Cancer Institute (NCI) database, a publicly available database of compounds used in the NCI screening program, to a number of compound databases. The diversity of each collection was estimated by the number of compounds selected by use of a diversity-selection algorithm as a function of database size. The similarity overlap between two databases has been determined by calculating the percentage of compounds of the first database for which a compound exists in the second database with a similarity greater than or equal to a specified cutoff (109). Such an approach necessitates the calculation of the Tanimoto similarity coefficient of all compounds in a database with all compounds in the other databases. As indicated before, the largest drawback of distance-based methods is that they give no indication of where the voids are within the chemistry space, and searching an additional compound source for interesting compounds would require reclustering.

Therefore, partition/cell-based methods are preferred for such library comparison tasks. They provide a common frame of reference in which it is possible to identify voids within the chemistry space of a population. It must be emphasized that the chemistry space is still defined with respect to a reference population. By comparing the libraries with reference to a population (REFDB), such as a corporate database or a combination of known drug databases, one can make statements such as, library A shows the greatest overlap with REFDB, whereas library B fills the greatest number of empty or low occupancy cells.

Cummins et al. (22) used a cell-based approach to compare five databases, including the Wellcome Registry, to select screening sets of diverse compounds. Topological indices and a measure of free energy of solvation were taken as the descriptors and factor analysis was used to combine them and define a four-dimensional chemistry space that was then partitioned. Outliers were removed to allow the partitioning to focus on the most densely populated region. The use of pharmacophore descriptors in such a task was illustrated by Mason and Pickett (4), where the pharmacophore overlap between three libraries was calculated. It was possible to identify the library covering regions of pharmacophore space not covered by the other two. Alternatively, given that library A is synthesized and gives hits in screening, then presumably the library that overlaps best with A should be made. Pearlman and Smith (11d) have adapted their DVS software to identify what they term a receptor-relevant subspace, where the BCUT metrics are selected to best group the active compounds within a population (in fact, it is possible to have several groupings of actives within the space) (see Section 2.2.1.1).

Comparing two populations by pharmacophore coverage, although straightforward, does ignore the contribution from individual compounds. This is important, in that two libraries could cover similar regions of pharmacophore space but individual compounds in the two libraries could be displaying different subsets of the total pharmacophores covered. This prompted Pickett et al. (70) to explore an alternative approach. In this case, a number of potential scaffolds were available and the aim was to find which of these would best complement previously synthesized libraries. Virtual libraries were generated using a predefined set of reactants and pharmacophore fingerprints were calculated for these and the previously synthesized libraries. By use of mea-
sures proposed by Turner et al. (69b), the virtual libraries were compared to the synthesized libraries at both a whole library level and an individual molecule level. From this analysis it was possible to select the scaffold that best complemented the previously synthesized libraries.

An alternative methodology based on the ring content of a database, using precalculated structure-based hashcodes, has been proposed (110). The comparison of the hashcode tables can be used to compare two databases and the number of distinct ring-system combinations can be used as an indicator of database diversity. A method for diversity assessment called the saturation diversity approach, based on picking as many mutually dissimilar compounds as possible from a database, was also proposed. The methods were used to compare a number of public databases and gave similar results.

4.6 Pharmacophore-Based Fingerprints

The examples of GPCR library design (described in Section 5.1.2) and protein-site design for Factor Xa (described in Section 5.3) illustrate the use and relevance of pharmacophore-based fingerprints in library design. A pharmacophoric bias has been a major component of many library designs (111), used in the context of focused or biased libraries. Their broad applicability is important, with the same descriptors being used for diverse library design, screening set selection, and focused library design. This provides a consistent approach that extends to protein-site based pharmacophores as discussed above. Their ability to determine the similarities and differences between structurally diverse molecules and sites is very powerful. An ensemble pharmacophore data set measure is often used, which attempts to condense the individual molecule pharmacophore fingerprints into a single measure that describes the important features of the data set as a whole (36, 37, 78a,b).

McGregor et al. (112) have recently published a version of pharmacophore fingerprinting (the PharmPrint method) applied to QSAR and focused library design that uses a limited basis set of 10,549 three-point pharmacophores. They included the usual six pharmacophoric features, plus an additional definition of "other" for all remaining unassigned atoms. A subset of the MDDR database (13a) was used to define a reference set of bioactive molecules, separated into target classes (gene families). The discriminating power of several molecular descriptors was measured using the target class assignments for this set, and it was found that the pharmacophore fingerprint outperformed other descriptors.

4.7 Combined Pharmacophore Fingerprints and BCUTs

Library design using a simultaneous optimization of BCUT chemistry-space descriptors (11) and four-point pharmacophore fingerprints has been reported (32d, 37d). The authors investigated the feasibility and results in terms of complementarity of simultaneously optimizing two product-based descriptors for reactant selection from large virtual libraries. Diversity around a chosen chemistry was the goal of the studies reported, but the approach could equally be applied to optimize to a desired distribution of properties, say, from sets of biologically active compounds. A simulated annealing algorithm (97) was used to combine both components in a single optimization procedure. The choice was based on the ease of implementation and the ability to include multiple components in the objective (23d), an important goal in many recent designs, if only to modulate physicochemical properties to druglike ranges. In this example a small, fully enumerated virtual library of 86,140 amide compounds was constructed from carboxylic acids and primary amines present in the ACD (Available Chemicals Directory). The products of the optimized and random starting reactant sets were compared using average nearest-neighbor distances, and the Hopkins' statistic (113), which evaluates the degree of clustering in a data set, together with the four-point pharmacophore fingerprint diversity. The potential utility for very large virtual libraries, where precalculation of all the pharmacophore fingerprints would not be feasible, was illustrated by calculating four-point pharmacophore fingerprints for virtual library compounds on the fly. The fingerprints were calculated during the optimization procedure and stored in a compact encoded form, with
[Workflow diagram, only partly recoverable: databases (e.g., corporate registry, ACD); compounds (e.g., substructure search)]
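The workflow outlined for ADEPT (reactant lists, filters, enumeration, property histograms) can be sketched in a few lines. Everything below is a hypothetical stand-in: the reactant records, the cutoffs, and the function names are invented, product properties are crude additive estimates from the reactants, and no Daylight toolkit call is used or implied.

```python
from itertools import product

# Hypothetical reactant records: (name, molecular weight, rotatable bonds, has_unwanted_group)
AMINES = [("amine_1", 87.1, 1, False), ("amine_2", 210.3, 6, False), ("amine_3", 129.2, 2, True)]
ACIDS = [("acid_1", 122.1, 1, False), ("acid_2", 340.5, 9, False), ("acid_3", 150.2, 3, False)]


def passes_filters(reactant, max_mw=250.0, max_rot=7):
    """Keep reactants within the assumed MW and rotatable-bond limits and without flagged groups."""
    _, mw, rot, unwanted = reactant
    return mw <= max_mw and rot <= max_rot and not unwanted


def enumerate_library(amines, acids, water=18.02):
    """Pair every amine with every acid and estimate product properties from the reactants."""
    library = []
    for am, ac in product(amines, acids):
        library.append({
            "name": f"{am[0]}+{ac[0]}",
            "mw": round(am[1] + ac[1] - water, 2),  # amide formation loses one water
            "rot": am[2] + ac[2],                   # crude additive estimate
        })
    return library


def histogram(values, bin_width):
    """Count values per bin, keyed by the lower edge of each bin."""
    counts = {}
    for v in values:
        start = int(v // bin_width) * bin_width
        counts[start] = counts.get(start, 0) + 1
    return dict(sorted(counts.items()))


kept_amines = [r for r in AMINES if passes_filters(r)]
kept_acids = [r for r in ACIDS if passes_filters(r)]
library = enumerate_library(kept_amines, kept_acids)
mw_histogram = histogram([p["mw"] for p in library], bin_width=50)
```

Inspecting the histogram and then tightening the reactant filters corresponds to the refinement loop described in the text.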
The ADEPT (A Daylight Enumeration and Profiling Tool) suite of programs developed at GlaxoWellcome (116) is a Web-based system providing access to a wide range of library design functionality, again based around the Daylight tool kit. Figure 5.16 provides an outline of the process workflow. Reactant lists are generated from searches in databases of in-house and commercially available monomers. A variety of filters can be applied to reduce the size of the lists. These include filters on molecular weight, rotatable bond count, and substructure filters to remove unwanted functionality. After library enumeration, various property histograms are calculated. This allows the user to further refine the reactant choice.

A product-based library design algorithm, PLUMS (117), has been developed to ensure that combinatorial constraints are satisfied in the design. The algorithm successively removes the monomer that adds least value to the library as governed by two terms, the effectiveness (number of molecules meeting user-defined criteria such as property ranges, fit to pharmacophore or dock to protein site) and efficiency (ratio of effectiveness to library size). The algorithm is sufficiently fast to work within the Web-based environment of ADEPT. Figure 5.17 shows screen shots from ADEPT, illustrating how a library can be specified and the resulting product histograms. A similar system has been implemented at Vertex (118a). A key component of this system is the REOS filtering tool (118b), which applies filters on molecular weight, lipophilicity, unwanted substructures, rotatable bond counts, and so forth to remove "obviously bad" compounds.

4.10 Structure-Based Library Design

Structure-based library design uses 3D structures of the biological targets to direct the design and selection of templates/scaffolds and of reactants that will produce compounds that can fit into the target and thus are likely to bind and have biological activity. The experimental structural information can be derived by a structural biology approach, using X-ray crystallography or NMR spectroscopy. Computational models can be built and used (e.g., homology modeling techniques for closely related proteins), but an experimental structure is always preferred. A structural biology approach can also be used to identify molecules or fragments thereof that bind to a target. For example, NMR screening (3) can be used to identify potential scaffolds or reactants for a combinatorial library that bind to a target site and is able to detect very low affinity binding (in the millimolar range, compared to the low micromolar range from biological screening); this can be done without the need to determine the 3D structure of the target.
[Panel (b) text: rotatable bonds histogram; molecular weight histogram with number of values 342, minimum value 287.42, maximum value 748.35, mean 425.754, standard deviation 68.93]
Figure 5.17. Screen-shots from ADEPT. (a) A simple two-component library composed of an aminothiazole template and a series of piperidines specified with ADEPT. (b) Histograms of rotatable bonds and molecular weight for the enumerated virtual library, aiding the medicinal chemist in the design of the library. [Reproduced from A. R. Leach and M. M. Hann, Drug Discovery Today, 5, 326-336 (2000), with permission of Elsevier Science.]
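The monomer-removal idea described for PLUMS can be illustrated with a greedy sketch. The published algorithm is not reproduced here: how the effectiveness and efficiency terms are combined is an assumption (at each step the monomer whose removal leaves the highest efficiency is dropped, with ties broken by effectiveness), and the monomer names, the set of "good" products, and the function names are hypothetical.

```python
def effectiveness(a_set, b_set, good):
    """Number of products in the A x B array that meet the user-defined criteria."""
    return sum((a, b) in good for a in a_set for b in b_set)


def efficiency(a_set, b_set, good):
    """Ratio of effectiveness to library size."""
    return effectiveness(a_set, b_set, good) / (len(a_set) * len(b_set))


def prune(a_set, b_set, good, target_efficiency=1.0):
    """Greedily drop monomers until the array reaches the target efficiency."""
    a_set, b_set = set(a_set), set(b_set)
    while efficiency(a_set, b_set, good) < target_efficiency:
        candidates = []
        if len(a_set) > 1:
            candidates += [("A", a) for a in sorted(a_set)]
        if len(b_set) > 1:
            candidates += [("B", b) for b in sorted(b_set)]
        if not candidates:
            break  # a 1 x 1 array cannot be reduced further

        def after_removal(candidate):
            kind, monomer = candidate
            a2 = a_set - {monomer} if kind == "A" else a_set
            b2 = b_set - {monomer} if kind == "B" else b_set
            return efficiency(a2, b2, good), effectiveness(a2, b2, good)

        kind, monomer = max(candidates, key=after_removal)
        (a_set if kind == "A" else b_set).discard(monomer)
    return a_set, b_set


# Hypothetical 3 x 3 virtual library in which five products meet the criteria.
good_products = {("a1", "b1"), ("a1", "b2"), ("a2", "b1"), ("a2", "b2"), ("a3", "b3")}
final_a, final_b = prune({"a1", "a2", "a3"}, {"b1", "b2", "b3"}, good_products)
```

Because monomers rather than individual products are removed, every intermediate library remains a full array, which is how the combinatorial constraint stays satisfied throughout.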
Structure-based drug design (SBDD) is the topic of another chapter, and key issues such as the scoring functions for the ligand-receptor interaction are not discussed further here. The ability to combine SBDD with combinatorial chemistry enables a focused design approach that can explore a range of ideas, reducing the dependency on SBDD limitations (structural information, scoring, conformational sampling, etc.). The ability to obtain the X-ray or NMR structure of new potent molecules complexed with their targets can also be critical for the next iteration, in that computational structure-based design methods may be unable to predict alternative and new binding modes, especially because the protein site is normally kept rigid and unpredicted conformational changes can take place during the binding process. A review by Stahl (119) discusses the technology that directly uses receptor three-dimensional structures, discussing relevant topics such as scoring functions, receptor-ligand docking, and practical applications. Böhm and Stahl (120) have reviewed structure-based library design in terms of molecular modeling merging with combinatorial chemistry.

The synergy between combinatorial chemistry and de novo design has been discussed by Leach et al. (121). They present an approach wherein a template (corresponding to the central core of a combinatorial library) is positioned within an acyclic carbon chain whose length and bond orders are systematically varied. The conformational space of each resulting structure (core plus chain) is explored, to determine whether it is able to link together two or more strongly interacting functional groups or pharmacophores located within a protein binding site. In a second phase, 2D queries are derived from the molecular skeletons and used to identify possible reactants from a database that would enable the all-carbon linking chains to be replaced by more synthetically feasible groups.

Sheridan et al. (122) have published on designing targeted libraries with genetic algorithms, extending earlier work, to use the GA with 3D scoring methods and showing that the approach of assembling libraries from fragments in high scoring molecules is a reasonable one. Example applications to two situations are described: (1) where the 2D structure of some actives (diverse angiotensin II antagonists) is known, with the goal to design a library that best resembles the actives; and (2) to simulate the situation where an active site (stromelysin-1 in this case) is available and the requirement is to design a library of structures likely to bind to it.

Tondi (123) discusses several examples in which structure-based drug design and combinatorial library synthesis have worked successfully together in a complementary way. These include the discovery of:

• Potent nonpeptide inhibitors of cathepsin D (124), which uses CombiBUILD (125), a derivative of the DOCK (126a,b) approach, with this structure-based selection approach yielding seven times as many hits as a diversity-based procedure.
• Thrombin inhibitors (127), where Böhm et al. used LUDI to dock and score computationally available primary amines and then score the virtual library generated from benzaldehydes with the top-scoring hit.
• Novel inhibitors of matrix metalloproteinases (128): Rockwell et al. (128a) used a combinatorial library at the beginning of the work to suggest leads suitable for further optimization that required a conformational change at the binding site, and a structure of the complex to enable iterative optimization; Szardenings et al. (128b) used SBDD to design the starting scaffold, with synthesis guiding the introduction of diversity.
• Thymidylate synthase inhibitors (129), using DOCK to identify the starting lead.

The CombiDOCK program (126), based on DOCK, enables the evaluation of very large virtual libraries by using structure-based combinatorial docking. Multiple docked orientations of the scaffold are used to evaluate reactants separately at each of the substitution positions. The total docking score for each product is rapidly estimated by summing the contributions from reactants at each position (which are attached as in the final product to the docked scaffold, which may be a computationally convenient anchor fragment formed during the reaction rather than a synthetically used chemical). Further checks are made for the highest scoring structures (e.g., for steric interactions between reactants at the different substitution positions). This approximation produces an enormous speed-up over docking all the individual compounds, which, from a time perspective, rapidly becomes prohibitive for large combinatorial libraries. From the scores it is possible to select combinations of reactants that produce compounds complementary to the protein binding site. Combinatorial restraints can be applied as required to obtain the most efficient use of reactants and robotics, with an evaluation of any reduction in the inclusion of higher scoring compounds.

Different strategies for combining diversity and structure-based design in site-focused libraries and the DOCK-based CombiBUILD algorithm are discussed in a review (125), as an example of how lead compounds can be rapidly identified by combining diversity with structure-based design in site-focused libraries.

Lamb et al. (130) have published on the design, docking, and evaluation of multiple libraries against a family of targets, using a similar divide-and-conquer algorithm for side chain selection that enables the exploration of large lists of reactant substituents with linear rather than combinatorial time dependency. The method consists of three main stages: (1) docking the scaffold, (2) selecting the best substituents at each site of diversity, and (3) comparing the resultant structures within and between the libraries. The scaffold docking procedure, in conjunction with a novel vector-based orientation filter, was shown to be effective for several protease targets, reproducing experimental binding modes.

The application of the powerful combination of SBDD and combinatorial chemistry is not limited to lead discovery or the optimization of potency, but extends also to the optimization of the selectivity (using knowledge of the structures of related targets) and pharmacokinetic/druglike properties of a molecule. For example, the structure of a ligand-receptor complex can clearly indicate areas where chemical modifications could be made to modulate these other properties, without directly affecting binding/potency. Models/structures of ligands with the cytochrome P450 metabolizing enzymes are also now becoming available.

5 EXAMPLE APPROACHES

5.1 General Target Class-Focused Approaches

5.1.1 Defining the Chemical/Biological Space. The design of target class (gene family) libraries or compound subsets requires the definition of a biologically relevant chemical space. This "biological" space can then be used for the design and selection of biased/focused libraries and compound subsets. Many approaches can be taken, adapting the use of a wide variety of similarity/diversity descriptors (discussed in Section 2.1) to the identification of properties associated with a particular target class or subset thereof. The goal is to identify a feature or set of features that, ideally, is specific, but more generally "enriched" for the target(s) of interest. A common approach is to identify chemical substructures that are characteristic for the target class, and use these for the design. The simplest approach is to include such substructures in the library, but the co-occurrence of other features is often needed, and the quantification of this provides an enhanced design. An example of this combined approach is discussed in the next section, using the pharmacophore fingerprints expressed relative to "privileged" substructures. This provides a convenient cell-based partitioning approach. Alternatively, it is possible to identify properties that are enriched for a particular target class, without reference to any particular substructures: 1D (e.g., physicochemical), 2D (e.g., ISIS keys, BCUTs), and 3D (e.g., pharmacophore fingerprints) properties can all be used. BCUTs have been used within a target (to identify a receptor-relevant subspace, in which actives cluster), to differentiate within a target class (e.g., ion channel openers vs. blockers) and for general target class analysis. BCUT chemical space provides a way to quantify the "diversity" of certain properties within actives for a target class, as well as to identify any particular combination of properties that actives share. BCUTs have
[Figure 5.19 labels: substructure feature/motif (e.g., acid, base); H-bond donor/acceptor; acid/base; aromatic ring/hydrophobe.]
Figure 5.19. Example of a "privileged" four-point pharmacophore. Here biphenyl tetrazole, a substructure seen in a number of GPCR inhibitors, is specifically defined as a pharmacophore feature, using a centroid dummy atom. Only pharmacophores that include this type are included in the fingerprint, thus providing a relative measure of diversity/similarity with respect to the privileged feature.
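The restriction described in the caption of Fig. 5.19 can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions: a single conformation is used, the distance bin edges and the feature labels (including PRIV for the privileged centroid) are hypothetical, and chirality is ignored. It is not the Chem-X/ChemDiverse implementation.

```python
from itertools import combinations, permutations
from math import dist

# Hypothetical upper edges of the distance bins, in angstroms (an assumption).
BIN_EDGES = (2.0, 4.5, 7.0, 10.0, 15.0)
# Index pairs giving the six inter-feature distances of a four-point set.
PAIRS = ((0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3))


def distance_bin(d):
    """Return the index of the first bin whose upper edge is >= d, else None."""
    for index, upper in enumerate(BIN_EDGES):
        if d <= upper:
            return index
    return None


def canonical_key(quad):
    """Build an order-independent key (feature labels, binned distances).

    All 24 orderings of the four points are tried and the smallest key kept,
    so the same pharmacophore always maps to the same fingerprint cell.
    """
    best = None
    for perm in permutations(quad):
        bins = tuple(distance_bin(dist(perm[i][1], perm[j][1])) for i, j in PAIRS)
        if None in bins:  # a distance falls outside the binned range
            return None
        key = (tuple(label for label, _ in perm), bins)
        if best is None or key < best:
            best = key
    return best


def privileged_fingerprint(features, privileged="PRIV"):
    """Return the set of four-point keys that include the privileged feature.

    `features` is a list of (label, (x, y, z)) tuples for one conformation.
    """
    keys = set()
    for quad in combinations(features, 4):
        if any(label == privileged for label, _ in quad):
            key = canonical_key(quad)
            if key is not None:
                keys.add(key)
    return keys
```

A fuller treatment would take the union of such key sets over all accessible conformations of each molecule.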
Properties, to be optimized also. The total number of pharmacophores (this time without reference to the privileged substructures) can also be monitored and optimized. Example results from one of the Ugi library optimizations are shown in Fig. 5.23.

This design illustrates an advantage of a partitioning (cell-based) approach. The pharmacophore fingerprint can be used to monitor progress, to quantify how much of the desired [...] has been accomplished, and to evaluate whether a given chemistry can yield further compounds that match the design criteria and/or explore new pharmacophoric space.

The example here used only a binary fingerprint, but even more powerful results can be obtained when a count for each potential pharmacophore is included. The authors showed that for these designed Ugi libraries the same Ugi chemistry could indeed yield significant new diversity for multiple 14,000 compound libraries, but that after three libraries diminishing returns were obtained. They used the understandable nature of the pharmacophore descriptor by analyzing the remaining MDDR-pharmacophore fingerprint to show that most of the remaining pharmacophores to be matched contained acids and/or bases. A modified chemistry approach was therefore developed using protected acids (t-butyl esters) and bases (BOC protected) in the Ugi reaction. The unmatched cells in the MDDR-fingerprint can be related back to the compounds that
[Figure 5.23 graphic: panel (b) is titled "Cumulative total of 4-point pharmacophores after each reagent selection"; the horizontal axis of each panel is the reagent number (1 to 22).]
Figure 5.23. (a and b) Contributions per acid reactant of pharmacophores for optimization in the Ugi reaction (with biphenyl tetrazole as the "privileged" motif at the amine position). The order shown is the final selected order of reactants, based on obtaining the maximum number of new privileged pharmacophores per additional reactant. Histogram a shows the number of new pharmacophores added by each new selected reactant in the "privileged" pharmacophoric space defined by known GPCR compounds containing the biphenyl tetrazole; shown in histogram b is the matching increase in the total number of pharmacophores for the library for each new selected reactant.
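The ordering described in the caption of Fig. 5.23, in which each additional reactant is chosen to add the maximum number of new privileged pharmacophores, reads as a greedy selection. The Python sketch below shows one way such an ordering could be computed; it is an assumption about the procedure rather than the authors' code, and the reagent names and integer pharmacophore identifiers are hypothetical.

```python
def greedy_reagent_order(reagent_pharmacophores, target_cells):
    """Order reagents so that each pick adds the most new target cells.

    reagent_pharmacophores: dict mapping a reagent name to the set of
        pharmacophore cells its products express.
    target_cells: set of cells in the "privileged" reference fingerprint.
    Returns a list of (reagent, new cells added, cumulative cells covered).
    Ties are broken alphabetically by reagent name.
    """
    remaining = dict(reagent_pharmacophores)
    covered = set()
    order = []
    while remaining:
        # max() returns the first maximal item, so sorting fixes the tie-break.
        name = max(
            sorted(remaining),
            key=lambda r: len((remaining[r] & target_cells) - covered),
        )
        gain = len((remaining[name] & target_cells) - covered)
        covered |= remaining.pop(name) & target_cells
        order.append((name, gain, len(covered)))
    return order


# Hypothetical example: four acids and a six-cell reference fingerprint.
example_target = {1, 2, 3, 4, 5, 6}
example_reagents = {
    "acid_A": {1, 2, 3, 9},
    "acid_B": {3, 4},
    "acid_C": {4, 5, 6},
    "acid_D": {1},
}
example_order = greedy_reagent_order(example_reagents, example_target)
```

The second element of each tuple corresponds to histogram a (new pharmacophores per reagent) and the third to the running total restricted to the privileged space. Because a greedy pick can never add more new cells than the pick before it, the gains are non-increasing.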
[Figure 5.24 axes: horizontal axis, Library (1 to 4); vertical axis ticks from 0 to 2,200,000. The right-hand panel lists pharmacophore features, including "H-bond donor".]
Figure 5.24. On the left is shown the cumulative (black) total number of four-point pharmacophores from consecutive 14,000-compound sets of Ugi libraries designed for 7-TM GPCR targets, together with the total number of pharmacophores in each library (in gray). Note the diminishing yield of new pharmacophores with later libraries, indicating that a change in strategy is needed. On the right are shown the features present in the resultant unrepresented pharmacophores (i.e., found in 7-TM GPCR biphenyl tetrazole-containing compounds in MDDR but not in synthesized libraries), indicating a strategy change to include more acids and bases together with the biphenyl tetrazole.
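The two analyses summarized in Fig. 5.24 can be sketched with set operations on binary fingerprints. In this hedged example each pharmacophore is reduced to a tuple of four feature labels; distance bins are omitted for brevity and the labels are hypothetical, so it illustrates the bookkeeping only.

```python
from collections import Counter


def coverage_report(libraries, reference):
    """Track per-library and cumulative pharmacophore counts.

    libraries: list of sets of pharmacophore keys, in synthesis order.
    reference: set of keys found in known actives (the reference fingerprint).
    Returns (rows, unmatched) where each row is
    (keys in this library, keys new to the running total, cumulative keys)
    and unmatched is the part of the reference not yet covered.
    """
    seen = set()
    rows = []
    for fingerprint in libraries:
        new_keys = fingerprint - seen
        seen |= fingerprint
        rows.append((len(fingerprint), len(new_keys), len(seen)))
    return rows, reference - seen


def feature_tally(unmatched):
    """Count how often each feature label occurs in the unmatched keys."""
    return Counter(label for key in unmatched for label in key)


# Hypothetical fingerprints for three consecutive libraries.
libs = [
    {("PRIV", "DONOR", "ACCEPTOR", "AROM"), ("PRIV", "DONOR", "HYDROPHOBE", "AROM")},
    {("PRIV", "DONOR", "ACCEPTOR", "AROM"), ("PRIV", "ACCEPTOR", "HYDROPHOBE", "AROM")},
    {("PRIV", "DONOR", "HYDROPHOBE", "AROM")},
]
reference_fp = {
    ("PRIV", "DONOR", "ACCEPTOR", "AROM"),
    ("PRIV", "ACID", "DONOR", "AROM"),
    ("PRIV", "ACID", "BASE", "HYDROPHOBE"),
    ("PRIV", "BASE", "ACCEPTOR", "AROM"),
}
rows, unmatched = coverage_report(libs, reference_fp)
tally = feature_tally(unmatched)
```

A falling second column in `rows` is the diminishing return noted in the caption, and a tally dominated by particular labels points to the feature types that a revised chemistry would need to introduce.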
than 300 [unit illegible] affinity for the corresponding receptor. Significantly, they also eliminated ligands with better than 1 [unit illegible] affinity for the corresponding receptor. This very unusual step was taken in an effort to convince their colleagues that the method they intended to use was not reliant on knowing the answer ahead of time. This left 187 ligands with affinities mostly between 10 and 70 [unit illegible] for various members of the GPCR-PA+ family of receptors. Using these compounds, they perceived a three-dimensional BCUT subspace within their corporate chemistry space that clusters the ligands of individual members of the GPCR-PA+ family and appears to be appropriate for this target class. The positions of all 187 ligands in the 3D chemistry space shown in Fig. 5.25 were originally indicated by open cyan circles. All ligands of some but not all receptors were then color-coded as indicated. Many red GPCR-2 and yellow GPCR-4 ligands are hidden under the green GPCR-1 ligands. The gray oval provides a crude indication of the region of chemistry space of interest for GPCR-PA+ receptors.

Figure 5.26 indicates the positions of roughly 2000 Neurocrine compounds selected from 14 different combinatorial libraries based on 14 different and proprietary scaffolds. Rather than selecting compounds only near the known ligands of GPCR-1, their receptor of interest, Wang and Saunders also selected compounds spanning the entire GPCR-PA+ receptors. This was done to further convince their colleagues, as explained below. All 2000 compounds were screened for activity against the GPCR-1 receptor. Those testing positive were retested in a secondary, functional assay. All but two compounds having better than 100 nM affinity for the GPCR-1 receptor are colored blue and/or are located within the blue oval. All but one compound having better than 10 nM affinity for the GPCR-1 receptor are colored red and/or are located within the red oval. All compounds with better than 2 nM affinity are colored green and are located within the two small green ovals within the larger green oval, consistent with the two crude clusters of GPCR-1 ligands seen in Fig. 5.25. The fact that these two small ovals each contain products from several different libraries (scaffolds) suggests the possible existence of two binding modes for this receptor. It is also significant to note that, although the authors intentionally synthesized compounds within the entire region of interest for GPCR-PA+ receptors, the only compounds showing significant affinity for the GPCR-1 receptor were located close to the known GPCR-1 ligands (compare with Fig. 5.25), thus supporting the use of BCUT coordinates (on receptor-relevant axes) as a valid approach to virtual high throughput screening. The tight clustering of GPCR-PA+ ligands in both figures clearly suggests that BCUT metrics represent, albeit in a relatively crude fashion, the same sort of information as would be represented in a description of the pharmacophore for the receptor of interest.

5.2 Property-Biased Design

The use of pharmacophoric descriptors in enhancing the hit-to-lead properties of lead optimization libraries has been described (76). Pharmacophore fingerprints, based on the Chem-X/ChemDiverse multiple pharmacophore descriptors, were used and several issues in the design of lead optimization libraries were addressed. The applicability of pharmacophoric methods to the design of focused libraries was demonstrated in this case, where the aim was to design the library toward a known lead or leads. The authors also investigated the design of libraries with improved pharmacokinetic properties. Simple and rapidly computable descriptors applicable to the prediction of drug transport properties were used, and the results illustrate a common problem: to obtain the best results it may be necessary to synthesize libraries in a noncombinatorial manner. A Monte Carlo search procedure was devised to enable the selection of a

Figure 5.25. The 3D subspace most receptor relevant for members of the GPCR-PA+ family of receptors. Points indicate coordinates of 187 published ligands of various GPCR-PA+ receptors. Some have been color-coded by receptor for illustrative purposes. See Refs. 32e,i and text for further details. See color insert.

Figure 5.26. The same 3D subspace as in Fig. 5.25, rotated slightly to provide a better viewing perspective. Points indicate coordinates of about 2000 combinatorial products selected from 14 different libraries. Color-coding indicates affinity for the GPCR-1 receptor. See Refs. 32e,i and text for further details. See color insert.
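The observation drawn from Figs. 5.25 and 5.26, that active compounds lie close to known ligands in a receptor-relevant BCUT subspace, suggests a simple proximity filter for virtual screening. The Python sketch below assumes that three-dimensional coordinates on such axes have already been computed by a descriptor package; the coordinates, compound names, and cutoff radius are hypothetical, and Euclidean distance is an assumption made here for illustration.

```python
from math import dist


def near_known_ligands(candidates, known_ligands, radius):
    """Return the names of candidates within `radius` of any known ligand.

    candidates: dict mapping a compound name to its (x, y, z) coordinates
        in the chosen descriptor subspace.
    known_ligands: list of (x, y, z) coordinates of known actives.
    radius: cutoff distance in the same (arbitrary) descriptor units.
    """
    return [
        name
        for name, xyz in candidates.items()
        if any(dist(xyz, ligand) <= radius for ligand in known_ligands)
    ]


# Hypothetical coordinates: two known ligands and three library products.
known = [(0.0, 0.0, 0.0), (5.0, 5.0, 5.0)]
products = {
    "cpd_1": (0.5, 0.0, 0.0),
    "cpd_2": (2.5, 2.5, 2.5),
    "cpd_3": (5.0, 5.5, 5.0),
}
selected = near_known_ligands(products, known, radius=1.0)
```

Two separated groups of known ligands, as in this toy data, would yield two separate clusters of selected compounds, loosely mirroring the two small ovals discussed in the text.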
positioned in the active site of each protein using the results of GRID (42) analyses (see Fig. 5.5), and receptor-based four-point pharmacophore fingerprints were generated. Fingerprints were also generated using full conformational flexibility for some highly selective and potent thrombin and Factor Xa inhibitors. Receptor-based similarity was investigated as a function of common potential three- and four-point pharmacophores for each ligand-receptor pair. The results indicated that the use of just the common potential four-point pharmacophores could give information pertaining to relative enzyme selectivity; when three-point pharmacophores were used, however, poor resolution of enzyme selectivity was observed. The thrombin inhibitor thus exhibited greater similarity with the complementary four-point pharmacophore fingerprints of the thrombin active sites than with the potential pharmacophore keys generated from the other enzymes; a similar result was found for the Factor Xa inhibitors with the Factor Xa site.

Clearly, the inclusion of the shape of the binding site should improve the resolution, and the DiR (Design in Receptor) approach (133) refines the process, requiring that the pharmacophoric match fits the shape of the target site (i.e., is sterically compatible with the site). This clearly provides much additional information at the expense of greatly increased calculation time. Within the DiR approach, two-, three-, and four-point potential site pharmacophores can be used. This provides interesting new library design possibilities, in that it is possible to evaluate which ligands are able to fit in the site by matching at least one set of pharmacophoric features, and to quantify which pharmacophore hypotheses are matched. A subset of ligands can then be designed that match as many different pharmacophoric hypotheses as possible, and the biological screening of the resultant compounds can determine which bind best. Alternatively, pharmacophore constraints can be applied to a shape-driven searching approach, and Good et al. (34) have shown the effectiveness of this with the DOCK virtual screening/docking approach, in which the addition of pharmacophore constraints improved both the enrichment and speed of the process.

The active site of the Factor Xa serine protease (134) has been used for combinatorial library design (37c,d) using the DiR approach. GRID analyses using probes for hydrogen bond donors, acceptors, bases, acids, and hydrophobes resulted in 23 complementary site points being added (see Fig. 5.5). The shape of the active site was defined using 162 protein atoms. To ensure that a relevant area of the binding site was being explored (based on knowledge of X-ray protein-ligand complexes), site pharmacophores were forced to contain a hydrophobe or aromatic ring centroid point from both the S1 and S4 regions of the binding pocket. By using this focused approach, a "diversity" of matched site pharmacophores was obtained, representing a sampling of "reasonable" binding modes related to those experimentally observed and, thus, presumably having a higher probability of giving rise to biological activity. This focused approach reduced the total number of site pharmacophores from 5393 to 775 [using the seven distance ranges setting (37a) and considering all distances in the 1-15 Å range]. The approach was validated by the identification of feasible binding models (37), similar to that experimentally observed for a known Factor Xa inhibitor. The Ugi four-component condensation reaction (131) (see Fig. 5.20) was used for the study and is capable of producing very large numbers of different structures from commercially available reactants. An example of the power of the method was given, whereby products were selected semimanually from a small virtual library of 432 products (37c,d). Products were constructed from the four reactant sets: carboxylic acids (R × 3), amines (R × 2), aldehydes (R × 3), and isonitriles (R × 24). The pharmacophore-based site analysis showed the optimum positions of substitution and chain length for benzamidine-containing fragments (targeted to the aspartate-containing S1 pocket) and the optimum lengths of other hydrophobic reactants (targeted to the S4 pocket) to produce compounds that would sample the maximum number of binding modes. In this case the groups were always forced to be in the S1 and S4 pockets to maintain "reasonable" binding modes, although this restriction could be excluded to probe even further potential binding modes. Thus, as the identity of the matched site pharmacophore(s) was known for each compound, target site-based diversity of binding modes could be explored in the design process. An optimized selection of reactants was possible, and the value to the design of reactants with different chain lengths could be evaluated.

6 CONCLUSIONS AND FUTURE DIRECTIONS

Similarity and diversity metrics have been successfully used for a variety of tasks, including virtual screening, subset selection, and combinatorial library design. Databases of virtual compounds (e.g., from validated combinatorial chemistry protocols and reactants) can be used for both virtual screening and library design (virtual screening on virtual libraries with additional combinatorial constraints). The ability to exploit rapidly large virtual libraries of compounds that could be made by validated combinatorial chemistry protocols provides very powerful virtual screening and library design approaches. Future directions for library design will involve the application of such approaches in a fully integrated fashion (e.g., the ADEPT tool described in Section 4.10) and further enhancements to the constraints necessary to achieve druglike compounds (e.g., 80% compliance to the Rule of 5, predictive models for metabolism- and toxicity-related issues). Where the goal is lead generation (e.g., to enrich the compound screening file for high throughput screening), a focus will be on target classes (gene families) of interest, and the generation of compounds with leadlike properties, such as a lower molecular weight. The move away from combinatorial libraries to sparse arrays and noncombinatorial (cherry-picked) libraries (90) will continue, enabling more effective designs with control of associated properties. However, as more property constraints are applied to the library designs for leadlike/druglike properties, the need to include positive design elements to ensure good biological activity is emphasized. The goal for drug discovery is thus to identify targets and to generate compounds that are at the intersection of chemical, biological, and druglike property (e.g., absorption, toxicophores) space. Different targets and different expected routes of administration will require different constraints, and an element of diversity (with constraints toward a drug-occupied chemical space) will remain important, to enable the most effective use of combinatorial library chemistry and to discover new leads for both established and new targets.

REFERENCES

1. M. Johnson and G. Maggiora, Concepts and Applications of Molecular Diversity, John Wiley & Sons, New York, 1990.
2. (a) P. Willett, J. M. Barnard, and G. M. Downs, J. Chem. Inf. Comput. Sci., 38, 983 (1998); (b) P. Willett, Curr. Opin. Biotechnol., 11, 85 (2000); (c) P. Willett, Ed., Perspectives in Drug Discovery Design, Vols. 7/8, Kluwer Academic, Dordrecht/Norwell, MA, 1997; (d) J. S. Mason and M. A. Hermsmeier, Curr. Opin. Chem. Biol., 3, 342 (1999).
3. (a) J. M. Moore, Curr. Opin. Biotechnol., 10, 54 (1999); (b) P. J. Hajduk, T. Gerfin, J.-M. Boehlen, M. Haeberli, D. Marek, and S. W. Fesik, J. Med. Chem., 42, 2315 (1999); (c) C. A. Lepre, Drug Discovery Today, 6, 133 (2001); (d) J. Fejzo, C. A. Lepre, J. W. Peng, G. W. Bemis, Ajay, M. A. Murcko, and J. M. Moore, Chem. Biol., 6, 755 (1999).
4. J. S. Mason and S. D. Pickett, Perspect. Drug Discov. Des., 7/8, 85 (1997).
5. (a) R. D. Brown, Perspect. Drug Discov. Des., 7/8, 31 (1997); (b) Y. C. Martin, R. D. Brown, and M. G. Bures in E. M. Gordon and J. F. Kerwin, Jr., Eds., Combinatorial Chemistry and Molecular Diversity in Drug Discovery, Wiley-Liss, New York, 1998, pp. 369-385; (c) M. G. Bures and Y. C. Martin, Curr. Opin. Chem. Biol., 2, 376 (1998); (d) Y. C. Martin, M. G. Bures, and R. D. Brown, Pharm. Pharmacol. Commun., 4, 147 (1998).
6. (a) D. K. Agrafiotis in P. v. R. Schleyer, N. L. Allinger, T. Clark, J. Gasteiger, P. A. Kollman, H. F. Schaefer III, and P. R. Schreiner, Eds., The Encyclopedia of Computational Chemistry, Vol. 1, John Wiley & Sons, Chichester, UK, 1998, pp. 742-761; (b) D. K. Agrafiotis and V. S. Lobanov, J. Chem. Inf. Comput. Sci., 39, 51 (1999); (c) D. K. Agrafiotis, J. C. Myslik, and F. R. Salemme, Annu. Rep. Comb. Chem. Mol. Diversity, 2, 71 (1999); (d) D. K. Agrafiotis, V. S. Lobanov, D. N. Rassokhin, and S. Izrailev,
Methods Princ. Med. Chem., 10 (Virtual Quincy, M A , available from eduSoft L.C.
Screening for Bioactive Molecules), 265 (2000). http://www.eslc.vaviotech.com.
7 . R. A. Lewis, S. D. Pickett, and D. E. Clark in 21. (a) R. A. Lewis, J. S. Mason, and I. M. McLay,
K. B. Lipkowitz and D. B. Boyd, Eds., Reviews J. Chem. Inf. Comput. Sci., 37,599 (1997).
i n Computational Chemistry, Vol. 16, Wiley- 22. D. J. Cummins, C. W . Andrews, J. A. Bentley,
VCH, John Wiley & Sons, New York, 2000, pp. and M. J. Cory, J. Chem. Znf. Comput. Sci., 36,
1-5 1. 750 (1996).
8. D. C. Spellmeyer and P. D. J. Grootenhuis, 23. (a)W . G. Glen,W . J. Dunn 111, and D. R. Scott,
Annu. Rep. Med. Chem., 34, 287 (1999). Tetrahedron Comput. Methodol., 2,349 (1989);
9. H. Matter and M . Rarey in G. Jung, Ed., Com- ( b ) C. Cheng, G. Maggiora, M. Lajiness, and
binatorial Chemistry, Wiley-VCH Verlag M. J. Johnson, J. Chem. Znf. Comput. Sci., 36,
GmbH, Weinheim, Germany, 1999, pp. 409- 909 (1996);(c)M. Hassan, J. P. Bielawski, J. C.
439. Hempel, and M. Waldman, Mol. Div., 2, 64
10. Y . C. Martin, J. Comb. Chem., 3,231 (2001). (1996);( d )D. K. Agrafiotis, J. Chem. Znf. Com-
11. (a) R. S. Pearlman, Network Sci., 2, (617) put. Sci., 37,841 (1997).
(1996), available at: http://www.netsci.org/ 24. R. E. Cahart, D. H. Smith, and R. Venkat-
Science/Combichem/feature08.html; ( b ) R. S. araghavan, J. Chem. Znf. Comput. Sci., 25,64
Pearlman and K. M. Smith, Perspect. Drug (1985).
Discov. Des., 9, 3391355 (1998);(c)R. S. Pearl- 25. R. Nilakantan, N. Bauman, J. S. Dixon, and R.
man and K. M. Smith, Drugs Future, 23,885 Venkataraghavan, J. Chem. Inf. Comput. Sci.,
(1998); ( d ) R. S. Pearlman and K. M. Smith, 27, 82 (1987).
J. Chem. Znf. Comput. Sci., 39,28 (1999). 26. (a) S. K. Kearsley, S. Sallamack, E. M. Fluder,
12. J. S. Mason, A. C. Good, and E. J . Martin, Curr. J. D. Andose, R. T . Mosley, and R. P. Sheridan,
Pharm. Des., 7, 567 (2001). J. Chem. Znf. Comput. Sci., 36, 118 (1996);(b)
13. (a) MDL Information Systems Inc., San Lean- R. P. Sheridan, M. D. Miller, D. J. Underwood,
dro, CA, URL: http://www.mdli.com; ( b )C. A. and S. K. Kearsley, J. Chem. Znf. Comput. Sci.,
James, D.Weininger, and J. Delaney, Daylight 36,128 (1996);(c)R. P. Sheridan, J. Chem. Znf
Theory Manual, version 4.72, Daylight Chem- Comput. Sci., 40, 1456 (2000).
ical Information Systems, Inc., URL: http:// 27. (a)G. Schneider, W . Neidhart, T . Giller, and G.
www.daylight.com/dayhtml/doc/theory/theory. Schmid, Angew. Chem. Znt. Ed. Engl., 38,2894
toc.htrnl; (c) UNITYISLN manual available (1999); (b)G. Schneider, 0. Clement-Chomi-
from Tripos, Inc., 1699 South Hanley Road, enne, L. Hilfiger, P. Schneider, S. Kirsch, HIJ.
Suite 303, St. Louis, MO 63144, URL: http:// Bohm, and W . Neidhart, Angew. Chem. Znt.
www.tripos.com. Ed. Engl., 39,4130 (2000).
14. (a)R. D. Brown and Y . C. Martin, J. Chem. Znf. 28. D. Gorse and R. Lahana, Curr. Opin. Chem.
Comput. Sci., 37,1(1997);(b) R. D. Brown and Biol., 4,287 (2000).
Y . C. Martin, J. Chem. Znf. Comput. Sci., 36,
29. C. A. Lipinski, F. Lombardo, B. W . Dominy,
572 (1996).
and P. J. Feeney, Adv. Drug Deliv. Rev., 23, 3
15. D. R. Flower, J. Chem. Znf. Comput. Sci., 38, (1997).
379 (1998).
30. F. R. Burden, J. Chem. Znf. Comput. Sci., 29,
16. M. RandiE, J. Am. Chem. Soc., 97,6609 (1975). 225 (1989).
17. (a)L. B. Kier, L. H. Hall,W . J. Murray, and M. 31. DiverseSolutions was developed by R. S. Pearl-
RandiC, J. Pharm. Sci., 64, 1971 (1975); ( b ) man and K. M. Smith at the University of
L. B. Kier, L. H. Hall, and W . J. Murray, Texas, Austin, and is distributed by Tripos,
J. Pharm. Sci., 64,1974 (1975). Inc., St. Louis, MO.
18. L. H. Hall and L. B. Kier, J. Mol. Graph. Mod- 32. (a) H. Gao, J. Chem. Znf. Comput. Sci., 41,402
ell., 20, 4 (2001). (2001);( b ) D. Stanton, J. Chem. Znf. Comput.
19. M. RandiC, J. Mol. Graph. Modell., 20, 19 Sci., 3 9 , l l (1999);(c)D. Schnur, J. Chem. Znf
(2001). Comput. Sci., 39,36 (1999);(c)D. Schnur and
20. (a) L. H. Hall and L. B. Kier in K. B. Lipkowitz P. Venkatarangan i n A. K. Ghose and V . N .
and D. B. Boyd, Eds., Reviews in Computa- Viswanadhan, Eds., Combinatorial Library
tional Chemistry,Vol. 2,VCH Publishers, New Design and Evaluation, Marcel Dekker, New
York, 1991, pp. 367-422; ( b ) MOLCONN-Z, York, 2001, pp. 473-501; ( d ) B. R. Beno and
Hall Associates Consulting, 2 Davis Street, J. S. Mason, Drug Discovery Today, 6, 251
(2001); (e) X.-C. Wang and J. Saunders, Ab- DBprez, J. Med. Chem., 44,3378 (2001); (b) R.
stracts of Papers, 222nd ACS National Meet- Poulain, D. Horvath, B. Bonnet, C. Eckhoff, B.
ing, Chicago, IL, August 26-30, 2001 (2001), Chapelain, M.-C. Bodinier, and B. DBprez,
MEDI-012; (f) E. L. Stewart, P. J. Brown, J. A. J. Med. Chem., 44, 3391 (2001).
Bentley, and T. M. Willson, Abstracts of Pa- 42. (a) P. J. Goodford, J. Med. Chem., 28, 849
pers, 222nd ACS National Meeting, Chicago, (1985); (b) D. N. A. Bobbyer, P. J. Goodford,
IL, August 2640,2001 (2001), COMP-182; (g) and P. M. McWhinnie, J.Med. Chem., 32,1083
Y. Gao andV. Goodfellow, Abstracts of Papers, (1989); (c) The GRID program is developed and
221st ACS National Meeting, San Diego, CA, distributed by Molecular Discovery Ltd.
2001 (2001) MEDI-235; (h) X.4. Wang and J.
Saunders, Abstracts of Papers, 221st ACS Na- 43. E. J. Martin and T. J. Hoeffel, J. Mol. Graph.
tional Meeting, San Diego, CA, 2001 (20011, Modell., 18,383 (2000).
MEDI-207; (i) J. Saunders, Proceedings of the 44. A. R. Leach, D. V. S. Green, M. M. Hann, D. B.
IBC Conference on Drug Discovery by Design, Judd, and A. C. Good, J. Chem. Znf. Comput.
Boston, MA, November 5-8,2001; (j) B. Pirard Sci., 40, 1262 (2000).
and S. D. Pickett, J. Chem. Znf: Comput. Sci., 45. (a) K. A. Andrews and R. D. Cramer, J. Med.
40,1431 (2000) . Chem., 43,1723 (2000); (b) R. D. Cramer, M. A.
33. (a) A. C. Good and J. S. Mason, Reviews in Poss, M. A. Hermsmeier, T. J. Caulfield, M. C.
Computational Chemistry, Vol. 7, VCH, New Kowala, and M. T. Valentine, J. Med. Chem.,
York, 1995, pp. 67-127; (b) G. W. A. Milne, 42,3919 (1999); (c) R. D. Cramer, R. D. Clark,
M. C. Nicklaus, and S. Wang, SAR QSAR En- D. E. Patterson, and A. M. Ferguson, J. Med.
viron. Res., 9 , 2 3 (1998); (c) W. A. Warr and P. Chem., 39,3060 (1996).
Willett, Design of Bioactive Molecules, Arneri- 46. (a) H. Matter and T. Potter, J. Chem. Znf Com-
can Chemical Society, Washington, DC, 1998, put. Sci., 39, 1211 (1999); (b) V. J . Van Geer-
pp. 73-95. estein, H. Hamersma, and S. P. Van Helden in
34. A. C. Good, J . S. Mason, and S. D. Pickett, H. Van de Waterbeemd, B. Testa, and G. Folk-
Methods Princ. Med. Chem., 10 (Virtual ers, Eds., Computer-Assisted Lead Finding
Screening for Bioactive Molecules), 131 (2000). and Optimization, Verlag Helvetica Chimica
35. S. D. Pickett, J. S. Mason, and I. M. McLay, Acta, Basel, Switzerland, 1997, pp. 159-178.
J. Chem. Inf: Comput. Sci., 36,1214 (1996). 47. M. Hahn, J. Chem. Znf: Comput. Sci., 37, 80
36. E. K. Davies in I. M. Chaiken and K. D. Janda, (1997).
Eds., Molecular Diversity a n d Combinatorial
48. 0. F. Giiner, M. Waldman, R. Hoffmann, and
Chemistry: Libraries and Drug Discovery,
J.-H. Kim in 0. F. Giiner, Ed., Pharmacophore
American Chemical Society, Washington, DC,
Perception, Development and Use in Drug De-
1996, pp. 309-316.
sign, International University Line, La Jolla,
37. (a) J. S. Mason, I. Morize, P. R. Menard, D. L. CA, 2000, p. 213.
Cheney, C. R. Hulme, and R. F. Labaudiniere,
J.Med. Chem., 42,3251 (1999); (b) J. S. Mason 49. R. Carbo, L. Leyda, and M. Arnau, Znt. J.
and D. L. Cheney, Proc. Pac. Symp. Biocom- Quantum Chem., 17,1185 (1980).
put., 4, 456 (1999); (c) J. S. Mason, and D. L. 50. A. C. Good, E. E. Hodgkin, and W. G. Richards,
Cheney, Proc. Pac. Symp. Biocomput., 6, 576 J. Chem. Inf: Comput. Sci., 32,188 (1992).
(2000); (dl J. S. Mason and B. R. Beno, J.Mol. 51. (a)D. J. Wild and P. Willett, J. Chem. Znf: Com-
Graph. Modell., 18, 438 (2000). put. Sci., 36, 159 (1996); (b) D. A. Thorner,
. (a)M. J. McGregor and S. M. Muskal, J. Chem. D. J. Wild, P. Willett, and P. M. Wright,
Znf: Comput. Sci., 39, 569 (1999); (b) M. J. J. Chem. Znf: Comput. Sci., 36,900 (1996).
McGregor, and S. M. Muskal, J. Chem. Znf 52. A. Schuffenhauer, V. Gillet, and P. Willett,
Comput. Sci., 40, 117 (2000). J. Chem. Znf: Comput. Sci., 40, 295 (2000).
. E. K. Bradley, P. Beroza, J. E. Penzotti, P. D. J. 53. R. D. Cramer 111, D. E. Patterson, and J. D.
Grootenhuis, D. Spellmeyer, and J. L. Miller,
J. Med. Chem., 43,2770 (2000). Bunce, J. Am. Chem. Soc., 110, 5959 (1988).
0. D. Horvath in A. K. Ghose and V. N. Viswa- 54. G. Cruciani, P. Crivori, P.-A. Carrupt, and B.
nadhan, Eds., Combinatorial Library Design Testa, THEOCHEM, 603, 17 (2000).
and Evaluation, Marcel Dekker, New York, 55. (a)W. Guba and G. Cruciani in K. Guberrtofte
2001, pp. 429-472. and F. S. Jorgensen, Eds., Molecular Modeling
. (a) R. Poulain, D. Horvath, B. Bonnet, C. Eck- a n d Prediction of Bioreactivity, Plenum, New
hoff, B. Chapelain, M.-C. Bodinier, and B. York, 2000, pp. 89-95; (b) P. Crivori, G. Cru-
Combinatorial Library Design, Molecular Similarity, and Diversity Applications
ciani, P.-A. Carrupt, and B. Testa, J. Med. D. B. Turner, S. M. Tyrrell, and P. Willett,
Chem., 43,2204 (2000). J. Chem. Inf. Comput. Sci., 37,18 (1997).
56. M. Pastor, G. Cruciani, I. McLay, S. Pickett, 70. S. D. Pickett, C. Luttmann, V . Guerin, A.
and S. Clementi, J. Med. Ckern., 43, 3233 Laoui, and E. James, J. Chem. Inf. Comput.
(2000). Sci., 38, 144 (1998).
57. E. J. Martin, J. M. Blaney, M. A. Saini, D. C. 71. J. Mount, J. Ruppert, W . Welch, and A. Jain,
Spellmeyer, A. K. Wong, and W . H. Moos, J. Med. Chem., 42, 60 (1999).
J. Med. Chem., 38, 1431 (1995). 72. M. Snarey, N. K. Terrett, P. Willett, and D. J.
Wilton, J. Mol. Graph., 15, 372 (1997).
58. C. A. James, D. Weininger, and J. Delaney,
Fingerprints-Screening and Similarity, Day- 73. M. Waldman, H. Li, and M. Hasan, J. Mol.
light Theory Manual v4.72, Daylight Chemical Graph. Modell., 18,412 (2000).
Information Systems, Inc., URL: http://www. 74. M. Hann, B. Hudson, X . Lewell, R. Lifely, L.
daylight.com/dayhtml/doc/theory/theory.toc. Miller, and N. Ramsden, J. Chern. Inf. Comput.
html. Sci., 39, 897 (1999).
59. (a)P. R. Menard, J. S. Mason, I. Morize, and S. 75. (a) D. E. Clark and S. D. Pickett, Drug Discov-
Bauerschmidt, J. Chem. Inf. Cornput. Sci., 38, ery Today, 5, 49 (2000);( b ) P. J. Eddershaw,
1204 (1998);(b) P. R. Menard, R. A. Lewis, and A. P. Beresford, and M. K. Bayliss, Drug Dis-
J. S. Mason, J. Chem. Inf. Comput. Sci., 38,497 covery Today, 5,409-414 (2000);(c)H. van de
(1998). Waterbeemd, D. A. Smith, K. Beaumont, and
D. K. Walker, J. Med. Chem., 44, 1313 (2001).
60. P. Willett, Similarity and Clustering in Chem-
76. S. D. Pickett, D. E. Clark, and I. M. McLay,
ical Information Systems, Research Studies
Press, Letchworth, UK, 1987. J. Chem. Inf. Comput. Sci., 40,263 (2000).
77. D. E. Clark, J. Pharrn. Sci., 88,807 (1999).
61. R. A. Jarvis and E. A. Patrick, ZEEE Trans.
78. (a)A. C. Good and R. A. Lewis, J. Med. Chem.,
Cornput., C-22,1025 (1973).
40, 3926 (1997); (b)R. A. Lewis, A. C. Good,
62. (a) P. Willett,V . Winterman, and D. Bawden, and S. D. Pickett in H. V a n de Waterbeemd, B.
J. Chem. Znf.Comput. Sci., 26,109 (1986);( b ) Testa, and G. Folkers, Eds., Computer-As-
J . B. Dunbar, Perspect. Drug Discou. Des., 718, sisted Lead Finding and Optimization,Verlag
51 (1997). Helvetica Chimica Acta, Basel, Switzerland,
63. T . N. Doman, J. M. Cibulskis, M. J. Cibulskis, 1997, pp. 135-156; (c)V . J. Gillet, P. Willet, J.
P. D. McCray, and D. P. Spangler, J. Chem. Inf. Bradshaw, and D. V . S. Green, J. Chem. Znf:
Comput. Sci., 36, 1195 (1996). Comput. Sci., 39, 169 (1999).
64. (a) J. M. Barnard and G. M. Downs, J. Chem. 79. G. Grassy, A. Yasri, R. Lahana, J. Woo, S. iyer,
Inf. Comput. Sci., 37,141 (1997);( b )J. M. Bar- M. Kaczorek, R. Folc'h, and R. Buelow, Nat.
nard, and G. M. Downs, Perspect. Drug Discov. Biotechnol., 16, 748 (1998).
Des., 718, 13 (1997). 80. J . S. Mason i n P. M. Dean and R. A. Lewis,
65. G. M. Downs, P. Willett, and W . Fisanick, Eds., Molecular Diversity Drug Design, Klu-
J. Chern. Inf. Comput. Sci., 34, 1094 (1994). wer Academic, Dordrecht, Netherlands, 1999,
pp. 67-91.
66. (a) M. Lajiness, Perspect. Drug Discov. Des.,
81. (a) A. C. Good and I. D. Kuntz, J. Camput.-
7/8,55 (1997);( b )V . J. Gillet and P. Willett i n
Aided Mol. Des., 9 , 373 (1995);( b ) M. J. Ash-
A. K. Ghose and V . N. Viswanadhan, Eds.,
ton, M . Jaye, and J. S. Mason, Drug Discovery
Combinatorial Library Design and Eualua-
Today, 1, 71 (1996).
tion, Marcel Dekker, New York, 2001, pp. 379-
398; (c)R. D. Clark, J. Chem. Inf:Comput. Sci., 82. J. H . V a n Drie and R. A. Nugent, SAR QSAR
37, 1181 (1997); (dl T . Potter and H. Matter, Enuiron. Res., 9 , 1 (1998).
J. Med. Chem., 41,478 (1998). 83. A. C. Good, Internet J. Chem., 3 (20001,http://
www.ijc.com/article/2OOOv3/9/.
67. B. D. Hudson, R. M . Hyde, E. Rahr, and J.
84. D. E. Patterson, R. D. Cramer, A. M. Ferguson,
Wood, Quant. Struct.-Act. Relat., 15, 285
R. D. Clark, and L. E. Weinberger, J. Med.
(1996).
Chem., 39,3049 (1996).
68. R. E. Higgs, K. G. Bemis, I. A. Watson, and 85. A. C. Good, J. S. Mason, D. V . S. Green, and
J . H.Wikel, J. Chern. Inf. Comput. Sci., 37,861 A. R. Leach in A. K. Ghose and V . N. Viswa-
(1997). nadhan, Eds., Combinatorial Library Design
69. (a)J. D. Holliday, S. S. Ranade, and P. Willett, and Evaluation, Marcel Dekker, New York,
Quant. Struct.-Act. Relat., 14, 501 (1995); ( b ) 2001, pp. 399-428.
rences
108. J. H. Voigt, B. Bienfait, S. Wang, and M. C. 124. E. K. Kick, D. C. Roe, A. G. Skillman, G. Lin,
Nicklaus, J. Chem. Znf. Comput. Sci., 41, 702 T . J. A. Ewing, Y . Sun, I. Kuntz, and J. A.
(2001). Ellman, Chem. Biol., 4,297 (1997).
109. M. J. McGregor and P. V . Pallai, J. Chem. Znf. 125. D. C. Roe in P. M. Dean and R. A. Lewis, Eds.,
Comput. Sci., 37, 443 (1997). Molecular Diversity Drug Design, Kluwer Ac-
110. (a)R. B. Nilakantan, N. Bauman, K. S. Haraki, ademic, Dordrecht, Netherlands, 1999, pp.
and R. J. Venkataraghavan, J. Chem. In6 141-173.
Comput. Sci., 30,65 (1990);( b )R. B. Nilakan- 126. (a) I. D. Kuntz, J. M. Blaney, S. J. Oatley, R.
tan, N . Bauman, and K. S. Haraki, J. Cornput.- Langridge, and T . E. Ferrin, J. Mol. Biol., 161,
Aided Mol. Des., 11,447 (1997). 269 (1982); DOCK is developed and distrib-
111. E. J. Martin and R. E. Critchlow, J. Comb. uted by the Kuntz Group, Dept. o f Pharmaceu-
Chem., 1,32 (1999). tical Chemistry, 512 Parnassus, University of
112. (a)M. J. McGregor and S. M. Muskal, J. Chem. California, San Francisco, C A 94143-0446,
Inf. Comput. Sci., 39, 569 (1999); ( b ) M. J. URL: http://www.cmpharm.ucsf.edu/kuntz;
McGregor and S. M. Muskal, J. Chem. Znf. ( b )T . J. A. Ewing and I. D. Kuntz, J. Comput.
Comput. Sci., 40, 117 (2000). Chem., 18, 1175 (1997); (c)Y . Sun, T . J. A.
Ewing, A. G. Skillman, and I. D. Kuntz,
113. B. A. Hopkins, Ann. Bot., 18,213 (1954). J. Cornput.-Aided Mol. Des., 12,597 (1998).
114. (a)T . R. Hagadone and M. S. Lajiness, Tetra- 127. (a) S. F. Brady, et al., J. Med. Chem., 41, 401
hedron Comput. Methodol., 1, 219 (1988);( b ) (1998);( b ) H. J. Bohm, D. W . Banner, and L.
T . R. Hagadone, J. Chem. Znf. Comput. Sci., Weber, J. Cornput.-Aided Mol. Des., 13, 51
32,515 (1992). (1999).
115. A. Gobbi, D. Poppinger, and B. Rohde, Per-
128. (a)A. Rockwell, M. Melden, R. A. Copeland, K.
spect. Drug Discov. Des., 718, 131 (1997).
Hardman, C. P. Decicco, and W . F. DeGrado,
116. A. R. Leach, J. Bradshaw, D. V . S. Green, and J. Am. Chem. Soc., 118,10337 (1996);( b )A. K.
M. M. Hann, J. Chem. Inf. Comput. Sci., 39, Szardenings, D. Harris, S. Lam, L. Shi, D.
1161 (1999). Tien,Y .Wang, D.V . Patel, M. Navre, a n d D. A.
117. G. Bravi, D.V . S. Green, M. M. Hann, and A. R. Campbell, J. Med. Chem., 41,2194 (1998).
Leach, J. Chem. Znf. Comput. Sci., 40, 1441 129. D. Tondi, U . Slomiczynska, M. P. Costi, D. M.
(2000). Watterson, S. Ghelli, and B. K. Shoichet,
118. (a)W . P. Walters, Presented at the Daylight Chem. Biol., 6 , 319 (1999).
User Group Meeting, MUG'99.1999. Available
online at http://www.daylight.com/meetings/
130. M. L. Lamb, K. W . Burdick, S. Toba, M. w.
Young, A. G. Skillman, X . Zou, J. R. Arnold,
mug991Walters/index.html;( b )W . P. Walters,
and I. D. Kuntz, Proteins Struct. Funct. Genet.,
M. T . Stahl, and M. A. Murcko, Drug Discovery
42,296 (2001).
Today, 3,160 (1998).
119. M. Stahl, Methods Princ. Med. Chem., 10 (Vir- 131. I. Ugi and C. Steinbruckner, Chem. Ber., 94,
tual Screening for Bioactive Molecules), 229 734 (1961).
(2000). 132. T . F. Herpin, G. C. Morton, A. K. Dunn, C.
120. H. J. Bohm and M. Stahl, Curr. Opin. Chem. Fillon, P. R. Menard, S. Y . Tang, J. M. Salvino,
Biol., 4, 283 (2000). and R. F. Labaudiniere, Mol. Diversity, 4, 221
(2000).
121. A. R. Leach, R. A. Bryce, and A. J. Robinson, J.
Mol. Graph. Modell., 18, 358 (2000). 133. C. M. Murray and S. J. Cato, J. Chem. Inf.
122. R. P. Sheridan, S. G. SanFeliciano, and S. K. Comput. Sci., 39,46 (1999).
Kearsley, J. Mol. Graph. Modell., 18, 320 134. A. Tulinsky, K. Padmanbhan, K. P. Padmanb-
(2000). han, C. H. Park,W . Bode, R. Huber, D. T . Blan-
123. D. Tondi and M. P. Costi i n A. K. Ghose and kenship, A. D. Cardin, and W . Kisiel, J. Mol.
V . N. Viswanadhan, Eds., Combinatorial Li- Biol., 232,947 (1993).
brary Design and Evaluation, Marcel Dekker, 135. M. Rarey and M. Stahl, J. Cornput.-Aided Mol.
New York, 2001, pp. 563-603. Des., 15,497-520 (2001).
CHAPTER SIX
Virtual Screening
INGO MUEGGE
ISTVAN ENYEDY
Bayer Research Center
West Haven, Connecticut
Contents
1 Introduction
2 Concepts of Virtual Screening
2.1 Druglikeness Screening
2.1.1 Counting Schemes
2.1.2 Functional Group Filters
2.1.3 Topological Drug Classification
2.1.3.1 Artificial Neural Networks and Decision Trees
2.1.3.2 Structural Frameworks and Side Chains of Known Drugs
2.1.4 Pharmacophore Point Filter
2.2 Focused Screening Libraries for Lead Identification
2.2.1 Targeting Protein Families
2.2.2 Privileged Structures
2.3 Pharmacophore Screening
2.3.1 Introduction to Pharmacophores
2.3.2 Databases of Organic Compounds
2.3.3 2D Pharmacophore Searching
2.3.4 3D Pharmacophores
2.3.4.1 Ligand-Based Pharmacophore Generation
2.3.4.2 Manual Pharmacophore Generation
2.3.4.3 Automatic Pharmacophore Generation
2.3.4.4 Receptor-Based Pharmacophore Generation
2.3.5 Pharmacophore-Based Virtual Screening
2.4 Structure-Based Virtual Screening
2.4.1 Protein Structures
2.4.2 Computational Protein-Ligand Docking Techniques
2.4.2.1 Rigid Docking
2.4.2.2 Flexible Ligands
2.4.3 Scoring of Protein-Ligand Interactions
2.4.3.1 Force Field (FF) Scoring

Burger's Medicinal Chemistry and Drug Discovery
Volume 1: Drug Discovery
Edited by Donald J. Abraham
© 2003 John Wiley & Sons, Inc.
and/or screen against a specific target protein, to a manageable number of compounds that exhibit the highest chance to lead to a drug candidate (10, 19). The major sources of information to guide virtual screening for a particular target are derived from the following questions:

1. What does a drug look like in general?
2. What is known about compounds that interact with the receptor?
3. What is known about the structure of the target protein and the protein-ligand interactions?

In the following subsections we address these three points, outlining concepts of assessing the overall druglikeness of molecules, the concentration of subsets of molecules in focused libraries, and the identification of specific leads through structure-based virtual screening techniques.

2.1 Druglikeness Screening

Many drug candidates fail in clinical trials because of reasons unrelated to potency against the intended drug target. Pharmacokinetics and toxicity issues are blamed for more than half of all failures in clinical trials. Therefore, the first part of virtual screening evaluates the druglikeness of small molecules, mostly independent of their intended drug target (there are specific drug classes such as those acting in the central nervous system that require specific drug profiles). Druglike molecules exhibit favorable absorption, distribution, metabolism, excretion, and toxicological (ADMET) parameters (20-24). They are synthetically feasible and possess pharmacophore features that offer the chance of specific interactions with the intended protein target. Druglikeness is currently assessed using the following types of methods: simple counting methods, functional group filters, topological filters, and pharmacophore filters. Computational techniques used to identify druglikeness include neural networks (25-27), recursive partitioning approaches (25, 28), and genetic algorithms (29). These methods are further dis-

Table 6.1 Typical Ranges for Parameters(a)

Parameter                  Minimum   Maximum
LogP                       -2        5
Molecular weight           200       500
Hydrogen bond acceptors    0         10
Hydrogen bond donors       0         5
Molar refractivity         40        130
Rotatable bonds            0         8
Heavy atoms                20        70
Polar surface area [Å²]    0         120
Net charge                 -2        +2

(a) Data taken from ref. 21.

2.1.1 Counting Schemes. Database collections of known drugs [e.g., CMC (30), WDI (31), or MDDR (32)] are typically used to extract knowledge about structure and properties of potential drug molecules. Key physicochemical properties such as molecular weight, charge, and lipophilicity (33, 34) of drug collections are profiled to extract simple counting rules for relevant descriptors of ADMET-related parameters. Examples include Lipinski's "rule-of-five" (33), which limits the range for molecular weight (MW ≤ 500), computed octanol-water partition coefficient (Clog P ≤ 5), and hydrogen-bond donors and acceptors (OHs + NHs ≤ 5; Ns + Os ≤ 10). Other authors limit the number of rotatable bonds (RB ≤ 8) or rings in a molecule (number of rings ≤ 4) (34). Table 6.1 shows a list of typical boundaries of counting parameters. Figure 6.1 illustrates the profiling procedure for these counting parameters using polar surface area (PSA) (35) as a descriptor. Collections of 776 orally administered CNS drugs and 1590 orally administered non-CNS drugs that reached phase II efficacy studies were analyzed for their PSA. It was found that 90% of the non-CNS compounds have a PSA below 120 Å²; 90% of CNS drugs have a PSA below 80 Å². Although it is possible that drugs have higher PSA values and are still orally bioavailable or penetrate the blood-brain barrier (as the result of active transport or other reasons), the profile suggests that it is much less likely. It is therefore a reasonable assumption in a virtual screening approach to discriminate against compounds outside the most populated descriptor space (in this case, PSA
< 120 Å²), especially if the compound lies outside the optimal region for several descriptors (e.g., MW > 500 and Clog P > 5).

Simple descriptors as described above are quickly calculated and counted. Therefore, after typically removing compounds with atoms other than C, N, O, S, H, P, Si, Cl, Br, F, and I, counting schemes present the first filter in virtual screening approaches.

2.1.2 Functional Group Filters. Reactive, toxic, or otherwise unsuitable compounds, such as natural product derivatives, are removed using specific substructure filters. Figure 6.2 shows a subset of substructures that lead to the dismissal of compounds in virtual screening. Typical reactive functional groups include, for example, reactive alkyl halides, peroxides, and carbazides. Unsuitable leads may include crown ethers, disulfides, and aliphatic methylene chains seven or more long. Unsuitable natural products may include quinones, polyenes, or cycloheximide derivatives. A list of such fragments coded in Daylight SMARTS is given, for example, by Hann and coworkers (36). It should be noted, however, that natural product derivatives are not always unsuitable leads.

Figure 6.2. Selection of reactive functional groups that should be removed from a virtual screen (examples taken from Ref. 212). Groups shown: sulfonyl halides, acyl halides, alkyl halides, anhydrides, halopyrimidines, epoxides, aldehydes, imines, and thioesters.

Screening out compounds that contain certain atom groups associated with toxicity provides a practical and fast way to reduce large databases; however, it is only a crude approximation for eliminating potentially toxic compounds. Better descriptions of toxicity may be provided by structure-based methods to assess toxicity of compounds. They draw primarily from mutagenicity, carcinogenicity, and acute toxicity databases assembled, for instance, by the National Toxicology Program (37) and the Toxic Effect of Chemical Substances database, RTECS (38). CASETox (39), TOPKAT (40), and DEREK (41) are commercial software
products that can be used to evaluate virtual compounds for potential toxicity.

2.1.3 Topological Drug Classification. It is generally assumed that compounds with structural similarity to known drugs may exhibit druglike properties themselves, such as oral bioavailability, low toxicity, membrane permeability, and metabolic stability. Following this idea, drug databases and reagent databases such as the ACD (42) as negative control (assuming they do not contain many drugs) have been analyzed to find structural features of drugs and nondrugs. Neural network approaches have been devised (25, 27) that can discriminate between drugs and nondrugs with about 80% certainty. Recursive partitioning approaches classify drugs and nondrugs with similar accuracy.

2.1.3.1 Artificial Neural Networks and Decision Trees. Figure 6.3 shows an example of a simple neural network that uses Ghose and Crippen atom types (43) to code the molecular structure. Ninety-one statistically significant atom types correspond to 91 input neurons of the neural net. Typically, five neurons in the hidden layers are used in the net design (25, 27). The single neuron output layer can vary between 0 (nondrugs) and 1 (drugs). Trained on 5000 drugs taken from the WDI and 5000 compounds labeled nondrugs taken from the ACD, the resulting neural net was shown to correctly classify about 80% of other drugs/nondrugs (27).

Figure 6.3. Neural network architecture for prediction of druglikeness. (Labels in the figure: drug, output = 1; nondrug, output = 0; weights Wij; hidden layer.)

Recursive partitioning, also known as the decision tree approach, is another powerful method to extract knowledge from a database. Wagener and Geerestein have explored the WDI and ACD databases to train a decision tree for the discrimination of drugs and nondrugs (28). Figure 6.4 shows a partial decision tree derived by the authors. One rule derived from this partial tree is, for example, if a compound possesses no alcohol and a tertiary aliphatic amine but no methylene linker between a heteroatom and a carbon atom, it is not

Figure 6.4. Partial decision tree from Wagener and Geerestein (28). C(n)spx describes a carbon with hybridization spx and formal oxidation number n. X refers to a heteroatom; R refers to any group linked through a carbon. The tree starts at the top left corner. Here is an example of how to read the tree: If a compound contains an alcohol, it is classified as a drug. If it does not contain an alcohol, the presence of a tertiary amine is checked. If it contains a tertiary amine and also contains (does not contain) a CH2 group with attached heteroatom as well as another R group, it is classified as drug (nondrug). (Node labels in the figure include: alcohol; tertiary amine; phenol, enol, carboxyl; nondrug.)
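The reading example in the Figure 6.4 caption can be sketched as a few nested tests. This is only a minimal illustration, assuming that the three substructure flags have already been computed by some substructure search; the `Features` class, its field names, and the "unresolved" label for branches the caption does not describe are assumptions introduced here, not part of the published tree.

```python
from dataclasses import dataclass


@dataclass
class Features:
    """Hypothetical precomputed substructure flags for one compound."""
    has_alcohol: bool
    has_tertiary_amine: bool
    has_chx_with_r: bool  # CH2 group with attached heteroatom plus another R group


def classify(f: Features) -> str:
    """Follow only the branch spelled out in the Figure 6.4 caption."""
    if f.has_alcohol:
        return "drug"
    if f.has_tertiary_amine:
        # The caption resolves this branch by the CH2-X / R-group test.
        return "drug" if f.has_chx_with_r else "nondrug"
    # The caption does not describe the remaining branches of the partial tree.
    return "unresolved"


if __name__ == "__main__":
    print(classify(Features(has_alcohol=False,
                            has_tertiary_amine=True,
                            has_chx_with_r=False)))
```

Written this way, each path through the tree is a readable rule, which is the main practical difference from the neural network, whose weights are not directly interpretable.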
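The counting ranges of Table 6.1 lend themselves to a simple range check. The sketch below assumes that descriptor values have already been computed elsewhere; the dictionary keys, the function names, and the `max_violations` threshold are illustrative assumptions. The threshold reflects the remark in Section 2.1.1 that a compound is discriminated against especially when it lies outside the optimal region for several descriptors.

```python
# Ranges copied from Table 6.1; the key names are hypothetical.
TABLE_6_1 = {
    "logp": (-2, 5),
    "mol_weight": (200, 500),
    "hb_acceptors": (0, 10),
    "hb_donors": (0, 5),
    "molar_refractivity": (40, 130),
    "rotatable_bonds": (0, 8),
    "heavy_atoms": (20, 70),
    "polar_surface_area": (0, 120),
    "net_charge": (-2, 2),
}


def violations(descriptors, ranges=TABLE_6_1):
    """Return the names of supplied descriptors that fall outside their range."""
    return [
        name
        for name, (low, high) in ranges.items()
        if name in descriptors and not low <= descriptors[name] <= high
    ]


def passes(descriptors, max_violations=0):
    """Accept a compound with at most `max_violations` out-of-range descriptors."""
    return len(violations(descriptors)) <= max_violations


if __name__ == "__main__":
    # Made-up descriptor values for one compound.
    compound = {"logp": 6.3, "mol_weight": 620, "hb_donors": 2}
    print(violations(compound), passes(compound))
```

Descriptors missing from the input are skipped rather than counted as violations, so the same check can be applied when only some of the parameters are available.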
• Pharmacophore points are fused and counted as one if they are separated by fewer than two carbon atoms.
• Molecules with fewer than two or more than seven pharmacophore points fail the filter.
• Amines are considered pharmacophore points but not azoles or diazines.
• Compounds with more than one carboxylic acid are dismissed.
• Compounds without a ring structure are dismissed.
• Intracyclic amines in the same ring are fused to one pharmacophore point.

The requirement of two distinct pharmacophore points neglects at least one very important class of drugs: biogenic amine-containing CNS drugs. Therefore, a second pharmacophore filter has been designed that requires only one pharmacophore point in small molecules of the type amine, amidine, guanidine, or carboxylic acid (PF2).

Figure 6.9. Functional motifs of drugs used to build pharmacophore points.

An analysis of drug databases and reagent-type databases reveals that about two thirds of drugs and nondrugs can be classified correctly by PF1. This performance is not as impressive as that of neural networks. However, as a filter for virtual screening, pharmacophore point filters offer some advantages. First, the occurrence and count of pharmacophore points can be evaluated on the building-block level of a virtual combinatorial library. No enumeration is necessary as for druglike neural nets. Second, the results of the pharmacophore point filter can be easily interpreted. Third, the settings of the filter can be easily adjusted (e.g., PF1 for non-CNS drugs, PF2 for CNS drugs).

2.2 Focused Screening Libraries for Lead Identification

Without the knowledge about specific drug targets it is sometimes useful to apply virtual screening for the design of focused libraries of a few thousand compounds rather than to find a small number of hits to be tested against a specific target. To save resources it may sometimes be more prudent not to run the entire HTS file against a target protein; instead, a focused library with higher chances of containing hits may be scrutinized. Those focused libraries may be designed to target specific protein families such as GPCRs, kinases, or nuclear hormone receptors. They can also be enriched with privileged structures that occur
more often in drug molecules and/or were found to inhibit members of the protein fam-

2.2.1 Targeting Protein Families. Target class-directed libraries can be built from available compounds or be synthesized in combinatorial fashion. The design of target class-directed libraries relies on the identification of structural motifs in small molecules or in building blocks for combinatorial libraries that can be linked to increased activity for the target class. Functional groups that show the propensity to hit a certain target class can be found by examining ligands from the literature. Recurring motifs for GPCRs include, for example, piperazines, morpholines, and piperidines; for kinases they include, for example, heterobicyclic compounds or pyrimidines. Compounds bearing those structural motifs are thought to have a generally higher chance to be active against the respective target classes. A more rigorous approach to identify the "GPCR-likeness" of compounds or building blocks can be provided by a statistical analysis of druglike databases. Neural networks have been shown to be particularly useful in classifying chemical matter, such as CNS-active compounds (26, 48).

A neural network approach similar to that of Sadowski and Kubinyi (27) has been devised recently to address the "GPCR-likeness" of small molecules as well as building blocks for combinatorial libraries (49). A feed-forward neural net was trained using 5000 compounds from the MDDR that target GPCRs and 5000 compounds that target other protein classes. Using the "activity-class" field of the database, about 20,000 GPCR-like and [...],000 non-GPCR-like compounds have been identified by queries such as 5HT, leukotriene, and PAF. The resulting neural net classifies GPCR-like compounds correctly with 80% certainty. An independent test of compounds in our proprietary database that were found to hit GPCRs or other targets showed a correct prediction of GPCR-like compounds in 70% of the cases. When several virtual combinatorial libraries were analyzed, it turned out that the property of being GPCR-like could be attributed to the GPCR-likeness of the building blocks alone; that is, the GPCR-likeness of the enumerated compound correlated very well with the GPCR-likeness of the most GPCR-like building block it contained. This offers an important advantage for the design of combinatorial libraries because, for large virtual libraries, the computer costs for enumeration grow exponentially with the number of R groups and thus very quickly become impractical. For instance, for a 3-R-group library with 1000 building blocks each, the enumerated library would contain 1 billion compounds to be analyzed, whereas the building block-level analysis needs to examine only 3000 compounds. Figure 6.10 shows a list of amine building blocks extracted from the ACD that were found to be most GPCR-like by the neural net.

Not every portion of a GPCR-like molecule has to be GPCR-like. The presence of one GPCR-like moiety (building block or core structure) is sufficient to make a compound GPCR-like. Therefore, the neural network offers two different strategies for the design of GPCR-like libraries: (1) GPCR-like core + druglike building blocks (need not be GPCR-like); (2) non-GPCR-like core + GPCR-like building blocks. Virtual screening of a database of existing compounds using the described neural net can be applied to assemble a focused screening library. Alternatively, combinatorial libraries can be designed.

2.2.2 Privileged Structures. Privileged structures are structural types of small molecules that are able to bind with high affinity to multiple classes of receptors (50). An enrichment of libraries with privileged structures may increase the chance of finding active compounds. Examples of privileged structures include benzazepine analogs found to be effective ligands for an enzyme that cleaves the peptide angiotensin I, whereas others are effective CCK-A receptor ligands. Cyproheptadine derivatives were found to have peripheral anticholinergic, antiserotonin, antihistaminic, and orexigenic activity. Hydroxamate and benzamidine derivatives have been shown to be privileged structures for metalloproteases and serine proteases, respectively. For the class of 7-transmembrane G-protein-coupled receptors a large number of privileged structures has been found including, for example, diphenylmethane, diazepine, benzazepine, biphenyltetrazole, spiropiperidine, in-
dole, and benzylpiperidine (51). Some ubiquitously privileged structures have recently been identified (52). They include carboxylic acids, biphenyls, diphenylmethane, and, to a lesser extent, naphthyl, phenyl, cyclohexyl, dibenzyl, benzimidazole, and quinoline.

2.3 Pharmacophore Screening

In cases where no structural information about the target protein is given, pharmacophore models can provide powerful filter tools for virtual screening (53). Even in cases where the protein structure is available, pharmacophore filters should be applied early because they are generally much faster than docking approaches (discussed below) and can, therefore, greatly reduce the number of compounds subjected to the more expensive docking applications. For example, a pharmacophore model consisting of three pharmacophore points can be tested against about 10^6 compounds in a few minutes of computer time [disregarding the time it takes to generate three-dimensional (3D) conformations of each molecule] (10). Another interesting aspect of pharmacophores in virtual screening is 3D-pharmacophore diversity. Although the diversity concept for virtual compounds in general is not applicable because of the enormity of the chemical space, diversity in pharmacophore space is a feasible concept. Virtual libraries can therefore be optimized for covering a wide pharmacophore space.

2.3.1 Introduction to Pharmacophores. In 1894 Emil Fischer proposed the "lock-and-key" hypothesis to characterize the binding of compounds to proteins (54). This can be considered the first attempt to explain binding of small molecules to a biological target. Proteins recognize substrates through specific interactions. It is a challenge for the medicinal chemist to synthesize compounds that can capture the 3D arrangement of functional groups in a small molecule that forms the pharmacophore and that is responsible for substrate binding
to the protein. The first definition of the pharmacophore formulated by Paul Ehrlich was "a molecular framework that carries (phoros) the essential features responsible for a drug's (pharmacon) biological activity" (55). This definition was slightly modified by Peter Gund to "a set of structural features in a molecule that is recognized at a receptor site and is responsible for that molecule's biological activity" (56). An example is shown in Fig. 6.11. An X-ray structure of CDK2 complexed with the adenine-derived inhibitor H717 (57-59) has been solved. Interactions that are essential to substrate and inhibitor binding to the enzyme will form the pharmacophore that should be captured by inhibitors binding the same way H717 does. As shown in Fig. 6.11, the inhibitor binds to the hinge region (Phe82 and Leu83) through two hydrogen bonds, to a hydrophobic region through the cyclopentyl group, and to Asp145 and Asn132 through hydrogen bonds. The pharmacophore that reflects these interactions has a hydrogen-bond donor and a hydrogen-bond acceptor pair that ensures binding to the hinge region, a hydrophobic group that corresponds to the cyclopentyl binding site, and a hydrogen-bond donor that ensures binding to Asp145 and/or Asn132.

Note that in addition to distances that describe the 3D relationship among pharmacophore points, angles, dihedrals, and exclusion volumes are also used. Each additional restraint can reduce the number of hits, thus making the compound selection easier for testing. Pharmacophore hypotheses for searching can be generated using structural information from active inhibitors, ligands, or from the protein active site itself (60, 61).
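A distance-based pharmacophore query of the kind described above can be sketched as a search for feature points in one conformer whose types and pairwise distances agree with the query within a tolerance. The example below is a minimal sketch under stated assumptions: the feature type names, coordinates, distances, and tolerance are invented for illustration and are not taken from the CDK2 example, and angles, dihedrals, and exclusion volumes are not handled.

```python
from itertools import permutations
from math import dist


def matches(query_types, query_distances, features, tol=0.5):
    """Check one conformer against a distance-based pharmacophore query.

    query_types:     feature types, e.g. ["donor", "acceptor", "hydrophobe"]
    query_distances: {(i, j): distance} between query points i and j
    features:        list of (type, (x, y, z)) found in the conformer
    tol:             allowed deviation per distance (same units as coordinates)
    """
    n = len(query_types)
    for cand in permutations(range(len(features)), n):
        # Every query point must be mapped onto a feature of the same type.
        if any(features[c][0] != t for c, t in zip(cand, query_types)):
            continue
        # Every specified pairwise distance must agree within the tolerance.
        if all(
            abs(dist(features[cand[i]][1], features[cand[j]][1]) - d) <= tol
            for (i, j), d in query_distances.items()
        ):
            return True
    return False


if __name__ == "__main__":
    types = ["donor", "acceptor", "hydrophobe"]
    distances = {(0, 1): 3.0, (0, 2): 4.0, (1, 2): 5.0}  # hypothetical values
    conformer = [
        ("donor", (0.0, 0.0, 0.0)),
        ("acceptor", (3.0, 0.0, 0.0)),
        ("hydrophobe", (0.0, 4.0, 0.0)),
    ]
    print(matches(types, distances, conformer))
```

Adding a further entry to `query_distances` can only remove matches, never add them, which mirrors the point that each additional restraint reduces the number of hits. The brute-force permutation search is adequate for a handful of points; production tools use far more efficient matching.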
for distances between all atoms in the molecule. OMEGA (76) uses a torsion-driven approach for building conformers. It generates low energy conformers for each molecule by assembling it from fragments and searching through possible orientations of the subunit added. WIZARD (77) and COBRA (78), AIMB (79) and MIMUMBA (80) employ artificial intelligence techniques for generating a set of user-specified low energy conformations for a compound. MOLGEO (81) uses a depth-first approach for generating 3D structures based on connectivity using bond length and bond angle tables. IDEALIZE (82) is a molecular mechanics program that minimizes 2D structures to generate the corresponding 3D structure.

2.3.3 2D Pharmacophore Searching. Searching 2D databases is of great importance for [...] for synthesis or analogs of a lead compound [...] a 2D database to identify compounds of [...] a compound is present in the database. [...] searches identify larger molecules [...] the user-defined query, irrespective of the environment in which the query structure occurs (83) (Fig. 6.13). Furthermore, substructure searching can identify all [...] database that share the same [...] can be used for generating structure-activity relationships (SAR), before synthetic plans are made for lead [...]. In contrast, superstructure [...] are used to find smaller molecules that [...] embedded in the query (Fig. 6.14). One problem that arises from substructure searches is that the number of compounds identified can [...] into the thousands. A solution to this problem is ranking the compounds based on similarity to a reference compound. Similarity [...] between compounds in the database and in the query (85, 86). [...] used in similarity searches is provided by Willett et al. (86). Beyond structural similarity, activity similarity has also been the subject of several studies. Xue et al. showed that compounds with similar activity could be identified using mini-fingerprints (87-89), physicochemical property descriptors (90), or latent semantic structure indexing (91, 92). In addition, similarity searches can be combined with superstructure searches for limiting the number of compounds selected. Flexible match searches are used for identifying compounds that differ from the query structure in user-specified ways. In addition, isomer, tautomer, and parent molecule searches may be done to find in a database isomers, tautomers, or parent molecules of the query.

2.3.4 3D Pharmacophores
2.3.4.1 Ligand-Based Pharmacophore Generation. Ligand-based pharmacophores are typically used when the crystallographic, solution structure, or modeled structure of a protein cannot be obtained. When a set of active compounds is known and it is hypothesized that all compounds bind in a similar way to the protein, then common groups should interact with the same protein residues. Thus, a pharmacophore capturing these common features should be able to identify from a database novel compounds that bind to the same site of the protein as the known compounds do. The process of deriving a pharmacophore, called pharmacophore mapping, consists of three steps: (1) identifying common binding elements that are responsible for biological activity; (2) generating potential conformations that active compounds may adopt; and (3) determining the 3D relationship between pharmacophore elements in each conformation generated. To build a pharmacophore based on a set of active compounds, two methods are usually applied. One method is to generate a set of minimum energy conformations for each ligand and search for common structural features. Another method is to consider all possible conformations of each ligand to evaluate shared orientations of common functional groups. Analyzing many low energy conformers of active compounds can suggest a range of the distance between key groups that will take into account the flexibility of the ligands and of the protein. This task can be performed either manually or automatically.
Figure 6.13. Compounds identified from the ACD database through substructure search.
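The idea that a set of low energy conformers suggests a range of distances between key groups can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure of any particular program: each conformer is assumed to be a dictionary mapping a pharmacophore element name to its coordinates, all conformers are assumed to carry the same element names, and the `tolerance` parameter is a hypothetical padding added to each range.

```python
from itertools import combinations
from math import dist


def distance_ranges(conformers, tolerance=0.0):
    """Collect the minimum and maximum distance observed for every pair
    of pharmacophore elements across a list of conformers.

    conformers: list of dicts, element name -> (x, y, z) in angstroms.
    tolerance:  hypothetical padding subtracted from the minimum and
                added to the maximum of each range.
    """
    ranges = {}
    for conformer in conformers:
        for a, b in combinations(sorted(conformer), 2):
            d = dist(conformer[a], conformer[b])
            low, high = ranges.get((a, b), (d, d))
            ranges[(a, b)] = (min(low, d), max(high, d))
    return {pair: (low - tolerance, high + tolerance)
            for pair, (low, high) in ranges.items()}
```

The resulting dictionary of ranges can serve directly as a flexible query: a rigid ligand contributes a single distance per pair, whereas a flexible ligand widens the range.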
Figure 6.14. Compounds identified from the ACD database through superstructure search.

2.3.4.2 Manual Pharmacophore Generation. Manual pharmacophore generation is used when there is an easy way to identify the common features in a set of active compounds and/or there is experimental evidence that some functional groups should be present in the ligand for good activity. An example is the development of a pharmacophore model for dopamine-transporter (DAT) inhibitors (Fig. 6.16). In the first step common structural features were identified in the selected five DAT inhibitors (93-95) (Fig. 6.16, circles). Four out of five compounds were structurally rigid, whereas the 4-hydroxy piperidinol was flexible. A systematic conformational search for 4-hydroxy piperidinol identified 10 possible conformations. Measuring distances among pharmacophore elements in every inhibitor and every conformation considered led to the distance ranges among pharmacophore points shown in Fig. 6.16. Because proteins are flexible, pharmacophores should also have some flexibility built in, thus justifying the use of distance ranges.

2.3.4.3 Automatic Pharmacophore Generation. Pharmacophore generation through conformational analysis and manual alignment is a very time-consuming task, especially when the list of active ligands is large and the elements of the pharmacophore model are not obvious. There are several programs, HipHop (96), HypoGen (97), Disco (98), Gasp (99), Flo (100), APEX (101), and ROCS (102), that can automatically generate potential pharmacophores from a list of known inhibitors. The performance of these programs in automated
pharmacophore generation varies depending on the training set. The use of these programs for pharmacophore generation was recently reviewed in detail (103). Here we focus on common features of these programs. All programs use algorithms that identify common pharmacophore features in the training set molecules; they use scoring functions to rank the identified pharmacophores. The following features are identified in each molecule: hydrogen-bond donors, hydrogen-bond acceptors, negative and positive charge centers, and surface accessible hydrophobic regions that can be aliphatic, aromatic, or nonspecific. Most of the programs consider ligand flexibility when generating pharmacophores because compounds might not bind to the protein in the minimum energy conformation.

2.3.4.4 Receptor-Based Pharmacophore Generation. If the 3D structure of a receptor is known, a pharmacophore model can be derived based on the receptor active site. Biochemical data can be used for identifying key residues that are important for substrate and/or inhibitor binding. This information can be used for building pharmacophores targeting the region defined by key residues or for choosing among pharmacophores generated by an automated program. This can greatly improve the chance of finding small molecules that inhibit the protein because the search is focused on a region of the binding site that is crucial for binding substrates and inhibitors. Ligands bind to proteins through nonbond interactions such as hydrogen bonds and hydrophobic interactions. Programs such as LUDI (104-106) or POCKET (107) can use the structure of the protein to generate interaction sites or grids to characterize favorable positions that ligand atoms should occupy. Four types of interaction sites are characterized: hydrogen-bond donors, hydrogen-bond acceptors, and hydrophobic groups that can be lipophilic-aliphatic or lipophilic-aromatic. LUDI-generated interaction maps for Cerius2 Structure-Based Focusing (108) do not differentiate between aliphatic and aromatic interaction sites. This is based on the observation by Burley and Petsko (109) that, besides aromatic side chains, aliphatic and aromatic side chains also pack closely to form the hydrophobic core of proteins. Because proteins are not rigid, Carlson et al. (110) proposed using molecular dynamics simulation for generating a set of diverse protein conformations to include protein flexibility in the pharmacophore development. In this case distance ranges between pharmacophores are obtained by examining several conformations of the protein. This technique is similar to the one used for the generation of flexible pharmacophores (Fig. 6.16), based on active compounds, when several conformations of the compound and/or many compounds are considered for pharmacophore mapping.

2.3.5 Pharmacophore-Based Virtual Screening. Pharmacophore-based virtual screening is the process of matching atoms and/or functional groups and the geometric relations between them to the pharmacophore in the query. Examples of programs that perform pharmacophore-based searches are 3Dsearch (111), Aladdin (53), UNITY (112), MACCS-3D (113), Catalyst (114), and ROCS (102). There are also web-based applications (115, 116) that can perform pharmacophore searches. Usually pharmacophore-based searches are done in two steps. First, the software checks whether the compound has the atom types and/or functional groups required by the pharmacophore; then it checks whether the spatial arrangement of these elements matches the query. The fastest approach used in the matching step is considering rigid compounds. Because molecules that are not rigid might have a conformation that matches the pharmacophore, flexibility of the ligands should be considered. Flexible 3D searches identify a higher number of hits than rigid searches do (117). However, flexible searches are more time consuming than rigid ones. There are two main approaches for including conformational flexibility into the search: one is to generate a user-defined number of representative conformations for each molecule when the database is created; the other is to generate conformations during the search. By use of the first approach, any rigid search program can be used for doing a flexible search; however, generating the database takes more time and disk space. The second approach gives more flexibility to the user, given that a larger number of conformations can be generated for each
molecule during the search. In this case the database search requires more computer resources; however, this approach will not miss conformations that fit the query but were not stored in the database. Pharmacophore queries that define distance ranges between pharmacophore elements compensate for possible conformational changes in the receptor site upon ligand binding. Also, these flexible pharmacophore queries compensate for the difference between using multiconformer databases and generating conformers during the search. ROCS uses a shape-based superposition for identifying compounds that have similar shape. Grant and Pickup (118) showed that using atomic-centered Gaussians instead of a spherical function can dramatically reduce the time required for a shape alignment of two molecules. This improved routine allows the program to perform shape-based database searches at an acceptable speed (300-400 conformers/s).

There are several methods for generating conformers during in silico screening. Torsion optimization (119) is used for minimizing the root-mean-square (rms) deviation between the constraints from the pharmacophore and the corresponding distances in the compound. The "directed tweak" (120) algorithm also uses torsion optimization for minimizing the sum of the squared deviations between distances in the pharmacophore and the corresponding ones in the compound. ChemDBS-3D (121) generates low energy conformations that can match the pharmacophore using rules similar to those in WIZARD (77). The distance geometry algorithm (122) uses bond length and bond angle information for building a matrix containing upper and lower limits of distances between atoms in the organic compound. These distances can be used for building the conformation that fits the pharmacophore query. The systematic search method (123) is feasible for molecules with few rotatable bonds and thus has limited applicability.

2.4 Structure-Based Virtual Screening
In direct analogy to high throughput screening, docking and scoring techniques can be applied to computationally screen a database of hundreds of thousands of compounds against a specific target protein. Computational methods that predict the 3D structure of a protein-ligand complex are often referred to as molecular docking approaches (Fig. 6.17) (124). Protein structures can be employed to dock ligands into the binding site of the protein and to study their interactions (125). For virtual screening, the crucial task at hand is the fast and reliable ranking of a database of putative protein-ligand complexes according to their binding affinities. Depending on ligand and protein flexibility, sampling depth, and optimizing schemes, docking programs used today (Table 6.2) can facilitate this task within a few minutes or sometimes seconds per processor and molecule. Virtual screening as a computation task can be trivially run using parallel computing because the protein-ligand docking events are completely independent of each other. Although docking was initially developed as a specialist modeling tool run on computer workstations, nowadays inexpensive Linux clusters or distributed computing over networked PCs can be used for virtual screening. This increases the in silico throughput into the realm of 100,000 compounds per day on a Linux cluster, thereby reaching the speed of today's high throughput screens.

Figure 6.17. Crystal structure (PDB entry 1a4q) of the neuraminidase inhibitor zanamivir bound in the active site (213).

Energy functions that evaluate the
binding free energy between protein and ligand sometimes employ rather heuristic terms. Therefore, those functions are more broadly referred to as scoring functions.

2.4.1 Protein Structures. A 3D protein structure of the receptor at atomic resolution is necessary to start a protein-ligand docking experiment. The exponential growth of solved crystal and solution structures in recent years provides a reliable source of protein structures. The Protein Data Bank (PDB) currently holds more than 18,000 protein structures. It should be noted, however, that the chances of a successful virtual screen very much depend on the quality of the available structure. The crystal structure should be well refined; typically a resolution of at least 2.5 Å is considered to be necessary (126). Small changes in structure can drastically alter the outcome of a computational docking experiment (127). Moreover, many receptor sites are flexible; they often undergo conformational changes upon ligand binding. A good example is the Tyr248 movement of carboxypeptidases upon substrate or ligand binding, which has provided the first structural perspective of Koshland's induced-fit hypothesis (128, 129). Proteins have to be studied carefully in every individual case to decide how promising a virtual screen may be.

For many protein drug targets crystal or solution structures are not available. In such cases homology models (130, 131) and pseudoreceptor models (132) are often used. However, unless there is a very high conservation of receptor site residues the use of homology models for virtual screening is much riskier than using solved structures. On the other hand, the PDB contains a wealth of protein
Kuhl (139, 146). Furthermore, several scoring functions are now applied in combination with the DOCK algorithm (147-151).

2.4.2.2 Flexible Ligands. Druglike molecules are typically flexible, with usually up to eight rotatable bonds (34). Energetic differences between alternative ligand conformations are often small compared to the total binding affinity between ligand and target protein. Also, for flexible ligands it is quite common that the bioactive conformations are different from the minimum energy conformations in solution (133). Ligand flexibility is typically handled in docking approaches by combinatorial optimization protocols such as fragmentation, ensembles, genetic algorithms, or simulation techniques.

In fragmentation approaches, the ligand is dissected into pieces that are either rigid or that can be represented by small conformational ensembles. In docking approaches, typically a strategy called incremental construction is used to assemble fragments to whole molecules directly in the receptor site. Usually, the largest rigid moiety of the ligand (sometimes called anchor) is docked first in the receptor site. The remaining fragments are subsequently added in a buildup protocol. After each incremental buildup step, torsion angles are sampled and the growing molecule is minimized.

[...] tion of a ligand in the receptor site. Simulation techniques such as simulated annealing (154) are then applied to find energetically more favorable conformations of the ligand. To speed up the docking process, docking programs such as AutoDock (155) precalculate molecular affinity potentials of the protein on a grid. Molecular dynamics (MD) methods (see, e.g., Refs. 156 and 157) and Monte Carlo simulation techniques (see, e.g., Refs. 158-162) are also frequently used in protein-ligand docking applications.

A variety of other sampling methods are applied in docking programs, including genetic algorithms, distance geometry methods, random searching, hybrid methods, and generalized effective potential methods. Genetic algorithms have been employed in programs such as Gambler (163), AutoDock (155), and GOLD (126). PRO-LEADS uses an alternative search technique called "tabu search" (164). Starting from a random structure, new structures are created by random moves. A tabu list is maintained during the optimization phase and contains the best and the most recently found binding configurations. Configurations that resemble those stored in the tabu list are rejected, unless they are better than the one scoring best. The sampling performance is improved because previously sampled configurations are avoided. Finally, it should be
mentioned that multistep hybrid docking procedures have been developed that combine rapid fragment-based searching with sophisticated MC or MD simulations (165, 166).

2.4.3 Scoring of Protein-Ligand Interactions. The problem of sampling the correct binding geometry (binding mode) of a protein-ligand complex is considered to be solved in many docking programs (167). However, to identify this correct binding mode by its lowest energy or score is a different matter; this is indeed the bottleneck of docking-scoring approaches today. The most important aspect of scoring functions for virtual screening is speed. Therefore, accuracy requirements are low; most functions used do not conceptually describe binding free energies. Therefore, these functions are typically not called energy functions but scoring functions. Three main scoring strategies are typically used in docking applications for virtual screening: force field scoring, empirical scoring, and knowledge-based scoring.

2.4.3.1 Force Field (FF) Scoring. Nonbonded interaction energy terms of standard force fields are typically used in FF scoring (e.g., in vacuo electrostatic terms; sometimes modified by scaling constants that assume the protein to be an electrostatic continuum) and van der Waals (vdW) terms (168-171). DOCK and GREEN (172) use the intermolecular terms of the AMBER energy function (173, 174), with the exception of an explicit hydrogen bonding term (147):

$$E = \sum_{i}\sum_{j}\left(\frac{A_{ij}}{r_{ij}^{12}} - \frac{B_{ij}}{r_{ij}^{6}} + \frac{q_{i}\,q_{j}}{D\,r_{ij}}\right)$$

where each term is summed up over ligand atoms $i$ and protein atoms $j$. $A_{ij}$ and $B_{ij}$ are the vdW repulsion and attraction parameters of the 6-12 potential, $r_{ij}$ is the distance between atoms $i$ and $j$, $q$ is a point charge at each of the atoms, and $D$ is the dielectric constant. Intraligand interactions are added to the score. Up to a 100-fold gain in docking time can be achieved by precomputing these terms on a 3D grid that represents the protein during docking (155, 175). More recently, solvation terms have been added to FF scores. Examples include generalized Born/surface area approaches (176) or atomic solvation parameters (177-179).

2.4.3.2 Empirical Scoring. Empirical scoring functions are multivariate regression methods. They fit coefficients of physically motivated contributions to binding free energy in reproduction of measured binding affinities of a training set of protein-ligand complexes with known 3D structure. As an example, the docking program FlexX (180) uses a scoring function similar to that of Böhm (181, 182). It calculates the sum of free-energy contributions from the number of rotatable bonds in the ligand, hydrogen bonds, ion-pair interactions, hydrophobic and pi-stacking interactions of aromatic groups, and lipophilic interactions:

$$\Delta G = \Delta G_{0} + \Delta G_{rot}\,N_{rot} + \Delta G_{hb}\sum_{\text{neutral H-bonds}} f(\Delta R, \Delta\alpha) + \Delta G_{io}\sum_{\text{ionic int.}} f(\Delta R, \Delta\alpha) + \Delta G_{aro}\sum_{\text{aro int.}} f(\Delta R, \Delta\alpha) + \Delta G_{lipo}\sum_{\text{lipo. cont.}} f^{*}(\Delta R)$$

where $\Delta G_{0}$, $\Delta G_{rot}$, $\Delta G_{hb}$, $\Delta G_{io}$, $\Delta G_{aro}$, and $\Delta G_{lipo}$ are adjustable parameters that are fitted; $f(\Delta R, \Delta\alpha)$ is a scaling function penalizing deviations from the ideal geometry; and $N_{rot}$ is the number of freely rotatable bonds. The interaction of aromatic groups is an addition to Böhm's original force-field design (181, 182). The lipophilic contributions are calculated as a sum of atom-pair contacts in contrast to evaluating a surface grid as in Böhm's scoring function. Böhm's scoring function and its FlexX implementation are being improved and additional terms are being tested (see, e.g., Refs. 182 and 183).

2.4.3.3 Knowledge-Based Scoring. Because the forces that govern protein-ligand interactions are so complex, an implicit approach to capture all relevant terms of protein-ligand
binding seems very attractive. Borrowing from statistical thermodynamics of liquids, mean-field approaches derived solely from structural information have been applied to protein-ligand binding. Protein-ligand atom-pair potentials can be calculated from structural data (e.g., PDB), assuming that observed crystallographic protein-ligand complexes exhibit optimal placement. As an example, a knowledge-based scoring function was derived recently using 697 protein-ligand complexes from the PDB as knowledge base. Using 16 protein and 34 ligand atom types, a total of 282 statistically significant interaction potentials were derived. The final score is the sum over all protein-ligand atom-pair interactions:

$$\text{PMF-score} = \sum_{kl,\; r < r_{cutoff}^{ij}} A_{ij}(r) \tag{6.3}$$

$$A_{ij}(r) = -k_{B}T \ln\left[ f_{vol\text{-}corr}^{j}(r)\,\frac{\rho_{seg}^{ij}(r)}{\rho_{bulk}^{ij}} \right]$$

where $kl$ is a ligand-protein atom pair of type $ij$; $r_{cutoff}^{ij}$ designates the distance at which atom-pair interactions are truncated (6 Å for carbon-carbon interactions and 9 Å otherwise); all $A_{ij}(r)$ are derived with a reference sphere radius of 12 Å (184); $k_{B}$ is the Boltzmann constant; $T$ is the absolute temperature; and $f_{vol\text{-}corr}^{j}(r)$ is a ligand volume correction factor that is introduced because intraligand interactions are not accounted for (185, 186). $\rho_{seg}^{ij}(r)$ designates the number density of atom pairs of type $ij$ at a certain atom-pair distance $r$. $\rho_{bulk}^{ij}$ is the number density of a ligand-protein atom pair of type $ij$ in a reference sphere with radius $R$ (184). For use in docking studies, the PMF score is combined with a vdW term to account for short-range interactions (187, 188). The PMF scoring function was implemented into the DOCK4.0 program. For faster scoring it was also implemented on a grid similar to the force-field score in DOCK. Flexible docking experiments on FK506 binding protein (187), neuraminidase (127), and stromelysin (189) showed high predictive power and robustness of the PMF score. Figure 6.19 shows the predictive power of the scoring function applied to 132 protein-ligand complexes taken from the PDB.

2.4.3.4 Consensus Scoring. Consensus scoring is an approach that combines several scoring functions to find common hits. Such an approach seems desirable because of the lack of robustness of current scoring functions.
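One simple way to picture consensus scoring is as an intersection of the top-ranked compounds from several scoring functions. The Python sketch below is only an illustration of that idea and is not the protocol of any published study: the function name, the `top_n` cutoff, and the convention that a lower score is better are all assumptions made for this example.

```python
def consensus_hits(scores, top_n):
    """Return the compounds ranked within the best `top_n` by every
    scoring function.

    scores: dict, scoring-function name -> {compound id -> score}.
    Assumption for this sketch: a lower (more negative) score is better.
    """
    top_sets = []
    for per_compound in scores.values():
        ranked = sorted(per_compound, key=per_compound.get)  # best first
        top_sets.append(set(ranked[:top_n]))
    return set.intersection(*top_sets) if top_sets else set()
```

A compound that scores well with only one function is dropped, so false positives that are specific to a single scoring function tend to be removed at the cost of possibly losing some true hits.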
Charifson et al. (163) provided a comprehensive consensus scoring study using DOCK and Gambler, in combination with 13 scoring functions: LUDI (104), ChemScore (190, 191), Score (192), PLP (193), Merck force field (194), DOCK energy score (146, 147), DOCK chemical score, Flog (152), strain energy, Poisson Boltzmann (195), buried lipophilic surface area (196), DOCK contact score (144), and volume overlap (197). Three enzymes were used as test proteins: p38 MAP kinase, inosine monophosphate dehydrogenase, and HIV protease. By comparing the performance of single-scoring functions with consensus scoring schemes involving two or three scoring functions, the authors found that false positives (inactive compounds that have high predicted scores) were significantly reduced in the latter case. The authors estimated that a consensus scoring approach would consistently provide hit rates between 5 and 10% (5-10 out of 100 compounds tested to show low μM activity) for enzymes with reasonably buried binding sites. A comparison of the different scoring functions revealed that ChemScore, PLP, and DOCK energy score performed best as single-scoring functions and also in consensus combination. Consensus scoring experiments reported by Bissantz et al. found that docking/consensus scoring performances varied widely among targets (198). Stahl and Rarey suggested that the combinations of FlexX and PLP scores are ideal for consensus scoring for a variety of targets including COX-2, ER, p38 MAP kinase, gyrase, thrombin, gelatinase A, and neuraminidase (199).

2.4.4 Docking as Virtual Screening Tool. A virtual screening protocol is schematically shown in Fig. 6.20. The necessary steps include: protein structure preparation, ligand database preparation, docking calculation, and postprocessing.

The protein has to be prepared only once for a virtual screening experiment unless different protein conformations are considered. The receptor site needs to be determined and charges have to be assigned. The protein structure and the receptor site have to be modeled as accurately as possible. Determining protein surface atoms and site points as well as the assignment of interaction data, such as
Figure 6.20. Flowchart of docking as virtual screening tool in the example of FlexX. [Flowchart labels recoverable from the figure: protein structure; protonation/charge assignment; placement; optimization; consensus scoring; ranking/hit list selection.]

Figure 6.21. Virtual screening filter cascade. [Stage labels recoverable from the figure, in order: virtual library; ADME/tox/druglikeness filters; 2D similarity/dissimilarity; 3D conformations; 3D pharmacophore; docking; scoring; visual inspection; compounds assayed. Library sizes, in order: 10^12, 10^9, 10^7, 10^5, 10^4, 10^3, and 10^1 to 10^2 compounds.]
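The filter cascade of Fig. 6.21 can be pictured as a sequence of increasingly expensive filters, each applied only to the survivors of the previous one. The following Python sketch is a toy illustration of that structure; the compound records, property names, and thresholds are hypothetical and merely stand in for real druglikeness, pharmacophore, and docking steps.

```python
def run_cascade(compounds, stages):
    """Apply (name, keep_function) stages in order.

    Returns the surviving compounds and a list of (stage name, number of
    compounds remaining after that stage), starting with the input size.
    """
    survivors = list(compounds)
    counts = [("input", len(survivors))]
    for name, keep in stages:
        survivors = [c for c in survivors if keep(c)]
        counts.append((name, len(survivors)))
    return survivors, counts


# Hypothetical records: molecular weight, a precomputed pharmacophore
# match flag, and a docking score (lower is better in this sketch).
EXAMPLE_COMPOUNDS = [
    {"id": "c1", "mw": 320, "pharm": True, "dock": -9.1},
    {"id": "c2", "mw": 610, "pharm": True, "dock": -10.0},
    {"id": "c3", "mw": 410, "pharm": False, "dock": -8.0},
    {"id": "c4", "mw": 450, "pharm": True, "dock": -5.2},
]

# Cheap filters come first so that costly steps see fewer compounds.
EXAMPLE_STAGES = [
    ("druglikeness", lambda c: c["mw"] <= 500),
    ("3D pharmacophore", lambda c: c["pharm"]),
    ("docking score", lambda c: c["dock"] <= -7.0),
]
```

Ordering the stages from cheapest to most expensive is the design choice that makes very large virtual libraries tractable: only a small fraction of the input ever reaches docking and scoring.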
mission by taking up dopamine released into model was derived based on two known poten
the synapse. There is no experimental struc- DAT inhibitors R-cocaine and WIN-35065-
ture available for DAT. However, an extensive (Fig. 6.22) (95). The common binding elf
SAR of DAT inhibitors (mostly cocaine ana- ments of these compounds are a ring N thz
logs) is available. DAT is involved in several may be substituted, a carbonyl oxygen, and a
diseases such as drug addiction and attention aromatic ring that can be defined by the pos
deficit disorder (200). For example, ritalin tion of its center (Fig. 6.22). Because bot
[(?)-threo-methylphenidate], a DAT inhibi- compounds have some flexibility, a systemati
tor, is marketed for treating attention deficit conformational search was performed to 01
disorders in children (200, 201). Until re- tain all possible conformations these con
cently, all efforts in synthesizing DAT inhibi- pounds can have when bound to DAT. T
tors were focused on creating analogs around identify structurally diverse conformers, clut
the tropane, piperazine, methylphenidate, tering of the generated conformers was don1
and 2,3-dihydro-5-hydroxy-5H-imidazo[2,1- Measuring distances among chosen pharmz
a]isoindole cores. It was shown that, despite cophore elements in the generated conformel
structural differences, DAT inhibitors share led to distances shown in Fig. 6.22.
one or more common 3D pharmacophore mod- Recently, analysis of several large chemici
els (95, 202, 203). In an effort to identify new databases showed that the NCI database h~
chemical cores for developing DAT inhibitors by far the highest number of unique con
with new pharmacological profiles, a pharma- pounds (204). Thus this database provides
cophore-based 3D database search was pro- large number of unique synthetic compounc
posed (95). For this purpose a pharmacophore and natural products and is an excellent rc
1 Lead compounds
for optimization
identified. In consequence, the strategy for
identifying inhibitors was to first build the
matriptase-HAI-1 Kunitz domain 1 complex,
identify binding regions on matriptase, screen
Figure 6.23. Flowchart showing steps used in lead the NCI 3D database for hits that capture
identification using pharmacophore-based 3D data-
binding groups of HAI-1 to matriptase, and in
base searching.
the end, biochemical testing (Fig. 6.25). The
structure of matriptase was obtained from
source for drug lead discovery. Using the 3D PDB entry lEAW (207). Homology modeling,
pharmacophore from Fig. 6.22, the NCI 3D- as implemented in MODELLER (208, 2091,
database (67) of 206,876 "open compounds" was chosen to build the 3D structure of the
was searched using the program Chem-X Kunitz domain 1 from KSPI. The complex of
(205). The strategy used for identifying leads matriptase with HAI-1 Kunitz domain 1 was
through virtual screening is shown in Fig. built using a combination of manual docking
6.23. During the search each compound was and molecular dynamics refinement with the
first checked as to whether it had the pharma- program CHARMM (210). The obtained bind-
cophore elements and second as to whether it ing mode of HAI-1 Kunitz domain 1 to
had any acceptable conformation matching matriptase (Fig. 6.26) suggests that three re-
the distance requirements. Up to 3 million gions might be important for inhibitor bind-
conformations were examined for each com- ing. The S1binding site Asp185, which is char-
pound. A total of 4094 compounds, 2% of the acteristic of trypsinlike serine proteases, is the
database, were identified as "hits." This num- specificity pocket used to recognize substrates
ber was further reduced using filters such as with Arg or Lys as P 1 residue. The anionic
molecular weight, structural novelty, simplic- site, defined by Asp96, Asp60.A7and AspGO.B,
ity, diversity, and hydrogen-bond acceptor ni- is the site at which Arg258 from HAI-1 binds.
trogen. Seventy compounds were selected for A hydrophobic region defined by Ile41 and
testing in biochemical assays. Forty-four com- Tyr6O.G might also be important for specific-
pounds displayed more than 20% inhibition at ity of future matriptase inhibitors.
10 pA4 in the [3H]mazindol binding assay, Thus, the active site used for in silico
from which three compounds were chosen for .screening with the program DOCK consti-
deriving an SAR (Fig. 6.24). These results sug- tutes all three binding regions. Energy scoring
gested that the 3D pharmacophore-based da- was used for ranking docked compounds. The
tabase search is an efficient tool for identifying top 2000 compounds were considered for se-
novel DAT inhibitors. lecting potential inhibitors. Given that
Virtual Screening
Figure 6.24. Selected DAT inhibitors identified from the NCI database.
matriptase prefers positively charged residues in the P1 position, inhibitors should also have positively charged groups to bind efficiently to Asp185 from the S1 site of matriptase (Fig. 6.27). Note that a more efficient way of doing the virtual screening presented above is to do a pharmacophore search first followed by docking. Thus, 69 compounds were selected for biochemical testing at 75 µM inhibitor and matriptase concentration. Initial screening showed that 50% of compounds tested produced more than 70% inhibition of enzymatic activity when the ratio was one inhibitor molecule for one protein molecule (Table 6.4). It should be noted that screening results at single dose and IC50 depend on the protein concentration, whereas Ki is concentration independent. From the hits in the screening step, bis-benzamidines were chosen for Ki determination (Table 6.5) because this class of compounds could bind to both the S1 site and the anionic site. These results show that combining a pharmacophore hypothesis with a structure-based database search can provide an efficient way of identifying leads for a drug design project.
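The remark that single-dose results and IC50 depend on protein concentration while Ki does not can be illustrated with a minimal sketch. It assumes simple reversible 1:1 binding with no substrate competition, for which the bound fraction follows the Morrison quadratic and IC50 = Ki + [E]/2. The function names are hypothetical, and the 0.924 µM value is only borrowed as an example from the Ki column of Table 6.5.

```python
import math


def fraction_inhibited(e_total, i_total, ki):
    """Fraction of enzyme present as the EI complex at equilibrium.

    Assumes reversible 1:1 binding (Morrison quadratic). All arguments
    must use the same concentration unit, e.g. micromolar.
    """
    s = e_total + i_total + ki
    ei = (s - math.sqrt(s * s - 4.0 * e_total * i_total)) / 2.0
    return ei / e_total


def ic50(e_total, ki):
    """Total inhibitor concentration giving 50% inhibition under the
    same assumption: IC50 = Ki + [E]/2."""
    return ki + e_total / 2.0


if __name__ == "__main__":
    ki = 0.924  # micromolar (924 nM), used here purely as an example
    for e_total in (0.001, 1.0, 75.0):
        # Same Ki, different protein concentration: IC50 and the
        # inhibition seen at a 1:1 inhibitor:protein ratio both change.
        print(e_total, ic50(e_total, ki),
              fraction_inhibited(e_total, e_total, ki))
```

Under these assumptions, at 75 µM protein the IC50 can never fall below 37.5 µM however potent the inhibitor is, which is why Ki is the better quantity for ranking hits.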
Figure 5.5. GRID probes on Factor Xa site and the combined resultant complementary site points that can be used for pharmacophore fingerprint calculations (lower right).
Figure 5.6. Overview of the Gridding and Partitioning (GaP) procedure as applied to monomers, exemplified using phenylalanine as a potential primary amine. This molecule thus contains two pharmacophoric groups (the aromatic ring and the carboxylic acid). During the conformational analysis the locations of these pharmacophoric groups are tracked within a regular grid. [Reproduced from A. R. Leach and M. M. Hann, Drug Discovery Today, 5, 326-336 (2000), with permission of Elsevier Science.]
[Plot residue: scatter panels with axes labeled "BCUT metric 1" (range 0 to 10); legend fragment: "possible products (blue: selected without regard for economy ...".]
Figure 5.13. (a) A virtual library of 634,721 allowed combinatorial AB products (after filtering out products that failed Lipinski's Rule of 5 "druglike" criteria) shown in a BCUT chemistry space specifically chosen to best represent the diversity of the virtual library. (b) The maximally diverse 9600-compound subset of the virtual library, illustrating the results of purely product-based "library design." Although providing the maximal diversity, synthesis of these 9600 AB products would require the use of 347 A's and 1024 B's, which is clearly unacceptable from the perspective of synthetic economy (number of reactants and robotic control). (c) The 9600-compound library resulting from the traditional, purely reactant-based library design strategy of selecting the 80 most diverse A's and the 120 most diverse B's. Although providing user-selected synthetic economy, the diversity of these 9600 AB products is clearly quite poor. (d) The 9600-compound library resulting from the reactant-biased, product-based (RBPB) algorithm developed by Pearlman and Smith (see Refs. 31, 87c and text). The algorithm selected a different set of 80 A's and a different set of 120 B's, thus providing the same level of user-selected synthetic economy, while also providing substantially greater diversity than could be achieved using a purely reactant-based library design strategy.
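The "synthetic economy" argument in this caption is a counting argument, sketched below. The helper names are hypothetical and the integer reactant labels are placeholders. The sketch does not implement the RBPB algorithm or any diversity measure; it only counts how many distinct reactants a chosen set of AB products requires.

```python
from itertools import product


def full_combinatorial(a_reactants, b_reactants):
    """Every AB product obtainable from the chosen A and B reactants."""
    return set(product(a_reactants, b_reactants))


def reactants_needed(products):
    """Number of distinct A and distinct B reactants used by a set of
    (a, b) products."""
    return len({a for a, _ in products}), len({b for _, b in products})


if __name__ == "__main__":
    # Reactant-based design: 80 A's x 120 B's give the full 9600-member array.
    library = full_combinatorial(range(80), range(120))
    print(len(library), reactants_needed(library))

    # Product-based picking ignores reactant reuse; in the extreme case
    # every product needs its own pair of reactants.
    scattered = {(i, i) for i in range(100)}
    print(len(scattered), reactants_needed(scattered))
```

A full array of 80 A's and 120 B's yields exactly 9600 products, whereas a product-based selection of the same size may draw on far more reactants, as the 347 A's and 1024 B's in panel (b) show.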
Figure 11.6. Three density maps at differing resolutions: a, 1.3 Å; b, 2.1 Å; c, 3.0 Å.
Figure 14.6. Examples of macromolecules studied by cryo-EM and 3D image reconstruction and the resulting 3D structures (bottom row) after cryo-EM analysis. All micrographs (top row) are displayed at about 170,000X magnification and all models at about 1,200,000X magnification. (a) A single particle without symmetry: The micrograph shows 70S E. coli ribosomes complexed with mRNA and fMet-tRNA. The surface-shaded density map, made by averaging 73,000 ribosome images from 287 micrographs, has a resolution (FSC) of 11.5 Å. The 50S and 30S subunits and the tRNA are colored blue, yellow, and green, respectively. The identity of many of the subunits is known, as some RNA double helices are clearly recognizable by their major and minor grooves (e.g., helix 44 is shown in red). [Courtesy of J. Frank (SUNY, Albany), using data from Gabashvili et al. (86).] (b) A single particle with symmetry: The micrograph shows hepatitis B virus cores. The 3D reconstruction, at a resolution of 7.4 Å (DPR), was computed from 6384 particle images taken from 34 micrographs. [From Böttcher et al. (44).] (c) A helical filament: The micrograph shows actin filaments decorated with myosin S1 heads containing the essential light chain. The 3D reconstruction, at a resolution of 30-35 Å, is a composite in which the differently colored parts are derived from a series of difference maps that were superimposed on F-actin. The components include: F-actin (blue), myosin heavy chain motor domain (orange), essential light chain (purple), regulatory light chain (white), tropomyosin (green), and myosin motor domain N-terminal beta-barrel (red). [Courtesy of A. Lin, M. Whittaker, and R. Milligan (Scripps Research Institute, La Jolla, CA).] (d) A 2D crystal, light-harvesting complex LHCII at 3.4 Å resolution. The model shows the protein backbone and the arrangement of chromophores in a number of trimeric subunits in the crystal lattice. In this example, image contrast is too low to see any hint of the structure without image processing (see also Fig. 14.3). [Courtesy of W. Kühlbrandt (Max-Planck-Institute for Biophysics, Frankfurt, Germany).]
ity needs to be included in high-throughput docking. Scoring functions have to improve to make consistently correct predictions of putative protein-ligand binding affinities. Scoring functions, calibrated to reproduce experimental data, have unreliable performance outside their training set. Thus, de novo methods using terms describing the thermodynamics of binding should replace the first generation of scoring functions. In consequence, some of the speed gained from low cost parallel computing should be invested into higher accuracy scoring rather than higher throughput. One way of increasing throughput is to keep the number of compounds docked as small as possible by using every bit of knowledge one has to prefilter the database, mainly based on pharmacophore information. In some cases this is easy. For instance, the necessity of having certain features like salt bridges formed on ligand binding [e.g., in influenza virus neuraminidase (211)] or other prevalent information (e.g., hinge region binding for many ATP competitive kinase inhibitors) greatly helps to reduce the number of compounds subjected to docking experiments.

The missing robustness of many structure-based docking/scoring techniques raises the question of when one should apply them and when one should retreat to pharmacophore-based virtual screening. In many cases it makes sense to prescreen virtual libraries using pharmacophore techniques. Particularly if one uses shape representations of the receptor site, such as volume-exclusion spheres, a pharmacophore search can be a very effective prefilter. Also, in cases where receptor-site flexibility is problematic, pharmacophore searching may be less restrictive (unless one tries to deal with protein flexibility in the docking routine, a task that is not easy, usually not applied today, and another future direction of development in virtual screening).

Table 6.4 Results Obtained from the Initial Screening of Compounds against Matriptase (206)^a

% Inhibition          Number of Compounds
Over 95%              15
90-94%                4
70-89%                15
40-69%                13
Below 39%             17
High absorbency       3
Increased activity    3

^a Testing was done at 75 µM compound and protein concentration. The ratio between compound and protein molar concentration was 1:1.

[Table 6.5: structures and Ki (nM) of the bis-benzamidine inhibitors. The structure drawings are not recoverable from the source; the Ki values that survive are 924, 535, >10,000, and 208.]

The above tools and pathways show a simple and inexpensive way of discovering novel lead chemical matter for drug discovery programs. However, there are many hurdles to overcome to make virtual screening successful. The properties of druglikeness may not be understood well enough, resulting in poor pharmacokinetics of the compounds; existing SARs that lead to the generation of pharmacophore models may bias the pharmacophore toward a narrow segment of compounds; structural information of the target protein is often not available; and homology models may not be precise enough. Current scoring functions are often not robust enough to separate actives from inactives. Compounds identified may not be easy to synthesize. Hits may not be selective or patentable.

On one hand, there are obviously many risks involved in virtual screening, many assumptions made, and a positive outcome not at all guaranteed in each and every case. On the other hand, however, the overall process is extremely cost effective and fast. Even if successful in only a few cases, virtual screening can produce leads that may otherwise not have surfaced and so add immense value to a drug discovery program. Especially in cases
where high throughput screening cannot identify a viable lead chemical matter, virtual screening applied to vendor databases or combinatorial libraries to be synthesized presents a cost-effective alternative. Mainly because of its speed, cost effectiveness, ease of setup, and increasing robustness, we expect virtual screening to become a mainstream approach throughout the pharmaceutical industry.

5 ACKNOWLEDGMENTS

The authors thank Dr. Matthias Rarey for valuable discussions.

REFERENCES

1. D. J. Abraham, Intra-Science Chem. Rept., 8, 1 (1974).
2. M. Perutz, Protein Structure. New Approaches to Disease and Therapy, Freeman, New York, 1992.
3. S. W. Fesik, J. Med. Chem., 34, 2937 (1991).
4. D. W. Cushman, H. S. Cheung, E. F. Sabo, and M. A. Ondetti, Biochemistry, 16, 5484 (1977).
5. L. F. Kuyper, B. Roth, D. P. Baccanari, R. Ferone, C. R. Bedell, J. N. Champness, D. K. Stammers, J. G. Dann, F. E. Norrington, D. J. Blaker, and P. J. Goodford, J. Med. Chem., 25, 1120 (1982).
6. M. A. Gallop, R. W. Barrett, W. J. Dower, S. P. A. Fodor, and E. M. Gordon, J. Med. Chem., 37, 1233 (1994).
7. E. M. Gordon, R. W. Barrett, W. J. Dower, S. P. A. Fodor, and M. A. Gallop, J. Med. Chem., 37, 1385 (1994).
8. M. W. Lutz, J. A. Menius, T. D. Choi, R. G. Laskody, P. L. Domanico, A. S. Goetz, and D. L. Saussy, Drug Discovery Today, 1, 277 (1996).
9. H. J. Böhm and G. Klebe, Angew. Chem. Int. Ed. Engl., 35, 2589 (1996).
10. W. P. Walters, M. T. Stahl, and M. A. Murcko, Drug Discovery Today, 3, 160 (1998).
11. Y. C. Martin, Perspect. Drug Discovery Des., 7/8, 159 (1997).
12. A. C. Good and J. S. Mason in K. B. Lipkowitz and D. B. Boyd, Eds., Reviews in Computational Chemistry, Vol. 7, VCH, New York, 1995, p. 67.
13. G. Klebe, Virtual Screening: An Alternative or Complement to High Throughput Screening?, Kluwer/Escom, Leiden, 2000.
14. H. J. Böhm and G. Schneider, Virtual Screening for Bioactive Molecules, Wiley-VCH, Weinheim, 2000.
15. A. C. Good, S. R. Krystek, and J. S. Mason, Drug Discovery Today, 5 (Suppl.), S61 (2000).
16. T. Langer and R. D. Hoffmann, Curr. Pharm. Design, 7, 509 (2001).
17. B. Waszkowycz, T. D. J. Perkins, R. A. Sykes, and J. Li, IBM Systems J., 40, 360 (2001).
18. A. Good, Curr. Opin. Drug Discovery Dev., 4, 301 (2001).
19. R. F. Burns, R. M. A. Simmons, J. J. Howbert, D. C. Waters, P. G. Threlkeld, and B. D. Gitter, Exploiting Molecular Diversity, Symposium Proceedings, Vol. 2, Cambridge Healthtech Institute, San Diego, 1995, p. 2.
20. W. P. Walters, Ajay, and M. A. Murcko, Curr. Opin. Chem. Biol., 3, 384 (1999).
21. W. P. Walters and M. A. Murcko, Methods Principles Med. Chem., 10, 15 (2000).
22. D. E. Clark and S. D. Pickett, Drug Discovery Today, 5, 49 (2000).
23. B. L. Podlogar, I. Muegge, and L. J. Brice, Curr. Opin. Drug Discovery Dev., 4, 102 (2001).
24. I. Muegge, Chem. Eur. J., 8, 1976 (2002).
25. Ajay, W. P. Walters, and M. A. Murcko, J. Med. Chem., 41, 3314 (1998).
26. Ajay, G. W. Bemis, and M. A. Murcko, J. Med. Chem., 42, 4942 (1999).
27. J. Sadowski and H. Kubinyi, J. Med. Chem., 41, 3325 (1998).
28. M. Wagener and V. J. vanGeerestein, J. Chem. Inf. Comput. Sci., 40, 280 (2000).
29. V. J. Gillet, P. Willett, and J. Bradshaw, J. Chem. Inf. Comput. Sci., 38, 165 (1998).
30. Comprehensive Medicinal Chemistry is available from MDL Information Systems Inc., San Leandro, CA 94577 and contains drugs already on the market.
31. World Drug Index is available from Derwent Information, London, UK. Website: www.derwent.com.
32. MACCS-II Drug Data Report is available from MDL Information Systems Inc., San Leandro, CA 94577 and contains biologically active compounds in the early stages of drug development.
33. C. A. Lipinski, F. Lombardo, B. W. Dominy, and P. J. Feeney, Adv. Drug Delivery Rev., 23, 3 (1997).
34. T. I. Oprea, J. Comput.-Aided Mol. Des., 14, 251 (2000).
35. J. Kelder, P. D. J. Grootenhuis, D. M. Bayada, L. P. C. Delbressine, and J. P. Ploemen, Pharm. Res., 16, 1514 (1999).
36. M. Hann, B. Hudson, X. Lewell, R. Lifely, L. Miller, and N. Ramsden, J. Chem. Inf. Comput. Sci., 39, 897 (1999).
37. National Toxicology Program. http://ntp-server.niehs.nih.gov.
38. RTECS C2(96-4); National Institute for Occupational Safety and Health (NIOSH), U.S. Department of Health and Human Services: Washington, DC, 1996. URL: http://www.ccohs.ca.
39. G. Klopman and H. S. Rosenkranz, Mutat. Res., 305, 33 (1994).
40. K. Enslein, V. K. Gombar, and B. W. Blake, Mutat. Res., 305, 47 (1994).
41. Lhasa Ltd., School of Chemistry, University of Leeds, Leeds, UK. URL: http://www.chem.leeds.ac.uk/LUK/derek/index.html.
42. Available Chemicals Directory is available from MDL Information Systems Inc., San Leandro, CA 94577 and contains specialty bulk chemicals from commercial sources. Website: http://www.mdli.com.
43. V. N. Viswanadhan, A. K. Ghose, G. R. Revankar, and R. K. Robins, J. Chem. Inf. Comput. Sci., 29, 163 (1989).
44. G. W. Bemis and M. A. Murcko, J. Med. Chem., 39, 2887 (1996).
45. G. W. Bemis and M. A. Murcko, J. Med. Chem., 42, 5095 (1999).
46. X. Q. Lewell, D. B. Judd, S. P. Watson, and M. M. Hann, J. Chem. Inf. Comput. Sci., 38, 511 (1998).
47. I. Muegge, D. Brittelli, and S. L. Heald, J. Med. Chem., 44, 1841 (2001).
48. B. L. Podlogar and I. Muegge, Curr. Top. Med. Chem., 1, 257 (2001).
49. R. M. Brunne, G. Hessler, and I. Muegge in K. C. Nicolaou, R. Hanko, and W. Hartwig, Eds., Handbook of Combinatorial Chemistry, Vol. 2, Wiley-VCH, Weinheim, 2002, p. 761.
50. B. E. Evans, K. E. Rittle, M. G. Bock, R. M. DiPardo, R. M. Freidinger, W. L. Whitter, G. F. Lundell, D. F. Veber, P. S. Anderson, R. S. L. Chang, V. J. Lotti, D. J. Cerino, T. B. Chen, P. J. Kling, K. A. Kunkel, J. P. Springer, and J. Hirshfield, J. Med. Chem., 31, 2235 (1988).
51. J. S. Mason, I. Morize, P. R. Menard, D. L. Cheney, C. Hulme, and R. F. Labaudiniere, J. Med. Chem., 42, 3251 (1999).
52. P. J. Hajduk, M. Bures, J. Praestgaard, and S. W. Fesik, J. Med. Chem., 43, 3443 (2000).
53. J. H. Van Drie, D. Weininger, and Y. C. Martin, J. Comput.-Aided Mol. Des., 3, 225 (1989).
54. E. Fischer, Ber. Dtsch. Chem. Ges., 27, 2985 (1894).
55. P. Ehrlich, Ber. Dtsch. Chem. Ges., 42, 17 (1909).
56. P. Gund, Prog. Mol. Subcell. Biol., 5, 117 (1977).
57. F. Bernstein, T. F. Koetzle, G. J. B. Williams, E. F. Meyer, Jr., M. D. Brice, J. R. Rodgers, O. Kennard, T. Schimanouchi, and M. J. Tasumi, J. Mol. Biol., 112, 535 (1977).
58. H. M. Berman, J. Westbrook, Z. Feng, G. Gilliland, T. N. Bhat, H. Weissig, I. N. Shindyalov, and P. E. Bourne, Nucleic Acids Res., 28, 235 (2000).
59. M. K. Dreyer, D. R. Borcherding, J. A. Dumont, N. P. Peet, J. T. Tsay, P. S. Wright, A. J. Bitonti, J. Shen, and S.-H. Kim, J. Med. Chem., 44, 524 (2001).
60. G. R. Marshall, C. D. Barry, H. E. Bosshard, R. A. Dammkoehler, and D. A. Dunn in E. C. Olson and R. E. Christoffersen, Eds., Computer Assisted Drug Design, American Chemical Society, Washington, 1979, p. 205.
61. J. R. Sufrin, D. A. Dunn, and G. R. Marshall, Mol. Pharmacol., 19, 307 (1981).
62. R. D. Cramer, D. E. Patterson, R. D. Clark, F. Soltanshahi, and M. S. Lawless, J. Chem. Inf. Comput. Sci., 38, 1010 (1998).
63. D. Horvath in A. K. Ghose and V. N. Viswanadhan, Eds., Combinatorial Library Design and Evaluation. Principles, Software Tools, and Applications in Drug Discovery, Marcel Dekker, New York, 2001, p. 429.
64. A. Dalby, J. G. Nourse, W. D. Hounshell, A. K. I. Gushurst, D. L. Grier, B. A. Leland, and J. Laufer, J. Chem. Inf. Comput. Sci., 32, 244 (1992).
65. Spresi Chemical Database, InfoChem GMBH, Grobenzell, Germany and Daylight Chemical Information Systems, Irvine, CA (2002).
66. The Chemical Abstracts Database, Chemical Abstracts Service, 2540 Olentangy River Road, PO Box 3012, Columbus, OH (2002).
67. G. W. A. Milne, M. C. Nicklaus, J. S. Driscoll, S. L. Wang, and D. W. Zaharevitz, J. Chem. Inf. Comput. Sci., 34, 1219 (1994).
68. G. W. A. Milne and J. A. Miller, J. Chem. Inf. Comput. Sci., 26, 154 (1986).
69. D. Weininger, J. Chem. Inf. Comput. Sci., 28, 31 (1988).
70. D. Weininger, A. Weininger, and J. L. Weininger, J. Chem. Inf. Comput. Sci., 29, 97 (1989).
154. S. Kirkpatrick, C. D. J. Gelatt, and M. P. Vecchi, Science, 220, 671 (1983).
155. D. S. Goodsell and A. J. Olson, Proteins: Struct., Funct., Genet., 8, 195 (1990).
156. T. P. Lybrand in D. B. Boyd and K. B. Lipkowitz, Eds., Reviews in Computational Chemistry, Vol. 1, VCH, New York, 1990, p. 295.
157. J. A. Given and M. K. Gilson, Proteins: Struct., Funct., Genet., 33, 475 (1998).
158. T. N. Hart and R. J. Read, Proteins: Struct., Funct., Genet., 13, 206 (1992).
159. C. McMartin and R. S. Bohacek, J. Comput.-Aided Mol. Des., 11, 333 (1997).
160. A. Wallqvist and D. G. Covell, Proteins: Struct., Funct., Genet., 25, 403 (1996).
161. R. Abagyan, M. Totrov, and D. Kuznetsov, J. Comput. Chem., 15, 488 (1994).
162. J. Apostolakis, A. Plückthun, and A. Caflisch, J. Comput. Chem., 19, 21 (1998).
163. P. S. Charifson, J. J. Corkery, M. A. Murcko, and W. P. Walters, J. Med. Chem., 42, 5100 (1999).
164. C. A. Baxter, C. W. Murray, D. E. Clark, D. R. Westhead, and M. D. Eldridge, Proteins: Struct., Funct., Genet., 33, 367 (1998).
165. J. Wang, P. A. Kollman, and I. D. Kuntz, Proteins: Struct., Funct., Genet., 36, 1 (1999).
166. D. Hoffmann, B. Kramer, T. Washio, T. Steinmetzer, M. Rarey, and T. Lengauer, J. Med. Chem., 42, 4422 (1999).
167. J. S. Dixon, Proteins: Struct., Funct., Genet., Suppl., 1, 198 (1997).
168. P. D. J. Grootenhuis and P. J. M. vanGalen, Acta Crystallogr. D, 51, 560 (1995).
169. M. K. Holloway, J. M. Wai, T. A. Halgren, P. M. D. Fitzgerald, J. P. Vacca, B. D. Dorsey, R. B. Levin, W. J. Thompson, L. J. Chen, S. J. deSolms, N. Gaffin, A. K. Ghosh, E. A. Giuliani, S. L. Graham, J. P. Guare, R. W. Hungate, T. A. Lyle, W. M. Sanders, T. J. Tucker, M. Wiggins, C. M. Wiscount, O. W. Woltersdorf, S. D. Young, P. L. Darke, and J. A. Zugay, J. Med. Chem., 38, 305 (1995).
170. M. K. Holloway, Perspect. Drug Discovery Des., 9/10/11, 63 (1998).
171. N. S. Blom and J. Sygusch, Proteins: Struct., Funct., Genet., 27, 493 (1997).
172. N. Tomioka and A. Itai, J. Comput.-Aided Mol. Des., 8, 347 (1994).
173. S. J. Weiner, P. A. Kollman, D. A. Case, U. C. Singh, C. Ghio, G. Alagona, S. Profeta, Jr., and P. Weiner, J. Am. Chem. Soc., 106, 765 (1984).
174. S. J. Weiner, P. A. Kollman, D. T. Nguyen, and D. A. Case, J. Comput. Chem., 7, 230 (1986).
175. P. J. Goodford, J. Med. Chem., 28, 849 (1985).
176. D. Qiu, P. S. Shenkin, E. P. Hollinger, and W. C. Still, J. Phys. Chem., 101, 3005 (1997).
177. D. Eisenberg and A. D. McLachlan, Nature, 319, 199 (1986).
178. P. F. W. Stouten, C. Frömmel, H. Nakamura, and C. Sander, Mol. Simul., 10, 97 (1993).
179. S. Vajda, Z. Weng, R. Rosenfeld, and C. DeLisi, Biochemistry, 33, 13977 (1994).
180. M. Rarey, B. Kramer, T. Lengauer, and G. Klebe, J. Mol. Biol., 261, 470 (1996).
181. H.-J. Böhm, J. Comput.-Aided Mol. Des., 8, 243 (1994).
182. H. J. Böhm, J. Comput.-Aided Mol. Des., 12, 309 (1998).
183. B. Kramer, G. Metz, M. Rarey, and T. Lengauer, Med. Chem. Res., 9, 463 (1999).
184. I. Muegge, Perspect. Drug Discovery Des., 20, 99 (2000).
185. I. Muegge and Y. C. Martin, J. Med. Chem., 42, 791 (1999).
186. I. Muegge, J. Comput. Chem., 22, 418 (2001).
187. I. Muegge, Y. C. Martin, P. J. Hajduk, and S. W. Fesik, J. Med. Chem., 42, 2498 (1999).
188. I. Muegge and B. Podlogar, Quant. Struct.-Act. Relat., 20, 215 (2001).
189. S. Ha, R. Andreani, A. Robbins, and I. Muegge, J. Comput.-Aided Mol. Des., 14, 435 (2000).
190. M. D. Eldridge, C. W. Murray, T. R. Auton, G. V. Paolini, and R. P. Mee, J. Comput.-Aided Mol. Des., 11, 425 (1997).
191. C. W. Murray, T. R. Auton, and M. D. Eldridge, J. Comput.-Aided Mol. Des., 12, 503 (1998).
192. R. X. Wang, L. Liu, L. H. Lai, and Y. Q. Tang, J. Mol. Model., 4, 379 (1998).
193. D. K. Gehlhaar, G. M. Verkhivker, P. A. Rejto, C. J. Sherman, D. B. Fogel, L. J. Fogel, and S. T. Freer, Chem. Biol., 2, 317 (1995).
194. T. A. Halgren, J. Comput. Chem., 17, 520 (1996).
195. B. Honig and A. Nicholls, Science, 268, 1144 (1995).
196. D. R. Flower, J. Mol. Graphics Modell., 15, 238 (1998).
197. T. R. Stouch and P. C. Jurs, J. Chem. Inf. Comput. Sci., 26, 4 (1986).
198. C. Bissantz, G. Folkers, and D. Rognan, J. Med. Chem., 43, 4759 (2000).
199. M. Stahl and M. Rarey, J. Med. Chem., 44, 1035 (2001).
MARTIN STAHL
HANS-JOACHIM BÖHM
Discovery Technologies
F. Hoffmann-La Roche AG
Basel, Switzerland
Contents
1 Introduction
2 General Concepts and Physical Background
2.1 Protein-Ligand Interactions and the Physical Basis of Biomolecular Recognition
2.2 Docking, Scoring, and Virtual Screening: The Basic Concepts
3 Docking
3.1 General Concepts to Address the Docking Problem
3.1.1 Representation of the Macromolecular Receptor
3.1.2 Ligand Handling
3.1.3 Strategies for Searching the Configuration and Conformation Space
3.1.3.1 Geometric/Combinatorial Search Strategies
3.1.3.2 Energy Driven/Stochastic Procedures
3.2 Special Aspects of Docking
3.2.1 Protein Flexibility
3.2.2 Water Molecules
3.2.3 Assessment of Docking Methods
3.2.4 Docking and QSAR
3.2.5 Docking and Homology Modeling
4 Scoring Functions
4.1 Description of Scoring Functions for Protein-Ligand Interactions

Burger's Medicinal Chemistry and Drug Discovery, Sixth Edition, Volume 1: Drug Discovery. Edited by Donald J. Abraham. © 2003 John Wiley & Sons, Inc.
Docking and Scoring Functions/Virtual Screening
often applied a posteriori to rationalize and understand the binding and structure-activity relationships in a series of inhibitors and to assist in the manual design of individual compounds. Guided by the creativity of the designer, a novel putative ligand was constructed using computer graphics. Molecular mechanics calculations were performed on the produced protein-ligand complex to assess the properties of the generated ligand in terms of a geometric and energetic analysis. A ligand was assumed to bind with high affinity if satisfactory complementarity in shape and surface properties between the protein and the ligand could be detected.

It has been realized, however, that the design of a single, synthetically accessible, active compound is a larger challenge than anticipated. Many phenomena of molecular recognition are not yet fully understood, nor are current modeling tools able to reflect and accordingly predict them with sufficient reliability. Most important, a fast and accurate computational prediction of binding affinities for new inhibitor candidates is still difficult to obtain. Although the existing tools certainly do not allow the medicinal chemist to design the one perfect ligand, they can help to enrich sets of molecules with more active ones, even though the known deficiencies of the methods can still lead to significant rates of both false positives and false negatives. A more moderate goal of current molecular design is thus to improve the hit rates of molecules suggested for biological assaying compared to a mere random compound selection [...]. This implies that structure-based design approaches now focus on the processing of large numbers of molecules, arranged in so-called virtual libraries. These can be composed of either existing chemical substances (such as, for example, compound collections of a pharmaceutical company) or hypothetical new molecules that could be synthesized by combinatorial chemistry. The task is then to filter these large libraries by eliminating the majority of molecules that are unlikely to bind and by prioritizing the remaining ones. As recent experience shows, this strategy can be successful: several publications have reported quite impressive enrichments of active compounds (11-16).

The change of focus from single molecule to compound library design in modern structure-based drug discovery is also a consequence of major technological advances that have dramatically enhanced the data throughput in a variety of fields:

1. Progress in gene technology, protein chemistry, and structure determination techniques has resulted in a tremendous increase in protein structure information. The number of publicly available 3D protein structures continues to grow exponentially, with further acceleration expected from the current initiatives of structural genomics (17). As a consequence, more and more design projects are based on structural information, and structure-based ligand design has become routine at all major pharmaceutical companies. On the other hand, the growing amount of structural knowledge also calls for automated methods that make this new wealth of data accessible and available.
2. Automation and miniaturization have led to the development of high-throughput screening (HTS), which is now a well-established process for large-scale biological testing. Libraries of several hundred thousand compounds are routinely screened against new targets, frequently on a time scale of less than 1 month.
3. [...] have significantly changed with the introduction of combinatorial and parallel chemistry techniques. The trend continues to move away from the synthesis of individual compounds toward the generation of compound libraries, whose members are accessible through the same type of chemical reaction but different building reagents.
4. Massive data processing and computational tasks formerly requiring expensive supercomputers have become generally feasible by advances in PC cluster com- [...]
screening (Section 2.2). Subsequently, the current approaches to the docking problem are presented (Section 3.1), focusing on the search methods (Section 3.1.3) and the approaches used to represent protein and ligand structures in an efficient way (Sections 3.1.1-3.1.2). In addition, a number of special aspects are discussed (Section 3.2), including, for example, the issues of protein flexibility (Section 3.2.1) or the consideration of water molecules in the context of docking (Section 3.2.2). This is followed by a section on scoring functions used for docking. Three major classes of scoring functions are presented (Section 4.1) and subjected to critical assessment (Section 4.2). A final section is dedicated to virtual screening, illustrating general strategies (Section 5), special problems (Sections 5.1-5.5), and representative applications (Section 5.6).

Although the goal of this chapter is to highlight the most important aspects of docking in [...] fore docking or to hits obtained from virtual screening on an early stage. This aspect of pre- or postprocessing is discussed only briefly.

2 GENERAL CONCEPTS AND PHYSICAL BACKGROUND

2.1 Protein-Ligand Interactions and the Physical Basis of Biomolecular Recognition

The selective binding of a small-molecule ligand to a specific protein is determined by structural and energetic factors. For ligands of pharmaceutical interest, protein-ligand binding usually occurs through noncovalent interactions. The physical basis of noncovalent interactions is generally well established through the theories of electromagnetic forces or, on a more fundamental level, of quantum mechanics. For macromolecules, liquid systems, or solutions, however, direct application
of these first principles is significantly complicated by the size and complexity of the systems, in which a large number of fluctuating particles simultaneously interact and influence each other. Principles from classical mechanics and heuristic models are therefore frequently used as an approximation to describe protein-ligand interactions in aqueous solution.

The primary forces acting between a protein and a ligand are all of electrostatic nature. It is the interaction between explicit charges, dipoles, induced dipoles, and higher electric multipoles that leads to phenomena that are commonly referred to as salt bridges, hydrogen bonds, or van der Waals interactions. In simplified classifications, it is only the charge-charge interaction that is called electrostatic. This interaction between two charges is of long range and considerable strength. In vacuum or uniform media it can be described by Coulomb's law. In aqueous solutions of biomolecules, however, its application is complicated because of the presence of a large number of water molecules. Unless a sufficiently large number of water molecules is explicitly included in the calculations [as usually only tractable in computationally expensive molecular dynamics simulations (31)], the correct treatment of electrostatic interactions in solution requires solving the Poisson-Boltzmann equation, where the solvent is considered as a continuous medium of high dielectric constant surrounding a low-dielectric solute (32).

Electrostatic interactions, however, do not only occur between charge monopoles. In a comprehensive treatment of electrostatics one has to consider a full power series, and therefore interactions between higher electric moments, such as dipoles and quadrupoles, also play an essential role. Their interaction energies are orientation dependent and become shorter in range with increasing electric moment. For example, in contrast to the 1/r dependency in Coulomb's law, the energy of the interaction between a charge and a dipole decays with 1/r^2, and that of the interaction between two dipoles with 1/r^3. This, however, is valid only for a fixed orientation of the dipoles. If they are mobile, as in isotropic media (liquids), the dipole-dipole interaction is thermally averaged and an average interaction proportional to 1/r^6 results. This 1/r^6 dependency is also encountered in interactions that arise between induced electric moments, such as the dispersion interaction based on London forces. The attractive interactions between (induced) electric multipoles are generally summarized in the term van der Waals interactions. Accordingly, van der Waals forces are weak, attractive, short-range forces that decay with 1/r^6. These are normally described by intermolecular interaction potentials such as the Lennard-Jones potential:

$$E(r) = \frac{A}{r^{12}} - \frac{B}{r^{6}}$$

where A and B are parameters depending on the type of the interacting atoms. The r^12 term reflects the short-range repulsive forces attributed to unfavorable spatial overlap of electron clouds at short distances.

An interaction deserving special attention is that of hydrogen bonds (33, 34). In principle, their origin is of the same nature as the interactions mentioned above. A hydrogen bond is defined as the interaction of an electronegative atom (the hydrogen-bond acceptor) with a hydrogen atom covalently bonded to an electronegative atom (the hydrogen-bond donor). The major component of a hydrogen bond is the electrostatic interaction of the donor-hydrogen dipole with the negative partial charge of the acceptor. The special characteristics originate from the fact that the hydrogen atom is very small and can bear a considerable positive partial charge, such that the acceptor can contact the hydrogen atom at a shorter distance than expected from the van der Waals radii. Hydrogen bonds are directed interactions showing a high angular dependency. This directionality arises from the anisotropic charge distribution around the acceptor atom (lone pairs) and the fact that the electron shells of donor and acceptor atom start to overlap at these short distances unless the ideal geometry is maintained. Hydrogen bonds are attributed an important role with respect to specificity of the protein-ligand interaction. This is based on their directionality and the fact that they require a well-defined complementarity in the complex (mutual arrangement of hydrogen-bond donors and acceptors). However, the importance of hydrogen bonds should not be overemphasized because it is the balance between hydrogen bonds and other forces in protein-ligand complexes that must be appropriately considered (35).

Weakly polar interactions in proteins and protein-ligand complexes are frequently phenomenologically analyzed and classified in terms of the interacting partners (36). This especially includes interactions with π-systems, such as the NH-π, OH-π, or CH-π interaction (37, 38), aromatic-aromatic interactions (parallel π-π stacking versus edge-to-face interaction), and the cation-π interaction (39). All of these can mostly be rationalized in terms of electrostatic interactions outlined above; that is, they involve interactions be-

[...] binding constant Ki) is generally used to describe the stability of complex formation:

$$K_i = \frac{[P][L]}{[PL]}$$

From the experimentally measured equilibrium constant the binding affinity can be calculated as

$$\Delta G = RT \ln K_i$$

where R is the gas constant (8.314 J/(mol K)) and T is the temperature [the equilibrium constant would actually have to be related to a standard concentration to become a dimensionless quantity, but in general this is not explicitly considered (44, 45)]. Experimentally determined binding constants Ki (Kd) are typ-
tween monopoles, dipoles, and quadrupoles ically in the range of lop2 to 10-l2 M, corre-
(permanent and induced). A more distinct sponding to a Gibbs free energy of binding of
character can be attributed to metal complex- roughly -10 to -70 kJ/mol(1,2).
ation, which can play a significant role in indi- According to the Gibbs-Helmholtz equa-
vidual cases of protein-ligand interactions, as tion, the free energy of binding consists of an
for example in metalloenzymes (2,40,41). enthalpic and an entropic contribution:
Finally, so-called hydrophobic or lipophilic
interactions are often mentioned as additional
contribution to protein-ligand interactions.
These terms are used to describe the preferen- The enthalpy and entropy of binding can be
tial association of nonpolar groups in aqueous determined experimentally, as, for example,
by isothermal titration calorimetry (46,471.
solution. It should be emphasized, however,
These data, however, are still sparse and not
that in contrast to what the name suggests,
always easy to interpret (48, 49). Substan-
there is no special hydrophobic force. Instead,
tial compensation between enthalpic and en-
one should speak of a hydrophobic effect. As tropic contributions is observed (50-52);
further mentioned below, according to the this phenomenon and its interpretations
generally accepted view, it arises primarily have recently been critically reexamined
from the entropically favorable replacement (53). Interestingly, the data also show that
and release of water molecules (42, 43). The binding can be both enthalpy-driven (e.g.,
association between the nonpolar surfaces it- streptavidin-biotin, AG = -76.5 kJ/mol, AH
self is simply based on weak London forces = -134 kJ/mol) or entropy-driven (e.g.,
(36). streptavidin-HABA, AG = -22.0 kJ/mol, AH
Thermodynamically, the strength of the in- = + 7.1 kJ/mol) (54). However, because of
teraction between a protein and a ligand is strong temperature dependencies, even this
described by the binding affinity or (Gibbs) partitioning is a question of the temperature
free energy of binding. Assuming a simple used for measuring.
equilibrium reaction of the form What are the major contributions to the en-
thalpy and entropy of binding? Direct interac-
tions between the protein and the ligand are
obviously very important for the enthalpy of
between a protein P and ligand L to give the binding. Besides that, an essential factor is
complex PL, the dissociation constant K, (or that protein-ligand interactions occur in
Figure 7.1. Overview of the receptor-ligand binding process. All species involved are solvated by
water (symbolized by gray spheres). The binding free energy difference between the bound and
unbound state is a sum of enthalpic components (breaking and formation of hydrogen bonds, forma-
tion of specific hydrophobic contacts) and entropic components (release of water from hydrophobic
surfaces to solvent, loss of conformational mobility of receptor and ligand).
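The bookkeeping summarized in the caption of Figure 7.1 can be made concrete with a few lines of arithmetic. The following is a minimal sketch, not a method taken from the chapter: it assumes the standard relations $\Delta G = RT \ln K_d$ (with $K_d$ taken relative to a 1 M standard concentration) and $\Delta G = \Delta H - T\Delta S$, and a temperature of 298.15 K, which is an assumption and not a value given in the text. The function names are hypothetical. The streptavidin values are the ones quoted in this section (54).

```python
import math

R = 8.314  # gas constant in J/(mol K), as quoted in the text


def delta_g_from_kd(kd_molar, temperature_k=298.15):
    """Free energy of binding in kJ/mol from a dissociation constant.

    Uses dG = R * T * ln(Kd), with Kd expressed relative to a 1 M
    standard concentration so that the logarithm is dimensionless.
    The default temperature is an assumption, not a value from the text.
    """
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return R * temperature_k * math.log(kd_molar) / 1000.0


def entropic_term(delta_g_kj, delta_h_kj):
    """Return -T*dS in kJ/mol, obtained from dG = dH - T*dS."""
    return delta_g_kj - delta_h_kj


def driving_force(delta_g_kj, delta_h_kj):
    """Label a complex by its more favorable (more negative) contribution."""
    minus_t_ds = entropic_term(delta_g_kj, delta_h_kj)
    return "enthalpy-driven" if delta_h_kj < minus_t_ds else "entropy-driven"


if __name__ == "__main__":
    # The two ends of the typical Kd range mentioned in the text.
    for kd in (1e-2, 1e-12):
        print(f"Kd = {kd:.0e} M -> dG = {delta_g_from_kd(kd):.1f} kJ/mol")
    # dG and dH values (kJ/mol) as quoted in the text (54).
    for name, dg, dh in (("streptavidin-biotin", -76.5, -134.0),
                         ("streptavidin-HABA", -22.0, 7.1)):
        print(name, f"-TdS = {entropic_term(dg, dh):+.1f} kJ/mol,",
              driving_force(dg, dh))
```

Under these assumptions, $K_d$ values of $10^{-2}$ M and $10^{-12}$ M give about -11.4 and -68.5 kJ/mol, consistent with the range of roughly -10 to -70 kJ/mol stated in the text. For streptavidin-biotin the entropic term $-T\Delta S$ comes out at +57.5 kJ/mol (unfavorable), whereas for streptavidin-HABA it is -29.1 kJ/mol (favorable).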
aqueous solution (cf. Fig. 7.1). The unbound reaction partners are solvated and partial desolvation is required before complex formation can occur. This is important for the enthalpy balance because the net energy gain upon complexation can only be the difference between the direct protein-ligand interaction enthalpy and the desolvation enthalpies of the two molecules. In this context, the hydrophobic effect has to be considered again. Upon the formation of lipophilic contacts between apolar parts of the protein and the ligand, unfavorably ordered water molecules are replaced and released. This leads to an entropy gain that is attributed to the fact that the water molecules are no longer positionally confined. In addition, there is an enthalpic contribution: water molecules occupying lipophilic binding sites are unable to form hydrogen bonds with the protein, but after release they can form strong hydrogen bonds with bulk water. Because the removal of hydrophobic surfaces from contact with water leads to negative changes in the heat capacity ($\Delta C_p$), the buried hydrophobic surface area has frequently been correlated with $\Delta C_p$ values measured upon ligand binding. This, however, may be an oversimplification, neglecting other potential contributions to $\Delta C_p$ (55). As further noted by Tame, enthalpy-entropy compensation and the temperature dependency of $\Delta H$ and $T\Delta S$ (which are both directly related to $\Delta C_p$) make it ultimately impossible to consider polar or apolar contributions as purely enthalpic or entropic, respectively (56).

Entropically unfavorable contributions arise from the loss of translational and rotational degrees of freedom upon complexation, whereas a small gain in entropy can result from low-frequency concerted vibrations in the complex. A more important factor to consider in an actual design process is conformational flexibility. Upon binding, internal degrees of freedom are frozen, the ligand loses a considerable amount of its flexibility, and usually binds in one single orientation. This is also the explanation why rigid analogs of flexible ligands show higher affinity, as, for example, observed for cyclic derivatives of ligands that adopt the same binding mode as the open-chain derivative (57, 58). Accordingly, higher affinity also results if the protein-bound ligand conformation is already preorganized in solution.

From a variety of experiments, quantitative estimates for some of the mentioned energetic contributions to protein-ligand binding could be derived. Based on data from protein mutants, the contribution of individual hydrogen bonds to the binding affinity has been estimated to be 5 ± 2.5 kJ/mol (59-62). This is similar to what has been obtained for the contribution of an intramolecular hydrogen bond to protein stability (63, 64). The consistency of values derived from different proteins suggests some degree of additivity in the hydrogen-bonding interactions. The accurate description of the interplay with water molecules remains, however, a most challenging task. The contribution of hydrogen bonds to the overall affinity strongly depends on local solvation and desolvation effects and can sometimes be very small or even adverse to binding, as illustrated by the comparison of ligand pairs differing by just one hydrogen bond (65). Charge-assisted hydrogen bonds are stronger than neutral ones, but also associated with a higher desolvation penalty. Thus, the electrostatic interaction of an exposed salt bridge contributes as much as a neutral hydrogen bond (5 ± 1 kJ/mol according to Ref. 66), but the same interaction in the interior of a protein can be significantly stronger (67). Because of the complicated interplay with water, a detailed analysis of the thermodynamics of hydrogen bond formation can sometimes yield surprising results. For a particular hydrogen bond in complexes of the FK506-binding protein, it has been found that its formation is enthalpically unfavorable but entropically favorable (60). The entropy gain appears to be attributable mainly to the replacement of two water molecules (68).

Contributions from hydrophobic interactions have frequently been found to be proportional to the lipophilic surface area buried from solvent, with values in the range of 80-200 J/(mol Å²) (69-71). The entropic penalty for freezing a single rotatable bond has been estimated to be 1.6-3.6 kJ/mol at 300 K (72, 73); recent estimates derived from NMR shift titrations are much lower (0.5 kJ/mol) (74), but in the systems studied the conformational restriction may not have been as high as in a protein binding site. Finally, the unfavorable entropy contribution from the loss of translational and orientational degrees of freedom has been estimated to be around 10 kJ/mol (75, 76).

Despite many inconsistencies and difficulties in interpretation, most of the experimental data suggest that simple additive models of protein-ligand interactions might be a reasonable starting point for the development of methods to predict binding affinities, that is, for the derivation of empirical scoring functions. Still, it has to be kept in mind that the assumption of additivity in biochemical phenomena is not strictly valid (77). On the other hand, the large body of experimental data on 3D structures of protein-ligand complexes and binding affinities allows one to derive some general characteristics about protein-ligand interactions. Several features are commonly found in complexes of tightly binding ligands:

1. A high steric complementarity between the protein and the ligand, an observation often described as the lock-and-key paradigm (78,
79). This complementarity, however, is frequently not the result of a match between rigid bodies, but rather achieved through significant conformational changes of both binding partners, a phenomenon generally referred to as induced fit. Additionally, electrostatic complementarity can also be induced, for example, by strong $\mathrm{p}K_a$ shifts upon ligand binding that result in the release or uptake of protons of different functional groups either of the protein or the ligand.
2. A high complementarity of the surface properties. Lipophilic parts of the ligands are generally in contact with lipophilic parts of the protein, whereas polar groups are usually paired with suitable polar protein groups to form hydrogen bonds or ionic interactions.
3. An energetically favorable conformation of the bound ligand. Significant conformational strain is usually not observed in ligands binding with high affinity.

In addition to insights taken from high-affinity complexes, experimental information about weakly bound complexes could be equally instructive. Such information has indeed been recognized to be vital for the development of scoring functions (80). Structural data on unfavorable protein-ligand interactions, however, are sparse, partly because structures of weakly binding ligands are more difficult to obtain and are usually considered less interesting by many structural biologists. What can be concluded from the available data is that an imperfect steric fit at the lipophilic part of the protein-ligand interface leads to reduced binding affinity and that unpaired buried polar groups at the protein-ligand interface are strongly adverse to binding. Few buried CO and NH groups in folded proteins fail to form hydrogen bonds (81). Therefore, in the ligand design process an important prerequisite to be regarded is that polar functional groups, either of the protein or the ligand, will find suitable counterparts if they become buried on ligand binding.

2.2 Docking, Scoring, and Virtual Screening: The Basic Concepts

The subject of docking is the formation of noncovalent protein-ligand complexes. Given the structures of a ligand and a protein, the task is to predict the structure of the resulting complex. This is the so-called docking problem. Because the native geometry of the complex can generally be assumed to reflect the global minimum of the binding free energy, docking is actually an energy-optimization problem (82), concerned with the search of the lowest free energy binding mode of a ligand within a protein binding site. The macromolecular nature of the protein and the fact that binding occurs in aqueous solution complicate the problem significantly because of the high dimensionality of the configuration space and considerable complexity of the energetics governing the interaction. Accordingly, heuristic approximations are frequently required to render the problem tractable within a reasonable time frame. The development of docking methods is therefore also concerned with making the right assumptions and finding acceptable simplifications that still provide a sufficiently accurate and predictive model for protein-ligand interactions.

Regardless of the nature of the interacting partners, computational docking always requires two components, which may briefly be characterized as "searching" and "scoring" (83). "Searching" refers to the fact that any docking method has to explore the configuration space accessible for the interaction between the two molecules. The goal of this exploration is to find the orientation and conformation of the interacting molecules corresponding to the global minimum of the free energy of binding. Unless the degrees of freedom are restricted to translation and rotation by treating both molecules as rigid bodies, a full systematic search of all "dockings" is normally not feasible because of the huge number of potential solutions and the large amount of computational resources needed to evaluate them. Different strategies are therefore required, which should be accurate and efficient: accurate in the sense that the optimization procedure should not miss any valuable solution (near-global minima), and efficient in terms of computing time and with respect to the fact that the algorithm should not spend unnecessary time by exploring irrelevant regions or by rediscovering previously detected local minima. As will be elaborated in the next
section, there are two opposing approaches to simplify the docking problem: either by reformulating it as a discrete problem that can be solved with combinatorial algorithms or by using stochastic search algorithms.

"Scoring" refers to the fact that any docking procedure must evaluate and rank the configurations generated by the search process. The scoring scheme most closely related to experiment, the ab initio calculation of the free energy of binding, is not easily accessible to computation. Hence, approximate scoring functions must be used that model the binding free energy with sufficient accuracy and correlate well with experimental binding affinities. In particular, the scoring function should be able to discriminate between native and nonnative binding modes.

Scoring is actually composed of three different aspects relevant to docking and design:

1. Ranking of the configurations generated by the docking search for one ligand interacting with a given protein; this aspect is essential to detect the binding mode best approximating the experimentally observed situation.
2. Ranking different ligands with respect to the binding to one protein, that is, prioritizing ligands according to their affinity; this aspect is essential in virtual screening.
3. Ranking one or different ligands with respect to their binding affinity to different proteins; this aspect is essential for the consideration of selectivity and specificity.

If one were able to accurately calculate the free energy of binding, all three aspects would be satisfied simultaneously. Current scoring functions used in docking programs, however, can usually resolve satisfactorily only the first aspect. They provide only a rough estimate with respect to the comparison across different ligand or protein systems. This is the case whenever the scoring scheme neglects certain factors that are virtually constant for different binding modes with respect to one protein, but that matter for comparisons with other proteins.

Following the general paradigm shift in structure-based design from single compounds to compound libraries, state-of-the-art docking and scoring methods have to be sufficiently fast to be applied for virtual screening. The general strategy of a virtual screening process based on the 3D structure of a target typically involves the following steps:

1. Analysis of the 3D protein structure.
2. Selection of key interactions that need to be satisfied by all candidate molecules.
3. Computational search in chemical databases for compounds that potentially satisfy the key interactions, fit into the binding site, and form additional interactions with the protein; this is done by means of docking and/or structure-based pharmacophore searches.
4. Postprocessing by analyzing the retrieved hits and removing undesirable compounds.
5. Synthesis or ordering of the selected compounds.
6. Biological testing, eventually crystallographic confirmation.

All these steps will be discussed in some more detail in a section below. Of primary interest in the context of this chapter is step 3. It requires high-throughput docking with efficient search algorithms, and scoring functions that are able to provide a good separation between potentially "binding" and "nonbinding" ligands. The database or library that is screened should consist of a sufficiently large and diverse set of relevant compounds. Thus, library design is increasingly applied to ensure that only reasonably preselected compounds are docked (29, 84, 85).

3 DOCKING

In this section, approaches to the docking problem are presented with respect to the docking algorithm and the search aspect. Scoring is discussed separately in Section 4. It should be noted in this context that although a specific docking method is frequently associated with a certain scoring procedure, many docking methods could in principle be combined with a variety of different scoring functions, either for postprocessing of the results
or as objective function during the optimization. Actually, such strategies are followed by considering multiple scoring schemes to achieve "consensus scoring" (86) or "multidimensional scoring" (87). The emphasis in this section is on general characteristics and principles, rather than individual methods, although occasionally specific docking programs have been selected as representative examples for a more detailed illustration of a general concept. The interested reader is referred to Table 7.1 for an overview of currently used docking programs described in the literature. In addition, a valuable source of information is the corpus of regularly published reviews in the field of docking (18, 19, 26, 27, 83, 88-95).

3.1 General Concepts to Address the Docking Problem

Essential for any docking method is a search algorithm that samples the configuration space of two interacting molecules. These molecules need to be represented in a way that is suitable for efficient handling by the search algorithm. Docking methods may therefore roughly be classified by the way the macromolecular receptor is represented (Section 3.1.1), by the handling of the ligand (Section 3.1.2), and, most important, by the search algorithm itself (Section 3.1.3).

3.1.1 Representation of the Macromolecular Receptor. The most straightforward approach for representing the macromolecular structure in a docking application would be by atomic coordinates of the entire protein. A full atomic representation, however, is generally impractical because of the size and complexity of protein structures. The structural information therefore needs to be reduced to a manageable yet representative size and form. A first step into this direction is to limit the search area to the region surrounding the putative binding site. This is general practice in protein-ligand docking (whereas in protein-protein docking often the entire surfaces are searched for appropriate matches). Scanning the entire surface for potential binding regions of a small-molecule ligand would hardly be feasible with most docking methods. Furthermore, it would be rather unreasonable to ignore information already available from biochemical experiments or structural data of related complexes. If no such information is available [a situation that we may increasingly be facing as a consequence of the effects of the structural genomics initiatives (17, 96)], methods to identify binding sites are required before the actual docking process can start. Examples are programs for geometric cavity detection, such as LIGSITE (97) or PASS (98), tools to infer protein function from structural homologies (99, 100), or more sophisticated approaches based on a physicochemical and geometrical characterization of binding sites (101). Some docking programs incorporate routines for binding site identification as preprocessing steps (102).

Despite a reduction to only a specified part of the protein surface, a simple representation in terms of atomic coordinates is not practical for most docking procedures. Instead, the space available for ligand binding is frequently characterized by other means that permit more efficient searches. A first alternative is given by geometric shape descriptors, sometimes combined with a physicochemical description. Approaches of this class include molecular surface cubes (103), surface normals at sparse critical points (104), and modified Lee-Richards dotted surfaces, with each dot coded by chemical property and accessibility (105). A further prominent example is the sphere images of the binding site used in DOCK (106, 107). These spheres are complementary to the molecular surface and represent a space-filling negative image of the binding site. Another important concept that goes beyond a pure geometric description and represents interaction properties of physicochemical relevance is the usage of interaction sites or points, as introduced by the program LUDI (108, 109). These interaction sites are discrete positions and vectors in space serving as dummy representations for atoms capable of forming hydrogen bonds or filling hydrophobic pockets. The docking tool FlexX is based on this concept (110). Also, the program SLIDE (111) and the new approach by Diller and Merz (112) use interaction points for fast docking.

A popular alternative to geometric or physicochemical descriptors is the grid representation of protein structures. The general principle of this approach is that the protein is represented by a set of affinity grids or maps that cover the entire search region. These regularly spaced, orthogonal grids are calculated before the actual docking process. At every grid point, some sort of scoring value or interaction energy of a probe atom with the entire
protein is calculated, providing a map of pseudo-affinities for each atom type or interaction type possibly present in the ligands to be docked. These maps then serve as look-up tables for the calculation of the interaction energy or scoring value during the docking process. Examples of docking programs using this approach are AutoDock (113-115), ICM (82, 116, 117), or ProDock (118, 119).

It should be noted that most of the mentioned representations of protein structure imply that the protein remains rigid during the docking process. As a matter of fact, docking under the assumption of a rigid protein is still common practice in standard applications. Although an acceptable simplification under certain circumstances, it can represent a serious limitation if only unbound protein structures are available. As a consequence, the inclusion of protein flexibility in the docking process is an active area of research, and a separate section is dedicated to this issue (cf. Section 3.2.1).

3.1.2 Ligand Handling. For the ligand, a complete representation in atomic coordinates is perfectly feasible. Ligand atoms may be used directly for matching with binding site descriptors or in the calculation of interaction energies in the case of energy-driven procedures. The central problem is conformational flexibility. Predicting the binding conformation of a ligand is in fact a major component of the docking problem, given that this conformation can significantly differ from that adopted in other environments.

Two general strategies for ligand handling may be distinguished: whole-molecule approaches and fragment-based methods. In the first case, the ligand is docked as an entire molecule. This is rather straightforward if the ligand is treated as a rigid body and only translational and rotational degrees of freedom are considered. Such rigid docking was common practice in early docking algorithms (106, 120). A straightforward extension to account for flexibility is to separately dock precalculated conformers of a given molecule (variant 1 in Fig. 7.2). Explicit docking of multiple conformers has, for example, been obtained with the FLOG program (121). FLOG deals with conformational flexibility by generating different conformers using distance geometry and docking each conformer in a rigid-body fashion. A similar approach has also been obtained with the DOCK program (122). To avoid redundancy in the docking, a common rigid fragment is identified, which is docked only once for the entire set of pregenerated conformers. The flexible portions of the molecule that determine the different conformations are subsequently scored based on the preplacement of the rigid fragment. Yet other examples for rigid docking of multiple conformers are provided by the programs FRED from OpenEye Scientific Software, which performs a fast exhaustive search over all possible orientations (123), and SYSDOC (124) or EUDOC (125), which use fast affine transformation to perform systematic searches over the translational and rotational degrees of freedom of the ligand.

Although this multi-conformer docking can be efficient and accurate for molecules with a limited number of discrete, low-energy conformations, it is less suited for larger and highly flexible molecules, simply because the number of possible conformations increases dramatically. Another way of partially accounting for conformational flexibility in whole-molecule rigid-body docking is to subject the initial matches to some kind of optimization that allows for conformational relaxation. This could be done with some standard energy minimization technique (126, 127) or other procedures that resolve clashes of the initial placement by rotation about single bonds, as done, for example, in the docking program SLIDE (111).

A more rigorous treatment of ligand flexibility in whole-molecule docking is performed by sampling ligand conformation space during docking (variant 2 in Fig. 7.2). It normally requires ligand conformational energies to be evaluated besides intermolecular interaction energy. Molecular mechanics force fields are frequently applied for this purpose. Although a more exhaustive sampling of accessible conformations within the binding site is definitely achieved, an obvious disadvantage is the higher computational demand and possibly a reduced efficiency of the algorithm because of lengthy exploration of local minima.

[Figure 7.2, panel 2: Simultaneous optimization of orientation and conformation (simulated annealing, GA).]

An interesting variant of whole-molecule representations is the use of internal coordinates instead of Cartesian coordinates (82). Internal coordinates help to reduce the number of variables defining the conformation of the molecular system. In Cartesian space, three functionally equivalent variables per atom are required. Internal coordinates, instead, consist of bond lengths, bond angles, and torsion angles. Because bond lengths and angles can be considered rigid to a good approximation, only the torsion angles matter as variables to map conformation space. An efficient implementation of docking algorithms operating on internal coordinates has been obtained, for example, with the ICM method (82, 116, 117).

Fragment-based techniques are an alternative to whole-molecule docking (variant 3 in Fig. 7.2). Here, the molecule is dissected into fragments that can be docked individually in a rigid fashion. The fragments can either be docked separately and then reconnected, or the ligand is built up incrementally following a certain fragmentation scheme. The first variant is very common to programs dedicated to de novo design rather than pure docking. Methods of this class have been reviewed extensively (26, 27). However, the approach has also been applied for docking (128) and compared to the whole-molecule docking approach (129).

The other variant of fragment-based ligand docking is used in incremental construction algorithms (110, 130), sometimes also referred to as "anchor and grow" (131). These search strategies are further described below. They dissect the ligand into modular portions and rebuild it incrementally within the binding site starting from the docking position of a suitable base fragment. The advantage is that many potential combinations are eliminated early in the construction, but success critically depends on the selection and placement of the base fragment.

3.1.3 Strategies for Searching the Configuration and Conformation Space. Search strategies of automated docking procedures may roughly be classified as geometric or combinatorial on the one hand and energy driven or stochastic on the other, although ultimately
all methods try to optimize a function that models to some extent the free energy of binding.

3.1.3.1 Geometric/Combinatorial Search Strategies. Most of the early docking methods were entirely based on the concept of shape complementarity. Until today this is the fundamental idea in most protein-protein docking programs. The observation that protein-ligand complexes frequently show a remarkable shape fit of both binding partners has stimulated the conception of surface or descriptor matching as docking search technique. The molecules are represented by geometric and/or physicochemical descriptors and various alignment procedures are applied to match complementary parts of ligand and protein. An example is the original DOCK method, where the ligand is superimposed onto a negative sphere image of the binding pocket, using a distance matching algorithm followed by least-squares fitting (106, 132). Other examples are the least-squares fitting procedure described by Bacon and Moult to achieve matches between complementary surface patterns (133), or the hierarchical search of geometrically compatible triplets of surface normals on the molecules to be docked, as proposed by Wallqvist and Covell (134). The program ADAM performs a complete combinatorial search over all possible matches between hydrogen bond patterns (135). Recently, a new matching algorithm based on so-called quadratic shape descriptors has been described (QSDock); along with the presentation of their method, the authors also provide an extensive discussion of shape-based docking algorithms.

Another recent example of descriptor matching is SLIDE, developed as a tool for ligand database screening by docking (111). The binding site is represented by a template of favorable interaction points onto which ligand atoms are matched during the search. Instead of serving as a purely geometric description, these points address four different types of interactions (hydrogen-bond donor, acceptor, donor/acceptor, or hydrophobic interaction center). The search is then performed such that all triangles of appropriate atoms in the ligand are exhaustively mapped onto triangles of template points with compatible geometry and chemistry. This mapping is used to generate initial placements of molecules in the binding site and followed by a series of steps that refine the initial position, resolve collisions, and consider flexibility of both the ligand and the protein side chains (cf. note on hybrid approaches below). Similarly, the rapid docking approach for library prioritization developed by Diller and Merz (112) is based on rigid-body triplet matching of ligand atoms onto precalculated hot spots; subsequently, pruning is performed to remove any positions with significant steric clash, and the remaining matches are subjected to energy minimization.

Pure descriptor matching is efficient for rigid-body docking only. Flexible docking, in fact, is always faced with the additional problem of a combinatorial explosion of possible conformers depending on the number of rotatable bonds. Systematic searches or explicit consideration of each possible conformation would therefore require enormous computing resources. A popular way to address this problem within the class of geometric/combinatorial docking methods is incremental construction (110, 130, 131, 137). The ligand is dissected into fragments and incrementally reconstructed in the binding site starting from a suitably docked base fragment. To avoid dead-end solutions during construction, multiple placements of the base fragment have to be considered. In addition, it can be useful to perform different fragmentations and hence to use different base fragments as starting points, especially for long and highly flexible molecules. The docking itself, that is, the placement of the base fragment and the attachment of remaining portions, is guided by some descriptor matching procedure.

An example of an incremental construction method is the program FlexX (110, 130, 138, 139). Conformational flexibility is considered using a discrete set of preferred torsion angles about acyclic single bonds, together with multiple conformations for ring systems. These torsion angle preferences are taken from a library compiled from torsional fragments extracted from the Cambridge Structural Database (140). The model of molecular interactions is based on similar rules as implemented in LUDI, originating from a composite crystal-
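As a rough illustration of the triangle matching described above for SLIDE, the following sketch enumerates ligand atom triplets and template point triplets whose interaction labels agree and whose three side lengths agree within a tolerance. It is only a minimal sketch under stated assumptions: the function name, the label strings, the tolerance value, and the rule that equal labels mean chemical compatibility are illustrative choices, not a description of how the actual program is implemented.

```python
from itertools import combinations, permutations
from math import dist

PAIRS = ((0, 1), (0, 2), (1, 2))  # the three sides of a triangle


def triangle_matches(ligand_atoms, template_points, tol=0.5):
    """Return (ligand triple, template triple) index pairs that are compatible.

    Each atom or point is a (label, (x, y, z)) tuple.
    """
    matches = []
    for lig in combinations(range(len(ligand_atoms)), 3):
        for tpl in permutations(range(len(template_points)), 3):
            # Chemistry: labels must agree vertex by vertex (assumed rule).
            same_labels = all(ligand_atoms[i][0] == template_points[j][0]
                              for i, j in zip(lig, tpl))
            if not same_labels:
                continue
            # Geometry: corresponding side lengths must agree within tol.
            same_shape = all(
                abs(dist(ligand_atoms[lig[a]][1], ligand_atoms[lig[b]][1])
                    - dist(template_points[tpl[a]][1], template_points[tpl[b]][1])) <= tol
                for a, b in PAIRS)
            if same_shape:
                matches.append((lig, tpl))
    return matches


# Hypothetical toy data: a 3-4-5 triangle and a translated copy in the template.
ligand = [("donor", (0, 0, 0)), ("acceptor", (3, 0, 0)), ("hydrophobic", (0, 4, 0))]
template = [("hydrophobic", (10, 4, 0)), ("donor", (10, 0, 0)),
            ("acceptor", (13, 0, 0)), ("donor", (20, 20, 20))]
print(triangle_matches(ligand, template))
```

Each match would then serve as the starting point for a rigid-body superposition; the exhaustive double loop also shows why such searches are usually restricted to a limited number of template points.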
Docking and Scoring Functions/Virtual Screening
field analysis (141). For each group capable of forming an interaction, a special contact geometry is defined: the group is placed to a center about which an interaction surface is defined, usually as part of a sphere. Two groups form an interaction if the interaction center of one group coincides with the interaction surface of a counter group. To start with the actual docking process, the ligand is fragmented into components by dissecting at all single bonds that are not part of a cycle. Out of these components suitable base fragments are selected. The base fragment is the first portion of the ligand to be placed into the binding site. This is done by superimposing either triples or pairs of interaction centers constructed around the base fragment with triples or pairs of compatible interaction points generated in the binding region. Normally, a large number of initial placements is generated, which is then reduced either by clustering similar solutions or because of clashes with the protein. Next, the incremental construction of the entire ligand is initiated. Starting with the different base placements, the ligand is built up by stepwise linking of the components in compliance with the torsional database. After hooking up additional fragments, new interactions are searched and a scoring function is used to select the best partial solutions, which are expanded in the following step. This is done until the last fragment has been added and placed to result in the complete ligand. The generated ligand positions are finally stored and ranked according to the predicted binding affinity.

An anchor-and-grow algorithm has recently also been incorporated into DOCK (131). Here, after identification of rotatable bonds, the ligand is fragmented into rigid segments, the largest segment is identified as the anchor, and the remaining segments are organized as layers surrounding the anchor. Then the anchor is docked using geometrical matching. Based on the obtained anchor positions, the conformational search is initiated by adding segments from the innermost layer and proceeding outward. This addition is done according to the accessible torsion angle values along the newly added bond. The default is to use two alternative settings for bonds between two sp2 hybridized atoms, three between two sp3 atoms and six between sp2 and sp3 atoms. The partial constructs are then locally optimized to minimize the sum of intra- and intermolecular energies and pruned back to an approximately constant size of configurations. Pruning is necessary to cope with combinatorial explosion. It is performed on the basis of the score and the orientation, such that both the best scoring and most deviating orientations are retained from each expansion cycle. Finally, after complete reconstruction of the ligand, the pruned set of binding configurations is again subjected to local energy minimization.

This anchor-and-grow implementation in DOCK represents a combination of a geometric and energy-based approach to docking, due to the intermediate steps of energy minimization. As already encountered for SLIDE (111), such multistep or hybrid approaches are commonly found in current docking protocols. DOCK in general is a prototype of such a program, originally based solely on rigid geometric descriptor matching, later enhanced with a variety of additional features. For example, some degree of flexibility has been introduced into the rigid docking procedure by dissecting the ligand into a small set of rigid fragments that are docked separately and then reconnected (128). The concept of geometric shape complementarity has been extended to consider physicochemical complementarity by assigning properties to binding-site spheres and allowing them to match only those ligand atoms that are of complementary character, an approach referred to as "sphere coloring" (142, 143). Rigid-body minimization has been introduced as refinement after the initial descriptor-matching step (126) or in the variant of on-the-fly optimization using force-field energies precomputed on a grid (127). In summary, the combination of different approaches and algorithms to overcome the limitations of every single approach has provided us with steadily improving solutions to the docking problem.

3.1.3.2 Energy Driven/Stochastic Procedures. As mentioned above, docking is essentially an energy optimization problem because the native binding mode of a ligand can in general be expected to correspond to the global minimum of the binding free energy (82). Ac-
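To make the combinatorial explosion mentioned above concrete, the following minimal sketch counts how many torsion combinations an exhaustive enumeration would visit if every rotatable bond were assigned the default number of settings quoted for the anchor-and-grow search in DOCK (two for sp2-sp2, three for sp3-sp3, six for sp2-sp3 bonds). The bond-type labels, the function name, and the example ligand are hypothetical and serve only this illustration.

```python
from math import prod

# Default number of torsion settings per bond type, as quoted in the text.
# The string labels are hypothetical names chosen for this sketch.
TORSION_SETTINGS = {"sp2-sp2": 2, "sp3-sp3": 3, "sp2-sp3": 6}


def count_torsion_combinations(bond_types):
    """Number of conformers an exhaustive torsion enumeration would generate."""
    return prod(TORSION_SETTINGS[b] for b in bond_types)


# A hypothetical ligand with seven rotatable bonds.
bonds = ["sp3-sp3"] * 4 + ["sp2-sp3"] * 2 + ["sp2-sp2"]
print(count_torsion_combinations(bonds))  # 3**4 * 6**2 * 2 = 5832
```

Even this small example yields several thousand conformers per anchor placement, which is why the partial constructs have to be pruned after each expansion cycle.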
cordingly, finding this binding mode by docking corresponds to the identification of the global minimum of the free-energy function. Because the actual free energy of binding is not accessible to computation, approximate energy evaluations or scoring functions are used to guide the search. These functions are required to model the free-energy surface in an appropriate way: although the absolute values are not of relevance for the structural aspect of docking, it is essential that the global minimum of a relative free-energy function models accurately enough the position of the global minimum on the real free energy surface. (It is worth mentioning in this context that in purely geometrical or descriptor-based docking procedures, the central assumption is that the degree of surface complementarity or matching between descriptors is proportional to the interaction energy.)

With a suitable energy function available, docking can be performed by global minimization of the energy with respect to the position, orientation, and conformation of the ligand. However, this apparently straightforward approach bears two fundamental problems, inherently related to characteristics of the energy landscape of protein-ligand interactions: the high dimensionality, which precludes a systematic, exhaustive search; and the ruggedness of the surface, reflected by a large number of local minima. Because of this last aspect, standard energy minimization techniques alone are not useful for docking applications because they can guide the search only to the next local minimum. They are used, however, in combination with other techniques and play a valuable role at certain stages of the docking process, primarily to refine docked positions and conformations by exploring the local energy landscape in the vicinity of this position.

To address the docking problem, techniques for a more global exploration of the energy landscape are required. A variety of methods is available, frequently used in the context of other modeling applications and optimization problems as well. Three major classes may be distinguished: Monte Carlo techniques, molecular dynamics simulations, and genetic algorithms. Many different variants exist for all of them and frequently, in docking procedures, they are applied in combination with other techniques.

Monte Carlo methods consist of two essential components that are repetitively applied: a random walk of the ligand through the receptor-near space (i.e., the random displacement along translational, rotational, and/or torsional degrees of freedom), and the evaluation of the new configuration based on the Metropolis criterion (144). This criterion decides whether a new position is accepted and hence on the configuration from where the search will proceed. If the energy of the new docked position ($E_{\mathrm{new}}$) is more favorable (lower) than the energy of the previous position ($E_{\mathrm{old}}$), the new position is accepted. If it is less favorable, the probability P for its acceptance is given by

$$P = \exp\left(-\frac{E_{\mathrm{new}} - E_{\mathrm{old}}}{kT}\right)$$

where k is the Boltzmann constant and T is the effective temperature. To turn this sampling technique into an efficient optimization method applicable to docking, it has to be combined either with a temperature lowering protocol or with some local minimization steps. The former approach is known as Monte Carlo simulated annealing, the latter as Monte Carlo minimization.

In simulated annealing, the effective temperature T is initially set to a high value and gradually lowered, after a predefined number of Monte Carlo steps has been performed at a given temperature. At high temperatures, a broad region of configuration space is sampled: energy barriers can be surmounted because of the high acceptance probability for less favorable placements. As the temperature is lowered, this becomes less probable and the configuration is optimized more locally. Given the stochastic nature of the process, multiple independent runs are required to assess convergence (this equally applies to many of the methods further described below). Examples of docking programs using Monte Carlo simulated annealing as a search strategy are AutoDock (113-115), RESEARCH (145, 146), and MCDOCK (147).

In Monte Carlo minimization, an additional step is inserted after the random walk before Metropolis evaluation. This step is a
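The Metropolis test and the temperature-lowering protocol described above can be sketched in a few lines. This is a minimal sketch, assuming a one-dimensional toy energy function in place of a real scoring function and a single coordinate in place of ligand position, orientation, and torsions; the function names, the geometric cooling factor, and all parameter values are illustrative assumptions and are not taken from any of the programs cited.

```python
import math
import random


def acceptance_probability(e_old, e_new, kT):
    """Metropolis criterion: downhill moves are always accepted, uphill moves
    with probability exp(-(e_new - e_old) / kT)."""
    if e_new <= e_old:
        return 1.0
    return math.exp(-(e_new - e_old) / kT)


def simulated_annealing(energy, x0, kT_start=2.0, kT_end=0.01, cooling=0.9,
                        steps_per_temperature=50, step_size=0.5, seed=0):
    """Minimize a 1-D toy energy function; returns (best_x, best_energy)."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    kT = kT_start
    while kT > kT_end:
        for _ in range(steps_per_temperature):
            x_new = x + rng.uniform(-step_size, step_size)  # random-walk step
            e_new = energy(x_new)
            if rng.random() < acceptance_probability(e, e_new, kT):
                x, e = x_new, e_new
                if e < best_e:
                    best_x, best_e = x, e
        kT *= cooling  # lower the effective temperature
    return best_x, best_e


def rugged(x):
    """Hypothetical rugged 1-D landscape with several local minima."""
    return 0.1 * x * x + math.sin(3.0 * x)


best_x, best_e = simulated_annealing(rugged, x0=4.0)
print(best_x, best_e)
```

Changing the `seed` argument gives independent runs, which is how convergence would be assessed in practice.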
local energy minimization, using techniques such as steepest descent or conjugate gradient. Full local minimization after each random-walk step has been reported to improve the efficiency of the procedure (148, 149). A docking procedure that uses global Monte Carlo minimization is the ICM program of Totrov and Abagyan (82, 116, 117). ICM describes both the relative positions of two molecules and their conformations by a uniform set of internal variables and uses precalculated grids of the interaction energies to speed up calculations. Trosset and Scheraga use Monte Carlo minimization in their ProDock program; computational efficiency is enhanced by a grid-based energy evaluation using Bezier splines, which enables one to evaluate gradients and hence to perform minimization on a 3D grid (118, 119). Further Monte Carlo minimization docking procedures have been reported by Caflisch et al. (150, 151). Also, the QXP program of McMartin and Bohacek relies on Monte Carlo techniques combined with energy-minimization procedures (152).

Molecular dynamics (MD) simulations represent another technique to sample configuration space (31, 153-157). Based on Newton's equation of motion and principles of statistical thermodynamics, the standard application of this technique is to analyze flexibility and dynamic properties of molecular systems and to calculate free energies in a theoretically rigorous manner (158-163). With respect to protein-ligand docking, MD simulations could in principle be used to simulate the actual binding process, thus providing a "realistic" view of how the docking process proceeds, although this is computationally still out of reach. In fact, standard MD requires massive computational resources, which limits its application to a small number of selected systems. In the context of docking, the problem is that standard MD is slow in exploring global features (crossing of large barriers and exploration of multiple binding sites); accordingly, MD is essentially limited to the simulation and refinement of already bound complexes. Di Nola et al. have addressed this problem in their MDD (MD docking) algorithm (164, 165). This method separates the ligand's center of mass motion from its internal motions. A separate coupling to different thermal baths for both types of motion of the ligand and the receptor is performed. Because the temperature and the time constants of coupling to the baths can be varied arbitrarily, it is possible to increase the kinetic energy of the center of mass of the ligand without increasing the temperature of the internal motions of receptor and ligand. This allows for complete control of the search rate. The technique was applied to the docking of phosphocholine to antibody McPC603, starting from distinct positions well separated from the actual binding site. After appropriate sampling, the average structure of the complex in the binding region was found to closely resemble the crystal structure. Still, the method remains computationally expensive, and thus it is not yet suited for a large-scale application to practical drug design docking problems.

Other docking applications of MD have been reported as well. In a comparison of a CHARMM-based MD docking algorithm with a Monte Carlo and a genetic algorithm, Vieth et al. have observed a comparatively good performance of the MD search for the five analyzed test cases (166). Pak et al. have recently presented a docking approach based on so-called q-jumping MD (167, 168); its basic idea is to apply a smoothed generalized effective potential to enhance conformational sampling by MD. Luty et al. have combined a grid representation for the bulk portion of the receptor with MD simulations of the ligand in the flexible binding site (169). Multiple-copy simultaneous search methods (MCSS) can help to speed up energy-based searches. They use numerous ligand copies that are transparent to each other, but subject to the full force of the protein (170, 171). Finally, short MD simulations are occasionally used at some stage of a docking procedure, primarily with the purpose of local refinement, as for example in the multistep docking strategy of Wang et al., where the last step is an MD-based simulated annealing (129).

The third major class of search methods are genetic algorithms (GAs), which are widely used for docking purposes. GAs are stochastic optimization methods inspired by the concepts of evolution (172-174). The optimization problem is generally formulated in the language of genetics. Initially, a random population is generated in which each member corresponds to a potential solution of the problem. A member of the population is represented by its chromosome, in which the variables to be optimized are encoded. This means that each chromosome contains a number of genes, where the genes correspond to the value of a certain variable or set of variables. In the case of docking, the variables for translation and rotation, as well as the torsion angles of the ligand, are encoded in the chromosome. Genetic operators are then applied to the initial population to generate a new population. In general, these operators are "crossover," by which genes from two distinct chromosomes are interchanged to generate two new individuals, and "mutation," by which a given gene is randomly modified. For each newly generated individual the chromosome is decoded (genotype → phenotype) and the fitness of the individual is evaluated. In the context of docking, the fitness is the interaction energy or docking score. Individuals with better scores receive a higher chance for being selected as members of the new population, and thus a higher chance of survival and reproduction into the next generation. Accordingly, the average fitness increases from generation to generation, until, at some point, the process is terminated (by reaching either a fixed number of generations or a constant fitness of the population). The best individual of this final population represents the solution.

Many different variants and implementations of GAs for docking exist, but the general features are always similar. The application of GAs in drug design and docking has been reviewed by Clark et al. (175). A prominent example of a docking program based on a GA is GOLD (176, 177). A special characteristic of GOLD is the direct encoding of hydrogen bonding motifs in the chromosome representation. Upon chromosome decoding, a least-squares fit is used to optimize the overlap of complementary pairs of hydrogen-bonding sites present in the ligand and the receptor. The newest version of AutoDock contains an interesting variant of a GA, a so-called Lamarckian GA (115). This is the combination of a traditional GA with a local search method to perform energy minimization. At each generation, a user-defined fraction of the population is subjected to such a local minimization. This hybrid algorithm was found to be more efficient than a traditional GA, also implemented in AutoDock. A conceptually similar strategy has recently been implemented into the docking program DARWIN (178). Here, a standard GA is combined with a gradient minimization search strategy through an interface to the CHARMM molecular mechanics program (179). Further GA-based docking methods can be found in the literature (86, 180-183).

Another class of evolutionary algorithms that has occasionally found application in the context of docking is known as evolutionary programming (184). Its main difference with respect to GAs is that there is no recombination (crossover) operator, such that evolution is wholly dependent on mutation. Gehlhaar et al. (185) and Westhead et al. (186) have demonstrated the applicability of evolutionary programming to the docking problem, although in a comparative study other algorithms were found to be more effective (186). A new variant called "family competition evolutionary algorithm" has recently been proposed for docking (187).

Besides the three major classes of energy-driven searches (MD, MC, GA), some further heuristic algorithms and search strategies have been developed or adapted for the docking problem. "Tabu search" was found to perform well in comparison with other algorithms (186) and has thus become the main search strategy of the PRO-LEADS docking program (188, 189). Briefly, the tabu search operates on randomly generated positions that are examined on the basis of a tabu list. This list contains a number of previously generated solutions and serves to impose restrictions on the search process: a random move of the ligand is considered "tabu" if it generates a solution that is not sufficiently different from the stored solutions, unless its energy is more favorable than the energy of the best solution so far. Using these restrictions, the search is prevented from revisiting regions of the search space and the exploration of new areas is encouraged. Ideas from tabu search are also used in the recently described adaptation of the Mining Minima algorithm for protein-
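The tabu rule described above can be illustrated with a small sketch. It assumes a one-dimensional toy energy function, with a single number standing in for a ligand pose and the absolute difference standing in for a pose-distance measure such as an RMSD. The function names, the list length, the distance threshold, and the fallback used when every candidate is tabu are assumptions made for this illustration and are not taken from PRO-LEADS.

```python
import math
import random
from collections import deque


def is_tabu(candidate, tabu_list, min_distance):
    """A move is tabu if it lands too close to any stored solution."""
    return any(abs(candidate - stored) < min_distance for stored in tabu_list)


def tabu_search(energy, x0, n_iterations=200, n_moves=20, step_size=1.0,
                tabu_size=25, min_distance=0.2, seed=0):
    """Minimize a 1-D toy energy function; returns (best_x, best_energy)."""
    rng = random.Random(seed)
    current = x0
    best_x, best_e = x0, energy(x0)
    tabu_list = deque([x0], maxlen=tabu_size)  # oldest entries drop out
    for _ in range(n_iterations):
        # Random moves from the current solution, lowest energy first.
        moves = sorted((current + rng.uniform(-step_size, step_size)
                        for _ in range(n_moves)), key=energy)
        chosen = None
        for x in moves:
            # Aspiration: a tabu move is allowed if it beats the best so far.
            if energy(x) < best_e or not is_tabu(x, tabu_list, min_distance):
                chosen = x
                break
        if chosen is None:  # assumed fallback: every move is tabu, take the best
            chosen = moves[0]
        current = chosen
        tabu_list.append(current)
        if energy(current) < best_e:
            best_x, best_e = current, energy(current)
    return best_x, best_e


def rugged(x):
    """Hypothetical rugged 1-D landscape with several local minima."""
    return 0.1 * x * x + math.sin(3.0 * x)


best_x, best_e = tabu_search(rugged, x0=4.0)
print(best_x, best_e)
```

Because recently visited positions are excluded, the search is pushed out of a local minimum once its neighbourhood has been filled with tabu entries.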
ligand docking (190). Here, an exclusion zone is placed around each energy minimum as it is discovered, to avoid rediscovering it in future docking iterations. Mining Minima itself is based on a variety of optimization techniques to gradually focus a large region of random search to areas around the lowest energy minima.

3.2 Special Aspects of Docking

Besides the general characteristics outlined above, there are a number of special issues associated with the docking methodology that deserve explicit consideration: protein flexibility, water molecules, and objective assessment. In addition, the interplay of docking with QSAR methods and homology modeling is of further interest to highlight the possibilities opened by combined application of standard methods in structure-based drug design.

3.2.1 Protein Flexibility. Proteins are inherently dynamic systems (153, 191). A single, fixed conformation, even the average provided by a crystal structure, may not be an adequate representation of the protein, unless the system is very rigid (192). Instead, even under standard equilibrium conditions, the native folded state of a protein is best characterized by a collection or ensemble of energetically nearly equivalent conformations. If the conditions are changed, the local minima and the population of these states may shift, eventually resulting in an observable change of the average structure. Also, the introduction of a ligand corresponds to a change of the environment that may lead to similar effects. Accordingly, the binding conformation of the receptor may already be present in the ensemble of protein conformations (193, 194) and the ligand does not actively deform a fixed state of the protein, as generally inferred from the "induced fit" model.

Whatever the actual mechanism might be, the comparison of experimental protein structures in the ligand-free and in the complexed state frequently shows protein conformational changes induced by or associated with ligand binding (195). The spectrum of phenomena ranges from side-chain rotations to loop rearrangements and the movement of entire domains. Accordingly, in the context of docking it is frequently not justified to neglect protein flexibility (35). If no alternative for docking into the rigid protein is available, at least a protein conformation (possibly from a complex structure) should be used that is compatible with suitable binding modes. Obviously, a preferable docking tool would consider full protein flexibility, but appropriate realization of this goal remains a challenge because of the high dimensionality of protein conformation space. Consideration of protein flexibility also complicates the problem of scoring and selecting the best ligand placement, given the difficulty in accurately evaluating protein conformational free energies in addition to ligand-binding free energies.

Current approaches to the problem of flexible protein docking have recently been reviewed by Carlson and McCammon (196), and more briefly by Abagyan and Totrov (18) and Claussen et al. (197). The methods differ by the degree of flexibility they can cover. The least complex methods are those that model small adjustments of contact residues and side chains in an implicit way using soft docking. The protein itself remains fixed, but either through an adapted geometric representation or using a tolerant scoring function a certain amount of overlap between the protein and the ligand is allowed, emulating some "plasticity" of the receptor. The docking program by Jiang and Kim based on the matching of molecular surface cubes is explicitly based on this soft docking idea (103). Other more recent docking approaches have implemented a soft scoring function (198). The advantage of these simple approaches is that they do not increase the demands on computing time.

The next level is represented by methods that allow for explicit side-chain flexibility. GOLD's genetic algorithm can handle the rotation of a few terminal hydrogen-bond donor and acceptor groups to optimize the hydrogen-bonding network (176, 177). A technique to handle larger side-chain movements is the use of side-chain rotamer libraries, as first demonstrated by Leach. In this approach, heuristic algorithms such as dead-end elimination are used to search the large combinatorial space (199). Schaffer and Verkhivker instead use a rotamer library to first generate likely side-chain conformations, which are then sub-
jected to energy minimization together with the docked ligand (200). Another approach making use of minimization has been described by Apostolakis et al.: after "seeding" the receptor with randomly generated ligand positions that may overlap with the protein, the complex is subjected to minimization, during which nonbonded interactions are gradually switched on, to gently relieve steric overlap by minor conformational changes of the ligand and receptor. The best-ranked solutions are then subjected to further refinement by Monte Carlo minimization (151). Furthermore, the Monte Carlo minimization technique in internal coordinates of the ICM program can sample and optimize side-chain torsions during ligand docking (117, 201). Finally, the docking tool SLIDE allows for some side-chain flexibility at the optimization stage of initial placements. In SLIDE, collisions are resolved by rotations about single bonds in the ligand and the protein side-chains to reduce a maximal number of collisions by minimal conformational changes of both binding partners (111, 202).

An alternative to account, in principle, for an arbitrary degree of protein flexibility is the use of protein structure ensembles. The ensembles could be assembled from multiple crystal structures of a given protein, from NMR structure determination, or from trajectories of molecular dynamics simulations. In addition, a rotamer library can be used to create a minimal set of new conformations (203). Whatever the origin of the individual members of the ensemble, each represents a distinct conformational state of the protein, and may eventually correspond to the preferred ligand-binding state. Three different ways to use protein ensembles for docking can be distinguished: in its most straightforward form, docking is carried out sequentially with each member of the ensemble using rigid-receptor docking (124, 204-206). Another way is to use a weighted-average representation of the ensemble. Knegtel et al. followed this approach by generating composite grids that were used for scoring within the DOCK program (207). Recently, it has also been tested with AutoDock (208). Broughton has developed another method by combining statistical analysis of a conformational ensemble from short MD simulations with grid-based docking protocols (209). The third and most sophisticated approach to handle protein ensembles is implemented into FlexE, a variant of the FlexX program (197). FlexE is based on a united protein description generated from the superimposed structures of the ensemble. For the parts that differ among the protein structures, discrete alternative conformations are explicitly taken into account on the fly during the incremental construction of the ligand in the binding site. As an important feature, these geometric alternatives are optimally joined to create new valid protein structures in a combinatorial fashion. Thus, conformations of the protein are not limited to those explicitly present in the ensemble, nor are the interactions blurred by averaging over distinct alternative instances, which may correspond to unrealistic protein conformations.

The so-called Low Mode Search (LMOD), originally established as a method for conformational analysis (210), has recently been demonstrated to be applicable also to the problem of docking flexible ligands into flexible protein binding sites (211). To explore the potential energy surface of molecules, LMOD is based on eigenvector following, where eigenvectors correspond to the (low-frequency) "normal modes" of vibration. For the purpose of docking, LMOD has been combined with a limited torsional Monte Carlo movement, as well as random translation and rotation of the ligand.

Generally, however, full consideration of flexibility, either of the binding site or the entire protein, remains the domain of MD simulations. The disadvantage is their high computational demand required to achieve significant sampling. Simplified MD restricted to the binding site has been used by Luty et al., where the bulk of the protein receptor is represented as a grid, whereas a full atomic description is used only for the proximity of the binding site to include flexibility in the docking process (169). The approach of Mangoni et al. mentioned above provides a method for enhanced sampling. It has been used to dock a ligand into a receptor that is treated fully flexible and solvated with explicit water molecules (165). Alternatively, shorter MD runs may be used at intermediate or final stages of a dock-
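The difference between sequential docking against each ensemble member and scoring on a weighted-average composite grid, both described above, can be shown with a toy calculation. The sketch assumes that each receptor conformation has already been reduced to a flat list of interaction energies at the same candidate ligand positions; the function names, the equal weights, and the numbers are hypothetical and do not reproduce the grid formats of DOCK or AutoDock.

```python
def weighted_average_grid(grids, weights):
    """Combine per-conformation energy grids into one composite grid."""
    total = sum(weights)
    n_points = len(grids[0])
    return [sum(w * g[i] for g, w in zip(grids, weights)) / total
            for i in range(n_points)]


def best_over_ensemble(grids):
    """Sequential docking analogue: lowest energy found in any ensemble member."""
    return min(min(g) for g in grids)


# Two hypothetical receptor conformations scored at three ligand positions.
grids = [[-1.0, 2.0, 0.5],
         [0.0, -3.0, 0.5]]
composite = weighted_average_grid(grids, weights=[1.0, 1.0])
print(composite)                   # [-0.5, -0.5, 0.5]
print(best_over_ensemble(grids))   # -3.0
```

In this toy case the favorable energy of -3.0 found in the second conformation is diluted to -0.5 in the composite grid, which illustrates the blurring by averaging noted above.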
ing procedure to refine complexes generated tion. Because of the high computational costs,
by rigid-body docking methods. In this case, the approach seems affordable only in special
however, flexibility is not considered simulta- cases where the presence of explicit solvent
neously to the docking process. It thus only appears important.
refines solutions from rigid receptor docking An approach to explicitly place water mol-
and does not enhance the scope of the search ecules during fast docking has been intro-
for possible binding modes. duced into FlexX (216). In a preprocessing
phase, possible favorable water sites in the
3.2.2 Water Molecules. Water plays a cru- binding pocket are calculated and stored. Dur-
cial role in molecular interactions (212, 213). ing the incremental construction phase of
At the interface of a protein-ligand complex, FlexX, water molecules are switched on at
water molecules can have a significant impact these sites if they provide additional hydrogen
on complex formation, either by mediating or bonds to the ligand. Steric constraints pro-
improving specificity and affinity of the inter- duced by these water molecules and the qual-
action. They promote adaptability, thus allow- ity of the achieved hydrogen bond geometry
ing for promiscuous binding (214). Individual are then used to optimize the ligand orienta-
conserved ("structural") water molecules can tion during the construction process. In sev-
therefore be crucial for the successful design eral cases, water molecules between protein
of new inhibitors. A prominent example is the and ligand could be correctly predicted; how-
structural water molecule observed in nearly ever, the overall improvement on the FlexX
all HIV protease complexes with substrate- docking results for a test set of 200 complexes
like inhibitors. Attempts at replacing it have was nearly negligible.
guided the design of new tight-binding inhibi- The program SLIDE can consider tightly
tors [e.g., (21511. Instead of the usual implicit bound waters while docking potential ligands
modeling of solvation effects, explicit consid- (111). To select which water molecules to re-
eration of structural water molecules and wa- tain and which to remove from the binding
ter-mediated interactions would therefore be pocket before docking, the knowledge-based
a highly desirable feature in docking methods. approach Consolv (217) is applied to deter-
Ideally, simultaneously to the ligand place- mine those waters that are likely to be con-
ment the docking program should be able to served upon ligand binding and to adjust a ,
predict whether at a particular site water mol- penalty for their displacement. Once these wa-
ecules mediating protein-ligand interactions ters have been selected to be initially retained
may preferably reside or whether the displace- upon docking, SLIDE either translates or dis-
ment of these water molecules by appropriate cards a water molecule to remove overlap with
ligand functional groups would be more favor- ligand atoms after the ligand has been docked
able. No docking tool is yet available to accom- to the binding site. Displacement of a water
plish this task. Obviously, not only the place- molecule is performed only if collisions cannot
ment of water molecules is demanding, but be resolved by iterative translations. Any dis-
especially their energy scoring, resulting from placements by nonpolar ligand atoms are pe-
the complicated thermodynamics associated nalized upon scoring. In database screening
with water interactions. runs on three different target proteins, this
In principle, MD simulations provide the procedure was found to produce reasonable re-
most natural route to the explicit consider- sults with respect to water-mediated interac-
ation of water molecules. In the MD docking tions, but no systematic test has been reported
approach described by Mangoni et al., explicit so far.
water molecules are indeed used (165). It was As long as a simultaneous docking of water
found, though, that the presence of explicit molecules and ligands is an unsolved problem,
water molecules shields the interactions be- it remains common practice to consider essen-
tween the ligand and the receptor. Conse- tial water molecules as a fixed part of the bind-
quently, different weights were applied to the ing site. Preplaced water molecules may either
ligand-receptor and ligand-solvent interac- correspond to recurrently observed waters
tions, respectively, to cope with this complica- found in multiple crystal structures of the tar-
get protein, or to predicted positions based on estimated water affinity potentials suggested by programs such as GRID (218-220). The latter strategy has been applied by Minke et al. using AutoDock (221), showing that successful docking of carbohydrate derivatives to the heat-labile enterotoxin critically depends on the inclusion of water molecules. Examples for the consideration of experimentally observed water molecules as part of the target during docking are the studies of Rao et al. (docking to factor Xa using AutoDock) (222) and Pospisil (docking to thymidine kinase using AutoDock and FlexX) (223). The influence of explicit water molecules in docking was also investigated in the validation study of the new program DARWIN (178). Inclusion of explicit water molecules was essential in some cases, unless interaction energies were calculated with a Poisson-Boltzmann-based implicit solvent model. Yet another example is a search for metallo-β-lactamase inhibitors (14) with the docking program FLOG. Docking was performed with three different configurations of bound water in the active site. The top-scoring compounds showed an enrichment in biphenyl tetrazoles. A crystal structure of one tetrazole not only confirmed the predicted binding mode but also displayed the water configuration that had, retrospectively, been the most predictive one of the three models. Further examples from virtual screening studies are available that show that the inclusion of conserved water molecules in the docking process can dramatically improve the hit rate (15, 16).

3.2.3 Assessment of Docking Methods. Docking methods are usually assessed by their ability to reproduce the binding mode of experimentally resolved protein-ligand complexes: the ligand is removed from the complex, a search area is defined around the actual binding site, the ligand is redocked into the protein, and the achieved binding mode is compared with the experimental position, usually in terms of a root-mean-square deviation (rmsd). If the rmsd is below 2 Å, it is generally considered a successful prediction. The obvious goal is that such a "near-native" solution is ranked best among the set of ligand poses generated. Virtually any introduction of a new docking method has been accompanied by such a test. The number of complexes used has varied as much as the reported success rates, which are between 10% (224) and 100% (152). Clearly, success rates of 100% are rather a consequence of the limited test set size than a reflection of the mere quality of the docking method.

Numerous critical issues have to be addressed in this context. Validations carried out on very few complexes (120) do not adequately assess the scope of the method, particularly if no attempt was made to select a representative set of structures that appropriately covers a broad range of binding features important to protein-ligand complexes. Up to now, only a few docking methods have been assessed on a broad range of complexes [e.g., FlexX (200 complexes) (139), ScoreDock and DOCK (200 complexes) (225), EUDOC (154 complexes) (125), DOCK, FlexX, and DrugScore (100-150 complexes) (226), GOLD (100 complexes) (177), the method of Diller and Merz (using the GOLD test set) (112), and PRO-LEADS (70 complexes) (189)]. In the case of GOLD it has been explicitly mentioned that the test set was selected by a researcher not involved in the development of the algorithm (177). The definition of an objective and relevant reference test set that could serve as standard benchmark for every new docking method would be highly desirable for both user and developer (18). First efforts in developing a database that could be of use in this context have been reported (227). Suitable test sets should cover a sufficient number of highly diverse protein-ligand complexes, including cases that provide some challenge to docking methods (e.g., water-mediated interactions, interactions with metal ions). To test performance with respect to potential induced fit, the structure of the unligated protein or alternative complexes with different bound ligands should be available as well. The test set should comprise fully resolved crystal structures with a resolution of ≤2.5 Å. Complexes with ligands significantly involved in crystal packing contacts should be avoided. Such cases will likely fail in reproducing the experimental binding mode because of missing contacts present only in the packing environment (228). Finally, the importance to study low-affinity or "non-binding" ligands must be ad-
dressed; accordingly, experimental information about the binding geometry and affinity of some weak-binding ligands should also be available.

In addition to the tests usually reported by the authors of a program, comparative studies have been reported on the assessment of different docking and scoring approaches. In part they also address some of the aspects raised above. Westhead et al. have presented a comparison of four heuristic search algorithms (simulated annealing, genetic algorithm, evolutionary programming, and tabu search) (186). In an attempt to provide an unbiased comparison, all algorithms were implemented into the PRO-LEADS program and a single scoring function was used. Other recent examples are the studies of Ha et al., who compared DOCK (using two different scoring functions) and FlexX (229), and, in the context of virtual screening, the work of Bissantz et al., who compared DOCK, FlexX, and GOLD together with seven different scoring functions (230) (cf. also Section 5.2 below).

late experimental binding data with features described by a set of relevant descriptors. In 3D QSAR, such as CoMFA (Comparative Molecular Field Analysis), these descriptors are essentially virtual interaction energies (van der Waals and coulombic), calculated using an appropriate probe atom placed at the intersections of a regularly spaced grid surrounding the molecules. The model derived from differences in the various interaction fields provides a quantitative spatial description of those molecular properties that matter for binding. They can be interpreted as a surrogate representation of the binding site. Essential for the success of all 3D QSAR approaches is an appropriate alignment of the ligands: their relative spatial superposition must reflect the differences in binding geometry also experienced at the binding site of the structurally unknown protein. Various strategies have been developed to achieve this goal (235, 236). Increasingly, however, these methods are also applied if the receptor structure is known.
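The grid-based field descriptors that the text attributes to CoMFA-type 3D QSAR (van der Waals and coulombic energies of a probe atom on a regular lattice) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and not taken from the chapter: the atom dictionary layout, the probe parameters, a dielectric constant of 1, the 12-6 Lennard-Jones form, and the energy cap of 30 kcal/mol.

```python
import itertools
import math

COULOMB_CONSTANT = 332.06  # kcal * Angstrom / (mol * e^2), dielectric constant assumed to be 1


def make_grid(atoms, spacing=2.0, margin=4.0):
    """Return the points of a regular lattice enclosing all atoms plus a margin."""
    axes = []
    for k in range(3):
        lo = min(a["xyz"][k] for a in atoms) - margin
        hi = max(a["xyz"][k] for a in atoms) + margin
        n = int(math.floor((hi - lo) / spacing)) + 1
        axes.append([lo + i * spacing for i in range(n)])
    return list(itertools.product(*axes))


def probe_fields(atoms, grid, probe_charge=1.0, probe_radius=1.7,
                 probe_eps=0.1, cutoff=30.0):
    """Steric (Lennard-Jones 12-6) and electrostatic (Coulomb) probe energies per grid point.

    All probe parameters are hypothetical defaults chosen for the example.
    """
    steric, electrostatic = [], []
    for point in grid:
        e_vdw = 0.0
        e_coul = 0.0
        for a in atoms:
            r = max(math.dist(point, a["xyz"]), 0.1)  # clamp to avoid division by zero
            r_min = probe_radius + a["radius"]
            eps = math.sqrt(probe_eps * a["eps"])
            e_vdw += eps * ((r_min / r) ** 12 - 2.0 * (r_min / r) ** 6)
            e_coul += COULOMB_CONSTANT * probe_charge * a["charge"] / r
        # Truncate large energies so grid points inside the molecule do not dominate.
        steric.append(min(e_vdw, cutoff))
        electrostatic.append(max(-cutoff, min(e_coul, cutoff)))
    return steric, electrostatic
```

For a set of aligned ligands evaluated on one shared grid, concatenating `steric + electrostatic` for each ligand would give one row of the descriptor matrix that a regression method then relates to the measured affinities. The sketch stops short of that statistical step.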
Applying these 3D QSAR methods when the receptor structure is known results in "receptor-based 3D QSAR," a combination of a ligand-based QSAR approach with information extracted from receptor structures (238). This additional information is used to generate a ligand alignment based on the experimental or predicted binding mode of the ligands in the binding site. The standard 3D QSAR techniques are subsequently used to derive a correlation model and to ultimately predict the binding affinity of new, appropriately aligned ligands (239). As a practical advantage, receptor-based 3D QSAR provides important information as to which of the protein-ligand interactions are responsible for the variance in biological activity among the given set of ligands.

An unbiased test scenario is guaranteed if researchers are provided with a set of protein-ligand complexes of experimentally resolved, but yet unpublished structure. Two such blind trial competitions have been carried out so far (231, 232). A series of interesting issues regarding docking tests and problems with true predictions have been amply discussed by Dixon (231) and participants in the CASP2 docking competition (117, 145, 233, 234). Unfortunately, the number of targets subjected to such blind tests has so far been rather scarce. A major limitation to such blind comparisons is the availability of experimental data before publication.
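The redocking test of Section 3.2.3 (rmsd of the redocked pose to the crystal pose, with 2 Å as the usual success threshold and the near-native pose expected at rank one) can be sketched as follows. The data layout, the function names, and the convention that lower scores are better are assumptions made for the example. The sketch also assumes identical atom ordering in both poses and ignores symmetry-equivalent atoms, which a careful evaluation would have to handle.

```python
import math


def rmsd(pose, reference):
    """Root-mean-square deviation between two poses in the receptor frame.

    No superposition is performed: a redocked ligand is compared in place
    with the experimental position. Atom ordering must be identical.
    """
    if len(pose) != len(reference) or not pose:
        raise ValueError("poses must be non-empty and have the same number of atoms")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(pose, reference))
    return math.sqrt(squared / len(pose))


def top_rank_success_rate(results, threshold=2.0):
    """Fraction of complexes whose best-scored pose is within `threshold` Angstrom.

    `results` maps a complex identifier to (reference_coords, poses), where
    poses is a list of (score, coords) pairs and lower scores are assumed
    to be better (hypothetical convention for this sketch).
    """
    hits = 0
    for reference, poses in results.values():
        best_score, best_pose = min(poses, key=lambda sp: sp[0])
        if rmsd(best_pose, reference) < threshold:
            hits += 1
    return hits / len(results)
```

Reporting the number of complexes alongside the rate matters for the reason given in the text: a rate of 100% on a handful of complexes says little about the method.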
3.2.4 Docking and QSAR. As long as the problem of accurate binding free energy prediction on the basis of a given complex geometry has not been resolved (cf. section on scoring functions), computational methods establishing quantitative structure-activity relationships (QSARs) to estimate relative binding affinity differences within a set of ligands remain a pragmatic alternative. Both classical and 3D QSAR methods have been developed as ligand-based approaches (235-237). They rely exclusively on ligand information and try to corre-

Obviously, in the case of known receptor structure, the ligand alignment can be obtained by docking. This strategy has indeed been followed in a variety of studies: it has been used to set up CoMFA models [e.g., (240)] or extended to the Comparative Binding Energy (COMBINE) analysis (241-244), that explicitly exploits receptor information to generate the QSAR descriptors. Furthermore, in a GRID/GOLPE (245) analysis, the model generated with the docking alignment has been compared to the traditional CoMFA model
based on ligand alignment (238, 246); the alignment generated by docking could be shown to exhibit higher relevance.

Another concept to combine docking with QSAR has recently been proposed by Vieth and Cummins in their DoMCoSAR approach (247). DoMCoSAR is used to statistically determine the docking mode that is consistent with a structure-activity relationship, based on the explicit assumption that all molecules exhibit the same binding mode. In a first step, all molecules of a chemical series with common substructure are docked in an unbiased way to the protein binding site and the results are clustered to establish the most favorable docking modes for the common substructure. Subsequently, constrained docking is performed by forcing all molecules to align with the common substructure in the major docking modes. In a final stage, interaction-energy-based descriptors are calculated for all major docking modes. QSAR models are then derived to determine the statistically significant and most predictive set of descriptors and thus the docking mode that is most consistent with a given structure-activity relationship. As noted by the authors, the appeal of this method is that an objective statistical justification for the selection of a binding mode is obtained. This may especially prove useful in cases where the primary docking scores yield nearly degenerate multiple binding modes and a selection of the most representative result is difficult. However, because one alignment is rendered prominent among others for the sake of best agreement with the derived QSAR model, the danger exists that unconsidered or ill-defined descriptors in the QSAR could possibly distort the final or accepted alignment.

3.2.5 Docking and Homology Modeling. In the absence of an experimental protein structure, a homology model may be used for docking and structure-based design. Such a model can be generated by comparative modeling based on homologous proteins of known structure. Obviously, it is most reliable in the regions of highest homology between the templates and the target protein. Although an overall skeleton of the target protein can frequently be obtained with sufficient accuracy, the structural details of the binding site are often beyond the scope of the method. In fact, members of a homologous protein family may show considerable differences in the binding region. Accordingly, homology models may not be sufficiently accurate to apply standard docking tools, and special methods addressing the docking of ligands to low-resolution structures have been presented (248).

Clearly, flexible-receptor docking could help to alleviate the problem. A frequently followed alternative is to refine the initial complex between the protein model and the ligand, most commonly by relaxation with MD simulations (249-251). This may also be combined with free energy calculations to determine the binding mode most consistent with experimental affinity data (252). However, refinement does not overcome the problem that the initial conformation of the model may preclude the binding of certain ligands. This has, for example, been demonstrated by Schapira et al. in a virtual screening for retinoic acid receptor (RAR) antagonists based on an RAR homology model (201). The automatic selection procedure based on flexible ligand docking was followed by optimization of the selected candidates with flexible protein side chains using the ICM program (82, 116, 117). Nevertheless, some known ligands were repeatedly missed by the screening algorithm because of incompatible binding site conformations. Consideration of side-chain flexibility already in the initial docking simulation was required to accommodate these ligands.

An approach developed especially for the purpose of docking ligands into approximate protein models generated by homology modeling is the DragHome method (253). The binding site is analyzed in terms of putative ligand interaction sites and translated using Gaussian functions into a functional binding-site description represented by physicochemical properties. Similarly, ligands are translated into a description based on Gaussian functions and the docking is computed by optimizing the overlap between the two functional descriptions. The use of "soft" Gaussian functions to describe protein-ligand interactions is one possibility to take into account the limited accuracy of modeled structures for the purpose of docking. The method for generating and op-
timizing ligand orientations relative to the binding-site representation was adapted from the ligand alignment program SEAL (254-256). For a set of different ligands, the generated solutions are analyzed with respect to the mutual ligand alignment. This alignment is then used to generate 3D QSAR models, which in turn can be interpreted with respect to the surrounding protein model. This can highlight inconsistencies and deficiencies present in the model, and thus information which in future developments of the methods is planned to be fed back into a subsequent modeling step to improve the protein model. The idea behind this is that the cycle of docking and alignment, ligand data analysis (3D QSAR), and protein structure modeling should be repeated until self-consistency is achieved. This would provide a protein homology model optimized with respect to the binding site and suitable to obtain consistent docking results.

4 SCORING FUNCTIONS

This section is dedicated to the scoring aspect of the docking problem. Various approaches are discussed that try to capture the essential elements of protein-ligand interactions in computationally efficient scoring functions. The discussion focuses on general approaches rather than individual functions. The reader is referred to Table 7.2 for original references to the most important scoring functions.

4.1 Description of Scoring Functions for Protein-Ligand Interactions

Reversible protein-ligand binding is an equilibrium between the bound state and the unbound state of the binding partners. The rigorous theoretical description requires full consideration of all species involved: the separate solvated protein, the separate solvated ligand, and the solvated complex, in which the binding partners are partially desolvated and form interactions with each other. The quantity of interest to characterize this equilibrium is the free energy of binding. Its most accurate calculations are based on the evaluation of ensemble averages according to principles of statistical mechanics (45). To obtain reasonably accurate values of binding free energies, extensive Monte Carlo or MD simulations are necessary, which require large computational resources. Clearly, this is impractical for standard docking applications. Furthermore, even the most advanced techniques are reliable only for calculating binding free energy differences between closely related ligands (162, 163, 257, 258). However, some less rigorous, but faster and, as experience shows, often not less accurate methods have been developed, that are suitable to handle larger numbers of ligands. For example, continuum solvation models are used to replace explicit solvent molecules at least in the final energy evaluation of the simulation trajectory (259), or linear response theory is applied (260-262), sometimes augmented by a surface term (263).

Scoring functions that can be evaluated fast enough to be applied in docking and virtual screening can only estimate the free energy of binding. They usually take into account only one possible configuration of the receptor-ligand complex and disregard ensemble averaging and explicit properties of the unbound states of the binding partners. Furthermore, all methods share the assumption that the free energy can be decomposed into a sum of terms (additivity). In a strict physical sense, this is not allowed, given that the free energy of binding is a state function, although its components are not (77, 264). In addition, simple additive models cannot describe subtle cooperativity effects (265). Nevertheless, it is often useful to interpret receptor-ligand binding in an additive fashion (266-268), and estimates of binding free energy based on the additivity assumption are often accessible at very low computational cost.

Three main classes of fast scoring functions can be distinguished: force field-based methods, empirical scoring functions, and knowledge-based methods. The following sections are dedicated to a separate discussion of each method.

4.1.1 Force Field-Based Methods. An obvious idea to circumvent parameterization efforts for scoring is to use nonbonded energies of existing, well-established molecular mechanics force fields for the estimation of bind-
ing affinity. In doing so, one substitutes estimates of the free energy of binding in solution by an estimate of the gas phase enthalpy of binding. Even this crude approximation can lead to satisfying results. A good correlation was obtained between nonbonded interaction energies calculated with a modified MM2 force field and IC50 values of 33 HIV-1 protease inhibitors (269). Similar results were reported in a study of 32 thrombin-inhibitor complexes with the CHARMM force field (270). In both studies, however, experimental data represented rather narrow activity ranges and covered little structural variation.

The AMBER (271, 272) and CHARMM (179) nonbonded terms are used as scoring function in several docking programs. As mentioned above (Section 3.1.1), protein terms are usually precalculated on a rectangular grid to speed up the energy calculation compared to traditional atom-by-atom evaluations (273). Distance-dependent dielectric constants are
usually employed to approximate the long-range shielding of electrostatic interactions by water (274). However, compounds with high formal charges still obtain unreasonably high scores as a result of overestimated ionic interactions. For this reason, a common practice in virtual screening is to separate databases of compounds into subgroups according to their total charges and rank these groups separately. When electrostatic interactions are complemented by a solvation term calculated by the Poisson-Boltzmann equation (32) or faster continuum solvation models (e.g., Ref. 275), effects of high formal charges are usually leveled out. In a validation study on three protein targets, Shoichet and coworkers observed significantly improved ranking of known inhibitors upon correction for ligand solvation (276). The current version of the docking program DOCK calculates solvation corrections based on the generalized Born (277) solvation model (278). The method has been tested in a study where several peptide libraries were docked into various serine protease active sites (279).

In the context of scoring, the van der Waals term of force fields is mainly responsible for penalizing docking solutions with respect to overlap between receptor and ligand atoms. It is often omitted when only the binding of experimentally determined complex structures is analyzed (280-282).

Very recently, a new contribution to the list of force-field-based scoring methods has been developed by Charifson and Pearlman. This so-called OWFEG (one window free energy grid) method (283) is an approximation to the expensive first-principles method of free energy perturbation (FEP). For the purpose of scoring, an MD simulation is carried out with the ligand-free, solvated receptor site. During the simulation, the energetic effects of probe atoms on a regular grid are collected and averaged. Three simulations are run with three different probes: a neutral methyl-type atom, a negatively charged atom, and a positively charged atom. The resulting three grids contain information on the score contributions of neutral, positively, and negatively charged probe atoms located in various positions of the receptor site. They are used for scoring a ligand position by linear interpolation based on the partial charges of the ligand atoms. This approach seems to be successful for Ki prediction as well as virtual screening applications (284). Its conceptual advantage is the implicit consideration of entropic and solvent effects and some protein flexibility.

The calculation of ligand strain energy traditionally also lies in the realm of molecular mechanics force fields. Although effects of strain energy have rarely been determined experimentally (3), it is generally accepted that high-affinity ligands bind in low-energy conformations (285, 286). If a compound must adopt a strained conformation to fit into a receptor pocket, this should lead to a less negative binding free energy. Strain energy can be estimated by calculating the difference between the global energy minimum of the unbound ligand and the current conformation of the ligand in the complex. However, force field estimates of energy differences between individual conformations are not reliable for all systems. In practice, better correlation with experimental binding data is observed when strain energy is used as a filter to weed out unlikely binding geometries rather than including it in the final score. Estimation of ligand strain energy based on force fields can be time-consuming and therefore alternatives are often employed, such as empirical rules derived from small-molecule crystal data (140). Conformations generated by such programs are, however, often not strain-free because only one torsional angle is regarded at a time. Some strained conformations can be excluded when two consecutive dihedral angles are taken into account simultaneously (287).

4.1.2 Empirical Scoring Functions. The underlying idea of empirical scoring functions is that the binding free energy of a noncovalent receptor-ligand complex can be factorized into a sum of localized, chemically intuitive interactions. Such decompositions can be a useful tool to gain some insight into binding phenomena, even without analyzing 3D structures of receptor-ligand complexes. Andrews and colleagues derived average functional group contributions to the binding free energy by analyzing a set of 200 compounds for which the affinity to a receptor had been experimentally determined (266). Such average functional
group contributions can then be used to estimate the mean overall binding affinity of a compound independent of a particular binding site. This value can be compared to the experimental binding free energy: if the experimental affinity is similar to or even more favorable than the computed one, the ligand obviously shows a good fit with the receptor and its functional groups are supposedly all involved in interactions with the protein; on the other hand, if it is significantly less favorable, the compound apparently does not fully exploit its potential to form optimal interactions. Similarly, experimental binding affinities have been analyzed on a per-atom basis in quest of the maximal binding affinity of noncovalent ligands (288). It was concluded that in the strongest binding ligands each non-hydrogen atom on average contributes 6.3 kJ/mol to the binding energy.

The analysis of binding phenomena can be performed with much more detail if the 3D structures of receptor-ligand complexes are available. Based on the assumption of additivity, the binding affinity ΔG_bind can be estimated as a sum of interactions multiplied by weighting factors:

$$\Delta G_{\mathrm{bind}} = \sum_i \Delta G_i \, f_i$$

Here, each f_i corresponds to an interaction term that depends on structural features of the complex and each ΔG_i represents a weighting coefficient, which is determined on the basis of a training set of experimental affinities for crystallographically known protein-ligand complexes. Scoring schemes that use this concept are called empirical scoring functions. Several reviews summarize details of individual parameterizations (26, 44, 56, 289-292).

The individual terms in empirical scoring functions are usually chosen such that they intuitively cover important contributions of the total binding free energy. Most empirical scoring functions are derived by evaluating the functions f_i on a set of protein-ligand complexes and fitting the coefficients ΔG_i to experimental binding affinities of these complexes by multiple linear regression or supervised learning. The relative weight of the individual contributions depends on the training set. Usually, between 50 and 100 complexes are used to derive the weighting factors. In a recent study it has been shown that many more than 100 complexes were necessary to achieve convergence (293). The reason for this finding is probably the fact that the publicly available protein-ligand complexes fall in a few rather strongly populated classes.

Empirical scoring functions usually contain individual terms for hydrogen bonds, ionic interactions, hydrophobic interactions, and binding entropy. Hydrogen bonds are often scored by simply counting the number of donor-acceptor pairs that fall into a given distance and angle range favorable for hydrogen bonding, weighted by penalty functions for deviations from ideal standard values (80, 294-296). The amount of error tolerance in these penalty functions is critical. If large deviations from the ideal are tolerated, the scoring function cannot discriminate sufficiently between different placements of a ligand, whereas too stringent tolerances artificially score similar complexes rather differently. Attempts have been described to reduce the strong distance dependency of such interactions by assigning soft modulating functions on an atom-pair basis (297). Other concepts try to avoid penalty functions and introduce distinct regression coefficients for strong, medium, and weak hydrogen bonds (293). The Agouron group has used a simple four-parameter potential that is a piecewise linear approximation of a potential neglecting angular terms ("PLP scoring function") (185). Most functions consider all types of hydrogen bonds equivalently. Some attempts have been made to distinguish between different donor-acceptor functional group pairs. Hydrogen bond scoring in GOLD (176, 177) is based on a list of hydrogen bond energies, derived from ab initio calculations, for any combination of 12 donor and 6 acceptor atom types. A similar differentiation of donor and acceptor groups is attempted in the program GRID (218) for the characterization of binding sites (219, 220, 298). The consideration of such lookup tables in scoring functions might help to avoid false predictions originating from an oversimplification of some individual interactions.

Reducing the weight of hydrogen bonds localized at the solvent-exposed rim of a binding
site is a useful concept to avoid false positives and acceptor groups are overrepresented
in virtual screening. This is achieved by reduc- (many peptide and carbohydrate fragments).
ing charges of surface-exposed residues in In most empirical scoring functions, a hy-
cases where explicit electrostatic terms are drophobic character is attributed to several
used (274) or by multiplying the hydrogen atom types, with equivalent weight for all hy-
bond contribution with a factor that depends drophobic contributions. In a more sophisti-
on the accessibility of the involved protein cated approach, the propensity of particular
counter group (299). atom types to be solvent-exposed or embedded
Ionic interactions are handled in a way sim- in the interior of a protein can be assessed by
ilar to hydrogen bonds. Long-distance charge- so-called atomic solvation parameters. These
charge interactions are usually neglected, and have been derived, for example, from experi-
it is thus more appropriate to refer to salt mental octanol/water partition coefficients
bridges or charge-assisted hydrogen bonds. (303, 304) or from protein crystal structures
The scoring function by Boehm implemented (305, 306). Atomic solvation parameters are
in LUDI (294) assigns a stronger weight to salt used in the VALIDATE scoring function (307)
bridges than to neutral hydrogen bonds. This and have been tested in DOCK (308).
differentiation generally proved successful in Entropy terms account for the restriction
scoring series of thrombin inhibitors (295, of conformational degrees of freedom of the
300). However, comparable to force field scor- ligand upon complex formation. A crude but
ing, the danger exists that highly charged mol- useful estimate of this entropy contribution is
ecules receive overestimated scores. Experi- the number of rotatable bonds of a ligand (294,
ence with FlexX containing a variant of 296). This measure has the advantage of being
Boehm's scoring function has shown that a function of the ligand only. More sophisti-
more reliable predictions are obtained if cated estimates try to take into account the
charged and uncharged hydrogen bonds are nature of the ligand portion on either side of a
handled equally in a virtual screening applica- flexible bond, particularly with respect to the
tion. Similar experience has also been col- interactions formed with the receptor (80,
lected using the ChemScore function (80). 307). This concept is based on the assumption
Hydrophobic interactions are usually cali- that purely hydrophobic contacts allow for
brated to the size of the contact surface buried more residual motion in the ligand fragments.
upon receptor-ligand complex formation. Of-
ten, a reasonable correlation between experi- 4.1.3 Knowledge-Based Methods. Empiri-
mental binding energies can be achieved con- cal scoring functions regard only those inter-
sidering only a surface term [see, for example, actions that are explicitly part of the model. Less
(1, 301, 302) and the discussion in Section frequent interactions are usually neglected,
2.1.1]. Various approximations for such surface even though they can be strong and specific, for
terms have been described, for example, as example, NH-π hydrogen bonds. To generate a
grid-based (294) or volume-based approaches comprehensive and consistent description of all
(cf. the discussion in Ref. 115). Many functions these interactions in the framework of empirical
are based on a distance-dependent summation scoring functions would be a difficult task. How-
over neighboring receptor-ligand atom pairs. ever, the exponentially growing body of struc-
Distance-dependent cutoffs have been intro- tural data on receptor-ligand complexes can be
duced in various ways, either short (110) or exploited to discover favorable binding geome-
longer to include atom pairs that are not in- tries. "Knowledge-based" scoring functions try
volved in direct van der Waals contacts (80, to capture the knowledge about protein-ligand
185). The weighting factor ΔG_i of the hydro- binding that is implicitly stored in the Protein
phobic term depends strongly on the training Data Bank by means of statistical analysis of
set. Supposedly, this fact has been underesti- structural data, without referring to often in-
mated in the development of many empirical consistent experimentally determined binding
scoring functions (35) because in most train- affinities (309). They are based on the concept of
ing sets ligands composed of numerous donor the inverse formulation of the Boltzmann law,
4 Scoring Functions
protein environment on complex formation. ciencies that one should be aware of in any
Contributions of these surface potentials and application. First, most scoring functions are
the pair potentials are weighted equally in the in some way fitted to or derived from experi-
final scoring function. This scoring function has mental data. The functions necessarily reflect
initially been developed with the primary goal to the accuracy of the data that were used for
differentiate between correctly docked (near na- their derivation. For instance, a general prob-
tive) ligand poses versus decoy binding modes lem with empirical scoring functions is the
for the same protein-ligand pair. However, fact that the experimental binding energies
through appropriate scaling also quantitative usually originate from many different sources
estimates across different protein-ligand com- and therefore consist of a rather heteroge-
plexes are possible (318). neous data set affected by all kinds of experi-
Mitchell and coworkers choose a different mental errors. Furthermore, scoring func-
type of reference state for their BLEEP poten- tions mirror not only the quality but also the
tial (315). The pair interaction energy is writ- scope of experimental data used for their de-
ten as velopment. Virtually all scoring functions are
still derived from data mostly based on high-
affinity receptor-ligand complexes. Many of
these are still of peptidic nature, whereas in-
teresting leads in pharmaceutical research are
non-peptidic. This is reflected in the relatively
Here, the number density ρ_ij(r) is defined as high contributions of hydrogen bonds in the
above, but it is normalized by the occurrence total score. The balance between hydrogen
frequency of all atom pairs at this same dis- bonding and hydrophobic interactions is a
tance instead of by the number of pairs in very critical issue in scoring, and its conse-
the whole reference volume. The variable m_ij quences are especially obvious in virtual
is the number of pairs ij found in the evaluated screening applications, as illustrated in Sec-
data set, and σ is an empirical factor that de- tions, as illustrated in Sec-
fines the weight of each observation. This po-
tential is combined with a van der Waals po- 4.2.2 Molecular Size. The simple additive
tential as a reference state to compensate for nature of most fast scoring functions often
the lack of sampling at short distances and for leads to gradually increasing scores for mole-
certain underrepresented atom pairs. cules of larger size. Although it is true that
Besides differences in the functional form small molecules with a molecular weight be-
and reference state, from a more practical low 200-250 rarely show very high affinity,
point of view, the knowledge-based potentials there is no physical reason why larger com-
differ also with respect to scope of atom type pounds should automatically possess higher
definitions and the amount of structural data activity. Comparing the scores of two com-
used for their derivation. The number of dif- pounds of significant size difference therefore
ferent atom types ranges from 17 in Drug- calls for a term that compensates the size de-
Score to 40 nonmetal atom types in BLEEP. In pendency. In some applications, a constant
all cases, the Protein Data Bank (321) was the "penalty" term has been added to the score for
source of the solved crystal structures. For each heavy atom (324) or a term proportional
BLEEP 351 selected complexes were used, to the molecular weight has been considered
whereas the PMF function was extracted from (325). The empirical scoring function imple-
697 complexes, and DrugScore was derived us- mented in the docking program FLOG has
ing 1376 complexes. In the latter case, the data been normalized to remove the linear depen-
have been extracted from Relibase (322, 323). dency of the crude score on the number of li-
gand atoms (121). Originally introduced to im-
4.2 Critical Assessment of Current prove the correlation between experimental
Scoring Functions and calculated affinities, entropy terms re-
flecting the change in conformational mobility
4.2.1 Influence of the Training Data. All upon ligand binding also help to reduce an ex-
fast scoring functions share a number of defi- cessive score for overly large and flexible mol-
ecules (80, 294). The size of the solvent-acces- remove them according to user-specified
sible surface of the ligand in its bound state thresholds (329). A promising approach to
can also be used as penalty term to discard properly reflect such cases is the inclusion of
large ligands not fully buried in the binding artificially generated, erroneous, decoy solu-
site. It should be noted, however, that all these tions in the optimization of scoring functions
approaches are very pragmatic in nature and as reported for the scoring function of a flexi-
do not solve the problem of size dependency, ble ligand superposition algorithm (330,331).
which is closely related to a proper under-
standing of cooperativity effects (265). 4.2.4 Specific Attractive Interactions. An-
other general deficiency of scoring functions is
4.2.3 Penalty Terms. In general, scoring the simplified description of attractive inter-
functions reward favorable interactions such actions. Molecular recognition is not entirely
based on hydrogen bonding and hydrophobic
contacts. Especially in host-guest chemistry,
other specific types of interactions are fre-
d energetically unfavorable quently used to characterize the observed phe-
d within the binding site nomena. For example, hydrogen bonds are
not observed and can hardly be accounted formed between acidic protons and π-systems
based scoring function. (332). These bonds can substitute for conven-
Knowledge-based scoring functions try to cap- tional hydrogen bonds in strength and speci-
referring to a reference ficity, as has been noted in protein-DNA rec-
state that corresponds to a mean situation. At ognition (333). Another type of less frequently
first glance, the neglect of angular terms in the observed interactions is the cation-π interac-
knowledge-based scoring func- tion, which is especially important at the sur-
d pair potentials that face of proteins (39, 334). Current empirical
not discriminate sufficiently between dif- scoring functions usually neglect these inter-
ferent binding geometries. However, some de- actions. Similarly, the directionality of inter-
pendency is considered, actions between aromatic rings is hardly con-
n that pair potentials for different atom sidered (335, 336). Because of the regression-
are always evaluated in combination type adjustment, some energy contributions
with each other (226). Obvious deficiencies in originating from these interactions are al-
functions, such as ready implicitly incorporated into the conven-
electrostatic repulsions and steric clashes, can tional interaction terms. This might be one
be avoided by defining reasonable penalty explanation why hydrogen bond contributions
em from molecular are traditionally overestimated in regression-
mechanics force fields. This has been realized based scoring functions. Knowledge-based ap-
the "chemical scoring" function imple- proaches automatically incorporate these in-
program DOCK (106, teractions in a scoring function, provided they
which is a modified occur with reasonable frequency in the data
der Waals potential being attractive or re- set used to develop the potentials.
"conserved water molecules" and to consider structure prediction, several studies have
them as part of the receptor. A knowledge- shown that knowledge-based scoring func-
based tool to estimate the "conservation" of tions are at least equivalent to regression-
water molecules upon ligand binding has been based functions. The PMF function has been
developed (217) and incorporated into a dock- successfully applied to structure prediction of
ing procedure (111) (cf. Section 3.2.2). It is inhibitors of neuraminidase (339) and MMP3
based on crystallographic information and (229) in combination with the program
tries to extract rules about water sites by an- DOCK, yielding superior results to the DOCK
alyzing whether they are recurrently occupied force field and chemical scoring. The Drug-
by water molecules in series of related pro- Score function was tested on a large set of PDB
tein-ligand complexes. complexes and gave significantly better re-
Scoring functions require predefined atom sults than those of the original FlexX scoring
types for each protein and ligand atom. This function using solutions generated by FlexX
also implies the fixed assignment of a proton- as the docking engine. DrugScore performed
ation state to each acidic and basic group. similarly to the force field score in DOCK, but
Knowledge-based functions, which do not con- outperformed the chemical scoring (226).
sider hydrogen atoms, are equally affected by Moreover, with respect to the correlation be-
the problem because the atom type definitions tween experimental and calculated binding
normally imply a certain protonation state. energies, very promising results have been ob-
Presently, such estimates might be reliable tained with DrugScore (318) and PMF (229,
enough for the situation in aqueous solution; 317, 319, 339). BLEEP has recently been
however, significant pKa shifts are possible tested for scoring docked protein-ligand com-
upon ligand binding (338) as a result of strong plexes (340). It was found to be slightly better
changes of the local dielectric conditions. They than the DOCK energy function in discrimi-
give rise to protonation reactions in parallel to nating decoy situations from near-native bind-
the binding process. With respect to scoring, ing modes.
switching from a donor to an acceptor func- Although in many docking programs the
tionality because of altered protonation states same function is applied as an objective func-
has important consequences (279). Accord- tion for structure generation and for energy
ingly, improved docking and scoring algo- evaluation, better results can sometimes be
rithms must incorporate a more detailed and obtained if different functions are applied. In
flexible description of protonation states. particular, the docking objective function can
be adapted to the docking algorithm used. In a
4.2.6 Performance in Structure Prediction parameter study, Vieth et al. found that using
and Rank Ordering of Related Ligands. Similar a soft-core van der Waals potential made their
to the broad range of available docking tools MD-based docking algorithm more efficient
(cf. Section 3.2.3), the multitude of different (274). Using FlexX as the docking engine, we
scoring schemes calls for an objective assess- observed that the original FlexX scoring func-
ment to evaluate their scope and limitations. tion emphasizes directional interactions
This depends in part on the anticipated appli- (mostly hydrogen bonds) in the docking phase.
cation; that is, whether protein-ligand com- Subsequently, the ranking of individual li-
plexes should be predicted (using the scoring brary entries can be done successfully with a
scheme as objective function in docking), simple PLP potential that lacks directional
whether a set of ligands should be ranked with terms, but considers general steric fit of recep-
respect to one target protein (K_i prediction), or tor and ligand. Results are significantly worse
whether the scoring function is used to select if PLP is used already in the incremental
possible hits out of a large database of candi- build-up procedure of the docked ligand.
date molecules (virtual screening). It is even more difficult to draw valid con-
An objective assessment of the available clusions about the relative performance of
scoring functions is difficult because only very scoring functions to rank sets of inhibitors
few functions have been tested on the same with respect to their binding affinity for the
data sets or with the same docking tool. For same target. First, there is hardly any pub-
lished study in which different functions have the binding site that could be included in the
been applied to the same data sets. Second, docking process. Tools such as Relibase (322,
experimental data are often not measured un- 323) may be used to perform these compara-
der the same conditions but collected from tive analyses of protein-ligand complexes in
various literature references. This retrieval an efficient way. Subsequently, programs like
from various sources usually implies larger GRID (218), LUDI (108, 109), SuperStar (351,
uncertainties within the experimental data 352), or DrugScore (318) are used to visualize
potential binding sites ("hot spots") in the ac-
The task of ranking sets of 10-100 related tive site; in principle, any scoring function
ligands with respect to one target can also be could be used for this purpose.
handled by computationally more demanding An important result of the 3D structure
methods. The most general approaches are analysis is usually the identification of one or
probably force field scores complemented by more key interactions that all ligands should
electrostatic desolvation and surface area satisfy. In aspartic proteases, for example, in-
terms. An example is the MM-PBSA method hibitors should form at least one hydrogen
that combines Poisson-Boltzmann electro- bond to the catalytic Asp side-chains, whereas
statics with AMBER molecular mechanics cal- in metalloproteinases a coordination to the
culations and MD simulations (341,342). This metal seems mandatory. Sometimes, a known
method has recently been applied to an in- ligand portion is used as initial scaffold based
creasing number of examples, showing quite on which virtual screening techniques search
promising results (343-346). Poisson-Boltz- for optimal side-chains. In principle, this step
mann calculations have been performed on a is not required, and instead one could fully
variety of targets with many related computa- rely on the docking and scoring step. However,
tional protocols (280-282, 347-350). Alterna- following a pragmatic approach, it is impor-
tively, extended linear response protocols tant to use any well-founded information that
(263) can be used. The OWFEG grid method is available about the system under consider-
by Pearlman has also shown very promising ation because more valuable results can usu-
ally be expected this way.
Once a reasonable hypothesis about the
binding-site requirements has been gener-
VIRTUAL SCREENING ated, the next level of virtual screening is ap-
proached. Whether databases of commercially
outlined in section, virtual screening is a available compounds or "virtual" libraries of
s. Although, in principle, the
designed compounds are screened, it is advis-
ole process can be fully automated, it is
able not to dock every possible compound, but
ghly advisable to allow for manual interven-
only those that pass a series of hierarchical
ual inspection and selection
filters (cf. also Fig. 7.3). Simple preliminary
starts with a detailed filters remove
sis of the available 3D protein struc-
ly homologous struc- • compounds with reactive groups such as
tures will also be analyzed, either to generate -SO2Cl or -CHO because they are expected
additional ideas about possible ligand struc- to cause problems in some biological assays
some insight on how to as a result of unspecific covalent binding to
achieve selectivity against other proteins of the protein.
same class. A superposition of different • compounds with molecular weights below
provides some ideas 150 or above 500. Small molecules such as
repeatedly found in benzene are known to bind to proteins
tight-binding protein-ligand complexes. Such rather unspecifically at several sites. Large
an overlay will also highlight flexible parts of molecules such as polypeptides are difficult
the protein or recurring water molecules in to optimize subsequently, given that good
[Figure 7.3 flow chart; recoverable steps: selection based on known Zn-binding groups; 3D pharmacophore based on binding site "hot spots"; visual inspection.]
Figure 7.3. Hierarchical filtering process in virtual screening for carbonic anhydrase inhibitors.
docking tools cannot produce reliable solu- puter resources are used, as in the Dock-
tions for all compounds; often, even some Crunch project based on the PRO-LEADS
well-scoring compounds are simply docked program (360). Examples for such fast docking
to the outer surface of the protein or adopt tools are SLIDE (111) or the docking method
rather strained conformations to achieve by Diller and Merz (112). Both have been de-
good surface complementarity within the veloped for database screening and library pri-
binding pocket. Computational filters help oritization. Before docking, it is generally ad-
to detect such situations (329). visable to eliminate compounds that would
provide only redundant information (similar-
Finally, the selected compounds are ordered ity filters) or are very unlikely to yield high
or synthesized and then tested. If the goal is to scores. Clearly, the filter routines need to be
identify even weakly binding ligands as first faster than the docking and scoring procedure,
leads, sufficient sensitivity of the biological as- but this is normally the case.
say has to be ensured [cf., for example, Ref. 16]. Complementary to initial filtering, a preor-
In this context it has also to be considered that ganization of compounds into families exhib-
limited solubility of the hits in water or water/ iting some kind of similarity has been demon-
DMSO mixtures often hampers affinity deter-
strated to improve the results of database
minations at high concentrations. screening. In the strategy shown by Su et al.
Successful virtual screening has to produce (359), all molecules of any family are docked
a set of compounds significantly enriched with and scored, but only the best-scoring member
active compounds compared to random selec- of a high-ranking family is allowed to r e m a i ~
tion. A key parameter to assess the perfor- in the final hit list, whereas the scores of re-
mance of docking and scoring in virtual lated molecules are recorded as annotations to
screening is therefore, at least in theoretical this representative family member. This in-
case studies, the so-called enrichment factor. creases the diversity of the hit list and helps to
It is simply the ratio of active compounds in identify a higher number of different classes of
the subset selected by docking divided by the potential ligands.
number of active compounds found in a ran- An alternative to sequential docking can be
domly selected subset of equal size. To record followed if combinatorial libraries are evalu-
such enrichment factors also for controlling ated. Quite a few programs have been specifi-
performance at the various filter steps, a set of cally designed for speed-up by so-called com-
known active compounds is mixed with the set binatorial docking. They profit from the
of candidate molecules. This strategy, how- structured, incremental nature of combinato-
ever, requires a set of reasonable size (e.g., rial libraries and the fact that molecules of a
30-50 ligands), which is not always given in a combinatorial library consist of a common
real-life virtual screening study. Further- core. This core is assumed to form common
more, enrichment factors are far from being specific interactions with the receptor (possi-
ideal indicators, particularly at later filter bly supported by experimental evidence) and
steps where a (hopefully) increasing amount can thus be prepositioned in the binding
of active compounds detected among the en- pocket in one or a few similar orientations. It
tries of the database competes with the set of then serves as skeleton for the addition of sub-
known active ligands and artificially lowers stituents. Obviously, this step is ideally suited
the enrichment factor. for incremental construction algorithms (361)
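The enrichment factor defined above can be sketched in a few lines. This is a minimal illustration, assuming the library is given as a list of compound identifiers already sorted from best to worst score and that the known actives are given as a set of identifiers. The function name `enrichment_factor` and the toy identifiers are hypothetical.

```python
def enrichment_factor(ranked_ids, active_ids, subset_size):
    """Actives found in the top-ranked subset, divided by the number
    expected in a randomly selected subset of the same size."""
    if not 0 < subset_size <= len(ranked_ids):
        raise ValueError("subset_size must lie between 1 and the library size")
    actives = set(active_ids)
    total_actives = sum(1 for cid in ranked_ids if cid in actives)
    if total_actives == 0:
        raise ValueError("the library contains no known actives")
    found = sum(1 for cid in ranked_ids[:subset_size] if cid in actives)
    # A random subset contains, on average, the same fraction of actives
    # as the full library.
    expected_random = total_actives * subset_size / len(ranked_ids)
    return found / expected_random


# Toy seeded library: 100 compounds, 5 of them treated as active.
library = ["c%d" % i for i in range(100)]
seeded_actives = {"c0", "c1", "c2", "c50", "c99"}
ef_top10 = enrichment_factor(library, seeded_actives, 10)
```

In this toy case three of the five actives fall in the top ten, against 0.5 expected at random, so the factor is 6. As the text points out, any unknown true actives in the library are counted as inactives and depress the value.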
and significantly reduces the complexity of the process and can be placed with high confidence
docking problem, limiting the required com- in a well-characterized specificity pocket, such
putation time per ligand. Earlier examples of as the S1 pocket in thrombin. A further issue
this combinatorial docking approach are to consider is mutual fragment dependencies,
PRO SELECT (362) and CombiDOCK (324). that is, when multiple fragments are hooked
The latter is based on the DOCK program and up to a scaffold in a sequential manner; the
has recently been enhanced by a vector-based results can depend on the sequence by which
orientation filter, to ensure productive scaf- they are added (see, for example, Ref. 363).
fold poses, and by a free-energy-based scoring Thus, in unfavorable cases, different orders of
procedure (279). Another recent combinato- attachment have to be followed to circumvent
rial docking procedure has been implemented this possible limitation.
as the FlexXc extension in FlexX (363). It follows a
5.2 Seeding Experiments to Assess Docking
recursive scheme to traverse the combinato-
and Scoring in Virtual Screening
rial library space efficiently. The algorithm is
based on a tree data structure that allows the True enrichment factors can be calculated
efficient reuse of previously calculated dock- only if experimental data are available for the
ing results. FlexXc follows the library search full library, although such situations are un-
tree in a depth-first manner, whereas Combi- usual. Accordingly, studies using enrichment
DOCK uses a breadth-first approach to evalu- factors as a figure-of-merit to assess the per-
ate fragments attached to a scaffold. A general formance of a virtual screening can serve for
advantage of breadth-first searches is that theoretical validation purposes only. Several
they allow for an efficient pruning of the authors have tested the predictive ability of
search tree based on the scoring values. docking and scoring tools by compiling an ar-
De novo design tools have also been adapted bitrary set of diverse, drug-like compounds
to the problem of combinatorial docking and complemented by a number of known active
combinatorial library design. The program compounds. This "seeded" library is then sub-
LUDI, for example, has been enhanced by the jected to the virtual screening, and for the pur-
ability to connect building blocks in a chemi- pose of assessment it is assumed that the
cally and structurally adequate manner; it can added active compounds are the only true ac-
thus be used for combinatorial docking by fit- tives in the library. Clearly, this is a rather
ting building blocks onto the interaction sites questionable assumption.
and simultaneous linking to previously docked Several seeding experiments have been
core fragments (300). It has been successfully published. An example has been performed at
applied in the design of new thrombin inhibi- Merck using FLOG (121). A library consisting
tors accessible through a single reaction. An- of 10,000 compounds including inhibitors of
other example is a variant of the Builder pro- various types of proteases and HIV protease
gram (364) that was used to select substituents was docked into the active site of HIV pro-
for a library of cathepsin D inhibitors (12). Yet tease. This resulted in excellent enrichment of
another approach is DREAM++, a suite of pro- the HIV protease inhibitors: all inhibitors but
grams for the design of virtual combinatorial li- one were among the top 500 library members.
braries (365). Here, the DOCK algorithm is used However, inhibitors of other proteases were
for the molecular placement. Variable frag- also considerably enriched (366).
ments are joined consecutively in compliance Seeding experiments also allow for compari-
with predefined types of well-characterized or- sons of different docking and scoring proce-
ganic reactions. Speed-up is achieved by pre- dures, as shown, for example, by Charifson et
serving ("inheriting") information about com- al. (86), Bissantz et al. (230), and Stahl and
mon partial structures across different Rarey (287). Charifson et al. compiled sets of
reactions, such that only the conformations of several hundred active molecules for three dif-
newly added fragments are searched. ferent targets, p38 MAP kinase, inosine mono-
Generally speaking, combinatorial docking phosphate dehydrogenase, and HIV protease.
approaches work best in cases where a core These were docked into the corresponding ac-
fragment plays a dominant role in the binding tive sites together with 10,000 randomly se-
lected, but drug-like, commercially available tarity. This is clearly reflected in results of
compounds using DOCK (327) and the Vertex database-ranking experiments. To combine
in-house tool Gambler. ChemScore (80, 188), the virtues of both scoring functions and to
the DOCK AMBER force field score, and PLP construct a more robust general function, a
(185) performed consistently well in enriching combination of PLP and FlexX called Screen-
active compounds. This result was partially Score has recently been published (287). It
attributed to the fact that a rigid-body optimi- was derived by a systematic optimization of
zation could be carried out with these func- library ranking results over seven targets and
tions because they include repulsive terms in covers a wide range of active sites with respect
contrast to many other tested functions. Stahl to form, size, and polarity. ScreenScore ob-
and Rarey compared DrugScore (226), PMF tains good enrichments for COX-2 (highly li-
(317), PLP (185), and the original FlexX score pophilic binding site) and neuraminidase
using FlexX for docking (110, 130, 138). Inter- (highly polar site), whereas the individual
estingly, the two knowledge-based scoring functions fail in one of the two cases. The au-
functions performed differently. DrugScore thors of PLP have recently enhanced their
achieved better ranking for the tight-binding scoring function by including directed hydro-
ligands in narrow lipophilic cavities of COX-2 gen bonding terms (367). Similar to Screen-
and the thrombin S1 pocket. In contrast, PMF Score, this could also lead to a more robust
obtained better enrichment for the case of the scoring function.
very polar binding site of neuraminidase. Ob- 5.4 Finding Weak Inhibitors
viously, a general strength of PMF is the de-
scription of complexes showing multiple hy- Seeding experiments are often carried out
drogen bonds. This has also been noted in the with a small number of active compounds that
study by Bissantz et al., in which PMF was are already optimized for binding to the stud-
found to perform well for the polar target thy- ied target. Enrichment factors based on the
midine kinase and less well for the estrogen retrieval of these compounds are not very con-
clusive because the recovery of potent inhibi-
tors from a large set of candidate molecules is
5.3 Hydrogen Bonding versus Hydrophobic
significantly easier than the discovery of new,
but usually rather weak inhibitors from a
A balanced description of the contribution of large majority of nonbinders. In general, as in
hydrogen bonding and hydrophobic interac- HTS, one can only expect hits from virtual
tions to the total score is of general impor- screening that bind in the low micromolar
tance, to avoid a bias toward either highly po- range.
lar or completely hydrophobic molecules. The Nevertheless, a recent study showed that
actual parameterization of a scoring function library screening can also successfully detect
depends on the compilation of the data set very weak ligands. Approximately 4000 com-
used to develop the function. Empirical scor- mercially available compounds had been
ing functions are more likely affected by the screened for FKBP-binding by means of the
data set composition used for parameteriza- SAR-by-NMR technique (368) and 31 com-
tion, but can be quickly reparameterized. In pounds with activity in the low millimolar
the case of knowledge-based functions such a range were detected. This set of compounds
readjustment is more difficult to perform; was flexibly docked into the FKBP binding site
e of the much larger data- using DOCK 4.0 with the PMF scoring func-
heir development, they are tion (369). Interestingly, significant enrich-
posed to be less dependent on special data ment factors of 2 to 3 were achieved, whereas
scoring with the standard AMBER score of
The PLP function, for example, addresses DOCK did not really provide an enrichment.
al steric complementarity and hydro-
5.5 Consensus Scoring
ic interactions based on rather long-
ge pair potentials, whereas the FlexX score Different scoring schemes focus on different
hydrogen-bond complemen- aspects as most important contributions to
binding. However, these differences do not necessarily become obvious when calculating binding affinities of known active compounds. In contrast, the scoring of non-active compounds could unravel such differences. Vertex has reported good experience with so-called consensus scoring. Here, docking results are scored by several distinct functions and only those hits are considered that are rendered prominent by several of the functions. A significant decrease in false positives has been described (86), but inevitably a number of true positives are lost (see, for example, Ref. 230).

When consensus scoring is applied, one should thus keep in mind that, although the number of false positives can be reduced, there is a danger of discarding some active compounds highlighted by only one of the scoring functions. This would, for example, apply to the above-mentioned PLP and FlexX scoring functions, which emphasize different aspects of ligand binding. Here, consensus scoring could be counterproductive. Therefore, along with consensus scoring, the individual scoring results should be consulted. Generally, however, it appears that one can expect more robust results from consensus scoring.

5.6 Successful Identification of Novel Leads through Virtual Screening

A considerable number of publications have shown that virtual screening can be efficiently used to discover novel leads (11, 13, 142, 370-375). Some of the most recent examples are briefly presented in the following.

The program ICM has been used to identify novel antagonists for a nuclear hormone receptor (201) and, together with DOCK, to find inhibitors for the RNA transactivation response element (TAR) of HIV-1 (25). The virtual screening protocol started with 153,000 compounds from the Available Chemicals Directory (ACD) (376) and involved increasingly elaborate docking and scoring schemes as the screening proceeded toward smaller selections of compounds. In the HIV-1 TAR study, the ACD library was first rigidly docked into the binding site using the DOCK program along with a simple contact scoring scheme. Then, the best-scoring 20% of the compounds were subjected to flexible docking with ICM and an empirical scoring function specifically tailored to RNA targets, providing a selection of approximately 5000 compounds. This was followed by two additional steps involving longer sampling of conformational space to retrieve the 350 most promising candidates. Of these, a very small fraction was tested experimentally and two compounds were found to significantly reduce the binding of the Tat protein to HIV-1 TAR (CD50 ≈ 1 µM).

Recently, Grueneberg et al. discovered subnanomolar inhibitors of carbonic anhydrase II by virtual screening (15). The study was performed following a protocol of several consecutive steps of hierarchical filtering (Fig. 7.3). Carbonic anhydrase II is a metalloenzyme used as a prominent target for the treatment of glaucoma. Its binding site is a rather rigid, funnel-shaped pocket. Known inhibitors such as dorzolamide bind to the catalytic zinc ion by a sulfonamide group. In a recent crystallographic study it could be demonstrated that only the sulfonamide group represents an ideal anchor for zinc coordination (377). An initial data set of 90,000 entries from the Maybridge (378) and LeadQuest (379) libraries was converted to 3D structures with Corina (380). In a first filtering step, compounds were required to possess a known zinc-binding group. These compounds were then processed through UNITY (355) using a protein-derived pharmacophore query. The pharmacophore hypothesis had been constructed from a "hot spot" analysis of the available X-ray structures of the enzyme. This yielded a set of 3314 compounds. In a subsequent filtering step, the known inhibitor dorzolamide was used as a template onto which all potential candidates were flexibly superimposed by means of the program FlexS (330). The top-ranking compounds from this step were docked into the binding site with FlexX (110, 130, 138), taking into account four conserved water molecules in the active site. After visual inspection, 13 top-ranking hits were selected for experimental testing. Nine of these compounds showed activities below 1 µM, and three had Ki values below 1 nM. Two of the hits were also examined crystallographically. The docking solution predicted as best by DrugScore was found to be closer to the experimental structure than the one predicted by the FlexX score.
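The enrichment factors of Section 5.4 and the consensus selection of Section 5.5 can be made concrete with a minimal sketch. It assumes that lower scores are better and that a compound counts as a consensus hit when it appears among the top-ranked entries of a minimum number of scoring functions. The function names, the compound identifiers, and the toy scores are illustrative assumptions and are not taken from any of the cited programs.

```python
from typing import Dict, List, Set


def enrichment_factor(ranked_ids: List[str], actives: Set[str], top_fraction: float) -> float:
    """Hit rate among the top-ranked fraction divided by the hit rate of the whole list."""
    n_total = len(ranked_ids)
    n_actives = sum(1 for cid in ranked_ids if cid in actives)
    if n_total == 0 or n_actives == 0:
        raise ValueError("ranking must be non-empty and contain at least one active")
    n_top = max(1, round(n_total * top_fraction))
    hits_top = sum(1 for cid in ranked_ids[:n_top] if cid in actives)
    return (hits_top / n_top) / (n_actives / n_total)


def consensus_hits(scores: Dict[str, Dict[str, float]], top_n: int, min_votes: int) -> Set[str]:
    """Compounds placed in the top_n by at least min_votes scoring functions (lower = better)."""
    votes: Dict[str, int] = {}
    for per_compound in scores.values():
        ranked = sorted(per_compound, key=lambda cid: per_compound[cid])
        for cid in ranked[:top_n]:
            votes[cid] = votes.get(cid, 0) + 1
    return {cid for cid, n in votes.items() if n >= min_votes}


# Toy ranking of ten compounds (best first); c0 and c1 are the seeded actives.
ranking = ["c0", "c5", "c3", "c7", "c2", "c1", "c4", "c6", "c8", "c9"]
ef_top20 = enrichment_factor(ranking, {"c0", "c1"}, top_fraction=0.2)

# Two hypothetical scoring functions that disagree on compounds b and c.
toy_scores = {
    "function_1": {"a": -9.0, "b": -8.0, "c": -1.0, "d": -2.0},
    "function_2": {"a": -7.0, "b": -1.0, "c": -6.0, "d": -2.0},
}
strict = consensus_hits(toy_scores, top_n=2, min_votes=2)
lenient = consensus_hits(toy_scores, top_n=2, min_votes=1)
```

In this toy case one of the two seeded actives falls into the top 20%, giving an enrichment factor of 2.5. Requiring agreement of both scoring functions retains only compound a, whereas accepting a single vote also retains b and c; this is the trade-off between fewer false positives and lost true positives discussed in Section 5.5.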
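The hierarchical filtering protocol described above for carbonic anhydrase II can be sketched as a sequence of increasingly expensive stages, each of which passes on only a subset of its input. The sketch below is a toy illustration under stated assumptions: the Compound fields, the stage labels, and the cutoffs are hypothetical placeholders for real substructure, pharmacophore, superposition, and docking calculations, and none of them reproduces the interfaces of UNITY, FlexS, or FlexX.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class Compound:
    """Hypothetical record; each field stands in for a precomputed property."""
    name: str
    has_zinc_binder: bool      # contains a known zinc-binding group
    pharmacophore_match: bool  # satisfies the protein-derived pharmacophore query
    similarity: float          # similarity to a reference ligand (higher is better)
    dock_score: float          # docking score (lower is better)


Stage = Callable[[List[Compound]], List[Compound]]


def keep_if(predicate: Callable[[Compound], bool]) -> Stage:
    """Stage that keeps only the compounds satisfying a yes/no criterion."""
    return lambda compounds: [c for c in compounds if predicate(c)]


def keep_best(key: Callable[[Compound], float], n: int, highest_first: bool = False) -> Stage:
    """Stage that ranks compounds by a score and keeps the n best."""
    return lambda compounds: sorted(compounds, key=key, reverse=highest_first)[:n]


def run_funnel(library: Sequence[Compound],
               stages: Sequence[Tuple[str, Stage]]) -> Tuple[List[Compound], List[Tuple[str, int]]]:
    """Apply the stages in order and record how many compounds survive each one."""
    survivors = list(library)
    log = [("input", len(survivors))]
    for label, stage in stages:
        survivors = stage(survivors)
        log.append((label, len(survivors)))
    return survivors, log


# Toy library; all values are invented for illustration.
library = [
    Compound("A", True, True, 0.90, -30.0),
    Compound("B", True, True, 0.80, -25.0),
    Compound("C", True, True, 0.30, -40.0),
    Compound("D", True, False, 0.95, -35.0),
    Compound("E", False, True, 0.99, -50.0),
    Compound("F", True, True, 0.70, -10.0),
]
stages = [
    ("zinc-binding group", keep_if(lambda c: c.has_zinc_binder)),
    ("pharmacophore query", keep_if(lambda c: c.pharmacophore_match)),
    ("similarity to reference", keep_best(lambda c: c.similarity, n=3, highest_first=True)),
    ("docking", keep_best(lambda c: c.dock_score, n=2)),
]
hits, log = run_funnel(library, stages)
```

Compound E, which has the most favorable docking score in the toy library, is discarded by the very first filter. Cheap early stages decide what the expensive later stages ever see, which is both the efficiency gain and the inherent risk of such a funnel.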
This strategy of hierarchical filtering, starting with a mapping of candidate molecules onto a binding site-derived pharmacophore, followed by a similarity analysis with known ligands using either FlexS (330), SEAL (254-256), or FeatureTrees (357, 358), and concluded by flexible docking with FlexX, has meanwhile been applied to three other proteins in the same laboratory. For t-RNA guanine transglycosylase, thermolysin, and aldose reductase, novel micromolar to submicromolar lead structures could be discovered. Most challenging in this context is aldose reductase because it undergoes pronounced induced-fit changes upon ligand binding. Crystal structure analysis of a micromolar hit retrieved by virtual screening clearly revealed known and new areas of induced-fit adaptation. The crystal structure obtained with this hit provides a good starting point for further lead optimization.

The de novo design of inhibitors of the bacterial enzyme DNA gyrase, a well-established antibacterial target (381), is another example of successful structure-based virtual screening, reported by Roche (16). HTS performed on the proprietary compound library provided no suitable lead structures. Therefore, a new rational approach was developed to discover potential lead structures using structural information on the ATP binding site in subunit B of the enzyme. At the onset of the project, the crystal structures of DNA gyrase subunit B complexed with a substrate analog and two inhibitors were available. In the buried part of the pocket they all donate a hydrogen bond to an aspartic acid side chain and accept one from a conserved water molecule. As a design concept, the formation of these two key hydrogen bonds was defined as mandatory. As an additional requirement, a lipophilic portion forming hydrophobic interactions with the enzyme was demanded. A new assay was established to allow for the detection of weakly binding inhibitors. A computational search of the ACD (376) and the Roche Compound Inventory identified hits having low molecular weights and matching the above-mentioned criteria. Relying on the results of the in silico screening based on docking with LUDI and a pharmacophore search with CATALYST (356), 600 compounds were tested initially. Then, close analogs of the first series of hits were assayed, resulting in a total screen of 3000 compounds. This provided 150 hits, clustered into 14 chemical classes. Seven of these classes could be demonstrated as novel DNA gyrase inhibitors competing for the ATP binding site. Subsequent structure-based optimization resulted in inhibitors with potencies equal to or up to 10 times better than those of known antibiotics.

6 OUTLOOK

The first docking programs were introduced about 20 years ago, and the publication of the first generally applicable scoring functions dates back about 10 years. Since then, much experience has been gained in developing and applying docking algorithms, using scoring functions, and assessing their accuracy. Significant progress has been made over the last few years and it appears as if there are now docking tools available to address a variety of goals with considerable accuracy, from the precise and detailed analysis of binding interactions for a small set of ligands up to a fast screening of large compound collections. Similarly, scoring functions are currently available that can be applied to a wide range of different proteins and consistently yield a considerable retrieval of active compounds. As a consequence, the pharmaceutical industry increasingly uses virtual screening to identify possible leads.

In fact, structure-based design is now established as an important approach to drug discovery complementing HTS (382), although HTS has a number of serious disadvantages. It is expensive (383) and it leads to many false positives and a disappointingly small number of real leads (384, 385), particularly if screening is performed on a member of a new protein class. Also, not all assays are easily amenable to HTS requirements. Finally, despite the library sizes of several million entries available to the pharmaceutical industry, these compound collections do not approach the size and diversity needed to even approximately cover the chemical space of drug-like organic molecules. Accordingly, focused design of novel compounds and compound libraries should only gain importance. In light of current trends in structural genomics and patenting strategies, one may speculate that structure-based de novo design will become much more important in the near future.

To meet the increasing demands being placed on virtual screening, the development of more reliable scoring functions is certainly vital for success. In addition, novel or improved docking algorithms are required. We conclude by summarizing our perspective on major challenges in the further development of docking procedures and scoring functions:

1. The fact that protein-ligand interactions occur in aqueous solution is generally appreciated, but not yet adequately accounted for in molecular docking procedures. In particular, the simultaneous placement of explicit water molecules upon docking, accurate estimates of the water versus ligand interaction-energy balance, and the fast prediction of protonation states in binding pockets await a more satisfactory solution.
2. The consideration of a sufficient degree of protein flexibility needs to become part of standard docking approaches. This will require faster algorithms. In addition, with respect to scoring, an often overlooked aspect of this problem is that as soon as receptor flexibility is allowed, protein conformational energy changes need to be accounted for appropriately.
3. Although flexible-ligand docking has already become standard practice, the error rate in predictions of interaction geometries is still significant for more flexible ligands. Again, more efficient algorithms will be required to sample the conformation space more thoroughly.
4. Polar interactions are still not treated adequately. It is striking that, even though the role of hydrogen bonds in biology has been appreciated for a long time and the nature of hydrogen bonds is qualitatively well understood, their quantitative energetic description in protein-ligand interactions is still unsatisfactory (65).
5. All scoring functions are essentially expressed as simple analytical functions fitted to experimental binding data. The presently available crystal data on complex structures are strongly biased toward peptidic ligands. Because these data are used for the development of scoring functions, many overestimate the role of polar interactions. The development of improved scoring functions clearly requires access to better data, especially for nonpeptidic, low molecular weight, drug-like ligands, including weakly binding compounds.
6. Unfavorable interactions and unlikely docking modes are not penalized strongly enough. Methods for taking such undesired features into account are still lacking in presently available scoring functions.
7. So far, fast scoring functions cover only part of the whole receptor-ligand binding process. A more detailed picture could be obtained by taking into account properties of the unbound ligand, that is, solvation effects and energetic differences between the low-energy solution conformations and the bound conformation.

7 ACKNOWLEDGMENTS

The authors have benefited from numerous discussions with many researchers active in the field of docking and scoring, especially Holger Gohlke (University of Marburg/Scripps Research Institute), Ingo Muegge (Bayer), and Matthias Rarey (GMD St. Augustin).

REFERENCES

1. H. J. Boehm and G. Klebe, Angew. Chem. Int. Ed. Engl., 35, 2588 (1996).
2. R. E. Babine and S. L. Bender, Chem. Rev., 97, 1359 (1997).
3. J. Greer, J. W. Erickson, J. J. Baldwin, and M. D. Varney, J. Med. Chem., 37, 1035 (1994).
4. S. W. Kaldor, V. J. Kalish, J. F. Davies, 2nd, B. V. Shetty, J. E. Fritz, K. Appelt, J. A. Burgess, K. M. Campanale, N. Y. Chirgadze, D. K. Clawson, B. A. Dressman, S. D. Hatch, D. A. Khalil, M. B. Kosa, P. P. Lubbehusen, M. A. Muesing, A. K. Patick, S. H. Reich, K. S. Su, and J. H. Tatlock, J. Med. Chem., 40, 3979 (1997).
47. T. Wiseman, S. Williston, J. F. Brandts, and L. N. Lin, Anal. Biochem., 179, 131 (1989).
48. S. H. Sleigh, P. R. Seavers, A. J. Wilkinson, J. E. Ladbury, and J. R. Tame, J. Mol. Biol., 291, 393 (1999).
49. M. H. Parker, D. F. Ortwine, P. M. O'Brien, E. A. Lunney, C. A. Banotai, W. T. Mueller, P. McConnell, and C. G. Brouillette, Bioorg. Med. Chem. Lett., 10, 2427 (2000).
50. D. H. Williams, D. P. O'Brien, and B. Bardsley, J. Am. Chem. Soc., 123, 737 (2001).
51. P. Gilli, V. Ferretti, G. Gilli, and P. A. Borea, J. Phys. Chem., 98, 1515 (1994).
52. J. D. Dunitz, Chem. Biol., 2, 709 (1995).
53. K. Sharp, Protein Sci., 10, 661 (2001).
54. P. C. Weber, J. J. Wendoloski, M. W. Pantoliano, and F. R. Salemme, J. Am. Chem. Soc., 114, 3197 (1992).
55. F. Dullweber, M. T. Stubbs, D. Musil, J. Stuerzebecher, and G. Klebe, J. Mol. Biol., 313, 593 (2001).
56. J. R. Tame, J. Comput.-Aided Mol. Des., 13, 99 (1999).
57. A. R. Khan, J. C. Parrish, M. E. Fraser, W. W. Smith, P. A. Bartlett, and M. N. James, Biochemistry, 37, 16839 (1998).
58. H. Mack, T. Pfeiffer, W. Hornberger, H. J. Boehm, and H. W. Hoeffken, J. Enzyme Inhib., 9, 73 (1995).
59. Y. W. Chen, A. R. Fersht, and K. Henrick, J. Mol. Biol., 234, 1158 (1993).
60. P. R. Connelly, R. A. Aldape, F. J. Bruzzese, S. P. Chambers, M. J. Fitzgibbon, M. A. Fleming, S. Itoh, D. J. Livingston, M. A. Navia, J. A. Thomson, and K. P. Wilson, Proc. Natl. Acad. Sci. USA, 91, 1964 (1994).
61. A. R. Fersht, J. P. Shi, J. Knill-Jones, D. M. Lowe, A. J. Wilkinson, D. M. Blow, P. Brick, P. Carter, M. M. Waye, and G. Winter, Nature, 314, 235 (1985).
62. U. Obst, D. W. Banner, L. Weber, and F. Diederich, Chem. Biol., 4, 287 (1997).
63. B. P. Morgan, J. M. Scholtz, M. D. Ballinger, I. D. Zipkin, and P. A. Bartlett, J. Am. Chem. Soc., 113, 297 (1991).
64. B. A. Shirley, P. Stanssens, U. Hahn, and C. N. Pace, Biochemistry, 31, 725 (1992).
65. H. Kubinyi in B. Testa, H. van de Waterbeemd, G. Folkers, and R. Guy, Eds., Pharmacokinetic Optimization in Drug Research, Wiley-VCH, Weinheim, Germany, 2001, p. 513.
66. H. J. Schneider, T. Schiestel, and P. Zimmermann, J. Am. Chem. Soc., 114, 7698 (1992).
67. A. C. Tissot, S. Vuilleumier, and A. R. Fersht, Biochemistry, 35, 6786 (1996).
68. J. D. Dunitz, Science, 264, 670 (1994).
69. C. Chothia, Nature, 254, 304 (1975).
70. F. M. Richards, Annu. Rev. Biophys. Bioeng., 6, 151 (1977).
71. K. A. Sharp, A. Nicholls, R. Friedman, and B. Honig, Biochemistry, 30, 9686 (1991).
72. M. S. Searle and D. H. Williams, J. Am. Chem. Soc., 114, 10690 (1992).
73. M. S. Searle, D. H. Williams, and U. Gerhard, J. Am. Chem. Soc., 114, 10697 (1992).
74. M. A. Hossain and H. J. Schneider, Chem. Eur. J., 5, 1284 (1999).
75. J. Hermans and L. Wang, J. Am. Chem. Soc., 119, 2702 (1997).
76. K. P. Murphy, D. Xie, K. S. Thompson, L. M. Amzel, and E. Freire, Proteins, 18, 63 (1994).
77. K. A. Dill, J. Biol. Chem., 272, 701 (1997).
78. G. Folkers, Ed., Pharm. Acta Helv., 69, 175 (1995).
79. D. E. J. Koshland, Angew. Chem. Int. Ed. Engl., 33, 2408 (1994).
80. M. D. Eldridge, C. W. Murray, T. R. Auton, G. V. Paolini, and R. P. Mee, J. Comput.-Aided Mol. Des., 11, 425 (1997).
81. I. K. McDonald and J. M. Thornton, J. Mol. Biol., 238, 777 (1994).
82. M. Totrov and R. Abagyan in R. B. Raffa, Ed., Drug-Receptor Thermodynamics: Introduction and Applications, John Wiley & Sons, Chichester, 2001, p. 603.
83. I. D. Kuntz, E. C. Meng, and B. K. Shoichet, Acc. Chem. Res., 27, 117 (1994).
84. A. R. Leach and M. M. Hann, Drug Discov. Today, 5, 326 (2000).
85. R. A. Lewis, S. D. Pickett, and D. E. Clark in K. B. Lipkowitz and D. B. Boyd, Eds., Reviews in Computational Chemistry, Vol. 16, Wiley-VCH, New York, 2000, p. 1.
86. P. S. Charifson, J. J. Corkery, M. A. Murcko, and W. P. Walters, J. Med. Chem., 42, 5100 (1999).
87. G. E. Terp, B. N. Johansen, I. T. Christensen, and F. S. Jorgensen, J. Med. Chem., 44, 2333 (2001).
88. T. P. Lybrand, Curr. Opin. Struct. Biol., 5, 224 (1995).
89. G. Jones and P. Willett, Curr. Opin. Biotechnol., 6, 652 (1995).
90. T. Lengauer and M. Rarey, Curr. Opin. Struct. Biol., 6, 402 (1996).
91. R. Rosenfeld, S. Vajda, and C. DeLisi, Annu. Rev. Biophys. Biomol. Struct., 24, 677 (1995).
92. P. Bamborough and F. E. Cohen, Curr. Opin. Struct. Biol., 6, 236 (1996).
93. J. S. Dixon and J. M. Blaney in Y. C. Martin and P. Willett, Eds., Designing Bioactive Molecules: Three-Dimensional Techniques and Applications, American Chemical Society, Washington, DC, 1997, p. 175.
94. P. J. Gane and P. M. Dean, Curr. Opin. Struct. Biol., 10, 401 (2000).
95. C. A. Sotriffer, W. Flader, R. H. Winger, B. M. Rode, K. R. Liedl, and J. M. Varga, Methods, 20, 280 (2000).
96. R. B. Russell and D. S. Eggleston, Nat. Struct. Biol., 7 (Suppl.), 928 (2000).
97. M. Hendlich, F. Rippmann, and G. Barnickel, J. Mol. Graph. Model., 15, 359 (1997).
98. G. P. Brady, Jr. and P. F. Stouten, J. Comput.-Aided Mol. Des., 14, 383 (2000).
99. C. A. Orengo, A. E. Todd, and J. M. Thornton, Curr. Opin. Struct. Biol., 9, 374 (1999).
100. J. M. Thornton, A. E. Todd, D. Milburn, N. Borkakoti, and C. A. Orengo, Nat. Struct. Biol., 7 (Suppl.), 991 (2000).
101. S. Schmitt, M. Hendlich, and G. Klebe, Angew. Chem. Int. Ed. Engl., 40, 3141 (2001).
102. J. Ruppert, W. Welch, and A. N. Jain, Protein Sci., 6, 524 (1997).
103. F. Jiang and S. H. Kim, J. Mol. Biol., 219, 79 (1991).
104. D. Fischer, S. L. Lin, H. L. Wolfson, and R. Nussinov, J. Mol. Biol., 248, 459 (1995).
105. P. Burkhard, P. Taylor, and M. D. Walkinshaw, J. Mol. Biol., 277, 449 (1998).
106. I. D. Kuntz, J. M. Blaney, S. J. Oatley, R. Langridge, and T. E. Ferrin, J. Mol. Biol., 161, 269 (1982).
107. C. M. Oshiro and I. D. Kuntz, Proteins, 30, 321 (1998).
108. H. J. Boehm, J. Comput.-Aided Mol. Des., 6, 593 (1992).
109. H. J. Boehm, J. Comput.-Aided Mol. Des., 6, 61 (1992).
110. M. Rarey, B. Kramer, T. Lengauer, and G. Klebe, J. Mol. Biol., 261, 470 (1996).
111. V. Schnecke and L. A. Kuhn, Perspect. Drug Discov. Des., 20, 171 (2000).
112. D. J. Diller and K. M. Merz, Jr., Proteins, 43, 113 (2001).
113. D. S. Goodsell and A. J. Olson, Proteins, 8, 195 (1990).
114. G. M. Morris, D. S. Goodsell, R. Huey, and A. J. Olson, J. Comput.-Aided Mol. Des., 10, 293 (1996).
115. G. M. Morris, D. S. Goodsell, R. S. Halliday, R. Huey, W. E. Hart, R. K. Belew, and A. J. Olson, J. Comput. Chem., 19, 1639 (1998).
116. R. Abagyan, M. Totrov, and D. Kuznetsov, J. Comput. Chem., 15, 488 (1994).
117. M. Totrov and R. Abagyan, Proteins (Suppl.), 215 (1997).
118. J. Y. Trosset and H. A. Scheraga, J. Comput. Chem., 20, 412 (1999).
119. J. Y. Trosset and H. A. Scheraga, Proc. Natl. Acad. Sci. USA, 95, 8011 (1998).
120. F. S. Kuhl, G. M. Crippen, and W. G. Richards, J. Comput. Chem., 5, 24 (1984).
121. M. D. Miller, S. K. Kearsley, D. J. Underwood, and R. P. Sheridan, J. Comput.-Aided Mol. Des., 8, 153 (1994).
122. D. M. Lorber and B. K. Shoichet, Protein Sci., 7, 938 (1998).
123. M. McGann (2001). FRED [Online]. OpenEye. http://www.eyesopen.com/fred.html [2001, Sept 25].
124. Y. P. Pang and A. P. Kozikowski, J. Comput.-Aided Mol. Des., 8, 669 (1994).
125. Y. P. Pang, E. Perola, K. Xu, and F. G. Prendergast, J. Comput. Chem., 22, 1750 (2001).
126. E. C. Meng, D. A. Gschwend, J. M. Blaney, and I. D. Kuntz, Proteins, 17, 266 (1993).
127. D. A. Gschwend and I. D. Kuntz, J. Comput.-Aided Mol. Des., 10, 123 (1996).
128. R. L. DesJarlais, R. P. Sheridan, J. S. Dixon, I. D. Kuntz, and R. Venkataraghavan, J. Med. Chem., 29, 2149 (1986).
129. J. Wang, P. A. Kollman, and I. D. Kuntz, Proteins, 36, 1 (1999).
130. M. Rarey, B. Kramer, and T. Lengauer, J. Comput.-Aided Mol. Des., 11, 369 (1997).
131. T. J. Ewing, S. Makino, A. G. Skillman, and I. D. Kuntz, J. Comput.-Aided Mol. Des., 15, 411 (2001).
132. R. L. DesJarlais, R. P. Sheridan, G. L. Seibel, J. S. Dixon, I. D. Kuntz, and R. Venkataraghavan, J. Med. Chem., 31, 722 (1988).
133. D. J. Bacon and J. Moult, J. Mol. Biol., 225, 849 (1992).
134. A. Wallqvist and D. G. Covell, Proteins, 25, 403 (1996).
135. M. Y. Mizutani, N. Tomioka, and A. Itai, J. Mol. Biol., 243, 310 (1994).
136. B. B. Goldman and W. T. Wipke, Proteins, 38, 79 (2000).
137. S. Makino and I. D. Kuntz, J. Comput. Chem., 18, 1812 (1997).
138. M. Rarey, S. Wefing, and T. Lengauer, J. Comput.-Aided Mol. Des., 10, 41 (1996).
139. B. Kramer, M. Rarey, and T. Lengauer, Proteins, 37, 228 (1999).
140. G. Klebe and T. Mietzner, J. Comput.-Aided Mol. Des., 8, 583 (1994).
141. G. Klebe, J. Mol. Biol., 237, 212 (1994).
142. R. L. DesJarlais and J. S. Dixon, J. Comput.-Aided Mol. Des., 8, 231 (1994).
143. B. K. Shoichet and I. D. Kuntz, Protein Eng., 6, 723 (1993).
144. N. Metropolis, A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller, and E. Teller, J. Chem. Phys., 21, 1087 (1953).
145. T. N. Hart, S. R. Ness, and R. J. Read, Proteins (Suppl.), 205 (1997).
146. T. N. Hart and R. J. Read, Proteins, 13, 206 (1992).
147. M. Liu and S. Wang, J. Comput.-Aided Mol. Des., 13, 435 (1999).
148. Z. Li and H. A. Scheraga, Proc. Natl. Acad. Sci. USA, 84, 6611 (1987).
149. R. Abagyan and P. Argos, J. Mol. Biol., 225, 519 (1992).
150. A. Caflisch, S. Fischer, and M. Karplus, J. Comput. Chem., 18, 723 (1997).
151. J. Apostolakis, A. Plueckthun, and A. Caflisch, J. Comput. Chem., 19, 21 (1998).
152. C. McMartin and R. S. Bohacek, J. Comput.-Aided Mol. Des., 11, 333 (1997).
153. M. Karplus and J. A. McCammon, Annu. Rev. Biochem., 52, 263 (1983).
154. W. F. van Gunsteren and A. E. Mark, Eur. J. Biochem., 204, 947 (1992).
155. W. F. van Gunsteren, P. H. Huenenberger, A. E. Mark, P. E. Smith, and I. G. Tironi, Comput. Phys. Commun., 91, 305 (1995).
156. D. Rognan, Perspect. Drug Discov. Des., 11, 181 (1998).
157. M. E. Tuckerman and G. J. Martyna, J. Phys. Chem. B, 104, 159 (2000).
158. D. L. Beveridge and F. M. DiCapua, Annu. Rev. Biophys. Biophys. Chem., 18, 431 (1989).
159. D. A. Pearlman and P. A. Kollman in W. F. van Gunsteren and P. K. Weiner, Eds., Computer Simulation of Biomolecular Systems: Theoretical and Experimental Applications, Vol. 1, ESCOM, Leiden, The Netherlands, 1989, p. 101.
160. T. P. Straatsma and J. A. McCammon, Annu. Rev. Phys. Chem., 43, 407 (1992).
161. P. A. Kollman, Chem. Rev., 93, 2395 (1993).
162. T. P. Straatsma in K. B. Lipkowitz and D. B. Boyd, Eds., Reviews in Computational Chemistry, Vol. 9, VCH, New York, 1996, p. 81.
163. M. R. Reddy, M. D. Erion, and A. Agarwal in K. B. Lipkowitz and D. B. Boyd, Eds., Reviews in Computational Chemistry, Vol. 16, Wiley-VCH, New York, 2000, p. 217.
164. A. Di Nola, D. Roccatano, and H. J. Berendsen, Proteins, 19, 174 (1994).
165. M. Mangoni, D. Roccatano, and A. Di Nola, Proteins, 35, 153 (1999).
166. M. Vieth, J. D. Hirst, B. N. Dominy, H. Daigler, and C. L. Brooks, J. Comput. Chem., 19, 1623 (1998).
167. Y. Pak and S. Wang, J. Phys. Chem. B, 104, 354 (2000).
168. Y. Pak, I. J. Enyedy, J. Varady, J. W. Kung, P. S. Lorenzo, P. M. Blumberg, and S. Wang, J. Med. Chem., 44, 1690 (2001).
169. B. A. Luty, Z. R. Wasserman, P. W. F. Stouten, C. N. Hodge, M. Zacharias, and J. A. McCammon, J. Comput. Chem., 16, 454 (1995).
170. A. Miranker and M. Karplus, Proteins, 11, 29 (1991).
171. A. Caflisch, A. Miranker, and M. Karplus, J. Med. Chem., 36, 2142 (1993).
172. R. Judson in K. B. Lipkowitz and D. B. Boyd, Eds., Reviews in Computational Chemistry, Vol. 10, VCH, New York, 1997.
173. D. E. Goldberg, Genetic Algorithms in Search, Optimization, and Machine Learning, Addison-Wesley, Reading, MA, 1989.
174. L. Davis, Handbook of Genetic Algorithms, Van Nostrand Reinhold, New York, 1991.
175. D. E. Clark and D. R. Westhead, J. Comput.-Aided Mol. Des., 10, 337 (1996).
176. G. Jones, P. Willett, and R. C. Glen, J. Mol. Biol., 245, 43 (1995).
177. G. Jones, P. Willett, R. C. Glen, A. R. Leach, and R. Taylor, J. Mol. Biol., 267, 727 (1997).
178. J. S. Taylor and R. M. Burnett, Proteins, 41, 173 (2000).
179. B. R. Brooks, R. E. Bruccoleri, B. D. Olafson, D. J. States, S. Swaminathan, and M. Karplus, J. Comput. Chem., 4, 187 (1983).
180. R. S. Judson, E. P. Jaeger, and A. M. Treasurywala, Theochem. J. Mol. Struct., 114, 191 (1994).
181. K. P. Clark and Ajay, J. Comput. Chem., 16, 1210 (1995).
225. P. Tao and L. Lai, J. Comput.-Aided Mol. Des., 15, 429 (2001).
226. H. Gohlke, M. Hendlich, and G. Klebe, J. Mol. Biol., 295, 337 (2000).
227. O. Roche, R. Kiyama, and C. L. Brooks, 3rd, J. Med. Chem., 44, 3592 (2001).
228. C. A. Sotriffer, H. H. Ni, and J. A. McCammon, J. Am. Chem. Soc., 122, 6136 (2000).
229. S. Ha, R. Andreani, A. Robbins, and I. Muegge, J. Comput.-Aided Mol. Des., 14, 435 (2000).
230. C. Bissantz, G. Folkers, and D. Rognan, J. Med. Chem., 43, 4759 (2000).
231. J. S. Dixon, Proteins (Suppl.), 198 (1997).
232. N. C. Strynadka, M. Eisenstein, E. Katchalski-Katzir, B. K. Shoichet, I. D. Kuntz, R. Abagyan, M. Totrov, J. Janin, J. Cherfils, F. Zimmerman, A. Olson, B. Duncan, M. Rao, R. Jackson, M. Sternberg, and M. N. James, Nat. Struct. Biol., 3, 233 (1996).
233. B. Kramer, M. Rarey, and T. Lengauer, Proteins (Suppl.), 221 (1997).
234. V. Sobolev, T. M. Moallem, R. C. Wade, G. Vriend, and M. Edelman, Proteins (Suppl.), 210 (1997).
235. H. Kubinyi, Ed., 3D QSAR in Drug Design. Theory, Methods, and Applications, ESCOM, Leiden, The Netherlands, 1993.
236. H. Kubinyi, G. Folkers, and Y. C. Martin, Eds., 3D QSAR in Drug Design. Recent Advances, Vol. 3, Kluwer/ESCOM, Dordrecht, The Netherlands, 1998.
237. H. Kubinyi, QSAR, Hansch Analysis and Related Approaches, VCH, Weinheim, Germany, 1993.
238. W. Sippl, J. Comput.-Aided Mol. Des., 14, 559 (2000).
239. C. L. Waller, T. I. Oprea, A. Giolitti, and G. R. Marshall, J. Med. Chem., 36, 4152 (1993).
240. A. M. Gamper, R. H. Winger, K. R. Liedl, C. A. Sotriffer, J. M. Varga, R. T. Kroemer, and B. M. Rode, J. Med. Chem., 39, 3882 (1996).
241. R. C. Wade in H. D. Hoeltje and W. Sippl, Eds., Rational Approaches to Drug Design. Proceedings of the 13th European Symposium on Quantitative Structure-Activity Relationships, Prous Science, Barcelona/Philadelphia, 2001, p. 23.
242. A. R. Ortiz, M. T. Pisabarro, F. Gago, and R. C. Wade, J. Med. Chem., 38, 2681 (1995).
243. R. C. Wade, A. R. Ortiz, and F. Gago, Perspect. Drug Discov. Des., 9, 19 (1998).
244. T. Wang and R. C. Wade in H. D. Hoeltje and W. Sippl, Eds., Rational Approaches to Drug Design. Proceedings of the 13th European Symposium on Quantitative Structure-Activity Relationships, Prous Science, Barcelona/Philadelphia, 2001, p. 78.
245. G. Cruciani and K. A. Watson, J. Med. Chem., 37, 2589 (1994).
246. W. Sippl and H. D. Hoeltje, J. Mol. Struct.-Theochem., 503, 31 (2000).
247. M. Vieth and D. J. Cummins, J. Med. Chem., 43, 3020 (2000).
248. M. Wojciechowski and J. Skolnick, J. Comput. Chem., 23, 189 (2002).
249. S. Moro, A. H. Li, and K. A. Jacobson, J. Chem. Inf. Comput. Sci., 38, 1239 (1998).
250. S. Moro, D. Guo, E. Camaioni, J. L. Boyer, T. K. Harden, and K. A. Jacobson, J. Med. Chem., 41, 1456 (1998).
251. R. Kiyama, Y. Tamura, F. Watanabe, H. Tsuzuki, M. Ohtani, and M. Yodo, J. Med. Chem., 42, 1723 (1999).
252. C. A. Sotriffer, W. Flader, A. Cooper, B. M. Rode, D. S. Linthicum, K. R. Liedl, and J. M. Varga, Biophys. J., 76, 2966 (1999).
253. A. Schafferhans and G. Klebe, J. Mol. Biol., 307, 407 (2001).
254. S. Kearsley and G. Smith, Tetrahedron Comput. Methodol., 3, 615 (1990).
255. G. Klebe, T. Mietzner, and F. Weber, J. Comput.-Aided Mol. Des., 8, 751 (1994).
256. G. Klebe, T. Mietzner, and F. Weber, J. Comput.-Aided Mol. Des., 13, 35 (1999).
257. P. A. Kollman, Acc. Chem. Res., 29, 461 (1996).
258. M. L. Lamb and W. L. Jorgensen, Curr. Opin. Chem. Biol., 1, 449 (1997).
259. M. K. Gilson, J. A. Given, and M. S. Head, Chem. Biol., 4, 87 (1997).
260. J. Aqvist, C. Medina, and J. E. Samuelsson, Protein Eng., 7, 385 (1994).
261. T. Hansson, J. Marelius, and J. Aqvist, J. Comput.-Aided Mol. Des., 12, 27 (1998).
262. J. Aqvist, V. B. Luzhkov, and B. O. Brandsdal, Acc. Chem. Res., 35, 358 (2002).
263. R. C. Rizzo, J. Tirado-Rives, and W. L. Jorgensen, J. Med. Chem., 44, 145 (2001).
264. A. E. Mark and W. F. van Gunsteren, J. Mol. Biol., 240, 167 (1994).
265. D. Williams and B. Bardsley, Perspect. Drug Discov. Des., 17, 43 (1999).
266. P. R. Andrews, D. J. Craik, and J. L. Martin, J. Med. Chem., 27, 1648 (1984).
267. H. J. Schneider, Chem. Soc. Rev., 23, 227 (1994).
268. T. J. Stout, C. R. Sage, and R. M. Stroud, Structure, 6, 839 (1998).
269. M. K. Holloway, J. M. Wai, T. A. Halgren, P. M. Fitzgerald, J. P. Vacca, B. D. Dorsey, R. B. Levin, W. J. Thompson, L. J. Chen, and S. J. deSolms, J. Med. Chem., 38, 305 (1995).
270. P. D. J. Grootenhuis and P. J. M. van Galen, Acta Cryst. D, 51, 560 (1995).
271. S. J. Weiner, P. A. Kollman, D. A. Case, U. C. Singh, C. Ghio, G. Alagona, S. Profeta, and P. Weiner, J. Am. Chem. Soc., 106, 765 (1984).
272. S. J. Weiner, P. A. Kollman, D. T. Nguyen, and D. A. Case, J. Comput. Chem., 7, 230 (1986).
273. E. C. Meng, B. K. Shoichet, and I. D. Kuntz, J. Comput. Chem., 13, 505 (1992).
274. M. Vieth, J. D. Hirst, A. Kolinski, and C. L. Brooks, J. Comput. Chem., 19, 1612 (1998).
275. N. Majeux, M. Scarsi, J. Apostolakis, C. Ehrhardt, and A. Caflisch, Proteins, 37, 88
276. B. K. Shoichet, A. R. Leach, and I. D. Kuntz, Proteins, 34, 4 (1999).
277. W. C. Still, A. Tempczyk, R. C. Hawley, and T. Hendrickson, J. Am. Chem. Soc., 112, 6127
278. X. Zou, Y. Sun, and I. D. Kuntz, J. Am. Chem. Soc., 121, 8033 (1999).
279. M. L. Lamb, K. W. Burdick, S. Toba, M. M. Young, A. G. Skillman, X. Zou, J. R. Arnold, and I. D. Kuntz, Proteins, 42, 296 (2001).
280. M. Schapira, M. Totrov, and R. Abagyan, J. Mol. Recognit., 12, 177 (1999).
281. T. Zhang and D. E. Koshland, Jr., Protein Sci.,
282. P. H. Hunenberger, V. Helms, N. Narayana, S. S. Taylor, and J. A. McCammon, Biochemistry, 38, 2358 (1999).
283. D. A. Pearlman and P. S. Charifson, J. Med. Chem., 44, 502 (2001).
284. D. A. Pearlman, J. Med. Chem., 42, 4313
285. J. Bostrom, P. O. Norrby, and T. Liljefors, J. Comput.-Aided Mol. Des., 12, 383 (1998).
286. M. Vieth, J. D. Hirst, and C. L. Brooks, 3rd, J. Comput.-Aided Mol. Des., 12, 563 (1998).
287. M. Stahl and M. Rarey, J. Med. Chem., 44,
288. I. D. Kuntz, K. Chen, K. A. Sharp, and P. A. Kollman, Proc. Natl. Acad. Sci. USA, 96, 9997
289. J. D. Hirst, Curr. Opin. Drug Disc. Dev., 1, 28
290. R. M. A. Knegtel and P. D. J. Grootenhuis, Perspect. Drug Discov. Des., 9-11, 99 (1998).
291. T. I. Oprea and G. R. Marshall, Perspect. Drug Discov. Des., 9-11, 3 (1998).
292. H. J. Boehm and M. Stahl, Med. Chem. Res., 9, 445 (1999).
293. R. Wang, L. Liu, L. Lai, and Y. Tang, J. Mol. Model., 4, 379 (1998).
294. H. J. Boehm, J. Comput.-Aided Mol. Des., 8, 243 (1994).
295. C. W. Murray, T. R. Auton, and M. D. Eldridge, J. Comput.-Aided Mol. Des., 12, 503 (1998).
296. H. J. Boehm, J. Comput.-Aided Mol. Des., 12, 309 (1998).
297. A. N. Jain, J. Comput.-Aided Mol. Des., 10, 427 (1996).
298. D. N. Boobbyer, P. J. Goodford, P. M. McWhinnie, and R. C. Wade, J. Med. Chem., 32, 1083 (1989).
299. M. Stahl, Perspect. Drug Discov. Des., 20, 83 (2000).
300. H. J. Boehm, D. W. Banner, and L. Weber, J. Comput.-Aided Mol. Des., 13, 51 (1999).
301. M. Matsumara, W. J. Becktel, and B. W. Matthews, Nature, 334, 406 (1988).
302. V. Nauchitel, M. C. Villaverde, and F. Sussman, Protein Sci., 4, 1356 (1995).
303. L. Wesson and D. Eisenberg, Protein Sci., 1, 227 (1992).
304. S. Vajda, Z. Weng, R. Rosenfeld, and C. DeLisi, Biochemistry, 33, 13977 (1994).
305. C. Zhang, G. Vasmatzis, J. L. Cornette, and C. DeLisi, J. Mol. Biol., 267, 707 (1997).
306. S. Miyazawa and R. L. Jernigan, Macromolecules, 18, 534 (1985).
307. R. D. Head, M. L. Smythe, T. I. Oprea, C. L. Waller, S. M. Green, and G. R. Marshall, J. Am. Chem. Soc., 118, 3959 (1996).
308. E. C. Meng, I. D. Kuntz, D. J. Abraham, and G. E. Kellogg, J. Comput.-Aided Mol. Des., 8, 299 (1994).
309. H. Gohlke and G. Klebe, Curr. Opin. Struct. Biol., 11, 231 (2001).
310. M. J. Sippl, J. Comput.-Aided Mol. Des., 7, 473 (1993).
311. G. Verkhivker, K. Appelt, S. T. Freer, and J. E. Villafranca, Protein Eng., 8, 677 (1995).
312. A. Wallqvist, R. L. Jernigan, and D. G. Covell, Protein Sci., 4, 1881 (1995).
313. R. S. DeWitte and E. I. Shakhnovich, J. Am.
Chem. Soc., 118,11733 (1996).
Docking and Scoring Functions/Virtual Screening
Bioinformatics: Its Role in Drug Discovery
ferent training and background and thus offer a range of opinions on similar pieces of information. Even the word "activity" means different things to a chemist who has synthesized a group of compounds or a biologist who has developed an assay to test the compounds. Both aspects are necessary for the discovery of new drugs, but they are different viewpoints that need to be supported by appropriate relationship mining in the data. If the bioinformatics job is done well, both views can be accommodated in the data structures and user interfaces used by both sets of users.

Throughout the pharmaceutical industry, bioinformatics and chemoinformatics groups are working closer together than has been the case hitherto. This is a consequence of the realization that managing data effectively requires integration of thinking (about definitions of common attributes of molecules both small and large), integration of processes, and integration of implementation. The recent rise in popularity in bioinformatics of the ontology is an example of the application of a computer science paradigm to the issue of redundancy in nomenclature in many areas of biology. Application across the chemistry-biology domain interface could well be beneficial for drug discovery effectiveness. The ontology is simply a means to an end, in this instance, that end is improved communication and understanding of basic concepts within and across the boundaries of major scientific disciplines. There may, of course, be a variety of other means to reach that goal.

3 WHAT IS BIOINFORMATICS?

3.1 Definitions

Concisely, bioinformatics is our ability to organize biological data. From another perspective, bioinformatics is our ability to understand how biological information is organized. From this understanding should spring an enhanced view of the interactions between biological molecules. This should, in turn, inform our search strategies for new small molecules that will modulate the behaviour of biological molecules to give a beneficial therapeutic effect. These definitions arise from observation of the way diverse skills are brought to bear in attempting to answer biological questions. They also stress the importance of organizing and understanding biological data, rather than linking these aspects strongly to specific hardware or software implementations. Use of computers may be involved in the process but the definitions are not limited by the application of any particular technology.

Bioinformatics has also been defined as the application of computer technology to solving biological problems. This definition, perhaps what some would consider to be the canonical one, is broad but restricts the scope of the definition to problems to which computer technology can be applied.

3.2 Integration of Information

Bioinformatics has become a byword for integration; specifically the integration of data across different data resources to generate integrated information resources. Linking data and information in this way is fundamental to bioinformatics activities and so some discussion of the meaning of data, information, and knowledge in the context of bioinformatics for drug discovery is provided in Section 6. Integration is important because it provides context, or at least a background, against which computational analyses are performed. In the past, for single molecule experiments, this background was achieved through reading the literature. Now that multiple molecule experiments are common, even genome-wide or inter-genome analyses, it is simply not practical any longer to rely on the literature in its raw form, unless it is part of an integrated knowledge-based approach that provides connections between disparate pieces of information, backed up by experimental evidence from which to draw conclusions (12).

3.3 Bioinformatics and Skills

The pursuit of bioinformatics involves a number of different skills. Organizing, storing, retrieving, and querying sets of biological data are techniques that lie at the heart of the subject. An ability to analyze the characteristics of particular sets of biological data is fundamental. The translation of those characteristics into electronic representations that can be organized on a large scale is the domain of the
bioinformatics software and database developer. The process of analyzing and understanding biological data using the tools available is the domain of the bioinformatics analyst. When new tools are in the course of development, substantial interaction between the two skill sets is essential.

In the pharmaceutical environment, both developer and analyst skills are necessary. This is so even where commercial software is in use, because there is no single system available commercially that provides the level of integration between the worlds of bio- and chemoinformatics necessary to effectively enhance the drug discovery process. Some interfacing of different systems is required and the warehousing of proprietary data is always an issue.

This broad description of bioinformatics and of the two types of bioinformatics scientist is quite abstract. It does not detail the characteristics of the data with which the bioinformatics scientist has to work. Neither does it define the set of tools that the developer should work with or implement. There was a time, in the late 1980s and early 1990s, when the type of data was well defined. Molecular sequence data, the stream of bases in DNA, and the stream of residues at the protein level were the main types of data. Programmers developed code in FORTRAN or C and scripting languages were immature.

Now, as science moves forward into a new millennium, additional types of data have become important; for example, protein-protein interactions and three-dimensional structure, high density gene expression chips, cell imaging, etc. Developers have a wide range of tools to call on, including high performance C and C++ compilers, rich scripting languages (Perl, Python, etc.), and efficient, easily accessible operating systems (particularly Linux) that make porting software to different hardware platforms less of an issue than it was.

Of equal importance to the medicinal chemist, to whom this review is principally directed, is the impact of bioinformatics on the discovery of new medicines. Rather than explain comprehensively all the popular tools and their underlying algorithms, this review focuses on the points in the discovery research process where bioinformatics is making an impact. Technologies will be described to the extent that such understanding is necessary to grasp the relevance of the data being generated and its significance.

3.4 Standardization

Progress in linking items of relevant data and generating integrated information resources would be very limited were it not for efforts in standardization that have been brought about by international collaboration. There is still a long way to go, however. While it is becoming cheaper to obtain each piece of individual data, the proportion of automated experiments is increasing, at least in the life sciences, because of the ready availability of new technologies. It may seem a simple matter to create resources that store and manage streams of DNA bases, represented by the four alphabetic characters A, C, T, and G. However, when we also wish to integrate information on experimentally or computationally determined annotation and cross-reference to other resources using gene names, there are significant problems. The literature abounds with synonyms for gene names and functions; even the labels given to specific cellular functions are not always clearly defined (13).

To be able to process data automatically, it has to be presented in a form that can be parsed by a computer program and must also include all the elements necessary to an understanding of the biological system under study. Reliable information systems should have source data of a consistently high quality to prevent application errors and enable integration into other biological information systems. Some progress is now being made towards consensus in gene naming through the work of the HUGO Gene Nomenclature Committee (see http://www.gene.ucl.ac.uk/nomenclature/). Many researchers now use this system as a source of unique gene names and descriptions in the published literature (14) and in commercial products (e.g., see http://www.biowisdom.com/). Standardizing vocabulary expressing the relationships between the complex network of gene functions is the work of the Gene Ontology (GO) project (see http://www.geneontology.org/).
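
The synonym problem described in Section 3.4 can be illustrated with a minimal Python sketch. It assumes a small hand-written alias table (the two entries below are illustrative, not an extract from the HUGO nomenclature resource), and the function names `build_lookup` and `normalize` are hypothetical. The idea is simply that every synonym found in the literature is mapped to one approved symbol before records from different resources are cross-referenced.

```python
# Illustrative alias table: approved symbol -> synonyms seen in the literature.
# These entries are hand-written for the example, not taken from a database.
ALIASES = {
    "KLK3": {"PSA"},
    "TP53": {"p53"},
}


def build_lookup(aliases):
    """Map every upper-cased synonym and approved symbol to the approved symbol."""
    lookup = {}
    for symbol, names in aliases.items():
        for name in {symbol, *names}:
            key = name.upper()
            # Refuse to guess when one synonym points at two different genes.
            if key in lookup and lookup[key] != symbol:
                raise ValueError(f"ambiguous alias {name!r}")
            lookup[key] = symbol
    return lookup


def normalize(name, lookup):
    """Return the approved symbol for a name, or None when it is unknown."""
    return lookup.get(name.strip().upper())


if __name__ == "__main__":
    lookup = build_lookup(ALIASES)
    for raw in ["PSA", " psa ", "p53", "unknown-gene"]:
        print(repr(raw), "->", normalize(raw, lookup))
```

Raising an error on an ambiguous synonym, rather than silently picking one gene, reflects the point made above that source data must be of consistently high quality before automated integration is attempted.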
Figure 8.1. Bar charts indicating the growth of GenBank from December 1992 to August 2001 in terms of (a) bases and (b) sequence entries. The release files indicate no release in February 1999. It is evident from the trends in both charts that while there has been explosive growth, particularly from December 1999 until about August 2000, growth is slowing. The base entry curve is showing a distinctly sigmoid shape.
Figure 8.2. A schematic illustrating the bioinformatics process required to create an online gene
index by collating data and then integrating related elements to generate value added information
through hyperlinking to online resources. Determination of phylogenetic relationships is a relatively
late stage in the process. EST analysis is only performed after phylogenetic relationships have been
determined because EST data does not cover the whole expressed sequence and may not therefore
cover regions that were included in the phylogenetic determination. This is a knowledge generation
phase because it is allowing placement of potential new targets within the context of a carefully
researched phylogenetic tree. Transfer of knowledge is intimately related to the environment in
which the results of analysis are made available, in this case as an online resource.
of genes themselves from genomic sequence is itself a non-trivial matter, especially where those genes are interrupted by non-coding regions (introns) and control regions (expression promoter sites). Functional genomics is the process of creating an understanding of the way genomes function through gene expression. Genes are expressed by a variety of mechanisms, not all of which are fully understood. We can, however, make some measurements of the results of gene expression at the transcript level, mRNA, and at the protein level. Several of the techniques that have been used to assist drug target discovery are presented in the following sections.

4.2 Expression Profiling for Target Discovery

Bioinformatics spans analysis in depth on small quantities of data through to expansive genomic scale analyses, which may be at a lesser level of detail. Historically, the expression of genes at the mRNA level or at the protein level has been a crucial tool for assessing the significance of specific classes of cells as targets. With the advent of fluorescence-based sequencing techniques and automated sequencing technology, it is now much quicker to generate sequence data on specific molecular targets than ever. Many researchers spend entire careers working on one target type or a restricted part of a target gene family. This approach has yielded many valuable targets for drug discovery. With the new technologies of molecular biology, it is now possible to survey targets in a variety of contexts; perhaps within different types of cells, cells treated with different agents, or even across entire genomes using chip technologies.

There are issues of interpretation of experimental design and results. Does mRNA expression mean anything at a quantitative level? Perhaps even a qualitative view of mRNA expression can be misleading. How is mRNA expression correlated with protein expression? In general, most drugs we discover are likely to interact with proteins and not mRNA, so some understanding of protein expression is an essential adjunct to our genomic knowledge. Hence the proteomics approaches described later. Our exploration of expression profiling begins with a study of mRNA transcript profiling using expressed sequence tags because this technique has led to rapid gene discovery that has, in turn, been able to assist with the annotation of genomic sequences. Then, we consider how whole genome expression profiles can provide a rich new source of data for bioinformatics analysis.

[Figure 8.3 schematic: sense strand of genomic DNA (genome) → transcription → mRNA with 5' UTR, coding sequence, and 3' UTR (transcriptome) → translation → protein (proteome).]

Figure 8.3. A schematic illustration of the relationships between levels of genomic information. Genomic DNA is contained in the nucleus of eukaryotic cells. In many species, including humans, information required to make up the coding sequence of a gene is split into exons (regions that are expressed) interrupted by introns (regions that are not expressed and are edited out of the message at the transcription step). At either end of the gene sequence are untranslated regions (UTRs). 5' and 3' refer to the orientation of the strand of DNA as defined by the sugar-phosphate backbone. The mRNA is the messenger RNA molecule generated by the process of transcription, which is itself mediated by a number of enzymes. The collection of mRNA transcripts that make up the mRNA expression profile of a cell is known as the transcriptome, although the term could also refer to the total possible mRNA transcripts achievable from a genome. Finally, translation of the mRNA occurs on the ribosome and protein sequence is produced, which folds into its final three-dimensional shape, a process that may be assisted by a number of different chaperone proteins. Any post-translational modifications are all part of the proteome, the collection of proteins that represent the expressed genome.

4.2.1 EST Profiling. An EST is a short, single sequence run collecting data over about 200-400 bases from a clone selected from a cDNA library. Typically, cDNA clone libraries contain 1-3 million clones. The library itself is created from mRNA extracted from tissue or refined cell populations. By making a random selection of several thousand clones from a cDNA library it is possible by sequencing ESTs to generate a rapid, if somewhat low resolution, survey of the types of genes represented by the library. The library in turn reflects the composition of genes that are expressed in the tissue or cell line from which it was constructed. Thus, we have a qualitative link between gene expression, at the mRNA level, and the sequence level analysis required for target identification, without the need to go through the full sequencing and validation process across the whole length of each clone. This is a very significant time and cost saving. One of the major issues of EST profiling has been the significance that can be
ascribed to expression levels through counting copies of ESTs. This issue is dealt with in some detail in Section 4.2.4.

4.2.2 Sequence Assembly for cDNA Cloning. To appreciate fully the speed advantage of sequencing tags, rather than fully validating the sequence of an entire clone of a gene, it is useful to step through a brief description of the cloning process from the point of view of a practitioner of bioinformatics.

Sequence assembly is the process of dealing with the bioinformatics of cloning and genomic sequencing (16, 17). When a gene is cloned, it is selected from a set of potential clones in a cDNA library. The gene is present as a piece of cDNA inserted into a cloning vector (a piece of circular DNA) that has been designed for the purpose of cloning. It is necessary to check that the cDNA indeed represents the sequence of the gene that has been cloned. To do this, DNA oligonucleotides are designed that will bind in a complementary fashion (hybridize) to the DNA of the cloning vector and also at 150- to 200-base intervals along the cDNA itself. These oligonucleotides are then extended by adding a base that is complementary to the cDNA insert by using a DNA polymerase. Different polymerases are available commercially that provide high fidelity reproduction of the cDNA insert. In fluorescence-based sequencing, a small proportion of the nucleotides available to the polymerase are fluorescent analogues. Incorporation of one of these into the oligonucleotide terminates the extension, resulting in a population of oligonucleotides of different lengths. These are separated by electrophoresis and the sequence determined.

When the sequence of each oligonucleotide has been determined, the strings of letters that represent the bases are assembled together to generate a full-length sequence of the cDNA that has actually been cloned. Errors in the base sequence can be resolved at this stage, and if necessary, mutagenesis experiments designed to correct any mistakes.

The bioinformatics process is intimately linked with the molecular biology techniques of cloning and sequencing. For target discovery, a very high degree of confidence in the sequence of the cDNA clone is required before the clone can be expressed and used in assay development. It can also be seen that this process is much lengthier than taking a single sequence read (a single oligonucleotide string) without correcting errors or considering coverage of the complete gene sequence.

4.2.3 Comparing ESTs with Databases. Bioinformatics provides the tools necessary to compare each EST with the databases of known genes and a hypothetical functional assignment may be made to a proportion (typically 40-50%) of all the ESTs from a sequencing run.

In this way a rich resource of tags for many clones from many diverse libraries has been built up in the public domain and in commercially available, proprietary databases. One particular approach that generated much interest in the 1990s was that advocated by Incyte. Here, the simple identification of a gene expressed through identification of its EST was not the primary goal. Instead, the approach was based on comparative transcript expression (the so-called "digital Northern"). Here the number of copies of each EST identified was calculated, giving counts for the numbers of each type of EST found in comparing normal with diseased tissue, for example. Subsequent techniques have focussed on more controlled experiments in which specific cell lines are treated with an agent and the expression of genes before administration is compared with the profile afterward. This is the basis of pharmacogenomics (18).

4.2.4 Statistics for Assessing Expression Level Significance. There are issues with approaches based on counting the number of copies of an EST observed in the output from a sequencing machine. First, the tissue or cell line must be of very high quality and the mRNA harvested in a timely manner because it degrades very quickly. Second, the process of preparing the cDNA library should enable the numbers of clones to be estimated as accurately as possible. Third, the random sampling for the sequencing runs must be controlled carefully so as not to introduce bias into the experiment. The mathematical model for evaluating the meaning of data from such experiments is not well worked out.

Comparison of the differences between the
overall profiles obtained from tag counting experiments could be performed using the traditional chi-squared (χ²) test. However, this is the wrong approach for experiments where the significance of differences between expression levels (i.e., tag counts) of individual genes is to be determined, for example, in diseased and normal tissue states (19). One of the issues in performing tag-sampling experiments is that the experiments themselves are usually not replicated. Thus, the dispersion of results cannot be used to estimate the SEs associated with each expression measurement. This eliminates the possibility of using standard tests of variance. Instead the Poisson distribution, which includes an implicit estimate of standard error, approximates random sampling of tags very well. Audic and Claverie (20) have proposed a significance test (see Equation 8.1) in which the sample size plays no part, so long as it is the same for both experiments, but only depends on the observed tag counts of the same gene from diseased, $g_d$, and normal, $g_n$, states:

$$p(g_d \mid g_n) = \frac{(g_n + g_d)!}{g_n!\, g_d!\, 2^{(g_n + g_d + 1)}} \qquad (8.1)$$

The equation has also been extended to cover the more practical case of different total numbers of tags. Thus, taking some data from Fannon (21) as an example, we can calculate values for the probability of certain genes expressed at different levels in normal prostate, stage B2 cancer, stage C cancer, and tissue from a benign prostatic hyperplasia (BPH) sample as shown in Table 8.2.

Table 8.2 Comparative EST Counts for Five Genes Sequenced from Normal Prostate, Stage B2 Cancer, Stage C Cancer, and Benign Prostatic Hyperplasia (BPH) cDNA Libraries

Gene          Normal Prostate  Stage B2 Cancer     Stage C Cancer      BPH             All Other Tissue
              Total            Tags  P             Tags  P             Tags  P
PSA           13               7     0.7-0.8       14    0.6-0.7       22    0.8-0.9   0
PAP           4                1     0.1-0.2       34    >0.999        9     0.7-0.8   1
HGK           1                7     >0.999        6     0.97-0.98     5     0.8-0.9   0
PSI           0                3     0.993-0.994   7     0.997-0.998   1     0.4-0.5   0
PS2           0                2     0.97-0.98     7     0.997-0.998   0     0-<0.1    0
Total clones  4500             1400                3400                4800            732,000

The tag counts are from Ref. 21. The P values are calculated according to Equation 8.1, modified for use with different total EST counts from the source libraries. The web URL http://igs-server.cnrs-mrs.fr/~audic/cgi-bin/winflat.pl was used to calculate the probability intervals. A P value nearer to 1 indicates that the differential expression is likely to be significant. While prostate specific antigen (PSA) and glandular kallikrein (HGK) have been proposed as prostate cancer markers, both PSI and PS2 are prostate specific. Thus, the down-regulation of PAP in stage B2 cancer is not significant using this test, whereas, the test shows its up-regulation in the BPH sample to be more significant. So, for lower changes in copy number, where more sensitivity is expected, this test of significance is a valuable tool.

The relationship between gene expression, mRNA level, and protein expression is complex and not one that can be gleaned from collecting copy number information in this type of experiment. Even with careful statistical analysis, such as that described above, the assumption that increases or decreases in copy number reflect real biologically significant events relies on the confidence with which we can compare a library made from one set of cells to a library made from a different set of cells. Thus, most transcript analysis experiments setting out to be quantitative end up simply as target identification exercises. A major goal of proteomics is to generate a factory-type approach to profiling protein level expression that more closely reflects the biological reality. The EST approach has been turned into an industrial scale process but has not been able to impact the drug discovery process significantly because of the biological limitations described and the lack of sound mathematical modeling of the whole process.

Expression experiments are measures of cell population averages, not the contents of individual cells, so it is important to consider to what extent all cells in the candidate population are in the same state (22). Whereas work in single-celled organisms may be more straightforward to control, work in multi-cellular organisms has the added complexity that expression measurements may involve contributions from cells derived from a variety of tissues. Furthermore, when taking into consideration mRNA copy number, it should be understood that absolute transcript abundance measurements do not completely measure mRNA concentration.

Although there was initially some concern that the use of ESTs was a shortcut to discovery of genes for the purposes of patenting and ring-fencing areas of research for profit, in fact, the substantial numbers of quality ESTs in the public domain have helped in the pinpointing of genes in genomic data and have contributed to the speed with which the human genome sequence was completed.

4.2.5 Genome-Wide Expression Analysis. A major step towards understanding how organisms work is the determination of the complete sequence of all genes in the genome. This remarkable goal has been achieved for a number of organisms, including Homo sapiens, the flowering plant Arabidopsis thaliana, the single celled yeast Saccharomyces cerevisiae, and a large number of bacteria. The analysis of the sequence data then becomes the issue. It is no trivial task even to locate the positions of all the genes in the human genome. Genes for which there are no homologs in the current sequence databases will take some time to elucidate. See Ref. 23 for a detailed analysis of this topic and then Refs. 24 and 25 for detailed studies on the human genomic sequence.

The three basic technologies for generation of genome-wide expression information are cDNA microarrays, high-density oligonucleotide arrays ("GeneChips"), and serial analysis of gene expression (SAGE) (22). These technologies are outlined in Table 8.3.

In terms of quantities of data, a single microarray experiment looking at 40,000 genes from 10 different samples, under 20 different conditions, produces at least 8,000,000 pieces of data (26). Chip technologies, though originally expensive because of the costs of chip fabrication, are now being used to contribute data to public domain databases and are

Table 8.3 Brief Descriptions of Three Technologies for Genomic Scale Transcript Profiling

Expression profiling technology: cDNA array chip
Brief description: Tens of thousands of cDNA clones of genes are placed onto a glass slide in a grid formation. Hybridisation of molecular probes (RNA extracts) to the clones is detected using a fluorescence system. By using two sets of probes, labelled with differently coloured fluorescent dyes, it is possible to assess expression differences.
Form of data generated: Fluorescence intensities and colours for each spot on the chip. The nature of the clones on the chip is known.

Expression profiling technology: High-density oligonucleotide arrays
Brief description: Arrays of oligonucleotides are synthesised directly onto the glass chip using special chemistries and light sensitive masking. This generates arrays of known sequences of fixed length. Probes are hybridised to the arrays and computational analysis is necessary to interpret the resulting patterns.
Form of data generated: An image of the entire chip is processed using specialised chip scanning software.

Expression profiling technology: Serial analysis of gene expression
Brief description: A sequence-based approach to the identification of differentially expressed genes through comparative analysis. Allows simultaneous analysis of sequences that derive from different cell populations or tissues. This is not a chip-based method. Identification of sequences relies on completeness of public sequence databases and, therefore, can only be used to analyse known genes.
Form of data generated: Sequence data for SAGE tags allows profiling of gene expression.
5 Databases, Tools, and Applications
EBI solely to perform this task. A computationally annotated supplement, TrEMBL (8), has been made available to make up for this deficiency. Nevertheless, computer annotation still has some way to go before it comes close to the level of competence of skilled human annotators. This is an area of active research (30).

Nevertheless, with the rapid generation of sequence data from genome scale experiments, more effective means of characterizing protein sequences and annotation are now required. The database has responded by improving labeling of annotation in both SWISS-PROT and TrEMBL and by adding more advanced and rigorous tagging of evidence for functional statements that have been made (31).

Whereas most patent sequences are available in the public domain for use in research and for commercial exploitation, there is a substantial body that are the subject of patent protection. It is often useful when conducting searches of sequence databases to be aware of the sequences that are patented because this may imply certain restrictions on the use to which these sequences can be put in a commercial context. The commercial repository is maintained by Derwent (Thomson Scientific), which generates the Geneseq database of patented sequences. This is a useful collection because it contains a broad historical collection as well as more recent examples, although the terms for a commercial license to use the database may be off-putting to some potential users. There are also patent sections of the GenBank/EMBL DNA databases, but these are of limited value because they contain only more recent sequence data.

5.2 Sequence Comparison

When dealing with the output of most experiments in target discovery the question "has this gene been seen before?" arises. The answer is, at first sight, straightforward: Compare the sequence obtained from the experimental output with all the known sequences and print the result.

Sequence comparison makes up a major part of the work of the bioinformatics analyst. It demands skill in operating the tools; for example, choosing the appropriate databases to search and selecting the appropriate search method, followed by insight and experience in assessing the meaning of the results of the search. A search query with a single previously known sequence is likely to return not only the match with itself but also a host of other matches at varying levels of similarity with the query sequence. This extra information can be very valuable in placing the query sequence in the context of many closely related sequences that make up the family of genes to which the query belongs. More distantly related sequence matches can potentially indicate genes with similar function, even if the match is relatively short and of low score.

The experienced analyst should be able to sort the significant matches from the uninteresting ones. Often, this type of experience is difficult, if not impossible with current technologies, to capture in a computer program. Rules that seem to work under some circumstances produce nonsensical results in others. As a result, many of the techniques used for current sequence comparison engines are heuristic rather than strictly algorithmic, that is, the rules that are implemented as part of the process for returning significant hits from the query database tend to produce the correct result but cannot be guaranteed to do so in all circumstances. For a fuller discussion of algorithms and heuristics, albeit outside the context of bioinformatics, see Ref. 32.

One of the key aspects of sequence comparison is the understanding of similarity when applied to molecular sequences. There are essentially two ways of considering this: simple residue identity and residue substitution. In this discussion, we consider the comparison of two protein sequences, but the process is the same for comparison of DNA or RNA sequences. Only the alphabet differs: it has 20 letters for protein sequences and 4 for DNA and RNA. By comparing residues at the same position in each sequence and counting up the number of identities we arrive at a score that can be expressed as a percentage match for the pair of sequences. The alternative method compares each pair of residues and looks up a score for that pair in a substitution table or scoring matrix. The summed score across the whole sequence length can again be expressed as a percentage match. The two sequences under comparison are, however, likely to be sufficiently different that equivalent residue positions are not in register when the two sequences are laid out, one on top of the other. In this situation, the sequences must be aligned with each other so that equivalent residue positions are in register to make the score meaningful. This may involve insertion of gaps into one or both sequences. The skill here is to create an alignment between the two sequences that reflects some biological reality; it is from this biological reality that we derive the notion of equivalent residue positions. These positions can be deduced from manual manipulation of the alignment on the basis of mutation data or other functional information using a suitable sequence editor (33), or perhaps from understanding the spatial layout of residues if structural data is available. In each of these cases, the resulting sequence alignment will reflect the manner in which equivalent residue positions have been determined; both methods have their place. A variety of methods have been developed for comparing pairs of sequences, including the basic classical methods of Needleman and Wunsch (34) and Smith and Waterman (35).

Extending these pairwise comparison methods to database searching has been carried out, and a plethora of hybrid methods and improvements have been made. The manner in which significant alignments are reported varies from implementation to implementation. Database searching by alignment in this way is computationally intensive and specialized computer hardware is often used to gain speed increases. Because the comparison of pairs of sequences takes place in an exhaustive manner, these types of database searching methods are considered to be the most sensitive. More modern methods of database searching look for shorter matches spread over the lengths of the query and database sequences, and then extend these matches until the score for the match falls below a threshold level. Lists of sequence matches returned are then aligned using a pairwise alignment technique to provide a match and score over the whole length of the comparison sequences. For an example of this type of approach, see FASTA (36). Such methods are readily implemented on standard computer hardware and thus are accessible as Internet resources or as local implementations on UNIX or Linux servers.

The most popular tool currently in use is BLAST (Basic Local Alignment Search Tool) (37) from the NCBI. BLAST is an example of a heuristic that attempts to optimize a specific similarity measure. The most recent revisions to the algorithm are gapped BLAST and PSI-BLAST (38), with improved accuracy for PSI-BLAST using composition-based statistics (39).

5.3 Phylogenomics and Gene Family Databases

Determining protein function from genomic sequences is a central goal of bioinformatics (40), and to achieve this goal, comparing single sequences against databases of DNA or protein sequences is a necessary bioinformatics skill. However, many such searches have already been carried out, and the results are available to analyze at a higher level of abstraction in the protein and gene family databases (9, 10, 14, 41-43). It is the relationships between sequences that form the basis of any gene family database. Many of the current databases did not set out to become gene family databases. However, application of the underlying methodology for defining gene families (whether based on blocks of conserved sequence alignment or on profiles representing entire sequences, or simple regular expressions) has resulted in a number of resources that are particularly valuable in placing drug discovery targets in their biological context.

The processes of evolution by natural selection imply that species are related to each other in a tree-structured hierarchy; but more than this, the history of sequence relationships during evolution is also significant. Organisms are defined by their genes, and their behavior is modified through environmental experience. The relationships between genes within a single organism indicate that genes and their protein products also fall into well-defined families. Protein phylogenetic profiling (40) and phylogenomic analysis (44) are methods that are valuable where functional assignment by sequence similarity alone is
Bioinformatics: Its Role in Drug Discovery
              251                                                300
pde1a-human   KLHYRWTMAL MEEFFLQGDK EAELGLP.FS PLCDRKSTM. VAQSQIGFID
pde1b-human   LVHSRWTKAL MEEFFRQGDK EAELGLP.FS PLCDRTSTL. VAQSQIGFID
pde1c-human   DLHHRWTMSL LEEFFRQGDR EAELGLP.FS PLCDRKSTM. VAQSQVGFID
pde2a-human   KTTRKIAELI YKEFFSQGDL E.KAMGNRPM EMMDREKA.Y IPELQISFME
pde3a-human   ELHLQWTDGI VNEFYEQGDE EASLGLP.IS PFMDR.SAPQ LANLQESFIS
pde3b-human   DLHLKWTEGI VNEFYEQGDE EANLGLP.IS PFMDR.SSPQ LAKLQESFIT
pde4a-human   ELYRQWTDRI MAEFFQQGDR ERERGME.IS PMCDKHTAS. VEKSQVGFID
pde4b-human   ELYRQWTDRI MEEFFQQGDK ERERGME.IS PMCDKHTAS. VEKSQVGFID
pde4c-human   PLYRQWTDRI MAEFFQQGDR ERESGLD.IS PMCDKHTAS. VEKSQVGFID
pde4d-human   QLYRQWTDRI MEEFFRQGDR ERERGME.IS PMCDKHNAS. VEKSQVGFID
pde5a-human   PIQQRIAELV ATEFFDQGDR ERKELNIEPT DLMNREKKNK IPSMQVGFID
pde6a-human   EVQSQVALLV AAEFWEQGDL ERTVLQQNPI PMMDRNKADE LPKLQVGFID
pde6b-human   EVQSKVALLV AAEFWEQGDL ERTVLDQQPI PMMDRNKAAE LPKLQVGFID
pde6c-human   EVQSQVALMV ANEFWEQGDL ERTVLQQQPI PMMDRNKRDE LPKLQVGFID
pde7a-human   ELSKQWSEKV TEEFFHQGDI EKKYHLG.VS PLCDRHTES. IANIQIGFMT
pde7b-human   EMSKQWSERV CEEFYRQGEL EQKFELE.IS PLCNQQKDS. IPSIQIGFMS
              QYCIEWAARI SEEYFSQTDE EKQQGLPVVM PVFDRNTCS. IPKSQISFID
              DLCIEWAGRI SEEYFAQTDE EKRQGLPVVM PVFDRNTCS. IPKSQISFID
              EVAEPWVDCL LEEYFMQSDR EKSEGLP.VA PFMDRDKVT. KATAQIGFIK
              PVTKLTANDI YAEFWAEGD. EMKKLGIQPI PMMDRDKKDE VPQGQLGFYN
              EISRQVAELV TSEFFEQGDR ERLELKLTPS AIFDRNRKDE LPRLQLEWID
Figure 8.5. Part of an alignment of catalytic domains of the human phosphodiesterase gene family.
Positions in the alignment where gaps have been introduced into a sequence to bring it into align-
ment with other sequences are indicated by "." characters.
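The two scoring schemes described in Section 5.2 (simple residue identity and residue substitution) can be tried directly on two rows of Figure 8.5. The following is a minimal sketch rather than a production method: the function names are ours, columns containing a gap are simply skipped, and the substitution groups and the scores +2, +1, and -1 are hypothetical stand-ins for a published scoring matrix such as PAM or BLOSUM.

```python
# Two rows copied from Figure 8.5; "." marks a gap position.
PDE4A = "ELYRQWTDRI MAEFFQQGDR ERERGME.IS PMCDKHTAS. VEKSQVGFID".replace(" ", "")
PDE4B = "ELYRQWTDRI MEEFFQQGDK ERERGME.IS PMCDKHTAS. VEKSQVGFID".replace(" ", "")

GAP = "."

# Hypothetical toy groups of chemically similar residues (an assumption made
# for illustration); a real analysis would look scores up in a full matrix.
CONSERVATIVE_GROUPS = [set("KRH"), set("DE"), set("ILVM"),
                       set("FYW"), set("ST"), set("NQ")]


def aligned_columns(a, b):
    """Yield residue pairs from columns where neither row has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned rows must have equal length")
    for x, y in zip(a, b):
        if x != GAP and y != GAP:
            yield x, y


def percent_identity(a, b):
    """Return (identical columns, compared columns, percentage match)."""
    pairs = list(aligned_columns(a, b))
    identical = sum(1 for x, y in pairs if x == y)
    percentage = 100.0 * identical / len(pairs) if pairs else 0.0
    return identical, len(pairs), percentage


def toy_pair_score(x, y):
    """Score one residue pair: identity, conservative change, or other."""
    if x == y:
        return 2
    if any(x in group and y in group for group in CONSERVATIVE_GROUPS):
        return 1
    return -1


def substitution_score(a, b):
    """Sum the toy pair scores over all gap-free columns."""
    return sum(toy_pair_score(x, y) for x, y in aligned_columns(a, b))


identical, compared, pct = percent_identity(PDE4A, PDE4B)
print(f"{identical}/{compared} identical columns ({pct:.1f}%)")
print("toy substitution score:", substitution_score(PDE4A, PDE4B))
```

The two rows differ in two columns of the second block (A/E and R/K). Identity scoring treats both as plain mismatches, whereas the toy substitution score penalizes the first and gives partial credit to the second, which is the distinction the text draws between the two approaches.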
family using all the sequence information available. The principal value of this resource is that it presents patterns for recognition of gene families that are relatively simple to understand. The downside is that the use of such patterns can produce both true positive hits (members correctly predicted) and false positive hits (members incorrectly predicted). PROSITE lists true and false positives for searches performed in the production of a release of the database, but it is as well to be aware that when the patterns are used in isolation, there is often a false positive hit rate that must be taken into account by reconciling the results of a pattern search with the results of database annotations or other pattern recognition methods.

The PRINTS system (9, 10, 50, 51) is an approach based on an examination of core regions of un-gapped sequence conservation within a set of aligned sequences (multiple sequence alignment). The method rigorously builds up fingerprints for a gene family through use of an iterative database searching technique allied to intelligently applied sequence alignment. The fingerprints themselves can then be used to diagnose new gene family members in novel sequence data or can be used to identify modules of functional sequence across different gene families.

One of the issues in using different databases of gene family information is that definitions of which genes belong to which gene families can vary depending on the method used. Apweiler et al. have undertaken a useful effort at rationalizing and integrating family database annotation at the EBI in the InterPro resource (52). The databases that make up the membership of the InterPro consortium are PROSITE (49), PRINTS (9), Pfam (53), ProDom (54), and SMART (55). InterProScan is a tool that enables scanning of individual protein sequences against the InterPro member databases (56).

6 THE BIOINFORMATICS KNOWLEDGE MODEL

Up to this point, we have discussed sources of data and means of manipulating and comparing data elements (in terms of sequences, alignments, gene families, etc.), but the end point of all this analytical process must be the acquisition of knowledge. It is through increased understanding that sound decisions can be made in applying the results of bioinformatics analyses to application areas, such as drug discovery. So, in this section we consider the relationships between data, information and knowledge, which are frequently regarded as poor relations to laboratory-based experimental data acquisition. However, as drug discovery organizations, including large pharmaceutical and smaller biotechnical companies, develop a significant history of assays, screens, and leads, it is vital to have strong internal support for managing data flows, integrating related data into information systems, and transforming knowledge thus gleaned into tangible benefits.

6.1 Data, Information, and Knowledge

According to the University of California at Berkeley (57), it has taken 300,000 years for humankind to accumulate 12 exabytes of data (see footnote 2). It will take just 2.5 more years to create the next 12 exabytes. (An exabyte is 1,000,000,000,000,000,000 bytes or a billion gigabytes.) This is a truly unimaginable amount of data, equivalent to the data stored on a pile of floppy disks 24 miles high. It is the rate of accumulation of data that is the key point of interest, however, and the fact that it is accelerating.

Footnote 2. In fact, the referred study uses the word "information." However, within the usage of this article "data" is a more appropriate term.

It is crucial to distinguish between the terms data, information, and knowledge so that we can think clearly about the goal of data accumulation in our own industry sector. There are two views: a tiered hierarchical view and a more formally correct scientific view.

6.2 The Hierarchical View

In the hierarchical view, data is the bottom rung of a ladder leading to the accumulation of information that leads, ultimately, to an increase in knowledge. Apply this hierarchical principle to an everyday example of taking this article to the photocopier: the data represented by the article is the sequence of strokes and dots on the page that make up the page image. The page image is the information represented by the article. The knowledge element only comes later when the observer actually reads and understands the article. Compare this with the act of photocopying a research article, a process that does not in itself add to understanding on the part of either the photocopier or of the researcher. The acquisition of knowledge implies an active relationship between author and recipient of the information. In this intuitive sense, we know that the hierarchical view works to some extent as a model of the way in which some knowledge is acquired.

6.3 The Scientific View

The second view is the scientific one (58). Here, we start with the piece of information that we are trying to understand, perhaps a gene whose function we plan to determine. Experiments are designed and performed to determine the characteristics of the function of the gene; such experiments yield data that describe aspects of the information. Knowledge comes from understanding and interpreting the results of the experiments. Again, knowledge is accumulated as part of an active relationship between the data describing the information and the investigator reviewing the data and drawing conclusions about the state of the information. Gene function is itself a complicated concept because the functions of gene products can rarely be assessed in isolation, owing to the network of interactions in which most genes are involved. A collection of sequence data, collected at the DNA or protein level, describes the molecular structure of a gene or its product at a primary level; it is not, however, a complete description. There are other biochemical factors to be considered; for example, proteins that assist in the folding process to create an active three-dimensional molecule, post-translational modifications, glycosylation, interactions with other molecules to generate a higher-level function, etc.

6.4 Data is Not Knowledge

Simply increasing the amount of data in the genomic universe does not necessarily increase the speed of knowledge acquisition. In short, data is not knowledge. Knowledge itself requires understanding and demands the active participation of the one acquiring the knowledge.
lated) molecular functions. In fact the variety of types of fold taken up by polypeptides is thought to be quite limited [SCOP (62) and CATH (63)]. Discovering a useful relationship between folding topology and sequence, which can be used to predict folding accurately is, however, not trivial. By comparing the structures newly determined from structural genomics initiatives with structures already deposited in the Protein Data Bank, it may be possible to extend the inference of molecular function further than that achieved from sequence comparisons alone. Once the molecular function has been characterized in this computational way, we may begin to postulate the cellular function of the protein under analysis.

A neural network was trained to predict protease function with 86% accuracy in a test set. Neural networks are an example of a technique used in bioinformatics for generating a predictive program from a set of weights that can be applied in a learning tool. The tool is trained by using parameters that show discrimination between, in this case, proteases and non-proteases. In this example, 36 proteases were tested. Each protease in turn was used as a test example, the network being trained using the remaining 35 proteases. In 31 of 36 cases (86%), the network was able to identify the remaining protease. By performing the same test on 258 counter-examples, 87% were correctly classified as non-proteases.
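The testing scheme just described, in which each example is held out in turn while the predictor is trained on the rest, is commonly called leave-one-out validation. The sketch below illustrates the procedure only. It is not the neural network of the cited study: a nearest-centroid rule stands in for the trained network, the one-dimensional feature values are made up, and all function names are ours.

```python
def nearest_centroid_predict(train, query):
    """Predict the label whose mean training value lies closest to the query."""
    sums, counts = {}, {}
    for value, label in train:
        sums[label] = sums.get(label, 0.0) + value
        counts[label] = counts.get(label, 0) + 1
    centroids = {label: sums[label] / counts[label] for label in sums}
    return min(centroids, key=lambda label: abs(centroids[label] - query))


def leave_one_out_accuracy(examples, predict):
    """Hold out each example in turn, train on the rest, and count the hits."""
    correct = 0
    for i, (value, label) in enumerate(examples):
        training_set = examples[:i] + examples[i + 1:]
        if predict(training_set, value) == label:
            correct += 1
    return correct, len(examples)


# Made-up feature values, purely for illustration (not data from the study).
toy_examples = [
    (0.9, "protease"), (1.1, "protease"), (1.0, "protease"), (-0.3, "protease"),
    (-1.0, "non-protease"), (-0.8, "non-protease"), (-1.2, "non-protease"),
]

correct, total = leave_one_out_accuracy(toy_examples, nearest_centroid_predict)
print(f"toy leave-one-out result: {correct}/{total} correct")

# The rate quoted in the text: 31 of 36 held-out proteases identified.
print(f"reported rate: {100 * 31 / 36:.0f}%")
```

In the toy data one protease sits closer to the non-protease values and is misclassified when held out, which is the kind of case that lowers a leave-one-out score below 100%.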
quence identity. The subjective impression is that structure prediction is getting better year after year. This analysis, however, seems to suggest there is some way to go before reliable models can be generated for fold types not yet available in the structural databases.

their targets rather than merely exploiting biological assay systems as tools for drug discovery.

9 ACKNOWLEDGMENTS
12. M. Gerstein, Nat. Struct. Biol., 7 Suppl, 960-963 (2000).
13. A. Brazma, Bioinformatics, 17, 113-114 (2001).
14. J. Packer, E. Conley, N. Castle, D. Wray, C. January, and L. Patmore, Trends Pharmacol. Sci., 21, 327-329 (2000).
15. B. Destenaves and F. Thomas, Curr. Opin. Chem. Biol., 4, 440-444 (2000).
16. R. Staden, K. F. Beal, and J. K. Bonfield, Methods Mol. Biol., 132, 115-130 (2000).
17. R. Staden, D. P. Judge, and J. K. Bonfield, Methods Biochem. Anal., 43, 303-322 (2001).
18. D. S. Bailey, A. Bondar, and L. M. Furness, Curr. Opin. Biotechnol., 9, 595-601 (1998).
19. J. M. Claverie, Hum. Mol. Genet., 8, 1821-1832 (1999).
20. S. Audic and J. M. Claverie, Genome Res., 7, 986-995 (1997).
21. M. R. Fannon, Trends Biotechnol., 14, 294-298 (1996).
22. M. Gerstein and R. Jansen, Curr. Opin. Struct. Biol., 10, 574-584 (2000).
23. F. Sterky and J. Lundeberg, J. Biotechnol., 76, 1-31 (2000).
24. Science, 291, 1145-1434 (2001).
25. Nature, 409, 745-964 (2001).
26. A. Brazma, A. Robinson, G. Cameron, and M. Ashburner, Nature, 403, 699-700 (2000).
27. M. Gardiner-Garden and T. G. Littlejohn, Brief Bioinform., 2, 143-158 (2001).
28. A. D. Baxevanis, Nucleic Acids Res., 29, 1-10 (2001).
29. A. D. Baxevanis, Nucleic Acids Res., 30, 1-12 (2002).
30. A. G. Rust, E. Mongin, and E. Birney, Drug Discov. Today, 7, S70-S76 (2002).
31. R. Apweiler, Brief Bioinform., 2, 9-18 (2001).
32. W. D. Hillis, The Pattern on the Stone, Weidenfeld & Nicolson, London, 1998, pp. 77-90.
33. D. J. Parry-Smith, A. W. Payne, A. D. Michie, and T. K. Attwood, Gene, 221, GC57-GC63 (1998).
34. S. B. Needleman and C. D. Wunsch, J. Mol. Biol., 48, 443-453 (1970).
35. T. F. Smith and M. S. Waterman, J. Mol. Biol., 147, 195-197 (1981).
36. W. R. Pearson, Methods Mol. Biol., 132, 185-219 (2000).
37. S. F. Altschul, W. Gish, W. Miller, E. W. Myers, and D. J. Lipman, J. Mol. Biol., 215, 403-410 (1990).
38. S. F. Altschul, T. L. Madden, A. A. Schaffer, J. Zhang, Z. Zhang, W. Miller, and D. J. Lipman, Nucleic Acids Res., 25, 3389-3402 (1997).
39. A. A. Schaffer, L. Aravind, T. L. Madden, S. Shavirin, J. L. Spouge, Y. I. Wolf, E. V. Koonin, and S. F. Altschul, Nucleic Acids Res., 29, 2994-3005 (2001).
40. M. Pellegrini, E. M. Marcotte, M. J. Thompson, D. Eisenberg, and T. O. Yeates, Proc. Natl. Acad. Sci. USA, 96, 4285-4288 (1999).
41. E. L. Sonnhammer, S. R. Eddy, E. Birney, A. Bateman, and R. Durbin, Nucleic Acids Res., 26, 320-322 (1998).
42. E. L. Sonnhammer, S. R. Eddy, and R. Durbin, Proteins, 28, 405-420 (1997).
43. J. G. Henikoff, E. A. Greene, S. Pietrokovski, and S. Henikoff, Nucleic Acids Res., 28, 228-230 (2000).
44. J. A. Eisen, Genome Res., 8, 163-167 (1998).
45. G. Theissen, Nature, 415, 741 (2002).
46. J. Felsenstein, Annu. Rev. Genet., 22, 521-565 (1988).
47. J. Felsenstein, Methods Enzymol., 266, 418-427 (1996).
48. J. Packer and D. J. Parry-Smith, Curr. Drug Discov., March, 29-33 (2002).
49. L. Falquet, M. Pagni, P. Bucher, N. Hulo, C. J. Sigrist, K. Hofmann, and A. Bairoch, Nucleic Acids Res., 30, 235-238 (2002).
50. W. Wright, P. Scordis, and T. K. Attwood, Bioinformatics, 15, 523-524 (1999).
51. T. K. Attwood, M. J. Blythe, D. R. Flower, A. Gaulton, J. E. Mabey, N. Maudling, L. McGregor, A. L. Mitchell, G. Moulton, K. Paine, and P. Scordis, Nucleic Acids Res., 30, 239-241 (2002).
52. R. Apweiler, T. K. Attwood, A. Bairoch, A. Bateman, E. Birney, M. Biswas, P. Bucher, L. Cerutti, F. Corpet, M. D. Croning, R. Durbin, L. Falquet, W. Fleischmann, J. Gouzy, H. Hermjakob, N. Hulo, I. Jonassen, D. Kahn, A. Kanapin, Y. Karavidopoulou, R. Lopez, B. Marx, N. J. Mulder, T. M. Oinn, M. Pagni, F. Servant, C. J. Sigrist, and E. M. Zdobnov, Bioinformatics, 16, 1145-1150 (2000).
53. A. Bateman, E. Birney, L. Cerruti, R. Durbin, L. Etwiller, S. R. Eddy, S. Griffiths-Jones, K. L. Howe, M. Marshall, and E. L. Sonnhammer, Nucleic Acids Res., 30, 276-280 (2002).
54. F. Corpet, F. Servant, J. Gouzy, and D. Kahn, Nucleic Acids Res., 28, 267-269 (2000).
55. J. Schultz, R. R. Copley, T. Doerks, C. P. Ponting, and P. Bork, Nucleic Acids Res., 28, 231-234 (2000).
56. E. Zdobnov and R. Apweiler, Bioinformatics, 17, 847-848 (2001).
57. P. Lyman and H. R. Varian, How much information?, available online at http://www.sims.berkeley.edu/research/projects/how-much-info, accessed on September 12, 2002.
58. D. E. Knuth, Selected Papers on Computer Science, Cambridge University Press, Cambridge, UK, 1996.
59. J. M. Claverie, Science, 291, 1255-1257 (2001).
60. E. W. Stawiski, A. E. Baucom, S. C. Lohr, and L. M. Gregoret, Proc. Natl. Acad. Sci. USA, 97, 3954-3958 (2000).
61. M. Weir, M. Swindells, and J. Overington, Trends Biotechnol., 19, 61-66 (2001).
62. C. L. Lo, B. Ailey, T. J. Hubbard, S. E. Brenner, A. G. Murzin, and C. Chothia, Nucleic Acids Res., 28, 257-259 (2000).
63. C. A. Orengo, F. M. Pearl, J. E. Bray, A. E. Todd, A. C. Martin, C. L. Lo, and J. M. Thornton, Nucleic Acids Res., 27, 275-279 (1999).
64. M. Perutz, Protein Structure: New Approaches to Disease and Therapy, Freeman, New York, 1992, pp. 119-137.
65. R. T. Miller, D. T. Jones, and J. M. Thornton, FASEB J., 10, 171-178 (1996).
66. D. T. Jones, M. Tress, K. Bryson, and C. Hadley, Proteins, 37, 104-111 (1999).
67. C. Venclovas, A. Zemla, K. Fidelis, and J. Moult, Proteins, 45 (Suppl 5), 163-170 (2001).
CHAPTER NINE
Chemical Information Computing Systems in Drug Discovery
DOUGLAS R. HENRY
MDL Information Systems, Inc.
San Leandro, California
Contents
1 Introduction
  1.1 Motivation for Chemical Information Management
  1.2 Literature, References, Societies, and Research Groups
  1.3 Brief History of Chemical Data Management
    1.3.1 Pre-1980: Flat File Storage of Chemical Structures
    1.3.2 The 1980s: Flat Database Storage
    1.3.3 The 1990s: Relational Data Storage
    1.3.4 The 2000s
2 Chemical Representation
  2.1 Types of Chemical Entities
    2.1.1 Sequences
    2.1.2 2D Structures
    2.1.3 Reactions
    2.1.4 3D Models
    2.1.5 Mixtures
    2.1.6 Generic Structures
    2.1.7 Substances
    2.1.8 Search Queries
  2.2 Types of Chemical Representation
    2.2.1 Linear Notation
    2.2.2 Tabular Storage
    2.2.3 Graphical Representation
    2.2.4 Markup Languages
  2.3 Chemical Structure File Conversion
  2.4 Representing Nonstructural Chemical Data
3 Storing and Searching Chemical Structures and Reactions
  3.1 Storing Chemical Information in Databases
  3.2 Registering Chemical Information
    3.2.1 Extract the Data
    3.2.2 Cleaning and Transforming the Data
    3.2.3 Loading the Data
  3.3 Searching Chemical Structures and Reactions
    3.3.1 Exact Match Searching
    3.3.2 Substructure Searching
    3.3.3 Similarity Searching
    3.3.4 Reaction Searching
    3.3.5 Searching Other Data
  3.4 Chemical Information Management Systems and Databases
  3.5 Commercial Database Systems for Drug-Sized Molecules
  3.6 Sequence and 3D Structure Databases
  3.7 In-House Proprietary and Academic Database Systems
4 Chemical Property Estimation Systems
  4.1 Topological Descriptors
  4.2 Physicochemical Descriptors
  4.3 Absorption, Distribution, Metabolism, and Excretion Properties
  4.4 Property Calculations Online
5 Data Warehouses and Data Marts
  5.1 Data Warehouses of Chemical Information
  5.2 Data Marts of Chemical Information
6 Future Prospects
7 Glossary of Terms
8 Acknowledgments

Burger's Medicinal Chemistry and Drug Discovery
Sixth Edition, Volume 1: Drug Discovery
Edited by Donald J. Abraham
ISBN 0-471-27090-3 © 2003 John Wiley & Sons, Inc.
Figure 9.1. Traditional "serial" drug design costs. The drug discovery "funnel" typically shows about a 10-fold reduction at each stage in the process. A chemist who could produce 10-20 structures per week would be lucky to discover a single marketable drug in a 20- to 30-year career.
date information about commercially avail- informatics (or chemoinformatics) has re-
able and in-house structures, reactions, and cently become common to describe the
data. This chapter briefly describes the history acquisition, management, and use of chemical
ofthese systems, the current state of chemical information.
information management as it applies to drug
1.2 Literature, References, Societies,
discovery, and a look at future developments
and Research Groups
in the field. The coverage is primarily aimed at
corporate applications of chemical informa- The literature of chemistry is vast, and chem-
tion management, as practiced in the pharma- ical information management occupies a small
ceutical industry. The expanding use of micro- corner of this domain. The chemical informa-
computers running Microsoft Windows or tion literature overlaps that of computer sci-
Linux operating systems means that many of ence, database management, molecular mod-
the programs and database systems now used eling, QSAR, and even mathematics. The
in industry can also be installed and applied in primary journals that publish chemical infor-
academic settings. Much of the innovation in mation articles are the American Chemical So-
chemical information management comes ciety's Journal of Chemical Information and
from academia, whereas most of the applica- Computer Sciences, Kluwer's Journal of Com-
tion has been seen in industry. This review is puter-Aided Molecular Design, and Elsevier's
limited to the management and storage of Journal of Molecular Graphics and Modeling.
chemical structure information in databases. Less frequently, chemical information articles
Other chapters deal with the generation of appear in Wiley's Journal of Computational
this information (molecular modeling, prop- Chemistry, Quantitative Structure-Activity Re-
erty calculation) and with the use of the infor- lationships, and Journal of Chemometrics, the
mation in drug discovery (library design, dock- ACS Journal of Medicinal Chemistry and the
ing and structure-based drug design, and Journal of Organic Chemistry, and Elsevier's
QSAR). By analogy with another rapidly ex- Analytica Chimica Acta, Computers and
panding field, bioinformatics, the term chem- Chemistry, and Chemometrics and Intelligent
360 Chemical Information Computing Systems in Drug Discovery
Laboratory Systems. Other journals with articles on chemical information include the University of Bayreuth's Communications in Mathematical Chemistry (MATCH), Elsevier's Drug Discovery Today, ACS's Modern Drug Discovery, and a handful of newer periodicals (2).
The history of chemical information management has recently been catalogued online by the Chemical Heritage Foundation (3). The American Chemical Society has a Division of Chemical Information (CINF), and divisional symposia are held at national meetings of the ACS, often in conjunction with other divisions including Medicinal Chemistry, Computers in Chemistry, and Pesticide Chemistry. The Skolnik Award is given annually by the ACS Division of Chemical Information for "achievement in the areas of computerized information systems, chemical information, chemical indexing and notation systems, nomenclature, structure-activity relationships, and numerical data analysis and correlation." Herman Skolnik, who died in 1994, was the first recipient. He founded the Journal of Chemical Documentation, which became the Journal of Chemical Information and Computer Sciences, and he made many contributions to the field (4). Besides the ACS, other national and international meetings on chemical information include the Noordwijkerhout Conference on Chemical Structures (5), the Quantitative Structure-Activity Relationship Gordon Conference (6), and the International Conference on Chemical Information (7).
Except for journal articles and some conference proceedings, very recent general books on chemical information management are rather few in number. This is caused in part by the rapid changes in a field so closely tied to computer hardware and software development. Another reason for the paucity of texts is that most chemical information management systems are commercially developed and marketed, not widely used by universities, and in many cases, they use trademarked or even patented technology. Some texts of note in the last decade include several by Collier (8), Martin and Willett (9), and Warr and Suhr (10), one by Wiggins and Emry (11), and ones by Maizell (12) and Ash et al. (13), and a book on chemical searching by Ridley (14).
Most chemical information research and development is conducted by commercial software vendors and in-house at pharmaceutical firms. A small number of academic research groups study chemical information. The Computational Information Systems group at the University of Sheffield, under Peter Willett, has been very active in studying database searching (15). The Computer-Chemie-Centrum at the University of Erlangen under Johann Gasteiger focuses on organic structure representation and reaction classification (16). Numerous other academic groups are active in QSAR and modeling research, described in other chapters in this series.
In addition to the academic groups already mentioned, a number of online resources deal with chemical information management. Examples include the comprehensive CHEMINFO site at Indiana University (17), Cambridge Health Institute's Cheminformatics Glossary (18), the Chemical Structure Association (19), the Computational Chemistry List (CCL) (20), the Molecular Graphics and Modeling Society (21), the Open Molecule Foundation (22), the QSAR and Modeling Society (23), the Royal Society of Chemistry Chemical Information Group (24), and the UK QSAR and Cheminformatics Group (25).
1.3 Brief History of Chemical Data Management
The history of chemical information management parallels the history of computers. It can be roughly viewed in terms of decades of development (Fig. 9.2).
1.3.1 Pre-1980: Flat File Storage of Chemical Structures. Computers consisted of mainframe machines (e.g., IBM 3090) and small minicomputers (Digital, Prime). Users connected through low speed serial connections, using "dumb" terminals (no graphics capability) or monochrome vector graphics terminals such as Tektronix and Imlac. Chemical structures were mainly stored as either (1) individual structure files, indexed by name, and handled one or a few structures at a time or (2) in a flat-file database accessed by record number (26). A typical corporate database contained up to a few tens of thousands of structures.
[Figure 9.2 panel label: 2000s - chemical data marts and data warehouses - the "star" schema, built around a "fact" table]
Figure 9.2. Evolution of chemical information storage. The storage of chemical information has typically lagged the development of database management systems, but it is catching up. In the 1970s, structures were stored in individual molecule files or large concatenated files. In the 1980s, proprietary databases of structures and reactions appeared, in which a single record contained all the information for a given structure. In the 1990s, this information was distributed into tables in a relational database. In the 2000s, we see the application of the concepts of data warehousing and data marts that consolidate information from a variety of sources for transactional and/or analytical purposes.
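The relational stage in Fig. 9.2 can be illustrated with a small sketch. It assumes Python's built-in sqlite3 module and uses hypothetical table and column names (structure, formula, assay, compound_view) that are not taken from any vendor system. Data that a flat database would hold in one wide record are spread over several keyed tables, and a view joins them back into what looks like a single table.

```python
import sqlite3

def build_demo_db():
    """Create an in-memory database with structure data split across keyed tables."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE structure (cpd_id INTEGER PRIMARY KEY, smiles TEXT NOT NULL);
        CREATE TABLE formula (cpd_id INTEGER REFERENCES structure(cpd_id), formula TEXT);
        CREATE TABLE assay (cpd_id INTEGER REFERENCES structure(cpd_id),
                            target TEXT, ic50_nm REAL);
        -- The view gives the user the impression of one large table.
        CREATE VIEW compound_view AS
            SELECT s.cpd_id, s.smiles, f.formula, a.target, a.ic50_nm
            FROM structure s
            JOIN formula f ON f.cpd_id = s.cpd_id
            LEFT JOIN assay a ON a.cpd_id = s.cpd_id;
    """)
    # Illustrative rows only; the assay value is made up for the example.
    con.executemany("INSERT INTO structure VALUES (?, ?)",
                    [(1, "CCO"), (2, "CC(=O)O")])
    con.executemany("INSERT INTO formula VALUES (?, ?)",
                    [(1, "C2H6O"), (2, "C2H4O2")])
    con.executemany("INSERT INTO assay VALUES (?, ?, ?)", [(2, "ACE", 850.0)])
    con.commit()
    return con

con = build_demo_db()
rows = con.execute(
    "SELECT cpd_id, smiles, formula, target, ic50_nm "
    "FROM compound_view ORDER BY cpd_id"
).fetchall()
for row in rows:
    print(row)
```

The LEFT JOIN keeps compounds that have no assay results, which is one reason the keyed layout is more flexible than a single fixed record.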
In-house chemical information management systems began to emerge at some of the larger chemical and pharmaceutical firms. These included CONTRAST and SOCRATES at Pfizer, SYNLIB at SmithKline, COUSIN at Upjohn, MSDRL/CSIS at Merck, and CROSSBOW at ICI (27). The Chemical Abstracts database was made available online in 1967 (28). In 1980 this became CAS ONLINE. A comprehensive study of the user acceptance of CAS ONLINE was published in 1988 (29). The first commercial chemical structure database systems appeared in the late 1970s. These offered an in-house solution using a mainframe chemical structure management system with a graphical interface, which could be accessed by interactive graphics terminals. A standard program in widespread use was the MACCS
Figure 9.3. MACCS (the Molecular ACCess System), an early structure indexing system. This program originally used fixed menus for searching, registration, and reporting. Later versions allowed users to customize the menus. The figure shows the result of a 3D pharmacophore search for ACE inhibitors. Out of a database of 115,000 structures, 21 fit the 2D and 3D requirements of the search query. The user could typically browse the "hits" from the search, save the list of structures to a list file, and output the structures to a structure-data file (SDFile). The MACCS database was a proprietary flat database system in which data of a given type, say, formula, was stored in a given file, indexed by the compound ID number.
program (Fig. 9.3). Structures could be drawn, registered, searched, and output to files. The systems were only slightly customizable, and the graphics terminals, which used vector displays, were large and expensive.
1.3.2 The 1980s: Flat Database Storage. This was the era of minicomputers (Prime, Vax) and a period of immense growth for chemical information, molecular modeling, and QSAR. In industry, chemical structure databases consisted mainly of custom-designed "flat" databases (where each record in a given table refers to a given structure in the database, much like in a spreadsheet). Client-server architectures appeared, and personal computers replaced graphics terminals and workstations. Highly successful PC-based "personal" chemical information systems appeared, which included chemical structure drawing and text processing programs (e.g., ChemDraw, ChemText) and personal chemical databases (e.g., ChemBase) (30). Customizable mainframe systems appeared (31), as did reaction indexing and searching systems (32). Additional commercial chemical information vendors appeared including Daylight Chemical Information Systems, Chemical Design Ltd., DARC-Questel, and Cambridge Scientific Corporation. The Beilstein System came online in 1988 (33). In-house and commercial database sizes were typically 100-200K structures in size. The rapid and accurate conversion of two-dimensional (2D) structures to
2 Chemical Representation 363
three-dimensional (3D) models became possible using the program CONCORD, introduced by Pearlman in 1987 (34). This enabled the introduction of 3D structural databases with the ability to generate, store, and search 3D molecular models on a large scale. These 3D database systems included ALADDIN by Daylight Chemical Information Systems, UNITY3D by Tripos, CHEMDBS3D by Chemical Design Ltd., and MACCS3D by MDL (35).
1.3.3 The 1990s: Relational Data Storage. This period saw the decline of single-computer mainframe chemical management programs and the rise of server-based systems and distributed computing. By far, the most significant influences on chemical information management were the Internet, the introduction of relational database technology, and the shift to high-throughput combinatorial chemistry. In a relational database, information that formerly was kept in a single large table is stored in numerous smaller tables, indexed by "keys." This is a much more flexible architecture, and combining different fields from several tables into a "view" of the data gives the user the impression of a single large table, as before. At the end of the decade, chemical and pharmaceutical firms could obtain chemical structure, reaction, and 3D model databases from a variety of vendors. These databases were even somewhat integrated with molecular modeling, quantum mechanics, and docking programs, and to literature, spectra, and biological databases. The largest database of known chemical structures, the Chemical Abstracts Registry, grew to about 20 million structures, whereas a typical corporate inventory increased to between 100,000 and 1,000,000 structures. A database of billions of virtual chemical structures was constructed and made available for drug-design purposes by Tripos, Inc. (36).
1.3.4 The 2000s. Like the customization and distributed computing of the 1980s that followed the introduction of mini-mainframe systems, the 2000s are witnessing the customization and further distribution of relational and integrated database systems. Chemical structure-specific and reaction-specific search types can be integrated into relational databases, to take maximal advantage of the scale and performance of these systems. We see the increasing use of web-based clients, also known as "thin" clients, because they need little software other than a web browser. Former single databases are turning into distributed and replicated database systems, and we see increasing use of data marts and data warehouses, more fully integrated structure, reaction, data, and citation searching, and increasingly "intelligent" database systems.
2 CHEMICAL REPRESENTATION
Chemical structures and reactions can be represented in many ways. At the most fundamental level, the parameters of the time-dependent Schrödinger equation (the atomic and molecular orbitals) do a more or less complete job of characterizing a chemical compound. Storing and representing structures as mathematical wave functions is obviously not suitable for thousands or millions of structures; nor is such a representation useful for drug discovery, except perhaps to a molecular modeler. Synthetic chemists still function in a mostly 2D chemical structure space. Intuition, training, and experience allow a chemist to extrapolate from a flat representation with a few stereochemical hints (dashed and wedged bonds or Z/E double bonds) to a higher-dimensional mental representation of a structure. Chemical representation systems are a compromise of several factors, including the needs of the chemist, the storage and performance characteristics of the chemical database system, and the ultimate 3D reality of chemical structures.
2.1 Types of Chemical Entities
There are several ways to look at chemical representation. One approach is to classify according to the type of chemical data that is stored. The most basic types of chemical structure data are shown in Fig. 9.4, including the following.
2.1.1 Sequences. For linear chemical systems, such as DNA, RNA, and proteins, the sequence of subunits (nucleotide bases or amino acids) provides most of the information
Figure 9.4. Basic types of 2D chemical structure data. The amount of information and the complex-
ity of searching increases with the dimensionality of the data.
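The 2D structure type in Fig. 9.4 is a graph of atoms (nodes) and bonds (edges). A minimal sketch of such a graph follows. The Molecule class and its method names are hypothetical, and only atom symbols and bond orders are stored; coordinates, charges, and stereochemistry are left out.

```python
from dataclasses import dataclass, field

@dataclass
class Molecule:
    """A 2D structure as a graph: atoms are nodes, bonds are edges."""
    atoms: list = field(default_factory=list)   # element symbols, indexed by atom number
    bonds: dict = field(default_factory=dict)   # frozenset({i, j}) -> bond order

    def add_atom(self, symbol):
        self.atoms.append(symbol)
        return len(self.atoms) - 1               # index of the new node

    def add_bond(self, i, j, order=1):
        self.bonds[frozenset((i, j))] = order    # undirected edge

    def neighbors(self, i):
        """Indices of the atoms bonded to atom i, in ascending order."""
        return sorted(next(iter(pair - {i})) for pair in self.bonds if i in pair)

# Acetic acid, heavy atoms only: CH3-C(=O)-OH
acetic = Molecule()
c_methyl = acetic.add_atom("C")
c_carboxyl = acetic.add_atom("C")
o_carbonyl = acetic.add_atom("O")
o_hydroxyl = acetic.add_atom("O")
acetic.add_bond(c_methyl, c_carboxyl, 1)
acetic.add_bond(c_carboxyl, o_carbonyl, 2)
acetic.add_bond(c_carboxyl, o_hydroxyl, 1)

print(acetic.neighbors(c_carboxyl))
```

A sequence, by contrast, needs only a string over a fixed vocabulary, which is why sequence data are simpler to store and search than graphs.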
about the structure. The deciphering of the human genome and the exploding interest in bioinformatics as a means of identifying new drug targets means there will be an increasing growth in the use of sequence data. The use of a sequence representation depends on a natural "vocabulary" of fixed building blocks. This vocabulary consists of nucleotides in the case of nucleic acids and consists of the amino acids in the case of proteins. If any of the building blocks are unique, or even if the bonding attachment between building blocks differs, simple sequence notation is not possible or becomes more complex.
2.1.2 2D Structures. When the building blocks are unique or when dealing with the large variety of ordinary chemical structures, a 2D representation is used. In mathematical terms, this is a "graph" of the structure, which consists of a set of "nodes" (atoms) connected by "edges" (bonds). The important atom infor-
mation includes atom type (symbol or atomic number), its 2D coordinates, formal charge, valence state, atom stereochemistry, and isotope information. Note that atom stereochemistry can be local (i.e., relative) or it may follow Cahn-Ingold-Prelog (CIP) conventions. Local atom stereochemistry gives the clockwise or counter-clockwise direction of the attachment of neighboring atoms when viewed from some reference attached atom, often a hydrogen atom or the lowest atomic numbered atom (37). The order of atoms in the rotation usually depends on atomic number. CIP stereochemistry is the familiar "R,S" nomenclature that relates the stereochemistry of the given atom to the entire structure (38). CIP stereochemistry requires analyzing the entire structure to determine the stereochemistry values. It can occasionally be ambiguous, and if any part of the structure changes, the CIP stereochemistry on distant atoms in the structure may switch. For these reasons, it is common in chemical databases to store local atom stereochemistry, but to perceive CIP stereochemistry "on the fly."
A particular problem with relative stereochemistry is that a given combination of "up" and "down" bonds on a structure implies a mixture of at least two stereoisomers. If all the centers are specified, the structure represents at least the two enantiomers. If some of the stereo centers are not designated, the number of isomers the structure represents is 2^n, where n is the number of undesignated centers. Some database vendors (e.g., MDL) allow a "chiral" designation on the molecule, which indicates that the structure represents only a single stereoisomer, but does not specify which one. One approach to dealing with these problems, which is being adopted in MDL programs, is to allow three kinds of stereo designation at a given tetrahedral center:
1. Absolute: an atom is given a known absolute stereochemistry. If all the stereo centers are so designated, this represents a single stereoisomer of the structure, as drawn.
2. Relative as a single stereoisomer: an up or down bond represents the current relative configuration, with respect to some collection of other chiral centers in the structure. The structure represents a single stereoisomer among the possible ones. More than one collection of stereo centers may be present in the structure.
3. Relative as a mixture of stereoisomers: an up or down bond represents the current relative configuration, with respect to some collection of other chiral centers in the structure. Now, however, the structure represents a mixture of the possible stereoisomers, considering combinations of the stereo collections that are present.
Examples of these alternatives are shown in Fig. 9.5, which shows the present and the newer stereochemistry options, using a steroid structure as an example.
The bond information usually includes the bonding atoms, the bond type, and bond stereochemistry. Bond types include the common single, double, triple, and aromatic types. They may also include types that are unique to the type of structure, including dative, ionic, hydrogen bonds, etc. The bond stereochemistry for double bonds is usually Z (zusammen, together), E (entgegen, opposite), or either (indicating an unknown stereochemistry). For single bonds attached to a chiral or prochiral center it is typically "up" (wedge or thick bond), "down" (dashed or dotted bond), or "either" (often a wiggly bond). Some systems allow the representation of extended stereochemistry, as with the terminal groups of allene systems, which can show a type of tetrahedral stereochemistry if you collapse the allene system to a point. The bonding information (which atoms are attached to which other atoms and the bond types) is collected in the "connection table" of the structure. Table 9.1 shows a simple atom connection table for camphor. The diagonal elements of the table describe the type of atom at a given position in the structure. The off-diagonal elements describe the bonding of that atom with other atoms in the structure. Some information about a structure can be derived implicitly from the connection table. This includes the rings that are present, and the hydrogen atoms that could be attached. When a structure can be represented by more than one isomer, it is common to either (1) store multiple
[Figure 9.5 (image not recoverable). Surviving panel labels: "Chiral" - current convention: a single stereoisomer with known absolute configuration; "Abs" - a single stereoisomer whose absolute configuration is known; "RelMix1" - a mixture of relative stereoisomers.]
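The 2^n rule for undesignated stereo centers can be made concrete with a short sketch. The function names and the dictionary layout (atom index mapped to "up", "down", or None) are assumptions made for illustration; this is not how any particular database stores stereo marks.

```python
from itertools import product

def count_represented_isomers(centers):
    """Number of isomers implied by a drawing: 2**n for n undesignated centers.

    centers maps an atom index to 'up', 'down', or None (undesignated).
    """
    n = sum(1 for mark in centers.values() if mark is None)
    return 2 ** n

def expand_undesignated(centers):
    """Yield one fully designated copy of centers per up/down combination."""
    open_atoms = [atom for atom, mark in centers.items() if mark is None]
    for combo in product(("up", "down"), repeat=len(open_atoms)):
        filled = dict(centers)
        filled.update(zip(open_atoms, combo))
        yield filled

# Three stereo centers, two of them left undesignated by the chemist.
drawn = {3: "up", 5: None, 8: None}
expanded = list(expand_undesignated(drawn))
print(count_represented_isomers(drawn), len(expanded))
```

Expanding the combinations explicitly is one way to register every isomer; the alternative described in the text is to keep a single record and let the search query match the isomers.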
isomers in the database, or (2) run a structure search using a search query that will hit the desired isomers. This is true for stereoisomers, enantiomers, and tautomers. Because the connection table is often symmetric, it is possible to store only, say, the upper diagonal [...]
2.1.4 3D Models. These extend the structure representation by adding one or more sets of 3D atomic coordinates for the various conformations that the molecule can adopt. 3D model representation may also include additional atom or bond information such as [...]
[Table 9.1 (atom connection table for camphor, atoms 1 to 11): table body not recoverable.]
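Since the body of Table 9.1 did not survive, here is a minimal sketch of the same idea for a smaller molecule, acetic acid, rather than camphor. Atom symbols sit on the diagonal, bond orders sit off the diagonal, implicit hydrogens are derived from the table, and only the upper triangle needs to be stored. The helper names and the small valence lookup are assumptions for illustration.

```python
# Usual valences for the few elements used here (an illustrative subset).
NORMAL_VALENCE = {"C": 4, "N": 3, "O": 2}

def make_table(symbols, bonds):
    """Build a square connection table: diagonal = atom type, off-diagonal = bond order."""
    n = len(symbols)
    table = [[0] * n for _ in range(n)]
    for i, symbol in enumerate(symbols):
        table[i][i] = symbol
    for (i, j), order in bonds.items():
        table[i][j] = table[j][i] = order        # the table is symmetric
    return table

def implicit_hydrogens(table, i):
    """Hydrogens implied for atom i: normal valence minus the bond orders in row i."""
    used = sum(value for j, value in enumerate(table[i]) if j != i)
    return NORMAL_VALENCE[table[i][i]] - used

def upper_triangle(table):
    """Keep only the diagonal and the entries to its right."""
    return [row[i:] for i, row in enumerate(table)]

# Acetic acid heavy atoms: 0 = CH3 carbon, 1 = carboxyl carbon, 2 = =O, 3 = -OH
symbols = ["C", "C", "O", "O"]
bonds = {(0, 1): 1, (1, 2): 2, (1, 3): 1}
table = make_table(symbols, bonds)
h_counts = [implicit_hydrogens(table, i) for i in range(len(symbols))]
print(h_counts)
print(upper_triangle(table))
```

The hydrogen counts sum to four, which matches the formula C2H4O2 without any hydrogen being stored explicitly.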
[...] and they serve as a good starting point for further optimization.
Recently, with the use of combinatorial and high-throughput chemistry, more general types of structure representation, so-called chemical libraries, have become common (Fig. 9.6). These are typically used to represent mixtures and generic structures.
2.1.5 Mixtures. Mixtures are useful to represent isomers, formulations, and the products of reactions. Their representation usually
[Figure 9.6 (image not recoverable). Surviving labels: chemical libraries; mixtures; generic structures, with R-group definitions such as R1 = Ph, 2-furyl, 2-hexyl, ...]
Figure 9.6. Chemical structure data for high-throughput chemistry. The generic structure repre-
sentation is often referred to as a Markush structure.
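Enumerating a generic structure like the one in Fig. 9.6 amounts to taking every combination of Rgroup members on the common root. The sketch below assumes a root written as a SMILES-like string template with named placeholders; ROOT, RGROUPS, and the function names are hypothetical, and real systems work on connection tables rather than on strings.

```python
from itertools import product

# Hypothetical generic structure: a para-disubstituted benzene root.
ROOT = "{R1}c1ccc({R2})cc1"
RGROUPS = {
    "R1": ["C", "CC"],          # methyl, ethyl
    "R2": ["Cl", "Br", "O"],    # -Cl, -Br, -OH
}

def enumerate_generic(root, rgroups):
    """Yield one specific ("enumerated") structure per combination of Rgroup members."""
    names = sorted(rgroups)
    for members in product(*(rgroups[name] for name in names)):
        yield root.format(**dict(zip(names, members)))

def library_size(rgroups):
    """Number of specific structures, computed without enumerating them."""
    size = 1
    for members in rgroups.values():
        size *= len(members)
    return size

enumerated = list(enumerate_generic(ROOT, RGROUPS))
print(library_size(RGROUPS))
print(enumerated)
```

library_size shows why the generic form is compact: the count is a product of member counts, so a few Rgroups with many members each can stand for a very large number of specific compounds.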
requires adding data to specify percent or amount content in the mixture for each component.
2.1.6 Generic Structures. Generic or Markush structures are commonly used to represent structures for patent purposes. Since the introduction of combinatorial chemistry, generic structures and generic reactions have become a standard means of representing potentially huge numbers of specific compounds in a highly compact representation that is familiar to the chemist (40). The central structure of a generic, which is common to all the structures it represents, is commonly called the "root" or "parent." The variable parts of the structure (R1, R2, etc.) are referred to as the "Rgroups." The exact substituents that make up the various Rgroups (e.g., -Cl, -Br, -OH) are referred to as the "members" of the Rgroup. Finally, a specific combination of root and Rgroup members, which constitutes a single, real structure, is referred to as a specific or "enumerated" structure. Some chemical computations, like property and similarity calculations, can be performed on the generic structure without enumerating all the specific structures (41).
The remaining types of chemical data that need representation include substances and search queries.
2.1.7 Substances. Less common in drug discovery, but very useful for material science and polymer chemistry, is the ability to store "substances." These include unspecified or uncertain chemical structures, polymers, and other chemical entities that cannot be classed with the other chemical representations (42). Polymers pose particular problems, as discussed in the article by Schultz and Wilks (43).
2.1.8 Search Queries. For all types of chemical representation, there are query representations that can be applied to a database to return a list of structures which match or "fit" the query, or that the query "hits" in the database. The same chemical drawing programs that are used to input structures can commonly be used to input chemical structure queries. These drawing programs currently include several programs in the commercial and public domains (44). A comparison of popular drawing programs has recently appeared on the Internet (45). Query structures often contain generalized atom types, bond types, and ring types. They may specify the required presence or absence of certain atom types or functional groups. In the case of 3D models, queries can be devised to represent pharmacophores for certain types of therapeutic activity (46). An important distinction must often be made between the query representation of a pharmacophore used for 3D searching and the conceptual pharmacophore used for drug development.
2.2 Types of Chemical Representation
A second way of looking at chemical representation is to consider the manner in which the chemical structure data is organized and exchanged, either in some file format or in a database. The most common ways of representing structures and reactions include the following.
2.2.1 Linear Notation. One of the earliest forms of chemical structure representation is Wiswesser line notation (WLN), developed in 1946. This notation used short letter codes to represent functional groups in molecules (47). An alternative early notation is the Beilstein ROSDAL string (48). These two formats are not used much today, having been replaced by the Daylight SMILES notation (49) and its extensions (50). Figure 9.7 shows a drug-like molecule along with WLN, SMILES, and ROSDAL notation. Also shown is a simple chemical reaction represented in SMILES. Note that SMILES and other linear notation schemes do not include 2D coordinates for display of the structure. These are either stored separately or generated on the fly (51). The SMILES notation has become especially popular for property estimation programs, because atom coordinates are not usually needed for connection table-based calculations. It is a very convenient method for web-based input of structures for property calculations (52). Note that the order of atoms in most linear notations is arbitrary, depending on where in the molecule the notation generator (program or chemist) starts. For this reason, some linear notations have a canonical (or "uniquified") form that
Figure 9.7. Various linear notation schemes for chemical representation. Some contain only atom
types and connectivity (WLN, ROSDAL, SMILES, SLN) and are chemist-readable. Others are com-
pressed versions of molecule file formats (CHIME) and are meant for computer interpretation.
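The idea of a canonical form can be sketched with a deliberately naive method: try every renumbering of the atoms and keep the smallest resulting description. This brute-force approach is not the algorithm used by SMILES or any commercial system, and it is only practical for a handful of atoms, but it shows why a canonical form makes exact-match lookup independent of how a structure was drawn. The function name and input layout are assumptions.

```python
from itertools import permutations

def canonical_key(symbols, bonds):
    """Return a key that is identical for every atom ordering of the same graph.

    symbols: list of element symbols; bonds: {(i, j): bond_order}.
    Brute force over all renumberings, so use only on very small molecules.
    """
    n = len(symbols)
    best = None
    for perm in permutations(range(n)):          # perm[old_index] = new_index
        new_symbols = [None] * n
        for old, new in enumerate(perm):
            new_symbols[new] = symbols[old]
        new_bonds = sorted(
            (min(perm[i], perm[j]), max(perm[i], perm[j]), order)
            for (i, j), order in bonds.items()
        )
        key = (tuple(new_symbols), tuple(new_bonds))
        if best is None or key < best:
            best = key
    return best

# Ethanol entered starting from the methyl carbon, and again starting from the oxygen.
ethanol_a = (["C", "C", "O"], {(0, 1): 1, (1, 2): 1})
ethanol_b = (["O", "C", "C"], {(0, 1): 1, (1, 2): 1})
# Dimethyl ether has the same atoms but different connectivity.
ether = (["C", "O", "C"], {(0, 1): 1, (1, 2): 1})

print(canonical_key(*ethanol_a) == canonical_key(*ethanol_b))
print(canonical_key(*ethanol_a) == canonical_key(*ether))
```

Practical canonicalization schemes reach the same result far more cheaply by ranking atoms on properties such as branching and neighbor types instead of testing every permutation.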
places the atoms in a topological order, usually reflecting their degree of branching, the types of neighboring atoms and bonds, etc. This canonical ordering of the atoms reduces any user-input ordering to the same string. It can then be used for exact-match lookup of the structure, regardless of how it was drawn or typed. The SMILES notation has also been extended to include reactions as shown in Fig. 9.7 (53). Occasionally, other linear notations are described (54).
2.2.2 Tabular Storage. To preserve more specific information about atoms and bonds, such as coordinates, stereochemistry, charge, and isotope number, it is necessary to store molecule information in a tabular format. Each row of the table typically contains all the information about a single atom or bond. In some formats, the atom and bond information is combined on a single line. Table 9.2 shows three common file formats for a simple structure. In the MDL molfile format, the atom and bond information is separated into separate blocks. In the Hyperchem HIN file format, the bond information is mixed with the atom information, resulting in fewer records in the file. In the PDB format, the atoms can be assigned to residues. Descriptions of various formats can be found in the reference manuals for chemical management and molecular modeling programs or in the literature (55). The systems that manage reactions typically have their own file formats as well.
Both linear and tabular formats are capable of being transmitted over a network between computers. This allows passing structure information from a server to a workstation for display purposes. It is common to compress and/or encrypt the chemical structure information before it is transmitted, and then have the workstation or display program uncompress or decrypt the resulting structures. This is done for performance and
for security. In MDL systems, the Chime linear format is used to transmit structures and reactions, whereas Daylight systems simply use the SMILES representation and depict the structure on the fly (Fig. 9.7).
2.2.3 Graphical Representation. Occasionally it is desirable to store chemical structures as "pictures", usually for document purposes. For example, some chemical drawing packages and many molecular modeling packages can store structures as the following:
• WordPerfect or Microsoft Word document (.doc files)
• Encapsulated PostScript (.eps files)
• Windows metafile (.wmf files)
• A proprietary sketch (MDL .skc files)
• A variety of compressed graphics formats including JPEG (.jpg files), bitmap (.bmp files), GIF (.gif files), and TIFF (.tif files)
Often, the graphical format allows the connection table to be stored and transferred transparently with the image, through the computer's clipboard, for instance. This allows the receiving program to "interpret" the image as a chemical structure and manipulate it accordingly.
2.2.4 Markup Languages. The Internet has spawned a host of new "languages" that facilitate the exchange of information. The most common of these are HTML (hypertext markup language) and XML (extensible markup language). A variation of XML that is designed for chemical information exchange is the Chemical Markup Language CML (56). Although it is not widely used as of this writing, it bears watching as more web-based chemical information platforms become available. Problems with markup languages are that they are verbose compared with structure
files, and they are difficult for chemists to read (although they are not usually meant for chemist interpretation). This is evident in Table 9.3, which shows the CML for acetic acid. By comparison, the SMILES for acetic acid is simply "CC(=O)O".
2.3 Chemical Structure File Conversion
Many chemical information management systems, especially modeling programs, permit a chemist to import and export structures using a variety of file formats. Commercial programs designed specifically for file conversion are available (57). A widely used public domain program, Babel, is available in source code and in a Windows version (58). It is being extended by the "OpenBabel" programming project (59). It is possible, with a fair amount of accuracy, to convert a chemical structure from a connection table format to an acceptable
3 Storing and Searching Chemical Structures and Reactions
• The master "data dictionary" table, which describes all the objects in the database, as well as some parameters that are specific to the database (exact match criteria, version of the database, etc.). This is sometimes referred to as "metadata" or "data about data."
• A handful of tables that contain database parameters. These include substructure search key definitions, the periodic table used with the database, and a list of salt moieties that can be considered during searches.
[Figure (database layout diagram; image and caption not recoverable). Surviving labels: main data dictionary (metadata); database parameters (periodic table, key definitions); structures and data (structures, formulas, substructure keys); search indices (Flexmatch index for tautomers, isomers, etc.; Fastsearch index for substructure searching).]
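The substructure keys in this layout act as a cheap pre-filter before any atom-by-atom matching. The sketch below assumes a tiny, made-up key dictionary (KEY_DEFS) in place of the real 166 or 960 keys, and stores each fingerprint as the bits of an integer. The Tanimoto coefficient used for similarity is a common choice in the field and is an assumption here; the text does not say which measure ISIS uses.

```python
# Hypothetical key definitions; real systems define far more keys.
KEY_DEFS = ["phenyl", "carbonyl", "sec_amine", "hydroxyl"]

def keys_to_bits(present):
    """Encode the set of keys found in a structure as a bit string (an int)."""
    bits = 0
    for name in present:
        bits |= 1 << KEY_DEFS.index(name)
    return bits

def passes_screen(query_bits, structure_bits):
    """A structure can contain the query only if it has every key the query has."""
    return (query_bits & structure_bits) == query_bits

def tanimoto(a, b):
    """Shared keys divided by keys present in either fingerprint."""
    union = bin(a | b).count("1")
    return bin(a & b).count("1") / union if union else 1.0

# Keys that would be assigned at registration time (illustrative values).
database = {
    1: keys_to_bits({"phenyl", "carbonyl"}),
    2: keys_to_bits({"carbonyl", "hydroxyl"}),
    3: keys_to_bits({"phenyl", "carbonyl", "sec_amine"}),
}
query = keys_to_bits({"phenyl", "carbonyl"})

candidates = sorted(cid for cid, bits in database.items() if passes_screen(query, bits))
print(candidates)
print(tanimoto(query, database[3]))
```

Structures that pass the screen are only candidates; a full graph match is still needed, because two structures can share all keys without one containing the other.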
• Structure and data storage is shown on the right. A structure table contains the structures, their internal identifiers, and their external identifiers, if any. The structures are stored in a compact binary representation that includes the connection table, the coordinates, the ring information, and any stereochemical, valence, isomer, isotope, or bond information. Certain types of structure-specific information such as polymer or component designations are stored here, whereas other types of structure-specific information (atom- or bond-specific data, and more verbose text data) are stored in their own tables, referenced by the internal identifier, and the atom or bond numbers to which the data correspond. A formula table contains the molecular formula and various atom and atom-type indexes to enhance formula searching and sorting.
• A table of substructure keys containing a binary or text string of the substructure fingerprint that was identified in the given structure at registration time. These keys represent the presence of either simple functional groups (e.g., phenyl ring, carbonyl), or more complex atom/bond combinations (e.g., carbonyl separated from a secondary amine by three bonds). In ISIS, a set of 166 searchable keys can be explicitly used as filters for structure searching. A larger set of 960 keys is used for similarity calculations. For 3D models, it is common to generate 3D pharmacophore keys, which encode all the possible 2- and 3-point pharmacophores represented in the structure, sometimes considering multiple conformations.
• A third kind of information includes indexes to enable structure and substructure searching. A "flexmatch" table contains a numerical hash (see Glossary) of certain features of the molecule, including stereochemistry, charge, and isotopes. This table can be used to retrieve a set of candidate structures quickly for exact match verification (63). It can also be used for "fuzzy" exact-match searches to retrieve tautomers and isomers of the input structure.
Another index table contains a "fastsearch" index. This contains a single balanced tree (see Glossary) of all the substructural frag-
[Figure 9.10 (image not recoverable). Surviving top-level fragment nodes: C-C, C=C, C#C, C-O, C N, ...]
Figure 9.10. Simplified ISIS Fastsearch index-ethanol is a leaf node that can be reached from
several substructure nodes.
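The lookup that Fig. 9.10 depicts can be approximated with an inverted index from fragments to structure identifiers. The sketch below is a simplification under stated assumptions: it uses a plain dictionary rather than a balanced tree, indexes only one-bond fragments rather than paths up to a fixed length, and all function names are hypothetical.

```python
from collections import defaultdict

BOND_SYMBOL = {1: "-", 2: "=", 3: "#"}

def fragments(symbols, bonds):
    """One-bond fragments such as 'C-O', with the two atom symbols in sorted order."""
    found = set()
    for (i, j), order in bonds.items():
        a, b = sorted((symbols[i], symbols[j]))
        found.add(a + BOND_SYMBOL[order] + b)
    return found

def build_index(database):
    """Map each fragment to the set of structure IDs that contain it."""
    index = defaultdict(set)
    for cpd_id, (symbols, bonds) in database.items():
        for frag in fragments(symbols, bonds):
            index[frag].add(cpd_id)
    return index

def candidate_hits(index, query):
    """IDs of structures that contain every fragment of the query."""
    frags = fragments(*query)
    if not frags:
        return set()
    return set.intersection(*(index.get(f, set()) for f in frags))

database = {
    "ethanol": (["C", "C", "O"], {(0, 1): 1, (1, 2): 1}),
    "acetic acid": (["C", "C", "O", "O"], {(0, 1): 1, (1, 2): 2, (1, 3): 1}),
    "ethene": (["C", "C"], {(0, 1): 2}),
}
index = build_index(database)

hits_c_o = candidate_hits(index, (["C", "O"], {(0, 1): 1}))       # C-O single bond
hits_carbonyl = candidate_hits(index, (["C", "O"], {(0, 1): 2}))  # C=O
print(sorted(hits_c_o), sorted(hits_carbonyl))
```

Here ethanol is reachable from both the C-C and the C-O entries, which mirrors the figure's point that a leaf can be reached from several substructure nodes. The index also has to be updated on every registration or deletion, which is the maintenance cost the text describes.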
ments found in structures in the database, up to a fixed pathlength. These are stored in a highly compressed binary format (Fig. 9.10). Similar approaches have appeared in the literature (64). Leaf nodes in the tree contain identifiers of specific structures in the database (simplified in Fig. 9.10). An exact match or substructure search consists of traversing the tree to find structures in the database that have substructural fragments in common with the query structure. Because the fastsearch index is large, often as large as the rest of the structure database, updating it for the addition or removal of structures is time consuming.
This relational chemical database format is extended in ISIS to include 3D models, generic structures, and most recently, reactions. In these cases, additional "trees" in the database hierarchy connect 2D structures with 3D models, connect root structures with corresponding Rgroup members, or connect molecules with reactions.
Other relational structure/reaction database systems are available commercially. These include the Thor system from Daylight (65), Accord and RS3 Discovery from Accelrys (66), and Unity from Tripos (67). Personal database systems that can be implemented on a desktop computer include ISIS/Base (68), Accord for Access (66), and Team Works from Afferent (69).
3.2 Registering Chemical Information
Chemical structure registration is an important activity that is necessary for drug discovery. The structures that have been developed by a pharmaceutical company constitute the "crown jewels" of chemical information, and they must be properly and securely archived. The registration process usually involves the process of extracting, cleaning, transforming, and loading the data, sometimes termed ECTL.
3.2.1 Extract the Data. First, the structures/reactions and corresponding data are extracted, collected, and validated. Increasingly, this is managed automatically, using output from the high-throughput chemistry process. Laboratory information management systems (LIMS) that are "structure smart" can manage chemical structure information starting from the design of a reaction, through the synthesis of the compounds, the chemical analysis of the structures, the in vitro biological assay, and finally the storage in the chemical database. Certain steps, such as drawing the initial structures/reactions, still remain an activity for the chemist, although many chemical information systems can take a generic structure, enumerate the many specific combinations, and lay out the structures automatically (for example, the Monomer Toolkit by Day-
light, the Central Library program by MDL, and CombiLibMaker by Tripos).

3.2.2 Cleaning and Transforming the Data. Next, the structures/reactions are passed through a filtering program that searches for structure anomalies and corrects the chemical representation. In this step, chemical "business rules" are applied to the structures to ensure that representations that can be drawn in different ways, such as nitro groups and tautomers, are represented by a single convention. Specialized chemical manipulation languages such as and Genie Control Language by Daylight, Cheshire by MDL, and Sybyl Programming Language by Tripos are used to implement this step. These languages are versatile and easily programmed, and they can be applied to other steps in the drug discovery process, such as searching, property calculation, and structure manipulation in general.

3.2.3 Loading the Data. Finally, the structures/reactions are handed to a chemical registration system. The chemical registration system will typically "perceive" the structures: identify atoms, bonds, rings, stereochemistry, valence states, isotope values, and other chemical information as needed. In the case of reactions, it notes which structures are reactants, which are products, and which are agents or catalysts. Because there can be many valid ways of drawing a structure depending on which atom you start with, a structure may be given a canonical renumbering of the atoms using a variant of the Morgan algorithm (70). In the case of a linear representation like SMILES, this canonicalization yields a unique string for the structure, which can be generated from any valid SMILES string derived from the structure. In the case of a structure stored in a connection table, the Morgan algorithm results in the atoms being reordered in the connection table to generate a tree, branching outwards from the most highly connected atom in the structure. Because of the efficiency of indexing in modern relational chemical databases, Morgan renumbering is not used as much today as in the past.

The registration system then computes indexes. These include structure-searching indexes, substructure or similarity keys, molecular formula, molecular weight, and other structure-based properties. Substructure keys or "fingerprints" are particularly important. They consist of a number of binary descriptors for the presence of certain functional groups or more generalized atom/bond combinations. These keys can be used to filter structures before searching. They are also used for similarity calculations. Originally, substructure search keys were always used to filter structures before performing a substructure search of the database. If a query structure contains, say, a carbonyl group, then only carbonyl-containing structures should be examined during the substructure search. A key representing the carbonyl group can be used to filter structures that contain the group (the key turned on, or set to 1) from those lacking it (key set to 0). Tree-based substructure searching does not require prior filtering, so today, substructure keys are primarily used for similarity calculations between molecules. If the key values of two structures are compared, the more keys they have in common, the higher their similarity value will be. When registering reactions, the reactants and products may undergo automated or semi-automated perception of reacting bonds and atom centers (71). Generic structures may be analyzed and "clipped" or reverse-transformed to generate root and member structures, which may be stored separately (72).

Before finally storing the structure in the database, the registration program may search the database for some level of match to the input structure or reaction, and skip the registration if it is a duplicate. This is sometimes termed "deduplication" through "exact match" searching. There is usually some redundancy in chemical databases, and to save search time and disk space, most companies do not store duplicate structures or reactions, but rather store pointers to them.

The final step, after registering the structure or reaction, is to assign it a unique registry identifier, which is typically used throughout the company to identify the given structure/reaction and any chemical, biological, or inventory data that is associated with it. Some identifiers, like the Chemical Abstracts Service CAS number and the Beilstein BRN,
have wide application, and these may be used in addition to, or instead of, a corporate-assigned external registry identifier.

3.3 Searching Chemical Structures and Reactions

The type of chemical structure and reaction searching that a chemist does usually depends on the current stage of a project. For example, if the chemist is starting a new therapeutic project, a therapeutic activity search might be conducted, using a database such as the Derwent World Drug Index, the MDL Drug Data Report, or the MDL Comprehensive Medicinal Chemistry database. Retrieving many search hits, the chemist might organize them by sorting on name, molecular weight, ring system, or some topological basis. If the resulting list is too large, the chemist might perform a cluster analysis of the structures to see what general classes of compounds have been synthesized in the past. After sampling from the various clusters, and identifying a handful of interesting structures, the chemist might perform a substructure search to find structures that contain the features that are felt to be important to activity (i.e., the pharmacophore). If that search returns too many hits, the search query can be refined by making it more specific. If the search returns too few hits, the search query can be relaxed, or a similarity search can be used to find structures in the topological neighborhood of the query structure. Eventually, a number of structures will be obtained as candidates for synthesis and/or testing.

The next step is to design a set of reactions to synthesize the compounds. One or more reaction databases can be searched to find whether any reactions give the desired structures as products or give structures that are similar to the desired ones. The chemist may also use reaction similarity searching (73) and searching across reaction schemes (e.g., if A + B → C + D and C + E → F + G, a reaction scheme search will find the query A → F) (74). Once a reaction is found, the chemist needs to decide what reagents to use in the synthesis and where to obtain them. The selection of reagents will usually be based on a combination of physicochemical property considerations (i.e., QSAR and diversity), tempered by the chemist's experience and preferences, and balanced by synthetic feasibility and economics. The reagents may be located in-house, or they may require ordering from a chemical supplier.

A completely separate approach to reaction discovery is the reaction planning approach implemented in such programs as Logic and Heuristics Applied to Synthetic Analysis (LHASA) (75). This program works by searching a chemical knowledge base that contains information on approximately 2300 retro-reactions or transforms. The chemist draws a target molecule and indicates a strategy for the reverse-synthetic analysis. The program then searches the transform knowledge base for transforms that satisfy the strategy the chemist selected. The program decides which transforms are suitable for the particular target structure and displays the resulting precursors to the chemist. The chemist can then select a precursor for further analysis and choose another strategy option, on which the program returns a second level of precursors in the same way. Processing continues in this manner until the chemist is satisfied that one or more of the precursors correspond to a reasonable starting point for a synthesis. Retrosynthetic methods have not become as widely used in industry as reaction searching, partly because the certainty of the reactions is not guaranteed. Also, searching existing reaction databases generally yields the desired reaction or something close to it. Indeed, a major problem with search results from reaction databases is often an overabundance of hits, which typically need further organization and filtering to be useful. One approach to organizing the results of reaction searching is to apply some clustering or classification to the reactions (76).

To support the workflow just described, a number of structure and reaction search types have come into use (Fig. 9.11). These are briefly described as follows.

3.3.1 Exact Match Searching. Here, the chemist has a particular structure (or reaction) that he wishes to find in the database. The structure/reaction is drawn using a drawing program and then passed to a search program. The program submits the query to the
[Figure 9.11 table; column headings lost in extraction]
Substructure     X  X  X  X  X
Pharmacophore    X  X
Similarity       X  X  X  X  X
Figure 9.11. Search types depend on the nature of the chemical information.
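The similarity search type listed in Figure 9.11 rests on comparing substructure keys, using the Tanimoto metric discussed in Section 3.3.3. The following is a minimal Python sketch of the unweighted and the prevalence-weighted calculation. It assumes keys are held as sets of key names, and it assumes a simple 1/frequency weighting; the weighting scheme actually used by ISIS is not given in this chapter. The database contents and function names are hypothetical.

```python
from collections import Counter


def tanimoto(keys_a, keys_b):
    """Unweighted Tanimoto: keys set in both / keys set in either structure."""
    union = keys_a | keys_b
    return len(keys_a & keys_b) / len(union) if union else 0.0


def prevalence_weights(database):
    """ASSUMPTION: weight of a key = 1 / number of structures that set it."""
    counts = Counter(key for keys in database.values() for key in keys)
    return {key: 1.0 / count for key, count in counts.items()}


def weighted_tanimoto(keys_a, keys_b, weights):
    """Tanimoto in which each key contributes its weight instead of 1."""
    total = sum(weights.get(k, 1.0) for k in keys_a | keys_b)
    shared = sum(weights.get(k, 1.0) for k in keys_a & keys_b)
    return shared / total if total else 0.0


# Hypothetical key sets; cyclopropyl is the rarer key, phenyl the most common.
DATABASE = {
    "m1": {"phenyl", "carbonyl", "cyclopropyl"},
    "m2": {"phenyl", "carbonyl"},
    "m3": {"phenyl", "carbonyl", "amine"},
    "m4": {"phenyl", "cyclopropyl"},
}
WEIGHTS = prevalence_weights(DATABASE)
```

With m1 as the query, m2 and m4 tie in the unweighted calculation because each shares two of three keys. In the weighted calculation m4 scores higher, because the key it shares with the query (cyclopropyl) is less prevalent in the database than carbonyl.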
search routine that typically generates index values from the query that are of the same type as those generated for structures/reactions when stored in the database. The index values are then used as filters to retrieve a set of candidate structures/reactions. In ISIS, these filters include the formula, the molecular weight, and the flexmatch index, a numeric hash code based on the presence of isomers, tautomers, isotopes, salts, charges, and stereochemistry (see Glossary). The resulting filtered structures have the minimum set of requirements to fit the search query, but typically only a fraction of these structures will fit the query exactly. Once this set of candidate structures is obtained, the query structure is mapped to the candidate structure using a process known as atom-atom mapping, which is known in topology as the "graph isomorphism" problem. This mapping is time-consuming, so the prior filtering step should be as efficient as possible. Each structure that maps exactly to the query is placed in the result set or "hit list." To accommodate various chemists' needs, exact match searching can usually be "relaxed" to permit the finding of isomers, tautomers, salts, charged or uncharged species, etc. In the case of reactions, variations of the reaction can be retrieved by relaxing the constraints on the reaction conditions, solvent, and catalyst (Fig. 9.12).

In a Daylight Thor database, where the
[Figure 9.12 panels: Query, Tautomer, Salt, Isomers]
Figure 9.12. Different degrees of exactness can be defined by allowing tautomers, salts, and isomers successively in the search.
[Figure 9.13 labels: CH2OH, R1, Link node, Stereo bonds, Markush]
... the more flexible the search becomes, but the search may also require more time to complete. There is a trade-off between putting the flexibility into the database (i.e., storing and indexing multiple forms of a structure) and putting the flexibility into the search query and the search software.
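The "backtracking" approach to substructure mapping described in Section 3.3.2 can be sketched in a few lines of Python, together with two of the query features of Figure 9.13 (the atom list and the "any" atom). This is an illustrative sketch only, not the algorithm of any particular vendor: it assumes molecules are given as a list of atom symbols plus a list of bonded index pairs, it ignores bond order and stereochemistry, and the names `atom_matches`, `find_substructure`, and `ACETIC_ACID` are hypothetical.

```python
def atom_matches(query_label, symbol):
    """'A' means any atom, a set is an atom list, a string is one element."""
    if query_label == "A":
        return True
    if isinstance(query_label, (set, frozenset)):
        return symbol in query_label
    return query_label == symbol


def find_substructure(q_atoms, q_bonds, t_atoms, t_bonds):
    """Return one mapping {query atom index: target atom index}, or None."""
    neighbors = {i: set() for i in range(len(t_atoms))}
    for i, j in t_bonds:
        neighbors[i].add(j)
        neighbors[j].add(i)

    def extend(mapping):
        if len(mapping) == len(q_atoms):
            return dict(mapping)  # every query atom is placed: a hit
        q = len(mapping)  # query atoms are placed in index order
        # Query atoms already placed that must be bonded to atom q.
        bonded = [b if a == q else a for a, b in q_bonds
                  if q in (a, b) and (b if a == q else a) in mapping]
        for t, symbol in enumerate(t_atoms):
            if t in mapping.values() or not atom_matches(q_atoms[q], symbol):
                continue
            if all(mapping[p] in neighbors[t] for p in bonded):
                mapping[q] = t
                hit = extend(mapping)
                if hit is not None:
                    return hit
                del mapping[q]  # dead end: backtrack, try another target atom
        return None

    return extend({})


# Heavy-atom skeleton of acetic acid: C0-C1, C1-O2, C1-O3 (bond orders ignored).
ACETIC_ACID = (["C", "C", "O", "O"], [(0, 1), (1, 2), (1, 3)])
```

For example, the query `["C", {"O", "N"}]` with one bond first tries the methyl carbon, fails to find an attached O or N, backtracks, and then succeeds on the carboxyl carbon.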
structure is stored as unique SMILES, the canonical query SMILES can be compared lexically with strings in the database using fast string comparison and indexing techniques, to find exact match structures and reactions. Because a structure in a Thor database consists of a meaningful, canonical sequence of characters, the computational efficiencies of string searching and comparison can be applied when searching the database. This is in contrast to the highly specialized search techniques used in other structure database formats.

3.3.2 Substructure Searching. A substructure search is performed when a chemist has in mind a pharmacophore consisting of a set of functional groups or a substructure which he knows must be present in the structures to be retrieved. Only part of the molecule is drawn, along with query features that generalize atoms, bonds, and rings in the structure. Figure 9.13 shows some typical substructure query features. The features include the following:

• Single atom: specifies a periodic table atom that must be present or a more generalized atom (hetero, metal, etc.) or "superatom" (condensed functional group, such as Ph, Et, Ala, etc.)
• Atom list: a list of atoms, any one of which may be present
• "Any" atom: which simply means some atom must be attached at the given position. As with structures, the hydrogen atoms in substructures are implicit, unless the user requests a particular hydrogen count or range at a given position
• Link node: which specifies a range of allowed atom or functional group links between atoms
• Stereo bond: including Z/E/either or up/down/either
• Markush feature: used for patent representation, for representation of generic structures for combinatorial chemistry, or to limit the substituents that can be present at a given position. Note that some systems allow logical operations on Markush features (if 4 H at R,, then no - 4 1 at R,).

A specialized case of substructure searching is 3D pharmacophore searching, in which a substructure search is combined with the measurement/generation of 3D features, to identify models that could fit a 3D pharmacophore. Figure 9.14 shows an example of a 3D substructure search query that includes various 3D features or constraints. A given conformation of a molecule that is stored in the database may not exactly match a given query, but it could be modified by rotation about single bonds to fit the query. For this reason, conformationally flexible 3D searching is a feature of most 3D database systems (77). When searching conformers, the conformational flexibility can be incorporated into (1) the query, by tethering flexible groups to fixed anchor points in the structure, (2) the database, by storing multiple low energy conformations for each structure, or (3) the search process, by
incorporating a rapid conformational analysis into the 3D search algorithm. The last is the most common approach and is a part of database systems from Tripos, Accelrys, and MDL.

Many different approaches to substructure searching have been devised (78). In ISIS, the fastsearch index file is used to retrieve candidate structures. If needed, the query is then mapped onto these structures using a "backtracking" approach. This involves successively matching atoms and bonds in the structure to those in the query in a stepwise manner. When a match fails at any given step, the program backtracks to the last successful step and selects an alternative atom or bond. Once all the atoms and bonds have been matched, the structure is considered a hit. An issue of the Journal of Chemical Information and Computer Sciences has been devoted to substructure search methods (79). Hicks and Jochum reported a comparison of several substructure search algorithms in 1990 (80). These authors found the Beilstein-Softron S4 search system to be superior in search speed at that time.

3.3.3 Similarity Searching. The most generalized type of structure/reaction searching is searching for "similar" structures or reactions in the database. Chemical similarity has been a highly debated topic for some time, mostly from the standpoint of what constitutes good descriptors to use in the similarity calculations (81). Nevertheless, there are some general approaches that are widely used, not because of their theoretical soundness, but simply because they work for the chemist. For 2D structures, the most useful and efficient similarity approach is key-based similarity. This involves computing the overlap between a query structure and a candidate structure using substructure or fragment keys. ISIS uses the 960 keys that are generated when the structure is registered. The overlap is typically computed using the Tanimoto metric, which was first used in 2D structure similarity by Willett et al. (82). Depending on the nature and number of the keys, it may be desirable to weight the Tanimoto calculation inversely according to the prevalence of the key in the database. Thus, a cyclopropyl key, which may
not be highly prevalent in the database, and would be "swamped" by other, less relevant keys in an unweighted similarity calculation, may have more influence in a weighted calculation. This weighted calculation is used as the default in ISIS chemical databases. It is possible for an ISIS database administrator to regenerate the keys using custom values of the weights to enhance differences in the similarity calculations and select, say, more "drug-like" molecules in the search. In the reaction domain, similarity can be defined in terms of the structures, the reactions, or a combination of the two (83). Other similarity search systems have been described in the literature, including the one used by CAS (84). It is also possible to use 3D pharmacophore keys to compute similarity, although these have typically not performed as well as 2D keys. It is possible that conformational flexibility so vastly expands the "chemical space" of the molecules that a limited number of keys is simply inadequate for 3D similarity calculation. When attempting to predict the type of therapeutic activity a compound has, Briem and Lessel concluded that 2D and 3D keys have complementary information (85).

3.3.4 Reaction Searching. Reaction searching, sometimes called reaction indexing, has been available for over 20 years. Originally developed as online searching systems, the introduction of in-house systems like REACCS allowed pharmaceutical companies to augment published reaction sources with their own reactions and data (86). As with molecules, reaction storage has moved from proprietary database foundations to storage and access in relational systems. Reaction searching encompasses many of the same types of searches used for molecules. A reaction typically consists of three types of structures: reactants, products, and catalysts or agents, along with textual information about yield, conditions, etc. Reactant and product structures undergo structural changes in the reaction, whereas agents do not. The atom and bond changes that occur in a reaction are isolated in one or more reacting centers of the reactants and products. The atom changes consist of changes in atom valence, charge, number of attached hydrogens, number of bonds attached, retention or inversion of stereochemistry, etc. Bond changes include making and breaking of bonds, and changes in bond order and stereochemistry. When searching reactions, the chemist can search for exact, isomer, or substructure matches in the reactants, in the products, or both. The structure searching can be accompanied by a search of the reaction text information for yield and conditions. Several commercial reaction indexing systems are available from molecule database vendors, and online searching is even possible (87).

In most reactions, the majority of the atoms and bonds are not involved in the reaction, and they remain unchanged between reactants and products. To avoid examining these unchanging atoms and bonds, most reaction indexing systems allow the user to mark, in the reactants and products, those atoms and bonds that are involved in the reaction. These are termed reacting center atoms and bonds, and when they are present, they enable much faster reaction searching and they reduce the number of false hits obtained. A simple example is seen in Fig. 9.15. Some systems have semiautomatic perception of reacting centers, which must usually be augmented or checked by a chemist, especially with complex transformations.

As with molecules, it is also possible to do reaction similarity searching. Given a reaction with reactants, products, and agents, one can typically run molecule similarity searches for the reactants, the products, or both. This will retrieve reactions that have similar structures involved in them. This does not guarantee that the molecules undergo the same or even similar transformations. It is possible in some systems to also include the similarity of the transformation as part of the overall similarity search. This is usually carried out using special keys that have been generated for a fixed number of possible transformations. As with molecules, the more keys a query and a reaction have in common, the higher will be the similarity.

3.3.5 Searching Other Data. Data other than structures and reactions must also be searched in the drug discovery process. Various systems exist for indexing and searching literature and journal contents (88), patents
Figure 9.15. Reaction substructure search query and some example hits. If no reacting center or mapping information is used, all three hits are found. If reacting bond information is used, hit c is excluded. If both reacting atom and reacting bond information is included, then false hits b and c are excluded.
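The filtering effect described in the caption of Figure 9.15 can be imitated with a small Python sketch. The data below are hypothetical and do not reproduce the structures in the figure: each hit is assumed to have already matched the query substructure, and it records which of the mapped query atoms and bonds are reacting centers in the stored reaction. The function name `reaction_search` and the atom numbering are illustrative assumptions.

```python
# Hypothetical hits that all contain the query substructure.
# Atom numbers refer to the query atoms the hit was mapped onto.
HITS = {
    "a": {"reacting_atoms": {1, 2},
          "reacting_bonds": {frozenset({1, 2})}},
    "b": {"reacting_atoms": {2, 3},
          "reacting_bonds": {frozenset({1, 2}), frozenset({2, 3})}},
    "c": {"reacting_atoms": {3, 4},
          "reacting_bonds": {frozenset({3, 4})}},
}


def reaction_search(hits, reacting_atoms=(), reacting_bonds=()):
    """Keep hits whose reacting centers include everything the query marks."""
    wanted_atoms = set(reacting_atoms)
    wanted_bonds = {frozenset(bond) for bond in reacting_bonds}
    return sorted(
        name for name, hit in hits.items()
        if wanted_atoms.issubset(hit["reacting_atoms"])
        and wanted_bonds.issubset(hit["reacting_bonds"])
    )


all_hits = reaction_search(HITS)
bond_hits = reaction_search(HITS, reacting_bonds=[(1, 2)])
atom_and_bond_hits = reaction_search(HITS, reacting_atoms=[1], reacting_bonds=[(1, 2)])
```

With no reacting center information all three hits are returned; marking the bond between atoms 1 and 2 as reacting drops hit c, and additionally marking atom 1 as a reacting atom leaves only hit a.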
(89), material safety data sheets (90), and chemical suppliers (91). Some useful tools include the Accord ChemExplorer program, which allows searching word processor documents and files for particular chemical structures, and the CambridgeSoft ChemFinder for Word (92).

3.4 Chemical Information Management Systems and Databases

A number of software and database vendors provide programs and database systems to implement representation, registration, and searching of chemical information in a corporate environment. Some of these vendors have smaller personal chemical database systems that support registration and searching on a personal computer. A handful of academic and public domain systems are also available. Finally, an increasing number of chemical information systems are being made available on the Internet. Some representative systems that are being sold or have been discussed recently in the literature are discussed below.

3.5 Commercial Database Systems for Drug-Sized Molecules

Accelrys. A subsidiary of Pharmacopeia, Inc, Accelrys was originally a provider of molecular modeling software. They recently acquired several companies that provide offerings in the chemical information and bioinformatics areas. The company provides unique databases including several for reactions.

• BioCatalysis: biomolecules as catalysts
• BioSter: pairs of biologically similar structures for bioisosterism applications
• Biotransformations: developed in conjunction with the Royal Society of Chemistry
• Failed Reactions: those that did not proceed as expected
• Metabolism: developed in conjunction with the Royal Society of Chemistry
• Methods in Organic Synthesis: 33,000 reactions; Protecting Groups: functional group protection with regio/stereoselectivity
• Solid Phase Synthesis: with emphasis on small-molecule and combinatorial chemistry

The chemical information programs provided by Accelrys include several database systems.

• Accord for Excel and Access: relational chemical storage for Microsoft programs
• Accord for Oracle: a chemical data cartridge (see Glossary)
• Accord Database Explorer: to access Accelrys reaction databases
• RS3 Discovery System: with programs for chemical structure, data management, high-throughput screening, and inventory

Accelrys also provides programs for descriptor calculation, QSAR, and data mining (93).

The Beilstein Database. The Beilstein Database, with over 8 million structures, is the oldest in existence, based on the Beilstein Handbook of Organic Chemistry, and contains data that extend back to 1771. The database is produced by the independent Beilstein Institute (94). Access to the database is either through Beilstein Online, available through STN and Dialog, or through the Web using Crossfire Beilstein, which is marketed by MDL GmbH, formerly Beilstein Inc. (95). Data that are stored include the structure, Beilstein and CAS Registry Numbers, names, formula, preparations, reactions, natural product isolations, and chemical derivatives. Physical properties, if available, are also stored, including optical data, mechanical properties, multicomponent system data, spectral and thermodynamic properties, as well as biological function, ecological data, toxicity, and common uses. Citation data, including author, journal, CODENs, and patent information, are also stored. The data are organized into substance, reaction, and citation contexts, and a user can easily switch from one context to the other. An ACS symposium volume devoted to the Beilstein database has been published (96).

Chemical Abstracts Service. As a division of the American Chemical Society, CAS develops and manages the world's largest databases of chemical structures and reactions.

• CAS Registry: 35 million structures; 19.5 million distinct structures; 13 million biosequences
• CASREACT: 4 million reactions
• CHEMCATS: 2.5 million commercially available chemicals
• MARPAT: 500,000 searchable Markush structures

The CAS databases are maintained online, with searching allowed on a subscription basis. SciFinder is a client/server application to search CAS databases by author, keyword, exact, and substructure. It includes a "keep me posted" update feature, reaction information back to 1974, nucleotide and protein sequence searching, browsing of 1600 journals, and integration of structure, data, and citation information. STN International is a collection of 200 databases covering chemistry, life sciences, engineering, patents, etc. STN Express provides wizard-assisted searching, and STN on the Web serves as a web client for STN. The ChemPort program provides web access to journals (97).

Daylight Chemical Information Systems, Inc. This company provides numerous third-party databases in the Thor format. These include the following:

• Databases of organic structures: Available Chemicals Directory, 250,000 structures; Asinex catalog, 115,000 structures; Maybridge catalog, 62,000 structures; InfoChem SPRESI'95, 2.5 million structures
• Drug and biological databases: BioScreen NP and SC, about 52,000 structures including natural products; Pomona College Medchem, 36,000 structures with measured LogP; National Cancer Institute-
using a distance geometry approach
• PCModels for LogP and other physical property calculations
• CombiChem Package to manage high-throughput synthesis
• Reaction Package
• DayCGI: a web development toolkit
• A set of Java tools for chemical information management

Derwent Information. A division of Thomson Scientific, Inc., Derwent is the leading supplier of value-added patent information. The Derwent databases, which are maintained online, include the following:

• Derwent World Patents Index: references to patents, including chemical structure and use patents
• Patents Citations Index: bibliographic and citation data, the Innovations index combined entries from WPI and PCI

ware. Databases include the following:

• Available Chemicals Directory ACD: 300,000 structures; reagents and general chemicals, with supplier information
• Bioactivity databases: AIDS database, 43,000 structures and data from the National Cancer Institute; Comprehensive Medicinal Chemistry (CMC), 7500 common drug structures; MDL Drug Data Report (MDDR), 120,000 patented drug structures
• Reactions: ChemInform, 850,000 reactions and 1.2 million structures; Theilheimer/Chiras/Metalysis, 171,000 reactions and 223,000 structures
• Metabolism: Metabolite, 53,000 transformations, 34,000 structures
• Toxicity: EPA RTECS-based, 150,000 structures
• Material safety: OHS Material Safety Data Sheets
Software from MDL includes the ISIS scientific information system (ISIS/Draw, ISIS/Base, and ISIS/Direct), Cheshire for chemical structure manipulation, and Chime and Chemscape for Web access. Combinatorial and high-throughput chemistry programs include Afferent, Central Library, Project Library, Reagent Selector, and Elan. Biological data management programs include Apex and Assay Explorer; literature access through LitLink; reaction access through Reaction Browser/Web; and finally, molecular modeling through Sculpt (101).

Tripos, Inc. Originally the major provider of molecular modeling software, Tripos now offers chemical information content in the form of databases and the tools to manage them. These include the following:

• Several Chapman and Hall databases including ones for organic structures (180,000 structures), inorganic and organometallic structures (40,000 structures), natural products (105,000 structures), and pharmacological agents (22,000 structures)
• The National Cancer Institute structures in a Tripos-compatible format
• The Derwent World Drug Index (60,000 structures)

Chemical information software offered by Tripos now also extends beyond just molecular modeling. Their programs include the following:

• The Unity 3D database system, which features rapid flexible 3D pharmacophore searching
• Concord and Stereoplex: for generating 3D models of database structures including multiple stereochemical isomers
• ChemEnlighten for chemical data mining
• The AUSPYX structure data cartridge for Oracle
• A suite of programs for combinatorial chemistry: Legion to build and store virtual libraries, CombiLibMaker to enumerate structures, Selector to define diversity measures to select diverse subsets of structures, and DiverseSolutions to apply chemical diversity techniques to chemical populations to characterize and populate chemical space (102)

3.6 Sequence and 3D Structure Databases

Sequence databases of biological macromolecules are useful when defining new therapeutic targets. Databases for DNA, RNA, and proteins are available from such sources as the National Center for Biotechnology Information (NCBI) (103) and the European Bioinformatics Institute (104). Numerous online programs and tools are available to researchers to search and align sequences, generate phylogenetic analyses (chemical evolutionary trees), map genes, and predict secondary structure (105). The Protein Data Bank stores the largest collection of crystallographic, NMR, and molecular-modeling derived protein and nucleic acid 3D models (106). The Cambridge Crystallographic Data Center is the primary source for crystal structure data on small molecules, with more than 250,000 entries. The Cambridge Database can be searched using the programs ConQuest for searching, Mercury for structure visualization, and Vista for numerical display and statistical analysis (107).

3.7 In-House Proprietary and Academic Database Systems

Larger chemical and pharmaceutical firms have, over the years, developed in-house systems with capabilities that are specific to the chemist's needs. Today, the costs of developing from scratch and maintaining an in-house system are prohibitive, especially because commercial chemical information systems are highly efficient and customizable. Personal chemical information software is still being developed and reported in the literature. Examples include a relational database patterned after the Upjohn Cousin system (108), and CheD, which is a SQL-based system with a Web client (109).

Commercial personal database systems are available from several vendors, as described above. These products extend the productivity of an individual chemist or a small workgroup, but are not designed for corporate or enterprise applications. Other personal chemical
388 Chemical Information Computing Systems in Drug Discovery
database programs that are available include ChemFinder from Cambridgesoft, ChemFolder from ACDLabs, ChemWindow from Softshell, and Aura-Mol from Cybula (110).

4 CHEMICAL PROPERTY ESTIMATION SYSTEMS

The design and screening of drug candidates is increasingly being conducted in silico. This is made possible by improvements in programs for property calculation and estimation. Here, the term property calculation refers to the generation of some topological (depending only on the 2D structure), topographical (depending on the 3D conformation), or physicochemical property of a molecule, directly from the structure. The term property estimation refers to the generation of some property as a function of other properties, either through a regression equation, a formula, neural network calculation, or some other indirect means.

The distinction between calculation and estimation is important because some properties, like molecular weight, polar surface area, molecular connectivity values, counts of chemical functional groups, partial charges, and other quantum mechanical descriptors, can be calculated precisely and de novo from the structure alone. Most of these properties have some fixed definition or algorithm that enables their calculation to be performed unambiguously, with little or no error. What error is present is usually systematic or deterministic. A second class of properties, including LogP and other additive-constitutive properties, may be calculated by fragment additivity with various correction terms. These properties differ from de novo properties because they are approximations to the true (sometimes measured) values. Often, there are multiple approaches to their calculation. The errors in the calculation of these properties are statistical or stochastic. A third class of properties includes those that can only be estimated from other properties, using a regression analysis, neural network, or other linear or nonlinear function of variables. The errors in these properties can be complex and difficult to determine. For all these reasons, it is important to carefully consider the use of any given property for drug discovery purposes. Too often, properties are calculated simply because they are available, then used in a QSAR analysis, and possibly applied to future predictions, all without proper consideration of their precision, accuracy, and relevance to the chemical problem.

Given this caveat, it must be noted that there are a multitude of programs available for the calculation of properties of structures. Some programs compute only a single property, like LogP. Others calculate a series of values in a given genre of property, like molecular connectivity (111) or BCUT descriptors (112). Still others compute a vast range of properties that include topological, topographical, and physicochemical descriptors alike. It is beyond the scope of this chapter to detail all the programs and vendors that provide property calculation and estimation software. Many of the calculations are provided as part of molecular modeling and QSAR program systems. Some programs and vendors whose products are solely for property calculation are described below.

4.1 Topological Descriptors

Descriptors based on the 2D structure or simply on the connectivity matrix of a structure have long been used for chemical similarity and for property correlations. Because they often lack any relationship to mechanism, these descriptors are best used within a congeneric series or at least a set of similar structures. They may be empirically useful for cluster analysis and chemical library design, because they are effective at representing structure differences and similarities. A few programs and providers of topological descriptors include the following:

• Barnard Chemical Information: provides chemical Fingerprint Generation Pack to compute fragment-based fingerprints for cluster and diversity analysis (113)
• DRAGON: implementation of about 1400 descriptors of Todeschini and Consonni (114) including constitutional, topological, autocorrelation, geometrical and functional
groups, and including simple molar refractivity, polar surface area, and Moriguchi
• Molconnz: EduSoft LC provides MOLCONNZ molecular connectivity and electrotopological state descriptors of Kier and Hall (115)

QSAR, toxicology, oncology, and other biological properties (122)
• Sirius-Analytical: provider of instruments for LogP and pKa determination, and the Absolv program to predict physicochemical properties (123)
[Figure 9.16 diagram: a central source table linked to the master dictionary, source dictionary, moltable, reactions, and property dictionary tables. Field labels shown include Tablename, Location, Source-id, External-id, Struct-id, Structure, Formula, Role, Prop-id, Prop-value, Method, and Table.]
Figure 9.16. Star schema design of a chemical data warehouse. The central source table allows access to the External-ID of every molecule, arranged by source database. These External-ID values can be used to build multidimensional views of the data. For example, to see all the reactions with products that can be found in source database ACD, one would combine data from the source dictionary table (Source-ID for database ACD), the reactions table (Struct-ID and Role), and the moltable table (Struct-ID), using identifiers (External-ID) from the central source table.
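The join described in the caption of Figure 9.16 can be sketched with an in-memory SQLite database. Table and column names loosely follow the figure labels; the exact column layout, the sample rows, and the function names build_demo_warehouse and reactions_with_products_in are assumptions made for illustration and are not taken from any MDL schema.

```python
import sqlite3


def build_demo_warehouse():
    """Create a tiny in-memory star schema (hypothetical layout and data)."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE source_dictionary (source_id INTEGER PRIMARY KEY, name TEXT);
        -- Central source table: one row per molecule per source database.
        CREATE TABLE source (source_id INTEGER, external_id TEXT);
        CREATE TABLE moltable (struct_id INTEGER PRIMARY KEY,
                               external_id TEXT, structure TEXT);
        CREATE TABLE reactions (rxn_id INTEGER, struct_id INTEGER, role TEXT);

        INSERT INTO source_dictionary VALUES (1, 'ACD'), (2, 'OTHER');
        INSERT INTO source VALUES (1, 'ACD-001'), (2, 'OTH-009');
        INSERT INTO moltable VALUES (10, 'ACD-001', 'O=C(O)c1ccccc1'),
                                    (11, 'OTH-009', 'CC(=O)O');
        INSERT INTO reactions VALUES (100, 10, 'product'),
                                     (101, 11, 'product'),
                                     (102, 10, 'reactant');
    """)
    return con


def reactions_with_products_in(con, source_name):
    """Return ids of reactions whose product occurs in the named source."""
    rows = con.execute("""
        SELECT r.rxn_id
        FROM source_dictionary AS d
        JOIN source    AS s ON s.source_id   = d.source_id
        JOIN moltable  AS m ON m.external_id = s.external_id
        JOIN reactions AS r ON r.struct_id   = m.struct_id
        WHERE d.name = ? AND r.role = 'product'
        ORDER BY r.rxn_id
    """, (source_name,)).fetchall()
    return [row[0] for row in rows]


if __name__ == "__main__":
    warehouse = build_demo_warehouse()
    print(reactions_with_products_in(warehouse, "ACD"))
```

The central source table does the work here: every other table is reached through the External-ID values it holds, so adding a new source database means adding rows rather than changing the query.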
structures that satisfy a search query, then drill-down using a web browser to access the original data sources. In the case of reactions, the user might retrieve and browse a list of reactions that contain the structures that were found in the search. In addition to drill-down, a "hop-into" facility allows passing a set of structures into a search program or web browser that is native to the source database being accessed.

5.2 Data Marts of Chemical Information

For certain purposes, like reagent selection, a data warehouse is too large and comprehensive, and a much smaller database is sufficient. Such a data mart has the same architecture as a data warehouse, but it has only a single dimension of structural data, for example, synthetic reagents. The MDL Reagent Selector program is one example of a data mart of reagent structures, with information on their price and availability from various suppliers (Fig. 9.17). It has a fact table that links structures to their identifiers in the various source databases. It stores properties that can be used to filter reagents, such as the molecular weight and LogP, and it has pointers to supplier information stored in the MDL Chemical
Figure 9.17. Reagent Selector: an example of a chemical data mart. Various components of the system are shown, including the data sources, the daemon program that automatically updates the mart, the concordance database, and the client/server architecture, which is implemented in a three-tier system.
Products Index (CPI) database. To aid in reducing the size of a hit list, a Reagent Selector user can filter reagents and sort on properties, availability, presence or absence of functional groups, etc. (Fig. 9.18). Further list reduction can be achieved by clustering the structures by means of a cluster analysis using substructure keys as descriptors.

An important feature of Reagent Selector is the daemon program, which runs in the background. This agent-like program "awakens" on a fixed schedule and checks the various source databases for new or deleted structures or for changes in the structures and data. If any changes or additions are found, the daemon updates the data mart accordingly, so users will see the latest information when they run searches. Another aspect of chemical warehouses and data marts concerns their physical architecture. It is increasingly common to see so-called "multitier" architectures in which the client program (the "application tier") may be a very "thin" Web client that communicates to a more extensive "middle tier" of programs that serve the immediate needs of the client (see Glossary). Requests for searching and registration, which demand database server resources, are passed from the middle tier to a "database tier" that corresponds mostly to the server part of former client-server architectures. There are many advantages to this arrangement. The programs can be distributed onto different computers to optimize performance of the system. The middle tier can be modified independently to accommodate changes in the client and server. From a development point of view, the various tiers in the architecture can be developed and maintained on their own schedule, with minimal dependence on other components.
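The behavior of the Reagent Selector daemon described earlier in this section (checking the source databases for new, deleted, or changed entries and updating the mart) can be sketched as a single reconciliation step. This is only an illustration: the dictionary-based data model and the function name sync_mart are assumptions, and the source does not describe how the actual daemon is implemented.

```python
def sync_mart(mart, sources):
    """Bring `mart` in line with `sources` and report what changed.

    mart:    dict mapping (source_name, external_id) -> record dict
    sources: dict mapping source_name -> {external_id: record dict}
    Both layouts are hypothetical stand-ins for real database tables.
    """
    # Flatten the current state of every source database into one view.
    current = {
        (name, ext_id): record
        for name, records in sources.items()
        for ext_id, record in records.items()
    }

    added = sorted(key for key in current if key not in mart)
    deleted = sorted(key for key in mart if key not in current)
    changed = sorted(
        key for key in current if key in mart and mart[key] != current[key]
    )

    for key in deleted:
        del mart[key]
    for key in added + changed:
        # Copy the record so later edits to the source do not leak into the mart.
        mart[key] = dict(current[key])

    return {"added": added, "deleted": deleted, "changed": changed}
```

A scheduler would call sync_mart at a fixed interval; a second call with unchanged sources reports no work, which is the property that lets the daemon run unattended.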
6 Future Prospects 393
Figure 9.18. Filtering structures as part of the reagent selection process. The filter criteria include criteria for structure complexity, logP, H donor/acceptor, molecular weight, formula, and substructures.
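A property filter of the kind listed in the caption of Figure 9.18 can be sketched as follows. The Reagent record, the parameter names, and the choice to sort the surviving reagents by molecular weight are all assumptions for illustration; property values are expected to have been computed beforehand by some property calculation program.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reagent:
    """Hypothetical reagent record with precomputed properties."""
    name: str
    mol_weight: float
    logp: float
    h_donors: int
    h_acceptors: int


def filter_reagents(reagents, max_mw=None, logp_range=None,
                    max_h_donors=None, max_h_acceptors=None):
    """Keep reagents that pass every supplied criterion, sorted by weight.

    A criterion left as None is not applied.
    """
    kept = []
    for r in reagents:
        if max_mw is not None and r.mol_weight > max_mw:
            continue
        if logp_range is not None and not (logp_range[0] <= r.logp <= logp_range[1]):
            continue
        if max_h_donors is not None and r.h_donors > max_h_donors:
            continue
        if max_h_acceptors is not None and r.h_acceptors > max_h_acceptors:
            continue
        kept.append(r)
    return sorted(kept, key=lambda r: r.mol_weight)
```

Substructure and formula criteria are left out of the sketch because they need a structure search engine rather than a simple comparison of stored values.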
to generate knowledge that might not otherwise have been evident. Once this knowledge is materialized, it can be managed, shared, and deployed for future applications. This process is termed Knowledge Discovery in Databases (KDD), and it is becoming more widely practiced (138).

Data mining is the mechanism by which knowledge is derived from databases. It is generally defined as the extraction of predictive models and associations from large volumes of data using statistical and pattern recognition techniques, usually for some competitive advantage. Data mining is already well established in the marketing, sales, and telecommunications fields (139). Data mining is being used increasingly by scientists, especially in genomics and proteomics. Example applications include the clustering of DNA array data and using database information for protein secondary structure prediction (140). Examples are starting to appear in the field of drug design. Depending on the stage of a drug discovery project, one can mine chemical structure data for diversity, similarity, or specificity, as shown in Fig. 9.20. This figure shows that the lead discovery, refinement, and optimization phases of drug discovery proceed through mini-cycles, each with their own data
mining requirements. So far, data mining has mostly been applied to library design, QSAR, and ADME prediction (141). Whether the techniques become more widely used will depend on the accuracy of predictions made using them, on the availability of convenient software, and most of all, on clean and relevant data.

Software: Integration with relational systems. A consequence of treating structures as relational data is a tighter integration of once-specialized structure management software techniques with relational database systems. In Oracle, so-called "data cartridges" are being increasingly used to allow a chemist to treat structures like other relational data in a search. Structures, models, and reactions can all be input, registered, and searched using standard SQL to which special operators have been added. SQL stands for Structured Query Language, the standard language for querying relational systems (see Glossary). For example, in the Daylight relational data cartridge, substructure and similarity searches in a reaction database can be conducted directly in SQL as follows:

Substructure: to find reactions containing benzoic acid as a product:
SELECT * FROM RXN WHERE CONTAINS (SMILES, '>>O=C(O)c1ccccc1') = 1;
This statement translates to "Select everything from the table named RXN, where the SMILES field contains the substructure string for benzoic acid as a product". The "= 1" clause is an artifact of the data cartridge implementation; it does not necessarily mean that only a single occurrence of the benzoic acid substructure should be found.

Similarity: to find how many reactions have a solvent that is 80% or more similar to acetic acid:
SELECT COUNT (*) FROM MEDIUM WHERE SIMILAR (SMILES, 'OC(=O)C', 0.8) = 1;
This translates to "Tell me the number of rows in the table MEDIUM where the value of the SMILES column in that table has an 80% similarity to acetic acid." Again, the "= 1" parameter is an artifact.

This approach greatly simplifies the development of applications. Also, the searches can take advantage of optimization that is built into the relational database system. Fig. 9.21 shows a Web browser that uses the MDL reaction data cartridge to perform structure and reaction searches. The use of a direct searching approach with an object-relational database for combined retrieval of chemical and biological information was reported by Cargill and MacCuish (142). In the field of data mining, the generation, storage, and deployment of predictive models is fully integrated into SQL Server 2000 and Oracle 9i, and this trend will soon extend to other relational database systems (143).

Another advance in chemical information software that promises to have considerable impact on drug discovery is "meta-layer" searching, as described by Hoctor (144). In this approach, queries entered by the chemist are submitted first to a middle-tier search engine, the meta-layer, which automatically and transparently generalizes and transforms the query into several queries. These are then submitted to various databases to retrieve "more of the same" kinds of information. The results are automatically formatted and presented to the chemist in the context of a Web browser (Fig. 9.22). Thus, a name search might get converted automatically to a structure, for substructure or reaction searching, a literature citation search, or a patent search, etc. The linking of searches across indirectly related literature can also be used to generate new knowledge (145).

Hardware and Operating Systems: The value of parallel and distributed processing was reported early in the development of structure search systems (146). Since then, some commercial products have adopted parallel processing. These mostly involve CPU-intensive searching like conformationally flexible 3D searching and docking. With the exception of such tasks, the speed of most chemical information searching is determined by data input and output (i.e., "I/O
[Figure 9.21 screenshot: search type choices are Reaction Substructure Search, Reaction Flexmatch (Exact), and Reaction Similarity with similarity thresholds; Subset and Sorted check boxes; options for RSS Query Highlighting, Atom-Atom Numbering, and Bond-Change Marks.]
Figure 9.21. Web client for an application that searches a relational reaction database. SQL statements are used to select structures and reactions that satisfy the search query.
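The similarity query quoted in this section applies a threshold of 0.8, but the text does not say how the similarity value is computed. A common choice for fingerprint bitsets is the Tanimoto coefficient, and the sketch below assumes it; Python integers stand in for the bitsets, and the function names tanimoto and count_similar are hypothetical.

```python
def tanimoto(fp_a: int, fp_b: int) -> float:
    """Tanimoto coefficient of two fingerprints stored as integer bitsets.

    Bits set in both (intersection) divided by bits set in either (union).
    Two empty fingerprints are treated as identical; that is a convention
    chosen for this sketch.
    """
    common = bin(fp_a & fp_b).count("1")   # intersection of the bitsets
    total = bin(fp_a | fp_b).count("1")    # union of the bitsets
    return common / total if total else 1.0


def count_similar(fingerprints, query_fp, threshold=0.8):
    """Count fingerprints whose similarity to the query meets the threshold."""
    return sum(1 for fp in fingerprints if tanimoto(fp, query_fp) >= threshold)


if __name__ == "__main__":
    library = [0b11111, 0b01111, 0b00011]
    print(count_similar(library, 0b11111, threshold=0.8))
```

Because the comparison reduces to bitwise AND and OR followed by a bit count, it runs quickly even over large collections, which is why fingerprints are usually stored as bitsets.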
bound") because chemical structures are a highly "verbose" type of data. As chemical information systems integrate more with relational systems, they can take advantage of the parallel and distributed processing capability of the relational system. An important development is the "24-7" availability of data in chemical databases (24 h/d, 7 d/wk). This can only be accomplished by distributing and replicating databases across a network.

It would be pure speculation to estimate the impact of changes in hardware and operating systems on chemical information management. Presently, Sun Microsystems is probably the dominant Unix system in chemical database management, largely because of their network presence and their support of Java. Microsoft has released their Windows XP operating system, which merges the Windows 98 and Windows 2000 software streams and will give them continuing dominance in the PC market for a while. Linux is quickly catching on as an inexpensive alternative, and it has a strong foothold in the molecular modeling area, but it requires operating system expertise and lacks a business software base. Small handheld personal data assistants are becoming more capable, and wireless computing is on the rise. The standard desktop computer has at least a 2.0 GHz processor with 512 Mb or more of RAM, about a 30-72 hard drive, and a combination read/write with DVD. A relational database of 1 million structures consumes about 3-4 Gb of disk space and can be substructure searched in a few seconds, returning a hit list containing thousands of structures. It takes much longer for a chemist to wade through the results of such a search or to process analytical or bioassay results from a single combinatorial chemistry experiment than to run most data searches. In light of this, it seems evident that the tools that will succeed are those that will best assist the chemist in extracting relevant, implicit knowledge from the data and deploy that knowledge for future benefit.

Figure 9.22. Using meta-layer searching to retrieve implicit information. A name search query is converted to a structure, which is then transparently searched to add structure-based search results to the literature citation.

7 GLOSSARY OF TERMS

2D Query Feature. A structural feature added to a 2D substructure search query to generalize the query or make it more specific. An example atom query feature would be specifying a list of allowed atoms (Cl, Br, I) or limiting the number of attachments. A bond query feature would be allowing a single or double bond (S/D) or forcing the bond to have a particular stereochemistry. More complex query features can be used to specify which functional groups or substituents are allowed at a given position (Rgroups), specifying a range of chain length size (link nodes) and specifying atom, bond, or molecular data queries (Sgroups).

2D Structure. In terms of chemical information, a collection of information about atoms and bonds that can be displayed in a manner such that a chemist would recognize it as a chemical structure. The atom and bond types and connections are usually explicit. The layout of the atoms in the display may be explicit (x,y coordinates) or implicit, determined at the time of display. Hydrogen atoms may be fully or partially suppressed to save storage space.

3D Model. In terms of chemical information, all the information in a 2D structure plus at least one set of 3D atomic coordinates. This is a single conformation of the structure, which is typically a low energy conformation
Backtracking. One process that is used in mapping substructure atoms and bonds to the corresponding atoms and bonds in a candidate structure. Given a certain query, say, an amide group [-C(=O)N-], a backtracking algorithm searches first for a carbon atom, then for an oxygen atom, then checks to see if they are doubly-bonded, and finally checks to see if a nitrogen is singly-attached. At any step in the process, if the check fails (e.g., with an ester, the final check would fail), the program "backtracks" to the last successful step and examines another eligible atom or bond. If no eligible atoms or bonds are found at that step, it backtracks to the next previous step in turn. This procedure is guaranteed to find a mapping if one exists, but it can be slow, especially with large or highly symmetric queries or structures, where a multitude of similar paths must be examined. Alternative approaches that use an indexed tree can be faster, especially for large databases.

BCUT Descriptors. Descriptors of chemical structure that are derived from an eigen analysis of the connection table of the structure. The class of BCUT descriptor depends on the quantities that are stored in the table (simple connection information versus electronic or steric interaction values). BCUT descriptors have found value in molecular diversity and chemical library design.

Binary Data. Data stored in a file or database that is not chemist-readable, and usually cannot be converted to printable characters. Examples include connection table storage in a database, substructure search keys, and a graphics image of a structure. Note that some other data that is also not chemist-readable, like certain linear notations (e.g., a Chime string), may be made up of printable characters and is not strictly binary data.

Bioinformatics. The application of statistical and mathematical techniques to turn sequence data into useful biological information. The general goal of bioinformatics is to define the structure, location, and function of the proteins and nucleic acids that are the products of the processing of a genome. The application of bioinformatics in drug discovery is primarily the identification of new therapeutic targets.

Biological Data. This includes the results of in vitro and in vivo assays, toxicology and metabolism studies, DNA and protein array data, etc. It complements the chemical data, and increasingly, both chemical and biological data are being stored in large corporate relational databases. At any given stage in the drug discovery process, obtaining and analyzing the biological data has traditionally been considered the more complex and rate-limiting step in the process. The application of high-throughput methods to screening and pharmacokinetic analysis is yielding considerable benefit in the collection and processing of biological data.

Bitset. A contiguous set of binary digits (bits, 0/1) in computer memory. Bitsets are often used in chemical information to store collections of yes/no, presence/absence, and active/inactive responses in a compact form. Bitsets are used to store substructure search keys for each structure (fingerprints), which are used in similarity calculations. Bitset indexes are common features of relational databases, where a collection of bits, one for each structure in the database, can store the presence or absence of a given piece of data for each structure, or, in the case of a substructure search, a compact representation of the result set from the search. The advantage of bitset representation is that computers can perform very fast logical operations (union and intersection) on bitsets, which enables filtering and subsetting of large lists of structures and data.

BLOB. A Binary Large Object data type. This data type is used in Oracle, for instance, to store large amounts (e.g., up to several Gigabytes) of binary data. Storage of the connection table and all the perceived structural information for a registered structure is one example. Another example is the storage of the entire fastsearch index for a database, which can be accessed as a single object by the Oracle data storage and retrieval routines.

Bond Stereochemistry. This complements the atom and molecule stereochemistry of a structure. A given double bond can be assigned Z or E, or cis or trans stereochemistry based on the attachments. If the stereochemistry is unknown or is a mixture, it can be assigned a value of "either." In a substructure
search, the bond stereochemistry can be specified in the query to limit the scope of the search. In some registration systems, the bond stereochemistry of a given structure is perceived from the input drawing of the structure. In the case of linear notations, it can be specified by characters in the string (e.g., Cl\C=C\Cl specifies trans dichloroethene).

BRN. The Beilstein Registry Number, which can be used to access structures in the Beilstein database.

Business Rule. An established convention for the representation of data in a given company or laboratory. In the case of chemical structures, an example of a chemical business rule would be "all nitro groups should be drawn as -N(=O)(=O)- and not the charge separated form -N+(-O-)(=O)." In the case of biological data, a business rule might enforce the units in which a given piece of test data is reported (e.g., dosage in mmol/kg). Business rules can be enforced by preprocessing data before it enters the database, or in the case of multiple, diverse data sources feeding into a data warehouse or data mart, the data can be transformed to the correct form before storage in the warehouse.

Canonical Numbering. Reordering the numbering of atoms in a structure to a unique order, based on the extended counting of the number of attachments at each center, the atom and bond types, etc.

CAS Number. Chemical Abstracts Registry identification number, very widely used to identify chemical structures.

Chem(o)Informatics. By analogy with bioinformatics, this is the application of statistical and mathematical techniques to turn chemical structure data into useful chemical and biological information. It makes use of techniques from statistics, pattern recognition, artificial intelligence, and data mining to derive useful predictive relationships between structures and their biological or physicochemical properties. Broadly considered, cheminformatics also includes the input, storage, management, and searching of chemical structure information.

Chemical Library. A collection of structures, real or virtual, that is the current starting point for high-throughput screening or analysis. A library may be all the structures in a database, or more commonly, a subset of these. It might consist of diverse structure types or it might represent the enumeration of one or a few generic structures. Libraries can be classified according to the stage of discovery, i.e., diverse libraries for lead discovery, focused libraries for lead development, and optimized libraries for lead optimization.

Chemical Space. A loosely defined concept that all the known or possible chemical structures define some multidimensional space in which the structures are points. Structures that are topologically or topographically similar to each other (i.e., look similar), cluster in chemical space, and by the principle of chemical similarity, should show similar physicochemical and biological properties. This is the basis for diversity analysis of chemical libraries. The challenge is to select or discover properties of the structures that define the chemical space and can be used.

CIP Stereochemistry. Cahn-Ingold-Prelog stereochemistry conventions. An IUPAC approved and widely used set of rules for assigning stereoisomers based on atom and group priorities (see http://www.chem.qmw.ac.uk/iupac/stereo/).

Cleaning and Transforming Data. When importing data from diverse data sources (files, databases, spreadsheets, LIMS systems, etc.) into a database or data warehouse, the data usually needs to be standardized, checked, and sometimes transformed to some common format and content. This allows faster search and retrieval, and serves as a check of data integrity. The rules that define the cleaning/transformation process are often termed "business rules," and in the case of chemical data, they may include checking and modification of chemical structures.

Client-Server Architecture. A computer architecture in which a "server" computer (usually a larger and faster machine at a central location) runs programs that communicate over a network with numerous workstations or "client" machines that reside in offices and laboratories. The server computer performs heavy duty computing tasks such as database searching and molecular and data modeling, in response to commands from the users of the client computers. It then communicates the results back to the client machines. There, depending on whether the client is "thick" (a
relatively large and capable application that can be "parsed" by a freely available computer
can display and manipulate data and struc- program that can return the structural infor-
tures), or "thin" (a small program, possibly mation on demand.
running in an Internet browser), the data is Combina torial Chemistry. The application
displayed, manipulated, and reported. Client- of high-throughput, parallel methods to the
server architecture is two-tier, and is being synthesis, analysis, screening, and testing of
supplanted by more versatile multi-tier approaches.
Clipping. The computer application of a chemical transformation to a set of structures. One example would be the conversion of a set of o-substituted phenols to a generic representation with the ortho substituents collected into an Rgroup attached to the parent phenol structure. The reverse process, going from the generic structure to all the specific non-generic structures, is termed enumeration. Clipping also includes functional group transformations, such as converting a ketone to an alcohol. In the process of cleaning and transforming chemical structure data, clipping may be involved when chemical business rules are "enforced."
CLOB. A Character Large Object data type. This data type is used in Oracle, for instance, to store large amounts (e.g., up to several gigabytes) of character data. Storage of structures in a relational database in molecule-file format is an example.
Cluster Analysis. The process of discovering "natural" groupings of points in the space of some measurements or descriptors. In chemical information management, one often clusters chemical structures for diversity analysis or to subset the results of a search. Structures are most often clustered using functional group fingerprints as the descriptors. Clustering methods are usually either partitioning methods, like k-means and Jarvis-Patrick, or hierarchical methods, which may work by successively dividing the points (divisive clustering) or by successively aggregating points (agglomerative clustering). Cluster analysis is an important part of unsupervised data mining and pattern recognition.
CML. The Chemical Markup Language. Based on XML and HTML, it provides a standard self-documenting molecule file and information interchange format. Information is described by tags and values. A CML document
materials. This approach relies on robotics and computer-assisted methods to generate and analyze the results. Synthesis, analysis, and testing of samples occur in the wells of microtiter plates, which may contain as few as 96 samples or as many as a few thousand. Solid-phase and solution methods are used, and samples may be "one-bead-one-compound" or they may contain mixtures, which require "deconvolution" to determine which component is responsible for observed activity.
CONCORD. Rapid 2D to 3D conversion program introduced by Robert Pearlman's group in 1987. It generates approximate low-energy 3D models from 2D connection tables. It can also do stereo "multiplexing," where multiple configurations of stereochemically ambiguous structures are generated. Marketed by Tripos, Inc.
Concordance. A data warehouse architecture used in MDL relational chemical and reaction databases. The central "fact" table of a concordance has a record for each unique structure in the database, with pointers to the instances of the structure in various "source" databases.
Connection Table. A table or matrix containing topological information about a chemical structure. A structure can be considered a "graph" in 2D space, with atoms as "nodes" and bonds as "edges." The atom connection table has one row and one column for each atom. The diagonal elements of the table are usually the atomic numbers, and the off-diagonal elements hold a zero or null if two atoms are not connected; otherwise they contain the order (1, 2, 3, aromatic, etc.) of the bond connecting the row and column atoms. A less common connection table is the bond connection table, in which the rows and columns are the bonds in the structure, the diagonal elements are the bond orders, and the off-diagonal elements contain information about the atoms at the ends of the bonds.
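As an illustration of the atom connection table described under Connection Table, the following minimal Python sketch builds the matrix for the heavy atoms of acetic acid. The function name and the bond-list input format are assumptions made for this example; they do not correspond to any particular vendor's file format.

```python
def atom_connection_table(atomic_numbers, bonds):
    """Build a square atom connection table.

    Diagonal elements hold atomic numbers; off-diagonal elements hold the
    bond order between the row and column atoms, or 0 if they are not bonded.
    `bonds` is a list of (atom_i, atom_j, order) tuples (hypothetical format).
    """
    n = len(atomic_numbers)
    table = [[0] * n for _ in range(n)]
    for i, z in enumerate(atomic_numbers):
        table[i][i] = z
    for i, j, order in bonds:
        table[i][j] = order
        table[j][i] = order  # the table is symmetric
    return table


# Acetic acid heavy atoms: C0-C1 single, C1=O2 double, C1-O3 single
acetic_acid = atom_connection_table(
    [6, 6, 8, 8],
    [(0, 1, 1), (1, 2, 2), (1, 3, 1)],
)
for row in acetic_acid:
    print(row)
```

The second row, `[1, 6, 2, 1]`, reads as: carbon (atomic number 6) bonded to atom 0 by a single bond, to atom 2 by a double bond, and to atom 3 by a single bond.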
402 Chemical Information Computing Systems in Drug Discovery
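The fact-table idea described under Concordance can be sketched with Python's built-in `sqlite3` module. This is only a simplified illustration: the table names, column names, and identifiers below are hypothetical, and a plain text key stands in for whatever unique structure representation a real system would use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE structure_fact (      -- one record per unique structure
        structure_id  INTEGER PRIMARY KEY,
        canonical_key TEXT UNIQUE
    );
    CREATE TABLE source_instance (     -- pointers into the source databases
        structure_id INTEGER REFERENCES structure_fact(structure_id),
        source_db    TEXT,
        source_id    TEXT
    );
""")
conn.executemany(
    "INSERT INTO structure_fact VALUES (?, ?)",
    [(1, "phenol"), (2, "ethanol")],
)
conn.executemany(
    "INSERT INTO source_instance VALUES (?, ?, ?)",
    [(1, "vendor_a", "A-1001"), (1, "in_house", "IH-17"), (2, "vendor_a", "A-2044")],
)


def instances(conn, canonical_key):
    """Return every source-database occurrence of one unique structure."""
    rows = conn.execute(
        "SELECT s.source_db, s.source_id "
        "FROM source_instance s "
        "JOIN structure_fact f ON f.structure_id = s.structure_id "
        "WHERE f.canonical_key = ? ORDER BY s.source_db",
        (canonical_key,),
    )
    return rows.fetchall()


print(instances(conn, "phenol"))
```

Each unique structure is stored once, and its occurrences in the source databases are reached through the shared `structure_id`.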
a data warehouse, because historical trends are important. For this reason, the warehouse grows very large over a long period of time, and thus its organization and indexing are crucial considerations. An example in chemistry would be a single database containing structures, models, reactions, and data, all cross-referenced, and used by chemists, biologists, and modelers. Typically, each group would extract its own data mart from the warehouse, containing information relevant to its needs. Data warehouses are often used in decision support systems (DSS) to provide data on which to base important corporate decisions.
Database Tier. In a three-tier programming architecture, the database tier resides on a server computer with access to the databases and the programs that manage them.
Deduplication. When registering into a chemical structure database, the process of finding whether the given structure already exists in the database. This usually involves performing an exact match search with the given structure as the search query. Note that the definition of exact match may vary with the database, and it may even be configurable. For example, some databases may consider tautomers to be acceptable as exact matches, whereas others may require a stricter definition.
Dimension Tables. In a data mart or warehouse, the dimension tables store non-redundant information about the entries in the fact table of the database. For the chemical example of an inventory data mart, the fact table stores the various source database identifiers of each unique structure in the data mart. A dimension table of molecular formulas would store the formula once for each unique structure in the mart, rather than storing the same formula for each occurrence of that structure in the various source databases.
Drill-Down. Accessing data with increasing amounts of detail. When examining and browsing the results of a database search, a chemist can often request further information about a structure, even though that information was not included in the search. The process of accessing further information, often stored in a hierarchical manner, is termed drill-down. The opposite process, which aggregates data, is termed roll-up.
ECTL. The process of Extracting, Cleaning, Transforming, and Loading data into a data mart or data warehouse. The data in a mart or warehouse should be standardized, complete, unambiguous, etc. Raw data from files, instruments, databases, the Internet, etc., must usually be preprocessed before it is "clean" enough to be used in decision making. Structures present special problems because tautomers, isomers, salts, etc. may all represent valid forms. The use of chemical processing languages, which can search for substructures and make modifications of specific atoms and bonds, enables the enforcement of chemical business rules during the ECTL process.
Encryption. The conversion of data in a readable or decipherable code into another, possibly undecipherable, code. The most common encryption involves sensitive pieces of data like passwords and identification numbers. In chemistry, it is sometimes necessary to encrypt larger pieces of information, such as chemical structures and the results of assays, at least during passage of such information over networks or the Internet. Decryption of the information typically requires one or more keys, which are often built into the encryption and decryption software.
Enumeration. The systematic substitution of all the Rgroup members in a generic structure, giving each possible specific structure the generic structure represents. If some of the Rgroups are not converted in the process, it is termed partial enumeration.
Equivalence Class. In the canonicalization of structures that have some element of symmetry, certain atoms that are topologically equivalent may yield the same canonical number. These atoms are considered to be in the same equivalence class. The concept of equivalence class is used, for example, in the Daylight Chemical Information Systems handling of reactions, to examine equivalent atoms when mapping reactant and product atoms.
Exact Match Search. One type of structure searching in which a query molecule is searched for in a database of structures. To exactly match the query, the target structure must be topologically identical and not be a substructure or superstructure of the query.
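The Enumeration entry above can be illustrated with a short Python sketch. It treats a generic structure as a plain text template with named placeholders, which is an assumption made for this example only; real systems operate on connection tables rather than strings. The template, the Rgroup names, and the function name are all hypothetical.

```python
from itertools import product


def enumerate_generic(template, rgroups, expand=None):
    """Substitute Rgroup members into a text template.

    `rgroups` maps an Rgroup name to its list of members. If `expand` names
    only some of the Rgroups, the others are left in place, which corresponds
    to partial enumeration.
    """
    names = sorted(rgroups if expand is None else expand)
    results = []
    for combo in product(*(rgroups[name] for name in names)):
        structure = template
        for name, member in zip(names, combo):
            structure = structure.replace("{" + name + "}", member)
        results.append(structure)
    return results


# Generic substituted phenol written as a SMILES-like string with two Rgroups
template = "Oc1c({R1})cc({R2})cc1"
rgroups = {"R1": ["F", "Cl"], "R2": ["C", "OC"]}

full = enumerate_generic(template, rgroups)                   # 2 x 2 structures
partial = enumerate_generic(template, rgroups, expand=["R1"])  # R2 left generic
print(full)
print(partial)
```

Full enumeration yields the Cartesian product of the member lists, so the number of specific structures grows multiplicatively with each added Rgroup.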
"Rgroups" or "substituent groups" that can each contain multiple substituents or fragments.
Gigabyte. One thousand megabytes, or $10^{9}$ bytes of data. The largest chemical structure databases presently contain a few tens to hundreds of gigabytes of data. A typical structure in a database may require a few thousand bytes of data to store the connection table, coordinates, and other structure-specific data.
Graph and Subgraph Isomorphism. In chemistry, the mapping of a structure or substructure query to a target structure. All the atoms and bonds in the query (the nodes or vertices and edges of the "graph" of the query structure) must be mapped to corresponding atoms and bonds in the target structure to generate a hit.
Hash Code. Converting a set of numeric or character properties into a single, mostly unique, number, for the purpose of rapid lookup and retrieval. For example, in the case of chemical structures, it is common to generate and store a hash of the molecular formula, so that when a user requests a formula search, the search query typed by the user is converted to the same hash number, and a single lookup in the index gives all the structures that correspond to the given formula. A hash code is often generated as a linear combination of the possible values of each of the properties (e.g., $n_1P_1 + n_2P_2 + \cdots$, where the n's are selected such that the products never overlap). If several structures have the same hash code, they are termed "collisions," and typically require further processing, such as substructure searching, to differentiate them.
Hierarchical Clustering. One of three main types of clustering applied to chemical structures (hierarchical, partitioning, and fuzzy clustering). In hierarchical clustering, a tree or dendrogram is constructed, with one structure at each of the leaves of the tree. By "trimming" the tree at a given level, one can collect structures into a given number of clusters, such that all the structures in a single cluster have some level of similarity to each other.
High-Throughput Chemistry. Application of parallel processing to the synthesis, analysis, and screening of structures. A subset of high-throughput chemistry is combinatorial chemistry.
Hit List. Older term for a list of identifiers of structures or other objects obtained from a database search. A more modern term is "result set."
HTML. HyperText Markup Language. The most commonly used specification language of the Internet. Other markup languages of interest in chemistry include XML (Extensible Markup Language: information in general), CML (Chemical Markup Language: chemical structures), VRML (Virtual Reality Markup Language: 3D visualization), and PMML (Predictive Model Markup Language: data mining).
Index. A secondary data field generated from one or more primary data fields, to enhance the searching and retrieval of the primary data. An index in a chemical database may be a characteristic of the database, such as Oracle indexes, or it may be a chemistry-specific index such as a tree index for substructure searching. Indexes require extra space, and they typically must be created and maintained by some administrative process in the database.
Inventory Data. Typically, information about the availability of reagents for chemical synthesis. This includes the suppliers, package sizes, purity, and cost of commercial reagents, and the location, owner, and availability of in-house reagents. Increasingly, this information is being integrated with chemical structure databases and warehouses and with automated ordering and procurement programs.
Inverted Keys. When substructure search keys are generated for a structure, they may be stored in normal order (where each record represents a structure, and the bits or fields for that structure represent the keys). Alternatively, they may be stored in inverted or pivoted order, where each record represents a given substructure key, and the bits represent structures that have that particular key set. This type of storage benefits key searching, where a user wants all the structures that have a particular key set.
Isomer and Tautomer Search. A search type where bond order, hydrogen counts, certain atom valences, and bond or atom stereochemistry may be allowed to vary from those specified in the query. Such searching allows re-
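The molecular formula hash described under Hash Code can be sketched as follows. This is a minimal example under stated assumptions: the element list is restricted to C, H, N, and O, every element count is assumed to stay below 1000, and the coefficients are taken as powers of 1000 so that the products never overlap. None of these choices is drawn from a specific database product.

```python
import re
from collections import defaultdict

ELEMENTS = ["C", "H", "N", "O"]  # hypothetical restricted element set
RADIX = 1000                     # assumes every element count is below 1000


def parse_formula(formula):
    """Count atoms per element in a simple formula such as 'C2H6O'."""
    counts = dict.fromkeys(ELEMENTS, 0)
    for symbol, digits in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] += int(digits) if digits else 1  # KeyError if unsupported
    return counts


def formula_hash(formula):
    """Linear combination n1*P1 + n2*P2 + ... with non-overlapping coefficients."""
    counts = parse_formula(formula)
    return sum(counts[el] * RADIX**i for i, el in enumerate(ELEMENTS))


# Build the index once at registration time
index = defaultdict(list)
for name, formula in [("ethanol", "C2H6O"),
                      ("dimethyl ether", "C2H6O"),
                      ("benzene", "C6H6")]:
    index[formula_hash(formula)].append(name)

# A formula query typed in any element order maps to the same hash
hits = index[formula_hash("H6C2O")]
print(hits)
```

Ethanol and dimethyl ether share a formula and therefore a hash code; this is the collision case in which further processing is needed to tell the structures apart.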
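The normal and inverted storage orders described under Inverted Keys can be compared in a small Python sketch that uses integers as bitsets. The data and function name are hypothetical, and a production system would use packed bit arrays rather than Python integers.

```python
def invert_keys(keys_by_structure, n_keys):
    """Pivot per-structure key bitsets into per-key structure bitsets."""
    structures_by_key = [0] * n_keys
    for s_index, keybits in enumerate(keys_by_structure):
        for k in range(n_keys):
            if (keybits >> k) & 1:                    # structure has key k set
                structures_by_key[k] |= 1 << s_index  # mark structure in key record
    return structures_by_key


# Normal order: one record per structure, bit k set if key k is present
normal = [0b0011, 0b0110, 0b1010]   # three structures, four keys
inverted = invert_keys(normal, 4)

# Key search: all structures that have both key 1 and key 3 set
wanted = inverted[1] & inverted[3]
matches = [s for s in range(len(normal)) if (wanted >> s) & 1]
print(matches)
```

In inverted order the key search reduces to a bitwise AND of two records, instead of a scan over every structure record.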
OLAP. OnLine Analytical Processing. An activity that involves routine searching, analysis, and reporting on data stored in a large database. The database, which may have a data mart or data warehouse organization, is optimized for the kinds of searches and reports that it supports. It may not be optimal in organization for transaction processing (OLTP), which may involve registration of small amounts of data on an irregular schedule, or for data mining, which involves the retrieval and analysis of large volumes of data. In chemistry, an example of OLAP might be an inventory application in which the chemist draws in a structure or substructure, runs one or more filters on the result set, and retrieves and prints a report of structures and inventory data.
OLTP. OnLine Transaction Processing. An activity that involves registration, update, or simple searching in a database of transactions. In chemistry, this might be the routine registration of a new structure and analytical data into a chemical database. Such a database is optimized for registration and may not be suitable either for more analytical types of searching and reporting (OLAP) or for data mining.
Parallel Processing. A technique whereby a given computer task is distributed among several central processing units (CPUs). The CPUs may be part of the same computer (e.g., a multiprocessing computer in which several CPUs share common memory and physical devices), or they may consist of several single-processor computers (a "cluster") that are networked to rapidly share information and disk space. In database management, it is becoming increasingly common to have parallel copies (replications) of a given database at several sites, perhaps worldwide. Special database and networking software provides rapid updates of certain information like data and periodic updates of other information like search indexes.
Pattern Recognition. The application of computers to build descriptive or predictive models (i.e., find patterns) of information from input datasets. The techniques of pattern recognition overlap those used in statistics, chemometrics, and data mining, and include data display, description, and reduction, unsupervised methods such as cluster analysis, and supervised methods such as curve fitting and classification. Engineering applications of pattern recognition include recognizing objects in pictures, and character and voice recognition. Chemical applications are found in the fields of drug discovery, analytical chemistry, and chem/bioinformatics.
Petabyte. One thousand terabytes of data ($10^{15}$ bytes). At present, the largest databases of chemical and biological information are gigabytes ($10^{9}$ bytes) in size.
Pharmacophore. The minimum amount of chemical functionality needed in a drug to elicit a given biological response. This functionality is defined in terms of atoms and functional groups and their geometric relationships to each other, including distances, angles, etc. A pharmacophore query is the representation of a pharmacophore in a format that can be used to search a chemical database for structures that can satisfy the pharmacophore and may elicit the desired response. Pharmacophore searching is usually conducted on a 3D structural database using search software that combines 2D searching with conformational analysis to find structures that can, by rotating about single bonds, adopt a conformation that satisfies the pharmacophore.
Pharmacophore Keys. Originally designed to speed pharmacophore query searching, pharmacophore keys are bitset fingerprints that indicate the presence or absence of given 3- or 4-point pharmacophores in a structure stored in a database. The 3-point pharmacophore keys represent triangular arrangements of atoms and functional groups separated by given distances or distance ranges. The 4-point pharmacophore keys represent tetrahedral arrangements of atoms and functional groups. When a structure is registered into a 3D database, a rapid conformational analysis is performed involving key single bonds in the structure. From the interatomic distance ranges between given atoms and functional groups in the structure, the various bits in the pharmacophore keys are set. These keys can be used as filters in pharmacophore searching or, increasingly, as filters before docking the structures into a known receptor (virtual screening). Pharmacophore keys have also
been used less commonly as descriptors in QSAR and data mining.
Physicochemical Properties. Originally these were just measured properties like melting point, pKa, solubility, and octanol/water LogP. Increasingly, they are obtained from programs that can calculate them from the 2D or 3D structure. In QSAR, the classical triad of steric, electronic, and lipophilic properties is still widely used, but it has been enhanced to include energy-based descriptors, measures of binding interaction, 3D-QSAR multivariate descriptors (CoMFA), and others. Quantum mechanical calculations are being used increasingly to estimate physicochemical properties. Once calculated, the properties are used to filter structures that may have undesirable ADMET properties (e.g., the "rule of five"), or they may be used directly in models to estimate the type or level of biological activity (QSAR).
Pivoting Data. Changing data from row to column values or vice versa. This technique can be a very useful tool for summarizing data. One example of pivoting is to convert substructure keys that are stored by structure (with a bit turned on for each key the structure contains) to storage by key (with one bit turned on for each structure that has the given key). Another example is to convert assay data that is stored by structure to data that is stored by assay. In the process of pivoting data, it is common to consolidate values, for example, converting raw assay results to ED50 values, or taking the average of some physicochemical property.
Proteomics. The conversion of protein sequence data into useful biological information. In general, the goal of proteomics is to characterize a gene product (i.e., a protein) as to its structure, subcellular location, and function. Additional information includes how a protein interacts ("networks") with other reactions and cell processes.
QSAR. Quantitative structure-activity relationships, the science of deriving quantitative linear or nonlinear mathematical relationships between physicochemical and topological/topographical properties of chemical structures and their biological activity. Originally, regression analysis was the only tool used to derive QSAR equations. More recently, tools such as partial least squares (PLS), neural networks, and a variety of data mining methods like decision trees and support vector machines have come into use.
Reacting Center. An atom in a reactant that is modified during the course of a reaction. Specifying reacting center information when searching for reactions can speed the search and reduce the number of incorrect hits.
Reaction Scheme. A series of one-step reactions that lead from a given reactant to a given product, by way of intermediate steps. A reaction search system should be able to find reactant/product combinations that span several intermediate reactions.
Refine a Search Query. The process of adding or modifying constraints of a search query to reduce or increase the number of hits. Constraints may be added, removed, relaxed, or tightened to achieve the desired search results.
Registry Number. A unique identifier assigned to a chemical structure or other piece of data when it is registered into a database. The registry number may be internal, primarily for use by the database search system, or external, to be used by chemists and to link the data to other databases and files.
Relational Database. A common database architecture in which related data items are stored in separate tables, accessed by key fields, and indexed for rapid search and retrieval. The dominant relational database systems used in pharmaceutical discovery include Oracle, Microsoft Access and SQL Server, and IBM DB2.
Result Set. A list of records resulting from a database search. The result set commonly consists of a list of record identifiers (sometimes called a cursor), which can be navigated to select records. In some systems, a result set may also contain related data for each record.
Retrosynthetic Analysis. An approach to computer-assisted synthesis design that starts with the products of a reaction or sequence of reactions and works backwards toward the reactants. An example program that implements retrosynthetic analysis is the LHASA program of E. J. Corey's group.
Rgroup. In a generic or Markush structure, generalized substituents or moieties are given
the representation R1, R2, etc. These Rgroups represent collections of specific substituents or moieties (members) that can be replaced at the given position.
Roll-up. The agglomeration, summarization, or consolidation of data into a summary presentation. Roll-up often involves summarizing data at a given level in a data hierarchy. Examples would include the average of several ED50 values, or a simple yes/no indication that toxicity data for a given structure exists somewhere in the database.
Root Structure. The invariant portion of a Markush or generic structure. The attached Rgroups contain the substituents that vary from one specific structure to the next. Sometimes termed a parent structure.
ROSDAL. Linear notation scheme devised by the Beilstein Institute. It can contain just connection table information, or it may also contain atom coordinates. Several chemical information systems can convert ROSDAL strings to other structure file formats.
Sgroup Data. In MDL structure storage, the attachment of structure-differentiating data directly to the structure. Such data may relate to the structure as a whole, or to atoms, bonds, fragments, or collections of atoms and bonds. Examples would include atomic partial charges on 3D models or percent composition attached to components of a formulation.
Similarity Search. A type of "fuzzy" structure searching in which molecules are compared with respect to the degree of overlap they share in terms of topological and/or physicochemical properties. Topological descriptors usually consist of substructure keys or fingerprints, in which case a similarity coefficient like the Tanimoto coefficient is computed. In the case of calculated properties, a simple correlation coefficient may be used. The similarity coefficient used in a similarity search can also be used in various types of cluster analysis to group similar structures.
SLN. Sybyl Line Notation. Linear notation used in conjunction with Tripos SPL (Sybyl Programming Language) to manipulate chemical structures. It is similar in syntax to SMILES notation.
SMILES. Simplified Molecular Input Line Entry System, a linear notation used in Daylight Chemical Information Systems software and widely supported by other systems.
SQL. Structured Query Language. The standard query specification language for searching relational databases. Most database systems support the SQL standard but then add extensions particular to their implementation.
Star Schema. A standard data warehouse architecture, characterized by Ralph Kimball, in which a central "fact" table is connected to various "dimension" tables.
Structural Data Mining. Application of data mining methodology to chemical structure and reaction databases. The field is currently in its infancy, and it remains to be seen whether a "data snooping" approach to information and knowledge discovery can be as useful in drug discovery as it has proven to be in finance, marketing, and merchandising.
Substance. In some structure databases, an entry that lacks a structure completely (a "nostructure"), is only partially characterized, or is an unspecified mixture of known structures. Substances pose obvious problems in database searching.
Substructure Search (SSS) Keys. Originally developed to facilitate substructure searching, these consist of a string of bits that represent a fingerprint of the structure with respect to either (1) a set of known and defined functional groups (e.g., MDL), or (2) a set of discovered atom-bond paths that the structure contains (e.g., Daylight). In MDL systems, the substructure keys are currently either 166 or 960 bits in length. In Daylight systems, the substructure keys are of varying length and can be "folded" to achieve a higher density of bits turned on. Although SSS keys were originally developed to screen candidates for substructure searching, they are currently used more for similarity calculations.
Substructure Search. Application of "subgraph isomorphism" search to chemical structures. This consists of finding a particular arrangement of atoms and bonds as they are embedded in a chemical structure. The arrangement being searched for is termed the query substructure, the structures being searched are termed the candidates, and any particular structure in that set is termed a target structure. If the query substructure is
found in the target, the target structure is added to the result set (or hit list). In displaying the results of the search, the atoms and bonds in the substructure as mapped onto the hit may be highlighted (shown darkened or in a different color) in the structure display. In general, more than one occurrence of a substructure may be found in a given structure, and substructure mappings may be overlapping or non-overlapping.
Superstructure Search. Modification of substructure search in which the substructure query becomes the target structure, and the target structure in the database becomes the substructure search query. The search finds structures in the database that are substructures of the query. A similar extension to structure similarity searching yields superstructure similarity searching.
Supervised Data Mining. Searching large volumes of data for hidden predictive relationships. Supervised analysis requires one or more "dependent" or response variables, to be predicted from a set of "independent" or predictor variables. The techniques used include various classification methods (decision tree, support vector, Bayesian) and various estimation methods (regression, neural nets).
Tanimoto Coefficient. Standard coefficient for computing the similarity of chemical structures. It is the number of bits two fingerprints have in common divided by the number of bits set in either fingerprint. If structure A has 20 bits turned on in a fingerprint, and structure B has 30 bits turned on, and the two structures have 10 bits in common, the Tanimoto coefficient is 10/(20 + 30 - 10) or 0.25. Its value can range from 0 (no similarity) to 1.0 (perfect match). Other similarity coefficients are also used, and in some systems (such as MDL) the various bits are weighted inversely according to their occurrence in the database, so that very common substructures do not contribute much to the similarity.
Terabyte. One thousand gigabytes, or $10^{12}$ bytes of data. The largest relational databases of any kind are currently a few tens of terabytes in size. At present, the largest databases of chemical and biological information are gigabytes ($10^{9}$ bytes) in size.
Thick or Thin Client. A thick client architecture is one in which a significant amount of computing is done on the user's workstation. This is appropriate for user-interface-intensive activity like graphics and calculations on individual molecules. The alternative thin-client architecture either does not require much local computing, or it uses a built-in resource like an Internet browser as a client.
Toolkit. A collection of computer routines that each perform one or a small number of information management tasks. The routines are provided as a library and they can be incorporated into custom user-written application programs to carry out tasks that ordinary application programs may not perform. The interface between the toolkit routines and the user-written program is referred to as the Application Programming Interface, or API.
Topographical. Structure data that is based on the connection table and the 3D structure of a molecule. Examples include surface area and volume and pharmacophore distances between atoms.
Topological. Structure data that is based only on the connection table of the structure, without regard to 2D or 3D coordinates of the atoms. Examples include molecular weight and formula, counts of substructures, and indices like molecular connectivity.
Tree. A data structure that is widely used in chemical information storage. Commonly viewed with the root of the tree at the top (or to the left), successive levels of branching lead to the "nodes" of the tree and ultimately to its "leaves" (terminal nodes). Depending on how they split at a node, trees may be binary or n-ary, and depending on how their nodes are distributed, they may be balanced or unbalanced. A tree is usually traversed from the root to the leaves, and this traversal can be depth-first (following a single path until a leaf node is reached), or breadth-first (looking at all the nodes at a given level). An example of a tree data structure is the fastsearch index used in MDL substructure searching.
Unicode. A successor to the ASCII character set whose code points fit within 32 bits. With Unicode, non-Latin alphabets and special characters can be encoded.
Unix. Widely used operating system for workstations and server computers. Various computer vendors supply their version of Unix, which typically descends from either the Bell Labs or Berkeley versions. A microcomputer version of Unix is Linux, which is rapidly growing in acceptance.
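For the Tanimoto Coefficient entry, the usual definition is the number of bits two fingerprints share divided by the number of bits set in either one, that is, c/(a + b - c). The Python sketch below applies that definition to the entry's example of 20 and 30 bits with 10 in common. Representing fingerprints as sets of bit positions, and returning 0.0 for two empty fingerprints, are assumptions made for this example.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient c / (a + b - c) for two sets of 'on' bit positions."""
    a, b = set(bits_a), set(bits_b)
    common = len(a & b)
    union = len(a) + len(b) - common
    return common / union if union else 0.0  # convention chosen for empty input


fp_a = set(range(0, 20))    # structure A: 20 bits turned on
fp_b = set(range(10, 40))   # structure B: 30 bits turned on, 10 shared with A

print(tanimoto(fp_a, fp_b))  # 10 / (20 + 30 - 10) = 0.25
```

Identical fingerprints give 1.0 and fingerprints with no shared bits give 0, which matches the stated range of the coefficient.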
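The two traversal orders mentioned under Tree can be sketched in Python as follows. The tree here is a small hypothetical example stored as a dictionary of child lists; it is not meant to represent the layout of any actual substructure search index.

```python
from collections import deque

# Each node maps to its list of children; leaves have empty lists
tree = {
    "root": ["a", "b"],
    "a": ["a1", "a2"],
    "b": ["b1"],
    "a1": [], "a2": [], "b1": [],
}


def depth_first(tree, node):
    """Follow each path down to a leaf before moving to the next branch."""
    order = [node]
    for child in tree[node]:
        order.extend(depth_first(tree, child))
    return order


def breadth_first(tree, root):
    """Visit all the nodes at one level before descending to the next."""
    order, queue = [], deque([root])
    while queue:
        node = queue.popleft()
        order.append(node)
        queue.extend(tree[node])
    return order


print(depth_first(tree, "root"))    # root, a, a1, a2, b, b1
print(breadth_first(tree, "root"))  # root, a, b, a1, a2, b1
```

Both functions visit every node exactly once; only the order differs.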
Unsupervised Data Mining. Searching large 4. W .V . Metanomski, J. Chem. Inf Comput. Sci.,
volumes of data for hidden descri~tiverela-
A
35,173-174(1995).
tionships. Unlike supervised data mining, no 5. The 5th International Conference on Chemical
response variables are used. The techniques Structures, June 6-10, 1999, Leeuwenhorst
used include various display and data reduc- Congress Center, Noordwijkerhout, The Neth-
tion methods, as well as cluster analysis and erlands, available online at http://www.
chemweb.com/conference/5iccs/5iccs.html,ac-
association analysis. cessed on September 10,2002.
VARCHAR, VARCHAR2. SQL data types
6. The 2001 Gordon Research Conference on
used to store character data in a relational da- Quantitative Structure Activity Relationships,
tabase system. Storage is limited to about August 5-10,2001, Tilton School, Tilton, N H ,
4000 characters, so larger pieces of data must available online at http://www.grc.uri.edu/
be stored as CLOB data. programs/2001/qsar.htm, accessed on August
Virtual Screening. Using computer model- 28,2001.
ing to screen leadsfor abtivity. The screening 7. The 2001 International Chemical Information
may be through some QSAR or data mining Conference, October 21-24, 2001, Nimes,
model, which typically requires only 2D struc- France, available online at http://www.
tures and data, or it may involve 3D molecular infonortics.com/chemic~index.html, accessed
modeling and docking with a known or puta- on August 26,2002.
tive receptor. The speed and increasing accu- 8. H. R. Collier, Chemical Information, Springer-
Verlag, New York, 1990;H . R. Collier, Ed., Re-
racy of virtual screening make it a vital step in
cent Advances in Chemical Information, Royal
the drug discovery process. Society of Chemistry, London, 1993;H . R. Col-
XML. Extensible Markup Language--a lier, Ed., Further Advances in Chemical Infor-
widely used standard for producing self-docu- mation, Royal Society o f Chemistry, London,
menting text. Documents that subscribe to the 1994.
XML standard can be freely exchanged over 9. Y. C. Martin and P. Willett, Eds., Designing
networks and between applications, using Bioactive Molecules: Three-Dimensional Tech-
standard parsing programs to interpret the niques and Applications, American Chemical
document. CML is an extension of XML that Society, Washingbon, DC, 1998; P. Willett,
can be used to transport structures, reactions, Three-Dimensional Chemical Structure Han-
and chemical and biological data. A query lan- dling, vol. 1,Wiley, New York, 1991.
guage, XMLQuery, is being developed to allow 10. W . Warr and C. Suhr, Chemical Informaiion
Management, Wiley, New York, 1992; W.
searching of XML documents in a manner sim-
Warr, Ed., Chemical Structures. The Interna-
ilar to the use of SQL to search relational da- tional Language of Chemistry, Springer-Ver-
tabases. lag, New York, 1988;W . Warr, Ed., Chemical
8 ACKNOWLEDGMENTS

Grateful acknowledgement is made to Guenter Grethe, Stephen Heller, Lingran Chen, and Tim Hoctor of MDL Information Systems, Inc. for useful discussions during the preparation of this chapter.
CHAPTER TEN
DONALD J. ABRAHAM
Virginia Commonwealth University
Richmond, Virginia
MARTIN K. SAFO
Virginia Commonwealth University
Richmond, Virginia
Contents
1 Introduction
2 Structure-Based Drug Design
2.1 Theory and Methods
2.2 Hemoglobin, One of the First Drug-Design Targets
2.2.1 History
2.2.2 Sickle-Cell Anemia
2.2.3 Allosteric Effectors
2.2.4 Crosslinking Agents
2.3 Antifolate Targets
2.3.1 Dihydrofolate Reductase
2.3.2 Thymidylate Synthase
2.3.2.1 Structure-Guided Optimization: AG85 and AG337
2.3.2.2 De Novo Lead Generation: AG331
2.3.3 Glycinamide Ribonucleotide Formyltransferase
2.4 Proteases
2.4.1 Angiotensin-Converting Enzyme and the Discovery of Captopril
2.4.2 HIV Protease
2.4.3 Thrombin
2.4.4 Caspase-1
2.4.5 Matrix Metalloproteases
2.5 Oxidoreductases
2.5.1 Inosine Monophosphate Dehydrogenase
2.5.2 Aldose Reductase
2.6 Hydrolases
2.6.1 Acetylcholinesterase
2.6.2 Neuraminidase

Burger's Medicinal Chemistry and Drug Discovery
Sixth Edition, Volume 1: Drug Discovery
Edited by Donald J. Abraham
ISBN 0-471-27090-3 © 2003 John Wiley & Sons, Inc.
Structure-Based Drug Design
Figure 10.1. Schematic of the structure-based drug discovery/design process. The figure maps out
the iterative steps that make use of X-ray crystallography, molecular modeling, organic synthesis,
and biological testing to identify and optimize ligand-protein interactions.
(1a) vanillin
(1b) BW12C

of the α-chain (23). A derivative of vanillin has been patented and is a candidate for clinical trials.

Two marketed medicines, ethacrynic acid (2), a diuretic agent, and clofibric acid (3), an antilipidemic agent, were reported to have strong antigelling activity (24, 25), and through X-ray analyses of cocrystals, the binding sites of these agents to Hb were elucidated (26). Unfortunately, it was found that high
allosteric binding site different from that of 2,3-DPG (compound 4). Perutz and Poyart tested another antilipidemic agent, bezafibrate (compound 5), and found that it was an even more potent right-shifting agent than clofibrate (36). Perutz et al. (26) and Abraham (35) determined the binding site of bezafibrate and found it to link a high occupancy clofibrate site with a low occupancy site. Lalezari and Lalezari synthesized urea derivatives of bezafibrate (37), and with Perutz et al. determined the binding site of the most potent derivatives (38). Although these compounds were extremely potent, they were hampered by serum albumin binding (39, 40).

(5) bezafibrate

Abraham and coworkers synthesized a series of bezafibrate analogs (39-42). One of these agents, efaproxiral (RSR 13, compound 6a) is currently in Phase III clinical trials for radiation treatment of metastatic brain tu-

solely related to their binding constant, providing a structural basis for E. J. Ariens' theory of intrinsic activity (42).

By use of X-ray crystallographic analyses, the key elements linking allosteric potency with structure were uncovered. In addition, the computational program HINT, which quantitates atom-atom interactions, was used to determine the strongest contacts between various bezafibrate analogs and Hb residues. These analyses revealed that the amide linkage between the two aromatic rings of the compounds must be orientated so that the carbonyl oxygen forms a hydrogen bond with the side-chain amine of αLys99 (41, 43). Three other important interactions were found. The first are the water-mediated hydrogen bonds between the effector molecule and the protein, the most important occurring between the effector's terminal carboxylate and the side-chain guanidinium moiety of residue αArg141. Second, a hydrophobic interaction involves a methyl or halogen substituent on the effector's terminal aromatic ring and a hydrophobic groove created by Hb residues αPhe36, αLys99, αLeu100, αHis103, and βAsn108. Third, a hydrogen bond is formed between the side-chain amide nitrogen of Asn108 and the electron cloud of the effector's terminal aromatic ring (40, 41, 43). Abraham first observed this last interaction while elucidating the Hb binding site of bezafibrate (35). Burley and Petsko had previously pointed out this type of hydrogen bond in a number of proteins, indicating that this contact is involved in a number of other receptor interactions (44, 45). Perutz and Levitt estimated this bond to be about 3 kcal/mol (46). Figure 10.3 shows the overlap of four allosteric effectors (6a, 6b, 7a and 7b) that bind at the same site in deoxy Hb but differ in their allosteric potency.
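To put an interaction energy of about 3 kcal/mol in perspective, the standard thermodynamic relation ΔG = -RT ln K converts a free-energy difference into a ratio of binding constants. The sketch below is illustrative only: the temperature of 298.15 K is an assumption (the text does not state one), and the function names are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def fold_change_from_ddg(ddg_kcal, temperature=298.15):
    """Ratio of binding constants gained from `ddg_kcal` of extra binding energy.

    Follows from delta-G = -RT ln K: a favorable contribution of ddg_kcal
    multiplies the binding constant by exp(ddg / RT).
    """
    return math.exp(ddg_kcal / (R_KCAL * temperature))


def ddg_from_fold_change(fold, temperature=298.15):
    """Free-energy difference (kcal/mol) equivalent to a `fold` change in affinity."""
    return R_KCAL * temperature * math.log(fold)


hydrogen_bond_factor = fold_change_from_ddg(3.0)  # the ~3 kcal/mol estimate
tenfold_in_kcal = ddg_from_fold_change(10.0)
```

Under these assumptions, 3 kcal/mol corresponds to a ratio of binding constants on the order of 10^2 (roughly 160-fold), and a 10-fold change in affinity corresponds to about 1.4 kcal/mol.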
Figure 10.3. Stereoview of allosteric binding site in deoxy hemoglobin. A similar compound environment is observed at the symmetry-related site, not shown here. (a) Overlap of four right-shifting allosteric effectors of hemoglobin: (6a) (RSR13, yellow), (6b) (RSR56, black), (7a) (MM30, red), and (7b) (MM25, cyan). The four effectors bind at the same site in deoxy hemoglobin. The stronger acting RSR compounds differ from the much weaker MM compounds by reversal of the amide bond located between the two phenyl rings. As a result, in both RSR13 and RSR56, the carbonyl oxygen faces and makes a key hydrogen bonding interaction with the amine of αLys99. In contrast, the carbonyl oxygen of the MM compounds is oriented away from the αLys99 amine. The αLys99 interaction with the RSR compounds appears to be critical in the allosteric differences. (b) Detailed interactions between RSR13 (6a) and hemoglobin, showing key hydrogen bonding interactions that help constrain the T-state and explain the allosteric nature of this compound and those of other related compounds. See color insert.
Figure 10.5. Stereoview of the binding site for (9) (n = 3, TB36, yellow) in deoxy Hb. A similar compound environment is observed at the symmetry-related site, not shown here. One aldehyde is covalently attached to the N-terminal α1Val1, whereas the second aldehyde is bound to the opposite subunit, α2Lys99 ammonium ion. The carboxylate on the first aromatic ring forms a bidentate hydrogen bond and salt bridge with the guanidinium ion of α2Arg141 of the opposite subunit. The effector thus ties two subunits together and adds additional constraints to the T-state, resulting in a shift in the Hb allosteric equilibrium to the right. The magnitude of constraint placed on the T-state by the crosslinked αLys99 varies with the flexibility of the linker. Shorter bridging chains form tighter crosslinks and yield larger shifts in the allosteric equilibrium. See color insert.
binding curve, are generally consistent with the behavior of the allosteric effectors and crosslinking agents.

2.3 Antifolate Targets

2.3.1 Dihydrofolate Reductase. The reduced form of folate (tetrahydrofolate) acts as a one-carbon donor in a wide variety of biosynthetic transformations. This includes essential steps in the synthesis of purine nucleotides and of thymidylate, essential precursors to DNA and RNA. For this reason, folate-dependent enzymes have been useful targets for the development of anticancer and anti-inflammatory drugs (e.g., methotrexate) and anti-infectives (trimethoprim, pyrimethamine). During the reaction catalyzed by thymidylate synthase (TS), tetrahydrofolate also acts as a reductant and is converted stoichiometrically to dihydrofolate. The regeneration of tetrahydrofolate, required for the continuous functioning of this cofactor, is catalyzed by dihydrofolate reductase (DHFR).

Scheme 10.1. [Scheme labels: tetrahydrofolate, C1-tetrahydrofolate, dihydrofolate, purines, thymidylate; enzymes DHFR and TS.]

The first crystal structure of a drug bound to its molecular target was provided by the pioneering X-ray diffraction study of the complex between DHFR and methotrexate (57), albeit in this case the target was a bacterial surrogate for the actual target (the human enzyme). Once X-ray structures of DHFR from eukaryotic sources were also solved, comparisons of the bacterial and eukaryotic DHFR structures revealed the structural basis for the selectivity of the antibacterial drug trimethoprim for the bacterial enzyme. This understanding allowed Goodford and colleagues
trapping of the compound as highly charged forms after addition of several additional glutamates by a cellular enzyme.

TS inhibitors were designed by Agouron scientists with the aim of providing a drug that could enter cells passively and thus avoid the need for transport or polyglutamylation. The first were designed by structure-guided modification of known antifolates, and others were designed de novo. Starting with (12), the glutamate moiety was deleted from the structure. [Compound (12), the 2-desamino-2-methyl analog of (10), had been found to be much more water soluble than (10). This eventually led (65) to AstraZeneca's Tomudex, which is now approved for treatment of colorectal cancer in European markets.] Removal of the glutamate reduced the potency by 2 to 3 orders of magnitude (Table 10.1, 12 versus 13). The crystal structure solved by use of (10) indicated potential interactions that were exploited by substituents such as the m-CF3 in compound (14). The phenyl moiety in (15) was added to interact with Phe176 and Ile79 (Fig. 10.6). Combining substituents does not necessarily produce the expected sum of binding free energy (compare 16 with 14 and 15). Structures of the complexes with several of these compounds revealed that ideal placement of one group does not always accommodate the best interaction for another. (This is a general problem for rigid scaffolds.) Compounds (15-17) had significant activity in in vitro cell-based assays, which could be reversed by exogenous thymidine. Compound (17) (AG85) was tested in human clinical trials for treatment of psoriasis (9).

Figure 10.6. Binding site for (10) (N10-propynyl-5,8-dideazafolate), within the active site of thymidylate synthase from Escherichia coli. The surface of the inhibitor is shown in the left view. The red spheres in the left view are tightly bound water molecules. See color insert.

Table 10.1 SAR for 2-Methyl-4-oxo-quinazoline Inhibitors of TS

The structure shown in Fig. 10.6 also suggested another approach to alter the structure of (12) to generate a lipophilic inhibitor of TS. The hydrophobic cavity filled by the aromatic ring of the para-aminobenzoyl group could be filled instead by a substituent attached to position 5 of the quinazoline nucleus. Four different 5-substituted 2-methyl-4-oxoquinazolines were made to test this idea, and one of these (18) was a 1 inhibitor of human TS (66).

The X-ray structure of the bacterial enzyme with (18) confirmed the hypothetical binding mode. Two dozen 5-substituted quinazolines were made to explore the SAR for this scaffold. However, the eventual clinical candidate (19) was only two steps away from (18). The methyl group at position 6 was incorporated for favorable interaction with Trp80. This also favorably restricted the torsional flexibility for the 5-substituent, and increased the inhibitory potency against human TS by 10-fold. The 2-methyl was replaced by an amino group, to create a hydrogen bond to a backbone carbonyl in the protein, and increased potency another sixfold. Compound (19) (AG337, also known as nolatrexed, and as the hydrochloride, Thymitaq) advanced into human testing and had progressed into later-stage clinical trials as an antitumor agent by 1996 (67).

2.3.2.2 De Novo Lead Generation: AG331.
The de novo design effort was initiated
through the use of a computational method,
Goodford's GRID algorithm (68,69), to locate
a site favorable to the binding of an aromatic
system within the TS active site (70). Using
computer graphics, naphthalene was visual-
ized and manipulated within this favorable
site (Fig. 10.7). This facilitated alterations of
the naphthalene scaffold to a benz[cd]indole
to provide hydrogen-bonding groups to inter-
act with the enzyme and a tightly bound wa-
ter. Elaboration from the opposite edge of the
naphthalene core to extend into the top of the
Figure 10.7. Conceptual design of compound (20), by use of the active site of E. coli TS as a template. W represents a tightly bound water molecule. [Adapted from Babine and Bender (9).]
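The idea behind a grid-based search for a favorable binding site, as used in the de novo design described above, can be sketched in a few lines. This is not Goodford's GRID program or its force field: it is a toy model that moves a single probe over a regular lattice and scores each point with a Lennard-Jones 12-6 term plus a Coulomb term. All parameters (epsilon, r_min, the dielectric of 4, the 0.5 Å distance floor) and all function names are assumptions chosen for illustration.

```python
import itertools
import math


def probe_energy(point, atoms, probe_charge=0.0, epsilon=0.1, r_min=3.5):
    """Toy interaction energy (kcal/mol) of a probe at `point`.

    `atoms` is a list of (x, y, z, partial_charge) tuples. The parameters
    are illustrative and do not come from any published force field.
    """
    energy = 0.0
    for x, y, z, charge in atoms:
        r = max(math.dist(point, (x, y, z)), 0.5)  # floor avoids the singularity
        ratio = r_min / r
        energy += epsilon * (ratio ** 12 - 2.0 * ratio ** 6)  # Lennard-Jones 12-6
        energy += 332.0 * probe_charge * charge / (4.0 * r)   # Coulomb, dielectric 4
    return energy


def scan_grid(atoms, lower, upper, spacing=1.0, **probe):
    """Evaluate the probe energy at every lattice point in a box."""
    axes = []
    for lo, hi in zip(lower, upper):
        steps = int(round((hi - lo) / spacing)) + 1
        axes.append([lo + i * spacing for i in range(steps)])
    return {p: probe_energy(p, atoms, **probe) for p in itertools.product(*axes)}


def best_site(grid):
    """Lattice point with the most favorable (lowest) energy."""
    return min(grid, key=grid.get)


# Single neutral atom at the origin: the best probe positions lie on a shell
# at the Lennard-Jones minimum distance.
demo_grid = scan_grid([(0.0, 0.0, 0.0, 0.0)], (-4, -4, -4), (4, 4, 4), spacing=0.5)
demo_best = best_site(demo_grid)
```

In a real application the atom list would come from the protein structure, different probes would represent different chemical groups, and the low-energy regions would be inspected graphically, as the text describes for the naphthalene scaffold.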
analogs of folate were synthesized and then tested as inhibitors of tumor cell growth or of the activity of various folate-dependent enzymes (73-75). A recent paper reported the formation in situ of a potent bisubstrate analog inhibitor of GARFT, from glycinamide ribonucleotide and a folate analog, apparently catalyzed by the enzyme itself (76). The substrate analog was designed based on consideration of enzyme structure and the GARFT mechanism. This emphasizes the potential to exploit the interplay between binding and catalytic events in the design of new inhibitors.

The development of GARFT inhibitors at Agouron began with consideration of the structure of the complex between the E. coli enzyme and 5-deazatetrahydrofolate (77). An active and soluble fragment of a multifunctional human protein that contained the GARFT activity was provided by recombinant approaches (78), and its structure was also solved (79) in complex with novel inhibitors. Comparison of the two structures subsequently validated the use of the bacterial enzyme as a model for the human GARFT. The design of novel inhibitors also relied on previous studies of the structure-activity relationships (SAR) for substituents around the core
of (23), including some GARFT inhibitors in which the ring containing N5 was opened (80).

Inspection of the structure of the bacterial GARFT-inhibitor complex revealed several important features. The pyrimidine portion of the pteridine was fully buried within the GARFT active site, forming many hydrogen bonds with conserved enzymic groups. The D-glutamate moiety was largely solvent exposed, with no immediately obvious potential for building additional interactions. Retention of the D-glutamate unmodified was also desirable for pharmacodynamic reasons. A significant opportunity was presented by the fact that the active site might accommodate a bulkier hydrophobic atom than the methylene group in 5-deazatetrahydrofolate that replaces the naturally occurring N5 in tetrahydrofolate. To test this idea, a series of 5-thiapyrimidinones were synthesized, including compound (24). These analogs were more readily prepared than the corresponding cyclic derivatives. This compound had a potency of 30-40 nM in both a cell-based antiproliferation assay and a biochemical assay for human GARFT inhibition. A crystal structure of human GARFT, complexed with (24) and glycinamide ribonucleotide, confirmed the structural homology between E. coli and human enzymes.

Compounds with one fewer methylene in the linker connecting the thiophenyl moiety to the 5-thia position were much less active. Several other analogs, such as (26), were made in attempts to fill the active site more fully, and to restrict the conformational flexibility of the linker. Molecular mechanics calculations failed to correctly predict the conformation of the 5-thiamethylene group of (25) bound to GARFT because of unforeseen conformational flexibility of the enzyme revealed by an X-ray structure of this complex. This again emphasizes the importance of iterative experimental confirmation of molecular designs. Several functional criteria in addition to GARFT inhibition and cell-based assays were evaluated during the several cycles of optimization. These included the ability of exogenous purine to rescue cells (which indicates selective GARFT inhibition), and the ability of the inhibitors to function as substrates for enzymes involved in the transport and cellular accumulation of antifolate drugs. Balancing these criteria has resulted in the choice of compounds (26) and (27) (AG2034 and AG2037, respectively) for clinical development at Pfizer. (In 1999, Agouron Pharmaceuticals was acquired by Warner-Lambert, which was subsequently acquired by Pfizer.) It is as yet unclear whether the considerable toxicity of these and other GARFT inhibitors will allow these compounds to be acceptable as anticancer drugs.
(26) X = H
(27) X = methyl
A key tool in the discovery of captopril at Squibb was the use of a model for the active site of angiotensin-converting enzyme (Fig.

absorbed intestinally, and thus are not good drug candidates. However, the best peptide inhibitor was 500-fold more potent than (28a). The
information provided by the peptides, the structural model for the active site of angiotensin-converting enzyme, and biochemical and tissue-based pharmacological assays for the enzyme's function were used to guide an iterative design process to improve the potency, selectivity, and stability of small molecule inhibitors. The R1 and R2 substituents were optimized, and the zinc ligand was changed to a thiol, which significantly increased potency (Table 10.2, compare 28a with 28c). This process yielded the orally available and stable small molecule captopril (28d) within 18 months of the creation of the model.

The following quotation [from the original research report (81) on the design of captopril] predicted the great promise of SBDD: "The studies described above exemplify the great heuristic value of an active-site model in the design of inhibitors, even when such a model is a hypothetical one."

2.4.2 HIV Protease. The aspartyl endoprotease encoded by human immunodeficiency virus (HIV-P) catalyzes essential events in the maturation of infective virus particles, the cleavage of polyprotein precursors to yield active products. After this was demonstrated in the mid to late 1980s, HIV-P became a target for the development of antiviral drugs to treat acquired immunodeficiency syndrome (AIDS). Several HIV-P inhibitors have been approved for human therapeutic use in the past 10 years, and the speed with which they were developed is attributed in part to the successful use of SBDD methods. There are excellent recent reviews of this area (88, 89). There are numerous reviews of the early work on HIV-P inhibitors (8, 9, 90, 91).

HIV-P is a symmetrical homodimer of identical 99 residue monomers, structurally and mechanistically similar to the pseudosymmetric pepsin family of proteases (92-94), whose members include renin. Because the protease is a minor component of the virion particle, intensive structural studies required overproduction through recombinant DNA methods. One of the first structures was determined with material synthesized nonbiologically (through peptide synthesis). As of June 2002, there were over 100 X-ray structures represented by coordinate sets in the Protein Data Bank, and many hundreds more have been determined in proprietary industrial studies.

The active site of the enzyme is C2 symmetric in the absence of substrates or inhibitors (Fig. 10.9a), and contains two essential aspartic acid residues (Asp25 and Asp25'). The entrance to the active site is partly occluded by "flaps" constructed of two beta strands (residues 43-49 and 52-58) from each monomer, connected by a turn. In the absence of substrate or inhibitor, the flaps seem to be rather flexible. Upon binding of inhibitors and presumably of substrates, the residues within the flaps undergo movements up to several angstroms to interact with the bound ligand (Fig. 10.10). A single tightly bound water is observed in the structures of most HIV-P-inhibitor complexes, accepting hydrogen bonds from the backbone amides of both flap residues Ile50 and Ile50' and donating to carbonyls of the bound inhibitors. This is referred to as the "flap" water. Despite the presence of this water and several tightly bound water molecules on the floor of the active site, the cavity also contains extensive hydrophobic surface area. The minor differences between the HIV proteases from two major strains of HIV (HIV-1 and HIV-2) are not addressed here. More significant are the HIV-P sequence variants with much reduced sensitivity to existing drugs that have evolved because of selective pressure and the rapid mutation rate of the virus. The reader interested in the differences between the proteases from HIV-1 and HIV-2, or in the issues surrounding drug-resistant variants, is referred to Ref. 91 and Ref. 89, respectively.

The early work on inhibition of HIV-P was much influenced by previous structural and mechanistic work on pepsin and its inhibitors. Both enzymes are thought to catalyze peptide hydrolysis through a tetrahedral transition state, shown below as (29). The previous work
on transition state mimics as pepsin inhibitors and the sequence of some cleavage sites for HIV-P led to the discovery at Roche of the R and S versions of (30) as submicromolar inhibitors of HIV-P, with the R enantiomer being ?-fold more potent (95). These inhibitors employ a hydroxyethylamine moiety to replace the P1-P1' linkage that is normally cleaved (the scissile bond) with a stable group. The lead molecules were optimized without

(30) [structure labels: Cbz-Asn-NH, OH, CO2-t-butyl]
shows the asymmetrical binding mode of the molecule in the HIV-P active site. Because the metabolic and pharmacokinetic characteristics of this compound and several other early HIV-P inhibitor drugs are less than ideal, the search for better ones has continued. Many of the deficits arise from the large size and peptidic nature of the inhibitors. Another early inhibitor was the modified octapeptide (32, U-85548) developed at Upjohn (96).

This subnanomolar inhibitor was used to define the extensive hydrophobic and hydrogen bonding interactions available in the HIV-P active site (97). A common feature in the binding of (31) and (32) to HIV-P is the interaction of the central hydroxyl group of the inhibitors with the carboxylates of both Asp25 and Asp25'. This hydroxyl group replaces a water molecule that likely binds between these aspartyl side chains during peptide hydrolysis by HIV-P. The inhibitors can therefore be seen as mimics of a "collected substrate." The liberation of this water to bulk solvent probably contributes about 5 kcal mol-1 to the free energy of inhibitor binding, based on the studies by Rich and his colleagues on similar inhibitors of pepsin (98, 99). An interesting difference between (31) and (32) is that (31) has R stereochemistry at the hydroxymethyl center, whereas in (32) this is an S center. Part of the reason for this is that when (31) binds to HIV-P, the decahydroquinoline ring system induces a conformational change in the protein, affecting primarily site S1'. The optimal stereochemistry at the hydroxymethyl center appears to be whichever one will allow the interaction of the hydroxyl with both catalytic aspartates while accommodating the placement of inhibitor moieties in the S1, S2, S1', and S2' sites with minimal conformational strain on the inhibitor (9).

Both (31) (Fig. 10.9b) and (32) (Fig. 10.11) bind to the HIV-P active site asymmetrically. However, after the X-ray studies of crystalline HIV-P apoenzyme revealed it to be a symmetrical dimer, C2 symmetric inhibitors were designed to take advantage of this structural feature (Fig. 10.12). Both alcohol diamines and diol diamines were examined. For example, the C2 symmetric compound (33) (A-77003) was synthesized at Abbott and entered clinical trials as an antiviral agent for intravenous treatment of AIDS (100).

The X-ray structures of complexes between HIV-P and diol diamine derivatives like (33) showed (101) that, although one of the hydroxyl groups bound between the catalytic aspartyl carboxylates and made contacts with both, the second hydroxyl made only one such
[Figure 10.12 panel labels: diol diamine; hydroxyethylene diamine]
Figure 10.12. Design principle for C2 symmetric inhibitors of HIV-P and the related hydroxyethylene diamine scaffold.
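The estimate of about 5 kcal/mol for releasing the bound catalytic water can be put in terms of binding constants with the standard relation ΔG = -RT ln K. The following is a minimal sketch, not a calculation from the cited studies; the function names and the temperature of 298.15 K are assumptions made for illustration.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)

def fold_change_from_ddg(ddg_kcal_per_mol, temperature_k=298.15):
    """Ratio of binding constants that corresponds to a free-energy difference."""
    return math.exp(ddg_kcal_per_mol / (R_KCAL * temperature_k))

def ddg_from_fold_change(fold, temperature_k=298.15):
    """Free-energy difference (kcal/mol) that corresponds to a ratio of binding constants."""
    return R_KCAL * temperature_k * math.log(fold)

if __name__ == "__main__":
    # A 5 kcal/mol contribution, assumed to act at room temperature.
    print(f"5 kcal/mol ~ {fold_change_from_ddg(5.0):.0f}-fold in binding constant")
    # The reverse direction, for a tenfold change in potency.
    print(f"10-fold ~ {ddg_from_fold_change(10.0):.2f} kcal/mol")
```

Under these assumptions a 5 kcal/mol term corresponds to a change of several thousand-fold in the binding constant, which shows why displacing a single well-placed water can dominate inhibitor affinity.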
gram was ritonavir (35, A-84538, ABT-538, or Norvir), which has been successfully launched.
Another C2 symmetric HIV-P inhibitor, discovered at Dupont Merck, is compound (36) (DMP-450). This was one of a series of cyclic ureas designed to interact with both the aspartyl carboxylates and the Ile50 and Ile50' backbone amides that hydrogen bond with the flap
[Structures: (35) ritonavir; (37) indinavir]
oxygen replacing the flap water. Compound (36) was licensed to Triangle Pharmaceuticals, and the mesylate advanced into Phase I clinical trials. Its future is uncertain after the trials were put on hold because of animal toxicity (http://www.tripharm.com/dmp45O.html).
One of the problems common to many of the HIV-P inhibitors already discussed is their low solubility, which translates to low bioavailability. The discovery of (37) (indinavir, L-735,524) was the result of the successful application of SBDD at Merck to directly address this problem. During an iterative optimization process, the physicochemical properties of HIV-P inhibitors were modified within constraints that were established structurally (104). Crixivan (the sulfate of 37) was successfully launched for use as an antiviral drug.
The process leading to indinavir (Fig. 10.13) began with (38), a hydroxyethylene-containing heptapeptide mimic, originally designed as a renin inhibitor (105). The inhibition of HIV-P by (38) was discovered by screening. Classical medicinal chemistry methods allowed a reduction in size, and the discovery of an amino-2-hydroxyindan moiety to replace the terminal dipeptide (corresponding to P,', thought to bind into the S,' site). This approach (105, 106) resulted in the generation of (39) (L-685,434). Although (39) had a subnanomolar IC50 for inhibition of HIV-P, it also had very low aqueous solubility, like most peptidomimetics. One way to improve solubility is to insert a charged functional group into the molecule. The tertiary amino group in the HIV-P inhibitor saquinavir (31)
[Figure 10.13 structure drawings not recoverable; legend: boc = tert-butyloxycarbonyl, cbz = benzyloxycarbonyl]
Figure 10.13. Structures of HIV-P protease inhibitors during the optimization process leading to the discovery of (37) (indinavir).
was already identified. Piracy of the decahydroisoquinoline tert-butylamide from (31) provided the idea for the hybrid molecule (40). In addition to the charged group, use of this ring system would partly "preorder" the inhibitor's structure, lessening the entropic cost of binding. Molecular modeling was used with known structures of HIV-P-inhibitor complexes to evaluate this idea, and it was judged to be reasonable enough to justify the synthesis of (40) (104). This compound was subsequently shown to have much better pharmacokinetic behavior than its antecedents, consistent with improved solubility and dissolution.
A convergent synthetic route was devised to generate (40) to improve the accessibility of important analogs. Although (40) was an 8 nM inhibitor of the isolated enzyme, better potency was needed for acceptable cell-based activity, and still better solubility characteristics were needed. A method for structure-based computational estimation of the interaction energy for HIV protease inhibitors with the enzyme was developed and used to help estimate inhibitor potency before synthesis (107). Variation of the group contributing the tertiary amine led to the discovery of the piperazine derivative (41) (L-732,747), which had subnanomolar potency against HIV-P. The X-ray structure of the HIV-P complex with (41) confirmed the binding mode predicted by molecular modeling, with the molecule filling the S,, S,, S,', and S,' pockets, and the S, pocket occupied by the terminal benzyloxycarbonyl moiety. Replacement of the benzyloxycarbonyl with more polar heterocycles, chosen to be accommodated by the S, pocket and to further improve aqueous solubility, yielded (37).
Several other approved AIDS drugs that act by inhibition of HIV-P have also been developed through use of SBDD methods. Compound (42) (amprenavir, Agenerase, also known as VX-478) is the most recent addition to the HIV-P inhibitors approved for human antiviral treatment, and differs significantly from earlier inhibitors. Compound (42) was specifically designed by Vertex scientists to minimize molecular weight to increase oral bioavailability (108). Compound (43) (nelfinavir, AG-1343, also known as LY312857), like the precursors to the earlier drug (37) (indinavir), copied the decahydroisoquinoline tert-butylamide group from the first marketed HIV-P inhibitor (31) (saquinavir). Compound (43) was developed in a collaboration between scientists at Lilly and Agouron (109), and is mar-
[Structure: (42) amprenavir]
[Structures: (44) argatroban; (45) melagatran; (46) NAPAP]
the sulfonamide nitrogen. Such substituents appeared likely to extend into solvent and therefore to be tolerated without compromising affinity. This was confirmed (i.e., compound 49), and this decreased the undesirable affinity for serum-binding proteins. X-ray studies with some of the inhibitors at this point indicated that a longer linker between the central benzimidazole and the benzamidine moiety in the S, pocket might provide some advantage. This was confirmed with several analogs, with the methylamino linker providing a 10-fold increase in potency (compound 50). By this point, the structural basis for interaction of this compound series with thrombin was understood sufficiently to suggest that the amidosulfonyl group could be replaced by a carboxamide. This was confirmed by use of several compounds, such as (51). Compound (51) (BIBR 953) was quite active as an anticoagulant in animals dosed intravenously, but required conversion to a prodrug (compound 47) to mask its charge and allow oral dosing.
2.4.4 Caspase-1. Caspase-1 (interleukin-1β converting enzyme, or ICE) is a member of a family of cysteine proteases that catalyze the cleavage of key signaling proteins in such processes as inflammatory response and apoptosis. Genetic methods have provided evidence supporting a role for caspase-1 in diseases such as stroke (118) and inflammatory bowel disease (119). The X-ray structure of crystalline human caspase-1 was solved in 1994 by several groups (120, 121), and has been a valuable tool in intensive efforts to design potent and bioavailable inhibitors of the enzyme. Compound (52) (pralnacasan, VX-740) was developed as a caspase-1 inhibitory therapeutic agent through use of SBDD in a collaboration between Vertex and Aventis. Although the details of the discovery process have not been published, (52) probably functions as a prodrug. The cleavage of the lactone of (52) would yield a hemiacetal that could hydrolyze to release ethanol and the aldehyde form of the drug, which then can form a covalent thioacetal with the active site thiol of caspase-1, leading to pseudoirreversible inhibition. Clinical trials of compound (52) as an anti-inflammatory agent for treatment of rheumatoid arthritis began in 1999 (122). In April 2002, the
[Structure: (47) BIBR 1048]
Figure 10.15. Optimization of structure leading to the discovery of (51) (BIBR 953).
companies announced that these trials would continue and would be expanded to include treatment of osteoarthritis.
2.4.5 Matrix Metalloproteases. Matrix metalloproteases (MMPs) are a large and diverse family of zinc endoproteases. Several members of this family (such as the collagenases and the stromelysins) are thought to have important roles in proliferative diseases, including arthritis, retinopathy, and metastatic invasiveness of tumor cells. There are publicly available X-ray structures of enzyme-inhibitor complexes for at least seven different MMPs, as of this writing. Several detailed reviews of the SAR and binding modes for inhibitors of matrix metalloproteases are available (9, 123). All MMP inhibitors contain a moiety that binds to the active site zinc, such as the hydroxamates of (53) (prinomastat, AG3340) and (54) (CGS-27023) and the carboxylic acid of (55) (tanomastat, BAY 12-9566). These
in human testing. The efforts with these two targets are described briefly below.
2.5.1 Inosine Monophosphate Dehydrogenase. Proliferative cells such as lymphocytes have high demands for the rapid supply of nucleotides to support DNA and RNA synthesis, as do viruses during their proliferative phase. The first dedicated step in the de novo biosynthesis of guanine nucleotides is conversion of inosinate to XMP, catalyzed by inosine monophosphate dehydrogenase (IMPDH).
IMP + NAD+ → XMP + NADH
A prodrug form of (56) (mycophenolic acid), a noncompetitive inhibitor of IMPDH, is approved for human therapeutic use as an im-
sants. Other utilities that have been suggested for IMPDH inhibitors are antiviral and anticancer therapies.
The structure of hamster IMPDH in complex with IMP and (56) was solved at Vertex in the mid-1990s (129). This allowed the visualization of a covalent intermediate, in which a cysteine thiol from the enzyme adds to C2 of the purine ring of the nucleotide substrate. An analogous covalent adduct is postulated to be a key catalytic intermediate during normal turnover (130). The structure was a key tool in the discovery of (57) (VX-497, merimepodib), a novel potent inhibitor of human IMPDH suitable for oral administration (131).
An experimental screen of a diverse library of commercially available compounds for inhibitors of IMPDH identified molecules with the phenyl, phenyloxazole urea scaffold (58) as weak inhibitors. Through use of the compu-
[Structure: (57) merimepodib]
tors bind within the hydrophobic cleft and interact with the anionic site. The binding of potent inhibitors induces a conformational change, opening an adjacent hydrophobic pocket. The conformation induced by (60) differs from that caused by other, less selective inhibitors. This "specificity" pocket was thought to offer an opportunity for selective inhibition of aldose reductase while sparing aldehyde reductase. Hence, this structural study provided an initial pharmacophore for both potency and selectivity.
The SAR for this pharmacophore was developed with a series of synthetically accessible salicylic acid derivatives that were scored for potency and selectivity with the purified enzymes, and efficacy in a diabetic rat model (137). One of the most potent and selective of the derivatives was (62), containing the benzthiazole heterocycle. The SAR was employed, guided by the structures of selected inhibitor complexes, to design a novel indole scaffold to present the pharmacophoric elements (M. Van Zandt, personal communication). The optimization of this series provided the clinical candidate (61) (138).
2.6 Hydrolases
Some other hydrolytic enzymes, in addition to proteases, that are important drug targets include protein phosphatases, phosphodiesterases, nucleoside hydrolases, acetylhydrolases, glycosylases, and phospholipases. Structure-based inhibitor design is currently being applied to a number of these enzymes. The last three mentioned have been successfully targeted in SBDD projects that have produced compounds that are either launched or in clinical trials.
2.6.1 Acetylcholinesterase. A marked decrease in the level of the neurotransmitter acetylcholine is one of the most pronounced changes in brain chemistry observed in the sufferers of Alzheimer's disease (139). Several drugs that are approved for the treatment of the dementia thought to result from this neurotransmitter deficit act by inhibiting acetylcholinesterase. These include (63) (tacrine, or Cognex, a Pfizer drug that was the first such agent approved for this indication), (64) (donepezil), and (65) (rivastigmine). Several other agents are in clinical trials. Disappointing efficacy is observed with the existing drugs, arising from dose limitations that are likely attributable to the inhibition of acetylcholinesterase
[Structures: (63) tacrine; (64) donepezil; (65) rivastigmine]
in peripheral tissues (140). This may be a consequence of the high serum levels required to get these highly cationic molecules to penetrate the blood-brain barrier.
In a discovery project that is reminiscent of the discovery of captopril, scientists at Takeda created a hypothetical structure for the active site of acetylcholinesterase, based on SAR from previous biochemical and medicinal chemical work (141). The model consisted of (in addition to the serine protease-like catalytic machinery) an anionic binding site separating two discrete hydrophobic binding sites. This model was then used to design inhibitors of the enzyme (reviewed in ref. 142). One set of analogs examined was based on the N-(ω-phthalimidylalkyl)-N-(ω-phenylalkyl)-amine (scaffold 66). An iterative process of testing,
ing site. The length of both alkyl linkers was varied, and the effect of adding a third alkyl substituent was examined. The phthalimide portion of the structure was chosen to improve the synthetic accessibility of the analogs needed for this exercise. The compounds were tested not only for inhibitory potency toward rat cerebral acetylcholinesterase, but also for peripheral response and toxicity in dosed intact rats. After the work was under way, Sussman and coworkers solved the atomic structure of acetylcholinesterase from the electric eel, including complexes with several inhibitors, by X-ray crystallography (143). The availability of this structure made it possible to retrospectively analyze the basis for the SAR in this series of compounds, by use of DOCK (144).
bond with another active-site acid, the side chain of Asp49. The latter finding again emphasizes the importance of experimental structures to guide improvements of inhibitor potency, given that placing two presumed anions so close together would likely never have been predicted by a computational model. Other slight conformational changes were observed to accommodate the 5-methoxy group of (74).
The inhibitor's 3-acetate moiety was converted to an acetamide in a successful attempt to restore the active site calcium, form a hydrogen bond to His48, and increase potency. The crystal structure of the complex with the amide version of (74) also revealed a significant reorientation of the indole core and 5-methoxy substituent, resulting in an unanticipated 5-Å movement of the terminal methyl. Further changes in inhibitor structure were guided by iterative structural studies and functional assays of potency and selectivity. These changes involved the use of substituents at positions 3 or 4 to optimize coordination of the metal ion, extension of the van der Waals interaction by lengthening the 2-methyl to an ethyl, and conversion of the 3-acetamide to glyoxamide (159, 161). This resulted in the synthesis of (77) (compound LY315920), which has 6500-fold greater [...]ity for hnps-PLA2 than did the original hit molecule (74). LY315920 effectively inhibits hnps-PLA2 in the serum of transgenic mice dosed with the compound orally or i.v., and is undergoing clinical trials in the United States and Japan (162, 163).
2.7 Picornavirus Uncoating
Picornaviruses, which include the rhinoviruses and enteroviruses, are RNA viruses that cause several infectious human diseases. These diseases include common colds as well as life-threatening infections of the respiratory and central nervous systems. Effective treatments of these diseases would relieve much human suffering, save many lives, and have great economic benefit. There are over 100 serotypes of rhinoviruses alone, making it impossible to generate a vaccine effective against infections by all variants of the virus (164).
The Achilles heel of picornaviruses has been suggested to be that part of the virus structure that interacts with the cell surface receptor because those structural features must be well conserved (165). The virus particle consists of a positive-strand RNA coated by an icosahedral shell, containing 60 copies of four distinct β-barrel proteins (166). These structural proteins contain the binding site for the cellular receptor and undergo significant conformational changes to liberate the viral RNA genome during infection of the cell. A series of isoxazoles that inhibit this picornavirus "uncoating" process were discovered in the early 1980s by scientists at Sterling Winthrop, by use of an in vitro cellular assay for antiviral activity (167-170). One of these, compound (78) (WIN-51711, disoxaril), gave a 50% suppression of viral plaque formation in this assay at 0.3 $. Compound (78) was also effective in animal models (171) and entered phase I clinical trials, but failed to advance because of its toxicity. Compound (78) was shown (172) to bind to viral capsid protein
[Structure: (78) WIN-51711, disoxaril]
within a hydrophobic pocket in the floor [...] "canyon" that contains the binding site [...] cell surface receptor (Fig. 10.17A). Structural changes induced in the canyon [...] on binding of such molecules may also [...] receptor binding directly (173). X-ray crystallographic studies of (78) and analogs [...] to the target protein VP1 were an essential part of the iterative optimization process that led to safer and more effective antiviral agents (174-176). The goal of the process [...] generate a compound that is potent, [...]dly and metabolically stable, and effective against as many serotypes of the virus as possible. There was therefore a need to balance potency and selectivity, and the structural information helped to guide compound design in pursuit of this balance.
A second-generation compound, (79) (WIN-54954), also advanced into clinical tests, but had disappointing efficacy in Phase II trials, probably because of extensive metabolism. Modification of the phenylisoxazole, guided by both structural and metabolic considerations (177), allowed the creation of a stable and potent antiviral, the third-generation compound (80) (WIN-63843, pleconaril, or Picovir) (178). This compound was evaluated in Phase III clinical trials and showed efficacy in humans. Oral dosing of virally infected patients with
(80) three times daily decreased the average time needed to become free of cold symptoms from 10 days to between 8 and 9 days, and also reduced the duration of severe cold symptoms from 4.5 to 3.5 days (179). During the clinical studies to support the new drug application for (80), about a quarter of the clinical isolates (of rhinovirus present initially or during the treatment) were resistant to the compound. The majority of these resistant viruses had a single mutation at VP1 residue Ile98, which directly interacts with (80) bound to VP1 in wild-type virus. The clinical data also showed the elevation in some patients of hepatic cytochrome P450 levels during treatment with (80), raising concerns about potentially hazardous drug-drug interactions. ViroPharma sought and failed in early 2002 to gain the approval of the U.S. Food and Drug Administration for its new drug application for (80) for treatment of the common cold.
2.8 Phosphoryl Transferases
Protein kinases and phosphatases play vital roles in intracellular signaling pathways and in the integration and control of major cellular processes. Kinases and other phosphoryl group transferases are essential in the metabolism of lipids, nucleotides, and other small biomolecules. The use of SBDD methods on such targets has expanded as more of their X-ray structures have been solved, and will continue to grow as more targets are validated for their involvement in human diseases.
2.8.1 Mitogen-Activated Protein Kinase p38α. Mitogen-activated protein kinase (MAPK) p38α is a member of a family of Ser/Thr-specific protein kinases that are activated upon exposure of cells to mitogens such as bacterial lipopolysaccharide or environmental stresses such as exposure to UV irradiation or chemical oxidants. MAPK p38α has a central role in integrating the inputs from a complex signaling network. Activation of MAPK p38α requires the dual phosphorylation of conserved threonine and tyrosine residues on a loop near the enzyme's active site (180). The unactivated (nonphosphorylated) enzyme has a very low affinity for ATP, but can bind to pyridinyl-imidazole inhibitors (181, 182). The activated enzyme in turn phosphorylates numerous substrates, including several transcription factors. This leads to activation of the transcription of many genes and causes the release of proinflammatory cytokines, primarily interleukin-1β (IL-1β) and tumor necrosis factor (TNFα). MAPK p38α was identified as a central player in this inflammatory pathway in a key study by scientists at SmithKline Beecham (183). The study involved the molecular cloning of the genes encoding proteins that bind to anti-inflammatory pyridinyl-imidazole compounds already known to block the biosynthesis of IL-1β and TNFα. The binding proteins turned out to be members of a known kinase family. Since this finding, the enzymes in the MAPK pathway, and especially MAPK p38α, have been attacked by many scientists seeking to discover anti-inflammatory drugs (184).
Compound (81) (SB 203580), a specific inhibitor of MAPK p38α, is a prototype for the pyridinyl-imidazole compounds (185). This compound is active in animal models of several inflammatory diseases (186), but was not itself pursued as a clinical candidate because of its inhibition of other enzymes, including hepatic cytochrome P450 reductases. The pyridinyl-imidazole compounds have dissociation constants for MAPK p38α in the nanomolar range, competing with ATP for binding to the enzyme. Because these compounds bind tightly to the unactivated enzyme, which has a
Figure 10.18. Binding of SB203580 (shown as a ball and stick structure) in the active site of MAPK p38α. In addition to the side chains of the labeled residues, the protein backbone between Leu104 and Met109 is shown, as well as several aliphatic side chains and a water molecule (red sphere). Hydrogen bonds (dotted lines) are shown between the backbone amide of Met109 and the inhibitor's pyridinyl nitrogen, and between the ε-amino of Lys53 and the inhibitor's imidazole N3. This figure is based on the PDB coordinate set 1A9U (187). See color insert.
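Hydrogen bonds such as those drawn in Figure 10.18 can be checked against a coordinate set by measuring donor-acceptor distances. The following is a minimal sketch rather than the procedure used to prepare the figure: it reads fixed-column PDB ATOM/HETATM records and applies a 3.5 Å heavy-atom cutoff, which is a common heuristic assumed here. The two records are generated in code with placeholder coordinates, and the ligand residue name LIG, its residue number, and the atom name N1 are hypothetical; real values would have to be read from the 1A9U file.

```python
import math

HBOND_CUTOFF = 3.5  # angstroms; heuristic heavy-atom cutoff (an assumption)

def make_record(kind, serial, name, resname, chain, resseq, x, y, z):
    """Format one fixed-column PDB coordinate record (illustrative data only)."""
    return (f"{kind:<6}{serial:>5} {name:<4} {resname:>3} {chain}{resseq:>4}"
            f"    {x:8.3f}{y:8.3f}{z:8.3f}")

def parse_atoms(lines):
    """Yield (atom name, residue name, residue number, xyz) for coordinate records."""
    for line in lines:
        if line.startswith(("ATOM", "HETATM")):
            xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            yield line[12:16].strip(), line[17:20].strip(), int(line[22:26]), xyz

# Placeholder coordinates, NOT taken from 1A9U.
records = [
    make_record("ATOM", 1, "N", "MET", "A", 109, 10.0, 10.0, 10.0),
    make_record("HETATM", 2, "N1", "LIG", "A", 800, 10.0, 10.0, 12.9),
]

atoms = {(res, num, name): xyz for name, res, num, xyz in parse_atoms(records)}
d = math.dist(atoms[("MET", 109, "N")], atoms[("LIG", 800, "N1")])
print(f"Met109 N to ligand N1: {d:.2f} A, within cutoff: {d <= HBOND_CUTOFF}")
```

A distance test alone ignores donor-acceptor geometry, so a contact that passes it is only a candidate hydrogen bond.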
efficacy in vivo makes it evident that there are multiple ways to effectively inhibit this enzyme.
2.8.2 Purine Nucleoside Phosphorylase. Purine nucleoside phosphorylase (PNP) catalyzes the reversible phosphorolysis of purine nucleosides to the purine base and ribosyl or 2-deoxyribosyl-α-1-phosphate.
The vital role of PNP in the proliferation of T-cells is evident from the fact that people with an inherited deficit in this activity have 30- to 100-fold lower numbers of T-lymphocytes than normal (197). The accumulation of dGTP and the resulting inhibition of ribonucleotide reductase in PNP-deficient T-cells causes the suppression of T-cell proliferation. B-lymphocytes are unaffected. Hence, small molecule inhibitors of PNP could be used to treat T-cell lymphomas and other T-cell-mediated diseases such as psoriasis. Adjunct therapy with PNP inhibitors could also block the catabolism of therapeutically useful nucleoside analogs.
Human PNP is a homotrimer of 32-kDa subunits. The X-ray structures of the apoenzyme and some substrate analog complexes were described in 1990. Each of the three identical active sites, located near the subunit interfaces, is composed primarily of residues from one subunit, with Phe159 participating in the active site of the adjacent subunit (198). Scientists at BioCryst, CIBA-Geigy, Southern Research Institute, and the University of Alabama collaborated to design inhibitors of human PNP (199, 200). The project used an iterative process, in which new compound design was guided by synthetic considerations, computer graphics analysis of X-ray structural models, computational (Monte Carlo and energy minimization) methods, and the inhibitory potency of the compounds against PNP in vitro. Evaluation of the most potent inhibitors by use of cell-based assays, followed by pharmacokinetic and pharmacological characterization of several inhibitors in animal models, led to the choice of (86) for advancement into clinical trials. Compound (86) (BCX-34, peldesine) is being evaluated for treatment of psoriasis and skin cancer (201, 202).
[Structure: (86) peldesine]
[Figure residue: PNP scheme; recoverable labels: Asn243, Lys244, NH3+, Guanine]
While this work was under way, a Phase I and superior solubility and pharmacokinetic
clinical trial was undertaken of PNP inhibitor properties, and so was advanced into human
(90) (PD-119229), which was developed by testing.
Combine and Integrate Technologies. Dedi- development because compounds with very
cated molecular biology and protein chemistry low solubility have limited or variable bio-
personnel and equipment are essential for availability.
identifying the right constructs for crystalliza- Binding Sites Can Be Filled Many Ways.
tion and to the assurance of a steady supply of More than one small molecule scaffold can
protein. Synthetic chemists trained in graphi- provide the necessary and sufficient hydro-
cal analysis of protein structures tend to be phobic and polar complementarity to generate
excellent designers, and will be unlikely to de- potent inhibition. Sometimes, there are many
sign molecules that they cannot make. Early scaffolds that will work. However, the struc-
tactical integration of the synthetic ap- tures of complexes with all the different scaf-
proaches is even more important if combina- folds will likely have common features that
are distinct from the structure of the apo en-
torial chemistry is part of the program. The
zyme, attributed to large-scale conformational
structural information can be used to design
changes that occur upon binding any ligand.
combinatorial libraries as effectively as it can
The most useful X-rav " models to use for the
to design molecules one at a time. The use of design of new compounds will be those that
libraries can compensate for the inaccuracies already have some substrate or inhibitor
inherent in current computational scoring al- bound. There are several ways to design these
gorithms. More significantly, the integration compounds: modification of existing inhibi-
of orthogonal technologies will stimulate cre- tors, de novo creation of novel inhibitors, or
ative thought and yield much more than the some combination of these methods.
sum of the different technologies applied sep- Not All Inhibitors Are Drugs. Having the X-
arately. ray structure of the targetprotein, or even
Go Big Early and Often. Filling active site having used the solved structure to design a
space as much as possible will maximize the potent inhibitor, is only the beginning of solv-
chance that a compound will be a potent inhib- ing the difficult problems of drug design. The
itor. During compound design, it should also use of structure to create ~ o t e ninhibitors
t can
be recognized that proteins are flexible, and certainly shorten the time to get compounds
that accessible conformations are hard to pre- into human testing, but use of SBDD methods
dict. Sometimes, larger functionality can be does not guarantee that a potent compound
accommodated than the existing structural will become a drug. This is an old lesson, actu-
model permits. A few compounds should be ally, but is forgotten at great cost.
included to probe this. These may give rise to Structure of Free Inhibitor Is Important. De-
an unexpected boon, such as access to a signif- solvation of the free ligand and of the protein's
icantly altered new protein conformation with active-site groups upon complex formation are
novel sites that can be exploited in new rounds both significant. Both enthalpic and entropic
of design and synthesis. contributions to the binding energy must be
Aqueous Solubility is Critical to Success. considered. Particular attention should be
Both early in SBDD and later on in clinical paid to the advantage that can be gained from
u
development, sufficient aqueous solubility is preorganization" of the inhibitor before
critical. Solubility is important early because binding, that is, low energy conformers bind
the concentrations of compounds must be with greater apparent avidity.
high during crystallization experiments to sat- Bound Water Is Special, But Not All Hydro-
urate the high levels of protein. The ratio of gen Bonds Are Created Equal. Each of the
the solubility to the inhibition constant of a tightly bound waters present in an X-ray
compound is also critical to the success of the structural model has a uniaue environment
crystallization experiment. Once some struc- and a unique function. In some cases, libera-
tural information becomes available, both pa- tion of a bound water molecule by displacing it
rameters can be manipulated, but usually, sol- with an inhibitor's functionality can greatly
uble inhibitors must be available before the increase inhibitor affinity, although this is nit
availability of structural information. Solubil- globally applicable. The entropic advantage of
ity matters during animal testing and later in releasing a bound water into bulk solvent does
not always exceed the enthalpic cost of the displacement. In many situations, the preferred solution will be to retain a water molecule and use it to maximize inhibitor binding. For example, a water molecule that donates two hydrogen bonds and accepts one cannot be isosterically replaced. Electrostatic interactions that are more complex than hydrogen bonds and simple ion pairs are very difficult to model, anticipate, and exploit in inhibitor design.
Retain Potency While Addressing Other Issues. Structural information can be very useful in designing compounds that are not part of a competitor's intellectual property, or that cannot be patented because of information in the public domain. Redesign of a compound that is not itself proprietary, by use of structural information obtained with that compound, can yield valuable new proprietary molecules. Structural information can also guide the modification of physicochemical, metabolic, or pharmacological properties or target selectivity without compromising the potency against the primary therapeutic target.
All Models Are Wrong; Some Are Useful. At present, it is impossible to calculate an accurate value for a binding constant on an absolute scale. However, accurately estimating the relative binding of a series of closely related compounds is possible, and is much more likely to be successful if X-ray structures of target complexes with some of the compounds are available. Thus, although there is much room for improvement, local computational models can sometimes be quite useful. Even in the absence of an experimentally determined X-ray structure of the target, a hypothetical model can be a powerful tool for the design of useful compounds (e.g., captopril and TAK-147).
Iterative SBDD Cycles Are Optimal. Small alterations in ligand structure often cause major changes in binding mode, protein conformation, or both. These changes can go undetected if the structural effects are analyzed by X-ray crystallography too infrequently rather than iteratively. This can yield confusing or misleading structure-activity relationships, leading to a waste of precious time. Moreover, changes in compound structure seldom affect only one variable, so multiple orthogonal methods should be used to assess the effects of changes. It is also important during the rational design process to include room for serendipity. Do not reject an idea for a new compound that seems to make intuitive sense based on a single crystal structure or computational calculation.
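The design guidelines in the preceding chapter note that relative binding within a series of closely related compounds is easier to estimate than an absolute binding constant, and that enthalpic and entropic contributions must both be considered. The sketch below illustrates the standard thermodynamic relations behind those statements. It assumes a 1 M standard state and 298.15 K, and the function names and example constants are hypothetical, not taken from the text.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def binding_free_energy(kd_molar, temp_k=298.15):
    """Standard binding free energy (kcal/mol) from a dissociation constant.

    Uses dG = RT ln(Kd), with Kd expressed in mol/L (1 M standard state).
    """
    return R_KCAL * temp_k * math.log(kd_molar)


def relative_binding(kd_reference, kd_analog, temp_k=298.15):
    """ddG = dG(analog) - dG(reference); a negative value means tighter binding."""
    return R_KCAL * temp_k * math.log(kd_analog / kd_reference)


def entropic_term(delta_g, delta_h):
    """Return -T*dS (kcal/mol) from dG = dH - T*dS."""
    return delta_g - delta_h


if __name__ == "__main__":
    # Hypothetical pair: a 1 uM reference compound and a 100 nM analog.
    print(round(binding_free_energy(1e-6), 2))
    print(round(relative_binding(1e-6, 1e-7), 2))
```

A tenfold gain in affinity corresponds to only about 1.4 kcal/mol at room temperature, which is one way to see why small errors in a computed absolute energy can swamp the differences between analogs.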
CHAPTER ELEVEN
X-Ray Crystallography in Drug Discovery
DOUGLAS A. LIVINGSTON
SEAN G. BUCHANAN
KEVIN L. D'AMICO
MICHAEL V. MILBURN
THOMAS S. PEAT
J. MICHAEL SAUDER
Structural GenomiX
San Diego, California
Contents
1 Introduction
2 Methodology
2.1 Theory
2.2 Crystallization
2.3 Data Collection
2.4 Phase Problem
2.5 Computing and Refinement
2.6 Databases
3 Applications of the Use of Crystallographic Studies in Drug Discovery and Development
4 Structural Genomics
4.1 Introduction to Structural Genomics
4.2 Genome Annotation
4.3 Pathways
4.4 Protein Structure Modeling
5 Conclusion
Figure 11.1. (a) A look at a two-dimensional crystal lattice diffraction pattern for a small molecule natural product, MW 222. Each diffraction intensity in the lattice is numbered to give a unique three-dimensional address (identification) for that measurement. These numerical addresses are referred to as Miller indices or hkl values. (b) A diffraction pattern from a precession photograph for hemoglobin, MW 55,000. Note that the diffraction lattice spacings are much smaller for the large molecule, which reflects the nature of Bragg's law, where the lattice is observed in reciprocal space ($1/d = 2\sin\theta / n\lambda$). (c) An image plate diffraction pattern for a protein. [Adapted with permission from D. J. Abraham, Computer-Aided Drug Design, Methods, and Applications, Marcel Dekker, Inc., New York, 1989.]
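The reciprocal relationship in Bragg's law quoted in the caption can be made concrete with a short calculation. The sketch below is illustrative only: the function names are hypothetical, and the wavelengths used in the example are the copper and molybdenum values (1.54 and 0.71 Å) quoted in Section 2.3.

```python
import math


def d_spacing(two_theta_deg, wavelength_angstrom, order=1):
    """Lattice spacing d (in angstroms) from Bragg's law, n*lambda = 2*d*sin(theta).

    two_theta_deg is the full scattering angle (2*theta) in degrees.
    """
    theta = math.radians(two_theta_deg / 2.0)
    return order * wavelength_angstrom / (2.0 * math.sin(theta))


def scattering_angle(d_angstrom, wavelength_angstrom, order=1):
    """Scattering angle 2*theta (degrees) at which spacing d diffracts."""
    s = order * wavelength_angstrom / (2.0 * d_angstrom)
    if s > 1.0:
        raise ValueError("spacing too small to diffract at this wavelength")
    return 2.0 * math.degrees(math.asin(s))


if __name__ == "__main__":
    # The same 2.0 angstrom spacing diffracts at a smaller angle with the
    # shorter molybdenum wavelength than with copper.
    for name, wavelength in (("Cu", 1.54), ("Mo", 0.71)):
        print(name, round(scattering_angle(2.0, wavelength), 2))
```

Because sin θ is proportional to 1/d, a large unit cell (large d) gives closely spaced diffraction spots, which is the difference between panels (a) and (b) of the figure.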
ference in intensities observed in Fig. 11.1. The steps that one goes through to solve a crystal structure follow, with the intent of providing the non-crystallographer with a simplified and pictorial view of the process.
2.2 Crystallization
Crystallization is the critical first and most important step, because good single crystals usually provide quality diffraction. Linus Pauling once entitled one of his lectures "The Importance of Being Crystalline" (3). Unfortunately, crystallization is still more empirical than scientific. It requires closely monitored matrix changes in growing conditions, i.e., pH, salt concentration, temperature, solvents, and crystallization setups. Most laboratories now use well-known sparse matrix screens pioneered by Jancarik and Kim (4) and further refined and commercially distributed by Hampton Research (5, 6). Screens will typically employ vapor diffusion experiments (hanging drops or sitting drops), and occasionally batch and liquid-liquid diffusion methods.
More recently, batch crystallizations have been rejuvenated by the development of microbatch robots and by the groups of Chayen (7), DeTitta (8), and D'Arcy (9).
Although discovering the crystallization conditions for a new protein or nucleic acid can be tedious, relatively inexperienced individuals can usually succeed at growing crystals once the initial conditions are established. Some of the most successful crystallization methodologies are based on vapor diffusion methods (Fig. 11.2). The general idea behind vapor diffusion crystallization is to dissolve the protein in a buffer, with a non-precipitating amount of the miscible vapor solvent, in a reservoir that is in equilibrium with a higher concentration of the vaporizing solvent nearby. Another variation is to set up the crystallization cocktail containing salts, buffers, poly(ethylene glycols) (PEGs), small molecule solvents, etc., where volume is slowly reduced by the equilibrating mixture, which is placed nearby. McPherson, Carter, and others have developed more quantitative methods for optimizing crystal growth (10).
2.3 Data Collection
Most laboratories have rotating anode sources for production of high intensity X-ray beams. These are coupled with an area detector that has made single crystal diffractometers obsolete. Mirrors and other technology have also been used to provide a more intense and monochromatic radiation source (11). Radiation from rotating anode sources is at a fixed wavelength, usually from high-voltage electrons impinging on either a copper or molybdenum rotating anode, i.e., radiation at 1.54 Å (copper) or 0.71 Å (molybdenum). Radiation from synchrotron sources can often be tuned to a wavelength of interest for multiwavelength anomalous diffraction (MAD) experiments (see below).
X-rays generated by a synchrotron source are typically two orders of magnitude stronger than conventional CuKα radiation generated by a rotating anode. Synchrotron sources have greatly extended the ability to solve new protein structures when only weakly diffracting or small crystals are available. Another advantage in using the stronger synchrotron radiation is that the crystal exposure time is significantly lower. The typical exposure time for home laboratory CuKα sources ranges from 5 to 60 min for a range of data, whereas the equivalent set of data at an undulator beamline, e.g., at the Advanced Photon Source (APS), requires only about 1 s of exposure time. Synchrotron radiation has also allowed the use of MAD, enabling phasing (imaging) of the protein using a derivative with only one heavy element.
A variety of detectors are in common use to record X-ray data and have the advantage of measuring the intensities of large numbers of diffraction spots simultaneously. The most popular detectors are image plates and charge-coupled device (CCD) cameras. Image plates are typically the choice for laboratory rotating anode sources and lower flux synchrotron sources (Fig. 11.3). CCDs have the distinct advantage of speed at the higher flux synchrotron sources, because they simultaneously measure and record diffraction intensities (amplitudes). Current CCD cameras have readout times on the order of a few (typically 2-8) seconds, a speed not dreamed of when the first protein structure data was recorded from photographs (with intensities measured by eye comparison to standard reference spots on a separate film strip). Speed of data collection can be an important advantage at third generation synchrotron sources, with even shorter exposure times. On the other hand, image plates have a greater range of use, being accessible in any X-ray diffraction laboratory, with many of the newer models taking less than 1 min to record the intensity data. Image plate detectors often have more than one image plate, so one can be read while the other is exposed, effectively wasting no time during the collection period. The image plates also offer a larger surface area for data collection than most CCD cameras and are considerably less expensive.
X-ray diffraction data from crystals are either collected at room temperature or under cryogenic conditions at liquid nitrogen temperatures [around 100 K (−170°C)]. For room temperature data collection, crystals are normally mounted in thin-walled glass capillaries, with a small amount of mother liquor about 5 mm from the crystal. The mother liquor in the capillary is critical because protein crystals are 40-80% water; dried protein crystals do not diffract. The nearby mother
[Figure 11.2 panel labels: Drop: 50% protein, 50% cocktail]
Figure 11.2. (a) The drops are typically 1-10 μL total volume, with between 100 and 1000 μL total volume of cocktail in the well. The smaller the drop size, the faster the equilibrium occurs, in general. There are a variety of plates now available in which to set up these vapor diffusion experiments, the most common being 24-well Linbro plates and 96-well microtiter plates. Several robots have been developed to automatically set up the crystallization experiments; although most are no faster than doing the same procedure by hand (particularly with a multi-channel pipettor), there can be other advantages (e.g., consistency and reducing repetitive stress syndrome). Once plates are set up, they are typically kept at a constant temperature and observed periodically under a microscope. (b and c) Progress in automating this aspect of characterization has occurred, and there are now imagers that will take high resolution, digital pictures of each drop in turn and store these for either manual or automated analysis. (d) Batch experiments are set up such that the protein is mixed with cocktail and there is little concentration or dilution to the sample over time. This can now be done in very high throughput and small scale: 50-200 nL drops under oil in 1536-well plates, for example. This kind of approach has been used to screen hundreds of conditions with small amounts of protein, which may allow for faster optimization later. One caveat is that small crystals don't necessarily lead to larger crystals later, and all structures to date have had crystals of greater than 10 microns in at least one dimension.
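The concentrating effect of a vapor diffusion drop, as described for Figure 11.2, can be estimated with simple mass balance. The sketch below is a rough illustration under stated assumptions, none of which come from the text: the precipitant and protein are non-volatile, the well is large enough that its concentration does not change, only water moves through the vapor phase, and equilibrium is reached when the drop's precipitant concentration equals that of the well. All names and example values are hypothetical.

```python
def equilibrated_drop(protein_vol_ul, cocktail_vol_ul,
                      protein_conc_mg_ml, well_precipitant_conc):
    """Estimate start and end states of a vapor diffusion drop.

    The drop is made by mixing protein solution with cocktail taken from the
    well. Precipitant concentration can be in any unit (e.g. % w/v); the
    result is reported in the same unit.
    """
    start_vol = protein_vol_ul + cocktail_vol_ul
    # Mixing dilutes both components.
    start_precipitant = well_precipitant_conc * cocktail_vol_ul / start_vol
    start_protein = protein_conc_mg_ml * protein_vol_ul / start_vol
    # Water leaves the drop until its precipitant matches the well, so the
    # drop shrinks by the ratio of the two precipitant concentrations.
    final_vol = start_vol * start_precipitant / well_precipitant_conc
    final_protein = start_protein * start_vol / final_vol
    return {
        "start_volume_ul": start_vol,
        "start_precipitant": start_precipitant,
        "start_protein_mg_ml": start_protein,
        "final_volume_ul": final_vol,
        "final_protein_mg_ml": final_protein,
    }


if __name__ == "__main__":
    # A 50% protein, 50% cocktail drop: 1 uL of 10 mg/mL protein plus 1 uL of
    # a cocktail containing 20% precipitant.
    print(equilibrated_drop(1.0, 1.0, 10.0, 20.0))
```

Under these assumptions a 1:1 drop ends at half its starting volume, so both protein and precipitant double in concentration relative to the freshly mixed drop. Real drops deviate from this idealization, for example when the cocktail contains volatile components.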
proven in many instances to result in very high quality maps (15).
Many different heavy atoms have been used for MAD/SAD phasing, the most popular being selenium. Selenium is incorporated into the amino acid sequence of the protein by adding selenomethionine to the growth media when the protein is produced (16). For proteins that bind DNA, 5-bromouracil has been a popular choice for phasing through anomalous scattering. Most heavy elements have good anomalous signals (Hg, Pt, U, Au, etc.). Lanthanides have a particularly good signal and can sometimes substitute for divalent metals found naturally in the protein (e.g., Ca) (17). One of the major advantages of MAD phasing is that the signal does not decay at higher resolution with perfectly isomorphic crystals, so the experimentally phased map can be quite good out to the full resolution of diffraction. This typically has not been the case when using multiple isomorphous replacement, where the experimentally phased map often only extends to around 2.5 Å resolution, because of a lack of isomorphism between the native and heavy metal substituted crystals. Anomalous scattering has been useful in the structure determination of very large structures; the 30S ribosome was recently solved using Os and Lu derivatives (18).
2.5 Computing and Refinement
Raw intensity X-ray crystallographic data is next reduced and scaled to provide structure factors (F) that are used to solve and image the structure. Two of the most popular software packages employed to reduce raw data are Mosflm/CCP4 (19) and the HKL suite (20). Both work very well and are very fast with modern computers. A variety of programs, such as SOLVE (21), Shake and Bake (22), or SHELX (23), can be employed to find the heavy atom positions, including hand searching methods through Patterson maps. Once heavy atom sites are found, they are usually refined with the programs SHARP (24) or MLPHARE (19). The heavy atom positions are next used as phase information input to provide initial phases for electron density maps, which are used to fit the remainder of the protein or nucleic acid. Once a model of the structure is obtained it is refined. In cases where high resolution data is available, programs such as WARP (24) can automatically provide models of protein structures. When high resolution data is not available, a model is most often built in by hand using graphics programs such as O (25) or XFIT. The models are refined against the data by programs such as REFMAC (19) and CNS (26). All of these programs have become much faster and easier to use because of the incredible increases in speed that new hardware has allowed.
It is worth mentioning that statistical and probabilistic techniques have had a significant impact on how heavy atoms are found and models are refined (e.g., SHARP, SOLVE, REFMAC). Bayesian statistics and maximum likelihood methods are now used instead of least-squares methods. One may want to consider how various data collection strategies may affect the later steps in the process by keeping this in mind, i.e., high redundancy in the data makes for better statistics.
The quality of a structure is measured in many ways: how low the R factor or R_free is (the fit of observed data to the model), the resolution limit of the data, the ideality of the bonds and angles, etc. How well a structure measures up to other structures of about the same resolution also gives a good idea of how "good" a given structure is (PROCHECK program). SFCHECK is a useful program for assessing the agreement between the atomic model and the experimental X-ray data. The level of confidence one expects from a given model will depend on the resolution of the data. This can be seen clearly in Fig. 11.6, where a residue from a protein structure is shown with three different data cutoffs at different resolution ranges. The model from a 3.0-Å data set may look the same as one from a 1.3-Å data set, but the level of confidence is much higher in the latter. A reasonably well-refined structure will have a crystallographic R factor between 15% and 25% and will have an R_free of less than 30% under most circumstances.
Figure 11.6. Three density maps at differing resolutions: (a) 1.3 Å; (b) 2.1 Å; (c) 3.0 Å. See color insert.
2.6 Databases
The Protein Data Bank (PDB) (27, 28) is now coordinated by a consortium of several institutions (Rutgers University, the San Diego Supercomputer Center, and the National Institute of Standards and Technology). As of this writing, the PDB has over 18,000 structures, with over 15,000 of these done by X-ray crystallography. Most of the rest were done by NMR. For small molecules, the Cambridge Structural Database (CSD) (29) contains structural information for over 230,000 organic and organometallic compounds. All of these structures have been determined by X-ray or neutron diffraction techniques.
3 APPLICATIONS OF THE USE OF CRYSTALLOGRAPHIC STUDIES IN DRUG DISCOVERY AND DEVELOPMENT
Crystallization of small molecule compounds with a protein or nucleic acid target followed by X-ray crystallographic determination of the combined structure is the basis and hallmark of structure-based drug design. As structural biology moves into the post-genomic age, many companies and academic laboratories are faced with the challenge of co-crystallization of targets and inhibitors or activators on a scale never before attempted. Previously, crystal structure determination of a protein-substrate or inhibitor complex in an academic or industrial environment often yielded the structural information desired to understand the mechanism of action or to design a more suitable substrate or inhibitor. However, modern-day laboratories are now faced with the daunting challenge of crystallizing hundreds of compounds for clues in further ligand design using standard organic synthesis or combinatorial approaches.
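Returning to the quality measures of Section 2.5, the R factor is a normalized sum of the differences between observed and calculated structure factor amplitudes. The sketch below shows the conventional definition, R = Σ||Fobs| − |Fcalc|| / Σ|Fobs|. It assumes the two amplitude lists are already on a common scale, and it takes R_free to be the same statistic computed over a flagged test set of reflections held out of refinement. The function names and toy amplitudes are hypothetical.

```python
def r_factor(f_obs, f_calc):
    """Crystallographic R factor for paired amplitude lists.

    R = sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|)
    """
    if len(f_obs) != len(f_calc) or not f_obs:
        raise ValueError("need two non-empty lists of equal length")
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    denominator = sum(abs(o) for o in f_obs)
    return numerator / denominator


def r_work_and_free(f_obs, f_calc, free_flags):
    """Split reflections by free_flags and return (R_work, R_free).

    free_flags[i] is True for reflections held out of refinement.
    """
    work = [(o, c) for o, c, f in zip(f_obs, f_calc, free_flags) if not f]
    free = [(o, c) for o, c, f in zip(f_obs, f_calc, free_flags) if f]
    r_work = r_factor([o for o, _ in work], [c for _, c in work])
    r_free = r_factor([o for o, _ in free], [c for _, c in free])
    return r_work, r_free


if __name__ == "__main__":
    # Toy amplitudes, not real data.
    f_obs = [100.0, 50.0, 25.0, 10.0]
    f_calc = [90.0, 55.0, 20.0, 12.0]
    print(r_factor(f_obs, f_calc))
    print(r_work_and_free(f_obs, f_calc, [False, False, True, False]))
```

On this scale the ranges quoted in the text correspond to an R factor of 0.15 to 0.25 and an R_free below 0.30.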
Until recently, structure determination by protein crystallography was a time-consuming method accessible to a few privileged skilled practitioners. X-ray crystallography was reserved to tackle questions requiring atomic resolution details of a demonstrably important protein, often a drug target. Indeed, to

There are several ways in which structural genomics has promise as a tool for genome annotation and target prioritization. For genes of unknown function, structure can often provide clues to biochemical function. Sequence homology has become a routine method for functional assignment, but even the most
Table 11.1 Known Drug Targets with Published Structures
Target and PDB Reference  Resolution  Source  Homology  Year  Reference
Acetylcholinesterase
1MAH(A)  Green mamba
1B41(A), 1F8U(A)  Green mamba
1C2B(A), 1C2O(A)  Electric eel
1MAA(A)  Mouse
Adenosine deaminase
1FKX, 1FKW  Mouse
1A4L(A), 1A4M(A)  Mouse
1UIO, 1UIP  Mouse
1ADD
2ADA
Alpha-amylase
1JXT(A), 1JXK(A)  Human
1SMD  Human
1C8Q(A)  Human
1CPU(A), 2CPU(A)  Human
1BSI  Human
1HNY  Human
3CPU(A)  Human
1B2Y(A)  Human
1DHK(A)  Kidney bean
1JFH  Pig
1PIF, 1PIG  Pig
1OSE  Pig
1HX0(A)  Pig
1BVN(P)  S. tendae
1PPI
Androgen receptor
1E3G(A)  Human
1I37(A), 1I38(A)  Rat
Anticoagulant protein C
1AUT(C)  Human
Aquaporin 1
1IH5(A)  Human
1FQY(A)  Human
β-Amyloid
1MWP(A)  Human
β-Lactamase[Sal
1BTL  Bacteria
1FQG(A)  Bacteria
1JTD(A)  Bacteria
1HTZ(A)  Bacteria
1ERM(A), 1ERO(A)  Bacteria
1ERQ(A)  Bacteria
1XPB  Bacteria
1ESU(A)  Bacteria
1BT5(A)  Escherichia coli
1TEM  Escherichia coli
1CK3(A)  Escherichia coli
1AXB  Escherichia coli
β-Tubulin
1JFF(B)  Bovine
1TUB(B)  Pig
1FFX(B)  Rat
Calcineurin A
1TCO(A)  Bovine
1AUI(A)  Human
Carbonic anhydrase 2
1HEA, 4CAC, 5CAC  Human, HSV-1
1G6V(A)  Arabian camel
1CNW, 1CNX, 1CNY  Human
1IF4(A), 1IF5(A), 1IF6(A)  Human
1IF9(A)  Human
1CA3, 1HEB, 1HED  Human
1DCA, 1DCB  Human
1CRA  Human
1CIL, 1CIM, 1CIN  Human, HSV-1
1CAY  Human
1RZA, 1RZB, 1RZC, 1RZD, 1RZE  Human
2CA2  Human
1BN1, 1BN3, 1BN4, 1BNM  Human
1CAH  Human
1I8Z(A)  1.93 Å  Human  100%
1BV3(A)  1.85 Å  Human  100%
12CA  2.40 Å  Human  99%
1G53(A)  1.94 Å  Human  100%
1AM6  2.10 Å  Human  100%
1CAN, 1CAO  1.90 Å  Human rhinovirus  100%
1G0E(A), 1G0F(A)  1.60 Å  Human  99%
1AVN  2.00 Å  Human  100%
1UGF  2.00 Å  Human  99%
1HVA  2.30 Å  Human  99%
5CA2  2.10 Å  Human  99%
1HCA  2.30 Å  -  100%
4CA2, 6CA2, 7CA2, 9CA2  2.10 Å-2.80 Å  Human  100%
1ZNC(A)  2.80 Å  Human  100%
Catechol methyltransferase
lVID Rat
Cholecystokinina receptor
1D6G(A) NMR
Coagulation factor 10
lEZQ(A), lFOR(A),lFOS(A) Human
1C5M(D) Human
lxKA(C), lxKB(C) Human
lFAX(A) Human
lFJS(A) Human
lKIG(H) Soft tick
lHCG(A) -
Coagulation factor 2
1AI8(H) Hirudo rnedicinalis
lMKW(K), lMKX(K) Bos taurus
lBTH(H) Bovine
lHXF(H) Hirudo medicinalis
1G30(B) Hirudo medicinalis
1A3E(H) Hirudo medicinalis
1D3P(B), 1D3Q(B) Hirudo rnedicinalis
lHDT(H) Hirudo medicinalis
1AD8(H) Hirudo medicinalis
lLHC(H), lLHF(H), lLHG(H) Hirudomedicinalis
1JOU(B) Human
lDIT(H) Human
lWS(H) Human
4THN(H) Human
lTHP(B) Human
1AY6(H) Human
1C1U(H), 1C1V(H) Human
1A4W(H) Human
1G37(A) Human
1EOJ(A), 1EOL(A) Human
1BBO(B) Human
1C4U(2), 1D6W(A), 1D9I(A), 1DOJ(A) Human
7KME(H) Human
1QBV(H) Human
1DM4(B) Human
1UMA(H) Medicinal leech
1BMM(H), 1BMN(H) Medicinal leech
1A2C(H) M. aeruginosa
1FPC(H) -
1NRO(H), 1NRR(H) -
1HAG(E) -
1HLT(H) -
1TMU(H) -
4HTC(H) -
1AK(H), 1DWB(H), 1DWC(H) Hirudo medicinalis
2HPP(H) -
1ABI(H) -
Coagulation factor 7
1JBU(H) Bacteria
Coagulation factor 7a
1QFWH) Human
1DVA(H) Human
1DAN(H) 2.00 Å Human 100% 1997 (152)
1CVW(H) 2.28 Å Human 100% 1999 (153)
1FAK(H) 2.10 Å Human 100% 1998 (154)
Coagulation factor 9
1RFN(A) 2.80 Å Human 100% 1999 (155)
1PFX(C) 3.00 Å Pig 88% 1995 (156)
Cox-1
1DWA) 3.00 Å Sheep 93% 1999 (157)
1CQE(A), 1PRH(A) 3.10 Å, 3.50 Å Sheep 92% 1994 (158)
1PTH 3.40 Å Sheep 92% 1995 (159)
1EBV(A) 3.20 Å Sheep 93% 2000 (160)
1FE2(A) 3.00 Å Sheep 92% 2000 (161)
1EQG(A), 1EQH(A), 1HT5(A), 1HT\ill\(A) 2.61 Å-2.75 Å Sheep 92% 2000 (162)
1PGE(A), 1PGF(A), 1PGG(A) 3.50 Å, 4.50 Å Sheep 92% 1995 (163)
Cox-2
1CW(A), 1DDX(A) 2.40 Å, 3.00 Å Mouse 87% 1999 (164)
1CX2, 3PGH, 4COX, 5COX, 6COX 3.00 Å Mouse 87% 1996 (165)
Cytochrome P450 reductase
1B1C(A) 1.93 Å Human 100% 1998 (166)
1AMO(A) 2.60 Å Rat 93% 1997 (167)
1J9Z(A), 1JAO(A), 1JA1(A) 2.70 Å, 2.60 Å, 1.80 Å Rat 92% 2001 (168)
Dihydrofolate reductase
1BOZ(A) 2.10 Å Human 99% 1998 (169)
1HFP, 1HFQ, 1HFR 2.10 Å Human 100% 1997 (170)
1OHJ, 1OHK 2.50 Å Human 100% 1997 (171)
1DR1, 1DR5, 1DR6, 1DR7 2.20 Å, 2.40 Å - 75% 1992 (172)
1DR2, 1DR3 2.30 Å - 75% 1992 (173)
1DR4 2.40 Å - 75% 1992 (174)
1DHF(A), 2DHF(A) 2.30 Å - 100% 1989 (175)
1DLR, 1DLS 2.30 Å - 99% 1995 (176)
8DFR 1.70 Å - 75% 1989 (177)
1DRF 2.00 Å - 100% 1990 (178)
Dihydroorotate dehydrogenase
1D3G(A), 1D3H(A) 1.60 Å, 1.80 Å Human 100% 1999 (179)
Dihydropteroate synthetase[Sa]
1AD1(A), 1AD4(A) S. aureus
DNA helicase pcra[Sa]
1QHHW B. thermophilus
DNA topoisomerase 1
1EJ9(A) Human
1A36(A) Human
1A31(A), 1A35(A) Human
Estrogen receptor 1 α
1QKT(A), 1QKU(A) 2.20 Å, 3.20 Å Human
1HCP NMR Human
1A52(A) 2.80 Å Human
1ERR(A), 1ERE(A) 2.60 Å, 3.10 Å Human
1HCQ(A) 2.40 Å Human
3ERT(A), 3ERD(A) 1.90 Å, 2.03 Å Human
FK506-binding protein
1TCO(C) 2.50 Å Bovine
1FKD, 2FKE 1.72 Å Human
1FKJ, 1FKK, 1FKL 1.70 Å, 2.20 Å Cow
1FAP(A) 2.70 Å Human
3FAP(A), 4FAP(A) 1.85 Å, 2.80 Å Human
1NSG(A) 2.20 Å Human
1FKR, 1FKS, 1FKT NMR Human
1EYM(A) 2.00 Å Human
1BL4(A) 1.90 Å Human
1D60(A), 1D7H(A), 1D7I(A), 1D7J(A) 1.85 Å-1.90 Å Human
1QPF(A), 1QPL(A) 2.50 Å, 2.90 Å Human
1F40(A) NMR Human
1B6C(A) 2.60 Å Human
1A7X(A) 2.00 Å Human
1BKF 1.60 Å Human
2FAP(A) 2.20 Å Human
1C9H(A) 2.00 Å Human
1FKG, 1FKH, 1FKI(A) 2.00 Å, 1.95 Å, 2.20 Å
1FKF 1.70 Å
1FKB 1.70 Å
Follicle stimulating hormone
1FL7(B) 3.00 Å Human 99% 2000 (211)
GABA transferase
1GTX(A) 3.00 Å Pig 94% 1999 (212)
Glucocorticoid receptor
1LAT(A) 1.90 Å Rat 85% 1995 (213)
1GLU(A) 2.90 Å - 94% 1992 (214)
Glutamate receptor 1
1EWK(A), 1EWT(A), 1EWV(A) 2.20 Å, 3.70 Å, 4.00 Å Rat 98% 2000 (215)
Glutathione peroxidase
1GP1(A) 90% Jun 1985 (216)
G-CSF 3
1CD9(A), 1PGR(A) 2.80 Å, 3.50 Å
1BGC, 1BGD, 1BGE(A) 1.70 Å, 2.30 Å, 2.20 Å
1GNC NMR
1RHG(A) 2.20 Å
Granulocyte-macrophage CSF
1CSG(A) -
2GMF(A) Human
Growth hormone receptor
1A22(B) Human
1AmB) Human
1HWG(B), 1HWH(B) Human
3HHR(B) -
HIV reverse transcriptase
1DLO HIV-1
1RT3(B) HIV-1
1HPZ, 1HQE, 1HQU HIV-1
1BQM, 1BQN HIV-1
1TVR(B), 1UWB(B) HIV-1
1EET HIV-1
1IKV, 1IKW, 1IKX, 1IKY HIV-1
1HW(B) HIV-1
1C1B(B) HIV-1 pol
2HMI(B) Virus
1HYS Virus
1HMV Virus
1HNI Virus
1HNV Virus
1FKP(B) Virus
1JLA, 1JLB, 1JLC, 1JLE, 1JLF, 1JLG Virus
1550, 1QE1(B) HIV-1
3HVT(B) -
Inosine monophosphate dehydrogenase 2
1JR1(A) Chinese hamster
1B30(A) Human
Insulin-like growth factor 1
3LRI(A) NMR Human
1BQT NMR Human
1IMX(A) 1.82 Å Human
1B9G(A) NMR -
2GF1, 3GF1 NMR -
Insulin-like growth factor 1 receptor
1IGR(A) Human
1GAG(A) Human
1144(A) Human
1IR3(A) Human
1IRK -
Insulin-like growth factor 2
1IGL NMR
Integrin alpha M
1BHQ(1), 1IDN(1) Human
1JLM Human
1IDO Human
Intercellular adhesion molecule 1
1IAM Human
1IC1(A) Human
1D3E(I), 1D3I(I), 1D3L(A) Human rhinovirus
Interferon α 1
1ITF NMR Human
1RH2(A) 2.90 Å Human
Interferon γ
1FG9(A) 2.90 Å Human
1FYH(A) 2.04 Å Human
1EKU(A) 2.90 Å Human
1HIG(A) 3.50 Å
Interleukin 1
2ILA 2.30 Å
Interleukin 1 receptor
1GOY(R) 3.00 Å Human
1IPAO 2.70 Å Human
1ITB(B) 2.50 Å Human
Interleukin 10
1VLK 1.90 Å Epstein-Barr virus
2ILK 1.60 Å Human
1ILK 1.80 Å Human
1J7V(L) 2.90 Å Human
1INR 2.00 Å Human
Interleukin 12
1F42(A), 1F45(A) 2.50 Å, 2.80 Å Human
Interleukin 13
1GA3(A) NMR Human
Interleukin 2
1IRL NMR Human
3INK(C) 2.50 Å -
Interleukin 3
1JLI NMR Escherichia coli
Interleukin 4
1HIJ, 1HIK 3.00 Å, 2.60 Å Human
1IAR(A) 2.30 Å Human
1HZI(A) 2.05 Å Human
1ITM NMR
1BBN, 1BCN NMR
1CYL NMR
2CYK NMR
1ITL NMR
2INT 2.40 Å
1RCB 2.25 Å
1ITI NMR
Interleukin 5
1HUL(A) 2.40 Å Human
Interleukin 6
1IL6, 2IL6 NMR Human
1ALU 1.90 Å Human
Interleukin 8
1IKL, 1IKM NMR Human
1ICW(A) 2.01 Å Human
1ILP(A), 1ILQ(A) NMR Human
1QE6(A) 2.35 Å Human
1ROD(A) NMR Human
3IL8 2.00 Å -
1IL8(A), 2IL8(A) NMR -
Leukotriene A4 hydrolase
1HS6(A) 1.95 Å Human
Lipocortin I
1AIN 2.50 Å Human
1HM6(A) 1.80 Å Pig
Luteinizing hormone β
1QFW(B) 3.50 Å Escherichia coli
1HCN(B) 2.60 Å -
1HRP(B) 3.00 Å -
Macrophage CSF 1
1HMC(A) 2.50 Å
Neuraminidase[inf B virus]
1INF 2.40 Å Influenza B virus
2.20 Å, 1.90 Å Influenza B virus 94%
2.40 Å - 99%
1.70 Å, 1.80 Å - 94%
2.50 Å, 2.40 Å, 2.35 Å Influenza B virus 99%
2.40 Å - 99%
2.20 Å - 94%
Neuropeptide Y
1RON NMR Human
1F8P(A) NMR -
1FVN(A) NMR -
Parathyroid hormone
1HTH NMR Human
1FVY(A) NMR Human
1BWX, 1HPY, 1ZWA, 1ZWC NMR Human
1ET1(A) 0.90 Å Human
1HPH NMR -
1ZWB, 1ZWD, 1ZWE, 1ZWF, 1ZWG NMR -
PDGF β
1PDG(A) -
Phospholipase A2
1BCI NMR Human
1RLW 2.40 Å Human
1CJY(A) 2.50 Å Human
Potassium channel shaker
1A68 Sea hare
1EOD(A), 1EOE(A), 1EOF(A) Sea hare
1T1D(A) Sea hare
1EXB(E) Rat
1DSX(A), 1QDV(A), 1QDW(A) Rat
PPAR γ
4PRG(A) Escherichia coli 97%
1PRG(A), 2PRG(A) Human 97%
1FM6(D), 1FM9(D) Human 99%
3PRG(A) Human 99%
Progesterone receptor
1E3K(A) Human
1A28(A) Human
Prolactin receptor
1BP3(B) Human
1F6F(B) Rat
Retinoic acid receptor
1DSZ(A) Human
1EXA(A), 1EXX(A) Human
2LBD 2.00 Å Human
3LBD, 4LBD 2.40 Å Human
1DKF(B) 2.50 Å Human
1FCX(A), 1FCY(A), 1FCZ(A) 1.47 Å, 1.30 Å, 1.38 Å Human
1HRA NMR
Retinoid X receptor
1FM6(A), 1FM9(A) 2.10 Å Human
1DSZ(B) 1.70 Å Human
1DKF(A), 1LBD 2.50 Å, 2.70 Å Human
2NLL(A) 1.90 Å Human
1RXR NMR Human
1G1U(A), 1G5Y(A) 2.50 Å, 2.00 Å Human
1FBY(A) 2.25 Å Human
1BY4(A) 2.10 Å Human
Serotransferrin p
1JNF(A) Rabbit
Stem cell factor
1EXZ(A) Human
1SCF(A) Human
Thymidine kinase[HSV]
1KWA)
10HI(A), 2KI5(A)
1KI2, 1KI3, 1KI4, 1KI6, 1KI7, 1KI8
1WK, 2WK, 3WK
1E2H(A), 1E2I(A), 1E2J(A)
1E2M(A), 1E2N(A), 1E2P(A)
1E2K(A), 1E2L(A)
Tumor necrosis factor receptor 1
1NCF(A) Human
1EXT(A) Human
1TNR(R) -
Vitamin D receptor
1IE8(A), 1IE9(A) Human
1DB1(A) Human
Xanthine-guanine phosphoribosyltransferase
1A95(A), 1A97(A), 1A98(A) Escherichia coli
1NUL(A) Escherichia coli
1A96(A) Escherichia coli
X-Ray Crystallography in Drug Discovery
Many consortia are selecting targets for X-ray crystallography that would provide the templates for comparative modeling techniques of all other sequences (384, 385). As more structures are determined by NMR and X-ray crystallography, the quality of the models will improve, not only because more similar templates will become available but also because new methods for loop modeling and ab initio structure prediction will undoubtedly emerge (386, 387). Efforts are also underway both in industry and academia to assemble databases of homology models for all sequences that can be reasonably well modeled (388).

5 CONCLUSION

Anyone who is involved or interested in drug discovery will recognize the potential of protein crystallography to greatly enhance the process. Whether this promise has been met to date is the subject of considerable debate. What is certain, however, is that in the very near future the advances in crystallography technology will render this question moot. The histograms on the PDB website (27, 28) that show the increasing rate of structures deposited over the last decade are a startling visual indicator of the revolution that is occurring in the field. Clearly, the impact will be felt in drug discovery very soon and perhaps very dramatically, and it serves the audience of this series to be well informed of these advances in technology and their subtle limitations.

It is tempting to draw an analogy with the development of other analytical technologies (NMR, FAB-MS) and conclude that protein crystallography will soon leave the incubator of "big machine physics" to become an everyday, routine tool used in the medicinal chemistry laboratory. Hopefully, this chapter has shown some of the subtle complexities of sample preparation and handling, data collection, and refinement, etc. that temper this vision and will likely keep this a specialized field for some time.

REFERENCES
1. W. A. Hendrickson, Science, 254, 51-58 (1991).
2. T. C. Terwilliger, Nat. Struct. Biol., 7, 935-939 (2000).
3. L. Pauling, Lecture presented at the International Congress of X-ray Crystallography at Stonybrook, NY, August 1973.
4. J. Jancarik and S. H. Kim, J. Appl. Cryst., 24, 409-411 (1991).
5. Hampton Research, available online at http://www.hamptonresearch.com, accessed on October 9, 2001.
6. G. L. Gilliland, M. Tung, D. M. Blakeslee, and J. Ladner, Biological Macromolecule Crystallization Database (BMCD), available online at http://www.bmal.nist.gov:8080/bmcdmmal.html, accessed on October 9, 2001.
7. N. E. Chayen, Structure, 5, 1269-1274 (1997).
8. I. Jurisica, et al., IBM Systems J., 40, 394-409 (2001).
9. A. D'Arcy, et al., J. Cryst. Growth, 168, 175-180 (1996).
10. C. W. Carter Jr., Methods Enzymol., 276, 74-99 (1997).
11. A. C. Bloomer and U. W. Arndt, Acta Crystallogr. D Biol. Crystallogr., 55(Pt 10), 1672-1680 (1999).
12. (a) G. A. Petsko, J. Mol. Biol., 96, 381-392 (1975); (b) R. L. Sutton, J. Chem. Soc. Faraday Trans., 1, 101-105 (1991); (c) D. W. Rodgers, Structure, 2, 1135-1140 (1994).
13. D. W. Green, V. M. Ingram and M. F. Perutz, Proc. R. Soc. A, 225, 287-307 (1954).
14. M. G. Rossman and D. M. Blow, Acta Cryst., 15, 24-34 (1962).
15. L. M. Rice, T. N. Earnest, A. T. Brunger, Acta Cryst. D., 56, 1413-1420 (2000).
16. W. A. Hendrickson, et al., EMBO J., 9, 1665-1672 (1990).
17. W. I. Weis, et al., Science, 254, 1608-1615 (1991).
18. W. M. Clemons, Jr., et al., J. Mol. Biol., 310, 827-843 (2001).
19. Collaborative Computational Project, Number 4, Acta Cryst. D, 50, 760-763 (1994).
20. Z. Otwinowski and W. Minor, available online at http://www.hkl-xray.com, accessed October 9, 2001.
21. T. Terwilliger, Automated Structure Solution for MIR and MAD, available online at http://www.solve.lanl.gov, accessed October 9, 2001.
22. C. M. Weeks, S. A. Potter, J. Rappleye, R. Miller, available online at http://www.hwi.buffalo.edu/SnB, accessed on October 9, 2001.
23. G. M. Sheldrick, available online at http://shelx.uni-ac.gwdg.de/SHELX, accessed on October 9, 2001.
24. V. S. Lamzin and A. Perrakis, available online at http://www.embl-hamburg.de/ARP, accessed on October 9, 2001.
25. A. Jones and M. Kjeldgaard, available online at http://www.imsb.au.dk/~mok/o, accessed on October 9, 2001.
26. A. T. Brunger, P. D. Adams, G. M. Clore, W. L. Delano, P. Gros, R. W. Grosse-Kunstleve, J.-S. Jiang, J. Kuszewski, M. Nilges, N. S. Pannu, R. J. Read, L. M. Rice, T. Simonson, and G. L. Warren, Crystallography and NMR System, available online at http://cns.csb.yale.edu/v1.0, accessed on October 9, 2001.
27. H. M. Berman, D. S. Goodsell, and P. E. Bourne, Am. Scientist, 90, 350-359 (2002).
28. H. M. Berman, J. Westbrook, Z. Feng, G. Gilliland, T. N. Bhat, H. Weissig, I. N. Shindyalov, P. E. Bourne, The Protein Data Bank, Nucleic Acids Res., 28, 235-242 (2000).
29. Information on how to obtain this database is available at: http://www.ccdc.cam.ac.uk/prods/.
30. J. Drews and S. Ryser, Nature Biotechnol., 15, (1997).
31. Y. Bourne, P. Taylor, and P. Marchot, Cell, 83, 503 (1995).
32. G. Kryger, M. Harel, M. Harel, A. Shafferman, I. Silman, and J. L. Sussman, Acta Crystallogr., Sect. D, 56, 1385 (2000).
33. Y. Bourne, J. Grassi, P. E. Bougis, and P. Marchot, J. Biol. Chem., 274, 3370 (1999).
34. Y. Bourne, P. Taylor, P. E. Bougis, and P. Marchot, J. Biol. Chem., 274, 2963 (1999).
35. V. Sideraki, K. A. Mohamedali, D. K. Wilson, Z. Chang, R. E. Kellems, F. A. Quiocho, and F. B. Rudolph, Biochemistry, 35, 7862 (1996).
36. Z. Wang and F. A. Quiocho, Biochemistry, 37, 8314 (1998).
37. V. Sideraki, D. K. Wilson, L. C. Kurz, F. A. Quiocho, and F. B. Rudolph, Biochemistry, 35, 15019 (1996).
38. D. K. Wilson, F. B. Rudolph, and F. A. Quiocho, Science, 252, 1278 (1991).
39. D. K. Wilson and F. A. Quiocho, Biochemistry, 32, 1689 (1993).
40. N. Ramasubbu, C. Ragunath, and Z. Wang, In press.
41. N. Ramasubbu, P. Venugopalan, Y. Luo, G. D. Brayer, and M. J. Levine, In press.
42. N. Ramasubbu, K. Sekar, and D. Velmurugan, Acta Crystallogr. D Biol. Crystallogr., 52, 435 (1996).
43. G. D. Brayer, G. Sidhu, R. Maurus, E. H. Rydberg, C. Braun, Y. Wang, N. T. Nguyen, C. M. Overall, and S. G. Withers, Biochemistry, 39, 4778 (2000).
44. E. H. Rydberg, G. Sidhu, H. C. Vo, J. Hewitt, H. C. Cote, Y. Wang, S. Numao, R. T. Macgillivray, C. M. Overall, G. D. Brayer, and S. G. Withers, Protein Sci., 8, 635 (1999).
45. G. D. Brayer, Y. Luo, and S. G. Withers, Protein Sci., 4, 1730 (1995).
46. G. D. Brayer, G. Sidhu, R. Maurus, E. H. Rydberg, C. Braun, Y. Wang, N. T. Nguyen, C. M. Overall, and S. G. Withers, Biochemistry, 39, 4778 (2000).
47. M. Qian, R. Haser, G. Buisson, E. Duee, and F. Payan, Biochemistry, 33, 6284 (1994).
48. C. Bompard-Gilles, P. Rousseau, P. Rouge, and F. Payan, Structure (Lond), 4, 1441 (1996).
49. M. Qian, S. Spinelli, H. Driguez, and F. Payan, Protein Sci., 6, 2285 (1997).
50. M. Machius, L. Vertesy, R. Huber, and G. Wiegand, J. Mol. Biol., 260, 409 (1996).
51. C. Gilles, J. P. Astier, G. Marchis-Mouren, C. Cambillau, and F. Payan, Eur. J. Biochem., 238, 561 (1996).
52. M. Qian, V. Nahoum, J. Bonicel, H. Bischoff, B. Henrissat, and F. Payan, Biochemistry, 40, 7700 (2001).
53. G. Wiegand, O. Epp, and R. Huber, J. Mol. Biol., 247, 99 (1995).
54. M. Qian, R. Haser, G. Buisson, E. Duee, and F. Payan, Biochemistry, 33, 6284 (1994).
55. P. M. Matias, P. Donner, R. Coelho, M. Thomaz, C. Peixoto, S. Macedo, N. Otto, S. Joschko, P. Scholz, A. Wegg, S. Basler, M. Schafer, U. Egner, and M. A. Carrondo, J. Biol. Chem., 275, 26164 (2000).
56. J. S. Sack, K. F. Kish, C. Wang, R. M. Attar, S. E. Kiefer, Y. An, G. Y. Wu, J. E. Scheffler, M. E. Salvati, S. R. Krystek Jr., R. Weinmann, and H. M. Einspahr, Proc. Nat. Acad. Sci. USA, 98, 4904 (2001).
57. T. Mather, V. Oganessyan, P. Hof, R. Huber, S. Foundling, C. Esmon, and W. Bode, Embo J., 15, 6822 (1996).
58. G. Ren, V. S. Reddy, A. Cheng, P. Melnyk, and A. K. Mitra, Proc. Nat. Acad. Sci. USA, 98, 1398 (2001).
59. K. Murata, K. Mitsuoka, T. Hirai, T. Walz, P. Agre, J. B. Heymann, A. Engel, and Y. Fujiyoshi, Nature, 407, 599 (2000).
60. J. Rossjohn, R. Cappai, S. C. Feil, A. Henry, W. J. McKinstry, D. Galatis, L. Hesse, G. Multhaup, K. Beyreuther, C. L. Masters, and M. W. Parker, Nat. Struct. Biol., 6, 327 (1999).
61. C. Jelsch, L. Mourey, J. M. Masson, and J. P. Samama, Proteins, 16, 364 (1993).
62. N. C. Strynadka, H. Adachi, S. E. Jensen, K. Johns, A. Sielecki, C. Betzel, K. Sutoh, and M. N. James, Nature, 359, 700 (1992).
63. D. C. Lim, H. U. Park, L. Decastro, S. G. Kang, H. S. Lee, S. Jensen, K. J. Lee, and N. C. J. Strynadka, Nat. Struct. Biol., 8, 848 (2001).
64. M. C. Orencia, J. S. Yoon, J. E. Ness, W. P. Stemmer, and R. C. Stevens, Nat. Struct. Biol., 8, 238 (2001).
65. S. Ness, R. Martin, A. M. Kindler, M. Paetzel, M. Gold, J. B. Jones, and N. C. J. Strynadka, Biochemistry, 39, 5312 (2000).
66. E. Fonze, P. Charlier, Y. To'Th, M. Vermeire, X. Raquet, and A. Dubus, Acta Crystallogr., Sect. D, 51, 682 (1995).
67. E. Fonze, P. Charlier, Y. To'Th, M. Vermeire, X. Raquet, and A. Dubus, Acta Crystallogr., Sect. D, 51, 682 (1995).
68. L. Maveyraud, L. Mourey, L. P. Kotra, J. D. Pedelacq, V. Guillet, S. Mobashery, and J. P. Samama, J. Am. Chem. Soc., 120, 9748 (1998).
69. L. Maveyraud, I. Massova, C. Birck, K. Miyashita, J. P. Samama, and S. Mobashery, J. Am. Chem. Soc., 118, 7435 (1996).
70. P. Swaren, D. Golemi, S. Cabantous, A. Bulychev, L. Maveyraud, S. Mobashery, and J. P. Samama, Biochemistry, 38, 9570 (1999).
71. L. Maveyraud, R. F. Pratt, and J. P. Samama, Biochemistry, 37, 2622 (1998).
72. J. Lowe, H. Li, K. H. Downing, and E. Nogales, J. Mol. Biol., 313, 1045 (2001).
73. E. Nogales, S. G. Wolf, and K. H. Downing, Nature, 391, 199 (1998).
74. B. Gigant, P. A. Curmi, C. Martin-Barbey, E. Charbaut, S. Lachkar, L. Lebeau, S. Siavoshian, A. Sobel, and M. Knossow, Cell, 102, 809 (2000).
75. J. P. Griffith, J. L. Kim, E. E. Kim, M. D. Sintchak, J. A. Thomson, M. J. Fitzgibbon, M. A. Fleming, P. R. Caron, K. Hsiao, and M. A. Navia, Cell, 82, 507 (1995).
76. C. R. Kissinger, H. E. Parge, D. R. Knighton, C. T. Lewis, L. A. Pelletier, A. Tempczyk, V. J. Kalish, K. D. Tucker, R. E. Showalter, E. W. Moomaw, L. N. Gastinel, N. Habuka, X. Chen, F. Maldonado, J. E. Barker, R. Bacquet, and J. E. Villafranca, Nature, 378, 641 (1995).
77. A. Desmyter, K. Decanniere, S. Muyldermans, and L. Wyns, In press.
78. P. A. Boriack, D. W. Christianson, J. Kingery-Wood, and G. M. Whitesides, J. Med. Chem., 38, 2286 (1995).
79. C.-Y. Kim and D. W. Christianson, In press.
80. B. A. Grzybowski, A. V. Ishchenko, C.-Y. Kim, G. Topalov, R. Chapman, D. W. Christianson, G. M. Whitesides, and E. I. Shakhnovich, Proc. Nat. Acad. Sci. USA, 99, 1270 (2002).
81. S. K. Nair and D. W. Christianson, Biochemistry, 32, 4506 (1993).
82. J. A. Ippolito and D. W. Christianson, Biochemistry, 32, 9901 (1993).
83. S. Mangani and A. Liljas, J. Mol. Biol., 232, 9 (1993).
84. G. M. Smith, R. S. Alexander, D. W. Christianson, B. M. McKeever, G. S. Ponticello, J. P. Springer, W. C. Randall, J. J. Baldwin, and C. N. Habecker, Protein Sci., 3, 118 (1994).
85. K. Hakansson, C. Briand, V. Zaitsev, Y. Xue, and A. Liljas, Acta Crystallogr. D Biol. Crystallogr., 50, 101 (1994).
86. K. Hakansson, A. Wehnert, and A. Liljas, Acta Crystallogr. D Biol. Crystallogr., 50, 93 (1994).
87. A. E. Eriksson, P. M. Kylsten, T. A. Jones, and A. Liljas, Proteins, 4, 283 (1988).
88. P. A. Boriack-Sjodin, S. Zeitlin, H. H. Chen, L. Crenshaw, S. Gross, A. Dantanarayana, P. Delgado, J. A. May, T. Dean, and D. W. Christianson, Protein Sci., 7, 2483 (1998).
89. K. Hakansson and A. Wehnert, J. Mol. Biol., 228, 1212 (1992).
90. C.-Y. Kim, D. A. Whittington, J. S. Chang, J. Liao, J. A. May, and D. W. Christianson, J. Med. Chem., 45, 888 (2002).
91. F. Briganti, S. Mangani, A. Scozzafava, G. Vernaglione, and C. T. Supuran, J. Biol. Inorg. Chem., 4, 528 (1999).
92. S. K. Nair, T. L. Calderone, D. W. Christianson, and C. A. Fierke, J. Biol. Chem., 266, 17320 (1991).
93. C.-Y. Kim, J. S. Chang, J. B. Doyon, T. T. Baird Jr., C. A. Fierke, A. Jain, and D. W. Christianson, J. Am. Chem. Soc., 122, 12125 (2000).
94. L. R. Scolnick, A. M. Clements, J. Liao, L. Crenshaw, M. Hellberg, J. May, T. R. Dean, and D. W. Christianson, J. Am. Chem. Soc., 119, 850 (1997).
95. S. Mangani and K. Hakansson, Eur. J. Biochem., 210, 867 (1992).
96. D. Duda, C. Tu, M. Qian, P. Laipis, M. Agbandje-Mckenna, D. N. Silverman, and R. Mckenna, Biochemistry, 40, 1741 (2001).
97. F. Briganti, S. Mangani, P. Orioli, A. Scozzafava, G. Vernaglione, and C. T. Supuran, Biochemistry, 36, 10384 (1997).
98. L. R. Scolnick and D. W. Christianson, Biochemistry, 35, 16429 (1996).
99. ... D. W. Christianson, Biochemistry, 32, 1510 (1993).
100. J. F. Krebs, C. A. Fierke, R. S. Alexander, and D. W. Christianson, Biochemistry, 30, 9153 (1991).
101. S. K. Nair and D. W. Christianson, J. Am. Chem. Soc., 113, 9455 (1991).
102. R. S. Alexander, S. K. Nair, and D. W. Christianson, Biochemistry, 30, 11064 (1991).
103. T. Stams, S. K. Nair, T. Okuyama, A. Waheed, W. S. Sly, and D. W. Christianson, Proc. Nat. Acad. Sci. USA, 93, 13589 (1996).
104. J. Vidgren, L. A. Svensson, and A. Liljas, Nature, 368, 354 (1994).
105. M. Pellegrini and D. F. Mierke, Biochemistry, 38, 14775 (1999).
106. S. Maignan, J. P. Guilloteau, S. Pouzieux, Y. M. Choi-Sledeski, M. R. Becker, S. I. Klein, W. R. Ewing, H. W. Pauls, A. P. Spada, and V. Mikol, J. Med. Chem., 43, 3226 (2000).
107. B. A. Katz, R. Mackman, C. Luong, K. Radika, A. Martelli, P. A. Sprengeler, J. Wang, H. Chan, and L. Wong, Chem. Biol., 7, 299 (2000).
108. K. Kamata, H. Kawamoto, T. Honma, T. Iwama, and S. H. Kim, Proc. Nat. Acad. Sci. USA, 95, 6630 (1998).
109. H. Brandstetter, A. Kuhne, W. Bode, R. Huber, W. Vondersaal, K. Wirthensohn, and R. A. Engh, J. Biol. Chem., 271, 29988 (1996).
110. M. Adler, D. D. Davey, G. B. Phillips, S. H. Kim, J. Jancarik, G. Rumennik, D. L. Light, and M. Whitlow, Biochemistry, 39, 12534 (2000).
111. A. Wei, R. Alexander, J. Duke, H. Ross, S. Rosenfeld, and C.-H. Chang, J. Mol. Biol., 283, 147 (1998).
112. K. Padmanabhan, K. P. Padmanabhan, A. Tulinsky, C. H. Park, W. Bode, R. Huber, D. T. Blankenship, A. D. Cardin, and W. Kisiel, J. Mol. Biol., 232, 947 (1993).
113. M. G. Malkowski, P. D. Martin, J. C. Guzik, and B. F. P. Edwards, Protein Sci., 6, 1438 (1997).
114. A. Vandelocht, W. Bode, R. Huber, B. F. Lebonniec, S. R. Stone, C. T. Esmon, and M. T. Stubbs, Embo J., 16, 2977 (1997).
115. E. Zhang and A. Tulinsky, Biophys. Chem., 63, 185 (1997).
116. H. Nar, M. Bauer, A. Schmid, J. Stassen, W. Wienen, H. W. Priepke, I. K. Kauffmann, U. J. Ries, and N. H. Hauel, Structure, 9, 29 (2001).
117. A. Zdanov, S. Wu, J. DiMaio, Y. Konishi, Y. Li, X. Wu, B. F. Edwards, P. D. Martin, and M. Cygler, Proteins, 17, 252 (1993).
118. N. Y. Chirgadze, D. J. Sall, S. L. Briggs, D. K. Clawson, M. Zhang, G. F. Smith, and R. W. Schevitz, Protein Sci., 9, 29 (2000).
119. L. Tabernero, C. Y. Chang, S. Ohringer, W. F. Lau, E. J. Iwanowicz, W.-C. Han, T. C. Wang, S. M. Seiler, D. G. M. Roberts, and J. S. Sack, J. Mol. Biol., 246, 14 (1995).
120. J. A. Malikayil, J. P. Burkhart, H. A. Schreuder, R. J. Broersma Junior, C. Tardif, L. W. Kutcher III, S. Mehdi, G. L. Schatzman, B. Neises, and N. P. Peet, Biochemistry, 36, 1034 (1997).
121. P. C. Weber, S. L. Lee, F. A. Lewandowski, M. C. Schadt, C. W. Chang, and C. A. Kettner, Biochemistry, 34, 3750 (1995).
122. J. A. Huntington and C. T. Esmon, In press.
123. R. Krishnan, A. Tulinsky, G. P. Vlasuk, D. Pearson, P. Vallar, P. Bergum, T. K. Brunck, and W. C. Ripka, Protein Sci., 5, 422 (1996).
124. R. A. Engh, H. Brandstetter, G. Sucher, A. Eichinger, U. Baumann, W. Bode, R. Huber, T. Poll, R. Rudolph, and W. Vondersaal, Structure (Lond), 4, 1353 (1996).
125. A. Lombardi, G. Desimone, F. Nastri, S. Galdiero, R. Dellamorte, N. Staiano, C. Pedone, M. Bolognesi, and V. Pavone, Protein Sci., 8, 91 (1999).
126. E. Guinto, S. Caccia, T. Rose, K. Futterer, G. Waksman, and E. Dicera, Proc. Nat. Acad. Sci. USA, 96, 1852 (1999).
127. B. E. Maryanoff, X. Qiu, K. P. Padmanabhan, A. Tulinsky, H. R. Almond Junior, P. Andrade-Gordon, M. N. Greco, J. A. Kauffman, K. C. Nicolaou, A. Liu, P. H. Brungs, and N. Fusetani, Proc. Nat. Acad. Sci. USA, 90, 8048 (1993).
128. B. A. Katz, J. M. Clark, J. S. Finer-Moore, T. E. Jenkins, C. R. Johnson, M. J. Ross, C. Luong, W. R. Moore, and R. M. Stroud, Nature, 391, 608 (1998).
129. J. H. Matthews, R. Krishnan, M. J. Costanzo, B. E. Maryanoff, and A. Tulinsky, Biophys. J., 71, 2830 (1996).
130. B. Bachand, M. Tarazi, Y. St-Denis, J. J. Edmunds, P. D. Winocour, L. Leblond, and M. A. Siddiqui, Bioorg. Med. Chem. Lett., 11, 287 (2001).
131. J. J. Slon-Usakiewicz, J. Sivaraman, Y. Li, M. Cygler, and Y. Konishi, Biochemistry, 39, 2384 (2000).
132. R. Krishnan, E. Zhang, K. Hakansson, R. K. Arni, A. Tulinsky, M. S. Lim-Wilby, O. E. Levy, J. E. Semple, and T. K. Brunck, Biochemistry, 37, 12094 (1998).
133. R. Krishnan, I. Mochalkin, R. Arni, and A. Tulinsky, Acta Crystallogr., Sect. D, 56, 294 (2000).
134. I. Mochalkin and A. Tulinsky, Acta Crystallogr., Sect. D, 55, 785 (1999).
135. R. Bone, T. Lu, C. R. Illig, R. M. Soll, and J. C. Spurlino, J. Med. Chem., 41, 2068 (1998).
136. R. Krishnan, E. J. Sadler, and A. Tulinsky, Acta Crystallogr., Sect. D, 56, 406 (2000).
137. M. Nardini, A. Pesce, M. Rizzi, E. Casale, R. Ferraccioli, G. Balliano, P. Milla, P. Ascenzi, and M. Bolognesi, J. Mol. Biol., 258, 851 (1996).
138. M. F. Malley, L. Tabernero, C. Y. Chang, S. L. Ohringer, D. G. Roberts, J. Das, and J. S. Sack, Protein Sci., 5, 221 (1996).
139. J. L. R. Steiner, M. Murakami, and A. Tulinsky, J. Am. Chem. Soc., 120, 597 (1998).
140. I. I. Mathews and A. Tulinsky, Acta Crystallogr. D Biol. Crystallogr., 51, 550 (1995).
141. I. I. Mathews, K. P. Padmanabhan, V. Ganesh, A. Tulinsky, M. Ishii, J. Chen, C. W. Turck, and S. R. Coughlin, Biochemistry, 33, 3266 (1994).
142. J. Vijayalakshmi, K. P. Padmanabhan, K. G. Mann, and A. Tulinsky, Protein Sci., 3, 2254 (1994).
143. I. I. Mathews, K. P. Padmanabhan, A. Tulinsky, and J. E. Sadler, Biochemistry, 33, 13547 (1994).
144. J. P. Priestle, J. Rahuel, H. Rink, M. Tones, and M. G. Gruetter, Protein Sci., 2, 1630 (1993).
145. T. J. Rydel, A. Tulinsky, W. Bode, and R. Huber, J. Mol. Biol., 221, 583 (1991).
146. D. W. Banner and P. Hadvary, J. Biol. Chem., 266, 20085 (1991).
147. R. K. Arni, K. Padmanabhan, K. P. Padmanabhan, T-P. Wu, and A. Tulinsky, Biochemistry, 32, 4727 (1993).
148. X. Qiu, K. P. Padmanabhan, V. E. Carperos, A. Tulinsky, T. Kline, J. M. Maraganore, and J. W. Fenton II, Biochemistry, 31, 11689 (1992).
149. C. Eigenbrot, D. Kirchhofer, M. S. Dennis, L. Santell, R. A. Lazarus, J. Stamos, and M. H. Ultsch, Structure, 9, 627 (2001).
150. A. C. W. Pike, A. M. Brzozowski, S. M. Roberts, O. H. Olsen, and E. Persson, Proc. Nat. Acad. Sci. USA, 96, 8925 (1999).
151. M. S. Dennis, C. Eigenbrot, N. J. Skelton, M. H. Ultsch, L. Santell, M. L. Dwyer, M. P. O'Connell, and R. A. Lazarus, Nature, 404, 465 (2000).
152. D. W. Banner, A. D'Arcy, C. Chene, F. K. Winkler, A. Guha, W. H. Konigsberg, Y. Nemerson, and D. Kirchhofer, Nature, 380, 41 (1996).
153. G. Kemball-Cook, D. J. D. Johnson, E. G. D. Tuddenham, and K. Harlos, J. Struct. Biol., 127, 213 (1999).
154. E. Zhang, R. St. Charles, and A. Tulinsky, J. Mol. Biol., 285, 2089 (1999).
155. K.-P. Hopfner, A. Lang, A. Karcher, K. Sichler, E. Kopetzki, H. Brandstetter, R. Huber, W. Bode, and R. A. Engh, Structure (Lond), 7, 989 (1999).
156. H. Brandstetter, M. Bauer, R. Huber, P. Lollar, and W. Bode, Proc. Nat. Acad. Sci. USA, 92, 9796 (1995).
157. M. G. Malkowski, S. L. Ginell, W. L. Smith, and R. M. Garavito, Science, 289, 1933 (2000).
158. D. Picot, P. J. Loll, and R. M. Garavito, Nature, 367, 243 (1994).
159. P. J. Loll, D. Picot, and R. M. Garavito, Nat. Struct. Biol., 2, 637 (1995).
160. P. J. Loll, C. T. Sharkey, S. J. O'Connor, C. M. Dooley, E. O'Brien, M. Devocelle, K. B. Nolan, B. S. Selinsky, and D. J. Fitzgerald, Mol. Pharmacol., 60, 1407 (2001).
161. E. D. Thuresson, M. G. Malkowski, K. M. Lakkides, C. J. Rieke, A. M. Mulichak, S. L. Ginell, R. M. Garavito, and W. L. Smith, J. Biol. Chem., 276, 10358 (2001).
162. B. S. Selinsky, K. Gupta, C. T. Sharkey, and P. J. Loll, Biochemistry, 40, 5172 (2001).
163. P. J. Loll, D. Picot, O. Ekabo, and R. M. Garavito, Biochemistry, 35, 7330 (1996).
164. J. R. Kiefer, J. L. Pawlitz, K. T. Moreland, R. A. Stegeman, J. K. Gierse, W. F. Hood, J. K. Gierse, A. M. Stevens, D. C. Goodwin, S. W. Rowlinson, L. J. Marnett, W. C. Stallings, and R. G. Kurumbail, Nature, 405, 97 (2000).
165. R. G. Kurumbail, A. M. Stevens, J. K. Gierse, J. J. Mcdonald, R. A. Stegeman, J. Y. Pak, D. Gildehaus, J. M. Miyashiro, T. D. Penning, K. Seibert, P. C. Isakson, and W. C. Stallings, Nature, 384, 644 (1996).
166. Q. Zhao, S. Modi, G. Smith, M. Paine, P. D. Mcdonagh, C. R. Wolf, D. Tew, L. Y. Lian, G. C. Roberts, and H. P. Driessen, Protein Sci., 8, 298 (1999).
167. M. Wang, D. L. Roberts, R. Paschke, T. M. Shea, B. S. Masters, and J. J. Kim, Proc. Nat. Acad. Sci. USA, 94, 8411 (1997).
168. P. A. Hubbard, A. L. Shen, R. Paschke, C. B. Kasper, and J. J. Kim, J. Biol. Chem., 276, 29163 (2001).
169. A. Gangjee, A. P. Vidwans, A. Vasudevan, S. F. Queener, R. L. Kisliuk, V. Cody, R. Li, N. Gal-
275. A. Zdanov, C. Schalk-Hihi, A. Gustchina, M. Tsang, J. Weatherbee, and A. Wlodawer, Structure, 3, 591 (1995).
276. K. Josephson, N. J. Logsdon, and M. R. Walter, Immunity, 15, 35 (2001).
277. M. R. Walter and T. L. Nagabhushan, Biochemistry, 34, 12118 (1995).
278. C. Yoon, S. C. Johnston, J. Tang, M. Stahl, J. F. Tobin, and W. S. Somers, Embo J., 19, 3530 (2000).
279. E. Z. Eisenmesser, D. A. Horita, A. S. Altieri, and R. A. Byrd, J. Mol. Biol., 310, 231 (2001).
280. H. R. Mott, B. S. Baines, R. M. Hall, R. M. Cooke, P. C. Driscoll, M. P. Weir, and I. D. Campbell, J. Mol. Biol., 248, 979 (1995).
281. D. B. McKay, Science, 257, 412 (1992).
282. Y. Feng, B. K. Klein, and C. A. Mcwherter, J. Mol. Biol., 259, 524 (1996).
283. T. Mueller, F. Oehlenschlaeger, and M. Buehner, J. Mol. Biol., 247, 360 (1995).
284. T. Hage, W. Sebald, and P. Reinemer, Cell, 97, 271 (1999).
285. M. Hulsmeyer, C. Scheufler, and M. K. Dreyer, Acta Crystallogr., Sect. D, 57, 1334 (2001).
286. C. Redfield, L. J. Smith, J. Boyd, G. M. P. Lawrence, R. G. Edwards, C. J. Gershater, R. A. G. Smith, and C. M. Dobson, J. Mol. Biol., 238, 23 (1994).
287. R. Powers, D. S. Garrett, C. J. March, E. A. Frieden, A. M. Gronenborn, and G. M. Clore, Science, 256, 1673 (1992).
288. T. Mueller, T. Dieckmann, W. Sebald, and H. Oschkinat, J. Mol. Biol., 237, 423 (1994).
289. T. Mueller, T. Dieckmann, W. Sebald, and H. Oschkinat, J. Mol. Biol., 237, 423 (1994).
290. L. J. Smith, C. Redfield, J. Boyd, G. M. P. Lawrence, R. G. Edwards, R. A. G. Smith, and C. M. Dobson, J. Mol. Biol., 224, 899 (1992).
291. M. R. Walter, W. J. Cook, B. G. Zhao, R. Cameron Junior, S. E. Ealick, R. L. Walter Junior, P. Reichert, T. L. Nagabhushan, P. P. Trotta, and C. E. Bugg, J. Biol. Chem., 267, 20371 (1992).
292. A. Wlodawer, A. Pavlovsky, and A. Gustchina, Febs Lett., 309, 59 (1992).
293. R. Powers, D. S. Garrett, C. J. March, E. A. Frieden, A. M. Gronenborn, and G. M. Clore, Biochemistry, 32, 6744 (1993).
294. M. V. Milburn, A. M. Hassell, M. H. Lambert, S. R. Jordan, A. E. I. Proudfoot, P. Graber, and T. N. C. Wells, Nature, 363, 172 (1993).
296. W. Somers, M. Stahl, and J. S. Seehra, Embo J., 16, 989 (1997).
297. K. Rajarathnam, I. Clark-Lewis, and B. D. Sykes, Biochemistry, 34, 12983 (1995).
298. C. Eigenbrot, H. B. Lowman, L. Chee, and D. R. Artis, Proteins, 27, 556 (1997).
299. N. J. Skelton, C. Quan, D. Reilly, and H. Lowman, Structure Fold Des., 7, 157 (1999).
300. N. Gerber, H. Lowman, D. R. Artis, and C. Eigenbrot, Proteins: Struct., Funct., Genet., 38, 361 (2000).
301. H. Sticht, M. Auer, B. Schmitt, J. Besemer, M. Horcher, T. Kirsch, J. D. Lindley, and P. Roesch, Eur. J. Biochem., 235, 26 (1996).
302. E. T. Baldwin, I. T. Weber, R. St. Charles, J.-C. Xuan, E. Appella, M. Yamada, K. Matsushima, B. F. P. Edwards, G. M. Clore, A. M. Gronenborn, and A. Wlodawer, Proc. Nat. Acad. Sci. USA, 88, 502 (1991).
303. G. M. Clore, E. Appella, M. Yamada, K. Matsushima, and A. M. Gronenborn, Biochemistry, 29, 1689 (1990).
304. M. M. G. M. Thunnissen, P. N. Nordlund, and J. Z. Haeggstrom, Nat. Struct. Biol., 8, 131 (2001).
305. X. Weng, H. Luecke, I. S. Song, D. S. Kang, S-H. Kim, and R. Huber, Protein Sci., 2, 448 (1993).
306. A. Rosengarth, V. Gerke, and H. Luecke, J. Mol. Biol., 306, 489 (2001).
307. M. Tegoni, S. Spinelli, M. Verhoeyen, P. Davis, and C. Cambillau, J. Mol. Biol., 289, 1375 (1999).
308. H. Wu, J. W. Lustbader, Y. Liu, R. E. Canfield, and W. A. Hendrickson, Structure, 2, 545 (1994).
309. A. J. Lapthorn, D. C. Harris, A. Littlejohn, J. W. Lustbader, R. E. Canfield, K. J. Machin, F. J. Morgan, and N. W. Isaacs, Nature, 369, 455 (1994).
310. A. Bohm, J. Pandit, J. Jancarik, R. Halenbeck, K. Koths, and S.-H. Kim, Science, 258, 1358 (1992).
311. M. J. Jedrzejas, S. Singh, W. J. Brouillette, W. G. Laver, G. M. Air, and M. Luo, J. Mol. Biol., 267, 584 (1997).
312. N. R. Taylor, A. Cleasby, O. Singh, T. Skarzynski, A. J. Wonacott, P. W. Smith, S. L. Sollis, P. D. Howes, P. C. Cherry, R. Bethell, P. Colman, and J. Varghese, J. Med. Chem., 41, 798 (1998).
295. G. Y. Xu, H. A. Yu, J. Hong, M. Stahl, T. Mc- 313. M. J. Jedrzejas, S. Singh, W. J . Brouillette,
donagh, L. E. Kay, and D. A. Cumming, J. Mol. W. G. Laver, G. M. Air, and M. Luo, Biochem-
Biol., 268,468 (1997). istry, 34, 3144 (1995).
i References
3
314. W. P. Burmeister, B. Henrissat, C. Bosso, S. 333. J. M. Gulbis, M. Zhou, S. Mann, and R. Mac-
Cusack, and R. W. H. Ruigrok, Structure, 1,19 kinnon, Science, 289, 123 (2000).
(1993). 334. D. L. Minorjr., Y.-F. Lin, B. C. Mobley, A. Ave-
315. J. B. Finley, V. R. Atigadda, F. Duarte, J. J. lar, Y. N. Jan1.Y. Jan, and J. M. Berger, Cell,
Zhao, W. J. Brouillette, G. M. Air, and M. Luo, 102, 657 (2000).
J. Mol. Biol., 293, 1107 (1999). 335. J. L. Oberfield, J. L. Collins, C. P. Holmes,
316. C. L. White, M. N. Janakiraman, W. G. Laver, D. M. Goreham, J. P. Cooper, J. E. Cobb, J. M.
C. Philippon, A. Vasella, G. M. Air, and M. Luo, Lenhard, E. A. Hull-Ryde, C. P. Mohr, S. G.
J. Mol. Biol., 245,623 (1995). Blanchard, D. J. Parks, L. B. Moore, J. M. Leh-
317. W. P. Burmeister, R. W. H. Ruigrok, and S. mann, K. Plunket, A. B. Miller, M. V. Milburn,
Cusack, Embo J.,11,49 (1992). S. A. Kliewer, and T. M. Wilson, Proc. Nut.
Acad. Sci. USA, 96,6102 (1999).
318. S. A. Monks, G. Karagianis, G. J. Howlett, and
R. S. Norton, J. Biomol. NMR, 8,379 (1996). 336. R. T. Nolte, G. B. Wisely, S. Westin, J. E. Cobb,
M. H. Lambert, R. Kurokawa, M. G. Rosenfeld,
319. R. Bader, A. Bettio, A. G. Beck-Sickinger, and
T. M. Willson, C. K. Glass, and M. V. Milburn,
0. Zerbe, J. Mol. Biol., 305,307 (2001).
Nature, 395, 137 (1998).
320. C. Cabrele, M. Langer, R. Bader, H. A. Wie-
337. R. T. Gampejr.,V. G. Montana, M. H. Lambert,
land, H. N. Doods, 0. Zerbe, and A. G. Beck-
A. B. Miller, R. K. Bledsoe, M. V. Milburn, S. A.
Sickinger, J. Biol. Chem., 275,36043 (2000).
Kliewer, T. M. Willson, and H. E. Xu, Mol. Cell,
321. G. Seidel, W. Schaefer, A. Esswein, E. Hof- 5, 545 (2000).
mann, and P. Roesch, In Press.
338. J. Uppenberg, C. Svensson, M. Jaki, G. Bertils-
322. Z. Chen, P. Xu, J.-R. Barbier, G. Willick, and F. son, L. Jendeberg, and A. Berkenstam, J. Biol.
Ni, Biochemistry, 39, 12766 (2000). Chem., 273,31108 (1998).
323. U. C. Mam, K. Adermann, P. Bayer, W.-G. 339. S. P. Williams and P. B. Sigler, Nature, 393,
Forssmann, and P. Roesc, Biochem. Biophys. 392 (1998).
Res. Comm., 267,213 (2000). 340. W. Somers, M. Ultsch, A. M. Devos, and A. A.
324. L. Jin, S. L. Briggs, S. Chandrasekhar, N. Y. Kossiakoff, Nature, 372,478 (1994).
Chirgadze, D. K. Clawson, R. W. Schevitz, D. L. 341. P. A. Elkins, H. W. Christinger, Y. Sandowski,
Smiley, A. H. Tashjian, and F. Zhang, J. Biol. E. Sakal, A. Gertler, A. M. Devos, and A. A.
Chem., 275,27238 (2000). Kossiakoff, Nut. Struct. Biol., 7, 808 (2000).
325. U. C. Mam, S. Austermann, P. Bayer, K. Ader- 342. F. Rastinejad, T. Wagner, Q. Zhao, and S. Kho-
mann, A. Ejcharth. Sticht, S. Walter, F.-X. rasanizadeh, Embo J.,19, 1045 (2000).
Schmid, R. Jaenicke, W.-G. Forssmann and P.
343. B. P. Klaholz, A. Mitschler, M. Belema, C. Zusi,
Roesch, J. Biol. Chem., 270, 15194 (1995).
and D. Moras, Proc. Nut. Acad. Sci. USA, 97,
326. U. C. Marx, Strukturen VerschiedenerParathor- 6322 (2000).
monfragmente in Loesung, University of Bay- 344. J. P. Renaud, N. Rochel, M. Ruff, V. Vivat, P.
reuth Thesis, Bayreuth, 1996. Chambon, H. Gronemeyer, and D. Moras, Na-
327. G. Y. Xu, T. Mcdonagh, H. A. Yu, E. A. Nalef- ture, 378, 681 (1995).
ski, J. D. Clark, and D. A. Cumming, J. Mol. 345. B. P. Klaholz, J. P. Renaud, A. Mitschler, C.
Biol., 280,485 (1998). Zusi, P. Chambon, H. Gronemeyer, and D. Mo-
328. 0. Perisic, S. Fong, D. E. Lynch, M. Bycroft, ras, Nut. Struct. Biol., 5, 199 (1998).
and R. L. Williams, J. Biol. Chem., 273, 1596 346. W. Bourguet, V. Vivat, J. M. Wurtz, P. Cham-
(1998). bon, H. Gronemeyer, and D. Moras, Mol. Cell,
329. A. Dessen, J. Tang, H. Schmidt, M. Stahl, J. D. 5, 289 (2000).
Clark, J. Seehra, and W. S. Somers, Cell, 97, 347. B. P. Klaholz, A. Mitschler, and D. Moras, J.
349 (1999). Mol. Biol., 302, 155 (2000).
330. A. Kreusch, P. J. Pfaffinger, C. F. Stevens, and 348. R. M. A. Knegtel, M. Katahira, J. G. Schilthuis,
S. Choe, Nature, 392,945 (1998). A. M. J. J. Bonvin, R. Boelens, D. Eib, P. T.
331. S. J. Cushman, M. H. Nanao, A. W. Jahng, D. Vandersaag, and R. Kaptein, J. Biomol. NMR,
Derubeis, S. Choe, and P. J. Pfaffinger, Nut. 3, l(1993).
Struct. Biol., 7,403 (2000). 349. W. Bourguet, M. Ruff, P. Chambon, H. Grone-
332. K. A. Bixby, M. H. Nanao, N. V. Shen, A. meyer, and D. Moras, Nature, 375,377 (1995).
Kreusch, H. Bellamy, P. J. Pfaffinger, and S. 350. F. Rastinejad, T. Perlmann, R. M. Evans, and
Choe, Nut. Struct. Biol., 6 , 3 8 (1998). P. B. Sigler, Nature, 375, 203 (1995).
NMR and Drug Discovery
DAVID J. CRAIK
RICHARD J. CLARK
Institute for Molecular Bioscience
Australian Research Council Special Research
Centre for Functional and Applied Genomics
University of Queensland
Brisbane, Australia
Contents
1 Introduction
1.1 Overview of Drug Development
1.2 Scope of Chapter
1.3 Principles of NMR Spectroscopy
1.4 Instrumentation
1.5 Applications of NMR in Drug Design and Discovery
2 Ligand-Based Design
2.1 Structure Elucidation
2.1.1 Structure Elucidation of Natural Products
2.1.2 Structure Determination of Bioactive Peptides
2.1.2.1 NMR Structure of Ziconotide: A Novel Treatment for Pain
2.1.2.2 Endothelin as a Lead in Ligand-Based Design
2.1.3 Instrumental Advances and their Impact on Structure Elucidation
2.2 Conformational Analysis
2.3 Charge State
2.4 Tautomeric Equilibria
2.5 Ligand Dynamics: Line-Shape and Relaxation Data
2.6 Pharmacophore Modeling: Conformations of a Set of Ligands
2.7 Limitations of Analog-Based Design
2.8 Conformation of Bound Ligands: Transferred NOEs
3 Receptor-Based Design
3.1 Macromolecular Structure Determination
3.1.1 Overview of Approach
Burger's Medicinal Chemistry and Drug Discovery, Sixth Edition, Volume 1: Drug Discovery. Edited by Donald J. Abraham. ISBN 0-471-27090-3 © 2003 John Wiley & Sons, Inc.
NMR and Drug Discovery
lead molecules to improve their druglike properties. Again, several loops around Cycle B may be necessary before one or more development candidates are identified. Ultimately one or two of these development candidates are identified for progression through clinical trials.
As indicated in Fig. 12.1, it is convenient to envisage five broad categories of NMR experiments that may contribute to this overall drug development process.
1. Small molecule, or ligand-based, NMR. This involves studies of drugs and drug leads, typically organic molecules with a molecular weight <500 Da, but also including small proteins of up to a few kDa. These studies may be used to characterize natural products or synthetic drug leads, or to determine their conformation.
2. Macromolecular NMR. This involves studies of the macromolecular targets of drugs, typically to determine their three-dimensional structure and/or the nature of their complexes with ligands.
3. NMR screening. This involves the use of NMR to identify lead molecules that bind to a macromolecular target. These studies typically involve both small molecules and macromolecules and seek to detect the presence of binding interactions between them.
4. Metabolic NMR. This involves studies of endogenous molecules whose levels may be modified by drug treatment, or studies of the metabolites of drugs themselves.
5. NMR imaging. Such studies provide anatomical information in an animal model or human patient. This includes, for example, monitoring the size of plaques or tumors in the brains of Alzheimer's or cancer patients, respectively, during drug therapy.
It is clear from these descriptions that NMR covers a wide range of applications in the pharmaceutical industry, although for the remainder of this chapter we will focus on NMR in the drug design/discovery phase of drug development, that is, on categories 1-3 of the preceding list. Together, the studies in categories 1 and 2 may be classified as structure-based design, whereas category 3 relates to drug discovery.
1.2 Scope of Chapter
Our aim is to give a broad overview on the use of NMR as a tool in structure-based design and in screening approaches to drug discovery. The chapter also contains a description of the relevant NMR methods, which are highlighted by illustrative examples. We briefly describe the instrumentation required for such studies and emerging trends in the field are discussed. This includes developments in the field of drug discovery in the postgenomic era that are likely to have an impact on the way in which NMR is used, as seen for example by the recent interest in structural genomics programs. NMR instrument developments are also described. For example, recent advances in cryoprobe technology promise to dramatically increase the sensitivity of NMR spectroscopy and increase its application across the pharmaceutical industry. Finally, a section outlining some of the practical considerations in structure-based design and screening is included. Future directions for the field are mentioned throughout the discussion.
There have been a number of reviews that describe applications of NMR in drug discovery or screening and the reader is referred to these for additional information (2-15). Recent books covering aspects of NMR in drug design are also available (16, 17).
It is assumed that most readers will be familiar with the basic principles of NMR. However, for completeness and to define some of the terms that will be used in this chapter it is useful to give a brief overview of the principles. Excellent texts are available to provide more detail (18, 19).
1.3 Principles of NMR Spectroscopy
The underlying basis of NMR is that when nuclei with a nonzero spin quantum number are placed in a magnetic field they take up one of a discrete number of quantized states. The application of radiofrequency (rf) energy produces transitions between these states. The energy changes associated with these transitions are detected as small voltages induced in
Figure 12.2. Overview of the principles of NMR spectroscopy. Polarization of nuclear spins by a magnetic field is perturbed by application of a radiofrequency (rf) pulse. The resultant signal (the FID detected by the coil around the sample tube) is Fourier transformed, to yield a spectrum reflecting the number and environments of nuclei in the sample.
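
The Fourier-transform step named in the caption of Fig. 12.2 can be illustrated with a small numerical sketch. This is not the processing used by any spectrometer software: the two resonance offsets, their intensities, the decay constant, and the sampling parameters below are invented for illustration, and a naive discrete Fourier transform stands in for the optimized routines used in practice.

```python
import cmath
import math


def synth_fid(freqs_hz, amps, t2_s, dwell_s, n_points):
    """Synthetic complex FID: a sum of exponentially decaying oscillations."""
    fid = []
    for k in range(n_points):
        t = k * dwell_s
        decay = math.exp(-t / t2_s)
        fid.append(sum(a * decay * cmath.exp(2j * math.pi * f * t)
                       for f, a in zip(freqs_hz, amps)))
    return fid


def dft_magnitude(fid):
    """Naive O(n^2) discrete Fourier transform; returns the magnitude per bin."""
    n = len(fid)
    return [abs(sum(fid[m] * cmath.exp(-2j * math.pi * k * m / n)
                    for m in range(n)))
            for k in range(n)]


def top_peaks_hz(spectrum, dwell_s, count):
    """Frequencies (Hz) of the `count` tallest local maxima in the lower half
    of the spectrum, tallest first."""
    n = len(spectrum)
    maxima = [k for k in range(1, n // 2)
              if spectrum[k] > spectrum[k - 1] and spectrum[k] > spectrum[k + 1]]
    maxima.sort(key=lambda k: spectrum[k], reverse=True)
    return [k / (n * dwell_s) for k in maxima[:count]]


# Two hypothetical nuclei resonating 100 Hz and 250 Hz from the carrier,
# the second with half the intensity of the first (all values assumed).
N_POINTS, DWELL_S = 200, 1e-3
fid = synth_fid([100.0, 250.0], [1.0, 0.5], t2_s=0.1,
                dwell_s=DWELL_S, n_points=N_POINTS)
spectrum = dft_magnitude(fid)
peaks = top_peaks_hz(spectrum, DWELL_S, count=2)
print(peaks)
```

The two tallest peaks fall at the 100 Hz and 250 Hz offsets used to build the FID, with the first roughly twice as tall as the second. This is the sense in which a spectrum is a plot of intensity against frequency.
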
a receiver coil that are subsequently amplified, digitized, and processed to yield spectra, as illustrated in Fig. 12.2. The most commonly studied NMR-active nucleus is the proton, ¹H, but in modern NMR experiments ²H, ¹³C, and ¹⁵N nuclei are also very important. For these heteronuclei it is common to isotopically enrich the sample because of their low natural abundance. This is particularly important for studies of proteins, as will become apparent later in this chapter. Occasionally, other nuclei find specialist applications. For example, in fluorine-containing drugs it is possible to use sensitive ¹⁹F-NMR signals to monitor interaction with target proteins, as described later in this chapter.
In modern spectrometers the rf energy is supplied in the form of short pulses (typically, ~10 µs) that simultaneously excite all nuclei of a given isotope type (e.g., all protons or all ¹³C nuclei). Nuclei of a given isotope that are in different chemical environments by virtue of their atomic locations in the molecule have slightly different resonance frequencies and lead to different oscillating voltages in the receiver coil. The resultant combined signal, termed a free induction decay (FID), is Fourier transformed to give a spectrum that is basically a plot of peak intensity vs. frequency, with one peak for each chemically distinct nucleus. These features are schematically illustrated in Fig. 12.2. The frequency axis is termed the chemical shift because it reflects the local chemical environment of each nucleus. The range of chemical environments of nuclei in a molecule is such that chemical shifts range up to only a few hundred parts per million (ppm) of the base resonance frequency for ¹³C and ¹⁵N. For ¹H the range is smaller still, covering only about 10 ppm. Despite this small range, chemical shifts provide valuable diagnostic information on the environment of the nucleus giving rise to the signal.
The chemical shift is an extremely important NMR parameter but there are many other parameters that can be discerned from NMR spectra. Indeed, NMR is unique among many forms of spectroscopy in that there are so many parameters associated with a spectrum other than just peak intensity and frequency. These include coupling constants, which provide information on local conformations and also on molecular connectivities; nuclear Overhauser effects (NOEs), which provide information on internuclear distances; and relaxation parameters, which provide information on molecular dynamics. Table 12.1 summarizes the main NMR parameters that may be measured and highlights their applications in the drug discovery process.
The following sections of this chapter provide specific examples of how these various parameters are useful in the drug discovery process. Before doing this, though, it is useful to consider some of the limitations of one-dimensional (1D) NMR spectroscopy, particularly when the detected nucleus is ¹H, as is most commonly the case. With one signal coming from each chemically distinct proton and with those signals spread only over 10 ppm, it is
clear that spectral overlap can potentially be a major problem for anything but the simplest of molecules. The development of higher field NMR spectrometers, which effectively provide greater dispersion in the frequency dimension, has contributed significantly to overcoming this limitation and increasing the application of NMR for studying pharmaceutically relevant molecules. In addition to such instrumental developments, methodological advances have also played a key role in extending the use of NMR. Multidimensional NMR methods have revolutionized biomolecular NMR spectroscopy by removing the limitations of a single frequency dimension, leading to the development of 2D, 3D, and 4D spectra.
A simple way of illustrating multidimensional NMR is through reference to heteronuclear correlation spectroscopy, in which two or more separate frequency dimensions are correlated with one another. For example, a particularly valuable 2D experiment is ¹H-¹⁵N heteronuclear single quantum correlation (HSQC) spectroscopy, in which the resultant spectrum has two frequency axes, corresponding to ¹H and ¹⁵N frequency dimensions, and one intensity axis. Analogous ¹H-¹³C HSQC spectra are also widely used. Such spectra are normally represented with the intensity axis in contour form so that they may be drawn in two dimensions as a set of contour peaks. Spectral peaks occur for pairs of ¹⁵N/¹H or ¹³C/¹H nuclei that are directly bonded to one another, and with each frequency being characteristic for the local chemical environment they represent a relatively simple, but highly characteristic fingerprint of the sample. Figure 12.3 shows the relationship between 1D and 2D spectra for the immunosuppressive drug cyclosporin, and includes a region of both the ¹H/¹⁵N and ¹H/¹³C HSQC spectra. In HSQC spectra overlap problems are alleviated because, even if two protons have the same chemical shift and would hence be overlapped in a 1D spectrum, chances are that the respective heteronuclear signals will not be overlapped, allowing the signals to be resolved in the 2D spectrum. HSQC spectra are widely used in NMR-based drug screening and we will return to them later.
Multidimensional NMR spectra are not restricted to cases where the separate frequency axes encode signals from different nuclear types. Indeed, much of the early work on the development of 2D NMR was performed on cases where both axes involved ¹H chemical shifts. The main value in such spectra comes from the information content in cross peaks between pairs of protons. In COSY-type spectra (COSY = COrrelation SpectroscopY) cross peaks occur only between protons that are scalar coupled (i.e., within 2 or 3 bonds) to each other, whereas in NOESY (NOE SpectroscopY) spectra cross peaks occur for protons that are physically close in space (<5 Å apart). A combination of these two types of 2D spectra may be used to assign the NMR signals of small proteins and provides sufficient information on internuclear distances to calculate three-dimensional structures. Figure 12.3 includes a panel showing the COSY spectrum of cyclosporin and highlights the relationships between 1D ¹H-NMR spectra and corresponding 2D homonuclear (COSY) and heteronuclear (HSQC) spectra.
Homonuclear 2D spectra are generally applicable for the study of proteins up to only approximately 80 amino acids in size. For
Figure 12.3. A schematic representation of the (a) 1D ¹H; (b) 2D DQF-COSY; (c) ¹⁵N/¹H-HSQC; and (d) ¹³C/¹H-HSQC spectra of the immunosuppressive agent cyclosporin. Example resonances/correlations from residues 6 and 7 have been highlighted to illustrate the assignment process.
Figure 12.4. Block diagram of a modern NMR spectrometer (magnet, sample tube, and probe). These systems use superconducting magnets that are based on a solenoid of a suitable alloy (e.g., niobium/titanium or niobium/tin) immersed in a dewar of liquid helium. The extremely low temperature of the magnet itself (4.2 K) is well insulated from the sample chamber in the center of the magnet bore. The probe in which the sample is housed usually incorporates accurate temperature control over the range typically of 4 to 40°C for biological samples. The rf coil in the probe is connected in turn to a preamplifier, receiver circuitry, analog-to-digital converter (ADC), and a computer for data collection.
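
Magnets such as the one in Fig. 12.4 are usually described by the frequency at which protons resonate in them rather than by their field strength. The two are linked by the Larmor relation, ν = (γ/2π)·B₀. The sketch below applies that relation. The gyromagnetic ratios are rounded literature values entered here as assumptions (¹⁵N as a magnitude), and the function names are purely illustrative.

```python
# Approximate gyromagnetic ratios divided by 2*pi, in MHz per tesla
# (rounded literature values, assumed here; 15N is given as a magnitude).
GAMMA_BAR_MHZ_PER_T = {"1H": 42.577, "13C": 10.708, "15N": 4.316}


def resonance_mhz(nucleus, b0_tesla):
    """Larmor frequency nu = (gamma / 2*pi) * B0, in MHz."""
    return GAMMA_BAR_MHZ_PER_T[nucleus] * b0_tesla


def field_for_proton_mhz(proton_mhz):
    """Magnetic field (tesla) at which protons resonate at `proton_mhz`."""
    return proton_mhz / GAMMA_BAR_MHZ_PER_T["1H"]


# Nominal spectrometer frequencies mentioned in this chapter.
for label_mhz in (500, 800, 900):
    b0 = field_for_proton_mhz(label_mhz)
    print(f"{label_mhz} MHz (1H): B0 = {b0:.2f} T, "
          f"13C = {resonance_mhz('13C', b0):.1f} MHz, "
          f"15N = {resonance_mhz('15N', b0):.1f} MHz")
```

With these constants a 500-MHz instrument corresponds to a field of about 11.7 T. At the same field ¹³C and ¹⁵N resonate at roughly one-quarter and one-tenth of the proton frequency, which is why separate rf channels are used for each nucleus.
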
larger proteins the increased number of signals leads to overlap problems and, in addition, COSY-type spectra suffer from poor sensitivity when the signal linewidths are of the same order as or larger than ¹H, ¹H scalar coupling constants. Such limitations are reduced by use of spectra of higher dimensionality (i.e., 3D or 4D spectra) that are based on correlations involving heteronuclear rather than homonuclear coupling constants. Such spectra are important in the structure determination process for larger proteins and are typically recorded for samples that incorporate uniform labeling with ¹⁵N, or both ¹³C and ¹⁵N nuclei. Multidimensional spectra that involve irradiation of ¹H, ¹³C, and ¹⁵N nuclei are referred to as triple resonance spectra.
The details of how multidimensional spectra are obtained are beyond the scope of this chapter, but it suffices to say that, like most other modern NMR experiments, they involve irradiation of the sample with a set of rf pulses of defined length, frequency, and phase, with specific interpulse delays. The pulse programs for such experiments are commonly provided with the spectrometer as part of a standard library of experiments and may easily be run by novice users after input of an appropriate set of parameters to define the relevant spectral widths and type of experiment required.
The above discussion provides a basic overview of some of the methods important in modern NMR spectroscopy. Before examining specific applications in drug discovery it is useful to describe the instrumental requirements for such studies.
1.4 Instrumentation
NMR spectrometers comprise a powerful and homogeneous magnet, a radiofrequency console for generating appropriate rf pulses, a probe for applying this rf energy to the sample and receiving the resultant signals, and a computer console for controlling the experiments and acquiring the resultant data. These features are summarized in Fig. 12.4. Spectrometers are normally specified in terms of the resonant frequency of protons at the given magnetic field (e.g., 500 MHz corresponds to a magnetic field of 11.7 Tesla). Both sensitivity and dispersion of signals increase with increasing magnetic field.
There have been some major breakthroughs in both NMR instrumentation and methodology over the last decade that have greatly increased the utility of NMR for drug discovery applications. These are summarized in Table 12.2, which also includes some of the earlier milestones in the development of NMR. Most notable among recent innovations
are the use of pulsed-field gradient methods for improving spectral quality and allowing new types of experiments to be performed, transverse relaxation-optimized spectroscopy (TROSY) methods (20) for increasing the size of macromolecules that can be examined, and cryoprobes for enhancing sensitivity. The development of cryoprobes has resulted in the biggest single gain in sensitivity over recent years, effectively giving 500-MHz spectrometers the sensitivity of 800-MHz spectrometers (although without the gain in resolution!). The enhanced sensitivity is obtained by cooling the receiver coil and associated circuitry to near liquid helium temperatures, thereby reducing thermal noise. There were considerable technical barriers to be overcome in developing such probes because of the large difference in temperature between the receiver coils and the sample, which are only a few millimeters apart. These barriers have now been overcome and cryoprobes are being installed in a large number of laboratories. They are also becoming available for higher field systems (800 MHz), thus providing further sensitivity gains.
Although the basic configurations of instruments tailored for structure-based design or for NMR drug screening are similar, there are some minor differences. For structure-based design applications a relatively high field spectrometer is required (>500 MHz), usually equipped with three or four radiofrequency channels for the simultaneous irradiation of ¹H, ¹³C, ¹⁵N, and in some cases ²H nuclei. The greatest sensitivity and dispersion are obtained with the highest possible magnetic field. Instruments of up to 900 MHz are currently available but, at the time of writing, only a few have been installed. Numerous 800-MHz systems dedicated to structure-based design have been installed in pharmaceutical laboratories. The high field instruments provide another advantage in that TROSY experiments (20) can be used to produce a marked improvement in spectral quality for larger proteins. Such developments promise to push higher the size of proteins whose structure can be determined by NMR.
For NMR drug screening programs, the basic requirement of a spectrometer of 500 MHz or greater remains, but in addition, an interface that allows the spectrometer to sample a library of compounds of potential binding ligands needs to be present. This may be done either by use of a discrete sample changer or a flow-type system. Flow systems have the potential advantage of increased throughput but have the potential disadvantage of precipitation of protein samples. In practice this appears not to have been a major problem and both types of systems are in use in the pharmaceutical industry. Sample changer systems currently have the advantage that they may be adapted for use with cryoprobe technology (currently unavailable for flow systems). Cryoprobes allow dramatically enhanced sensitivity gains, which bring particular advantages to the study of macromolecule-ligand interactions used in screening programs (21).
Pulsed-field gradients have become integral to most modern NMR spectrometers and are routinely used both for structure determination and screening experiments. Another recent development has been the interface of NMR spectrometers with other instrumentation such as liquid chromatography (LC) and/or mass spectrometry (MS). The applications of these instrumental developments to drug discovery have been recently reviewed (8, 13, 22).
1.5 Applications of NMR in Drug Design and Discovery
Our focus here is on the use of NMR in the discovery and design phase of drug development. The major role of NMR in the design process comes about by its exquisite ability to provide structural information, whereas the major role of NMR in discovery comes through its use as a screening tool to detect the binding of novel ligands to macromolecular targets. As already noted, the latter application is a relatively recent development but has created much interest in the pharmaceutical industry and promises to significantly enhance applications of NMR in this industry. The impact of the methodology is already becoming evident even at this early stage, with several SAR-by-NMR-derived leads currently in clinical development. As already noted, though, the discovery and design phases are often intimately connected, with lead molecules discovered in screening programs routinely being optimized by use of structure-based design approaches (Fig. 12.5).
Figure 12.5. A summary of the relationship between NMR screening and structure-based design. (Adapted from Ref. 15.)
In the context of this chapter structure-based design refers to the process of determining the three-dimensional structure of a lead molecule or macromolecular target, or determining the structure of the macromolecule-ligand complex, and using this information to design new drugs. The questions that may be asked when embarking on structure-based design projects are:
• What are the solution and bound conformations of the ligand?
• What is its charge/tautomeric state?
• Which functional groups bind to the receptor and what charge state are they in?
• What is the structure of the receptor?
• Which parts interact with the ligand?
• What is the geometry of the ligand-receptor complex?
• What are the kinetics of binding and are there dynamic motions of ligand, receptor, or the complex?
Table 12.3 summarizes these and other questions and indicates the type of NMR approaches that can provide answers. Remaining sections of this chapter are organized around the headings identified in Table 12.3.
In considering these questions it is convenient to distinguish between ligand-based design, where the structural focus is on the small lead molecule, and receptor-based design, where the aim is to determine the structure of the macromolecular target. The NMR methods used in ligand-based design have been well established for many years, based on the use of NMR by organic and natural product chemists for more than four decades. However, there have been some important recent advances in NMR methods such as the use of pulsed field gradients, and in the combination of NMR
with other technologies such as LC and MS that promise to enhance applications in this field (13). The use of NMR to determine the three-dimensional structures of macromolecules is a newer field, commencing only in around 1985, and is one that is still rapidly evolving. NMR screening is a still newer approach, developed since around 1996. Ligand-based and receptor-based design are examined in Sections 2 and 3, respectively, and screening-based approaches are examined in Section 4.
2 LIGAND-BASED DESIGN
Many naturally occurring molecules have potent bioactivity that renders them useful leads in the drug design process. These may be naturally occurring hormones, neurotransmitters, or other endogenous molecules, or they may be bioactive molecules from plants or microorganisms. Furthermore, screening programs on synthetic compound libraries frequently result in the discovery of bioactive molecules that then become starting points in drug design. The general aim of ligand- or analog-based design is to determine the structure and conformation of a known bioactive molecule and then mimic this conformation in a designed lead compound, with the aim of improving the activity or druglike properties. The following sections examine various aspects of ligand-based design and illustrate them with examples.
2.1 Structure Elucidation
If the bioactive molecule is a synthetic product, its structure may be rapidly deduced by a simple comparison of NMR parameters (often combined with MS) of the product relative to those of the known precursor, to see whether the desired chemical transformation has taken place. If the bioactive compound is an unknown molecule discovered in an active fraction in bioassay-guided screening, then the first step is to elucidate its structure. Typical molecules that form the basis of such natural products-based drug discovery studies include "organic" natural products as well as small peptides and proteins. The approaches to structure elucidation for natural products and peptides/proteins are a little different from each other and are described in turn.
2.1.1 Structure Elucidation of Natural Products. In the case of nonpeptidic natural products the main structural focus initially is to elucidate the carbon framework. This normally involves a combination of 1D ¹H and ¹³C-NMR, followed by homonuclear (DQF-COSY, TOCSY, ROESY, or NOESY) and heteronuclear (HSQC, HMBC) 2D experiments. Heteronuclear multiple bond correlation (HMBC) spectra are particularly valuable because they assist in tracing the backbone of the molecule. Such spectra display cross peaks between a ¹³C nucleus and protons connected within two or three bonds and, in doing so, provide valuable information on molecular connectivity. Figure 12.6 shows typical HMBC correlations seen for selected regions of taxol, a plant-derived natural product that is currently a leading treatment for breast and ovarian cancers. Although the structure of taxol itself was originally deduced from a combination of X-ray crystallography on a degradation product and a range of ¹H and ¹³C spectra in the 1970s, before HMBC spectra had been invented, HMBC spectra have been widely used for studies of the many taxol derivatives that have been examined in the last decade.
Elucidation of the carbon framework of natural products often yields substantial information about the three-dimensional structure at the same time, but if there are remaining questions on the stereochemistry of chiral centers or other factors affecting the three-dimensional structure, these can usually be resolved from NOESY spectra and/or an anal-
tion of peptide-based natural products involves two distinct steps: (1) the elucidation of the primary structure (amino acid sequence) followed by (2) a determination of secondary/tertiary structure. The primary structure determination is routinely done through Edman sequencing, or more recently by MS-MS methods. NMR plays a key role in the elucidation of the secondary and tertiary structure of peptides, mainly based on 2D homonuclear NMR spectroscopy. A combination of DQF-COSY and TOCSY (TOtal Correlation SpectroscopY) spectra are used to assign spin systems to amino acid types and then NOESY spectra are used to sequentially assign the resonances to individual protons in the peptide (23). The three-dimensional structure is then determined by deriving a series of internuclear distance restraints from the NOESY spectrum and using them in a simulated annealing algorithm to calculate a family of structures consistent with them.
Because the structure determination of peptides and proteins represents a very important contribution of NMR to the drug development process, it is informative to describe the process in more detail. To do this we will use the recently developed peptide-based drug MVIIA as an example.
2.1.2.1 NMR Structure of Ziconotide: A Novel Treatment for Pain. MVIIA, now known
ysis of coupling constants. We will return to as Ziconotide, is a 25-amino acid peptide orig-
the taxol example later in Section 2.2 when inally discovered from the venom of the ma-
describing conformational analysis. rine cone snail, Conus magus. Like other
o-conotoxins it is a potent blocker of N-type
2.1.2 Structure Determination of Bioactive calcium channels, giving it a wide range of po-
Peptides. In contrast to the process described tential therapeutic applications. When deliv-
for organic molecules, the structure elucida- ered intrathecally (i.e., through spinal infu-
2 Ligand-Based Design
sion), it is approximately 1000 times more ids units, the repeated NH, Ha, and side-chain
potent than morphine as an analgesic and has protons tend to fall in characteristic chemical
great potential for the treatment of intracta- shift ranges that can be useful in looking for
ble cancer pain (24). Figure 12.7 shows the patterns to identify amino acid types. Table
peptide sequence and illustrates selected re- 12.4 shows typical chemical shifts for each of
gions of the TOCSY and NOESY spectra. the 20 common amino acids when located in a
As seen in Fig. 12.7, the TOCSY experi- "random-coil" environment (23. . , 25. 26). It is
ment is useful for classifying spin systems to important to stress that these shifts can vary
amino acid type, with typically the most useful quite considerably in structured proteins (by
region being the "skewers" emanating from up to several ppm) and are more useful for
individual NH shifts (-7-10 ppm). For each pattern recognition purposes than for exact
NH proton in the peptide a series of cross identification of a particular residue. In the
peaks to the a, P, and other side-chain protons case of the H a protons, the differences be-
is observed and these patterns define the spin tween the actual shifts in a structured protein
system as belonging to a particular type of and these random-coil values have an addi-
amino acid. Note, however, that there is some tional important use, in that they provide an
degeneracy in the resultant patterns. The NH indication of the local secondary structure. In-
side-chain pattern is truncated if there is a tuitively, the further a chemical shift is from a
break of more than three bonds between pro- random-coil value. the more likely it is attrib-
tons within the spin system. This means, for
example, that the skewers for aromatic resi-
a
uted to that residue's being in structured
environment.
dues extend only as far as the P-protons and After the assignment is complete it is pos-
they therefore appear similar to other "AMX" sible to derive substantial information about
residues such as Cys, Ser, Asp, or Asn. Never- the secondary structure from an analysis of
theless, the ability to assign signals to either chemical shifts, coupling constants, and
individual amino acid types or to the AMX NOEs, even before the three-dimensional
group is a useful starting point in the assign- structure calculations are commenced. Fimre .,
ment. However, such spectra provide no infor- 12.8 shows a typical summary of the relevant
mation about the sequential location of an NMR information, again using the data for
amino acid if it is not unique in the sequence. MVIIA as an example (27,28). Trends in these,
These sequential assignments are obtained data provide a general indication of major ele-
from the NOESY spectrum, as illustrated in ments of secondary structure. For example, a
the sequential walk shown in the middle panel series of strong daN(i, i + I), relative to
of Fig. 12.7. The aim of the sequential assign- dNN(i, i + 1) NOEs often indicates an ex-
ment process is to locate adjacent amino acid tended or P-type structure, whereas strong
spin systems, principally through a cross peak dNN(i, i + 1) NOEs indicate local helical
between the a H proton of one residue (i) and structure or turns. Large J a N coupling con-
the NH of the following residue (i + l), often stants ( X . 5 Hz) are associated with extended
denoted as daN(i, i + 1). Additional support structure and small ones (<5 Hz) with helical
for the assignment is usually also sought in structure. Similarly, deviations of chemical
dpN(i, i + 1)and dNN(i, i + 1)correlations. At shifts from random-coil values, often repre-
the early stages of an assignment it is impos- sented in terms of "chemical shift indices"
sible to be certain whether a particular cross (29),indicate extended (positive values) or he-
peak is a sequential or longer range cross lical structure (negative values).
peak; however, as the assignment procedure An additional useful parameter is the ex-
progresses, ambiguities become resolved. The change rate of amide protons after dissolution
assignment process is generally highly conver- of the sample in D,O. Slowly exchanging
gent, in that once a series of correct assign- amide protons indicate protection from sol-
ments is made the number of choices for re- vent and possible involvement in intramolec-
maining cross peaks diminishes, in principle ular hydrogen bonds associated with elements
making their assignment easier. of secondary structure. All of the NMR and
Because peptides are polymers of amino ac- slow exchange data can be consolidated to give
Figure 12.7. Schematic representations of 2D-NMR spectra of the conotoxin MVIIA. (a) The fingerprint region of the TOCSY spectrum with selected spin systems marked. (b) Fingerprint region of the NOESY spectrum showing two (K2-A6 and L11-Y13) sequential walks. (c) NH-NH region of the NOESY spectrum showing correlations between the NH protons of D14 and G15; C16 and T17; and S22 and G23.
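The association noted in the text between large JαN couplings and extended structure, and between small ones and helical structure, follows from a Karplus-type relation, J(θ) = A cos²θ + B cos θ + C. The sketch below is illustrative only. The coefficients (A = 6.4, B = -1.4, C = 1.9 Hz, with θ = φ - 60°) are a commonly quoted parameterization for the HN-Hα coupling in peptides and are an assumption here, not values given in this chapter; the function names are hypothetical. The second function shows why a single coupling constant rarely fixes the dihedral angle uniquely.

```python
import math

# Assumed Karplus coefficients (Hz) for 3J(HN-Ha); not taken from this chapter.
A, B, C = 6.4, -1.4, 1.9
PHASE_DEG = -60.0  # theta = phi + PHASE_DEG (assumption tied to the coefficients)


def karplus_j(phi_deg, a=A, b=B, c=C, phase_deg=PHASE_DEG):
    """Return the predicted three-bond coupling (Hz) for backbone angle phi."""
    theta = math.radians(phi_deg + phase_deg)
    return a * math.cos(theta) ** 2 + b * math.cos(theta) + c


def angles_consistent_with(j_obs, tol=0.25, step=1):
    """List phi values (degrees, -180..179) whose predicted J is within tol of j_obs."""
    return [phi for phi in range(-180, 180, step)
            if abs(karplus_j(phi) - j_obs) <= tol]


def contiguous_ranges(angles, step=1):
    """Collapse a sorted list of angles into (start, end) ranges."""
    ranges = []
    for ang in angles:
        if ranges and ang - ranges[-1][1] == step:
            ranges[-1][1] = ang
        else:
            ranges.append([ang, ang])
    return [tuple(r) for r in ranges]


if __name__ == "__main__":
    for label, phi in (("extended", -120), ("helical", -60)):
        print(f"{label:9s} phi={phi:5d}  J={karplus_j(phi):.1f} Hz")
    # A mid-range coupling is compatible with several separate phi ranges.
    print(contiguous_ranges(angles_consistent_with(7.0)))
```

With these assumed coefficients an extended φ of about -120° gives a coupling near 10 Hz and a helical φ of about -60° gives one near 4 Hz, in line with the thresholds quoted in the text, whereas an intermediate value such as 7 Hz maps onto more than one angular range.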
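The chemical shift index described in the text and in Fig. 12.8 can be sketched as follows. This is a simplified illustration rather than the published procedure: the ±0.1 ppm threshold and the minimum run length are assumptions, the random-coil values must be supplied by the user (for example from Table 12.4), and the numbers in the usage example are invented placeholders, not measured or tabulated data.

```python
def csi_scores(observed, random_coil, threshold=0.1):
    """Score each residue +1, 0, or -1 from its H-alpha shift deviation (ppm).

    observed:     list of (residue_type, h_alpha_shift) tuples, in sequence order
    random_coil:  dict mapping residue_type -> random-coil H-alpha shift
    threshold:    deviation (ppm) needed for a non-zero score (assumed value)
    """
    scores = []
    for res_type, shift in observed:
        delta = shift - random_coil[res_type]
        if delta > threshold:
            scores.append(1)      # downfield of random coil: beta-like
        elif delta < -threshold:
            scores.append(-1)     # upfield of random coil: helix-like
        else:
            scores.append(0)
    return scores


def runs(scores, value, min_length=3):
    """Return (start, end) index pairs for runs of `value` at least min_length long."""
    found, start = [], None
    for i, s in enumerate(scores + [None]):   # sentinel closes a trailing run
        if s == value and start is None:
            start = i
        elif s != value and start is not None:
            if i - start >= min_length:
                found.append((start, i - 1))
            start = None
    return found


if __name__ == "__main__":
    # Placeholder numbers for illustration only.
    coil = {"X": 4.30}
    obs = [("X", 4.75), ("X", 4.60), ("X", 4.55), ("X", 4.32), ("X", 4.00)]
    s = csi_scores(obs, coil)
    print(s, runs(s, 1))
```

Looking for runs rather than single scores mirrors the point made in the text that it is a sequence of consecutive +1 or -1 values, not an isolated deviation, that suggests β or helical structure.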
Table 12.4 ¹H Chemical Shifts for the 20 Common Amino Acid Residues in Random-Coil Peptidesᵃ
Residue NH αH βH Others
Ala
Arg
Asn
Asp
Cys
Gln
Glu
Gly
His
Ile
Leu
Lys
Met
Phe
Pro
Ser
Thr
Trp
Tyr
Val
ᵃThe backbone shifts (αH and NH, ppm) are from Wishart et al. (26). The remaining shifts are from Wüthrich 1986 (23).
an accurate representation of secondary structure, as indicated in the lower panel of Figure 12.8. In the case of MVIIA a triple-stranded β-sheet may be deduced on the basis of the local NOE, coupling, chemical shift, and amide-exchange NMR data.

Once all peaks in the 2D spectra have been assigned, cross peaks in the NOESY spectrum are used to derive a series of interproton distance restraints. These are then used in a simulated annealing algorithm to calculate a family of 3D structures consistent with the input restraints. Fig. 12.9 shows two commonly used methods of representing such NMR-derived structures, either as a stereoview of the superimposed family of structures or as a ribbon diagram, in which elements of secondary structure are highlighted. For the latter representation the lowest energy or average member of the ensemble is often chosen as representative of the structure. It is important, however, to examine the full ensemble to gain a complete understanding of the structure. Regions of disorder in the ensemble can be indicative of a lack of sufficient distance restraints, perhaps attributable to overlap or assignment errors, or may be related to local flexibility.

[Figure 12.8: NOE, coupling, exchange, and CSI data plotted against the MVIIA sequence C K G K G A K C S R L M Y D C C T G S C R S G K C]

Figure 12.8. A summary of the NMR data observed for MVIIA. (a) Hα-NH sequential NOEs. (b) NH-NH sequential NOEs. (c) Hβ-NH sequential NOEs. (f-h) Other short-range NOEs. The thickness of the bar indicates the strength of the observed NOE (weak, medium, or strong). (d) Three-bond NH-Hα coupling data, where upward-pointing arrows indicate a large coupling (>8 Hz) and downward-pointing arrows indicate a small coupling (<5 Hz). (e) H/D exchange data, where a filled circle represents a slow exchanging NH. (i) Chemical shift index (CSI) data. The CSI uses a scoring system that compares Hα shifts to random-coil chemical shifts. A sequence of consecutive +1 scores is indicative of β-structure, whereas a sequence of consecutive -1 scores suggests helical structure. (j) The β-sheet of MVIIA. Double-headed arrows indicate observed NOEs and broken lines indicate proposed H-bonds.

Figure 12.9. (a) A stereoview of the superimposed backbone structures of the 20 lowest energy conformations for MVIIA (27). (b) Ribbon diagram of MVIIA.

In the case of MVIIA the peptide itself is being clinically developed as the active drug for administration through the intrathecal (spinal infusion) route. However, in general, peptides have a range of potential disadvantages as drugs, including poor bioavailability and susceptibility to proteolytic breakdown. Thus, for many cases involving peptide-based leads the structural information of the type described above might be used as a starting point to design smaller constrained peptides or nonpeptidic mimics. This is the case, for example, in the development of endothelin antagonists described below.

2.1.2.2 Endothelin as a Lead in Ligand-Based Design. Endothelin (ET), shown in Fig. 12.10, is a 21-amino acid endothelial-derived constricting factor that has gained prominence as a pharmacological lead molecule. Interest in the peptide arose because of its potent renal, pulmonary, and neuroendocrine activities. Endothelin and its isoforms have been implicated in a wide variety of disease states including ischemia, cerebral vasospasm, stroke, renal failure, hypertension, and heart failure (30). It exerts its pharmacological effect by acting on specific G-protein-coupled receptors. In mammalian species two receptors, ETA and ETB, have been cloned; both are widely distributed in human tissue and are distinguished by different responses to various ET isoforms.

Figure 12.10. (a) Primary sequence and disulfide connectivities of endothelin-1 (ET-1). (b) Primary structure of the cyclic endothelin antagonist BE18257B and (c) a family of 36 NMR structures, which demonstrate the well-defined nature of the cyclic peptide backbone.

The NMR-derived three-dimensional structure of ET-1 consists of several distinct regions, including a random-coil N-terminus, a β-turn over residues 5-8, followed by a short helical region and a flexible C-terminal tail (as summarized in Ref. 31). The presence of the flexible tail in solution is not surprising, as may be imagined from the primary sequence shown in Fig. 12.10. Although solution structures of ET and its analogs (32-46) determined by NMR have been valuable in defining the gross conformation of these molecules, the flexibility of the tail in solution makes it difficult to extrapolate to the bound state. Indeed, an X-ray structure of ET-1 shows a C-terminal tail quite different from the random-coil arrangement seen in solution (46). The bound conformation may be different again.

There is clearly an advantage to having lead molecules with reduced flexibility, given that their solution conformation will intrinsically provide a better reflection of the bound conformation. In addition, the development of a more rigid drug will reduce unfavorable entropic contributions to binding energy. Indeed, a range of small cyclic peptides that are ETA- or ETB-selective antagonists have been discovered and provide valuable leads to the development of potential therapeutics. NMR studies have been instrumental in determining their solution conformations. For example, the rather well defined solution conformation (47) of the ETA-selective antagonist BE18257B (shown in Fig. 12.10) contrasts with the flexibility of the tail region of ET that this peptide is thought to mimic. The discovery and development of these molecules illustrate the principle that cyclic peptides are often more suitable than linear peptides as lead ligands in drug design. In addition to their better-defined and less-flexible conformations than those of their linear counterparts, they generally have improved bioavailability and resistance to protease attack.

We shall return to endothelin as a lead in drug design, in relation to a nonpeptidic antagonist. The underlying theme illustrated by the endothelin example is that ligand-based design often proceeds from initial studies of flexible endogenous molecules (particularly peptides) to constrained mimics (e.g., cyclic peptides) and often culminates in the development of nonpeptidic drug leads. NMR assists by defining the structures of the lead and subsequent molecules.

2.1.3 Instrumental Advances and their Impact on Structure Elucidation. Over the last few years there have been several exciting instrumental developments that promise to dramatically expand the role NMR will play in the drug discovery process. These relate to the combination of NMR with other technologies such as LC and/or MS and the use of NMR to directly monitor reactions carried out on solid-phase resins (8,13,22). The latter promises to indirectly enhance drug discovery programs by improving the monitoring and hence efficiency of solid-phase combinatorial synthesis. Effectively, resin-based syntheses can be monitored at successive stages without the need to cleave intermediate products from the resin.

As already mentioned, the additional sensitivity brought about by cryoprobe technology promises to enhance a wide range of NMR applications, but will be particularly important in natural products-based drug discovery. In many cases only limited amounts of pure compounds are isolated from natural products extracts and sensitivity has been a major limiting factor on structure elucidation. LC/MS/NMR systems will greatly improve the efficiency of such analyses by minimizing the need for separate sample-handling steps for the different analytical technologies.

2.2 Conformational Analysis

Usually only 1D or 2D NMR methods are required to determine the solution conformation of bioactive ligands. Useful tools include analysis of chemical shifts, coupling constants, and NOEs. An assumption inherent in the application of such studies to drug design is that the solution conformation will be maintained on binding to the receptor. This is justified in the case of relatively rigid ligands. However, for potentially flexible ligands the possibility of changes in conformation on binding must be considered, as noted above for the case of endothelin.

Figure 12.11. Illustration of the Karplus relationship between three-bond scalar coupling constants and the dihedral angle of the intervening bond. The relationship is indicated for the φ torsion angle of the H2 and H3 protons within the rigid core of taxol and related derivatives. See Fig. 12.6 for the structure of taxol.

Coupling constants and NOEs are the main NMR parameters used in determining the solution conformations of drug leads. NOEs provide information about through-space proximity. Three-bond vicinal coupling constants are particularly valuable because their dependency on the intervening dihedral angle through the Karplus relationship allows local geometry to be determined. This is illustrated in Fig. 12.11 for taxol. Although there are several vicinal coupling constants in this molecule (Fig. 12.6), only one, ³J(H2,H3), occurs in a region of the molecule that is expected to be conformationally rigid and thus suitable for conformational determination by use of coupling constants. In taxol and a range of analogs this coupling is in the range 4-7 Hz, consistent with partially eclipsed dihedral angles of approximately 120-140° for this ring-constrained structure. This is in good agreement with the X-ray structure of a taxol analog, where the angle is 120°. Note that, in general, such a Karplus analysis does not give a unique solution unless several coupling constants sampling the same dihedral angle are present and is reliant on the assumption that the molecule exists only in a single conformation in solution. Although it is generally believed that
Figure 12.12. Chemical shift changes of the β-protons of Asp14 in MVIIA illustrating the lack of titration of the adjacent carboxyl group, indicating its involvement in a salt bridge. By contrast, the shift of a control random-coil peptide varies with an apparent pKa value of 3.7, as expected for an uncomplexed carboxyl moiety in peptides.
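The titration behaviour summarized in Fig. 12.12 can be modelled, for a single ionizable group in fast exchange, by a Henderson-Hasselbalch curve in which the observed shift is the population-weighted average of the protonated and deprotonated shifts. The sketch below is a minimal illustration under that assumption. The function names, the grid-search fit, and the synthetic data (including the limiting shifts of 2.90 and 2.70 ppm) are hypothetical and are not taken from the study cited.

```python
def shift_at_ph(ph, pka, delta_acid, delta_base):
    """Observed shift (ppm) for one titrating group in fast exchange."""
    frac_base = 1.0 / (1.0 + 10.0 ** (pka - ph))   # fraction deprotonated
    return delta_acid + (delta_base - delta_acid) * frac_base


def fit_pka(ph_values, shifts, delta_acid, delta_base, lo=0.0, hi=14.0, step=0.01):
    """Grid-search the pKa that minimizes the squared error to the data."""
    best_pka, best_err = lo, float("inf")
    n_steps = int(round((hi - lo) / step))
    for k in range(n_steps + 1):
        pka = lo + k * step
        err = sum((shift_at_ph(ph, pka, delta_acid, delta_base) - obs) ** 2
                  for ph, obs in zip(ph_values, shifts))
        if err < best_err:
            best_pka, best_err = pka, err
    return best_pka


if __name__ == "__main__":
    # Synthetic, noise-free data generated with pKa = 3.7 (illustrative only).
    phs = [2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 6.0, 7.0]
    data = [shift_at_ph(p, 3.7, 2.90, 2.70) for p in phs]
    print(round(fit_pka(phs, data, 2.90, 2.70), 2))
```

A fit of this kind is only well determined when the shift actually changes across the sampled pH range. A flat profile, such as that reported for Asp14 between pH 3 and 7, places the pKa outside the range rather than yielding a fitted value.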
this is the case for the core of taxol, recent relaxation data (48) described in Section 2.5 suggest that this conclusion may need to be reexamined.

In addition to studies of the taxol core, there have been a large number of studies of the conformations of the side chains of taxol and it appears that these are certainly flexible and that the molecule may adopt both extended and folded conformations of the side chains. In a case like this the observed vicinal-coupling constants are a weighted average of those from the participating conformers.

2.3 Charge State

An advantage of NMR over other structural techniques such as X-ray crystallography is that it has the potential to provide information not only on structure but also on the electronic properties of molecules. Many drug leads contain ionizable groups and a determination of their charge state in solution and/or at the bound site is important in the design of analogs. Simple plots of chemical shifts as a function of pH for nuclei near these ionizable groups provide a convenient way of determining the pKa value and hence charge state. This is illustrated for ziconotide in Fig. 12.12, where it was suspected that one of the ionizable groups in the molecule, Asp14, may be involved in a stabilizing salt-bridge interaction (28). This was confirmed by noting that the pKa value for this residue is lowered considerably relative to the usual value for Asp. The β-proton chemical shifts were essentially independent of pH over the range 3-7 (indicating a pKa < 3), whereas those of a control, random-coil peptide, titrated as expected over this range.

2.4 Tautomeric Equilibria

Tautomerization is a relatively common feature of drug molecules that is amenable to analysis through the use of chemical shifts or coupling constants as probes. This was recently demonstrated in a study of some nonpeptide endothelin analogs (49). Starting from the modestly active compound (1) (Table 12.5), derived by screening a compound library for ETA antagonists, the nanomolar inhibitor (2) was developed. Further optimization through examination of electronic and structural requirements led to the subnanomolar inhibitor (3), which was subsequently put forward for evaluation in a number of preclinical disease models for stroke.

These molecules display keto-enol tautomerization, as illustrated in the following structures. The open form keto-acid salts and the closed form butenolides exist in a pH-dependent equilibrium in solution, and at physiological pH both forms exist. In principle, the biological activity could reside in either or both forms.

[Structures: closed-form butenolide and open-form keto-acid, interconverted by base (ring opening) and acid (ring closure)]

The extent of tautomerization was established by evaluation of NMR spectra as a function of pH, from 2.65-9.05 (49). At acidic pH, compound (2) exists essentially in the closed butenolide form. As the pH is slowly raised by addition of NaOD, the spectrum begins to exhibit properties associated with the open form keto-acid, and at basic pH the compound is essentially all in the open form. The coupling pattern shown by the benzylic protons is a particularly characteristic marker of the tautomeric process. At acidic pH the benzylic protons exhibit an AB quartet pattern consistent with the ring-closed structure. As the pH is raised this pattern coalesces to a singlet, broad at neutral pH and sharp at basic pH, as would be expected with the open form keto-acid structure. After the pH was basic, addition of DCl to acidify the solution caused
Compound R1
(1) PD012527    Cl      H                             430     27000
(2) PD155080    OCH3    H                             >0.4    4550
(3) PD156707    OCH3    3,4,5-OCH3                    0.3     780
(4)             OCH3    3,5-OCH3, 4-O(CH2),SO3Na      0.38    1600
ᵃFrom Refs. 49 and 50.
the spectrum to return to its original appearance, consistent with a reversible tautomerization process.

Identical biological results were obtained with the salt and closed butenolide form in all pharmacological assays, reflecting equilibration at physiological pH. This made it difficult to identify the biologically active form from these experiments alone, although methylation of the OH group in compounds 1-3 resulted in a loss of activity. Because these analogs cannot tautomerize to form open keto-acids, it seems likely that the open form is responsible for activity.

In addition to its impact on the biologically active form, the tautomeric process has profound implications for formulation of drug candidates, as illustrated in some recent further development work on compound (3) (50). Although it is easy to synthesize and isolate water-soluble salts of the keto-acids, once they are placed in aqueous solution the tautomeric equilibrium determines how much of each form is present. Indeed, if the closed butenolide tautomer is sufficiently water insoluble, it can precipitate out of solution and the equilibrium can drive the complete precipitation of the compound. Although (3) has good oral activity, its intravenous use is limited by the insolubility of the closed-form butenolide tautomer without the use of a specific and complex buffered formulation. Thus in recent work a series of water-soluble butenolides was developed (50) to overcome this limitation for parenteral use. This culminated in the development of (4) (Table 12.5), currently in preclinical evaluation.

This description of the development of (4) provides a good illustration of the fact that the availability of an active molecule is not the end of the drug development pathway, and that formulation considerations can be critical. In this case NMR played a significant role in understanding tautomeric processes that had a direct bearing on solubility and hence formulation.

2.5 Ligand Dynamics: Line-Shape and Relaxation Data

It is increasingly being recognized that the solution molecular dynamics of drugs may have an important role in modulating biological activity (48,51-54). For example, dynamics may influence entropic contributions to the free energy of binding. In general the more flexible a ligand is, the more unfavorable will be the loss in entropy on binding, assuming a relatively rigid bound state of the ligand. However, in some cases flexibility of a ligand may be a positive factor. This applies, for example, if a degree of flexibility is required to allow a ligand access to a buried active site, or if activation of a receptor requires a conformational change mediated by ligand binding (9). Therefore, a knowledge of the flexibility of lead molecules is an important supplement to the structural and electronic information available from NMR.

The two major NMR methods for obtaining information on ligand flexibility are line-shape analysis and relaxation measurements (usually ¹³C or ¹⁵N T1, T2, or heteronuclear NOE measurements). In general terms, the former is sensitive to motions on the milli- to microsecond timescale and the latter to nanosecond timescales. To some extent, structure calculations on peptide-based lead molecules can also give an indication of regions of flexibility from an examination of local regions of disorder among a family of calculated structures. Caution must be exercised because other factors can contribute to disorder, although in many cases there is a connection between disorder in a structural ensemble and molecular flexibility (55). A recent example concerns the solution structures of three isomers of the α-conotoxin GI (56). Attempts to increase structural diversity through the engineering of nonnative disulfide bonds showed that nonnative isomers were not only different in conformation but were also considerably more flexible than the native isomer and had reduced activity.

In an example that illustrates the application of NMR relaxation measurements for studying ligand flexibility, Kessler and colleagues (57) investigated the role of disulfide bonds in the α-amylase inhibitor tendamistat. This small protein contains two disulfide bonds (C11-C27 and C45-C73) and opening of the latter is known to reduce the melting temperature of the protein (i.e., reduce its stability), but in this case does not affect its α-amylase inhibitor function. The latter observation
[Table column headings only; data not recovered. Position | Chemical shift (ppm) | Experimental: T1 (s), NOE | Theoretical, isotropic motion: T1 (s), NOE | Theoretical, two-state internal motion: T1 (s), NOE]

Figure 12.13. Schematic illustrations of motions of the outer ring of thyroxine. The dotted line through the outer ring shows the jump axis about which the ring rotates. (a) Ha is shown in the proximal position and is closer to the viewer than Hb because the torsion angle φ′ is greater than 0°. This conformation corresponds to one of the two states of the two-state jump model and agrees with the "twist" of the outer ring observed in the crystal structure. (b) Rotation about the dotted line through the center of the outer ring moves Ha away from the viewer and brings Hb toward the viewer. This corresponds to the second state of the outer ring in the two-state jump model. (c) Hb is now in the proximal position and closer to the viewer than Ha. (d) Hb is in the proximal position and is now further from the viewer than Ha. Transition from a to b and from c to d involves small amplitude jumps on the nanosecond timescale and is detected by NMR relaxation measurements. Although not illustrated in the figure, the inner ring also exhibits this type of motion. Transitions a to c and b to d result in 180° flips of the outer ring and exchange of the environments of Ha and Hb. This ring flip occurs on a microsecond timescale and is detected by variable temperature line-shape studies. (Reprinted with permission from Ref. 52. Copyright 1996 American Chemical Society.)
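As a rough guide to the timescales probed by line-shape studies such as the ring flip in Fig. 12.13, the exchange rate at coalescence for two equally populated, uncoupled sites separated by Δν Hz is approximately k = πΔν/√2. The sketch below applies that textbook relation. It is an assumption that this simple model is adequate here (the cited study may well have used full line-shape fitting), the factor-of-ten boundaries used to label the regimes are arbitrary, and the numbers in the example are invented.

```python
import math


def coalescence_rate(delta_nu_hz):
    """Exchange rate (s^-1) at coalescence for two equal, uncoupled sites."""
    if delta_nu_hz <= 0:
        raise ValueError("frequency separation must be positive")
    return math.pi * delta_nu_hz / math.sqrt(2.0)


def exchange_regime(rate_s, delta_nu_hz):
    """Classify exchange as 'slow', 'intermediate', or 'fast' on the shift timescale."""
    k_c = coalescence_rate(delta_nu_hz)
    if rate_s < 0.1 * k_c:        # arbitrary boundary, for illustration
        return "slow"
    if rate_s > 10.0 * k_c:       # arbitrary boundary, for illustration
        return "fast"
    return "intermediate"


def averaged_shift(populations, shifts_ppm):
    """Population-weighted shift observed in the fast-exchange limit."""
    total = sum(populations)
    return sum(p * s for p, s in zip(populations, shifts_ppm)) / total


if __name__ == "__main__":
    dnu = 100.0  # Hz; hypothetical separation between the two sites
    print(f"k at coalescence ~ {coalescence_rate(dnu):.0f} s^-1")
    print(exchange_regime(1.0e6, dnu))
    print(averaged_shift([0.5, 0.5], [7.10, 6.90]))
```

Varying the temperature moves the exchange rate through these regimes, which is why variable-temperature spectra are the usual way of accessing motions on this timescale.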
can, in principle, be designed. The NMR approach used in such pharmacophore modeling often involves a combination of many of the techniques already described. By determining information about structure and electronic properties for a range of different ligands, all acting at the same receptor site, it is often possible to infer information about the binding site, even if direct structural studies of this site are not possible.

2.7 Limitations of Analog-Based Design

Although a determination of the structure of bioactive molecules is of key importance, there are distinct limitations on the use of solution structures for drug design. In particular, unless the molecule is rigid there is no certainty that the solution conformation is the same as the bioactive bound conformation. For this reason there has been a shift over recent years to approaches in which information about the bound state is obtained. The other approach has been to probe the bound conformation by making a range of constrained analogs of a flexible lead molecule, as illustrated earlier for endothelin.

The most direct way of determining the conformation of a drug lead is to determine the full three-dimensional structure of its receptor complex. This has now been achieved in a significant number of cases but represents a substantial undertaking, as described in later sections of this chapter. A simpler approach that has also been applied is to use transferred NOE methods, as described below. This approach fits at the interface of ligand-based design and receptor-based design. It fits with the former because no knowledge of the receptor structure is required, but it also fits with the latter because it requires the macromolecule of interest to be included in the mixture to be analyzed. It is appropriate therefore to introduce the topic here but also to discuss it further in Section 3.

2.8 Conformation of Bound Ligands: Transferred NOEs

In ligand-based drug design it is not necessary to know the structure of the receptor, or even the location of the binding site, although the conformation of the ligand bound to the receptor is crucial. It is clearly better if this can be measured directly rather than be inferred from the conformation of the free ligand. In certain circumstances this information on the bound conformation can be obtained from the transferred NOE (TrNOE) technique (67,68). This method takes advantage of the fact that NOEs build up more rapidly in a ligand-macromolecule complex than they do in free ligand, and given appropriate exchange conditions for a mixture of ligand and macromolecule (typically satisfied for KD ≥ [...] M), then signals from a free ligand may be used to determine the bound conformation.

The theory of the technique was reviewed previously (69) and recent developments that minimize potential artifacts from spin diffusion have been described (5). Because it is not necessary to monitor signals from the macromolecule in this technique, it is usually present in substoichiometric amounts, thus requiring only minimal amounts of what is sometimes the more expensive component of ligand-macromolecule complexes. In addition, the molecular weight restrictions inherent in full 3D-structure determinations of complexes are ameliorated and the conformations of ligands bound to very large macromolecules may be determined. For example, the technique was recently used to determine the structure of an antibiotic bound to the ribosome (70). A range of other applications including enzyme-substrate, protein-carbohydrate, and protein-peptide interactions have recently been summarized (5).

In addition to its application as a tool for determining bound conformations of ligands, the TrNOE method has also been used recently as a screening aid for the identification of ligands from mixtures that bind to a protein of interest. This application is addressed in more detail later in this chapter.

3 RECEPTOR-BASED DESIGN

Receptor-based design refers to the process of determining the three-dimensional structure of a macromolecular target and using this information to design ligands to interact with it. In general there have been few cases where the structure of a macromolecule or receptor
alone has been successfully used to design, de novo, a ligand to interact with that receptor. However, such an approach is likely to become more common with improved computer-based approaches to molecular design in the future (71). Currently, the most common approach is to study a ligand-macromolecule complex and to initiate the design process based on the interaction of the lead ligand with the macromolecule.
Although the structure of the macromolecule alone is of less interest than that of the complex, in many cases a determination of the structure of the complex follows from earlier studies on the unbound macromolecule. It is thus useful to describe the approaches to structure determination of macromolecular targets. This is followed by a discussion of the dynamic aspects of protein structures in Section 3.2, before addressing the main topic of macromolecule-ligand interactions in Section 3.3.
3.1 Macromolecular Structure Determination
The two major techniques for determining three-dimensional structures of proteins or nucleic acids are X-ray crystallography and NMR spectroscopy. The crystallographic approach to structure determination is described elsewhere in this volume and here the focus is on NMR. NMR has been used to determine the structures of proteins only for about the last 15 years, with the first NMR structure determination being made in 1985. NMR has a number of advantages over X-ray crystallography: the protein does not need to be crystallized, and the dynamic information available from NMR studies complements the structural information. A major disadvantage of NMR spectroscopy, though, is that it is currently limited to the determination of structures of <35 kDa. With the development of new NMR techniques, such as TROSY (20), this limit seems certain to increase significantly over coming years, although the fact remains that among all structures currently deposited in the protein database the average size of NMR structures is about 8 kDa (72), substantially smaller than the average size of protein structures determined by X-ray crystallography. Despite this limitation, NMR has made major inroads into the macromolecular structure determination process, and currently approximately one-fifth of all new structures deposited in the protein database have been determined by NMR spectroscopy.
3.1.1 Overview of Approach. The basis for structure determination by NMR is that, by determining a large number of distance restraints between pairs of protons, it is possible to reconstruct a three-dimensional image of the molecule. These distance restraints are derived primarily from nuclear Overhauser effect (NOE) measurements, which detect distances up to about 5 Å. Over recent years such distance restraints have been supplemented by a range of other restraints, including dihedral angle restraints derived from coupling constant measurements and orientation restraints derived from residual dipolar couplings. These restraints are input into a simulated annealing algorithm, which is used to calculate a family of structures consistent with the restraints.
NMR is unique in that it can provide detailed and specific information on molecular dynamics in addition to structural information. The use of relaxation time measurements allows the relative mobility of individual atomic positions within a macromolecule to be determined. The dynamic information obtained includes not only the rates or frequencies of internal motions but also their amplitudes. Such amplitudes are often expressed by order parameters. Not surprisingly, it is observed in many cases that the termini of proteins are more flexible than internal regions. More interestingly, NMR has provided a number of examples where internal loops in proteins have been shown to have dynamics that may be associated with their function. A good example of this is HIV protease, where NMR studies have identified reduced order parameters in the flap region of the molecule that may reflect flexibility to allow entry of substrates or inhibitors into the active site.
In summary, a major strength of NMR is that a global picture not only of the structure but also of the dynamics of the macromolecular target is obtained. Further, NMR provides information on ionization states of titratable
groups and other electronic features within macromolecules that may have an impact on ligand binding and function.
3.1.2 Sample Requirements and Assignment Protocols. Structure determination by NMR typically requires 500 µL of a 1-2 mM solution of the protein of interest. It is important that the macromolecule does not aggregate because this causes spectral broadening and may preclude assignment. The sample should preferably be stable in solution over the extended period of time required to collect the range of NMR experiments (73-75) needed for assignment and structure determination. Individual experiments may last from a few hours to several days, with several weeks of data acquisition required for studies of larger proteins.
The particular set of NMR experiments required for NMR structure determination depends on the size of the protein. It suffices to say that for smaller proteins (≤7 kDa) it is usually possible to determine the structures, mainly using 2D NMR, without the need for isotopic labeling, by use of procedures described in detail above for Ziconotide. For proteins in the range 7-14 kDa, 15N-labeling and a combination of 2D/3D NMR experiments is usually sufficient, whereas for larger proteins 13C/15N labeling and 3D or 4D NMR is more or less mandatory. For proteins at the top end of the currently accessible range (25-35 kDa), there are additional advantages associated with partial deuteration of the protein.
3.1.3 Recent Developments. A number of recently developed methods offer the potential for improving the quality of NMR structures and for increasing the size of proteins that can be examined. In particular, the use of residual dipolar couplings and of anisotropic contributions to relaxation provide new kinds of restraints that promise to lead to more accurate NMR structures (74, 76). As already mentioned the TROSY method (20) exploits relaxation phenomena to produce spectra with narrow lines and promises to significantly expand the size of protein targets that can be examined by NMR, from the current limit of about 35 kDa to perhaps >100 kDa.
Another development that is likely to have a significant impact is the increasing number of structural genomics programs being developed. The demands arising from such programs will no doubt stimulate new methods for the large-scale production of labeled proteins (77, 78), and for speeding up the rate of structure determination by both NMR and crystallography.
3.1.4 Dynamics. Proteins exhibit a range of internal motions, from the millisecond to nanosecond timescale, and a full understanding of how small drugs might interact with such a "moving target" requires more than just the time-averaged macromolecular structure. Thus, over recent years much effort has been directed toward defining motions within proteins.
The most commonly applied approach has been to use 13C or 15N relaxation parameters such as T1, T2, and the heteronuclear NOE to derive correlation times for overall motion, together with rates and amplitudes of internal motions (79). Although the precise interpretation of the NMR relaxation data in terms of motional parameters remains dependent on the appropriateness of the motional model chosen, the results from many studies on the dynamics of proteins are sufficiently clear to confirm that nanosecond timescale motions in proteins are common. The functional significance of motions on the nanosecond timescale remains unclear and so far there have been few cases where significant differences in motions on this timescale between ligand-free and ligand-bound forms of proteins have been measured. It will be interesting to assess the functional significance of such motions as more data become available. However, slower motions have been correlated with function in a number of proteins, with a good example being HIV protease, described in more detail in Section 4.2.
Relaxation measurements require a considerable investment of spectrometer time and in some cases it may be possible to derive basic information about molecular dynamics from the structural ensemble alone. Although regions of disorder can reflect factors other than dynamics, a recent analysis (55) suggests that ill-defined regions in structural ensembles often do reflect slow, large-amplitude motions. Even if relaxation measurements are
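The size-dependent choice of experiments described in Section 3.1.2 can be summarized as a small lookup. The following is only an illustrative sketch: the function name and the returned fields are hypothetical, the thresholds are the approximate ranges quoted in the text (7, 14, 25, and 35 kDa), and real projects will blur these boundaries.

```python
def suggest_nmr_strategy(mass_kda: float) -> dict:
    """Map a protein mass (kDa) to the labeling scheme and experiment
    dimensionality suggested by the approximate size ranges in Section 3.1.2.
    The name and return format are illustrative assumptions."""
    if mass_kda <= 0:
        raise ValueError("mass must be positive")
    if mass_kda <= 7:
        # Small proteins: mainly 2D NMR, no isotopic labeling needed.
        return {"labeling": "none", "experiments": "2D", "partial_deuteration": False}
    if mass_kda <= 14:
        # Intermediate size: 15N labeling with 2D/3D experiments.
        return {"labeling": "15N", "experiments": "2D/3D", "partial_deuteration": False}
    if mass_kda <= 35:
        # Larger proteins: 13C/15N labeling with 3D or 4D experiments;
        # partial deuteration helps at the top of the range (25-35 kDa).
        return {
            "labeling": "13C/15N",
            "experiments": "3D/4D",
            "partial_deuteration": mass_kda >= 25,
        }
    # Above the ~35 kDa limit quoted in the text; TROSY-type methods
    # are described there as a possible route to larger targets.
    return {"labeling": "13C/15N", "experiments": "beyond quoted limit",
            "partial_deuteration": True}


if __name__ == "__main__":
    for mass in (5, 10, 20, 30):
        print(mass, suggest_nmr_strategy(mass))
```

A lookup like this is only a first filter; sample stability and aggregation, also discussed above, can matter as much as molecular weight.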
posed above about the complex. Many of the NMR parameters that were described earlier for deriving information about ligands are also applicable to studies of complexes. These include chemical shifts, NOEs, and relaxation parameters. However, the presence of two interacting partners means that there are some differences in the way such parameters are measured and this has led to the development of several techniques that are particularly important for the study of macromolecule-ligand interactions, including chemical-shift mapping, isotope editing, and various NMR titrations. Section 3.2.3 describes these techniques. Finally, illustrative examples of the application of these techniques to specific drug design problems are given in Section 3.2.4.
3.2.2 Influence of Kinetics and NMR Timescales. Macromolecule-ligand interactions are characterized by an equilibrium reaction that potentially has a wide range of affinities and rates:
$$\mathrm{M} + \mathrm{L} \underset{k_{\mathrm{off}}}{\overset{k_{\mathrm{on}}}{\rightleftharpoons}} \mathrm{ML}$$
The rate constant for the forward reaction is referred to as the on rate (k_on), whereas dissociation of the complex is characterized by the reverse rate constant, k_off. The equilibrium constant for this interaction, represented in terms of the dissociation constant of the complex KD, reflects a balance of the on and off rates, as shown in Equation 12.2:
$$K_D = \frac{[\mathrm{M}][\mathrm{L}]}{[\mathrm{ML}]} = \frac{k_{\mathrm{off}}}{k_{\mathrm{on}}} \qquad (12.2)$$
For many protein-ligand interactions k_on is of the order of 10^8 M^-1 s^-1, and is typically quite similar for different ligands. The observation that KD values may vary over a wide range, typically from millimolar to nanomolar (i.e., KD = 10^-3 to 10^-9 M) for cases of interest, is a reflection of a variation in k_off for different ligands. Consideration of the k_on value above and the range of KD values noted suggests a range in k_off from 10^-1 to 10^5 s^-1. The lifetime of the bound complex (τB = 1/k_off) may thus vary from much less than a millisecond to tens of seconds (10^-5 to 10 s based on the above off rates). The exchange rate for the second-order binding process is given by (82):
$$k = k_{\mathrm{off}} \left( 1 + \frac{p_{\mathrm{ML}}}{p_{\mathrm{L}}} \right)$$
where pML and pL are the mole fractions of bound and free ligand, respectively.
The appearance of an NMR spectrum of a protein-ligand complex is dependent on the rate of chemical interchange between free and bound states. In particular, the effects of exchange on an individual NMR parameter (e.g., chemical shift, coupling constant, or relaxation rate) depend on the relative magnitude of the exchange rate and the difference in the NMR parameter between the two states. The cases where the rate of interchange is greater than, about equal to, or less than, the parameter difference are referred to as fast, intermediate, and slow exchange, respectively, as indicated in Table 12.7.
Table 12.7 shows that the changes in chemical shifts on ligand binding (for signals either from the ligand or from the macromolecule) are in general greater than those for coupling constants or relaxation rates. Given that 100 s^-1 might represent a typical exchange rate between free and bound states, it is clear that individual NMR signals may be found in either slow, fast, or intermediate exchange on the chemical-shift timescale, but it is more likely that couplings or relaxation parameters will be in fast exchange. Thus, in most cases where the term "NMR timescale" is used in the literature, it refers to the chemical-shift timescale. The table also emphasizes that there are two types of signals that can be monitored, those from the ligand and those from the macromolecule. In general, the typical magnitudes of changes to chemical shifts or couplings of either type of signal on binding are similar, although the changes to ligand signals may be larger than those from the macromolecule. However, changes to relaxation parameters for signals from ligands are much more likely to be greater than those for protein signals. This reflects the sensitivity of relaxation parameters to molecular mobility: a ligand undergoes a greater relative change in mobility on binding than does a protein, given that the relative increase in molecular weight in the complex is much greater for the ligand than for the protein.
The exchange regime (slow, intermediate, or fast) determines how a spectrum of a protein-ligand mixture changes during a titration, or as a function of temperature. Figure 12.14 schematically illustrates the various exchange regimes for macromolecule-ligand binding interactions. Slow exchange, corresponding to tight binding, is potentially the most useful regime, given that much detailed information on the nature of a complex can be deduced in this case. Nevertheless, fast ex-
Figure 12.14. Schematic illustration of the effects of slow, intermediate, and fast exchange on the appearance of peaks in NMR spectra of macromolecule-ligand complexes. In the slow exchange case separate peaks are seen for free and bound forms. Note the broader peak for the bound ligand because it now adopts the correlation time of the macromolecule. In the fast exchange case only an averaged peak is observed.
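The arithmetic linking KD, the off rate, and the bound-state lifetime in Section 3.2.2 can be checked with a short sketch. It assumes the typical on rate of 10^8 M^-1 s^-1 quoted in the text; the function names, and the factor of 10 used to separate "about equal to" from "greater than" or "less than", are illustrative assumptions rather than values from the chapter. Units are also simplified: the exchange rate and the parameter difference are compared directly as numbers in s^-1 and Hz.

```python
import math

K_ON_TYPICAL = 1e8  # M^-1 s^-1, typical on rate quoted in the text


def off_rate(kd_molar: float, k_on: float = K_ON_TYPICAL) -> float:
    """k_off = KD * k_on, rearranged from KD = k_off / k_on."""
    return kd_molar * k_on


def bound_lifetime(k_off: float) -> float:
    """Lifetime of the bound complex, tau_B = 1 / k_off (seconds)."""
    return 1.0 / k_off


def exchange_regime(exchange_rate: float, parameter_difference: float,
                    factor: float = 10.0) -> str:
    """Classify exchange as fast, intermediate, or slow by comparing the
    exchange rate with the difference in the NMR parameter between states.
    The 'factor' threshold is an assumed, arbitrary cut-off."""
    if parameter_difference == 0:
        raise ValueError("parameter difference must be non-zero")
    ratio = exchange_rate / abs(parameter_difference)
    if ratio > factor:
        return "fast"
    if ratio < 1.0 / factor:
        return "slow"
    return "intermediate"


if __name__ == "__main__":
    for kd in (1e-3, 1e-6, 1e-9):
        k_off = off_rate(kd)
        print(f"KD = {kd:.0e} M, k_off = {k_off:.1e} s^-1, "
              f"tau_B = {bound_lifetime(k_off):.1e} s")
    # A 100 s^-1 exchange rate against small and large parameter differences.
    print(exchange_regime(100.0, 5.0), exchange_regime(100.0, 5000.0))
```

Running the loop reproduces the range stated in the text: millimolar to nanomolar KD values correspond to off rates of 10^5 to 10^-1 s^-1 and bound lifetimes of 10^-5 to 10 s.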
ranging from 0:1 to 2.6:1. Although the spectra are complicated by overlap in some regions, it is clear that addition of the ligand causes significant changes to the DNA peaks. A typical example is seen for the T6 methyl peak, for which addition of ligand causes both an upfield shift and broadening of the peak at certain stages of the titration. The chemical shift moves monotonically with ligand concentration up to a mole ratio of 1:1 and then reaches a plateau, remaining constant as larger amounts of ligand are added. Broadening of the peak reaches a maximum at a ligand:DNA mole ratio of approximately 0.3. Both observations are consistent with there being moderately fast exchange on the chemical-shift timescale between the free and ligand-bound forms of the DNA in solution. In this case, the observed spectral peaks reflect neither the free nor the bound form of DNA, but are averaged signals.
Ligand peaks are also in fast exchange, as seen with the L(NH2) methyl peak, which first appears at a ligand:DNA ratio of 1.36:1 as a shoulder on the overlapped T3 and T7 methyl peaks at approximately 1.27 ppm. This peak is not initially visible in spectra at low ligand:DNA mole ratios because of the small population of bound species and the overlapping DNA peaks. It moves upfield with increasing ligand concentration and, again, represents an averaged peak intermediate in chemical shift between free and bound forms, reflecting fast-exchange kinetics. Eventually, the chemical shift of this signal approaches that of the free ligand at 1.1 ppm, measured in a separate experiment with a solution of ligand alone.
In the fast-exchange cases such as this it is possible to obtain an estimate of the dissociation constant for the complex (KD) and the bound chemical shift (νB) of DNA resonances by fitting the observed chemical shift changes as a function of ligand concentration to Equation 12.7 (85). The parameters that best fit the experimental data for the T6 methyl peak were KD = 1.2 x M and (νF - νB) = 46 Hz. Limitations on the accuracy of KD values derived in this way were described previously (85).
To further define the thermodynamic constants associated with binding, the linewidth data were also quantitatively examined by use of Equations 12.6 and 12.9. In the case of moderately fast exchange, a maximum linewidth is predicted at a ligand:DNA mole ratio of 0.33 (82, 85), and this was indeed observed in the current case. Derived binding parameters were KD ≤ 1.0 x M, k_off = 250 s^-1, (νF - νB) = 49 Hz, and LWB = 12 Hz, consistent with the values derived from the analysis of chemical shifts. Subsequent studies with the related ligand L(NO2) showed similar binding to L(NH2). However, a third ligand, L(Gly), was found to bind somewhat more tightly, with some signals in the intermediate exchange regime.
3.2.2.3 Intermediate Exchange. In this regime the rate of exchange between bound and free states is comparable to the differences in NMR parameters associated with the exchange. In general the spectral peaks often become very broad and analysis is difficult. This is the case, for example, for L(Gly). In the methyl region of the spectra shown in Fig.
Figure 12.16. Expanded regions from 300-MHz 1H-NMR spectra for complexes between L(NH2) and d(GGTAATTACC)2, recorded at 10°C. The two small peaks at 1.12 and 1.14 ppm arise from an impurity. Increasing ligand concentration causes an upfield shift of the T6 methyl resonance (a), and causes the T7 and T3 resonances to become overlapped at later stages of the titration (c). Peak (b) is an averaged resonance from the ligand methyl groups intermediate in shift between the bound and free forms of the ligand. (Reprinted with permission from Ref. 86.)
12.17, the T7 CH3 signal moves upfield and the T3 CH3 signal moves slightly downfield with increasing ligand concentration, as seen previously for L(NH2) and L(NO2). However, in contrast to the case for the other ligands, the characteristic broadening of peaks at intermediate ratios is non-Lorentzian, suggesting kinetics in the intermediate exchange regime. The T6 CH3 peak does not shift in the characteristic fast-exchange manner but, instead, a new broad resonance appears close to the expected position of the bound T6 CH3 chemical shift on the first addition of ligand, and increases in intensity with increasing ligand concentration. This observation is consistent with the ligand being in slow to intermediate exchange between the free and bound forms, with k_off ≈ (νF - νB). Based on the magnitude of νF - νB for this resonance, k_off for L(Gly) is estimated to be 50 s^-1, which is significantly slower than that for L(NO2) and L(NH2).
At a ligand:DNA ratio of approximately 1:1, the ratio of the integrals of the T6 methyl peak and the overlapped T3 and T7 methyl peaks is about 1:6. The expected value is 1:2, which indicates that the bound ligand methyl peak (4 x CH3) is overlapped with the T7 and T3 methyl peaks, as observed with L(NH2) and L(NO2). When the ligand:DNA ratio is increased beyond a 1:1 ratio, a new peak appears at about 1.15 ppm and increases in intensity as
Figure 12.17. Expansions from the 600-MHz 1H-NMR spectra for complexes formed between L(Gly) and d(GGTAATTACC)2, showing the methyl resonances. The two small peaks at 1.12 and 1.14 ppm are attributed to an impurity. The complex nature of the T6 methyl resonance at ligand:DNA ratios less than 1:1 (a), and the manner in which signal intensity increases at about 1.15 ppm at ligand:DNA ratios greater than 1:1 (b), are indicative of intermediate exchange. (Reprinted with permission from Ref. 86.)
the ligand concentration is increased. This new peak corresponds to the methyl peak of the free ligand and its appearance in this manner is consistent with slow exchange on the chemical-shift timescale. To confirm this, spectra of a 2:1 mixture of L(Gly) and d(GGTAATTACC)2 were acquired at different temperatures (86), as illustrated in Fig. 12.18.
At low temperatures, signals at 1.15 and 1.30 ppm (overlapped with the T7 CH3 and T3 CH3 peaks) attributable to the methyl groups from the free ligand and bound ligand, respectively, are distinguishable. As the temperature is increased, a broad peak appears between these two signals (at ~1.22 ppm). At the lower temperatures k_off ≤ (νF - νB), so that the methyl resonances of the ligand have complex characteristics reflecting slow-intermediate exchange. At higher temperatures, k_off ≥ (νF - νB), so the signal appears as a fast-exchanged average between the free and bound resonances. From a qualitative analysis of the spectra, k_off for L(Gly) was estimated to be 50-60 s^-1 at 283 K.
The fact that some peaks (e.g., the oligonucleotide T7 and T3 methyl signals) exhibit fast exchange, whereas others in the same spectrum of the same complex exhibit slow-intermediate characteristics, is a reflection of the different (νF - νB) values for different peaks. This emphasizes the point made earlier that the "exchange regime" is a relative expression and depends not only on the rate of exchange, but also the size of the chemical shift differences involved. In summary, the observations suggest that k_off for binding of L(Gly) to the oligonucleotide duplex is much slower than that for the other two derivatives. This provides an illustration of the value of NMR as a quick method for comparing the binding of different ligands and for confirming ligand-binding hypotheses.
The change in binding kinetics may be rationalized by considering the different structure of the L(Gly) ligand relative to the other ligands. It was anticipated that, upon binding to the minor groove, the terephthalamides would adopt a conformation in which the substituent on the central ring would form part of the convex edge of the ligands and therefore be directed toward the "mouth" of the groove. Given this binding arrangement, the ligand L(Gly) would have a positively charged alkylamine group positioned to interact with the negatively charged phosphate groups of the DNA backbone. The L(Gly) derivative also has a bulkier substituent than that of the other ligands and this is also consistent with some differences in its binding.
3.2.3 NMR Techniques. NMR is a particularly versatile tool for the analysis of protein-ligand interactions. As well as being able to observe different nuclei, measurements may be made of a range of different NMR parameters, including chemical shifts, linewidths, coupling constants, and relaxation parameters. In addition, there are several specific NMR techniques that have been applied for the measurement of these parameters. The techniques that are particularly valuable for the study of macromolecule-ligand interactions are described in the following sections.
3.2.3.1 Chemical-Shift Mapping. Chemical shifts are exquisitely sensitive markers of the local charge state and environment. Although it is not possible to construct an accurate model of a binding site from a knowledge of the chemical shifts of a bound ligand, a qualitative interpretation of changes in chemical shifts of the macromolecule on binding provides significant insight into the location of the binding site. Traditionally, such studies were done using 1D NMR but are now increasingly done by 2D HSQC spectra. By simultaneously obtaining information on chemical shifts for a large number of sites in a macromolecule and seeing which ones change when a ligand binds, and which ones do not, it is possible to deduce the location of the binding site. This procedure is referred to as chemical-shift mapping. A prerequisite of the approach
Figure 12.18. Expansions of 1H-NMR spectra of a 2:1 mixture of L(Gly) and d(GGTAATTACC)2, acquired at different temperatures. (Reprinted with permission from Ref. 86.)
Figure 12.19. Chemical-shift perturbations of DNA protons upon ligand binding. The lighter and darker columns represent shifts attributed to L(NO2) and L(NH2) derivatives, respectively.
is that the chemical shifts have been assigned. Chemical-shift mapping by use of HSQC spectra is widely used in NMR screening approaches and we will defer a more detailed discussion on it until Section 4.
The relative simplicity of how chemical shift information localizes binding sites may be illustrated by continuing with the example introduced above of terephthalamide binding to DNA. Figure 12.19 shows that, upon binding of the terephthalamides L(NH2) and L(NO2) to d(GGTAATTACC)2, the DNA protons on the four base pairs between A5 and A8 are perturbed to a much larger degree than protons in the rest of the sequence. It is thus likely that these four residues form the binding site.
A more detailed analysis allows the binding site to be further localized to the minor, rather than to the major, groove in the region of these bases. A4, A5, and A8 are the only residues containing easily detectable minor groove protons (H2). These resonances, which originate from the floor of the minor groove, are shifted downfield with ligand binding, whereas most other resonances are shifted upfield. This observation is consistent with the ligands binding in the minor groove and has been reported for other minor groove binders such as Hoechst 33258 (87) and SN-6999 (88, 89), where adenine H2 protons on the floor of the groove experience deshielding ring current effects. However, significant chemical shift changes were also observed for some major groove protons. This illustrates the general point that sometimes allosteric effects can cause changes at sites not directly involved in binding. In the case of DNA, binding perturbations in the major groove have also been observed for other established minor groove binders such as distamycin (90), netropsin (91), and Hoechst 33258 (92). Based on NOE and crystallographic data, it was concluded that the effects were caused by distortions of the B-DNA duplex, including changes in the "base roll" of residues within the binding site, upon complexation. Electronic effects arising from the close proximity of charged groups on the ligand to neighboring nucleotides were also found to perturb major groove resonances.
In the case of the terephthalamides a comparison of the minor and major groove perturbations for a particular residue shows that the minor groove protons are affected to a much greater extent. This is particularly evident for A8, where the H2 proton shifts by approximately 0.25 ppm and the H8 proton is not affected (Fig. 12.19). It is difficult to conceive of a binding mode in the major groove that would account for such a large effect on the minor groove A8 H2 resonance without a simultaneous effect on the major groove protons of T7 and A8. The observed 1:1 stoichiometry of the
complex excludes the possibility that the ligand binds to the major and minor groove at the same time. It is more likely, therefore, that binding in the minor groove causes distortion of the DNA structure so that perturbations are observed for the major groove protons of A5 and T6, but not neighboring nucleotides.
Other examples of the use of chemical-shift mapping to locate binding sites have been reported for ligands binding to a range of drug targets, including immunophilins, matrix metalloproteases, and DHFR. Some of these examples are described in more detail in Section 3.2.5.
3.2.3.2 NMR Titrations. There are a number of advantages in undertaking a titration of ligand against macromolecule or vice versa rather than just examining the final complex. These include introducing the possibility of distinguishing signals from the individual components on the basis of intensities at intermediate stages of the titration in the slow-exchange case and obtaining kinetic and thermodynamic parameters associated with the interaction in the fast-exchange case. Such titrations may be done using either 1D or 2D spectra and are very useful for establishing the exchange regime of the complex, as described in Section 2.1. A variety of parameters may be monitored in the titration, although the two most common are chemical shifts and linewidths. Examples of such titrations are given in Figs. 12.16 and 12.17.
3.2.3.3 Isotope Editing and Filtering. Isotope editing provides a powerful way of distinguishing between the components in a complex without the need for a titration. It is one of the most useful tools for the study of macromolecule-ligand complexes, and indeed the background NMR technology that underpins isotope editing was developed specifically for the study of complexes. The principle of the approach is illustrated in Fig. 12.20 and is based on the use of isotopes to select for signals from either the ligand or macromolecule, or signals exclusively linking both of them.
The conceptually simplest approach is to uniformly deuterate the macromolecule, thereby removing its signals from 1H-detected NMR spectra, and allowing signals from only
ple, the bound conformation of a ligand to be determined from NOESY data recorded in D2O. By rerunning the spectrum in H2O, additional NOEs to exchangeable amide protons on the protein may be detected, thereby providing information on contacts between ligand and protein. Alternatively, 15N or 13C signals may be introduced selectively into either the ligand or protein and editing techniques used to select only signals attached to these labels and their proximate protons. This was used in the first example of an isotope-edited study, in this case to examine the binding of a 15N-labeled peptide-based inhibitor to pepsin (93).
Potentially, the most useful approach involves uniform labeling of one of the components with either 15N or 13C and leaving the other component unlabeled. It is then possible to edit the spectrum by selecting for interactions (either through bond or through space) that connect protons that are both one-bond coupled to 15N or 13C. Alternatively, the spectrum may be filtered to specifically remove such signals, thereby selecting only signals involving protons coupled to 14N or 12C (i.e., on the unlabeled component). It is generally easier to uniformly label the protein rather than the ligand, and editing methods are highly efficient, thus making it easy to visualize just the protein. However, because ligand signals are often of interest, filtering experiments play a valuable role in visualizing them. Unfortunately, filtering experiments are more susceptible to artifacts than are editing experiments, although there have been recent advances in reducing artifacts (94).
Another possibility is to use half-edited/half-filtered 2D experiments to detect NOEs that specifically involve interactions between protons attached to 15N or 13C and those that are not. This approach is used, for example, to detect intermolecular NOEs between a labeled protein and an unlabeled ligand. Examples of isotope editing/filtering are given in Section 3.2.4.
3.2.3.4 NOE Docking. In many cases the study of a complex may follow a previous structure determination of the isolated macromolecule and in that case it may be possible to determine much information about a complex
Figure 12.20. Isotope editing and filtering can be used to select signals from either the ligand or the protein. (a) Normal protein and ligand with no filtering or editing. (b) Selection of the ligand signals by 2H labeling of the protein. (c) Selection of protein or ligand signals by 13C and/or 15N labeling/editing. (d) Removal of protein or ligand signals by 13C or 15N filtering.
Gradwell and Feeney (95) recently analyzed factors important in such NOE docking experiments. In their analysis, a high resolution X-ray structure of a protein-ligand complex was used to simulate loose distance restraints of varying degrees of quality that might typically be estimated from experimental NOE intensities. These simulated data were used to examine the effect of the number, distribution, and representation of the experimental constraints on the precision and accuracy of the calculated structures. A standard simulated annealing protocol was used, as well as a more novel method based on rigid-body dynamics. The results showed some parallels with those from similar studies on complete protein NMR structure determinations, but it was found that more constraints per torsion angle are required to define docked structures of similar quality. This is because the conformation and orientation of the ligand are defined only by NOEs and not by covalent attachment, as is the case for amino acid side chains in a protein structure. The effectiveness of different NOE-constraint averaging methods was explored and the benefits of using "R⁻⁶ averaging" rather than "center averaging" with small sets of NOE constraints were demonstrated. With these considerations in mind it appears that NOE docking can be a very cost-efficient procedure for defining the environment, orientation, and conformation of ligands.
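The difference between the two averaging schemes mentioned above can be made concrete with a small numerical sketch. It assumes the usual definitions, which the text does not spell out: R⁻⁶ averaging takes the effective distance as the inverse sixth root of the mean of r⁻⁶ over a group of equivalent protons, whereas center averaging measures the distance to the geometric center of the group. The function names and coordinates below are hypothetical and are not taken from Ref. 95.

```python
import math


def r6_average(distances):
    """Effective distance <r^-6>^(-1/6) over a group of equivalent protons."""
    mean_inv6 = sum(d ** -6 for d in distances) / len(distances)
    return mean_inv6 ** (-1.0 / 6.0)


def center_average(target, group):
    """Distance from `target` to the geometric center of `group` (xyz tuples)."""
    n = len(group)
    center = tuple(sum(p[i] for p in group) / n for i in range(3))
    return math.dist(target, center)


# Hypothetical geometry (angstroms): one ligand proton and a group of two
# equivalent protein protons at different distances from it.
ligand_h = (0.0, 0.0, 0.0)
protein_group = [(2.0, 0.0, 0.0), (4.0, 0.0, 0.0)]

distances = [math.dist(ligand_h, p) for p in protein_group]
print(f"R^-6 averaged distance:   {r6_average(distances):.2f} A")
print(f"center averaged distance: {center_average(ligand_h, protein_group):.2f} A")
```

In this toy geometry the R⁻⁶ value (about 2.24 Å) is dominated by the closer proton, whereas the center-averaged value is 3.00 Å, which illustrates why the choice of averaging can matter when only a few constraints are available.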
3.2.4 Selected Examples. Applications of the various NMR techniques described are now illustrated with selected examples. The examples have been chosen to give a broad perspective on the types of NMR experiments that can be done and the types of information they provide. Specifically, the first example covers the case of drug-nucleic acid binding and focuses on more traditional NMR experiments, involving relatively standard homonuclear methods. The second example covers binding of moderately large ligands to immunophilins and highlights modern isotope editing techniques. The third example, covering ligand binding to a matrix metalloproteinase, also highlights the importance of these techniques and shows how relatively simple spectra involving 19F-containing ligands can be very informative. The fourth example describes ligand binding to DHFR, one of the most extensively studied systems by NMR, and illustrates the derivation of a broad range of kinetic and geometric information on intermolecular complexes. The final example, on HIV protease, describes how NMR complements X-ray studies and provides information on dynamic motions within complexes.
3.2.4.1 DNA-Binding Drugs. The NMR approaches that have been used to examine the interactions of minor groove binding drugs with DNA can be illustrated with studies on the bisbenzimidazole-based compound, Hoechst 33258 (9). It has been used widely as a fluorescent cytological DNA stain and is also active as an anthelmintic agent. It has activity against intraperitoneally implanted L1210 and P388 leukemias in mice (96).
Footprinting studies (96) have shown that sequences of four AT base pairs are a prerequisite for strong binding to DNA, consistent with similar observations for other structurally related molecules such as distamycin and netropsin (87, 90, 91, 97-99). The first structural studies of Hoechst 33258 complexed to short sequences of synthetic oligonucleotides were done using X-ray crystallographic methods (100-102). NMR and further X-ray studies followed (92, 103-107). Three of the X-ray studies (100, 101, 103) used the EcoRI sequence d(CGCGAATTCGCG)2, and another (102) used the sequence d(CGCGATATCGCG)2. Both sequences fulfil the requirement of at least four consecutive AT base pairs, and the resulting complexes showed similar modes of binding. In all of the X-ray studies, the Hoechst ligand was found to bind to the minor groove.
The NMR studies of complexes between Hoechst 33258 and oligonucleotide sequences provided complementary information to the crystal structure data (92, 103-106). Because the binding is reversible, the NMR data offer the opportunity to derive information about the kinetics of the interaction. As with the crystallographic studies, the oligonucleotide sequences were designed to contain runs of AT base pairs. Some NMR studies were performed with dodecanucleotide sequences used in crystallographic studies, including d(CGCGAATTCGCG)2, which allowed a direct comparison with the crystallographic data. Experiments were also performed with sequences specifically designed to investigate different aspects of the interaction. The sequence d(CTTTTGCAAAAG)2 was designed to offer two binding sites, and it was shown that two Hoechst molecules interacted with the DNA duplex in symmetry-related orientations at the 5'-TTTT-3' and 5'-AAAA-3' sites (92).
NMR and Drug Discovery
[Figure 12.21 graphic; panels (a) and (b) show the duplex 5'-GGTAATTACC-3' / 3'-CCATTAATGG-5', with strands numbered 1-10 and 10-1 in (a) and 1-10 and 20-11 in (b); x-axis: chemical shift (ppm), ticks at 1.4 and 1.0]
Figure 12.21. 1D 1H-NMR spectra (recorded at 20°C) illustrating the thymine methyl region for the symmetrical ligand-free duplex (a) and for the 1:1 Hoechst:d(GGTAATTACC)2 complex (b), which is no longer symmetrical because of the ligand binding. x corresponds to a small impurity peak. The DNA strands are numbered to the right of the spectra and the approximate location of the ligand is indicated by a black bar. (Reprinted with permission from Ref. 105. Copyright 1993, Blackwell Publishing Science.)
3.2.4.1.1 Stoichiometry and Kinetics. The starting point in studies of ligand-DNA complexes is usually a titration experiment to establish the nature and stoichiometry of the complex. Complexes between the ligand and DNA duplex are obtained by adding small aliquots of ligand solution to a sample of the DNA duplex, with one-dimensional 1H NMR spectra acquired after each addition. The effects observed on the NMR spectrum after each addition reveal whether an interaction is taking place and allow the interaction to be characterized as fast or slow exchange on the NMR timescale. The stoichiometry of the interaction can also be determined from the titration.
In general, the addition of Hoechst 33258 to the oligonucleotide duplexes causes a decrease in the intensity of free DNA resonances and a concomitant increase in the intensity of new resonances, which appear in previously unoccupied spectral regions. This is consistent with the free and bound forms of the DNA duplex being in slow exchange with each other. For example, when Hoechst 33258 is added to d(GGTAATTACC)2, the free DNA signals completely disappear at a DNA:drug ratio of 1:1, and the number of new resonances is twice the number of previously observed free DNA resonances (Fig. 12.21). This is a common feature of complexes with 1:1 stoichiometry and reflects a loss of the dyad symmetry of the duplex attributed to ligand binding.
Upon addition of Hoechst 33258 to d(CTTTTCGAAAAG)2, the free DNA signals completely disappeared at a ratio of 2:1 drug:DNA and there was no doubling of the number of DNA resonances in the spectrum (92). From this, it could be concluded that two molecules were bound per duplex in a manner that retained the dyad symmetry of the DNA duplex. The binding was also determined to be cooperative, in that no intermediate 1:1 complex was detected (92). The formation of a 1:1 complex would have resulted in a very complicated spectrum at intermediate ligand:DNA ratios, given that resonances arising from the free
3 Receptor-Based Design
University Press.)
DNA, the 1:1 complex and the 2:1 complex, would have produced four times as many observable peaks relative to the free DNA species. At intermediate ligand concentration, however, only two sets of peaks arising from DNA molecules were detected. In the 2:1 complex only four thymine methyl resonances were detected (1.0-1.5 ppm), as expected for a symmetrical DNA duplex. These are all overlapped in the free DNA spectrum. In the 1:1 mixture, only signals from free DNA and from the 2:1 complex were detected.
The reversible nature of the Hoechst:DNA interaction is illustrated by the observation of chemical-exchange cross peaks in NOESY spectra of mixtures of free and complexed oligonucleotides (92, 104). This may be seen in the NOESY spectrum of a mixture of free and complexed d(CTTTTCGAAAAG)2, shown in Fig. 12.22, in which many chemical exchange cross peaks are observed between resonances arising from the free and bound oligonucleotide. In a NOESY spectrum acquired at lower temperature, the intensity of these chemical-exchange cross peaks is significantly reduced, indicating that the exchange is slowed at lower temperatures. The exchange rate was estimated to be <10 s⁻¹ at 10°C (92).
The ability to observe such dynamic exchange phenomena is one of the strengths of NMR relative to X-ray crystallography and several examples of these phenomena are described later in the chapter.
3.2.4.1.2 Binding Site. A combination of chemical shift and NOE information can be used to locate and characterize binding sites. Chemical-shift differences between resonances arising from free and bound forms of DNA are indicative of the nature of the interaction. In all studies of the Hoechst complexes described above (92, 104-107) significant changes to the chemical shifts of thymine H1' protons and adenine H2 protons were observed, in contrast to the generally small perturbations observed for the base H8/H6 and CH3 resonances located in the major groove. Perturbations of this nature are consistent with binding to the minor groove. In some instances, significant perturbations were observed to major groove protons located well
methylpiperazine moieties were found to point toward the center of the duplex, as indicated by NOEs between the protons from the piperazine ring and the 5'-terminus of the adenine tract (Fig. 12.24). Corresponding NOEs were also observed between the drug phenolic protons and the 5'-terminus of the thymine tract, as well as the 3'-terminus of the adenine tract of the complementary strand. This model did not indicate any interaction with the central GC base pairs (92).
The orientation of the ligand was similarly determined in the 1:1 complexes based on intermolecular NOEs between protons located at the extremities of the Hoechst molecule and protons of the binding site. For example, in the interaction with d(GTGGAATTCCAC)2, Fede et al. (106) reported NOEs between protons from the piperazine moiety and the H2 and H1' protons of the dinucleotide fragment d(A5T5).d(A6T6).
3.2.4.1.3 Dynamic Processes. The binding of the Hoechst molecule to the self-complementary oligonucleotide duplexes in a 1:1 ratio lifts the dyad symmetry of the duplexes so that two sets of DNA resonances are observed. This indicates that the drug is in slow exchange between the free and the bound forms. Close examination of the 2D NOE data, however, reveals the presence of chemical-exchange cross peaks between symmetry-related protons on opposite sides of the dyad axis of the DNA duplex. The mechanism by which this occurs has been described as dissociation of the Hoechst molecule from the duplex, followed by a 180° reorientation and rebinding (105, 106). The self-complementary nature of the sequences ensures that the same complex is formed for either ligand orientation but with the net effect of interchanging the two strands with respect to the orientation of the Hoechst molecule. The rate at which this process occurs was estimated using cross-peak intensities in the NOESY spectrum (106). When interacting with d(GGTAATTACC)2 and d(GTGGAATTCCAC)2, the lifetime of the complex in each state (1/k) was reported to be approximately 0.8 and 0.45 s, respectively (105, 106). These values indicate a small but significant difference in the affinity of Hoechst for TAATTA and GAATTC sites.
Intramolecular dynamic processes that are fast on the NMR timescale are also observable in the 1H-NMR spectrum of the bound Hoechst molecule. Resonance averaging is observed for the H2/H6 and H3/H5 protons of the phenol group, which is consistent with the environments on either side of the ring being averaged by rapid ring-flipping motions about the C4-C2' axis. This occurs despite the apparent tight fit between the phenyl ring and the walls of the minor groove, which, in a static model of the complex, must present a large barrier for rotation. It was estimated (105) that the rate for this process is as high as 1000 s⁻¹. This is much higher than the rate of interconversion between free and bound forms of the duplex; thus, dissociation of the drug from the complex cannot be the rate-limiting factor for phenol ring flipping. Dynamic fluctuations of the DNA conformation are more likely to provide the rate-limiting step.
3.2.4.1.4 Summary of Solution Studies. The data obtained from these NMR studies are consistent with the bound ligand fitting tightly within the minor groove of AT tetramers, with the aromatic rings of the ligand being roughly coplanar. The AT tract provides the key recognition features required for binding, including the narrowness of the minor groove. The importance of van der Waals interactions is evident, given the large number of NOE contacts between the ligand and the walls and floor of the groove. Hydrogen bonding also plays a significant role in stabilizing the interaction, as do electrostatic interactions between the positively charged piperazine ring and the minor groove. Electrostatic interactions are also likely to play a significant role in orienting the ligand within the binding site, as shown in the 2:1 complex, where the piperazine rings point toward the center of the duplex where the positive charge is best stabilized (92). The information derived from these studies, as well as from NMR studies of the interactions of other minor groove binders with DNA, is useful for the design of ligands with altered specificity or increased binding affinity, with the overall goal being the development of novel drugs.
3.2.4.2 Immunophilins: Studies of FK506 Analog Binding to FKBP. Some of the most detailed investigations of the interaction between ligands and their target proteins have been made for the immunophilin class of proteins. The major FK506 binding protein (FKBP) has a molecular mass of about 11.8 kDa, whereas cyclophilin (Cp) has a mass of about 17 kDa. These proteins are unrelated in amino acid sequence but both have peptidyl-prolyl cis-trans activities that are inhibited by immunosuppressants that block signal transduction pathways leading to T-lymphocyte activation. FK506 (10) and rapamycin (11) inhibit the cis-trans isomerase activity of FKBP, whereas cyclosporin A (structure shown in Fig. 12.3) inhibits that of Cp. NMR has contributed significantly to the understanding of binding interactions to both proteins.
[Structures: (10) FK506, R = CH2CHCH2; (12) Ascomycin, R = CH2CH3; (11) Rapamycin]
Initial studies on FK506 focused on the structure of the free ligand to aid in the design of further analogs (108-110). However, it was established from studies of the cyclosporin A-
cyclophilin complex that the conformation of a molecule bound to its target site may be very different from that in the free state (111-113). In addition, analog design is assisted by knowing the location of the binding region of the ligand. Studies were therefore undertaken to determine the bound state of the ligand as well as to identify those portions of the drug interacting with the binding protein.
The first investigations involved the analysis of 13C carbonyl chemical shifts of C8 and C9 and the 1H chemical shifts of the piperidine ring of FK506 bound to FKBP (114, 115). The upfield shifts of the piperidine ring protons, as well as NOEs observed between these protons and aromatic protons of FKBP, suggested that the bound site on FKBP resided in an aromatic-rich domain, and allowed a putative binding site on FKBP to be proposed. It was also evident that the pipecolinyl functionality of FK506 and analogs was involved in the binding face of the ligand.
In another study (116), a uniformly 13C-labeled ascomycin (12) was prepared, allowing the bound conformation of ascomycin to be determined in the presence of FKBP. The enhanced 13C signals were used to edit the 1H NOESY spectra used for the structural analysis. Not only were the assignments of side-chain methyls made possible by the 13C enrichment, but ligand resonances could be distinguished readily from those of the protein. The conformation of the ligand was determined from NOEs observed in a 3D HMQC-NOESY spectrum. The resulting ascomycin structure (Fig. 12.25) differed considerably from that of the uncomplexed FK506 obtained by X-ray crystallography, but was similar to that of rapamycin. In particular, the bound ascomycin displayed a trans orientation of the 7,8-amide bond, whereas this bond is cis in free FK506 and trans in rapamycin. The backbone structure of the macrocyclic ring differed from that of uncomplexed FK506, but
showed a similarity in the piperidine ring region to that of rapamycin. This study also showed that both the piperidine ring and the pyranoyl moiety of ascomycin are involved in the binding interface in the complex with FKBP. Ligand protons that show NOEs to the protein are in bold in Fig. 12.25. X-ray studies since have confirmed these results for both the FK506-FKBP and rapamycin-FKBP complexes, showing the trans orientation of the ligand amide bond in the bound conformation, and verifying the involvement of the piperidine and pyranoyl regions of the ligand in the binding interface (117).
The binding site of the FKBP complex has also been investigated through use of NMR spectroscopy. Michnick et al. (118) and Moore et al. (119) solved the structure of uncomplexed FKBP by use of 1H-NMR methods. Although spectral overlap did not allow every structural constraint present to be identified unambiguously, convergent structures defining the global fold of the 107-residue FKBP protein were obtained. Previous biochemical data allowed the extensive aromatic cluster within the core of the structure to be identified as the ligand-binding pocket. The loop regions of the protein between residues 37-43 and 83-90, situated at the open end of the binding pocket, were also of interest. The loops were the least well defined regions of FKBP and were thought to be flexible, and perhaps involved in the binding interaction. Examination of 1H and 15N chemical-shift changes on addition of ligand supported this notion and suggested that significant structural changes in these loop regions occurred upon ligand binding (118).
In a later study, a high resolution structure of the complete ascomycin-FKBP complex was calculated by heteronuclear 3D and 4D NMR by Meadows et al. (120). Uniformly labeled [15N]FKBP and [13C,15N]FKBP were prepared and incubated with unlabeled ascomycin to form the complexes. Three-dimensional NOESY-HSQC spectra, resolved according to 15N shifts, were used to obtain the NH-NH NOEs within FKBP. CH-NH NOEs were derived from a 4D [13C,1H,15N,1H]-NOESY spectrum of the doubly labeled material in H2O and CH-CH NOEs from the same experiment repeated in D2O. Hydrogen bond constraints were obtained by the identification of slowly exchanging amide protons from a series of HSQC spectra acquired over several days. Torsional angle constraints were obtained from coupling constants measured in a 2D HMQC-J spectrum of [U-15N]FKBP/ascomycin. In all, 1958 distance constraints were applied to the structure calculation, with the extra resolution afforded by isotopic labeling, as compared with the 590 and 1047 restraints used in earlier homonuclear studies (118, 119). Restraints defining the structure of bound ascomycin were obtained from the previously reported data of Petros et al. (116) and, along with the intermolecular NOE-derived distance constraints also reported in their study, the complete ascomycin/FKBP solution structure was calculated.
The extra detail afforded by the multidimensional NMR approach allowed the ligand-protein contact area to be located unambiguously and even specific intermolecular hydrogen bonds identified. The structure of the complexed FKBP was essentially similar to that of the uncomplexed structure, except that the "ill-defined" loop regions between residues 36-45 and 78-92 were found to adopt well-defined conformations in the complexed proteins, as preempted by previous studies. Although this difference may partially be a result of the differences in resolution achieved in the complexed and uncomplexed FKBP NMR studies, generally it was thought that binding involved some rearrangement of the 36-45 and 78-92 loops. This provides a good example of the dynamic nature of protein binding as revealed by NMR spectroscopy.
The dynamic aspects of the ligand-FKBP complex formation were pursued by Cheng et al. (121) through analysis of 15N-NMR relaxation data. In particular, the increased backbone mobility for several residues within the 36-45 and 78-95 loops compared with that of the rest of the protein was noted. From analysis of the 15N relaxation rates of FKBP complexed with FK506, it was found that flexibility was restricted along the entire polypeptide chain (122). This confirmed the proposition that the binding interaction of FKBP with ligand involves stabilization and structuring of the protein loops adjacent to the binding site.
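As a rough way of comparing the restraint counts quoted above, they can be put on a per-residue basis using the 107-residue chain length of FKBP. The sketch below is only illustrative arithmetic: it assumes the three counts are directly comparable, the dictionary labels are descriptive placeholders, and the text does not say which homonuclear study (118 or 119) used which count.

```python
FKBP_RESIDUES = 107  # chain length quoted in the text

# Restraint counts quoted in the text; labels are descriptive placeholders.
restraint_counts = {
    "heteronuclear 3D/4D study (120)": 1958,
    "earlier homonuclear study, smaller set": 590,
    "earlier homonuclear study, larger set": 1047,
}


def restraints_per_residue(n_restraints, n_residues=FKBP_RESIDUES):
    """Average number of restraints available per residue."""
    return n_restraints / n_residues


for label, count in restraint_counts.items():
    print(f"{label}: {restraints_per_residue(count):.1f} restraints per residue")
```

On this basis the isotope-labeled study works with roughly 18 restraints per residue, compared with about 6 and 10 for the homonuclear data sets.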
In summary, it was possible not only to define the free and bound conformations of the ligand but also to identify the two binding interfaces involved in the interaction and demonstrate a reduction in protein mobility in a defined region of the protein upon binding. This level of analysis was possible because of the tight binding of the FKBP-ligand complex, its small size, and the availability of labeled species. The information proved to be complementary to X-ray crystallographic studies and will help to clarify the role of FKBP complex formation in immunoregulation.
3.2.4.3 Matrix Metalloproteinases. Matrix metalloproteinases (MMPs), including stromelysin, collagenase, and gelatinase, are involved in tissue remodeling associated with embryonic development, growth, and wound healing. Unregulated or overexpressed MMPs have been implicated in several pathological conditions, including arthritis and cancer, and inhibitors of stromelysin and other MMPs have attracted much interest because of their potential for the treatment of these diseases.
Several NMR structural studies of stromelysin (123-127) and collagenase (128, 129) complexes have been reported. The secondary structure and global fold have been found to be quite similar for the catalytic domains of both enzymes and their various complexes with ligands. The active site in each enzyme is a cleft spanning the width of the enzyme, with a catalytic zinc atom coordinated by three histidine residues located in the center. Different dynamic properties of active-site residues in stromelysin/ligand complexes (3) and of collagenase with and without bound inhibitor (128, 129) have been reported. It has been proposed that structural and dynamic differences can be exploited in structure-based drug design to achieve broad inhibitor activity against several MMPs or to obtain more selective inhibition (3).
Of recent interest have been structural data on a novel class of MMP-binding inhibitors, represented by PNU-107859 (13) and PNU-142372 (14), which contain a thiadiazole moiety that coordinates the catalytic zinc atom through its exocyclic sulfur atom (130).
Isotope editing/filtering studies played an important role in defining interactions between the ligands and stromelysin. For example, for the stromelysin/PNU-107859 complex a 3D 12C-filtered, 13C-edited NOESY spectrum recorded on the [12C,14N]PNU-107859/[13C,15N]-stromelysin complex was used to assign protein/ligand NOEs. Of the 11 observed NOEs between the ligand and protein aliphatic protons, nine involved the aromatic ring of (13) and one involved the terminal methyl group. NOEs were observed between (13) and protons of Tyr155, His166, Try16', and Ala16'. All four of these residues are located in the S1-S3 binding sites on one side of the active site.
Comparison of 2D 1H-15N HSQC spectra showed that differences between the 1H and 15N chemical shifts for the stromelysin/13 and stromelysin/14 complexes are concentrated in the active site, indicating that no gross conformational differences in protein structure exist. The aromatic rings of (13) and (14) bind in the same region of the protein.
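The selection logic behind the 12C-filtered, 13C-edited experiment described above can be sketched in a few lines. This is a conceptual illustration only, not a simulation of the pulse sequence: it assumes that editing keeps protons attached to 13C (the labeled protein) in one dimension and that filtering keeps protons attached to 12C (the unlabeled ligand) in the other, so that only protein/ligand proton pairs can give cross peaks. The proton names and the helper function are hypothetical.

```python
from itertools import product


def filtered_edited_pairs(protons, edit="13C", filt="12C"):
    """Return proton pairs with one partner on an `edit` carbon and one on a `filt` carbon."""
    edited = [p for p in protons if p["carbon"] == edit]
    filtered = [p for p in protons if p["carbon"] == filt]
    return [(a["name"], b["name"]) for a, b in product(edited, filtered)]


# Hypothetical protons: the protein is 13C labeled, the ligand is at natural abundance.
protons = [
    {"name": "protein_HA", "carbon": "13C"},
    {"name": "protein_HB", "carbon": "13C"},
    {"name": "ligand_aromatic_H", "carbon": "12C"},
    {"name": "ligand_methyl_H", "carbon": "12C"},
]

pairs = filtered_edited_pairs(protons)
for protein_h, ligand_h in pairs:
    print(f"candidate intermolecular NOE: {protein_h} <-> {ligand_h}")
```

Protein/protein and ligand/ligand pairs never appear in the list, which is the sense in which such a spectrum isolates intermolecular contacts. Whether a given pair actually shows an NOE still depends on the two protons being close in space.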
[Figure 12.26 graphic; bound ligand signals labeled "Bound"; x-axis: 19F chemical shift (ppm), -140 to -160]
Figure 12.26. Region of the 1D 19F spectrum of the stromelysin/PNU-142372 complex. Signals from free (sharp) and bound (broad) PNU-142372 are observed. (Adapted from Ref. 3 and reprinted with permission from Elsevier Science.)
A region of the 1D 19F spectrum of the stromelysin/14 complex is shown in Figure 12.26 (3). Two separate resonances were observed for the two ortho fluorine atoms of the bound ligand in contrast to the single resonance observed for both ortho protons of stromelysin-bound (13), indicating that the ring flip rate (rotation about the Cβ-Cγ bond) is reduced for stromelysin-bound (14) compared to stromelysin-bound (13). A ring flip rate of approximately 100 s⁻¹ was estimated from the difference in linewidths for the bound ortho and para fluorine atom resonances of (14), more than two orders of magnitude slower than the ring flip rate for (13).
The 19F spectrum in Figure 12.26 illustrates several general principles that are useful in NMR studies of ligand macromolecule complexes. First, note that the use of a rare probe nucleus such as 19F produces spectra of elegant simplicity. Because there is no naturally occurring 19F in the macromolecule, it generates no interfering signals. Second, the offset in chemical shift between bound and free signals reflects the different environment of the bound and free states. Third, signals from the bound ligand are broader than those from the free ligand because of the higher molecular weight of the complex but are still clearly visible for a complex of this size.
NMR studies have also been reported for ligands bound to collagenase. Interest so far has focused on hydroxamate-containing ligands, where it has been shown that binding causes a decrease in mobility of some but not all active-site residues (128, 129). Interestingly, some active-site residues adjacent to residues that interact directly with inhibitor were found to have high mobility both in the presence and the absence of inhibitor (129). This contrasts with what is observed for stromelysin complexed to hydroxamate ligands and a more complete understanding of the dynamics of the respective interactions may provide critical information for drug design (3).
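The ring flip estimate quoted above for (14) can be related to linewidths with a short calculation. The sketch assumes the standard slow-exchange result that exchange adds k/π (in Hz) to the linewidth of each exchanging site, and it treats the para fluorine, which a ring flip does not move, as the exchange-free reference. The text does not state the exact procedure used in Ref. 3, and the linewidth values below are hypothetical, chosen only to land near 100 s⁻¹.

```python
import math


def rate_from_excess_linewidth(lw_exchanging_hz, lw_reference_hz):
    """Slow-exchange estimate: k = pi * (excess linewidth in Hz)."""
    excess = lw_exchanging_hz - lw_reference_hz
    if excess < 0:
        raise ValueError("exchanging line is narrower than the reference line")
    return math.pi * excess


def excess_linewidth_from_rate(rate_per_s):
    """Extra linewidth (Hz) contributed by slow exchange at rate k."""
    return rate_per_s / math.pi


# Hypothetical linewidths (Hz) for the bound ortho and para 19F resonances.
ortho_lw_hz = 40.0
para_lw_hz = 8.0

k_flip = rate_from_excess_linewidth(ortho_lw_hz, para_lw_hz)
print(f"estimated ring flip rate: {k_flip:.0f} s^-1")
print(f"excess linewidth for k = 100 s^-1: {excess_linewidth_from_rate(100.0):.1f} Hz")
```

Under these assumptions a flip rate of about 100 s⁻¹ corresponds to roughly 32 Hz of extra broadening on the ortho signals relative to the para signal.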
Hydroxamate-containing ligands have also featured in other NMR studies, this time using transferred NOEs to determine their bioactive conformations (131). TrNOE data were used to determine the conformation of the inhibitors when bound to stromelysin. The NOE-derived structures of the bound inhibitors were used as templates to screen a database of 260,000 compounds. Eighteen of the 23 compounds identified for which stromelysin binding data were available had affinities less than 200 nM, demonstrating the value of deriving a conformationally restricted template for structure-based drug design (131). This study also demonstrates the close synergy that exists between structure-based design and screening approaches, either in silico or experimental.
3.2.4.4 Dihydrofolate Reductase. Dihydrofolate reductase (DHFR) is an important intracellular enzyme that is the target of several clinically used drugs, including methotrexate (15), an anticancer compound, and trimethoprim (16), an antibacterial. These act by inhibiting the enzyme in malignant cells and parasites, respectively. The small size of DHFR (18-20 kDa) makes it amenable to structural studies and there have been numerous complexes determined using both X-ray and NMR methods. The focus here will be on a recent illustrative example of the structure of a new complex of DHFR with trimetrexate (17). Trimetrexate was initially investigated as an antimalarial agent but has subsequently been found to have antineoplastic activity against breast, neck, and head cancers. It has also been used as an antibacterial for the treatment of Pneumocystis carinii pneumonia in AIDS patients. As seen from the following structures, trimetrexate combines some of the features of trimethoprim and methotrexate:
Figure 12.27. Stereoview of a superposition over the backbone atoms (N, Cα, and C) of residues 1-162 of the final 22 structures of the DHFR-trimetrexate complex. (a) View of the protein backbone and the trimetrexate heavy atoms. (b) View of trimetrexate in the binding site of enzyme. (c) Conformation of trimetrexate in the binding site of enzyme. The orientation of trimetrexate is identical for (a)-(c) and only its heavy atoms are shown. (Reprinted with permission from Ref. 132. Copyright 1999 The Protein Society.)
tional differences were detected between the different complexes. The 2,4-diaminopyrimidine-containing moieties in the three drugs bind essentially in the same binding pocket and the remaining parts of their molecules adapt their conformations such that they can make effective van der Waals interactions with essentially the same set of hydrophobic amino acids. The side-chain orientations and local conformations are not greatly changed in the different complexes.
The ring flipping of the trimethoxy aromatic ring mentioned above was detected by variable-temperature studies of the spectral line shape. The presence of such dynamic processes involving the ligand appears to be not uncommon in macromolecule-ligand complexes and the ability of NMR methods to detect such phenomena represents one distinct advantage of NMR over X-ray methods of structure determination. Relaxation measurements were also used to probe dynamics of the protein and no large amplitude motions were found, apart from that at the C-terminus (132). The power of NMR methods for studying dynamics of complexes is further illustrated by an earlier study of the complex of DHFR with methotrexate (133). In this case a correlated dynamic rotation of a carboxylate group on the ligand and of the protein was detected, as illustrated in Fig. 12.28.
3.2.4.5 HIV Protease. Because of its essential role in the HIV life cycle, HIV protease is a major target for structure-based design of anti-AIDS drugs. There are now more than 100 structures of HIV protease and protease inhibitor complexes in the HIV-protease structure database (134-136) and the availability of this wealth of high resolution structural information has been the driving force behind numerous structure-based design programs (134, 135, 137). Most of the high resolution structural information on HIV protease has been obtained from X-ray crystallography data (136). Although there are relatively few examples of HIV protease/inhibitor complexes that have been determined by use of NMR spectroscopy, the NMR data, taken together with the structural data from X-ray experiments, have contributed to an understanding of protease-inhibitor recognition and dynamics. Indeed, studies of HIV protease/inhibitor complexes are a powerful example of the way in which complementary information obtained from X-ray crystallography and NMR spectroscopy can be used to facilitate structure-based drug design.
HIV protease/inhibitor complexes have a molecular weight of approximately 22 kDa. Although NMR spectroscopy is well suited to determination of the structure of molecules in this size range, efforts to determine the solution structure of the complex were hampered by the fact that the protease undergoes rapid autocatalysis in solution. It required the development of potent inhibitors before NMR studies of the complex became feasible. The first solution structure (Fig. 12.29) of HIV protease bound to the cyclic urea inhibitor DMP-323 (18) was reported in 1996 (138).
The protease exists as a homodimer. Each 99-residue monomer contains 10 β-strands and the dimer is stabilized by a four-stranded antiparallel β-sheet formed by the N- and C-terminal strands of each monomer. The active site of the enzyme is formed at the interface, where each monomer contributes a catalytic triad (Asp25-Thr26-Gly27) that is responsible for cleavage of the protease substrates. The "flap region" is located above the reactive site and is formed by a hairpin from each monomer of two antiparallel β-strands joined by a β-turn. There is little difference between the solution and crystal structures of protease-inhibitor complexes, except in those regions where the polypeptide chain is disordered. However, experiments in solution have allowed access to parameters that are not directly accessible from crystal data. These parameters, such as the amplitude and frequency of backbone dynamics, the protonation states of the catalytic aspartate residues, and the rate of monomer interchange, are essential in understanding the interaction of HIV protease with potent inhibitors.

The cyclic urea inhibitor DMP-323 was designed by analysis of crystal structures of HIV protease/inhibitor complexes. A feature common to many of the complexes of HIV protease is a buried water molecule that bridges the inhibitor and Ile50 in the flaps. Interactions with this water molecule are thought to induce the fit of the flaps over the inhibitor (139). In contrast, mammalian aspartic-protease/inhibitor complexes are unable to accommodate an equivalent water molecule (135). This observation led to the design of a series of cyclic urea-based inhibitors that are capable of displacing the buried water molecule (139). As well as improving the specificity of inhibitors to the viral protease, displacement of the water molecule was expected to increase the entropic contribution to inhibitor binding and thus enhance the affinity of complex formation. The cyclic urea inhibitors are highly potent and specific inhibitors of HIV protease (139) and for DMP-323 it has been shown in both the crystal structure (139) and in solution (140) that the urea moiety does indeed replace the buried water molecule.

Although DMP-323 replaces one buried water molecule, several others are observed in the crystal structure of the complex. A more recent NMR study investigated the role of these water molecules to determine whether any had a structural role in the formation of the HIV protease/DMP-323 complex (141). In favorable cases, NMR can be used to estimate the residence times of hydration water molecules (142), thus providing information about the timescale of the interaction of buried water with the bulk solvent. This analysis led to the identification of a symmetry-related pair of water molecules that may have a structural role in formation of the complex. Such information may prove useful in the design of future cyclic urea inhibitors. An interesting finding in this study was the fact that each of the hydroxyl protons of DMP-323 is in rapid exchange with solvent. This is a surprising result, given that two of these hydroxyl protons are completely buried and form a network of hydrogen bonds with the catalytic Asp25/Asp125 side chains (143). Furthermore, the dissociation rate of DMP-323 is less than 1 s⁻¹ under the conditions of the experiment, which is too small to average the chemical shifts of the hydroxyl protons and the bulk water. The observation is ascribed to local fluctuations in the complex that allow solvent molecules to penetrate into the binding site. This conclusion is supported by the observation that the catalytic protons of the Asp25/Asp125 side chains in the protease/DMP-323 complex undergo H-D exchange with solvent, even though they are buried and hydrogen bonded to the inhibitor (143). These studies highlight that even well-ordered structures such as the protease/DMP-323 complex may be flexible on the millisecond to microsecond timescale.

Interestingly, in the DMP-323 complex, both of the catalytic Asp25/Asp125 side chains are protonated over the pH range 2-7 (143). The protonated Asp25/Asp125 residues form a network of hydrogen bonds with the hydroxyl groups of DMP-323. In contrast it has been shown that in the complex with the asymmetric inhibitor KNI-272, the side chain of Asp25 is protonated, whereas that of Asp125 is not. A suggested explanation for this is that both oxygens of the Asp125 side chain are deprotonated to accept two hydrogen bonds, one from a bound water molecule and one from the inhibitor. In contrast the side chain of Asp25 is protonated so that it can donate a hydrogen bond to the inhibitor (144). Consequently, the protonation state of the enzyme is influenced strongly by interaction with specific inhibitors and this knowledge is essential for a detailed understanding of the protease/drug interactions.

NMR has also been used to study the relationship between flexibility and enzymatic function for HIV protease. For the protease/DMP-323 complex, 15N spin-relaxation studies determined that residues that are flexible correlate well with residues that are disordered in the NMR structure of the complex (145). For example, residues in poorly defined loops were found to undergo large-amplitude internal motions on the nanosecond-picosecond timescale. In contrast, two regions of the molecule were found to exhibit motions on the millisecond-microsecond timescale. The first of these is at the N-terminus of the protein around Thr4-Leu5. This is adjacent to the major site of autolysis of the protease and it has been suggested that the rate of cleavage may regulate HIV protease activity in vivo (146). Consequently, the observed flexibility may be important for regulation of protein function. The second region found to be undergoing millisecond-microsecond motion was the tips of the flaps around Ile50-Gly51. In crystal structures, this region of the protease is well ordered and not involved in crystal contacts, although its conformation varies from structure to structure. This motion is interpreted as a dynamic conformational exchange process, which is fast relative to the chemical-shift timescale. Thus when the protease is bound to a symmetric inhibitor in solution, this conformational exchange results in the chemical shifts of the flap residues in the two monomers being identical (138, 145). In contrast, when the protease is bound to an asymmetric inhibitor, such as KNI-272, crystal structures show that each monomer interacts with the inhibitor in a different way (144). This is reflected in the fact that the chemical shifts of the monomers are different when asymmetric inhibitors are bound (141, 147). Analysis of spectra from such an asymmetric complex has revealed that the inhibitor is capable of "flipping" its orientation with respect to the two monomers without dissociating from the complex (148). These data again highlight the importance of defining both the structural and dynamic aspects of binding to understand the requirements for potent interactions between HIV protease and its inhibitors.

The development of inhibitors of HIV protease represents a major success for structure-based drug design. When HIV was first identified in the early 1980s there were no known drugs effective for treatment of infection. A combination of X-ray crystallography, NMR spectroscopy, computer modeling, and chemical synthesis has resulted in the development of several effective HIV protease inhibitors. However, in common with other retroviruses, HIV has a high transcription error rate that results in a rapid mutational rate. One of the results of this is the production of a divergent population of viruses in which the sequence of the HIV protease produced may differ substantially (149, 150). As a consequence, drug-resistant strains of the virus emerge. Clearly, knowledge of the structural principles that govern inhibition of the protease and the mechanism by which the virus develops resistance will continue to be important in the development of effective new drugs.

4 NMR SCREENING

In the past, NMR was predominantly used in the design stage of drug discovery rather than the screening stages. Recently, new methods that make use of NMR to screen ligands for binding to a protein target have been developed and are proving to be a powerful tool in the discovery of new drug leads. This section gives an overview of the various experimental methods, summarized in Table 12.8, which can be used to screen mixtures of ligands for binding to a drug target. There will also be a brief discussion on the practical considerations that need to be made when designing an NMR screening program.

4.1 Methods

4.1.1 Chemical-Shift Perturbation. Chemical shift is a function of the chemical (and hence magnetic) environment that individual nuclei experience. Perturbations of chemical
Table 12.8 Summary of the Methods Available for NMR Screening and Their Respective Characteristics

Screening methodology | Signals observed | Protein size limit | Labeling | Binding information obtained | KD limit | KD determined | Suitable for HTS | Mixture deconvolution required?
Chemical shift perturbation (e.g., SAR by NMR) | Protein | <30 kDa | 15N protein | Location | 10^-? to 10^-? M | | |
STD | Ligand | None | None | Orientation | 10^-? to 10^-? M | | |
Diffusion-based (e.g., affinity NMR) | Ligand | None | 2H protein for isotope editing | None | 10^-? to 10^-? M | | |
Relaxation-based | Ligand | None | None | None | 10^-3 to 10^-7 M | | |
trNOE | Ligand | None | None | Bound conformation | 10^-3 to 10^-7 M | | |
NOE pumping (reverse) | Ligand or protein (a) | None | None | Bound conformation | 10^-3 to 10^-7 M | | |
Spin labeling | Ligand or protein | None | Spin label for either ligand or protein | Orientation, simultaneous binding | 10^-? to 10^-? M | | |

(a) For reverse NOE pumping.
(b) For primary screening if the protein is spin-labeled or for second-site screening if the first-site ligand is spin-labeled.
[Figure 12.30 flowchart labels: ligand library; screen for first ligand; optimise ligand; optimise second ligand; link ligands.]

Figure 12.30. Summary of the SAR by NMR drug discovery methodology. A protein target is screened against a library consisting of small organic molecules by use of the 1H/15N HSQC experiment. When two ligands that bind in close proximity are identified, they are linked to form a composite ligand with an increased affinity for the target.
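The HSQC comparison at the heart of this scheme can be outlined in a few lines of Python. This is an illustrative sketch only, not a procedure taken from the text: the peak lists, residue labels, and compound names are hypothetical, and both the 0.2 weighting applied to 15N shift differences and the 0.05 ppm cutoff are assumed values that would need to be tuned for a real data set.

```python
import math

# Assumed scaling that puts 15N shift changes on a scale comparable to 1H.
N15_WEIGHT = 0.2


def combined_shift(delta_h, delta_n, n_weight=N15_WEIGHT):
    """Combine 1H and 15N shift differences (ppm) into a single number."""
    return math.sqrt(delta_h ** 2 + (n_weight * delta_n) ** 2)


def perturbed_residues(reference, sample, threshold=0.05):
    """Return {residue: combined shift} for amide peaks that moved.

    reference and sample map a residue label to a (1H ppm, 15N ppm) pair
    taken from assigned HSQC spectra without and with ligands.
    """
    hits = {}
    for residue, (h_ref, n_ref) in reference.items():
        if residue not in sample:
            continue  # peak missing or unassigned in the sample spectrum
        h_obs, n_obs = sample[residue]
        shift = combined_shift(h_obs - h_ref, n_obs - n_ref)
        if shift >= threshold:
            hits[residue] = shift
    return hits


def find_binders(reference, spectra_by_compound, threshold=0.05):
    """Deconvolute a mixture: test each compound's own spectrum in turn."""
    return [
        name
        for name, spectrum in spectra_by_compound.items()
        if perturbed_residues(reference, spectrum, threshold)
    ]


if __name__ == "__main__":
    # Hypothetical peak positions for two assigned amides.
    free_protein = {"G28": (8.50, 110.0), "V55": (7.90, 121.0)}
    with_mixture = {"G28": (8.50, 110.0), "V55": (7.98, 121.5)}
    print(perturbed_residues(free_protein, with_mixture))
```

Because the perturbed peaks are reported by residue, the same comparison also indicates where on the protein a ligand binds, provided the amide resonances have been assigned.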
shifts can be used to detect binding of a ligand to a protein target. When a ligand binds to a protein the local chemical environment is changed, and this is reflected by a change in the chemical shifts of nuclei in close proximity to the ligand-binding site. The most common experiment used in this screening methodology is the 1H/15N HSQC that generates a discrete signal for each amide group within the protein. A reference 1H/15N HSQC spectrum, which is acquired in the absence of potential ligands, is compared to a spectrum recorded in the presence of ligands and any changes in the amide chemical shifts are indicative of a ligand binding to a location close to the corresponding amide groups. The major advantage of this technique is that, if the NMR assignment of the amide resonances is known, then the site of binding for each ligand can be determined. This is a valuable piece of information in the development of more potent second-generation drug leads. Binding affinities can also be determined by measuring the change in chemical shift as a function of ligand concentration.

One technique that utilizes this screening method for drug design is "SAR-by-NMR," developed by Fesik and coworkers (1, 4, 151-155). SAR-by-NMR is a fragment-based drug design approach in which a potent drug candidate is derived by chemically linking two or more small low affinity ligands for a target. In theory, the binding energy of the linked compounds will be the sum of the binding energies of the two individual compounds plus contributions to binding energy attributed to linkage. Therefore, it is possible to generate a drug lead with a nanomolar dissociation constant (KD) from two milli- to micromolar fragments.
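Two of the quantitative statements above can be made concrete with a short Python sketch. It is offered as a rough illustration under stated assumptions rather than as the method used in the cited work: binding is taken to be 1:1 and in fast exchange (so the observed shift change is the bound fraction multiplied by a maximum shift change), dissociation constants are referred to a 1 M standard state, and the linker contribution is a free parameter that defaults to zero. All function names and numerical inputs are hypothetical.

```python
import math

GAS_CONSTANT = 8.314462618  # J mol^-1 K^-1


def bound_fraction(protein_total, ligand_total, kd):
    """Fraction of protein carrying ligand for 1:1 binding (all in mol/L)."""
    b = protein_total + ligand_total + kd
    complex_conc = (b - math.sqrt(b * b - 4.0 * protein_total * ligand_total)) / 2.0
    return complex_conc / protein_total


def observed_shift_change(protein_total, ligand_total, kd, delta_max):
    """Fast-exchange shift change (ppm) at one point of a titration."""
    return delta_max * bound_fraction(protein_total, ligand_total, kd)


def binding_free_energy(kd, temperature=298.15):
    """Binding free energy in J/mol, with KD referred to a 1 M standard state."""
    return GAS_CONSTANT * temperature * math.log(kd)


def linked_kd(kd_a, kd_b, linker_dg=0.0, temperature=298.15):
    """KD predicted for two linked fragments if binding energies simply add.

    linker_dg (J/mol) is the assumed contribution of the linkage itself;
    negative values are favorable, positive values unfavorable.
    """
    total = (
        binding_free_energy(kd_a, temperature)
        + binding_free_energy(kd_b, temperature)
        + linker_dg
    )
    return math.exp(total / (GAS_CONSTANT * temperature))


if __name__ == "__main__":
    # Titration of 0.1 mM protein with a hypothetical 0.1 mM KD ligand.
    for ligand in (0.0, 1e-4, 5e-4, 1e-3):
        print(ligand, observed_shift_change(1e-4, ligand, 1e-4, delta_max=0.2))
    # Millimolar plus micromolar fragments, no linker contribution.
    print(linked_kd(1e-3, 1e-5))
```

With no linker term, fragments with KD values of 1 mM and 10 µM combine to a predicted KD of 10 nM, which is the milli-/micromolar-to-nanomolar gain described above. In practice the linker term is rarely zero, so the ideal value is best read as an upper bound on what linking can achieve.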
The first step in this process (Fig. 12.30) involves screening a library of ligands (typically with a MW < 400) in mixtures of up to 10 for binding to a protein target by comparing the 1H/15N HSQC spectrum of a 15N-enriched protein in both the presence and the absence of ligands. Any ligand-induced changes in the chemical shift of the nitrogen and amide proton signals indicate binding of one or more ligands in the mixture to the protein target. The mixture containing the binding ligand(s) is deconvoluted and each individual compound screened to identify the individual ligand(s) responsible for the observed chemical-shift perturbations. Once a binding ligand is identified analogs can be screened to optimize binding.

A second ligand, which binds at a proximal site, is then identified either from the original screen or by repeating the library screening with the first ligand site bound to the protein. This ligand is then optimized and the structure of the ternary complex determined by use of either NMR or X-ray crystallography. The ternary complex structure provides information on the conformation and orientation of the bound ligands, which facilitates the synthesis of hybrid molecules where the two ligands are joined by a suitable linking moiety.

There are several examples that illustrate the potential of SAR by NMR. As noted earlier, FK506 binding protein (FKBP) inhibits calcineurin and blocks T-cell activation when complexed to the immunosuppressant FK506. This protein was used as a target for SAR by NMR screening and subsequently two ligands, (19) and (20), were identified with KD values of 2 µM and 0.1 mM, respectively. A model of the ternary complex between the protein and both ligands was produced, which indicated that the methyl ester of (19) was close to the benzoyl hydroxyl group in (20). These two groups were linked with alkyl chains of various lengths, with the most active compound
given that stromelysin demonstrates a substrate preference for hydrophobic amino acids and structural studies had identified a hydrophobic binding pocket supporting this observation. From the library screen a series of biphenyl compounds were identified and analogs of these compounds were synthesized. A biphenyl derivative (23) was produced with a KD value of 0.02 mM. The NMR structure of a ternary complex, consisting of stromelysin, (22), and the biaryl derivative (24) (chosen for its superior aqueous solubility), was determined and indicated that the methyl group of (22) was in close proximity to the pyrimidine ring of (24). With this information (22) and (23) were subsequently linked by different

structure in the hope of improving the binding and/or pharmacological properties of the parent compound (Fig. 12.31). The alternative fragment must bind in the same location as the corresponding section of the original molecule, making the 1H/15N HSQC screening method ideal as it provides information on the binding site of ligands.

In a demonstration of this fragmentation method, an antagonist of the interaction between leukocyte function-associated antigen 1 (LFA-1) and intercellular adhesion molecule 1 (ICAM-1) was used as a starting molecule. This interaction plays a role in the inflammatory response and specific T-cell immune responses, and inhibitors have applications in the treatment of inflammation and organ transplant rejection. The p-arylthio cinnamide antagonist (26) had an IC50 value of 44 nM; however, it was envisaged that the molecule's activity and physical properties could be improved by replacing the isopropyl phenyl group with a more hydrophilic moiety. Screening of a 2500-compound library provided several hits, and analogs of (26) were made that
proved activity (IC50 values of 20 and 40 nM, respectively) when compared to that of the parent compound (26) (156).

Many compounds bind to human serum albumin (HSA), which significantly reduces their in vivo activity and hence their potential as a drug lead. The fragmentation method has recently been used to find analogs of diflunisal (29) that have a reduced affinity toward HSA

[Figure 12.31 flowchart labels: fragment; lead molecule; identify alternative fragment; incorporate alternative fragment into original lead.]
[Figure 12.32 axis label: increasing saturation.]

Figure 12.32. A schematic representation of the saturation transfer effect. The protein resonances are saturated (indicated by shading) by a selective pulse by spin diffusion. Resonances of nonbinding ligands (triangles) are not affected by this pulse but ligands that are interacting with the protein (ellipses) will also become saturated. These interacting ligands are transferred to solution through chemical exchange where they are detected. (Reprinted with permission from Ref. 16. Copyright 1999, Wiley-VCH.)
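The saturation transfer effect in Fig. 12.32 is commonly read out as the difference between a reference spectrum recorded without protein saturation and a spectrum recorded with it. The Python sketch below assumes that readout. It is a minimal illustration, not the published protocol: the compound names, integrated peak intensities, and the 5% cutoff are hypothetical.

```python
def std_fraction(intensity_reference, intensity_saturated):
    """Fractional loss of a ligand signal when the protein is saturated."""
    return (intensity_reference - intensity_saturated) / intensity_reference


def std_hits(reference, saturated, threshold=0.05):
    """Names of ligands whose signals are attenuated by at least `threshold`.

    reference and saturated map a compound name to the integrated intensity
    of one of its resolved peaks in the two spectra.
    """
    return [
        name
        for name, i_ref in reference.items()
        if std_fraction(i_ref, saturated[name]) >= threshold
    ]


if __name__ == "__main__":
    # Hypothetical intensities: cmpd_A is unaffected, cmpd_B loses signal.
    reference = {"cmpd_A": 100.0, "cmpd_B": 100.0}
    saturated = {"cmpd_A": 100.0, "cmpd_B": 80.0}
    print(std_hits(reference, saturated))
```

Because each ligand is identified from its own resonances, a hit in a mixture can be assigned to a compound directly from the difference spectrum.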
is required, making this a good technique for high throughput screening of large ligand libraries. Unlike the chemical-shift perturbation techniques, STD experiments provide no information on the site of ligand binding.

A second variation of saturation transfer experiment has been devised by Dalvit and coworkers that uses the transfer of magnetization from the water (167). Water is intimately associated with proteins, being bound either within or on the surface of the macromolecular structure. Saturation of the water resonance will lead to protein saturation through a variety of mechanisms, including saturation of the CαH resonances, saturation of exchanging protein resonances, and NOE interactions between water and the protein. If a compound is bound to the protein it will also become saturated, and this effect can be used as an indication of ligand binding (167).

4.1.3 Molecular Diffusion. Molecules can be distinguished based on their diffusion coefficients, which are related to molecular size. Large macromolecules, such as proteins, diffuse more slowly than small molecules and it is this size difference that can be exploited to screen for ligand binding. If a small molecule binds to a protein target its diffusion coefficient is altered to a value more like that of the protein. Therefore, by utilizing a diffusion filter, resonances generated by small molecules that do not bind to the protein can be removed from the spectrum.

Diffusion editing is achieved with the use of a pair of gradient pulses. If field inhomogeneity is ignored, then all spins experience an identical magnetic field despite having different positions throughout the sample. The application of a field gradient has the effect of making field strength dependent on position. Under the influence of the gradient pulse, the phase of individual spins becomes dependent on their position within the sample and hence the spins are spatially "encoded." If diffusion does not occur, this spatial encoding is fully reversible by a second gradient of inverse polarity and no loss of NMR signal will occur. However, the second gradient pulse will be unable to "decode" the spins that have undergone diffusion and the resulting NMR signal will be reduced. Acquiring spectra of a sample with and without the diffusion filter and then subtracting them allows the ligands binding to the protein to be identified. This filtering method can be used for both 1D and 2D experiments and can be "tuned" by altering the strength and duration of the gradients.

Because the ligand signals are being observed in this screening method, no deconvolution of the ligand mixture is required, given that any signals can be assigned directly to individual compounds within the mixture. However, signals from the protein are always present, which can pose a problem in interpreting spectra. An isotope-edited version of the diffusion experiment has been designed to avoid this problem, although labeled protein is required (168). Generally, there is no requirement for labeling of the protein target or for the protein resonances to be assigned and thus, in theory, there is no size limit on the proteins that can be screened by use of this method, although no information is obtained on the location of ligand binding. However, if the protein is large, then the transverse relaxation time may be too short to observe the bound ligands in the diffusion-edited spectrum (169). Only one sample, containing protein and ligands, is used to obtain both reference and screening data and therefore differences between the sample and reference spectra caused by addition of the ligands (pH, salt concentration, etc.) are avoided.

Diffusion-filtered NMR screening requires that there is a significant difference in observed translational diffusion between the free and bound states. The ligands are in fast exchange on the diffusion timescale and as a consequence the observed diffusion coefficient for binding ligands is an average between the free and bound diffusion values. Free ligands diffuse at a much faster rate than those in the bound state and thus only a small amount of free ligand has a considerable effect on the observed average diffusion coefficient. This effect may be significant enough to reduce the difference between binding and nonbinding ligands, making it more difficult to interpret results (169). It has also been demonstrated that chemical exchange and NOE can affect the interpretation of diffusion experiments and that these factors need to be taken into consideration (170, 171).

Shapiro and coworkers developed a methodology based on diffusion filtering, named "affinity NMR," that they have used to screen for binding (172-175). Diffusion-edited NMR experiments were able to identify two known binding tetrapeptide ligands of vancomycin from a mixture of 10 peptides (176). Hajduk et al. demonstrated the application of diffusion-editing experiments by differentiating ligands of stromelysin from a mixture containing nonbinding compounds (177).

4.1.4 Relaxation. Like diffusion, the transverse relaxation time (T2) of molecules is also dependent on molecular size. Large molecules, such as proteins, have a short T2 and hence exhibit broad NMR signals, whereas small molecules have a longer T2 and hence narrower line widths. Therefore, if a small molecule ligand binds to a protein, its T2 value will decrease and a line-broadening effect of bound ligand signals can be observed. Alternatively, a relaxation filter can be used to remove signals from molecules with a short T2 value. Subtraction from a reference spectrum will result in a spectrum containing only those ligands that bind to the protein.

The ability to identify binding ligands using relaxation filters has been demonstrated using FKBP. A mixture of nine compounds consisting of one known ligand of FKBP, 2-phenylimidazole (34), and eight nonbinding compounds (e.g., 35-37) was screened and only signals from (34) were observed (177).

4.1.5 NOE. NOE experiments can also be used to identify ligands that bind to protein targets (178-180). Small molecules have a fast
A second technique that uses NOEs to detect binding is NOE pumping. This method was designed to alleviate some of the problems associated with the diffusion-edited screening methods (169). Signals from ligand molecules are removed using a diffusion filter and then transfer of signal from the protein to bound ligands by NOE occurs. The inverse of this is possible (known as reverse NOE pumping), which uses a relaxation filter to attenuate the protein resonances, after which the signal is transferred to the protein by NOE. Ligands may lose signal either by relaxation (for a free ligand) or through relaxation and NOE transfer (for a bound ligand). Therefore by subtracting spectra (which is done internally to reduce subtraction artifacts) from experiments with and without NOE pumping to the protein, the binding ligands can be detected (181).

The ability of NOE and reverse NOE pumping to identify ligands has been demonstrated through the use of human serum albumin (HSA) and several known binding and nonbinding compounds (169, 181).

4.1.6 Spin Labels. Spin-spin relaxation rates are proportional to the product of the squares of the gyromagnetic ratios of the involved spins. The gyromagnetic ratio of an unpaired electron is significantly larger than that of a proton and therefore any spins influenced by this electron will have substantially shortened relaxation times. The resonances of protons that are within 15-20 Å from the unpaired electron will experience this effect and be significantly broadened. The introduction of a short spin-lock period will significantly reduce the intensity of, or quench, these signals.

The spin-label method can be used as either a primary screening method or to identify a second ligand-binding site. The primary screening method requires residues around the binding pocket of the target to be spin labeled. Residues suitable for this labeling include lysine, cysteine, histidine, glutamate, aspartate, tyrosine, and methionine. Any ligands that bind to the protein in close proximity to the spin-labeled residues will be able to be identified. To screen for second-site ligand binding, the known first-site binding ligand is spin labeled. A reduced signal will be observed for any ligands that bind simultaneously and in close proximity to the first ligand-binding site. In addition, the degree of reduction in signal intensity gives an indication of the orientation of the second ligand in relation to the first, given that the effect of the spin label is inversely proportional to the distance separating the electron and proton. This information is valuable in the design of linkers to join the two ligands.

There are several advantages to using the spin-label screening method. Currently, it is the only method that can detect ligands that bind to the protein simultaneously, unlike other methods that can produce false positives if the first ligand-binding site is not fully saturated. The concentration of protein required for screening is relatively small (~10 µM) because of the substantial enhancement of the relaxation rate by the spin label. The protein can also be unlabeled and partially purified and there is no molecular weight limit. The spin labels also quench protein signals, making interpretation of spectra easier. The experiment is easy to set up and analyze, making it amenable to automation. It is also insensitive to small changes in solvent conditions that can generate false positives in other methods. The information obtained on the orientation of ligands is also valuable and makes it an alternative to the chemical-shift perturbation methods when the proteins are large and NMR assignments have not been made.

A disadvantage of the method is the requirement for spin-labeled proteins and ligands. In addition, any ligands with slow dissociation rates will show no averaging of relaxation rates and therefore tightly binding compounds (KD < M) will produce false negatives. Protein spin labeling must occur adjacent to, but not within, the binding site to minimize alteration of its binding properties.

The antiapoptotic protein Bcl-xL is responsible for the reduced susceptibility of cancer cells to undergo apoptosis and is therefore a target for the development of new anticancer agents. The structure of a previously identified ligand for Bcl-xL (39) was modified to incorporate a TEMPO spin label (40). By use of spin-labeled (40), an eight-compound library was screened for simultaneous binding to Bcl-xL. From this library an aromatic ketoxime
cules that are selected for the library so that all known drugs could be represented by only
each neighborhood does not overlap, diversity 32 different frameworks. When atom type and
is maximized. bond order were included in the analysis, 41
However, if neighborhoods are only small frameworks were found to describe 24% of all
then compound libraries must be very large so drugs (190). A similar analysis of side-chain
that the neighborhoods overlap and hence all frequency indicated that approximately 70%
molecular space is covered. In addition, some of all side chains present in the compound da-
systems do not exhibit neighborhood behavior tabase analyzed were from the top 20 occur-
and relatively small changes to the structure ring side chains (191).
of a compound may lead to large changes in its The presence of these common frameworks
binding affinity for the target. Maximizing di- and side-chains has been exploited in the
versity may also be inefficient because many SHAPES methodology (7) for NMR screening.
molecules do not possess physicochemical This strategy employs a small focused library
characteristics that are suitable as the basis based on these common frameworks and side
for a drug. In practice, the more that is known about the drug target, the less diverse and more focused the library can be. However, if the library is too focused then some outlying "new" ligand type for the target being screened may be missed.
One strategy for library design is to select compounds that have druglike characteristics. A simple set of rules, determined by Lipinski and coworkers, for determining whether a compound is druglike is known as "the rule of 5." According to this set of criteria, the majority of orally available drugs have five or fewer hydrogen bond donors, 10 or fewer hydrogen bond acceptors, a log P of less than 5, and a molecular weight less than 500 (186). Additional factors that can be taken into consideration include the number of heavy atoms, rotatable bonds, and ring systems (187-189).
Another study has revealed that there are a number of frameworks and side-chains that commonly occur in many drugs. Drug molecules, from the comprehensive medicinal database, were broken down into systems consisting of frameworks (Fig. 12.34) and side-chains. Analysis of these two structural features revealed that approximately 50% of
chains to screen against protein targets through the use of relaxation and NOE experiments. The advantages of this approach are that the library is small and hence only relatively small amounts of protein are required and any hits from the library will possess druglike characteristics. However, a disadvantage of the method is that it is unlikely to yield new drug types, given that the library is based on known drug frameworks.
Diversity of molecular type is not the only factor that must be taken into account when designing a library to be used in an NMR screening program. Because the screening occurs in an aqueous solution, the organic compounds chosen for the library must demonstrate reasonable solubility in the aqueous conditions used. In general, compounds are dissolved in DMSO and then added to the protein solution at the appropriate concentration. Currently, there are no good methods for determining the solubility of a wide range of compounds before screening commences. A simple method is to dilute the DMSO solution in buffer and observe whether any precipitation or aggregation occurs. However, this method will not be suitable for compounds
that precipitate or aggregate over several hours, and solutions that appear clear may still contain high MW aggregates, which will cause false positives in experiments such as the diffusion, relaxation, and TrNOE methods (15, 185).
It is also preferable to choose ligands that are synthetically accessible and/or possess suitable moieties to build upon or link to other fragments. This is especially important in the SAR-by-NMR screening methodology because this relies on the ability to link individual fragments to form a more potent drug lead. If the ligands to be linked are not synthetically accessible or do not possess suitable linking functional groups then this process is severely hindered.
4.2.2.2 Mixture Design. The optimal number of compounds per mixture is dependent on the screening method. For ligand-observed experiments the limiting factor for the number of compounds in a mixture is spectral overlap. Ligands need to be chosen so that spectral overlap is minimized, making interpretation of the resulting data far simpler. In theory, protein-observed experiments could have a large number of compounds per mixture, which would minimize both the screening time and the requirement for large amounts of protein. However, because the experiments are protein observed, deconvolution of the mixtures and rescreening of each individual compound are required to identify any hits. Therefore, the number of compounds per mixture is dependent on the hit rate in the screening procedure, given that the greater the hit rate, the more deconvolution steps are required and consequently more protein and spectrometer time are needed. The number of experiments required is at a minimum when the number of compounds is equal to $1/\sqrt{\text{hit rate}}$; thus, with a hit rate of 10% the optimal number of compounds per mixture is three (185). In addition to these factors, if the hit rate is high then it is likely that several compounds within a mixture containing a large number of compounds may compete for the same binding pocket, which may lead to false negatives.
In mixtures of organic compounds, interactions between compounds, such as reactions or ion pairing, are also possible and should be taken into consideration, especially when one uses large numbers of compounds per mixture. It has been demonstrated that in random mixtures of 10 compounds in DMSO, the probability of a reaction occurring between two of the mixture's components is 26%. This value can be reduced by careful selection of mixture components (e.g., separating acids from bases) to approximately 9% (192).
4.2.3 Hardware and Automation. Automation is a requirement if libraries containing a large number of compounds are to be screened. Technology has been developed that allows the automation of almost all steps of the NMR screening process from sample preparation through to data analysis (193).
The general setup for NMR screening consists of a robot for just-in-time preparation of each sample, which is then transferred to the magnet either through a flow system or as discrete samples on a rail system. There are several disadvantages in using a flow system: samples can be contaminated by previously screened compounds, the capillary line can be blocked if the protein or ligands precipitate or form aggregates, recovery of the sample is more laborious because it has been diluted, and cryoprobe technology (discussed later in this section) is not yet available in the flow system. Many of these problems can be overcome by using the discrete samples with the rail system.
Data acquisition is easily automated and there are several software packages that will automate data processing for 2D spectra. Automatic processing of 1D spectra is reported to be less reliable because of the large solvent signal and usually requires manual adjustment of phasing (193). One of the most laborious tasks in NMR screening is the analysis and comparison of the resulting spectra. For 1D ligand-observed experiments, difference methods (e.g., STD) provide the most reliable method for interpretation of results, in that the presence of signals in the spectra will correspond to the ligands that are binding. In 2D protein-observed experiments (e.g., 1H/15N HSQC) a more statistically rigorous analysis of changes in chemical shift is required and a discussion of this is beyond the scope of
this chapter. A more in-depth account of data analysis is provided by Ross and Senn (193).
Currently, approximately 50-100 samples can be screened per day and if mixtures contain 10 compounds each this provides a substantial throughput. This throughput rate will increase as technology improves, as has been demonstrated by the use of cryoprobes. Cryogenic NMR probes, where the preamplifier and radio frequency coils are cooled to low temperatures, can significantly increase the signal-to-noise ratio of an NMR spectrum. With these probes, NMR data can be obtained much faster and at lower protein concentrations, which increases throughput and reduces the total amount of protein needed to screen a library. Hajduk and coworkers (21) demonstrated the substantial improvements made through the use of a CryoProbe instead of a conventional probe in 1H/15N chemical-shift perturbation screening. Stromelysin (50 µM) was screened against mixtures of 100 compounds (50 µM each), facilitating the screening of more than 10,000 compounds in one day. The use of lower concentrations of both protein and ligands increases the stringency levels for the binding strength of ligands. At a protein/ligand concentration of 0.5 mM, ligands with dissociation constants in the millimolar range can be detected, although at a protein/ligand concentration of 50 µM this dissociation constant limit is reduced to approximately 0.15 mM. Although using higher protein/ligand concentrations can be advantageous when screening libraries containing small low affinity ligands, a higher stringency is required when screening large libraries, to reduce the number of hits obtained to a manageable number (21).
5 CONCLUSIONS
In this chapter we have given an overview of the two major approaches used in NMR and drug discovery, structure-based design and NMR-based screening. Both areas are flourishing and, together with more traditional uses of NMR, they demonstrate the versatility of NMR as a tool in medicinal chemistry. The power of NMR has been dramatically enhanced over the last decade by developments in both instruments and methodology. On the instrumental side, increases in magnetic field strengths and the development of cryoprobes have greatly increased sensitivity. Linkages of NMR to LC and MS have increased versatility. On the methods front there have been a range of new approaches discovered that will enhance the study of larger molecular complexes. Advances in protein expression and labeling have played a major role in stimulating the development of new NMR pulse sequences to extract information from such complexes.
6 ACKNOWLEDGMENTS
Work in our laboratory on NMR in drug design and development is supported by the Australian Research Council. D.J.C. is an ARC Professorial fellow. We thank Norelle Daly for assistance with some of the figures and Robyn Craik, Shaiyena Williams, and David Ireland for help in preparation of the manuscript.
REFERENCES
1. P. J. Hajduk, R. P. Meadows, and S. W. Fesik, Science, 278, 497 (1997).
2. S. W. Fesik, J. Biomol. NMR, 3, 261 (1993).
3. B. Stockman, Prog. Nucl. Magn. Reson. Spectrosc., 33, 109 (1998).
4. P. J. Hajduk, R. P. Meadows, and S. W. Fesik, Q. Rev. Biophys., 32, 211 (1999).
5. G. C. Roberts, Curr. Opin. Biotechnol., 10, 42 (1999).
6. J. M. Moore, Curr. Opin. Biotechnol., 10, 54 (1999).
7. J. Fejzo, C. A. Lepre, J. W. Peng, G. W. Bemis, Ajay, M. A. Murcko, and J. M. Moore, Chem. Biol., 6, 755 (1999).
8. P. A. Keifer, Curr. Opin. Biotechnol., 10, 34 (1999).
9. G. C. Roberts, Drug Discovery Today, 5, 230 (2000).
10. D. C. Fry and S. D. Emerson, Drug Des. Discov., 17, 13 (2000).
11. D. J. Craik and M. J. Scanlon in G. A. Webb, Ed., Annual Reports on NMR Spectroscopy, Vol. 42, Academic Press, San Diego, 2000, pp. 115-173.
12. T. Diercks, M. Coles, and H. Kessler, Curr. Opin. Chem. Biol., 5, 285 (2001).
13. R. P. Hicks, Curr. Med. Chem., 8, 627 (2001).
14. M. Shapiro, Farmaco, 56, 141 (2001).
15. J. W. Peng, C. A. Lepre, J. Fejzo, N. Abdul-Manan, and J. M. Moore, Methods Enzymol., 338, 202 (2001).
16. D. J. Craik, Ed., NMR in Drug Design, CRC Press, Boca Raton, FL, 1996, pp. 1-476.
17. U. Holzgrabe, I. Wawer, and B. Diehl, NMR Spectroscopy in Drug Development and Analysis, Wiley-VCH, Weinheim, Germany, 1999, pp. 1-299.
18. A. E. Derome, Modern NMR Techniques for Chemistry Research, Pergamon, New York, 1987.
19. H. Gunther, NMR Spectroscopy-An Introduction, John Wiley & Sons, Chichester, UK, 1980, pp. 1-436.
20. K. Pervushin, R. Riek, G. Wider, and K. Wuthrich, Proc. Natl. Acad. Sci. USA, 94, 12366 (1997).
21. P. Hajduk, T. Gerfin, J. Boehlen, M. Haberli, D. Marek, and S. Fesik, J. Med. Chem., 42, 2315 (1999).
22. P. A. Keifer, Prog. Drug Res., 55, 137 (2000).
23. K. Wuthrich, NMR of Proteins and Nucleic Acids, John Wiley & Sons, New York, 1986, pp. 1-292.
24. C. E. Heading, Drugs, 4, 339 (2001).
25. D. S. Wishart, B. D. Sykes, and F. M. Richards, J. Mol. Biol., 222, 311 (1991).
26. D. S. Wishart, C. G. Bigam, A. Holm, R. S. Hodges, and B. D. Sykes, J. Biomol. NMR, 5, 67 (1995).
27. K. J. Nielsen, L. Thomas, R. J. Lewis, P. F. Alewood, and D. J. Craik, J. Mol. Biol., 263, 297 (1996).
28. L. K. MacLachlan, D. A. Middleton, A. J. Edwards, and D. G. Reid in D. G. Reid, Ed., Protein NMR Techniques, Vol. 60, Humana Press, Totowa, NJ, 1997, pp. 337-362.
29. D. S. Wishart, B. D. Sykes, and F. M. Richards, Biochemistry, 31, 1647 (1992).
30. F. Dasgupta, A. K. Mukherjee, and N. Gangadhar, Curr. Med. Chem., 9, 549 (2002).
31. D. J. Craik, K. J. Nielsen, and K. A. Higgins in G. A. Webb, Ed., Annual Reports on NMR Spectroscopy, Vol. 32, Academic Press, San Diego, 1995, pp. 143-213.
32. S. R. Krystek Jr., D. A. Bassolino, J. Novotny, C. Chen, T. M. Marschner, and N. H. Andersen, FEBS Lett., 281, 212 (1991).
33. N. H. Andersen, C. P. Chen, T. M. Marschner, S. R. Krystek Jr., and D. A. Bassolino, Biochemistry, 31, 1280 (1992).
34. V. Saudek, J. Hoflack, and J. T. Pelton, FEBS Lett., 257, 145 (1989).
35. S. Endo, H. Inooka, Y. Ishibashi, C. Kitada, E. Mizuta, and M. Fujino, FEBS Lett., 257, 149 (1989).
36. R. G. Mills, S. I. O'Donoghue, R. Smith, and G. F. King, Biochemistry, 31, 5640 (1992).
37. S. Munro, D. Craik, C. McConville, J. Hall, M. Searle, W. Bicknell, D. Scanlon, and C. Chandler, FEBS Lett., 278, 9 (1991).
38. M. D. Reily and J. B. Dunbar Jr., Biochem. Biophys. Res. Commun., 178, 570 (1991).
39. H. Tamaoki, Y. Kobayashi, S. Nishimura, T. Ohkubo, Y. Kyogoku, K. Nakajima, S. Kumagaye, T. Kimura, and S. Sakakibara, Protein Eng., 4, 509 (1991).
40. A. Aumelas, L. Chiche, S. Kubo, N. Chino, H. Tamaoki, and Y. Kobayashi, Biochemistry, 34, 4546 (1995).
41. A. Aumelas, L. Chiche, E. Mahe, D. Le-Nguyen, P. Sizun, P. Berthault, and B. Perly, Int. J. Pept. Protein Res., 37, 315 (1991).
42. D. C. Dalgarno, L. Slater, S. Chackalamannil, and M. M. Senior, Int. J. Pept. Protein Res., 40, 515 (1992).
43. Y. Boulanger, E. Biron, A. Khiat, and A. Fournier, J. Pept. Res., 53, 214 (1999).
44. K. Arvidsson, T. Nemoto, Y. Mitsui, S. Ohashi, and H. Nakanishi, Eur. J. Biochem., 257, 380 (1998).
45. C. M. Hewage, L. Jiang, J. A. Parkinson, R. Ramage, and I. H. Sadler, J. Pept. Sci., 3, 415 (1997).
46. B. A. Wallace, R. W. Janes, D. A. Bassolino, and S. R. Krystek Jr., Protein Sci., 4, 75 (1995).
47. M. Coles, V. Sowemimo, D. Scanlon, S. L. Munro, and D. J. Craik, J. Med. Chem., 36, 2658 (1993).
48. D. J. Detlefsen, S. E. Hill, S. H. Day, and M. S. Lee, Curr. Med. Chem., 6, 353 (1999).
49. W. C. Patt, J. J. Edmunds, J. T. Repine, K. A. Berryman, B. R. Reisdorph, C. Lee, M. S. Plummer, A. Shahripour, S. J. Haleen, J. A. Keiser, M. A. Flynn, K. M. Welch, E. E. Reynolds, R. Rubin, B. Tobias, H. Hallak, and A. M. Doherty, J. Med. Chem., 40, 1063 (1997).
50. W. C. Patt, X. M. Cheng, J. T. Repine, C. Lee, B. R. Reisdorph, M. A. Massa, A. M. Doherty, K. M. Welch, J. W. Bryant, M. A. Flynn, D. M. Walker, R. L. Schroeder, S. J. Haleen, and J. A. Keiser, J. Med. Chem., 42, 2162 (1999).
51. S. L. Munro, P. R. Andrews, D. J. Craik, and D. J. Gale, J. Pharm. Sci., 75, 133 (1986).
52. B. M. Duggan and D. J. Craik, J. Med. Chem., 39, 4007 (1996).
53. B. M. Duggan and D. J. Craik, J. Med. Chem., 40, 2259 (1997).
54. M. G. Casarotto and D. J. Craik, J. Pharm. Sci., 90, 713 (2001).
55. R. Abseher, L. Horstink, C. W. Hilbers, and M. Nilges, Proteins: Struct., Funct., Genet., 31, 370 (1998).
56. J. Gehrmann, P. F. Alewood, and D. J. Craik, J. Mol. Biol., 278, 401 (1998).
57. J. Balbach, S. Seip, H. Kessler, M. Scharf, N. Kashani-Poor, and J. W. Engels, Proteins, 33, 285 (1998).
58. D. J. Craik, B. M. Duggan, and S. L. A. Munro in M. I. Choudary, Ed., Biological Inhibitors, Vol. 2, Harwood Academic, Amsterdam, 1996, pp. 255-302.
59. R. L. Wagner, J. W. Apriletti, M. E. McGrath, B. L. West, J. D. Baxter, and R. J. Fletterick, Nature, 378, 690 (1995).
60. D. J. Gale, D. J. Craik, and R. T. C. Brownlee, Magn. Reson. Chem., 26, 275 (1988).
61. W. Bourguet, M. Ruff, P. Chambon, H. Gronemeyer, and D. Moras, Nature, 375, 377 (1995).
62. J. P. Renaud, N. Rochel, M. Ruff, V. Vivat, P. Chambon, H. Gronemeyer, and D. Moras, Nature, 378, 681 (1995).
63. J. Feeney and B. Birdsall in G. C. K. Roberts, Ed., NMR of Macromolecules: A Practical Approach, Oxford University Press, Oxford, UK, 1993, pp. 181-215.
64. J. Feeney in I. Bertini, H. Molinari, and N. Niccolai, Eds., NMR and Biomolecular Structure, VCH, New York, 1991, pp. 189-205.
65. J. Feeney, Biochem. Pharmacol., 40, 141 (1990).
66. K. J. Nielsen, D. Adams, L. Thomas, T. Bond, P. F. Alewood, D. J. Craik, and R. J. Lewis, J. Mol. Biol., 289, 1405 (1999).
67. A. P. Campbell and B. D. Sykes, Annu. Rev. Biophys. Biomol. Struct., 22, 99 (1993).
68. B. D. Sykes, Curr. Opin. Biotechnol., 4, 392 (1993).
69. D. J. Craik and K. A. Higgins in G. A. Webb, Ed., Annual Reports on NMR Spectroscopy, Vol. 22, Academic Press, London, 1990, pp. 61-138.
70. G. Bertho, J. Gharbi-Benarous, M. Delaforge, and J. P. Girault, Bioorg. Med. Chem., 6, 209 (1998).
71. R. E. Hubbard, Curr. Opin. Biotechnol., 8, 696 (1997).
72. G. Wider and K. Wuthrich, Curr. Opin. Struct. Biol., 9, 594 (1999).
73. G. M. Clore and A. M. Gronenborn, Curr. Opin. Chem. Biol., 2, 564 (1998).
74. N. Tjandra and A. Bax, Science, 278, 1111 (1997).
75. G. F. King and J. P. Mackay in D. J. Craik, Ed., NMR in Drug Design, CRC Press, Boca Raton, FL, 1996, pp. 101-200.
76. N. Tjandra, A. M. Garrett, A. M. Gronenborn, A. Bax, and G. M. Clore, Nat. Struct. Biol., 4, 443 (1997).
77. A. M. Edwards, C. H. Arrowsmith, D. Christendat, A. Dharamsi, J. D. Friesen, J. F. Greenblatt, and M. Vedadi, Nat. Struct. Biol., 7 (Suppl.), 970 (2000).
78. D. Christendat, A. Yee, A. Dharamsi, Y. Kluger, M. Gerstein, C. H. Arrowsmith, and A. M. Edwards, Prog. Biophys. Mol. Biol., 73, 339 (2000).
79. L. E. Kay, Nat. Struct. Biol., 5, 513 (1998).
80. F. M. Marassi and S. J. Opella, Curr. Opin. Struct. Biol., 8, 640 (1998).
81. A. Watts, Curr. Opin. Biotechnol., 10, 48 (1999).
82. L.-Y. Lian and G. C. K. Roberts in G. C. K. Roberts, Ed., NMR of Macromolecules: A Practical Approach, Oxford University Press, Oxford, UK, 1993.
83. L. Dugad and J. T. Gerig, Biochemistry, 27, 4310 (1988).
84. E. I. Hyde, B. Birdsall, G. C. Roberts, J. Feeney, and A. S. V. Burgen, Biochemistry, 19, 3746 (1980).
85. J. Feeney, J. G. Batchelor, J. P. Albrand, and G. C. K. Roberts, J. Magn. Reson., 33, 519 (1979).
86. S. Pavlopoulos, M. Rose, G. Wickham, and D. J. Craik, Anticancer Drug Des., 10, 623 (1995).
87. G. J. Pelton and D. E. Wemmer, Proc. Natl. Acad. Sci. USA, 86, 5723 (1990).
88. W. Leupin, W. J. Chazin, S. Hyberts, W. A. Denny, and K. Wuthrich, Biochemistry, 25, 5902 (1986).
89. S. M. Chen, W. Leupin, M. Rance, and W. J. Chazin, Biochemistry, 31, 4406 (1992).
90. R. E. Klevit, D. E. Wemmer, and B. R. Reid, Biochemistry, 25, 3296 (1986).
91. D. J. Patel and L. Shapiro, J. Biol. Chem., 261, 1230 (1986).
92. M. S. Searle and K. J. Embrey, Nucleic Acids Res., 18, 3753 (1990).
93. S. W. Fesik, J. R. Luly, J. W. Erickson, and C. Abad-Zapatero, Biochemistry, 27, 8297 (1988).
94. C. Zwahlen, P. Legault, S. J. F. Vincent, J. Greenblatt, R. Konrat, and L. E. Kay, J. Am. Chem. Soc., 119, 6711 (1997).
95. M. J. Gradwell and J. Feeney, J. Biomol. NMR, 7, 48 (1996).
96. K. D. Harshman and P. B. Dervan, Nucleic Acids Res., 13, 4825 (1985).
97. J. G. Pelton and D. E. Wemmer, Biochemistry, 27, 8088 (1988).
98. J. G. Pelton and D. E. Wemmer, Proc. Natl. Acad. Sci. USA, 86, 5723 (1989).
99. M. Coll, J. Aymami, G. A. van der Marel, J. H. van Boom, A. Rich, and A. H. Wang, Biochemistry, 28, 310 (1989).
100. P. E. Pjura, K. Grzeskowiak, and R. E. Dickerson, J. Mol. Biol., 197, 257 (1987).
101. M. K. Teng, N. Usman, C. A. Frederick, and A. H. Wang, Nucleic Acids Res., 16, 2671 (1988).
102. M. A. Carrondo, M. Coll, J. Aymami, A. H. Wang, G. A. van der Marel, J. H. van Boom, and A. Rich, Biochemistry, 28, 7849 (1989).
103. J. R. Quintana, A. A. Lipanov, and R. E. Dickerson, Biochemistry, 30, 10294 (1991).
104. J. A. Parkinson, J. Barber, K. T. Douglas, J. Rosamond, and D. Sharples, Biochemistry, 29, 10181 (1990).
105. K. J. Embrey, M. S. Searle, and D. J. Craik, Eur. J. Biochem., 211, 437 (1993).
106. A. Fede, A. Labhardt, W. Bannwarth, and W. Leupin, Biochemistry, 30, 11377 (1991).
107. A. Fede, M. Billeter, W. Leupin, and K. Wuthrich, Structure, 1, 177 (1993).
108. T. Taga, H. Tanaka, T. Goto, and S. Tada, Acta Crystallogr., C43, 751 (1987).
109. B. E. Bierer, P. K. Somers, T. J. Wandless, S. J. Burakoff, and S. L. Schreiber, Science, 250, 556 (1990).
110. P. Karuso, H. Kessler, and D. F. Mierke, J. Am. Chem. Soc., 112, 9434 (1990).
111. S. W. Fesik, R. T. Gampe Jr., T. F. Holzman, D. A. Egan, R. Edalji, J. R. Luly, R. Simmer, R. Helfrich, V. Klahore, and D. H. Rich, Science, 250, 1406 (1990).
112. S. W. Fesik, R. T. Gampe Jr., H. L. Eaton, G. Gemmecker, E. T. Olejniczak, P. Neri, T. F. Holzman, D. A. Egan, R. Edalji, R. Simmer, R. Helfrich, J. Hochlowski, and M. Jackson, Biochemistry, 31, 6574 (1991).
113. C. Weber, G. Wider, K. von Freyberg, R. Truber, W. Braun, H. Widner, and K. Wuthrich, Biochemistry, 30, 6564 (1991).
114. M. K. Rosen, R. F. Standaert, A. Galat, M. Nakatsuka, and S. L. Schreiber, Science, 248, 863 (1990).
115. T. J. Wandless, S. W. Michnick, M. K. Rosen, M. Karplus, and S. L. Schreiber, J. Am. Chem. Soc., 113, 2339 (1991).
116. A. M. Petros, R. T. Gampe Jr., G. Gemmecker, P. Neri, T. F. Holzman, R. Edalji, J. Hochlowski, M. Jackson, J. McAlpine, J. R. Luly, et al., J. Med. Chem., 34, 2925 (1991).
117. G. D. van Duyne, R. F. Standaert, M. Karplus, S. L. Schreiber, and J. Clardy, Science, 252, 839 (1991).
118. S. W. Michnick, M. K. Rosen, T. J. Wandless, M. Karplus, and S. L. Schreiber, Science, 252, 836 (1991).
119. J. M. Moore, D. A. Peattie, M. J. Fitzgibbon, and J. A. Thomson, Nature, 351, 248 (1991).
120. R. P. Meadows, D. G. Nettesheim, R. X. Xu, E. T. Olejniczak, A. M. Petros, T. F. Holzman, J. Severin, E. Gubbins, H. Smith, and S. W. Fesik, Biochemistry, 32, 754 (1993).
121. J. W. Cheng, C. A. Lepre, S. P. Chambers, J. R. Fulghum, J. A. Thomson, and J. M. Moore, Biochemistry, 32, 9000 (1993).
122. J. W. Cheng, C. A. Lepre, and J. M. Moore, Biochemistry, 33, 4093 (1994).
123. P. R. Gooley, B. A. Johnson, A. I. Marcy, G. C. Cuca, S. P. Salowe, W. K. Hagmann, C. K. Esser, and J. P. Springer, Biochemistry, 32, 13098 (1993).
124. P. R. Gooley, J. F. O'Connell, A. I. Marcy, G. C. Cuca, S. P. Salowe, B. L. Bush, J. D. Hermes, C. K. Esser, W. K. Hagmann, J. P. Springer, and B. A. Johnson, Nat. Struct. Biol., 1, 111 (1994).
125. P. R. Gooley, J. F. O'Connell, A. I. Marcy, G. C. Cuca, M. G. Axel, C. G. Caldwell, W. K. Hagmann, and J. W. Becker, J. Biomol. NMR, 7, 8 (1996).
126. S. R. Van Doren, A. V. Kurochkin, Q.-Z. Ye, L. L. Johnson, D. J. Hupe, and E. R. P. Zuiderweg, Biochemistry, 32, 13109 (1993).
127. S. R. Van Doren, A. V. Kurochkin, W. Hu, Q.-Z. Ye, L. L. Johnson, D. J. Hupe, and E. R. Zuiderweg, Protein Sci., 4, 2487 (1995).
128. M. A. McCoy, M. J. Dellwo, D. M. Schneider, T. M. Banks, J. Falvo, K. J. Vavra, A. M. Mathiowetz, M. W. Qoronfleh, R. Ciccarelli, E. R. Cook, T. A. Pulvino, R. C. Wahl, and H. Wang, J. Biomol. NMR, 9, 11 (1997).
129. F. J. Moy, M. R. Pisano, P. K. Chanda, C. Urbano, L. M. Killar, M. L. Sung, and R. Powers, J. Biomol. NMR, 10, 9 (1997).
160. M. Mayer and B. Meyer, Angew. Chem. Int. Ed. Engl., 38, 1784 (1999).
161. J. Klein, R. Meinecke, M. Mayer, and B. Meyer, J. Am. Chem. Soc., 121, 5336 (1999).
162. W. Hellebrandt, T. Haselhorst, T. Kohli, E. Bauml, and T. Peters, J. Carbohydr. Chem., 19, 769 (2000).
163. M. Vogtherr and T. Peters, J. Am. Chem. Soc., 122, 6093 (2000).
164. H. Maaheimo, P. Kosma, L. Brade, H. Brade, and T. Peters, Biochemistry, 39, 12778 (2000).
165. M. Mayer and B. Meyer, J. Am. Chem. Soc., 123, 6108 (2001).
166. R. Meinecke and B. Meyer, J. Med. Chem., 44, 3059 (2001).
167. C. Dalvit, P. Pevarello, M. Tato, M. Veronesi, A. Vulpetti, and M. Sundstrom, J. Biomol. NMR, 18, 65 (2000).
168. N. Gonnella, M. Lin, M. J. Shapiro, J. R. Wareing, and X. Zhang, J. Magn. Reson., 131, 336 (1998).
169. A. Chen and M. J. Shapiro, J. Am. Chem. Soc., 120, 10258 (1998).
170. A. Chen, C. S. Johnson Jr., M. Lin, and M. J. Shapiro, J. Am. Chem. Soc., 120, 9094 (1998).
171. A. Chen and M. J. Shapiro, J. Am. Chem. Soc., 121, 5338 (1999).
172. A. Chen and M. J. Shapiro, Anal. Chem., 71, 669A (1999).
173. M. Lin and M. J. Shapiro, J. Org. Chem., 61, 7617 (1996).
174. M. Lin, M. J. Shapiro, and J. R. Wareing, J. Am. Chem. Soc., 119, 5249 (1997).
175. M. Lin, M. J. Shapiro, and J. R. Wareing, J. Org. Chem., 62, 8930 (1997).
176. K. Bleicher, M. Lin, M. J. Shapiro, and J. R. Wareing, J. Org. Chem., 63, 8486 (1998).
177. P. J. Hajduk, E. T. Olejniczak, and S. W. Fesik, J. Am. Chem. Soc., 119, 12257 (1997).
178. D. Henrichson, B. Ernst, J. L. Magnani, W. Wang, B. Meyer, and T. Peters, Angew. Chem. Int. Ed. Engl., 38, 98 (1999).
179. B. Meyer, T. Weimar, and T. Peters, Eur. J. Biochem., 246, 705 (1997).
180. M. Mayer and B. Meyer, J. Med. Chem., 43, 2093 (2000).
181. A. Chen and M. J. Shapiro, J. Am. Chem. Soc., 122, 414 (2000).
182. S. Bagby, K. I. Tong, D. Liu, J. R. Alattia, and M. Ikura, J. Biomol. NMR, 10, 279 (1997).
183. C. Lepre and J. Moore, J. Biomol. NMR, 12, 493 (1998).
184. S. Bagby, K. I. Tong, and M. Ikura, Methods Enzymol., 339, 20 (2001).
185. C. A. Lepre, Drug Discovery Today, 6, 133 (2001).
186. C. A. Lipinski, F. Lombardo, B. W. Dominy, and P. J. Feeney, Adv. Drug Delivery Rev., 46, 3 (2001).
187. A. K. Ghose, V. N. Viswanadhan, and J. J. Wendoloski, J. Comb. Chem., 1, 55 (1999).
188. T. I. Oprea, J. Gottfries, V. Sherbukhin, P. Svensson, and T. C. Kuhler, J. Mol. Graph. Model., 18, 512 (2000).
189. J. Xu and J. Stevenson, J. Chem. Inf. Comput. Sci., 40, 1177 (2000).
190. G. Bemis and M. Murcko, J. Med. Chem., 39, 2887 (1996).
191. G. Bemis and M. Murcko, J. Med. Chem., 42, 5095 (1999).
192. M. Hann, B. Hudson, X. Lewell, R. Lifely, L. Miller, and N. Ramsden, J. Chem. Inf. Comput. Sci., 39, 897 (1999).
193. A. Ross and H. Senn, Drug Discovery Today, 6, 583 (2001).
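Worked example for Section 4.2.2.2 (Mixture Design). The statement that the number of experiments is smallest when the mixture size equals 1/(hit rate)^(1/2) can be checked with a short calculation. The sketch below is only an illustration under an assumed cost model that is not given in the chapter: each mixture is measured once, and every hit causes its mixture of n compounds to be rescreened one compound at a time. The function names are hypothetical.

```python
import math


def experiments_per_compound(n: int, hit_rate: float) -> float:
    """Approximate screening cost per library compound for mixtures of size n.

    Assumed model (not taken from the chapter): each mixture is measured once,
    costing 1/n experiments per compound, and each hit forces its mixture of n
    compounds to be rescreened individually, costing about hit_rate * n.
    """
    return 1.0 / n + hit_rate * n


def continuous_optimum(hit_rate: float) -> float:
    """Mixture size minimizing the model above: 1 / sqrt(hit rate)."""
    return 1.0 / math.sqrt(hit_rate)


def optimal_mixture_size(hit_rate: float, max_size: int = 100) -> int:
    """Best whole-number mixture size under the same assumed model."""
    return min(range(1, max_size + 1),
               key=lambda n: experiments_per_compound(n, hit_rate))


if __name__ == "__main__":
    for hit_rate in (0.10, 0.01):
        print(hit_rate,
              round(continuous_optimum(hit_rate), 2),
              optimal_mixture_size(hit_rate))
```

Under this model a 10% hit rate gives a continuous optimum of about 3.2 and a best whole-number mixture size of three, in line with the value quoted from (185). A more detailed model that accounts for several hits falling in the same mixture would shift the optimum slightly.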
CHAPTER THIRTEEN
Contents
1 Introduction
2 Current Trends and Recent Developments
2.1 LC-MS Purification of Combinatorial Libraries
2.2 Confirmation of Structure and Purity of Combinatorial Compounds
2.3 Encoding and Identification of Compounds in Combinatorial Libraries and Natural Product Extracts
2.4 Mass Spectrometry-Based Screening
2.4.1 Affinity Chromatography-Mass Spectrometry
2.4.2 Gel Permeation Chromatography-Mass Spectrometry
2.4.3 Affinity Capillary Electrophoresis-Mass Spectrometry
2.4.4 Frontal Affinity Chromatography-Mass Spectrometry
2.4.5 Bioaffinity Screening using Electrospray FTICR Mass Spectrometry
2.4.6 Pulsed Ultrafiltration-Mass Spectrometry
2.4.7 Solid Phase Mass Spectrometric Screening
3 Things to Come
4 Web Site Addresses and Recommended Reading for Further Information
5 Acknowledgments
searchers in the petroleum industry (1), CI became another standard ionization technique for organic mass spectrometry. During CI, high energy electrons (as in EI) are used to ionize a gas called a reagent gas at a constant pressure (usually ~1 Torr) in the mass spectrometer ionization source. The reagent gas in turn ionizes the sample molecules through ion-molecule reactions that usually involve the exchange of protons. Less frequently, sample molecule ionization might involve a charge exchange. Two of the most common ionization mechanisms in CI are summarized in Equations 13.3 and 13.4.
$\mathrm{M} + \mathrm{RH}^{+} \rightarrow \mathrm{MH}^{+} + \mathrm{R}$   CI through proton transfer, R = reagent gas (13.3)
$\mathrm{M} + \mathrm{R}^{+\cdot} \rightarrow \mathrm{M}^{+\cdot} + \mathrm{R}$   CI through charge exchange (13.4)
During the 1960s, high resolution double-focusing magnetic sector instruments became available and are now standard tools for the determination of elemental compositions using a type of analysis called exact mass measurement. In mass spectrometry, resolution is defined as M/ΔM, where M is the m/z value of a singly charged ion, and ΔM is the difference (measured in m/z) between M and the next highest ion. Alternatively, ΔM may be defined in terms of the width of the peak. High resolution is typically regarded as a value of at least 10,000. At this resolution, the molecular ions of most drug-like molecules (that is, compounds with molecular weights less than ~500) can be resolved from each other. After resolving a sample ion from others in a mass spectrum, an exact mass measurement may be carried out by accurately weighing the unknown ion and comparing its m/z value to that of a calibration standard. Since the 1960s, other types of mass spectrometers capable of high resolution exact mass measurements have become available as commercial products, including Fourier transform ion cyclotron resonance (FTICR) mass spectrometers, reflectron TOF instruments, and recently, quadrupole time-of-flight hybrid (QqTOF) mass spectrometers (see Table 13.1 for a listing of types of organic mass spectrometers and a comparison of their performance characteristics). By the early 2000s, FTICR and QqTOF instruments became more popular than magnetic sector mass spectrometers for exact mass measurements, high resolution measurements, and drug discovery applications. As will be discussed below, exact mass measurements are essential to many types of mass spectrometry-based screening and drug discovery today.
Biomedical applications of mass spectrometry began during the 1960s both at academic institutions and pharmaceutical companies. These applications depended on the volatilization (usually by heating) of pharmaceutical compounds and biochemicals before their gas-phase ionization using EI or CI. To increase the thermal stability and volatility of these compounds, a variety of derivatization methods were developed to mask polar functional groups and reduce hydrogen bonding between molecules. These methods were particularly effective for use with gas chromatography-mass spectrometry (GC-MS), which was introduced during the 1960s as a practical and powerful tool for qualitative and quantitative
Mass Spectrometry and Drug Discovery
analysis of compounds in mixtures. Both EI and CI were immediately useful for GC-MS, because both of these ionization methods require that the analytes be in the gas phase. When capillary GC was incorporated into GC-MS, this technique reached maturity. GC-MS may be used to select, identify, and quantify organic compounds in complex mixtures at the femtomole level. The speed of GC-MS is determined by the chromatography step, which typically requires several minutes to 1 h per analysis. By the 1970s, some organic chemists were announcing that organic mass spectrometry had reached maturity and that no new applications were possible. Like the physicists and physical chemists who had pronounced the end of mass spectrometry a generation earlier, this group would soon be proved wrong.
Although GC-MS remains important for the analysis of many organic compounds, this technique is limited to volatile and thermally stable compounds that comprise only a small fraction of all organic compounds and even fewer biomedically important molecules. Therefore, thermally unstable compounds, including many pharmaceutical compounds such as nucleic acid analogs and biomolecules such as proteins, carbohydrates, and nucleic acids, cannot be analyzed in their native forms using GC-MS. (For more details regarding GC-MS and its applications, see Watson 1997, Section 4.) Although derivatization facilitates the GC-MS analysis of many of these compounds, alternative ionization techniques were needed for the analysis of the vast majority of polar and non-volatile compounds of interest to drug discovery.
During the 1970s and early 1980s, desorption ionization techniques such as field desorption (FD), desorption EI, desorption CI (DCI), and laser desorption were developed to extend the use of mass spectrometry toward the analysis of more polar and less volatile compounds (see Watson 1997, Section 4, for more information regarding desorption ionization techniques including DCI and FD). Although these techniques helped extend the mass range of mass spectrometry beyond a traditional limit of m/z 1000 and toward ions of m/z 5000, the first breakthrough in the analysis of polar, non-volatile compounds occurred in 1982 with the invention of fast atom bombardment (FAB) (2). FAB and its counterpart, liquid secondary ion mass spectrometry (LSIMS), facilitated the formation of abundant molecular ions, protonated molecules, and deprotonated molecules of non-volatile and thermally labile compounds such as peptides, chlorophylls, and complex lipids up to approximately m/z 12,000. FAB and LSIMS use energetic particle bombardment (fast atoms or ions from 3 to 30,000 V of energy) to ionize compounds dissolved in non-volatile matrices such as glycerol or 3-nitrobenzyl alcohol and desorb them from this condensed phase into the gas phase for mass spectrometric analysis (see Fig. 13.1). Protonated or deprotonated molecules are usually abundant and fragmentation is minimal.
Figure 13.1. Scheme for desorption ionization using FAB or LSIMS from a liquid matrix (0).
Introduced in the late 1980s, matrix-assisted laser desorption ionization (MALDI) has helped solve the mass limit barriers of laser desorption mass spectrometry so that singly charged ions may be obtained up to m/z 500,000 and sometimes higher (3). For most commercially available MALDI mass spectrometers, ions up to m/z 200,000 are readily obtained. Like FAB and LSIMS, MALDI samples are mixed with a matrix to form a solution that is loaded onto the sample stage for analysis. Unlike the other matrix-mediated techniques, the solvent is evaporated before MALDI analysis, leaving sample molecules trapped in crystals of solid phase matrix. The MALDI matrix is selected to absorb the pulse of laser light directed at the sample. Most MALDI mass spectrometers are equipped with a pulsed UV laser, although IR lasers are available as an option on some commercial instruments. Therefore, matrices are often substituted benzenes or benzoic acids with strong UV absorption properties. During MALDI, the energy of the short but intense UV laser pulse obliterates the matrix and in the process desorbs and ionizes the sample. Like FAB and LSIMS, MALDI typically produces abundant protonated or deprotonated molecules with little fragmentation.
By the time that GC-MS had become a standard technique in the late 1960s, LC-MS was still in the developmental stages. Producing gas-phase sample ions for analysis in a vacuum system while removing the high performance
liquid chromatography (HPLC) mobile connected to a vacuum pump. As the droplets
phase proved to be a challenging task. Early evaporate, aggregates of analyte (particles)
LC-MS techniques included a moving belt in- form and pass through a momentum separa-
terface to desolvate and transport the HPLC tor that removes the lower molecular weight
eluate into an CI or EI ion source or a direct solvent molecules. Finally, the particle beam
inlet system in which the eluate was pumped enters the mass spectrometer ion source
at a low flow rate (1-3 pL/min) into a CI where the aggregates strike a heated plate
source. However, neither of these systems was from which the analyte molecules evaporate
robust enough or suitable for a broad enough and are ionized using conventional EI or CI.
range of samples to gain widespread accep- Particle beam LC-MS is limited to the analysis,
tance. of volatile and thermally stable compounds
Because FAB (or LSIMS) requires that the that are amenable to flash evaporation and EI
analyte be dissolved in a liquid matrix, this or CI mass spectrometry. Therefore, this ap-
ionization technique was easily adapted for in- proach is not used for polar biochemicals such
fusion of solution-phase samples into the FAB as carbohydrates, sugars, peptides, proteins,
ionization source in an approach known as or nucleic acids.
continuous-flow FAB. Then, continuous-flow Because thermospray became the first
FAB was connected to microbore HPLC col- widely used LC-MS technique (during the late
umns for LC-MS applications (4). Because this 1970s and early 1980s), this technique should
method is limited to microbore HPLC applica- be mentioned here. Thermospray facilitates
tions at flow rates of <10 pL/min and requires the interfacing of standard analytical HPLC
considerable operator intervention, it is not systems at flow rates up to 1 mL/min with
ideal for the analysis of large sample sets. In- mass spectrometers. Although the interface
stead, more robust techniques have been de- between the HPLC and mass spectrometer is
veloped to fulfill this requirement. However, inefficient and exhibits low sensitivity for
continuous-flow FAB is still in use in some most analytes, thermospray has been useful
laboratories. for the LC-MS analysis of many types of small
Like continuous-flow FAB, the popularity molecules. During thermospray, the HPLC el-
of particle beam interfaces is diminishing, but uate is sprayed through a heated capillary into
systems are still available from commercial a heated desolvation chamber at reduced pres-
sources. During particle beam LC-MS, the sure. Gas phase ions remaining after desolva-
HPLC eluate is sprayed into a heated chamber tion of the droplets are extracted through a
Mass Spectrometry and Drug Discovery
skimmer into the mass spectrometer for analysis. The sensitivity of thermospray is poor
because there is no mechanism or driving force to enhance the number of sample ions
entering the gas phase from the spray during desolvation. Also, thermally labile compounds
tend to decompose in the heated source. These problems were solved when thermospray was
replaced by electrospray during the late 1980s.

During the 1990s, electrospray and atmospheric pressure chemical ionization (APCI) became
the standard interfaces for LC-MS. Today, APCI and electrospray ionization are the most
widely used ionization sources and HPLC interfaces for drug discovery using mass
spectrometry. Unlike thermospray, particle beam or continuous-flow FAB, electrospray and
APCI interfaces operate at atmospheric pressure and do not depend on vacuum pumps to remove
solvent vapor. As a result, they are compatible with a wide range of HPLC flow rates. Also,
no matrix is required. Both APCI and electrospray are compatible with a wide range of HPLC
columns and solvent systems. Like all LC-MS systems, the solvent system should contain only
volatile solvents, buffers or ion pair agents to reduce fouling of the mass spectrometer
ion source. In general, APCI and electrospray form abundant molecular ion species. When
fragment ions are formed, they are usually more abundant in APCI than electrospray mass
spectra.

The APCI interface uses a heated nebulizer to form a fine spray of the HPLC eluate, which
is much finer than the particle beam system but similar to that formed during thermospray.
A cross-flow of heated nitrogen gas is used to facilitate the evaporation of solvent from
the droplets. The resulting gas-phase sample molecules are ionized by collisions with
solvent ions, which are formed by a corona discharge in the atmospheric pressure chamber.
Molecular ions, M+• or M-•, and/or protonated or deprotonated molecules can be formed. The
relative abundance of each type of ion depends on the sample itself, the HPLC solvent, and
the ion source parameters. Next, ions are drawn into the mass spectrometer analyzer for
measurement through a narrow opening or skimmer that helps the vacuum pumps to maintain
very low pressure inside the analyzer, while the APCI source remains at atmospheric
pressure. For example, the positive ion APCI mass spectrum of lycopene is shown in
Fig. 13.2. The carotenoid lycopene is the red pigment of ripe tomatoes and is under
clinical investigation for the prevention of prostate cancer (5).

Figure 13.2. Positive ion APCI mass spectrum of the red carotenoid lycopene in a solution
of methanol and tert-butyl methyl ether (1:1; v/v). In this analysis, lycopene formed a
protonated molecule instead of a molecular ion,

During electrospray, the HPLC eluate is sprayed through a capillary electrode at high
potential (usually 2000-7000 V) to form a fine mist of charged droplets at atmospheric
pressure. As the charged droplets migrate towards the opening of the mass spectrometer
because of electrostatic attraction, they encounter a cross-flow of heated nitrogen that
increases solvent evaporation and prevents most of the solvent molecules from entering the
mass spectrometer. Molecular ions, protonated or deprotonated molecules, and cationized
species such as [M + Na]+ and [M + K]+ can be
formed. (For additional information on electrospray ionization, see Cole 1997, Section 4).

In addition to singly charged ions, electrospray is unique as an ionization technique in
that multiply charged species are common and often constitute the majority of the sample
ion abundance. The relative abundance of each of these species depends on the chemistry of
the analyte, the pH, the presence of proton donating or accepting species, and the levels
of trace amounts of sodium or potassium salts in the mobile phase. In contrast, APCI,
MALDI, EI, CI, and FAB/LSIMS usually produce singly charged species. A consequence of
forming multiply charged ions is that they are detected at lower m/z values (i.e., z > 1)
than the corresponding singly charged species. This has the benefit of allowing mass
spectrometers with modest m/z ranges to detect and measure ions of molecules with very high
masses. For example, electrospray has been used to measure ions with molecular weights of
hundreds of thousands or even millions of Daltons on mass spectrometers with m/z ranges of
only a few thousand. (For a review of LC-MS techniques, see Niessen 1999, Section 4.)

An example of the C18 reversed phase HPLC-negative ion electrospray mass spectrometric
(LC-MS) analysis of an extract of the botanical, Trifolium pratense L. (red clover), is
shown in Fig. 13.3. Extracts of red clover are used as dietary supplements by menopausal
and postmenopausal women and are under investigation as alternatives to estrogen
replacement therapy (6). The two-dimensional map illustrates the amount of information that
may be acquired using hyphenated techniques such as LC-MS. In the time dimension,
chromatograms are obtained, and a sample computer-reconstructed mass chromatogram is shown
for the signal at m/z 269. An intense chromatographic peak was detected eluting at
12.4 min. In the m/z dimension, the negative ion electrospray mass spectrum recorded at
12.4 min shows a base peak at m/z 269. Based on comparison with authentic standards (data
not shown), the ion of m/z 269 was found to correspond to the deprotonated molecule of
genistein, which is an estrogenic isoflavone (6). Because almost no fragmentation of the
genistein ion was observed, additional characterization would require CID and MS-MS as
discussed in the next section.

[Figure 13.3 panels: computer-reconstructed mass chromatogram of m/z 269 (x-axis: retention
time, min); mass spectrum at 12.4 min with the peak at m/z 269 labeled [M-H]-.]
Figure 13.3. Two-dimensional map showing the LC-MS analysis of an extract of red clover
under investigation for the management of menopause. Reversed phase separation was carried
out using a C18 HPLC column in the time dimension and negative ion electrospray mass
spectrometry was used for compound detection and molecular weight determination in the
second dimension.

When analyzing complex mixtures such as the botanical extract shown in Fig. 13.3, the use
of chromatographic separation before mass spectrometric ionization and analysis is
essential to distinguish between isomeric compounds. Even simple mixtures of synthetic
compounds might contain isomers that would require LC-MS for adequate characterization.
Another problem overcome by using a chromatography step before mass spectrometric analysis
is ion suppression. No matter what ionization technique is used, the presence of multiple
compounds in the ion source might enhance the ionization of one compound while suppressing
the ionization of another. Usually, only some of the compounds in a complex mixture can be
detected by mass spectrometry without chromatographic separation. The presence of salts and
buffers in a sample can also suppress sample ionization. Therefore, LC-MS has become a
powerful tool for analyzing natural products, synthetic organic compounds, and
pharmaceutical agents and their metabolites.

In general, APCI facilitates the ionization of non-polar and low molecular weight species,
and electrospray is more useful for the ionization of polar and high molecular weight
compounds. In this sense, APCI and electrospray are often complementary ionization
techniques. However, during the analysis of large or diverse combinatorial libraries, both
polar and non-polar compounds are usually present. As a result, no one set of ionization
conditions using APCI or electrospray is adequate to detect all the compounds contained in
the library of compounds. Therefore, a UV ionization technique called atmospheric pressure
photoionization (APPI) has been developed for use with combinatorial libraries and LC-MS
(7). Recently, APPI became a commercially available ionization alternative to APCI and
electrospray. During APPI, a liquid solution or HPLC eluate is sprayed at atmospheric
pressure, as in APCI. Instead of using a corona discharge as in APCI, ionization occurs
during APPI because of irradiation of the analyte molecules by an intense UV light source.
Obviously, the carrier solvent must not absorb UV light at the same wavelengths, or
interference would prevent sample ionization and detection. The use of APPI as an
alternative to APCI and electrospray for drug discovery applications is under
investigation.

Desorption ionization techniques like FAB, MALDI, and electrospray facilitate the
molecular weight determination of a wide range of polar and non-polar, low and high
molecular weight compounds including drugs and drug targets such as proteins. However, the
"soft" ionization character of these techniques means that most of the ion current is
concentrated in molecular ions, and few structurally significant fragment ions are formed.
To enhance the amount of structural information in these mass spectra, CID may be used to
produce more abundant fragment ions from molecular ion precursors formed and isolated
during the first stage of mass spectrometry. Then, a second mass spectrometry analysis may
be used to characterize the resulting product ions. This process is called tandem mass
spectrometry or MS-MS and is illustrated in Fig. 13.4.

Figure 13.4. Scheme illustrating the selectivity of MS-MS and the process by which CID
facilitates fragmentation of preselected ions. Negative ion electrospray tandem mass
spectrum of lycopene. CID was used to induce fragmentation of the molecular ion of m/z 536.
As a result, the fragment ion of m/z 467 was formed by the loss of a terminal isoprene
unit. This fragment ion may be used to distinguish lycopene from isomeric α-carotene and
β-carotene, which lack terminal isoprene groups.

Another advantage of the use of tandem mass spectrometry is the ability to isolate a
particular ion such as the molecular ion of the analyte of interest during the first mass
spectrometry stage. This precursor ion is essentially purified in the gas-phase and free of
impurities such as solvent ions, matrix ions, or other analytes. Finally, the selected ion
is fragmented using CID and analyzed using a second mass spectrometry stage. In this
manner, the resulting tandem spectrum contains exclusively analyte ions without impurities
that might interfere with the interpretation of the fragmentation patterns. In summary, CID
may be used with LC-MS-MS or desorption ionization and MS-MS to obtain structural
information such as amino acid sequences of peptides and sites of alkylation of nucleic
acids, or to distinguish structural isomers such as β-carotene and lycopene. Beginning in
2001, TOF-TOF tandem mass spectrometers became available from instrument manufacturers.
These instruments have the potential to deliver high resolution tandem mass spectra with
high speed that should be compatible with the chip-based chromatography systems now under
development.

Over the course of the last century, mass spectrometry has become an essential analytical
tool for a wide variety of biomedical applications including drug discovery and
development. By combining mass spectrometry with chromatography as in LC-MS or by adding
another stage of mass spectrometry as in MS-MS, the selectivity of the technique increases
considerably. As a result, mass spectrometry offers all of the analytical elements that are
essential to modern drug discovery, namely speed, sensitivity, and selectivity.

2 CURRENT TRENDS AND RECENT DEVELOPMENTS

Since the early 1990s, pharmaceutical research has focused on combinatorial chemistry
(8, 9) and high-throughput screening (10) in an effort to accelerate the pace of drug
discovery. The goal has been to produce, in a short time, large numbers of synthetic
organic compounds representing a great diversity of chemical structures through a process
called combinatorial chemistry and then quickly screen them in vitro against
pharmacologically significant targets such as enzymes or receptors. The "hits" identified
through these high-throughput screens may then be optimized by quickly and efficiently
synthesizing and then screening large numbers of analogs called targeted or directed
libraries. As a result, lead compounds might emerge from such combinatorial chemistry drug
discovery programs in a few weeks instead of several years. Furthermore, a single organic
chemist using combinatorial synthetic methods might synthesize thousands of compounds or
more in a single week instead of fewer than five in the same time using conventional
techniques, and a single medicinal chemist might identify hundreds of lead compounds per
month instead of just one or two in the same period of time.

Accompanying this new drug discovery paradigm, new scientific journals have been
established such as Combinatorial Chemistry & High Throughput Screening, Journal of
Combinatorial Chemistry, Journal of Biomolecular Screening, and Molecular Diversity (see
list of journal websites in Section 4). The variety of topics published in these journals
reflects the multidisciplinary nature of the current drug discovery process and ranges from
organic chemistry, medicinal chemistry, molecular modeling, molecular biology, and
pharmacology, to analytical chemistry. As described below, the most significant analytical
component of drug discovery has become mass spectrometry. Only mass spectrometry has become
an essential element at all stages of the drug discovery and development process.

Although a variety of spectroscopic and chromatographic techniques, including infrared
spectroscopy, nuclear magnetic resonance spectroscopy, fluorescence spectroscopy, gas
chromatography, HPLC, and mass spectrometry, are being used to support drug discovery in
various capacities, some of them, such as gas chromatography and fluorescence spectroscopy,
are not applicable to most new chemical entities, some are not specific enough for chemical
identification (e.g., infrared spectroscopy), and other techniques suffer from low
throughput (e.g., nuclear magnetic resonance spectroscopy). Unlike gas chromatography, HPLC
is compatible with virtually all drug-like molecules without the need for chemical
derivatization to increase thermal stability or volatility. In addition, mass spectrometry
provides a universal means to characterize and distinguish drugs based on both molecular
weight and structural features while at the same time providing high throughput. With the
development of routine LC-MS interfaces and ionization techniques such as electrospray and
APCI, mass spectrometry has also become an ideal HPLC detector for the analysis of
combinatorial libraries (11), and LC-MS, MS-MS, and LC-MS-MS have become fundamental tools
in the analysis of combinatorial libraries and subsequent drug development studies (12-14).

The application of combinatorial chemistry and high-throughput screening to drug discovery
has altered the traditional serial process of lead identification and optimization that
previously required years of human effort. Consequently, neither the synthesis of new
chemical entities nor their screening is limiting the pace of drug discovery. Instead, a
new bottleneck is the verification of the structure and purity of each compound in a
combinatorial library or of each lead compound obtained from an uncharacterized library
using high-throughput screening. Because the number of lead compounds entering the drug
development process has increased, in part because compounds are entering development at
earlier stages than in the past, the traditional drug development investigations concerning
absorption, distribution, metabolism, and excretion (ADME) and even toxicology evaluations
of new drug entities have become additional bottlenecks. As a solution to the drug
development bottlenecks, high-throughput assays to assess the metabolism, bioavailability,
and toxicity of lead compounds are being developed and applied earlier than ever during the
drug discovery process, so that only those compounds most likely to become successful drugs
enter the more expensive and slower preclinical pharmacology and toxicology studies. In
support of these new combinatorial chemistry synthetic programs and new high-throughput
assays, mass spectrometry has emerged as the only analytical technique with sufficient
throughput, sensitivity, selectivity, and robustness to address all of these bottlenecks.

2.1 LC-MS Purification of Combinatorial Libraries

Although combinatorial libraries were originally synthesized as mixtures, today most
libraries are prepared in parallel as discrete compounds and then screened individually in
microtiter plates of 96-well, 384-well, or 1536-well formats. To facilitate subsequent
structure-activity analyses and to assure the validity of the screening results, many
laboratories verify the structure and purity of each compound before high-throughput
screening. Semi-preparative HPLC has become the most
popular technique for the purification of combinatorial libraries on the milligram scale
because of high throughput and the ease of automation. Typically during semi-preparative
HPLC, fraction collection is initiated whenever a UV signal is observed above a
predetermined threshold. This procedure usually results in the collection of several
fractions per analysis and hence creates additional issues such as the need for large
fraction collector beds and the need for secondary analysis using flow-injection mass
spectrometry, LC-MS, or LC-MS-MS to identify the appropriate fractions. When purification
of large numbers of combinatorial libraries is required, this approach can become
prohibitively time consuming and expensive.

To enhance the efficiency of this purification procedure, the steps of HPLC purification
and mass spectrometric analysis may be combined into automated mass-directed fractionation
(15-17). Any size HPLC column may be used, and only a small fraction of the eluant
(~µL/min) is diverted to the mass spectrometer equipped for APCI or electrospray
ionization. Because all of the components, including autosampler, injector, HPLC, switching
valve, mass spectrometer, and fraction collector, are controlled by computer, the procedure
may be fully automated. For greatest efficiency, the system may be programmed to collect
only those peaks displaying the desired molecular ions, or alternatively, all peaks
displaying abundant ions within a specified mass range. An example of the MS-guided
purification of a compound synthesized during the parallel synthesis of a combinatorial
library of discrete compounds is shown in Fig. 13.5. Although the crude yield of the
reaction product was only 30% (Fig. 13.5a), the desired product was detected based on its
molecular ion (Fig. 13.5b). After MS-guided fractionation, re-analysis using LC-MS showed
that the desired product was >90% pure (Fig. 13.5c).

[Figure 13.5 panel labels: 50 mg injection; desired product; threshold for fraction
collection; purity = 90.7%; x-axis: time, min.]
Figure 13.5. Mass-directed purification of a combinatorial library. Chromatographic
separation was carried out using gradient elution of 10-90% acetonitrile in water for 7 min
after an initial hold at 10% acetonitrile for 1 min. (a) Total ion chromatogram showing
desired product and impurities. (b) Computer-reconstructed ion chromatogram (RIC)
corresponding to the expected product. (c) Post-purification analysis of the isolated
component with a purity >90%. (Reproduced from Ref. 15 by

The use of MS-guided purification of combinatorial libraries provides a means for reducing
the number of HPLC fractions collected per sample and eliminates the need for
post-purification analysis to further characterize and identify each compound as would be
necessary when using UV-based fractionation. The ionization technique (i.e., electrospray,
APCI, or APPI), and ionization mode (positive or negative) must be suitable for the
combinatorial compound so that molecular ion species are formed. Also, a suitable mobile
phase and HPLC column must be selected. As an alternative to HPLC, supercritical fluid
chromatography-mass spectrometry (SFC-MS) has been used for the high-throughput analysis of
combinatorial libraries (18, 19). The advantages of SFC-MS relative to conventional LC-MS
for the purification of combinatorial libraries of compounds are the lower viscosities and
higher diffusivities of condensed CO2 compared with HPLC mobile phases and the ease of
solvent removal and disposal after analysis. However, SFC instrumentation remains more
expensive and less widely available than conventional HPLC systems.

2.2 Confirmation of Structure and Purity of Combinatorial Compounds

The determination of molecular weights, elemental compositions, and structures of
compounds used for high-throughput screening, whether discrete compounds or combinatorial
library mixtures, is typically carried out using mass spectrometry, because traditional
spectroscopic and gravimetric techniques are too slow to keep pace with combinatorial
chemical synthesis. In addition, mass spectrometry may be used to assess the purity of
compounds being used for high-throughput screening. The highest-throughput technique for
confirming molecular weights and structures of drug candidates is flow injection analysis
of sample solutions using electrospray, APCI, or APPI mass spectrometry. Typically, no
sample preparation is necessary.

Although any organic mass spectrometer may be used to confirm the molecular weight of a
compound, tandem mass spectrometers provide additional structural information through the
use of CID to produce fragment ions. As discussed above (see also Table 13.1), tandem mass
spectrometers include triple quadrupole instruments, QqTOF mass spectrometers, ion trap
mass spectrometers, multiple sector magnetic sector instruments, FTICR instruments, and the
new TOF-TOF mass spectrometers. In most applications, APCI or electrospray ionization is
used.

In addition to molecular weight and fragmentation patterns, high precision and high
resolution mass spectrometers such as QqTOF instruments, reflectron TOF mass spectrometers,
double focusing magnetic sector mass spectrometers, and FTICR instruments are necessary for
the measurement of exact masses of drugs and drug candidates for the determination of
elemental compositions. The combination of high resolution and high precision is especially
useful for determining the elemental compositions of compounds in combinatorial library
mixtures without having to isolate each compound using chromatography or some other
separation technique. Because FTICR instruments and the hybrid QqTOF mass spectrometers are
capable of simultaneously measuring exact masses at high resolution of both molecular ions
and fragment ions generated during MS-MS, these instruments are becoming extremely popular
within drug discovery programs.

As an example of the exact mass measurement of a combinatorial library mixture, the FTICR
negative ion electrospray mass spectra of a 36- and a 120-compound peptide library mixture
are shown in Fig. 13.6. The resolution achieved in this experiment was 20,000-40,000.
Although the exact masses of all components in a small combinatorial library can often be
measured during a single infusion experiment, on-line HPLC separation or the analysis of
discrete compounds is sometimes required to overcome ion suppression problems. However,
LC-MS is a relatively slow process because of the slow chromatographic separation step.
Because LC-MS is required in many instances for the analysis of mixtures and to eliminate
interfering salts or buffers, two approaches have emerged to increase the throughput of
this technique: parallel LC-MS and fast LC-MS. One approach to increasing
throughput of the rate-limiting chromatographic separation has been to simultaneously
interface multiple HPLC columns to a single mass spectrometer. This approach is called
parallel LC-MS. Commercial parallel electrospray interfaces and HPLC systems are now
available that can accommodate up to eight HPLC columns simultaneously (20-22). Although
the multiple sprays are introduced to the ion source simultaneously, these streams

[Figure 13.6 panel (b) labels as extracted: ThrMe-X-Asp; Asp-X-Asp; m/z 637.3136, 639.2922,
639.2559; mass differences 2.5 ppm and 3.8 ppm.]
Figure 13.6. (a) Partial negative ion electrospray mass spectrum of a 36-component library
mixture. Both the measured mass and the difference between the measured and theoretical
values (in ppm) are shown. (b) Negative ion electrospray spectrum of the 120-component
library showing the resolution of three nominally isobaric peaks. (Reproduced from Ref. 24
by permission of Bentham Science Publishers).
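The ppm differences and the resolution quoted for Fig. 13.6 rest on two simple ratios, which may be sketched as follows. This is only a minimal illustration in Python, not the procedure used in Ref. 24: the function names are hypothetical, resolving power is taken here as m/Δm (one common definition), and the two m/z values in the usage lines are peak labels as they appear in the extracted figure text.

```python
def ppm_error(measured_mz, theoretical_mz):
    """Signed difference between measured and theoretical m/z, in parts per million."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6


def required_resolving_power(mz_a, mz_b):
    """Approximate resolving power (m / delta-m) needed to separate two peaks.

    Assumption: delta-m is simply the spacing between the two peaks, and m is
    their mean. Real instruments also depend on peak shape and relative abundance.
    """
    mean_mz = (mz_a + mz_b) / 2.0
    return mean_mz / abs(mz_a - mz_b)


if __name__ == "__main__":
    # Hypothetical measurement: 1 mDa above a theoretical value of m/z 500.
    print(f"{ppm_error(500.001, 500.0):.2f} ppm")

    # Two nominally isobaric peak labels taken from the extracted Fig. 13.6 text.
    print(f"{required_resolving_power(639.2922, 639.2559):.0f}")
```

For the two labels used above, the required resolving power comes out near 17,600, which is consistent with the resolution of 20,000-40,000 reported for the experiment.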
tagging, this source of interference is avoided. based affinity screening methods have been
However, this approach is specific to peptide developed to streamline the tedious process
libraries and is not necessarily applicable to of activity-guided fractionation. These ap-
other types of combinatorial libraries. proaches are discussed in Section 2.4.
Another approach that eliminates possible Whether lead compounds in natural prod-
interference from the chemical tags, "ratio en- uct extracts are isolated using bioassay-guided
coding," has been developed for the mass spec- fractionation or mass spectrometry-based
trometric identification of bioactive leads us- screening, there is a high probability that the
ing stable isotopes incorporated into the structure of the active compound(s) has al-
library compounds (29,34). Within the ligand ready been reported in the natural product lit-
itself, the code might be a single-labeled atom erature. In such cases, the tedious process of
that is conveniently inserted whenever a com- complete structure elucidation using a battery
mon reagent transfers at least one atom to the of spectrometric tools should be unnecessary.
target compound or ligand. The code consists Instead, mass spectrometry alone may be used
of an isotopic mixture having one of the many to quickly "dereplicate" or identify the known
predetermined ratios of stable isotopes and compounds based on molecular weight, frag-
can be incorporated in the linker or added mentation patterns, and elemental composi-
through a reagent used during the synthesis. tion in combination with natural product da-
The mass spectrum of the compound shows a tabase searching (35-39). Commercially
molecular ion with a unique isotope ratio that available natural products databases include
codes for a particular library compound. For NAPRALERT (40), Scientific & Technical In-
example, Wagner et al. (29) used isotope ratio formation Network (STN) (41), and the Dic-
encoding during the synthesis of a 1000-com- tionary of Natural Products (42). Because
pound peptoid library and were able to identify some of these databases also contain UV/VIS
uniquely all the components based on their absorbance data, it is also advantageous to use
isotopic patterns and molecular weights. Be- a photodiode array detector between the
cause isotope ratio codes are contained within HPLC and mass spectrometer to obtain addi-
each combinatorial compound, a chemical tag tional spectrometric data during LC-UV-MS
is not required. The speed of MS-based decod- dereplication (36, 37).
ing outperforms most other decoding technol-
2.4 Mass Spectrometry-Based Screening
ogies, which are time consuming and decode a
restricted set of active compounds. The earliest approaches to combinatorial syn-
Although combinatorial synthesis provides thesis used portioning and mixing (26) and en-
rapid access to large numbers of compounds abled the synthesis of combinatorial libraries
for screening during drug discovery and lead containing hundreds of thousands to millions
optimization, these libraries are usually based of compounds. Today, this approach remains
on a small number of common structures or the most efficient method for preparing enor-
scaffolds. There is a constant need for increas- mous libraries of compounds. However, until
ing the molecular diversity of combinatorial the mid-1990s, efficient screening techniques
libraries and finding new scaffolds, and natu- did not exist to rapidly identify the "hits"
ral products have always been a rich source of within large combinatorial mixtures. There-
chemical diversity for drug discovery. The tra- fore, chemists were motivated to develop ways
ditional approach to screening natural prod- to prepare large numbers of discrete com-
ucts for drug leads uses bioassays to test or- pounds using massively parallel synthesis,
ganic solvent extracts for activity. If strong which could be assayed quickly for pharmaco-
activity is detected, then activity-guided frac- logical activity using high throughput screen-
tionation of the crude extract is used to isolate ing one compound at a time. Recently, several
the active compound(s), which is identified us- mass spectrometry-based screening assays
ing mass spectrometry (including tandem have been developed that are suitable for
mass spectrometry and exact mass measure- screening combinatorial library mixtures, and
ments), IR, UV/VIS spectrometry, and NMR. some are even useful for screening natural
Recently, a variety of mass spectrometry- product extracts, which have always been a
[Figure 13.7 diagram labels. Binding: library + receptor (R) on the affinity column; wash unbound library compounds to waste. Isolation: elute bound ligands using pH change; trap ligands on C18 column. LC-MS-MS identification.]
source of molecular diversity for drug discov- In some applications (43), ligands are
ery. All of the mass spectrometry-based eluted from the affinity column and then
screening methods use receptor binding of li- trapped on a second column such as a reverse
gands as the basis for identification of lead phase HPLC column. LC-MS or LC-MS-MS
compounds. identification of the ligands (hits) is then car-
ried out using the trapping column. In other
2.4.1 Affinity Chromatography-Mass Spec- systems, ligands are identified directly from
trometry. Since the introduction of affinity the affinity column using mass spectrometry
chromatography more than 30 years ago, this (44). For example, Kelly et al. (44) prepared an
technique has become a standard biochemical affinity column containing immobilized phos-
tool for the isolation and identification of new phatidylinositol-3-kinase and used it for direct
binding partners to specific target molecules. LC-MS screening of a 361-component peptide
Therefore, the coupling of affinity chromatog- library. Electrospray mass spectrometry and
raphy to mass spectrometry is a logical exten- tandem mass spectrometry were used to iden-
sion of this technique, and the application of tify the ligands released from the affinity col-
affinity LC-MS to the screening of combinato- umn using pH gradient elution.
rial libraries has been demonstrated by sev- Advantages of affinity chromatography-
eral groups (43, 44). During affinity LC-MS mass spectrometry for screening during drug
screening, a receptor molecule such as a bind- discovery include versatility and re-use of the
ing protein or enzyme is immobilized on a column. Both combinatorial libraries and nat-
solid support within a chromatography col- ural product extracts can be screened using
umn. The library mixture is pumped through this approach, and a wide range of binding
the affinity column in a suitable binding buffers may be used. Mass spectrometry-com-
buffer so that any ligands in the mixture with patible mobile phases are only required during
affinity for the receptor would be able to bind. the final LC-MS detection step. Furthermore,
Then, unbound material is washed away. Fi- a single column may be used multiple times to
nally, the specifically bound ligands are eluted screen different samples for ligands unless the
using a destabilizing mobile phase and identi- destabilization solution irreversibly dena-
fied using mass spectrometry. This affinity- tures, releases, or inhibits the receptor.
column LC-MS assay is summarized in Fig. Despite these advantages, affinity chroma-
13.7. tography has numerous drawbacks that have
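[Figure 13.8 diagram labels. Binding: library compounds + receptor (R) form ligand-receptor complexes (L-R) alongside unbound compounds; GPC isolation.]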
prompted the development of alternative mass phase HPLC and identified either on-line or
spectrometer screening tools. For example, im- off-line using tandem mass spectrometry.
mobilization of the receptor might change its af- This screening method is illustrated in Fig.
finity characteristics causing false negative or 13.8.
false positive hits. This is particularly problem- During the pre-incubation and GPC steps,
atic for receptors that are solution-phase in their any binding buffer may be used, because the
native state. Also, developing and then imple- binding buffer will be removed during reverse
menting an immobilization scheme is often a phase LC-MS analysis. However, the GPC sep-
slow, tedious, and even expensive process, and aration step must be carried out quickly, be-
this process is unique for each new receptor. Fi- cause ligands begin to dissociate from the re-
nally, false positive hits are often obtained when ceptor immediately and can become lost into
screening large molecularly diverse libraries, be- the size exclusion gel. Despite this disadvan-
cause there are usually compounds in such mix- tage, this approach allows both receptor and
tures that have affinity for the stationary phase ligand to be screened in solution, which avoids
or linker molecule instead of the receptor. some of the problems associated with the use
of affinity columns for screening. The GPC
2.4.2 Gel Permeation Chromatography- LC-MS-MS screening method should also be
Mass Spectrometry. Another type of chroma- suitable for screening natural product ex-
tography that has been combined with mass tracts as well as combinatorial library mix-
spectrometry as a screening system for drug tures.
discovery is gel permeation chromatography
(GPC) (45, 46). Also called size-exclusion chro- 2.4.3 Affinity Capillary Electrophoresis-
matography, GPC separates molecules accord- Mass Spectrometry. Affinity capillary electro-
ing to size as they pass through a stationary phoresis was originally used for the determi-
phase containing particles with a defined pore nation of the binding constants of small
size. During GPC-based screening, a library molecules to proteins (47-49). This solution-
mixture is pre-incubated with a macromolec- based technique is rapid and requires only
ular receptor to allow any ligands in the li- small amounts of ligands. Affinity constants
brary to bind, and then GPC is used to sepa- are measured based on the mobility change of
rate the large receptor-ligand complexes from the ligand on interaction with the receptor
the unbound low molecular weight com- present in the electrophoretic buffer (50). By
pounds in the mixture. Finally, ligands are re- combining affinity capillary electrophoresis
leased from the receptor during reversed with on-line mass spectrometric detection and
Figure 13.9. Affinity capillary electrophoresis-UV-mass spectrometry of a 100-tetrapeptide library
screened for binding to vancomycin (104 µM in the electrophoresis buffer). (a) The elution of peptides
was monitored with UV absorbance during capillary electrophoresis, and the elution time increased
with increasing affinity for vancomycin. (b) Positive ion electrospray mass spectrum with CID of the
Tris adduct of the protonated peptide detected at ~5 min in the electropherogram shown in (a).
(Reproduced from Ref. 52 by permission of the American Chemical Society.)
identification, affinity constants for multiple Tris, which was used in the electrophoresis
compounds can be measured in a single anal- buffer. Although the identification of this pep-
ysis (51). Recognizing that on-line mass spec- tide was not prevented by the formation of this
trometric detection was helpful for the identi- adduct, some buffers used during electro-
fication of each ligand, Chu et al. (52) extended phoresis might interfere with mass spectro-
this approach to include the screening of com- metric ionization and detection. Also, the
binatorial libraries as a means of drug discov- types of libraries that have been screened us-
ery. The data in Fig. 13.9 show the results of ing this approach have contained modest
screening a 100-tetrapeptide library for affin- numbers of synthetic analogs such as pep-
ity to vancomycin using affinity capillary elec- tides. Libraries exceeding 400 members re-
trophoresis-mass spectrometry. Without van- quired preliminary purification using affinity
comycin in the electrophoresis buffer, all the chromatography to reduce the number of com-
peptides eluted within 3 min. When vancomy- pounds (52). As a result, this approach is prob-
cin was present, the peptides eluted in order of ably not ideal for screening libraries contain-
affinity, with the highest affinity compounds ing molecularly diverse compounds or for
being detected between 4.5 and 5 min. Positive screening natural product extracts. However,
ion electrospray tandem mass spectrometry affinity capillary electrophoresis-mass spec-
was used to identify the highest affinity li- trometry is fast; each analysis requires less
gands (see Fig. 13.9b). than 10 min. Also, it may be used to measure
Note that some peptide ligands such as affinity constants for ligand-receptor interac-
Fmoc-DDFA were detected as adducts with tions.
2.4.4 Frontal Affinity Chromatography- Because all library compounds must be moni-
Mass Spectrometry. Like affinity chromatog- tored simultaneously, the compounds must be
raphy-mass spectrometric screening (see Sec- selected so that they have unique molecular
tion 2.4.1), frontal affinity chromatography weights. Also, one compound in the mixture
uses an affinity column containing immobi- should not suppress the ionization of another.
lized receptor molecules (53). The difference Therefore, this approach is probably re-
between the two screening methods is that the stricted to the screening of small combinato-
ligands are continuously infused into the col- rial libraries that are similar in chemical
umn during frontal affinity chromatography structure and ionization efficiencies. Finally,
and detected using mass spectrometry. Com- the binding buffer used for affinity chromatog-
pounds with no affinity for the immobilized raphy must be compatible with on-line APCI
receptor elute immediately in the void volume, or electrospray mass spectrometry. This
but the elution of the ligands is delayed. As means that the mobile phase must be volatile
compounds compete for binding sites on the and usually of low ionic strength (i.e., typically
affinity column, these sites become saturated <40 mM for electrospray ionization).
until ligands begin to elute from the column at
their infusion concentration. In this manner, 2.4.5 Bioaffinity Screening using Electro-
frontal affinity chromatography may be used spray FTICR Mass Spectrometry. Although
to measure affinity constants for ligands, and FTICR mass spectrometry may be used to de-
by using a mass spectrometer for on-line iden- termine the exact masses of combinatorial li-
tification of ligands, this technique becomes a brary compounds and to confirm their struc-
screening method (54,55). tures using CID and high resolution tandem
During frontal affinity chromatography- mass spectrometry (see definitions of CID and
mass spectrometry, signals for all compounds MS-MS in Section 1), electrospray FTICR
eluting from the affinity column are recorded mass spectrometry may be used for the direct
by the mass spectrometer, and the last com- screening of combinatorial libraries without
pounds to elute at their infusion concentra- the need for any pre-purification or chroma-
tions represent the highest affinity com- tography. In this application, a combinatorial
pounds or "hits." An example of the screening library is pre-incubated with a receptor in so- .
of six oligosaccharides with different binding lution and then analyzed directly using elec-
affinities for an immobilized monoclonal car- trospray to identify receptor-ligand com-
bohydrate-binding antibody is shown in Fig. plexes in the gas phase (56-60). Once a
13.10. Compounds 1-3 eluted immediately (no receptor-ligand complex is ionized and
affinity), whereas compounds 4-6 eluted in trapped in the FTICR mass spectrometer, the
order of increasing affinity for the antibody. mass difference between the complex and the
Dissociation constants were determined to be receptor alone might be measured with suffi-
185, 12.6, and 1.8 µM for compounds 4-6, re- cient resolution and accuracy to determine the
spectively (54). mass(es) and perhaps elemental composi-
Because frontal affinity chromatography tion(s) of the ligand(s). If the ligand carries a
uses a conventional affinity column, this tech- charge, then CID may be used to dissociate the
nique provides additional applications of this ligand for subsequent analysis using tandem
type of column to investigators already using mass spectrometry. This elegant and simple
affinity-mass spectrometry (see Section screening approach is summarized in Fig.
2.4.1). However, the same limitations and dis- 13.11.
advantages of using immobilized receptors An extension of this FTICR mass spectrom-
still apply, such as non-specific binding to the etry-based screening technique has been to
stationary phase, the development time and screen a combinatorial library for ligands to
cost of preparing the affinity columns, and the two receptors simultaneously (59,60). In this
possibility that immobilizing the receptor example, the two receptors consisting of RNA
might alter its binding characteristics and constructs representing the prokaryotic (16S)
specificity. In addition, mass spectrometric de- rRNA and eukaryotic (18S) rRNA A-site were
tection creates some additional limitations. incubated simultaneously with an aminogly-
[Figure 13.10: two chromatogram panels; x-axis t/min (10 to 50).]
coside library to identify potential ligands. By buffer and receptors that may be used. Only
screening a target mixture against the same low ionic strength and volatile buffers are
library, screening efficiency is enhanced and compatible with this approach (such as 10 mM
the number of analyses required is reduced. ammonium acetate). Also, the receptor and li-
The advantage of this screening method gand must be highly purified to avoid impuri-
over other approaches is the elimination of pu- ties that might interfere with ionization and
rification steps before mass spectrometric detection. Therefore, this technique is proba-
identification. Also, the disadvantages associ- bly more suitable for the screening of combi-
ated with chromatographic separations are natorial libraries than complex natural prod-
eliminated. However, the use of FTICR mass uct mixtures. Finally, the receptor-ligand
spectrometric screening restricts the binding complex must ionize efficiently during electro-
[Figure 13.11 diagram labels: receptor-ligand complex ion; FTICR-MS dissociation; MS/MS identification.]
Figure 13.11. Bioaffinity electrospray FTICR mass spectrometry. The isolation and mass spectrometric
identification of receptor-specific ligands are carried out entirely in the mass spectrometer without
chromatography or other separation steps.
spray under solvent and ion source conditions The principle of pulsed ultrafiltration screen-
that do not cause dissociation of the complex. ing of combinatorial libraries is shown in Fig.
13.12. During pulsed ultrafiltration, ligand-
2.4.6 Pulsed Ultrafiltration-Mass Spectrom- receptor complexes remain in solution in the
etry. A versatile approach to screening solu- ultrafiltration chamber while unbound library
tion phase combinatorial libraries and natural compounds and buffer are washed away. After
product extracts is pulsed ultrafiltration- unbound compounds are removed, the hits
mass spectrometry (61,62), which uses a stan- from the library are eluted from the chamber
dard LC-MS system with an ultrafiltration by destabilizing the ligand-receptor complex
chamber substituted for the HPLC column. using an organic solvent, a pH change, or a
[Figure 13.12 diagram labels: ligand-receptor complexes; unbound compounds; wash unbound compounds to waste; elute desalted ligands into MS; LC-MS identification (m/z).]
Figure 13.12. Combinatorial library screening using pulsed ultrafiltration mass spectrometry. Dur-
ing the loading step (left), ligands are bound to the receptor either on-line (top) using a flow-through
approach or off-line (bottom two incubations). Unbound compounds and binding buffer, cofactors,
etc. are washed out of the ultrafiltration chamber to waste during a separation step (middle). Bound
ligands are dissociated from the receptor molecules and eluted from the chamber by introducing a
destabilizing solution such as methanol, pH change, etc. Finally, released ligands are identified using
mass spectrometry, tandem mass spectrometry, or LC-MS (right). (Reproduced from Ref. 64 by
permission of John Wiley & Sons.)
[Figure 13.13: mass spectra in two panels labeled "Library without adenosine" and "Library with adenosine"; the EHNA peak is marked.]
Figure 13.13. Identification of EHNA as the highest affinity ligand for adenosine deaminase in a
combinatorial library of 20 adenosine analogs using ultrafiltration electrospray mass spectrometry.
(Reproduced from Ref. 61 by permission of the American Chemical Society.)
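The quantities reported in the text for the adenosine deaminase experiment of Fig. 13.13 can be cross-checked with a short calculation. The sketch below is illustrative only: the elemental composition of EHNA (C14H23N5O) is not given in the text and is supplied here as an assumption, and the helper and variable names are hypothetical.

```python
# Monoisotopic masses (u) of the most abundant isotopes, and the proton mass.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.00307401, "O": 15.99491462}
PROTON = 1.00727647


def monoisotopic_mass(composition):
    """Sum the monoisotopic masses for an {element: count} composition."""
    return sum(MONOISOTOPIC[element] * count for element, count in composition.items())


# Assumption (not stated in the text): EHNA has the formula C14H23N5O.
EHNA = {"C": 14, "H": 23, "N": 5, "O": 1}
ehna_mh = monoisotopic_mass(EHNA) + PROTON  # m/z of the [M+H]+ ion

# Concentrations in µM are numerically equal to pmol/µL.
receptor_pmol = 420.0           # receptor injected, from the text
receptor_uM = 2.1               # adenosine deaminase concentration, from the text
aliquot_uL = receptor_pmol / receptor_uM

ehna_pmol = 1.75 * aliquot_uL   # EHNA in the aliquot (1.75 µM in the library)
analog_pmol = 17.5 * aliquot_uL # each other analog (17.5 µM in the library)
wash_uL = 8 * 50                # 8 min wash at 50 µL/min

print(f"[M+H]+ of EHNA: m/z {ehna_mh:.4f}")
print(f"aliquot volume: {aliquot_uL:.0f} µL, wash volume: {wash_uL} µL")
print(f"EHNA: {ehna_pmol:.0f} pmol, each other analog: {analog_pmol:.0f} pmol")
```

With the assumed formula, the calculated [M+H]+ ion has a nominal m/z of 278, in agreement with the value reported in the text, and the stated 420 pmol of receptor corresponds to an injected aliquot of about 200 µL.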
combination of both. The released ligands are tion (Fig. 13.13, Control). Despite being
identified on-line using APCI or electrospray present at a 10-fold lower concentration than
mass spectrometry (61) or collected and ana- the natural substrate adenosine analogs,
lyzed off-line using mass spectrometry, LC- EHNA was easily identified because it had the
MS, or LC-MS-MS (63). highest affinity among the library compounds
An example of pulsed ultrafiltration mass (K,= 1.9 nM). This demonstrates the use of
spectrometry for the screening of a library of ultrafiltration electrospray mass spectrome-
20 adenosine analogs for ligands to adenosine try for identifying a high affinity ligand among
deaminase is shown in Fig. 13.13. After a 15- a set of analogs that bind to a specific receptor.
min preincubation of the library compounds In a follow-up lead optimization study using
(17.5 µM each except for EHNA, which was pulsed ultrafiltration mass spectrometry, a
present at 1.75 µM) with 2.1 µM adenosine synthetic combinatorial library of EHNA ana-
deaminase in 50 mM phosphate buffer, an al- logs was screened for binding to adenosine
iquot containing 420 pmol of the receptor was deaminase, and structure-activity relation-
injected into the ultrafiltration chamber and washed for ships for EHNA binding were identified (65).
8 min at 50 µL/min with water to remove the As an illustration of the versatility of
phosphate buffer and unbound or weakly pulsed ultrafiltration-mass spectrometry,
binding library compounds. Methanol was in- binding assays for a variety of receptors have
troduced into the mobile phase to dissociate been reported including dihydrofolate reduc-
the enzyme-ligand complex and release bound tase (63), cyclooxygenase-2 (62), serum albu-
ligands for identification by electrospray mass min (66, 67) and estrogen receptors (68). Not
spectrometry. During methanol elution, only only is pulsed ultrafiltration useful for identi-
EHNA [erythro-9-(2-hydroxy-3-nonyl) ade- fying ligands to different receptors, but a wide
nine] was detected as the [M+H]+ ion of m/z range of combinatorial libraries and natural
278 (Fig. 13.13). In control experiments using product extracts in any suitable binding buffer
the library without enzyme, no library com- may be screened. In addition to combinatorial
pounds were detected during methanol elu- libraries, complex natural product extracts
have been screened (68), and neither plant nor trol injection is used to control for non-specific
fermentation broth matrices were found to in- binding to the apparatus. Because the concen-
terfere with screening (62). As another exam- tration of receptor and total amount of liquid
ple of the flexibility of this screening system, a are known, and because the concentration of
centrifuge tube equipped with an ultrafiltra- free ligand is measured as it elutes from the
tion membrane (69) has been used instead of chamber over a wide range of concentrations,
an on-line ultrafiltration chamber. Other ap- the affinity constant and other binding param-
plications of pulsed ultrafiltration-mass spec- eters may be calculated.
trometry include screening drugs and drug In most of the applications of pulsed ultra-
candidates for metabolic stability (70), meta- filtration to date, serial analyses were carried
bolic activation to reactive metabolites (71), out with a throughput of approximately one or
and the measurement of affinity constants for two assays per hour. Because the purpose of
ligand-receptor interactions (66, 67). these assays was to screen complex mixtures
Metabolism and toxicity screening appli- or to obtain metabolism data for new drug en-
cations of pulsed ultrafiltration use hepatic tities, the throughput of these analyses was
microsomes in the ultrafiltration chamber. acceptable, but was not high throughput. The
For metabolic screening, drugs and the co- acceptable, but was not high throughput. The
factor nicotinamide adenine dinucleotide phosphate rate limiting step in these analyses was the
(NADPH) are flow-injected through the ul- spectrometric detection. Two solutions have
trafiltration chamber (oxygen is dissolved in been reported to increase the throughput of
the mobile phase), and the metabolites pulsed ultrafiltration mass spectrometry. In
formed by microsomal cytochrome P450 and the first solution, van Breemen et al. (70) used
any unreacted compounds flow out of the a multiplex ultrafiltration system in which up
chamber for mass spectrometric identifica- to 60 ultrafiltration chambers could be ar-
tion and/or quantitative analysis (70). On- ranged in parallel and interfaced to a single
line applications require the use of volatile mass spectrometer. This scheme is shown in
buffers, but LC-MS and LC-MS-MS may be Fig. 13.14. In this system, a continuous flow of
used off-line to analyze the ultrafiltrate no the buffer or mobile phase is maintained
matter what buffer had been used. Screen- through the ultrafiltration chambers, but the
ing drugs for metabolic activation using mass spectrometer samples each ultrafiltrate -
pulsed ultrafiltration-mass spectrometry is solution at 1-minintervals. The sampling time
carried out in a similar manner, except that would be selected to correspond to the time at
glutathione is coinjected along with NADPH which a maximum concentration of metabo-
and the drug substrate (71). MS-MS may be lites would be expected to elute from the
used on-line or LC-MS-MS may be used off- chamber. This approach was demonstrated to
line to screen for glutathione adducts as an increase the throughput of metabolic screen-
indication that the drug was metabolized to ing using ultrafiltration mass spectrometry by
a reactive intermediate(s) that was trapped 60-fold. Although used originally for meta-
by reaction with glutathione. Finally, pulsed bolic screening, this approach would be appli-
ultrafiltration may be used with UV or mass cable to toxicity screening and drug discovery
spectrometric detection to measure affinity screening as well.
constants of individual compounds (66). The second solution to increasing the
To measure affinity constants and other throughput of pulsed ultrafiltration mass
physico-chemical properties of binding such as spectrometry has been to miniaturize the ul-
the number of binding sites, two pulsed ultra- trafiltration chamber volume while maintain-
filtration measurements are carried out. First, ing the flow rate and chamber pressure. Be-
an aliquot or pulse of a liquid is injected cause the ultrafiltration membrane cannot
through the chamber, and the elution profile withstand high pressure without rupturing,
is recorded. Then, the chamber is loaded with the ultrafiltration process cannot be acceler-
a receptor, and the ligand is reinjected. If bind- ated simply by increasing the flow rate
ing occurs, the elution profile will be delayed through the chamber. The approach of Bev-
in proportion to the affinity constant. The con- erly et al. (72) was to fabricate a 35-µL ultra-
[Figure 13.14 diagram labels: autoinjector; switching; parallel ultrafiltration chambers.]
Figure 13.14. High-throughput pulsed ultrafiltration mass spectrometry system for screening drug
candidates for metabolic transformation. Multiple ultrafiltration chambers are connected in parallel
to a single mass spectrometer detector. After loading each chamber with liver microsomes, a different
drug is injected into each chamber at intervals of 1 min (for 60 screens/h using 60 chambers).
Constant flow of incubation buffer is maintained through all chambers, but only one chamber at a
time is connected on-line to the mass spectrometer. Drug metabolite profiles are recorded using mass
spectrometry for up to 1 min per chamber. (Reproduced from Ref. 70 by permission of the American
Society for Pharmacology and Experimental Therapeutics.)
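The throughput gain from the multiplexed arrangement in Fig. 13.14 can be illustrated with a simple model. The sketch below is not the published control software; it assumes that each ultrafiltration assay occupies its chamber for about 1 h (consistent with the serial throughput of roughly one assay per hour mentioned in the text) and that the mass spectrometer samples one chamber per minute. The function names are hypothetical.

```python
def screens_per_hour(n_chambers, assay_hours=1.0, ms_sample_min=1.0):
    """Throughput limited by chamber occupancy or by the MS sampling rate."""
    parallel_rate = n_chambers / assay_hours   # assays finished per hour
    ms_limit = 60.0 / ms_sample_min            # chambers the MS can sample per hour
    return min(parallel_rate, ms_limit)


def injection_schedule(n_chambers, interval_min=1.0):
    """(chamber number, injection time in min) with staggered injections."""
    return [(chamber, (chamber - 1) * interval_min)
            for chamber in range(1, n_chambers + 1)]


serial = screens_per_hour(1)       # single chamber, assumed 1 h per assay
multiplex = screens_per_hour(60)   # 60 chambers sampled for 1 min each
schedule = injection_schedule(60)

print(f"serial: {serial:.0f}/h, multiplexed: {multiplex:.0f}/h, "
      f"gain: {multiplex / serial:.0f}-fold")
print("first and last injections:", schedule[0], schedule[-1])
```

In this model, adding chambers beyond 60 brings no further gain at a 1-min sampling time, because the single mass spectrometer then becomes the rate-limiting step.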
affinity-based screening coupled with MALDI MS) (79). These chips are being developed to
mass spectrometry has not been a successful enable ultrafast and highly sensitive electro-
drug discovery approach. spray mass spectrometric analysis. Because of
However, progress is being made in the use their microscopic size, CE-MS chips have the
of affinity probes for the capture of proteins their microscopic size, CE-MS chips have the
and other macromolecules from biological so- would facilitate high-throughput analysis.
lutions followed by MALDI mass spectromet- In terms of mass spectrometry instrumen-
ric detection and identification (75-77). One tation, the currently available instruments
affinity MALDI mass spectrometry method such as time-of-flight (TOF) analyzers and hy-
has been paired with the affinity probes used brid quadrupole-TOF analyzers are able to ac-
in surface plasmon resonance systems (78). quire complete mass spectra at rates compat-
These affinity-based MALDI mass spectrome- ible with fast CE separations. As CE or
try screening assays are promising approaches ultrafast chromatography replaces conven-
for testing blood or other biological fluids for tional, slow HPLC applications, TOF-based
the presence of specific proteins or other mac- mass spectrometers will be needed to replace
romolecules. As a result, these have the poten- the less efficient scanning types of instru-
tial to become clinical diagnostic tools or ments such as quadrupoles and ion traps for
might even lead to the identification of new most high-throughput applications. FTICR
therapeutic targets. However, they are un- mass spectrometry remains unsurpassed in
likely to become useful for screening combina- terms of resolution and mass accuracy for both
torial libraries or natural product extracts for MS and MS-MS applications. However, the
the purpose of drug discovery. throughput of FTICR mass spectrometric
analysis needs to be increased to remain use- tion about mass spectrometry and links to a
ful for combinatorial chemistry applications. variety of reference materials regarding bio-
Advances in increasing the throughput of medical mass spectrometry.
FTICR mass spectrometry are anticipated. http://www.bentham.org/cchts/ Combinato-
Hyphenated technologies such as LC- rial Chemistry & High Throughput Screen-
NMR-MS are being developed to support ing
structure elucidation of combinatorial librar- http://pubs.acs.org/journals/jcchff/ Journal
ies (80). Although such technologies are still in of Combinatorial Chemistry
a developmental stage, they have great poten-
http://www.5z.com/moldiv/ Molecular Di-
tial for analyses of combinatorial libraries and
versity
for natural product drug discovery (81-83).
The main impediments of applying LC- http://www.liebertpub.com/BSC/defaultl.asp
NMR-MS to combinatorial chemistry remain Journal of Biomolecular Screening
poor sensitivity of the NMR, the obligatory use R. B. Cole, Ed., Electrospray Ionization Mass
of deuterated solvents for chromatography, Spectrometry, John Wiley and Sons, New
and the low throughput of NMR analyses. York, 1997.
However, efforts are in progress to improve F. W. McLafferty and F. Turecek, Interpre-
the throughput of NMR analyses (84-86). tation of Mass Spectra, 4th ed, University
In conclusion, mass spectrometry provides Science Books, Mill Valley, CA, 1993.
rapid, reliable, sensitive, and selective analy- W. M. Niessen, J. Chromatogr. A, 856, 179-
sis of combinatorial libraries for structure 189 (1999).
confirmation, purity analysis, and library de-
J. T. Watson, Introduction to Mass Spec-
convolution. In addition, mass spectrometric
trometry, 3rd ed, Lippincott-Raven, Phila-
screening methods have been developed and
delphia, PA, 1997.
are beginning to be applied to drug discovery.
In the case of natural products, mass spec-
trometry facilitates the screening of natural 5 ACKNOWLEDGMENTS
product extracts and facilitates the dereplica-
tion and characterization of lead compounds. I thank Young Geun Shin, Benjamin Johnson,
At different times during the last 100 years, and Jennifer Mosel for help in writing and pre-
first physicists and physical chemists and then paring this chapter.
organic chemists pronounced that mass spec-
trometry had run out of new applications and
REFERENCES
had no future. Fortunately, they were wrong.
Today, medicinal chemists recognize that the 1. F. Field, J. Am. Soc. Mass Spectrom, 1,277-283
(1990).
potential of mass spectrometry to contribute
to all facets of drug discovery has only just 2. M. Barber, R. S. Bordoli, G . J . Elliott, R. D.
Sedgwick, and A. N . Tyler, Anal. Chem., 54,
begun to be explored. Furthermore, applica-
645A-657A (1982).
tions of mass spectrometry to drug develop-
3. F. Hillenkamp, M . Karas, R. C. Beavis, and B. T .
ment are even less developed and are waiting
Chait, Anal. Chem., 63, 1193A-1203A (1991).
to be developed. Mass spectrometry has be-
4. Y. Ito, T . Takeuchi, D. Ishii, and M . Goto,
come a fundamental analytical tool for drug J. Chromatogr, 346,161-166 (1985).
discovery, and this role should continue to
5. L. Chen, M . Stacewicz-Sapuntzakis, C. Duncan,
grow in the future.
R. Sharifi, L. Ghosh, R. van Breemen, D. Ash-
ton, and P. E. Bowen, J. Natl. Cancer Znst., 93,
1872-1879 (2001).
4 WEB SITE ADDRESSES AND 6. J. Liu, J. E. Burdette, H . X u , C. Gu, R. B. van
RECOMMENDED READING FOR FURTHER Breemen, K. P. L. Bhat, N . Booth, A. I. Con-
INFORMATION stantinou, J . M . Pezzuto, H . H . S. Fong, N . R.
Farnsworth, and J . L. Bolton, J. Agric. Food
0 http://www.asms.org Homepage of the Chem., 49,2472-2479 (2001).
American Society for Mass Spectrometry. 7. J. A. Syage and M. D. Evans, Spectroscopy, 16,
This web site contains additional informa- 14-21 (2001).
TIMOTHY S. BAKER
Purdue University
Department of Biological Sciences
West Lafayette, Indiana
Contents
1 Macromolecular Structure Determination by Use of Electron Microscopy
2 Electron Scattering and Radiation Damage
3 Elastic and Inelastic Scattering
4 Radiation Damage
5 Required Properties of Illuminating Electron Beam
6 Three-Dimensional Electron Cryomicroscopy of Macromolecules
7 Overview of Conceptual Steps
8 Classification of Macromolecules
9 Specimen Preparation
10 Microscopy
11 Selection and Preprocessing of Digitized Images
12 Image Processing and 3D Reconstruction
12.1 2D Crystals
12.2 Helical Particles
12.3 Icosahedral Particles
13 Visualization, Modeling, and Interpretation of Results
14 Trends
15 Acknowledgments
16 Abbreviations
Figure 14.1. Schematic diagram showing the principle of image formation and diffraction in the transmission electron microscope. The incident beam I0 illuminates the specimen. Scattered and unscattered electrons are collected by the objective lens and focused back to form first an electron diffraction pattern and then an image. For a 2D or 3D crystal, the electron-diffraction pattern would show a lattice of spots, each of whose intensity is a small fraction of that of the incident beam. In practice, an in-focus image has no contrast, so images are recorded with the objective lens slightly defocused to take advantage of the out-of-focus phase-contrast mechanism. [Diagram labels: I0, incident beam; specimen (maximum dose ~5 e⁻/Å² for organic or biological specimens); diffraction pattern (strongest spots: protein ~10^-5 I0, paraffin ~10^-2 I0); underfocused image; in-focus image.]
collected by the imaging optics, shown here for simplicity as a single lens, but in practice consisting of a complex system of five or six lenses, with intermediate images being produced at successively higher magnification at different positions down the column. Finally, in the viewing area, either the electron diffraction pattern or the image can be seen directly by eye on the phosphor screen, or detected by a TV or CCD camera, or recorded on photographic film or image plate.

3 ELASTIC AND INELASTIC SCATTERING

The coherent, elastically scattered electrons contain all the high resolution information describing the structure of the specimen. The amplitudes and phases of the scattered electron beams are directly related to the amplitudes and phases of the Fourier components of the atomic distribution in the specimen. When the scattered beams are recombined with the unscattered beam in the image, they create an interference pattern (the image), which, for thin specimens, is related approximately linearly to the density variations in the specimen. The information about the structure of the specimen can then be retrieved by digitization and computer-based image processing, as described later. The elastic scattering cross sections for electrons are not as simply related to the atomic
Electron Cryomicroscopy of Biological Macromolecules
at least 10,000 molecules in theory, and even more in practice (21). Crystals used for X-ray or neutron diffraction contain many orders of magnitude more molecules.

It is possible to collect both the elastically and the inelastically scattered electrons simultaneously with an energy analyzer and, if a fine electron beam is scanned over the specimen, then a scanning transmission electron micrograph displaying different properties of the specimen can be obtained. Alternatively, conventional transmission electron microscopes to which an energy filter has been added can be used to select out a certain energy band of the electrons from the image. Both types of microscope can contribute in other ways to the knowledge of structure, but in this presentation, we concentrate on high voltage, phase-contrast electron microscopy of unstained macromolecules most often embedded in ice because this is the method of widest impact in structural biology.

Figure 14.2. Flow diagram showing all the procedures involved in electron cryomicroscopy from sample preparation to map interpretation. [Flow: Cryomicroscopy → Micrographs → Image selection, digitization, preprocessing → Image processing and 3D reconstruction → 3D density map → Visualization, modeling, and interpretation → Structure-function relationships.]
number of states) at relatively high concentration, rapidly frozen (vitrified) as a thin film, transferred into the electron microscope, and photographed by means of low dose selection and focusing procedures. The resulting images, if recorded on film, must then be digitized. Digitized images are then processed by the use of computer programs that allow different views of the specimen to be combined into a 3D reconstruction that can be interpreted in terms of other available structural, biochemical, and molecular data.

7 OVERVIEW OF CONCEPTUAL STEPS

Radiation damage by the illuminating electron beam generally allows only one good picture (micrograph) to be obtained from each molecule or macromolecular assembly. In this micrograph, the signal-to-noise ratio of the 2D projection image is normally too small to accurately determine the projected structure. This implies, first, that it is necessary to average many images of different molecules taken from essentially the same viewpoint to increase the signal-to-noise ratio and, second, that many of these averaged projections, taken from different directions, must be combined to build up the information necessary to determine the 3D structure of the molecule. Thus, the two key concepts are: (1) averaging to a greater or lesser extent depending on resolution, particle size and symmetry to increase the signal-to-noise ratio; and (2) the combination of different projections to build a 3D map of the structure.

In addition, there are various technical corrections that must be made to the image data to allow an unbiased model of the structure to be obtained. These include correction for the phase-contrast transfer function (CTF) and, at high resolution, for the effects of beam tilt. For crystals, it is also possible to combine electron diffraction amplitudes with image phases to produce a more accurate structure (7), and in general to correct for loss of high resolution contrast for any reason by "sharpening" the data by application of a negative temperature factor (22).

The idea of increasing the signal-to-noise ratio in electron images of unstained biological macromolecules by averaging was discussed in 1971 (23) and demonstrated in 1975 (7, 24), although earlier work on stained specimens had shown the value of averaging to increase the signal-to-noise ratio. The improvement obtained, as in all repeated measurements, gives a factor of √N improvement in signal-to-noise ratio, where N is the number of times the measurement is made. The effect of averaging to produce an improvement in signal-to-noise ratio is seen most clearly in the processing of images from 2D crystals. Figure 14.3 shows the results of applying a sequence of corrections, beginning with averaging, to two-dimensional crystals of bacteriorhodopsin in 2D space group p3. The panels show: (a, b) 2D averaging, (c) correction for the microscope contrast transfer function (CTF), and (d) threefold crystallographic symmetry averaging of the phases and combination with electron diffraction amplitudes. At each stage in the procedure the projected picture of the molecules gets clearer. The final stage results in a virtually noise-free projected structure for the molecule at near atomic (3 Å) resolution.

The earliest successful application of the idea of combining projections to reconstruct the 3D structure of a biological assembly was made by DeRosier and Klug (4). The idea is that each 2D projection corresponds after Fourier transformation to a central section of the 3D transform of the assembly. If enough independent projections are obtained, then the 3D transform will have been fully sampled and the structure can then be obtained by back transformation of the averaged, interpolated, and smoothed 3D transform. This procedure is shown schematically for a three-dimensional object in the shape of a duck, which represents the molecule whose structure is being determined (Fig. 14.4).

In practice, the implementation of these concepts has been carried out in a variety of ways, given that the experimental strategy and type of computer analysis used depend on the type of specimen, especially the molecular weight of the individual molecule, its symmetry, and whether it assembles into an aggregate with one-dimensional (1D), two-dimensional (2D), or three-dimensional (3D) periodic order.
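The two key concepts of this overview, averaging and the combination of projections, can be tried out numerically. The following is a minimal Python sketch under stated assumptions, not the procedure of any cryo-EM software package: the noise is taken to be independent and Gaussian, a 1D sine wave stands in for a projection image, and the central-section relation is checked in its 2D analogue (the 1D projection of a 2D image compared with the central line of its 2D transform). All function names are hypothetical and chosen only for illustration.

```python
import cmath
import math
import random


def add_noise(signal, sigma, rng):
    """Return a copy of `signal` with independent Gaussian noise of std `sigma`."""
    return [value + rng.gauss(0.0, sigma) for value in signal]


def average(copies):
    """Average several equally long noisy copies sample by sample."""
    return [sum(samples) / len(copies) for samples in zip(*copies)]


def rms_error(estimate, truth):
    """Root-mean-square deviation of `estimate` from the noise-free `truth`."""
    squared = [(e - t) ** 2 for e, t in zip(estimate, truth)]
    return math.sqrt(sum(squared) / len(squared))


def noise_reduction(n_copies, length=512, sigma=1.0, seed=0):
    """Ratio of single-copy noise to the noise left after averaging n_copies."""
    rng = random.Random(seed)
    # Assumed test signal: a sine wave standing in for a projection image.
    truth = [math.sin(2 * math.pi * i / 64) for i in range(length)]
    copies = [add_noise(truth, sigma, rng) for _ in range(n_copies)]
    return rms_error(copies[0], truth) / rms_error(average(copies), truth)


def dft1(values):
    """Direct 1D discrete Fourier transform (O(n^2); fine for tiny inputs)."""
    n = len(values)
    return [
        sum(v * cmath.exp(-2j * cmath.pi * k * x / n) for x, v in enumerate(values))
        for k in range(n)
    ]


def dft2_at(image, kx, ky):
    """One coefficient F(kx, ky) of the 2D DFT of image[y][x]."""
    ny, nx = len(image), len(image[0])
    return sum(
        image[y][x] * cmath.exp(-2j * cmath.pi * (kx * x / nx + ky * y / ny))
        for y in range(ny)
        for x in range(nx)
    )


def central_section_mismatch(image):
    """Largest difference between the DFT of the projection along y and the
    ky = 0 line (central section) of the 2D DFT of the same image."""
    projection = [sum(column) for column in zip(*image)]
    section = [dft2_at(image, kx, 0) for kx in range(len(image[0]))]
    return max(abs(a - b) for a, b in zip(dft1(projection), section))


if __name__ == "__main__":
    for n in (4, 25, 100):
        ratio = noise_reduction(n)
        print(f"N = {n:3d}: noise reduced {ratio:.2f}x (sqrt(N) = {math.sqrt(n):.2f})")
    toy = [[(3 * x + 5 * y * y) % 7 for x in range(6)] for y in range(5)]
    print("central-section mismatch:", central_section_mismatch(toy))
```

With independent noise, the measured reduction should scatter around √N rather than match it exactly, and the central-section mismatch should be zero to within floating-point rounding. In three dimensions the same relation holds between each 2D projection and a central plane of the 3D transform, which is why projections from many directions are needed to sample that transform fully.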
8 Classification of Macromolecules
performed. The classification of molecules according to their level of periodic order and symmetry (Table 14.1) provides a logical and convenient way to consider the means by which specimens are studied in 3D by microscopy.

Table 14.1 Classification of Macromolecules According to Periodic Order and Symmetry

Periodic Order  Type                         Symmetry   Example Macromolecule/Complex              Representative Reference
0D              Point group                  C1         Ribosome                                   25
                                             C1         Centriole                                  26
                                             C5         Bacteriophage φ29 head                     27
                                             C8         Ribonucleoprotein vault                    28
                                             C17        TMV disk                                   29
                                             D2         β-galactosidase                            30
                                             D6         Clathrin coats                             31
                                             D6         Lumbricus terrestris hemoglobin            32
                                             T          Dps protein                                33
                                             O          Azotobacter pyruvate dehydrogenase core    34
                                             I          Icosahedral viruses                        17
1D              Screw axis (helical) (a)                Acto-myosin filament                       35
                                                        Acetylcholine receptor tubes               36
                                                        Microtubule                                37
                                                        Bacterial flagella                         38
                                                        Tobacco mosaic virus                       39
2D              2D space group (2D crystal)  p3         Bacterial rhodopsin membrane               11
                                             p4212      Aquaporin membrane                         40
                                             p6         Gap junction membrane                      41
                                             p321       Light harvesting complex II                12
                                             p121       Tubulin sheet                              13
3D              3D space group (3D crystal)  P212121    Myosin S1 protein crystal                  42
                                             P6_ or P6_ [subscripts illegible]  Insect flight muscle  43

(a) The symmetry of a helical structure is defined by an n_m screw axis, which combines a rotation of 2π/n radians about an axis followed by a translation of m/n of the repeat distance.

Each type of specimen offers a unique set of challenges in obtaining 3D structural information at the highest possible resolution. The best resolutions achieved by 3D EM methods to date, at about 3-4 Å, have been obtained with several thin, 2D crystals, in large part because of their excellent order.

With the exception of true 3D crystals, which must be sectioned to make them thin enough to study by transmission electron microscopy, the resolutions obtained with biological specimens are generally dictated by the preservation of periodic order, and the symmetry and complexity of the object. Hence, studies of the helical acetylcholine receptor tubes (36), the icosahedral hepatitis B virus capsid (44), the 50S ribosome (45), and the centriole (26) have yielded 3D density maps at resolutions of 4.6, 7.4, 15, and 280 Å, respectively.

If high resolution were the sole objective of EM, it would be necessary, given the capabilities of existing technology, to try to form well-ordered 2D crystals or helical assemblies of each macromolecule of interest. Indeed, a number of different crystallization techniques have been devised [e.g., Horne and Pasquali-Ronchetti (46); Yoshimura et al. (47); Kornberg and Darst (48); Jap et al. (49); Kubalek et al. (50); Rigaud et al. (51); Hasler et al. (52); Reviakine et al. (53); Wilson-Kubalek et al. (54)], and some of these have yielded new structural information about otherwise recalcitrant molecules like RNA polymerase (55). However, despite the obvious technological advantages of having a molecule present in a highly ordered form, most macromolecules function not as highly ordered crystals or helices but instead as single particles (e.g., many enzymes) or, more likely, in concert with other macromolecules as occurs in supramolecular assemblies. Also, crystallization tends to constrain the number of conformational states a molecule can adopt and the crystal conformation might not be functionally relevant. Hence, although resolution may be restricted to much below that realized in the bulk of current X-ray crystallographic studies, cryo-EM methods provide a powerful means to study molecules that resist crystallization in 1D, 2D, or 3D. These methods allow one to explore the dynamic events, different conformational states (as induced, for example, by altering the microenvironment of the specimen), and macromolecular interactions that are the key to understanding how each macromolecule functions.

9 SPECIMEN PREPARATION

The goal in preparing specimens for cryomicroscopy is to keep the biological sample as close as possible to its native state to preserve the structure to atomic or near-atomic resolution in the microscope and during microscopy. The methods by which numerous types of macromolecules and macromolecular complexes have been prepared for cryo-EM studies are now well established (9, 56, 57). Most such methods involve cooling samples at a rate fast enough to permit vitrification (solid, glasslike state) rather than crystallization of the bulk water. Noncrystalline biological macromolecules are typically vitrified by applying a small (often <10 µL) aliquot of a purified, approximately 0.2-5 mg/mL suspension of sample to an EM grid coated with a carbon or holey carbon support film. The grid, secured with a pair of forceps and suspended over a container of ethane or propane cryogen slush (maintained near its freezing point by a reservoir of liquid nitrogen), is blotted nearly dry with a piece of filter paper. The grid is then plunged into the cryogen, and the sample, if thin enough (~0.2 µm or less), is vitrified in millisecond or shorter time periods (58-60).

The ability to freeze samples with a time resolution of milliseconds affords cryo-EM one of its unique and, as yet, perhaps most underutilized advantages: capturing and visualizing dynamic structural events that occur over time periods of a few milliseconds or longer. Several devices that allow samples to be perturbed in a variety of ways as they are plunged into cryogen have been described [e.g., Subramaniam et al. (61); Berriman and Unwin (59); Siegel et al. (62); Trachtenberg (63); White et
al. (60)]. Examples of the use of such devices include spraying acetylcholine onto its receptor to cause the receptor channel to open (64), or lowering the pH of an enveloped virus sample to initiate early events of viral fusion (65), or inducing a temperature jump with a flash-tube system to study phase transitions in liposomes (66), or mixing myosin S1 fragments with F-actin to examine the geometry of the crossbridge powerstroke in muscle (67).

Crystalline (2D) samples fortunately can often be prepared for cryo-EM by means of simpler procedures, and vitrification of the bulk water is not always essential to achieve success (68). Such specimens may be applied to the carbon film on an EM grid by normal adhesion methods, washed with 1-2% solutions of solutes like glucose, trehalose, or tannic acid, blotted gently with filter paper to remove excess solution, air dried, loaded into a cold holder, inserted into the microscope, and, finally, cooled to liquid nitrogen temperature.

10 MICROSCOPY

Once the vitrified specimen is inserted into the microscope and sufficient time is allowed (~15 min) for the specimen stage to stabilize to minimize drift and vibration, microscopy is performed to generate a set of images that, with suitable processing procedures, can later be used to produce a reliable 3D reconstruction of the specimen at the highest possible resolution. To achieve this goal, imaging must be performed at an electron dose that minimizes beam-induced radiation damage to the specimen, with the objective lens of the microscope defocused to enhance phase contrast from the weakly scattering, unstained biological specimen, and under conditions that keep the specimen below the devitrification temperature and minimize its contamination.

The microscopist locates specimen areas suitable for photography by searching the EM grid at very low magnification (≤3000X) to keep the irradiation level very low (<0.05 e⁻/Å²) while assessing sample quality. In microscopes operated at 200 keV or higher, where image contrast is very weak, it is helpful to perform the search procedure with the assistance of a CCD camera or a video-rate TV-intensified camera system. For some specimens, like thin 2D crystals, searching is conveniently performed by viewing the low magnification, high contrast image produced by slightly defocusing the electron diffraction pattern by use of the diffraction lens.

After a desired specimen area is identified, the microscope is switched to high magnification mode for focusing and astigmatism correction. These adjustments are typically performed in a region about 2-10 µm away from the chosen area at the same or higher magnification than that used for photography. The choice of magnification, defocus level, accelerating voltage, beam coherence, electron dose, and other operating conditions is dictated by several factors. The most significant ones are the size of the particle or crystal unit cell being studied, the anticipated resolution of the images, and the requirements of the image processing needed to compute a 3D reconstruction to the desired resolution. For most specimens at required resolutions from 3 to 30 Å, images are typically recorded at 25,000-50,000X magnification, with an electron dose of between 5 and 20 e⁻/Å². These conditions yield micrographs of sufficient optical density (OD 0.2-1.5) and image resolution for subsequent image-processing steps. Most modern EMs provide some mode of low dose operation for imaging beam-sensitive, vitrified biological specimens.

The intrinsic low contrast of unstained specimens makes it impossible to observe and focus on specimen details directly, as is routine with stained or metal-shadowed specimens. Focusing, aimed to enhance phase contrast in the recorded images but minimize beam damage to the desired area, is achieved by judicious defocusing on a region that is adjacent to the region to be photographed and preferably situated on the microscope tilt axis. The appropriate focus level is set by adjusting the appearance of either the Fresnel fringes that occur at the edges of holes in the carbon film or the "phase granularity" from the carbon support film.

Unfortunately, electron images do not give a direct rendering of the specimen density distribution. The relationship between image and specimen is described by the contrast transfer function (CTF), which is characteristic of the particular microscope used, the specimen, and the conditions of imaging. The microscope CTF arises from the objective lens focal setting and from the spherical aberration present in all electromagnetic lenses and varies with the defocus and accelerating voltage according to a formula (see below) that includes both phase- and amplitude-contrast components. First, however, it might be useful to describe briefly the essentials of amplitude contrast and phase contrast, two concepts carried over from optical microscopy. Amplitude contrast refers to the nature of the contrast in an image of an object that absorbs the incident illumination or scatters it in any other way, so that a proportion of it is lost. As a result, the image appears darker where greater absorption occurs. Phase contrast is required if an object is transparent (i.e., it is a pure phase object) and does not absorb but only scatters the incident illumination. Biological specimens for cryo-EM are almost pure phase objects and the scattering is relatively weak, so that the simple theory of image formation by a weak phase object applies (69, 70). An exactly in-focus image of a phase object has no contrast variation because all the scattered illumination is focused back to equivalent points in the image of the object from which it was scattered. In optical microscopy, the use of a quarter wave plate can retard the phase of the direct unscattered beam, so that an in-focus image of a phase object has very high "Zernike" phase contrast. However, there is as yet no simple quarter wave plate for electrons, so instead, phase contrast is created by introducing phase shifts into the diffracted beams by adjustment of the excitation of the objective lens so that the image is slightly defocused. In addition, because all matter is composed of atoms and the electric potential inside each atom is very high near the nucleus, even the electron scattering behavior of the light atoms found in biological molecules deviates from that of a weak phase object; however, for a deeper discussion of this the reader should refer to Reimer (70) or Spence (69). In practice, the proportion of "amplitude" contrast is about 7% at 100 kV, 5% at 200 kV, and 4% at 300 kV for low dose images of protein molecules embedded in ice.

The overall dependency of CTF on resolution, wavelength, defocus, and spherical aberration is given by

$$\mathrm{CTF}(\nu) = -\left\{ \left(1 - F_{amp}^{2}\right)^{1/2} \sin[\chi(\nu)] + F_{amp} \cos[\chi(\nu)] \right\}$$

where $\chi(\nu) = \pi \lambda \nu^{2} (\Delta f - 0.5\, C_s \lambda^{2} \nu^{2})$; $\nu$ is the spatial frequency (in Å⁻¹); $F_{amp}$ is the fraction of amplitude contrast; $\lambda$ is the electron wavelength (in Å), where

$$\lambda = 12.3 / \left(V + 0.000000978\, V^{2}\right)^{1/2}$$

(= 0.037, 0.025, and 0.020 Å for 100, 200, and 300 keV electrons, respectively); $V$ is the voltage (in volts); $\Delta f$ is the underfocus (in Å); and $C_s$ is the spherical aberration of the objective lens of the microscope (in Å).

In addition, this CTF is attenuated by an envelope or damping function, which depends on the coherence of the beam, specimen drift, and other factors (6, 71, 72). Figure 14.5 shows a few representative CTFs for different amounts of defocus on a normal and a FEG microscope. Thus, for a particular defocus setting of the objective lens, phase contrast in the electron image is positive and maximal only at a few specific spatial frequencies. Contrast is either lower than maximal, completely absent, or it is opposite (inverted or reversed) from that at other frequencies. Hence, as the objective lens is focused, the electron microscopist selectively accentuates image details of a particular size.

Figure 14.5. Representative plots of the contrast transfer function (CTF) as a function of spatial frequency (in Å⁻¹), for two different defocus settings (0.7 and 4.0 µm underfocus) and for a field emission (light curve) or tungsten (dark curve) electron source. All plots correspond to electron images formed in an electron microscope operated at 200 kV and with objective lens aberration coefficients, Cs = Cc = 2.0 mm, and assuming amplitude contrast of 4.8% (73). The spatial coherence, which is related to the electron source size and expressed as β, the half-angle of illumination, for tungsten and FEG electron sources was fixed at 0.3 and 0.015 milliradians, respectively. Likewise, the temporal coherence (expressed as ΔE, the energy spread) was fixed at 1.6 and 0.5 eV for tungsten and FEG sources. The combined effects of the poorer spatial and temporal coherence of the tungsten source lead to a significant dampening, and hence loss of contrast, of the CTF at progressively higher resolutions compared to that observed in FEG-equipped microscopes. The greater number of contrast reversals with higher defocus arises because of the greater out-of-focus phase shifts.

Images are typically recorded 0.8-3.0 µm underfocus to enhance specimen features in the 20-40 Å size range and thereby facilitate phase origin and specimen orientation search procedures carried out in the image-processing steps. However, this level of underfocus also enhances the contrast envelope in lower resolution maps, which may help in interpretation. To obtain results at better than 10-15 Å resolution, it is essential to record, process, and combine data from several micrographs that span a range of defocus levels [e.g., Unwin and Henderson (7); Bottcher et al. (44)]. This strategy ensures good information transfer at all spatial frequencies up to the limiting resolution but requires careful compensation for the effects of the microscope CTF during image processing. Also, the recording of image focal pairs or focal series from a given specimen area can be beneficial in determining origin and orientation parameters for processing of images of single particles [e.g., Cheng et al. (74); Trus et al. (75)].

Many high resolution cryo-EM studies are now performed with microscopes operated at 200 keV or higher and with field emission gun (FEG) electron sources [e.g., Zemlin (76, 78); Zhou and Chiu (77); Mancini et al. (79)]. The high coherence of a FEG source ensures that phase contrast in the images remains strong out to high spatial frequencies (>1/3.5 Å⁻¹), even for highly defocused images. The use of higher voltages provides potentially higher resolution [greater depth of field (i.e., less curvature of the Ewald sphere) attributed to smaller electron beam wavelength], better beam penetration (less multiple scattering),
reduced problems with specimen charging that plague microscopy of unstained or uncoated vitrified specimens (80), and reduced phase shifts associated with beam tilt.

Images are recorded on photographic film or on a CCD camera with either flood beam or spot-scan procedures. Film, with its advantages of low cost, large field of view, and high resolution (~10 µm), has remained the primary image recording medium for most cryo-EM applications, despite disadvantages of high background fog and need for chemical development and digitization. CCD cameras provide image data directly in digital form and with very low background noise, but suffer from higher cost, limited field of view, limited spatial resolution caused by poor point spread characteristics, and a fixed pixel size (typically between 14 and 24 µm). They are useful, for example, for precise focusing and adjustment of astigmatism [e.g., Krivanek and Mooney (81); Sherman et al. (82)].

For studies in which specimens must be tilted to collect 3D data, such as with 2D crystals, or single particles that adopt preferred orientations on the EM grid, or specimens requiring tomography, microscopy is performed in essentially the same way as described above. However, the limited tilt range (±60-70°) of most microscope goniometers can lead to nonisotropic resolution in the 3D reconstructions (the "missing cone" problem), and tilting generates a constantly varying defocus across the field of view in a direction normal to the tilt axis. The effects caused by this varying defocus level must be corrected in high resolution applications.

11 SELECTION AND PREPROCESSING OF DIGITIZED IMAGES

Before any image analysis or classification of the molecular images can be done, a certain amount of preliminary checking and normalization is required to ensure there is a reasonable chance that a homogeneous population of molecular images has been obtained. First, good quality micrographs are selected in which the electron exposure is correct, there is no image drift or blurring, and there is minimal astigmatism and a reasonable amount of defocus to produce good phase contrast. This is usually done by visual examination and optical diffraction.

Once the best pictures have been chosen, the micrographs must be scanned and digitized on a suitable densitometer. The sizes of the steps between digitization of optical density and the size of the sample aperture over which the optical density is averaged by the densitometer must be sufficiently small to sample the detail present in the image at fine enough intervals (83). Normally, a circular (or square) sample aperture of diameter (or length of side) equal to the step between digitizations is used. This avoids digitizing overlapping points, without missing any of the information recorded in the image. The size of the sample aperture and digitization step depends on the magnification selected and the resolution required. A value of 1/4 to 1/3 of the required limit of resolution (measured in µm on the emulsion) is normally ideal because it avoids having too many numbers (and therefore wasting computer resources), without losing anything during the measurement procedure. For a 40,000X image, on which a resolution of 10 Å at the specimen is required, a step size of 10 µm {= 1/4 × [(10 Å × 40,000)/(10,000 Å/µm)]} would be suitable.

The best area of an image of a helical or 2D crystal specimen can then be boxed off using a soft-edge mask. For images of single particles, a stack of individual particles can be created by selecting out many small areas surrounding each particle. Because, in the later steps of image processing, the orientation and position of each particle are refined by comparing the amplitudes and phases of their Fourier components, it is important to remove spurious features around the edge of each particle and to make sure the different particle images are on the same scale. This is normally done by masking off a circular area centered on each particle and floating the density so that the average around the perimeter becomes zero (83). The edge of the mask is apodized by applying a soft cosine bell shape to the original densities so they taper toward the background level. Finally, to compensate for variations in the exposure attributed to ice thickness or electron dose, most microscopists normalize the stack of individual particle images so that
Electron Cryomicroscopy of Biological Macromolecules
the mean density and mean density variation over the field of view are set to the same values for all particles (84).
Once some good particles or crystalline areas for 1D or 2D crystals have been selected, digitized, masked, and their intensity values normalized, true image processing can begin.
12 IMAGE PROCESSING AND 3D RECONSTRUCTION
Although the general concepts of signal averaging, together with combining different views to reconstruct the 3D structure, are common to the different computer-based procedures that have been implemented, it is important to emphasize one or two preliminary points. First, a homogeneous set of particles must be selected for inclusion in the 3D reconstruction. This selection may be made by eye, to eliminate obviously damaged particles or impurities, or by the use of multivariate statistical analysis (85) or some other classification scheme. This allows a subset of the particle images to be used to determine the structure of a better defined entity. All image-processing procedures require the determination of the same parameters that are needed to specify unambiguously how to combine the information from each micrograph or particle. These parameters are: the magnification, defocus, astigmatism, and, at high resolution, the beam tilt for each micrograph; the electron wavelength used (i.e., accelerating voltage of the microscope); the spherical aberration coefficient (Cs) of the objective lens; and the orientation and phase origin for each particle or unit cell of the 1D, 2D, or 3D crystal. There are 13 parameters for each particle, of which eight may be common to each micrograph and two or three (Cs, kV, magnification) to each microscope. The different general approaches that have been used in practice to determine the 3D structure of different classes of macromolecular assemblies from one or more electron micrographs are listed in Table 14.2.
The precise way in which each general approach codes and determines the particle or unit cell parameters varies greatly and is not described in detail. Much of the computer software used in image reconstruction studies is relatively specialized compared to that used in the more mature field of macromolecular X-ray crystallography. In part, this may be attributed to the large diversity of specimen types amenable to cryo-EM and reconstruction methods. As a consequence, image-reconstruction software is evolving quite rapidly, and references to software packages cited in Table 14.2 are likely to become quickly outdated. Extensive discussion of algorithms and software packages in use at this time may be found in a number of recent special issues of the Journal of Structural Biology [volumes 116(1), 120(3), 121(2), and 125(2/3)].
In practice, attempts to determine or refine some parameters may be affected by the inability to determine accurately one of the other parameters. The solution of the structure is therefore an iterative procedure in which reliable knowledge of the parameters that describe each image is gradually built up to produce an increasingly accurate structure, until no more information can be squeezed out of the micrographs. At this point, if any of the origins or orientations is wrongly assigned, there will be a loss of detail and signal-to-noise ratio in the map. If a better determined or higher resolution structure is required, it would then be necessary to record images on a better microscope or to prepare new specimens and record better pictures.
The reliability and resolution of the final reconstruction can be measured by use of a variety of indices. For example, the differential phase residual (DPR) (133), the Fourier shell correlation (FSC) (134), and the Q-factor (135) are three such measures. DPR is the mean phase difference, as a function of resolution, between the structure factors from two independent reconstructions, often calculated by splitting the image data into two halves. FSC is a similar calculation of the mean correlation coefficient between the complex structure factors of the two halves of the data as a function of resolution. The Q-factor is the mean ratio of the vector sum of the individual structure factors from each image divided by the sum of their moduli, again calculated as a function of resolution. Perfectly accurate measurements would have values of DPR, FSC, and Q-factor of 0°, 1.0, and 1.0 respectively, whereas random data containing no information would have values of 90°, 0.0, and 0.0. The spectral signal-to-noise ratio (SSNR) criterion has been advocated as the best of all (136): it effectively measures, as a function of resolution, the overall signal-to-noise ratio (squared) of the whole of the image data. It is calculated by taking into consideration how well all the contributing image data agree internally.
An example of a typical strategy for determination of the 3D structure of a new and unknown molecule without any symmetry and that does not crystallize might be as follows:
1. Record a single axis tilt series of particles embedded in negative stain, with a tilt range from -60° to +60°.
2. Calculate 3D structures for each particle by use of an R-weighted back-projection algorithm (93).
3. Average 3D data for several particles in real or reciprocal space to get a reasonably good 3D model of the stain-excluding region of the particle.
4. Record a number of micrographs of the particles embedded in vitreous ice.
5. Use the 3D negative stain model obtained in (3) with inverted contrast to determine the rough alignment parameters of the particle in the ice images.
6. Calculate a preliminary 3D model of the average, ice-embedded structure.
7. Use the preliminary 3D model to determine more accurate alignment parameters for the particles in the ice images.
8. Calculate a better 3D model.
9. Determine defocus and astigmatism to allow CTF calculation and correct 3D model so that it represents the structure at high resolution.
10. Keep adding pictures at different defocus levels to get an accurate structure at as high a resolution as possible.
For large single particles with no symmetry or for particles with higher symmetry or for crystalline arrays, it should be possible to miss out the negative staining steps and go straight to alignment of particle images from ice-embedding because the particle or crystal tilt angles can be determined internally from comparison of phases along common lines in reciprocal space or from the lattice or helix parameters from a 2D or 1D crystal.
The following discussion briefly outlines for a few specific classes of macromolecule the general strategy for carrying out image processing and 3D reconstruction (see Fig. 14.6).
12.1 2D Crystals
For 2D crystals, the general 3D reconstruction approach consists of the following steps: First, a series of micrographs of single 2D crystals are recorded at different tilt angles, with random azimuthal orientations. Each crystal is then unbent using cross-correlation techniques, to identify the precise position of each unit cell (127), and amplitudes and phases of the Fourier components of the average of that particular view of the structure are obtained for the transform of the unbent crystal. The reference image used in the cross-correlation calculation can either be a part of the whole image masked off after a preliminary round of averaging by reciprocal space filtering of the regions surrounding the diffraction spots in the transform, or it can be a reference image calculated from a previously determined 3D model. The amplitudes and phases from each image are then corrected for the CTF and beam tilt (11, 22, 127) and merged with data from many other crystals by scaling and origin refinement, taking into account the proper symmetry of the 2D space group of the crystal. Finally, the whole data set is fitted by least squares to constrained amplitudes and phases along the lattice lines (137) before calculating a map of the structure. The initial determination of the 2D space group can be carried out by a statistical test of the phase relationships in one or two images of untilted specimens (138). The absolute hand of the structure is automatically correct, given that the 3D structure is calculated from images whose tilt axis and tilt angle are known. Nevertheless, care must be taken not to make any of a number of trivial mistakes that would invert the hand.
12.2 Helical Particles
The basic steps involved in processing and 3D reconstruction of helical specimens include: Recording a series of micrographs of vitrified particles suspended over holes in a perforated carbon support film. The micrographs are digitized and Fourier-transformed to determine image quality (astigmatism, drift, defocus, presence and quality of layer lines, etc.). Individual particle images are boxed, floated, and apodized within a rectangular mask. The parameters of helical symmetry (number of subunits per turn and pitch) must be determined by indexing the computed diffraction patterns. If necessary, simple spline-fitting procedures may be employed to "straighten" images of curved particles (124), and the image data may be reinterpolated (126) to provide more precise sampling of the layer line data in the computed transform. Once a preliminary 3D structure is available, a much more sophisticated refinement of all the helical parameters can be used to unbend the helices onto a predetermined average helix so that the contributions of all parts of the image are correctly treated (123). The layer line data are extracted from each particle transform and two phase origin corrections are made, one to shift the phase origin to the helix axis (at the center of the particle image) and the other to correct for effects caused by having the helix axis tilted out of the plane normal to the electron beam in the electron microscope. The layer line data are separated out into near- and far-side data, corresponding to contributions from the near and far sides of each particle imaged. The relative rotations and
Figure 14.6. Examples of macromolecules studied by cryo-EM and 3D image reconstruction and the resulting 3D structures (bottom row) after cryo-EM analysis. All micrographs (top row) are displayed at about 170,000× magnification and all models at about 1,200,000× magnification. (a) A single particle without symmetry: The micrograph shows 70S E. coli ribosomes complexed with mRNA and Met-tRNA. The surface-shaded density map, made by averaging 73,000 ribosome images from 287 micrographs, has a resolution (FSC) of 11.5 Å. The 50S and 30S subunits and the tRNA are colored blue, yellow, and green, respectively. The identity of many of the subunits is known and some RNA double helices are clearly recognizable by their major and minor grooves (e.g., helix 44 is shown in red). [Courtesy of J. Frank (SUNY, Albany), using data from Gabashvili et al. (86).] (b) A single particle with symmetry: The micrograph shows hepatitis B virus cores. The 3D reconstruction, at a resolution of 7.4 Å (DPR), was computed from 6384 particle images taken from 34 micrographs. [From Böttcher et al. (44).] (c) A helical filament: The micrograph shows actin filaments decorated with myosin S1 heads containing the essential light chain. The 3D reconstruction, at a resolution of 30-35 Å, is a composite in which the differently colored parts are derived from a series of difference maps that were superimposed on f-actin. The components include: f-actin (blue), myosin heavy chain motor domain (orange), essential light chain (purple), regulatory light chain (white), tropomyosin (green), and myosin motor domain N-terminal beta-barrel (red). [Courtesy of A. Lin, M. Whittaker, and R. Milligan (Scripps Research Institute, La Jolla, CA).] (d) A 2D crystal, light-harvesting complex LHCII at 3.4-Å resolution. The model shows the protein backbone and the arrangement of chromophores in a number of trimeric subunits in the crystal lattice. In this example, image contrast is too low to see any hint of the structure without image processing (see also Fig. 14.3). See color insert. [Courtesy of W. Kühlbrandt (Max-Planck-Institute for Biophysics, Frankfurt, Germany).]
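The FSC figure quoted in the caption above refers to the criterion described in Section 12: the correlation, shell by shell in reciprocal space, between the complex structure factors of two reconstructions computed from separate halves of the data. The following is a minimal NumPy sketch of that calculation, assuming two cubic maps of equal size. The function name `fourier_shell_correlation`, the integer-radius shell binning, and the synthetic test volumes are illustrative assumptions and are not taken from any of the software packages cited in this chapter.

```python
import numpy as np


def fourier_shell_correlation(vol_a, vol_b):
    """FSC per integer-radius shell for two cubic maps of identical size."""
    if vol_a.ndim != 3 or vol_a.shape != vol_b.shape or len(set(vol_a.shape)) != 1:
        raise ValueError("expected two cubic 3D maps of the same size")
    n = vol_a.shape[0]
    fa = np.fft.fftn(vol_a)
    fb = np.fft.fftn(vol_b)

    # Integer frequency index along each axis (FFT ordering), then shell radius.
    k = np.fft.fftfreq(n, d=1.0 / n)
    kx, ky, kz = np.meshgrid(k, k, k, indexing="ij")
    shell = np.rint(np.sqrt(kx**2 + ky**2 + kz**2)).astype(int).ravel()

    n_shells = n // 2          # ignore the corners beyond the Nyquist radius
    keep = shell < n_shells

    def shell_sum(values):
        # Sum a real-valued 3D array over all Fourier voxels in each shell.
        return np.bincount(shell[keep], weights=values.ravel()[keep],
                           minlength=n_shells)

    cross = shell_sum((fa * np.conj(fb)).real)   # Re sum of F_a * conj(F_b)
    power_a = shell_sum(np.abs(fa) ** 2)
    power_b = shell_sum(np.abs(fb) ** 2)
    denom = np.sqrt(power_a * power_b)
    # Shells with no power are reported as 0 rather than dividing by zero.
    return np.divide(cross, denom, out=np.zeros_like(cross), where=denom > 0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    signal = rng.standard_normal((32, 32, 32))   # stand-in for a "true" map
    half_1 = signal + 0.5 * rng.standard_normal(signal.shape)
    half_2 = signal + 0.5 * rng.standard_normal(signal.shape)
    for radius, value in enumerate(fourier_shell_correlation(half_1, half_2)):
        print(f"shell {radius:2d}  FSC = {value:.3f}")
```

With identical inputs every shell evaluates to 1.0, and with two independent noise volumes the values scatter around 0.0, which corresponds to the limiting values given in Section 12 for perfect and information-free data.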
of each particle, defined by three Eulerian angles, is determined either by means of common and cross-common lines techniques or with the aid of model-based procedures [e.g., Crowther (106); Fuller et al. (107); Baker et al. (17)]. Once a set of self-consistent particle images is available, an initial, low resolution 3D reconstruction is computed by merging these data with Fourier-Bessel methods (106). This reconstruction then serves as a reference for refining the orientation, origin, and CTF parameters of each of the included particle images, for rejecting "bad" images, and for increasing the size of the data set by including new particle images from additional micrographs taken at different defocus levels. A new reconstruction, computed from the latest set of images, serves as a new reference and the above refinement procedure is repeated until no further improvements, as measured by the reliability criteria mentioned above, are made. Determination of the absolute hand of the structure requires the recording and processing of a pair of images taken with a known, small relative tilt of the specimen between the two views (142).
13 VISUALIZATION, MODELING, AND INTERPRETATION OF RESULTS
Once a reliable 3D map is obtained, computer graphics and other visualization tools may be used as aids in interpreting morphological details and understanding biological function in the context of biochemical and molecular studies and complementary X-ray crystallographic and other biophysical measurements.
14 TRENDS
The new generation of intermediate voltage (~300 kV) FEG microscopes is now making it much easier to obtain higher resolution images that, by use of larger defocus values, have good image contrast at both very low and very high resolution. The greater contrast at low resolution greatly facilitates particle-alignment procedures, and the increased contrast resulting from the high coherence illumination helps to increase the signal-to-noise ratio for the structure at high resolution. Cold stages are constantly being improved, with several liquid helium stages now in operation (143, 144). Two of these are commercially available from JEOL and FEI/Philips.
Finally, three additional likely trends include: (1) increased automation, including the recording of micrographs, the use of spotscan procedures in remote microscope operation (145, 146), and in every aspect of image processing; (2) production of better electronic cameras (e.g., CCD or pixel detectors); and (3) increased use of dose-fractionated, tomographic tilt series, to extend EM studies to the domain of larger supramolecular and cellular structures (102, 147).
15 ACKNOWLEDGMENTS
We are greatly indebted to all our colleagues at Purdue and Cambridge for their insightful comments and suggestions; to B. Böttcher, R. Crowther, J. Frank, W. Kühlbrandt, and R. Milligan for supplying images used in Figure 14.6; and J. Brightwell for editorial assistance. T.S.B. was supported in part by Grant GM33050 from the National Institutes of Health.
16 ABBREVIATIONS
0D zero-dimensional (single particles)
1D one-dimensional (helical)
2D two-dimensional
3D three-dimensional
CCD charge coupled device (slow scan TV detector)
cryo-EM electron cryomicroscopy
CTF contrast transfer function
EM electron microscope/microscopy
FEG field emission gun
REFERENCES
1. S. Brenner and R. W. Horne, Biochem. Biophys. Acta-Prot. Struct., 34, 103-110 (1959).
2. H. E. Huxley and G. Zubay, J. Mol. Biol., 2, 10-18 (1960).
3. A. Klug and J. E. Berger, J. Mol. Biol., 10, 565-569 (1964).
4. D. J. DeRosier and A. Klug, Nature, 217, 130-134 (1968).
5. W. Hoppe, R. Langer, G. Knesch, and C. Poppe, Naturwissenschaften, 55, 333-336 (1968).
6. H. P. Erickson and A. Klug, Philos. Trans. R. Soc. Lond. B, 261, 105-118 (1971).
7. P. N. T. Unwin and R. Henderson, J. Mol. Biol., 94, 425-440 (1975).
8. J. Dubochet, J. Lepault, R. Freeman, J. A. Berriman, and J.-C. Homo, J. Microsc., 128, 219-237 (1982).
9. J. Dubochet, M. Adrian, J.-J. Chang, J.-C. Homo, J. Lepault, A. W. McDowall, and P. Schultz, Q. Rev. Biophys., 21, 129-228 (1988).
10. K. A. Taylor and R. M. Glaeser, Science, 186, 1036-1037 (1974).
11. R. Henderson, J. M. Baldwin, T. A. Ceska, F. Zemlin, E. Beckmann, and K. H. Downing, J. Mol. Biol., 213, 899-929 (1990).
12. W. Kühlbrandt, D. N. Wang, and Y. Fujiyoshi, Nature, 367, 614-621 (1994).
13. E. Nogales, S. G. Wolf, and K. H. Downing, Nature, 391, 199-203 (1998).
14. K. Murata, K. Mitsuoka, T. Hirai, T. Walz, P. Agre, J. B. Heymann, A. Engel, and Y. Fujiyoshi, Nature, 407, 599-605 (2000).
15. L. A. Amos, R. Henderson, and P. N. T. Unwin, Prog. Biophys. Mol. Biol., 39, 183-231 (1982).
16. T. Walz and N. Grigorieff, J. Struct. Biol., 121, 142-161 (1998).
17. T. S. Baker, N. H. Olson, and S. D. Fuller, Microbiol. Mol. Biol. Rev., 63, 862-922 (1999).
18. J. Frank, Three-Dimensional Electron Microscopy of Macromolecular Assemblies, Academic Press, San Diego, CA, 1996, 342 pp.
19. I. Hargittai and M. Hargittai, Eds., Stereochemical Applications of Gas-Phase Electron Diffraction, VCH, New York, 1988.
20. M. Isaacson, J. Langmore, and H. Rose, Optik, 41, 92-96 (1974).
21. R. Henderson, Q. Rev. Biophys., 28, 171-193 (1995).
22. W. A. Havelka, R. Henderson, and D. Oesterhelt, J. Mol. Biol., 247, 726-738 (1995).
23. R. M. Glaeser, J. Ultrastruct. Res., 36, 466-482 (1971).
24. R. Henderson and P. N. T. Unwin, Nature, 257, 28-32 (1975).
25. J. Frank, Curr. Opin. Struct. Biol., 7, 266-272 (1997).
26. J. Kenney, E. Karsenti, B. Gowen, and S. D. Fuller, J. Struct. Biol., 120, 320-328 (1997).
27. Y. Tao, N. H. Olson, W. Xu, D. L. Anderson, M. G. Rossmann, and T. S. Baker, Cell, 95, 431-437 (1998).
28. L. B. Kong, A. C. Siva, L. H. Rome, and P. L. Stewart, Structure, 7, 371-379 (1999).
29. A. C. Bloomer, J. Graham, S. Hovmoller, P. J. G. Butler, and A. Klug, Nature, 276, 362-368 (1978).
30. R. H. Jacobson, X.-J. Zhang, R. F. DuBose, and B. W. Matthews, Nature, 369, 761-766 (1994).
31. G. P. A. Vigers, R. A. Crowther, and B. M. F. Pearse, EMBO J., 5, 529-534 (1986).
32. M. Schatz, E. V. Orlova, P. Dube, J. Jager, and M. van Heel, J. Struct. Biol., 114, 28-40 (1995).
33. R. A. Grant, D. J. Filman, S. E. Finkel, R. Kolter, and J. M. Hogle, Nat. Struct. Biol., 5, 294-303 (1998).
34. A. Mattevi, G. Obmolova, E. Schulze, K. H. Kalk, A. H. Westphal, A. de Kok, and W. G. J. Hol, Science, 255, 1544-1550 (1992).
35. R. A. Milligan, Proc. Natl. Acad. Sci. USA, 93, 21-26 (1996).
36. A. Miyazawa, Y. Fujiyoshi, M. Stowell, and N. Unwin, J. Mol. Biol., 288, 765-786 (1999).
37. K. Hirose, W. B. Amos, A. Lockhart, R. A. Cross, and L. A. Amos, J. Struct. Biol., 118, 140-148 (1997).
38. K. Namba and F. Vonderviszt, Q. Rev. Biophys., 30, 1-65 (1997).
39. T.-W. Jeng, R. A. Crowther, G. Stubbs, and W. Chiu, J. Mol. Biol., 205, 251-257 (1989).
40. A. Cheng, A. N. van Hoek, M. Yeager, A. S. Verkman, and A. K. Mitra, Nature, 387, 627-630 (1997).
41. V. M. Unger, N. M. Kumar, N. B. Gilula, and M. Yeager, Science, 283, 1176-1180 (1999).
42. D. A. Winkelmann, T. S. Baker, and I. Rayment, J. Cell Biol., 114, 701-713 (1991).
43. K. A. Taylor, J. Tang, Y. Cheng, and H. Winkler, J. Struct. Biol., 120, 372-386 (1997).
44. B. Böttcher, S. A. Wynne, and R. A. Crowther, Nature, 386, 88-91 (1997).
45. A. Malhotra, P. Penczek, R. K. Agrawal, I. S. Gabashvili, R. A. Grassucci, R. Junemann, N. Burkhardt, K. H. Nierhaus, and J. Frank, J. Mol. Biol., 280, 103-116 (1998).
46. R. W. Horne and I. Pasquali-Ronchetti, J. Ultrastruct. Res., 47, 361-383 (1974).
93. M. Radermacher in J. Frank, Ed., Electron Tomography, Plenum Press, New York, 1992, pp. 91-115.
94. M. Radermacher, Ultramicroscopy, 53, 121-136 (1994).
95. P. A. Penczek, R. A. Grassucci, and J. Frank, Ultramicroscopy, 53, 251-270 (1994).
96. M. Schatz and M. van Heel, Ultramicroscopy, 32, 255-264 (1990).
97. P. Penczek, M. Radermacher, and J. Frank, Ultramicroscopy, 40, 33-53 (1992).
98. N. Grigorieff, J. Mol. Biol., 277, 1033-1046 (1998).
99. D. E. Olins, A. L. Olins, H. A. Levy, R. C. Durfee, S. M. Margle, E. P. Tinnel, and S. D. Dover, Science, 220, 498-500 (1983).
100. U. Skoglund and B. Daneholt, Trends Biochem. Sci., 11, 499-503 (1986).
101. J. C. Fung, W. Liu, W. J. DeRuijter, H. Chen, C. K. Abbey, J. W. Sedat, and D. A. Agard, J. Struct. Biol., 116, 181-189 (1996).
102. W. Baumeister, R. Grimm, and J. Walz, Trends Cell Biol., 9, 81-85 (1999).
103. A. K. Shah and P. L. Stewart, J. Struct. Biol., 123, 17-21 (1998).
104. F. Beuron, M. R. Maurizi, D. M. Belnap, E. Kocsis, F. P. Booy, M. Kessel, and A. C. Steven, J. Struct. Biol., 123, 248-259 (1998).
105. R. A. Crowther, L. A. Amos, J. T. Finch, D. J. DeRosier, and A. Klug, Nature, 226, 421-425 (1970).
106. R. A. Crowther, Philos. Trans. R. Soc. Lond., 261, 221-230 (1971).
107. S. D. Fuller, S. J. Butcher, R. H. Cheng, and T. S. Baker, J. Struct. Biol., 116, 48-55 (1996).
108. R. H. Cheng, V. S. Reddy, N. H. Olson, A. J. Fisher, T. S. Baker, and J. E. Johnson, Structure, 2, 271-282 (1994).
109. R. A. Crowther, N. A. Kiselev, B. Böttcher, J. A. Berriman, G. P. Borisova, V. Ose, and P. Pumpens, Cell, 77, 943-950 (1994).
110. T. S. Baker and R. H. Cheng, J. Struct. Biol., 116, 120-130 (1996).
111. J. R. Castón, D. M. Belnap, A. C. Steven, and B. L. Trus, J. Struct. Biol., 125, 209-215 (1999).
112. R. A. Crowther, R. Henderson, and J. M. Smith, J. Struct. Biol., 116, 9-16 (1996).
113. J. A. Lawton and B. V. V. Prasad, J. Struct. Biol., 116, 209-215 (1996).
114. P. A. Thuman-Commike and W. Chiu, J. Struct. Biol., 116, 41-47 (1996).
115. I. M. Boier Martin, D. C. Marinescu, R. E. Lynch, and T. S. Baker, J. Struct. Biol., 120, 146-157 (1997).
116. Z. H. Zhou, W. Chiu, K. Haskell, H. J. Spears, J. Jakana, F. J. Rixon, and L. R. Scott, Biophys. J., 74, 576-588 (1998).
117. P. L. Stewart, C. Y. Chiu, S. Huang, T. Muir, Y. Zhao, B. Chait, P. Mathias, and G. R. Nemerow, EMBO J., 16, 1189-1198 (1997).
118. J. Walz, T. Tamura, N. Tamura, R. Grimm, W. Baumeister, and A. J. Koster, Mol. Cell, 1, 59-65 (1997).
119. M. Stewart, J. Electron Microsc. Technol., 9, 325-358 (1988).
120. C. Toyoshima and N. Unwin, J. Cell Biol., 111, 2623-2635 (1990).
121. D. G. Morgan and D. DeRosier, Ultramicroscopy, 46, 263-285 (1992).
122. N. Unwin, J. Mol. Biol., 229, 1101-1124 (1993).
123. R. Beroukhim and N. Unwin, Ultramicroscopy, 70, 57-81 (1997).
124. E. H. Egelman, Ultramicroscopy, 19, 367-374 (1986).
125. B. Carragher, M. Whittaker, and R. A. Milligan, J. Struct. Biol., 116, 107-112 (1996).
126. C. H. Owen, D. G. Morgan, and D. J. DeRosier, J. Struct. Biol., 116, 167-175 (1996).
127. R. Henderson, J. M. Baldwin, K. H. Downing, J. Lepault, and F. Zemlin, Ultramicroscopy, 19, 147-178 (1986).
128. J. M. Baldwin, R. Henderson, E. Beckmann, and F. Zemlin, J. Mol. Biol., 202, 585-591 (1988).
129. S. Hardt, B. Wang, and M. F. Schmid, J. Struct. Biol., 116, 68-70 (1996).
130. R. A. Crowther and P. K. Luther, Nature, 307, 569-570 (1984).
131. H. Winkler and K. A. Taylor, J. Struct. Biol., 116, 241-247 (1996).
132. J. Frank in J. Frank, Ed., Electron Tomography: Three-Dimensional Imaging with the Transmission Electron Microscope, Plenum Press, New York, 1992, 399 pp.
133. J. Frank, A. Verschoor, and M. Boublik, Science, 214, 1353-1355 (1981).
134. M. van Heel, Ultramicroscopy, 21, 95-100 (1987b).
135. M. van Heel and J. Hollenberg in W. Baumeister and W. Vogell, Eds., Electron Microscopy at Molecular Dimensions, Springer-Verlag, Berlin, 1980, pp. 256-260.
136. M. Unser, B. L. Trus, J. Frank, and A. C. Steven, Ultramicroscopy, 30, 429-434 (1989).
137. D. A. Agard, J. Mol. Biol., 167, 849-852 (1983).
138. J. M. Valpuesta, J. L. Carrascosa, and R. Henderson, J. Mol. Biol., 240, 281-287 (1994).
139. J. T. Finch, J. Mol. Biol., 66, 291-294 (1972).
140. Z. H. Zhou, S. Hardt, B. Wang, M. B. Sherman, J. Jakana, and W. Chiu, J. Struct. Biol., 116, 216-222 (1996).
141. N. H. Olson and T. S. Baker, Ultramicroscopy, 30, 281-298 (1989).
142. D. M. Belnap, N. H. Olson, and T. S. Baker, J. Struct. Biol., 120, 44-51 (1997).
143. Y. Fujiyoshi, T. Mizusaki, K. Morikawa, H. Yamagishi, Y. Aoki, H. Kihara, and Y. Harada, Ultramicroscopy, 38, 241-251 (1991).
144. F. Zemlin, E. Beckmann, and K. D. van der Mast, Ultramicroscopy, 63, 227-238 (1996).
145. N. Kisseberth, M. Whittaker, D. Weber, C. S. Potter, and B. Carragher, J. Struct. Biol., 120, 309-319 (1997).
146. M. Hadida-Hassan, S. J. Young, S. T. Peltier, M. Wong, S. Lamont, and M. H. Ellisman, J. Struct. Biol., 125, 235-245 (1999).
147. B. F. McEwen, K. H. Downing, and R. M. Glaeser, Ultramicroscopy, 60, 357-373 (1995).
CHAPTER FIFTEEN
M. ANGELS ESTIARTE
DANIEL H. RICH
School of Pharmacy-Department of Chemistry
University of Wisconsin-Madison
Madison, Wisconsin
Contents
1 Introduction
2 Classification of Peptidomimetics
3 Design of Conformationally Restricted Peptides
4 Template Mimetics
5 Peptide Bond Isosteres
6 From Transition-State Analog Inhibitors to Non-Peptide Inhibitors: Examples in Protease Inhibitors
6.1 TSA in Aspartic Peptidase Inhibitors
6.2 TSA in Metallo Peptidase Inhibitors
6.3 TSA-Derived Cysteine and Serine Peptidase Inhibitors
7 Speeding up Peptidomimetic Research
8 Toward Rational Drug Design: Discovery of Novel Non-Peptide Peptidomimetics
9 Historical Development of Important Non-Peptide Peptidomimetics
9.1 HIV Protease
9.2 Thrombin
9.3 Factor Xa
9.4 Glycoprotein IIb/IIIa (GP IIb/IIIa)
9.5 Ras-Farnesyltransferase
9.6 Non-Peptidic Ligands for Peptide Receptors
9.6.1 Angiotensin II
9.6.2 Substance P
9.6.3 Neuropeptide Y
9.6.4 Growth Hormone Secretagogues
9.6.5 Endothelin
10 Summary and Future Directions
Burger's Medicinal Chemistry and Drug Discovery
Sixth Edition, Volume 1: Drug Discovery
Edited by Donald J. Abraham
ISBN 0-471-27090-3 © 2003 John Wiley & Sons, Inc.
Peptidomimetics for Drug Design
1 INTRODUCTION
Protein-protein interactions are central to biology and provide one mechanism to convert genomic information into regulated biological responses. Important examples of protein-peptide interactions include the binding of peptide ligands to proteases, the binding of peptide hormones to peptide receptors, the recruitment of proteins to effect signal transduction, and apoptosis. Peptides also act as neurotransmitters, neuromodulators, hormones, and autocrine and paracrine factors. Unfortunately, their use as pharmaceutical drugs is made difficult by their poor pharmacokinetic profiles; they are easily proteolyzed, poorly transported, and rapidly excreted. Although modern formulation techniques have improved delivery of peptides (e.g., inhalation of insulin), there remains a need for small potent molecules that can be administered orally.
For these reasons, much effort has been expended to find ways to replace portions of peptides with non-peptide structures, called peptidomimetics, in the hope of obtaining orally bioavailable entities. Several types of peptidomimetics have been developed, and the field has emerged as one of the important approaches to drug design and discovery. This review will describe the various methods developed to design peptidomimetics. Due to space limitations, the biological rationale for each peptidomimetic and its chemical synthesis cannot be covered. Selected examples of the strategies employed to obtain peptidomimetics are provided to illustrate the breadth of research in this field.
2 CLASSIFICATION OF PEPTIDOMIMETICS
The term peptidomimetic is often used in the literature to indicate a multitude of structural types that differ in fundamental ways. Comparisons between peptidomimetics suffer from the lack of accepted definitions of what a peptidomimetic is (1). The term is often applied to highly modified analogs of peptides without distinguishing how these differ from classical analogs of peptides. For example, peptide (2) is derived from the decapeptide LH-RH (1); (2) contains only five amino acids, none of which is present in the parent compound, yet it is a powerful antagonist of the LH-RH receptor (Fig. 15.1) (2). Is (2) a peptide analog or a peptidomimetic?
In the 1970s, Hughes et al. were the first to show that two very different chemical structures have similar agonist properties (3). The opioid natural product, morphine (3), was found to resemble the N-terminal structure of the endogenous opioid peptides, enkephalins, (4a) and (4b), and β-endorphin (5) (Fig. 15.2). The remarkable similarity between the morphine phenol system and the N-terminal tyrosine residue in the peptide opioids implied that these units reacted with opioid receptors in a similar fashion to elicit comparable responses (4-6).
The realization that a non-peptide natural product was mimicking the action of a natural peptide effector led Farmer to postulate that other non-peptide structures might be found that would mimic other peptide effectors (7). His concept of "peptide mimetic," which later was called "peptidomimetic," proposed that
LH-RH (1): pGlu-His-Trp-Ser-Tyr-Gly-Leu-Arg-Pro-Gly-NH2
Met-enkephalin (4a): Tyr-Gly-Gly-Phe-Met
Leu-enkephalin (4b): Tyr-Gly-Gly-Phe-Leu
β-Endorphin (5): Tyr-Gly-Gly-Phe-Met-Thr-Ser-Glu-Lys-Ser-Gln-Thr-Pro-Leu-Val-Thr-Phe-Lys-Asn-Ala-Ile-Ile-Lys-Asn-Ala-Tyr-Lys-Lys-Gly-Glu
Morphine (3)
Figure 15.2. Examples of peptidic and non-peptidic opioid receptor ligands.
novel scaffolds could be designed to replace the entire peptide backbone while retaining isosteric topography of the enzyme-bound peptide (or assumed receptor-bound) conformation. Farmer's definition went beyond simple replacement of amide bonds and the concept of stringing together conformationally restricted amino acid derivatives to mimic the native peptide structure. In the intervening years, many non-peptide and partially peptide structures have been found that mimic (or antagonize) the action of the peptide ligand at its receptor; this is particularly true with substances active at G-protein-coupled receptors.
The pyrrolinone unit (6) designed by Smith and Hirschmann illustrates a modern use of these two concepts (Fig. 15.3) (8). Pyrrolinones constrain the peptide-like side-chains into an extended β-structure topography that fits the active sites of most peptidases; pyrrolinones are resistant to normal proteolysis because no α-amino acid units remain, and the units impart sophisticated partitioning properties to the final inhibitor. Pyrrolinones, like many peptide-derived peptidomimetics, retain an atom-to-atom correspondence to the parent peptide, especially with respect to the peptide backbone structure. Most of these structures contain elements that accomplish one of two objectives: they replace amide bonds with metabolically stable units, and they effect a conformational constraint on peptides or on the peptide replacement. In contrast, heterocyclic natural products or screening leads that bind to peptide receptors also have been called peptidomimetics by virtue of their mimicking (or antagonizing) the function of the natural peptide. Although structural data confirming mimicry of the designed mimetics are rarely available for receptor bound ligands, ample evidence is available from X-ray crystallography that heterocyclic inhibitors are mimicking the extended β-strand of enzyme-bound substrate-derived inhibitors (vide infra).
Figure 15.3. Correlation of pyrrolinone-based peptidomimetics and the parent peptide. Panels: Peptide; Pyrrolinone analog.
Based on these considerations, four distinct types of peptidomimetics have been identified to date (9, 10). The first invented were structures that contain one or more mimics of the local topography about an amide bond (amide bond isosteres). Strictly speaking, these are properly classified as pseudopeptides (11), but in recent years, they have been called peptidomimetics on occasion. For historical reasons, we classify the peptide backbone mimetics as type I mimetics (Table 15.1). These
mimetics often match the peptide backbone atom-for-atom while retaining functionality that makes important contacts with binding sites. Some units mimic short portions of secondary structure (e.g., β-turns) and have been used to generate lead compounds. Many early protease inhibitors were designed from transition state analog mimetics or from collected substrate/product mimetics, each designed to mimic reaction pathway intermediates of the enzyme-catalyzed reaction. These are mimics of the peptide bond in a putative transition state or product state and will be classified here as peptidomimetics.
The second type of mimetic to emerge was the functional mimetic, or type II mimetic, which is a small non-peptide molecule that binds to a peptide receptor. Morphine was the first well-characterized example of this type of peptidomimetic. Initially, type II mimetics were presumed to be direct structural analogs of the natural peptide, but characterization of both the endogenous peptide and antagonist's binding sites by site-directed mutagenesis has raised doubts about this interpretation (12). The mutagenesis data indicate that antagonists for a large number of receptors seem to bind to receptor subsites different than those used by the parent peptide. Consequently, functional mimetics may not mimic the structure of the parent hormone; this remains to be determined. Despite this uncertainty, the approach has been quite successful and produced a number of potential drug lead structures.
Type III mimetics represent the Farmer definition of peptidomimetics in that they possess novel templates, which appear unrelated to the original peptides but contain the essential groups, positioned on a novel non-peptide scaffold to serve as topographical mimetics. Several type III peptidomimetic protease inhibitors have been characterized where direct X-ray structural determination of both the peptide-derived inhibitor and the heterocyclic non-peptide inhibitor complexes have been compared. These examples demonstrate that alternate scaffolds can display side-chains so that they interact with proteins in a fashion closely related to that of the parent peptide.
Recently, a fourth type of peptidomimetic called a GRAB-peptidomimetic (group replacement-assisted binding) has been identified (10). These structures might share structural-functional features of type I peptidomimetics, but they bind to an enzyme form not accessible with type I peptidomimetics.
Previous reviews on peptidomimetics have addressed pseudopeptides (11), macrocyclic mimetics (13), natural product mimetics (14), cyclic protease inhibitors (15), mimetics for receptor ligands (16-22), and earlier general overviews (23-29). This review will focus on the design process itself. Novel peptidomimetics in which the structural relationship between parent peptide and the peptidomimetic has been established by biophysical methods are used to clarify the principles. Successful approaches are highlighted to illustrate how these concepts are currently used.

3 DESIGN OF CONFORMATIONALLY RESTRICTED PEPTIDES

Peptide derivatives that contain conformationally restricting amino acid units or other conformational constraints were first called conformationally constrained (or restricted) peptide analogs. The use of steric hindrance or cyclization to limit rotational degrees of freedom in biologically active molecules has a long history and was originally applied to non-peptide neurotransmitters (30). Subsequently, it was applied to amino acid substituents and to
[Structures: endothelin; compound (16)]
15.6). Peptide chemists routinely apply conformational restriction in their attempts to determine possible bioactive conformations.
Flexible peptides can be conformationally restricted by a variety of methods other than macrocyclization of the peptide. For example, Marshall et al. introduced α-methyl amino acid substituents into peptides as a way to decrease the conformational space available to the resulting peptide (42); these types of approaches led to his "Active Analog" approach for determining bioactive conformations of flexible molecules (43). Some other traditional modifications of the peptide substrate are the replacement of the amino acids of the P1-P1' cleavage site by D-amino acids or the employment of α-C or α-N alkylated amino acids and cyclic or β-amino acids (Fig. 15.7).
Mimicking the secondary structure of peptides has become one of the most important tools for rational drug design (44-47). These methods induce the synthetic analog to adopt a set of target conformations, which are designed to mimic the bioactive conformation predicted in the native substrate from biophysical techniques. Molecular surrogates
[Structure labels: α-amino acid; cyclic derivatives]
have been found that efficiently mimic turns, strands, sheets, and helices. By far, the major efforts have focused on the design of β-turn mimetics. Some of the templates used to constrain the conformational torsion angles of the peptide chain are summarized in Figs. 15.8-15.14.
In a very early example, Freidinger et al. developed a series of cyclic lactams that stabilized β- and γ-turn structures in linear peptides (Fig. 15.8). This strategy was applied to determine conformations of LH-RH that are consistent with the turn structure permitted by the constraint. For example, the 3-aminolactam (18) was used to mimic a β-turn conformation. When inserted in LH-RH, compound (19) retained good biological activity so that the bioactive conformation of LH-RH probably contains a β-turn around residues 6 and 7 (48).
Conformational restriction has also been used to determine the bioactive conformation of enzyme-inhibitor systems for which no X-ray crystal structure is available. Thorsett et al. (49) synthesized conformationally restricted bicyclic lactam derivatives of the angiotensin converting enzyme (ACE) inhibitors enalapril (20) and enalaprilat (21) (Fig. 15.9) to characterize torsion angles in the bioactive conformation. Analog (22) was used to constrain the torsion angle psi (ψ). Flynn et al.
[Structure: LH-RH β-turn mimetic (19)]
(50) extended this principle to prepare the very tight-binding tricyclic ACE inhibitor (23) (Fig. 15.9).
Several other γ-, δ-, and ε-lactam derivatives have been prepared and inserted into receptor antagonists or agonists. For instance, the thiazolidine lactam (24) (Fig. 15.10) has been shown to induce the desired secondary structure in a gramicidin S analog. Later, it was used to prepare a conformationally restricted cyclosporin A analog (51). Several β-turn and γ-turn mimetics are shown in Figs. 15.10-15.12, and many other examples are available in the recent literature (52-54).
Figure 15.12. Some γ-turn mimetic scaffolds.
vided mimetics of multiple discontinuous protein surfaces (56). Over the last few years, the Gellman, Seebach, and Hanessian research groups have invented novel helical structures (e.g., 31, 32) by use of β-, γ-, and δ-peptides (58).
It is important to stress that even a small change in the structure or in a single torsional angle can be sufficient to dramatically modify the conformation of the resulting peptide. Numerous additional conformational constraints have been developed, and the reader is encouraged to consult these reviews for additional examples (32, 59-63).

4 TEMPLATE MIMETICS

Highly functionalized molecular scaffolds have proven to be very successful in mimicking specific protein-protein interactions. Insertion of the key pharmacophoric groups into a nonpeptidic framework has provided good inhibitors of a variety of biological targets. This technique has been successfully applied in those biological targets where the key structural amino acids of the native peptide for peptide recognition are known. Miscellaneous examples are found in glycoprotein GPIIb/IIIa inhibitors (33) that mimic the RGD sequence (64) or in Ras-farnesyltransferase inhibitors (34) that mimic the CAAX sequence (Fig. 15.15) (65).
An early example of this concept was developed by Hirschmann et al. in the design of a somatostatin analog (Fig. 15.15) (55). Three of the four crucial amino acid side-chains of the parent peptide (Tyr, Trp, and Lys) were positioned on a sugar template (35). Although originally designed as a somatotropin release inhibitory factor (SRIF) antagonist, compound (35) also proved to be a good Substance P antagonist. These sugar derivatives, as well as the benzodiazepine, diphenylmethane, and spiropiperidine scaffolds, are elements found in a variety of inhibitors of receptors, and have been designated as "privileged structures" (66). Thus, these common scaffolds can often provide a template for further optimization of a desired activity. Evans et al. have noted that the essential surface area of biologically active peptides is similar to the surface area of benzodiazepines, one type of non-peptide scaffold known to bind to G-protein-coupled receptors (67).
The quest for functionalized lead structures that effectively mimic the "hot spots" within the biological ligand is not easy (68). Molecular modeling and high-throughput screening (HTS) are techniques that are currently used for this purpose and have been summarized elsewhere.
The design and synthesis of antifungal analogs of the cyclic peptide rhodopeptin (36) (Fig. 15.16) illustrate a recent application of peptidomimetic scaffolding, where the structure of the biological target is not known. After structure-activity relationship (SAR) studies, the important side-chains of the peptide ligand were identified; then, NMR and molecular modeling techniques were used to model these side-chains onto known scaffolds and to compare with the original three-dimensional (3D) structure of the native peptide. Compound (37) (Fig. 15.16) is a potent peptidomimetic derivative with improved solubility in water that functions the same as the cyclic tetrapeptide (69, 70).

5 PEPTIDE BOND ISOSTERES

The replacement of amide bonds by retro-inverso amide replacements (71, 72) and other amide bond isosteres generates pseudopeptides (11). This process was first used to stabilize peptide hormones in vivo, and later to prepare transition state analog (TSA) inhibitors. Systematic efforts to convert good in vitro inhibitors into good in vivo inhibitors became the driving force for further development of peptidomimetics. Figure 15.17 illustrates some of the peptide backbone modifications that have been made in an effort to increase bioavailability. Replacement of scissile amide (CONH) bonds with groups insensitive to hydrolysis (e.g., CH2NH) has been extensively practiced. Reviews of this work have appeared (11, 73). Removal of the proton donors and
acceptors in an amide bond also reduces hydration, which improves the ability of the compounds to penetrate lipid membranes (74). These approaches represent important first steps in development of peptidomimetics. However, removal of an amide bond also affects the geometry and increases the flexibility of the molecule at this position, which decreases ligand binding. Effective analogs have been obtained when conformational restriction, transition-state analog design, and amide bond replacements have been applied to
Figure 15.17. Isosteres that replace peptide backbone amide groups to generate pseudopeptides.
dipeptide TSA provides the functionality that interacts tightly with the enzyme catalytic groups while the amino acid sequence up- and downstream on the peptide chain provides interactions that lead to selective inhibition of the target enzyme. The enzyme active site typically is buried in a cleft capable of accommodating three to nine amino acid residues of the substrate/inhibitor, depending on the minimum amino acid sequence necessary for hydrolysis. The inhibitor's exquisite selectivity derives from the interactions of the ligand's Pn-Pn' residues with the enzyme binding sites (Sn-Sn') (83). Recently, some aspartic and serine peptidase inhibitors have been found that access an additional binding site sub-pocket (S3sp) to increase both inhibitor potency and selectivity (84-86).

6.1 TSA in Aspartic Peptidase Inhibitors

The reduced amide isostere (39), developed by Szelke, and the statine (hydroxymethylene) isostere (40) were early transition-state analogs used to design inhibitors of various aspartic proteases (87-89), and their success led to other tetrahedral intermediate mimics such as the hydroxyethylene (41) and hydroxyethylamine (42) isosteres (Fig. 15.19) (90-92). The statine subunit, which mimics the tetrahedral intermediate, represents one of the earlier examples of TSA, although statine is one atom short in backbone length to be a true dipeptide or two atoms too long to be an isosteric replacement for a single amino acid.
Early work focused on developing inhibitors of renin as potential antihypertensive agents, but these compounds failed to become drugs primarily because of difficulties in obtaining orally active drugs. As a result, the first pharmaceutical attempts to develop renin inhibitors for treatment of hypertension through TSA-biased inhibitors failed (93). It was eventually realized after extensive modifications to the ancillary peptide functionality that developing bioavailable peptide-derived inhibitors critically depended on the molecular weight of the inhibitor. Developing inhibitors for HIV protease was substantially easier
[Structure labels: saquinavir (Roche); ritonavir (Abbott)]
than for renin because HIV protease requires a significantly smaller minimum substrate sequence (94). In addition, the principles elucidated to develop renin inhibitors were known and could be applied to the development of HIV protease inhibitors. Variations on the hydroxyethylamine moiety proved to be very successful. Some of the highly modified HIV
[Structures: β-secretase inhibitors (43) and (45)]
protease inhibitors now in clinical use (Fig. 15.20) have excellent oral bioavailability and establish that application of the transition state analog design process can be very successful in favorable cases.
More recently, the principles for designing inhibitors of aspartic proteases have been applied to the design of inhibitors of β-secretase (BACE or Memapsin-2) as potential agents for treating or preventing Alzheimer's disease (95, 96). Both statine-derived inhibitors (43) and hydroxyethylene-derived BACE inhibitors have been reported (Fig. 15.21) (97, 98). A crystal structure of (44) bound to β-secretase has been reported (99). As expected, the hydroxyl group is hydrogen bonded to Asp32 and Asp228, as in other hydroxyethylene derivatives, and the inhibitor binds in an extended conformation. Because the target β-secretase is within the CNS, successful inhibitors have to penetrate the blood-brain barrier readily, a property not yet achieved with any of the peptidomimetic inhibitors currently available.
With the crystal structure in hand, structure-based modification of the parent lead compound has just started to provide new peptidomimetic structures with lower molecular weight and fewer hydrogen bonds (e.g., 45) (Fig. 15.21), opening further avenues to pharmacologically useful compounds (100).
Figure 15.22. Examples of TSA as ACE inhibitors: captopril (38); enalapril (R = Et); enalaprilat (R = H) (46); lisinopril (47).
6.2 TSA in Metallo Peptidase Inhibitors

The discovery of the angiotensin converting enzyme inhibitors in the middle 1970s constitutes one of the major advances in the rational design of drugs, the consequences of which are still being realized. The discovery of these metallo peptidase inhibitors was carried out by Ondetti et al. as part of a long-term study to develop antihypertensive drugs (80); in 1999 they received the Lasker Prize in Clinical Medicine for their work.
Angiotensin converting enzyme (ACE) is a carboxy zinc metallo dipeptidase that cleaves His-Leu from the C-terminus of angiotensin-I. Ondetti et al. reasoned that the product of normal reaction, the carboxyl group, could bind to the active site zinc ion, and that the carboxyl group of a collected-product inhibitor also could bind weakly. To improve the interaction between inhibitor and enzyme zinc ion, they replaced the carboxyl group with a sulfhydryl group, which binds zinc about 1000 times more tightly. This led to captopril (Capoten) (38) (Fig. 15.22) (80). Later developments by other companies led to many ACE inhibitors. Some are illustrated by enalaprilat (46) and lisinopril (47) (Fig. 15.22) (101, 102).
Most metallopeptidase inhibitors append a zinc chelating functionality to a peptide or peptidomimetic that is recognized by the S1'-S3' subsites in the target enzyme. Successful clinical candidates invariably contain groups that replace the initial di- and tri-peptide moieties to achieve selectivity and oral activity. For example, neutral endopeptidase (NEP), another endopeptidase involved in degrading the larger opioid peptides dynorphin and/or endorphin, is inhibited by thiorphan (48) (103) and a variety of NEP inhibitors: retro-thiorphan (49) (104) and kelatorphan (50) (Fig. 15.23) (105). The hydroxamic acid moiety is used in many inhibitors of metallopeptidases.
Inhibition of NEP also prevents the degradation of atrial natriuretic factor (ANF), a natural hypotensive peptide. Dual inhibitors of NEP and ACE have been designed successfully because both enzymes share significant structural homology, particularly in their active sites. Simultaneous inhibition of both peptidases produces a more powerful hypoten-
[Structure labels: thiorphan (48); retrothiorphan; kelatorphan (50); omapatrilat (51), ACE IC50 = 6 nM, NEP IC50 = 9 nM; sampatrilat, ACE IC50 = 7 nM, NEP IC50 = 20 nM]
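As a rough illustration of what "dual" inhibition means numerically, the short Python sketch below takes the IC50 values that appear beside omapatrilat and sampatrilat in the structure labels above and computes the NEP/ACE ratio for each compound. The dictionary layout and the function name `nep_to_ace_ratio` are illustrative choices, not part of the chapter, and the pairing of values with compounds is an assumption that follows the order of the labels as extracted.

```python
# IC50 values in nM, read from the structure labels above.
# The compound-to-value pairing is assumed from label order.
IC50_NM = {
    "omapatrilat": {"ACE": 6.0, "NEP": 9.0},
    "sampatrilat": {"ACE": 7.0, "NEP": 20.0},
}


def nep_to_ace_ratio(compound: str) -> float:
    """Return IC50(NEP) / IC50(ACE); a value near 1 means balanced dual inhibition."""
    values = IC50_NM[compound]
    return values["NEP"] / values["ACE"]


for name in IC50_NM:
    print(f"{name}: NEP/ACE IC50 ratio = {nep_to_ace_ratio(name):.2f}")
```

On these numbers both compounds inhibit the two enzymes at single- to low double-digit nanomolar concentrations, with ratios within a factor of three of unity, which is what one would expect of an inhibitor intended to act on ACE and NEP simultaneously.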
selectivity of this type of compound arises from the specific coordination of the thiirane with the active-site zinc ion, which facilitates thiirane ring opening by nucleophilic attack by neighboring Glu-404. This novel mode of binding was assessed by X-ray absorption studies because of the difficulty of obtaining a suitable crystal structure (111, 112).
ADAMs are membrane proteins that contain a disintegrin and a metalloprotease domain. Disintegrins are RGD-containing proteins that inhibit cell-matrix interactions (adhesion) and cell-cell interactions (aggregation) through the integrin receptors. In addition, ADAMs have two other domains that are involved in signaling and transport (113). There are more than 25 ADAMs proteases identified so far. ADAM 17 has been shown to be TNF-α converting enzyme (TACE) (114). Inhibition of TACE slows the production of TNF-α, a potent cytokine involved in inflammatory responses to infection. Normally TNF-α produces a useful response, but in some cases, too much TNF-α is released and inhibition of TNF-α production would be therapeutically useful. Synthetic analogs have been synthesized that inhibit this enzyme. Clinical candidates like Ro-32-7315 (59) (Fig. 15.27) are starting to emerge, and more are expected in the near future (115, 116).
Aminopeptidases, enzymes that cleave off the N-terminal amino acid from a peptide chain, are bismetallo peptidases, a class of metallopeptidase that contains two metal ions in the catalytic site (117, 118). These can be inhibited by compounds related to bestatin (60) (Fig. 15.28), which contains the N-terminal α-hydroxy-β-amino acid residue, sometimes referred to as norstatine. In leucine aminopeptidase, chelation occurs between both the amide carbonyl group and the adjacent hydroxyl and the hydroxyl and the N-terminal amino group (119, 120).

6.3 TSA-Derived Cysteine and Serine Peptidase Inhibitors

Classical TSA inhibitors of cysteine and serine proteases differ from the metallo- and aspartic protease inhibitors in that they mimic the tet-
[Structure labels: trifluoromethylketone; boronic acid; diaminoketone; leupeptin; compounds (61), (62)]
Caspases are involved in a variety of cell functions, especially in programmed cell death (apoptosis). These enzymes recognize tetrapeptide sequences ending in an aspartic acid recognition point: X-Y-Z-Asp-NHR. Much effort has been expended in trying to obtain selective inhibitors of the 14 different types identified to date. In this context, selective inhibitors of caspase 1 or of caspase 3/7 have recently been reported (130).
Peptidomimetic modifications of the tetrapeptide sequence have led to the conformationally constrained compound (69) as a selective inhibitor of caspase-1 or interleukin-1β converting enzyme (ICE) as potential anti-inflammatory compounds (131). Recently, new non-peptide peptidomimetic diphenyl ether sulfonamides have been described as novel lead structures (70) (Fig. 15.32) (132).
Finally, researchers from SmithKline Glaxo have identified potent and selective inhibitors of caspases 3 and 7 that lack the required carboxyl group in P1 (71) (Fig. 15.32). The X-ray co-crystal structure reveals the formation of the typical tetrahedral intermediate of the isatin type structures, which may compromise its selective inhibition of proteases (133, 134).
These reversible caspase inhibitors differ from inhibitors that form irreversible covalent bonds, the so-called "dead-end" or "suicide" inhibitors of these enzymes. For example, the α-acetoxy ketone (72) in Fig. 15.32 is an alkylating irreversible inhibitor; the enzyme cysteinyl group displaces the α-acetoxy group to form an irreversible covalent bond (135).

7 SPEEDING UP PEPTIDOMIMETIC RESEARCH

As mentioned before, combinatorial chemistry, high-throughput screening, and analogous techniques have become powerful tools to promote drug discovery in peptidomimetic research. It is not the intention of this chapter to summarize all these methods, and excellent
reviews are available in the literature (136-140). However, one successful approach developed at Merck for the rapid identification of selective agonists of the somatostatin receptor through combinatorial chemistry should be highlighted, because it illustrates the evolution of a constrained peptide into a non-peptide peptidomimetic structure (141).
A series of combinatorial libraries were constructed on the basis of molecular modeling of known peptide agonists like MK-678 and octreotide. A chemical collection of 200,000 compounds was screened, giving priority to the residues Tyr-Trp-Lys, important pharmacophores in somatostatin determined first by Veber et al. (31). This approach yielded five compounds (73-77) (Fig. 15.33), each being selective for one of the somatostatin receptor subtypes: sst1 (73), sst2 (74), sst3 (75), sst4 (76), and sst5 (77).

8 TOWARD RATIONAL DRUG DESIGN: DISCOVERY OF NOVEL NON-PEPTIDE PEPTIDOMIMETICS

Current pharmaceutical research has taken advantage of newer computational methods, the so-called computer-aided drug design, and other physicochemical techniques such as X-ray crystallography and NMR (142). The main goal in rational drug design is to translate the structural information in the native peptide into low molecular weight non-peptidic molecules. Over the past years, many 3D structures of biological targets have been solved and have been successfully used to design new, pharmacologically useful compounds (vide infra). Different computer-aided design methods, e.g., 3D pharmacophore model, 3D quantitative SAR (QSAR), docking, and de novo design, have been extensively reviewed elsewhere (75, 143-146).
Recently, the importance of generating inhibitors that target receptor conformational ensembles has been pointed out (10). This method goes beyond the current docking of known structures to known active site conformers and can lead to type III and GRAB peptidomimetics.
The concept of Group Replacement Assisted Binding (GRAB) peptidomimetics derives from the discovery at Roche of the piperidine class of renin inhibitors. The non-peptide inhibitors of renin (78) (Ki = 26 …) and (79) (Ki = 31 nM) (Fig. 15.34) (84, 147-149) stabilize an enzyme active site conformation different than the β-strand binding enzyme conformation typical for other peptidase inhibitors. A close analysis of the X-ray crystal structure of the enzyme inhibitor complex shows that the piperidine C4-phenyl group binds to the enzyme to replace Tyr75 that has rotated to another position. Interestingly, Leu73 also rotates to fill some of the vacated Tyr75 pocket, and this in turn allows Trp39 to occupy a new site formed in part by the vacated Leu73 (Fig. 15.35). This cascade of conformational transitions in the renin active site allows the optimized inhibitor to stabilize an enzyme conformation not observed when the classic peptide-derived peptidomimetics bind. This stabilization process is defined as a group replacement process, and the piperidine inhibitors constitute a new type of peptidomimetic: GRAB peptidomimetics.
Figure 15.34. Examples of GRAB peptidomimetics.
Comparison of (78) and (79) with the structures of other peptide-derived inhibitors revealed how the different enzyme active site conformations were found. Bursavich et al. have successfully extended the initial renin modeling to the design of inhibitors of two other aspartic peptidases: pepsin and R. chinensis pepsin (80) (Ki = 2 µM) and (81) (Ki = 1 µM) (Fig. 15.34) (150).
The extended β-strand binding conformation could be changed into the piperidine binding conformation by a series of low-energy, mechanistically related conformational changes in active site side-chains. The discovery of the Roche inhibitors and the correlation of these structures with peptide-derived inhibitors are analogous to a peptidomimetic "Rosetta Stone." This design strategy has the potential for designing novel types of peptidomimetic structures.

9 HISTORICAL DEVELOPMENT OF IMPORTANT NON-PEPTIDE PEPTIDOMIMETICS

9.1 HIV Protease

Type-I HIV-1 protease inhibitors, Saquinavir, Ritonavir, Indinavir, Amprenavir, Viracept (nelfinavir mesilate), and Lopinavir (Fig. 15.20) are established drugs for the treatment of AIDS. All these inhibitors employ the central hydroxyl transition state mimetic as a scaffold on which varying functionality was systematically added until the required balance between potency, in vivo activity, and oral absorption was achieved. In general, the binding interactions were optimized through iterative synthesis and co-crystallization of inhibitor with enzyme, molecular modeling, and re-designing the inhibitor side-chains. Pharmacokinetic properties were addressed only after the initial inhibitor was identified and optimized. Compounds (82-83) (Fig. 15.36) are highly modified peptidic structures that stabilize the enzyme-bound extended β-conformation (151, 152).
Another approach to achieve greater in vivo activity is to start with a molecular template with proven useful pharmacokinetics and oral bioavailability and to selectively modify it to achieve the desired activity. Identification of the orally active anticoagulant warfarin (84) (Fig. 15.37) as a weak inhibitor (IC50 = 18 µM) of HIV protease was followed by two reports of 4-hydroxycoumarins as possible type III HIV inhibitors. Subsequent SAR studies led to the more potent 5,6-dihydro-4-hydroxy-3-pyrone inhibitor (85) (IC50 = 2.7 nM), which has good anti-viral activity (EC50 = 0.5 µM) and is orally bioavailable (153). Upjohn researchers also used a structure-based design approach based on warfarin to obtain (86), their clinical candidate PNU-140690 (154). It should be noted that both inhibitors bind to the extended β-strand binding active site conformation.
Workers at DuPont used a pharmacophore model and database search to develop the first type III mimetic inhibitor of HIV protease, DuP 450 (87) (Fig. 15.38). This evolved from a 3D pharmacophore that retained two key interactions: replacement of the flap-bound water and a hydroxyl transition-state isostere (155). Molecular modeling led to a cyclohexanone as a better spacer between these groups, and finally the seven-membered cyclic urea (87) was created (Fig. 15.38). The development of these inhibitors illustrates the importance of conformational analysis in the design of constrained analogs.
Surprisingly, the symmetric cyclic sulfonyl-urea derivative analog (88) (Fig. 15.38, Ki = 3 nM) binds differently in the active site and adopts a flipped conformation (156).
Moreover, the SAR of the cyclic urea and cyclic sulfamide inhibitors does not follow a straightforward pattern. These contradictory results clearly illustrate the structural diversity created by a subtle structural modification in two otherwise related peptidomimetic protease inhibitors.

The peptidase inhibitors (82) and (83) are actually amino acid and transition-state mimics pieced together to emulate the typical ligand-bound extended β-strand inhibitor conformation. The structurally distinct heterocyclic aspartic protease inhibitors (85-86) and (87-88) are non-peptide peptidomimetics because of their remote structural relationship to native peptide substrates. Yet these two distinct peptidomimetic classes bind to the same active site topography. These structurally distinct peptidomimetics selectively stabilize closely related enzyme conformations.

9.2 Thrombin

Thrombin and Factor Xa are both serine proteases involved in the blood coagulation cascade. Inhibition of these two enzymes is providing novel anticoagulants (157-159).

The development of thrombin inhibitors that lack the functionalized TSA highlights a major new approach to type I peptidomimetics. In 1995, a Lilly group found that D-Phe-Pro-Agmatine analogs showed increased selectivity for thrombin over other fibrinolytic enzymes despite a 100-fold loss in potency caused by the removal of the aldehyde group (160). These studies paved the way for Merck's development of picomolar thrombin inhibitors (161, 162), which use a similar motif. Removal of an α-ketoamide transition state mimic from L-370,518 (89) (Fig. 15.39, Ki = 0.09 nM) led to an expected 100-fold drop in potency for (90) (Ki = 5 nM). However, systematic modification of the P, position restored potency and led to an inhibitor (91) with a Ki = 2.5 pM. Interestingly, potency seems to be enhanced by a fortuitous hydrophobic collapse into a favorable binding conformation.

Thrombin inhibitors (92) and (93) illustrate a novel type III peptidomimetic. Most protease inhibitors bind in an extended β-strand conformation that is stabilized by multiple enzyme ligand hydrogen bonds.
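The potency changes quoted for the Merck thrombin series can be checked with simple ratio arithmetic. The following is a minimal sketch, not part of the original chapter: the Ki values are the ones given in the text for (89), (90), and (91), while the dictionary keys and the helper name `fold_change` are hypothetical and chosen only for illustration.

```python
# Ki values quoted in the text for the thrombin inhibitors, all in nanomolar.
# The keys are illustrative labels, not identifiers from the source.
KI_NM = {
    "89": 0.09,    # L-370,518, contains the alpha-ketoamide TSA
    "90": 5.0,     # same scaffold without the TSA
    "91": 2.5e-3,  # 2.5 pM expressed in nM
}


def fold_change(reference_ki, other_ki):
    """Return other_ki / reference_ki; values above 1 mean weaker binding."""
    return other_ki / reference_ki


# Loss of potency on removing the transition-state mimic: (89) -> (90).
loss_on_removal = fold_change(KI_NM["89"], KI_NM["90"])

# Gain in potency on optimizing the non-TSA inhibitor: (90) -> (91).
gain_on_optimization = fold_change(KI_NM["91"], KI_NM["90"])

print(f"(89) -> (90): about {loss_on_removal:.0f}-fold weaker")
print(f"(90) -> (91): about {gain_on_optimization:.0f}-fold stronger")
```

With the quoted numbers the first ratio comes out at roughly 56, which is of the same order as the 100-fold drop described in the text, and the second ratio is 2000, so the optimized non-TSA inhibitor is more potent than the original TSA-containing lead.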
9 Historical Development of Important Non-Peptide Peptidomimetics
Figure 15.39. Non-TSA thrombin inhibitors.

does not pick up hydrogen bonding from the important active site sequence Ser214-Gly216 (168). Both crystal structures showed a similar binding mode, in which interaction of the C-2 side-chain with Trp60D might explain the high thrombin selectivity observed for this series (169).

Another type III peptidomimetic inhibitor was derived from the crystal structure of a bicyclic [3.1.3] inhibitor (170) complexed to thrombin (97) (Fig. 15.41). The X-ray structure revealed that one of the carbonyls was oriented towards the hydrophobic P-pocket (S,). The desolvation necessary to place a carbonyl in a hydrophobic pocket is unfavorable, and various alkyl groups were used as possible replacements. This led to the potent (Ki = 13 nM) and selective (>760-fold for thrombin over trypsin) inhibitor (98).

9.3 Factor Xa

New approaches to design inhibitors of Factor Xa as potential anticoagulants have been reviewed (173), and important type III mimetics have been described (Fig. 15.42). All inhibitors contain amidine or basic groups that bind in the enzyme's S1 site; none of the inhibitors contains a classical electrophilic center of the type employed in TSA inhibitors (174-180).

Compound (101) (Fig. 15.42) was designed from a strategy involving connection of a three-point pharmacophore derived from molecular modeling. Beginning with the X-ray structure of the Factor Xa dimer, Gong et al. (176) envisioned three important enzymatic contact points: a phenylamidine in the S, subsite, a phenylamidine in the S, site, and a carboxylate moiety postulated by a group at Daiichi to confer selectivity over thrombin through an interaction with Gln192 of Factor Xa. Systematic iterative modifications led to the potent inhibitor (101) (Ki = 9 nM), which has 350-fold selectivity over thrombin. This approach highlights a truly de novo method where fragments were docked into the active site and an appropriate spacer was chosen to connect them. Further SAR data led to modifications that improved both potency and selectivity (176).

9.4 Glycoprotein IIb/IIIa (GP IIb/IIIa)

Some outstanding examples of the use of conformational restriction to characterize the
bioactive conformations of Arg-Gly-Asp peptidomimetic antagonists illustrate the present state-of-the-art. Members of the integrin family of receptors recognize and bind the peptide sequence, Arg-Gly-Asp, as an important step in platelet aggregation and other physiological processes (181), and competitive antagonists for this process could serve as potential drug candidates. Much effort has been directed toward identifying small ligands that might mimic the RGD peptide sequence (182). This drug design concept was supported by the fact that protein antagonists of integrin receptors are known that contain the RGD sequence (183) and that small peptide sequences containing the RGD moiety weakly antagonize the endogenous ligand (184). Consequently, several groups synthesized conformationally restricted derivatives of small peptides as starting points for developing metabolically stable peptides or peptidomimetics. Ali et al. (185) synthesized a series of disulfide derivatives of the RGD sequence, which were designed by analogy with the somatostatin work (vide supra). Excellent antagonists related to (102) were obtained. Further constraint of the peptide system by use of the o-thiol benzene derivatives led to the novel antagonist SKF 107260 (103) (Fig. 15.43), a good inhibitor of both platelet aggregation and binding to GPIIb/IIIa. Barker et al. (186) followed a similar strategy but used cyclic sulfides as an additional conformationally restricting element. These derivatives had the advantage of being rapidly synthesized by solid phase methods. Systematic structure-activity studies with respect to the amino acid preceding the RGD sequence and the chirality of sulfoxide derivatives led to the discovery of G-4120 (104), a potent, biologically active derivative.

The conformation of both (103) and (104) in water was found to be highly constrained, and a single predominant conformation could be characterized in aqueous solution by use of
plus the charged polar functional groups at both ends. Esters or coumarin (197) linkers have been used to provide orally available prodrugs, and bioisosteric replacement of the guanidinium group by a pyridine (198), tetrahydronaphthyridine (199), or aminobenzimidazole (200) moiety provided more bioavailable analogs.

9.5 Ras-Farnesyltransferase

Inhibitors of Ras-farnesyltransferase have been developed by mimicking the C-terminal CAAX motif (where C is a cysteine residue, A is any aliphatic amino acid, and X is usually Met, Ser, or Ala). This tetrapeptide is the signal for farnesylation of Ras proteins. Ras-farnesyltransferase is one of the most promising targets for novel anti-cancer drugs, because at least 30% of human cancers contain mutated Ras (201, 202).

Two types of peptidomimetic structures have been used to develop inhibitors (203). Some typical type I inhibitors were generated by replacing the amide backbone with differ-
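The CAAX rule stated in this section (C is cysteine, A is an aliphatic residue, X is usually Met, Ser, or Ala) can be expressed as a short sequence check. The sketch below is illustrative only and is not taken from the chapter. It assumes one-letter amino acid codes, treats "aliphatic" as Ala, Val, Leu, or Ile (the text does not list the residues), and treats the "usually" for X as a strict requirement; the function name `caax_box` and the example sequences are hypothetical.

```python
# Assumption: "aliphatic" is taken to mean Ala, Val, Leu, Ile (one-letter codes).
ALIPHATIC = frozenset("AVLI")

# The text says X is "usually" Met, Ser, or Ala; this sketch enforces that strictly.
USUAL_X = frozenset("MSA")


def caax_box(sequence):
    """Return the C-terminal tetrapeptide if it fits the CAAX pattern, else None."""
    tail = sequence.upper()[-4:]
    if len(tail) < 4:
        return None  # too short to contain a tetrapeptide signal
    cys, a1, a2, x = tail
    if cys == "C" and a1 in ALIPHATIC and a2 in ALIPHATIC and x in USUAL_X:
        return tail
    return None


# Illustrative inputs (hypothetical peptides, not sequences quoted in the text).
for peptide in ("GSKTKCVIM", "GSKTKCKIM", "CVI"):
    print(peptide, "->", caax_box(peptide))
```

A stricter or looser residue set can be swapped in by editing the two constants; the check only looks at the last four residues because the motif is defined at the C-terminus.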
Overhauser effect experiments; coordination of the cysteine side-chain to the Zn ion promotes the conformational change in the peptide backbone. Moreover, differences in the conformation and binding mode of peptides and peptidomimetics are one of the bases for selective farnesylation (208).

Other type III peptidomimetic inhibitors of this enzyme have also been reported. Inhibitor (112) (Fig. 15.47) was developed by replacing the A1A2 dipeptidyl sequence with a benzodiazepine scaffold (209). Later, SAR modifications of the benzodiazepine nucleus that included a hydrophilic 7-cyano group and a 4-sulfonyl group provided the potent, orally available, and in vivo active (113) (210).

HTS also produced several non-peptide leads typified by inhibitor SCH 47307 (114) (Fig. 15.47) (211). Subsequent SAR work led to the potent inhibitor SCH 66701 (115) (Ki = 1.7 a), which was crystallized within the enzyme active site (212). This series of compounds is completely non-peptidic and also lacks the free sulfhydryl or imidazole seen in the other inhibitors discussed here. This is a breakthrough that shows that potency can be achieved even without the "essential" cysteine or sulfhydryl mimic.

9.6 Non-Peptidic Ligands for Peptide Receptors

This section illustrates the successful development of non-peptide peptidomimetics from a screening lead by assuming the inhibitor binds to the receptor in the same way as does the native peptide hormone. These assump-
tions actually led to effective inhibitors of the receptor. Later, site-directed mutagenesis of target receptors suggested that for many of these compounds, the mimetic was binding to the receptor at ancillary, perhaps overlapping, sites on the receptor. Later still, pharmacological studies indicated that peptide receptors adopted multiple states, suggesting that different antagonists might bind to different receptor forms. Of course, if compounds do not bind to the same receptor site as the endogenous hormone, SAR data collected on the natural peptide substrate are not applicable to these antagonists. Most of these peptidomimetics are probably type II or functional mimetics. Yet the success of this approach suggests that at least for some non-peptide antagonists, there may be some congruent structure that interacts with the receptor. These issues will only be determined unambiguously when high-resolution structures of the G-protein-coupled receptors (213) and other constitutive receptor systems are determined.

9.6.1 Angiotensin II. The first non-peptide antagonists of the AT1 receptor were found by HTS. The imidazole (116) (Fig. 15.48) is a weak (IC50 = 43 µM) but quite selective A-II receptor antagonist (214). Using this as a lead compound, DuPont and SmithKline Glaxo researchers independently developed potent small molecule A-II receptor antagonists. The DuPont group used the conformation suggested by Smeby and Fernandjian to guide the design (215). It was speculated that the carboxyl group and the imidazole group of (116) were bound to the A-II terminal carboxyl group and to the imidazole group, respectively. This rationalization culminated in the synthesis of nanomolar inhibitors, with compound (117) as a clear representative (216).

Although workers at SmithKline Glaxo used the same conformation as starting point, they postulated other binding modes to the receptor. One of their alternative hypotheses considered compound (116) as a constrained analog in which the benzyl and the carboxyl groups corresponded to the Tyr side-chain and the C-terminal carboxyl group of A-II. Following this hypothesis, modification of lead compound (116) eventually led to compound (118) (Fig. 15.48) with an IC50 = 1.45 nM and oral activity of 30% (217).

Site-directed mutagenesis studies on the AT1 receptor revealed differences in the bind-
ing site of angiotensin and the small molecule non-peptide compounds (119-120) (Fig. 15.49). There is no evidence that the single residues involved in inhibitor binding overlap with endogenous peptide binding.

Some other non-peptide agonists have also appeared in the literature. Surprisingly, their binding mode differs from the binding mode of the peptide agonist (121), as well as that of the structurally similar non-peptide antagonist (122) (Fig. 15.49) (218). However, angiotensin and L-162,313 (122) require common critical residues for angiotensin AT1 receptor activation (219).

9.6.2 Substance P. The tachykinin receptors (NK-1, NK-2, and NK-3) and their endogenous ligands, the tachykinins and neurokinins, are important neurotransmitters (220-222). Antagonists of tachykinin receptors produce beneficial effects in several CNS disease states such as pain, asthma, emesis, and depression.

A general approach for converting a variety of peptide structures into small, type II peptidomimetic antagonists was devised by Horwell and colleagues and is illustrated here for antagonists to Substance P. An alanine scan of the parent undecapeptide revealed that the Phe7-Phe8 sequence was required for binding. Replacement of one of these residues by Trp, followed by introduction of conformational constraints by α-alkylation, provided the subnanomolar inhibitor (123) (Fig. 15.50) (223). Improved brain penetration was achieved by amine (124) (224).

Chemical screening of corporate compound libraries resulted in the discovery of another
Asp-Arg-Val-Tyr-Ile-His-Pro-Phe-OH
Angiotensin II
GH replacement therapy (241). The peptidyl GH secretagogue GHRP-6 (242) was used to develop the clinical candidate MK-0677 (132) (Fig. 15.55, EC50 = 1.3 nM) (243, 244). After identifying the important residues for bioactivity in GHRP-6, the Merck group began searching other receptor libraries for known "privileged structures" in a combinatorial synthetic fashion (see Section 4) (66). The more active derivative contained a spiropiperidine moiety attached to an indoline ring.

More recently, ghrelin has been isolated and identified as an endogenous ligand of the GHS receptor and some new peptidomimetic structures [e.g., 133 (Fig. 15.55)] have started to appear (245).

In another approach, SAR studies and systematic simplification of GHRP-6 at Novo Nordisk produced the orally bioavailable derivative NN-703 (134). Molecular modeling overlapping of NN-703 (134) (Fig. 15.56) and MK-0677 (132) (Fig. 15.55) showed structural similarities between both compounds. Highly potent hybrids of Ipamorelin and NN-703 (e.g., 135) (Fig. 15.56) have also been described (246, 247).

Arg-Pro-Lys-Pro-Gln-Gln-Phe-Phe-Gly-Leu-Met-NH2
Substance P

Figure 15.50. Development of a substance P inhibitor.
Tyr-Pro-Ser-Lys-Pro-Asp-Asn-Pro-Gly-Glu-Asp-Ala-Pro-Ala-Glu-Asp-Leu-Ala-
Arg-Tyr-Tyr-Ser-Ala-Leu-Arg-His-Tyr-Ile-Asn-Leu-Ile-Thr-Arg-Gln-Arg-Tyr-NH2
Neuropeptide Y
onists [e.g., 139 (Fig. 15.58)]. The benzothiadiazole (140) functions as a bioisostere that retains and sometimes improves binding to the ETA receptor (254-256).

Since the discovery of Ro46-2005 (141) (Fig. 15.58), the first orally active ET inhibitor, major efforts have been made to modify arylsulfonamide derivatives. An isoxazole as
the heterocycle attached to the amino functionality provided selectivity against ETA receptor (257) and led to BMS193884 (142) (Ki = 1.4 nM) (258) and others, e.g., TBC 3214 (143) (Ki = 0.04 nM) (259), which are potent, selective, and orally available ETA antagonists.

Different binding modes have been proposed for ET antagonists. The acid or sulfonamido groups are needed to interact with a cationic site in the receptor, and an aromatic interaction with Tyr129 is postulated to be responsible for ETA selectivity. However, because all these receptors are members of the GPCR family, there is no assurance that any bind as modeled. Thus, they must be classified as type II peptidomimetics until structural data can resolve the issue.

10 SUMMARY AND FUTURE DIRECTIONS

The "Holy Grail" of peptidomimetic research in drug discovery has been to find ways to transform the structural information contained in peptides into non-peptide structures that have drug-like pharmacodynamic properties. Many different strategies have been
His-D-Trp-Ala-Trp-D-Phe-Lys-NH2
GHRP-6

Figure 15.55. GHRP-6 and ghrelin non-peptide derivatives as growth hormone secretagogues inhibitors.

Figure 15.56. Newer approaches to growth hormone secretagogues inhibitors.

Cys-Ser-Cys-Ser-Ser-Leu-Met

[Structures: PD156707 (139), EMD 122946 (140), BMS 193884 (142)]
employed in the search for useful peptidomimetics: rational design of amide bond replacements, mimics of turn structures, and the like, as well as both designed and discovered scaffolds that replace the amide bond core of peptides. The field has a long way to go before rational design of type III peptidomimetics can be achieved routinely. However, the progress made to date suggests that this goal will be achieved. We know that some non-peptide scaffolds are topographical mimetics of the extended β-strand of enzyme-bound protease inhibitors because we have the biophysical methods for characterizing both types of enzyme-inhibitor complexes. Type III peptidomimetic inhibitors of peptidases have
been designed from the substrate sequences and they have been revealed by HTS processes and optimized by application of structural biology. At this point, we have learned more about the design of inhibitors by studying how screening leads inhibit enzymes than from the design of inhibitors from our current, limited knowledge of enzyme catalysis. Probably the most important recent discovery is that some screening leads inhibit proteases by binding to a different enzyme active site conformation that is related mechanistically to the well-characterized extended β-strand of enzyme-bound protease inhibitors. This result emphasizes the importance of considering the entire ensemble of protein conformations when designing inhibitors of peptide-protein interactions.

Our understanding of peptide mimicry for ligands of constitutive receptors, such as G-protein-coupled receptors (GPCR), is much more primitive because high resolution structural data for agonist- and/or antagonist-receptor complexes are not yet available. For this reason, all attempts to rationalize the interactions between ligand and receptor contain a considerable element of speculation. It is too early to know whether small non-peptide structures that bind to GPCR are functional or topographical mimetics. However, based on the results obtained by studying peptidase inhibitors, it seems likely that at least some of the known functional peptidomimetic receptor ligands will be shown to be topographical mimetics. Others may be found to act more like GRAB-peptidomimetics in that they bind to receptor conformations closely related in energy and mechanism to native conformations. Still others will no doubt be found that inhibit or stimulate the receptor system by allosteric mechanisms or by interfering with some multi-step binding process preceding the formation of the active ligand-receptor complex. In any case, it is clear that successful design of functional mimetics by assuming some structural relationship between a screening lead and the parent peptide can work (see Section 9.6), as can the systematic modification of the parent peptide. The application of the principles of peptidomimetic research has become very important to drug discovery. Although our present knowledge about protein-protein interactions is still quite limited, the rapid growth of structural information and methods will eventually allow us to design rationally peptidomimetic compounds suitable for use in human therapy.

REFERENCES

1. M. D. Fletcher and M. M. Campbell, Chem. Rev., 98, 763 (1998).
2. F. Haviv, T. D. Fitzpatrick, C. J. Nichols, E. N. Bush, G. Diaz, G. Bammert, A. T. Nguyen, E. S. Johnson, J. Knittle, and J. Greer, J. Med. Chem., 37, 701 (1994).
3. J. Hughes, T. W. Smith, H. W. Kosterlitz, L. A. Fothergill, B. A. Morgan, and H. R. Morris, Nature, 258, 577 (1975).
4. G. D. Smith and J. F. Griffin, Science, 199, 1214 (1978).
5. A. Aubry, N. Birlirakis, M. Sakarellos-Daitsiotis, C. Sakarellos, and M. Marraud, Biopolymers, 28, 27 (1989).
6. A. F. Bradbury, D. G. Smyth, and C. R. Snell, Nature, 260, 165 (1976).
7. P. S. Farmer in E. J. Ariens, Ed., Drug Design, Academic Press, New York, 1980.
8. A. B. Smith III, T. P. Keenan, R. C. Holcomb, P. A. Sprengeler, M. C. Guzman, J. L. Wood, P. J. Carroll, and R. Hirschmann, J. Am. Chem. Soc., 114, 10672 (1992).
9. A. S. Ripka and D. H. Rich, Curr. Opin. Chem. Biol., 2, 441 (1998).
10. M. G. Bursavich and D. H. Rich, J. Med. Chem., 45, 541 (2002).
11. A. F. Spatola in B. Weistein, Ed., Chem. Biochem. Amino Acids, Pept., Proteins, Marcel Dekker, New York, 1983.
12. U. Gether, Endocr. Rev., 21, 90 (2000).
13. D. P. Fairlie, G. Abbenante, and D. R. March, Curr. Med. Chem., 2, 654 (1995).
14. R. A. Wiley and D. H. Rich, Med. Res. Rev., 13, 327 (1993).
15. J. D. A. Tyndall and D. P. Fairlie, Curr. Med. Chem., 8, 893 (2001).
16. N. R. A. Beeley, Drug Discov. Today, 5, 354 (2000).
17. R. M. Freidinger, Trends Pharmacol. Sci., 10, 270 (1989).
18. R. M. Freidinger, Curr. Opin. Chem. Biol., 3, 395 (1999).
19. A. Giannis and T. Kolter, Angew. Chem. Intl. Ed. Engl., 32, 1244 (1993).
20. G. J. Moore, Trends Pharmacol. Sci., 15, 124 (1994).
21. D. C. Rees, Curr. Med. Chem., 1, 145 (1994).
22. E. E. Sugg, Annu. Rep. Med. Chem., 32, 277 (1997).
23. T. K. Sawyer, Drugs Pharm. Sci., 101, 81 (2000).
24. B. A. Morgan and J. A. Gainor, Annu. Rep. Med. Chem., 24, 243 (1989).
25. G. J. Moore, Proc. West. Pharmacol. Soc., 40, 115 (1997).
26. A. Giannis and F. Rubsam, Adv. Drug Res., 29, 1 (1997).
27. M. Goodman and S. Ro in M. E. Wolff, Ed., Burger's Medicinal Chemistry and Drug Discovery, vol. 1, Wiley-Interscience, San Diego, CA, 1995, pp. 803-861.
28. D. Obrecht, M. Altorfer, and J. A. Robinson, Adv. Med. Chem., 4, 1 (1999).
29. J. Gante, Angew. Chem. Intl. Ed. Engl., 33, 1699 (1994).
30. P. A. Hart and D. H. Rich, Pract. Med. Chem., 393-412 (1996).
31. D. F. Veber, Pept.: Chem. Biol., in J. A. R. Smith and E. Jean, Eds., Proc. Am. Pept. Symp., 12th, ESCOM, Leiden, (1992).
32. G. R. Marshall, Tetrahedron, 49, 3547 (1993).
33. J. W. Erickson and S. W. Fesik, Annu. Rep. Med. Chem., 27, 271 (1992).
34. G. Muller, Curr. Med. Chem., 7, 861 (2000).
35. D. F. Mierke and C. Giragossian, Med. Res. Rev., 21, 450 (2001).
36. D. F. Veber, F. W. Holly, R. F. Nutt, S. J. Bergstrand, S. F. Brady, R. Hirschmann, M. S. Glitzer, and R. Saperstein, Nature, 280, 512 (1979).
37. J. Rivier, M. Brown, and W. Vale, Biochem. Biophys. Res. Commun., 65, 746 (1975).
38. G. D. Rose, L. M. Gierasch, and J. A. Smith, Adv. Protein Chem., 37, 1 (1985).
39. P. W. Schiller, The Peptides: Analysis, Synthesis and Biology, Vol. 6, Academic Press, Orlando, FL, 1984.
40. T. K. Sawyer, V. J. Hruby, P. S. Darman, and M. E. Hadley, Proc. Natl. Acad. Sci. USA, 79, 1751 (1982).
41. K. Ishikawa, T. Fukami, T. Nagase, K. Fujita, T. Hayama, K. Niiyama, T. Mase, M. Ihara, and M. Yano, J. Med. Chem., 35, 2139 (1992).
42. G. R. Marshall, F. A. Gorin, and M. L. Moore, Annu. Rep. Med. Chem., 13, 227 (1978).
43. G. R. Marshall, C. D. Barry, H. E. Bosshard, R. A. Dammkoehler, and D. A. Dunn, ACS Symp. Ser., 112, 205 (1979).
44. R. M. J. Liskamp, Recl. Trav. Chim. Pays-Bas., 113, 1 (1994).
45. G. Holzemann, Kontakte (Darmstadt), 1, 3 (1991).
46. G. Holzemann, Kontakte (Darmstadt), 2, 55 (1991).
47. M. Kahn, Synlett, 821-826 (1993).
48. R. M. Freidinger, D. F. Veber, D. S. Perlow, J. R. Brooks, and R. Saperstein, Science, 210, 656 (1980).
49. E. D. Thorsett, E. E. Harris, S. D. Aster, E. R. Peterson, J. P. Snyder, J. P. Springer, J. Hirshfield, E. W. Tristram, A. A. Patchett, E. H. Ulm, and T. C. Vassil, J. Med. Chem., 29, 251 (1986).
50. G. A. Flynn, E. L. Giroux, and R. C. Dage, J. Am. Chem. Soc., 109, 7914 (1987).
51. U. Nagai, K. Sato, R. Nakamura, and R. Kato, Tetrahedron, 49, 3577 (1993).
52. S. Hanessian, G. McNaughton Smith, H. G. Lombart, and W. D. Lubell, Tetrahedron, 53, 12789 (1997).
53. A. J. Souers and J. A. Ellman, Tetrahedron, 57, 7431 (2001).
54. K. Burgess, Acc. Chem. Res., 34, 826 (2001).
55. A. B. Smith III, M. C. Guzman, P. A. Sprengeler, T. P. Keenan, R. C. Holcomb, J. L. Wood, P. J. Carroll, and R. Hirschmann, J. Am. Chem. Soc., 116, 9947 (1994).
56. K. D. Stigers, M. J. Soth, and J. S. Nowick, Curr. Opin. Chem. Biol., 3, 714 (1999).
57. J. S. Nowick, E. M. Smith, and M. Pairish, Chem. Soc. Rev., 25, 401 (1996).
58. R. P. Cheng, S. H. Gellman, and W. F. DeGrado, Chem. Rev., 101, 3219 (2001).
59. J. Venkatraman, S. C. Shankaramma, and P. Balaram, Chem. Rev., 101, 3131 (2001).
60. D. P. Fairlie, M. L. West, and A. K. Wong, Curr. Med. Chem., 5, 29 (1998).
61. V. J. Hruby and P. M. Balse, Curr. Med. Chem., 7, 945 (2000).
62. V. J. Hruby, Acc. Chem. Res., 34, 389 (2001).
63. H. Nakanishi and M. Kahn, Bioorg. Chem. Pept. & Protein, 12, 395 (1998).
64. R. Hirschmann, P. A. Sprengeler, T. Kawasaki, J. W. Leahy, W. C. Shakespeare, and A. B. Smith III, Tetrahedron, 49, 3665 (1993).
65. Y. Qian, A. Vogt, S. M. Sebti, and A. D. Hamilton, J. Med. Chem., 39, 217 (1996).
130. R. V. Talanian, K. D. Brady, and V. L. Cryns, J. Med. Chem., 43, 3351 (2000).
131. D. S. Karanewsky, X. Bai, S. D. Linton, J. F. Krebs, J. Wu, B. Pham, and K. J. Tomaselli, Bioorg. Med. Chem. Lett., 8, 2757 (1998).
132. A. B. Shahripour, M. S. Plummer, E. A. Lunney, T. K. Sawyer, C. J. Stankovic, M. K. Connolly, J. R. Rubin, N. P. C. Walker, K. D. Brady, H. J. Allen, R. V. Talanian, W. W. Wong, and C. Humblet, Bioorg. Med. Chem. Lett., 11, 2779 (2001).
133. D. Lee, S. A. Long, J. L. Adams, G. Chan, K. S. Vaidya, T. A. Francis, K. Kikly, J. D. Winkler, C.-M. Sung, C. Debouck, S. Richardson, M. A. Levy, W. E. DeWolf Jr., P. M. Keller, T. Tomaszek, M. S. Head, M. D. Ryan, R. C. Haltiwanger, P.-H. Liang, C. A. Janson, P. J. McDevitt, K. Johanson, N. O. Concha, W. Chan, S. S. Abdel-Meguid, A. M. Badger, M. W. Lark, D. P. Nadeau, L. J. Suva, M. Gowen, and M. E. Nuttall, J. Biol. Chem., 275, 16007 (2000).
134. D. Lee, S. A. Long, J. H. Murray, J. L. Adams, M. E. Nuttall, D. P. Nadeau, K. Kikly, J. D. Winkler, C.-M. Sung, M. D. Ryan, M. A. Levy, P. M. Keller, and W. E. DeWolf Jr., J. Med. Chem., 44, 2015 (2001).
135. R. E. Dolle, J. Singh, J. Rinker, D. Hoyer, C. V. C. Prasad, T. L. Graybill, J. M. Salvino, C. T. Helaszek, R. E. Miller, and M. A. Ator, J. Med. Chem., 37, 3863 (1994).
136. B. K. Kay, A. V. Kurakin, and R. Hyde-DeRuyscher, Drug Discov. Today, 3, 370 (1998).
137. F. Al-Obeidi, V. J. Hruby, and T. K. Sawyer, Mol. Biotechnol., 9, 205 (1998).
138. A. E. P. Adang and P. H. H. Hermkens, Curr. Med. Chem., 8, 985 (2001).
139. K. S. Lam, M. Lebl, and V. Krchnak, Chem. Rev., 97, 411 (1997).
140. A. C. Good, S. R. Krystek, and J. S. Mason, Drug Discov. Today, 5, 61 (2000).
141. S. P. Rohrer, E. T. Birzin, R. T. Mosley, S. C. Berk, S. M. Hutchins, D.-M. Shen, Y. Xiong, E. C. Hayes, R. M. Parmar, F. Foor, S. W. Mitra, S. J. Degrado, M. Shu, J. M. Klopp, S. J. Cai, A. Blake, W. W. S. Chan, A. Pasternak, L. Yang, A. A. Patchett, R. G. Smith, K. T. Chapman, and J. M. Schaeffer, Science, 282, 737 (1998).
142. R. S. Bohacek, C. McMartin, and W. C. Guida, Med. Res. Rev., 16, 3 (1996).
143. Y. Kurogi and O. F. Guner, Curr. Med. Chem., 8, 1035 (2001).
144. J. S. Mason, A. C. Good, and E. J. Martin, Curr. Pharm. Des., 7, 567 (2001).
145. F. Ooms, Curr. Med. Chem., 7, 141 (2000).
146. J. Wouters and F. Ooms, Curr. Pharm. Des., 7, 529 (2001).
147. E. Vieira, A. Binggeli, V. Breu, D. Bur, W. Fischli, R. Guller, G. Hirth, H. P. Marki, M. Muller, C. Oefner, M. Scalone, H. Stradler, M. Wilhelm, and W. Wostl, Bioorg. Med. Chem. Lett., 9, 1397 (1999).
148. G. Guller, A. Binggeli, V. Breu, D. Bur, W. Fischli, G. Hirth, C. Jenny, M. Kansay, F. Montavon, M. Muller, C. Oefner, H. Stradler, E. Vieira, M. Wilhelm, W. Wostl, and H. P. Marki, Bioorg. Med. Chem. Lett., 9, 1403 (1999).
149. C. Oefner, A. Binggeli, V. Breu, D. Bur, J.-P. Clozel, A. D'Arcy, A. Dorn, W. Fischli, F. Gruninger, R. Guller, G. Hirth, H. P. Marki, S. Mathews, M. Muller, R. G. Ridler, H. Stadler, E. Vieira, M. Wilhelm, F. K. Winklier, and W. Wostl, Chem. Biol., 6, 127 (1999).
150. M. G. Bursavich, C. W. West, and D. H. Rich, Org. Lett., 3, 2317 (2001).
151. A. B. Smith III, R. Hirschmann, A. Pasternak, W. Yao, P. A. Sprengeler, M. K. Holloway, L. C. Kuo, Z. Chen, P. L. Darke, and W. A. Schleif, J. Med. Chem., 40, 2440 (1997).
152. J. D. A. Tyndall, R. C. Reid, D. P. Tyssen, D. K. Jardine, B. Todd, M. Passmore, D. R. March, L. K. Pattenden, D. A. Bergman, D. Alewood, S.-H. Hu, P. F. Alewood, C. J. Birch, J. L. Martin, and D. P. Fairlie, J. Med. Chem., 43, 3495 (2000).
153. S. E. Hagen, J. V. N. V. Prasad, F. E. Boyer, J. M. Domagala, E. L. Ellsworth, C. Gajda, H. W. Hamilton, L. J. Markoski, B. A. Steinbaugh, B. D. Tait, E. A. Lunney, P. J. Tummino, D. Ferguson, D. Hupe, C. Nouhan, S. J. Gracheck, J. M. Saunders, and S. VanderRoest, J. Med. Chem., 40, 3707 (1997).
154. T. M. Judge, G. Phillips, J. K. Morris, K. D. Lovasz, K. R. Romines, G. P. Luke, J. Tulinsky, J. M. Tustin, R. A. Chrusciel, L. A. Dolak, S. A. Mizsak, W. Watt, J. Morris, S. L. V. Velde, J. W. Strohbach, and R. B. Gammill, J. Am. Chem. Soc., 119, 3627 (1997).
155. G. V. De Lucca, S. Erickson-Viitanen, and P. Y. S. Lam, Drug Discov. Today, 2, 6 (1997).
156. W. Schaal, A. Karlsson, G. Ahlsen, J. Lindberg, H. O. Andersson, U. H. Danielson, B. Classon, T. Unge, B. Samuelsson, J. Hulten, A. Hallberg, and A. Karlen, J. Med. Chem., 44, 155 (2001).
157. J. P. Vacca, Curr. Opin. Chem. Biol., 4, 394 (2000).
158. P. E. J. Sanderson, Med. Res. Rev., 19, 179 (1999).
Analog Design
JOSEPH G. CANNON
The University of Iowa
Iowa City, Iowa
Contents
1 Introduction
2 Bioisosteric Replacement and Nonisosteric Bioanalogs (Nonclassical Bioisosteres)
3 Rigid or Semirigid (Conformationally Restricted) Analogs
4 Homologation of Alkyl Chain or Alteration of Chain Branching; Changes in Ring Size; Ring-Position Isomers; and Substitution of an Aromatic Ring for a Saturated One, or the Converse
5 Alteration of Stereochemistry and Design of Stereoisomers and Geometric Isomers
6 Fragments of the Lead Molecule
7 Variation in Interatomic Distances
flection of some action on the same biological process or at the same receptor site. Bioisosteric similarity of molecules is commonly assigned on the basis of the number of valence electrons of an atom or a group of atoms rather than on the total number of orbital electrons, as was originally specified by Langmuir. In a remarkable number of instances, compounds result that have similar (or even diametrically opposite) pharmacological effects compared with those of the parent compound. Categories of classic bioisosteres have been described (2) (Table 16.1).

A more recent comprehensive review of bioisosterism appeared in 1996 (3). In a short communication, Burger (4) discussed and provided valuable insights into isosterism and bioanalogy in drug design.

Many compounds have been identified that comply with the "biology" aspect of the bioisostere concept but that do not fit the strict chemical (steric and electronic) definition of bioisosteres. Floersheim et al. (5) proposed that such compounds be designated as nonisosteric bioanalogs, replacing the older term, "nonclassical bioisosteres." However, most of the contemporary literature retains the nonclassical bioisostere terminology. Table 16.2 lists representative nonclassical bioisosteres.

Dihydromuscimol (1) and thiomuscimol (2) are cyclic analogs of γ-aminobutyric acid (GABA) (3), in which the C=N moiety of the heterocyclic ring is considered to be bioisosteric with the C=O of GABA. The -S- moiety of thiomuscimol is bioisosteric with the ring -O- of dihydromuscimol. Both (1) and (2) are highly potent agonists at GABAA receptors, as determined in an electrophoresis-based assay (6).

Because of its bioisosteric similarity to the normal physiological substrate L-dopa (4), L-mimosine (5) inhibits catechol oxidation by the enzyme tyrosinase (7). These compounds exemplify a situation in which bioisosteres display opposite pharmacologic effects at the same receptor.

The sulfonium bioisostere (6) of N,N-dimethyldopamine (7) retains the dopaminergic agonist effect displayed by (7) (8). The fact that (6) bears a permanent unit positive charge was invoked in support of the hypoth-

Table 16.1 Bioisosteric Atoms and Groups
1. Univalent
   -F    -OH    -NH2    -CH3    -Cl
   -SH   -PH2
   -I    t-C4H9
   -Br   i-C3H7
2. Bivalent
   -O-   -S-    -Se-    -CH2-   -NH-
3. Tervalent
   -N=   -CH=
   -P=   -As=
4. Quadrivalent
   -C-   -Si-
5. Ring equivalents
   -CH=CH-   -S-   (e.g., benzene, thiophene)
   =CH-      =N-   (e.g., benzene, pyridine)
   -O-   -S-   -CH2-   -NH-
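The valence-electron basis for classical bioisosterism described in this section can be illustrated by counting electrons for the first row of univalent groups in Table 16.1. This is a minimal sketch and not part of the original chapter: the per-element valence electron counts are standard main-group values, and the names `VALENCE`, `UNIVALENT_GROUPS`, and `valence_electrons` are hypothetical.

```python
# Standard valence electron counts for the main-group elements used below.
VALENCE = {"H": 1, "C": 4, "N": 5, "O": 6, "F": 7, "P": 5, "S": 6, "Cl": 7}

# Elemental composition of some univalent groups listed in Table 16.1.
UNIVALENT_GROUPS = {
    "-F": {"F": 1},
    "-OH": {"O": 1, "H": 1},
    "-NH2": {"N": 1, "H": 2},
    "-CH3": {"C": 1, "H": 3},
    "-Cl": {"Cl": 1},
    "-SH": {"S": 1, "H": 1},
    "-PH2": {"P": 1, "H": 2},
}


def valence_electrons(composition):
    """Sum the valence electrons over all atoms in a group."""
    return sum(VALENCE[element] * count for element, count in composition.items())


counts = {group: valence_electrons(atoms) for group, atoms in UNIVALENT_GROUPS.items()}
for group, total in counts.items():
    print(f"{group:5s} {total} valence electrons")
```

Every group in this sketch carries seven valence electrons, which is the sense in which they are grouped together as univalent isosteres; the count says nothing about size, polarity, or hydrogen-bonding behavior, which is why the pharmacological outcome still has to be determined experimentally.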
Table 16.2 Nonclassical Bioisosteres
1. Carbonyl group
H
-NHCN -CH(CN)2
Catechol
H
X=O,NR
Halogen
X CF3 CN N(CN)2 C(CN)3
Thioether
Thiourea
NO2 R NR3
Hydrogen
H F
of glutamate receptors (17), can be considered to be a nonclassical bioisostere of the corresponding carboxyl group of glutamic acid (20).

Compounds (21-23) illustrate further examples of nonclassical bioisosteres. Compound (21) was reported to display antitrypanosomal activity (18). The analogs (22) and (23) also displayed antitrypanosomal activity (19). Compound (22) demonstrated the most impressive activity (IC50 values of 40 and

3 RIGID OR SEMIRIGID (CONFORMATIONALLY RESTRICTED) ANALOGS

Imposition of some degree of molecular rigidity on a flexible organic molecule (e.g., by incorporation of elements of the flexible molecule into a rigid ring system or by introduction of a carbon-carbon double or triple bond) may result in potent, biologically active agents that show a higher degree of specificity of pharmacologic effect. There are possible advantages to this technique (20): the key functional groups are held in one steric disposition or, in
The conformational restrictions imposed on the indole-3-ethylamine moiety permitted retention of affinity for the 5-HT,, receptor but it diminished affinity for the 5-HT,, receptor by a factor of 1000. In two functional assays, (29) exhibited potency equal to or marginally greater than that of serotonin. Com-
The exo-phenyl isomer (32) was six times as potent as the endo-phenyl isomer (31), and it was twice as potent as meperidine itself in a benzoquinone-induced writhing assay for analgesic effect.
Rigid analogs (33), (34), and (35) of phencyclidine (36) possess a rigid carbocyclic struc-
lustrates that the achievement of conformational integrity by incorporation of a flexible pharmacophore into a bulky, complex molecule may be at the expense of biological activity.
Rigidity was introduced into the glutamic acid moiety in a series of bioisosteric congeners (46-48) (34). These systems showed potent agonist activity at subpopulations of metabotropic glutamate receptors. The geometry of these congeners led to the conclusion that glutamic acid itself interacts with the metabotropic glutamate receptors in a fully extended conformation.
The rotational orientation of the ester moieties of the myoneural blocking agent succinyldicholine (49) was restricted by introduction of a double bond into the succinic acid portion (50), (51) (35). The E-fumarate ester
succinate. These results led to the conclusion that the molecular shape of the E-ester (51) more closely approximates that assumed by succinylcholine when it interacts with myoneural nicotinic receptors.
Restricted rotation was also introduced into the succinic acid moiety of succinyldicholine by preparation of the choline esters of cis- and trans-cyclopropane-1,2-dicarboxylic acids (52) and (53) (36, 37). Myoneural blocking activity was assessed in dogs (37) and cats (36) and, as indicated above for the E- and Z-olefinic esters (51) and (50), the extended trans-isomer (53) demonstrated much greater potency and a longer duration of action than those of the cis-isomer (52). The cyclobutane congeners (54) and (55) presented unexpected results that are difficult to rationalize: the cis-isomer (54) was much less potent than the trans-isomer (55) in a cat assay for myoneural blockade, but it presented a decidedly longer duration of action than that of the trans-isomer (36).
4 HOMOLOGATION OF ALKYL CHAIN OR ALTERATION OF CHAIN BRANCHING; CHANGES IN RING SIZE; RING-POSITION ISOMERS; AND SUBSTITUTION OF AN AROMATIC RING FOR A SATURATED ONE, OR THE CONVERSE
tem, position isomers may differ in their complementarity to receptors, and the position of a substituent on a ring may influence the spatial occupancy of the ring system with respect to the remainder of a conformationally variable molecule. What has sometimes been trivialized as "methyl group roulette" may indeed be an important parameter in the design of analogs.
Homologation of the N-alkyl chain in norapomorphine (56) from methyl (57) to n-propyl (59) produced incremental increases in emetic response in dogs and in stereotypy responses in rodents (38, 39).
combinations of alkyl groups may impart a high degree of dopamine agonist effects (40).
N,N-dimethyldopamine (61) is extremely potent in assays for dopaminergic agonism (pigeon pecking, emesis in dogs, and inhibition of cat cardioaccelerator nerve), as is N,N-di-n-propyldopamine (62) (41). N-n-Propyl-N-n-butyldopamine (63) is potent in behavioral assays in nigra-lesioned rats (42). However, N,N-di-n-butyldopamine (64) is virtually inert in these assays (41, 42). N,N-di-n-Pentyldopamine was reported (43) to be inert in a caudectomized mouse behavioral assay and in a rotatory behavioral assay in nigra-lesioned rats. It seems likely that the enhanced dopaminergic agonist effects conferred by N-ethyl and N-n-propyl groups on aporphine and β-phenethylamine-derived molecules are not related merely to enhanced lipophilic character or to partitioning phenomena, but rather to the likelihood that the two- and three-carbon chains have a positive affinity for subsites on certain dopamine receptors. It may be speculated that these receptor subsites do not accommodate longer alkyl chains (e.g., n-butyl or n-pentyl). However, different assays for dopaminergic stimulant effects and different animal species were used in refs. 41, 42, and 43, and care must be exercised in drawing firm structure-activity relationship conclusions based on these data.
The alkyl linker between the two heterocyclic ring systems in structure (65) was modified in studies of the ability of analogs to bind to the cholecystokinin-B receptor (44). When this linking group was -CH2-CH2-, the compound (structure 66) was extremely potent in radioligand displacement assays on mouse brain membranes. Introduction of carbon-carbon unsaturation (E-olefin) into the linker (structure 67) resulted in a 16-fold decrease in binding ability; this suggests that conformational restriction and limitation of molecular flexibility have deleterious effects on biological activity. However, no data were reported on the Z-isomer of this olefinic molecule, so that caution should be exercised in drawing conclusions. Introduction of a bromine substituent (65, Y = Br) into (66) produced a threefold increase in potency, whereas the same structural modification of the olefin (67) resulted in a threefold decrease in potency. Branching the linker chain with a methyl group adjacent to the quinazolinone ring (68) resulted in a 350-fold decrease in affinity. However, chain branching with a methyl group in the alternate position on the ethylene chain produced compound (69), whose receptor affinity was of the same order of magnitude as the extremely potent lead compound (66). The exponential difference in receptor-binding ability exhibited by the two isomeric branched-chain linker compounds (68) and (69) was ascribed to unfavorable steric interactions between the receptor and the linker methyl group of (68) (44). This conclusion may be compromised by the fact that both (68) and (69) were evaluated as their racemates.
(66) linker = -CH2-CH2-, Y = H
(67) linker = -CH=CH- (E-olefin), Y = H
(68) linker = ethylene chain with CH3 adjacent to the quinazolinone ring, Y = H
(69) linker = ethylene chain with CH3 in the alternate position, Y = H
A study (45) of 2-(phosphonomethoxy)ethylguanidines (70-73) as antiviral (herpes and
ished toxicity 16-fold compared to that of the nonmethylated system (70).
In contrast, (R)-(71), the 2'-methyl congener, exhibited only a fivefold decrease in antiviral potency compared to that of compound (70), but it also exhibited a 30-fold lessening of toxicity, to produce a substantial increase in therapeutic index over that of (70). The (S)-(71) enantiomer was somewhat less potent than its (R)-enantiomer. The gem-dimethyl congener (73) was also somewhat less potent than the (R)-2'-monomethyl compound (71) and it was markedly more toxic. The (S)-2'-methyl stereoisomer of (71) exhibited a decidedly lower therapeutic index than that of its (R)-enantiomer.
Closely related to alteration of chain length and/or chain branching is alteration of ring size. Compound (74) showed nano-
loss of pressor effect, but the drug, like amphetamine, has been used as a nasal decongestant, and it has CNS-mediated anorexigenic effect (52, 53). It is said to have somewhat less central stimulant action than the corresponding aromatic ring derivatives (54a-d).
The benzene (88) and cyclohexane (89) congeners have almost identical effects in blocking bronchoconstriction produced by histamine, serotonin, or acetylcholine in the guinea pig in vivo (55). They also showed identical LD50 values in mice. The stereochemistry of these compounds was not addressed.
5 ALTERATION OF STEREOCHEMISTRY AND DESIGN OF STEREOISOMERS AND GEOMETRIC ISOMERS
The earlier, almost universally accepted belief that if one enantiomer of a chiral molecule demonstrates pharmacological activity, the other enantiomer will be pharmacologically inert, is not valid. It must be anticipated that all stereoisomers of an organic molecule will exhibit pharmacological effects, frequently widely different and unpredictable. Many examples of qualitative and quantitative differences in metabolism of enantiomers are documented (57).
(±)-3-(3-Hydroxyphenyl)-N-n-propylpiperidine (3-PPP, 92) was originally described (58) as having highly selective activity at dopaminergic autoreceptors.
At high doses (R)-(92) selectively stimulated presynaptic dopaminergic receptor sites, whereas at lower doses it selectively stimulated postsynaptic receptor sites (59). In contrast, the (S)-enantiomer stimulated presynaptic dopamine receptors and at the same dose level, it blocked postsynaptic dopamine recep-
exploited to achieve a similar kind of open-chain analogy to the steroid ring system as in diethylstilbestrol, and a high level of estrogenic activity results.
Hexestrol (107), the saturated congener of
noid X-receptor ligand and it is inactive at the retinoic acid receptor, whereas the (R,R)-enantiomer is an extremely weak agonist at the retinoid X-receptor, although it has some effect at the retinoic acid receptor. Thus, the molecular modifications shown in (109) result in selectivity of action at these two receptors.
etal muscle blockade. This fundamental mechanistic difference is probably attributed, at least in part, to the flexibility of the decamethonium molecule compared with that of d-tubocurarine. There is a considerable difference in the spectrum and severity of side effects and in the technique of employment of these two drugs in clinical practice. In all types of analog design, changes in chemical structure may result in unanticipated changes in
Contents
1 Introduction
  1.1 Enzyme Inhibitors in Medicine
  1.2 Enzyme Inhibitors in Basic Research
2 Rational Design of Noncovalently Binding Enzyme Inhibitors
  2.1 Forces Involved in Forming the Enzyme-Inhibitor Complex
    2.1.1 Electrostatic Forces
    2.1.2 van der Waals Forces
    2.1.3 Hydrophobic Interactions
    2.1.4 Hydrogen Bonds
    2.1.5 Cation-π Bonding
  2.2 Steady-State Enzyme Kinetics
    2.2.1 The Michaelis-Menten Equation
    2.2.2 Treatment of Kinetic Data
  2.3 Rapid, Reversible Inhibitors
    2.3.1 Types of Rapid, Reversible Inhibitors
      2.3.1.1 Competitive Inhibitors
      2.3.1.2 Uncompetitive Inhibitors
      2.3.1.3 Noncompetitive Inhibitors
    2.3.2 Dixon Plots
    2.3.3 IC50 Values
    2.3.4 Examples of Rapid Reversible Inhibitors
  2.4 Slow-, Tight-, and Slow-Tight-Binding Inhibitors
    2.4.1 Slow-Binding Inhibitors
  2.5 Inhibitors Classified on the Basis of Structure/Mechanism
    2.5.1 Ground-State Analogs
    2.5.2 Multisubstrate Analogs
    2.5.3 Transition-State Analogs
3 Rational Design of Covalently Binding Enzyme Inhibitors
Burger's Medicinal Chemistry and Drug Discovery, Sixth Edition, Volume 1: Drug Discovery
Edited by Donald J. Abraham
ISBN 0-471-27090-3 © 2003 John Wiley & Sons, Inc.
Approaches to the Rational Design of Enzyme Inhibitors
Table 17.1 Examples of Enzyme Inhibitors Used in the Treatment of Bacterial, Fungal, Viral, and Parasitic Diseases
Clinical Use    Enzyme Inhibited                           Inhibitor
Antibacterial   Dihydropteroate synthetase                 Sulphonamides
Antibacterial   Dihydrofolate reductase                    Trimethoprim, methotrexate
Antibacterial   Alanine racemase                           D-Cycloserine
Antibacterial   Transpeptidase                             Penicillins, cephalosporins
Antifungal      Fungal sterol 14α-demethylase              Clotrimazole, ketoconazole
Antifungal      Fungal squalene epoxidase                  Terbinafine, naftifine
Antiviral       Thymidine kinase and thymidylate kinase    Idoxuridine
Antiviral       DNA, RNA polymerases                       Cytosine arabinoside (Ara-C)
Antiviral       Viral DNA polymerase                       Acyclovir, vidarabine
Antiviral       HIV reverse transcriptase                  Dideoxyinosine, zidovudine
Antiviral       HIV protease                               Saquinavir
Antiviral       Influenza virus neuraminidase              Zanamivir, oseltamivir
Antiprotozoal   Pyruvate dehydrogenase                     Organoarsenical agents
Antiprotozoal   Ornithine decarboxylase                    α-Difluoromethylornithine
velopment of the sulfa drugs (sulfonamides), enzyme inhibitors have played a vital role in controlling these infectious agents. Table 17.1 provides a list of enzyme inhibitors that have been used in the treatment of the various diseases caused by these agents. All these compounds needed to satisfy the usual requirements for specificity and low toxicity.
This can be achieved in a variety of ways. For instance, it is possible to inhibit an essential pathway in the pathogen that does not exist in the host. D-Cycloserine (1) (Fig. 17.1), for example, inhibits alanine racemase, an enzyme involved in bacterial cell wall biosynthesis and not found in humans (8, 9). D-Cycloserine is active against a broad spectrum of both gram-positive and gram-negative bacteria (10), but plays its major role in the treatment of tuberculosis (11). Conversely, even if both host and pathogen contain the same enzymes, it may be possible to exploit subtle structural differences between the isozymes to obtain a highly specific inhibitor that preferentially binds to the invader's version. Trimethoprim (2) shows this selective toxicity. An inhibitor of dihydrofolate reductase, trimethoprim is a potent antibacterial agent because the bacterial enzyme is inhibited at a concentration several thousand times lower than that required for inhibition of the mammalian isozyme (12). Acyclovir (3a), an antiviral drug used for the treatment of herpes infections (13, 14), also fits into this category. It binds very tightly to the Herpes simplex DNA polymerase with an estimated half-life of about 40 days. Acyclovir is a prodrug because it requires transformation by a viral thymidine kinase and cellular phosphotransferases to the corresponding triphosphate (3b) to serve in vivo as an inhibitor of the viral DNA polymerase (15).
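As a rough illustration of what a 40-day half-life implies, the sketch below converts a half-life into a first-order dissociation rate constant. It assumes that the quoted half-life refers to simple first-order decay of the enzyme-inhibitor complex, which the text does not state explicitly; the function names are illustrative only.

```python
import math

SECONDS_PER_DAY = 86_400


def dissociation_rate_constant(half_life_s: float) -> float:
    """Return the first-order rate constant (s^-1) for a given half-life.

    Assumes simple first-order decay of the complex: k = ln 2 / t_half.
    """
    if half_life_s <= 0:
        raise ValueError("half-life must be positive")
    return math.log(2) / half_life_s


def fraction_remaining(half_life_s: float, elapsed_s: float) -> float:
    """Fraction of complex still intact after elapsed_s seconds."""
    return 0.5 ** (elapsed_s / half_life_s)


if __name__ == "__main__":
    t_half = 40 * SECONDS_PER_DAY  # the "about 40 days" quoted in the text
    k_off = dissociation_rate_constant(t_half)
    print(f"k_off is roughly {k_off:.2e} per second")
    print(f"fraction intact after 1 day: {fraction_remaining(t_half, SECONDS_PER_DAY):.3f}")
```

Under this assumption the complex loses less than 2% of its population per day, which is one way to read "binds very tightly".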
Table 17.3 Examples of Enzyme Inhibitors Used in Various Human Disease States
Clinical Use        Enzyme Inhibited                         Inhibitor
Epilepsy            GABA transaminase                        γ-Vinyl GABA
Epilepsy            Carbonic anhydrase                       Sulthiame
Epilepsy            Succinic semialdehyde dehydrogenase      Sodium valproate
Antidepressant      Monoamine oxidase (MAO)                  Tranylcypromine, phenelzine
Antihypertensive    Angiotensin converting enzyme            Captopril, enalaprilat
Cardiac disorders   Na+,K+-ATPase                            Cardiac glycosides
Gout                Xanthine oxidase                         Allopurinol
Ulcer               H+,K+-ATPase                             Omeprazole
Hyperlipidemia      HMG-CoA reductase                        Atorvastatin, simvastatin
Anti-inflammatory   Prostaglandin synthase,                  Aspirin, naproxen, ibuprofen
                    Cyclooxygenase (COX) I and II
Arthritis           Cyclooxygenase (COX) II                  Celecoxib
Glaucoma            Acetylcholinesterase                     Neostigmine
Glaucoma            Carbonic anhydrase II                    Acetazolamide, dichlorphenamide
[Structures; legend: (3a) R = H, (3b) R = PPP]
Although their inhibitors are not specifically therapeutic agents in themselves, the β-lactamases are another important target for drug design. These are bacterial enzymes and, as with the alanine racemases, are not found in humans. Inhibitors of β-lactamases include clavulanic acid (4) (16-20) and sulbactam (penicillanic acid sulfone) (5) (18, 21-24). These two compounds act to prevent the bacterial degradation of penicillins and cephalosporins by β-lactamases, thereby extending their lifetime and effectiveness. Accordingly, both clavulanic acid (4) and sulbactam (5) have reached the market as drugs that act synergistically with these commonly prescribed antibacterial agents.
Even though it has proved possible to selectively inhibit the enzymes of a number of pathogens, the enzymes of cancer cells have proved to be a far more elusive target. Indeed, the majority of the currently employed antitumor agents can be described as antiproliferative agents. These take advantage of the fact that many, but not all, tumor cells grow and divide more rapidly than normal cells. Lymphomas, for example, proliferate more rapidly than solid tumors, whereas, conversely, acute leukemia cells divide more slowly than the surrounding bone marrow cells. Most of the enzyme inhibitors used as these antiproliferative agents (Table 17.2) can also be described as antimetabolites (i.e., they inhibit a metabolic pathway), often those involved in DNA biosynthesis, which are important for cell survival or replication. 5-Fluorouracil (6), the prodrug form of an inactivator of thymidylate synthase (25), and methotrexate (7), an inhibitor of dihydrofolate reductase (26, 27), both fit into this category. Unfortunately, rapidly dividing normal cells, such as hair follicles, the cells lining the gastrointestinal tract, and the bone marrow cells involved in the immune system are also significantly affected. The resultant hair loss, nausea, and susceptibility to infection mean that this type of chemotherapy is seldom employed as a first-line defense against cancer.
The inhibition of enzymes involved in metabolic pathways is not restricted to anticancer agents. A variety of diseases have been correlated with either the dysfunction of an enzyme or an imbalance of metabolites. A cross section of the disease states treated with enzyme inhibitors is shown in Table 17.3. Practically, these may be treated by the inhibition of an individual enzyme or by using enzyme inhibitors to regulate the metabolite concentration in the body. For example, an imbalance of the two neurotransmitters, glutamate and γ-aminobutyric acid, is responsible for the convulsions observed in epileptic seizure. The latter is metabolized by γ-aminobutyric acid aminotransferase (GABA-T) and, consequently, inhibitors of this enzyme offered themselves as potential antiepileptic candidates. This led to the development of the GABA-T inhibitor, vigabatrin (8) (28), which clinically results in an increase of the brain concentration of γ-aminobutyric acid and cessation of epileptic convulsions. As with the anticancer agents, blockade of a metabolic pathway may also have therapeutic benefits. The statins, a group of serum cholesterol-lowering drugs, are inhibitors of hydroxymethylglutaryl-CoA (HMG-CoA) reductase (29). HMG-CoA reductase catalyzes the irreversible conversion of HMG-CoA to mevalonic acid, the rate-determining step in cholesterol biosynthesis (30-32). Inhibitors such as simvastatin (9) have been found to be effective in the treatment of hyperlipidemia and familial hypercholesteremia (33, 34) and have become some of the world's best-selling drugs.
Finally, enzyme inhibitors can also be used to induce an animal model of a genetic disease. Inactivation of γ-cystathionase by propargylglycine, for example, produces an experimental model of the disease state known as cystathioninuria (35). Deficiency of this enzyme leads to the accumulation of cystathionine in the urine and has sometimes been associated with mental retardation (36).
1.2 Enzyme Inhibitors in Basic Research
In basic research enzyme inhibitors have found a multitude of uses. They serve as useful tools for the elucidation of structure and function of enzymes, as probes for chemical and kinetic processes, and in the detection of short-lived reaction intermediates (37). Product inhibition patterns provide information about an enzyme's kinetic mechanism and the order of substrate binding (38). Covalently binding enzyme inhibitors have been used to identify active-site amino acid residues that could potentially be involved in substrate binding and catalysis of the enzyme (39, 40). Reversible enzyme inhibitors are routinely used to facilitate enzyme purification by using the inhibitor as a ligand for affinity chromatography (41, 42) or as eluants in affinity-elution chromatography (43). Immobilized enzyme inhibitors can also be used to identify their intracellular targets (44), whereas irreversible inhibitors can be used to localize and quantify enzymes in vivo (45).
In Table 17.4 we have provided the classification of the various types of enzyme inhibitors that we employ in this chapter. The classification may appear somewhat arbitrary, in that some inhibitors may fit into more than one category. This can arise because these categories are attempting to bring together some nonrelated properties such as structure, mechanism of action, and kinetic behavior. Thus, what we have classed as a reversible inhibitor may, simply because it has a slow dissociation rate, be described elsewhere in the literature as being irreversible. In each instance we will discuss approaches to the design of that type of inhibitor, as well as indicating how it may be evaluated. The discussion will be accompanied by references to recent, representative examples from the literature. Where appropriate, these examples will be of inhibitors of therapeutic interest.
It should be noted that we will concentrate on inhibitors directed at the active site of the enzyme. While recognizing that there are inhibitors that bind to regions other than the active site, such as allosteric effectors, these are not the focus of this chapter and will not be included. There are many reviews of enzyme inhibitors available in the literature (37, 46-48) and the reader is referred to them for more detailed analysis.
2 RATIONAL DESIGN OF NONCOVALENTLY BINDING ENZYME INHIBITORS
As their name indicates, this class of inhibitors binds to the enzyme's active site without forming a covalent bond. Therefore the affinity and specificity of the inhibitor for the active site will depend on a combination of the electrostatic and dispersive forces, and hydrophobic and hydrogen-bonding interactions. Traditionally, noncovalently binding enzyme inhibitors were analogs of substrates, products, or reaction intermediates. More recently, an explosion in the use of combinatorial chemistry and rapid screening techniques has seen the development of large numbers of enzyme inhibitors that bear little or no resemblance to the substrate or products, yet still bind selectively to their target enzyme. Computer-aided drug design, in the broadest sense, encompasses both structure-based drug design and quantitative structure-activity relationship (QSAR) methods. A complement to the rapid screening techniques, computer-aided methods provide a more focused approach to the
design and discovery of both substrate and nonsubstrate analog inhibitors.
In structure-based design, the structure of a drug target interacting with small molecules is used to guide drug discovery. Consequently, either the three-dimensional enzyme structure or, at a minimum, the pharmacophore structure must be known. A pharmacophore represents the nature of the chemical groups of a given ligand and their relative orientation important for inhibitor binding. Today, structure-based design, used in conjunction with docking techniques, combinatorial chemistry, and rapid screening not only leads more quickly to novel enzyme inhibitors but also greatly reduces the number of compounds that must be synthesized. More information on these approaches may be found in Chapter 10 and some recent monographs (49-52).
Traditionally, an increase in inhibitory or biological activity was achieved by synthesizing an analog of the substrate and then making gradual empirical changes in the structure by adding or removing functional groups. QSAR methods provide a means of making this empirical testing more focused. In this technique there is no need to know the structure of the active site. Instead, computer algorithms are employed to correlate the biological activity of a series of inhibitors with their chemical structure, thereby allowing better predictions as to how to change the structure to obtain a more potent inhibitor. This topic is discussed further in Chapter 1, and detailed reviews are also available (53-56).
Table 17.4 shows the classification of noncovalent inhibitors we use in this chapter. Based on their kinetics it is possible to distinguish among rapid reversible, tight-binding, slow-binding, slow-tight-binding, irreversible, and pseudoirreversible inhibitors. Conversely, inhibitors classified on the basis of structure, such as ground-state analogs, multisubstrate inhibitors, and transition-state analogs, which mimic the structures of substrates and products, reaction intermediates, and transition states, may fall into any of the kinetic categories. However, before introducing these categories, it is important to have an understanding of the forces involved in the binding of substrates and inhibitors to an enzyme's active site.
2.1 Forces Involved in Forming the Enzyme-Inhibitor Complex
To understand the design concepts of the various types of noncovalently binding enzyme inhibitors, a basic knowledge of the binding forces between an enzyme's active site and its inhibitors is required. The forces involved in a substrate or an inhibitor binding to an enzyme's active site are, as with a drug binding to a receptor, the same forces that are experienced by all interacting organic molecules. These include ionic (electrostatic) interactions, ion-dipole and dipole-dipole interactions, hydrogen bonding, hydrophobic interactions, and van der Waals interactions. A brief overview of the forces involved follows. More comprehensive treatments can be found in Chapter 4 and elsewhere (57-60).
The binding of an inhibitor is dependent on a variety of interactions, and it is the sum of these interactions that will determine the degree of affinity of an inhibitor for the particular enzyme. The reversible binding of an inhibitor to an enzyme's active site can be described as shown in Equation 17.1.
$$E + I \rightleftharpoons E \cdot I \qquad (17.1)$$
There is an equilibrium between the free enzyme (E), inhibitor (I), and the enzyme-inhibitor complex (E·I). The affinity of an inhibitor for the enzyme is measured by the inhibition constant Ki, which is the dissociation constant of the enzyme-inhibitor complex, at equilibrium (Equation 17.2).
$$K_i = \frac{[E][I]}{[E \cdot I]} \qquad (17.2)$$
The lower the Ki value, the better the inhibitor, given that the equilibrium lies more in favor of enzyme-inhibitor complex formation. The affinity of an inhibitor for an enzyme may be related to the standard free energy (ΔG°) of a system by Equation 17.3.
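The relationship between Ki and free energy described above can be sketched numerically. The example below assumes the standard thermodynamic relation ΔG° = RT ln Ki for a dissociation constant expressed relative to a 1 M standard state; the exact form of Equation 17.3 is not reproduced here, so this should be read as an illustration and not as the chapter's own equation. The function names and the 298.15 K default are likewise assumptions.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def inhibition_constant(e_free: float, i_free: float, ei_complex: float) -> float:
    """Ki as the dissociation constant of E.I: [E][I]/[E.I] (all in mol/L)."""
    if ei_complex <= 0:
        raise ValueError("complex concentration must be positive")
    return e_free * i_free / ei_complex


def binding_free_energy(ki_molar: float, temperature_k: float = 298.15) -> float:
    """Standard free energy of binding in kcal/mol, assuming dG = RT ln Ki.

    A smaller Ki gives a more negative (more favorable) value.
    """
    if ki_molar <= 0:
        raise ValueError("Ki must be positive")
    return R_KCAL * temperature_k * math.log(ki_molar)


if __name__ == "__main__":
    for ki in (1e-6, 1e-9):
        print(f"Ki = {ki:.0e} M -> dG = {binding_free_energy(ki):.1f} kcal/mol")
```

Under these assumptions each tenfold decrease in Ki corresponds to about 1.4 kcal/mol of additional binding free energy at room temperature, which gives a sense of scale for the individual interaction energies quoted later in the section.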
displacement of water. Indeed, there are no hydrophobic forces in the gas phase or in nonpolar solvents. However, collectively, hydrophobic forces are thought to transcend other types of forces, particularly in the folding of proteins, in all biological systems.
2.1.1 Electrostatic Forces. Although we recognize that, in essence, all forces between atoms and molecules are electrostatic, here we use the term to describe ion-ion, ion-dipole, and dipole-dipole interactions. At physiological pH, the side chains of basic residues such as lysine and arginine and, to a lesser extent, the imidazole ring of histidine will be protonated, whereas the acidic groups on the side chains of aspartic and glutamic acid residues will be deprotonated. In addition, the N-terminal amino groups and C-terminal carboxylates will be ionized. Therefore, in addition to atoms with permanent and induced dipoles, an enzyme potentially will have several charged groups available for binding to charged or polarized groups on a substrate or inhibitor. As described by Equation 17.5, the electrostatic force (F) between the charged atoms (q1 and q2) will depend on the distance between the charged groups (r) and the dielectric constant of the surrounding medium (D).
$$F = \frac{q_1 q_2}{D r^2} \qquad (17.5)$$
polar solvents. As discussed above, because of its high net permanent dipole moment, water is very polar and has a large dielectric constant. The high polarity of water greatly diminishes the attraction or repulsion forces between any two charged groups, giving rise to the leveling effect of water. It is somewhat difficult to predict the exact strength of a charge-charge interaction between an enzyme and an inhibitor. For example, the formation of a salt bridge (charge-charge) interaction between an enzyme (Enz) and an inhibitor (I) may be described by Equation 17.6.
$$\mathrm{Enz{-}NH_3^+(H_2O)}_n + \mathrm{I{-}CO_2^-(H_2O)}_m \rightleftharpoons \mathrm{Enz{-}NH_3^+ \cdots {}^-O_2C{-}I} + (n+m)\,\mathrm{H_2O} \qquad (17.6)$$
Both the charged species are initially solvated by water, and to form the salt bridge both ions must be desolvated. This comes at some enthalpic cost, but the freeing of water molecules leads to a concomitant, favorable increase in entropy. The strength of the ion pair will depend on the stability of the salt bridge vs. that of the individual solvated ions. If the salt bridge is buried in a relatively hydrophobic active site, it is less solvated and will be more favored than the same interaction in a solvent-exposed active site.
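The leveling effect of water on charge-charge interactions can be made concrete with a short calculation. The sketch below assumes that Equation 17.5 has the usual Coulomb form, written here in SI units as F = q1 q2 / (4π ε0 D r²). The dielectric constants used (about 80 for bulk water, about 4 for a buried, hydrophobic site) and the 3 Å separation are illustrative assumptions, not values taken from the chapter.

```python
import math

ELEMENTARY_CHARGE = 1.602176634e-19  # coulombs
EPSILON_0 = 8.8541878128e-12         # vacuum permittivity, F/m


def coulomb_force(q1: float, q2: float, r_m: float, dielectric: float) -> float:
    """Coulomb force in newtons between two point charges in a uniform medium.

    Negative values mean attraction (opposite charges); positive mean repulsion.
    """
    if r_m <= 0 or dielectric <= 0:
        raise ValueError("distance and dielectric constant must be positive")
    return q1 * q2 / (4 * math.pi * EPSILON_0 * dielectric * r_m ** 2)


if __name__ == "__main__":
    r = 3e-10  # 3 angstroms, an assumed salt-bridge separation
    for label, d in (("bulk water (D ~ 80)", 80.0), ("buried site (D ~ 4)", 4.0)):
        f = coulomb_force(ELEMENTARY_CHARGE, -ELEMENTARY_CHARGE, r, d)
        print(f"{label}: F = {f:.2e} N")
```

With these assumed values the same ion pair attracts twenty times more strongly in the low-dielectric environment, consistent with the statement that a buried salt bridge is favored over a solvent-exposed one. A uniform dielectric is a crude model of a protein interior, so the numbers are indicative only.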
pounds and water. The optimal distance between the atoms is the sum of each of their van der Waals radii, so these forces come into play only when there is good complementarity between enzyme and inhibitor. Although van der Waals forces are quite weak, usually around 0.5-1.0 kcal/mol for an individual atom-atom interaction, they are additive and can make an important contribution to inhibitor binding.
2.1.3 Hydrophobic Interactions. Hydrophobic interactions may be described as entropy-based forces. When a nonpolar compound is dissolved in water, the strong water-water interactions around the solute lead to an effective "ordering" of the structure of the solvent. This is entropically unfavorable; that is, there is negative entropy of dissolution. When a nonpolar inhibitor binds to a nonpolar region of an enzyme, all the ordered water molecules become less ordered as they associate with bulk solvent, leading to an increase in entropy. According to Equation 17.4 any increase in entropy will lead to a decrease in free energy and, through Equation 17.3, stabilization of the enzyme-inhibitor complex. It has been calculated that a single methylene-methylene interaction releases about 0.7 kcal/mol of free energy. Even though this figure is not high, given that enzymes and inhibitors usually have large regions of hydrophobic surface, this type of bonding may also play a significant role in inhibitor binding.
2.1.4 Hydrogen Bonds. A hydrogen bond occurs when a proton is shared between two electronegative atoms (i.e., -X-H···Y). Electron density is pulled from the hydrogen by X, giving the hydrogen a partial positive charge that is strongly attracted to the nonbonded electrons of Y. The bond is usually asymmetric, with one of the heteroatoms, the hydrogen bond donor, having a normal covalent bond distance to the proton. The other heteroatom, the hydrogen bond acceptor, is usually at a distance somewhat shorter than the van der Waals contact distance and, for optimal hydrogen bonding, the atoms should be arranged linearly. A hydrogen bond is a special type of dipole-dipole interaction and, as we have seen, although these forces can be quite significant in nonpolar solvents, water greatly diminishes their magnitude. The energy of the amide-amide N-H···O hydrogen bond is about 5 kcal/mol, and is typical for hydrogen bonds (60).
It should be remembered that, for a hydrogen bond to form between an enzyme and an inhibitor, any hydrogen bonds between the inhibitor and water, as well as those between the enzyme and water, must be broken (Equation 17.7).
Overall, the total number of hydrogen bonds remains constant and, provided that the hydrogen bonds between the inhibitor and enzyme are not significantly more favorable than those between water and the inhibitor or those between water and the enzyme, the net change in enthalpy is usually insignificant. On the other hand, formation of the enzyme-inhibitor complex usually leads to an overall increase of entropy because the inhibitor remains bound to the enzyme and the formerly bound water molecules are released.
2.1.5 Cation-π Bonding. Recently it has become apparent that there is another important noncovalent binding force that may be exploited when designing enzyme inhibitors. Cations, from simple ions such as Li+ to more complex organic molecules such as acetylcholine, are strongly attracted to the electron-rich (π) face of benzene and other aromatic compounds (61, 62). Cation-π bonds, as well as other amino-aromatic interactions, are common in structures in the protein data bank (63), and it has been estimated that more than 25% of tryptophan residues are involved in interactions of this type (64). The finding that the cationic group of acetylcholine was bound primarily by aromatic residues, most especially by a tryptophan residue, not by the expected carboxylate anion, provided evidence that cation-π interactions may play an important role in ligand binding (65, 66). Model systems suggest that, energetically, the cation-π interaction can compete with full aqueous solvation in binding cations (61), and there is now significant effort being expended in studying the contribution of these interac-
As can be seen from the following discussion, it is not difficult to carry out a kinetic analysis of a single-substrate reaction such as that described in Equation 17.8. However, as more substrates are added the task becomes more complex. Fortunately, kinetic analysis of enzymatic reactions involving two or more substrates can be made easier by varying the
tions to molecular recognition (62,661. concentration of only one substrate at a time.
In summary, the Ki provides an indication By keeping all but one of the substrates at
of the relative stability of the enzyme-inhibi- fixed, saturating concentrations, the reaction
tor complex compared to stability of the en-
rate will depend only on the concentration of
zyme and inhibitor free in solution. Moreover,
the varied substrate. This permits the use of
it is clear that entropy, enthalpy, and water
the kinetic analysis employed for enzyme-cat-
will all have a major impact on the binding of
an inhibitor to an enzyme. alyzed, single-substrate reactions even for
complex multisubstrate reactions. In a further
2.2 Steady-State Enzyme Kinetics
simplification, the dissociation of the E P .
complex is assumed not to be rate limiting,
Just as an appreciation of the forces involved and the reversion of product to substrate is
is essential to comprehending the binding of assumed to be negligible. The latter assump-
an inhibitor to an enzyme, so is an under- tion is valid under what are known as initial
standing of the kinetic analysis of an enzyrne- velocity conditions, that is, when less than
catalyzed reaction essential to any kinetic about 5%of substrate has been consumed. Un-
evaluation of an inhibitor. In this section we der these conditions, the concentration of P is
provide a brief introduction to the study of low, and Equation 17.8 simplifies to Equation
enzyme kinetics, particularly steady-state ki- 17.9.
netics. Regardless, the reader is advised to re-
fer to other sources for more in-depth reviews
of the kinetic equations and mathematical
derivations involved (38, 60, 67-71).
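The initial-velocity condition described above can be illustrated numerically. The sketch below assumes the standard Michaelis-Menten rate law, v = Vmax[S]/(KM + [S]), as the form the single-substrate scheme reduces to. The function names and all numerical values are illustrative assumptions and do not come from the text.

```python
def michaelis_menten_rate(s, vmax, km):
    """Initial velocity at substrate concentration s (same units as km)."""
    return vmax * s / (km + s)


def fraction_consumed(s0, vmax, km, t_end, steps=10_000):
    """Fraction of substrate used up by time t_end.

    Simple Euler integration of d[S]/dt = -v([S]); the reverse
    reaction (P -> S) is ignored, as under initial velocity conditions.
    """
    s = s0
    dt = t_end / steps
    for _ in range(steps):
        s -= michaelis_menten_rate(s, vmax, km) * dt
    return (s0 - s) / s0


if __name__ == "__main__":
    # Hypothetical values in arbitrary, mutually consistent units.
    vmax, km, s0 = 1.0, 50.0, 100.0
    v0 = michaelis_menten_rate(s0, vmax, km)
    used = fraction_consumed(s0, vmax, km, t_end=5.0)
    print(f"initial rate: {v0:.3f}")
    print(f"substrate consumed: {used:.2%} (initial velocity regime if below about 5%)")
```

Keeping the measurement window short enough that the consumed fraction stays below about 5% is what justifies treating [S] as constant and the product concentration as negligible.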
2.3.1 Types of Rapid, Reversible Inhibitors. Binding of these inhibitors follows simple [...]

Here, Ki, sometimes called the inhibition constant, is the equilibrium constant for the [...]

$$K_i = \frac{[E][I]}{[E \cdot I]} \qquad (17.17)$$
From Equation 17.21 it is clear that noncompetitive inhibitors have an effect only on Vmax, decreasing it by a factor of (1 + [I]/Ki), consequently giving the impression of reducing the total amount of enzyme present. As with an uncompetitive inhibitor, a portion of the enzyme will always be bound in the nonproductive enzyme-substrate-inhibitor complex E·S·I, causing a decrease in maximum [...]
[Figure residue: curve labels "+ noncompetitive inhibitor" and "+ irreversible inhibitor"; caption not recovered.]
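As a minimal numerical sketch of the statement that a noncompetitive inhibitor lowers only Vmax, the following assumes the standard rate law for pure noncompetitive inhibition, in which Vmax is divided by (1 + [I]/Ki) and KM is left unchanged. The function name and the example values are hypothetical.

```python
def noncompetitive_rate(s, i, vmax, km, ki):
    """Rate with a pure noncompetitive inhibitor at concentration i.

    Only the apparent Vmax changes; KM is untouched, so the enzyme
    behaves as if less of it were present.
    """
    vmax_app = vmax / (1.0 + i / ki)
    return vmax_app * s / (km + s)


if __name__ == "__main__":
    vmax, km, ki = 1.0, 5.0, 2.0          # arbitrary units, hypothetical
    for s in (1.0, 5.0, 50.0):
        v_free = noncompetitive_rate(s, 0.0, vmax, km, ki)
        v_inh = noncompetitive_rate(s, ki, vmax, km, ki)   # [I] = Ki
        print(f"[S]={s:5.1f}  v={v_free:.3f}  v(+I)={v_inh:.3f}  ratio={v_inh / v_free:.2f}")
```

With [I] equal to Ki the rate is halved at every substrate concentration, which is why raising [S] cannot overcome this type of inhibition.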
2.3.4 Examples of Rapid Reversible Inhibitors. Competitive inhibitors are often similar in structure to one of the substrates of the reaction they are inhibiting. Inhibitors of this type are sometimes called substrate analogs and their binding affinity (Ki) usually approximates that of the substrate. One of the first reactions inhibited by a substrate analog was that catalyzed by succinate dehydrogenase (Equation 17.24).

$$\mathrm{{}^{-}O_2C{-}CH_2{-}CH_2{-}CO_2^{-}}\ (\text{succinate}) \xrightarrow{\text{succinate dehydrogenase}} \text{fumarate} \qquad (17.24)$$

This reaction is competitively inhibited by malonate (-OOCCH2COO-) that has, like succinate, two carboxylate groups. It is therefore able to bind to the enzyme's active site but, with only one carbon atom between the carboxylates, further reaction is impossible.

Substrate analogs are rarely useful as enzyme inhibitors, given that large concentrations are required for inhibition, and their inhibition is readily overcome by any buildup of substrate. However, they are often useful probes for determining enzyme specificity and even mechanism. Phenylethanolamine N-methyltransferase (PNMT) catalyzes the terminal step in epinephrine (adrenaline) biosynthesis, the conversion of norepinephrine to epinephrine (Equation 17.25), with concomitant conversion of S-adenosyl-L-methionine (SAM, AdoMet) to S-adenosyl-L-homocysteine (SAH, AdoHcy).

$$\text{Norepinephrine} \xrightarrow{\text{PNMT}} \text{Epinephrine} \qquad (17.25)$$

S-Adenosyl-L-homocysteine (10) (Fig. 17.12), the product of the reaction, and 2-(2,5-dichlorophenyl)cyclopropylamine (11) are analogs of S-adenosyl-L-methionine and norepinephrine, respectively. Using these inhibitors it was possible to ascertain the binding order of the two substrates (75). Kinetic analyses showed that SAH was a competitive inhibitor of SAM and a noncompetitive inhibitor of norepinephrine, whereas (11) was a competitive inhibitor of norepinephrine and an uncompetitive inhibitor of SAM. This indicates that the binding of substrates is ordered, with SAM binding first. If norepinephrine bound first, it would be expected that SAH would be an uncompetitive inhibitor and (11) would be noncompetitive with respect to SAM. If a random [...]
this species binds with one hydroxyl, coordinating to one of the two requisite manganese ions. At pH 9.5 the tetrahedral species is the major form and this initially binds also with one hydroxyl coordinated to a manganese ion. Then, in a second, slower step, a water molecule that bridges the two active-site manganese ions is displaced by a second hydroxyl group on the boronic acid (83). Support for this mechanism is provided by crystal structures, showing both (15) and (16) are bound in the active site of arginase as tetrahedral species at alkaline pH (82). Of course, compound (12) is unable to form the tetrahedral species and is a competitive inhibitor at all times.

Leucine aminopeptidase (LAP) is a metalloenzyme that has been inhibited in a slow-binding manner. This exopeptidase catalyzes the hydrolysis of N-terminal amino acids, particularly those with a leucine at the N-terminus, although it does have a broad specificity (Equation 17.32).

[Equation 17.32: hydrolysis of an N-terminal amino acid catalyzed by leucine aminopeptidase; structures not recovered.]
Bestatin (17) (Fig. 17.15) and amastatin (18) have been identified as slow-tight-binding inhibitors of LAP from porcine kidney, with Ki values in the low nanomolar range (84). Later, bestatin was shown to be a slow-binding inhibitor of LAP employing mechanism B, with a Ki value of 0.11 μM and a Ki* value of 1.3 nM. Values of 1.5 × 10⁻[?] s⁻¹ and 2 × 10⁻[?] s⁻¹ were obtained for k[?] and k-[?] (Equation 17.29), respectively (85). It was assumed that the inhibition of bovine lens leucine aminopeptidase (blLAP) by amastatin would also proceed by mechanism B. This prediction was supported by an X-ray crystallography study of the amastatin-blLAP complex (86), which suggested that (18) (and, by analogy, 17) initially binds to a Zn2+ atom in a groove in the active site. The slow step in binding was seen as a subsequent coordination to a second Zn2+ atom located deeper in the active site (86).

It is difficult to find clear-cut examples of slow-binding inhibition occurring by mechanism A. However, the inhibition of Factor Xa by a peptidyl-α-ketothiazole was found to be unusual because it appeared that the formation of E·I was partially rate limiting. Factor Xa is a trypsinlike protease found in the blood coagulation pathway, which cleaves prothrombin forming thrombin that, in turn, promotes blood clotting (Equation 17.33). Inhibitors of Factor Xa activity offer potential as anticoagulants and several irreversible inhibitors of Factor Xa have been developed. One of the few tight-binding reversible inhibitors of Factor Xa is BnSO2-D-Arg-Gly-Arg-ketothiazole (19).

The inhibitor could be displaced from Factor Xa by substrates and, based on steady-state assumptions, the dissociation constant for (19) was found to be 14 pM (87). However, the reaction progress curves indicated a slow-binding process, probably by mechanism B. Stopped-flow fluorescence studies, combined with kinetic analysis, showed that the isomerization step (E·I → E·I*) is unusually fast and that the formation of E·I is, at least, partially rate limiting.

In some instances the type of inhibition has been found to be isozyme specific. For example, inducibly expressed isozymes (iNOS) and constitutively expressed isozymes (cNOS) of nitric oxide synthase (NOS) all catalyze the conversion of L-arginine to L-citrulline and nitric oxide (Equation 17.34).
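For the bestatin values quoted in this section (Ki = 0.11 μM, Ki* = 1.3 nM), a short calculation shows how strongly the slow isomerization step tightens binding. The sketch assumes the usual relation for mechanism B, Ki* = Ki · k_rev/(k_fwd + k_rev), where k_fwd and k_rev denote the forward and reverse rate constants of the E·I to E·I* isomerization. These two names are placeholders chosen for this example, not the symbols of Equation 17.29.

```python
def isomerization_ratio(ki, ki_star):
    """Return k_fwd / k_rev implied by Ki and Ki* under mechanism B.

    Rearranging Ki* = Ki * k_rev / (k_fwd + k_rev) gives
    Ki / Ki* = 1 + k_fwd / k_rev.
    """
    if ki_star <= 0 or ki < ki_star:
        raise ValueError("expected 0 < Ki* <= Ki")
    return ki / ki_star - 1.0


if __name__ == "__main__":
    ki = 0.11e-6       # M, initial E.I complex
    ki_star = 1.3e-9   # M, overall inhibition constant
    ratio = isomerization_ratio(ki, ki_star)
    print(f"Ki/Ki* = {ki / ki_star:.1f}")
    print(f"implied k_fwd/k_rev = {ratio:.1f}")
```

The isomerized complex is favored over the initial complex by roughly two orders of magnitude, which is the origin of the tight overall binding.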
[Equation 17.33: cleavage of a peptide substrate by Factor Xa; structures not recovered.]
[...] 15 nM (89).

Many more examples of these types of inhibitors can be found in the review by Morrison and Walsh (78).

[Equation 17.34: L-arginine → L-citrulline + nitric oxide, catalyzed by nitric oxide synthase; structures not recovered.]

Figure 17.16. Inhibitors of nitric oxide synthase: compounds (20) and (21).
through a transition state before products are formed. In addition, there are often some high-energy intermediates along the pathway. Knowledge and understanding of an enzyme's mechanism permits the identification of the high-energy intermediates and the prediction of the structures of the transition states. Armed with that knowledge, it is possible to design enzyme inhibitors based on the structures of the various intermediates along the reaction pathway. Inhibitors designed in this manner are occasionally referred to as mechanism-based inhibitors. However, for purposes of this chapter, we will reserve that term for the covalently binding inhibitors described in Section 3.

2.5.1 Ground-State Analogs. The ground state of an enzymatic reaction consists of the substrates and the products. Compounds that mimic the substrate of an enzymatic reaction have been examined earlier (Section 2.3) and are not discussed again here. There are many examples of enzymatic reactions that are inhibited by some or all of the reaction products. Both epinephrine and S-adenosyl-L-homocysteine, for example, are inhibitors of phenylethanolamine N-methyltransferase (Equation 17.25). In much the same way as described earlier for substrate analogs, product analogs can also be used to obtain information about the binding mechanism of enzymes (90).

Phosphonoformate (22) (Fig. 17.17) is an antiviral agent that is used clinically in the treatment of herpes simplex virus (HSV) and human cytomegalovirus (HCMV) (91). It acts as a product analog, blocking the pyrophosphate-binding site, in the reaction catalyzed by DNA polymerase (Equation 17.35). It is also effective, using the same mechanism, against HIV reverse transcriptase (91).

Figure 17.17. Pyrophosphate analogs used to inhibit DNA polymerase: pyrophosphate (PPi), phosphonoformate (22), and phosphonoacetate (23).

[Equation 17.35: DNA polymerase-catalyzed incorporation of dGTP at the primer terminus with release of PPi; structures not recovered.]

DNA polymerase catalyzes the transfer of a complementary deoxynucleoside monophosphate moiety from its triphosphate (dNTP) to the 3' hydroxyl of the primer terminus, with subsequent release of pyrophosphate (PPi, Equation 17.35). Initially, phosphonoformate (22) and phosphonoacetate (23) were identified as inhibitors of HSV DNA synthesis (92). Detailed kinetic studies (93), using DNA polymerase induced by avian herpes viruses, showed that phosphonoacetate (23) was a noncompetitive inhibitor of the four dNTPs. At low levels of dNTPs it was a noncompetitive inhibitor of the substrate DNA, becoming uncompetitive at saturating dNTP levels. It was also found that (23) was a competitive inhibitor of pyrophosphate, with a Ki value in the low micromolar range, in the dNTP-PPi exchange reaction catalyzed by a turkey virus DNA polymerase (93). The inhibition patterns were identical to those observed using pyrophosphate as an inhibitor. Therefore it was concluded that (23) acted as an analog of pyrophosphate and competed for the same binding site (93). Later, both (22) and (23) were confirmed as acting as pyrophosphate (i.e., product) analog inhibitors of isolated HSV DNA polymerase (94).
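The inhibition patterns used in this section (competitive, uncompetitive, noncompetitive) are distinguished by how the apparent KM and Vmax change when inhibitor is added. The helper below is a rough sketch under the textbook definitions for ideal, pure inhibition types. The function name, the tolerance, and the example numbers are hypothetical, and real data usually call for proper curve fitting rather than a threshold test.

```python
import math


def classify_inhibition(km, vmax, km_app, vmax_app, rel_tol=0.05):
    """Classify an ideal inhibition pattern from apparent kinetic constants.

    km, vmax         -- values measured without inhibitor
    km_app, vmax_app -- values measured with inhibitor present
    """
    km_same = math.isclose(km_app, km, rel_tol=rel_tol)
    vmax_same = math.isclose(vmax_app, vmax, rel_tol=rel_tol)
    if km_same and vmax_same:
        return "no inhibition"
    if vmax_same and km_app > km:
        return "competitive"          # KM rises, Vmax unchanged
    if km_same and vmax_app < vmax:
        return "noncompetitive"       # Vmax falls, KM unchanged
    # Uncompetitive: KM and Vmax fall by the same factor, so KM/Vmax is unchanged.
    if km_app < km and vmax_app < vmax and math.isclose(
            km_app / vmax_app, km / vmax, rel_tol=rel_tol):
        return "uncompetitive"
    return "mixed or undetermined"


if __name__ == "__main__":
    print(classify_inhibition(10.0, 1.0, 20.0, 1.0))
    print(classify_inhibition(10.0, 1.0, 10.0, 0.5))
    print(classify_inhibition(10.0, 1.0, 5.0, 0.5))
```

Running such a classification once per varied substrate gives the pattern table from which binding order is inferred, as in the PNMT and DNA polymerase examples.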
2.5.2 Multisubstrate Analogs. A large number of enzymatic reactions involve the simultaneous binding of two or more substrates at the active site. The bound substrates must be in close proximity to each other and positioned in such a way as to facilitate covalent bond formation or the transfer of a functional group from one substrate to another. Multisubstrate analog inhibitors mimic the simultaneous binding of two or more substrates at the active site of the enzyme. The advantage of this, for a bireactant system, is shown in Equation 17.36.

There are two ways the two substrates, A and B, may bind to the enzyme to form an E·A·B complex. First, and most likely, they bind individually (in either a random or an ordered fashion) with dissociation constants of KA and KB. Second, the substrates may come together, positioned in such a way as to facilitate their subsequent reaction with a dissociation constant of KBi. This reactive complex A·B then binds to the enzyme with a dissociation constant of KMS. In general, the formation of A·B is entropically unfavorable. However, a bisubstrate analog, designed to mimic A·B, can often be prepared by covalently connecting the corresponding substrates or substrate analogs with a suitable linker group. Linking the two groups effectively overcomes the unfavorable entropic barrier. It has been calculated that an ideal bisubstrate analog inhibitor can bind up to 10⁸ times more tightly than the product of the substrate-binding constants (i.e., 1/KBi may be as high as 10⁸ M). This figure is based on entropic considerations and also assumes a perfect fit of the bisubstrate analog inhibitor to the two binding sites on the enzyme (57).

Where does this high affinity come from? A multisubstrate analog inhibitor will bind more tightly than a substrate analog inhibitor because it has (1) the entropic advantage of reduced molecularity and (2) an additive binding contribution from each of the substrates it mimics. For example, when two single-substrate analog inhibitors bind separately, but next to each other, two sets of translational and rotational entropies are lost. However, when a bisubstrate analog inhibitor binds it loses only a single set of translational and rotational entropies (57, 60). Further, let us assume that the bisubstrate analog binds to the same two sites as two single-substrate analog inhibitors. In that case there will be a gain in entropy from the release of water molecules from each substrate-binding site, as well as the favorable enthalpic contributions from the formation of hydrogen bonds, buried salt bridges, and so forth in each site. These favorable free-energy contributions will be the same for a bisubstrate analog as for the two individual inhibitors binding simultaneously. On the other hand, compared to the binding of a single-substrate analog, the multisubstrate analog inhibitor gains favorable binding enthalpies and entropies from the additional binding site(s), while still losing only one set of translational and rotational entropies. Thus the binding of a multisubstrate analog should be very tight, without needing any assistance from transition-state complementarity.

Inhibitors that combine two substrates are termed bisubstrate analogs, whereas those combining three substrates are termed trisubstrate analogs and so on, with the former being the most common. The design of a bisubstrate analog inhibitor ordinarily requires the development of two single-substrate analog inhibitors of reasonable affinity. The two single-substrate inhibitors are then connected by an appropriate linker, and the optimal length of the linker is determined experimentally. Under normal circumstances, the Ki value for a bisubstrate analog inhibitor can be expected to approximate the product of the Ki values of the two substrate analogs. A guide to a reasonably achievable Ki for a bisubstrate analog also may be obtained from the product of the KM values of the individual substrates. For example, if two substrates of an enzymatic reaction have binding constants in the millimolar range, a bisubstrate analog would be expected
to have a Ki value in the micromolar range. Note also that, if the enzyme binds substrates in a random manner, then a multisubstrate inhibitor should exhibit competitive inhibition patterns with each substrate it mimics because the binding of the inhibitor should be mutually exclusive with that substrate. If the enzyme employs an ordered mechanism, then the inhibitor should be competitive with the first substrate to bind and uncompetitive with other substrates.

The multisubstrate analog approach to enzyme inhibition has the additional advantage in that it provides a high degree of specificity. The combination of two or more substrates will usually produce a unique structure, unlikely to bind to other enzymes that may utilize any one of the substrates. This approach has even been used to design isozyme-specific inhibitors (95). It should also be noted that the distinction between a transition-state analog (Section 2.5.3) and a multisubstrate analog inhibitor is often quite arbitrary. In fact many inhibitors described as transition-state analogs are often actually analogs of high-energy reaction intermediates that, in turn, may have structures somewhat akin to those of multisubstrate analog inhibitors. However, multisubstrate analog inhibitors are intended to mimic the combined substrates in their ground-state forms and do not require any contribution from transition-state stabilization. Several general reviews on multisubstrate analog inhibitors have appeared (96-98), and multisubstrate analogs also receive some discussion in reviews on transition-state analogs (99-101).

Glycinamide ribonucleotide transformylase (GAR TFase) catalyzes the transfer of a formyl group from N10-formyltetrahydrofolate to glycinamide ribonucleotide (Equation 17.37). This is a crucial step in de novo purine biosynthesis, which is essential for cell division, and GAR TFase has become a target enzyme for the development of antineoplastic agents.

[Equation 17.37: GAR TFase-catalyzed formyl transfer from N10-formyltetrahydrofolate to glycinamide ribonucleotide, releasing tetrahydrofolate; structures not recovered.]

Inglese et al. (102) were able to synthesize the bisubstrate inhibitor β-thioGAR-dideazafolate (β-TGDDF) (24) (Fig. 17.18). This compound combines nearly all the features of both substrates, linked by a stable thioether bridge, and was found to inhibit GAR TFase with a Ki value of 250 pM (102). β-TGDDF acted as a slow, tight-binding inhibitor (Section 2.4) and the Ki value was about three times lower than the product of the KM values of the substrates. More recently, the crystal structure of the complex between BW1476U89 (25) and GAR TFase was obtained (103). BW1476U89 is another multisubstrate analog and has a Ki value of about 100 pM (104). The structure confirms that the inhibitor binds in those sites identified previously as substrate-
binding sites, and provides a starting point for development of even more potent transition-state analogs.

The condensation of carbamyl phosphate and L-aspartate, catalyzed by aspartate transcarbamoylase (ATCase), produces N-carbamyl-L-aspartate (Equation 17.38). This is one of the early steps in de novo pyrimidine biosynthesis, also a requirement for cell division, making ATCase also a target for potential anticancer agents.

$$\text{carbamyl phosphate} + \text{L-aspartate} \xrightarrow{\text{aspartate transcarbamylase}} \text{N-carbamyl-L-aspartate} + \mathrm{P_i} \qquad (17.38)$$

N-Phosphonoacetyl-L-aspartate (PALA) (26) (Fig. 17.19) was initially designed as a transition-state analog inhibitor of ATCase (105). It was found to have a Ki value of 27 nM, a value that is considerably lower than the KM values of 27 μM and 17 mM for carbamyl phosphate and L-aspartate, respectively (105). PALA was found to inhibit cell growth in vivo (106) and, eventually, underwent clinical trials as an anticancer agent (107).

Figure 17.19. Putative transition state, substrate, and inhibitors of aspartate transcarbamylase.

PALA provides an example of the difficulties in distinguishing between a multisubstrate analog and a transition-state analog. As shown in Fig. 17.19, in effect PALA (26) combines two fragments, an analog of carbamyl phosphate (27) and succinate (28). The tight binding of PALA also suggested it was a potential transition-state analog. However, succinate has a Ki value of 90 μM, and the product of the Ki values of succinate and carbamyl phosphate is 24 nM, which is almost identical to the Ki value of PALA (105). As shown in Fig. 17.19, the transition-state structure (29) for the ATCase-catalyzed reaction is tetrahedral. The pyrophosphate analog (30) was expected to provide a much better mimic of the transition state, yet its Ki value of 0.24 μM was tenfold higher than that of PALA (108). It is not clear why there is this discrepancy, but a recent X-ray structure of the ATCase-PALA complex identified several groups that are positioned to bind to a tetrahedral transition state (109). Two of these, the side chain of Gln137 and the backbone carbonyl of Pro266, were positioned to interact with the amino group of the putative transition state (29). However, these groups would not be expected to interact so well with the analogous oxygen atom of the pyrophosphate transition-state [...]

$$\text{HMG-CoA} + 2\,\text{NADPH} + 2\,\mathrm{H^+} \xrightarrow{\text{HMG-CoA reductase}} \text{mevalonic acid} + \text{CoASH} + 2\,\mathrm{NADP^+} \qquad (17.39)$$

Several statin inhibitors of HMG-CoA reductase are shown in Fig. 17.20. They consist of rigid, hydrophobic groups connected to an HMG-like group that, in inhibitors such as mevastatin (compactin) (31), simvastatin (9) (Fig. 17.11), and the dichlorophenol derivative (32) is present in the form of a lactone. In vivo, the lactone is converted to the free acid, as shown in Fig. 17.20 for mevastatin (33). More recently developed statins, such as fluvastatin (34) and atorvastatin (35), are prepared as the free acids. These inhibitors have Ki values in the low nanomolar range (110), significantly lower than the KM value of the substrate HMG-CoA, which is in the micromolar range (110, 111). Given that these inhibitors did not appear to be transition-state analogs, Nakamura and Abeles (112, 113) conducted a number of experiments to determine the basis of the enhanced affinity of, in particular, (31) and (32).

Both mevastatin (31) and (32) were found to bind to the hydroxymethylglutarate portion
of the active site, but not the NADPH region, whereas only (31) bound to the coenzyme A portion. D,L-Mevalonate and D,L-3,5-hydroxyvalerate, used as analogs of the upper portion of the statins, were both poor inhibitors, with Ki values in the millimolar range; however, analogs of the hydrophobic decalin region of mevastatin showed no inhibitory effect (112). Given that the Ki value for mevastatin is almost eight orders of magnitude lower than that of D,L-3,5-hydroxyvalerate, it is clear that the hydrophobic lower portion (and its covalent link) must play a significant role in the binding of (31) and, by implication, the binding of all the statins. Presumably, the upper portion of the inhibitor is necessary for specificity and the hydrophobic region for binding affinity. The hydrophobic region must be relatively nonspecific because a variety of hydrophobic groups (Fig. 17.20) are accepted. In some cases (e.g., mevastatin), the hydrophobic group overlaps the CoASH site and in others, such as the dichlorophenol group of (32), it does not (112). The structure of the statin is analogous to that of a bisubstrate inhibitor, in that there is linked binding to two distinct binding sites on the enzyme, leading to greatly enhanced inhibition of the enzyme. For mevastatin, the entropic advantage provided by linking the mevalonate and decalin portions together is estimated to be approximately 5 × 10⁴ M (113). This is quite a reasonable enhancement, given that the theoretical maximum is 10⁸ M (57), and it has been suggested that such a "hydrophobic anchor" is responsible for the enhanced binding of some inhibitors of alcohol dehydrogenase and adenosine deaminase (113).

Although this explanation appeared quite reasonable, it was thrown into doubt when X-
$$
\begin{aligned}
&\mathrm{H_3N^{+}}\text{-Asp-Arg-Val-Tyr-Ile-His-Pro-Phe-His-Leu-}\mathrm{COO^{-}} \quad (\text{angiotensin I})\\
&\qquad \downarrow\ \text{angiotensin converting enzyme (ACE)}\\
&\mathrm{H_3N^{+}}\text{-Asp-Arg-Val-Tyr-Ile-His-Pro-Phe-}\mathrm{COO^{-}} \quad (\text{angiotensin II}) \;+\; \mathrm{H_3N^{+}}\text{-His-Leu-}\mathrm{COO^{-}}
\end{aligned}
\qquad (17.40)
$$
ray structures of HMG-CoA reductase complexed with both substrates and products were obtained (114, 115). These structures showed that, if the statins bound so the HMG-like groups bound the HMG-binding pocket of the active site, the bulky hydrophobic groups of the statins would clash with the residues lining the narrow pocket into which part of the coenzyme A bound (115). However, recently, Istvan and Deisenhofer have obtained X-ray structures of HMG-CoA reductase bound to six individual statins, including (9), (31), (34), and (35) (116). This study showed the substrate-binding pocket rearranges to accommodate the statins, that the statins do bind to the HMG-binding region, that a shallow hydrophobic groove now accommodates the hydrophobic groups, and that none of the NADP(H)-binding pocket is occupied (116). In toto, the structural studies supported all interpretations made some 15 years earlier based on kinetic studies, and provided definitive evidence for a hydrophobic anchor enhancing the binding of the mevalonate portion of the statins.

The evolution of the angiotensin converting enzyme (ACE) inhibitors is an illuminating story in the development of enzyme inhibitors as therapeutic agents. As shown in Equation 17.40, ACE catalyzes the conversion of angiotensin I to angiotensin II. Angiotensin II, itself a potent hypertensive agent, also stimulates the release of a second hypertensive agent, aldosterone. In addition, ACE catalyzes the cleavage of the nonapeptide vasodilating agent, bradykinin (not shown). Therefore an ACE inhibitor was seen to have the potential to limit three hypertensive actions. This premise was validated by in vivo results with teprotide, a peptide inhibitor of ACE, which had been isolated from a South American pit viper (117).

At that time the structure of ACE was unknown, although it had been identified as a zinc metalloprotease. It was surmised that its mechanism and active site may resemble that of another metalloprotease, carboxypeptidase A, whose X-ray structure was known. (R)-2-Benzylsuccinic acid (36) (Fig. 17.21) had been identified as a potent inhibitor of carboxypeptidase A, and it was suggested that (36) resembled the collected products of the hydrolysis reaction (Fig. 17.21). In other words, (36) was a biproduct analog and, not unexpectedly, it was found to bind with an affinity resembling the combined affinities of the two products (118). Carboxypeptidase A appeared to have three main interactions with (36). Two substrate-binding sites bound the phenyl group and one carboxylate, and the Zn2+ ion, usually coordinated to the carbonyl of the amide bond being cleaved, was now bound to the second carboxylate. Combining those suggestions with studies with viper venom peptides, indicating that a C-terminal proline was effective in inhibiting ACE, a number of carboxyalkanoylproline derivatives were tested as ACE inhibitors (119). Of these, the succinyl-L-proline derivative (37) was found to be the most effective, with an IC50 value of 330 μM. Given that one carboxylate bound to the Zn2+ ion, a better zinc ligand, a thiol group, was substituted for this carboxylate, resulting in (38) with the IC50 value now reduced to 0.2 μM. Finally, after taking into account the differences between the active sites of ACE and carboxypeptidase A, captopril (39) was prepared. Captopril was found to be a competitive inhibitor of ACE, with a Ki value of 1.7 nM, and was
Figure 17.21. (a) Biproduct analog inhibitor of carboxypeptidase A and (b) several ACE inhibitors.
the first ACE inhibitor to be marketed. It was not long before attempts were made to make captopril more productlike, with the resultant development of enalaprilat (40). Enalaprilat was found to be a slow-tight-binding inhibitor (Section 2.4) of ACE with a Ki value below 1 nM (120), but it was poorly absorbed orally. However, the ethyl ester (enalapril) (41) acted as a prodrug, had good oral activity, and was marketed. Note that it is also possible that enalaprilat acts as a transition-state analog (Section 2.5.3), thereby accounting for its performing as a slow-tight-binding inhibitor (121). Following enalapril, many more ACE inhibitors have been developed mainly aimed at increasing oral bioavailability, removing side effects, or improving metabolism. These include ramipril (42), the ester prodrug of ramiprilat (43), with 10 times better bioavailability than that of enalapril.
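The stepwise gains in potency along the ACE inhibitor series can be translated into approximate binding free-energy differences using ΔΔG = RT ln(K_new/K_old). The sketch below assumes a temperature of 298.15 K and treats a ratio of IC50 values as a stand-in for the ratio of dissociation constants, which holds only if both compounds were assayed under the same conditions. The function name is illustrative.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def ddg_kcal(k_old, k_new, temp_k=298.15):
    """Free-energy change in kcal/mol on going from k_old to k_new.

    Both constants must be in the same concentration units.
    A negative result means the new compound binds more tightly.
    """
    if k_old <= 0 or k_new <= 0:
        raise ValueError("constants must be positive")
    return R_KCAL * temp_k * math.log(k_new / k_old)


if __name__ == "__main__":
    # Carboxylate (37) -> thiol (38): IC50 values in the same units.
    print(f"thiol for carboxylate: {ddg_kcal(330.0, 0.2):.1f} kcal/mol")
    # Eight orders of magnitude, as quoted for mevastatin versus the
    # isolated upper-portion analog.
    print(f"eight orders of magnitude: {ddg_kcal(1.0, 1e-8):.1f} kcal/mol")
```

On this estimate, swapping the zinc-binding carboxylate for a thiol is worth a little over 4 kcal/mol, several times the roughly 0.7 kcal/mol attributed earlier in the chapter to a single methylene-methylene contact.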
Ramiprilat was also shown to be a slow-tight-binding inhibitor of ACE, operating by mechanism B, with Ki* (Equation 17.30) of 7 pM (122). A more detailed discussion of the development of the ACE inhibitors is available (121).

2.5.3 Transition-State Analogs. As a chemical reaction proceeds from substrates to products, it will pass through one or more transition states. The energy barrier imposed by the highest energy transition state controls the overall rate of the reaction. Enzymes bring about rate enhancements of 10¹⁰-10¹⁵ (123) by lowering this energy barrier. They do this by having a greater affinity to the structure of the transition state than to the structures of either substrates or products. Although an enzyme may have good affinity for its substrate, as evidenced by a low dissociation constant (Ks, Equation 17.41), for the Michaelis (E·S) complex, the enzyme can further stabilize the inherently unstable transition state, for example, by forming extra electrostatic or hydrogen bonds, by providing more effective hydrophobic interactions, or by using structural rearrangements to exclude solvent, thereby strengthening existing electrostatic contacts.

[...] enzyme inhibitors. Such compounds, referred to as transition-state analogs, can theoretically have ratios of the binding constants of inhibitor to substrate (Ki/Ks) on the order of 10⁻⁸ to 10⁻¹⁴. In addition, transition-state analogs may have the further advantage of reduced molecularity, as outlined earlier (Section 2.5.2) for multisubstrate analog inhibitors. Several reviews on the theory and general aspects of transition-state analog inhibitors are available and are recommended for a more complete understanding of this topic (37, 96, 99, 100, 124, 125).

The design of a good transition-state mimic is quite challenging. It requires, at the least, sufficient knowledge of the mechanism of the target enzyme to predict transition-state structure(s). This is why transition-state analogs are sometimes (but not in this review) referred to as mechanism-based inhibitors. A detailed knowledge of the true energy profile, including details such as the existence of distinct chemical steps, high-energy intermediates, and their associated transition states, is also useful (126). Further, by definition, the transition state is unstable, often highly charged, and possesses partially broken/
formed covalent bonds. Designing a stable
compound that will closely mimic a transition
state is impossible. However, the Hammond
postulate states that the transition state be-
tween a reactant and a high-energy reaction
intermediate wiIl resemble the intermediate
rather than the reactant. It is possible to de-
sigdsynthesize an analog of a high-energy in-
termediate. Indeed, the majority of so-called
transition-state analogs are actually analogs
of high-energy reaction intermediates. Al-
Simple transition-state theory states that though a clear distinction exists, the design
the rate of an enzyme-catalyzed reaction is process is, for all practical purposes, the same.
correlated with the rate of a noncatalyzed re- It should also be noted that an enzyme is
action by the same factor as the affinity of an designed to initially recognize the features of
enzyme for the transition state to the affinity its substrates. Often substrate binding brings
of an enzyme for a substrate (Equation 17.41) about a conformational change in the enzyme
(99). that will then maximize the'attractive forces
Therefore, the magnitude of enzymatic between the enzyme and transition state. The
catalysis (k,/k,) is related to the enhanced transition-state analog may not possess those
binding of the transition state to the enzyme features of the substrate that facilitate rapid
(K,IK,). Compounds that can take advantage binding, even though its affmity for the en-
of this enhanced binding to the transition zyme is extremely high. Although some tran-
state can prove to be potent and selective en- sition-state analogs bind rapidly to enzymes,
others bind slowly and show the properties of the slow-binding inhibitors described earlier in Section 2.4.

Slow binding, tight binding, or structural similarity to the assumed transition-state structure are not, in themselves, sufficient criteria to establish that an inhibitor is a true transition-state analog (127). Methotrexate (7), for example, is an extremely high-affinity (Ki = 58 pM), slow-binding inhibitor of dihydrofolate reductase (128). On the surface, it would appear that methotrexate could be classified as a transition-state analog. However, crystallographic studies have shown that methotrexate binds with its pterin ring in the opposite orientation to that of the substrate, dihydrofolate (129, 130). To distinguish between a high-affinity, ground-state analog and a putative transition-state analog requires a careful appraisal. There is a fundamental difference between the entropy change of a unimolecular enzymatic reaction and that of a multimolecular solution reaction (131). In addition, the appropriate rate constant for the nonenzymatic reaction is often either not available or hard to obtain (132, 133). These factors combine to make it difficult to evaluate quantitatively the correlation between the enhanced rates of enzymatic reactions and the tight binding of transition-state analogs.

In an attempt to develop stringent criteria for the distinction between transition-state and ground-state analogs, Bartlett and Marlowe (127) overcame some of these inherent difficulties by comparing the binding affinities of a series of substrate analogs with those of the corresponding transition-state analogs. One consequence of Equation 17.41 is that, if there is a change in structure of a substrate that alters kcat/KM without altering the nonenzymatic rate of reaction, then an analogous structural change in the transition-state mimic should bring about a similar change in Ki. Put simply, there should be a linear relationship between the values of Ki for the transition-state analog and kcat/KM for the corresponding substrate. Bartlett and Marlowe (127) designed a series of dipeptide analog substrates (44) (Fig. 17.22) for thermolysin in which the structural variation was remote from the reactive center and therefore unlikely to affect the nonenzymatic reaction rate. The reaction catalyzed by thermolysin is proposed to proceed by the tetrahedral transition state (45), and with their long P-O bonds, the phosphonamidate compounds (46, Table 17.6) were expected to act as transition-state analog inhibitors. It was found that the Ki values for the putative transition-state analog inhibitors correlated linearly with the KM/kcat values of the corresponding substrates, although no correlation was found between Ki and KM (Table 17.6). The fact that substrate binding (KM) was relatively unaffected by a change at a remote site was not unexpected, but the observation that the binding of the phosphonamidate inhibitors was greatly affected suggests that these inhibitors were, indeed, transition-state rather than ground-state analogs. Conversely, the Ki values for a series of fluoroalkene isosteres of the same substrates (47) (Fig. 17.22) correlated strongly with KM but not KM/kcat (Table 17.6), indicating that the latter inhibitors were ground-state analogs (134). This approach has also been used to confirm that phosphonic acid peptides were transition-state analog inhibitors of pepsin (135).

Figure 17.22. (a) Thermolysin-catalyzed hydrolysis of peptide analogs showing putative transition state, (b) phosphonamidate peptide analog, and (c) fluoroalkene peptide analog.

Table 17.6 Correlation of Ki Values for Inhibitors of Thermolysin with KM and KM/kcat Values for the Corresponding Substrates(a)
[Table body not recovered. Surviving labels: column groups "Inhibitor Data" and "Corresponding Substrate Data"; first series, R = D-Ala, NH2, Gly, L-Phe, L-Ala, L-Leu; second series, R = Gly, L-Ala, L-Leu, L-Phe.]
(a) Data are from Ref. 127.

One of the most popular targets for design of transition-state analogs is adenosine deaminase. Inhibitors of this enzyme have been used as immunosuppressants and are also potential antitumor agents, whereas lack of adenosine deaminase results in severe combined immunodeficiency disease (SCID).

Adenosine deaminase (ADA), which catalyzes the conversion of adenosine to inosine (Equation 17.42), is an extremely proficient enzyme, providing a rate enhancement of more than 12 orders of magnitude (123). The enzyme-catalyzed reaction is thought to pass through an unstable hydrated intermediate (48) (Fig. 17.23), with a KT (Equation 17.41) in the region of 10^-17 M (123). Clearly, even a crude analog of (48) would have the potential to be an extremely powerful inhibitor of ADA. The structures of several inhibitors of ADA are shown in Fig. 17.23. Of these, the antibiotics coformycin (49) and (R)-deoxycoformycin (pentostatin, 50) were found to be potent ADA inhibitors, with Ki values of 1 × 10^-11 M (136) and 2.5 × 10^-12 M (137), respectively. The KM for adenosine is around 30 µM (138, 139), whereas the Ki of the product, inosine, is [value not recovered] M. Thus, the two antibiotics show at least 10^6-fold greater affinity for ADA than for
Figure 17.23. Reaction intermediate and inhibitors of the reaction catalyzed by adenosine deaminase.

[Equation 17.42: conversion of adenosine to inosine; structures not recovered.]

the substrate, suggesting that they are acting as transition-state analogs. By contrast, (S)-deoxycoformycin (51) and 8-ketodeoxycoformycin (52), with Ki values of 33 and 40 µM,

However, it was observed that the structure of the ligand and the enzyme were perturbed when purine riboside bound to ADA. 13C-NMR spectroscopy showed that the ADA-bound purine riboside was sp3 hybridized at C-6 (141). The NMR and UV spectra suggested that it was the hydrated form of purine riboside (54) that was binding to ADA and, using the unfavorable equilibrium constant for hydration in solution (1.1 × 10^-7 M, Fig. 17.23), a true Ki value of 3 × 10^-13 M could be calculated for (54) (142). Given the low concentration of the free hydrate in solution, and the
rapid onset of inhibition, it appears that purine riboside (53) itself initially binds and is then rapidly converted to (54) in the active site (141). This result, along with the high affinities of (R)-coformycin and (R)-deoxycoformycin, argues that the reaction proceeds by a stereospecific, direct attack of water rather than the double-displacement mechanism that also had been proposed (141). More recently, an X-ray study on adenosine deaminase, which had been crystallized in the presence of purine ribonucleoside, confirmed that it was the hydrated species of purine ribonucleoside that was present in the active site (143). Further, a triad of a zinc atom, a histidine residue, and an aspartic acid residue ensured that the binding was stereospecific, with the 6R isomer (55) being favored.

The adenosine deaminase story, in many ways, provides a perfect example of the general principles of enzymatic catalysis and the utility of enzyme inhibitors. ADA is an extremely efficient catalyst, producing a rate enhancement of 12 orders of magnitude. 6R-Hydroxy-1,6-dihydropurine riboside (55) has an affinity for ADA about 8 orders of magnitude greater than that for substrates or products; that is, it expresses a substantial fraction of the free energy of binding that separates the transition state from the ground state in an enzymatic reaction. Evidence of the extraordinary ability of an enzyme to discriminate between stereoisomers is provided by the 10^7-fold difference in binding affinities of the 8R-OH (50) and 8S-OH (51) stereoisomers of 2'-deoxycoformycin. Inhibitors were used to differentiate among several potential reaction mechanisms for ADA and, finally, an ADA inhibitor (pentostatin) has proved to be of therapeutic benefit.

Inhibitors of pyrimidine and purine biosynthesis are used as antineoplastic agents. As a consequence, dihydroorotase, which catalyzes the third step of de novo pyrimidine biosynthesis, the conversion of carbamyl aspartate to dihydroorotate (Equation 17.43), is a target for therapeutic intervention.

[Equation 17.43: conversion of carbamyl aspartate to dihydroorotate, catalyzed by dihydroorotase; structures not recovered.]

Figure 17.24. (a) Putative transition state for the dihydroorotase reaction, and (b) boronic acid transition-state analogs.

The reaction is thought to proceed through the tetrahedral-activated complex (56) (Fig. 17.24), which is a highly charged, unstable sp3 carbon species (144, 145). At around neutral pH, compound (57), a boron-containing analog of carbamyl aspartate, rearranges to the stable, tetrahedral boronic acid derivative (58). The affinity of (58) for dihydroorotase (Ki = 5 µM) was found to be 10-fold greater than that of the carbamyl aspartate
KM = 50 µM, indicating that (58) was probably acting as a transition-state analog (145). Tetrahedral boronic acid structures are stable, unlike the analogous sp3 carbon species, and boronic acid derivatives of substrate peptides have proved to be quite potent inhibitors of a variety of proteases, particularly serine proteases (146).

Chorismate mutase catalyzes the conversion of chorismate to prephenate (Equation 17.44). This reaction is unusual, in that it is the only pericyclic [3,3] sigmatropic rearrangement (Claisen rearrangement) that is catalyzed by an enzyme.

[Equation 17.44: conversion of chorismate to prephenate, catalyzed by chorismate mutase; structures not recovered.]

Although chorismate mutase does provide a rate enhancement of 2 × 10^6 (147), this unimolecular reaction readily occurs without enzyme, under mild conditions. The reaction was expected to pass through a chairlike transition state (59) (Fig. 17.25), but early molecular orbital calculations indicated that the boatlike transition state (60) was not out of the question (147). In an attempt to define the transition-state structure, several compounds, each designed to mimic a putative transition state, were synthesized and tested as chorismate mutase inhibitors (147). The enzyme was found to be inhibited by the exo-carboxy nonane (61), with an apparent Ki value of 3.9 × 10^-4 M. Conversely, the endo-carboxy nonane (62) did not inhibit the enzyme. The apparent Ki value of the adamantane derivative (63), in which the chair-chair conformation is fixed, was about the same as that of (61). Taken together, this implied that the reaction proceeded through a chairlike transition state (147).

This approach was later refined by Bartlett and Johnson, who suggested that IC50/KM ratios of 7 for compound (61) and 12 for compound (63) indicated that these inhibitors were not particularly good transition-state analogs (148). In an attempt to improve potency, and to further define the stereochemistry of the transition state, they synthesized several compounds including the exo- and endo-carboxy unsaturated oxabicyclic ethers, (64) and (65), respectively (148). The exo-compound (64) was not significantly better than its saturated carbocyclic analog (61), but the endo-derivative (65) bound chorismate mutase some 100-fold more tightly than did chorismic acid under the same conditions, with a Ki value of 120 nM (148, 149). Later, monoclonal antibodies elicited against (65) were found to be effective catalysts for the conversion of chorismate to prephenate, with rate enhancements of 200-fold in one case (150) and 10,000-fold in another (151). In both instances it was suggested that the rate enhancement was attributable to increased binding of the transition state by the antibody (150, 151).

X-ray structures are now available of the complexes of (65) with two chorismate mutase enzymes (152, 153), as well as with the less-efficient catalytic antibody (154). Although each active site was found to employ a different constellation of interactions with (65), the dissociation constants for the binding of (65) to the three proteins were strikingly similar, ranging from 0.6 to 3 µM (153, 154). However, the micromolar affinity of (65) for both enzyme and antibody is considerably weaker than might be expected for a good mimic of the transition state, and the antibody is not a particularly efficient catalyst. Wiest and Houk (155) have calculated that the bond lengths for the breaking and forming bonds in the transition state are considerably longer than those for (65), and the neutral inhibitor does not mimic the charge separation that builds up in the transition state. Although the two enzyme-active sites have evolved to complement the larger, polarized transition state, the antibody has no residues positioned to stabilize the polar transition state. Further, the active site is smaller and makes more van der Waals contacts with the inhibitor, again features likely to impede catalysis. These features provide evidence for the innate difficulties associated with designing both good transition-state analogs and efficient catalytic antibodies.

3 RATIONAL DESIGN OF COVALENTLY BINDING ENZYME INHIBITORS

For the purposes of this chapter we have divided covalently binding enzyme inhibitors into categories according to Table 17.4. Pseudoirreversible inhibitors are discussed separately and the others are, in order of increasing specificity, chemical modifiers, affinity labels, and mechanism-based inhibitors. The targets for these inhibitors are the chemically reactive groups found within the enzyme's active site. These groups, in the majority of cases, are nucleophiles such as the -OH groups of serine, threonine, and tyrosine, the -SH group of cysteine, and the -COOH groups of aspartic and glutamic acid residues. Other nucleophilic groups include the ε-amino group of lysine and the imidazole ring of histidine. In some cases the -NH2 and -COOH groups of the enzyme's N- and C-termini, respectively, are also active-site nucleophiles, whereas enzymic cofactors may also provide targets for covalently binding inhibitors. Arginine is the only common amino acid that has an electrophilic side chain and it also can be modified with suitable nucleophilic agents. Kyte has recently provided an excellent overview of the general area of active-site modification and labeling (156).

The first group of covalently binding enzyme inhibitors, the chemical modifiers, are small organic molecules, generally electrophiles, that are used to modify the enzyme's side chains in such a way as to produce a stable covalent bond. These are often used to study enzyme inactivation and to identify residues potentially involved in binding and catalysis. Some of the commonly used reagents are
listed in Table 17.7. These compounds are chemically reactive and may lead to the modification of both catalytic and nonessential residues. As a consequence, experimental design (such as choice of reagent and reaction conditions, use of substrate protection, etc.) is of utmost importance in carrying out and interpreting chemical modification studies. Although inhibitors of this type are not the prime focus of this chapter (and are not discussed further), it should be noted that most of the kinetic equations that apply to affinity labels also apply to chemical modifiers, and there are a number of texts available that cover this topic (40, 157, 158).

Although the organic modifiers are usually not specific for a given enzyme, the second group, the affinity labels, have a degree of specificity built in. Sometimes described as active-site directed, irreversible inhibitors, affinity labels are usually substrate or product analogs that contain an additional chemically reactive moiety. They first bind to the enzyme's active site in a noncovalent fashion, like rapid reversible inhibitors. However, upon formation of the enzyme-inhibitor complex (E·I), they react by various mechanisms with one or more amino acid residues in close proximity in the enzyme's active site. This results in covalent bond formation between the enzyme and the inhibitor (E-I) (Equation 17.45).

Usually the inhibitor contains an electrophilic moiety that labels amino acids containing nucleophilic groups. However, in some cases, a nucleophilic species may be formed, which can react either with arginine or with any tightly bound organic or inorganic low molecular weight cofactors possessing electrophilic sites. Unlike the mechanism-based inhibitors described below, affinity labels do not require activation by catalysis at the enzyme's active site. Most often, the covalent bond formation occurs by an SN2 alkylation-type mechanism, Schiff base formation, or acylation (156, 159).

Affinity labels, some of which have become successful therapeutic agents, are often used to identify catalytically important residues. In some cases, by examining the pH dependency of the rate of inactivation, it is possible to determine the pKa of the labeled residue. Again, there are a number of excellent reviews on this topic (160-163), including a complete volume in the Methods in Enzymology series (159).

Recently, Pratt (164) and Krantz (165) have suggested that any inactivator that utilizes an enzyme's mechanism, in the broadest sense, should be described as a mechanism-based inhibitor. Although this is not unreasonable, we have, for the purposes of this chapter, adopted the more narrow view of Silverman (166). In this view, mechanism-based inhibitors (also called suicide substrates, Trojan horse inactivators, enzyme-induced inactivators, kcat inhibitors, and latent inactivators) are described as unreactive compounds, the structure of which usually resembles that of a substrate or product of the target enzyme, and that undergo a catalytic transformation by the enzyme to species that, before release from the active site, inactivate the enzyme. Thus, these compounds usually contain a latent, reactive functional group that gets activated during the normal catalysis of the enzyme. Upon formation of the initial reversible enzyme-inhibitor complex E·I, the enzyme starts its normal catalytic cycle, leading in a usually rate-determining step to the formation of a highly reactive species, E·I' (Equation 17.46).

The reactive species can either react with one of the enzyme active-site amino acid residues, to form a covalent bond between the enzyme and the inhibitor (E-I''), or be released into the medium to form product (P) and free active enzyme (E). In some instances the reaction may occur between the reactive species and the enzyme's cofactor, again resulting in inactivation of the enzyme.

It should also be noted that the activation of a mechanism-based inhibitor by its target enzyme is, formally, an example of metabolic activation. However, there is a clear distinction between the activation of a mechanism-based inhibitor described above and the metabolic activation of a prodrug. In the latter case, an inactive precursor is metabolized in the body (either chemically or enzymatically) to metabolites that possess the desired activity. For example, acyclovir (3a) must be metabolically converted to the triphosphate (3b) and released into the medium before it will inhibit viral DNA polymerase. Further discussion on prodrugs may be found in volume 2, chapter 14.

3.1 Evaluation of the Mechanism of Inactivation of Covalently Binding Enzyme Inhibitors

The inherent complexity of the inactivation mechanisms of covalently binding enzyme inhibitors makes it necessary to evaluate their proposed modes of action carefully. An overview of the criteria for the study of irreversible inhibitors is provided below.

3.1.1 Criteria for the Study of Affinity Labels. The evaluation of affinity labels is based on the fulfillment of the following criteria:

1. Irreversible inactivation. Inactivation by affinity labels leads to irreversible covalent bond formation between the enzyme and the inhibitor. Unlike the complex between an enzyme and a rapid, reversible inhibitor, the covalent enzyme-inhibitor complex is no longer in equilibrium with free enzyme and inhibitor. Therefore, exhaustive dialysis or gel filtration of the covalent enzyme-inhibitor complex cannot lead to the recovery of free, active enzyme. However, such experiments do not allow distinction among tight-binding, noncovalent inhibitors, affinity labels, and mechanism-based inactivators.
2. Time- and concentration-dependent inactivation showing saturation kinetics. The
Figure 17.26. Pseudo first-order inactivation kinetics of an active-site directed irreversible inhibitor.

Figure 17.27. Kitz and Wilson plot.
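The saturation kinetics named in criterion 2, and the two plots captioned above, are commonly analyzed with a two-step scheme in which a reversible complex E·I forms first and is then converted to the covalent complex. Under that scheme the observed pseudo-first-order rate constant is k_obs = k_inact[I]/(K_I + [I]), so a plot of 1/k_obs against 1/[I] is linear. The Python sketch below is illustrative only: the symbols k_inact and K_I, the function names, and every numerical value are assumptions chosen for the example and are not taken from this chapter. It generates noise-free semilog inactivation data and then recovers both constants from the double-reciprocal plot.

```python
import math


def k_obs(conc_i, k_inact, K_I):
    """Observed pseudo-first-order rate constant at inhibitor concentration conc_i."""
    return k_inact * conc_i / (K_I + conc_i)


def residual_activity(t, conc_i, k_inact, K_I):
    """Fraction of enzyme activity remaining after preincubation time t."""
    return math.exp(-k_obs(conc_i, k_inact, K_I) * t)


def linear_fit(xs, ys):
    """Ordinary least-squares line through (xs, ys); returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def kitz_wilson(concs, k_obs_values):
    """Recover (k_inact, K_I) from the line 1/k_obs = (K_I/k_inact)(1/[I]) + 1/k_inact."""
    slope, intercept = linear_fit([1.0 / c for c in concs],
                                  [1.0 / k for k in k_obs_values])
    k_inact = 1.0 / intercept
    return k_inact, slope * k_inact


# Hypothetical constants and experimental design (assumed for illustration).
TRUE_K_INACT = 0.20                      # min^-1
TRUE_K_I = 50.0                          # micromolar
concs = [10.0, 25.0, 50.0, 100.0, 200.0]  # micromolar inhibitor
times = [0.0, 2.0, 4.0, 6.0, 8.0, 10.0]   # minutes of preincubation

# Step 1: semilog plot. The slope of ln(activity) against time is -k_obs.
k_obs_values = []
for c in concs:
    ln_activity = [math.log(residual_activity(t, c, TRUE_K_INACT, TRUE_K_I))
                   for t in times]
    slope, _ = linear_fit(times, ln_activity)
    k_obs_values.append(-slope)

# Step 2: double-reciprocal replot of the k_obs values.
k_inact_fit, K_I_fit = kitz_wilson(concs, k_obs_values)
print(f"k_inact = {k_inact_fit:.4f} min^-1, K_I = {K_I_fit:.2f} uM")
```

With real data the double-reciprocal replot weights the lowest inhibitor concentrations most heavily, so a direct nonlinear fit of k_obs against [I] is often preferred; the linear form is used here only because it mirrors the plot named in the caption.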
3.1.2 Criteria for the Study of Mechanism-Based Inactivators. In addition to the requirements described above for an affinity label, a mechanism-based inhibitor should also demonstrate the following:

1. Occurrence of a catalytic step. The major difference between the mechanism of inactivation of mechanism-based inactivators vs. that of any other type of inhibitor is the obligate involvement of a catalytic step, that is, step 2 in Equation 17.46. Initially, the mechanism-based inhibitor binds reversibly to form the E·I complex. The enzyme then starts its normal catalytic cycle, resulting in the conversion of the inhibitor into a reactive species (I'). If the reactive species is electrophilic, it may react with an active-site nucleophile, much like an affinity label. If the reactive species is nucleophilic, it may react with an electrophilic species on the enzyme, probably an oxidized cofactor. Finally, a radical species may be generated that has the potential to react with an enzyme radical, or generate one by hydrogen atom abstraction. The experiments necessary to provide evidence for a catalytic step are obviously strongly dependent on the individual catalytic mechanism involved. The experiments may include spectrophotometric detection of oxidized or reduced cofactor, observing C-H bond cleavage by monitoring the release of tritium, or the detection of some component of cleaved inhibitor (such as fluoride ion as in some examples shown below).
2. No release of the activated species before enzyme inactivation. For a mechanism-based inactivator to retain its high specificity during inactivation, release of the reactive species from the active site must not be part of the normal mechanism of inactivation. A time-dependent increase in the rate of inactivation points to the release of an activated species before inactivation. This increase in the rate of inactivation is brought about by the accumulation of free reactive species in solution. Inhibitors generated in this manner have been termed metabolically activated affinity labels (171). In these cases, as with affinity labels, nonspecific covalent modification of residues other than those located in the active site cannot be excluded. A second test for a metabolically activated affinity label is to add an additional aliquot of fresh enzyme to the incubation buffer. The fresh enzyme should be inactivated at a higher rate than that of the first equivalent of enzyme because there is more reactive species present in solution. By contrast, the mechanism-based inhibitor should show no difference in rate until the concentration of inhibitor is depleted. It should also be noted that the observation of such rate increases necessitates that the reactive species is relatively stable and is not immediately quenched by the incubation buffer.
Additional tests such as the addition of nucleophilic scavengers (e.g., thiols such as dithiothreitol or β-mercaptoethanol) can provide further evidence for the presence of a free, reactive electrophilic species. The scavengers should quench all of the free reactive species, thereby protecting the enzyme from inhibition. Unfortunately, this method cannot exclude the possibility that a nucleophilic thiol may even attack the bound reactive species at the active site of the enzyme (which would also give rise to protection from inactivation). However, the use of a bulky thiol, such as reduced glutathione, should limit that possibility. An alternative scenario occurs wherein the released reactive species returns and reacts faster with an active-site nucleophile than with the added thiol. Clearly this is a complex problem and, consequently, it is advisable to use several different tests to avoid misleading conclusions.
3. Partition ratio. The partition ratio is the ratio of product release to enzyme inactivation and is a measure of the efficiency of the mechanism-based inhibitor. Formally, it refers to the ratio of the rate constant for product release to that for inactivation (Equation 17.46). The most efficient inactivators will have a partition ratio of zero. In those cases, theoretically, every enzymatically processed inhibitor molecule will result in the inactivation of a molecule of enzyme. Even though the partition ratio is independent of
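The partition ratio defined in criterion 3 above lends itself to a small numerical illustration. The Python sketch below is a hedged example, not a method given in this chapter: the function names are hypothetical, and it assumes that the incubation runs to completion and that each inactivation event consumes, on average, one inhibitor molecule plus a number of released product molecules equal to the partition ratio.

```python
def inactivation_probability(partition_ratio):
    """Chance that a single catalytic turnover ends in inactivation rather than product release."""
    return 1.0 / (1.0 + partition_ratio)


def inhibitor_equivalents_needed(partition_ratio):
    """Average number of inhibitor molecules consumed per enzyme molecule inactivated."""
    return 1.0 + partition_ratio


def residual_activity(inhibitor_equivalents, partition_ratio):
    """Fraction of enzyme still active once all added inhibitor has been processed."""
    consumed_per_enzyme = inhibitor_equivalents_needed(partition_ratio)
    return max(0.0, 1.0 - inhibitor_equivalents / consumed_per_enzyme)


def partition_ratio_from_titration(equivalents_at_zero_activity):
    """Partition ratio from the [I]/[E] value at which residual activity extrapolates to zero."""
    return equivalents_at_zero_activity - 1.0


# Hypothetical partition ratios, each challenged with 5 equivalents of inhibitor.
for r in (0.0, 9.0, 99.0):
    print(f"partition ratio {r:5.1f}: "
          f"P(inactivation per turnover) = {inactivation_probability(r):.3f}, "
          f"activity left after 5 equivalents = {residual_activity(5.0, r):.2f}")
```

Under these assumptions an inactivator with a partition ratio of zero titrates the enzyme one for one, whereas larger ratios require proportionally more inhibitor for complete inactivation.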
compound was designed to mimic substrates of chymotrypsin such as the tosyl-L-phenylalanine methyl ester (Equation 17.48), thereby providing a basis of affinity for the chymotrypsin-active site. In addition to mimicking a substrate, it contains the halomethyl ketone moiety, to provide a point of covalent attachment (175). TPCK was shown to irreversibly inhibit chymotrypsin (it is still employed today to remove chymotrypsin from trypsin preparations) by specifically labeling a histidine residue (175), later identified as His57 (176). After the success of TPCK, chloromethyl ketones became extremely popular for the inactivation of proteases. By incorporating part of the sequence of the physiological substrate into the halomethyl ketone, it was possible to obtain selective inactivation of individual proteases (177). This selective inactivation also meant that chloromethyl ketones became widely used as probes for the binding requirements and chemically reactive residues in the active sites of serine proteases, in particular. Replacement of the chloromethyl ketone moiety by a diazomethyl group provided a specific inactivation of cysteine proteases (172). The use of TPCK has not been restricted to chymotrypsin, as elegantly demonstrated in a recent report on the inhibition of human aldehyde de-

α-phenylglycidate to the enzyme, the electrophilic epoxide group would be subject to attack by the nucleophile responsible for α-proton abstraction in the normal catalytic cycle. Further confirmation is provided by the X-ray structure of (S)-atrolactate bound to the racemase (181), which reveals that Lys166 has been pushed away by the α-methyl group of (S)-atrolactate (which is positionally equivalent to, but much larger than, the α-proton in (S)-mandelate). In both structures the positions of the remaining active-site residues are almost identical.
[Equation 17.50: reaction scheme with the surviving labels arachidonic acid, peroxidase, prostaglandin H2, and thromboxane A2; structures not recovered.]
Figure 17.32. (a) Inactivation of prostaglandin H2 synthase by aspirin, and (b) inhibitors cocrystallized with prostaglandin synthase.
[Structures of compounds (79), (80), and (81) not recovered.]
the PLP-dependent enzymes is extremely well characterized, making the design process somewhat easier. The initial steps in the mechanism for a PLP-dependent enzyme are shown in Equation 17.51.

agents, by providing an increase in the concentration of GABA in the brain.

GABA aminotransferase (GABA transaminase, GABA-T) catalyzes the conversion of γ-aminobutyric acid to succinic semialdehyde with the subsequent transfer of an amino group to pyruvate (Equation 17.52).
[Equation 17.52: GABA + pyruvate → succinic semialdehyde + L-alanine, catalyzed by GABA-T.]
[Schemes not recoverable from extraction; surviving labels: PMP, ornithine decarboxylase, putrescine, Lys69, non-enzymatic (slow), structures (93) and (94).]
[Equation 17.54: scheme labeled testosterone, 5α-reductase, and NADPH.]
eventually dissociate and form dihydrofinasteride (94), although the half-life of 14 days also points to the effectiveness of finasteride as a steroid 5α-reductase inhibitor.
Enzymes involved in steroid biosynthesis have proved to be good targets, both for therapeutic intervention and for mechanism-based inactivators (2). Aromatase, for example, catalyzes the final, rate-limiting step in estrogen biosynthesis (Equation 17.55). Aromatase has proved susceptible to mechanism-based inhibitors such as formestane and exemestane. These are now both used in the treatment of breast cancer (210).
In the last decade there has been a considerable increase in the occurrence of antibiotic-resistant microbial pathogens. Vancomycin, one of the last resort antibiotics for treating some gram-positive bacterial infections, inhibits peptidoglycan synthesis by binding the terminal D-alanyl-D-alanine (D-Ala-D-Ala) dipeptide from pentapeptide precursors of Enterococcus cell walls. VanX is a zinc-dependent D-Ala-D-Ala dipeptidase (Equation 17.56), which has been implicated in high-level resistance to vancomycin (211, 212). As a consequence, VanX has become a prime drug target for overcoming vancomycin resistance and a number of transition-state analogs have been prepared (213, 214).
The enzyme was also shown to process dipeptides with bulky C-terminal amino groups (213) and, using this knowledge, a novel mechanism-based inhibitor was recently developed (215). Its mechanism is shown in Fig. 17.37. (95) is a dipeptide-like analog of D-Ala-D-Ala and is readily accepted by VanX. Cleavage of the peptide bond and elimination of D-alanine results in the formation of the metastable 2-p-difluoromethylthioglycine (96), which spontaneously decomposes, yielding ammonia, glyoxylic acid, and p-difluoromethyl thiophenol (97). Elimination of a fluoride ion results in the electrophilic 4-thioquinone fluoromethide (98), which irreversibly alkylates the enzyme (99). Interestingly, the turnover of the analog was faster than that of D-Ala-D-Ala itself. However, the partition ratio of 7500 in-
3 Rational Design of Covalently Binding Enzyme Inhibitors
[Equation 17.55: testosterone → estradiol, catalyzed by aromatase; structures of formestane and exemestane not recoverable from extraction.]
ible inhibitor will be determined by a combination of the rate of formation of the covalent enzyme inhibitor adduct and the half-life for reactivation.
As may be expected, criteria for the study of pseudoirreversible inhibitors are very similar to those for both affinity labels and mechanism-based inhibitors. However, because of the inherent reversibility of pseudoirreversible inhibitors, it may be more difficult to obtain structural evidence for the covalent enzyme inhibitor adduct. Further, determination of the rate of reactivation and characterization of the products of the recovery process will also be of major importance in designating an inhibitor as pseudoirreversible.
Pseudoirreversible inhibitors can be broken into two classes, depending on how the active enzyme is regenerated. In the first class, exemplified by inhibitors of acetylcholinesterase, the enzyme is regenerated as the covalent E-I' bond is hydrolyzed (i.e., k, % k-,). As shown in Equation 17.58, acetylcholinesterase catalyzes the hydrolysis of acetylcholine, yielding choline and acetate.
Acetylcholine is a neurotransmitter that relays nerve impulses across the neuromuscular junction. Acetylcholinesterase (AcChE) rapidly breaks down acetylcholine, thereby lowering its concentration in the synaptic cleft and ensuring that nerve impulses are of a finite length. As shown in Fig. 17.38, a nucleophilic serine residue reacts with the substrate to form an acetyl-serine intermediate (100) with concomitant release of choline. This intermediate is then rapidly hydrolyzed by wa-
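$$
\underset{\text{acetylcholine}}{\mathrm{CH_3C(O)OCH_2CH_2\overset{+}{N}(CH_3)_3}}
\;\xrightarrow[\mathrm{H_2O}]{\text{acetylcholinesterase}}\;
\underset{\text{acetate}}{\mathrm{CH_3COO^-}} \;+\;
\underset{\text{choline}}{\mathrm{HOCH_2CH_2\overset{+}{N}(CH_3)_3}}
\tag{17.58}
$$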
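As a rough illustration of the "half-life for reactivation" mentioned above, the following sketch shows how a half-life translates into the fraction of enzyme regenerated over time. It assumes simple first-order breakdown of the covalent enzyme-inhibitor adduct; the function name and the half-life values are hypothetical and are not taken from the cited studies.

```python
import math


def fraction_reactivated(t, half_life):
    """Fraction of the covalent adduct that has reverted to free enzyme by time t.

    Assumes simple first-order breakdown of the adduct (an assumption of this
    sketch); t and half_life must be given in the same time unit.
    """
    if half_life <= 0:
        raise ValueError("half_life must be positive")
    k_react = math.log(2) / half_life  # first-order reactivation rate constant
    return 1.0 - math.exp(-k_react * t)


if __name__ == "__main__":
    # Hypothetical half-lives in hours, chosen only for illustration.
    for label, t_half in [("fast-reactivating adduct", 0.5),
                          ("slow-reactivating adduct", 10.0)]:
        recovered = fraction_reactivated(2.0, t_half)
        print(f"{label}: {recovered:.1%} of enzyme regenerated after 2 h")
```

Under this assumption, an adduct with a short half-life is almost fully reversed within a few hours, whereas one with a long half-life behaves, on the same timescale, much like an irreversible inhibitor.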
Figure 17.38. (a) Mechanism of reaction, (b) irreversible inhibitors, and (c,d) pseudoirreversible inhibitors of acetylcholinesterase.
ter, producing acetate and regenerated enzyme. Agents such as parathion (101) and sarin (102) have found utility as insecticides and nerve gases, respectively, because they react with the enzyme to form the active-site serine-phosphate esters, (103) and (104). These esters are hydrolyzed extremely slowly by water, making the inhibition effectively irreversible (i.e., both k-, and k, are very small), although the inhibition can be overcome with high concentrations of strong nucleophiles such as hydroxylamine.
More recently, it has been established that inhibitors of acetylcholinesterase may play a role in the memory enhancement in patients with Alzheimer's disease (217). Unlike (101) and (102), carbamate inhibitors such as physostigmine (105) and rivastigmine (106) are classified as pseudoirreversible inhibitors because they react with AcChE to form a carbamylated serine (107). By comparison with the serine-phosphate ester, the carbamylated serine is rapidly hydrolyzed, thereby regenerating AcChE. For example, reactivation of the physostigmine-inactivated enzyme is rapid, with a t1/2 of less than 40 min (218). Rivastigmine, a more useful therapeutic agent, is considerably longer acting, with a half-life of more than 10 h (217, 219). Overall, for pseudoirreversible inhibitors of this type, the effectiveness and duration of the "irreversible" inhibition will be controlled by the chemical nature of the groups transferred to the active-site nucleophile, making it readily amenable to manipulation.
In pseudoirreversible inhibitors of the second class, the enzyme is regenerated by the inhibitor simply dissociating from the enzyme; that is, the binding is covalent but reversible (k-, * k,). This class can also be exemplified by an AcChE inhibitor. For example, the trifluoromethyl ketone (108) binds to AcChE as a slow-binding inhibitor (Section 2.4.1) with a Ki value of 0.06 nM, and a koff value of 6.7 X s-I (220). A linear correlation was observed between Ki values of a series of fluoromethyl ketones and the Vmax/Ki value for the corresponding substrate (220). This suggests (127) that the tetrahedral adduct (109), in effect, mimics the transition state (or a high-energy intermediate), thereby accounting for the high affinity (Section 2.5.3). The affinity of the inhibitor for AcChE could be decreased (with a concomitant increase in the value of k,,), by sequentially reducing the number of fluorine atoms in the methyl group adjacent to the ketone (220). Finally, it should be noted that the two classes of pseudoirreversible inhibitor can be differentiated by examining the decomposition products of the inhibition reaction. When hydrolysis is required for enzyme regeneration, cleavage products, such as substituted carbamates, will be in evidence. Conversely, the trifluoromethyl ketones will not be broken down by AcChE and no decomposition products will be observed.

4 CONCLUSIONS

Enzyme inhibitors have long played an important role in medicine, pharmacology, and basic research. The advances in DNA technology have enabled cloning and overexpression of large numbers of enzymes, and the approaches described in this chapter have already led to the development of novel therapeutic agents. However, in the postgenomics era, large numbers of new targets have been identified. Although the drug discovery process moves toward structure-based drug design as its prime tool, even with high-throughput crystallography, not all target proteins will be readily accessible. The evolution of algorithms that can predict enzyme function and mechanism will ensure that the rational design of enzyme inhibitors not only complements structure-based approaches but continues to play a stand-alone role in the discovery of novel therapeutics.

REFERENCES
1. M. Sandler and H. J. Smith, Eds., Design of Enzyme Inhibitors as Drugs, Oxford University Press, Oxford, 1989.
2. M. Sandler and H. J. Smith, Eds., Design of Enzyme Inhibitors as Drugs, Vol. 2, Oxford University Press, Oxford, 1994.
3. P. Krogsgaard-Larsen, T. Liljefors, and U. Madsen, Eds., A Textbook of Drug Design and Development, 2nd ed., Harwood Academic, Amsterdam, 1996.
4. H. J. Smith, Ed., Smith and Williams' Introduction to the Principles of Drug Design and Action, Harwood Academic, Amsterdam, 1998.
5. G. Gregoriadis, Trends Biotechnol., 13, 527-537 (1995).
6. I. A. Bakker-Woudenberg, G. Storm, and M. C. Woodle, J. Drug Target., 2, 363-371 (1994).
7. J. Kreuter, Adv. Drug Del. Rev., 47, 65-81 (2001).
8. F. C. Neuhaus and J. L. Lynch, Biochemistry, 3, 471-480 (1964).
9. R. R. Rando, Biochem. Pharmacol., 24, 1153-1160 (1975).
10. J. N. Delgado and W. A. Remers, Eds., Wilson and Gisvold's Textbook of Organic Medicinal and Pharmaceutical Chemistry, 10th ed., Lippincott-Raven, Philadelphia, 1998.
11. N. Rastogi and H. L. David, Res. Microbiol., 144, 133-143 (1993).
12. A. G. Gilman, T. W. Rall, A. S. Nies, and P. Taylor, Eds., Goodman and Gilman's the Pharmacological Basis of Therapeutics, 8th ed., Pergamon, New York, 1990, p. 985.
13. H. J. Schaeffer, L. Beauchamp, P. de Miranda, G. B. Elion, D. J. Bauer, and P. Collins, Nature, 272, 583-585 (1978).
14. P. A. Furman, M. H. St Clair, and T. Spector, J. Biol. Chem., 259, 9575-9579 (1984).
15. G. B. Elion, P. A. Furman, J. A. Fyfe, P. de Miranda, L. Beauchamp, and H. J. Schaeffer, Proc. Natl. Acad. Sci. USA, 74, 5716-5720 (1977).
16. J. P. Durkin and T. Viswanatha, J. Antibiot. (Tokyo), 31, 1162-1169 (1978).
17. S. J. Cartwright and A. F. Caulson, Nature, 278, 360-361 (1979).
18. R. Labia, V. Lelievre, and J. Peduzzi, Biochim. Biophys. Acta, 811, 351-357 (1980).
19. C. Reading and T. Farmer, Biochem. J., 199, 779-787 (1981).
20. R. L. Charnas and J. R. Knowles, Biochemistry, 20, 3214-3219 (1981).
21. A. R. English, J. A. Retsema, A. E. Girard, J. E. Lynch, and W. E. Barth, Antimicrob. Agents Chemother., 14, 414-419 (1978).
22. K. P. Fu and H. C. Neu, Antimicrob. Agents Chemother., 15, 171-176 (1979).
23. P. S. Mezes, A. J. Clarke, G. I. Dmitrienko, and T. Viswanatha, FEBS Lett., 143, 265-267 (1982).
24. D. G. Brenner and J. R. Knowles, Biochemistry, 23, 5833-5839 (1984).
25. W. L. Washtien, Mol. Pharmacol., 25, 171-177 (1984).
26. D. R. Seeger, J. Am. Chem. Soc., 71, 1753 (1949).
27. D. A. Matthews, R. A. Alden, J. T. Bolin, S. T. Freer, R. Hamlin, N. Xuong, J. Kraut, M. Poe, M. Williams, and K. Hoogsteen, Science, 197, 452-455 (1977).
28. B. Lippert, B. W. Metcalf, M. J. Jung, and P. Casara, Eur. J. Biochem., 74, 441-445 (1977).
29. M. Cziraky, Pharmacoeconomics, 14 (Suppl. 3), 29-38 (1998).
30. A. Endo, M. Kuroda, and K. Tanzawa, FEBS Lett., 72, 323-326 (1976).
31. V. W. Rodbell, Adv. Lipid Res., 14, 1 (1976).
32. A. W. Alberts, J. Chen, G. Kuron, V. Hunt, J. Huff, C. Hoffman, J. Rothrock, M. Lopez, H. Joshua, E. Harris, A. Patchett, R. Monaghan, S. Currie, E. Stapley, G. Albers-Schonberg, O. Hensens, J. Hirshfield, K. Hoogsteen, J. Liesch, and J. Springer, Proc. Natl. Acad. Sci. USA, 77, 3957-3961 (1980).
33. A. P. Lea and D. McTavish, Drugs, 53, 828-847 (1997).
34. H. S. Yee and N. T. Fong, Ann. Pharmacother., 32, 1030-1043 (1998).
35. S. Yu, K. Sugahara, K. Nakayama, S. Awata, and H. Kodama, Metabolism, 49, 1025-1029 (2000).
36. A. Hestnes, O. Borud, H. Lunde, and L. Gjessing, J. Ment. Defic. Res., 33, 261-265 (1989).
37. G. R. Stark and P. A. Bartlett, Pharmacol. Ther., 23, 45-78 (1983).
38. I. H. Segel, Enzyme Kinetics: Behavior and Analysis of Rapid Equilibrium and Steady-State Enzyme Systems, Wiley-Interscience, New York, 1975.
39. E. Shaw in P. D. Boyer, Ed., Chemical Modification by Active-Site Directed Reagents, Academic Press, New York, 1970, pp. 91-147.
40. R. L. Lundblad, Chemical Reagents for Protein Modification, 2nd ed., CRC Press, Boca Raton, FL, 1991.
41. H. F. Hixson, Jr. and A. H. Nishikawa, Arch. Biochem. Biophys., 154, 501-509 (1973).
42. K. I. Skorey, N. A. Johnson, G. Huyer, and M. J. Gresser, Protein Express. Purif., 15, 178-187 (1999).
43. F. A. Norris and P. W. Majerus, J. Biol. Chem., 269, 8716-8720 (1994).
44. M. Knockaert, N. Gray, E. Damiens, Y. T. Chang, P. Grellier, K. Grant, D. Fergusson, J. Mottram, M. Soete, J. F. Dubremetz, K. Le
Roch, C. Doerig, P. Schultz, and L. Meijer, Chem. Biol., 7, 411-422 (2000).
45. J. S. Fowler, R. R. MacGregor, A. P. Wolf, C. D. Arnett, S. L. Dewey, D. Schlyer, D. Christman, J. Logan, M. Smith, H. Sachs, et al., Science, 235, 481-485 (1987).
46. D. V. Santi and G. L. Kenyon in M. E. Wolff, Ed., Burger's Medicinal Chemistry: Approaches to the Rational Design of Enzyme Inhibitors, Wiley-Interscience, New York, 1980, pp. 349-391.
47. A. Muscate, C. L. Levinson, and G. L. Kenyon in M. Howe-Grant, Ed., Kirk-Othmer Encyclopedia of Chemical Technology, 4th ed., John Wiley & Sons, New York, 1994, pp. 644-671.
48. A. Patel, H. J. Smith, and J. Stürzebecher, in ref. 4, pp. 261-330.
49. P. Veerapandian, Ed., Structure-Based Drug Design, Marcel Dekker, New York, 1997.
50. J. E. Ladbury and P. R. Connelly, Eds., Structure-Based Drug Design: Thermodynamics, Modeling and Strategy, Springer-Verlag, Berlin, 1997.
51. E. M. Gordon and J. F. Kerwin, Jr., Eds., Combinatorial Chemistry and Molecular Diversity in Drug Discovery, Wiley-Liss, New York, 1998.
52. K. Gubernator and H. J. Bohm, Eds., Structure-Based Ligand Design, Wiley-VCH, Weinheim, Germany, 1998.
53. J. P. Dirlam, L. J. Czuba, B. W. Dominy, R. B. James, R. M. Pezzullo, J. E. Presslitz, and W. W. Windisch, J. Med. Chem., 22, 1118-1121 (1979).
54. J. C. Dearden and K. C. James in T. J. Perun and C. L. Propst, Eds., Computer-Aided Drug Design: Methods and Applications, Marcel Dekker, New York, 1989, pp. 168-207.
55. T. Hogberg and U. Norinder, in ref. 3, pp. 95-129.
56. Y. C. Martin, P. Willett, and S. R. Heller, Eds., Designing Bioactive Molecules: Three Dimensional Techniques and Applications, American Chemical Society, Washington, DC, 1998.
57. W. P. Jencks, Adv. Enzymol. Relat. Areas Mol. Biol., 43, 219-410 (1975).
58. W. P. Jencks, Proc. Natl. Acad. Sci. USA, 78, 4046-4050 (1981).
59. J. Kyte, Structure in Protein Chemistry, Garland Publishing, New York, 1995, pp. 147-196.
60. A. R. Fersht, Structure and Mechanism in Protein Science: A Guide to Enzyme Catalysis and Protein Folding, Freeman, New York, 1999.
61. D. A. Dougherty, Science, 271, 163-168 (1996).
62. N. S. Scrutton and A. R. Raine, Biochem. J., 319, 1-8 (1996).
63. S. K. Burley and G. A. Petsko, Adv. Protein Chem., 39, 125-189 (1988).
64. J. P. Gallivan and D. A. Dougherty, Proc. Natl. Acad. Sci. USA, 96, 9459-9464 (1999).
65. J. L. Sussman, M. Harel, F. Frolow, C. Oefner, A. Goldman, L. Toker, and I. Silman, Science, 253, 872-879 (1991).
66. A. Ordentlich, D. Barak, C. Kronman, N. Ariel, Y. Segall, B. Velan, and A. Shafferman, J. Biol. Chem., 270, 2082-2091 (1995).
67. A. Cornish-Bowden and C. W. Wharton, Enzyme Kinetics, IRL Press, Oxford, 1988.
68. W. W. Cleland in D. S. Sigman and P. D. Boyer, Eds., The Enzymes, 3rd ed., Vol. XIX, Academic Press, San Diego, 1990, pp. 99-158.
69. T. Palmer, Understanding Enzymes, Prentice Hall/Ellis Horwood, London, 1995.
70. D. L. Purich, Ed., Contemporary Enzyme Kinetics and Mechanism, 2nd ed., Academic Press, San Diego, 1996.
71. R. A. Copeland, Enzymes: A Practical Introduction to Structure, Mechanism and Data Analysis, 2nd ed., Wiley-VCH, New York, 2000.
72. R. Eisenthal and A. Cornish-Bowden, Biochem. J., 139, 715-720 (1974).
73. C. L. Tsou, Adv. Enzymol. Relat. Areas Mol. Biol., 61, 381-436 (1988).
74. M. Dixon, Biochem. J., 55, 170-171 (1953).
75. R. G. Pendleton and I. B. Snow, Mol. Pharmacol., 9, 718-725 (1973).
76. H. J. Fromm, in ref. 70, pp. 207-227.
77. J. F. Morrison, Trends Biochem. Sci., 7, 102-105 (1982).
78. J. F. Morrison and C. T. Walsh, Adv. Enzymol. Relat. Areas Mol. Biol., 61, 201-301 (1988).
79. S. E. Szedlacsek and R. G. Duggleby in D. L. Purich, Ed., Methods in Enzymology, Vol. 249, Academic Press, New York, 1995, pp. 144-180.
80. M. J. Sculley, J. F. Morrison, and W. W. Cleland, Biochim. Biophys. Acta, 1298, 78-86 (1996).
81. D. H. Rich, J. Med. Chem., 28, 263-273 (1985).
82. J. D. Cox, N. N. Kim, A. M. Traish, and D. W. Christianson, Nat. Struct. Biol., 6, 1043-1047 (1999).
83. D. M. Colleluori and D. E. Ash, Biochemistry, 40, 9356-9362 (2001).
84. S. H. Wilkes and J. M. Prescott, J. Biol. Chem., 260, 13154-13162 (1985).
85. A. Taylor, C. Z. Peltier, F. J. Torre, and N. Hakamian, Biochemistry, 32, 784-790 (1993).
86. H. Kim and W. N. Lipscomb, Biochemistry, 32, 8465-8478 (1993).
87. A. Betz, P. W. Wong, and U. Sinha, Biochemistry, 38, 14582-14591 (1999).
88. E. P. Garvey, J. A. Oplinger, E. S. Furfine, R. J. Kiff, F. Laszlo, B. J. Whittle, and R. G. Knowles, J. Biol. Chem., 272, 4959-4963 (1997).
89. E. S. Furfine, M. F. Harmon, J. E. Paith, and E. P. Garvey, Biochemistry, 32, 8512-8517 (1993).
90. B. F. Cooper and F. B. Rudolph, in ref. 70, pp. 183-205.
91. B. Oberg, Pharmacol. Ther., 40, 213-285 (1989).
92. L. R. Overby, E. E. Robishaw, J. B. Schleicher, A. Reuter, N. L. Shipkowitz, and J. C.-H. Mao, Antimicrob. Agents Chemother., 6, 360-365 (1974).
93. S. S. Leinbach, J. M. Reno, L. F. Lee, A. F. Isbell, and J. A. Boezi, Biochemistry, 15, 426-430 (1976).
94. B. Eriksson, A. Larsson, E. Helgstrand, N. G. Johansson, and B. Oberg, Biochim. Biophys. Acta, 607, 53-64 (1980).
95. F. Kappler and A. Hampton, J. Med. Chem., 33, 2545-2551 (1990).
96. R. Wolfenden, Annu. Rev. Biophys. Bioeng., 5, 271-306 (1976).
97. A. D. Broom, Fed. Proc., 45, 2779-2783 (1986).
98. A. D. Broom, J. Med. Chem., 32, 2-7 (1989).
99. G. E. Lienhard, Science, 180, 149-154 (1973).
100. A. Radzicka and R. Wolfenden, in ref. 70, pp. 229-257.
101. M. Mader and P. A. Bartlett, Chem. Rev., 97, 1281-1301 (1997).
102. J. Inglese, R. A. Blatchly, and S. J. Benkovic, J. Med. Chem., 32, 937-940 (1989).
103. C. Klein, P. Chen, J. H. Arevalo, E. A. Stura, A. Marolewski, M. S. Warren, S. J. Benkovic, and I. A. Wilson, J. Mol. Biol., 249, 153-175 (1995).
104. E. C. Bigham, W. R. Mallory, S. J. Hodson, D. S. Duch, R. Ferone, and G. K. Smith, Heterocycles, 35, 1289-1307 (1993).
105. K. D. Collins and G. R. Stark, J. Biol. Chem., 246, 6599-6605 (1971).
106. E. A. Swyryd, S. S. Seaver, and G. R. Stark, J. Biol. Chem., 249, 6945-6950 (1974).
107. R. F. Morton, E. T. Creagan, S. A. Cullinan, J. A. Mailliard, L. Ebbert, M. H. Veeder, and M. Chang, J. Clin. Oncol., 5, 1078-1082 (1987).
108. N. M. Laing, W. W. Chan, D. W. Hutchinson, and B. Oberg, FEBS Lett., 260, 206-208 (1990).
109. L. Jin, B. Stec, W. N. Lipscomb, and E. R. Kantrowitz, Proteins, 37, 729-742 (1999).
110. A. Corsini, F. M. Maggi, and A. L. Catapano, Pharmacol. Res., 31, 9-27 (1995).
111. K. M. Bischoff and V. W. Rodwell, Biochem. Med. Metab. Biol., 48, 149-158 (1992).
112. C. E. Nakamura and R. H. Abeles, Biochemistry, 24, 1364-1376 (1985).
113. R. H. Abeles, Drug. Dev. Res., 10, 221-234 (1987).
114. E. S. Istvan, M. Palnitkar, S. K. Buchanan, and J. Deisenhofer, EMBO J., 19, 819-830 (2000).
115. E. S. Istvan and J. Deisenhofer, Biochim. Biophys. Acta, 1529, 9-18 (2000).
116. E. S. Istvan and J. Deisenhofer, Science, 292, 1160-1164 (2001).
117. M. A. Ondetti and D. W. Cushman in R. L. Soffer, Ed., Biochemical Regulation of Blood Pressure, Wiley, New York, 1981, pp. 165-186.
118. L. D. Byers and R. Wolfenden, Biochemistry, 12, 2070-2078 (1973).
119. D. W. Cushman, H. S. Cheung, E. F. Sabo, and M. A. Ondetti, Biochemistry, 16, 5484-5491 (1977).
120. H. G. Bull, N. A. Thornberry, M. H. Cordes, A. A. Patchett, and E. H. Cordes, J. Biol. Chem., 260, 2952-2962 (1985).
121. R. B. Silverman, The Organic Chemistry of Drug Design and Drug Action, Academic Press Inc., San Diego, 1992, pp. 162-175.
122. P. Buenning, J. Cardiovascular. Res., 10 (Suppl. 7), S31-S35 (1987).
123. A. Radzicka and R. Wolfenden, Science, 267, 90-93 (1995).
124. R. Wolfenden, Transition States of Biochemical Processes, Plenum, New York, 1978.
125. V. L. Schramm, Annu. Rev. Biochem., 67, 693-720 (1998).
126. P. A. Bartlett, Y. Nakagawa, C. Johnson, S. Reich, and A. Luis, J. Org. Chem., 53, 3195-3210 (1988).
127. P. A. Bartlett and C. K. Marlowe, Biochemistry, 22, 4618-4624 (1983).
128. J. W. Williams, J. F. Morrison, and R. G. Duggleby, Biochemistry, 18, 2567-2573 (1979).
174. J. A. Katzenellenbogen, Ann. Rep. Med. Chem., 222-233 (1974).
175. G. Schoellmann and E. Shaw, Biochemistry, 2, 252-255 (1963).
176. E. B. Ong, E. Shaw, and G. Schoellmann, J. Biol. Chem., 240, 694-698 (1965).
177. C. Kettner and E. Shaw in L. Lorand, Ed., Methods in Enzymology, Vol. 80, Academic Press, New York, 1981, pp. 826-842.
178. M. Dryjanski, L. L. Kosley, and R. Pietruszko, Biochemistry, 37, 14151-14156 (1998).
179. K. von der Helm, B. D. Korant, and J. C. Cheronis, Eds., Proteases as Targets for Therapy, Springer-Verlag, Berlin, 2000.
180. J. A. Fee, G. D. Hegeman, and G. L. Kenyon, Biochemistry, 13, 2533-2538 (1974).
181. J. A. Landro, J. A. Gerlt, J. W. Kozarich, C. W. Koo, V. J. Shah, G. L. Kenyon, D. J. Neidhart, S. Fujita, and G. A. Petsko, Biochemistry, 33, 635-643 (1994).
182. J. R. Vane, Nat. New Biol., 231, 232-235 (1971).
183. J. B. Smith and A. L. Willis, Nat. New Biol., 231, 235-237 (1971).
184. G. J. Roth, N. Stanford, and P. W. Majerus, Proc. Natl. Acad. Sci. USA, 72, 3073-3076 (1975).
185. M. Hemler and W. E. Lands, J. Biol. Chem., 251, 5575-5579 (1976).
186. F. J. Van der Ouderaa, M. Buytenhek, D. H. Nugteren, and D. A. Van Dorp, Eur. J. Biochem., 109, 1-8 (1980).
187. G. J. Roth, E. T. Machuga, and J. Ozols, Biochemistry, 22, 4672-4675 (1983).
188. P. J. Loll, D. Picot, and R. M. Garavito, Nat. Struct. Biol., 2, 637-643 (1995).
189. D. Picot, P. J. Loll, and R. M. Garavito, Nature, 367, 243-249 (1994).
190. L. H. Rome and W. E. Lands, Proc. Natl. Acad. Sci. USA, 72, 4863-4865 (1975).
191. R. F. Colman in T. E. Creighton, Ed., Protein Function: A Practical Approach, 2nd ed., Oxford University Press, Oxford, 1997, pp. 155-183.
192. P. K. Pal, W. J. Wechter, and R. F. Colman, J. Biol. Chem., 250, 8140-8147 (1975).
193. J. L. Wyatt and R. F. Colman, Biochemistry, 16, 1333-1342 (1977).
194. R. H. Abeles, Pure Appl. Chem., 53, 149-160 (1980).
195. T. I. Kalman, Drug. Dev. Res., 1, 311-428 (1981).
196. C. T. Walsh, Tetrahedron, 38, 871-909 (1982).
197. C. T. Walsh, TIPS, 254-257 (1983).
198. R. H. Abeles, Chem. Eng. News, 61, 48-56 (1983).
199. T. M. Penning, TIPS, 212-217 (1983).
200. C. T. Walsh, Annu. Rev. Biochem., 53, 493-535 (1984).
201. M. G. Palfreyman, P. Bey, and A. Sjoerdsma, Essays Biochem., 23, 28-81 (1987).
202. M. J. Jung and C. Danzin, in ref. 1, pp. 257-293.
203. B. Frølund, in ref. 3, pp. 264-266.
204. R. Poulin, L. Lu, B. Ackermann, P. Bey, and A. E. Pegg, J. Biol. Chem., 267, 150-158 (1992).
205. N. V. Grishin, A. L. Osterman, H. B. Brooks, M. A. Phillips, and E. J. Goldsmith, Biochemistry, 38, 15174-15184 (1999).
206. T. Liang, M. A. Cascieri, A. H. Cheung, G. F. Reynolds, and G. H. Rasmusson, Endocrinology, 117, 571-579 (1985).
207. B. Faller, D. Farley, and H. Nick, Biochemistry, 32, 5705-5710 (1993).
208. H. G. Bull, M. Garcia-Calvo, S. Andersson, W. F. Baginski, H. K. Chan, D. E. Ellsworth, R. R. Miller, R. A. Stearns, R. K. Bakshi, G. H. Rasmusson, R. L. Tolman, R. W. Myers, J. W. Kozarich, and G. S. Harris, J. Am. Chem. Soc., 118, 2359-2365 (1996).
209. B. Azzolina, K. Ellsworth, S. Andersson, W. Geissler, H. G. Bull, and G. S. Harris, J. Steroid Biochem. Mol. Biol., 61, 55-64 (1997).
210. V. C. Njar and A. M. Brodie, Drugs, 58, 233-255 (1999).
211. P. E. Reynolds, F. Depardieu, S. Dutka-Malen, M. Arthur, and P. Courvalin, Mol. Microbiol., 13, 1065-1070 (1994).
212. D. E. Bussiere, S. D. Pratt, L. Katz, J. M. Severin, T. Holzman, and C. H. Park, Mol. Cell., 2, 75-84 (1998).
213. Z. Wu, G. D. Wright, and C. T. Walsh, Biochemistry, 34, 2455-2463 (1995).
214. Z. Wu and C. T. Walsh, Proc. Natl. Acad. Sci. USA, 92, 11603-11607 (1995).
215. R. Araoz, E. Anhalt, L. Rene, M. A. Badet-Denisot, P. Courvalin, and B. Badet, Biochemistry, 39, 15971-15979 (2000).
216. M. A. Ator and P. R. Ortiz de Montellano in D. S. Sigman and P. D. Boyer, Eds., The Enzymes, 3rd ed., Vol. XIX, Academic Press, San Diego, 1990, pp. 214-282.
217. J. Grutzendler and J. C. Morris, Drugs, 61, 41-52 (2001).
218. E. Perola, L. Cellai, D. Lamba, L. Filocamo, and M. Brufani, Biochim. Biophys. Acta, 1343, 41-50 (1997).
219. R. J. Polinsky, Clin. Ther., 20, 634-647 (1998).
220. K. N. Allen and R. H. Abeles, Biochemistry, 28, 8466-8473 (1989).
CHAPTER EIGHTEEN
Contents
1 General Introduction
1.1 Introduction
1.2 Definition of Chirality
1.3 Pharmacology
1.4 Protein Binding and Metabolism
2 Chromatographic Separations
2.1 Small-Scale HPLC Examples
2.2 Chromatographic Diastereoisomer Separation
2.3 Preparative HPLC/SMB
2.4 Conclusions
3 Classical Resolution
3.1 Separation of the Active Pharmaceutical Ingredient
3.2 Separation of Intermediates to Single Enantiomer Active Pharmaceutical Ingredient
3.3 Crystallization-Induced Asymmetric Transformation
4 Nonclassical Resolution
4.1 Preferential Crystallization
4.2 Enrichment of Enantiomeric Excess by Crystallization
4.3 Resolution by Direct Crystallization
5 Enzyme-Mediated Asymmetric Synthesis
5.1 Amide Bond Formation
5.2 Transesterification and Hydrolysis
5.3 Oxidation and Reduction
6 Asymmetric Synthesis
6.1 Chiral Pool
6.2 Chiral Auxiliary
6.3 Chiral Reagent
6.4 Chiral Catalyst
7 Conclusions
Burger's Medicinal Chemistry and Drug Discovery
Sixth Edition, Volume 1: Drug Discovery
Edited by Donald J. Abraham
ISBN 0-471-27090-3 © 2003 John Wiley & Sons, Inc.
Chirality and Biological Activity
Ifosfamide (3)
Figure 18.2.
NMR spectra, retention time in HPLC or thin layer chromatography (TLC), and can behave differently in chemical reactions with achiral reagents. The commercial glycopyrrolate product contains only the threo isomers (S,R)-(10) and (R,S)-(11).

1.3 Pharmacology
Biological systems are in the main constructed from homochiral molecules such as L-amino acids or D-sugars. Such systems give rise to a highly "chiral environment," and hence, it is not surprising that many drugs possessing asymmetric centers exhibit a high degree of stereoselectivity in their interactions with biological macromolecules. In the past 20 years or so, pharmacological and toxicological investigations have clearly demonstrated significant differences in the biological activity of some isomeric pairs. Pharmacokinetic investigations have also led to a better understanding of racemic drug action.
It is important to introduce two other terms that compare the pharmacological activity of a pair of enantiomers. The isomer imparting the desired activity is called the eutomer (in the case of Thalidomide this is the R-enantiomer), whereas the isomer which is inactive or causes unwanted side effects is called the distomer (this is the S-enantiomer for Thalidomide). Comparison of the potencies of the two isomers comes from the eudismic ratio and this can be used in vitro or in vivo.
With the advancement in analytical and preparative technologies, the researcher is now more able to separate and study individual enantiomers. Pharmacological assessment of the behavior of chiral compounds in early phase research is imperative for selection of the correct isomer for development.
When a racemate is administered, the overall pharmacological effect may have one of three general outcomes described below.

1. All activity resides in one of the isomers, the other antipode being inactive.
2. Both isomers have equal activity.
3. Both isomers have the same activity but differ in potencies.

We will briefly highlight some pertinent examples that help to elucidate the above general classes. The antihypertensive agent α-methyldopa is an example where all the desired antihypertensive activity is confined to a single isomer (the L-enantiomer). It is noteworthy that L-(α)-methyldopa is a prodrug, being metabolized to the isomer of the active metabolite, and it is this metabolite that has the required activity (5). L-Dopa is marketed as the single enantiomer; during early development it was noted that the D-isomer exhibited serious side effects such as granulocytopenia (which is defined as a reduced number of blood granulocytes) (6).
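The eudismic ratio introduced in Section 1.3 can be made concrete with a short calculation. The sketch below assumes that potency is expressed as an EC50 (a lower value meaning a more potent isomer), so the ratio is taken as EC50(distomer)/EC50(eutomer); its base-10 logarithm is often called the eudismic index. The function names and the example concentrations are hypothetical and serve only as an illustration.

```python
import math


def eudismic_ratio(ec50_eutomer, ec50_distomer):
    """Potency of the eutomer relative to the distomer.

    Assumes potency is the reciprocal of EC50, so the ratio is
    EC50(distomer) / EC50(eutomer). Both values must share one unit.
    """
    if ec50_eutomer <= 0 or ec50_distomer <= 0:
        raise ValueError("EC50 values must be positive")
    return ec50_distomer / ec50_eutomer


def eudismic_index(ec50_eutomer, ec50_distomer):
    """Base-10 logarithm of the eudismic ratio."""
    return math.log10(eudismic_ratio(ec50_eutomer, ec50_distomer))


if __name__ == "__main__":
    # Hypothetical EC50 values in nM, for illustration only.
    ratio = eudismic_ratio(ec50_eutomer=2.0, ec50_distomer=400.0)
    index = eudismic_index(ec50_eutomer=2.0, ec50_distomer=400.0)
    print(f"eudismic ratio = {ratio:.0f}, eudismic index = {index:.2f}")
```

A ratio of 1 corresponds to the second outcome listed above (both isomers equally active), while a very large ratio approaches the first outcome, in which essentially all activity resides in one isomer.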
2 CHROMATOGRAPHIC SEPARATIONS
from the small semi-preparative scale to large- tant class of drugs that are potent blockers of
potential manufacturing processes. calcium currents and have found use in the
treatment of cardiac arrhythmias, peripheral
2.1 Small-Scale HPLC Examples
vascular disorders, and hypertension (35). It
HPLC is now a widely available and user- has been shown that enantiomers of chiral
friendly method employed for qualitative and DHP have opposite pharmacological profiles
quantitative analysis and is also one of the (35). One of the antipodes is a calcium entry
most expedient methods for providing the mil- activator, while the other is a calcium entry
ligram quantities of stereochemically pure blocker. The analytical and semi-preparative
material required for initial testing. Often the separation using chiral HPLC for a number of
identification of a suitable CSP to effect sepa- DHPs of the structures (Fig. 18.7) has been
ration of a specific pair of enantiomers is seen described (36). Here a number of different
as being labor intensive and requiring consid- CSP were utilized and their ability to separate
erable exverimentation. However, the avail- the above DHPs determined.
ability o f commercial databases that compile
2.2 Chromatographic Diastereoisomer
literature on LC enantioseparations makes
Separation
this process significantly easier (30). The com-
panies that supply CSPs also provide detailed Another approach to the separation of enanti-
information about a specific columns' suitabil- omers by chromatography is to prepare a di-
ity towards the separation of certain types of astereoisomer of the enantiomer to be sepa-
compounds (31). This helps to avoid a "trial rated. As discussed in the introduction to this
and error" approach towards enantiosepara- chapter, diastereomers exist if there is more
tions using chromatography. The use of col- than one chiral center, but are not enanti-
umn switchers to test a number of CSPs can omers of one another. As such they do not
also be of enormous assistance in a rational have identical physical properties. In chroma-
screening program. tography, formation of derivatives such as
One example of separation by HPLC is esters, amides, etc., often leads to better sepa-
Clenbuterol, which is an orally active, sympa- ration of the components. In the case of a race-
thomimetic agent that has specificity for P2- mate, if a chiral reagent (i.e., acid or m i n e ) is
adrenoceptors. Owing to its bronchodilator employed, then a diastereomeric mixture re-
properties, it has found use in the treatment of sults on treatment with such a derivatizing
respiratory disorders in humans and animals agent. One such example is the derivatization
(32). The two enantiomers of Clenbuterol of Pirlindole, which is a racemic anti-depres-
have been separated using a chirobiotic col- sant drug. Here the use of amino acid deriva-
umn, which consists of a macrolide-type anti- tives as chiral derivatizing agents (CDA) was
biotic stationary phase, using a mobile phase shown to enable an effective and efficient sep-
with composition of 70% MeOH, 30% acetoni- aration (37). Preparation of the L-phenylala-
trile, 0.3% acetic acid, and 0.2% triethylamine nine methyl ester (21) enabled separation of
(33). The enantiomers eluted as follows: the Pirlindole enantiomers using a medium
R-(-)-Clenbuterol (15) with a retention time pressure liquid chromatography (MPLC) method. This is high-
of 8.35 minutes and S-(+)-Clenbuterol (16) lighted in Fig. 18.8; after removal of the CDA
with a retention time of 9.12 minutes. The sin- the enantiomers of Pirlindole were obtained in
gle enantiomers obtained through chromatog- high optical purity. This gave several grams of
raphy were of >95% optical purity. It has been each enantiomer, which permitted a study of
shown that (-)-Clenbuterol was 100-1000 the stereochemical influence at the pharmaco-
times more potent than (+)-Clenbuterol in logical level. The interaction of monoamine
β-adrenergic agonist bioassays (34). oxidase A (MAO-A) and B (MAO-B) with
A number of 1,4-dihydropyridines (17-20), Pirlindole racemate and single enantiomers
exhibiting axial chirality (chirality stemming using biochemical techniques (in vitro and ex
from the nonplanar arrangement of four vivo determination of rat brain MAO-A and
groups about an axis), have been separated by MAO-B activity) was studied. In vitro, the
small-scale HPLC methods. This is an impor- MAO-A IC50 of (±)-Pirlindole, R-(-)-Pirlin-
2 Chromatographic Separations
Figure 18.7. DHP 1 (17), DHP 2 (18), DHP 3 (19), and DHP 4 (20).
dole (22), and S-(+)-Pirlindole (23) were 0.24, some polyacrylamides (Chiraspher) (41),
0.43, and 0.18 µM, respectively. The differ- cross-linked diallyltartramide (42), and to a
ences between the three compounds were not lesser extent, cyclodextrin-based phases.
significant, with a ratio between the two enan- Clearly for the larger scale separations, the
tiomers R-(-)/S-(+) of 2.2 in vitro (38). availability of the CSP in larger quantities is a
prerequisite. It should also be noted that at
2.3 Preparative HPLC/SMB the preparative scale, it seems that up to 90%
In the initial discovery phase of drug research, of racemic compounds tested have been re-
time is the most important factor where a suc- solved with just four different polysaccharide-
cessful process must be rapidly identified, based phases (43).
have a short run time, and have general appli- The degree of separation of the two enanti-
cability. As the phase of the project changes to omers obviously plays an important part in
full development, the process needs to be es- the CSP selection. Another equally important
tablished and cost becomes a crucial factor. parameter is the loading capacity of the sta-
Thus, on scale up of an LC method to the pre- tionary phase. The higher the loading capac-
parative level (100 mg and above), a number of ity, the greater the amount of material that
additional important aspects become relevant. can be separated (44). For example the poly-
The selection of a suitable CSP from the pleth- saccharide-based CSPs have a saturation ca-
ora available depends on the following factors: pacity of 5-100 mg/g of CSP; this is clearly
CSP availability, loading capacity and selectiv- dependent on the type of racemate that is be-
ity, throughput, and mobile phase. ing resolved. On the other hand, protein-based
The most successful and broadly applied CSPs have lower saturation capacities, of the
chiral stationary phases comprise the cellu- order 0.1-0.2 mg/g of CSP.
lose- and amylose-based phases developed by For preparative chromatography, through-
Okamoto (Chiralcel and Chiralpak) (39), put can be defined as the amount of purified
brush-type phases developed by Pirkle (40), material obtained per unit of time and per unit
Chirality and Biological Activity
Figure 18.8. Separation of the diastereoisomers by HPLC, then cleavage to the enantiomers.
mass of stationary phase. Several factors af- mer was shown not to exhibit these antihista-
fect this including loading capacity, column ef- minic effects. An asymmetric synthesis (46),
ficiency, selectivity, column size, temperature, and resolution of an intermediate have deliv-
cycle time, flow rate, and the solubility of the ered the single enantiomer previously. How-
racemate. ever, for various reasons, the development of a
The mobile phase plays a crucial role in the preparative HPLC method seems to be the
separation process for at least three main rea- method of choice (47). The main reasons are
sons. The selectivity of the separation, reten- the rapid scale up and the improved economics
tion time, and solubility of the racemate are of this approach. Utilization of the amide (24)
directly affected by the mobile phase composi- (Fig. 18.9) gave rise to a highly efficient sepa-
tion. Other parameters such as viscosity, sol- ration using a Chiralpak AD column in a mix-
vent recovery, cost, and solvent handling ture of acetonitrile/iso-propanol 60:40. The ef-
properties also play a prominent role. This ficiency of the separation can be measured by
brief introduction is also applicable to the cri- the α value (2.76) or the USP resolution (8.54).
teria for CSP selection for SMB. The α value and USP resolution numbers are
An example of a drug separated by prepar- measurements of how efficient the separation
ative HPLC is cetirizine dihydrochloride, a ra- is; typically the higher the number, the better
cemic drug that is a second generation antihis- the separation. This enabled the production of
tamine H1 receptor antagonist. Studies on the 1.6 kg of both the (+) and (-) isomers of high
effect of racemic and R (25) and S-Cetirizine purity.
(26) on nasal resistance indicated that both Like all methods for separating chiral mol-
racemic and the R-enantiomer had similar ac- ecules, chromatographic separations do suffer
tivity. The racemate and R-enantiomer inhibit from drawbacks: large quantities of expensive
the histamine-induced increase in nasal stationary phases are needed and large vol-
resistance, thus indicating the antihistaminic umes of mobile phases are used, coupled with
properties of R-Cetirizine (45). The S-enantio- the resultant high dilution of separated prod-
Figure 18.9. Separation by HPLC, then conversion to the acid dihydrochloride.
ucts. A number of methods have been intro- The separation of racemic mixtures is well
duced in an attempt to improve on this tech- suited to SMB technology, because these
nology, such as recycling (44). Perhaps the counter current systems can generally only
biggest advancement in recent times has been perform two-component separations at a time
the introduction and application of SMB tech- (51). A detailed description of this technique is
nology in the field of chiral separations (48). given in an excellent article by Guest (52). The
This technique was pioneered in the late SMB system generally consists of several col-
1950s by Universal Oil Products in the United umns, typically 6-12, which are connected in
States as a useful method for separation of oil series. An arrangement of pumps and valves
derivatives and sugars (49). Initially SMB is set up to maximize the stationary phase
technology was applied to very large volumes utilization, allowing for better solvent effi-
of material. For example, xylene isomers are ciency and adsorbate concentration. This
separated in thousands of ton quantities an- leads to two streams coming off the system in
nually. The application of SMB to the separa- solution, one is termed the raffinate, which is
tion of racemic mixtures has led to downsizing enriched in the less adsorbed component, and
and modifications of this technology, but the the other termed extract, which is enriched in
main principles remain the same. The use of the more adsorbed component. The complex
counter-current contact in SMB maximizes set of conditions and parameters that are re-
the driving force for mass transfer and the quired to optimize SMB chromatography has
contact between the substrate and stationary led to the design and process optimization be-
phase. This provides a more efficient use of the ing done by computer simulations (53). A
adsorbent capacity than that of a simple batch number of examples will be discussed that
system (50). highlight this growing area of chiral separa-
Figure 18.11. Tramadol (28).
levels of enantiopurity are required, the effi-
ciency and cost effectiveness of SMB may not
be economical. However, if for example, a
lower enantiomeric excess can be coupled with
an enhancement by crystallization, then the
3 Classical Resolution
Scheme conditions: i) 4-Me-morpholine, MeOH/H2O; ii) (D)-(+)-DBTA.
The D-threo-methylphenidate, (D)-(+)-DBTA,
salt is readily converted into the hydrochlo-
ride salt. It is interesting to note that recently,
Celgene and Novartis received an FDA approv-
able letter for dexmethylphenidate
for use in ADHD. This consists of only the
D-threo enantiomer (29), in comparison with
the original product, which contained all four
isomers (29-32).
Chemists at Chiroscience took an alterna-
tive approach to the D-threo-methylphenidate
(29) single enantiomer (63). An efficient reso-
Figure 18.14. Resolution of (R,S)-Naproxen (33) with an achiral amine base and a chiral amine base. The (S)-Naproxen chiral amine salt (34) precipitates and gives (S)-Naproxen (36); the (R)-Naproxen achiral amine salt (35) remains in the mother liquors and is heated to give the (R,S)-Naproxen achiral amine salt (37), which is recycled to the resolution.
by filtration. The mother liquors are then free base (40) in solution. Conversion of the
heated and the achiral amine base catalyzes tartrate salt to (S)-bupivacaine hydrochloride
racemization of the unwanted R-enantiomer. (39) was obtained in 35-40% overall yield
The resulting racemic mixture of the acid based on racemate input. To increase the eco-
(R,S)-(37) can then be put back into the reso- nomics of the process, a racemization of the
lution loop. Using this process, the overall unwanted R-enantiomer was required. Treat-
yield of (S)-Naproxen is >95%, based on the ment of the liquors containing the enriched
input of racemic acid. To further highlight the (R)-bupivacaine, tartaric acid, propanol, and
efficiency of this process, the N-alkylglucam- propionic acid at reflux resulted in complete
ine resolving agent is recovered in >98% per racemization in 2 h. By pertinent processing,
cycle. the racemic free base thus obtained is isolated
Racemic bupivacaine hydrochloride (38, by crystallization and can be put back into the
Marcaine) is currently used as an epidural an- resolution cycle (68). Another fine example by
esthetic during labor and as a local anesthetic chemists from Eli Lilly involves a clever reso-
in minor operations. Clinical studies have lution-racemization-recycle (R-R-R) process
shown that levo-bupivacaine (41) is less car- in the synthesis of Duloxetine (69).
diotoxic in man, making it significantly safer As discussed in Section 2 of this chapter,
than the racemate (67). Separation of the en- Tramadol is a chiral drug substance that is
antiomers was readily achieved using 0.25 eq currently used as a high potency analgesic
of D-tartaric acid. This resulted in the isolation agent. The preparation of Tramadol is shown
of a 2:1 (S)-bupivacaine D-tartaric acid salt in Fig. 18.16, which results in the formation of
(39) in 98% de, leaving the (R)-bupivacaine all four possible stereoisomers from the Grig-
Figure 18.15. Resolution with 0.25 eq of (D)-(+)-tartaric acid gives the (S)-bupivacaine (D)-(+)-tartaric acid salt (39), followed by 1) NaOH and 2) HCl(g) in IPA.
nard reaction (70). The trans isomers (42, 43) Another drug that is sold as a racemate is
form over the cis isomers (44, 45) in a ratio of Etodolac (46), which is used as a non-steroidal
4:2; the currently marketed racemate con- anti-inflammatory agent (NSAID) that also
sists of only the trans isomers. It is possible to has analgesic properties; it has the ability to
take this crude reaction mixture and selec- retard the progression of skeletal changes in
tively isolate either the (+)-trans isomer (42), rheumatoid arthritis (72). It has been shown
by using di-p-toluoyl-D-tartaric acid [D-(+)- that the majority of therapeutic activity lies in
DTTA] resolving agent or the (-)-trans iso- the S-(+)-isomer (73). D-(-)-N-Methylglu-
mer (43) using L-(-)-DTTA. This highlights camine (meglumine) is obtained by ring open-
the high selectivity that can be achieved when ing of D-glucose with methylamine, and hence
using certain resolving agents. In the case of it is readily available and inexpensive. Scien-
Tramadol, the cis isomers (44, 45) do not form tists at Chiroscience have described the use of
crystalline salts with DTTA and therefore re- meglumine to separate the enantiomers of
main in solution. This results in a highly effi- Etodolac (74). It was shown that the meglu-
cient process, where the chiral acid not only mine salt possessed suitable properties to en-
separates the single enantiomers (42 or 43) able its use as a salt for pharmaceutical admin-
but also removes other impurities (i.e., cis iso- istration. Therefore, in the case of Etodolac,
mers 44 and 45) at the same time (71). meglumine can not only be used to separate
Figure 18.16.
Figure 18.18. Resolution of (47) with 2 (S)-CSA into precipitate and mother liquors.
the enantiomers, but it can also be used as the
pharmaceutical salt form of choice. this amino acid has found use as an interme-
In addition to the racemic drugs discussed diate compound of the HIV proteinase inhibi-
in this section, resolutions are also used in the tor L-735,525 (75). The racemic cyclic amino
isolation of key building blocks for the phar- acid (47) has been resolved with S-cam-
maceutical industry. An important class of phorsulfonic acid (CSA), which yields the S-
these intermediates are amino acids, many of isomer as the double CSA salt (48) as the pre-
which are available as the single isomer from cipitate (76). Retained in the mother liquors is
natural sources (see INTRODUCTION). The the R-isomer (49). This can neatly be racem-
use of unnatural amino acids and D configured ized to the S-isomer by mixing with S-CSA in a
ones is expected to have a greater influence suitable solvent. On seeding with pure (S,S)-
at the biological level. In the drive for molecu- diastereomeric salt, a further quantity of the
lar diversity and metabolic stability, a number desired (S,S) product (48) is obtained, leaving
of unnatural amino acids such as the non-pro- the R-isomer (49) once more in the liquors.
teinogenic piperazine carboxylic acid (47) The whole cycle can be repeated and has been
(Fig. 18.18) have been developed. Specifically, demonstrated with four complete cycles. To
complete the whole process, the resolving
agent is also readily recovered and recycled.
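The effect of repeating a resolution-racemization-recycle loop can be illustrated with a simple mass balance. The Python sketch below is only an idealized model: the function name, the per-cycle yield of 40%, and the assumption that the unwanted isomer is racemized and returned without loss are hypothetical and are not taken from the processes described here.

```python
def cumulative_yield(per_cycle_yield, cycles, recycle_recovery=1.0):
    """Fraction of the original racemate isolated as the desired isomer.

    per_cycle_yield  -- fraction of the material entering a cycle that is
                        isolated as the desired salt (hypothetical value)
    cycles           -- number of resolution/racemization cycles
    recycle_recovery -- fraction of the mother-liquor material that survives
                        racemization and is returned (1.0 = lossless, assumed)
    """
    if not 0.0 <= per_cycle_yield <= 1.0:
        raise ValueError("per_cycle_yield must lie between 0 and 1")
    if not 0.0 <= recycle_recovery <= 1.0:
        raise ValueError("recycle_recovery must lie between 0 and 1")
    pool = 1.0      # material available for resolution, as a fraction of input
    product = 0.0   # desired isomer isolated so far
    for _ in range(cycles):
        isolated = pool * per_cycle_yield
        product += isolated
        # The remainder is racemized and fed back into the next cycle.
        pool = (pool - isolated) * recycle_recovery
    return product


if __name__ == "__main__":
    for n in range(1, 5):
        print(n, round(cumulative_yield(0.40, n), 4))
```

With lossless recycling the result reduces to 1 - (1 - y)^n, so even a modest per-cycle yield approaches full conversion of the racemate after a few cycles; any loss on racemization lowers that ceiling.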
does offer several advantages from the point of The racemate aminoglutethimide (27) has
view of time and quality aspects, there are also been shown to be effective in the treatment of
a number of drawbacks. If, for example, a ra- hormone-dependent breast cancer (Fig.
cemization of the unwanted isomer cannot be 18.20). Further studies have shown that the
found, there would be a waste of 50% of mate- R-enantiomer is more potent than its antipode
rial. Therefore, it can often be advantageous as an aromatase inhibitor (82). The resolution
to conduct the separation at an earlier stage in of aminoglutethimide itself has been reported
the synthesis of the drug. This leads to better in the literature, using tartaric acid. This res-
atom efficiency compared with resolution of olution suffers from the formation of solid so-
the final product, resulting in a reduction of lutions (83), which require endless crystalliza-
the overall amount of waste and cost. tions to deliver the single enantiomer (84).
One such example is Verapamil, which is a Use of a suitable precursor (54) enabled sepa-
well-established treatment of cardiovascular ration of the intermediate (55), by treatment
ailments (77). S-(-)-Verapamil (51) has spe- with the alkaloid resolving agent (-)-cincho-
cific transmembrane calcium channel antago- nidine. This chiral acid was then cyclized to
nist activity, whereas its antipode (53) influ- nitroglutethimide, which on reduction, gave
ences a wider range of cell pump actions, the desired R-aminoglutethimide (56) (85). It
including those for sodium ions (78). Vera- is noteworthy that in the case of aminoglute-
pamil has been separated into its single enan- thimide, the amine functionality is an aniline
tiomers by resolution with expensive resolving moiety. Because of the low pKa associated with
agents, which required multiple recrystalliza- this amine (2.5-4.6), the number of acidic re-
tions to effect complete separation (79). Look- solving agents that can be employed is re-
ing into the synthetic sequence of Verapamil, duced, because they need to be of relatively
several intermediates seemed to be attractive high acidity to form a salt.
alternatives to Verapamil (80). The intermedi-
3.3 Crystallization-Induced Asymmetric
ate verapamilic acid (Fig. 18.19) was effi-
Transformation
ciently separated using α-methylbenzylamine
(α-MBA), which is an extremely cheap resolv- A number of amino acids have been separated
ing agent (81). Subsequent transformation of by resolution; in certain cases the yield of the
the easily obtained R- or S-verapamilic acid required diastereoisomer has been greater
(50 or 52), required a further three to four than 50% (86). p-Chlorophenylalanine is of
synthetic steps to yield the active pharmaceu- considerable pharmacological interest, be-
tical ingredient. cause of its ability to inhibit serotonin forma-
4 Nonclassical Resolution
Figure 18.20.
tion in laboratory animals (87). Both the R- ing agents. As with all screens, analysis of the
and S-enantiomers have also been used as data is often time consuming and laborious.
building blocks in the synthesis of other drugs. Bruggink et al. have shown that differential
An ingenious approach to R-p-chlorophe- scanning calorimetry (DSC) of the isolated
nylalanine methyl ester, which is based on a salts can help to quickly determine whether
one-pot resolution-racemization sequence, is the isolated salt will provide a thorough resolu-
highlighted in Fig. 18.21. Here, treatment of tion (91). However, with a methodical and pre-
racemic p-chlorophenylalanine methyl ester cise screening protocol, it is nearly always pos-
(57) with 0.5 eq of D-tartaric acid and 0.1 eq of sible to find a suitable resolving agent that
salicylaldehyde in methanol gave a 68% yield effects separation of the enantiomers (92).
of 98% enantiomeric purity of the 2:1 R-p-
chlorophenylalanine D-tartaric acid salt (58).
4 NONCLASSICAL RESOLUTION
The absolute yield is greater
than 50% because the S-enantiomer is being
4.1 Preferential Crystallization
racemized in situ. The 2:1 tartrate salt is crys-
talline and is therefore removed from the sys- A brief description of the type of "racemic"
tem by virtue of its insolubility. This drives compounds is necessary for the reader to bet-
the equilibrium further in favor of the 2:1 R-p- ter understand the principles behind the ap-
chlorophenylalanine D-tartrate salt (88). plication of crystallization methods to the sep-
While the common goal remains to be the aration of enantiomers. Three fundamental
rational design of resolving agents (89), it is types of crystalline racemates exist. In the
clear that we are still some way from this actually first, the crystalline racemate is a conglomer-
happening. An alternative "family" approach ate, which exists as a mechanical mixture of
to classical resolution has been demonstrated crystals of two pure enantiomers. The second,
by Vries et al. (90). A group of similar resolv- which is the most common, consists of the two
ing agents are mixed simultaneously with the enantiomers in equal proportions in a well-
racemate. This was done to shorten the time defined arrangement within the crystal lat-
required to complete the resolving agent tice; this is termed a racemic compound. The
screen. Note should be made that the families third possibility occurs with the formation of a
of resolving agents are very similar and that solid solution between the two enantiomers
the crystalline species obtained by this that coexist in an unordered manner in the
method contained more than one of the resolv- crystal. This kind of racemate is called a pseu-
Figure 18.21. (R)-p-Chlorophenylalanine 0.5 eq (D)-tartaric acid salt (58).
doracemate and is rather rare. Conglomerates ior of the two enantiomers (binary melting
have been estimated to be approximately 10% point phase diagram) or their solubility behav-
of all racemates (93). Diagrammatic represen- ior in the presence of a solvent (ternary solu-
tations of the first two types of racemate are bility phase diagram), separation of enanti-
shown in Fig. 18.22. omers can be reproduced. Phase diagrams for
By understanding the appropriate phase the three types of racemate are shown in Fig.
diagrams, which describe the melting behav- 18.23. For a full and detailed explanation of
this topic refer to the monograph of Jacques et
al. (57).
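As an illustration of how a binary melting point phase diagram can be used quantitatively, the liquidus of a conglomerate is often approximated by the simplified Schröder-van Laar equation, ln x = (ΔHf/R)(1/Tf - 1/T). This is a standard textbook relation and is not derived in this chapter. The Python sketch below evaluates it under ideal-solution assumptions; the function name, the melting point of 400 K, and the enthalpy of fusion of 30 kJ/mol are hypothetical values chosen only for the example.

```python
import math

R = 8.314  # gas constant, J/(mol K)


def liquidus_temperature(x, t_fus, dh_fus):
    """Liquidus temperature (K) from the simplified Schroeder-van Laar equation.

    x      -- mole fraction of the major enantiomer in the mixture (0 < x <= 1)
    t_fus  -- melting point of the pure enantiomer in K (hypothetical input)
    dh_fus -- enthalpy of fusion of the pure enantiomer in J/mol (hypothetical)
    """
    if not 0.0 < x <= 1.0:
        raise ValueError("x must lie in the interval (0, 1]")
    # ln x = (dh_fus / R) * (1/t_fus - 1/T), solved for T
    return 1.0 / (1.0 / t_fus - R * math.log(x) / dh_fus)


if __name__ == "__main__":
    # For a conglomerate the eutectic lies at the racemic composition, x = 0.5.
    print(round(liquidus_temperature(0.5, 400.0, 30000.0), 1))
```

In this idealized picture the racemic conglomerate always melts below the pure enantiomer, which is one of the signatures used to recognize conglomerate behavior from melting point data.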
Figure 18.22. Racemic mixture (conglomerate) and racemic compound.
4.2 Enrichment of Enantiomeric Excess by
Crystallization
The attainment of high levels of enantiopurity
is not always possible by enzymatic or diaste-
reomeric resolutions or by asymmetric syn-
theses alone. It is however frequently possible
to prepare a pure enantiomer from a partially
resolved sample by simple recrystallization.
For this process to proceed successfully it is
necessary that the initial enantiopurity of the
mixture is greater than that of the eutectic
point in the phase diagram. By utilization of
the phase diagram, the optimal quantity of sol-
vent required can be calculated. It is also pos-
sible to calculate the maximum expected yield.
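A rough estimate of that maximum yield follows from a mass balance on the enantiomeric excess, assuming an idealized crystallization in which the crystals are the pure enantiomer and the mother liquor ends at the eutectic composition. The Python sketch below uses that assumption; the function names are hypothetical, and the example inputs (88% ee in, eutectic at 30% ee) borrow the methylphenidate hydrochloride figures quoted later in this section, treating the 88% de as the starting ee.

```python
def can_enrich(ee_initial, ee_eutectic):
    """True if recrystallization can raise the ee (initial ee above the eutectic)."""
    return ee_initial > ee_eutectic


def max_yield_pure_enantiomer(ee_initial, ee_eutectic):
    """Idealized maximum mass fraction recovered as pure enantiomer.

    Mass balance on the excess enantiomer, with crystals at ee = 1 and the
    mother liquor left at the eutectic ee (both are simplifying assumptions):
        ee_initial = y * 1 + (1 - y) * ee_eutectic
    Both ee values are fractions between 0 and 1.
    """
    if not (0.0 <= ee_initial <= 1.0 and 0.0 <= ee_eutectic < 1.0):
        raise ValueError("ee values must be fractions, with ee_eutectic < 1")
    if not can_enrich(ee_initial, ee_eutectic):
        return 0.0
    return (ee_initial - ee_eutectic) / (1.0 - ee_eutectic)


if __name__ == "__main__":
    print(round(max_yield_pure_enantiomer(0.88, 0.30), 3))
```

The same expression shows why a high eutectic is so restrictive: with a eutectic at 98% ee, material at 90% ee gives no recoverable pure enantiomer at all in this model.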
Figure 18.23. Phase diagrams for the conglomerate, racemic compound, and pseudoracemate.
Figure 18.24.
to deliver enantiopure product. Another ex-
ample of this type of compound is Warfarin
Note should also be made that in some cases (13). Chemists at DuPont (97) developed an
recrystallization reduces the enantiomeric ex- asymmetric hydrogenation approach, which
cess, which can lead to crystallization of the gave Warfarin in ~80% ee. Simple crystalliza-
racemate (94). In these cases the mother li- tion in an appropriate solvent yielded optically
quors contain moderately to highly enriched pure Warfarin, thus indicating that the eutec-
material. It is therefore important to plan the tic point is below 80% ee. (See earlier section
strategy at which point the enantiomer is re- on the metabolism and binding properties of
crystallized to optical purity. This may be the Warfarin enantiomers).
from an enzymic resolution, or in the event The phase diagrams below highlight two
that an asymmetric synthesis has failed to de- typical cases, the first where the eutectic point
liver enantiopure product. As discussed in Sec- E is close to the racemate, and the second
tion 3, the liquors from the diastereomeric res- where the eutectic approaches the single en-
olution with DTTA of 88% de can be cleaved to antiomer as shown in Fig. 18.24. In the first
the free base, and crystallization of the hydro- case, it would be preferable to crystallize the
chloride salt gives >98% ee. This is because of enriched enantiomer to optical purity, e.g.,
the fact that methylphenidate hydrochloride methylphenidate. However, in the second
has a eutectic point of 30% ee. Davies et al. (95) case, a very stable racemic compound exists,
and Winkler et al. (96) have prepared single giving rise to a high eutectic point. Here crys-
enantiomer methylphenidate (29). Their ap- tallization of enriched enantiomer mixture
proaches use an enantioselective synthesis; will only be successful at high ee. For example,
the enantiomeric excesses are 86% and 69%, verapamil hydrochloride requires that the ee
respectively, thus requiring recrystallization be greater than 98% for crystallization to yield
enantiopure product. Below this, the enantio- of acetylcholine. An increase in the level of
purity is reduced. In this case, it is advanta- acetylcholine in patients with AD has been
geous to recrystallize the diastereomeric salt shown to improve their cognitive perfor-
precursor to optical purity before proceeding mance. Galanthamine has been extracted
to final product. from botanical sources; however, several tons
of daffodil bulbs are needed to produce 1 kg of
4.3 Resolution by Direct Crystallization
product. A synthetic route has been developed
It is important to show how conglomerates are that uses a crystallization-induced chiral
identified. We have already seen that they transformation (Fig. 18.25). This crystalliza-
have specific phase diagrams as shown in Fig. tion was first reported by Barton and Kirby
18.23. Other such data that support identifi- (100) and further developed by Shieh and
cation of a conglomerate are IR, X-ray data, Carlson (101). The success of this transforma-
and observation of a spontaneous resolution tion is based on two phenomena: narwedine
or resolution by entrainment. Note should be (59), which crystallizes as a conglomerate, and
made that in 1848, Louis Pasteur separated (-)-narwedine (60), which equilibrates with
the dextrorotatory and levorotatory crystals of (+)-narwedine through a retro-Michael inter-
sodium ammonium tartrate. This manual mediate. This process has now been developed
sorting of crystals is also known as triage, and so that (-)-narwedine (60) is routinely ob-
by its very nature is time consuming and labo- tained in 80% yield from the racemate input,
rious. The readers are again directed towards as shown in Fig. 18.25 (102).
the Jacques et al. monograph, which lists over Recently a number of potent 5-HT3 recep-
250 known examples of conglomerates (57). tor antagonists such as Ondansetron have
There are two possibilities for separation of been reported to be clinically effective for the
enantiomers by direct crystallization. The blockade of chemotherapy-induced nausea
first uses spontaneous resolution, which oc- and emesis (103). The structurally novel com-
curs when a conglomerate crystallizes. This pound (62) has also been shown to be a highly
crystallization may be followed by the me- potent 5-HT3 antagonist (104); specifically,
chanical separation of the crystals of the two the R-(-)-(62) enantiomer was shown to be
enantiomers. Various techniques have been the most active. Comparison of the physical
developed that aid this separation. data of the racemate and single enantiomer
The second type of resolution by direct indicated that this structure (62) exists as a
crystallization is known as entrainment. conglomerate (104). By careful experimenta-
Here, the differences in the rate of crystalliza- tion, the best concentration, temperature, and
tion of the enantiomers in a supersaturated time for crystallization were discovered. Table
solution give rise to a separation. Strict con- 18.1 highlights the results obtained for the en-
trol of the conditions for the crystallization is trainment.
required, with the system of crystals and solu- The initial concentration of the solution
tion not being allowed to come to equilibrium was 10.0 g of (±)-(62) in 50 g of acetone. In all
and time playing an important role. The oc- runs, 10 mg of seed crystals were used. From
currence of conglomerates has been estimated the 10 runs highlighted in Table 18.1, 21.0 g of
to be approximately 10% of all racemic com- R-(-)-(62) of >92.0% ee and 21.4 g of (S)-(+)-
pounds. We will now illustrate this phenome- (62) of >90% ee are obtained from an input of
non with some pertinent examples. 50.4 g of racemate. The table also nicely illus-
An example is the use of the conglomerate Nar- trates the continuous nature of the process,
wedine (59) in the synthesis of the natural prod- which coupled with the fact that no resolving
uct Galanthamine (61), which is an Amarylli- agent, chiral auxiliary, enzyme, or catalyst is
daceae alkaloid and has been used clinically needed, underlines the economic advantages
for 30 years for neurological illnesses (98). of this type of process.
More recently it has been approved for the use The importance of amino acids as building
in the treatment of Alzheimer's disease (AD) blocks for asymmetric synthesis is well docu-
(99). Galanthamine acts to inhibit acetylcho- mented (105). A number of amino acids have
linesterase (AChE), thus increasing the levels been shown to exist as conglomerates. Shi-
Figure 18.25. Entrainment.
raiwa et al. have described the preferential is successfully resolved using preferential
crystallization of racemic methionine hydro- crystallization. The glycidic acid-substituted
chloride (106). The obtained D- or L-methio- phenylesters were prepared; of the 30 synthe-
nine hydrochloride was, however, only ~75% sized, only one exhibited conglomerate prop-
optically pure, requiring a further recrystalli- erties (109). This was the 3-(4-methoxyphe-
zation to furnish enantiopure product. Shi- nyl)glycidic acid 4-chloro-3-methylphenyl ester
raiwa et al. have also recently disclosed the (63). Table 18.2 summarizes the physical data
resolution of (2RS, 3SR)-2-amino-3-chlorobu- collected, which is illustrative of the conglom-
tanoic acid HCl again using entrainment erate nature of this compound.
(107). Here it was shown to be necessary to The obtained single enantiomer (-)-epox-
conduct the crystallization in an ethanol/5 M ide (64) is then converted into the required
hydrochloric acid solvent mixture for optimal (+)-isomer of Diltiazem (65) in several steps,
results. By careful control of the conditions, as highlighted in Fig. 18.27.
high levels of enantiomeric excess were ob- Taxol is a natural product isolated in very
tained in the crystalline salt. low yield from Taxus brevifolia and is used in
Chemists in Japan have developed an excel- the treatment of cancer (110). The extreme
lent approach to (+)-Diltiazem, which is a chemical complexity of Taxol makes produc-
coronary vasodilator (108). An intermediate tion by total synthesis uneconomical. How-
ever, a semisynthetic approach using the nat-
urally derived 10-deacetylbaccatin III (66)
condensation with N-benzoyl-(2R,3S)-3-phe-
nylisoserine (67) does provide an alternative
and economic approach (111). N-benzoyl-(2R,
3S)-3-phenylisoserine (67) is also commonly
known as the Taxol side-chain and has been
prepared in optically active form using chiral
auxiliaries or resolving agents (112). It has
been shown that the Taxol side-chain is a con-
Figure 18.26. glomerate and can therefore be cheaply and
Reprinted from H. Harada, T. Morie, Y. Hirokawa, and S. Kato, Tetrahedron: Asymmetry, vol. 8, 1997, pp. 2367-2374.
Reproduced with permission from Elsevier Science.
efficiently entrained to the single required en- do enable the use of higher temperatures,
antiomer (113). pressures, and organic solvents.
Enzymes can be utilized to effect a number
5 ENZYME-MEDIATED ASYMMETRIC of transformations; the broad spectrum of re-
SYNTHESIS actions, including amide bond formation, hy-
drolysis, esterification, reduction, oxidation,
Enzymes have found frequent use in the syn- and carbon-carbon bond formation, has been
thesis of single isomer drugs from racemic or reviewed elsewhere (114).
prochiral compounds at the larger manufac-
turing scales. The use of enzymes to effect 5.1 Amide Bond Formation
chiral transformations in the medicinal chem-
istry laboratory has been far less frequent; The use of enzymes to stereospecifically form
however, the increasing availability of immo- amide bonds has been described in many texts
bilized and stabilized forms of enzymes has (115); however, the commercial availability of
made their use easier and the resultant trans- cross-linked enzyme crystals (CLECs),for ex-
formations more predictable. ample, PeptiCLEC-TR, which is an immobi-
By virtue of their complex macromolecular lized form of Thermolysin protease, has been
structure, including a highly defined active used in the synthesis of D2163 (68), a novel
site, enzymatic transformations generally matrix metalloproteinase inhibitor (116). In
proceed with a high degree of chemical selec- vitro enzyme screening identified the all-nat-
tivity and stereospecificity. Reactions are typ- ural SSS-isomer as the active product. The
ically conducted under mild conditions of tem- elegant CLEC (117) technology used in this
perature, pressure, and pH, thus minimizing example makes the enzyme stable to typical
losses caused by unwanted side reactions or organic reaction conditions and enables facile
partial racemization. The use of extremo- removal of the enzyme at the end of the reac-
philes or cross-linked enzymes such as CLECs tion by simple filtration. On this basis, it is
Figure 18.29.
synthesis of the desired S-enantiomer has been achieved by the selective acylation of the R-enantiomer of the key intermediate (73) as shown in Fig. 18.30.
Thioridazine (72)
Bupivacaine (38)
Figure 18.30.
5.3 Oxidation and Reduction
In addition to the widely reported techniques of amide bond formation, transesterification, and hydrolysis, enzymic enantioselective oxidation is also used in the synthesis of single isomer drugs. Patel described the efficient oxidation of benzopyran (75), an intermediate in the synthesis of potassium channel openers (123). The transformation was effected with a cell suspension of Mortierella ramanniana with glucose over a 48-h period; the isolated product (77) was obtained in a 76% yield with an optical purity of 97% and a chemical purity of 98%, as shown in Fig. 18.32.
Reduction with a variety of enzymes has been reported (114), including baker's yeast for the reduction of α-methyleneketones to the corresponding α-methylalcohol (124), a functionality that is present in a number of drugs. The reduction of an azidoketone (78) using Pichia angusta enzyme has been used in the synthesis of S-salmeterol (79) (125). Salmeterol (Serevent) is a potent, long-acting β2-adrenoreceptor agonist used as a bronchodilator in the treatment of asthma. Recently, Sepracor claimed that the S-enantiomer had a higher selectivity for β2 receptors and that it did not cause certain adverse effects associated with the administration of (±)- or (R)-salmeterol (126). The synthesis of (S)-salmeterol (79) is shown in Fig. 18.33.
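The optical purity quoted above for product (77) can be translated into an enantiomer ratio. The following is a minimal Python sketch, assuming that optical purity equals enantiomeric excess (ee), defined as the difference between the percentages of the major and minor enantiomers; the function name is illustrative and does not come from the source.

```python
def enantiomer_composition(ee_percent):
    """Return (major %, minor %) of the two enantiomers for a given ee.

    Assumes ee = major - minor and major + minor = 100, i.e. percentages
    of the enantiomer pair only, ignoring other impurities.
    """
    if not 0 <= ee_percent <= 100:
        raise ValueError("ee must lie between 0 and 100 percent")
    major = (100 + ee_percent) / 2
    minor = (100 - ee_percent) / 2
    return major, minor


# Product (77) was reported with an optical purity of 97%.
major, minor = enantiomer_composition(97)
print(f"major enantiomer: {major:.1f}%, minor enantiomer: {minor:.1f}%")
```

On this assumption, an optical purity of 97% corresponds to a 98.5:1.5 mixture of the two enantiomers.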
6 Asymmetric Synthesis
Figure 18.31.
Figure 18.33. [Reagents shown: Br(CH2)6O(CH2)4Ph, DMF; AcOH, water]
discussed for chiral auxiliaries below. Two acylations then complete the synthesis, with the final chiral center clearly derived from L-valine.
The stereospecificity of binding at the histamine H3-receptor was investigated by preparing a series of ligands from D- or L-histidine (88) (134). It was found that compounds such as (S)-(89) had greater affinity for the receptor than their R-enantiomers. In addition, replacing the aromatic moiety with a cyclohexyl group (e.g., 90) switched the activity to agonism for compounds with an amino group in the chain.
Hydroxy acids are important chiral starting materials in the synthesis of many biologically active compounds (135). (S)-3-Hydroxy-γ-butyrolactone (91) is a very useful synthetic unit available from D-pyranoses (136). Workers at Schering-Plough used this as the key starting material in a concise synthesis of Sch 57939 (92), a β-lactam-based cholesterol absorption inhibitor (137). The condensation between the dianion of (S)-3-hydroxy-γ-butyrolactone and an appropriate diaryl imine proceeded with very high diastereo- and enantioselectivity, generating azetidinone (93) with a trans:cis ratio of >95:5.
Researchers at Abbott have been investigating the use of pyrrolidinyl isoxazoles as nicotinic cholinergic channel activators (138). Until recently, ABT-418 (97) was undergoing clinical trials as a potential treatment for cognitive impairment and decline and for Alzheimer's disease. A short synthesis of ABT-418 was devised starting from commercially avail-
Captopril (82)
Figure 18.36.
Figure 18.35. [Reagents shown: (i) NaBH(TFA)3; (i) NaBH4, TFA]
Figure 18.37. [Scheme: (S)-3-Hydroxy-γ-butyrolactone (91); 1. LDA (2 eq.)/DMF/DMPU; 2. diaryl imine; Sch 57939 (93) (99.5% ee)]
Figure 18.38. [ABT-418 (97) (ee >99%)]
Figure 18.39. [Scheme: (100); NaHMDS, THF, -78 to -50°C; BrCH2CO2Bu; LiOOH, THF/H2O]
iary is then removed (and preferably recovered), providing the product in high enantiomeric excess. This process is most attractive when both isomers of the auxiliary are readily available in enantiomerically pure form, and where the reaction leads to high levels of stereoselectivity in a predictable manner. Attaching and removing the auxiliary should be straightforward and proceed without loss of stereochemical integrity.
Many auxiliaries currently in use are derived from 1,2-aminoalcohols (140). These are readily available from natural sources with little or no synthetic manipulation and can react in a variety of ways to form conformationally well-defined (usually cyclic) auxiliary systems. The use of oxazolidinones in asymmetric synthesis was developed by Evans et al., and these oxazolidinones have been used extensively in a variety of different reactions (140, 141). The use of this chiral auxiliary in the preparation of pharmaceuticals is widespread, and there are several large-scale processes using such chemistry (142).
Abbott reported an improved synthesis of ABT-627 (98) involving an asymmetric alkylation of the valine-derived acyl oxazolidinone (99) (143). ABT-627 (Atrasentan) is a selective endothelin ETA receptor antagonist under development for the treatment of cancer, particularly prostate cancer. Acid (100) was activated as a mixed anhydride and treated with the lithium anion of the oxazolidinone to give (101). Both of the following deprotonation and alkylation steps must be controlled to give high levels of stereoselectivity. The (Z)-enolate (102) is favored, both kinetically and thermodynamically, by the bulky iso-propyl group
and is held rigid by chelation to the carbonyl oxygen of the oxazolidinone. The major stereoisomer then results from alkylation of this chelated enolate anion from the least hindered "upper" face to yield (103) as the major product. There are many strategies for removal and recovery of an oxazolidinone auxiliary (141). In this case, hydrolysis with lithium peroxide provides the acid that is transformed into Atrasentan through a cyclization-ring contraction strategy controlled by the chirality present in (103).
Tipranavir (PNU-140690) is a potent third-generation HIV protease inhibitor in clinical development by Boehringer Ingelheim (under license from Pharmacia). The biological activity of such 5,6-dihydro-4-hydroxy-2-pyrone sulfonamides shows considerable stereochemical variation (Table 18.3) (144). The R-configuration is preferred at both chiral centers (3α and 6), and Tipranavir is more than 50 times as potent as its enantiomer in a cell culture assay using HIV-1IIIB-infected H9 cells. An asymmetric synthesis (145) begins with the Michael addition of an aryl cuprate (derived from commercially available Grignard reagent 105) to the unsaturated oxazolidinone imide (104), yielding the adduct as a single diastereomer (106). The nitrogen protecting group was changed and an acetyl group introduced to give ketone (107), which undergoes a stereoselective aldol reaction with an acetylenic ketone (108). The highest diastereoselectivity was obtained for this reaction using Ti(OnBu)Cl3 as the Lewis acid. Both of the critical asymmetric steps to form new chiral centers are controlled by the (R)-phenyl oxazolidinone. The chiral auxiliary is removed when (109) is treated with base to form the lactone ring. This is followed by two further steps that generate PNU-140690 (110) as a single enantiomer.
The enantioselective synthesis of dopaminergic benzyltetrahydroisoquinolines and their binding to D1 and D2 dopamine receptors was investigated by Cabedo et al. (146). The synthetic route, illustrated by the preparation of the (1S)-isomer, involves stereoselective reduction of the isoquinolinium salt (114) with (R)-phenylglycinol (introduced in protected form as 112) as the chiral auxiliary. The (1R)-enantiomer of (115), prepared in an analogous fashion using (S)-phenylglycinol, binds to dopamine receptors with considerably less affinity (>100 μM versus D1 and 61.2 μM versus D2). In contrast, stereochemical differentiation was not observed at the dopamine uptake site for these compounds.
Two different chiral auxiliary approaches have been applied to the synthesis of NPS 1407 and its enantiomer (119) (147). NPS 1407 is an antagonist of the glutamate NMDA receptor that has in vivo activity in neuroprotection and anti-convulsant assays. The R-enantiomer was synthesized in four steps from (116) with the chiral center introduced by a completely stereoselective alkylation of hydrazone (117). The chiral auxiliary, S-(-)-1-amino-2-(methoxymethyl)pyrrolidine (SAMP), was introduced by condensation with aldehyde (116) and removed by catalytic hydrogenolysis. In the second method, the S-enantiomer was formed in a four-step sequence with the chiral center installed by the Michael addition of chiral amine (121) (formed in one step from the readily available α-methylbenzylamine) to benzyl crotonate (120). NPS 1407 (123) was found to be 12 times more potent than its enantiomer (119) at the NMDA receptor in an in vitro assay.
An example of the use of a terpene as a chiral auxiliary is provided by the synthesis of the anti-viral reverse transcriptase inhibitor Lamivudine (148). The nucleoside analog is marketed by Biochem Pharma (now Shire Pharmaceuticals) and Glaxo Wellcome (now GlaxoSmithKline) for the treatment of HIV and hepatitis B virus infection. In the
Figure 18.40. [Reagents shown: CuBr/DMS/THF, 0°C/1 hour; Grignard reagent (105); Ti(OnBu)Cl3/CH2Cl2]
production route, the glycolate derived from (-)-menthol (124) is coupled with thioacetyl dimer (125). The chiral auxiliary directs reaction to install the desired (2R)-stereochemistry in (126). In situ formation of chloro compound (127) is followed by a stereoselective coupling reaction with trimethylsilyl cytosine, again directed by the (-)-menthyl carboxylate. Reductive removal of the auxiliary yields Lamivudine (129) as a single isomer that was found to have favorable toxicological and pharmacokinetic properties relative to the racemate.
6.3 Chiral Reagent
In this approach, asymmetry is induced in a prochiral molecule or functional group by reaction with a stoichiometric amount of an enantio-enriched reagent system. The reaction proceeds through diastereomeric transition states, resulting in the preferential formation of one enantiomer or diastereomer. Current reagents can lack generality and may be difficult to prepare in both chiral forms. At least one equivalent of the chiral component is required, which can present economical and practical difficulties. Many examples are provided by the reduction of double bonds, especially ketones. Ketone (130) was reduced enantioselectively using either (+)- or (-)-B-chlorodiisopinocampheylborane (149). Reduction with (-)-B-chlorodiisopinocampheylborane generated the alcohol (S)-(131), which was transformed into the (1R,3S)-isochroman compound, (1R,3S)-(132), through a ste-
Figure 18.41. [Scheme: (114) (78% de); (1S)-(115) (16.6 μM vs D1, 14.7 μM vs D2)]
Figure 18.42. [Reagents shown: MeLi, THF, -78°C, 89%; 1. H2, PtO2·H2O; 2. HCl, 23%]
potassium channel blocker that was developed for the treatment of cardiac arrhythmia by Merck (155).
Asymmetric hydrogenations with transition metal catalysts have been applied to single enantiomer synthesis in the pharmaceutical industry with considerable success. ChiroTech and Pfizer developed an improved synthesis of glutarate derivative (139), an intermediate required for the synthesis of Candoxatril (140) (156). The drug, a neutral endopeptidase inhibitor, was in clinical development for the treatment of hypertension and congestive heart failure, and its enantiomer does not possess the same biological activity. Several catalysts and conditions were screened before arriving at optimized conditions using cationic rhodium-(R,R)-MeDuPHOS (141), which provided the product with complete enantioselectivity and avoided previously observed problems associated with isomerization of the enone starting material. The reaction could be conducted at a
Lamivudine (129)
Figure 18.43.
high substrate-to-catalyst ratio of 3500:1 without a detrimental effect on enantiomeric excess or reaction rate. In catalytic asymmetric reactions, it is clearly economically advantageous to minimize the amount of catalyst that may comprise expensive chiral material and transition metals.
A method for the asymmetric dihydroxylation of alkenes to yield cis-diols was developed by the research group of Sharpless using chiral ligands derived from the cinchona alkaloids dihydroquinidine (DHQD) and dihydroquinine (DHQ) with a catalytic amount of osmium tetroxide (157). Although they are diastereomers, the phthalazine ligands act as "pseudo-enantiomeric" ligands, i.e., they give opposite asymmetric induction in a predictable manner. This procedure was recently used to prepare both isomers of combretadioxolane (144), a chiral analog of the natural product Combretastatin A-4 (146) (158). Combretastatin A-4 displays antitubulin activity and cytotoxicity to tumor cells and is therefore an interesting lead structure for new anticancer drugs. The asymmetric synthesis of (S,S)-combretadioxolane (144) involved treatment of the trans-stilbene (142) with "AD-mix-α" [containing (DHQ)2-PHAL] (145), whereas the enantiomer (R,R)-combretadioxolane resulted from use of AD-mix-β, which contains (DHQD)2-PHAL as the chiral ligand. The tubulin polymerization-inhibitory activity of (S,S)-combretadioxolane was comparable with combretastatin A-4 (IC50 = 4-6 μM) in an in vitro assay, whereas (R,R)-combretadioxolane was essentially inactive (IC50 > 50 μM). In addition, (S,S)-combretadioxolane was 20 times more potent than vincristine as an in vitro growth inhibitor of the multidrug-resistant cell line PC-12.
Workers at SmithKline Beecham reported the stereoselective synthesis of inhibitors of
Figure 18.44. [Reagents shown: (136) (10 mol%); MeOH]
cess that has been applied to the synthesis of biologically active compounds (162). As with any such resolution process, the maximum yield of enantiopure material is 50% based on starting material. Terminal epoxides are easy to prepare in racemic form, and conversely, difficult to prepare as single enantiomers by epoxidation of the corresponding alkene. (R)-9-[2-(phosphonomethoxy)propyl]adenine (R-PMPA) is a nucleotide reverse transcriptase
Figure 18.47. [Scheme: (138); [((R,R)-Me-DuPHOS)Rh(COD)]BF4, H2 (5 atm)/MeOH (COD = 1,5-cyclooctadiene); (R,R)-Me-DuPHOS (141); Candoxatril (140)]
Figure 18.48. [Structures: (S,S)-(143) (≥99% ee; 89% yield); Combretastatin A-4 (146)]
inhibitor being developed by Gilead Sciences and a collaborative group from the University of Washington for the treatment and prevention of HIV infection (163). The compound can be prepared through kinetic resolution of propylene oxide using (S,S)-(149), and the resultant (R)-1-amino-2-propanol (153) was transformed into (R)-PMPA (154) in five steps (162).
In 1997, Tokunaga et al. reported the hydrolytic kinetic resolution of racemic terminal epoxides using a Co(III)-Salen catalyst (164). This remarkably general process uses only water as the nucleophile and provides the synthetically useful chiral epoxides and diols in highly enantioenriched form. The catalyst can be recycled and the reactions conducted under solvent-free conditions. The process has been used by academic and industrial groups and is operated by Rhodia ChiRex on a large scale (165).
A wide variety of synthetic processes have been rendered asymmetric through the use of a chiral catalyst. In addition to the types of reaction described above, chiral transition metal catalysts have been used to influence the stereochemical course of isomerization, cyclization, and coupling reactions. As an example, an approach towards the natural product (-)-epibatidine (158) was recently reported by Namyslo and Kaufmann (166). Epibatidine is a potent analgesic and a nicotinic receptor agonist. The synthesis involves an asymmetric Heck-type hydroarylation between the bicyclic alkene (155) and pyridyl iodide (156). A number of bidentate chiral li-
Figure 18.49.
Figure 18.50.
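The 50% ceiling noted above for kinetic resolution can be illustrated numerically. This is a hedged sketch, not a model taken from the source: it assumes an idealized resolution in which both enantiomers of a racemic substrate react by first-order kinetics with a constant rate ratio s (the selectivity factor), and the function and variable names are hypothetical.

```python
def recovered_substrate(s, slow_remaining):
    """Yield and ee of unreacted substrate in an idealized kinetic resolution.

    s              -- selectivity factor k_fast / k_slow (assumed constant)
    slow_remaining -- fraction of the slow-reacting enantiomer still unreacted
    Returns (yield based on racemic starting material, ee), both as fractions.
    """
    if s < 1 or not 0 < slow_remaining <= 1:
        raise ValueError("need s >= 1 and 0 < slow_remaining <= 1")
    # With first-order kinetics, the fast enantiomer is depleted to x**s
    # when the slow enantiomer has been depleted to x.
    fast_remaining = slow_remaining ** s
    slow = 0.5 * slow_remaining  # racemate: each enantiomer starts at 0.5
    fast = 0.5 * fast_remaining
    total = slow + fast
    ee = (slow - fast) / total
    return total, ee


for s in (1, 10, 50):
    y, ee = recovered_substrate(s, 0.9)
    print(f"s = {s:>2}: recovered yield = {y:.3f}, ee = {ee:.3f}")
```

In this model, however large s becomes, substrate of high ee is recovered in less than 50% yield based on the racemate, because at most the slow-reacting half of the starting material survives.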
Epibatidine (158)
Figure 18.51.
References
8. R. A. O'Reily, Clin. Pharmacol. Ther., 16, 348 (1974).
9. A. Breckenbridge, M. Orme, H. Wesseling, R. J. Lewis, and R. Gibbons, Clin. Pharmacol. Ther., 15, 424 (1974).
10. T. Walle, J. G. Webb, E. E. Bagwell, U. K. Walle, H. B. Daniell, and T. E. Gaffney, Biochem. Pharmacol., 37, 115 (1988).
11. W. Lindner, M. Rath, K. Stoschitzky, and H. J. Semmelrock, Chirality, 1, 10 (1989).
12. D. E. Drayer in I. W. Wainer and D. E. Drayer, Eds., Drug Stereochemistry: Analytical Methods and Pharmacology, Marcel Dekker, New York, 1988, p. 209.
13. K. J. Fehske, W. E. Muller, and U. Wollert, Biochem. Pharmacol., 30, 687 (1981).
14. R. H. McMenamy and J. L. Oncley, J. Biol. Chem., 233, 1436 (1958).
15. W. E. Muller in I. W. Wainer and D. E. Drayer, Eds., Drug Stereochemistry: Analytical Methods and Pharmacology, Marcel Dekker, New York, 1988, p. 227.
16. S. Toon, L. K. Low, M. Gibaldi, W. F. Trager, R. A. O'Reily, C. H. Motley, and D. A. Goulart, Clin. Pharmacol. Ther., 39, 15 (1986).
17. M. A. Campanero, B. Calahorra, M. Valle, I. F. Troconiz, and J. Honorato, Chirality, 11, 272 (1999).
18. R. Stevenson, Chem. Br., 37, 24 (2001).
19. N. M. Maier, P. Franco, and W. Lindner, J. Chromatogr. A, 906, 3 (2001).
20. L. Miller, C. Orihuela, R. Fronek, D. Honda, and O. Dapremont, J. Chromatogr. A, 849, 309 (1999).
21. V. M. Meyer, Chirality, 7, 567 (1995); O. P. Kleidernigg, M. Lammerhofer, and W. Lindner, Enantiomer, 1, 387 (1996).
22. V. Schurig, J. Chromatogr., 441, 135 (1988); K. Watabe, S. C. Chang, E. Gil-Av, and B. Koppenhofer, Synthesis, 3, 225 (1987).
23. E. R. Francotte in S. Ahuja, Ed., Chiral Separations, Applications and Technology, chap. 10, American Chemical Society, Washington DC, 1997, p. 271.
24. K. D. Altria, N. W. Smith, and C. H. Turnbull, Chromatographia, 46, 664 (1997); K. D. Altria, M. A. Kelly, and B. J. Clark, Trends Anal. Chem., 17, 214 (1998).
25. K. L. Williams, L. C. Sander, and S. A. Wise, J. Pharm. Biomed. Anal., 15, 1789 (1997); N. Bargmann-Leyder, A. Tambute, and M. Claude, Chirality, 7, 311 (1995).
26. M. Juza, M. Mazzotti, and M. Morbidelli, Trends Biotechnol., 18, 108 (2000).
27. J. T. F. Keurentjes and F. J. M. Voermans in A. N. Collins, G. N. Sheldrake, and J. Crosby, Eds., Chirality in Industry II: Developments in the Manufacture of Optically Active Compounds, chap. 8, Wiley, New York, 1997, p. 157.
28. S. C. Stinson, Chem. Eng. News, 73, 44 (1995).
29. E. Francotte, J. Chromatogr. A, 666, 565 (1994).
30. CHIRBASE, available online at http://chirbase.u-3mrs.fr, accessed on July 29, 2002.
31. Daicel Chemical Industries, Ltd., available online at http://www.daicel.co.jp/chiral, accessed on July 29, 2002. NOVASEP, available online at http://www.novasep.com, accessed on July 29, 2002. Astec, available online at http://www.astecusa.com, accessed on July 29, 2002.
32. D. Boyd, M. O'Keeffe, and M. R. Smyth, Analyst, 119, 1467 (1994).
33. D. A. von Deutsch, I. K. Abukhalaf, L. E. Wineski, H. Y. Aboul-Enein, S. A. Pitts, B. A. Parks, R. A. Oster, D. F. Paulsen, and D. E. Potter, Chirality, 12, 637 (2000).
34. B. Waldeck and E. Widmark, Acta Pharmacol. Toxicol., 56, 221-227 (1985).
35. D. J. Triggle, D. A. Langs, and R. A. Janis, Med. Res. Rev., 9, 123 (1989); V. C. O. Njar and A. M. H. Brodie, Drugs, 58, 233 (1999).
36. S. Visentin, P. Amiel, A. Gasco, B. Bonnet, C. Suteu, and C. Roussel, Chirality, 11, 602 (1999).
37. P. Tullio, A. Ceccato, J-F. Liegeois, B. Pirotte, A. Felikidis, M. Stachow, P. Hubert, J. Crommen, J. Geczy, and J. Delarge, Chirality, 11, 261 (1999).
38. J. Bruhwyler, J. F. Liegeois, J. Gerardy, J. Damas, E. Chleide, C. Lejeuns, E. Decamp, P. De Tullio, J. Delarge, A. Dresse, and J. Geczy, Behav. Pharmacol., 9, 731 (1998).
39. T. Shibata, I. Okamoto, and K. Ishii, J. Liq. Chromatogr., 9, 313 (1986); E. Yashima and Y. Okamoto, Bull. Chem. Soc. Jpn., 68, 3289 (1995).
40. W. H. Pirkle, D. W. House, and J. H. Finn, J. Chromatogr., 192, 143 (1980).
41. J. N. Kinkel in A. Subramanian, Ed., A Practical Approach to Chiral Separations by Liquid Chromatography, VCH, New York, 1994.
42. S. G. Allenmark, S. Andersson, P. Moller, and D. Sanchez, Chirality, 7, 248 (1995).
43. M. Meurer, U. Altenhoner, J. Straube, and H. Schmidt-Traub, J. Chromatogr. A, 769, 71-79 (1997).
44. M. Schulte, R. Ditz, R. M. Devant, J. N. Kinkel, and F. Charton, J. Chromatogr. A, 769, 93 (1997).
45. D. Y. Wang, F. Hanotte, C. De Vos, and P. Clement, Eur. J. Allerg. and Clin. Immunol., 56, 339 (2001).
46. E. J. Corey and C. J. Helal, Tetrahedron Lett., 37, 4837 (1996).
47. D. A. Pflum, H. Scot Wilkinson, G. J. Tanoury, D. W. Kessler, H. B. Kraus, C. H. Senanayake, and S. A. Wald, Org. Process Res. Dev., 5, 110 (2001).
48. E. R. Francotte and P. Richert, J. Chromatogr. A, 769, 101 (1997); M. Negawa and F. Shoji, J. Chromatogr., 590, 113 (1992).
49. G. Ganetsos, P. E. Barker, J. A. Johnson, R. G. Kabza, K. Hashimoto, S. Adachi, Y. Shirai, M. Morishita, B. Balannec, G. Hotier, and H. Makai in G. Ganetsos and P. E. Barker, Eds., Preparative and Production Scale Chromatography, chaps. 11-15, Marcel Dekker, New York, 1993, p. 233; D. B. Broughton and C. G. Gerhold, inventors, Universal Oil Prod. Co., assignee, US patent 2,985,589, May 23, 1961.
50. D. M. Ruthven, Principles of Adsorption and Adsorption Processes, chap. 12, Wiley, New York, 1984, p. 380.
51. E. R. Francotte, Chim. Nouvelle, 53, 1541 (1996).
52. D. W. Guest, J. Chromatogr. A, 760, 159 (1997).
53. M. Schulte and J. Strube, J. Chromatogr. A, 906, 399 (2001).
54. K. E. Goeringer, B. K. Logan, and G. D. Christian, J. Anal. Toxicol., 21, 529 (1997).
55. E. Cavoy, M.-F. Deltent, S. Lehoucq, and D. Miggiano, J. Chromatogr. A, 769, 49 (1997).
56. H. Lorenz, P. Sheehan, and A. Seidel-Morgenstern, J. Chromatogr. A, 908, 201 (2001).
57. J. Jacques, A. Collet, and S. H. Wilen, Enantiomers, Racemates and Resolutions, Krieger, Malabar, Florida, 1994.
58. A. Collet, M. J. Brienne, and J. Jacques, Chem. Rev., 80, 215 (1980).
59. S. H. Wilen in E. L. Eliel, Ed., Tables of Resolving Agents and Optical Resolutions, University of Notre Dame Press, Notre Dame, IN, 1972; P. Newman, Optical Resolution Procedures for Chemical Compounds, vol. 1-3, Optical Resolution Center, Manhattan College, New York, 1978-1984.
60. M. J. Cannarsa, Chimica Oggi, 17, 28 (1999).
61. M. Prashad, D. Har, O. Repic, T. J. Blacklock, and P. Giannousis, Tetrahedron Asymmetry, 10, 3111 (1999).
62. R. A. Maxwell, E. Chaplin, S. B. Eckhardt, J. R. Soares, and G. Hite, J. Pharmacol. Exp. Ther., 173, 158 (1970).
63. S. Faulconbridge, H. S. Zavareh, G. R. Evans, and M. Langston, inventors, Medeva Europe Ltd. (GB), assignee, World patent WO98/25902, June 18, 1998.
64. A. S. C. Chan, Chemtech., 3, 46 (1993).
65. P. J. Harrington and E. Loderwijk, Org. Process Res. Dev., 1, 72 (1997).
66. W. J. Pope and S. J. Peachey, J. Chem. Soc., 75, 1066 (1899).
67. R. Gristwood, H. Bardsley, H. Baker, and J. Dickens, J. Exp. Opin. Invest. Drugs, 3, 1209 (1994).
68. M. Langston, U. C. Dyer, G. A. C. Frampton, G. Hutton, C. J. Lock, B. M. Skead, M. Woods, and H. Zavareh, Org. Process Res. Dev., 4, 530 (2000).
69. B. A. Astleford and L. O. Weigel in A. N. Collins, G. N. Sheldrake, and J. Crosby, Eds., Chirality in Industry I, chap. 6, Wiley, New York, 1997, p. 99.
70. K. Flick and E. Frankus, inventors, Gruenenthal Chemie, assignee, US patent 3,652,589, March 28, 1972.
71. G. R. Evans, inventor, Darwin Discovery Ltd. (US), assignee, World Patent WO00/32554, June 8, 2000; G. R. Evans, J. A. Henshilwood, and J. O'Rourke, Tetrahedron Asymmetry, 12, 1663 (2001).
72. L. G. Humber, Med. Res. Rev., 7, 1 (1987).
73. L. G. Humber, J. Med. Chem., 29, 871 (1986).
74. M. Woods, U. C. Dyer, J. F. Andrews, C. N. Morfitt, R. Valentine, and J. Sanderson, Org. Process Res. Dev., 4, 418 (2000).
75. D. Askin, K. K. Eng, R. M. Purick, K. M. Wells, R. P. Volante, and P. J. Reider, Tetrahedron Lett., 35, 673 (1994).
76. M. Kottenhahn, K. Stingl, and K. Drauz, inventors, Degussa (DE), assignee, US patent 6,093,823, July 25, 2000.
77. M. Eichelbaum, Federation Proc., 43, 2298 (1984); M. Eichelbaum, Biochem. Pharmacol., 37, 93 (1988).
78. H. Echizen, T. Brecht, S. Neidergesass, B. Volgelgesang, and M. Eichelbaum, Am. Heart J., 109, 210 (1985).
79. O. Ehrmann, H. Nagel, and W. Karau, inventors, Knoll AG (DE), assignee, US patent 5,457,224, October 10, 1995 and World patent WO94/08950, April 14, 1994.
80. E. J. Trieber, M. Raschack, and F. Dengel, inventors, Knoll AG (DE), assignee, German patent 2059923, 1972.
81. R. M. Bannister, M. H. Brookes, G. R. Evans, R. B. Katz, and N. D. Tyrrell, Org. Process Res. Dev., 4, 467 (2000).
119. S.-W. Tsai and H. J. Wei, Enzyme Microb. Technol., 16, 328 (1994).
120. J.-Y. Xin, S.-B. Li, Y. Xy, J.-R. Chui, and C.-G. Xi, J. Chem. Tech. Biotechnol., 76, 579 (2001).
121. M. E. Swarbrick, F. Gosselin, and W. D. Lubell, J. Org. Chem., 64, 1993-2002 (1999).
122. W. L. Nelson and T. R. Burke, J. Org. Chem., 43, 3641 (1978).
123. R. M. Patel, Stereosel. Biocatal., Marcel Dekker, Inc., New York, 2000, pp. 87-130.
124. E. P. Siqueira Fihlo, J. A. R. Rodrigues, and P. J. S. Moran, Tetrahedron Asymmetry, 12, 847 (2001).
125. P. A. Procopiou, G. E. Morton, M. Todd, and G. Webb, Tetrahedron Asymmetry, 12, 2005 (2001).
126. T. P. Jerusi, inventor, Sepracor Inc. (US), assignee, World patent WO99/13867, March 25, 1999.
127. J. D. Morrison, Ed., Asymmetric Synthesis, Academic Press, San Diego, CA, 1983; G. Procter, Asymmetric Synthesis, Oxford University Press, Oxford, UK, 1996; M. Nógrádi, Stereoselective Synthesis, 2nd ed., VCH, Weinheim, Germany, 1995; D. J. Ager and M. L. East, Eds., Asymmetric Synthetic Methodology, CRC Press, Boca Raton, FL, 1995; D. J. Ager, Ed., Handbook of Chiral Chemicals, Marcel Dekker, New York, 1999; P. O'Brien, J. Chem. Soc., 1, 95-113 (2001); H. Tye and P. J. Comino, J. Chem. Soc., 1, 1729-1747 (2001); P. I. Dalko and L. Moisan, Chem. Int. Ed., 40, 3726-3748 (2001); K. C. Nicolaou and E. J. Sorensen, Classics in Total Synthesis, VCH, Basel, Switzerland, 1996.
128. S. Redshaw in C. R. Ganellin and S. M. Roberts, Eds., Medicinal Chemistry, 2nd ed., Academic Press, San Diego, 1993, pp. 163-186.
129. A. A. Patchett, E. Harris, E. W. Tristram, M. J. Wyvratt, M. T. Wu, D. Taub, E. R. Peterson, T. J. Ikeler, J. ten Broeke, L. G. Payne, D. L. Ondeyka, E. D. Thorsett, W. J. Greenlee, N. S. Lohr, R. D. Hoffsommer, H. Joshua, W. V. Ruyle, J. W. Rothrock, S. D. Aster, A. L. Maycock, F. M. Robinson, R. Hirschmann, C. S. Sweet, E. H. Ulm, D. M. Gross, T. C. Vassil, and C. A. Stone, Nature, 288, 280 (1980).
130. T. J. Blacklock, R. F. Shuman, J. W. Butcher, W. E. Shearin, J. Budavari, and V. J. Grenda, J. Org. Chem., 53, 836 (1988).
131. T. Ohashi and J. Hasegawa in A. N. Collins, G. N. Sheldrake, and J. Crosby, Eds., Chirality in Industry II: Developments in the Commercial Manufacture and Applications of Optically Active Compounds, Wiley, Chichester, UK, 1997, p. 269.
132. M. R. Attwood, C. H. Hassall, A. Krohn, G. Lawton, and S. Redshaw, J. Chem. Soc., 1, 1011-1019 (1986).
133. E. J. Stoner, A. J. Cooper, D. A. Dickman, L. Kolaczkowski, J. E. Lallaman, J.-H. Liu, P. A. Oliver-Shaffer, K. M. Patel, J. B. Paterson Jr., D. J. Plata, D. A. Riley, H. L. Sham, P. J. Stengel, and J.-H. J. Tien, Org. Process Res. Dev., 4, 264-269 (2000).
134. J. T. Kovalainen, J. A. M. Christiaans, S. Kotisaari, J. T. Laitinen, P. T. Mannisto, L. Tuomisto, and J. Gynther, J. Med. Chem., 42, 1193-1202 (1999).
135. G. M. Coppola and H. F. Schuster, Chiral α-Hydroxy Acids in Enantioselective Syntheses, VCH, Weinheim, Germany, 1997.
136. G. Wang and R. Hollinsworth, J. Org. Chem., 64, 1036-1038 (1999).
137. G. G. Wu, Org. Process Res. Dev., 4, 298-300 (2000); G. Wu, Y. S. Wong, X. Chen, and Z. Ding, J. Org. Chem., 64, 3714-3718 (1999).
138. D. S. Garvey, J. T. Wasicak, M. W. Decker, J. D. Brioni, M. J. Buckley, J. P. Sullivan, G. M. Carrera, M. W. Holladay, S. P. Arneric, and M. Williams, J. Med. Chem., 37, 1055-1059 (1994).
139. N.-H. Ling, Y. He, and H. Kopecka, Tetrahedron Lett., 36, 2563-2566 (1995).
140. D. J. Ager, I. Prakash, and D. R. Schaad, Chem. Rev., 96, 835-875 (1996).
141. D. A. Evans, J. M. Takacs, L. R. McGee, D. J. Mathre, and J. Bartroli, Pure Appl. Chem., 53, 1109 (1981); D. A. Evans, M. D. Ennis, and D. J. Mathre, J. Am. Chem. Soc., 104, 1737 (1982); D. A. Evans, Aldrichimica Acta, 15, 23 (1982).
142. D. R. Schaad in ref. 127, pp. 287-300.
143. S. J. Wittenberger and M. A. McLaughlin, Tetrahedron Lett., 40, 7175-7178 (1999).
144. S. R. Turner, J. W. Strohbach, R. A. Tommasi, P. A. Aristoff, P. D. Johnson, H. I. Skulnick, L. A. Dolak, E. P. Seest, P. K. Tomich, M. J. Bohanon, M.-M. Horng, J. C. Lynn, K. T. Chong, R. R. Hinshaw, K. D. Watenpaugh, J. M. N. Janakiraman, and S. Thaisrivongs, J. Med. Chem., 41, 3467-3476 (1998).
145. T. M. Judge, G. Phillips, J. K. Morris, K. D. Lovasz, K. R. Romines, G. P. Luke, J. Tulinsky, J. M. Tustin, R. A. Chrusciel, L. A. Dolak, S. A. Mizsak, W. Watt, J. Morris, S. L. Vander Velde, J. W. Strohbach, and R. B. Gammill, J. Am. Chem. Soc., 119, 3627-3628 (1997).
Contents
1 Introduction
1.1 Development of Database
1.2 Model Building
1.3 Model Characterization
1.4 Model Validation
1.5 Applications and Mechanistic Studies
2 Conclusions
3 Acknowledgments
carcinogenicity or the results of the Salmonella and E. coli WP uvrB mutagenicity assays
(23, 24). Obviously, such data pooling must rest on a sound scientific basis as well as on
data that show extensive concordance between the experimental results of the systems to be
pooled, i.e., a substantial number of chemicals must give identical results in the two
systems, thereby indicating that results obtained with one system can be amalgamated with
those obtained in the other (25). For example, when the same chemicals were tested for
their ability to induce sister chromatid exchanges and chromosomal aberrations in cultured
Chinese hamster ovary (CHO) cells, they showed divergent results (26). Hence, the results
of the two assays cannot be amalgamated into a single database to develop an SAR model of
cytogenetic effects. Similarly, even using the same indicator system, results cannot be
merged if different criteria are used to interpret the significance of the results. That
situation prevails with respect to the induction of mutations at the thymidine kinase
locus of mouse lymphoma cells vis-a-vis the criteria used by the U.S. National Toxicology
Program versus those employed by the U.S. Environmental Protection Agency's GeneTox
Program. In fact, each data set gives rise to a distinct SAR model (27-29).

On the other hand, the consensus database of potential developmental toxicity in humans,
based on experimental results in animals, observations in exposed humans, and expert
judgment, yields a coherent SAR model of developmental risks to humans (30). That model
is distinct from SAR models of developmental toxicity to individual rodent species (31).

1.2 Model Building

Once a "learning set" (i.e., database) satisfying preset criteria for acceptance (3) has
been developed, the model building phase can begin. In general, this is a straightforward
process that is specific for the SAR method employed.

Here, I will exemplify the various stages with the MULTICASE SAR system (3, 4, 32). Thus,
once the structures of the chemicals and an indication of their potency (i.e., either
active, marginally active, and inactive, or a continuous scale of potencies) are entered,
the program identifies the chemical substructures significantly associated with the
toxicological phenomenon under investigation (Table 19.2). Each of these structural
determinants ("toxicophore") is associated with a base potency and a probability of
activity (see Fig. 19.1). The latter is derived from the distribution of active and
inactive molecules that contain the toxicophore. The program also identifies the chemicals
that give rise to the toxicophore (Table 19.3 and Fig. 19.7). This enables the human
expert (see below) to ascertain whether the structures of the chemicals giving rise to the
toxicophores are germane to the test chemical whose toxicity is predicted.

In addition to the toxicophores, the program also identifies modulators for specific
toxicophores (Table 19.4). These are substructures or physicochemical parameters that
determine whether the specific toxicity inherent in the toxicophore will be expressed or
whether it is augmented further.

Thus, when faced with a chemical of unknown activity, the program uses the presence or
absence of toxicophores and of modulators to predict its toxicity (Figs. 19.1-19.3). Thus,
the presence of the toxicophore OH-c= (a phenol) endows a chemical with an 87.5%
probability of being a contact allergen and a potency of 51 (moderate activity, see Table
19.3). That basal activity is modulated by -25.8 x electronegativity (see Table 19.3). For
the example in Fig. 19.1, this results in a further increase in potency. The total potency
of 55 units corresponds to a moderately strong activity (Table 19.3). A chemical with that
toxicophore may also contain a structural modulator that augments the basal activity
further (Fig. 19.2). On the other hand, the chemical may contain a modulator that
completely abolishes its potential to be an allergen (Fig. 19.3). Additionally, the
MULTICASE SAR program will identify substructures that are absent from the learning set
and therefore may introduce an element of uncertainty in the prediction, i.e., the
"unknown" substructure could represent either a potential toxicophore, a modulator that
alters a recognized toxicophore, or a noninformative structure unrelated to toxicity
(Fig. 19.4).

It should be stressed that not every experimental data set gives rise to a coherent SAR
model. Failure to construct a model may be caused by the fact that the experimental data
are invalid or that they do not reflect a specific toxicological phenomenon. Additionally,
the phenomenon under investigation may be so complex or be the result of so many different
mechanisms that the experimental database is not sufficiently large to describe it. With
this in mind, it should be stressed that the predictivity of the SAR model will be a
reflection of the complexity of the phenomenon, the size of the database (i.e., the number
of chemicals for which experimental data are available), and the ratio of actives/inactives
in the data set (3, 22).

Table 19.2 Major Toxicophores Associated with Allergic Contact Dermatitis in Humans

Toxicophore No.   Fragment   N*   Inactives*   Marginals*   Actives*

The database and derivation of the SAR model have been described (33).
*N indicates the number of chemicals in the database that contain that toxicophore.
"Inactives," "marginals," and "actives" indicate the distribution of that toxicophore among
activity groups.
Toxicophore No. 4 is shown embedded in chemicals in Figs. 19.1-19.3 and No. 5 is shown in
Fig. 19.5.
C indicates a carbon atom shared by two rings; (3-NH2) indicates an amino group attached to
the third non-hydrogen atom from the left. In toxicophore No. 17, the last carbon to the
right is shown as unsubstituted. This means that it can be substituted with any atom except
a hydrogen. On the other hand, in toxicophore No. 8, the penultimate carbon is shown
unsubstituted; it can only be substituted by an amino group (i.e., (5-NH2)). However, the
last carbon of that toxicophore is shown with an attached hydrogen. It cannot be
substituted by any other atom.

Chemicals                                      Potency(a)
2,3-Dimercapto-1-propanol                      55
2-Mercaptoethanesulfonic acid                  55
2-Mercaptoethyl methyl sulfone                 45
2-Mercaptoethyl urea                           35
2-Methoxyethyl mercaptoacetate                 35
N-(1,1-dimethylolethyl) mercaptoacetamide
N,N-dimethyl mercaptoacetamide
N-(2-mercaptoethyl)acetamide
N-(2-mercaptoethyl) pyrolidone
N-(mercaptoacetyl) urea
N-(mercaptoacetyl) glycine
N-(mercaptoethyl) morpholine
N-methyl mercaptoacetamide
N-trimethylolmethyl mercaptoacetamide          45
Cysteine                                       45
Mercaptoacetamide                              55
Mercaptoacethydrazide                          45
Mercaptoacetic acid                            45
Thioglycerol                                   55

The program identifies the chemicals that are responsible for toxicophore No. 5 of Table
19.1 (see also Fig. 19.7). The toxicophore is shown embedded in a molecule in Fig. 19.5.
(a) The allergenic potencies were defined based on the percent responders in the human
maximization test as follows (33): 10, non-sensitizer; 25, "marginal" (4-7% responders);
39, "weak" (8-23% responders); 49, "moderate" (24-45% responders); 59, "strong" (56-83%
responders); 69, "extreme" (84-100% responders).

1.3 Model Characterization

As mentioned above, the nature of the SAR model that is derived is a reflection of the
complexity of the toxicological phenomenon that it describes, as well as of the size of the
learning set and the extent to which it includes chemical classes and/or substructures that
are representative of the chemical species to which it will be applied. Thus, the chemical
substructures present among therapeutics are much more numerous and diverse than, for
example, those used or generated in the chemical or agricultural industries. This means
that SAR models used to examine pharmacologically active substances must contain a greater
variety of chemical substructures. This may well translate into a requirement for a larger
experimental data set (i.e., one containing an increased number of chemicals).

In evaluating the SAR model, it is important to determine the relationship between its
predictivity and the size of the database, to establish whether the model is optimal. This
can be ascertained by first determining the model's predictivity (see below), and then
systematically decreasing the size of the database by random deletion of chemicals to
determine the predictive parameters of the model derived from the reduced data set. Doing
this iteratively will allow a determination of the relationship between database size and
concordance between predicted and experimentally derived results (22). If the
relationship, including the value for the SAR model derived from the total database, is
linear, then the model will not be optimally predictive and consideration should be given
to obtaining additional experimental data and deriving a further model. On the other hand,
if the relationship, including the data for the SAR model derived from the total database,
is no longer linear, the size of the data set may be satisfactory. Incremental data may not
yield a correspondingly significant increase in the model's performance. Thus, the
predictivity of the SAR model of mutagenicity in Salmonella improves linearly until a
database size of 350 chemicals is reached, and then it plateaus (22).
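The database-size analysis described under Model Characterization (random deletion of chemicals, rebuilding the model, and comparing concordance across sizes) might be sketched as follows. This is only an illustration under stated assumptions: `evaluate` stands for whatever model-building and scoring routine is in use, `toy_evaluate` is an invented stand-in whose saturation at 350 chemicals merely mimics the Salmonella example, and the plateau test with its `tolerance` is a simplification, not the procedure of reference (22).

```python
import random
from statistics import mean


def learning_curve(chemicals, evaluate, sizes, repeats=5, seed=0):
    """Mean concordance of models built on random subsets of each size.

    `evaluate(subset)` is a caller-supplied (hypothetical) function that builds
    a model from `subset` and returns its concordance as a fraction.
    """
    rng = random.Random(seed)
    curve = {}
    for size in sizes:
        # Random deletion of chemicals: keep `size` survivors, drawn without replacement.
        scores = [evaluate(rng.sample(chemicals, size)) for _ in range(repeats)]
        curve[size] = mean(scores)
    return curve


def has_plateaued(curve, tolerance=0.01):
    """True when the largest database barely improves on the next-largest one."""
    sizes = sorted(curve)
    return curve[sizes[-1]] - curve[sizes[-2]] < tolerance


def toy_evaluate(subset):
    # Invented stand-in: concordance rises linearly, then saturates at 350 chemicals.
    return min(0.50 + 0.001 * len(subset), 0.85)


chemicals = list(range(500))  # placeholder identifiers, not real data
curve = learning_curve(chemicals, toy_evaluate, sizes=[100, 200, 350, 500])
still_rising = learning_curve(chemicals, toy_evaluate, sizes=[100, 200, 300])
```

With these toy numbers, `has_plateaued(curve)` is true, which corresponds to the case in which further data would not be expected to improve performance much, whereas `has_plateaued(still_rising)` is false, the case in which more experimental data should be sought.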
Table 19.4 List of Modulators Related to Toxicophore OH-c= (conf. level = 100%)
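To make the role of such modulators concrete, the arithmetic described in Section 1.2 (a base potency of 51 for the phenol toxicophore, a modulator term of -25.8 x electronegativity, and a total of 55 units) can be sketched as below. The sketch assumes that modulator contributions are simply additive; the descriptor value of -0.155 is hypothetical, chosen only so that the total comes to about 55 units, and the `Modulator` class and `predicted_potency` function are illustrative names, not part of MULTICASE.

```python
from dataclasses import dataclass


@dataclass
class Modulator:
    name: str
    coefficient: float          # potency units per unit of the descriptor
    descriptor_value: float     # descriptor value for the test chemical
    inactivating: bool = False  # True if the modulator abolishes activity outright


def predicted_potency(base_potency, modulators):
    """Base potency of the toxicophore plus additive modulator terms.

    Returns None when an inactivating modulator is present, i.e., when the
    potential for activity is abolished.
    """
    if any(m.inactivating for m in modulators):
        return None
    return base_potency + sum(m.coefficient * m.descriptor_value for m in modulators)


# Values from the text: base potency 51, modulator -25.8 x electronegativity.
# The electronegativity value itself is an assumption made for this example.
electronegativity = Modulator("electronegativity", -25.8, -0.155)
total = predicted_potency(51.0, [electronegativity])

# A hypothetical inactivating modulator, as in the zingerone example of Fig. 19.3.
blocked = predicted_potency(51.0, [Modulator("inactivating fragment", 0.0, 0.0, True)])
```

Here `total` comes to roughly 55 units and `blocked` is `None`, mirroring the two outcomes discussed in the text: augmentation of the basal activity and its complete abolition.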
Another concern relates to the effect of the ratio of active to inactive chemicals in the
data set. Some SAR models are most predictive when that ratio is unity (3, 22). Hence, for
a model that will be widely used for hazard identification and risk assessment purposes,
it is important to determine whether its performance is optimal. Thus, if the number of
inactives exceeds the number of actives, the number of inactives can be decreased by
randomly removing the appropriate number of inactives and determining the performance of
the resulting SAR model. The random deletion of inactives and the model derivation should
be repeated several times to ascertain that a robust model has been derived. Because the
nature of the toxicophores is determined primarily by the actives, and because the
"quality" of the toxicophores is a function of the size of the database (22, 34, 35), it
follows that if the number of actives exceeds the number of inactives, removal of actives
to achieve a ratio of unity is not the optimal solution. Rather, we have found that
supplementing the database with randomly selected chemicals from a "pool" of normal
physiological chemicals (amino acids, sugars, lipids, purines, pyrimidines, etc., but
excluding hormones, prostaglandins, and vitamins), assuming these chemicals to be inactive,
is a viable alternative (36, 37). This is based on the recognition that the biological
and/or toxicological phenomena being modeled occur in a milieu that is rich in these
physiological chemicals.

Finally, the "informational content" of an SAR model determines its coverage. Thus, if a
test molecule contains a substructure unknown to the model, this introduces a measure of
uncertainty into the SAR prediction. In the MULTICASE SAR program, such an "unknown" moiety
is flagged (Fig. 19.4). We have found that a satisfactory approach to determining
informational content is to challenge an SAR model with a panel of 10,000 chemicals
representative of the "universe of chemicals" and to determine the frequency with which the
SAR predictions are accompanied by a "warning" of the presence of "unknown" substructures.
An enumeration of the frequency with which the individual unknown moieties are present
allows a determination of their importance and thereby identifies chemicals that should be
tested and the results included in the model to improve the predictive performance. This is
based on the observation that the greater the informational content (i.e., the fewer
warnings of "unknown" moieties), the greater the model's predictivity (22, 34, 35).

1.4 Model Validation

In its application to toxicology, SAR can serve two functions: (1) to predict a specific
toxicological effect based on the identification of substructures significantly associated
with that activity and (2) to gain insight into the mechanistic basis of that effect.

Figure 19.3. Prediction of the lack of contact allergenicity of zingerone. Whereas the
presence of the toxicophore (A) is associated with a probability of activity and a potency,
the presence of the inactivating modulator (B) abolishes the potency. Moreover, the
presence of a deactivating moiety (C), which is present in five chemicals in the database
that are devoid of allergenicity (Table 19.2, No. 19), further decreases the likelihood
that zingerone is a contact allergen.

To be useful in its predictive mode, the performance of a model does not need to be
perfect, but it must be known. The predictivity of an SAR model is defined by the
concordance between the predictions of chemicals external to the SAR model and the
experimentally determined toxicities. The predictivity is governed by the sensitivity
(number of correct positive predictions/total number of positive chemicals) and the
specificity (number of correct negative predictions/total number of negative chemicals)
(22). Moreover, because the basic function of SAR applied to toxicological phenomena is the
prevention, reduction, or elimination of harmful chemicals from the home, the environment,
and the workplace, risk-averse prediction models are preferred. That is achieved by the
development of SAR models that yield a low frequency of false negative predictions, i.e.,
high sensitivity. Ideally, of course, the model should have high specificity as well as
high sensitivity (38).
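As a minimal sketch of the quantities defined above, the following computes sensitivity, specificity, and concordance from paired predictions and experimental results, and then accumulates them over repeated random tester sets in the manner of the 5%/20-repeat procedure described later in this section. Several points are assumptions made for illustration: `build_model` is a hypothetical callable, the tester sets are drawn independently on each repeat (the text does not say whether they overlap), and "cumulative predictivity" is taken to mean scoring the pooled predictions once.

```python
import random


def predictivity(predicted, observed):
    """Sensitivity, specificity and concordance from parallel lists of booleans."""
    pairs = list(zip(predicted, observed))
    true_pos = sum(1 for p, o in pairs if p and o)
    true_neg = sum(1 for p, o in pairs if not p and not o)
    positives = sum(1 for o in observed if o)
    negatives = len(observed) - positives
    nan = float("nan")
    return {
        # correct positive predictions / total number of positive chemicals
        "sensitivity": true_pos / positives if positives else nan,
        # correct negative predictions / total number of negative chemicals
        "specificity": true_neg / negatives if negatives else nan,
        "concordance": (true_pos + true_neg) / len(pairs),
    }


def repeated_holdout(chemicals, build_model, fraction=0.05, repeats=20, seed=0):
    """Pool predictions over `repeats` random tester sets and score them once.

    `chemicals` is a list of (structure, is_active) pairs; `build_model(training)`
    is a hypothetical callable returning a function structure -> bool.
    """
    rng = random.Random(seed)
    n_test = max(1, round(fraction * len(chemicals)))
    predicted, observed = [], []
    for _ in range(repeats):
        shuffled = rng.sample(chemicals, len(chemicals))
        tester, training = shuffled[:n_test], shuffled[n_test:]
        model = build_model(training)
        predicted += [model(structure) for structure, _ in tester]
        observed += [active for _, active in tester]
    return predictivity(predicted, observed)


# Toy demonstration with made-up data: even-numbered "structures" are active.
toy_chemicals = [(i, i % 2 == 0) for i in range(100)]


def toy_build_model(training):
    # Ignores the training data; a real model builder would not.
    return lambda structure: structure % 2 == 0


scores = predictivity([True, True, False, False, True],
                      [True, False, False, False, True])
cv_scores = repeated_holdout(toy_chemicals, toy_build_model)
```

In the small hand-made example, one inactive chemical is wrongly predicted active, so sensitivity is 1.0 while specificity drops to 2/3 and concordance to 0.8.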
Structural Concepts in the Prediction of the Toxicity of Therapeutical Agents
Figure 19.4. Prediction of the lack of contact allergenicity of dehydroalantolactone. The
chemical contains no toxicophore; therefore, it is presumed to be inactive. However, it
contains two structures (shown in bold) that are "unknown" to the model. That introduces an
element of uncertainty in the prediction.
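The notion of "unknown" structures also underlies the measure of informational content described in Section 1.3, in which a model is challenged with a large panel of chemicals and the warnings are counted. A minimal sketch of that bookkeeping follows. It assumes, purely for illustration, that each chemical can be represented as a set of named fragments and that the model's coverage is the set of fragments seen in the learning set; the fragment names and the four-chemical panel are invented.

```python
from collections import Counter


def unknown_fragment_report(panel, known_fragments):
    """Warning rate over a panel, plus a ranking of the unknown fragments.

    `panel` maps a chemical name to the set of fragments it contains;
    `known_fragments` is the set of fragments present in the learning set.
    Both representations are hypothetical simplifications.
    """
    flagged = 0
    tally = Counter()
    for fragments in panel.values():
        unknown = fragments - known_fragments
        if unknown:
            flagged += 1          # this prediction would carry a warning
            tally.update(unknown)
    warning_rate = flagged / len(panel)
    return warning_rate, tally.most_common()


# Invented panel of four chemicals; a real panel would hold thousands.
panel = {
    "chem_a": {"OH-c=", "c-c"},
    "chem_b": {"lactone", "c-c"},
    "chem_c": {"lactone", "exo-methylene"},
    "chem_d": {"c-c"},
}
known = {"OH-c=", "c-c"}
warning_rate, ranking = unknown_fragment_report(panel, known)
```

The ranking puts the most frequently encountered unknown fragment first, which is one way to decide which chemicals would be most worth testing and adding to the learning set.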
The simplest way to determine predictivity parameters is to remove initially from the data
set a random representative sample (e.g., 5%) to be used as a "tester set," to develop the
SAR model on the remaining chemicals (i.e., 95%), and then to challenge the model with the
"tester set" and ascertain the predictivity. However, as has been demonstrated on a number
of occasions, the predictivity of an SAR model is determined by the size of the database
(22), and in most instances the size of the available data set is not optimal. Further
decreasing the size of the learning set by sequestering the "tester set" is therefore
undesirable.

To overcome this limitation, a cross-validation approach has been used (39). In that
procedure, a portion of the database (e.g., 5%) is randomly selected and removed, and a
model is developed from the remaining 95%. That model is challenged with the "tester set"
(5%). That procedure is repeated 20 times, and the cumulative predictivity is determined.
The final SAR model includes the complete database (i.e., 100%). Because the predictive
performance is a function of the size of the database, the performance of the final model
will be better than that based on 95% of the data. When, however, the learning set consists
of fewer than 150 chemicals, a more tedious procedure may be required, wherein one to two
chemicals (i.e., n-1 or n-2) are removed at a time to serve as the "tester set" and the
process is repeated n or n/2 times.

1.5 Applications and Mechanistic Studies

As has been mentioned earlier (Table 19.1), SAR methodologies can be divided into two
general non-mutually exclusive approaches: (1) hypothesis driven and (2) knowledge based.
The former is rule driven, wherein specific properties or chemical substructures are looked
for, e.g., mutagens are electrophiles and hence one would look for electrophilic or
proelectrophilic moieties. This approach assumes that mutations are caused solely by
covalent binding of electrophiles to DNA. Agents that induce mutations by a
nonelectrophilic (i.e., non-DNA damaging) mechanism will not be detected. Thus, agents that
mutagenize purely as a result of intercalation between DNA base pairs (e.g., acridine
orange, ethidium bromide) will not be identified. Such rules are based on prior knowledge
and/or intuition and do not necessarily require adherence to strict statistical criteria.

The approach illustrated herein, exemplified by MULTICASE (3), is knowledge based. The
input consists of the structures and toxicological activities of the chemicals in the
learning set. The program then identifies structural descriptors (toxicophores) that are
significantly associated with activity (see Table 19.3). The human expert participates in
setting criteria for the inclusion of experimental results in the database (3) as well as
in examining the plausibility of the final model based on exact knowledge of the
toxicological phenomenon under investigation. The human expert also determines the
acceptability of individual predictions (see below).

Once an SAR model has been developed and validated, it can be applied in a number of
fashions. SAR methodologies, such as MULTICASE (3-5), which document predictions (Table
19.2), are obviously preferable to those that operate like a "black box." The latter simply
provides a likelihood that a test chemical is active or inactive. When, however, the SAR
prediction is accompanied by documentation of the basis of that forecast, the human expert
can determine whether it is justified and whether it applies to the specific test chemical.

Thus, the mucolytic agent N-acetyl-L-cysteine is predicted to have a potential to induce
allergic contact dermatitis by virtue of the biophore SH-CH2 (Fig. 19.5). Moreover,
examination of the chemicals that contribute to that toxicophore reveals that indeed they
all have the substructure in an environment that is similar to the one found in
N-acetyl-L-cysteine (Table 19.3). On the other hand, the tubulin polymerization perturber
(and potential antineoplastic agent) epothilone A (Fig. 19.6) is predicted to be a mouse
carcinogen by virtue of the toxicophore units shown in bold. That toxicophore is present in
five molecules in the learning set. The presence of that toxicophore is associated with an
89% probability of carcinogenicity and a potency of 63 units, which corresponds to a TD50
value of 0.039 mmol/kg per day (40). However, the program flags the toxicophore because its
environment in epothilone A is significantly different from that of the molecules in the
learning set (Fig. 19.6). In fact, examination of the structures of the molecules that
contribute to the biophore (Fig. 19.7) indicates that these molecules are indeed quite
different from epothilone A, and hence, the prediction of carcinogenicity can be
disregarded (however, also see below). Moreover, the molecules that contributed to this
toxicophore (Fig. 19.7), even though they contain the S-C=N-C= moiety (Fig. 19.6), also
contain functionalities (i.e., "structural alerts") that are associated with
carcinogenicity/genotoxicity such as nitro, amino, and hydrazino groups. In fact, these
could be responsible for the murine carcinogenicity of these chemicals. Obviously, these
latter functionalities are absent in epothilone A.
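The expert's reasoning in this last step, checking whether the learning-set molecules that gave rise to a toxicophore also carry structural alerts that the test chemical lacks, can be expressed as a simple set comparison. The sketch below is illustrative only: the molecules are represented as sets of named functional groups, the contributor names and their group assignments are invented, and only the three alerts named in the text (nitro, amino, hydrazino) are considered.

```python
# Structural alerts named in the text; a real list would be far longer.
STRUCTURAL_ALERTS = frozenset({"nitro", "amino", "hydrazino"})


def alerts_by_contributor(contributors):
    """Map each learning-set molecule to the structural alerts it carries."""
    return {name: groups & STRUCTURAL_ALERTS for name, groups in contributors.items()}


def toxicophore_is_confounded(contributors, test_groups):
    """True if every contributor carries an alert that the test chemical lacks.

    In that case the activity of the contributors may be due to the alerts
    rather than to the shared toxicophore.
    """
    test_alerts = test_groups & STRUCTURAL_ALERTS
    per_molecule = alerts_by_contributor(contributors)
    return all(alerts - test_alerts for alerts in per_molecule.values())


# Invented example data: three contributors sharing a toxicophore fragment.
contributors = {
    "contributor_1": {"toxicophore", "nitro"},
    "contributor_2": {"toxicophore", "amino"},
    "contributor_3": {"toxicophore", "hydrazino", "nitro"},
}
test_groups = {"toxicophore", "epoxide"}  # hypothetical test chemical without alerts

confounded = toxicophore_is_confounded(contributors, test_groups)
```

A result of `True` does not by itself overrule the prediction; it only documents that an alternative explanation for the contributors' activity exists, which the human expert can then weigh.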
Figure 19.7. Structures of epothilone A and of chemicals that contain the toxicophore. The
toxicophore (Fig. 19.6) is shown in bold.

Based on all of these considerations, the "human expert" would overrule the prediction of
rodent carcinogenicity. Additionally, in overriding the computer-based prediction,
cognisance was also taken of the understanding that the vast majority of recognized human
carcinogens are genotoxicants, i.e., "genotoxic carcinogens" (41-44). Epothilone A, on the
other hand, was not predicted to be genotoxic (i.e., a DNA-damaging agent), as evidenced by
its lack of potential to induce mutations in Salmonella, error-prone DNA repair, or
unscheduled DNA synthesis in rat hepatocytes (Table 19.5). Thus, even if the potential for
murine carcinogenicity were accepted, in view of the fact that the vast majority of
recognized human carcinogens are mutagens/genotoxicants or are hormones and epothilone A
is neither, it would not represent a human risk.

Figure 19.8. Prediction of the ability of colchicine to inhibit tubulin polymerization. The
structure of colchicine is shown in Fig. 19.9. The biophore is shown in bold (A) in Fig.
19.9.

If, based on the above, it were accepted that epothilone A is not genotoxic, and if the
human expert examining the documentation wished not to override the prediction of
carcinogenicity in mice based on the differences in chemical environments between
epothilone A and the molecules responsible for the toxicophore (Figs. 19.6 and 19.7), he
could examine mechanisms of non-genotoxic carcinogenicity, even though they may not be
relevant to humans. One of the mechanisms of non-genotoxic carcinogenicity is inhibition of
intercellular communication (53). Epothilone A does not possess such a potential (Table
19.5). Another mechanism for non-genotoxic rodent carcinogenesis may involve systemic or
cell toxicity followed by mitogenesis (54-56). This may occur as a consequence of including
the maximum tolerated dose (MTD) in the cancer bioassay protocol. When this is done, up to
50% of chemicals tested are found to be rodent carcinogens (54). Obviously, this MTD
situation rarely, if ever, applies to humans. Still, epothilone A has the potential for
inducing cellular as well as systemic toxicity (Table 19.5), which may explain its
potential carcinogenicity in mice, were we to discount the difference in chemical
environment.

Figure 19.9. Structure of colchicine. The biophore A (bold, see Fig. 19.8) is responsible
for the therapeutic effectiveness. Toxicophore B (see Fig. 19.10; shown in bold) is
responsible for the induction of sister chromatid exchanges (SCE) in vivo. Removal of
toxicophore B or its replacement by isopropoxy groups abolishes the induction of SCEs
without affecting the therapeutic potential.

Obviously, the availability of a number of characterized and validated SAR models allows
the development of a putative toxicological profile (Table 19.6). This can be used as a
guideline in the product developmental phase to select lead compounds least likely to
induce unwanted side effects. However, the SAR approach can also be used to optimize
beneficial effects and decrease or eliminate unwanted toxic effects.

Figure 19.10. The potential of colchicine to induce sister chromatid exchanges in vivo. The
structure of colchicine and of the toxicophore B is given in Fig. 19.9. One of the
inactivating modulators (C) is also shown in bold in Fig. 19.9.

Thus, let us examine colchicine (CH), an anti-inflammatory agent that has been in use for
several centuries for the treatment of gout. The anti-inflammatory potential of CH is
understood to derive from its ability to inhibit tubulin polymerization (iTP) (57). That is
also the basis of the anticancer activity of paclitaxel (Taxol) (58-60). The structural
basis of that activity derives from the presence in CH of the NH-CH-C= moiety (Figs. 19.8
and 19.9), which endows the molecule with a 93% probability of activity. However,
colchicine also has the potential for inducing sister chromatid exchanges (SCEs) in vivo
(Fig. 19.10). This SCE-inducing ability may endow it with genotoxic and developmental
toxicity potentials. However, the potential for inducing SCEs in vivo is associated with
the methoxy moiety (Figs. 19.9 and 19.10). Removal of that moiety or replacing it with an
isopropoxy group abolishes the SCE-inducing ability of CH without affecting its potential
for iTP (i.e., the basis of its anti-inflammatory action).

Finally, SAR approaches can also be used to provide a basis for making intelligent risk
assessments. Thus, it has been shown that the similarity in biophores/toxicophores present
in different SAR models of toxicological phenomena provides a measure of mechanistic
similarity (3). The SAR models of mutagenicity in Salmonella and of error-prone DNA repair
(SOS Chromotest) show significant overlaps (Table 19.7). This is not unexpected because DNA
is the target of both phenomena, and the tester strain used for the Salmonella mutagenicity
assays contains a plasmid that codes for error-prone DNA repair (61). In fact there is a
substantial (though not complete) overlap among chemicals that cause the two
33. C. Graham, R. Gealy, O. T. Macina, M. H. Karol, and H. S. Rosenkranz, Quant. Struct. Activ. Relat., 15, 224-229 (1996).
34. N. Takihi, Y. P. Zhang, G. Klopman, and H. S. Rosenkranz, Mutagenesis, 8, 257-264 (1993).
35. N. Takihi, Y. P. Zhang, G. Klopman, and H. S. Rosenkranz, Qual. Assur. Good Pract. Regul. Law, 2, 255-264 (1993).
36. H. S. Rosenkranz and A. R. Cunningham, Mutat. Res., 476, 133-137 (2001).
37. H. S. Rosenkranz and A. R. Cunningham, SAR QSAR Environ. Res., 12, 267-274 (2001).
38. V. Chankong, Y. Y. Haimes, H. S. Rosenkranz, and J. Pet-Edwards, Mutat. Res., 153, 135-166 (1985).
39. Y. P. Zhang, N. Sussman, G. Klopman, and H. S. Rosenkranz, Quant. Struct. Activ. Relat., 16, 290-295 (1997).
40. A. R. Cunningham, H. S. Rosenkranz, Y. P. Zhang, and G. Klopman, Mutat. Res., 398, 1-17 (1998).
41. F. K. Ennever, T. J. Noonan, and H. S. Rosenkranz, Mutagenesis, 2, 73-78 (1987).
42. H. Bartsch and C. Malaveille, Cell Biol. Toxicol., 5, 115-127 (1989).
43. J. Ashby and R. S. Morrod, Nature, 352, 185-186 (1991).
44. M. D. Shelby, Mutat. Res., 204, 3-15 (1988).
45. J. Ashby and R. W. Tennant, Mutat. Res., 257, 229-306 (1991).
46. L. S. Gold, N. B. Manley, T. H. Slone, G. B. Garfinkel, L. Rohrbach, and B. N. Ames, Environ. Health Perspect., 100, 65-135 (1993).
47. H. S. Rosenkranz and G. Klopman, Mutat. Res., 228, 51-80 (1990).
48. V. Mersch-Sundermann, G. Klopman, and H. S. Rosenkranz, Mutat. Res., 340, 81-91 (1996).
49. Y. P. Zhang, A. van Praagh, G. Klopman, and H. S. Rosenkranz, Mutagenesis, 9, 141-149 (1994).
50. H. S. Rosenkranz and G. Klopman, Environ. Mol. Mutagen., 21, 193-206 (1993).
51. H. S. Rosenkranz, E. J. Matthews, and G. Klopman, Altern. Anim. Test., 20, 549-562 (1992).
52. M. Rosenkranz, H. S. Rosenkranz, and G. Klopman, Mutat. Res., 381, 171-188 (1997).
53. J. E. Trosko and C. C. Chang in R. W. Hoerger and F. D. Hoerger, Eds., Banbury Report 31: Carcinogen Risk Assessment: New Directions in Qualitative and Quantitative Aspects, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY, 1988, pp. 139-174.
54. B. N. Ames and L. S. Gold, Proc. Natl. Acad. Sci. USA, 87, 7772-7776 (1990).
55. S. M. Cohen and L. B. Ellwein, Science, 249, 1007-1011 (1990).
56. S. Preston-Martin, M. C. Pike, R. K. Ross, P. A. Jones, and B. E. Henderson, Cancer Res., 50, 7415-7421 (1990).
57. E. ter Haar, H. S. Rosenkranz, E. Hamel, and B. W. Day, Bioorg. Med. Chem., 4, 1659-1671 (1996).
58. E. Hamel, Med. Res. Rev., 16, 207-231 (1996).
59. P. B. Schiff and S. B. Horwitz, Proc. Natl. Acad. Sci. USA, 77, 1561-1565 (1980).
60. P. B. Schiff, J. Fant, and S. B. Horwitz, Nature (Lond.), 277, 665-667 (1979).
61. J. McCann, N. E. Spingarn, J. Kobori, and B. N. Ames, Proc. Natl. Acad. Sci. USA, 72, 979-983 (1975).
62. V. Mersch-Sundermann, U. Schneider, G. Klopman, and H. S. Rosenkranz, Mutagenesis, 9, 205-224 (1994).
63. J. A. Heddle, M. C. Cimino, M. Hayashi, F. Romagna, M. D. Shelby, J. D. Tucker, Ph. Vanparys, and J. T. MacGregor, Environ. Mol. Mutagen., 18, 277-291 (1991).
64. K. H. Mavournin, D. H. Blakey, M. C. Cimino, M. F. Salamone, and J. A. Heddle, Mutat. Res., 239, 29-80 (1990).
65. H. Tinwell and J. Ashby, Environ. Health Perspect., 102, 758-762 (1994).
66. E. ter Haar, B. W. Day, and H. S. Rosenkranz, Mutat. Res., 350, 331-337 (1996).
67. E. ter Haar, R. J. Kowalski, E. Hamel, C. M. Lin, R. E. Longley, S. P. Gunasekera, H. S. Rosenkranz, and B. W. Day, Biochemistry, 35, 243-250 (1996).
68. H. Tinwell and J. Ashby, Carcinogenesis, 15, 1499-1501 (1994).
CHAPTER TWENTY
B. Cox
Medicinal Chemistry
Respiratory Diseases Therapeutic Area
Novartis Pharma Research Centre
Horsham, United Kingdom
R. D. WAIGH
Department of Pharmaceutical Sciences
University of Strathclyde
Glasgow, Scotland
Contents
1 Introduction
2 Drugs Affecting the Central Nervous System
2.1 Morphine Alkaloids
2.2 Conotoxins
2.3 Cannabinoids
2.4 Asperlicin
3 Neuromuscular Blocking Drugs
3.1 Curare, Decamethonium, and Atracurium
4 Anticancer Drugs
4.1 Catharanthus (Vinca) Alkaloids
4.2 Camptothecin
4.3 Paclitaxel and Docetaxel
4.4 Epothilones
4.5 Podophyllotoxin, Etoposide, and Teniposide
4.6 Marine Sources
5 Antibiotics
5.1 β-Lactams
5.2 Erythromycin Macrolides
5.3 Streptogramins
5.4 Echinocandins
6 Cardiovascular Drugs

Burger's Medicinal Chemistry and Drug Discovery
Sixth Edition, Volume 1: Drug Discovery
Edited by Donald J. Abraham
ISBN 0-471-27090-3 © 2003 John Wiley & Sons, Inc.
848 Natural Products as Leads for New Pharmaceuticals
2 DRUGS AFFECTING THE CENTRAL NERVOUS SYSTEM

2.1 Morphine Alkaloids

The history of the opium alkaloids is too well known to warrant repetition here, but the
analgesics based on morphine (1) are too important to be left out of an account of natural
products as leads. Thus we shall summarize the clinically more important developments that
have occurred since the isolation of morphine in 1803. Codeine (2) continues to be used
widely for the treatment of moderate pain and, although present in the opium poppy
(Papaver somniferum), it is normally synthesized in higher yield from morphine (4).

Other than codeine, the earliest significant semisynthetic derivative of morphine is the
diacetate heroin (3), which is still widely used in terminal cancer where its
addictiveness is irrelevant. Acetylation masks the polar hydroxy groups, so that
penetration into the central nervous system (CNS) is enhanced; hydrolysis then occurs to
liberate the phenolic hydroxyl, giving an active analgesic, and ultimately regenerates
morphine (5). Heroin was thus one of the first prodrugs.

(1) morphine R1 = R2 = H
(2) codeine R1 = CH3, R2 = H
(3) heroin R1 = R2 = COCH3
(5) naloxone

Modifications to the C-ring of morphine are legion, but none of the derivatives is free
from addictive liability, though many have been used clinically. N-Demethylation and
realkylation yield more interesting analogs, notably N-allylnormorphine (nalorphine, 4),
which is a morphine antagonist (6). Further modification leads to naloxone (5), which
unlike nalorphine has very little agonist activity (7) and has retained a place in therapy
for treatment of opiate-induced respiratory depression. Naloxone will also precipitate
withdrawal symptoms in opiate addicts, thereby facilitating diagnosis.

give the morphinans (6). The system may be simplified even further (9), to give the
benzomorphans (7), although neither these nor the morphinans have provided the long-sought
analgesic without addictive properties.

(6) morphinan
(7) benzomorphan
(9) etorphine
(10) buprenorphine
(11) atropine
(13) conotoxin analog
easy target for synthesis and gives it poor distribution properties in vivo (17).

SNX-111 blocks N-type calcium channels, which are located throughout the CNS on neuronal
somata, dendrites, dendritic spines, and axon terminals, where they play a major role in
the regulation of the neurotransmitters associated with pain transmission and stroke. The
drive is to discover an orally active, selective, small-molecule modulator of N-type
calcium channels to overcome the disadvantages of administration of SNX-111.

High-throughput screening campaigns have resulted in a number of leads being identified,
whereas others have chosen to modify known drugs shown to block N-type channels. Workers at
Parke-Davis, however, employed a ligand-based approach using the three-dimensional solution
structure of the peptide (18). Compounds such as (13) were designed where key binding
motifs are attached to an alkylphenyl ether scaffold. The compound had an IC50 value of
3.3 µM in a human N-type channel assay but showed no selectivity over the L-type channel.
Structure-activity work on the conotoxins has shown that other regions of the peptide,
absent in these synthetic ligands, are responsible for channel family selectivity (17, 18).

2.3 Cannabinoids

The plant Cannabis sativa has been used by humans for thousands of years, both for the
effects when ingested and for making rope from the fibers in the stem. The major
constituent of pharmacological interest is Δ9-tetrahydrocannabinol (14) (THC), which has a
multiplicity of actions. In animals the effects include sedation and apparent
hallucinations (19), which are similar to the major effects in the CNS in humans. There are
also cardiovascular effects, notably tachycardia and postural hypotension, that can be
separated from the CNS action, as in the synthetic analog Δ6a,10a-dimethylheptyl-THC (15),
which has minimal CNS activity (20).

(14) THC

Given the widespread illicit use of C. sativa, it was perhaps inevitable that eventually one
2 Drugs Affecting the Central Nervous System
(22) resiniferatoxin
(21) capsaicin
(29) tubocurarine R = H
(32) metocurine R = CH3
Scheme: pseudocholinesterase hydrolysis of the bis-choline ester (H3C)3NCH2CH2OCO-CH2CH2-COOCH2CH2N(CH3)3 to the mono-ester acid (H3C)3NCH2CH2OCO-CH2CH2-COOH and HOCH2CH2N(CH3)3
(35) laudexium
(36) atracurium
pH 7.4
(37) mivacurium
Lilly introduced vinblastine and vincristine into the clinic in 1960 and 1963, respectively, but this did not preclude the search for improved derivatives. A chemical modification program aimed at improving antitumor activity and reducing toxicity was initiated in 1972 (60). Concern about the neurotoxicity displayed by vincristine, its chemical instability, and low natural abundance (0.03 g/kg dried plant material) led to vinblastine's being chosen as a template for semisynthetic modification. Selective ammonolysis of the ester function at C-3 and hydrolysis of the adjacent acetyl group yielded the desacetyl vinblastine amide, vindesine (40). Better yields of vindesine were obtained from the hydrazide (41) on treatment with nitrous acid and reacting the resultant azide (42) with ammonia. The azide (42) proved to be a useful intermediate for the preparation of a range of substituted amides, although vindesine proved to be the derivative of choice, with significant differ-
(38) R = CH3
(39) R = CHO
nata. Of particular note was the unusual activity that camptothecin displayed in L1210 and P388 mouse leukemia life-prolongation assays. The compound also inhibited the growth of solid tumors in vivo and the water-soluble sodium salt was progressed to phase II clinical trials before being withdrawn because of severe bladder toxicity.

(43) Camptothecin: R1 = R2 = R3 = H
(44) 10-Hydroxycamptothecin: R1 = R2 = H, R3 = OH
(46) Topotecan: R1 = H, R2 = CH2-N(CH3)2, R3 = OH
(47) Irinotecan: R1 = C2H5, R2 = H, R3 = OCO-[4-(1-piperidino)piperidin-1-yl]
(48) 7-Ethyl-10-hydroxycamptothecin: R1 = C2H5, R2 = H, R3 = OH

Interest in camptothecin gained new impetus in 1985, when it was discovered that the compound exerts its antitumor activity through a novel mechanism of action (64). Camptothecin binds to the covalent complex formed by topoisomerase I and DNA, which initiates DNA replication and thus stabilizes the enzyme-DNA complex and prevents cell proliferation. The elucidation of the mechanism of action provided a means of evaluating camptothecin analogs as topoisomerase inhibitors in vitro and efforts then focused on synthesizing water-soluble analogs with broad-spectrum antitumor activities. The α-hydroxy lactone (ring E) and, in particular, the 20(S)-form proved essential for maintaining biological activity, but the 10-hydroxy analog (44) showed greater activity than that of (43) (65). Wall and Wani successfully deployed the Friedländer reaction between substituted 2-aminobenzaldehydes and the tricyclic intermediate (45), to synthesize a variety of ring-A-substituted analogs. These studies may have prompted SmithKline Beecham (now GlaxoSmithKline) to synthesize the water-soluble 10-hydroxycamptothecin analog topotecan (46) that was first approved in 1996 for the treatment of recurrent ovarian cancer and, 2 years later, for small cell lung cancer (66). Irinotecan (47), developed by Daiichi and Yakult Honsha in Japan and marketed by Pharmacia, was also approved in 1996 for the treatment of advanced colorectal cancer. Irinotecan is inactive as a topoisomerase I inhibitor, but acts as a prodrug of the active 7-ethyl-10-hydroxycamptothecin (48) (67).

4.3 Paclitaxel and Docetaxel

Regarded as the tree of death by the Greeks and used to prepare arrow poison by the Celts, the yew tree has been associated with death and poisoning for centuries (68, 69). The English yew, Taxus baccata, was used to make funeral wreaths and it was believed that one could die by merely standing beneath the boughs of the tree.

Yew certainly contains highly toxic metabolites and their potency and fast duration of action have often made extracts of yew the poison of choice for numerous murders and suicide attempts. It is thus ironic that extracts from the Pacific yew, T. brevifolia, after being
tested in the National Cancer Institute's (NCI) screening program during the 1960s, yielded what was described (70) as the most exciting anticancer compound discovered in the previous 20 years; that is, paclitaxel (49) (originally given the name taxol by Wall and Wani).

The initial isolation and characterization of paclitaxel proved particularly difficult because of (1) its very low natural abundance in T. brevifolia bark (although this was the best known source, the isolated yield was only 0.02% w/w, equivalent to 650 mg per tree), (2) the poor analytical data obtained from the purified compound, and (3) the failure of paclitaxel to give crystals that were suitable for X-ray analysis (71). The structure of paclitaxel was published in 1971 (72), but further biological testing continued to be troubled by difficulties. The compound showed only modest in vivo activity in various leukemia assays, which was no better than that displayed by a number of other new compounds at the time. In addition to the limited supplies of paclitaxel (the complexity of the molecule precluded chemical synthesis), the compound was very poorly soluble in water, which made formulation difficult. However, various new assays were developed in the 1970s, including the murine B16 melanoma model, in which paclitaxel showed very good activity, and another boost came when Horwitz et al. (73) discovered that the compound prevented cell division by a unique mode of action. In contrast to the antimitotic vinblastine and podophyllotoxin analogs (q.v.), which prevent microtubule assembly, paclitaxel inhibits cell division by promoting assembly of stable microtubule bundles, which leads to cell death.

Phase I clinical trials were initiated in 1983, but these were to proceed at a slow and tortuous pace and proved all but disastrous when the high levels of oil-based adjuvant used to formulate paclitaxel caused severe allergic reactions in many volunteers. Undaunted by the formulation problem and spurred on by paclitaxel's novel mechanism of action, clinicians were able eventually to minimize the allergic events and demonstrate useful activity. Phase II clinical trials began in 1985 despite continuing supply problems, and 4 years later the program received a significant boost when McGuire et al. (74) reported good responses from patients suffering from refractory ovarian cancer, a disease that kills some 12,500 women a year in the United States alone.

In many ways, the development of paclitaxel mirrored that of the camptothecin analogs, both being dogged for many years by supply issues, poor pharmacokinetics, and toxicity, but the subsequent uncovering of novel mechanisms of action fueled renewed efforts to develop these leads into important new anticancer agents (75).

In 1991 Bristol-Myers Squibb in conjunction with the NCI agreed to manage the supplies of paclitaxel and were granted a licence to further develop the compound. The following year the U.S. Food and Drug Administration approved paclitaxel for the treatment of ovarian cancer in patients unresponsive to standard treatments, and in December 1993 approval was given for the treatment of metastatic breast cancer.

The sourcing of paclitaxel from T. brevifolia was a major problem (76) because to treat just the groups of patients suffering ovarian cancer in the United States would require about 25 kg of compound per year, necessitating the felling of some 38,000 trees (70)! Although the Pacific yew is not a rare tree, it is extremely slow growing and such harvesting could not be sustained indefinitely. It has been estimated that there were enough trees available to maintain a supply of paclitaxel for only 2-7 years (77). The isolation of paclitaxel from other Taxus species has been investigated at length and reasonable quantities have been obtained from the needles of several species including T. baccata. Using the needles has
alleviated the supply problem because they can be harvested without damaging the tree. However, the needles contain much higher quantities of several biosynthetic precursors of paclitaxel and two of these, baccatin III (50) and 10-desacetylbaccatin III (51), have been used to prepare paclitaxel semisynthetically. One approach, developed by Potier et al. (78), involved acylation of the sterically hindered C-13 position of baccatin III with cinnamic acid and subsequent double-bond functionalization through hydroxyamination, to give paclitaxel together with various regio- and stereoisomers. A better approach involved protection of 10-desacetylbaccatin III as the triethylsilyl ether, followed by direct acylation with the phenylisoserine derivative (52), giving paclitaxel in 38% overall yield (79). Further improvements were made using less sterically demanding acylating reagents; for example, acylation with the β-lactam (53) gave paclitaxel in up to 90% yield (80) and this may be the preferred method for commercial production in the future.

These semisynthetic approaches also provide access to analogs with potential advantages over paclitaxel itself. Structure-activity studies have shown that, although the oxetane ring appears to be essential for activity, wide variation in the nature and stereochemistry of the C-13 ester side-chain can be tolerated. Thus, the N-(t-butoxycarbonyl) derivative, docetaxel (54), which appears to be more potent than paclitaxel (81) and has better solubility characteristics, has been developed and launched by Aventis for the treatment of ovarian, breast, and lung cancers.

Various "protaxols," designed to release paclitaxel in situ under physiological conditions, have been prepared by acylating the C-2' hydroxyl group. Nicolaou et al. (82) reported the synthesis of the sulfone (55), which is soluble and stable in aqueous media, but is able to release paclitaxel rapidly in human blood plasma.

Plant tissue culture (70), microbial fermentation (83), and total synthesis (84, 85) provide other possibilities for the production of paclitaxel and its derivatives, although it is far from certain whether any of them will be commercially viable.
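
The supply figures quoted for paclitaxel in this section (an isolated yield of 0.02% w/w, about 650 mg per tree, and a demand of about 25 kg per year) can be cross-checked with a few lines of arithmetic. The Python sketch below is illustrative only: the constant and function names are hypothetical, and the bark mass per tree is an inference from the quoted numbers rather than a figure stated in the text.

```python
# Rough consistency check of the paclitaxel supply figures quoted in the text.
# All names are illustrative; only the three input numbers come from the text.

ISOLATED_YIELD = 0.02 / 100      # 0.02% w/w from T. brevifolia bark
PACLITAXEL_PER_TREE_G = 0.650    # about 650 mg of paclitaxel per tree
ANNUAL_DEMAND_G = 25_000.0       # about 25 kg of compound per year


def bark_per_tree_kg(per_tree_g: float = PACLITAXEL_PER_TREE_G,
                     isolated_yield: float = ISOLATED_YIELD) -> float:
    """Bark mass (kg) implied by the per-tree yield; an inferred value."""
    return per_tree_g / isolated_yield / 1000.0


def trees_per_year(demand_g: float = ANNUAL_DEMAND_G,
                   per_tree_g: float = PACLITAXEL_PER_TREE_G) -> float:
    """Number of trees needed each year to meet the quoted demand."""
    return demand_g / per_tree_g


if __name__ == "__main__":
    print(f"bark per tree: {bark_per_tree_kg():.2f} kg")
    print(f"trees per year: {trees_per_year():,.0f}")
```

Under these assumptions the demand works out to roughly 38,500 trees per year, in line with the "some 38,000 trees" quoted in the text, and each tree would correspond to a little over 3 kg of bark.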
(56) epothilone A: X = O, R = H
(57) epothilone B: X = O, R = CH3
(59) BMS-247550: X = NH, R = CH3

the epothilones. Nicolaou, Danishefsky, and Schinzer independently adopted successful, elegant synthetic approaches involving olefin metathesis, macrolactonization, Suzuki coupling, or ester-enolate-aldehyde condensation (89). Within 3 years of the disclosure of their absolute stereochemistry, 17 different total syntheses of the natural products were reported. These syntheses paved the way for the generation of a large number of epothilone
analogs for biological evaluation, including the use of solid-phase combinatorial approaches. The academic groups focused on modifications around the core macrocyclic lactone, establishing important structure-activity relationships, but not improving on the in vitro biological activity of the most active natural product, epothilone B (57). In vivo biological data were comparatively scarce and, although one group reported that epothilones B (57) and D (58) showed activity in murine tumor models, researchers at Bristol-Myers Squibb have shown that (58) lacks in vivo activity as a result of rapid metabolic inactivation (90). It was postulated that esterase-mediated hydrolysis of the macrocyclic lactone formed an inactive ring-opened species and, therefore, efforts were focused on replacing the lactone with a more stable macrocyclic lactam moiety. Several macrocyclic lactam derivatives were synthesized from (57) and (58). Of note was the preparation of BMS-247550 (59) in a three-step synthesis from epothilone B (57), utilizing a novel Pd(0)-catalyzed ring-opening reaction followed by reduction and macrolactamization. BMS-247550 (59), which is in phase I clinical trials, retains its activity against human cancer cells that are naturally insensitive to paclitaxel or that have developed resistance to paclitaxel, both in vitro and in vivo (91).

(58) epothilone D

4.5 Podophyllotoxin, Etoposide, and Teniposide

The development of the natural constituents of Podophyllum Resin into effective semisynthetic and, ultimately, totally synthetic compounds for the treatment of various kinds of cancer provides one of the most sustained and intriguing stories of drug discovery (92, 93). The story has all the classic ingredients, starting with observation and reasoning, extending through chance into new areas, and characterized throughout by persistence and determination, particularly when biological activity had to be traced to very minor constituents in the crude plant extract.

Podophyllum peltatum (may apple, or American mandrake) and P. emodi are, respectively, American and Himalayan plants, widely separated geographically but used in both places as cathartics in folk medicine (94). An alcoholic extract of the rhizome known as podophyllin was included in many pharmacopoeias for its gastrointestinal effects; it was included in the U.S.P., for example, from 1820 to 1942. At about this time the beneficial effect of podophyllin, applied topically to benign tumors known as condylomata acuminata, was demonstrated clinically (95). This usage was not inspirational, given that there are records of topical application in the treatment of cancer by the Penobscot Indians of Maine and, subsequently, by various medical practitioners in the United States from the 19th century (96). The crude resinous podophyllin is an irritant and unpleasant mixture unsuited to systemic administration.

The first chemical constituent was isolated from podophyllin in 1880 and named podophyllotoxin (97). A structure was proposed in 1932 and after some fine-tuning (98) was shown to be the lignan (60). As might be expected, the crude resin contains a variety of chemical types, including the flavonols quercetin and kaempferol (99). Although these other constituents undoubtedly have biological activity, it is the lignans that have received most attention and to which we shall devote the remainder of this section.

Chemists at Sandoz in the early 1950s reasoned that crude podophyllin might contain lignan glycosides with anticancer activity, which might be more water soluble and less toxic than podophyllotoxin (92). The reasoning for the latter is not entirely clear, but in the event they proved to be correct in both respects. Careful isolation gave podophyllotoxin β-D-glucopyranoside (61), its 4'-desmethyl analog (62), and some less important lignans lacking the B-ring hydroxy group (100-102). Unfortunately, the sugar deriva-
(73) Bryostatin 1
(74) Dolastatin 10
The discovery that the fused β-lactam nucleus, 6-aminopenicillanic acid (6-APA) (76), could be obtained from cultures of Penicillium chrysogenum led to the preparation of new, semisynthetic derivatives with improved stability to gastric acid and β-lactamases, and with activity against a wider range of pathogenic organisms (121). Sheehan (122) showed that compound (76) would react readily with acid chlorides to form new penicillin derivatives with novel substituents at the 6-position. Methicillin (77), with a sterically demanding 2,6-dimethoxybenzamide side-chain, was the first semisynthetic penicillin to show resistance to staphylococcal β-lactamases, although the compound was still acid labile. Ampicillin (78) has an α-aminophenylacetamido side-chain and displays good activity against Gram-negative organisms; it is stable to acid and thus can be administered orally, although it is susceptible to degradation by β-lactamases. Amoxycillin (79) differs from ampicillin by the addition of a single hydroxy group, but the compound is better absorbed by the gastrointestinal tract.

Clavulanic acid (80), isolated from Streptomyces clavuligerus, is similar in structure to the penicillins, except oxygen replaces sulfur in the five-membered ring (123). Clavulanic acid has weak antibacterial activity, but is a potent inhibitor of β-lactamases (124). A mixture of clavulanic acid and the β-lactamase-sensitive amoxycillin was introduced in 1981 as Augmentin and has proved to be an effective combination to combat β-lactamase-producing bacteria (125). In 2001, 20 years after its launch, Augmentin is the best-selling antibacterial worldwide.

The clinical introduction of the penicillin group of antibiotics prompted an intensive search for novel antibiotic-producing organisms and Selman Waksman demonstrated the value of actinomycetes in this role, discovering the aminoglycoside streptomycin (81) from Streptomyces griseus in 1943 (126). Pharma-
new class of carbapenem antibiotics to become available for clinical use (132). Imipenem has a very broad spectrum of activity against most Gram-positive and Gram-negative aerobic and anaerobic bacteria.

Screening bacteria such as Pseudomonas acidophila and Chromobacterium violaceum for production of β-lactam antibiotics resulted in the discovery of naturally occurring monobactams, which had moderate antimicrobial activity (133-135). Side-chain variations, as developed for the penicillins and cephalosporins, led to compounds with improved activity against both Gram-positive and Gram-negative bacteria. A derivative containing the α-aminothiazoyl group, a side-chain component common to the third-generation cephalosporins (see above), showed specific activity against Gram-negative aerobic bacteria, including Pseudomonas spp., and was stable to most types of β-lactamases. The compound aztreonam (108) became the first
(90) R = COCH2CH2CH2CH(NH2)COOH
(91) R = H
(108)
(109) Erythromycin A, R = H
(114) Clarithromycin, R = CH3
Erythromycin (109) was isolated, in 1952, from a strain of Saccharopolyspora erythraea

closure to give the 9,12-tetrahydrofuran (112) that is also inactive (139). The A, derivative
5 Antibiotics
(113) Azithromycin
(115) Telithromycin

Both azithromycin and clarithromycin have been used for various bacterial infections for a number of years. Within the last decade, resistance has emerged to a range of antibacterials, including the macrolides, arising from methylation of an adenine in the 23S ribosomal RNA target site, which prevents binding (146). The invention of the ketolides [e.g., telithromycin (115)] overcomes MLSB resistance by removing the L-cladinose moiety at position 3: the exposed hydroxyl is also oxi-

5.3 Streptogramins

The streptogramins are produced by Streptomyces species and have been classified into two groups: Group A are polyunsaturated macrocyclic lactones and Group B are cyclic hexadepsipeptides. Both groups bind bacterial ribosomes and inhibit protein synthesis at the elongation step and they act synergistically against many Gram-positive microorganisms. However, the naturally occurring streptogramins are poorly soluble in water and this, until recently, has limited their use to treat bacterial infections. New, water-soluble derivatives have been developed and the semisynthetic dalfopristin (116) and quinupristin (117) mixture (Synercid) has been approved for the treatment of Gram-positive infections, including multidrug-resistant strains of Enterococcus faecium, Staphylococcus aureus, and S. pneumoniae (148).
(116) Dalfopristin
(117) Quinupristin

5.4 Echinocandins

The fungal metabolite echinocandin B (118) is one of the lipopeptides, in which a cyclic hexapeptide is combined with a long-chain fatty acid. Echinocandin B inhibits β-1,3-glucan synthesis and as a result has anti-Candida and anti-Pneumocystis carinii activity (149). As a group, the echinocandins are not orally bioavailable, are haemolytic, and are not very

(118) echinocandin B, R = linoleyl

Synthesis of the cyclic hexapeptide is unattractive for the purpose of securing analogs with improved biological activity because of the unusual nature of the amino acids used
and the complex stereochemistry generated by the high degree of hydroxylation. However, echinocandin B can be produced efficiently by fermentation of a culture of Aspergillus nidulans and then deacylated by fermentation with Actinoplanes utahensis (151). The free amino group thus exposed can be derivatized with a number of active esters. Synthesis of the amide from 4-octylbenzoic acid gives cilofungin (119), which has specifically high potency against Candida albicans and some other Candida species (151).

now in clinical trial and has the major advantage of oral bioavailability (153). Many other antifungal peptides are under investigation (152). The member of this series that is furthest advanced is caspofungin (MK-991, L-743,872) (121), following its approval by the FDA, early in 2001, for the treatment of aspergillosis. The two analogs, LY-303366 and caspofungin, have been compared against clinical fungal isolates in vitro (154) and the latter has been evaluated in immunosuppressed mice (155).
6 CARDIOVASCULAR DRUGS
(119) cilofungin, R -
(121) caspofungin
the treatment of coronary heart disease (156), lovastatin was introduced onto the market by Merck in 1987 for the treatment of hypercholesterolemia, a condition marked by elevated levels of cholesterol in the blood.

Lovastatin works by inhibiting 3-hydroxy-3-methylglutaryl coenzyme A (HMG-CoA) reductase, a key rate-limiting enzyme in the cholesterol biosynthetic pathway. However, the first specific inhibitors of this enzyme were discovered several years earlier by Endo et al. at Sankyo (157). The compounds, which are structurally related to lovastatin, were isolated from Penicillium citrinum and shown to block cholesterol synthesis in rats and lower cholesterol levels in the blood. Development of the most active compound, designated ML-236B (123), is believed to have been curtailed because of toxicity problems (158).

Brown et al. at Beechams also reported the isolation of (123), but as a metabolite from Penicillium brevicompactum (159). The group, naming the compound compactin, reported its antifungal activity but failed to reveal its mode of action as an inhibitor of HMG-CoA reductase. The search for naturally occurring inhibitors of HMG-CoA reductase gained pace and after spending several years developing appropriate screens, Merck found during only the second week of testing a culture of Aspergillus terreus that displayed interesting inhibitory activity (160). In February 1979 the active component, lovastatin (mevinolin), was isolated and characterized (161), and in November the following year Merck was granted patent protection in the United States. Although lovastatin proved to be identical to monacolin K, a metabolite isolated earlier from Monascus ruber (162), the chemical structure of the latter compound had not been reported, whereas Merck filed for patent protection giving complete structural details for lovastatin.

The discovery of compactin and lovastatin prompted efforts to develop derivatives with improved biological properties (163, 164). Modification of the methylbutyryl side chain of lovastatin led to a series of new ester derivatives with varying potency and, in particular, introduction of an additional methyl group α to the carbonyl gave a compound with 2.5 times the intrinsic enzyme activity of lovastatin (165). The new derivative, named simvastatin (124), was the second HMG-CoA reductase inhibitor to be marketed by Merck. Both lovastatin and simvastatin are prodrugs and are hydrolyzed to their active open-chain dihydroxy acid forms in the liver (166). A third compound, pravastatin (125), launched by Sankyo and Squibb in 1989, is the open hydroxyacid form of compactin that was first identified as a urinary metabolite in dogs. Pravastatin is produced by microbial biotransformation of compactin.

The HMG-CoA reductase inhibitors described above bind to two active sites on the enzyme: the hydroxymethylglutaryl binding domain and an adjacent hydrophobic pocket to which the decalin moiety binds (167). The recognition that the ring-opened hydroxy acids resemble mevalonic acid and that the decalin moiety could be replaced by 4-fluorophenyl-substituted heterocycles led to the launch of several new products including fluvastatin
(127) Cerivastatin
Zeneca's rosuvastatin (129), is due to be launched in 2002 and is forecast to achieve sales of US $2.8 billion by 2005 (168).

6.2 Teprotide and Captopril

While studying the physiological effects of snake poisoning, Ferreira (169) discovered that specific components in the venom of the pit viper Bothrops jararaca inhibited degradation of the peptide bradykinin and potentiated its hypotensive action. The "potentiating factors" proved to be a family of peptides that worked by inhibiting the dipeptidyl carboxypeptidase, angiotensin-converting enzyme (ACE) (170, 171). In addition to catalyzing the degradation of bradykinin, ACE also catalyzes the conversion of human prohormone, angiotensin I, to the potent vasoconstrictor octapeptide, angiotensin II. However, the significance of ACE in the pathogenesis of hypertension was not fully appreciated until the 1970s after Ondetti et al. (172) had first isolated and then synthesized the naturally occurring nonapeptide, teprotide (130). The compound proved to be a specific potent inhibitor of ACE and showed excellent antihypertensive properties in clinical trials, although its use was limited by the lack of oral activity.

Pyr-Trp-Pro-Arg-Pro-Gln-Ile-Pro-Pro

The discovery of teprotide led to a search for new, specific, orally active ACE inhibitors. Ondetti et al. (172) proposed a hypothetical model of the active site of ACE, based on analogy with pancreatic carboxypeptidase A, and used it to predict and design compounds that would occupy the carboxy-terminal binding site of the enzyme. Carboxyalkanoyl and mercaptoalkanoyl derivatives of proline were found to act as potent, specific inhibitors of ACE and 2-D-methyl-3-mercaptopropanoyl-L-proline (131) (captopril) was developed and launched in 1981 as an orally active treatment for patients with severe or advanced hypertension. Captopril, modeled on the biologically active peptides found in the venom of the pit viper, made an important contribution to the understanding of hypertension and paved the way for other ACE inhibitors, such as enalapril (132) and lisinopril, which have had a major impact on the treatment of cardiovascular disease (173).

6.3 Adrenaline, Propranolol, and Atenolol

The true clinical potential of β-adrenoceptor blocking agents for treating angina, atrial fibrillation, and tachycardias was first recognized by James Black and colleagues at ICI (174). Black noted a report from Neil Moran of Emory University in 1958, showing that dichloroisoprenaline antagonized the effects of adrenaline on heart rate and muscle tension. The first effective β-adrenoceptor blocker, pronethalol (133), was synthesized 2 years later by the ICI group and marketed for limited use in 1963. Toxicity problems soon led pronethalol to be replaced by the 1-naphthyl analog, propranolol (134), which became the first β-adrenoceptor antagonist approved for general use, being more potent and yet devoid of the partial agonist or intrinsic sympathomimetic activity shown by many other analogs. Compounds with improved selectivity for the β-adrenoceptor of cardiac muscle (β1-adrenoceptor blockers) were to follow, including atenolol (135), which became the most frequently prescribed β-blocker and one of the best-selling drugs of the time.

(133) Pronethalol
(134) Propranolol
(135) Atenolol

olite of coumarin (137), itself a common component of Melilotus sp. Soon after the compound had been identified, trials were initiated that confirmed the oral anticoagulant activity in humans and in 1942 it was marketed under the name dicoumarol (177). The compound had a slow, erratic onset of action and efforts were initiated to prepare synthetic analogs that acted faster and had longer duration of action. A 4-hydroxycoumarin residue, substituted at the 3-position, proved essential for biological activity and in 1948, after synthesizing over 150 compounds, a 4-hydroxycoumarin derivative that was longer acting and more potent than dicoumarol was selected not for clinical use, but as a rodenticide for development by the Wisconsin Alumni Research Foundation! The compound (138),
U.S. Army cadet unsuccessfully attempted to commit suicide by taking massive doses of the compound. The incident prompted further clinical trials that resulted in warfarin being used as the anticoagulant of choice for prevention of thromboembolic disease (177).
The mode of action of the coumarin anticoagulants involves blocking the regeneration of reduced vitamin K and induces a state of functional vitamin K deficiency, thus interfering with the blood-clotting mechanism (178).
7 ANTIASTHMA DRUGS
7.1 Khellin and Sodium Cromoglycate
The toothpick plant, Ammi visnaga, had been used for centuries in Egypt as an antispasmodic agent to treat renal colic and ureteral spasm. In 1879 one of the plant's main constituents was isolated, crystallized, and named khellin (139) (179). Subsequently, the pure compound was shown to relax smooth muscle and in 1938 the chemical structure was characterized as a chromone derivative (180). In 1945 a medical technician took khellin to treat renal colic and found instead that it acted as a potent coronary vasodilator and relieved his angina (181). This chance discovery, together with earlier observations, led to khellin being used as a coronary artery vasodilator and for treating bronchial asthma (182). However, its clinical use was severely limited by some unpleasant gastrointestinal side effects.
Five years later, a small British pharmaceutical company, called Benger Laboratories, initiated a program to synthesize khellin analogs as potential bronchodilators for treating asthma, and had prepared a series of compounds that relaxed guinea pig bronchial smooth muscle and protected the animals against allergen-induced bronchospasm (183). A clinical pharmacologist on Benger's staff, who suffered from chronic asthma, questioned the validity of the animal model and decided instead to test the compounds on himself. He then prepared a "soup" of guinea pig fur, inhaled the vapors to induce a reproducible asthma attack, and assessed the effects of the synthesized khellin derivatives. Many of the compounds first prepared were insoluble in water and caused nausea and other unpleasant side effects when taken orally. This led to the test compounds being formulated as aerosol sprays and in 1958, an aerosol preparation of a chromone-2-carboxylic acid derivative (140) was found to exert a protectant effect, albeit short lived, against bronchial allergen challenge without showing the bronchodilator activity seen with other compounds. The compound was completely inactive in the guinea pig asthma model and afforded its protectant effect in humans only when inhaled as an aerosol.
About two new compounds were tested each week and in 1965, after synthesizing some 670 analogs, a bischromone was prepared that gave good protection, even when inhaled up to 6 h before bronchial allergen challenge (184). The compound sodium cromoglycate (141) was obtained by condensing diethyl oxalate with the bis(hydroxy acetophenone) (142) and cyclizing the resultant bis(2,4-dioxobutyric acid) ester (143) under acidic conditions (185). The essential chemical features required for activity appeared to be the coplanarity of the chromone nuclei, the flexible dioxyalkyl link, and the carboxyl groups in the 2-positions. It is believed to act by stabilizing tissue mast cells against degranulation, thereby preventing release of inflammatory mediators (186).
(145) amiodarone
8 ANTIPARASITIC DRUGS
(152) artemisinin
of action of artemisinin has since been elucidated (198, 199), although it is not without controversy (200, 201). The drug has a high affinity for hemozoin, a storage form of hemin that is retained by the parasite after digestion of hemoglobin, leading to a highly selective accumulation of the drug by the parasite. Artemisinin then decomposes in the presence of iron, probably from the hemozoin, and releases free radicals, which kill the parasite. The peroxide bridge is therefore a crucial part of the drug molecule, as was suspected from structure-activity studies. Elucidation of the mechanism of action has led to the synthesis of a range of simple analogs capable of iron-catalyzed decomposition, some of which have good antimalarial activity (202).
In retrospect, it is not surprising that the peroxide-bridged compound (154), isolated from Artabotrys uncinatus, also has antimalarial activity (197). Because peroxides of this kind are likely to be formed from a variety of precursors in dried plant material (see below), there may well be many more antimalarials of this kind to be found.
tivity. In the presence of acid, a highly reactive carbocation intermediate allows SN1-type substitution with a variety of nucleophiles. For example, boron trifluoride catalyzes reactions with methanol and ethanol to give artemether (156) and arteether (157), respectively, two of the most important derivatives (196). Both are more potent than the parent compound and have improved solubility in oil. Artemether has been chosen for development in the West under the name Paluther.
(155) R = H
(156) R = CH3 artemether
(157) R = CH2CH3 arteether
(158) R = COCH2CH2COONa sodium artesunate
line drugs for the treatment of cerebral malaria caused by P. falciparum (197), which is otherwise fatal.
It seems highly likely (205) that most of the artemisinin found in dried plant material is formed by autoxidation after the death of the plant. From the medicinal chemist's point of view this is unimportant, but some plant biochemists might have doubts about the description of artemisinin as a "natural product." In our view, air drying in sunlight is a natural, although not a botanical, process. It is probable that many other plant-derived peroxides are formed in a similar way.
Whole plant extracts often show promising activity that may not be traceable to single components. This is obviously not true of Artemisia annua extracts, but it is interesting to note that other constituents, notably methoxylated flavones, have potentiating effects on the antimalarial activity of artemisinin (206).
The reported effect of artemisinin on systemic lupus erythematosus (196) is intriguing, given the history of use of quinine-type antimalarials in this disease.
8.2 Quinine, Chloroquine, and Mefloquine
The use of Cinchona bark (e.g., Cinchona succirubra) by South American Indians to treat fevers and the subsequent importation of the bark into Europe by Jesuit priests in the 17th century is well known (207). At that time malaria was widespread, even as far north as eastern Scotland, and there was no effective treatment for "the ague." Although quinine (159) is not very potent or long acting, a good sample of Cinchona bark contains about 5% of the alkaloid (208). This high concentration permitted genuinely therapeutic doses of bark to be given and allowed the pure alkaloid to be isolated (209) as early as 1820. During the next 100 years quinine was the only effective treatment for malaria known to Europeans. Without quinine, life in the tropics was impossible for those without natural immunity to malaria. "One thing that was compulsory was the taking of five grains of quinine a day. . . . And if you didn't take it and got ill your salary was liable to be stopped" (210). Supplies of quinine to Europe were threatened during World War I, stimulating a major program of research into synthetic analogs.
(159) quinine
The chemical techniques available to chemists in the period 1820-1920, although improving rapidly, did not allow a structure to be proposed for quinine with any confidence: the first completely correct proposal (211) came in 1922 and was finally confirmed by total synthesis (212) as late as 1945. However, part structures were known, such as the 6-methoxyquinoline moiety, from long before, and were sufficient to allow the synthesis of mimics. The first clinically successful mimics were the 8-aminoquinolines.
In the early years of the 20th century, synthetic organic chemistry was a young discipline, largely governed by empirical rules. Progress toward synthetic analogs of complex natural structures was governed as much by synthetic feasibility as by a desire for close mimicry. The first quinine analogs were, therefore, a combination of the accessible 6-methoxyquinoline part of the quinine structure, with elements of the first successful antimicrobial agents, such as 9-aminoacridine. Nitration followed by reduction could be used to generate a number of new molecules from a variety of parent heterocycles. It is recorded (213) that 4-, 6-, and 8-aminoquinolines have antimalarial properties and, quite extraordinarily, two of these chemical classes are still used today, have quite different uses as antimalarials, and quite possibly have different modes of action.
The first of the 8-aminoquinolines to be introduced into medicine was pamaquine (160), not long after World War I (214). Despite greater toxicity than that of quinine, this class
of drugs was found to have radical curative ability against the relapsing malarias. Several hundred analogs were tested during World War II and of these, primaquine (161) survives to the present day for short-term use as a radical curative (215).
(160) pamaquine
(161) primaquine
As has been explained, the major stimulus for research into synthetic antimalarials was not so much the therapeutic inadequacy of quinine as the potential lack of availability in times of social upheaval. During World War II, the United States encouraged the planting of Cinchona in Costa Rica, Peru, and Ecuador (216). The total synthesis of quinine was too difficult in the 1940s and is unlikely to become economically viable even in the new millennium. This problem was partly overcome with quinacrine, which was used widely in World War II, although quinacrine has the defects described above. The conceptual derivation of chloroquine (163) from quinacrine is obvious and apparently happened twice, in Germany and the United States, the latter about 10 years after the Germans had discarded the drug as being too toxic! The story of the rediscovery of chloroquine is fascinating, as an account of human muddle and misjudgment, finally leading to an extraordinarily valuable drug (216).
milbemycins R = H
A, Z = CH3
B,Z=H
a, X = CH(CH3)CH2CH3
b, X = CH(CH3)2
1, V-W =CH=CH
2, V-W = CH2CH(OH)
For further details of these descriptors, in the milbemycins, see Ref. 228.
pharmaceuticals and has contributed dramatically to extending human life and improving clinical practice. As long as Nature continues to yield novel, diverse chemical entities possessing selective biological activities, natural products will play an important role as leads for new pharmaceuticals. An interesting recent example is the alkaloid galantamine (Nivalin, Reminyl) (170), originally isolated from the bulbs of the Amaryllidaceae family (snowdrops, daffodils, etc.), which has found use in the symptomatic treatment of Alzheimer's Disease (239). It is a reversible and competitive inhibitor of acetylcholinesterase that also interacts allosterically with nicotinic acetylcholine receptors to potentiate the action of
9 Conclusion
(169) X = CH(CH3)CH2CH3 (major) or CH(CH3)2 (minor)
agonists. By acting to enhance the reduced central cholinergic function associated with this disease, significant improvements in cognition and behavioral symptoms have been observed in patients. In this case it is the alkaloid itself that is used as the active compound and it will be interesting to see whether development leads to better drugs. There are as yet relatively few publications in this area, although Sanochemia is interested (240, 241).
Over 90% of bacterial, fungal, and plant species are still waiting to be investigated (242). High throughput screening methods will allow even greater numbers of samples to be tested against more biological targets (243, 244), although this approach sometimes produces more data than can be conveniently integrated into a research program. An alternative view is that the elucidation of the biological effects of chosen compounds, in some detail, will yield insight into biological processes that may open avenues for medicinal chemistry research that is not based on pure chance. This view is based on the recognition that secondary metabolites have been produced and ruthlessly selected, by evolution, over a long period of time. Either way, the medicinal chemist has a wonderful opportunity to continue utilizing the rich chemical diversity offered by nature, as is shown in two recent reviews that explore this topic in some detail (245, 246).
The best approach for the identification of natural product leads is a matter of debate. Some very inventive techniques have been used in the bioassay-guided method; for example, by spraying TLC plates with reactive media that respond by producing a color change in the presence of an active compound. An alternative is to use an ethnobotanical or ethnopharmacological technique, whereby the accumulated wisdom of many generations of native plant users may be harnessed in the
search for better medicines for all. These two techniques may be combined, so that the native people describe the uses to which they put the plant and the researchers devise a bioassay that is used to find the active components. The problem with any bioassay-guided technique, however, is that the inactive constituents are not identified. This represents a considerable waste, given that the plant has had to be collected, preserved, and identified. An alternative view is that it is best to extract all the constituents, with a view to screening in whichever way is appropriate, at that time or in the future. With modern high-performance liquid chromatography facilities it is possible to reduce a plant to its secondary metabolites, as single compounds, in a few days: the products are then able to be screened in a high throughput manner in an equally short time and the compounds can be reevaluated when new screens become available. One thing is certain: the variety of natural product structures, after perhaps 300 million years of natural selection, far exceeds the bounds of human imagination, unlike the typical output from combinatorial chemistry!
REFERENCES
1. G. M. Cragg, D. J. Newman, and K. M. Snader, J. Nat. Prod., 60, 52 (1997).
2. R. Gerardy and M. H. Zenk, Phytochemistry, 32, 79-86 (1993).
3. M. J. Stone and D. H. Williams, Mol. Microbiol., 6, 29-34 (1992).
4. R. J. Bryant, Chem. Ind., 146-153 (1988).
5. C. E. Inturissi, M. Schultz, S. Shin, J. G. Umans, L. Angel, and E. J. Simm, Life Sci., 33 (Suppl. 1), 773 (1983).
6. W. Sneader, Drug Discovery: The Evolution of Modern Medicine, John Wiley & Sons, Inc., New York, 1985, pp. 78-80 summarizes the confusion surrounding the early work.
7. A. F. Casy and R. T. Parfitt, Opioid Analgesics, Plenum, New York, 1986, p. 407.
8. R. Grewe and A. Mondon, Chem. Ber., 81, 279 (1948).
9. See Ref. 7, p. 153.
10. K. W. Bentley and D. G. Hardy, Proc. Chem. Soc., 220 (1963).
11. G. F. Blane, A. L. A. Boura, A. E. Fitzgerald, and R. E. Lister, Br. J. Pharmacol., 30, 11 (1967).
12. J. W. Lewis, Adv. Biochem. Psychopharmacol., 8, 123 (1974).
13. J. Hughes, T. W. Smith, H. W. Kosterlitz, L. A. Fothergill, B. A. Morgan, and H. R. Morris, Nature, 258, 577-579 (1975).
14. J. A. H. Lord, A. A. Wakerfield, J. Hughes, and H. W. Kosterlitz, Nature, 267, 495-499 (1977).
15. O. Schaumann, Arch. Exp. Pathol. Pharmacol., 196, 109-136 (1940).
16. See Ref. 7, pp. 209-301.
17. B. Cox, Curr. Rev. Pain, 4, 448-498 (2000).
18. B. Cox and J. C. Denyer, Expert Opin. Ther. Pat., 8, 1237-1250 (1998).
19. A. G. Gilman, T. W. Rall, A. S. Nies, and P. Taylor, Goodman and Gilman's The Pharmacological Basis of Therapeutics, 8th ed., Pergamon Press, New York, 1990, p. 550.
20. L. Lemberger, Clin. Pharmacol. Therap., 39, 1-4 (1986).
21. S. E. Sallan, N. E. Zinberg, and E. Frei, N. Engl. J. Med., 293, 795-797 (1975).
22. R. K. Razdan, in P. Krogsgaard-Larsen, S. Brogger Christensen, and H. Kofod, Eds., Natural Products and Drug Development, Munksgaard, Copenhagen, 1984, pp. 486-499.
23. L. Lemberger and H. Rowe, Clin. Pharmacol. Ther., 18, 720-726 (1976).
24. T. S. Herman, L. E. Einhorn, S. E. Jones, C. Nagy, A. B. Chester, J. C. Dean, B. Furnas, S. D. Williams, S. A. Leigh, R. T. Dorr, and T. E. Moon, N. Engl. J. Med., 300, 1295 (1979).
25. A. Ward and B. Holmes, Drugs, 30, 127-144 (1985).
26. W. A. Devane, F. A. Dysarz, R. M. Johnson, L. S. Melvin, and A. C. Howlett, Mol. Pharmacol., 34, 605-613 (1988).
27. W. A. Devane, L. Hanus, A. Breuer, R. G. Pertwee, L. A. Stevenson, G. Griffin, D. Gibson, A. Mandelbaum, A. Etinger, and R. Mechoulam, Science, 258, 1946-1949 (1992).
28. N. Stella, P. Schweitzer, and D. Piomelli, Nature, 388, 773-778 (1997).
29. A. D. Khanolkar and A. Makryannis, Life Sci., 65, 607-616 (1999).
30. A. Szallasi and V. Di Marzo, Trends Neurosci., 23, 491-497 (2000).
31. S. H. Burstein, Pharmacol. Ther., 82, 87-96 (1999).
32. M. G. Bock, Drugs of the Future, 16, 631-640 (1991) provides a succinct summary.
33. R. S. L. Chang, V. J. Lotti, R. L. Monaghan, J. Birnbaum, E. O. Stapley, M. A. Goetz, G. Al-
71. M. Suffness, "Taxol: From Discovery to Therapeutic Use" in J. A. Bristol, Ed., Annual Report of Medicinal Chemistry, Vol. 28, Academic Press, New York, 1993, pp. 305-314, provides a good review of the discovery and development of taxol and related derivatives.
72. M. C. Wani, H. L. Taylor, M. E. Wall, P. Coggon, and A. T. McPhail, J. Am. Chem. Soc., 93, 2325-2327 (1971).
73. P. B. Schiff, J. Fant, and S. B. Horwitz, Nature, 277, 665-667 (1979).
74. W. P. McGuire, E. K. Rowinsky, N. B. Rosenhein, F. C. Grunbine, D. S. Ettinger, D. K. Armstrong, and R. C. Donehower, Ann. Intern. Med., 111, 273-279 (1989).
75. M. E. Wall and M. C. Wani, Cancer Res., 55, 753-760 (1995).
76. G. M. Cragg, S. A. Schepartz, M. Suffness, and M. R. Grever, J. Nat. Prod., 56, 1657-1668 (1993).
77. D. G. I. Kingston, Pharmacol. Ther., 52, 1-34 (1991).
78. L. Mangatal, M.-T. Adeline, D. Guenard, F. Gueritte-Voegelein, and P. Potier, Tetrahedron, 45, 4177-4190 (1989).
79. J. N. Denis, A. E. Greene, D. Guenard, F. Gueritte-Voegelein, L. Mangatal, and P. Potier, J. Am. Chem. Soc., 110, 5917-5919 (1988).
80. C. Palomo, A. Arrieta, F. Cossio, J. M. Aizpurua, A. Mielgo, and N. Aurrekoetxea, Tetrahedron Lett., 31, 6429-6432 (1990).
81. F. Gueritte-Voegelein, D. Guenard, F. Lavelle, M.-T. Le Goff, L. Mangatal, and P. Potier, J. Med. Chem., 34, 992-998 (1991).
82. K. C. Nicolaou, C. Riemer, M. A. Kerr, D. Rideout, and W. Wrasidlo, Nature, 364, 464-466 (1993).
83. A. Stierle, G. Strobel, and D. Stierle, Science, 260, 214-217 (1993).
84. K. C. Nicolaou, Z. Yang, J. J. Liu, H. Ueno, P. G. Nantermet, R. K. Guy, C. F. Claiborne, J. Renaud, E. A. Couladouros, K. Paulvannan, and E. J. Sorensen, Nature, 367, 630-634 (1994).
85. R. A. Holton, H. B. Kim, C. Somoza, F. Liang, R. J. Biediger, P. D. Boatman, M. Shindo, C. C. Smith, and S. Kim, J. Am. Chem. Soc., 116, 1597-1600 (1994).
86. G. Hofle, N. Bedorf, K. Gerth, and H. Reichenbach, Ger. Pat. DE 91-4138042 (1993); Chem. Abstr., 120, 52841 (1993).
87. M. R. Grever, S. A. Schepartz, and B. A. Chabner, Semin. Oncol., 19, 622-638 (1992).
88. D. M. Bollag, P. A. McQueney, J. Zhu, O. Hensens, L. Koupal, J. Liesch, M. Goetz, E. Lazarides, and C. M. Woods, Cancer Res., 55, 2325-2333 (1995).
89. For an excellent review of the "Chemical Biology of Epothilones," see K. C. Nicolaou, F. Roschangar, and V. Dionisios, Angew. Chem. Int. Ed. Engl., 37, 2014-2045 (1998) and references therein.
90. R. M. Borzilleri, X. Zheng, R. J. Schmidt, J. A. Johnson, S.-H. Kim, J. D. DiMarco, C. R. Fairchild, J. Z. Gougoutas, F. Y. F. Lee, B. H. Long, and G. D. Vite, J. Am. Chem. Soc., 122, 8890-8897 (2000).
91. F. Y. F. Lee, R. Borzilleri, C. R. Fairchild, S.-H. Kim, B. H. Long, C. Reventos-Suarez, G. D. Vite, W. C. Rose, and R. A. Kramer, Clin. Cancer Res., 7, 1429-1437 (2001).
92. H. Stahelin and A. von Wartburg in E. Jucker, Ed., Progress in Drug Research, Birkhauser-Verlag, Basel, Vol. 33, 1989, pp. 169-266.
93. H. Stahelin and A. von Wartburg, Cancer Res., 51, 5-15 (1991) present a shorter and more readable account.
94. M. G. Kelly and J. L. Hartwell, J. Natl. Cancer Inst., 14, 967-986 (1954).
95. I. W. Kaplan, New Orleans Med. Surg. J., 94, 388 (1942).
96. J. L. Hartwell and A. W. Schrecker in L. Zechmeister, Ed., Progress in the Chemistry of Organic Natural Products, 1958, pp. 83-166 provide a detailed review of the earlier developments and background.
97. V. Podwyssotzki, Arch. Exp. Pathol. Pharmacol., 13, 29 (1880).
98. J. L. Hartwell and A. W. Schrecker, J. Am. Chem. Soc., 73, 2909-2916 (1951).
99. K. S. Pankajamani and T. R. Seshadri, Proc. Ind. Acad. Sci., 36A, 157 (1952) through Chem. Abstr., 48, 2702 (1954). See Ref. 77 for a wider discussion.
100. A. Stoll, J. Renz, and A. von Wartburg, J. Am. Chem. Soc., 76, 3103-3104 (1954).
101. A. Stoll, A. von Wartburg, E. Angliker, and J. Renz, J. Am. Chem. Soc., 76, 6413-6414 (1954).
102. A. von Wartburg, E. Angliker, and J. Renz, Helv. Chim. Acta, 40, 1331-1357 (1957).
103. I. Jardine in J. M. Cassady and J. D. Douros, Eds., Anticancer Agents Based on Natural Product Models, Academic Press, New York, Vol. 16, 1980, pp. 319-351 provides a useful review of the middle years.
186. See Ref. 19, pp. 630-632.
187. B. N. Singh, Am. Heart J., 106, 788-797 (1983).
188. B. N. Singh, N. Venkatesh, K. Nademanee, M. A. Josephson, and R. Karman, Prog. Cardiovasc. Dis., 31, 249-280 (1989).
189. J. van Schepdael and H. Solvay, Presse Med., 78, 1849-1855 (1970).
190. See Ref. 62, pp. 189-191.
191. See Ref. 6, pp. 98-105.
192. R. P. Ahlquist, Am. J. Physiol., 153, 586-600 (1948).
193. A. M. Lands, F. P. Luduena, and H. J. Buzzo, Life Sci., 6, 2241-2249 (1967).
194. See Ref. 173, pp. 333-348.
195. D. L. Burgoyne, R. J. Anderson, and T. M. Allen, J. Org. Chem., 57, 525-528 (1992).
196. P. I. Trigg, in H. Wagner, H. Hikino, and N. R. Farnsworth, Eds., Economic and Medicinal Plant Research, Academic Press, London, Vol. 3, 1989, pp. 19-55.
197. W. Tang and G. Eisenbrand, Eds., Chinese Drugs of Plant Origin, Springer-Verlag, Berlin, 1992, pp. 161-175.
198. S. R. Meshnick, A. Thomas, A. Ram, C.-M. Xu, and H.-Z. Pan, Mol. Biochem. Parasitol., 49, 181-190 (1991).
199. S. R. Meshnick, Y.-Z. Yang, V. Lima, F. Kuypers, S. Kamchonwongpaisan, and Y. Yuthavong, Antimicrob. Agents Chemother., 37, 1108-1114 (1993).
200. P. L. Olliaro et al., Trends Parasitol., 17, 122-126 (2001).
201. G. H. Posner and S. R. Meshnick, Trends Parasitol., 17, 266-267 (2001).
202. For example: J. Cazelles et al., J. Chem. Soc. Perkin Trans. 1, 1265-1270 (2000); M. D. Bachi et al., Bioorg. Med. Chem. Lett., 8, 903-908 (1998).
203. J. Karbwang, K. N. Bangchang, A. Thanavibul, D. Bunnag, T. Chongsuphajaisiddhi, and T. Harinasuta, Lancet, 340, 1245 (1992), report some clinical experience to support the data in Refs. 196 and 197.
204. A. N. Chawira, D. C. Warhurst, B. L. Robinson, and W. Peters, Trans. R. Soc. Trop. Med. Hyg., 81, 554-558 (1987).
205. G. D. Brown, personal communication.
206. B. C. Elford, M. F. Roberts, J. D. Phillipson, and R. J. M. Wilson, Trans. R. Soc. Trop. Med. Hyg., 81, 434-436 (1987).
207. A. I. White in C. O. Wilson, O. Gisvold, and R. F. Doerge, Eds., Textbook of Organic, Medicinal and Pharmaceutical Chemistry, 7th ed., J. B. Lippincott, Philadelphia, 1977, pp. 247-268.
208. F. A. Fluckiger and D. Hanbury, Pharmacographia, A History of the Principal Drugs of Vegetable Origin, Met With in Great Britain and British India, Macmillan, London, 1879, pp. 361-362.
209. J. Pelletier and J. Caventou, Ann. Chim. Phys., XV, 292 (1820).
210. Anonymous, quoted by C. Allen, Tales from the Dark Continent, Warner, London, 1992, p. 30.
211. P. Rabe, Berichte, 55, 522 (1922).
212. R. B. Woodward and W. E. Doering, J. Am. Chem. Soc., 67, 860 (1945).
213. F. Schonhofer et al., Z. Physiol. Chem., 274, 1 (1942).
214. P. Muhlens, Naturwissenschaften, 14, 1162-1166 (1926).
215. See Ref. 19, pp. 988-991.
216. G. R. Coatney, Am. J. Trop. Med. Hyg., 12, 121-128 (1963).
217. P. Winstanley and P. Olliario, Expert Opin. Invest. Drugs, 7, 261-271 (1998).
218. Anonymous, Bull. World Health Org., 61, 169-178 (1983).
219. L. H. Schmidt, R. Crosby, J. Rasco, and D. Vaughan, Antimicrob. Agents Chemother., 13, 1011-1030 (1978).
220. R. M. Pinder and A. Burger, J. Med. Chem., 11, 267 (1968).
221. C. J. Ohnmacht, A. R. Patel, and R. E. Lutz, J. Med. Chem., 14, 926 (1971).
222. J. E. Rosenblatt, Mayo Clin. Proc., 74, 1161-1175 (1999).
223. F. Page, Lancet, 755 (1951).
224. A. Freedman and F. Bach, Lancet, 321 (1952).
225. G. O. Haydu, Am. J. Med. Sci., 225, 71 (1953).
226. J. Forestier and A. Certonciny, Rev. Rhum. Mal. Osteoartic., 21, 395 (1954).
227. A. L. Scherbel, S. L. Schuchter, and J. W. Harrison, Cleve. Clin. Q., 24, 98 (1957); see also A. L. Scherbel, Am. J. Med., 75, 1 (1983).
228. H. G. Davies and R. H. Green, Chem. Soc. Rev., 20, 211-269 (1991), provide structural details of a large number of analogs.
229. H. G. Davies and R. H. Green, Nat. Prod. Rep., 3, 87-121 (1986).
230. W. C. Campbell, M. H. Fisher, E. O. Stapley, G. Albers-Schonberg, and T. A. Jacob, Science, 221, 823-828 (1983).
231. K. Awadzi, K. Y. Dadzie, H. Schulzkey, D. R. W. Haddock, H. M. Gillies, and M. A. Aziz, Ann. Trop. Med. Parasitol., 79, 63 (1985).
232. B. M. Greene, H. R. Taylor, E. W. Cupp, R. P. Murphy, A. T. White, M. A. Aziz, H. Schulzkey, S. A. Danna, H. S. Newland, L. P. Goldschmidt, C. Auer, A. P. Hanson, S. V. Freeman, E. W. Reber, and P. N. Williams, N. Engl. J. Med., 313, 133-138 (1985).
233. F. A. Drobniewski, Microbiology Europe, 24-28 (1993).
234. F. O. Richards, E. S. Miri, M. Katabanva, A. Eyamba, M. Sauerbrey, G. Zea-Flores, K. Korve, W. Mathai, M. A. Homeida, I. Mueller, E. Hilyer, and D. R. Hopkins, Am. J. Trop. Med. Hyg., 66, 108-114 (2001).
235. K. Awadzi, S. K. Attah, E. T. Addy, N. O. Opoku, and B. T. Quartey, Trans. R. Soc. Trop. Med. Hyg., 93, 189-194 (1999).
236. B. A. Boatin, J. M. Hougard, E. S. Alley, L. K. B. Akpoboua, L. Yameogo, N. Dembele, A. Seketeli, and K. Y. Dadzie, Ann. Trop. Med. Parasitol., 92, S47-S60 (1998).
237. B. Leppard and A. E. Naburi, Br. J. Dermatol., 143, 520-523 (2000).
238. C. N. Burkhart, Vet. Hum. Toxicol., 42, 30-35 (2000).
239. L. J. Scott and K. L. Goa, Drugs, 60, 1095-1122 (2000).
240. M. A. H. Mucke, J. Froehlich, and U. Jordis, WO 0032199 (2000).
241. U. Jordis, J. Froehlich, M. Treu, M. Hirnschall, L. Czollner, B. Kaelz, and S. Welzig, WO 0174820 (2001).
242. J. D. Coombes, Ed., New Drugs from Natural Sources, IBC Technical Services, London, 1992, pp. 59-62, 93-100.
243. G. G. Yarbrough, D. P. Taylor, R. T. Rowlands, M. S. Crawford, and L. Lasure, J. Antibiot., 46, 535-544 (1993).
244. W. H. Moos, G. D. Green, and M. R. Pavia, "Recent Advances in the Generation of Molecular Diversity," in J. A. Bristol, Ed., Annual Report of Medicinal Chemistry, Vol. 28, Academic Press, New York, 1993, pp. 315-324.
245. Y.-Z. Shu, J. Nat. Prod., 61, 1053-1071 (1998).
246. D. J. Newman, G. M. Cragg, and K. M. Snader, Nat. Prod. Rep., 17, 215-234 (2000).
Index
Alprenolol 6-Aminopenicillanicacid
structure-based design, renal clearance, 38 (6-HA),869,870
431-432 Altracurium Aminopeptidases
AG2037 lead for drugs, 856-858 transition state analog inhibi-
structure-based design, AM1 wave function, 15, 102 tors, 652
431-432 AM404,854 8-Aminoquinolines, 888-889
Agenerase Arnaryllidaceue (snowdrops, daf- Amiodarone, 884,885
structure-based design, 440, fodils, etc.), 892 Amitriptyline, 692
441 Amastatin, 738 Ammi visnaga (toothpick plant),
Agent, 398 AMBER energy function, 264, 883
Aggregate concept, in molecular 307-308 Amoxycillin, 869,870
modeling, 90-91 performance in structure pre- AMP analogs, 764
AIDS, See HIV protease inhibi- diction, 315
Ampicillin, 869, 870
tors seeding experiments, 319
Amprenavir, 648,659
AIDS database, 386 AMBERIOPLS force field, 80,
structure-based design, 440,
AIMB, 255 81,103
ALADDIN, 259,363 in molecular modeling, 118 441
in molecular modeling, 111, American mandrake, drugs de- cu-Amylase
113 rived from, 865 X-ray crystallographic studies,
Alaninates Amides 482
binding to chymotrypsin, 35 enzyme-mediated asymmetric P-Arnyloid
Alanine racemase inhibitors, 717 bond formation, 804-805 X-ray crystallographic studies,
Alcohol dehydrogenase exchange ratesltemperature 482
QSAU studies, 5 coefficients, in NMR, 512 Analog design, 687-689
Alcohols pharmacophore points, 249 alkyl chain homologation, 689,
pharmacophore points, 249 Amines 699-704
QSAU studies, 27-29 pharmacophore points, 249, bioisosteric replacement and
Alcuronium, 856,857 250 nonisosteric bioanalogs,
Aldehydes Aminidine 689 - 694
filtering from virtual screens, pharmacophore points, 249 chain branching alteration,
246 Amino acid mimetics, 640 689
Aldose reductase inhibitors Amino acids fragments of lead molecule,
novel lead identification, 321 chemical modification re- 689,707-710
target of structure-based drug agents, 755 interatomic distances varia-
design, 447-449 chirality, 784 tion, 689, 710-712
Aldosterone, 746 classical resolution, 797 limitations of, 532
Alkyl amines conglomerate racemates, rigid or semirigid analogs,
polarization energy, 173 802-803 689,694-699
protonation, 179-180 Aminoacyl-tRNA synthetases ring-position isomers, 689
Alkyl chain homologation ana- binding of alkyl groups to, 8 ring size changes, 689
logs, 699-704 y -Aminobutyric acid amino- stereochemistry alteration and
Alkyl halides transferase (GABA-T) in- design of stereoisomers/geo-
filtering from virtual screens, hibitors, 488, 718, 766-767 metric isomers, 689,
246 y -Aminobutyric acid (GABA), 704-707
Allinger force field, 80 766 substitution of aromatic ring
Allosteric effectors analogs, 690 for saturated, or the con-
of hemoglobin, 421-424 geometric isomer analogs, verse, 689
and lock-and-key hypothesis, 5 705-706 Anandamide. 853
N-Allylmorphine, 850 molecular modeling, 149 Anchor and grow algorithm,
Almond program, 202 7-Aminocephalosporanic acid 294,296
Alogp, 390 (7-ACA),871,874 AND logical operator, 406
Alpha-amylase, 482 2-Amino-3-chlorobutanoic acid Androgen receptor
N-Alpha-(2-naphthylsulfonylgly- nonclassical resolution, 803 X-ray crystallographic studies,
cyl)-4-amidinophenylal[NA- Aminoglutethimide, 717 482
PAP] piperidide chromatographic separation, Angiotensin-converting enzyme
structure-based design, 442, 792 (ACE),650,881. See also
444 classical resolution, 798, 799 ACE inhibitors
active site molecular model- Arabinose binding protein Aspergillus alliaceus, 855
ing, 131,132-133 genetic algorithm study of Aspergillus nidlans, 878
target of structure-based drug docking, 88-89 Aspergillus terreus, 879
design, 432-433 Arachidonic acid, 762, 763 Asperlicin
Angiotensin I, 432-433,746,881 Arecoline analogs, 693-694 fragment analogs, 708
Angiotensin 11, 432, 746, 881 Argatroban lead for drugs,855-856
non-peptide antagonists, structure-baseddesign, 442,444 Aspirin, 762-763,764
668-669,670 Arginase inhibitors, 736-737 Assay Explorer, 387
Anhydrides Arginine Association thermodynamics
filtering from virtual screens, chemical modification re- drug-target binding, 170-171,
246 agents, 755 177-179
Anomalous Patterson maps, 477 Aromatase inhibitors, 717, 770 Asymmetric centers, 784,785
Antiasthma drugs Aromatic-aromatic interactions, Asymmetric synthesis, 784,
natural products as leads, 286 804-820
883-886 Aromatics enzyme-mediated, 804-807
Antibacterial enzyme inhibitors, analogs based on substitution Asymmetric transformation
717 of aromatic for saturated crystallization-induced,
Antibiotic drugs ring; or the converse, 798-799
natural products as leads, 699-704 Atenolol, 882
868-878 growth inhibition by, 38 renal clearance, 38
Antibiotic resistant pathogens, molecular comparisons, 139 Atom-atom mapping, 380,398
770 ArrayExpress, 345 Atom-centered point charges,
Anticancer drugs Artabotrys uncinatus, 887 101-102
enzyme inhibitors, 717, 718 Arteether, 887 Atomic counts, 54
molecular modeling, 151 Artemether, 887-888 Atom list, 398
natural products as leads, Artemisia annua (sweet worm- Atom-pair interaction potentials,
858-868 wood), 886,888 120
Anticoagulant protein C Artemisinin, 849,886-888 Atom stereochemistry, 365,398
X-ray crystallographic studies, Artificial intelligence, 398 Atom-type E-State index, 26
482 Artificial neural networks Atorvastatin, 744, 880
Anticoagulants, 882-883 for druglikeness screening, ATP analogs, 763-764,765
Antifolate targets 247-248,250 H+,K+-ATPase inhibitors, 718
structure-based design, in molecular modeling, 126 Na+,K+-ATPase inhibitors, 718
425-432 in QSAR, 53,62,67 Atracurium, 857,859
Antifungal enzyme inhibitors, for structural genomics study, Atrasentan (ABT-627), 811-812
717 353 Atrial natriuretic factor, 650
Antiparasitic drugs Arylsulfonamidophenethano- Atrolactate, 762
natural products as leads, lamine analogs, 703 Atropine
886-891 p-Arylthio cinnamide antago- lead for drugs, 851
Antiprotozoal enzyme inhibi- nists, 566-567 Augmentin, 869
tors, 717 ASCII (American Standard Code Aura-Mol, 388
Antisickling agents, 419-421 for Information Interchange),
Antiviral enzyme inhibitors, 717 398 AutoDock
Apamin
molecular modeling, 124 binding to FKBP, 552, explicit water molecules, 303
AP descriptors, 55, 56 553-554 flexible ligands, 263
Apex, 256,387 Asinex catalog, 385 Lamarckian genetic algo-
Apex-3D, 60 Aspartate transcarbamoylase rithm, 299
Application tier, 392,398,406 (ATCase) inhibitors, Monte Carlo simulated an-
Aquaporin 1 743-744 nealing, 297
X-ray crystallographic studies, Aspartic acid protein flexibility, 301
482 chemical modification re- Automap, 398
Aqueous solubility agents, 755 Available Chemicals Directory
and structure-based design, Aspartic peptidase inhibitors (ACD), 385,386
408 transition state analogs, virtual screening application,
Arabidopsis thaliana 647-649 254
genome sequencing, 344 virtual screening, 315 Avermectins, 891,892
MS-MS, See Tandem mass spec- enzyme-mediated asymmetric X-ray crystallographic studies
trometry (MS-MS) synthesis, 805 [int B virus], 491
Mulliken population analysis, Narwedine, 802,803 Neuroleptics
101-102 National Cancer Institute data- molecular modeling, 150
MULTICASE SAR method base, 222, 254, 385-386, 387 Neuromuscular drugs
toxicity prediction application, National Center for Biotechnol- natural products as leads,
828-843 ogy Information (NCBI), 856-858
Multidimensional databases, 335 Neuropeptide Y
390,407 sequence databases, 387 X-ray crystallographic studies,
Multidimensional NMR spec- National Toxicology Program, 492
troscopy, 512-514 246,829 Neuropeptide Y inhibitors, 671,
Multidimensional scaling, 201 Natural product mimetics, 636 673,674
Neutral endopeptidase (NEP),
Multidimensional scoring, 291 Natural products
650-651
Multilevel chemical compatibil- antiasthma drug leads,
Nitric oxide synthase, 736
ity, 249 883-886 Nitric oxide synthase inhibitor,
Multiple-copy simultaneous antibiotics drug leads, 738-739
search methods (MCSS), 868-878 Nivalin, 892
298 anticancer drug leads, NK receptor antagonists,
Multiple isomorphous replace- 858-868 669-670,672
ment (MIR) phasing, 477 antiparasitic drug leads, NMR, See Nuclear Magnetic
Multiple regression analysis 886-891 Resonance (NMR) spectroscopy
in QSAR, 8-11,50,52,53 cardiovascular drug leads,
Multisubstrate analog enzyme 878-883 NMR timescale, 537
inhibitors, 720, 741-748 CNS drug leads, 849-856 NN-703,671,675
Multi-tier architecture, 392, 407 drugs derived from, NOE, See Nuclear Overhauser
Multiwavelength anomalous dif- 1990-2000,849 effects (NOE), in NMR
fraction (MAD) phasing, extract encoding and identifi- Nolatrexed, 428
474,477-478 cation, 596-597 Non-Boltzmann sampling, 100
Munich Information Center for leads for new drugs, 847-894 Nonclassical bioisosteres,
Protein Sequences (MIPS), neuromuscular blocking drug 690-694
335 leads, 856-858 Nonclassical resolution, of chiral
Muscarinic receptors NMR structure elucidation, molecules, 799-804
distance range matrices, 136 517-518 Noncompetitive inhibitors,
stereoisomer analogs, 705-706 Natural products databases, 387, 730-731
Mutation 597 Noncovalent bonds, 6,170
in genetic algorithms, 87,88 Nearest neighbors methods, 53, energy components for inter-
MVIIA (Ziconotide) 62-63,67 molecular drug-target bind-
NMR spectroscopy, 518-523, Neighborhood behavior, 211 ing, 171-174
526,534 Nelfinavir, 648 Noncovalently binding enzyme
MVT-101,103-104,105,117 asymmetric synthesis, inhibitors, 720-754
Mycophenolate mofetil, 849 817-818 Nonisosteric bioanalogs,
Mycophenolic acid structure-based design, 440, 689-694
structure-based design, 442 Nonlinear QSAR models, 28-29
446-447 Neomycin, 870,871 descriptor pharmacophores,
Myoglobin, 419 Netropsin 62-63
binding perturbations, 544 Nonlinear regression, 67
Nabilone, 853 Neu5Ac2en Non-overlapping mapping, 398
Nadolol structure-based design, 451 Non-peptide peptidomimetics,
renal clearance, 38 Neural networks, See Artificial 636,657-674
Naftifine, 717 neural networks Nonpolar interactions, See van
Na+,K+-ATPase inhibitors, 718 Neuraminidase inhibitors, 717 der Waals forces
Nalorphine, 850 flexible docking studies, 265 Nonstructural chemical data,
Naloxone, 850 PMF function application, 314 373
NAPRALERT, 597 Screenscore application, 319 Norapomorphine
Naproxen target of structure-based drug alkyl chain homologation ana-
classical resolution, 794-795 design, 450-452 logs, 701
Receptor-relevant subspace, 204, RGD peptide sequence mimics, ROSDAL notation, 368,410
222 129,643,645,662-665 Rosuvastatin, 848,880-881
Receptor theory, 4-7 Rgroups, 368,373,397,405, Rosy periwinkle, vinca alkaloids
Reciprocal nearest neighbor, 220 409-410 from, 858
Recursive partitioning, 247-248 and combinatorial library de- Rotatable bonds
Red clover extract sign, 221 in druglikeness screening, 245
LC-MS mass spectrum, 589, Rhinoviruses in molecular modeling, 90-91
590 comparative molecular field Royal Society of Chemistry
Reduction analysis, 153 Chemical Information
enzyme-mediated asymmetric, molecular modeling of antivi- Group, 360
806 ral binding to HRV-14, 120, RPR109353,211
Refining, search queries, 409 122 R,S descriptors, for chiral mole-
target of structure-based drug
REFMAC, 478 cules, 365, 783
design, 454-456
Registration, of chemical infor- RS Discovery System, 377,385
Rhodopeptin
mation, 377-379 RSR-13,422,423
template mimetics, 644,645
Registry number, 378-379,409 Ribbed melilot, drugs derived RSR-56,422,423
Relational databases, 363, 373, from, 882 RTECS, 246
409 Rifamycin, 870,872 RUBICON, 386
Relative diversity/similarity, 209 Rigid analogs, 694-699 virtual screening application,
Relaxation parameters, in NMR, Rigid body rotations, in molecu- 254
511,512 lar modeling, 90-91 "Rule of 5," See Lipinski's "rule
changes on binding, 536-537 Rigid docking, 262-263,293 of 5"
and ligand dynamics, 528-531 Rigid geometry approximation,
and NMR screening, 571-573 in molecular modeling, 89 S-37435,675
in receptor-based design, 534 Ring-position isomer analogs, Saccharomyces cerevisiae
Relenza 699-704 genome sequencing, 344
structure-based design, 451 Rings Saccharopolyspora erythraea,
Relibase, 315 in druglikeness screening, 245 874
Reminyl, 892 molecular comparisons, 139 Salbutamol, 885,886
Renin inhibitors, 432 in molecular modeling, 91 Salmeterol, 885,886
molecular modeling, 123, 153 Ring-size change analogs, S-Salmeterol
transition state analogs, 647 699-704 enzyme-mediated reduction,
REOS filtering tool, 225 Ritalin 806,808
RESEARCH classical resolution, 793-794 Salmonella
Monte Carlo simulated an- nonclassical resolution, 801 mutagenicity prediction, 829,
nealing, 297 Ritonavir, 648,659 831-832,840,842-843
Resiniferatoxin, 854 asymmetric synthesis, Salt bridges, 285
Restrained electrostatic poten- 807-808,809 and virtual screening, 272
tial, 102 structure-based design, 438, Salts definitions, 376
Result set, 409,411 440 Salts search, 388
Retigotine, 783 Rivastigmine, 774 Sampatrilat, 651
Retinoic acid structure-based design, Saquinavir, 648,659, 717
docking and homology model- 449-450 structure-based design,
ing, 305 RNA 435-437,440
stereoisomer analogs, 707 molecular modeling, 154 SAR-by-NMR approach, 508,
X-ray crystallographic studies, NMR structural determina- 516
492-493 tion, 535 in NMR screening, 564-568,
Retinoid X receptor RNA polymerase inhibitors, 717 576
X-ray crystallographic studies, Ro-31-8959,121 Sarin, 774
493 Ro-32-7315, 652, 653 Saturated rings
Retrosynthetic analysis, 409 Ro-46-2005,673, 676 analogs based on substitution
Retrothiorphan, 650, 651 ROCS, 256,259 of aromatic for saturated
Reverse nuclear Overhauser ef- shape-based superposition, ring; or the converse,
fects pumping, 573 260 699-704
Reversible enzyme inhibitors, Roll-up, 410 Saturation diversity approach,
720 Root structure, 368,404,410 223